fix(ci): raise AVM check-circuit per-tx timeout to 120s #23662

Draft
AztecBot wants to merge 1 commit into next from cb/avm-check-circuit-timeout

AztecBot commented May 29, 2026

Summary

Raise the per-input avm_check_circuit timeout from 30s to a 120s default (overridable via AVM_CHECK_CIRCUIT_TIMEOUT), so heavy AVM inputs complete with comfortable margin.

Recurred on 2026-06-02 as CI run 26795871178 (exit code 124, head 1f6248d). It keeps recurring because this PR carries the fix but remains a draft and has never been merged, so next still runs the 30s limit. The same input (e2e_multiple_blobs, ~700k rows) timed out mid circuit-check; every other input passed in 3–5s. This PR was first opened for the same failure on 2026-05-29 and rebased onto next on 2026-06-01.

Root cause

yarn-project/end-to-end/bootstrap.sh:avm_check_circuit runs bb-avm avm_check_circuit on every dumped e2e AVM input in parallel, each wrapped in a per-test timeout (exec_test's timeout -v $TIMEOUT). The runner uses --halt now,fail=1, so a single timeout fails the entire job. This is a wall-clock failure, not a circuit-correctness failure.

One input — the e2e_multiple_blobs tx — produces a ~700,560-row AVM trace. On the default 2 CPUs the per-input log shows trace generation alone takes ~21s, so the circuit check is killed only a few seconds in at the 30s cap:

Generating trace...
Checking circuit... (3865 MiB)            (trace gen ~21s)
Running check (with skippable) circuit over 700560 rows.
timeout: sending signal TERM to command 'bash'   (killed mid-check at 30s)

The check was progressing, not hung — it simply needs more than 30s on 2 CPUs. Every other input passed in 3–5s.
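For reference, the exit code 124 seen in CI is what GNU coreutils `timeout` returns when it kills a command that overran its limit. A minimal sketch of that behaviour, assuming GNU `timeout` is available; `sleep 5` is a hypothetical stand-in for a slow check and the 1s limit is chosen only to keep the demo short (the CI wrapper additionally passes `-v`, which is what produces the "sending signal TERM" line in the log):

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a check that needs longer than its budget.
# 'sleep 5' plays the role of the slow input; 1s plays the role of the 30s cap.
timeout 1s bash -c 'sleep 5'
status=$?

# GNU timeout reports 124 when the command was killed for exceeding the limit.
echo "exit status: $status"

# A command that finishes in time keeps its own exit status (0 here).
timeout 5s bash -c 'true'
ok_status=$?
echo "exit status: $ok_status"
```

An exit status of 124 therefore says only that the wall-clock budget ran out, not that the command itself reported an error.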

Fix

  • Bump the per-check timeout to a 120s default, overridable via AVM_CHECK_CIRCUIT_TIMEOUT.
  • CPU allocation stays at the default 2: parallelize runs num_cpus/2 concurrent jobs, so the runner is already sized to be fully utilized at 2 CPUs/job. Raising CPUS without lowering the job count would oversubscribe the box. Only the wall-clock budget was the constraint.
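The two points above can be sketched as follows. This is an illustration under stated assumptions, not the actual bootstrap.sh change: the function names `resolve_timeout` and `total_cpu_demand` and the 16-core runner are hypothetical; only the variable AVM_CHECK_CIRCUIT_TIMEOUT, the 120s default, and the num_cpus/2 job count come from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: use the environment override if it is set and
# non-empty, otherwise fall back to the 120s default.
resolve_timeout() {
  echo "${AVM_CHECK_CIRCUIT_TIMEOUT:-120s}"
}

# Hypothetical helper: total CPUs requested when num_cpus/2 jobs run
# concurrently and each job is given cpus_per_job CPUs.
total_cpu_demand() {
  local num_cpus=$1 cpus_per_job=$2
  local jobs=$(( num_cpus / 2 ))
  echo $(( jobs * cpus_per_job ))
}

echo "per-check timeout: $(resolve_timeout)"

# On an assumed 16-core runner: 8 jobs x 2 CPUs matches the core count,
# while 8 jobs x 4 CPUs would request twice what the box has.
echo "demand at 2 CPUs/job: $(total_cpu_demand 16 2)"
echo "demand at 4 CPUs/job: $(total_cpu_demand 16 4)"
```

Under these assumptions, a one-off run could extend the budget with something like `AVM_CHECK_CIRCUIT_TIMEOUT=300s` in the environment, without editing the script.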

Testing

  • bash -O extglob -n yarn-project/end-to-end/bootstrap.sh — passes.

Action needed: this PR is still marked draft, which is the only thing blocking the merge (GitHub returns 405 "Pull Request is still a draft"). Mark it Ready for review and it can land on next.

AztecBot added the claudebox label (Owned by claudebox. it can push to this PR.) May 29, 2026

The avm-check-circuit job runs bb-avm avm_check_circuit on every dumped
e2e AVM input in parallel, each wrapped in a 30s timeout (exec_test's
timeout -v $TIMEOUT). The runner uses --halt now,fail=1, so a single
timeout fails the whole job.

The e2e_multiple_blobs tx produces a ~700k-row AVM trace. On the default
2 CPUs, trace generation (~22s) plus the row check exceeded 30s and the
check was killed with exit 124 (CI run 26755632012); every other input
passed in 3-6s.

Raise the per-check timeout to a 120s default and make it overridable via
AVM_CHECK_CIRCUIT_TIMEOUT, so the heaviest inputs complete with margin
while the common case still finishes quickly. CPU allocation stays at the
default 2 (the runner core count is tuned so the parallel job count
saturates it at 2 CPUs each); only wall-clock budget was the constraint.

Supersedes the stale draft branch for #23662 (rebased onto current next).

AztecBot changed the title from "fix(ci): raise AVM check-circuit input timeout" to "fix(ci): raise AVM check-circuit per-tx timeout to 120s" Jun 1, 2026
AztecBot force-pushed the cb/avm-check-circuit-timeout branch from b6a8894 to e0f791e June 1, 2026 13:29