fix(sandbox): tolerate unsealed inbox in simulation, drop pipelining from docs/playground #23315
Closed
spalladino wants to merge 2 commits into
`AztecNodeService.simulatePublicCalls` opens a fork of world state at the latest proposed block and, when the next block would start a new checkpoint, appends that checkpoint's L1->L2 messages to the fork's message tree so the simulated tx sees them.

Under proposer pipelining with non-trivial `inboxLag`, the next checkpoint's messages are not yet sealed on L1: the archiver's message store throws `L1ToL2MessagesNotReadyError` when queried for an in-progress checkpoint (see `message_store.ts:233`). As a result, every public-call simulation at a checkpoint boundary fails deterministically. This is the issue tracked by the existing `TODO(palla/pipelining): re-opt-in once public-call simulation handles inboxLag` comments in `e2e_bot.test.ts`, `e2e_fees/*.test.ts`, and `e2e_avm_simulator.test.ts`, and it surfaced as the `aztecjs_advanced` failures on PR #23253's merge-queue run.

Catch the error by name (`L1ToL2MessagesNotReadyError`) and proceed with no next-checkpoint messages. Simulation becomes best-effort across checkpoint boundaries under pipelining: a tx that depends on a not-yet-sealed message may simulate incorrectly, but block production will use the real (sealed) messages when they are available. All other errors continue to throw.
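The catch-by-name behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the error class here is a local stand-in for the archiver's `L1ToL2MessagesNotReadyError`, and `isMessagesNotReady`, `getNextCheckpointMessages`, and the `fetchMessages` callback are hypothetical names introduced only for this example.

```typescript
// Local stand-in for the archiver's error (assumption: only `name` matters here).
class L1ToL2MessagesNotReadyError extends Error {
  constructor(checkpoint: number) {
    super(`L1 to L2 messages not ready for checkpoint ${checkpoint}`);
    this.name = 'L1ToL2MessagesNotReadyError';
  }
}

// Hypothetical predicate: matches the error by its `name`, as the commit describes.
function isMessagesNotReady(err: unknown): boolean {
  return err instanceof Error && err.name === 'L1ToL2MessagesNotReadyError';
}

// Hypothetical helper: fetch the next checkpoint's messages, falling back to an
// empty list when they are not sealed yet. Every other error is rethrown.
async function getNextCheckpointMessages(
  fetchMessages: () => Promise<string[]>,
): Promise<string[]> {
  try {
    return await fetchMessages();
  } catch (err) {
    if (isMessagesNotReady(err)) {
      // Best-effort: simulate without the not-yet-sealed messages.
      return [];
    }
    throw err;
  }
}
```

Keeping the `catch` this narrow means an unrelated failure in the message store still aborts the simulation instead of being silently treated as "no messages".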
Two unrelated failure modes surfaced when PR #23277 enabled `SEQ_ENABLE_PROPOSER_PIPELINING=true` on the docs-examples and playground compose sandboxes:

1. `example_swap` polls `getBlockNumber('proven')` after the swap's final tx lands and the sandbox goes idle. Under pipelining the proven tip only catches up via the watcher's slow-path wall-clock warp (~72s/slot), so under merge-queue load the example can be killed with SIGTERM before the tip advances. See http://ci.aztec-labs.com/b08ac48286302949 (block 86).
2. `aztecjs_advanced` deterministically failed in `AztecNodeService.simulatePublicCalls` with `L1ToL2MessagesNotReadyError`. That is the simulator+inboxLag mismatch fixed in the preceding commit.

The simulator commit lands the actual bug-fix. This commit ships the narrower workarounds for the docs/playground demo sandboxes:

- Remove `SEQ_ENABLE_PROPOSER_PIPELINING=true` from `docs/examples/ts/docker-compose.yml` and `playground/docker-compose.yml`. These are developer-facing demos, not pipelining test coverage; the real coverage lives in `yarn-project/end-to-end/scripts/docker-compose.yml` and the `aztec-up/test/*.sh` shell scripts, both untouched.
- Drop `example_swap` from the default docs runner list, matching the existing `aave_bridge` precedent, since the proven-tip stall is a sandbox-side limitation that needs a separate sequencer-team fix.
- Bump `docs/examples/bootstrap.sh` `test_cmds` TIMEOUT to 20m to match the compose/web3signer/ha bumps in #23275. This is defense-in-depth against cumulative runtime growth, no longer the primary fix.

Re-enable `example_swap` in a follow-up once the sandbox advances the proven tip without a continuous tx stream.
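To illustrate the polling pattern behind the first failure mode, here is a minimal sketch of a bounded poll that fails with a descriptive error instead of waiting until the runner's timeout kills it. This is not part of the commit and not how `example_swap` is written: `waitForBlock` and its parameters are hypothetical, and `getBlockNumber` is passed in as a plain callback standing in for a call such as `getBlockNumber('proven')`.

```typescript
// Hypothetical helper: poll a block-number source until it reaches `target`,
// giving up with a descriptive error once `timeoutMs` would be exceeded.
async function waitForBlock(
  getBlockNumber: () => Promise<number>,
  target: number,
  timeoutMs: number,
  intervalMs: number,
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const current = await getBlockNumber();
    if (current >= target) {
      return current;
    }
    // Stop if the next sleep would carry us past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(
        `proven tip stuck at ${current}, wanted ${target} within ${timeoutMs}ms`,
      );
    }
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
  }
}
```

With a slow-path warp of roughly 72s per slot, any such timeout would need to cover several slots; a bounded poll only makes the stall visible in the logs, it does not make the proven tip advance faster.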
Motivation
Two failure modes surfaced on the spartan merge train after PR #23277 enabled `SEQ_ENABLE_PROPOSER_PIPELINING=true` in the sandbox-based test composes. Both showed up in `docs/examples/bootstrap.sh execute` (e.g. http://ci.aztec-labs.com/7f325afea4f00b31): (1) `aztecjs_advanced` deterministically failed in `AztecNodeService.simulatePublicCalls` with `L1ToL2MessagesNotReadyError`, the simulator+inboxLag mismatch that's TODO'd in `e2e_bot.test.ts:39`, `e2e_fees/*.test.ts`, and `e2e_avm_simulator.test.ts`; (2) `example_swap` was SIGTERMed at the docs-compose 600s mark while polling `getBlockNumber('proven')`, because the local sandbox's proven tip only advances via the slow-path wall-clock warp once the chain goes idle.

Approach
Two commits. The first is the real bug-fix: `AztecNodeService.simulatePublicCalls` catches `L1ToL2MessagesNotReadyError` thrown when querying the not-yet-sealed next checkpoint's L1→L2 messages, and simulates without those messages. Simulation becomes best-effort across checkpoint boundaries under pipelining; block production continues to use sealed messages as before. The second commit narrows the blast radius for the demo sandboxes: it removes `SEQ_ENABLE_PROPOSER_PIPELINING=true` from `docs/examples/ts/docker-compose.yml` and `playground/docker-compose.yml`, drops `example_swap` from the default docs runner (matching the existing `aave_bridge` precedent), and bumps `docs/examples/bootstrap.sh` `test_cmds` `TIMEOUT` to 20m to match the bumps from #23275.

Pipelining coverage is retained where it actually exercises sequencer/watcher behaviour: `yarn-project/end-to-end/scripts/docker-compose.yml` (compose-routed e2e + cli-wallet flows) and `aztec-up/test/{amm_flow,basic_install,bridge_and_claim}.sh`. The proven-tip stall and re-enabling of `example_swap` are deferred to a follow-up that gives the sandbox a way to advance the proven tip without a continuous tx stream.

Changes
- `AztecNodeService.simulatePublicCalls`: narrow `try`/`catch` on `L1ToL2MessagesNotReadyError` (matched by `err.name`); rethrow anything else.
- Drop `example_swap` from defaults, bump compose TIMEOUT to 20m.

Codex reviewed both rounds of the design; the unsuccessful `buildCheckpointIfEmpty` + watcher-gate variant was abandoned after a confirmed cascade race / deadlock and reverted before commit.