feat: Optimistic Proving #22990
Open
PhilWindle wants to merge 19 commits into merge-train/spartan from
Conversation
…y promise, and reorg safety (A-954)

Restructures the orchestrator to support optimistic proving:

- startNewEpoch() no longer takes totalNumCheckpoints or finalBlobBatchingChallenges
- New finalizeEpochStructure() method called after all checkpoints are processed
- Checkpoints-ready promise (callback-based) resolves when all checkpoints complete block-level proving
- Two-input gate: checkpoint roots only enqueue when both block merge proofs and epoch structure are ready
- removeLastCheckpoint() for L1 reorg safety — closes world state forks and cleans up chonk verifier cache
- Checkpoints must be registered sequentially (enforced); block processing remains parallel
- EpochProvingJob registers checkpoints in order, then processes blocks via asyncPool
- No behavior change: proving still triggers at epoch end

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ucture-deferred-finalization-checkpoints
…ream (A-955)

Reworks the prover node around an L2BlockStream-driven flow where each EpochProvingJob owns its own lifecycle:

- The job stores all checkpoints (pending and tracked) in a single map keyed by checkpoint number, with the caller-supplied orchestrator index pinned at register time, so registration and addCheckpoint can race in either order.
- New `completeEpoch()` / `whenComplete()` API: the prover node hands the epoch off to the job once L1 has signalled completion and every expected checkpoint has been registered; the job waits for any in-flight gather to settle, optionally sleeps for `finalizationDelayMs` (lets late-arriving events be processed), then runs `finalizeAndProve` and resolves the completion promise.
- `removeCheckpoint(N)` is now async, idempotent, and serialises against any in-flight `addCheckpoint`, so a re-org's prune + replacement can re-register the same checkpoint number without racing the orchestrator.
- Per-entry `attestations` and `previousBlockHeader` (and a per-entry `checkpointIndex`) eliminate the previous job-level state derived from addition order, fixing the subtle bug where a partially-proven epoch's checkpoints could be skipped.

Orchestrator side gains `removeCheckpoint(index)` for arbitrary slot removal, removing the LIFO-only constraint and the tail bookkeeping on the job.

Prover node simplifications:

- Drops `epochsReadyToFinalize`, `epochsFinalizing`, `epochAttestations`, `tryFinalizeEpoch`, and `finalizeEpochJob` in favour of a single `tryCompleteEpoch` hand-off and a `cleanupCompletedJob` chained off `whenComplete()`.
- `computeStartingBlock` and `handleCheckpointEvent` use the same "is the proven block the last block of its epoch?" rule via a new `isEpochFullyProven` helper, so partially-proven epochs (e.g. an epoch with 32 slots but proven only at slot 10) ingest every checkpoint correctly.
- Catch-up sleep removed from `gatherAndAddCheckpoint`; the same `proverNodeEpochProvingDelayMs` config now feeds the job's `finalizationDelayMs` and the EpochMonitor's existing delay.

E2E coverage:

- Happy-path single + multi-epoch checkpoint-driven proving.
- Mid-epoch reorg with replacement, mid-epoch reorg without replacement, last-slot reorg without replacement.
- Reorg during proving: gates `prover.finalizeEpoch`, computes reorg depth from the published L1 block, cheats `proven` on the rollup so the next epoch's proof can land, and asserts a subsequent epoch is proven on chain.
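The `completeEpoch()` / `whenComplete()` hand-off described above can be sketched as follows. The shapes are assumptions modelled on the description, not the real prover-node classes: the caller signals completion, the job optionally sleeps `finalizationDelayMs` so late-arriving stream events can land, then finalizes and resolves its completion promise.

```typescript
// Illustrative sketch only; `finalizeAndProve` is stubbed with a flag.
class EpochJobSketch {
  private resolveComplete!: (proven: boolean) => void;
  private readonly complete = new Promise<boolean>(r => (this.resolveComplete = r));
  public finalized = false;

  constructor(private readonly finalizationDelayMs: number) {}

  // The prover node chains cleanup off this promise.
  whenComplete(): Promise<boolean> {
    return this.complete;
  }

  // Called once L1 signals completion and all checkpoints are registered.
  async completeEpoch(): Promise<void> {
    // Let late-arriving L2BlockStream events be processed first.
    await new Promise(r => setTimeout(r, this.finalizationDelayMs));
    this.finalized = true; // stands in for finalizeAndProve()
    this.resolveComplete(true);
  }
}
```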
…eckpoint `monitor.run()` reads the rollup contract directly so its `checkpointNumber` can be ahead of what `node.getCheckpoints` (archiver-backed) has indexed, producing an empty array and a `Cannot read properties of undefined` crash on the destructured `cp`. Wrap each lookup in `retryUntil` so the helpers wait up to 30s for the archiver to catch up.
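A minimal `retryUntil` in the spirit of that fix might look like the following; the signature is an assumption for illustration, not the actual `@aztec` helper. It polls until the lookup yields a defined value or the timeout elapses, so an archiver that lags the rollup contract returns data instead of crashing the destructure.

```typescript
// Hypothetical retryUntil: poll fn until it returns a defined value.
async function retryUntil<T>(
  fn: () => Promise<T | undefined>,
  timeoutMs: number,
  intervalMs = 100,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await fn();
    if (result !== undefined) {
      return result; // archiver caught up
    }
    if (Date.now() >= deadline) {
      throw new Error(`retryUntil: timed out after ${timeoutMs}ms`);
    }
    await new Promise(r => setTimeout(r, intervalMs));
  }
}
```

Wrapping each `node.getCheckpoints`-style lookup this way converts the transient empty-array window into a bounded wait.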
…reeOrchestrator

Adds the two new orchestrators that will replace the monolithic ProvingOrchestrator in subsequent commits. ProvingOrchestrator stays unchanged so the existing EpochProver flow keeps working.

CheckpointSubTreeOrchestrator extends ProvingOrchestrator and stops at the checkpoint root rollup boundary, resolving a Promise<SubTreeResult> once block-level proving completes.

TopTreeOrchestrator drives checkpoint-root through root rollup. Inputs include per-checkpoint Promise<BlockProofs> so checkpoint root rollups pipeline against in-flight sub-tree proving — out-hash and blob accumulator hint chains are precomputed synchronously from archiver-derivable data.

Also exposes getSubTreeOutputProofs / getLastArchiveSiblingPath on CheckpointProvingState, makes ProvingOrchestrator's checkAndEnqueueCheckpointRootRollup protected, and surfaces the checkpoint object on TestContext.makeCheckpoint.

11 new unit tests; the existing 86 orchestrator tests still pass.
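The pipelining idea can be reduced to a sketch: the top tree receives one `Promise<BlockProofs>` per checkpoint and starts each checkpoint-root step as soon as that checkpoint's sub-tree resolves, without waiting on its siblings. This is possible because, per the commit, the hint chains are precomputed synchronously. Names here are illustrative, and the real ordering constraints of the rollup tree are omitted.

```typescript
// Toy model of per-checkpoint pipelining; not the real TopTreeOrchestrator.
type BlockProofs = { checkpoint: number };

async function driveTopTree(
  perCheckpoint: Array<Promise<BlockProofs>>,
  enqueueCheckpointRoot: (proofs: BlockProofs) => void,
): Promise<void> {
  // Each checkpoint-root step fires when its own sub-tree completes,
  // pipelining against siblings that are still proving.
  await Promise.all(
    perCheckpoint.map(p => p.then(proofs => enqueueCheckpointRoot(proofs))),
  );
}
```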
…rators directly

EpochProvingJob now holds a CheckpointSubTreeOrchestrator per checkpoint and constructs a TopTreeOrchestrator inside finalizeAndProve, instead of holding a single EpochProver. The previous waitForAllCheckpointsReady step is gone — the top tree starts as soon as every checkpoint is tracked and pipelines its checkpoint root rollups against the still-pending sub-tree result promises.

Each sub-tree owns its own per-checkpoint state, so removing one (e.g. via a prune) is now atomic and does not affect the others — the cross-checkpoint state coupling that triggered Palla's review concerns on #22783 is contained to the top-tree's lifetime.

Also:

- ProverClient implements a new EpochProverFactory interface with createCheckpointSubTreeOrchestrator and createTopTreeOrchestrator. The legacy createEpochProver remains for the orchestrator_*.test.ts (deleted in commit 2b).
- EpochProvingJob accepts an EpochProvingJobHooks bag (beforeTopTreeProve, afterTopTreeProve, topTreeProveOverride) that gives the e2e tests a clean patch surface — but the four affected tests migrate to spying on createTopTreeOrchestrator and patching prove(), which is the closer analog to the legacy finalizeEpoch patch.
- BrokerCircuitProverFacade is exported from @aztec/prover-client/broker so the job can manage its lifecycle.
- CheckpointSubTreeOrchestrator gains getPreviousArchiveSiblingPath() so the top-tree data assembly is synchronous (no awaiting block-level proving).

epoch-proving-job.test.ts is rewritten end-to-end to mock the new factory and the per-checkpoint sub-trees (28 tests, all passing). The four e2e tests that used to spy on createEpochProver().finalizeEpoch are migrated to spy on createTopTreeOrchestrator().orchestrator.prove.
…orting ProvingOrchestrator

Now that EpochProvingJob talks to CheckpointSubTreeOrchestrator and TopTreeOrchestrator directly, the wrapper layer has no production users:

- Delete the EpochProver interface (stdlib/src/interfaces/epoch-prover.ts) and its re-export from interfaces/server.ts.
- Delete ServerEpochProver, the adapter that translated EpochProver calls onto ProvingOrchestrator + a broker facade.
- Drop createEpochProver from EpochProverManager and from ProverClient. ProverClient now exposes only the split factories.
- Drop ProvingOrchestrator from prover-client/orchestrator's package exports, and remove its `implements EpochProver` clause. The class file stays as the base for CheckpointSubTreeOrchestrator (which extends it) and as the single-class end-to-end driver used by orchestrator_*.test.ts; it is no longer reachable from outside the package.
- Switch test_context.ts to import ProvingOrchestrator via its relative module path (the orchestrator-internal test driver TestProvingOrchestrator still extends it).

All 301 prover-client tests, 87 prover-node tests, and 801 stdlib tests still pass.
…ors and epoch jobs

CI on the previous commit caught the symptom — finalize timed out at ~30s while waiting for sub-tree results. Root cause: the broker maintains a single global `completedJobNotifications` queue that is drained by the first caller of `getCompletedJobs([])`. When multiple `BrokerCircuitProverFacade` instances poll the same broker, the first one to poll consumes every notification — including notifications for jobs the others care about. The losers only catch up via the periodic 30-second snapshot sync, which is far longer than the proof deadline for short epochs.

Commit 2a turned this into a fast-path bug by accidentally creating N+1 facades per epoch (one per sub-tree, one for the top-tree). The same race also exists across concurrent epoch jobs, so the right fix is one shared facade for the whole prover-client lifetime, not just one per job.

- `ProverClient` now owns a single `BrokerCircuitProverFacade`, started in `start()` and stopped in `stop()`.
- `createCheckpointSubTreeOrchestrator()` and `createTopTreeOrchestrator()` no longer take a facade argument — they wire the orchestrator to the shared facade.
- `EpochProvingJob` no longer creates or manages a facade.
- The facade's job map deletes entries on resolve/reject, so memory growth is bounded by concurrent in-flight work, not by lifetime jobs.

All 28 epoch-proving-job tests, 87 prover-node tests, and 301 prover-client tests still pass.
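A toy model makes the race concrete. In this sketch (all names invented for illustration), the broker keeps one global notification queue and `getCompletedJobs` drains it wholesale, so whichever consumer polls first takes everyone's notifications; routing every waiter through a single shared facade removes the race entirely.

```typescript
// Toy broker with a single drain-on-read notification queue.
class ToyBroker {
  private notifications: string[] = [];
  notify(jobId: string) {
    this.notifications.push(jobId);
  }
  getCompletedJobs(): string[] {
    const drained = this.notifications;
    this.notifications = []; // first caller takes everything
    return drained;
  }
}

// One shared facade: the sole consumer, fanning out to per-job waiters.
class SharedFacade {
  private waiters = new Map<string, (id: string) => void>();
  constructor(private readonly broker: ToyBroker) {}

  waitFor(jobId: string, cb: (id: string) => void) {
    this.waiters.set(jobId, cb);
  }

  poll() {
    for (const id of this.broker.getCompletedJobs()) {
      this.waiters.get(id)?.(id);
      this.waiters.delete(id); // bounded by in-flight work, not lifetime jobs
    }
  }
}
```

With two independent facades polling the same `ToyBroker`, the first `poll()` would swallow the second facade's notifications, which is exactly the 30-second-stall failure mode the commit describes.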
Previously, `removeCheckpoint` was a no-op once `finalizationStarted` was
set — a late prune (e.g. an L1 reorg) couldn't be reflected in the proof
that had already been kicked off, so the only options were "ignore the
prune and submit a stale proof" or "fail the epoch". Both bad.
Now `removeCheckpoint` is allowed at any point until the job reaches a
terminal state. If the top tree is in flight when a removal lands, the
removal cancels it (`cancel({ abortJobs: true })`); the catch arm in
`finalizeAndProve` recognises `TopTreeCancelledError`, drops the cancelled
top tree, and the surrounding loop rebuilds `CheckpointTopTreeData[]` and
the blob batching challenges from the surviving sub-trees and tries again.
The only bound on retries is the job's existing deadline. No retry counter:
a pathological reorg loop fails the epoch via the deadline path it would
have taken anyway, with one less knob to tune.
If every checkpoint is removed mid-finalize, the next loop iteration sees
`survivors === 0` and throws — the catch arm transitions the job to
`failed`, no proof is published.
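Condensed to a sketch (names and shapes assumed, not the real job code), the finalize loop looks like this: rebuild from survivors, retry on `TopTreeCancelledError`, fail on zero survivors, and let the deadline be the only retry bound.

```typescript
// Hypothetical condensation of the finalize retry loop.
class TopTreeCancelledError extends Error {}

async function finalizeWithRetries(
  getSurvivors: () => number[],
  prove: (survivors: number[]) => Promise<string>,
  deadlineMs: number,
): Promise<string> {
  const deadline = Date.now() + deadlineMs;
  while (true) {
    const survivors = getSurvivors();
    if (survivors.length === 0) {
      throw new Error('all checkpoints removed mid-finalize'); // job -> failed
    }
    if (Date.now() > deadline) {
      throw new Error('proof deadline exceeded'); // the only retry bound
    }
    try {
      return await prove(survivors);
    } catch (err) {
      if (!(err instanceof TopTreeCancelledError)) {
        throw err; // genuine proving failure, not a reorg-driven cancel
      }
      // A removal landed mid-prove: loop, rebuild from survivors, try again.
    }
  }
}
```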
Three new tests in `reorg-after-finalize`:
- `removeCheckpoint` after finalize-start cancels the top tree and the
loop restarts with the surviving set; second prove is given the smaller
count; epoch completes.
- A middle-of-the-list checkpoint pruned mid-prove; submitted proof carries
the surviving from/to range.
- All checkpoints removed mid-finalize → state transitions to `failed`,
nothing is published.
90 prover-node tests pass; 301 prover-client tests still pass.
…rove

The reorg-during-proving e2e test (`epochs_optimistic_proving.parallel.test.ts`, "handles a reorg arriving while proving is in progress") gates `topTree.prove` via a test patch and fires the L1 reorg while the gate is held. The prover-node receives the L2BlockStream prune events and calls `removeCheckpoint`, which after commit 3 cancels the in-flight top tree. But the patch had not yet released the gate, so `cancel()` runs *before* `prove()`. The previous code only set `this.cancelled = true` and then, when prove eventually ran, the per-checkpoint `.then` handlers all bailed on the flag and the completion promise never resolved — prove hung forever.

Fix: check `this.cancelled` at the top of `prove()` and short-circuit with `TopTreeCancelledError` immediately. Adds a unit test that constructs a top-tree, cancels it, then calls prove — expecting the immediate rejection.
Before commit 3 the prover-node could not handle a reorg that removed a checkpoint after `finalizationStarted` — the in-flight proof referenced a checkpoint that no longer existed on L1, so its submission was rejected. The test simulated recovery by writing the new `proven` pointer directly into the rollup's `stf` storage slot, releasing the gate, and asserting that a *subsequent* epoch eventually proved. With commit 3 the prover-node now cancels the in-flight top tree when a prune lands and rebuilds with the surviving checkpoints. The test should demonstrate that correct recovery, not the storage-cheat workaround.

Changes:

- After firing the L1 reorg, poll the in-flight job's tracked-checkpoint count until it drops. This is the deterministic signal that the prover-node observed the prune and called `removeCheckpoint`, which cancelled the in-flight top tree. (Without this we'd race the L2BlockStream poll and risk top tree #1 starting its real prove before cancellation lands.)
- Drop the storage cheat, the post-cheat block-production resume, and the wait-for-next-epoch sequence.
- Assert the *in-flight* epoch is proven up to `afterReorgCheckpoint` — the surviving range — directly on L1, no cheats needed.
- Make `ProverNode.epochJobs` `protected` and expose it on `TestProverNode` so the test can poll per-job tracked counts.
Until now each `CheckpointSubTreeOrchestrator` carried its own chonk-verifier proof cache (on the sub-tree's internal `EpochProvingState`). When a checkpoint was reorged out and a replacement landed in the same epoch, every public tx in the replacement re-proved its chonk circuit, even though the proof had already been computed for the original.

Introduces `EpochProvingContext`: a small per-epoch holder for the cache that:

- Submits chonk-verifier broker jobs through its own AbortController list (not the sub-tree's). Sub-tree cancellation (e.g. `removeCheckpoint` with `abortJobs: true`) does **not** abort context-owned chonk jobs, so a replacement sub-tree can pick up the cached promise.
- Self-cleans cache entries on rejection so a future caller can re-enqueue.
- Exposes `stop()` to abort every in-flight chonk job at job teardown.

Plumbing:

- New `EpochProverFactory.createEpochProvingContext()` returns a context wired to `ProverClient`'s shared broker facade. `EpochProvingJob` constructs one per epoch and passes it to every sub-tree it creates.
- `CheckpointSubTreeOrchestrator` accepts an optional context. When supplied, its overrides for `startChonkVerifierCircuits` and `getOrEnqueueChonkVerifier` route through `context.enqueue` / `context.getCached` instead of the inherited per-sub-tree path.
- The legacy `ProvingOrchestrator` (test-only) is unchanged: it continues to use `EpochProvingState.cachedChonkVerifierProofs`.

5 new unit tests on `EpochProvingContext` cover dedup, get-after-enqueue, reject-then-retry, abort-on-stop, and post-stop enqueue. All 307 prover-client tests, 90 prover-node tests, and the existing e2e build pass.
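The cache semantics described above (dedup by sharing the promise, self-clean on rejection, abort-on-stop via context-owned controllers) can be sketched as follows. `ContextSketch` and its method names are assumptions for illustration; the real `EpochProvingContext` routes through the broker facade.

```typescript
// Hypothetical model of the per-epoch chonk-verifier proof cache.
class ContextSketch {
  private cache = new Map<string, Promise<string>>();
  private controllers: AbortController[] = [];

  getOrEnqueue(key: string, run: (signal: AbortSignal) => Promise<string>): Promise<string> {
    const hit = this.cache.get(key);
    if (hit) {
      return hit; // a replacement sub-tree reuses the cached promise
    }
    // Context-owned controller: sub-tree cancellation cannot abort this job.
    const ctrl = new AbortController();
    this.controllers.push(ctrl);
    const job = run(ctrl.signal);
    // Self-clean on rejection so a future caller can re-enqueue.
    job.catch(() => this.cache.delete(key));
    this.cache.set(key, job);
    return job;
  }

  stop() {
    // Abort every in-flight chonk job at teardown.
    this.controllers.forEach(c => c.abort());
  }
}
```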
…ry call

`CheckpointSubTreeOrchestrator` now requires an `EpochProvingContext` (no fallback to a private chonk cache) and starts its single epoch in the constructor by reading the epoch number from the context. A new static `start(...)` factory does the construction plus the single internal `startNewCheckpoint(0, ...)` and stops the orchestrator cleanly if the start fails.

`ProverClient.createCheckpointSubTreeOrchestrator` becomes async and takes the per-checkpoint args, replacing the old three-step dance of factory + `startNewEpoch` + `startNewCheckpoint` in `EpochProvingJob`. `createEpochProvingContext` now takes an `epochNumber`, which is also the per-call arg dropped from `EpochProvingContext.enqueue`.
… enumeration API
Restructure `EpochProvingJob` as an orchestrator over a registry of self-contained
jobs:
- `CheckpointJob` — identified by `(checkpoint number, slot)` so a stale orphan
and a re-org replacement at the same number coexist in the parent's map without
colliding. Owns its own register-time data, sub-tree, blockProofs resolvers,
abort controller, and tx-processing loop. `cancel()` is idempotent and
fire-and-forget; `whenDone()` resolves once `provideTxs` and the cancel-driven
teardown have unwound.
- `TopTreeJob` — built from a snapshot of `CheckpointJob`s. `start()` runs
`topTree.prove(...)`; `cancel()` rejects with `TopTreeCancelledError`.
Hooks (`beforeTopTreeProve` / `afterTopTreeProve` / `topTreeProveOverride`)
forward to it from `EpochProvingJobHooks`.
- `EpochProvingJob` — slimmed from ~1020 lines to ~530. The job is now a thin
driver over `Map<string, CheckpointJob>` and a single `topTreeJob`. The old
`CheckpointStatus` (pending/tracked), `addCheckpointPromise` synchronisation,
`accumulatedTxs` / `accumulatedL1ToL2Messages`, and inline restart-loop are
gone.
`EpochProvingJob`'s public API now reflects intent rather than registry internals.
Three intent-level methods replace the tracked/pending list pair the prover-node
had to enumerate:
- `removeCheckpointsAfter(threshold): number` — bulk remove for prune
- `getCheckpointCount(): number` — total registered (live, uncancelled)
- `cancelPendingCheckpoints(): void` — drop registered jobs that never got txs
`registerPendingCheckpoint` / `addCheckpoint` are renamed to `registerCheckpoint` /
`provideTxs` to reflect that registration carries all data the top tree needs and
txs arrive later. `removeCheckpoint` is now synchronous and idempotent — the
`(number, slot)` identity means multiple removes for the same number don't need
the old "await addCheckpointPromise" serialisation.
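The `(checkpoint number, slot)` identity can be sketched with a map keyed on the composite pair, showing why a stale orphan and its re-org replacement at the same number coexist without colliding. The key format and method bodies here are assumptions for illustration, not the real `EpochProvingJob` registry.

```typescript
// Toy registry keyed by (checkpoint number, slot).
class CheckpointRegistry {
  private jobs = new Map<string, { number: number; slot: number }>();

  registerCheckpoint(number: number, slot: number) {
    // Distinct slots at the same number get distinct keys: an orphan and
    // its replacement coexist, so removeCheckpoint stays idempotent.
    this.jobs.set(`${number}:${slot}`, { number, slot });
  }

  // Bulk remove for a prune: drop everything past the threshold.
  removeCheckpointsAfter(threshold: number): number {
    let removed = 0;
    for (const [key, job] of this.jobs) {
      if (job.number > threshold) {
        this.jobs.delete(key);
        removed++;
      }
    }
    return removed;
  }

  // Total registered (live, uncancelled).
  getCheckpointCount(): number {
    return this.jobs.size;
  }
}
```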
Test coverage: 89 prover-node tests pass (down from 90 — two "doesn't finalize
while gathering" tests merged into one that asserts the new early-start invariant).
…estrator infra
`ProvingOrchestrator` and `TopTreeOrchestrator` had duplicated copies of the same
broker-job submission envelope (~80 lines apiece): the `pendingProvingJobs`
controller list, the `SerialQueue` lifecycle, the `cancel`/`stop` plumbing, and
the `deferredProving<T>(state, request, callback)` wrapper that drops obsolete
results and routes errors to `state.reject(...)`.
Lift these to a new abstract `ProvingScheduler` base class. The minimal state
contract is `ProvingStateLike { verifyState(): boolean; reject(reason: string):
void }`, which both `EpochProvingState` / `CheckpointProvingState` /
`BlockProvingState` and `TopTreeProvingState` satisfy. The base owns:
- `pendingProvingJobs`, `deferredJobQueue`, `getNumPendingProvingJobs`
- `resetSchedulerState(abortJobs)` — drain + recreate queue, optionally abort
in-flight jobs (the per-call abort flag covers both parent's
`cancelJobsOnStop` config and top-tree's `{abortJobs}` arg)
- `stop()` — standard "grab old queue, cancelInternal, await drain"
- `deferredProving<S, T>(state, request, callback, isCancelled?)` — unified
submit envelope. The `isCancelled` predicate covers top-tree's `cancelled`
flag; the parent uses the default `() => false` and relies on `verifyState`.
Subclasses define `cancelInternal()` for their own cleanup (closing world-state
forks for the parent, propagating cancel into the proving state for top-tree).
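The `ProvingStateLike` contract and the `deferredProving` envelope can be sketched as below. This is a minimal model under assumed shapes (the real base also owns the `SerialQueue` and controller list); `deferredProving` is public here purely so the sketch is easy to exercise.

```typescript
// Minimal model of the scheduler base and its state contract.
interface ProvingStateLike {
  verifyState(): boolean;
  reject(reason: string): void;
}

abstract class ProvingSchedulerSketch {
  // Unified submit envelope: drop obsolete results, route errors to reject.
  async deferredProving<S extends ProvingStateLike, T>(
    state: S,
    request: () => Promise<T>,
    callback: (result: T) => void,
    isCancelled: () => boolean = () => false,
  ): Promise<void> {
    try {
      const result = await request();
      // Obsolete result: the state was invalidated or cancelled mid-flight.
      if (!state.verifyState() || isCancelled()) {
        return;
      }
      callback(result);
    } catch (err) {
      state.reject(String(err));
    }
  }

  // Subclass-specific cleanup (close forks, propagate cancel, ...).
  protected abstract cancelInternal(): void;
}
```

The `isCancelled` predicate models the top-tree's `cancelled` flag; the parent orchestrator would use the default and rely on `verifyState` alone.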
Net code reduction: ~120 lines across the two orchestrators. The merge / padding
/ root rollup methods stay subclass-specific — they depend on state-class
methods that aren't unified here.
…rovingOrchestrator
The earlier iteration of this feature handled re-org recovery by mutating an
in-flight `EpochProvingState`: marking individual checkpoints as `removed`,
notifying ready-waiters when block-merge trees completed, and gating the
checkpoint-root rollup behind a two-input guard. The current design replaces all
of that — `EpochProvingJob` cancels and tears down the affected
`CheckpointSubTreeOrchestrator` outright and the top tree rebuilds from the
surviving snapshot.
Nothing in production code (or in any non-deleted test) reaches the old paths;
the only consumer was `orchestrator_deferred_finalization.test.ts`, which
exercised the abandoned mechanism end-to-end. Its happy-path coverage is
duplicated by `orchestrator_workflow.test.ts` /
`orchestrator_single_checkpoint.test.ts` and friends.
Removed:
- `EpochProvingState`: `removeCheckpoint`, `waitForAllCheckpointsReady`,
`notifyCheckpointBlockLevelComplete`, `areAllCheckpointsBlockLevelReady`,
`checkpointsReadyCallbacks`, `rejectCheckpointsReadyWaiters`, the `log`
field, and the `rejectCheckpointsReadyWaiters(reason)` call in `reject()`.
- `CheckpointProvingState`: `removed` field, `markRemoved`, `isRemoved`,
`isBlockMergeTreeComplete`, the `!this.removed &&` clause in `verifyState`,
and the `if (this.removed) return;` early-return in `reject`.
- `ProvingOrchestrator`: `removeCheckpoint(checkpointIndex)` (with its
DB-fork-close and chonk-cache cleanup loops), `waitForAllCheckpointsReady`,
and the `isBlockMergeTreeComplete`/`notifyCheckpointBlockLevelComplete`
block + `isEpochStructureFinalized` redundant guard inside
`checkAndEnqueueCheckpointRootRollup`. The remaining `isReadyForCheckpointRoot`
check naturally waits for `finalizeEpochStructure` because it gates on
`previousOutHashHint` and `startBlobAccumulator`, both populated only there.
- `orchestrator_deferred_finalization.test.ts` (entire file).
Kept (still used):
- `setFinalBlobBatchingChallenges` and the `FinalBlobBatchingChallenges | undefined`
shape on `CheckpointProvingState` — integration tests construct checkpoints
before `finalizeEpochStructure`.
- `getSubTreeOutputProofs`, `getLastArchiveSiblingPath` — used by
`CheckpointSubTreeOrchestrator`.
- `isEpochStructureFinalized`, `finalizeEpochStructure`, `isAcceptingCheckpoints`,
`_totalNumCheckpoints`/`_finalBlobBatchingChallenges` on `EpochProvingState` —
required by integration tests that defer finalize.
Test coverage: 89 prover-node + 262 prover-client tests pass (down from 307
because the deleted file held ~45 tests).
This was referenced May 6, 2026
PhilWindle added a commit that referenced this pull request May 6, 2026
…pair

Introduces a sub-tree + top-tree orchestrator pair that decomposes the existing single-class proving orchestrator along the natural state-coupling boundary — per-checkpoint block-level work vs. epoch-level top-tree work — while leaving every existing API on the legacy `EpochProver` / `ProvingOrchestrator` / `EpochProvingState` path untouched. The prover-node and e2e tests build unchanged; this PR is purely additive in surface area, with one structural refactor on `ProvingOrchestrator` to share scheduling infrastructure with the new `TopTreeOrchestrator`. Split out from #22990 so it can land independently.

## What's new

- **`CheckpointSubTreeOrchestrator`** (`checkpoint-sub-tree-orchestrator.ts`): extends `ProvingOrchestrator`, single-checkpoint by construction. Drives chonk-verifier / base / merge / block-root / block-merge for one checkpoint and resolves a `SubTreeResult` instead of escalating to the checkpoint root — the parent's `checkAndEnqueueCheckpointRootRollup` is overridden to short-circuit. The constructor calls `super.startNewEpoch(epoch, 1, empty challenges)` to set up a single-checkpoint mini-epoch; the count and challenges are never read because the override prevents the parent's finalize / root path from running.
- **`TopTreeOrchestrator`** + **`TopTreeProvingState`**: self-contained driver from checkpoint-root through epoch-root rollup. Takes per-checkpoint block-proof promises and pipelines its hint chain against them. Cancellation surfaces as `TopTreeCancelledError` so callers can distinguish reorg-driven cancel from a genuine proving failure.
- **`EpochProvingContext`** (`epoch-proving-context.ts`): per-epoch shared cache for chonk-verifier proofs. Survives sub-tree cancellation so a tx that gets reorged out and re-appears in a replacement checkpoint reuses the cached proof.
- **`ProvingScheduler`** (`proving-scheduler.ts`): abstract base shared by `ProvingOrchestrator` and `TopTreeOrchestrator`. Owns the `SerialQueue` deferred-job lifecycle, the `pendingProvingJobs` controller list, and a unified `deferredProving<S, T>(state, request, callback, isCancelled?)` submit envelope. The minimal `ProvingStateLike` contract is just `verifyState()` + `reject(reason)`.
- **`EpochProverFactory` interface on `ProverClient`**: new factory methods `createEpochProvingContext(epochNumber)`, `createCheckpointSubTreeOrchestrator(...)`, and `createTopTreeOrchestrator()`. A single shared `BrokerCircuitProverFacade` is owned by `ProverClient` and shared across every orchestrator.

## What changes in existing code

- `ProvingOrchestrator` extends `ProvingScheduler`; the inline broker-job submit envelope and queue lifecycle are inherited from the base. `cancel()` delegates the queue-recreate + abort-jobs logic to `resetSchedulerState(this.cancelJobsOnStop)`. Three internal methods (`getOrEnqueueChonkVerifier`, `checkAndEnqueueBaseRollup`, `checkAndEnqueueCheckpointRootRollup`) become `protected` so the sub-tree can override them; `provingState` and `provingPromise` likewise become `protected` so the sub-tree can hook the parent's failure stream onto `subTreeResult`. No public API change on `ProvingOrchestrator`.
- `CheckpointProvingState`: gains two read-only accessors used by the sub-tree's checkpoint-root override — `getSubTreeOutputProofs()` and `getLastArchiveSiblingPath()`. No state changes.
- `ProverClient` keeps `createEpochProver()` exactly as before (each call spawns its own `BrokerCircuitProverFacade`); the new factory methods share a `getFacade()` set up in `start()` and torn down in `stop()`.

`EpochProver`, `EpochProverManager`, `ServerEpochProver`, `EpochProvingState`, the integration tests in `orchestrator_*.test.ts`, `bb_prover_full_rollup.test.ts`, and `stdlib/interfaces/*` are all unchanged from `merge-train/spartan` — the prover-node and e2e tests continue to build against the existing `EpochProver` API. Migrating the prover-node onto the new factories (and the deferred-finalize flow that goes with optimistic proving) is the follow-up PR.

## Test plan

- [x] 261 prover-client tests pass (full `yarn workspace @aztec/prover-client test`).
- [x] `yarn build` clean against current merge-train/spartan (modulo the pre-existing `@aztec/sqlite3mc-wasm` issue inherited from baseline).
PhilWindle added a commit that referenced this pull request May 6, 2026
…pair

Introduces a sub-tree + top-tree orchestrator pair that decomposes the existing single-class proving orchestrator along the natural state-coupling boundary — per-checkpoint block-level work vs. epoch-level top-tree work — while leaving every existing API on the legacy `EpochProver` / `ProvingOrchestrator` / `EpochProvingState` path untouched. The prover-node and e2e tests build unchanged; this PR is purely additive in surface area, with structural refactors on `ProvingOrchestrator` to share scheduling and top-tree drivers with the new `TopTreeOrchestrator`. Split out from #22990 so it can land independently.

## What's new

- **`CheckpointSubTreeOrchestrator`** (`checkpoint-sub-tree-orchestrator.ts`): extends `ProvingOrchestrator`, single-checkpoint by construction. Drives chonk-verifier / base / merge / block-root / block-merge for one checkpoint and resolves a `SubTreeResult` instead of escalating to the checkpoint root — the parent's `checkAndEnqueueCheckpointRootRollup` is overridden to short-circuit. The constructor calls `super.startNewEpoch(epoch, 1, empty challenges)` to set up a single-checkpoint mini-epoch; the count and challenges are never read because the override prevents the parent's finalize / root path from running.
- **`TopTreeOrchestrator`** + **`TopTreeProvingState`**: self-contained driver from checkpoint-root through epoch-root rollup. Takes per-checkpoint block-proof promises and pipelines its hint chain against them. Cancellation surfaces as `TopTreeCancelledError` so callers can distinguish reorg-driven cancel from a genuine proving failure.
- **`EpochProvingContext`** (`epoch-proving-context.ts`): per-epoch shared cache for chonk-verifier proofs. Survives sub-tree cancellation so a tx that gets reorged out and re-appears in a replacement checkpoint reuses the cached proof.
- **`ProvingScheduler`** (`proving-scheduler.ts`): abstract base owning the `SerialQueue` deferred-job lifecycle, the `pendingProvingJobs` controller list, and a unified `deferredProving<S, T>(state, request, callback, isCancelled?)` submit envelope. The minimal `ProvingStateLike` contract is just `verifyState()` + `reject(reason)`.
- **`TopTreeProvingScheduler`** (`top-tree-proving-scheduler.ts`): extends `ProvingScheduler` and holds the checkpoint-merge, padding, and root-rollup drivers (plus tree-walking helpers) shared by both orchestrators. Wraps circuit calls via a `wrapCircuitCall` hook (orchestrator overrides for spans; top-tree leaves identity) and resolves via an `onRootRollupComplete` hook to bridge the two states' differing `resolve` signatures. The per-checkpoint root driver stays subclass-specific because input-building flows differ.
- **`EpochProverFactory` interface on `ProverClient`**: new factory methods `createEpochProvingContext(epochNumber)`, `createCheckpointSubTreeOrchestrator(...)`, and `createTopTreeOrchestrator()`. A single shared `BrokerCircuitProverFacade` is owned by `ProverClient` and shared across every orchestrator.

## What changes in existing code

- `ProvingOrchestrator` extends `TopTreeProvingScheduler`; the inline broker-job submit envelope, queue lifecycle, and the top-tree-section drivers are inherited. `cancel()` delegates the queue-recreate + abort-jobs logic to `resetSchedulerState(this.cancelJobsOnStop)`. Three internal methods (`getOrEnqueueChonkVerifier`, `checkAndEnqueueBaseRollup`, `checkAndEnqueueCheckpointRootRollup`) become `protected` so the sub-tree can override them; `provingState` and `provingPromise` likewise become `protected` so the sub-tree can hook the parent's failure stream onto `subTreeResult`. No public API change on `ProvingOrchestrator`.
- `CheckpointProvingState`: gains two read-only accessors used by the sub-tree's checkpoint-root override — `getSubTreeOutputProofs()` and `getLastArchiveSiblingPath()`. No state changes.
- `ProverClient` keeps `createEpochProver()` exactly as before (each call spawns its own `BrokerCircuitProverFacade`); the new factory methods share a `getFacade()` set up in `start()` and torn down in `stop()`.

`EpochProver`, `EpochProverManager`, `ServerEpochProver`, `EpochProvingState`, the integration tests in `orchestrator_*.test.ts`, `bb_prover_full_rollup.test.ts`, and `stdlib/interfaces/*` are all unchanged from `merge-train/spartan` — the prover-node and e2e tests continue to build against the existing `EpochProver` API. Migrating the prover-node onto the new factories (and the deferred-finalize flow that goes with optimistic proving) is the follow-up PR.

## Test plan

- [x] 261 prover-client tests pass (full `yarn workspace @aztec/prover-client test`).
- [x] `yarn build` clean against current merge-train/spartan (modulo the pre-existing `@aztec/sqlite3mc-wasm` issue inherited from baseline).
PR #22933 (and earlier #22809) reshaped L2BlockSource: getBlockHeader, getCheckpointsForEpoch, and the positional getCheckpoints(from, limit) were removed. L2TipsMemoryStore also gained a required initialBlockHash constructor argument.

- getBlockHeader(n) -> (await getBlockData({ number: n }))?.header
- getCheckpointsForEpoch(epoch) -> getCheckpoints({ epoch }), with field access moving from .number/.blocks to .checkpoint.number/.blocks
- startProof folds the two-call pattern (checkpoints + separate attestations fetch) into one getCheckpoints({ epoch }) call, since PublishedCheckpoint already carries attestations per entry
- L2TipsMemoryStore is initialised in the constructor body with l2BlockSource.getGenesisBlockHash()

Test updates mirror the production migration; also restores the beforeEach getBlockData mock to return a header for any block number (the merge resolution had narrowed it to a single block, breaking the checkpoint-driven flow tests).
PhilWindle
added a commit
that referenced
this pull request
May 6, 2026
…pair

Introduces a sub-tree + top-tree orchestrator pair that decomposes the existing single-class proving orchestrator along the natural state-coupling boundary — per-checkpoint block-level work vs. epoch-level top-tree work — while leaving every existing API on the legacy `EpochProver` / `ProvingOrchestrator` / `EpochProvingState` path untouched. The prover-node and e2e tests build unchanged; this PR is purely additive in surface area, with structural refactors on `ProvingOrchestrator` to share scheduling and top-tree drivers with the new `TopTreeOrchestrator`. Split out from #22990 so it can land independently.

## What's new

- **`CheckpointSubTreeOrchestrator`** (`checkpoint-sub-tree-orchestrator.ts`): extends `ProvingOrchestrator`, single-checkpoint by construction. Drives chonk-verifier / base / merge / block-root / block-merge for one checkpoint and resolves a `SubTreeResult` instead of escalating to the checkpoint root — the parent's `checkAndEnqueueCheckpointRootRollup` is overridden to short-circuit. The constructor calls `super.startNewEpoch(epoch, 1, empty challenges)` to set up a single-checkpoint mini-epoch; the count and challenges are never read because the override prevents the parent's finalize / root path from running.
- **`TopTreeOrchestrator`** + **`TopTreeProvingState`**: self-contained driver from checkpoint-root through epoch-root rollup. Takes per-checkpoint block-proof promises and pipelines its hint chain against them. Cancellation surfaces as `TopTreeCancelledError` so callers can distinguish reorg-driven cancel from a genuine proving failure.
- **`EpochProvingContext`** (`epoch-proving-context.ts`): per-epoch shared cache for chonk-verifier proofs. Survives sub-tree cancellation so a tx that gets reorged out and re-appears in a replacement checkpoint reuses the cached proof.
- **`ProvingScheduler`** (`proving-scheduler.ts`): abstract base owning the `SerialQueue` deferred-job lifecycle, the `pendingProvingJobs` controller list, and a unified `deferredProving<S, T>(state, request, callback, isCancelled?)` submit envelope. The minimal `ProvingStateLike` contract is just `verifyState()` + `reject(reason)`.
- **`TopTreeProvingScheduler`** (`top-tree-proving-scheduler.ts`): extends `ProvingScheduler` and holds the checkpoint-merge, padding, and root-rollup drivers (plus tree-walking helpers) shared by both orchestrators. Wraps circuit calls via a `wrapCircuitCall` hook (orchestrator overrides for spans; top-tree leaves identity) and resolves via an `onRootRollupComplete` hook to bridge the two states' differing `resolve` signatures. The per-checkpoint root driver stays subclass-specific because input-building flows differ.
- **`EpochProverFactory` interface on `ProverClient`**: new factory methods `createEpochProvingContext(epochNumber)`, `createCheckpointSubTreeOrchestrator(...)`, and `createTopTreeOrchestrator()`. A single shared `BrokerCircuitProverFacade` is owned by `ProverClient` and shared across every orchestrator.

## What changes in existing code

- `ProvingOrchestrator` extends `TopTreeProvingScheduler`; the inline broker-job submit envelope, queue lifecycle, and the top-tree-section drivers are inherited. `cancel()` delegates the queue-recreate + abort-jobs logic to `resetSchedulerState(this.cancelJobsOnStop)`. Three internal methods (`getOrEnqueueChonkVerifier`, `checkAndEnqueueBaseRollup`, `checkAndEnqueueCheckpointRootRollup`) become `protected` so the sub-tree can override them; `provingState` and `provingPromise` likewise become `protected` so the sub-tree can hook the parent's failure stream onto `subTreeResult`. No public API change on `ProvingOrchestrator`.
- `CheckpointProvingState`: gains two read-only accessors used by the sub-tree's checkpoint-root override — `getSubTreeOutputProofs()` and `getLastArchiveSiblingPath()`. No state changes.
- `ProverClient` keeps `createEpochProver()` exactly as before (each call spawns its own `BrokerCircuitProverFacade`); the new factory methods share a `getFacade()` set up in `start()` and torn down in `stop()`. `EpochProver`, `EpochProverManager`, `ServerEpochProver`, `EpochProvingState`, the integration tests in `orchestrator_*.test.ts`, `bb_prover_full_rollup.test.ts`, and `stdlib/interfaces/*` are all unchanged from `merge-train/spartan` — the prover-node and e2e tests continue to build against the existing `EpochProver` API. Migrating the prover-node onto the new factories (and the deferred-finalize flow that goes with optimistic proving) is the follow-up PR.

## Test plan

- [x] 261 prover-client tests pass (full `yarn workspace @aztec/prover-client test`).
- [x] `yarn build` clean against current merge-train/spartan (modulo the pre-existing `@aztec/sqlite3mc-wasm` issue inherited from baseline).
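The "takes per-checkpoint block-proof promises and pipelines against them" shape can be sketched as below. This is a toy, not the real `TopTreeOrchestrator`: `SubTreeResult` is reduced to a single string field, the hint chain is replaced by string concatenation, and only the cancellation-as-distinct-error-class idea mirrors the actual `TopTreeCancelledError` described above.

```typescript
// Toy sketch of the sub-tree / top-tree split: each checkpoint's block-level
// proof is an independent promise, and the top tree consumes them in
// checkpoint order, folding each result into an accumulating "epoch root".
// A cancelled run surfaces as a distinct error class so callers can tell a
// reorg-driven cancel from a genuine proving failure.
class TopTreeCancelledError extends Error {}

interface SubTreeResult { checkpointIndex: number; blockRootProof: string }

class ToyTopTree {
  private cancelled = false;
  cancel(): void { this.cancelled = true; }

  async prove(subTrees: Promise<SubTreeResult>[]): Promise<string> {
    let acc = 'epoch';
    for (const pending of subTrees) {
      const result = await pending; // pipeline: fold as each proof lands
      if (this.cancelled) throw new TopTreeCancelledError('reorg-driven cancel');
      acc = `${acc}(${result.blockRootProof})`;
    }
    return acc;
  }
}

(async () => {
  const subTrees = [0, 1].map(i =>
    Promise.resolve({ checkpointIndex: i, blockRootProof: `block-root-${i}` }),
  );
  const root = await new ToyTopTree().prove(subTrees);
  console.log(root); // epoch(block-root-0)(block-root-1)
})();
```

The design point this illustrates: because the top tree depends only on promises, the per-checkpoint sub-trees can be started, cancelled, and replaced (e.g. after a reorg) without the epoch-level driver knowing how each proof was produced.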
PhilWindle added a commit that referenced this pull request · May 7, 2026
## Summary

- Adds a new spartan environment file (`environments/next-net-clone.env`) for cloning the next-net deployment configuration.
- Adds the corresponding `!environments/next-net-clone.env` allow-line to `spartan/.gitignore` so the file isn't ignored.

## Context

Split out from the broader optimistic-proving work (#22990) so it can land independently. Pure config; no code changes.
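For context on why the allow-line is needed, a sketch of the likely `spartan/.gitignore` arrangement is below. The `environments/*.env` line is an assumed placeholder for whatever existing rule ignores env files; only the `!environments/next-net-clone.env` negation is stated in this PR. In gitignore, later patterns win, so the re-include must appear after the rule that ignores it.

```gitignore
# Assumed pre-existing rule ignoring env files (placeholder, not from this PR):
environments/*.env
# Re-include added by this PR; must come after the ignoring pattern above:
!environments/next-net-clone.env
```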