
feat: Optimistic Proving #22990

Open
PhilWindle wants to merge 19 commits into merge-train/spartan from pw/optimistic

Conversation

@PhilWindle
Collaborator

.

PhilWindle and others added 17 commits April 24, 2026 16:14
…y promise, and reorg safety (A-954)

Restructures the orchestrator to support optimistic proving:

- startNewEpoch() no longer takes totalNumCheckpoints or finalBlobBatchingChallenges
- New finalizeEpochStructure() method called after all checkpoints are processed
- Checkpoints-ready promise (callback-based) resolves when all checkpoints complete block-level proving
- Two-input gate: checkpoint roots only enqueue when both block merge proofs and epoch structure are ready
- removeLastCheckpoint() for L1 reorg safety — closes world state forks and cleans up chonk verifier cache
- Checkpoints must be registered sequentially (enforced); block processing remains parallel
- EpochProvingJob registers checkpoints in order, then processes blocks via asyncPool
- No behavior change: proving still triggers at epoch end
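The two-input gate described above can be sketched as follows. This is a minimal illustration, not the real orchestrator API: `CheckpointGate`, `onBlockMergeComplete`, and `onEpochStructureFinalized` are hypothetical names standing in for the actual enqueue logic.

```typescript
// Hypothetical sketch: the checkpoint root rollup is enqueued only once
// BOTH inputs (block merge proofs, finalized epoch structure) are present,
// regardless of which one arrives first.
type BlockMergeProof = { checkpointIndex: number };

class CheckpointGate {
  private blockMergeProof?: BlockMergeProof;
  private epochStructureReady = false;
  public enqueued = false;

  onBlockMergeComplete(proof: BlockMergeProof) {
    this.blockMergeProof = proof;
    this.tryEnqueue();
  }

  onEpochStructureFinalized() {
    this.epochStructureReady = true;
    this.tryEnqueue();
  }

  private tryEnqueue() {
    // Fires exactly once, and only when both inputs are ready.
    if (this.enqueued || !this.blockMergeProof || !this.epochStructureReady) {
      return;
    }
    this.enqueued = true; // stand-in for enqueueing the checkpoint root rollup
  }
}
```

Either input arriving alone leaves the gate closed; the second arrival opens it exactly once.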

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ream (A-955)

Reworks the prover node around an L2BlockStream-driven flow where each
EpochProvingJob owns its own lifecycle:

- The job stores all checkpoints (pending and tracked) in a single map keyed
  by checkpoint number, with the caller-supplied orchestrator index pinned at
  register time so registration and addCheckpoint can race in either order.
- New `completeEpoch()` / `whenComplete()` API: the prover node hands the
  epoch off to the job once L1 has signalled completion and every expected
  checkpoint has been registered; the job waits for any in-flight gather to
  settle, optionally sleeps for `finalizationDelayMs` (lets late-arriving
  events be processed), then runs `finalizeAndProve` and resolves the
  completion promise.
- `removeCheckpoint(N)` is now async, idempotent, and serialises against any
  in-flight `addCheckpoint` so a re-org's prune + replacement can re-register
  the same checkpoint number without racing the orchestrator.
- Per-entry `attestations` and `previousBlockHeader` (and a per-entry
  `checkpointIndex`) eliminate the previous job-level state derived from
  addition order, fixing the subtle bug where a partially-proven epoch's
  checkpoints could be skipped.

Orchestrator side gains `removeCheckpoint(index)` for arbitrary slot removal,
removing the LIFO-only constraint and the tail bookkeeping on the job.

Prover node simplifications:
- Drops `epochsReadyToFinalize`, `epochsFinalizing`, `epochAttestations`,
  `tryFinalizeEpoch`, and `finalizeEpochJob` in favour of a single
  `tryCompleteEpoch` hand-off and a `cleanupCompletedJob` chained off
  `whenComplete()`.
- `computeStartingBlock` and `handleCheckpointEvent` use the same
  "is the proven block the last block of its epoch?" rule via a new
  `isEpochFullyProven` helper, so partially-proven epochs (e.g. epoch with
  32 slots but proven only at slot 10) ingest every checkpoint correctly.
- Catch-up sleep removed from `gatherAndAddCheckpoint`; the same
  `proverNodeEpochProvingDelayMs` config now feeds the job's
  `finalizationDelayMs` and the EpochMonitor's existing delay.
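The "is the proven block the last block of its epoch?" rule can be sketched like this. The shapes are hypothetical (the real helper presumably works in terms of the rollup's epoch/slot accounting), but the predicate is the same: an epoch counts as fully proven only when the proven tip reaches its final block.

```typescript
// Hypothetical shapes: a simplified stand-in for the isEpochFullyProven rule.
interface EpochInfo {
  firstBlock: number; // first L2 block number in the epoch
  numBlocks: number;  // blocks actually produced in the epoch
}

function isEpochFullyProven(provenBlockNumber: number, epoch: EpochInfo): boolean {
  const lastBlockOfEpoch = epoch.firstBlock + epoch.numBlocks - 1;
  return provenBlockNumber >= lastBlockOfEpoch;
}
```

Under this rule an epoch proven only partway through (the "proven only at slot 10" case above) is not fully proven, so its checkpoints are still ingested.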

E2E coverage:
- Happy-path single + multi-epoch checkpoint-driven proving.
- Mid-epoch reorg with replacement, mid-epoch reorg without replacement,
  last-slot reorg without replacement.
- Reorg during proving: gates `prover.finalizeEpoch`, computes reorg depth
  from the published L1 block, cheats `proven` on the rollup so the next
  epoch's proof can land, and asserts a subsequent epoch is proven on chain.
…eckpoint

`monitor.run()` reads the rollup contract directly so its `checkpointNumber`
can be ahead of what `node.getCheckpoints` (archiver-backed) has indexed,
producing an empty array and a `Cannot read properties of undefined`
crash on the destructured `cp`. Wrap each lookup in `retryUntil` so the
helpers wait up to 30s for the archiver to catch up.
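The retry pattern being applied is roughly the following. This is a self-contained stand-in: the repo has its own `retryUntil` utility whose exact signature may differ.

```typescript
// Sketch of the retry wrapper: poll until the archiver-backed lookup returns
// a value, or give up after timeoutMs.
async function retryUntil<T>(
  fn: () => Promise<T | undefined>,
  timeoutMs: number,
  intervalMs = 100,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await fn();
    if (result !== undefined) {
      return result; // archiver has caught up
    }
    if (Date.now() > deadline) {
      throw new Error(`retryUntil timed out after ${timeoutMs}ms`);
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```

Wrapping each checkpoint lookup this way turns the transient undefined into a bounded wait instead of a destructuring crash.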
…reeOrchestrator

Adds the two new orchestrators that will replace the monolithic
ProvingOrchestrator in subsequent commits. ProvingOrchestrator stays
unchanged so the existing EpochProver flow keeps working.

CheckpointSubTreeOrchestrator extends ProvingOrchestrator and stops at
the checkpoint root rollup boundary, resolving a Promise<SubTreeResult>
once block-level proving completes.

TopTreeOrchestrator drives checkpoint-root through root rollup. Inputs
include per-checkpoint Promise<BlockProofs> so checkpoint root rollups
pipeline against in-flight sub-tree proving — out-hash and blob
accumulator hint chains are precomputed synchronously from
archiver-derivable data.
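The pipelining shape can be sketched as below. All names and types here are illustrative, not the real `TopTreeOrchestrator` API: the point is that hints are computed up front from archiver-derivable data, and each checkpoint root rollup waits only on its own sub-tree promise, not on the whole epoch.

```typescript
// Hypothetical sketch of pipelining checkpoint-root rollups against
// in-flight sub-tree proving.
type BlockProofs = { checkpoint: number };

function precomputeHints(n: number): string[] {
  // Stand-in for the out-hash / blob accumulator hint chains, which are
  // derivable synchronously without any proofs.
  return Array.from({ length: n }, (_, i) => `hint-${i}`);
}

async function driveTopTree(subTrees: Promise<BlockProofs>[]): Promise<string[]> {
  const hints = precomputeHints(subTrees.length);
  // Each rollup starts as soon as its own sub-tree's proofs resolve.
  return Promise.all(
    subTrees.map(async (subTree, i) => {
      const proofs = await subTree;
      return `root-rollup(${proofs.checkpoint}, ${hints[i]})`;
    }),
  );
}
```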

Also exposes getSubTreeOutputProofs / getLastArchiveSiblingPath on
CheckpointProvingState, makes ProvingOrchestrator's
checkAndEnqueueCheckpointRootRollup protected, and surfaces the
checkpoint object on TestContext.makeCheckpoint.

11 new unit tests; the existing 86 orchestrator tests still pass.
…rators directly

EpochProvingJob now holds a CheckpointSubTreeOrchestrator per checkpoint and
constructs a TopTreeOrchestrator inside finalizeAndProve, instead of holding a
single EpochProver. The previous waitForAllCheckpointsReady step is gone — the
top tree starts as soon as every checkpoint is tracked and pipelines its
checkpoint root rollups against the still-pending sub-tree result promises.

Each sub-tree owns its own per-checkpoint state, so removing one (e.g. via a
prune) is now atomic and does not affect the others — the cross-checkpoint
state coupling that triggered Palla's review concerns on #22783 is contained
to the top-tree's lifetime.

Also:
- ProverClient implements a new EpochProverFactory interface with
  createCheckpointSubTreeOrchestrator and createTopTreeOrchestrator. The
  legacy createEpochProver remains for the orchestrator_*.test.ts (deleted
  in commit 2b).
- EpochProvingJob accepts an EpochProvingJobHooks bag (beforeTopTreeProve,
  afterTopTreeProve, topTreeProveOverride) that gives the e2e tests a clean
  patch surface — but the four affected tests migrate to spying on
  createTopTreeOrchestrator and patching prove(), which is the closer analog
  to the legacy finalizeEpoch patch.
- BrokerCircuitProverFacade is exported from @aztec/prover-client/broker so
  the job can manage its lifecycle.
- CheckpointSubTreeOrchestrator gains getPreviousArchiveSiblingPath() so the
  top-tree data assembly is synchronous (no awaiting block-level proving).

epoch-proving-job.test.ts is rewritten end-to-end to mock the new factory
and the per-checkpoint sub-trees (28 tests, all passing). The four e2e tests
that used to spy on createEpochProver().finalizeEpoch are migrated to spy on
createTopTreeOrchestrator().orchestrator.prove.
…orting ProvingOrchestrator

Now that EpochProvingJob talks to CheckpointSubTreeOrchestrator and
TopTreeOrchestrator directly, the wrapper layer has no production users:

- Delete the EpochProver interface (stdlib/src/interfaces/epoch-prover.ts)
  and its re-export from interfaces/server.ts.
- Delete ServerEpochProver, the adapter that translated EpochProver calls
  onto ProvingOrchestrator + a broker facade.
- Drop createEpochProver from EpochProverManager and from ProverClient.
  ProverClient now exposes only the split factories.
- Drop ProvingOrchestrator from prover-client/orchestrator's package
  exports, and remove its `implements EpochProver` clause. The class file
  stays as the base for CheckpointSubTreeOrchestrator (which extends it)
  and as the single-class end-to-end driver used by orchestrator_*.test.ts;
  it is no longer reachable from outside the package.
- Switch test_context.ts to import ProvingOrchestrator via its relative
  module path (the orchestrator-internal test driver TestProvingOrchestrator
  still extends it).

All 301 prover-client tests, 87 prover-node tests, and 801 stdlib tests
still pass.
…ors and epoch jobs

CI on the previous commit caught the symptom — finalize timed out at ~30s
while waiting for sub-tree results. Root cause: the broker maintains a
single global `completedJobNotifications` queue that is drained by the
first caller of `getCompletedJobs([])`. When multiple
`BrokerCircuitProverFacade` instances poll the same broker, the first one
to poll consumes every notification — including notifications for jobs the
others care about. The losers only catch up via the periodic 30-second
snapshot sync, which is far longer than the proof deadline for short
epochs.

Commit 2a turned this into a fast-path bug by accidentally creating N+1
facades per epoch (one per sub-tree, one for the top-tree). The same race
also exists across concurrent epoch jobs, so the right fix is one shared
facade for the whole prover-client lifetime, not just one per job.

- `ProverClient` now owns a single `BrokerCircuitProverFacade`, started in
  `start()` and stopped in `stop()`.
- `createCheckpointSubTreeOrchestrator()` and `createTopTreeOrchestrator()`
  no longer take a facade argument — they wire the orchestrator to the
  shared facade.
- `EpochProvingJob` no longer creates or manages a facade.
- The facade's job map deletes entries on resolve/reject, so memory growth
  is bounded by concurrent in-flight work, not by lifetime jobs.
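The bounded job map can be sketched as follows (hypothetical shapes, not the facade's real internals): each tracked job removes its own entry as it settles, so the map's size tracks in-flight work rather than lifetime jobs.

```typescript
// Hypothetical sketch: self-cleaning job map for a long-lived shared facade.
class FacadeJobMap {
  private jobs = new Map<
    string,
    { resolve: (value: unknown) => void; reject: (err: Error) => void }
  >();

  track(id: string): Promise<unknown> {
    return new Promise((resolve, reject) => {
      this.jobs.set(id, {
        // Delete the entry as part of settling, so memory is bounded by
        // concurrent in-flight work.
        resolve: value => { this.jobs.delete(id); resolve(value); },
        reject: err => { this.jobs.delete(id); reject(err); },
      });
    });
  }

  settle(id: string, value: unknown) {
    this.jobs.get(id)?.resolve(value);
  }

  get size() { return this.jobs.size; }
}
```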

All 28 epoch-proving-job tests, 87 prover-node tests, and 301 prover-client
tests still pass.
Previously, `removeCheckpoint` was a no-op once `finalizationStarted` was
set — a late prune (e.g. an L1 reorg) couldn't be reflected in the proof
that had already been kicked off, so the only options were "ignore the
prune and submit a stale proof" or "fail the epoch". Both bad.

Now `removeCheckpoint` is allowed at any point until the job reaches a
terminal state. If the top tree is in flight when a removal lands, the
removal cancels it (`cancel({ abortJobs: true })`); the catch arm in
`finalizeAndProve` recognises `TopTreeCancelledError`, drops the cancelled
top tree, and the surrounding loop rebuilds `CheckpointTopTreeData[]` and
the blob batching challenges from the surviving sub-trees and tries again.

The only bound on retries is the job's existing deadline. No retry counter:
a pathological reorg loop fails the epoch via the deadline path it would
have taken anyway, with one less knob to tune.

If every checkpoint is removed mid-finalize, the next loop iteration sees
`survivors === 0` and throws — the catch arm transitions the job to
`failed`, no proof is published.
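The loop-and-retry shape can be sketched as below. This is a simplified, hypothetical rendering of `finalizeAndProve`'s control flow: a cancellation restarts the loop over the surviving sub-trees, and an empty survivor set fails the epoch.

```typescript
// Hypothetical sketch of the finalize retry loop.
class TopTreeCancelledError extends Error {}

function finalizeAndProve(
  getSurvivors: () => number[],
  prove: (survivors: number[]) => void, // throws TopTreeCancelledError when cancelled
): number[] {
  for (;;) {
    const survivors = getSurvivors();
    if (survivors.length === 0) {
      // All checkpoints removed mid-finalize: fail, publish nothing.
      throw new Error('all checkpoints removed');
    }
    try {
      prove(survivors);
      return survivors;
    } catch (err) {
      if (err instanceof TopTreeCancelledError) {
        continue; // rebuild from the surviving sub-trees and try again
      }
      throw err; // genuine proving failure
    }
  }
}
```

The only bound on the loop is an external deadline, matching the no-retry-counter design above.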

Three new tests in `reorg-after-finalize`:
- `removeCheckpoint` after finalize-start cancels the top tree and the
  loop restarts with the surviving set; second prove is given the smaller
  count; epoch completes.
- A middle-of-the-list checkpoint pruned mid-prove; submitted proof carries
  the surviving from/to range.
- All checkpoints removed mid-finalize → state transitions to `failed`,
  nothing is published.

90 prover-node tests pass; 301 prover-client tests still pass.
…rove

The reorg-during-proving e2e test (`epochs_optimistic_proving.parallel.test.ts`,
"handles a reorg arriving while proving is in progress") gates `topTree.prove`
via a test patch and fires the L1 reorg while the gate is held. The
prover-node receives the L2BlockStream prune events and calls
`removeCheckpoint`, which after commit 3 cancels the in-flight top tree.

But the patch had not yet released the gate, so `cancel()` runs *before*
`prove()`. The previous code only set `this.cancelled = true` and then, when
prove eventually ran, the per-checkpoint `.then` handlers all bailed on the
flag and the completion promise never resolved — prove hung forever.

Fix: check `this.cancelled` at the top of `prove()` and short-circuit with
`TopTreeCancelledError` immediately. Adds a unit test that constructs a
top-tree, cancels it, then calls prove — expecting the immediate rejection.
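The fix can be sketched as follows (hypothetical, pared-down class): without the up-front check, a `cancel()` landing before `prove()` left the completion promise unresolved forever because every per-checkpoint handler bailed on the flag.

```typescript
// Hypothetical sketch of the cancel-before-prove short-circuit.
class TopTreeCancelledError extends Error {}

class TopTree {
  private cancelled = false;

  cancel() {
    this.cancelled = true;
  }

  prove(): Promise<string> {
    // Short-circuit: reject immediately if cancellation already happened,
    // instead of hanging on handlers that never fire.
    if (this.cancelled) {
      return Promise.reject(new TopTreeCancelledError('cancelled before prove'));
    }
    return Promise.resolve('proof'); // stand-in for the real proving pipeline
  }
}
```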
Before commit 3 the prover-node could not handle a reorg that removed a
checkpoint after `finalizationStarted` — the in-flight proof referenced a
checkpoint that no longer existed on L1, so its submission was rejected.
The test simulated recovery by writing the new `proven` pointer directly
into the rollup's `stf` storage slot, releasing the gate, and asserting
that a *subsequent* epoch eventually proved.

With commit 3 the prover-node now cancels the in-flight top tree when a
prune lands and rebuilds with the surviving checkpoints. The test should
demonstrate that correct recovery, not the storage-cheat workaround.

Changes:
- After firing the L1 reorg, poll the in-flight job's tracked-checkpoint
  count until it drops. This is the deterministic signal that the prover-node
  observed the prune and called `removeCheckpoint`, which cancelled
  the in-flight top tree. (Without this we'd race the L2BlockStream poll
  and risk top tree #1 starting its real prove before cancellation lands.)
- Drop the storage cheat, the post-cheat block-production resume, and the
  wait-for-next-epoch sequence.
- Assert the *in-flight* epoch is proven up to `afterReorgCheckpoint` —
  the surviving range — directly on L1, no cheats needed.
- Make `ProverNode.epochJobs` `protected` and expose it on `TestProverNode`
  so the test can poll per-job tracked counts.
Until now each `CheckpointSubTreeOrchestrator` carried its own chonk-verifier
proof cache (on the sub-tree's internal `EpochProvingState`). When a
checkpoint was reorged out and a replacement landed in the same epoch,
every public tx in the replacement re-proved its chonk circuit, even though
the proof had already been computed for the original.

Introduces `EpochProvingContext`: a small per-epoch holder for the cache
that:

- Submits chonk-verifier broker jobs through its own AbortController list
  (not the sub-tree's). Sub-tree cancellation (e.g. `removeCheckpoint`
  with `abortJobs: true`) does **not** abort context-owned chonk jobs, so
  a replacement sub-tree can pick up the cached promise.
- Self-cleans cache entries on rejection so a future caller can re-enqueue.
- Exposes `stop()` to abort every in-flight chonk job at job teardown.
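The self-cleaning cache behaviour can be sketched like this. The shapes are hypothetical (the real context also manages abort controllers and broker wiring); the sketch shows only the dedup-plus-retry property: a cached promise is shared across callers, and a rejected entry deletes itself so a later caller can re-enqueue.

```typescript
// Hypothetical sketch of the per-epoch proof cache.
class ProofCache {
  private cache = new Map<string, Promise<string>>();

  getOrEnqueue(key: string, enqueue: () => Promise<string>): Promise<string> {
    let entry = this.cache.get(key);
    if (!entry) {
      entry = enqueue();
      this.cache.set(key, entry);
      // Self-clean on rejection so a failure is not cached forever.
      entry.catch(() => this.cache.delete(key));
    }
    return entry;
  }

  get size() {
    return this.cache.size;
  }
}
```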

Plumbing:

- New `EpochProverFactory.createEpochProvingContext()` returns a context
  wired to `ProverClient`'s shared broker facade. `EpochProvingJob`
  constructs one per epoch and passes it to every sub-tree it creates.
- `CheckpointSubTreeOrchestrator` accepts an optional context. When
  supplied, its overrides for `startChonkVerifierCircuits` and
  `getOrEnqueueChonkVerifier` route through `context.enqueue` /
  `context.getCached` instead of the inherited per-sub-tree path.
- The legacy `ProvingOrchestrator` (test-only) is unchanged: it continues
  to use `EpochProvingState.cachedChonkVerifierProofs`.

5 new unit tests on `EpochProvingContext` cover dedup, get-after-enqueue,
reject-then-retry, abort-on-stop, and post-stop enqueue. All 307
prover-client tests, 90 prover-node tests, and existing e2e build pass.
…ry call

`CheckpointSubTreeOrchestrator` now requires an `EpochProvingContext`
(no fallback to a private chonk cache) and starts its single epoch in
the constructor by reading the epoch number from the context. A new
static `start(...)` factory does the construction plus the single
internal `startNewCheckpoint(0, ...)` and stops the orchestrator
cleanly if the start fails.

`ProverClient.createCheckpointSubTreeOrchestrator` becomes async and
takes the per-checkpoint args, replacing the old three-step dance of
factory + `startNewEpoch` + `startNewCheckpoint` in `EpochProvingJob`.
`createEpochProvingContext` now takes an `epochNumber`, which is also
the per-call arg dropped from `EpochProvingContext.enqueue`.
… enumeration API

Restructure `EpochProvingJob` as an orchestrator over a registry of self-contained
jobs:

  - `CheckpointJob` — identified by `(checkpoint number, slot)` so a stale orphan
    and a re-org replacement at the same number coexist in the parent's map without
    colliding. Owns its own register-time data, sub-tree, blockProofs resolvers,
    abort controller, and tx-processing loop. `cancel()` is idempotent and
    fire-and-forget; `whenDone()` resolves once `provideTxs` and the cancel-driven
    teardown have unwound.

  - `TopTreeJob` — built from a snapshot of `CheckpointJob`s. `start()` runs
    `topTree.prove(...)`; `cancel()` rejects with `TopTreeCancelledError`.
    Hooks (`beforeTopTreeProve` / `afterTopTreeProve` / `topTreeProveOverride`)
    forward to it from `EpochProvingJobHooks`.

  - `EpochProvingJob` — slimmed from ~1020 lines to ~530. The job is now a thin
    driver over `Map<string, CheckpointJob>` and a single `topTreeJob`. The old
    `CheckpointStatus` (pending/tracked), `addCheckpointPromise` synchronisation,
    `accumulatedTxs` / `accumulatedL1ToL2Messages`, and inline restart-loop are
    gone.

`EpochProvingJob`'s public API now reflects intent rather than registry internals.
Three intent-level methods replace the tracked/pending list pair the prover-node
had to enumerate:

  - `removeCheckpointsAfter(threshold): number` — bulk remove for prune
  - `getCheckpointCount(): number` — total registered (live, uncancelled)
  - `cancelPendingCheckpoints(): void` — drop registered jobs that never got txs

`registerPendingCheckpoint` / `addCheckpoint` are renamed to `registerCheckpoint` /
`provideTxs` to reflect that registration carries all data the top tree needs and
txs arrive later. `removeCheckpoint` is now synchronous and idempotent — the
`(number, slot)` identity means multiple removes for the same number don't need
the old "await addCheckpointPromise" serialisation.

Test coverage: 89 prover-node tests pass (down from 90 — two "doesn't finalize
while gathering" tests merged into one that asserts the new early-start invariant).
…estrator infra

`ProvingOrchestrator` and `TopTreeOrchestrator` had duplicated copies of the same
broker-job submission envelope (~80 lines apiece): the `pendingProvingJobs`
controller list, the `SerialQueue` lifecycle, the `cancel`/`stop` plumbing, and
the `deferredProving<T>(state, request, callback)` wrapper that drops obsolete
results and routes errors to `state.reject(...)`.

Lift these to a new abstract `ProvingScheduler` base class. The minimal state
contract is `ProvingStateLike { verifyState(): boolean; reject(reason: string):
void }`, which both `EpochProvingState` / `CheckpointProvingState` /
`BlockProvingState` and `TopTreeProvingState` satisfy. The base owns:

  - `pendingProvingJobs`, `deferredJobQueue`, `getNumPendingProvingJobs`
  - `resetSchedulerState(abortJobs)` — drain + recreate queue, optionally abort
    in-flight jobs (the per-call abort flag covers both parent's
    `cancelJobsOnStop` config and top-tree's `{abortJobs}` arg)
  - `stop()` — standard "grab old queue, cancelInternal, await drain"
  - `deferredProving<S, T>(state, request, callback, isCancelled?)` — unified
    submit envelope. The `isCancelled` predicate covers top-tree's `cancelled`
    flag; the parent uses the default `() => false` and relies on `verifyState`.

Subclasses define `cancelInternal()` for their own cleanup (closing world-state
forks for the parent, propagating cancel into the proving state for top-tree).

Net code reduction: ~120 lines across the two orchestrators. The merge / padding
/ root rollup methods stay subclass-specific — they depend on state-class
methods that aren't unified here.
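The lifted envelope can be sketched as follows. This is a deliberately simplified, hypothetical rendering of `ProvingScheduler` (no `SerialQueue` or abort controllers): it shows the `ProvingStateLike` contract and the `deferredProving` behaviour of dropping obsolete results and routing errors to `state.reject(...)`.

```typescript
// Hypothetical sketch of the shared scheduling base.
interface ProvingStateLike {
  verifyState(): boolean;
  reject(reason: string): void;
}

abstract class ProvingScheduler {
  protected pendingProvingJobs = 0;

  getNumPendingProvingJobs() {
    return this.pendingProvingJobs;
  }

  protected async deferredProving<S extends ProvingStateLike, T>(
    state: S,
    request: () => Promise<T>,
    callback: (result: T) => void,
    isCancelled: () => boolean = () => false, // covers top-tree's cancelled flag
  ): Promise<void> {
    this.pendingProvingJobs++;
    try {
      const result = await request();
      // Drop results for obsolete or cancelled state instead of mutating it.
      if (state.verifyState() && !isCancelled()) {
        callback(result);
      }
    } catch (err) {
      state.reject(String(err));
    } finally {
      this.pendingProvingJobs--;
    }
  }
}
```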
…rovingOrchestrator

The earlier iteration of this feature handled re-org recovery by mutating an
in-flight `EpochProvingState`: marking individual checkpoints as `removed`,
notifying ready-waiters when block-merge trees completed, and gating the
checkpoint-root rollup behind a two-input guard. The current design replaces all
of that — `EpochProvingJob` cancels and tears down the affected
`CheckpointSubTreeOrchestrator` outright and the top tree rebuilds from the
surviving snapshot.

Nothing in production code (or in any non-deleted test) reaches the old paths;
the only consumer was `orchestrator_deferred_finalization.test.ts`, which
exercised the abandoned mechanism end-to-end. Its happy-path coverage is
duplicated by `orchestrator_workflow.test.ts` /
`orchestrator_single_checkpoint.test.ts` and friends.

Removed:
  - `EpochProvingState`: `removeCheckpoint`, `waitForAllCheckpointsReady`,
    `notifyCheckpointBlockLevelComplete`, `areAllCheckpointsBlockLevelReady`,
    `checkpointsReadyCallbacks`, `rejectCheckpointsReadyWaiters`, the `log`
    field, and the `rejectCheckpointsReadyWaiters(reason)` call in `reject()`.
  - `CheckpointProvingState`: `removed` field, `markRemoved`, `isRemoved`,
    `isBlockMergeTreeComplete`, the `!this.removed &&` clause in `verifyState`,
    and the `if (this.removed) return;` early-return in `reject`.
  - `ProvingOrchestrator`: `removeCheckpoint(checkpointIndex)` (with its
    DB-fork-close and chonk-cache cleanup loops), `waitForAllCheckpointsReady`,
    and the `isBlockMergeTreeComplete`/`notifyCheckpointBlockLevelComplete`
    block + `isEpochStructureFinalized` redundant guard inside
    `checkAndEnqueueCheckpointRootRollup`. The remaining `isReadyForCheckpointRoot`
    check naturally waits for `finalizeEpochStructure` because it gates on
    `previousOutHashHint` and `startBlobAccumulator`, both populated only there.
  - `orchestrator_deferred_finalization.test.ts` (entire file).

Kept (still used):
  - `setFinalBlobBatchingChallenges` and the `FinalBlobBatchingChallenges | undefined`
    shape on `CheckpointProvingState` — integration tests construct checkpoints
    before `finalizeEpochStructure`.
  - `getSubTreeOutputProofs`, `getLastArchiveSiblingPath` — used by
    `CheckpointSubTreeOrchestrator`.
  - `isEpochStructureFinalized`, `finalizeEpochStructure`, `isAcceptingCheckpoints`,
    `_totalNumCheckpoints`/`_finalBlobBatchingChallenges` on `EpochProvingState` —
    required by integration tests that defer finalize.

Test coverage: 89 prover-node + 262 prover-client tests pass (down from 307
because the deleted file held ~45 tests).
PhilWindle added a commit that referenced this pull request May 6, 2026
…pair

Introduces a sub-tree + top-tree orchestrator pair that decomposes the existing
single-class proving orchestrator along the natural state-coupling boundary —
per-checkpoint block-level work vs. epoch-level top-tree work — while leaving
every existing API on the legacy `EpochProver` / `ProvingOrchestrator` /
`EpochProvingState` path untouched. The prover-node and e2e tests build
unchanged; this PR is purely additive in surface area, with one structural
refactor on `ProvingOrchestrator` to share scheduling infrastructure with the
new `TopTreeOrchestrator`.

Split out from #22990 so it can land independently.

## What's new

- **`CheckpointSubTreeOrchestrator`** (`checkpoint-sub-tree-orchestrator.ts`):
  extends `ProvingOrchestrator`, single-checkpoint by construction. Drives
  chonk-verifier / base / merge / block-root / block-merge for one checkpoint
  and resolves a `SubTreeResult` instead of escalating to the checkpoint root —
  the parent's `checkAndEnqueueCheckpointRootRollup` is overridden to
  short-circuit. The constructor calls `super.startNewEpoch(epoch, 1, empty
  challenges)` to set up a single-checkpoint mini-epoch; the count and
  challenges are never read because the override prevents the parent's
  finalize / root path from running.

- **`TopTreeOrchestrator`** + **`TopTreeProvingState`**: self-contained driver
  from checkpoint-root through epoch-root rollup. Takes per-checkpoint
  block-proof promises and pipelines its hint chain against them. Cancellation
  surfaces as `TopTreeCancelledError` so callers can distinguish reorg-driven
  cancel from a genuine proving failure.

- **`EpochProvingContext`** (`epoch-proving-context.ts`): per-epoch shared
  cache for chonk-verifier proofs. Survives sub-tree cancellation so a tx that
  gets reorged out and re-appears in a replacement checkpoint reuses the
  cached proof.

- **`ProvingScheduler`** (`proving-scheduler.ts`): abstract base shared by
  `ProvingOrchestrator` and `TopTreeOrchestrator`. Owns the `SerialQueue`
  deferred-job lifecycle, the `pendingProvingJobs` controller list, and a
  unified `deferredProving<S, T>(state, request, callback, isCancelled?)`
  submit envelope. The minimal `ProvingStateLike` contract is just
  `verifyState()` + `reject(reason)`.

- **`EpochProverFactory` interface on `ProverClient`**: new factory methods
  `createEpochProvingContext(epochNumber)`,
  `createCheckpointSubTreeOrchestrator(...)`, and
  `createTopTreeOrchestrator()`. A single shared `BrokerCircuitProverFacade` is
  owned by `ProverClient` and shared across every orchestrator.

## What changes in existing code

- `ProvingOrchestrator` extends `ProvingScheduler`; the inline broker-job
  submit envelope and queue lifecycle are inherited from the base. `cancel()`
  delegates the queue-recreate + abort-jobs logic to
  `resetSchedulerState(this.cancelJobsOnStop)`. Three internal methods
  (`getOrEnqueueChonkVerifier`, `checkAndEnqueueBaseRollup`,
  `checkAndEnqueueCheckpointRootRollup`) become `protected` so the sub-tree
  can override them; `provingState` and `provingPromise` likewise become
  `protected` so the sub-tree can hook the parent's failure stream onto
  `subTreeResult`. No public API change on `ProvingOrchestrator`.
- `CheckpointProvingState`: gains two read-only accessors used by the
  sub-tree's checkpoint-root override — `getSubTreeOutputProofs()` and
  `getLastArchiveSiblingPath()`. No state changes.
- `ProverClient` keeps `createEpochProver()` exactly as before (each call
  spawns its own `BrokerCircuitProverFacade`); the new factory methods share
  a `getFacade()` set up in `start()` and torn down in `stop()`.

`EpochProver`, `EpochProverManager`, `ServerEpochProver`, `EpochProvingState`,
the integration tests in `orchestrator_*.test.ts`, `bb_prover_full_rollup.test.ts`,
and `stdlib/interfaces/*` are all unchanged from `merge-train/spartan` — the
prover-node and e2e tests continue to build against the existing `EpochProver`
API. Migrating the prover-node onto the new factories (and the deferred-finalize
flow that goes with optimistic proving) is the follow-up PR.

## Test plan

- [x] 261 prover-client tests pass (full `yarn workspace @aztec/prover-client test`).
- [x] `yarn build` clean against current merge-train/spartan (modulo the
  pre-existing `@aztec/sqlite3mc-wasm` issue inherited from baseline).
PhilWindle added a commit that referenced this pull request May 6, 2026
…pair

Introduces a sub-tree + top-tree orchestrator pair that decomposes the existing
single-class proving orchestrator along the natural state-coupling boundary —
per-checkpoint block-level work vs. epoch-level top-tree work — while leaving
every existing API on the legacy `EpochProver` / `ProvingOrchestrator` /
`EpochProvingState` path untouched. The prover-node and e2e tests build
unchanged; this PR is purely additive in surface area, with structural
refactors on `ProvingOrchestrator` to share scheduling and top-tree drivers
with the new `TopTreeOrchestrator`.

Split out from #22990 so it can land independently.

## What's new

- **`CheckpointSubTreeOrchestrator`** (`checkpoint-sub-tree-orchestrator.ts`):
  extends `ProvingOrchestrator`, single-checkpoint by construction. Drives
  chonk-verifier / base / merge / block-root / block-merge for one checkpoint
  and resolves a `SubTreeResult` instead of escalating to the checkpoint root —
  the parent's `checkAndEnqueueCheckpointRootRollup` is overridden to
  short-circuit. The constructor calls `super.startNewEpoch(epoch, 1, empty
  challenges)` to set up a single-checkpoint mini-epoch; the count and
  challenges are never read because the override prevents the parent's
  finalize / root path from running.

- **`TopTreeOrchestrator`** + **`TopTreeProvingState`**: self-contained driver
  from checkpoint-root through epoch-root rollup. Takes per-checkpoint
  block-proof promises and pipelines its hint chain against them. Cancellation
  surfaces as `TopTreeCancelledError` so callers can distinguish reorg-driven
  cancel from a genuine proving failure.

- **`EpochProvingContext`** (`epoch-proving-context.ts`): per-epoch shared
  cache for chonk-verifier proofs. Survives sub-tree cancellation so a tx that
  gets reorged out and re-appears in a replacement checkpoint reuses the
  cached proof.

- **`ProvingScheduler`** (`proving-scheduler.ts`): abstract base owning the
  `SerialQueue` deferred-job lifecycle, the `pendingProvingJobs` controller
  list, and a unified `deferredProving<S, T>(state, request, callback,
  isCancelled?)` submit envelope. The minimal `ProvingStateLike` contract is
  just `verifyState()` + `reject(reason)`.

- **`TopTreeProvingScheduler`** (`top-tree-proving-scheduler.ts`): extends
  `ProvingScheduler` and holds the checkpoint-merge, padding, and root-rollup
  drivers (plus tree-walking helpers) shared by both orchestrators. Wraps
  circuit calls via a `wrapCircuitCall` hook (orchestrator overrides for spans;
  top-tree leaves identity) and resolves via an `onRootRollupComplete` hook to
  bridge the two states' differing `resolve` signatures. The per-checkpoint
  root driver stays subclass-specific because input-building flows differ.

- **`EpochProverFactory` interface on `ProverClient`**: new factory methods
  `createEpochProvingContext(epochNumber)`,
  `createCheckpointSubTreeOrchestrator(...)`, and
  `createTopTreeOrchestrator()`. A single shared `BrokerCircuitProverFacade` is
  owned by `ProverClient` and shared across every orchestrator.

## What changes in existing code

- `ProvingOrchestrator` extends `TopTreeProvingScheduler`; the inline
  broker-job submit envelope, queue lifecycle, and the top-tree-section
  drivers are inherited. `cancel()` delegates the queue-recreate + abort-jobs
  logic to `resetSchedulerState(this.cancelJobsOnStop)`. Three internal
  methods (`getOrEnqueueChonkVerifier`, `checkAndEnqueueBaseRollup`,
  `checkAndEnqueueCheckpointRootRollup`) become `protected` so the sub-tree
  can override them; `provingState` and `provingPromise` likewise become
  `protected` so the sub-tree can hook the parent's failure stream onto
  `subTreeResult`. No public API change on `ProvingOrchestrator`.
- `CheckpointProvingState`: gains two read-only accessors used by the
  sub-tree's checkpoint-root override — `getSubTreeOutputProofs()` and
  `getLastArchiveSiblingPath()`. No state changes.
- `ProverClient` keeps `createEpochProver()` exactly as before (each call
  spawns its own `BrokerCircuitProverFacade`); the new factory methods share
  a `getFacade()` set up in `start()` and torn down in `stop()`.

`EpochProver`, `EpochProverManager`, `ServerEpochProver`, `EpochProvingState`,
the integration tests in `orchestrator_*.test.ts`, `bb_prover_full_rollup.test.ts`,
and `stdlib/interfaces/*` are all unchanged from `merge-train/spartan` — the
prover-node and e2e tests continue to build against the existing `EpochProver`
API. Migrating the prover-node onto the new factories (and the deferred-finalize
flow that goes with optimistic proving) is the follow-up PR.

## Test plan

- [x] 261 prover-client tests pass (full `yarn workspace @aztec/prover-client test`).
- [x] `yarn build` clean against current merge-train/spartan (modulo the
  pre-existing `@aztec/sqlite3mc-wasm` issue inherited from baseline).
PhilWindle added 2 commits May 6, 2026 16:55
PR #22933 (and earlier #22809) reshaped L2BlockSource: getBlockHeader,
getCheckpointsForEpoch, and the positional getCheckpoints(from, limit)
were removed. L2TipsMemoryStore also gained a required initialBlockHash
constructor argument.

- getBlockHeader(n) -> (await getBlockData({ number: n }))?.header
- getCheckpointsForEpoch(epoch) -> getCheckpoints({ epoch }), with
  field access moving from .number/.blocks to .checkpoint.number/.blocks
- startProof folds the two-call pattern (checkpoints + separate
  attestations fetch) into one getCheckpoints({ epoch }) call since
  PublishedCheckpoint already carries attestations per entry
- L2TipsMemoryStore initialised in the constructor body with
  l2BlockSource.getGenesisBlockHash()
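The call-site migration above can be sketched as follows. The interface shapes here are simplified stand-ins for illustration — the real `L2BlockSource` and `PublishedCheckpoint` types live in the aztec-packages stdlib and carry many more fields.

```typescript
// Hypothetical, simplified shapes — real types have many more fields.
interface BlockData {
  header: { blockNumber: number };
}
interface PublishedCheckpoint {
  checkpoint: { number: number; blocks: number[] };
  attestations: string[];
}
interface L2BlockSource {
  getBlockData(opts: { number: number }): Promise<BlockData | undefined>;
  getCheckpoints(opts: { epoch: number }): Promise<PublishedCheckpoint[]>;
}

// Old: await source.getBlockHeader(n)
// New: fetch the block data and project out the header.
async function getHeader(source: L2BlockSource, n: number) {
  return (await source.getBlockData({ number: n }))?.header;
}

// Old: getCheckpointsForEpoch(epoch) plus a separate attestations fetch.
// New: one getCheckpoints({ epoch }) call; field access moves from
// .number/.blocks to .checkpoint.number/.blocks, and attestations come
// along per entry.
async function getEpochCheckpoints(source: L2BlockSource, epoch: number) {
  const published = await source.getCheckpoints({ epoch });
  return published.map(p => ({
    number: p.checkpoint.number,
    blocks: p.checkpoint.blocks,
    attestations: p.attestations,
  }));
}
```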

Test updates mirror the production migration; they also restore the
`beforeEach` `getBlockData` mock to return a header for any block number
(the merge resolution had narrowed it to a single block, breaking the
checkpoint-driven-flow tests).
PhilWindle added a commit that referenced this pull request May 6, 2026
…pair

Introduces a sub-tree + top-tree orchestrator pair that decomposes the existing
single-class proving orchestrator along the natural state-coupling boundary —
per-checkpoint block-level work vs. epoch-level top-tree work — while leaving
every existing API on the legacy `EpochProver` / `ProvingOrchestrator` /
`EpochProvingState` path untouched. The prover-node and e2e tests build
unchanged; this PR is purely additive in surface area, with structural
refactors on `ProvingOrchestrator` to share scheduling and top-tree drivers
with the new `TopTreeOrchestrator`.

Split out from #22990 so it can land independently.

## What's new

- **`CheckpointSubTreeOrchestrator`** (`checkpoint-sub-tree-orchestrator.ts`):
  extends `ProvingOrchestrator`, single-checkpoint by construction. Drives
  chonk-verifier / base / merge / block-root / block-merge for one checkpoint
  and resolves a `SubTreeResult` instead of escalating to the checkpoint root —
  the parent's `checkAndEnqueueCheckpointRootRollup` is overridden to
  short-circuit. The constructor calls `super.startNewEpoch(epoch, 1, empty
  challenges)` to set up a single-checkpoint mini-epoch; the count and
  challenges are never read because the override prevents the parent's
  finalize / root path from running.
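  The short-circuit pattern can be sketched as below. All names here are stand-ins for illustration — the real classes carry far richer proving state — but the shape is the same: the sub-tree reuses the parent's block-level pipeline and replaces the escalation point with a resolution.

  ```typescript
  type SubTreeResult = { checkpointIndex: number; blockProofs: string[] };

  class Orchestrator {
    // Parent behaviour: escalate toward the checkpoint-root rollup.
    protected checkAndEnqueueCheckpointRootRollup(_checkpointIndex: number): void {
      throw new Error('parent would enqueue the checkpoint-root rollup here');
    }
  }

  class CheckpointSubTree extends Orchestrator {
    private resolveResult!: (r: SubTreeResult) => void;
    // Callers await this instead of an epoch-level proof.
    readonly subTreeResult = new Promise<SubTreeResult>(res => (this.resolveResult = res));

    // Override short-circuits: resolve the sub-tree result instead of escalating.
    protected override checkAndEnqueueCheckpointRootRollup(checkpointIndex: number): void {
      this.resolveResult({ checkpointIndex, blockProofs: ['block-merge-proof'] });
    }

    // Stand-in for the point where block-level proving completes.
    simulateBlockMergeComplete(): void {
      this.checkAndEnqueueCheckpointRootRollup(0);
    }
  }
  ```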

- **`TopTreeOrchestrator`** + **`TopTreeProvingState`**: self-contained driver
  from checkpoint-root through epoch-root rollup. Takes per-checkpoint
  block-proof promises and pipelines its hint chain against them. Cancellation
  surfaces as `TopTreeCancelledError` so callers can distinguish reorg-driven
  cancel from a genuine proving failure.
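  The cancellation-signalling idea reduces to a dedicated error class, so callers can branch with one `instanceof` check. A minimal sketch (function names and messages are stand-ins, not the real API):

  ```typescript
  class TopTreeCancelledError extends Error {
    constructor() {
      super('top-tree proving cancelled');
      this.name = 'TopTreeCancelledError';
    }
  }

  // Stand-in for the top-tree driver's terminal promise.
  async function proveTopTree(cancelled: boolean): Promise<string> {
    if (cancelled) throw new TopTreeCancelledError();
    return 'epoch-root-proof';
  }

  async function handle(cancelled: boolean): Promise<string> {
    try {
      return await proveTopTree(cancelled);
    } catch (err) {
      // Reorg-driven cancel: expected, handled quietly.
      if (err instanceof TopTreeCancelledError) return 'cancelled: retry after reorg settles';
      throw err; // Genuine proving failure: surface it.
    }
  }
  ```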

- **`EpochProvingContext`** (`epoch-proving-context.ts`): per-epoch shared
  cache for chonk-verifier proofs. Survives sub-tree cancellation so a tx that
  gets reorged out and re-appears in a replacement checkpoint reuses the
  cached proof.
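  The reuse property follows from the cache's ownership: it lives on the epoch-level context, not on any sub-tree, so cancelling a checkpoint doesn't discard proofs for txs that re-appear. A minimal sketch, assuming a tx-hash key and a string stand-in for the proof:

  ```typescript
  class EpochProvingContext {
    // Caching the promise (not the settled value) also deduplicates
    // concurrent requests for the same tx.
    private readonly proofs = new Map<string, Promise<string>>();

    getOrProve(txHash: string, prove: () => Promise<string>): Promise<string> {
      let proof = this.proofs.get(txHash);
      if (!proof) {
        proof = prove();
        this.proofs.set(txHash, proof);
      }
      return proof;
    }
  }
  ```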

- **`ProvingScheduler`** (`proving-scheduler.ts`): abstract base owning the
  `SerialQueue` deferred-job lifecycle, the `pendingProvingJobs` controller
  list, and a unified `deferredProving<S, T>(state, request, callback,
  isCancelled?)` submit envelope. The minimal `ProvingStateLike` contract is
  just `verifyState()` + `reject(reason)`.
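  A sketch of the submit envelope against that minimal contract (simplified: no queue, no controller list, synchronous skip): check liveness before and after the async proving call, and route failures into the state instead of letting them escape.

  ```typescript
  interface ProvingStateLike {
    verifyState(): boolean;
    reject(reason: string): void;
  }

  async function deferredProving<S extends ProvingStateLike, T>(
    state: S,
    request: () => Promise<T>,
    callback: (result: T) => void,
  ): Promise<void> {
    if (!state.verifyState()) return; // state already failed/cancelled — skip
    try {
      const result = await request();
      // Re-check: the state may have been cancelled while the job ran.
      if (state.verifyState()) callback(result);
    } catch (err) {
      state.reject(String(err));
    }
  }
  ```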

- **`TopTreeProvingScheduler`** (`top-tree-proving-scheduler.ts`): extends
  `ProvingScheduler` and holds the checkpoint-merge, padding, and root-rollup
  drivers (plus tree-walking helpers) shared by both orchestrators. Wraps
  circuit calls via a `wrapCircuitCall` hook (orchestrator overrides for spans;
  top-tree leaves identity) and resolves via an `onRootRollupComplete` hook to
  bridge the two states' differing `resolve` signatures. The per-checkpoint
  root driver stays subclass-specific because input-building flows differ.

- **`EpochProverFactory` interface on `ProverClient`**: new factory methods
  `createEpochProvingContext(epochNumber)`,
  `createCheckpointSubTreeOrchestrator(...)`, and
  `createTopTreeOrchestrator()`. A single shared `BrokerCircuitProverFacade` is
  owned by `ProverClient` and shared across every orchestrator.
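  The shared-facade lifecycle can be sketched as below. Class and method bodies are stand-ins — the real facade wraps the proving broker — but the ownership shape matches: one facade created in `start()`, torn down in `stop()`, handed to every orchestrator the factories produce.

  ```typescript
  class Facade {
    stopped = false;
    stop(): void { this.stopped = true; }
  }

  class ProverClient {
    private facade?: Facade;

    start(): void { this.facade = new Facade(); }
    stop(): void { this.facade?.stop(); this.facade = undefined; }

    private getFacade(): Facade {
      if (!this.facade) throw new Error('ProverClient not started');
      return this.facade;
    }

    // Both factories hand out the same instance instead of spawning their own.
    createTopTreeOrchestrator() { return { facade: this.getFacade() }; }
    createCheckpointSubTreeOrchestrator() { return { facade: this.getFacade() }; }
  }
  ```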

## What changes in existing code

- `ProvingOrchestrator` extends `TopTreeProvingScheduler`; the inline
  broker-job submit envelope, queue lifecycle, and the top-tree-section
  drivers are inherited. `cancel()` delegates the queue-recreate + abort-jobs
  logic to `resetSchedulerState(this.cancelJobsOnStop)`. Three internal
  methods (`getOrEnqueueChonkVerifier`, `checkAndEnqueueBaseRollup`,
  `checkAndEnqueueCheckpointRootRollup`) become `protected` so the sub-tree
  can override them; `provingState` and `provingPromise` likewise become
  `protected` so the sub-tree can hook the parent's failure stream onto
  `subTreeResult`. No public API change on `ProvingOrchestrator`.
- `CheckpointProvingState`: gains two read-only accessors used by the
  sub-tree's checkpoint-root override — `getSubTreeOutputProofs()` and
  `getLastArchiveSiblingPath()`. No state changes.
- `ProverClient` keeps `createEpochProver()` exactly as before (each call
  spawns its own `BrokerCircuitProverFacade`); the new factory methods share
  a `getFacade()` set up in `start()` and torn down in `stop()`.

`EpochProver`, `EpochProverManager`, `ServerEpochProver`, `EpochProvingState`,
the integration tests in `orchestrator_*.test.ts`, `bb_prover_full_rollup.test.ts`,
and `stdlib/interfaces/*` are all unchanged from `merge-train/spartan` — the
prover-node and e2e tests continue to build against the existing `EpochProver`
API. Migrating the prover-node onto the new factories (and the deferred-finalize
flow that goes with optimistic proving) is the follow-up PR.

## Test plan

- [x] 261 prover-client tests pass (full `yarn workspace @aztec/prover-client test`).
- [x] `yarn build` clean against current merge-train/spartan (modulo the
  pre-existing `@aztec/sqlite3mc-wasm` issue inherited from baseline).
PhilWindle added a commit that referenced this pull request May 7, 2026
## Summary

- Adds a new spartan environment file
(`environments/next-net-clone.env`) for cloning the next-net deployment
configuration.
- Adds the corresponding `!environments/next-net-clone.env` allow-line
to `spartan/.gitignore` so the file isn't ignored.

## Context

Split out from the broader optimistic-proving work (#22990) so it can
land independently. Pure config; no code changes.