
chore: backport merge-train/spartan PRs to v5-next#23846

Merged
spalladino merged 11 commits into v5-next from spl/backport-to-v5-next-1
Jun 4, 2026

@spalladino
Contributor

Motivation

v5-next was cut from next at cbc99df (Jun 1), so PRs merged to merge-train/spartan after the cut never flowed into it. This backports all of them (authored by @spalladino and @fcarreiro) to keep v5-next current with the spartan train.

Approach

Each PR is cherry-picked from its squashed merge commit on merge-train/spartan, in merge order, preserving the original commit message and PR number — one commit per backported PR. All 11 applied cleanly with no conflicts; patches are identical to the originals (verified via git patch-id), and bootstrap.sh build yarn-project passes on the result. Labeled ci-no-squash to preserve the per-PR commits.

Backported PRs

Note #23660, #23778, and #23786 are breaking changes (node RPC + tx-effect db format, and p2p wire format respectively), as they were on next.

fcarreiro and others added 11 commits June 3, 2026 21:00
Use `createHash`, which is already used in other parts of the code.
It needs no awaits, which also fixes a potential race condition when
`getOrValidate` is called concurrently.

It even seems to be slightly faster, since no microtasks are scheduled.

| Metric | Before (subtle) | After (createHash) | Δ (avg) |
|--------|-----------------|---------------------|---------|
| **PRV-SHA256** | 3.39 ms | 3.13 ms | ~8% faster |
| **PUB-SHA256** | 5.27 ms | 5.00 ms | ~5% faster |
| PRV getTxHash | 9.39 ms | 8.24 ms | (unchanged path; run noise) |
| PUB getTxHash | 18.53 ms | 17.55 ms | (unchanged path; run noise) |
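
The move to a synchronous hash can be sketched as follows. This is a minimal illustration, not the code from the PR: `HashCache` and `getOrCompute` are hypothetical stand-ins for whatever sits behind `getOrValidate`, and only `createHash` from Node's `node:crypto` module is a real API here.

```typescript
import { createHash } from 'node:crypto';

// Synchronous SHA-256: returns the digest directly, with no promise involved.
function sha256(data: Uint8Array): Buffer {
  return createHash('sha256').update(data).digest();
}

// Hypothetical cache. Because the hash is computed synchronously, there is no
// await between the cache miss and the cache write, so two callers cannot
// both observe a miss for the same key.
class HashCache {
  private readonly cache = new Map<string, Buffer>();
  public computeCount = 0;

  getOrCompute(key: string, data: Uint8Array): Buffer {
    let hash = this.cache.get(key);
    if (hash === undefined) {
      this.computeCount++;
      hash = sha256(data);
      this.cache.set(key, hash);
    }
    return hash;
  }
}
```

With an awaited `subtle.digest` in the middle, a second caller could enter during the await and recompute the same hash; the synchronous version has no such window.
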
Attempt at deflaking slashing tests by keeping some warmup slots. Codex
analysis below.

## Cause

The duplicate proposal slash test was timing-fragile after proposer
pipelining. In the failed CI run, the malicious shared-key validators
were selected for the first slot of the target epoch immediately after
the test started sequencers and warped time forward.

Because pipelining builds a proposal one slot ahead, selecting the first
slot meant the malicious nodes had to start building at the exact warp
boundary. They started late in the build sub-slot, serialized the
duplicate proposals after the receiver nodes had already advanced to the
next slot, and the receivers rejected the gossip as late (`invalid slot
number`) before duplicate-proposal detection could run.

Failed run:
[`36837b7edc543e70`](https://ci.aztec-labs.com/36837b7edc543e70)

## Analysis

The failed run never produced a `DUPLICATE_PROPOSAL` offense. It only
produced a `DUPLICATE_ATTESTATION` offense, then timed out waiting for
the expected proposal offense.

Timeline from the failed run:

- malicious validators were selected for slot 10
- both started building late in sub-slot 3
- both broadcast checkpoint proposals for slot 10
- receivers had already advanced to slot 11 and rejected slot 10
proposals as stale
- the generic offense wait was satisfied by a duplicate attestation
- the final `DUPLICATE_PROPOSAL` assertion timed out

A passing run selected a later proposer slot instead. That gave the
sequencers a full warmup slot after the warp, both malicious nodes built
early enough, standalone duplicate proposals were broadcast in time, and
`DUPLICATE_PROPOSAL` was detected.

## Fix Rationale

Update `advanceToEpochBeforeProposer` to skip the first `warmupSlots`
slots of the target epoch and return the concrete `targetSlot`. The
duplicate proposal and duplicate attestation tests still warp to one
slot before the target epoch, but now the selected malicious proposer
slot is at least one slot into that epoch.

That gives freshly-started sequencers one full slot of wall-clock warmup
before the pipelined build for the malicious slot begins, avoiding the
startup/warp boundary race where duplicate proposals are emitted too
late and rejected before slash detection.
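
The slot selection described above can be sketched as a small pure function. This is a hedged illustration only: `pickTargetSlot`, its parameters, and the string-typed proposer addresses are assumptions, not the actual signature of `advanceToEpochBeforeProposer`.

```typescript
// Hypothetical helper: returns the first slot of the target epoch that is at
// least `warmupSlots` into the epoch and whose proposer is malicious.
function pickTargetSlot(
  epochStartSlot: number,
  proposers: string[], // proposers[i] is the proposer of slot epochStartSlot + i
  malicious: Set<string>,
  warmupSlots: number,
): number | undefined {
  const offset = proposers.findIndex((p, i) => i >= warmupSlots && malicious.has(p));
  // undefined means no suitable slot exists in this epoch
  return offset === -1 ? undefined : epochStartSlot + offset;
}
```

With `warmupSlots = 1`, an epoch whose first proposer is malicious is no longer selected at its first slot, which is the boundary case that failed in CI.
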
## Motivation

The node exposed two overlapping transaction-lookup methods —
`getTxReceipt` (lifecycle/status) and `getTxEffect` (mined side effects)
— forcing callers to stitch results together and duplicating fields
(txHash, fee, block hash/number, execution result) for mined txs. This
resolves the standing `REFACTOR` note on `TxReceipt` by making
`getTxReceipt` the single lookup API.

**Breaking change** to the public node RPC, with no wire back-compat.

## Approach

`TxReceipt` becomes a discriminated union over the lifecycle —
`PendingTxReceipt | DroppedTxReceipt | MinedTxReceipt` over an abstract
`TxReceiptBase` — with `isMined()`/`isPending()`/`isDropped()` guards
for narrowing while bare field reads keep compiling. `getTxReceipt`
gains `GetTxReceiptOptions` to opt into attaching the full `TxEffect`,
the pending `Tx`, and its proof. Receipt assembly moves from the block
store to the node (deriving mined status from the cached `getL2Tips`),
`getSettledTxReceipt` is removed from the archiver/`L2BlockSource`, and
the public `getTxEffect` is deprecated.

**Breaking change:** this also adds `slotNumber` to the indexed tx
effect, which changes the db format.

## Changes

- **stdlib**: `TxReceipt` variant classes + union + `TxReceiptSchema`,
`GetTxReceiptOptions`, `Tx.withoutProof`; new `getTxReceipt`
interface/RPC signature; removed `getSettledTxReceipt` from
`L2BlockSource`/archiver; `l2_to_l1_membership` collapsed to a single
call.
- **aztec-node**: node assembles `MinedTxReceipt` from `getTxEffect` +
`getL2Tips` (+ epoch); deprecated the public `getTxEffect`.
- **archiver**: removed `getSettledTxReceipt` impl/wrapper/mock; the
block store returns raw `IndexedTxEffect`s.
- **pxe / cli / cli-wallet / wallet-sdk / bot**: migrated callers off
the deprecated `getTxEffect` to `getTxReceipt(h, { includeTxEffect: true
})`; the PXE oracle and message-context paths reconstruct an
`IndexedTxEffect` so downstream note/event services are unchanged.
- **p2p**: `tx_archive` reuses `Tx.withoutProof`.
- **tests**: per-variant construction + union round-trip coverage,
node-level status-derivation tests across the tip boundaries, and
updated RPC round-trip tests.
- **docs**: migration-notes entry for the breaking change +
how-to-send-transaction guide update.

Note: the auto-generated `typescript-api` and `node-api-reference` docs
are intentionally not regenerated here — the former is bulk-regenerated
by release tooling (regenerating now would sweep in unrelated drift
since the v4.3.0 snapshot) and the latter's generator is currently
incompatible with the repo's Zod 4 schemas (a pre-existing issue
affecting all methods).
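
A reduced sketch of the receipt union and its narrowing guards is shown below. The field sets, the `status` discriminant, and the `summarize` helper are simplifying assumptions for illustration; the real classes in stdlib carry more fields and may discriminate differently.

```typescript
// Hypothetical, simplified shapes for illustration only.
abstract class TxReceiptBase {
  abstract readonly status: 'pending' | 'dropped' | 'mined';
  constructor(public readonly txHash: string) {}

  isPending(): this is PendingTxReceipt {
    return this.status === 'pending';
  }
  isDropped(): this is DroppedTxReceipt {
    return this.status === 'dropped';
  }
  isMined(): this is MinedTxReceipt {
    return this.status === 'mined';
  }
}

class PendingTxReceipt extends TxReceiptBase {
  readonly status = 'pending' as const;
}

class DroppedTxReceipt extends TxReceiptBase {
  readonly status = 'dropped' as const;
}

class MinedTxReceipt extends TxReceiptBase {
  readonly status = 'mined' as const;
  constructor(
    txHash: string,
    public readonly blockNumber: number,
  ) {
    super(txHash);
  }
}

type TxReceipt = PendingTxReceipt | DroppedTxReceipt | MinedTxReceipt;

// Mined-only fields are reachable only after narrowing with a guard.
function summarize(receipt: TxReceipt): string {
  if (receipt.isMined()) {
    return `mined in block ${receipt.blockNumber}`;
  }
  return receipt.isPending() ? 'pending' : 'dropped';
}
```

Callers that previously combined `getTxReceipt` and `getTxEffect` would instead branch on the guards of a single receipt.
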
## Motivation

Proposer pipelining (the proposer builds for `slot + 1` while in `slot`,
defers checkpoint finalization to the next slot, and builds on the
locally-gossiped proposed parent checkpoint) was gated behind
`SEQ_ENABLE_PROPOSER_PIPELINING`. Production already runs with it on and
every non-trivial e2e suite opts in, so the dual on/off code path was
dead weight — a duplicated timing model and `if (isPipelining)` branches
scattered across the sequencer, validator, and p2p stack. This makes
pipelining the only behavior of the production sequencer and removes the
toggle.

## Approach

Removed the `enableProposerPipelining` config /
`SEQ_ENABLE_PROPOSER_PIPELINING` env var everywhere and dropped every
`isProposerPipeliningEnabled()` / `pipeliningOffset()` check, collapsing
each to the pipelining branch (the `EpochCache` now always applies
`PROPOSER_PIPELINING_SLOT_OFFSET`). The checkpoint timing model was
consolidated to a single pipelined model. The test-only
`AutomineSequencer` is preserved — it is selected by the separate
`useAutomineSequencer` flag, publishes synchronously in-slot, and is the
one remaining caller of the non-pipelined
`getPreviousCheckpointOutHashes` branch. Checks for whether a proposed
checkpoint exists (`hasProposedCheckpoint` / `proposedCheckpointData`)
are kept.

## Changes

- **stdlib**: deleted `pipelining-config.ts`; consolidated `timetable`
to a single `CheckpointTimingModel` (removed
`StandardCheckpointTimingModel` and the `pipelining` option from
`createCheckpointTimingModel` / `calculateMaxBlocksPerSlot`); kept the
`pipeliningEnabled` param on `getPreviousCheckpointOutHashes` for
automine, with new dedicated unit coverage.
- **foundation / archiver / sequencer-client / epoch-cache (config)**:
removed the env-var registry entry and the `PipelineConfig` merges.
- **epoch-cache**: removed `isProposerPipeliningEnabled()` /
`pipeliningOffset()`; `getTargetSlot` /
`getTargetEpochAndSlotInNextL1Slot` / `getTargetAndNextSlot` always
apply the offset.
- **sequencer-client**: collapsed the sequencer /
checkpoint-proposal-job / publisher branches to the pipelining path;
`SequencerTimetable` no longer takes a `pipelining` flag;
`AutomineSequencer` untouched (boundary comment added).
- **validator-client / p2p**: collapsed proposal-handler, p2p-client,
and clock-tolerance branches; deleted the orphaned
`waitForBlockSourceSync` and the now-dead `block_source_not_synced`
reason.
- **p2p gossipsub**: `maxBlocksPerSlot` now uses the pipelined timing
model (a higher value), consistent with the always-pipelining sequencer.
- **end-to-end (tests)**: dropped the flag from `PIPELINING_SETUP_OPTS`
/ `AUTOMINE_E2E_OPTS` and ~40 test sites; `setup.ts` allows empty
checkpoints unconditionally.
- **spartan / docs / aztec-up / docker-compose**: removed the env var
from infra, and added a migration note.
- Also includes the benchmark pipelining-setup migration cherry-picked
from #23647.

Note: removing `SEQ_ENABLE_PROPOSER_PIPELINING` is a breaking change for
node operators; see the migration note. Labeled `ci-no-fail-fast` to
survey the full suite.
## Motivation

Building an L2-to-L1 message membership witness is currently a
client-side responsibility: callers hand
`computeL2ToL1MembershipWitness` an `OutboxRootsReader` plus a node, and
the helper reads the L1 Outbox and rebuilds the full four-level message
tree on every call, which involves sending all messages for an entire
epoch to the client. This adds a node JSON-RPC method that does the work
centrally, behind a cache.

As of this PR, the cache only covers reads of the Outbox, and only for a
single L1 slot. The intent is to eventually cache the individual
L2-to-L1 trees so we don't have to recompute them on every request, and
to permanently cache data for an epoch once it's finalized.

Fixes A-653

## Approach

A new `OutboxTreesResolver` in the archiver resolves witness requests.
Roots are fetched lazily on request and re-fetched only when the node's
synced L1 block has advanced. The four tree levels are rebuilt per
request by delegating to the existing, unchanged
`computeL2ToL1MembershipWitness` helper with the cached roots array, so
there is no second cache to keep consistent. The resolver is exposed
through the archiver (the node's block source) and surfaced as
`AztecNode.getL2ToL1MembershipWitness`.

## Changes

- **archiver**: New `OutboxTreesResolver` (lazy roots cache,
single-flight de-duplication, witness assembly).
- **stdlib**: Add `getL2ToL1MembershipWitness` to the `AztecNode` /
`L2BlockSource` / archiver interfaces and schemas; export
`L2ToL1MembershipWitnessSchema`. The `computeL2ToL1MembershipWitness`
helper is unchanged.
- **aztec-node**: `AztecNodeService.getL2ToL1MembershipWitness`
passthrough to the block source.
- **ethereum**: `OutboxContract.getRoots` gains an optional `{
blockNumber }` read option so reads can be pinned to the node's synced
L1 block.
- **end-to-end (tests)**: Migrate `computeL2ToL1MembershipWitness` call
sites to the new RPC, wrapped in `retryUntil` for the cache's eventual
consistency. The synthetic-roots case in
`epochs_partial_proof_multi_root` keeps using the helper directly.
- **archiver (tests)**: New `outbox_trees_resolver.test.ts` covering the
lazy cache (refresh-on-advance, seal/finalize permanence, not-synced
handling), single-flight, witness correctness across partial-proof
depths and block-level compression, and transient-vs-genuine root
mismatch handling.
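
The lazy, single-flight roots cache can be sketched roughly as below. `RootsCache`, its constructor callbacks, and the string-typed roots are hypothetical; the actual `OutboxTreesResolver` also handles witness assembly, sealing, and not-synced cases, which this sketch omits.

```typescript
type Roots = string[];

// Hypothetical cache: keeps the roots for the currently synced L1 block and
// shares one in-flight fetch among concurrent callers (single-flight).
class RootsCache {
  private cachedAtBlock: number | undefined;
  private inFlight: Promise<Roots> | undefined;
  public fetchCount = 0;

  constructor(
    private readonly getSyncedL1Block: () => number,
    private readonly fetchRoots: (blockNumber: number) => Promise<Roots>,
  ) {}

  getRoots(): Promise<Roots> {
    const block = this.getSyncedL1Block();
    if (this.inFlight !== undefined && this.cachedAtBlock === block) {
      // Cache hit, or a fetch for this block is already in progress.
      return this.inFlight;
    }
    this.cachedAtBlock = block;
    this.fetchCount++;
    const pending: Promise<Roots> = this.fetchRoots(block).catch(err => {
      // Do not cache failures: let the next caller retry.
      if (this.inFlight === pending) {
        this.inFlight = undefined;
        this.cachedAtBlock = undefined;
      }
      throw err;
    });
    this.inFlight = pending;
    return pending;
  }
}
```

Caching the promise rather than the resolved value is what gives de-duplication for free: callers that arrive while the fetch is running receive the same promise.
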
## Summary

`BLOCK_TXS` request/response validation had a bug that caused us to
**discard perfectly good transactions**.

When a peer doesn't have the block (proposal pruned, or never received)
but the request carried the full tx hashes, the responder
(`reqRespBlockTxsHandler`) still matches those hashes against its own tx
pool and ships whatever it finds — it just can't produce an availability
bitvector for a block it doesn't know about. This is a legitimate "I
don't have the block, but here are the txs you asked for by hash"
response, not misbehaviour.

Previously this case was signalled by setting `archiveRoot = Fr.ZERO` on
the response, and `validateRequestedBlockTxsConsistency` treated any
response that didn't echo the requested archive root (including the zero
case) as a hard failure: it returned `false`, which routed the response
through the `INTERNAL_ERROR` path and discarded the returned txs
entirely. The intended behaviour is the opposite — we want to **use**
the txs the peer returned and merely mark the peer as "dumb" (it can't
serve index-based smart requests), without penalising it.

## Changes

**Drop `archiveRoot` from `BlockTxsResponse`.**
- The archive root on the response only ever served as an out-of-band "I
have / don't have the block" flag (and a redundant echo of the request).
- Checking that the response echoes the request's archive root doesn't
make sense. A cheating peer can always return the same archive root as
the request and still malform the rest of the response.
- It is replaced by a `peerHasBlock()` helper that derives the same
signal from the availability bitvector: an empty bitvector (length 0)
means the peer doesn't have the block. The responder no longer
special-cases the archive root.

**Rework `validateRequestedBlockTxsConsistency`.** Validation now:
- rejects + penalises (mid) duplicate txs in the response;
- resolves the block tx hashes from the proposal or the archiver — if
neither is available we can't verify membership, so we reject without
penalising (local-state gap, not a peer fault);
- rejects + penalises (low) any returned tx that is neither part of the
block nor one we explicitly requested by hash — i.e. the returned set
must be a subset of `block tx hashes ∪ request.txHashes` (a tx requested
by hash may legitimately not belong to the block being validated);
- accepts (returns `true`) when the peer signals it lacks the block
(`!peerHasBlock()`) — the returned txs are still valid and usable, which
is the core of the fix;
- rejects + penalises (mid) a bitvector whose length disagrees with the
block size;
- rejects + penalises (low) a peer that advertises a requested tx via
its bitvector but withholds it from the response.

The previous order / strictly-increasing and `maxReturnable` checks are
removed; membership plus the advertise-vs-deliver check cover the cases
that matter.
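
The order of checks listed above can be sketched as a pure function. Everything here is an assumption made for illustration: the `Verdict` type, the string-typed hashes, the boolean-array bitvector, and the idea that "requested" txs for the withholding check are identified by index. It is not the signature of `validateRequestedBlockTxsConsistency`.

```typescript
type Verdict = { ok: true } | { ok: false; penalty: 'none' | 'low' | 'mid' };

interface ResponseSketch {
  txHashes: string[]; // hashes of the txs the peer returned
  bitVector: boolean[]; // availability per block tx index; empty means "no block"
}

const peerHasBlock = (r: ResponseSketch): boolean => r.bitVector.length > 0;

function validateSketch(
  requestedHashes: string[],
  requestedIndices: number[],
  blockTxHashes: string[] | undefined, // undefined: neither proposal nor archiver has it
  response: ResponseSketch,
): Verdict {
  const returned = new Set(response.txHashes);
  if (returned.size !== response.txHashes.length) {
    return { ok: false, penalty: 'mid' }; // duplicate txs
  }
  if (blockTxHashes === undefined) {
    return { ok: false, penalty: 'none' }; // local-state gap, not a peer fault
  }
  const allowed = new Set([...blockTxHashes, ...requestedHashes]);
  if (response.txHashes.some(h => !allowed.has(h))) {
    return { ok: false, penalty: 'low' }; // tx neither in block nor requested by hash
  }
  if (!peerHasBlock(response)) {
    return { ok: true }; // peer lacks the block, but its txs are still usable
  }
  if (response.bitVector.length !== blockTxHashes.length) {
    return { ok: false, penalty: 'mid' }; // bitvector disagrees with block size
  }
  const withheld = blockTxHashes.some(
    (h, i) => requestedIndices.includes(i) && response.bitVector[i] === true && !returned.has(h),
  );
  return withheld ? { ok: false, penalty: 'low' } : { ok: true };
}
```

The early `ok: true` for an empty bitvector is the core of the fix: the membership check has already run by then, so the returned txs are known to be ones we asked for.
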

**Move dumb-marking into the smart/dumb decision.** `BatchTxRequester`
no longer inspects archive roots. `decideIfPeerIsSmart` marks a peer
dumb (and clears its per-peer data, without penalty) whenever the
response signals it lacks the block (`!peerHasBlock()`); penalisation
for genuinely inconsistent responses is left to the validator. The old
`handleArchiveRootMismatch` helper is removed.

## Tests

- Updated the serialization, handler, validation, requester and
integration tests to the `peerHasBlock()` model.
- Added a regression test at the validator level — a peer that signals
it lacks the block via an empty bitvector but returns valid txs is now
accepted instead of discarded.
- Added a regression test at the requester level — those txs are
delivered (used) and the peer is marked dumb without penalty.
- Added coverage for the partial-availability case (peer returns fewer
txs than bits set: we request a,b,c, peer has c,d,e, so only c comes
back with three bits set) and for the by-hash case (a tx requested via
`request.txHashes` that is not part of the block is accepted).

The regression tests fail against the former behaviour.
## Summary

Fixes
**[A-1070](https://linear.app/aztec-labs/issue/A-1070/malicious-proposer-can-make-honest-nodes-to-fail-tx-validation)**:
a malicious proposer who sends two different proposals with the **same
archive root but different tx sets** could make two honest nodes fail
the `BLOCK_TXS` exchange and penalize each other.

In the `BLOCK_TXS` protocol the requester asks for txs by their
**index** within a block (proposal), identified only by its archive
root. If an equivocating proposer gives node A and node B two proposals
that share an archive root but differ in their tx list, then:

- Node A (requester) asks node B for txs at indices `[i, j, …]` of "the
block with this archive root".
- Node B (responder) resolves those indices against *its* version of the
proposal and returns txs that, from A's perspective, are not part of the
block.
- A's `validateRequestedBlockTxsConsistency` rejects the response and
penalizes B — an honest node punished for honest behavior.

## Fix

The request now carries a **commitment to the full set of block tx
hashes** (`blockTxHashesCommitment`, a SHA-256 over the serialized tx
hashes) alongside the archive root. The responder only serves txs *by
index* (and advertises availability via the bitvector) when its own
block's tx-hash commitment matches the request's. Otherwise it treats
the request as "I don't have that block" — returning an empty bitvector
and only servicing any explicitly-requested tx hashes — so neither side
is penalized for an equivocation it didn't cause.

This closes the gap that the archive root alone could not: identical
archive roots no longer imply identical tx sets.
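
A minimal sketch of the commitment check follows. The serialization used here (32-byte tx hashes concatenated in block order) and the names `commitmentSketch` and `canServeByIndex` are assumptions; the PR's `computeBlockTxHashesCommitment` may serialize differently.

```typescript
import { createHash } from 'node:crypto';

// Assumed serialization: tx hashes concatenated in block order.
function commitmentSketch(txHashes: Buffer[]): Buffer {
  const hasher = createHash('sha256');
  for (const txHash of txHashes) {
    hasher.update(txHash);
  }
  return hasher.digest();
}

// Responder side: serve by index only when our own view of the block commits
// to the same tx list as the requester's. Otherwise the caller falls back to
// the "block not available" path (empty bitvector, by-hash txs only).
function canServeByIndex(requestCommitment: Buffer, localBlockTxHashes: Buffer[] | undefined): boolean {
  if (localBlockTxHashes === undefined) {
    return false;
  }
  return commitmentSketch(localBlockTxHashes).equals(requestCommitment);
}
```

Two proposals that share an archive root but differ in any tx hash, or in tx order, produce different commitments under this scheme, so the responder declines to resolve indices against the wrong list.
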

## Why not use proposal hash?

That would work when the BLOCK_TXS request is from a proposal, but it
cannot be used when it's done from a block (e.g., in the prover node).

## Changes

- `BlockTxsRequest` gains a `blockTxHashesCommitment` field and a
`computeBlockTxHashesCommitment` helper; serialization and
`fromTxsSourceAndMissingTxs` updated accordingly.
- `reqRespBlockTxsHandler` verifies the commitment before serving txs by
index; on mismatch it falls back to the "block not available" path
instead of returning indexed txs.
- This builds on the preceding `BLOCK_TXS` validation revamp commit
(consistency checks on the requester side, response no longer echoes the
archive root).
- Tests adapted across `block_txs`, `block_txs_handler`, and
`libp2p_service`, plus a new handler test covering the equivocation case
(different proposal under the same archive root → responder refuses to
serve by index).

Closes
https://linear.app/aztec-labs/issue/A-1070/malicious-proposer-can-make-honest-nodes-to-fail-tx-validation
.
This allows us to make local changes to the VS Code settings in
yarn-project and then ignore them via `skip-worktree`, so they persist
across `git clean` operations triggered by bootstrap.
…due (#23807)

## Motivation

The orphan-block guard in `checkSync` (added in #23606) was logging at
`warn` on every non-proposer validator, ~once per second for a full
slot, every slot. Under pipelining a node receives and re-executes a
block proposal for the next checkpoint up to one slot before the
matching checkpoint proposal arrives, so the world-state tip
legitimately sits in an as-yet-unproposed checkpoint for that whole
window. That is the happy path, not the abnormal "proposer published
blocks but never the checkpoint" case the guard is meant to flag.
Observed on `next-net`: 118 warnings in ~59s on a healthy validator for
a single slot.

## Approach

The condition that distinguishes "checkpoint hasn't arrived yet" from
"checkpoint will never arrive" is purely temporal — which is exactly
what the archiver already computes in `pruneOrphanProposedBlocks` to
decide when to prune an orphan block. The guard now reuses that same
deadline: it still refuses to build (`return undefined`) whenever the
orphan-shaped state holds, but only escalates to `warn` once the
enclosing checkpoint is overdue by that deadline; within the normal
pipelining window it logs at `debug`. The warn therefore fires at the
same instant the archiver would prune the orphan.

## Changes

- **sequencer-client**: Add `isProposedCheckpointOverdue`, mirroring the
archiver's orphan-prune deadline (`start of slot after the block's build
slot + grace`, grace derived from `blockDurationMs` as the node wiring
does). Gate the existing guard's log level on it — `warn` when overdue,
`debug` otherwise. Control flow is unchanged.
- **sequencer-client (tests)**: Thread a real `blockSlot` through the
orphan-guard test setup and split the warning test into an overdue case
(expects `warn`) and a within-window case (expects no `warn`).
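
The deadline and the resulting log level can be sketched as below. The timing parameters, the `>=` comparison at the boundary, and passing `graceMs` in directly (rather than deriving it from `blockDurationMs`) are assumptions; the function names are hypothetical and do not mirror `isProposedCheckpointOverdue` exactly.

```typescript
interface SlotTiming {
  genesisTimeMs: number;
  slotDurationMs: number;
}

function slotStartMs(slot: number, timing: SlotTiming): number {
  return timing.genesisTimeMs + slot * timing.slotDurationMs;
}

// Overdue once we pass: start of the slot after the block's build slot + grace.
function isOverdueSketch(nowMs: number, blockBuildSlot: number, graceMs: number, timing: SlotTiming): boolean {
  const deadlineMs = slotStartMs(blockBuildSlot + 1, timing) + graceMs;
  return nowMs >= deadlineMs;
}

// The guard still refuses to build in both cases; only the log level differs.
function guardLogLevel(nowMs: number, blockBuildSlot: number, graceMs: number, timing: SlotTiming): 'warn' | 'debug' {
  return isOverdueSketch(nowMs, blockBuildSlot, graceMs, timing) ? 'warn' : 'debug';
}
```
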
…ublish windows (#23776)

## Summary

Fixes timing bugs in block building and validation, now that proposer
pipelining is the only production mode. Found via an audit of the
sequencer timetable, checkpoint proposal job, validator client, proposal
handler, and p2p proposal/attestation validators.

### The frame bug (main fix)

Under pipelining the proposer job runs with `slotNow = N-1` (build slot)
and `targetSlot = N`. The job passed `targetSlot` to `setState` for
build-frame states, so `Sequencer.setState` measured the
`assertTimeLeft` deadlines against
`getSlotStartBuildTimestamp(targetSlot)` — one full Aztec slot (72s)
later than the build frame. The build-frame deadlines
(`INITIALIZING_CHECKPOINT`, `CREATING_BLOCK`, `ASSEMBLING_CHECKPOINT`,
`COLLECTING_ATTESTATIONS`, `PUBLISHING_CHECKPOINT`, …) were therefore
checked ~72s too late and never fired. Now these states are measured
against `slotNow`. `targetSlot` is still used for headers, signing, and
`sendRequestsAt`.
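
The effect of measuring against the wrong frame can be worked through with a small sketch. The helper names, the 40s deadline, and the genesis-relative slot start are illustrative assumptions; only the 72s slot length and the `slotNow` / `targetSlot` relationship come from the description above.

```typescript
const AZTEC_SLOT_DURATION_S = 72; // slot length quoted in the description

// Hypothetical, simplified stand-in for the slot-start lookup.
function frameStartS(slot: number, genesisS: number): number {
  return genesisS + slot * AZTEC_SLOT_DURATION_S;
}

// True while `nowS` is no later than `maxSecondsIntoSlot` after the start of
// the frame slot the deadline is measured against.
function hasTimeLeft(nowS: number, frameSlot: number, maxSecondsIntoSlot: number, genesisS = 0): boolean {
  return nowS - frameStartS(frameSlot, genesisS) <= maxSecondsIntoSlot;
}

const slotNow = 9; // build slot
const targetSlot = slotNow + 1; // slot being built for
const nowS = frameStartS(slotNow, 0) + 70; // 70s into the build slot
const deadlineS = 40; // illustrative build-frame deadline

// Pre-fix frame: 70s into slot 9 is -2s relative to slot 10, so the check passes.
const checkedAgainstTarget = hasTimeLeft(nowS, targetSlot, deadlineS);
// Post-fix frame: 70s > 40s, so the deadline is correctly reported as missed.
const checkedAgainstBuild = hasTimeLeft(nowS, slotNow, deadlineS);
```

Any deadline shorter than one slot is shifted by a full 72s when measured against `targetSlot`, which is why the build-frame deadlines never fired.
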

### Aligning the attestation / publish windows around L1 geometry

- The checkpoint attestation/publish deadline and the p2p attestation
acceptance window are now derived from `ethereumSlotDuration` — **one
Ethereum slot (12s) before the last L1 block of the target slot**, the
latest a checkpoint can be submitted and still land on L1 in its slot.
Previously the deadline used the configurable `l1PublishingTime` and the
p2p window was only `2 * p2pPropagationTime` (~4.5s into the target
slot). This also unifies the deadline with the publisher's send lead
(`sendRequestsAt` already targets one Ethereum slot before the target
slot start).
- Validators (in `validateCheckpointProposal`) keep validating/attesting
checkpoint proposals until that L1 publish deadline instead of the
target-slot start, so attestations stay useful right up to the
proposer's real publish cutoff. Block-proposal re-execution deadlines
are intentionally left at the target-slot start.

### Why no test caught the frame bug

The job timing test built the job with `slotNow === targetSlot` (so the
two frames coincided) and stubbed `setStateFn` with a no-op, mocking
away the very `assertTimeLeft` enforcement where the frame matters. This
PR adds:

- A contract test asserting every build-frame state is set against the
build slot (`slotNow`), not the target slot.
- A behavioral test with a real enforcing `setStateFn`: a checkpoint
whose assembly crosses the build-frame deadline is now correctly
abandoned. Both fail on the pre-fix code and pass after the fix.
- Updated stdlib/timetable, clock-tolerance, attestation-validator,
proposal-handler, and validator tests for the realigned windows
(including an `l1PublishingTime != ethereumSlotDuration` case proving
the deadline is now Ethereum-slot-based).

No constants were removed and no broader cleanup was done; that is
deferred.

## Test plan

- `yarn build` green; touched packages lint/format clean.
- `@aztec/stdlib` timetable, `@aztec/sequencer-client` (incl.
`timetable`, `checkpoint_proposal_job.timing`, `sequencer-publisher`),
`@aztec/validator-client` (incl. `proposal_handler`, `validator`), and
`@aztec/p2p` `msg_validators` suites pass.
@spalladino spalladino requested review from a team and charlielye as code owners June 4, 2026 00:07
@spalladino spalladino removed request for a team and charlielye June 4, 2026 00:39
@spalladino spalladino merged commit b7384e4 into v5-next Jun 4, 2026
22 of 28 checks passed
@spalladino spalladino deleted the spl/backport-to-v5-next-1 branch June 4, 2026 10:35
spalladino pushed a commit that referenced this pull request Jun 4, 2026
Backports the `merge-train/spartan-v5` wiring from #23831 (merged to
`next`) onto `v5-next`.

## Why this is needed
The base→train sync (`merge-train-next-to-branches.yml`) is
push-triggered, and for a `push` event GitHub Actions runs the workflow
file **as it exists on the pushed branch**. `v5-next` still has the old
workflow (triggers only on `next`, no `spartan-v5` routing), so commits
landing on `v5-next` never fire the sync into `merge-train/spartan-v5`.
Putting the wiring on `v5-next` makes that sync fire for every
subsequent push to `v5-next`.

This is a cherry-pick of #23831's squashed commit onto `v5-next` — same
12-file changeset, no v5-specific divergence (the merge-train infra
files merged cleanly; v5-next's newer `actions/checkout` pin is
preserved).

## After merge
Future pushes to `v5-next` will sync into `merge-train/spartan-v5`. The
commit that already landed on `v5-next` (`chore: backport
merge-train/spartan PRs to v5-next` #23846) won't retroactively trigger
— it needs either a fresh push to `v5-next` or a one-time manual
`scripts/merge-train/merge-next.sh merge-train/spartan-v5 v5-next` to
catch the train up.

Labeled `ci-skip` — workflow/script/docs config only.

---
*Created by
[claudebox](https://claudebox.work/v2/sessions/a24733a6b8930662) ·
group: `slackbot`*