fix(epoch-cache): use finalized L1 block and correct lag for committee guard by spalladino · Pull Request #22153 · AztecProtocol/aztec-packages

spalladino · 2026-03-30T15:45:03Z

Motivation

The computeCommittee guard in epoch-cache had two bugs: it used lagInEpochsForValidatorSet (the looser constraint) instead of lagInEpochsForRandao (the binding one), and it queried the latest L1 block instead of finalized. This meant an L1 reorg could change the RANDAO seed for a committee we'd already cached, and when the two lag values differed the guard was less strict than the L1 contract.

Approach

Switch the guard to use the finalized block tag and lagInEpochsForRandao. Compute the sampling timestamp from the epoch start (not the individual slot timestamp) to match L1 contract logic. Introduce typed error classes (EpochNotFinalizedError, EpochNotStableError) so callers can distinguish between "not yet finalized" and "not yet stable on L1". Extract types and errors into separate files.
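A minimal sketch of the guard as described above, not the code from this PR. Apart from `EpochNotFinalizedError` and `lagInEpochsForRandao`, every name here is hypothetical. The sampling formula (epoch start minus `lagInEpochsForRandao` whole epochs) and the strict `>` comparison are assumptions made for illustration; the L1 contract remains the authority on both.

```typescript
// Hypothetical constants shape; the real EpochCacheConstants type may differ.
interface GuardConstants {
  l1GenesisTime: number; // timestamp of L2 slot 0, in seconds
  slotDuration: number; // L2 slot length, in seconds
  epochDuration: number; // L2 slots per epoch
  lagInEpochsForRandao: number; // the binding lag, per the PR description
}

class EpochNotFinalizedError extends Error {
  readonly epoch: number;
  readonly samplingTimestamp: number;
  readonly finalizedTimestamp: number;

  constructor(epoch: number, samplingTimestamp: number, finalizedTimestamp: number) {
    super(
      `Epoch ${epoch} samples L1 at ${samplingTimestamp}, ` +
        `which is after the finalized block timestamp ${finalizedTimestamp}`,
    );
    this.name = 'EpochNotFinalizedError';
    this.epoch = epoch;
    this.samplingTimestamp = samplingTimestamp;
    this.finalizedTimestamp = finalizedTimestamp;
  }
}

// Start of the epoch, not the timestamp of an individual slot inside it.
function getEpochStartTimestamp(epoch: number, c: GuardConstants): number {
  return c.l1GenesisTime + epoch * c.epochDuration * c.slotDuration;
}

// ASSUMPTION: the seed is sampled lagInEpochsForRandao full epochs before the epoch start.
function getSamplingTimestamp(epoch: number, c: GuardConstants): number {
  const epochSeconds = c.epochDuration * c.slotDuration;
  return getEpochStartTimestamp(epoch, c) - c.lagInEpochsForRandao * epochSeconds;
}

// Throws unless the sampling timestamp is covered by the finalized L1 block.
function assertSamplingFinalized(
  epoch: number,
  finalizedBlockTimestamp: number,
  c: GuardConstants,
): number {
  const samplingTimestamp = getSamplingTimestamp(epoch, c);
  if (samplingTimestamp > finalizedBlockTimestamp) {
    throw new EpochNotFinalizedError(epoch, samplingTimestamp, finalizedBlockTimestamp);
  }
  return samplingTimestamp;
}
```

In this sketch the caller would pass the timestamp of the block fetched with the `finalized` tag rather than `latest`, so a reorg of unfinalized L1 blocks cannot change a seed that has already passed the guard.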

Changes

- epoch-cache: Fix computeCommittee guard to use lagInEpochsForRandao, finalized block tag, and epoch-start-based sampling timestamp. Add EpochCacheConstants type and getEpochCacheConstants() accessor. Extract errors to errors.ts and types/interfaces to types.ts.
- epoch-cache (tests): Use different lag values (lagInEpochsForValidatorSet=2, lagInEpochsForRandao=1) to exercise the fix. Add unit test for EpochNotStableError wrapping. Add integration tests against real Anvil: happy path (committee, caching, proposer selection) and two guard tests that independently trigger each error class.
- epoch-cache (docs): Rewrite README with committee computation, LAG values, RANDAO, proposer selection, escape hatch, finalized block guard, and caching strategy.

Fixes A-680

…e guard The computeCommittee guard was using lagInEpochsForValidatorSet (the looser constraint) instead of lagInEpochsForRandao (the binding constraint), and queried the latest L1 block instead of the finalized one. This could allow caching a committee whose RANDAO seed is not yet finalized on L1. Fixes the guard to use lagInEpochsForRandao and the finalized block tag, computes sampling timestamp from epoch start (not slot timestamp), introduces EpochNotFinalizedError and EpochNotStableError, and adds integration tests against a real Anvil instance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Increases the Anvil slotsInAnEpoch from 1 to 8 so finalized = latest - 16 blocks, making tests less likely to pass due to off-by-one near the finality boundary. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ract Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ests Only interval mining needs to be stopped/restored to control the gap between latest and finalized blocks. Automine is not used since Anvil is started with l1BlockTime (interval mining). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…llup cheat codes Adds mineUntilTimestamp which mines real L1 blocks (via hardhat_mine with a timestamp interval) so finalized block timestamps advance alongside latest. This prevents epoch-cache's finalized guard from rejecting committees after time advances in tests. The method derives the block interval from the last two block timestamps (to handle anvil_setBlockTimestampInterval overrides), stops interval mining before the burst, and leaves it stopped so the caller controls when to resume. Updates rollup cheat codes (advanceToEpoch, advanceToNextEpoch, advanceToNextSlot, advanceSlots) to use mineUntilTimestamp with automatic interval restore. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…finalized block warpL2TimeAtLeastTo used warp (single block jump), causing the finalized L1 block to lag behind after large time jumps. This triggered EpochNotFinalizedError in the epoch cache, blocking the sequencer from building blocks after the warp. Switches to mineUntilTimestamp which mines real blocks at the ethereum slot interval so finalized advances alongside latest. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…eouts For large time jumps (e.g., 1 day in crossTimestampOfChange), mining at the ethereum slot interval (12s) would require thousands of blocks, causing Anvil to time out. Caps at ~100 blocks and spreads the interval to cover the full jump. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…eouts For large time jumps (e.g., 1 day in crossTimestampOfChange), mining at the ethereum slot interval (12s) would require thousands of blocks, causing Anvil to time out. Caps at ~1000 blocks and spreads the interval to cover the full jump. Also fixes mineUntilTimestamp to use evm_setNextBlockTimestamp + evm_mine per block instead of hardhat_mine, because Anvil's hardhat_mine ignores the interval parameter when anvil_setBlockTimestampInterval has been set. Adds a unit test validating this workaround. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
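The capping described in this commit message can be sketched as a pure planning step. This is an illustration under stated assumptions, not the `mineUntilTimestamp` implementation: the function name, its parameters, and the rounding strategy are hypothetical, and the slot interval is taken as an input rather than derived from the last two block timestamps.

```typescript
// Plans the timestamps of the blocks to mine between `currentTs` and `targetTs`.
// Mines roughly one block per slot, but never more than `maxBlocks`; when capped,
// the interval is stretched so the last block still lands exactly on `targetTs`.
function planBlockTimestamps(
  currentTs: number,
  targetTs: number,
  slotSeconds: number = 12,
  maxBlocks: number = 1000,
): number[] {
  const gap = targetTs - currentTs;
  if (gap <= 0) {
    return []; // already at or past the target, nothing to mine
  }
  const blocks = Math.min(Math.ceil(gap / slotSeconds), maxBlocks);
  const timestamps: number[] = [];
  for (let i = 1; i <= blocks; i++) {
    // Integer division keeps timestamps whole and strictly increasing,
    // because gap / blocks is always at least 1 second here.
    timestamps.push(currentTs + Math.floor((i * gap) / blocks));
  }
  return timestamps;
}
```

Each planned timestamp would then be applied with `evm_setNextBlockTimestamp` followed by `evm_mine`, as the commit message describes. For a one-day jump (86,400 s) the uncapped plan would need 7,200 blocks at 12 s each; with the cap it becomes 1,000 blocks spaced about 86 s apart.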

…to catch up Anvil computes finalized = latest - slotsInAnEpoch * 2 blocks. When querying the committee for the next epoch right after advancing, the sampling timestamp can be beyond the finalized block. Mine 3 extra blocks past the target so finalized also advances past it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When querying the next epoch's committee after advancing, the finalized block may not have caught up to the sampling timestamp yet (Anvil computes finalized = latest - slotsInAnEpoch * 2). Catch the error and mine extra blocks to push finalized forward before retrying. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Anvil defaults to slotsInAnEpoch=32 (Ethereum mainnet), which means finalized = latest - 64 blocks (~768s behind). This causes the epoch cache finalized-block guard to reject all committee queries in test environments. Setting slotsInAnEpoch=1 keeps finalized close to latest (only 2 blocks behind). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
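The finality arithmetic quoted in these commit messages can be checked with a small helper. The function names are hypothetical, and clamping the finalized block number at zero is an assumption for chains shorter than the lag.

```typescript
// Finality lag implied by "finalized = latest - slotsInAnEpoch * 2 blocks".
function anvilFinalityLag(
  slotsInAnEpoch: number,
  l1BlockTimeSeconds: number,
): { blocks: number; seconds: number } {
  const blocks = slotsInAnEpoch * 2;
  return { blocks, seconds: blocks * l1BlockTimeSeconds };
}

// ASSUMPTION: a chain shorter than the lag reports block 0 as finalized.
function finalizedBlockNumber(latestBlockNumber: number, slotsInAnEpoch: number): number {
  return Math.max(0, latestBlockNumber - slotsInAnEpoch * 2);
}
```

With 12 s blocks, `slotsInAnEpoch=32` gives a lag of 64 blocks (768 s), `slotsInAnEpoch=8` gives 16 blocks (192 s), and `slotsInAnEpoch=1` gives 2 blocks (24 s).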

The HA test relies on slow finalization to keep attestations in the P2P pool long enough for verification. With --slots-in-an-epoch 1, Anvil finalizes every block immediately, triggering aggressive pool cleanup that deletes attestations before the test can read them. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…d correct lag (#22204)

## Motivation

PR #22153 introduced a hard "finalized block guard" that refuses to compute committees if L1 data isn't finalized. While the safety goal is valid (preventing L1 reorgs from invalidating cached committees), it breaks many tests that don't properly set L1 finalized time and would cause the chain to stall if L1 stops finalizing. This PR takes a different approach that preserves safety while maintaining liveness.

Also fixes the lag parameter: the old code used `lagInEpochsForValidatorSet` (the looser constraint) instead of `lagInEpochsForRandao` (the binding one), and computed the sampling timestamp from the slot rather than the epoch start.

Fixes A-680

## Approach

Instead of refusing to serve committee data that isn't finalized, use a TTL-based cache: finalized entries are cached permanently, non-finalized entries expire after one Ethereum slot (12s) and get re-fetched from L1. The cache map stores both resolved entries and in-flight promises directly, so concurrent callers for the same epoch coalesce on a single L1 query. On fetch failure, the previous stale entry is restored so the next caller retries cleanly.

## Changes

- **epoch-cache**: Replaced the simple `Map<EpochNumber, EpochCommitteeInfo>` cache with `Map<EpochNumber, CachedEpochEntry | Promise<CachedEpochEntry>>`. Each resolved entry carries L1 block provenance metadata (number, hash, timestamp) and a `finalized` flag. Switched from `lagInEpochsForValidatorSet` to `lagInEpochsForRandao` and compute sampling timestamp from epoch start via `getStartTimestampForEpoch`. Simplified `isEscapeHatchOpen` to delegate cache management to `getCommittee`.
- **epoch-cache (tests)**: Updated unit tests for the new cache structure. Added 4 new TTL tests: re-query after TTL, no re-query for finalized, concurrent coalescing, eventual finalization promotion.
- **epoch-cache (integration tests)**: New integration test suite against real Anvil with deployed L1 contracts and 4 validators. Tests finalized committee retrieval, non-finalized TTL refresh, and cache re-fetch after L1 reorg.
- **epoch-cache (README)**: Added comprehensive documentation covering committee computation, LAG values, RANDAO seed, proposer selection, escape hatch, TTL caching with finalization tracking, and configuration.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
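The TTL-based cache outlined in the #22204 commit message can be sketched roughly as follows. This is a simplified illustration, not the merged implementation: `TtlEpochCache`, `Fetcher`, and the injected clock are hypothetical names, the entry omits the L1 block provenance fields (number, hash, timestamp), and the TTL is measured against the fetch time as an assumption.

```typescript
interface CachedEntry<T> {
  value: T;
  finalized: boolean; // finalized entries never expire
  fetchedAtMs: number;
}

type Fetcher<T> = (epoch: number) => Promise<{ value: T; finalized: boolean }>;

class TtlEpochCache<T> {
  // Holds resolved entries and in-flight promises in the same map.
  private readonly cache = new Map<number, CachedEntry<T> | Promise<CachedEntry<T>>>();
  private readonly fetcher: Fetcher<T>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(fetcher: Fetcher<T>, ttlMs: number, now: () => number) {
    this.fetcher = fetcher;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(epoch: number): Promise<T> {
    const existing = this.cache.get(epoch);
    if (existing instanceof Promise) {
      // A fetch is already in flight: coalesce onto it.
      return (await existing).value;
    }
    if (existing && (existing.finalized || this.now() - existing.fetchedAtMs < this.ttlMs)) {
      return existing.value;
    }
    const pending = this.fetcher(epoch).then(
      result => {
        const entry: CachedEntry<T> = { ...result, fetchedAtMs: this.now() };
        this.cache.set(epoch, entry);
        return entry;
      },
      err => {
        // Restore the stale entry (or clear the slot) so the next caller retries.
        if (existing) {
          this.cache.set(epoch, existing);
        } else {
          this.cache.delete(epoch);
        }
        throw err;
      },
    );
    this.cache.set(epoch, pending);
    return (await pending).value;
  }
}
```

Injecting the clock keeps the expiry logic testable without real timers. With a 12,000 ms TTL, a non-finalized entry is served from cache for one Ethereum slot and then re-fetched; once a fetch reports `finalized: true`, the entry is never re-queried.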

spalladino · 2026-04-08T12:42:57Z

Closing in favor of #22204

spalladino added ci-no-fail-fast (sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure) and backport-to-v4-next labels Mar 30, 2026

spalladino force-pushed the palla/fix/epoch-cache-finalized-guard branch from eb6dcfa to c362c0e Compare March 30, 2026 17:19

spalladino and others added 4 commits March 30, 2026 14:24

fix(epoch-cache): cross-check epoch cache results against rollup cont…

d51b7e7

…ract Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

spalladino force-pushed the palla/fix/epoch-cache-finalized-guard branch from a013a5b to 97da230 Compare March 30, 2026 21:54

spalladino and others added 2 commits March 30, 2026 19:15

PhilWindle approved these changes Mar 31, 2026


spalladino and others added 3 commits March 31, 2026 09:43

spalladino requested a review from a team as a code owner March 31, 2026 18:29

spalladino force-pushed the palla/fix/epoch-cache-finalized-guard branch from 9d60565 to 89e2520 Compare March 31, 2026 18:56

spalladino marked this pull request as draft April 1, 2026 00:28

spalladino mentioned this pull request Apr 1, 2026

fix(epoch-cache): use TTL-based caching with finalization tracking and correct lag #22204

Merged

spalladino closed this Apr 8, 2026

spalladino wants to merge 12 commits into merge-train/spartan from palla/fix/epoch-cache-finalized-guard