fix(epoch-cache): use finalized L1 block and correct lag for committee guard #22153
Closed
spalladino wants to merge 12 commits into merge-train/spartan
…e guard The computeCommittee guard was using lagInEpochsForValidatorSet (the looser constraint) instead of lagInEpochsForRandao (the binding constraint), and queried the latest L1 block instead of the finalized one. This could allow caching a committee whose RANDAO seed is not yet finalized on L1. Fixes the guard to use lagInEpochsForRandao and the finalized block tag, computes sampling timestamp from epoch start (not slot timestamp), introduces EpochNotFinalizedError and EpochNotStableError, and adds integration tests against a real Anvil instance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
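The guard described in this commit can be sketched roughly as follows. This is a minimal illustration, not the epoch-cache code: `EpochTiming`, `epochStartTimestamp`, `samplingTimestampFor`, and `assertEpochFinalized` are hypothetical names, the error class is a stand-in that only shares its name with the PR's `EpochNotFinalizedError`, and the sampling formula (epoch start minus `lagInEpochsForRandao` full epochs) is an assumption about how the lag is applied.

```typescript
// Hypothetical shapes for illustration; not the actual epoch-cache types.
interface EpochTiming {
  l1GenesisTime: number; // timestamp of L2 slot 0, in seconds
  slotDuration: number; // seconds per L2 slot
  epochDuration: number; // L2 slots per epoch
  lagInEpochsForRandao: number; // the binding lag, per the commit message
}

// Stand-in for the PR's EpochNotFinalizedError; the fields are assumptions.
class EpochNotFinalizedError extends Error {
  readonly epoch: number;
  readonly samplingTs: number;
  readonly finalizedTs: number;

  constructor(epoch: number, samplingTs: number, finalizedTs: number) {
    super(`Epoch ${epoch}: sampling timestamp ${samplingTs} is after finalized L1 timestamp ${finalizedTs}`);
    this.name = 'EpochNotFinalizedError';
    this.epoch = epoch;
    this.samplingTs = samplingTs;
    this.finalizedTs = finalizedTs;
  }
}

// Start of the epoch, not the timestamp of an individual slot inside it.
function epochStartTimestamp(epoch: number, t: EpochTiming): number {
  return t.l1GenesisTime + epoch * t.epochDuration * t.slotDuration;
}

// Assumed formula: sample lagInEpochsForRandao full epochs before the epoch start.
function samplingTimestampFor(epoch: number, t: EpochTiming): number {
  return epochStartTimestamp(epoch, t) - t.lagInEpochsForRandao * t.epochDuration * t.slotDuration;
}

// Rejects the epoch when its sampling timestamp is newer than the finalized L1 block.
function assertEpochFinalized(epoch: number, finalizedL1Timestamp: number, t: EpochTiming): void {
  const samplingTs = samplingTimestampFor(epoch, t);
  if (samplingTs > finalizedL1Timestamp) {
    throw new EpochNotFinalizedError(epoch, samplingTs, finalizedL1Timestamp);
  }
}
```

In this sketch the caller would pass the timestamp of the L1 block fetched with the `finalized` tag rather than `latest`, so a reorg of unfinalized blocks cannot affect an accepted epoch.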
eb6dcfa to c362c0e
Increases the Anvil slotsInAnEpoch from 1 to 8 so finalized = latest - 16 blocks, making tests less likely to pass due to off-by-one near the finality boundary. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ract Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ests Only interval mining needs to be stopped/restored to control the gap between latest and finalized blocks. Automine is not used since Anvil is started with l1BlockTime (interval mining). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…llup cheat codes Adds mineUntilTimestamp which mines real L1 blocks (via hardhat_mine with a timestamp interval) so finalized block timestamps advance alongside latest. This prevents epoch-cache's finalized guard from rejecting committees after time advances in tests. The method derives the block interval from the last two block timestamps (to handle anvil_setBlockTimestampInterval overrides), stops interval mining before the burst, and leaves it stopped so the caller controls when to resume. Updates rollup cheat codes (advanceToEpoch, advanceToNextEpoch, advanceToNextSlot, advanceSlots) to use mineUntilTimestamp with automatic interval restore. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
a013a5b to 97da230
…finalized block warpL2TimeAtLeastTo used warp (single block jump), causing the finalized L1 block to lag behind after large time jumps. This triggered EpochNotFinalizedError in the epoch cache, blocking the sequencer from building blocks after the warp. Switches to mineUntilTimestamp which mines real blocks at the ethereum slot interval so finalized advances alongside latest. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eouts For large time jumps (e.g., 1 day in crossTimestampOfChange), mining at the ethereum slot interval (12s) would require thousands of blocks, causing Anvil to time out. Caps at ~100 blocks and spreads the interval to cover the full jump. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PhilWindle approved these changes Mar 31, 2026
…eouts For large time jumps (e.g., 1 day in crossTimestampOfChange), mining at the ethereum slot interval (12s) would require thousands of blocks, causing Anvil to time out. Caps at ~1000 blocks and spreads the interval to cover the full jump. Also fixes mineUntilTimestamp to use evm_setNextBlockTimestamp + evm_mine per block instead of hardhat_mine, because Anvil's hardhat_mine ignores the interval parameter when anvil_setBlockTimestampInterval has been set. Adds a unit test validating this workaround. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
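The capped mining loop described above might look roughly like this. It is a sketch under assumptions: `planBlockTimestamps`, `mineUntilTimestampSketch`, and the `RpcSend` type are hypothetical and are not the PR's `mineUntilTimestamp`; only the `evm_setNextBlockTimestamp` and `evm_mine` RPC method names come from the commit message. The default of 12 seconds and the cap of 1000 blocks mirror the figures quoted in the message.

```typescript
// Hypothetical transport: sends one JSON-RPC request to the Anvil node.
type RpcSend = (method: string, params: unknown[]) => Promise<unknown>;

// Plans strictly increasing block timestamps from `latest` (exclusive) to `target` (inclusive).
// At most `maxBlocks` blocks are planned; for large jumps the spacing widens to cover the gap.
function planBlockTimestamps(latest: number, target: number, slotSeconds = 12, maxBlocks = 1000): number[] {
  if (target <= latest) {
    return [];
  }
  const gap = target - latest;
  const blocks = Math.min(Math.ceil(gap / slotSeconds), maxBlocks);
  const timestamps: number[] = [];
  for (let i = 1; i <= blocks; i++) {
    // gap / blocks >= 1, so consecutive values differ by at least one second.
    timestamps.push(latest + Math.floor((gap * i) / blocks));
  }
  return timestamps;
}

// Mines one block per planned timestamp, setting the timestamp explicitly each time
// instead of relying on a batch call with an interval parameter.
async function mineUntilTimestampSketch(send: RpcSend, latest: number, target: number): Promise<number> {
  const plan = planBlockTimestamps(latest, target);
  for (const ts of plan) {
    await send('evm_setNextBlockTimestamp', [ts]);
    await send('evm_mine', []);
  }
  return plan.length;
}
```

Setting each timestamp explicitly costs two RPC calls per block, which is why the cap matters: a one-day jump at 12-second spacing would otherwise need 7200 blocks.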
…to catch up Anvil computes finalized = latest - slotsInAnEpoch * 2 blocks. When querying the committee for the next epoch right after advancing, the sampling timestamp can be beyond the finalized block. Mine 3 extra blocks past the target so finalized also advances past it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When querying the next epoch's committee after advancing, the finalized block may not have caught up to the sampling timestamp yet (Anvil computes finalized = latest - slotsInAnEpoch * 2). Catch the error and mine extra blocks to push finalized forward before retrying. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Anvil defaults to slotsInAnEpoch=32 (Ethereum mainnet), which means finalized = latest - 64 blocks (~768s behind). This causes the epoch cache finalized-block guard to reject all committee queries in test environments. Setting slotsInAnEpoch=1 keeps finalized close to latest (only 2 blocks behind). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
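The arithmetic behind these figures can be checked with a short sketch. `finalizedLag` is a hypothetical helper; the formula finalized = latest - slotsInAnEpoch * 2 blocks is the one stated in the commit messages, and the 12-second block time is the Ethereum slot interval they quote.

```typescript
// Hypothetical helper: how far Anvil's finalized block trails latest,
// using the relation quoted in the commit messages.
function finalizedLag(slotsInAnEpoch: number, blockTimeSeconds: number): { blocks: number; seconds: number } {
  const blocks = slotsInAnEpoch * 2;
  return { blocks, seconds: blocks * blockTimeSeconds };
}

const mainnetDefault = finalizedLag(32, 12); // { blocks: 64, seconds: 768 }
const testSetting = finalizedLag(1, 12); // { blocks: 2, seconds: 24 }
```

With `slotsInAnEpoch=8`, the value used for the integration tests in an earlier commit, the same relation gives a lag of 16 blocks.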
9d60565 to 89e2520
The HA test relies on slow finalization to keep attestations in the P2P pool long enough for verification. With --slots-in-an-epoch 1, Anvil finalizes every block immediately, triggering aggressive pool cleanup that deletes attestations before the test can read them. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
spalladino added a commit that referenced this pull request Apr 8, 2026
…d correct lag (#22204)

## Motivation

PR #22153 introduced a hard "finalized block guard" that refuses to compute committees if L1 data isn't finalized. While the safety goal is valid (preventing L1 reorgs from invalidating cached committees), it breaks many tests that don't properly set L1 finalized time and would cause the chain to stall if L1 stops finalizing. This PR takes a different approach that preserves safety while maintaining liveness.

Also fixes the lag parameter: the old code used `lagInEpochsForValidatorSet` (the looser constraint) instead of `lagInEpochsForRandao` (the binding one), and computed the sampling timestamp from the slot rather than the epoch start.

Fixes A-680

## Approach

Instead of refusing to serve committee data that isn't finalized, use a TTL-based cache: finalized entries are cached permanently, non-finalized entries expire after one Ethereum slot (12s) and get re-fetched from L1. The cache map stores both resolved entries and in-flight promises directly, so concurrent callers for the same epoch coalesce on a single L1 query. On fetch failure, the previous stale entry is restored so the next caller retries cleanly.

## Changes

- **epoch-cache**: Replaced the simple `Map<EpochNumber, EpochCommitteeInfo>` cache with `Map<EpochNumber, CachedEpochEntry | Promise<CachedEpochEntry>>`. Each resolved entry carries L1 block provenance metadata (number, hash, timestamp) and a `finalized` flag. Switched from `lagInEpochsForValidatorSet` to `lagInEpochsForRandao` and compute sampling timestamp from epoch start via `getStartTimestampForEpoch`. Simplified `isEscapeHatchOpen` to delegate cache management to `getCommittee`.
- **epoch-cache (tests)**: Updated unit tests for the new cache structure. Added 4 new TTL tests: re-query after TTL, no re-query for finalized, concurrent coalescing, eventual finalization promotion.
- **epoch-cache (integration tests)**: New integration test suite against real Anvil with deployed L1 contracts and 4 validators. Tests finalized committee retrieval, non-finalized TTL refresh, and cache re-fetch after L1 reorg.
- **epoch-cache (README)**: Added comprehensive documentation covering committee computation, LAG values, RANDAO seed, proposer selection, escape hatch, TTL caching with finalization tracking, and configuration.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
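The TTL-plus-coalescing approach described in this commit message could be sketched as below. This is an illustration under assumptions, not the epoch-cache implementation: `TtlEpochCacheSketch`, `CachedEntry`, and `Fetcher` are hypothetical names, the entry omits the L1 block provenance metadata the message mentions, and the injected clock exists only to make the behaviour easy to test.

```typescript
// Hypothetical entry shape; the real CachedEpochEntry also carries L1 block metadata.
interface CachedEntry<T> {
  value: T;
  finalized: boolean;
  fetchedAtMs: number;
}

// Hypothetical fetcher: queries L1 and reports whether the data is finalized.
type Fetcher<T> = (epoch: number) => Promise<{ value: T; finalized: boolean }>;

class TtlEpochCacheSketch<T> {
  // The map holds either a resolved entry or the in-flight promise for it.
  private readonly entries = new Map<number, CachedEntry<T> | Promise<CachedEntry<T>>>();
  private readonly fetcher: Fetcher<T>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(fetcher: Fetcher<T>, ttlMs: number = 12000, now: () => number = () => Date.now()) {
    this.fetcher = fetcher;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(epoch: number): Promise<T> {
    const current = this.entries.get(epoch);
    if (current instanceof Promise) {
      // Another caller is already fetching this epoch: share its result.
      return (await current).value;
    }
    const stale: CachedEntry<T> | undefined = current;
    if (stale && (stale.finalized || this.now() - stale.fetchedAtMs < this.ttlMs)) {
      return stale.value; // finalized entries never expire
    }
    const inFlight = this.fetcher(epoch).then(
      ({ value, finalized }) => {
        const entry: CachedEntry<T> = { value, finalized, fetchedAtMs: this.now() };
        this.entries.set(epoch, entry);
        return entry;
      },
      (err: unknown) => {
        // Restore the previous entry (or clear the slot) so the next caller retries.
        if (stale) {
          this.entries.set(epoch, stale);
        } else {
          this.entries.delete(epoch);
        }
        throw err;
      },
    );
    this.entries.set(epoch, inFlight);
    return (await inFlight).value;
  }
}
```

Storing the promise in the same map slot as the resolved entry is what lets concurrent callers coalesce without a separate pending-requests structure.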
Contributor
Author
Closing in favor of #22204
Motivation

The `computeCommittee` guard in epoch-cache had two bugs: it used `lagInEpochsForValidatorSet` (the looser constraint) instead of `lagInEpochsForRandao` (the binding one), and it queried the `latest` L1 block instead of `finalized`. This meant an L1 reorg could change the RANDAO seed for a committee we'd already cached, and when the two lag values differed the guard was less strict than the L1 contract.

Approach

Switch the guard to use the finalized block tag and `lagInEpochsForRandao`. Compute the sampling timestamp from the epoch start (not the individual slot timestamp) to match L1 contract logic. Introduce typed error classes (`EpochNotFinalizedError`, `EpochNotStableError`) so callers can distinguish between "not yet finalized" and "not yet stable on L1". Extract types and errors into separate files.

Changes

- Fix the `computeCommittee` guard to use `lagInEpochsForRandao`, finalized block tag, and epoch-start-based sampling timestamp. Add `EpochCacheConstants` type and `getEpochCacheConstants()` accessor. Extract errors to `errors.ts` and types/interfaces to `types.ts`.
- Use distinct lag values in tests (`lagInEpochsForValidatorSet=2`, `lagInEpochsForRandao=1`) to exercise the fix. Add unit test for `EpochNotStableError` wrapping. Add integration tests against real Anvil: happy path (committee, caching, proposer selection) and two guard tests that independently trigger each error class.

Fixes A-680