fix(epoch-cache): use TTL-based caching with finalization tracking and correct lag #22204

Merged
spalladino merged 4 commits into merge-train/spartan from palla/epoch-cache-ttl on Apr 8, 2026

@spalladino spalladino commented Apr 1, 2026

Motivation

PR #22153 introduced a hard "finalized block guard" that refuses to compute committees if L1 data isn't finalized. While the safety goal is valid (preventing L1 reorgs from invalidating cached committees), it breaks many tests that don't properly set L1 finalized time and would cause the chain to stall if L1 stops finalizing. This PR takes a different approach that preserves safety while maintaining liveness.

Also fixes the lag parameter: the old code used lagInEpochsForValidatorSet (the looser constraint) instead of lagInEpochsForRandao (the binding one), and computed the sampling timestamp from the slot rather than the epoch start.
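
To illustrate why the anchor matters, here is a minimal sketch contrasting a sampling timestamp derived from a slot with one derived from the epoch start. The type `L1Constants`, the helper names, and the exact arithmetic (subtracting `lag * epochDuration * slotDuration` seconds) are assumptions made for illustration; they are not taken from the PR or from the L1 contract.

```typescript
// Hypothetical constants; field names are illustrative, not the real config.
type L1Constants = {
  l1GenesisTime: number; // seconds
  slotDuration: number; // seconds per L2 slot
  epochDuration: number; // L2 slots per epoch
};

// Stand-in for a helper like getStartTimestampForEpoch (assumed formula).
function startTimestampForEpoch(epoch: number, c: L1Constants): number {
  return c.l1GenesisTime + epoch * c.epochDuration * c.slotDuration;
}

// Anchored at the epoch start: every slot in the epoch yields the same value.
function samplingTimestampFromEpochStart(epoch: number, lagInEpochs: number, c: L1Constants): number {
  return startTimestampForEpoch(epoch, c) - lagInEpochs * c.epochDuration * c.slotDuration;
}

// Anchored at a slot: the value drifts with the slot's position in the epoch.
function samplingTimestampFromSlot(slot: number, lagInEpochs: number, c: L1Constants): number {
  const slotTimestamp = c.l1GenesisTime + slot * c.slotDuration;
  return slotTimestamp - lagInEpochs * c.epochDuration * c.slotDuration;
}
```

Under these assumptions, a slot ten positions into an epoch produces a sampling timestamp ten slot durations later than the epoch-start version, so the two approaches can sample different L1 state for the same epoch.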

Fixes A-680

Approach

Instead of refusing to serve committee data that isn't finalized, use a TTL-based cache: finalized entries are cached permanently, non-finalized entries expire after one Ethereum slot (12s) and get re-fetched from L1. The cache map stores both resolved entries and in-flight promises directly, so concurrent callers for the same epoch coalesce on a single L1 query. On fetch failure, the previous stale entry is restored so the next caller retries cleanly.
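
The approach above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: the class `EpochTtlCache`, the `fetchFromL1` callback, the injectable `now` clock, and the fields of this `CachedEpochEntry` are all hypothetical stand-ins.

```typescript
// Simplified, hypothetical entry shape; the real entry carries more metadata.
type CachedEpochEntry = {
  committee: string[];
  finalized: boolean; // finalized entries never expire
  fetchedAtMs: number; // when this entry was fetched
};

const NON_FINALIZED_TTL_MS = 12_000; // one Ethereum slot

class EpochTtlCache {
  // Holds resolved entries and in-flight promises side by side.
  private cache = new Map<number, CachedEpochEntry | Promise<CachedEpochEntry>>();
  private fetchFromL1: (epoch: number) => Promise<CachedEpochEntry>;
  private now: () => number;

  constructor(
    fetchFromL1: (epoch: number) => Promise<CachedEpochEntry>,
    now: () => number = () => Date.now(),
  ) {
    this.fetchFromL1 = fetchFromL1;
    this.now = now;
  }

  async get(epoch: number): Promise<CachedEpochEntry> {
    const current = this.cache.get(epoch);
    if (current instanceof Promise) {
      return current; // coalesce on the in-flight L1 query
    }
    const fresh =
      current !== undefined &&
      (current.finalized || this.now() - current.fetchedAtMs < NON_FINALIZED_TTL_MS);
    if (current !== undefined && fresh) {
      return current;
    }
    const inFlight = this.fetchFromL1(epoch).then(
      entry => {
        this.cache.set(epoch, entry);
        return entry;
      },
      err => {
        // Restore the previous stale entry (or clear the slot) so the next caller retries.
        if (current !== undefined) {
          this.cache.set(epoch, current);
        } else {
          this.cache.delete(epoch);
        }
        throw err;
      },
    );
    this.cache.set(epoch, inFlight);
    return inFlight;
  }
}
```

Storing the promise in the same map as the resolved entries means no separate "pending" structure is needed: a second caller arriving mid-fetch simply receives the same promise.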

Changes

  • epoch-cache: Replaced the simple Map<EpochNumber, EpochCommitteeInfo> cache with Map<EpochNumber, CachedEpochEntry | Promise<CachedEpochEntry>>. Each resolved entry carries L1 block provenance metadata (number, hash, timestamp) and a finalized flag. Switched from lagInEpochsForValidatorSet to lagInEpochsForRandao, and the sampling timestamp is now computed from the epoch start via getStartTimestampForEpoch. Simplified isEscapeHatchOpen to delegate cache management to getCommittee.
  • epoch-cache (tests): Updated unit tests for the new cache structure. Added 4 new TTL tests: re-query after TTL, no re-query for finalized, concurrent coalescing, eventual finalization promotion.
  • epoch-cache (integration tests): New integration test suite against real Anvil with deployed L1 contracts and 4 validators. Tests finalized committee retrieval, non-finalized TTL refresh, and cache re-fetch after L1 reorg.
  • epoch-cache (README): Added comprehensive documentation covering committee computation, LAG values, RANDAO seed, proposer selection, escape hatch, TTL caching with finalization tracking, and configuration.

spalladino and others added 4 commits March 31, 2026 21:28
…d correct lag

The epoch cache previously cached committee data forever once fetched, and used
lagInEpochsForValidatorSet (the looser constraint) for its staleness guard. This
could serve stale data after L1 reorgs and was less strict than the L1 contract.

Switch to a TTL-based approach: finalized entries are cached permanently, while
non-finalized entries expire after one Ethereum slot (12s) and get re-fetched.
Use lagInEpochsForRandao (the binding constraint) and compute the sampling
timestamp from the epoch start to match the L1 contract's logic. Concurrent
requests for the same epoch coalesce on a single in-flight promise stored
directly in the cache map.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…promises in LRU

When a stale non-finalized entry expires, do a lightweight refresh: query only
the block hash at the original block number and the finalized block timestamp.
If the hash matches (no reorg), keep the cached data and just update the
timestamp and finalization flag — avoiding expensive getCommitteeAt and
getSampleSeedAt calls. Only do a full re-fetch on hash mismatch (reorg).

Also cache empty committees (without the finalized flag, so they always get
re-queried after TTL), and include in-flight promises in LRU purge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
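
The lightweight refresh described in this commit could look roughly like the sketch below. The `L1Reader` interface, its method names, the `Entry` shape, and the finalization rule (the queried block is not newer than the finalized block) are assumptions for illustration; the commit message only states which values are queried and when a full re-fetch happens.

```typescript
type L1Block = { number: number; hash: string; timestamp: number };

// Hypothetical entry shape, loosely following the field names used in later commits.
type Entry = {
  committee: string[];
  finalized: boolean;
  lastQueryL1BlockNumber: number;
  lastQueryL1BlockHash: string;
  lastRefreshL1Timestamp: number;
};

// Hypothetical L1 access layer; not a real client API.
interface L1Reader {
  getBlockHash(blockNumber: number): Promise<string>;
  getLatestBlock(): Promise<L1Block>;
  getFinalizedBlock(): Promise<L1Block>;
  fetchCommittee(atBlock: L1Block): Promise<string[]>; // the expensive path
}

async function refreshStaleEntry(entry: Entry, l1: L1Reader): Promise<Entry> {
  // Issue the cheap queries together to keep latency low.
  const [hashNow, latest, finalizedBlock] = await Promise.all([
    l1.getBlockHash(entry.lastQueryL1BlockNumber),
    l1.getLatestBlock(),
    l1.getFinalizedBlock(),
  ]);

  if (hashNow === entry.lastQueryL1BlockHash) {
    // No reorg: keep the cached committee, only bump the timestamp and flag.
    // Empty committees are never marked finalized, so they get re-queried after TTL.
    return {
      ...entry,
      lastRefreshL1Timestamp: latest.timestamp,
      finalized:
        entry.committee.length > 0 && entry.lastQueryL1BlockNumber <= finalizedBlock.number,
    };
  }

  // Hash mismatch means the original block was reorged out: do a full re-fetch.
  const committee = await l1.fetchCommittee(latest);
  return {
    committee,
    finalized: committee.length > 0 && latest.number <= finalizedBlock.number,
    lastQueryL1BlockNumber: latest.number,
    lastQueryL1BlockHash: latest.hash,
    lastRefreshL1Timestamp: latest.timestamp,
  };
}
```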
…ss prefetched timestamps

Rename l1BlockNumber/l1BlockHash to lastQueryL1BlockNumber/lastQueryL1BlockHash
and l1BlockTimestamp to lastRefreshL1Timestamp for clarity on their roles.

Move the latest block fetch into the initial Promise.all in refreshStaleEntry
to minimize latency. Pass already-fetched latest and finalized timestamps from
refreshStaleEntry to fetchAndCache on reorg, avoiding redundant L1 queries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…stead of timestamps

Avoids creating fake block objects with zero number/hash on the prefetched path.
The refreshStaleEntry method already has the full latest and finalized blocks
from its Promise.all, so just pass them through directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@spalladino spalladino enabled auto-merge (squash) April 8, 2026 12:28
@spalladino spalladino disabled auto-merge April 8, 2026 12:42
@spalladino spalladino merged commit 2730d08 into merge-train/spartan Apr 8, 2026
12 checks passed
@spalladino spalladino deleted the palla/epoch-cache-ttl branch April 8, 2026 12:42