chain: prune root_to_slot_cache on previous finalized slot #776
Merged
Cached post-states retained in `self.states` after `pruneStates` (`block.slot > latestFinalized.slot`) can still hold `justifications_roots` referencing slots in `(state.latest_finalized.slot, state.slot]`. The post-finalization cleanup loop in `BeamState.processAttestations` looks those roots up in the chain-owned `root_to_slot_cache`, so the cache must keep them reachable across at least one finalization boundary.

Previously we pruned on `latestFinalized.slot`, which dropped exactly the roots in `(previousFinalized.slot, latestFinalized.slot]` that such cached states can still reference. On devnet-4 a late-arriving competing block at slot 267 triggered a 171→252 jumbo finality jump followed by a forkchoice swing back to the pre-jump head at slot 268. The very next block on top of that head then missed on a justification root in `(171, 252]` in the STF cleanup loop and wedged zeam_0 with `InvalidJustificationRoot` until a checkpoint-sync restart.

Pruning on `previousFinalized.slot` keeps `(previousFinalized.slot, latestFinalized.slot]` alive in the cache for exactly the window any surviving cached state can reference, closing the coherence gap for the common single-advance case.

Paired with #772 (which hardens the STF to drop cache misses instead of wedging), this gives defense in depth: the cache no longer drops roots any reachable state can still need, and any residual miss from the rare two-hop stale-state case is handled gracefully.

Refs: #771, #772
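The effect of the cutoff change can be illustrated with a small toy model. This is a sketch in Python, not the Zig code from `chain.zig`: the `prune` and `missing_roots` helpers, the `root_<slot>` naming, and the specific roots held by the stale state are all hypothetical, and the rule "drop every root at or below the cutoff" is an assumption inferred from the description above. Only the slot numbers (171, 252, 268) come from the devnet-4 incident.

```python
# Toy model only: plain dicts stand in for the chain-owned root_to_slot_cache.

def prune(cache: dict, cutoff_slot: int) -> dict:
    """Keep only roots whose slot is strictly above the cutoff (assumed semantics)."""
    return {root: slot for root, slot in cache.items() if slot > cutoff_slot}

def missing_roots(justification_roots: list, cache: dict) -> list:
    """Roots a state still references that the cache can no longer resolve."""
    return [root for root in justification_roots if root not in cache]

# Slot numbers from the devnet-4 incident.
previous_finalized, latest_finalized = 171, 252

# Hypothetical cache contents before the finalization advance.
cache = {f"root_{slot}": slot for slot in range(160, 269)}

# A cached post-state at slot 268 whose latest_finalized.slot is still 171,
# so its justifications_roots may reference slots in (171, 268].
# The three roots below are made up for illustration.
stale_state_roots = ["root_200", "root_252", "root_260"]

cache_old = prune(cache, latest_finalized)    # behaviour before this PR
cache_new = prune(cache, previous_finalized)  # behaviour after this PR

missing_old = missing_roots(stale_state_roots, cache_old)
missing_new = missing_roots(stale_state_roots, cache_new)

print("pruned on latestFinalized.slot   ->", missing_old)
print("pruned on previousFinalized.slot ->", missing_new)
```

Under these assumptions, the old cutoff leaves the stale state with unresolved roots in `(171, 252]`, while the new cutoff resolves all of them.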
g11tech approved these changes on Apr 22, 2026
Fixes #771.
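What this changes
One-line semantic fix in `pkgs/node/src/chain.zig` inside `processFinalizationAdvancement`: the `root_to_slot_cache` prune cutoff changes from `latestFinalized.slot` to `previousFinalized.slot`.

Why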
The chain-owned `root_to_slot_cache` is the only source of slot info for roots still sitting in `BeamState.justifications_roots`. After `pruneStates` at this same call site, retained cached states have `block.slot > latestFinalized.slot`, but their per-state `latest_finalized.slot` may still equal `previousFinalized.slot`: it was frozen at import time and the state itself didn't drive the current advance. Those states' `justifications_roots` can therefore reference roots in `(previousFinalized.slot, latestFinalized.slot]`.

Pruning on `latestFinalized.slot` drops exactly those roots from the cache. The next block imported on top of such a state then fails the `processAttestations` cleanup lookup in `state.zig:519` and the STF returns `InvalidJustificationRoot`, which on devnet-4 wedged zeam_0 permanently after a cross-fork reorg at the finality boundary (slot 267 triggering a 171→252 jumbo advance, forkchoice swinging back to slot 268, next block on top missing on a root in `(171, 252]`). Full Loki timeline in the #771 follow-up comment.

Pruning on `previousFinalized.slot` keeps the slot window any surviving cached state can still reference. The cache stays coherent with the set of states the chain is holding, so the miss simply does not occur in normal operation.

Residual edge case
A cached state B imported when the chain's finalized slot was `F_old` can survive two successive advances `F_old → F_mid → F_new` if its block slot stays above both floors. Its `justifications_roots` can then reference slots in `(F_old, F_mid]`, which this PR drops at the second advance. This requires a minority fork staying above the canonical chain across two finality boundaries, which has not been observed in the wild. The airtight fix (evict cached states whose `latest_finalized.slot < chain latestFinalized.slot` on advance, Option A from the issue comment) is captured as a follow-up and kept out of this PR to preserve minimal scope.

Scope
Deliberately narrow, per AGENTS.md:
- Touches only `chain.zig`.
- No changes to `RootToSlotCache`, `pruneStates`, state shape, SSZ types, or STF.

Test plan
- `zig fmt --check .`
- `zig build test --summary all`
- `zig build simtest --summary all`
- `cargo fmt --manifest-path rust/Cargo.toml --all -- --check`
- `cargo clippy --manifest-path rust/Cargo.toml --workspace -- -D warnings`

Local test run was interrupted; CI will confirm. The change is a single variable swap at one call site. No test case's semantics rely on the exact prune cutoff value at this site, only on pruning advancing monotonically (which it still does, just one advance behind).
Supersedes #772.
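
For reference, the residual two-hop case and the Option A eviction rule described under "Residual edge case" can be sketched with the same kind of toy model. Everything here is hypothetical and written in Python rather than Zig: the `CachedState` class, the `advance` and `unreachable_slots` helpers, and the slot values 10, 20, 30, and 35 are made up for illustration. The sketch also assumes the worst case, in which a state may reference any slot in `(latest_finalized_slot, block_slot]`.

```python
from dataclasses import dataclass

@dataclass
class CachedState:
    block_slot: int
    latest_finalized_slot: int  # frozen at import time

def advance(cache_slots, states, previous_finalized, latest_finalized, evict_stale=False):
    """Model one finalization advance (assumed semantics, not the real code)."""
    # pruneStates analogue: keep states whose block slot is above the new finalized slot.
    states = [s for s in states if s.block_slot > latest_finalized]
    if evict_stale:
        # Option A analogue: also evict states whose own finalized slot lags the chain's.
        states = [s for s in states if s.latest_finalized_slot >= latest_finalized]
    # This PR's cutoff: prune the cache on the previous finalized slot.
    cache_slots = {slot for slot in cache_slots if slot > previous_finalized}
    return cache_slots, states

def unreachable_slots(state, cache_slots):
    """Slots the state may reference that the cache can no longer resolve."""
    window = range(state.latest_finalized_slot + 1, state.block_slot + 1)
    return [slot for slot in window if slot not in cache_slots]

f_old, f_mid, f_new = 10, 20, 30
state_b = CachedState(block_slot=35, latest_finalized_slot=f_old)
cache_slots = set(range(1, 36))

# First advance F_old -> F_mid: the window (F_old, 35] is still fully cached.
cache_1, states_1 = advance(cache_slots, [state_b], f_old, f_mid)
after_one = unreachable_slots(states_1[0], cache_1)

# Second advance F_mid -> F_new: slots in (F_old, F_mid] are now gone.
cache_2, states_2 = advance(cache_1, states_1, f_mid, f_new)
after_two = unreachable_slots(states_2[0], cache_2)

# With the Option A analogue, state B is evicted at the first advance instead.
_, states_evicting = advance(cache_slots, [state_b], f_old, f_mid, evict_stale=True)

print(after_one, after_two, states_evicting)
```

In this model the stale state survives the first advance with a fully resolvable window, loses `(F_old, F_mid]` at the second advance, and is never retained at all once the eviction rule is switched on.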