refactor: revert dual-state cache architecture from ePBS#9215
refactor: revert dual-state cache architecture from ePBS#9215
Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request simplifies the checkpoint state cache by removing the payloadPresent flag and rolling back dual-state logic previously introduced for the Gloas fork. The changes refactor several core components, including the state regenerator and datastore layers, to use a unified CheckpointHex representation. Feedback indicates an improvement opportunity in PersistentCheckpointStateCache to ensure that empty sets are removed from the epochIndex map during pruning to prevent unnecessary memory accumulation.
|
@lodekeeper review please |
Performance Report✔️ no performance regression detected Full benchmark results
|
lodekeeper
left a comment
There was a problem hiding this comment.
Nice cleanup — this is a significant simplification now that the specs have moved to a single-state model. A few notes:
1. epochIndex empty-set cleanup (agree with Gemini)
In processState() at line ~747, after this.epochIndex.get(epoch)?.delete(rootHex), the empty Set should be removed from the map. The old removeFromEpochIndex did this. Without it, empty sets accumulate over epochs and add overhead to the epoch sorting in getLatest/getOrReloadLatest/findSeedStateToReload. Suggested fix:
const rootSet = this.epochIndex.get(epoch);
if (rootSet) {
rootSet.delete(rootHex);
if (rootSet.size === 0) {
this.epochIndex.delete(epoch);
}
}2. Datastore key backward compatibility
Old persisted keys are 41 bytes (Checkpoint + 1-byte payloadPresent suffix). After this PR, datastoreKeyToCheckpoint calls ssz.phase0.Checkpoint.deserialize(key) directly, which expects exactly 40 bytes. This means:
- Old DB-persisted states become unreachable (new 40-byte lookups won't match old 41-byte keys)
getLatestSafeDatastoreKeyiterating old keys would hit deserialization errors on the 41-byte entries
If this is intentional (i.e., nodes must checkpoint-sync after upgrade), worth documenting. If not, a migration path that strips the trailing byte from existing keys might be needed.
3. Minor: FileCPStateDatastore file name length
CHECKPOINT_FILE_NAME_LENGTH changed from 84 to 82, which correctly matches the new 40-byte key. Same point as above applies — old 84-char files on disk won't be picked up.
Everything else looks good — the bitmask removal, cache key simplification, regen cleanup, and test updates are all straightforward. The _payloadPresent pass-throughs and remaining CheckpointHexPayload type are fine to clean up in the follow-up PR.
Regarding this, I think we remove all existed keys (41 byte) on start up? So there shouldn't be any 41 byte key. @lodekeeper |
There was a problem hiding this comment.
💡 Codex Review
The filename filter now only accepts 82-character checkpoint keys, so previously persisted files that include the payload-status suffix (84 characters) are no longer returned by readKeys(). In file-datastore mode this prevents init()/readLatestSafe() from seeing existing checkpoint states after upgrade, which both drops recoverable data and leaves stale files undeleted.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| function toCacheKey(cp: CheckpointHexPayload): CacheKey { | ||
| return `${cp.rootHex}_${cp.epoch}_${cp.payloadPresent}`; | ||
| function toCacheKey(cp: CheckpointHex): CacheKey { | ||
| return `${cp.rootHex}_${cp.epoch}`; |
There was a problem hiding this comment.
Preserve payload status in checkpoint cache keys
Dropping payloadPresent from the cache key collapses distinct Gloas checkpoint variants into one entry, but the node still produces both variants (block-path checkpoint state during block import and payload-path checkpoint state during payload import for the same root/epoch). With this key shape, whichever variant is written last overwrites the other, so regen can no longer select the state that matches ProtoBlock.payloadStatus, and a PENDING branch can be replayed from a FULL checkpoint state.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
This concern doesn't apply. The spec moved to a single-state model (consensus-specs#5094) — store.payloads is now Set[Root], not a per-variant map. The node no longer produces distinct FULL/EMPTY checkpoint variants for the same root/epoch, so collapsing the cache key is correct by design. That's the whole point of this PR.
lodekeeper
left a comment
There was a problem hiding this comment.
@twoeths This is not a `git revert` — it's a manual rollback. A clean `git revert` of #8868 wouldn't apply since `unstable` has evolved significantly since that merge (the touched files have had many other changes layered on top).
The PR manually removes the dual-state additions (#8868 introduced `PayloadAvailability` bitmask, `payloadPresent` in cache keys/datastore, `upgradeToGloas`, etc.) while keeping the current codebase shape intact. From my review of all 14 files, the result correctly matches the pre-#8868 semantics — single checkpoint per root/epoch, no payload variant tracking.
The main areas I'd verify are the ones I flagged: epochIndex empty-set cleanup (missing from the manual rollback but was present pre-#8868) and datastore key backward compat (old 41-byte keys become orphaned).
|
Re: Codex review body (legacy filenames) — this is the same backward compatibility concern I raised in my review (points #2 and #3). Old 41-byte DB keys and 84-char file-datastore filenames become orphaned after this change. Either a migration path is needed or it should be documented that checkpoint-sync is required after upgrade. |
@lodekeeper all of them are gone after startup, this PR is safe to merge |
|
@twoeths You're right — Minor note: the file datastore's Backward-compat concern is resolved. The only remaining item from my review is the epochIndex empty-set cleanup after |
@lodekeeper this was resolved, see 37c688c |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #9215 +/- ##
=========================================
Coverage 52.53% 52.53%
=========================================
Files 848 848
Lines 61424 61423 -1
Branches 4528 4527 -1
=========================================
Hits 32269 32269
+ Misses 29090 29089 -1
Partials 65 65 🚀 New features to boost your workflow:
|
To prepare state cache for ethereum/consensus-specs#5094.
Original epbs state cache PR: #8868
Summary
payloadPresenttracking fromCheckpointStateCache, its datastore layer, andFIFOBlockStateCacheepochIndexfrom bitmask-basedMap<RootHex, number>back toSet<RootHex>{rootHex}_{epoch}_{payloadPresent}to{rootHex}_{epoch}upgradeToGloas()/upgradeForGloas()block state cache upgrade pathPayloadAvailabilityenum and dual-variant iteration in persist/prune/reload pathspayloadPresentsuffix from datastore checkpoint keysThis reverts the dual-state architecture introduced in #8868, where each Gloas block produced both a block state and a payload state. The consensus specs have since moved to a single-state model (consensus-specs PR #5094), making this complexity unnecessary.
The regen interface still carries
payloadPresentparameters as a pass-through — these are ignored by the cache and will be cleaned up in a follow-up PR.Test plan
pnpm check-typespassespnpm lintpassespersistentCheckpointsCache.test.tsunit tests passregen.test.tsunit tests pass🤖 Generated with Claude Code