fix: segfault during cached-block catch-up sync #765
Merged
…e cached blocks

ssz.serialize called on a live cached SignedBlock (directly or through sszClone) has been observed to corrupt in-memory List/Bitlist state (aggregation_bits, proof_data), causing segfaults on the next cached block processed from the same stack frame: first in the fork-choice attestation loop, then in verifySignatures after the previous sszClone fix shifted the call site earlier.

The root fix: serialize each block exactly once, at the moment it is cloned into the cache, and carry those bytes through to DB persistence so onBlock never needs to touch the live SignedBlock with ssz.serialize again.

- types/utils: add sszCloneAndGetBytes, one serialize pass that returns both a deep clone and the SSZ byte slice (caller owns), avoiding a second serialize on the same value
- network: add fetched_block_ssz map (root → []u8); removeFetchedBlock and deinit clean it up; new getFetchedBlockSsz / storeFetchedBlockSsz helpers
- node/cacheBlockAndFetchParent: use sszCloneAndGetBytes instead of sszClone and store the returned bytes in fetched_block_ssz
- chain/CachedProcessedBlockInfo: add sszBytes ?[]const u8 field
- node/processCachedDescendants: retrieve stored bytes and pass via blockInfo
- chain/onBlock: use blockInfo.sszBytes when present; fall back to clone-and-serialize only for locally produced blocks (no cached bytes available)
The PR resolved the segfault issue.
noopur23
approved these changes
Apr 20, 2026
Summary
Fixes two distinct segfaults observed during catch-up sync on x86_64:

1. `ssz.serialize` corrupting live cached `SignedBlock`s. Stack bottoms out in `onBlock` iterating `aggregated_attestations` (or later in `verifySignatures`) with a dangling slice into a cached block's heap buffer. Root cause: `ssz.serialize` on a live cached `SignedBlock` (directly, or via `sszClone`) mutates nested `List`/`Bitlist` fields (`aggregation_bits`, `proof_data`), so the next read of the same cached block crashes.
2. `fetched_blocks` hashmap rehash race (folded in from "fix race condition leading to seg faults" #741). The map in the network layer is shared between the libxev main thread (`onInterval`) and the libp2p worker thread (`onGossip` / `onReqRespResponse` / `onReqRespRequest`). When the worker inserts and triggers a rehash, the main thread's in-flight lookups/iterations become invalid and panic.
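The second failure mode is the classic hazard of sharing a hash map between threads without synchronization. As a rough illustration only, here is a minimal Rust sketch of the coarse-lock pattern: one mutex guards the whole map, so an insert (and any rehash it triggers) can never overlap with a lookup on another thread. `Root`, `FetchedBlocks`, and `run_guarded` are hypothetical names invented for this sketch; they are not the project's Zig types. Safe Rust refuses to compile the unsynchronized version, whereas in Zig the lock has to be added by hand.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a 32-byte block root.
type Root = [u8; 32];

/// Hypothetical stand-in for the network layer's fetched-block cache.
#[derive(Default)]
struct FetchedBlocks {
    blocks: HashMap<Root, Vec<u8>>,
}

/// Builds a distinct root from a counter (illustration only).
fn root_for(i: u32) -> Root {
    let mut root = [0u8; 32];
    root[..4].copy_from_slice(&i.to_le_bytes());
    root
}

/// A worker thread inserts `count` entries while the calling thread performs
/// lookups. Returns the number of cached entries once the worker is done.
fn run_guarded(count: u32) -> usize {
    let cache = Arc::new(Mutex::new(FetchedBlocks::default()));

    let worker_cache = Arc::clone(&cache);
    let worker = thread::spawn(move || {
        for i in 0..count {
            // The insert, including any rehash, happens under the lock.
            worker_cache
                .lock()
                .unwrap()
                .blocks
                .insert(root_for(i), vec![0u8; 8]);
        }
    });

    // "Main thread" lookups take the same lock, so they never observe a
    // table that is in the middle of a rehash.
    for i in 0..count {
        let _hit = cache.lock().unwrap().blocks.contains_key(&root_for(i));
    }

    worker.join().unwrap();
    let len = cache.lock().unwrap().blocks.len();
    len
}
```

Calling `run_guarded(1000)` returns `1000`: every insert lands, and no lookup runs concurrently with a resize. A single coarse lock trades some parallelism for simplicity, which is usually acceptable when the guarded sections are short.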
Changes
ssz-corruption fix (cherry-picked from `fix/libp2p-glue-c-abi-x86_64`)

- 1aecb39 chain, database: persist pre-forkchoice signed block SSZ for rocksdb
- 1fd70ea node, types: capture block SSZ bytes at cache time, never re-serialize cached blocks

Highlights:

- `types/utils`: new `sszCloneAndGetBytes`, one serialize pass returning both a deep clone and the SSZ bytes (caller owns), avoiding a second serialize on the same value.
- `network`: new `fetched_block_ssz` map (root → `[]u8`) with lifecycle tied to `cacheFetchedBlock` / `removeFetchedBlock` / `deinit`; `getFetchedBlockSsz` / `storeFetchedBlockSsz` helpers.
- `node.cacheBlockAndFetchParent`: uses `sszCloneAndGetBytes` and stores the bytes alongside the cached block.
- `chain.CachedProcessedBlockInfo`: new `sszBytes: ?[]const u8` field.
- `node.processCachedDescendants`: retrieves stored bytes and passes them via `blockInfo.sszBytes`.
- `chain.onBlock` / `updateBlockDb`: uses `blockInfo.sszBytes` when present, falls back to a disposable-clone serialize only for locally produced blocks.
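The "serialize once, reuse the bytes" idea in the highlights above can be sketched in a few lines of Rust. This is an illustration under stated assumptions, not the project's Zig implementation: `SignedBlock` here is a toy struct, its encoding is a made-up length-free layout rather than real SSZ, and `clone_and_get_bytes` / `bytes_for_db` are hypothetical analogues of `sszCloneAndGetBytes` and the `onBlock` fallback. The sketch also assumes the clone is rebuilt by decoding the freshly produced bytes.

```rust
/// Hypothetical stand-in for a signed block (the real type is SSZ-encoded).
#[derive(Clone, Debug, PartialEq)]
struct SignedBlock {
    slot: u64,
    proof_data: Vec<u8>,
}

impl SignedBlock {
    /// Toy encoding (NOT SSZ): 8-byte little-endian slot, then the payload.
    fn serialize(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(8 + self.proof_data.len());
        out.extend_from_slice(&self.slot.to_le_bytes());
        out.extend_from_slice(&self.proof_data);
        out
    }

    /// Inverse of `serialize`; returns `None` if the input is too short.
    fn deserialize(bytes: &[u8]) -> Option<SignedBlock> {
        if bytes.len() < 8 {
            return None;
        }
        let mut head = [0u8; 8];
        head.copy_from_slice(&bytes[..8]);
        Some(SignedBlock {
            slot: u64::from_le_bytes(head),
            proof_data: bytes[8..].to_vec(),
        })
    }
}

/// One serialize pass: the deep clone is rebuilt from the bytes, and the same
/// bytes are handed back to the caller, who owns them from here on.
fn clone_and_get_bytes(block: &SignedBlock) -> Option<(SignedBlock, Vec<u8>)> {
    let bytes = block.serialize();
    let clone = SignedBlock::deserialize(&bytes)?;
    Some((clone, bytes))
}

/// Persistence path: prefer bytes captured at cache time; only serialize
/// (on a throwaway clone) when none were captured, e.g. for a locally
/// produced block.
fn bytes_for_db(block: &SignedBlock, cached: Option<&[u8]>) -> Vec<u8> {
    match cached {
        Some(bytes) => bytes.to_vec(),
        None => block.clone().serialize(),
    }
}
```

With this shape, the cache-time caller keeps the `(clone, bytes)` pair together, and the later persistence step passes `Some(bytes.as_slice())` so the cached block itself is never serialized again.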
Hashmap rehash race fix (cherry-picked from #741)

- 71280e3 fix race condition leading to seg faults (by @anshalshukla)

Adds a `std.Thread.Mutex` to `BeamNode` that serializes work between the libxev main thread and the libp2p worker thread. Locks are taken around:

- `onGossip`
- `onReqRespResponse`
- `onReqRespRequest` (`.blocks_by_root` branch)
- `onInterval` (chain tick, pending replay, RPC sweep, `processReadyCachedBlocks`)

Test plan
- `zig fmt --check .`
- `cargo fmt --manifest-path rust/Cargo.toml --all -- --check`
- `cargo clippy --manifest-path rust/Cargo.toml --workspace -- -D warnings`
- `zig build test --summary all`
- `zig build simtest --summary all`
- original crashes are resolved