fix(blockchain): defer main-chain events until write batch is flushed#11640
Conversation
|
@claude review |
|
Claude finished @LukaszRozmej's task in 7m 3s —— View job Code Review
SummaryThe fix is correct and addresses a real production-impact bug. The root cause is well-identified: The deferred-event approach is the right fix. Event ordering is preserved ( Undocumented improvement: FindingsCritical: 0 | High: 0 | Medium: 0 | Low: 4 Low findings (inline comments posted on specific lines):
Verdict: Mergeable. All Low findings are stylistic or minor. The core fix is sound, the test is well-structured and correctly validates the pre-fix failure, and the existing test suite (245/245 + 38/38) continues to pass. |
| pendingEvents.Add(deferred); | ||
| } | ||
|
|
||
| TryUpdateSyncPivot(); |
There was a problem hiding this comment.
Low – TryUpdateSyncPivot() placement inside the using block
In the original code, TryUpdateSyncPivot() was called while the batch was alive (using-declaration scope = entire method). Moving it inside the scoped using (...) block preserves that, so there's no behavioral change here.
However, TryUpdateSyncPivot() calls FindHeader (reads) and writes to _metadataDb, which is independent of the _chainLevelInfoRepository write batch. Logically it belongs after the batch is flushed (alongside the events), since it makes decisions based on the now-canonical head. Consider moving it after the closing brace of the using block and before the event loop — this makes the grouping more explicit: "write phase, then read/notify phase."
| OnUpdateMainChain?.Invoke(this, new OnUpdateMainChainArgs(blocks, wereProcessed)); | ||
| } | ||
|
|
||
| private readonly struct DeferredMainChainEvent(BlockReplacementEventArgs blockAdded, BlockEventArgs? newHead) |
There was a problem hiding this comment.
Low – minor allocation on the block-processing hot path
new(blocks.Count) allocates a List<DeferredMainChainEvent> on every UpdateMainChain call. For the common PoS single-block case this is one small list object. For batch sync the list is proportional to blocks.Count, so it's proportional to the surrounding DB work and unlikely to matter in practice. Worth noting in case allocation profiling surfaces it later.
For the single-block path (blocks.Count == 1) you could skip the list entirely and use a stack-allocated local, but that optimization may not be worth the complexity given the surrounding allocation budget.
| // Mutates Head and writes the head hash without raising NewHeadBlock. | ||
| // UpdateMainChain raises the event itself after the ChainLevelInfoRepository write batch | ||
| // has been disposed (and therefore flushed) so subscribers always observe committed state. | ||
| private BlockEventArgs SetHeadBlock(Block block) |
There was a problem hiding this comment.
Low – naming: Set verb combined with a non-trivial return value is slightly surprising
SetHeadBlock both mutates Head / writes to _blockInfoDb and returns BlockEventArgs. The Set prefix usually implies a pure write with no return value. A name like PrepareHeadUpdate or MutateHeadAndCreateArgs would make the dual responsibility more obvious. Minor readability nit — the inline comment immediately above does help, so this is optional.
| onUpdateDbObserved = level?.HasBlockOnMainChain == true; | ||
| }; | ||
|
|
||
| blockTree.UpdateMainChain(new[] { block1, block2 }, wereProcessed: true); |
There was a problem hiding this comment.
Low – collection expression style
| blockTree.UpdateMainChain(new[] { block1, block2 }, wereProcessed: true); | |
| blockTree.UpdateMainChain([block1, block2], wereProcessed: true); |
Per the C# 14 / repo coding style, prefer collection expressions [...] over new[] { ... } for array literals.
| // observe the underlying RocksDB write batch as committed. Otherwise subscribers reading | ||
| // ChainLevelInfo through a path that bypasses _blockInfoCache (cache eviction, a separate | ||
| // repository instance, or a process restart in between) would see stale state. | ||
| List<DeferredMainChainEvent> pendingEvents = new(blocks.Count); |
| // Events are buffered and raised after the batch is disposed so that subscribers always | ||
| // observe the underlying RocksDB write batch as committed. Otherwise subscribers reading | ||
| // ChainLevelInfo through a path that bypasses _blockInfoCache (cache eviction, a separate | ||
| // repository instance, or a process restart in between) would see stale state. |
| private readonly struct DeferredMainChainEvent(BlockReplacementEventArgs blockAdded, BlockEventArgs? newHead) | ||
| { | ||
| public BlockReplacementEventArgs BlockAdded { get; } = blockAdded; | ||
| public BlockEventArgs? NewHead { get; } = newHead; | ||
| } |
There was a problem hiding this comment.
| private readonly struct DeferredMainChainEvent(BlockReplacementEventArgs blockAdded, BlockEventArgs? newHead) | |
| { | |
| public BlockReplacementEventArgs BlockAdded { get; } = blockAdded; | |
| public BlockEventArgs? NewHead { get; } = newHead; | |
| } | |
| private readonly record struct DeferredMainChainEvent(BlockReplacementEventArgs BlockAdded, BlockEventArgs? NewHead); |
| // Batch disposed above — write batch is now flushed to the underlying IDb. | ||
| // Raise the per-block events in their original relative order: NewHeadBlock first | ||
| // (for the block whose UpdateHeadBlock path ran), then BlockAddedToMain. | ||
| for (int i = 0; i < pendingEvents.Count; i++) |
| // Batch disposed above — write batch is now flushed to the underlying IDb. | ||
| // Raise the per-block events in their original relative order: NewHeadBlock first | ||
| // (for the block whose UpdateHeadBlock path ran), then BlockAddedToMain. |
`UpdateMainChain` opens a `ChainLevelInfoRepository` write batch and holds it open across `MoveToMain` for every block. With the previous ordering, `BlockAddedToMain`, `NewHeadBlock`, and `OnUpdateMainChain` all fired while the batch was still in scope. `PersistLevel` writes to the in-memory cache and to the batch simultaneously, so the bug was masked for any subscriber that happened to hit the cache. Once the cache evicts (CacheSize=64), a fresh repository reads the level directly from the underlying IDb, or a separate process restarts in between, subscribers observed stale `HasBlockOnMainChain` markers. Buffer the per-block `BlockAddedToMain` / `NewHeadBlock` event args inside `MoveToMain`, dispose the batch (which flushes the underlying write batch), then drain the buffered events and finally raise `OnUpdateMainChain`. The relative per-block order (`NewHeadBlock` then `BlockAddedToMain`) is preserved. `UpdateHeadBlock(Block)` is split into a pure state-mutating `SetHeadBlock` that returns the event args, plus a wrapper that keeps existing callers (`DeleteChainSlice`) unchanged. Regression test (`BlockTreeTests`) registers handlers that read each level through a fresh `ChainLevelInfoRepository` instance (empty cache) and asserts `HasBlockOnMainChain` is `true`. Pre-fix, this read hits the underlying `InMemoryWriteBatch`-backed db before the batch has flushed and fails for both `BlockAddedToMain` and `NewHeadBlock`. Fixes NethermindEth#11616
- use ArrayPoolListRef<DeferredMainChainEvent> to avoid List allocation on the hot path - convert DeferredMainChainEvent to a record struct - iterate the pending events over the underlying span - move TryUpdateSyncPivot() after the batch is disposed (write phase / notify phase) - drop redundant explanatory comments - use collection expression in the new BlockTreeTests regression test
577d27b to
767da5f
Compare
|
@claude review |
|
Claude finished @LukaszRozmej's task in 3m 10s —— View job Code Review
SummaryThis is a re-review of the refactoring commit (
Event ordering and semantics are preserved:
Critical: 0 | High: 0 | Medium: 0 | Low: 1 Low: stray double blank line between Verdict: Mergeable. The fix is sound, all prior feedback has been applied, and the only remaining item is a trivial whitespace nit. |
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
CS1002 build error from accidental duplicate method header.
Fixes #11616
Changes
BlockTree.UpdateMainChainopened aChainLevelInfoRepositorywrite batch and held it acrossMoveToMainfor every block in the slice.BlockAddedToMain,NewHeadBlock, andOnUpdateMainChainall fired while the batch was still in scope.PersistLevelwrites to_blockInfoCacheand the batch simultaneously, so the bug was masked for any subscriber hitting the cache; once the cache evicts (CacheSize = 64), a fresh repository reads through to the underlyingIDb, or a process restarts mid-flow, subscribers observe staleHasBlockOnMainChainmarkers.InMemoryWriteBatch/RocksDbWriteBatchonly commit onDispose().MoveToMainnow returns a bufferedDeferredMainChainEvent(BlockReplacementEventArgs+ optionalBlockEventArgsfrom the head-update path) instead of raising the events directly.UpdateHeadBlock(Block)is split into a pure state-mutatingSetHeadBlock(writes_blockInfoDb[HeadAddressInDb], returns the event args) and a thin wrapper that firesNewHeadBlock. Existing callers (DeleteChainSlice) are untouched and continue to fire the event after their own batch closes.UpdateMainChainwraps the batch in an innerusingblock, collects deferred events from eachMoveToMaincall, disposes the batch (flushing the underlying write batch), then raises the buffered events in their original relative order (per block:NewHeadBlock→BlockAddedToMain), and finallyOnUpdateMainChain.UpdateMainChain_fires_main_chain_events_after_chain_level_repository_batch_flushedinBlockTreeTests. Each handler constructs a freshChainLevelInfoRepositoryover the same underlying_blocksInfosDb(cache empty) and assertsLoadLevel(...).HasBlockOnMainChain == true. Pre-fix this fails for both per-block events becauseInMemoryWriteBatchhas not yet flushed.Types of changes
What types of changes does your code introduce?
Testing
Requires testing
If yes, did you write tests?
Notes on testing
dotnet test --filter FullyQualifiedName~BlockTreeTests— 245/245 inNethermind.Blockchain.Test, 38/38 inNethermind.Merge.Plugin.Test. The new regression test fails pre-fix and passes post-fix. Existing notification-ordering tests (Shall_notify_new_head_block_once_and_block_added_to_main_multiple_times...,BlockAddedToMain_should_have_updated_Head) still pass — relative event order andHeadvisibility from subscribers are preserved.Documentation
Requires documentation update
Requires explanation in Release Notes