fix: skip AuRa finalization startup walk on post-merge chains#11306
Conversation
AuRaBlockFinalizationManager ran a catch-up walk from head in its constructor, allocating a List<BlockHeader> that grew to 10M+ entries on post-merge Gnosis after upgrading to 1.37.0. The walk is meaningless post-merge (finalization is handled by the consensus layer) but previously still executed because the inner manager had no way to know it was wrapped by AuRaMergeFinalizationManager. Move Initialize out of the constructor into SetMainBlockBranchProcessor with an idempotency guard, and add an InitializeBlockchainAuRaMerge override that checks IPoSSwitcher and skips the wiring call after the terminal block has been reached. Pre-merge and pure-AuRa chains are unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @LukaszRozmej's task in 10m 20s —— View job PR Review
SummaryThe fix correctly addresses the root cause: The approach is sound: defer the walk to FindingsCritical: 0 | High: 0 | Medium: 0 | Low: 3 Low —
|
| Initialize(); | ||
| } | ||
|
|
||
| public void SetMainBlockBranchProcessor(IBranchProcessor branchProcessor) |
There was a problem hiding this comment.
Low — LastFinalizedBlockLevel window is 0 until SetMainBlockBranchProcessor fires
Before this PR, Initialize() ran in the constructor, so LastFinalizedBlockLevel was correct by the time the DI container resolved IAuRaValidator (during base.InitBlockchain()). After this PR, the field stays at 0 until SetMainBlockBranchProcessor is called — which is after base.InitBlockchain() creates MultiValidator.
MultiValidator.SetFinalizationManager reads _blockFinalizationManager.LastFinalizedBlockLevel to call InitCurrentValidator, so it momentarily picks the wrong (genesis) validator when a non-immediately-changing validator type (e.g. Contract) has already been activated.
In practice there is no production impact: StartBlockProcessor depends on InitializeBlockchain completing, and WireFinalizationBranchProcessor (which calls SetMainBlockBranchProcessor → Initialize()) runs before the step returns. So the state is corrected before any block is ever processed. MultiValidator.OnBlockProcessingStart also re-evaluates on the first non-consecutive block regardless.
Worth being aware of if anyone adds code that reads LastFinalizedBlockLevel during the DI resolution window inside InitBlockchain.
| _branchProcessor = branchProcessor; | ||
| _branchProcessor.BlockProcessed += OnBlockProcessed; | ||
| _branchProcessor.BlocksProcessing += OnBlocksProcessing; | ||
| if (!_initialized) |
There was a problem hiding this comment.
Low — silent drop when called with a different processor after initialization
The guard makes a second call with the same processor idempotent (documented by the new test). But if a caller ever passes a different IBranchProcessor instance, it is silently dropped with no diagnostic — the new processor is never subscribed and the existing one stays wired. The current call-graph makes this impossible today (only one call-site, and the merge-plugin guard prevents it from reaching here twice), but the silent failure mode is easy to introduce accidentally.
Consider either throwing InvalidOperationException when called with a different instance, or adding an ArgumentException guard:
if (_initialized)
{
if (!ReferenceEquals(_branchProcessor, branchProcessor))
throw new InvalidOperationException("SetMainBlockBranchProcessor called with a different processor after initialization.");
return;
}|
|
||
| public void SetMainBlockBranchProcessor(IBranchProcessor branchProcessor) => | ||
| _auRaBlockFinalizationManager.SetMainBlockBranchProcessor(branchProcessor); | ||
| public void SetMainBlockBranchProcessor(IBranchProcessor branchProcessor) |
There was a problem hiding this comment.
Observation — this guard is not exercised in the current call flow
SetMainBlockBranchProcessor is called from InitializeBlockchainAuRa.WireFinalizationBranchProcessor() (at the end of InitBlockchain), which operates on api.FinalizationManager when it is still the raw AuRaBlockFinalizationManager. The AuRaMergeFinalizationManager wrapper is only installed later, in MergePlugin.InitNetworkProtocol (which runs inside InitializeNetwork, a step that depends on InitializeBlockchain completing).
So this guard can't be hit by the normal startup path — it's belt-and-suspenders in case something calls SetMainBlockBranchProcessor through the wrapper after startup. That's a reasonable safety net; just noting the actual code path for clarity.
…t processor Guard against a subtle silent-drop if a future caller ever passes a different IBranchProcessor instance after initialization. Also drops a redundant blank line between SetMainBlockBranchProcessor and Initialize. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks for the review. Addressed in a2f1397:
|
|
@claude re-review |
|
Claude finished @LukaszRozmej's task in 1m 56s —— View job Re-review
All three low findings from the initial review have been addressed. Verification: Blank line nit — Fixed. Silent drop on different processor — Fixed correctly. Lines 41–43 do a
No new issues found. The three-layer defense (skip in Critical: 0 | High: 0 | Medium: 0 | Verdict: Mergeable. |
…-rc2 backport) (#11307) fix: skip AuRa finalization startup walk on post-merge chains AuRaBlockFinalizationManager ran a catch-up walk from head in its constructor, allocating a List<BlockHeader> that grew to 10M+ entries on post-merge Gnosis after upgrading to 1.37.0. The walk is meaningless post-merge (finalization is handled by the consensus layer) but previously still executed because the inner manager had no way to know it was wrapped by AuRaMergeFinalizationManager. Move Initialize out of the constructor into SetMainBlockBranchProcessor with an idempotency guard, and add an InitializeBlockchainAuRaMerge override that checks IPoSSwitcher and skips the wiring call after the terminal block has been reached. Pre-merge and pure-AuRa chains are unaffected. Backport of #11306 (master) to release/1.37.0-rc2. The release branch is pre-#11062 so the fix was re-implemented by hand for the older code structure. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Deferring Initialize out of the ctor left LastFinalizedBlockLevel at the C# long default (0) during DI resolution. MultiValidator.SetFinalizationManager reads that value and passes it to InitCurrentValidator to pick the active validator — selecting a different validator at level 0 vs -1 caused divergent block rewards on archive sync once a contract-based transition was crossed, reproduced as an InvalidStateRoot at block 1301 on Gnosis. Split Initialize in two: the backward ChainLevelInfo scan that sets LastFinalizedBlockLevel runs eagerly via a static field initializer (reads metadata only, no BlockHeader allocations, bounded on post-merge chains by the first finalized ancestor), while the FinalizeBlocks(head) catch-up walk stays in SetMainBlockBranchProcessor so post-merge chains still skip it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
IPoSSwitcher.HasEverReachedTerminalBlock() is true on a fresh archive DB as soon as Merge.FinalTotalDifficulty is set in config (see PoSSwitcher.Initialize), even with head at genesis. On Gnosis archive that caused AuRaMergeFinalizationManager to dispose the inner manager in its ctor and InitializeBlockchainAuRaMerge to skip wiring — so the validator-set transition at block 1300 never fired and block 1301 was rejected with InvalidStateRoot. Replace the three HasEverReachedTerminalBlock() checks (ctor, wrapper SetMainBlockBranchProcessor, InitializeBlockchainAuRaMerge) with a head-based IsPostMerge(head) probe via new IsHeadPostMerge extension. Pre-merge heads (archive sync from genesis, or restart mid-sync before the merge) now wire correctly. The original post-merge startup-walk optimization is preserved. Inject IBlockTree into AuRaMergeFinalizationManager for the ctor and method-level checks. Regression test: Fresh_archive_with_FinalTotalDifficulty_in_config_still_wires_pre_merge_finalization covers the Gnosis-archive case (HasEverReachedTerminalBlock=true, head=genesis, IsPostMerge(genesis)=false). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@claude re-review |
|
Claude finished @LukaszRozmej's task in 3m 37s —— View job Re-review
Two new commits since the prior re-review. Both address regressions discovered in testing against the Gnosis archive case; both are clean. 4f9ddef — eager
|
| Scenario | HasEverReachedTerminalBlock() |
IsHeadPostMerge() |
Correct? |
|---|---|---|---|
| Fresh archive, head=genesis | true (FTD in config) |
false (genesis TD < TTD) |
✅ now correct |
| Running node, head post-merge | true |
true |
✅ unchanged |
| Mid-sync restart, head pre-merge | true or false |
false |
✅ now correct |
| Null head (empty DB) | N/A | false (null guard) |
✅ correct |
The IsPostMerge(header) path for genesis: TotalDifficulty < TTD → isPostMerge = false. Correct.
Regression test Fresh_archive_with_FinalTotalDifficulty_in_config_still_wires_pre_merge_finalization directly covers the Gnosis-archive scenario with HasEverReachedTerminalBlock=true and genesis head. The SetHead helper cleanly separates mock setup from test logic.
No new issues found. The fix is layered correctly across all three guard sites.
Critical: 0 | High: 0 | Medium: 0 | Verdict: Mergeable. ✅
Consolidate the head-based post-merge check into a single PoSSwitcherExtensions.IsHeadPostMerge(IBlockTree) extension so the two callers (AuRaMergeFinalizationManager.SetMainBlockBranchProcessor and InitializeBlockchainAuRaMerge.WireFinalizationBranchProcessor) share one implementation and intent. Matches the refactor applied to the master-targeting PR (#11306). Also trims the verbose inline comments — the rationale for not using HasEverReachedTerminalBlock() lives on the helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Consolidate the head-based post-merge check into a single PoSSwitcherExtensions.IsHeadPostMerge(IBlockTree) extension so the two callers (AuRaMergeFinalizationManager.SetMainBlockBranchProcessor and InitializeBlockchainAuRaMerge.WireFinalizationBranchProcessor) share one implementation and intent. Matches the refactor applied to the master-targeting PR (#11306). Also trims the verbose inline comments — the rationale for not using HasEverReachedTerminalBlock() lives on the helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…11313) * fix: guard AuRa finalization wiring on head state, not config flag baf6fc1 skipped AuRa finalization wiring whenever IPoSSwitcher.HasEverReachedTerminalBlock() returned true. That flag is true on a completely fresh archive DB as soon as Merge.FinalTotalDifficulty is set in config (gnosis_archive.json), even with head at genesis — see PoSSwitcher.Initialize lines 82-83. On Gnosis archive that caused SetMainBlockBranchProcessor to never be called, so the inner AuRaBlockFinalizationManager never subscribed to BlockProcessed / BlocksProcessing and never fired BlocksFinalized. The MultiValidator switch to the safeContract validator at block 1300 never happened, ContractBasedValidator.FinalizeChange was not invoked for block 1301, and the node rejected block 1301 with InvalidStateRoot. Use the current head's post-merge status instead. Pre-merge heads (fresh archive from genesis, or a restart mid-sync before the merge) now wire AuRa finalization correctly. The original optimization — skipping the expensive startup walk when resuming on a post-merge head — is preserved. Also inject IBlockTree into AuRaMergeFinalizationManager so its (currently unreachable in production, but covered by tests) SetMainBlockBranchProcessor guard uses the same head-based check. Regression tests: - AuRaMergeFinalizationManagerTests: covers null head, pre-merge head, post-merge head, and the archive case where HasEverReachedTerminalBlock is true but head is still genesis. - AuRaBlockFinalizationManagerTests: idempotency + different-processor guard, mirroring the master-branch coverage missing from 1.37.0-rc2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: extract IsHeadPostMerge helper and trim guard comments Consolidate the head-based post-merge check into a single PoSSwitcherExtensions.IsHeadPostMerge(IBlockTree) extension so the two callers (AuRaMergeFinalizationManager.SetMainBlockBranchProcessor and InitializeBlockchainAuRaMerge.WireFinalizationBranchProcessor) share one implementation and intent. Matches the refactor applied to the master-targeting PR (#11306). Also trims the verbose inline comments — the rationale for not using HasEverReachedTerminalBlock() lives on the helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: lukasz.rozmej <lukasz@nethermind.io>
* fix: skip AuRa finalization startup walk on post-merge chains AuRaBlockFinalizationManager ran a catch-up walk from head in its constructor, allocating a List<BlockHeader> that grew to 10M+ entries on post-merge Gnosis after upgrading to 1.37.0. The walk is meaningless post-merge (finalization is handled by the consensus layer) but previously still executed because the inner manager had no way to know it was wrapped by AuRaMergeFinalizationManager. Move Initialize out of the constructor into SetMainBlockBranchProcessor with an idempotency guard, and add an InitializeBlockchainAuRaMerge override that checks IPoSSwitcher and skips the wiring call after the terminal block has been reached. Pre-merge and pure-AuRa chains are unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * address PR review: throw on SetMainBlockBranchProcessor with different processor Guard against a subtle silent-drop if a future caller ever passes a different IBranchProcessor instance after initialization. Also drops a redundant blank line between SetMainBlockBranchProcessor and Initialize. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: eager LastFinalizedBlockLevel load to unbreak archive sync Deferring Initialize out of the ctor left LastFinalizedBlockLevel at the C# long default (0) during DI resolution. MultiValidator.SetFinalizationManager reads that value and passes it to InitCurrentValidator to pick the active validator — selecting a different validator at level 0 vs -1 caused divergent block rewards on archive sync once a contract-based transition was crossed, reproduced as an InvalidStateRoot at block 1301 on Gnosis. Split Initialize in two: the backward ChainLevelInfo scan that sets LastFinalizedBlockLevel runs eagerly via a static field initializer (reads metadata only, no BlockHeader allocations, bounded on post-merge chains by the first finalized ancestor), while the FinalizeBlocks(head) catch-up walk stays in SetMainBlockBranchProcessor so post-merge chains still skip it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: guard AuRa finalization wiring on head state, not config flag IPoSSwitcher.HasEverReachedTerminalBlock() is true on a fresh archive DB as soon as Merge.FinalTotalDifficulty is set in config (see PoSSwitcher.Initialize), even with head at genesis. On Gnosis archive that caused AuRaMergeFinalizationManager to dispose the inner manager in its ctor and InitializeBlockchainAuRaMerge to skip wiring — so the validator-set transition at block 1300 never fired and block 1301 was rejected with InvalidStateRoot. Replace the three HasEverReachedTerminalBlock() checks (ctor, wrapper SetMainBlockBranchProcessor, InitializeBlockchainAuRaMerge) with a head-based IsPostMerge(head) probe via new IsHeadPostMerge extension. Pre-merge heads (archive sync from genesis, or restart mid-sync before the merge) now wire correctly. The original post-merge startup-walk optimization is preserved. Inject IBlockTree into AuRaMergeFinalizationManager for the ctor and method-level checks. Regression test: Fresh_archive_with_FinalTotalDifficulty_in_config_still_wires_pre_merge_finalization covers the Gnosis-archive case (HasEverReachedTerminalBlock=true, head=genesis, IsPostMerge(genesis)=false). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
AuRaBlockFinalizationManagerwas still doing its startup catch-up walk, accumulating every un-finalized ancestor into a localList<BlockHeader>. Heap dumps from affected nodes showed the backing array at 16,777,216 entries (~10GB retained).SetMainBlockBranchProcessorwith an idempotency guard, and addedInitializeBlockchainAuRaMergethat skips wiring the branch processor onceIPoSSwitcher.HasEverReachedTerminalBlock()is true.AuRaMergeFinalizationManager.SetMainBlockBranchProcessoralso skips forwarding post-merge as a belt-and-suspenders guard.Test plan
Nethermind.AuRa.Test.AuRaBlockFinalizationManagerTests— 29/29 pass, including newrepeated_SetMainBlockBranchProcessor_is_idempotent.Nethermind.Merge.AuRa.Test.AuRaMergeFinalizationManagerTests— 5/5 pass, including newSetMainBlockBranchProcessor_forwards_only_pre_mergeTestCase.🤖 Generated with Claude Code