fix: blocks reexecutor panic recovery and shutdown error suppression#4531
fix: blocks reexecutor panic recovery and shutdown error suppression#4531
Conversation
- Add handleContextOrFatal to suppress context errors during shutdown - Wrap AdvanceStateByBlock with recover() to convert panics from concurrent trie access races into errors, preventing database corruption from abnormal process termination - Add unit tests for handleContextOrFatal behavior Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #4531 +/- ##
==========================================
+ Coverage 32.68% 34.41% +1.73%
==========================================
Files 497 497
Lines 58890 58897 +7
==========================================
+ Hits 19247 20272 +1025
+ Misses 36272 35009 -1263
- Partials 3371 3616 +245 |
|
Is this solving a known issue, we or someone we know has faced? If it isn't would you be fine in I got #4528 in first then I reviewed this. I am just asking because it would take me some time to properly review this, and unless I properly reviewed this I am not sure if this change makes sense. If this PR was fully driven by claude (not sure if it was) I would need to review these changes a lot more thoroughly I think |
❌ 8 Tests Failed:
View the top 3 failed tests by shortest run time
📣 Thoughts on this report? Let Codecov know! | Powered by Codecov |
| if r := recover(); r != nil { | ||
| log.Error("panic during block re-execution", "block", blockToRecreate, "recover", r, "stack", string(debug.Stack())) | ||
| state = nil | ||
| err = fmt.Errorf("panic during block re-execution at block %d: %v", blockToRecreate, r) | ||
| } | ||
| }() | ||
| state, block, receipts, err = arbitrum.AdvanceStateByBlock(ctx, s.blockchain, state, blockToRecreate, prevHash, nil, vmConfig) | ||
| }() |
There was a problem hiding this comment.
This is super cool, can we write a test for this? maybe have something inside AdvanceStateByBlock be nil and calling that would cause a panic and test that advanceStateUpToBlock would recover and return the expected error
# Conflicts: # blocks_reexecutor/blocks_reexecutor.go
…races - Wrap AdvanceStateByBlock in recover() to convert panics from concurrent trie-cache eviction races into errors, preventing abnormal process termination - Add unit tests for reportFatalErr (basic, channel-full, multiple error types) and panic recovery in advanceStateUpToBlock Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rReExecution tests - Restructure error check in LaunchBlocksReExecution goroutine so the success log only emits when err is nil, not when ctx is cancelled - Break Start's block-range loop on ctx cancellation, not just fatalReported - Use exhaustive struct initialization in tests to satisfy custom linter - Add unit tests for all three WaitForReExecution select branches Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Config.Validate, Impl, and wrapFatalErr unit tests for blocks reexecutor. Cover all Validate branches (mode, blocks JSON, block ranges, room), Impl early-exit paths (fatalReported pre-set, no work), WaitForReExecution with both channels ready, and wrapFatalErr error wrapping. Also log suppressed errors in goroutineErrorf so they are visible in verbose test output when a background goroutine's error is suppressed because the context is already cancelled. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CI linter requires all struct fields to be specified in composite literals. Add the 5 missing zero-value fields (CommitStateToDisk, MinBlocksPerThread, TrieCleanLimit, ValidateMultiGas, blocks) to all Config literals in the test file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
concurrent trie access races into errors, preventing database
corruption from abnormal process termination
Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com