Automatic child lifecycle management in StopWaiter#4536
Automatic child lifecycle management in StopWaiter#4536pmikolajczyk41 merged 50 commits intomasterfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #4536 +/- ##
==========================================
+ Coverage 33.99% 34.57% +0.58%
==========================================
Files 498 498
Lines 58994 58977 -17
==========================================
+ Hits 20053 20392 +339
+ Misses 35314 34965 -349
+ Partials 3627 3620 -7 |
❌ 12 Tests Failed:
View the top 3 failed tests by shortest run time
📣 Thoughts on this report? Let Codecov know! | Powered by Codecov |
eljobe
left a comment
There was a problem hiding this comment.
Automated PR Review (Claude pr-review-toolkit)
Note: These findings were generated by Claude's pr-review-toolkit agents and have not been checked for accuracy by Pepper. Please evaluate each comment on its merits.
Overview
Excellent infrastructure PR. The TrackChild/StartAndTrackChild mechanism is well-designed with proper mutual exclusion, atomic ChildrenTaken guarding, and thorough test coverage. It directly fixes several real bugs (leaked children in AddressFilter, BidValidator, ExecutionEngine, BroadcastClients, ValidationServer). All 15+ migration sites were verified for correct shutdown ordering.
Strengths
- Directly fixes real child-leak bugs across multiple components
- Clean LIFO mechanism with
takeChildren()+ChildrenTakenflag - Late-tracking safety net (immediate stop after shutdown) handles dynamic
startBoldSpawner - Excellent test suite covering LIFO order, concurrency, grandchild hierarchy
- All
ValidatorWalletimplementations updated withStopOnly() - Careful handling of exceptions (BroadcastClients secondary, MultiProtocolStaker wallet)
Summary
| Severity | Count |
|---|---|
| Critical | 1 |
| Important | 5 |
| Suggestions | 5 |
| Test gaps | 3 |
See inline comments for details.
|
Approved, but, it would be good to understand why the default-B tests are failing. |
just flakiness, rerun helped |
StopWaiter API: TrackChild / StartAndTrackChild
Added two methods to
StopWaiterfor automatic child lifecycle management:TrackChild(child)— registers a child for automatic shutdown in LIFO orderStartAndTrackChild(child)— starts a child with the parent's managed context and tracks itTracked children are automatically stopped during
StopOnlyand waited on duringStopAndWait. Children are taken atomically (guarded byChildrenTakenflag) to prevent gaps between take and stop.TrackChildafter shutdown immediately stops the child. Adopted across 15+ call sites, removing most customStopAndWaitoverrides.Bug fixes discovered during migration
AddressFilter.FilterService: childaddressCheckerwas started but never stopped — now tracked viaStartAndTrackChildBidValidator: childproducerwas started but never stopped — now tracked viaStartAndTrackChildExecutionEngine: childtransactionFiltererRPCClientwas started (externally) but only stopped if non-nil in a manual override — now tracked viaTrackChild, which handles nil-safety implicitlyExpressLaneService:expressLaneTrackerhad unclear ownership — started byExecutionNodebut stopped byExpressLaneService. Now consistently owned (started and tracked) byExecutionNodeBroadcastClients: primary clients and routers were started with the parent's context but never explicitly tracked for shutdown — now tracked viaTrackChildValidationServer(redis consumer):boldSpawnerwas started dynamically from a background goroutine but could be missed during shutdown ifStopAndWaitwas called before it was created — now tracked viaStartAndTrackChild, and late-tracked children are immediately stopped via theChildrenTakensafety net