fix(db): wait for in-flight timer callback in DbMetricsUpdater.Dispose#11270
Conversation
The previous Dispose() called the parameterless Timer.Dispose(), which does not wait for a callback that is already running. Because DbOnTheRocks.Dispose() disposes metrics updaters before the underlying RocksDb handle is released, an in-flight UpdateMetrics could touch db.GetProperty / dbOptions.GetStatisticsString after the native handle was freed. Switch to Timer.Dispose(WaitHandle) + WaitOne so the method returns only once any pending callback has completed. Also atomically null out _timer so Dispose is idempotent. Removes a stale comment claiming stats are not extracted (they are, by ProcessStatisticsString on the line above). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @asdacap's task in 2m 21s —— View job Code Review
SummaryThe fix correctly addresses a real use-after-free race: the old Findings
Medium — Unbounded
What looks good
|
Unbounded WaitOne() on the shutdown path could hang forever if UpdateMetrics is stuck inside a native RocksDB call (e.g. blocked on an internal RocksDB mutex during compaction), forcing a SIGKILL. Use a 5s timeout and log a warning if it fires — keeps shutdown responsive at the cost of giving up the use-after-free guard in the pathological hang case, which is the correct tradeoff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @asdacap's task in 2m 43s —— View job Code Review
SummaryThe two-commit PR correctly closes a real use-after-free race. The previous Medium finding (unbounded Disposal ordering confirmed: Findings
Low — Potential When Mitigating factors:
For a future hardening pass the WaitHandle could be left undisposed in the timeout branch (letting it be GC'd after the timer eventually releases it), but that's a minor improvement on an already-narrow path and is not blocking. What looks good
Verdict: mergeable. No Critical, High, or Medium findings remain. |
|
Is up to 5s freeze acceptable on shutdown? Sounds like too long. |
|
(This is Claude replying.) The 5 s is a worst-case upper bound, not a typical wait. In practice The bound is necessary to avoid a use-after-free: without waiting, the callback could still be running inside a native RocksDB call when If 5 s feels too long, 1 s is a reasonable tightening — it still covers any realistic stat-read latency and only triggers in a clearly broken/hung state where the node is already in trouble. Happy to reduce it if you prefer. |
I would go to 1s as things like docker might just kill the process if waiting is too long |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
#11270) * fix(db): wait for in-flight timer callback in DbMetricsUpdater.Dispose The previous Dispose() called the parameterless Timer.Dispose(), which does not wait for a callback that is already running. Because DbOnTheRocks.Dispose() disposes metrics updaters before the underlying RocksDb handle is released, an in-flight UpdateMetrics could touch db.GetProperty / dbOptions.GetStatisticsString after the native handle was freed. Switch to Timer.Dispose(WaitHandle) + WaitOne so the method returns only once any pending callback has completed. Also atomically null out _timer so Dispose is idempotent. Removes a stale comment claiming stats are not extracted (they are, by ProcessStatisticsString on the line above). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(db): bound DbMetricsUpdater dispose wait to 5s with warn log Unbounded WaitOne() on the shutdown path could hang forever if UpdateMetrics is stuck inside a native RocksDB call (e.g. blocked on an internal RocksDB mutex during compaction), forcing a SIGKILL. Use a 5s timeout and log a warning if it fires — keeps shutdown responsive at the cost of giving up the use-after-free guard in the pathological hang case, which is the correct tradeoff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(db): reduce DbMetricsUpdater dispose timeout to 1s Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Changes
DbMetricsUpdater.Dispose()now usesTimer.Dispose(WaitHandle)+WaitHandle.WaitOne()so it returns only after any runningUpdateMetricscallback has completed._timeris swapped atomically to make disposal idempotent.ProcessStatisticsStringpath that claimed stats were not being extracted — the line immediately above already extracts them.Why
DbOnTheRocks.Dispose()(DbOnTheRocks.cs:1473-1492) disposes_metricsUpdatersbefore callingReleaseUnmanagedResources()which destroys the nativeRocksDbhandle. The old nakedTimer.Dispose()returned immediately without waiting for pending callbacks, so anUpdateMetricstick that was already running could continue executing and calldb.GetProperty(...)/dbOptions.GetStatisticsString()on a freed native handle. The callback's managedtry/catchdoesn't reliably contain native-side use-after-free.Statistics collection is opt-in (
EnableDbStatistics+EnableMetricsUpdater, both defaultfalse) and the default dump period is 600 s, so the window is narrow — but the race is real and cheap to close.Types of changes
Testing
Requires testing
Notes on testing
The fix is a use of the standard .NET
Timer.Dispose(WaitHandle)guarantee. Existing parsing tests inNethermind.Db.Test/DbMetricsUpdaterTests.cs(22 tests) pass unchanged. A race-reproducing unit test would need a test double fordbanddbOptionsthat is faster than the timer — deemed not worth the complexity for a framework-level guarantee.Documentation
Requires documentation update
Requires explanation in Release Notes
🤖 Generated with Claude Code