feat: add SkipMetricsTracking property to DbSettings #11515
- Introduced SkipMetricsTracking property in DbSettings to control metrics tracking for specific databases.
- Updated FullPruningInnerDbFactory to set SkipMetricsTracking to true for inner databases to prevent stale references after pruning.
- Added unit tests to verify the behavior of metrics tracking based on the new property.
- Enhanced DbMonitoringModule to respect the SkipMetricsTracking setting when adding databases to the tracker.
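The mechanism described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `SketchDbSettings` and `SketchTracker` are hypothetical stand-ins for `DbSettings` and the tracker used by `DbMonitoringModule`, and only the `SkipMetricsTracking` name comes from the PR itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins; the real Nethermind types have more members.
var tracker = new SketchTracker();

// A regular database is tracked as usual.
tracker.OnDbCreated(new SketchDbSettings("State"));

// An inner DB created for full pruning opts out, so the tracker never holds
// a reference to an instance that a later prune cycle may dispose.
tracker.OnDbCreated(new SketchDbSettings("State0") { SkipMetricsTracking = true });

Console.WriteLine(string.Join(", ", tracker.TrackedNames));

sealed record SketchDbSettings(string DbName)
{
    // Same name as the property added in the PR; defaults to false.
    public bool SkipMetricsTracking { get; init; }
}

sealed class SketchTracker
{
    private readonly List<string> _tracked = new();

    public IReadOnlyList<string> TrackedNames => _tracked;

    // Stands in for the DB factory interceptor: skip databases that opted out.
    public void OnDbCreated(SketchDbSettings settings)
    {
        if (settings.SkipMetricsTracking) return;
        _tracked.Add(settings.DbName);
    }
}
```

Keeping the flag on the settings object means the factory that creates short-lived databases decides whether they are tracked, and the tracker itself needs no knowledge of pruning.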
Claude finished @cbermudez97's task in 3m 42s. PR Review:

| Severity | Count |
|---|---|
| Critical | 0 |
| High | 0 |
| Medium | 0 |
| Low | 3 |
Low — SkipMetricsTracking lacks XML docs (DbSettings.cs:15)
The coding style guide requires <summary> on all public APIs. The semantics here are non-obvious (callers don't know what it affects), so a one-liner doc would help. Other properties in the same class are also undocumented, so this is consistent with the existing pattern.
Low — TryAdd silently drops duplicate registrations (WorldStateModule.cs:45)
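If a key is already registered, TryAdd keeps the existing entry and returns false, so a second registration under the same name is ignored without an error. This is not a production issue (singletons aren't recreated), but it is worth being aware of in test teardown/rebuild scenarios.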
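The finding above can be illustrated with a short sketch. It assumes the tracker keeps registrations in a `ConcurrentDictionary` keyed by database name; the review does not show the actual collection, so treat that as an assumption. The `TryAdd` semantics themselves are standard .NET behavior.

```csharp
using System;
using System.Collections.Concurrent;

// Assumption: registrations are keyed by database name.
var tracked = new ConcurrentDictionary<string, object>();
object firstWrapper = new object();
object rebuiltWrapper = new object();   // e.g. created after a container rebuild in a test

bool added = tracked.TryAdd("State", firstWrapper);
bool addedAgain = tracked.TryAdd("State", rebuiltWrapper);   // returns false, does not throw

bool stillFirst = ReferenceEquals(tracked["State"], firstWrapper);
Console.WriteLine($"first={added}, second={addedAgain}, stillFirst={stillFirst}");
```

In a test that rebuilds the container without clearing the tracker, the second wrapper would never be observed, which is the scenario the reviewer flags.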
Low — No end-to-end regression test for the stale-reference bug (DbTrackerTests.cs:81)
TestSkipMetricsTracking validates the mechanism, but there's no test that exercises the full scenario: create FullPruningDb → trigger pruning cycle via TryStartPruning/Commit → assert tracker still reads from the new inner DB. The unit test is sufficient to merge, but an integration test would give stronger confidence.
Verdict: ready to merge. All findings are low-severity and none block correctness.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a SkipMetricsTracking flag to DbSettings and updates DB monitoring to avoid tracking ephemeral inner DB instances created during full pruning, preventing stale references and improving resilience when metric gathering fails.
Changes:
- Add `DbSettings.SkipMetricsTracking` and honor it in `DbMonitoringModule`'s DB factory interceptor.
- Mark full-pruning inner DB instances as skipped, and register the stable `FullPruningDb` wrapper with the tracker.
- Add unit tests for skip-tracking behavior and for per-DB metric gather exception isolation.
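Why tracking the stable wrapper avoids stale references can be sketched as below. The types `SketchInnerDb` and `SketchPruningWrapper` and the method `SwapAfterPruning` are hypothetical; they only mimic the idea of a wrapper that replaces its inner database after a prune cycle and are not the `FullPruningDb` API.

```csharp
using System;

var wrapper = new SketchPruningWrapper(new SketchInnerDb("State0", 100));

// What a tracker would hold if it had registered the inner DB directly.
SketchInnerDb staleInner = wrapper.Current;

// A prune cycle replaces and disposes the old inner DB.
wrapper.SwapAfterPruning(new SketchInnerDb("State1", 40));

long viaWrapper = wrapper.GatherSize();   // reads through to the new inner DB

bool staleThrows = false;
try { staleInner.GatherSize(); }
catch (ObjectDisposedException) { staleThrows = true; }

Console.WriteLine($"viaWrapper={viaWrapper}, staleThrows={staleThrows}");

sealed class SketchInnerDb : IDisposable
{
    private readonly long _size;
    private bool _disposed;

    public SketchInnerDb(string name, long size) { Name = name; _size = size; }

    public string Name { get; }

    public long GatherSize() => _disposed ? throw new ObjectDisposedException(Name) : _size;

    public void Dispose() => _disposed = true;
}

sealed class SketchPruningWrapper
{
    public SketchPruningWrapper(SketchInnerDb initial) => Current = initial;

    public SketchInnerDb Current { get; private set; }

    // Always delegates to whichever inner DB is current.
    public long GatherSize() => Current.GatherSize();

    public void SwapAfterPruning(SketchInnerDb replacement)
    {
        SketchInnerDb old = Current;
        Current = replacement;
        old.Dispose();
    }
}
```

A tracker that holds the wrapper keeps reporting valid numbers across prune cycles, while one that holds the inner instance ends up calling into a disposed object.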
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/Nethermind/Nethermind.Init/Modules/WorldStateModule.cs | Registers the FullPruningDb wrapper in the tracker to keep metrics tied to the active inner DB across prune cycles. |
| src/Nethermind/Nethermind.Init/Modules/DbMonitoringModule.cs | Skips tracking based on SkipMetricsTracking and makes metric updates resilient to per-DB exceptions. |
| src/Nethermind/Nethermind.Db/FullPruning/FullPruningInnerDbFactory.cs | Sets SkipMetricsTracking = true for indexed inner DB instances. |
| src/Nethermind/Nethermind.Db/DbSettings.cs | Introduces the SkipMetricsTracking property. |
| src/Nethermind/Nethermind.Db.Test/Nethermind.Db.Test.csproj | Adds test dependency on Nethermind.Init to access DbMonitoringModule/tracker types. |
| src/Nethermind/Nethermind.Db.Test/DbTrackerTests.cs | Adds tests for skip-tracking and exception isolation during metrics update. |
- Add XML doc to DbSettings.SkipMetricsTracking property
- Clarify WorldStateModule comment for both FullPruningInnerDbFactory and MemDbFactory branches
- DbMonitoringModule: clear stale dictionary entries on GatherMetric failure and log only once per failure streak (with recovery info log)
- DbTrackerTests: add [TearDown] to reset shared static metrics keys, collapse double enumeration in TestSkipMetricsTracking, and add FullPruningDbTrackedWrapper_SurvivesPruningCycle integration test

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
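The "log only once per failure streak" behavior from the DbMonitoringModule item can be sketched as follows. This is an assumption-laden illustration rather than the PR's code: the `Gather` function, the in-memory `log` list, and the database names are invented for the example, and the set of failing databases only loosely mirrors the `_failingDbs` field mentioned in the commit log.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();
var failing = new HashSet<string>();            // databases currently in a failure streak
var metrics = new Dictionary<string, long>();   // last gathered value per database
int tick = 0;

// Hypothetical gatherer: "State" fails on ticks 1 and 2, then recovers.
long Gather(string db)
{
    if (db == "State" && (tick == 1 || tick == 2))
        throw new InvalidOperationException("gather failed");
    return 10 + tick;
}

void UpdateMetrics(IEnumerable<string> dbs)
{
    foreach (string db in dbs)
    {
        try
        {
            metrics[db] = Gather(db);
            // Remove returns true only if the DB was failing, so recovery is logged once.
            if (failing.Remove(db)) log.Add($"INFO  {db}: metrics recovered");
        }
        catch (Exception e)
        {
            metrics.Remove(db);   // drop the stale entry instead of reporting old numbers
            // Add returns true only for the first failure of a streak.
            if (failing.Add(db)) log.Add($"ERROR {db}: {e.Message}");
        }
    }
}

string[] all = { "State", "Code" };
for (tick = 0; tick < 4; tick++) UpdateMetrics(all);

foreach (string line in log) Console.WriteLine(line);
```

Because the try/catch sits inside the loop, a failure in one database does not stop the others from being updated, which matches the per-DB exception isolation the tests are said to cover.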
* feat: add SkipMetricsTracking property to DbSettings
  - Introduced SkipMetricsTracking property in DbSettings to control metrics tracking for specific databases.
  - Updated FullPruningInnerDbFactory to set SkipMetricsTracking to true for inner databases to prevent stale references after pruning.
  - Added unit tests to verify the behavior of metrics tracking based on the new property.
  - Enhanced DbMonitoringModule to respect the SkipMetricsTracking setting when adding databases to the tracker.
* fix: address PR feedback for db metrics tracking
  - Add XML doc to DbSettings.SkipMetricsTracking property
  - Clarify WorldStateModule comment for both FullPruningInnerDbFactory and MemDbFactory branches
  - DbMonitoringModule: clear stale dictionary entries on GatherMetric failure and log only once per failure streak (with recovery info log)
  - DbTrackerTests: add [TearDown] to reset shared static metrics keys, collapse double enumeration in TestSkipMetricsTracking, and add FullPruningDbTrackedWrapper_SurvivesPruningCycle integration test

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: dedupe DbTrackerTests container setup and metric-map iteration

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: Initialize _failingDbs with an empty HashSet

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: lukasz.rozmej <lukasz.rozmej@gmail.com>
* RLP tx decoding fixes (#11496)
* Handle null txs
* PR feedback
* Revert `BlobTxStorage.TryDecodeFullTx` changes

  Silent `false` may be worse than NRE on DB corruption

---------

Co-authored-by: Alexey Osipov <me@flcl.me>

* eth/71 (#10844)
* Improve eth/70 checks (#11456)
* Improve eth/70 verification
* Comments
* Gas remake
* Fix full sync
* fix: prevent negative RequestSize crash when beacon pivot destination advances mid-sync (#11478)
* fix: prevent negative RequestSize crash when beacon pivot destination advances mid-sync

  `HeadersSyncFeed.ShouldBuildANewBatch` checked `_lowestRequestedHeaderNumber == HeadersDestinationNumber`. For beacon headers, `HeadersDestinationNumber` is `BeaconPivot.PivotDestinationNumber`, which tracks `Head.Number - Reorganization.MaxDepth + 1` and so advances upward as the chain head progresses. When it stepped above `_lowestRequestedHeaderNumber` mid-sync, the `==` check missed it, `BuildNewBatch` produced a negative `RequestSize`, and `HeaderStore.FindReversedHeaders` crashed with `ArgumentOutOfRangeException` on `new Dictionary<>(negativeCount)`.

  Widen the guard to `<=` and add a regression test that reproduces the scenario via mocked `IBeaconPivot`.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: shorten inline comments per review

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add new default for gnosis and gnosis archive config (#11269)

  feat: add Db.SkipCheckingSstFileSizesOnDbOpen=true default for gnosis and gnosis archive
* Alchemy - Code Fix (#11714)
* feat: add SkipMetricsTracking property to DbSettings (#11515)
* feat: add SkipMetricsTracking property to DbSettings
  - Introduced SkipMetricsTracking property in DbSettings to control metrics tracking for specific databases.
  - Updated FullPruningInnerDbFactory to set SkipMetricsTracking to true for inner databases to prevent stale references after pruning.
  - Added unit tests to verify the behavior of metrics tracking based on the new property.
  - Enhanced DbMonitoringModule to respect the SkipMetricsTracking setting when adding databases to the tracker.
* fix: address PR feedback for db metrics tracking
  - Add XML doc to DbSettings.SkipMetricsTracking property
  - Clarify WorldStateModule comment for both FullPruningInnerDbFactory and MemDbFactory branches
  - DbMonitoringModule: clear stale dictionary entries on GatherMetric failure and log only once per failure streak (with recovery info log)
  - DbTrackerTests: add [TearDown] to reset shared static metrics keys, collapse double enumeration in TestSkipMetricsTracking, and add FullPruningDbTrackedWrapper_SurvivesPruningCycle integration test

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: dedupe DbTrackerTests container setup and metric-map iteration

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: Initialize _failingDbs with an empty HashSet

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: lukasz.rozmej <lukasz.rozmej@gmail.com>

* Fix DbTracker repeatedly logging ObjectDisposedException after disposal (#11720)
* Fix DbTracker repeatedly logging ObjectDisposedException after disposal

  When the Autofac LifetimeScope (or the shared cache SafeHandle) is disposed while MonitoringService's timer is still scheduled, `_sharedBlockCache.Value` in `UpdateDbMetrics` throws `ObjectDisposedException` via Autofac's LazyRegistrationSource. The generic catch logs it at Error and the callback stays registered, so the same exception re-fires on every metric interval, producing dozens of identical errors per minute on affected nodes.

  Catch `ObjectDisposedException` explicitly and short-circuit subsequent ticks via a `_stopped` flag. Adds a regression test that disposes the container and asserts the callback neither throws nor logs on repeated invocations.

  Fixes #11719
* Address review: debug-log first stop, drop redundant CreateDb in test
  - Log at Debug level in the new `ObjectDisposedException` branch so there is a (no-cost on production) signal that DbTracker has stopped updating metrics, rather than only inferring it from the absence of further Error logs.
  - Remove the duplicate `CreateDb` call in the regression test; the helper `ConfigureMetricUpdater` already registers the test DB.
  - Disable `TestLogger.IsDebug` in the regression test so the new Debug message does not trip the `LogList.Should().BeEmpty()` assertion; the test still asserts no Error-level spam, which was the bug.
* Address review: make DbTracker IDisposable, drop redundant comment
  - Implement IDisposable on DbTracker so Autofac proactively sets _stopped during scope teardown, short-circuiting subsequent monitoring ticks before they touch disposed resources. The catch (ObjectDisposedException) remains as a backstop for the race where a tick is already executing when Dispose runs.
  - Mark _stopped as volatile since it is now written from the disposing thread and read from the monitoring timer thread.
  - Drop the inline comment in the catch block; the Debug log message already conveys the same information.
* Fix Eth69/Eth70 receipt tests for null-means-unknown contract

  After dropping the FindHeader pre-check, the response loop relies solely on GetReceipts returning null to detect an unknown block. The two "unknown block hash" tests still mocked the old contract (FindHeader returns null + GetReceipts returns []), so the loop saw [] as a legitimate zero-tx block and kept going instead of breaking. Update the mocks to return null for unknown hashes, matching the ISyncServer.GetReceipts contract (null = unknown, [] = exists w/ 0 txs).
* Make EraE tests visible and green (#11727)
* fix(eth/70): reject null receipt payloads (#11615)
* fix(eth/70): reject null receipt payloads
* fix(eth/70): validate receipt payloads while decoding
* refactor(eth): move null receipt validation into base serializer

  Apply the validation in V63 ReceiptsMessageSerializer so eth/63, eth/66, eth/69 and eth/70 all reject null receipt payloads at decode time.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: lukasz.rozmej <lukasz.rozmej@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit fd2fd25)

* fix(eth/70): stop response early when block has txs but no receipts

  FulfillReceiptsRequest used to emit `txReceipts.Add([])` for any block where SyncServer.GetReceipts(hash) returned empty, regardless of whether the block actually had zero transactions. The eth/70 receiver validates segment-complete responses against its own transaction count and throws SubprotocolException ("Receipt count mismatch with block transactions count") + disconnects the peer when an [] arrives for a block that locally has transactions.

  That made a node that is briefly without receipts (e.g. still syncing receipts, or its receipt store is pruned for the requested block) appear malicious to the requester. We observed this disconnect-storm pattern materially starving receipt-sync on small networks.

  Distinguish the two cases at the sender by looking up the block:
  - block is null or body is missing → we can't safely claim anything; break the response and let the requester ask another peer
  - block.Transactions.Length > 0 → same: we don't actually have the receipts even though we have the body; break
  - block.Transactions.Length == 0 → block is legitimately empty; emit [] as before

  Update Should_return_empty_receipts_block_when_local_block_has_no_receipts → ..._has_no_transactions to reflect the new precondition, and stub SyncServer.Find on two pre-existing empty-receipts-in-the-middle tests so they still represent the legitimate empty case. Add new regression test Should_stop_response_when_local_block_has_transactions_but_no_receipts covering the bug.

  Closes #11752.
  (cherry picked from commit f0f6ea2)
* refactor(eth/70): disambiguate "unknown" vs "legit empty" in ISyncServer.GetReceipts

  Following @LukaszRozmej's review suggestion on #11752: rather than have the protocol handler do a second SyncServer.Find lookup to figure out whether an empty receipts array means "block has zero transactions" or "I don't have the receipts yet", push the disambiguation down to where the data lives.

  ISyncServer.GetReceipts now returns TxReceipt[]?:
  - null → receipts are not known locally (block missing, body missing, or receipts not stored). Callers MUST NOT emit [] on the wire.
  - empty [] → block is known and legitimately has zero transactions.
  - non-empty → receipts for an executed block.

  SyncServer.GetReceipts implements the three cases directly:
  - blockHash is null OR block not found OR block body missing → null
  - block.Transactions.Length == 0 → []
  - block has txs, receipts not stored → null
  - block has txs and receipts → receipts

  Eth70ProtocolHandler.FulfillReceiptsRequest is now a single null-check instead of the previous Find-then-classify dance. SyncPeerProtocolHandlerBase.Fulfill (eth/63-69 path) gets the same fix for free: same bug, same one-line guard.

  Tests: replace the Find-based stubs with GetReceipts-returning-null stubs and update OldStyleFullSynchronizerTests.Can_retrieve_empty_receipts to assert the new contract (genesis → BeEmpty; unknown blocks → BeNull).

  Closes #11752.
  (cherry picked from commit 1d880be)
* Apply suggestions from code review

  Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
  (cherry picked from commit c01295b)
* Apply suggestion from @LukaszRozmej

  (cherry picked from commit 27258b9)
* Drop redundant FindHeader pre-check in receipt response loop

  GetReceipts now returns null for unknown blocks (block missing, body missing, or receipts not stored), so the up-front FindHeader call before GetReceipts is redundant: the `if (receipts is null) break;` below it already handles the unknown case. Per @flcl42 review on #11754.

  (cherry picked from commit 742cb1a)
* Fix shutdown race in SnapProvider PLINQ (closes #11806) (#11807)
* Unwrap AggregateException(ObjectDisposedException) from snap PLINQ on shutdown

  When the node is stopped during snap sync, SnapProvider.AddAccountRange's parallel code-existence check (codeHashes.AsParallel().Where(_codeDb.KeyExists)) races RocksDB disposal in the DI container teardown. The resulting ObjectDisposedException is wrapped by PLINQ in an AggregateException, which falls past the snap dispatcher's existing `catch (ObjectDisposedException) → Info("Ignoring sync response as the DB has already closed.")` guard and lands on `catch (Exception) → Error("Error when handling response", e)`.

  The node recovers correctly on restart (this is purely a noisy shutdown log line), but the post-merge fuzz tests' StabilityVerification watchdog scans for non-allowlisted exception lines and fails the test on it, blocking the 1.38 release smoke run.

  Unwrap the AggregateException at the point of throw so the dispatcher's existing benign guard handles it uniformly. No new log path; reuses the already-tested "Ignoring sync response..." Info message.

  Race introduced 2024-03-28 by PR #6873 "Perf/dont redownload downloaded code" (commit 7059b45), latent until the fuzz watchdog started catching it.
* Tidy unwrap: single Flatten, preserve stack via ExceptionDispatchInfo

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Guard against empty InnerExceptions in unwrap filter

  Enumerable.All() returns true vacuously on an empty sequence, which would let the filter pass and then InnerExceptions[0] throw ArgumentOutOfRangeException instead of re-throwing the original.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: lukasz.rozmej <lukasz.rozmej@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Default Discovery to V4 (#11614)
* Default Discovery to V4
* Update tests
* Activate BAL only when needed (#11795)
* Activate BAL only when needed
* Guard ChangeState against same-state transitions

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Align IsFinished with ShouldFinish; short-circuit cheap checks first

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: lukasz.rozmej <lukasz.rozmej@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: update Directory.Build.props for 1.38.0

---------

Co-authored-by: Alex <alexb5dh@gmail.com>
Co-authored-by: Alexey Osipov <me@flcl.me>
Co-authored-by: Amirul Ashraf <asdacap@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Stavros Vlachakis <89769224+svlachakis@users.noreply.github.com>
Co-authored-by: Carlos Bermudez Porto <43155355+cbermudez97@users.noreply.github.com>
Co-authored-by: lukasz.rozmej <lukasz.rozmej@gmail.com>
Co-authored-by: DeFi Junkie <deffie.jnkiee@gmail.com>
Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>
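The SnapProvider unwrap and its empty-InnerExceptions guard, as described in the commit message above, can be sketched as follows. This is a simplified illustration and not the code from #11807: `RunUnwrapped` and `IsOnlyDisposed` are hypothetical helpers, and the sketch uses a plain `catch` with `throw;` rather than an exception filter.

```csharp
using System;
using System.Linq;
using System.Runtime.ExceptionServices;

// True only when there is at least one inner exception and all of them are
// ObjectDisposedException. The Count check matters because All() is
// vacuously true on an empty sequence.
static bool IsOnlyDisposed(AggregateException flat) =>
    flat.InnerExceptions.Count > 0
    && flat.InnerExceptions.All(x => x is ObjectDisposedException);

// Runs body and surfaces a wrapped ObjectDisposedException directly, so an
// outer catch (ObjectDisposedException) guard can handle it.
static void RunUnwrapped(Action body)
{
    try
    {
        body();
    }
    catch (AggregateException e)
    {
        AggregateException flat = e.Flatten();
        if (IsOnlyDisposed(flat))
        {
            // Rethrow the inner exception while preserving its stack trace.
            ExceptionDispatchInfo.Capture(flat.InnerExceptions[0]).Throw();
        }
        throw;   // anything else propagates unchanged
    }
}

string outcome;
try
{
    RunUnwrapped(() => throw new AggregateException(new ObjectDisposedException("codeDb")));
    outcome = "no exception";
}
catch (ObjectDisposedException) { outcome = "disposed"; }

string emptyOutcome;
try
{
    RunUnwrapped(() => throw new AggregateException());   // no inner exceptions
    emptyOutcome = "no exception";
}
catch (AggregateException) { emptyOutcome = "aggregate"; }

Console.WriteLine($"{outcome}, {emptyOutcome}");
```

Without the `Count > 0` check, the empty aggregate in the second call would pass the `All` test and indexing `InnerExceptions[0]` would fail with `ArgumentOutOfRangeException`, which is the case the last commit in that PR guards against.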