feat: add SkipMetricsTracking property to DbSettings#11515

Merged
LukaszRozmej merged 7 commits into master from fix/db-metrics on May 22, 2026

Conversation

@cbermudez97
Contributor

Changes

  • Introduced SkipMetricsTracking property in DbSettings to control metrics tracking for specific databases.
  • Updated FullPruningInnerDbFactory to set SkipMetricsTracking to true for inner databases to prevent stale references after pruning.
  • Added unit tests to verify the behavior of metrics tracking based on the new property.
  • Enhanced DbMonitoringModule to respect the SkipMetricsTracking setting when adding databases to the tracker.

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

@claude
Contributor

claude Bot commented May 7, 2026

Claude finished @cbermudez97's task in 3m 42s


PR Review: feat: add SkipMetricsTracking property to DbSettings

  • Gather context and read changed files
  • Load relevant AGENTS rules
  • Review all changed files
  • Post inline comments
  • Post final summary

Summary

The fix is correct and well-designed. The root cause is clear: FullPruningInnerDbFactory creates indexed sub-DBs that get registered into the DbTracker via the DbFactoryInterceptor. After a pruning cycle, the old inner DB is discarded but still held in the tracker — a stale reference. The solution is sound:

  1. FullPruningInnerDbFactory sets SkipMetricsTracking = true on all inner DB settings so the interceptor never registers them.
  2. WorldStateModule manually registers the outer FullPruningDb wrapper once with a stable name. Since FullPruningDb.GatherMetric() delegates to _currentDb.GatherMetric(), the tracker always reads from whatever inner DB is currently active — automatically correct across pruning cycles.
  3. The DbFactoryInterceptor correctly gates on !dbSettings.SkipMetricsTracking for both CreateDb and CreateColumnsDb.

The propagation through DbSettings.Clone() is safe: MemberwiseClone copies the field, then GetRocksDbSettings explicitly overwrites it to true after cloning.
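The three points above can be sketched in a few lines. This is a minimal illustration only: every type here (`SketchDbSettings`, `SketchInnerDb`, `SketchPruningDb`, `SketchTracker`, `SketchInterceptor`) is a hypothetical stand-in, not Nethermind's actual class or signature. It shows an interceptor that skips registration when the flag is set, and a stable outer wrapper whose `GatherMetric()` delegates to whichever inner DB is current.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration; not Nethermind's actual types.
public interface IMetricSource
{
    long GatherMetric();
}

public sealed class SketchDbSettings
{
    public SketchDbSettings(string name, bool skipMetricsTracking = false)
    {
        Name = name;
        SkipMetricsTracking = skipMetricsTracking;
    }

    public string Name { get; }
    public bool SkipMetricsTracking { get; }
}

public sealed class SketchInnerDb : IMetricSource
{
    private readonly long _size;
    public SketchInnerDb(long size) { _size = size; }
    public long GatherMetric() { return _size; }
}

// Outer wrapper: its identity is stable, only the inner DB changes.
public sealed class SketchPruningDb : IMetricSource
{
    private IMetricSource _currentDb;
    public SketchPruningDb(IMetricSource initial) { _currentDb = initial; }
    public void SwapInner(IMetricSource next) { _currentDb = next; }
    public long GatherMetric() { return _currentDb.GatherMetric(); }
}

public sealed class SketchTracker
{
    private readonly Dictionary<string, IMetricSource> _dbs =
        new Dictionary<string, IMetricSource>();

    public int Count { get { return _dbs.Count; } }
    public bool TryAdd(string name, IMetricSource db) { return _dbs.TryAdd(name, db); }
    public long Read(string name) { return _dbs[name].GatherMetric(); }
}

public static class SketchInterceptor
{
    // Registers the new DB only when the settings do not opt out.
    public static SketchInnerDb CreateDb(SketchTracker tracker, SketchDbSettings settings, long size)
    {
        var db = new SketchInnerDb(size);
        if (!settings.SkipMetricsTracking)
        {
            tracker.TryAdd(settings.Name, db);
        }
        return db;
    }
}
```

In this sketch the tracker only ever holds the wrapper, so swapping the inner DB after a pruning cycle changes what the tracker reads without touching the tracker itself.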


Findings

| Severity | Count |
|----------|-------|
| Critical | 0 |
| High | 0 |
| Medium | 0 |
| Low | 3 |

Low — SkipMetricsTracking lacks XML docs (DbSettings.cs:15)
The coding style guide requires <summary> on all public APIs. The semantics here are non-obvious (callers don't know what it affects), so a one-liner doc would help. Other properties in the same class are also undocumented, so this is consistent with the existing pattern.

Low — TryAdd silently drops duplicate registrations (WorldStateModule.cs:45)
Not a production issue (singletons aren't recreated), but worth being aware of in test teardown/rebuild scenarios.

Low — No end-to-end regression test for the stale-reference bug (DbTrackerTests.cs:81)
TestSkipMetricsTracking validates the mechanism, but there's no test that exercises the full scenario: create FullPruningDb → trigger pruning cycle via TryStartPruning/Commit → assert tracker still reads from the new inner DB. The unit test is sufficient to merge, but an integration test would give stronger confidence.
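On the `TryAdd` finding: the sketch below shows the general behaviour of `Dictionary<TKey, TValue>.TryAdd` in .NET, which keeps the first value and returns `false` for a duplicate key without throwing. Whether the tracker in this PR uses exactly this collection type is an assumption; `TryAddSketch` is a hypothetical name used only for illustration.

```csharp
using System.Collections.Generic;

public static class TryAddSketch
{
    // Registers the same key twice and reports what the map ends up holding.
    public static string RegisterTwice()
    {
        var tracked = new Dictionary<string, string>();
        bool first = tracked.TryAdd("State", "first-instance");   // true: key was free
        bool second = tracked.TryAdd("State", "second-instance"); // false: silently ignored
        return first + "/" + second + "/" + tracked["State"];
    }
}
```

A test that rebuilds the container and registers a fresh instance under the same name would therefore keep reading the old instance unless the entry is removed first.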


Verdict: ready to merge. All findings are low-severity and none block correctness.

Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a SkipMetricsTracking flag to DbSettings and updates DB monitoring to avoid tracking ephemeral inner DB instances created during full pruning, preventing stale references and improving resilience when metric gathering fails.

Changes:

  • Add DbSettings.SkipMetricsTracking and honor it in DbMonitoringModule’s DB factory interceptor.
  • Mark full-pruning inner DB instances as skipped, and register the stable FullPruningDb wrapper with the tracker.
  • Add unit tests for skip-tracking behavior and for per-DB metric gather exception isolation.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| src/Nethermind/Nethermind.Init/Modules/WorldStateModule.cs | Registers the FullPruningDb wrapper in the tracker to keep metrics tied to the active inner DB across prune cycles. |
| src/Nethermind/Nethermind.Init/Modules/DbMonitoringModule.cs | Skips tracking based on SkipMetricsTracking and makes metric updates resilient to per-DB exceptions. |
| src/Nethermind/Nethermind.Db/FullPruning/FullPruningInnerDbFactory.cs | Sets SkipMetricsTracking = true for indexed inner DB instances. |
| src/Nethermind/Nethermind.Db/DbSettings.cs | Introduces the SkipMetricsTracking property. |
| src/Nethermind/Nethermind.Db.Test/Nethermind.Db.Test.csproj | Adds test dependency on Nethermind.Init to access DbMonitoringModule/tracker types. |
| src/Nethermind/Nethermind.Db.Test/DbTrackerTests.cs | Adds tests for skip-tracking and exception isolation during metrics update. |

Comment thread src/Nethermind/Nethermind.Init/Modules/WorldStateModule.cs Outdated
Comment thread src/Nethermind/Nethermind.Init/Modules/DbMonitoringModule.cs
Comment thread src/Nethermind/Nethermind.Db.Test/DbTrackerTests.cs
Comment thread src/Nethermind/Nethermind.Db.Test/DbTrackerTests.cs Outdated
Comment thread src/Nethermind/Nethermind.Db/DbSettings.cs
Comment thread src/Nethermind/Nethermind.Init/Modules/WorldStateModule.cs Outdated
Comment thread src/Nethermind/Nethermind.Db.Test/DbTrackerTests.cs
cbermudez97 and others added 2 commits May 11, 2026 10:00
- Add XML doc to DbSettings.SkipMetricsTracking property
- Clarify WorldStateModule comment for both FullPruningInnerDbFactory
  and MemDbFactory branches
- DbMonitoringModule: clear stale dictionary entries on GatherMetric
  failure and log only once per failure streak (with recovery info log)
- DbTrackerTests: add [TearDown] to reset shared static metrics keys,
  collapse double enumeration in TestSkipMetricsTracking, and add
  FullPruningDbTrackedWrapper_SurvivesPruningCycle integration test

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
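
The "log only once per failure streak" behaviour described in this commit can be sketched as follows. This is an assumption-laden illustration, not the code from `DbMonitoringModule`: `StreakLoggingUpdater`, its `Log` list, and the message texts are hypothetical. Only the idea of a `_failingDbs` set, clearing the stale metric entry on failure, and an info log on recovery comes from the commit message.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of per-DB exception isolation with streak-aware logging.
public sealed class StreakLoggingUpdater
{
    private readonly HashSet<string> _failingDbs = new HashSet<string>();

    // Stand-in for a real logger so the behaviour is observable.
    public List<string> Log { get; } = new List<string>();

    public void Update(string dbName, Func<long> gatherMetric, IDictionary<string, long> metrics)
    {
        try
        {
            metrics[dbName] = gatherMetric();

            // Remove returns true only if the DB was failing before: log recovery once.
            if (_failingDbs.Remove(dbName))
            {
                Log.Add("INFO " + dbName + " recovered");
            }
        }
        catch (Exception e)
        {
            // Drop the stale value so dashboards do not keep showing old data.
            metrics.Remove(dbName);

            // Add returns true only on the first failure of a streak.
            if (_failingDbs.Add(dbName))
            {
                Log.Add("ERROR " + dbName + ": " + e.Message);
            }
        }
    }
}
```

Because each DB is updated inside its own try/catch, one failing DB does not prevent the others from reporting.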
Comment thread src/Nethermind/Nethermind.Init/Modules/WorldStateModule.cs Outdated
@cbermudez97 cbermudez97 requested a review from asdacap May 21, 2026 16:41
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cbermudez97 cbermudez97 removed the request for review from asdacap May 22, 2026 13:07
@LukaszRozmej LukaszRozmej merged commit 27b3119 into master May 22, 2026
542 checks passed
@LukaszRozmej LukaszRozmej deleted the fix/db-metrics branch May 22, 2026 22:34
stdevMac pushed a commit that referenced this pull request May 25, 2026
* feat: add SkipMetricsTracking property to DbSettings

- Introduced SkipMetricsTracking property in DbSettings to control metrics tracking for specific databases.
- Updated FullPruningInnerDbFactory to set SkipMetricsTracking to true for inner databases to prevent stale references after pruning.
- Added unit tests to verify the behavior of metrics tracking based on the new property.
- Enhanced DbMonitoringModule to respect the SkipMetricsTracking setting when adding databases to the tracker.

* fix: address PR feedback for db metrics tracking

- Add XML doc to DbSettings.SkipMetricsTracking property
- Clarify WorldStateModule comment for both FullPruningInnerDbFactory
  and MemDbFactory branches
- DbMonitoringModule: clear stale dictionary entries on GatherMetric
  failure and log only once per failure streak (with recovery info log)
- DbTrackerTests: add [TearDown] to reset shared static metrics keys,
  collapse double enumeration in TestSkipMetricsTracking, and add
  FullPruningDbTrackedWrapper_SurvivesPruningCycle integration test

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: dedupe DbTrackerTests container setup and metric-map iteration

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: Initialize _failingDbs with an empty HashSet

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: lukasz.rozmej <lukasz.rozmej@gmail.com>
stdevMac pushed a commit that referenced this pull request May 26, 2026
stdevMac added a commit that referenced this pull request Jun 1, 2026
* RLP tx decoding fixes (#11496)

* Handle null txs

* PR feedback

* Revert `BlobTxStorage.TryDecodeFullTx` changes

Silent `false` may be worse than NRE on DB corruption

---------

Co-authored-by: Alexey Osipov <me@flcl.me>

* eth/71 (#10844)

* Improve eth/70 checks (#11456)

* Improve eth/70 verification

* Comments

* Gas remake

* Fix full sync

* fix: prevent negative RequestSize crash when beacon pivot destination advances mid-sync (#11478)

* fix: prevent negative RequestSize crash when beacon pivot destination advances mid-sync

`HeadersSyncFeed.ShouldBuildANewBatch` checked
`_lowestRequestedHeaderNumber == HeadersDestinationNumber`. For beacon
headers, `HeadersDestinationNumber` is `BeaconPivot.PivotDestinationNumber`,
which tracks `Head.Number - Reorganization.MaxDepth + 1` and so advances
upward as the chain head progresses. When it stepped above
`_lowestRequestedHeaderNumber` mid-sync, the `==` check missed it,
`BuildNewBatch` produced a negative `RequestSize`, and
`HeaderStore.FindReversedHeaders` crashed with
`ArgumentOutOfRangeException` on `new Dictionary<>(negativeCount)`.

Widen the guard to `<=` and add a regression test that reproduces the
scenario via mocked `IBeaconPivot`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
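
A small sketch of why `==` misses the case while `<=` catches it. The names and the request-size formula below are assumptions made for illustration; the real `HeadersSyncFeed` logic is more involved.

```csharp
using System;

// Hypothetical reduction of the guard described in the commit message.
public static class HeaderBatchSketch
{
    // Old guard: only stops when the two numbers are exactly equal.
    public static bool ReachedDestinationOld(long lowestRequested, long destination)
    {
        return lowestRequested == destination;
    }

    // Widened guard: also stops when the destination has moved above the cursor.
    public static bool ReachedDestinationNew(long lowestRequested, long destination)
    {
        return lowestRequested <= destination;
    }

    // Assumed shape of the size computation: headers remaining down to the destination.
    public static long RequestSize(long lowestRequested, long destination, long maxBatch)
    {
        return Math.Min(maxBatch, lowestRequested - destination);
    }
}
```

With `lowestRequested = 100` and a destination that has advanced to `105`, the old guard reports "not finished" and the assumed size computation yields `-5`. A negative capacity passed to `new Dictionary<TKey, TValue>(int)` throws `ArgumentOutOfRangeException`, which matches the crash described above.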

* chore: shorten inline comments per review

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add new default for gnosis and gnosis archive config (#11269)

feat: add Db.SkipCheckingSstFileSizesOnDbOpen=true default for gnosis and gnosis archive

* Alchemy - Code Fix (#11714)

* feat: add SkipMetricsTracking property to DbSettings (#11515)

* feat: add SkipMetricsTracking property to DbSettings

- Introduced SkipMetricsTracking property in DbSettings to control metrics tracking for specific databases.
- Updated FullPruningInnerDbFactory to set SkipMetricsTracking to true for inner databases to prevent stale references after pruning.
- Added unit tests to verify the behavior of metrics tracking based on the new property.
- Enhanced DbMonitoringModule to respect the SkipMetricsTracking setting when adding databases to the tracker.

* fix: address PR feedback for db metrics tracking

- Add XML doc to DbSettings.SkipMetricsTracking property
- Clarify WorldStateModule comment for both FullPruningInnerDbFactory
  and MemDbFactory branches
- DbMonitoringModule: clear stale dictionary entries on GatherMetric
  failure and log only once per failure streak (with recovery info log)
- DbTrackerTests: add [TearDown] to reset shared static metrics keys,
  collapse double enumeration in TestSkipMetricsTracking, and add
  FullPruningDbTrackedWrapper_SurvivesPruningCycle integration test

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: dedupe DbTrackerTests container setup and metric-map iteration

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: Initialize _failingDbs with an empty HashSet

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: lukasz.rozmej <lukasz.rozmej@gmail.com>

* Fix DbTracker repeatedly logging ObjectDisposedException after disposal (#11720)

* Fix DbTracker repeatedly logging ObjectDisposedException after disposal

When the Autofac LifetimeScope (or the shared cache SafeHandle) is disposed
while MonitoringService's timer is still scheduled, `_sharedBlockCache.Value`
in `UpdateDbMetrics` throws `ObjectDisposedException` via Autofac's
LazyRegistrationSource. The generic catch logs it at Error and the callback
stays registered, so the same exception re-fires on every metric interval —
producing dozens of identical errors per minute on affected nodes.

Catch `ObjectDisposedException` explicitly and short-circuit subsequent ticks
via a `_stopped` flag. Adds a regression test that disposes the container
and asserts the callback neither throws nor logs on repeated invocations.

Fixes #11719

* Address review: debug-log first stop, drop redundant CreateDb in test

- Log at Debug level in the new `ObjectDisposedException` branch so there is
  a (no-cost on production) signal that DbTracker has stopped updating
  metrics, rather than only inferring it from the absence of further Error
  logs.
- Remove the duplicate `CreateDb` call in the regression test — the helper
  `ConfigureMetricUpdater` already registers the test DB.
- Disable `TestLogger.IsDebug` in the regression test so the new Debug
  message does not trip the `LogList.Should().BeEmpty()` assertion; the
  test still asserts no Error-level spam, which was the bug.

* Address review: make DbTracker IDisposable, drop redundant comment

- Implement IDisposable on DbTracker so Autofac proactively sets _stopped
  during scope teardown, short-circuiting subsequent monitoring ticks
  before they touch disposed resources. The catch (ObjectDisposedException)
  remains as a backstop for the race where a tick is already executing
  when Dispose runs.
- Mark _stopped as volatile since it is now written from the disposing
  thread and read from the monitoring timer thread.
- Drop the inline comment in the catch block; the Debug log message
  already conveys the same information.
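
The combination of a `_stopped` flag, an explicit `ObjectDisposedException` catch, and `IDisposable` can be sketched like this. `StoppableMetricsUpdater` and its members are hypothetical and only mirror the shape described in these commits; the real `DbTracker` differs.

```csharp
using System;

// Hypothetical sketch: a timer callback that stops itself once its resources are gone.
public sealed class StoppableMetricsUpdater : IDisposable
{
    // volatile: written by the disposing thread, read by the timer thread.
    private volatile bool _stopped;
    private readonly Func<long> _readSharedCache;

    public StoppableMetricsUpdater(Func<long> readSharedCache)
    {
        _readSharedCache = readSharedCache;
    }

    public int SuccessfulTicks { get; private set; }
    public int StopEvents { get; private set; }

    public void OnTick()
    {
        if (_stopped) return; // short-circuit every tick after stop

        try
        {
            _readSharedCache();
            SuccessfulTicks++;
        }
        catch (ObjectDisposedException)
        {
            // Backstop for a tick that was already running when disposal happened.
            _stopped = true;
            StopEvents++; // stands in for a single Debug-level log line
        }
    }

    public void Dispose()
    {
        _stopped = true;
    }
}
```

In this sketch the exception is observed at most once; later ticks return before touching the disposed resource.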

* Fix Eth69/Eth70 receipt tests for null-means-unknown contract

After dropping the FindHeader pre-check, the response loop relies solely
on GetReceipts returning null to detect an unknown block. The two
"unknown block hash" tests still mocked the old contract (FindHeader
returns null + GetReceipts returns []), so the loop saw [] as a
legitimate zero-tx block and kept going instead of breaking.

Update the mocks to return null for unknown hashes, matching the
ISyncServer.GetReceipts contract (null = unknown, [] = exists w/ 0 txs).

* Make EraE tests visible and green (#11727)

* fix(eth/70): reject null receipt payloads (#11615)

* fix(eth/70): reject null receipt payloads

* fix(eth/70): validate receipt payloads while decoding

* refactor(eth): move null receipt validation into base serializer

Apply the validation in V63 ReceiptsMessageSerializer so eth/63, eth/66,
eth/69 and eth/70 all reject null receipt payloads at decode time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: lukasz.rozmej <lukasz.rozmej@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit fd2fd25)

* fix(eth/70): stop response early when block has txs but no receipts

FulfillReceiptsRequest used to emit `txReceipts.Add([])` for any block where
SyncServer.GetReceipts(hash) returned empty, regardless of whether the block
actually had zero transactions. The eth/70 receiver validates segment-complete
responses against its own transaction count and throws SubprotocolException
("Receipt count mismatch with block transactions count") + disconnects the
peer when an [] arrives for a block that locally has transactions.

That made a node that is briefly without receipts (e.g. still syncing
receipts, or its receipt store is pruned for the requested block) appear
malicious to the requester. We observed this disconnect-storm pattern
materially starving receipt-sync on small networks.

Distinguish the two cases at the sender by looking up the block:
- block is null or body is missing  → we can't safely claim anything; break
  the response and let the requester ask another peer
- block.Transactions.Length > 0     → same: we don't actually have the
  receipts even though we have the body; break
- block.Transactions.Length == 0    → block is legitimately empty; emit []
  as before

Update Should_return_empty_receipts_block_when_local_block_has_no_receipts
→ ..._has_no_transactions to reflect the new precondition, and stub
SyncServer.Find on two pre-existing empty-receipts-in-the-middle tests so
they still represent the legitimate empty case. Add new regression test
Should_stop_response_when_local_block_has_transactions_but_no_receipts
covering the bug.

Closes #11752.

(cherry picked from commit f0f6ea2)

* refactor(eth/70): disambiguate "unknown" vs "legit empty" in ISyncServer.GetReceipts

Following @LukaszRozmej's review suggestion on #11752: rather than have the
protocol handler do a second SyncServer.Find lookup to figure out whether an
empty receipts array means "block has zero transactions" or "I don't have the
receipts yet", push the disambiguation down to where the data lives.

ISyncServer.GetReceipts now returns TxReceipt[]?:
  null      → receipts are not known locally (block missing, body missing, or
              receipts not stored). Callers MUST NOT emit [] on the wire.
  empty []  → block is known and legitimately has zero transactions.
  non-empty → receipts for an executed block.

SyncServer.GetReceipts implements the three cases directly:
  - blockHash is null OR block not found OR block body missing → null
  - block.Transactions.Length == 0                              → []
  - block has txs, receipts not stored                          → null
  - block has txs and receipts                                  → receipts

Eth70ProtocolHandler.FulfillReceiptsRequest is now a single null-check instead
of the previous Find-then-classify dance. SyncPeerProtocolHandlerBase.Fulfill
(eth/63-69 path) gets the same fix for free — same bug, same one-line guard.
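
A minimal sketch of the "single null-check" response loop under the three-state contract. Receipts are modelled as `string[]` and the lookup as a delegate; both are simplifications chosen for illustration, and `ReceiptResponseSketch` is a hypothetical name rather than the handler's real code.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class ReceiptResponseSketch
{
    // getReceipts contract: null = unknown locally, empty = block with zero
    // transactions, non-empty = receipts for an executed block.
    public static List<string[]> Fulfill(
        IEnumerable<string> blockHashes,
        Func<string, string[]?> getReceipts)
    {
        var response = new List<string[]>();
        foreach (string hash in blockHashes)
        {
            string[]? receipts = getReceipts(hash);

            // Unknown block: stop here instead of emitting [] on the wire.
            if (receipts is null) break;

            response.Add(receipts);
        }
        return response;
    }
}
```

The response is truncated at the first unknown block, so the requester can ask another peer for the remainder rather than treating an empty entry as a protocol violation.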

Tests: replace the Find-based stubs with GetReceipts-returning-null stubs and
update OldStyleFullSynchronizerTests.Can_retrieve_empty_receipts to assert the
new contract (genesis → BeEmpty; unknown blocks → BeNull).

Closes #11752.

(cherry picked from commit 1d880be)

* Apply suggestions from code review

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
(cherry picked from commit c01295b)

* Apply suggestion from @LukaszRozmej

(cherry picked from commit 27258b9)

* Drop redundant FindHeader pre-check in receipt response loop

GetReceipts now returns null for unknown blocks (block missing, body
missing, or receipts not stored), so the up-front FindHeader call before
GetReceipts is redundant — the `if (receipts is null) break;` below it
already handles the unknown case.

Per @flcl42 review on #11754.

(cherry picked from commit 742cb1a)

* Fix shutdown race in SnapProvider PLINQ (closes #11806) (#11807)

* Unwrap AggregateException(ObjectDisposedException) from snap PLINQ on shutdown

When the node is stopped during snap sync, SnapProvider.AddAccountRange's
parallel code-existence check (codeHashes.AsParallel().Where(_codeDb.KeyExists))
races RocksDB disposal in the DI container teardown. The resulting
ObjectDisposedException is wrapped by PLINQ in an AggregateException, which
falls past the snap dispatcher's existing `catch (ObjectDisposedException) →
Info("Ignoring sync response as the DB has already closed.")` guard and lands
on `catch (Exception) → Error("Error when handling response", e)`.

The node recovers correctly on restart — this is purely a noisy shutdown log
line — but the post-merge fuzz tests' StabilityVerification watchdog scans
for non-allowlisted exception lines and fails the test on it, blocking the
1.38 release smoke run.

Unwrap the AggregateException at the point of throw so the dispatcher's
existing benign guard handles it uniformly. No new log path; reuses the
already-tested "Ignoring sync response..." Info message.

Race introduced 2024-03-28 by PR #6873 "Perf/dont redownload downloaded code"
(commit 7059b45), latent until the fuzz watchdog started catching it.

* Tidy unwrap: single Flatten, preserve stack via ExceptionDispatchInfo

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Guard against empty InnerExceptions in unwrap filter

Enumerable.All() returns true vacuously on an empty sequence, which
would let the filter pass and then InnerExceptions[0] throw
ArgumentOutOfRangeException instead of re-throwing the original.
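
The unwrap with its empty-sequence guard might look roughly like the sketch below. `UnwrapSketch.Run` is a hypothetical helper, not the code in `SnapProvider`; it only illustrates a single `Flatten`, the `Count > 0` guard before `All`, and rethrowing through `ExceptionDispatchInfo` to preserve the original stack trace.

```csharp
using System;
using System.Linq;
using System.Runtime.ExceptionServices;

public static class UnwrapSketch
{
    // Runs work that may throw AggregateException (as PLINQ does) and surfaces a
    // bare ObjectDisposedException when that is all the aggregate contains.
    public static T Run<T>(Func<T> work)
    {
        try
        {
            return work();
        }
        catch (AggregateException ae)
        {
            AggregateException flat = ae.Flatten();

            // All() is vacuously true for an empty sequence, so check Count first.
            if (flat.InnerExceptions.Count > 0
                && flat.InnerExceptions.All(e => e is ObjectDisposedException))
            {
                ExceptionDispatchInfo.Capture(flat.InnerExceptions[0]).Throw();
            }

            throw; // anything else propagates unchanged
        }
    }
}
```

Callers that already have a `catch (ObjectDisposedException)` branch then handle the shutdown race without any new log path.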

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: lukasz.rozmej <lukasz.rozmej@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Default Discovery to V4 (#11614)

* Default Discovery to V4

* Update tests

* Activate BAL only when needed (#11795)

* Activate BAL only when needed

* Guard ChangeState against same-state transitions

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Align IsFinished with ShouldFinish; short-circuit cheap checks first

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: lukasz.rozmej <lukasz.rozmej@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: update Directory.Build.props for 1.38.0

---------

Co-authored-by: Alex <alexb5dh@gmail.com>
Co-authored-by: Alexey Osipov <me@flcl.me>
Co-authored-by: Amirul Ashraf <asdacap@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Stavros Vlachakis <89769224+svlachakis@users.noreply.github.com>
Co-authored-by: Carlos Bermudez Porto <43155355+cbermudez97@users.noreply.github.com>
Co-authored-by: lukasz.rozmej <lukasz.rozmej@gmail.com>
Co-authored-by: DeFi Junkie <deffie.jnkiee@gmail.com>
Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>