fix(correctness): filter agent-internal service checks from dsd-service-checks analysis #1578
Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 1 commit into main on May 4, 2026
…ce-checks analysis

The DDA emits `datadog.agent.up` on a ~15s flush cycle. Under parallel test load, the number of flush cycles that complete before the dump is non-deterministic, which produces spurious count mismatches between baseline and comparison.

Filtering these checks matches the existing approach in the metrics analyzer, which already filters `datadog.*` (and other known internal prefixes) for the same reason.

A probe test confirmed that only two non-user checks are present:

- `datadog.agent.up`: timing-dependent, so it is filtered.
- The DDA forwarder connectivity probe `test`: emitted once on startup and identical on both sides, so it passes through the filter correctly.

This change also improves the count-mismatch error path to log the names of the extra checks on whichever side has more, to aid debugging if a mismatch still occurs after filtering.

Closes #1576

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
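The filtering described above can be sketched roughly as follows. This is an illustrative Python sketch, not the analyzer's actual code: the function names, the `INTERNAL_PREFIXES` constant, and the `my.app.health` check are hypothetical, and the record shape is assumed from the `{"check":"test","status":0}` probe quoted in the commit message.

```python
# Assumption: only the `datadog.` prefix is treated as agent-internal here.
INTERNAL_PREFIXES = ("datadog.",)


def is_agent_internal(check_name: str) -> bool:
    """Return True for checks the agent emits about itself."""
    return check_name.startswith(INTERNAL_PREFIXES)


def filter_user_checks(checks: list[dict]) -> list[dict]:
    """Drop agent-internal checks so only stable checks are compared."""
    return [c for c in checks if not is_agent_internal(c["check"])]


# Baseline happened to complete two flush cycles before the dump,
# the comparison side only one (hypothetical data).
baseline = [
    {"check": "datadog.agent.up", "status": 0},
    {"check": "datadog.agent.up", "status": 0},
    {"check": "test", "status": 0},
    {"check": "my.app.health", "status": 0},
]
comparison = [
    {"check": "datadog.agent.up", "status": 0},
    {"check": "test", "status": 0},
    {"check": "my.app.health", "status": 0},
]

baseline_user = filter_user_checks(baseline)
comparison_user = filter_user_checks(comparison)
```

In this sketch the raw counts differ (4 versus 3) purely because of flush timing, while the filtered lists are identical. The `test` connectivity probe is kept because it does not start with `datadog.`.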
tobz
approved these changes
May 4, 2026
200c2b8
into
main
71 of 72 checks passed
dd-octo-sts Bot
pushed a commit
that referenced
this pull request
May 4, 2026
…ce-checks analysis (#1578)

## Summary

- The DDA emits `datadog.agent.up` on a ~15s flush cycle; under parallel test load the number of flush cycles completing before the dump is non-deterministic, causing spurious count mismatches between baseline and comparison in `dsd-service-checks`
- Adds a `datadog.` prefix filter in `ServiceChecksAnalyzer`, matching the identical approach already used in `MetricsAnalyzer` for the same reason
- Confirmed via a probe test (millstone configured with `service_check: 0`) that the only non-user checks present are `datadog.agent.up` (timing-dependent → filtered) and the DDA forwarder connectivity probe `{"check":"test","status":0}` (one-shot on startup, always identical on both sides → passes through correctly)
- Improves the count-mismatch error path to log the names/details of extra checks on whichever side has more, to aid debugging if a mismatch still occurs after filtering

## Test plan

- [ ] Run `make test-correctness-case CASE=dsd-service-checks` in isolation: should pass
- [ ] Run full `make test-correctness` with default parallelism several times: `dsd-service-checks` should no longer flake

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: travis.thieman <travis.thieman@datadoghq.com>

200c2b8
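The improved count-mismatch diagnostics could look something like the sketch below. It is a hypothetical Python illustration and not the code in `ServiceChecksAnalyzer`: the function name `describe_count_mismatch`, the message format, and the sample data are assumptions. It reports the checks that are surplus on whichever side has more, using a multiset difference.

```python
import json
from collections import Counter


def describe_count_mismatch(baseline: list[dict], comparison: list[dict]):
    """Return a diagnostic string if the counts differ, else None."""
    if len(baseline) == len(comparison):
        return None

    # Serialize each check deterministically so it can be counted.
    def key(check: dict) -> str:
        return json.dumps(check, sort_keys=True)

    b_counts = Counter(key(c) for c in baseline)
    c_counts = Counter(key(c) for c in comparison)

    # Counter subtraction keeps only positive counts, i.e. the surplus
    # on the larger side.
    if len(baseline) > len(comparison):
        side, extra = "baseline", b_counts - c_counts
    else:
        side, extra = "comparison", c_counts - b_counts

    return (
        f"service check count mismatch: baseline={len(baseline)} "
        f"comparison={len(comparison)}; extra on {side}: "
        f"{sorted(extra.elements())}"
    )


# Hypothetical unfiltered data: baseline saw one more flush cycle.
agent_up = {"check": "datadog.agent.up", "status": 0}
probe = {"check": "test", "status": 0}
message = describe_count_mismatch([agent_up, agent_up, probe], [agent_up, probe])
print(message)
```

Naming the surplus checks in the error makes it immediately visible whether a remaining mismatch comes from another timing-dependent internal check or from a genuine difference in user checks.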