fix(correctness): filter agent-internal service checks from dsd-service-checks analysis by thieman · Pull Request #1578 · DataDog/saluki

thieman · 2026-05-04T18:43:20Z

Summary

The DDA emits datadog.agent.up on a ~15s flush cycle; under parallel test load the number of flush cycles completing before the dump is non-deterministic, causing spurious count mismatches between baseline and comparison in dsd-service-checks
Adds a datadog. prefix filter in ServiceChecksAnalyzer, matching the identical approach already used in MetricsAnalyzer for the same reason
Confirmed via a probe test (millstone configured with service_check: 0) that the only non-user checks present are datadog.agent.up (timing-dependent → filtered) and the DDA forwarder connectivity probe {"check":"test","status":0} (one-shot on startup, always identical on both sides → passes through correctly)
Improves the count-mismatch error path to log the names/details of extra checks on whichever side has more, to aid debugging if a mismatch still occurs after filtering
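
The filter and the improved mismatch reporting described above could look roughly like the following Rust sketch. It is not the code from this PR: the `ServiceCheck` struct, the function names `filter_internal`, `extra_checks`, and `compare_counts`, and the error message format are all hypothetical stand-ins chosen for illustration. Only the `datadog.` prefix and the check names `datadog.agent.up` and `test` come from the description.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a captured service check; the real type in
/// saluki's analyzer may carry more fields.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ServiceCheck {
    pub name: String,
    pub status: u8,
}

/// Prefix treated as agent-internal, per the PR description.
const INTERNAL_PREFIX: &str = "datadog.";

/// Drops agent-internal checks such as `datadog.agent.up`, whose count
/// depends on how many flush cycles completed before the dump.
pub fn filter_internal(checks: Vec<ServiceCheck>) -> Vec<ServiceCheck> {
    checks
        .into_iter()
        .filter(|c| !c.name.starts_with(INTERNAL_PREFIX))
        .collect()
}

/// Returns the checks in `larger` that have no counterpart in `smaller`,
/// treating both sides as multisets keyed by (name, status).
pub fn extra_checks(larger: &[ServiceCheck], smaller: &[ServiceCheck]) -> Vec<ServiceCheck> {
    let mut remaining: HashMap<(&str, u8), usize> = HashMap::new();
    for c in smaller {
        *remaining.entry((c.name.as_str(), c.status)).or_insert(0) += 1;
    }
    let mut extras = Vec::new();
    for c in larger {
        match remaining.get_mut(&(c.name.as_str(), c.status)) {
            // A matching check is still available on the other side: consume it.
            Some(n) if *n > 0 => *n -= 1,
            // No match left: this check is extra.
            _ => extras.push(c.clone()),
        }
    }
    extras
}

/// Filters both sides, then compares counts. On mismatch, the error names
/// the extra checks on whichever side has more.
pub fn compare_counts(
    baseline: Vec<ServiceCheck>,
    comparison: Vec<ServiceCheck>,
) -> Result<(), String> {
    let baseline = filter_internal(baseline);
    let comparison = filter_internal(comparison);
    if baseline.len() == comparison.len() {
        return Ok(());
    }
    let (side, extras) = if baseline.len() > comparison.len() {
        ("baseline", extra_checks(&baseline, &comparison))
    } else {
        ("comparison", extra_checks(&comparison, &baseline))
    };
    Err(format!(
        "count mismatch: baseline={}, comparison={}; extra on {}: {:?}",
        baseline.len(),
        comparison.len(),
        side,
        extras
    ))
}
```

In this sketch, differing numbers of `datadog.agent.up` checks on the two sides no longer affect the comparison, while the one-shot `test` probe passes through the filter and is counted on both sides like any user check.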

Test plan

Run make test-correctness-case CASE=dsd-service-checks in isolation — should pass
Run full make test-correctness with default parallelism several times — dsd-service-checks should no longer flake

🤖 Generated with Claude Code

…ce-checks analysis

The DDA emits `datadog.agent.up` on a ~15s flush cycle. Under parallel test load the number of flush cycles that complete before the dump is non-deterministic, producing spurious count mismatches between baseline and comparison.

This matches the existing approach in the metrics analyzer, which already filters `datadog.*` (and other known internal prefixes) for the same reason.

Confirmed via a probe test that the only non-user checks present are `datadog.agent.up` (timing-dependent, filtered) and the DDA forwarder connectivity probe `test` (one-shot on startup, identical on both sides, passes through the filter correctly).

Also improves the count-mismatch error path to log the names of the extra checks on whichever side has more, to aid debugging if a mismatch still occurs after filtering.

Closes #1576

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…ce-checks analysis (#1578)

Co-authored-by: travis.thieman <travis.thieman@datadoghq.com>

200c2b8

tobz approved these changes May 4, 2026

tobz added the type/bug Bug fixes. label May 4, 2026

thieman marked this pull request as ready for review May 4, 2026 18:52

thieman requested a review from a team as a code owner May 4, 2026 18:52

thieman changed the title ~~fix(correctness): filter agent-internal service checks from dsd-service-checks analysis [#1576]~~ fix(correctness): filter agent-internal service checks from dsd-service-checks analysis May 4, 2026

gh-worker-dd-devflow-36fce6 Bot added the mergequeue-status: queued label, then added mergequeue-status: in_progress and removed mergequeue-status: queued May 4, 2026

gh-worker-dd-mergequeue-cf854d Bot merged commit 200c2b8 into main May 4, 2026
71 of 72 checks passed

gh-worker-dd-devflow-36fce6 Bot added mergequeue-status: done and removed mergequeue-status: in_progress labels May 4, 2026

thieman mentioned this pull request May 4, 2026

The dsd-service-checks test is flaky when running the entire correctness test suite locally. #1576

Closed

fix(correctness): filter agent-internal service checks from dsd-service-checks analysis #1578
gh-worker-dd-mergequeue-cf854d[bot] merged 1 commit into main from thieman/fix-dsd-service-checks-flakiness
