feat(docs): add saluki testing strategy to docs by webern · Pull Request #1281 · DataDog/saluki

webern · 2026-03-30T16:19:25Z

This adds a new section to the Saluki developer documentation about testing.

Summary

As I'm onboarding to Saluki, I felt that a bit more documentation about the testing strategy would be helpful.

Change Type

Non-functional (chore, refactoring, docs)

How did you test this PR?

I ran vitepress, read it, and used an LLM to scrub it against reality. Hopefully it is factually correct!

References

Closes #1277

pr-commenter · 2026-03-30T16:24:53Z

Binary Size Analysis (Agent Data Plane)

Target: 63138ef (baseline) vs b3e0ff6 (comparison)
Analysis Type: Stripped binaries (debug symbols excluded)
Baseline Size: 26.20 MiB
Comparison Size: 26.20 MiB
Size Change: +0 B (+0.00%)
Pass/Fail Threshold: +5%
Result: PASSED ✅
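The pass/fail decision in the report above can be sketched as a simple percentage comparison. This is a minimal illustration, not the bot's actual implementation: the function name `binary_size_check` is hypothetical, and treating a change of exactly +5% as passing is an assumption, since the report does not say whether the threshold is inclusive.

```python
def binary_size_check(baseline_bytes: int, comparison_bytes: int,
                      threshold_pct: float = 5.0) -> tuple[int, float, bool]:
    """Return (size change in bytes, size change in percent, passed).

    Assumption: growth of exactly `threshold_pct` still passes.
    """
    change = comparison_bytes - baseline_bytes
    change_pct = 100.0 * change / baseline_bytes
    return change, change_pct, change_pct <= threshold_pct


# Both stripped binaries in this report are 26.20 MiB, so the change is zero.
size = round(26.20 * 1024 * 1024)
change, change_pct, passed = binary_size_check(size, size)
print(f"Size Change: {change:+d} B ({change_pct:+.2f}%), passed={passed}")
```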

Changes by Module

Module	File Size	Symbols

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [ = ]       0  [ = ]       0    TOTAL

tobz

Some nits, but still planning to take another pass over this to make sure it's copacetic.

tobz · 2026-03-30T16:27:12Z

+These tests serve to answer the question: *Does ADP produce the same output as the Datadog agent for a given workload?*
+
+To answer this question, a correctness test runs ADP and the agent side-by-side in containers and compares their output


Style nit: we always capitalize Datadog Agent or Agent.

(This reminds me that I have a stale PR for adding Vale configuration to flag these spelling/grammar/stylistic lints automatically... I should get back to that 😅)

tobz · 2026-03-30T16:28:42Z

+  - millstone -> adp -> datadog-intake
+  - millstone -> datadog-agent -> datadog-intake


These should have backticks around them since they're representative of actual crates/binaries. Also, change `adp` to ADP and `datadog-agent` to Datadog Agent.

tobz · 2026-03-30T16:29:23Z

+Integration tests run a containerized ADP instance and assert high-level invariants: process stability, expected log
+output, port availability, exit behavior. They catch regressions from enabling new features or settings that cause
+crashes or early exits. They do not test output correctness. This type of test is often known as a "smoke test" but our
+original naming stuck even though correctness tests [above](#correctness-tests-ground-truth)


Even though correctness tests above... what? Feels like some words are missing here.

tobz · 2026-03-30T16:30:28Z

+
+CI compares current branch against merge-base of main — purely "has your change regressed or improved?"
+
+You can run experiments locally with `smp local run` to debug experiment configs without waiting for CI (single replicate,


Technically it's `smp local-run`.

pr-commenter · 2026-03-30T16:40:33Z

Regression Detector (Agent Data Plane)

Regression Detector Results

Run ID: 763a7736-ae2e-4cb6-9039-761f6f7fcaa2

Baseline: 63138ef
Comparison: b3e0ff6

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing `erratic: true` are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
❌	otlp_ingest_logs_5mb_memory	memory utilization	+5.01	[+4.52, +5.50]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_cpu	% cpu utilization	+0.16	[-4.79, +5.10]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_throughput	ingress throughput	+0.00	[-0.13, +0.13]	1	(metrics) (profiles) (logs)

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	dsd_uds_512kb_3k_contexts_cpu	% cpu utilization	+21.30	[-40.19, +82.79]	1	(metrics) (profiles) (logs)
❌	otlp_ingest_logs_5mb_memory	memory utilization	+5.01	[+4.52, +5.50]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_5mb_cpu	% cpu utilization	+4.09	[+1.95, +6.23]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_filtering_5mb_cpu	% cpu utilization	+3.25	[+0.86, +5.65]	1	(metrics) (profiles) (logs)
➖	dsd_uds_10mb_3k_contexts_cpu	% cpu utilization	+1.19	[-29.20, +31.58]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_metrics_5mb_cpu	% cpu utilization	+1.01	[-7.69, +9.71]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_filtering_5mb_memory	memory utilization	+0.76	[+0.42, +1.10]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_ultraheavy	memory utilization	+0.49	[+0.36, +0.62]	1	(metrics) (profiles) (logs)
➖	dsd_uds_500mb_3k_contexts_memory	memory utilization	+0.41	[+0.25, +0.58]	1	(metrics) (profiles) (logs)
➖	dsd_uds_500mb_3k_contexts_cpu	% cpu utilization	+0.38	[-1.01, +1.76]	1	(metrics) (profiles) (logs)
➖	dsd_uds_100mb_3k_contexts_memory	memory utilization	+0.32	[+0.14, +0.50]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_cpu	% cpu utilization	+0.16	[-4.79, +5.10]	1	(metrics) (profiles) (logs)
➖	dsd_uds_1mb_3k_contexts_cpu	% cpu utilization	+0.14	[-52.23, +52.51]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_transform_5mb_throughput	ingress throughput	+0.00	[-0.02, +0.02]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_filtering_5mb_throughput	ingress throughput	+0.00	[-0.02, +0.02]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_5mb_throughput	ingress throughput	+0.00	[-0.02, +0.02]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_throughput	ingress throughput	+0.00	[-0.13, +0.13]	1	(metrics) (profiles) (logs)
➖	dsd_uds_1mb_3k_contexts_throughput	ingress throughput	-0.00	[-0.06, +0.05]	1	(metrics) (profiles) (logs)
➖	dsd_uds_512kb_3k_contexts_throughput	ingress throughput	-0.00	[-0.06, +0.05]	1	(metrics) (profiles) (logs)
➖	dsd_uds_10mb_3k_contexts_throughput	ingress throughput	-0.01	[-0.15, +0.13]	1	(metrics) (profiles) (logs)
➖	dsd_uds_100mb_3k_contexts_throughput	ingress throughput	-0.01	[-0.04, +0.02]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_metrics_5mb_throughput	ingress throughput	-0.02	[-0.15, +0.11]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_heavy	memory utilization	-0.05	[-0.18, +0.09]	1	(metrics) (profiles) (logs)
➖	dsd_uds_512kb_3k_contexts_memory	memory utilization	-0.12	[-0.29, +0.05]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_transform_5mb_memory	memory utilization	-0.17	[-0.42, +0.08]	1	(metrics) (profiles) (logs)
➖	dsd_uds_1mb_3k_contexts_memory	memory utilization	-0.20	[-0.37, -0.02]	1	(metrics) (profiles) (logs)
➖	dsd_uds_10mb_3k_contexts_memory	memory utilization	-0.27	[-0.45, -0.08]	1	(metrics) (profiles) (logs)
➖	dsd_uds_500mb_3k_contexts_throughput	ingress throughput	-0.58	[-0.70, -0.45]	1	(metrics) (profiles) (logs)
➖	dsd_uds_100mb_3k_contexts_cpu	% cpu utilization	-0.63	[-6.64, +5.39]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_5mb_memory	memory utilization	-0.67	[-0.92, -0.42]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_idle	memory utilization	-0.75	[-0.79, -0.72]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_medium	memory utilization	-0.81	[-1.01, -0.62]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_low	memory utilization	-1.01	[-1.21, -0.82]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_metrics_5mb_memory	memory utilization	-1.04	[-1.27, -0.80]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_transform_5mb_cpu	% cpu utilization	-1.47	[-3.79, +0.84]	1	(metrics) (profiles) (logs)

Bounds Checks: ✅ Passed

perf	experiment	bounds_check_name	replicates_passed	observed_value	links
✅	quality_gates_rss_dsd_heavy	memory_usage	10/10	112.78MiB ≤ 140MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_low	memory_usage	10/10	33.56MiB ≤ 50MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_medium	memory_usage	10/10	55.62MiB ≤ 75MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_ultraheavy	memory_usage	10/10	169.01MiB ≤ 200MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_idle	memory_usage	10/10	21.11MiB ≤ 40MiB	(metrics) (profiles) (logs)

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we classify a change in performance as a "regression" -- a change worth investigating further -- if all of the following criteria are true:

- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
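
The three criteria above can be sketched as a small predicate. This is an illustrative sketch only, not the regression detector's code: the `ExperimentResult` type and `is_regression` function are hypothetical names, and the sketch ignores the direction of the change (the report separately distinguishes better from worse per optimization goal). The sample values are taken from the tables in this report; `otlp_ingest_logs_5mb_memory` is treated as erratic because it appears under "Experiments ignored for regressions".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentResult:
    name: str
    delta_mean_pct: float  # "Δ mean %"
    ci_low: float          # lower bound of "Δ mean % CI"
    ci_high: float         # upper bound of "Δ mean % CI"
    erratic: bool = False  # assumed to mirror `erratic: true` in the experiment settings


def is_regression(result: ExperimentResult, effect_size_tolerance: float = 5.0) -> bool:
    """Apply the three criteria: large enough, CI excludes zero, not erratic."""
    big_enough = abs(result.delta_mean_pct) >= effect_size_tolerance
    ci_excludes_zero = result.ci_low > 0.0 or result.ci_high < 0.0
    return big_enough and ci_excludes_zero and not result.erratic


results = [
    # Meets the first two criteria but is ignored as erratic.
    ExperimentResult("otlp_ingest_logs_5mb_memory", 5.01, 4.52, 5.50, erratic=True),
    # CI excludes zero, but the change is below the 5.00% tolerance.
    ExperimentResult("otlp_ingest_traces_5mb_cpu", 4.09, 1.95, 6.23),
    # Large mean change, but the CI contains zero.
    ExperimentResult("dsd_uds_512kb_3k_contexts_cpu", 21.30, -40.19, 82.79),
]

for r in results:
    print(r.name, is_regression(r))
```

Under these assumptions none of the three sample experiments counts as a regression, which is consistent with the "No significant changes detected" summary.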

webern · 2026-03-31T08:45:18Z

> Some nits, but still planning to take another pass over this to make sure it's copacetic.

This one addresses your feedback directly by hand (will follow up with use of the linting tools)

201f884

webern · 2026-03-31T08:50:38Z

webern force-pushed the matt.briggs/testing-doc branch from 201f884 to b3e0ff6

This is a rebase from main, and does not affect the testing.md file in this PR. (ignore this diff!)

webern · 2026-03-31T09:00:02Z

@tobz, when I opened the PR I hadn't used Vale. Now I'm trying to use it but I find this:

1. Vale.Terms — lowercase conventions (15 errors)

  The project's Vale config enforces lowercase for these terms in docs:

  ┌─────────────┬─────────┬──────────┐
  │    Line     │ Current │ Required │
  ├─────────────┼─────────┼──────────┤
  │ 3           │ Saluki  │ saluki   │
  ├─────────────┼─────────┼──────────┤
  │ 16          │ Linux   │ linux    │
  ├─────────────┼─────────┼──────────┤
  │ 20, 22, 102 │ ADP     │ adp      │
  ├─────────────┼─────────┼──────────┤
  │ 20, 22      │ Datadog │ datadog  │
  ├─────────────┼─────────┼──────────┤
  │ 86          │ Config  │ config   │
  ├─────────────┼─────────┼──────────┤
  │ 100         │ GitLab  │ gitlab   │
  └─────────────┴─────────┴──────────┘

Relevant Vale output:

  1. Line 3 — Vale.Terms: Use 'saluki' instead of 'Saluki'
  4. Line 16 — Vale.Terms: Use 'linux' instead of 'Linux'
  5. Line 20 — Vale.Terms: Use 'adp' instead of 'ADP'
  6. Line 20 — Vale.Terms: Use 'datadog' instead of 'Datadog'
  7. Line 22 — Vale.Terms: Use 'adp' instead of 'ADP'
  8. Line 22 — Vale.Terms: Use 'datadog' instead of 'Datadog'
  10. Line 86 — Vale.Terms: Use 'config' instead of 'Config'
  12. Line 100 — Vale.Terms: Use 'gitlab' instead of 'GitLab'
  13. Line 102 — Vale.Terms: Use 'adp' instead of 'ADP'
(...)

However, in your feedback you suggested changing adp to ADP. I'm looking for some guidance here. Thanks!

tobz · 2026-03-31T13:53:48Z

I would ignore `make check-docs` for right now. It still needs some tuning so that different rules don't clobber each other, which is what is happening here.

webern · 2026-04-01T14:30:57Z

webern force-pushed the matt.briggs/testing-doc branch from 84a0de6 to 9138988

Just a rebase on main

webern requested a review from a team as a code owner March 30, 2026 16:19

dd-octo-sts Bot added the area/docs Reference documentation. label Mar 30, 2026

tobz reviewed Mar 30, 2026

webern force-pushed the matt.briggs/testing-doc branch from 201f884 to b3e0ff6 Compare March 31, 2026 08:49

webern added 3 commits April 1, 2026 16:29

feat(docs): add saluki testing strategy to docs

386dc9d

This adds a new section to the Saluki developer documentation about testing.

chore(docs): address pr feedback from toby

f4b9943

Addresses the first round of feedback from Toby

chore: fix least controversial vale errors

9138988

webern force-pushed the matt.briggs/testing-doc branch from 84a0de6 to 9138988 Compare April 1, 2026 14:29

tobz approved these changes Apr 2, 2026

webern merged commit d80c47e into main Apr 2, 2026
57 of 59 checks passed

webern deleted the matt.briggs/testing-doc branch April 2, 2026 16:19
