feat(docs): add saluki testing strategy to docs #1281

Merged: webern merged 3 commits into main from matt.briggs/testing-doc on Apr 2, 2026

Conversation

@webern
Contributor

@webern webern commented Mar 30, 2026

This adds a new section to the Saluki developer documentation about testing.

Summary

As I'm onboarding to Saluki, I felt that a bit more about the testing strategy could be helpful.

Change Type

  • Non-functional (chore, refactoring, docs)

How did you test this PR?

I ran vitepress, read it, and used an LLM to scrub it against reality. Hopefully it is factually correct!

References

Closes #1277

@webern webern requested a review from a team as a code owner March 30, 2026 16:19
@dd-octo-sts dd-octo-sts Bot added the area/docs Reference documentation. label Mar 30, 2026
@pr-commenter

pr-commenter Bot commented Mar 30, 2026

Binary Size Analysis (Agent Data Plane)

Target: 63138ef (baseline) vs b3e0ff6 (comparison) diff
Analysis Type: Stripped binaries (debug symbols excluded)
Baseline Size: 26.20 MiB
Comparison Size: 26.20 MiB
Size Change: +0 B (+0.00%)
Pass/Fail Threshold: +5%
Result: PASSED ✅

Changes by Module

Module File Size Symbols

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [ = ]       0  [ = ]       0    TOTAL
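
The pass/fail threshold in the size analysis above can be illustrated with a minimal sketch. This is not the bot's actual implementation: the function name `size_change` and the choice that growth of exactly 5% still passes are assumptions made for illustration.

```python
MIB = 1024 * 1024


def size_change(baseline_bytes: int, comparison_bytes: int, threshold_pct: float = 5.0):
    """Return (delta_bytes, delta_pct, passed) for a stripped-binary size comparison."""
    delta = comparison_bytes - baseline_bytes
    pct = 100.0 * delta / baseline_bytes
    # Assumption: only growth strictly above the threshold fails; the real tool may differ.
    return delta, pct, pct <= threshold_pct


# Both builds in the report above are 26.20 MiB, so the delta is zero.
baseline = round(26.20 * MIB)
delta, pct, passed = size_change(baseline, baseline)
print(f"{delta:+d} B ({pct:+.2f}%) -> {'PASSED' if passed else 'FAILED'}")
# prints "+0 B (+0.00%) -> PASSED"
```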

Member

@tobz tobz left a comment

Some nits, but still planning to take another pass over this to make sure it's copacetic.

Comment thread docs/development/testing.md Outdated
Comment on lines +20 to +22
These tests serve to answer the question: *Does ADP produce the same output as the Datadog agent for a given workload?*

To answer this question, a correctness test runs ADP and the agent side-by-side in containers and compares their output
Member

Style nit: we always capitalize Datadog Agent or Agent.

(This reminds me that I have a stale PR for adding Vale configuration to flag these spelling/grammar/stylistic lints automatically.... I should get back to that 😅)

Comment thread docs/development/testing.md Outdated
Comment on lines +49 to +50
- millstone -> adp -> datadog-intake
- millstone -> datadog-agent -> datadog-intake
Member

Should have backticks around these since they're representative of actual crates/binaries, also change adp to ADP and datadog-agent to Datadog Agent

Comment thread docs/development/testing.md Outdated
Integration tests run a containerized ADP instance and assert high-level invariants: process stability, expected log
output, port availability, exit behavior. They catch regressions from enabling new features or settings that cause
crashes or early exits. They do not test output correctness. This type of test is often known as a "smoke test" but our
original naming stuck even though correctness tests [above](#correctness-tests-ground-truth)
Member

Even though correctness tests above... what? Feels like some words are missing here.

Comment thread docs/development/testing.md Outdated

CI compares current branch against merge-base of main — purely "has your change regressed or improved?"

You can run experiments locally with `smp local run` to debug experiment configs without waiting for CI (single replicate,
Member

Technically it's smp local-run.

@pr-commenter

pr-commenter Bot commented Mar 30, 2026

Regression Detector (Agent Data Plane)

Regression Detector Results

Run ID: 763a7736-ae2e-4cb6-9039-761f6f7fcaa2

Baseline: 63138ef
Comparison: b3e0ff6
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| | otlp_ingest_logs_5mb_memory | memory utilization | +5.01 | [+4.52, +5.50] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_logs_5mb_cpu | % cpu utilization | +0.16 | [-4.79, +5.10] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_logs_5mb_throughput | ingress throughput | +0.00 | [-0.13, +0.13] | 1 | (metrics) (profiles) (logs) |

Fine details of change detection per experiment

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| | dsd_uds_512kb_3k_contexts_cpu | % cpu utilization | +21.30 | [-40.19, +82.79] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_logs_5mb_memory | memory utilization | +5.01 | [+4.52, +5.50] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_5mb_cpu | % cpu utilization | +4.09 | [+1.95, +6.23] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_ottl_filtering_5mb_cpu | % cpu utilization | +3.25 | [+0.86, +5.65] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_10mb_3k_contexts_cpu | % cpu utilization | +1.19 | [-29.20, +31.58] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_metrics_5mb_cpu | % cpu utilization | +1.01 | [-7.69, +9.71] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_ottl_filtering_5mb_memory | memory utilization | +0.76 | [+0.42, +1.10] | 1 | (metrics) (profiles) (logs) |
| | quality_gates_rss_dsd_ultraheavy | memory utilization | +0.49 | [+0.36, +0.62] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_500mb_3k_contexts_memory | memory utilization | +0.41 | [+0.25, +0.58] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_500mb_3k_contexts_cpu | % cpu utilization | +0.38 | [-1.01, +1.76] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_100mb_3k_contexts_memory | memory utilization | +0.32 | [+0.14, +0.50] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_logs_5mb_cpu | % cpu utilization | +0.16 | [-4.79, +5.10] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_1mb_3k_contexts_cpu | % cpu utilization | +0.14 | [-52.23, +52.51] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_ottl_transform_5mb_throughput | ingress throughput | +0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_ottl_filtering_5mb_throughput | ingress throughput | +0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_5mb_throughput | ingress throughput | +0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_logs_5mb_throughput | ingress throughput | +0.00 | [-0.13, +0.13] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_1mb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.06, +0.05] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_512kb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.06, +0.05] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_10mb_3k_contexts_throughput | ingress throughput | -0.01 | [-0.15, +0.13] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_100mb_3k_contexts_throughput | ingress throughput | -0.01 | [-0.04, +0.02] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_metrics_5mb_throughput | ingress throughput | -0.02 | [-0.15, +0.11] | 1 | (metrics) (profiles) (logs) |
| | quality_gates_rss_dsd_heavy | memory utilization | -0.05 | [-0.18, +0.09] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_512kb_3k_contexts_memory | memory utilization | -0.12 | [-0.29, +0.05] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_ottl_transform_5mb_memory | memory utilization | -0.17 | [-0.42, +0.08] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_1mb_3k_contexts_memory | memory utilization | -0.20 | [-0.37, -0.02] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_10mb_3k_contexts_memory | memory utilization | -0.27 | [-0.45, -0.08] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_500mb_3k_contexts_throughput | ingress throughput | -0.58 | [-0.70, -0.45] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_100mb_3k_contexts_cpu | % cpu utilization | -0.63 | [-6.64, +5.39] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_5mb_memory | memory utilization | -0.67 | [-0.92, -0.42] | 1 | (metrics) (profiles) (logs) |
| | quality_gates_rss_idle | memory utilization | -0.75 | [-0.79, -0.72] | 1 | (metrics) (profiles) (logs) |
| | quality_gates_rss_dsd_medium | memory utilization | -0.81 | [-1.01, -0.62] | 1 | (metrics) (profiles) (logs) |
| | quality_gates_rss_dsd_low | memory utilization | -1.01 | [-1.21, -0.82] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_metrics_5mb_memory | memory utilization | -1.04 | [-1.27, -0.80] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_ottl_transform_5mb_cpu | % cpu utilization | -1.47 | [-3.79, +0.84] | 1 | (metrics) (profiles) (logs) |

Bounds Checks: ✅ Passed

| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| | quality_gates_rss_dsd_heavy | memory_usage | 10/10 | 112.78MiB ≤ 140MiB | (metrics) (profiles) (logs) |
| | quality_gates_rss_dsd_low | memory_usage | 10/10 | 33.56MiB ≤ 50MiB | (metrics) (profiles) (logs) |
| | quality_gates_rss_dsd_medium | memory_usage | 10/10 | 55.62MiB ≤ 75MiB | (metrics) (profiles) (logs) |
| | quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | 169.01MiB ≤ 200MiB | (metrics) (profiles) (logs) |
| | quality_gates_rss_idle | memory_usage | 10/10 | 21.11MiB ≤ 40MiB | (metrics) (profiles) (logs) |
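
As a rough illustration of how a bounds-check row above could be evaluated, here is a small sketch. The helper name `bounds_check_passed` is hypothetical, and the rule that every replicate must pass is an assumption; the report does not state how the detector aggregates replicates.

```python
import re

# Matches strings such as "112.78MiB ≤ 140MiB", as shown in the observed_value column.
_BOUND = re.compile(r"^\s*([\d.]+)MiB\s*≤\s*([\d.]+)MiB\s*$")


def bounds_check_passed(observed_value: str, replicates_passed: str) -> bool:
    """Return True when the observed value is within its limit and all replicates passed."""
    match = _BOUND.match(observed_value)
    if match is None:
        raise ValueError(f"unrecognized observed_value: {observed_value!r}")
    observed, limit = float(match.group(1)), float(match.group(2))
    passed, total = (int(part) for part in replicates_passed.split("/"))
    # Assumption: the check requires every replicate to pass, not just a majority.
    return observed <= limit and passed == total


print(bounds_check_passed("112.78MiB ≤ 140MiB", "10/10"))
```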

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
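
The three criteria above can be sketched as a single predicate. This is a hedged illustration rather than the regression detector's real code: the `Experiment` class and `is_regression` function are hypothetical names, and marking `otlp_ingest_logs_5mb_memory` as erratic is inferred from its appearance in the "ignored for regressions" table.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    name: str
    delta_mean_pct: float  # estimated Δ mean %
    ci_low: float          # lower bound of the 90% CI
    ci_high: float         # upper bound of the 90% CI
    erratic: bool = False  # taken from the experiment's configuration


def is_regression(exp: Experiment, tolerance_pct: float = 5.0) -> bool:
    """Apply the three stated criteria; all must hold."""
    big_enough = abs(exp.delta_mean_pct) >= tolerance_pct
    ci_excludes_zero = not (exp.ci_low <= 0.0 <= exp.ci_high)
    return big_enough and ci_excludes_zero and not exp.erratic


examples = [
    # Large and significant, but assumed erratic, so it is ignored.
    Experiment("otlp_ingest_logs_5mb_memory", 5.01, 4.52, 5.50, erratic=True),
    # Large estimate, but the interval contains zero.
    Experiment("dsd_uds_512kb_3k_contexts_cpu", 21.30, -40.19, 82.79),
    # Interval excludes zero, but the change is below the 5% tolerance.
    Experiment("otlp_ingest_traces_5mb_cpu", 4.09, 1.95, 6.23),
]
for exp in examples:
    print(exp.name, is_regression(exp))
```

Under these assumptions none of the three rows is flagged, which is consistent with the "No significant changes detected" summary.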

@webern
Contributor Author

webern commented Mar 31, 2026

> Some nits, but still planning to take another pass over this to make sure it's copacetic.

This one addresses your feedback directly by hand (will follow up with use of the linting tools)

201f884

@webern webern force-pushed the matt.briggs/testing-doc branch from 201f884 to b3e0ff6 Compare March 31, 2026 08:49
@webern
Contributor Author

webern commented Mar 31, 2026

> webern force-pushed the matt.briggs/testing-doc branch from 201f884 to b3e0ff6

This is a rebase from main, and does not affect the testing.md file in this PR. (ignore this diff!)

@webern
Contributor Author

webern commented Mar 31, 2026

@tobz, when I opened the PR I hadn't used Vale. Now I'm trying to use it but I find this:

1. Vale.Terms — lowercase conventions (15 errors)

  The project's Vale config enforces lowercase for these terms in docs:

  ┌─────────────┬─────────┬──────────┐
  │    Line     │ Current │ Required │
  ├─────────────┼─────────┼──────────┤
  │ 3           │ Saluki  │ saluki   │
  ├─────────────┼─────────┼──────────┤
  │ 16          │ Linux   │ linux    │
  ├─────────────┼─────────┼──────────┤
  │ 20, 22, 102 │ ADP     │ adp      │
  ├─────────────┼─────────┼──────────┤
  │ 20, 22      │ Datadog │ datadog  │
  ├─────────────┼─────────┼──────────┤
  │ 86          │ Config  │ config   │
  ├─────────────┼─────────┼──────────┤
  │ 100         │ GitLab  │ gitlab   │
  └─────────────┴─────────┴──────────┘

Relevant Vale output:

  1. Line 3 — Vale.Terms: Use 'saluki' instead of 'Saluki'
  4. Line 16 — Vale.Terms: Use 'linux' instead of 'Linux'
  5. Line 20 — Vale.Terms: Use 'adp' instead of 'ADP'
  6. Line 20 — Vale.Terms: Use 'datadog' instead of 'Datadog'
  7. Line 22 — Vale.Terms: Use 'adp' instead of 'ADP'
  8. Line 22 — Vale.Terms: Use 'datadog' instead of 'Datadog'
  10. Line 86 — Vale.Terms: Use 'config' instead of 'Config'
  12. Line 100 — Vale.Terms: Use 'gitlab' instead of 'GitLab'
  13. Line 102 — Vale.Terms: Use 'adp' instead of 'ADP'
(...)

However, in your feedback you suggested changing adp to ADP. Looking for some guidance here. Thanks!

@tobz
Member

tobz commented Mar 31, 2026

I would ignore make check-docs for right now. It still needs some tuning so that different rules don't clobber each other, which is what is happening here.

webern added 3 commits April 1, 2026 16:29
This adds a new section to the Saluki developer documentation about
testing.
Addresses the first round of feedback from Toby
@webern webern force-pushed the matt.briggs/testing-doc branch from 84a0de6 to 9138988 Compare April 1, 2026 14:29
@webern
Contributor Author

webern commented Apr 1, 2026

> webern force-pushed the matt.briggs/testing-doc branch from 84a0de6 to 9138988

Just a rebase on main

@webern webern merged commit d80c47e into main Apr 2, 2026
57 of 59 checks passed
@webern webern deleted the matt.briggs/testing-doc branch April 2, 2026 16:19
dd-octo-sts Bot pushed a commit that referenced this pull request Apr 2, 2026
This adds a new section to the Saluki developer documentation about
testing.

## Summary

As I'm onboarding to Saluki, I felt that a bit more about the testing
strategy could be helpful.

## Change Type
- [X] Non-functional (chore, refactoring, **docs**)

## How did you test this PR?

I ran vitepress, read it, and used an LLM to scrub it against reality.
Hopefully it is factually correct!

## References

Closes #1277 d80c47e

Labels

area/docs Reference documentation.

Development

Successfully merging this pull request may close these issues.

docs: enhance developer docs with test onboarding

2 participants