feat(docs): add saluki testing strategy to docs #1281
Binary Size Analysis (Agent Data Plane)
Target: 63138ef (baseline) vs b3e0ff6 (comparison) diff

| Module | File Size | Symbols |
|---|---|---|

Detailed Symbol Changes
FILE SIZE VM SIZE
-------------- --------------
[ = ] 0 [ = ] 0 TOTAL
tobz left a comment
Some nits, but still planning to take another pass over this to make sure it's copacetic.
> These tests serve to answer the question: *Does ADP produce the same output as the Datadog agent for a given workload?*
>
> To answer this question, a correctness test runs ADP and the agent side-by-side in containers and compares their output
Style nit: we always capitalize Datadog Agent or Agent.
(This reminds me that I have a stale PR for adding Vale configuration to flag these spelling/grammar/stylistic lints automatically.... I should get back to that 😅)
> - millstone -> adp -> datadog-intake
> - millstone -> datadog-agent -> datadog-intake
Should have backticks around these since they're representative of actual crates/binaries. Also change `adp` to ADP and `datadog-agent` to Datadog Agent.
> Integration tests run a containerized ADP instance and assert high-level invariants: process stability, expected log
> output, port availability, exit behavior. They catch regressions from enabling new features or settings that cause
> crashes or early exits. They do not test output correctness. This type of test is often known as a "smoke test" but our
> original naming stuck even though correctness tests [above](#correctness-tests-ground-truth)
Even though correctness tests above... what? Feels like some words are missing here.
> CI compares current branch against merge-base of main: purely "has your change regressed or improved?"
>
> You can run experiments locally with `smp local run` to debug experiment configs without waiting for CI (single replicate,
Regression Detector (Agent Data Plane)
Regression Detector Results
Run ID: 763a7736-ae2e-4cb6-9039-761f6f7fcaa2
Baseline: 63138ef
Optimization Goals: ✅ No significant changes detected

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ❌ | otlp_ingest_logs_5mb_memory | memory utilization | +5.01 | [+4.52, +5.50] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | +0.16 | [-4.79, +5.10] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | +0.00 | [-0.13, +0.13] | 1 | (metrics) (profiles) (logs) |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | dsd_uds_512kb_3k_contexts_cpu | % cpu utilization | +21.30 | [-40.19, +82.79] | 1 | (metrics) (profiles) (logs) |
| ❌ | otlp_ingest_logs_5mb_memory | memory utilization | +5.01 | [+4.52, +5.50] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_cpu | % cpu utilization | +4.09 | [+1.95, +6.23] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_cpu | % cpu utilization | +3.25 | [+0.86, +5.65] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_cpu | % cpu utilization | +1.19 | [-29.20, +31.58] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_cpu | % cpu utilization | +1.01 | [-7.69, +9.71] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_memory | memory utilization | +0.76 | [+0.42, +1.10] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_ultraheavy | memory utilization | +0.49 | [+0.36, +0.62] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_memory | memory utilization | +0.41 | [+0.25, +0.58] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_cpu | % cpu utilization | +0.38 | [-1.01, +1.76] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_memory | memory utilization | +0.32 | [+0.14, +0.50] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | +0.16 | [-4.79, +5.10] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_cpu | % cpu utilization | +0.14 | [-52.23, +52.51] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_throughput | ingress throughput | +0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_throughput | ingress throughput | +0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_throughput | ingress throughput | +0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | +0.00 | [-0.13, +0.13] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.06, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.06, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_throughput | ingress throughput | -0.01 | [-0.15, +0.13] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_throughput | ingress throughput | -0.01 | [-0.04, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_throughput | ingress throughput | -0.02 | [-0.15, +0.11] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_heavy | memory utilization | -0.05 | [-0.18, +0.09] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_memory | memory utilization | -0.12 | [-0.29, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_memory | memory utilization | -0.17 | [-0.42, +0.08] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_memory | memory utilization | -0.20 | [-0.37, -0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_memory | memory utilization | -0.27 | [-0.45, -0.08] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_throughput | ingress throughput | -0.58 | [-0.70, -0.45] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_cpu | % cpu utilization | -0.63 | [-6.64, +5.39] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_memory | memory utilization | -0.67 | [-0.92, -0.42] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_idle | memory utilization | -0.75 | [-0.79, -0.72] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_medium | memory utilization | -0.81 | [-1.01, -0.62] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_low | memory utilization | -1.01 | [-1.21, -0.82] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_memory | memory utilization | -1.04 | [-1.27, -0.80] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_cpu | % cpu utilization | -1.47 | [-3.79, +0.84] | 1 | (metrics) (profiles) (logs) |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | quality_gates_rss_dsd_heavy | memory_usage | 10/10 | 112.78MiB ≤ 140MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_low | memory_usage | 10/10 | 33.56MiB ≤ 50MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_medium | memory_usage | 10/10 | 55.62MiB ≤ 75MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | 169.01MiB ≤ 200MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_idle | memory_usage | 10/10 | 21.11MiB ≤ 40MiB | (metrics) (profiles) (logs) |
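
For readers new to the bounds-check table, here is a minimal Python sketch of how one row could be read. It is not the detector's implementation. The `BoundsCheck` type and its field names are hypothetical, and the rule that a check passes only when every replicate passes and the observed value is at or under the limit is an assumption inferred from the `10/10` and `≤` columns above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundsCheck:
    """One row of the bounds-check table (hypothetical shape, for illustration)."""
    experiment: str
    replicates_passed: int
    replicates_total: int
    observed_mib: float
    limit_mib: float

    def passed(self) -> bool:
        # Assumption: every replicate must pass and the observed value
        # must not exceed the configured limit.
        all_replicates_ok = self.replicates_passed == self.replicates_total
        return all_replicates_ok and self.observed_mib <= self.limit_mib

    def headroom_mib(self) -> float:
        # Distance between the observed value and the limit.
        return self.limit_mib - self.observed_mib


# Values copied from the table above.
CHECKS = [
    BoundsCheck("quality_gates_rss_dsd_heavy", 10, 10, 112.78, 140.0),
    BoundsCheck("quality_gates_rss_dsd_low", 10, 10, 33.56, 50.0),
    BoundsCheck("quality_gates_rss_dsd_medium", 10, 10, 55.62, 75.0),
    BoundsCheck("quality_gates_rss_dsd_ultraheavy", 10, 10, 169.01, 200.0),
    BoundsCheck("quality_gates_rss_idle", 10, 10, 21.11, 40.0),
]

if __name__ == "__main__":
    for check in CHECKS:
        status = "pass" if check.passed() else "fail"
        print(f"{check.experiment}: {status}, headroom {check.headroom_mib():.2f} MiB")
```
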
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide that a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
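
The three criteria above can be sketched in Python as follows. This is an illustrative reading of the explanation, not the detector's actual code: the `ExperimentResult` type, its field names, and `is_regression` are hypothetical, and the `erratic` flag stands in for whatever the experiment configuration really uses.

```python
from dataclasses import dataclass

# Effect size tolerance stated in the report: |Δ mean %| ≥ 5.00%.
EFFECT_SIZE_TOLERANCE_PCT = 5.0


@dataclass(frozen=True)
class ExperimentResult:
    """One row of the regression table (hypothetical shape, for illustration)."""
    name: str
    delta_mean_pct: float
    ci_low: float
    ci_high: float
    erratic: bool = False  # assumed to come from the experiment's configuration


def is_regression(result: ExperimentResult,
                  tolerance_pct: float = EFFECT_SIZE_TOLERANCE_PCT) -> bool:
    """Return True only if all three criteria from the explanation hold."""
    big_enough = abs(result.delta_mean_pct) >= tolerance_pct
    ci_excludes_zero = result.ci_low > 0 or result.ci_high < 0
    return big_enough and ci_excludes_zero and not result.erratic


# Three rows copied from the tables above.
ROWS = [
    ExperimentResult("otlp_ingest_logs_5mb_memory", 5.01, 4.52, 5.50),
    ExperimentResult("otlp_ingest_traces_5mb_cpu", 4.09, 1.95, 6.23),
    ExperimentResult("dsd_uds_512kb_3k_contexts_cpu", 21.30, -40.19, 82.79),
]

FLAGGED = [row.name for row in ROWS if is_regression(row)]
```

Only `otlp_ingest_logs_5mb_memory` is flagged. The traces CPU row has an interval that excludes zero but falls under the 5% tolerance, and the `dsd_uds_512kb` CPU row has a large mean change but an interval that spans zero.
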
This one addresses your feedback directly by hand (will follow up with use of the linting tools)
201f884 to b3e0ff6
This is a rebase from main, and does not affect the testing.md file in this PR. (ignore this diff!)
@tobz, when I opened the PR I hadn't used Vale. Now I'm trying to use it but I find this:
Relevant Vale output:
However, in your feedback you suggested changing adp to ADP. Looking for some guidance here. Thanks!
I would ignore
This adds a new section to the Saluki developer documentation about testing.
Addresses the first round of feedback from Toby
84a0de6 to 9138988
Just a rebase on main
This adds a new section to the Saluki developer documentation about testing.

## Summary
As I'm onboarding to Saluki, I felt that a bit more about the testing strategy could be helpful.

## Change Type
- [X] Non-functional (chore, refactoring, **docs**)

## How did you test this PR?
I ran vitepress, read it, and used an LLM to scrub it against reality. Hopefully it is factually correct!

## References
Closes #1277

d80c47e