Pull request overview
This PR adds a new correctness test case for DogStatsD tag filtering. The test verifies that tag filtering rules are applied identically by the Datadog Agent and by ADP (Agent Data Plane). The PR introduces test configuration files, updates the CI/CD pipeline, and refactors the metrics normalization logic so that metrics with the same tags in a different order are recognized and merged as a single metric.
Changes:
- Added a new correctness test case for DogStatsD tag filtering (dsd-tag-filterlist)
- Refactored metrics normalization to normalize context tags before grouping metrics, ensuring tag order independence
- Added unit test to verify correct metric deduplication when tags are in different order
- Added Makefile target and GitLab CI job to execute the new test
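The tag-order normalization described above can be illustrated with a minimal sketch. This is not the code from `types.rs`: the `Sample` type and the `normalize_tags` and `merge_samples` functions are hypothetical names, and summing the values of merged samples is an assumption made only to keep the example small.

```rust
use std::collections::BTreeMap;

/// Hypothetical metric sample: a name, its tags, and a value.
#[derive(Debug, Clone)]
struct Sample {
    name: String,
    tags: Vec<String>,
    value: f64,
}

/// Sort the tags so that tag order no longer affects the grouping key.
fn normalize_tags(tags: &[String]) -> Vec<String> {
    let mut normalized = tags.to_vec();
    normalized.sort();
    normalized
}

/// Group samples by (name, normalized tags).
/// Assumption: values of samples sharing a key are summed.
fn merge_samples(samples: &[Sample]) -> BTreeMap<(String, Vec<String>), f64> {
    let mut merged = BTreeMap::new();
    for sample in samples {
        // Normalize BEFORE building the key, so ["b:2", "a:1"] and
        // ["a:1", "b:2"] land in the same entry.
        let key = (sample.name.clone(), normalize_tags(&sample.tags));
        *merged.entry(key).or_insert(0.0) += sample.value;
    }
    merged
}

fn main() {
    let samples = vec![
        Sample {
            name: "requests".to_string(),
            tags: vec!["b:2".to_string(), "a:1".to_string()],
            value: 1.0,
        },
        Sample {
            name: "requests".to_string(),
            tags: vec!["a:1".to_string(), "b:2".to_string()],
            value: 2.0,
        },
    ];

    for ((name, tags), value) in merge_samples(&samples) {
        println!("{name} {tags:?} = {value}");
    }
}
```

If the key were built from the raw tag vectors instead, the two samples above would be treated as separate metrics, which is the kind of spurious mismatch the refactor is meant to avoid.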
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| test/correctness/dsd-tag-filterlist/config.yaml | Test orchestration configuration defining baseline Agent (public Datadog Agent v7.76.2) and comparison target (ADP) |
| test/correctness/dsd-tag-filterlist/datadog.yaml | Agent configuration with specific tag filtering rules (exclude/include/miss patterns) |
| test/correctness/dsd-tag-filterlist/millstone.yaml | Metric generation configuration with fixed metric names and tag values to test filtering behavior |
| Makefile | Added new test-correctness-dsd-tag-filterlist target following existing pattern |
| .gitlab/e2e.yml | Added GitLab CI job for the new test, correctly not extending .test-correctness-adp-baseline since it uses a different baseline image |
| bin/correctness/ground-truth/src/analysis/metrics/types.rs | Refactored normalization logic and added unit test for tag order normalization |
751d3a2 to 6fd3517
6fd3517 to 35ed30b
a5b6896 to 3afa1d7
Binary Size Analysis (Agent Data Plane)
Target: 5612418 (baseline) vs dadad04 (comparison)
| Module | File Size | Symbols |
|---|---|---|
Detailed Symbol Changes
FILE SIZE VM SIZE
-------------- --------------
[ = ] 0 [ = ] 0 TOTAL
Regression Detector (Agent Data Plane)
Regression Detector Results
Run ID: 26e9fd2d-9d17-4c3d-8cd6-7bb6353ab586
Baseline: 5612418
Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ❌ | otlp_ingest_logs_5mb_memory | memory utilization | +13.77 | [+13.34, +14.20] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | +0.02 | [-0.11, +0.15] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | -0.14 | [-5.24, +4.95] | 1 | (metrics) (profiles) (logs) |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ❌ | otlp_ingest_logs_5mb_memory | memory utilization | +13.77 | [+13.34, +14.20] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_cpu | % cpu utilization | +3.55 | [-28.51, +35.61] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_cpu | % cpu utilization | +2.31 | [+0.19, +4.43] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_cpu | % cpu utilization | +2.22 | [-0.13, +4.58] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_cpu | % cpu utilization | +1.31 | [-53.48, +56.09] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_cpu | % cpu utilization | +0.42 | [-1.73, +2.56] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_memory | memory utilization | +0.31 | [+0.13, +0.49] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_medium | memory utilization | +0.31 | [+0.11, +0.50] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_cpu | % cpu utilization | +0.24 | [-56.89, +57.36] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_throughput | ingress throughput | +0.19 | [+0.05, +0.32] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_throughput | ingress throughput | +0.04 | [-0.09, +0.16] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | +0.02 | [-0.11, +0.15] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_throughput | ingress throughput | +0.00 | [-0.05, +0.06] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_throughput | ingress throughput | +0.00 | [-0.05, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_throughput | ingress throughput | +0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_throughput | ingress throughput | -0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_throughput | ingress throughput | -0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.04, +0.03] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.13, +0.12] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_memory | memory utilization | -0.01 | [-0.18, +0.15] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_heavy | memory utilization | -0.03 | [-0.17, +0.10] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_memory | memory utilization | -0.06 | [-0.23, +0.11] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_memory | memory utilization | -0.09 | [-0.27, +0.10] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_idle | memory utilization | -0.10 | [-0.13, -0.08] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_memory | memory utilization | -0.11 | [-0.35, +0.14] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_cpu | % cpu utilization | -0.13 | [-1.59, +1.33] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_memory | memory utilization | -0.13 | [-0.30, +0.04] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | -0.14 | [-5.24, +4.95] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_memory | memory utilization | -0.18 | [-0.51, +0.16] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_memory | memory utilization | -0.38 | [-0.64, -0.13] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_low | memory utilization | -0.40 | [-0.59, -0.21] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_ultraheavy | memory utilization | -0.61 | [-0.74, -0.48] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_cpu | % cpu utilization | -0.64 | [-6.78, +5.50] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_memory | memory utilization | -1.01 | [-1.24, -0.78] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_cpu | % cpu utilization | -1.93 | [-7.92, +4.06] | 1 | (metrics) (profiles) (logs) |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | quality_gates_rss_dsd_heavy | memory_usage | 10/10 | 115.48MiB ≤ 140MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_low | memory_usage | 10/10 | 33.98MiB ≤ 50MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_medium | memory_usage | 10/10 | 54.18MiB ≤ 75MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | 169.87MiB ≤ 200MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_idle | memory_usage | 10/10 | 21.28MiB ≤ 40MiB | (metrics) (profiles) (logs) |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we flag a change in performance as a "regression" (a change worth investigating further) if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
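The three criteria above can be written as a small predicate. This is only a sketch of the stated rule, not the regression detector's actual implementation: the `ExperimentResult` struct, its field names, and `is_regression` are hypothetical, and the report does not say which experiments are marked "erratic", so that flag is supplied by hand here.

```rust
/// Hypothetical summary of one experiment row from the tables above.
struct ExperimentResult {
    delta_mean_pct: f64, // "Δ mean %"
    ci_low: f64,         // lower bound of "Δ mean % CI"
    ci_high: f64,        // upper bound of "Δ mean % CI"
    erratic: bool,       // assumption: taken from the experiment's configuration
}

/// Effect size tolerance stated in the report: |Δ mean %| ≥ 5.00%.
const EFFECT_SIZE_TOLERANCE_PCT: f64 = 5.0;

/// Returns true only when all three criteria hold.
fn is_regression(result: &ExperimentResult) -> bool {
    let big_enough = result.delta_mean_pct.abs() >= EFFECT_SIZE_TOLERANCE_PCT;
    // The interval excludes zero when it lies entirely on one side of it.
    let ci_excludes_zero = result.ci_low > 0.0 || result.ci_high < 0.0;
    big_enough && ci_excludes_zero && !result.erratic
}

fn main() {
    // Values copied from the dsd_uds_10mb_3k_contexts_cpu row:
    // small effect and an interval that straddles zero.
    let noisy = ExperimentResult {
        delta_mean_pct: 3.55,
        ci_low: -28.51,
        ci_high: 35.61,
        erratic: false,
    };
    println!("flagged: {}", is_regression(&noisy));
}
```

Note that a row such as quality_gates_rss_idle (-0.10, CI [-0.13, -0.08]) has an interval that excludes zero but still fails the first criterion, because the effect is far below the 5.00% tolerance.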
webern left a comment
Diff looks good now, only test code 👍
Summary
PR title
Change Type
How did you test this PR?
CI
References