chore(panoramic): dynamic dispatch with test trait and runner by webern · Pull Request #1552 · DataDog/saluki

webern · 2026-05-01T16:39:00Z

Summary

Test execution was organized around two known test types. This PR introduces a Test trait and central Runner so that new types can be added without modifying the dispatch or scheduling code.

This PR replaces the enum dispatch with a Test trait and a Runner struct that owns scheduling, timeout, and cancellation uniformly. Discovery returns Vec<Box<dyn Test>>. Each test receives a TestContext at execution time carrying a cancellation token and log directory. The Runner creates a per-test cancel token, enforces the timeout with a grace period, and propagates program-level cancellation to running tests.

TestResult no longer carries filesystem paths - the Runner computes log directories and passes them through events. This is a minor change, but TestResult carried log_dir to a lot of places where it wasn't needed.

The --no-logs flag was removed. I didn't see it being used anywhere in the codebase and couldn't think of a great usecase. Meanwhile it was a bit unwieldy to wire it through to where it needed to be.

These changes make it straightforward to add new test types or create test cases dynamically. They just need to implement the trait, be added to the runner, and they will run and report like any other test.

Change	Reason / Benefit
`Test` trait with `run(tctx)` signature	New test types implement one trait, no enum arms
`Runner` struct owns scheduling and timeout	Uniform timeout/cancel for all test types
`TestContext` passed to `run()`	No stored runtime state
Per-test `CancellationToken` created by Runner	Runner controls shutdown; tests respond cooperatively
Global cancel propagates to running tests	Users can cancel via TUI or ctrl-c
`log_dir` removed from `TestResult`	TestResult is a bit more pure without it
`--no-logs` CLI flag removed	I can put it back if needed, but it was a little wonky
`TestRunner` -> `IntegrationRunner`	Disambiguates from the generic `Runner` scheduler
`TestRunner` -> `CorrectnessRunner`	Disambiguates from the generic `Runner` scheduler
`DiscoveredTest` enum deleted	Replaced by trait objects; removes match arms
`RunArgs` builder for runner configuration	Reduces parameters for the run_tests function call

Summary

Change Type

Non-functional (chore, refactoring, docs)

How did you test this PR?

CI
Run locally on main
Run locally on branch (check for differences)

Using:

docker ps -q --filter name=airlock- | xargs docker rm -f
docker network prune -f

make \
  build-datadog-intake-image \
  build-millstone-image \
  build-datadog-agent-image-release \
  build-datadog-agent-image \
  build-adp-image && \
cargo run --release --bin panoramic -- \
  run -d test/integration/cases \
      -d test/correctness \
      --no-tui

References

Related, precursor to #1536

pr-commenter · 2026-05-01T16:42:14Z

Binary Size Analysis (Agent Data Plane)

Target: 507d913 (baseline) vs 2765cb1 (comparison) diff
Analysis Type: Stripped binaries (debug symbols excluded)
Baseline Size: 36.78 MiB
Comparison Size: 36.78 MiB
Size Change: -8 B (-0.00%)
Pass/Fail Threshold: +5%
Result: PASSED ✅

Changes by Module

Module	File Size	Symbols
`anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.17550754633244885205`	-130 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.2417440905338573713`	+129 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.17550754633244885205`	-115 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.2417440905338573713`	+114 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.17550754633244885205`	-109 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.2417440905338573713`	+108 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.17550754633244885205`	-97 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.2417440905338573713`	+96 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.17550754633244885205`	-95 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.2417440905338573713`	+94 B	1
`[Unmapped]`	-3 B	1

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW]    +129  [NEW]     +40    anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.2417440905338573713
  [NEW]    +114  [NEW]     +25    anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.2417440905338573713
  [NEW]    +108  [NEW]     +19    anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.2417440905338573713
  [NEW]     +96  [NEW]      +7    anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.2417440905338573713
  [NEW]     +94  [NEW]      +5    anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.2417440905338573713
  -0.0%      -3  [ = ]       0    [Unmapped]
  [DEL]     -95  [DEL]      -5    anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.17550754633244885205
  [DEL]     -97  [DEL]      -7    anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.17550754633244885205
  [DEL]    -109  [DEL]     -19    anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.17550754633244885205
  [DEL]    -115  [DEL]     -25    anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.17550754633244885205
  [DEL]    -130  [DEL]     -40    anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.17550754633244885205
  -0.0%      -8  [ = ]       0    TOTAL

pr-commenter · 2026-05-01T16:57:36Z

Regression Detector (Agent Data Plane)

Regression Detector Results

Run ID: efbe8889-3221-4d3a-bcec-83658a0074ac

Baseline: 507d913
Comparison: 2765cb1
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
❌	otlp_ingest_logs_5mb_memory	memory utilization	+14.68	[+14.31, +15.05]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_cpu	% cpu utilization	+2.81	[-1.65, +7.27]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_throughput	ingress throughput	+0.03	[-0.09, +0.16]	1	(metrics) (profiles) (logs)

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
❌	otlp_ingest_logs_5mb_memory	memory utilization	+14.68	[+14.31, +15.05]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_metrics_5mb_memory	memory utilization	+4.08	[+3.90, +4.26]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_cpu	% cpu utilization	+2.81	[-1.65, +7.27]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_metrics_5mb_cpu	% cpu utilization	+1.70	[-4.32, +7.73]	1	(metrics) (profiles) (logs)
➖	dsd_uds_100mb_3k_contexts_cpu	% cpu utilization	+1.27	[-4.07, +6.60]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_5mb_throughput	ingress throughput	+0.56	[+0.48, +0.64]	1	(metrics) (profiles) (logs)
➖	dsd_uds_500mb_3k_contexts_throughput	ingress throughput	+0.48	[+0.36, +0.60]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_transform_5mb_throughput	ingress throughput	+0.36	[+0.28, +0.43]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_low	memory utilization	+0.28	[+0.12, +0.44]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_5mb_cpu	% cpu utilization	+0.24	[-1.86, +2.34]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_5mb_memory	memory utilization	+0.16	[+0.00, +0.32]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_idle	memory utilization	+0.10	[+0.07, +0.14]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_heavy	memory utilization	+0.08	[-0.04, +0.20]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_throughput	ingress throughput	+0.03	[-0.09, +0.16]	1	(metrics) (profiles) (logs)
➖	dsd_uds_1mb_3k_contexts_throughput	ingress throughput	+0.00	[-0.06, +0.06]	1	(metrics) (profiles) (logs)
➖	dsd_uds_512kb_3k_contexts_throughput	ingress throughput	+0.00	[-0.05, +0.06]	1	(metrics) (profiles) (logs)
➖	dsd_uds_10mb_3k_contexts_throughput	ingress throughput	+0.00	[-0.17, +0.17]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_metrics_5mb_throughput	ingress throughput	-0.00	[-0.16, +0.15]	1	(metrics) (profiles) (logs)
➖	dsd_uds_100mb_3k_contexts_memory	memory utilization	-0.01	[-0.16, +0.14]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_filtering_5mb_memory	memory utilization	-0.02	[-0.26, +0.22]	1	(metrics) (profiles) (logs)
➖	dsd_uds_10mb_3k_contexts_memory	memory utilization	-0.02	[-0.17, +0.14]	1	(metrics) (profiles) (logs)
➖	dsd_uds_100mb_3k_contexts_throughput	ingress throughput	-0.02	[-0.05, +0.01]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_transform_5mb_cpu	% cpu utilization	-0.05	[-2.10, +1.99]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_ultraheavy	memory utilization	-0.06	[-0.19, +0.06]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_filtering_5mb_throughput	ingress throughput	-0.16	[-0.24, -0.09]	1	(metrics) (profiles) (logs)
➖	dsd_uds_1mb_3k_contexts_memory	memory utilization	-0.19	[-0.33, -0.05]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_transform_5mb_memory	memory utilization	-0.25	[-0.41, -0.10]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_medium	memory utilization	-0.27	[-0.44, -0.10]	1	(metrics) (profiles) (logs)
➖	dsd_uds_512kb_3k_contexts_memory	memory utilization	-0.37	[-0.52, -0.23]	1	(metrics) (profiles) (logs)
➖	dsd_uds_500mb_3k_contexts_memory	memory utilization	-0.54	[-0.68, -0.39]	1	(metrics) (profiles) (logs)
➖	dsd_uds_500mb_3k_contexts_cpu	% cpu utilization	-0.56	[-1.99, +0.87]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_filtering_5mb_cpu	% cpu utilization	-1.19	[-3.38, +0.99]	1	(metrics) (profiles) (logs)
➖	dsd_uds_512kb_3k_contexts_cpu	% cpu utilization	-2.19	[-58.64, +54.27]	1	(metrics) (profiles) (logs)
➖	dsd_uds_10mb_3k_contexts_cpu	% cpu utilization	-3.40	[-32.70, +25.91]	1	(metrics) (profiles) (logs)
➖	dsd_uds_1mb_3k_contexts_cpu	% cpu utilization	-3.64	[-56.11, +48.82]	1	(metrics) (profiles) (logs)

Bounds Checks: ✅ Passed

perf	experiment	bounds_check_name	replicates_passed	observed_value	links
✅	quality_gates_rss_dsd_heavy	memory_usage	10/10	121.82MiB ≤ 140MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_low	memory_usage	10/10	40.11MiB ≤ 50MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_medium	memory_usage	10/10	61.98MiB ≤ 75MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_ultraheavy	memory_usage	10/10	177.18MiB ≤ 200MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_idle	memory_usage	10/10	27.38MiB ≤ 40MiB	(metrics) (profiles) (logs)

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".

thieman · 2026-05-01T17:34:40Z

+        run_with_tui_consumer(rx, cancel_all, Some(log_dir), runner_handle).await
    } else {
-        run_with_logging_consumer(rx, &cmd, log_dir, runner_handle).await
+        run_with_logging_consumer(rx, &cmd, Some(log_dir), runner_handle).await


cancel doesn't go into this path?

No, I ran out of time trying to get this done, but I don't think we already had a signal capture for ctrl-c in the program and I... ran out of time. I think we just need to trap ctrl-c and wire it up.

Added this. It was a tiny thing, I was just feeling rushed on Friday to get the PR open.

thieman · 2026-05-01T17:39:11Z

+pub(crate) type EventSender = mpsc::UnboundedSender<TestEvent>;
+
+/// The amount of time a test has to clean up after cancellation or timing out.
+const GRACE_TIME: Duration = Duration::from_secs(5);


This will likely be pretty tight for kind tests that tear down the cluster after themselves, maybe give it 30s just to be accommodating?

Seems like a mistake, I thought I had 30 there. Oops, will change

thieman · 2026-05-01T17:40:33Z

+/// Handles test filtering, parallelism, timeout enforcement, cancellation propagation,
+/// log directory creation, and event dispatch. Individual test logic lives in the `Test`
+/// implementations.
+pub(crate) struct Runner {


Will this also hold options passed from the CLI? An example I was thinking of adding was a --no-delete-kind-cluster flag that would need to eventually get passed down to the correctness test

My idea for that is test.rs:

/// A directory from which files should be mounted into one or more of the domain-specific containers used in this /// test. // TODO: this is a hack introduced to support the PANORAMIC_DYNAMIC feature. Consider generalizing if needed. // For example: this could become runtime_config: HashMap<String, String> for shuttling domain specific items from // runtime to a test. mounts_dir: PathBuf,

Instead of mounts_dir there we could have runtime_config: HashMap<String, String>

And you could pass stuff that way.

But... I'm not sure --no-delete-kind-cluster at the CLI sounds great.

What I actually think is that there should probably be a whole new Test type for Kind cluster tests.

It bears thinking through design tradeoffs.

The issue with something like --no-delete-kind-cluster at the CLI is that now we are pushing extremely specific stuff into the CLI whereas panoramic might be capable of running tests that don't use containers or can't run on kind. So I think we should consider how we design things.

But... I'm not sure --no-delete-kind-cluster at the CLI sounds great.

It's not really a property of the test itself, it's a runtime option for how you'd like to actually execute the tests. In this case the idea would be "I'm futzing with this test locally and will need to run it a bunch, so let's skip starting and killing a full kind cluster each time." The CLI seems like the natural place for those, I shouldn't need to futz with YAML for a temporary change I don't want to commit.

What I actually think is that there should probably be a whole new Test type for Kind cluster tests.

In my current PR the config for it is limited to a single runtime field on the correctness test def, if it gets more complex than that could certainly consider splitting. But if they're 99% equivalent, seems like they can share.

I would think of that in terms of test lifecycle being something like this:

test resource set-up

test execution

test resource teardown

And what you're asking panoramic to do is --skip-teardown which could be applicable to other test types as well.

Edit: in fact, --skip-teardown can be important sometimes if you need to get in there and see what went wrong.

Skipping the kind cluster teardown specifically is more granular than that, skipping teardown entirely would also skip cleaning up any of the test-specific namespaces, pods, containers, etc. Not saying we shouldn't support --skip-teardown, we should, but I think a general mechanism for passing CLI flags down to Test implementations will be necessary in either case.

I think my ideal, when it comes to kubernetes, would be to create the cluster and tear it down without necessarily even caring about namespaces, etc. I can't remember if kind is one of those tools that needs a cluster to create your cluster... but in general, it's much cleaner if you can create and discard clusters.

That's the default, but if you're intending to run the test repeatedly on your Mac then that's going to add ~2 minutes to each test run that you'd probably rather not add. It's purely a convenience option.

Test execution was organized around two known test types. This PR introduces a `Test` trait and central `Runner` so that new types can be added without modifying the dispatch or scheduling code. This PR replaces the enum dispatch with a `Test` trait and a `Runner` struct that owns scheduling, timeout, and cancellation uniformly. Discovery returns `Vec<Box<dyn Test>>`. Each test receives a `TestContext` at execution time carrying a cancellation token and log directory. The `Runner` creates a per-test cancel token, enforces the timeout with a grace period, and propagates program-level cancellation to running tests. `TestResult` no longer carries filesystem paths - the `Runner` computes log directories and passes them through events. This is a minor change, but `TestResult` carried `log_dir` to a lot of places where it wasn't needed. The `--no-logs` flag was removed. I didn't see it being used anywhere in the codebase and couldn't think of a great usecase. Meanwhile it was a bit unwieldy to wire it through to where it needed to be. These changes make it straightforward to add new test types or create test cases dynamically. They just need to implement the trait, be added to the runner, and they will run and report like any other test. | Change | Reason / Benefit | |------------------------------------------------|--------------------------------------------------------| | `Test` trait with `run(tctx)` signature | New test types implement one trait, no enum arms | | `Runner` struct owns scheduling and timeout | Uniform timeout/cancel for all test types | | `TestContext` passed to `run()` | No stored runtime state | | Per-test `CancellationToken` created by Runner | Runner controls shutdown; tests respond cooperatively | | Global cancel propagates to running tests | Users can cancel via TUI or ctrl-c | | `log_dir` removed from `TestResult` | TestResult is a bit more pure without it | | `--no-logs` CLI flag removed | I can put it back if needed, but it was a little wonky | | `TestRunner` -> `IntegrationRunner` | Disambiguates from the generic `Runner` scheduler | | `TestRunner` -> `CorrectnessRunner` | Disambiguates from the generic `Runner` scheduler | | `DiscoveredTest` enum deleted | Replaced by trait objects; removes match arms | | `RunArgs` builder for runner configuration | Reduces parameters for the run_tests function call |

## Summary Test execution was organized around two known test types. This PR introduces a `Test` trait and central `Runner` so that new types can be added without modifying the dispatch or scheduling code. This PR replaces the enum dispatch with a `Test` trait and a `Runner` struct that owns scheduling, timeout, and cancellation uniformly. Discovery returns `Vec<Box<dyn Test>>`. Each test receives a `TestContext` at execution time carrying a cancellation token and log directory. The `Runner` creates a per-test cancel token, enforces the timeout with a grace period, and propagates program-level cancellation to running tests. `TestResult` no longer carries filesystem paths - the `Runner` computes log directories and passes them through events. This is a minor change, but `TestResult` carried `log_dir` to a lot of places where it wasn't needed. The `--no-logs` flag was removed. I didn't see it being used anywhere in the codebase and couldn't think of a great usecase. Meanwhile it was a bit unwieldy to wire it through to where it needed to be. These changes make it straightforward to add new test types or create test cases dynamically. They just need to implement the trait, be added to the runner, and they will run and report like any other test. | Change | Reason / Benefit | |------------------------------------------------|--------------------------------------------------------| | `Test` trait with `run(tctx)` signature | New test types implement one trait, no enum arms | | `Runner` struct owns scheduling and timeout | Uniform timeout/cancel for all test types | | `TestContext` passed to `run()` | No stored runtime state | | Per-test `CancellationToken` created by Runner | Runner controls shutdown; tests respond cooperatively | | Global cancel propagates to running tests | Users can cancel via TUI or ctrl-c | | `log_dir` removed from `TestResult` | TestResult is a bit more pure without it | | `--no-logs` CLI flag removed | I can put it back if needed, but it was a little wonky | | `TestRunner` -> `IntegrationRunner` | Disambiguates from the generic `Runner` scheduler | | `TestRunner` -> `CorrectnessRunner` | Disambiguates from the generic `Runner` scheduler | | `DiscoveredTest` enum deleted | Replaced by trait objects; removes match arms | | `RunArgs` builder for runner configuration | Reduces parameters for the run_tests function call | ## Summary  ## Change Type - [x] Non-functional (chore, refactoring, docs) ## How did you test this PR? - [x] CI - [ ] Run locally on main - [ ] Run locally on branch (check for differences) Using: ```bash docker ps -q --filter name=airlock- | xargs docker rm -f docker network prune -f make \ build-datadog-intake-image \ build-millstone-image \ build-datadog-agent-image-release \ build-datadog-agent-image \ build-adp-image && \ cargo run --release --bin panoramic -- \ run -d test/integration/cases \ -d test/correctness \ --no-tui ``` ## References Related, precursor to #1536 eb250ff

## Human Summary Adds the ability to run `kind` (kubernetes-in-docker) correctness tests. The actual goal here is to add multiple tests covering origin detection which has a lot of Kubernetes-specific behavior we'd like to cover. This PR does not add origin detection tests yet, but it does add a `dsd-plain-kind` test which is just the existing `dsd-plain` correctness test but modified to run under kind. This test will be deleted when we add the origin detection tests. Example run including the kind test is provided immediately below. You'll need to [install kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installing-from-release-binaries) first: `cargo run --profile release --package panoramic -- run -d test/correctness -d test/integration/cases -t basic-startup,dsd-plain,dsd-plain-kind` ## Summary - Adds a `runtime: kubernetes_in_docker` option to correctness test configs that runs test groups as multi-container Kubernetes pods inside a kind cluster, unlocking origin detection testing scenarios that are impossible in the plain Docker framework (real pod UIDs, containerd container IDs, K8s labels, External Data injection via `DD_EXTERNAL_ENV`) - Introduces `dsd-plain-kind` as the initial kind-based test — verifies the existing dsd-plain workload passes end-to-end through the kind path as a baseline before origin-detection-specific tests are layered on top - CI integration follows the same dynamic pipeline approach as existing Docker tests: the pipeline generator emits kind jobs automatically when `runtime: kubernetes_in_docker` is detected, using a new `.test-correctness-kind-definition` mixin that extends the existing Docker mixin with a longer timeout - `runtime` is now a required field in all correctness configs (no default); all existing tests explicitly declare their runtime - Rebased onto main after PR #1552 (dynamic dispatch with Test trait and Runner) and latest main (May 7) ## What changed **New runtime path (`k8s.rs`):** - Integrates with PR 1552's `Test` trait architecture — `Config::run(tctx: TestContext)` dispatches to the kind path when `runtime: kubernetes_in_docker` - Each test group (baseline + comparison) runs as a multi-container pod in its own namespace, labelled `created-by=panoramic-kind` for orphan cleanup - `datadog-intake`, `target` (agent), and `millstone` share a pod with an emptyDir at `/airlock` for the UDS socket - Config files injected via ConfigMap with `subPath` mounts so the agent's `/etc/datadog-agent/` remains writable (needed for `auth_token` creation) - Millstone wrapped in a socket-wait shell command so it doesn't send before the agent is ready - After the pod reaches Running, background tasks stream each container's logs to `<log-dir>/<baseline|comparison>/<container>.log`; ANSI escape codes stripped via `crate::utils::strip_ansi_codes` (no duplicate implementations) - Data collected via kube-rs port-forward to datadog-intake port 2049; forward cancelled via `CancellationToken` after collection - `wait_for_millstone_exit` bounded by `MILLSTONE_EXIT_TIMEOUT` (300s) - Both baseline and comparison errors reported when both groups fail simultaneously - Malformed env vars (missing `=`) in target config emit `warn!` instead of being silently dropped **ANSI stripping (both runtimes):** - `airlock/src/driver.rs` strips ANSI codes from Docker container log output - `k8s.rs` and `kind.rs` share `strip_ansi_codes` from `crate::utils`; `airlock` keeps its own copy as a separate crate - `kind create cluster` output is captured and emitted through tracing at debug level; raw emoji/ANSI output suppressed **Kind cluster lifecycle (`kind.rs`, `main.rs`, `runner.rs`, `test.rs`):** - Kind cluster setup only runs when the selected test set includes at least one `kubernetes_in_docker` test; running `-t dsd-plain` never touches kind - Kind cluster setup runs as a background task — Docker-runtime tests start immediately without waiting - Kind tests wait for the cluster-ready signal **before acquiring a concurrency slot**, gated on `test.runtime() == "kubernetes_in_docker"` so Docker tests are completely unaffected; each task holds its own cloned `watch::Receiver` with independent "last seen" state — no mutex needed - The wait loop uses `tokio::select!` on the cancellation token so Ctrl-C during cluster setup doesn't deadlock kind test futures; both `run_parallel` and `run_fail_fast` perform this wait - `check_kind_installed` verifies the exit code in addition to binary presence - A warning is emitted when kind setup fails after cluster creation, alerting users to a potentially dangling cluster - Kind setup emits `TestEvent::StatusLine` messages visible in both TUI and logging modes - panoramic manages the full kind cluster lifecycle: creates if absent (reuses if present), pulls images only if not already in the local daemon (pull failure is fatal), loads images in parallel, deletes cluster after tests - `--no-delete-kind-cluster` keeps the cluster alive between runs (useful locally) - `--kind-cluster-name` overrides the default (`saluki-correctness`) - Flush wait changed from 32s to 30s (`FLUSH_WAIT: Duration`) **CI (`.gitlab/correctness-mixins.yml`, `generate-correctness-pipeline.sh`):** - `.test-correctness-kind-definition` extends `.test-correctness-definition` with a longer timeout — all cluster/image management is inside panoramic - Pipeline generator emits kind jobs via the kind mixin; mixin selection driven by `test.runtime()` from the `Test` trait **kind pre-installed in `SALUKI_BUILD_CI_IMAGE`:** - `.ci/install-kind.sh` installs kind in the same `RUN` layer as `install-docker-cli.sh`, with per-arch SHA256 checksums hardcoded in the script **Explicitness:** - `Runtime` enum has no default; all correctness configs have an explicit `runtime:` field (including `dsd-tag-filterlist` from PR 1552) **Local dev:** - `cargo run --profile release --package panoramic -- run -d test/correctness -t dsd-plain-kind --no-tui --no-delete-kind-cluster` - `make clean-kind` / `make clean-correctness` **Dependencies:** kube 0.93 (client + rustls-tls + ws), k8s-openapi 0.22, tokio-util compat; RUSTSEC-2025-0134 ignored (rustls-pemfile, transitive via kube) ## Test plan - [x] `dsd-plain-kind` passes locally - [x] Running `-t dsd-plain` (Docker test) does not trigger kind cluster setup - [x] Docker and kind tests run concurrently — Docker tests start immediately, kind tests wait without holding concurrency slots - [x] Ctrl-C during cluster setup doesn't deadlock kind test futures - [x] Kind setup progress visible in both TUI and logging modes via `TestEvent::StatusLine` - [x] Container logs are plain text without ANSI codes (both runtimes) - [x] All correctness tests discovered with explicit runtime fields (including from latest main) - [x] Branch rebased cleanly onto main post-PR-1552 - [ ] `SALUKI_BUILD_CI_IMAGE` rebuilt with kind (trigger `generate-build-ci-image` via Run Pipeline on this branch) - [ ] CI pipeline generates `test-correctness-dsd-plain-kind` job - [ ] Existing Docker correctness tests unaffected 🤖 Generated with [Claude Code](https://claude.ai/claude-code) Co-authored-by: travis.thieman <travis.thieman@datadoghq.com>

## Human Summary Adds the ability to run `kind` (kubernetes-in-docker) correctness tests. The actual goal here is to add multiple tests covering origin detection which has a lot of Kubernetes-specific behavior we'd like to cover. This PR does not add origin detection tests yet, but it does add a `dsd-plain-kind` test which is just the existing `dsd-plain` correctness test but modified to run under kind. This test will be deleted when we add the origin detection tests. Example run including the kind test is provided immediately below. You'll need to [install kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installing-from-release-binaries) first: `cargo run --profile release --package panoramic -- run -d test/correctness -d test/integration/cases -t basic-startup,dsd-plain,dsd-plain-kind` ## Summary - Adds a `runtime: kubernetes_in_docker` option to correctness test configs that runs test groups as multi-container Kubernetes pods inside a kind cluster, unlocking origin detection testing scenarios that are impossible in the plain Docker framework (real pod UIDs, containerd container IDs, K8s labels, External Data injection via `DD_EXTERNAL_ENV`) - Introduces `dsd-plain-kind` as the initial kind-based test — verifies the existing dsd-plain workload passes end-to-end through the kind path as a baseline before origin-detection-specific tests are layered on top - CI integration follows the same dynamic pipeline approach as existing Docker tests: the pipeline generator emits kind jobs automatically when `runtime: kubernetes_in_docker` is detected, using a new `.test-correctness-kind-definition` mixin that extends the existing Docker mixin with a longer timeout - `runtime` is now a required field in all correctness configs (no default); all existing tests explicitly declare their runtime - Rebased onto main after PR #1552 (dynamic dispatch with Test trait and Runner) and latest main (May 7) ## What changed **New runtime path (`k8s.rs`):** - Integrates with PR 1552's `Test` trait architecture — `Config::run(tctx: TestContext)` dispatches to the kind path when `runtime: kubernetes_in_docker` - Each test group (baseline + comparison) runs as a multi-container pod in its own namespace, labelled `created-by=panoramic-kind` for orphan cleanup - `datadog-intake`, `target` (agent), and `millstone` share a pod with an emptyDir at `/airlock` for the UDS socket - Config files injected via ConfigMap with `subPath` mounts so the agent's `/etc/datadog-agent/` remains writable (needed for `auth_token` creation) - Millstone wrapped in a socket-wait shell command so it doesn't send before the agent is ready - After the pod reaches Running, background tasks stream each container's logs to `<log-dir>/<baseline|comparison>/<container>.log`; ANSI escape codes stripped via `crate::utils::strip_ansi_codes` (no duplicate implementations) - Data collected via kube-rs port-forward to datadog-intake port 2049; forward cancelled via `CancellationToken` after collection - `wait_for_millstone_exit` bounded by `MILLSTONE_EXIT_TIMEOUT` (300s) - Both baseline and comparison errors reported when both groups fail simultaneously - Malformed env vars (missing `=`) in target config emit `warn!` instead of being silently dropped **ANSI stripping (both runtimes):** - `airlock/src/driver.rs` strips ANSI codes from Docker container log output - `k8s.rs` and `kind.rs` share `strip_ansi_codes` from `crate::utils`; `airlock` keeps its own copy as a separate crate - `kind create cluster` output is captured and emitted through tracing at debug level; raw emoji/ANSI output suppressed **Kind cluster lifecycle (`kind.rs`, `main.rs`, `runner.rs`, `test.rs`):** - Kind cluster setup only runs when the selected test set includes at least one `kubernetes_in_docker` test; running `-t dsd-plain` never touches kind - Kind cluster setup runs as a background task — Docker-runtime tests start immediately without waiting - Kind tests wait for the cluster-ready signal **before acquiring a concurrency slot**, gated on `test.runtime() == "kubernetes_in_docker"` so Docker tests are completely unaffected; each task holds its own cloned `watch::Receiver` with independent "last seen" state — no mutex needed - The wait loop uses `tokio::select!` on the cancellation token so Ctrl-C during cluster setup doesn't deadlock kind test futures; both `run_parallel` and `run_fail_fast` perform this wait - `check_kind_installed` verifies the exit code in addition to binary presence - A warning is emitted when kind setup fails after cluster creation, alerting users to a potentially dangling cluster - Kind setup emits `TestEvent::StatusLine` messages visible in both TUI and logging modes - panoramic manages the full kind cluster lifecycle: creates if absent (reuses if present), pulls images only if not already in the local daemon (pull failure is fatal), loads images in parallel, deletes cluster after tests - `--no-delete-kind-cluster` keeps the cluster alive between runs (useful locally) - `--kind-cluster-name` overrides the default (`saluki-correctness`) - Flush wait changed from 32s to 30s (`FLUSH_WAIT: Duration`) **CI (`.gitlab/correctness-mixins.yml`, `generate-correctness-pipeline.sh`):** - `.test-correctness-kind-definition` extends `.test-correctness-definition` with a longer timeout — all cluster/image management is inside panoramic - Pipeline generator emits kind jobs via the kind mixin; mixin selection driven by `test.runtime()` from the `Test` trait **kind pre-installed in `SALUKI_BUILD_CI_IMAGE`:** - `.ci/install-kind.sh` installs kind in the same `RUN` layer as `install-docker-cli.sh`, with per-arch SHA256 checksums hardcoded in the script **Explicitness:** - `Runtime` enum has no default; all correctness configs have an explicit `runtime:` field (including `dsd-tag-filterlist` from PR 1552) **Local dev:** - `cargo run --profile release --package panoramic -- run -d test/correctness -t dsd-plain-kind --no-tui --no-delete-kind-cluster` - `make clean-kind` / `make clean-correctness` **Dependencies:** kube 0.93 (client + rustls-tls + ws), k8s-openapi 0.22, tokio-util compat; RUSTSEC-2025-0134 ignored (rustls-pemfile, transitive via kube) ## Test plan - [x] `dsd-plain-kind` passes locally - [x] Running `-t dsd-plain` (Docker test) does not trigger kind cluster setup - [x] Docker and kind tests run concurrently — Docker tests start immediately, kind tests wait without holding concurrency slots - [x] Ctrl-C during cluster setup doesn't deadlock kind test futures - [x] Kind setup progress visible in both TUI and logging modes via `TestEvent::StatusLine` - [x] Container logs are plain text without ANSI codes (both runtimes) - [x] All correctness tests discovered with explicit runtime fields (including from latest main) - [x] Branch rebased cleanly onto main post-PR-1552 - [ ] `SALUKI_BUILD_CI_IMAGE` rebuilt with kind (trigger `generate-build-ci-image` via Run Pipeline on this branch) - [ ] CI pipeline generates `test-correctness-dsd-plain-kind` job - [ ] Existing Docker correctness tests unaffected 🤖 Generated with [Claude Code](https://claude.ai/claude-code) Co-authored-by: travis.thieman <travis.thieman@datadoghq.com> d04bde8

webern requested a review from a team as a code owner May 1, 2026 16:39

webern commented May 1, 2026

View reviewed changes

Comment thread bin/correctness/panoramic/src/correctness/runner.rs

webern force-pushed the matt.briggs/panoramic-refactor branch from 3279c37 to f7dc1cd Compare May 1, 2026 16:57

thieman reviewed May 1, 2026

View reviewed changes

thieman approved these changes May 1, 2026

View reviewed changes

webern added 3 commits May 4, 2026 11:58

fixup: set GRACE_TIME to 30 seconds

f81e553

fixup: handle ctrl-c in non-tui mode

2765cb1

webern force-pushed the matt.briggs/panoramic-refactor branch from 448363f to 2765cb1 Compare May 4, 2026 09:58

tobz approved these changes May 4, 2026

View reviewed changes

webern merged commit eb250ff into main May 4, 2026
73 checks passed

webern deleted the matt.briggs/panoramic-refactor branch May 4, 2026 15:00

thieman mentioned this pull request May 4, 2026

feat(panoramic): add kind-based correctness test runtime #1541

Merged

11 tasks

Conversation

webern commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Summary

Change Type

How did you test this PR?

References

Uh oh!

pr-commenter Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Binary Size Analysis (Agent Data Plane)

Changes by Module

Detailed Symbol Changes

Uh oh!

Uh oh!

pr-commenter Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Regression Detector (Agent Data Plane)

Regression Detector Results

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Fine details of change detection per experiment

Bounds Checks: ✅ Passed

Explanation

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

webern May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

webern May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

webern commented May 1, 2026 •

edited

Loading

pr-commenter Bot commented May 1, 2026 •

edited

Loading

pr-commenter Bot commented May 1, 2026 •

edited

Loading

webern May 1, 2026 •

edited

Loading

webern May 1, 2026 •

edited

Loading