feat(panoramic): add kind-based correctness test runtime by thieman · Pull Request #1541 · DataDog/saluki

thieman · 2026-04-30T20:05:00Z

Human Summary

Adds the ability to run kind (kubernetes-in-docker) correctness tests. The actual goal here is to add multiple tests covering origin detection, which has a lot of Kubernetes-specific behavior we'd like to cover. This PR does not add origin detection tests yet, but it does add a dsd-plain-kind test, which is just the existing dsd-plain correctness test modified to run under kind. This test will be deleted when we add the origin detection tests.

An example run that includes the kind test is shown below. You'll need to install kind first (https://kind.sigs.k8s.io/docs/user/quick-start/#installing-from-release-binaries):

cargo run --profile release --package panoramic -- run -d test/correctness -d test/integration/cases -t basic-startup,dsd-plain,dsd-plain-kind

Summary

Adds a runtime: kubernetes_in_docker option to correctness test configs that runs test groups as multi-container Kubernetes pods inside a kind cluster, unlocking origin detection testing scenarios that are impossible in the plain Docker framework (real pod UIDs, containerd container IDs, K8s labels, External Data injection via DD_EXTERNAL_ENV)
Introduces dsd-plain-kind as the initial kind-based test — verifies the existing dsd-plain workload passes end-to-end through the kind path as a baseline before origin-detection-specific tests are layered on top
CI integration follows the same dynamic pipeline approach as existing Docker tests: the pipeline generator emits kind jobs automatically when runtime: kubernetes_in_docker is detected, using a new .test-correctness-kind-definition mixin that extends the existing Docker mixin with a longer timeout
runtime is now a required field in all correctness configs (no default); all existing tests explicitly declare their runtime
Rebased onto main after PR #1552 (dynamic dispatch with Test trait and Runner) and latest main (May 7)

What changed

New runtime path (k8s.rs):

Integrates with PR 1552's Test trait architecture — Config::run(tctx: TestContext) dispatches to the kind path when runtime: kubernetes_in_docker
Each test group (baseline + comparison) runs as a multi-container pod in its own namespace, labelled created-by=panoramic-kind for orphan cleanup
datadog-intake, target (agent), and millstone share a pod with an emptyDir at /airlock for the UDS socket
Config files injected via ConfigMap with subPath mounts so the agent's /etc/datadog-agent/ remains writable (needed for auth_token creation)
Millstone wrapped in a socket-wait shell command so it doesn't send before the agent is ready
After the pod reaches Running, background tasks stream each container's logs to <log-dir>/<baseline|comparison>/<container>.log; ANSI escape codes stripped via crate::utils::strip_ansi_codes (no duplicate implementations)
Data collected via kube-rs port-forward to datadog-intake port 2049; forward cancelled via CancellationToken after collection
wait_for_millstone_exit bounded by MILLSTONE_EXIT_TIMEOUT (300s)
Both baseline and comparison errors reported when both groups fail simultaneously
Malformed env vars (missing =) in target config emit warn! instead of being silently dropped
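
The malformed-env-var handling in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in k8s.rs: `parse_env_entries` and its return shape are hypothetical, and where the PR emits `warn!`, this sketch simply returns the rejected entries so it stays dependency-free.

```rust
/// Splits `KEY=VALUE` entries into pairs. Entries without `=` are returned
/// separately so the caller can warn about them instead of dropping them
/// silently. (Hypothetical helper; not the function used in k8s.rs.)
fn parse_env_entries(entries: &[&str]) -> (Vec<(String, String)>, Vec<String>) {
    let mut parsed = Vec::new();
    let mut malformed = Vec::new();
    for entry in entries {
        match entry.split_once('=') {
            // Only the first `=` separates key from value; later ones stay in the value.
            Some((key, value)) => parsed.push((key.to_string(), value.to_string())),
            // No `=` at all: keep the raw entry so it can be reported.
            None => malformed.push(entry.to_string()),
        }
    }
    (parsed, malformed)
}
```

A caller would iterate over the second element of the tuple and log each entry at warn level before building the container spec.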

ANSI stripping (both runtimes):

airlock/src/driver.rs strips ANSI codes from Docker container log output
k8s.rs and kind.rs share strip_ansi_codes from crate::utils; airlock keeps its own copy as a separate crate
kind create cluster output is captured and emitted through tracing at debug level; raw emoji/ANSI output suppressed
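
For readers who want a feel for what the stripping does, here is a minimal sketch under the assumption that only `ESC[...letter` sequences need removing, as the function's doc comment in this PR states. The name `strip_ansi_codes_sketch` and the body are illustrative guesses at the described behavior, not the PR's implementation.

```rust
/// Removes ANSI escape sequences of the form `ESC [ ... letter` from a byte slice.
/// Sketch only: written from the behavior described, not copied from the PR.
fn strip_ansi_codes_sketch(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        // A sequence starts with ESC (0x1b) immediately followed by '['.
        if input[i] == 0x1b && input.get(i + 1) == Some(&b'[') {
            i += 2;
            // Skip parameter bytes until the terminating ASCII letter.
            while i < input.len() && !input[i].is_ascii_alphabetic() {
                i += 1;
            }
            // Skip the terminating letter itself, if the input did not end first.
            if i < input.len() {
                i += 1;
            }
        } else {
            out.push(input[i]);
            i += 1;
        }
    }
    out
}
```

Working on bytes rather than `str` avoids failing on log output that is not valid UTF-8, which matches the `&[u8] -> Vec<u8>` signature shown in the review comments.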

Kind cluster lifecycle (kind.rs, main.rs, runner.rs, test.rs):

Kind cluster setup only runs when the selected test set includes at least one kubernetes_in_docker test; running -t dsd-plain never touches kind
Kind cluster setup runs as a background task — Docker-runtime tests start immediately without waiting
Kind tests wait for the cluster-ready signal before acquiring a concurrency slot, gated on test.runtime() == "kubernetes_in_docker" so Docker tests are completely unaffected; each task holds its own cloned watch::Receiver with independent "last seen" state — no mutex needed
The wait loop uses tokio::select! on the cancellation token so Ctrl-C during cluster setup doesn't deadlock kind test futures; both run_parallel and run_fail_fast perform this wait
check_kind_installed verifies the exit code in addition to binary presence
A warning is emitted when kind setup fails after cluster creation, alerting users to a potentially dangling cluster
Kind setup emits TestEvent::StatusLine messages visible in both TUI and logging modes
panoramic manages the full kind cluster lifecycle: creates if absent (reuses if present), pulls images only if not already in the local daemon (pull failure is fatal), loads images in parallel, deletes cluster after tests
--no-delete-kind-cluster keeps the cluster alive between runs (useful locally)
--kind-cluster-name overrides the default (saluki-correctness)
Flush wait changed from 32s to 30s (FLUSH_WAIT: Duration)
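
The first point in the list above (kind setup only runs when a selected test needs it) amounts to applying the name filter before checking runtimes. A minimal sketch, assuming a simplified `TestInfo` type and treating an empty filter as "run everything"; both are assumptions for illustration, and `needs_kind_cluster` is a hypothetical name rather than the PR's `collect_kind_images`.

```rust
/// Hypothetical, simplified view of a discovered test.
struct TestInfo {
    name: &'static str,
    runtime: &'static str,
}

/// Returns true when at least one test that will actually run uses the
/// kind runtime. An empty filter is assumed to select every test.
fn needs_kind_cluster(tests: &[TestInfo], name_filter: &[&str]) -> bool {
    tests
        .iter()
        // Apply the name filter first, so unselected kind tests are ignored.
        .filter(|t| name_filter.is_empty() || name_filter.contains(&t.name))
        .any(|t| t.runtime == "kubernetes_in_docker")
}
```

With `dsd-plain` (docker) and `dsd-plain-kind` (kubernetes_in_docker) both discovered, a filter of only `dsd-plain` yields `false`, so no cluster setup would be started.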

CI (.gitlab/correctness-mixins.yml, generate-correctness-pipeline.sh):

.test-correctness-kind-definition extends .test-correctness-definition with a longer timeout — all cluster/image management is inside panoramic
Pipeline generator emits kind jobs via the kind mixin; mixin selection driven by test.runtime() from the Test trait

kind pre-installed in SALUKI_BUILD_CI_IMAGE:

.ci/install-kind.sh installs kind in the same RUN layer as install-docker-cli.sh, with per-arch SHA256 checksums hardcoded in the script

Explicitness:

Runtime enum has no default; all correctness configs have an explicit runtime: field (including dsd-tag-filterlist from PR 1552)
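
The effect of having no default can be illustrated with a std-only sketch. The PR relies on config deserialization for this; the version below uses `FromStr` instead so it stays self-contained, and the variant names and the `required_runtime` helper are assumptions made for illustration.

```rust
use std::str::FromStr;

/// Illustrative stand-in for the PR's `Runtime` enum (variant names assumed).
#[derive(Debug, PartialEq)]
enum Runtime {
    Docker,
    KubernetesInDocker,
}

impl FromStr for Runtime {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "docker" => Ok(Runtime::Docker),
            "kubernetes_in_docker" => Ok(Runtime::KubernetesInDocker),
            other => Err(format!("unknown runtime: {other}")),
        }
    }
}

/// `raw` is the value of the `runtime:` key, or `None` if the key is absent.
/// With no default, a missing key is an error rather than an implicit `docker`.
fn required_runtime(raw: Option<&str>) -> Result<Runtime, String> {
    raw.ok_or_else(|| "missing required field: runtime".to_string())?
        .parse()
}
```

The point of the design is that a config written before this change fails loudly instead of silently running under a runtime its author never chose.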

Local dev:

cargo run --profile release --package panoramic -- run -d test/correctness -t dsd-plain-kind --no-tui --no-delete-kind-cluster
make clean-kind / make clean-correctness

Dependencies: kube 0.93 (client + rustls-tls + ws), k8s-openapi 0.22, tokio-util compat; RUSTSEC-2025-0134 ignored (rustls-pemfile, transitive via kube)

Test plan

🤖 Generated with Claude Code

pr-commenter · 2026-04-30T20:20:32Z

Binary Size Analysis (Agent Data Plane)

Target: bc61ae9 (baseline) vs 5020f9d (comparison) diff
Analysis Type: Stripped binaries (debug symbols excluded)
Baseline Size: 37.33 MiB
Comparison Size: 37.34 MiB
Size Change: +9.32 KiB (+0.02%)
Pass/Fail Threshold: +5%
Result: PASSED ✅
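
As a quick check of the figures above, the percentage follows from dividing the size change by the baseline size in the same unit. The helper below is a hypothetical illustration, not part of the size-analysis tooling.

```rust
/// Size change as a percentage of the baseline, with the baseline given in
/// MiB and the delta in KiB (1 MiB = 1024 KiB).
fn size_change_pct(baseline_mib: f64, delta_kib: f64) -> f64 {
    delta_kib / (baseline_mib * 1024.0) * 100.0
}
```

For a 37.33 MiB baseline and a +9.32 KiB change this gives roughly 0.024%, consistent with the reported +0.02% and well under the +5% threshold.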

Changes by Module

Module	File Size	Symbols
`figment`	+11.51 KiB	561
`core`	-1.93 KiB	13573
`serde_core`	-1.88 KiB	752
`anon.583fff1769e839a4ac4c3817980336c2.88.llvm.3366751254004393234`	-1.69 KiB	1
`anon.026f154fc78761c6625909119fdb6b08.339.llvm.779975656443548521`	+1.69 KiB	1
`anon.edc88c1f77f7fc54f3491f6dbe3b9816.26.llvm.8772311085534667432`	-1.66 KiB	1
`anon.dd4568488db98f1f5e36dc582ec46b89.1300.llvm.598376170750614907`	+1.66 KiB	1
`hickory_proto`	+1.45 KiB	508
`anon.65bdbfe4458a3f8f9b0946c368867cf8.50.llvm.4968226963744631081`	-1.41 KiB	1
`anon.9c41cca0aee603b5f5a665dc9e65f22f.127.llvm.13310154803454401535`	+1.41 KiB	1
`anon.0b516339eca47b86e19d894c06a552f4.490.llvm.12175845880619374489`	+1.25 KiB	1
`anon.edc88c1f77f7fc54f3491f6dbe3b9816.29.llvm.8772311085534667432`	-1.25 KiB	1
`anon.01e3c670ef17ae5fac81a933540fa1b7.119.llvm.7463965485508184397`	+1.24 KiB	1
`anon.cf0916cd32a442a1e4bd06931e9a86cd.274.llvm.16793965728278154532`	-1.24 KiB	1
`anon.0e1016b7c3f08cf2959bb8871559dd7c.790.llvm.1411995310717264458`	+1.22 KiB	1
`anon.1655e58d66d1ef1230538706b7bd06b2.59.llvm.13790112841392064238`	-1.22 KiB	1
`anon.1fe68cbddbd7be3d3efdb1444e7a5632.0.llvm.8454770004358052377`	+1.14 KiB	1
`anon.c51fd3990c6b999fc93e5ff25d808401.126.llvm.8984495447445464818`	-1.14 KiB	1
`tower_layer`	-1.11 KiB	17
`anon.56e6ba155cb7b0d0983a1359a1e3d756.33.llvm.16671882612047019062`	-1.08 KiB	1

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW]  +149Ki  [NEW]  +149Ki    agent_data_plane::cli::run::handle_run_command::_{{closure}}::h6d7b3b1e4cf1ec3e
  [NEW] +85.9Ki  [NEW] +85.7Ki    saluki_env::workload::providers::remote_agent::RemoteAgentWorkloadProvider::from_configuration::_{{closure}}::hee3561fa4c60372a
  [NEW] +69.9Ki  [NEW] +69.7Ki    agent_data_plane::run_inner::_{{closure}}::h8bc94950e259da2f
  [NEW] +67.2Ki  [NEW] +67.0Ki    agent_data_plane::cli::run::create_topology::_{{closure}}::h14b8f0e0f4d015c1
  [NEW] +64.9Ki  [NEW] +64.7Ki    saluki_core::topology::built::BuiltTopology::spawn::_{{closure}}::hc90d33c0f1c609c8
  [NEW] +57.6Ki  [NEW] +57.4Ki    agent_data_plane::cli::debug::handle_debug_command::_{{closure}}::h61f5b9bab11f1e45
  [NEW] +57.5Ki  [NEW] +57.4Ki    saluki_core::topology::blueprint::TopologyBlueprint::build::_{{closure}}::h5639eb098c3cf990
  [NEW] +49.6Ki  [NEW] +49.4Ki    _<saluki_components::transforms::apm_stats::ApmStats as saluki_core::components::transforms::Transform>::run::_{{closure}}::h01d949e514d06c59
  [NEW] +44.5Ki  [NEW] +44.3Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::h5bcbd42b1175098f
  [NEW] +41.0Ki  [NEW] +40.8Ki    _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::h7b3976be055d813c
  +0.0% +9.32Ki  +0.1% +8.51Ki    [44089 Others]
  [DEL] -41.0Ki  [DEL] -40.8Ki    _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::h1bb8d646569bc30e
  [DEL] -44.5Ki  [DEL] -44.3Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::h1baf53f0a27caf99
  [DEL] -49.6Ki  [DEL] -49.4Ki    _<saluki_components::transforms::apm_stats::ApmStats as saluki_core::components::transforms::Transform>::run::_{{closure}}::h9eaed642fde8076b
  [DEL] -57.5Ki  [DEL] -57.4Ki    saluki_core::topology::blueprint::TopologyBlueprint::build::_{{closure}}::h5bf8b481e91d3cf2
  [DEL] -57.6Ki  [DEL] -57.4Ki    agent_data_plane::cli::debug::handle_debug_command::_{{closure}}::h6f1710360d926a91
  [DEL] -64.9Ki  [DEL] -64.7Ki    saluki_core::topology::built::BuiltTopology::spawn::_{{closure}}::h2757e3b25873956f
  [DEL] -67.2Ki  [DEL] -67.0Ki    agent_data_plane::cli::run::create_topology::_{{closure}}::h264ca2385ebef6f2
  [DEL] -69.9Ki  [DEL] -69.7Ki    agent_data_plane::run_inner::_{{closure}}::hea2335eebab44bfd
  [DEL] -85.9Ki  [DEL] -85.7Ki    saluki_env::workload::providers::remote_agent::RemoteAgentWorkloadProvider::from_configuration::_{{closure}}::h9b1aa6330ab3f6f1
  [DEL]  -149Ki  [DEL]  -149Ki    agent_data_plane::cli::run::handle_run_command::_{{closure}}::hc4b614f155e62c55
  +0.0% +9.32Ki  +0.0% +8.51Ki    TOTAL

pr-commenter · 2026-04-30T20:37:31Z

Regression Detector (Agent Data Plane)

Regression Detector Results

Run ID: c581904c-e4ef-4e04-b3c4-5516d651a0d8

Baseline: bc61ae9
Comparison: 5020f9d
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	otlp_ingest_logs_5mb_cpu	% cpu utilization	+2.51	[-2.41, +7.44]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_throughput	ingress throughput	-0.01	[-0.13, +0.11]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_memory	memory utilization	-1.43	[-1.87, -0.98]	1	(metrics) (profiles) (logs)

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	dsd_uds_500mb_3k_contexts_throughput	ingress throughput	+3.13	[+3.00, +3.26]	1	(metrics) (profiles) (logs)
➖	dsd_uds_512kb_3k_contexts_cpu	% cpu utilization	+2.89	[-54.97, +60.75]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_cpu	% cpu utilization	+2.51	[-2.41, +7.44]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_metrics_5mb_memory	memory utilization	+1.38	[+1.18, +1.59]	1	(metrics) (profiles) (logs)
➖	dsd_uds_1mb_3k_contexts_cpu	% cpu utilization	+1.37	[-51.75, +54.49]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_5mb_cpu	% cpu utilization	+1.18	[-0.84, +3.19]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_medium	memory utilization	+0.40	[+0.22, +0.57]	1	(metrics) (profiles) (logs)
➖	dsd_uds_500mb_3k_contexts_cpu	% cpu utilization	+0.37	[-0.93, +1.68]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_transform_5mb_cpu	% cpu utilization	+0.33	[-1.76, +2.41]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_low	memory utilization	+0.19	[+0.04, +0.35]	1	(metrics) (profiles) (logs)
➖	dsd_uds_10mb_3k_contexts_memory	memory utilization	+0.19	[+0.04, +0.34]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_filtering_5mb_throughput	ingress throughput	+0.18	[+0.11, +0.26]	1	(metrics) (profiles) (logs)
➖	dsd_uds_100mb_3k_contexts_memory	memory utilization	+0.14	[-0.01, +0.30]	1	(metrics) (profiles) (logs)
➖	dsd_uds_500mb_3k_contexts_memory	memory utilization	+0.11	[-0.03, +0.25]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_5mb_memory	memory utilization	+0.09	[-0.07, +0.25]	1	(metrics) (profiles) (logs)
➖	dsd_uds_512kb_3k_contexts_memory	memory utilization	+0.04	[-0.11, +0.18]	1	(metrics) (profiles) (logs)
➖	dsd_uds_100mb_3k_contexts_cpu	% cpu utilization	+0.01	[-5.79, +5.82]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_ultraheavy	memory utilization	+0.01	[-0.12, +0.14]	1	(metrics) (profiles) (logs)
➖	dsd_uds_100mb_3k_contexts_throughput	ingress throughput	+0.01	[-0.02, +0.04]	1	(metrics) (profiles) (logs)
➖	dsd_uds_1mb_3k_contexts_throughput	ingress throughput	+0.00	[-0.05, +0.06]	1	(metrics) (profiles) (logs)
➖	dsd_uds_512kb_3k_contexts_throughput	ingress throughput	-0.00	[-0.06, +0.05]	1	(metrics) (profiles) (logs)
➖	dsd_uds_1mb_3k_contexts_memory	memory utilization	-0.01	[-0.14, +0.13]	1	(metrics) (profiles) (logs)
➖	dsd_uds_10mb_3k_contexts_throughput	ingress throughput	-0.01	[-0.19, +0.17]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_throughput	ingress throughput	-0.01	[-0.13, +0.11]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_metrics_5mb_throughput	ingress throughput	-0.01	[-0.18, +0.15]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_5mb_throughput	ingress throughput	-0.05	[-0.12, +0.02]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_transform_5mb_throughput	ingress throughput	-0.09	[-0.16, -0.01]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_transform_5mb_memory	memory utilization	-0.23	[-0.39, -0.08]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_heavy	memory utilization	-0.27	[-0.39, -0.15]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_filtering_5mb_memory	memory utilization	-0.29	[-0.53, -0.05]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_metrics_5mb_cpu	% cpu utilization	-0.35	[-6.48, +5.79]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_idle	memory utilization	-0.45	[-0.49, -0.41]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_memory	memory utilization	-1.43	[-1.87, -0.98]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_filtering_5mb_cpu	% cpu utilization	-1.98	[-4.43, +0.46]	1	(metrics) (profiles) (logs)
➖	dsd_uds_10mb_3k_contexts_cpu	% cpu utilization	-2.18	[-31.36, +27.01]	1	(metrics) (profiles) (logs)

Bounds Checks: ✅ Passed

perf	experiment	bounds_check_name	replicates_passed	observed_value	links
✅	quality_gates_rss_dsd_heavy	memory_usage	10/10	124.02MiB ≤ 140MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_low	memory_usage	10/10	40.64MiB ≤ 50MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_medium	memory_usage	10/10	61.24MiB ≤ 75MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_ultraheavy	memory_usage	10/10	175.05MiB ≤ 200MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_idle	memory_usage	10/10	28.05MiB ≤ 40MiB	(metrics) (profiles) (logs)

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
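
The three criteria above combine into a single predicate. The sketch below is an illustration of that decision rule only; the `Experiment` struct and function name are hypothetical and not taken from the regression detector's code.

```rust
/// One row of the results table (hypothetical shape).
struct Experiment {
    delta_mean_pct: f64,
    ci_low: f64,
    ci_high: f64,
    erratic: bool,
}

/// Effect size tolerance from the report: |Δ mean %| ≥ 5.00%.
const EFFECT_SIZE_TOLERANCE_PCT: f64 = 5.0;

/// A change is flagged only when all three criteria hold.
fn is_flagged(e: &Experiment) -> bool {
    let big_enough = e.delta_mean_pct.abs() >= EFFECT_SIZE_TOLERANCE_PCT;
    // The interval excludes zero when it lies entirely on one side of it.
    let ci_excludes_zero = e.ci_low > 0.0 || e.ci_high < 0.0;
    big_enough && ci_excludes_zero && !e.erratic
}
```

For example, dsd_uds_500mb_3k_contexts_throughput (+3.13, CI [+3.00, +3.26]) has an interval that excludes zero but falls short of the 5% tolerance, so it is not flagged.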

thieman · 2026-05-06T19:12:34Z

 }

+/// Removes ANSI escape sequences (`ESC[...letter`) from a byte slice.
+fn strip_ansi_codes(input: &[u8]) -> Vec<u8> {


Drive-by: cleans up the log artifacts produced by the correctness tests (not just kind; this was affecting all of them).

thieman · 2026-05-07T14:33:21Z

@@ -0,0 +1,18 @@
+/// Removes ANSI escape sequences (`ESC[...letter`) from a byte slice.
+pub(crate) fn strip_ansi_codes(input: &[u8]) -> Vec<u8> {


There are two copies of this function; the alternative would be to share it from airlock. That didn't seem to make much sense since it's not directly in line with airlock's purpose, but I can do that if you'd rather.

…etes coverage
Adds a `runtime: kind` option to correctness test configs that runs test groups as multi-container Kubernetes pods inside a kind cluster rather than Docker containers. This unlocks testing of origin detection scenarios that require real Kubernetes metadata (pod UIDs, containerd container IDs, K8s labels, External Data injection) which are untestable in the plain Docker correctness framework.
Introduces `dsd-plain-kind` as the initial kind-based test, verifying that the existing dsd-plain workload passes end-to-end through the kind path before adding origin-detection-specific tests on top.
- Add `Runtime` enum to correctness config schema (`docker` default, `kind`)
- Add `k8s.rs` correctness runner: namespace-per-group isolation, ConfigMap file injection with subPath mounts, multi-container pod with shared emptyDir at /airlock, socket-wait wrapper on millstone, kube-rs port-forward for data collection, namespace cascade cleanup
- Dispatch in correctness runner based on `config.runtime`
- Expose `runtime` field in `panoramic list --json` output
- Update pipeline generator to emit kind jobs using `.test-correctness-kind-definition` mixin (same dynamic approach as Docker tests, no special-casing)
- Add `.test-correctness-kind-definition` CI mixin: extends Docker mixin, adds kind install/cluster-create/image-load in script, cluster teardown in after_script
- Add Makefile targets: kind-create-cluster, kind-delete-cluster, kind-load-images, test-correctness-kind, test-correctness-kind-case, check-kind-tools
- Add workspace deps: kube 0.93 (client + rustls-tls + ws), k8s-openapi 0.22
- Add RUSTSEC-2025-0134 ignore (rustls-pemfile, transitive via kube, no upgrade path)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…gs; parallelize kind image loading
- Remove Default impl from Runtime enum; runtime field is now required in all correctness test configs (deserialization fails fast on any missing value)
- Add runtime: docker to all 11 existing correctness test configs
- Parallelize docker pulls and kind loads in .test-correctness-kind-definition CI mixin using background jobs + wait, reducing image prep time
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Fix port-forward background task leak: accept loop now exits on CancellationToken; token is cancelled after data collection completes
- Add MILLSTONE_EXIT_TIMEOUT (300s) to wait_for_millstone_exit; was unbounded before
- Report both errors when both baseline and comparison groups fail simultaneously
- Rename spawn_duration/spawn_pods -> run_duration/run_groups (phase covers full group lifecycle, not just pod creation)
- Parallelize cleanup_namespace calls with tokio::join!
- Move `use std::time::Instant` to top-level imports
- Import TargetConfig at top level; remove fully-qualified paths in fn signatures
- Fix create_config_map: add ConfigMap to top-level imports, remove redundant full-path qualification on the constructor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Download from the canonical GitHub releases URL (more trustworthy than kind.sigs.k8s.io) and verify SHA256 against the value hardcoded in this file before installing. The hardcoded checksum is what provides the security guarantee — downloading it from the same server at runtime would not help. Update KIND_SHA256 here whenever KIND_VERSION is bumped. Also add deny.toml ignores for two new hickory-proto advisories (RUSTSEC-2026-0118, RUSTSEC-2026-0119). These are pre-existing vulnerabilities in DNSSEC validation code that we do not exercise; fixing them requires updating hyper-hickory past 0.8.0 to lift the hickory 0.25.x pin, which is a separate concern. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add .ci/install-kind.sh following the same pattern as install-docker-cli.sh. The script downloads kind from the canonical GitHub releases URL, verifies the SHA256 against a hardcoded value, and installs it. Update KIND_SHA256 in the script whenever KIND_VERSION is bumped. Baking kind into the image is preferable to downloading it at job runtime: no external network dependency per job, checksum pinned in version control rather than inline YAML, and the install is audited as part of image builds rather than every test run. Remove the download-and-verify step from .test-correctness-kind-definition since kind is now available in the image. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The script already handled TARGETARCH dispatch but used the same SHA256 for both architectures. Add the correct per-arch checksums so arm64 builds are verified correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge install-kind.sh into the same RUN layer as install-docker-cli.sh so clean-temporary-caches.sh applies to both in a single layer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Move all kind cluster setup and teardown out of the CI mixin and Makefile and into panoramic itself, so the runner is self-contained for kind tests.
On startup, if any kind-runtime tests are selected, panoramic:
- Checks whether the named cluster already exists (reuses it if so)
- Creates it if not
- Pulls all required images in parallel (best-effort; local-only images that fail the pull are still loaded from the local daemon)
- Loads all images into the cluster in parallel
After all tests complete, panoramic deletes the cluster unless --no-delete-cluster is passed (useful for local iteration). The kind cluster name defaults to "saluki-correctness" and can be overridden with --kind-cluster-name.
CI mixin: remove cluster create/image pull/load/delete steps; the mixin now only extends the Docker definition with a longer timeout.
Makefile: remove kind-create-cluster, kind-delete-cluster, kind-load-images, test-correctness-kind, test-correctness-kind-case targets. check-kind-tools is kept for install verification. Users run kind tests via the existing make test-correctness-case CASE=dsd-plain-kind or make test-correctness (which picks up all tests).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…-correctness target
Label each panoramic-created namespace with created-by=panoramic-kind so orphaned namespaces (from a killed panoramic process) can be bulk-deleted.
Add Makefile targets:
clean-kind -- kubectl delete namespace -l created-by=panoramic-kind
clean-correctness -- wraps clean-airlock + clean-kind
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…st on pull error Replace best-effort parallel pull with a per-image check: if the image is already in the local Docker daemon (docker image inspect succeeds), skip the pull. If it is absent, pull it — and if the pull fails, fail the entire test run immediately rather than continuing with a missing image that will cause kind load to fail with a less obvious error. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
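
The per-image check described in this commit message can be sketched as below. It is a minimal illustration: `ensure_image` is a hypothetical name, and the `is_present` and `pull` closures stand in for `docker image inspect` and `docker pull` so the logic can be shown without invoking Docker.

```rust
/// Ensures `image` is available locally. Returns `Ok(false)` if it was
/// already present (pull skipped), `Ok(true)` if it was pulled, and an error
/// if the pull failed. `is_present` and `pull` are hypothetical stand-ins
/// for `docker image inspect` and `docker pull`.
fn ensure_image<P, F>(image: &str, is_present: P, pull: F) -> Result<bool, String>
where
    P: Fn(&str) -> bool,
    F: Fn(&str) -> Result<(), String>,
{
    if is_present(image) {
        // Already in the local daemon: skip the pull entirely.
        return Ok(false);
    }
    // Absent: a failed pull is fatal rather than best-effort.
    pull(image).map_err(|e| format!("failed to pull {image}: {e}"))?;
    Ok(true)
}
```

Failing at this point surfaces the missing image directly, instead of letting the later kind load step fail with a less obvious error.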

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

This reverts commit 7a8c5f2.

…runtime to kubernetes_in_docker Container logs: after each pod reaches Running, spawn background tasks that stream each container's logs (datadog-intake, target, millstone) to <log-dir>/correctness/<test>/<baseline|comparison>/<container>.log using kube-rs log_stream. Uses tokio-util compat bridge to connect futures::AsyncBufRead to tokio::io::copy. Rename: runtime: kind → runtime: kubernetes_in_docker for clarity. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… ANSI codes in logs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tracing-subscriber hardcodes with_ansi(true) in datadog-intake and millstone, and the Datadog Agent's own logger also emits color codes. NO_COLOR is not respected. Instead, strip ANSI escape sequences (ESC[...letter) from the log stream in panoramic before writing to disk, which works uniformly for all containers regardless of how they handle color. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Same fix as the kind log streaming path — containers emit colored tracing output regardless of whether stdout is a terminal, so strip before writing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ync licenses New test added by PR 1552 was missing the required runtime field. License file updated for new dependencies introduced by the rebase. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

hickory-proto updated to 0.26.1 on main, which resolved RUSTSEC-2026-0118 and RUSTSEC-2026-0119. Remove the now-stale ignore entries. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Previously, kind cluster setup (create cluster + load images) blocked all tests from starting. Now it runs as a background task so Docker-runtime tests start immediately. The wait is deferred to the kind test itself via a watch channel injected into TestContext. The runner waits for the channel BEFORE acquiring a concurrency slot so kind tests don't hold slots while the cluster is still being set up. By the time a kind test actually runs, the cluster is ready and the check in run_k8s_correctness_test is a fast-path no-op. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… tests The kind_ready wait was applied to all tests in the parallel runner, causing Docker tests to also block on cluster setup. Guard the wait with a runtime check so only kubernetes_in_docker tests wait. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Remove duplicate implementations from kind.rs and k8s.rs and share from crate::utils. airlock/src/driver.rs keeps its own copy as it's a separate crate. Also capture kind create cluster output through tracing (debug level) instead of letting it print raw to the terminal with emojis and ANSI codes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add TestEvent::StatusLine for infrastructure messages that bypass the test runner event flow. Move event channel creation before the kind setup background task so it can emit status lines that appear in both TUI and logging modes.
kind.rs now emits StatusLine events for:
- "Reusing existing kind cluster..." / "Creating kind cluster..."
- "Pulling container images (if not already present)..."
- "Loading images into kind cluster..."
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

1. Cancel-safe kind_ready wait: use tokio::select! on cancel.cancelled() in both run_parallel and run_fail_fast so Ctrl-C during cluster setup doesn't deadlock kind test futures.
2. run_fail_fast kind_ready wait: add the same pre-semaphore wait that run_parallel has so kind tests don't immediately fail with "did not complete" when run with --fail-fast.
3. Remove duplicate strip_ansi_codes from k8s.rs; kind.rs and k8s.rs now both import from crate::utils.
4. check_kind_installed now verifies the exit code, not just whether the binary exists.
5. Move `use crate::utils::strip_ansi_codes` to top of kind.rs imports.
6. Warn on malformed env vars (missing '=') in the k8s target env parsing instead of silently dropping them.
7. Warn when kind setup fails mid-way (cluster created but image load failed), so users know a dangling cluster may need manual cleanup.
8. Fix step comment numbering in run_group (was missing step 6).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The previous commit's Python replace didn't match because the function had a different internal comment style. Remove it properly this time. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

collect_kind_images was called on all discovered tests before the name filter was applied, so running '-t dsd-plain' would still trigger kind cluster setup because dsd-plain-kind exists in the test directory. Pass the name filter into collect_kind_images so only tests that will actually run are considered when deciding whether to start kind setup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… to 30s Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…atch::Receiver for KindReadyReceiver
watch::Receiver<T> implements Clone, giving each task its own independent receiver with its own "last seen" mark. No shared lock needed; concurrent kind tests each wait on their own clone without serializing.
Also fix stale comments:
- KindReadyReceiver doc no longer references Arc<Mutex<_>>
- Remove orphaned "Cancellation token for the test run." comment in main.rs
- Update generate-correctness-pipeline.sh comment (kind tests are included, not excluded, in the dynamic pipeline)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

New test added by main was missing the required runtime field, breaking the correctness pipeline generator. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

## Human Summary

Adds the ability to run `kind` (kubernetes-in-docker) correctness tests. The actual goal here is to add multiple tests covering origin detection, which has a lot of Kubernetes-specific behavior we'd like to cover. This PR does not add origin detection tests yet, but it does add a `dsd-plain-kind` test, which is just the existing `dsd-plain` correctness test modified to run under kind. This test will be deleted when we add the origin detection tests.

Example run including the kind test is provided immediately below. You'll need to [install kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installing-from-release-binaries) first:

`cargo run --profile release --package panoramic -- run -d test/correctness -d test/integration/cases -t basic-startup,dsd-plain,dsd-plain-kind`

## Summary

- Adds a `runtime: kubernetes_in_docker` option to correctness test configs that runs test groups as multi-container Kubernetes pods inside a kind cluster, unlocking origin detection testing scenarios that are impossible in the plain Docker framework (real pod UIDs, containerd container IDs, K8s labels, External Data injection via `DD_EXTERNAL_ENV`)
- Introduces `dsd-plain-kind` as the initial kind-based test: it verifies the existing dsd-plain workload passes end-to-end through the kind path as a baseline before origin-detection-specific tests are layered on top
- CI integration follows the same dynamic pipeline approach as existing Docker tests: the pipeline generator emits kind jobs automatically when `runtime: kubernetes_in_docker` is detected, using a new `.test-correctness-kind-definition` mixin that extends the existing Docker mixin with a longer timeout
- `runtime` is now a required field in all correctness configs (no default); all existing tests explicitly declare their runtime
- Rebased onto main after PR #1552 (dynamic dispatch with Test trait and Runner) and latest main (May 7)

## What changed

**New runtime path (`k8s.rs`):**

- Integrates with PR 1552's `Test` trait architecture: `Config::run(tctx: TestContext)` dispatches to the kind path when `runtime: kubernetes_in_docker`
- Each test group (baseline + comparison) runs as a multi-container pod in its own namespace, labelled `created-by=panoramic-kind` for orphan cleanup
- `datadog-intake`, `target` (agent), and `millstone` share a pod with an emptyDir at `/airlock` for the UDS socket
- Config files injected via ConfigMap with `subPath` mounts so the agent's `/etc/datadog-agent/` remains writable (needed for `auth_token` creation)
- Millstone wrapped in a socket-wait shell command so it doesn't send before the agent is ready
- After the pod reaches Running, background tasks stream each container's logs to `<log-dir>/<baseline|comparison>/<container>.log`; ANSI escape codes stripped via `crate::utils::strip_ansi_codes` (no duplicate implementations)
- Data collected via kube-rs port-forward to datadog-intake port 2049; forward cancelled via `CancellationToken` after collection
- `wait_for_millstone_exit` bounded by `MILLSTONE_EXIT_TIMEOUT` (300s)
- Both baseline and comparison errors reported when both groups fail simultaneously
- Malformed env vars (missing `=`) in target config emit `warn!` instead of being silently dropped

**ANSI stripping (both runtimes):**

- `airlock/src/driver.rs` strips ANSI codes from Docker container log output
- `k8s.rs` and `kind.rs` share `strip_ansi_codes` from `crate::utils`; `airlock` keeps its own copy as a separate crate
- `kind create cluster` output is captured and emitted through tracing at debug level; raw emoji/ANSI output suppressed

**Kind cluster lifecycle (`kind.rs`, `main.rs`, `runner.rs`, `test.rs`):**

- Kind cluster setup only runs when the selected test set includes at least one `kubernetes_in_docker` test; running `-t dsd-plain` never touches kind
- Kind cluster setup runs as a background task: Docker-runtime tests start immediately without waiting
- Kind tests wait for the cluster-ready signal **before acquiring a concurrency slot**, gated on `test.runtime() == "kubernetes_in_docker"` so Docker tests are completely unaffected; each task holds its own cloned `watch::Receiver` with independent "last seen" state, so no mutex is needed
- The wait loop uses `tokio::select!` on the cancellation token so Ctrl-C during cluster setup doesn't deadlock kind test futures; both `run_parallel` and `run_fail_fast` perform this wait
- `check_kind_installed` verifies the exit code in addition to binary presence
- A warning is emitted when kind setup fails after cluster creation, alerting users to a potentially dangling cluster
- Kind setup emits `TestEvent::StatusLine` messages visible in both TUI and logging modes
- panoramic manages the full kind cluster lifecycle: creates if absent (reuses if present), pulls images only if not already in the local daemon (pull failure is fatal), loads images in parallel, deletes cluster after tests
- `--no-delete-kind-cluster` keeps the cluster alive between runs (useful locally)
- `--kind-cluster-name` overrides the default (`saluki-correctness`)
- Flush wait changed from 32s to 30s (`FLUSH_WAIT: Duration`)

**CI (`.gitlab/correctness-mixins.yml`, `generate-correctness-pipeline.sh`):**

- `.test-correctness-kind-definition` extends `.test-correctness-definition` with a longer timeout; all cluster/image management is inside panoramic
- Pipeline generator emits kind jobs via the kind mixin; mixin selection driven by `test.runtime()` from the `Test` trait

**kind pre-installed in `SALUKI_BUILD_CI_IMAGE`:**

- `.ci/install-kind.sh` installs kind in the same `RUN` layer as `install-docker-cli.sh`, with per-arch SHA256 checksums hardcoded in the script

**Explicitness:**

- `Runtime` enum has no default; all correctness configs have an explicit `runtime:` field (including `dsd-tag-filterlist` from PR 1552)

**Local dev:**

- `cargo run --profile release --package panoramic -- run -d test/correctness -t dsd-plain-kind --no-tui --no-delete-kind-cluster`
- `make clean-kind` / `make clean-correctness`

**Dependencies:** kube 0.93 (client + rustls-tls + ws), k8s-openapi 0.22, tokio-util compat; RUSTSEC-2025-0134 ignored (rustls-pemfile, transitive via kube)

## Test plan

- [x] `dsd-plain-kind` passes locally
- [x] Running `-t dsd-plain` (Docker test) does not trigger kind cluster setup
- [x] Docker and kind tests run concurrently: Docker tests start immediately, kind tests wait without holding concurrency slots
- [x] Ctrl-C during cluster setup doesn't deadlock kind test futures
- [x] Kind setup progress visible in both TUI and logging modes via `TestEvent::StatusLine`
- [x] Container logs are plain text without ANSI codes (both runtimes)
- [x] All correctness tests discovered with explicit runtime fields (including from latest main)
- [x] Branch rebased cleanly onto main post-PR-1552
- [ ] `SALUKI_BUILD_CI_IMAGE` rebuilt with kind (trigger `generate-build-ci-image` via Run Pipeline on this branch)
- [ ] CI pipeline generates `test-correctness-dsd-plain-kind` job
- [ ] Existing Docker correctness tests unaffected

🤖 Generated with [Claude Code](https://claude.ai/claude-code)

Co-authored-by: travis.thieman <travis.thieman@datadoghq.com>

d04bde8
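The rule that kind cluster setup only runs when at least one selected test declares `kubernetes_in_docker` can be sketched roughly as below. Only `runtime()` and the string `"kubernetes_in_docker"` come from the description; the trait shape, `StubTest`, and `needs_kind_cluster` are hypothetical names for illustration, not panoramic's actual code.

```rust
/// Hypothetical stand-in for the `Test` trait; only `runtime()` is named in the PR description.
trait Test {
    fn runtime(&self) -> &str;
}

/// Illustrative test type carrying just a runtime string (an assumption for this sketch).
struct StubTest {
    runtime: &'static str,
}

impl Test for StubTest {
    fn runtime(&self) -> &str {
        self.runtime
    }
}

/// Returns true when at least one selected test needs the kind cluster,
/// so that a Docker-only selection never triggers cluster setup.
fn needs_kind_cluster(selected: &[Box<dyn Test>]) -> bool {
    selected
        .iter()
        .any(|test| test.runtime() == "kubernetes_in_docker")
}
```

Under this sketch, a selection such as `-t dsd-plain` would yield `false` and skip kind entirely, while adding `dsd-plain-kind` would yield `true`.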

thieman changed the title ~~feat(correctness): add kind-based correctness test runtime [DADP-58]~~ feat(panoramic): add kind-based correctness test runtime [DADP-58] Apr 30, 2026

thieman force-pushed the thieman/DADP-58-kind-correctness-tests branch from f9cdc46 to 6c248f2 Compare April 30, 2026 20:06

dd-octo-sts Bot removed the area/core, area/components, source/dogstatsd, and transform/host-enrichment labels Apr 30, 2026

thieman force-pushed the thieman/DADP-58-kind-correctness-tests branch from 567464a to 6186190 Compare May 4, 2026 19:44

thieman commented May 6, 2026


thieman commented May 7, 2026


thieman and others added 13 commits May 7, 2026 10:40

ci(panoramic): add arm64 checksum to install-kind.sh

9b4eb2d

The script already handled TARGETARCH dispatch but used the same SHA256 for both architectures. Add the correct per-arch checksums so arm64 builds are verified correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ci(panoramic): combine kind install into docker-cli RUN block

d94800f

Merge install-kind.sh into the same RUN layer as install-docker-cli.sh so clean-temporary-caches.sh applies to both in a single layer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore(panoramic): rename --no-delete-cluster to --no-delete-kind-cluster

fe6a018

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore(panoramic): capture kind load output; only surface it on failure

d9cb5e8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore(panoramic): improve kind image pull/load log messages

428c786

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

thieman and others added 17 commits May 7, 2026 10:40

chore: ignore .claude/settings.local.json; update Cargo.lock

2593aec

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Revert "chore: ignore .claude/settings.local.json; update Cargo.lock"

0fdaf7b

This reverts commit 7a8c5f2.

fix(panoramic): set NO_COLOR=1 on all kind pod containers to suppress ANSI codes in logs

d528b56

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(airlock): strip ANSI escape codes from Docker container log files

56c1dcc

Same fix as the kind log streaming path — containers emit colored tracing output regardless of whether stdout is a terminal, so strip before writing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore(panoramic): add runtime: docker to dsd-tag-filterlist config; sync licenses

25123ae

New test added by PR 1552 was missing the required runtime field. License file updated for new dependencies introduced by the rebase. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: remove stale hickory deny.toml ignores

162e6f9

hickory-proto updated to 0.26.1 on main, which resolved RUSTSEC-2026-0118 and RUSTSEC-2026-0119. Remove the now-stale ignore entries. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(panoramic): actually remove strip_ansi_codes duplicate from k8s.rs

6169e1b

The previous commit's Python replace didn't match because the function had a different internal comment style. Remove it properly this time. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore(panoramic): change FLUSH_WAIT_SECS to FLUSH_WAIT: Duration; set to 30s

f32b5ad

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
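As a rough illustration of what the `FLUSH_WAIT_SECS` to `FLUSH_WAIT: Duration` change buys, the sketch below contrasts the two forms. The constant names and the 32s/30s values come from the commit title and PR description; the `u64` type of the old constant and the `legacy_flush_wait` helper are assumptions for this example.

```rust
use std::time::Duration;

/// Old form (integer type assumed): the unit lives only in the name.
const FLUSH_WAIT_SECS: u64 = 32;

/// New form: the unit is carried by the type, so callers cannot mix up seconds and millis.
const FLUSH_WAIT: Duration = Duration::from_secs(30);

/// Hypothetical helper showing the conversion every caller of the old constant needed.
fn legacy_flush_wait() -> Duration {
    Duration::from_secs(FLUSH_WAIT_SECS)
}
```

With the `Duration` constant, call sites can pass `FLUSH_WAIT` straight to a sleep or timeout API without repeating the conversion.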

thieman force-pushed the thieman/DADP-58-kind-correctness-tests branch from 0fc4521 to 09e98cb Compare May 7, 2026 14:40

thieman marked this pull request as ready for review May 7, 2026 14:42

thieman requested a review from a team as a code owner May 7, 2026 14:42

thieman changed the title ~~feat(panoramic): add kind-based correctness test runtime [DADP-58]~~ feat(panoramic): add kind-based correctness test runtime May 7, 2026

chore(correctness): add runtime: docker to dsd-mapper-blocklist config

5020f9d

New test added by main was missing the required runtime field, breaking the correctness pipeline generator. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
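The failure mode this commit fixes follows from `runtime` having no default. A minimal sketch of that behavior is shown below; the variant names, the `FromStr` approach, and `parse_runtime` are assumptions for illustration (the real configs are presumably deserialized differently), while the two accepted strings come from the PR description.

```rust
use std::str::FromStr;

/// Sketch of a runtime selector with deliberately no `Default` impl.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Runtime {
    Docker,
    KubernetesInDocker,
}

impl FromStr for Runtime {
    type Err = String;

    fn from_str(value: &str) -> Result<Self, Self::Err> {
        match value {
            "docker" => Ok(Runtime::Docker),
            "kubernetes_in_docker" => Ok(Runtime::KubernetesInDocker),
            other => Err(format!("unknown runtime: {}", other)),
        }
    }
}

/// Hypothetical helper: `field` is the raw `runtime:` value, or `None` when the key is absent.
/// A missing key is an error rather than silently falling back to Docker.
fn parse_runtime(field: Option<&str>) -> Result<Runtime, String> {
    match field {
        Some(value) => value.parse(),
        None => Err("missing required field `runtime`".to_string()),
    }
}
```

A config without `runtime:` therefore fails at discovery time, which is why every newly added test on main has to declare it explicitly.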

tobz approved these changes May 7, 2026


gh-worker-dd-devflow-36fce6 Bot added mergequeue-status: queued mergequeue-status: in_progress and removed mergequeue-status: queued labels May 7, 2026

gh-worker-dd-mergequeue-cf854d Bot merged commit d04bde8 into main May 7, 2026
65 of 66 checks passed

gh-worker-dd-devflow-36fce6 Bot added mergequeue-status: done and removed mergequeue-status: in_progress labels May 7, 2026

gh-worker-dd-mergequeue-cf854d[bot] merged 31 commits into main from thieman/DADP-58-kind-correctness-tests

		@@ -0,0 +1,18 @@
		/// Removes ANSI escape sequences (`ESC[...letter`) from a byte slice.
		pub(crate) fn strip_ansi_codes(input: &[u8]) -> Vec<u8> {
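The hunk above shows only the doc comment and signature of `strip_ansi_codes`. A minimal sketch of a body consistent with that doc comment might look like the following. The loop structure and the handling of unterminated sequences are assumptions, not the PR's actual implementation, and `pub(crate)` is dropped so the snippet stands alone.

```rust
/// Removes ANSI escape sequences (`ESC[...letter`) from a byte slice.
/// Sketch only: handles CSI sequences that end in an ASCII letter.
fn strip_ansi_codes(input: &[u8]) -> Vec<u8> {
    let mut output = Vec::with_capacity(input.len());
    let mut index = 0;

    while index < input.len() {
        let starts_sequence = input[index] == 0x1b && input.get(index + 1) == Some(&b'[');
        if starts_sequence {
            // Skip `ESC[`, then every parameter byte up to the terminating letter.
            index += 2;
            while index < input.len() && !input[index].is_ascii_alphabetic() {
                index += 1;
            }
            // Skip the terminating letter itself; an unterminated sequence is dropped.
            if index < input.len() {
                index += 1;
            }
        } else {
            // Ordinary byte (including a lone ESC not followed by `[`): copy through.
            output.push(input[index]);
            index += 1;
        }
    }

    output
}
```

Working on bytes rather than `str` means the function does not need container logs to be valid UTF-8, which matches the `&[u8] -> Vec<u8>` signature in the hunk.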


pr-commenter Bot commented Apr 30, 2026 • edited

Binary Size Analysis (Agent Data Plane)

Changes by Module

Detailed Symbol Changes

pr-commenter Bot commented Apr 30, 2026 • edited

Regression Detector (Agent Data Plane)

Regression Detector Results

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Fine details of change detection per experiment

Bounds Checks: ✅ Passed

Explanation
