feat(panoramic): add kind-based correctness test runtime #1541

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 31 commits into main from thieman/DADP-58-kind-correctness-tests on May 7, 2026

Conversation

@thieman
Contributor

@thieman thieman commented Apr 30, 2026

Human Summary

Adds the ability to run kind (Kubernetes-in-Docker) correctness tests. The actual goal is to add multiple tests covering origin detection, which has a lot of Kubernetes-specific behavior we'd like to cover. This PR does not add origin detection tests yet, but it does add a dsd-plain-kind test, which is the existing dsd-plain correctness test modified to run under kind. This test will be deleted when we add the origin detection tests.

An example run that includes the kind test is shown below. You'll need to install kind first:

cargo run --profile release --package panoramic -- run -d test/correctness -d test/integration/cases -t basic-startup,dsd-plain,dsd-plain-kind

Summary

  • Adds a runtime: kubernetes_in_docker option to correctness test configs that runs test groups as multi-container Kubernetes pods inside a kind cluster, unlocking origin detection testing scenarios that are impossible in the plain Docker framework (real pod UIDs, containerd container IDs, K8s labels, External Data injection via DD_EXTERNAL_ENV)
  • Introduces dsd-plain-kind as the initial kind-based test — verifies the existing dsd-plain workload passes end-to-end through the kind path as a baseline before origin-detection-specific tests are layered on top
  • CI integration follows the same dynamic pipeline approach as existing Docker tests: the pipeline generator emits kind jobs automatically when runtime: kubernetes_in_docker is detected, using a new .test-correctness-kind-definition mixin that extends the existing Docker mixin with a longer timeout
  • runtime is now a required field in all correctness configs (no default); all existing tests explicitly declare their runtime
  • Rebased onto main after PR #1552 (chore(panoramic): dynamic dispatch with Test trait and Runner) and latest main (May 7)

What changed

New runtime path (k8s.rs):

  • Integrates with PR 1552's Test trait architecture — Config::run(tctx: TestContext) dispatches to the kind path when runtime: kubernetes_in_docker
  • Each test group (baseline + comparison) runs as a multi-container pod in its own namespace, labelled created-by=panoramic-kind for orphan cleanup
  • datadog-intake, target (agent), and millstone share a pod with an emptyDir at /airlock for the UDS socket
  • Config files injected via ConfigMap with subPath mounts so the agent's /etc/datadog-agent/ remains writable (needed for auth_token creation)
  • Millstone wrapped in a socket-wait shell command so it doesn't send before the agent is ready
  • After the pod reaches Running, background tasks stream each container's logs to <log-dir>/<baseline|comparison>/<container>.log; ANSI escape codes stripped via crate::utils::strip_ansi_codes (no duplicate implementations)
  • Data collected via kube-rs port-forward to datadog-intake port 2049; forward cancelled via CancellationToken after collection
  • wait_for_millstone_exit bounded by MILLSTONE_EXIT_TIMEOUT (300s)
  • Both baseline and comparison errors reported when both groups fail simultaneously
  • Malformed env vars (missing =) in target config emit warn! instead of being silently dropped
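
The last bullet above (warning on malformed env vars) can be illustrated with a minimal sketch. The function name `parse_env_vars`, the return shape, and the choice to treat an empty key as malformed are assumptions for illustration, not the code in k8s.rs. The PR emits `warn!`; `eprintln!` is used here only to keep the sketch free of dependencies.

```rust
/// Splits `KEY=VALUE` entries into pairs. Entries without `=` (or with an
/// empty key, an assumption of this sketch) are reported instead of being
/// silently dropped, and are returned separately so callers can inspect them.
fn parse_env_vars(raw: &[&str]) -> (Vec<(String, String)>, Vec<String>) {
    let mut parsed = Vec::new();
    let mut malformed = Vec::new();

    for entry in raw {
        // `split_once` splits on the first `=`, so values may contain `=`.
        match entry.split_once('=') {
            Some((key, value)) if !key.is_empty() => {
                parsed.push((key.to_string(), value.to_string()));
            }
            _ => {
                // Stand-in for the `warn!` call described in the PR.
                eprintln!("warning: malformed env var (expected KEY=VALUE): {entry}");
                malformed.push(entry.to_string());
            }
        }
    }

    (parsed, malformed)
}
```

Calling `parse_env_vars(&["A=1", "BROKEN", "B=x=y"])` would keep `A` and `B` (with value `x=y`) and report `BROKEN`.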

ANSI stripping (both runtimes):

  • airlock/src/driver.rs strips ANSI codes from Docker container log output
  • k8s.rs and kind.rs share strip_ansi_codes from crate::utils; airlock keeps its own copy as a separate crate
  • kind create cluster output is captured and emitted through tracing at debug level; raw emoji/ANSI output suppressed
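
For readers who want a feel for what the shared helper does, here is a minimal sketch that matches the signature and doc comment quoted in the review thread (`ESC[...letter`). The body is an assumption written for illustration, not the implementation in crate::utils; in particular, dropping a truncated sequence at the end of the input is a choice made here.

```rust
/// Removes ANSI escape sequences (`ESC[...letter`) from a byte slice.
fn strip_ansi_codes(input: &[u8]) -> Vec<u8> {
    const ESC: u8 = 0x1b;
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;

    while i < input.len() {
        if input[i] == ESC && input.get(i + 1) == Some(&b'[') {
            // Skip parameter bytes (digits, `;`, `?`, ...) up to the
            // terminating ASCII letter.
            let mut j = i + 2;
            while j < input.len() && !input[j].is_ascii_alphabetic() {
                j += 1;
            }
            // Step past the terminating letter if one was found; a sequence
            // cut off by the end of input is dropped entirely.
            i = if j < input.len() { j + 1 } else { j };
        } else {
            out.push(input[i]);
            i += 1;
        }
    }

    out
}
```

Working on bytes rather than `&str` means the helper can be applied to raw log chunks without first validating UTF-8.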

Kind cluster lifecycle (kind.rs, main.rs, runner.rs, test.rs):

  • Kind cluster setup only runs when the selected test set includes at least one kubernetes_in_docker test; running -t dsd-plain never touches kind
  • Kind cluster setup runs as a background task — Docker-runtime tests start immediately without waiting
  • Kind tests wait for the cluster-ready signal before acquiring a concurrency slot, gated on test.runtime() == "kubernetes_in_docker" so Docker tests are completely unaffected; each task holds its own cloned watch::Receiver with independent "last seen" state — no mutex needed
  • The wait loop uses tokio::select! on the cancellation token so Ctrl-C during cluster setup doesn't deadlock kind test futures; both run_parallel and run_fail_fast perform this wait
  • check_kind_installed verifies the exit code in addition to binary presence
  • A warning is emitted when kind setup fails after cluster creation, alerting users to a potentially dangling cluster
  • Kind setup emits TestEvent::StatusLine messages visible in both TUI and logging modes
  • panoramic manages the full kind cluster lifecycle: creates if absent (reuses if present), pulls images only if not already in the local daemon (pull failure is fatal), loads images in parallel, deletes cluster after tests
  • --no-delete-kind-cluster keeps the cluster alive between runs (useful locally)
  • --kind-cluster-name overrides the default (saluki-correctness)
  • Flush wait changed from 32s to 30s (FLUSH_WAIT: Duration)
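
The image handling described above (pull only when the image is absent from the local daemon, and treat a pull failure as fatal) can be sketched as follows. The names `ensure_image`, `ImageAction`, `docker_image_present`, and `docker_pull` are hypothetical; only the `docker image inspect` check and the pull-failure-is-fatal policy come from the PR text.

```rust
use std::process::{Command, Stdio};

#[derive(Debug, PartialEq)]
enum ImageAction {
    AlreadyPresent,
    Pulled,
}

/// Pulls `image` only when `is_present` reports it missing. A failed pull is
/// returned as an error so the caller can abort the whole run.
fn ensure_image<I, P>(image: &str, is_present: I, pull: P) -> Result<ImageAction, String>
where
    I: Fn(&str) -> bool,
    P: Fn(&str) -> Result<(), String>,
{
    if is_present(image) {
        return Ok(ImageAction::AlreadyPresent);
    }
    pull(image).map_err(|e| format!("failed to pull {image}: {e}"))?;
    Ok(ImageAction::Pulled)
}

/// `docker image inspect` exits successfully only if the image exists locally.
fn docker_image_present(image: &str) -> bool {
    Command::new("docker")
        .args(["image", "inspect", image])
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|s| s.success())
        .unwrap_or(false)
}

/// Runs `docker pull` and maps a non-zero exit status to an error.
fn docker_pull(image: &str) -> Result<(), String> {
    let status = Command::new("docker")
        .args(["pull", image])
        .status()
        .map_err(|e| e.to_string())?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("docker pull exited with {status}"))
    }
}
```

In this sketch a caller would run `ensure_image(image, docker_image_present, docker_pull)?` for each image before loading it into the cluster. Passing the two checks as closures keeps the decision logic testable without a Docker daemon.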

CI (.gitlab/correctness-mixins.yml, generate-correctness-pipeline.sh):

  • .test-correctness-kind-definition extends .test-correctness-definition with a longer timeout — all cluster/image management is inside panoramic
  • Pipeline generator emits kind jobs via the kind mixin; mixin selection driven by test.runtime() from the Test trait

kind pre-installed in SALUKI_BUILD_CI_IMAGE:

  • .ci/install-kind.sh installs kind in the same RUN layer as install-docker-cli.sh, with per-arch SHA256 checksums hardcoded in the script

Explicitness:

  • Runtime enum has no default; all correctness configs have an explicit runtime: field (including dsd-tag-filterlist from PR 1552)
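
A rough sketch of what "no default" means in practice is given below. The two accepted values (`docker`, `kubernetes_in_docker`) come from this PR; the variant names, the `required_runtime` helper, and the use of `FromStr` instead of the project's actual config deserialization are assumptions made to keep the example dependency-free.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Runtime {
    Docker,
    KubernetesInDocker,
}

// Deliberately no `Default` impl: a config that omits `runtime` must fail.
impl FromStr for Runtime {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "docker" => Ok(Runtime::Docker),
            "kubernetes_in_docker" => Ok(Runtime::KubernetesInDocker),
            other => Err(format!("unknown runtime: {other}")),
        }
    }
}

/// Fails fast when the `runtime` field is absent instead of assuming Docker.
fn required_runtime(field: Option<&str>) -> Result<Runtime, String> {
    field
        .ok_or_else(|| "missing required field `runtime`".to_string())?
        .parse()
}
```

Requiring the field means a test added without it is rejected at load time rather than silently running under Docker.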

Local dev:

  • cargo run --profile release --package panoramic -- run -d test/correctness -t dsd-plain-kind --no-tui --no-delete-kind-cluster
  • make clean-kind / make clean-correctness

Dependencies: kube 0.93 (client + rustls-tls + ws), k8s-openapi 0.22, tokio-util compat; RUSTSEC-2025-0134 ignored (rustls-pemfile, transitive via kube)

Test plan

  • dsd-plain-kind passes locally
  • Running -t dsd-plain (Docker test) does not trigger kind cluster setup
  • Docker and kind tests run concurrently — Docker tests start immediately, kind tests wait without holding concurrency slots
  • Ctrl-C during cluster setup doesn't deadlock kind test futures
  • Kind setup progress visible in both TUI and logging modes via TestEvent::StatusLine
  • Container logs are plain text without ANSI codes (both runtimes)
  • All correctness tests discovered with explicit runtime fields (including from latest main)
  • Branch rebased cleanly onto main post-PR-1552
  • SALUKI_BUILD_CI_IMAGE rebuilt with kind (trigger generate-build-ci-image via Run Pipeline on this branch)
  • CI pipeline generates test-correctness-dsd-plain-kind job
  • Existing Docker correctness tests unaffected
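
The second item in the test plan (running -t dsd-plain must not trigger kind setup) depends on applying the name filter before deciding whether a cluster is needed. A minimal sketch of that decision, assuming a hypothetical `TestCase` struct, exact-match name filtering, and a `needs_kind_cluster` helper that do not appear in the PR:

```rust
/// Hypothetical stand-in for a discovered test; the real runner uses the
/// `Test` trait and `test.runtime()`.
struct TestCase {
    name: &'static str,
    runtime: &'static str,
}

/// Returns true only if a test that will actually run uses the kind runtime.
/// `name_filter == None` means every discovered test is selected.
fn needs_kind_cluster(tests: &[TestCase], name_filter: Option<&[&str]>) -> bool {
    tests
        .iter()
        // Apply the -t filter first, so unselected kind tests are ignored.
        .filter(|t| name_filter.map_or(true, |names| names.contains(&t.name)))
        .any(|t| t.runtime == "kubernetes_in_docker")
}
```

With `dsd-plain` (docker) and `dsd-plain-kind` (kubernetes_in_docker) both discovered, filtering on `dsd-plain` alone yields `false`, so no cluster setup starts.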

🤖 Generated with Claude Code

@dd-octo-sts dd-octo-sts Bot added the area/core, area/components, area/ci, source/dogstatsd, transform/host-enrichment, and area/test labels Apr 30, 2026
@thieman thieman changed the title feat(correctness): add kind-based correctness test runtime [DADP-58] feat(panoramic): add kind-based correctness test runtime [DADP-58] Apr 30, 2026
@thieman thieman force-pushed the thieman/DADP-58-kind-correctness-tests branch from f9cdc46 to 6c248f2 Compare April 30, 2026 20:06
@dd-octo-sts dd-octo-sts Bot removed the area/core, area/components, source/dogstatsd, and transform/host-enrichment labels Apr 30, 2026
@pr-commenter

pr-commenter Bot commented Apr 30, 2026

Binary Size Analysis (Agent Data Plane)

Target: bc61ae9 (baseline) vs 5020f9d (comparison) diff
Analysis Type: Stripped binaries (debug symbols excluded)
Baseline Size: 37.33 MiB
Comparison Size: 37.34 MiB
Size Change: +9.32 KiB (+0.02%)
Pass/Fail Threshold: +5%
Result: PASSED ✅

Changes by Module

| Module | File Size | Symbols |
|---|---|---|
| figment | +11.51 KiB | 561 |
| core | -1.93 KiB | 13573 |
| serde_core | -1.88 KiB | 752 |
| anon.583fff1769e839a4ac4c3817980336c2.88.llvm.3366751254004393234 | -1.69 KiB | 1 |
| anon.026f154fc78761c6625909119fdb6b08.339.llvm.779975656443548521 | +1.69 KiB | 1 |
| anon.edc88c1f77f7fc54f3491f6dbe3b9816.26.llvm.8772311085534667432 | -1.66 KiB | 1 |
| anon.dd4568488db98f1f5e36dc582ec46b89.1300.llvm.598376170750614907 | +1.66 KiB | 1 |
| hickory_proto | +1.45 KiB | 508 |
| anon.65bdbfe4458a3f8f9b0946c368867cf8.50.llvm.4968226963744631081 | -1.41 KiB | 1 |
| anon.9c41cca0aee603b5f5a665dc9e65f22f.127.llvm.13310154803454401535 | +1.41 KiB | 1 |
| anon.0b516339eca47b86e19d894c06a552f4.490.llvm.12175845880619374489 | +1.25 KiB | 1 |
| anon.edc88c1f77f7fc54f3491f6dbe3b9816.29.llvm.8772311085534667432 | -1.25 KiB | 1 |
| anon.01e3c670ef17ae5fac81a933540fa1b7.119.llvm.7463965485508184397 | +1.24 KiB | 1 |
| anon.cf0916cd32a442a1e4bd06931e9a86cd.274.llvm.16793965728278154532 | -1.24 KiB | 1 |
| anon.0e1016b7c3f08cf2959bb8871559dd7c.790.llvm.1411995310717264458 | +1.22 KiB | 1 |
| anon.1655e58d66d1ef1230538706b7bd06b2.59.llvm.13790112841392064238 | -1.22 KiB | 1 |
| anon.1fe68cbddbd7be3d3efdb1444e7a5632.0.llvm.8454770004358052377 | +1.14 KiB | 1 |
| anon.c51fd3990c6b999fc93e5ff25d808401.126.llvm.8984495447445464818 | -1.14 KiB | 1 |
| tower_layer | -1.11 KiB | 17 |
| anon.56e6ba155cb7b0d0983a1359a1e3d756.33.llvm.16671882612047019062 | -1.08 KiB | 1 |

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW]  +149Ki  [NEW]  +149Ki    agent_data_plane::cli::run::handle_run_command::_{{closure}}::h6d7b3b1e4cf1ec3e
  [NEW] +85.9Ki  [NEW] +85.7Ki    saluki_env::workload::providers::remote_agent::RemoteAgentWorkloadProvider::from_configuration::_{{closure}}::hee3561fa4c60372a
  [NEW] +69.9Ki  [NEW] +69.7Ki    agent_data_plane::run_inner::_{{closure}}::h8bc94950e259da2f
  [NEW] +67.2Ki  [NEW] +67.0Ki    agent_data_plane::cli::run::create_topology::_{{closure}}::h14b8f0e0f4d015c1
  [NEW] +64.9Ki  [NEW] +64.7Ki    saluki_core::topology::built::BuiltTopology::spawn::_{{closure}}::hc90d33c0f1c609c8
  [NEW] +57.6Ki  [NEW] +57.4Ki    agent_data_plane::cli::debug::handle_debug_command::_{{closure}}::h61f5b9bab11f1e45
  [NEW] +57.5Ki  [NEW] +57.4Ki    saluki_core::topology::blueprint::TopologyBlueprint::build::_{{closure}}::h5639eb098c3cf990
  [NEW] +49.6Ki  [NEW] +49.4Ki    _<saluki_components::transforms::apm_stats::ApmStats as saluki_core::components::transforms::Transform>::run::_{{closure}}::h01d949e514d06c59
  [NEW] +44.5Ki  [NEW] +44.3Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::h5bcbd42b1175098f
  [NEW] +41.0Ki  [NEW] +40.8Ki    _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::h7b3976be055d813c
  +0.0% +9.32Ki  +0.1% +8.51Ki    [44089 Others]
  [DEL] -41.0Ki  [DEL] -40.8Ki    _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::h1bb8d646569bc30e
  [DEL] -44.5Ki  [DEL] -44.3Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::h1baf53f0a27caf99
  [DEL] -49.6Ki  [DEL] -49.4Ki    _<saluki_components::transforms::apm_stats::ApmStats as saluki_core::components::transforms::Transform>::run::_{{closure}}::h9eaed642fde8076b
  [DEL] -57.5Ki  [DEL] -57.4Ki    saluki_core::topology::blueprint::TopologyBlueprint::build::_{{closure}}::h5bf8b481e91d3cf2
  [DEL] -57.6Ki  [DEL] -57.4Ki    agent_data_plane::cli::debug::handle_debug_command::_{{closure}}::h6f1710360d926a91
  [DEL] -64.9Ki  [DEL] -64.7Ki    saluki_core::topology::built::BuiltTopology::spawn::_{{closure}}::h2757e3b25873956f
  [DEL] -67.2Ki  [DEL] -67.0Ki    agent_data_plane::cli::run::create_topology::_{{closure}}::h264ca2385ebef6f2
  [DEL] -69.9Ki  [DEL] -69.7Ki    agent_data_plane::run_inner::_{{closure}}::hea2335eebab44bfd
  [DEL] -85.9Ki  [DEL] -85.7Ki    saluki_env::workload::providers::remote_agent::RemoteAgentWorkloadProvider::from_configuration::_{{closure}}::h9b1aa6330ab3f6f1
  [DEL]  -149Ki  [DEL]  -149Ki    agent_data_plane::cli::run::handle_run_command::_{{closure}}::hc4b614f155e62c55
  +0.0% +9.32Ki  +0.0% +8.51Ki    TOTAL

@pr-commenter

pr-commenter Bot commented Apr 30, 2026

Regression Detector (Agent Data Plane)

Regression Detector Results

Run ID: c581904c-e4ef-4e04-b3c4-5516d651a0d8

Baseline: bc61ae9
Comparison: 5020f9d
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| | otlp_ingest_logs_5mb_cpu | % cpu utilization | +2.51 | [-2.41, +7.44] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_logs_5mb_throughput | ingress throughput | -0.01 | [-0.13, +0.11] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_logs_5mb_memory | memory utilization | -1.43 | [-1.87, -0.98] | 1 | (metrics) (profiles) (logs) |

Fine details of change detection per experiment

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| | dsd_uds_500mb_3k_contexts_throughput | ingress throughput | +3.13 | [+3.00, +3.26] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_512kb_3k_contexts_cpu | % cpu utilization | +2.89 | [-54.97, +60.75] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_logs_5mb_cpu | % cpu utilization | +2.51 | [-2.41, +7.44] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_metrics_5mb_memory | memory utilization | +1.38 | [+1.18, +1.59] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_1mb_3k_contexts_cpu | % cpu utilization | +1.37 | [-51.75, +54.49] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_5mb_cpu | % cpu utilization | +1.18 | [-0.84, +3.19] | 1 | (metrics) (profiles) (logs) |
| | quality_gates_rss_dsd_medium | memory utilization | +0.40 | [+0.22, +0.57] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_500mb_3k_contexts_cpu | % cpu utilization | +0.37 | [-0.93, +1.68] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_ottl_transform_5mb_cpu | % cpu utilization | +0.33 | [-1.76, +2.41] | 1 | (metrics) (profiles) (logs) |
| | quality_gates_rss_dsd_low | memory utilization | +0.19 | [+0.04, +0.35] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_10mb_3k_contexts_memory | memory utilization | +0.19 | [+0.04, +0.34] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_ottl_filtering_5mb_throughput | ingress throughput | +0.18 | [+0.11, +0.26] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_100mb_3k_contexts_memory | memory utilization | +0.14 | [-0.01, +0.30] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_500mb_3k_contexts_memory | memory utilization | +0.11 | [-0.03, +0.25] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_5mb_memory | memory utilization | +0.09 | [-0.07, +0.25] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_512kb_3k_contexts_memory | memory utilization | +0.04 | [-0.11, +0.18] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_100mb_3k_contexts_cpu | % cpu utilization | +0.01 | [-5.79, +5.82] | 1 | (metrics) (profiles) (logs) |
| | quality_gates_rss_dsd_ultraheavy | memory utilization | +0.01 | [-0.12, +0.14] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_100mb_3k_contexts_throughput | ingress throughput | +0.01 | [-0.02, +0.04] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_1mb_3k_contexts_throughput | ingress throughput | +0.00 | [-0.05, +0.06] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_512kb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.06, +0.05] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_1mb_3k_contexts_memory | memory utilization | -0.01 | [-0.14, +0.13] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_10mb_3k_contexts_throughput | ingress throughput | -0.01 | [-0.19, +0.17] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_logs_5mb_throughput | ingress throughput | -0.01 | [-0.13, +0.11] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_metrics_5mb_throughput | ingress throughput | -0.01 | [-0.18, +0.15] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_5mb_throughput | ingress throughput | -0.05 | [-0.12, +0.02] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_ottl_transform_5mb_throughput | ingress throughput | -0.09 | [-0.16, -0.01] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_ottl_transform_5mb_memory | memory utilization | -0.23 | [-0.39, -0.08] | 1 | (metrics) (profiles) (logs) |
| | quality_gates_rss_dsd_heavy | memory utilization | -0.27 | [-0.39, -0.15] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_ottl_filtering_5mb_memory | memory utilization | -0.29 | [-0.53, -0.05] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_metrics_5mb_cpu | % cpu utilization | -0.35 | [-6.48, +5.79] | 1 | (metrics) (profiles) (logs) |
| | quality_gates_rss_idle | memory utilization | -0.45 | [-0.49, -0.41] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_logs_5mb_memory | memory utilization | -1.43 | [-1.87, -0.98] | 1 | (metrics) (profiles) (logs) |
| | otlp_ingest_traces_ottl_filtering_5mb_cpu | % cpu utilization | -1.98 | [-4.43, +0.46] | 1 | (metrics) (profiles) (logs) |
| | dsd_uds_10mb_3k_contexts_cpu | % cpu utilization | -2.18 | [-31.36, +27.01] | 1 | (metrics) (profiles) (logs) |

Bounds Checks: ✅ Passed

| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| | quality_gates_rss_dsd_heavy | memory_usage | 10/10 | 124.02MiB ≤ 140MiB | (metrics) (profiles) (logs) |
| | quality_gates_rss_dsd_low | memory_usage | 10/10 | 40.64MiB ≤ 50MiB | (metrics) (profiles) (logs) |
| | quality_gates_rss_dsd_medium | memory_usage | 10/10 | 61.24MiB ≤ 75MiB | (metrics) (profiles) (logs) |
| | quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | 175.05MiB ≤ 200MiB | (metrics) (profiles) (logs) |
| | quality_gates_rss_idle | memory_usage | 10/10 | 28.05MiB ≤ 40MiB | (metrics) (profiles) (logs) |

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
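
The three criteria above amount to a simple predicate. The sketch below encodes them as stated; the struct, field names, and function name are hypothetical and are not taken from the regression detector itself.

```rust
/// Hypothetical per-experiment summary holding the columns used by the rule.
struct Experiment {
    delta_mean_pct: f64,
    ci_low: f64,
    ci_high: f64,
    erratic: bool,
}

/// Effect size tolerance from the report: |Δ mean %| ≥ 5.00%.
const EFFECT_SIZE_TOLERANCE_PCT: f64 = 5.0;

/// True only when all three stated criteria hold.
fn is_flagged(e: &Experiment) -> bool {
    // 1. The estimated change is big enough to merit a closer look.
    let big_enough = e.delta_mean_pct.abs() >= EFFECT_SIZE_TOLERANCE_PCT;
    // 2. The confidence interval lies entirely on one side of zero.
    let ci_excludes_zero = e.ci_low > 0.0 || e.ci_high < 0.0;
    // 3. The experiment is not marked erratic.
    big_enough && ci_excludes_zero && !e.erratic
}
```

Applied to this run, dsd_uds_500mb_3k_contexts_throughput (+3.13, CI [+3.00, +3.26]) has an interval that excludes zero but stays under the 5% tolerance, so it is not flagged; dsd_uds_512kb_3k_contexts_cpu (+2.89, CI [-54.97, +60.75]) fails both of the first two criteria.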

@thieman thieman force-pushed the thieman/DADP-58-kind-correctness-tests branch from 567464a to 6186190 Compare May 4, 2026 19:44
}

/// Removes ANSI escape sequences (`ESC[...letter`) from a byte slice.
fn strip_ansi_codes(input: &[u8]) -> Vec<u8> {
Contributor Author

Driveby, cleans up the log artifacts produced by the correctness tests (not just kind, this was affecting all of them)

@@ -0,0 +1,18 @@
/// Removes ANSI escape sequences (`ESC[...letter`) from a byte slice.
pub(crate) fn strip_ansi_codes(input: &[u8]) -> Vec<u8> {
Contributor Author

There are two copies of this function; the alternative would be to share it from airlock. That didn't seem to make much sense since it's not directly in line with airlock's purpose, but I can do that if you'd rather.

thieman and others added 13 commits May 7, 2026 10:40
…etes coverage

Adds a `runtime: kind` option to correctness test configs that runs test groups
as multi-container Kubernetes pods inside a kind cluster rather than Docker containers.
This unlocks testing of origin detection scenarios that require real Kubernetes
metadata (pod UIDs, containerd container IDs, K8s labels, External Data injection)
which are untestable in the plain Docker correctness framework.

Introduces `dsd-plain-kind` as the initial kind-based test, verifying that the
existing dsd-plain workload passes end-to-end through the kind path before adding
origin-detection-specific tests on top.

- Add `Runtime` enum to correctness config schema (`docker` default, `kind`)
- Add `k8s.rs` correctness runner: namespace-per-group isolation, ConfigMap file
  injection with subPath mounts, multi-container pod with shared emptyDir at
  /airlock, socket-wait wrapper on millstone, kube-rs port-forward for data
  collection, namespace cascade cleanup
- Dispatch in correctness runner based on `config.runtime`
- Expose `runtime` field in `panoramic list --json` output
- Update pipeline generator to emit kind jobs using `.test-correctness-kind-definition`
  mixin (same dynamic approach as Docker tests, no special-casing)
- Add `.test-correctness-kind-definition` CI mixin: extends Docker mixin, adds kind
  install/cluster-create/image-load in script, cluster teardown in after_script
- Add Makefile targets: kind-create-cluster, kind-delete-cluster, kind-load-images,
  test-correctness-kind, test-correctness-kind-case, check-kind-tools
- Add workspace deps: kube 0.93 (client + rustls-tls + ws), k8s-openapi 0.22
- Add RUSTSEC-2025-0134 ignore (rustls-pemfile, transitive via kube, no upgrade path)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…gs; parallelize kind image loading

- Remove Default impl from Runtime enum; runtime field is now required in all
  correctness test configs (deserialization fails fast on any missing value)
- Add runtime: docker to all 11 existing correctness test configs
- Parallelize docker pulls and kind loads in .test-correctness-kind-definition
  CI mixin using background jobs + wait, reducing image prep time

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix port-forward background task leak: accept loop now exits on CancellationToken;
  token is cancelled after data collection completes
- Add MILLSTONE_EXIT_TIMEOUT (300s) to wait_for_millstone_exit; was unbounded before
- Report both errors when both baseline and comparison groups fail simultaneously
- Rename spawn_duration/spawn_pods -> run_duration/run_groups (phase covers full group
  lifecycle, not just pod creation)
- Parallelize cleanup_namespace calls with tokio::join!
- Move `use std::time::Instant` to top-level imports
- Import TargetConfig at top level; remove fully-qualified paths in fn signatures
- Fix create_config_map: add ConfigMap to top-level imports, remove redundant
  full-path qualification on the constructor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Download from the canonical GitHub releases URL (more trustworthy than
kind.sigs.k8s.io) and verify SHA256 against the value hardcoded in this
file before installing. The hardcoded checksum is what provides the
security guarantee — downloading it from the same server at runtime
would not help.

Update KIND_SHA256 here whenever KIND_VERSION is bumped.

Also add deny.toml ignores for two new hickory-proto advisories
(RUSTSEC-2026-0118, RUSTSEC-2026-0119). These are pre-existing
vulnerabilities in DNSSEC validation code that we do not exercise;
fixing them requires updating hyper-hickory past 0.8.0 to lift the
hickory 0.25.x pin, which is a separate concern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add .ci/install-kind.sh following the same pattern as install-docker-cli.sh.
The script downloads kind from the canonical GitHub releases URL, verifies the
SHA256 against a hardcoded value, and installs it. Update KIND_SHA256 in the
script whenever KIND_VERSION is bumped.

Baking kind into the image is preferable to downloading it at job runtime:
no external network dependency per job, checksum pinned in version control
rather than inline YAML, and the install is audited as part of image builds
rather than every test run.

Remove the download-and-verify step from .test-correctness-kind-definition
since kind is now available in the image.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The script already handled TARGETARCH dispatch but used the same SHA256
for both architectures. Add the correct per-arch checksums so arm64
builds are verified correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Merge install-kind.sh into the same RUN layer as install-docker-cli.sh
so clean-temporary-caches.sh applies to both in a single layer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move all kind cluster setup and teardown out of the CI mixin and Makefile
and into panoramic itself, so the runner is self-contained for kind tests.

On startup, if any kind-runtime tests are selected, panoramic:
  - Checks whether the named cluster already exists (reuses it if so)
  - Creates it if not
  - Pulls all required images in parallel (best-effort; local-only images
    that fail the pull are still loaded from the local daemon)
  - Loads all images into the cluster in parallel

After all tests complete, panoramic deletes the cluster unless
--no-delete-cluster is passed (useful for local iteration).

The kind cluster name defaults to "saluki-correctness" and can be
overridden with --kind-cluster-name.

CI mixin: remove cluster create/image pull/load/delete steps; the mixin
now only extends the Docker definition with a longer timeout.

Makefile: remove kind-create-cluster, kind-delete-cluster,
kind-load-images, test-correctness-kind, test-correctness-kind-case
targets. check-kind-tools is kept for install verification. Users run
kind tests via the existing make test-correctness-case CASE=dsd-plain-kind
or make test-correctness (which picks up all tests).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-correctness target

Label each panoramic-created namespace with created-by=panoramic-kind so
orphaned namespaces (from a killed panoramic process) can be bulk-deleted.

Add Makefile targets:
  clean-kind        -- kubectl delete namespace -l created-by=panoramic-kind
  clean-correctness -- wraps clean-airlock + clean-kind

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…st on pull error

Replace best-effort parallel pull with a per-image check: if the image is
already in the local Docker daemon (docker image inspect succeeds), skip the
pull. If it is absent, pull it — and if the pull fails, fail the entire test
run immediately rather than continuing with a missing image that will cause
kind load to fail with a less obvious error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
thieman and others added 17 commits May 7, 2026 10:40
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…runtime to kubernetes_in_docker

Container logs: after each pod reaches Running, spawn background tasks that
stream each container's logs (datadog-intake, target, millstone) to
<log-dir>/correctness/<test>/<baseline|comparison>/<container>.log using
kube-rs log_stream. Uses tokio-util compat bridge to connect futures::AsyncBufRead
to tokio::io::copy.

Rename: runtime: kind → runtime: kubernetes_in_docker for clarity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… ANSI codes in logs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
tracing-subscriber hardcodes with_ansi(true) in datadog-intake and millstone,
and the Datadog Agent's own logger also emits color codes. NO_COLOR is not
respected. Instead, strip ANSI escape sequences (ESC[...letter) from the log
stream in panoramic before writing to disk, which works uniformly for all
containers regardless of how they handle color.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Same fix as the kind log streaming path — containers emit colored tracing
output regardless of whether stdout is a terminal, so strip before writing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ync licenses

New test added by PR 1552 was missing the required runtime field.
License file updated for new dependencies introduced by the rebase.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
hickory-proto updated to 0.26.1 on main, which resolved RUSTSEC-2026-0118
and RUSTSEC-2026-0119. Remove the now-stale ignore entries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously, kind cluster setup (create cluster + load images) blocked all
tests from starting. Now it runs as a background task so Docker-runtime
tests start immediately.

The wait is deferred to the kind test itself via a watch channel injected
into TestContext. The runner waits for the channel BEFORE acquiring a
concurrency slot so kind tests don't hold slots while the cluster is still
being set up. By the time a kind test actually runs, the cluster is ready
and the check in run_k8s_correctness_test is a fast-path no-op.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… tests

The kind_ready wait was applied to all tests in the parallel runner, causing
Docker tests to also block on cluster setup. Guard the wait with a runtime
check so only kubernetes_in_docker tests wait.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove duplicate implementations from kind.rs and k8s.rs and share from
crate::utils. airlock/src/driver.rs keeps its own copy as it's a separate crate.

Also capture kind create cluster output through tracing (debug level) instead
of letting it print raw to the terminal with emojis and ANSI codes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add TestEvent::StatusLine for infrastructure messages that bypass the
test runner event flow. Move event channel creation before the kind setup
background task so it can emit status lines that appear in both TUI and
logging modes.

kind.rs now emits StatusLine events for:
  - "Reusing existing kind cluster..." / "Creating kind cluster..."
  - "Pulling container images (if not already present)..."
  - "Loading images into kind cluster..."

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. Cancel-safe kind_ready wait: use tokio::select! on cancel.cancelled() in
   both run_parallel and run_fail_fast so Ctrl-C during cluster setup doesn't
   deadlock kind test futures.

2. run_fail_fast kind_ready wait: add the same pre-semaphore wait that
   run_parallel has so kind tests don't immediately fail with "did not
   complete" when run with --fail-fast.

3. Remove duplicate strip_ansi_codes from k8s.rs — kind.rs and k8s.rs now
   both import from crate::utils.

4. check_kind_installed now verifies the exit code, not just whether the
   binary exists.

5. Move `use crate::utils::strip_ansi_codes` to top of kind.rs imports.

6. Warn on malformed env vars (missing '=') in the k8s target env parsing
   instead of silently dropping them.

7. Warn when kind setup fails mid-way (cluster created but image load
   failed), so users know a dangling cluster may need manual cleanup.

8. Fix step comment numbering in run_group (was missing step 6).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous commit's Python replace didn't match because the function
had a different internal comment style. Remove it properly this time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
collect_kind_images was called on all discovered tests before the name
filter was applied, so running '-t dsd-plain' would still trigger kind
cluster setup because dsd-plain-kind exists in the test directory.

Pass the name filter into collect_kind_images so only tests that will
actually run are considered when deciding whether to start kind setup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
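
The fix described in this commit can be illustrated with a small sketch. The `TestCase` struct, its fields, and the exact-match semantics of the name filter are assumptions made for illustration; only the function name `collect_kind_images` and the runtime string `kubernetes_in_docker` come from the PR.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified view of a discovered test.
pub struct TestCase {
    pub name: String,
    pub runtime: String,
    pub images: Vec<String>,
}

/// Collects the images needed by kind tests, considering only tests that
/// pass the name filter. An empty result means kind setup can be skipped.
pub fn collect_kind_images(tests: &[TestCase], name_filter: Option<&[&str]>) -> BTreeSet<String> {
    tests
        .iter()
        // Apply the name filter first (assumed to be an exact match on the test name).
        .filter(|t| name_filter.map_or(true, |names| names.contains(&t.name.as_str())))
        // Only kind-runtime tests contribute images.
        .filter(|t| t.runtime == "kubernetes_in_docker")
        .flat_map(|t| t.images.iter().cloned())
        .collect()
}
```

With this shape, a run filtered to `dsd-plain` yields an empty image set even though `dsd-plain-kind` is present in the test directory, so no cluster is created.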
… to 30s

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…atch::Receiver for KindReadyReceiver

watch::Receiver<T> implements Clone, giving each task its own independent
receiver with its own "last seen" mark. No shared lock needed — concurrent
kind tests each wait on their own clone without serializing.

Also fix stale comments:
- KindReadyReceiver doc no longer references Arc<Mutex<_>>
- Remove orphaned "Cancellation token for the test run." comment in main.rs
- Update generate-correctness-pipeline.sh comment (kind tests are included,
  not excluded, in the dynamic pipeline)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thieman thieman force-pushed the thieman/DADP-58-kind-correctness-tests branch from 0fc4521 to 09e98cb Compare May 7, 2026 14:40
@thieman thieman marked this pull request as ready for review May 7, 2026 14:42
@thieman thieman requested a review from a team as a code owner May 7, 2026 14:42
@thieman thieman changed the title feat(panoramic): add kind-based correctness test runtime [DADP-58] feat(panoramic): add kind-based correctness test runtime May 7, 2026
New test added by main was missing the required runtime field, breaking
the correctness pipeline generator.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit d04bde8 into main May 7, 2026
65 of 66 checks passed
dd-octo-sts Bot pushed a commit that referenced this pull request May 7, 2026
## Human Summary

Adds the ability to run `kind` (kubernetes-in-docker) correctness tests. The actual goal is to add multiple tests covering origin detection, which has a lot of Kubernetes-specific behavior we'd like to cover. This PR does not add origin detection tests yet, but it does add a `dsd-plain-kind` test, which is the existing `dsd-plain` correctness test modified to run under kind. This test will be deleted when we add the origin detection tests.

Example run including the kind test is provided immediately below. You'll need to [install kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installing-from-release-binaries) first:

`cargo run --profile release --package panoramic -- run -d test/correctness -d test/integration/cases -t basic-startup,dsd-plain,dsd-plain-kind`

## Summary

- Adds a `runtime: kubernetes_in_docker` option to correctness test configs that runs test groups as multi-container Kubernetes pods inside a kind cluster, unlocking origin detection testing scenarios that are impossible in the plain Docker framework (real pod UIDs, containerd container IDs, K8s labels, External Data injection via `DD_EXTERNAL_ENV`)
- Introduces `dsd-plain-kind` as the initial kind-based test — verifies the existing dsd-plain workload passes end-to-end through the kind path as a baseline before origin-detection-specific tests are layered on top
- CI integration follows the same dynamic pipeline approach as existing Docker tests: the pipeline generator emits kind jobs automatically when `runtime: kubernetes_in_docker` is detected, using a new `.test-correctness-kind-definition` mixin that extends the existing Docker mixin with a longer timeout
- `runtime` is now a required field in all correctness configs (no default); all existing tests explicitly declare their runtime
- Rebased onto main after PR #1552 (dynamic dispatch with Test trait and Runner) and latest main (May 7)

## What changed

**New runtime path (`k8s.rs`):**
- Integrates with PR 1552's `Test` trait architecture — `Config::run(tctx: TestContext)` dispatches to the kind path when `runtime: kubernetes_in_docker`
- Each test group (baseline + comparison) runs as a multi-container pod in its own namespace, labelled `created-by=panoramic-kind` for orphan cleanup
- `datadog-intake`, `target` (agent), and `millstone` share a pod with an emptyDir at `/airlock` for the UDS socket
- Config files injected via ConfigMap with `subPath` mounts so the agent's `/etc/datadog-agent/` remains writable (needed for `auth_token` creation)
- Millstone wrapped in a socket-wait shell command so it doesn't send before the agent is ready
- After the pod reaches Running, background tasks stream each container's logs to `<log-dir>/<baseline|comparison>/<container>.log`; ANSI escape codes stripped via `crate::utils::strip_ansi_codes` (no duplicate implementations)
- Data collected via kube-rs port-forward to datadog-intake port 2049; forward cancelled via `CancellationToken` after collection
- `wait_for_millstone_exit` bounded by `MILLSTONE_EXIT_TIMEOUT` (300s)
- Both baseline and comparison errors reported when both groups fail simultaneously
- Malformed env vars (missing `=`) in target config emit `warn!` instead of being silently dropped
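
The last bullet (warning on malformed env vars) could be sketched as below. This is an illustrative, std-only version: the function name `parse_env_entries` and its return shape are hypothetical, and `eprintln!` stands in for the `warn!` macro mentioned above.

```rust
/// Hypothetical helper: splits `KEY=VALUE` entries into pairs.
/// Entries without '=' are reported and returned separately instead of
/// being dropped silently.
pub fn parse_env_entries(entries: &[&str]) -> (Vec<(String, String)>, Vec<String>) {
    let mut parsed = Vec::new();
    let mut malformed = Vec::new();

    for entry in entries {
        // split_once splits at the FIRST '=', so values may themselves contain '='.
        match entry.split_once('=') {
            Some((key, value)) => parsed.push((key.to_string(), value.to_string())),
            None => {
                // Stand-in for `warn!` in this sketch.
                eprintln!("warning: ignoring malformed env var {entry:?} (expected KEY=VALUE)");
                malformed.push(entry.to_string());
            }
        }
    }

    (parsed, malformed)
}
```

Returning the malformed entries alongside the parsed ones keeps the helper easy to test; a caller that only wants the warning can ignore the second element.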

**ANSI stripping (both runtimes):**
- `airlock/src/driver.rs` strips ANSI codes from Docker container log output
- `k8s.rs` and `kind.rs` share `strip_ansi_codes` from `crate::utils`; `airlock` keeps its own copy as a separate crate
- `kind create cluster` output is captured and emitted through tracing at debug level; raw emoji/ANSI output suppressed
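
For readers unfamiliar with what stripping ANSI codes involves, here is a minimal sketch. It is not the implementation in `crate::utils`; it only handles CSI sequences (`ESC [ ... final byte`), which cover the usual colour and style codes, and it leaves emoji and other non-ASCII text untouched.

```rust
/// Illustrative sketch: removes ANSI CSI escape sequences (e.g. "\x1b[32m")
/// from `input`. The real `strip_ansi_codes` may handle more cases.
pub fn strip_ansi_codes(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        if c != '\u{1b}' {
            out.push(c);
            continue;
        }
        // ESC followed by '[' starts a CSI sequence: parameter and
        // intermediate bytes, terminated by a final byte in '@'..='~'.
        if chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            for next in chars.by_ref() {
                if ('@'..='~').contains(&next) {
                    break;
                }
            }
        }
        // A lone ESC is dropped; the characters after it are kept.
    }

    out
}
```

Applied line by line to container log output, this turns `"\x1b[32mok\x1b[0m done"` into `"ok done"`.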

**Kind cluster lifecycle (`kind.rs`, `main.rs`, `runner.rs`, `test.rs`):**
- Kind cluster setup only runs when the selected test set includes at least one `kubernetes_in_docker` test; running `-t dsd-plain` never touches kind
- Kind cluster setup runs as a background task — Docker-runtime tests start immediately without waiting
- Kind tests wait for the cluster-ready signal **before acquiring a concurrency slot**, gated on `test.runtime() == "kubernetes_in_docker"` so Docker tests are completely unaffected; each task holds its own cloned `watch::Receiver` with independent "last seen" state — no mutex needed
- The wait loop uses `tokio::select!` on the cancellation token so Ctrl-C during cluster setup doesn't deadlock kind test futures; both `run_parallel` and `run_fail_fast` perform this wait
- `check_kind_installed` verifies the exit code in addition to binary presence
- A warning is emitted when kind setup fails after cluster creation, alerting users to a potentially dangling cluster
- Kind setup emits `TestEvent::StatusLine` messages visible in both TUI and logging modes
- panoramic manages the full kind cluster lifecycle: creates if absent (reuses if present), pulls images only if not already in the local daemon (pull failure is fatal), loads images in parallel, deletes cluster after tests
- `--no-delete-kind-cluster` keeps the cluster alive between runs (useful locally)
- `--kind-cluster-name` overrides the default (`saluki-correctness`)
- Flush wait changed from 32s to 30s (`FLUSH_WAIT: Duration`)

**CI (`.gitlab/correctness-mixins.yml`, `generate-correctness-pipeline.sh`):**
- `.test-correctness-kind-definition` extends `.test-correctness-definition` with a longer timeout — all cluster/image management is inside panoramic
- Pipeline generator emits kind jobs via the kind mixin; mixin selection driven by `test.runtime()` from the `Test` trait

**kind pre-installed in `SALUKI_BUILD_CI_IMAGE`:**
- `.ci/install-kind.sh` installs kind in the same `RUN` layer as `install-docker-cli.sh`, with per-arch SHA256 checksums hardcoded in the script

**Explicitness:**
- `Runtime` enum has no default; all correctness configs have an explicit `runtime:` field (including `dsd-tag-filterlist` from PR 1552)
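
A rough sketch of what "no default" means in practice follows. The variant names, the `docker` string, and the `resolve_runtime` helper are assumptions for illustration (only `kubernetes_in_docker` appears in this PR), and the real config is presumably deserialized rather than parsed by hand.

```rust
use std::str::FromStr;

/// Deliberately does NOT derive or implement `Default`, so a missing
/// `runtime:` field cannot silently fall back to one of the variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Runtime {
    Docker,
    KubernetesInDocker,
}

impl FromStr for Runtime {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "docker" => Ok(Runtime::Docker), // assumed name for the Docker runtime
            "kubernetes_in_docker" => Ok(Runtime::KubernetesInDocker),
            other => Err(format!("unknown runtime {other:?}")),
        }
    }
}

/// Hypothetical helper: treats an absent field as an error instead of
/// picking a default.
pub fn resolve_runtime(field: Option<&str>) -> Result<Runtime, String> {
    field
        .ok_or_else(|| "missing required field `runtime`".to_string())?
        .parse()
}
```

Failing on a missing field is what caused the pipeline generator to break when a test without `runtime:` landed on main, which is the intended behavior: the omission is caught at discovery time rather than running the test under an unintended runtime.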

**Local dev:**
- `cargo run --profile release --package panoramic -- run -d test/correctness -t dsd-plain-kind --no-tui --no-delete-kind-cluster`
- `make clean-kind` / `make clean-correctness`

**Dependencies:** kube 0.93 (client + rustls-tls + ws), k8s-openapi 0.22, tokio-util compat; RUSTSEC-2025-0134 ignored (rustls-pemfile, transitive via kube)

## Test plan

- [x] `dsd-plain-kind` passes locally
- [x] Running `-t dsd-plain` (Docker test) does not trigger kind cluster setup
- [x] Docker and kind tests run concurrently — Docker tests start immediately, kind tests wait without holding concurrency slots
- [x] Ctrl-C during cluster setup doesn't deadlock kind test futures
- [x] Kind setup progress visible in both TUI and logging modes via `TestEvent::StatusLine`
- [x] Container logs are plain text without ANSI codes (both runtimes)
- [x] All correctness tests discovered with explicit runtime fields (including the new test from latest main)
- [x] Branch rebased cleanly onto main post-PR-1552
- [ ] `SALUKI_BUILD_CI_IMAGE` rebuilt with kind (trigger `generate-build-ci-image` via Run Pipeline on this branch)
- [ ] CI pipeline generates `test-correctness-dsd-plain-kind` job
- [ ] Existing Docker correctness tests unaffected

🤖 Generated with [Claude Code](https://claude.ai/claude-code)

Co-authored-by: travis.thieman <travis.thieman@datadoghq.com> d04bde8

Labels

area/ci CI/CD, automated testing, etc. area/test All things testing: unit/integration, correctness, SMP regression, etc. mergequeue-status: done

2 participants