feat(panoramic): add kind-based correctness test runtime #1541
f9cdc46 to 6c248f2
Binary Size Analysis (Agent Data Plane)
Target: bc61ae9 (baseline) vs 5020f9d (comparison) diff
| Module | File Size | Symbols |
|---|---|---|
| figment | +11.51 KiB | 561 |
| core | -1.93 KiB | 13573 |
| serde_core | -1.88 KiB | 752 |
| anon.583fff1769e839a4ac4c3817980336c2.88.llvm.3366751254004393234 | -1.69 KiB | 1 |
| anon.026f154fc78761c6625909119fdb6b08.339.llvm.779975656443548521 | +1.69 KiB | 1 |
| anon.edc88c1f77f7fc54f3491f6dbe3b9816.26.llvm.8772311085534667432 | -1.66 KiB | 1 |
| anon.dd4568488db98f1f5e36dc582ec46b89.1300.llvm.598376170750614907 | +1.66 KiB | 1 |
| hickory_proto | +1.45 KiB | 508 |
| anon.65bdbfe4458a3f8f9b0946c368867cf8.50.llvm.4968226963744631081 | -1.41 KiB | 1 |
| anon.9c41cca0aee603b5f5a665dc9e65f22f.127.llvm.13310154803454401535 | +1.41 KiB | 1 |
| anon.0b516339eca47b86e19d894c06a552f4.490.llvm.12175845880619374489 | +1.25 KiB | 1 |
| anon.edc88c1f77f7fc54f3491f6dbe3b9816.29.llvm.8772311085534667432 | -1.25 KiB | 1 |
| anon.01e3c670ef17ae5fac81a933540fa1b7.119.llvm.7463965485508184397 | +1.24 KiB | 1 |
| anon.cf0916cd32a442a1e4bd06931e9a86cd.274.llvm.16793965728278154532 | -1.24 KiB | 1 |
| anon.0e1016b7c3f08cf2959bb8871559dd7c.790.llvm.1411995310717264458 | +1.22 KiB | 1 |
| anon.1655e58d66d1ef1230538706b7bd06b2.59.llvm.13790112841392064238 | -1.22 KiB | 1 |
| anon.1fe68cbddbd7be3d3efdb1444e7a5632.0.llvm.8454770004358052377 | +1.14 KiB | 1 |
| anon.c51fd3990c6b999fc93e5ff25d808401.126.llvm.8984495447445464818 | -1.14 KiB | 1 |
| tower_layer | -1.11 KiB | 17 |
| anon.56e6ba155cb7b0d0983a1359a1e3d756.33.llvm.16671882612047019062 | -1.08 KiB | 1 |
Detailed Symbol Changes
FILE SIZE VM SIZE
-------------- --------------
[NEW] +149Ki [NEW] +149Ki agent_data_plane::cli::run::handle_run_command::_{{closure}}::h6d7b3b1e4cf1ec3e
[NEW] +85.9Ki [NEW] +85.7Ki saluki_env::workload::providers::remote_agent::RemoteAgentWorkloadProvider::from_configuration::_{{closure}}::hee3561fa4c60372a
[NEW] +69.9Ki [NEW] +69.7Ki agent_data_plane::run_inner::_{{closure}}::h8bc94950e259da2f
[NEW] +67.2Ki [NEW] +67.0Ki agent_data_plane::cli::run::create_topology::_{{closure}}::h14b8f0e0f4d015c1
[NEW] +64.9Ki [NEW] +64.7Ki saluki_core::topology::built::BuiltTopology::spawn::_{{closure}}::hc90d33c0f1c609c8
[NEW] +57.6Ki [NEW] +57.4Ki agent_data_plane::cli::debug::handle_debug_command::_{{closure}}::h61f5b9bab11f1e45
[NEW] +57.5Ki [NEW] +57.4Ki saluki_core::topology::blueprint::TopologyBlueprint::build::_{{closure}}::h5639eb098c3cf990
[NEW] +49.6Ki [NEW] +49.4Ki _<saluki_components::transforms::apm_stats::ApmStats as saluki_core::components::transforms::Transform>::run::_{{closure}}::h01d949e514d06c59
[NEW] +44.5Ki [NEW] +44.3Ki _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::h5bcbd42b1175098f
[NEW] +41.0Ki [NEW] +40.8Ki _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::h7b3976be055d813c
+0.0% +9.32Ki +0.1% +8.51Ki [44089 Others]
[DEL] -41.0Ki [DEL] -40.8Ki _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::h1bb8d646569bc30e
[DEL] -44.5Ki [DEL] -44.3Ki _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::h1baf53f0a27caf99
[DEL] -49.6Ki [DEL] -49.4Ki _<saluki_components::transforms::apm_stats::ApmStats as saluki_core::components::transforms::Transform>::run::_{{closure}}::h9eaed642fde8076b
[DEL] -57.5Ki [DEL] -57.4Ki saluki_core::topology::blueprint::TopologyBlueprint::build::_{{closure}}::h5bf8b481e91d3cf2
[DEL] -57.6Ki [DEL] -57.4Ki agent_data_plane::cli::debug::handle_debug_command::_{{closure}}::h6f1710360d926a91
[DEL] -64.9Ki [DEL] -64.7Ki saluki_core::topology::built::BuiltTopology::spawn::_{{closure}}::h2757e3b25873956f
[DEL] -67.2Ki [DEL] -67.0Ki agent_data_plane::cli::run::create_topology::_{{closure}}::h264ca2385ebef6f2
[DEL] -69.9Ki [DEL] -69.7Ki agent_data_plane::run_inner::_{{closure}}::hea2335eebab44bfd
[DEL] -85.9Ki [DEL] -85.7Ki saluki_env::workload::providers::remote_agent::RemoteAgentWorkloadProvider::from_configuration::_{{closure}}::h9b1aa6330ab3f6f1
[DEL] -149Ki [DEL] -149Ki agent_data_plane::cli::run::handle_run_command::_{{closure}}::hc4b614f155e62c55
+0.0% +9.32Ki +0.0% +8.51Ki TOTAL
Regression Detector (Agent Data Plane)
Regression Detector Results
Run ID: c581904c-e4ef-4e04-b3c4-5516d651a0d8
Baseline: bc61ae9
Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | +2.51 | [-2.41, +7.44] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | -0.01 | [-0.13, +0.11] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_memory | memory utilization | -1.43 | [-1.87, -0.98] | 1 | (metrics) (profiles) (logs) |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | dsd_uds_500mb_3k_contexts_throughput | ingress throughput | +3.13 | [+3.00, +3.26] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_cpu | % cpu utilization | +2.89 | [-54.97, +60.75] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | +2.51 | [-2.41, +7.44] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_memory | memory utilization | +1.38 | [+1.18, +1.59] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_cpu | % cpu utilization | +1.37 | [-51.75, +54.49] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_cpu | % cpu utilization | +1.18 | [-0.84, +3.19] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_medium | memory utilization | +0.40 | [+0.22, +0.57] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_cpu | % cpu utilization | +0.37 | [-0.93, +1.68] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_cpu | % cpu utilization | +0.33 | [-1.76, +2.41] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_low | memory utilization | +0.19 | [+0.04, +0.35] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_memory | memory utilization | +0.19 | [+0.04, +0.34] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_throughput | ingress throughput | +0.18 | [+0.11, +0.26] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_memory | memory utilization | +0.14 | [-0.01, +0.30] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_memory | memory utilization | +0.11 | [-0.03, +0.25] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_memory | memory utilization | +0.09 | [-0.07, +0.25] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_memory | memory utilization | +0.04 | [-0.11, +0.18] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_cpu | % cpu utilization | +0.01 | [-5.79, +5.82] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_ultraheavy | memory utilization | +0.01 | [-0.12, +0.14] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_throughput | ingress throughput | +0.01 | [-0.02, +0.04] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_throughput | ingress throughput | +0.00 | [-0.05, +0.06] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.06, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_memory | memory utilization | -0.01 | [-0.14, +0.13] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_throughput | ingress throughput | -0.01 | [-0.19, +0.17] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | -0.01 | [-0.13, +0.11] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_throughput | ingress throughput | -0.01 | [-0.18, +0.15] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_throughput | ingress throughput | -0.05 | [-0.12, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_throughput | ingress throughput | -0.09 | [-0.16, -0.01] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_memory | memory utilization | -0.23 | [-0.39, -0.08] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_heavy | memory utilization | -0.27 | [-0.39, -0.15] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_memory | memory utilization | -0.29 | [-0.53, -0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_cpu | % cpu utilization | -0.35 | [-6.48, +5.79] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_idle | memory utilization | -0.45 | [-0.49, -0.41] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_memory | memory utilization | -1.43 | [-1.87, -0.98] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_cpu | % cpu utilization | -1.98 | [-4.43, +0.46] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_cpu | % cpu utilization | -2.18 | [-31.36, +27.01] | 1 | (metrics) (profiles) (logs) |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | quality_gates_rss_dsd_heavy | memory_usage | 10/10 | 124.02MiB ≤ 140MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_low | memory_usage | 10/10 | 40.64MiB ≤ 50MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_medium | memory_usage | 10/10 | 61.24MiB ≤ 75MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | 175.05MiB ≤ 200MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_idle | memory_usage | 10/10 | 28.05MiB ≤ 40MiB | (metrics) (profiles) (logs) |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
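The three criteria above can be sketched as a small predicate. This is only an illustration of the decision rule as the explanation states it, not the detector's actual code; the `Experiment` struct, its field names, and `is_significant_change` are hypothetical, and the `erratic` flag is assumed to come from the experiment's configuration.

```rust
/// One row of a regression detector table.
/// Hypothetical shape: field names are illustrative, not the detector's schema.
struct Experiment {
    delta_mean_pct: f64,
    ci_low: f64,
    ci_high: f64,
    erratic: bool,
}

/// Effect size tolerance from the explanation: |Δ mean %| ≥ 5.00%.
const EFFECT_SIZE_TOLERANCE_PCT: f64 = 5.0;

/// Returns true only when all three stated criteria hold.
fn is_significant_change(e: &Experiment) -> bool {
    let big_enough = e.delta_mean_pct.abs() >= EFFECT_SIZE_TOLERANCE_PCT;
    // The interval excludes zero when both bounds sit on the same side of it.
    let ci_excludes_zero = e.ci_low > 0.0 || e.ci_high < 0.0;
    big_enough && ci_excludes_zero && !e.erratic
}

fn main() {
    // dsd_uds_500mb_3k_contexts_throughput from the table: +3.13 [+3.00, +3.26].
    // The interval excludes zero, but the effect is below the 5% tolerance.
    let row = Experiment {
        delta_mean_pct: 3.13,
        ci_low: 3.00,
        ci_high: 3.26,
        erratic: false, // assumption: the table does not show this flag
    };
    println!("flagged: {}", is_significant_change(&row));
}
```

Under these assumptions the row is not flagged, which matches the ➖ marker it carries in the table.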
567464a to 6186190
}

/// Removes ANSI escape sequences (`ESC[...letter`) from a byte slice.
fn strip_ansi_codes(input: &[u8]) -> Vec<u8> {
Driveby, cleans up the log artifacts produced by the correctness tests (not just kind, this was affecting all of them)
@@ -0,0 +1,18 @@
/// Removes ANSI escape sequences (`ESC[...letter`) from a byte slice.
pub(crate) fn strip_ansi_codes(input: &[u8]) -> Vec<u8> {
There are two copies of this function; the alternative would be to share it from airlock. That didn't seem to make a ton of sense since it's not directly in line with airlock's purpose, but I can do that if you'd rather.
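The diff excerpt only shows the signature and doc comment of `strip_ansi_codes`, so the body below is a guess at one way to implement the documented behaviour, not the PR's code. It assumes the `ESC[...letter` form means: the ESC byte, a `[`, any number of non-letter bytes, then one ASCII letter.

```rust
/// Removes ANSI escape sequences (`ESC[...letter`) from a byte slice.
///
/// Sketch only: handles the CSI form (ESC, '[', parameter bytes, one ASCII
/// letter). Other escape forms pass through unchanged, and an unterminated
/// sequence at the end of the input is dropped.
fn strip_ansi_codes(input: &[u8]) -> Vec<u8> {
    const ESC: u8 = 0x1b;
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if input[i] == ESC && input.get(i + 1) == Some(&b'[') {
            // Skip ESC and '[', then every byte up to the terminating letter.
            i += 2;
            while i < input.len() && !input[i].is_ascii_alphabetic() {
                i += 1;
            }
            // Step over the terminating letter if the sequence was complete.
            if i < input.len() {
                i += 1;
            }
        } else {
            out.push(input[i]);
            i += 1;
        }
    }
    out
}

fn main() {
    // A green "INFO" followed by a reset, as a colored tracing line might look.
    let colored = b"\x1b[32mINFO\x1b[0m ready";
    let plain = strip_ansi_codes(colored);
    println!("{}", String::from_utf8_lossy(&plain));
}
```

Working on bytes rather than `str` means the function does not need the log stream to be valid UTF-8, which fits a helper that sits between a container log stream and a file.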
…etes coverage

Adds a `runtime: kind` option to correctness test configs that runs test groups as multi-container Kubernetes pods inside a kind cluster rather than Docker containers. This unlocks testing of origin detection scenarios that require real Kubernetes metadata (pod UIDs, containerd container IDs, K8s labels, External Data injection) which are untestable in the plain Docker correctness framework.

Introduces `dsd-plain-kind` as the initial kind-based test, verifying that the existing dsd-plain workload passes end-to-end through the kind path before adding origin-detection-specific tests on top.

- Add `Runtime` enum to correctness config schema (`docker` default, `kind`)
- Add `k8s.rs` correctness runner: namespace-per-group isolation, ConfigMap file injection with subPath mounts, multi-container pod with shared emptyDir at /airlock, socket-wait wrapper on millstone, kube-rs port-forward for data collection, namespace cascade cleanup
- Dispatch in correctness runner based on `config.runtime`
- Expose `runtime` field in `panoramic list --json` output
- Update pipeline generator to emit kind jobs using `.test-correctness-kind-definition` mixin (same dynamic approach as Docker tests, no special-casing)
- Add `.test-correctness-kind-definition` CI mixin: extends Docker mixin, adds kind install/cluster-create/image-load in script, cluster teardown in after_script
- Add Makefile targets: kind-create-cluster, kind-delete-cluster, kind-load-images, test-correctness-kind, test-correctness-kind-case, check-kind-tools
- Add workspace deps: kube 0.93 (client + rustls-tls + ws), k8s-openapi 0.22
- Add RUSTSEC-2025-0134 ignore (rustls-pemfile, transitive via kube, no upgrade path)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…gs; parallelize kind image loading

- Remove Default impl from Runtime enum; runtime field is now required in all correctness test configs (deserialization fails fast on any missing value)
- Add runtime: docker to all 11 existing correctness test configs
- Parallelize docker pulls and kind loads in .test-correctness-kind-definition CI mixin using background jobs + wait, reducing image prep time

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix port-forward background task leak: accept loop now exits on CancellationToken; token is cancelled after data collection completes
- Add MILLSTONE_EXIT_TIMEOUT (300s) to wait_for_millstone_exit; was unbounded before
- Report both errors when both baseline and comparison groups fail simultaneously
- Rename spawn_duration/spawn_pods -> run_duration/run_groups (phase covers full group lifecycle, not just pod creation)
- Parallelize cleanup_namespace calls with tokio::join!
- Move `use std::time::Instant` to top-level imports
- Import TargetConfig at top level; remove fully-qualified paths in fn signatures
- Fix create_config_map: add ConfigMap to top-level imports, remove redundant full-path qualification on the constructor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Download from the canonical GitHub releases URL (more trustworthy than kind.sigs.k8s.io) and verify SHA256 against the value hardcoded in this file before installing. The hardcoded checksum is what provides the security guarantee — downloading it from the same server at runtime would not help. Update KIND_SHA256 here whenever KIND_VERSION is bumped. Also add deny.toml ignores for two new hickory-proto advisories (RUSTSEC-2026-0118, RUSTSEC-2026-0119). These are pre-existing vulnerabilities in DNSSEC validation code that we do not exercise; fixing them requires updating hyper-hickory past 0.8.0 to lift the hickory 0.25.x pin, which is a separate concern. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add .ci/install-kind.sh following the same pattern as install-docker-cli.sh. The script downloads kind from the canonical GitHub releases URL, verifies the SHA256 against a hardcoded value, and installs it. Update KIND_SHA256 in the script whenever KIND_VERSION is bumped. Baking kind into the image is preferable to downloading it at job runtime: no external network dependency per job, checksum pinned in version control rather than inline YAML, and the install is audited as part of image builds rather than every test run. Remove the download-and-verify step from .test-correctness-kind-definition since kind is now available in the image. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The script already handled TARGETARCH dispatch but used the same SHA256 for both architectures. Add the correct per-arch checksums so arm64 builds are verified correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Merge install-kind.sh into the same RUN layer as install-docker-cli.sh so clean-temporary-caches.sh applies to both in a single layer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move all kind cluster setup and teardown out of the CI mixin and Makefile
and into panoramic itself, so the runner is self-contained for kind tests.
On startup, if any kind-runtime tests are selected, panoramic:
- Checks whether the named cluster already exists (reuses it if so)
- Creates it if not
- Pulls all required images in parallel (best-effort; local-only images
that fail the pull are still loaded from the local daemon)
- Loads all images into the cluster in parallel
After all tests complete, panoramic deletes the cluster unless
--no-delete-cluster is passed (useful for local iteration).
The kind cluster name defaults to "saluki-correctness" and can be
overridden with --kind-cluster-name.
CI mixin: remove cluster create/image pull/load/delete steps; the mixin
now only extends the Docker definition with a longer timeout.
Makefile: remove kind-create-cluster, kind-delete-cluster,
kind-load-images, test-correctness-kind, test-correctness-kind-case
targets. check-kind-tools is kept for install verification. Users run
kind tests via the existing make test-correctness-case CASE=dsd-plain-kind
or make test-correctness (which picks up all tests).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-correctness target

Label each panoramic-created namespace with created-by=panoramic-kind so orphaned namespaces (from a killed panoramic process) can be bulk-deleted.

Add Makefile targets:
  clean-kind -- kubectl delete namespace -l created-by=panoramic-kind
  clean-correctness -- wraps clean-airlock + clean-kind

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…st on pull error Replace best-effort parallel pull with a per-image check: if the image is already in the local Docker daemon (docker image inspect succeeds), skip the pull. If it is absent, pull it — and if the pull fails, fail the entire test run immediately rather than continuing with a missing image that will cause kind load to fail with a less obvious error. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
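The per-image check described in this commit message can be sketched as follows. This is an assumption-laden illustration, not panoramic's implementation: `ImageAction`, `ensure_image`, and `ensure_images` are hypothetical names, and the Docker interactions are passed in as closures (in real code they would presumably shell out to `docker image inspect` and `docker pull`) so the control flow can be run without a Docker daemon.

```rust
/// What happened for a single image (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum ImageAction {
    AlreadyPresent,
    Pulled,
}

/// Skip the pull when the image is already local; otherwise pull, and turn a
/// pull failure into an error instead of continuing with a missing image.
fn ensure_image<P, F>(image: &str, is_present: P, pull: F) -> Result<ImageAction, String>
where
    P: Fn(&str) -> bool,
    F: Fn(&str) -> Result<(), String>,
{
    if is_present(image) {
        return Ok(ImageAction::AlreadyPresent);
    }
    pull(image).map_err(|e| format!("failed to pull {image}: {e}"))?;
    Ok(ImageAction::Pulled)
}

/// Applies the check to every image, stopping at the first pull failure.
fn ensure_images<P, F>(images: &[&str], is_present: P, pull: F) -> Result<Vec<ImageAction>, String>
where
    P: Fn(&str) -> bool,
    F: Fn(&str) -> Result<(), String>,
{
    images
        .iter()
        .map(|image| ensure_image(image, &is_present, &pull))
        .collect()
}

fn main() {
    // Stand-ins for the local daemon and the registry; image names are made up.
    let local = ["example/intake:latest"];
    let is_present = |image: &str| local.iter().any(|l| *l == image);
    let pull = |image: &str| -> Result<(), String> {
        if image.starts_with("missing/") {
            Err("not found".to_string())
        } else {
            Ok(())
        }
    };

    let result = ensure_images(
        &["example/intake:latest", "example/agent:latest", "missing/image:1"],
        is_present,
        pull,
    );
    println!("{result:?}");
}
```

Collecting into a `Result` short-circuits on the first error, which mirrors the "fail the entire test run immediately" behaviour the message describes.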
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This reverts commit 7a8c5f2.
…runtime to kubernetes_in_docker Container logs: after each pod reaches Running, spawn background tasks that stream each container's logs (datadog-intake, target, millstone) to <log-dir>/correctness/<test>/<baseline|comparison>/<container>.log using kube-rs log_stream. Uses tokio-util compat bridge to connect futures::AsyncBufRead to tokio::io::copy. Rename: runtime: kind → runtime: kubernetes_in_docker for clarity. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… ANSI codes in logs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
tracing-subscriber hardcodes with_ansi(true) in datadog-intake and millstone, and the Datadog Agent's own logger also emits color codes. NO_COLOR is not respected. Instead, strip ANSI escape sequences (ESC[...letter) from the log stream in panoramic before writing to disk, which works uniformly for all containers regardless of how they handle color. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Same fix as the kind log streaming path — containers emit colored tracing output regardless of whether stdout is a terminal, so strip before writing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ync licenses New test added by PR 1552 was missing the required runtime field. License file updated for new dependencies introduced by the rebase. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
hickory-proto updated to 0.26.1 on main, which resolved RUSTSEC-2026-0118 and RUSTSEC-2026-0119. Remove the now-stale ignore entries. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously, kind cluster setup (create cluster + load images) blocked all tests from starting. Now it runs as a background task so Docker-runtime tests start immediately. The wait is deferred to the kind test itself via a watch channel injected into TestContext. The runner waits for the channel BEFORE acquiring a concurrency slot so kind tests don't hold slots while the cluster is still being set up. By the time a kind test actually runs, the cluster is ready and the check in run_k8s_correctness_test is a fast-path no-op. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… tests The kind_ready wait was applied to all tests in the parallel runner, causing Docker tests to also block on cluster setup. Guard the wait with a runtime check so only kubernetes_in_docker tests wait. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove duplicate implementations from kind.rs and k8s.rs and share from crate::utils. airlock/src/driver.rs keeps its own copy as it's a separate crate. Also capture kind create cluster output through tracing (debug level) instead of letting it print raw to the terminal with emojis and ANSI codes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add TestEvent::StatusLine for infrastructure messages that bypass the test runner event flow. Move event channel creation before the kind setup background task so it can emit status lines that appear in both TUI and logging modes.

kind.rs now emits StatusLine events for:
- "Reusing existing kind cluster..." / "Creating kind cluster..."
- "Pulling container images (if not already present)..."
- "Loading images into kind cluster..."

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. Cancel-safe kind_ready wait: use tokio::select! on cancel.cancelled() in both run_parallel and run_fail_fast so Ctrl-C during cluster setup doesn't deadlock kind test futures.
2. run_fail_fast kind_ready wait: add the same pre-semaphore wait that run_parallel has so kind tests don't immediately fail with "did not complete" when run with --fail-fast.
3. Remove duplicate strip_ansi_codes from k8s.rs; kind.rs and k8s.rs now both import from crate::utils.
4. check_kind_installed now verifies the exit code, not just whether the binary exists.
5. Move `use crate::utils::strip_ansi_codes` to top of kind.rs imports.
6. Warn on malformed env vars (missing '=') in the k8s target env parsing instead of silently dropping them.
7. Warn when kind setup fails mid-way (cluster created but image load failed), so users know a dangling cluster may need manual cleanup.
8. Fix step comment numbering in run_group (was missing step 6).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
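Item 6 above (warn on env vars that lack `=` rather than dropping them silently) might look roughly like this. The function name and return shape are hypothetical, and `eprintln!` stands in for the `warn!` macro the PR description mentions so that the sketch needs only the standard library.

```rust
/// Splits `KEY=VALUE` entries into pairs. Entries without '=' are reported and
/// returned separately instead of being silently dropped.
/// Hypothetical helper: the real parsing lives in the k8s target env code.
fn parse_env_vars(entries: &[&str]) -> (Vec<(String, String)>, Vec<String>) {
    let mut parsed = Vec::new();
    let mut malformed = Vec::new();
    for entry in entries {
        // split_once splits at the first '=', so values may contain '='.
        match entry.split_once('=') {
            Some((key, value)) => parsed.push((key.to_string(), value.to_string())),
            None => {
                // Stand-in for a tracing `warn!` call.
                eprintln!("warning: ignoring malformed env var {entry:?} (expected KEY=VALUE)");
                malformed.push(entry.to_string());
            }
        }
    }
    (parsed, malformed)
}

fn main() {
    let (parsed, malformed) = parse_env_vars(&["DD_LOG_LEVEL=debug", "BROKEN", "A=b=c"]);
    println!("parsed: {parsed:?}");
    println!("malformed: {malformed:?}");
}
```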
The previous commit's Python replace didn't match because the function had a different internal comment style. Remove it properly this time. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
collect_kind_images was called on all discovered tests before the name filter was applied, so running '-t dsd-plain' would still trigger kind cluster setup because dsd-plain-kind exists in the test directory. Pass the name filter into collect_kind_images so only tests that will actually run are considered when deciding whether to start kind setup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
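A minimal sketch of the fix described here, assuming a simplified test model: the `TestCase` struct, its fields, and the image names are hypothetical, and an empty filter is taken to mean "no `-t` filter given". Only the idea of applying the name filter before collecting images comes from the commit message.

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for a discovered test (hypothetical shape).
struct TestCase {
    name: &'static str,
    runtime: &'static str,
    images: Vec<&'static str>,
}

/// Collects images only from kind-runtime tests that pass the name filter.
/// An empty `name_filter` selects every test (assumption for this sketch).
fn collect_kind_images(tests: &[TestCase], name_filter: &[&str]) -> BTreeSet<String> {
    tests
        .iter()
        .filter(|t| t.runtime == "kubernetes_in_docker")
        .filter(|t| name_filter.is_empty() || name_filter.contains(&t.name))
        .flat_map(|t| t.images.iter().map(|image| image.to_string()))
        .collect()
}

fn main() {
    let tests = vec![
        TestCase { name: "dsd-plain", runtime: "docker", images: vec!["example/agent:latest"] },
        TestCase {
            name: "dsd-plain-kind",
            runtime: "kubernetes_in_docker",
            images: vec!["example/agent:latest", "example/intake:latest"],
        },
    ];

    // With `-t dsd-plain`, no kind images are collected, so kind setup can be skipped.
    let filtered = collect_kind_images(&tests, &["dsd-plain"]);
    println!("kind setup needed: {}", !filtered.is_empty());

    // Without a filter, the kind test's images are picked up.
    let all = collect_kind_images(&tests, &[]);
    println!("images: {all:?}");
}
```

An empty result set doubles as the "do not start kind setup" signal, so the caller needs no separate check for whether any kind test was selected.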
… to 30s Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…atch::Receiver for KindReadyReceiver

watch::Receiver<T> implements Clone, giving each task its own independent receiver with its own "last seen" mark. No shared lock needed: concurrent kind tests each wait on their own clone without serializing.

Also fix stale comments:
- KindReadyReceiver doc no longer references Arc<Mutex<_>>
- Remove orphaned "Cancellation token for the test run." comment in main.rs
- Update generate-correctness-pipeline.sh comment (kind tests are included, not excluded, in the dynamic pipeline)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
0fc4521 to 09e98cb
New test added by main was missing the required runtime field, breaking the correctness pipeline generator. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
d04bde8 into main
## Human Summary

Adds the ability to run `kind` (kubernetes-in-docker) correctness tests.

The actual goal here is to add multiple tests covering origin detection which has a lot of Kubernetes-specific behavior we'd like to cover. This PR does not add origin detection tests yet, but it does add a `dsd-plain-kind` test which is just the existing `dsd-plain` correctness test but modified to run under kind. This test will be deleted when we add the origin detection tests.

Example run including the kind test is provided immediately below. You'll need to [install kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installing-from-release-binaries) first:

`cargo run --profile release --package panoramic -- run -d test/correctness -d test/integration/cases -t basic-startup,dsd-plain,dsd-plain-kind`

## Summary

- Adds a `runtime: kubernetes_in_docker` option to correctness test configs that runs test groups as multi-container Kubernetes pods inside a kind cluster, unlocking origin detection testing scenarios that are impossible in the plain Docker framework (real pod UIDs, containerd container IDs, K8s labels, External Data injection via `DD_EXTERNAL_ENV`)
- Introduces `dsd-plain-kind` as the initial kind-based test; it verifies the existing dsd-plain workload passes end-to-end through the kind path as a baseline before origin-detection-specific tests are layered on top
- CI integration follows the same dynamic pipeline approach as existing Docker tests: the pipeline generator emits kind jobs automatically when `runtime: kubernetes_in_docker` is detected, using a new `.test-correctness-kind-definition` mixin that extends the existing Docker mixin with a longer timeout
- `runtime` is now a required field in all correctness configs (no default); all existing tests explicitly declare their runtime
- Rebased onto main after PR #1552 (dynamic dispatch with Test trait and Runner) and latest main (May 7)

## What changed

**New runtime path (`k8s.rs`):**

- Integrates with PR 1552's `Test` trait architecture: `Config::run(tctx: TestContext)` dispatches to the kind path when `runtime: kubernetes_in_docker`
- Each test group (baseline + comparison) runs as a multi-container pod in its own namespace, labelled `created-by=panoramic-kind` for orphan cleanup
- `datadog-intake`, `target` (agent), and `millstone` share a pod with an emptyDir at `/airlock` for the UDS socket
- Config files injected via ConfigMap with `subPath` mounts so the agent's `/etc/datadog-agent/` remains writable (needed for `auth_token` creation)
- Millstone wrapped in a socket-wait shell command so it doesn't send before the agent is ready
- After the pod reaches Running, background tasks stream each container's logs to `<log-dir>/<baseline|comparison>/<container>.log`; ANSI escape codes stripped via `crate::utils::strip_ansi_codes` (no duplicate implementations)
- Data collected via kube-rs port-forward to datadog-intake port 2049; forward cancelled via `CancellationToken` after collection
- `wait_for_millstone_exit` bounded by `MILLSTONE_EXIT_TIMEOUT` (300s)
- Both baseline and comparison errors reported when both groups fail simultaneously
- Malformed env vars (missing `=`) in target config emit `warn!` instead of being silently dropped

**ANSI stripping (both runtimes):**

- `airlock/src/driver.rs` strips ANSI codes from Docker container log output
- `k8s.rs` and `kind.rs` share `strip_ansi_codes` from `crate::utils`; `airlock` keeps its own copy as a separate crate
- `kind create cluster` output is captured and emitted through tracing at debug level; raw emoji/ANSI output suppressed

**Kind cluster lifecycle (`kind.rs`, `main.rs`, `runner.rs`, `test.rs`):**

- Kind cluster setup only runs when the selected test set includes at least one `kubernetes_in_docker` test; running `-t dsd-plain` never touches kind
- Kind cluster setup runs as a background task, so Docker-runtime tests start immediately without waiting
- Kind tests wait for the cluster-ready signal **before acquiring a concurrency slot**, gated on `test.runtime() == "kubernetes_in_docker"` so Docker tests are completely unaffected; each task holds its own cloned `watch::Receiver` with independent "last seen" state, so no mutex is needed
- The wait loop uses `tokio::select!` on the cancellation token so Ctrl-C during cluster setup doesn't deadlock kind test futures; both `run_parallel` and `run_fail_fast` perform this wait
- `check_kind_installed` verifies the exit code in addition to binary presence
- A warning is emitted when kind setup fails after cluster creation, alerting users to a potentially dangling cluster
- Kind setup emits `TestEvent::StatusLine` messages visible in both TUI and logging modes
- panoramic manages the full kind cluster lifecycle: creates if absent (reuses if present), pulls images only if not already in the local daemon (pull failure is fatal), loads images in parallel, deletes cluster after tests
- `--no-delete-kind-cluster` keeps the cluster alive between runs (useful locally)
- `--kind-cluster-name` overrides the default (`saluki-correctness`)
- Flush wait changed from 32s to 30s (`FLUSH_WAIT: Duration`)

**CI (`.gitlab/correctness-mixins.yml`, `generate-correctness-pipeline.sh`):**

- `.test-correctness-kind-definition` extends `.test-correctness-definition` with a longer timeout; all cluster/image management is inside panoramic
- Pipeline generator emits kind jobs via the kind mixin; mixin selection driven by `test.runtime()` from the `Test` trait

**kind pre-installed in `SALUKI_BUILD_CI_IMAGE`:**

- `.ci/install-kind.sh` installs kind in the same `RUN` layer as `install-docker-cli.sh`, with per-arch SHA256 checksums hardcoded in the script

**Explicitness:**

- `Runtime` enum has no default; all correctness configs have an explicit `runtime:` field (including `dsd-tag-filterlist` from PR 1552)

**Local dev:**

- `cargo run --profile release --package panoramic -- run -d test/correctness -t dsd-plain-kind --no-tui --no-delete-kind-cluster`
- `make clean-kind` / `make clean-correctness`

**Dependencies:** kube 0.93 (client + rustls-tls + ws), k8s-openapi 0.22, tokio-util compat; RUSTSEC-2025-0134 ignored (rustls-pemfile, transitive via kube)

## Test plan

- [x] `dsd-plain-kind` passes locally
- [x] Running `-t dsd-plain` (Docker test) does not trigger kind cluster setup
- [x] Docker and kind tests run concurrently: Docker tests start immediately, kind tests wait without holding concurrency slots
- [x] Ctrl-C during cluster setup doesn't deadlock kind test futures
- [x] Kind setup progress visible in both TUI and logging modes via `TestEvent::StatusLine`
- [x] Container logs are plain text without ANSI codes (both runtimes)
- [x] All correctness tests discovered with explicit runtime fields (including from latest main)
- [x] Branch rebased cleanly onto main post-PR-1552
- [ ] `SALUKI_BUILD_CI_IMAGE` rebuilt with kind (trigger `generate-build-ci-image` via Run Pipeline on this branch)
- [ ] CI pipeline generates `test-correctness-dsd-plain-kind` job
- [ ] Existing Docker correctness tests unaffected

🤖 Generated with [Claude Code](https://claude.ai/claude-code)

Co-authored-by: travis.thieman <travis.thieman@datadoghq.com>
d04bde8
## Human Summary

Adds the ability to run kind (kubernetes-in-docker) correctness tests. The actual goal here is to add multiple tests covering origin detection, which has a lot of Kubernetes-specific behavior we'd like to cover. This PR does not add origin detection tests yet, but it does add a `dsd-plain-kind` test, which is just the existing `dsd-plain` correctness test modified to run under kind. This test will be deleted when we add the origin detection tests.

An example run including the kind test is provided immediately below. You'll need to install kind first:

`cargo run --profile release --package panoramic -- run -d test/correctness -d test/integration/cases -t basic-startup,dsd-plain,dsd-plain-kind`

## Summary

- Adds a `runtime: kubernetes_in_docker` option to correctness test configs that runs test groups as multi-container Kubernetes pods inside a kind cluster, unlocking origin detection testing scenarios that are impossible in the plain Docker framework (real pod UIDs, containerd container IDs, K8s labels, External Data injection via `DD_EXTERNAL_ENV`)
- Adds `dsd-plain-kind` as the initial kind-based test: it verifies that the existing dsd-plain workload passes end-to-end through the kind path, as a baseline before origin-detection-specific tests are layered on top
- The pipeline generator emits kind jobs when `runtime: kubernetes_in_docker` is detected, using a new `.test-correctness-kind-definition` mixin that extends the existing Docker mixin with a longer timeout
- `runtime` is now a required field in all correctness configs (no default); all existing tests explicitly declare their runtime

## What changed
**New runtime path (`k8s.rs`):**
- […] `Test` trait architecture; `Config::run(tctx: TestContext)` dispatches to the kind path when `runtime: kubernetes_in_docker`
- […] `created-by=panoramic-kind` for orphan cleanup
- `datadog-intake`, `target` (agent), and `millstone` share a pod with an emptyDir at `/airlock` for the UDS socket
- […] `subPath` mounts so the agent's `/etc/datadog-agent/` remains writable (needed for `auth_token` creation)
- […] `<log-dir>/<baseline|comparison>/<container>.log`; ANSI escape codes stripped via `crate::utils::strip_ansi_codes` (no duplicate implementations)
- […] `CancellationToken` after collection
- `wait_for_millstone_exit` bounded by `MILLSTONE_EXIT_TIMEOUT` (300s)
- […] `=`) in target config emit `warn!` instead of being silently dropped

**ANSI stripping (both runtimes):**
- `airlock/src/driver.rs` strips ANSI codes from Docker container log output
- `k8s.rs` and `kind.rs` share `strip_ansi_codes` from `crate::utils`; `airlock` keeps its own copy as a separate crate
- `kind create cluster` output is captured and emitted through tracing at debug level; raw emoji/ANSI output suppressed

**Kind cluster lifecycle (`kind.rs`, `main.rs`, `runner.rs`, `test.rs`):**
- […] `kubernetes_in_docker` test; running `-t dsd-plain` never touches kind
- […] concurrency slot, gated on `test.runtime() == "kubernetes_in_docker"` so Docker tests are completely unaffected; each task holds its own cloned `watch::Receiver` with independent "last seen" state, so no mutex is needed
- The wait loop uses `tokio::select!` on the cancellation token so Ctrl-C during cluster setup doesn't deadlock kind test futures; both `run_parallel` and `run_fail_fast` perform this wait
- `check_kind_installed` verifies the exit code in addition to binary presence
- A warning is emitted when kind setup fails after cluster creation, alerting users to a potentially dangling cluster
- Kind setup emits `TestEvent::StatusLine` messages visible in both TUI and logging modes
- panoramic manages the full kind cluster lifecycle: creates if absent (reuses if present), pulls images only if not already in the local daemon (pull failure is fatal), loads images in parallel, deletes cluster after tests
- `--no-delete-kind-cluster` keeps the cluster alive between runs (useful locally)
- `--kind-cluster-name` overrides the default (`saluki-correctness`)
- Flush wait changed from 32s to 30s (`FLUSH_WAIT: Duration`)

**CI (`.gitlab/correctness-mixins.yml`, `generate-correctness-pipeline.sh`):**
- `.test-correctness-kind-definition` extends `.test-correctness-definition` with a longer timeout; all cluster/image management is inside panoramic
- Pipeline generator emits kind jobs via the kind mixin; mixin selection driven by `test.runtime()` from the `Test` trait

**kind pre-installed in `SALUKI_BUILD_CI_IMAGE`:**
- `.ci/install-kind.sh` installs kind in the same `RUN` layer as `install-docker-cli.sh`, with per-arch SHA256 checksums hardcoded in the script

**Explicitness:**
- `Runtime` enum has no default; all correctness configs have an explicit `runtime:` field (including `dsd-tag-filterlist` from PR 1552)

**Local dev:**
- `cargo run --profile release --package panoramic -- run -d test/correctness -t dsd-plain-kind --no-tui --no-delete-kind-cluster`
- `make clean-kind` / `make clean-correctness`

**Dependencies:** kube 0.93 (client + rustls-tls + ws), k8s-openapi 0.22, tokio-util compat; RUSTSEC-2025-0134 ignored (rustls-pemfile, transitive via kube)
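For reviewers, the "no default" rule for `runtime` can be illustrated with a small standalone sketch. It is not the code in this PR: the two-variant enum, the `docker` value name (only `kubernetes_in_docker` is named here), and the `resolve_runtime` helper are all assumptions, and a plain `Option<&str>` stands in for the real config deserialization.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for panoramic's `Runtime` enum. There is deliberately
/// no `Default` impl, so a config without `runtime:` cannot silently pick one.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Runtime {
    Docker,
    KubernetesInDocker,
}

impl Runtime {
    /// String form, mirroring the `test.runtime() == "kubernetes_in_docker"` check.
    fn as_str(self) -> &'static str {
        match self {
            Runtime::Docker => "docker", // assumed name, not confirmed by the PR
            Runtime::KubernetesInDocker => "kubernetes_in_docker",
        }
    }
}

impl FromStr for Runtime {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "docker" => Ok(Runtime::Docker),
            "kubernetes_in_docker" => Ok(Runtime::KubernetesInDocker),
            other => Err(format!("unknown runtime: {other}")),
        }
    }
}

/// Resolves the `runtime:` field of a config; a missing field is an error
/// rather than a fallback to some implicit runtime.
fn resolve_runtime(field: Option<&str>) -> Result<Runtime, String> {
    field
        .ok_or_else(|| "missing required field `runtime`".to_string())?
        .parse()
}

fn main() {
    // An explicit runtime resolves; an absent one is rejected.
    let kind = resolve_runtime(Some("kubernetes_in_docker"));
    println!("{:?} -> {:?}", kind, kind.as_ref().map(|r| r.as_str()));
    println!("{:?}", resolve_runtime(None));
}
```

Making the missing-field case an error is what forces every existing config to declare its runtime, which in turn lets the pipeline generator choose a mixin from the declared value alone.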
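## Test plan

- `dsd-plain-kind` passes locally
- Running `-t dsd-plain` (Docker test) does not trigger kind cluster setup
- Docker and kind tests run concurrently: Docker tests start immediately, kind tests wait without holding concurrency slots
- Ctrl-C during cluster setup doesn't deadlock kind test futures
- Kind setup progress visible in both TUI and logging modes via `TestEvent::StatusLine`
- Container logs are plain text without ANSI codes (both runtimes)
- All correctness tests discovered with explicit runtime fields (including from latest main)
- Branch rebased cleanly onto main post-PR-1552
- `SALUKI_BUILD_CI_IMAGE` rebuilt with kind (trigger `generate-build-ci-image` via Run Pipeline on this branch)
- CI pipeline generates `test-correctness-dsd-plain-kind` job
- Existing Docker correctness tests unaffected

🤖 Generated with [Claude Code](https://claude.ai/claude-code)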
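As a rough illustration of the ANSI stripping applied to container logs, here is a std-only sketch. It is not the `crate::utils::strip_ansi_codes` from this PR, whose implementation is not shown here; this version assumes that removing CSI sequences (`ESC [` ... final byte, which covers colors and cursor movement) plus simple two-character escapes is sufficient, and it does not handle OSC or other string-terminated sequences.

```rust
/// Sketch of an ANSI stripper (an assumption, not the PR's implementation).
/// Removes CSI sequences and drops the character following any other `ESC`.
fn strip_ansi_codes(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        if c != '\u{1b}' {
            out.push(c);
            continue;
        }
        match chars.peek().copied() {
            // CSI: consume '[' and then everything up to and including the
            // final byte, which lies in the range 0x40..=0x7E.
            Some('[') => {
                chars.next();
                for p in chars.by_ref() {
                    if matches!(p, '\u{40}'..='\u{7e}') {
                        break;
                    }
                }
            }
            // Two-character escape such as `ESC c`: skip the second character.
            Some(_) => {
                chars.next();
            }
            // Trailing ESC at end of input: nothing left to skip.
            None => {}
        }
    }
    out
}

fn main() {
    let raw = "\u{1b}[1;32mINFO\u{1b}[0m cluster ready";
    println!("{}", strip_ansi_codes(raw)); // prints "INFO cluster ready"
}
```

Iterating over `char`s rather than bytes keeps multi-byte UTF-8 output (such as the emoji that `kind create cluster` prints) intact while the escape sequences are removed.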