chore(panoramic): dynamic dispatch with test trait and runner #1552
Binary Size Analysis (Agent Data Plane)
Target: 507d913 (baseline) vs 2765cb1 (comparison) (diff)

| Module | File Size | Symbols |
|---|---|---|
| anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.17550754633244885205 | -130 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.2417440905338573713 | +129 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.17550754633244885205 | -115 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.2417440905338573713 | +114 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.17550754633244885205 | -109 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.2417440905338573713 | +108 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.17550754633244885205 | -97 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.2417440905338573713 | +96 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.17550754633244885205 | -95 B | 1 |
| anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.2417440905338573713 | +94 B | 1 |
| [Unmapped] | -3 B | 1 |

Detailed Symbol Changes
FILE SIZE VM SIZE
-------------- --------------
[NEW] +129 [NEW] +40 anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.2417440905338573713
[NEW] +114 [NEW] +25 anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.2417440905338573713
[NEW] +108 [NEW] +19 anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.2417440905338573713
[NEW] +96 [NEW] +7 anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.2417440905338573713
[NEW] +94 [NEW] +5 anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.2417440905338573713
-0.0% -3 [ = ] 0 [Unmapped]
[DEL] -95 [DEL] -5 anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.17550754633244885205
[DEL] -97 [DEL] -7 anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.17550754633244885205
[DEL] -109 [DEL] -19 anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.17550754633244885205
[DEL] -115 [DEL] -25 anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.17550754633244885205
[DEL] -130 [DEL] -40 anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.17550754633244885205
-0.0% -8 [ = ] 0 TOTAL
Regression Detector (Agent Data Plane)
Regression Detector Results
Run ID: efbe8889-3221-4d3a-bcec-83658a0074ac
Baseline: 507d913
Optimization Goals: ✅ No significant changes detected

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ❌ | otlp_ingest_logs_5mb_memory | memory utilization | +14.68 | [+14.31, +15.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | +2.81 | [-1.65, +7.27] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | +0.03 | [-0.09, +0.16] | 1 | (metrics) (profiles) (logs) |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ❌ | otlp_ingest_logs_5mb_memory | memory utilization | +14.68 | [+14.31, +15.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_memory | memory utilization | +4.08 | [+3.90, +4.26] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | +2.81 | [-1.65, +7.27] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_cpu | % cpu utilization | +1.70 | [-4.32, +7.73] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_cpu | % cpu utilization | +1.27 | [-4.07, +6.60] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_throughput | ingress throughput | +0.56 | [+0.48, +0.64] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_throughput | ingress throughput | +0.48 | [+0.36, +0.60] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_throughput | ingress throughput | +0.36 | [+0.28, +0.43] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_low | memory utilization | +0.28 | [+0.12, +0.44] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_cpu | % cpu utilization | +0.24 | [-1.86, +2.34] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_memory | memory utilization | +0.16 | [+0.00, +0.32] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_idle | memory utilization | +0.10 | [+0.07, +0.14] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_heavy | memory utilization | +0.08 | [-0.04, +0.20] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | +0.03 | [-0.09, +0.16] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_throughput | ingress throughput | +0.00 | [-0.06, +0.06] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_throughput | ingress throughput | +0.00 | [-0.05, +0.06] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_throughput | ingress throughput | +0.00 | [-0.17, +0.17] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_throughput | ingress throughput | -0.00 | [-0.16, +0.15] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_memory | memory utilization | -0.01 | [-0.16, +0.14] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_memory | memory utilization | -0.02 | [-0.26, +0.22] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_memory | memory utilization | -0.02 | [-0.17, +0.14] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_throughput | ingress throughput | -0.02 | [-0.05, +0.01] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_cpu | % cpu utilization | -0.05 | [-2.10, +1.99] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_ultraheavy | memory utilization | -0.06 | [-0.19, +0.06] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_throughput | ingress throughput | -0.16 | [-0.24, -0.09] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_memory | memory utilization | -0.19 | [-0.33, -0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_memory | memory utilization | -0.25 | [-0.41, -0.10] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_medium | memory utilization | -0.27 | [-0.44, -0.10] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_memory | memory utilization | -0.37 | [-0.52, -0.23] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_memory | memory utilization | -0.54 | [-0.68, -0.39] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_cpu | % cpu utilization | -0.56 | [-1.99, +0.87] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_cpu | % cpu utilization | -1.19 | [-3.38, +0.99] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_cpu | % cpu utilization | -2.19 | [-58.64, +54.27] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_cpu | % cpu utilization | -3.40 | [-32.70, +25.91] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_cpu | % cpu utilization | -3.64 | [-56.11, +48.82] | 1 | (metrics) (profiles) (logs) |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | quality_gates_rss_dsd_heavy | memory_usage | 10/10 | 121.82MiB ≤ 140MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_low | memory_usage | 10/10 | 40.11MiB ≤ 50MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_medium | memory_usage | 10/10 | 61.98MiB ≤ 75MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | 177.18MiB ≤ 200MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_idle | memory_usage | 10/10 | 27.38MiB ≤ 40MiB | (metrics) (profiles) (logs) |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide that a change in performance is a "regression" (a change worth investigating further) if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
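
The three criteria above can be written as a small predicate. The following is a minimal sketch, not the detector's actual implementation: the `Experiment` struct, its field names, and `is_regression` are hypothetical, and the `erratic` values are assumed to be `false` because the report does not show that setting. The numbers are taken from rows of the table above.

```rust
/// One experiment row, using the columns reported in the tables above.
/// The struct and field names are hypothetical.
struct Experiment {
    delta_mean_pct: f64,
    ci_low: f64,
    ci_high: f64,
    /// Not shown in the report; assumed for illustration.
    erratic: bool,
}

/// Effect size tolerance from the "Explanation" section: |Δ mean %| ≥ 5.00%.
const EFFECT_SIZE_TOLERANCE: f64 = 5.0;

/// Returns true only when all three criteria hold.
fn is_regression(e: &Experiment) -> bool {
    let big_enough = e.delta_mean_pct.abs() >= EFFECT_SIZE_TOLERANCE;
    // The interval excludes zero when it lies strictly on one side of it.
    let ci_excludes_zero = e.ci_low > 0.0 || e.ci_high < 0.0;
    big_enough && ci_excludes_zero && !e.erratic
}

fn main() {
    let rows = [
        ("otlp_ingest_logs_5mb_memory", Experiment { delta_mean_pct: 14.68, ci_low: 14.31, ci_high: 15.05, erratic: false }),
        ("otlp_ingest_metrics_5mb_memory", Experiment { delta_mean_pct: 4.08, ci_low: 3.90, ci_high: 4.26, erratic: false }),
        ("otlp_ingest_logs_5mb_cpu", Experiment { delta_mean_pct: 2.81, ci_low: -1.65, ci_high: 7.27, erratic: false }),
    ];
    for (name, e) in &rows {
        println!("{name}: meets all criteria = {}", is_regression(e));
    }
}
```

Under these assumptions only the first row satisfies all three criteria: the second falls below the 5.00% tolerance, and the third has an interval that contains zero.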
3279c37 to f7dc1cd
    run_with_tui_consumer(rx, cancel_all, Some(log_dir), runner_handle).await
} else {
    run_with_logging_consumer(rx, &cmd, log_dir, runner_handle).await
    run_with_logging_consumer(rx, &cmd, Some(log_dir), runner_handle).await
cancel doesn't go into this path?
No, I ran out of time trying to get this done, but I don't think we already had a signal capture for ctrl-c in the program and I... ran out of time. I think we just need to trap ctrl-c and wire it up.
Added this. It was a tiny thing, I was just feeling rushed on Friday to get the PR open.
pub(crate) type EventSender = mpsc::UnboundedSender<TestEvent>;

/// The amount of time a test has to clean up after cancellation or timing out.
const GRACE_TIME: Duration = Duration::from_secs(5);
This will likely be pretty tight for kind tests that tear down the cluster after themselves, maybe give it 30s just to be accommodating?
Seems like a mistake, I thought I had 30 there. Oops, will change
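
The timeout-then-grace behavior discussed in this thread can be sketched roughly as follows. This is a minimal sketch under stated assumptions: it uses std threads and an `AtomicBool` as a stand-in for the async runtime and `CancellationToken` the PR uses, and `run_with_grace` and `Outcome` are hypothetical names rather than code from the PR.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Outcome {
    /// The test finished before the timeout.
    Finished,
    /// The test timed out, saw the cancel flag, and cleaned up within the grace time.
    CancelledCleanly,
    /// The test did not finish even after the grace time.
    Abandoned,
}

/// Runs `test` on a worker thread, cancels it after `timeout`, then waits up to
/// `grace` for it to clean up.
fn run_with_grace<F>(timeout: Duration, grace: Duration, test: F) -> Outcome
where
    F: FnOnce(Arc<AtomicBool>) + Send + 'static,
{
    let cancel = Arc::new(AtomicBool::new(false));
    let (done_tx, done_rx) = mpsc::channel();
    let worker_cancel = Arc::clone(&cancel);
    thread::spawn(move || {
        test(worker_cancel);
        let _ = done_tx.send(());
    });

    if done_rx.recv_timeout(timeout).is_ok() {
        return Outcome::Finished;
    }
    // Timed out: request cooperative shutdown, then allow the grace period.
    cancel.store(true, Ordering::SeqCst);
    match done_rx.recv_timeout(grace) {
        Ok(()) => Outcome::CancelledCleanly,
        Err(_) => Outcome::Abandoned,
    }
}

fn main() {
    // A cooperative test that polls the cancel flag and then "tears down".
    let outcome = run_with_grace(Duration::from_millis(50), Duration::from_millis(500), |cancel| {
        while !cancel.load(Ordering::SeqCst) {
            thread::sleep(Duration::from_millis(5));
        }
    });
    println!("{outcome:?}");
}
```

The grace value only matters on the cancel path, which is why a teardown-heavy test such as one that deletes a cluster is the case that determines how large it needs to be.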
/// Handles test filtering, parallelism, timeout enforcement, cancellation propagation,
/// log directory creation, and event dispatch. Individual test logic lives in the `Test`
/// implementations.
pub(crate) struct Runner {
Will this also hold options passed from the CLI? An example I was thinking of adding was a --no-delete-kind-cluster flag that would need to eventually get passed down to the correctness test
My idea for that is test.rs:
/// A directory from which files should be mounted into one or more of the domain-specific containers used in this
/// test.
// TODO: this is a hack introduced to support the PANORAMIC_DYNAMIC feature. Consider generalizing if needed.
// For example: this could become runtime_config: HashMap<String, String> for shuttling domain specific items from
// runtime to a test.
mounts_dir: PathBuf,
Instead of mounts_dir there we could have runtime_config: HashMap<String, String>
And you could pass stuff that way.
But... I'm not sure --no-delete-kind-cluster at the CLI sounds great.
What I actually think is that there should probably be a whole new Test type for Kind cluster tests.
It bears thinking through design tradeoffs.
The issue with something like --no-delete-kind-cluster at the CLI is that now we are pushing extremely specific stuff into the CLI whereas panoramic might be capable of running tests that don't use containers or can't run on kind. So I think we should consider how we design things.
> But... I'm not sure --no-delete-kind-cluster at the CLI sounds great.
It's not really a property of the test itself, it's a runtime option for how you'd like to actually execute the tests. In this case the idea would be "I'm futzing with this test locally and will need to run it a bunch, so let's skip starting and killing a full kind cluster each time." The CLI seems like the natural place for those, I shouldn't need to futz with YAML for a temporary change I don't want to commit.
> What I actually think is that there should probably be a whole new Test type for Kind cluster tests.
In my current PR the config for it is limited to a single runtime field on the correctness test def, if it gets more complex than that could certainly consider splitting. But if they're 99% equivalent, seems like they can share.
I would think of that in terms of test lifecycle being something like this:
- test resource set-up
- test execution
- test resource teardown
And what you're asking panoramic to do is --skip-teardown which could be applicable to other test types as well.
Edit: in fact, --skip-teardown can be important sometimes if you need to get in there and see what went wrong.
Skipping the kind cluster teardown specifically is more granular than that, skipping teardown entirely would also skip cleaning up any of the test-specific namespaces, pods, containers, etc. Not saying we shouldn't support --skip-teardown, we should, but I think a general mechanism for passing CLI flags down to Test implementations will be necessary in either case.
I think my ideal, when it comes to kubernetes, would be to create the cluster and tear it down without necessarily even caring about namespaces, etc. I can't remember if kind is one of those tools that needs a cluster to create your cluster... but in general, it's much cleaner if you can create and discard clusters.
That's the default, but if you're intending to run the test repeatedly on your Mac then that's going to add ~2 minutes to each test run that you'd probably rather not add. It's purely a convenience option.
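
One way to picture the two ideas from this thread together (a `runtime_config: HashMap<String, String>` carried to the test, and a set-up / execution / teardown lifecycle with optional skips) is sketched below. Everything here is an assumption for illustration: the `TestContext` shape, the `flag` helper, the key strings, and `run_lifecycle` are hypothetical and not part of the PR.

```rust
use std::collections::HashMap;

/// Hypothetical context carrying CLI-provided runtime options down to a test.
struct TestContext {
    runtime_config: HashMap<String, String>,
}

impl TestContext {
    /// Treats a key as set only when its value is exactly "true".
    fn flag(&self, key: &str) -> bool {
        self.runtime_config.get(key).map(|v| v == "true").unwrap_or(false)
    }
}

/// Returns the lifecycle steps that would run for the given options.
fn run_lifecycle(tctx: &TestContext) -> Vec<&'static str> {
    let mut steps = vec!["set-up", "execution"];
    if tctx.flag("skip-teardown") {
        // Leave everything in place for inspection.
        return steps;
    }
    steps.push("namespace teardown");
    if !tctx.flag("no-delete-kind-cluster") {
        steps.push("cluster teardown");
    }
    steps
}

fn main() {
    let mut runtime_config = HashMap::new();
    runtime_config.insert("no-delete-kind-cluster".to_string(), "true".to_string());
    let tctx = TestContext { runtime_config };
    println!("{:?}", run_lifecycle(&tctx));
}
```

The sketch keeps the granularity point from the discussion: skipping all teardown and keeping only the cluster alive are separate options with different effects.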
448363f to 2765cb1
## Summary
Test execution was organized around two known test types. This PR
introduces a `Test` trait and central `Runner` so that new types can be
added without modifying the dispatch or scheduling code.
This PR replaces the enum dispatch with a `Test` trait and a `Runner`
struct that owns scheduling, timeout, and cancellation uniformly.
Discovery returns `Vec<Box<dyn Test>>`. Each test receives a
`TestContext` at execution time carrying a cancellation token and log
directory. The `Runner` creates a per-test cancel token, enforces the
timeout with a grace period, and propagates program-level cancellation
to running tests.
`TestResult` no longer carries filesystem paths - the `Runner` computes
log directories and passes them through events. This is a minor change,
but `TestResult` carried `log_dir` to a lot of places where it wasn't
needed.
The `--no-logs` flag was removed. I didn't see it being used anywhere in
the codebase and couldn't think of a great use case. Meanwhile it was a
bit unwieldy to wire it through to where it needed to be.
These changes make it straightforward to add new test types or create
test cases dynamically. They just need to implement the trait, be added
to the runner, and they will run and report like any other test.
| Change | Reason / Benefit |
|------------------------------------------------|--------------------------------------------------------|
| `Test` trait with `run(tctx)` signature | New test types implement one trait, no enum arms |
| `Runner` struct owns scheduling and timeout | Uniform timeout/cancel for all test types |
| `TestContext` passed to `run()` | No stored runtime state |
| Per-test `CancellationToken` created by Runner | Runner controls shutdown; tests respond cooperatively |
| Global cancel propagates to running tests | Users can cancel via TUI or ctrl-c |
| `log_dir` removed from `TestResult` | TestResult is a bit more pure without it |
| `--no-logs` CLI flag removed | I can put it back if needed, but it was a little wonky |
| `TestRunner` -> `IntegrationRunner` | Disambiguates from the generic `Runner` scheduler |
| `TestRunner` -> `CorrectnessRunner` | Disambiguates from the generic `Runner` scheduler |
| `DiscoveredTest` enum deleted | Replaced by trait objects; removes match arms |
| `RunArgs` builder for runner configuration | Reduces parameters for the run_tests function call |
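
To make the shape of the change concrete, here is a minimal sketch of trait-object dispatch with a central runner. It is an illustration under stated assumptions, not the PR's code: the trait here is synchronous, a single shared `AtomicBool` stands in for the per-test `CancellationToken`, and the struct fields, `run_all`, and the two test structs are hypothetical.

```rust
use std::path::PathBuf;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Per-run state handed to a test at execution time (hypothetical fields).
struct TestContext {
    cancelled: Arc<AtomicBool>,
    log_dir: PathBuf,
}

#[derive(Debug, PartialEq)]
enum TestResult {
    Passed,
    Failed(String),
    Cancelled,
}

/// New test types implement this one trait; the runner never matches on a type.
trait Test {
    fn name(&self) -> &str;
    fn run(&self, tctx: TestContext) -> TestResult;
}

struct IntegrationTest {
    name: String,
}

struct CorrectnessTest {
    name: String,
    should_pass: bool,
}

impl Test for IntegrationTest {
    fn name(&self) -> &str {
        &self.name
    }
    fn run(&self, tctx: TestContext) -> TestResult {
        if tctx.cancelled.load(Ordering::SeqCst) {
            TestResult::Cancelled
        } else {
            TestResult::Passed
        }
    }
}

impl Test for CorrectnessTest {
    fn name(&self) -> &str {
        &self.name
    }
    fn run(&self, tctx: TestContext) -> TestResult {
        if self.should_pass {
            TestResult::Passed
        } else {
            TestResult::Failed(format!("see logs in {}", tctx.log_dir.display()))
        }
    }
}

/// The runner computes log directories and builds each context itself.
struct Runner {
    log_root: PathBuf,
    cancel_all: Arc<AtomicBool>,
}

impl Runner {
    fn run_all(&self, tests: Vec<Box<dyn Test>>) -> Vec<(String, PathBuf, TestResult)> {
        tests
            .into_iter()
            .map(|t| {
                let log_dir = self.log_root.join(t.name());
                let tctx = TestContext {
                    cancelled: Arc::clone(&self.cancel_all),
                    log_dir: log_dir.clone(),
                };
                let result = t.run(tctx);
                (t.name().to_string(), log_dir, result)
            })
            .collect()
    }
}

fn main() {
    let runner = Runner {
        log_root: PathBuf::from("logs"),
        cancel_all: Arc::new(AtomicBool::new(false)),
    };
    let tests: Vec<Box<dyn Test>> = vec![
        Box::new(IntegrationTest { name: "basic-startup".to_string() }),
        Box::new(CorrectnessTest { name: "dsd-plain".to_string(), should_pass: true }),
    ];
    for (name, log_dir, result) in runner.run_all(tests) {
        println!("{name}: {result:?} (logs: {})", log_dir.display());
    }
}
```

Note that the result type carries no path: the log directory travels alongside it from the runner, which mirrors the `log_dir` row in the table above.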
## Change Type
- [x] Non-functional (chore, refactoring, docs)
## How did you test this PR?
- [x] CI
- [ ] Run locally on main
- [ ] Run locally on branch (check for differences)
Using:
```bash
docker ps -q --filter name=airlock- | xargs docker rm -f
docker network prune -f
make \
  build-datadog-intake-image \
  build-millstone-image \
  build-datadog-agent-image-release \
  build-datadog-agent-image \
  build-adp-image && \
cargo run --release --bin panoramic -- \
  run -d test/integration/cases \
  -d test/correctness \
  --no-tui
```
## References
Related, precursor to #1536 eb250ff
## Human Summary

Adds the ability to run `kind` (kubernetes-in-docker) correctness tests. The actual goal here is to add multiple tests covering origin detection which has a lot of Kubernetes-specific behavior we'd like to cover. This PR does not add origin detection tests yet, but it does add a `dsd-plain-kind` test which is just the existing `dsd-plain` correctness test but modified to run under kind. This test will be deleted when we add the origin detection tests.

Example run including the kind test is provided immediately below. You'll need to [install kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installing-from-release-binaries) first:

`cargo run --profile release --package panoramic -- run -d test/correctness -d test/integration/cases -t basic-startup,dsd-plain,dsd-plain-kind`

## Summary

- Adds a `runtime: kubernetes_in_docker` option to correctness test configs that runs test groups as multi-container Kubernetes pods inside a kind cluster, unlocking origin detection testing scenarios that are impossible in the plain Docker framework (real pod UIDs, containerd container IDs, K8s labels, External Data injection via `DD_EXTERNAL_ENV`)
- Introduces `dsd-plain-kind` as the initial kind-based test, which verifies the existing dsd-plain workload passes end-to-end through the kind path as a baseline before origin-detection-specific tests are layered on top
- CI integration follows the same dynamic pipeline approach as existing Docker tests: the pipeline generator emits kind jobs automatically when `runtime: kubernetes_in_docker` is detected, using a new `.test-correctness-kind-definition` mixin that extends the existing Docker mixin with a longer timeout
- `runtime` is now a required field in all correctness configs (no default); all existing tests explicitly declare their runtime
- Rebased onto main after PR #1552 (dynamic dispatch with Test trait and Runner) and latest main (May 7)

## What changed

**New runtime path (`k8s.rs`):**
- Integrates with PR 1552's `Test` trait architecture: `Config::run(tctx: TestContext)` dispatches to the kind path when `runtime: kubernetes_in_docker`
- Each test group (baseline + comparison) runs as a multi-container pod in its own namespace, labelled `created-by=panoramic-kind` for orphan cleanup
- `datadog-intake`, `target` (agent), and `millstone` share a pod with an emptyDir at `/airlock` for the UDS socket
- Config files injected via ConfigMap with `subPath` mounts so the agent's `/etc/datadog-agent/` remains writable (needed for `auth_token` creation)
- Millstone wrapped in a socket-wait shell command so it doesn't send before the agent is ready
- After the pod reaches Running, background tasks stream each container's logs to `<log-dir>/<baseline|comparison>/<container>.log`; ANSI escape codes stripped via `crate::utils::strip_ansi_codes` (no duplicate implementations)
- Data collected via kube-rs port-forward to datadog-intake port 2049; forward cancelled via `CancellationToken` after collection
- `wait_for_millstone_exit` bounded by `MILLSTONE_EXIT_TIMEOUT` (300s)
- Both baseline and comparison errors reported when both groups fail simultaneously
- Malformed env vars (missing `=`) in target config emit `warn!` instead of being silently dropped

**ANSI stripping (both runtimes):**
- `airlock/src/driver.rs` strips ANSI codes from Docker container log output
- `k8s.rs` and `kind.rs` share `strip_ansi_codes` from `crate::utils`; `airlock` keeps its own copy as a separate crate
- `kind create cluster` output is captured and emitted through tracing at debug level; raw emoji/ANSI output suppressed

**Kind cluster lifecycle (`kind.rs`, `main.rs`, `runner.rs`, `test.rs`):**
- Kind cluster setup only runs when the selected test set includes at least one `kubernetes_in_docker` test; running `-t dsd-plain` never touches kind
- Kind cluster setup runs as a background task; Docker-runtime tests start immediately without waiting
- Kind tests wait for the cluster-ready signal **before acquiring a concurrency slot**, gated on `test.runtime() == "kubernetes_in_docker"` so Docker tests are completely unaffected; each task holds its own cloned `watch::Receiver` with independent "last seen" state, so no mutex is needed
- The wait loop uses `tokio::select!` on the cancellation token so Ctrl-C during cluster setup doesn't deadlock kind test futures; both `run_parallel` and `run_fail_fast` perform this wait
- `check_kind_installed` verifies the exit code in addition to binary presence
- A warning is emitted when kind setup fails after cluster creation, alerting users to a potentially dangling cluster
- Kind setup emits `TestEvent::StatusLine` messages visible in both TUI and logging modes
- panoramic manages the full kind cluster lifecycle: creates if absent (reuses if present), pulls images only if not already in the local daemon (pull failure is fatal), loads images in parallel, deletes cluster after tests
- `--no-delete-kind-cluster` keeps the cluster alive between runs (useful locally)
- `--kind-cluster-name` overrides the default (`saluki-correctness`)
- Flush wait changed from 32s to 30s (`FLUSH_WAIT: Duration`)

**CI (`.gitlab/correctness-mixins.yml`, `generate-correctness-pipeline.sh`):**
- `.test-correctness-kind-definition` extends `.test-correctness-definition` with a longer timeout; all cluster/image management is inside panoramic
- Pipeline generator emits kind jobs via the kind mixin; mixin selection driven by `test.runtime()` from the `Test` trait

**kind pre-installed in `SALUKI_BUILD_CI_IMAGE`:**
- `.ci/install-kind.sh` installs kind in the same `RUN` layer as `install-docker-cli.sh`, with per-arch SHA256 checksums hardcoded in the script

**Explicitness:**
- `Runtime` enum has no default; all correctness configs have an explicit `runtime:` field (including `dsd-tag-filterlist` from PR 1552)

**Local dev:**
- `cargo run --profile release --package panoramic -- run -d test/correctness -t dsd-plain-kind --no-tui --no-delete-kind-cluster`
- `make clean-kind` / `make clean-correctness`

**Dependencies:** kube 0.93 (client + rustls-tls + ws), k8s-openapi 0.22, tokio-util compat; RUSTSEC-2025-0134 ignored (rustls-pemfile, transitive via kube)

## Test plan

- [x] `dsd-plain-kind` passes locally
- [x] Running `-t dsd-plain` (Docker test) does not trigger kind cluster setup
- [x] Docker and kind tests run concurrently: Docker tests start immediately, kind tests wait without holding concurrency slots
- [x] Ctrl-C during cluster setup doesn't deadlock kind test futures
- [x] Kind setup progress visible in both TUI and logging modes via `TestEvent::StatusLine`
- [x] Container logs are plain text without ANSI codes (both runtimes)
- [x] All correctness tests discovered with explicit runtime fields (including from latest main)
- [x] Branch rebased cleanly onto main post-PR-1552
- [ ] `SALUKI_BUILD_CI_IMAGE` rebuilt with kind (trigger `generate-build-ci-image` via Run Pipeline on this branch)
- [ ] CI pipeline generates `test-correctness-dsd-plain-kind` job
- [ ] Existing Docker correctness tests unaffected

🤖 Generated with [Claude Code](https://claude.ai/claude-code)

Co-authored-by: travis.thieman <travis.thieman@datadoghq.com>
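
For readers unfamiliar with what ANSI stripping of container logs involves, the following is a minimal std-only sketch. It is not the `crate::utils::strip_ansi_codes` implementation referenced above, which is not shown here; it reuses the name for illustration and assumes only CSI sequences (ESC `[` ... final byte), the kind used for colors, need removing.

```rust
/// Removes CSI escape sequences (ESC '[' parameters final-byte) from `input`.
/// Other escape forms, and a lone ESC not followed by '[', are left untouched.
fn strip_ansi_codes(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1b}' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip parameter and intermediate bytes up to and including the
            // final byte, which lies in the range '@'..='~'.
            for n in chars.by_ref() {
                if ('@'..='~').contains(&n) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}

fn main() {
    let colored = "\u{1b}[32mINFO\u{1b}[0m agent ready";
    println!("{}", strip_ansi_codes(colored));
}
```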
d04bde8