Skip to content

chore(panoramic): dynamic dispatch with test trait and runner#1552

Merged
webern merged 3 commits into
mainfrom
matt.briggs/panoramic-refactor
May 4, 2026
Merged

chore(panoramic): dynamic dispatch with test trait and runner#1552
webern merged 3 commits into
mainfrom
matt.briggs/panoramic-refactor

Conversation

@webern
Copy link
Copy Markdown
Contributor

@webern webern commented May 1, 2026

Summary

Test execution was organized around two known test types. This PR introduces a Test trait and central Runner so that new types can be added without modifying the dispatch or scheduling code.

This PR replaces the enum dispatch with a Test trait and a Runner struct that owns scheduling, timeout, and cancellation uniformly. Discovery returns Vec<Box<dyn Test>>. Each test receives a TestContext at execution time carrying a cancellation token and log directory. The Runner creates a per-test cancel token, enforces the timeout with a grace period, and propagates program-level cancellation to running tests.

TestResult no longer carries filesystem paths - the Runner computes log directories and passes them through events. This is a minor change, but TestResult carried log_dir to a lot of places where it wasn't needed.

The --no-logs flag was removed. I didn't see it being used anywhere in the codebase and couldn't think of a great usecase. Meanwhile it was a bit unwieldy to wire it through to where it needed to be.

These changes make it straightforward to add new test types or create test cases dynamically. They just need to implement the trait, be added to the runner, and they will run and report like any other test.

Change Reason / Benefit
Test trait with run(tctx) signature New test types implement one trait, no enum arms
Runner struct owns scheduling and timeout Uniform timeout/cancel for all test types
TestContext passed to run() No stored runtime state
Per-test CancellationToken created by Runner Runner controls shutdown; tests respond cooperatively
Global cancel propagates to running tests Users can cancel via TUI or ctrl-c
log_dir removed from TestResult TestResult is a bit more pure without it
--no-logs CLI flag removed I can put it back if needed, but it was a little wonky
TestRunner -> IntegrationRunner Disambiguates from the generic Runner scheduler
TestRunner -> CorrectnessRunner Disambiguates from the generic Runner scheduler
DiscoveredTest enum deleted Replaced by trait objects; removes match arms
RunArgs builder for runner configuration Reduces parameters for the run_tests function call

Summary

Change Type

  • Non-functional (chore, refactoring, docs)

How did you test this PR?

  • CI
  • Run locally on main
  • Run locally on branch (check for differences)

Using:

docker ps -q --filter name=airlock- | xargs docker rm -f
docker network prune -f

make \
  build-datadog-intake-image \
  build-millstone-image \
  build-datadog-agent-image-release \
  build-datadog-agent-image \
  build-adp-image && \
cargo run --release --bin panoramic -- \
  run -d test/integration/cases \
      -d test/correctness \
      --no-tui

References

Related, precursor to #1536

@webern webern requested a review from a team as a code owner May 1, 2026 16:39
@pr-commenter
Copy link
Copy Markdown

pr-commenter Bot commented May 1, 2026

Binary Size Analysis (Agent Data Plane)

Target: 507d913 (baseline) vs 2765cb1 (comparison) diff
Analysis Type: Stripped binaries (debug symbols excluded)
Baseline Size: 36.78 MiB
Comparison Size: 36.78 MiB
Size Change: -8 B (-0.00%)
Pass/Fail Threshold: +5%
Result: PASSED ✅

Changes by Module

Module File Size Symbols
anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.17550754633244885205 -130 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.2417440905338573713 +129 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.17550754633244885205 -115 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.2417440905338573713 +114 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.17550754633244885205 -109 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.2417440905338573713 +108 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.17550754633244885205 -97 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.2417440905338573713 +96 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.17550754633244885205 -95 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.2417440905338573713 +94 B 1
[Unmapped] -3 B 1

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW]    +129  [NEW]     +40    anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.2417440905338573713
  [NEW]    +114  [NEW]     +25    anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.2417440905338573713
  [NEW]    +108  [NEW]     +19    anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.2417440905338573713
  [NEW]     +96  [NEW]      +7    anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.2417440905338573713
  [NEW]     +94  [NEW]      +5    anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.2417440905338573713
  -0.0%      -3  [ = ]       0    [Unmapped]
  [DEL]     -95  [DEL]      -5    anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.17550754633244885205
  [DEL]     -97  [DEL]      -7    anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.17550754633244885205
  [DEL]    -109  [DEL]     -19    anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.17550754633244885205
  [DEL]    -115  [DEL]     -25    anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.17550754633244885205
  [DEL]    -130  [DEL]     -40    anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.17550754633244885205
  -0.0%      -8  [ = ]       0    TOTAL

Comment thread bin/correctness/panoramic/src/correctness/runner.rs
@pr-commenter
Copy link
Copy Markdown

pr-commenter Bot commented May 1, 2026

Regression Detector (Agent Data Plane)

Regression Detector Results

Run ID: efbe8889-3221-4d3a-bcec-83658a0074ac

Baseline: 507d913
Comparison: 2765cb1
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf experiment goal Δ mean % Δ mean % CI trials links
otlp_ingest_logs_5mb_memory memory utilization +14.68 [+14.31, +15.05] 1 (metrics) (profiles) (logs)
otlp_ingest_logs_5mb_cpu % cpu utilization +2.81 [-1.65, +7.27] 1 (metrics) (profiles) (logs)
otlp_ingest_logs_5mb_throughput ingress throughput +0.03 [-0.09, +0.16] 1 (metrics) (profiles) (logs)

Fine details of change detection per experiment

perf experiment goal Δ mean % Δ mean % CI trials links
otlp_ingest_logs_5mb_memory memory utilization +14.68 [+14.31, +15.05] 1 (metrics) (profiles) (logs)
otlp_ingest_metrics_5mb_memory memory utilization +4.08 [+3.90, +4.26] 1 (metrics) (profiles) (logs)
otlp_ingest_logs_5mb_cpu % cpu utilization +2.81 [-1.65, +7.27] 1 (metrics) (profiles) (logs)
otlp_ingest_metrics_5mb_cpu % cpu utilization +1.70 [-4.32, +7.73] 1 (metrics) (profiles) (logs)
dsd_uds_100mb_3k_contexts_cpu % cpu utilization +1.27 [-4.07, +6.60] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_5mb_throughput ingress throughput +0.56 [+0.48, +0.64] 1 (metrics) (profiles) (logs)
dsd_uds_500mb_3k_contexts_throughput ingress throughput +0.48 [+0.36, +0.60] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_ottl_transform_5mb_throughput ingress throughput +0.36 [+0.28, +0.43] 1 (metrics) (profiles) (logs)
quality_gates_rss_dsd_low memory utilization +0.28 [+0.12, +0.44] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_5mb_cpu % cpu utilization +0.24 [-1.86, +2.34] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_5mb_memory memory utilization +0.16 [+0.00, +0.32] 1 (metrics) (profiles) (logs)
quality_gates_rss_idle memory utilization +0.10 [+0.07, +0.14] 1 (metrics) (profiles) (logs)
quality_gates_rss_dsd_heavy memory utilization +0.08 [-0.04, +0.20] 1 (metrics) (profiles) (logs)
otlp_ingest_logs_5mb_throughput ingress throughput +0.03 [-0.09, +0.16] 1 (metrics) (profiles) (logs)
dsd_uds_1mb_3k_contexts_throughput ingress throughput +0.00 [-0.06, +0.06] 1 (metrics) (profiles) (logs)
dsd_uds_512kb_3k_contexts_throughput ingress throughput +0.00 [-0.05, +0.06] 1 (metrics) (profiles) (logs)
dsd_uds_10mb_3k_contexts_throughput ingress throughput +0.00 [-0.17, +0.17] 1 (metrics) (profiles) (logs)
otlp_ingest_metrics_5mb_throughput ingress throughput -0.00 [-0.16, +0.15] 1 (metrics) (profiles) (logs)
dsd_uds_100mb_3k_contexts_memory memory utilization -0.01 [-0.16, +0.14] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_ottl_filtering_5mb_memory memory utilization -0.02 [-0.26, +0.22] 1 (metrics) (profiles) (logs)
dsd_uds_10mb_3k_contexts_memory memory utilization -0.02 [-0.17, +0.14] 1 (metrics) (profiles) (logs)
dsd_uds_100mb_3k_contexts_throughput ingress throughput -0.02 [-0.05, +0.01] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_ottl_transform_5mb_cpu % cpu utilization -0.05 [-2.10, +1.99] 1 (metrics) (profiles) (logs)
quality_gates_rss_dsd_ultraheavy memory utilization -0.06 [-0.19, +0.06] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_ottl_filtering_5mb_throughput ingress throughput -0.16 [-0.24, -0.09] 1 (metrics) (profiles) (logs)
dsd_uds_1mb_3k_contexts_memory memory utilization -0.19 [-0.33, -0.05] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_ottl_transform_5mb_memory memory utilization -0.25 [-0.41, -0.10] 1 (metrics) (profiles) (logs)
quality_gates_rss_dsd_medium memory utilization -0.27 [-0.44, -0.10] 1 (metrics) (profiles) (logs)
dsd_uds_512kb_3k_contexts_memory memory utilization -0.37 [-0.52, -0.23] 1 (metrics) (profiles) (logs)
dsd_uds_500mb_3k_contexts_memory memory utilization -0.54 [-0.68, -0.39] 1 (metrics) (profiles) (logs)
dsd_uds_500mb_3k_contexts_cpu % cpu utilization -0.56 [-1.99, +0.87] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_ottl_filtering_5mb_cpu % cpu utilization -1.19 [-3.38, +0.99] 1 (metrics) (profiles) (logs)
dsd_uds_512kb_3k_contexts_cpu % cpu utilization -2.19 [-58.64, +54.27] 1 (metrics) (profiles) (logs)
dsd_uds_10mb_3k_contexts_cpu % cpu utilization -3.40 [-32.70, +25.91] 1 (metrics) (profiles) (logs)
dsd_uds_1mb_3k_contexts_cpu % cpu utilization -3.64 [-56.11, +48.82] 1 (metrics) (profiles) (logs)

Bounds Checks: ✅ Passed

perf experiment bounds_check_name replicates_passed observed_value links
quality_gates_rss_dsd_heavy memory_usage 10/10 121.82MiB ≤ 140MiB (metrics) (profiles) (logs)
quality_gates_rss_dsd_low memory_usage 10/10 40.11MiB ≤ 50MiB (metrics) (profiles) (logs)
quality_gates_rss_dsd_medium memory_usage 10/10 61.98MiB ≤ 75MiB (metrics) (profiles) (logs)
quality_gates_rss_dsd_ultraheavy memory_usage 10/10 177.18MiB ≤ 200MiB (metrics) (profiles) (logs)
quality_gates_rss_idle memory_usage 10/10 27.38MiB ≤ 40MiB (metrics) (profiles) (logs)

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".

@webern webern force-pushed the matt.briggs/panoramic-refactor branch from 3279c37 to f7dc1cd Compare May 1, 2026 16:57
run_with_tui_consumer(rx, cancel_all, Some(log_dir), runner_handle).await
} else {
run_with_logging_consumer(rx, &cmd, log_dir, runner_handle).await
run_with_logging_consumer(rx, &cmd, Some(log_dir), runner_handle).await
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cancel doesn't go into this path?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I ran out of time trying to get this done, but I don't think we already had a signal capture for ctrl-c in the program and I... ran out of time. I think we just need to trap ctrl-c and wire it up.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this. It was a tiny thing, I was just feeling rushed on Friday to get the PR open.

Comment thread bin/correctness/panoramic/src/runner.rs Outdated
pub(crate) type EventSender = mpsc::UnboundedSender<TestEvent>;

/// The amount of time a test has to clean up after cancellation or timing out.
const GRACE_TIME: Duration = Duration::from_secs(5);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will likely be pretty tight for kind tests that tear down the cluster after themselves, maybe give it 30s just to be accommodating?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a mistake, I thought I had 30 there. Oops, will change

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

/// Handles test filtering, parallelism, timeout enforcement, cancellation propagation,
/// log directory creation, and event dispatch. Individual test logic lives in the `Test`
/// implementations.
pub(crate) struct Runner {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this also hold options passed from the CLI? An example I was thinking of adding was a --no-delete-kind-cluster flag that would need to eventually get passed down to the correctness test

Copy link
Copy Markdown
Contributor Author

@webern webern May 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea for that is test.rs:

    /// A directory from which files should be mounted into one or more of the domain-specific containers used in this
    /// test.
    // TODO: this is a hack introduced to support the PANORAMIC_DYNAMIC feature. Consider generalizing if needed.
    // For example: this could become runtime_config: HashMap<String, String> for shuttling domain specific items from
    // runtime to a test.
    mounts_dir: PathBuf,

Instead of mounts_dir there we could have runtime_config: HashMap<String, String>

And you could pass stuff that way.

But... I'm not sure --no-delete-kind-cluster at the CLI sounds great.

What I actually think is that there should probably be a whole new Test type for Kind cluster tests.

It bears thinking through design tradeoffs.

The issue with something like --no-delete-kind-cluster at the CLI is that now we are pushing extremely specific stuff into the CLI whereas panoramic might be capable of running tests that don't use containers or can't run on kind. So I think we should consider how we design things.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But... I'm not sure --no-delete-kind-cluster at the CLI sounds great.

It's not really a property of the test itself, it's a runtime option for how you'd like to actually execute the tests. In this case the idea would be "I'm futzing with this test locally and will need to run it a bunch, so let's skip starting and killing a full kind cluster each time." The CLI seems like the natural place for those, I shouldn't need to futz with YAML for a temporary change I don't want to commit.

What I actually think is that there should probably be a whole new Test type for Kind cluster tests.

In my current PR the config for it is limited to a single runtime field on the correctness test def, if it gets more complex than that could certainly consider splitting. But if they're 99% equivalent, seems like they can share.

Copy link
Copy Markdown
Contributor Author

@webern webern May 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think of that in terms of test lifecycle being something like this:

  • test resource set-up
  • test execution
  • test resource teardown

And what you're asking panoramic to do is --skip-teardown which could be applicable to other test types as well.

Edit: in fact, --skip-teardown can be important sometimes if you need to get in there and see what went wrong.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipping the kind cluster teardown specifically is more granular than that, skipping teardown entirely would also skip cleaning up any of the test-specific namespaces, pods, containers, etc. Not saying we shouldn't support --skip-teardown, we should, but I think a general mechanism for passing CLI flags down to Test implementations will be necessary in either case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my ideal, when it comes to kubernetes, would be to create the cluster and tear it down without necessarily even caring about namespaces, etc. I can't remember if kind is one of those tools that needs a cluster to create your cluster... but in general, it's much cleaner if you can create and discard clusters.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the default, but if you're intending to run the test repeatedly on your Mac then that's going to add ~2 minutes to each test run that you'd probably rather not add. It's purely a convenience option.

webern added 3 commits May 4, 2026 11:58
Test execution was organized around two known test types. This PR introduces a `Test` trait and
central `Runner` so that new types can be added without modifying the dispatch or scheduling code.

This PR replaces the enum dispatch with a `Test` trait and a `Runner` struct that owns scheduling,
timeout, and cancellation uniformly. Discovery returns `Vec<Box<dyn Test>>`. Each test receives a
`TestContext` at execution time carrying a cancellation token and log directory. The `Runner`
creates a per-test cancel token, enforces the timeout with a grace period, and propagates
program-level cancellation to running tests.

`TestResult` no longer carries filesystem paths - the `Runner` computes log directories and passes
them through events. This is a minor change, but `TestResult` carried `log_dir` to a lot of places
where it wasn't needed.

The `--no-logs` flag was removed. I didn't see it being used anywhere in the codebase and couldn't
think of a great usecase. Meanwhile it was a bit unwieldy to wire it through to where it needed to
be.

These changes make it straightforward to add new test types or create test cases dynamically. They
just need to implement the trait, be added to the runner, and they will run and report like any
other test.

| Change                                         | Reason / Benefit                                       |
|------------------------------------------------|--------------------------------------------------------|
| `Test` trait with `run(tctx)` signature        | New test types implement one trait, no enum arms       |
| `Runner` struct owns scheduling and timeout    | Uniform timeout/cancel for all test types              |
| `TestContext` passed to `run()`                | No stored runtime state                                |
| Per-test `CancellationToken` created by Runner | Runner controls shutdown; tests respond cooperatively  |
| Global cancel propagates to running tests      | Users can cancel via TUI or ctrl-c                     |
| `log_dir` removed from `TestResult`            | TestResult is a bit more pure without it               |
| `--no-logs` CLI flag removed                   | I can put it back if needed, but it was a little wonky |
| `TestRunner` -> `IntegrationRunner`            | Disambiguates from the generic `Runner` scheduler      |
| `TestRunner` -> `CorrectnessRunner`            | Disambiguates from the generic `Runner` scheduler      |
| `DiscoveredTest` enum deleted                  | Replaced by trait objects; removes match arms          |
| `RunArgs` builder for runner configuration     | Reduces parameters for the run_tests function call     |
@webern webern force-pushed the matt.briggs/panoramic-refactor branch from 448363f to 2765cb1 Compare May 4, 2026 09:58
@webern webern merged commit eb250ff into main May 4, 2026
73 checks passed
@webern webern deleted the matt.briggs/panoramic-refactor branch May 4, 2026 15:00
dd-octo-sts Bot pushed a commit that referenced this pull request May 4, 2026
## Summary

Test execution was organized around two known test types. This PR
introduces a `Test` trait and central `Runner` so that new types can be
added without modifying the dispatch or scheduling code.

This PR replaces the enum dispatch with a `Test` trait and a `Runner`
struct that owns scheduling, timeout, and cancellation uniformly.
Discovery returns `Vec<Box<dyn Test>>`. Each test receives a
`TestContext` at execution time carrying a cancellation token and log
directory. The `Runner` creates a per-test cancel token, enforces the
timeout with a grace period, and propagates program-level cancellation
to running tests.

`TestResult` no longer carries filesystem paths - the `Runner` computes
log directories and passes them through events. This is a minor change,
but `TestResult` carried `log_dir` to a lot of places where it wasn't
needed.

The `--no-logs` flag was removed. I didn't see it being used anywhere in
the codebase and couldn't think of a great usecase. Meanwhile it was a
bit unwieldy to wire it through to where it needed to be.

These changes make it straightforward to add new test types or create
test cases dynamically. They just need to implement the trait, be added
to the runner, and they will run and report like any other test.

| Change | Reason / Benefit |

|------------------------------------------------|--------------------------------------------------------|
| `Test` trait with `run(tctx)` signature | New test types implement one
trait, no enum arms |
| `Runner` struct owns scheduling and timeout | Uniform timeout/cancel
for all test types |
| `TestContext` passed to `run()` | No stored runtime state |
| Per-test `CancellationToken` created by Runner | Runner controls
shutdown; tests respond cooperatively |
| Global cancel propagates to running tests | Users can cancel via TUI
or ctrl-c |
| `log_dir` removed from `TestResult` | TestResult is a bit more pure
without it |
| `--no-logs` CLI flag removed | I can put it back if needed, but it was
a little wonky |
| `TestRunner` -> `IntegrationRunner` | Disambiguates from the generic
`Runner` scheduler |
| `TestRunner` -> `CorrectnessRunner` | Disambiguates from the generic
`Runner` scheduler |
| `DiscoveredTest` enum deleted | Replaced by trait objects; removes
match arms |
| `RunArgs` builder for runner configuration | Reduces parameters for
the run_tests function call |

## Summary
<!-- Please provide a brief summary about what this PR does.
This should help the reviewers give feedback faster and with higher
quality. -->

## Change Type
- [x] Non-functional (chore, refactoring, docs)

## How did you test this PR?

- [x] CI
- [ ] Run locally on main
- [ ] Run locally on branch (check for differences)

Using:

```bash
docker ps -q --filter name=airlock- | xargs docker rm -f
docker network prune -f

make \
  build-datadog-intake-image \
  build-millstone-image \
  build-datadog-agent-image-release \
  build-datadog-agent-image \
  build-adp-image && \
cargo run --release --bin panoramic -- \
  run -d test/integration/cases \
      -d test/correctness \
      --no-tui
```

## References

Related, precursor to #1536 eb250ff
gh-worker-dd-mergequeue-cf854d Bot pushed a commit that referenced this pull request May 7, 2026
## Human Summary

Adds the ability to run `kind` (kubernetes-in-docker) correctness tests. The actual goal here is to add multiple tests covering origin detection which has a lot of Kubernetes-specific behavior we'd like to cover. This PR does not add origin detection tests yet, but it does add a `dsd-plain-kind` test which is just the existing `dsd-plain` correctness test but modified to run under kind. This test will be deleted when we add the origin detection tests.

Example run including the kind test is provided immediately below. You'll need to [install kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installing-from-release-binaries) first:

`cargo run --profile release --package panoramic -- run -d test/correctness -d test/integration/cases -t basic-startup,dsd-plain,dsd-plain-kind`

## Summary

- Adds a `runtime: kubernetes_in_docker` option to correctness test configs that runs test groups as multi-container Kubernetes pods inside a kind cluster, unlocking origin detection testing scenarios that are impossible in the plain Docker framework (real pod UIDs, containerd container IDs, K8s labels, External Data injection via `DD_EXTERNAL_ENV`)
- Introduces `dsd-plain-kind` as the initial kind-based test — verifies the existing dsd-plain workload passes end-to-end through the kind path as a baseline before origin-detection-specific tests are layered on top
- CI integration follows the same dynamic pipeline approach as existing Docker tests: the pipeline generator emits kind jobs automatically when `runtime: kubernetes_in_docker` is detected, using a new `.test-correctness-kind-definition` mixin that extends the existing Docker mixin with a longer timeout
- `runtime` is now a required field in all correctness configs (no default); all existing tests explicitly declare their runtime
- Rebased onto main after PR #1552 (dynamic dispatch with Test trait and Runner) and latest main (May 7)

## What changed

**New runtime path (`k8s.rs`):**
- Integrates with PR 1552's `Test` trait architecture — `Config::run(tctx: TestContext)` dispatches to the kind path when `runtime: kubernetes_in_docker`
- Each test group (baseline + comparison) runs as a multi-container pod in its own namespace, labelled `created-by=panoramic-kind` for orphan cleanup
- `datadog-intake`, `target` (agent), and `millstone` share a pod with an emptyDir at `/airlock` for the UDS socket
- Config files injected via ConfigMap with `subPath` mounts so the agent's `/etc/datadog-agent/` remains writable (needed for `auth_token` creation)
- Millstone wrapped in a socket-wait shell command so it doesn't send before the agent is ready
- After the pod reaches Running, background tasks stream each container's logs to `<log-dir>/<baseline|comparison>/<container>.log`; ANSI escape codes stripped via `crate::utils::strip_ansi_codes` (no duplicate implementations)
- Data collected via kube-rs port-forward to datadog-intake port 2049; forward cancelled via `CancellationToken` after collection
- `wait_for_millstone_exit` bounded by `MILLSTONE_EXIT_TIMEOUT` (300s)
- Both baseline and comparison errors reported when both groups fail simultaneously
- Malformed env vars (missing `=`) in target config emit `warn!` instead of being silently dropped

**ANSI stripping (both runtimes):**
- `airlock/src/driver.rs` strips ANSI codes from Docker container log output
- `k8s.rs` and `kind.rs` share `strip_ansi_codes` from `crate::utils`; `airlock` keeps its own copy as a separate crate
- `kind create cluster` output is captured and emitted through tracing at debug level; raw emoji/ANSI output suppressed

**Kind cluster lifecycle (`kind.rs`, `main.rs`, `runner.rs`, `test.rs`):**
- Kind cluster setup only runs when the selected test set includes at least one `kubernetes_in_docker` test; running `-t dsd-plain` never touches kind
- Kind cluster setup runs as a background task — Docker-runtime tests start immediately without waiting
- Kind tests wait for the cluster-ready signal **before acquiring a concurrency slot**, gated on `test.runtime() == "kubernetes_in_docker"` so Docker tests are completely unaffected; each task holds its own cloned `watch::Receiver` with independent "last seen" state — no mutex needed
- The wait loop uses `tokio::select!` on the cancellation token so Ctrl-C during cluster setup doesn't deadlock kind test futures; both `run_parallel` and `run_fail_fast` perform this wait
- `check_kind_installed` verifies the exit code in addition to binary presence
- A warning is emitted when kind setup fails after cluster creation, alerting users to a potentially dangling cluster
- Kind setup emits `TestEvent::StatusLine` messages visible in both TUI and logging modes
- panoramic manages the full kind cluster lifecycle: creates if absent (reuses if present), pulls images only if not already in the local daemon (pull failure is fatal), loads images in parallel, deletes cluster after tests
- `--no-delete-kind-cluster` keeps the cluster alive between runs (useful locally)
- `--kind-cluster-name` overrides the default (`saluki-correctness`)
- Flush wait changed from 32s to 30s (`FLUSH_WAIT: Duration`)

**CI (`.gitlab/correctness-mixins.yml`, `generate-correctness-pipeline.sh`):**
- `.test-correctness-kind-definition` extends `.test-correctness-definition` with a longer timeout — all cluster/image management is inside panoramic
- Pipeline generator emits kind jobs via the kind mixin; mixin selection driven by `test.runtime()` from the `Test` trait

**kind pre-installed in `SALUKI_BUILD_CI_IMAGE`:**
- `.ci/install-kind.sh` installs kind in the same `RUN` layer as `install-docker-cli.sh`, with per-arch SHA256 checksums hardcoded in the script

**Explicitness:**
- `Runtime` enum has no default; all correctness configs have an explicit `runtime:` field (including `dsd-tag-filterlist` from PR 1552)

**Local dev:**
- `cargo run --profile release --package panoramic -- run -d test/correctness -t dsd-plain-kind --no-tui --no-delete-kind-cluster`
- `make clean-kind` / `make clean-correctness`

**Dependencies:** kube 0.93 (client + rustls-tls + ws), k8s-openapi 0.22, tokio-util compat; RUSTSEC-2025-0134 ignored (rustls-pemfile, transitive via kube)

## Test plan

- [x] `dsd-plain-kind` passes locally
- [x] Running `-t dsd-plain` (Docker test) does not trigger kind cluster setup
- [x] Docker and kind tests run concurrently — Docker tests start immediately, kind tests wait without holding concurrency slots
- [x] Ctrl-C during cluster setup doesn't deadlock kind test futures
- [x] Kind setup progress visible in both TUI and logging modes via `TestEvent::StatusLine`
- [x] Container logs are plain text without ANSI codes (both runtimes)
- [x] All correctness tests discovered with explicit runtime fields (including  from latest main)
- [x] Branch rebased cleanly onto main post-PR-1552
- [ ] `SALUKI_BUILD_CI_IMAGE` rebuilt with kind (trigger `generate-build-ci-image` via Run Pipeline on this branch)
- [ ] CI pipeline generates `test-correctness-dsd-plain-kind` job
- [ ] Existing Docker correctness tests unaffected

🤖 Generated with [Claude Code](https://claude.ai/claude-code)


Co-authored-by: travis.thieman <travis.thieman@datadoghq.com>
dd-octo-sts Bot pushed a commit that referenced this pull request May 7, 2026
## Human Summary

Adds the ability to run `kind` (kubernetes-in-docker) correctness tests. The actual goal here is to add multiple tests covering origin detection which has a lot of Kubernetes-specific behavior we'd like to cover. This PR does not add origin detection tests yet, but it does add a `dsd-plain-kind` test which is just the existing `dsd-plain` correctness test but modified to run under kind. This test will be deleted when we add the origin detection tests.

Example run including the kind test is provided immediately below. You'll need to [install kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installing-from-release-binaries) first:

`cargo run --profile release --package panoramic -- run -d test/correctness -d test/integration/cases -t basic-startup,dsd-plain,dsd-plain-kind`

## Summary

- Adds a `runtime: kubernetes_in_docker` option to correctness test configs that runs test groups as multi-container Kubernetes pods inside a kind cluster, unlocking origin detection testing scenarios that are impossible in the plain Docker framework (real pod UIDs, containerd container IDs, K8s labels, External Data injection via `DD_EXTERNAL_ENV`)
- Introduces `dsd-plain-kind` as the initial kind-based test — verifies the existing dsd-plain workload passes end-to-end through the kind path as a baseline before origin-detection-specific tests are layered on top
- CI integration follows the same dynamic pipeline approach as existing Docker tests: the pipeline generator emits kind jobs automatically when `runtime: kubernetes_in_docker` is detected, using a new `.test-correctness-kind-definition` mixin that extends the existing Docker mixin with a longer timeout
- `runtime` is now a required field in all correctness configs (no default); all existing tests explicitly declare their runtime
- Rebased onto main after PR #1552 (dynamic dispatch with Test trait and Runner) and latest main (May 7)

## What changed

**New runtime path (`k8s.rs`):**
- Integrates with PR 1552's `Test` trait architecture — `Config::run(tctx: TestContext)` dispatches to the kind path when `runtime: kubernetes_in_docker`
- Each test group (baseline + comparison) runs as a multi-container pod in its own namespace, labelled `created-by=panoramic-kind` for orphan cleanup
- `datadog-intake`, `target` (agent), and `millstone` share a pod with an emptyDir at `/airlock` for the UDS socket
- Config files injected via ConfigMap with `subPath` mounts so the agent's `/etc/datadog-agent/` remains writable (needed for `auth_token` creation)
- Millstone wrapped in a socket-wait shell command so it doesn't send before the agent is ready
- After the pod reaches Running, background tasks stream each container's logs to `<log-dir>/<baseline|comparison>/<container>.log`; ANSI escape codes stripped via `crate::utils::strip_ansi_codes` (no duplicate implementations)
- Data collected via kube-rs port-forward to datadog-intake port 2049; forward cancelled via `CancellationToken` after collection
- `wait_for_millstone_exit` bounded by `MILLSTONE_EXIT_TIMEOUT` (300s)
- Both baseline and comparison errors reported when both groups fail simultaneously
- Malformed env vars (missing `=`) in target config emit `warn!` instead of being silently dropped

**ANSI stripping (both runtimes):**
- `airlock/src/driver.rs` strips ANSI codes from Docker container log output
- `k8s.rs` and `kind.rs` share `strip_ansi_codes` from `crate::utils`; `airlock` keeps its own copy as a separate crate
- `kind create cluster` output is captured and emitted through tracing at debug level; raw emoji/ANSI output suppressed

**Kind cluster lifecycle (`kind.rs`, `main.rs`, `runner.rs`, `test.rs`):**
- Kind cluster setup only runs when the selected test set includes at least one `kubernetes_in_docker` test; running `-t dsd-plain` never touches kind
- Kind cluster setup runs as a background task — Docker-runtime tests start immediately without waiting
- Kind tests wait for the cluster-ready signal **before acquiring a concurrency slot**, gated on `test.runtime() == "kubernetes_in_docker"` so Docker tests are completely unaffected; each task holds its own cloned `watch::Receiver` with independent "last seen" state — no mutex needed
- The wait loop uses `tokio::select!` on the cancellation token so Ctrl-C during cluster setup doesn't deadlock kind test futures; both `run_parallel` and `run_fail_fast` perform this wait
- `check_kind_installed` verifies the exit code in addition to binary presence
- A warning is emitted when kind setup fails after cluster creation, alerting users to a potentially dangling cluster
- Kind setup emits `TestEvent::StatusLine` messages visible in both TUI and logging modes
- panoramic manages the full kind cluster lifecycle: creates if absent (reuses if present), pulls images only if not already in the local daemon (pull failure is fatal), loads images in parallel, deletes cluster after tests
- `--no-delete-kind-cluster` keeps the cluster alive between runs (useful locally)
- `--kind-cluster-name` overrides the default (`saluki-correctness`)
- Flush wait changed from 32s to 30s (`FLUSH_WAIT: Duration`)

**CI (`.gitlab/correctness-mixins.yml`, `generate-correctness-pipeline.sh`):**
- `.test-correctness-kind-definition` extends `.test-correctness-definition` with a longer timeout — all cluster/image management is inside panoramic
- Pipeline generator emits kind jobs via the kind mixin; mixin selection driven by `test.runtime()` from the `Test` trait

**kind pre-installed in `SALUKI_BUILD_CI_IMAGE`:**
- `.ci/install-kind.sh` installs kind in the same `RUN` layer as `install-docker-cli.sh`, with per-arch SHA256 checksums hardcoded in the script

**Explicitness:**
- `Runtime` enum has no default; all correctness configs have an explicit `runtime:` field (including `dsd-tag-filterlist` from PR 1552)

**Local dev:**
- `cargo run --profile release --package panoramic -- run -d test/correctness -t dsd-plain-kind --no-tui --no-delete-kind-cluster`
- `make clean-kind` / `make clean-correctness`

**Dependencies:** kube 0.93 (client + rustls-tls + ws), k8s-openapi 0.22, tokio-util compat; RUSTSEC-2025-0134 ignored (rustls-pemfile, transitive via kube)

## Test plan

- [x] `dsd-plain-kind` passes locally
- [x] Running `-t dsd-plain` (Docker test) does not trigger kind cluster setup
- [x] Docker and kind tests run concurrently — Docker tests start immediately, kind tests wait without holding concurrency slots
- [x] Ctrl-C during cluster setup doesn't deadlock kind test futures
- [x] Kind setup progress visible in both TUI and logging modes via `TestEvent::StatusLine`
- [x] Container logs are plain text without ANSI codes (both runtimes)
- [x] All correctness tests discovered with explicit runtime fields (including  from latest main)
- [x] Branch rebased cleanly onto main post-PR-1552
- [ ] `SALUKI_BUILD_CI_IMAGE` rebuilt with kind (trigger `generate-build-ci-image` via Run Pipeline on this branch)
- [ ] CI pipeline generates `test-correctness-dsd-plain-kind` job
- [ ] Existing Docker correctness tests unaffected

🤖 Generated with [Claude Code](https://claude.ai/claude-code)

Co-authored-by: travis.thieman <travis.thieman@datadoghq.com> d04bde8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants