Skip to content

chore(correctness): switch Docker runtime to single shared millstone container#1618

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 15 commits into
mainfrom
thieman/docker-single-millstone
May 12, 2026
Merged

chore(correctness): switch Docker runtime to single shared millstone container#1618
gh-worker-dd-mergequeue-cf854d[bot] merged 15 commits into
mainfrom
thieman/docker-single-millstone

Conversation

@thieman
Copy link
Copy Markdown
Contributor

@thieman thieman commented May 11, 2026

Human Summary

Follows up on #1607, which switched the kind correctness tests to use a single shared millstone pod. This PR does the same for the Docker runtime for consistency, and while doing so fixes several pre-existing issues in the kind runtime as well.

Summary

$GROUP placeholder in millstone.yaml:

  • All millstone.yaml files now use $GROUP as a placeholder in the target: URL (e.g. unixgram:///$GROUP-airlock/metrics.sock, grpc://$GROUP:4317/..., udp://$GROUP:8125). At test runtime the shared millstone container uses sed to substitute baseline or comparison to produce two per-target configs, then runs both millstone processes in parallel.

Docker runtime (runner.rs, airlock/driver.rs):

  • Removes millstone from the baseline and comparison GroupRunners; each group now only contains [datadog-intake, target]
  • Adds a shared millstone GroupRunner with its own isolation group, connected to both agent networks
  • Each agent container gets a network alias (baseline or comparison) via NetworkingConfig at creation time so the shared millstone can address each agent unambiguously for both DSD (Unix socket) and TCP/gRPC/UDP tests
  • Splits DataCollector into AgentGroupCollector (intake port) and MillstoneWaiter (millstone handle)
  • Adds DriverConfig::with_volume_mount(), DriverConfig::with_network_alias(), DriverConfig::with_network(), and DriverConfig::from_image() (pub) to airlock
  • The shared millstone container bind-mounts the original millstone.yaml (same mechanism as before) and derives the two per-target configs at startup with sed
  • Fixes Docker volumes and networks leaking on test failure paths — cleanup_groups is now called on every exit path, not just the success path

Kind runtime (k8s.rs):

  • Fixes the same TCP/gRPC/UDP target bug that existed in the kind runtime (was hardcoded to socket paths regardless of test type). Now uses $GROUP substitution: baseline/comparison for socket tests, agent pod cluster IPs for TCP/gRPC/UDP tests
  • The millstone pod startup wait condition is now conditional: only waits for socket files on DSD tests (is_socket_target)
  • For TCP/gRPC/UDP kind tests, agent pods are started first and their IPs fetched before the millstone pod is created; the millstone pod then includes nc -z host port readiness checks in its startup wait condition so millstone does not attempt a connection before the agent has bound its port
  • Port is extracted from the $GROUP placeholder URL via serde_yaml — parses the template as YAML and reads the target field

New test cases from main (rebased):

  • dsd-uds-stream (Unix stream socket) and dsd-udp (UDP) updated to use $GROUP placeholder

Key changes

Target rewriting ($GROUP):

Every millstone.yaml uses $GROUP in the target URL. At test runtime:

  • Docker: sed 's/\$GROUP/baseline/g' substitutes the network alias
  • Kind/DSD: template.replace("$GROUP", "baseline") substitutes the group name (for airlock volume path)
  • Kind/TCP: template.replace("$GROUP", pod_ip) substitutes the agent pod's cluster IP

Docker millstone container startup:

sed 's/\$GROUP/baseline/g' /etc/millstone/config.toml > /tmp/millstone-baseline.toml &&
sed 's/\$GROUP/comparison/g' /etc/millstone/config.toml > /tmp/millstone-comparison.toml &&
exec sh -c 'millstone /tmp/millstone-baseline.toml & P1=$!; millstone /tmp/millstone-comparison.toml & P2=$!; wait $P1; R1=$?; wait $P2; R2=$?; exit $((R1|R2))'

Kind port-readiness (non-socket targets):

  1. Start agent pods (Phase 1a)
  2. Fetch pod IPs; build nc -z ip port checks using port extracted via serde_yaml (Phase 1b)
  3. Start millstone pod with those checks baked into its wait condition (Phase 1c)

Test plan

  • dsd-plain — PASS
  • dsd-added-tags — PASS
  • dsd-events — PASS
  • otlp-metrics — PASS
  • otlp-traces — PASS
  • otlp-traces-kind (temporary test case, removed after validation) — PASS

@pr-commenter
Copy link
Copy Markdown

pr-commenter Bot commented May 11, 2026

Binary Size Analysis (Agent Data Plane)

Target: 4aeb9c7 (baseline) vs 569b63b (comparison) diff
Analysis Type: Stripped binaries (debug symbols excluded)
Baseline Size: 37.61 MiB
Comparison Size: 37.61 MiB
Size Change: -8 B (-0.00%)
Pass/Fail Threshold: +5%
Result: PASSED ✅

Changes by Module

Module File Size Symbols
anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.3228274853015480322 -129 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.709450314824670458 +128 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.3228274853015480322 -114 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.709450314824670458 +113 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.3228274853015480322 -108 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.709450314824670458 +107 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.3228274853015480322 -96 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.709450314824670458 +95 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.3228274853015480322 -94 B 1
anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.709450314824670458 +93 B 1
[Unmapped] -3 B 1

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW]    +128  [NEW]     +40    anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.709450314824670458
  [NEW]    +113  [NEW]     +25    anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.709450314824670458
  [NEW]    +107  [NEW]     +19    anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.709450314824670458
  [NEW]     +95  [NEW]      +7    anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.709450314824670458
  [NEW]     +93  [NEW]      +5    anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.709450314824670458
  -0.0%      -3  [ = ]       0    [Unmapped]
  [DEL]     -94  [DEL]      -5    anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.3228274853015480322
  [DEL]     -96  [DEL]      -7    anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.3228274853015480322
  [DEL]    -108  [DEL]     -19    anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.3228274853015480322
  [DEL]    -114  [DEL]     -25    anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.3228274853015480322
  [DEL]    -129  [DEL]     -40    anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.3228274853015480322
  -0.0%      -8  [ = ]       0    TOTAL

@pr-commenter
Copy link
Copy Markdown

pr-commenter Bot commented May 11, 2026

Regression Detector (Agent Data Plane)

Regression Detector Results

Run ID: 7092381f-55b4-4944-8293-b374c8220798

Baseline: 4aeb9c7
Comparison: 569b63b
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf experiment goal Δ mean % Δ mean % CI trials links
otlp_ingest_logs_5mb_cpu % cpu utilization +0.69 [-3.81, +5.19] 1 (metrics) (profiles) (logs)
otlp_ingest_logs_5mb_throughput ingress throughput +0.01 [-0.11, +0.14] 1 (metrics) (profiles) (logs)
otlp_ingest_logs_5mb_memory memory utilization -7.51 [-7.91, -7.12] 1 (metrics) (profiles) (logs)

Fine details of change detection per experiment

perf experiment goal Δ mean % Δ mean % CI trials links
dsd_uds_10mb_3k_contexts_cpu % cpu utilization +6.45 [-25.54, +38.44] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_5mb_cpu % cpu utilization +2.74 [+0.78, +4.70] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_ottl_transform_5mb_cpu % cpu utilization +2.05 [+0.02, +4.09] 1 (metrics) (profiles) (logs)
dsd_uds_100mb_3k_contexts_cpu % cpu utilization +1.21 [-4.67, +7.08] 1 (metrics) (profiles) (logs)
otlp_ingest_metrics_5mb_cpu % cpu utilization +1.13 [-5.16, +7.42] 1 (metrics) (profiles) (logs)
otlp_ingest_logs_5mb_cpu % cpu utilization +0.69 [-3.81, +5.19] 1 (metrics) (profiles) (logs)
dsd_uds_500mb_3k_contexts_throughput ingress throughput +0.50 [+0.38, +0.61] 1 (metrics) (profiles) (logs)
quality_gates_rss_dsd_medium memory utilization +0.45 [+0.28, +0.62] 1 (metrics) (profiles) (logs)
dsd_uds_100mb_3k_contexts_memory memory utilization +0.42 [+0.26, +0.57] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_ottl_transform_5mb_memory memory utilization +0.37 [+0.21, +0.53] 1 (metrics) (profiles) (logs)
quality_gates_rss_idle memory utilization +0.20 [+0.16, +0.24] 1 (metrics) (profiles) (logs)
dsd_uds_10mb_3k_contexts_memory memory utilization +0.17 [+0.01, +0.33] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_ottl_filtering_5mb_memory memory utilization +0.12 [-0.12, +0.37] 1 (metrics) (profiles) (logs)
quality_gates_rss_dsd_heavy memory utilization +0.08 [-0.05, +0.20] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_5mb_throughput ingress throughput +0.04 [-0.04, +0.11] 1 (metrics) (profiles) (logs)
dsd_uds_10mb_3k_contexts_throughput ingress throughput +0.02 [-0.17, +0.20] 1 (metrics) (profiles) (logs)
otlp_ingest_logs_5mb_throughput ingress throughput +0.01 [-0.11, +0.14] 1 (metrics) (profiles) (logs)
dsd_uds_512kb_3k_contexts_throughput ingress throughput +0.00 [-0.05, +0.06] 1 (metrics) (profiles) (logs)
dsd_uds_1mb_3k_contexts_throughput ingress throughput -0.00 [-0.06, +0.06] 1 (metrics) (profiles) (logs)
dsd_uds_100mb_3k_contexts_throughput ingress throughput -0.00 [-0.04, +0.03] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_ottl_transform_5mb_throughput ingress throughput -0.01 [-0.08, +0.07] 1 (metrics) (profiles) (logs)
otlp_ingest_metrics_5mb_throughput ingress throughput -0.01 [-0.18, +0.16] 1 (metrics) (profiles) (logs)
quality_gates_rss_dsd_low memory utilization -0.06 [-0.23, +0.10] 1 (metrics) (profiles) (logs)
dsd_uds_1mb_3k_contexts_memory memory utilization -0.08 [-0.22, +0.07] 1 (metrics) (profiles) (logs)
quality_gates_rss_dsd_ultraheavy memory utilization -0.10 [-0.23, +0.03] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_5mb_memory memory utilization -0.11 [-0.27, +0.05] 1 (metrics) (profiles) (logs)
dsd_uds_512kb_3k_contexts_memory memory utilization -0.25 [-0.39, -0.10] 1 (metrics) (profiles) (logs)
dsd_uds_500mb_3k_contexts_memory memory utilization -0.27 [-0.41, -0.13] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_ottl_filtering_5mb_throughput ingress throughput -0.35 [-0.43, -0.28] 1 (metrics) (profiles) (logs)
dsd_uds_500mb_3k_contexts_cpu % cpu utilization -0.65 [-2.06, +0.77] 1 (metrics) (profiles) (logs)
dsd_uds_1mb_3k_contexts_cpu % cpu utilization -1.00 [-56.55, +54.54] 1 (metrics) (profiles) (logs)
otlp_ingest_traces_ottl_filtering_5mb_cpu % cpu utilization -1.79 [-3.95, +0.38] 1 (metrics) (profiles) (logs)
dsd_uds_512kb_3k_contexts_cpu % cpu utilization -2.16 [-60.62, +56.30] 1 (metrics) (profiles) (logs)
otlp_ingest_metrics_5mb_memory memory utilization -2.37 [-2.56, -2.18] 1 (metrics) (profiles) (logs)
otlp_ingest_logs_5mb_memory memory utilization -7.51 [-7.91, -7.12] 1 (metrics) (profiles) (logs)

Bounds Checks: ✅ Passed

perf experiment bounds_check_name replicates_passed observed_value links
quality_gates_rss_dsd_heavy memory_usage 10/10 127.33MiB ≤ 140MiB (metrics) (profiles) (logs)
quality_gates_rss_dsd_low memory_usage 10/10 39.95MiB ≤ 50MiB (metrics) (profiles) (logs)
quality_gates_rss_dsd_medium memory_usage 10/10 60.77MiB ≤ 75MiB (metrics) (profiles) (logs)
quality_gates_rss_dsd_ultraheavy memory_usage 10/10 175.21MiB ≤ 200MiB (metrics) (profiles) (logs)
quality_gates_rss_idle memory_usage 10/10 27.15MiB ≤ 40MiB (metrics) (profiles) (logs)

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".

@dd-octo-sts dd-octo-sts Bot added the area/test All things testing: unit/integration, correctness, SMP regression, etc. label May 11, 2026
///
/// This gives the agents time to flush any remaining aggregated metrics after millstone stops
/// sending. The value is slightly longer than a full aggregation bucket width.
const FLUSH_WAIT: Duration = Duration::from_secs(32);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same as before, just moved around and hoisted to a constant

/// - Unix socket targets (`unixgram:///airlock/...`): `/airlock/` is replaced with
/// `/{group}-airlock/`, directing millstone at the group's volume mount.
/// - TCP/gRPC targets (`grpc://target:...`): `://target` is replaced with `://{group}`,
/// directing millstone at the group's network alias.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding correctness tests on additional supported transports in a subsequent PR which might need to expand on this substitution piece. This is obviously pretty janky / magical, might make more sense to support a $GROUP substitution or something in the millstone YAML. Will take as a follow-up.

@dd-octo-sts dd-octo-sts Bot added area/test All things testing: unit/integration, correctness, SMP regression, etc. and removed area/test All things testing: unit/integration, correctness, SMP regression, etc. labels May 11, 2026
@thieman thieman force-pushed the thieman/docker-single-millstone branch from 9e901ae to 06676ae Compare May 12, 2026 09:41
@thieman thieman marked this pull request as ready for review May 12, 2026 12:37
@thieman thieman requested a review from a team as a code owner May 12, 2026 12:37
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 06676aebaa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +696 to +697
"[ -f /etc/millstone/baseline.toml ] && \
[ -f /etc/millstone/comparison.toml ]"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Wait for non-socket targets before launching millstone

When a Kubernetes correctness test uses a TCP/gRPC/UDP target, this branch starts millstone as soon as the two config files exist; the agent pods have only reached phase Running and build_agent_pod does not define a readiness probe or port check. Millstone's gRPC path attempts a single Channel::connect() during startup, so if the agent has not opened port 4317 yet the test can fail immediately (and UDP can silently drop early payloads) rather than waiting as the socket path does via -S checks.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 2bbe5fb

/// pod's startup script for TCP/gRPC/UDP targets.
fn extract_target_port(template: &str) -> Option<u16> {
let doc: serde_yaml::Value = serde_yaml::from_str(template).ok()?;
let target = doc.get("target")?.as_str()?;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit janky but importing the actual Millstone config schema would require giving it a lib.rs etc. Can certainly do that but this was already a lot more code than I was expecting and this should break pretty loudly if it ever drifts since we run correctness on every PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dream of:

find . -type f -name "*.yaml" -delete

/// containing ADP and the Agent working together. These are called 'baseline' and 'comparison', respectively.
/// In a correctness test, two isolated groups of containers are created. One containing the Agent
/// alone (baseline), and the other containing ADP and the Agent working together (comparison). A
/// single shared millstone container runs two parallel millstone processes — one targeting each
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it have to be two processes? I don't know how, but could the same bytes be, hmmm, copied to two sockets?

Copy link
Copy Markdown
Contributor Author

@thieman thieman May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably doable for UDP sockets with something like tee, not sure about gRPC. It's comparatively a lot of complexity imo compared to just running two processes, which lets us take advantage of the fact that Lading is deterministic

@@ -1,5 +1,5 @@
seed: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131]
target: "unixgram:///airlock/metrics.sock"
target: "unixgram:///$GROUP-airlock/metrics.sock"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh.

/// pod's startup script for TCP/gRPC/UDP targets.
fn extract_target_port(template: &str) -> Option<u16> {
let doc: serde_yaml::Value = serde_yaml::from_str(template).ok()?;
let target = doc.get("target")?.as_str()?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dream of:

find . -type f -name "*.yaml" -delete

@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit a0e22b5 into main May 12, 2026
76 checks passed
dd-octo-sts Bot pushed a commit that referenced this pull request May 12, 2026
…container (#1618)

## Human Summary

Follows up on #1607, which switched the kind correctness tests to use a single shared millstone pod. This PR does the same for the Docker runtime for consistency, and while doing so fixes several pre-existing issues in the kind runtime as well.

## Summary

**`$GROUP` placeholder in `millstone.yaml`:**
- All `millstone.yaml` files now use `$GROUP` as a placeholder in the `target:` URL (e.g. `unixgram:///$GROUP-airlock/metrics.sock`, `grpc://$GROUP:4317/...`, `udp://$GROUP:8125`). At test runtime the shared millstone container uses `sed` to substitute `baseline` or `comparison` to produce two per-target configs, then runs both millstone processes in parallel.

**Docker runtime (`runner.rs`, `airlock/driver.rs`):**
- Removes millstone from the baseline and comparison `GroupRunner`s; each group now only contains `[datadog-intake, target]`
- Adds a shared millstone `GroupRunner` with its own isolation group, connected to both agent networks
- Each agent container gets a network alias (`baseline` or `comparison`) via `NetworkingConfig` at creation time so the shared millstone can address each agent unambiguously for both DSD (Unix socket) and TCP/gRPC/UDP tests
- Splits `DataCollector` into `AgentGroupCollector` (intake port) and `MillstoneWaiter` (millstone handle)
- Adds `DriverConfig::with_volume_mount()`, `DriverConfig::with_network_alias()`, `DriverConfig::with_network()`, and `DriverConfig::from_image()` (pub) to airlock
- The shared millstone container bind-mounts the original `millstone.yaml` (same mechanism as before) and derives the two per-target configs at startup with `sed`
- Fixes Docker volumes and networks leaking on test failure paths — `cleanup_groups` is now called on every exit path, not just the success path

**Kind runtime (`k8s.rs`):**
- Fixes the same TCP/gRPC/UDP target bug that existed in the kind runtime (was hardcoded to socket paths regardless of test type). Now uses `$GROUP` substitution: `baseline`/`comparison` for socket tests, agent pod cluster IPs for TCP/gRPC/UDP tests
- The millstone pod startup wait condition is now conditional: only waits for socket files on DSD tests (`is_socket_target`)
- For TCP/gRPC/UDP kind tests, agent pods are started first and their IPs fetched before the millstone pod is created; the millstone pod then includes `nc -z host port` readiness checks in its startup wait condition so millstone does not attempt a connection before the agent has bound its port
- Port is extracted from the `$GROUP` placeholder URL via `serde_yaml` — parses the template as YAML and reads the `target` field

**New test cases from main (rebased):**
- `dsd-uds-stream` (Unix stream socket) and `dsd-udp` (UDP) updated to use `$GROUP` placeholder

## Key changes

**Target rewriting (`$GROUP`):**

Every `millstone.yaml` uses `$GROUP` in the target URL. At test runtime:
- Docker: `sed 's/\$GROUP/baseline/g'` substitutes the network alias
- Kind/DSD: `template.replace("$GROUP", "baseline")` substitutes the group name (for airlock volume path)
- Kind/TCP: `template.replace("$GROUP", pod_ip)` substitutes the agent pod's cluster IP

**Docker millstone container startup:**
```sh
sed 's/\$GROUP/baseline/g' /etc/millstone/config.toml > /tmp/millstone-baseline.toml &&
sed 's/\$GROUP/comparison/g' /etc/millstone/config.toml > /tmp/millstone-comparison.toml &&
exec sh -c 'millstone /tmp/millstone-baseline.toml & P1=$!; millstone /tmp/millstone-comparison.toml & P2=$!; wait $P1; R1=$?; wait $P2; R2=$?; exit $((R1|R2))'
```

**Kind port-readiness (non-socket targets):**
1. Start agent pods (Phase 1a)
2. Fetch pod IPs; build `nc -z ip port` checks using port extracted via `serde_yaml` (Phase 1b)
3. Start millstone pod with those checks baked into its wait condition (Phase 1c)

## Test plan

- [x] `dsd-plain` — PASS
- [x] `dsd-added-tags` — PASS
- [x] `dsd-events` — PASS
- [x] `otlp-metrics` — PASS
- [x] `otlp-traces` — PASS
- [x] `otlp-traces-kind` (temporary test case, removed after validation) — PASS

Co-authored-by: travis.thieman <travis.thieman@datadoghq.com> a0e22b5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/test All things testing: unit/integration, correctness, SMP regression, etc. mergequeue-status: done

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants