chore(correctness): switch Docker runtime to single shared millstone container by thieman · Pull Request #1618 · DataDog/saluki

thieman · 2026-05-11T09:40:59Z

Human Summary

Follows up on #1607, which switched the kind correctness tests to use a single shared millstone pod. This PR does the same for the Docker runtime for consistency, and while doing so fixes several pre-existing issues in the kind runtime as well.

Summary

$GROUP placeholder in millstone.yaml:

All millstone.yaml files now use $GROUP as a placeholder in the target: URL (e.g. unixgram:///$GROUP-airlock/metrics.sock, grpc://$GROUP:4317/..., udp://$GROUP:8125). At test runtime the shared millstone container uses sed to substitute baseline or comparison to produce two per-target configs, then runs both millstone processes in parallel.

Docker runtime (runner.rs, airlock/driver.rs):

Removes millstone from the baseline and comparison GroupRunners; each group now only contains [datadog-intake, target]
Adds a shared millstone GroupRunner with its own isolation group, connected to both agent networks
Each agent container gets a network alias (baseline or comparison) via NetworkingConfig at creation time so the shared millstone can address each agent unambiguously for both DSD (Unix socket) and TCP/gRPC/UDP tests
Splits DataCollector into AgentGroupCollector (intake port) and MillstoneWaiter (millstone handle)
Adds DriverConfig::with_volume_mount(), DriverConfig::with_network_alias(), DriverConfig::with_network(), and DriverConfig::from_image() (pub) to airlock
The shared millstone container bind-mounts the original millstone.yaml (same mechanism as before) and derives the two per-target configs at startup with sed
Fixes Docker volumes and networks leaking on test failure paths — cleanup_groups is now called on every exit path, not just the success path

Kind runtime (k8s.rs):

Fixes the same TCP/gRPC/UDP target bug that existed in the kind runtime (was hardcoded to socket paths regardless of test type). Now uses $GROUP substitution: baseline/comparison for socket tests, agent pod cluster IPs for TCP/gRPC/UDP tests
The millstone pod startup wait condition is now conditional: only waits for socket files on DSD tests (is_socket_target)
For TCP/gRPC/UDP kind tests, agent pods are started first and their IPs fetched before the millstone pod is created; the millstone pod then includes nc -z host port readiness checks in its startup wait condition so millstone does not attempt a connection before the agent has bound its port
Port is extracted from the $GROUP placeholder URL via serde_yaml — parses the template as YAML and reads the target field

New test cases from main (rebased):

dsd-uds-stream (Unix stream socket) and dsd-udp (UDP) updated to use $GROUP placeholder

Key changes

Target rewriting ($GROUP):

Every millstone.yaml uses $GROUP in the target URL. At test runtime:

Docker: sed 's/\$GROUP/baseline/g' substitutes the network alias
Kind/DSD: template.replace("$GROUP", "baseline") substitutes the group name (for airlock volume path)
Kind/TCP: template.replace("$GROUP", pod_ip) substitutes the agent pod's cluster IP

Docker millstone container startup:

sed 's/\$GROUP/baseline/g' /etc/millstone/config.toml > /tmp/millstone-baseline.toml &&
sed 's/\$GROUP/comparison/g' /etc/millstone/config.toml > /tmp/millstone-comparison.toml &&
exec sh -c 'millstone /tmp/millstone-baseline.toml & P1=$!; millstone /tmp/millstone-comparison.toml & P2=$!; wait $P1; R1=$?; wait $P2; R2=$?; exit $((R1|R2))'

Kind port-readiness (non-socket targets):

Start agent pods (Phase 1a)
Fetch pod IPs; build nc -z ip port checks using port extracted via serde_yaml (Phase 1b)
Start millstone pod with those checks baked into its wait condition (Phase 1c)

Test plan

dsd-plain — PASS
dsd-added-tags — PASS
dsd-events — PASS
otlp-metrics — PASS
otlp-traces — PASS
otlp-traces-kind (temporary test case, removed after validation) — PASS

pr-commenter · 2026-05-11T09:48:32Z

Binary Size Analysis (Agent Data Plane)

Target: 4aeb9c7 (baseline) vs 569b63b (comparison) diff
Analysis Type: Stripped binaries (debug symbols excluded)
Baseline Size: 37.61 MiB
Comparison Size: 37.61 MiB
Size Change: -8 B (-0.00%)
Pass/Fail Threshold: +5%
Result: PASSED ✅

Changes by Module

Module	File Size	Symbols
`anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.3228274853015480322`	-129 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.709450314824670458`	+128 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.3228274853015480322`	-114 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.709450314824670458`	+113 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.3228274853015480322`	-108 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.709450314824670458`	+107 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.3228274853015480322`	-96 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.709450314824670458`	+95 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.3228274853015480322`	-94 B	1
`anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.709450314824670458`	+93 B	1
`[Unmapped]`	-3 B	1

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW]    +128  [NEW]     +40    anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.709450314824670458
  [NEW]    +113  [NEW]     +25    anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.709450314824670458
  [NEW]    +107  [NEW]     +19    anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.709450314824670458
  [NEW]     +95  [NEW]      +7    anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.709450314824670458
  [NEW]     +93  [NEW]      +5    anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.709450314824670458
  -0.0%      -3  [ = ]       0    [Unmapped]
  [DEL]     -94  [DEL]      -5    anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.3228274853015480322
  [DEL]     -96  [DEL]      -7    anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.3228274853015480322
  [DEL]    -108  [DEL]     -19    anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.3228274853015480322
  [DEL]    -114  [DEL]     -25    anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.3228274853015480322
  [DEL]    -129  [DEL]     -40    anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.3228274853015480322
  -0.0%      -8  [ = ]       0    TOTAL

pr-commenter · 2026-05-11T10:03:29Z

Regression Detector (Agent Data Plane)

Regression Detector Results

Run ID: 7092381f-55b4-4944-8293-b374c8220798

Baseline: 4aeb9c7
Comparison: 569b63b
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	otlp_ingest_logs_5mb_cpu	% cpu utilization	+0.69	[-3.81, +5.19]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_throughput	ingress throughput	+0.01	[-0.11, +0.14]	1	(metrics) (profiles) (logs)
✅	otlp_ingest_logs_5mb_memory	memory utilization	-7.51	[-7.91, -7.12]	1	(metrics) (profiles) (logs)

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	dsd_uds_10mb_3k_contexts_cpu	% cpu utilization	+6.45	[-25.54, +38.44]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_5mb_cpu	% cpu utilization	+2.74	[+0.78, +4.70]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_transform_5mb_cpu	% cpu utilization	+2.05	[+0.02, +4.09]	1	(metrics) (profiles) (logs)
➖	dsd_uds_100mb_3k_contexts_cpu	% cpu utilization	+1.21	[-4.67, +7.08]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_metrics_5mb_cpu	% cpu utilization	+1.13	[-5.16, +7.42]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_cpu	% cpu utilization	+0.69	[-3.81, +5.19]	1	(metrics) (profiles) (logs)
➖	dsd_uds_500mb_3k_contexts_throughput	ingress throughput	+0.50	[+0.38, +0.61]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_medium	memory utilization	+0.45	[+0.28, +0.62]	1	(metrics) (profiles) (logs)
➖	dsd_uds_100mb_3k_contexts_memory	memory utilization	+0.42	[+0.26, +0.57]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_transform_5mb_memory	memory utilization	+0.37	[+0.21, +0.53]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_idle	memory utilization	+0.20	[+0.16, +0.24]	1	(metrics) (profiles) (logs)
➖	dsd_uds_10mb_3k_contexts_memory	memory utilization	+0.17	[+0.01, +0.33]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_filtering_5mb_memory	memory utilization	+0.12	[-0.12, +0.37]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_heavy	memory utilization	+0.08	[-0.05, +0.20]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_5mb_throughput	ingress throughput	+0.04	[-0.04, +0.11]	1	(metrics) (profiles) (logs)
➖	dsd_uds_10mb_3k_contexts_throughput	ingress throughput	+0.02	[-0.17, +0.20]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_throughput	ingress throughput	+0.01	[-0.11, +0.14]	1	(metrics) (profiles) (logs)
➖	dsd_uds_512kb_3k_contexts_throughput	ingress throughput	+0.00	[-0.05, +0.06]	1	(metrics) (profiles) (logs)
➖	dsd_uds_1mb_3k_contexts_throughput	ingress throughput	-0.00	[-0.06, +0.06]	1	(metrics) (profiles) (logs)
➖	dsd_uds_100mb_3k_contexts_throughput	ingress throughput	-0.00	[-0.04, +0.03]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_transform_5mb_throughput	ingress throughput	-0.01	[-0.08, +0.07]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_metrics_5mb_throughput	ingress throughput	-0.01	[-0.18, +0.16]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_low	memory utilization	-0.06	[-0.23, +0.10]	1	(metrics) (profiles) (logs)
➖	dsd_uds_1mb_3k_contexts_memory	memory utilization	-0.08	[-0.22, +0.07]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_ultraheavy	memory utilization	-0.10	[-0.23, +0.03]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_5mb_memory	memory utilization	-0.11	[-0.27, +0.05]	1	(metrics) (profiles) (logs)
➖	dsd_uds_512kb_3k_contexts_memory	memory utilization	-0.25	[-0.39, -0.10]	1	(metrics) (profiles) (logs)
➖	dsd_uds_500mb_3k_contexts_memory	memory utilization	-0.27	[-0.41, -0.13]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_filtering_5mb_throughput	ingress throughput	-0.35	[-0.43, -0.28]	1	(metrics) (profiles) (logs)
➖	dsd_uds_500mb_3k_contexts_cpu	% cpu utilization	-0.65	[-2.06, +0.77]	1	(metrics) (profiles) (logs)
➖	dsd_uds_1mb_3k_contexts_cpu	% cpu utilization	-1.00	[-56.55, +54.54]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_filtering_5mb_cpu	% cpu utilization	-1.79	[-3.95, +0.38]	1	(metrics) (profiles) (logs)
➖	dsd_uds_512kb_3k_contexts_cpu	% cpu utilization	-2.16	[-60.62, +56.30]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_metrics_5mb_memory	memory utilization	-2.37	[-2.56, -2.18]	1	(metrics) (profiles) (logs)
✅	otlp_ingest_logs_5mb_memory	memory utilization	-7.51	[-7.91, -7.12]	1	(metrics) (profiles) (logs)

Bounds Checks: ✅ Passed

perf	experiment	bounds_check_name	replicates_passed	observed_value	links
✅	quality_gates_rss_dsd_heavy	memory_usage	10/10	127.33MiB ≤ 140MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_low	memory_usage	10/10	39.95MiB ≤ 50MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_medium	memory_usage	10/10	60.77MiB ≤ 75MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_ultraheavy	memory_usage	10/10	175.21MiB ≤ 200MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_idle	memory_usage	10/10	27.15MiB ≤ 40MiB	(metrics) (profiles) (logs)

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".

thieman · 2026-05-11T14:04:31Z

+///
+/// This gives the agents time to flush any remaining aggregated metrics after millstone stops
+/// sending. The value is slightly longer than a full aggregation bucket width.
+const FLUSH_WAIT: Duration = Duration::from_secs(32);


This is the same as before, just moved around and hoisted to a constant

thieman · 2026-05-11T14:06:09Z

+/// - Unix socket targets (`unixgram:///airlock/...`): `/airlock/` is replaced with
+///   `/{group}-airlock/`, directing millstone at the group's volume mount.
+/// - TCP/gRPC targets (`grpc://target:...`): `://target` is replaced with `://{group}`,
+///   directing millstone at the group's network alias.


I'm adding correctness tests on additional supported transports in a subsequent PR which might need to expand on this substitution piece. This is obviously pretty janky / magical, might make more sense to support a $GROUP substitution or something in the millstone YAML. Will take as a follow-up.

…container

…cker bind-mount quirk

…und macOS Docker Desktop VirtioFS

… avoid Docker Desktop VirtioFS path restrictions

…ne runner

…oc comments

…-only wait loop

… stale leftovers

…d of source tree

…arget substitution

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 06676aebaa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-12T12:45:40Z

+        "[ -f /etc/millstone/baseline.toml ] && \
+               [ -f /etc/millstone/comparison.toml ]"


Wait for non-socket targets before launching millstone

When a Kubernetes correctness test uses a TCP/gRPC/UDP target, this branch starts millstone as soon as the two config files exist; the agent pods have only reached phase Running and build_agent_pod does not define a readiness probe or port check. Millstone's gRPC path attempts a single Channel::connect() during startup, so if the agent has not opened port 4317 yet the test can fail immediately (and UDP can silently drop early payloads) rather than waiting as the socket path does via -S checks.

Useful? React with 👍 / 👎.

Addressed in 2bbe5fb

…n kind millstone pod

thieman · 2026-05-12T12:57:46Z

+/// pod's startup script for TCP/gRPC/UDP targets.
+fn extract_target_port(template: &str) -> Option<u16> {
+    let doc: serde_yaml::Value = serde_yaml::from_str(template).ok()?;
+    let target = doc.get("target")?.as_str()?;


This is a bit janky but importing the actual Millstone config schema would require giving it a lib.rs etc. Can certainly do that but this was already a lot more code than I was expecting and this should break pretty loudly if it ever drifts since we run correctness on every PR.

I dream of:

find . -type f -name "*.yaml" -delete

webern · 2026-05-12T13:01:30Z

-/// containing ADP and the Agent working together. These are called 'baseline' and 'comparison', respectively.
+/// In a correctness test, two isolated groups of containers are created. One containing the Agent
+/// alone (baseline), and the other containing ADP and the Agent working together (comparison). A
+/// single shared millstone container runs two parallel millstone processes — one targeting each


Does it have to be two processes? I don't know how, but could the same bytes be, hmmm, copied to two sockets?

That's probably doable for UDP sockets with something like tee, not sure about gRPC. It's comparatively a lot of complexity imo compared to just running two processes, which lets us take advantage of the fact that Lading is deterministic

webern · 2026-05-12T13:02:07Z

@@ -1,5 +1,5 @@
 seed: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131]
-target: "unixgram:///airlock/metrics.sock"
+target: "unixgram:///$GROUP-airlock/metrics.sock"


…exit paths

webern · 2026-05-12T13:06:43Z

+/// pod's startup script for TCP/gRPC/UDP targets.
+fn extract_target_port(template: &str) -> Option<u16> {
+    let doc: serde_yaml::Value = serde_yaml::from_str(template).ok()?;
+    let target = doc.get("target")?.as_str()?;


I dream of:

find . -type f -name "*.yaml" -delete

…container (#1618) ## Human Summary Follows up on #1607, which switched the kind correctness tests to use a single shared millstone pod. This PR does the same for the Docker runtime for consistency, and while doing so fixes several pre-existing issues in the kind runtime as well. ## Summary **`$GROUP` placeholder in `millstone.yaml`:** - All `millstone.yaml` files now use `$GROUP` as a placeholder in the `target:` URL (e.g. `unixgram:///$GROUP-airlock/metrics.sock`, `grpc://$GROUP:4317/...`, `udp://$GROUP:8125`). At test runtime the shared millstone container uses `sed` to substitute `baseline` or `comparison` to produce two per-target configs, then runs both millstone processes in parallel. **Docker runtime (`runner.rs`, `airlock/driver.rs`):** - Removes millstone from the baseline and comparison `GroupRunner`s; each group now only contains `[datadog-intake, target]` - Adds a shared millstone `GroupRunner` with its own isolation group, connected to both agent networks - Each agent container gets a network alias (`baseline` or `comparison`) via `NetworkingConfig` at creation time so the shared millstone can address each agent unambiguously for both DSD (Unix socket) and TCP/gRPC/UDP tests - Splits `DataCollector` into `AgentGroupCollector` (intake port) and `MillstoneWaiter` (millstone handle) - Adds `DriverConfig::with_volume_mount()`, `DriverConfig::with_network_alias()`, `DriverConfig::with_network()`, and `DriverConfig::from_image()` (pub) to airlock - The shared millstone container bind-mounts the original `millstone.yaml` (same mechanism as before) and derives the two per-target configs at startup with `sed` - Fixes Docker volumes and networks leaking on test failure paths — `cleanup_groups` is now called on every exit path, not just the success path **Kind runtime (`k8s.rs`):** - Fixes the same TCP/gRPC/UDP target bug that existed in the kind runtime (was hardcoded to socket paths regardless of test type). Now uses `$GROUP` substitution: `baseline`/`comparison` for socket tests, agent pod cluster IPs for TCP/gRPC/UDP tests - The millstone pod startup wait condition is now conditional: only waits for socket files on DSD tests (`is_socket_target`) - For TCP/gRPC/UDP kind tests, agent pods are started first and their IPs fetched before the millstone pod is created; the millstone pod then includes `nc -z host port` readiness checks in its startup wait condition so millstone does not attempt a connection before the agent has bound its port - Port is extracted from the `$GROUP` placeholder URL via `serde_yaml` — parses the template as YAML and reads the `target` field **New test cases from main (rebased):** - `dsd-uds-stream` (Unix stream socket) and `dsd-udp` (UDP) updated to use `$GROUP` placeholder ## Key changes **Target rewriting (`$GROUP`):** Every `millstone.yaml` uses `$GROUP` in the target URL. At test runtime: - Docker: `sed 's/\$GROUP/baseline/g'` substitutes the network alias - Kind/DSD: `template.replace("$GROUP", "baseline")` substitutes the group name (for airlock volume path) - Kind/TCP: `template.replace("$GROUP", pod_ip)` substitutes the agent pod's cluster IP **Docker millstone container startup:** ```sh sed 's/\$GROUP/baseline/g' /etc/millstone/config.toml > /tmp/millstone-baseline.toml && sed 's/\$GROUP/comparison/g' /etc/millstone/config.toml > /tmp/millstone-comparison.toml && exec sh -c 'millstone /tmp/millstone-baseline.toml & P1=$!; millstone /tmp/millstone-comparison.toml & P2=$!; wait $P1; R1=$?; wait $P2; R2=$?; exit $((R1|R2))' ``` **Kind port-readiness (non-socket targets):** 1. Start agent pods (Phase 1a) 2. Fetch pod IPs; build `nc -z ip port` checks using port extracted via `serde_yaml` (Phase 1b) 3. Start millstone pod with those checks baked into its wait condition (Phase 1c) ## Test plan - [x] `dsd-plain` — PASS - [x] `dsd-added-tags` — PASS - [x] `dsd-events` — PASS - [x] `otlp-metrics` — PASS - [x] `otlp-traces` — PASS - [x] `otlp-traces-kind` (temporary test case, removed after validation) — PASS Co-authored-by: travis.thieman <travis.thieman@datadoghq.com> a0e22b5

dd-octo-sts Bot added the area/test All things testing: unit/integration, correctness, SMP regression, etc. label May 11, 2026

thieman commented May 11, 2026

View reviewed changes

dd-octo-sts Bot added area/test All things testing: unit/integration, correctness, SMP regression, etc. and removed area/test All things testing: unit/integration, correctness, SMP regression, etc. labels May 11, 2026

thieman added 13 commits May 12, 2026 11:36

chore(correctness): switch Docker runtime to single shared millstone …

b762809

…container

fix(correctness): embed millstone configs as base64 to avoid macOS Do…

c2a8790

…cker bind-mount quirk

fix(correctness): use directory bind mount via HOME cache to work aro…

1de8a47

…und macOS Docker Desktop VirtioFS

fix(correctness): write millstone configs alongside millstone.yaml to…

4d65433

… avoid Docker Desktop VirtioFS path restrictions

refactor(correctness): address self-review feedback on shared millsto…

9e8f33d

…ne runner

chore(correctness): restore dropped TODO comments from original runner

309be1f

chore(correctness): restore dropped comment, remove overly specific d…

7ad7692

…oc comments

chore(correctness): fix inaccurate comments in shared millstone runner

b92c4b0

fix(correctness): fix shared millstone for TCP targets and remove DSD…

4544802

…-only wait loop

chore(correctness): gitignore generated millstone run configs, remove…

0fcaf3c

… stale leftovers

fix(correctness): fix shared millstone for TCP targets in kind runtime

3e183a4

refactor(correctness): write millstone run configs to ~/.cache instea…

9775152

…d of source tree

refactor(correctness): use $GROUP placeholder in millstone.yaml for t…

06676ae

…arget substitution

thieman force-pushed the thieman/docker-single-millstone branch from 9e901ae to 06676ae Compare May 12, 2026 09:41

thieman marked this pull request as ready for review May 12, 2026 12:37

thieman requested a review from a team as a code owner May 12, 2026 12:37

chatgpt-codex-connector Bot reviewed May 12, 2026

View reviewed changes

fix(correctness): add port-readiness check for TCP/gRPC/UDP targets i…

2bbe5fb

…n kind millstone pod

thieman commented May 12, 2026

View reviewed changes

webern approved these changes May 12, 2026

View reviewed changes

fix(correctness): always clean up Docker volumes and networks on all …

569b63b

…exit paths

webern approved these changes May 12, 2026

View reviewed changes

gh-worker-dd-devflow-36fce6 Bot added the mergequeue-status: queued label May 12, 2026

gh-worker-dd-devflow-36fce6 Bot added mergequeue-status: in_progress and removed mergequeue-status: queued labels May 12, 2026

gh-worker-dd-mergequeue-cf854d Bot merged commit a0e22b5 into main May 12, 2026
76 checks passed

gh-worker-dd-devflow-36fce6 Bot added mergequeue-status: done and removed mergequeue-status: in_progress labels May 12, 2026

		"[ -f /etc/millstone/baseline.toml ] && \
		[ -f /etc/millstone/comparison.toml ]"

Conversation

thieman commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Human Summary

Summary

Key changes

Test plan

Uh oh!

pr-commenter Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Binary Size Analysis (Agent Data Plane)

Changes by Module

Detailed Symbol Changes

Uh oh!

pr-commenter Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Regression Detector (Agent Data Plane)

Regression Detector Results

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Fine details of change detection per experiment

Bounds Checks: ✅ Passed

Explanation

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

thieman May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

thieman commented May 11, 2026 •

edited

Loading

pr-commenter Bot commented May 11, 2026 •

edited

Loading

pr-commenter Bot commented May 11, 2026 •

edited

Loading

thieman May 12, 2026 •

edited

Loading