chore(correctness): switch Docker runtime to single shared millstone container#1618
Conversation
Binary Size Analysis (Agent Data Plane)Target: 4aeb9c7 (baseline) vs 569b63b (comparison) diff
|
| Module | File Size | Symbols |
|---|---|---|
anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.3228274853015480322 |
-129 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.709450314824670458 |
+128 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.3228274853015480322 |
-114 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.709450314824670458 |
+113 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.3228274853015480322 |
-108 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.709450314824670458 |
+107 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.3228274853015480322 |
-96 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.709450314824670458 |
+95 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.3228274853015480322 |
-94 B | 1 |
anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.709450314824670458 |
+93 B | 1 |
[Unmapped] |
-3 B | 1 |
Detailed Symbol Changes
FILE SIZE VM SIZE
-------------- --------------
[NEW] +128 [NEW] +40 anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.709450314824670458
[NEW] +113 [NEW] +25 anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.709450314824670458
[NEW] +107 [NEW] +19 anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.709450314824670458
[NEW] +95 [NEW] +7 anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.709450314824670458
[NEW] +93 [NEW] +5 anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.709450314824670458
-0.0% -3 [ = ] 0 [Unmapped]
[DEL] -94 [DEL] -5 anon.4f8fd67d74ae1f1600187cfeb0121be9.2.llvm.3228274853015480322
[DEL] -96 [DEL] -7 anon.4f8fd67d74ae1f1600187cfeb0121be9.0.llvm.3228274853015480322
[DEL] -108 [DEL] -19 anon.4f8fd67d74ae1f1600187cfeb0121be9.3.llvm.3228274853015480322
[DEL] -114 [DEL] -25 anon.4f8fd67d74ae1f1600187cfeb0121be9.4.llvm.3228274853015480322
[DEL] -129 [DEL] -40 anon.4f8fd67d74ae1f1600187cfeb0121be9.1.llvm.3228274853015480322
-0.0% -8 [ = ] 0 TOTAL
Regression Detector (Agent Data Plane)Regression Detector ResultsRun ID: 7092381f-55b4-4944-8293-b374c8220798 Baseline: 4aeb9c7 Optimization Goals: ✅ No significant changes detected
|
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | +0.69 | [-3.81, +5.19] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | +0.01 | [-0.11, +0.14] | 1 | (metrics) (profiles) (logs) |
| ✅ | otlp_ingest_logs_5mb_memory | memory utilization | -7.51 | [-7.91, -7.12] | 1 | (metrics) (profiles) (logs) |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | dsd_uds_10mb_3k_contexts_cpu | % cpu utilization | +6.45 | [-25.54, +38.44] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_cpu | % cpu utilization | +2.74 | [+0.78, +4.70] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_cpu | % cpu utilization | +2.05 | [+0.02, +4.09] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_cpu | % cpu utilization | +1.21 | [-4.67, +7.08] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_cpu | % cpu utilization | +1.13 | [-5.16, +7.42] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | +0.69 | [-3.81, +5.19] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_throughput | ingress throughput | +0.50 | [+0.38, +0.61] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_medium | memory utilization | +0.45 | [+0.28, +0.62] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_memory | memory utilization | +0.42 | [+0.26, +0.57] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_memory | memory utilization | +0.37 | [+0.21, +0.53] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_idle | memory utilization | +0.20 | [+0.16, +0.24] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_memory | memory utilization | +0.17 | [+0.01, +0.33] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_memory | memory utilization | +0.12 | [-0.12, +0.37] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_heavy | memory utilization | +0.08 | [-0.05, +0.20] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_throughput | ingress throughput | +0.04 | [-0.04, +0.11] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_throughput | ingress throughput | +0.02 | [-0.17, +0.20] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | +0.01 | [-0.11, +0.14] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_throughput | ingress throughput | +0.00 | [-0.05, +0.06] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.06, +0.06] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.04, +0.03] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_throughput | ingress throughput | -0.01 | [-0.08, +0.07] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_throughput | ingress throughput | -0.01 | [-0.18, +0.16] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_low | memory utilization | -0.06 | [-0.23, +0.10] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_memory | memory utilization | -0.08 | [-0.22, +0.07] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_ultraheavy | memory utilization | -0.10 | [-0.23, +0.03] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_memory | memory utilization | -0.11 | [-0.27, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_memory | memory utilization | -0.25 | [-0.39, -0.10] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_memory | memory utilization | -0.27 | [-0.41, -0.13] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_throughput | ingress throughput | -0.35 | [-0.43, -0.28] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_cpu | % cpu utilization | -0.65 | [-2.06, +0.77] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_cpu | % cpu utilization | -1.00 | [-56.55, +54.54] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_cpu | % cpu utilization | -1.79 | [-3.95, +0.38] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_cpu | % cpu utilization | -2.16 | [-60.62, +56.30] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_memory | memory utilization | -2.37 | [-2.56, -2.18] | 1 | (metrics) (profiles) (logs) |
| ✅ | otlp_ingest_logs_5mb_memory | memory utilization | -7.51 | [-7.91, -7.12] | 1 | (metrics) (profiles) (logs) |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | quality_gates_rss_dsd_heavy | memory_usage | 10/10 | 127.33MiB ≤ 140MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_low | memory_usage | 10/10 | 39.95MiB ≤ 50MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_medium | memory_usage | 10/10 | 60.77MiB ≤ 75MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | 175.21MiB ≤ 200MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_idle | memory_usage | 10/10 | 27.15MiB ≤ 40MiB | (metrics) (profiles) (logs) |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
| /// | ||
| /// This gives the agents time to flush any remaining aggregated metrics after millstone stops | ||
| /// sending. The value is slightly longer than a full aggregation bucket width. | ||
| const FLUSH_WAIT: Duration = Duration::from_secs(32); |
There was a problem hiding this comment.
This is the same as before, just moved around and hoisted to a constant
| /// - Unix socket targets (`unixgram:///airlock/...`): `/airlock/` is replaced with | ||
| /// `/{group}-airlock/`, directing millstone at the group's volume mount. | ||
| /// - TCP/gRPC targets (`grpc://target:...`): `://target` is replaced with `://{group}`, | ||
| /// directing millstone at the group's network alias. |
There was a problem hiding this comment.
I'm adding correctness tests on additional supported transports in a subsequent PR which might need to expand on this substitution piece. This is obviously pretty janky / magical, might make more sense to support a $GROUP substitution or something in the millstone YAML. Will take as a follow-up.
…cker bind-mount quirk
…und macOS Docker Desktop VirtioFS
… avoid Docker Desktop VirtioFS path restrictions
…arget substitution
9e901ae to
06676ae
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 06676aebaa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "[ -f /etc/millstone/baseline.toml ] && \ | ||
| [ -f /etc/millstone/comparison.toml ]" |
There was a problem hiding this comment.
Wait for non-socket targets before launching millstone
When a Kubernetes correctness test uses a TCP/gRPC/UDP target, this branch starts millstone as soon as the two config files exist; the agent pods have only reached phase Running and build_agent_pod does not define a readiness probe or port check. Millstone's gRPC path attempts a single Channel::connect() during startup, so if the agent has not opened port 4317 yet the test can fail immediately (and UDP can silently drop early payloads) rather than waiting as the socket path does via -S checks.
Useful? React with 👍 / 👎.
…n kind millstone pod
| /// pod's startup script for TCP/gRPC/UDP targets. | ||
| fn extract_target_port(template: &str) -> Option<u16> { | ||
| let doc: serde_yaml::Value = serde_yaml::from_str(template).ok()?; | ||
| let target = doc.get("target")?.as_str()?; |
There was a problem hiding this comment.
This is a bit janky but importing the actual Millstone config schema would require giving it a lib.rs etc. Can certainly do that but this was already a lot more code than I was expecting and this should break pretty loudly if it ever drifts since we run correctness on every PR.
There was a problem hiding this comment.
I dream of:
find . -type f -name "*.yaml" -delete| /// containing ADP and the Agent working together. These are called 'baseline' and 'comparison', respectively. | ||
| /// In a correctness test, two isolated groups of containers are created. One containing the Agent | ||
| /// alone (baseline), and the other containing ADP and the Agent working together (comparison). A | ||
| /// single shared millstone container runs two parallel millstone processes — one targeting each |
There was a problem hiding this comment.
Does it have to be two processes? I don't know how, but could the same bytes be, hmmm, copied to two sockets?
There was a problem hiding this comment.
That's probably doable for UDP sockets with something like tee, not sure about gRPC. It's comparatively a lot of complexity imo compared to just running two processes, which lets us take advantage of the fact that Lading is deterministic
| @@ -1,5 +1,5 @@ | |||
| seed: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131] | |||
| target: "unixgram:///airlock/metrics.sock" | |||
| target: "unixgram:///$GROUP-airlock/metrics.sock" | |||
| /// pod's startup script for TCP/gRPC/UDP targets. | ||
| fn extract_target_port(template: &str) -> Option<u16> { | ||
| let doc: serde_yaml::Value = serde_yaml::from_str(template).ok()?; | ||
| let target = doc.get("target")?.as_str()?; |
There was a problem hiding this comment.
I dream of:
find . -type f -name "*.yaml" -delete…container (#1618) ## Human Summary Follows up on #1607, which switched the kind correctness tests to use a single shared millstone pod. This PR does the same for the Docker runtime for consistency, and while doing so fixes several pre-existing issues in the kind runtime as well. ## Summary **`$GROUP` placeholder in `millstone.yaml`:** - All `millstone.yaml` files now use `$GROUP` as a placeholder in the `target:` URL (e.g. `unixgram:///$GROUP-airlock/metrics.sock`, `grpc://$GROUP:4317/...`, `udp://$GROUP:8125`). At test runtime the shared millstone container uses `sed` to substitute `baseline` or `comparison` to produce two per-target configs, then runs both millstone processes in parallel. **Docker runtime (`runner.rs`, `airlock/driver.rs`):** - Removes millstone from the baseline and comparison `GroupRunner`s; each group now only contains `[datadog-intake, target]` - Adds a shared millstone `GroupRunner` with its own isolation group, connected to both agent networks - Each agent container gets a network alias (`baseline` or `comparison`) via `NetworkingConfig` at creation time so the shared millstone can address each agent unambiguously for both DSD (Unix socket) and TCP/gRPC/UDP tests - Splits `DataCollector` into `AgentGroupCollector` (intake port) and `MillstoneWaiter` (millstone handle) - Adds `DriverConfig::with_volume_mount()`, `DriverConfig::with_network_alias()`, `DriverConfig::with_network()`, and `DriverConfig::from_image()` (pub) to airlock - The shared millstone container bind-mounts the original `millstone.yaml` (same mechanism as before) and derives the two per-target configs at startup with `sed` - Fixes Docker volumes and networks leaking on test failure paths — `cleanup_groups` is now called on every exit path, not just the success path **Kind runtime (`k8s.rs`):** - Fixes the same TCP/gRPC/UDP target bug that existed in the kind runtime (was hardcoded to socket paths regardless of test type). Now uses `$GROUP` substitution: `baseline`/`comparison` for socket tests, agent pod cluster IPs for TCP/gRPC/UDP tests - The millstone pod startup wait condition is now conditional: only waits for socket files on DSD tests (`is_socket_target`) - For TCP/gRPC/UDP kind tests, agent pods are started first and their IPs fetched before the millstone pod is created; the millstone pod then includes `nc -z host port` readiness checks in its startup wait condition so millstone does not attempt a connection before the agent has bound its port - Port is extracted from the `$GROUP` placeholder URL via `serde_yaml` — parses the template as YAML and reads the `target` field **New test cases from main (rebased):** - `dsd-uds-stream` (Unix stream socket) and `dsd-udp` (UDP) updated to use `$GROUP` placeholder ## Key changes **Target rewriting (`$GROUP`):** Every `millstone.yaml` uses `$GROUP` in the target URL. At test runtime: - Docker: `sed 's/\$GROUP/baseline/g'` substitutes the network alias - Kind/DSD: `template.replace("$GROUP", "baseline")` substitutes the group name (for airlock volume path) - Kind/TCP: `template.replace("$GROUP", pod_ip)` substitutes the agent pod's cluster IP **Docker millstone container startup:** ```sh sed 's/\$GROUP/baseline/g' /etc/millstone/config.toml > /tmp/millstone-baseline.toml && sed 's/\$GROUP/comparison/g' /etc/millstone/config.toml > /tmp/millstone-comparison.toml && exec sh -c 'millstone /tmp/millstone-baseline.toml & P1=$!; millstone /tmp/millstone-comparison.toml & P2=$!; wait $P1; R1=$?; wait $P2; R2=$?; exit $((R1|R2))' ``` **Kind port-readiness (non-socket targets):** 1. Start agent pods (Phase 1a) 2. Fetch pod IPs; build `nc -z ip port` checks using port extracted via `serde_yaml` (Phase 1b) 3. Start millstone pod with those checks baked into its wait condition (Phase 1c) ## Test plan - [x] `dsd-plain` — PASS - [x] `dsd-added-tags` — PASS - [x] `dsd-events` — PASS - [x] `otlp-metrics` — PASS - [x] `otlp-traces` — PASS - [x] `otlp-traces-kind` (temporary test case, removed after validation) — PASS Co-authored-by: travis.thieman <travis.thieman@datadoghq.com> a0e22b5
Human Summary
Follows up on #1607, which switched the kind correctness tests to use a single shared millstone pod. This PR does the same for the Docker runtime for consistency, and while doing so fixes several pre-existing issues in the kind runtime as well.
Summary
$GROUPplaceholder inmillstone.yaml:millstone.yamlfiles now use$GROUPas a placeholder in thetarget:URL (e.g.unixgram:///$GROUP-airlock/metrics.sock,grpc://$GROUP:4317/...,udp://$GROUP:8125). At test runtime the shared millstone container usessedto substitutebaselineorcomparisonto produce two per-target configs, then runs both millstone processes in parallel.Docker runtime (
runner.rs,airlock/driver.rs):GroupRunners; each group now only contains[datadog-intake, target]GroupRunnerwith its own isolation group, connected to both agent networksbaselineorcomparison) viaNetworkingConfigat creation time so the shared millstone can address each agent unambiguously for both DSD (Unix socket) and TCP/gRPC/UDP testsDataCollectorintoAgentGroupCollector(intake port) andMillstoneWaiter(millstone handle)DriverConfig::with_volume_mount(),DriverConfig::with_network_alias(),DriverConfig::with_network(), andDriverConfig::from_image()(pub) to airlockmillstone.yaml(same mechanism as before) and derives the two per-target configs at startup withsedcleanup_groupsis now called on every exit path, not just the success pathKind runtime (
k8s.rs):$GROUPsubstitution:baseline/comparisonfor socket tests, agent pod cluster IPs for TCP/gRPC/UDP testsis_socket_target)nc -z host portreadiness checks in its startup wait condition so millstone does not attempt a connection before the agent has bound its port$GROUPplaceholder URL viaserde_yaml— parses the template as YAML and reads thetargetfieldNew test cases from main (rebased):
dsd-uds-stream(Unix stream socket) anddsd-udp(UDP) updated to use$GROUPplaceholderKey changes
Target rewriting (
$GROUP):Every
millstone.yamluses$GROUPin the target URL. At test runtime:sed 's/\$GROUP/baseline/g'substitutes the network aliastemplate.replace("$GROUP", "baseline")substitutes the group name (for airlock volume path)template.replace("$GROUP", pod_ip)substitutes the agent pod's cluster IPDocker millstone container startup:
Kind port-readiness (non-socket targets):
nc -z ip portchecks using port extracted viaserde_yaml(Phase 1b)Test plan
dsd-plain— PASSdsd-added-tags— PASSdsd-events— PASSotlp-metrics— PASSotlp-traces— PASSotlp-traces-kind(temporary test case, removed after validation) — PASS