perf(test-runner): parallelize YAML scenario execution by avihut · Pull Request #515 · avihut/daft

avihut · 2026-05-17T04:32:05Z

Summary

- Adds --parallel and --jobs N (plus DAFT_MANUAL_TEST_JOBS) to the xtask manual-test runner, executing scenarios concurrently via rayon with per-scenario buffered output flushed in input order. Default stays serial; a follow-up PR will flip the default after a week of opt-in use.
- Fixes orphan-spawn accumulation: the test env now sets DAFT_NO_LOG_CLEAN=1, suppressing detached daft __clean-logs children that would otherwise outlive xtask scenarios, get reparented to init, and steal CPU. This was effectively a precondition for parallel scaling to work at all on local-dev runs.
- Adds a benchmark harness for tracking progress on #509 (perf(test-runner): speed up the YAML scenario suite): mise run bench:tests:manual:scale does a hyperfine parameter-scan over --jobs values plus an opt-in per-scenario timing pass for p50/p95/max distributions. Methodology is in benches/README.md; the first reference baseline is checked in at benches/baselines/test-manual-scale-2026-05-17.md.
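
The "buffered output flushed in input order" behavior can be sketched as below. This is a minimal illustration under stated assumptions, not the runner's code: it uses `std::thread::scope` in place of rayon so the sketch has no dependencies, and `Scenario`, `ScenarioResult`, `run_scenario`, and `run_all` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical scenario description; the real runner loads these from YAML.
pub struct Scenario {
    pub name: String,
    pub should_pass: bool,
}

/// Result of one scenario, with its output captured instead of printed.
pub struct ScenarioResult {
    pub index: usize,
    pub name: String,
    pub passed: bool,
    pub output: String,
}

/// Stand-in for real scenario execution: writes to a private buffer only.
fn run_scenario(index: usize, s: &Scenario) -> ScenarioResult {
    let status = if s.should_pass { "ok" } else { "FAILED" };
    ScenarioResult {
        index,
        name: s.name.clone(),
        passed: s.should_pass,
        output: format!("[{}] {}\n", s.name, status),
    }
}

/// Runs scenarios on `jobs` worker threads, then restores input order
/// before folding, so the report does not depend on completion order.
pub fn run_all(scenarios: &[Scenario], jobs: usize) -> (String, Vec<String>) {
    let next = AtomicUsize::new(0);
    let results = Mutex::new(Vec::with_capacity(scenarios.len()));

    thread::scope(|scope| {
        for _ in 0..jobs.max(1) {
            scope.spawn(|| loop {
                // Claim the next unprocessed scenario index.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= scenarios.len() {
                    break;
                }
                let r = run_scenario(i, &scenarios[i]);
                results.lock().unwrap().push(r);
            });
        }
    });

    let mut results = results.into_inner().unwrap();
    results.sort_by_key(|r| r.index); // input-order sort

    // Serial fold: concatenate buffers and collect failures in input order.
    let mut report = String::new();
    let mut failed = Vec::new();
    for r in results {
        report.push_str(&r.output);
        if !r.passed {
            failed.push(r.name);
        }
    }
    (report, failed)
}
```

Calling `run_all(&scenarios, 1)` and `run_all(&scenarios, 4)` on the same input should yield identical reports and failure lists, which is the property the test plan checks for the real runner.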

Speedup (M1 Max 10-core, 572 scenarios, full corpus)

| Mode | Wall-clock | Speedup vs serial | Speedup vs pre-PR main |
| --- | --- | --- | --- |
| pre-PR main (serial, no fix) | ~1586 s | n/a | 1.00× |
| this PR, `--jobs 1` | 392.10 s ± 1.51 s | 1.00× | 4.05× |
| this PR, `--parallel` (cap=5) | 100.00 s ± 3.45 s | 3.92× | 15.9× |
| this PR, `--jobs 10` | 82.19 s | 4.78× | 19.3× |

End-to-end: 26 min → 1.7 min on a 10-core box.

Scaling is near-linear up to num_cpus/2: jobs=2 (1.81× / 90%), jobs=4 (3.45× / 86%), jobs=5 (3.92× / 78%). Past num_cpus/2, returns diminish and scenario flakes appear — available_parallelism()/2 is well-calibrated as the default cap. Zero new failures at the default cap across all measured runs.
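
The efficiency percentages quoted above are speedup divided by worker count. A small sketch of that arithmetic follows; the function names are hypothetical and the inputs are the wall-clock figures reported in this PR.

```rust
/// Speedup of a parallel run relative to the serial reference.
pub fn speedup(serial_secs: f64, parallel_secs: f64) -> f64 {
    serial_secs / parallel_secs
}

/// Parallel efficiency as a percentage: speedup divided by worker count.
pub fn efficiency_pct(serial_secs: f64, parallel_secs: f64, jobs: u32) -> f64 {
    100.0 * speedup(serial_secs, parallel_secs) / f64::from(jobs)
}
```

For example, `efficiency_pct(392.10, 113.78, 4)` evaluates to roughly 86, matching the jobs=4 figure.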

Per-scenario distribution at the default cap stays clean: p50 567 ms, p95 1293 ms — only ~14% per-scenario slowdown from CPU contention, paid back many times over in wall-clock.

Rollout

The runner ships with serial as the default. --parallel is opt-in; --jobs N and DAFT_MANUAL_TEST_JOBS are the explicit overrides. The acceptance criteria's "parallel by default" flips in a follow-up PR after a week of opt-in use surfaces no new flakes. Interactive mode and --setup-only hard-error when --jobs > 1 — their semantics (TTY ownership, println! of work_dir for shell capture) don't survive the buffered worker model.
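
One way the job count could be resolved is sketched below. The `Options` struct and function names are hypothetical, and two details are assumptions the PR does not state: the precedence (`--jobs`, then `DAFT_MANUAL_TEST_JOBS`, then `--parallel`, then serial) and the rejection of a job count of zero.

```rust
use std::thread;

/// Hypothetical view of the relevant CLI options.
#[derive(Clone, Copy)]
pub struct Options {
    pub jobs_flag: Option<usize>, // --jobs N
    pub parallel: bool,           // --parallel
    pub interactive: bool,
    pub setup_only: bool,
}

/// Default cap for `--parallel`: half the available cores, at least 1.
pub fn default_cap() -> usize {
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    (cores / 2).max(1)
}

/// Resolve the worker count. `env_jobs` is the raw value of
/// DAFT_MANUAL_TEST_JOBS, if set. The precedence here is an assumption.
pub fn resolve_jobs(opts: &Options, env_jobs: Option<&str>) -> Result<usize, String> {
    let jobs = if let Some(n) = opts.jobs_flag {
        n
    } else if let Some(raw) = env_jobs {
        raw.trim()
            .parse::<usize>()
            .map_err(|e| format!("invalid DAFT_MANUAL_TEST_JOBS value {raw:?}: {e}"))?
    } else if opts.parallel {
        default_cap()
    } else {
        1 // serial default
    };

    if jobs == 0 {
        // Assumed behavior: zero workers is treated as a usage error.
        return Err("job count must be at least 1".to_string());
    }
    if jobs > 1 && (opts.interactive || opts.setup_only) {
        // Hard error: these modes need the TTY or print work_dir directly.
        return Err("interactive mode and --setup-only require a single job".to_string());
    }
    Ok(jobs)
}
```

With no flags and no environment variable the result is `Ok(1)`, which matches the serial default this PR ships.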

CI

CI (test.yml:487) still invokes the runner serially. A follow-up commit on this PR will enable --parallel there to measure the impact on CI wall-clock.

Test plan

- CI checks pass (build, unit, integration matrix)
- mise run test:unit locally (1721 + 65 xtask passing)
- mise run clippy: zero warnings
- Manual target/release/xtask manual-test --ci --parallel run produces deterministic, input-ordered output matching serial
- SIGINT mid-run cleans up sandbox dirs (no leaks)
- Mixed pass/fail scenario set produces a stable failed_scenarios list in input order under both serial and parallel
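
The SIGINT cleanup item relies on a registry of live sandbox directories, which the commit notes describe as a `HashSet<PathBuf>` with a `CleanupGuard` RAII type. The sketch below shows one possible shape for that pattern; everything beyond those two names (`registry`, `drain_all`, the retry count parameter) is hypothetical, and signal-handler wiring is left out.

```rust
use std::collections::HashSet;
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::{Mutex, OnceLock};

/// Process-wide set of sandbox directories that still need removal.
fn registry() -> &'static Mutex<HashSet<PathBuf>> {
    static REGISTRY: OnceLock<Mutex<HashSet<PathBuf>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashSet::new()))
}

/// RAII guard: registers a sandbox on creation, removes it on drop.
pub struct CleanupGuard {
    path: PathBuf,
}

impl CleanupGuard {
    pub fn new(path: PathBuf) -> Self {
        registry().lock().unwrap().insert(path.clone());
        CleanupGuard { path }
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for CleanupGuard {
    fn drop(&mut self) {
        // Unregister first so an interrupt drain does not race on this path.
        if let Ok(mut set) = registry().lock() {
            set.remove(&self.path);
        }
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// What an interrupt path could call: drain the registry while holding the
/// lock, retrying removal a bounded number of times in case a subprocess
/// recreates files. Returns how many directories are gone afterwards.
pub fn drain_all(max_attempts: usize) -> usize {
    let mut set = registry().lock().unwrap();
    let mut cleared = 0;
    for path in set.drain() {
        for _ in 0..max_attempts {
            let _ = fs::remove_dir_all(&path);
            if !path.exists() {
                cleared += 1;
                break;
            }
        }
    }
    cleared
}
```

Holding the lock for the whole drain keeps workers from registering new sandboxes halfway through cleanup, at the cost of blocking them until the drain finishes.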

Refs #510, part of #509.

🤖 Generated with Claude Code

Adds `--parallel` and `--jobs N` (plus `DAFT_MANUAL_TEST_JOBS`) to the xtask manual-test runner, executing scenarios concurrently via rayon with per-scenario buffered output flushed in input order. Default stays serial; a follow-up PR will flip the default after a week of opt-in use.

- Sandbox path: nanos+pid+atomic counter to prevent collisions under parallel scheduling.
- Cleanup handler: registry promoted from `Option<PathBuf>` to a `HashSet<PathBuf>` with `CleanupGuard` RAII. SIGINT drains under the held lock + bounded re-rm loop to fight subprocess-recreation races.
- Aggregator: parallel `par_iter` → `collect` → input-order sort → serial fold, so stats and `failed_scenarios` ordering are stable regardless of completion order.
- Interactive / `--setup-only`: hard-error when `jobs > 1`, since TTY ownership and the `println!` of `work_dir` for shell capture don't fit the buffered worker model.

Wall-clock on the `tests/manual/scenarios/clone/` batch: 22.3s → 5.75s at `--jobs 8` (~3.9× speedup).

Refs #510, part of #509.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
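
The "nanos+pid+atomic counter" sandbox naming can be illustrated as follows. This is a sketch under assumptions: the `daft-sandbox-` prefix, the field order, and the function name are made up for the example and are not taken from the runner.

```rust
use std::path::{Path, PathBuf};
use std::process;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Monotonic per-process counter; breaks ties when two threads read the
/// same timestamp.
static SANDBOX_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Build a sandbox path that is unique across threads (counter), across
/// concurrently running processes (pid), and across runs (nanoseconds).
pub fn unique_sandbox_path(base: &Path) -> PathBuf {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let seq = SANDBOX_COUNTER.fetch_add(1, Ordering::Relaxed);
    // Hypothetical naming scheme: prefix-nanos-pid-seq.
    base.join(format!("daft-sandbox-{nanos}-{}-{seq}", process::id()))
}
```

The counter alone already guarantees uniqueness within one process; the timestamp and pid cover separate processes that share the same base directory.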

`daft` spawns a detached `daft __clean-logs` background child via setsid+spawn from `maybe_clean_logs()`. When the manual-test runner invokes daft hundreds of times back-to-back, especially under `--jobs > 1`, these children outlive their parent xtask scenarios, get reparented to init, and accumulate as orphans that steal CPU. Observed load average above 500 on a 10-core box during a single full-corpus run, with each scenario then taking ~10× longer than its fair share.

Effect on a 69-scenario hooks subset (M1 Max):

- Serial: 366s → 184s (1.99× faster)
- jobs=5: 197s → 16s (12.5× faster)

`DAFT_NO_UPDATE_CHECK` and `DAFT_NO_TRUST_PRUNE` already gate the other two spawn-self startup tasks; adding `DAFT_NO_LOG_CLEAN` follows the same pattern. The env var is read by `log_clean::is_disabled` in production code, so no daft-side changes are needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
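
A test harness could apply these switches when it builds each daft invocation, roughly as sketched here. The helper name and constant are hypothetical. The PR states only that `DAFT_NO_LOG_CLEAN=1` is set, so the value `1` for the other two variables is an assumption.

```rust
use std::process::Command;

/// Env vars that gate daft's spawn-self startup tasks, as named in this PR.
/// The value "1" for the first two entries is an assumption.
pub const QUIET_ENV: [(&str, &str); 3] = [
    ("DAFT_NO_UPDATE_CHECK", "1"),
    ("DAFT_NO_TRUST_PRUNE", "1"),
    ("DAFT_NO_LOG_CLEAN", "1"),
];

/// Build (but do not run) a command with the background tasks disabled,
/// so no detached child outlives the scenario that spawned it.
pub fn daft_command(binary: &str, args: &[&str]) -> Command {
    let mut cmd = Command::new(binary);
    cmd.args(args).envs(QUIET_ENV);
    cmd
}
```

Setting the variables on the `Command` rather than on the xtask process keeps the suppression scoped to test invocations.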

Adds a benchmark harness so #509 progress is measurable SHA-over-SHA. Companion to `mise run bench` (daft-vs-git command runtime) but at a different layer: this measures the YAML manual-test runner's throughput, not individual daft command latency.

- `mise run bench:tests:manual:scale`: hyperfine parameter-scan over --jobs values; writes per-trial wall-clock to `benches/results/test-manual-scale.{md,json}`.
- `mise run bench:tests:manual:scale-{baseline,compare}`: pin/diff workflow matching the existing `bench:{baseline,compare}` pair.
- `benches/scenarios/test_manual_scale.sh`: driver. Sweeps BENCH_JOBS (default 1,2,4,8), 3 trials per value, then a Phase 2 per-scenario timing pass at jobs=1 and at `--parallel`'s default cap for p50/p95/max distribution.
- Opt-in per-scenario timing via `DAFT_MANUAL_TEST_EMIT_TIMING=1`: the runner emits one grep-friendly `[bench] scenario="X" elapsed_ms=N` line per scenario, scoped to the runner half (excludes sandbox setup/teardown overhead so cumulative ≠ wall-clock).
- `benches/README.md` documents both bench families (daft-vs-git and test-runner) and the local vs. checked-in baseline split.
- `benches/baselines/test-manual-scale-2026-05-17.md`: first reference baseline (M1 Max, 10/10 cores). Shows 3.92× speedup at the default cap, 4.78× at full saturation, and scenario failures appearing past num_cpus/2, which is concrete evidence that `available_parallelism()/2` is well-chosen.

Phase 1 results from that baseline (full 572-scenario corpus):

    --jobs 1    392.10 s ± 1.51 s   (1.00×, serial reference)
    --jobs 2    217.22 s ± 0.94 s   (1.81×, 90% efficiency)
    --jobs 4    113.78 s ± 0.46 s   (3.45×, 86%)
    --jobs 5    100.00 s ± 3.45 s   (3.92×, 78%, `--parallel` default)
    --jobs 8    ~144 s (flaky)      (2.71×, 34%)
    --jobs 10   82.19 s             (4.78×, 48%)

End-to-end vs pre-#510 main on the same machine: 1586 s → 100 s, a 15.9× wall-clock reduction.

The 4.05× gain at `--jobs 1` comes from the orphan-spawn suppression fix in #510's prior commit; the remaining 3.92× is the parallelism this PR introduces (4.05 × 3.92 ≈ 15.9).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
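
The `[bench] scenario="X" elapsed_ms=N` lines lend themselves to simple post-processing. The sketch below parses them and computes p50/p95/max; the function names are hypothetical, and the nearest-rank percentile definition is an assumption, since the harness's exact method is not stated here.

```rust
/// Parse one `[bench] scenario="X" elapsed_ms=N` line into (name, ms).
/// Returns None for any line that does not match the format.
pub fn parse_timing_line(line: &str) -> Option<(String, u64)> {
    let rest = line.trim().strip_prefix("[bench] scenario=\"")?;
    // Split from the right so quotes inside the scenario name are tolerated.
    let (name, tail) = rest.rsplit_once("\" elapsed_ms=")?;
    let ms = tail.trim().parse().ok()?;
    Some((name.to_string(), ms))
}

/// Nearest-rank percentile over an ascending-sorted slice, `p` in (0, 100].
pub fn percentile(sorted: &[u64], p: f64) -> Option<u64> {
    if sorted.is_empty() {
        return None;
    }
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}

/// Summarize runner output as (p50, p95, max) in milliseconds,
/// ignoring every line that is not a timing line.
pub fn summarize(log: &str) -> Option<(u64, u64, u64)> {
    let mut ms: Vec<u64> = log
        .lines()
        .filter_map(parse_timing_line)
        .map(|(_, ms)| ms)
        .collect();
    ms.sort_unstable();
    Some((percentile(&ms, 50.0)?, percentile(&ms, 95.0)?, *ms.last()?))
}
```

Because the timing lines exclude sandbox setup and teardown, the sum of the parsed values is expected to be lower than the measured wall-clock.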

Pairs with the parallelization landing in this PR: CI was still on the serial default, so the speedup wasn't visible in workflow runtime. `--jobs $(nproc)` saturates the runner since CI is dedicated to the job, so there is no need for `--parallel`'s `num_cpus/2` headroom, which exists for concurrent local work.

Baseline serial CI timing on this PR (ubuntu-latest):

- integration-tests (default, yaml): 2m 11s
- integration-tests (gitoxide, yaml): 2m 02s

Next run will show the delta.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

avihut added this to the Public Launch milestone May 17, 2026

avihut added the perf Performance improvement label May 17, 2026

avihut self-assigned this May 17, 2026

avihut and others added 4 commits May 17, 2026 08:11

avihut force-pushed the daft-510/perf/parallelize-yaml-test-runner branch from a156d81 to e6beb36 Compare May 17, 2026 05:12

avihut merged commit f6cc15e into master May 17, 2026
25 of 26 checks passed

avihut deleted the daft-510/perf/parallelize-yaml-test-runner branch May 17, 2026 20:11

wheatley-the-moronic-ci-bot Bot mentioned this pull request May 17, 2026

chore: release v1.14.0 #520

Merged

avihut mentioned this pull request May 22, 2026

release-flow.yml: PR body should include the changelog section in an expandable <details> like release-plz did #546

Closed
