ci(c-g step 5): flip benchmark job to a 3-OS matrix#88
Merged
Conversation
Closes the matrix-flip half of Plan C-g. The benchmark job was Ubuntu-only because hyperfine had to be installed manually via a DEB and there was no Windows path. After the schema work in #86 made `bench/history.yaml` multi-arch, the only thing standing between us and per-OS regression checks on PR was the toolchain provisioning gap on Windows. `scripts/windows/install-tools.ps1`: - new `-OnlyTool hyperfine` arm; pinned via versions.lock HYPERFINE_VERSION. The release zip extracts to a single version-stamped subdir holding `hyperfine.exe`, so Resolve-SingleSubdir flattens it and the executable lands directly in the install dir (same layout as zig / wasm-tools / wasmtime). - realworldKeys gains `hyperfine = HYPERFINE_VERSION` so a missing pin fails loudly when the Windows installer is asked for it. - ValidateSet + Update-UserPath + final Verify banner all gain a hyperfine entry. `.github/versions.lock`: - HYPERFINE_VERSION 1.18.0 → 1.20.0 to match the nixpkgs version on aarch64-darwin / x86_64-linux (the existing nix devshell already shipped 1.20.0; the Linux DEB step in the previous benchmark job was the only consumer of the older 1.18.0 pin). `.github/workflows/ci.yml`: - benchmark job becomes `os: [ubuntu-latest, macos-latest, windows-latest]`. Linux/macOS provision via nix devshell (same pattern as test-nix); Windows uses `install-tools.ps1 -OnlyTool zig` + `-OnlyTool hyperfine`, skipping Go / TinyGo / Rust / WASI SDK that the realworld test job pulls but the benchmarks don't need. - Existing Linux DEB + setup-zig install steps deleted; the intra-runner regression check (`ci_compare.sh --base=origin/main --threshold=20 --runs=3 --warmup=1 --skip-build`) and the push-to-main `--record-only` step are unchanged in spirit, just wrapped in a per-runner `if RUNNER_OS = Windows` selector that picks `nix develop --command` vs plain bash. - `needs: [test-nix, test]` so the bench fan-out only runs after all three platform test jobs have already gated the PR. Cross-runner comparison is still meaningless (the hardware deltas dwarf any codegen-level signal), and the docstring above the job makes that explicit. The durable per-arch absolute-time baselines remain in `bench/history.yaml` (recorded locally per CLAUDE.md Merge Gate item 10).
chaploud
added a commit
that referenced
this pull request
Apr 29, 2026
chaploud
added a commit
that referenced
this pull request
Apr 29, 2026
Sibling row to the just-recorded aarch64-darwin baseline at the same SHA, taken on OrbStack `my-ubuntu-amd64` (Rosetta). Same caveat as before — Rosetta-translated, not a native x86_64 absolute-time reference; useful for schema validation and intra-runner trend tracking, not for cross-platform absolute comparisons. Native x86_64-linux baselines remain a follow-up: a dedicated workflow_dispatch that runs record-merge-bench.sh on a GitHub-hosted ubuntu-latest runner is the cleanest path now that the matrix-flip in #88 made hyperfine + zig available there identically to the bench job.
3 tasks
chaploud
added a commit
that referenced
this pull request
Apr 30, 2026
…ndows The benchmark job's main-push "Benchmark record" step has been failing on windows-latest since #88 (3-OS matrix flip). Root cause: bench/ci_compare.sh constructs PROJECT_DIR via `pwd` and TMPDIR_CI via `mktemp -d`, both of which return MSYS POSIX form (/c/Users/..., /tmp/...) on Git Bash. The direct verify step (`if ! $cmd >/dev/null 2>&1`) succeeds because bash itself execs zwasm.exe and MSYS translates POSIX paths in argv. But hyperfine on Windows spawns benchmarked commands via `cmd /C`, which does not apply MSYS path translation, so: - zwasm.exe receives the wasm path as `/c/Users/...` and cannot open it (`exit 1 in the first warmup run`). - hyperfine's --export-json target writes to a different real path than where python3 later looks (FileNotFoundError on the json file). Fix: when `cygpath` is available, convert PROJECT_DIR and TMPDIR_CI to mixed form (C:/Users/...) so both the wasm path arguments and the JSON export target resolve to the same on-disk location whether invoked from bash or via cmd /C. Mac/Linux skip the conversion (no cygpath). Verified on windowsmini Git Bash: all 12 benches succeed with mean times populated, RECORD ONLY exit 0.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
bench/history.yamlmulti-arch, the only thing standing between us and per-OS regression checks on PR was the toolchain provisioning gap on Windows.Changes
scripts/windows/install-tools.ps1-OnlyTool hyperfinearm pinned viaversions.lockHYPERFINE_VERSION. Release zip flattens throughResolve-SingleSubdirsohyperfine.exelands at the install root.realworldKeysgainshyperfine = HYPERFINE_VERSION(loud failure if the pin is missing).Update-UserPath, and the final Verify banner pick up hyperfine..github/versions.lockHYPERFINE_VERSION1.18.0 → 1.20.0 to match the nixpkgs version already in the nix devshell on aarch64-darwin / x86_64-linux..github/workflows/ci.ymlbenchmarkjob becomesos: [ubuntu-latest, macos-latest, windows-latest]. Linux/macOS provision via nix devshell (same pattern astest-nix); Windows usesinstall-tools.ps1 -OnlyTool zig+-OnlyTool hyperfine, skipping the realworld toolchain it doesn't need.setup-zigsteps are gone. The intra-runner regression check and the push-to-main--record-onlystep are unchanged in spirit; they just picknix develop --commandvs plain bash based on$RUNNER_OS.needs: [test-nix, test]so the bench matrix fans out only after all three platform test jobs have gated the PR.Cross-runner comparison
Still meaningless. The job docstring spells that out: each runner measures fresh base + PR builds on the same host, so the comparison is intra-runner only. The durable per-arch absolute-time baselines remain in
bench/history.yaml.Test plan
--record-onlystep on Mac/Windows succeeds (no .deb-flavored Linux assumption left in the path)history.yamlcan be supplemented with a native-Linux row in a follow-up