ci(c-g step 5): flip benchmark job to a 3-OS matrix #88

Merged
chaploud merged 1 commit into main from develop/cg-step5-3os-bench-matrix on Apr 29, 2026

@chaploud
Contributor

Summary

  • Closes the matrix-flip half of Plan C-g. The benchmark job was Ubuntu-only because hyperfine had to be installed manually from a DEB and there was no Windows path. After #86 (bench(c-g): make history.yaml multi-arch) made bench/history.yaml multi-arch, the only thing standing between us and per-OS regression checks on PRs was the toolchain provisioning gap on Windows.

Changes

scripts/windows/install-tools.ps1

  • New -OnlyTool hyperfine arm, pinned via HYPERFINE_VERSION in versions.lock. The release zip is flattened through Resolve-SingleSubdir so hyperfine.exe lands at the install root.
  • realworldKeys gains hyperfine = HYPERFINE_VERSION (loud failure if the pin is missing).
  • ValidateSet, Update-UserPath, and the final Verify banner pick up hyperfine.

.github/versions.lock

  • HYPERFINE_VERSION 1.18.0 → 1.20.0 to match the nixpkgs version already in the nix devshell on aarch64-darwin / x86_64-linux.

.github/workflows/ci.yml

  • benchmark job becomes os: [ubuntu-latest, macos-latest, windows-latest]. Linux/macOS provision via nix devshell (same pattern as test-nix); Windows uses install-tools.ps1 -OnlyTool zig + -OnlyTool hyperfine, skipping the realworld toolchain it doesn't need.
  • The previous Linux-only DEB + setup-zig steps are gone. The intra-runner regression check and the push-to-main --record-only step are unchanged in spirit; they just pick nix develop --command vs plain bash based on $RUNNER_OS.
  • needs: [test-nix, test] so the bench matrix fans out only after all three platform test jobs have gated the PR.
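
The per-runner selection described above could look roughly like the sketch below. It is only an illustration, not the workflow's actual step: `bench_prefix` is a hypothetical helper name, and the only things taken from the PR are the `$RUNNER_OS` check and the two alternatives (`nix develop --command` on Linux/macOS, plain bash on Windows).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): print the command prefix for this runner.
# RUNNER_OS is set by GitHub Actions to Linux, macOS, or Windows.
bench_prefix() {
  if [ "${RUNNER_OS:-}" = "Windows" ]; then
    # Windows: zig and hyperfine were installed by install-tools.ps1,
    # so plain bash is enough.
    echo "bash"
  else
    # Linux/macOS: run inside the nix devshell so zig and hyperfine are on PATH.
    echo "nix develop --command bash"
  fi
}
```

A step could then run `$(bench_prefix) bench/ci_compare.sh ...`, leaving the substitution unquoted on purpose so the prefix splits into separate words.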

Cross-runner comparison

Still meaningless: hardware differences between runners dwarf any codegen-level signal. The job docstring spells that out: each runner builds and measures fresh base and PR binaries on the same host, so the comparison is intra-runner only. The durable per-arch absolute-time baselines remain in bench/history.yaml.

Test plan

  • CI matrix passes on all three runners
  • First push-to-main --record-only step on Mac/Windows succeeds (no .deb-flavored Linux assumption left in the path)
  • OrbStack/Rosetta x86_64-linux row in history.yaml can be supplemented with a native-Linux row in a follow-up

Closes the matrix-flip half of Plan C-g. The benchmark job was
Ubuntu-only because hyperfine had to be installed manually via a
DEB and there was no Windows path. After the schema work in #86
made `bench/history.yaml` multi-arch, the only thing standing
between us and per-OS regression checks on PR was the toolchain
provisioning gap on Windows.

`scripts/windows/install-tools.ps1`:
- new `-OnlyTool hyperfine` arm; pinned via versions.lock
  HYPERFINE_VERSION. The release zip extracts to a single
  version-stamped subdir holding `hyperfine.exe`, so
  Resolve-SingleSubdir flattens it and the executable lands
  directly in the install dir (same layout as zig / wasm-tools /
  wasmtime).
- realworldKeys gains `hyperfine = HYPERFINE_VERSION` so a missing
  pin fails loudly when the Windows installer is asked for it.
- ValidateSet + Update-UserPath + final Verify banner all gain a
  hyperfine entry.

`.github/versions.lock`:
- HYPERFINE_VERSION 1.18.0 → 1.20.0 to match the nixpkgs version
  on aarch64-darwin / x86_64-linux (the existing nix devshell
  already shipped 1.20.0; the Linux DEB step in the previous
  benchmark job was the only consumer of the older 1.18.0 pin).

`.github/workflows/ci.yml`:
- benchmark job becomes `os: [ubuntu-latest, macos-latest,
  windows-latest]`. Linux/macOS provision via nix devshell (same
  pattern as test-nix); Windows uses
  `install-tools.ps1 -OnlyTool zig` + `-OnlyTool hyperfine`,
  skipping Go / TinyGo / Rust / WASI SDK that the realworld test
  job pulls but the benchmarks don't need.
- Existing Linux DEB + setup-zig install steps deleted; the
  intra-runner regression check (`ci_compare.sh --base=origin/main
  --threshold=20 --runs=3 --warmup=1 --skip-build`) and the
  push-to-main `--record-only` step are unchanged in spirit, just
  wrapped in a per-runner `if RUNNER_OS = Windows` selector that
  picks plain bash on Windows and `nix develop --command` on
  Linux/macOS.
- `needs: [test-nix, test]` so the bench fan-out only runs after
  all three platform test jobs have already gated the PR.
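
For readers unfamiliar with the regression check, here is a minimal sketch of what a threshold comparison might look like. It assumes `--threshold=20` means "fail when the PR build is more than 20 percent slower than the base build on the same runner"; that reading, the helper name `is_regression`, and the exit codes are assumptions for illustration, not the actual logic of `ci_compare.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when pr_mean is more than
# threshold percent slower than base_mean. All three are plain numbers.
is_regression() {
  local base_mean="$1" pr_mean="$2" threshold="$3"
  awk -v base="$base_mean" -v pr="$pr_mean" -v thr="$threshold" 'BEGIN {
    if (base <= 0) exit 2                 # guard against division by zero
    slowdown = (pr - base) / base * 100   # percent change relative to base
    exit !(slowdown > thr)                # 0 = regression, 1 = within budget
  }'
}
```

Because both means come from the same host in the same job, the percentage is meaningful even though absolute times differ widely between runners.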

Cross-runner comparison is still meaningless (the hardware deltas
dwarf any codegen-level signal), and the docstring above the job
makes that explicit. The durable per-arch absolute-time baselines
remain in `bench/history.yaml` (recorded locally per CLAUDE.md
Merge Gate item 10).
chaploud merged commit 7590b4a into main on Apr 29, 2026
10 checks passed
chaploud deleted the develop/cg-step5-3os-bench-matrix branch on April 29, 2026 12:38
chaploud added a commit that referenced this pull request Apr 29, 2026
Sibling row to the just-recorded aarch64-darwin baseline at
the same SHA, taken on OrbStack `my-ubuntu-amd64` (Rosetta).
Same caveat as before — Rosetta-translated, not a native x86_64
absolute-time reference; useful for schema validation and
intra-runner trend tracking, not for cross-platform absolute
comparisons.

Native x86_64-linux baselines remain a follow-up: a dedicated
workflow_dispatch that runs record-merge-bench.sh on a
GitHub-hosted ubuntu-latest runner is the cleanest path now that
the matrix-flip in #88 made hyperfine + zig available there
identically to the bench job.
chaploud added a commit that referenced this pull request Apr 30, 2026
…ndows

The benchmark job's main-push "Benchmark record" step has been failing on
windows-latest since #88 (3-OS matrix flip). Root cause: bench/ci_compare.sh
constructs PROJECT_DIR via `pwd` and TMPDIR_CI via `mktemp -d`, both of
which return MSYS POSIX form (/c/Users/..., /tmp/...) on Git Bash. The
direct verify step (`if ! $cmd >/dev/null 2>&1`) succeeds because bash
itself execs zwasm.exe and MSYS translates POSIX paths in argv. But
hyperfine on Windows spawns benchmarked commands via `cmd /C`, which does
not apply MSYS path translation, so:

  - zwasm.exe receives the wasm path as `/c/Users/...` and cannot open it
    (`exit 1 in the first warmup run`).
  - hyperfine's --export-json target writes to a different real path than
    where python3 later looks (FileNotFoundError on the json file).

Fix: when `cygpath` is available, convert PROJECT_DIR and TMPDIR_CI to
mixed form (C:/Users/...) so both the wasm path arguments and the JSON
export target resolve to the same on-disk location whether invoked from
bash or via cmd /C. Mac/Linux skip the conversion (no cygpath).

Verified on windowsmini Git Bash: all 12 benches succeed with mean times
populated, RECORD ONLY exit 0.
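
The fix described in this commit message can be sketched as follows. This is an approximation rather than the actual diff: `to_native_path` is a hypothetical helper name, while `PROJECT_DIR`, `TMPDIR_CI`, `pwd`, `mktemp -d`, and the `cygpath` availability check come from the description above. `cygpath -m` prints the mixed form (drive letter plus forward slashes).

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a path to mixed form (C:/Users/...) when
# cygpath exists (Git Bash / MSYS); otherwise return it unchanged.
to_native_path() {
  if command -v cygpath >/dev/null 2>&1; then
    cygpath -m "$1"
  else
    printf '%s\n' "$1"
  fi
}

# Both values end up in arguments that hyperfine hands to `cmd /C` on Windows,
# so they must not stay in MSYS POSIX form (/c/Users/..., /tmp/...).
PROJECT_DIR="$(to_native_path "$(pwd)")"
TMPDIR_CI="$(to_native_path "$(mktemp -d)")"
```

On macOS and Linux the helper is a no-op, so the same script runs unchanged on all three runners.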