refactor(benchmarks): harden benchmarks.sh error handling and cross-platform support by jqnatividad · Pull Request #3814 · dathere/qsv

jqnatividad · 2026-05-03T21:29:10Z

Summary

Hardens scripts/benchmarks.sh against silent failures and platform inconsistencies surfaced during a code review:

Bug fixes that were polluting archived results (hyperfine -i was masking these):
- split_chunks_index_j1 was missing the $data argument and reading from stdin instead of benchmarking the dataset.
- validate_dynenum_no_schema and validate_dynenum_no_schema_index were passing a schema despite their names, contradicting the parallel validate_no_schema benchmark.
- luau_filter_no_globals_no_colidx was a byte-for-byte duplicate of luau_filter_no_globals — removed.
- reset was removing benchmark_data.schema.json but the actual file is benchmark_data.csv.schema.json, so reset never actually cleared the schema.
Error handling: error paths now exit with status 1 (CI was treating missing-tool failures as success). curl calls use --fail and clean up partial downloads. dynenum_schema is annotated as a hand-curated fixture so future readers don't expect prep to regenerate it.
Cross-platform memory detection: mem_size now consistently reports total physical memory in bytes on macOS, Linux, and Windows (Linux was reporting available; Windows was reporting free). Windows branch prefers PowerShell Get-CimInstance since wmic is deprecated/removed on recent Windows 11 / Server 2025 builds, with wmic fallback for legacy systems.
Partition cleanup: the partition benchmark now writes to a cwd-relative partitioned/ directory (matching the split_* benchmarks) instead of /tmp/partitioned, and cleanup_files removes it. Previously runs piled up files between invocations and the path didn't exist on Windows.
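
The partition cleanup described above could look roughly like the sketch below. Only the directory name partitioned comes from this PR; the variable PARTITION_DIR and the function cleanup_partitioned are hypothetical stand-ins, not the actual contents of cleanup_files.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins; only the directory name "partitioned" comes from the PR.
PARTITION_DIR="partitioned"   # relative to the current working directory

cleanup_partitioned() {
  # "--" ends option parsing; -rf makes the call a no-op when the dir is absent.
  rm -rf -- "$PARTITION_DIR"
}
```

Because the path is relative, the same call works wherever the script is run from, including environments that have no /tmp.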

Test plan

- bash -n scripts/benchmarks.sh passes
- Smoke ran ./benchmarks.sh split_chunks_index_j1: 981 ms mean, dataset actually consumed
- Smoke ran ./benchmarks.sh validate_dynenum_no_schema: 559 / 567 ms, hyperfine echoes command without schema arg
- Smoke ran ./benchmarks.sh partition: 1.51 s mean, partitioned/ dir cleaned up after run
- Verify on a Linux box that free -b | awk '/Mem/ {print $2}' reports total memory as expected
- Verify on a Windows box (legacy and Win11) that the PowerShell branch reports total cores and total physical memory
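
The Linux check in the test plan can be rehearsed without a Linux box by running the same awk filter over captured output. In the sketch below, the sample text imitates the column layout of free -b, but every number in it is made up for illustration.

```shell
#!/usr/bin/env bash
# Made-up sample in the layout printed by `free -b`; the values are illustrative only.
sample_free_output() {
  cat <<'EOF'
              total        used        free      shared  buff/cache   available
Mem:    16000000000  4000000000  2000000000   100000000 10000000000 11500000000
Swap:    2000000000           0  2000000000
EOF
}

# Field 2 of the Mem line is total memory; field 7 is the "available" column.
total=$(sample_free_output | awk '/Mem/ {print $2}')
available=$(sample_free_output | awk '/Mem/ {print $7}')
echo "total=$total available=$available"
```

Comparing the two fields shows why picking the wrong column would make the recorded memory size vary with system load rather than with the hardware.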

🤖 Generated with Claude Code

Return non-zero on failures (use exit 1) for missing tools and setup flows. Add robust curl --fail handling and cleanup of failed downloads. Fix platform memory detection (Linux free -b column, Windows TotalPhysicalMemory) and add comment clarifying mem_size semantics. Remove/restore correct benchmark commands (remove duplicate luau run, add missing $data arg to split_chunks_index_j1). Rename/consistently reference the generated schema file (benchmark_data.csv.schema.json) and add a comment explaining the hand-curated dynenum schema fixture. Also clean up /tmp/partitioned in cleanup_files. [skip ci]
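
A minimal sketch of the two error-handling patterns named in this commit message. The helper names require_tool and download are hypothetical and are not confirmed to exist in benchmarks.sh; the curl flags shown are standard curl options.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not taken from benchmarks.sh.

# Report a missing tool on stderr and return non-zero so the caller can exit 1.
require_tool() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "ERROR: $1 not found in PATH" >&2
    return 1
  fi
}

# Download $1 to $2. With --fail, curl returns non-zero on HTTP errors;
# on any failure the partial file is removed.
download() {
  url="$1"
  dest="$2"
  if ! curl --fail --location --silent --show-error --output "$dest" "$url"; then
    rm -f -- "$dest"
    return 1
  fi
}

# Typical call sites (commented out so the sketch has no side effects):
#   require_tool hyperfine || exit 1
#   download "$some_url" benchmark_data.zip || exit 1   # file name is illustrative
```

Returning non-zero from the helper and exiting at the call site keeps the helpers reusable while still giving CI a failing status.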

Use PowerShell (Get-CimInstance) on Windows to obtain CPU cores and total physical memory, falling back to wmic for legacy systems. Strip CRs from PowerShell output to avoid Windows line endings. Also change cleanup and partition invocation to use a relative "partitioned" directory instead of hardcoded /tmp/partitioned so the script works correctly on Windows and other environments. [skip ci]
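
One way the platform dispatch described here might be written is sketched below. The function names total_mem_bytes and strip_cr, the choice of the Win32_ComputerSystem class, and the uname patterns for Windows shells are assumptions for illustration and may differ from what the script actually does.

```shell
#!/usr/bin/env bash
# Sketch only: function names and exact commands are assumptions,
# not copied from scripts/benchmarks.sh.

# PowerShell and wmic emit CRLF line endings; drop the carriage returns.
strip_cr() { tr -d '\r'; }

# Print total physical memory in bytes for the current platform.
total_mem_bytes() {
  case "$(uname -s)" in
    Darwin)
      sysctl -n hw.memsize ;;
    Linux)
      free -b | awk '/^Mem:/ {print $2}' ;;
    MINGW*|MSYS*|CYGWIN*)
      if command -v powershell >/dev/null 2>&1; then
        powershell -NoProfile -Command \
          "(Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory" | strip_cr
      else
        # Legacy fallback: the first line is the header, the second is the value.
        wmic ComputerSystem get TotalPhysicalMemory | strip_cr | awk 'NR==2 {print $1}'
      fi ;;
    *)
      echo "unsupported platform" >&2
      return 1 ;;
  esac
}
```

Every branch reports the same quantity (total, not free or available), which keeps the value comparable across archived results.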

codacy-production · 2026-05-03T21:29:54Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


Copilot

Pull request overview

This PR hardens scripts/benchmarks.sh, the repository’s benchmark harness, so benchmark runs fail more explicitly, clean up generated artifacts more reliably, and behave more consistently across supported environments.

Changes:

Fixes several benchmark command definitions that were producing misleading results or duplicate coverage.
Improves failure handling for missing tools and download failures, including cleanup of partial downloads.
Adjusts platform-specific metadata gathering and partition benchmark cleanup/path handling for better cross-platform behavior.

… validate benchmarks

- Use NumberOfLogicalProcessors on Windows (PowerShell + wmic) so cores metadata matches macOS hw.ncpu / Linux nproc semantics.
- Add benchmark_data.snappy to the reset cleanup so a subsequent run does not benchmark snappy decompress/validate against a stale file.
- Fail fast when 7z extraction of the benchmark archive fails; the script is not run with set -e so the silent fall-through could record bogus results against a missing/partial CSV.
- Remove validate_dynenum_no_schema{,_index} benchmarks: they pass no schema, run in RFC 4180 mode, cannot exercise dynamicEnum, and are byte-for-byte identical to validate_no_schema{,_index}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

[skip ci]
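
The fail-fast behaviour for the extraction step can be sketched as follows, assuming a hypothetical wrapper named run_step. In a script that does not use set -e, a failing command is otherwise followed by the next line as if nothing had happened.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; the name run_step is illustrative, not from benchmarks.sh.

# Run a command; if it fails, report which step failed and return non-zero.
run_step() {
  desc="$1"
  shift
  if ! "$@"; then
    echo "ERROR: $desc failed" >&2
    return 1
  fi
}

# Example call site (commented out; the archive name is made up):
#   run_step "extract benchmark archive" 7z e -y benchmark_data.7z || exit 1
```

The explicit check at each critical step gives the same protection as set -e for that step without changing how the rest of the script treats non-zero statuses.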

jqnatividad added 2 commits May 3, 2026 17:19

jqnatividad requested a review from Copilot May 3, 2026 21:31

Copilot started reviewing on behalf of jqnatividad May 3, 2026 21:32

Copilot AI reviewed May 3, 2026

Comment thread scripts/benchmarks.sh Outdated

Comment thread scripts/benchmarks.sh

Comment thread scripts/benchmarks.sh Outdated

Comment thread scripts/benchmarks.sh Outdated

Comment thread scripts/benchmarks.sh Outdated

jqnatividad and others added 2 commits May 3, 2026 18:04

chore(benchmarks): bump to 8.0.0

f8c1fd1

[skip ci]

jqnatividad merged commit 45e2a52 into master May 4, 2026
1 check was pending

jqnatividad deleted the benchmarks-review-202605 branch May 4, 2026 00:06

jqnatividad merged 4 commits into master from benchmarks-review-202605