
refactor(benchmarks): harden benchmarks.sh error handling and cross-platform support#3814

Merged
jqnatividad merged 4 commits into master from benchmarks-review-202605 on May 4, 2026

Conversation

@jqnatividad
Collaborator

Summary

Hardens scripts/benchmarks.sh against silent failures and platform inconsistencies surfaced during a code review:

  • Bug fixes that were polluting archived results (hyperfine -i was masking these):

    • split_chunks_index_j1 was missing the $data argument and reading from stdin instead of benchmarking the dataset.
    • validate_dynenum_no_schema and validate_dynenum_no_schema_index were passing a schema despite their names, contradicting the parallel validate_no_schema benchmark.
    • luau_filter_no_globals_no_colidx was a byte-for-byte duplicate of luau_filter_no_globals — removed.
    • reset was removing benchmark_data.schema.json but the actual file is benchmark_data.csv.schema.json, so reset never actually cleared the schema.
  • Error handling: error paths now call exit 1, so failures return a non-zero status (CI was treating missing-tool failures as success). curl calls use --fail and clean up partial downloads. dynenum_schema is annotated as a hand-curated fixture so future readers don't expect prep to regenerate it.

  • Cross-platform memory detection: mem_size now consistently reports total physical memory in bytes on macOS, Linux, and Windows (previously Linux reported available memory and Windows reported free memory). The Windows branch prefers PowerShell Get-CimInstance, since wmic is deprecated/removed on recent Windows 11 / Server 2025 builds, and falls back to wmic on legacy systems.

  • Partition cleanup: the partition benchmark now writes to a cwd-relative partitioned/ directory (matching the split_* benchmarks) instead of /tmp/partitioned, and cleanup_files removes it. Previously runs piled up files between invocations and the path didn't exist on Windows.
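The partition cleanup described in the last bullet can be sketched roughly as follows. This is a minimal illustration rather than the code in scripts/benchmarks.sh: `fake_partition` and `run_partition_benchmark` are hypothetical names, and `fake_partition` only stands in for the real benchmark command so the pattern can be run anywhere.

```shell
#!/usr/bin/env bash
# Sketch only: fake_partition is a stand-in for the real partition benchmark.

part_dir="partitioned"   # cwd-relative, so the path also resolves on Windows

# Remove the output directory; safe to call when it does not exist.
cleanup_files() {
  rm -rf -- "$part_dir"
}

# Hypothetical stand-in that writes one output file into the directory.
fake_partition() {
  mkdir -p "$part_dir" || return 1
  printf 'a,b\n1,2\n' > "$part_dir/chunk_1.csv"
}

run_partition_benchmark() {
  cleanup_files            # start clean so stale output cannot pile up
  fake_partition
  local rc=$?
  cleanup_files            # remove the output even when the command failed
  return "$rc"
}
```

Cleaning both before and after the run is a design choice in this sketch: the first call guards against leftovers from an interrupted run, the second keeps the working directory tidy.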

Test plan

  • bash -n scripts/benchmarks.sh passes
  • Smoke ran ./benchmarks.sh split_chunks_index_j1 — 981 ms mean, dataset actually consumed
  • Smoke ran ./benchmarks.sh validate_dynenum_no_schema — 559 / 567 ms, hyperfine echoes command without schema arg
  • Smoke ran ./benchmarks.sh partition — 1.51 s mean, partitioned/ dir cleaned up after run
  • Verify on a Linux box that free -b | awk '/Mem/ {print $2}' reports total memory as expected
  • Verify on a Windows box (legacy and Win11) that the PowerShell branch reports total cores and total physical memory

🤖 Generated with Claude Code

Return non-zero on failures (use exit 1) for missing tools and setup flows. Add robust curl --fail handling and cleanup of failed downloads. Fix platform memory detection (Linux free -b column, Windows TotalPhysicalMemory) and add comment clarifying mem_size semantics. Remove/restore correct benchmark commands (remove duplicate luau run, add missing $data arg to split_chunks_index_j1). Rename/consistently reference the generated schema file (benchmark_data.csv.schema.json) and add a comment explaining the hand-curated dynenum schema fixture. Also clean up /tmp/partitioned in cleanup_files.

[skip ci]
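As an illustration of the error-handling changes this commit message describes, a script might guard tool lookups and downloads along these lines. The helper names `require_tool` and `download` are assumptions made for this sketch and are not taken from benchmarks.sh; the curl flags are standard options.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative, not from benchmarks.sh.

# Return non-zero when a required tool is not on PATH.
require_tool() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "ERROR: $1 not found in PATH" >&2
    return 1
  fi
}

# Download url to dest. --fail makes curl exit non-zero on HTTP errors
# instead of saving the error page; a partial file is removed on failure.
download() {
  local url=$1 dest=$2
  if ! curl --fail --silent --show-error --location -o "$dest" "$url"; then
    rm -f -- "$dest"
    echo "ERROR: download failed: $url" >&2
    return 1
  fi
}
```

A caller would then write something like `require_tool hyperfine || exit 1`, so that a missing tool ends the script with a non-zero status that CI can see.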
Use PowerShell (Get-CimInstance) on Windows to obtain CPU cores and total physical memory, falling back to wmic for legacy systems. Strip CRs from PowerShell output to avoid Windows line endings. Also change cleanup and partition invocation to use a relative "partitioned" directory instead of hardcoded /tmp/partitioned so the script works correctly on Windows and other environments.

[skip ci]
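A rough sketch of total-memory detection along the lines of the two commit messages above. The function names are illustrative, the `hw.memsize` sysctl key for macOS and the `Win32_ComputerSystem` class are assumptions of this sketch (the PR text names only `Get-CimInstance`, `TotalPhysicalMemory`, and `free -b`), and the actual mem_size code in benchmarks.sh may differ.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: names are illustrative; hw.memsize and
# Win32_ComputerSystem are assumed, not quoted from the PR.

# Read `free -b` output on stdin and print the "total" column
# (field 2 of the Mem: row), not "free" or "available".
parse_free_total() {
  awk '/^Mem:/ {print $2}'
}

# Drop carriage returns from Windows tools, then print the first field
# of the first line that starts with a digit (skips header lines).
first_number() {
  tr -d '\r' | awk '/^[0-9]+/ {print $1; exit}'
}

total_mem_bytes() {
  local mem=""
  case "$(uname -s)" in
    Darwin) mem=$(sysctl -n hw.memsize) ;;
    Linux)  mem=$(free -b | parse_free_total) ;;
    MINGW*|MSYS*|CYGWIN*)
      # Prefer PowerShell; fall back to wmic on legacy systems.
      mem=$(powershell -NoProfile -Command \
        "(Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory" \
        2>/dev/null | first_number)
      if [ -z "$mem" ]; then
        mem=$(wmic ComputerSystem get TotalPhysicalMemory 2>/dev/null | first_number)
      fi
      ;;
  esac
  # Fail (non-zero) rather than print an empty value.
  [ -n "$mem" ] && echo "$mem"
}
```

Calling `total_mem_bytes` prints a byte count on a supported platform and returns non-zero otherwise. Keeping the parsing in small stdin filters makes the Linux and Windows branches checkable with canned input, without needing each platform at hand.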
@codacy-production

codacy-production Bot commented May 3, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


Contributor

Copilot AI left a comment

Pull request overview

This PR hardens scripts/benchmarks.sh, the repository’s benchmark harness, so benchmark runs fail more explicitly, clean up generated artifacts more reliably, and behave more consistently across supported environments.

Changes:

  • Fixes several benchmark command definitions that were producing misleading results or duplicate coverage.
  • Improves failure handling for missing tools and download failures, including cleanup of partial downloads.
  • Adjusts platform-specific metadata gathering and partition benchmark cleanup/path handling for better cross-platform behavior.

jqnatividad and others added 2 commits May 3, 2026 18:04
… validate benchmarks

- Use NumberOfLogicalProcessors on Windows (PowerShell + wmic) so cores
  metadata matches macOS hw.ncpu / Linux nproc semantics.
- Add benchmark_data.snappy to the reset cleanup so a subsequent run does
  not benchmark snappy decompress/validate against a stale file.
- Fail fast when 7z extraction of the benchmark archive fails; the script
  is not run with set -e so the silent fall-through could record bogus
  results against a missing/partial CSV.
- Remove validate_dynenum_no_schema{,_index} benchmarks: they pass no
  schema, run in RFC 4180 mode, cannot exercise dynamicEnum, and are
  byte-for-byte identical to validate_no_schema{,_index}.
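
The fail-fast extraction point above matters because a script that does not run with set -e keeps going after a failed command. One possible shape for such a check is sketched below; the `extract_archive` name and the extra check that the expected file is non-empty are assumptions of this sketch, not necessarily what the commit does.

```shell
#!/usr/bin/env bash
# Hypothetical helper; without `set -e` every failure must be checked explicitly.

# Extract archive into the current directory and confirm that the
# expected file exists and is non-empty afterwards.
extract_archive() {
  local archive=$1 expected=$2
  if ! 7z e -y "$archive" >/dev/null; then
    echo "ERROR: failed to extract $archive" >&2
    return 1
  fi
  if [ ! -s "$expected" ]; then
    echo "ERROR: $expected is missing or empty after extraction" >&2
    return 1
  fi
}
```

A caller would pair this with `|| exit 1` so that no benchmark runs against a missing or partial CSV.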

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jqnatividad jqnatividad merged commit 45e2a52 into master May 4, 2026
1 check was pending
@jqnatividad jqnatividad deleted the benchmarks-review-202605 branch May 4, 2026 00:06
2 participants