refactor(benchmarks): harden benchmarks.sh error handling and cross-platform support#3814
Merged
Conversation
Return non-zero on failures (use exit 1) for missing tools and setup flows. Add robust curl --fail handling and cleanup of failed downloads. Fix platform memory detection (Linux free -b column, Windows TotalPhysicalMemory) and add comment clarifying mem_size semantics. Remove/restore correct benchmark commands (remove duplicate luau run, add missing $data arg to split_chunks_index_j1). Rename/consistently reference the generated schema file (benchmark_data.csv.schema.json) and add a comment explaining the hand-curated dynenum schema fixture. Also clean up /tmp/partitioned in cleanup_files. [skip ci]
Use PowerShell (Get-CimInstance) on Windows to obtain CPU cores and total physical memory, falling back to wmic for legacy systems. Strip CRs from PowerShell output to avoid Windows line endings. Also change cleanup and partition invocation to use a relative "partitioned" directory instead of hardcoded /tmp/partitioned so the script works correctly on Windows and other environments. [skip ci]
Up to standards ✅🟢 Issues
|
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens scripts/benchmarks.sh, the repository’s benchmark harness, so benchmark runs fail more explicitly, clean up generated artifacts more reliably, and behave more consistently across supported environments.
Changes:
- Fixes several benchmark command definitions that were producing misleading results or duplicate coverage.
- Improves failure handling for missing tools and download failures, including cleanup of partial downloads.
- Adjusts platform-specific metadata gathering and partition benchmark cleanup/path handling for better cross-platform behavior.
… validate benchmarks
- Use NumberOfLogicalProcessors on Windows (PowerShell + wmic) so cores
metadata matches macOS hw.ncpu / Linux nproc semantics.
- Add benchmark_data.snappy to the reset cleanup so a subsequent run does
not benchmark snappy decompress/validate against a stale file.
- Fail fast when 7z extraction of the benchmark archive fails; the script
is not run with set -e so the silent fall-through could record bogus
results against a missing/partial CSV.
- Remove validate_dynenum_no_schema{,_index} benchmarks: they pass no
schema, run in RFC 4180 mode, cannot exercise dynamicEnum, and are
byte-for-byte identical to validate_no_schema{,_index}.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[skip ci]
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Hardens
scripts/benchmarks.shagainst silent failures and platform inconsistencies surfaced during a code review:Bug fixes that were polluting archived results (hyperfine
-iwas masking these):split_chunks_index_j1was missing the$dataargument and reading from stdin instead of benchmarking the dataset.validate_dynenum_no_schemaandvalidate_dynenum_no_schema_indexwere passing a schema despite their names, contradicting the parallelvalidate_no_schemabenchmark.luau_filter_no_globals_no_colidxwas a byte-for-byte duplicate ofluau_filter_no_globals— removed.resetwas removingbenchmark_data.schema.jsonbut the actual file isbenchmark_data.csv.schema.json, soresetnever actually cleared the schema.Error handling: error-path
exitcalls now returnexit 1(CI was treating missing-tool failures as success).curlcalls use--failand clean up partial downloads.dynenum_schemais annotated as a hand-curated fixture so future readers don't expect prep to regenerate it.Cross-platform memory detection:
mem_sizenow consistently reports total physical memory in bytes on macOS, Linux, and Windows (Linux was reporting available; Windows was reporting free). Windows branch prefers PowerShellGet-CimInstancesincewmicis deprecated/removed on recent Windows 11 / Server 2025 builds, withwmicfallback for legacy systems.Partition cleanup: the
partitionbenchmark now writes to a cwd-relativepartitioned/directory (matching thesplit_*benchmarks) instead of/tmp/partitioned, andcleanup_filesremoves it. Previously runs piled up files between invocations and the path didn't exist on Windows.Test plan
bash -n scripts/benchmarks.shpasses./benchmarks.sh split_chunks_index_j1— 981 ms mean, dataset actually consumed./benchmarks.sh validate_dynenum_no_schema— 559 / 567 ms, hyperfine echoes command without schema arg./benchmarks.sh partition— 1.51 s mean,partitioned/dir cleaned up after runfree -b | awk '/Mem/ {print $2}'reports total memory as expected🤖 Generated with Claude Code