perf(frequency): hint UTF-8 failure as cold path in ignore-case hot loop#3821
Merged
Conversation
In `ftables_weighted_internal` and `ftables_unweighted`, the per-cell
`process_field` closures take an `if let Ok(s) = simdutf8::basic::from_utf8(field)`
branch where the `Ok` arm dominates on real data and the `Err` arm is rare.
Mark the four `else` arms with `core::hint::cold_path()` so LLVM keeps the hot
UTF-8-success path contiguous in the instruction cache.
Benchmark on a 1M-row NYC 311 CSV (514 MB, 41 cols), hyperfine, 10 runs:
qsv frequency --ignore-case
baseline 4.399 ± 0.045 s
coldpath 4.139 ± 0.093 s → 1.06× faster
qsv frequency --ignore-case --no-trim
baseline 4.089 ± 0.090 s
coldpath 4.053 ± 0.036 s → noise
qsv frequency (default, cache short-circuit)
baseline 1.880 ± 0.028 s
coldpath 1.864 ± 0.015 s → noise (paths not exercised)
Outputs identical between builds. The 6% gain is concentrated on the trim
+ ignore-case path because that hot body (lowercase + extend_from_slice +
add_borrowed) is the largest of the closure variants, so isolating its
icache layout has the most leverage.
MSRV 1.95 ≥ cold_path stabilization (1.92).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Up to standards ✅🟢 Issues
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ftables_weighted_internalandftables_unweighted(src/cmd/frequency.rs), the per-cellprocess_fieldclosures branch onif let Ok(s) = simdutf8::basic::from_utf8(field)— theOkarm dominates on real data, theErrarm is rare.core::hint::cold_path()(stable since Rust 1.92, MSRV here is 1.95) to the fourelsearms so LLVM keeps the hot UTF-8-success body contiguous in the instruction cache.use core::hint::cold_path;import + fourcold_path();calls. Behaviour unchanged —cold_path()is purely an optimization hint.Benchmark
1M-row NYC 311 sample (
NYC_311_SR_2010-2020-sample-1M.csv, 514 MB, 41 cols), hyperfine 1.20, 10 runs after 2 warmups, on Apple Silicon. Outputs verified identical (diff -q) between baseline and coldpath builds.frequency --ignore-casefrequency --ignore-case --no-trimfrequency(default, cache hit)The 6% gain is concentrated on the trim + ignore-case variant because its hot-body (
util::to_lowercase_into(s.trim(), …)+extend_from_slice+add_borrowed) is the largest of the four closure forms; isolating its icache layout yields the most leverage. The--no-trimand default paths show no detectable change (within σ), confirming zero regression.Test plan
cargo build --release --locked --bin qsv -F all_features— clean.cargo clippy --bin qsv -F all_features— no new warnings vsmaster.cargo test -F all_features --test tests test_frequency— 160 passed, 0 failed.🤖 Generated with Claude Code