
perf(frequency): hint UTF-8 failure as cold path in ignore-case hot loop#3821

Merged
jqnatividad merged 1 commit into master from freq-coldpath-ignore-case
May 4, 2026

@jqnatividad
Collaborator

Summary

  • In ftables_weighted_internal and ftables_unweighted (src/cmd/frequency.rs), the per-cell process_field closures branch on if let Ok(s) = simdutf8::basic::from_utf8(field) — the Ok arm dominates on real data, the Err arm is rare.
  • Add core::hint::cold_path() (stable since Rust 1.92, MSRV here is 1.95) to the four else arms so LLVM keeps the hot UTF-8-success body contiguous in the instruction cache.
  • 5 lines of code: one use core::hint::cold_path; import + four cold_path(); calls. Behaviour unchanged — cold_path() is purely an optimization hint.
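The branch shape described in the Summary can be sketched as follows. This is a minimal, hypothetical sketch, not qsv's code: `std::str::from_utf8` stands in for `simdutf8::basic::from_utf8`, `to_lowercase_into` and `tally_ignore_case` are illustrative names, and counting the raw bytes on invalid UTF-8 is an assumption. So that it compiles on any stable toolchain, it uses a local `#[cold]` function in place of `core::hint::cold_path()`; on a toolchain that provides the hint (the PR cites Rust 1.92+), delete the local `cold_path` and add `use core::hint::cold_path;` instead.

```rust
use std::collections::HashMap;

// Stable-Rust stand-in for `core::hint::cold_path()`: LLVM treats calls to
// `#[cold]` functions as unlikely. Like `cold_path()`, this is only a hint
// and never changes behaviour.
#[cold]
fn cold_path() {}

// Hypothetical stand-in for qsv's `util::to_lowercase_into`: lowercases `s`
// into `buf`, reusing the buffer's allocation. It lowercases char by char,
// so context-sensitive rules such as Greek final sigma are not applied.
fn to_lowercase_into(s: &str, buf: &mut String) {
    buf.clear();
    for c in s.chars() {
        buf.extend(c.to_lowercase());
    }
}

// Hypothetical per-cell tally, shaped like the PR's `process_field` closures
// for the trim + ignore-case case.
fn tally_ignore_case(cells: &[&[u8]]) -> HashMap<Vec<u8>, u64> {
    let mut counts: HashMap<Vec<u8>, u64> = HashMap::new();
    let mut buf = String::new();
    for &field in cells {
        if let Ok(s) = std::str::from_utf8(field) {
            // Hot path: valid UTF-8, the common case on real data.
            to_lowercase_into(s.trim(), &mut buf);
            *counts.entry(buf.as_bytes().to_vec()).or_insert(0) += 1;
        } else {
            // Cold path: invalid UTF-8. Counting the raw bytes unchanged
            // is an assumption made for this sketch.
            cold_path();
            *counts.entry(field.to_vec()).or_insert(0) += 1;
        }
    }
    counts
}
```

Either way the hint only affects code layout: each arm still runs whenever its condition holds, so the counts are the same with or without it.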

Benchmark

1M-row NYC 311 sample (NYC_311_SR_2010-2020-sample-1M.csv, 514 MB, 41 cols), hyperfine 1.20, 10 runs after 2 warmups, on Apple Silicon. Outputs verified identical (diff -q) between baseline and coldpath builds.

| Invocation | Baseline | Coldpath | Ratio |
|---|---|---|---|
| frequency --ignore-case | 4.399 ± 0.045 s | 4.139 ± 0.093 s | 1.06× faster |
| frequency --ignore-case --no-trim | 4.089 ± 0.090 s | 4.053 ± 0.036 s | 1.01× (noise) |
| frequency (default, cache hit) | 1.880 ± 0.028 s | 1.864 ± 0.015 s | 1.01× (noise; paths not exercised) |

The 6% gain is concentrated on the trim + ignore-case variant because its hot body (util::to_lowercase_into(s.trim(), …) + extend_from_slice + add_borrowed) is the largest of the four closure forms, so isolating its icache layout yields the most leverage. The --no-trim and default runs show no change beyond run-to-run noise (within σ), i.e. no measurable regression; the default run does not exercise the changed closures and serves as a control.
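The ratios in the table and the "noise" labels follow from simple arithmetic on the reported means and standard deviations. The sketch below reproduces them; the `Timing` type and function names are illustrative, and treating a difference as noise when it is no larger than the sum of the two standard deviations is a rough, hypothetical heuristic, not something hyperfine computes.

```rust
// A hyperfine-style result: mean and standard deviation, in seconds.
#[derive(Clone, Copy, Debug)]
struct Timing {
    mean: f64,
    sd: f64,
}

// Speedup ratio as reported in the table: baseline mean / coldpath mean.
fn speedup(base: Timing, new: Timing) -> f64 {
    base.mean / new.mean
}

// Wall time saved, as a percentage of the baseline mean.
fn time_saved_pct(base: Timing, new: Timing) -> f64 {
    (base.mean - new.mean) / base.mean * 100.0
}

// Rough noise check: the difference in means is no larger than the sum of
// the two standard deviations. Not a formal significance test.
fn within_noise(base: Timing, new: Timing) -> bool {
    (base.mean - new.mean).abs() <= base.sd + new.sd
}
```

With the table's numbers this gives about 1.063× and 5.9% less wall time for --ignore-case, which exceeds the combined σ, while the --no-trim and default rows fall within it.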

Test plan

  • cargo build --release --locked --bin qsv -F all_features — clean.
  • cargo clippy --bin qsv -F all_features — no new warnings vs master.
  • cargo test -F all_features --test tests test_frequency — 160 passed, 0 failed.
  • Hyperfine A/B as above (outputs match).
  • CI green on PR.

🤖 Generated with Claude Code

In `ftables_weighted_internal` and `ftables_unweighted`, the per-cell
`process_field` closures take an `if let Ok(s) = simdutf8::basic::from_utf8(field)`
branch where the `Ok` arm dominates on real data and the `Err` arm is rare.
Mark the four `else` arms with `core::hint::cold_path()` so LLVM keeps the hot
UTF-8-success path contiguous in the instruction cache.

Benchmark on a 1M-row NYC 311 CSV (514 MB, 41 cols), hyperfine, 10 runs:

  qsv frequency --ignore-case
    baseline 4.399 ± 0.045 s
    coldpath 4.139 ± 0.093 s   → 1.06× faster

  qsv frequency --ignore-case --no-trim
    baseline 4.089 ± 0.090 s
    coldpath 4.053 ± 0.036 s   → noise

  qsv frequency  (default, cache short-circuit)
    baseline 1.880 ± 0.028 s
    coldpath 1.864 ± 0.015 s   → noise (paths not exercised)

Outputs identical between builds. The 6% gain is concentrated on the trim
+ ignore-case path because that hot body (lowercase + extend_from_slice +
add_borrowed) is the largest of the closure variants, so isolating its
icache layout has the most leverage.

MSRV 1.95 ≥ cold_path stabilization (1.92).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codacy-production

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


@jqnatividad jqnatividad merged commit 0f6ad16 into master May 4, 2026
17 checks passed
@jqnatividad jqnatividad deleted the freq-coldpath-ignore-case branch May 4, 2026 20:19