Skip hash table probe for consecutive duplicate values in ArrowBytesViewMap#21604
Dandandan wants to merge 3 commits into
…iewMap Add a last-value cache in insert_if_new_inner that skips the hash table probe when the current view matches the previous one. For inline strings (<=12 bytes), matching views guarantees equal values. For non-inline strings within the same input array, matching views means identical buffer_index + offset, so they point to the same bytes. This is effective for workloads with repeated values such as ClickBench query 5 (COUNT(DISTINCT SearchPhrase)) where ~90% of rows are empty strings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
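A minimal sketch of the last-value cache idea, not the actual DataFusion code. `ToyViewMap`, `insert_all`, and the `probes` counter are hypothetical names for illustration, and the sketch assumes every `u128` stands for an inline view, so that equal views imply equal values:

```rust
use std::collections::HashMap;

/// Toy stand-in for the map (hypothetical, not ArrowBytesViewMap):
/// assigns a dense index to each distinct view.
struct ToyViewMap {
    table: HashMap<u128, usize>,
    /// Number of times the hash table was actually consulted.
    probes: usize,
}

impl ToyViewMap {
    fn new() -> Self {
        ToyViewMap { table: HashMap::new(), probes: 0 }
    }

    /// Returns the index assigned to each input view.
    fn insert_all(&mut self, views: &[u128]) -> Vec<usize> {
        let mut out = Vec::with_capacity(views.len());
        // Single-entry cache: the previous row's view and its index.
        let mut last: Option<(u128, usize)> = None;
        for &view in views {
            if let Some((prev_view, prev_idx)) = last {
                if prev_view == view {
                    // Fast path: same view as the previous row, no probe.
                    out.push(prev_idx);
                    continue;
                }
            }
            // Slow path: probe the table, inserting if the view is new.
            self.probes += 1;
            let next = self.table.len();
            let idx = *self.table.entry(view).or_insert(next);
            last = Some((view, idx));
            out.push(idx);
        }
        out
    }
}
```

With the input `[0, 0, 0, 7, 7, 0]`, only the first row of each run reaches the table, so the sketch performs three probes for six rows.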
run benchmarks
🤖 Benchmark running (GKE): comparing skip-duplicate-hash-probe (9db8715) to 29c5dd5 (merge-base) using tpcds
🤖 Benchmark running (GKE): comparing skip-duplicate-hash-probe (9db8715) to 29c5dd5 (merge-base) using clickbench_partitioned
🤖 Benchmark running (GKE): comparing skip-duplicate-hash-probe (9db8715) to 29c5dd5 (merge-base) using tpch
…iewMap Add two fast-path caches in insert_if_new_inner that skip hash table probes for duplicate values: 1. Empty-string cache: catches *all* empty strings (view == 0) even when non-consecutive. Comparing against zero is essentially free. 2. Last-value cache: catches consecutive runs of any repeated value. For inline strings (<=12 bytes) matching views guarantees equal values. For non-inline strings within the same input array, matching views means identical buffer_index + offset, i.e. the same bytes. This is effective for workloads with repeated values such as ClickBench query 5 (COUNT(DISTINCT SearchPhrase)) where ~90% of rows are empty strings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
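To illustrate why `view == 0` identifies the empty string and why equal inline views imply equal values, here is a small sketch that packs a short string into a 16-byte view. It assumes the Arrow view layout of a 4-byte little-endian length followed by up to 12 zero-padded data bytes; `inline_view` and `is_empty_view` are hypothetical helpers, not functions from arrow-rs or DataFusion:

```rust
/// Pack a short byte string into a 16-byte view: a 4-byte little-endian
/// length followed by up to 12 data bytes, zero padded. Returns None for
/// longer input, which a real view would store out of line.
fn inline_view(bytes: &[u8]) -> Option<u128> {
    if bytes.len() > 12 {
        return None;
    }
    let mut raw = [0u8; 16];
    raw[..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
    raw[4..4 + bytes.len()].copy_from_slice(bytes);
    Some(u128::from_le_bytes(raw))
}

/// The empty string has length 0 and no data bytes, so every bit of its
/// view is zero and a comparison against 0 is enough to detect it.
fn is_empty_view(view: u128) -> bool {
    view == 0
}
```

Because the length and all data bytes live inside the view, two inline views are equal exactly when the strings are equal. The length prefix also distinguishes `"a"` from `"a\0"`.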
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
run benchmark clickbench_partiitioned |
🤖 Criterion benchmark running (GKE): comparing skip-duplicate-hash-probe (0e94b9a) to 29c5dd5 (merge-base)
Benchmark for this request failed.
…d view Replace the full u128 view + payload in each hash table entry with a compact u32 view_index. The view is already stored in the `views` vec, and the payload (group index) always equals the view index for all callers. This shrinks each entry from 32 bytes to 12 bytes, reducing memory usage and halving the cost of rehashing when the hash table grows. The ArrowBytesViewMap is no longer generic over a payload type V. Instead, the observe callback receives the view index (usize) directly, which callers use as the group index. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
run benchmark clickbench_partitioned
🤖 Benchmark running (GKE): comparing skip-duplicate-hash-probe (ec94be3) to 29c5dd5 (merge-base) using clickbench_partitioned
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
Which issue does this PR close?
N/A - Performance optimization
Rationale for this change
Profiling ClickBench query 5 (SELECT COUNT(DISTINCT "SearchPhrase") FROM hits) shows ~29% of time in GroupValuesBytesView (hash table probing) and ~13% in create_hashes. In this workload ~90% of SearchPhrase values are empty strings, so most hash table probes find an already-existing entry for the same value as the previous row.
What changes are included in this PR?
Adds a single-entry "last value" cache in ArrowBytesViewMap::insert_if_new_inner. Before probing the hash table, the loop checks whether the current view_u128 matches the previous row's view. If so, it reuses the cached payload and skips the hash table find() entirely.
This is correct for all string lengths:
- Inline strings (<=12 bytes): the view holds the bytes themselves, so matching views guarantee equal values
- Non-inline strings (within the same input array): matching views mean identical buffer_index + offset + length, so they reference the exact same bytes
The cost is one u128 comparison per row (~1 cycle, register/L1). The saving is the hash table find() (random memory access pattern) for every consecutive duplicate.
Are these changes tested?
Existing tests in binary_view_map::tests pass (8/8). The optimization is transparent: same semantics, same output, just fewer hash table probes.
Are there any user-facing changes?
No. This is a performance improvement only.
🤖 Generated with Claude Code