Shrink ArrowBytesViewMap Entry from 32+ to 16 bytes #21393
Dandandan wants to merge 6 commits into apache:main
Remove the payload generic `V` from ArrowBytesViewMap and the redundant
`view: u128` from Entry. Since entries are inserted sequentially, the
view_idx serves as both the index into the views Vec and the group index
for GROUP BY, making a separate payload unnecessary. Entry is now just
`{ view_idx: usize, hash: u64 }` (16 bytes on 64-bit).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
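The size claim in the commit message above can be checked with a small standalone sketch. `OldEntry` and `NewEntry` are hypothetical reconstructions from the field lists quoted in this PR, not the actual DataFusion definitions, and the helper names are made up for illustration.

```rust
use std::mem::size_of;

// Hypothetical reconstruction of the previous layout described in this PR:
// a full 16-byte view, the hash, and a generic payload.
#[allow(dead_code)]
struct OldEntry<V> {
    view: u128,
    hash: u64,
    payload: V,
}

// Hypothetical reconstruction of the new layout: an index into the views
// Vec (which also serves as the group index) plus the hash.
#[allow(dead_code)]
struct NewEntry {
    view_idx: usize,
    hash: u64,
}

/// Size of the old entry with a unit payload (the "set" case).
fn old_entry_size() -> usize {
    size_of::<OldEntry<()>>()
}

/// Size of the new entry; two 8-byte fields on a 64-bit target.
fn new_entry_size() -> usize {
    size_of::<NewEntry>()
}
```

The exact size of the old layout depends on the alignment the compiler gives `u128` on the target, which is why the sketch only compares sizes rather than hard-coding 32.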
run benchmarks
🤖 Benchmark running (GKE): comparing shrink-arrow-bytes-view-map-entry (92dfc2d) to c17c87c (merge-base) using tpch
🤖 Benchmark running (GKE): comparing shrink-arrow-bytes-view-map-entry (92dfc2d) to c17c87c (merge-base) using clickbench_partitioned
🤖 Benchmark running (GKE): comparing shrink-arrow-bytes-view-map-entry (92dfc2d) to c17c87c (merge-base) using tpcds
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
run benchmark clickbench_partitioned
🤖 Benchmark running (GKE): comparing shrink-arrow-bytes-view-map-entry (92dfc2d) to c17c87c (merge-base) using clickbench_partitioned
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
run benchmark clickbench_extended
🤖 Benchmark running (GKE): comparing shrink-arrow-bytes-view-map-entry (92dfc2d) to c17c87c (merge-base) using clickbench_extended
🤖 Benchmark completed (GKE): clickbench_extended, base (merge-base) vs. branch
- Copy inline views (<=12 bytes) directly as u128 instead of re-reading bytes
- Reuse length+prefix from input view for non-inline values via ByteView::from
- Specialize vectorized_append for all-inline case using extend
- Pre-reserve views capacity in vectorized_append
- Hoist as_byte_view cast out of per-row loop in Nulls::Some path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
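The first bullet relies on the Arrow view layout: the low 4 bytes of the 16-byte view hold the length, and values of at most 12 bytes live entirely in the remaining 12 bytes. A minimal sketch of why such a view can be copied as a plain `u128` follows; it uses only the standard library, and every function name here is hypothetical rather than taken from the PR.

```rust
/// Build an inline view for values of at most 12 bytes: a little-endian
/// u32 length followed by the data, zero-padded to 16 bytes.
/// Returns None for longer values, which need a buffer reference instead.
fn make_inline_view(data: &[u8]) -> Option<u128> {
    if data.len() > 12 {
        return None;
    }
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    Some(u128::from_le_bytes(bytes))
}

/// The length lives in the low 32 bits of the view.
fn view_len(view: u128) -> u32 {
    view as u32
}

/// A view is inline when its length is at most 12 bytes.
fn is_inline(view: u128) -> bool {
    view_len(view) <= 12
}

/// Append an input view to `views`. Inline views are self-contained, so the
/// u128 is copied as-is without re-reading the value bytes. Returns false
/// for non-inline views, which this sketch does not handle.
fn append_inline_view(views: &mut Vec<u128>, input_view: u128) -> bool {
    if is_inline(input_view) {
        views.push(input_view);
        true
    } else {
        false
    }
}
```

Non-inline views also carry a buffer index and offset that refer to the input array's buffers, so they cannot be copied verbatim; the second bullet reuses only their length and prefix.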
run benchmarks
run benchmark clickbench_extended
🤖 Benchmark running (GKE): comparing shrink-arrow-bytes-view-map-entry (c0b212a) to c17c87c (merge-base) using tpcds
🤖 Benchmark running (GKE): comparing shrink-arrow-bytes-view-map-entry (c0b212a) to c17c87c (merge-base) using clickbench_partitioned
🤖 Benchmark running (GKE): comparing shrink-arrow-bytes-view-map-entry (c0b212a) to c17c87c (merge-base) using tpch
🤖 Benchmark running (GKE): comparing shrink-arrow-bytes-view-map-entry (c0b212a) to c17c87c (merge-base) using clickbench_extended
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): clickbench_extended, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch
run benchmark clickbench_extended
🤖 Benchmark running (GKE): comparing shrink-arrow-bytes-view-map-entry (c0b212a) to c17c87c (merge-base) using clickbench_extended
Use Dandandan/arrow-rs-object-store#io-uring-get-ranges via [patch.crates-io] to test io_uring-based batch reads for LocalFileSystem (apache/arrow-rs-object-store#684).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit b06c0e9.
🤖 Benchmark completed (GKE): clickbench_extended, base (merge-base) vs. branch
- Combine len + prefix into single u64 comparison in do_equal_to_inner
- For inline values, compare upper 64 bits directly after len+prefix match
- Hoist arr.views() out of per-row loops in all vectorized_append paths
- Add do_append_val_with_view to avoid redundant view lookups per row

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
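The first two bullets can be illustrated with a standalone sketch. It assumes the Arrow view layout (length in the low 32 bits, the first four value bytes in the next 32 bits) and that inline views are zero-padded, so equal values produce identical bits. `make_view` and `fast_compare` are hypothetical names, not the functions touched by this commit.

```rust
/// Build a view from a length and the 12 bytes that follow it
/// (hypothetical helper for constructing example views).
fn make_view(len: u32, rest: [u8; 12]) -> u128 {
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&len.to_le_bytes());
    bytes[4..].copy_from_slice(&rest);
    u128::from_le_bytes(bytes)
}

/// Compare two views without touching the data buffers where possible.
/// Returns Some(result) when the views alone decide equality, or None when
/// both are non-inline with matching length and prefix, in which case the
/// caller still has to compare the full values in the buffers.
fn fast_compare(a: u128, b: u128) -> Option<bool> {
    // Length (low 32 bits) and prefix (next 32 bits) checked in one u64 compare.
    if a as u64 != b as u64 {
        return Some(false);
    }
    let len = a as u32;
    if len <= 12 {
        // Inline: the remaining 8 data bytes sit in the upper 64 bits.
        return Some((a >> 64) as u64 == (b >> 64) as u64);
    }
    None
}
```

For non-inline views the upper 64 bits hold a buffer index and offset, which can differ between equal values, hence the `None` fallback.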
run benchmarks
🤖 Benchmark running (GKE): comparing shrink-arrow-bytes-view-map-entry (7542f9e) to c17c87c (merge-base) using clickbench_partitioned
🤖 Benchmark running (GKE): comparing shrink-arrow-bytes-view-map-entry (7542f9e) to c17c87c (merge-base) using tpch
🤖 Benchmark running (GKE): comparing shrink-arrow-bytes-view-map-entry (7542f9e) to c17c87c (merge-base) using tpcds
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
Which issue does this PR close?
Related to memory optimization.
Rationale for this change
`ArrowBytesViewMap` is used for `GROUP BY` and `COUNT DISTINCT` on string/binary view types. Each hash table `Entry` previously stored `{ view: u128, hash: u64, payload: V }` (32+ bytes).
The `view: u128` was redundant with the `views` Vec, and the `payload: V` was always either `()` (for sets) or equivalent to the insertion index (for group-by). Since entries are inserted sequentially, the view index is the group index.
What changes are included in this PR?
- Remove the `V` generic parameter from `ArrowBytesViewMap` and `Entry`
- Replace `view: u128` in `Entry` with `view_idx: usize` (index into the `views` Vec)
- `Entry` shrinks from 32+ bytes to 16 bytes (50%+ reduction in per-entry hash table memory)
- Simplify the `insert_if_new` API from two callbacks (`make_payload_fn`, `observe_payload_fn`) to one (`observe_fn(usize)`)
- Simplify `GroupValuesBytesView` by removing the redundant `num_groups` field
Are these changes tested?
Yes, existing tests pass. Added `test_entry_size` to verify `Entry` is 16 bytes.
Are there any user-facing changes?
No.
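The central observation in the description, that sequential insertion makes the view index equal to the group index, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's implementation: `SketchMap`, `SketchEntry`, and `group_ids` are hypothetical, a `std::collections::HashMap` of buckets stands in for the real hash table, and a `Vec<Vec<u8>>` stands in for the views and data buffers.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical 16-byte entry: index of the stored value plus its hash.
struct SketchEntry {
    view_idx: usize,
    hash: u64,
}

// Stand-in for the map: `values` plays the role of the views Vec.
struct SketchMap {
    values: Vec<Vec<u8>>,
    table: HashMap<u64, Vec<SketchEntry>>,
}

impl SketchMap {
    fn new() -> Self {
        SketchMap { values: Vec::new(), table: HashMap::new() }
    }

    /// Insert `value` if unseen, then report its index through the single
    /// callback. New values are appended, so the index of a value is also
    /// the order in which it was first seen, i.e. its group index.
    fn insert_if_new(&mut self, value: &[u8], mut observe_fn: impl FnMut(usize)) {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let hash = hasher.finish();

        let values = &self.values;
        let bucket = self.table.entry(hash).or_default();
        let existing = bucket
            .iter()
            .find(|e| e.hash == hash && values[e.view_idx].as_slice() == value)
            .map(|e| e.view_idx);

        let idx = match existing {
            Some(idx) => idx,
            None => {
                let idx = self.values.len();
                self.values.push(value.to_vec());
                bucket.push(SketchEntry { view_idx: idx, hash });
                idx
            }
        };
        observe_fn(idx);
    }
}

/// Assign a group index to each input value, in input order.
fn group_ids(inputs: &[&[u8]]) -> Vec<usize> {
    let mut map = SketchMap::new();
    let mut ids = Vec::with_capacity(inputs.len());
    for v in inputs {
        map.insert_if_new(v, |idx| ids.push(idx));
    }
    ids
}
```

Because no payload is stored, a set-style caller (for example `COUNT DISTINCT`) can simply ignore the index passed to the callback, while a group-by caller uses it directly.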
🤖 Generated with Claude Code