perf: speed up primitive group value interning, use less memory #21977
Dandandan wants to merge 2 commits into apache:main from
run benchmarks
🤖 Benchmark completed (GKE): compared speed-up-single-primitive-group-by (b79d787) to 948cd09 (merge-base) using tpch, tpcds, and clickbench_partitioned.
b79d787 to 611f091
}

fn single_group_by_primitive_intern(c: &mut Criterion) {
    const BATCH_SIZE: usize = 4096;
Should the batch size match the default?
EmitTo::All => {
    self.map.clear();
    build_primitive(std::mem::take(&mut self.values), self.null_group.take())
    let mut values = vec![T::default_value(); self.num_groups];
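The excerpt above stops right after the default-filled `values` vector is allocated. As a rough sketch of how such a vector could then be filled from table entries, the function below scatters each `(group_index, value)` pair into its slot and masks the null group. The name `emit_all`, the `Vec<(usize, i64)>` stand-in for the table, and the `Option<i64>` output are all assumptions made for illustration; the rest of the actual `EmitTo::All` arm is not shown in this conversation.

```rust
/// Hypothetical helper: rebuilds group values in group-index order from
/// `(group_index, value)` entries, which stand in for the table contents.
/// `null_group` is the index (if any) reserved for the NULL key.
fn emit_all(
    entries: &mut Vec<(usize, i64)>,
    num_groups: usize,
    null_group: Option<usize>,
) -> Vec<Option<i64>> {
    // One default-initialized slot per group, as in the excerpt above.
    let mut values = vec![i64::default(); num_groups];

    // Draining leaves `entries` empty, mirroring a cleared table.
    for (group, value) in entries.drain(..) {
        values[group] = value;
    }

    // The null group keeps its default slot but is reported as None.
    values
        .into_iter()
        .enumerate()
        .map(|(i, v)| if Some(i) == null_group { None } else { Some(v) })
        .collect()
}
```

Because each entry carries its own group index, the iteration order of the table does not matter: every value is written directly to its final position.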
run benchmarks
run benchmark clickbench_extended
🤖 Benchmark completed (GKE): compared speed-up-single-primitive-group-by (58beeff) to 948cd09 (merge-base) using clickbench_extended, tpch, clickbench_partitioned, and tpcds.
run benchmark clickbench_partitioned
🤖 Benchmark completed (GKE): compared speed-up-single-primitive-group-by (58beeff) to 948cd09 (merge-base) using clickbench_partitioned.
Which issue does this PR close?
N/A.
Rationale for this change
GroupValuesPrimitive stored hashes in the hash table. For single primitive group keys, storing the primitive value in the table lets hashbrown recompute hashes from the value and removes separate hash storage and values-vector lookups in the hot interning path.
What changes are included in this PR?
Store (group_index, value) in GroupValuesPrimitive instead of (group_index, hash) plus a separate values vector.
Benchmarks, baseline vs this branch:
GroupValuesPrimitive_intern/low_cardinality: 138.57 us -> 96.868 us (-30.10%)
GroupValuesPrimitive_intern/high_cardinality: 1.1006 ms -> 1.0390 ms (-5.60%)
aggregate_query_group_by_u64 15 12: 785.50 us -> 754.62 us (-3.93%)
Are these changes tested?
Note:
cargo clippy --all-targets --all-features -- -D warnings currently fails in this workspace because --all-features enables both benchmark allocator features snmalloc and mimalloc; ci/scripts/rust_clippy.sh is the repository CI clippy command and passed.
Are there any user-facing changes?
No.
No.
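To make the layout change from the rationale concrete, here is a minimal std-only sketch of a table whose slots hold `(group_index, value)`. It is not the PR's code: `ValueInterner` is a hypothetical toy linear-probing table standing in for the hashbrown-backed structure, `u64` stands in for the generic primitive type, and null handling is omitted. It only illustrates that equality checks and rehashing on growth can work from the stored value alone, with no stored hash and no separate values vector.

```rust
use std::collections::hash_map::RandomState;
use std::hash::BuildHasher;

/// Toy open-addressing table whose slots hold `(group_index, value)`.
/// Hypothetical stand-in for the real hashbrown-backed structure.
struct ValueInterner {
    slots: Vec<Option<(usize, u64)>>, // capacity is always a power of two
    num_groups: usize,
    state: RandomState,
}

impl ValueInterner {
    fn new() -> Self {
        Self { slots: vec![None; 8], num_groups: 0, state: RandomState::new() }
    }

    /// Slot where probing for `value` starts; the hash is derived from the value.
    fn start(&self, value: u64, capacity: usize) -> usize {
        (self.state.hash_one(value) as usize) & (capacity - 1)
    }

    /// Returns the group index for `value`, assigning a new one on first sight.
    fn intern(&mut self, value: u64) -> usize {
        // Keep the load factor at or below 1/2 so probing always terminates.
        if (self.num_groups + 1) * 2 > self.slots.len() {
            self.grow();
        }
        let mask = self.slots.len() - 1;
        let mut pos = self.start(value, self.slots.len());
        loop {
            match self.slots[pos] {
                // The value lives in the slot, so equality needs no side lookup.
                Some((group, stored)) if stored == value => return group,
                Some(_) => pos = (pos + 1) & mask,
                None => {
                    let group = self.num_groups;
                    self.slots[pos] = Some((group, value));
                    self.num_groups += 1;
                    return group;
                }
            }
        }
    }

    /// Doubles capacity, rehashing from the stored values (no stored hashes).
    fn grow(&mut self) {
        let new_cap = self.slots.len() * 2;
        let old = std::mem::replace(&mut self.slots, vec![None; new_cap]);
        for (group, value) in old.into_iter().flatten() {
            let mut pos = self.start(value, new_cap);
            while self.slots[pos].is_some() {
                pos = (pos + 1) & (new_cap - 1);
            }
            self.slots[pos] = Some((group, value));
        }
    }
}
```

Interning the same value twice returns the same group index, and new values receive consecutive indices in first-seen order. Recomputing a hash for a primitive key is cheap, which is why dropping the stored hash can be a net win in this setting.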