feat: Improved multiple column aggregation performance by using bitmasks rather than Vec<bool>#21886
feat: Improved multiple column aggregation performance by using bitmasks rather than Vec<bool>#21886huymq1710 wants to merge 9 commits intoapache:mainfrom
Vec<bool>#21886Conversation
Vec<bool>Vec<bool>
This reverts commit 31556f6.
This reverts commit 31556f6.
28b5b37 to
27dfdb7
Compare
…q1710/datafusion into column-aggregation-bitmasks
|
@huymq1710 thank you for this PR, did you see any improvement in performance? my original thought was that I can do bulk matching and build mask for every 64 items but because the indexes may not contiguous it will require scatter/gather which is expensive so I did not continue through |
|
run benchmarks |
|
run benchmark aggregate_query_sql |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing column-aggregation-bitmasks (f3f0def) to bbf67d9 (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing column-aggregation-bitmasks (f3f0def) to bbf67d9 (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing column-aggregation-bitmasks (f3f0def) to bbf67d9 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing column-aggregation-bitmasks (f3f0def) to bbf67d9 (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
Even if it doesn't improve perf measurable, I think we can take the changes. I think aggregation is very sensitive to memory / cache usage, so the more we can do to reduce the state the better and at some point it will pay off as aggregation state moves some level up (RAM - > L3, L3 -> L2, etc.) I think there is more low hanging fruit in this area
|
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch — base (merge-base)
tpch — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
run benchmark aggregate_query_sql |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing column-aggregation-bitmasks (f878431) to bbf67d9 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
Thanks for the reviews and running the benchmarks Performance appears neutral (1.00x - 1.03x), so the benefit is primarily the state reduction and memory efficiency mentioned by @Dandandan My goal is to learn DataFusion's internals, so I'm happy to either merge or close it if you'd prefer to keep the current implementation |
Which issue does this PR close?
Vec<bool>#18676Rationale for this change
Using bitmasks rather than
Vec<bool>What changes are included in this PR?
BooleanBufferBuilderwithas_slice_mut()+apply_bitwise_binary_opfor the primitive non-nullable pathAre these changes tested?
Are there any user-facing changes?
No