perf(parquet): LevelInfoBuilder batch write when no repetition childs#10037
perf(parquet): LevelInfoBuilder batch write when no repetition childs#10037mapleFU wants to merge 3 commits into
Conversation
| /// Returns `true` if the child contains no nested repetition levels, meaning | ||
| /// each child element produces exactly one rep_level entry in the leaf. | ||
| /// This is true for `Primitive` children and `Struct` trees with no list descendants. | ||
| fn child_has_no_nested_rep(&self) -> bool { |
There was a problem hiding this comment.
This can be stored as an member of List element, avoiding querying nested.
| // contiguous non-empty list slots into a single child.write() call, then | ||
| // fix up the rep_levels at list-slot boundaries using offsets directly. | ||
| if child.child_has_no_nested_rep() { | ||
| Self::write_list_last_level(child, ctx, offsets, nulls, range); |
There was a problem hiding this comment.
This doesn't works well on the case of List<Struct<a: List<i32>, b:i32, ...>, once any child has repetition level, the performance would be hurted and cannot batching the write.
|
|
||
| // Classify each slot then detect run boundaries. On each transition | ||
| // (or end of iteration), flush the completed run. | ||
| #[derive(Clone, Copy, PartialEq)] |
There was a problem hiding this comment.
I think the code here is much cleaner...
There was a problem hiding this comment.
list_primitive_sparse_99pct_null/bloom_filter 1.00 10.9±0.05ms 3.4 GB/sec 1.19 13.0±0.04ms 2.8 GB/sec
list_primitive_sparse_99pct_null/cdc 1.00 22.7±0.11ms 1645.6 MB/sec 1.09 24.7±0.07ms 1510.1 MB/sec
list_primitive_sparse_99pct_null/default 1.00 10.6±0.08ms 3.5 GB/sec 1.21 12.8±0.65ms 2.9 GB/sec
list_primitive_sparse_99pct_null/parquet_2 1.00 10.6±0.12ms 3.4 GB/sec 1.19 12.7±0.02ms 2.9 GB/sec
list_primitive_sparse_99pct_null/zstd 1.00 12.5±0.11ms 2.9 GB/sec 1.16 14.5±0.03ms 2.5 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2 1.00 10.8±0.08ms 3.4 GB/sec 1.18 12.8±0.03ms 2.8 GB/sec
I also reproduced this regression on my local env. Then I tried to change this back to the previous plain if-else chain flavor, and the regression seems gone. I wonder if this three-value enum breaks a tight if(false) loop that can be optimized/executed better 🥲
There was a problem hiding this comment.
oops, so this path can be optimized
There was a problem hiding this comment.
Pull request overview
Adds a fast path to LevelInfoBuilder::write_list that batches contiguous non-empty list slots into a single child write when the list child has no nested repetition levels (i.e., primitive children or struct trees of primitives). Previously, non-empty list slots were written one at a time with a reverse scan to mark list boundaries; the new path writes all leaf values in one call and stamps list-start markers at positions computed directly from offsets. This also benefits Map arrays (which internally dispatch to write_list) when their entries are primitive/struct.
Changes:
- New helper
child_has_no_nested_rep()to detect whether the list's child path contains list-like nesting. - New
write_list_last_level()that classifies slots into Null/Empty/NonEmpty runs and emits each kind in a batch. write_listdispatches to the new fast path when applicable.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
run benchmark arrow_writer |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing minor-optimize-batching (dd6d44e) to 1377761 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
Those look like nice improvements for list writing 🚀 |
Which issue does this PR close?
Rationale for this change
Parquet writer writes lists element one by one, this is extremly slow. This patch batches writes.
What changes are included in this PR?
Batches write when writing list with maximum rep level.
Are these changes tested?
Covered by existing
Are there any user-facing changes?
No