perf(interleave): Optimize list interleave_list when child is primitive #10025
mapleFU wants to merge 9 commits into
Benchmark shows |
| ); | ||
| } | ||
| None => { | ||
| // Slow path. For a non-nullable source, set the bit range to all 1s directly. |
I think this case rarely happens, so it just uses the slow path.
I also couldn't find a set_bits function for this. It would be easy to add, but I think we can add it separately.
| /// Specialized interleave for list child arrays that are primitive. | ||
| /// Directly copies typed value slices and null bit ranges without | ||
| /// going through MutableArrayData's function pointer indirection. | ||
| fn interleave_list_primitive_child<O: OffsetSizeTrait, T: ArrowPrimitiveType>( |
I previously used MutableArrayData here, but it was about 15% slower than this implementation.
| // For primitive child types, directly copy typed value slices and null bit | ||
| // ranges, avoiding both the intermediate child_indices Vec allocation and | ||
| // MutableArrayData's function pointer indirection. | ||
| field.data_type() => (list_primitive_helper), |
I think this only helps types that can be copied quickly. List<List<...>> still needs its own optimizations.
| _ => { | ||
| // For complex child types (nested lists, structs, views, dictionaries, etc.), | ||
| // use recursive interleave to benefit from type-specific optimizations. | ||
| let mut child_indices = Vec::with_capacity(capacity); |
This keeps the previous code
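For readers unfamiliar with the fallback path, here is a minimal std-only sketch of the index-based approach the comment in the diff describes: expand every selected list into per-element (source, position) pairs, then gather the child values one at a time. `ListColumn` and `interleave_via_child_indices` are hypothetical names made up for illustration. They are not arrow-rs types, and the real code delegates the gather step to the recursive interleave kernel.

```rust
/// Hypothetical stand-in for a list array: offsets plus a flat child vector.
/// `offsets` has one more entry than there are lists.
struct ListColumn {
    offsets: Vec<usize>,
    values: Vec<i64>,
}

/// Interleave lists by first materializing one (source, position) pair per
/// child element, then gathering the child values in a second pass.
fn interleave_via_child_indices(
    sources: &[ListColumn],
    indices: &[(usize, usize)],
) -> ListColumn {
    let mut offsets = vec![0usize];
    // Intermediate allocation: one entry per child element in the output.
    let mut child_indices: Vec<(usize, usize)> = Vec::new();
    for &(src, row) in indices {
        let start = sources[src].offsets[row];
        let end = sources[src].offsets[row + 1];
        child_indices.extend((start..end).map(|pos| (src, pos)));
        offsets.push(child_indices.len());
    }
    // Second pass: element-by-element gather (a recursive interleave on the
    // child array would take this role in the real kernel).
    let values = child_indices
        .iter()
        .map(|&(src, pos)| sources[src].values[pos])
        .collect();
    ListColumn { offsets, values }
}
```

This shape works for any child type, which is why it remains the fallback, at the cost of the extra `child_indices` vector and a per-element gather.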
Pull request overview
This PR optimizes interleave_list for List<Primitive> (and LargeList<Primitive>) by introducing a specialized fast path that avoids building per-element child indices and instead copies contiguous slices of primitive values and validity bits.
Changes:
- Added interleave_list_primitive_child to build interleaved primitive child arrays via contiguous slice copies
- Updated interleave_list to dispatch to the new fast path using downcast_primitive!, falling back to recursive interleave for non-primitive children
- Added bitmap utilities (set_bits, bit_util) to efficiently construct the child validity buffer
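The fast path summarized above can be sketched without arrow-rs as follows. This is an illustration under simplifying assumptions, not the PR's implementation: `PrimitiveListColumn` and `interleave_by_slice_copy` are hypothetical names, validity is modeled as `Vec<bool>` rather than a packed bitmap, and the child type is fixed to `i32` instead of being generic.

```rust
/// Hypothetical stand-in for List<Primitive>: offsets, flat values, and
/// optional per-element validity (`None` means every element is valid).
struct PrimitiveListColumn {
    offsets: Vec<usize>,
    values: Vec<i32>,
    validity: Option<Vec<bool>>,
}

/// Interleave by copying each selected list's child range as one contiguous
/// slice, with no per-element index vector.
fn interleave_by_slice_copy(
    sources: &[PrimitiveListColumn],
    indices: &[(usize, usize)],
) -> PrimitiveListColumn {
    // Pre-compute the output child length so buffers are allocated once.
    let total: usize = indices
        .iter()
        .map(|&(s, r)| sources[s].offsets[r + 1] - sources[s].offsets[r])
        .sum();
    let has_validity = sources.iter().any(|s| s.validity.is_some());

    let mut offsets = Vec::with_capacity(indices.len() + 1);
    offsets.push(0);
    let mut values = Vec::with_capacity(total);
    let mut validity = Vec::with_capacity(total);

    for &(s, r) in indices {
        let src = &sources[s];
        let (start, end) = (src.offsets[r], src.offsets[r + 1]);
        // One contiguous copy per selected list.
        values.extend_from_slice(&src.values[start..end]);
        match &src.validity {
            Some(bits) => validity.extend_from_slice(&bits[start..end]),
            // Non-nullable source: the whole range is valid.
            None => validity.extend(std::iter::repeat(true).take(end - start)),
        }
        offsets.push(values.len());
    }

    PrimitiveListColumn {
        offsets,
        values,
        validity: if has_validity { Some(validity) } else { None },
    }
}
```

Each selected list costs two slice copies (values and validity) regardless of its length, which is where the saving over a per-element gather would come from.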
| // Slow path. For a non-nullable source, set the bit range to all 1s directly. | ||
| let buf = null_buf.as_slice_mut(); | ||
| (offset_write..offset_write + len).for_each(|i| bit_util::set_bit(buf, i)); | ||
| } |
I don't know whether set_bits works well for a sequence of 0xFF bytes...
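As a point of comparison for the per-bit loop in the slow path, a range of bits can be set to 1 with at most two masked bytes plus a bulk fill of whole bytes. The sketch below is std-only and assumes LSB-first bit numbering within each byte, as used by Arrow validity bitmaps. `set_bit_range` is a hypothetical helper name, not an existing arrow-rs function.

```rust
/// Set `len` bits starting at bit index `start` to 1.
/// Bits are numbered LSB-first within each byte.
/// Panics if the range extends past the end of `buf`.
fn set_bit_range(buf: &mut [u8], start: usize, len: usize) {
    if len == 0 {
        return;
    }
    let end = start + len; // exclusive
    let first_byte = start / 8;
    let last_byte = (end - 1) / 8;

    // Mask covering bits from `start % 8` up to bit 7 of the first byte.
    let head_mask = 0xFFu8 << (start % 8);
    // Mask covering bit 0 up to the last bit of the range in the last byte.
    let tail_bits = end % 8;
    let tail_mask = if tail_bits == 0 {
        0xFF
    } else {
        0xFFu8 >> (8 - tail_bits)
    };

    if first_byte == last_byte {
        // The range starts and ends inside the same byte.
        buf[first_byte] |= head_mask & tail_mask;
        return;
    }
    buf[first_byte] |= head_mask;
    // Whole bytes in the middle are written in bulk.
    buf[first_byte + 1..last_byte].fill(0xFF);
    buf[last_byte] |= tail_mask;
}
```

For example, `set_bit_range(&mut buf, 3, 10)` on a zeroed buffer ORs `0xF8` into byte 0 and `0x1F` into byte 1. Whether this beats the per-bit loop in practice would need a benchmark, since the review notes this path is rarely taken.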
c1229f1 to 63c76e4
run benchmark interleave_kernels
🤖 Arrow criterion benchmark running (GKE)
Comparing optimize-list-interleave-primitive (2a619c4) to e470187 (merge-base)
🤖 Arrow criterion benchmark completed (GKE)
Which issue does this PR close?
Rationale for this change
Optimize interleave_list when the child is a primitive type.
What changes are included in this PR?
interleave_list_primitive_child function
Are these changes tested?
Covered by existing
Are there any user-facing changes?
no