-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Open
Open
Potential Improved multiple column aggregation performance by using bitmasks rather than
Vec<bool>#18676Feature
Copy link
Labels
enhancementNew feature or requestNew feature or request
Description
Is your feature request related to a problem or challenge?
once and if we change from &mut [bool] to mutable packed bits we could:
- evaluate in chunks of 64 items (I tried different variations to see what is the best - you can tweak in the godbolt above with different type and size to check for yourself), 64 is not necessarily the best but it will be the fastest I think for doing AND with the equal_to_results boolean buffer
- add optimization for nullable as well by just doing bitwise operation at 64 items at a time and avoid the cost of getting each bit manually
- skip 64 items right away if the the equal_to_results equal to 0x00 (i.e. all false)
I believe he is referring to this code:
datafusion/datafusion/physical-plan/src/aggregates/group_values/multi_group_by/mod.rs
Lines 71 to 86 in 73038f5
| /// The vectorized version equal to | |
| /// | |
| /// When found nth row stored in this builder at `lhs_row` | |
| /// is equal to the row in `array` at `rhs_row`, | |
| /// it will record the `true` result at the corresponding | |
| /// position in `equal_to_results`. | |
| /// | |
| /// And if found nth result in `equal_to_results` is already | |
| /// `false`, the check for nth row will be skipped. | |
| fn vectorized_equal_to( | |
| &self, | |
| lhs_rows: &[usize], | |
| array: &ArrayRef, | |
| rhs_rows: &[usize], | |
| equal_to_results: &mut [bool], | |
| ); |
Describe the solution you'd like
So basically rather than passing in &[mut bool] it would take a BooleanBufferBuilder or something equivalent.
fn vectorized_equal_to(
&self,
lhs_rows: &[usize],
array: &ArrayRef,
rhs_rows: &[usize],
equal_to_results: &BooleanBufferBuilder, // <--- Pass in some sort of bitmask representation rather than Vec<bool>
);Describe alternatives you've considered
No response
Additional context
No response
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request