Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Rust] improve performance of filter kernel #25591

Closed
asfimport opened this issue Jul 19, 2020 · 1 comment
Closed

[Rust] improve performance of filter kernel #25591

asfimport opened this issue Jul 19, 2020 · 1 comment

Comments

@asfimport
Copy link

The filter kernel located here https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/filter.rs

currently has the following performance:

filter old u8 low selectivity time: [1.7782 ms 1.7790 ms 1.7801 ms]
filter old u8 high selectivity time: [815.58 us 816.58 us 817.57 us]
filter old u8 w NULLs low selectivity time: [1.8131 ms 1.8231 ms 1.8336 ms]
filter old u8 w NULLs high selectivity time: [817.41 us 820.01 us 823.05 us]

I have been working on a new implementation which performs between approximately 14 and 480 times faster depending mostly on filter selectivity. Here are the benchmark results:

filter u8 low selectivity time: [127.30 us 128.06 us 128.88 us]
filter u8 high selectivity time: [5.4215 us 5.5778 us 5.7335 us]
filter context u8 low selectivity time: [124.21 us 126.21 us 128.38 us]
filter context u8 high selectivity time: [1.6707 us 1.7052 us 1.7476 us]
filter context u8 w NULLs low selectivity time: [142.40 us 142.83 us 143.37 us]
filter context u8 w NULLs high selectivity time: [2.3338 us 2.3788 us 2.4304 us]
filter context f32 low selectivity time: [132.59 us 132.91 us 133.29 us]
filter context f32 high selectivity time: [1.6864 us 1.7026 us 1.7212 us]

This new implementation is based on a few key ideas:

(1) if the data array being filtered doesn't have a null bitmap, no time should be wasted to copy or create a null bitmap in the resulting filtered data array - this is implemented using the CopyNullBit trait which has a no-op implementation and an actual implementation

(2) when the filter is highly selective, e.g. only a small number of values from the data array are selected, the filter implementation should efficiently skip entire batches of 0s in the filter array - this is implemented by transmuting the filter array to u64 which allows to quickly check and skip entire batches of 64 bits 

(3) when an entire record batch is filtered, any computation which only depends on the filter array is done once and then shared for filtering all the data arrays in the record batch - this is implemented using the FilterContext struct

 

@paddyhoran, @andygrove  let me know what you think

Reporter: Yordan Pavlov / @yordan-pavlov

PRs and other links:

Note: This issue was originally created as ARROW-9523. Please see the migration documentation for further details.

@asfimport
Copy link
Author

Andy Grove / @andygrove:
Issue resolved by pull request 7798
#7798

@asfimport asfimport added this to the 2.0.0 milestone Jan 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant