ARROW-17305: [C++] Avoid spending time in popcount in BitmapAnd benchmark #13794
This was artificially limiting the reported performance of BitmapAnd.

Before:
```
--------------------------------------------------------------------------------------
Benchmark                        Time        CPU   Iterations  UserCounters...
--------------------------------------------------------------------------------------
BenchmarkBitmapAnd/32768/0     1708 ns    1708 ns       408579  bytes_per_second=17.8726G/s
BenchmarkBitmapAnd/131072/0    6968 ns    6965 ns       102223  bytes_per_second=17.5262G/s
BenchmarkBitmapAnd/32768/1     3982 ns    3981 ns       175136  bytes_per_second=7.66574G/s
BenchmarkBitmapAnd/131072/1   15574 ns   15569 ns        44988  bytes_per_second=7.8404G/s
BenchmarkBitmapAnd/32768/2     3999 ns    3998 ns       175021  bytes_per_second=7.63248G/s
BenchmarkBitmapAnd/131072/2   15589 ns   15585 ns        44844  bytes_per_second=7.83234G/s
```

After:
```
--------------------------------------------------------------------------------------
Benchmark                        Time        CPU   Iterations  UserCounters...
--------------------------------------------------------------------------------------
BenchmarkBitmapAnd/32768/0      732 ns     732 ns       967465  bytes_per_second=41.6736G/s
BenchmarkBitmapAnd/131072/0    3105 ns    3105 ns       229726  bytes_per_second=39.3198G/s
BenchmarkBitmapAnd/32768/1     2913 ns    2913 ns       240233  bytes_per_second=10.4774G/s
BenchmarkBitmapAnd/131072/1   11528 ns   11526 ns        60865  bytes_per_second=10.5912G/s
BenchmarkBitmapAnd/32768/2     2924 ns    2924 ns       236873  bytes_per_second=10.4378G/s
BenchmarkBitmapAnd/131072/2   11552 ns   11550 ns        60619  bytes_per_second=10.5691G/s
```

(I didn't check, but the compiler here probably auto-vectorizes the aligned code path.)
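The gap between the `/0` rows and the `/1` and `/2` rows is easier to see in a small sketch, assuming the second benchmark argument is a bit offset into the input bitmaps. This is not Arrow's `BitmapAnd`: the names `AndAligned`, `LoadShifted` and `AndUnaligned` are hypothetical, and the code assumes a little-endian machine and Arrow's LSB-first bit order. With offset 0 the loop is a plain word-wise AND that compilers can readily auto-vectorize; with a nonzero offset each output word has to be stitched together from shifted loads.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Aligned case (both bitmaps start at bit offset 0): plain word-wise AND.
// A loop of this shape is an easy target for compiler auto-vectorization.
void AndAligned(const uint8_t* left, const uint8_t* right, uint8_t* out,
                int64_t nbytes) {
  int64_t i = 0;
  for (; i + 8 <= nbytes; i += 8) {
    uint64_t a, b;
    std::memcpy(&a, left + i, 8);
    std::memcpy(&b, right + i, 8);
    const uint64_t r = a & b;
    std::memcpy(out + i, &r, 8);
  }
  for (; i < nbytes; ++i) out[i] = left[i] & right[i];  // byte tail
}

// Hypothetical helper: load 64 bits starting at an arbitrary bit offset,
// LSB-first bit order. Assumes little-endian; reads 9 bytes when the offset
// is not byte-aligned, so the caller must make sure that byte exists.
uint64_t LoadShifted(const uint8_t* bits, int64_t bit_offset) {
  assert(bit_offset >= 0);
  const uint8_t* p = bits + bit_offset / 8;
  const int shift = static_cast<int>(bit_offset % 8);
  uint64_t lo;
  std::memcpy(&lo, p, 8);
  if (shift == 0) return lo;
  const uint64_t hi = p[8];
  return (lo >> shift) | (hi << (64 - shift));
}

// Unaligned case: every output word is built from two shifted input loads,
// extra work the aligned loop does not need. The output starts at bit
// offset 0 and covers nwords 64-bit words.
void AndUnaligned(const uint8_t* left, int64_t left_offset,
                  const uint8_t* right, int64_t right_offset,
                  uint8_t* out, int64_t nwords) {
  for (int64_t w = 0; w < nwords; ++w) {
    const uint64_t r = LoadShifted(left, left_offset + 64 * w) &
                       LoadShifted(right, right_offset + 64 * w);
    std::memcpy(out + 8 * w, &r, 8);
  }
}
```

Both paths compute the same bits when the offsets are 0; the unaligned one simply does more shifting and loading per word, which is consistent with the lower `/1` and `/2` throughput.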
Updated from 8d94169 to f2f03a1.
+1. I don't really know the details of what must be done to prevent the compiler from optimizing away a computation, but although the new numbers are fast, I'm pretty sure they are also realistic. I get similar numbers on my system, and I think it works out to ~8 bytes/cycle.
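On the question of how to stop the compiler from discarding the work: one common approach is an optimization barrier rather than a reduction over the output. The PR title indicates the old benchmark spent time in a popcount over the result; the sketch below contrasts that kind of sink with a barrier. It is not necessarily what this PR changed, it assumes GCC or Clang (for the inline-asm idiom and `__builtin_popcount`), and `KeepAlive`, `PopcountSink` and `AndOnce` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical optimization barrier (GCC/Clang inline-asm idiom, similar in
// spirit to benchmark::DoNotOptimize). The empty asm "uses" p and clobbers
// memory, so the compiler must keep the stores that produced *p.
inline void KeepAlive(const void* p) {
  __asm__ __volatile__("" : : "r"(p) : "memory");
}

// Expensive way to keep the result observable: a popcount over the whole
// output, i.e. a second full pass on top of the AND itself.
int64_t PopcountSink(const uint8_t* out, int64_t nbytes) {
  int64_t count = 0;
  for (int64_t i = 0; i < nbytes; ++i) count += __builtin_popcount(out[i]);
  return count;
}

// Cheaper way: do the AND, then pass the output pointer through the barrier.
// The barrier itself costs O(1) per call instead of O(nbytes).
void AndOnce(const uint8_t* a, const uint8_t* b, uint8_t* out,
             int64_t nbytes) {
  assert(nbytes >= 0);
  for (int64_t i = 0; i < nbytes; ++i) out[i] = a[i] & b[i];
  KeepAlive(out);
}
```

Inside a Google Benchmark loop, the equivalent would be passing the output buffer's pointer to `benchmark::DoNotOptimize` (or calling `benchmark::ClobberMemory()`) after the AND, so the timed region measures the AND and not an extra counting pass.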
Yep, if I set …
CI error is unrelated.
Benchmark runs are scheduled for baseline = 81ded07 and contender = 56e6caf. 56e6caf is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
['Python', 'R'] benchmarks have a high level of regressions.