ARROW-17305: [C++] Avoid spending time in popcount in BitmapAnd benchmark #13794

pitrou · 2022-08-04T09:26:02Z

This was artificially limiting the reported performance of BitmapAnd.

Before:

--------------------------------------------------------------------------------------
Benchmark                            Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------
BenchmarkBitmapAnd/32768/0        1708 ns         1708 ns       408579 bytes_per_second=17.8726G/s
BenchmarkBitmapAnd/131072/0       6968 ns         6965 ns       102223 bytes_per_second=17.5262G/s
BenchmarkBitmapAnd/32768/1        3982 ns         3981 ns       175136 bytes_per_second=7.66574G/s
BenchmarkBitmapAnd/131072/1      15574 ns        15569 ns        44988 bytes_per_second=7.8404G/s
BenchmarkBitmapAnd/32768/2        3999 ns         3998 ns       175021 bytes_per_second=7.63248G/s
BenchmarkBitmapAnd/131072/2      15589 ns        15585 ns        44844 bytes_per_second=7.83234G/s

After:

--------------------------------------------------------------------------------------
Benchmark                            Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------
BenchmarkBitmapAnd/32768/0         732 ns          732 ns       967465 bytes_per_second=41.6736G/s
BenchmarkBitmapAnd/131072/0       3105 ns         3105 ns       229726 bytes_per_second=39.3198G/s
BenchmarkBitmapAnd/32768/1        2913 ns         2913 ns       240233 bytes_per_second=10.4774G/s
BenchmarkBitmapAnd/131072/1      11528 ns        11526 ns        60865 bytes_per_second=10.5912G/s
BenchmarkBitmapAnd/32768/2        2924 ns         2924 ns       236873 bytes_per_second=10.4378G/s
BenchmarkBitmapAnd/131072/2      11552 ns        11550 ns        60619 bytes_per_second=10.5691G/s

(I didn't check, but the compiler here probably auto-vectorizes the aligned code path)
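For readers who want to see the general idea, here is a minimal, self-contained sketch of the two ways a benchmark loop can keep a result alive. It is not the diff from this PR: `AndBitmaps`, `CountSetBits`, `KeepAlive`, `RunWithPopcount` and `RunWithBarrier` are hypothetical names written for illustration, and the inline-asm barrier assumes GCC or Clang. The first loop consumes the output with a popcount, so the popcount's cost is included in whatever is timed; the second only tells the compiler the output buffer may be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the kernel under test: byte-wise AND of two bitmaps.
void AndBitmaps(const std::vector<std::uint8_t>& left,
                const std::vector<std::uint8_t>& right,
                std::vector<std::uint8_t>* out) {
  assert(left.size() == right.size());
  out->resize(left.size());
  for (std::size_t i = 0; i < left.size(); ++i) {
    (*out)[i] = left[i] & right[i];
  }
}

// Counts set bits across the whole buffer (an extra full pass over the data).
std::int64_t CountSetBits(const std::vector<std::uint8_t>& bitmap) {
  std::int64_t total = 0;
  for (std::uint8_t byte : bitmap) {
    while (byte != 0) {
      byte &= byte - 1;  // clear the lowest set bit
      ++total;
    }
  }
  return total;
}

// Hypothetical helper: makes the compiler assume the pointed-to memory is read.
// The asm form assumes GCC/Clang; the fallback only stores the pointer.
inline void KeepAlive(const void* p) {
#if defined(__GNUC__) || defined(__clang__)
  asm volatile("" : : "r"(p) : "memory");
#else
  static const void* volatile sink;
  sink = p;
#endif
}

// Variant 1: the result is kept alive by a popcount, which is also measured.
std::int64_t RunWithPopcount(const std::vector<std::uint8_t>& left,
                             const std::vector<std::uint8_t>& right,
                             int iterations) {
  std::vector<std::uint8_t> out;
  std::int64_t total = 0;
  for (int i = 0; i < iterations; ++i) {
    AndBitmaps(left, right, &out);
    total += CountSetBits(out);
  }
  return total;
}

// Variant 2: the result is kept alive by a barrier that does no extra work.
std::vector<std::uint8_t> RunWithBarrier(const std::vector<std::uint8_t>& left,
                                         const std::vector<std::uint8_t>& right,
                                         int iterations) {
  std::vector<std::uint8_t> out;
  for (int i = 0; i < iterations; ++i) {
    AndBitmaps(left, right, &out);
    KeepAlive(out.data());
  }
  return out;
}
```

Both variants compute the same AND; only the second keeps the loop body limited to the operation being measured.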

github-actions · 2022-08-04T09:26:50Z

https://issues.apache.org/jira/browse/ARROW-17305

github-actions · 2022-08-04T09:26:51Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

westonpace

+1. I don't really know the details of what must be done to prevent the compiler from optimizing away a computation, but while the new numbers are fast, I'm pretty sure they are also realistic. I get similar numbers on my system and I think it works out to ~8 bytes/cycle.
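As a rough way to sanity-check figures like these, the sketch below converts a reported time into throughput and bytes per cycle. It assumes the first benchmark argument (e.g. 32768) is the number of bytes processed per iteration and that `G` means 2^30; both assumptions reproduce the reported `bytes_per_second` values to within rounding, but neither is confirmed in this thread. The function names and the clock frequency parameter are hypothetical; no machine's clock speed is given here.

```cpp
#include <cassert>
#include <cmath>

// Throughput in GiB/s for `bytes` processed in `nanoseconds`.
double GiBPerSecond(double bytes, double nanoseconds) {
  assert(nanoseconds > 0.0);
  const double seconds = nanoseconds * 1e-9;
  return bytes / seconds / (1024.0 * 1024.0 * 1024.0);
}

// Bytes processed per CPU cycle; `clock_ghz` is an assumed clock frequency.
// nanoseconds * GHz gives the number of elapsed cycles.
double BytesPerCycle(double bytes, double nanoseconds, double clock_ghz) {
  assert(nanoseconds > 0.0 && clock_ghz > 0.0);
  return bytes / (nanoseconds * clock_ghz);
}

// Small comparison helper for checking against a reported counter.
bool ApproxEqual(double a, double b, double tolerance) {
  return std::fabs(a - b) <= tolerance;
}
```

For example, `GiBPerSecond(32768, 732)` comes out at about 41.69, close to the 41.6736G/s reported for `BenchmarkBitmapAnd/32768/0` after the change. With a purely illustrative 4 GHz clock, `BytesPerCycle(32768, 732, 4.0)` is about 11.2; the real value depends on the actual clock of the machine that produced the numbers.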

westonpace · 2022-08-04T22:35:46Z

Yep, if I set ARROW_SIMD_LEVEL=NONE then I get slower performance (it's actually pretty similar to your "before" numbers), so there must be some kind of auto-vectorization going on.

cyb70289 · 2022-08-05T03:01:14Z

CI error not related

ursabot · 2022-08-05T05:21:48Z

Benchmark runs are scheduled for baseline = 81ded07 and contender = 56e6caf. 56e6caf is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Failed ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.27% ⬆️0.65%] test-mac-arm
[Finished ⬇️0.54% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.43% ⬆️1.03%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 56e6caf0 ec2-t3-xlarge-us-east-2
[Failed] 56e6caf0 test-mac-arm
[Finished] 56e6caf0 ursa-i9-9960x
[Finished] 56e6caf0 ursa-thinkcentre-m75q
[Failed] 81ded071 ec2-t3-xlarge-us-east-2
[Finished] 81ded071 test-mac-arm
[Finished] 81ded071 ursa-i9-9960x
[Finished] 81ded071 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot · 2022-08-05T05:22:03Z

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

pitrou requested a review from westonpace August 4, 2022 09:26

github-actions bot added the Component: C++ label Aug 4, 2022

pitrou force-pushed the ARROW-17305-bitmap-and-popcount branch from 8d94169 to f2f03a1 Compare August 4, 2022 15:03

westonpace approved these changes Aug 4, 2022

cyb70289 approved these changes Aug 5, 2022

cyb70289 merged commit 56e6caf into apache:master Aug 5, 2022

pitrou deleted the ARROW-17305-bitmap-and-popcount branch August 5, 2022 06:00
