ARROW-12533: [C++] Add random real distribution function #10283

Closed
wants to merge 3 commits into from

cyb70289
Contributor

Clang with GNU libstdc++ generates very slow code for producing random
real numbers on Arm64.

This patch implements three random utilities based on Clang's libc++:

  • std::generate_canonical
  • std::uniform_real_distribution
  • std::bernoulli_distribution

It brings a ~100x speedup on Arm64 and ~8x on x86_64 in generating
random reals when Arrow is built with clang + GNU libstdc++.
Builds with gcc + libstdc++ or clang + libc++ are not affected.
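
For readers unfamiliar with how the three utilities relate, here is a minimal sketch. It is not the code in this patch and not libc++'s algorithm: it assumes a full-range 64-bit engine such as `std::mt19937_64` and uses the simpler "keep the top 53 bits" technique. The names `CanonicalDouble`, `UniformReal`, and `Bernoulli` are hypothetical. The point is only that a uniform real distribution and a Bernoulli distribution can both be layered on one canonical `[0, 1)` generator.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper (not from the patch): maps one 64-bit engine output to
// a double in [0, 1). It keeps the top 53 bits, the width of a double's
// significand, and scales them by 2^-53.
template <typename Engine>
double CanonicalDouble(Engine& engine) {
  static_assert(Engine::min() == 0 && Engine::max() == UINT64_MAX,
                "this sketch assumes a full-range 64-bit engine");
  return static_cast<double>(engine() >> 11) * (1.0 / 9007199254740992.0);
}

// Hypothetical uniform real distribution: scales the canonical value
// into the range starting at a and ending just below b.
class UniformReal {
 public:
  UniformReal(double a, double b) : a_(a), b_(b) { assert(a < b); }

  template <typename Engine>
  double operator()(Engine& engine) const {
    return a_ + (b_ - a_) * CanonicalDouble(engine);
  }

 private:
  double a_;
  double b_;
};

// Hypothetical Bernoulli distribution: returns true with probability p
// by comparing the canonical value against p.
class Bernoulli {
 public:
  explicit Bernoulli(double p) : p_(p) { assert(p >= 0.0 && p <= 1.0); }

  template <typename Engine>
  bool operator()(Engine& engine) const {
    return CanonicalDouble(engine) < p_;
  }

 private:
  double p_;
};
```

With `std::mt19937_64 rng(42);`, calling `Bernoulli(0.01)(rng)` would yield `true` roughly once per hundred calls, which is the kind of call a benchmark setup makes when generating random booleans or null masks.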

cyb70289 commented May 10, 2021

NOTE: The benchmark results below are unchanged, but the total run time differs greatly. The time saved comes from test preparation (generating random bools by calling bernoulli_distribution), not from the benchmarked code itself.
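
One way to observe this effect in isolation is to time only the data-generation step. The following is a minimal sketch, not Arrow's benchmark code: `MakeRandomBools` and `TimeSetupMs` are hypothetical helpers standing in for the benchmark's setup phase, and the measured time will depend on which compiler and standard library are used.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for benchmark setup: fills a vector with random
// booleans drawn from the standard library's bernoulli_distribution.
std::vector<bool> MakeRandomBools(std::size_t n, double true_probability,
                                  std::uint64_t seed) {
  std::mt19937_64 rng(seed);
  std::bernoulli_distribution dist(true_probability);
  std::vector<bool> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = dist(rng);
  }
  return out;
}

// Times only the setup phase and returns the elapsed wall time in
// milliseconds.
double TimeSetupMs(std::size_t n) {
  const auto start = std::chrono::steady_clock::now();
  const std::vector<bool> values = MakeRandomBools(n, 0.5, 42);
  const auto stop = std::chrono::steady_clock::now();
  // Read one element so the compiler cannot discard the generated data.
  volatile bool sink = values.empty() ? false : values[0];
  (void)sink;
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Comparing `TimeSetupMs(1048576)` across clang + libstdc++ and clang + libc++ builds would show whether setup, rather than the kernel under test, dominates the total run time.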

Tested on Neoverse N1. Clang-10, libstdc++-9.

Before: 59.9s total run time

$ time release/arrow-compute-aggregate-benchmark --benchmark_filter="ModeKernelNarrow<BooleanType>/1048576/10000"
------------------------------------------------------------------------------------------------------
Benchmark                                            Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------
ModeKernelNarrow<BooleanType>/1048576/10000        637 us          637 us         1097 bytes_per_second=1.53351G/s null_percent=0.01 size=1048.58k

real    0m59.902s
user    0m59.880s
sys     0m0.020s

After: 1.2s total run time

$ time release/arrow-compute-aggregate-benchmark --benchmark_filter="ModeKernelNarrow<BooleanType>/1048576/10000"
------------------------------------------------------------------------------------------------------
Benchmark                                            Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------
ModeKernelNarrow<BooleanType>/1048576/10000        637 us          637 us         1098 bytes_per_second=1.53253G/s null_percent=0.01 size=1048.58k

real    0m1.210s
user    0m1.202s
sys     0m0.008s

cyb70289 commented May 10, 2021

Tested on Xeon Gold 5218. Clang-10, libstdc++-9.

Before: 5.4s total run time

$ time release/arrow-compute-aggregate-benchmark --benchmark_filter="ModeKernelNarrow<BooleanType>/1048576/10000"
------------------------------------------------------------------------------------------------------
Benchmark                                            Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------
ModeKernelNarrow<BooleanType>/1048576/10000        550 us          550 us         1279 bytes_per_second=1.7744G/s null_percent=0.01 size=1048.58k

real    0m5.445s
user    0m5.417s
sys     0m0.008s

After: 1.2s total run time

$ time release/arrow-compute-aggregate-benchmark --benchmark_filter="ModeKernelNarrow<BooleanType>/1048576/10000"
------------------------------------------------------------------------------------------------------
Benchmark                                            Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------
ModeKernelNarrow<BooleanType>/1048576/10000        551 us          551 us         1267 bytes_per_second=1.7723G/s null_percent=0.01 size=1048.58k

real    0m1.197s
user    0m1.180s
sys     0m0.000s

@cyb70289
Contributor Author

Timed a run of all arrow-compute-*-benchmark tests on x86 with clang. Total time drops from 26 min to 18 min.

@pitrou
Member

pitrou commented May 11, 2021

Thank you @cyb70289 !

@cyb70289 cyb70289 deleted the 12533-random-real branch May 11, 2021 09:12
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021
Closes apache#10283 from cyb70289/12533-random-real

Authored-by: Yibo Cai <yibo.cai@arm.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>