Add more cuda/hip reducer tunings #1625
I'm not sure why the nvcc 10.1.243 + gcc 8.3.1 test is not building; I can build it manually. The other failing tests are timeouts, since the reduce tests now build with 6 different reduction policies instead of 1.
That build fails for me in the same way the CI log shows. The build is configured with desul atomics enabled. Does that work for you?
a460b32 to b31f7c9
b3fb79c to 59fde5c
One option to improve compile times is to expand the reducer policies inside the reducer tests instead of expanding them into more tests through gtest. That can be done with the newly added for_each_type function. What do you think, @rhornung67?
Sounds reasonable. I think test build times were more of a concern with older vendor compilers (vendors shall not be named here).
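To make the suggestion concrete, here is a minimal, self-contained sketch of looping over policies inside one test body instead of instantiating one gtest test per policy. Everything in it is an assumption for illustration: `type_list`, this `for_each_type`, the `policy_block_*` tags, and `blocked_sum` are hypothetical stand-ins and may not match the actual RAJA helper or its signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical stand-ins for reduction policies (not RAJA types).
struct policy_block_1  { static constexpr int block = 1;  static const char* name() { return "block_1"; } };
struct policy_block_4  { static constexpr int block = 4;  static const char* name() { return "block_4"; } };
struct policy_block_64 { static constexpr int block = 64; static const char* name() { return "block_64"; } };

// Hypothetical type list and helper; the real for_each_type may differ.
template <typename... Ts>
struct type_list {};

template <typename Func, typename... Ts>
void for_each_type(type_list<Ts...>, Func&& func)
{
  // Braced-init-list expansion calls func once per type, left to right.
  int expand[] = {0, (func(Ts{}), 0)...};
  (void)expand;
}

// Toy "reduction" whose blocking is chosen by the policy.
template <typename Policy>
long blocked_sum(const std::vector<int>& data)
{
  const std::size_t block = Policy::block;
  long total = 0;
  for (std::size_t begin = 0; begin < data.size(); begin += block) {
    const std::size_t end = std::min(begin + block, data.size());
    long partial = 0;
    for (std::size_t i = begin; i < end; ++i) { partial += data[i]; }
    total += partial;
  }
  return total;
}

using reduce_policies = type_list<policy_block_1, policy_block_4, policy_block_64>;

// One test body that checks every policy, so the test framework sees a
// single test rather than one test per policy.
std::vector<std::string> check_all_policies(const std::vector<int>& data)
{
  const long expected = std::accumulate(data.begin(), data.end(), 0L);
  std::vector<std::string> checked;
  for_each_type(reduce_policies{}, [&](auto policy) {
    using Policy = decltype(policy);
    assert(blocked_sum<Policy>(data) == expected);
    checked.push_back(Policy::name());
  });
  return checked;
}
```

The trade-off in this sketch is that a failure is reported against one combined test, so the body should record which policy was being checked, as `checked` does here.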
59fde5c to b9e01ea
This lets you choose between cuda/hip_exec and cuda/hip_exec_with_reduce, similar to how cuda/hip_reduce_base<maybe_atomic> lets you choose between cuda/hip_reduce and cuda/hip_reduce_atomic.
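The pattern described here, a boolean template parameter that selects one of two policy types, could be sketched roughly as follows. The `sketch_*` names are hypothetical placeholders, not the RAJA policy types, and the use of `std::conditional_t` is an assumption about one way to do the selection rather than a description of the actual implementation.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical policy tags standing in for the real policy types.
struct sketch_reduce {};
struct sketch_reduce_atomic {};
struct sketch_exec {};
struct sketch_exec_with_reduce {};

// A bool template parameter picks between the two reduce policies.
template <bool maybe_atomic>
using sketch_reduce_base =
    std::conditional_t<maybe_atomic, sketch_reduce_atomic, sketch_reduce>;

// The same idea applied to the execution policies.
template <bool with_reduce>
using sketch_exec_base =
    std::conditional_t<with_reduce, sketch_exec_with_reduce, sketch_exec>;

// Overloads that report which policy was selected.
inline std::string describe(sketch_reduce) { return "reduce"; }
inline std::string describe(sketch_reduce_atomic) { return "reduce_atomic"; }
inline std::string describe(sketch_exec) { return "exec"; }
inline std::string describe(sketch_exec_with_reduce) { return "exec_with_reduce"; }

// Generic code takes the flag once and derives the policy from it.
template <bool with_reduce>
std::string selected_exec_policy()
{
  return describe(sketch_exec_base<with_reduce>{});
}

// Runtime sanity check of the selection logic.
inline void check_selection()
{
  assert(selected_exec_policy<false>() == "exec");
  assert(selected_exec_policy<true>() == "exec_with_reduce");
}
```

With an alias like this, calling code can be written once against the flag and does not need to name either concrete policy.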
Looks great! Thank you for all your hard work on this @MrBurmark
3826a8a to f987c39
…erse ordering from CUDA/HIP
…RAJA into feature/burmark1/reduction_tunings
Add more cuda/hip reducer tunings
Add option to initialize on the host for reducers using atomics.
Add option to use an algorithm that avoids device scope fences.
TODO (incomplete moved to #1635):
- add config flag so the hip intrinsic code can be compiled out
- add config flag so the default atomic policy can be the host or non-host policy
- Add support in the new reducer interface for tuning options, this interface should allow single reduction trees and coalesced atomics when multiple reducers are used (Will do later)
- improve gpu reducer test compile times, maybe break up over reduce policies or move loop over policies into the tests (increased the time for cuda compiles instead)