Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more cuda/hip reducer tunings #1625

Merged
merged 45 commits into from
May 3, 2024

Conversation

MrBurmark
Copy link
Member

@MrBurmark MrBurmark commented Apr 7, 2024

Add more cuda/hip reducer tunings

Add option to initialize on the host for reducers using atomics.
Add option to use an algorithm that avoids device scope fences.

  • This PR is a feature
  • It does the following (modify list as needed):
    • Adds more reducer tuning options at the request of people using reducers

TODO (incomplete moved to #1635):

  • change the arguments in the reduction tuning to enum structs, or restructure away from bools
  • add config flag so the hip intrinsic code can be compiled out
  • add config flag so the default atomic policy can be the host or non-host policy
  • Add support in the new reducer interface for tuning options, this interface should allow single reduction trees and coalesced atomics when multiple reducers are used Will do later
  • finalize reducer on host with binary tree reduction
  • improve user docs
  • improve code docs
  • improve gpu reducer test compile times, maybe break up over reduce policies or move loop over policies into the tests increased the time for cuda compiles instead

@MrBurmark
Copy link
Member Author

I'm not sure why the nvcc 10.1.243 + gcc 8.3.1 test is not building, I can build it manually. The other failing tests are timeouts as the reduce tests now build with 6 different reduction policies instead of 1.

@rhornung67
Copy link
Member

I'm not sure why the nvcc 10.1.243 + gcc 8.3.1 test is not building, I can build it manually. The other failing tests are timeouts as the reduce tests now build with 6 different reduction policies instead of 1.

That build fails for me same as the CI log shows. The build is configured with desul atomics enabled. Does that work for you?

@MrBurmark MrBurmark force-pushed the feature/burmark1/reduction_tunings branch 3 times, most recently from a460b32 to b31f7c9 Compare April 20, 2024 00:32
Base automatically changed from feature/burmark1/occgs_tuning_options to develop April 20, 2024 03:10
@MrBurmark MrBurmark force-pushed the feature/burmark1/reduction_tunings branch 4 times, most recently from b3fb79c to 59fde5c Compare April 22, 2024 14:43
@MrBurmark MrBurmark requested review from artv3 and rchen20 April 22, 2024 14:44
@MrBurmark
Copy link
Member Author

MrBurmark commented Apr 29, 2024

One option to improve compile times is to expand the reducer policies inside of the reducer tests instead of expanding that into more tests through gtest. That can be done with the newly added for_each_type function. What do you think @rhornung67?

@rhornung67
Copy link
Member

One option to improve compile times is to expand the reducer policies inside of the reducer tests instead of expanding that into more tests through gtest. That can be done with the newly added for_each_type function. What do you think @rhornung67?

Sounds reasonable. I think test build times were more of a concern with older vendor compilers (vendors shall not be named here)

@MrBurmark MrBurmark force-pushed the feature/burmark1/reduction_tunings branch from 59fde5c to b9e01ea Compare April 29, 2024 16:45
This lets you choose between cuda/hip_exec and
cuda/hip_exec_with_reduce similarly to how
cuda/hip_reduce_base<maybe_atomic> lets you choose betwen
cuda/hip_reduce and cuda/hip_reduce_atomic
@MrBurmark MrBurmark requested a review from rhornung67 May 2, 2024 17:16
@MrBurmark MrBurmark requested a review from artv3 May 2, 2024 17:21
Copy link
Member

@rhornung67 rhornung67 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thank you for all your hard work on this @MrBurmark

@MrBurmark MrBurmark force-pushed the feature/burmark1/reduction_tunings branch from 3826a8a to f987c39 Compare May 2, 2024 20:59
@MrBurmark MrBurmark enabled auto-merge May 2, 2024 23:24
@MrBurmark MrBurmark merged commit 9f6d349 into develop May 3, 2024
24 checks passed
@MrBurmark MrBurmark deleted the feature/burmark1/reduction_tunings branch May 3, 2024 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants