Add Max Occupancy Fraction Option #1620

Merged
merged 24 commits into from Apr 19, 2024

@MrBurmark MrBurmark commented Apr 1, 2024

Add GPU Max Occupancy Fraction Option

Add an option to change the max grid size in for_all policies that use the occupancy calculator.

The most important thing to know is which policy performs best with reducers: the new alias RAJA::hip_exec_rec_for_reduce<BLOCKS_SIZE> names the policy that performed best with reducers in our testing.

Also note that the meaning of the default occupancy calculator policy is changing: (hip|cuda)_exec_occ_calc<BLOCKS_SIZE> is now the occupancy calculator policy that performed best in our testing for non-reducer loops.

More occupancy calculator policies now exist as well. The (hip|cuda)_exec_occ_max<BLOCKS_SIZE> policy runs with the max occupancy. The (hip|cuda)_exec_occ_fraction<BLOCKS_SIZE, Fraction<size_t, numerator, denominator>> policy runs with a fraction of the max occupancy. The (hip|cuda)_exec_occ_custom<BLOCKS_SIZE, Concretizer> policy lets you use any concretizer.

This is implemented by adding a concretizer class to the exec policies used with for_all. When the grid size or block size is not specified, this class is used to pick the value. For example, RAJA::CudaMaxOccupancyConcretizer uses the occupancy calculator to get the highest occupancy possible with each kernel. Another example, RAJA::CudaFractionOffsetOccupancyConcretizer<Fraction, BLOCKS_PER_SM_OFFSET>, lets you use a fraction of and/or offset the max occupancy.
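
The fraction-and-offset idea can be illustrated with a small standalone C++ sketch. It does not use RAJA: the names Fraction, OccupancyData, and FractionOffsetConcretizer below are hypothetical stand-ins, and the rule that a zero or negative result falls back to the previous value is an assumption made for this sketch, not behavior confirmed by this PR.

```cpp
#include <cstddef>

// Hypothetical compile-time fraction Numerator / Denominator (not RAJA's type).
template <typename T, T Numerator, T Denominator>
struct Fraction {
  static_assert(Denominator != T(0), "denominator must be non-zero");
  // Integer multiply-then-divide, so the result rounds toward zero.
  static constexpr T multiply(T value) {
    return value * Numerator / Denominator;
  }
};

// Hypothetical per-kernel data, as an occupancy calculator might report it.
struct OccupancyData {
  std::size_t sm_count;           // multiprocessors on the device
  std::size_t max_blocks_per_sm;  // resident blocks per multiprocessor
};

// Hypothetical concretizer: scale the per-SM block count by Frac, then
// shift it by BlocksPerSmOffset, then multiply by the number of SMs.
template <typename Frac, std::ptrdiff_t BlocksPerSmOffset>
struct FractionOffsetConcretizer {
  static constexpr std::size_t max_grid_size(OccupancyData const& d) {
    std::size_t per_sm = d.max_blocks_per_sm;

    // Assumption: keep the unscaled value if scaling would give zero blocks.
    std::size_t scaled = Frac::multiply(per_sm);
    if (scaled > 0) { per_sm = scaled; }

    // Assumption: ignore the offset if it would give zero or fewer blocks.
    std::ptrdiff_t shifted =
        static_cast<std::ptrdiff_t>(per_sm) + BlocksPerSmOffset;
    if (shifted > 0) { per_sm = static_cast<std::size_t>(shifted); }

    return per_sm * d.sm_count;
  }
};

// Example configurations (hypothetical aliases).
using MaxOccupancy  = FractionOffsetConcretizer<Fraction<std::size_t, 1, 1>, 0>;
using HalfOccupancy = FractionOffsetConcretizer<Fraction<std::size_t, 1, 2>, 0>;
using MaxMinusOne   = FractionOffsetConcretizer<Fraction<std::size_t, 1, 1>, -1>;
```

In this sketch, a device with 80 multiprocessors and 8 resident blocks per multiprocessor gives grid-size caps of 640, 320, and 560 blocks for MaxOccupancy, HalfOccupancy, and MaxMinusOne respectively.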

  • This PR is a feature
  • It does the following:
    • Adds hip_exec_rec_for_reduce at the request of people using reducers


@rhornung67 rhornung67 left a comment

@MrBurmark Please add some simple, introductory user guide documentation for this. If you put some basic stuff in, I will help expand it as needed.

MrBurmark commented Apr 2, 2024

@rhornung67 I'll add documentation.
I think the design could use some more thorough thought though.


@tomstitt tomstitt left a comment

Thanks @MrBurmark !

@MrBurmark

I thought more thoroughly about the design with @rhornung67's help, and the implementation is now a little different. I'll update the description above.

Template iteration_mapping types to allow a modifying fraction
to be added that is used when calculating the max number of blocks
to launch for kernels where the number of blocks is not specified.
These policies will represent the recommended way to use the
occupancy calculator.
cuda/hip_exec_occ_calc_recommended changed to
cuda/hip_exec_rec_for_reduce
Remove modifiers from loop iteration mappings
and move the occupancy calculator modifications into
Concretizer classes that are used when block size or grid size
is not specified in the ForallDimensionCalculator.
occ_calc now uses the default (may not be max)
occ_max added to use max
occ_custom added for using whatever concretizer you'd like
occ_avoid_max removed
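
The commit notes above describe concretizers being consulted only when a launch dimension is left unspecified. A minimal standalone sketch of that control flow follows. It is not RAJA's ForallDimensionCalculator: LaunchDims, UseMaxOccupancy, HalfOfMax, and concretize are hypothetical names, 0 is assumed to mean "not specified", only the unspecified-grid-size case is handled, and the kernel is assumed to use a grid-stride loop so that a capped grid still covers every iteration.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical launch dimensions; 0 means "not specified" in this sketch.
struct LaunchDims {
  std::size_t block_size;
  std::size_t grid_size;
};

// Number of blocks needed to cover n iterations with d threads per block.
constexpr std::size_t ceil_div(std::size_t n, std::size_t d) {
  return (n + d - 1) / d;
}

// Hypothetical concretizer: allow as many blocks as the occupancy
// calculator reports can be resident at once.
struct UseMaxOccupancy {
  static constexpr std::size_t max_grid_size(std::size_t occupancy_max_blocks) {
    return occupancy_max_blocks;
  }
};

// Hypothetical user-supplied concretizer, in the spirit of occ_custom.
struct HalfOfMax {
  static constexpr std::size_t max_grid_size(std::size_t occupancy_max_blocks) {
    return occupancy_max_blocks / 2 > 0 ? occupancy_max_blocks / 2
                                        : occupancy_max_blocks;
  }
};

// Fill in the grid size only when the caller left it unspecified.
// Assumes requested.block_size is non-zero.
template <typename Concretizer>
constexpr LaunchDims concretize(std::size_t len,
                                LaunchDims requested,
                                std::size_t occupancy_max_blocks) {
  if (requested.grid_size != 0) {
    return requested;  // an explicit grid size is used as given
  }
  std::size_t needed = ceil_div(len, requested.block_size);
  std::size_t cap = Concretizer::max_grid_size(occupancy_max_blocks);
  // Never launch more blocks than the loop needs or the concretizer allows.
  return LaunchDims{requested.block_size, std::min(needed, cap)};
}
```

Swapping the Concretizer template argument changes the cap without touching the loop body, which is the separation of concerns the commit notes describe.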
@MrBurmark MrBurmark force-pushed the feature/burmark1/occgs_tuning_options branch from c4f2460 to 69e8bd7 Compare April 5, 2024 22:13

@rhornung67 rhornung67 left a comment

Overall, this is looking good. I made a variety of comments, suggestions, etc. about documentation, code structure, etc.

MrBurmark and others added 2 commits April 19, 2024 11:12
Co-authored-by: Rich Hornung <hornung1@llnl.gov>
@MrBurmark MrBurmark merged commit 4074748 into develop Apr 19, 2024
24 checks passed
@MrBurmark MrBurmark deleted the feature/burmark1/occgs_tuning_options branch April 20, 2024 03:10