Add Max Occupancy Fraction Option #1620
@MrBurmark Please add some simple, introductory user guide documentation for this. If you put some basic stuff in, I will help expand it as needed.
@rhornung67 I'll add documentation.
Thanks @MrBurmark !
I thought through the design more thoroughly with @rhornung67's help, and the implementation is now a little different. I'll update the description above.
Template iteration_mapping types to allow adding a modifying fraction that is used when calculating the max number of blocks to launch for kernels where the number of blocks is not specified.
These policies will represent the recommended way to use the occupancy calculator.
cuda/hip_exec_occ_calc_recommended changed to cuda/hip_exec_rec_for_reduce
Remove modifiers from loop iteration mappings and move the occupancy calculator modifications into Concretizer classes that are used when block size or grid size is not specified in the ForallDimensionCalculator.
occ_calc now uses the default (may not be max)
occ_max added to use max
occ_custom added for using whatever concretizer you'd like
occ_avoid_max removed
c4f2460 to 69e8bd7
Overall, this is looking good. I made a variety of comments and suggestions about documentation, code structure, etc.
Co-authored-by: Rich Hornung <hornung1@llnl.gov>
Add GPU Max Occupancy Fraction Option
Add an option to change the max grid size in for_all policies that use the occupancy calculator.
The most important thing to be aware of is which policy gets the best performance with reducers, so an alias `RAJA::hip_exec_rec_for_reduce<BLOCKS_SIZE>` now exists, which is the policy that performed best with reducers in our testing.

Another thing to note is that the meaning of the default occupancy calculator policy is changing: `(hip|cuda)_exec_occ_calc<BLOCKS_SIZE>` is now the policy using the occupancy calculator that performed best in our testing for non-reducer loops.

More occupancy calculator policies exist now as well:
- `(hip|cuda)_exec_occ_max<BLOCKS_SIZE>` lets you run with the max occupancy.
- `(hip|cuda)_exec_occ_fraction<BLOCKS_SIZE, Fraction<size_t, numerator, denominator>>` lets you run with a fraction of the max occupancy.
- `(hip|cuda)_exec_occ_custom<BLOCKS_SIZE, Concretizer>` lets you use any concretizer.

This is implemented by adding a concretizer class to the exec policies used with for_all. When grid size or block size is not specified, this class is used to pick the value. For example, `RAJA::CudaMaxOccupancyConcretizer` uses the occupancy calculator to get the highest occupancy possible with each kernel. Another example, `RAJA::CudaFractionOffsetOccupancyConcretizer<Fraction, BLOCKS_PER_SM_OFFSET>`, lets you use a fraction of and/or offset the max occupancy.

`hip_exec_rec_for_reduce` was added at the request of people using reducers.
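
To make the concretizer idea concrete, here is a minimal host-only sketch of how a fraction and a blocks-per-SM offset could cap the grid size when none is specified. It is not the RAJA implementation: `OccupancyInfo`, `max_grid_size`, `choose_grid_size`, and the clamp to one block per SM are assumptions made for illustration, and the `Fraction` and concretizer types only mirror the names used in the description above.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical compile-time fraction, modeled on the
// Fraction<size_t, numerator, denominator> parameter in the description.
template <typename T, T Numerator, T Denominator>
struct Fraction {
  static_assert(Denominator != 0, "denominator must be non-zero");
  // Multiply before dividing so integer truncation happens only once.
  static constexpr T multiply(T value) { return value * Numerator / Denominator; }
};

// Hypothetical stand-in for what an occupancy calculator reports for a kernel.
struct OccupancyInfo {
  std::size_t max_blocks_per_sm;  // max resident blocks per SM at this block size
  std::size_t num_sms;            // number of SMs on the device
};

// Sketch of a max-occupancy concretizer: the largest grid the device
// can keep resident at once.
struct MaxOccupancyConcretizer {
  static std::size_t max_grid_size(const OccupancyInfo& occ) {
    return occ.max_blocks_per_sm * occ.num_sms;
  }
};

// Sketch of a fraction/offset concretizer: scale the per-SM block count by
// Frac, add a signed offset, and keep at least one block per SM (assumption).
template <typename Frac, std::ptrdiff_t BlocksPerSmOffset>
struct FractionOffsetOccupancyConcretizer {
  static std::size_t max_grid_size(const OccupancyInfo& occ) {
    std::ptrdiff_t per_sm =
        static_cast<std::ptrdiff_t>(Frac::multiply(occ.max_blocks_per_sm)) +
        BlocksPerSmOffset;
    per_sm = std::max<std::ptrdiff_t>(per_sm, 1);
    return static_cast<std::size_t>(per_sm) * occ.num_sms;
  }
};

// Grid size used when the caller does not specify one: enough blocks to
// cover the iterations, capped by the concretizer's limit.
template <typename Concretizer>
std::size_t choose_grid_size(std::size_t num_iterations,
                             std::size_t block_size,
                             const OccupancyInfo& occ) {
  std::size_t blocks_needed = (num_iterations + block_size - 1) / block_size;
  return std::min(blocks_needed, Concretizer::max_grid_size(occ));
}
```

With a hypothetical device reporting 8 blocks per SM on 100 SMs, `FractionOffsetOccupancyConcretizer<Fraction<std::size_t, 1, 2>, 0>` would cap a large loop at 400 blocks in this sketch, while `MaxOccupancyConcretizer` would allow 800. Small loops still launch only as many blocks as they need.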