Introduce new utilities for writing Alpaka kernels [13.3.x] #43280
`blocks_with_stride(acc, size)` returns a range that spans the (virtual) block indices required to cover the given problem size. For example, if size is 1000 and the block size is 16, it will return the range from 0 to 62 (63 blocks of 16 elements cover 1008 elements, enough for a total size of 1000). If the work division has more than 63 blocks, only the first 63 will perform one iteration of the loop, and the others will exit immediately. If the work division has fewer than 63 blocks, some of the blocks will perform more than one iteration, in order to cover the whole problem space. All threads in a block see the same loop iterations, while threads in different blocks may see a different number of iterations.
`elements_in_block(acc, block, size)` returns a range that spans all the elements within the given block. Iterating over the range yields values of type `ElementIndex`, which contain both the `.global` and `.local` indices of the corresponding element. If the work division has only one element per thread, the loop will perform at most one iteration. If the work division has more than one element per thread, the loop will perform that number of iterations, or fewer if it reaches size.
`once_per_grid(acc)` returns true for a single thread within the kernel execution grid. Usually the condition is true for block 0 and thread 0, but these indices should not be relied upon. `once_per_block(acc)` returns true for a single thread within the block. Usually the condition is true for thread 0, but this index should not be relied upon.
backport #43205
please test
+heterogeneous
A new Pull Request was created by @fwyzard (Andrea Bocci) for CMSSW_13_3_X. It involves the following packages:
@makortel, @fwyzard can you please review it and eventually sign? Thanks. cms-bot commands are listed here
This pull request is fully signed and it will be integrated in one of the next CMSSW_13_3_X IBs after it passes the integration tests and once validation in the development release cycle CMSSW_14_0_X is complete. This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9c0391/35821/summary.html
+1
PR description:
Introduce four new utilities for writing Alpaka kernels:
blocks_with_stride(acc, size)
elements_in_block(acc, block, size)
once_per_grid(acc)
once_per_block(acc)
Simplify the unit tests, and extend them to cover the newly introduced functionality.
blocks_with_stride
`blocks_with_stride(acc, size)` returns a range that spans the (virtual) block indices required to cover the given problem size.
For example, if size is 1000 and the block size is 16, it will return the range from 0 to 62 (63 blocks of 16 elements cover 1008 elements, enough for a total size of 1000).
If the work division has more than 63 blocks, only the first 63 will perform one iteration of the loop, and the others will exit immediately.
If the work division has fewer than 63 blocks, some of the blocks will perform more than one iteration, in order to cover the whole problem space.
All threads in a block see the same loop iterations, while threads in different blocks may see a different number of iterations.
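The arithmetic behind the example above can be checked with a small host-side sketch. This is not the Alpaka implementation: `blocks_to_cover` and `visited_blocks` are hypothetical helper names introduced only for illustration, and the loop assumes the usual grid-stride pattern (start at the block's own index, advance by the number of blocks in the grid).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of (virtual) blocks needed to cover `size`
// elements when each block holds `block_size` elements (rounded up).
inline std::size_t blocks_to_cover(std::size_t size, std::size_t block_size) {
  assert(block_size > 0);
  return (size + block_size - 1) / block_size;
}

// Hypothetical helper: the virtual block indices that the grid block `block`
// would visit in a block-strided loop, assuming `grid_blocks` blocks in the
// work division. This mimics the behaviour described for blocks_with_stride;
// it is a host-side simulation, not device code.
inline std::vector<std::size_t> visited_blocks(std::size_t block,
                                               std::size_t grid_blocks,
                                               std::size_t size,
                                               std::size_t block_size) {
  assert(grid_blocks > 0);
  std::vector<std::size_t> visited;
  const std::size_t extent = blocks_to_cover(size, block_size);
  for (std::size_t b = block; b < extent; b += grid_blocks) {
    visited.push_back(b);
  }
  return visited;
}
```

With size 1000 and block size 16, `blocks_to_cover` gives 63. In a grid of 100 blocks, block 62 visits only virtual block 62 and block 63 visits nothing; in a grid of 32 blocks, block 30 visits virtual blocks 30 and 62, while block 31 visits only 31.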
elements_in_block
`elements_in_block(acc, block, size)` returns a range that spans all the elements within the given block. Iterating over the range yields values of type `ElementIndex`, which contain both the `.global` and `.local` indices of the corresponding element.
If the work division has only one element per thread, the loop will perform at most one iteration.
If the work division has more than one element per thread, the loop will perform that number of iterations, or fewer if it reaches size.
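The per-thread iteration count described above can be illustrated with a host-side sketch. It is only an approximation under stated assumptions: the `ElementIndex` struct here is a stand-in with the two fields named in the description, `elements_for_thread` is a hypothetical helper, and the layout (each thread owning a contiguous run of elements inside its block) is an assumption that the description does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the ElementIndex type mentioned in the description:
// `global` is the index in the whole problem, `local` the index in the block.
struct ElementIndex {
  std::size_t global;
  std::size_t local;
};

// Hypothetical helper: the elements one thread would see for a given block.
// Assumes each thread owns `elements_per_thread` contiguous elements and
// stops as soon as the global index reaches `size`.
inline std::vector<ElementIndex> elements_for_thread(std::size_t block,
                                                     std::size_t thread,
                                                     std::size_t threads_per_block,
                                                     std::size_t elements_per_thread,
                                                     std::size_t size) {
  assert(thread < threads_per_block);
  std::vector<ElementIndex> out;
  const std::size_t block_first = block * threads_per_block * elements_per_thread;
  const std::size_t local_first = thread * elements_per_thread;
  for (std::size_t i = 0; i < elements_per_thread; ++i) {
    const std::size_t local = local_first + i;
    const std::size_t global = block_first + local;
    if (global >= size) {
      break;  // past the end of the problem: fewer iterations for this thread
    }
    out.push_back({global, local});
  }
  return out;
}
```

For example, with 16 threads per block, one element per thread and size 1000, thread 7 of block 62 sees the single element with global index 999 and local index 7, while thread 8 sees none. With 4 threads of 4 elements each and size 998, thread 1 of block 62 sees only two elements (global 996 and 997).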
once_per_grid
`once_per_grid(acc)` evaluates to true for a single thread within the kernel execution grid.
Usually the condition is true for block 0 and thread 0, but these indices should not be relied upon.
once_per_block
`once_per_block(acc)` evaluates to true for a single thread within the block.
Usually the condition is true for thread 0, but this index should not be relied upon.
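The "exactly one thread" guarantee of the two predicates can be illustrated by counting over a simulated grid. This sketch fixes the selected indices to block 0 and thread 0 purely for illustration, which is exactly what real kernel code should not rely on; `ThreadCoord`, `once_per_grid_sim`, `once_per_block_sim` and `count_selected` are hypothetical names, not part of the utilities introduced here.

```cpp
#include <cstddef>

// Hypothetical coordinates of a thread in a one-dimensional grid.
struct ThreadCoord {
  std::size_t block;
  std::size_t thread;
};

// Simulated predicates. The choice of block 0 / thread 0 is an assumption
// made for this sketch only; the description warns against relying on it.
inline bool once_per_grid_sim(ThreadCoord t) { return t.block == 0 && t.thread == 0; }
inline bool once_per_block_sim(ThreadCoord t) { return t.thread == 0; }

// Count how many threads in a blocks x threads grid satisfy a predicate.
template <typename Predicate>
std::size_t count_selected(std::size_t blocks, std::size_t threads, Predicate pred) {
  std::size_t count = 0;
  for (std::size_t b = 0; b < blocks; ++b) {
    for (std::size_t t = 0; t < threads; ++t) {
      if (pred(ThreadCoord{b, t})) {
        ++count;
      }
    }
  }
  return count;
}
```

Over a grid of 4 blocks with 8 threads each, the grid-level predicate selects one thread in total and the block-level predicate selects one thread per block, four in total.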
PR validation:
The updated unit tests compile and pass.
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
Backport of #43205 to CMSSW_13_3_X.