Introduce new utilities for writing Alpaka kernels [13.3.x] #43280

Merged: 5 commits merged into cms-sw:CMSSW_13_3_X on Nov 16, 2023

@fwyzard (Contributor) commented Nov 14, 2023

PR description:

Introduce four new utilities for writing Alpaka kernels:

  • blocks_with_stride(acc, size)
  • elements_in_block(acc, block, size)
  • once_per_grid(acc)
  • once_per_block(acc)

Simplify the unit tests, and extend them to cover the newly introduced functionality.


blocks_with_stride

blocks_with_stride(acc, size) returns a range that spans the (virtual) block indices required to cover the given problem size.

For example, if size is 1000 and the block size is 16, it will return the range from 0 to 62 inclusive (63 blocks of 16 elements cover 1008 elements, enough for a total size of 1000).
If the work division has more than 63 blocks, only the first 63 will perform one iteration of the loop, and the others will exit immediately.
If the work division has fewer than 63 blocks, some of the blocks will perform more than one iteration, in order to cover the whole problem space.

All threads in a block see the same loop iterations, while threads in different blocks may see a different number of iterations.

elements_in_block

elements_in_block(acc, block, size) returns a range that spans all the elements within the given block. Iterating over the range yields values of type ElementIndex, which contain both the .global and the .local index of the corresponding element.

If the work division has only one element per thread, the loop will perform at most one iteration.
If the work division has more than one element per thread, the loop will perform that number of iterations, or fewer if it reaches size.

once_per_grid

once_per_grid(acc) evaluates to true for a single thread within the kernel execution grid.

Usually the condition is true for block 0 and thread 0, but these indices should not be relied upon.

once_per_block

once_per_block(acc) evaluates to true for a single thread within the block.

Usually the condition is true for thread 0, but this index should not be relied upon.

PR validation:

The updated unit tests compile and pass.

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

Backport of #43205 to CMSSW_13_3_X.

@fwyzard (Contributor, Author) commented Nov 14, 2023

backport #43205

@fwyzard (Contributor, Author) commented Nov 14, 2023

please test

@cmsbuild added this to the CMSSW_13_3_X milestone Nov 14, 2023
@fwyzard (Contributor, Author) commented Nov 14, 2023

+heterogeneous

@cmsbuild (Contributor) commented Nov 14, 2023

A new Pull Request was created by @fwyzard (Andrea Bocci) for CMSSW_13_3_X.

It involves the following packages:

  • HeterogeneousCore/AlpakaInterface (heterogeneous)

@makortel, @fwyzard can you please review it and eventually sign? Thanks.
@missirol, @makortel, @rovere this is something you requested to watch as well.
@antoniovilela, @sextonkennedy, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@cmsbuild (Contributor)

This pull request is fully signed and it will be integrated in one of the next CMSSW_13_3_X IBs after it passes the integration tests and once validation in the development release cycle CMSSW_14_0_X is complete. This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

@cmsbuild (Contributor)

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9c0391/35821/summary.html
COMMIT: 8c859bc
CMSSW: CMSSW_13_3_X_2023-11-14-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/43280/35821/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 240 lines from the logs
  • Reco comparison results: 137 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3363028
  • DQMHistoTests: Total failures: 1792
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3361214
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (49 files compared)
  • Checked 214 log files, 167 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@antoniovilela (Contributor)

+1

@cmsbuild merged commit 30f6fae into cms-sw:CMSSW_13_3_X Nov 16, 2023
24 checks passed
@fwyzard deleted the implement_blocks_with_stride branch January 30, 2024 11:12