Support array_sizes that are not divisible by the thread block size for CUDA and HIP #185

Merged (10 commits) on Mar 8, 2024

Conversation

gonzalobg
Contributor

Currently, the CUDA and HIP implementations do not support array sizes that are not divisible by the thread block size. Since most other programming models support such sizes, this PR removes the limitation to improve BabelStream's fairness.

This is done using a technique that is widely used in practice, commonly known as a "grid-strided loop" (see CUDA Pro Tip: Write Flexible Kernels with Grid-Stride Loops). The number of thread blocks in the launch configuration is rounded up to cover all array elements using ceil_div.
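
To illustrate the indexing, here is a minimal host-side C++ sketch that emulates a grid-stride loop sequentially on the CPU. It is not the code from this PR: the names `triad_thread` and `triad_emulated`, the triad operation, and the scalar value are assumptions chosen for illustration, and the body of `ceil_div` is one common way to write a rounding-up division rather than necessarily the PR's definition. On a GPU, the global thread index would come from `blockIdx.x * blockDim.x + threadIdx.x` and the stride from `gridDim.x * blockDim.x`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rounds the division up: smallest q such that q * d >= n.
// (Illustrative definition; assumed, not copied from the PR.)
constexpr std::size_t ceil_div(std::size_t n, std::size_t d) {
  return (n + d - 1) / d;
}

// Host-side stand-in for one GPU thread running a grid-stride loop.
// The bounds check `i < a.size()` is what makes sizes that are not a
// multiple of the block size safe: surplus threads simply do no work.
void triad_thread(std::size_t global_id, std::size_t total_threads,
                  std::vector<double>& a, const std::vector<double>& b,
                  const std::vector<double>& c, double scalar) {
  for (std::size_t i = global_id; i < a.size(); i += total_threads) {
    a[i] = b[i] + scalar * c[i];
  }
}

// Emulates a launch: the block count is rounded up with ceil_div, then
// every (block, thread) pair runs the loop above, one after another.
std::vector<double> triad_emulated(std::size_t array_size,
                                   std::size_t block_size) {
  assert(block_size > 0);
  std::vector<double> a(array_size, 0.0);
  std::vector<double> b(array_size, 1.0);
  std::vector<double> c(array_size, 2.0);
  const std::size_t blocks = ceil_div(array_size, block_size);
  const std::size_t total_threads = blocks * block_size;
  for (std::size_t t = 0; t < total_threads; ++t) {
    triad_thread(t, total_threads, a, b, c, 0.5);
  }
  return a;
}
```

For example, `triad_emulated(1000, 256)` uses `ceil_div(1000, 256) == 4` blocks, i.e. 1024 emulated threads for 1000 elements; the last 24 threads fail the bounds check on their first iteration and write nothing, while every element is updated exactly once.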

This PR also updates the CUDAStream implementation to modern CUDA C++ practices by simplifying error handling and using an explicit cudaStream_t. These two changes are cosmetic only; their impact on performance is negligible.

@tomdeakin tomdeakin changed the base branch from main to develop February 27, 2024 11:27
@gonzalobg gonzalobg force-pushed the bugfix/cuda_pow2 branch 2 times, most recently from bc710e7 to e5ac2a3 Compare February 27, 2024 13:21
@tom91136
Member

LGTM, cc @tomdeakin

@tomdeakin tomdeakin merged commit 3f7075b into UoB-HPC:develop Mar 8, 2024
5 checks passed
3 participants