Support array_sizes that are not divisible by the thread block size for CUDA and HIP #185

gonzalobg · 2024-02-27T11:14:29Z

Currently, CUDA and HIP do not support array sizes that are not divisible by the thread block size. Since most other programming models support these, this PR removes this limitation to improve BabelStream's fairness.

This is done using a technique that is widely used in practice, commonly known as a "grid-strided loop" (see CUDA Pro Tip: Write Flexible Kernels with Grid-Stride Loops). The number of thread blocks in the launch configuration is rounded up to cover all array elements using ceil_div.
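For readers unfamiliar with the pattern, here is a minimal host-side sketch of the arithmetic described above. It is not the code from this PR: it is plain C++ that emulates the thread indexing on the CPU. The signature of `ceil_div` is an assumption, and `simulate_grid_stride` and its parameter names are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed signature: smallest number of blocks of size `block_size`
// needed to cover `n` elements (integer division rounded up).
constexpr std::size_t ceil_div(std::size_t n, std::size_t block_size) {
  return (n + block_size - 1) / block_size;
}

// Hypothetical host-side emulation of a grid-stride loop.
// Each simulated thread starts at its global index
// (block index * block size + thread index) and advances by the total
// number of threads in the grid until it runs past the end of the array.
// Returns how many times each element was visited.
std::vector<int> simulate_grid_stride(std::size_t n,
                                      std::size_t block_size,
                                      std::size_t num_blocks) {
  std::vector<int> visits(n, 0);
  const std::size_t stride = block_size * num_blocks;
  for (std::size_t block = 0; block < num_blocks; ++block) {
    for (std::size_t thread = 0; thread < block_size; ++thread) {
      for (std::size_t i = block * block_size + thread; i < n; i += stride) {
        ++visits[i];
      }
    }
  }
  return visits;
}
```

With `num_blocks = ceil_div(n, block_size)`, every element is visited exactly once even when `n` is not a multiple of `block_size`; the `i < n` bound is what keeps the surplus threads in the last block from touching memory past the end of the array. The same loop also stays correct if fewer blocks are launched, because each thread then simply handles more than one element.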

This PR also updates the CUDAStream implementation to follow modern CUDA C++ practices by simplifying error handling and using an explicit cudaStream_t. The performance impact of these two changes is negligible; they are purely cosmetic.

tom91136 · 2024-02-28T10:12:41Z

LGTM, cc @tomdeakin

src/cuda/CUDAStream.cu

src/hip/HIPStream.cpp

gonzalobg added 5 commits February 27, 2024 02:50

[CUDA]: Remove pow2 array_size description for fairness

087777c

[HIP]: Remove pow2 array_size description for fairness

4ae7800

[CUDA] Simplify error handling

fcb7093

[CUDA] Kernel launch config covers all elements

64cc7d7

[HIP] Kernel launch config covers all elements

ab2f38f

tomdeakin changed the base branch from main to develop February 27, 2024 11:27

gonzalobg force-pushed the bugfix/cuda_pow2 branch 2 times, most recently from bc710e7 to e5ac2a3 February 27, 2024 13:21

gonzalobg added 2 commits February 28, 2024 11:42

[CUDA] Use a CUDA stream

1c6788c

[CUDA] Revert to C++11 for now until CMake is updated

607e050

gonzalobg force-pushed the bugfix/cuda_pow2 branch from e5ac2a3 to 607e050 February 28, 2024 19:43

tomdeakin reviewed Mar 8, 2024

src/cuda/CUDAStream.cu

tomdeakin reviewed Mar 8, 2024

src/cuda/CUDAStream.cu

tomdeakin added 2 commits March 8, 2024 10:33

Update src/cuda/CUDAStream.cu

a302715

Update src/cuda/CUDAStream.cu

7e0400f

tomdeakin reviewed Mar 8, 2024

src/hip/HIPStream.cpp

Update src/hip/HIPStream.cpp

fb17315

tomdeakin merged commit 3f7075b into UoB-HPC:develop Mar 8, 2024
5 checks passed
