Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix getValidWorkDiv for CUDA and ROCm [I] #2222 #2251

Merged
merged 8 commits into from
Jun 24, 2024

Conversation

mehmetyusufoglu
Copy link
Contributor

@mehmetyusufoglu mehmetyusufoglu commented Mar 21, 2024

getValidWorkDiv has a bug. It only considers device hard properties ( TApi::getDeviceProperties()); does not consider the kernel function. Actually kernel function properties can limit number of threads per block.

In the case of CUDA and ROCm, the function getValidWorkDiv should use TApi::funcGetAttributes(&funcAttrs, kernelName) together with TApi::getDeviceProperties().

I have added 3 functions. getFunctionAttributes,getValidWorkDivKernel and isValidWorkDivKernel. One function intrinsic to Alpaka changed: subDivideGridElems. Now has an additional argument; "threads per block", if the value is zero the function uses device hard properties as before.

Next PR: The 2 functions getValidWorkDivKernel and isValidWorkDivKernel will be used in Examples and Tests instead of getValidWorkDiv and isValidWorkDiv at the next PR. The names can be determined.

Credits to @psychocoderHPC for the PR.
Tests: The tests for 1D and 2D workdivs were added.

@mehmetyusufoglu mehmetyusufoglu marked this pull request as draft March 21, 2024 10:50
@mehmetyusufoglu mehmetyusufoglu force-pushed the fixValidWorkDiv branch 6 times, most recently from e8dca57 to 6400b7f Compare March 25, 2024 10:23
@psychocoderHPC psychocoderHPC added this to the 1.2.0 milestone Mar 26, 2024
@mehmetyusufoglu mehmetyusufoglu force-pushed the fixValidWorkDiv branch 6 times, most recently from 503e26e to f0dd41e Compare March 26, 2024 17:34
@mehmetyusufoglu mehmetyusufoglu force-pushed the fixValidWorkDiv branch 3 times, most recently from 30694dc to de0aed5 Compare April 9, 2024 17:32
@mehmetyusufoglu mehmetyusufoglu force-pushed the fixValidWorkDiv branch 11 times, most recently from 589c616 to 3d3322e Compare April 16, 2024 08:35
@psychocoderHPC
Copy link
Member

offline discussed in the developer meeting:
https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaFuncAttributes.html#structcudaFuncAttributes provides more properties than the maximum number of threads per block.
We should at least design the API so that we can get more than one property.

maxThreadsPerBlock and maxDynamicSharedSizeBytes are what all our backends can provide. I am not fully sure if for CPU backends maxDynamicSharedSizeBytes must be reduced by the compile time shared memory, if so it will be hard to provide the correct amount of dynamic shared memory.

@mehmetyusufoglu mehmetyusufoglu force-pushed the fixValidWorkDiv branch 5 times, most recently from f49f05c to 3d0b477 Compare June 14, 2024 14:43
@mehmetyusufoglu mehmetyusufoglu marked this pull request as ready for review June 14, 2024 14:45
@mehmetyusufoglu mehmetyusufoglu force-pushed the fixValidWorkDiv branch 4 times, most recently from 7f1676c to 9ddfaf1 Compare June 16, 2024 19:39
@mehmetyusufoglu mehmetyusufoglu marked this pull request as draft June 16, 2024 21:55
@mehmetyusufoglu mehmetyusufoglu force-pushed the fixValidWorkDiv branch 4 times, most recently from 0a04381 to 5a21705 Compare June 17, 2024 17:22
@psychocoderHPC
Copy link
Member

psychocoderHPC commented Jun 19, 2024

please rebase against dev to get the fix from #2294 in and remove draft from the PR if it is ready

@mehmetyusufoglu mehmetyusufoglu marked this pull request as ready for review June 20, 2024 07:13
@mehmetyusufoglu
Copy link
Contributor Author

please rebase against dev to get the fix from #2294 in and remove draft from the PR if it is ready

Done, thanks.

Copy link
Member

@psychocoderHPC psychocoderHPC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two tiny things, than it is IMO mergable

include/alpaka/workdiv/WorkDivHelpers.hpp Outdated Show resolved Hide resolved
{
// Expected valid workdiv for this kernel. These values might change depending on the GPU type and compiler
// therefore commented out. Number of registers used by kernel might limit threads per block value differently
// for different GPUs. CHECK(workDiv == WorkDiv{Vec{8, 729444}, Vec{1, 736}, Vec{1, 1}});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment looks outdated with this hard coded values

Copy link
Contributor Author

@mehmetyusufoglu mehmetyusufoglu Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not outdated, but i changed the comment since could be confusing and values would change for different GPU versions.

@psychocoderHPC psychocoderHPC merged commit bc083db into alpaka-group:develop Jun 24, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants