Why is HSA_ISA_INFO_WORKGROUP_MAX_SIZE hardcoded to 1024? #55
Comments
Workgroups also consume other resources, such as LDS, and those can enforce a finer partitioning than the thread count alone.
You may be interested in this post I made last year that details the various workgroup size limitations that AMD GPUs have. In particular, the AMD GCN ISA only allows up to 1024 threads in a workgroup. That is a hardware ISA limitation. However, you may not always be able to fit a 1024-thread workgroup into a compute unit (e.g. if you request 256 VGPRs per thread, we can only fit 256 threads in a CU). So we can only guarantee that 256-thread workgroups will always work, which is why the OpenCL API claims 256 (see the linked post for more details). While you can fit at most 2560 threads into a CU, those 2560 threads cannot all be in the same workgroup.
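To make the arithmetic in the comment above concrete, here is a rough sketch of how the three numbers (2560, 1024, 256) can fall out of a simple occupancy model. The constants are assumptions for a GCN-style compute unit (4 SIMDs, 10 wavefronts per SIMD, a budget of 256 VGPRs per SIMD lane), and the function names are hypothetical. The model ignores VGPR allocation granularity, SGPRs, and LDS, so treat it as an illustration and not as what ROCR actually computes.

```python
# Illustrative constants for a GCN-style compute unit (CU).
# These are assumptions for the sketch, not values queried from hardware.
WAVEFRONT_SIZE = 64               # threads per wavefront
SIMDS_PER_CU = 4                  # assumed SIMD units per CU
MAX_WAVES_PER_SIMD = 10           # 4 * 10 = 40 wavefronts per CU
VGPRS_PER_SIMD_LANE = 256         # assumed VGPR budget shared by the waves on one SIMD
ISA_MAX_WORKGROUP_THREADS = 1024  # ISA limit quoted in the comment above


def max_resident_threads(vgprs_per_thread: int) -> int:
    """Threads that fit in one CU when limited only by VGPR usage."""
    if not 1 <= vgprs_per_thread <= VGPRS_PER_SIMD_LANE:
        raise ValueError("vgprs_per_thread must be between 1 and 256")
    # Each SIMD can host as many waves as its VGPR budget allows, capped at 10.
    waves_per_simd = min(MAX_WAVES_PER_SIMD,
                         VGPRS_PER_SIMD_LANE // vgprs_per_thread)
    return waves_per_simd * SIMDS_PER_CU * WAVEFRONT_SIZE


def max_workgroup_threads(vgprs_per_thread: int) -> int:
    """Largest single workgroup: the ISA cap or what fits in the CU, whichever is smaller."""
    return min(ISA_MAX_WORKGROUP_THREADS,
               max_resident_threads(vgprs_per_thread))


if __name__ == "__main__":
    for vgprs in (24, 64, 128, 256):
        print(vgprs, max_resident_threads(vgprs), max_workgroup_threads(vgprs))
```

Under these assumptions, a kernel using 24 VGPRs per thread can have 2560 threads resident in a CU but still no more than 1024 in one workgroup, while a kernel using 256 VGPRs per thread is limited to 256 threads either way.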
@jlgreathouse Thanks for pointing me to the ISA document. I checked chapter 4.3 and you are right: it's a hardware limitation, not a resource issue.
@jlgreathouse My experiment confirmed that it's a hardware thing, lol.
Or firmware ...
As I understand it, hcc can only use 1024 threads per workgroup, because libhsa_runtime.so returns 1024 as the maximum for HSA_ISA_INFO_WORKGROUP_MAX_SIZE.
What puzzles me is that the maximum number of wavefronts is 40 and the wavefront size is 64, which means HSA_ISA_INFO_WORKGROUP_MAX_SIZE could actually be 40 * 64 = 2560.
The OpenCL API reports a maximum of 256 threads and hcc reports 1024. That still doesn't seem right to me, since it could actually be 2560, right?
Why does ROCR choose 1024 instead of 2560, which is the hardware's maximum thread count?