Why is HSA_ISA_INFO_WORKGROUP_MAX_SIZE hardcoded to 1024? #55

smartbitcoin · 2019-03-01T20:48:52Z

As I understand it, hcc can use at most 1024 threads per workgroup, because libhsa_runtime.so returns 1024 for HSA_ISA_INFO_WORKGROUP_MAX_SIZE.
What puzzles me is that the maximum number of wavefronts is 40 and the wavefront size is 64, which means HSA_ISA_INFO_WORKGROUP_MAX_SIZE could actually be 40*64 = 2560.
In the OpenCL API the maximum thread count is 256, and in hcc it is 1024. That still doesn't seem right to me, since it could actually be 2560, right?
Why does ROCR choose 1024 instead of 2560, which is the hardware's maximum thread count?
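
The arithmetic behind the question can be sketched as follows. The figures 40, 64, and 1024 are the ones quoted in the post; the variable names are illustrative only and are not ROCR identifiers.

```python
# Figures quoted in the question; names are illustrative, not ROCR identifiers.
MAX_WAVEFRONTS_PER_CU = 40      # wavefronts that can be resident on one compute unit
WAVEFRONT_SIZE = 64             # threads (work-items) per wavefront
REPORTED_WORKGROUP_MAX = 1024   # value returned for HSA_ISA_INFO_WORKGROUP_MAX_SIZE

# Threads that can be resident on one compute unit at the same time.
resident_threads_per_cu = MAX_WAVEFRONTS_PER_CU * WAVEFRONT_SIZE

# Wavefronts that make up one workgroup of the reported maximum size.
wavefronts_per_max_workgroup = REPORTED_WORKGROUP_MAX // WAVEFRONT_SIZE

print(resident_threads_per_cu)       # 2560
print(wavefronts_per_max_workgroup)  # 16
```

So the reported limit corresponds to 16 of the 40 wavefronts, which is the gap the question is asking about.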

skeelyamd · 2019-03-01T20:52:49Z

Workgroups also use other resources, such as LDS, which enforce finer partitioning than thread count does.

jlgreathouse · 2019-03-01T22:05:27Z

Hi @smartbitcoin

You may be interested in this post I made last year that details the various workgroup size limitations that AMD GPUs have. In particular, the AMD GCN ISA only allows up to 1024 threads in a workgroup. That is a hardware ISA limitation.

However, you may not always be able to fit a 1024-thread workgroup into a compute unit (e.g. if you request 256 VGPRs per thread, we can only fit 256 threads in a CU). So we can only guarantee that 256-thread workgroups will always work -- that is why the OpenCL API claims 256 (see the linked post for more details).

While you can fit at most 2560 threads into a CU, those 2560 threads cannot all be in the same workgroup.
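
The VGPR example above can be worked through with a rough occupancy estimate. This is a simplified sketch, not how ROCR or the hardware scheduler computes anything: it assumes a GCN compute unit with 4 SIMDs, at most 10 wavefronts per SIMD, and a register file of 256 VGPRs per lane on each SIMD, and it ignores VGPR allocation granularity, SGPRs, and LDS. All names are hypothetical.

```python
# Assumed GCN-style compute unit parameters (simplified; names are hypothetical).
WAVEFRONT_SIZE = 64          # threads per wavefront
SIMDS_PER_CU = 4             # SIMD units in one compute unit
MAX_WAVES_PER_SIMD = 10      # 4 * 10 = 40 wavefronts per compute unit
VGPRS_PER_SIMD_LANE = 256    # VGPRs available to each lane of a SIMD
ISA_WORKGROUP_MAX = 1024     # workgroup size limit described in the thread


def max_threads_per_cu(vgprs_per_thread):
    """Threads that can be resident on one CU, limited only by VGPR usage."""
    if not 1 <= vgprs_per_thread <= VGPRS_PER_SIMD_LANE:
        raise ValueError("vgprs_per_thread must be between 1 and 256")
    # Each resident wavefront on a SIMD needs its own slice of the register file.
    waves_per_simd = min(MAX_WAVES_PER_SIMD,
                         VGPRS_PER_SIMD_LANE // vgprs_per_thread)
    return waves_per_simd * SIMDS_PER_CU * WAVEFRONT_SIZE


def workgroup_fits(workgroup_size, vgprs_per_thread):
    """True if one workgroup of this size can be placed on a single CU."""
    if workgroup_size > ISA_WORKGROUP_MAX:
        return False  # rejected by the workgroup size limit regardless of resources
    return workgroup_size <= max_threads_per_cu(vgprs_per_thread)


print(max_threads_per_cu(256))    # 256
print(max_threads_per_cu(24))     # 2560
print(workgroup_fits(256, 256))   # True
print(workgroup_fits(1024, 256))  # False
print(workgroup_fits(2560, 24))   # False
```

Under these assumptions a 256-thread workgroup fits even at 256 VGPRs per thread, which matches the guarantee described above, while 2560 threads can only be resident as several separate workgroups.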

smartbitcoin · 2019-03-02T02:53:54Z

@jlgreathouse Thanks for pointing me to the ISA. I checked chapter 4.3 and you are right: it's a hardware limitation, not a resource issue.
Packing 2560 threads into one workgroup with proper LDS and VGPR allocation would definitely work for plenty of algorithms, and I tested that LLVM is definitely able to generate a binary for that. I'm still curious why the Vega ISA has this design, because 10 bits (as in 1024) is not aligned to any boundary; maybe it's a hardware stack size limitation.

smartbitcoin · 2019-03-02T03:36:04Z

@jlgreathouse
I was also able to bump HSA_ISA_INFO_WORKGROUP_MAX_SIZE to 1280 and pair it with a properly LLVM-compiled kernel.

My experiment confirmed that it's a hardware thing lol.

smartbitcoin · 2019-03-02T03:40:41Z

or firmware ...

skeelyamd closed this as completed Mar 1, 2019
