-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[libclc] Declare workitem built-ins in clc, move ptx-nvidiacl workitem built-ins into clc #144333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…m built-ins into clc Changes in this PR: * Declare all workitem functions in clc and opencl folders. * Call clc workitem function in corresponding OpenCL workitem function. * Move ptx-nvidiacl workitem built-in implementations into clc. * clc_get_global_offset, clc_get_enqueued_num_sub_groups, clc_get_enqueued_local_size and clc_get_work_dim are not implemented because it is not possible to provide generic implementation. * Include only needed headers in OpenCL workitem functions. * Add a FIXME to OpenCL get_global_id.cl. Note opencl/lib/ptx-nvidiacl/workitem/get_global_id.cl isn't changed because it has different implementation from generic. llvm-diff shows this PR only adds a dozen new symbols to amdgcn--amdhsa.bc and nvptx64--nvidiacl.bc.
Is an issue here perhaps that downstream users may have implemented (e.g.) I suppose they would have to implement CLC functions, like I'm not sure about leaving dangling CLC declarations in OpenCL libraries. It feels like an implementation detail we're exposing to users. |
thanks @frasercrmck. As to get_work_dim and above mentioned get_global_offset, I see there are amdgpu implementations https://github.com/llvm/llvm-project/blob/main/libclc/opencl/lib/amdgcn/workitem/get_global_offset.cl and https://github.com/llvm/llvm-project/blob/main/libclc/opencl/lib/r600/workitem/get_global_offset.cl. Moving them into clc should resolve the concern, but I don't know where is destination directory in clc. I don't know how to map As to clc_get_enqueued_num_sub_groups and clc_get_enqueued_local_size that there is no definition in this repo, I'll delete them from this PR. The motivation of adding them in the first commit was to avoid adding them in the downstream only since they are shared code. |
r600 and amdgcn should be treated as separate targets. we should stop trying to have common code under an "amdgpu" name |
…ork_dim/__clc_get_global_offset
this issue has been fixed. There is no unresolved __clc symbol in final OpenCL library. I moved a few amdgcn workitem builtins from opencl/lib/amdgcn/workitem to clc/lib/amdgcn/workitem/ so that a few newly added header file has use. llvm-diff shows no change to amdgcn--amdhsa.bc. I moved nvptx64--nvidiacl workitem builtin to clc and added a few new workitem built-ins for nvptx64--nvidiacl so that newly added header file has use. |
I am seeing unresolved CLC functions in |
…idiacl, revert change to opencl/lib/generic/workitem/get_global_size.cl
thanks @frasercrmck for the careful review. Please let me know if I should revert following style change and follow existing tyle.
|
I think that's a good approach, thanks. We generally include far too much stuff in the OpenCL layer. One comment is that I had initially intended for |
See #146840 for an example of this approach. |
thanks @frasercrmck I'll update OpenCL headers of this PR after #146840 is landed. |
done in 9bfa243 |
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
#include <clc/workitem/clc_get_global_size.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are all of these using #include <>
instead of #include ""
? Probably should do a bulk fix of these at some point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are all of these using
#include <>
instead of#include ""
? Probably should do a bulk fix of these at some point
it is following existing style. I'll do the bulk fix.
case 2: | ||
return __builtin_amdgcn_grid_size_z(); | ||
default: | ||
return 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this was supposed to be UB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this was supposed to be UB
The code is moved as it is from https://github.com/llvm/llvm-project/blob/main/libclc/opencl/lib/amdgcn/workitem/get_global_size.cl and OpenCL spec says For other values of dimindx, get_global_size() returns 1.
Do you mean the implementation in clc should change to be __builtin_unreachable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that's the spec then and it should follow
Changes in this PR:
get_sub_group_id, get_sub_group_local_id, get_sub_group_size for ptx-nvidiacl.
llvm-diff shows this PR adds a few new symbols to nvptx64--nvidiacl.bc.
llvm-diff shows no change to amdgcn--amdhsa.bc, nvptx--.bc and nvptx64--.bc.