[libclc] Declare workitem built-ins in clc, move ptx-nvidiacl workitem built-ins into clc #144333

wenju-he · 2025-06-16T11:36:00Z

Changes in this PR:

Declare most of workitem functions in clc and opencl folders.
Call clc workitem function in corresponding OpenCL workitem function.
Move ptx-nvidiacl workitem built-in implementations into clc.
Move a few amdgcn workitem built-in implementations into clc.
Include only needed headers in OpenCL workitem functions.
Implement get_local_linear_id, get_max_sub_group_size, get_num_sub_groups,
get_sub_group_id, get_sub_group_local_id, get_sub_group_size for ptx-nvidiacl.

llvm-diff shows this PR adds a few new symbols to nvptx64--nvidiacl.bc.
llvm-diff shows no change to amdgcn--amdhsa.bc, nvptx--.bc and nvptx64--.bc.

…m built-ins into clc Changes in this PR: * Declare all workitem functions in clc and opencl folders. * Call clc workitem function in corresponding OpenCL workitem function. * Move ptx-nvidiacl workitem built-in implementations into clc. * clc_get_global_offset, clc_get_enqueued_num_sub_groups, clc_get_enqueued_local_size and clc_get_work_dim are not implemented because it is not possible to provide generic implementation. * Include only needed headers in OpenCL workitem functions. * Add a FIXME to OpenCL get_global_id.cl. Note opencl/lib/ptx-nvidiacl/workitem/get_global_id.cl isn't changed because it has different implementation from generic. llvm-diff shows this PR only adds a dozen new symbols to amdgcn--amdhsa.bc and nvptx64--nvidiacl.bc.

libclc/clc/include/clc/workitem/clc_get_enqueued_local_size.h

frasercrmck · 2025-06-19T15:57:37Z

Is an issue here perhaps that downstream users may have implemented (e.g.) get_global_offset in their toolchains, but after this change they'll suddenly see a definition of this function that they're not expecting?

I suppose they would have to implement CLC functions, like __clc_get_global_offset to work with this approach. Or they could have their libclc target not call into __clc_get_global_offset, or just not provide a definition of get_global_offset as before. Either way this has the potential to be a breaking change.

I'm not sure about leaving dangling CLC declarations in OpenCL libraries. It feels like an implementation detail we're exposing to users.

wenju-he · 2025-06-20T00:44:41Z

Is an issue here perhaps that downstream users may have implemented (e.g.) get_global_offset in their toolchains, but after this change they'll suddenly see a definition of this function that they're not expecting?

I suppose they would have to implement CLC functions, like __clc_get_global_offset to work with this approach. Or they could have their libclc target not call into __clc_get_global_offset, or just not provide a definition of get_global_offset as before. Either way this has the potential to be a breaking change.

I'm not sure about leaving dangling CLC declarations in OpenCL libraries. It feels like an implementation detail we're exposing to users.

thanks @frasercrmck. As to get_work_dim and above mentioned get_global_offset, I see there are amdgpu implementations https://github.com/llvm/llvm-project/blob/main/libclc/opencl/lib/amdgcn/workitem/get_global_offset.cl and https://github.com/llvm/llvm-project/blob/main/libclc/opencl/lib/r600/workitem/get_global_offset.cl. Moving them into clc should resolve the concern, but I don't know where is destination directory in clc. I don't know how to map opencl/lib/amdgcn and opencl/lib/r600 to clc/lib/amdgcn and clc/lib/amdgpu. Do you have suggestions about the mapping?

As to clc_get_enqueued_num_sub_groups and clc_get_enqueued_local_size that there is no definition in this repo, I'll delete them from this PR. The motivation of adding them in the first commit was to avoid adding them in the downstream only since they are shared code.

arsenm · 2025-06-20T01:28:24Z

Moving them into clc should resolve the concern, but I don't know where is destination directory in clc. I don't know how to map opencl/lib/amdgcn and opencl/lib/r600 to clc/lib/amdgcn and clc/lib/amdgpu. Do you have suggestions about the mapping?

r600 and amdgcn should be treated as separate targets. we should stop trying to have common code under an "amdgpu" name

…ork_dim/__clc_get_global_offset

wenju-he · 2025-06-27T09:39:43Z

I'm not sure about leaving dangling CLC declarations in OpenCL libraries. It feels like an implementation detail we're exposing to users.

this issue has been fixed. There is no unresolved __clc symbol in final OpenCL library.

I moved a few amdgcn workitem builtins from opencl/lib/amdgcn/workitem to clc/lib/amdgcn/workitem/ so that a few newly added header file has use. llvm-diff shows no change to amdgcn--amdhsa.bc.

I moved nvptx64--nvidiacl workitem builtin to clc and added a few new workitem built-ins for nvptx64--nvidiacl so that newly added header file has use.

frasercrmck · 2025-07-02T09:08:19Z

I am seeing unresolved CLC functions in nvptx--.bc and nvptx64--.bc. I think it's because we're now building things like get_num_groups for nvptx/nvptx64, whereas previously they were not being built for those targets. We're also not building __clc_get_num_groups because we're (correctly) not including ptx-nvidiacl sources.

…idiacl, revert change to opencl/lib/generic/workitem/get_global_size.cl

wenju-he · 2025-07-03T03:09:15Z

I am seeing unresolved CLC functions in nvptx--.bc and nvptx64--.bc. I think it's because we're now building things like get_num_groups for nvptx/nvptx64, whereas previously they were not being built for those targets. We're also not building __clc_get_num_groups because we're (correctly) not including ptx-nvidiacl sources.

thanks @frasercrmck for the careful review. nvptx--.bc and nvptx64--.bc issue is fixed in b1397a4

Please let me know if I should revert following style change and follow existing tyle.

add #ifndef _CLC_OPENCL_WORKITEM* guard in opencl header file and include needed #include <clc/internal/clc.h>. E.g. libclc/opencl/include/clc/opencl/workitem/get_global_id.h.
include only needed headers in OpenCL lib files, e.g. in libclc/opencl/lib/ptx-nvidiacl/workitem/get_local_linear_id.cl

#include <clc/opencl/workitem/get_local_linear_id.h>
#include <clc/workitem/clc_get_local_linear_id.h>

_CLC_OVERLOAD _CLC_DEF size_t get_local_linear_id() {
  return __clc_get_local_linear_id();
}

frasercrmck · 2025-07-03T09:26:10Z

I am seeing unresolved CLC functions in nvptx--.bc and nvptx64--.bc. I think it's because we're now building things like get_num_groups for nvptx/nvptx64, whereas previously they were not being built for those targets. We're also not building __clc_get_num_groups because we're (correctly) not including ptx-nvidiacl sources.

thanks @frasercrmck for the careful review. nvptx--.bc and nvptx64--.bc issue is fixed in b1397a4

Please let me know if I should revert following style change and follow existing tyle.
1. add #ifndef __CLC_OPENCL_WORKITEM_* guard in opencl header file and include needed #include <clc/internal/clc.h>. E.g. libclc/opencl/include/clc/opencl/workitem/get_global_id.h.

2. include only needed headers in OpenCL lib files, e.g. in libclc/opencl/lib/ptx-nvidiacl/workitem/get_local_linear_id.cl
#include <clc/opencl/workitem/get_local_linear_id.h>
#include <clc/workitem/clc_get_local_linear_id.h>

_CLC_OVERLOAD _CLC_DEF size_t get_local_linear_id() {
  return __clc_get_local_linear_id();
}

I think that's a good approach, thanks. We generally include far too much stuff in the OpenCL layer.

One comment is that I had initially intended for clc/internal to only be used by the CLC builtins. I realise it's already being used by some fma.cls but we might want to nip it in the bud. Perhaps we need a stripped down OpenCL header, like clc/opencl/opencl-base.h which includes just the types and macro defs?

frasercrmck · 2025-07-03T09:48:43Z

Perhaps we need a stripped down OpenCL header, like clc/opencl/opencl-base.h which includes just the types and macro defs?

See #146840 for an example of this approach.

wenju-he · 2025-07-04T02:27:11Z

Perhaps we need a stripped down OpenCL header, like clc/opencl/opencl-base.h which includes just the types and macro defs?

See #146840 for an example of this approach.

thanks @frasercrmck I'll update OpenCL headers of this PR after #146840 is landed.

wenju-he · 2025-07-08T00:54:28Z

Perhaps we need a stripped down OpenCL header, like clc/opencl/opencl-base.h which includes just the types and macro defs?

See #146840 for an example of this approach.

thanks @frasercrmck I'll update OpenCL headers of this PR after #146840 is landed.

done in 9bfa243

arsenm · 2025-07-08T01:01:53Z

libclc/clc/lib/amdgcn/workitem/clc_get_global_size.cl

+//
+//===----------------------------------------------------------------------===//
+
+#include <clc/workitem/clc_get_global_size.h>


Why are all of these using #include <> instead of #include ""? Probably should do a bulk fix of these at some point

Why are all of these using #include <> instead of #include ""? Probably should do a bulk fix of these at some point

it is following existing style. I'll do the bulk fix.

arsenm · 2025-07-08T01:02:06Z

libclc/clc/lib/amdgcn/workitem/clc_get_global_size.cl

+  case 2:
+    return __builtin_amdgcn_grid_size_z();
+  default:
+    return 1;


I thought this was supposed to be UB

I thought this was supposed to be UB

The code is moved as it is from https://github.com/llvm/llvm-project/blob/main/libclc/opencl/lib/amdgcn/workitem/get_global_size.cl and OpenCL spec says For other values of dimindx, get_global_size() returns 1.

Do you mean the implementation in clc should change to be __builtin_unreachable?

Ok, that's the spec then and it should follow

wenju-he requested a review from frasercrmck June 16, 2025 11:36

frasercrmck added the libclc label Jun 16, 2025

frasercrmck reviewed Jun 16, 2025

View reviewed changes

libclc/clc/include/clc/workitem/clc_get_enqueued_local_size.h Outdated Show resolved Hide resolved

add __ prefix to clc functions

3bab4ee

wenju-he requested a review from frasercrmck June 17, 2025 10:30

frasercrmck requested a review from arsenm June 19, 2025 15:57

move a few amdgcn functions into clc to add definition of __clc_get_w…

8b30c53

…ork_dim/__clc_get_global_offset

move get_group_id.cl, get_local_id.cl and get_num_groups.cl to ptx-nv…

b1397a4

…idiacl, revert change to opencl/lib/generic/workitem/get_global_size.cl

wenju-he added 2 commits July 8, 2025 02:28

Merge branch 'main' into clc-workitem

0d5bca2

include clc/opencl/opencl-base.h in opencl headers

9bfa243

arsenm reviewed Jul 8, 2025

View reviewed changes

Merge branch 'main' into clc-workitem

97cdc98

frasercrmck approved these changes Jul 9, 2025

View reviewed changes

wenju-he merged commit 28aa5a6 into llvm:main Jul 10, 2025
9 checks passed

wenju-he deleted the clc-workitem branch July 10, 2025 00:11

[libclc] Declare workitem built-ins in clc, move ptx-nvidiacl workitem built-ins into clc #144333

[libclc] Declare workitem built-ins in clc, move ptx-nvidiacl workitem built-ins into clc #144333

Uh oh!

Conversation

wenju-he commented Jun 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

frasercrmck commented Jun 19, 2025

Uh oh!

wenju-he commented Jun 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arsenm commented Jun 20, 2025

Uh oh!

wenju-he commented Jun 27, 2025

Uh oh!

frasercrmck commented Jul 2, 2025

Uh oh!

wenju-he commented Jul 3, 2025

Uh oh!

frasercrmck commented Jul 3, 2025

Uh oh!

frasercrmck commented Jul 3, 2025

Uh oh!

wenju-he commented Jul 4, 2025

Uh oh!

wenju-he commented Jul 8, 2025

Uh oh!

arsenm Jul 8, 2025

Choose a reason for hiding this comment

Uh oh!

wenju-he Jul 8, 2025

Choose a reason for hiding this comment

Uh oh!

arsenm Jul 8, 2025

Choose a reason for hiding this comment

Uh oh!

wenju-he Jul 8, 2025

Choose a reason for hiding this comment

Uh oh!

arsenm Jul 8, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

wenju-he commented Jun 16, 2025 •

edited

Loading

wenju-he commented Jun 20, 2025 •

edited

Loading