Skip to content

[libclc] Declare workitem built-ins in clc, move ptx-nvidiacl workitem built-ins into clc #144333

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 10, 2025

Conversation

wenju-he
Copy link
Contributor

@wenju-he wenju-he commented Jun 16, 2025

Changes in this PR:

  • Declare most of workitem functions in clc and opencl folders.
  • Call clc workitem function in corresponding OpenCL workitem function.
  • Move ptx-nvidiacl workitem built-in implementations into clc.
  • Move a few amdgcn workitem built-in implementations into clc.
  • Include only needed headers in OpenCL workitem functions.
  • Implement get_local_linear_id, get_max_sub_group_size, get_num_sub_groups,
    get_sub_group_id, get_sub_group_local_id, get_sub_group_size for ptx-nvidiacl.

llvm-diff shows this PR adds a few new symbols to nvptx64--nvidiacl.bc.
llvm-diff shows no change to amdgcn--amdhsa.bc, nvptx--.bc and nvptx64--.bc.

…m built-ins into clc

Changes in this PR:
* Declare all workitem functions in clc and opencl folders.
* Call clc workitem function in corresponding OpenCL workitem function.
* Move ptx-nvidiacl workitem built-in implementations into clc.
* clc_get_global_offset, clc_get_enqueued_num_sub_groups,
  clc_get_enqueued_local_size and clc_get_work_dim are not implemented
  because it is not possible to provide generic implementation.
* Include only needed headers in OpenCL workitem functions.
* Add a FIXME to OpenCL get_global_id.cl.

Note opencl/lib/ptx-nvidiacl/workitem/get_global_id.cl isn't changed
because it has different implementation from generic.
llvm-diff shows this PR only adds a dozen new symbols to
amdgcn--amdhsa.bc and nvptx64--nvidiacl.bc.
@wenju-he wenju-he requested a review from frasercrmck June 16, 2025 11:36
@frasercrmck frasercrmck added the libclc libclc OpenCL library label Jun 16, 2025
@wenju-he wenju-he requested a review from frasercrmck June 17, 2025 10:30
@frasercrmck
Copy link
Contributor

Is an issue here perhaps that downstream users may have implemented (e.g.) get_global_offset in their toolchains, but after this change they'll suddenly see a definition of this function that they're not expecting?

I suppose they would have to implement CLC functions, like __clc_get_global_offset to work with this approach. Or they could have their libclc target not call into __clc_get_global_offset, or just not provide a definition of get_global_offset as before. Either way this has the potential to be a breaking change.

I'm not sure about leaving dangling CLC declarations in OpenCL libraries. It feels like an implementation detail we're exposing to users.

@frasercrmck frasercrmck requested a review from arsenm June 19, 2025 15:57
@wenju-he
Copy link
Contributor Author

wenju-he commented Jun 20, 2025

Is an issue here perhaps that downstream users may have implemented (e.g.) get_global_offset in their toolchains, but after this change they'll suddenly see a definition of this function that they're not expecting?

I suppose they would have to implement CLC functions, like __clc_get_global_offset to work with this approach. Or they could have their libclc target not call into __clc_get_global_offset, or just not provide a definition of get_global_offset as before. Either way this has the potential to be a breaking change.

I'm not sure about leaving dangling CLC declarations in OpenCL libraries. It feels like an implementation detail we're exposing to users.

thanks @frasercrmck. As to get_work_dim and above mentioned get_global_offset, I see there are amdgpu implementations https://github.com/llvm/llvm-project/blob/main/libclc/opencl/lib/amdgcn/workitem/get_global_offset.cl and https://github.com/llvm/llvm-project/blob/main/libclc/opencl/lib/r600/workitem/get_global_offset.cl. Moving them into clc should resolve the concern, but I don't know where is destination directory in clc. I don't know how to map opencl/lib/amdgcn and opencl/lib/r600 to clc/lib/amdgcn and clc/lib/amdgpu. Do you have suggestions about the mapping?

As to clc_get_enqueued_num_sub_groups and clc_get_enqueued_local_size that there is no definition in this repo, I'll delete them from this PR. The motivation of adding them in the first commit was to avoid adding them in the downstream only since they are shared code.

@arsenm
Copy link
Contributor

arsenm commented Jun 20, 2025

Moving them into clc should resolve the concern, but I don't know where is destination directory in clc. I don't know how to map opencl/lib/amdgcn and opencl/lib/r600 to clc/lib/amdgcn and clc/lib/amdgpu. Do you have suggestions about the mapping?

r600 and amdgcn should be treated as separate targets. we should stop trying to have common code under an "amdgpu" name

@wenju-he
Copy link
Contributor Author

I'm not sure about leaving dangling CLC declarations in OpenCL libraries. It feels like an implementation detail we're exposing to users.

this issue has been fixed. There is no unresolved __clc symbol in final OpenCL library.

I moved a few amdgcn workitem builtins from opencl/lib/amdgcn/workitem to clc/lib/amdgcn/workitem/ so that a few newly added header file has use. llvm-diff shows no change to amdgcn--amdhsa.bc.

I moved nvptx64--nvidiacl workitem builtin to clc and added a few new workitem built-ins for nvptx64--nvidiacl so that newly added header file has use.

@frasercrmck
Copy link
Contributor

I am seeing unresolved CLC functions in nvptx--.bc and nvptx64--.bc. I think it's because we're now building things like get_num_groups for nvptx/nvptx64, whereas previously they were not being built for those targets. We're also not building __clc_get_num_groups because we're (correctly) not including ptx-nvidiacl sources.

…idiacl, revert change to opencl/lib/generic/workitem/get_global_size.cl
@wenju-he
Copy link
Contributor Author

wenju-he commented Jul 3, 2025

I am seeing unresolved CLC functions in nvptx--.bc and nvptx64--.bc. I think it's because we're now building things like get_num_groups for nvptx/nvptx64, whereas previously they were not being built for those targets. We're also not building __clc_get_num_groups because we're (correctly) not including ptx-nvidiacl sources.

thanks @frasercrmck for the careful review. nvptx--.bc and nvptx64--.bc issue is fixed in b1397a4

Please let me know if I should revert following style change and follow existing tyle.

  1. add #ifndef _CLC_OPENCL_WORKITEM* guard in opencl header file and include needed #include <clc/internal/clc.h>. E.g. libclc/opencl/include/clc/opencl/workitem/get_global_id.h.
  2. include only needed headers in OpenCL lib files, e.g. in libclc/opencl/lib/ptx-nvidiacl/workitem/get_local_linear_id.cl
#include <clc/opencl/workitem/get_local_linear_id.h>
#include <clc/workitem/clc_get_local_linear_id.h>

_CLC_OVERLOAD _CLC_DEF size_t get_local_linear_id() {
  return __clc_get_local_linear_id();
}

@frasercrmck
Copy link
Contributor

I am seeing unresolved CLC functions in nvptx--.bc and nvptx64--.bc. I think it's because we're now building things like get_num_groups for nvptx/nvptx64, whereas previously they were not being built for those targets. We're also not building __clc_get_num_groups because we're (correctly) not including ptx-nvidiacl sources.

thanks @frasercrmck for the careful review. nvptx--.bc and nvptx64--.bc issue is fixed in b1397a4

Please let me know if I should revert following style change and follow existing tyle.

1. add #ifndef __CLC_OPENCL_WORKITEM_* guard in opencl header file and include needed #include <clc/internal/clc.h>. E.g. libclc/opencl/include/clc/opencl/workitem/get_global_id.h.

2. include only needed headers in OpenCL lib files, e.g. in libclc/opencl/lib/ptx-nvidiacl/workitem/get_local_linear_id.cl
#include <clc/opencl/workitem/get_local_linear_id.h>
#include <clc/workitem/clc_get_local_linear_id.h>

_CLC_OVERLOAD _CLC_DEF size_t get_local_linear_id() {
  return __clc_get_local_linear_id();
}

I think that's a good approach, thanks. We generally include far too much stuff in the OpenCL layer.

One comment is that I had initially intended for clc/internal to only be used by the CLC builtins. I realise it's already being used by some fma.cls but we might want to nip it in the bud. Perhaps we need a stripped down OpenCL header, like clc/opencl/opencl-base.h which includes just the types and macro defs?

@frasercrmck
Copy link
Contributor

Perhaps we need a stripped down OpenCL header, like clc/opencl/opencl-base.h which includes just the types and macro defs?

See #146840 for an example of this approach.

@wenju-he
Copy link
Contributor Author

wenju-he commented Jul 4, 2025

Perhaps we need a stripped down OpenCL header, like clc/opencl/opencl-base.h which includes just the types and macro defs?

See #146840 for an example of this approach.

thanks @frasercrmck I'll update OpenCL headers of this PR after #146840 is landed.

@wenju-he
Copy link
Contributor Author

wenju-he commented Jul 8, 2025

Perhaps we need a stripped down OpenCL header, like clc/opencl/opencl-base.h which includes just the types and macro defs?

See #146840 for an example of this approach.

thanks @frasercrmck I'll update OpenCL headers of this PR after #146840 is landed.

done in 9bfa243

//
//===----------------------------------------------------------------------===//

#include <clc/workitem/clc_get_global_size.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are all of these using #include <> instead of #include ""? Probably should do a bulk fix of these at some point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are all of these using #include <> instead of #include ""? Probably should do a bulk fix of these at some point

it is following existing style. I'll do the bulk fix.

case 2:
return __builtin_amdgcn_grid_size_z();
default:
return 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was supposed to be UB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was supposed to be UB

The code is moved as it is from https://github.com/llvm/llvm-project/blob/main/libclc/opencl/lib/amdgcn/workitem/get_global_size.cl and OpenCL spec says For other values of dimindx, get_global_size() returns 1.

Do you mean the implementation in clc should change to be __builtin_unreachable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that's the spec then and it should follow

@wenju-he wenju-he merged commit 28aa5a6 into llvm:main Jul 10, 2025
9 checks passed
@wenju-he wenju-he deleted the clc-workitem branch July 10, 2025 00:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libclc libclc OpenCL library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants