PTX: Add cuda::ptx::get_sreg
#1351
As discussed internally, we will need to split the content of ptx.h
into separate sub-headers to keep it maintainable.
However, that can be done in a follow-up.
@Artem-B it looks like we have some issues with the PTX detection on clang-cuda. Clang CUDA is complaining about an unsupported instruction:
However, that code is guarded by
#if __cccl_ptx_isa >= 800
NV_IF_TARGET(NV_PROVIDES_SM_50, (
// mov.u64 sreg_value, %%current_graph_exec;
*fn_ptr++ = reinterpret_cast<void*>(static_cast<uint64_t (*)()>(cuda::ptx::get_sreg_current_graph_exec));
));
#endif // __cccl_ptx_isa >= 800
With the definition of
#if (defined(__CUDACC_VER_MAJOR__) && (__CUDACC_VER_MAJOR__ >= 12 && __CUDACC_VER_MINOR__ >= 3)) || (!defined(__CUDACC_VER_MAJOR__))
# define __cccl_ptx_isa 830ULL
// PTX ISA 8.2 is available from CUDA 12.2, driver r535
#elif (defined(__CUDACC_VER_MAJOR__) && (__CUDACC_VER_MAJOR__ >= 12 && __CUDACC_VER_MINOR__ >= 2)) || (!defined(__CUDACC_VER_MAJOR__))
# define __cccl_ptx_isa 820ULL
...
Is there something else we need to do in order to properly detect the PTX ISA on clang-cuda?
clang-16 does not have CUDA-12 support and defaults to assuming CUDA-11.8 and PTX 7.8. It will not be able to generate any newer PTX version. clang-17 should work.
Thanks a lot, that would have taken me ages to figure out.
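To make the exchange above concrete, here is a minimal host-side sketch of how the two quoted preprocessor branches select a PTX ISA value. It models the macros as ordinary values so the logic can be checked at compile time. The names `CudaccVersion` and `sketch_ptx_isa` are hypothetical and not part of CCCL, and the branches elided by "..." in the quoted snippet are deliberately left out (the sketch returns 0 for them). Note that the `!defined(__CUDACC_VER_MAJOR__)` clause makes the first branch fire whenever the version macro is absent. If clang-cuda leaves that macro undefined (an assumption here, the thread does not state it), the header would assume PTX 8.3 while clang-16 can only emit PTX 7.8.

```cpp
#include <cstdint>

// Hypothetical stand-in for the compiler-provided version macros.
// `is_defined == false` models a compiler that does not define
// __CUDACC_VER_MAJOR__ at all.
struct CudaccVersion {
    bool is_defined;
    int ver_major;
    int ver_minor;
};

// Mirrors only the two branches quoted in the thread. All other branches
// are elided in the quoted snippet, so this sketch returns 0 for
// "not covered here" instead of guessing them.
constexpr std::uint64_t sketch_ptx_isa(CudaccVersion v) {
    if ((v.is_defined && v.ver_major >= 12 && v.ver_minor >= 3) || !v.is_defined) {
        return 830;  // first quoted branch
    }
    if ((v.is_defined && v.ver_major >= 12 && v.ver_minor >= 2) || !v.is_defined) {
        return 820;  // second quoted branch
    }
    return 0;  // falls through to branches not shown in the thread
}

static_assert(sketch_ptx_isa(CudaccVersion{true, 12, 3}) == 830, "CUDA 12.3 selects 830");
static_assert(sketch_ptx_isa(CudaccVersion{true, 12, 2}) == 820, "CUDA 12.2 selects 820");
// With the version macro absent, the first branch is taken unconditionally.
static_assert(sketch_ptx_isa(CudaccVersion{false, 0, 0}) == 830, "undefined macro selects 830");
```

The last assertion is the interesting case: a guard such as `#if __cccl_ptx_isa >= 800` would pass on any compiler that takes this path, regardless of which PTX version its backend can actually generate.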
Description
closes #1350
Adds the ability to query the PTX special registers.
Noting the Hopper-specific ones:
10.12. Special Registers: %clusterid
10.13. Special Registers: %nclusterid
10.14. Special Registers: %cluster_ctaid
10.15. Special Registers: %cluster_nctaid
10.16. Special Registers: %cluster_ctarank
10.17. Special Registers: %cluster_nctarank
10.31. Special Registers: %aggr_smem_size
Other requested additions include lanemask_{lt,gt,ge,le}, as requested by @canonizer. These functions are also used in CUB (cccl/cub/cub/util_ptx.h).
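For readers unfamiliar with the lanemask registers, the following host-side sketch computes the values that `%lanemask_lt`, `%lanemask_le`, `%lanemask_gt` and `%lanemask_ge` hold for a given lane of a 32-lane warp, following the PTX definition (bits set at positions less than, less than or equal to, greater than, or greater than or equal to the thread's lane number). It is a plain C++ reference model, not the `cuda::ptx` API added by this PR. All `ref_*` names and `rank_among_active` are hypothetical helpers introduced for illustration.

```cpp
#include <bitset>
#include <cstdint>

// Host-side reference values for the %lanemask_* special registers.
// Assumes a 32-lane warp; `lane` must be in [0, 31].
constexpr std::uint32_t ref_lanemask_eq(unsigned lane) {
    return std::uint32_t{1} << lane;  // only this lane's bit
}
constexpr std::uint32_t ref_lanemask_lt(unsigned lane) {
    return ref_lanemask_eq(lane) - 1u;  // all bits below this lane
}
constexpr std::uint32_t ref_lanemask_le(unsigned lane) {
    return ref_lanemask_lt(lane) | ref_lanemask_eq(lane);
}
constexpr std::uint32_t ref_lanemask_gt(unsigned lane) {
    return ~ref_lanemask_le(lane);  // all bits above this lane
}
constexpr std::uint32_t ref_lanemask_ge(unsigned lane) {
    return ~ref_lanemask_lt(lane);
}

// A common use of the "lt" mask: the rank of a lane among the active lanes,
// i.e. how many active lanes have a smaller lane number.
inline unsigned rank_among_active(std::uint32_t active_mask, unsigned lane) {
    return static_cast<unsigned>(
        std::bitset<32>(active_mask & ref_lanemask_lt(lane)).count());
}

static_assert(ref_lanemask_lt(0) == 0x00000000u, "lane 0 has no lower lanes");
static_assert(ref_lanemask_le(31) == 0xFFFFFFFFu, "lane 31 covers the whole warp");
```

For example, with lanes 0, 1 and 3 active (`active_mask == 0xB`), lane 3 has two active lanes below it, so `rank_among_active(0xB, 3)` returns 2. On the device, the same pattern would combine a ballot result with the value read from the special register.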
Checklist