PTX: Add `cuda::ptx::get_sreg` #1351

ahendriksen · 2024-02-08T09:16:09Z

Description

Adds the ability to query the PTX special registers.

Noting the Hopper-specific ones:
10.12. Special Registers: %clusterid
10.13. Special Registers: %nclusterid
10.14. Special Registers: %cluster_ctaid
10.15. Special Registers: %cluster_nctaid
10.16. Special Registers: %cluster_ctarank
10.17. Special Registers: %cluster_nctarank
10.31. Special Registers: %aggr_smem_size

Other additions include `lanemask_{lt,gt,ge,le}`, as requested by @canonizer. These functions are also used in CUB (cccl/cub/cub/util_ptx.h).
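For readers unfamiliar with the lanemask registers, here is a minimal host-side sketch of the values they hold for a given lane of a 32-lane warp. It is a reference model only, not the `cuda::ptx` API: the `model_lanemask_*` names are hypothetical and exist purely for illustration.

```cpp
#include <cstdint>

// Hypothetical host-side model of the %lanemask_* special registers.
// `lane` is the lane index within a 32-lane warp and must be in [0, 31].

// Bits set for all lanes strictly below `lane`.
constexpr std::uint32_t model_lanemask_lt(unsigned lane) {
  return (std::uint32_t{1} << lane) - 1u;
}

// Bits set for all lanes below or equal to `lane`.
constexpr std::uint32_t model_lanemask_le(unsigned lane) {
  return model_lanemask_lt(lane) | (std::uint32_t{1} << lane);
}

// Bits set for all lanes above or equal to `lane` (complement of lt).
constexpr std::uint32_t model_lanemask_ge(unsigned lane) {
  return ~model_lanemask_lt(lane);
}

// Bits set for all lanes strictly above `lane` (complement of le).
constexpr std::uint32_t model_lanemask_gt(unsigned lane) {
  return ~model_lanemask_le(lane);
}

// Compile-time spot checks for the first lane and for lane 5.
static_assert(model_lanemask_lt(0) == 0x00000000u, "no lanes below lane 0");
static_assert(model_lanemask_le(5) == 0x0000003Fu, "lanes 0..5");
static_assert(model_lanemask_gt(5) == 0xFFFFFFC0u, "lanes 6..31");
```

A typical use of such masks is ranking a lane among the active lanes of a warp, by counting the set bits of a ballot result ANDed with the "less than" mask.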

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

miscco

As discussed internally, we will need to split the content of ptx.h into separate sub-headers to keep it maintainable.

However, that can be done in a follow-up.

miscco · 2024-02-08T11:34:00Z

@Artem-B it looks like we have some issues with the PTX ISA detection on clang-cuda.

clang-cuda complains about an unsupported instruction:

  ptxas /tmp/ptx-f74e03/ptx-sm_60.s, line 556; error   : Feature '%current_graph_exec' requires PTX ISA .version 8.0 or later
  ptxas fatal   : Ptx assembly aborted due to errors

However, that code is guarded by:

#if __cccl_ptx_isa >= 800
  NV_IF_TARGET(NV_PROVIDES_SM_50, (
    // mov.u64 sreg_value, %%current_graph_exec;
    *fn_ptr++ = reinterpret_cast<void*>(static_cast<uint64_t (*)()>(cuda::ptx::get_sreg_current_graph_exec));
  ));
#endif // __cccl_ptx_isa >= 800

With the definition of __cccl_ptx_isa being:

#if   (defined(__CUDACC_VER_MAJOR__) && (__CUDACC_VER_MAJOR__ >= 12 && __CUDACC_VER_MINOR__ >= 3)) || (!defined(__CUDACC_VER_MAJOR__))
#  define __cccl_ptx_isa 830ULL
// PTX ISA 8.2 is available from CUDA 12.2, driver r535
#elif (defined(__CUDACC_VER_MAJOR__) && (__CUDACC_VER_MAJOR__ >= 12 && __CUDACC_VER_MINOR__ >= 2)) || (!defined(__CUDACC_VER_MAJOR__))
#  define __cccl_ptx_isa 820ULL
...

Is there something else we need to do in order to properly detect the PTX ISA version on clang-cuda?

Artem-B · 2024-02-08T17:24:54Z

clang-16 does not have CUDA-12 support and defaults to assuming CUDA-11.8 and PTX 7.8. It will not be able to generate any newer PTX version.

clang-17 should work.
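To make the version relationship concrete, here is a small sketch of a toolkit-to-PTX-ISA mapping written as a plain C++ function. It is not the definition of `__cccl_ptx_isa`: the names `toolkit_version`, `at_least`, and `example_ptx_isa` are hypothetical. The sketch only covers CUDA 11.8 through 12.3 and returns 0 for anything older.

```cpp
// Hypothetical illustration only; not the CCCL implementation.
struct toolkit_version {
  int major_ver;
  int minor_ver;
};

// True if `v` is at least major_ver.minor_ver, comparing the pair
// lexicographically (major first, then minor).
constexpr bool at_least(toolkit_version v, int major_ver, int minor_ver) {
  return v.major_ver > major_ver ||
         (v.major_ver == major_ver && v.minor_ver >= minor_ver);
}

// Maps a CUDA toolkit version to the newest PTX ISA version it can emit,
// encoded as in the thread above (830 means PTX ISA 8.3).
// Toolkits newer than 12.3 are capped at 830 in this sketch;
// toolkits older than 11.8 are out of scope and map to 0.
constexpr unsigned long long example_ptx_isa(toolkit_version v) {
  return at_least(v, 12, 3) ? 830ULL
       : at_least(v, 12, 2) ? 820ULL
       : at_least(v, 12, 1) ? 810ULL
       : at_least(v, 12, 0) ? 800ULL
       : at_least(v, 11, 8) ? 780ULL
                            : 0ULL;
}

// A compiler that assumes CUDA 11.8 cannot satisfy a ">= 800" guard.
static_assert(example_ptx_isa({11, 8}) < 800ULL, "PTX 7.8 is below 8.0");
static_assert(example_ptx_isa({12, 0}) >= 800ULL, "CUDA 12.0 provides PTX 8.0");
```

Under this model, a guard such as `#if __cccl_ptx_isa >= 800` is only safe when the detected value reflects what the compiler in use can actually generate, which is the mismatch described for clang-16.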

miscco · 2024-02-08T18:17:09Z

> clang-16 does not have CUDA-12 support and defaults to assuming CUDA-11.8 and PTX 7.8. It will not be able to generate any newer PTX version.
>
> clang-17 should work.

Thanks a lot, that would have taken me ages to figure out.

PTX: Add cuda::ptx::get_sreg

1f70628

ahendriksen requested review from a team as code owners February 8, 2024 09:16

ahendriksen requested review from ericniebler and wmaxey February 8, 2024 09:16

ahendriksen self-assigned this Feb 8, 2024

Avoid use of __out variable

704b170

miscco approved these changes Feb 8, 2024

miscco enabled auto-merge (squash) February 8, 2024 11:04

Remove clang support for test

765d002

auto-merge was automatically disabled February 9, 2024 08:27
Head branch was pushed to by a user without write access

miscco mentioned this pull request Feb 9, 2024

clang-16 does not have CUDA-12 support #1358

Open

miscco merged commit 74f1160 into NVIDIA:main Feb 12, 2024
537 checks passed
