PTX: Add cuda::ptx::get_sreg #1351

Merged on Feb 12, 2024 (3 commits)

ahendriksen
Contributor

Description

closes #1350

Adds the ability to query the PTX special registers.

Noting the Hopper-specific ones:
10.12. Special Registers: %clusterid
10.13. Special Registers: %nclusterid
10.14. Special Registers: %cluster_ctaid
10.15. Special Registers: %cluster_nctaid
10.16. Special Registers: %cluster_ctarank
10.17. Special Registers: %cluster_nctarank
10.31. Special Registers: %aggr_smem_size

Also adds lanemask_{lt,gt,ge,le}, as requested by @canonizer. These special registers are also used in CUB (cccl/cub/cub/util_ptx.h).
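For readers unfamiliar with the lanemask registers, the following is a minimal host-side sketch of the values they hold, following the PTX ISA description (a 32-bit mask of the lanes below, at or below, above, or at or above the current lane). The `model_*` helper names are hypothetical and exist only for illustration; they are not part of `cuda::ptx`, and on the device the values come from the special registers themselves.

```cpp
#include <cstdint>

// Host-side reference model of the PTX lanemask special registers.
// Assumption: `lane` is the thread's lane index within its warp, in [0, 31].
// All names below are hypothetical helpers, not the cuda::ptx API.

// Only the bit of the current lane is set (models %lanemask_eq).
constexpr std::uint32_t model_lanemask_eq(unsigned lane)
{
  return std::uint32_t{1} << lane;
}

// Bits of all lanes strictly below the current lane (models %lanemask_lt).
constexpr std::uint32_t model_lanemask_lt(unsigned lane)
{
  return model_lanemask_eq(lane) - 1u;
}

// Bits of all lanes at or below the current lane (models %lanemask_le).
constexpr std::uint32_t model_lanemask_le(unsigned lane)
{
  return model_lanemask_lt(lane) | model_lanemask_eq(lane);
}

// Bits of all lanes strictly above the current lane (models %lanemask_gt).
constexpr std::uint32_t model_lanemask_gt(unsigned lane)
{
  return ~model_lanemask_le(lane);
}

// Bits of all lanes at or above the current lane (models %lanemask_ge).
constexpr std::uint32_t model_lanemask_ge(unsigned lane)
{
  return ~model_lanemask_lt(lane);
}

// Compile-time sanity checks for lane 5.
static_assert(model_lanemask_lt(5) == 0x0000001Fu, "lanes 0..4");
static_assert(model_lanemask_gt(5) == 0xFFFFFFC0u, "lanes 6..31");
```

A typical use of such masks is ranking a thread among the active lanes of a warp, for example by counting the set bits of a ballot result ANDed with the lt mask.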

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@ahendriksen ahendriksen requested review from a team as code owners February 8, 2024 09:16
@ahendriksen ahendriksen self-assigned this Feb 8, 2024
Collaborator

@miscco miscco left a comment

As discussed internally, we will need to split the content of ptx.h into separate sub-headers to keep it maintainable.

However, that can be done in a follow-up.

@miscco miscco enabled auto-merge (squash) February 8, 2024 11:04
@miscco
Collaborator

miscco commented Feb 8, 2024

@Artem-B it looks like we have some issues with the PTX ISA detection on clang-cuda.

clang-cuda is complaining about an unsupported instruction:

  ptxas /tmp/ptx-f74e03/ptx-sm_60.s, line 556; error   : Feature '%current_graph_exec' requires PTX ISA .version 8.0 or later
  ptxas fatal   : Ptx assembly aborted due to errors

However, that code is guarded by:

#if __cccl_ptx_isa >= 800
  NV_IF_TARGET(NV_PROVIDES_SM_50, (
    // mov.u64 sreg_value, %%current_graph_exec;
    *fn_ptr++ = reinterpret_cast<void*>(static_cast<uint64_t (*)()>(cuda::ptx::get_sreg_current_graph_exec));
  ));
#endif // __cccl_ptx_isa >= 800

With the definition of __cccl_ptx_isa being:

#if   (defined(__CUDACC_VER_MAJOR__) && (__CUDACC_VER_MAJOR__ >= 12 && __CUDACC_VER_MINOR__ >= 3)) || (!defined(__CUDACC_VER_MAJOR__))
#  define __cccl_ptx_isa 830ULL
// PTX ISA 8.2 is available from CUDA 12.2, driver r535
#elif (defined(__CUDACC_VER_MAJOR__) && (__CUDACC_VER_MAJOR__ >= 12 && __CUDACC_VER_MINOR__ >= 2)) || (!defined(__CUDACC_VER_MAJOR__))
#  define __cccl_ptx_isa 820ULL
...

Is there something else we need to do in order to properly detect the PTX ISA version on clang-cuda?

@Artem-B
Contributor

Artem-B commented Feb 8, 2024

clang-16 does not have CUDA-12 support and defaults to assuming CUDA-11.8 and PTX 7.8. It will not be able to generate any newer PTX version.

clang-17 should work.
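One possible reading of the mismatch, sketched as a host-side model: both visible branches of the quoted `__cccl_ptx_isa` definition also match when `__CUDACC_VER_MAJOR__` is not defined, so a compiler that does not define it would be assigned 830 and pass the `>= 800` guard, even if it can only emit PTX 7.8. Whether clang-cuda actually leaves that macro undefined is an assumption here, not something confirmed in this thread. `CompilerVersion` and `model_ptx_isa` are hypothetical names, and only the two branches shown above are modeled; the elided ones are left out.

```cpp
// Hypothetical model of the two visible branches of the __cccl_ptx_isa
// definition quoted above. The remaining branches are elided in the quoted
// snippet and are deliberately not reconstructed here.
struct CompilerVersion
{
  bool defined;  // models defined(__CUDACC_VER_MAJOR__)
  int ver_major; // models __CUDACC_VER_MAJOR__
  int ver_minor; // models __CUDACC_VER_MINOR__
};

constexpr unsigned long long model_ptx_isa(CompilerVersion v)
{
  return (!v.defined || (v.ver_major >= 12 && v.ver_minor >= 3)) ? 830ULL
       : (v.ver_major >= 12 && v.ver_minor >= 2)                 ? 820ULL
                                                                 : 0ULL; // elided branches
}

// Models the guard `#if __cccl_ptx_isa >= 800` from the test code.
constexpr bool guard_passes(CompilerVersion v)
{
  return model_ptx_isa(v) >= 800ULL;
}

// A toolchain reporting CUDA 12.3 selects PTX ISA 8.3.
static_assert(model_ptx_isa(CompilerVersion{true, 12, 3}) == 830ULL, "CUDA 12.3");
// A toolchain reporting CUDA 12.2 selects PTX ISA 8.2.
static_assert(model_ptx_isa(CompilerVersion{true, 12, 2}) == 820ULL, "CUDA 12.2");
// With the version macro undefined, the first branch still matches, so the
// guard passes regardless of what the compiler can actually emit.
static_assert(guard_passes(CompilerVersion{false, 0, 0}), "undefined version macro");
```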

@miscco
Collaborator

miscco commented Feb 8, 2024

clang-16 does not have CUDA-12 support and defaults to assuming CUDA-11.8 and PTX 7.8. It will not be able to generate any newer PTX version.

clang-17 should work.

Thanks a lot, that would have taken me ages to figure out.

auto-merge was automatically disabled February 9, 2024 08:27

Head branch was pushed to by a user without write access

@miscco miscco merged commit 74f1160 into NVIDIA:main Feb 12, 2024
537 checks passed
Linked issue: [FEA]: Add cuda::ptx::get_sreg to support Hopper-specific special registers
3 participants