Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable clang to compile <atomic> in device code #1020

Closed
jrhemstad opened this issue May 23, 2023 · 9 comments
Closed

Enable clang to compile <atomic> in device code #1020

jrhemstad opened this issue May 23, 2023 · 9 comments

Comments

@jrhemstad
Copy link
Collaborator

jrhemstad commented May 23, 2023

We guard the <atomic> header based on the definition of _LIBCUDACXX_HAS_CUDA_ATOMIC_IMPL.

This definition currently only allows nvcc, nvrtc, or nvc++ and excludes clang. We should update it to allow clang.

Based on llvm/llvm-project#58410 (comment), we likely need to version guard this to >=clang13.

Furthermore, before clang17, <cuda/(std/)atomic> requires compiling with -fcuda-allow-variadic-functions. Hopefully there is a macro that lets us test if this is enabled so we can guard based on this as well

Perhaps we should add a new definition

#if defined(_LIBCUDACXX_COMPILER_CLANG_CUDA) && /* clang version >= 13 */ && /* clang device variadics enabled */
   #define _LIBCUDACXX_CLANG_HAS_CUDA_ATOMIC_IMPL
#endif

and update as:

#if defined(_LIBCUDACXX_COMPILER_NVCC) || defined(_LIBCUDACXX_COMPILER_NVRTC) || defined(_LIBCUDACXX_COMPILER_NVHPC) || defined(__CUDACC__) || defined(_LIBCUDACXX_CLANG_HAS_CUDA_ATOMIC_IMPL)
#  define _LIBCUDACXX_HAS_CUDA_ATOMIC_IMPL
#endif
@wmaxey
Copy link
Member

wmaxey commented May 23, 2023

IIUC This change would be meant to guard against using atomics on versions of Clang where the PTX or variadic calls are not supported?

@Artem-B
Copy link
Contributor

Artem-B commented May 24, 2023

For things to work, users would need both support for variadics (can be enabled with old clang versions via -Xclang -fcuda-allow-variadic-functions ) and libcudacxx enabling atomics.

clang-17 will have variadics enabled by default, but the atomics change would still be useful for users of the older versions. Clang packages in the field tend to be updated rather slowly.

@jrhemstad
Copy link
Collaborator Author

jrhemstad commented May 24, 2023

Ah, I missed the part about needing variadics enabled.

Is there macro to test if -fcuda-allow-variadic-functions is enabled? We'd want to guard LIBCUDACXX_CLANG_HAS_CUDA_ATOMIC_IMPL on that as well.

@Artem-B
Copy link
Contributor

Artem-B commented May 24, 2023

Unfortunately not. Adding it now would not make sense either as clang already enables variadics.

We'd want to guard LIBCUDACXX_CLANG_HAS_CUDA_ATOMIC_IMPL on [variadics support] as well.

I do not think it's necessary. Variadics and atomics support are independent issues. We just happen to run into variadics first.
In order to work out of the box we'll need both, but there's no point to restrict atomics availability.

@jarmak-nv jarmak-nv transferred this issue from NVIDIA/libcudacxx Nov 8, 2023
@AustinSchuh
Copy link

I just hit this trying to build cuCollections with clang 17. Since it sounds like there is no way to distinguish if clang has -fcuda-allow-variadic-functions enabled, what about starting with only enabling this for clang 17 or newer, and then following up with something which allows older clangs? I'm happy to put a patch together since I'm patching this locally anyways to make progress.

@jrhemstad
Copy link
Collaborator Author

jrhemstad commented Nov 14, 2023

@AustinSchuh do you have an example of something that still doesn't work today? I think this issue may already be resolved.

We added clang as a device code compiler to our CI a while back: #493 and looking at one of the most recent CI jobs, it looks like the atomic tests are running fine with clang, e.g., https://github.com/NVIDIA/cccl/actions/runs/6860273081/job/18653871971#step:7:307

That said, it looks like the definition of _LIBCUDACXX_HAS_CUDA_ATOMIC_IMPL hasn't changed, so I'm not 100% sure how the atomic tests are currently passing 😅

@wmaxey or @miscco could you check this out?

@AustinSchuh
Copy link

AustinSchuh commented Nov 14, 2023

@jrhemstad , I don't think I'm doing anything too special. I was trying to get cuco working, but really a C++ file with the following is enough to fail:

#include <cuda/atomic>

I'm using clang 17.0.2, and Debian Bookworm's cuda installation. (I've got a sysroot built up with debootstrap plus installing the cuda libraries inside that sysroot that I'm pointing clang to, but from the digging I've done, I don't think that is causing the problem)

In file included from frc971/orin/cuda.cc:1:
In file included from external/amd64_debian_sysroot//usr/include/cuda/atomic:10:
external/amd64_debian_sysroot//usr/include/cuda/std/atomic:271:32: error: no
      member named '__atomic_thread_fence_cuda' in namespace 'cuda::std::__detail'
  271 |                 std::__detail::__atomic_thread_fence_cuda((int)__m, __detail::__thread_scope...
      |                 ~~~~~~~~~~~~~~~^
external/amd64_debian_sysroot//usr/include/nv/detail/__target_macros:483:61: note: 
      expanded from macro 'NV_DISPATCH_TARGET'
  483 | #  define NV_DISPATCH_TARGET(...)       _NV_TARGET_DISPATCH(__VA_ARGS__)
      |                                                             ^~~~~~~~~~~
external/amd64_debian_sysroot//usr/include/nv/detail/__target_macros:478:100: note: 
      expanded from macro '_NV_TARGET_DISPATCH'
  478 |   ..._NV_BLOCK_EXPAND(_NV_DISPATCH_N_ARY(_NV_TARGET_DISPATCH_HANDLE, __VA_ARGS__))
      |                                                                      ^~~~~~~~~~~
external/amd64_debian_sysroot//usr/include/nv/detail/__preprocessor:115:65: note: 
      expanded from macro '_NV_DISPATCH_N_ARY'
  115 | #define _NV_DISPATCH_N_ARY(name, ...) _NV_DISPATCH_N_IMPL(name, __VA_ARGS__)
      |                                                                 ^~~~~~~~~~~

The error above goes on quite a bit longer and repeats 3 times, but I don't think the rest of the error adds value. The problem is what is described in this ticket.

If I add -D_LIBCUDACXX_HAS_CUDA_ATOMIC_IMPL to the compilation command, it compiles successfully.

Holler if more info would be helpful.

@jrhemstad
Copy link
Collaborator Author

Could you try pulling the latest main and see if it still reproduces? The version of libcu++ that shipped with your CTK may not include the relevant changes.

@AustinSchuh
Copy link

Success! I can confirm that the latest main (once I wrangled my compiler to use it) doesn't have this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

4 participants