Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add jobs using clang as CUDA compiler #493

Merged
merged 70 commits into from
Oct 11, 2023
Merged

Conversation

jrhemstad
Copy link
Collaborator

@jrhemstad jrhemstad commented Sep 26, 2023

Description

Adds jobs using clang as the CUDA compiler. Only adding a job for the newest version of the CTK and clang.

The clang-cuda jobs are a bit shoehorned in because they don't fit the existing job structure right now. This will take a bit of massaging to make it nicer, but I'll probably defer that to future work after the cmake presets are done.

  • Thrust clang-cuda job
  • CUB clang-cuda job
  • libcu++ clang-cuda job
  • @robertmaynard to fix sccache failure on -x cuda generated when using clang as cuda compiler

In order to build with clang as the CUDA compiler, you would do:

CMAKE_CUDA_COMPILER=clang++ ci/build_thrust.sh clang++ 17 70

This is only tested to work on the cuda12.2-llvm16 devcontainer.

closes #344

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@@ -674,7 +674,7 @@ __constexpr_isfinite(_A1 __lcpp_x) noexcept
return isfinite(__lcpp_x);
}

#if defined(_MSC_VER) || defined(__CUDACC_RTC__)
#if defined(_MSC_VER) || defined(__CUDACC_RTC__) || defined(_LIBCUDACXX_COMPILER_CLANG_CUDA)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miscco I have no idea what I'm doing. Please, take a look.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that clang builtins are not available on device:

__global__ void kernel(double *ptr) {
    ptr[0] = __builtin_logb(42.0);
}

int main() {
    kernel<<<1, 1>>>(nullptr);
}

results in:

ptxas fatal   : Unresolved extern function 'logb'
clang: error: ptxas command failed with exit code 255 (use -v to see invocation)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not have the GPU-side standard library these builtins would normally end up calling. Some of the math calls do end up being replaced by their __nv_* counterparts, but there's none for logb and that's why you see the unresolved reference.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something we could contribute? I guess there is a special list of builtins and how they are delegated?

Comment on lines +37 to +39
#ifdef TEST_COMPILER_CLANG_CUDA
#pragma clang diagnostic ignored "-Wunneeded-internal-declaration"
#endif // TEST_COMPILER_CLANG_CUDA
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should have a _LIBCUDACXX_DISABLE_CLANG_CUDA_DIAGNOSTIC macro?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we need one for gcc / clang diagnostics. I can cook something up, but it should not be required for this PR

@miscco
Copy link
Collaborator

miscco commented Oct 1, 2023

@jrhemstad I believe this is ready to be merged whenever we get our hands on a new sccache binary

@jrhemstad
Copy link
Collaborator Author

@jrhemstad I believe this is ready to be merged whenever we get our hands on a new sccache binary

clang is so fast that I'm fine merging this even without sccache.

@miscco
Copy link
Collaborator

miscco commented Oct 2, 2023

I mean we could just go with the release as with windows, but yeah it wont prolong our CI time compared to windows

@Artem-B
Copy link

Artem-B commented Oct 2, 2023

clang is so fast that I'm fine merging this even without sccache.

Interesting. I have not been paying attention to relative compilation speed of clang vs nvcc for CUDA, but over time clang did get slower. Early on we were a bit faster than NVCC, then, I think around CUDA-10 time-frame we became somewhat slower.

I'd be very curious to see the build time comparison on something non-trivial, like thrust tests.

I have another favor to ask. As you were making the build work with clang, what were the the issues/annoyances you ran into, in addition to the deduction guideline one? Positives are welcome, too. :-) With my own world view being heavily clang-tinted, It would be great to hear some feedback from folks who mostly work with nvcc.

libcudacxx/.upstream-tests/test/CMakeLists.txt Outdated Show resolved Hide resolved
else:
self.cxx.link_flags += ['-lc++']
# Device code does not have binary components, don't link libc++
# elif self.cxx.type != 'nvcc' and self.cxx.type != 'pgi':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just remove completely

@miscco
Copy link
Collaborator

miscco commented Oct 2, 2023

I have another favor to ask. As you were making the build work with clang, what were the the issues/annoyances you ran into, in addition to the deduction guideline one?

I was quite surprised how easy it was to get clang-cuda working with our test suite. My hat is off for that 👏

Looking at the gymnastics I had to do to try and replace __managed__ I believe it is something that would be beneficial, especially with our newer architectures.

I personally do not have any numbers about compile times, but I spend the last few weeks on getting windows to pass so I am heavily tinted there ;)

@jrhemstad jrhemstad merged commit d372fbc into NVIDIA:main Oct 11, 2023
467 checks passed
elstehle pushed a commit to elstehle/cccl that referenced this pull request Nov 16, 2023
wmaxey added a commit that referenced this pull request Mar 1, 2024
* Restore disabling benchmarks from ci scripts (removed in #493)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[FEA]: Add CI jobs for clang as a CUDA compiler
7 participants