Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Silent cudaLaunch failures when compiling with clang's CUDA implementation #49

Closed
illuhad opened this issue Apr 18, 2019 · 10 comments
Closed

Comments

@illuhad
Copy link
Collaborator

illuhad commented Apr 18, 2019

Moving the discussion from #42 to a dedicated issue. As mentioned in #42, @psalz found out that this simple code:

#include <CL/sycl.hpp>

int main() {
    cl::sycl::queue queue;
    cl::sycl::buffer<float, 1> buf(10);

    queue.submit([&](cl::sycl::handler& cgh) {}); // The culprit

    queue.submit([&](cl::sycl::handler& cgh) {
        auto acc = buf.get_access<cl::sycl::access::mode::discard_write>(cgh);
        cgh.parallel_for<class fail>(buf.get_range(), [=](cl::sycl::item<1> item) {
            acc[item] = 1.f;
        }); 
    }); 

    return 0;
}

Silently fails to run the kernel. The error is however not restricted to this particular code and can also seemingly "strike at random". An error can only be seen if using cuda-memcheck, which reveals error cudaErrorInvalidDeviceFunction (error 8) due to "invalid device function" on CUDA API call to cudaLaunch.

We know:

  • the problem disappears when changing small seemingly unrelated bits of code (in this case, either removing the first empty command group or moving it after the second command group solves the issue)
  • Generated device code and cuda launch code on the host side is the same for both working and non-working versions
  • Reproducible (at least?) with clang 8
  • Everything works fine if compiled with nvcc.

Things to try:

  • Does adding an explicit queue.wait_and_throw() at the end change anything? Terminating a program without synchronization either via queue or by creating a host accessor is not allowed by spec, although hipSYCL historically has handled that well. The question is: Could it happen that hipSYCL in some destructor tries to run the kernel while CUDA runtime has already started shutting down? EDIT: No, explicit synchronization doesn't help
@illuhad
Copy link
Collaborator Author

illuhad commented Apr 23, 2019

The kokkos guys may have a similar (same?) problem: kokkos/kokkos#1547

@illuhad
Copy link
Collaborator Author

illuhad commented Apr 23, 2019

Here it's suggested that such errors can be caused by using data after it has left scope: kokkos/kokkos#1173

@psalz
Copy link
Member

psalz commented Apr 25, 2019

I have some news on this: We discovered that the bug appears to have been fixed in the current Clang trunk (i.e., Clang 9), however an assertion is still thrown in debug builds. I've also narrowed the fix down to a particular commit, and created an issue about it in the LLVM bug tracker: https://bugs.llvm.org/show_bug.cgi?id=41597.

@illuhad
Copy link
Collaborator Author

illuhad commented Apr 25, 2019

Excellent, thank you!

@psalz
Copy link
Member

psalz commented May 17, 2019

Unfortunately I have since encountered this issue again, using Clang 9. This means the root cause really hasn't been fixed, only the circumstances triggering the bug are different. I also fear we'll have to dig into Clang ourselves if we want to get this fixed anytime soon...

@illuhad
Copy link
Collaborator Author

illuhad commented May 18, 2019

Okay, let's try figuring this out on our own :) We know that the generated code is identical for both working/non-working versions, with the only difference being the mangled name of the kernel, right? I would propose that we first try to verify if the issue is on the host side as generated by clang:

  • We know it compiles with nvcc, but nvcc likely uses a different mangled kernel name. Let's verify if the nvcc kernel name is indeed different...
  • ... if this is indeed the case, let's see what happens if we launch the clang-compiled ptx kernel without clang: We can launch the PTX code directly using the CUDA driver API, based on the kernel name. We need to be careful about kernel parameters (in SYCL, captured accessors), so let's see if we can reproduce the behavior using a kernel that doesn't capture anything (e.g. just calls printf) and use that for testing.
  • If we cannot reproduce the issue with a non-capturing kernel, it may be an issue with clang's implementation of lambda captures or kernel parameters.
  • Otherwise, if the issue also appears when launching the kernel directly with the driver API, it can either be
    • an issue with the generated PTX (which would be weird, because we know that the working version is the same except for the kernel name)
    • It may be a bug in CUDA - perhaps it just has a problem with certain mangled kernel names, which may only be triggered when compiling with clang
  • If the issue doesn't appear
    • It is likely a problem on the host side, related to how clang invokes the kernel
    • Since we use a kernel that doesn't capture anything it cannot be related to kernel parameters/captures

@psalz
Copy link
Member

psalz commented Jun 28, 2019

After blissfully ignoring this issue for a couple of weeks I ran into it again in a major way a couple of days ago. I decided to take another look and I think I have a solid lead now. It looks like it might actually be two distinct issues (albeit very closely related), with one being surfaced through pruning on hipSYCL's side, and the other being a pure Clang/LLVM bug. Needs a bit more investigation, I'll check back next week!

@psalz
Copy link
Member

psalz commented Jul 2, 2019

I've got a minimal pure CUDA test case and preliminary fix in place, see https://reviews.llvm.org/D64015. If this gets merged it'll also require a change in the Clang plugin (i.e., use getSharedMangleContext), but I'll make a PR once that happens!

@illuhad
Copy link
Collaborator Author

illuhad commented Jul 2, 2019

Wow, great news! Thank you!

@illuhad
Copy link
Collaborator Author

illuhad commented May 27, 2022

Kernel name mangling issues are well known by now and addressed in hipSYCL in various ways, depending on clang version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants