Add post-kernel-launch assertions #119
Curious to know the reasoning behind this reordering of includes.
@sethrj Is there a way to customize the CI build for this run? In the default settings, the test "app/demo-rasterizer" is not run (it needs CELERITAS_USE_VecGeom enabled).
This is unrelated to this PR. It is already in master via #118.
Branch updated from a4cbcf7 to 2650d44.
@pcanal Isn't it run in the cuda-11 build?
Indeed it is. I missed that there were two builds. So on wc.fnal.gov with CUDA 10 it fails, while in the CI with CUDA 11 it works. I will try locally with CUDA 11.
@pcanal Looks like something got mangled pretty badly in the title; can you fix it? It looks like this is just adding the call to check for a launch error. Would you be up for checking the other places in the code where we launch kernels, and making sure they also have the error peeking?
Sure. But first I want to understand why this kernel fails on wc.fnal.gov with both CUDA 10 and CUDA 11 and works fine in the CI with CUDA 11. See the commit message for error details.
Ah, I get it now.
Note: it seems to have worked before, so I am bisecting now.
It seems that the first failing commit is c63c329.
Uh oh. The thing that stands out to me is adding separable compilation to the main library. |
I agree (though the build fails without it). One odd part is why it works on some machines (the CI, probably yours) and not on mine.
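The CUDA runtime documents "invalid device function" as meaning the requested device function does not exist or is not compiled for the proper device architecture, so one way to compare machines is to print each GPU's compute capability next to the architecture the kernel image was built for. Below is a minimal diagnostic sketch; `probe_kernel` and `report_kernel_image` are hypothetical names, not Celeritas code. The probe only reflects the compile flags of its own translation unit, so it has to be built with the same architecture settings as the library for the comparison to mean anything, and it says nothing on its own about whether separable compilation is involved.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel whose compiled image we inspect.
__global__ void probe_kernel() {}

// Print the device's compute capability and the versions of the image the
// runtime selected for probe_kernel. Returns the first error encountered.
cudaError_t report_kernel_image(int device)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess)
    {
        return err;
    }
    std::printf("device %d: %s, compute capability %d.%d\n",
                device, prop.name, prop.major, prop.minor);

    err = cudaSetDevice(device);
    if (err != cudaSuccess)
    {
        return err;
    }

    // Fails if the runtime cannot find a usable image for this device.
    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, probe_kernel);
    if (err != cudaSuccess)
    {
        std::printf("no usable kernel image: %s\n", cudaGetErrorString(err));
        return err;
    }
    // Both versions are encoded as major * 10 + minor (e.g. 70 for sm_70).
    std::printf("kernel binaryVersion %d, ptxVersion %d\n",
                attr.binaryVersion, attr.ptxVersion);
    return cudaSuccess;
}
```

Running it on both wc.fnal.gov and a CI runner and diffing the output would show whether the failing node's architecture is missing from the build.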
@pcanal Would you be up for changing this PR to add the requisite post-kernel-launch check to any other relevant locations? The need for that extra check was a good catch -- it was a bad assumption on my end that the device synchronize call would error if there had been a problem. |
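A minimal sketch of the two-step check discussed here, written with plain CUDA runtime calls rather than Celeritas's own error macros (`fill_kernel` and `launch_fill` are hypothetical). Errors detected when the launch is submitted, such as an invalid configuration or a missing kernel image, are recorded as the "last error"; because the kernel never ran, a later `cudaDeviceSynchronize()` is not guaranteed to report them, so the last error has to be checked right after the launch.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: write `value` into data[0..n).
__global__ void fill_kernel(int* data, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        data[i] = value;
    }
}

// Launch fill_kernel and return the first error from either phase.
cudaError_t launch_fill(int* data, int n, int value, int threads_per_block)
{
    int blocks = (n + threads_per_block - 1) / threads_per_block;
    fill_kernel<<<blocks, threads_per_block>>>(data, n, value);

    // 1. Launch-time errors (bad grid/block size, no usable image, no
    //    device). cudaPeekAtLastError leaves the error recorded;
    //    cudaGetLastError would return it and also reset it.
    cudaError_t err = cudaPeekAtLastError();
    if (err != cudaSuccess)
    {
        return err;
    }
    // 2. Errors raised while the kernel was actually executing.
    return cudaDeviceSynchronize();
}
```

A deliberately bad launch such as `launch_fill(nullptr, 0, 7, 2048)` (an empty grid and a block size above the usual 1024-thread limit) should be caught by the first check.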
Yes, I will review and update the existing kernel launches to make sure their errors are checked.
This is to see if the CI sees this error:
warning: Cuda API error detected: cudaLaunchKernel returned (0x62)
warning: Cuda API error detected: cudaPeekAtLastError returned (0x62)
warning: Cuda API error detected: cudaGetLastError returned (0x62)
/wclustre/g4p/pcanal/geant/sources/celeritas/app/demo-rasterizer/demo-rasterizer.cc:139: critical: caught exception: /wclustre/g4p/pcanal/geant/sources/celeritas/app/demo-rasterizer/RDemoKernel.cu:107: celeritas: cuda error: invalid device function cudaPeekAtLastError()
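The `(0x62)` in this log is the raw `cudaError_t` value: 0x62 is 98 decimal, which is `cudaErrorInvalidDeviceFunction`, the code whose description string is the "invalid device function" seen in the exception. A small sketch for decoding such raw codes; the helper names `cuda_error_name` and `print_cuda_error` are hypothetical, while `cudaGetErrorName` and `cudaGetErrorString` are standard runtime calls.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Compile-time check: the 0x62 printed in the log is 98 decimal.
static_assert(static_cast<int>(cudaErrorInvalidDeviceFunction) == 0x62,
              "0x62 should be cudaErrorInvalidDeviceFunction");

// Hypothetical helper: map a raw runtime error code to its enum name.
std::string cuda_error_name(int code)
{
    return cudaGetErrorName(static_cast<cudaError_t>(code));
}

// Hypothetical helper: print the code, its enum name, and its description.
void print_cuda_error(int code)
{
    cudaError_t err = static_cast<cudaError_t>(code);
    std::printf("0x%x = %d: %s (%s)\n", code, code,
                cudaGetErrorName(err), cudaGetErrorString(err));
}
```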
Branch updated from 2650d44 to 259f034.
Looks good, thanks @pcanal!