-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NVRTC support in accelerate-cuda #258
Comments
Oh cool, I have wanted to try the new NVRTC features, but hadn't gotten a chance to yet. Your initial results look promising. It looks like NVRTC is stumbling on our use of statement expressions, which is quite important for us at the moment to ensure we generate correct code. This also happens to be a problem on windows (#234). Finding a way around that problem will, I think, be a much larger change. |
Thanks to #234 hack, I was able to push this feature further. NVRTC supports lambdas when option "-std=c++11" is passed. However, this had a side-effect of Accelerate headers not compiling anymore, because they used a non-standard extension - unnamed structs in unions. It's fixed in mikusp/accelerate-cuda@0ef1abc. I also needed to prevent math_constants.h header from being included when runtime compilation is enabled. With these changes, accelerate-nofib runs all the way to the completion. There are some tests failing, but I'm fairly sure they are failing even with vanilla accelerate-cuda. Please tell what you think about it. I'm willing to make necessary changes to get it included in accelerate-cuda package. |
What comes from |
math_constants.h is provided by CUDA package, located in $CUDA_HOME/include/. Preventing it from being included by using #ifndef CUDACC_RTC, doesn't mean that symbols defined in it aren't available. NVRTC API seems to know how to resolve all the symbols coming from CUDA headers without explicitly including them - I had to do the same thing with cuda_runtime.h and everything works fine. |
I casually note that |
closing as |
In CUDA 7, NVIDIA introduced online code compilation under the name NVRTC. I thought it would be interesting to play with it and try to use it in accelerate-cuda package instead of running an external nvcc process. So far, I managed to introduce NVRTC support to both cuda and accelerate-cuda packages. The results are encouraging - my simple test program (included below) runs in about 2s with vanilla accelerate-cuda and about 0.4s with runtime compiled CUDA code. This includes kernel compilation and execution.
Testing my implementation using accelerate-nofib revealed an error that I think is connected to the way Accelerate generates CUDA code. Relevant part of accelerate-nofib output:
In case of a compilation error, I dump the code to stdout. In this case, line numbers mentioned in the output, contain "({...})" syntax. I suppose that while nvcc (and its backend, gcc) supports it, NVRTC is more picky and complains about it.
What is your opinion about this? Is NVRTC support desirable in accelerate-cuda?
Links to my forks:
https://github.com/mikusp/cuda/tree/nvrtc
https://github.com/mikusp/accelerate-cuda/tree/nvrtc
The text was updated successfully, but these errors were encountered: