Add CUDA runtime compilation support using nvrtc #2437
Some other CUDA tests failed, possibly the ones that use JIT. I am looking into those failures. |
This is nice. It would be good to use this for full-fledged half-precision support. |
@arrayfire/core-devel I am still debugging some failures on the linux-cuda ci job, but I think it is ready for reviews. |
Minor comments. I have suggested an API change which may be more manageable and avoids creating a map on load.
That's an odd error on Linux; it didn't happen prior to the rebase I did today. Trying a fresh build on my machine. Update: I was able to reproduce this on a fresh build. |
I have noticed |
A couple of small things here and there. Great comments. We need to strive to document more of our internal functions.
We need to do this, but I am worried that there are going to be a few combinations of template parameters that are valid but will not be tested. We need to be vigilant to test all types as parameters. |
Added documentation for nvrtc cache mechanism
Moved the following functions in CUDA backend to use runtime compilation
* Transpose (In place transpose hasn't been ported yet)
* Convolutions
* Scan and Scan by Key
The eventual goal is to use math.hpp even inside jit kernels and remove the code-path controlled by isJIT parameter of compileKernel function.
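For readers unfamiliar with the idea, here is a minimal sketch of what a runtime-compilation cache can look like. It is not the code in cache.hpp: `Kernel`, `KernelCache`, and `instantiationName` are hypothetical names, and the compile step is a caller-supplied callback standing in for the actual NVRTC compilation. The only point it illustrates is that a kernel is compiled once per distinct combination of name and template arguments, and looked up afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compiled kernel handle. A real one would wrap
// the module/function obtained after runtime compilation.
struct Kernel {
    std::string instantiation;
};

// Builds a name such as "transpose<float,true>" from a kernel name and its
// template arguments rendered as strings.
std::string instantiationName(const std::string& name,
                              const std::vector<std::string>& targs) {
    std::string out = name;
    if (targs.empty()) return out;
    out += "<";
    for (std::size_t i = 0; i < targs.size(); ++i) {
        if (i != 0) out += ",";
        out += targs[i];
    }
    out += ">";
    return out;
}

// Hypothetical cache: compiles on first request, reuses the result afterwards.
class KernelCache {
public:
    using Compiler = std::function<Kernel(const std::string&)>;

    explicit KernelCache(Compiler compile) : compile_(std::move(compile)) {}

    const Kernel& get(const std::string& name,
                      const std::vector<std::string>& targs) {
        const std::string key = instantiationName(name, targs);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            // Cache miss: invoke the (assumed expensive) compile step once.
            it = cache_.emplace(key, compile_(key)).first;
        }
        return it->second;
    }

    std::size_t size() const { return cache_.size(); }

private:
    Compiler compile_;
    std::map<std::string, Kernel> cache_;
};
```

Requesting `transpose` with `{"float", "true"}` twice triggers a single compile in this sketch, while a different argument list such as `{"double", "true"}` creates a second entry. Each valid template-parameter combination is therefore a separate compile path.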
I have rebased/squashed all CUDA work. Will push once OpenCL changes are ready. |
You should run Bloaty McBloatface before and after on this if it's not too difficult. It would be interesting to see. |
bloaty output
|
* Add CUDA runtime compilation support using nvrtc
  Moved the following functions in CUDA backend to use runtime compilation
  * Transpose (In place transpose hasn't been ported yet)
  * Convolutions
  * Scan and Scan by Key
  The eventual goal is to use math.hpp even inside jit kernels and remove the code-path controlled by isJIT parameter of compileKernel function.
  (cherry picked from commit 7797d01)
src/backend/cuda/nvrtc/cache.hpp
Moved the following functions to use runtime compilation while refining the API defined in cache.hpp
All required headers (listed below) for runtime compilation will be embedded into the built library. Therefore, developers writing kernels just have to #include any required files as usual inside the cuh kernel file. Check the transpose function to get an idea of how it is done.
The eventual goal is to use math.hpp even inside JIT kernels and remove the code path controlled by the isJIT parameter of the buildKernel function. This has been deferred to another PR.
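As a rough illustration of what embedding headers into the library means for runtime compilation, the sketch below keeps header sources as strings and arranges them into the two parallel arrays that `nvrtcCreateProgram` accepts (header sources and include names). This is an assumption about the general approach, not the PR's implementation: `EmbeddedHeader`, `kEmbeddedHeaders`, `HeaderArrays`, `makeHeaderArrays`, and the two example header names and contents are all hypothetical. The NVRTC call itself appears only in a comment so that the block compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical embedded header: the name used in #include plus its full text.
struct EmbeddedHeader {
    const char* includeName;
    const char* source;
};

// Illustrative contents only; these are not the headers embedded by the PR.
static const EmbeddedHeader kEmbeddedHeaders[] = {
    {"example_types.hpp", "typedef long long dim_t;\n"},
    {"example_math.hpp",
     "template<typename T> __device__ T sq(T v) { return v * v; }\n"},
};

// Parallel arrays in the layout nvrtcCreateProgram expects.
struct HeaderArrays {
    std::vector<const char*> sources;
    std::vector<const char*> includeNames;
};

HeaderArrays makeHeaderArrays() {
    HeaderArrays out;
    for (const EmbeddedHeader& h : kEmbeddedHeaders) {
        out.sources.push_back(h.source);
        out.includeNames.push_back(h.includeName);
    }
    return out;
}

// With the CUDA toolkit available, the arrays would be handed to NVRTC as:
//   nvrtcProgram prog;
//   nvrtcCreateProgram(&prog, kernelSource, "kernel.cu",
//                      static_cast<int>(arrays.sources.size()),
//                      arrays.sources.data(), arrays.includeNames.data());
// A kernel source containing `#include "example_math.hpp"` is then resolved
// from memory rather than from the file system.
```

Because the include names are matched against the `#include` directives in the kernel source, the kernel author writes ordinary includes and does not need to know that the headers live inside the library binary.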
Notes: