
Conversation

9prady9
Member

@9prady9 9prady9 commented Feb 15, 2019

  • Documentation for new functions (src/backend/cuda/nvrtc/cache.hpp)
  • Add checks for compute versions not supported in CUDA 10

Moved the following functions to use runtime compilation while refining the API defined in cache.hpp

  • convolve
  • scan
  • scan_by_key
  • separable convolution
  • transpose
  • where

All required headers (listed below) for runtime compilation will be embedded into the built library. Therefore, developers writing kernels just have to #include any required files as usual inside the .cuh kernel file. Check the transpose function to get an idea of how it is done.

  • backend.hpp
  • complex.hpp
  • jit.cuh
  • math.hpp
  • ops.hpp
  • optypes.hpp
  • Param.hpp
  • shared.hpp
  • types.hpp
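
To sketch how this can work (hypothetical names and placeholder contents; the real mechanism lives in cache.hpp and forwards the sources to NVRTC, whose nvrtcCreateProgram accepts parallel arrays of in-memory header sources and header names), the embedded headers amount to a name-to-source registry that is consulted when a .cuh file writes a plain #include:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical sketch: each required header is embedded into the library
// as a string constant at build time (names mirror the list above; the
// contents here are placeholders, not the real ArrayFire sources).
static const std::unordered_map<std::string, std::string>& embeddedHeaders() {
    static const std::unordered_map<std::string, std::string> headers = {
        {"math.hpp",  "/* embedded contents of math.hpp */"},
        {"Param.hpp", "/* embedded contents of Param.hpp */"},
        {"types.hpp", "/* embedded contents of types.hpp */"},
    };
    return headers;
}

// When compiling a .cuh kernel at runtime, the header names and sources
// are forwarded together so that `#include "math.hpp"` resolves in memory
// rather than on disk; returns nullptr for names that were not embedded.
inline const std::string* lookupEmbeddedHeader(const std::string& name) {
    const auto& reg = embeddedHeaders();
    auto it = reg.find(name);
    return it == reg.end() ? nullptr : &it->second;
}
```

This is only a model of the lookup side; the actual cache.hpp API also handles compilation and per-device kernel caching.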

The eventual goal is to use math.hpp even inside JIT kernels and remove
the code path controlled by the isJIT parameter of the buildKernel function, but that has been deferred to another PR.

Notes:

  1. This change alone brought down the afcuda.so file size by 200 MB for a single compute version (61). Hopefully, we will see a drastic reduction in our final binary once all feasible functions are ported to runtime compilation.
  2. CUB can't be included into this framework since it includes some system headers; there is an open issue regarding this on the corresponding repository.
  3. Functions using thrust can only be ported if thrust calls and raw kernels are cleanly separated. Thrust APIs that involve CUDA runtime constructs are the main blockers.
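
The separation note 3 asks for can be illustrated with a small sketch (hypothetical names; plain C++ standing in for the thrust-dependent translation unit): thrust-based calls must sit behind an ordinary function boundary so that the raw kernels around them can move to runtime compilation independently.

```cpp
#include <numeric>
#include <vector>

// Hypothetical sketch of the separation described in note 3: this function
// models a thrust-dependent translation unit (in the real backend it would
// be compiled by nvcc and call thrust::exclusive_scan), exposed through a
// plain function signature with no CUDA runtime constructs in the interface.
// Raw kernels elsewhere can then be ported to NVRTC without touching it.
std::vector<int> exclusiveScan(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    // std::exclusive_scan stands in for thrust::exclusive_scan so the
    // sketch stays self-contained and host-only.
    std::exclusive_scan(in.begin(), in.end(), out.begin(), 0);
    return out;
}
```

When thrust calls and raw kernels are interleaved in one translation unit, no such boundary exists, which is why those functions remain blocked.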

@9prady9
Member Author

9prady9 commented Feb 15, 2019

Some other CUDA tests failed, possibly the ones that use JIT. I am looking into those failures.

@pavanky
Member

pavanky commented Feb 20, 2019

This is nice. It would be good to use this for full-fledged half-precision support.

@9prady9
Member Author

9prady9 commented Feb 20, 2019

@arrayfire/core-devel I am still debugging some failures on the linux-cuda CI job, but I think it is ready for review.

Member

@umar456 umar456 left a comment


Minor comments. I have suggested an API change which may be more manageable and avoids creating a map on load.
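
A minimal sketch of the kind of API change being suggested (hypothetical names; not the actual cache.hpp interface): keeping the kernel cache as a function-local static means it is constructed lazily on the first kernel request rather than when the library is loaded.

```cpp
#include <string>
#include <unordered_map>

// Placeholder for a compiled module/function handle.
struct Kernel {
    std::string ptx;
};

// Hypothetical illustration of the review suggestion: instead of a global
// map constructed at library load time, the cache lives as a function-local
// static, so it is created on first use and never pays a startup cost.
Kernel& getKernel(const std::string& nameAndInstantiation) {
    static std::unordered_map<std::string, Kernel> cache;  // built lazily
    auto it = cache.find(nameAndInstantiation);
    if (it == cache.end()) {
        // In the real backend this is where NVRTC compilation would run;
        // here we just record a placeholder entry.
        it = cache.emplace(nameAndInstantiation,
                           Kernel{"<compiled " + nameAndInstantiation + ">"})
                 .first;
    }
    return it->second;  // unordered_map references stay valid across rehash
}
```

Repeated lookups for the same instantiation return the same cached entry, so each kernel is compiled at most once per process.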

@9prady9 9prady9 dismissed umar456’s stale review March 5, 2019 16:07

Addressed all feedback

@9prady9 9prady9 requested a review from umar456 March 5, 2019 16:07
@9prady9
Member Author

9prady9 commented Mar 5, 2019

That's an odd error on Linux; it didn't happen prior to the rebase I did today. Trying a fresh build on my machine.

Update: was able to reproduce this on a fresh build.

@9prady9
Member Author

9prady9 commented Mar 5, 2019

I have noticed __int64 type failures on Windows; will debug them soon.

umar456
umar456 previously requested changes Mar 5, 2019
Member

@umar456 umar456 left a comment


A couple of small things here and there. Great comments. We need to strive to document more of our internal functions.

@umar456
Member

umar456 commented Mar 5, 2019

We need to do this, but I am worried that there are going to be a few combinations of template parameters that are valid but will not be tested. We need to be vigilant and test all type and parameter combinations.

Added documentation for nvrtc cache mechanism

Moved the following functions in the CUDA backend to use runtime compilation
* Transpose (In place transpose hasn't been ported yet)
* Convolutions
* Scan and Scan by Key

The eventual goal is to use math.hpp even inside jit kernels and remove
the code-path controlled by isJIT parameter of compileKernel function.
@9prady9
Member Author

9prady9 commented Mar 6, 2019

I have rebased/squashed all CUDA work. Will push once OpenCL changes are ready.

@umar456
Member

umar456 commented Mar 6, 2019

You should run Bloaty McBloatface on this before and after, if it's not too difficult. It would be interesting to see.

@9prady9
Member Author

9prady9 commented Mar 6, 2019

bloaty output

nvrtc ./src/backend/cuda/libafcuda.so.3.7.0
     VM SIZE                  FILE SIZE
 --------------            --------------
 100.0%   119Mi TOTAL       529Mi 100.0%

master ./src/backend/cuda/libafcuda.so
     VM SIZE                  FILE SIZE
 --------------            --------------
 100.0%   180Mi TOTAL       640Mi 100.0%

@9prady9 9prady9 dismissed umar456’s stale review March 12, 2019 20:13

Addressed feedback

@9prady9 9prady9 requested a review from umar456 March 12, 2019 21:28
@umar456 umar456 merged commit 7797d01 into arrayfire:master Mar 12, 2019
@9prady9 9prady9 deleted the nvrtc branch March 12, 2019 21:41
9prady9 added a commit to 9prady9/arrayfire that referenced this pull request Apr 8, 2019
* Add CUDA runtime compilation support using nvrtc

Moved the following functions in CUDA backend to use runtime compilation
* Transpose (In place transpose hasn't been ported yet)
* Convolutions
* Scan and Scan by Key

The eventual goal is to use math.hpp even inside jit kernels and remove
the code-path controlled by isJIT parameter of compileKernel function.

(cherry picked from commit 7797d01)
umar456 pushed a commit to 9prady9/arrayfire that referenced this pull request Apr 17, 2019
umar456 pushed a commit to 9prady9/arrayfire that referenced this pull request Apr 17, 2019
umar456 pushed a commit to 9prady9/arrayfire that referenced this pull request Apr 17, 2019
umar456 pushed a commit that referenced this pull request Apr 17, 2019