NVRTC support in accelerate-cuda #258

mikusp · 2015-04-06T22:10:05Z

In CUDA 7, NVIDIA introduced online code compilation under the name NVRTC. I thought it would be interesting to play with it and try to use it in accelerate-cuda package instead of running an external nvcc process. So far, I managed to introduce NVRTC support to both cuda and accelerate-cuda packages. The results are encouraging - my simple test program (included below) runs in about 2s with vanilla accelerate-cuda and about 0.4s with runtime compiled CUDA code. This includes kernel compilation and execution.

import Data.Array.Accelerate      as A
import Data.Array.Accelerate.CUDA as C

main :: IO ()
main = do
    let xs = A.use $ A.fromList (Z :. (1000::Int)) [(1::Int)..1000]
    print $ C.run $ A.zipWith (+) xs xs

Testing my implementation using accelerate-nofib revealed an error that I think is connected to the way Accelerate generates CUDA code. Relevant part of accelerate-nofib output:

  zipWith:
    Int32:
      DIM0:
generate(9): error: expected an expression

generate(14): error: expected an expression

2 errors detected in the compilation of "generate".

#include <accelerate_cuda.h>
extern "C" __global__ void generate(const Int32* __restrict__ arrIn0_0, const Int32* __restrict__ arrIn1_0, Int32* __restrict__ arrOut_0)
{
const int shapeSize = 1;
const int gridSize = blockDim.x * gridDim.x;
int ix;

for (ix = blockDim.x * blockIdx.x + threadIdx.x; ix < shapeSize; ix += gridSize) {
arrOut_0[ix] = ({
const Int64 v0 = 0;

;
arrIn1_0[v0];
}) + ({
const Int64 v1 = 0;

;
arrIn0_0[v1];
});
}
}


        (+): [Failed]
*** Failed! (after 1 test and 1 shrink): 
Exception:

  *** Internal error in package accelerate ***
  *** Please submit a bug report at https://github.com/AccelerateHS/accelerate/issues
  ./Data/Array/Accelerate/CUDA/State.hs:84:9: (unhandled): CUDA Exception: device kernel image is invalid
Array (Z) [0]
Array (Z) [0]
(used seed -2809692722276780105)

In case of a compilation error, I dump the code to stdout. In this case, line numbers mentioned in the output, contain "({...})" syntax. I suppose that while nvcc (and its backend, gcc) supports it, NVRTC is more picky and complains about it.

What is your opinion about this? Is NVRTC support desirable in accelerate-cuda?

Links to my forks:
https://github.com/mikusp/cuda/tree/nvrtc
https://github.com/mikusp/accelerate-cuda/tree/nvrtc

The text was updated successfully, but these errors were encountered:

tmcdonell · 2015-04-06T23:21:00Z

Oh cool, I have wanted to try the new NVRTC features, but hadn't gotten a chance to yet. Your initial results look promising.

It looks like NVRTC is stumbling on our use of statement expressions, which is quite important for us at the moment to ensure we generate correct code. This also happens to be a problem on windows (#234). Finding a way around that problem will, I think, be a much larger change.

mikusp · 2015-08-19T08:50:46Z

Thanks to #234 hack, I was able to push this feature further. NVRTC supports lambdas when option "-std=c++11" is passed. However, this had a side-effect of Accelerate headers not compiling anymore, because they used a non-standard extension - unnamed structs in unions. It's fixed in mikusp/accelerate-cuda@0ef1abc. I also needed to prevent math_constants.h header from being included when runtime compilation is enabled.

With these changes, accelerate-nofib runs all the way to the completion. There are some tests failing, but I'm fairly sure they are failing even with vanilla accelerate-cuda.

Please tell what you think about it. I'm willing to make necessary changes to get it included in accelerate-cuda package.

tmcdonell · 2015-08-25T05:45:05Z

What comes from math_constants.h ? I vaguely recall it might be needed for the NaN or Infinity names?

mikusp · 2015-08-25T07:03:43Z

math_constants.h is provided by CUDA package, located in $CUDA_HOME/include/. Preventing it from being included by using #ifndef CUDACC_RTC, doesn't mean that symbols defined in it aren't available. NVRTC API seems to know how to resolve all the symbols coming from CUDA headers without explicitly including them - I had to do the same thing with cuda_runtime.h and everything works fine.

tmcdonell · 2016-11-17T02:00:08Z

I casually note that accelerate-llvm-ptx work similarly to the proposed; code generation is entirely in-memory and significantly faster than going via nvcc.

tmcdonell · 2018-01-11T01:31:53Z

closing as accelerate-cuda is deprecated and accelerate-llvm-ptx works on windows

mwu-tow mentioned this issue Aug 18, 2015

MSVC does not support statement expressions #234

Closed

mwu-tow mentioned this issue Aug 19, 2015

Support for lambda-expressions mainland/language-c-quote#52

Closed

mwu-tow mentioned this issue Sep 9, 2015

Lambda-expressions support as needed by accelerate-cuda mainland/language-c-quote#53

Closed

tmcdonell added the cuda backend [deprecated] label Nov 17, 2016

tmcdonell modified the milestone: _|_ Apr 14, 2017

tmcdonell closed this as completed Jan 11, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NVRTC support in accelerate-cuda #258

NVRTC support in accelerate-cuda #258

mikusp commented Apr 6, 2015

tmcdonell commented Apr 6, 2015

mikusp commented Aug 19, 2015

tmcdonell commented Aug 25, 2015

mikusp commented Aug 25, 2015

tmcdonell commented Nov 17, 2016

tmcdonell commented Jan 11, 2018

NVRTC support in accelerate-cuda #258

NVRTC support in accelerate-cuda #258

Comments

mikusp commented Apr 6, 2015

tmcdonell commented Apr 6, 2015

mikusp commented Aug 19, 2015

tmcdonell commented Aug 25, 2015

mikusp commented Aug 25, 2015

tmcdonell commented Nov 17, 2016

tmcdonell commented Jan 11, 2018