Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NVRTC support in accelerate-cuda #258

Closed
mikusp opened this issue Apr 6, 2015 · 6 comments
Closed

NVRTC support in accelerate-cuda #258

mikusp opened this issue Apr 6, 2015 · 6 comments
Labels
cuda backend [deprecated]
Milestone

Comments

@mikusp
Copy link
Contributor

mikusp commented Apr 6, 2015

In CUDA 7, NVIDIA introduced online code compilation under the name NVRTC. I thought it would be interesting to play with it and try to use it in accelerate-cuda package instead of running an external nvcc process. So far, I managed to introduce NVRTC support to both cuda and accelerate-cuda packages. The results are encouraging - my simple test program (included below) runs in about 2s with vanilla accelerate-cuda and about 0.4s with runtime compiled CUDA code. This includes kernel compilation and execution.

import Data.Array.Accelerate      as A
import Data.Array.Accelerate.CUDA as C

main :: IO ()
main = do
    let xs = A.use $ A.fromList (Z :. (1000::Int)) [(1::Int)..1000]
    print $ C.run $ A.zipWith (+) xs xs

Testing my implementation using accelerate-nofib revealed an error that I think is connected to the way Accelerate generates CUDA code. Relevant part of accelerate-nofib output:

  zipWith:
    Int32:
      DIM0:
generate(9): error: expected an expression

generate(14): error: expected an expression

2 errors detected in the compilation of "generate".

#include <accelerate_cuda.h>
extern "C" __global__ void generate(const Int32* __restrict__ arrIn0_0, const Int32* __restrict__ arrIn1_0, Int32* __restrict__ arrOut_0)
{
const int shapeSize = 1;
const int gridSize = blockDim.x * gridDim.x;
int ix;

for (ix = blockDim.x * blockIdx.x + threadIdx.x; ix < shapeSize; ix += gridSize) {
arrOut_0[ix] = ({
const Int64 v0 = 0;

;
arrIn1_0[v0];
}) + ({
const Int64 v1 = 0;

;
arrIn0_0[v1];
});
}
}


        (+): [Failed]
*** Failed! (after 1 test and 1 shrink): 
Exception:

  *** Internal error in package accelerate ***
  *** Please submit a bug report at https://github.com/AccelerateHS/accelerate/issues
  ./Data/Array/Accelerate/CUDA/State.hs:84:9: (unhandled): CUDA Exception: device kernel image is invalid
Array (Z) [0]
Array (Z) [0]
(used seed -2809692722276780105)

In case of a compilation error, I dump the code to stdout. In this case, line numbers mentioned in the output, contain "({...})" syntax. I suppose that while nvcc (and its backend, gcc) supports it, NVRTC is more picky and complains about it.

What is your opinion about this? Is NVRTC support desirable in accelerate-cuda?

Links to my forks:
https://github.com/mikusp/cuda/tree/nvrtc
https://github.com/mikusp/accelerate-cuda/tree/nvrtc

@tmcdonell
Copy link
Member

Oh cool, I have wanted to try the new NVRTC features, but hadn't gotten a chance to yet. Your initial results look promising.

It looks like NVRTC is stumbling on our use of statement expressions, which is quite important for us at the moment to ensure we generate correct code. This also happens to be a problem on windows (#234). Finding a way around that problem will, I think, be a much larger change.

@mikusp
Copy link
Contributor Author

mikusp commented Aug 19, 2015

Thanks to #234 hack, I was able to push this feature further. NVRTC supports lambdas when option "-std=c++11" is passed. However, this had a side-effect of Accelerate headers not compiling anymore, because they used a non-standard extension - unnamed structs in unions. It's fixed in mikusp/accelerate-cuda@0ef1abc. I also needed to prevent math_constants.h header from being included when runtime compilation is enabled.

With these changes, accelerate-nofib runs all the way to the completion. There are some tests failing, but I'm fairly sure they are failing even with vanilla accelerate-cuda.

Please tell what you think about it. I'm willing to make necessary changes to get it included in accelerate-cuda package.

@tmcdonell
Copy link
Member

What comes from math_constants.h ? I vaguely recall it might be needed for the NaN or Infinity names?

@mikusp
Copy link
Contributor Author

mikusp commented Aug 25, 2015

math_constants.h is provided by CUDA package, located in $CUDA_HOME/include/. Preventing it from being included by using #ifndef CUDACC_RTC, doesn't mean that symbols defined in it aren't available. NVRTC API seems to know how to resolve all the symbols coming from CUDA headers without explicitly including them - I had to do the same thing with cuda_runtime.h and everything works fine.

@tmcdonell
Copy link
Member

I casually note that accelerate-llvm-ptx work similarly to the proposed; code generation is entirely in-memory and significantly faster than going via nvcc.

@tmcdonell tmcdonell modified the milestone: _|_ Apr 14, 2017
@tmcdonell
Copy link
Member

closing as accelerate-cuda is deprecated and accelerate-llvm-ptx works on windows

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda backend [deprecated]
Projects
None yet
Development

No branches or pull requests

2 participants