
CUDA Exception: too many resources requested for launch #25

Closed
nushio3 opened this issue Jul 14, 2011 · 12 comments
Labels
cuda backend [deprecated] frontend sharing recovery, fusion, optimisation

Comments

@nushio3

nushio3 commented Jul 14, 2011

Hello,
I'm trying to use Accelerate for hydrodynamic simulations.
As a training exercise, I'm writing a Lattice-Boltzmann solver with Accelerate. The program, still under construction, is

https://github.com/nushio3/accelerate-test/blob/7a8248fa30c0e728cea0fe03ccd21bf5bed8a5ef/step05/MainAcc.hs

I have also expressed what I want to write in C++ and CUDA:
main-omp.cpp and main-cuda.cu in the same folder.

To begin with, I wrote a function to initialize the array in Accelerate
(it corresponds to the function 'initialize()' in fluid.h),
but it fails with a 'submit a bug report' error.

It says 'too many resources requested,' so I looked at the printout of Accelerate's kernel,
but it looks normal to me.
Am I doing something wrong that wastes resources?
Or should I decrease the resolution, for example?

./MainAcc.hs 0
... some warnings omitted ...
map
(\x0 -> (+) ((+) ((+) ((+) ((+) ((+) ((+) ((+) (2 (3 x0),
1 (3 x0)),
0 (3 x0)),
2 (2 x0)),
1 (2 x0)),
0 (2 x0)),
2 (1 x0)),
1 (1 x0)),
0 (1 x0)))
(generate
(Z :. 1024) :. 768
(\x0 -> ((0.0,0.0,0.0),
(0.1,
0.7,
(+) (0.2,
(*) (1.0e-3,
(/) ((*) (12.0, fromIntegral (indexHead x0)), 768.0)))),
(0.0,0.0,0.0),
((<) ((+) ((*) (64.0,
(*) ((-) (fromIntegral (indexHead (indexTail x0)),
(/) (768.0, 6.0)),
(-) (fromIntegral (indexHead (indexTail x0)),
(/) (768.0, 6.0)))),
(*) ((-) (fromIntegral (indexHead x0), (/) (768.0, 2.0)),
(-) (fromIntegral (indexHead x0), (/) (768.0, 2.0)))),
(*) ((/) (768.0, 24.0), (/) (768.0, 24.0)))) ?
(1.0, 0.0))))
MainAcc.hs:
*** Internal error in package accelerate ***
*** Please submit a bug report at https://github.com/mchakravarty/accelerate/issues
./Data/Array/Accelerate/CUDA.hs:59 (unhandled): CUDA Exception: too many resources requested for launch

@tmcdonell
Member

I'm currently away on an internship and thus not working on accelerate right now, but if I have a spare moment I'll try to look into it. Sorry about that.

@nushio3
Author

nushio3 commented Jul 16, 2011

Hi tmcdonell, thank you for your response. I hope your internship goes well.

By the way, I've managed to write a working simulator by using Cell (Acc (Array DIM2 Real)) instead of
Acc (Array DIM2 (Cell Real)) as the representation of the state, where type Cell a = ((a,a,a), (a,a,a), (a,a,a), a).
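The two representations differ only in where the tuple sits. A rough sketch (the names Real', StateAoS, and StateSoA are mine, assuming the accelerate API of that era):

```haskell
{-# LANGUAGE TypeOperators #-}
import Data.Array.Accelerate as A

type Real' = Float
type Cell a = ((a,a,a), (a,a,a), (a,a,a), a)

-- array-of-structs: one 2D array whose elements are whole cells
type StateAoS = Acc (Array DIM2 (Cell Real'))

-- struct-of-arrays: a cell-shaped bundle of scalar 2D arrays
-- (the workaround that avoided the crash)
type StateSoA = Cell (Acc (Array DIM2 Real'))
```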

However, the Accelerate implementation was, sadly, about 500 times slower than the CUDA counterpart. This is probably due to my awkwardness in using Accelerate.

Here is some source code I prepared to explain:
https://github.com/nushio3/accelerate-test/blob/fa6e7b3b92e8d2ab357dad26d133c591d5756ef1/step05/OptTest.hs

You can run it like this.

> ./OptTest.hs 1 /dev/null
. . . .
success.

You can see that, in lines 72-73, I have

instance Num AWR where
  a+b = A.use $ run $ A.zipWith (+) a b
  a-b = A.use $ run $ A.zipWith (-) a b

Since A.use . run is semantically equal to id, we should be able to remove those. But when I do so:

instance Num AWR where
  a+b = A.zipWith (+) a b
  a-b = A.use $ run $ A.zipWith (-) a b

I get this:

> ./OptTest.hs 1 /dev/null
. . . .
OptTest.hs: 
*** Internal error in package accelerate ***
*** Please submit a bug report at https://github.com/mchakravarty/accelerate/issues
./Data/Array/Accelerate/Smart.hs:886 ((+++)): Precondition violated

Or by doing this:

instance Num AWR where
  a+b = A.use $ run $ A.zipWith (+) a b
  a-b = A.zipWith (-) a b

I get this:

> ./OptTest.hs 1 /dev/null
. . . .
*** Internal error in package accelerate ***
*** Please submit a bug report at https://github.com/mchakravarty/accelerate/issues
./Data/Array/Accelerate/Smart.hs:321 (convertSharingAcc (prjIdx)): inconsistent valuation; sa = 51; env = [57]

The effect of A.use . run, compared to id, is to force smaller ASTs, hindering optimization.
I guess there are some bugs in the optimization routines?
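For reference, a sketch of what the A.use . run combination does (assuming the accelerate-cuda backend of that era): run compiles and executes the whole computation on the device and returns a plain host array, and A.use re-embeds that array as a constant leaf, so the composition is semantically the identity but truncates the AST at each step:

```haskell
import Data.Array.Accelerate      as A
import Data.Array.Accelerate.CUDA (run)

-- Semantically the identity, but operationally a full GPU round trip:
-- the argument is compiled, executed, and copied back to the host, then
-- re-embedded as an opaque constant, so later stages of the pipeline
-- (sharing recovery, fusion) see only a small leaf instead of the
-- accumulated expression tree.
roundTrip :: Arrays a => Acc a -> Acc a
roundTrip = A.use . run
```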

@mchakravarty
Member

Trevor,

What do you think might be the problem here?

Manuel


@tmcdonell
Member

I cannot reproduce the first bug report, unfortunately. Specs for my test machine follow; as you can see, it is not one of the high-end CUDA cards. Which version of the CUDA toolkit are you using? I haven't tried the 4.x series yet, so maybe that has something to do with it (for example, if the way device capabilities are reported has changed). I'll test that next...

Prelude Foreign.CUDA.Driver> props =<< device 0
DeviceProperties {deviceName = "GeForce GT 120", computeCapability = 1.1, totalGlobalMem = 268107776, totalConstMem = 65536, sharedMemPerBlock = 16384, regsPerBlock = 8192, warpSize = 32, maxThreadsPerBlock = 512, maxBlockSize = (512,512,64), maxGridSize = (65535,65535,1), maxTextureDim1D = 8192, maxTextureDim2D = (65536,32768), maxTextureDim3D = (2048,2048,2048), clockRate = 1250000, multiProcessorCount = 4, memPitch = 2147483647, textureAlignment = 256, computeMode = Default, deviceOverlap = True, concurrentKernels = False, eccEnabled = False, kernelExecTimeoutEnabled = True, integrated = False, canMapHostMemory = True}
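For context: "too many resources requested for launch" usually means the kernel's per-block resource demand, most often registers, exceeds a device limit. Against the regsPerBlock and maxThreadsPerBlock fields above, the check the driver performs at launch time is roughly the following sketch (regsPerThread here is invented for illustration; the real figure comes from ptxas, e.g. nvcc --ptxas-options=-v):

```haskell
-- Sketch of the register-budget check the CUDA driver enforces at launch.
regsPerBlock, threadsPerBlock, regsPerThread :: Int
regsPerBlock    = 8192   -- from the DeviceProperties above
threadsPerBlock = 512
regsPerThread   = 20     -- hypothetical, for illustration only

launchFits :: Bool
launchFits = regsPerThread * threadsPerBlock <= regsPerBlock
-- 20 * 512 = 10240 > 8192, so a launch configured like this is rejected
```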

For the second, the program runs without the use . run statements if using @sseefried's patch for issue #22, although I'm not sure of the status of that patch relative to your own changes to sharing recovery.

@nushio3
Author

nushio3 commented Aug 2, 2011

Thank you, tmcdonell, for your effort. Let me see: the ghci trick
Prelude> :m +Foreign.CUDA.Driver
Prelude Foreign.CUDA.Driver> props =<< device 0
Loading package extensible-exceptions-0.1.1.2 ... linking ... done.
Loading package bytestring-0.9.1.10 ... linking ... done.
Loading package cuda-0.3.2.2 ... linking ... done.
*** Exception: CUDA Exception: driver not initialised
... didn't work for me. I'm using a Tesla M2050 (compute capability 2.0) with CUDA 3.2. I'll upload the result of deviceQuery if you need it. I have tried a CUDA 4.0 environment too, but I couldn't install the Hackage package cuda-0.3.2.2 (the latest) into the CUDA 4.0 environment.

I'm trying the patch 5c24257 now...

@tmcdonell
Member

Ah, first you need to run initialise []; sorry for the omission. No matter: the model number and driver version tell me everything I was interested in. I am also using nvcc version 3.2. Do you happen to be running a 64-bit version of GHC?

I have only done light testing on compute-2.0 devices since I only briefly had access to one. I recall there being some problems when the 2.0 series devices were released; maybe this is why the first example works on my 1.x series card but not your own...
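Putting the two halves of the exchange together, a minimal sequence for querying the device (assuming the cuda package's Foreign.CUDA.Driver API as used in this thread) would be:

```haskell
import Foreign.CUDA.Driver

main :: IO ()
main = do
  initialise []          -- must precede any other Driver API call
  d <- device 0          -- select the first CUDA device
  p <- props d           -- query its DeviceProperties record
  print (deviceName p, computeCapability p)
```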

@nushio3
Author

nushio3 commented Aug 2, 2011

Thanks, tmcdonell. With initialise I could query the device via props =<< device 0.
With patch 5c24257, I could compile the code without use . run. Now benchmarking.

@mchakravarty
Member

Any progress on this problem?

@nushio3
Author

nushio3 commented Apr 25, 2012

Nice to hear from you again! I haven't tried accelerate since ICFP 2011, when I was able to compute what I wanted in accelerate (but it was slow). Maybe it's a good time for me to try the latest accelerate again!

@mchakravarty
Member

Good to hear from you as well. There have been many changes to Accelerate in the last few months. So, it may indeed be worthwhile to have another look.

@tmcdonell
Member

I'm going to go ahead and close this issue, as both of the example programs work now (it is still slow, but that's a different issue).

Congratulations on your recent release of Paraiso!

@nushio3
Author

nushio3 commented Jun 20, 2012

Thank you for your congratulations!

I've seen that Ryan Newton has come on board and accelerate has recently
been seeing rapid progress. I'd really like to try it again, but I have
something else to do first...

Please keep up the good work!


Takayuki MURANUSHI
The Hakubi Center for Advanced Research, Kyoto University
http://www.hakubi.kyoto-u.ac.jp/02_mem/h22/muranushi.html
