Calculating Backwards For SRU Results in CUDA error. #8

Closed
NickShahML opened this issue Sep 12, 2017 · 7 comments
Comments

@NickShahML

I'm not sure how, but I'm seeing this error when I try to run the backward pass. Not sure if you've come across this during your debugging?

Traceback (most recent call last):
  File "gan_language.py", line 341, in <module>
    G.backward(one)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/home/nick/wgan-gp/sru/cuda_functional.py", line 417, in backward
    stream=SRU_STREAM
  File "cupy/cuda/function.pyx", line 129, in cupy.cuda.function.Function.__call__ (cupy/cuda/function.cpp:4010)  File "cupy/cuda/function.pyx", line 111, in cupy.cuda.function._launch (cupy/cuda/function.cpp:3647)
  File "cupy/cuda/driver.pyx", line 127, in cupy.cuda.driver.launchKernel (cupy/cuda/driver.cpp:2541)
  File "cupy/cuda/driver.pyx", line 62, in cupy.cuda.driver.check_status (cupy/cuda/driver.cpp:1446)
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_HANDLE: invalid resource handle
@taolei87
Contributor

#4 and #6 may be related. The code currently only works for single-GPU training, so using a CPU, or putting data on a different GPU, may cause this issue.
I'm not sure what exactly the problem is, though.
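
In case it helps others hitting this, here's a minimal sketch of the single-GPU workaround (the module and shapes are illustrative, not the repo's SRU): keep the parameters and the input batch both on device 0, so the forward and backward kernels launch on the same GPU.

```python
# Minimal sketch of the single-GPU workaround (illustrative module, not the repo's SRU):
# keep the parameters and the input batch on device 0.
import torch
import torch.nn as nn
from torch.autograd import Variable

rnn = nn.LSTM(128, 128).cuda(0)                  # parameters live on GPU 0
x = Variable(torch.randn(35, 16, 128).cuda(0))   # input batch also on GPU 0

output, hidden = rnn(x)
output.sum().backward()                          # backward stays on device 0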

@taolei87
Contributor

Need to change the CUDA stream handling later; see here.
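
For anyone curious, a minimal sketch (an assumption about one possible fix, not the repo's actual code) of what "change the stream handling" could mean: look up PyTorch's current stream for the tensor's device at call time instead of caching one handle created on GPU 0 at import time. The `Stream` namedtuple and `stream_for` helper are illustrative names.

```python
# Sketch (assumption, not the repo's code): resolve the stream per call rather than
# reusing a handle created once on GPU 0.
from collections import namedtuple
import torch

Stream = namedtuple('Stream', ['ptr'])

def stream_for(tensor):
    # torch.cuda.current_stream() gives the active stream on the selected device;
    # .cuda_stream is the raw stream handle a cupy kernel launch can use.
    with torch.cuda.device(tensor.get_device()):
        return Stream(ptr=torch.cuda.current_stream().cuda_stream)
```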

@NickShahML
Author

@taolei87 the fix was just to use one GPU, that is, GPU 0. Thanks for the help!

@taolei87
Contributor

@NickShahML you're welcome!

@taolei87
Contributor

@NickShahML By the way, you can specify the GPU with an environment variable, e.g. CUDA_VISIBLE_DEVICES=1, just in case you didn't know (probably not true :))
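
To spell it out, a tiny sketch of the same trick done from inside a script: if CUDA_VISIBLE_DEVICES is set before torch is imported, the chosen physical GPU appears to the process as device 0.

```python
# Sketch: restrict visibility to physical GPU 1 before torch initializes CUDA,
# so it appears to the process as device 0.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch
print(torch.cuda.current_device())   # prints 0 -- the only visible device
```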

@NickShahML
Author

@taolei87 Yep! That's what I did earlier. Thanks for letting me know.

@sriniiyer

@taolei87 Any better fix for this? It works on GPU 0 right now, and on other GPUs via CUDA_VISIBLE_DEVICES, but there isn't a workaround for multi-GPU training with data parallelism :(

Thanks!
