Calculating Backwards For SRU Results in CUDA error. #8

Closed
NickShahML opened this issue Sep 12, 2017 · 7 comments
Comments

@NickShahML

I'm not sure how, but I'm seeing this error when I try to run the backward pass. Not sure if you've come across this during your debugging?

Traceback (most recent call last):
  File "gan_language.py", line 341, in <module>
    G.backward(one)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/home/nick/wgan-gp/sru/cuda_functional.py", line 417, in backward
    stream=SRU_STREAM
  File "cupy/cuda/function.pyx", line 129, in cupy.cuda.function.Function.__call__ (cupy/cuda/function.cpp:4010)  File "cupy/cuda/function.pyx", line 111, in cupy.cuda.function._launch (cupy/cuda/function.cpp:3647)
  File "cupy/cuda/driver.pyx", line 127, in cupy.cuda.driver.launchKernel (cupy/cuda/driver.cpp:2541)
  File "cupy/cuda/driver.pyx", line 62, in cupy.cuda.driver.check_status (cupy/cuda/driver.cpp:1446)
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_HANDLE: invalid resource handle
@taolei87
Contributor

#4 and #6 may be related. The code currently only works for single-GPU training, so using a CPU, or putting data on a different GPU, may cause this issue.
I'm not sure what exactly the problem is, though.
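
In case it helps others hitting this, here's a minimal sketch of the single-GPU workaround (the module and shapes are illustrative, not the repo's SRU): keep the parameters and the input batch both on device 0, so the forward and backward kernels launch on the same GPU.

```python
# Minimal sketch of the single-GPU workaround (illustrative module, not the repo's SRU):
# keep the parameters and the input batch on device 0.
import torch
import torch.nn as nn
from torch.autograd import Variable

rnn = nn.LSTM(128, 128).cuda(0)                  # parameters live on GPU 0
x = Variable(torch.randn(35, 16, 128).cuda(0))   # input batch also on GPU 0

output, hidden = rnn(x)
output.sum().backward()                          # backward stays on device 0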

@taolei87
Contributor

Need to change the CUDA stream handling later; see here.
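
For anyone curious, a minimal sketch (an assumption about one possible fix, not the repo's actual code) of what "change the stream handling" could mean: look up PyTorch's current stream for the tensor's device at call time instead of caching one handle created on GPU 0 at import time. The `Stream` namedtuple and `stream_for` helper are illustrative names.

```python
# Sketch (assumption, not the repo's code): resolve the stream per call rather than
# reusing a handle created once on GPU 0.
from collections import namedtuple
import torch

Stream = namedtuple('Stream', ['ptr'])

def stream_for(tensor):
    # torch.cuda.current_stream() gives the active stream on the selected device;
    # .cuda_stream is the raw stream handle a cupy kernel launch can use.
    with torch.cuda.device(tensor.get_device()):
        return Stream(ptr=torch.cuda.current_stream().cuda_stream)
```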

@NickShahML
Author

@taolei87 the fix was just to use one GPU, that is, GPU 0. Thanks for the help!

@taolei87
Contributor

@NickShahML you're welcome!

@taolei87
Contributor

@NickShahML By the way, you can specify the GPU with an environment variable, e.g. CUDA_VISIBLE_DEVICES=1, just in case you didn't know (probably not true :))
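
To spell it out, a tiny sketch of the same trick done from inside a script: if CUDA_VISIBLE_DEVICES is set before torch is imported, the chosen physical GPU appears to the process as device 0.

```python
# Sketch: restrict visibility to physical GPU 1 before torch initializes CUDA,
# so it appears to the process as device 0.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch
print(torch.cuda.current_device())   # prints 0 -- the only visible device
```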

@NickShahML
Author

@taolei87 Yep! That's what I did earlier. Thanks for letting me know.

@sriniiyer

@taolei87 Any better fix for this? It works on GPU 0 right now, and on other GPUs via CUDA_VISIBLE_DEVICES, but there isn't a workaround for multi-GPU training with data parallelism :(

Thanks!
