Training error: the GPU program failed #37

saruvora · 2019-04-08T11:45:55Z

Hi, I am training the SGAN model with all the provided datasets. After a few iterations I get the following error:

Traceback (most recent call last):
  File "/workspace/code/scripts/train.py", line 512, in <module>
    main(args)
  File "/workspace/code/scripts/train.py", line 191, in main
    optimizer_g)
  File "/workspace/code/scripts/train.py", line 387, in generator_step
    loss.backward()
  File "/opt/conda/lib/python3.6/site-packages/torch/tensor.py", line 96, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/pytorch/pytorch/aten/src/THC/THCBlas.cu:258

It would be great if someone could help me fix this error.
It occurs with num_epochs = 200; when I tried num_epochs = 5, training worked fine.


saruvora · 2019-04-10T08:19:53Z

I trained the model again and it worked. I still do not know why.
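For anyone who hits this intermittently: CUDA kernels are launched asynchronously, so the Python line shown in the traceback is not necessarily the call that actually failed. A minimal diagnostic sketch, assuming the failure can be triggered again, is to set the `CUDA_LAUNCH_BLOCKING` environment variable before CUDA is initialised and to record which iteration failed. The helper `run_step_with_context` below is a hypothetical name for illustration and is not part of the SGAN code.

```python
import os

# Make CUDA kernel launches synchronous so an error surfaces at the call
# that caused it. This must be set before CUDA is initialised; the safest
# place is before `import torch`. Expect training to run slower.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def run_step_with_context(step_fn, iteration, *args, **kwargs):
    """Call step_fn(*args, **kwargs); on RuntimeError, re-raise with the
    iteration number attached so the failing step can be identified.

    Hypothetical helper for illustration, not part of the SGAN scripts.
    """
    try:
        return step_fn(*args, **kwargs)
    except RuntimeError as err:
        raise RuntimeError(
            "step failed at iteration {}: {}".format(iteration, err)
        ) from err
```

In `train.py`, the call to `generator_step` (and the matching discriminator step) could be wrapped with this helper, passing the current iteration counter. Its actual arguments are not shown in the traceback, so they are left out here. This does not fix the error; it only narrows down where and when it happens.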

saruvora closed this as completed Apr 11, 2019
