Problem:
Whenever I try to run train.py, it runs for a few epochs, then I run into the issue stated in the title:
Traceback (most recent call last):
  File "C:/repo/flowtron-custom/train.py", line 425, in <module>
    train(n_gpus, rank, **train_config)
  File "C:/repo/flowtron-custom/train.py", line 336, in train
    loss.backward()
  File "C:\Users\Serguei\anaconda3\envs\flowtron-nightly-36\lib\site-packages\torch\tensor.py", line 233, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\Serguei\anaconda3\envs\flowtron-nightly-36\lib\site-packages\torch\autograd\__init__.py", line 146, in backward
    allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: transform: failed to synchronize: cudaErrorLaunchFailure: unspecified launch failure
Funnily enough, when I simply remove the loss.backward() line, it stops breaking and runs perfectly.
What I've tried:
Nvidia drivers:
- 457.51 (latest as of Dec 8 2020)
- 465.12 (beta drivers for WSL 2)
Input data:
Batch size:
Commits:
PyTorch/CUDA configs:
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
torch.backends.cudnn.enabled = False
Python versions:
PyTorch versions:
- 1.7.0 with CUDA 11.0, cuDNN 8.0.4
- 1.8.0 nightly build (12/07) with CUDA 11.0, cuDNN 8.0.4
- 1.7.1 with CUDA 11.0, cuDNN 8.0.5
FP16:
What worked?
The only thing that worked to solve the issue was os.environ['CUDA_LAUNCH_BLOCKING'] = '1', but it also slowed the training down by a lot, so it's a pretty awful solution.
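For anyone trying to reproduce this, here is a minimal sketch of how the two PyTorch/CUDA settings listed above can be applied. It assumes the lines sit at the very top of train.py, before anything touches the GPU, since as far as I understand the environment variable has to be set before CUDA is initialized. The helper name disable_cudnn is hypothetical, and the try/except around the import is only there so the sketch also runs on a machine without PyTorch.

```python
import os

# Assumption: this runs before any CUDA work, e.g. at the top of train.py.
# CUDA_LAUNCH_BLOCKING=1 asks the CUDA runtime to launch kernels synchronously,
# which is also why training gets noticeably slower with it.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

try:
    import torch
except ImportError:  # keeps the sketch runnable without PyTorch installed
    torch = None


def disable_cudnn():
    """Turn off cuDNN (hypothetical helper); returns True if the flag was set."""
    if torch is None:
        return False
    torch.backends.cudnn.enabled = False
    return True


disable_cudnn()
```

With synchronous launches, an error should be reported at the call that actually failed rather than at a later synchronization point, so this is probably more useful as a debugging aid than as a permanent fix.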
System:
Windows 10
RTX 3080
Python 3.6 with PyTorch CUDA 11.0
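In case it helps with comparing setups, the version details above can be collected with something like the following. This is only a sketch: describe_environment is a hypothetical helper name, and the CUDA and cuDNN entries are whatever the installed PyTorch build reports, which may be None on a CPU-only build.

```python
import platform


def describe_environment():
    """Collect version details for a bug report (hypothetical helper)."""
    info = {"python": platform.python_version(), "os": platform.platform()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch not installed; nothing more to report
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda               # None on CPU-only builds
    info["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```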