RuntimeError: CUDA error: unspecified launch failure #9

berna-ylmz · 2020-12-20T20:56:31Z

Hi,
I reduced my dataset like hmdb51, and the training process started. To avoid a CUDA out-of-memory error, I set the batch size to 1.
But after 17 epochs, sometimes (in fact most of the time), I get a CUDA error.

File "......\train.py", line 74, in
input_var = [input.cuda() for input in inputs]
RuntimeError: CUDA error: unspecified launch failure

Please help me.
Thanks.

Windows 10
NVIDIA GeForce GTX 1060

IDKiro · 2020-12-21T01:46:22Z

I'm sorry, but I have no idea either.
I suggest using a debugger to track how the variables change.
Since you are using your own dataset, it is worth checking whether the data is corrupted, or whether a sample folder contains only one image.
Also, are you sure that training always crashes after 17 epochs?
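
The folder check suggested above could be sketched roughly as follows. This is only an illustration, not code from the repository: it assumes each sample is a folder of extracted frame images, and the function name, the extension list, and the `min_images` threshold are all hypothetical.

```python
from pathlib import Path

# Assumed image extensions; adjust to match how the frames were extracted.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def find_suspect_folders(root, min_images=2):
    """Return (folder, image_count, zero_byte_files) for suspicious sample folders."""
    suspects = []
    for folder in sorted(p for p in Path(root).rglob("*") if p.is_dir()):
        entries = list(folder.iterdir())
        if any(e.is_dir() for e in entries):
            continue  # a grouping folder (e.g. a class), not a sample folder
        images = [e for e in entries if e.suffix.lower() in IMAGE_EXTS]
        # Zero-byte files are a cheap sign of a failed copy or extraction.
        empty = sorted(e.name for e in images if e.stat().st_size == 0)
        if len(images) < min_images or empty:
            suspects.append((str(folder), len(images), empty))
    return suspects
```

Running `find_suspect_folders("path/to/dataset")` on the dataset root lists the folders worth inspecting by hand. It does not decode the images, so a truncated but non-empty file would still pass.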

berna-ylmz · 2020-12-21T07:19:18Z

Hi,

for i, (inputs, target, _) in enumerate(train_loader):
    print(torch.cuda.is_available())
    print(len(inputs))
    input_var = [input.cuda() for input in inputs]

I inserted print calls for the variable. Its length is always more than 1.
For example, the output is:

True
136
Traceback (most recent call last):

  File "....\train.py", line 273, in <module>
    train(train_loader, model, criterion, optimizer, epoch)

  File ".....\train.py", line 75, in train
    input_var = [input.cuda() for input in inputs]

  File "......\train.py", line 75, in <listcomp>
    input_var = [input.cuda() for input in inputs]

RuntimeError: CUDA error: unspecified launch failure

I ran the training code repeatedly. For example, it fails with this error 5 times in a row and then runs without error on the 6th attempt. But this varies a lot: sometimes it fails 8 times and the 9th attempt works.

I found pytorch/pytorch#27837, which says:
"Hello, I recently faced and solved this issue on my Windows machine.
In my case, this issue was invoked by Windows Timeout Detection and Recovery (TDR), which shuts down CUDA kernels that fail to respond in time.

The fix is as follow:

Run "Registry Editor" as Administrator, navigate to KeyPath : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
Change KeyValue : TdrDelay to a higher value. Default is 2 or 8 seconds, and in my case, setting it to 64 seconds does the trick.
Reboot.
This should do the trick on Windows 10. Hope this helps.

PS. setting CUDA_LAUNCH_BLOCKING=1 also solves the issue but comes at a heavy performance penalty."

But it still did not solve the issue.
Do you think it could be related to that?
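
For reference, a minimal sketch of setting the `CUDA_LAUNCH_BLOCKING` variable mentioned in the quote from inside the training script. CUDA reports kernel errors asynchronously, so the traceback can point at a later call such as `input.cuda()` rather than the kernel that actually failed; with blocking launches the error should surface at the call that caused it. The helper name is hypothetical, and this is a debugging aid rather than a fix.

```python
import os


def enable_blocking_launches():
    """Ask CUDA to run kernels synchronously so errors surface where they occur."""
    # Must be set before the first CUDA call in the process; the simplest way
    # to guarantee that is to set it before importing torch.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


enable_blocking_launches()
# import torch  # import only after the variable is set
```

Expect training to run noticeably slower while this is enabled, as the quoted comment warns.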

IDKiro · 2020-12-21T07:32:33Z

Whether a specific sample always triggers this error may be the key.
Alternatively, you could try running on Linux to rule out interference from the environment configuration.
You would also do better to ask this question on Stack Overflow, because I don't use Windows to train models.
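
One way to see whether the same sample is involved each time is to write the batch index to disk before every step, so that it survives the crash. A rough sketch, assuming a generic `step` callable that stands in for the body of the training loop; the function name and log file name are hypothetical.

```python
def run_with_index_log(batches, step, log_path="last_batch.txt"):
    """Call step(batch) for each batch, recording the batch index on disk first.

    If a step raises, the error propagates unchanged and the file still holds
    the index of the batch that was being processed.
    """
    processed = 0
    for index, batch in enumerate(batches):
        # The file is written and closed before the step runs.
        with open(log_path, "w") as f:
            f.write(str(index))
        step(batch)
        processed += 1
    return processed
```

This is only informative with shuffling disabled, so that a given index maps to the same sample on every run. If the recorded index differs between runs, the data itself is probably not the cause.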

IDKiro · 2020-12-21T07:33:31Z

I can't help you any further.
I'll close this issue.

berna-ylmz · 2020-12-21T07:34:04Z

Thank you.

IDKiro · 2020-12-21T07:35:49Z

You can try to change the dataloader parameters, such as pin_memory, shuffle and num_workers. I don’t know if this will work, but it’s worth trying.
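
As an illustration of the parameters mentioned above, here is one conservative combination to start from. The helper name and `train_dataset` are hypothetical, and whether any of these settings affects the error is untested.

```python
def conservative_loader_kwargs(batch_size=1):
    """Keyword arguments for torch.utils.data.DataLoader that take worker
    processes, pinned memory, and shuffling out of the picture."""
    return {
        "batch_size": batch_size,
        "shuffle": False,     # fixed order, so a failing batch index is reproducible
        "num_workers": 0,     # load data in the main process
        "pin_memory": False,  # no page-locked host buffers
    }


kwargs = conservative_loader_kwargs()
# train_loader = torch.utils.data.DataLoader(train_dataset, **kwargs)
```

If the error disappears with these settings, the options can be re-enabled one at a time to see which one brings it back.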

berna-ylmz · 2020-12-21T07:37:42Z

OK, I will try. Thanks for helping.

IDKiro closed this as completed Dec 21, 2020
