RuntimeError: CUDA error: unspecified launch failure #9

Closed
berna-ylmz opened this issue Dec 20, 2020 · 7 comments

@berna-ylmz

Hi,
I reduced my dataset to be like hmdb51, and the training process started. To avoid a CUDA out-of-memory error, I set the batch size to 1.
But after 17 epochs, sometimes (in fact most of the time), I get a CUDA error.

File "......\train.py", line 74, in
input_var = [input.cuda() for input in inputs]
RuntimeError: CUDA error: unspecified launch failure

Please help me.
Thanks.

Windows 10
NVIDIA GeForce GTX 1060

@IDKiro
Owner

IDKiro commented Dec 21, 2020

I am sorry, but I have no idea either.
I suggest you use a debugger to track how the variables change.
Since you are using your own dataset, it is worth checking whether the data is corrupted, or whether any sample folder contains only one image.
In addition, are you sure that the training process always crashes at 17 epochs?
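
The folder check suggested above could be sketched roughly as follows. This is only a minimal sketch: the function name `check_sample_folders`, the extension list, and the assumption that each sample is a folder of image files are hypothetical, not taken from this repository. A zero-byte file is used as a cheap stand-in for "broken"; it does not prove that the remaining images decode correctly.

```python
import os

# Assumed image extensions; adjust to whatever the dataset actually uses.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def check_sample_folders(root, min_images=2):
    """Return {folder: reason} for sample folders that look suspicious."""
    problems = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        images = [
            name for name in filenames
            if os.path.splitext(name)[1].lower() in IMAGE_EXTS
        ]
        if not images:
            # No images here: treated as a class-level directory, not a sample.
            continue
        empty = [
            name for name in images
            if os.path.getsize(os.path.join(dirpath, name)) == 0
        ]
        if len(images) < min_images:
            problems[dirpath] = f"only {len(images)} image(s)"
        elif empty:
            problems[dirpath] = f"{len(empty)} zero-byte file(s)"
    return problems
```

Calling `check_sample_folders("path/to/dataset")` (the path is a placeholder) returns an empty dict when nothing looks wrong under these assumptions.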

@berna-ylmz
Author

berna-ylmz commented Dec 21, 2020

Hi,

for i, (inputs, target, _) in enumerate(train_loader):
    print(torch.cuda.is_available())
    print(len(inputs))
    input_var = [input.cuda() for input in inputs]

I inserted print calls to check the variable. The length of inputs is always more than 1.
For example, the output is:

True
136
Traceback (most recent call last):

  File "....\train.py", line 273, in <module>
    train(train_loader, model, criterion, optimizer, epoch)

  File ".....\train.py", line 75, in train
    input_var = [input.cuda() for input in inputs]

  File "......\train.py", line 75, in <listcomp>
    input_var = [input.cuda() for input in inputs]

RuntimeError: CUDA error: unspecified launch failure

When I run the training code, it fails intermittently. For example, it may fail with this error 5 times in a row and then run without error on the 6th attempt. But this varies a lot: sometimes it fails 8 times and the 9th attempt works.

I found pytorch/pytorch#27837, which says:
"Hello, I recently faced and solved this issue on my Windows machine.
In my case, this issue was invoked by Windows Timeout Detection and Recovery (TDR), which shuts down CUDA kernels that fail to respond in time.

The fix is as follow:

Run "Registry Editor" as Administrator, navigate to KeyPath : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
Change KeyValue : TdrDelay to a higher value. Default is 2 or 8 seconds, and in my case, setting it to 64 seconds does the trick.
Reboot.
This should do the trick on Windows 10. Hope this helps.

PS. setting CUDA_LAUNCH_BLOCKING=1 also solves the issue but comes at a heavy performance penalty."

But this still did not solve the issue for me.
Do you think the problem could be related to that?
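
For reference, the CUDA_LAUNCH_BLOCKING setting mentioned in the quoted comment can also be applied from Python instead of the shell. The sketch below assumes it runs at the very top of the training script, before torch is imported, so the variable is in place before CUDA is initialized; the helper name `enable_blocking_launches` is hypothetical. Because CUDA errors are normally reported asynchronously, the failure may originate in an earlier kernel than the `input.cuda()` line shown in the traceback, and blocking launches can help narrow that down. It is a debugging aid, not a fix.

```python
import os


def enable_blocking_launches():
    """Set CUDA_LAUNCH_BLOCKING=1 for this process and return the value."""
    # Must happen before CUDA is initialized, so call this before importing torch.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_LAUNCH_BLOCKING"]


enable_blocking_launches()
# import torch  # import only after the variable has been set
```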

@IDKiro
Owner

IDKiro commented Dec 21, 2020

Whether a specific sample consistently causes this error may be the key.
Or you may need to try a Linux system to rule out interference from the environment configuration.
You had better ask this question on Stack Overflow, because I don't use Windows to train models.
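
One way to test whether a fixed sample is responsible is to record the index of the first batch that fails and compare it across runs. The helper below is a rough sketch with hypothetical names (`first_failing_batch`, `step`); it stops at the first error because a CUDA launch failure typically leaves the process unable to run further GPU work. The comparison is only meaningful if the loader uses a fixed order (shuffle disabled).

```python
def first_failing_batch(loader, step):
    """Run step(batch) over loader; return (index, error) of the first failure.

    Returns None if every batch completes without a RuntimeError.
    """
    for index, batch in enumerate(loader):
        try:
            step(batch)
        except RuntimeError as err:
            return index, err
    return None


# Hypothetical usage with the loop from this thread (requires PyTorch and a GPU):
#
#   def step(batch):
#       inputs, target, _ = batch
#       input_var = [x.cuda() for x in inputs]
#       torch.cuda.synchronize()  # surface asynchronous errors at this point
#
#   print(first_failing_batch(train_loader, step))
```

If the same index comes back on every run, that sample is worth inspecting; if the index changes from run to run, the data is a less likely cause.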

@IDKiro
Owner

IDKiro commented Dec 21, 2020

I can't help you more.
I'll close this issue.

@IDKiro IDKiro closed this as completed Dec 21, 2020
@berna-ylmz
Author

Thank you.

@IDKiro
Owner

IDKiro commented Dec 21, 2020

You can try changing the dataloader parameters, such as pin_memory, shuffle and num_workers. I don’t know if this will work, but it’s worth trying.
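
As a starting point, the parameters named above could be set to conservative values like this. It is a sketch only: the names `DEBUG_LOADER_KWARGS`, `debug_loader_kwargs`, and `make_debug_loader` are hypothetical, the values are one plausible debugging configuration rather than a known fix, and `dataset` stands for whatever dataset object train.py already builds.

```python
# Conservative DataLoader settings for debugging (assumed values, not a known fix).
DEBUG_LOADER_KWARGS = {
    "batch_size": 1,      # the batch size reported in this thread
    "shuffle": False,     # fixed order, so a bad sample recurs at the same index
    "num_workers": 0,     # load data in the main process, no worker processes
    "pin_memory": False,  # do not use page-locked host memory
}


def debug_loader_kwargs(**overrides):
    """Return the debug settings with any keyword overrides applied."""
    return {**DEBUG_LOADER_KWARGS, **overrides}


def make_debug_loader(dataset, **overrides):
    """Build a torch DataLoader from the debug settings (requires PyTorch)."""
    from torch.utils.data import DataLoader  # imported lazily on purpose

    return DataLoader(dataset, **debug_loader_kwargs(**overrides))
```

Changing one parameter at a time, for example `make_debug_loader(dataset, pin_memory=True)`, makes it easier to see which setting, if any, affects the error.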

@berna-ylmz
Author

OK, I will try. Thanks for helping.
