
RuntimeError: Backward is not reentrant #16

Closed
leon532 opened this issue Mar 22, 2019 · 5 comments

Comments
@leon532

leon532 commented Mar 22, 2019

It raises RuntimeError: Backward is not reentrant when I run test.py.

torch.Size([2, 128, 128, 128])
torch.Size([2, 128, 128, 128])
torch.Size([20, 32, 7, 7])
torch.Size([20, 32, 7, 7])
torch.Size([20, 32, 7, 7])
checking
dconv im2col_step forward passed with 0.0
tensor(0., device='cuda:0', grad_fn=)
dconv im2col_step backward passed with 7.450580596923828e-09 = 7.450580596923828e-09+0.0+0.0+0.0
mdconv im2col_step forward passed with 0.0
tensor(0., device='cuda:0', grad_fn=)
mdconv im2col_step backward passed with 3.725290298461914e-09
0.971507, 1.943014
0.971507, 1.943014
tensor(0., device='cuda:0')
dconv zero offset passed with 1.1920928955078125e-07
dconv zero offset identify passed with 0.0
tensor(0., device='cuda:0')
mdconv zero offset passed with 1.7881393432617188e-07
mdconv zero offset identify passed with 0.0
check_gradient_conv: True
Traceback (most recent call last):
File "test.py", line 624, in <module>
check_gradient_dconv()
File "test.py", line 400, in check_gradient_dconv
eps=1e-3, atol=1e-3, rtol=1e-2, raise_exception=True))
File "/data/yli18/miniconda3/envs/pytorch-1.0/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 208, in gradcheck
return fail_test('Backward is not reentrant, i.e., running backward with same '
File "/data/yli18/miniconda3/envs/pytorch-1.0/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 185, in fail_test
raise RuntimeError(msg)
RuntimeError: Backward is not reentrant, i.e., running backward with same input and grad_output multiple times gives different values, although analytical gradient matches numerical gradient

Is this a serious problem, and how can I resolve it?
Thanks for your time and suggestions.
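For context, the failing check at test.py:400 is a standard torch.autograd.gradcheck call. A minimal sketch of the pattern (the dconv_fn callable and the tensor shapes here are assumptions for illustration, not the repo's exact code):

import torch
from torch.autograd import gradcheck

def check_gradient_dconv(dconv_fn):
    # gradcheck needs double-precision inputs with requires_grad=True;
    # shapes are illustrative (18 offset channels = 2 * 3 * 3 kernel).
    inp = torch.rand(1, 2, 4, 4, dtype=torch.float64,
                     device='cuda', requires_grad=True)
    offset = torch.rand(1, 18, 4, 4, dtype=torch.float64,
                        device='cuda', requires_grad=True)
    # raise_exception=True turns any failed check, including the
    # reentrancy check, into the RuntimeError shown above.
    return gradcheck(dconv_fn, (inp, offset),
                     eps=1e-3, atol=1e-3, rtol=1e-2,
                     raise_exception=True)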

@xvjiarui
Collaborator

xvjiarui commented Mar 26, 2019

Sorry for the late reply.

# """
# ****** Note: backward is not reentrant error may not be a serious problem,
# ****** since the max error is less than 1e-7,
# ****** Still looking for what trigger this problem
# """

As the comment says, it won't affect performance, so feel free to ignore it.
Please reopen this issue if you find a solution. Thx
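For what it's worth, the reentrancy check amounts to running backward twice with identical inputs and grad_output and requiring bitwise-equal gradients. A hedged sketch of the idea (not gradcheck's actual implementation; it assumes every input receives a gradient):

import torch

def is_backward_reentrant(fn, *inputs):
    # Run backward twice from the same inputs and grad_output;
    # a non-reentrant op gives slightly different grads each time.
    grads = []
    for _ in range(2):
        ins = [x.detach().clone().requires_grad_(True) for x in inputs]
        out = fn(*ins)
        out.backward(torch.ones_like(out))
        grads.append([x.grad.clone() for x in ins])
    return all(torch.equal(a, b) for a, b in zip(*grads))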

@GreenTeaHua

thx

@JiancongWang

JiancongWang commented Jun 10, 2019

I checked the code. Numerically it should be correct: I verified that the error is very small (~1e-16) using CUDA 9.0 on a GTX 1080. The error is raised when calculating the gradient for the input. The code uses atomicAdd for that, since one cannot know in advance how many times each pixel is used. The problem is that the order of the atomicAdd calls is nondeterministic, and different addition orders produce results that differ by up to floating-point rounding error. The gradients for the bias/weight/offset do not use atomicAdd and thus have no such issue.
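To see why, recall that floating-point addition is not associative, so an unordered reduction (which is what concurrent atomicAdd amounts to) can return slightly different values from run to run. A small Python demonstration of the effect (not the CUDA kernel itself):

import random

# Summing the same numbers in two different orders gives floats that
# differ by rounding error; concurrent atomicAdd fixes no order, so the
# input gradient can differ between otherwise identical backward passes.
vals = [random.uniform(-1, 1) * 10 ** random.randint(-8, 8)
        for _ in range(10000)]
shuffled = list(vals)
random.shuffle(shuffled)
print(sum(vals) - sum(shuffled))  # typically small but nonzero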

@JiancongWang

That's only one possible cause of the issue, though. I will double-check and see if there is any other problem; for now I don't see any.

@heartInsert

You mean I can ignore this exception?
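If the goal is just to keep the test run from aborting, gradcheck can report the result instead of raising; later PyTorch releases also added a nondet_tol argument for exactly this kind of nondeterminism. A sketch (dconv_fn and inputs stand for the hypothetical function and tensors under test):

from torch.autograd import gradcheck

# Option 1: return False on failure instead of raising.
ok = gradcheck(dconv_fn, inputs, eps=1e-3, atol=1e-3, rtol=1e-2,
               raise_exception=False)

# Option 2 (later PyTorch releases): tolerate small nondeterministic
# differences such as those caused by atomicAdd ordering.
ok = gradcheck(dconv_fn, inputs, eps=1e-3, atol=1e-3, rtol=1e-2,
               nondet_tol=1e-7)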
