When I train the model using `python main.py`, it works well.
But when I try to refine the model using `python main.py --refine --lr 1e-5 --reload --previous_dir`,
it reports this error:
```
Traceback (most recent call last):
  File "main.py", line 225, in <module>
    loss = train(opt, actions, train_dataloader, model, optimizer_all, epoch)
  File "main.py", line 23, in train
    return step('train', opt, actions, train_loader, model, optimizer, epoch)
  File "main.py", line 94, in step
    loss.backward()
  File "D:\Anaconda\envs\pose\lib\site-packages\torch\_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "D:\Anaconda\envs\pose\lib\site-packages\torch\autograd\__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 1024]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
I can't find the problem. I have already tried wrapping the training step in `with torch.autograd.set_detect_anomaly(True):`, but I still can't locate the offending operation.
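For context, this error fires whenever a tensor that autograd saved for the backward pass is later modified in place (its version counter no longer matches). Since `ReluBackward0` needs the ReLU *output*, any in-place edit of that output (e.g. `h += ...`, `h.clamp_(...)`, or a `nn.ReLU(inplace=True)` downstream) will trigger it. Below is a minimal, self-contained sketch (hypothetical, not taken from `main.py`) that reproduces the error and shows the out-of-place fix:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def run(inplace_bug: bool) -> bool:
    """Return True if backward succeeds, False if autograd's version check fires."""
    x = torch.randn(4, 8, requires_grad=True)
    h = nn.ReLU()(x * 2)   # ReLU saves its output for the backward pass
    if inplace_bug:
        h += 1             # in-place edit bumps h's version -> backward fails
    else:
        h = h + 1          # out-of-place copy leaves the saved tensor intact
    try:
        h.sum().backward()
        return True
    except RuntimeError:
        return False

print(run(inplace_bug=True))   # the buggy path raises the same RuntimeError
print(run(inplace_bug=False))  # the out-of-place version backpropagates fine
```

With `set_detect_anomaly(True)` enabled, the *second* traceback PyTorch prints points at the forward-pass line that produced the saved tensor, which is usually near the in-place edit. In the refine path specifically, it may be worth checking whether the refinement code mutates intermediate features from the frozen/reloaded model in place.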