When I train the model using `python main.py`, it works well.
But when I try to refine the model using `python main.py --refine --lr 1e-5 --reload --previous_dir`,
it reports this error:
```
Traceback (most recent call last):
  File "main.py", line 225, in <module>
    loss = train(opt, actions, train_dataloader, model, optimizer_all, epoch)
  File "main.py", line 23, in train
    return step('train', opt, actions, train_loader, model, optimizer, epoch)
  File "main.py", line 94, in step
    loss.backward()
  File "D:\Anaconda\envs\pose\lib\site-packages\torch\_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "D:\Anaconda\envs\pose\lib\site-packages\torch\autograd\__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 1024]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
I can't find the problem. I have already tried wrapping the training step in `with torch.autograd.set_detect_anomaly(True):`, but I still can't locate the offending operation.
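For context, this error fires whenever a tensor that autograd saved for the backward pass is later modified in place (its version counter no longer matches). Since `ReluBackward0` needs the ReLU *output*, any in-place edit of that output (e.g. `h += ...`, `h.clamp_(...)`, or a `nn.ReLU(inplace=True)` downstream) will trigger it. Below is a minimal, self-contained sketch (hypothetical, not taken from `main.py`) that reproduces the error and shows the out-of-place fix:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def run(inplace_bug: bool) -> bool:
    """Return True if backward succeeds, False if autograd's version check fires."""
    x = torch.randn(4, 8, requires_grad=True)
    h = nn.ReLU()(x * 2)   # ReLU saves its output for the backward pass
    if inplace_bug:
        h += 1             # in-place edit bumps h's version -> backward fails
    else:
        h = h + 1          # out-of-place copy leaves the saved tensor intact
    try:
        h.sum().backward()
        return True
    except RuntimeError:
        return False

print(run(inplace_bug=True))   # the buggy path raises the same RuntimeError
print(run(inplace_bug=False))  # the out-of-place version backpropagates fine
```

With `set_detect_anomaly(True)` enabled, the *second* traceback PyTorch prints points at the forward-pass line that produced the saved tensor, which is usually near the in-place edit. In the refine path specifically, it may be worth checking whether the refinement code mutates intermediate features from the frozen/reloaded model in place.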