It seems PyTorch Lightning can't be used? #36

Closed
langdaoliu opened this issue Jul 27, 2021 · 2 comments

langdaoliu commented Jul 27, 2021

I train SAM with PyTorch Lightning. When I use multi-GPU training, I get the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256]], which is output 73 of BroadcastBackward, is at version 4; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
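As the hint in the traceback suggests, anomaly detection can help locate the offending in-place operation. A minimal sketch of how to turn it on (placing it at the top of the training script, before the Trainer is built, is just an assumption about a typical setup):

import torch

# Enable autograd anomaly detection so the backward error includes a
# traceback pointing at the in-place operation that broke the graph.
torch.autograd.set_detect_anomaly(True)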


Alicegaz commented Aug 7, 2021

This error arises when you run two consecutive forward passes and then try to backpropagate through them separately. I was able to solve this in the DDP case by merging training_step and training_step_end like this:

def __init__(self, hparams, data_module):
    super().__init__(hparams, data_module)
    # SAM needs two backward passes per batch, so switch to manual optimization.
    self.automatic_optimization = False

def training_step(self, batch, batch_idx, dataloader_idx=None):
    x, y = batch
    opt = self.optimizers()  # the SAM optimizer

    # First pass: compute the loss and take the SAM ascent step.
    self.enable_bn(self.model)
    out = self(x)
    loss, losses = self.criterion(out, y)
    self.manual_backward(loss, opt)
    opt.first_step(zero_grad=True)

    # Second pass on the same batch with BatchNorm statistics frozen,
    # followed by the actual parameter update.
    self.disable_bn(self.model)
    out_2 = self(x)
    loss_2, losses_2 = self.criterion(out_2, y)
    self.manual_backward(loss_2, opt)
    opt.second_step(zero_grad=True)

    # Keep the progress-bar loss in sync when automatic optimization is off.
    self.trainer.train_loop.running_loss.append(loss)
    return loss
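The enable_bn / disable_bn helpers referenced above are not shown in this thread. A minimal sketch, assuming they follow the common SAM recipe of freezing BatchNorm running statistics during the second pass (torch is assumed to be imported in the module):

def disable_bn(self, model):
    # Assumed helper: stop BatchNorm layers from updating their running
    # statistics a second time on the same batch.
    for module in model.modules():
        if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
            module.backup_momentum = module.momentum
            module.momentum = 0

def enable_bn(self, model):
    # Assumed helper: restore the original BatchNorm momentum before the first pass.
    for module in model.modules():
        if isinstance(module, torch.nn.modules.batchnorm._BatchNorm) and hasattr(module, "backup_momentum"):
            module.momentum = module.backup_momentum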

However, note that in PyTorch Lightning versions below 1.0.7, setting automatic_optimization to False leads to logging bugs. If you see a NaN loss in the progress bar while its actual value is not NaN, upgrading PyTorch Lightning to 1.0.7 and adding self.trainer.train_loop.running_loss.append(loss) to training_step() should solve the problem.


stale bot commented Aug 28, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Aug 28, 2021
stale bot closed this as completed Sep 4, 2021