
About LPF-SGD implementation #1

Closed
lucasliunju opened this issue Feb 27, 2022 · 5 comments
@lucasliunju

Thanks so much for releasing the code. I have several questions about the implementation of LPF-SGD.

  1. In noise.append(- init_mp - temp), what is the reason for using init_mp to obtain the noise?

  2. In mp.grad.add_((-(n**2 + 1) / mp.view(-1).norm().item())*batch_loss.item()), why do we still need to add this value to the gradient?

@devansh20la
Owner

devansh20la commented Feb 28, 2022

I think you might be looking at the wrong file. This was from another optimizer.

EDIT: I had a naming error. The code you are looking for starts here:

# add noise to the parameters (theta + noise)
with torch.no_grad():
    noise = []
    for mp in model.parameters():
        if len(mp.shape) > 1:
            # weight matrices / conv filters: noise std scaled by each output filter's norm
            sh = mp.shape
            sh_mul = np.prod(sh[1:])
            temp = mp.view(sh[0], -1).norm(dim=1, keepdim=True).repeat(1, sh_mul).view(mp.shape)
            temp = torch.normal(0, args.std * temp).to(mp.data.device)
        else:
            # biases / 1-D parameters: noise std scaled by the whole vector's norm
            temp = torch.empty_like(mp, device=mp.data.device)
            temp.normal_(0, args.std * (mp.view(-1).norm().item() + 1e-16))
        noise.append(temp)
        mp.data.add_(noise[-1])

# single sample convolution approximation
with torch.set_grad_enabled(True):
    outputs = model(inputs)
    batch_loss = criterion(outputs, targets) / args.M
    batch_loss.backward()

# going back to theta without the noise
with torch.no_grad():
    for mp, n in zip(model.parameters(), noise):
        mp.data.sub_(n)
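
For additional context, here is a minimal, hedged sketch of how this perturbation step fits into the update for one mini-batch, where the gradient is accumulated over args.M noise draws before the optimizer step. The function names and the simplified per-tensor noise scaling below are assumptions for illustration, not the repository's exact code:

import torch

def add_scaled_noise(model, std):
    # Sample Gaussian noise with std proportional to each parameter's norm,
    # add it to the parameters in place, and return it so it can be removed.
    # (Simplified to per-tensor scaling; the snippet above scales conv
    # filters per output channel.)
    noise = []
    with torch.no_grad():
        for mp in model.parameters():
            scale = std * (mp.view(-1).norm().item() + 1e-16)
            n = torch.randn_like(mp) * scale
            noise.append(n)
            mp.data.add_(n)
    return noise

def lpf_sgd_step(model, criterion, optimizer, inputs, targets, std, M):
    # One LPF-SGD update: average the gradient over M perturbed copies of
    # the parameters, then take a single optimizer step.
    optimizer.zero_grad()
    for _ in range(M):
        noise = add_scaled_noise(model, std)          # theta + noise
        loss = criterion(model(inputs), targets) / M  # divide by M so grads average
        loss.backward()                               # accumulate into .grad
        with torch.no_grad():                         # back to theta
            for mp, n in zip(model.parameters(), noise):
                mp.data.sub_(n)
    optimizer.step()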

@lucasliunju
Author

Thank you very much for your reply!

@lucasliunju
Author

Dear Devansh,

This is great work, and LPF-SGD significantly improves performance compared with vanilla momentum SGD. I would like to ask whether cosine learning rate decay also works for LPF-SGD on WRN and ResNet, since most of the learning rate schedules I see in the code use StepLR.

Thank you very much!

Best,
Lucas

@devansh20la
Owner

I did not try cosine learning rate decay. I can't think of any reason it shouldn't work; it might just require fine-tuning to find the best hyperparameters.
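
For anyone who wants to try it, a minimal sketch of swapping the schedulers in a standard PyTorch training script (args.epochs, the SGD hyperparameters, and the training routine are assumed for illustration, not taken from this repository):

import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

# StepLR schedule: drop the learning rate by 10x every 60 epochs
# scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=0.1)

# Cosine schedule: anneal the learning rate toward 0 over the whole run
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=args.epochs)

for epoch in range(args.epochs):
    # ... run one epoch of LPF-SGD training here ...
    scheduler.step()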

@lucasliunju
Author

Dear Devansh,
Thank you very much for your reply. I have reproduced the results with cosine decay, and they match the results in the paper. Thanks again!

Best,
Lucas
