About LPF-SGD implementation #1
I think you might be looking at the wrong file. This was from another optimizer. EDIT: I had a naming error. The code you are looking for starts here: `LPF-SGD/codes/resnets_nodataaug/lpf_train.py`, lines 57 to 80 (commit 9a35c5f).
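For context, that region implements the Monte Carlo approximation of the Gaussian-smoothed loss that LPF-SGD is built on: sample noise, perturb the weights, accumulate the gradient, restore the weights, and average over several samples. Below is a minimal sketch of that idea, not the exact repository code; `model`, `criterion`, `sigma`, and `M` are placeholder names:

```python
import torch

def lpf_sgd_step(model, criterion, inputs, targets, optimizer, sigma=1e-3, M=8):
    """Sketch of one LPF-SGD step: average the gradients of the loss taken
    at M Gaussian perturbations of the weights (Monte Carlo smoothing)."""
    optimizer.zero_grad()
    for _ in range(M):
        noise = []
        with torch.no_grad():
            for p in model.parameters():
                # Noise std scales with the parameter norm, so perturbations
                # stay proportional to the weight magnitude.
                eps = torch.randn_like(p) * sigma * (p.norm() + 1e-16)
                noise.append(eps)
                p.add_(eps)
        # Gradient at the perturbed weights; dividing by M makes the
        # accumulated .grad the average over the M samples.
        loss = criterion(model(inputs), targets) / M
        loss.backward()
        with torch.no_grad():
            for p, eps in zip(model.parameters(), noise):
                p.sub_(eps)  # restore the unperturbed weights
    optimizer.step()
```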
Thank you very much for your reply!
Dear Devansh, this is great work for me, and LPF-SGD significantly improves performance compared with vanilla momentum SGD. I would like to ask whether cosine learning rate decay can also work for LPF-SGD on WRN and ResNet, since the learning rate decay I find most often is … Thank you very much! Best,
I did not try the cosine learning rate decay. I can't think of any reason it shouldn't work; it might just require fine-tuning to find the best hyperparameters.
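For what it's worth, swapping in cosine decay is a one-line change in a standard PyTorch loop. A minimal sketch, assuming the scheduler is stepped once per epoch; the model, optimizer, and hyperparameters here are placeholders:

```python
import torch

model = torch.nn.Linear(10, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
epochs = 200
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... run one LPF-SGD training epoch here ...
    scheduler.step()  # anneal the learning rate along a cosine curve
```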
Dear Devansh, … Best,
Thanks so much for releasing the code. I have several questions about the implementation of LPF-SGD.

1. Regarding `noise.append(-init_mp - temp)`: what is the reason for using `init_mp` to obtain the noise?
2. Regarding `mp.grad.add_((-(n**2 + 1) / mp.view(-1).norm().item()) * batch_loss.item())`: why do we still need to add this value to the gradient?
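For contrast with the lines quoted above (which, per the reply, come from a different optimizer's file), the noise used on the LPF-SGD path is scaled by per-filter weight norms rather than derived from initial parameter values. A hedged sketch of that anisotropic scheme; the exact broadcasting and the `1e-16` guard are assumptions, not the repository's code:

```python
import torch

def filter_scaled_noise(p, sigma):
    """Gaussian noise whose std is proportional to the norm of each output
    filter of p; 1-D parameters (biases, BN) get a single global scale."""
    if p.dim() > 1:
        # One norm per output filter, reshaped to broadcast over p.
        scale = p.flatten(1).norm(dim=1).view(-1, *([1] * (p.dim() - 1)))
    else:
        scale = p.norm() + 1e-16
    return torch.randn_like(p) * sigma * scale
```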