optimizer step #20
Hello @tojiboyevf, thank you for your interest.

Q: In the `optimizer_step()` function in the `main.py` file you multiply `lr` by `lr_scale` until step 650, but you didn't mention this part in the paper. Do you do the same warm-up for Adam and AdamW?
Q: How do you select the learning rate for SGD, Adam, and AdamW when you increase the batch size? For instance, some authors of self-supervised models select the learning rate with the formula `lr = base_lr * batch_size / 256`, where `base_lr` can be 0.2, 0.3, or other values.
Q: Do you use the same scheduler with the same settings for the Adam and AdamW optimizers?
Q: Did you use any framework to find the best hyperparameters?
Cool! Thanks for your answers!
Dear @amaralibey,
I have some questions. I would be grateful if you could answer them.
Q1: In the `optimizer_step()` function in the `main.py` file you multiply `lr` by `lr_scale` until step 650, but you didn't mention this part in the paper. Do you do the same warm-up for Adam and AdamW? (See the warm-up sketch below.)

Q2: How do you select the learning rate for SGD, Adam, and AdamW when you increase the batch size? For instance, some authors of self-supervised models select the learning rate with the formula `lr = base_lr * batch_size / 256`, where `base_lr` can be 0.2, 0.3, or other values. (See the scaling sketch below.)

Q3: Do you use the same scheduler with the same settings for the Adam and AdamW optimizers?

Q4: Did you use any framework to find the best hyperparameters?

Thanks for your attention!
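For context on Q1, here is a minimal sketch of what a linear learning-rate warm-up over the first 650 steps could look like in plain PyTorch. The linear ramp (`lr_scale = (step + 1) / 650`), the `BASE_LR` value, and the helper name `warmup_optimizer_step` are illustrative assumptions, not the repository's actual code, which only the authors can confirm.

```python
import torch

WARMUP_STEPS = 650   # warm-up length mentioned in the question
BASE_LR = 0.05       # hypothetical base learning rate, not from the paper

model = torch.nn.Linear(10, 2)  # stand-in model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=BASE_LR)

def warmup_optimizer_step(global_step: int, optimizer: torch.optim.Optimizer) -> None:
    """Ramp the learning rate linearly from ~0 to BASE_LR over the first WARMUP_STEPS steps."""
    if global_step < WARMUP_STEPS:
        lr_scale = float(global_step + 1) / WARMUP_STEPS  # grows linearly to 1.0
        for pg in optimizer.param_groups:
            pg["lr"] = lr_scale * BASE_LR
    optimizer.step()
```

A wrapper like this would apply unchanged to `torch.optim.Adam` or `torch.optim.AdamW`, which is presumably why Q1 asks whether the warm-up is shared across optimizers.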
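For Q2, the linear scaling rule quoted in the question is simple enough to write down directly; the sketch below just reuses the `base_lr` values mentioned there (0.2, 0.3) with hypothetical batch sizes.

```python
def scaled_lr(base_lr: float, batch_size: int) -> float:
    """Linear scaling rule cited in the question: lr = base_lr * batch_size / 256."""
    return base_lr * batch_size / 256

# Example: base_lr = 0.3 with a batch size of 1024 gives lr = 1.2
print(scaled_lr(0.3, 1024))  # 1.2
print(scaled_lr(0.2, 512))   # 0.4
```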