Understanding suggested LR #86
Comments
And how should I set my number of iterations? Does it need to be the same as the real number of iterations in my model?
Hi @sivannavis, as you can see, the suggested LR is defined as the point on the LR curve with the steepest negative slope, and on your curve the gradient reaches its most negative value somewhere in the range of [10^-1, 10^0], which is why the suggestion lands there. Since the numerical differential is sensitive to high-frequency variations (even though the loss curve is smoothed), the automatically suggested point is not always the one you would choose by eye. In my opinion, I would pick a LR within the range of [10^-5, 10^-4] according to this figure if I need to do it manually. If you have saved the history dictionary, you can inspect it and pick a value yourself, as sketched below.
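In case it helps, here is a minimal sketch (assuming the `history` dict that `range_test()` fills, with `"lr"` and `"loss"` keys, and NumPy available) of how you could locate the steepest point yourself or restrict the search to a range you trust:

```python
import numpy as np

# lr_finder.history is the dict filled during range_test(); it holds "lr" and "loss" lists.
lrs = np.array(lr_finder.history["lr"])
losses = np.array(lr_finder.history["loss"])

# The automatic suggestion is roughly the index where the loss gradient is most negative.
steepest_idx = np.argmin(np.gradient(losses))
print(f"auto-suggested LR: {lrs[steepest_idx]:.2e}")

# To pick manually instead, restrict the search to a range you trust, e.g. [1e-5, 1e-4]:
mask = (lrs >= 1e-5) & (lrs <= 1e-4)
manual_lr = lrs[mask][np.argmin(np.gradient(losses)[mask])]
print(f"manually picked LR: {manual_lr:.2e}")
```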
In general, it would be sufficient to pick a number that lets the model ingest all batches of data in one run. You can still adjust the number according to the size of the dataset and the batch size available to the model (see the sketch below). By the way, here is the thread we discussed before regarding "how to select a proper point as the suggested LR". You can find that there is currently no standard definition of "which LR is the best to use", because this technique (learning rate finding) should be used as a tool to let you explore and experiment faster, not to determine the best hyper-parameters for the final model. For the latter, I would suggest searching for dedicated hyper-parameter tuning tools. Hope this helps!
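For example, something along these lines (a rough sketch; `train_loader` and `lr_finder` are placeholders for your own objects):

```python
# One common choice: one iteration per batch, i.e. a single pass over the dataset.
num_iter = len(train_loader)  # batches per epoch
lr_finder.range_test(train_loader, end_lr=10, num_iter=num_iter, step_mode="exp")
lr_finder.plot()
```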
Thanks for clarifying and the helpful tips!
Hi, what's the meaning of "the model ingests all batches of data in one run"? Do you mean the size of the dataset?
It means the number of iterations to run (the `num_iter` argument of `range_test()`). Moreover, the model is actually under training while `range_test()` is running.
Therefore, lots of things related to training would also affect the result generated by the LR finder (see the sketch below).
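A rough end-to-end sketch of what this implies in practice (names like `model`, `optimizer`, `criterion`, and `train_loader` are placeholders): since `range_test()` actually updates the weights, fixing the random seed and calling `reset()` afterwards keeps runs comparable.

```python
import torch
from torch_lr_finder import LRFinder

torch.manual_seed(42)  # shuffling, dropout, weight init, etc. all shift the resulting curve

lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(train_loader, end_lr=10, num_iter=len(train_loader))
lr_finder.plot()   # inspect the curve and the suggested LR
lr_finder.reset()  # restore model and optimizer to their pre-test states
```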
Hi! Thanks for the work!
I got my results like this:
![image](https://user-images.githubusercontent.com/50509896/187046414-676cd6f4-258d-413b-866a-373d8ea2f8de.png)
And I'm wondering why the steepest-gradient point is taken as the suggested LR in this case. As the author suggested, maybe in this case the base LR should be 10^-5 and the max LR should be 10^-1?
Could you give any suggestions about whether my understanding is correct?
Thanks!
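For reference, if bounds like these are read off the curve, one common way to use them is as the limits of a cyclical schedule. Below is a rough sketch with PyTorch's `CyclicLR`; the 1e-5 / 1e-1 values and `model` are assumptions taken from the figure discussion, not a recommendation:

```python
import torch

# Optimizer with momentum, since CyclicLR cycles momentum by default.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5, momentum=0.9)

scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer,
    base_lr=1e-5,       # lower bound read off the LR-finder curve
    max_lr=1e-1,        # upper bound, just before the loss starts to diverge
    step_size_up=2000,  # iterations to climb from base_lr to max_lr; tune per dataset
)
```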