Understanding suggested LR #86

Closed
sivannavis opened this issue Aug 27, 2022 · 5 comments

@sivannavis

Hi! Thanks for the work!
I got my results like this:
[image: LR range test plot (loss vs. learning rate) with the suggested LR marked]
I'm wondering why the point with the steepest gradient is taken as the suggested LR in this case. As the author suggested, maybe in this case the base LR should be 10^-5 and the max LR should be 10^-1?
Could you let me know whether my understanding is correct?
Thanks!

@sivannavis
Author

And how should I set my number of iterations? Does it need to be the same as the real number of iterations in my model?

@NaleRaphael
Contributor

Hi @sivannavis
Your understanding is correct; a proper learning rate should be selected in the range [10^-5, 10^-1].

As you can see, the suggested LR is defined as the point where the LR-loss curve has its steepest negative slope, and there happens to be a minimum of the gradient within the range [10^-1, 10^0]. Since differentiation is sensitive to high-frequency variations (even though numpy.gradient() uses a second-order central difference), that is why you got this suggestion here.

In my opinion, I would pick an LR within the range [10^-5, 10^-4] according to this figure if I had to do it manually. But if you have saved the dictionary lr_finder.history, you can also process it with your own implementation for LR selection, e.g. something like the sketch below. Or you can pass a greater value of smooth_f to lr_finder.range_test() to apply stronger smoothing.
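
A minimal sketch of that kind of manual post-processing of lr_finder.history (the moving-average window and the [1e-5, 1e-1] restriction below are just illustrative choices, not part of the library):

```python
import numpy as np

# lr_finder.history is a dict with "lr" and "loss" lists recorded by
# range_test(). Everything below is one possible custom selection scheme.
lrs = np.array(lr_finder.history["lr"])
losses = np.array(lr_finder.history["loss"])

# Extra smoothing to suppress high-frequency noise before differentiating
# (simple moving average; the window size is an arbitrary choice).
window = 5
smoothed = np.convolve(losses, np.ones(window) / window, mode="same")

# Only consider the region you trust, e.g. [1e-5, 1e-1].
mask = (lrs >= 1e-5) & (lrs <= 1e-1)

# Steepest descent: the most negative slope of loss w.r.t. log10(lr).
grads = np.gradient(smoothed, np.log10(lrs))
suggested_lr = lrs[mask][np.argmin(grads[mask])]
print(f"Suggested LR: {suggested_lr:.2e}")
```

Passing a larger smooth_f to range_test() (the default is 0.05) has a similar effect, since it smooths the loss as it is being recorded.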

Does it need to be the same as the real number of iterations in my model?

In general, it is sufficient to pick a number that lets the model ingest all batches of data in one run. You can still adjust the number according to the size of the dataset and the batch size your model can afford.
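
As a rough sketch of that rule of thumb (train_loader and lr_finder here stand in for your own DataLoader and LRFinder instance):

```python
# One LR-finder iteration per batch covers the whole dataset exactly once,
# so this is a reasonable default for num_iter.
num_iter = len(train_loader)  # == number of batches per epoch

lr_finder.range_test(train_loader, end_lr=10, num_iter=num_iter)
```

For a huge dataset you can simply cap num_iter at a few hundred; for a tiny one, a small multiple of len(train_loader) may work better.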


By the way, here is the thread we discussed before regarding "how to select a proper point as the suggested LR". You can see that there is currently no standard definition of "which LR is best to use", because this technique (learning rate finding) is meant as a tool to let you explore and experiment faster, not to determine the best hyper-parameters for the final model. For the latter, I would suggest looking into dedicated hyper-parameter tuning tools.

Hope this helps!

@sivannavis
Author

Thanks for clarifying and the helpful tips!

@woaiwojia4816294

In general, it is sufficient to pick a number that lets the model ingest all batches of data in one run.

Hi, what does "the model ingests all batches of data in one run" mean? Do you mean the size of the dataset?
Thanks in advance!

@NaleRaphael
Contributor

NaleRaphael commented Sep 8, 2022

Hi @woaiwojia4816294

It means the number of iterations to run (the num_iter argument passed to range_test()). Since there is no standard way to decide how many iterations to run, it's generally a good idea to make sure the model is able to look over the whole dataset at least once. But it actually depends on the size of the dataset and the complexity of the model. If you have a really huge dataset, it might be sufficient to run LRFinder for a few hundred iterations while each batch of samples is randomly picked. But if the dataset is small, you might need to iterate over the whole dataset a few times.

Moreover, the model is actually being trained while LRFinder.range_test() is running. The differences between this and normal training are:

  • the learning rate is adjusted on every iteration
  • the initial states of the model and optimizer will be restored after range_test() is finished

Therefore, many things related to training will also affect the result generated by LRFinder.range_test().
Hope this helps you understand how LRFinder works and how to choose a proper number of iterations for it.
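
For completeness, a minimal sketch of how a typical range test fits into a training script (model, optimizer, criterion and train_loader are assumed to be defined elsewhere):

```python
from torch_lr_finder import LRFinder

# range_test() trains the model for num_iter iterations, increasing the
# learning rate from start_lr towards end_lr on every iteration.
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(train_loader, start_lr=1e-7, end_lr=10,
                     num_iter=len(train_loader))

# Inspect the loss-vs-LR curve (recent versions also mark the steepest-slope
# suggestion on the plot).
lr_finder.plot()

# Restore the model and optimizer to the states they had before the test,
# so the range test does not leak into the real training run.
lr_finder.reset()
```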
