Understanding suggested LR #86

Closed
sivannavis opened this issue Aug 27, 2022 · 5 comments

@sivannavis

Hi! Thanks for the work!
I got my results like this:
[image: LR range test plot (loss vs. learning rate) with the suggested LR marked]
I'm wondering why the point with the steepest gradient is taken as the suggested LR in this case. As the author suggested, maybe in this case the base LR should be 10^-5 and the max LR should be 10^-1?
Could you let me know whether my understanding is correct?
Thanks!

@sivannavis
Author

And how should I set my number of iterations? Does it need to be the same as the real number of iterations in my model?

@NaleRaphael
Contributor

Hi @sivannavis
Your understanding is correct; a proper learning rate should be selected in the range [10^-5, 10^-1].

As you can see, the suggested LR is defined as the point where the LR-loss curve has its steepest negative slope, and there happens to be a minimum of the gradient within the range [10^-1, 10^0]. Since differentiation is sensitive to high-frequency variations (even though numpy.gradient() uses a second-order central difference), that is why you got this suggestion here.

In my opinion, I would pick an LR within the range [10^-5, 10^-4] according to this figure if I had to do it manually. But if you have saved the dictionary lr_finder.history, you can also process it with your own implementation for LR selection, e.g. something like the sketch below. Or you can pass a greater value of smooth_f to lr_finder.range_test() to apply stronger smoothing.
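
A minimal sketch of that kind of manual post-processing of lr_finder.history (the moving-average window and the [1e-5, 1e-1] restriction below are just illustrative choices, not part of the library):

```python
import numpy as np

# lr_finder.history is a dict with "lr" and "loss" lists recorded by
# range_test(). Everything below is one possible custom selection scheme.
lrs = np.array(lr_finder.history["lr"])
losses = np.array(lr_finder.history["loss"])

# Extra smoothing to suppress high-frequency noise before differentiating
# (simple moving average; the window size is an arbitrary choice).
window = 5
smoothed = np.convolve(losses, np.ones(window) / window, mode="same")

# Only consider the region you trust, e.g. [1e-5, 1e-1].
mask = (lrs >= 1e-5) & (lrs <= 1e-1)

# Steepest descent: the most negative slope of loss w.r.t. log10(lr).
grads = np.gradient(smoothed, np.log10(lrs))
suggested_lr = lrs[mask][np.argmin(grads[mask])]
print(f"Suggested LR: {suggested_lr:.2e}")
```

Passing a larger smooth_f to range_test() (the default is 0.05) has a similar effect, since it smooths the loss as it is being recorded.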

Does it need to be the same as the real number of iterations in my model?

In general, it is sufficient to pick a number that lets the model ingest all batches of data in one run. You can still adjust the number according to the size of the dataset and the batch size your model can afford.
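
As a rough sketch of that rule of thumb (train_loader and lr_finder here stand in for your own DataLoader and LRFinder instance):

```python
# One LR-finder iteration per batch covers the whole dataset exactly once,
# so this is a reasonable default for num_iter.
num_iter = len(train_loader)  # == number of batches per epoch

lr_finder.range_test(train_loader, end_lr=10, num_iter=num_iter)
```

For a huge dataset you can simply cap num_iter at a few hundred; for a tiny one, a small multiple of len(train_loader) may work better.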


By the way, here is the thread we discussed before regarding "how to select a proper point as the suggested LR". You can see that there is currently no standard definition of "which LR is best to use", because this technique (learning rate finding) is meant as a tool to let you explore and experiment faster, not to determine the best hyper-parameters for the final model. For the latter, I would suggest looking into dedicated hyper-parameter tuning tools.

Hope this helps!

@sivannavis
Author

Thanks for clarifying and the helpful tips!

@woaiwojia4816294

In general, it is sufficient to pick a number that lets the model ingest all batches of data in one run.

Hi, what does "the model ingests all batches of data in one run" mean? Do you mean the size of the dataset?
Thanks in advance!

@NaleRaphael
Contributor

NaleRaphael commented Sep 8, 2022

Hi @woaiwojia4816294

It means the number of iterations to run (the num_iter argument passed to range_test()). Since there is no standard way to decide how many iterations to run, it's generally a good idea to make sure the model is able to look over the whole dataset at least once. But it actually depends on the size of the dataset and the complexity of the model. If you have a really huge dataset, it might be sufficient to run LRFinder for a few hundred iterations while each batch of samples is randomly picked. But if the dataset is small, you might need to iterate over the whole dataset a few times.

Moreover, the model is actually being trained while LRFinder.range_test() is running. The differences between this and normal training are:

  • the learning rate is adjusted on every iteration
  • the initial states of the model and optimizer will be restored after range_test() is finished

Therefore, many things related to training will also affect the result generated by LRFinder.range_test().
Hope this helps you understand how LRFinder works and how to choose a proper number of iterations for it.
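
For completeness, a minimal sketch of how a typical range test fits into a training script (model, optimizer, criterion and train_loader are assumed to be defined elsewhere):

```python
from torch_lr_finder import LRFinder

# range_test() trains the model for num_iter iterations, increasing the
# learning rate from start_lr towards end_lr on every iteration.
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(train_loader, start_lr=1e-7, end_lr=10,
                     num_iter=len(train_loader))

# Inspect the loss-vs-LR curve (recent versions also mark the steepest-slope
# suggestion on the plot).
lr_finder.plot()

# Restore the model and optimizer to the states they had before the test,
# so the range test does not leak into the real training run.
lr_finder.reset()
```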
