[ENH] Add optional plot for suggested LR #44
Conversation
Tweaked from fastai [`c815325`](fastai/fastai@c815325)
Sorry, I forgot to check my code by …
@chAwater Thanks for your contribution! It's a nice feature, but it carries a risk of misleading users who are new to deep learning. Let me explain. The concept behind learning rate suggestion is a good attempt at automating deep learning pipelines. However, choosing the learning rate with the minimal gradient is not the only acceptable approach.

In addition, it can be a controversial issue for datasets/models where it is not applicable; see also this case. Overall, it's a decision that should be made by users rather than package developers, so I would prefer not to ship a fixed algorithm for learning rate suggestion. Instead, letting users pick the desired learning rate with their own algorithm might be a better solution, and it can easily be done with the recorded `history`.

I was wondering whether there is a generalized solution for picking a good learning rate. There might be some breakthroughs in the future, but what I currently believe is closer to this comment made by Sylvain Gugger (also the author of this article). Anyway, this is just my opinion; I'd like to hear yours. @chAwater, @davidtvs

Besides, if drawing a suggested learning rate on the graph is necessary, you can try to implement it with the feature proposed in PR#23:

```python
fig, ax = plt.subplots()
lr_finder.plot(ax=ax)
# Draw the point indicating the best LR
# (x_best_lr and y_best_lr are computed by the user's own algorithm)
ax.plot(x_best_lr, y_best_lr, marker='o')
```
@NaleRaphael Thanks for the explanation! It's crystal clear! Sometimes the loss curve and the suggested LR in fastai_v1 are confusing. I didn't think that much and just grabbed the fastai_v1 code into this PR. @davidtvs I'd like to know your opinion. Also, feel free to close this if the feature doesn't make sense or is unnecessary to you. BTW, I'm still very new to deep learning; it's a really long way to go. The suggested LR can be plotted externally with:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
lr_finder.plot(ax=ax)

lrs = lr_finder.history["lr"]
losses = lr_finder.history["loss"]

# You can skip some data points with:
#   lrs = lrs[skip_start:-skip_end]
#   losses = losses[skip_start:-skip_end]
# (this may raise an error when there is not enough data)

# Index of the steepest (most negative) numerical gradient of the loss
mg = (np.gradient(np.array(losses))).argmin()
print(f"Min numerical gradient: {lrs[mg]:.2E}")

# Draw the point indicating the best LR
ax.plot(lrs[mg], losses[mg], markersize=10, marker='o', color='red')
```
Thanks for the PR @chAwater 👍 Initially, I didn't implement any functionality to suggest a learning rate for the reasons that @NaleRaphael posted: there's no single algorithm that will always provide a good suggestion. On the other hand, if it were absolutely clear in the documentation that these are suggestions that are not guaranteed to work for everyone, then I would be okay with it. Another way of reinforcing that these are suggestions is to suggest a range of values instead of a single value. We could then offer a mean value of that range too, but I think this way the user would see that any value in the range could be good and that they should experiment a bit. What do you guys think?
@davidtvs I think the second approach is a good way to go. Giving a suggestion with an acceptable range implicitly indicates that it isn't a rigid suggestion, and it clearly resolves the concern I mentioned above: "it's a decision that should be made by users rather than package developers". BTW, you made a great summary of what I said: "There's no single algorithm that will always provide a good suggestion." As for the mean value and range to be returned, I tend to find a region that excludes everything which is obviously improper to adopt. This region should meet the following conditions: …
And here is the implementation: … I'm not saying this implementation is good; it's a bit complicated and I just take it as an alternative solution. If one of the solutions mentioned in my previous comment seems good and elegant to you, it would be great to adopt it.
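(The implementation linked above was lost from this thread. Purely as an illustration of the range idea, a minimal sketch might look like the following; the name `suggest_lr_range`, the `rel_threshold` parameter, and the geometric-mean pick are all hypothetical, not the actual code referenced here.)

```python
import numpy as np

def suggest_lr_range(lrs, losses, rel_threshold=0.5):
    """Hypothetical sketch: find where the loss is clearly decreasing
    and return that stretch as a suggested LR range.

    A point qualifies if its loss gradient descends at least
    `rel_threshold` times as fast as the steepest point on the curve;
    the range spans the first to the last qualifying point.
    """
    lrs = np.asarray(lrs)
    grads = np.gradient(np.asarray(losses, dtype=float))
    steepest = grads.min()  # most negative gradient (fastest descent)
    if steepest >= 0:
        raise RuntimeError("Loss never decreases; no suggestion possible")
    idx = np.flatnonzero(grads <= steepest * rel_threshold)
    lo, hi = lrs[idx[0]], lrs[idx[-1]]
    # Geometric mean is a natural single pick since LRs live on a log scale
    return lo, hi, np.sqrt(lo * hi)
```

Returning a (low, high) pair instead of one number makes the "this is only a suggestion" nature explicit, which is exactly the point being discussed here.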
@davidtvs I think we can create a series of functions where each function suggests a learning rate. For now (in my code), it's like … To be honest, at first I just wanted to check whether the … @NaleRaphael Wow, your code is so geeky! It is a bit complicated (it took me a while to understand), but the algorithm is clear, smart, and efficient! As I said, I'm a beginner in deep learning, so I'm not sure your suggested …
That's a good point @chAwater; we can take this as a starting point and then add more algorithms that provide different suggestions, like the ones @NaleRaphael mentioned above, and even explore the range idea (nice work implementing it, @NaleRaphael). So I'll give this PR a review and move forward with it.
A unit test for this would also be nice
@davidtvs Yeah, it seems nice to provide suggestions via different algorithms. 😃 @chAwater You're right, applying a sliding window before running this algorithm can make it more robust. As for verifying this algorithm, it's truly difficult to make sure it works well in various cases. An obvious, simple case that makes this algorithm fail to work properly is running it on a model that has already been trained for several epochs. You can check out this notebook; note that it uses a fixed random seed. You can change the seed to get different results, but there will be no obvious difference. The cause is easy to figure out: if a model can be trained properly, the training loss becomes smaller after each epoch. In this situation, the effect of changing the learning rate is hard to observe. For a clearer case, here are some records from a past experiment of mine, training a BiLSTM and trying to adjust the learning rate with … Therefore, if I need to train a model from scratch, I usually use … And that's why I said it's difficult to make this algorithm work well in various cases. But as you said, we can still make it more robust. 😄
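(As a concrete illustration of the sliding-window idea, here is a sketch assuming a simple moving average is used for smoothing; the function name and the window size of 5 are arbitrary choices, not code from this PR.)

```python
import numpy as np

def smoothed_steepest_lr(lrs, losses, window=5):
    """Smooth the loss curve with a moving average before taking the
    steepest-gradient suggestion, to reduce sensitivity to noise."""
    losses = np.asarray(losses, dtype=float)
    kernel = np.ones(window) / window
    # 'valid' avoids edge artifacts but shortens the curve,
    # so align the LR axis with the smoothed losses
    smoothed = np.convolve(losses, kernel, mode="valid")
    offset = (window - 1) // 2
    aligned_lrs = np.asarray(lrs)[offset:offset + len(smoothed)]
    best = np.gradient(smoothed).argmin()
    return aligned_lrs[best]
```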
@davidtvs Thanks for your review. I think I'm just gonna provide the steepest gradient algorithm for now.
Yes, that's okay @chAwater. That's what I was expecting, actually; sorry if I wasn't clear before. This PR is a first step in bringing learning rate suggestions, and the scope is limited to the steepest gradient algorithm.
Yeah, it's my pleasure to implement it. Just feel free to let me know if you need any further help.
Rename `suggestion` -> `suggest_lr`, replace `while` by `if` and replace f-strings by `format`
If the changes to the return value of the plot function are made, then you can also add a test that checks the value of the suggested learning rate. Example:
```python
# mod_task and prepare_lr_finder are fixtures from this repo's test suite
def test_suggest_lr():
    task = mod_task.XORTask()
    lr_finder = prepare_lr_finder(task)
    # Fake history: the steepest descent of [10, 8, 4, 1, 4, 16] is at
    # index 2, so the suggested LR should be lrs[2] == 2
    lr_finder.history["loss"] = [10, 8, 4, 1, 4, 16]
    lr_finder.history["lr"] = range(len(lr_finder.history["loss"]))

    fig, ax = plt.subplots()
    ax, lr = lr_finder.plot(skip_start=0, skip_end=0, suggest_lr=True, ax=ax)
    assert lr == 2
```
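(For reference, a sketch of what the `plot` change under discussion might look like; this is simplified and not the merged code, which also handles log scaling, `show_lr`, and so on.)

```python
import numpy as np
import matplotlib.pyplot as plt

def plot(self, skip_start=10, skip_end=5, ax=None, suggest_lr=False):
    """Simplified sketch of the suggest_lr option being discussed."""
    end = -skip_end if skip_end > 0 else None
    lrs = self.history["lr"][skip_start:end]
    losses = self.history["loss"][skip_start:end]
    if ax is None:
        _, ax = plt.subplots()
    ax.plot(lrs, losses)
    if suggest_lr:
        # Steepest descent of the loss curve is the suggested LR
        mg = np.gradient(np.array(losses)).argmin()
        ax.plot(lrs[mg], losses[mg], markersize=10, marker="o", color="red")
        return ax, lrs[mg]
    return ax
```

With this sketch the test above passes: the gradient of `[10, 8, 4, 1, 4, 16]` is most negative at index 2, so the returned LR is 2.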
/black-check
No linting violations have been found in this PR.
@davidtvs Thanks for your review. I made some changes to clarify the code, but I don't think a … Maybe, in the future, we can add an attribute (such as …).
Any new thoughts or suggestions?
@chAwater sorry for the lack of feedback, I ended up getting busy with other stuff. I agree that it's weird that the …

I would prefer option 1: it's cleaner and there's a clear separation of concerns between …
@davidtvs I partially agree with you. It will be clearer if we separate … But: …

Therefore, I think I'll take option 2 in this PR as a temporary solution.
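(The two options were lost from this thread, but from the "separation of concerns" remark, option 1 appears to be a suggestion API kept fully separate from plotting. A hypothetical sketch of that approach, not code from this PR:)

```python
import numpy as np

def suggest_lr(self, skip_start=10, skip_end=5):
    """Hypothetical standalone method: return a suggested LR without
    touching any plotting code (the 'separation of concerns' idea)."""
    end = -skip_end if skip_end > 0 else None
    lrs = self.history["lr"][skip_start:end]
    losses = self.history["loss"][skip_start:end]
    return lrs[np.gradient(np.array(losses)).argmin()]
```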
LGTM
/flake8-lint
Lintly has detected code quality issues in this pull request.
Remove extra spaces
black style
/flake8-lint
No linting violations have been found in this PR.
Merged. Thanks for contributing @chAwater!
Thank you for being so nice and patient! It's my pleasure.
Plot suggested LR

Now you can plot the suggested LR with:

```python
lr_finder.plot(suggestion=True)
```

Tweaked from fastai commit c815325 and the fastai docs. Thanks for this great repo!
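(For context, end-to-end usage follows the repository's usual `LRFinder` workflow. A sketch, assuming `model`, `optimizer`, `criterion`, and `trainloader` are already defined, and noting that the flag was renamed to `suggest_lr` during review:)

```python
from torch_lr_finder import LRFinder

# model, optimizer, criterion, and trainloader are assumed to be defined
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, end_lr=100, num_iter=100)
lr_finder.plot(suggest_lr=True)  # final flag name after the rename in this PR
lr_finder.reset()  # restore the model and optimizer to their initial state
```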