
[ENH] Add optional plot for suggested LR #44

Merged (9 commits into davidtvs:master on Jul 8, 2020)

Conversation

@chAwater (Contributor) commented Jun 3, 2020

Plot suggested LR

Now you can plot the suggested LR with lr_finder.plot(suggestion=True).
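For example, a minimal usage sketch (model, optimizer, criterion, and train_loader are placeholders for your own setup):

from torch_lr_finder import LRFinder

# Minimal usage sketch; model/optimizer/criterion/train_loader are placeholders
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(train_loader, end_lr=10, num_iter=100)
lr_finder.plot(suggestion=True)  # marks the steepest-gradient LR on the curve
lr_finder.reset()  # restore model and optimizer to their initial state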

Tweaked from fastai commit c815325 and the fastai docs.


Thanks for this great repo!

@chAwater (Contributor, author) commented Jun 3, 2020

Sorry, I forgot to check my code with flake8:

./torch_lr_finder/lr_finder.py:469:17: W291 trailing whitespace
./torch_lr_finder/lr_finder.py:471:13: E722 do not use bare 'except'
./torch_lr_finder/lr_finder.py:113:89: E501 line too long (90 > 88 characters)
./torch_lr_finder/lr_finder.py:207:89: E501 line too long (91 > 88 characters)
./torch_lr_finder/lr_finder.py:215:89: E501 line too long (116 > 88 characters)
./torch_lr_finder/lr_finder.py:220:89: E501 line too long (96 > 88 characters)
./torch_lr_finder/lr_finder.py:221:89: E501 line too long (102 > 88 characters)
./torch_lr_finder/lr_finder.py:223:89: E501 line too long (116 > 88 characters)
./torch_lr_finder/lr_finder.py:225:89: E501 line too long (109 > 88 characters)
./torch_lr_finder/lr_finder.py:226:89: E501 line too long (92 > 88 characters)
./torch_lr_finder/lr_finder.py:230:89: E501 line too long (113 > 88 characters)
./torch_lr_finder/lr_finder.py:232:89: E501 line too long (116 > 88 characters)
./torch_lr_finder/lr_finder.py:233:89: E501 line too long (90 > 88 characters)
./torch_lr_finder/lr_finder.py:236:89: E501 line too long (107 > 88 characters)
./torch_lr_finder/lr_finder.py:418:89: E501 line too long (100 > 88 characters)
./torch_lr_finder/lr_finder.py:434:89: E501 line too long (91 > 88 characters)
./torch_lr_finder/lr_finder.py:472:89: E501 line too long (91 > 88 characters)

@NaleRaphael (Contributor) commented Jun 3, 2020

@chAwater Thanks for your contribution!

It's a nice feature, but it risks misleading users who are new to deep learning. Let me explain.

The concept behind learning rate suggestion is actually a good attempt at automating deep learning pipelines. However, choosing the learning rate at the point of minimal gradient is not the only acceptable approach.

  • In fastai_v2, they provide 2 suggestions: lr_min/10. and lr_steep (minimal gradient); see the sketch after this list.

  • In pytorch-lightning, they have an implementation similar to the one in fastai_v1.

  • In this thread, the author proposed a method that uses a sliding window (called an "interval slide rule" in that post) with a given loss_threshold to find all intervals with flat loss, and chooses the minimal one (left endpoint) as a candidate while iterating over the loss curve.

  • In the same thread, this comment stated: "picking a learning rate about midway or 2/3rd of the way in the section of the loss curve is going down worked best for us".
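For illustration, the two fastai_v2-style values could be computed from LRFinder's history roughly like this (a sketch, not fastai's actual code):

import numpy as np

# Sketch of the two fastai_v2-style suggestions listed above
# (illustrative only; lr_finder is an LRFinder whose range_test has run)
losses = np.array(lr_finder.history["loss"])
lrs = np.array(lr_finder.history["lr"])

lr_min_over_10 = lrs[losses.argmin()] / 10.0  # lr_min/10: a tenth of the LR at minimal loss
lr_steep = lrs[np.gradient(losses).argmin()]  # lr_steep: LR at the steepest descent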

In addition, it can be controversial for datasets/models where this approach is not applicable. See also this case.

Overall, it's more a decision that should be made by users than by package developers. So I would prefer that we not ship a fixed algorithm for learning rate suggestion. Instead, letting users choose the desired learning rate with their own algorithm might be a better solution, and that can already be done easily with the history returned by LRFinder.

I was wondering whether there is a generalized solution for picking a good learning rate. There might be breakthroughs in the future, but what I currently believe is closer to this comment by Sylvain Gugger (also the author of this article).

Anyway, this is just my opinion; I'd like to hear yours. @chAwater, @davidtvs

Besides, if drawing a suggested learning rate on the graph is necessary, you can implement it with the feature proposed in PR #23:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
lr_finder.plot(ax=ax)

# Draw the point indicating the best lr
# (x_best_lr and y_best_lr are placeholders for the chosen point)
ax.plot(x_best_lr, y_best_lr, marker='o')

@chAwater (Contributor, author) commented Jun 3, 2020

@NaleRaphael Thanks for the explanation! It's crystal clear!

Sometimes the loss curve and suggested LR in fastai_v1 are confusing. I didn't think that much and just grabbed the fastai_v1 code into this PR.

@davidtvs I'd like to know your opinion. Also, feel free to close this if the feature doesn't make sense or is unnecessary to you.

BTW, I'm still very new to deep learning. It's a really long way to go.


The suggested LR can be plotted externally with:

import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
lr_finder.plot(ax=ax)

lrs    = lr_finder.history["lr"]
losses = lr_finder.history["loss"]
### You can skip some data points with:
# lrs = lrs[skip_start:-skip_end]
# losses = losses[skip_start:-skip_end]

# This may raise an error when there are not enough data points
mg = (np.gradient(np.array(losses))).argmin()

print(f"Min numerical gradient: {lrs[mg]:.2E}")

# Draw the point indicating the best lr
ax.plot(lrs[mg], losses[mg], markersize=10, marker='o', color='red')

@davidtvs (Owner) commented Jun 5, 2020

Thanks for the PR @chAwater 👍

Initially, I didn't implement any functionality to suggest a learning rate for the reasons that @NaleRaphael posted. There's no single algorithm that will always provide a good suggestion.

On the other hand, I think that if the documentation made it absolutely clear that these are suggestions that are not guaranteed to work for everyone, then I would be okay with it.

Another way of reinforcing that these are suggestions is to suggest a range of values instead of a single value. We can then offer a mean value of that range too, but I think this way the user would see that any value in the range could be good and that they should experiment a bit. What do you guys think?
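For instance, given such a range, a middle value could be reported alongside it (a sketch; the endpoints are placeholders, and using the geometric mean is my assumption, since learning rates are swept on a log scale):

import math

# Sketch: report a suggested range plus a representative middle value.
# lr_lo/lr_hi are placeholder endpoints; the geometric mean is my choice
# of "mean" because LR sweeps are logarithmically spaced.
lr_lo, lr_hi = 1e-4, 1e-2
lr_mid = math.sqrt(lr_lo * lr_hi)  # = 1e-3
print(f"suggested range: [{lr_lo:g}, {lr_hi:g}], midpoint: {lr_mid:g}")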

@NaleRaphael (Contributor)

@davidtvs I think the second approach is a good way to go.

Giving a suggestion as an acceptable range implicitly indicates that it isn't a rigid suggestion. And it clearly resolves the concern I mentioned above: "it's more a decision that should be made by users than by package developers". BTW, you made a great summary of what I said: "There's no single algorithm that will always provide a good suggestion."

As for the mean value and the range to be returned, I tend to find a region that excludes everything that is obviously improper to adopt. This region should meet the following conditions (sketched in code at the end of this comment):

  1. there are at least 2 points in this region
  2. the gradient at every point in it should be negative
  3. the point with minimal gradient (the steepest one) of the whole curve should be located in this region

And here is the implementation:
https://colab.research.google.com/drive/1ataA0U5zxrxPiZohZHw3ELhhFm6Wx4Hv?usp=sharing

But I'm not saying this implementation is good; it's a bit complicated, and I just offer it as an alternative solution. If one of the solutions mentioned in my previous comment seems good and elegant to you, it would be great to adopt it.
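In spirit, those three conditions boil down to something like this simplified sketch (my paraphrase, not the notebook's actual code; lrs and losses would come from lr_finder.history):

import numpy as np

# Simplified sketch of the three conditions above: grow a window of
# negative-gradient points around the steepest point of the loss curve.
def suggest_lr_region(lrs, losses):
    grads = np.gradient(np.array(losses))
    steepest = int(grads.argmin())  # condition 3: region anchors here
    if grads[steepest] >= 0:
        return None  # curve never descends
    left, right = steepest, steepest
    while left > 0 and grads[left - 1] < 0:  # condition 2: negative gradients
        left -= 1
    while right < len(grads) - 1 and grads[right + 1] < 0:
        right += 1
    if right - left + 1 < 2:  # condition 1: at least 2 points
        return None
    return lrs[left], lrs[right]  # endpoints of the candidate LR range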

@chAwater (Contributor, author) commented Jun 6, 2020

@davidtvs I think we can create a series of functions, each of which suggests a learning rate. For now (in my code), it's like suggestion="steepest", or some better name...

To be honest, at first I just wanted to check whether the steepest point is located in a commonly used LR interval (such as [1e-4, 1e-2]). If it is, then I would choose a commonly used LR near the steepest point (such as the magic 3e-4, or 1e-3, 3e-3, etc.).

@NaleRaphael Wow, your code is so geeky! It is a bit complicated (it took me a while to understand), but the algorithm is clear, smart, and efficient!

As I said, I'm a beginner in deep learning, so I'm not sure whether your suggested interval works well for complicated models or pre-trained models. Maybe a sliding window would help produce a more robust interval?

@davidtvs (Owner) commented Jun 6, 2020

That's a good point @chAwater. We can take this as a starting point and then add more algorithms that provide different suggestions, like the ones @NaleRaphael mentioned above, and even explore the range idea (nice work implementing it, @NaleRaphael).

So I'll give this PR a review and move forward with it.

@davidtvs (Owner) left a comment

A unit test for this would also be nice.

@NaleRaphael (Contributor)

@davidtvs Yeah, it seems nice to provide suggestions by different algorithms. 😃

@chAwater You're right, applying a sliding window before running this algorithm can make it more robust.
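For instance, the losses could be smoothed with a simple moving average before computing gradients (a sketch of the sliding-window idea, not code from this PR):

import numpy as np

# Sketch: smooth the loss curve with a moving average so that single
# noisy points don't dominate the gradient-based suggestion.
def smooth(losses, window=5):
    kernel = np.ones(window) / window
    return np.convolve(np.array(losses), kernel, mode="valid")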

As for verifying this algorithm, it's truly difficult to make sure it works well in various cases. An obvious case in which this algorithm fails to work properly is running it on a model that has already been trained for several epochs. You can check out this notebook; note that it uses a fixed random seed. You can change the seed to get different results, but there will be no obvious difference.

The cause is easy to figure out: if a model can be trained properly, the training loss becomes smaller after each epoch. In that situation, the effect of changing the learning rate is hard to observe.

For a clearer case, here are some records from a past experiment of mine, training a BiLSTM and adjusting the learning rate with LRFinder every 5 epochs. You can see that the lr-loss curve is almost a flat line at epoch 5 on the same scale.

  • History of LRFinder at the beginning
  • History of LRFinder at epoch 5

Therefore, if I need to train a model from scratch, I usually use LRFinder to pick a proper learning rate at the beginning of the training task, then run the task with another learning rate scheduler (e.g. CosineAnnealingLR, OneCycleLR). But if I need to fine-tune a model, I run LRFinder and pick a smaller learning rate.
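A sketch of that handoff (optimizer and train_loader are placeholders; picked_lr stands for whatever value the curve suggests):

import torch

# After inspecting the LRFinder curve, start the real run at the picked
# rate and let a scheduler (e.g. OneCycleLR) vary it during training.
picked_lr = 3e-4  # placeholder: the LR read off the curve
for group in optimizer.param_groups:
    group["lr"] = picked_lr
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=picked_lr, steps_per_epoch=len(train_loader), epochs=10
)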

And that's why I said it's difficult to make this algorithm work well in various cases. But as you said, we can still make it more robust. 😄

@chAwater chAwater marked this pull request as draft June 8, 2020 14:55
@chAwater (Contributor, author) commented Jun 8, 2020

@davidtvs Thanks for your review. I think I'm just going to provide the steepest-gradient suggestion in this PR as a starting point. Maybe @NaleRaphael or others can provide different algorithms/implementations in a new PR. Is that OK?

@davidtvs (Owner) commented Jun 8, 2020

Yes, that's okay @chAwater. That's actually what I was expecting; sorry if I wasn't clear before. This PR is a first step toward bringing learning rate suggestions, and its scope is limited to the steepest-gradient algorithm.

@NaleRaphael (Contributor)

Yeah, it's my pleasure to implement it. Just feel free to let me know if you need any further help.

Rename `suggestion` -> `suggest_lr`, replace `while` by `if` and replace f-strings by `format`
@chAwater chAwater marked this pull request as ready for review June 11, 2020 11:10
@chAwater chAwater requested a review from davidtvs June 11, 2020 11:37
@davidtvs (Owner) left a comment

If the changes to the return value of the plot function are made, then you can also add a test that checks the value of the suggested learning rate. Example:

def test_suggest_lr():
    # mod_task and prepare_lr_finder are existing helpers in tests/test_lr_finder.py
    task = mod_task.XORTask()
    lr_finder = prepare_lr_finder(task)

    # Craft a history whose steepest descent sits at index 2
    lr_finder.history["loss"] = [10, 8, 4, 1, 4, 16]
    lr_finder.history["lr"] = range(len(lr_finder.history["loss"]))

    fig, ax = plt.subplots()
    ax, lr = lr_finder.plot(skip_start=0, skip_end=0, suggest_lr=True, ax=ax)

    # np.gradient of that loss list is most negative at index 2, and the
    # "lr" history is just the indices, so the suggested LR is 2
    assert lr == 2

@davidtvs (Owner)

/black-check

@github-actions (bot) left a comment

No linting violations have been found in this PR.

@chAwater (Contributor, author)
@davidtvs Thanks for your review. I made some changes to clarify the code, but I don't think a plot function should return an LR (although a suggest_lr function should).

Maybe, in the future, we can add an attribute (such as best_lr/lr_candidate) to the LRFinder object and use it to store the suggested LR from the different suggest_lr algorithms.

@chAwater chAwater requested a review from davidtvs June 14, 2020 08:09
@chAwater (Contributor, author)
Any new thoughts or suggestions?

@davidtvs (Owner) commented Jun 27, 2020

@chAwater sorry for the lack of feedback, I ended up getting busy with other stuff.

I agree that it's weird for the plot function to return a learning rate, but it's also weird that it computes said learning rate internally to begin with. I see two ways to go about this:

  1. A new function that both computes and returns the suggested learning rate. The user can then pass that value to plot via the show_lr argument if they want to draw it.
  2. plot computes and returns the suggested learning rate.

I would prefer option 1: it's cleaner, and there's a clear separation of concerns between plot and the new function that computes the suggested learning rate. Option 2 would be okay for me as a temporary solution that would eventually become option 1.
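A rough sketch of what option 1 could look like (a hypothetical function, not the API merged in this PR; the skip defaults mirror plot's):

import numpy as np

# Hypothetical sketch of option 1: a standalone suggestion function,
# separate from plotting (not the API merged in this PR).
def suggest_lr(lr_finder, skip_start=10, skip_end=5):
    lrs = lr_finder.history["lr"][skip_start:-skip_end]
    losses = lr_finder.history["loss"][skip_start:-skip_end]
    idx = int(np.gradient(np.array(losses)).argmin())  # steepest descent point
    return lrs[idx]

# Usage: compute the suggestion, then draw it with the existing show_lr arg.
# lr = suggest_lr(lr_finder)
# lr_finder.plot(show_lr=lr)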

@chAwater (Contributor, author)
@davidtvs I partially agree with you. It would be clearer if we separated suggest_lr from plot.

But:

  • Some cases (such as suggesting a range of LRs) are not suitable for show_lr.
  • If we add more suggest_lr algorithms, we will need a better design (function or class; I don't have much experience with Python OOP).

Therefore, I think I'll take option 2 in this PR as a temporary solution.
Maybe we can wait for some feedback and add new things in the future.

chAwater added a commit to chAwater/pytorch-lr-finder that referenced this pull request Jun 30, 2020
@davidtvs (Owner) left a comment

LGTM

@davidtvs (Owner) commented Jul 6, 2020

/flake8-lint

@github-actions (bot) left a comment

Lintly has detected code quality issues in this pull request.

@chAwater (Contributor, author) left a comment

Remove extra spaces

@chAwater (Contributor, author) left a comment

black style

@davidtvs (Owner) commented Jul 8, 2020

/flake8-lint

@github-actions (bot) left a comment

No linting violations have been found in this PR.

@davidtvs davidtvs merged commit c476676 into davidtvs:master Jul 8, 2020
@davidtvs (Owner) commented Jul 8, 2020

Merged. Thanks for contributing @chAwater!

@chAwater (Contributor, author) commented Jul 9, 2020

Thank you for being so nice and patient! It's my pleasure.
