
[ENH] Add optional plot for suggested LR #44

Merged (9 commits into davidtvs:master on Jul 8, 2020)

Conversation

@chAwater (Contributor) commented Jun 3, 2020

Plot suggested LR

Now you can plot the suggested LR with lr_finder.plot(suggestion=True).
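For example, a minimal usage sketch (model, optimizer, criterion, and train_loader are placeholders for your own setup):

from torch_lr_finder import LRFinder

# Minimal usage sketch; model/optimizer/criterion/train_loader are placeholders
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(train_loader, end_lr=10, num_iter=100)
lr_finder.plot(suggestion=True)  # marks the steepest-gradient LR on the curve
lr_finder.reset()  # restore model and optimizer to their initial state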

Tweaked from fastai commit c815325 and the fastai docs.


Thanks for this great repo!

@chAwater (Contributor, author) commented Jun 3, 2020

Sorry, I forgot to check my code with flake8:

./torch_lr_finder/lr_finder.py:469:17: W291 trailing whitespace
./torch_lr_finder/lr_finder.py:471:13: E722 do not use bare 'except'
./torch_lr_finder/lr_finder.py:113:89: E501 line too long (90 > 88 characters)
./torch_lr_finder/lr_finder.py:207:89: E501 line too long (91 > 88 characters)
./torch_lr_finder/lr_finder.py:215:89: E501 line too long (116 > 88 characters)
./torch_lr_finder/lr_finder.py:220:89: E501 line too long (96 > 88 characters)
./torch_lr_finder/lr_finder.py:221:89: E501 line too long (102 > 88 characters)
./torch_lr_finder/lr_finder.py:223:89: E501 line too long (116 > 88 characters)
./torch_lr_finder/lr_finder.py:225:89: E501 line too long (109 > 88 characters)
./torch_lr_finder/lr_finder.py:226:89: E501 line too long (92 > 88 characters)
./torch_lr_finder/lr_finder.py:230:89: E501 line too long (113 > 88 characters)
./torch_lr_finder/lr_finder.py:232:89: E501 line too long (116 > 88 characters)
./torch_lr_finder/lr_finder.py:233:89: E501 line too long (90 > 88 characters)
./torch_lr_finder/lr_finder.py:236:89: E501 line too long (107 > 88 characters)
./torch_lr_finder/lr_finder.py:418:89: E501 line too long (100 > 88 characters)
./torch_lr_finder/lr_finder.py:434:89: E501 line too long (91 > 88 characters)
./torch_lr_finder/lr_finder.py:472:89: E501 line too long (91 > 88 characters)

@NaleRaphael (Contributor) commented Jun 3, 2020

@chAwater Thanks for your contribution!

It's a nice feature, but it risks misleading users who are new to deep learning. Let me explain.

The concept behind learning rate suggestion is actually a good attempt at automating deep learning pipelines. However, choosing the learning rate at the point of minimal gradient is not the only acceptable approach.

  • In fastai_v2, they provide 2 suggestions: lr_min/10. and lr_steep (minimal gradient); see the sketch after this list.

  • In pytorch-lightning, they have an implementation similar to the one in fastai_v1.

  • In this thread, the author proposed a method that uses a sliding window (called an "interval slide rule" in that post) with a given loss_threshold to find all intervals with flat loss, and chooses the minimal one (left endpoint) as a candidate while iterating over the loss curve.

  • In the same thread, this comment stated: "picking a learning rate about midway or 2/3rd of the way in the section of the loss curve is going down worked best for us".
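For illustration, the two fastai_v2-style values could be computed from LRFinder's history roughly like this (a sketch, not fastai's actual code):

import numpy as np

# Sketch of the two fastai_v2-style suggestions listed above
# (illustrative only; lr_finder is an LRFinder whose range_test has run)
losses = np.array(lr_finder.history["loss"])
lrs = np.array(lr_finder.history["lr"])

lr_min_over_10 = lrs[losses.argmin()] / 10.0  # lr_min/10: a tenth of the LR at minimal loss
lr_steep = lrs[np.gradient(losses).argmin()]  # lr_steep: LR at the steepest descent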

In addition, it can be controversial for datasets/models where this approach is not applicable. See also this case.

Overall, it's more a decision that should be made by users than by package developers. So I would prefer that we not ship a fixed algorithm for learning rate suggestion. Instead, letting users choose the desired learning rate with their own algorithm might be a better solution, and that can already be done easily with the history returned by LRFinder.

I was wondering whether there is a generalized solution for picking a good learning rate. There might be breakthroughs in the future, but what I currently believe is closer to this comment by Sylvain Gugger (also the author of this article).

Anyway, this is just my opinion; I'd like to hear yours. @chAwater, @davidtvs

Besides, if drawing a suggested learning rate on the graph is necessary, you can implement it with the feature proposed in PR #23:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
lr_finder.plot(ax=ax)

# Draw the point indicating the best lr
# (x_best_lr and y_best_lr are placeholders for the chosen point)
ax.plot(x_best_lr, y_best_lr, marker='o')

@chAwater (Contributor, author) commented Jun 3, 2020

@NaleRaphael Thanks for the explanation! It's crystal clear!

Sometimes the loss curve and suggested LR in fastai_v1 are confusing. I didn't think that much and just grabbed the fastai_v1 code into this PR.

@davidtvs I'd like to know your opinion. Also, feel free to close this if the feature doesn't make sense or is unnecessary to you.

BTW, I'm still very new to deep learning. It's a really long way to go.


The suggested LR can be plotted externally with:

import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
lr_finder.plot(ax=ax)

lrs    = lr_finder.history["lr"]
losses = lr_finder.history["loss"]
### You can skip some data points with:
# lrs = lrs[skip_start:-skip_end]
# losses = losses[skip_start:-skip_end]

# This may raise an error when there are not enough data points
mg = (np.gradient(np.array(losses))).argmin()

print(f"Min numerical gradient: {lrs[mg]:.2E}")

# Draw the point indicating the best lr
ax.plot(lrs[mg], losses[mg], markersize=10, marker='o', color='red')

@davidtvs (Owner) commented Jun 5, 2020

Thanks for the PR @chAwater 👍

Initially, I didn't implement any functionality to suggest a learning rate for the reasons that @NaleRaphael posted. There's no single algorithm that will always provide a good suggestion.

On the other hand, I think that if the documentation made it absolutely clear that these are suggestions that are not guaranteed to work for everyone, then I would be okay with it.

Another way of reinforcing that these are suggestions is to suggest a range of values instead of a single value. We can then offer a mean value of that range too, but I think this way the user would see that any value in the range could be good and that they should experiment a bit. What do you guys think?
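For instance, given such a range, a middle value could be reported alongside it (a sketch; the endpoints are placeholders, and using the geometric mean is my assumption, since learning rates are swept on a log scale):

import math

# Sketch: report a suggested range plus a representative middle value.
# lr_lo/lr_hi are placeholder endpoints; the geometric mean is my choice
# of "mean" because LR sweeps are logarithmically spaced.
lr_lo, lr_hi = 1e-4, 1e-2
lr_mid = math.sqrt(lr_lo * lr_hi)  # = 1e-3
print(f"suggested range: [{lr_lo:g}, {lr_hi:g}], midpoint: {lr_mid:g}")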

@NaleRaphael (Contributor)

@davidtvs I think the second approach is a good way to go.

Giving a suggestion as an acceptable range implicitly indicates that it isn't a rigid suggestion. And it clearly resolves the concern I mentioned above: "it's more a decision that should be made by users than by package developers". BTW, you made a great summary of what I said: "There's no single algorithm that will always provide a good suggestion."

As for the mean value and the range to be returned, I tend to find a region that excludes everything that is obviously improper to adopt. This region should meet the following conditions (sketched in code at the end of this comment):

  1. there are at least 2 points in this region
  2. the gradient at every point in it should be negative
  3. the point with minimal gradient (the steepest one) of the whole curve should be located in this region

And here is the implementation:
https://colab.research.google.com/drive/1ataA0U5zxrxPiZohZHw3ELhhFm6Wx4Hv?usp=sharing

But I'm not saying this implementation is good; it's a bit complicated, and I just offer it as an alternative solution. If one of the solutions mentioned in my previous comment seems good and elegant to you, it would be great to adopt it.
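In spirit, those three conditions boil down to something like this simplified sketch (my paraphrase, not the notebook's actual code; lrs and losses would come from lr_finder.history):

import numpy as np

# Simplified sketch of the three conditions above: grow a window of
# negative-gradient points around the steepest point of the loss curve.
def suggest_lr_region(lrs, losses):
    grads = np.gradient(np.array(losses))
    steepest = int(grads.argmin())  # condition 3: region anchors here
    if grads[steepest] >= 0:
        return None  # curve never descends
    left, right = steepest, steepest
    while left > 0 and grads[left - 1] < 0:  # condition 2: negative gradients
        left -= 1
    while right < len(grads) - 1 and grads[right + 1] < 0:
        right += 1
    if right - left + 1 < 2:  # condition 1: at least 2 points
        return None
    return lrs[left], lrs[right]  # endpoints of the candidate LR range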

@chAwater (Contributor, author) commented Jun 6, 2020

@davidtvs I think we can create a series of functions, each of which suggests a learning rate. For now (in my code), it's like suggestion="steepest", or some better name...

To be honest, at first I just wanted to check whether the steepest point is located in a commonly used LR interval (such as [1e-4, 1e-2]). If it is, then I would choose a commonly used LR near the steepest point (such as the magic 3e-4, or 1e-3, 3e-3, etc.).

@NaleRaphael Wow, your code is so geeky! It is a bit complicated (it took me a while to understand), but the algorithm is clear, smart, and efficient!

As I said, I'm a beginner in deep learning, so I'm not sure whether your suggested interval works well for complicated models or pre-trained models. Maybe a sliding window would help produce a more robust interval?

@davidtvs (Owner) commented Jun 6, 2020

That's a good point @chAwater. We can take this as a starting point and then add more algorithms that provide different suggestions, like the ones @NaleRaphael mentioned above, and even explore the range idea (nice work implementing it, @NaleRaphael).

So I'll give this PR a review and move forward with it.

@davidtvs (Owner) left a comment

A unit test for this would also be nice.

@NaleRaphael (Contributor)

@davidtvs Yeah, it seems nice to provide suggestions by different algorithms. 😃

@chAwater You're right, applying a sliding window before running this algorithm can make it more robust.
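For instance, the losses could be smoothed with a simple moving average before computing gradients (a sketch of the sliding-window idea, not code from this PR):

import numpy as np

# Sketch: smooth the loss curve with a moving average so that single
# noisy points don't dominate the gradient-based suggestion.
def smooth(losses, window=5):
    kernel = np.ones(window) / window
    return np.convolve(np.array(losses), kernel, mode="valid")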

As for verifying this algorithm, it's truly difficult to make sure it works well in various cases. An obvious case in which this algorithm fails to work properly is running it on a model that has already been trained for several epochs. You can check out this notebook; note that it uses a fixed random seed. You can change the seed to get different results, but there will be no obvious difference.

The cause is easy to figure out: if a model can be trained properly, the training loss becomes smaller after each epoch. In that situation, the effect of changing the learning rate is hard to observe.

For a clearer case, here are some records from a past experiment of mine, training a BiLSTM and adjusting the learning rate with LRFinder every 5 epochs. You can see that the lr-loss curve is almost a flat line at epoch 5 on the same scale.

  • History of LRFinder at the beginning
  • History of LRFinder at epoch 5

Therefore, if I need to train a model from scratch, I usually use LRFinder to pick a proper learning rate at the beginning of the training task, then run the task with another learning rate scheduler (e.g. CosineAnnealingLR, OneCycleLR). But if I need to fine-tune a model, I run LRFinder and pick a smaller learning rate.
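A sketch of that handoff (optimizer and train_loader are placeholders; picked_lr stands for whatever value the curve suggests):

import torch

# After inspecting the LRFinder curve, start the real run at the picked
# rate and let a scheduler (e.g. OneCycleLR) vary it during training.
picked_lr = 3e-4  # placeholder: the LR read off the curve
for group in optimizer.param_groups:
    group["lr"] = picked_lr
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=picked_lr, steps_per_epoch=len(train_loader), epochs=10
)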

And that's why I said it's difficult to make this algorithm work well in various cases. But as you said, we can still make it more robust. 😄

@chAwater chAwater marked this pull request as draft June 8, 2020 14:55
@chAwater (Contributor, author) commented Jun 8, 2020

@davidtvs Thanks for your review. I think I'm just going to provide the steepest-gradient suggestion in this PR as a starting point. Maybe @NaleRaphael or others can provide different algorithms/implementations in a new PR. Is that OK?

@davidtvs (Owner) commented Jun 8, 2020

Yes, that's okay @chAwater. That's actually what I was expecting; sorry if I wasn't clear before. This PR is a first step toward bringing learning rate suggestions, and its scope is limited to the steepest-gradient algorithm.

@NaleRaphael (Contributor)

Yeah, it's my pleasure to implement it. Just feel free to let me know if you need any further help.

Rename `suggestion` -> `suggest_lr`, replace `while` by `if` and replace f-strings by `format`
@chAwater chAwater marked this pull request as ready for review June 11, 2020 11:10
@chAwater chAwater requested a review from davidtvs June 11, 2020 11:37
@davidtvs (Owner) left a comment

If the changes to the return value of the plot function are made, then you can also add a test that checks the value of the suggested learning rate. Example:

def test_suggest_lr():
    # mod_task and prepare_lr_finder are existing helpers in tests/test_lr_finder.py
    task = mod_task.XORTask()
    lr_finder = prepare_lr_finder(task)

    # Craft a history whose steepest descent sits at index 2
    lr_finder.history["loss"] = [10, 8, 4, 1, 4, 16]
    lr_finder.history["lr"] = range(len(lr_finder.history["loss"]))

    fig, ax = plt.subplots()
    ax, lr = lr_finder.plot(skip_start=0, skip_end=0, suggest_lr=True, ax=ax)

    # np.gradient of that loss list is most negative at index 2, and the
    # "lr" history is just the indices, so the suggested LR is 2
    assert lr == 2

@davidtvs (Owner)

/black-check

@github-actions (bot) left a comment

No linting violations have been found in this PR.

@chAwater (Contributor, author)
@davidtvs Thanks for your review. I made some changes to clarify the code, but I don't think a plot function should return an LR (although a suggest_lr function should).

Maybe, in the future, we can add an attribute (such as best_lr/lr_candidate) to the LRFinder object and use it to store the suggested LR from the different suggest_lr algorithms.

@chAwater chAwater requested a review from davidtvs June 14, 2020 08:09
@chAwater (Contributor, author)
Any new thoughts or suggestions?

@davidtvs (Owner) commented Jun 27, 2020

@chAwater sorry for the lack of feedback, I ended up getting busy with other stuff.

I agree that it's weird for the plot function to return a learning rate, but it's also weird that it computes said learning rate internally to begin with. I see two ways to go about this:

  1. A new function that both computes and returns the suggested learning rate. The user can then pass that value to plot via the show_lr argument if they want to draw it.
  2. plot computes and returns the suggested learning rate.

I would prefer option 1: it's cleaner, and there's a clear separation of concerns between plot and the new function that computes the suggested learning rate. Option 2 would be okay for me as a temporary solution that would eventually become option 1.
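A rough sketch of what option 1 could look like (a hypothetical function, not the API merged in this PR; the skip defaults mirror plot's):

import numpy as np

# Hypothetical sketch of option 1: a standalone suggestion function,
# separate from plotting (not the API merged in this PR).
def suggest_lr(lr_finder, skip_start=10, skip_end=5):
    lrs = lr_finder.history["lr"][skip_start:-skip_end]
    losses = lr_finder.history["loss"][skip_start:-skip_end]
    idx = int(np.gradient(np.array(losses)).argmin())  # steepest descent point
    return lrs[idx]

# Usage: compute the suggestion, then draw it with the existing show_lr arg.
# lr = suggest_lr(lr_finder)
# lr_finder.plot(show_lr=lr)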

@chAwater (Contributor, author)
@davidtvs I partially agree with you. It would be clearer if we separated suggest_lr from plot.

But:

  • Some cases (such as suggesting a range of LRs) are not suitable for show_lr.
  • If we add more suggest_lr algorithms, we will need a better design (function or class; I don't have much experience with Python OOP).

Therefore, I think I'll take option 2 in this PR as a temporary solution.
Maybe we can wait for some feedback and add new things in the future.

chAwater added a commit to chAwater/pytorch-lr-finder that referenced this pull request Jun 30, 2020
@davidtvs (Owner) left a comment

LGTM

@davidtvs (Owner) commented Jul 6, 2020

/flake8-lint

@github-actions (bot) left a comment

Lintly has detected code quality issues in this pull request.

@chAwater (Contributor, author) left a comment

Remove extra spaces

@chAwater (Contributor, author) left a comment

black style

@davidtvs (Owner) commented Jul 8, 2020

/flake8-lint

@github-actions (bot) left a comment

No linting violations have been found in this PR.

@davidtvs davidtvs merged commit c476676 into davidtvs:master Jul 8, 2020
@davidtvs (Owner) commented Jul 8, 2020

Merged. Thanks for contributing @chAwater!

@chAwater (Contributor, author) commented Jul 9, 2020

Thank you for being so nice and patient! It's my pleasure.
