Keep track of the best model's path saved by ModelCheckpoint #1799

Merged

Conversation

kepler
Contributor

@kepler kepler commented May 12, 2020

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Add an additional attribute to ModelCheckpoint to keep track of the best model's path. Currently, only the best metric value is directly tracked.

When training finishes, if we want to know where the best model is, we need to do either:

op = min if checkpoint_callback.mode == 'min' else max
best_model_path = op(
    checkpoint_callback.best_k_models,
    key=checkpoint_callback.best_k_models.get
)

or

best_model_path = [
    path
    for path, metric in checkpoint_callback.best_k_models.items()
    if metric == checkpoint_callback.best
][0]

which are both somewhat cumbersome.

Since the best value is already tracked, this PR simply tracks the saved path as well.
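For illustration, a sketch of the intended usage after this change (`MyModel` is a placeholder LightningModule, and the attribute is shown under its final name `best_model_path`, adopted later in the review):

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(monitor='val_loss', mode='min')
trainer = Trainer(checkpoint_callback=checkpoint_callback)
trainer.fit(MyModel())  # MyModel is a placeholder module

# The best checkpoint's location is now tracked directly:
print(checkpoint_callback.best_model_path)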

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Glad to make PTL even more practical.

@mergify
Contributor

mergify bot commented May 17, 2020

This pull request is now in conflict... :(

Add an additional attribute to ModelCheckpoint to keep track of the best model's path

Currently, only the best metric value is directly tracked. This new attribute will help in use cases where the trained model needs to be used or tracked right after training.
@kepler kepler force-pushed the feature/keep-track-of-best-model branch from f16a602 to ed3e6f3 on May 19, 2020 10:42
@codecov

codecov bot commented May 19, 2020

Codecov Report

Merging #1799 into master will decrease coverage by 0%.
The diff coverage is 86%.

@@          Coverage Diff           @@
##           master   #1799   +/-   ##
======================================
- Coverage      88%     88%   -0%     
======================================
  Files          74      74           
  Lines        4654    4664   +10     
======================================
+ Hits         4076    4083    +7     
- Misses        578     581    +3     

@Borda Borda added the "feature" (Is an improvement or enhancement) label May 20, 2020
@williamFalcon
Contributor

@kepler nice!
Can you add to the docs?

@Borda Borda added the "waiting on author" (Waiting on user action, correction, or update) label and removed the "add docs" label May 25, 2020
@kepler
Contributor Author

kepler commented May 27, 2020

Thanks, @williamFalcon.
Not sure where to add any documentation, though. None of the attributes in ModelCheckpoint, like best and best_k_models, are documented. Should I simply add a note in the class' docstring, right before Args?

@pep8speaks

pep8speaks commented May 27, 2020

Hello @kepler! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-05-31 10:39:39 UTC

@kepler
Contributor Author

kepler commented May 27, 2020

@williamFalcon let me know if this is enough.

@williamFalcon
Contributor

@kepler that's great!
However, can you do a new PR with the changes?

@kepler
Contributor Author

kepler commented May 27, 2020

@williamFalcon are you sure this is already merged to master? I can't see the changes there (hence creating a new PR based on this branch seems convoluted to me).

@williamFalcon
Contributor

williamFalcon commented May 28, 2020

@kepler you're right! sorry about that. let's merge this PR then :)

Let's just get the tests to pass

@williamFalcon williamFalcon added the "priority: 0" (High priority task) label May 28, 2020
@kepler
Contributor Author

kepler commented May 28, 2020

@williamFalcon no problem.

I have to say I don't understand why the tests failed (for Ubuntu and MacOS but passed for Windows). Could you please rerun them, just in case? Otherwise, I'll trigger it with a small commit (in about two hours).

@justusschock
Member

https://github.com/PyTorchLightning/pytorch-lightning/pull/1799/checks?check_run_id=712909136 they don't pass on Windows either; for Windows this is just not indicated correctly.

On Linux and macOS, you have a print statement in your doctests which produces some output although no output is expected, and thus the tests fail.
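For context, a minimal sketch of that failure mode (hypothetical function, not the PR's actual code): a statement that prints inside a doctest fails unless its output is declared as the expected result.

import doctest

def save_best():
    """
    >>> print("best model saved")
    """
    # The doctest above declares no expected output, so doctest reports:
    # "Expected nothing ... Got: best model saved"

if __name__ == "__main__":
    doctest.testmod()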

@kepler
Contributor Author

kepler commented May 28, 2020

So it was indeed a problem with the doctest (totally missed it). Thanks, @justusschock.

@Borda
Member

Borda commented May 28, 2020

I have to say I don't understand why the tests failed (for Ubuntu and MacOS but passed for Windows). Could you please rerun them, just in case? Otherwise, I'll trigger it with a small commit (in about two hours).

Win is poor and does not have all functions implemented properly...

@kepler
Contributor Author

kepler commented May 28, 2020

@Borda This should be ready now.

@Borda Borda (Member) left a comment

pls clarify the `best` var and add a note to the changelog

@@ -265,7 +276,8 @@ def _do_check_save(self, filepath, current, epoch):
             self.kth_value = self.best_k_models[self.kth_best_model]
 
         _op = min if self.mode == 'min' else max
-        self.best = _op(self.best_k_models.values())
+        self.best_model = _op(self.best_k_models, key=self.best_k_models.get)
+        self.best = self.best_k_models[self.best_model]
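To make the two new lines concrete, an illustrative run with made-up paths and scores:

# best_k_models maps checkpoint paths to their monitored metric values
best_k_models = {'epoch=1.ckpt': 0.30, 'epoch=3.ckpt': 0.12}
_op = min  # mode == 'min'
best_model = _op(best_k_models, key=best_k_models.get)  # -> 'epoch=3.ckpt'
best = best_k_models[best_model]                        # -> 0.12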
Member

here the `best` means best_score? otherwise, it is a bit confusing to have `best` and `best_model`
pls add types to the init 🐰

Contributor Author

Yes, self.best is the best "monitored quantity". But that's existing naming. Changing it will break backwards compatibility. If that's OK, I can surely rename it. Otherwise, best_model could be best_model_path. I used best_model to keep consistency with the existing kth_best_model attribute.

As for type hints, there are no changes to the arguments. Where specifically would I add them?

Member

@PyTorchLightning/core-contributors are we fine to rename best >> best_score?

Contributor (jeremyjordan)

i would prefer it (best_score), but if we do that we need to provide our users with a utility to upgrade their checkpoints (basically switch best to best_score) and temporarily have a warning which catches old checkpoints at load time and alerts users of the utility to convert. it's a very simple utility but we should provide it for our users if we move forward with this.

Member

@jeremyjordan you mean to keep compatibility with past saved checkpoints which contain `best`? That shall be simple: in loading, we will map best >> best_score... and yes, we shall wrap the deprecated `best` with some warning :]

Contributor Author

OK, I renamed best to best_model_score and best_model to best_model_path, since having best_score and best_model would still be a bit confusing (IMHO). To keep consistency, I also renamed kth_best_model to kth_best_model_path.

I added properties for best and kth_best_model that log a deprecation warning and return the correct value.

When loading a checkpoint, if it's in an old format, the value for best is simply assigned to best_model_score. In my opinion, adding a warning in this part will not really help the user, as there's not much they can do.

Let me know of any further changes.
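A minimal sketch of the deprecation-property approach described above (illustrative skeleton, not the exact merged code; the checkpoint key name in the final comment is hypothetical):

import warnings

class ModelCheckpoint:
    def __init__(self):
        self.best_model_score = None   # was `best`
        self.best_model_path = ''      # was `best_model` earlier in this PR
        self.kth_best_model_path = ''  # was `kth_best_model`

    @property
    def best(self):
        # Deprecated alias: warn, then return the renamed attribute.
        warnings.warn('`best` has been renamed to `best_model_score`', DeprecationWarning)
        return self.best_model_score

    @property
    def kth_best_model(self):
        warnings.warn('`kth_best_model` has been renamed to `kth_best_model_path`', DeprecationWarning)
        return self.kth_best_model_path

# When loading an old-format checkpoint, the legacy value is simply
# assigned to the new attribute, e.g.:
#   callback.best_model_score = checkpoint['checkpoint_callback_best']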

Rename `ModelCheckpoint.best` to `ModelCheckpoint.best_model_score`

Also rename `ModelCheckpoint.best_model` (added in this PR) to `ModelCheckpoint.best_model_path`, for consistency, and `kth_best_model` to `kth_best_model_path`.
@mergify
Contributor

mergify bot commented May 29, 2020

This pull request is now in conflict... :(

@Borda Borda (Member) left a comment

just minor things

CHANGELOG.md
pytorch_lightning/callbacks/model_checkpoint.py (outdated)
pytorch_lightning/callbacks/model_checkpoint.py (outdated)
pytorch_lightning/trainer/training_io.py
pytorch_lightning/trainer/training_io.py (outdated)
kepler and others added 2 commits May 29, 2020 18:58
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
@mergify
Contributor

mergify bot commented May 31, 2020

This pull request is now in conflict... :(

@williamFalcon williamFalcon merged commit 8b9b923 into Lightning-AI:master May 31, 2020
@kepler kepler deleted the feature/keep-track-of-best-model branch June 1, 2020 07:34
justusschock pushed a commit that referenced this pull request Jun 29, 2020
* Add an additional attribute to ModelCheckpoint to keep track of the best model's path

Currently, only the best metric value is directly tracked. This new attribute will help in use cases where the trained model needs to be used or tracked right after training.

* Add small description and usage example to docs

* Fix PEP8 issues

* Fix doctest example

* Fix expected output in doctest

* Apply suggestions from code review

* Show example as code block instead of doctest

* Apply suggestions from code review

* Update CHANGELOG.md

* Rename `ModelCheckpoint.best` to `ModelCheckpoint.best_model_score`

Also rename `ModelCheckpoint.best_model` (added in this PR) to `ModelCheckpoint.best_model_path`, for consistency, and `kth_best_model` to `kth_best_model_path`.

* Update pytorch_lightning/trainer/training_io.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Add warning when loading checkpoint from an old version

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Labels
feature (Is an improvement or enhancement), priority: 0 (High priority task), waiting on author (Waiting on user action, correction, or update)

6 participants