Keep track of the best model's path saved by ModelCheckpoint #1799

Merged

Conversation

kepler
Contributor

@kepler kepler commented May 12, 2020

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Add an additional attribute to ModelCheckpoint to keep track of the best model's path. Currently, only the best metric value is directly tracked.

When training finishes, if we want to know where the best model is, we need to do either:

op = min if checkpoint_callback.mode == 'min' else max
best_model_path = op(
    checkpoint_callback.best_k_models,
    key=checkpoint_callback.best_k_models.get
)

or

best_model_path = [
    path
    for path, metric in checkpoint_callback.best_k_models.items()
    if metric == checkpoint_callback.best
][0]

which are both somewhat cumbersome.

Since the best value is already tracked, this PR simply tracks the saved path as well.
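For illustration, a sketch of the intended usage after this change (`MyModel` is a placeholder LightningModule, and the attribute is shown under its final name `best_model_path`, adopted later in the review):

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(monitor='val_loss', mode='min')
trainer = Trainer(checkpoint_callback=checkpoint_callback)
trainer.fit(MyModel())  # MyModel is a placeholder module

# The best checkpoint's location is now tracked directly:
print(checkpoint_callback.best_model_path)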

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Glad to make PTL even more practical.

@mergify
Contributor

mergify bot commented May 17, 2020

This pull request is now in conflict... :(

Add an additional attribute to ModelCheckpoint to keep track of the best model's path

Currently, only the best metric value is directly tracked. This new attribute will help in use cases where the trained model needs to be used or tracked right after training.
@kepler kepler force-pushed the feature/keep-track-of-best-model branch from f16a602 to ed3e6f3 on May 19, 2020 10:42
@codecov

codecov bot commented May 19, 2020

Codecov Report

Merging #1799 into master will decrease coverage by 0%.
The diff coverage is 86%.

@@          Coverage Diff           @@
##           master   #1799   +/-   ##
======================================
- Coverage      88%     88%   -0%     
======================================
  Files          74      74           
  Lines        4654    4664   +10     
======================================
+ Hits         4076    4083    +7     
- Misses        578     581    +3     

@Borda Borda added the "feature" (Is an improvement or enhancement) label May 20, 2020
@williamFalcon
Contributor

@kepler nice!
Can you add to the docs?

@Borda Borda added the "waiting on author" (Waiting on user action, correction, or update) label and removed the "add docs" label May 25, 2020
@kepler
Contributor Author

kepler commented May 27, 2020

Thanks, @williamFalcon.
Not sure where to add any documentation, though. None of the attributes in ModelCheckpoint, like best and best_k_models, are documented. Should I simply add a note in the class' docstring, right before Args?

@pep8speaks

pep8speaks commented May 27, 2020

Hello @kepler! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-05-31 10:39:39 UTC

@kepler
Contributor Author

kepler commented May 27, 2020

@williamFalcon let me know if this is enough.

@williamFalcon
Contributor

@kepler that's great!
However, can you do a new PR with the changes?

@kepler
Contributor Author

kepler commented May 27, 2020

@williamFalcon are you sure this is already merged to master? I can't see the changes there (hence creating a new PR based on this branch seems convoluted to me).

@williamFalcon
Contributor

williamFalcon commented May 28, 2020

@kepler you're right! sorry about that. let's merge this PR then :)

Let's just get the tests to pass

@williamFalcon williamFalcon added the "priority: 0" (High priority task) label May 28, 2020
@kepler
Contributor Author

kepler commented May 28, 2020

@williamFalcon no problem.

I have to say I don't understand why the tests failed (for Ubuntu and MacOS but passed for Windows). Could you please rerun them, just in case? Otherwise, I'll trigger it with a small commit (in about two hours).

@justusschock
Member

https://github.com/PyTorchLightning/pytorch-lightning/pull/1799/checks?check_run_id=712909136 they don't pass on Windows either; for Windows this is just not indicated correctly.

On Linux and macOS, you have a print statement in your doctests which produces some output although no output is expected, and thus the tests fail.
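For context, a minimal sketch of that failure mode (hypothetical function, not the PR's actual code): a statement that prints inside a doctest fails unless its output is declared as the expected result.

import doctest

def save_best():
    """
    >>> print("best model saved")
    """
    # The doctest above declares no expected output, so doctest reports:
    # "Expected nothing ... Got: best model saved"

if __name__ == "__main__":
    doctest.testmod()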

@kepler
Contributor Author

kepler commented May 28, 2020

So it was indeed a problem with the doctest (totally missed it). Thanks, @justusschock.

@Borda
Member

Borda commented May 28, 2020

I have to say I don't understand why the tests failed (for Ubuntu and MacOS but passed for Windows). Could you please rerun them, just in case? Otherwise, I'll trigger it with a small commit (in about two hours).

Win is poor and does not have all functions implemented properly...

@kepler
Contributor Author

kepler commented May 28, 2020

@Borda This should be ready now.

@Borda Borda (Member) left a comment

pls clarify the `best` var and add a note to the changelog

@@ -265,7 +276,8 @@ def _do_check_save(self, filepath, current, epoch):
             self.kth_value = self.best_k_models[self.kth_best_model]
 
         _op = min if self.mode == 'min' else max
-        self.best = _op(self.best_k_models.values())
+        self.best_model = _op(self.best_k_models, key=self.best_k_models.get)
+        self.best = self.best_k_models[self.best_model]
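To make the two new lines concrete, an illustrative run with made-up paths and scores:

# best_k_models maps checkpoint paths to their monitored metric values
best_k_models = {'epoch=1.ckpt': 0.30, 'epoch=3.ckpt': 0.12}
_op = min  # mode == 'min'
best_model = _op(best_k_models, key=best_k_models.get)  # -> 'epoch=3.ckpt'
best = best_k_models[best_model]                        # -> 0.12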
Member

here the `best` means best_score? otherwise, it is a bit confusing to have `best` and `best_model`
pls add types to the init 🐰

Contributor Author

Yes, self.best is the best "monitored quantity". But that's existing naming. Changing it will break backwards compatibility. If that's OK, I can surely rename it. Otherwise, best_model could be best_model_path. I used best_model to keep consistency with the existing kth_best_model attribute.

As for type hints, there are no changes to the arguments. Where specifically would I add them?

Member

@PyTorchLightning/core-contributors are we fine to rename best >> best_score?

Contributor (jeremyjordan)

i would prefer it (best_score), but if we do that we need to provide our users with a utility to upgrade their checkpoints (basically switch best to best_score) and temporarily have a warning which catches old checkpoints at load time and alerts users of the utility to convert. it's a very simple utility but we should provide it for our users if we move forward with this.

Member

@jeremyjordan you mean to keep compatibility with past saved checkpoints which contain `best`? That shall be simple: in loading, we will map best >> best_score... and yes, we shall wrap the deprecated `best` with some warning :]

Contributor Author

OK, I renamed best to best_model_score and best_model to best_model_path, since having best_score and best_model would still be a bit confusing (IMHO). To keep consistency, I also renamed kth_best_model to kth_best_model_path.

I added properties for best and kth_best_model that log a deprecation warning and return the correct value.

When loading a checkpoint, if it's in an old format, the value for best is simply assigned to best_model_score. In my opinion, adding a warning in this part will not really help the user, as there's not much they can do.

Let me know of any further changes.
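A minimal sketch of the deprecation-property approach described above (illustrative skeleton, not the exact merged code; the checkpoint key name in the final comment is hypothetical):

import warnings

class ModelCheckpoint:
    def __init__(self):
        self.best_model_score = None   # was `best`
        self.best_model_path = ''      # was `best_model` earlier in this PR
        self.kth_best_model_path = ''  # was `kth_best_model`

    @property
    def best(self):
        # Deprecated alias: warn, then return the renamed attribute.
        warnings.warn('`best` has been renamed to `best_model_score`', DeprecationWarning)
        return self.best_model_score

    @property
    def kth_best_model(self):
        warnings.warn('`kth_best_model` has been renamed to `kth_best_model_path`', DeprecationWarning)
        return self.kth_best_model_path

# When loading an old-format checkpoint, the legacy value is simply
# assigned to the new attribute, e.g.:
#   callback.best_model_score = checkpoint['checkpoint_callback_best']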

Rename `ModelCheckpoint.best` to `ModelCheckpoint.best_model_score`

Also rename `ModelCheckpoint.best_model` (added in this PR) to `ModelCheckpoint.best_model_path`, for consistency, and `kth_best_model` to `kth_best_model_path`.
@mergify
Contributor

mergify bot commented May 29, 2020

This pull request is now in conflict... :(

@Borda Borda (Member) left a comment

just minor things

CHANGELOG.md
pytorch_lightning/callbacks/model_checkpoint.py (outdated)
pytorch_lightning/callbacks/model_checkpoint.py (outdated)
pytorch_lightning/trainer/training_io.py
pytorch_lightning/trainer/training_io.py (outdated)
kepler and others added 2 commits May 29, 2020 18:58
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
@mergify
Contributor

mergify bot commented May 31, 2020

This pull request is now in conflict... :(

@williamFalcon williamFalcon merged commit 8b9b923 into Lightning-AI:master May 31, 2020
@kepler kepler deleted the feature/keep-track-of-best-model branch June 1, 2020 07:34
justusschock pushed a commit that referenced this pull request Jun 29, 2020
* Add an additional attribute to ModelCheckpoint to keep track of the best model's path

Currently, only the best metric value is directly tracked. This new attribute will help in use cases where the trained model needs to be used or tracked right after training.

* Add small description and usage example to docs

* Fix PEP8 issues

* Fix doctest example

* Fix expected output in doctest

* Apply suggestions from code review

* Show example as code block instead of doctest

* Apply suggestions from code review

* Update CHANGELOG.md

* Rename `ModelCheckpoint.best` to `ModelCheckpoint.best_model_score`

Also rename `ModelCheckpoint.best_model` (added in this PR) to `ModelCheckpoint.best_model_path`, for consistency, and `kth_best_model` to `kth_best_model_path`.

* Update pytorch_lightning/trainer/training_io.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Add warning when loading checkpoint from an old version

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Labels
feature (Is an improvement or enhancement), priority: 0 (High priority task), waiting on author (Waiting on user action, correction, or update)

6 participants