Continue Jeremy's early stopping PR #1504 #2391

Merged
williamFalcon merged 155 commits into master from bugfix/early-stopping-state on Jun 29, 2020

awaelchli
Member

@awaelchli awaelchli commented Jun 27, 2020

What does this PR do?

This is a continuation of #1504. The branch now lives in the main repo, which makes it much easier for @PyTorchLightning/core-contributors to access :]

Fixes #1464
Fixes #1463
Fixes #1699
Fixes #2151
Related #1458

Just trying to help complete #1504

TODO:

  • test_resume_early_stopping_from_checkpoint
  • test_wandb_pickle

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typo fixes and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team June 28, 2020 16:21
Contributor

@jeremyjordan jeremyjordan left a comment

thanks for carrying this over the line @awaelchli :)

@mergify mergify bot requested a review from a team June 28, 2020 16:54
@Borda
Member

Borda commented Jun 28, 2020

> thanks for carrying this over the line @awaelchli :)

and @jeremyjordan, that is the Lightning spirit I love... 💓

@awaelchli
Member Author

@williamFalcon as requested I added a test to make sure checkpointing only happens once in ddp (ddp_cpu) mode (i.e. they don't get overwritten by other processes).

Regarding loggers: on master (and here), when launching ddp with e.g. 2 processes, the logger will still create two folders, version_0 and version_1. I'd rather not try to fix it here and make the PR even bigger. I can open an issue for this so we can track it down, ok?
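The rank-zero guard mentioned above can be sketched in plain Python. This is a toy simulation, not Lightning's code: `next_version_dir`, `setup_logging`, and `simulate` are hypothetical names, the "processes" run one after another in a single interpreter, and the idea that each process picks the next free `version_N` folder is an assumption about how the duplicate folders might arise.

```python
import os
import tempfile


def next_version_dir(root: str) -> str:
    """Create and return the next free ``version_N`` folder under ``root``."""
    existing = [
        int(name.split("_")[1])
        for name in os.listdir(root)
        if name.startswith("version_") and name.split("_")[1].isdigit()
    ]
    version = max(existing) + 1 if existing else 0
    path = os.path.join(root, f"version_{version}")
    os.makedirs(path)
    return path


def setup_logging(root: str, rank: int, rank_zero_only: bool) -> None:
    """Simulate one process setting up its log folder."""
    if rank_zero_only and rank != 0:
        return  # non-zero ranks skip the side effect entirely
    next_version_dir(root)


def simulate(world_size: int, rank_zero_only: bool) -> list:
    """Run the simulated processes in sequence and list the folders they created."""
    with tempfile.TemporaryDirectory() as root:
        for rank in range(world_size):
            setup_logging(root, rank, rank_zero_only)
        return sorted(os.listdir(root))


unguarded = simulate(world_size=2, rank_zero_only=False)
guarded = simulate(world_size=2, rank_zero_only=True)
print(unguarded)  # ['version_0', 'version_1']
print(guarded)    # ['version_0']
```

The same kind of guard is what keeps a checkpoint from being written once per process: only the process with rank 0 performs the write, and the others return early.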

Resolved review threads on pytorch_lightning/trainer/training_loop.py and tests/callbacks/test_model_checkpoint.py
@mergify mergify bot requested a review from a team June 28, 2020 21:15
@mergify
Contributor

mergify bot commented Jun 28, 2020

This pull request is now in conflict... :(

@mergify mergify bot requested a review from a team June 28, 2020 21:21
@mergify
Contributor

mergify bot commented Jun 29, 2020

This pull request is now in conflict... :(

@williamFalcon williamFalcon merged commit 25ee51b into master Jun 29, 2020
@williamFalcon
Contributor

great work everyone!!!

@awaelchli awaelchli deleted the bugfix/early-stopping-state branch June 29, 2020 01:46
@lucmos
Contributor

lucmos commented Jul 8, 2020

Hi guys, sorry if I chime in.

I noticed that in lightning 0.8.4 the early stopping saves the last model, not the best one.
"Saves" meaning that when the fit function early stops, the model weights are not the best ones but the ones in the last epoch.

Is this solved in this PR or should I open a new issue?
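For readers following along, the distinction raised in this question can be sketched with a toy loop. It is an illustration only, not Lightning's implementation: `fit_with_early_stopping` is a hypothetical function, the "weights" are just a dict recording the epoch that produced them, and the validation losses are made up.

```python
import copy


def fit_with_early_stopping(val_losses, patience=2):
    """Toy training loop that stops after ``patience`` epochs without improvement."""
    best_loss = float("inf")
    best_weights = None
    weights = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        weights = {"epoch": epoch}  # stand-in for the model state after this epoch
        if loss < best_loss:
            best_loss = loss
            best_weights = copy.deepcopy(weights)  # snapshot taken at the best epoch
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                break  # early stop: ``weights`` is now from a worse epoch
    return weights, best_weights


last, best = fit_with_early_stopping([0.9, 0.5, 0.6, 0.7, 0.4], patience=2)
print(last)  # {'epoch': 3}
print(best)  # {'epoch': 1}
```

When the loop stops, the in-memory state is the one from the last epoch run. The best state is only available if it was snapshotted separately at the time it was reached.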

Labels: bug, feature, priority: 0
6 participants