Fix inconsistent outputs in `on_*_end` and `*_end` #6969

ethanwharris · 2021-04-12T13:50:07Z

What does this PR do?

Several changes:

Move training_epoch_end (which actually handles the model's epoch end) from the logger connector to training_loop.py
Use one static method (_prepare_outputs) in training_loop.py to unpack all Result object outputs for both the epoch-end and batch-end hooks
Pass these unpacked outputs to both the on_train_epoch_end and training_epoch_end hooks (previously, training_epoch_end received differently structured outputs)
Move evaluation output handling to trainer.py so that both evaluation_epoch_end and on_evaluation_epoch_end get the same outputs argument
Change the order of calls to training_epoch_end and on_train_epoch_end to be consistent with the docs
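
To illustrate the kind of unpacking described in the list above, here is a minimal pure-Python sketch. It is not the actual `_prepare_outputs` implementation: the function names `unpack_step_output`, `squeeze_singletons`, and `prepare_epoch_outputs` are hypothetical, plain floats stand in for tensors, and the nesting rule (drop every list level that wraps exactly one element, keep the outermost list) is a simplifying assumption.

```python
from typing import Any, Dict, List, Union

# Hypothetical alias: a step result dict, or lists of them nested to any depth.
Nested = Union[Dict[str, Any], List["Nested"]]


def unpack_step_output(result: Dict[str, Any]) -> Dict[str, Any]:
    """Turn an internal-style result dict back into a user-style dict."""
    if "extra" not in result and "minimize" not in result:
        # Already in the user-facing shape (e.g. a validation step output).
        return dict(result)
    unpacked = dict(result.get("extra", {}))
    if "minimize" in result:
        # Assumption: the value stored under 'minimize' is the user's loss.
        unpacked["loss"] = result["minimize"]
    return unpacked


def squeeze_singletons(outputs: Nested) -> Nested:
    """Recursively drop list levels that wrap exactly one element."""
    if isinstance(outputs, dict):
        return unpack_step_output(outputs)
    squeezed = [squeeze_singletons(item) for item in outputs]
    return squeezed[0] if len(squeezed) == 1 else squeezed


def prepare_epoch_outputs(epoch_outputs: List[Nested]) -> List[Nested]:
    """Unpack every entry while keeping the outermost list intact."""
    return [squeeze_singletons(entry) for entry in epoch_outputs]


old_style = [[[{"extra": {"foo": 123}, "minimize": 0.2295}]]]
print(prepare_epoch_outputs(old_style))  # [{'foo': 123, 'loss': 0.2295}]
```

The point of routing every hook through one such helper is that all hooks receive the same structure, rather than each call site unpacking the results in its own way.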

Previously, running the code from the issue, the user saw this output:

callback on_validation_batch_end: {'loss': tensor(1.9916), 'foo': 'from_val_step'}
validation_epoch_end: [{'loss': tensor(1.9916), 'foo': 'from_val_step'}]
callback on_validation_epoch_end: [[{'loss': tensor(1.9916), 'foo': 'from_val_step'}]]
on_validation_epoch_end: [[{'loss': tensor(1.9916), 'foo': 'from_val_step'}]]
callback on_train_batch_end: [[{'extra': {'foo': 123}, 'minimize': tensor(0.2295)}]]
callback on_train_epoch_end: [[[{'extra': {'foo': 123}, 'minimize': tensor(0.2295)}]]]
on_train_epoch_end: [[[{'extra': {'foo': 123}, 'minimize': tensor(0.2295)}]]]
training_epoch_end: [{'foo': 123, 'loss': tensor(0.2295)}]
callback on_validation_batch_end: {'loss': tensor(2.2656), 'foo': 'from_val_step'}
validation_epoch_end: [{'loss': tensor(2.2656), 'foo': 'from_val_step'}]
callback on_validation_epoch_end: [[{'loss': tensor(2.2656), 'foo': 'from_val_step'}]]
on_validation_epoch_end: [[{'loss': tensor(2.2656), 'foo': 'from_val_step'}]]

Now the user sees this:

callback on_validation_batch_end: {'loss': tensor(0.6422), 'foo': 'from_val_step'}
validation_epoch_end: [{'loss': tensor(0.6422), 'foo': 'from_val_step'}]
callback on_validation_epoch_end: [{'loss': tensor(0.6422), 'foo': 'from_val_step'}]
on_validation_epoch_end: [{'loss': tensor(0.6422), 'foo': 'from_val_step'}]
callback on_train_batch_end: {'foo': 123, 'loss': tensor(1.2562)}
training_epoch_end: [{'foo': 123, 'loss': tensor(1.2562)}]
callback on_train_epoch_end: [{'foo': 123, 'loss': tensor(1.2562)}]
on_train_epoch_end: [{'foo': 123, 'loss': tensor(1.2562)}]
callback on_validation_batch_end: {'loss': tensor(0.4613), 'foo': 'from_val_step'}
validation_epoch_end: [{'loss': tensor(0.4613), 'foo': 'from_val_step'}]
callback on_validation_epoch_end: [{'loss': tensor(0.4613), 'foo': 'from_val_step'}]
on_validation_epoch_end: [{'loss': tensor(0.4613), 'foo': 'from_val_step'}]
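
The training part of the output above shows two properties: every epoch-end hook receives the same outputs, and training_epoch_end runs first. A rough sketch of that dispatch follows. `RecordingModule`, `RecordingCallback`, and `dispatch_train_epoch_end` are hypothetical stand-ins written for this example, with simplified signatures that do not match Lightning's real hooks, and the ordering simply mirrors the log above.

```python
from typing import Any, Dict, List, Tuple

Outputs = List[Dict[str, Any]]
Log = List[Tuple[str, Outputs]]


class RecordingModule:
    """Hypothetical stand-in for a module with two epoch-end hooks."""

    def __init__(self, log: Log) -> None:
        self.log = log

    def training_epoch_end(self, outputs: Outputs) -> None:
        self.log.append(("training_epoch_end", outputs))

    def on_train_epoch_end(self, outputs: Outputs) -> None:
        self.log.append(("on_train_epoch_end", outputs))


class RecordingCallback:
    """Hypothetical stand-in for a callback with one epoch-end hook."""

    def __init__(self, log: Log) -> None:
        self.log = log

    def on_train_epoch_end(self, outputs: Outputs) -> None:
        self.log.append(("callback on_train_epoch_end", outputs))


def dispatch_train_epoch_end(
    module: RecordingModule,
    callbacks: List[RecordingCallback],
    outputs: Outputs,
) -> None:
    """Call every epoch-end hook with the same, already unpacked outputs."""
    module.training_epoch_end(outputs)
    for callback in callbacks:
        callback.on_train_epoch_end(outputs)
    module.on_train_epoch_end(outputs)


log: Log = []
outputs = [{"foo": 123, "loss": 1.2562}]
dispatch_train_epoch_end(RecordingModule(log), [RecordingCallback(log)], outputs)
for name, received in log:
    print(f"{name}: {received}")
```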

Before submitting

Was this discussed/approved via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

codecov · 2021-04-12T14:12:12Z

Codecov Report

Merging #6969 (d3b1735) into master (e891ceb) will decrease coverage by 0%.
The diff coverage is 98%.

@@          Coverage Diff           @@
##           master   #6969   +/-   ##
======================================
- Coverage      92%     92%   -0%     
======================================
  Files         194     194           
  Lines       12328   12329    +1     
======================================
- Hits        11324   11322    -2     
- Misses       1004    1007    +3

ananthsub · 2021-04-12T14:46:39Z

@ethanwharris n00b question: should the lightning module hook come before the callback hook runs?

ethanwharris · 2021-04-12T14:47:51Z

@ananthsub The docs certainly think it should, so it will, following this PR 😃

ethanwharris · 2021-04-13T09:42:45Z

@carmocca I have added some tests for training_loop 😃 - one for call order, and one checking that the hooks are called with the correct outputs objects.
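
For readers wondering what a call-order test of this kind can look like, here is a small sketch using `unittest.mock` from the standard library. It is not the test added in this PR: `run_train_epoch_end` is a hypothetical helper standing in for the training loop, and the hook signatures are simplified.

```python
from unittest.mock import Mock, call


def run_train_epoch_end(module, callbacks, outputs):
    """Hypothetical stand-in for the part of the loop under test."""
    module.training_epoch_end(outputs)
    for callback in callbacks:
        callback.on_train_epoch_end(outputs)
    module.on_train_epoch_end(outputs)


def test_epoch_end_call_order():
    # A parent mock records calls made on attached children in order.
    manager = Mock()
    module, callback = Mock(), Mock()
    manager.attach_mock(module, "module")
    manager.attach_mock(callback, "callback")

    outputs = [{"foo": 123, "loss": 1.25}]
    run_train_epoch_end(module, [callback], outputs)

    # Checks both the order of the hooks and the argument each one received.
    assert manager.mock_calls == [
        call.module.training_epoch_end(outputs),
        call.callback.on_train_epoch_end(outputs),
        call.module.on_train_epoch_end(outputs),
    ]


test_epoch_end_call_order()
```

Attaching both mocks to one parent is what makes the ordering observable, since separate mocks would each only record their own calls.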

tchaton

LGTM !

ethanwharris added 6 commits April 12, 2021 12:28

Fix output consistency

ad8647c

Updates

de66f3d

Fix tests

2bc7035

Remove commented code

243ad46

Remove commented code

e4efdb7

Remove unused imports

7ce977a

ethanwharris added 4 commits April 12, 2021 15:18

Fix broken test

6c79531

Fix broken test

e6642be

Fix broken test

61a777e

Fix broken docs

34bec23

Update CHANGELOG.md

2e57f48

ethanwharris marked this pull request as ready for review April 12, 2021 14:50

ethanwharris requested review from awaelchli, Borda, carmocca, edenlightning, justusschock, kaushikb11, SeanNaren, tchaton and williamFalcon as code owners April 12, 2021 14:50

ethanwharris changed the title ~~Fix inconsistent outputs in on_*_end and *_end~~ [WIP] Fix inconsistent outputs in on_*_end and *_end Apr 12, 2021

ethanwharris added the bug and priority: 0 labels Apr 12, 2021

ethanwharris changed the title ~~[WIP] Fix inconsistent outputs in on_*_end and *_end~~ Fix inconsistent outputs in on_*_end and *_end Apr 12, 2021

ethanwharris changed the title ~~Fix inconsistent outputs in on_*_end and *_end~~ [WIP] Fix inconsistent outputs in on_*_end and *_end Apr 12, 2021

ananthsub reviewed Apr 12, 2021

ananthsub approved these changes Apr 13, 2021

Add typing

d4a0fae

awaelchli approved these changes Apr 13, 2021

awaelchli added the ready label Apr 13, 2021

Update CHANGELOG.md

b049b6a

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

mergify bot added the has conflicts label Apr 13, 2021

Fix typing bug

47c9119

kaushikb11 reviewed Apr 13, 2021

kaushikb11 approved these changes Apr 13, 2021

Merge branch 'master' into bugfix/inconsistent_outputs

4b30fbc

mergify bot removed the has conflicts label Apr 13, 2021

ethanwharris added 2 commits April 13, 2021 10:41

Add tests

4fac95a

Merge branch 'bugfix/inconsistent_outputs' of https://github.com/PyTorchLightning/pytorch-lightning into bugfix/inconsistent_outputs

c968cb4

ethanwharris commented Apr 13, 2021

ethanwharris added 4 commits April 13, 2021 10:45

Update pytorch_lightning/trainer/training_loop.py

4dfc5be

Remove unused imports

7b84f02

Fix test

35a3365

Fix broken test

597e173

carmocca approved these changes Apr 13, 2021

ethanwharris added 2 commits April 13, 2021 12:25

Fix test name and doc

793a69b

Remove unused import

036fc15

carmocca added this to the 1.3 milestone Apr 13, 2021

mergify bot added the has conflicts label Apr 13, 2021

Merge branch 'master' into bugfix/inconsistent_outputs

d3b1735

mergify bot removed the has conflicts label Apr 13, 2021

kaushikb11 enabled auto-merge (squash) April 13, 2021 13:20

tchaton approved these changes Apr 13, 2021

kaushikb11 merged commit b9bc772 into master Apr 13, 2021

kaushikb11 deleted the bugfix/inconsistent_outputs branch April 13, 2021 14:16
