
Metric aggregation testing #3517

Merged

Conversation

@SkafteNicki (Member) commented Sep 16, 2020

What does this PR do?

With PR #3245 we changed the way we do ddp sync and also introduced aggregation over multiple batches. This PR adds tests for all metrics so that we are sure they work in the following cases:

  • Aggregation over multiple devices (aka running in ddp mode)
  • Aggregation over multiple equal size batches
  • Aggregation over multiple unequal size batches (important because the common assumption that the final metric value can be computed as the mean over the individually computed values breaks in this case)
  • Aggregation over multiple devices and multiple batches
  • Model integration

Should fix #3230
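The unequal-batch case above is the easiest one to get wrong, so here is a minimal, self-contained illustration (not code from this PR) of why averaging per-batch metric values differs from computing the metric over all samples:

```python
# Illustration only: why per-batch averaging breaks for unequal batch sizes.
# Accuracy must be aggregated from raw counts, not averaged over batches.

def accuracy(preds, targets):
    # Fraction of predictions that match the targets.
    return sum(p == t for p, t in zip(preds, targets)) / len(preds)

# Two batches of unequal size: 3 samples and 1 sample.
batch1 = ([1, 0, 1], [1, 1, 1])   # 2/3 correct
batch2 = ([0], [0])               # 1/1 correct

mean_of_batches = (accuracy(*batch1) + accuracy(*batch2)) / 2       # 5/6 ≈ 0.833
global_acc = accuracy(batch1[0] + batch2[0], batch1[1] + batch2[1])  # 3/4 = 0.75

print(mean_of_batches, global_acc)  # the two disagree
```

The small batch gets the same weight as the large one in the naive average, which is exactly the assumption these tests are designed to catch.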

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@SkafteNicki SkafteNicki added this to in Progress in Metrics package via automation Sep 16, 2020

pep8speaks commented Sep 16, 2020

Hello @SkafteNicki! Thanks for updating this PR.

Line 282:121: E501 line too long (129 > 120 characters)

Comment last updated at 2020-10-01 11:32:20 UTC


codecov bot commented Sep 17, 2020

Codecov Report

Merging #3517 into master will increase coverage by 3%.
The diff coverage is 95%.

@@           Coverage Diff           @@
##           master   #3517    +/-   ##
=======================================
+ Coverage      87%     90%    +3%     
=======================================
  Files         110     110            
  Lines        8858    8818    -40     
=======================================
+ Hits         7731    7921   +190     
+ Misses       1127     897   -230     

@SkafteNicki (Member, Author) commented:

@Borda, @awaelchli, @justusschock, I would like your input on this.
Testing of metrics in ddp mode has long been overdue, and doing this for 7 of them clearly shows that they are broken once we start to aggregate (both multi-device and multi-batch).
It requires that most metrics divide their computations into pre-ddp and post-ddp parts (as I hinted at in #2528 and Borda pointed to in #3510).
Luckily, we already have the interface for this: pre-ddp goes into the forward method and post-ddp goes into the static compute method.
As stated, I have done this for 7 of them and would like your input before continuing:

  1. Is the test setup too complicated? Especially: should testing be split into the respective files (classification, regression, etc.) or kept in one large file as in this PR?

  2. How do we keep code duplication low with the functional backend? Right now I have basically implemented a return_state argument for the functional version of each metric, which returns the metric state right before ddp sync, so that only minimal computations need to happen after ddp sync.

  3. How do we split the work? I can do many of them, but I guess one big PR will be too complicated to review.
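The forward/compute split described above can be sketched as follows. This is a hypothetical illustration, not the actual Lightning metric API: the class name, method signatures, and the list-based state are all assumptions made for the example.

```python
class MeanSquaredError:
    """Hypothetical sketch of a metric split into pre-ddp and post-ddp parts."""

    def forward(self, preds, targets):
        # Pre-ddp: reduce a batch to additive state (summed squared error,
        # sample count) that stays correct under a SUM reduction.
        sse = sum((p - t) ** 2 for p, t in zip(preds, targets))
        return [sse, len(preds)]

    @staticmethod
    def compute(state):
        # Post-ddp: state has already been summed over processes/batches.
        sse, n = state
        return sse / n

m = MeanSquaredError()
s1 = m.forward([1.0, 2.0], [1.0, 0.0])    # state [4.0, 2]
s2 = m.forward([3.0], [0.0])              # state [9.0, 1]
synced = [a + b for a, b in zip(s1, s2)]  # stand-in for ddp all-reduce(SUM)
print(MeanSquaredError.compute(synced))   # (4 + 9) / 3 ≈ 4.33
```

Because the state is additive, the same compute works whether the sum came from multiple batches, multiple devices, or both.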

@ananyahjha93 (Contributor) commented:

@SkafteNicki hey, I am working on this as well, and I think there are some metrics which wouldn't work in that order. I think it is better to change the order to pre_ddp (input_convert) -> ddp_reduce/ddp_gather -> post_ddp (forward/output_convert), i.e. we need to gather statistics before computing the metrics themselves, rather than computing the metrics on individual GPUs and then figuring out how metrics from different batches are combined. It won't just be a mean/weighted mean or something along those lines for every metric. So I have been changing this for the metrics, but was breaking a lot of things yesterday.
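The gather-then-compute ordering proposed here can be sketched like this. All names (input_convert, ddp_reduce, output_convert) are illustrative stand-ins for the stages being discussed, not real Lightning functions, and the all-reduce is simulated with a plain sum:

```python
def input_convert(preds, targets):
    # Per-device: reduce raw predictions to additive statistics (tp, fp, fn).
    tp = sum(1 for p, t in zip(preds, targets) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, targets) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, targets) if p == 0 and t == 1)
    return (tp, fp, fn)

def ddp_reduce(states):
    # Stand-in for an all-reduce SUM over devices.
    return tuple(sum(s) for s in zip(*states))

def output_convert(state):
    # Compute the metric once, on the combined statistics.
    tp, fp, fn = state
    return tp / (tp + fp)  # precision

# Two "devices" seeing different data:
dev0 = input_convert([1, 1, 0], [1, 0, 0])       # tp=1, fp=1, fn=0
dev1 = input_convert([1, 0], [1, 1])             # tp=1, fp=0, fn=1
print(output_convert(ddp_reduce([dev0, dev1])))  # 2 / 3 ≈ 0.667
```

Averaging the per-device precisions (0.5 and 1.0) would instead give 0.75, which is the kind of discrepancy that motivates gathering statistics before computing.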

@awaelchli (Member) left a comment:

nice, LGTM on a high level. I'm not a metrics guy, so no comment on correctness :)

Review threads (resolved):
  • pytorch_lightning/metrics/functional/classification.py
  • pytorch_lightning/metrics/functional/regression.py
  • pytorch_lightning/metrics/regression.py (4 threads)
  • tests/metrics/test_metrics.py
@mergify mergify bot requested a review from a team September 28, 2020 12:10
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
@justusschock (Member) left a comment:

LGTM. Just left some comments.

Review thread (resolved): pytorch_lightning/metrics/classification.py
@mergify mergify bot requested a review from a team September 28, 2020 13:09

mergify bot commented Sep 29, 2020

This pull request is now in conflict... :(

@mergify mergify bot requested a review from a team September 29, 2020 14:59
@mergify mergify bot requested a review from a team September 29, 2020 15:12
Review threads (resolved):
  • pytorch_lightning/metrics/metric.py (2 threads)
  • pytorch_lightning/metrics/self_supervised.py
  • tests/base/model_test_epoch_ends.py
  • tests/metrics/test_aggregation.py (3 threads)
@mergify mergify bot requested a review from a team September 29, 2020 15:18

mergify bot commented Sep 30, 2020

This pull request is now in conflict... :(


mergify bot commented Oct 1, 2020

This pull request is now in conflict... :(

@SkafteNicki SkafteNicki merged commit fe29028 into Lightning-AI:master Oct 1, 2020
Metrics package automation moved this from in Progress to Done Oct 1, 2020
@Borda Borda added this to the 0.10.0 milestone Oct 7, 2020
@SkafteNicki SkafteNicki deleted the metrics/aggregation_testing branch December 30, 2020 16:11
Labels: none yet
Projects: no open projects
6 participants