
Addition of TextTester class and update tests for current text metrics. #450

Merged

Conversation

karthikrangasai
Contributor

@karthikrangasai karthikrangasai commented Aug 16, 2021

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes #411 .

To Do:

  • Add TextTester class.
  • Update testing for BLEU Score
  • Update testing for ROUGE Metric
  • Update testing for WER

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@codecov

codecov bot commented Aug 16, 2021

Codecov Report

Merging #450 (d5ebbfc) into master (313c868) will decrease coverage by 0%.
The diff coverage is 99%.

@@          Coverage Diff          @@
##           master   #450   +/-   ##
=====================================
- Coverage      96%    96%   -0%     
=====================================
  Files         130    130           
  Lines        4338   4357   +19     
=====================================
+ Hits         4157   4174   +17     
- Misses        181    183    +2     

@SkafteNicki SkafteNicki added this to the v0.6 milestone Aug 16, 2021
@Borda Borda added enhancement New feature or request test / CI testing or CI labels Aug 16, 2021
@Borda Borda added this to In progress in Text via automation Aug 16, 2021
@Borda Borda assigned justusschock and unassigned justusschock Aug 16, 2021
Member

@SkafteNicki SkafteNicki left a comment


okay, a mix of comments. Looking great so far and will really improve the testing of our text metrics

Review comments on: torchmetrics/text/bleu.py, tests/text/test_wer.py, tests/text/test_rouge.py, tests/text/helpers.py
@karthikrangasai karthikrangasai changed the title [wip] Addition of TextTester class. Addition of TextTester class and update tests for current text metrics. Aug 21, 2021
@karthikrangasai karthikrangasai marked this pull request as ready for review August 21, 2021 11:42
@karthikrangasai
Contributor Author

Updating the implementation of ROUGE by adding a state helped get rid of some failed tests, but the class-based implementation tests still fail when ddp=True.
Can I get insights on this issue? It would be helpful.

Member

@SkafteNicki SkafteNicki left a comment


going to take a look at the failing test

Review comments on: tests/helpers/testers.py
@Borda
Member

Borda commented Aug 26, 2021

@karthikrangasai mind checking the failing tests? 🐰

@karthikrangasai
Contributor Author

karthikrangasai commented Aug 26, 2021

@Borda I'm working on them. The ROUGE metric is failing the tests, and the implementation might have some errors. I'm currently debugging it and trying fixes.

They fail specifically for ddp=True case.

@Borda
Member

Borda commented Aug 26, 2021

The ROUGE metric is failing the tests and maybe the implementation might have some errors

hmm, that would make this test much more valuable if it helps to find some hidden bugs 🐰
cc: @stancld @SkafteNicki

@stancld
Contributor

stancld commented Aug 26, 2021

@Borda @karthikrangasai I think the root of this problem is likely in how the intermediate results are stored after calling the update method of ROUGEScore.
cc: @SkafteNicki

@karthikrangasai
Contributor Author

karthikrangasai commented Aug 26, 2021

@stancld One thing is that no state is used for storing the values.

Another thing I noticed is that we compute scores for every example pair, whereas I feel we should compute each score for all examples together.

I mean instead of

for pred, target in zip(preds, targets):
    # compute metric for this example

I think we should be doing

for rouge_key in self.rouge_keys_given:
     # compute metric for all examples together in the batch

What do you think about this ?
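The two loop structures being contrasted can be sketched side by side. This is a hypothetical illustration of the discussion, not the actual torchmetrics code; `score_fn`, `update_per_example`, and `update_per_key` are stand-in names.

```python
from typing import Callable, Dict, List


def update_per_example(
    preds: List[str], targets: List[str], score_fn: Callable[[str, str], float]
) -> List[float]:
    # Current style: compute the metric once per (pred, target) pair.
    return [score_fn(pred, target) for pred, target in zip(preds, targets)]


def update_per_key(
    preds: List[str],
    targets: List[str],
    rouge_keys_given: List[str],
    score_fn: Callable[[str, List[str], List[str]], float],
) -> Dict[str, float]:
    # Proposed style: for each ROUGE key, score the whole batch at once.
    return {key: score_fn(key, preds, targets) for key in rouge_keys_given}
```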

@SkafteNicki
Member

The root of the problem in the current implementation is that the dict returned by the _update call is not a dict of tensors but a dict of floats. Since the sync is only applied to tensors:
https://github.com/PyTorchLightning/metrics/blob/94a158c2674e197620a79010ba1d78ea57706774/torchmetrics/metric.py#L224-L229
we end up not syncing the dict, and you essentially just get the result from the local process.

The simple fix is to cast the values in the dict to tensors and then do a bit of reformatting in the compute call after the sync.
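A minimal sketch of that fix, with illustrative helper names (not the actual torchmetrics internals): cast the per-call float scores to tensors so the DDP sync, which only gathers tensors, picks them up; then reduce the gathered tensors in compute.

```python
import torch


def cast_scores_to_tensors(scores: dict) -> dict:
    # Run inside update(): plain floats would be skipped by the
    # tensor-only sync, so the dict values must become tensors.
    return {key: torch.tensor(value, dtype=torch.float32) for key, value in scores.items()}


def reduce_synced_scores(synced: dict) -> dict:
    # Run inside compute(): after the sync each value may be a stacked
    # tensor with one entry per process; reduce back to a single float.
    return {key: value.mean().item() for key, value in synced.items()}
```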

@stancld
Contributor

stancld commented Aug 26, 2021

> @stancld One thing is that no state is used for storing the values. Another thing I noticed is that we compute scores for every example pair, whereas I feel we should compute each score for all examples together. […] What do you think about this?

@karthikrangasai ROUGE is a sentence-level metric (unlike BLEU, which is designed to be a corpus-level one). The score should thus be computed separately for each sentence.
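The distinction can be illustrated with simplified stand-ins (not the real ROUGE/BLEU implementations): a sentence-level metric scores each pair separately and averages, while a corpus-level metric pools statistics over all pairs before computing a single score.

```python
from typing import Callable, List, Tuple


def sentence_level_score(
    preds: List[str], targets: List[str], score_fn: Callable[[str, str], float]
) -> float:
    # ROUGE-style: score each sentence pair on its own, then average.
    scores = [score_fn(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


def corpus_level_score(
    preds: List[str], targets: List[str], count_fn: Callable[[str, str], Tuple[int, int]]
) -> float:
    # BLEU-style: accumulate matched/total counts over the whole corpus
    # first, then compute one score from the pooled statistics.
    matched = total = 0
    for p, t in zip(preds, targets):
        m, n = count_fn(p, t)
        matched += m
        total += n
    return matched / total
```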

Member

@SkafteNicki SkafteNicki left a comment


LGTM, great to see more robust testing of the new text metrics :]

Review comments on: tests/helpers/testers.py, torchmetrics/text/rouge.py
@mergify mergify bot added the ready label Aug 26, 2021
@Borda
Member

Borda commented Aug 27, 2021

@karthikrangasai the GPU tests were killed after 45 min, while on master they run for about 20...
Seems like something is hanging there; mind having a look? :]
https://dev.azure.com/PytorchLightning/Metrics/_build/results?buildId=27573&view=logs&j=3afc50db-e620-5b81-6016-870a6976ad29&t=98354d77-e326-51ec-536f-1549451db1fa

@karthikrangasai
Contributor Author

karthikrangasai commented Aug 27, 2021

@Borda
torchmetrics.image.lpip_similarity.LPIPS is taking too long to load, I think. I hadn't made any changes to that file.
Should I add # doctest: +SKIP even though it wasn't present before?

@Borda
Member

Borda commented Aug 27, 2021

torchmetrics.image.lpip_similarity.LPIPS is taking too long to load I think. I hadn't made any change to that file.

@SkafteNicki thought?

@SkafteNicki
Member

The LPIPS test seems to be passing now, so it may have been a bad connection when it was tested last time.

@karthikrangasai
Contributor Author

karthikrangasai commented Aug 27, 2021

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

This means some of the tensors are not being placed on the GPU, right?

Edit: The compute function of the functional implementation of the BLEU metric did not create tensors on the same device, which I have fixed now.
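A hedged sketch of this class of bug: torch.tensor(...) without a device argument allocates on the CPU, which raises the "same device" RuntimeError once the metric state lives on the GPU. The function names and the min-with-one clamp are illustrative, not the exact BLEU code.

```python
import torch


def clamp_ratio_buggy(numerator: torch.Tensor, denominator: torch.Tensor) -> torch.Tensor:
    one = torch.tensor(1.0)  # always on CPU -> mismatch when inputs are on CUDA
    return torch.min(one, numerator / denominator)


def clamp_ratio_fixed(numerator: torch.Tensor, denominator: torch.Tensor) -> torch.Tensor:
    one = torch.tensor(1.0, device=numerator.device)  # follow the inputs' device
    return torch.min(one, numerator / denominator)
```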

Text automation moved this from In progress to Reviewer approved Aug 27, 2021
@Borda Borda merged commit e6ad813 into Lightning-AI:master Aug 27, 2021
Text automation moved this from Reviewer approved to Done Aug 27, 2021
Borda pushed a commit that referenced this pull request Aug 27, 2021
* Added TextTester class
* Updating TextTester class with small naming changes
* Update TextTester with input processing, appropriate concatenations.
* Updated tests for BLEU Score based on TextTester.
* Updated WER metric to use the TextTester class.
* Updated tests for ROUGEScore based on TextTester.

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

(cherry picked from commit e6ad813)
Borda pushed a commit that referenced this pull request Aug 30, 2021
* Added TextTester class
* Updating TextTester class with small naming changes
* Update TextTester with input processing, appropriate concatenations.
* Updated tests for BLEU Score based on TextTester.
* Updated WER metric to use the TextTester class.
* Updated tests for ROUGEScore based on TextTester.

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

(cherry picked from commit e6ad813)
Labels: enhancement (New feature or request), ready, test / CI (testing or CI), topic: Text
Projects: Text (Done)

Successfully merging this pull request may close these issues:

  • TextTester class based on the MetricTester class to robustly test text metrics

6 participants