Addition of TextTester class and update tests for current text metrics. #450
Conversation
Codecov Report
@@           Coverage Diff            @@
##           master     #450    +/-  ##
========================================
- Coverage      96%      96%     -0%
========================================
  Files         130      130
  Lines        4338     4357     +19
========================================
+ Hits         4157     4174     +17
- Misses        181      183      +2
okay, a mix of comments. Looking great so far and will really improve the testing of our text metrics.

… tests, but ddp=True fails.

Updating the implementation of …
going to take a look at the failing test
@karthikrangasai mind checking the failing tests? 🐰
@Borda I'm working on them. The ROUGE metric is failing the tests, and the implementation may have some errors. I'm currently debugging it and trying fixes. They fail specifically for …
hmm, that would raise the importance of this test very much if it helps to find some hidden bugs 🐰
@Borda @karthikrangasai I think the root of this problem is gonna be in the way the intermediate results are stored after calling the …
@stancld One thing is that no state is used for storing the values. Another thing I noticed is that we are computing scores for every example pair, where instead I feel we should compute each score over all examples together. I mean, instead of …

I think we should be doing …

What do you think about this?
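The two approaches being contrasted above can be sketched roughly as follows. This is an illustrative toy, not the actual torchmetrics internals: `score_fn`, `per_pair_scores`, and `token_overlap` are hypothetical names, and the toy metric just measures token overlap.

```python
# Hypothetical sketch of the two computation styles discussed above.
# `score_fn` stands in for a single-pair metric such as ROUGE.

def per_pair_scores(preds, targets, score_fn):
    """Compute one score per (prediction, target) pair, then average."""
    scores = [score_fn(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)

def corpus_score(preds, targets, corpus_fn):
    """Compute a single score over the whole corpus at once."""
    return corpus_fn(preds, targets)

# Toy stand-in metric: fraction of shared tokens between the two strings.
def token_overlap(pred, target):
    p, t = set(pred.split()), set(target.split())
    return len(p & t) / max(len(p | t), 1)

preds = ["the cat sat", "a dog ran"]
targets = ["the cat sat", "the dog ran"]
print(per_pair_scores(preds, targets, token_overlap))  # -> 0.75
```

Whether per-pair or corpus-level aggregation is correct depends on how the metric is defined, which is exactly the point debated in the next comments.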
The root of the problem in the current implementation is that the dict that is returned by the … The simple fix is to cast the values in the dict to tensor and then do a bit of reformatting in the …
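The "cast the values in the dict to tensor" fix mentioned above might look something like the sketch below. This is a hedged illustration, not the actual code: the dict name and keys are made up, and the assumption is that the backend returns plain Python floats, which DDP cannot gather or sync.

```python
import torch

# Illustrative sketch: the score backend is assumed to return plain
# Python floats in a dict; converting each value to a tensor lets the
# metric machinery store and synchronize them as state.
def cast_scores_to_tensors(raw_scores):
    """Convert each float value in a score dict to a torch tensor."""
    return {key: torch.tensor(value) for key, value in raw_scores.items()}

raw_scores = {"rouge1_fmeasure": 0.5, "rouge1_precision": 0.6}  # hypothetical keys
tensor_scores = cast_scores_to_tensors(raw_scores)
```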
@karthikrangasai ROUGE is a sentence-level metric (not like BLEU, which is designed to be a corpus-level one). The score should thus be computed separately for each sentence.
LGTM, great to see more robust testing of the new text metrics :]
@karthikrangasai the GPU tests were killed after 45 min, while on master they run for about 20...
@Borda
@SkafteNicki thoughts?
the LPIPS test seems to be passing now, so maybe it was a bad connection when it was tested last time
This means there is some error with the tensors not being on the GPU, right?

Edit: The compute function of the functional implementation of the BLEU metric did not create tensors on the same device, which I have fixed now.
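The device bug described above is a common PyTorch pitfall: creating a fresh tensor inside `compute` defaults it to CPU, which raises a device-mismatch error when the inputs live on the GPU. A minimal sketch of the fix pattern (the function name is illustrative, not the actual BLEU code):

```python
import torch

def scale_on_same_device(stats: torch.Tensor) -> torch.Tensor:
    # BAD: torch.tensor(2.0) defaults to CPU and would trigger a device
    # mismatch when `stats` lives on the GPU.
    # GOOD: pin any freshly created tensor to the input's device.
    factor = torch.tensor(2.0, device=stats.device)
    return stats * factor
```

The same pattern applies to any constants (weights, smoothing terms, etc.) a metric builds during `compute`.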
* Added TextTester class
* Updating TextTester class with small naming changes
* Update TextTester with input processing, appropriate concatenations.
* Updated tests for BLEU Score based on TextTester.
* Updated WER metric to use the TextTester class.
* Updated tests for ROUGEScore based on TextTester.

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
(cherry picked from commit e6ad813)
Before submitting
What does this PR do?
Fixes #411.
To Do:
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃