refactor: make `TransformersDocumentClassifier` output consistent between different types of classification by anakin87 · Pull Request #3224 · deepset-ai/haystack

anakin87 · 2022-09-15T17:56:17Z

Related Issues

fixes TransformersDocumentClassifier: inconsistent output between ordinary and zero-shot classification #3167

Proposed Changes:

The output structure of TransformersDocumentClassifier differed between ordinary and zero-shot classification.
Now the output is consistent: doc.meta['classification'] always has the following structure:

{"label": "love",
 "score": 0.9608993530273438,
 "details": {"love": 0.9608993530273438, "joy": 0.032584577798843384, ...}}
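To make the two input shapes concrete, here is a minimal Python sketch of the normalization. It assumes the raw Hugging Face pipeline shapes: ordinary text classification yields a list of {"label", "score"} dicts, while zero-shot classification yields a single dict with parallel "labels" and "scores" lists. The helper name to_unified_classification is hypothetical; this is an illustration, not the code from this PR.

```python
from typing import Any, Dict, List, Union

# Raw output of a single document, in either of the two assumed HF shapes.
Raw = Union[List[Dict[str, Any]], Dict[str, Any]]


def to_unified_classification(raw: Raw) -> Dict[str, Any]:
    """Hypothetical helper (not Haystack's code): map raw output to one structure."""
    if isinstance(raw, dict):
        # Assumed zero-shot shape: {"sequence": ..., "labels": [...], "scores": [...]}
        details = dict(zip(raw["labels"], raw["scores"]))
    else:
        # Assumed ordinary shape: [{"label": ..., "score": ...}, ...]
        details = {item["label"]: item["score"] for item in raw}
    # The top-level label/score is the highest-scoring entry in details.
    label = max(details, key=details.get)
    return {"label": label, "score": details[label], "details": details}


# Illustrative inputs reusing the two scores from the example above.
ordinary_raw = [
    {"label": "love", "score": 0.9608993530273438},
    {"label": "joy", "score": 0.032584577798843384},
]
zero_shot_raw = {
    "sequence": "some document text",
    "labels": ["love", "joy"],
    "scores": [0.9608993530273438, 0.032584577798843384],
}

ordinary_meta = to_unified_classification(ordinary_raw)
zero_shot_meta = to_unified_classification(zero_shot_raw)
print(ordinary_meta == zero_shot_meta)  # True
```

Both shapes end up as the same dict, so downstream code only has to handle one structure.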

After the discussion in #3167, I decided to keep the score attribute at the top level of doc.meta['classification'], since I think it can be useful for filtering.
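A minimal sketch of that filtering use case, assuming documents are represented as plain dicts with a "meta" key (a stand-in for Haystack Document objects). filter_by_classification is a hypothetical helper and the scores are made-up toy values.

```python
from typing import Any, Dict, List


def filter_by_classification(
    docs: List[Dict[str, Any]], label: str, min_score: float
) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep docs whose top label is `label` with score >= min_score."""
    kept = []
    for doc in docs:
        classification = doc["meta"]["classification"]
        # "label" and "score" sit at the top level, so "details" is not needed here.
        if classification["label"] == label and classification["score"] >= min_score:
            kept.append(doc)
    return kept


def make_doc(content: str, label: str, score: float, details: Dict[str, float]) -> Dict[str, Any]:
    # Toy document in the unified structure; values are illustrative only.
    return {"content": content,
            "meta": {"classification": {"label": label, "score": score, "details": details}}}


docs = [
    make_doc("a", "love", 0.96, {"love": 0.96, "joy": 0.03}),
    make_doc("b", "love", 0.55, {"love": 0.55, "joy": 0.40}),
    make_doc("c", "joy", 0.90, {"joy": 0.90, "love": 0.08}),
]
confident_love = filter_by_classification(docs, label="love", min_score=0.8)
print([d["content"] for d in confident_love])  # ['a']
```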

How did you test it?

Manual verification.
I can add some tests if you think they are needed.

Notes for the reviewer

Other refactoring aspects:

replaced the return_all_scores parameter with top_k: return_all_scores mirrored the HF pipeline parameter of the same name, which is now deprecated in favor of top_k
simplified the accumulation of batched predictions
removed the document classifier from a test where it was unused
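The top_k semantics can be sketched on a details-style dict. The mapping below follows the Transformers deprecation notice as we understand it (return_all_scores=True corresponds to top_k=None, return_all_scores=False to top_k=1). select_top_k and top_k_for are hypothetical helpers, not Haystack or Transformers APIs.

```python
from typing import Dict, Optional


def select_top_k(details: Dict[str, float], top_k: Optional[int] = 1) -> Dict[str, float]:
    """Hypothetical helper: keep the top_k highest-scoring labels; None keeps all."""
    ranked = sorted(details.items(), key=lambda item: item[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    return dict(ranked)


def top_k_for(return_all_scores: bool) -> Optional[int]:
    """Hypothetical helper: translate the deprecated flag into a top_k value."""
    return None if return_all_scores else 1


scores = {"joy": 0.2, "love": 0.7, "anger": 0.1}  # made-up scores
print(select_top_k(scores, top_k_for(return_all_scores=False)))  # {'love': 0.7}
print(select_top_k(scores, top_k_for(return_all_scores=True)))
# {'love': 0.7, 'joy': 0.2, 'anger': 0.1}
print(select_top_k(scores, top_k=2))  # {'love': 0.7, 'joy': 0.2}
```

A single integer-or-None parameter covers both old boolean cases and also allows intermediate values such as top_k=2.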

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added tests that demonstrate the correct behavior of the change
I've used the conventional commit convention for my PR title
I documented my code
I ran pre-commit hooks and fixed any issue


ZanSara · 2022-09-20T07:58:26Z

Hey @anakin87! Code looks good! Can you add a small test to ensure that the details field is present in both cases, and that it is populated with the number of labels specified by top_k? Once that is done, this is ready to merge 👍

anakin87 · 2022-09-20T21:48:59Z

Hey @ZanSara!

I added two tests, since:

for ordinary classification, the number of labels in details is determined by top_k
for zero-shot classification, the number of labels in details is just the number of labels specified in the constructor
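A sketch of the kind of check these two tests perform, written against plain meta dicts rather than a real model so it runs offline. check_classification_meta is a hypothetical helper and the values are toy data, not the PR's actual tests. Note that for ordinary classification, len(details) can only equal top_k when the model has at least top_k labels.

```python
from typing import Any, Dict


def check_classification_meta(meta: Dict[str, Any], expected_num_labels: int) -> None:
    """Hypothetical helper: assert that meta["classification"] has the unified structure."""
    classification = meta["classification"]
    assert {"label", "score", "details"} <= set(classification)
    details = classification["details"]
    # Ordinary: expected_num_labels == top_k; zero-shot: == number of constructor labels.
    assert len(details) == expected_num_labels
    # The top-level label/score should match the best entry in details.
    best_label = max(details, key=details.get)
    assert classification["label"] == best_label
    assert classification["score"] == details[best_label]


# Ordinary classification with top_k=2 (toy values): details holds 2 labels.
ordinary_meta = {"classification": {
    "label": "love", "score": 0.9, "details": {"love": 0.9, "joy": 0.05}}}
check_classification_meta(ordinary_meta, expected_num_labels=2)

# Zero-shot with three constructor labels (toy values): details holds all 3.
zero_shot_labels = ["positive", "negative", "neutral"]
zero_shot_meta = {"classification": {
    "label": "positive", "score": 0.8,
    "details": {"positive": 0.8, "neutral": 0.15, "negative": 0.05}}}
check_classification_meta(zero_shot_meta, expected_num_labels=len(zero_shot_labels))
print("ok")
```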

ZanSara

Thank you! I found a couple of improvements but it's already good to go. I'll go ahead and approve, feel free to commit or resolve the suggestions as you see fit. I'll merge it around EOD or tomorrow morning.

test/nodes/test_document_classifier.py


anakin87 added 3 commits September 15, 2022 18:54

Merge remote-tracking branch 'origin/main' into refactor_TransformersDocumentClassifier_output

a42033c

make output consistent

89f392b

make output consistent

0e9b282

anakin87 marked this pull request as ready for review September 15, 2022 18:33

anakin87 requested review from a team as code owners September 15, 2022 18:33

anakin87 requested review from masci and removed request for a team September 15, 2022 18:33

anakin87 marked this pull request as draft September 15, 2022 19:26

anakin87 marked this pull request as ready for review September 15, 2022 19:26

ZanSara requested review from ZanSara and removed request for masci September 19, 2022 09:33

added tests for details

9e176b8

better tests

0d3b399

ZanSara approved these changes Sep 21, 2022


anakin87 added 4 commits September 21, 2022 10:54

Update test_document_classifier.py

2d7ed4b

make black happy

99aead9

Update test_document_classifier.py

eedb2eb

Update test_document_classifier.py

542aa59

ZanSara merged commit 89247b8 into deepset-ai:main Sep 21, 2022

anakin87 deleted the refactor_TransformersDocumentClassifier_output branch September 21, 2022 13:27

anakin87 mentioned this pull request Sep 22, 2022

refactor: better tests for TransformersDocumentClassifier #3270

Merged

6 tasks

ZanSara merged 9 commits into deepset-ai:main from anakin87:refactor_TransformersDocumentClassifier_output