Skip to content

refactor: make TransformersDocumentClassifier output consistent between different types of classification#3224

Merged
ZanSara merged 9 commits intodeepset-ai:mainfrom
anakin87:refactor_TransformersDocumentClassifier_output
Sep 21, 2022
Merged

refactor: make TransformersDocumentClassifier output consistent between different types of classification#3224
ZanSara merged 9 commits intodeepset-ai:mainfrom
anakin87:refactor_TransformersDocumentClassifier_output

Conversation

@anakin87
Copy link
Member

@anakin87 anakin87 commented Sep 15, 2022

Related Issues

Proposed Changes:

TransformersDocumentClassifier output structure was different between ordinary and zero-shot classification.
Now the output is consistent. For doc.meta['classification'] we always have a structure like this:

{"label": "love",
"score": 0.9608993530273438, 
"details": {"love": 0.9608993530273438, "joy": 0.032584577798843384, ...}}}

After the discussion in #3167, I decided to keep the score attribute in the first level of doc.meta['classification']: I think that it can be useful for filtering.

How did you test it?

Manual verification.
I can add some tests if you think that's the case.

Notes for the reviewer

Other refactoring aspects:

  • replaced the attribute return_all_scores with top_k: it mimicked the same attribute of the HF pipeline, which is now deprecated in favor of the latter.
  • simplified batched predictions accumulation
  • removed document classifier from a test, where it was unused

Checklist

@anakin87 anakin87 marked this pull request as ready for review September 15, 2022 18:33
@anakin87 anakin87 requested review from a team as code owners September 15, 2022 18:33
@anakin87 anakin87 requested review from masci and removed request for a team September 15, 2022 18:33
@anakin87 anakin87 marked this pull request as draft September 15, 2022 19:26
@anakin87 anakin87 marked this pull request as ready for review September 15, 2022 19:26
@ZanSara ZanSara requested review from ZanSara and removed request for masci September 19, 2022 09:33
@ZanSara
Copy link
Contributor

ZanSara commented Sep 20, 2022

Hey @anakin87 ! Code looks good! Can you add a small test to ensure that the details field is there in both cases, and that it gets populated with the right amount of labels specified on top_k? Once that is done, this is ready to merge 👍

@anakin87
Copy link
Member Author

Hey @ZanSara!

I added two tests, since:

  • for ordinary classification, the number of labels in details is determined by top_k
  • for zero-shot classification, the number of labels in details is just the number of labels specified in the constructor

Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I found a couple of improvements but it's already good to go. I'll go ahead and approve, feel free to commit or resolve the suggestions as you see fit. I'll merge it around EOD or tomorrow morning.

@ZanSara ZanSara merged commit 89247b8 into deepset-ai:main Sep 21, 2022
@anakin87 anakin87 deleted the refactor_TransformersDocumentClassifier_output branch September 21, 2022 13:27
brandenchan pushed a commit that referenced this pull request Sep 21, 2022
…ween different types of classification (#3224)

* make output consistent

* make output consistent

* added tests for details

* better tests

* Update test_document_classifier.py

* make black happy

* Update test_document_classifier.py

* Update test_document_classifier.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TransformersDocumentClassifier: inconsistent output between ordinary and zero-shot classification

2 participants