Support multi labels textcat pipe for transformers #14

Merged
27 commits merged on Feb 19, 2020

tamuhey
Contributor

@tamuhey tamuhey commented Feb 17, 2020

closes #9

@ghost

ghost commented Feb 18, 2020

How about implementing a _labels method, mainly for multi-label classification?
The method name is up to you; _labels may be ambiguous.

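from typing import Any, List, Tuple
from spacy.tokens import Doc

def _labels(doc: Doc) -> List[Tuple[str, Any]]:
    # Return every (label, score) pair stored in doc.cats.
    if not doc.cats:
        return []
    return list(doc.cats.items())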

@tamuhey
Contributor Author

tamuhey commented Feb 18, 2020

I don't think that is a good approach.
Since doc.cats contains all labels, it is the same as using doc.cats directly.
You would need to set a threshold to return only the high-scoring labels, but I think that should be left to the user.
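
For reference, user-side thresholding could look roughly like the sketch below. It works on a plain dict shaped like doc.cats (label to score); the function name labels_above and the default threshold of 0.5 are assumptions for illustration and are not part of this PR.

```python
from typing import Dict, List, Tuple


def labels_above(cats: Dict[str, float], threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (label, score) pairs with score >= threshold, highest score first.

    `cats` is assumed to have the same shape as doc.cats: label -> score.
    """
    selected = [(label, score) for label, score in cats.items() if score >= threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores, standing in for doc.cats after running the pipe.
cats = {"sports": 0.91, "politics": 0.12, "tech": 0.67}
print(labels_above(cats))  # [('sports', 0.91), ('tech', 0.67)]
```

Because the cutoff depends on the task, keeping it in user code lets the threshold be tuned without changing the pipe.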

@ghost

ghost commented Feb 18, 2020

Since doc.cats contains all labels, it is the same as using doc.cats directly.

I agree with you! I'm sorry!

@tamuhey tamuhey added the enhancement New feature or request label Feb 18, 2020
@ghost

ghost commented Feb 19, 2020

Is it possible to validate the ".jsonl" data for textcat (not multitextcat)?
That would avoid the following case.

If you try multitextcat but forget to switch the argument from model.textcat_label to model.multitextcat_label, training runs as textcat (not multitextcat) even if the label values do not sum to 1.
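
A check along these lines might be sketched as follows. This is not the validator from this PR: the record layout (one JSON object per line with a "cats" mapping of label to value), the function name find_non_exclusive, and the tolerance are all assumptions made for illustration.

```python
import json
import math
from typing import Dict, Iterable, List


def find_non_exclusive(lines: Iterable[str], tol: float = 1e-6) -> List[int]:
    """Return 1-based numbers of lines whose label values do not sum to 1.

    Assumes each non-empty line is a JSON object with a "cats" mapping
    (label -> value); this layout is hypothetical.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        cats: Dict[str, float] = record["cats"]
        if not math.isclose(sum(cats.values()), 1.0, abs_tol=tol):
            bad.append(lineno)
    return bad


# Two hypothetical records: the second has two positive labels.
sample = [
    '{"text": "a", "cats": {"A": 1.0, "B": 0.0}}',
    '{"text": "b", "cats": {"A": 1.0, "B": 1.0}}',
]
print(find_non_exclusive(sample))  # [2]
```

Running such a check before training with the single-label setting would surface multi-label data early instead of silently training as textcat.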

@tamuhey
Contributor Author

tamuhey commented Feb 19, 2020

Added a data validator.

@tamuhey tamuhey merged commit b54bf7c into master Feb 19, 2020
@tamuhey tamuhey deleted the feature/multi-textcat branch February 19, 2020 08:59
Development

Successfully merging this pull request may close these issues.

support multi-label classification