Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split do_extarctor_overlap to correctly handle overllappine entities #1737

Closed
wants to merge 4 commits into from

Conversation

EPedrotti
Copy link
Contributor

@EPedrotti EPedrotti commented Feb 25, 2019

Proposed changes:

  • split determine_token_labels in two functions, determin_token_labels and determine_true_token_labels. This should handle correctly overlapping entities, and throw an exception only if overlapping entities are found and a extractor that does not support them is used.

Status (please check what you already did):

  • made PR ready for code review
  • added some tests for the functionality
  • updated the documentation
  • updated the changelog

@codeclimate
Copy link

codeclimate bot commented Feb 25, 2019

Code Climate has analyzed commit dff3172 and detected 0 issues on this pull request.

View more on Code Climate.

@EPedrotti EPedrotti requested review from akelad and removed request for akelad March 4, 2019 15:16
Copy link
Contributor

@akelad akelad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok actually i think we've ended up complicating this way too much, simply changing the do_extractors_support_overlap to:

def do_extractors_support_overlap(extractors):
    """Checks if extractors support overlapping entities
    """
    if extractors is None:
        return False
    return CRFEntityExtractor.name not in extractors

will solve all our problems i believe. So if you could close this PR and open a new one with that change, that'd be great. I think writing some tests for this makes sense as well, so some examples where there's no extractor field/crf in there and the entities are overlapping, and some for when the extractors are ner_spacy etc.

@tmbo
Copy link
Member

tmbo commented Mar 6, 2019

@EPedrotti can we close this in favor of #1760 ?

@tmbo tmbo closed this Mar 6, 2019
@tmbo tmbo deleted the fix-do-extractors-overlap branch May 9, 2019 21:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants