Abbreviation detection not working where short form contains a space followed by digits #4

Closed
ICLRandD opened this issue Aug 7, 2019 · 1 comment
Labels
help wanted Extra attention is needed

Comments

ICLRandD (Owner) commented Aug 7, 2019

The current implementation of AbbreviationDetector() does not handle abbreviations in which the short form contains a space followed by digits.

For example, in this scenario:

The Proceeds of Crime Act 2002 ("PoCA 2002")

The abbreviation is not matched.

The original implementation in scispaCy does not appear to have been built to handle cases in which the short form is bounded by quote marks.
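
To make the failing pattern concrete, here is a minimal sketch of how a bracketed short form that is wrapped in quote marks and ends in a space plus digits could be picked out with a regular expression. It uses only the standard library. The names `QUOTES`, `SHORT_FORM` and `find_short_forms` are hypothetical and are not part of scispaCy or of AbbreviationDetector(); this only shows candidate detection, not the matching of a short form to its long form.

```python
import re

# Hypothetical helper for illustration only; not the detector's actual logic.
# Straight and curly quote characters that may wrap the short form.
QUOTES = "\"'\u201c\u201d\u2018\u2019"

# A bracketed short form: optional quote, a token starting with a letter,
# an optional space-plus-digits suffix (e.g. " 2002"), optional quote.
SHORT_FORM = re.compile(
    r"\(\s*[" + QUOTES + r"]?"
    r"([A-Za-z][\w.-]*(?:\s\d+)?)"
    r"[" + QUOTES + r"]?\s*\)"
)


def find_short_forms(text):
    """Return every bracketed short-form candidate found in text."""
    return [match.group(1) for match in SHORT_FORM.finditer(text)]


print(find_short_forms('The Proceeds of Crime Act 2002 ("PoCA 2002")'))
# prints ['PoCA 2002']
```

Candidates found this way would still need to be aligned against the preceding text to recover the long form.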

@philgooch

You might be interested in an alternative Python implementation of Schwartz-Hearst which handles this scenario.

https://github.com/philgooch/abbreviation-extraction

E.g.

pip install abbreviations
In [1]: from abbreviations import schwartz_hearst

In [2]: schwartz_hearst.extract_abbreviation_definition_pairs(doc_text='The Proceeds of Crime Act 2002 ("PoCA 2002")')
Out[2]: {'PoCA 2002': 'Proceeds of Crime Act 2002'}
