English en_core_web_sm tagging and parsing errors #1975

Closed
thedataist opened this issue Feb 12, 2018 · 2 comments
Labels
lang / en English language data and models models Issues related to the statistical models perf / accuracy Performance: accuracy

Comments

thedataist commented Feb 12, 2018

While doing some large-scale testing, I've been noting general POS tagging and dependency errors in the most recent en_core_web_sm that should probably be trained out. We train these out in our own models, but I'm passing them on here in case they are useful for the core models.

  • Short time expressions in the format "1am", "2am", etc. are split into the numeral and "am", and the "am" is then tagged as a verb or noun with an inappropriate dependency. The correct parse is probably "am" tagged as an adjective with a nummod dependency on the numeral. FYI: the tagging and parsing of "a.m." and "p.m." also look quite inconsistent.

  • The word "spanish" is tagged as an adjective in all contexts, e.g. in "does she speak spanish", "spanish" is returned as an adjective instead of a noun.

  • spaCy version: 2.0.5
  • Platform: Darwin-16.7.0-x86_64-i386-64bit
  • Python version: 3.6.1
  • Models: en
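
For anyone who wants to reproduce the two reports above, here is a minimal sketch rather than a definitive test. It assumes spaCy and the en_core_web_sm model are installed. The helper names (`token_rows`, `suspicious_rows`, `main`) and the sentence "the meeting is at 1am" are made up for illustration; only "does she speak spanish" comes from the report. Output will vary by spaCy and model version, so no expected output is shown.

```python
# Hypothetical reproduction helpers; names and the first example sentence are illustrative.
EXAMPLES = ["the meeting is at 1am", "does she speak spanish"]


def token_rows(text, model_name="en_core_web_sm"):
    """Return (text, pos, tag, dep, head) per token. Requires spaCy and the model."""
    import spacy  # lazy import so suspicious_rows() can be used without spaCy

    nlp = spacy.load(model_name)
    return [(t.text, t.pos_, t.tag_, t.dep_, t.head.text) for t in nlp(text)]


def suspicious_rows(rows):
    """Flag tokens matching the two patterns in the report, for manual review."""
    flagged = []
    for i, (text, pos, tag, dep, head) in enumerate(rows):
        prev_text = rows[i - 1][0] if i > 0 else ""
        lowered = text.lower()
        # "am"/"pm" split off a numeral and tagged as a verb or noun
        # (AUX is included because newer models may tag "am" that way).
        if lowered in {"am", "pm"} and prev_text.isdigit() and pos in {"VERB", "AUX", "NOUN"}:
            flagged.append((i, text, pos, dep))
        # "spanish" tagged as an adjective; may be correct in some contexts.
        elif lowered == "spanish" and pos == "ADJ":
            flagged.append((i, text, pos, dep))
    return flagged


def main():
    for sentence in EXAMPLES:
        rows = token_rows(sentence)
        for row in rows:
            print(row)
        print("flagged:", suspicious_rows(rows))
```

Calling `main()` prints every token with its tags and then the flagged ones. The check is deliberately crude: it only narrows down which tokens to look at by hand.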
@ines ines added performance lang / en English language data and models models Issues related to the statistical models labels Mar 27, 2018
@ines ines added perf / accuracy Performance: accuracy and removed performance labels Aug 15, 2018
Member

ines commented Dec 14, 2018

Merging this with #3052. We've now added a master thread for incorrect predictions and related reports – see the issue for more details.

@ines ines closed this as completed Dec 14, 2018
lock bot commented Jan 13, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Jan 13, 2019