
Incorrect lemmatization #13231

Closed
msal4 opened this issue Jan 10, 2024 · 2 comments
Labels
  • feat / lemmatizer: Rule-based and lookup lemmatization
  • lang / de: German language data and models
  • models: Issues related to the statistical models
  • perf / accuracy: Performance: accuracy

Comments

msal4 commented Jan 10, 2024

I was running the sentence "Folgst du der Weser weiter, fährst du durch das Oldenburger Land, wo du eindrucksvolle Naturschutzgebiete, Wälder und Moore erlebst, bis du nach Nordenham gelangst." through spaCy and noticed that "fährst" is incorrectly lemmatized as "fähren" instead of "fahren".

How to reproduce the behaviour

I tried simpler sentences and got similar results:
  • "Fährst du Auto?": this time the lemma is "Fährst"
  • "isst du Auto?": "issen" instead of "essen"
  • the same happens with other verbs
I used de_core_news_lg.
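A minimal reproduction script for the cases above might look like the following. It is a sketch that assumes spaCy 3.7 with the de_core_news_lg pipeline installed; it uses only `spacy.load`, `nlp.pipe` and `Token.lemma_`. The `EXPECTED` table records the lemmas the reporter expected (not spaCy output), and the helper `lemma_mismatches` is hypothetical, not part of spaCy.

```python
# Reproduction sketch for the German lemmatization report.
# Assumes spaCy 3.7 and de_core_news_lg are installed
# (python -m spacy download de_core_news_lg).

# Hypothetical table of the lemmas the reporter expected, keyed by token text.
EXPECTED = {
    "fährst": "fahren",
    "Fährst": "fahren",
    "isst": "essen",
}

SENTENCES = [
    "Folgst du der Weser weiter, fährst du durch das Oldenburger Land, "
    "wo du eindrucksvolle Naturschutzgebiete, Wälder und Moore erlebst, "
    "bis du nach Nordenham gelangst.",
    "Fährst du Auto?",
    "isst du Auto?",
]


def lemma_mismatches(pairs):
    """Return (text, got, expected) for every (text, lemma) pair whose
    lemma differs from the expectation in EXPECTED. Tokens without an
    entry in EXPECTED are ignored."""
    out = []
    for text, lemma in pairs:
        want = EXPECTED.get(text)
        if want is not None and lemma != want:
            out.append((text, lemma, want))
    return out


def main():
    try:
        import spacy
        nlp = spacy.load("de_core_news_lg")
    except (ImportError, OSError) as err:
        # spacy.load raises OSError when the pipeline package is missing.
        print(f"spaCy or de_core_news_lg not available: {err}")
        return
    for doc in nlp.pipe(SENTENCES):
        pairs = [(tok.text, tok.lemma_) for tok in doc]
        for text, got, want in lemma_mismatches(pairs):
            print(f"{text!r}: got {got!r}, expected {want!r}")


if __name__ == "__main__":
    main()
```

Keeping the comparison in a plain function over (text, lemma) pairs means it can be checked without loading the model; with the pipeline installed, each printed line corresponds to one wrongly lemmatized verb.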

Your Environment

  • spaCy version: 3.7.2
  • Platform: macOS-14.2.1-arm64-arm-64bit
  • Python version: 3.11.7
  • Pipelines: de_core_news_lg (3.7.0)
svlandeg added the lang / de, models, feat / lemmatizer and perf / accuracy labels on Jan 11, 2024
svlandeg (Member)

Hi! Thanks for the report - I'll go ahead and merge this with the master thread #3052.

Contributor

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

The github-actions bot locked this issue as resolved and limited conversation to collaborators on Feb 15, 2024.
Projects: None yet
Development: No branches or pull requests
2 participants