Hi @GrazingScientist, there are reasons for this behavior: in most treebanks the definite articles are reduced to a single form, mostly "der" or "d". That's why I opted for this form; it's a choice, even if a questionable one.
As a side note, writing lang=('de') (i.e. string instead of tuple) should be fine.
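For context on the side note, here is a small plain-Python illustration (no library involved) of why `('de')` counts as a string: parentheses alone do not create a tuple, only a trailing comma does. The variable names are just for illustration.

```python
# Parentheses around a single value are only grouping; the value stays a string.
lang_as_written = ('de')
# A trailing comma is what makes a one-element tuple.
lang_as_tuple = ('de',)

print(type(lang_as_written).__name__)  # str
print(type(lang_as_tuple).__name__)    # tuple
```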
Hi @adbar,
thanks for providing this really cool (and capable) library.
Currently, all German definite articles ("der", "die", "das") are lemmatized to "der", which is wrong.
Normally, I would not be too picky about wrong lemmatization (this happens all the time). But since these are among the most common German words, they should be handled correctly. Each article's lemma should be the word itself: "Das" -> "Das", "Die" -> "Die", "Der" -> "Der". The lemma should also be lowercase if the original token is lowercase.
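Until this is settled, a user-side workaround could look roughly like the sketch below. It is not part of the library: `lemmatize_keep_articles` and `stand_in_lemmatizer` are hypothetical names, and the stand-in only mimics the behavior reported here so the example runs on its own. In practice you would pass in the real lemmatizer call instead.

```python
from typing import Callable

# The three forms named in this issue; other inflected forms are left to the lemmatizer.
GERMAN_DEFINITE_ARTICLES = {"der", "die", "das"}


def lemmatize_keep_articles(token: str, lemmatize: Callable[[str], str]) -> str:
    """Return the token unchanged (case included) if it is "der", "die" or "das";
    otherwise defer to the supplied lemmatizer."""
    if token.lower() in GERMAN_DEFINITE_ARTICLES:
        return token
    return lemmatize(token)


def stand_in_lemmatizer(token: str) -> str:
    """Hypothetical stand-in that mimics the reported behavior:
    every definite article is mapped to "der"; other tokens pass through."""
    if token.lower() in GERMAN_DEFINITE_ARTICLES:
        return "der"
    return token


for tok in ["Das", "die", "Der", "Haus"]:
    print(tok, "->", lemmatize_keep_articles(tok, stand_in_lemmatizer))
```

Passing the lemmatizer in as a callable keeps the wrapper independent of any particular library version or signature.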