Lemmatization of ordinal numbers #24
Comments
They should be treated similarly. However, the current annotation in English is wrong. They are definitely not nouns. Ordinal numerals are generally tagged as adjectives, with the additional feature The inconsistencies in German should now be fixed in the |
There seem to be some inconsistencies in the handling of ordinal numbers. Some ordinal numbers are lemmatized as an adverb with the period (`word="21.", lemma="21.", pos=ADV`), some as an adverb without the period (`word="21.", lemma="21", pos=ADV`), and some split the number and the period, treating them as NUM and PUNCT. Just looking at the "dev" dataset:

- `dev-s576`, `dev-s585`, `dev-s607`, `dev-s609`, `dev-s610`, `dev-s611`, `dev-s637` treat the ordinal number as `word="21.", lemma="21", pos=ADV`.
- `dev-s528`, `dev-s566`, `dev-s621` treat the ordinal number as `word="21.", lemma="21.", pos=ADV`.
- `dev-s29`, `dev-s511`, `dev-s461` treat the ordinal number as two words: `word="21", lemma="21", pos=NUM` and `word=".", lemma=".", pos=PUNCT`.
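
For anyone who wants to repeat this check on the other splits, here is a rough sketch of how the three patterns could be tallied from CoNLL-U text. It relies only on the standard CoNLL-U layout (tab-separated columns with ID, FORM, LEMMA, UPOS first, plus `# sent_id = ...` comments). The function names, the `demo-*` sentence IDs, and the sample sentences are made up for illustration and are not taken from the treebank. The "split" case is detected heuristically as a digit-only NUM token followed by a non-final `.` PUNCT token, so it can produce false positives.

```python
import re
from collections import defaultdict

ORDINAL_FORM = re.compile(r"^\d+\.$")  # a single token such as "21."
DIGITS = re.compile(r"^\d+$")          # a bare number such as "21"


def read_sentences(conllu_text):
    """Yield (sent_id, tokens); tokens is a list of (form, lemma, upos)."""
    sent_id, tokens = None, []
    for line in conllu_text.splitlines():
        if not line.strip():               # a blank line ends a sentence
            if tokens:
                yield sent_id, tokens
            sent_id, tokens = None, []
        elif line.startswith("#"):
            m = re.match(r"#\s*sent_id\s*=\s*(\S+)", line)
            if m:
                sent_id = m.group(1)
        else:
            cols = line.split("\t")
            # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
            if len(cols) >= 4 and cols[0].isdigit():
                tokens.append((cols[1], cols[2], cols[3]))
    if tokens:
        yield sent_id, tokens


def classify_ordinals(conllu_text):
    """Map each annotation pattern to the sentence IDs that use it."""
    patterns = defaultdict(list)
    for sent_id, tokens in read_sentences(conllu_text):
        for i, (form, lemma, upos) in enumerate(tokens):
            if ORDINAL_FORM.match(form):
                kind = ("lemma_with_period" if lemma.endswith(".")
                        else "lemma_without_period")
                patterns[(kind, upos)].append(sent_id)
            elif (DIGITS.match(form) and upos == "NUM"
                  and i + 1 < len(tokens) - 1      # the period is not sentence-final
                  and tokens[i + 1][0] == "."
                  and tokens[i + 1][2] == "PUNCT"):
                patterns[("split", "NUM+PUNCT")].append(sent_id)
    return dict(patterns)


def _row(idx, form, lemma, upos):
    # Hypothetical helper: pads the remaining six CoNLL-U columns with "_".
    return "\t".join([str(idx), form, lemma, upos] + ["_"] * 6)


# Invented sample data, one sentence per pattern.
SAMPLE = "\n".join([
    "# sent_id = demo-1",
    _row(1, "Der", "der", "DET"), _row(2, "21.", "21", "ADV"),
    _row(3, "Mai", "Mai", "NOUN"), _row(4, ".", ".", "PUNCT"),
    "",
    "# sent_id = demo-2",
    _row(1, "Der", "der", "DET"), _row(2, "21.", "21.", "ADV"),
    _row(3, "Mai", "Mai", "NOUN"), _row(4, ".", ".", "PUNCT"),
    "",
    "# sent_id = demo-3",
    _row(1, "Der", "der", "DET"), _row(2, "21", "21", "NUM"),
    _row(3, ".", ".", "PUNCT"), _row(4, "Mai", "Mai", "NOUN"),
    _row(5, ".", ".", "PUNCT"),
    "",
])

if __name__ == "__main__":
    for pattern, ids in classify_ordinals(SAMPLE).items():
        print(pattern, ids)
```

To run it against a real treebank file, read the file into a string and pass it to `classify_ordinals` instead of `SAMPLE`.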
Furthermore, the day-of-month situation is IMHO very similar to the English case:
so I would expect them to have the same treatment. Both are dates, and in both cases, the day is an ordinal number meaning "the 27th day of May".
However, in the following English dataset, days of months seem to be tagged consistently as NOUN (just search the dataset for "1st", "2nd", "3rd" etc.):
https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu
Examples:
Shouldn't the German day-of-month ordinal numbers be treated as nouns too? How does the German case differ from the English one?
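
The English search mentioned above ("1st", "2nd", "3rd" etc.) could be automated along these lines. This is only a sketch: it counts the UPOS tag of every token whose form looks like a digit-based English ordinal, assuming the standard tab-separated CoNLL-U columns. The function name and the sample rows are invented for illustration, and the counts in the sample say nothing about the real dataset.

```python
import re
from collections import Counter

# Digit-based English ordinals such as "1st", "22nd", "3rd", "27th".
ENGLISH_ORDINAL = re.compile(r"^\d+(?:st|nd|rd|th)$", re.IGNORECASE)


def ordinal_upos_counts(conllu_lines):
    """Count UPOS tags of ordinal-looking tokens in an iterable of lines."""
    counts = Counter()
    for line in conllu_lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        # CoNLL-U columns: ID, FORM, LEMMA, UPOS, ...
        if len(cols) >= 4 and ENGLISH_ORDINAL.match(cols[1]):
            counts[cols[3]] += 1
    return counts


# Invented sample rows, not taken from the treebank.
SAMPLE_LINES = [
    "# sent_id = demo-en-1",
    "\t".join(["1", "May", "May", "PROPN"] + ["_"] * 6),
    "\t".join(["2", "27th", "27th", "NOUN"] + ["_"] * 6),
    "",
    "# sent_id = demo-en-2",
    "\t".join(["1", "the", "the", "DET"] + ["_"] * 6),
    "\t".join(["2", "3rd", "3rd", "ADJ"] + ["_"] * 6),
    "\t".join(["3", "floor", "floor", "NOUN"] + ["_"] * 6),
    "",
    "# sent_id = demo-en-3",
    "\t".join(["1", "June", "June", "PROPN"] + ["_"] * 6),
    "\t".join(["2", "1st", "1st", "NOUN"] + ["_"] * 6),
]

if __name__ == "__main__":
    print(ordinal_upos_counts(SAMPLE_LINES))
```

With a downloaded copy of `en_ewt-ud-dev.conllu`, the open file object can be passed directly, e.g. `ordinal_upos_counts(open("en_ewt-ud-dev.conllu", encoding="utf-8"))`.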