Lemmatization of ordinal numbers #24

Closed
yolpsoftware opened this issue Nov 25, 2021 · 1 comment

yolpsoftware commented Nov 25, 2021

The handling of ordinal numbers seems inconsistent. Some are lemmatized as an adverb with the period kept in the lemma (word="21.", lemma="21.", pos=ADV), some as an adverb with the period dropped from the lemma (word="21.", lemma="21", pos=ADV), and some are split into the number and the period, tagged NUM and PUNCT. Looking only at the "dev" dataset:

  • dev-s576, dev-s585, dev-s607, dev-s609, dev-s610, dev-s611, dev-s637 treat the ordinal number as word="21.", lemma="21", pos=ADV.
  • dev-s528, dev-s566, dev-s621 treat the ordinal number as word="21.", lemma="21.", pos=ADV.
  • dev-s29, dev-s511, dev-s461 treat the ordinal number as two words: word="21", lemma="21", pos=NUM and word=".", lemma=".", pos=PUNCT
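
To check how often each of the three patterns above occurs, something like the following could be run over a CoNLL-U file. This is only a rough sketch: the function names are made up for illustration, and the rule for the split case (a NUM token of digits followed by a "." PUNCT token that is not the last token of the sentence) is a heuristic that can both miss and over-count cases.

```python
import re
from collections import Counter

ORDINAL_FORM = re.compile(r"^\d+\.$")  # a form such as "21."
DIGITS = re.compile(r"^\d+$")          # a form such as "21"


def read_sentences(conllu_text):
    """Yield each sentence as a list of (form, lemma, upos) tuples."""
    sentence = []
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # comment line such as "# sent_id = ..."
        cols = line.split("\t")
        if len(cols) < 4 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        sentence.append((cols[1], cols[2], cols[3]))
    if sentence:
        yield sentence


def tally_ordinal_patterns(conllu_text):
    """Count ordinal-like tokens by annotation pattern and UPOS tag."""
    counts = Counter()
    for sent in read_sentences(conllu_text):
        for i, (form, lemma, upos) in enumerate(sent):
            if ORDINAL_FORM.match(form):
                kind = ("lemma_with_period" if lemma.endswith(".")
                        else "lemma_without_period")
                counts[(kind, upos)] += 1
            elif DIGITS.match(form) and upos == "NUM" and i + 2 < len(sent):
                # Heuristic for the split case: digits followed by a
                # "." PUNCT token that is not the sentence-final token.
                next_form, _, next_upos = sent[i + 1]
                if next_form == "." and next_upos == "PUNCT":
                    counts[("split_num_punct", "NUM")] += 1
    return counts
```

Passing the contents of the dev file, read with `open(path, encoding="utf-8").read()`, to `tally_ordinal_patterns` returns a `Counter` keyed by pattern and tag, which makes the mix of annotations easy to compare between releases.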

Furthermore, the day-of-month case is IMHO very similar to the English one:

Am 27. Mai
On May 27th

so I would expect them to have the same treatment. Both are dates, and in both cases, the day is an ordinal number meaning "the 27th day of May".

However, in the following English dataset, days of months seem to be lemmatized consistently as NOUN (just search the dataset for "1st", "2nd", "3rd" etc.):

https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu

Examples:

I was on vacation from October 4th to October 19th and I didn't submit my timesheet yet.
My boyfriend's birthday is November 22nd and we are going to Del Frisco's for dinner.
Just to let you all know Matt has confirmed the booking for 3rd Dec i s OK.
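
Instead of searching the file by hand, the UPOS tags of forms like "1st", "2nd", "3rd" could be tallied with a few lines of Python. This is a sketch only: `fetch_text` and `upos_of_suffixed_ordinals` are names invented here, and the regular expression covers only digit-plus-suffix forms, not spelled-out ordinals such as "first".

```python
import re
import urllib.request
from collections import Counter

# Digit-plus-suffix ordinals such as "4th", "22nd", "3rd".
ENGLISH_ORDINAL = re.compile(r"^\d+(st|nd|rd|th)$", re.IGNORECASE)


def fetch_text(url):
    """Download a CoNLL-U file and return its contents as a string."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def upos_of_suffixed_ordinals(conllu_text):
    """Count the UPOS tags assigned to forms like "4th" or "22nd"."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        # Columns: ID, FORM, LEMMA, UPOS, ...; skip ranges and empty nodes.
        if len(cols) >= 4 and cols[0].isdigit() and ENGLISH_ORDINAL.match(cols[1]):
            counts[cols[3]] += 1
    return counts
```

Calling `upos_of_suffixed_ordinals(fetch_text(url))` with the URL above would give the tag distribution for that file.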

Shouldn't the German day-of-month ordinal numbers be treated as nouns too? What's the difference to the English case?

@dan-zeman
Member

> Shouldn't the German day-of-month ordinal numbers be treated as nouns too? What's the difference to the English case?

They should be treated similarly. However, the current annotation in English is wrong. They are definitely not nouns. Ordinal numerals are generally tagged as adjectives, with the additional feature NumType=Ord (see ADJ).
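
As a small illustration of that convention, a token could be checked for the ADJ tag plus NumType=Ord roughly like this. The helper names are hypothetical, and the code relies only on the CoNLL-U FEATS column being a "|"-separated list of Key=Value pairs, with "_" meaning no features.

```python
def parse_feats(feats):
    """Turn a FEATS string such as "Case=Dat|NumType=Ord" into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def is_tagged_as_ordinal(upos, feats):
    """True if the token is tagged ADJ and carries NumType=Ord."""
    return upos == "ADJ" and parse_feats(feats).get("NumType") == "Ord"
```

Under this check, a token tagged ADV or NOUN would be reported as not following the convention, whatever its features are.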

The inconsistencies in German should now be fixed in the dev branch. The fixes will be propagated to the next UD release.
