Numbered wars #44

Open
AngledLuffa opened this issue Dec 7, 2023 · 11 comments

@AngledLuffa
Contributor

AngledLuffa commented Dec 7, 2023

It seems weird that we have

# sent_id = w01096074
10      World   world   PROPN   NN      Number=Sing     11      compound        11:compound     _
11      War     war     PROPN   NN      Number=Sing     8       nmod    8:nmod:of       _
12      I       I       NUM     CD      NumForm=Roman|NumType=Card      11      compound        11:compound     _

but then also have

# sent_id = w01100046
9       First   First   PROPN   NNP     Number=Sing     11      compound        11:compound     _
10      Opium   Opium   PROPN   NNP     Number=Sing     11      compound        11:compound     _
11      War     War     PROPN   NNP     Number=Sing     7       obj     7:obj   _

and

25      Second  second  ADJ     JJ      Degree=Pos|NumForm=Word|NumType=Ord     27      amod    27:amod Proper=True
26      World   world   PROPN   NN      Number=Sing     27      compound        27:compound     _
27      War     war     PROPN   NN      Number=Sing     22      nmod    22:nmod:of      SpaceAfter=No
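The snippets above tag "World" and "War" as UPOS=PROPN while giving them the common-noun XPOS NN. One way to find other tokens with the same mismatch is to scan the CoNLL-U for words whose UPOS is PROPN but whose XPOS is NN or NNS. The following is a minimal Python sketch, assuming tab-separated CoNLL-U input (the scraped snippets here show spaces, so they are re-joined with tabs). `iter_tokens` and `find_propn_nn` are hypothetical helpers, not part of any tool mentioned in this issue. The third snippet has no `sent_id` in the issue, so it is reported without one.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    sent_id: str  # value of the "# sent_id =" comment, "" if the sentence has none
    tok_id: str   # CoNLL-U ID column
    form: str
    upos: str
    xpos: str


def iter_tokens(conllu: str) -> Iterator[Token]:
    """Yield the ID/FORM/UPOS/XPOS columns of every word line in a CoNLL-U string."""
    sent_id = ""
    for line in conllu.splitlines():
        if not line.strip():
            sent_id = ""  # a blank line ends the current sentence
            continue
        if line.startswith("#"):
            if line.startswith("# sent_id ="):
                sent_id = line.split("=", 1)[1].strip()
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}: {line!r}")
        tok_id, form, _lemma, upos, xpos = cols[:5]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword-token ranges and empty nodes
        yield Token(sent_id, tok_id, form, upos, xpos)


def find_propn_nn(conllu: str) -> list[Token]:
    """Hypothetical check: words tagged UPOS=PROPN but with a common-noun XPOS (NN/NNS)."""
    return [t for t in iter_tokens(conllu) if t.upos == "PROPN" and t.xpos in ("NN", "NNS")]


# The three snippets from this issue, re-joined with tabs as real CoNLL-U requires.
ROWS = [
    ("# sent_id = w01096074",),
    ("10", "World", "world", "PROPN", "NN", "Number=Sing", "11", "compound", "11:compound", "_"),
    ("11", "War", "war", "PROPN", "NN", "Number=Sing", "8", "nmod", "8:nmod:of", "_"),
    ("12", "I", "I", "NUM", "CD", "NumForm=Roman|NumType=Card", "11", "compound", "11:compound", "_"),
    ("",),
    ("# sent_id = w01100046",),
    ("9", "First", "First", "PROPN", "NNP", "Number=Sing", "11", "compound", "11:compound", "_"),
    ("10", "Opium", "Opium", "PROPN", "NNP", "Number=Sing", "11", "compound", "11:compound", "_"),
    ("11", "War", "War", "PROPN", "NNP", "Number=Sing", "7", "obj", "7:obj", "_"),
    ("",),
    ("25", "Second", "second", "ADJ", "JJ", "Degree=Pos|NumForm=Word|NumType=Ord",
     "27", "amod", "27:amod", "Proper=True"),
    ("26", "World", "world", "PROPN", "NN", "Number=Sing", "27", "compound", "27:compound", "_"),
    ("27", "War", "war", "PROPN", "NN", "Number=Sing", "22", "nmod", "22:nmod:of", "SpaceAfter=No"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

for tok in find_propn_nn(SAMPLE):
    print(tok.sent_id or "(no sent_id)", tok.tok_id, tok.form, tok.upos, tok.xpos)
```

On these three snippets it flags "World" and "War" in the first and third examples, and nothing in "First Opium War", which is already tagged NNP throughout.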
@rhdunn

rhdunn commented Dec 7, 2023

Examples 2 and 3 contain ordinal number words ("First", "Second"), so those tokens should be XPOS=CD with NumForm=Word|NumType=Ord according to the UD guidelines.

IIRC, NNP is used in the XPOS for compatibility with PTB. In that case, example 3 should match example 2, which leaves two conflicting XPOS candidates (CD or NNP).

The Cambridge Dictionary classifies ordinals as determiners (but notes that another determiner such as "the" or "a" can precede the ordinal):

  1. https://dictionary.cambridge.org/grammar/british-grammar/number

However, wiktionary classifies them as adjectives:

  1. https://en.wiktionary.org/wiki/first#Adjective

Wikipedia doesn't list ordinals among adjectives on its adjective order page:

  1. https://en.wikipedia.org/wiki/Adjective#Order

But on that page, Wikipedia seems to agree with the Cambridge Dictionary rather than Wiktionary:

Determiners and postdeterminers—articles, numerals, and other limiters (e.g. three blind mice)—come before attributive adjectives in English.

@nschneid
Contributor

nschneid commented Dec 7, 2023

Ordinal numbers should be ADJ: https://universaldependencies.org/u/pos/ADJ.html

@AngledLuffa
Contributor Author

So First_ADJ Opium War? NNP or JJ for the xpos?

@nschneid
Contributor

nschneid commented Dec 7, 2023

My hunch is NNP

@AngledLuffa
Contributor Author

World and War both NNP? It looks very weird to have them be PROPN but NN.

@nschneid
Contributor

nschneid commented Dec 7, 2023

In "World War I", definitely "World" and "War" are NNP. I would lean that way also for "First World War", and that seems to be consistent with OntoNotes.

@AngledLuffa
Contributor Author

> NNP or JJ for the xpos?

> My hunch is NNP

It's worth pointing out that in GUM, the 2002 World Cup gets the tag CD (not NNP). However, I suppose the year might not be considered part of the name.

@AngledLuffa
Contributor Author

... although they later annotate

Instruments for Research into Second Languages (IRIS)
Second City

with Second_NNP

@AngledLuffa
Contributor Author

How do the changes here look?

1a81cda

@nschneid
Contributor

nschneid commented Dec 7, 2023

> How do the changes here look?
>
> 1a81cda

LGTM

@amir-zeldes

> 2002 World Cup gets the tag CD (not NNP)

I think that's canon; let me know if anyone wants to argue otherwise.


4 participants