PROPN or NOUN #91

Closed
jheinecke opened this issue Jul 23, 2020 · 5 comments

Comments

@jheinecke
Contributor

jheinecke commented Jul 23, 2020

I've just come across some 14 sentences where president, which looks like a common noun here (titles like president Bush aside), is annotated as PROPN, e.g. "George W. Bush alleged Thursday that John Edwards lacks the experience necessary to be president." (weblog-juancole.com_juancole_20040708181175_ENG_20040708_181175-0001) and others. The same goes for governor, e.g. "Davis spokesman Steve Maviglio said the governor felt "betrayed" by the actions of Winter." (email-enronsent07_01-0031).

There are other cases where the PROPN tag seems correct but the word is spelled in lower case, e.g. broad in "YES, I am west of broad." (reviews-351561-0022). If broad is a PROPN, should the lemma column at least be capitalized? Cf. also christmas in "christmas cake for christmas day." (answers-20111107144339AA0qw5S_ans-0018).
In "...has appeared in all the english Pakistan and India papers." (weblog-blogspot.com_dakbangla_20050311135387_ENG_20050311_135387-0126), english is a PROPN although it is used as an adjective. Are language names always tagged as PROPN?

Are these erroneous annotations, or did I miss something in the guidelines?

@nschneid
Contributor

The short answer is that the EWT corpus was converted from PTB-style annotation to UD, but the tagging decisions do not always line up perfectly. In the PTB style, for example, adjectives in proper names are tagged as proper nouns, though this is not really ideal per UD guidelines.

I think I agree with all of your suggestions, as long as they can be applied consistently in the corpus (and are not too much of a departure from policies of other corpora). The main reason nobody has tried to clean up all the proper name annotations is that it seems like a lot of work—but if you want to volunteer that would be great!

Related issues:
UniversalDependencies/docs/issues/678
UniversalDependencies/docs/issues/702
UniversalDependencies/docs/issues/562

@jheinecke
Contributor Author

Thanks for your reply. I did not want to reignite the discussion on how to annotate named entities. But these are cases where IMHO NOUN would be a better UPOS tag. I could provide a list here of the PROPN tokens (with their sentences) that I would retag as NOUN (it won't be very long anyway).
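
For reference, a rough sketch of how such a candidate list could be pulled from a CoNLL-U file: it collects every token tagged PROPN whose form starts with a lower-case letter. The function name `lowercase_propn` and the toy sample are made up for illustration (the sample rows are hand-written, not copied from EWT), and a lower-case form is only a heuristic, so each hit would still need a manual check.

```python
def lowercase_propn(conllu_text):
    """Yield (sent_id, form, lemma) for PROPN tokens whose form starts lower-case.

    Hypothetical helper: assumes standard 10-column, tab-separated CoNLL-U.
    """
    sent_id = None
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id = "):
            sent_id = line[len("# sent_id = "):].strip()
            continue
        if not line or line.startswith("#"):
            continue  # blank separator or other comment line
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, lemma, upos = cols[1], cols[2], cols[3]
        if upos == "PROPN" and form[:1].islower():
            yield sent_id, form, lemma


# Toy input, invented for this sketch (not real EWT annotation).
SAMPLE = "\n".join([
    "# sent_id = toy-0001",
    "# text = the governor met Bush",
    "\t".join(["1", "the", "the", "DET", "DT", "_", "2", "det", "_", "_"]),
    "\t".join(["2", "governor", "governor", "PROPN", "NNP", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["3", "met", "meet", "VERB", "VBD", "_", "0", "root", "_", "_"]),
    "\t".join(["4", "Bush", "Bush", "PROPN", "NNP", "_", "3", "obj", "_", "_"]),
    "",
])

for hit in lowercase_propn(SAMPLE):
    print(hit)
```

On a real treebank file the text would be read with `open(path, encoding="utf-8").read()` and passed in the same way; capitalized cases such as sentence-initial words would need a different filter.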

@nschneid
Contributor

If you can submit a pull request for the dev branch we'll take a look!

@jheinecke
Contributor Author

No problem! (after the summer break :-)

@nschneid
Contributor

#164 overhauled the use of PROPN. If there are problems that we missed, let us know.
