POS tagging errors for directional preposition + here/there #64

Closed
lgessler opened this issue Sep 30, 2020 · 4 comments

Comments

@lgessler
Collaborator

lgessler commented Sep 30, 2020

example query:

1. I imagined a man down/RB there/RB in the dark
2. Someone's hung it up/RB here/RB
3. 've really got something stuck up/IN there/RB
4. (phrasal verb)
5. a few new girls over/IN there/PP
6. is there really someone in a craft up/RB there/RB

RB RB is the most common analysis, but PTB has majority IN RB, which I think is the correct analysis. There are other reasons not to like a word like down as an adverb:

  A downwardly oriented arrow
* A down oriented arrow
@amir-zeldes
Owner

Thanks for reporting! Number 2 is not the construction "down there" IMO, so we shouldn't touch the following in any case:

  • "hung it up/RP here/RB", i.e. hang+up phrasal verb, and I see "up" is actually tagged RP in the corpus

I think there are contexts in which "down" can be an adverb:

  • I pointed down/RB (=downward/RB)
  • I pointed there/thataway/... - RB

I think for "down there" the reason to tag down as a preposition in PTB is by analogy to "in there", but generally I think this down is more like an adverb meaning "below, at the bottom" (in this reading down there would be like 'there at the bottom'). Despite my personal intuition, ordinarily I'd say we should accept IN RB to match PTB, but... looking in EWT I see that it's consistently RB RB:

http://match.grew.fr/?corpus=UD_English-EWT@2.6&custom=5f75152d74ff3&eud=yes

And what's more, the deprel is advmod into 'down', which the UD validator would reject if upos!=ADV. Because we are much more likely to want to concatenate GUM with EWT, maybe it's worth matching it rather than PTB??
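The constraint mentioned above (an advmod dependent is expected to have upos ADV) can be checked roughly before merging tag changes. The following is only a minimal sketch under stated assumptions: it reads plain 10-column CoNLL-U text, the function name `find_advmod_mismatches` and the two-token sample are made up for illustration, and the real UD validator applies more nuanced rules than this single test.

```python
def find_advmod_mismatches(conllu_text):
    """Return (sentence_index, form, upos) for tokens whose deprel is
    advmod (or an advmod subtype) but whose upos is not ADV."""
    mismatches = []
    sent_idx = 0
    saw_token = False
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if saw_token:
                sent_idx += 1
                saw_token = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, form, upos, deprel = cols[0], cols[1], cols[3], cols[7]
        if not tok_id.isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        saw_token = True
        if deprel.split(":")[0] == "advmod" and upos != "ADV":
            mismatches.append((sent_idx, form, upos))
    return mismatches


# Hypothetical sample: "down there" with "down" tagged as a preposition
# but still attached as advmod.
SAMPLE = "\n".join([
    "\t".join(["1", "down", "down", "ADP", "IN", "_", "2", "advmod", "_", "_"]),
    "\t".join(["2", "there", "there", "ADV", "RB", "_", "0", "root", "_", "_"]),
    "",
])

print(find_advmod_mismatches(SAMPLE))
```

Switching "down" to IN without also changing the dependency relation would show up in a check like this one.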

@lgessler
Collaborator Author

That makes sense, yeah. Personally I've grown really weary of the IN/RB/RP distinction when it comes to words like down, up, over, under, etc. and in many cases I feel like the choice of which tag to use can be nothing else but arbitrary. But I suppose there's nothing to be done but to choose one and stick to it as long as we're using PTB tags.

@amir-zeldes
Owner

Yes, I think it's basically construction by construction. To some extent, correcting/adjudicating with NLP output is better because it can be more consistent: adnominal out + of -> IN IN; but if it's a verb+RP combination -> RP IN (e.g. run out of coffee)
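As an illustration of what one such construction-specific rule might look like, here is a rough sketch for "out of". It is not the procedure used for GUM: the function name `retag_out_of`, the token dictionaries, and the tiny phrasal-verb lexicon are all hypothetical, and using the immediately preceding token as a stand-in for the syntactic head is a simplifying assumption.

```python
def retag_out_of(tokens, phrasal_lemmas):
    """Return a copy of tokens with 'out of' retagged by two rules:
    after a noun (adnominal use)       -> IN IN
    after a verb in phrasal_lemmas     -> RP IN
    Anything else is left unchanged."""
    retagged = [dict(tok) for tok in tokens]
    for i in range(1, len(retagged) - 1):
        prev, cur, nxt = retagged[i - 1], retagged[i], retagged[i + 1]
        if cur["form"].lower() != "out" or nxt["form"].lower() != "of":
            continue
        if prev["xpos"].startswith("NN"):
            cur["xpos"], nxt["xpos"] = "IN", "IN"
        elif prev["xpos"].startswith("VB") and prev["lemma"] in phrasal_lemmas:
            cur["xpos"], nxt["xpos"] = "RP", "IN"
    return retagged


# Hypothetical lexicon and toy inputs, for illustration only.
PHRASAL_LEMMAS = {"run"}

verbal = [
    {"form": "run", "lemma": "run", "xpos": "VBP"},
    {"form": "out", "lemma": "out", "xpos": "IN"},
    {"form": "of", "lemma": "of", "xpos": "IN"},
    {"form": "coffee", "lemma": "coffee", "xpos": "NN"},
]
adnominal = [
    {"form": "people", "lemma": "person", "xpos": "NNS"},
    {"form": "out", "lemma": "out", "xpos": "RB"},
    {"form": "of", "lemma": "of", "xpos": "IN"},
    {"form": "work", "lemma": "work", "xpos": "NN"},
]

print([t["xpos"] for t in retag_out_of(verbal, PHRASAL_LEMMAS)])
print([t["xpos"] for t in retag_out_of(adnominal, PHRASAL_LEMMAS)])
```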

At some point it becomes a bit of a shopping list, you just want to make sure it's the same shopping list for GUM, EWT, ... Could really use some automatic consolidation suggestions, probably.
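One possible shape for such consolidation suggestions, sketched under assumptions: each corpus is a list of sentences, each sentence a list of (form, xpos) pairs, and a suggestion is raised whenever the majority tag pair for the same word bigram differs between the two corpora. The function names and the toy data below are invented for illustration and are not real GUM, EWT or PTB counts.

```python
from collections import Counter, defaultdict


def bigram_tag_counts(sentences, first_words, second_words):
    """Count tag pairs for every (w1, w2) bigram of interest."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            w1, w2 = w1.lower(), w2.lower()
            if w1 in first_words and w2 in second_words:
                counts[(w1, w2)][(t1, t2)] += 1
    return counts


def suggest_consolidation(corpus_a, corpus_b, first_words, second_words):
    """Map each shared bigram to (majority tags in A, majority tags in B)
    whenever the two majorities disagree."""
    counts_a = bigram_tag_counts(corpus_a, first_words, second_words)
    counts_b = bigram_tag_counts(corpus_b, first_words, second_words)
    suggestions = {}
    for bigram in sorted(counts_a.keys() & counts_b.keys()):
        top_a = counts_a[bigram].most_common(1)[0][0]
        top_b = counts_b[bigram].most_common(1)[0][0]
        if top_a != top_b:
            suggestions[bigram] = (top_a, top_b)
    return suggestions


# Toy data, invented for illustration.
corpus_a = [
    [("down", "IN"), ("there", "RB")],
    [("down", "IN"), ("there", "RB")],
    [("down", "RB"), ("there", "RB")],
]
corpus_b = [
    [("down", "RB"), ("there", "RB")],
]

print(suggest_consolidation(corpus_a, corpus_b,
                            {"down", "up", "over"}, {"here", "there"}))
```

A report like this only points at candidates; each flagged bigram would still need a construction-by-construction decision.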

@amir-zeldes
Owner

Fixed in 6.2.0
