POS tagging errors for directional preposition + here/there #64
Thanks for reporting! Number 2 is not the construction "down there" IMO, so we shouldn't touch the following in any case:
I think there are contexts in which "down" can be an adverb:
I think for "down there" the reason to tag "down" as a preposition in PTB is by analogy to "in there", but generally I think this "down" is more like an adverb meaning "below, at the bottom" (in this reading, "down there" would be like "there at the bottom"). Despite my personal intuition, ordinarily I'd say we should accept IN RB to match PTB, but looking in EWT I see that it's consistently RB RB: http://match.grew.fr/?corpus=UD_English-EWT@2.6&custom=5f75152d74ff3&eud=yes What's more, the deprel is advmod into "down", which the UD validator would reject if upos != ADV. Because we are much more likely to want to concatenate GUM with EWT, maybe it's worth matching EWT rather than PTB?
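To make the validator point concrete, here is a minimal sketch of that kind of check. It assumes "advmod into down" means that "down" itself carries the advmod relation, and it uses a hypothetical `(form, upos, xpos, deprel)` tuple per token rather than the real validator's data structures. The two example tokens are illustrative, not copied from GUM or EWT.

```python
# Hypothetical minimal token representation: (form, upos, xpos, deprel).
# This only mimics the rule described above; it is not the UD validator.

def advmod_upos_mismatches(tokens):
    """Return tokens whose deprel is advmod but whose upos is not ADV."""
    return [tok for tok in tokens if tok[3] == "advmod" and tok[1] != "ADV"]

# Illustrative analyses of "down" (assumed values, not corpus data).
ptb_style = [("down", "ADP", "IN", "advmod")]  # IN tag mapped to upos ADP
ewt_style = [("down", "ADV", "RB", "advmod")]  # RB tag mapped to upos ADV

print(advmod_upos_mismatches(ptb_style))
print(advmod_upos_mismatches(ewt_style))
```

Under these assumptions, only the IN-style token is flagged, which is the conflict between keeping the PTB tag and keeping the advmod relation.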
That makes sense, yeah. Personally I've grown really weary of the IN/RB/RP distinction when it comes to words like down, up, over, under, etc. and in many cases I feel like the choice of which tag to use can be nothing else but arbitrary. But I suppose there's nothing to be done but to choose one and stick to it as long as we're using PTB tags.
Yes, I think it's basically construction by construction. To some extent, correcting/adjudicating with NLP output is better because it can be more consistent: adnominal "out" + "of" -> IN IN, but a verb+RP combination -> RP IN (e.g. "run out of coffee"). At some point it becomes a bit of a shopping list, and you just want to make sure it's the same shopping list for GUM, EWT, ... We could really use some automatic consolidation suggestions, probably.
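As a rough idea of what automatic consolidation suggestions could look like, here is a minimal sketch. It assumes sentences are available as lists of `(form, xpos)` pairs. The word lists, function names, and toy sentences are all made up for illustration and are not part of any GUM or EWT tooling.

```python
from collections import Counter, defaultdict

# Assumed word lists for "directional word + here/there" (illustrative only).
DIRECTIONAL = {"down", "up", "over", "under", "out", "in"}
DEICTIC = {"here", "there"}

def bigram_tag_counts(sentences):
    """Count xpos tag pairs for each directional + here/there bigram."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            if w1.lower() in DIRECTIONAL and w2.lower() in DEICTIC:
                counts[(w1.lower(), w2.lower())][(t1, t2)] += 1
    return counts

def consolidation_suggestions(counts):
    """For bigrams with more than one analysis, propose the majority tag pair."""
    return {
        bigram: tags.most_common(1)[0][0]
        for bigram, tags in counts.items()
        if len(tags) > 1
    }

# Toy input (invented sentences, not corpus data).
toy = [
    [("It", "PRP"), ("is", "VBZ"), ("down", "RB"), ("there", "RB")],
    [("Look", "VB"), ("down", "RB"), ("there", "RB")],
    [("Go", "VB"), ("down", "IN"), ("there", "RB")],
    [("Come", "VB"), ("over", "RB"), ("here", "RB")],
]

suggestions = consolidation_suggestions(bigram_tag_counts(toy))
print(suggestions)
```

The majority analysis is only a candidate for review, since the most frequent tagging in one corpus need not be the one the guidelines prefer. Running the same counts over several corpora would at least show where the "shopping lists" differ.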
Fixed in 6.2.0
example query:
RB RB is the most common analysis, but PTB has majority IN RB, which I think is the correct analysis. There are other reasons to not like a word like down as an adjective: