amod vs. compound for terms like "hot dog" #756
Comments
I think we should avoid talking about compositionality or mwes in the context of compounds: there are compounds which follow the Compound Stress Rule (CSR) which are compositional, like "TRAIN station", and non-compositional cases which do not follow it, like "hot poTAto", which I have no doubt is As for the CSR as a criterion for Even for words that have distinct adjectival forms (e.g. golden, vs. gold, which can be a compound modifier), most English speakers don't seem to recognize a distinction. From my experience teaching UD to English speaking students every year, it's hard enough just to explain that the modifier in N+N compounds is not an adjective... When we have two tokens and the text is written, it's not even always 100% possible to know whether CSR applies, so if something is clearly an ADJ synchronically, as in "hot dog", I don't even know that we should be insisting on |
I would vote for always using |
I would also always choose
... and so on. And this would incidentally be the easiest for the annotators, too! |
@Stormur are you arguing that |
I agree with @Stormur. We never understood how to apply |
Interesting. I think @jnivre has expressed that In any case, removing In this issue I just wanted to get some clarity around adjective cases like "hot dog". It seems like nobody objects to calling that |
In fact, as it is now (no real definition in the guidelines), the more I think of it, yes. I agree with viewing it as a kind of |
Here's a pull request for clarifying the scope of I do wish we had a better overall definition but I am trying to improve this one for now to say the scope should be limited based on morphosyntactic criteria. |
@Stormur I don't think we should get rid of
The common denominator of compounding across languages is that it creates something that at the syntactic level behaves like one word. In some languages this amounts only to limitation to a single determiner/definiteness value (e.g. Arabic or Hebrew), but otherwise the modifier can be modified. In others, the result is much more restricted, often amounting to a single noun with almost no possibility of modifying the modifier (e.g. German). The article criterion is convenient for N-N compounds, since it covers a broad range of languages, including the Romance examples ("le vote sanction", but not "*le vote une sanction" or "*un vote la sanction" etc.) |
@amir-zeldes, @Stormur: could one of you start a separate issue to consolidate the discussion of the rationale for |
@nschneid you are right, sorry! The issue is expanding and it deserves its own place. |
In English, ADJ+NOUN expressions like "hot dog" (in the idiomatic food sense) are traditionally known as compounds. Should "hot" attach as amod or compound?
As @amir-zeldes has pointed out, one distinguishing factor is stress (the Compound Stress Rule): the food is pronounced HOT dog with stress on the first word, whereas the compositional use of the adjective would be pronounced hot DOG. But the written form does not reflect this distinction.
I suspect annotators would be inclined to attach all attributive adjectives modifying a noun as amod. There is, in fact, one token of "hot dogs" in EWT which is currently amod.
A review of uses of compound with ADJ+NOUN in EWT turns up mostly phrasal modifiers like "top notch" and "high quality", which arguably function like multiword adjectives (discussion of phrasal attributive modifiers is happening in #753). Mixed results for "criminal defense lawyer/attorney": amod(defense,criminal) (2 instances), compound(defense,criminal) (1 instance). All matches of this pattern in GUM look like tagging errors.
Note that the term "compound" has caused confusion for end users (#551), but I am mainly concerned with having clear criteria for annotators.
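
For anyone who wants to repeat this kind of review on their own copy of a treebank, here is a minimal sketch of how ADJ dependents of NOUN heads could be tallied by relation from CoNLL-U text. It is not the query used for the counts above. The function names and the two sample sentences are hypothetical, written for illustration only, and are not taken from EWT or GUM.

```python
from collections import Counter


def read_sentences(conllu_text):
    """Yield each sentence as a list of 10-column CoNLL-U word rows."""
    sentence = []
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword token range (1-2) or empty node (1.1)
        sentence.append(cols)
    if sentence:
        yield sentence


def adj_noun_relations(conllu_text, deprels=("amod", "compound")):
    """Count (relation, dependent, head) triples for ADJ words attached to NOUN heads."""
    counts = Counter()
    for sent in read_sentences(conllu_text):
        by_id = {row[0]: row for row in sent}
        for row in sent:
            form, upos, head, deprel = row[1], row[3], row[6], row[7]
            base = deprel.split(":")[0]  # ignore subtypes such as compound:prt
            if upos != "ADJ" or base not in deprels:
                continue
            head_row = by_id.get(head)
            if head_row is None or head_row[3] != "NOUN":
                continue
            counts[(base, form.lower(), head_row[1].lower())] += 1
    return counts


# Hand-built sample for illustration only (not real treebank data).
SAMPLE = "\n".join([
    "# text = I ate hot dogs",
    "1\tI\tI\tPRON\tPRP\t_\t2\tnsubj\t_\t_",
    "2\tate\teat\tVERB\tVBD\t_\t0\troot\t_\t_",
    "3\thot\thot\tADJ\tJJ\t_\t4\tamod\t_\t_",
    "4\tdogs\tdog\tNOUN\tNNS\t_\t2\tobj\t_\t_",
    "",
    "# text = top notch service",
    "1\ttop\ttop\tADJ\tJJ\t_\t2\tcompound\t_\t_",
    "2\tnotch\tnotch\tNOUN\tNN\t_\t3\tcompound\t_\t_",
    "3\tservice\tservice\tNOUN\tNN\t_\t0\troot\t_\t_",
    "",
])

counts = adj_noun_relations(SAMPLE)
for (relation, dependent, head), n in sorted(counts.items()):
    print(f"{relation}({head},{dependent}): {n}")
```

To run it on a real treebank, read a .conllu file into a string and pass that in place of SAMPLE. Restricting the heads to NOUN leaves out names like "White House", where the words are tagged PROPN.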
(We can also include names like "White House" in this category, though per PTB guidelines for proper names, "White" is tagged as PROPN (#678), so on that basis it makes sense to use compound.)