PROPN vs. PTB NNP(S), titles in names, and compound vs. nmod vs. appos #678

nschneid · 2019-12-11T19:16:29Z

We are realizing that the UDv2 guideline for PROPN is a pretty radical departure from the previous approach, which for English followed PTB guidelines.

Note that PROPN is only used for the subclass of nouns that are used as names and that often exhibit special syntactic properties (such as occurring without an article in the singular in English). When other phrases or sentences are used as names, the component words retain their original tags. For example, in Cat on a Hot Tin Roof, Cat is NOUN, on is ADP, a is DET, etc.

It makes sense to give function words their usual tags even within names.
But enforcing this policy would require reviewing all multiword names in the treebanks (at least the English ones) to convert PROPNs into NOUN, ADJ, etc., depending on context.

Moreover, it's not entirely clear what the principle is behind the distinction. In the sentence "I'm reading Cat" (short for Cat on a Hot Tin Roof) should it be NOUN or PROPN?

A fine point is that it is not uncommon to regard words that are etymologically adjectives or participles as proper nouns when they appear as part of a multiword name that overall functions like a proper noun, for example in the Yellow Pages, United Airlines or Thrall Manufacturing Company. This is certainly the practice for the English Penn Treebank tag set.

Is this saying that an adjective in some names can be treated as PROPN but in other constructions (such as Cat on a Hot Tin Roof) it is obligatorily ADJ?

(Related: #664)


sylvainkahane · 2019-12-11T21:17:52Z

For some UD_French treebanks we decided to introduce an external POS for MWEs and titles. In other words, words in a title or a syntactically regular MWE receive their regular POS, and the whole expression receives an EXTPOS value on the head. For titles, EXTPOS=PROPN. We have one example of a one-word title where POS and EXTPOS are different. Moreover, titles receive a feature Type=Title, while MWEs have a feature Type=MWE.
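A minimal sketch of the two-level tagging described above, over a hypothetical CoNLL-U fragment. The feature names EXTPOS and Type=Title come from the comment; placing both in the FEATS column of the title's head word, and the internal analysis of "Cat on a Hot Tin Roof", are assumptions made for illustration. The helper reads the external POS from the head and falls back to the word's own UPOS.

```python
# Hypothetical CoNLL-U fragment: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC.
# Words inside the title keep their ordinary UPOS; only the head carries EXTPOS.
SAMPLE = """\
1\tI\tI\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tsaw\tsee\tVERB\t_\t_\t0\troot\t_\t_
3\tCat\tcat\tNOUN\t_\tEXTPOS=PROPN|Type=Title\t2\tobj\t_\t_
4\ton\ton\tADP\t_\t_\t8\tcase\t_\t_
5\ta\ta\tDET\t_\t_\t8\tdet\t_\t_
6\tHot\thot\tADJ\t_\t_\t8\tamod\t_\t_
7\tTin\ttin\tNOUN\t_\t_\t8\tcompound\t_\t_
8\tRoof\troof\tNOUN\t_\t_\t3\tnmod\t_\t_
"""


def parse_feats(field):
    """Turn 'A=x|B=y' into {'A': 'x', 'B': 'y'}; '_' means no features."""
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))


def parse(conllu):
    """Map each token ID to the few columns this sketch needs."""
    tokens = {}
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        tokens[int(cols[0])] = {
            "form": cols[1],
            "upos": cols[3],
            "feats": parse_feats(cols[5]),
            "head": int(cols[6]),
            "deprel": cols[7],
        }
    return tokens


def external_pos(tok):
    """POS seen by the governor: EXTPOS if present, else the word's own UPOS."""
    return tok["feats"].get("EXTPOS", tok["upos"])


def subtree(tokens, root):
    """IDs of root and all of its descendants, in linear order."""
    ids = {root}
    changed = True
    while changed:
        changed = False
        for tid, tok in tokens.items():
            if tok["head"] in ids and tid not in ids:
                ids.add(tid)
                changed = True
    return sorted(ids)


tokens = parse(SAMPLE)
for tid, tok in tokens.items():
    if tok["feats"].get("Type") == "Title":
        span = " ".join(tokens[i]["form"] for i in subtree(tokens, tid))
        print(span, "| internal UPOS:", tok["upos"], "| external POS:", external_pos(tok))
```

Under these assumptions the title is reported as "Cat on a Hot Tin Roof" with internal UPOS NOUN and external POS PROPN, while every word inside it keeps its ordinary tag.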

amir-zeldes · 2019-12-11T21:18:12Z

From a linguistic point of view 'being a name' is not fundamentally a morphosyntactic category in English IMO, notwithstanding article behavior (which is not 1:1). The PTB guideline (any content word in a name is NNP) is maybe not great, but at least more or less consistently applicable.

If I read the UD guideline literally, it sounds like a person named 'Violet Shoemaker' should not be ADJ+NOUN(?), but a play called 'Violet Shoemaker' should be ADJ+NOUN, which seems strange to me (what if it's about a person called Violet Shoemaker?). I think in practical terms this can't be implemented for the English corpora, since someone would have to go through each PROPN to establish whether it is a 'real' name or part of a 'work of art' like "Cat on a Hot Tin Roof" or similar. And even for Cat on a Hot Tin Roof, I agree with @nschneid that a normal annotator would probably tag "I saw Cat last night" as Cat/PROPN.

dan-zeman · 2019-12-12T10:09:40Z

This seems to reiterate an old (pre-v1) discussion about whether PROPN is a useful category at all :-) At any rate, the problem seems to be limited to some languages (notably English). Czech adjectives are morphologically distinct, and as the name suggests, a proper noun is supposed to be a noun, so in Kočka na rozpálené plechové střeše “Cat on a Hot Tin Roof”, there is no question about making rozpálené plechové “hot tin” anything else than ADJ. There also does not seem to be a reason to make Kočka “Cat” a PROPN instead of a NOUN; it is written capitalized because it happens to occur at the beginning of a multi-word named entity; but it would also be capitalized if it occurred at the beginning of a sentence. And finally, there is no conflict with a previous practice. As a matter of fact, the PDT Czech tagset does not even distinguish proper and common nouns.

That being said, personal names are special even in Czech. This involves mostly surnames because given names can rarely be confused with common nouns. A surname may be derived from an adjective but it will be treated as noun (i.e., proper noun in UD) because it has a radically different distribution. And if it is (zero-)derived from a common noun, we will still tag it PROPN and not NOUN. Some surnames will behave the same way as their common sources but some will not: e.g., in Václav Kočka, the name is masculine, while the common noun kočka “cat” is feminine.

amir-zeldes · 2019-12-12T14:02:15Z

As I wrote above, I agree that names are not really a 'part of speech', even if they have slightly different distributions some of the time, because 1. there are lots of other categories with slightly different distributions which we don't distinguish (e.g. mass nouns) and 2. there are names that don't follow those distributions (e.g. can take articles). It's really a semantic distinction, saying whether something is a name. However, if we want to follow common practice in Computational Linguistics, support NER applications, etc. etc., then we need to have decidable guidelines.

For me, "Cat on a Hot Tin Roof" is definitely the name of a work of art, and if I use "Cat" to refer to it, that is PROPN and not NOUN. The same is true of the Mona Lisa (which even takes a 'the'): it is a PROPN because it is the name of a work of art, and for practical NER it should be included (work of art is even an NER category in OntoNotes). As soon as a noun stops pointing to its regular extension and starts acting as a name, it's fairly intuitive to explain to annotators what to do - a person called "Wolf" is PROPN, an instance of the animal is NOUN.

If we want annotators of English to treat the Mona Lisa as different from a person called Mona Lisa, I think we would find it difficult to teach to annotators, and even more problematically, we would need to go over every single NNP in the English corpora, which I think is a lost cause and not really a good idea. I like @sylvainkahane's idea of explicitly modeling the duality of titles, but that too would require a very substantial manual annotation effort.

sylvainkahane · 2019-12-12T14:46:35Z

Our goal is not to annotate named entities, but we have to acknowledge that sometimes the problem of NEs overlaps with questions of syntactic annotation. This happens with titles, which behave as a whole as PROPNs but can be phrases of any POS.

Introducing a double POS, one for the external behavior (towards the governor of the phrase) and one for the internal behavior (inside the phrase), is quite simple and solves many problems: the problem of titles, but also the question of MWEs that have a regular internal syntactic structure but an external behavior that does not correspond to their internal structure and to the POS of the head. Again, our goal is not to annotate MWEs, but when MWEs overlap with questions of syntactic annotation, we need a solution.

nschneid · 2019-12-13T03:28:59Z

@sylvainkahane Agreed, I'm increasingly seeing the need for what could be called "syntactic MWEs": not all MWEs, to be sure ("pay attention" is syntactically normal), but ones that are problematic for the simple notion of head-modifier relations as a representation of syntagmatic compositionality.

- MWEs with internal syntax, but unpredictable external distribution: e.g. titles of works of art. As pointed out above, these would need something like an EXTPOS feature (or refinement of the external attachment edge) to properly capture both internal and external structure.
  - Word-level POS tags are especially problematic here due to the mixed use of "proper noun" tags. While I am sympathetic to UD's goal of adopting accessible and practical terminology for things, it is also worth noting that dictionaries include multiword names like "United States" as nouns, and do not POS-tag the individual words.
- MWEs that lack a clear internal structure: currently handled by flat and fixed, to an extent, but these still have a head designated by convention, and word-level POS. And there are some multiple case attachments and the like that strike me as counterintuitive.
- Constructions with templatic internal structure: e.g. dates, and personal name constructions where the name is preceded by a title used to address the person ("Mr. Smith"). There is a tension between treating the whole thing as flat and recognizing that parts serve separate functions and are not equally heads ("Mr." is something of a modifier of "Smith").

amir-zeldes · 2019-12-13T18:04:59Z

EXTPOS sounds great, but I don't see how we would do this for all existing corpora... And even if it's done for some data sets, my instinct is to leave POS alone (since they reflect widely used standards) and rather add 'titlehood' to the large spans (à la PTB NP-TTL), or also add the non-name-like information somewhere, but in any case not redefining PROPN.

Making Smith the head of Mr. Smith would require another deprel - I guess the most unoriginal would be dep:title, because I think even subtyping flat would not allow RTL dependencies for that relation. Conceptually I think it's probably right for Smith to be the head, but I'm very easily convinced that this should be left alone for the status quo.

nschneid · 2019-12-19T19:34:56Z

> Making Smith the head of Mr. Smith would require another deprel - I guess the most unoriginal would be dep:title, because I think even subtyping flat would not allow RTL dependencies for that relation. Conceptually I think it's probably right for Smith to be the head, but I'm very easily convinced that this should be left alone for the status quo.

What about a new relation called formulaic, for template-like patterns that don't fall under the main syntactic relations used for normal vocabulary, but rather, specific formulas for putting together names and numbers ("Mr.", dates, units of measurement, etc.)? The technical difference would be:

- flat and fixed are for combinations where it is difficult to identify a unique head, so by convention the first word functions as the head in the tree
- formulaic is for relations where a head can be designated based on criteria like omissibility, but none of the usual dependency relations apply

Examples (specific headedness decisions subject to discussion):

- His Excellency Mr. John Smith, Jr.
  - flat(John, Smith)
  - formulaic(John, Mr.)
  - formulaic(John, Jr.)
  - flat(His, Excellency)
  - formulaic(John, His)
- bus number 623 leaves at five o'clock on Sep 23 2020
  - Normal syntactic dependencies: nsubj(leaves, bus), obl:tmod(leaves, five), obl:tmod(leaves, 23)
  - "bus number 623" (which can be shortened to "bus 623"): formulaic(bus, 623), formulaic(623, number)
  - "five o'clock": formulaic(five, o'clock)
  - "Sep 23 2020": formulaic(23, Sep), formulaic(Sep, 2020)

And we could have the option of subtyping some of the formulaic relations, e.g. formulaic:month(23, Sep) vs. formulaic:year(Sep, 2020).
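One quick sanity check on headedness proposals like the ones above is to confirm that each phrase's edges form a single-rooted tree. The sketch below encodes the proposed analyses as (relation, head, dependent) triples and finds the root. "formulaic" is the relation proposed here, not part of the UD v2 inventory; the sketch assumes each word occurs only once per phrase and omits punctuation such as the comma before "Jr.".

```python
# Proposed analyses as (relation, head, dependent) triples.
PROPOSALS = {
    "His Excellency Mr. John Smith, Jr.": [
        ("flat", "John", "Smith"),
        ("formulaic", "John", "Mr."),
        ("formulaic", "John", "Jr."),
        ("flat", "His", "Excellency"),
        ("formulaic", "John", "His"),
    ],
    "bus number 623": [
        ("formulaic", "bus", "623"),
        ("formulaic", "623", "number"),
    ],
    "Sep 23 2020": [
        ("formulaic:month", "23", "Sep"),
        ("formulaic:year", "Sep", "2020"),
    ],
}


def phrase_root(edges):
    """Return the single root if the edges form a tree; raise ValueError otherwise."""
    heads = {}
    for _rel, head, dep in edges:
        if dep in heads:
            raise ValueError(f"{dep!r} has two heads")
        heads[dep] = head
    words = set(heads) | set(heads.values())
    roots = [w for w in words if w not in heads]
    if len(roots) != 1:
        raise ValueError(f"expected one root, found {roots}")
    # Walk every word up towards the root to rule out cycles.
    for w in words:
        seen = set()
        while w in heads:
            if w in seen:
                raise ValueError(f"cycle through {w!r}")
            seen.add(w)
            w = heads[w]
    return roots[0]


for phrase, edges in PROPOSALS.items():
    print(phrase, "->", phrase_root(edges))
```

With the proposed edges, the roots come out as John, bus, and 23 respectively, matching the heads chosen in the examples.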

jnivre · 2019-12-19T20:42:48Z

I think the simple solution is to use nmod (possibly with subtyping). If we assume that "Smith" is the head, then "Mister" is a noun modifying that head. This is what we do for similar constructions in Swedish. UD by necessity has to be rather coarse-grained, so I think we should be careful not to start proliferating relations.

arademaker · 2019-12-19T20:52:10Z

Why can't I use appos instead of formulaic? appos(John, Mr.)?

dan-zeman · 2019-12-19T20:55:07Z

> Why can't I use appos instead of formulaic? appos(John, Mr.)?

Because appos is a left-to-right relation (besides also being meant for a different type of relation, but I think nobody has succeeded in precisely defining what that different relation is, while left-to-rightness is something easily testable).
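The left-to-right property mentioned above is mechanically testable, which is why it rules out appos(John, Mr.) regardless of the semantic question. A minimal sketch, assuming the constrained set is flat, fixed, conj, appos and goeswith (our reading of the UD validator's rule, not quoted from it), and that subtypes such as flat:name inherit the constraint:

```python
# Relations assumed to require the head to precede the dependent.
LEFT_TO_RIGHT = {"flat", "fixed", "conj", "appos", "goeswith"}


def direction_ok(deprel, head_idx, dep_idx):
    """False only when a left-to-right relation has its dependent before its head."""
    base = deprel.split(":")[0]  # flat:name etc. are checked like flat
    return not (base in LEFT_TO_RIGHT and dep_idx < head_idx)


# In "Mr. John", Mr. is token 1 and John is token 2.
print(direction_ok("appos", head_idx=2, dep_idx=1))  # appos(John, Mr.)
print(direction_ok("flat", head_idx=1, dep_idx=2))   # flat(Mr., John)
print(direction_ok("nmod", head_idx=2, dep_idx=1))   # nmod(John, Mr.)
```

This also illustrates the earlier point that subtyping flat would not allow a right-to-left attachment: flat:title(John, Mr.) fails the same check, whereas nmod or a dep subtype does not.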

jnivre · 2019-12-19T20:56:51Z

This would indeed be classified as apposition in many traditional grammars (including the Swedish tradition), but the UD concept of apposition is narrower and restricted to things like "president of X" in "Mr. Smith, president of X". Personally, I would be in favor of treating appos as a subtype of nmod rather than a universal relation.

nschneid · 2019-12-19T21:20:26Z

@jnivre As applied to English thus far, nmod is pretty much exclusively for "case"-marked nominals (PPs and possessives). In all of EWT I can find just 20 or so other uses of nmod.

The current definition says: "The nmod relation is used for nominal dependents of another noun or noun phrase and functionally corresponds to an attribute, or genitive complement." I don't know if "Mr." counts as an "attribute".

But assuming we were to widen nmod to include "Mr." then a reasonable question is, how to distinguish it from compound? The most productive kind of compounding in English is with a noun modifying another noun.

"Mr." feels quite different from both compounds and PPs, because it's a formulaic way of putting together names, and has a restrictive distribution (it can only precede a person's name).

amir-zeldes · 2019-12-19T21:35:43Z

I would also prefer not to add major types - UD is complicated enough, and stability is important. If we did do a 'this is a special thing we don't have a name for', I would say dep:xyz is the way to go, so maybe dep:title. Honestly I've gotten used to flat(Mr.,Smith), so I can live with it - I agree with Nathan that nmod intuitively means something different to me (prepositional/case-bearing). This is the reason we have things like nmod:npmod in English for things that are not 'case-bearing'.

Incidentally RE the comparison to compound - that's exactly what Stanford Dependencies had, where nn was used for both compound modifiers and titles. I think the reason it didn't annoy people was that names were right-headed anyway (last dominates first name), so it looked less jarring.

jnivre · 2019-12-19T21:46:35Z

I've always considered nmod:npmod a strange quirk of the English treebank. :) Even the name is a weird mix of dependency and phrase structure syntax. In a cross-linguistic perspective, I think it makes more sense to use nmod for any kind of nominal modification, regardless of whether it is accompanied by head marking, dependent marking, or no marking. Compounding is different because it occurs at the lexical level, but I agree that English orthography makes it hard to separate in practice. It is easier in Swedish and German, where compounds are written with no internal spaces.

amir-zeldes · 2019-12-19T21:52:45Z

The name nmod:npmod is indeed odd and has purely historical reasons, but I don't agree that the functions it stands for are interchangeable with nmod. The criterion for the nmod/obl subtypes in English is that they appear without adpositions but are not objects. If we have:

I'll jog next week

Then we don't want:

obj(jog,week)

But we can't use the preposition marking guideline to do:

obl(jog,week)

And I think it's clear we have a different construction here than prepositional modification. So the solution is to give these cases a distinct relation:

obl:npmod(jog,week)

I would expect this to be applicable to other languages with unmediated adverbial NPs (e.g. 'accusativus graecus' in classical languages)
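The English criterion above (no adposition, but not an object) can be stated as a small decision rule for nominal dependents of a predicate. This is only a sketch: the function name is hypothetical, the judgments are supplied by hand rather than computed, and the "I'll buy a bike" example is added for contrast.

```python
def predicate_nominal_deprel(has_adposition: bool, is_object: bool) -> str:
    """Relation for a nominal dependent of a predicate under the criterion above:
    adposition-marked -> obl; unmarked object -> obj; unmarked non-object -> obl:npmod."""
    if has_adposition:
        return "obl"
    return "obj" if is_object else "obl:npmod"


# Hand-supplied judgments: (sentence, predicate, nominal, has_adposition, is_object).
EXAMPLES = [
    ("I'll jog next week", "jog", "week", False, False),
    ("I'll jog in the park", "jog", "park", True, False),
    ("I'll buy a bike", "buy", "bike", False, True),
]

for sentence, pred, noun, adp, obj in EXAMPLES:
    print(f"{sentence}: {predicate_nominal_deprel(adp, obj)}({pred}, {noun})")
```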

amir-zeldes · 2019-12-19T21:55:24Z

Incidentally, we applied the same subtyping scheme to UD_Coptic for the same reasons:

https://corpling.uis.georgetown.edu/annis/#_q=cG9zPSJWIiAtPmRlcFtmdW5jPSJvYmw6bnBtb2QiXSBub3Jt&_c=Y29wdGljLnRyZWViYW5r&cl=5&cr=5&s=0&l=10&_seg=bm9ybV9ncm91cA

It has also been our main workaround for the validator rejecting advmod on words not tagged ADV.

nschneid · 2019-12-19T22:21:44Z

I had momentarily forgotten about nmod:npmod. I too find it extremely confusing and it has its own issue: #478

@amir-zeldes, if the nmod subtypes in English are for non-case-bearing nominal modifiers, does that suggest nmod:title would be an appropriate solution for "Mr."? Or would you say that "Mr." is not really nominal, but rather an extremely special/miscellaneous modifier?

amir-zeldes · 2019-12-20T01:54:50Z

It's true that the 'Mr.' modifier doesn't have an adposition, but I think it's different than the extent/spatiotemporal modifiers that typically get obl/nmod:npmod. Those are essentially adverbial, which I guess is what earns them the name 'oblique': 5 years old -> old how much? -> 5 years etc. The 'Mr.' titles are really part of the same NP they modify without instantiating a new morphosyntactic role. In that respect they're similar to appositions, except that appositions consist of two full NPs, and are capable of hosting articles if they are common in English:

The president, Jane Tanaka
Jane Tanaka, the president
a president, Jane Tanaka
Jane Tanaka, a president

Unlike:

Ms. Jane Tanaka
*Jane Tanaka Ms.
President Jane Tanaka
*Jane Tanaka President

Non-appositions can't form two NPs with definiteness properties, etc. I think titles are really a distinct construction.

dan-zeman · 2019-12-20T14:40:46Z

I agree with Joakim that nmod denotes a nominal that modifies another nominal, and it is not important whether the dependent nominal is or is not marked for case.

What is important however is that we have a reason to believe that one of the two nominals is the head and the other is dependent. Because if we don't have the reason, then they should be connected via flat.

As for nmod vs. compound specifically in English, I am not sure how exactly the distinction is or should be drawn. But since compounds are lexical, I don't think they can be used with person names; therefore, compound(Tanaka, Ms.) would seem wrong to me => it should be nmod or flat.

nschneid · 2019-12-20T14:54:06Z

> What is important however is that we have a reason to believe that one of the two nominals is the head and the other is dependent. Because if we don't have the reason, then they should be connected via flat.

Agreed—I think the fact that in "Mr. Smith", "Mr." is optional while "Smith" is not is reason enough to consider "Smith" the head.

> As for nmod vs. compound specifically in English, I am not sure how exactly the distinction is or should be drawn. But since compounds are lexical, I don't think they can be used with person names; therefore, compound(Tanaka, Ms.) would seem wrong to me => it should be nmod or flat.

I agree that compound seems weird in this case, but could you elaborate on what you mean by "compounds are lexical"? In English I'd say that "valley unicorn" is just as compositional as "domestic unicorn" or "unicorn of the valley".

dan-zeman · 2019-12-20T15:05:23Z

> could you elaborate on what you mean by "compounds are lexical"?

A short answer should probably be "no I couldn't" — that's why I say I'm not sure how exactly the distinction should be drawn.

But if I step from exact definitions to vague intuition, then my understanding of nominal compounds is that one takes two common nouns, i.e. words denoting entities with given properties, and creates something that also denotes an entity with given properties, i.e., the result is like a common noun except that it is not written as one word. (Of course, the compound relation in UD is also used for compounds that are not nominal, but that does not seem relevant in this thread.)

Personally, I'd be quite fine if the compound relation were not used for nominal compounds and nmod were used instead, so I'd prefer other people to elaborate on this.

nschneid · 2019-12-20T15:38:13Z

Related to the "two common nouns" criterion, we should also consider combinations of a normal common noun and a normal proper noun (by "normal" I mean a noun that could serve as an NP head, unlike "Mr."):

"the Bush administration": compound(administration, Bush) seems reasonable to me
"the American singer Madonna": compound(Madonna, singer)? This is tougher, a combination of "the American singer" + "Madonna" (two full NPs)—almost an appositive but without punctuation/pause between them. Semantically, it seems to me that "the American singer" is a descriptor that elaborates on "Madonna".

amir-zeldes · 2019-12-20T16:30:06Z

I think "the American singer Madonna" is appos(singer,Madonna), since they are independent NPs, and are also reversible:

Madonna the American singer
* administration the Bush

The criterion for compounding in English is IMO not the fact that they create reference to an entity (so does 'unicorn of the valley'), but that they have the following properties:

- They form a single NP, evidenced by the ability to insert only one article:
  - a/the valley unicorn (single NP, single article; 'valley' is not a full NP)
  - * a/the valley a/the unicorn
  - a/the unicorn of a/the valley (two NPs, with nesting; two articles possible)
- The modifier is no longer referenceable:
  - "... a unicorn of the valley. It (=the valley) was full of grass, so unicorns ..."
  - * "... a valley unicorn. It (=the valley) was full of grass, so unicorns ..."
- Canonically, the modifier cannot be inflected (this has some empirical exceptions and has been discussed extensively, e.g. by Pinker):
  - rat trap(s)
  - * rats trap(s) (only the head is pluralizable)
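The three English criteria above can be recorded as annotator judgments and combined into a compound test. In the sketch below, the judgments are supplied by hand (nothing is inferred from text), and the weighting is an assumption: the single-NP and referenceability tests are treated as required, while inflection is only reported because of the exceptions noted above.

```python
from dataclasses import dataclass


@dataclass
class PairJudgments:
    """Hand-supplied judgments about the MODIFIER in a modifier+head nominal pair."""
    own_article: bool    # can the modifier take its own article? ("*the valley the unicorn")
    referenceable: bool  # can a later pronoun pick up the modifier? ("It (=the valley) ...")
    inflectable: bool    # can the modifier inflect? ("*rats trap"); has known exceptions


def compound_diagnostics(j: PairJudgments) -> dict:
    """Map each criterion to True when it points towards compound."""
    return {
        "single_np": not j.own_article,
        "modifier_not_referenceable": not j.referenceable,
        "modifier_uninflected": not j.inflectable,
    }


def looks_like_compound(j: PairJudgments) -> bool:
    """Assumed weighting: the first two criteria are required; inflection is advisory."""
    d = compound_diagnostics(j)
    return d["single_np"] and d["modifier_not_referenceable"]


valley_unicorn = PairJudgments(own_article=False, referenceable=False, inflectable=False)
unicorn_of_the_valley = PairJudgments(own_article=True, referenceable=True, inflectable=True)
print(looks_like_compound(valley_unicorn))
print(looks_like_compound(unicorn_of_the_valley))
```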

nschneid · 2019-12-20T17:15:39Z

> I think "the American singer Madonna" is appos(singer,Madonna), since they are independent NPs, and are also reversible:
>
> Madonna the American singer

I agree it's apposition-like, but I think there's a difference between "the American singer Madonna" and "the American singer, Madonna"—the second is a prototypical apposition which adds information about a fully established referent. "The American singer Madonna" strikes me as a descriptor-NP + name-NP construction, better paraphrased as "Madonna, the American singer". (Maybe these considerations are too semantic/nuanced, though, and for simplicity we should just pretend that there's an invisible comma.)

amir-zeldes · 2019-12-20T18:25:51Z

I think syntactically it's the same with or without the comma - the alternative is to not have the article, in which case we have the 'title' construction again:

American singer Madonna said today...

But not:

* Madonna American singer said today...

As soon as it can take 'the', it's an independent NP for me, and for me an apposition is two NPs in sequence fulfilling the same syntactic function (so both NPs are equally the subject of 'said' in "The American singer Madonna", in terms of argument structure, or you can postulate an NP containing both NPs)

nschneid · 2019-12-20T19:09:48Z

I'm OK with the appos solution for "the American singer Madonna" if others are.

Getting back to the broader question of how compound is distinguished from nmod in languages like English where there isn't a cue from the morphology/spelling: Do @amir-zeldes's criteria for compound in English sit well with people? Should we document them somewhere?

jnivre · 2019-12-20T19:23:29Z

The criteria look good to me. The key point is that they form a single NP or, rather, that they form a compound nominal head. That is what I meant by saying that it occurs at the lexical level, not the phrasal level.

dan-zeman · 2019-12-20T19:24:26Z

They do not sit well with me because the term "NP" is not defined in UD :-) But it can be probably rephrased so that it is clear what is meant.

I would also appreciate examples from more languages than just English, for the three-way distinction nmod-compound-appos.

amir-zeldes · 2019-12-20T21:03:08Z

I think the criteria will vary somewhat from language to language, but you can get a 3-way distinction even in other language families. For example, in Semitic languages, construct states are often identified as compounds, but while they only allow a single article, the modifier can be inflected. I'll use Hebrew and horses instead of unicorns, since 'unicorn' is itself complex in Hebrew:

emek ha-susim 'the horses valley' (compound, note horses can be pluralized)
*ha-emek ha-susim '*the horses the valley'
ha-emek shel-ha-susim 'the valley of the horses' (nmod, two articles possible)
[ha-emek], [ha-makom shel-ha-susim] 'the valley, the place of the horses' (appos, both full NPs have articles, and the internal nmod does too)

The article+inflection criterion works for Romance too, though often these are spelled together or with a hyphen:

un/le bracelet-montre "a/the wristwatch (watch-bracelet)" (single article)

I think nmod vs. compound is motivated for Chinese and Japanese as well, where compounds are linked without adpositions, but nmod has adpositions, including for possession (de and no respectively).

amir-zeldes · 2019-12-20T21:09:11Z

BTW for Slavic I suspect it's more difficult to make a 3-way distinction, since the (very rare) noun-noun 'compounds' inflect with agreeing case:

Polish:
krem-żel "cream gel"

And in a sentence in genitive context (real example):
to właśnie charakterystyczne cechy kremu-żelu do twarzy
"These are the characteristic features of the face cream.GEN gel.GEN"

So formally it looks a lot like an apposition. But I think it would be impossible to put a demonstrative on both parts:

* tego kremu tego żelu
* "of this cream-this gel"

Here's an authentic example with a single demonstrative:

po użyciu tego kremu-żelu cera jest ukojona...
after using this cream-gel, the complexion is soothed

So maybe the article criterion could even be used for Polish, if these things are treated as multi-token units (this might depend on spelling in practice).

dan-zeman · 2019-12-20T21:25:26Z

If kremu-żelu is tokenized as three tokens, can we say that one of the two lexical tokens is the head? It looks like flat to me.

So I actually forgot flat above, and it is potentially a four-way distinction:

- a noun-noun flat, if there is no clear head
- an nmod, if there are no language-specific criteria to say that it is compound
- a noun-noun compound, if language-specific criteria exist and hold here
- an appos; a language-specific criterion in some languages seems to be that both parts are "full NPs", whatever that means (an article/demonstrative is there or can be added?); I am not sure whether we should have apposition among the relations that "have a clear head" because in fact we always attach the second part to the first, so it is like flat.
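The four-way distinction above can be read as a small routing function over annotator judgments. In this sketch the inputs are hand-supplied judgments, not computed properties, and testing appos before headedness is an assumption, chosen because the list leaves open whether appos belongs among the headed relations. The example judgments are illustrative only.

```python
def nominal_pair_relation(clear_head: bool, both_full_nominals: bool,
                          compound_criteria_hold: bool) -> str:
    """Route a pair of adjacent nominals to flat, nmod, compound, or appos."""
    if both_full_nominals:      # each part can carry its own determiner/demonstrative
        return "appos"
    if not clear_head:          # no reason to prefer either word as head
        return "flat"
    if compound_criteria_hold:  # language-specific compound tests succeed
        return "compound"
    return "nmod"               # headed, but no evidence for compound


# Illustrative judgments: (clear_head, both_full_nominals, compound_criteria_hold).
PAIRS = {
    "Barack Obama, the President": (True, True, False),
    "krem-żel (cream gel)": (False, False, False),
    "taxi driver": (True, False, True),
    "Ms. Jane Tanaka": (True, False, False),
}

for phrase, judgments in PAIRS.items():
    print(phrase, "->", nominal_pair_relation(*judgments))
```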

amir-zeldes · 2019-12-20T21:56:13Z

Yes, I agree the possibility of multiple determiners (including demonstratives in languages without articles) is a good way to distinguish appos from flat and compound.

The krem-żel example is more flat-like than the other compound examples since, insofar as it's a compound at all, it is an example of a copulative compound, like English singer-songwriter (which is both a singer and a songwriter, not a sub-type of songwriter). I think Slavic languages are often described as having little or no compounds, except for the type in N-o-N, which is a single token (e.g. bajkopisarz 'fairy-tale writer')

Typologically, the more frequent type is the determinative compound, where the compound denotes a semantic subtype of its head ("taxi driver", "night table"); this type is clearly headed and not flat.

dan-zeman · 2020-01-06T16:00:53Z

Okay, so what about this.

arademaker · 2020-01-06T16:12:30Z

I didn't know about this page. We need a way to see, on the website, the context of a given page or which pages link to it.

dan-zeman · 2020-01-06T16:14:33Z

I didn't know about this page.

That's because the page did not exist until about twenty minutes ago :-)

amir-zeldes · 2020-01-06T16:39:07Z

I like the draft, but I would remove "in English, the criterion seems to be" (I'd rather specify what it is!), and I would add that in appositions, the two nominals can typically be reordered:

Appos:
Barack Obama, the President
The President, Barack Obama

Not appos:
President Barack Obama
x Barack Obama President

(I started using x instead of a star for ungrammatical examples, to avoid Markdown bullets.)

Also I would consider "Great deals, great pizza!" to be parataxis.

dan-zeman · 2020-01-06T20:09:14Z

I would remove "in English, the criterion seems to be" (I'd rather specify what it is!)

So would I. I just wasn't sure it was correct, but if it is, then I am happy to replace "seems to be" with "is".

dan-zeman · 2020-01-06T21:00:20Z

English rules for apposition modified in 145521a.

Parataxis is another can of worms (it deserves a separate thread if we are to discuss it). I would have preferred to give an example where asyndetic coordination of nominals acts as a subject / object / oblique dependent in a clause, but I did not find a good example in English. (In any case, "Great deals, great pizza!" is annotated as coordination in UD 2.5 English EWT: see the results of http://hdl.handle.net/11346/PMLTQ-BSUB.)

amir-zeldes · 2020-01-07T17:36:14Z

Re appos, my only suggested change to the commit is:

"has its own determiner" > "can have its own determiner"

Thanks for putting this up!

amir-zeldes · 2020-01-07T17:37:57Z

Oh, and regarding the EWT query, yes, but you can easily find the opposite as well (in EWT itself, not even talking about whether that's consistent with other corpora):

http://hdl.handle.net/11346/PMLTQ-ZW7H

For example in EWT:
OK Food, Slow service
parataxis(food,service)

dan-zeman added the English label Dec 12, 2019

dan-zeman added this to the v2.6 milestone Dec 12, 2019

nschneid changed the title ~~PROPN vs. PTB NNP(S)~~ PROPN vs. PTB NNP(S), titles in names, and compound vs. nmod vs. appos Dec 20, 2019

nschneid added this to Names, titles, numbers, values in MWEs, names, adpositions, particles, etc. Dec 24, 2019

dan-zeman added a commit that referenced this issue Jan 6, 2020

Options when connecting two nominals (see also #678).

54a73ca

dan-zeman added a commit that referenced this issue Jan 6, 2020

Better example (now from GUM) of asyndetic coordination for #678.

e2c5218

dan-zeman closed this as completed in 03cbab0 Jan 24, 2020

nschneid mentioned this issue May 6, 2020: Proper name as a lexical feature? #702 (Open)

nschneid mentioned this issue Jul 23, 2020: PROPN or NOUN UniversalDependencies/UD_English-EWT#91 (Closed)

amir-zeldes mentioned this issue Oct 12, 2020: question on tagging of PROPN UniversalDependencies/UD_English-PUD#3 (Open)

nschneid mentioned this issue Jan 17, 2021: amod vs. compound for terms like "hot dog" #756 (Closed)
