Copula: VERB vs. AUX #275

Closed
dan-zeman opened this issue Apr 11, 2016 · 20 comments
Labels
standard needed universal UPOS Universal part-of-speech tags: definitions and examples
Milestone

Comments

@dan-zeman
Member

Since the first release of the UD guidelines in October 2014, copula verbs have been tagged VERB and not AUX. But the stance was not unanimous in the core UD group. Should we revise the decision for version 2 of the guidelines?

The discussion at that time took place by e-mail. I am going to post the relevant messages here so we have a basis for further arguments.

@dan-zeman dan-zeman added standard needed UPOS Universal part-of-speech tags: definitions and examples universal labels Apr 11, 2016
@dan-zeman dan-zeman added this to the universal v2 milestone Apr 11, 2016
@dan-zeman
Member Author

@jnivre (15.9.2014): AUX: Part of the definition of AUX says that it should be associated with a lexical verb. This would seem to imply that the copula is not an auxiliary verb (whereas the use of “be” to form the progressive form and passive voice is). Is everyone happy with this? I have also suggested that an exact definition of AUX (for languages that have auxiliaries at all) should be included in the language-specific documentation.

@dan-zeman
Member Author

@dan-zeman (17.9.2014): I agree that the copula is not the same as an auxiliary, and it is not a normal verb either. Since we do not have a dedicated tag for copulas, we have to put them somewhere, i.e. either VERB or AUX. The same holds for modal verbs. I have no strong preference here but I am slightly inclined to keep copulas among main verbs and modals among auxiliaries.

@jnivre (17.9.2014): Thanks, Dan. I share your view concerning the VERB/AUX distinction (exclude copula, include modals)

@dan-zeman
Member Author

@manning (26.9.2014): Hi Joakim and everyone,

While generally agreeing with what you’ve been writing up for principles, you put in:

Note that copula verbs, despite being dependents of their predicates, are treated as main verbs in this respect and take auxiliaries as dependents. The general rule is that an auxiliary should always attach to a verb (if there is one).

This isn’t what we’ve been doing. We’ve been more radically content-word-as-head than that, and would have

nsubj(sick, She)
aux(sick, could)
aux(sick, have)
aux(sick, been)

I suspect that this is actually the better way to go, at least for English. This is especially the case for participles. There are the usual decades of linguistic literature arguing that participles sometimes function as adjectives, indicated by things like being gradable, negatable with ‘un’, etc.:

The book is interesting, The book was quite interesting, The book was uninteresting

But for others there is evidence that they are verbs, such as the ability to take the prefix ‘re-’:

The book is decaying, The book is repositioning the status of women.

But, as the Penn Treebank tagging manual notes, in practice “The distinction between adjectives and gerunds/present participles is often very difficult to make.” An example that is done both ways is “will be destabilizing”.

This way of doing things would mean that the dependency structure would change depending on which way you voted on the part of speech. While it’s a shame that the part of speech is often assigned inconsistently/arbitrarily in various situations, it seems to me a much bigger problem if that also changes the skeleton of the dependency tree.

So, unless there are compelling reasons to do otherwise, I’d argue for doing it the way we’ve been doing it…. Is there a good reason to do things this way?
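For reference, the two attachment schemes contrasted in this message can be written down as plain edge lists for "She could have been sick". This is only an illustrative sketch: the `dependents` helper and the variable names are made up here, and the relation labels for the second analysis simply mirror the list in the e-mail (UD itself would label the copula edge `cop`).

```python
# Each edge is (relation, head, dependent), i.e. rel(head, dependent).

# Analysis 1 (draft guideline text): auxiliaries attach to the copula verb.
AUX_ON_COPULA = [
    ("nsubj", "sick", "She"),
    ("cop", "sick", "been"),
    ("aux", "been", "could"),
    ("aux", "been", "have"),
]

# Analysis 2 (content word as head): every function word attaches to the predicate.
# Labels follow the e-mail above; UD would use "cop" for "been".
AUX_ON_PREDICATE = [
    ("nsubj", "sick", "She"),
    ("aux", "sick", "could"),
    ("aux", "sick", "have"),
    ("aux", "sick", "been"),
]


def dependents(edges, head):
    """Return the dependents of `head`, in the order the edges are listed."""
    return [dep for _rel, h, dep in edges if h == head]


if __name__ == "__main__":
    for name, edges in [("aux on copula", AUX_ON_COPULA),
                        ("aux on predicate", AUX_ON_PREDICATE)]:
        print(name, "| been ->", dependents(edges, "been"),
              "| sick ->", dependents(edges, "sick"))
```

In the second analysis the copula has no dependents of its own and is a sibling of the auxiliaries, so the tree skeleton does not depend on whether the predicate is tagged as an adjective or a participle.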

@dan-zeman
Member Author

@jnivre (27.9.2014): On second thought, I tend to agree with @manning. Always attaching to a verb seemed like a nice idea at the time, but always attaching to the predicate is probably better in the long run. I think the Finnish treebank attaches auxiliaries to copulas, but this is probably because they use a nested structure for auxiliaries in general.

@dan-zeman
Member Author

@fginter (29.9.2014): This is in line with the rest of the decisions, so for consistency reasons I'm okay(-ish) with it and will rehang the auxiliaries in TDT. But since this is not how we did it in TDT, I suppose it says we think the earlier analysis was more in agreement with our intuition. Especially since the auxiliaries can never be there without the main copula verb, they seemed to be clearly bound to it.

@jnivre
Contributor

jnivre commented Apr 11, 2016

It seems to me that the logical conclusion of this discussion should have been that copulas are AUX because they are treated structurally as auxiliaries (not taking auxiliaries themselves, and being siblings of auxiliaries). Unless there are strong arguments to the contrary, I would therefore advocate this change for v2.

@dan-zeman: Thanks for digging this discussion out of your email archive and posting it on github.

@manning
Contributor

manning commented Apr 24, 2016

@jnivre: I also see how making copulas AUX seems more consistent with the predicate-as-head structures we assign in cases like "She is smart".

@nschneid
Contributor

While I see the structural argument that copulas are like auxiliaries, I worry that expanding the definition of the AUX tag will confuse people used to the traditional definition of auxiliary as a function word that accompanies a main verb. (Is there a standard term that covers verbal function words? I think "support verb" is in the right ballpark but not quite right.)

If we want to use the POS tagset to express that copulas are not-quite-main-verbs, why not introduce a new tag (COP) and remove all confusion? Then again, if the distinction is already being marked in the dependency relation (aux vs. cop), why do we need the tag distinction at all—why not just call them all verbs?

@dan-zeman
Member Author

One thing that I like about copulas becoming AUX is that the ambiguity between the periphrastic passive on the one hand and copula+participle on the other will only affect the deprel and not the POS tag. (Example: "the contract was signed after lunch" ... auxpass(signed, was) vs. "the contract is signed on the last page" ... could be cop(signed, is), because the phrase describes the state of the contract rather than the act of signing.)
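To make the point concrete, here is a small sketch of how the UPOS tag of "be" would depend on the chosen deprel under the current convention and under the proposed one. The function name `upos_of_be` and the `copula_as_aux` flag are invented for this illustration; they are not part of any UD tool.

```python
def upos_of_be(deprel, copula_as_aux):
    """Return the UPOS tag of a form of "be" given its dependency relation.

    copula_as_aux=False models the current convention (copula tagged VERB);
    copula_as_aux=True models the proposal discussed in this issue.
    """
    if deprel == "auxpass":
        # Passive auxiliary: AUX under both conventions.
        return "AUX"
    if deprel == "cop":
        return "AUX" if copula_as_aux else "VERB"
    raise ValueError(f"unexpected relation for 'be': {deprel}")


if __name__ == "__main__":
    for flag in (False, True):
        tags = {rel: upos_of_be(rel, flag) for rel in ("auxpass", "cop")}
        print("copula_as_aux =", flag, tags)
```

Under the current convention, an annotator who hesitates between the passive and the stative reading changes both the tag and the relation; under the proposal only the relation changes.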

Thus I would not be happy with introducing a COP tag. Of course, removing the AUX tag and keeping only VERB would solve this particular issue as well. I think I could live without AUX, but I believe there were people who found this distinction important (the tag was not present in the original Google universal tag set but it was added shortly before the UD project started).

@jnivre
Contributor

jnivre commented Jul 14, 2016

I would definitely rather remove the AUX tag than add a COP tag. The AUX/VERB distinction is great for parsing, but only if the tagger gets it right. For Swedish there is a 5 percent absolute difference in parsing accuracy between using gold and predicted tags, which is due almost exclusively to the AUX/VERB distinction.

@nschneid
Contributor

For Swedish there is a 5 percent absolute difference in parsing accuracy between using gold and predicted tags, which is due almost exclusively to the AUX/VERB distinction.

What happens if AUX and VERB are collapsed into one tag—is the parser able to learn the distinction?

@jnivre
Contributor

jnivre commented Jul 14, 2016

Sort of but not quite. When you go from gold to predicted tags with the AUX/VERB distinction, you lose 4-5 percentage points. If you then go to predicted tags without the distinction, you lose another 0.5-1 points (if I remember correctly). Then again, this is only evidence from a single language. It would be interesting to know what happens in other languages.

@nschneid
Contributor

Interesting. Is the POS tagger trained on the same data as the parser? If so, it's curious that it can capture the distinction a bit better.

@nschneid
Contributor

http://aclweb.org/anthology/W/W16/W16-1202.pdf, Table 6 might be relevant here: I think the first 4 rows are with gold POS, τ_o = original POS, τ_a = ambiguous (collapsed AUX into VERB). Collapsed tags drop performance by 1.1 points in Slovenian and 0.4 points in Czech.

@jnivre
Contributor

jnivre commented Jul 14, 2016

Yes, they are with gold tags.

For Swedish POS tagging, the situation is complex. Most Swedish taggers are trained on the Stockholm-Umeå Corpus, which is 10 times larger than the treebank (1M vs. 100K tokens), but this corpus does not make the AUX/VERB distinction. So we have to train a second tagger on the treebank itself, but we only trust the second tagger for the AUX/VERB distinction (and only apply it if the first tagger has tagged a word as VERB). To further improve tagging accuracy for this distinction, we apply a few hand-crafted heuristics to the output of both taggers, which rely on the fact that Swedish syntax pretty much always requires the main verb to go after the auxiliary(ies). With this combined system, we achieve over 90% accuracy on the AUX/VERB distinction, but we still observe a 4-5 percent drop compared to gold tags, showing that the few cases it gets wrong lead to really bad parses (essentially because the root dependency is wrong and so many other dependencies depend on this). Finally, you have to remember that these are results for a greedy transition-based parser. Other parsers might behave differently.
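The two-tagger setup described here could look roughly like the sketch below. It is not the actual Swedish pipeline: the function names are hypothetical, and `main_verb_last` is a deliberately crude stand-in for the hand-crafted heuristics, which are not spelled out in this thread. It assumes only that the first tagger emits VERB for all verbal tokens and that the second tagger emits AUX or VERB.

```python
VERBAL = {"AUX", "VERB"}


def combine_tags(coarse_tags, fine_tags):
    """Merge the outputs of two taggers token by token.

    coarse_tags: from the tagger trained on the large corpus (no AUX/VERB split).
    fine_tags:   from the tagger trained on the treebank.
    The fine tagger is trusted only for AUX vs. VERB, and only on tokens
    that the coarse tagger already tagged as VERB.
    """
    if len(coarse_tags) != len(fine_tags):
        raise ValueError("tag sequences must have the same length")
    merged = []
    for coarse, fine in zip(coarse_tags, fine_tags):
        if coarse == "VERB" and fine in VERBAL:
            merged.append(fine)
        else:
            merged.append(coarse)
    return merged


def main_verb_last(tags):
    """Hypothetical heuristic: in an unbroken run of two or more verbal
    tokens, tag the last one VERB and all earlier ones AUX."""
    fixed = list(tags)
    i = 0
    while i < len(fixed):
        if fixed[i] not in VERBAL:
            i += 1
            continue
        j = i
        while j + 1 < len(fixed) and fixed[j + 1] in VERBAL:
            j += 1
        if j > i:
            for k in range(i, j):
                fixed[k] = "AUX"
            fixed[j] = "VERB"
        i = j + 1
    return fixed


if __name__ == "__main__":
    coarse = ["PRON", "VERB", "VERB", "NOUN"]
    fine = ["NOUN", "VERB", "AUX", "NOUN"]  # fine tagger errs on tokens 0-2
    merged = combine_tags(coarse, fine)
    print(merged)
    print(main_verb_last(merged))
```

A real heuristic would have to cope with material intervening between auxiliary and main verb (adverbs, negation, the subject in inverted clauses), which this contiguous-run version ignores.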

By the way, I think we are digressing from the original issue, so if you want to continue discussing tagging and parsing I suggest we go off line. :)

@amir-zeldes
Contributor

Regarding the tag issue for copulas, I'd like to point out that in many languages they are not verbs at all:

  • Arabic huwa etc.: personal pronoun
  • Hebrew ze, Polish to, Russian eto: demonstrative
  • Japanese da/desu: idiosyncratic semi-verb-like thing(??)
  • Coptic, Hausa: idiosyncratic semi-demonstrative-like thing

The list goes on... I think the cop deprel captures what all of these do quite well, but categorizing them all as a POS tag VERB will probably be less than ideal for many of the cases.

@jnivre
Contributor

jnivre commented Jul 14, 2016

Good point. They should obviously not be tagged as either VERB or AUX if they are not verbs at all. But this doesn't resolve the issue of what to do with languages where they are verbs.

@fginter
Member

fginter commented Jul 14, 2016

If you go offline don't drop @fginter :)

@nschneid
Contributor

Is there a more natural name we could use if AUX is broadened to include copular verbs? I don't necessarily object to giving them the same tag, I just worry that the name AUX will confuse people if it includes the copula, which is traditionally considered a main verb (at least in English grammar). In my mind, "auxiliary" means a verbal word that accompanies the main verb.

"Function verb" is the best term I can think of, but I bet somebody else can do better.

@spyysalo
Member

Closing as there is no recent activity and the v2 guidelines are now being published. Please consider opening a new issue with reference to the new guidelines and this discussion if there are open questions relating to this issue.


7 participants