Copula: VERB vs. AUX #275
Comments
@jnivre (15.9.2014): AUX: Part of the definition of AUX says that it should be associated with a lexical verb. This would seem to imply that the copula is not an auxiliary verb (whereas the use of “be” to form the progressive form and passive voice is). Is everyone happy with this? I have also suggested that an exact definition of AUX (for languages that have auxiliaries at all) should be included in the language-specific documentation.
@dan-zeman (17.9.2014): I agree that the copula is not the same as an auxiliary, and it is not a normal verb either. Since we do not have a dedicated tag for copulas, we have to put them somewhere, i.e., either VERB or AUX. The same holds for modal verbs. I have no strong preference here, but I am slightly inclined to keep copulas among main verbs and modals among auxiliaries.
@jnivre (17.9.2014): Thanks, Dan. I share your view concerning the VERB/AUX distinction (exclude copula, include modals).
@manning (26.9.2014): Hi Joakim and everyone, While generally agreeing with what you’ve been writing up for principles, you put in: “Note that copula verbs, despite being dependents of their predicates, are treated as main verbs in this respect and take auxiliaries as dependents. The general rule is that an auxiliary should always attach to a verb (if there is one).” This isn’t what we’ve been doing. We’ve been more radically content-word-as-head than that, and would have nsubj(sick, She). I suspect that this is actually the better way to go, at least for English. This is especially the case for participles. There are the usual decades of linguistic literature arguing that participles sometimes function as adjectives, indicated by things like being gradable, negatable with ‘un’, etc.:
But for others there is evidence of them being verbs, such as the ability to take the prefix ‘re-’.
But, as the Penn Treebank tagging manual notes, in practice “The distinction between adjectives and gerunds/present participles is often very difficult to make.” An example that is done both ways is “will be destabilizing”. The way of doing things in the guidelines would mean that the dependency structure changes depending on which way you voted on the part of speech. While it’s a shame that the part of speech is often assigned inconsistently or arbitrarily in various situations, it seems to me a much bigger problem if that also changes the skeleton of the dependency tree. So, unless there are compelling reasons to do otherwise, I’d argue for doing it the way we’ve been doing it. Is there a good reason to do things the way the guidelines have it?
@jnivre (27.9.2014): On second thought, I tend to agree with @manning. Always attaching to a verb seemed like a nice idea at the time, but always attaching to the predicate is probably better in the long run. I think the Finnish treebank attaches auxiliaries to copulas, but this is probably because they use a nested structure for auxiliaries in general.
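To make the two attachment options concrete, here is a minimal Python sketch. The sentence "She will be sick", the tuple layout, and the helper name `function_verbs_share_head` are assumptions made for illustration and are not taken from the guidelines; only nsubj(sick, She) comes from the discussion above.

```python
# Hypothetical token records: (id, form, head, deprel); head 0 marks the root.

# Predicate-as-head analysis: the auxiliary and the copula are siblings under "sick".
PREDICATE_AS_HEAD = [
    (1, "She", 4, "nsubj"),
    (2, "will", 4, "aux"),
    (3, "be", 4, "cop"),
    (4, "sick", 0, "root"),
]

# Alternative analysis: the auxiliary attaches to the copula verb instead.
AUX_ON_COPULA = [
    (1, "She", 4, "nsubj"),
    (2, "will", 3, "aux"),
    (3, "be", 4, "cop"),
    (4, "sick", 0, "root"),
]


def function_verbs_share_head(tokens):
    """Return True if every aux/cop token attaches directly to the root predicate."""
    root_ids = {tid for tid, _, head, _ in tokens if head == 0}
    return all(
        head in root_ids
        for _, _, head, rel in tokens
        if rel in {"aux", "cop"}
    )
```

Under the first analysis the skeleton of the tree stays the same whether "sick" (or a participle in its place) is tagged as an adjective or a verb, which is the point raised by @manning.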
@fginter (29.9.2014): This is in line with the rest of the decisions, so for consistency
It seems to me that the logical conclusion of this discussion should have been that copulas are AUX because they are treated structurally as auxiliaries (not taking auxiliaries themselves, and being siblings of auxiliaries). Unless there are strong arguments to the contrary, I would therefore advocate this change for v2. @dan-zeman: Thanks for digging this discussion out of your email archive and posting it on GitHub.
@jnivre: I also see how making copulas AUX seems more consistent with the predicate-as-head structures we assign in cases like "She is smart".
While I see the structural argument that copulas are like auxiliaries, I worry that expanding the definition of the If we want to use the POS tagset to express that copulas are not-quite-main-verbs, why not introduce a new tag (
One thing that I like about copulas becoming Thus I would not be happy with introducing a
I would definitely rather remove the AUX tag than add a COP tag. The AUX/VERB distinction is great for parsing, but only if the tagger gets it right. For Swedish there is a 5 percent absolute difference in parsing accuracy between using gold and predicted tags, which is due almost exclusively to the AUX/VERB distinction.
What happens if
Sort of but not quite. When you go from gold to predicted tags with the AUX/VERB distinction, you lose 4-5 percentage points. If you then go to predicted tags without the distinction, you lose another 0.5-1 points (if I remember correctly). Then again, this is only evidence from a single language. It would be interesting to know what happens in other languages.
Interesting. Is the POS tagger trained on the same data as the parser? If so, it's curious that it can capture the distinction a bit better.
http://aclweb.org/anthology/W/W16/W16-1202.pdf, Table 6 might be relevant here: I think the first 4 rows are with gold POS, τ_o = original POS, τ_a = ambiguous (collapsed AUX into VERB). Collapsed tags drop performance by 1.1 points in Slovenian and 0.4 points in Czech.
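For readers unfamiliar with the "collapsed" condition, the following is a rough Python sketch of what merging AUX into VERB amounts to. The function name `collapse_aux` and the example tag sequence are assumptions for illustration; the cited paper may implement the mapping differently.

```python
def collapse_aux(tags):
    """Map every AUX tag to VERB and leave all other tags unchanged."""
    return ["VERB" if tag == "AUX" else tag for tag in tags]


# Hypothetical tag sequence for a sentence such as "She has slept".
original_tags = ["PRON", "AUX", "VERB"]
collapsed_tags = collapse_aux(original_tags)
```

After the mapping the parser can no longer tell from the tag alone which of the two verbs is the main predicate, which is one plausible source of the reported drop.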
Yes, they are with gold tags. For Swedish POS tagging, the situation is complex. Most Swedish taggers are trained on the Stockholm-Umeå Corpus, which is 10 times larger than the treebank (1M vs. 100K tokens), but this corpus does not make the AUX/VERB distinction. So we have to train a second tagger on the treebank itself, but we only trust the second tagger for the AUX/VERB distinction (and only apply it if the first tagger has tagged a word as VERB). To further improve tagging accuracy for this distinction, we apply a few hand-crafted heuristics to the output of both taggers, which rely on the fact that Swedish syntax pretty much always requires the main verb to go after the auxiliary or auxiliaries. With this combined system, we achieve over 90% accuracy on the AUX/VERB distinction, but we still observe a 4-5 percent drop compared to gold tags, showing that the few cases it gets wrong lead to really bad parses (essentially because the root dependency is wrong and so many other dependencies depend on this). Finally, you have to remember that these are results for a greedy transition-based parser. Other parsers might behave differently. By the way, I think we are digressing from the original issue, so if you want to continue discussing tagging and parsing I suggest we go offline. :)
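The two-tagger combination described above could be sketched roughly as follows. This is not the actual Swedish pipeline: the function name `combine_tags`, the list-based interface, and the fallback when the second tagger proposes something other than AUX or VERB are all assumptions, and the hand-crafted word-order heuristics are left out because the comment does not spell them out.

```python
def combine_tags(coarse_tags, treebank_tags):
    """Refine VERB tags from the first tagger using the second tagger's AUX/VERB choice.

    coarse_tags: output of the tagger trained on the large corpus (no AUX tag).
    treebank_tags: output of the tagger trained on the treebank (has AUX).
    """
    if len(coarse_tags) != len(treebank_tags):
        raise ValueError("both taggers must tag the same tokens")

    combined = []
    for coarse, fine in zip(coarse_tags, treebank_tags):
        # Trust the second tagger only for words the first tagger calls VERB,
        # and only for the AUX/VERB decision (assumed fallback: keep VERB).
        if coarse == "VERB" and fine in {"AUX", "VERB"}:
            combined.append(fine)
        else:
            combined.append(coarse)
    return combined
```

For example, with first-tagger output `["PRON", "VERB", "VERB", "NOUN"]` and second-tagger output `["PRON", "AUX", "VERB", "AUX"]`, only the second token is changed to AUX; the spurious AUX on the noun is ignored.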
Regarding the tag issue for copulas, I'd like to point out that in many languages they are not verbs at all:
The list goes on... I think the
Good point. They should obviously not be tagged as either VERB or AUX if they are not verbs at all. But this doesn't resolve the issue of what to do with languages where they are verbs.
If you go offline don't drop @fginter :)
Is there a more natural name we could use if AUX is broadened to include copular verbs? I don't necessarily object to giving them the same tag, I just worry that the name AUX will confuse people if it includes the copula, which is traditionally considered a main verb (at least in English grammar). In my mind, "auxiliary" means a verbal word that accompanies the main verb. "Function verb" is the best term I can think of, but I bet somebody else can do better.
Closing as there is no recent activity and the v2 guidelines are now being published. Please consider opening a new issue with reference to the new guidelines and this discussion if there are open questions relating to this issue.
Since the first release of the UD guidelines in October 2014, copula verbs have been tagged VERB and not AUX. But the stance was not unanimous in the core UD group. Should we revise the decision for version 2 of the guidelines?
The discussion at that time took place by e-mail. I am going to post the relevant messages here so we have a basis for further arguments.