Ellipsis in v2 #396

Open
arademaker opened this Issue Jan 12, 2017 · 10 comments

@arademaker
Contributor

From http://universaldependencies.org/v2/ellipsis.html and http://universaldependencies.org/format.html it seems that null nodes can only be referenced in the enhanced dependencies field of CoNLL-U files. Is that right? Why can't we mention them in the regular HEAD field? We liked the solution of null nodes for ellipsis, but we would like to avoid making it harder for tools to deal with the enhanced dependencies.

@arademaker arademaker referenced this issue in own-pt/bosque-UD Jan 12, 2017
Closed

remnant removed in v2 #125

@sebschu
Member
sebschu commented Jan 12, 2017

Yes, that is correct. The reason behind this is that one of our core principles is that basic UD trees should always be strict surface syntax trees and we would have to give up this principle if we allowed empty nodes in the basic representation.

Can you say more about why this makes it harder for tools to deal with the enhanced UD graphs? Or do you mean that it makes things more complicated if we have different treatments of ellipsis in the basic and the enhanced representation?
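To make the constraint concrete, here is a minimal Python sketch of a check a tool could run: empty nodes (IDs such as `5.1`) may be used as heads in DEPS but never in HEAD. The sample rows, the function names `check_basic_tree` and `enhanced_heads`, and the error messages are assumptions made for this illustration and are not taken from any UD tool.

```python
# Illustrative sentence "Sue likes coffee and Bill tea" with one empty node (5.1).
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
SAMPLE_ROWS = [
    ["1", "Sue", "Sue", "PROPN", "_", "_", "2", "nsubj", "2:nsubj", "_"],
    ["2", "likes", "like", "VERB", "_", "_", "0", "root", "0:root", "_"],
    ["3", "coffee", "coffee", "NOUN", "_", "_", "2", "obj", "2:obj", "_"],
    ["4", "and", "and", "CCONJ", "_", "_", "5", "cc", "5.1:cc", "_"],
    ["5", "Bill", "Bill", "PROPN", "_", "_", "2", "conj", "5.1:nsubj", "_"],
    ["5.1", "likes", "like", "VERB", "_", "_", "_", "_", "2:conj", "_"],
    ["6", "tea", "tea", "NOUN", "_", "_", "5", "orphan", "5.1:obj", "_"],
]


def check_basic_tree(rows):
    """Return a list of problems where the basic tree touches an empty node."""
    problems = []
    for cols in rows:
        token_id, head, deprel = cols[0], cols[6], cols[7]
        if "." in token_id:
            # An empty node takes no part in the basic tree.
            if head != "_" or deprel != "_":
                problems.append(
                    f"empty node {token_id} must have '_' in HEAD and DEPREL"
                )
        elif "." in head:
            problems.append(
                f"token {token_id} uses empty node {head} as its basic HEAD"
            )
    return problems


def enhanced_heads(cols):
    """Return the head IDs listed in the DEPS column of one row."""
    deps = cols[8]
    if deps == "_":
        return []
    return [item.split(":", 1)[0] for item in deps.split("|")]


if __name__ == "__main__":
    print(check_basic_tree(SAMPLE_ROWS))
    print(enhanced_heads(SAMPLE_ROWS[6]))
```

In this sketch the basic tree uses `orphan` to attach "tea" to "Bill", while the enhanced graph attaches both to the empty node `5.1`.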

@arademaker
Contributor

Thank you @sebschu. Well, I have just learned on the mailing list that the page I mentioned is not part of the documentation but only an archived discussion. We also have a similar confusion with the two pages about the format:

  1. http://universaldependencies.org/format.html
  2. http://universaldependencies.org/v2/conll-u.html

About the tools: what I mean is that most tools will not pay much attention to the DEPS field. I suspect that we must treat this field, with the enhanced dependency graph, as an alternative, but the corpora must always have the best possible analysis in the basic dependencies, right?

@jnivre
Contributor
jnivre commented Jan 12, 2017

Anything that starts with http://universaldependencies.org/v2/ is only archival. Sorry for the confusion, we will figure out a way to make it clear. Please only consult the documentation that is at the top level directly under http://universaldependencies.org/.

@dan-zeman
Member

Moving the e-mail discussion about precedence of orphans here:

@arademaker :

In the discussion page http://universaldependencies.org/v2/ellipsis.html we have

The second alternative preserves the integrity of the second conjunct as a single subtree by (arbitrarily) promoting one of the orphans to the (subclause) root and attaching the others with a new dummy relation orphan.

The arbitrary way of selecting the node to promote puzzles me. In the final v2 spec pages, nothing more is said about how the node is chosen:
http://universaldependencies.org/u/overview/specific-syntax.html#ellipsis
http://universaldependencies.org/u/dep/orphan.html

Does anyone have a specific suggestion for the selection? Is it really irrelevant?

@dan-zeman :

Thanks for pointing this out. It is no less relevant than maintaining consistency with other technical relations, such as flat or fixed. I believe the intention here was to promote the first "real" orphan to the head position and attach the others to it, i.e. orphan relations always go left to right (Joakim, please correct me if I do not remember it right). This comes with the provision that some orphaned dependents, such as the coordinating conjunction, do not count as "real" orphans for this rule.

@jnivre :

Yes, we need more work on this. My own view is that it should be a content word (if possible) and one of the arguments of the elided predicate. So in the standard gapping case, the subject or object of the elided verb are the main candidates, while the coordinating conjunction and other function words are better treated as having their ordinary relations to the promoted head. When it comes to choosing among possible candidates, my ideas are less clear. Linear order is definitely a possibility, but one could always think of alternatives such as appealing to the obliqueness hierarchy, which may lead to more consistent analyses for languages with freer word order.

@dan-zeman dan-zeman added this to the lg-specific v2 milestone Jan 13, 2017
@dan-zeman
Member

@jnivre : I guess that the obliqueness hierarchy, if employed, would look something like this (not sure about the placement of the clausal dependents but they are more likely to have arguments of their own, so placing them lower might make the result more readable):

nsubj > obj > iobj > obl > advmod > csubj > xcomp > ccomp > advcl
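As a rough illustration of how such a ranking could be applied, here is a minimal Python sketch that picks the orphan to promote. It is only one of the options discussed in this thread, not an adopted guideline. The function name `choose_promoted`, the tuple layout, and the tie-break by linear order are assumptions made for the example; dependents whose relation is not in the hierarchy (such as `cc`) are skipped, in line with the idea that function words keep their ordinary relations.

```python
# Ranking taken from the hierarchy above; a lower index means promoted first.
OBLIQUENESS = [
    "nsubj", "obj", "iobj", "obl", "advmod",
    "csubj", "xcomp", "ccomp", "advcl",
]
RANK = {rel: index for index, rel in enumerate(OBLIQUENESS)}


def base_relation(deprel):
    """Strip a language-specific subtype, e.g. 'nsubj:pass' -> 'nsubj'."""
    return deprel.split(":", 1)[0]


def choose_promoted(orphans):
    """Pick the orphan to promote to the head position.

    `orphans` is a list of (position, form, deprel) tuples, where deprel is
    the relation the word would have had to the elided predicate.
    Returns the chosen tuple, or None if no dependent is in the hierarchy.
    """
    candidates = [o for o in orphans if base_relation(o[2]) in RANK]
    if not candidates:
        return None
    # Rank by the hierarchy first, then by linear order (assumed tie-break).
    return min(candidates, key=lambda o: (RANK[base_relation(o[2])], o[0]))


if __name__ == "__main__":
    # Second conjunct of "Sue likes coffee and Bill tea".
    gapping = [(4, "and", "cc"), (5, "Bill", "nsubj"), (6, "tea", "obj")]
    print(choose_promoted(gapping))
```

With a purely left-to-right rule, an oblique that precedes the object would be promoted instead; under this ranking the object wins regardless of word order.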

@livyreal
livyreal commented Jan 13, 2017 edited

In sentences where the conjunct elements are present in both clauses (the head/ROOT of the sentences), I'd prefer to relate them and then attach all the orphans to the second element of the conjunction:

"The total value is 50 million and the deficit, 40 million" (translated from a Portuguese example)
I would like to have:
conj(million1, million2),
orphan(deficit, million2)
det(the, deficit)
nummod(40, million2)

How does it sound?

This is a problematic sentence, because it is a copular sentence, but still we have an ellipsis here... I'm asking myself how to treat those cases.

@jnivre
Contributor
jnivre commented Jan 13, 2017

A conj link from "million1" to "million2" is fine, but there should be no "orphan" link here, because the omitted copula is not the root of the clause. The analysis should be:

nsubj(million1, value)
cop(million1, is)
conj(million1, million2)
cc(million2, and)
nsubj(million2, deficit)

@dan-zeman
Member

We usually use the deprel(parent, child) notation, i.e. the head word is mentioned first in the brackets.

But even if I reverse your notation, I would not do what you propose. This is a non-verbal predicate situation, which can occur (cross-linguistically) with or without a copula. I understand that using the copula is the norm in Portuguese and that it has been elided here, but it is just a missing function word. Its omission does not change the fact that both millions are predicates and that value and deficit, respectively, are their subjects. Therefore I would do

nsubj(million1, value)
conj(million1, million2)
cc(million2, and)
nsubj(million2, deficit)
det(deficit, the)
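For readers who want to check analyses written in this deprel(parent, child) notation, here is a small Python sketch that parses the lines above into a child-to-head map and confirms that no word receives two heads. The function names `parse_relations` and `head_map` are assumptions made for this illustration, and the analysis covers only the relations listed above, not every word of the sentence.

```python
import re

# The relations listed above, in deprel(parent, child) notation.
ANALYSIS = """\
nsubj(million1, value)
conj(million1, million2)
cc(million2, and)
nsubj(million2, deficit)
det(deficit, the)
"""

# deprel ( parent , child ), with optional whitespace around each part.
PATTERN = re.compile(r"^\s*([\w:]+)\(\s*([^,]+?)\s*,\s*([^)]+?)\s*\)\s*$")


def parse_relations(text):
    """Return a list of (deprel, parent, child) triples."""
    triples = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = PATTERN.match(line)
        if match is None:
            raise ValueError(f"cannot parse relation: {line!r}")
        triples.append((match.group(1), match.group(2), match.group(3)))
    return triples


def head_map(triples):
    """Map each child to (parent, deprel); reject a child with two heads."""
    heads = {}
    for deprel, parent, child in triples:
        if child in heads:
            raise ValueError(f"{child!r} has more than one head")
        heads[child] = (parent, deprel)
    return heads


if __name__ == "__main__":
    for child, (parent, deprel) in head_map(parse_relations(ANALYSIS)).items():
        print(f"{child} <-{deprel}- {parent}")
```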

@livyreal

Thanks a lot, @dan-zeman and @jnivre.

I was thinking of having an orphan relation to (always) mark when an ellipsis (of a core element, at least) occurs, but following the guidelines ("If the elided element has no overt dependents, we do nothing."), it is clear that this is not the case. Wouldn't it be a good idea to have a treatment that marks all cases of ellipsis, rather than one that only helps to keep the syntax level working?

@jnivre
Contributor
jnivre commented Jan 16, 2017

It is a trade-off. There are lots and lots of things that could potentially be useful, but in order to satisfy its goal of being a cross-linguistically consistent, easily understandable syntactic annotation, UD cannot include all of them but has to put priority on basic syntactic relations. But you can always add it yourself in the MISC field or using standoff annotation.
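As one possible way to do this, here is a minimal Python sketch that adds a key=value attribute to a CoNLL-U MISC value, where attributes are separated by `|` and `_` means empty. The key `Ellipsis` and the helper name `add_misc_attribute` are hypothetical, chosen only for this example; UD does not define such an attribute.

```python
def add_misc_attribute(misc, key, value):
    """Return a MISC string with key=value added (replacing an old value)."""
    pairs = [] if misc == "_" else misc.split("|")
    # Drop any existing entry for the same key so it is not duplicated.
    pairs = [p for p in pairs if p.split("=", 1)[0] != key]
    pairs.append(f"{key}={value}")
    return "|".join(pairs)


if __name__ == "__main__":
    # "Ellipsis" is a hypothetical, treebank-specific key.
    print(add_misc_attribute("_", "Ellipsis", "Yes"))
    print(add_misc_attribute("SpaceAfter=No", "Ellipsis", "Yes"))
```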
