
Ellipsis in v2 #396

Closed
arademaker opened this issue Jan 12, 2017 · 14 comments

Comments

@arademaker
Contributor

From http://universaldependencies.org/v2/ellipsis.html and http://universaldependencies.org/format.html, it seems that null nodes can only be referenced in the enhanced dependencies (DEPS) field of CoNLL-U files. Is that right? Why can't we reference them in the regular HEAD field? We liked the null-node solution for ellipsis, but we would like to avoid making it harder for tools, which would then have to deal with the enhanced dependencies.

@sebschu
Member

sebschu commented Jan 12, 2017

Yes, that is correct. The reason behind this is that one of our core principles is that basic UD trees should always be strict surface syntax trees and we would have to give up this principle if we allowed empty nodes in the basic representation.

Can you say more about why this makes it harder for tools to deal with the enhanced UD graphs? Or do you mean that it makes things more complicated if we have different treatments of ellipsis in the basic and the enhanced representation?
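To make the constraint concrete, here is a minimal Python sketch of the rule sebschu describes: in a CoNLL-U sentence, an empty node (a decimal ID such as `6.1`) may be used as a head only in the DEPS column, never in the basic HEAD column. The function names, the illustrative gapping sentence, and its enhanced annotation are assumptions for illustration; this is not the official UD validator.

```python
# Minimal sketch, not the official UD validator: checks only the rule that
# empty nodes (decimal IDs such as "6.1") never take part in the basic tree.

def row(*cols: str) -> str:
    """Join the 10 CoNLL-U columns with tabs."""
    assert len(cols) == 10
    return "\t".join(cols)

def is_empty_node(token_id: str) -> bool:
    """Empty nodes have decimal IDs like '6.1'; regular words have integers."""
    return "." in token_id

def check_basic_tree(lines: list[str]) -> list[str]:
    """Report empty nodes with a basic HEAD and words whose HEAD is an empty node."""
    errors = []
    for line in lines:
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        token_id, head = cols[0], cols[6]
        if "-" in token_id:
            continue  # multiword token ranges carry no syntax
        if is_empty_node(token_id):
            if head != "_":
                errors.append(f"empty node {token_id} must have HEAD '_'")
        elif is_empty_node(head):
            errors.append(f"word {token_id}: HEAD {head} is an empty node")
    return errors

# Illustrative gapping example (annotation assumed): the basic tree uses orphan,
# while DEPS (column 9) attaches "she" and "coffee" to the empty node 6.1.
SENTENCE = [
    "# text = He likes tea, and she coffee",
    row("1", "He", "he", "PRON", "_", "_", "2", "nsubj", "2:nsubj", "_"),
    row("2", "likes", "like", "VERB", "_", "_", "0", "root", "0:root", "_"),
    row("3", "tea", "tea", "NOUN", "_", "_", "2", "obj", "2:obj", "SpaceAfter=No"),
    row("4", ",", ",", "PUNCT", "_", "_", "6", "punct", "6.1:punct", "_"),
    row("5", "and", "and", "CCONJ", "_", "_", "6", "cc", "6.1:cc", "_"),
    row("6", "she", "she", "PRON", "_", "_", "2", "conj", "6.1:nsubj", "_"),
    row("6.1", "likes", "like", "VERB", "_", "_", "_", "_", "2:conj", "_"),
    row("7", "coffee", "coffee", "NOUN", "_", "_", "6", "orphan", "6.1:obj", "_"),
]

print(check_basic_tree(SENTENCE))  # []
```

Changing the basic HEAD of `coffee` to `6.1` would be flagged, which is exactly what keeping the basic tree a strict surface tree rules out.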

@arademaker
Contributor Author

Thank you @sebschu. Well, I have just learned on the mailing list that the page I mentioned is not part of the documentation but only an archived discussion. We also have a similar confusion with the two pages about the format:

  1. http://universaldependencies.org/format.html
  2. http://universaldependencies.org/v2/conll-u.html

About the tools: what I mean is that most tools will not pay much attention to the DEPS field. I suspect that we must treat this field, with the enhanced dependency graph, as an alternative, but the corpora must always have the best possible analysis in the basic dependencies, right?

@jnivre
Contributor

jnivre commented Jan 12, 2017

Anything that starts with http://universaldependencies.org/v2/ is only archival. Sorry for the confusion, we will figure out a way to make it clear. Please only consult the documentation that is at the top level directly under http://universaldependencies/.

@dan-zeman
Member

Moving the e-mail discussion about the precedence of orphans here:

@arademaker :

In the discussion page http://universaldependencies.org/v2/ellipsis.html we have

The second alternative preserves the integrity of the second conjunct as a single subtree by (arbitrarily) promoting one of the orphans to the (subclause) root and attaching the others with a new dummy relation orphan.

The arbitrary way of selecting the node to promote puzzles me. In the final v2 spec pages, nothing more is said about how the node is chosen:
http://universaldependencies.org/u/overview/specific-syntax.html#ellipsis
http://universaldependencies.org/u/dep/orphan.html

Does anyone have a specific suggestion for the selection? Is it really irrelevant?

@dan-zeman :

Thanks for pointing this out. It is no less relevant than maintaining consistency with other technical relations, such as flat or fixed. I believe the intention here was to promote the first "real" orphan to the head position and attach the others to it, i.e., orphan relations always go left to right (Joakim, please correct me if I do not remember it right). This is still subject to the provision that some orphaned dependents, such as the coordinating conjunction, do not count as "real" orphans for this rule.
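The left-to-right convention can be sketched as follows. This is a minimal Python illustration under stated assumptions: each orphaned dependent is given with its position and the relation it would have had to the elided predicate, and the set of relations that do not count as "real" orphans (`cc`, `mark`, `punct` here) is a hypothetical choice, not something the guidelines fix.

```python
from typing import NamedTuple

class Dependent(NamedTuple):
    position: int  # 1-based word index in the sentence
    form: str
    deprel: str    # relation it would bear to the elided predicate

# Assumption: relations that keep their ordinary label instead of becoming orphan.
NON_ORPHAN_RELS = {"cc", "mark", "punct"}

def promote_first_orphan(deps, parent_form, parent_rel):
    """Promote the leftmost real orphan; return edges as (deprel, parent, child)."""
    in_order = sorted(deps, key=lambda d: d.position)
    real = [d for d in in_order if d.deprel not in NON_ORPHAN_RELS]
    if not real:
        raise ValueError("no real orphan to promote")
    head = real[0]
    # The promoted word takes over the elided predicate's attachment
    edges = [(parent_rel, parent_form, head.form)]
    for d in in_order:
        if d is head:
            continue
        # Function words keep their relation; other real orphans become "orphan"
        rel = d.deprel if d.deprel in NON_ORPHAN_RELS else "orphan"
        edges.append((rel, head.form, d.form))
    return edges

# "He likes tea, and she coffee": the second conjunct has lost its verb
gap = [Dependent(4, ",", "punct"), Dependent(5, "and", "cc"),
       Dependent(6, "she", "nsubj"), Dependent(7, "coffee", "obj")]
for edge in promote_first_orphan(gap, "likes", "conj"):
    print("%s(%s, %s)" % edge)
# conj(likes, she)
# punct(she, ,)
# cc(she, and)
# orphan(she, coffee)
```

Because the rule is purely linear, swapping the positions of `she` and `coffee` makes `coffee` the head, which is the concern about freer word order that jnivre raises in his reply.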

@jnivre :

Yes, we need more work on this. My own view is that it should be a content word (if possible) and one of the arguments of the elided predicate. So in the standard gapping case, the subject or object of the elided verb are the main candidates, while the coordinating conjunction and other function words are better treated as having their ordinary relations to the promoted head. When it comes to choosing among possible candidates, my ideas are less clear. Linear order is definitely a possibility, but one could always think of alternatives such as appealing to the obliqueness hierarchy, which may lead to more consistent analyses for languages with freer word order.

@dan-zeman
Member

@jnivre : I guess that the obliqueness hierarchy, if employed, would look something like this (I am not sure about the placement of the clausal dependents, but they are more likely to have arguments of their own, so placing them lower might make the result more readable):

nsubj > obj > iobj > obl > advmod > csubj > xcomp > ccomp > advcl
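A minimal Python sketch of how such a hierarchy could drive head promotion, assuming the same kind of input as in the discussion (orphaned dependents with the relation they would bear to the elided predicate). Breaking ties by linear order, ranking subtypes such as `obl:tmod` as their base relation, and the non-orphan relation set are all assumptions of this sketch, not details fixed by the guidelines.

```python
# Ranking from the comment above; earlier relations are preferred as the promoted head.
HIERARCHY = ("nsubj", "obj", "iobj", "obl", "advmod", "csubj", "xcomp", "ccomp", "advcl")

# Assumption: relations that keep their ordinary label instead of becoming orphan.
NON_ORPHAN_RELS = {"cc", "mark", "punct"}

def rank(deprel: str) -> int:
    """Position in HIERARCHY; subtypes count as their base; unknown relations rank last."""
    base = deprel.split(":")[0]
    return HIERARCHY.index(base) if base in HIERARCHY else len(HIERARCHY)

def promote_by_hierarchy(deps, parent_form, parent_rel):
    """deps: (position, form, deprel) triples. Returns (deprel, parent, child) edges."""
    real = [d for d in deps if d[2] not in NON_ORPHAN_RELS]
    if not real:
        raise ValueError("no real orphan to promote")
    head = min(real, key=lambda d: (rank(d[2]), d[0]))  # hierarchy first, then order
    edges = [(parent_rel, parent_form, head[1])]
    for d in sorted(deps):
        if d == head:
            continue
        rel = d[2] if d[2] in NON_ORPHAN_RELS else "orphan"
        edges.append((rel, head[1], d[1]))
    return edges

# Object before subject, as in a freer-word-order language: the subject still wins.
gap = [(4, "and", "cc"), (5, "coffee", "obj"), (6, "she", "nsubj")]
print(promote_by_hierarchy(gap, "likes", "conj"))
# [('conj', 'likes', 'she'), ('cc', 'she', 'and'), ('orphan', 'she', 'coffee')]
```

Unlike a purely linear rule, the result here does not depend on whether the subject or the object comes first.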

@livyreal

livyreal commented Jan 13, 2017

In sentences where the conjunct elements are present in both clauses (the head/ROOT of the sentences), I'd prefer to relate them and then attach all the orphans to the second element of the conjunction:

"The total value is 50 million and the deficit, 40 million" (translated from a Portuguese example)
I would like to have:
conj(million1, million2),
orphan(deficit, million2)
det(the, deficit)
nummod(40, million2)

How does it sound?

This is a problematic sentence: it is a copular sentence, but we still have an ellipsis here... I'm asking myself how to treat such cases.

@jnivre
Contributor

jnivre commented Jan 13, 2017

A conj link from "million1" to "million2" is fine, but there should be no "orphan" link here, because the omitted copula is not the root of the clause. The analysis should be:

nsubj(million1, value)
cop(million1, is)
conj(million1, million2)
cc(million2, and)
nsubj(million2, deficit)

@dan-zeman
Member

We usually use the deprel(parent, child) notation, i.e. the head word is mentioned first in the brackets.

But even if I reverse your notation, I would not do what you propose. This is a non-verbal predicate situation, which can occur (cross-linguistically) with or without a copula. I understand that using a copula is the norm in Portuguese and that it has been elided here, but it is just a missing function word. Its omission does not change the fact that both millions are predicates and that value and deficit, respectively, are their subjects. Therefore I would do:

nsubj(million1, value)
conj(million1, million2)
cc(million2, and)
nsubj(million2, deficit)
det(deficit, the)
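Since the notation question comes up here, a small sketch may help: a Python helper that reads edges written as `deprel(parent, child)` and checks that every word gets exactly one head. The parsing regex and the function names are illustrative assumptions. Run on the analysis above, it finds a single root (`million1`) and no conflicts.

```python
import re

# deprel(parent, child), e.g. "nsubj(million1, value)"; subtypes like "obl:tmod" allowed
EDGE = re.compile(r"^\s*([a-z]+(?::[a-z]+)?)\(\s*([^,\s]+)\s*,\s*([^)\s]+)\s*\)\s*$")

def parse_edges(text: str) -> list[tuple[str, str, str]]:
    """Parse one edge per non-empty line into (deprel, parent, child)."""
    edges = []
    for line in text.splitlines():
        if not line.strip():
            continue
        m = EDGE.match(line)
        if m is None:
            raise ValueError(f"not in deprel(parent, child) form: {line!r}")
        edges.append(m.groups())
    return edges

def check_single_head(edges):
    """Return (children seen with more than one head, words never seen as a child)."""
    head_of, conflicts = {}, []
    for _, parent, child in edges:
        if child in head_of:
            conflicts.append(child)
        else:
            head_of[child] = parent
    roots = sorted({parent for _, parent, _ in edges} - set(head_of))
    return conflicts, roots

zeman = parse_edges("""
nsubj(million1, value)
conj(million1, million2)
cc(million2, and)
nsubj(million2, deficit)
det(deficit, the)
""")
print(check_single_head(zeman))  # ([], ['million1'])
```

Reading the earlier proposal the same way gives `million2` three different heads (`million1`, `deficit`, `40`), a sign that some of those edges were written child-first.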

@livyreal

Thanks a lot, @dan-zeman and @jnivre!

I was thinking of having an orphan relation that (always) marks when an ellipsis (at least of a core element) occurs, but following the guidelines ("If the elided element has no overt dependents, we do nothing."), it is clear that this is not the case. Wouldn't it be a good idea to have a treatment that marks all cases of ellipsis, rather than one that only helps keep the syntactic level working?

@jnivre
Contributor

jnivre commented Jan 16, 2017

It is a trade-off. There are lots and lots of things that could potentially be useful, but in order to satisfy its goal of being a cross-linguistically consistent easily understandable syntactic annotation, UD cannot include all of them but has to put priority on basic syntactic relations. But you can always add it yourself in the MISC field or using standoff annotation.
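Following this suggestion, a corpus could record ellipsis outside the UD tree in the MISC column (the 10th CoNLL-U field, which holds `_` or `Attribute=Value` pairs separated by `|`). The sketch below is a minimal Python helper under that assumption; the attribute name `Ellipsis` and its values are hypothetical, not an established UD convention.

```python
def add_misc_attribute(misc: str, key: str, value: str) -> str:
    """Set key=value in a MISC field, replacing any earlier value for the same key."""
    items = [] if misc == "_" else misc.split("|")
    items = [item for item in items if item.split("=", 1)[0] != key]
    items.append(f"{key}={value}")
    return "|".join(items)

def mark_row(row: str, key: str, value: str) -> str:
    """Apply add_misc_attribute to the MISC column of one tab-separated CoNLL-U row."""
    cols = row.split("\t")
    cols[9] = add_misc_attribute(cols[9], key, value)
    return "\t".join(cols)

# Hypothetical: flag that "deficit" is the subject of a clause whose copula is elided
deficit_row = "\t".join(["9", "deficit", "deficit", "NOUN", "_", "_", "12", "nsubj", "12:nsubj", "_"])
print(mark_row(deficit_row, "Ellipsis", "cop").split("\t")[9])  # Ellipsis=cop
print(add_misc_attribute("SpaceAfter=No", "Ellipsis", "cop"))   # SpaceAfter=No|Ellipsis=cop
```

Keeping the information in MISC leaves the basic and enhanced layers untouched, so tools that ignore MISC see a standard UD analysis.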

@manning closed this as completed Jan 22, 2017
@martinpopel
Member

So should we add the nsubj > obj > iobj > obl > advmod > csubj > xcomp > ccomp > advcl priorities into the documentation?

@jnivre
Contributor

jnivre commented Jan 23, 2017

Sounds good to me.

martinpopel added a commit that referenced this issue Jan 23, 2017
In #396 it was suggested that the head promotion priorities for predicate ellipsis are
`nsubj > obj > iobj > obl > advmod > csubj > xcomp > ccomp > advcl`.
Also, I think examples of incorrect annotation should be clearly marked,
e.g. with red color for the wrong edges.
@sebschu
Member

sebschu commented Jan 31, 2017

This might be too late now and I don't feel too strongly about it, but what if we used a different hierarchy so that we end up with constructions that are more parallel to copular constructions (and that way potentially avoid a catastrophe in languages where copulas can optionally be omitted)?

In practice, this would mean that we put the nsubj much lower in the hierarchy so that it is typically attached to an object or oblique NP, and potentially even to a clausal complement.

So either

obj > iobj > obl > advmod > nsubj > csubj > xcomp > ccomp > advcl

or

obj > iobj > obl > advmod > xcomp > ccomp > nsubj > csubj > advcl

That way, "She is a professor" and the second clause in "He likes tea, and she coffee" have a more parallel structure:

nsubj(professor, she)

and

orphan(coffee, she)
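The effect of reordering the hierarchy can be checked with a short Python sketch that promotes the highest-ranked orphan under each ranking. The three rankings are the ones quoted in this thread; the function and variable names are illustrative, and using linear order as a tiebreak is an assumption.

```python
# Earlier relations are preferred as the promoted head.
ORIGINAL = ("nsubj", "obj", "iobj", "obl", "advmod", "csubj", "xcomp", "ccomp", "advcl")
ALT_1 = ("obj", "iobj", "obl", "advmod", "nsubj", "csubj", "xcomp", "ccomp", "advcl")
ALT_2 = ("obj", "iobj", "obl", "advmod", "xcomp", "ccomp", "nsubj", "csubj", "advcl")

def orphan_edges(orphans, hierarchy):
    """orphans: (position, form, deprel), every deprel listed in hierarchy.
    Returns the orphan(head, other) edges as (deprel, parent, child) tuples."""
    head = min(orphans, key=lambda o: (hierarchy.index(o[2]), o[0]))
    return [("orphan", head[1], o[1]) for o in sorted(orphans) if o != head]

# Second conjunct of "He likes tea, and she coffee" (function words left out)
orphans = [(6, "she", "nsubj"), (7, "coffee", "obj")]
for name, hierarchy in [("original", ORIGINAL), ("alt 1", ALT_1), ("alt 2", ALT_2)]:
    print(name, ["%s(%s, %s)" % e for e in orphan_edges(orphans, hierarchy)])
# original ['orphan(she, coffee)']
# alt 1 ['orphan(coffee, she)']
# alt 2 ['orphan(coffee, she)']
```

Under either alternative, `coffee` heads `she`, mirroring nsubj(professor, she) in the copular sentence, which is the parallel sebschu is after.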

@sebschu reopened this Jan 31, 2017
@jnivre
Contributor

jnivre commented Jan 31, 2017

I can see the point of this, but I am not sure the advantages are strong enough to motivate an apparently ad hoc exception to the obliqueness hierarchy. Also, the whole point of the "orphan" relation is to have a warning flag signaling "don't trust this structure to reflect real dependencies", so I am not sure it is only an advantage if the structure looks plausible. And, yes, I think it may be too late now. :)

@sebschu closed this as completed Feb 2, 2017

7 participants