enhanced DEPS ambiguity? #60

Closed
odanoburu opened this issue Apr 26, 2018 · 6 comments

Comments

@odanoburu
Member

Hello,

I gather this might be a specification issue, but since I noticed it in a corpus, I thought I would ask here.

I have implemented a(nother) CoNLL-U parser, and was wondering how I should go about parsing the 21:conj:and part of the DEPS field of the word below:

# test
# sent_id = weblog-blogspot.com_aggressivevoicedaily_20060629164800_ENG_20060629_164800-0005
# text = KENNEDY filed an opinion concurring in part, in which SOUTER, GINSBURG, and BREYER joined as to Parts I and II.
23	II	ii	NUM	CD	NumType=Card	21	conj	20:compound|21:conj:and	SpaceAfter=No

From what I understand from the spec, 'and' is the third possible subfield (case information) of an enhanced dependency relation.

But how is a CoNLL-U parser supposed to 'know' this? Should I try matching all the possible relation subtypes and, when that fails, conclude that it must be case information?
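For reference, the first step of the parsing in question can be sketched as follows. This is only a minimal sketch, not taken from any existing parser: the function name `parse_deps` is hypothetical, and it assumes that the DEPS field is either `_` or a `|`-separated list of `head:relation` pairs. It leaves the relation string (`conj:and`) unsplit, since the question above is exactly how to interpret it.

```python
def parse_deps(deps):
    """Split a CoNLL-U DEPS field into (head, relation) pairs.

    Hypothetical helper for illustration. The head is kept as a string
    because empty-node IDs such as "8.1" are not plain integers.
    """
    if deps == "_":
        return []
    pairs = []
    for item in deps.split("|"):
        # Split on the first colon only: everything after it is the
        # relation, possibly with a subtype and/or case information.
        head, _, relation = item.partition(":")
        pairs.append((head, relation))
    return pairs


pairs = parse_deps("20:compound|21:conj:and")
```

Here `pairs` holds one tuple per enhanced dependency, with `conj:and` still in one piece as the relation of the second pair.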

@arademaker

@odanoburu, I would not assume a fixed list of enhanced dependency relations. Even the UD dependencies are not fixed with regard to sub-relations, and I have not heard of any definition of a core set of enhanced deps that should be used in UD. As far as I know, there are many proposals besides the one from the Stanford group.

@odanoburu
Member Author

@arademaker so you agree that the example above seems ambiguous?

@jnivre

jnivre commented Apr 26, 2018

There isn't a fixed list for all subtypes, but every language has to specify which subtypes it uses in the edeprel.* file. Note that this is different from the deprel.* file, so the set of subtypes may be different in basic and enhanced. Given this information, the representation should not be ambiguous (unless the lemma of a word happens to be identical to the suffix of a subtype).
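The disambiguation described here could be sketched roughly as below. This is a sketch under stated assumptions, not the validator's actual logic: the function name `split_relation` and the contents of `known_labels` are hypothetical, and the labels are assumed to have already been loaded into a set from the language's edeprel.* file. The longest known label wins; whatever follows it is treated as case information.

```python
def split_relation(relation, known_labels):
    """Split an enhanced relation into (label, case_info).

    `known_labels` is assumed to hold the relations and subtypes a
    language declares (hypothetical contents below). The longest
    colon-delimited prefix found in the set is taken as the label.
    """
    parts = relation.split(":")
    for i in range(len(parts), 0, -1):
        label = ":".join(parts[:i])
        if label in known_labels:
            rest = ":".join(parts[i:])
            return label, rest or None
    # Nothing matched: fall back to treating the first part as the label.
    return parts[0], ":".join(parts[1:]) or None


# Hypothetical label set, for illustration only.
known_labels = {"conj", "nmod", "nmod:poss", "acl:relcl"}

result = split_relation("conj:and", known_labels)
```

With this label set, `conj:and` comes out as the label `conj` with case information `and`, while `nmod:poss` is kept whole as a subtype. If a lemma is identical to a declared subtype suffix, the longest-match rule reads it as the subtype, which is the residual ambiguity mentioned above.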

@odanoburu
Member Author

oh, right. silly me, I didn't know about these files... thanks for the clarification!

@odanoburu
Member Author

@jnivre where is the edeprel.en file located? I can't find it at https://github.com/UniversalDependencies/tools/tree/master/data

@jnivre

jnivre commented Apr 29, 2018

There was a bug in the validator for the latest release, which allowed treebanks to pass validation even with a missing edeprel file. We have asked the English UD team to supply the missing edeprel.en, and @sebschu has promised to fix this, but apparently it has not happened yet.
