Self-referencing dependency may not *be* ROOT but *link* to virtual ROOT #1033

Open
reckart opened this issue Feb 19, 2017 · 0 comments
reckart commented Feb 19, 2017

Source: #619 (comment)

Bearing in mind that your Croatian example is not good Croatian in the first place, a correct sentence would be something like this:

moramo odraditi vrlo kompliciran primjer , rečenicu koja sadrži što više sastojaka i ovisnosti , što je više moguće

Running MSTParser on that corrected sentence gives even more ROOT elements.
We have now analysed what the problem might be, and it seems that this "multiple-roots problem" might be linked to the fact that the model was trained on sentences tagged in the "CoNLL-X" format (http://anthology.aclweb.org/W/W06/W06-2920.pdf), as stated on the model source site (http://nlp.ffzg.hr/resources/models/dependency-parsing/). I also checked the source data that the model was trained on, and yes, there are sentences in which more than one token has a HEAD value of 0.
The CoNLL-X document says:

  1. HEAD: Head of the current token, which is either a value of ID, or zero (’0’) if the token links to the virtual root node of the sentence. Note that depending on the original treebank annotation, there may be multiple tokens with a HEAD value of zero.

So it seems that you should cover this use case in your unit tests and potentially in other parts of DKPro Core.
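
To make the case concrete, here is a minimal sketch of the kind of check such a test could perform. It is plain Python, not DKPro Core code; the function name `root_token_ids` and the four-token sample are hypothetical and only illustrate a sentence with two HEAD values of 0. It assumes tab-separated CoNLL-X lines with ID in the first column and HEAD in the seventh.

```python
from typing import List


def root_token_ids(conll_sentence: str) -> List[int]:
    """Return the IDs of all tokens whose HEAD column is 0.

    Assumes tab-separated CoNLL-X lines: ID is column 1, HEAD is column 7.
    """
    roots = []
    for line in conll_sentence.strip().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between sentences
        fields = line.split("\t")
        token_id = int(fields[0])
        head = int(fields[6])
        if head == 0:
            roots.append(token_id)
    return roots


# Hypothetical sample, not taken from the treebank: tokens 1 and 3
# both attach to the virtual root (HEAD = 0).
SAMPLE = "\n".join([
    "\t".join(["1", "a", "_", "_", "_", "_", "0", "_", "_", "_"]),
    "\t".join(["2", "b", "_", "_", "_", "_", "1", "_", "_", "_"]),
    "\t".join(["3", "c", "_", "_", "_", "_", "0", "_", "_", "_"]),
    "\t".join(["4", "d", "_", "_", "_", "_", "3", "_", "_", "_"]),
])

print(root_token_ids(SAMPLE))  # [1, 3]
```

A reader or test that expects exactly one root per sentence would fail on input like this, so the expectation should be "one or more tokens linked to the virtual root".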
