Error when parsing multiword expressions in conllu file #26
Comments
Hi! This is CoNLL-U format; the parser only handles CoNLL format. Please see the Universal Dependencies scripts. Miguel
Hi,
I couldn't find an appropriate script for converting CoNLL-U files to CoNLL
files. I would be glad if you could suggest a script for this task.
Thanks,
Betul
On Wed, Feb 14, 2018 at 3:18 PM, Miguel Ballesteros wrote:
Hi! This is CoNLL-U format; the parser only handles CoNLL format. Please
see the Universal Dependencies scripts.
Miguel
It worked, thank you!
On Wed, Feb 14, 2018 at 8:57 PM, Miguel Ballesteros wrote:
I believe this is the one: https://github.com/UniversalDependencies/tools/blob/f21108176ff431ebbab4c9414d6e0345e62d3995/conllu_to_conllx.pl
Hi,
I am trying to train this parser on the Turkish UD Treebank. When I run this command:
java -jar ParserOracleArcStdWithSwap.jar -t -1 -l 1 -c training.conll > trainingOracle.txt
I get the following error:
The CoNLL-U parse on which the LSTM parser fails is the one below:
The word 'parçacıklarsa' is a multiword token, so it is numbered '2-3'. Does the LSTM parser have a mechanism for dealing with multiword tokens? How can I solve this issue?
Thanks,
Betul
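For readers hitting the same error, here is a minimal Python sketch of the kind of filtering a CoNLL-U to CoNLL conversion involves. It is not the Universal Dependencies script mentioned in the thread, and the function name `strip_conllu_extras` is hypothetical. It assumes the only lines the parser rejects are the three CoNLL-U additions: comment lines starting with `#`, multiword token ranges such as `2-3`, and empty nodes such as `2.1`. The syntactic words (`2` and `3`) are kept, so the tree itself is unchanged.

```python
def strip_conllu_extras(lines):
    """Yield CoNLL-U lines with comments, multiword ranges and empty nodes removed.

    Assumption: the downstream reader accepts ordinary 10-column token lines
    and blank sentence separators, and nothing else.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            continue  # sentence-level comment, e.g. "# text = ..."
        if not line:
            yield ""  # blank line marks the end of a sentence
            continue
        token_id = line.split("\t", 1)[0]
        if "-" in token_id or "." in token_id:
            continue  # "2-3" is a multiword token range, "2.1" an empty node
        yield line
```

One possible use is to open the `.conllu` file with `encoding="utf-8"`, pass the file object to `strip_conllu_extras`, and write each yielded line followed by a newline to the `.conll` file given to the oracle command. The column contents are passed through untouched, so any further column-level differences between the two formats would still need the official script.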