Error when parsing multiword expressions in conllu file #26

sb-b · 2018-02-14T10:16:57Z

Hi,

I am trying to train this parser on the Turkish UD Treebank. When I run this command:

java -jar ParserOracleArcStdWithSwap.jar -t -1 -l 1 -c training.conll > trainingOracle.txt

I got the following error:

java.lang.NumberFormatException: For input string: "2-3"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:580)
        at java.lang.Integer.parseInt(Integer.java:615)
        at arc_std_swap.Oracle.getTransition(Oracle.java:41)
        at arc_std_swap.Parser.printOracle(Parser.java:366)
        at arc_std_swap.Parser.main(Parser.java:270)

The CoNLL-U sentence that triggers the error in the LSTM parser is the one below:

# sent_id = mst-0003
# text = Sanal parçacıklarsa bunların hiçbirini yapamazlar.
1	Sanal	sanal	ADJ	Adj	_	2	amod	_	_
2-3	parçacıklarsa	_	_	_	_	_	_	_	_
2	parçacıklar	parçacık	NOUN	Noun	Case=Nom|Number=Plur|Person=3	6	csubj	_	_
3	sa	i	AUX	Zero	Aspect=Perf|Mood=Cnd|Number=Sing|Person=3|Tense=Pres	2	cop	_	_
4	bunların	bu	PRON	Demons	Case=Gen|Number=Plur|Person=3|PronType=Dem	5	nmod:poss	_	_
5	hiçbirini	hiçbiri	PRON	Quant	Case=Acc|Number=Sing|Number[psor]=Sing|Person=3|Person[psor]=3|PronType=Ind	6	obj	_	_
6	yapamazlar	yap	VERB	Verb	Aspect=Imp|Mood=Pot|Number=Plur|Person=3|Polarity=Neg|Tense=Aor	0	root	_	SpaceAfter=No
7	.	.	PUNCT	Punc	_	6	punct	_	_

The word 'parçacıklarsa' is a multiword token, so it is numbered '2-3'. Does the LSTM parser have a mechanism for handling multiword tokens? How can I solve this issue?

Thanks,

Betul

miguelballesteros · 2018-02-14T12:18:02Z

Hi! This file is in CoNLL-U format, but the parser only handles CoNLL format. Please see the Universal Dependencies scripts.

Miguel

sb-b · 2018-02-14T17:18:28Z

Hi, I couldn't find an appropriate script for converting CoNLL-U files to CoNLL files. I would be glad if you could suggest one. Thanks, Betul

miguelballesteros · 2018-02-14T17:57:34Z

I believe this is the one: https://github.com/UniversalDependencies/tools/blob/f21108176ff431ebbab4c9414d6e0345e62d3995/conllu_to_conllx.pl
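
For readers who only need to get past the NumberFormatException, the core idea can be sketched in a few lines of Python. This is not the linked Perl script and makes no claim to reproduce its behavior; it is a minimal sketch that assumes the only problems are comment lines and non-integer IDs (multiword token ranges such as "2-3" and empty nodes such as "5.1"). The function name strip_multiword_lines and the sample data are hypothetical, and all other columns are passed through unchanged.

```python
def strip_multiword_lines(lines):
    """Yield CoNLL-U lines with comments and non-integer-ID rows removed.

    Assumption: the downstream reader calls Integer.parseInt on the first
    column, so any ID containing '-' (multiword range) or '.' (empty node)
    has to go. Blank lines are kept because they separate sentences.
    """
    for line in lines:
        stripped = line.rstrip("\n")
        if stripped.startswith("#"):
            continue  # sentence-level comments such as "# sent_id = ..."
        if not stripped.strip():
            yield "\n"  # sentence boundary
            continue
        token_id = stripped.split("\t", 1)[0]
        if "-" in token_id or "." in token_id:
            continue  # e.g. "2-3" for the surface form parçacıklarsa
        yield stripped + "\n"


# Shortened version of the sentence from the report (columns abbreviated).
sample = (
    "# sent_id = mst-0003\n"
    "1\tSanal\tsanal\tADJ\tAdj\t_\t2\tamod\t_\t_\n"
    "2-3\tparçacıklarsa\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\tparçacıklar\tparçacık\tNOUN\tNoun\t_\t6\tcsubj\t_\t_\n"
    "3\tsa\ti\tAUX\tZero\t_\t2\tcop\t_\t_\n"
    "\n"
)

cleaned = list(strip_multiword_lines(sample.splitlines(keepends=True)))
print("".join(cleaned), end="")
```

The syntactic words 2 and 3 already carry the heads and relations, so dropping the "2-3" row loses only the surface form of the multiword token, not any part of the dependency tree.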

sb-b · 2018-02-15T07:27:13Z

It worked, thank you!

miguelballesteros closed this as completed Feb 14, 2018
