Converting postagged input to conll #20

arisagithub · 2016-05-30T08:06:07Z

Hello,
When I try to run SEMAFOR, it stops in the "Converting postagged input to conll" phase.

Environment variables:
SEMAFOR_HOME=/opt/semafor
CLASSPATH=.:/opt/semafor/target/Semafor-3.0-alpha-04.jar
JAVA_HOME_BIN=/usr/lib/jvm/java-6-oracle/bin
MALT_MODEL_DIR=/opt/semafor_malt_model_20121129
TEMP_DIR: /tmp/semafor.oHswfdoPiw
Environment variables:
SEMAFOR_HOME=/opt/semafor
CLASSPATH=.:/opt/semafor/target/Semafor-3.0-alpha-04.jar
JAVA_HOME_BIN=/usr/lib/jvm/java-6-oracle/bin
MALT_MODEL_DIR=/opt/semafor_malt_model_20121129
Environment variables:
SEMAFOR_HOME=/opt/semafor
CLASSPATH=.:/opt/semafor/target/Semafor-3.0-alpha-04.jar
JAVA_HOME_BIN=/usr/lib/jvm/java-6-oracle/bin
MALT_MODEL_DIR=/opt/semafor_malt_model_20121129

Tokenizing file: Data/Cause.txt

real 0m0.039s
user 0m0.000s
sys 0m0.000s
Finished tokenization.

Part-of-speech tagging tokenized data....
/opt/semafor/scripts/jmx /opt/semafor/bin
Read 11692 items from tagger.project/word.voc
Read 45 items from tagger.project/tag.voc
Read 42680 items from tagger.project/tagfeatures.contexts
Read 42680 contexts, 117558 numFeatures from tagger.project/tagfeatures.fmap
Read model tagger.project/model : numPredictions=45, numParams=117558
Read tagdict from tagger.project/tagdict
This is MXPOST (Version 1.0)
Copyright (c) 1997 Adwait Ratnaparkhi
Sentence: 0 Length: 1 Elapsed Time: 0.024 seconds.
Sentence: 1 Length: 0 Elapsed Time: 0.0 seconds.

real 0m1.937s
user 0m0.800s
sys 0m0.048s
/opt/semafor/bin
Finished part-of-speech tagging tokenized data.

Converting postagged input to conll.
Exception in thread "main" java.lang.IllegalArgumentException:
at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec.decode(SentenceCodec.java:83)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec$SentenceIterator.computeNext(SentenceCodec.java:115)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec$SentenceIterator.computeNext(SentenceCodec.java:100)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.ConvertFormat.convertStream(ConvertFormat.java:94)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.ConvertFormat.main(ConvertFormat.java:76)
Caused by: java.lang.IllegalArgumentException: PosToken must have 2 "_"-separated fields
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.Token.fromPosTagged(Token.java:248)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec$2.decodeToken(SentenceCodec.java:28)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec.decode(SentenceCodec.java:79)
... 6 more

Any help you can give will be greatly appreciated.

sammthomson · 2016-05-31T22:27:05Z

Hi, is there any chance your input file has some blank lines in it? I've run into a similar error in that case. If so, a temporary work-around might be to delete the empty lines before running SEMAFOR.
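For anyone wanting to try that work-around, here is a minimal sketch of stripping blank lines from the input before running SEMAFOR. The function name `strip_blank_lines` and the output file name are made up for illustration; only `Data/Cause.txt` comes from the log above. Whether blank lines are really the cause of this particular failure is not confirmed here.

```python
from pathlib import Path


def strip_blank_lines(text: str) -> str:
    """Drop empty and whitespace-only lines; write the rest with "\n" endings."""
    # splitlines() also splits on "\r\n", so Windows line endings are
    # normalized as a side effect.
    kept = [line for line in text.splitlines() if line.strip()]
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    src = Path("Data/Cause.txt")            # input path taken from the log above
    dst = Path("Data/Cause.noblank.txt")    # hypothetical output name
    if src.exists():
        dst.write_text(strip_blank_lines(src.read_text(encoding="utf-8")),
                       encoding="utf-8")
```

The cleaned file can then be passed to SEMAFOR in place of the original.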

arisagithub · 2016-06-01T07:45:48Z

Thank you very much for your reply.
I fixed this problem by copying the content of the input .txt file into a new Linux plain-text document and using this new file as input.
But now I have another problem: I need XML output, but when I set the output file path to "/out.xml", the generated output file is not XML.

Here is a sample output:
{"frames":[{"target":{"name":"Operational_testing","spans":[{"start":3,"end":4,"text":"test"}]},"annotationSets":[{"rank":0,"score":71.00063282566339,"frameElements":[{"name":"Product","spans":[{"start":4,"end":10,"text":"for SEMAFOR , a frame-semantic parser"}]}]}]}],"tokens":["This","is","a","test","for","SEMAFOR",",","a","frame-semantic","parser","."]}
{"frames":[{"target":{"name":"Shapes","spans":[{"start":5,"end":6,"text":"line"}]},"annotationSets":[{"rank":0,"score":11.818446277549976,"frameElements":[{"name":"Shape","spans":[{"start":5,"end":6,"text":"line"}]}]}]}],"tokens":["This","is","just","a","dummy","line","."]}
{"frames":[{"target":{"name":"Existence","spans":[{"start":0,"end":2,"text":"There 's"}]},"annotationSets":[{"rank":0,"score":52.10168633235354,"frameElements":[{"name":"Entity","spans":[{"start":2,"end":5,"text":"a Santa Claus"}]}]}]}],"tokens":["There","'s","a","Santa","Claus","!"]}

How can I get XML output from SEMAFOR?
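The sample above has one JSON object per line, so if XML is needed downstream, one option is to convert it after the fact. The sketch below is only an illustration: the element and attribute names (`sentences`, `sentence`, `frame`, `target`, `frameElement`) are invented here and are not SEMAFOR's own XML format, and nothing in this thread says whether SEMAFOR can emit XML directly.

```python
import json
import xml.etree.ElementTree as ET


def json_lines_to_xml(json_lines):
    """Build an XML tree from an iterable of JSON lines like the sample above.

    The XML layout is hypothetical, chosen only for this example.
    """
    root = ET.Element("sentences")
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        parsed = json.loads(line)
        sent = ET.SubElement(root, "sentence", {"text": " ".join(parsed["tokens"])})
        for frame in parsed["frames"]:
            target = frame["target"]
            frame_el = ET.SubElement(sent, "frame", {"name": target["name"]})
            for span in target["spans"]:
                t = ET.SubElement(frame_el, "target",
                                  {"start": str(span["start"]), "end": str(span["end"])})
                t.text = span["text"]
            for ann in frame["annotationSets"]:
                for fe in ann["frameElements"]:
                    for span in fe["spans"]:
                        e = ET.SubElement(frame_el, "frameElement",
                                          {"name": fe["name"],
                                           "start": str(span["start"]),
                                           "end": str(span["end"])})
                        e.text = span["text"]
    return root


if __name__ == "__main__":
    sample = ('{"frames":[{"target":{"name":"Shapes","spans":[{"start":5,"end":6,'
              '"text":"line"}]},"annotationSets":[{"rank":0,"score":11.818446277549976,'
              '"frameElements":[{"name":"Shape","spans":[{"start":5,"end":6,'
              '"text":"line"}]}]}]}],"tokens":["This","is","just","a","dummy","line","."]}')
    print(ET.tostring(json_lines_to_xml([sample]), encoding="unicode"))
```

The rank and score fields are dropped in this sketch; they could be added as attributes on `frame` if needed.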

s-pranita · 2017-06-15T11:22:35Z

I am getting the same error on Windows even after removing '\n'. How can I resolve it?

arisagithub · 2017-06-15T22:27:08Z

I was able to solve it on Ubuntu by copying the content of the input file (.txt format) to a file without an extension.
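Since re-creating the file fixed it, it may be worth checking what is actually in the original bytes. The following is a small diagnostic sketch, assuming (this is a guess, not something established in this thread) that Windows line endings, a UTF-8 byte-order mark, or leftover blank lines could be involved. The function name `describe_input` is made up for this example.

```python
import codecs
from pathlib import Path


def describe_input(data: bytes) -> dict:
    """Report byte-level features of an input file that may be worth ruling out."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    crlf = data.count(b"\r\n")
    lone_cr = data.count(b"\r") - crlf  # carriage returns not followed by "\n"
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    # bytes.splitlines() treats "\n", "\r" and "\r\n" as line boundaries.
    blank = sum(1 for line in body.splitlines() if not line.strip())
    return {"has_bom": has_bom, "crlf": crlf, "lone_cr": lone_cr, "blank_lines": blank}


if __name__ == "__main__":
    path = Path("Data/Cause.txt")  # input path taken from the log above
    if path.exists():
        print(describe_input(path.read_bytes()))
```

If any of these counts are non-zero, rewriting the file with plain "\n" endings, no BOM, and no blank lines is a cheap thing to try before digging further.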

s-pranita · 2017-06-16T07:08:24Z

Is there any way to make it work in Windows?

sammthomson mentioned this issue May 31, 2016

Semirings.java: "name clash, have the same erasure" problem #2

Closed
