Converting postagged input to conll #20

Open
arisagithub opened this issue May 30, 2016 · 5 comments

@arisagithub

Hello,
When I try to run SEMAFOR, it stops in the "Converting postagged input to conll" phase.

Environment variables:
SEMAFOR_HOME=/opt/semafor
CLASSPATH=.:/opt/semafor/target/Semafor-3.0-alpha-04.jar
JAVA_HOME_BIN=/usr/lib/jvm/java-6-oracle/bin
MALT_MODEL_DIR=/opt/semafor_malt_model_20121129
TEMP_DIR: /tmp/semafor.oHswfdoPiw
Environment variables:
SEMAFOR_HOME=/opt/semafor
CLASSPATH=.:/opt/semafor/target/Semafor-3.0-alpha-04.jar
JAVA_HOME_BIN=/usr/lib/jvm/java-6-oracle/bin
MALT_MODEL_DIR=/opt/semafor_malt_model_20121129
Environment variables:
SEMAFOR_HOME=/opt/semafor
CLASSPATH=.:/opt/semafor/target/Semafor-3.0-alpha-04.jar
JAVA_HOME_BIN=/usr/lib/jvm/java-6-oracle/bin
MALT_MODEL_DIR=/opt/semafor_malt_model_20121129

Tokenizing file: Data/Cause.txt

real 0m0.039s
user 0m0.000s
sys 0m0.000s
Finished tokenization.

Part-of-speech tagging tokenized data....
/opt/semafor/scripts/jmx /opt/semafor/bin
Read 11692 items from tagger.project/word.voc
Read 45 items from tagger.project/tag.voc
Read 42680 items from tagger.project/tagfeatures.contexts
Read 42680 contexts, 117558 numFeatures from tagger.project/tagfeatures.fmap
Read model tagger.project/model : numPredictions=45, numParams=117558
Read tagdict from tagger.project/tagdict
This is MXPOST (Version 1.0)
Copyright (c) 1997 Adwait Ratnaparkhi
Sentence: 0 Length: 1 Elapsed Time: 0.024 seconds.
Sentence: 1 Length: 0 Elapsed Time: 0.0 seconds.

real 0m1.937s
user 0m0.800s
sys 0m0.048s
/opt/semafor/bin
Finished part-of-speech tagging tokenized data.

Converting postagged input to conll.
Exception in thread "main" java.lang.IllegalArgumentException:
at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec.decode(SentenceCodec.java:83)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec$SentenceIterator.computeNext(SentenceCodec.java:115)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec$SentenceIterator.computeNext(SentenceCodec.java:100)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.ConvertFormat.convertStream(ConvertFormat.java:94)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.ConvertFormat.main(ConvertFormat.java:76)
Caused by: java.lang.IllegalArgumentException: PosToken must have 2 "_"-separated fields
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.Token.fromPosTagged(Token.java:248)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec$2.decodeToken(SentenceCodec.java:28)
at edu.cmu.cs.lti.ark.fn.data.prep.formats.SentenceCodec.decode(SentenceCodec.java:79)
... 6 more

Any help you can give will be greatly appreciated.
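
The root cause in the trace is `PosToken must have 2 "_"-separated fields`, and the tagger log reports `Sentence: 1 Length: 0`, which is consistent with an empty line reaching the converter. A minimal sketch for locating such lines in the POS-tagged file is shown below. It assumes each token is expected to look like `word_TAG` and to split into exactly two fields on `_`; that rule is inferred from the error message, not taken from the SEMAFOR source, and `find_bad_pos_tokens` is a hypothetical helper name.

```python
def find_bad_pos_tokens(lines):
    """Return (line_number, token) pairs that do not look like word_TAG.

    Assumption: a valid token has exactly two "_"-separated fields.
    A blank line is reported with an empty token string.
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        tokens = line.split()  # splits on any whitespace, including \r
        if not tokens:
            # No tokens at all: an empty or whitespace-only line.
            problems.append((line_number, ""))
            continue
        for token in tokens:
            if len(token.split("_")) != 2:
                problems.append((line_number, token))
    return problems
```

Passing an open file handle for the tagged file (its location depends on your temp directory) gives a list of the lines worth inspecting before the conversion step.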

@sammthomson
Contributor

Hi, is there any chance your input file has some blank lines in it? I've run into a similar error in that case. If so, a temporary work-around might be to delete the empty lines before running SEMAFOR.
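
A minimal sketch of that workaround, in Python: it drops blank lines and also normalizes Windows line endings and a leading byte order mark. The last two steps are assumptions about what else might confuse the pipeline, not confirmed causes, and `clean_input_text` is a hypothetical helper name.

```python
def clean_input_text(text):
    """Normalize line endings and drop blank lines, one sentence per line."""
    # Drop a UTF-8 byte order mark if an editor added one at the start.
    text = text.lstrip("\ufeff")
    # splitlines() splits on \r\n, \r and \n (and a few other Unicode
    # line boundaries), so Windows line endings disappear here.
    kept = [line.strip() for line in text.splitlines() if line.strip()]
    # Re-join with plain \n and end the file with a single newline.
    return ("\n".join(kept) + "\n") if kept else ""
```

When writing the cleaned text back out on Windows, opening the file with `open(path, "w", encoding="utf-8", newline="\n")` keeps Python from translating `\n` back into `\r\n`.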

@arisagithub
Author

Thank you very much for your reply.
I fixed this problem by copying the contents of the input .txt file into a new Linux plain-text document and using this new file as input.
But now I have another problem: I need XML output, but when I set the output file path to "/out.xml", the generated output file is not in XML format.

Here is a sample output:
{"frames":[{"target":{"name":"Operational_testing","spans":[{"start":3,"end":4,"text":"test"}]},"annotationSets":[{"rank":0,"score":71.00063282566339,"frameElements":[{"name":"Product","spans":[{"start":4,"end":10,"text":"for SEMAFOR , a frame-semantic parser"}]}]}]}],"tokens":["This","is","a","test","for","SEMAFOR",",","a","frame-semantic","parser","."]}
{"frames":[{"target":{"name":"Shapes","spans":[{"start":5,"end":6,"text":"line"}]},"annotationSets":[{"rank":0,"score":11.818446277549976,"frameElements":[{"name":"Shape","spans":[{"start":5,"end":6,"text":"line"}]}]}]}],"tokens":["This","is","just","a","dummy","line","."]}
{"frames":[{"target":{"name":"Existence","spans":[{"start":0,"end":2,"text":"There 's"}]},"annotationSets":[{"rank":0,"score":52.10168633235354,"frameElements":[{"name":"Entity","spans":[{"start":2,"end":5,"text":"a Santa Claus"}]}]}]}],"tokens":["There","'s","a","Santa","Claus","!"]}

How can I get XML output from SEMAFOR?
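
The sample above is one JSON object per line, so the file name extension does not appear to change the format. Whether SEMAFOR has a built-in XML output option is not answered in this thread. As a stopgap, here is a minimal sketch that converts output shaped like the sample into XML using Python's standard library. The element and attribute names (`sentences`, `sentence`, `frame`, `annotationSet`, `frameElement`) are made up for illustration and do not follow the FrameNet XML schema or any SEMAFOR format.

```python
import json
import xml.etree.ElementTree as ET


def semafor_json_to_xml(json_lines):
    """Convert JSON lines shaped like the sample output into an XML string.

    The XML layout is ad hoc (an assumption), not an official schema.
    """
    root = ET.Element("sentences")
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between records
        parsed = json.loads(raw)
        sentence = ET.SubElement(root, "sentence")
        ET.SubElement(sentence, "text").text = " ".join(parsed["tokens"])
        for frame in parsed["frames"]:
            frame_el = ET.SubElement(
                sentence, "frame", {"name": frame["target"]["name"]}
            )
            for span in frame["target"]["spans"]:
                attrs = {"start": str(span["start"]), "end": str(span["end"])}
                ET.SubElement(frame_el, "target", attrs).text = span["text"]
            for annotation in frame["annotationSets"]:
                set_attrs = {
                    "rank": str(annotation["rank"]),
                    "score": str(annotation["score"]),
                }
                set_el = ET.SubElement(frame_el, "annotationSet", set_attrs)
                for element in annotation["frameElements"]:
                    for span in element["spans"]:
                        fe_attrs = {
                            "name": element["name"],
                            "start": str(span["start"]),
                            "end": str(span["end"]),
                        }
                        fe_el = ET.SubElement(set_el, "frameElement", fe_attrs)
                        fe_el.text = span["text"]
    return ET.tostring(root, encoding="unicode")
```

Reading the generated output file line by line and passing those lines to `semafor_json_to_xml` yields a single XML document with one `sentence` element per input line.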

@s-pranita

s-pranita commented Jun 15, 2017

I am getting the same error on Windows even after removing '\n'. How can I resolve it?

@arisagithub
Author

arisagithub commented Jun 15, 2017 via email

@s-pranita

Is there any way to make it work on Windows?
