RDRPOSTagger.py returns blank error #13

Closed
matgrioni opened this issue Sep 26, 2017 · 5 comments
@matgrioni

I am using the following command within RDRPOSTagger/pSCRDRtagger:

python RDRPOSTagger.py ../Models/UniPOS/UD_Latin/la-upos.RDR ../Models/UniPOS/UD_Latin/la-upos.DICT rawDataPath

It works as expected on some of the files I run it on. For others, such as the attached one, it produces the following error output:

=> Read a POS tagging model from ../Models/UniPOS/UD_Latin/la-upos.RDR

=> Read a lexicon from ../Models/UniPOS/UD_Latin/la-upos.DICT

=> Perform POS tagging on /home/grioni.2/NER/Preprocessing/Preprocessed/UNKNOWN/Tacitus.txt

ERROR ==>  "''"

===== Usage =====

#1: To train RDRPOSTagger on a gold standard training corpus:

python RDRPOSTagger.py train PATH-TO-GOLD-STANDARD-TRAINING-CORPUS

Example: python RDRPOSTagger.py train ../data/goldTrain

#2: To use the trained model for POS tagging on a raw text corpus:

python RDRPOSTagger.py tag PATH-TO-TRAINED-MODEL PATH-TO-LEXICON PATH-TO-RAW-TEXT-CORPUS

Example: python RDRPOSTagger.py tag ../data/goldTrain.RDR ../data/goldTrain.DICT ../data/rawTest

#3: Find the full usage at http://rdrpostagger.sourceforge.net !

I'm not sure where this error comes from, since the message is blank. The problem does not occur with the Java implementation, however, so:

java RDRPOSTagger ../Models/UniPOS/UD_Latin/la-upos.RDR ../Models/UniPOS/UD_Latin/la-upos.DICT rawDataPath

works for the same file.

Alexander_Severus.txt

@datquocnguyen
Owner

datquocnguyen commented Sep 27, 2017

Thanks for your report,
I am not sure where the error comes from, because the file you attached is not tokenized. RDRPOSTagger requires a tokenized/word-segmented input file.
Best,
Dat.
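For readers hitting the same requirement, here is a minimal sketch of one way to pre-tokenize a raw text file before tagging. It assumes that "tokenized" means whitespace-separated tokens with punctuation split off from words; the function names and the regular expression are illustrative and are not part of RDRPOSTagger. It does no sentence splitting and does not handle abbreviations or enclitics, so a language-specific tokenizer may be a better fit for real corpora.

```python
import re

# Assumption: a token is either a run of word characters or one punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize_line(line):
    """Return the line with its tokens separated by single spaces."""
    return " ".join(TOKEN_PATTERN.findall(line))


def tokenize_file(src_path, dst_path):
    """Write a tokenized copy of src_path to dst_path, one input line per output line."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(tokenize_line(line) + "\n")


if __name__ == "__main__":
    print(tokenize_line("Arma virumque cano, Troiae qui primus ab oris."))
    # prints: Arma virumque cano , Troiae qui primus ab oris .
```

The tokenized copy, rather than the raw file, would then be passed as the raw-text-corpus argument.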

@matgrioni
Author

Thank you for responding. I will try to tokenize the file as shown in /data, as I had not noticed this requirement before. I will close the issue and re-open it if the problem persists after that.

@Stormur

Stormur commented Nov 16, 2018

I'm getting the same error:

=> Read a POS tagging model from /home/flavio/Documenti/POS/RDRPOSTagger/Models/UniPOS/UD_Latin-ITTB23/la_ittb23-upos.RDR

=> Read a lexicon from /home/flavio/Documenti/POS/RDRPOSTagger/Models/UniPOS/UD_Latin-ITTB23/la_ittb23-upos.DICT

=> Perform POS tagging on /home/flavio/Documenti/POS/Testi_Tabelle/De_divinatione/Cic_DeDiv_SentWord_Tokenized_corretto_detersum_orizzontale.txt

ERROR ==>  "''"

Probably there is an error in the file I used for training, since other models have no problem with the same file. But I cannot identify it, since the file seems to follow all the requirements.

For training:
latin_ittb-ud23_train_orizzontale.txt

To tag:
Cic_DeDiv_SentWord_Tokenized_corretto_detersum_orizzontale.txt

@datquocnguyen
Owner

datquocnguyen commented Nov 17, 2018

You can either:

  1. Fix this error by adding '' PUNCT as a new line in the la_ittb23-upos.DICT file.
  2. Or use the latest RDRPOSTagger, which I have just updated. It is only a minor update to the file InitialTagger.py to handle this error, so you do not need to retrain any model.
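A short sketch of why the message looks blank, and of option 1 as a script. The first function relies only on standard Python behaviour: converting a KeyError to a string yields the repr of the missing key, so a failed lookup of the two-character token '' prints as "''". That this is what happens inside the tagger is an assumption, although it is consistent with the fix above. The second function assumes the lexicon is a UTF-8 text file with one "token tag" pair per line; both function names are hypothetical.

```python
from pathlib import Path


def lookup_error_message(lexicon, token):
    """Return the text of the KeyError raised by lexicon[token], or None if present."""
    try:
        lexicon[token]
    except KeyError as err:
        return str(err)
    return None


def ensure_lexicon_entry(dict_path, token="''", tag="PUNCT"):
    """Append 'token tag' to the lexicon file unless the token is already listed.

    Returns True if a line was appended, False if the token was already there.
    """
    path = Path(dict_path)
    text = path.read_text(encoding="utf-8")
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == token:
            return False
    # Start the new entry on its own line if the file lacks a trailing newline.
    separator = "" if (not text or text.endswith("\n")) else "\n"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"{separator}{token} {tag}\n")
    return True


if __name__ == "__main__":
    # A toy lexicon without the '' entry reproduces the shape of the message.
    print("ERROR ==> ", lookup_error_message({"et": "CCONJ"}, "''"))
    # prints: ERROR ==>  "''"
```

Calling ensure_lexicon_entry on a copy of the .DICT file first is the safer route, since the function modifies the file in place.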

@Stormur

Stormur commented Nov 19, 2018

Now it works, thank you!
