Certain tokens receive null POS and token is not outputted #15
Comments
Thanks for the report. I cannot figure out any reason for what happened. It would be great if you could provide the whole sentence in which âManli appears, and tell me which language/model you are working with.
I am using the Latin model. Also, after grepping and looking into the issue further, it seems that many tokens receive This is an example with
Note that RDRPOSTagger requires a tokenized (word-segmented) corpus as input. You have to perform tokenization before performing POS tagging. Then you also need to add an entry
Sorry, I was unclear. The file I attached is the raw file. I tokenize the text before it is sent to the POS tagger, adding a space to separate any punctuation from the next character. I assume the fix will work the same regardless. Thanks!
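For readers following along, the pre-tokenization step described above could look roughly like the sketch below. It is not the code used in this thread: `separate_punctuation` is a hypothetical helper, and it assumes that "punctuation" means any character that is neither a Unicode word character nor whitespace. It pads punctuation on both sides rather than only before the next character.

```python
import re

# Hypothetical helper for illustration; not part of RDRPOSTagger.
# Assumption: punctuation = any character that is neither a Unicode
# word character (\w) nor whitespace (\s).
_PUNCT = re.compile(r"([^\w\s])")


def separate_punctuation(text: str) -> str:
    """Pad each punctuation character with spaces, then collapse whitespace."""
    spaced = _PUNCT.sub(r" \1 ", text)
    return " ".join(spaced.split())
```

Note that a letter such as `â` counts as a word character under this rule, so a token like `âManli` is passed through unchanged, while an opening quotation mark attached to a word would be split off as its own token.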
You can re-download RDRPOSTagger to run on the tokenized corpus. I have just fixed the errors by adding missing punctuation mark
Some tokens, such as âManli, are tagged as:

I couldn't find anything in the documentation about a null POS being returned in any case, so I figure this is undesirable behavior. In either case, the output should include the original token.
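A rough way to locate affected tokens in tagged output is sketched below. It rests on assumptions that are not confirmed in this thread: that each output item has the form `word/TAG`, and that a missing tag shows up as the literal string `null`. The function name `find_null_tags` and the tag names in the usage example are illustrative only.

```python
from typing import List, Tuple


def find_null_tags(tagged_line: str, null_marker: str = "null") -> List[Tuple[int, str]]:
    """Return (position, item) pairs for items with a missing or null tag.

    Assumes whitespace-separated items of the form 'word/TAG'; this format
    and the 'null' marker are assumptions, not documented behavior.
    """
    flagged = []
    for index, item in enumerate(tagged_line.split()):
        # Split on the last '/', so words that contain '/' keep their text.
        word, sep, tag = item.rpartition("/")
        if not sep or tag == null_marker:
            flagged.append((index, item))
    return flagged
```

For example, `find_null_tags("Gallia/PROPN /null est/VERB")` flags position 1. Because the input is tokenized one token per position, that index can be used to look up the original token that was dropped from the output.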