tokenizer fails #8
Hmm. This is a good bug. It is most probably a bug in the string representation of a token in the list, which comes from a bug in the splitting of syllables with affixes into two distinct tokens. It looks like the attribute token.syls does not have the expected content. What line 62 does is find the actual characters in token.content by using the indices listed in token.syls. It looks like token.syls has not been correctly split in pybo.splitaffixed.py, in a private function called __split_syls(). Could you try with the latest version I have pushed?
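To illustrate the mechanism described above, here is a minimal sketch (hypothetical classes, not pybo's actual API) of how syllable index lists into token.content can be used to recover syllable strings, and how stale indices left behind by an incorrect split would make that lookup blow up:

```python
class Token:
    """Hypothetical stand-in for a pybo token: raw text plus syllable indices."""
    def __init__(self, content, syls):
        self.content = content  # raw text of the token
        self.syls = syls        # one list of character indices per syllable

    def syl_strings(self):
        # Recover each syllable by indexing into content, as the thread
        # describes line 62 doing.
        return ["".join(self.content[i] for i in idxs) for idxs in self.syls]

ok = Token("abcd", [[0, 1], [2, 3]])
print(ok.syl_strings())  # ['ab', 'cd']

# If syls was not re-split when the token's content was split,
# its indices point past the end of content and indexing fails:
stale = Token("ab", [[0, 1], [2, 3]])
try:
    stale.syl_strings()
except IndexError:
    print("stale syls indices -> IndexError")
```

The point is that tokenization itself can finish fine while leaving inconsistent syls behind; the error only surfaces when something later reads the characters back out.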
By the way, the tokenizer did not fail; it is the string representation of the content of the Token object that fails. There is still a bug somewhere, but not big enough to prevent the tokenizer from functioning altogether. Otherwise it would not have gotten past the line "tokens = tok.tokenize(input_str)".
I tried installing from the latest master, but the issue is still there.
I don't seem to be able to reproduce the bug, using the configuration that is in the latest master. It seems you can't print one of the produced tokens. Maybe a way to identify it would be to do something like the following:
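The snippet originally posted here was not preserved in this thread. A minimal sketch of that kind of check, assuming only that str() on the bad token raises, might look like:

```python
def find_unprintable(tokens):
    """Return (index, exception) pairs for tokens whose str() raises."""
    failures = []
    for n, token in enumerate(tokens):
        try:
            str(token)
        except Exception as e:
            failures.append((n, e))
    return failures

# Demo with a stand-in object whose __str__ raises, mimicking the bug
# (in the thread, tokens would come from tok.tokenize(input_str)):
class BadToken:
    def __str__(self):
        raise IndexError("syls index out of range")

print(find_unprintable(["ok", BadToken()]))
```

Once the offending token's index is known, its token.content and token.syls attributes can be inspected directly to see which indices are out of range.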
Something else: if this code executes without problem, then IPython/Jupyter may have a problem rendering the token. Could you elaborate on what you do instead of loading the output of tok.tokenize() into a variable, and how the error only happens then?
Very strange: without actually reinstalling pybo, things work now. In between, #9 came up as a new issue, which was easy to resolve, as I mention in #9. I'm using an env that went through some other changes in the meantime, so it probably had to do with that. I will try to reproduce later with a clean env and report based on that.
Thank you! |
I've installed from PyPI and I'm doing...
...at which point I get:
If I don't load tok.tokenize(input_str) into a variable, then the error comes in that step instead.
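This behavior is consistent with how IPython/Jupyter displays results: the value of a bare expression is rendered via its repr, so a broken string representation raises immediately, while assigning the result to a variable suppresses the display and defers the error. A minimal stand-in (not pybo code) showing the difference:

```python
class Broken:
    """Object whose repr raises, mimicking the token's broken representation."""
    def __repr__(self):
        # Simulate an out-of-range index, as bad syls indices would cause.
        return "x"[5]

b = Broken()      # assignment: nothing is displayed, so no error yet
try:
    repr(b)       # what Jupyter effectively does to display a bare expression
except IndexError as e:
    print("repr failed:", e)
```

So "the tokenizer fails" and "the result can't be displayed" are the same underlying bug, surfacing at different points depending on whether the result is displayed.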