Caught UnicodeDecodeError when using parseToNode alone #3
Comments
To fix this, parseToNode could call parse first (Line 282 in 5ee7aa5), like:

    def parseToNode(self, *args):
        # Run parse() first; this is the workaround that avoids the UnicodeDecodeError.
        self.parse(*args)
        return _MeCab.Tagger_parseToNode(self, *args)

Please give me your thoughts on this.
I got the same error and fixed it with the above workaround.
I investigated the cause of this bug. In Line 6527 in 5ee7aa5
In Python 2, the So, the cause of this bug is in Lines 3461 to 3470 in 5ee7aa5
But I don't have a patch to solve this bug at this time. 😕
I got the same problem and found that using the latest version of MeCab solves the problem. My environment:
This problem seems to be the same as the one reported in taku910/mecab#5, and it was solved by taku910/mecab#24, merged in Feb 2016. Although this problem occurs only in Python 3, it is not a mecab-python3 issue; it seems to be a memory-management issue in MeCab itself. Unfortunately, major package managers such as Homebrew and APT currently offer an older version of MeCab based on the Feb 2013 source, which can be obtained from Google Drive. To avoid this problem without using the workaround mentioned above, you need to build and install MeCab manually from the latest source on GitHub, and then reinstall mecab-python3.
@graph226 I believe this ought to be fixed by using the latest version of the package and the latest version of MeCab, but I cannot be sure because you did not provide a complete test case that I can run for myself. Could you please try your code again? Make sure to use mecab-python3 0.8.3, MeCab 0.996, and a current version of SWIG (I have 3.0.12). It's been a long time since you reported this bug and perhaps you have moved on, so if I don't hear from you in a month I will close the bug (but feel free to reopen it if you don't get to this until after that, and it's still a problem).
Please see the spaCy issue linked above, which provides a Dockerfile and code to reproduce the issue. I think @orangain's explanation is exactly right.
Please try the release candidate, available from https://test.pypi.org/project/mecab-python3/0.996.2rc2/ (this bug should be fixed there). Thank you everyone for your patience. We plan to make a new official release in the next couple of weeks.
0.996.2 has been officially released and this issue should be corrected. Please file a new bug report if you are still having problems with
But keep compatibility for Python 2, because the latest packaged mecab binaries include a bug that makes tatomecab unusable. The only way to have it working with Python 3 right now is to compile mecab from source: SamuraiT/mecab-python3#3 (comment) Note that in Python 3, http.server.BaseHTTPRequestHandler.parse_request() forces decoding the request line as latin1-encoded, so commands such as curl http://127.0.0.1:8842/furigana?str=振り仮名をつけろう won’t work any more. One needs to %-encode everything properly in the URL. Closes #6.
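As a rough illustration of the %-encoding mentioned in that commit message, the URL could be built with Python's standard urllib.parse module. The host, port, path, and the str parameter are taken from the curl command above; the helper name build_furigana_url is hypothetical, and using urlencode is just one way to do it, not necessarily what tatomecab's clients do.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint copied from the curl example above; adjust for your own server.
BASE_URL = "http://127.0.0.1:8842/furigana"


def build_furigana_url(text):
    """Return a pure-ASCII URL with text %-encoded as UTF-8 in the query.

    Hypothetical helper for illustration only.
    """
    return BASE_URL + "?" + urlencode({"str": text})


url = build_furigana_url("振り仮名をつけろう")
print(url)

# Round trip: decoding the query string gives back the original text.
print(parse_qs(urlsplit(url).query)["str"][0])
```

Because the resulting request line is plain ASCII, the latin1 decoding done by parse_request() leaves it unchanged.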
* Upgrade Github Actions in build scripts
* Cache mecab build
* Add caching of mecab builds
* Action forgotten action file
* Clean up paths
* Remove leftover cd
* Fix cache path
* Fix path
* Use glob for python versions
* Don't build mecab if cache is found
* Fix caching
* Fix job keys
* Try modifying conditionals
* Clean up conditionals Apparently they don't need the ${{ }}
* Explicitly specify Python versions again This may be the easiest way to exclude 3.6.
When we use tagger.parseToNode(text) alone, we sometimes get an error such as:
To avoid this, call tagger.parse(text) before parseToNode.
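For anyone who wants to apply the workaround without patching the bindings, here is a minimal sketch. The helper name iter_surfaces is hypothetical; it assumes tagger is a MeCab.Tagger (or any object with the same parse and parseToNode methods) and only relies on the node attributes surface and next.

```python
def iter_surfaces(tagger, text):
    """Yield the surface form of each node in text.

    tagger is assumed to be a MeCab.Tagger, or any object exposing the
    same parse() and parseToNode() methods.
    """
    # Workaround from this issue: call parse() before parseToNode().
    tagger.parse(text)
    node = tagger.parseToNode(text)
    while node:
        # The BOS/EOS nodes have an empty surface, so skip them.
        if node.surface:
            yield node.surface
        node = node.next


# Typical use, assuming mecab-python3 and a dictionary are installed:
#   import MeCab
#   tagger = MeCab.Tagger()
#   print(list(iter_surfaces(tagger, "すもももももももものうち")))
```

As noted in the comments, the workaround should not be needed once both MeCab and mecab-python3 are up to date.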