Failed when trying to initialize MeCab on macOS #45

Closed
HiromuHota opened this issue Apr 27, 2020 · 9 comments

Comments

@HiromuHota

HiromuHota commented Apr 27, 2020

I installed mecab-python3==1.0.0a1 and tested the following lines of code:

>>> import MeCab
>>> wakati = MeCab.Tagger("-Owakati")
>>> wakati.parse("pythonが大好きです").split()
['python', 'が', '大好き', 'です']
>>> chasen = MeCab.Tagger("-Ochasen")

Failed when trying to initialize MeCab. Some things to check:

    - If you are not using a wheel, do you have mecab installed?

    - Do you have a dictionary installed? If not do this:

        pip install unidic-lite

    - If on Windows make sure you have this installed:

        https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads

If you are still having trouble, please file an issue here:

    https://github.com/SamuraiT/mecab-python3/issues
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hiromu/miniconda3/lib/python3.7/site-packages/MeCab/__init__.py", line 102, in __init__
    super(Tagger, self).__init__(args)
RuntimeError

As suggested, I installed unidic-lite (1.0.4) and tested again, but got the same result.

When I tried the same code with mecab-python3==0.996.5, it worked:

>>> import MeCab
>>> wakati = MeCab.Tagger("-Owakati")
>>> wakati.parse("pythonが大好きです").split()
['python', 'が', '大好き', 'です']
>>> chasen = MeCab.Tagger("-Ochasen")
>>> print(chasen.parse("pythonが大好きです"))
python	python	python	名詞-固有名詞-組織		
が	ガ	が	助詞-格助詞-一般		
大好き	ダイスキ	大好き	名詞-形容動詞語幹		
です	デス	です	助動詞	特殊・デス	基本形
EOS

OS: macOS (10.15.4)
Python: 3.7.7
mecab: 0.996
mecab-python3: 1.0.0a1

@polm
Collaborator

polm commented Apr 27, 2020

Thanks for the report! I had the same issue on Linux with -Ochasen. I'll see if I can figure out what's wrong with it...

@HiromuHota
Author

I've just realized that Travis only builds and deploys, without running any tests.
Would it be a good idea to run test/test_basic.py after deploying on each OS?

@polm
Collaborator

polm commented Apr 27, 2020

Travis does run the tests at least on Linux and OSX, see here. It happens as part of the build script, rather than the deploy script, and it's handled a bit indirectly by setting env vars, but it does seem to be running.

It doesn't seem to be run on Windows, so that's an issue.

The biggest problem is probably that the tests are very limited though, which is why they didn't catch this. I'll add one for Chasen mode.

@HiromuHota
Author

Oh, my bad. Yes, Travis runs the unit tests both on Linux and on Mac.

@polm
Collaborator

polm commented Apr 27, 2020

Ah, I believe I found the issue. The Chasen format is not hard-coded the way wakati is (see 注意事項), and it's not included in the Unidic dicrc. So my commits that removed the included IPAdic broke calls to Chasen.
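
To illustrate the point about dicrc, here is a minimal Python sketch that checks whether a dictionary's dicrc text defines a named output format. It assumes that a name passed via -O, if not hard-coded, is looked up as a `node-format-<name>` key in dicrc. The helper names, the sample dicrc strings, and the one-element built-in set are hypothetical and only for illustration; they are not part of mecab-python3.

```python
import re

# Formats assumed to be handled inside MeCab without consulting dicrc.
# Only "wakati" is confirmed by this thread; the set is deliberately minimal.
BUILTIN_FORMATS = {"wakati"}


def dicrc_formats(dicrc_text):
    """Return every name N for which the dicrc text defines node-format-N."""
    names = set()
    for line in dicrc_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith((";", "#")):
            continue
        match = re.match(r"node-format-([A-Za-z0-9_]+)\s*=", line)
        if match:
            names.add(match.group(1))
    return names


def format_available(name, dicrc_text):
    """True if -O<name> should be usable with this dicrc (under the assumptions above)."""
    return name in BUILTIN_FORMATS or name in dicrc_formats(dicrc_text)


# Hypothetical dicrc fragments, not copied from any real dictionary.
SAMPLE_WITH_CHASEN = r"""
; ChaSen-style output
node-format-chasen = %m\t%f[7]\n
eos-format-chasen = EOS\n
"""

SAMPLE_WITHOUT_CHASEN = r"""
; no chasen entries here
node-format-yomi = %pS%f[7]
"""

print(format_available("chasen", SAMPLE_WITH_CHASEN))
print(format_available("chasen", SAMPLE_WITHOUT_CHASEN))
print(format_available("wakati", SAMPLE_WITHOUT_CHASEN))
```

With a real installation, the text would come from the dicrc file inside the dictionary directory in use.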

I'll see if I can get the original error message that makes this clear. I tried to get it before and the error string was always empty, but that may have been a different kind of error.

@polm
Collaborator

polm commented Apr 27, 2020

Well, unfortunately it looks like there's no obvious reason the error string isn't shown.

Here's the SWIG code for the Tagger constructor:

MeCab::Tagger* new_MeCab_Tagger () {
  MeCab::Tagger *tagger = MeCab::createTagger("-C");
  if (! tagger) throw MeCab::getLastError();
  return tagger;
}

When it fails, MeCab::getLastError() is supposed to return a string describing the error. However, that string is always empty. (I verified this by changing the method call to a constant string, which worked fine.) I ran into the same issue in fugashi - if MeCab has an internal error, the error string doesn't seem to be exposed.

I may be able to work around this, I'll take another look at it tomorrow.

Regarding the -Ochasen issue though - that's not a bug, it's just normal behavior for an invalid command line, so there's no reason to add a specific test for it. It would be good to be able to differentiate error causes though...

@polm
Collaborator

polm commented Apr 28, 2020

I have looked at the MeCab source and tried modifying things in SWIG but unfortunately I cannot get the error out of MeCab. It's possible I'm misunderstanding something, but this might be a bug in MeCab.

I think this is where the error is set for the Tagger:

https://github.com/taku910/mecab/blob/3a07c4eefaffb4e7a0690a7f4e5e0263d3ddb8a3/mecab/src/tagger.cpp#L1049-L1057

This looks like where it is set in the Model:

https://github.com/taku910/mecab/blob/3a07c4eefaffb4e7a0690a7f4e5e0263d3ddb8a3/mecab/src/tagger.cpp#L348-L356

The code is a bit different.

I tried instantiating a Model using MeCab.Model('-Ochasen') and that gives the correct error string, even though the code is basically the same as for the Tagger, so I think this might be a bug in MeCab.
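
The Model workaround could be automated roughly as follows. This is only a sketch, not what mecab-python3 actually does: `describe_init_failure` is a hypothetical helper, and it takes the two constructors as parameters (for example MeCab.Tagger and MeCab.Model) so that it can be exercised without MeCab installed.

```python
def describe_init_failure(make_tagger, make_model, args):
    """Return None if the Tagger can be built, otherwise the best error text found.

    make_tagger and make_model are callables that accept the same argument
    string, e.g. MeCab.Tagger and MeCab.Model (hypothetical usage).
    """
    try:
        make_tagger(args)
        return None
    except RuntimeError as tagger_err:
        message = str(tagger_err)

    # The Tagger gave a usable message after all; no fallback needed.
    if message:
        return message

    # Empty message: retry with the Model, which takes the same arguments
    # and was observed in this thread to report the real error.
    try:
        make_model(args)
    except RuntimeError as model_err:
        return str(model_err) or "unknown error (Model gave no message either)"
    return "Tagger failed, but Model accepted the same arguments"
```

Assuming MeCab is installed, a call might look like describe_init_failure(MeCab.Tagger, MeCab.Model, "-Ochasen"). Passing the constructors in, rather than importing MeCab inside the helper, keeps the fallback logic testable with stand-in callables.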

polm added a commit to polm/mecab-python3 that referenced this issue Apr 28, 2020
See SamuraiT#45 for details. For some reason the Tagger constructor doesn't
produce error strings, but the Model constructor does. Since the arg
format is the same you can get an error by passing it to the Model.

May want to automate that if it turns out this is an upstream bug that
can't be fixed.
@HiromuHota
Author

Just for information, I tried MeCab.Model('-Ochasen') and got the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hiromu/miniconda3/lib/python3.7/site-packages/MeCab/__init__.py", line 119, in __init__
    super(Model, self).__init__(args)
RuntimeError: writer.cpp(63) [!tmp.empty()] unknown format type [chasen]

Hope this helps.

@polm
Collaborator

polm commented May 6, 2020

As this is not a bug I'm going to close it, though I may try getting a better error message later.
