Returns only top result with NBest option #36

Closed
itmammoth opened this issue Oct 30, 2019 · 2 comments

Comments

@itmammoth

Hi,
I am trying to parse some words with the NBest option, but it doesn't seem to work correctly.

  • mecab 0.996
  • mecab-python3 0.996.2
  • python 3.7.4

What mecab returns is

$ echo こんにゃく粉 | mecab -N2
こんにゃく粉	名詞,一般,*,*,*,*,こんにゃく粉,コンニャクコ,コンニャクコ
EOS
こんにゃく	名詞,一般,*,*,*,*,こんにゃく,コンニャク,コンニャク
粉	名詞,接尾,一般,*,*,*,粉,コ,コ
EOS

On the other hand, what mecab-python3 returns is

import MeCab

tagger = MeCab.Tagger('-N2')
print(tagger.parse('こんにゃく粉'))
こんにゃく粉     名詞,一般,*,*,*,*,こんにゃく粉,コンニャクコ,コンニャクコ
EOS
@polm
Collaborator

polm commented Nov 6, 2019

The -N option doesn't work that way when initializing the tagger; you need to use one of the N-best methods in the API instead. Example:

tagger.parseNBest(2, 'こんにゃく粉')
# =>  'こんにゃく 粉 \nこんにゃく 粉 \n'
# note: I get the same results for the top two, probably because I'm using Unidic.
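Since parseNBest returns all candidates in one string, it can help to split them apart. Below is a minimal sketch, assuming the default output format (one `surface<TAB>features` line per token, with an `EOS` line closing each analysis, as in the command-line output above). The helper names `split_nbest` and `nbest` are hypothetical and not part of mecab-python3; with a different output format such as wakati, the splitting would need to change.

```python
from typing import List, Tuple

Analysis = List[Tuple[str, str]]


def split_nbest(raw: str) -> List[Analysis]:
    """Split default-format N-best output into one token list per analysis.

    Hypothetical helper: assumes each analysis is terminated by an "EOS" line
    and each token line is "surface<TAB>features".
    """
    analyses: List[Analysis] = []
    current: Analysis = []
    for line in raw.splitlines():
        if line == "EOS":
            # End of one candidate analysis; start collecting the next one.
            analyses.append(current)
            current = []
        elif line:
            surface, _, features = line.partition("\t")
            current.append((surface, features))
    return analyses


def nbest(text: str, n: int = 2) -> List[Analysis]:
    """Return the top-n analyses of text (hypothetical wrapper)."""
    # Imported here so split_nbest can be used without MeCab installed.
    import MeCab

    tagger = MeCab.Tagger()  # note: no -N option; n is passed to parseNBest
    return split_nbest(tagger.parseNBest(n, text))
```

Calling `nbest('こんにゃく粉', 2)` would then give a list of up to two analyses, each a list of `(surface, features)` pairs. Which candidates come back depends on the installed dictionary.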

In general, be careful because the command line works slightly differently from the C API in MeCab. Another example is that from the command line all newlines are treated as sentence boundaries, while from the API they're just whitespace.
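To get the command-line behaviour for newlines from Python, one option is to split the input yourself and parse each line separately. This is a rough sketch, not something the library provides: `parse_per_line` is a hypothetical helper, and skipping blank lines is a choice made here rather than something MeCab does.

```python
from typing import Callable, List


def parse_per_line(text: str, parse: Callable[[str], str]) -> List[str]:
    """Treat each line of text as its own sentence and parse it separately.

    Hypothetical helper: `parse` is any function mapping a sentence to its
    analysis, for example the bound method `tagger.parse`.
    Blank lines are skipped (a choice made in this sketch).
    """
    return [parse(line) for line in text.splitlines() if line.strip()]


def parse_text_like_cli(text: str) -> List[str]:
    """Parse text line by line with a default Tagger (hypothetical wrapper)."""
    # Imported here so parse_per_line can be used without MeCab installed.
    import MeCab

    tagger = MeCab.Tagger()
    return parse_per_line(text, tagger.parse)
```

Each element of the returned list is the analysis of one input line, so sentence boundaries follow the newlines in the input instead of being treated as whitespace.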

@itmammoth
Author

Thanks @polm !
You saved my day.
