Usage
Brooke M. Fujita edited this page Jan 4, 2018
Before you start using natto-py, I highly recommend trying out `mecab` from the command line. Spend some time acquainting yourself with the many high-level options that come with MeCab. If the stock output formats do not do what you want, you can define your own. Also, please pay close attention to the character encodings of both MeCab and the dictionaries used.
- For all input, please use the character encoding given by the `charset` value of MeCab's system dictionary.
- All MeCab output via natto-py will likewise be in the encoding given by the system dictionary's `charset`.

- natto-py will examine the `charset` of MeCab's system dictionary and automatically translate all input from Unicode into this character encoding.
- Similarly, all MeCab output via natto-py will be translated from the system dictionary's `charset` into Unicode.
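
The translation between Unicode and the dictionary `charset` described in the points above amounts to an encode/decode round trip around MeCab. The following is a minimal sketch using only the Python standard library; it does not call natto-py or MeCab, and it is not natto-py's actual implementation. The `DICT_CHARSET` value and both helper names are assumptions made for illustration; substitute the charset that your own system dictionary reports.

```python
# Illustration only: mimics the Unicode <-> dictionary-charset translation
# described above. It does not call natto-py or MeCab.

# Assumption: the system dictionary was compiled with EUC-JP.
DICT_CHARSET = "euc-jp"


def to_dict_charset(text, charset=DICT_CHARSET):
    """Encode Unicode text into the dictionary's character encoding."""
    return text.encode(charset)


def from_dict_charset(data, charset=DICT_CHARSET):
    """Decode bytes in the dictionary's character encoding back to Unicode."""
    return data.decode(charset)


text = "日本語"
raw = to_dict_charset(text)        # bytes in the dictionary's encoding
restored = from_dict_charset(raw)  # Unicode text again

# Decoding with the wrong encoding fails, which is why input and output
# must agree with the dictionary's charset.
try:
    raw.decode("utf-8")
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True

print(restored == text, mismatch_detected)
```

If the bytes handed to MeCab are not in the dictionary's encoding, the best case is an error like the one caught here; the worse case is silently garbled output, so it is worth confirming the `charset` before parsing anything.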
Here are some simple examples for using natto-py:
- とりあえず使ってみよう!: Start using natto-py to check your MeCab and dictionary info, and parse a sentence using the default `chasen` format
- わかち書き Parsing: Tokenize a sentence into its parts of speech
- N-Best 読み: Obtain the 2 most probable readings for a list of kanji
- 振り仮名変換: Convert kanji in text to furigana
- 出力フォーマットの指定: User-defined output formatting