Brooke M. Fujita edited this page Jan 4, 2018 · 5 revisions

Usage

Before You Begin Parsing

Before you start using natto-py, I highly suggest trying out mecab from the command line. Spend some time acquainting yourself with the many high-level options that come with MeCab. If the stock output formats do not do what you want, define your own. Also, please pay close attention to the character encodings of both MeCab and the dictionaries used.

Python Version and String-handling

Python 2

  • For all input, please use the same character encoding as the charset of MeCab's system dictionary.
  • All MeCab output via natto-py will likewise be in the same encoding as the system dictionary's charset.

Python 3

  • natto-py will examine the charset of MeCab's system dictionary, and automatically translate all input from Unicode into this character encoding.
  • Similarly, all MeCab output via natto-py will be translated from the system dictionary charset into Unicode.
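The Python 3 behavior described above amounts to an encode/decode round trip around MeCab. The following is only a minimal sketch of that idea using the standard library; it does not call natto-py or MeCab, the helper names `to_dictionary_bytes` and `from_dictionary_bytes` are hypothetical, and the `euc-jp` charset is an assumed example value, not something read from a real dictionary.

```python
# Illustrative sketch only: natto-py's actual internals may differ.

def to_dictionary_bytes(text, charset):
    """Encode a Unicode string into the dictionary's charset (hypothetical helper)."""
    return text.encode(charset)


def from_dictionary_bytes(raw, charset):
    """Decode bytes in the dictionary's charset back into Unicode (hypothetical helper)."""
    return raw.decode(charset)


# Assumption: the system dictionary was built with the EUC-JP charset.
charset = "euc-jp"
text = "すもももももももものうち"

raw = to_dictionary_bytes(text, charset)        # bytes that would be handed to MeCab
restored = from_dictionary_bytes(raw, charset)  # Unicode that would be handed back
```

Under Python 2, by contrast, the caller would be responsible for supplying and interpreting strings in the dictionary's charset, as noted in the Python 2 list above.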

Usage Examples

Here are some simple examples for using natto-py:

  1. とりあえず使ってみよう! (Let's just try it!): Start using natto-py to check your MeCab and dictionary info, and parse a sentence using the default chasen format
  2. わかち書き (word-segmentation) Parsing: Tokenize a sentence into its parts of speech
  3. N-Best 読み (readings): Obtain the 2 most probable readings for a list of kanji
  4. 振り仮名変換 (furigana conversion): Convert kanji in text to furigana
  5. 出力フォーマットの指定 (specifying the output format): User-defined Output Formatting
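As a rough taste of what these examples build on, here is a minimal Python 3 sketch of node-based parsing. It assumes natto-py and MeCab with a system dictionary are installed; the helper names `surfaces` and `tokenize` are hypothetical and are not part of natto-py. The natto import is done lazily inside `tokenize` so that `surfaces` can be used on its own.

```python
def surfaces(nodes):
    """Collect surface forms, skipping MeCab's end-of-sentence node."""
    return [n.surface for n in nodes if not n.is_eos()]


def tokenize(text):
    """Return the surface forms of the morphemes in text (hypothetical helper)."""
    # Lazy import: requires natto-py plus a working MeCab installation.
    from natto import MeCab

    with MeCab() as nm:
        # as_nodes=True yields node objects instead of one formatted string.
        return surfaces(nm.parse(text, as_nodes=True))
```

Calling `tokenize('ピンチの時には必ずヒーローが現れる。')` should return a list of surface strings; the exact segmentation depends on the dictionary in use.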
