2023-12-14 UPDATE: 0.6.8 Release

Try it:

pip install --upgrade 'sudachipy>=0.6.8'

This project is a Rust implementation of Sudachi, a Japanese morphological analyzer.

Japanese README | SudachiPy Documentation


$ git clone
$ cd ./

$ cargo build --release
$ cargo install --path sudachi-cli/
$ ./

$ echo "高輪ゲートウェイ駅" | sudachi
高輪ゲートウェイ駅  名詞,固有名詞,一般,*,*,*    高輪ゲートウェイ駅


Multi-granular Tokenization

$ echo 選挙管理委員会 | sudachi
選挙管理委員会  名詞,固有名詞,一般,*,*,*        選挙管理委員会

$ echo 選挙管理委員会 | sudachi --mode A
選挙    名詞,普通名詞,サ変可能,*,*,*    選挙
管理    名詞,普通名詞,サ変可能,*,*,*    管理
委員    名詞,普通名詞,一般,*,*,*        委員
会      名詞,普通名詞,一般,*,*,*        会

Normalized Form

$ echo 打込む かつ丼 附属 vintage | sudachi
打込む  動詞,一般,*,*,五段-マ行,終止形-一般     打ち込む
かつ丼  名詞,普通名詞,一般,*,*,*        カツ丼
附属    名詞,普通名詞,サ変可能,*,*,*    付属
vintage 名詞,普通名詞,一般,*,*,*        ビンテージ

Wakati (space-delimited surface form) Output

$ cat lemon.txt

$ sudachi --wakati lemon.txt
えたい の 知れ ない 不吉 な 塊 が 私 の 心 を 始終 圧え つけ て い た 。
焦躁 と 言おう か 、 嫌悪 と 言おう か ― ― 酒 を 飲ん だ あと に 宿酔 が ある よう に 、 酒 を 毎日 飲ん で いる と 宿酔 に 相当 し た 時期 が やっ て 来る 。
それ が 来 た の だ 。 これ は ちょっと いけ なかっ た 。


You need the source code, the default plugins, and a dictionary. (This crate does not include a dictionary.)

1. Get the source code

$ git clone

2. Download a Sudachi Dictionary

Sudachi requires a dictionary to operate. You can download a dictionary ZIP file from WorksApplications/SudachiDict (choose one from small, core, or full), unzip it, and place the system_*.dic file somewhere. The default settings file assumes that it is placed at resources/system.dic.
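The manual placement step can be sketched in Python as follows. This is a minimal sketch, not part of the project: the function name `install_dictionary` and the directory arguments are hypothetical, and it assumes the unzipped archive contains exactly one file matching system_*.dic somewhere below the given directory.

```python
import shutil
from pathlib import Path


def install_dictionary(unzipped_dir, resources_dir="resources") -> Path:
    """Copy the single system_*.dic found under unzipped_dir to resources_dir/system.dic.

    Hypothetical helper: the archive layout is an assumption, so the search is recursive.
    """
    candidates = sorted(Path(unzipped_dir).rglob("system_*.dic"))
    if not candidates:
        raise FileNotFoundError(f"no system_*.dic found under {unzipped_dir}")
    if len(candidates) > 1:
        raise ValueError(f"expected one system_*.dic, found {len(candidates)}")

    # resources/system.dic is the location the default settings file expects.
    target = Path(resources_dir) / "system.dic"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(candidates[0], target)
    return target
```

Calling `install_dictionary("path/to/unzipped")` from the repository root would then leave the dictionary at resources/system.dic.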

Convenience Script

Optionally, you can use the shell script to download a dictionary and install it to resources/system.dic.

$ ./

3. Build

$ cargo build --release

Build (bake dictionary into binary)

This was un-implemented and does not work currently, see #35

Specify the bake_dictionary feature to embed a dictionary into the binary. The sudachi executable will then contain the dictionary binary. The baked dictionary is used if none is specified via a CLI option or the settings file.

You must specify the path to the dictionary file in the SUDACHI_DICT_PATH environment variable when building. SUDACHI_DICT_PATH is relative to the directory (or absolute).

Example on Unix-like system:

# Download dictionary to resources/system.dic
$ ./

# Build with bake_dictionary feature (relative path)
$ env SUDACHI_DICT_PATH=resources/system.dic cargo build --release --features bake_dictionary

# or

# Build with bake_dictionary feature (absolute path)
$ env SUDACHI_DICT_PATH=/path/to/my-sudachi.dic cargo build --release --features bake_dictionary

4. Install

$ cargo install --path sudachi-cli/

$ which sudachi

$ sudachi -h
sudachi 0.6.0
A Japanese tokenizer

Usage as a command

$ sudachi -h
A Japanese tokenizer

Usage: sudachi [OPTIONS] [FILE] [COMMAND]

Commands:
  build
          Builds system dictionary
  ubuild
          Builds user dictionary
  help
          Print this message or the help of the given subcommand(s)

Arguments:
  [FILE]
          Input text file: If not present, read from STDIN

Options:
  -r, --config-file <CONFIG_FILE>
          Path to the setting file in JSON format
  -p, --resource_dir <RESOURCE_DIR>
          Path to the root directory of resources
  -m, --mode <MODE>
          Split unit: "A" (short), "B" (middle), or "C" (Named Entity) [default: C]
  -o, --output <OUTPUT_FILE>
          Output text file: If not present, use stdout
  -a, --all
          Prints all fields
  -w, --wakati
          Outputs only surface form
  -d, --debug
          Debug mode: Print the debug information
  -l, --dict <DICTIONARY_PATH>
          Path to sudachi dictionary. If None, it refer config and then baked dictionary
      --split-sentences <SPLIT_SENTENCES>
          How to split sentences [default: yes]
  -h, --help
          Print help (see more with '--help')
  -V, --version
          Print version
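When calling the command from a script, the options above can be assembled programmatically. The following is a minimal sketch that only builds the argument list; the helper name `build_sudachi_argv` and its parameters are hypothetical, and it uses only flags that appear in the help text above.

```python
import shlex
from typing import List, Optional

VALID_MODES = ("A", "B", "C")


def build_sudachi_argv(
    mode: str = "C",
    wakati: bool = False,
    all_fields: bool = False,
    dict_path: Optional[str] = None,
    input_file: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the sudachi command (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    argv = ["sudachi", "--mode", mode]
    if wakati:
        argv.append("--wakati")
    if all_fields:
        argv.append("--all")
    if dict_path is not None:
        argv += ["--dict", dict_path]
    if input_file is not None:
        # Without a file argument, sudachi reads from STDIN.
        argv.append(input_file)
    return argv


print(shlex.join(build_sudachi_argv(mode="A", wakati=True)))  # sudachi --mode A --wakati
```

The resulting list can be passed to `subprocess.run(argv, input=text, capture_output=True, text=True)`, assuming the sudachi executable is installed and on PATH.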


Columns are tab separated.

  • Surface
  • Part-of-Speech Tags (comma separated)
  • Normalized Form

When you add the -a (--all) flag, it additionally outputs

  • Dictionary Form
  • Reading Form
  • Dictionary ID
    • 0 for the system dictionary
    • 1 and above for the user dictionaries
    • -1 if a word is Out-of-Vocabulary (not in the dictionary)
  • Synonym group IDs
  • (OOV) if a word is Out-of-Vocabulary (not in the dictionary)

$ echo "外国人参政権" | sudachi -a
外国人参政権    名詞,普通名詞,一般,*,*,*        外国人参政権    外国人参政権    ガイコクジンサンセイケン      0       []

$ echo "阿quei" | sudachi -a
阿      名詞,普通名詞,一般,*,*,*        阿      阿              -1      []      (OOV)
quei    名詞,普通名詞,一般,*,*,*        quei    quei            -1      []      (OOV)
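
The column layout described above lends itself to a small parser. This is a minimal sketch under stated assumptions: the `Morpheme` class and `parse_line` function are hypothetical names, fields are split on single tab characters (so an empty reading form is preserved), and the synonym group IDs are kept as raw text because only the empty form `[]` is shown here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Morpheme:
    surface: str
    pos: List[str]
    normalized_form: str
    # The remaining fields are only present with -a (--all).
    dictionary_form: Optional[str] = None
    reading_form: Optional[str] = None
    dictionary_id: Optional[int] = None
    synonym_group_ids: Optional[str] = None  # raw text, e.g. "[]"
    is_oov: bool = False  # only detectable in -a output


def parse_line(line: str) -> Morpheme:
    """Parse one tab-separated output line into a Morpheme (hypothetical helper)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 tab-separated fields, got {len(fields)}")
    morpheme = Morpheme(
        surface=fields[0],
        pos=fields[1].split(","),
        normalized_form=fields[2],
    )
    if len(fields) >= 7:
        morpheme.dictionary_form = fields[3]
        morpheme.reading_form = fields[4]
        morpheme.dictionary_id = int(fields[5])
        morpheme.synonym_group_ids = fields[6]
        morpheme.is_oov = len(fields) > 7 and fields[7] == "(OOV)"
    return morpheme


example = "阿\t名詞,普通名詞,一般,*,*,*\t阿\t阿\t\t-1\t[]\t(OOV)"
m = parse_line(example)
print(m.surface, m.dictionary_id, m.is_oov)  # 阿 -1 True
```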

When you add the -w (--wakati) flag, it outputs space-delimited surface forms instead.

$ echo "外国人参政権" | sudachi -m A -w
外国 人 参政 権
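
Wakati output is straightforward to consume from a script. The sketch below, with the hypothetical helper name `parse_wakati`, splits each non-empty output line on single spaces; it assumes that surface forms themselves contain no spaces.

```python
from typing import List


def parse_wakati(output: str) -> List[List[str]]:
    """Split wakati output into one token list per non-empty line."""
    return [line.split(" ") for line in output.splitlines() if line.strip()]


sentences = parse_wakati("外国 人 参政 権\n")
print(sentences)  # [['外国', '人', '参政', '権']]
```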


  • Out of Vocabulary handling
  • Easy dictionary file install & management, similar to SudachiPy
  • Registration to



Morphological Analyzers in Rust