MijinkoSD/kuromoji.ts

kuromoji.ts

kuromoji.ts is a TypeScript port of the JavaScript library kuromoji.js.

You can see the actual behavior on the demo page.

README.md in other languages:

Directory

demo/         -- Demo page
dict/         -- Dictionary data for tokenizer (compressed with gzip)
dist/         -- Transpiled JavaScript source code
docs/         -- Image data for README.md
src/          -- TypeScript source code
test/         -- Unit tests

Usage

Please refer to Usage.

API

The tokenize() function returns a JSON-like array of token objects like this:

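[ {
    word_id: 509800,          // ID of the word in the dictionary
    word_type: 'KNOWN',       // Word type (KNOWN if the word is in the dictionary, UNKNOWN otherwise)
    word_position: 1,         // Start position of the word
    surface_form: '黒文字',    // Surface form
    pos: '名詞',               // Part of speech
    pos_detail_1: '一般',      // Part-of-speech subcategory 1
    pos_detail_2: '*',        // Part-of-speech subcategory 2
    pos_detail_3: '*',        // Part-of-speech subcategory 3
    conjugated_type: '*',     // Conjugation type
    conjugated_form: '*',     // Conjugation form
    basic_form: '黒文字',      // Base form
    reading: 'クロモジ',       // Reading
    pronunciation: 'クロモジ'  // Pronunciation
  } ]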

This format is defined in src/util/IpadicFormatter.ts.
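
As a rough illustration of working with this output, the sketch below models the token shape as a TypeScript interface and applies two small helpers to the sample above. The names IpadicToken, nounSurfaces, and toReading are hypothetical and are not part of the library. Treating reading and pronunciation as optional is also an assumption (unknown words may have no dictionary reading); check src/util/IpadicFormatter.ts for the actual types.

```typescript
// Hypothetical type: field names follow the sample tokenize() output above.
interface IpadicToken {
  word_id: number;
  word_type: "KNOWN" | "UNKNOWN";
  word_position: number;
  surface_form: string;
  pos: string;
  pos_detail_1: string;
  pos_detail_2: string;
  pos_detail_3: string;
  conjugated_type: string;
  conjugated_form: string;
  basic_form: string;
  reading?: string;        // Assumption: may be missing for unknown words
  pronunciation?: string;  // Assumption: may be missing for unknown words
}

// Hypothetical helper: collect the surface forms of all nouns (pos === '名詞').
function nounSurfaces(tokens: IpadicToken[]): string[] {
  return tokens.filter((t) => t.pos === "名詞").map((t) => t.surface_form);
}

// Hypothetical helper: join the readings, falling back to the surface form
// when a token has no reading.
function toReading(tokens: IpadicToken[]): string {
  return tokens.map((t) => t.reading ?? t.surface_form).join("");
}

// The sample token from the API section, used in place of a real tokenize() call.
const sample: IpadicToken[] = [
  {
    word_id: 509800,
    word_type: "KNOWN",
    word_position: 1,
    surface_form: "黒文字",
    pos: "名詞",
    pos_detail_1: "一般",
    pos_detail_2: "*",
    pos_detail_3: "*",
    conjugated_type: "*",
    conjugated_form: "*",
    basic_form: "黒文字",
    reading: "クロモジ",
    pronunciation: "クロモジ",
  },
];

console.log(nounSurfaces(sample)); // [ '黒文字' ]
console.log(toReading(sample));    // クロモジ
```

In real use, the array returned by tokenize() would take the place of sample.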