Skip to content

A Clojure library for Japanese morphological analyzer MeCab

Notifications You must be signed in to change notification settings

Jah524/clojure-mecab

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

clojure-mecab

A Clojure library for Japanese morphological analyzer MeCab.

Prerequisites

Install Mecab

  • Try below example to check whether mecab has been correctly installed
$ echo すもももももももものうち | mecab
すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS

Add dependency to your project.clj

[jah524/clojure-mecab "0.3.0"]

Usage

(use 'clojure-mecab)

(parse "すもももももももものうち")
;=> [["すもも" "名詞" "一般" "*" "*" "*" "*" "すもも" "スモモ" "スモモ"]
;    ["も" "助詞" "係助詞" "*" "*" "*" "*" "も" "モ" "モ"]
;    ["もも" "名詞" "一般" "*" "*" "*" "*" "もも" "モモ" "モモ"]
;    ["も" "助詞" "係助詞" "*" "*" "*" "*" "も" "モ" "モ"]
;    ["もも" "名詞" "一般" "*" "*" "*" "*" "もも" "モモ" "モモ"]
;    ["の" "助詞" "連体化" "*" "*" "*" "*" "の" "ノ" "ノ"]
;    ["うち" "名詞" "非自立" "副詞可能" "*" "*" "*" "うち" "ウチ" "ウチ"]]


(extract-words "すもももももももものうち")
;=> ["すもも" "も" "もも" "も" "もも" "の" "うち"]

(extract-words "すもももももももものうち" ["名詞"] [])
;=> ["すもも" "もも" "もも" "うち"]

(extract-words "すもももももももものうち" [] ["名詞"])
;=> ["も" "も" "の"]

(extract-words "すもももももももものうち" ["名詞"] ["非自立"])
;=> ["すもも" "もも" "もも"]

Note

This library uses clojure.java.shell/sh to access mecab so that you do not need Java JNI bindings. If you want to deploy your application to Saas such as Heroku, you better use kuromoji (Java implementation) instead. But mecab is much faster than kuromoji, so you should use mecab when you process massive data.

License

Copyright © 2018 Jah524

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.

About

A Clojure library for Japanese morphological analyzer MeCab

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published