NoUnique/pymecab-ko

pymecab-ko

This is a Python wrapper for the MeCab-ko morphological analyzer for Korean text. It works with Python 3.6 and greater.

There are several Python bindings and wrappers for MeCab-ko, but they are generally not well maintained.
I built this one to stand on the shoulders of giants (well-maintained open-source projects such as MeCab, mecab-ko, and mecab-python3) with minimal modifications.
I initially named it mecab-ko-python3, because the package I used as a reference during development was mecab-python3.
It may seem a little arrogant, but to reduce confusion on PyPI, the package name was changed to 'mecab-ko'.
(The repository is named 'pymecab-ko' to distinguish it from the original mecab-ko.)

Note: If you are using macOS Big Sur, you'll need to upgrade pip to version 20.3 or higher to use wheels, due to a pip issue.

Issues do not need to be written in English.

Note that Windows wheels require a Microsoft Visual C++ Redistributable, so be sure to install that.

Basic usage

>>> import mecab_ko as MeCab
>>> tagger = MeCab.Tagger("-Owakati")
>>> tagger.parse("아버지가방에들어가신다").split()
['아버지', '가', '방', '에', '들어가', '신다']

>>> tagger = MeCab.Tagger()
>>> print(tagger.parse("아버지가방에들어가신다"))
아버지  NNG,*,F,아버지,*,*,*,*
가      JKS,*,F,가,*,*,*,*
방      NNG,*,T,방,*,*,*,*
에      JKB,*,F,에,*,*,*,*
들어가  VV,*,F,들어가,*,*,*,*
신다    EP+EC,*,F,신다,Inflect,EP,EC,시/EP/*+ㄴ다/EC/*
EOS

The API for pymecab-ko closely follows the API for MeCab itself, even when this makes it not very "Pythonic." Please consult the official MeCab documentation for more information.
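
Because `parse` returns a single plain string, it can be convenient to split it into structured tokens. The following is a minimal sketch, not part of pymecab-ko: the helper name `split_parse_output` is hypothetical, and it assumes MeCab's default layout of one token per line, with the surface form and the comma-separated features separated by a tab, and a final `EOS` line.

```python
from typing import List, Tuple


def split_parse_output(parsed: str) -> List[Tuple[str, List[str]]]:
    """Split default-format MeCab output into (surface, features) pairs.

    Hypothetical helper, not part of pymecab-ko. Assumes each token line is
    <surface><TAB><comma-separated features> and the sentence ends with EOS.
    """
    tokens = []
    for line in parsed.splitlines():
        if line == "EOS":
            break  # end of the sentence
        if not line:
            continue  # skip blank lines
        surface, _, features = line.partition("\t")
        tokens.append((surface, features.split(",")))
    return tokens


# Sample text in the assumed layout; with the package installed, this string
# would instead come from MeCab.Tagger().parse(...).
sample = "방\tNNG,*,T,방,*,*,*,*\n에\tJKB,*,F,에,*,*,*,*\nEOS\n"
for surface, features in split_parse_output(sample):
    print(surface, features[0])  # surface form and its first feature (the tag)
```

The sketch splits features on every comma, so it would need adjusting for a dictionary whose feature values can themselves contain commas.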

Installation

Binary wheels are available for macOS, Linux, and Windows (64-bit), and are installed by default when you use pip:

pip install mecab-ko

These wheels include a copy of the MeCab-ko library and a dictionary. MeCab-ko has its own dedicated dictionary, mecab-ko-dic, which is installed automatically when you install pymecab-ko.

To build from source using pip:

pip install --no-binary :all: mecab-ko
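
After installing, a quick way to confirm that the module is visible to the current interpreter is sketched below. This is only an illustration: the function name `mecab_ko_installed` is hypothetical, and the check uses the standard library's `importlib.util.find_spec` rather than anything provided by pymecab-ko. It looks up the import name `mecab_ko` used in the basic usage example, not the PyPI name `mecab-ko`.

```python
import importlib.util


def mecab_ko_installed() -> bool:
    """Return True if the mecab_ko module can be located, without importing it."""
    return importlib.util.find_spec("mecab_ko") is not None


if mecab_ko_installed():
    print("mecab_ko is available")
else:
    print("mecab_ko not found; try: pip install mecab-ko")
```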

Dictionaries

To use MeCab-ko, you must install a dictionary. There are two dictionaries available for MeCab-ko.
These packages, which include slight modifications for ease of use, are recommended:

Common Issues

If you get a RuntimeError when you try to run MeCab, here are some things to check:

Windows Redistributable

You must install the Microsoft Visual C++ Redistributable to use this package on Windows.

Specifying a mecabrc

If you get this error:

error message: [ifs] no such file or directory: /usr/local/etc/mecabrc

You need to specify a mecabrc file. It's OK to specify an empty file; it just has to exist. You can specify a mecabrc with -r. This may be necessary on Debian or Ubuntu, where the mecabrc is in /etc/mecabrc.

You can specify an empty mecabrc like this:

tagger = MeCab.Tagger('-r/dev/null -d/home/hoge/mydic')
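
Where `/dev/null` is not available, one option is to create an empty file and pass its path instead. The following is a rough sketch under that assumption: the helper `tagger_args_with_empty_rc` is hypothetical and not part of pymecab-ko, and the dictionary path is the placeholder from the example above. It only builds the argument string; the call to `MeCab.Tagger` is left as a comment.

```python
import os
import tempfile


def tagger_args_with_empty_rc(dicdir: str) -> str:
    """Create an empty mecabrc file and return a Tagger argument string.

    Hypothetical helper, not part of pymecab-ko. The file only has to exist;
    it is left empty and is not deleted automatically.
    """
    fd, rc_path = tempfile.mkstemp(prefix="mecabrc_")
    os.close(fd)  # close the handle without writing, so the file stays empty
    return "-r{} -d{}".format(rc_path, dicdir)


args = tagger_args_with_empty_rc("/home/hoge/mydic")
print(args)
# With the package installed: tagger = MeCab.Tagger(args)
```

The arguments are joined with a space, so this sketch does not handle paths that contain spaces.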

Using Unsupported Output Modes like -Ochasen

Chasen output is not a built-in feature of MeCab; you must define it in your dicrc or mecabrc. Notably, mecab-ko-dic does not include a Chasen output format. Please see the MeCab documentation.

Alternatives

Licensing

Like MeCab and mecab-python3, pymecab-ko is copyrighted free software by Taku Kudo <taku@chasen.org> and Nippon Telegraph and Telephone Corporation, and is distributed under a 3-clause BSD license (see the file BSD). Alternatively, it may be redistributed under the terms of the GNU General Public License, version 2 (see the file GPL) or the GNU Lesser General Public License, version 2.1 (see the file LGPL).