Specifying Output Formats
Brooke M. Fujita edited this page Mar 9, 2015
Using the output format macros described in Appendix F: Output Formatting together with the --*-format options, you can define your own output formats.
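A format string works as a template: MeCab replaces each macro with a field of the current node and expands escapes such as `\t` and `\n`. This is why the formats are written as raw strings (`r'...'`), so the backslashes reach MeCab unchanged. The toy expander below is a simplified sketch of that idea for the four macros used on this page. It is not MeCab's implementation, and the `expand` function and the `node` dictionary (including its feature string) are hypothetical, made up for illustration.

```python
import re


def expand(template, node):
    """Toy expansion of %m, %f[N], %h, %s and the \\t / \\n escapes.

    `node` is a hypothetical dict standing in for a MeCab node.
    """
    def repl(m):
        tok = m.group(0)
        if tok == '%m':
            return node['surface']            # morpheme (surface form)
        if tok == '%h':
            return str(node['posid'])         # POS ID
        if tok == '%s':
            return str(node['stat'])          # node stat (1 = unknown)
        if tok.startswith('%f['):
            # N-th comma-separated field of the feature string
            return node['feature'].split(',')[int(m.group(1))]
        if tok == r'\t':
            return '\t'
        if tok == r'\n':
            return '\n'
        return tok

    # Literal text such as '?', '(', ')' and ',' is left untouched.
    return re.sub(r'%f\[(\d+)\]|%[mhs]|\\[tn]', repl, template)


# Hypothetical node; only the first feature field matters for %f[0].
node = {'surface': 'ブルザエモン', 'feature': '名詞,一般,*,*,*,*,*',
        'posid': 38, 'stat': 1}

print(repr(expand(r'%m?\t%f[0](%h),%s\n', node)))
```

With the unknown-word template this yields `'ブルザエモン?\t名詞(38),1\n'`, matching the corresponding line of the MeCab output shown on this page.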
```python
from natto import MeCab

# OUTPUT DEFINITIONS
# ------------------
# node-format -- node exists in dictionary
# bos-format  -- beginning-of-sentence node
# eos-format  -- end-of-sentence node
# unk-format  -- node does not exist in dictionary
#
# MACROS USED
# ------------------
# %m    -- morpheme (surface form)
# %f[0] -- part-of-speech (first feature field)
# %h    -- POS ID
# %s    -- node stat (0 = normal, 1 = unknown)
fmt = {'node_format': r'%m\t%f[0](%h),%s\n',
       'bos_format':  r'BOS>\n',
       'eos_format':  r'EOS>\n',
       'unk_format':  r'%m?\t%f[0](%h),%s\n'}

nm = MeCab(options=fmt)

# ブルザエモン is not a real word, so it has stat 1 (unknown)
print(nm.parse('ブルザエモンは何者ですか?'))
```
```
BOS>
ブルザエモン? 名詞(38),1
は 助詞(16),0
何者 名詞(38),0
です 助動詞(25),0
か 助詞(22),0
? 記号(4),0
EOS>
```
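One practical reason to define a custom format is to make the output easy to consume in code. The sketch below reads lines written with the formats from the example back into records. It assumes the separator is the tab requested by `\t` (rendered as a space in the output above); the `Token` type and `parse_output` function are hypothetical helpers, not part of natto-py.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    surface: str
    pos: str
    posid: int
    stat: int


# One line of node_format / unk_format output: surface<TAB>POS(ID),stat
LINE_RE = re.compile(
    r'^(?P<surface>.*)\t(?P<pos>[^\t(]+)\((?P<posid>\d+)\),(?P<stat>\d+)$')


def parse_output(text: str) -> List[Token]:
    tokens = []
    for line in text.splitlines():
        if not line or line in ('BOS>', 'EOS>'):
            continue  # sentence markers written by bos_format / eos_format
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f'unexpected line: {line!r}')
        surface = m['surface']
        stat = int(m['stat'])
        if stat == 1 and surface.endswith('?'):
            surface = surface[:-1]  # drop the '?' marker added by unk_format
        tokens.append(Token(surface, m['pos'], int(m['posid']), stat))
    return tokens
```

For the sentence above, the first record would be `Token('ブルザエモン', '名詞', 38, 1)`. The stat check keeps a genuine `?` token (stat 0) intact while removing the marker from unknown words.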
The example above uses a dictionary to specify the output formats. You can also pass them as MeCab options, in either the long form (e.g. --node-format) or the short form (e.g. -F). Please refer to the descriptions of the --node-format, --bos-format, --eos-format, and --unk-format options for more details.
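Each key of the format dictionary corresponds to one MeCab option: `node_format` to `--node-format` / `-F`, `unk_format` to `--unk-format` / `-U`, `bos_format` to `--bos-format` / `-B`, and `eos_format` to `--eos-format` / `-E`. The hypothetical helper `to_cli_options` below sketches that mapping by turning the same dictionary into long or short option strings.

```python
from typing import Dict, List

# Format dictionary keys and their MeCab (long, short) option names.
FLAGS = {
    'node_format': ('--node-format', '-F'),
    'unk_format': ('--unk-format', '-U'),
    'bos_format': ('--bos-format', '-B'),
    'eos_format': ('--eos-format', '-E'),
}


def to_cli_options(fmt: Dict[str, str], short: bool = False) -> List[str]:
    """Hypothetical helper: build MeCab-style option strings from a dict."""
    opts = []
    for key, value in fmt.items():
        long_flag, short_flag = FLAGS[key]
        # Short options take the value attached (-F...); long ones use '='.
        opts.append(f'{short_flag}{value}' if short else f'{long_flag}={value}')
    return opts


fmt = {'node_format': r'%m\t%f[0](%h),%s\n', 'bos_format': r'BOS>\n'}
print(to_cli_options(fmt))
print(to_cli_options(fmt, short=True))
```

Returning a list rather than one joined string sidesteps shell quoting, for example when the options are handed to the `mecab` executable through `subprocess.run(['mecab', *opts], ...)`.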