
Specifying the Output Format

Brooke M. Fujita edited this page Mar 9, 2015 · 2 revisions


User-defined Output Formatting: --*-format Options

Using the output format macros described in Appendix F: Output Formatting together with the --*-format options, you can define your own output formats.

from natto import MeCab

# OUTPUT DEFINITIONS
# ------------------
# node-format -- node exists in dictionary
# bos-format  -- beginning-of-sentence node
# eos-format  -- end-of-sentence node
# unk-format  -- node does not exist in dictionary
#
# MACROS USED
# ------------------
# %m          -- morpheme
# %f[0]       -- part-of-speech
# %h          -- POS ID
# %s          -- node stat (0: normal, 1: unknown, 2: BOS, 3: EOS)
fmt = {'node_format': r'%m\t%f[0](%h),%s\n',
       'bos_format': r'BOS>\n',
       'eos_format': r'EOS>\n',
       'unk_format': r'%m?\t%f[0](%h),%s\n'}

nm = MeCab(options=fmt)

# ブルザエモン is not a real word, so it is reported with stat 1 (unknown)
print(nm.parse('ブルザエモンは何者ですか?'))
BOS>
ブルザエモン?	名詞(38),1
は	助詞(16),0
何者	名詞(38),0
です	助動詞(25),0
か	助詞(22),0
?	記号(4),0
EOS>

The above example uses a dictionary to specify the output formats. It is also possible to use MeCab's long or short option forms. Please refer to the descriptions of the --node-format, --bos-format, --eos-format, and --unk-format options for more details.
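
As a rough illustration of the long-option form, the dictionary above can be rewritten as a single option string. This is only a sketch: `to_long_options` and `parse_with_long_options` are hypothetical helpers introduced here, not part of natto-py. It assumes that natto-py accepts a whitespace-separated option string and that none of the format values contain spaces.

```python
# Sketch only: to_long_options and parse_with_long_options are hypothetical
# helpers, not part of natto-py.
FORMAT_KEYS = ('node_format', 'bos_format', 'eos_format', 'unk_format')

fmt = {'node_format': r'%m\t%f[0](%h),%s\n',
       'bos_format': r'BOS>\n',
       'eos_format': r'EOS>\n',
       'unk_format': r'%m?\t%f[0](%h),%s\n'}


def to_long_options(formats):
    """Build a '--node-format=... --bos-format=...' style option string."""
    parts = []
    for key in FORMAT_KEYS:
        if key in formats:
            # Dictionary key node_format becomes the long option --node-format
            parts.append('--{}={}'.format(key.replace('_', '-'), formats[key]))
    return ' '.join(parts)


def parse_with_long_options(text):
    """Parse text using the string form of the options.

    Requires natto-py and MeCab; assumes MeCab() accepts an option string.
    """
    from natto import MeCab  # imported lazily so the helper above runs without MeCab
    nm = MeCab(to_long_options(fmt))
    return nm.parse(text)


# Show the option string that corresponds to the dictionary
print(to_long_options(fmt))
```

In MeCab itself, the short forms of these options are -F (--node-format), -B (--bos-format), -E (--eos-format), and -U (--unk-format), so the same string could be written more compactly with those flags.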

