documentation of the parsed results? #91

Closed
YameiW opened this issue Feb 28, 2023 · 2 comments

YameiW commented Feb 28, 2023

Hello there,

I am using MeCab to parse Japanese sentences, but I am confused by the results. Is there any documentation I can read to understand the parsing output?

For instance, what does each column mean, and what do the numbers in the last column mean? Does MeCab provide dependency information that we can use to extract nominal phrases?

Any help would be appreciated!

[Image: Screenshot 2023-02-28 at 1 09 29 PM]

Collaborator

polm commented Mar 6, 2023

The output format depends on your config file and your dictionary.

You seem to be using the full-sized UniDic with accent information (the last column in your output), so you'll need to check your config file against the dictionary format. Alternatively, you could just use fugashi, which parses all UniDic fields into a namedtuple for easy use. See here for an overview of the fields.

MeCab cannot annotate the output with field names because they are not stored anywhere in the config or in the dictionary itself.
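
If you want named fields without an extra library, you can attach the names yourself. The following is a minimal sketch, assuming the common `surface<TAB>comma-separated features` output layout. The names in `FIELD_NAMES` and the sample line are hypothetical placeholders written by hand, not real MeCab output; the actual order and number of fields depend on your dictionary and config file, so check them against your dictionary's documentation.

```python
import csv
from collections import namedtuple

# Hypothetical field names for illustration only. The real order depends on
# your dictionary and config file, so replace these with the fields your
# dictionary actually documents.
FIELD_NAMES = ("pos1", "pos2", "pos3", "pos4", "cType", "cForm", "lForm", "lemma")

Token = namedtuple("Token", ("surface",) + FIELD_NAMES + ("extra",))


def parse_line(line):
    """Parse one 'surface<TAB>f1,f2,...' line into a Token.

    Returns None for blank lines and the 'EOS' sentence marker.
    """
    line = line.rstrip("\n")
    if not line or line == "EOS":
        return None
    surface, _, feature_str = line.partition("\t")
    # The csv module copes with quoted fields that contain commas themselves.
    features = next(csv.reader([feature_str]))
    named = features[:len(FIELD_NAMES)]
    # Short feature lists (for example, unknown words) are padded with None.
    named += [None] * (len(FIELD_NAMES) - len(named))
    # Anything beyond the named fields is kept, unnamed, in 'extra'.
    extra = tuple(features[len(FIELD_NAMES):])
    return Token(surface, *named, extra)


if __name__ == "__main__":
    # Hand-written sample lines, not captured from a real MeCab run.
    sample = [
        '猫\t名詞,普通名詞,一般,*,*,*,ネコ,猫,"1,2"',
        "EOS",
    ]
    for raw in sample:
        token = parse_line(raw)
        if token is not None:
            print(token.surface, token.pos1, token.lemma, token.extra)
```

Fields beyond the ones you name end up in `extra`, so nothing is silently dropped if your dictionary emits more columns than you listed.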

Also, MeCab does not generate any kind of dependency information.

In general, the official MeCab docs may be helpful.

Collaborator

polm commented Mar 6, 2023

Closing this because I believe that answers your question, but if anything is unclear please feel free to follow up.
