None-2018-Development of a Large-Scale Dictionary for the Morphological Analyzer "Sudachi" (形態素解析器『Sudachi』のための大規模辞書開発) #295

Open
BrambleXu opened this issue Dec 12, 2019 · 0 comments
Labels: Dict(M) Dictionary/Lexicon Based Model, JP(P) Japanese NLP Problem


Summary:

Resource:

  • pdf
  • code
  • paper-with-code

Paper information:

  • Author:
  • Dataset:
  • keywords:

Notes:

"We aim to build large-scale, high-quality dictionary data that can be used as a general-purpose dictionary."

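UniDic was developed from the annotated Balanced Corpus of Contemporary Written Japanese (『現代日本語書き言葉均衡コーパス』, BCCWJ). However, UniDic is missing some common proper nouns.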

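"Using UniDic as the base, we registered a large number of proper names from NEologd and built large-scale dictionary data. The dictionary now exceeds 2.8 million registered words, and the supplementary information is steadily being filled in."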
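The merge described in the quote above can be pictured roughly as follows. This is only a minimal sketch of the idea (keep every base entry, add proper nouns from a supplementary list when the base does not already have them); it is not the authors' actual pipeline. The `Entry` type, the `merge_lexicons` function, the `proper_noun` tag, and the sample words are all hypothetical and do not reflect the real UniDic, NEologd, or Sudachi data formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical lexicon entry; real dictionary formats carry far more fields."""
    surface: str  # written form of the word
    pos: str      # coarse part-of-speech tag (illustrative values only)
    source: str   # which lexicon the entry came from


def merge_lexicons(base, extra):
    """Keep every base entry; add proper nouns from `extra` that the base lacks."""
    # Key on (surface, pos) so base entries take priority on collisions.
    merged = {(e.surface, e.pos): e for e in base}
    for e in extra:
        if e.pos != "proper_noun":
            continue  # only proper names are imported from the supplementary list
        merged.setdefault((e.surface, e.pos), e)
    return list(merged.values())


# Toy data standing in for a base lexicon and a supplementary named-entity list.
base = [
    Entry("東京", "proper_noun", "base"),
    Entry("食べる", "verb", "base"),
]
extra = [
    Entry("東京", "proper_noun", "extra"),        # duplicate: base entry wins
    Entry("スカイツリー", "proper_noun", "extra"),  # new proper noun: added
    Entry("走る", "verb", "extra"),               # not a proper noun: skipped
]

merged = merge_lexicons(base, extra)
for entry in merged:
    print(entry.surface, entry.pos, entry.source)
```

Giving the base lexicon priority on collisions is one possible design choice; a real merge would also have to reconcile readings, normalized forms, and other supplementary information.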

XU: This won't do; the volume isn't enough.

Model Graph:

Result:

Thoughts:

Next Reading:

@BrambleXu BrambleXu added the JP(P) Japanese NLP Problem label Dec 12, 2019
@BrambleXu BrambleXu self-assigned this Dec 12, 2019
@BrambleXu BrambleXu added the Dict(M) Dictionary/Lexicon Based Model label Dec 13, 2019