This repository has been archived by the owner on Nov 2, 2020. It is now read-only.

Use the Min Dong (Eastern Min) Wikipedia as vocabulary training data #1

Open
ztl8702 opened this issue Sep 14, 2017 · 1 comment

@ztl8702
Member

ztl8702 commented Sep 14, 2017

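Problem

  • There is currently no vocabulary data, so the input method can only enter one syllable at a time.

Solution

  • Use the romanized article text from cdo.wikipedia.org to automatically generate a word list (without definitions), plus a model for suggesting the next word (n-gram?).

An input method does not seem to need a complete dictionary of (written form, meaning) pairs; enough sample text and the statistics derived from it should suffice. For that purpose, the cdo wiki should already be a sizable data source.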
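A minimal sketch of the idea above, in Python, might look like the following. It assumes the romanized text joins the syllables of one word with hyphens and separates words with spaces or punctuation; that assumption, the function names (`tokenize`, `build_model`, `suggest`), and the toy corpus are all illustrative and not part of this repository. Fetching and cleaning the actual wiki text is left out.

```python
import unicodedata
from collections import Counter, defaultdict


def tokenize(text):
    """Split romanized text into lowercase words, keeping hyphen-joined syllables together."""
    text = unicodedata.normalize("NFC", text).lower()
    words, current = [], []
    for ch in text:
        # Letters (L*) and combining diacritics (M*) belong to a word;
        # a hyphen is kept only inside a word that has already started.
        if unicodedata.category(ch)[0] in "LM" or (ch == "-" and current):
            current.append(ch)
        elif current:
            words.append("".join(current).strip("-"))
            current = []
    if current:
        words.append("".join(current).strip("-"))
    return [w for w in words if w]


def build_model(sentences):
    """Return (word frequencies, bigram counts) for an iterable of sentences."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sentence in sentences:
        words = tokenize(sentence)
        unigrams.update(words)
        # Count each adjacent pair; pairs never cross a sentence boundary.
        for prev, nxt in zip(words, words[1:]):
            bigrams[prev][nxt] += 1
    return unigrams, bigrams


def suggest(bigrams, prev, k=3):
    """Return up to k words most often seen right after `prev`."""
    followers = bigrams.get(prev.lower(), Counter())
    return [word for word, _ in followers.most_common(k)]


# Toy placeholder corpus (NOT real Min Dong text), only to show the data flow.
corpus = ["a-b c-d e", "a-b c-d f", "a-b g"]
word_freq, bigram_counts = build_model(corpus)
print(sorted(word_freq))              # the generated word list
print(suggest(bigram_counts, "a-b"))  # candidate next words after "a-b"
```

The word list is simply the set of keys in `word_freq`, so no definitions are required. A bigram table is the smallest n-gram model; longer contexts or smoothing could be layered on later if the data supports it.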

@ztl8702
Member Author

ztl8702 commented Sep 14, 2017

In addition, statistics generated from users' actual usage could be used to improve the model.
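One possible way to fold usage statistics into such a model is sketched below, again in Python. The class name `AdaptiveBigrams`, the `record_commit` method, and the `user_weight` parameter are hypothetical; the sketch assumes the input method can report which word the user committed after which previous word, and it simply adds weighted counts on top of corpus-derived ones.

```python
from collections import Counter, defaultdict


class AdaptiveBigrams:
    """Bigram counts seeded from a corpus and updated by what the user commits."""

    def __init__(self, base_counts=None, user_weight=2):
        # user_weight is an assumed tuning knob: how many corpus
        # occurrences one real user selection is worth.
        self.user_weight = user_weight
        self.counts = defaultdict(Counter)
        for prev, followers in (base_counts or {}).items():
            self.counts[prev].update(followers)

    def record_commit(self, prev, chosen):
        """Call when the user commits `chosen` right after `prev`."""
        self.counts[prev][chosen] += self.user_weight

    def suggest(self, prev, k=3):
        """Return up to k candidate next words, most frequent first."""
        followers = self.counts.get(prev, Counter())
        return [word for word, _ in followers.most_common(k)]


# Toy placeholder counts (NOT real data), standing in for corpus statistics.
model = AdaptiveBigrams({"a-b": {"c-d": 2, "g": 1}})
print(model.suggest("a-b"))
model.record_commit("a-b", "g")  # the user picked "g" after "a-b"
print(model.suggest("a-b"))
```

Weighting user selections above corpus counts lets a small amount of real usage reorder the candidates quickly; whether that is desirable, and what weight to use, would need to be tuned against actual typing data.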

ztl8702 added the idea label Nov 4, 2017