SG: skip-gram modified so that the target word vector w is summed with the vectors of its constituent characters c1 and c2. This can be regarded as a special case of fastText in which the minimum and maximum n-gram sizes are both set to 1.
SG+kanji: applies the approach of Yu et al. (2017), "Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components", which learns Chinese word embeddings from characters and sub-character components.
SG+kanji+bushu: additionally incorporates the meaning of the radicals (偏旁部首, bushu).
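A rough way to picture these variants is as additive composition: a word's input representation is the sum of vectors for a growing set of units. The sketch below shows only that additive idea. It assumes each unit type has its own embedding table and that the variants differ only in which units get summed. The tables, the toy `SUBCHARS` entries, and the names `lookup`/`compose` are hypothetical. It does not reproduce the training objective of the paper or of Yu et al. (2017).

```python
import random

DIM = 4
_rng = random.Random(0)

# Hypothetical embedding tables, one per unit type, so the word "会" and the
# character "会" get different vectors. A trained model learns these; here
# each entry is randomly initialised on first lookup, just for illustration.
TABLES: dict[str, dict[str, list[float]]] = {"word": {}, "char": {}, "sub": {}}

# Toy sub-character table (illustrative entries only, not the paper's resource).
SUBCHARS = {"社": ["示"], "会": ["人"]}


def lookup(kind: str, unit: str) -> list[float]:
    """Return the vector for `unit` in table `kind`, creating it if needed."""
    table = TABLES[kind]
    if unit not in table:
        table[unit] = [_rng.uniform(-0.5, 0.5) for _ in range(DIM)]
    return table[unit]


def compose(word: str, use_subchars: bool = False) -> list[float]:
    """Sum the word vector with its character (and optionally sub-character) vectors."""
    units = [("word", word)] + [("char", c) for c in word]
    if use_subchars:
        # Hypothetical extension: also add each character's sub-character units.
        units += [("sub", s) for c in word for s in SUBCHARS.get(c, [])]
    total = [0.0] * DIM
    for kind, unit in units:
        total = [t + x for t, x in zip(total, lookup(kind, unit))]
    return total


sg = compose("会社")                        # w + c1 + c2, as in the SG variant
sg_sub = compose("会社", use_subchars=True)  # additionally + sub-character vectors
```

Keeping separate tables for words and characters reflects the note's description of summing *different* vectors w, c1, c2, even when a single-character word and a character share the same string.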
Model Graph:
Result:
Thoughts:
Next Reading:
Summary:
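Subcharacter information is known to help for Chinese, but what about Japanese? The study finds that the gains subcharacter information brings for Chinese do not carry over reliably to Japanese (I suspect this is because Japanese text also contains katakana and hiragana). In settings with a higher proportion of kanji, however, character n-grams do improve results. Even so, in the experiments even the enhanced skip-gram models could not match a fastText model using single-character n-grams.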
Resource:
Paper information:
Notes:
fastText is a subword-level model that learns embeddings for character n-grams.
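To make the character n-gram idea concrete, here is a minimal sketch of fastText-style n-gram extraction. It assumes the behaviour described in the fastText paper: the word is wrapped in "<" and ">" and every substring of length `minn` to `maxn` is taken. The function name `char_ngrams` is hypothetical. The sketch skips fastText's hashing of n-grams into buckets and works on Python code points. It also drops the bare "<" and ">" as 1-grams (as far as I know the reference implementation does the same). So with `minn = maxn = 1` the subwords are exactly the word's characters. Together with the whole-word vector, that gives the w + c1 + c2 composition of the SG variant.

```python
def char_ngrams(word: str, minn: int, maxn: int) -> list[str]:
    """Character n-grams in the style of fastText's subword extraction.

    The word is wrapped in boundary markers "<" and ">"; n-grams of length
    minn..maxn are taken from the wrapped string, except that the lone
    markers "<" and ">" are never emitted as 1-grams.
    """
    wrapped = "<" + word + ">"
    grams = []
    for i in range(len(wrapped)):
        for n in range(minn, maxn + 1):
            j = i + n
            if j > len(wrapped):
                break  # n-gram would run past the end marker
            if n == 1 and (i == 0 or j == len(wrapped)):
                continue  # skip the bare boundary markers
            grams.append(wrapped[i:j])
    return grams


print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
print(char_ngrams("会社", 1, 1))    # ['会', '社']
```

The boundary markers let the model tell prefixes and suffixes apart from word-internal n-grams ("<wh" vs "whe"), which is why they only matter once `maxn` is above 1.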
Model Graph:
Result:
Thoughts:
Next Reading: