
ACL-2018-Subcharacter Information in Japanese Embeddings: When Is It Worth It? #291

Open
BrambleXu opened this issue Dec 4, 2019 · 1 comment
Labels: Embedding Embedding/Pre-train Model/Task, JP(P) Japanese NLP Problem

@BrambleXu (Owner):

Summary:

Subcharacter information is known to help Chinese embeddings; does it also help Japanese? The paper finds that the gains seen for Chinese do not transfer reliably to Japanese (presumably because of hiragana and katakana). In kanji-heavy settings, however, character n-grams do give a consistent improvement. Notably, in the experiments even the enhanced skip-gram models could not beat single-character n-gram fastText.

Resource:

  • pdf
  • code
  • paper-with-code

Paper information:

  • Author:
  • Dataset:
  • keywords:

Notes:


fastText is a subword-level model: it represents a word as the sum of its character n-gram vectors.
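As a minimal sketch of what "character n-grams" means here (a pure-Python illustration; the hypothetical function `char_ngrams` is not from the paper or the fastText library), fastText wraps each word in boundary markers `<` and `>` and extracts all n-grams between a minimum and maximum length:

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of a word, fastText-style.

    The word is wrapped in boundary markers '<' and '>' before
    extracting n-grams, and the full wrapped word is kept as well.
    """
    wrapped = f"<{word}>"
    ngrams = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            ngrams.add(wrapped[i:i + n])
    ngrams.add(wrapped)  # the whole word is also a "subword"
    return ngrams

# With min_n = max_n = 1, the subwords are just the single characters
# (plus boundary markers) -- the character-level special case the
# paper discusses.
print(sorted(char_ngrams("東京", 1, 1)))
```

In gensim's fastText implementation the same knobs are exposed as the `min_n` and `max_n` parameters.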


  • SG (modified): the target word vector w is summed with the vectors of its constituent characters c1 and c2. This can be regarded as a special case of fastText in which both the minimum and maximum n-gram sizes are set to 1.
  • SG+kanji: learns word embeddings from characters and sub-characters, following the approach for Chinese (Yu et al. 2017, Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components)
  • SG+kanji+bushu: additionally adds radical (部首, bushu) information
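The modified-SG idea above can be sketched in a few lines (a toy illustration with made-up vectors; a real model learns the word and character tables jointly during training):

```python
# Hypothetical toy lookup tables -- NOT trained values.
word_vecs = {"東京": [0.2, -0.1, 0.4]}
char_vecs = {
    "東": [0.1, 0.0, 0.1],
    "京": [0.0, 0.2, -0.1],
}

def sg_char_vector(word):
    """Target-word representation in the modified SG model:
    the word vector summed with the vectors of its constituent
    characters (equivalent to fastText with min_n = max_n = 1)."""
    vec = list(word_vecs[word])
    for ch in word:
        vec = [v + c for v, c in zip(vec, char_vecs[ch])]
    return vec

print(sg_char_vector("東京"))
```

SG+kanji and SG+kanji+bushu extend the same summation to sub-character components and radicals, respectively.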

Model Graph:

Result:

Thoughts:

Next Reading:

BrambleXu self-assigned this and added the labels on Dec 4, 2019.
@Crescentz:

Is the code open-sourced?
