Bad case caused by no-consonant char #11

Open
freshbirdDD opened this issue Jun 2, 2020 · 4 comments

@freshbirdDD

In your trained consonantMap_TwoDCode, the no-consonant case maps to (99999.0, 99999.0), so chars without an initial consonant, such as "我" and "一", are never similar to any char that has a consonant (e.g. "过", "鸡"). Does this make sense?

@kunqian-58

@marina-danilevsky would you please take a look at this and possibly answer it? I only translated the trained model; the algorithm part is a black box to me.

@marinadanilevsky

I'm sorry, I'm not at all sure how to answer this (I do not know Chinese, and my co-author who does, and did some of this implementation originally, left the company some time ago). Could you possibly be more specific? I can't quite tell if this is a bug in the mapping generally, or something you're observing with specific input.

@freshbirdDD

freshbirdDD commented Jul 29, 2020

>>> import dimsim
>>> dimsim.get_distance("我", "火")
67339.46237343867
>>> dimsim.get_distance("果", "火")
1.1904761904761905
@marinadanilevsky In the above example, "我", "果" and "火" have similar pronunciations but get very different distances in dimsim. This is because "我" has no consonant, which maps to (99999.0, 99999.0), whereas "火" and "果" have explicit consonants, which map to (7.0, 3.0) and (7.0, 0.5) respectively.
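
To illustrate the effect, here is a minimal sketch that compares only the consonant coordinates quoted above using a plain Euclidean distance. This is an assumption made for illustration, not dimsim's actual formula (which evidently combines more terms, since the numbers differ from the get_distance output above); the `CONSONANT_COORDS` dictionary and `consonant_gap` helper are hypothetical names, not part of the library.

```python
import math

# Coordinates as quoted in this thread. The dictionary is a hypothetical
# stand-in for the relevant entries of consonantMap_TwoDCode.
CONSONANT_COORDS = {
    "我": (99999.0, 99999.0),  # no initial consonant: sentinel value
    "火": (7.0, 3.0),
    "果": (7.0, 0.5),
}


def consonant_gap(a, b):
    """Euclidean distance between the consonant coordinates of two chars.

    Illustrative only: this is NOT the formula dimsim uses internally.
    """
    ax, ay = CONSONANT_COORDS[a]
    bx, by = CONSONANT_COORDS[b]
    return math.hypot(ax - bx, ay - by)


print(consonant_gap("果", "火"))  # small: both coordinates lie close together
print(consonant_gap("我", "火"))  # huge: the sentinel dominates the result
```

Whatever the exact formula, any distance that grows with the gap between these coordinates will be dominated by the 99999.0 sentinel, which is why "我" ends up far from every char that has a consonant.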

@liuyijiang1994


Modify lines 58 and 64 in the pinyinRewrite method of utils/pinyin.py in dimsim: simply comment out self.consonant = "" on those lines (i.e. # self.consonant = "").
