Add fix for three-character Chinese words: the second character's tone should not be 5 #2369
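Words like 老头子 are segmented into 老头 and 子. The 子 has already been assigned the neutral tone. After the split, 老头 is itself in the "neural must" list (words that must take the neutral tone), so 头 gets the neutral tone as well. The word therefore ends up with two tone 5 syllables, which is what this fix corrects. In testing after the change, WER went up, but the increase is entirely due to labeling errors.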
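The double neutral tone described above can be reproduced with a toy model. This is only a sketch under stated assumptions, not the project's implementation: `MUST_NEUTRAL`, `neutralize_last`, `apply_neutral`, `sandhi`, the fixed two-way split, and the placement of the guard are all hypothetical names and choices made for illustration. Finals are written with a trailing tone digit, with 5 meaning the neutral tone.

```python
from typing import List

# Hypothetical stand-in for a "must be neutral" word list; not the project's data.
MUST_NEUTRAL = {"老头"}


def neutralize_last(finals: List[str]) -> List[str]:
    """Return a copy with the last final's tone digit replaced by 5 (neutral)."""
    return finals[:-1] + [finals[-1][:-1] + "5"]


def apply_neutral(word: str, finals: List[str]) -> List[str]:
    """Toy rule: a trailing 子 or a listed word gets a neutral last syllable."""
    if word.endswith("子") or word in MUST_NEUTRAL:
        return neutralize_last(finals)
    return finals


def sandhi(word: str, finals: List[str], guard: bool) -> List[str]:
    """Neutralize the whole word, then each sub-word of a fixed toy split."""
    finals = apply_neutral(word, finals)
    if guard and len(word) == 3 and word[-1:] == "子":
        # Assumed guard: skip the sub-word pass so the second char keeps its tone.
        return finals
    head, tail = word[:-1], word[-1:]
    return apply_neutral(head, finals[:-1]) + apply_neutral(tail, finals[-1:])


finals = ["ao3", "ou2", "i3"]  # toy finals for 老头子
print(sandhi("老头子", finals, guard=False))  # two tone 5 syllables
print(sandhi("老头子", finals, guard=True))   # only the last syllable is tone 5
```

Without the guard, the sub-word 老头 matches the toy list and 头 is neutralized on top of 子. With the guard, only the final syllable carries tone 5.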
@@ -156,6 +156,9 @@ def _yi_sandhi(self, word: str, finals: List[str]) -> List[str]:
        return finals

    def _split_word(self, word: str) -> List[str]:
        if len(word) == 3 and word[-1:] == '子':  # three chars, like 老头子; the second char's tone should not be 5
Shouldn't this be word[-1]?
>>> a = "abc"
>>> a[-1]
'c'
>>> a[-1:]
'c'

No difference.
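A small check of the point made in this exchange: for any non-empty string, indexing with `[-1]` and slicing with `[-1:]` return the same one-character string, so the condition in the diff behaves identically either way because it is guarded by `len(word) == 3`. The two forms differ only on an empty string. The helper names `last_by_index` and `last_by_slice` are made up for this sketch.

```python
def last_by_index(s: str) -> str:
    """Last character via indexing; raises IndexError on an empty string."""
    return s[-1]


def last_by_slice(s: str) -> str:
    """Last character via slicing; returns '' on an empty string."""
    return s[-1:]


word = "老头子"
# Same result for a non-empty string, which is the case in the diff.
assert last_by_index(word) == last_by_slice(word) == "子"

# The only divergence is the empty string.
print(repr(last_by_slice("")))
try:
    last_by_index("")
except IndexError as exc:
    print("IndexError:", exc)
```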
You can use compare_base.py to find the bad cases.
Fixed in #2370.