add fix for 3 chinese chars, and 2nd char tone should not be 5 #2369

Closed
wants to merge 4 commits into from

david-95 (Contributor) commented Sep 9, 2022


mergify bot added the T2S label Sep 9, 2022

david-95 (Contributor Author) commented Sep 9, 2022

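Words like 老头子 get segmented into 老头 and 子. The 子 has already been given the neutral tone, and after the split 老头 is itself one of the "neural must" (must-neutral-tone) words, so 头 is given the neutral tone as well. The result is two tone-5 syllables in one word, and this fix mainly corrects that. In tests after the change the WER goes up, but the increase is caused entirely by annotation errors.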
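The double neutral tone described in this comment can be reproduced with a toy model. This is a minimal sketch, not the project's actual code: `MUST_NEUTRAL_WORDS`, `to_neutral`, `neutral_sandhi`, and `sandhi_after_split` are hypothetical names, the word list is a two-entry stand-in, and the tone-numbered finals are illustrative values.

```python
from typing import List

# Hypothetical stand-in for the must-neutral-tone word list discussed above.
MUST_NEUTRAL_WORDS = {"老头", "老太"}


def to_neutral(final: str) -> str:
    """Replace the trailing tone digit with 5 (neutral tone)."""
    return final[:-1] + "5"


def neutral_sandhi(word: str, finals: List[str]) -> List[str]:
    """Toy rule: neutralize the last syllable of 子-final or listed words."""
    finals = list(finals)
    if word.endswith("子") or word in MUST_NEUTRAL_WORDS:
        finals[-1] = to_neutral(finals[-1])
    return finals


def sandhi_after_split(word: str, finals: List[str], split_at: int) -> List[str]:
    """Apply the toy rule to the whole word, then again to each sub-word."""
    finals = neutral_sandhi(word, finals)
    left, right = word[:split_at], word[split_at:]
    return (neutral_sandhi(left, finals[:split_at]) +
            neutral_sandhi(right, finals[split_at:]))


# 老头子 segmented as 老头 + 子; finals are illustrative tone-numbered strings.
result = sandhi_after_split("老头子", ["ao3", "ou2", "i3"], split_at=2)
print(result)  # ['ao3', 'ou5', 'i5']
```

Both 头 and 子 end up with tone 5, which is the behavior the PR sets out to prevent.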

@@ -156,6 +156,9 @@ def _yi_sandhi(self, word: str, finals: List[str]) -> List[str]:
        return finals

    def _split_word(self, word: str) -> List[str]:
        if len(word) == 3 and word[-1:] == '子':  # three chars, like 老头子; the second char's tone should not be 5
Collaborator (review comment on this line):

Shouldn't this be word[-1]?

david-95 (Contributor Author) replied Sep 9, 2022:

>>> a = "abc"
>>> a[-1]
'c'
>>> a[-1:]
'c'

No difference.
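The two spellings agree for any non-empty string, which is the case here because the `len(word) == 3` check runs first. They differ only on an empty string, as this short sketch using standard Python indexing and slicing shows (the variable names are illustrative):

```python
word = "老头子"

# For a non-empty string, indexing and slicing the last position agree.
same_for_nonempty = word[-1] == word[-1:] == "子"

# For an empty string they diverge: slicing returns '', indexing raises.
empty = ""
slice_result = empty[-1:]
try:
    empty[-1]
    index_raised = False
except IndexError:
    index_raised = True

print(same_for_nonempty, repr(slice_result), index_raised)  # True '' True
```

So the choice between `word[-1]` and `word[-1:]` in the guarded condition is a matter of style rather than correctness.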

david-95 (Contributor Author):

You can use compare_base.py to get the bad cases.

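yt605155624 (Collaborator) commented Sep 9, 2022

[image]

I think it would be better to simply delete "老头" and "老太" from the word list. I found this dictionary online and never checked it entry by entry, and I don't think these words should take the neutral tone. Also, remember to run pre-commit to fix the code formatting.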
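The effect of this suggestion can be sketched by comparing a word list with and without the two entries. This is an illustrative toy, not the project's code: `neutralize_parts` is a hypothetical helper, the three-entry `word_list` is an assumed stand-in for the real dictionary, and the finals are made-up tone-numbered strings.

```python
from typing import List, Set


def neutralize_parts(parts: List[str],
                     finals_per_part: List[List[str]],
                     must_neutral: Set[str]) -> List[str]:
    """Give the last syllable of each part tone 5 when the part ends in 子
    or appears in the must-neutral set; return the flattened finals."""
    out: List[str] = []
    for part, finals in zip(parts, finals_per_part):
        finals = list(finals)
        if part.endswith("子") or part in must_neutral:
            finals[-1] = finals[-1][:-1] + "5"
        out.extend(finals)
    return out


# Assumed sample of the dictionary; the real one is far larger.
word_list = {"老头", "老太", "东西"}
trimmed = word_list - {"老头", "老太"}

parts = ["老头", "子"]
finals = [["ao3", "ou2"], ["i3"]]

before = neutralize_parts(parts, finals, word_list)
after = neutralize_parts(parts, finals, trimmed)
print(before)  # ['ao3', 'ou5', 'i5']
print(after)   # ['ao3', 'ou2', 'i5']
```

With the two entries removed, only the final 子 is neutralized and no special case for three-character words is needed.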

yt605155624 (Collaborator), following up on the comment above:

Fixed in #2370.

yt605155624 closed this Sep 9, 2022
david-95 deleted the hongliang0909-2 branch September 9, 2022 10:23