
Generated punctuation mixes Chinese and Western marks #77

Closed
riverscn opened this issue Mar 15, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@riverscn

This severely degrades the quality of Chinese output…
(screenshot)
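As a stopgap until the model itself emits the right marks, the mixed punctuation can be repaired after decoding. The following is a minimal sketch, assuming the generated text is available as a plain Python string. normalize_punctuation and its regular expressions are hypothetical and are not part of tokenization_chatglm.py. The sketch handles only the half-width , : ; ( ) reported in this thread and only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
import re

# Assumption: only the marks reported in this thread are handled.
SEPARATORS = {",": "，", ":": "：", ";": "；"}
CJK = "\u4e00-\u9fff"  # basic CJK Unified Ideographs block only

# A separator is converted when a Chinese character sits directly before or after it.
_SEP_RE = re.compile(rf"(?<=[{CJK}])[,:;]|[,:;](?=[{CJK}])")
# Parentheses are converted as a pair, and only when the enclosed text contains Chinese.
_PAREN_RE = re.compile(rf"\(([^()]*[{CJK}][^()]*)\)")


def normalize_punctuation(text: str) -> str:
    """Hypothetical post-processing step: turn half-width , : ; ( ) into
    full-width marks in Chinese context and leave English text untouched."""
    text = _SEP_RE.sub(lambda m: SEPARATORS[m.group(0)], text)
    return _PAREN_RE.sub(r"（\1）", text)


print(normalize_punctuation("你好,世界;结果(很好)"))  # 你好，世界；结果（很好）
```

Separators are converted only when they touch a Chinese character, so values such as 1,000 or 12:30 in the same text stay as they are. Parentheses are rewritten as a pair, which avoids leaving a full-width （ next to a half-width ). In nested parentheses, only the innermost pair is converted.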

@duzx16 duzx16 added the enhancement New feature or request label Mar 15, 2023
@duzx16
Member

duzx16 commented Mar 19, 2023

Fixed. Please use the latest model implementation on the Hugging Face Hub.

@duzx16 duzx16 closed this as completed Mar 19, 2023
@lixiang1991

lixiang1991 commented Mar 21, 2023

How was this fixed?
I ran the latest [tokenization_chatglm.py], and the Chinese , : ; ( ) are still all output as English (half-width) symbols.
(two screenshots)

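I looked through the change history on the Hugging Face Hub, and only two lines were changed: ids = [_id for _id in ids if _id >= 0] and self.vocab_files_names["vocab_file"]. I also compared the checksum of ice_text.model with that of the copy I downloaded a week ago. Both are 9733ffa2cf070e0b78718747ea6a32a7cfc151b9.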
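The checksum comparison in this comment can be reproduced locally with Python's standard hashlib module. The following is a minimal sketch; digest_file and same_file are hypothetical helper names, and the file paths are up to the reader. MD5 digests are 32 hex digits and SHA-1 digests are 40, so the algorithm has to match the kind of value being compared against.

```python
import hashlib


def digest_file(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)  # e.g. "md5" (32 hex digits) or "sha1" (40 hex digits)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_file(path_a, path_b, algorithm="sha1"):
    """True if both files have the same digest under the chosen algorithm."""
    return digest_file(path_a, algorithm) == digest_file(path_b, algorithm)
```

For example, calling same_file on the old and new copies of ice_text.model returns True when the two digests match. A match confirms that the vocabulary file itself was not replaced between downloads.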

@riverscn
Author

I'm using the latest model from the Hugging Face Hub, and the punctuation is still wrong.
