Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chat-v2版本偶尔出现回复中包含英文单词。 #5

Open
openmynet opened this issue Apr 29, 2024 · 2 comments
Open

chat-v2版本偶尔出现回复中包含英文单词。 #5

openmynet opened this issue Apr 29, 2024 · 2 comments

Comments

@openmynet
Copy link

Q: 张敏的爸爸有三个孩子,其中大儿子叫大明,小儿子叫小明,那么张敏的爸爸的二儿子叫什么?
A: 张敏的爸爸的二儿子就是张敏本人。因为题目中提到的是张敏的爸爸有三个孩子,并且提到了大儿子和小儿子的名字是大明和小明,那么剩下一个孩子就是张敏 herself。

如果使用 LooksJuicy/ruozhiba 的数据进行训练,建议对该数据集进行清洗,里面有些回答是片面,错误的。

@Shenzhi-Wang
Copy link
Owner

感谢您的反馈和建议!我们训练数据集中并没有包含弱智吧的数据集。

但我们在测试过程中也发现偶尔会出现这种情况,一种可能的解决方法是在system prompt中加入一句:

"If the prompt doesn't specify a language, you should reply in the same language as the prompt."

在我的测试中,您所用的prompt应该就不会出现夹带英文的情况了。

@Shenzhi-Wang
Copy link
Owner

可以尝试一下我们的v2.1的版本,这个问题减少了很多。

https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat/commit/4788ab8512511daa7b80f75c85ceb703661a4a4c

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants