
Model refuses to answer after PPO training #287

Open
burger-pb opened this issue May 8, 2024 · 3 comments

@burger-pb

The data used during PPO training included code inputs and a general-purpose Chinese dataset, but after training the model responds to every question with:

I'm sorry, but I cannot assist you with that request as it goes against ethical and moral principles.
It is not appropriate to manipulate or control someone's thoughts or behavior. It is important to respect people's autonomy and to treat them with kindness and empathy. It is also important to follow ethical and moral principles and to abide by laws and regulations.

@hijkzzz
Collaborator

hijkzzz commented May 9, 2024

PPO may have hacked the reward model. This depends heavily on the quality of the RM.
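One common guard against this kind of reward hacking is a per-token KL penalty against the reference (SFT) policy, so PPO cannot drift arbitrarily far toward degenerate responses that the RM happens to score highly. A minimal sketch of the shaping, assuming illustrative names (`rm_score`, `logp_actor`, `logp_ref` are not OpenRLHF's actual API):

```python
# Hypothetical sketch: shape the per-token PPO reward with a KL penalty
# against the reference (SFT) policy to discourage reward hacking.
# All names here are illustrative assumptions, not OpenRLHF's API.

def shaped_rewards(rm_score, logp_actor, logp_ref, kl_coef=0.1):
    """Return per-token rewards: -kl_coef * (log-ratio) at every token,
    with the scalar RM score added on the final token of the response."""
    # Per-token KL estimate: log pi_actor(a|s) - log pi_ref(a|s)
    kl = [a - r for a, r in zip(logp_actor, logp_ref)]
    rewards = [-kl_coef * k for k in kl]
    rewards[-1] += rm_score  # RM score is applied once, at sequence end
    return rewards
```

Raising `kl_coef` keeps the policy closer to the SFT model at the cost of slower reward improvement; if the model collapses into a single refusal template, a larger penalty (or a better RM) is usually the first thing to try.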

@burger-pb
Author

burger-pb commented May 9, 2024

Is there a way to check my reward model? At the end of reward model training the output was preference_loss=1.34, chosen_reward=9.03, reject_reward=-17.2, acc_mean=0.931, loss_mean=0.217. Does anything look wrong there?

@hijkzzz
Collaborator

hijkzzz commented May 9, 2024

Possibly overfitting? You could hold out some samples as a test set and check the accuracy there.
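The check suggested above can be sketched as follows: split the preference pairs before RM training, then measure pairwise accuracy (fraction of held-out pairs where the chosen response scores higher than the rejected one). The `score_fn` interface is an assumption standing in for whatever scoring call the trained RM exposes:

```python
# Minimal sketch: evaluate reward-model pairwise accuracy on a held-out
# split to detect overfitting. `score_fn` is an assumed interface that
# maps a text to a scalar reward.
import random

def eval_rm_accuracy(score_fn, pairs):
    """pairs: list of (chosen_text, rejected_text) tuples.
    Returns the fraction of pairs where score(chosen) > score(rejected)."""
    correct = sum(score_fn(chosen) > score_fn(rejected)
                  for chosen, rejected in pairs)
    return correct / len(pairs)

# Usage idea: carve out the test split BEFORE training the RM, e.g.
# random.shuffle(data); held_out, train = data[:500], data[500:]
```

If train accuracy is ~0.93 but held-out accuracy is much lower (or the chosen/rejected reward gap is far smaller on held-out data), the RM has likely overfit and PPO will exploit it.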
