PPOSFTDataset bug report and related questions #49

Open
DZ9 opened this issue Feb 5, 2024 · 1 comment

Comments

@DZ9

DZ9 commented Feb 5, 2024

This piece of code in ppo_datahelper.py does not match the corresponding function.

While I'm at it, I would also like to ask a few questions:

  1. Should the padding here be on the left or on the right?
  2. LLaMA 2 pads on the right by default, but I see that the batches fed to the reward model are all left-padded, and many places in the PPO code also pad to the left. What exactly is the padding/alignment strategy?
  3. I noticed that loss_mask ends up setting the corresponding token ids to 0 (ppo_trainer.py), and cross entropy is then computed between those labels and the model output. The masked positions seem to still backpropagate gradients as if their label were class 0. Could you explain the exact mechanism here? (See the sketch after this list.)
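For context on point 3, here is a standalone PyTorch sketch (illustrative only, not the repository's code) of why the concern is reasonable: with a plain cross-entropy call and no ignore index, a position whose label has been overwritten with 0 is treated as an ordinary class-0 target and still contributes gradients.

```python
import torch
import torch.nn.functional as F

# Toy example: batch of 1 sequence, 4 positions, vocabulary of 5 tokens.
logits = torch.randn(1, 4, 5, requires_grad=True)
labels = torch.tensor([[2, 3, 0, 0]])  # last two positions "masked" by writing label 0

# Without ignore_index, label 0 is just an ordinary class id, so the masked
# positions still produce a loss term and therefore gradients.
loss = F.cross_entropy(logits.view(-1, 5), labels.view(-1))
loss.backward()
print(logits.grad[0, 2:].abs().sum())  # non-zero: the "masked" positions still backprop
```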
@Ablustrund
Collaborator

  1. pad-left = false means padding is applied on the right side.
  2. The reward model pads on the left so that the final token, which is used for scoring, is never a pad token; otherwise the score would be affected.
  3. The loss mask changes the token ids to the pad id, so in the end no loss is computed for those positions (see the sketch below).

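To make the two answers concrete, here is a minimal sketch assuming a Hugging Face tokenizer and PyTorch's ignore_index mechanism. The checkpoint name and variable names are illustrative, and this is not the repository's actual implementation:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer

# Points 1/2: left padding keeps the last position a real token, so a reward
# read from the final position always corresponds to actual text.
# The checkpoint name is illustrative; LLaMA tokenizers ship without a pad token.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
batch = tokenizer(["short prompt", "a somewhat longer prompt"],
                  padding=True, return_tensors="pt")
# Under left padding, batch["input_ids"][:, -1] is never the pad token.

# Point 3: set masked labels to the pad id and tell cross entropy to ignore it,
# so masked positions contribute neither loss nor gradient.
pad_id = tokenizer.pad_token_id
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = pad_id          # the "loss mask" step
logits = torch.randn(*labels.shape, len(tokenizer), requires_grad=True)
loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1),
                       ignore_index=pad_id)
loss.backward()  # gradients at the ignored positions are exactly zero
```

PyTorch's ignore_index guarantees both zero loss and zero gradient at the ignored positions, which is why rewriting masked labels to the ignored id is sufficient and no separate gradient masking is needed.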