position_ids related PPO bug #217
This issue is very interesting and I will confirm it as soon as possible. If it holds, the bug originates in Hugging Face code and we will fix it. @tianhao-nexusflow By the way, Hugging Face TRL also does not use `position_ids`.
[figure: results with fixed position_ids, Llama-7B]
I'm working to reproduce results and align them with trlx. After a careful code review, I've identified this issue:

Concern (link): precomputing log probabilities using only `token_ids` and `attention_mask` may not fully account for the model's use of positional embeddings. For the Llama model, the same `token_ids` with and without left padding produce different results, even when the `attention_mask` is passed explicitly. A common strategy to correct this is to explicitly pass the `position_ids`, as in (link).

Potential impact: the same padding-related issue could also degrade generation quality, though I need more time to look into the code carefully. Will this also affect the critic and reward modules?
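To illustrate the fix being proposed, here is a minimal sketch of the common pattern for deriving `position_ids` from a left-padded attention mask, so that real tokens are numbered from 0 regardless of how much padding precedes them. The helper name `build_position_ids` and the convention of assigning position 0 to padding slots (they are masked out anyway) are assumptions for this sketch, not code from the repository under discussion.

```python
import torch

def build_position_ids(attention_mask: torch.Tensor) -> torch.Tensor:
    """Derive position_ids from a (possibly left-padded) attention mask.

    Real tokens in each row are numbered 0, 1, 2, ... in order;
    padding positions are set to 0 (an arbitrary convention, since
    they are excluded by the attention mask).
    """
    # cumsum counts real tokens seen so far; subtract 1 to start at 0.
    position_ids = attention_mask.long().cumsum(-1) - 1
    # Padding slots would be -1 here; clamp them to 0.
    position_ids.masked_fill_(attention_mask == 0, 0)
    return position_ids

# Left-padded batch: the first sequence has two pad tokens on the left.
mask = torch.tensor([[0, 0, 1, 1, 1],
                     [1, 1, 1, 1, 1]])
print(build_position_ids(mask).tolist())
# → [[0, 0, 0, 1, 2], [0, 1, 2, 3, 4]]
```

With this, the first real token of a left-padded sequence gets position 0 instead of inheriting the padding offset, which is why passing `position_ids` explicitly makes precomputed log probabilities agree between padded and unpadded batches.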