Add helpful and harmless example #128
checkpoint_dir: "checkpoints/ppo_hh"

model:
  model_path: "EleutherAI/gpt-j-6B"
Maybe let's make the default example use a smaller model so more people can run it
Thanks, Max! I left a tiny nit for the README.md.
On another note, I'm hitting an issue when using the GPU-hosting launch command from the README.md (for both ILQL and PPO): the samples list passed to the ppo_hh reward_fn is empty, causing an IndexError. Could you please try this?
- Traceback Info: https://gist.github.com/jon-tow/e8aefbd0e7b06a1afc21dba38238d089
Note this only occurs when --num_processes is given an odd number (I quickly checked that 2, 4, and 6 processes run fine).
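The failure mode described above can be illustrated with a minimal sketch. It assumes a reward function that takes a list of sample strings and returns one float per sample; `make_safe_reward_fn` and `toy_score_batch` are hypothetical names introduced here for illustration and are not part of the PR. The guard only avoids the IndexError and does not explain why the list arrives empty for odd process counts.

```python
from typing import Callable, List


def make_safe_reward_fn(
    score_batch: Callable[[List[str]], List[float]],
) -> Callable[..., List[float]]:
    """Wrap a batch scorer so that an empty `samples` list yields [] instead of raising."""

    def reward_fn(samples: List[str], **kwargs) -> List[float]:
        # Some processes may receive no samples; return early in that case.
        if not samples:
            return []
        return score_batch(samples)

    return reward_fn


def toy_score_batch(samples: List[str]) -> List[float]:
    # Hypothetical stand-in for a reward model call. Indexing samples[0]
    # reproduces the IndexError when the list is empty.
    first_len = max(len(samples[0]), 1)
    return [len(s) / first_len for s in samples]


reward_fn = make_safe_reward_fn(toy_score_batch)
```

Calling `reward_fn([])` returns an empty list, while calling `toy_score_batch([])` directly raises an IndexError, which mirrors the reported traceback only in kind, not in detail.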
@jon-tow regarding your issue, if you upgrade to
@reciprocated I had forgotten about the accelerate==0.15.0 metric bug 😅 This looks great from my end (not sure if @Dahoas has any more feedback).
This PR adds an example of training on Anthropic's helpful and harmless (HH) dataset.
PPO: https://wandb.ai/sorry/trlx/runs/ab0ehsxm
SFT: https://wandb.ai/sorry/trlx/runs/0aulb18k
ILQL: https://wandb.ai/sorry/trlx/runs/30ab7epk