Learnt Reward Modelling example #25
Comments
@jagilley is doing this with a prompt-engineered reward model.
Ohh, I actually meant an example that learns a reward model. I'll clarify the title.
Great. Folks at ScaleAI are doing this.
I sent them the issue. Daniel, happy to assign Scale folks to this issue.
Create an example showing reward modeling. This could use an artificially limited synthetic reward source, or the Anthropic HHH data (already on the Stability cluster).
More ideas for tasks: #13 (comment)
(cc @haileyschoelkopf)
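
To make the synthetic option concrete, here is a rough, dependency-free sketch of what "learning a reward model from an artificially limited synthetic reward source" could look like. Everything in it is an assumption for illustration: `true_reward`, the two-dimensional features, the linear model, and the pairwise logistic (Bradley-Terry style) loss are hypothetical stand-ins, not this repository's API or the approach anyone in this thread is using. The "limit" is modeled simply as a fixed budget of 200 labeled preference pairs.

```python
import math
import random


def true_reward(x):
    # Hypothetical synthetic reward source (assumption): a fixed linear function.
    return 2.0 * x[0] - 1.0 * x[1]


def make_pairs(n, rng):
    # Query the synthetic source n times; each query yields (chosen, rejected).
    pairs = []
    for _ in range(n):
        a = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        b = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        pairs.append((a, b) if true_reward(a) >= true_reward(b) else (b, a))
    return pairs


def reward(w, x):
    # Learned reward model: a linear score over the features.
    return sum(wi * xi for wi, xi in zip(w, x))


def train(pairs, epochs=50, lr=0.1):
    # SGD on the pairwise loss -log(sigmoid(r(chosen) - r(rejected))).
    w = [0.0, 0.0]
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # d(loss)/d(margin) = -(1 - sigmoid(margin)); step against it.
            g = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(len(w)):
                w[i] += lr * g * (chosen[i] - rejected[i])
    return w


def accuracy(w, pairs):
    # Fraction of pairs where the learned model ranks the chosen item higher.
    return sum(reward(w, c) > reward(w, r) for c, r in pairs) / len(pairs)


rng = random.Random(0)
train_pairs = make_pairs(200, rng)  # the artificially limited label budget
test_pairs = make_pairs(200, rng)   # held-out pairs from the same source
w = train(train_pairs)
print("learned weights:", w)
print("held-out pair accuracy:", accuracy(w, test_pairs))
```

A real example would presumably swap the linear scorer for a language model with a scalar head and the synthetic pairs for chosen/rejected text pairs such as the HHH data, but the loss and the train/held-out split would keep the same shape.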