Learnt Reward Modelling example #25

cat-state · 2022-10-11T02:10:33Z

Create an example showing reward modeling. This could use an artificially limited synthetic reward source, or the HHH Anthropic data (already on the Stability cluster).
More ideas for tasks: #13 (comment)
(cc @haileyschoelkopf)

LouisCastricato · 2022-10-11T02:47:27Z

@jagilley is doing this with a prompt-engineered reward model.

cat-state · 2022-10-11T02:48:30Z

> @jagilley is doing this with a prompt-engineered reward model.

Ohh, I actually meant one that learns a reward model. I'll clarify the title.

LouisCastricato · 2022-10-11T02:50:05Z

Great. Folks at ScaleAI are doing this.

LouisCastricato · 2022-10-11T02:53:48Z

I sent them the issue. Daniel, happy to assign Scale folks to this issue.

cat-state mentioned this issue in RLHF with HH Anthropic data #26 (closed) on Oct 11, 2022

cat-state changed the title ~~Reward Modelling example~~ Learnt Reward Modelling example Oct 11, 2022

LouisCastricato closed this as completed Jan 5, 2023
