Reward modeling support #836

wheresmyhair · 2024-05-20T16:47:56Z

[Ready for review]
Reward modeling support
Tested on:

research4pan

LGTM 👍 Reward modeling is important for RLHF approaches such as DPO and RAFT. Thanks to @wheresmyhair for the contribution!

research4pan · 2024-05-27T16:03:44Z

Several additional fixes in this PR:

Suppress warnings for samples exceeding the maximum length during tokenization and grouping.
Remove --conversation_template disable

wheresmyhair and others added 7 commits May 21, 2024 00:32

Move old rm scripts

598cd9f

Reward modeling lmflow style ready for test

94617f1

Reward modeling bug fix

a67eabd

Reward modeling script default dataset change

2733ab0

RM bug fix when conversation contains only messages

5520dcb

data download shell add rm dataset

40858a9

rm_trainer modify

fe9fc2d

research4pan approved these changes May 27, 2024


research4pan merged commit 0c11ace into main May 27, 2024
3 checks passed

wheresmyhair deleted the yizhenjia-rm branch May 28, 2024 16:08
