Add DPO training #1242

dmahan93 · 2024-06-25T01:30:04Z

Still a bit of a WIP (docs, adding precomputation of reference logprobs), but I figured it'd be good to get it up here now, since it's a fairly big change and may need some discussion.


dmahan93 · 2024-06-25T01:31:54Z

Also builds off #1240 and #1239, since the packing implementations and chat templating are much more useful for DPO.


dmahan93 added 6 commits June 21, 2024 11:27

Add a chat data preprocessing script

a950f8b

add EOT at end of a chat

e360e24

- add different packing impl (Unpacked, packing until overflow)

9ee4a8f

- fix labels to also have valid/test implementations
- fix label masking in _get_batch to also include anything from get_ltor_masks_and_position_ids
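The two packing modes named in 9ee4a8f can be sketched roughly as follows. This is a minimal, framework-free illustration under two assumptions: "Unpacked" means one sample per row (truncated or right-padded to `seq_length`), and "packing until overflow" means greedily concatenating whole samples until the next one would exceed `seq_length`. The function names and `pad_id` are hypothetical, not the identifiers used in this PR.

```python
from typing import List


def unpacked(samples: List[List[int]], seq_length: int, pad_id: int) -> List[List[int]]:
    """One sample per row: truncate long samples, right-pad short ones."""
    rows = []
    for sample in samples:
        sample = sample[:seq_length]
        rows.append(sample + [pad_id] * (seq_length - len(sample)))
    return rows


def pack_until_overflow(samples: List[List[int]], seq_length: int, pad_id: int) -> List[List[int]]:
    """Concatenate whole samples into a row until the next one would overflow it."""
    rows: List[List[int]] = []
    current: List[int] = []
    for sample in samples:
        sample = sample[:seq_length]  # a sample longer than one row is truncated
        if current and len(current) + len(sample) > seq_length:
            # Next sample does not fit: pad and emit the current row, then start a new one.
            rows.append(current + [pad_id] * (seq_length - len(current)))
            current = []
        current.extend(sample)
    if current:
        rows.append(current + [pad_id] * (seq_length - len(current)))
    return rows
```

In this reading, no sample is split across two rows, which keeps each chat or preference example intact at the cost of some padding.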

update README.md

0678573

Merge remote-tracking branch 'origin/add-chat-template-based-datasets' into add-dpo

15e3059

- Add metrics to forward step to add DPO specific metrics that are useful (accuracy, etc)
- Add reference model setup for DPO
- Add pairwise dataset for positive/negative pairs
- Add DPO loss

2d20d86
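For context on the DPO loss and accuracy metric added in 2d20d86: the standard DPO objective for one chosen/rejected pair is $-\log\sigma\big(\beta[(\log\pi(y_w)-\log\pi_{\text{ref}}(y_w))-(\log\pi(y_l)-\log\pi_{\text{ref}}(y_l))]\big)$. Below is a minimal pure-Python sketch operating on per-sequence summed log-probs. It assumes "accuracy" means the fraction of pairs whose implicit chosen reward exceeds the rejected one; `beta`, its default, and the function names are hypothetical and this is not the loss code in this PR.

```python
import math
from typing import List, Tuple


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss_and_accuracy(
    policy_chosen: List[float],
    policy_rejected: List[float],
    ref_chosen: List[float],
    ref_rejected: List[float],
    beta: float = 0.1,  # hypothetical default
) -> Tuple[float, float]:
    """Mean DPO loss and reward accuracy over a batch of preference pairs.

    Each list holds one per-sequence summed token log-prob per pair.
    """
    losses: List[float] = []
    correct = 0
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        chosen_reward = beta * (pc - rc)  # implicit reward of the chosen response
        rejected_reward = beta * (pr - rr)  # implicit reward of the rejected response
        margin = chosen_reward - rejected_reward
        losses.append(-log_sigmoid(margin))
        correct += int(margin > 0)  # chosen strictly preferred over rejected
    n = len(losses)
    return sum(losses) / n, correct / n
```

When policy and reference agree exactly, every margin is zero, so the loss is $\log 2$ and the accuracy is 0; training pushes the margin positive.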

dmahan93 requested a review from Quentin-Anthony as a code owner June 25, 2024 01:30

dmahan93 and others added 3 commits June 25, 2024 10:07

Update arguments.py to use train_label_data_paths instead of label_data_paths

c045006

Merge remote-tracking branch 'origin/add-different-packing-impl' into add-dpo

eed3643

- Bugfixes from upstreaming....

0392080

dmahan93 marked this pull request as draft June 25, 2024 17:56

dmahan93 added 3 commits June 25, 2024 15:30

- add precompute logprobs...

361f459

- Finishing up precompute logprobs...

7398e07
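Presumably, the point of precomputing reference logprobs is that the frozen reference model only has to score each chosen/rejected sequence once; the cached values can then be read back during training instead of running a second model every step. The minimal pure-Python sketch below shows the per-sequence quantity that would be cached: the sum of token log-probs over positions where the loss mask is 1 (assuming that mask is the combined label/loss mask built in `_get_batch`). `sequence_logprob`, `precompute_reference_logprobs`, the `ref_model` callable, and the list-of-lists logits layout are all hypothetical, not this PR's code.

```python
import math
from typing import Callable, List, Tuple


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_logprob(logits: List[List[float]], labels: List[int], loss_mask: List[int]) -> float:
    """Sum of log p(labels[t]) over positions where loss_mask[t] == 1.

    logits[t] is the vocabulary row that predicts labels[t] (already shifted).
    """
    total = 0.0
    for row, label, keep in zip(logits, labels, loss_mask):
        if keep:
            total += log_softmax(row)[label]
    return total


def precompute_reference_logprobs(
    ref_model: Callable[[List[int]], List[List[float]]],
    examples: List[Tuple[List[int], List[int], List[int]]],
) -> List[float]:
    """Run the frozen reference model once per (tokens, labels, loss_mask) example and cache the result."""
    return [sequence_logprob(ref_model(tokens), labels, mask) for tokens, labels, mask in examples]
```

The cached list would then supply the `ref_chosen` / `ref_rejected` inputs of a DPO loss, one entry per sequence.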

- update readme for DPO...

51af714

dmahan93 marked this pull request as ready for review June 26, 2024 00:33

dmahan93 mentioned this pull request Jun 28, 2024

Add KTO training #1244

Draft

StellaAthena mentioned this pull request Jun 28, 2024

Finetune #1088

Closed

dmahan93 mentioned this pull request Jun 28, 2024

Add Reward Model training #1246

Draft
