[New feature] Integrate DPO #759

gzliyu · 2024-04-11T07:56:53Z

Background: [DPO is available?] #741
This PR integrates `trl.trainer.DPOTrainer` into LMFlow.
Run with:

./scripts/run_dpo_align.sh \
  --model_name_or_path /home/nlpintern1/liyu/models/0313_sft_llama_full \
  --dataset_path /home/nlpintern1/liyu/dataset/stack-exchange-paired/data \
  --output_lora_path output_models/dpo_lora

research4pan

The overall framework looks good to me, thanks! Some improvements can be made in later commits.

`scripts/run_dpo_align.sh`

Line 6: we may want to leave `CUDA_VISIBLE_DEVICES` unset, as some users may want to run on a single GPU.
Line 7: maybe use `meta-llama/Llama-2-7b-hf` instead.
Line 8: is this dataset available on the LMFlow data server?
Line 10: same as Line 6.
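
The suggestions for Lines 6 and 7 could look roughly like the sketch below. It is not the actual content of `scripts/run_dpo_align.sh`: the function names `parse_args` and `describe_devices` are hypothetical, and the variable names simply mirror the flags shown in the PR description. The idea is to default to a public base model and to respect whatever `CUDA_VISIBLE_DEVICES` the caller has (or has not) set, rather than hardcoding a device list.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of argument handling for a script like run_dpo_align.sh.
# Names and defaults are assumptions for illustration, not the real script.

# Default to a public base model rather than a local path (note on Line 7).
model_name_or_path="meta-llama/Llama-2-7b-hf"
dataset_path=""                        # no default; left to the user
output_lora_path="output_models/dpo_lora"

# Parse "--flag value" pairs and reject anything unknown or incomplete.
parse_args() {
  while [ "$#" -gt 0 ]; do
    if [ "$#" -lt 2 ]; then
      echo "Missing value for $1" >&2
      return 1
    fi
    case "$1" in
      --model_name_or_path) model_name_or_path="$2" ;;
      --dataset_path)       dataset_path="$2" ;;
      --output_lora_path)   output_lora_path="$2" ;;
      *) echo "Unknown option: $1" >&2; return 1 ;;
    esac
    shift 2
  done
}

# Report the GPU selection without overriding it (notes on Lines 6 and 10).
describe_devices() {
  if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
    echo "Using GPUs: ${CUDA_VISIBLE_DEVICES}"
  else
    echo "CUDA_VISIBLE_DEVICES not set; using whatever GPUs are visible"
  fi
}
```

With this shape, a single-GPU user would export `CUDA_VISIBLE_DEVICES=0` before invoking the script, while multi-GPU users leave it unset; the script itself never pins a device list.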

research4pan · 2024-04-11T15:01:35Z

Let's wait for more tests before merging this into main. Maybe merge it into a feature branch first.

gzliyu and others added 3 commits April 9, 2024 17:25

initial commit

99e48cc

Merge branch 'OptimalScale:main' into main

539d96d

implemented dpo

c5e1130

research4pan approved these changes Apr 11, 2024

gzliyu and others added 2 commits April 11, 2024 23:33

pr improvements

70b14a0

Merge branch 'OptimalScale:main' into main

427c95a

gzliyu mentioned this pull request Apr 12, 2024

[New feature] Integrate DPO #762

Merged

wheresmyhair mentioned this pull request Apr 28, 2024

Add DPO support #797

Merged

research4pan merged commit 8a70f48 into OptimalScale:main Apr 28, 2024
0 of 2 checks passed