
MODPO: Multi-Objective Direct Preference Optimization

This repo includes a reference implementation of MODPO, an algorithm that extends Direct Preference Optimization (DPO) to multiple alignment objectives with minimal overhead, as described in the paper Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization.

MODPO adapts DPO to multiple objectives with only two extra lines of code.

The MODPO loss function is implemented in modpo_trainer.py#L142, while the DPO loss function is implemented in dpo_trainer.py#L413. MODPO differs in that it adds an extra margin term to ensure that the language model is steered by more than one objective.
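
To make these "two extra lines" concrete, here is a minimal PyTorch sketch of the objective: the logits of an ordinary DPO loss are shifted by a margin computed from the other objective's reward model and rescaled by the objective weight. Argument names below are illustrative, not the exact signature used in modpo_trainer.py.

import torch.nn.functional as F

def modpo_loss(policy_chosen_logps, policy_rejected_logps,
               reference_chosen_logps, reference_rejected_logps,
               chosen_margin_reward, rejected_margin_reward,
               beta=0.1, w=0.5):
    # Standard DPO part: log-ratio difference between chosen and rejected responses.
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = reference_chosen_logps - reference_rejected_logps
    dpo_logits = beta * (pi_logratios - ref_logratios)

    # The two extra lines: subtract a margin from the other objective's reward
    # model (weighted by 1 - w) and rescale by the weight w of the current objective.
    margin = (1 - w) * (chosen_margin_reward - rejected_margin_reward)
    logits = (dpo_logits - margin) / w

    return -F.logsigmoid(logits).mean()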

Installation

Create a virtual environment

conda create -n modpo python=3.10
conda activate modpo

Install dependencies

git clone https://github.com/ZHZisZZ/modpo.git
cd modpo
pip install -r requirements.txt
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install flash-attn==2.3.2 --no-build-isolation
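
Optionally, you can verify the install from a Python shell (a minimal sanity check, assuming a CUDA-capable machine):

import torch
import flash_attn

print(torch.__version__)           # expect 2.1.0+cu118, matching the pin above
print(torch.cuda.is_available())   # expect True on a CUDA 11.8 machine
print(flash_attn.__version__)      # expect 2.3.2, matching the pin above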

MODPO examples

Safety alignment

bash scripts/modpo/beavertails/run.sh

This script reproduces the safety alignment experiments from our paper. See the wandb reports here for experimental results.

Summarization with length penalty

bash scripts/modpo/summarize_w_length_penalty/run.sh

This script reproduces the experiments from Disentangling Length from Quality in Direct Preference Optimization, which is a simplified version of the long-form QA experiments from our paper. We apply MODPO to balance human preferences and response length when summarizing long text.
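
In this setup, response length acts as the second objective, so one way to realize it is to use the negative token length directly as the margin reward instead of a learned reward model. A hedged illustration (not the repo's exact implementation), reusing the modpo_loss sketch above:

import torch

def length_margin_rewards(chosen_lengths, rejected_lengths):
    # Length penalty as a margin reward: shorter responses get higher reward.
    return -chosen_lengths.float(), -rejected_lengths.float()

# Example: a 120-token chosen response vs. an 80-token rejected response;
# plugged into the modpo_loss sketch above, the margin penalizes the longer one.
chosen_r, rejected_r = length_margin_rewards(torch.tensor([120]), torch.tensor([80]))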

Other examples

This repository also supports other common training pipelines:

  • supervised fine-tuning (SFT): scripts/examples/sft/run.sh
  • reward model training: scripts/examples/rm/run.sh
  • DPO fine-tuning: scripts/examples/dpo/run.sh

If you want to implement your own alignment algorithms, please add new trainers under src/trainer.

Adding customized datasets

REAL_DATASET_CONFIGS (src/data/configs.py) lists the datasets currently supported. If you want to train on your own datasets, please add them under src/data/raw_data and modify REAL_DATASET_CONFIGS (src/data/configs.py) accordingly. See src/data/raw_data/shp for an example.
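
As a rough, hypothetical illustration of the pattern (the real loader interface is defined by the examples under src/data/raw_data, e.g. src/data/raw_data/shp, and the exact config structure is in src/data/configs.py; all names below are placeholders, not the repo's API):

def load_my_dataset(split):
    # Placeholder loader: return preference examples (prompt, chosen response,
    # rejected response) for the requested split.
    ...

REAL_DATASET_CONFIGS = {
    # ... existing entries ...
    "my_dataset": load_my_dataset,  # hypothetical: map a dataset name to its loader
}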

Citing MODPO

If you find MODPO useful, please cite it using the following BibTeX entry:

@misc{zhou2023onepreferencefitsall,
      title={Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization}, 
      author={Zhanhui Zhou and Jie Liu and Chao Yang and Jing Shao and Yu Liu and Xiangyu Yue and Wanli Ouyang and Yu Qiao},
      year={2023},
      eprint={2310.03708},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
