This repo includes a reference implementation of MODPO, an algorithm that extends Direct Preference Optimization (DPO) to multiple alignment objectives with minimal overhead, as described in the paper Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization.
The MODPO loss is implemented in modpo_trainer.py#L142; the DPO loss in dpo_trainer.py#L413 is shown for comparison. MODPO differs from DPO by an extra margin term, which ensures that the language model is steered by more than one objective.
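For intuition, here is a minimal sketch of that loss; the function name, signature, and two-objective weighting are illustrative assumptions, not the repo's actual API (see modpo_trainer.py#L142 for the real implementation):

```python
import torch.nn.functional as F

def modpo_loss_sketch(policy_chosen_logps, policy_rejected_logps,
                      ref_chosen_logps, ref_rejected_logps,
                      chosen_margin_reward, rejected_margin_reward,
                      beta=0.1, w=0.5):
    # DPO-style implicit reward: policy-to-reference log-ratios,
    # contrasted between the chosen and rejected responses.
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    logits = pi_logratios - ref_logratios

    # MODPO's extra margin: the reward difference under the *other*
    # objective, weighted by (1 - w) / w in this two-objective sketch.
    margin = (1 - w) / w * (chosen_margin_reward - rejected_margin_reward)

    # Scale the DPO term by 1 / w and subtract the margin, so the policy
    # is only credited for preference gains beyond the other objective.
    return -F.logsigmoid(beta / w * logits - margin).mean()
```

Under this parameterization, setting w = 1 zeroes out the margin and recovers plain DPO.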
conda create -n modpo python=3.10
conda activate modpo
git clone https://github.com/ZHZisZZ/modpo.git
cd modpo
pip install -r requirements.txt
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install flash-attn==2.3.2 --no-build-isolation
bash scripts/modpo/beavertails/run.sh
This script reproduces the safety alignment experiments from our paper. See the wandb reports here for the experimental results.
bash scripts/modpo/summarize_w_length_penalty/run.sh
This script reproduces the experiments from Disentangling Length from Quality in Direct Preference Optimization, a simplified version of the long-form QA experiments from our paper. We apply MODPO to balance human preference and response length in summarizing long text; the sketch below shows how the length objective fits in.
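As a hedged illustration (all variable names hypothetical), the length objective amounts to feeding a negative-length reward into the margin of the loss sketch above:

```python
# Hypothetical usage of modpo_loss_sketch: a negative token-length reward
# plays the role of the second objective, trading preference for brevity.
chosen_margin_reward = -float(chosen_ids.shape[-1])
rejected_margin_reward = -float(rejected_ids.shape[-1])
loss = modpo_loss_sketch(policy_chosen_logps, policy_rejected_logps,
                         ref_chosen_logps, ref_rejected_logps,
                         chosen_margin_reward, rejected_margin_reward,
                         beta=0.1, w=0.9)  # w near 1 keeps the penalty mild
```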
This repository also supports some common training pipelines:

- supervised fine-tuning: scripts/examples/sft/run.sh
- reward model training: scripts/examples/rm/run.sh
- DPO fine-tuning: scripts/examples/dpo/run.sh
If you want to implement your own alignment algorithms, please add new trainers under src/trainer. A minimal skeleton is sketched below.
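The base class and hook shown here are assumptions for illustration, not the repo's actual interface; mirror the existing trainers under src/trainer instead:

```python
# src/trainer/my_trainer.py -- illustrative skeleton only.
from transformers import Trainer

class MyAlignmentTrainer(Trainer):
    """Hypothetical trainer that plugs a custom alignment loss
    into the standard Hugging Face training loop."""

    def compute_loss(self, model, inputs, return_outputs=False):
        outputs = model(**inputs)
        loss = outputs.loss  # replace with your alignment objective
        return (loss, outputs) if return_outputs else loss
```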
REAL_DATASET_CONFIGS (src/data/configs.py) lists the datasets currently supported. If you want to train on your own datasets, please add them under src/data/raw_data and modify REAL_DATASET_CONFIGS (src/data/configs.py) accordingly. Please see src/data/raw_data/shp for an example, and the sketch below for a rough picture of the registration.
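Registering a new dataset might look roughly like this; the import paths and mapping are assumptions, so copy the structure of the real shp entry rather than this sketch:

```python
# Illustrative sketch only; follow the existing "shp" entry in
# src/data/configs.py for the actual registration format.
from src.data.configs import REAL_DATASET_CONFIGS
from src.data.raw_data.my_dataset import MyRawDataset  # hypothetical dataset

REAL_DATASET_CONFIGS["my_dataset"] = MyRawDataset  # register it by name
```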
If you find MODPO useful, please cite our paper with the following BibTeX entry:
@misc{zhou2023onepreferencefitsall,
    title={Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization},
    author={Zhanhui Zhou and Jie Liu and Chao Yang and Jing Shao and Yu Liu and Xiangyu Yue and Wanli Ouyang and Yu Qiao},
    year={2023},
    eprint={2310.03708},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}