This repo includes a reference implementation of MODPO, an algorithm that extends Direct Preference Optimization (DPO) to multiple alignment objectives with minimal overhead, as described in the paper Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization.
The MODPO loss is implemented in modpo_trainer.py#L142; the DPO loss in dpo_trainer.py#L413 is shown for comparison. MODPO differs from DPO by an extra margin term, which ensures that the language model is steered by more than one objective.
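For intuition, here is a minimal sketch of that loss; the function name, signature, and two-objective weighting are illustrative assumptions, not the repo's actual API (see modpo_trainer.py#L142 for the real implementation):

```python
import torch.nn.functional as F

def modpo_loss_sketch(policy_chosen_logps, policy_rejected_logps,
                      ref_chosen_logps, ref_rejected_logps,
                      chosen_margin_reward, rejected_margin_reward,
                      beta=0.1, w=0.5):
    # DPO-style implicit reward: policy-to-reference log-ratios,
    # contrasted between the chosen and rejected responses.
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    logits = pi_logratios - ref_logratios

    # MODPO's extra margin: the reward difference under the *other*
    # objective, weighted by (1 - w) / w in this two-objective sketch.
    margin = (1 - w) / w * (chosen_margin_reward - rejected_margin_reward)

    # Scale the DPO term by 1 / w and subtract the margin, so the policy
    # is only credited for preference gains beyond the other objective.
    return -F.logsigmoid(beta / w * logits - margin).mean()
```

Under this parameterization, setting w = 1 zeroes out the margin and recovers plain DPO.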
conda create -n modpo python=3.10
conda activate modpo
git clone https://github.com/ZHZisZZ/modpo.git
cd modpo
pip install -r requirements.txt
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install flash-attn==2.3.2 --no-build-isolation
bash scripts/modpo/beavertails/run.sh
This script reproduces the safety alignment experiments from our paper. See the wandb reports here for the experimental results.
bash scripts/modpo/summarize_w_length_penalty/run.sh
This script reproduces the experiments from Disentangling Length from Quality in Direct Preference Optimization, a simplified version of the long-form QA experiments from our paper. We apply MODPO to balance human preference and response length in summarizing long text; the sketch below shows how the length objective fits in.
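As a hedged illustration (all variable names hypothetical), the length objective amounts to feeding a negative-length reward into the margin of the loss sketch above:

```python
# Hypothetical usage of modpo_loss_sketch: a negative token-length reward
# plays the role of the second objective, trading preference for brevity.
chosen_margin_reward = -float(chosen_ids.shape[-1])
rejected_margin_reward = -float(rejected_ids.shape[-1])
loss = modpo_loss_sketch(policy_chosen_logps, policy_rejected_logps,
                         ref_chosen_logps, ref_rejected_logps,
                         chosen_margin_reward, rejected_margin_reward,
                         beta=0.1, w=0.9)  # w near 1 keeps the penalty mild
```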
This repository also supports some common training pipelines:

- supervised fine-tuning: scripts/examples/sft/run.sh
- reward model training: scripts/examples/rm/run.sh
- DPO fine-tuning: scripts/examples/dpo/run.sh
If you want to implement your own alignment algorithms, please add new trainers under src/trainer. A minimal skeleton is sketched below.
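The base class and hook shown here are assumptions for illustration, not the repo's actual interface; mirror the existing trainers under src/trainer instead:

```python
# src/trainer/my_trainer.py -- illustrative skeleton only.
from transformers import Trainer

class MyAlignmentTrainer(Trainer):
    """Hypothetical trainer that plugs a custom alignment loss
    into the standard Hugging Face training loop."""

    def compute_loss(self, model, inputs, return_outputs=False):
        outputs = model(**inputs)
        loss = outputs.loss  # replace with your alignment objective
        return (loss, outputs) if return_outputs else loss
```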
REAL_DATASET_CONFIGS (src/data/configs.py) lists the datasets currently supported. If you want to train on your own datasets, please add them under src/data/raw_data and modify REAL_DATASET_CONFIGS (src/data/configs.py) accordingly. Please see src/data/raw_data/shp for an example, and the sketch below for a rough picture of the registration.
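Registering a new dataset might look roughly like this; the import paths and mapping are assumptions, so copy the structure of the real shp entry rather than this sketch:

```python
# Illustrative sketch only; follow the existing "shp" entry in
# src/data/configs.py for the actual registration format.
from src.data.configs import REAL_DATASET_CONFIGS
from src.data.raw_data.my_dataset import MyRawDataset  # hypothetical dataset

REAL_DATASET_CONFIGS["my_dataset"] = MyRawDataset  # register it by name
```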
If you find MODPO useful, please cite our paper with the following BibTeX entry:
@misc{zhou2023onepreferencefitsall,
    title={Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization},
    author={Zhanhui Zhou and Jie Liu and Chao Yang and Jing Shao and Yu Liu and Xiangyu Yue and Wanli Ouyang and Yu Qiao},
    year={2023},
    eprint={2310.03708},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}