Self-Contrast

Extensive Self-Contrast Enables Feedback-Free Language Model Alignment

🤗 Model • 📚 Data • 📃 Paper

Self-Contrast is an annotation-free method for aligning language models with human preferences. Starting from SFT data alone, it augments the number of self-generated negative samples and applies a straightforward sample-filtering technique; evaluated on MT-Bench and AlpacaEval, this surpasses the original DPO trained on synthetic data built from the same SFT data. The results also show that alignment performance improves consistently as the number of negative samples increases.

Table of Contents

  • Start
  • Data
  • Build Up Self-Contrast Data
  • Training
  • Evaluation
  • Acknowledgments
  • Citation

Start

Setup Environment

pip install -r requirements.txt

Because of vllm's particular dependency requirements, you may need two separate environments: one for training and one for inference. Consider carefully whether to install vllm in the training environment.

python -m venv inference
source inference/bin/activate
pip install vllm accelerate

Download Public Models

To use this repository, download the following pre-trained models and place them under checkpoints:

  1. UAE-Large-V1
  2. Starling-RM-7B-alpha
  3. Llama-2-7b-chat-hf
  4. Mistral-7B-v0.1

Please note that experiments on Nectar use the openchat template, so you will need to add the <|end_of_turn|> token to the model following the instructions in the imoneoi/openchat repository.
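For reference, adding a special token with transformers typically looks like the sketch below. This is a generic illustration (the model path and the in-place save are placeholders), not the exact procedure from imoneoi/openchat, which also specifies how the new embedding rows should be initialized.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "checkpoints/Mistral-7B-v0.1"  # illustrative; use the model you train on Nectar

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Register <|end_of_turn|> as an additional special token if it is not already present.
added = tokenizer.add_special_tokens({"additional_special_tokens": ["<|end_of_turn|>"]})
if added > 0:
    # Grow the embedding matrix so the new token id has a corresponding row.
    model.resize_token_embeddings(len(tokenizer))

tokenizer.save_pretrained(model_path)
model.save_pretrained(model_path)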

Data

Quick Start

To get started with the experiments quickly, you can directly download the data we have prepared.

You can also process the data yourself by downloading the raw data from Nectar, ultrachat_200k, hh-rlhf, or by using other SFT datasets.

Build Up Self-Contrast Data

1. Self-Generate Massive Responses

Before running inference, train an SFT model first, then use it to generate a large number of responses per prompt.

source inference/bin/activate

python src/inference.py \
--model-path checkpoints/HH-RLHF/sft/zephyr-template_HH-RLHF_SFT \
--data-path data/HH-RLHF/for_inference/train.jsonl \
--result-path inference/HH-RLHF/responses/train.jsonl \
--n 32 \
--template zephyr \
--dataset hh-rlhf \
--max-tokens 2048

sleep 5
pkill -f 'python -m vllm.entrypoints.openai.api_server'
sleep 5

deactivate

2. Compute Embeddings

python src/compute_embeddings.py \
--model-path checkpoints/UAE-Large-V1 \
--data-path inference/HH-RLHF/responses \
--save-path inference/HH-RLHF/embeddings
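
src/compute_embeddings.py takes care of this step. Conceptually, embedding the generated responses with UAE-Large-V1 looks roughly like the sketch below; CLS pooling and L2 normalization are assumptions of this sketch and may differ from what the script actually does.

import torch
from transformers import AutoModel, AutoTokenizer

model_path = "checkpoints/UAE-Large-V1"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModel.from_pretrained(model_path).eval()

def embed(texts):
    # Tokenize a batch of responses and take the [CLS] embedding of each one.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    cls = hidden[:, 0]                                   # CLS pooling
    return torch.nn.functional.normalize(cls, dim=-1)    # unit norm, so dot product = cosine

vectors = embed(["First sampled response.", "Second sampled response."])
print(vectors.shape)  # e.g. torch.Size([2, 1024])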

3. Construct Data with Cosine Similarity

Here is an example of using the 75% most dissimilar responses as negatives on HH-RLHF:

for i in 1 2 4 8 16; do
   python src/construct_synthetic_data.py \
   --mode lastdedup-75 \
   --data-path inference/HH-RLHF/embeddings \
   --save-path inference/HH-RLHF/data/lastdedup-75-$i \
   --negative-quantity $i
done

Here is another example that uses the responses directly as negatives:

for i in 1 2 4 8 16; do
   python src/construct_synthetic_data.py \
   --mode alldedup \
   --data-path inference/HH-RLHF/embeddings \
   --save-path inference/HH-RLHF/data/alldedup-$i \
   --negative-quantity $i
done
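
Both modes are implemented in src/construct_synthetic_data.py; the sketch below only illustrates the underlying idea of similarity-based selection, in which self-generated responses are ranked by cosine similarity to the embedding of the SFT target and negatives are drawn from the most dissimilar portion. The function name, the random draw from the dissimilar pool, and the exact meaning of the 75% cutoff are assumptions of this sketch, not the script's precise semantics.

import numpy as np

def pick_negatives(target_emb, response_embs, responses, quantity,
                   dissimilar_frac=0.75, seed=0):
    """Draw `quantity` negatives from the `dissimilar_frac` least-similar responses.

    target_emb: 1-D array, embedding of the SFT target response.
    response_embs: 2-D array, embeddings of the self-generated responses.
    """
    target = target_emb / np.linalg.norm(target_emb)
    embs = response_embs / np.linalg.norm(response_embs, axis=1, keepdims=True)
    sims = embs @ target                            # cosine similarity to the SFT target
    order = np.argsort(sims)                        # ascending: most dissimilar first
    pool = order[: max(1, int(len(order) * dissimilar_frac))]
    rng = np.random.default_rng(seed)
    picked = rng.choice(pool, size=min(quantity, len(pool)), replace=False)
    return [responses[i] for i in picked]

Pairing each selected negative with the SFT target as the chosen response yields the preference pairs used for DPO training.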

Before training, add your synthetic data to data/dataset_info.json.
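
The exact entry format depends on the LLaMA-Factory version bundled with this repository; a pairwise (ranking) dataset entry typically looks roughly like the following, where the dataset name and file name are placeholders for your generated files:

"self_contrast_hh_lastdedup_75_16": {
  "file_name": "self_contrast_hh_lastdedup_75_16.json",
  "ranking": true
}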

Training

You can use the fully automated script, which covers training, testing, and plotting:

bash scripts/HH-RLHF/All_In_One.sh

SFT

# HH-RLHF
bash LLaMA-Factory/scripts/HH-RLHF/SFT.sh

# Nectar
bash LLaMA-Factory/scripts/Nectar/SFT.sh

# UltraChat
# we use alignment-handbook/zephyr-7b-sft-full for UltraChat

DPO

# HH-RLHF
bash LLaMA-Factory/scripts/HH-RLHF/DPO.sh

# Nectar
bash LLaMA-Factory/scripts/Nectar/DPO.sh

# UltraChat
bash LLaMA-Factory/scripts/UltraChat/DPO.sh

SPIN

# HH-RLHF
bash LLaMA-Factory/scripts/HH-RLHF/SPIN.sh

# Nectar
bash LLaMA-Factory/scripts/Nectar/SPIN.sh

# UltraChat
bash LLaMA-Factory/scripts/UltraChat/SPIN.sh

Self-Contrast

# HH-RLHF
bash LLaMA-Factory/scripts/HH-RLHF/Self-Contrast_1.sh
bash LLaMA-Factory/scripts/HH-RLHF/Self-Contrast_16.sh

# Nectar
bash LLaMA-Factory/scripts/Nectar/Self-Contrast_1.sh
bash LLaMA-Factory/scripts/Nectar/Self-Contrast_16.sh

# UltraChat
bash LLaMA-Factory/scripts/UltraChat/Self-Contrast_1.sh
bash LLaMA-Factory/scripts/UltraChat/Self-Contrast_16.sh

Evaluation

We provide a script that uses the reward model to compute the win rate against the SFT target.

source inference/bin/activate

# inference & compute reward
bash scripts/HH-RLHF/Test.sh

# compute winrate
python src/compute_winrate.py --result-dir results/hh-rlhf/test/reward

# plot figure
python src/draw_figures.py
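
For reference, the win rate computed by src/compute_winrate.py can be thought of as the fraction of test prompts on which the reward model scores the trained model's response above the SFT target's response. The tie-handling below (a tie counts as half a win) and the function name are assumptions of this sketch, not necessarily the script's convention.

def win_rate(model_rewards, target_rewards):
    """Fraction of prompts where the model's reward beats the SFT target's reward."""
    assert len(model_rewards) == len(target_rewards)
    wins = sum(m > t for m, t in zip(model_rewards, target_rewards))
    ties = sum(m == t for m, t in zip(model_rewards, target_rewards))
    return (wins + 0.5 * ties) / len(model_rewards)

print(win_rate([1.2, 0.7, 0.9], [0.8, 0.7, 1.1]))  # 0.5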

Acknowledgments

  • Training: We deeply appreciate the LLaMA-Factory project for providing an exceptional tool; its versatility and efficiency have significantly streamlined our model training process.
  • Evaluation: We sincerely thank FastChat, alpaca_eval, lm-evaluation-harness, and GSM8K-eval for their valuable contributions.

Citation

@misc{liu2024extensive,
      title={Extensive Self-Contrast Enables Feedback-Free Language Model Alignment}, 
      author={Xiao Liu and Xixuan Song and Yuxiao Dong and Jie Tang},
      year={2024},
      eprint={2404.00604},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
