
CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment

arXiv: https://arxiv.org/abs/2403.16649 (abstract / PDF)

📣 News

  • [24/Feb/2024] 🎉 Our paper is accepted by LREC-COLING 2024 (The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation)!

✨ Abstract

Reinforcement learning from human feedback (RLHF) is a crucial technique for aligning large language models (LLMs) with human preferences, ensuring that these LLMs behave in ways that are beneficial and comprehensible to users. However, a longstanding challenge in human alignment techniques based on reinforcement learning lies in their inherent complexity and difficulty in training. To address this challenge, we present a simple yet effective Contrastive Learning Framework for Human Alignment (CLHA) to align LLMs with human preferences directly. CLHA employs a novel rescoring strategy to evaluate the noise within the data by considering its inherent quality and dynamically adjusting the training process. Simultaneously, CLHA utilizes a pairwise contrastive loss and an adaptive supervised fine-tuning loss to adaptively modify the likelihood of generating responses, ensuring enhanced alignment with human preferences. In experiments, CLHA surpasses other alignment algorithms, showing superior performance in terms of reward model scores, automatic evaluations, and human assessments on the widely used “Helpful and Harmless” dataset.
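
For intuition, here is a minimal PyTorch sketch of what a CLHA-style objective could look like: a pairwise contrastive term over the log-likelihoods of the preferred and dispreferred responses, plus a supervised fine-tuning term on the preferred response, both weighted by a reward-based rescoring factor. All names and hyperparameters (margin, beta, rescore_weight) are illustrative assumptions, not the repository's actual implementation; see the paper and the training code for the exact formulation.

import torch
import torch.nn.functional as F

def clha_style_loss(logp_chosen, logp_rejected, reward_chosen, reward_rejected,
                    margin=1.0, beta=1.0):
    # logp_*  : summed token log-probabilities of each response under the policy
    # reward_*: reward-model scores used for rescoring (down-weighting noisy pairs)
    # Rescoring: pairs with a small reward gap (likely noisy labels) get less weight.
    rescore_weight = torch.sigmoid(reward_chosen - reward_rejected)
    # Pairwise contrastive term: push the chosen response's likelihood above the
    # rejected one by at least `margin`.
    contrastive = F.relu(margin - (logp_chosen - logp_rejected))
    # Adaptive SFT term: increase the likelihood of the chosen response, scaled by
    # the same rescoring weight.
    sft = -logp_chosen
    return (rescore_weight * (contrastive + beta * sft)).mean()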

✨ The pipeline of CLHA

💪 Dataset

Data Preparation

We provide preprocessed data for training and testing, which can be obtained with the following steps:

  1. Download data.zip and unzip it.
  2. Place the unzipped data folder in the root directory of the project.

We also provide scripts for preprocessing the raw data. Please follow the steps below to prepare the data:

  1. Create a directory named data in the root directory of this project.
  2. Create a directory named data/raw_data in the data directory.
  3. Download the raw data from HH-RLHF (or fetch it programmatically, as sketched after these steps), name the downloaded folder hhrlhf, and put it in the data/raw_data directory.
  4. Run the following commands to preprocess the data:
# For HH-RLHF
cd train/hh_preprocess_data
python step_1_process.py
python step_2_get_train_data.py
python step_3_get_test_data.py
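
If you prefer to fetch the raw corpus programmatically instead of downloading it manually, the Hugging Face datasets library hosts it as Anthropic/hh-rlhf. This is only a convenience sketch; the preprocessing scripts above expect the raw files under data/raw_data/hhrlhf, so adapt the saved layout to whatever those scripts read.

# Optional: fetch HH-RLHF with the `datasets` library (pip install datasets).
from datasets import load_dataset

hh = load_dataset("Anthropic/hh-rlhf")      # splits: "train" and "test"
print(hh["train"][0]["chosen"][:200])       # each example has "chosen" and "rejected" texts
hh.save_to_disk("data/raw_data/hhrlhf")     # adjust to the layout step_1_process.py expects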

💪 Usage

Train

We provide training scripts for the model. For example, you can run the following commands to train it:

mkdir checkpoints
mkdir logs
# Download the reward models from https://huggingface.co/OpenAssistant/oasst-rm-2.1-pythia-1.4b-epoch-2.5 and https://huggingface.co/OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1 into rm/ (see the sketch below)
mkdir rm
cd train
# Train LLMs with HH-RLHF
./train_hh.sh [id_of_exp] hh_train_len2 2

The scripts can be easily modified to train LLMs with different datasets.
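
The reward models referenced in the comment above can also be fetched programmatically. The sketch below uses huggingface_hub; the rm/ target directory follows the commands above, but the exact local folder names the training script expects are an assumption, so adjust them if needed.

# Optional: download both reward models into rm/ (pip install huggingface_hub).
from huggingface_hub import snapshot_download

for repo_id in [
    "OpenAssistant/oasst-rm-2.1-pythia-1.4b-epoch-2.5",
    "OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1",
]:
    snapshot_download(repo_id=repo_id, local_dir=f"rm/{repo_id.split('/')[-1]}")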

Test

The following commands can be used to test the model:

# Test LLMs with HH-RLHF
cd eval_hh
./run_infer_main_dist.sh

Note: before running, specify the id_of_exp and the corresponding ranking length (used during training) in run_infer_main_dist.sh.
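
After inference, generated responses can be scored with one of the reward models downloaded earlier (reward model scores are one of the evaluation metrics reported in the paper). The snippet below is a hedged sketch: the <|prompter|>/<|assistant|> prompt template and the trust_remote_code flag follow the OpenAssistant model cards, but verify the exact input format there before relying on the scores.

# Hedged sketch: score one prompt/response pair with an OASST reward model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

rm_path = "rm/oasst-rm-2.1-pythia-1.4b-epoch-2.5"   # local copy from the Train section
tokenizer = AutoTokenizer.from_pretrained(rm_path)
model = AutoModelForSequenceClassification.from_pretrained(rm_path, trust_remote_code=True)

text = ("<|prompter|>How do I stay safe online?<|endoftext|>"
        "<|assistant|>Use strong, unique passwords and enable two-factor authentication.<|endoftext|>")
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    score = model(**inputs).logits[0].item()        # scalar reward: higher = more preferred
print(score)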

🤝 Acknowledgements

This project was inspired by PRO from DAMO-ConvAI. We appreciate the original work done by its authors.

🔓 Citation

If this work is helpful to you, please cite our paper as:

@article{fang2024clha,
  title={CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment},
  author={Fang, Feiteng and Zhu, Liang and Yang, Min and Feng, Xi and Hou, Jinchang and Zhao, Qixuan and Li, Chengming and Hu, Xiping and Xu, Ruifeng},
  journal={arXiv preprint arXiv:2403.16649},
  year={2024}
}
