dongyoung-go/CPM

Compositional preference models for aligning LMs

We introduce Compositional Preference Models (CPMs), a novel framework for training robust and interpretable preference models.

Language model loading and text generation rely on Hugging Face's Transformers library.

Usage

  • run.sh: Run the overall experiments
  • feature_extract/annotation.sh: Extract feature values using an LM
  • mle-train/logistic_fits.sh: Train a logistic classifier that combines the feature values into a single model
  • reward_model/pm_training.sh: Train a standard preference model
  • mle-train/preference_evaluation.sh: Evaluate preference alignment with an LLM
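
To make the logistic-fit step concrete, here is a minimal sketch of how per-feature scores could be combined into a single preference score. It is not the code in mle-train/logistic_fits.sh: the function names, the synthetic three-feature data, and the plain gradient-descent fit are all assumptions made for illustration. The idea shown is a logistic model on the difference between the feature vectors of the preferred and the rejected response.

```python
import numpy as np

def fit_preference_weights(feat_chosen, feat_rejected, lr=0.1, steps=2000):
    """Fit weights w so that sigmoid(w . (f_chosen - f_rejected)) is close to 1.

    Hypothetical helper: plain gradient descent on the mean negative
    log-likelihood of a logistic model over feature differences.
    """
    diff = np.asarray(feat_chosen, dtype=float) - np.asarray(feat_rejected, dtype=float)
    w = np.zeros(diff.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(diff @ w)))    # P(chosen is preferred)
        grad = diff.T @ (p - 1.0) / len(diff)    # gradient of the mean NLL
        w -= lr * grad
    return w

def preference_score(features, w):
    """Combine feature values into one scalar score per response."""
    return np.asarray(features, dtype=float) @ w

# Synthetic stand-in for LM-extracted feature values (assumption, not real data).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
a = rng.normal(size=(200, 3))
b = rng.normal(size=(200, 3))
a_wins = (a @ true_w) > (b @ true_w)
chosen = np.where(a_wins[:, None], a, b)
rejected = np.where(a_wins[:, None], b, a)

w = fit_preference_weights(chosen, rejected)
# Fraction of training pairs where the chosen response gets the higher score.
acc = float(np.mean(preference_score(chosen, w) > preference_score(rejected, w)))
```

Because the combined score is linear in the features, each fitted weight can be read directly as that feature's contribution to the preference.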

Citing

@inproceedings{go2023compositional,
  title={Compositional Preference Models for Aligning LMs},
  author={Go, Dongyoung and Korbak, Tomasz and Kruszewski, Germ{\'a}n and Rozen, Jos and Dymetman, Marc},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024}
}