Run the following commands to install the dependencies.
# create an environment
conda create -n prd python=3.8
conda activate prd
# install the dependencies from the requirements file with pip
pip install -r requirement.txt
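As an optional sanity check, the hypothetical helper below (not part of the repository) confirms that an interpreter reports the Python 3.8 version pinned in the conda command above. It is a sketch that assumes the prd environment is active, so that python resolves to that environment's interpreter.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the repository): succeed only if the
# given interpreter (default: python) reports the expected major.minor version.
python_version_is() {
  local expected="$1"
  local py="${2:-python}"
  local actual
  # Ask the interpreter for its own major.minor version, e.g. "3.8".
  actual="$("$py" -c 'import sys; print("%d.%d" % sys.version_info[:2])')" || return 1
  [ "$actual" = "$expected" ]
}

# Usage, after "conda activate prd":
#   python_version_is 3.8 && echo "ok" || echo "unexpected Python version"
```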
We publish the Vicuna80 dataset in the data folder. For information about the datasets, please refer to the README file.
For information about the generated results, please refer to the README file.
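To take a first look at the data, the hypothetical helper below counts the records in one or more JSONL files. It assumes each answer file (for example, those under ../data/vicuna80/generations/ used later in this README) stores one JSON object per line; it is only a sketch and does not validate the JSON itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<count><TAB><file>" for each JSONL file given,
# where <count> is the number of non-empty lines (one record per line in JSONL).
count_records() {
  local f n
  for f in "$@"; do
    n="$(grep -c . "$f")"   # grep -c prints 0 (and exits 1) when nothing matches
    printf '%s\t%s\n' "$n" "$f"
  done
}

# Usage:
#   count_records ../data/vicuna80/generations/answer_*.jsonl
```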
Use the bash commands below to run the corresponding parts.
Enter the peer_rank folder with the following command:
cd peer_rank/
Run a gen_{reviewer}.sh script to generate reviews for the answers from one pair of models. For example:
./gen_claude.sh ../data/vicuna80/generations/answer_[Model 1].jsonl ../data/vicuna80/generations/answer_[Model 2].jsonl
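To review only a chosen subset of models rather than all pairs, a hypothetical helper like the one below can print one gen_claude.sh command per pair of answer files. It is a sketch that assumes the answer_<model>.jsonl naming shown above and file names without spaces; it emits each unordered pair once, in the order the files are given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: echo one reviewer command per unordered pair of answer
# files, so the commands can be inspected (or piped to bash) before running.
print_pair_commands() {
  local script="$1"   # reviewer script, e.g. ./gen_claude.sh
  shift
  local files=("$@")  # answer files to pair up
  local i j
  for ((i = 0; i < ${#files[@]}; i++)); do
    for ((j = i + 1; j < ${#files[@]}; j++)); do
      echo "$script ${files[i]} ${files[j]}"
    done
  done
}

# Usage: preview the commands for every pair of answer files, then run them.
#   print_pair_commands ./gen_claude.sh ../data/vicuna80/generations/answer_*.jsonl
#   print_pair_commands ./gen_claude.sh ../data/vicuna80/generations/answer_*.jsonl | bash
```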
To generate reviews for the answers from all pairs of models, run the gen_{reviewer}_all.sh script. For example:
./gen_claude_all.sh
To run peer ranking, open the peer_ranking.ipynb file in Jupyter Notebook (or any compatible notebook environment).
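To execute the notebook without opening the Jupyter interface (for instance, on a remote server), a sketch like the following calls jupyter nbconvert. It assumes Jupyter and nbconvert are installed in the prd environment, which this README does not state; the run_notebook wrapper is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: execute every cell of a notebook headlessly and write
# the outputs back into the same file. Requires jupyter + nbconvert (assumed).
run_notebook() {
  local nb="${1:-peer_ranking.ipynb}"
  if [ ! -f "$nb" ]; then
    echo "notebook not found: $nb" >&2
    return 1
  fi
  # --execute runs all cells; --inplace overwrites the input notebook.
  jupyter nbconvert --to notebook --execute --inplace "$nb"
}

# Usage, from inside peer_rank/:
#   run_notebook peer_ranking.ipynb
```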
From the repository root, enter the peer_discussion folder with the following command:
cd peer_discussion/
Before running any Python script, make sure the config.yml file contains the configuration you need.
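A quick pre-flight check can catch a missing or malformed config.yml before a long run. The sketch below is hypothetical: it assumes PyYAML is available in the prd environment, and it only checks that the file parses as YAML, not that its values are the ones the scripts expect.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail if the config file is missing or is not
# valid YAML. Assumes PyYAML (import yaml) is installed; it does not check keys.
check_config() {
  local cfg="${1:-config.yml}"
  if [ ! -f "$cfg" ]; then
    echo "missing config file: $cfg" >&2
    return 1
  fi
  python -c 'import sys, yaml
with open(sys.argv[1]) as fh:
    yaml.safe_load(fh)' "$cfg"
}

# Usage, from inside peer_discussion/:
#   check_config config.yml && python review_lfqa.py
```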
# generate reviews on LFQA
python review_lfqa.py
There is no code here for generating reviews on Vicuna80, since those reviews are produced by the Peer Rank code.
# discuss on LFQA
python gather_all_lfqa.py
python discuss_lfqa.py
# discuss on Vicuna80
python gather_all_vicuna80.py
python discuss_vicuna80.py
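The discussion steps above can be chained so that the pipeline stops at the first failing script. The run_steps helper below is a hypothetical sketch: it assumes it is run from inside peer_discussion/ with config.yml already set up, and the PYTHON variable (defaulting to python) is an illustrative override, not something the scripts themselves read.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run each script in order with $PYTHON (default: python),
# stopping and reporting the first script that exits with a non-zero status.
run_steps() {
  local py="${PYTHON:-python}"
  local step
  for step in "$@"; do
    echo ">> $py $step"
    if ! "$py" "$step"; then
      echo "step failed: $step" >&2
      return 1
    fi
  done
}

# Usage, from inside peer_discussion/:
#   run_steps review_lfqa.py gather_all_lfqa.py discuss_lfqa.py
#   run_steps gather_all_vicuna80.py discuss_vicuna80.py
```

Stopping at the first failure keeps a discuss_* step from running on incomplete gathered data.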
Please cite the following if you find our work helpful.
@misc{li2023prd,
title={PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations},
author={Ruosen Li and Teerth Patel and Xinya Du},
year={2023},
eprint={2307.02762},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
For any clarifications, comments, or suggestions, you can:
- Create an issue.
- Contact Ruosen Li, Teerth Roshan Patel, or Xinya Du by email.