This is the code and associated datasets for the paper titled
Our paper makes use of two external benchmark datasets:
- Opinion: OpinionQA, anonymous public opinion survey response data collected by Pew Research.
- Preference: MovieLens 1M, anonymous movie ratings from over 6,000 MovieLens users.
- Install conda
  ```bash
  brew install --cask miniconda
  ```
- Set up and activate a virtual environment for the project
  ```bash
  conda create -n "persona" python=3.11.11
  conda activate persona
  ```
- Install the requirements
  ```bash
  python -m ensurepip --upgrade
  pip install -r requirements.txt
  ```
- Export your OpenAI key and Hugging Face token (if using these resources)
  ```bash
  export OPENAI_API_KEY=<Your OpenAI Key>
  export HF_TOKEN=<Your Huggingface Token>
  ```
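Optionally, a quick sanity check can confirm that the environment is ready. These are generic commands, not scripts from this repository:

```bash
# Optional sanity check (not part of the setup steps above).
conda activate persona
python --version   # should report Python 3.11.x
pip check          # verifies that installed packages have compatible dependencies
```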
This section describes the overall organization of the repository and how to use the basic scripts.
```
this repository
│   README.md
│   ..other setup files
│
└───src
│   │   personalized_movielens.py --> runner for MovieLens
│   │   personalized_opinionqa.py --> runner for OpinionQA
│   │   persona.py --> basic persona template setup
│   │   utils.py --> prompts
```
The basic arguments required by any runner script (for example, `personalized_opinionqa.py`) are as follows; an example invocation is shown after the list.
- `in_path`: The location of the dataset
- `out_dir`: The directory where the results will be dumped
- `cache_path`: Name of the exact file where the query cache will be maintained
- `num_implicit`: Number of judgments to add to the persona
- `max_users`: How many users to run inference for. For OpinionQA, this is the number of users per topic
- `max_ques`: Maximum number of questions per user
- `max_topics`: Only applicable to OpinionQA. Use -1 for all 15 topics
- `option`: Which type of prompt to use, for both rationale and prediction generation
    - -1: No Persona (set `num_implicit` to 0 in addition to this option)
    - 0: Only Judgment
    - 1: Only Demographics (set `num_implicit` to 0 in addition to this option)
    - 2: Demographics + Judgment
    - 6: Experiences Scaffold
    - 7: No Scaffold
    - 8: Primal Beliefs Scaffold
    - 9: Big 5 Personality Traits Scaffold
    - 10: Schwartz Theory Scaffold
- `engine`: Which model to use. Set `"gpt-4"` or the Hugging Face name of the model
- `answer_pov`: Which POV to use for the answer prompt. Options are `first` and `third`.
- `explanation_pov`: Which POV to use for the explanation generation prompt. Options are `first` and `third`.
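For illustration, a hypothetical invocation of the OpinionQA runner is sketched below. The exact flag syntax and the data paths are assumptions (they depend on the script's argument parser and your local layout), so treat this as a sketch rather than a verbatim command:

```bash
# Hypothetical invocation; flag names mirror the argument list above and assume
# standard argparse-style "--<name>" options. All paths are placeholders.
python personalized_opinionqa.py \
  --in_path ../data/opinionqa \
  --out_dir ../data/model-output/opinionqa \
  --cache_path ../data/cache/opinionqa/test.jsonl \
  --num_implicit 8 \
  --max_users 5 \
  --max_ques 10 \
  --max_topics 1 \
  --option 8 \
  --engine gpt-4 \
  --answer_pov first \
  --explanation_pov third
```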
We demonstrate a test run to check if your setup works. Details on how to replicate the full experimental setup are provided afterwards.
- Enter the folder where the run scripts are present
  ```bash
  cd src
  ```
- Use the `test` files for running tests. Make changes in the script according to the test run you want to do.
  ```bash
  bash run_test.sh
  ```
- Viewing Results
    - The cache for the queries will be saved in the `cache` location specified in the `run_test.sh` files.
    - The results will be stored in `data/model-output/<name of dataset>` and will be timestamped for the run.
    - For more details about these, check the Viewing Results section.
All setup remains the same as above, except run `bash run_baseline.sh` for the Demographics + Judgment setting and `bash run_method.sh` for PB&J with the Primal Beliefs Scaffold. Parameters in these scripts can be changed to run other baselines; a hypothetical sketch of such an edit follows.
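For illustration only, the kind of parameter edit involved might look like the snippet below. The variable names and values are assumptions and do not reflect the actual contents of `run_baseline.sh` or `run_method.sh`:

```bash
# Hypothetical excerpt illustrating how a run script could switch settings.
# option 2 = Demographics + Judgment baseline; option 8 = Primal Beliefs Scaffold.
OPTION=8
NUM_IMPLICIT=8     # number of judgments included in the persona
ENGINE="gpt-4"     # or a Hugging Face model name

python personalized_opinionqa.py \
  --option "$OPTION" \
  --num_implicit "$NUM_IMPLICIT" \
  --engine "$ENGINE"
```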
After you run the jobs locally or on bolt (after pulling from conductor), two changes will be made in the `data` folder:

- Changes in the cache file, usually present in the `data/cache/<dataset name>` folder. The cache makes sure that queries are not duplicated when sent to an LLM (particularly helpful for API-based models).
- A new results folder will be timestamped and dumped in the `data/model-output/<dataset name>` folder. The folder name will contain details of the run (some param configs) and two files:
    - `model_accuracy.json` will contain the overall accuracy metrics
    - `model_generation.json` will contain instance-wise predictions of the model, stored for post-processing
We provide a sample of what a run result looks like for the test run described above. Model output for the test run is provided in `data/model-output/opinionqa/test_run`, and the corresponding prompt cache containing the generated rationales is in `data/cache/opinionqa/test.jsonl`.
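To get a feel for the output format, the sample files can be inspected with standard tooling. Only the two paths above are taken from the repository; the folder layout inside `test_run` is an assumption, so the snippet locates the file rather than hard-coding its position:

```bash
# Pretty-print the accuracy metrics from the sample run; the exact nesting of
# the timestamped folder inside test_run may differ, so locate the file first.
find data/model-output/opinionqa/test_run -name model_accuracy.json \
  -exec python -m json.tool {} \;

# Peek at the first cached query/rationale record in the prompt cache.
head -n 1 data/cache/opinionqa/test.jsonl | python -m json.tool
```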
This code is primarily intended to run PB&J-style persona prompts, along with baselines. It is built on a wrapper based on this repository. Any issues with model calling can be reported to this repository.
We thank the authors of the paper Aligning Language Models to User Opinions and their codebase, which provided the starting point for all scripts in this repository.