Improving LLM Personas via Rationalization with Psychological Scaffolds

This repository contains the code and associated datasets for the paper:

Improving LLM Personas via Rationalization with Psychological Scaffolds. Brihi Joshi, Xiang Ren, Swabha Swayamdipta, Rik Koncel-Kedziorski, Tim Paek

Table of Contents

  • Datasets for Reproducing Results
  • Setup
  • Scripts
  • Running Jobs
  • Viewing Results
  • Housekeeping
  • Acknowledgement

Datasets for Reproducing Results

Our paper makes use of two external benchmark datasets:

  • Opinion: OpinionQA, anonymous public opinion survey response data collected by Pew Research.
  • Preference: MovieLens 1M, anonymous ratings of approximately 3,900 movie titles from over 6,000 MovieLens users.

Setup

  • Install conda
brew install conda
  • Set up and activate a virtual environment for the project
conda create -n "persona" python=3.11.11
conda activate persona
  • Install the requirements
python -m ensurepip --upgrade
pip install -r requirements.txt
  • Export your OpenAI API key and Hugging Face token (if using these resources)
export OPENAI_API_KEY=<Your OpenAI Key>
export HF_TOKEN=<Your Huggingface Token>
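
Optionally, you can sanity-check the environment before running anything. This is a minimal sketch that assumes a POSIX shell; none of these commands are part of the repository's own scripts.

```bash
# Confirm the conda environment exists and installed packages are consistent.
conda info --envs | grep persona
pip check

# Confirm the keys were exported (each line prints nothing if the variable is unset).
echo "${OPENAI_API_KEY:+OPENAI_API_KEY is set}"
echo "${HF_TOKEN:+HF_TOKEN is set}"
```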

Scripts

This section describes the overall organization of the folder and usage of basic scripts needed.

this repository
│   README.md
│   ..other setup files   
│
└───src
│   │   personalized_movielens.py --> runner for MovieLens
│   │   personalized_opinionqa.py --> runner for OpinionQA
│   │   persona.py --> basic persona template setup
│   │   utils.py --> prompts
│   │

The basic arguments required by any runner script (for example, personalized_opinionqa.py) are as follows; an example invocation is shown after the list:

  • in_path: The location of the dataset
  • out_dir: The directory where the results will be dumped
  • cache_path: Path to the file where the query cache will be maintained
  • num_implicit: Number of judgments to include in the persona
  • max_users: How many users to run inference for. For OpinionQA, this is the number of users per topic
  • max_ques: Maximum number of questions per user
  • max_topics: Only applicable to OpinionQA. Use -1 for all topics (15)
  • option: Which type of prompt to use, for both rationale and prediction generation
    • -1: No Persona (set num_implicit to 0 in addition to this option)
    • 0: Only Judgment
    • 1: Only Demographics (set num_implicit to 0 in addition to this option)
    • 2: Demographics + Judgment
    • 6: Experiences Scaffold
    • 7: No Scaffold
    • 8: Primal Beliefs Scaffold
    • 9: Big 5 Personality Traits Scaffold
    • 10: Schwartz Theory Scaffold
  • engine: Which model to use. Set to "gpt-4" or the Hugging Face name of the model
  • answer_pov: Which point of view (POV) to use for the answer prompt. Options are first and third.
  • explanation_pov: Which POV to use for the explanation-generation prompt. Options are first and third.
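
For illustration, here is a hypothetical invocation of the OpinionQA runner. The argparse-style `--flag value` syntax, the paths, and the concrete values are assumptions, not the repository's documented interface; check run_test.sh for the exact usage.

```bash
# Hypothetical example -- actual flag names, paths, and defaults may differ.
python personalized_opinionqa.py \
    --in_path ../data/opinionqa \
    --out_dir ../data/model-output/opinionqa \
    --cache_path ../data/cache/opinionqa/test.jsonl \
    --num_implicit 8 \
    --max_users 5 \
    --max_ques 10 \
    --max_topics 1 \
    --option 8 \
    --engine gpt-4 \
    --answer_pov first \
    --explanation_pov first
```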

Running Jobs

We demonstrate a test run to check that your setup works. Details on how to replicate the full experimental setup are provided below.

Test Run

  • Enter the folder where the run scripts are present
cd src
  • Use the test files for running tests.
    • Edit the script according to the test run you want to perform.
bash run_test.sh
  • Viewing Results
    • The cache for the queries will be saved in the cache location specified in the run_test.sh files.
    • The results will be stored in data/model-output/<name of dataset> and will be timestamped for the run.
    • For more details about these, check the Viewing Results section.

Full Runs

All setup remains the same as above, except: run bash run_baseline.sh for the Demographics + Judgment setting and bash run_method.sh for PB&J with the Primal Beliefs Scaffold. Parameters in these scripts can be changed to run other baselines.
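
For orientation, a hypothetical sketch of how the two full-run scripts map onto the option codes listed in the Scripts section; the actual contents of run_baseline.sh and run_method.sh may differ.

```bash
# Hypothetical mapping -- check the scripts themselves for the exact parameters.
bash run_baseline.sh   # Demographics + Judgment        (option 2)
bash run_method.sh     # PB&J, Primal Beliefs Scaffold  (option 8)

# To run another scaffold (e.g. Big 5 Personality Traits), edit the option
# parameter inside the script to the corresponding code (9), per the table above.
```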

Viewing Results

After you run the jobs (locally or on your cluster), two changes will be made in the data folder:

  • Changes to the cache file, usually present in the data/cache/<dataset name> folder. The cache ensures that duplicate queries are not re-sent to an LLM (particularly helpful for API-based models).

  • A new, timestamped results folder will be created in the data/model-output/<dataset name> folder. The folder name contains details of the run (some parameter configurations) and two files:

    • model_accuracy.json contains the overall accuracy metrics.
    • model_generation.json contains instance-wise predictions of the model, stored for post-processing.

Test Run Results

We provide a sample of what a run result looks like for the test run described above. Model output for the test run is provided in data/model-output/opinionqa/test_run, and the corresponding prompt cache containing the generated rationales is in data/cache/opinionqa/test.jsonl.
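
A minimal sketch for inspecting these outputs with standard command-line tools; the paths are the sample test-run outputs above, the exact keys inside the JSON files depend on the run configuration, and the cache is assumed to be JSON Lines based on its .jsonl extension.

```bash
# Pretty-print the overall accuracy metrics (key names depend on the run).
python -m json.tool data/model-output/opinionqa/test_run/model_accuracy.json

# Peek at the instance-wise predictions kept for post-processing.
python -m json.tool data/model-output/opinionqa/test_run/model_generation.json | head -n 40

# Count cached queries (one JSON record per line).
wc -l data/cache/opinionqa/test.jsonl
```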

Housekeeping

Separate wrapper for prompting

This code is primarily intended to run PB&J-style persona prompts, along with baselines. It is built on a wrapper based on this repository. Any issues with model calling can be reported to that repository.

Acknowledgement

We thank the authors of the paper Aligning Language Models to User Opinions and their codebase, which provided the starting point for all scripts in this codebase.
