Improving LLM Personas via Rationalization with Psychological Scaffolds

This repository contains the code and associated datasets for the paper:

Improving LLM Personas via Rationalization with Psychological Scaffolds. Brihi Joshi, Xiang Ren, Swabha Swayamdipta, Rik Koncel-Kedziorski, Tim Paek

Table of Contents

  • Datasets for Reproducing Results
  • Setup
  • Scripts
  • Running Jobs
  • Viewing Results
  • Housekeeping
  • Acknowledgement

Datasets for Reproducing Results

Our paper makes use of two external benchmark datasets:

  • Opinion: OpinionQA, anonymous public opinion survey response data collected by Pew Research.
  • Preference: MovieLens 1M, anonymous ratings of approximately 3,900 movie titles from over 6,000 MovieLens users.

Setup

  • Install conda
brew install conda
  • Set up and activate a virtual environment for the project
conda create -n "persona" python=3.11.11
conda activate persona
  • Install the requirements
python -m ensurepip --upgrade
pip install -r requirements.txt
  • Export your OpenAI API key and Hugging Face token (if using these resources)
export OPENAI_API_KEY=<Your OpenAI Key>
export HF_TOKEN=<Your Huggingface Token>
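
Optionally, you can sanity-check the environment before running anything. This is a minimal sketch that assumes a POSIX shell; none of these commands are part of the repository's own scripts.

```bash
# Confirm the conda environment exists and installed packages are consistent.
conda info --envs | grep persona
pip check

# Confirm the keys were exported (each line prints nothing if the variable is unset).
echo "${OPENAI_API_KEY:+OPENAI_API_KEY is set}"
echo "${HF_TOKEN:+HF_TOKEN is set}"
```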

Scripts

This section describes the overall organization of the folder and usage of basic scripts needed.

this repository
│   README.md
│   ..other setup files   
│
└───src
│   │   personalized_movielens.py --> runner for MovieLens
│   │   personalized_opinionqa.py --> runner for OpinionQA
│   │   persona.py --> basic persona template setup
│   │   utils.py --> prompts
│   │

The basic arguments required by any runner script (for example, personalized_opinionqa.py) are as follows; an example invocation is shown after the list:

  • in_path: The location of the dataset
  • out_dir: The directory where the results will be dumped
  • cache_path: Path to the file where the query cache will be maintained
  • num_implicit: Number of judgments to include in the persona
  • max_users: How many users to run inference for. For OpinionQA, this is the number of users per topic
  • max_ques: Maximum number of questions per user
  • max_topics: Only applicable to OpinionQA. Use -1 for all topics (15)
  • option: Which type of prompt to use, for both rationale and prediction generation
    • -1: No Persona (set num_implicit to 0 in addition to this option)
    • 0: Only Judgment
    • 1: Only Demographics (set num_implicit to 0 in addition to this option)
    • 2: Demographics + Judgment
    • 6: Experiences Scaffold
    • 7: No Scaffold
    • 8: Primal Beliefs Scaffold
    • 9: Big 5 Personality Traits Scaffold
    • 10: Schwartz Theory Scaffold
  • engine: Which model to use. Set to "gpt-4" or the Hugging Face name of the model
  • answer_pov: Which point of view (POV) to use for the answer prompt. Options are first and third.
  • explanation_pov: Which POV to use for the explanation-generation prompt. Options are first and third.
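
For illustration, here is a hypothetical invocation of the OpinionQA runner. The argparse-style `--flag value` syntax, the paths, and the concrete values are assumptions, not the repository's documented interface; check run_test.sh for the exact usage.

```bash
# Hypothetical example -- actual flag names, paths, and defaults may differ.
python personalized_opinionqa.py \
    --in_path ../data/opinionqa \
    --out_dir ../data/model-output/opinionqa \
    --cache_path ../data/cache/opinionqa/test.jsonl \
    --num_implicit 8 \
    --max_users 5 \
    --max_ques 10 \
    --max_topics 1 \
    --option 8 \
    --engine gpt-4 \
    --answer_pov first \
    --explanation_pov first
```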

Running Jobs

We demonstrate a test run to check that your setup works. Details on how to replicate the full experimental setup are provided below.

Test Run

  • Enter the folder where the run scripts are present
cd src
  • Use the test files for running tests.
    • Edit the script according to the test run you want to perform.
bash run_test.sh
  • Viewing Results
    • The cache for the queries will be saved in the cache location specified in the run_test.sh files.
    • The results will be stored in data/model-output/<name of dataset> and will be timestamped for the run.
    • For more details about these, check the Viewing Results section.

Full Runs

All setup remains the same as above, except: run bash run_baseline.sh for the Demographics + Judgment setting and bash run_method.sh for PB&J with the Primal Beliefs Scaffold. Parameters in these scripts can be changed to run other baselines.
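
For orientation, a hypothetical sketch of how the two full-run scripts map onto the option codes listed in the Scripts section; the actual contents of run_baseline.sh and run_method.sh may differ.

```bash
# Hypothetical mapping -- check the scripts themselves for the exact parameters.
bash run_baseline.sh   # Demographics + Judgment        (option 2)
bash run_method.sh     # PB&J, Primal Beliefs Scaffold  (option 8)

# To run another scaffold (e.g. Big 5 Personality Traits), edit the option
# parameter inside the script to the corresponding code (9), per the table above.
```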

Viewing Results

After you run the jobs (locally or on your cluster), two changes will be made in the data folder:

  • Changes to the cache file, usually present in the data/cache/<dataset name> folder. The cache ensures that duplicate queries are not re-sent to an LLM (particularly helpful for API-based models).

  • A new, timestamped results folder will be created in the data/model-output/<dataset name> folder. The folder name contains details of the run (some parameter configurations) and two files:

    • model_accuracy.json contains the overall accuracy metrics.
    • model_generation.json contains instance-wise predictions of the model, stored for post-processing.

Test Run Results

We provide a sample of what a run result looks like for the test run described above. Model output for the test run is provided in data/model-output/opinionqa/test_run, and the corresponding prompt cache containing the generated rationales is in data/cache/opinionqa/test.jsonl.
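
A minimal sketch for inspecting these outputs with standard command-line tools; the paths are the sample test-run outputs above, the exact keys inside the JSON files depend on the run configuration, and the cache is assumed to be JSON Lines based on its .jsonl extension.

```bash
# Pretty-print the overall accuracy metrics (key names depend on the run).
python -m json.tool data/model-output/opinionqa/test_run/model_accuracy.json

# Peek at the instance-wise predictions kept for post-processing.
python -m json.tool data/model-output/opinionqa/test_run/model_generation.json | head -n 40

# Count cached queries (one JSON record per line).
wc -l data/cache/opinionqa/test.jsonl
```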

Housekeeping

Separate wrapper for prompting

This code is primarily intended to run PB&J-style persona prompts, along with baselines. It is built on a wrapper based on this repository. Any issues with model calling can be reported to that repository.

Acknowledgement

We thank the authors of the paper Aligning Language Models to User Opinions and their codebase, which provided the starting point for all scripts in this codebase.
