# Observing Variance on Questionnaire Answers w.r.t. Different Prompts (e.g., Persona Descriptions)

In this notebook we will develop a system to:
1. Run an LLM with prompt $p$ on a questionnaire $Q = \{(q_i, [a_{i1}, a_{i2}, \dots])\}_{i\in [N]}$, recording the probability distribution over answers. 
     - **Mathematically**: Given an LLM $P_{LM}$ and prompt $p$, we compute trajectories $\mathcal T = \{P_{LM}(a_{ij} \mid p + q_i)\}$ for all $q_i, a_{ij}$ in the questionnaire $Q$.
2. Visualize the "landscape" of prompts $p$ in terms of their questionnaire probability distributions/trajectories $\mathcal T$. 
     - The shape of the trajectories $\mathcal T$ on questionnaire $Q$ is the same regardless of the prompt/model. 
     - Therefore we can apply standard high-dimensional data analysis techniques
     to understand the distribution of trajectories $\mathcal T$ w.r.t. various
     system prompts $p$:
         - PCA/tSNE/UMAP
         - Training classifiers to discriminate prompts based on some
         human-defined metric (sentiment, personality, psychiatric conditions,
         etc.). 
         - Topological data analysis techniques (?)
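
As a minimal sketch of step 1, the per-answer losses for one question can be turned into a normalized distribution over its answer options with a softmax over the negative losses. This assumes each loss is the total negative log-likelihood of the answer string; if the endpoint returns a per-token mean instead, each value would first need to be multiplied by the answer's token count. The function name `answer_distribution` is hypothetical and the loss values are illustrative.

```python
import math

def answer_distribution(losses):
    """Softmax over negative losses: P(a_j) is proportional to exp(-loss_j)."""
    # Subtract the minimum loss before exponentiating for numerical stability.
    min_loss = min(losses)
    weights = [math.exp(-(loss - min_loss)) for loss in losses]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative losses for three answer options to a single question.
losses = [5.73, 6.57, 7.41]
probs = answer_distribution(losses)
print([round(p, 3) for p in probs])
```

Normalizing per question makes the answer options comparable across prompts, at the cost of discarding how likely the answers were in absolute terms.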


## Synthetic Dataset

We will start with a simple synthetic dataset of question-answer records of the form:

```json
// datasets/light_heavy_qa.jsonl
{"question": "What is the weight of your head?", "answers": ["My head feels very light.", "My head feels light.", "My head feels neutral.", "My head feels heavy.", "My head feels very heavy."]}
{"question": "What is the weight of your neck?", "answers": ["My neck feels very light.", "My neck feels light.", "My neck feels neutral.", "My neck feels heavy.", "My neck feels very heavy."]}
...
```

We will then generate a set of system prompts, e.g., 
```json
// datasets/light_heavy_prompts.jsonl
{"prompt": "You are currently feeling contempt.", "id": "0", "tag": "contempt"}
{"prompt": "You are currently feeling sadness.", "id": "1", "tag": "sadness"}
...
```
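
Records like these could be generated from templates. The sketch below reproduces the two question records and two prompt records exactly as shown above; the helper names `make_question` and `make_prompt` are hypothetical, and the lists are limited to the body parts and emotions that appear in the examples.

```python
import json

ANSWER_LEVELS = ["very light", "light", "neutral", "heavy", "very heavy"]

def make_question(body_part):
    """Build one question record with five graded answers."""
    return {
        "question": f"What is the weight of your {body_part}?",
        "answers": [f"My {body_part} feels {level}." for level in ANSWER_LEVELS],
    }

def make_prompt(index, emotion):
    """Build one system-prompt record; the tag is the emotion name."""
    return {
        "prompt": f"You are currently feeling {emotion}.",
        "id": str(index),
        "tag": emotion,
    }

body_parts = ["head", "neck"]        # only the parts shown in the examples
emotions = ["contempt", "sadness"]   # only the emotions shown in the examples

# One JSON object per line, i.e. JSONL.
question_lines = [json.dumps(make_question(part)) for part in body_parts]
prompt_lines = [json.dumps(make_prompt(i, e)) for i, e in enumerate(emotions)]

print("\n".join(question_lines))
print("\n".join(prompt_lines))
```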

Ensure that the prompts describe the agent in the second person ("You are..."), 
the questions address the model in the second person ("How are you...") and the 
answers are short, simple answers in the first person ("I am...").

Include a question mark at the end of the question, and a period at the end of
each answer and each prompt. Ensure there is a trailing space at the end of each
prompt and question so that the concatenated `prompt + question + answer` is a
properly formatted string.

The question-answer dataset and the prompt dataset should also be in JSONL format. 
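
These formatting rules can be checked mechanically before any model calls are made. The following is a rough sketch; `format_problems` is a hypothetical helper, not part of the notebook. Note that the example records above, taken literally, have no trailing space after the prompt or question, so this check would flag them.

```python
def format_problems(text, kind):
    """Return a list of formatting problems for a prompt, question, or answer."""
    problems = []
    # Questions end with '?', prompts and answers with '.'.
    ending = "?" if kind == "question" else "."
    if not text.rstrip().endswith(ending):
        problems.append(f"{kind} should end with '{ending}'")
    # Prompts and questions need a trailing space so that
    # prompt + question + answer concatenates cleanly.
    if kind in ("prompt", "question") and not text.endswith(" "):
        problems.append(f"{kind} is missing a trailing space")
    return problems

print(format_problems("You are currently feeling contempt.", "prompt"))
print(format_problems("You are currently feeling contempt. ", "prompt"))
print(format_problems("My head feels light.", "answer"))
```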


## Running LLM on Questionnaire

Conway is currently forwarding to a server running Llama-3 8B with [minference](https://github.com/amanb2000/minference); the API URL would be `http://control.languagegame.io/colossus/ce_loss`. 

For each prompt, we want to compute the CE loss, $-\log P(\text{answer}_i \mid \text{prompt} + \text{question})$, for each answer to each question. 
We store the results in a dictionary, which we will eventually save to disk. 


## Analyzing Results

The result of the CE loss calls should be $-\log P(a_{ki} \mid p_j + q_k)$ for all 
$i, j, k$, where $a_{ki}$ is the $i$-th answer to the $k$-th question, $p_j$ is
the $j$-th prompt, and $q_k$ is the $k$-th question. 

We will concatenate these loss values into one vector per prompt $p_j$. We can 
then do PCA/tSNE and look at the landscape/distances between different prompts. 
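
The flattening and projection steps might look like the sketch below. It assumes the nested layout `results[prompt_id][question_id][answer]`, that each stored entry is a dict with a `"loss"` key (as in the sample API output), and that no call failed. The names `results_to_matrix` and `pca_project` are hypothetical, the toy losses are made up, and PCA is done with a plain NumPy SVD rather than any particular library routine.

```python
import numpy as np

def results_to_matrix(results):
    """Flatten nested results into one loss vector (row) per prompt."""
    prompt_ids = list(results)
    # Take the (question, answer) column order from the first prompt so
    # every row uses the same ordering.
    first = results[prompt_ids[0]]
    columns = [(qid, ans) for qid in first for ans in first[qid]]
    matrix = np.array(
        [[results[pid][qid][ans]["loss"] for qid, ans in columns]
         for pid in prompt_ids],
        dtype=float,
    )
    return prompt_ids, matrix

def pca_project(matrix, n_components=2):
    """Project rows onto the top principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Made-up losses: 3 prompts x 2 questions x 2 answers.
toy_results = {
    "0": {"0": {"light": {"loss": 5.7}, "heavy": {"loss": 7.4}},
          "1": {"light": {"loss": 6.1}, "heavy": {"loss": 6.9}}},
    "1": {"0": {"light": {"loss": 7.0}, "heavy": {"loss": 5.2}},
          "1": {"light": {"loss": 7.3}, "heavy": {"loss": 5.5}}},
    "2": {"0": {"light": {"loss": 6.2}, "heavy": {"loss": 6.3}},
          "1": {"light": {"loss": 6.4}, "heavy": {"loss": 6.0}}},
}

prompt_ids, matrix = results_to_matrix(toy_results)
coords = pca_project(matrix)
print(matrix.shape, coords.shape)
```

Each row of `coords` is then a 2-D point for one prompt, ready to scatter-plot and label by tag.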

We can also try to train a classifier that discriminates between prompts tagged 
`high valence` and `low valence`. 
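
As a rough illustration of such a classifier, the sketch below uses a nearest-centroid rule, chosen here only for brevity and not a method this notebook specifies. The helper names `fit_centroids` and `predict` are hypothetical, and the two-dimensional loss vectors are made up; real vectors would have one entry per (question, answer) pair.

```python
import numpy as np

def fit_centroids(vectors, tags):
    """Compute the mean loss vector for each tag."""
    tags = np.array(tags)
    return {tag: vectors[tags == tag].mean(axis=0)
            for tag in sorted(set(tags.tolist()))}

def predict(centroids, vector):
    """Return the tag whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda tag: np.linalg.norm(vector - centroids[tag]))

# Made-up loss vectors, one row per prompt.
vectors = np.array([[5.0, 7.0], [5.2, 6.8], [7.1, 5.0], [6.9, 5.3]])
tags = ["high valence", "high valence", "low valence", "low valence"]

centroids = fit_centroids(vectors, tags)
print(predict(centroids, np.array([5.1, 6.9])))
print(predict(centroids, np.array([7.0, 5.0])))
```

With only a handful of prompts per tag, held-out evaluation (for example leave-one-out) would be needed before reading anything into the accuracy.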

We hope to compare the body maps produced by the LLM to those produced by humans. 

In [12]:
import requests
import jsonlines
import json
import os

from tqdm import tqdm



# API endpoint and file paths
API_URL = "http://localhost:4445/ce_loss"

CACHE_FILE = '../cache/light_heavy_ce_losses_pqa.json'
QUESTION_DATASET = '../datasets/light_heavy_qa.jsonl'
PROMPT_DATASET = '../datasets/light_heavy_prompts.jsonl'


# Load the datasets
with jsonlines.open(QUESTION_DATASET) as f:
    questions = list(f)

with jsonlines.open(PROMPT_DATASET) as f:
    prompts = list(f)



In [13]:
rewrite = False

# Add an 'id' to each question if it's not there
for i, question in enumerate(questions):
    if 'id' not in question:
        rewrite = True
        question['id'] = i

# Add an 'id' to each prompt if it's not there
for i, prompt in enumerate(prompts):
    if 'id' not in prompt:
        rewrite = True
        prompt['id'] = i

# Write the updated question and prompt datasets back to disk
if rewrite: 
    print(f"REWRITING DATASETS: {QUESTION_DATASET} and {PROMPT_DATASET}")
    with jsonlines.open(QUESTION_DATASET, 'w') as f:
        for question in questions:
            f.write(question)

    with jsonlines.open(PROMPT_DATASET, 'w') as f:
        for prompt in prompts:
            f.write(prompt)

In [14]:
# Define the CE loss function
def loss_call(prompt_question_string, answer_string, api_url):
    """POST a (context, corpus) pair to the CE loss endpoint.

    Returns the parsed JSON response on success, or None on an HTTP error.
    """
    data = {
        "context_string": prompt_question_string,
        "corpus_string": answer_string
    }
    response = requests.post(api_url, json=data)

    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error {response.status_code}: {response.text}")
        return None

# Smoke test: score each answer to the first question under the first prompt
prompt = prompts[0]['prompt']
question = questions[0]['question']
for answer in questions[0]['answers']:

    prompt_question_string = prompt + question
    print(f"\nPrompt + Question: {prompt_question_string}")
    print(f"Answer: {answer}")

    loss = loss_call(prompt_question_string, answer, API_URL)
    print(f"CE Loss: {loss}")


Prompt + Question: You are currently feeling contempt.What is the weight of your head?
Answer: My head feels very light.
CE Loss: {'initial_request': {'context_string': 'You are currently feeling contempt.What is the weight of your head?', 'corpus_string': 'My head feels very light.'}, 'loss': 5.7340168952941895}

Prompt + Question: You are currently feeling contempt.What is the weight of your head?
Answer: My head feels light.
CE Loss: {'initial_request': {'context_string': 'You are currently feeling contempt.What is the weight of your head?', 'corpus_string': 'My head feels light.'}, 'loss': 6.565013408660889}

Prompt + Question: You are currently feeling contempt.What is the weight of your head?
Answer: My head feels neutral.
CE Loss: {'initial_request': {'context_string': 'You are currently feeling contempt.What is the weight of your head?', 'corpus_string': 'My head feels neutral.'}, 'loss': 7.406974792480469}

Prompt + Question: You are currently feeling contempt.What is the wei

In [17]:
if not os.path.exists(CACHE_FILE):
    print(f"Cache file {CACHE_FILE} not found. Running CE loss calculations...")
    # Initialize a dictionary to store the results
    results = {}

    # Iterate over prompts
    for prompt in prompts:
        # JSON object keys are strings, so use string IDs to match cached results
        prompt_id = str(prompt['id'])
        prompt_text = prompt['prompt']
        
        # Initialize a dictionary for the current prompt
        results[prompt_id] = {}
        
        # Iterate over questions
        for question in tqdm(questions):
            # JSON object keys are strings, so use string IDs to match cached results
            question_id = str(question['id'])
            question_text = question['question']
            
            # Initialize a dictionary for the current question
            results[prompt_id][question_id] = {}
            
            # Iterate over answers
            for answer in question['answers']:
                # Concatenate prompt and question
                prompt_question_string = prompt_text + question_text
                
                # Call the API; the response dict holds the CE loss under 'loss'
                loss = loss_call(prompt_question_string, answer, API_URL)

                # Store the full response (or None on error) in the results dictionary
                results[prompt_id][question_id][answer] = loss

    # Print the results
    print(results)
else: 
    print(f"Cache file {CACHE_FILE} found. Loading results...")
    with open(CACHE_FILE, 'r') as f:
        results = json.load(f)

Cache file ../cache/light_heavy_ce_losses_pqa.json found. Loading results...


In [18]:
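# Save to disk (only if no cache file exists yet)
if not os.path.exists(CACHE_FILE):
    # Create the cache directory if it is missing
    os.makedirs(os.path.dirname(CACHE_FILE), exist_ok=True)
    with open(CACHE_FILE, 'w') as f:
        json.dump(results, f)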