# Transaction Utility Theory 2

This notebook aims to recreate some of the empirical findings of Thaler, R. (1985). Mental accounting and consumer choice. Marketing Science, 4(3), 199-214. 
Specifically, we are interested in whether LLMs' responses are similar to the original responses in **section 3** of the paper, regarding the Beer question.

However, we extend this experiment with the notion of income sensitivity. The income values are inspired by Brand, James and Israeli, Ayelet and Ngwe, Donald, Using GPT for Market Research (March 21, 2023). Harvard Business School Marketing Unit Working Paper No. 23-062, Available at SSRN: https://ssrn.com/abstract=4395751 or http://dx.doi.org/10.2139/ssrn.4395751. They study the willingness to pay for various goods, prompting the LLM with a stated annual income of $50k, $70k, and $120k.  

---

Running this notebook requires an OpenAI API key and a Replicate API token.  
Both must be set as environment variables named OPENAI_API_KEY and REPLICATE_API_TOKEN, respectively.
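
Before any requests are sent, it can help to confirm that both variables are actually set. The following is a minimal sketch, not part of the original notebook; the helper name `missing_env_vars` is hypothetical.

```python
import os

# Variable names taken from the text above
REQUIRED_VARS = ["OPENAI_API_KEY", "REPLICATE_API_TOKEN"]

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Checking up front avoids a failure in the middle of a long run of paid API requests.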

This notebook was adjusted over the course of the project, so the original outputs are no longer included.   
For test purposes, the notebook was re-run with a very low number of iterations per prompt, and every export of prompts, dictionaries or results was commented out.

---

In [1]:
from openai import OpenAI
import openai
import matplotlib.pyplot as plt
import os 
import numpy as np
import pandas as pd
from tqdm import tqdm
import replicate
from ast import literal_eval
import plotly.graph_objects as go
import pickle
from collections import Counter

In [2]:
# Get OpenAI API key (previously saved as environment variable)
openai.api_key = os.environ["OPENAI_API_KEY"]

# Set client
client = OpenAI()

# Set global plot style
plt.style.use('seaborn-v0_8')

# Set plots to be displayed in notebook
%matplotlib inline

---

#### Setting up prompts for the experiment

- LLMs used in the experiment:
    - GPT-3.5-Turbo         (ID = 1)
    - GPT-4-1106-Preview    (ID = 2)
    - LLama-2-70b           (ID = 3)

We can differentiate between the following scenario combinations:

- Place of purchase:
    - A fancy resort hotel              (ID = 1)
    - A small, run-down grocery store   (ID = 2)

- Level of annual income:
    - No information about income       (ID = 1)
    - $50,000                           (ID = 2)
    - $70,000                           (ID = 3)
    - $120,000                          (ID = 4)

Similar to the Prospect Theory and Decoy Effect notebooks, we will use experiment IDs to run the study. The IDs will be constructed as:

*TU2_model_placeofpurchase_income*

Therefore, TU2_3_2_1 would mean we used LLama-2-70b, the beer is to be purchased at a small, run-down grocery store, and no information about the annual income is given in the prompt.
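
The ID scheme can be decoded mechanically from the three ID lists above. The following is a minimal sketch; `decode_experiment_id` and the three lookup tables are hypothetical names introduced only for illustration.

```python
# Lookup tables mirroring the ID lists above
MODELS = {1: "GPT-3.5-Turbo", 2: "GPT-4-1106-Preview", 3: "LLama-2-70b"}
PLACES = {1: "fancy resort hotel", 2: "small, run-down grocery store"}
INCOMES = {1: "no income information", 2: "$50k", 3: "$70k", 4: "$120k"}

def decode_experiment_id(experiment_id):
    """Split an ID such as 'TU2_3_2_1' into its model, place and income labels."""
    prefix, model_id, place_id, income_id = experiment_id.split("_")
    if prefix != "TU2":
        raise ValueError(f"Unexpected prefix in {experiment_id!r}")
    return MODELS[int(model_id)], PLACES[int(place_id)], INCOMES[int(income_id)]

print(decode_experiment_id("TU2_3_2_1"))
```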

**Median answers in the original study (Thaler, 1985): $2.65 for the hotel and $1.50 for the grocery store**

In [3]:
# Set up list of place of purchase
place = ["a fancy resort hotel.", "a small, run-down grocery store."]

# Set up list of annual income levels
incomes = ["", "You have an annual income of $50k.", "You have an annual income of $70k.", "You have an annual income of $120k."]

In [4]:
# Prompts for resort hotel
hotel_prompts = []
for income in incomes:
    prompt = f"""You are lying on the beach on a hot day. All you have to drink is ice water. For the last hour you have been thinking about how much you would enjoy a nice cold bottle 
    of beer. A companion gets up to go make a phone call and offers to bring back a beer from the only nearby place where beer is sold, a fancy resort hotel. He says that the beer might
    be expensive and so asks how much you are willing to pay for the beer. He says that he will buy the beer if it costs as much or less than the price you state. But if it costs more 
    than the price you state he will not buy it. You trust your friend, and there is no possibility of bargaining with the bartender. {income} What price do you tell him?"""
    hotel_prompts.append(prompt)

In [5]:
# Prompts for grocery store 
store_prompts = []
for income in incomes:
    prompt = f"""You are lying on the beach on a hot day. All you have to drink is ice water. For the last hour you have been thinking about how much you would enjoy a nice cold bottle 
    of beer. A companion gets up to go make a phone call and offers to bring back a beer from the only nearby place where beer is sold, a small, run-down grocery store. He says that
    the beer might be expensive and so asks how much you are willing to pay for the beer. He says that he will buy the beer if it costs as much or less than the price you state. 
    But if it costs more than the price you state he will not buy it. You trust your friend, and there is no possibility of bargaining with the store owner. {income} What price do you tell him?"""
    store_prompts.append(prompt)

In [6]:
# Combine into one list
TU2_prompts = hotel_prompts + store_prompts

- TU2_prompts[0]: fancy resort hotel, no info about income (Configuration 1)
- TU2_prompts[1]: fancy resort hotel, $50k annual income (Configuration 2)
- TU2_prompts[2]: fancy resort hotel, $70k annual income (Configuration 3)
- TU2_prompts[3]: fancy resort hotel, $120k annual income (Configuration 4)
- TU2_prompts[4]: store, no info about income (Configuration 5)
- TU2_prompts[5]: store, $50k annual income (Configuration 6)
- TU2_prompts[6]: store, $70k annual income (Configuration 7)
- TU2_prompts[7]: store, $120k annual income (Configuration 8)

In [87]:
# Save prompts for use in Dashboard
#with open("Dashboard/src/data/Input/TU2_prompts.pkl", "wb") as file:
#    pickle.dump(TU2_prompts, file)

- Setting up instructions the model should abide by

In [89]:
instructions = "Answer by only giving a single price in dollars and cents without an explanation."

---

### Dictionaries to store and extract information about the various experiments

In [7]:
# Dictionary to look up prompt for a given experiment id. key: experiment id, value: prompt
TU2_experiment_prompts_dict = {
    "TU2_1_1_1": TU2_prompts[0],
    "TU2_1_1_2": TU2_prompts[1],
    "TU2_1_1_3": TU2_prompts[2],
    "TU2_1_1_4": TU2_prompts[3],
    "TU2_1_2_1": TU2_prompts[4],
    "TU2_1_2_2": TU2_prompts[5],
    "TU2_1_2_3": TU2_prompts[6],
    "TU2_1_2_4": TU2_prompts[7],
    "TU2_2_1_1": TU2_prompts[0],
    "TU2_2_1_2": TU2_prompts[1],
    "TU2_2_1_3": TU2_prompts[2],
    "TU2_2_1_4": TU2_prompts[3],
    "TU2_2_2_1": TU2_prompts[4],
    "TU2_2_2_2": TU2_prompts[5],
    "TU2_2_2_3": TU2_prompts[6],
    "TU2_2_2_4": TU2_prompts[7],
    "TU2_3_1_1": TU2_prompts[0],
    "TU2_3_1_2": TU2_prompts[1],
    "TU2_3_1_3": TU2_prompts[2],
    "TU2_3_1_4": TU2_prompts[3],
    "TU2_3_2_1": TU2_prompts[4],
    "TU2_3_2_2": TU2_prompts[5],
    "TU2_3_2_3": TU2_prompts[6],
    "TU2_3_2_4": TU2_prompts[7]
}

# Dictionary to look up which model to use for a given experiment id. key: experiment id, value: model name
TU2_model_dict = {
    "TU2_1_1_1": "gpt-3.5-turbo",
    "TU2_1_1_2": "gpt-3.5-turbo",
    "TU2_1_1_3": "gpt-3.5-turbo",
    "TU2_1_1_4": "gpt-3.5-turbo",
    "TU2_1_2_1": "gpt-3.5-turbo",
    "TU2_1_2_2": "gpt-3.5-turbo",
    "TU2_1_2_3": "gpt-3.5-turbo",
    "TU2_1_2_4": "gpt-3.5-turbo",
    "TU2_2_1_1": "gpt-4-1106-preview",
    "TU2_2_1_2": "gpt-4-1106-preview",
    "TU2_2_1_3": "gpt-4-1106-preview",
    "TU2_2_1_4": "gpt-4-1106-preview",
    "TU2_2_2_1": "gpt-4-1106-preview",
    "TU2_2_2_2": "gpt-4-1106-preview",
    "TU2_2_2_3": "gpt-4-1106-preview",
    "TU2_2_2_4": "gpt-4-1106-preview",
    "TU2_3_1_1": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU2_3_1_2": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU2_3_1_3": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU2_3_1_4": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU2_3_2_1": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU2_3_2_2": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU2_3_2_3": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU2_3_2_4": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3"
}

# Dictionary to look up what prompt variable was used in a given experiment. key: experiment id, value: prompt variable
TU2_prompt_ids_dict = {
    "TU2_1_1_1": "TU2_prompts[0]",
    "TU2_1_1_2": "TU2_prompts[1]",
    "TU2_1_1_3": "TU2_prompts[2]",
    "TU2_1_1_4": "TU2_prompts[3]",
    "TU2_1_2_1": "TU2_prompts[4]",
    "TU2_1_2_2": "TU2_prompts[5]",
    "TU2_1_2_3": "TU2_prompts[6]",
    "TU2_1_2_4": "TU2_prompts[7]",
    "TU2_2_1_1": "TU2_prompts[0]",
    "TU2_2_1_2": "TU2_prompts[1]",
    "TU2_2_1_3": "TU2_prompts[2]",
    "TU2_2_1_4": "TU2_prompts[3]",
    "TU2_2_2_1": "TU2_prompts[4]",
    "TU2_2_2_2": "TU2_prompts[5]",
    "TU2_2_2_3": "TU2_prompts[6]",
    "TU2_2_2_4": "TU2_prompts[7]",
    "TU2_3_1_1": "TU2_prompts[0]",
    "TU2_3_1_2": "TU2_prompts[1]",
    "TU2_3_1_3": "TU2_prompts[2]",
    "TU2_3_1_4": "TU2_prompts[3]",
    "TU2_3_2_1": "TU2_prompts[4]",
    "TU2_3_2_2": "TU2_prompts[5]",
    "TU2_3_2_3": "TU2_prompts[6]",
    "TU2_3_2_4": "TU2_prompts[7]"
}

# Dictionary to look up what place variable was used in a given experiment. key: experiment id, value: place variable
TU2_places_dict = {
    "TU2_1_1_1": "hotel",
    "TU2_1_1_2": "hotel",
    "TU2_1_1_3": "hotel",
    "TU2_1_1_4": "hotel",
    "TU2_1_2_1": "grocery",
    "TU2_1_2_2": "grocery",
    "TU2_1_2_3": "grocery",
    "TU2_1_2_4": "grocery",
    "TU2_2_1_1": "hotel",
    "TU2_2_1_2": "hotel",
    "TU2_2_1_3": "hotel",
    "TU2_2_1_4": "hotel",
    "TU2_2_2_1": "grocery",
    "TU2_2_2_2": "grocery",
    "TU2_2_2_3": "grocery",
    "TU2_2_2_4": "grocery",
    "TU2_3_1_1": "hotel",
    "TU2_3_1_2": "hotel",
    "TU2_3_1_3": "hotel",
    "TU2_3_1_4": "hotel",
    "TU2_3_2_1": "grocery",
    "TU2_3_2_2": "grocery",
    "TU2_3_2_3": "grocery",
    "TU2_3_2_4": "grocery"
}   

# Dictionary to look up what income variable was used in a given experiment. key: experiment id, value: income variable
TU2_income_dict = {
    "TU2_1_1_1": "0",
    "TU2_1_1_2": "$50k",
    "TU2_1_1_3": "$70k",
    "TU2_1_1_4": "$120k",
    "TU2_1_2_1": "0",
    "TU2_1_2_2": "$50k",
    "TU2_1_2_3": "$70k",
    "TU2_1_2_4": "$120k",
    "TU2_2_1_1": "0",
    "TU2_2_1_2": "$50k",
    "TU2_2_1_3": "$70k",
    "TU2_2_1_4": "$120k",
    "TU2_2_2_1": "0",
    "TU2_2_2_2": "$50k",
    "TU2_2_2_3": "$70k",
    "TU2_2_2_4": "$120k",
    "TU2_3_1_1": "0",
    "TU2_3_1_2": "$50k",
    "TU2_3_1_3": "$70k",
    "TU2_3_1_4": "$120k",
    "TU2_3_2_1": "0",
    "TU2_3_2_2": "$50k",
    "TU2_3_2_3": "$70k",
    "TU2_3_2_4": "$120k"
}

# Dictionary to look up what configuration was used in a given experiment. key: experiment id, value: configuration
TU2_configuration_dict = {
    "TU2_1_1_1": 1,
    "TU2_1_1_2": 2,
    "TU2_1_1_3": 3,
    "TU2_1_1_4": 4,
    "TU2_1_2_1": 5,
    "TU2_1_2_2": 6,
    "TU2_1_2_3": 7,
    "TU2_1_2_4": 8,
    "TU2_2_1_1": 1,
    "TU2_2_1_2": 2,
    "TU2_2_1_3": 3,
    "TU2_2_1_4": 4,
    "TU2_2_2_1": 5,
    "TU2_2_2_2": 6,
    "TU2_2_2_3": 7,
    "TU2_2_2_4": 8,
    "TU2_3_1_1": 1,
    "TU2_3_1_2": 2,
    "TU2_3_1_3": 3,
    "TU2_3_1_4": 4,
    "TU2_3_2_1": 5,
    "TU2_3_2_2": 6,
    "TU2_3_2_3": 7,
    "TU2_3_2_4": 8 
}
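
The lookup dictionaries above follow a regular pattern, so they could also be generated rather than typed out. The following is a sketch under that assumption; all variable names in it are hypothetical and chosen so they do not overwrite the dictionaries defined above.

```python
from itertools import product

# Labels copied from the hand-written dictionaries above
model_names = {
    1: "gpt-3.5-turbo",
    2: "gpt-4-1106-preview",
    3: "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
}
place_labels = {1: "hotel", 2: "grocery"}
income_labels = {1: "0", 2: "$50k", 3: "$70k", 4: "$120k"}

gen_model_dict, gen_places_dict, gen_income_dict = {}, {}, {}
gen_prompt_index_dict, gen_configuration_dict = {}, {}

for m, p, i in product(model_names, place_labels, income_labels):
    experiment_id = f"TU2_{m}_{p}_{i}"
    # Position of the matching prompt in TU2_prompts (hotel prompts first, then store prompts)
    prompt_index = (p - 1) * len(income_labels) + (i - 1)
    gen_model_dict[experiment_id] = model_names[m]
    gen_places_dict[experiment_id] = place_labels[p]
    gen_income_dict[experiment_id] = income_labels[i]
    gen_prompt_index_dict[experiment_id] = prompt_index
    gen_configuration_dict[experiment_id] = prompt_index + 1

print(len(gen_model_dict), gen_configuration_dict["TU2_3_2_4"])
```

Generating the mappings keeps the 24 experiment IDs consistent across dictionaries if a model or income level is added later.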

In [8]:
# Collect dictionaries and save for use in Dashboard
TU2_dictionaries = [TU2_experiment_prompts_dict, TU2_model_dict, TU2_prompt_ids_dict, TU2_places_dict, TU2_income_dict, TU2_configuration_dict]
#with open("Dashboard/src/data/Input/TU2_dictionaries.pkl", "wb") as file:
#    pickle.dump(TU2_dictionaries, file)

---

#### Setting up functions to repeatedly prompt the LLMs

- Helper function to extract dollar amount of given answers

In [10]:
def extract_dollar_amounts(answers):
    # Keep only answers that start with "$" and consist of digits once commas and periods are removed
    valid_prices = [item for item in answers if item.startswith("$") and item[1:].replace(',', '').replace('.', '').isdigit()]
    # Drop the "$" and any thousands separators so that each price can be converted to float later
    prices = [item.replace('$', '').replace(',', '') for item in valid_prices]
    return prices
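
The filter above accepts any answer whose characters after the "$" are digits, commas and periods, which also lets through malformed strings such as "$1.2.3". A stricter variant based on a regular expression could look as follows. This is a sketch and not the function used for the reported results; `extract_dollar_amounts_strict` is a hypothetical name.

```python
import re

# One "$", then digits (optionally with thousands separators), then optional cents
PRICE_PATTERN = re.compile(r"\$(\d{1,3}(?:,\d{3})+|\d+)(\.\d{1,2})?")

def extract_dollar_amounts_strict(answers):
    """Return prices as floats for answers that are exactly one well-formed dollar amount."""
    prices = []
    for answer in answers:
        match = PRICE_PATTERN.fullmatch(answer.strip())
        if match:
            # Drop the leading "$" and thousands separators before converting
            prices.append(float(match.group(0)[1:].replace(",", "")))
    return prices

print(extract_dollar_amounts_strict(["$10", "$8.50", "$1,000", "10", "$1.2.3", "I would pay $5"]))
```

Unlike the original helper, this variant returns floats directly, so it is not a drop-in replacement where strings are expected.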

- Functions to query 1 prompt n times for OpenAI Models

In [11]:
def TU2_run_experiment(experiment_id, n, progress_bar, temperature):
    
    answers = []
    for _ in range(n): 
        response = client.chat.completions.create(
            model = TU2_model_dict[experiment_id], 
            max_tokens = 2,
            temperature = temperature, # range is 0 to 2
            messages = [
            {"role": "system", "content": "Answer by only giving a single price in dollars and cents without an explanation."},        
            {"role": "user", "content": 
             f"{TU2_experiment_prompts_dict[experiment_id]} Answer by only giving a single price in dollars and cents without an explanation."}
                   ])

        # Store the answer in the list
        answer = response.choices[0].message.content
        answers.append(answer.strip())
        # Update progress bar (given from either temperature loop, or set locally)
        progress_bar.update(1)

    # Extract valid prices from answers
    valid_prices = extract_dollar_amounts(answers)

    # Compute number of valid answers
    n_observations = len(valid_prices)

    # Collect results 
    results = [experiment_id, temperature, TU2_model_dict[experiment_id], TU2_places_dict[experiment_id],
                TU2_income_dict[experiment_id], answers, n_observations, TU2_configuration_dict[experiment_id]]

    # Give out results
    return results

- Adjusted function for dashboard

In [12]:
def TU2_run_experiment_dashboard(experiment_id, n, temperature):
    
    answers = []
    for _ in range(n): 
        response = client.chat.completions.create(
            model = TU2_model_dict[experiment_id], 
            max_tokens = 2,
            temperature = temperature, # range is 0 to 2
            messages = [
            {"role": "system", "content": "Answer by only giving a single price in dollars and cents without an explanation."},        
            {"role": "user", "content": 
             f"{TU2_experiment_prompts_dict[experiment_id]} Answer by only giving a single price in dollars and cents without an explanation."}
                   ])

        # Store the answer in the list
        answer = response.choices[0].message.content
        answers.append(answer.strip())


    # Extract valid prices from answers
    valid_prices = extract_dollar_amounts(answers)

    # Compute number of valid answers
    n_observations = len(valid_prices)

    # Collect results 
    results = [experiment_id, temperature, TU2_model_dict[experiment_id], TU2_places_dict[experiment_id],
                TU2_income_dict[experiment_id], f"{answers}", n_observations, TU2_configuration_dict[experiment_id]]
    results = pd.DataFrame(results, index = ["Experiment_id", "Temperature", "Model", "Place", "Income", "Answers", "Obs.", "Configuration"]).T

    # Give out results
    return results

- Functions to query 1 prompt n times (LLama)

In [13]:
def TU2_run_experiment_llama(experiment_id, n, progress_bar, temperature):
    answers = []
    for _ in range(n):
        response = replicate.run(
            TU2_model_dict[experiment_id],
            input = {
                "system_prompt":  "Answer by only giving a single price in dollars and cents without an explanation.",
                "temperature": temperature,
                "max_new_tokens": 10, 
                "prompt": f"{TU2_experiment_prompts_dict[experiment_id]} Answer by only giving a single price in dollars and cents without an explanation."
            }
        )
        # Grab answer and append to list
        answer = "" # Set to empty string, otherwise it would append the previous answer to the new one
        for item in response:
            answer = answer + item
        answers.append(answer.strip())

        # Update progress bar
        progress_bar.update(1)

    
    # Extract valid prices from answers
    valid_prices = extract_dollar_amounts(answers)

    # Compute number of valid answers
    n_observations = len(valid_prices)

    # Collect results 
    results = [experiment_id, temperature, TU2_model_dict[experiment_id], TU2_places_dict[experiment_id],
                TU2_income_dict[experiment_id], answers, n_observations, TU2_configuration_dict[experiment_id]]
    
    # Give out results
    return results

- Adjusted function for Dashboard

In [14]:
def TU2_run_experiment_llama_dashboard(experiment_id, n, temperature):
    answers = []
    for _ in range(n):
        response = replicate.run(
            TU2_model_dict[experiment_id],
            input = {
                "system_prompt":  "Answer by only giving a single price in dollars and cents without an explanation.",
                "temperature": temperature,
                "max_new_tokens": 10, 
                "prompt": f"{TU2_experiment_prompts_dict[experiment_id]} Answer by only giving a single price in dollars and cents without an explanation."
            }
        )
        # Grab answer and append to list
        answer = "" # Set to empty string, otherwise it would append the previous answer to the new one
        for item in response:
            answer = answer + item
        answers.append(answer.strip())

    # Extract valid prices from answers
    valid_prices = extract_dollar_amounts(answers)

    # Compute number of valid answers
    n_observations = len(valid_prices)

    # Collect results 
    results = [experiment_id, temperature, TU2_model_dict[experiment_id], TU2_places_dict[experiment_id],
                TU2_income_dict[experiment_id], f"{answers}", n_observations, TU2_configuration_dict[experiment_id]]
    results = pd.DataFrame(results, index = ["Experiment_id", "Temperature", "Model", "Place", "Income", "Answers", "Obs.", "Configuration"]).T
    
    # Give out results
    return results

- Function to loop run_experiment() over a list of temperature values

In [15]:
def TU2_temperature_loop(function, experiment_id, temperature_list = [0.5, 1, 1.5], n = 50):
    """
    Function to run an experiment with different temperature values.
    
    Args:
        function (function): Function used to query the model, e.g. TU2_run_experiment()
        experiment_id (str): ID of the experiment to be run. Contains info about prompt and model
        temperature_list (list): List of temperature values to be looped over
        n (int): Number of requests for each prompt per temperature value
        
    Returns:
        results_df: Dataframe with experiment results
    """    
    # Empty list for storing results
    results_list = []

    # Initialize progress bar -> used as input for run_experiment()
    progress_bar = tqdm(range(n*len(temperature_list)))

    # Loop over different temperature values, calling the input function once per value (i.e. querying the model n times each)
    for temperature in temperature_list:
        results = function(experiment_id = experiment_id, n = n, temperature = temperature, progress_bar = progress_bar) 
        results_list.append(results)
       

    # Horizontally concatenate the results, transpose, and set index
    results_df = pd.DataFrame(results_list).transpose().set_index(pd.Index(
        ["experiment_id", "temperature", "model", "place", "income", "answers", "obs.", "configuration"]))
  
   
    # Return some information about the experiment as a check
    check = f"In this run, a total of {n*len(temperature_list)} requests were made using {TU2_prompt_ids_dict[experiment_id]}."
    # Print information about the experiment
    print(check)
 

    return results_df
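
The reshaping step inside the loop function can be checked without any API calls. The sketch below uses made-up result rows in the same order as the list returned by `TU2_run_experiment()`; the data values are hypothetical.

```python
import pandas as pd

# Hypothetical results for two temperature values
results_list = [
    ["TU2_1_1_1", 0.5, "gpt-3.5-turbo", "hotel", "0", ["$10", "$5"], 2, 1],
    ["TU2_1_1_1", 1.0, "gpt-3.5-turbo", "hotel", "0", ["$5", "five"], 1, 1],
]
labels = ["experiment_id", "temperature", "model", "place", "income", "answers", "obs.", "configuration"]

# Same reshaping as in TU2_temperature_loop(): one column per temperature, one labelled row per field
results_df = pd.DataFrame(results_list).transpose().set_index(pd.Index(labels))

# Transposing again gives one row per temperature, as in the final results table
per_temperature = results_df.transpose()
print(per_temperature[["temperature", "obs."]])
```

This is why the results are transposed once more after concatenation further below: the loop returns fields as rows, while the final table has one row per experiment and temperature.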

---

### Model 1: GPT-3.5-Turbo

In [18]:
# List of temperature values to be looped over
temperature_list = [0.01, 0.5, 1, 1.5, 2] # 0.01 for llama

In [16]:
# In the full study we issued 100 requests per prompt & temperature value for GPT-3.5-turbo; N is lowered here for the test re-run
N = 2

- Configuration 1

In [19]:
results_1 = []
results_1_1 = TU2_temperature_loop(TU2_run_experiment, "TU2_1_1_1", temperature_list = temperature_list , n = N)
results_1.append(results_1_1)

100%|██████████| 10/10 [00:05<00:00,  1.83it/s]

In this run, a total of 10 requests were made using TU2_prompts[0].





- Configuration 2

In [20]:
results_2 = []
results_2_1 = TU2_temperature_loop(TU2_run_experiment, "TU2_1_1_2", temperature_list = temperature_list, n = N)
results_2.append(results_2_1)

100%|██████████| 10/10 [00:04<00:00,  2.17it/s]

In this run, a total of 10 requests were made using TU2_prompts[1].





- Configuration 3

In [21]:
results_3 = []
results_3_1 = TU2_temperature_loop(TU2_run_experiment, "TU2_1_1_3", temperature_list = temperature_list, n = N)
results_3.append(results_3_1)

100%|██████████| 10/10 [00:05<00:00,  1.84it/s]

In this run, a total of 10 requests were made using TU2_prompts[2].





- Configuration 4

In [22]:
results_4 = []
results_4_1 = TU2_temperature_loop(TU2_run_experiment, "TU2_1_1_4", temperature_list = temperature_list, n = N)
results_4.append(results_4_1)

100%|██████████| 10/10 [00:03<00:00,  2.54it/s]

In this run, a total of 10 requests were made using TU2_prompts[3].





- Configuration 5

In [23]:
results_5 = []
results_5_1 = TU2_temperature_loop(TU2_run_experiment, "TU2_1_2_1", temperature_list = temperature_list, n = N)
results_5.append(results_5_1)

100%|██████████| 10/10 [00:05<00:00,  1.79it/s]

In this run, a total of 10 requests were made using TU2_prompts[4].





- Configuration 6

In [24]:
results_6 = []
results_6_1 = TU2_temperature_loop(TU2_run_experiment, "TU2_1_2_2", temperature_list = temperature_list, n = N)
results_6.append(results_6_1)

100%|██████████| 10/10 [00:03<00:00,  2.50it/s]

In this run, a total of 10 requests were made using TU2_prompts[5].





- Configuration 7

In [25]:
results_7 = []
results_7_1 = TU2_temperature_loop(TU2_run_experiment, "TU2_1_2_3", temperature_list = temperature_list, n = N)
results_7.append(results_7_1)

100%|██████████| 10/10 [00:04<00:00,  2.33it/s]

In this run, a total of 10 requests were made using TU2_prompts[6].





- Configuration 8

In [26]:
results_8 = []
results_8_1 = TU2_temperature_loop(TU2_run_experiment, "TU2_1_2_4", temperature_list = temperature_list, n = N)
results_8.append(results_8_1)

100%|██████████| 10/10 [00:04<00:00,  2.04it/s]

In this run, a total of 10 requests were made using TU2_prompts[7].





---

### Model 2: GPT-4-1106-Preview

In [27]:
# Number of requests per temperature value
N = 2

- Configuration 1

In [28]:
results_1_2 = TU2_temperature_loop(TU2_run_experiment, "TU2_2_1_1", temperature_list = temperature_list, n = N)
results_1.append(results_1_2)

100%|██████████| 10/10 [00:08<00:00,  1.20it/s]

In this run, a total of 10 requests were made using TU2_prompts[0].





- Configuration 2

In [29]:
results_2_2 = TU2_temperature_loop(TU2_run_experiment, "TU2_2_1_2", temperature_list = temperature_list, n = N)
results_2.append(results_2_2)

100%|██████████| 10/10 [01:50<00:00, 11.06s/it]

In this run, a total of 10 requests were made using TU2_prompts[1].





- Configuration 3

In [30]:
results_3_2 = TU2_temperature_loop(TU2_run_experiment, "TU2_2_1_3", temperature_list = temperature_list, n = N)
results_3.append(results_3_2)

100%|██████████| 10/10 [00:07<00:00,  1.42it/s]

In this run, a total of 10 requests were made using TU2_prompts[2].





- Configuration 4

In [31]:
results_4_2 = TU2_temperature_loop(TU2_run_experiment, "TU2_2_1_4", temperature_list = temperature_list, n = N)
results_4.append(results_4_2)

100%|██████████| 10/10 [00:08<00:00,  1.18it/s]

In this run, a total of 10 requests were made using TU2_prompts[3].





- Configuration 5

In [32]:
results_5_2 = TU2_temperature_loop(TU2_run_experiment, "TU2_2_2_1", temperature_list = temperature_list, n = N)
results_5.append(results_5_2)

100%|██████████| 10/10 [00:10<00:00,  1.05s/it]

In this run, a total of 10 requests were made using TU2_prompts[4].





- Configuration 6

In [33]:
results_6_2 = TU2_temperature_loop(TU2_run_experiment, "TU2_2_2_2", temperature_list = temperature_list, n = N)
results_6.append(results_6_2)

100%|██████████| 10/10 [00:08<00:00,  1.19it/s]

In this run, a total of 10 requests were made using TU2_prompts[5].





- Configuration 7

In [34]:
results_7_2 = TU2_temperature_loop(TU2_run_experiment, "TU2_2_2_3", temperature_list = temperature_list, n = N)
results_7.append(results_7_2)

100%|██████████| 10/10 [00:09<00:00,  1.06it/s]

In this run, a total of 10 requests were made using TU2_prompts[6].





- Configuration 8

In [35]:
results_8_2 = TU2_temperature_loop(TU2_run_experiment, "TU2_2_2_4", temperature_list = temperature_list, n = N)
results_8.append(results_8_2)

100%|██████████| 10/10 [00:18<00:00,  1.82s/it]

In this run, a total of 10 requests were made using TU2_prompts[7].





---

### Model 3: LLama-2-70b

In [36]:
# Number of requests per temperature value
N = 2

- Configuration 1

In [37]:
results_1_3 = TU2_temperature_loop(TU2_run_experiment_llama, "TU2_3_1_1", temperature_list = temperature_list, n = N)
results_1.append(results_1_3)

100%|██████████| 10/10 [00:16<00:00,  1.67s/it]

In this run, a total of 10 requests were made using TU2_prompts[0].





- Configuration 2

In [38]:
results_2_3 = TU2_temperature_loop(TU2_run_experiment_llama, "TU2_3_1_2", temperature_list = temperature_list, n = N)
results_2.append(results_2_3)

100%|██████████| 10/10 [00:16<00:00,  1.61s/it]

In this run, a total of 10 requests were made using TU2_prompts[1].





- Configuration 3

In [39]:
results_3_3 = TU2_temperature_loop(TU2_run_experiment_llama, "TU2_3_1_3", temperature_list = temperature_list, n = N)
results_3.append(results_3_3)

100%|██████████| 10/10 [00:14<00:00,  1.48s/it]

In this run, a total of 10 requests were made using TU2_prompts[2].





- Configuration 4

In [40]:
results_4_3 = TU2_temperature_loop(TU2_run_experiment_llama, "TU2_3_1_4", temperature_list = temperature_list, n = N)
results_4.append(results_4_3)

100%|██████████| 10/10 [00:16<00:00,  1.69s/it]

In this run, a total of 10 requests were made using TU2_prompts[3].





- Configuration 5

In [41]:
results_5_3 = TU2_temperature_loop(TU2_run_experiment_llama, "TU2_3_2_1", temperature_list = temperature_list, n = N)
results_5.append(results_5_3)

100%|██████████| 10/10 [00:15<00:00,  1.54s/it]

In this run, a total of 10 requests were made using TU2_prompts[4].





- Configuration 6

In [42]:
results_6_3 = TU2_temperature_loop(TU2_run_experiment_llama, "TU2_3_2_2", temperature_list = temperature_list, n = N)
results_6.append(results_6_3)

100%|██████████| 10/10 [00:16<00:00,  1.61s/it]

In this run, a total of 10 requests were made using TU2_prompts[5].





- Configuration 7

In [43]:
results_7_3 = TU2_temperature_loop(TU2_run_experiment_llama, "TU2_3_2_3", temperature_list = temperature_list, n = N)
results_7.append(results_7_3)

100%|██████████| 10/10 [00:15<00:00,  1.56s/it]

In this run, a total of 10 requests were made using TU2_prompts[6].





- Configuration 8

In [44]:
results_8_3 = TU2_temperature_loop(TU2_run_experiment_llama, "TU2_3_2_4", temperature_list = temperature_list, n = N)
results_8.append(results_8_3)

100%|██████████| 10/10 [00:14<00:00,  1.50s/it]

In this run, a total of 10 requests were made using TU2_prompts[7].





---

#### Gather all results and save to .csv

In [45]:
# Concatenate results
results_1_df = pd.concat(results_1, axis = 1).transpose()
results_2_df = pd.concat(results_2, axis = 1).transpose()
results_3_df = pd.concat(results_3, axis = 1).transpose()
results_4_df = pd.concat(results_4, axis = 1).transpose()
results_5_df = pd.concat(results_5, axis = 1).transpose()
results_6_df = pd.concat(results_6, axis = 1).transpose()
results_7_df = pd.concat(results_7, axis = 1).transpose()
results_8_df = pd.concat(results_8, axis = 1).transpose()

# Concatenate all results
TU2_results = pd.concat([results_1_df, results_2_df, results_3_df, results_4_df, 
                         results_5_df, results_6_df, results_7_df, results_8_df], axis = 0)

# Rename LLama model
TU2_results['model'] = TU2_results['model'].replace('meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3', 
                                  'llama-2-70b')


# Capitalize first letter of column names
TU2_results.columns = TU2_results.columns.str.capitalize()

# Display results
TU2_results

| | Experiment_id | Temperature | Model | Place | Income | Answers | Obs. | Configuration |
|---|---|---|---|---|---|---|---|---|
| 0 | TU2_1_1_1 | 0.01 | gpt-3.5-turbo | hotel | 0 | [$10, $10] | 2 | 1 |
| 1 | TU2_1_1_1 | 0.5 | gpt-3.5-turbo | hotel | 0 | [$10, $5] | 2 | 1 |
| 2 | TU2_1_1_1 | 1.0 | gpt-3.5-turbo | hotel | 0 | [$10, $5] | 2 | 1 |
| 3 | TU2_1_1_1 | 1.5 | gpt-3.5-turbo | hotel | 0 | [$5, $5] | 2 | 1 |
| 4 | TU2_1_1_1 | 2.0 | gpt-3.5-turbo | hotel | 0 | [$5, $8] | 2 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 0 | TU2_3_2_4 | 0.01 | llama-2-70b | grocery | $120k | [$8.50, $8.50] | 2 | 8 |
| 1 | TU2_3_2_4 | 0.5 | llama-2-70b | grocery | $120k | [$8.50, $8.50] | 2 | 8 |
| 2 | TU2_3_2_4 | 1.0 | llama-2-70b | grocery | $120k | [$7.50, $5.99] | 2 | 8 |
| 3 | TU2_3_2_4 | 1.5 | llama-2-70b | grocery | $120k | [$8.50, $7.50] | 2 | 8 |


In [13]:
# Save results to .csv
# TU2_results.to_csv("Dashboard/src/data/Output/TU2_results.csv", index = False)

---

### Visualization of results

In [95]:
TU2_results = pd.read_csv('Dashboard/src/data/Output/TU2_results.csv')
TU2_results

| | Experiment_id | Temperature | Model | Place | Income | Answers | Obs. | Configuration |
|---|---|---|---|---|---|---|---|---|
| 0 | TU2_1_1_1 | 0.01 | gpt-3.5-turbo | hotel | 0 | ['$10', '$10', '$10', '$10', '$10', '$10', '$1... | 100 | 1 |
| 1 | TU2_1_1_1 | 0.50 | gpt-3.5-turbo | hotel | 0 | ['$10', '$10', '$10', '$10', '$5', '$10', '$5'... | 100 | 1 |
| 2 | TU2_1_1_1 | 1.00 | gpt-3.5-turbo | hotel | 0 | ['$5', '$10', '$5', '$10', '$5', '$10', '$10',... | 100 | 1 |
| 3 | TU2_1_1_1 | 1.50 | gpt-3.5-turbo | hotel | 0 | ['$5', '$10', '$10', '$5', '$5', '$10', '$10',... | 98 | 1 |
| 4 | TU2_1_1_1 | 2.00 | gpt-3.5-turbo | hotel | 0 | ['$10', '$10', '$5', '$5', '$10', '$5', '$10',... | 84 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 115 | TU2_3_2_4 | 0.01 | llama-2-70b | grocery | $120k | ['$8.50', '$8.50', '$8.50', '$8.50', '$8.50', ... | 50 | 8 |
| 116 | TU2_3_2_4 | 0.50 | llama-2-70b | grocery | $120k | ['$7.50', '$7.50', '$8.50', '$8.50', '$8.50', ... | 50 | 8 |
| 117 | TU2_3_2_4 | 1.00 | llama-2-70b | grocery | $120k | ['$8.50', '$8.50', '$5.50', '$7.50', '$8.50', ... | 50 | 8 |
| 118 | TU2_3_2_4 | 1.50 | llama-2-70b | grocery | $120k | ['$7.50', '$7.50', '$7.50', '$5.50', '$8.50', ... | 50 | 8 |
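
Because the answers are stored in the .csv as strings, comparing them with the medians of the original study requires parsing them first. The following is a minimal sketch on made-up rows in the same layout as the results file; the `demo` frame, the `median_wtp` helper and the added column names are hypothetical.

```python
from ast import literal_eval

import numpy as np
import pandas as pd

# Medians of the original study as quoted earlier in this notebook
ORIGINAL_MEDIANS = {"hotel": 2.65, "grocery": 1.50}

# Hypothetical rows in the same layout as TU2_results.csv (answers stored as strings)
demo = pd.DataFrame({
    "Model": ["gpt-3.5-turbo", "gpt-3.5-turbo"],
    "Place": ["hotel", "grocery"],
    "Answers": ["['$10', '$5', '$10']", "['$3', '$4.50', 'n/a']"],
})

def median_wtp(answer_string):
    """Parse a stored answer list and return the median of the valid dollar amounts."""
    answers = literal_eval(answer_string)
    prices = [float(a[1:]) for a in answers
              if a.startswith("$") and a[1:].replace(".", "", 1).isdigit()]
    return float(np.median(prices)) if prices else float("nan")

demo["Median_WTP"] = demo["Answers"].apply(median_wtp)
demo["Original_median"] = demo["Place"].map(ORIGINAL_MEDIANS)
print(demo[["Model", "Place", "Median_WTP", "Original_median"]])
```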


In [4]:
def TU2_plot_results_old(df):
    
    # Transpose for plotting
    df = df.transpose()
    # Get temperature value 
    temperature = df.loc["Temperature"].iloc[0]
    # Get model name
    model = df.loc["Model"].iloc[0]
    # Get number of observations 
    n_observations = df.loc["Obs."].iloc[0] 
    # Get place
    place = df.loc["Place"].iloc[0]
    # Adjust name of place for plot title
    if place == "grocery":
        place = "grocery store"

    # Apply literal_eval to work with list of strings
    answers = df.loc["Answers"].apply(literal_eval).iloc[0]
    # Get stated WTP
    prices = extract_dollar_amounts(answers)
    # Convert to float
    prices = [float(price) for price in prices]
    # Get max, mean and median
    median = np.median(prices)
    mean = np.mean(prices)
    max = np.max(prices)

    # Adjust prices so that every value above 30 is set to 30, deals with outliers
    prices = [30.00 if price > 30 else price for price in prices]

    # Create the histogram using custom bins
    fig = go.Figure(data=[
    go.Bar(
        x = list(Counter(prices).keys()),
        y = list(Counter(prices).values()),
        name="Model answers",
        customdata=[n_observations] * len(prices),
        hovertemplate="Value: %{x}<br>Number of observations: %{y}<br>Number of total observations: %{customdata}<extra></extra>",
        marker_color="rgb(55, 83, 109)",
        width=0.4 ,  # Adjust the width of the bars if needed
    ),
    # Add vertical line for median
    go.Scatter(
        x = [median, median], # start and end of x
        y = [0, Counter(prices).most_common(1)[0][1]], # count of most common price
        mode="lines",
        name="Median",
        line=dict(color="red", width=4, dash="dash"),
        hovertemplate = "Median: %{x}<extra></extra>",
),
    # Add vertical line for mean
    go.Scatter(
        x = [mean, mean], # start and end of x
        y = [0, Counter(prices).most_common(1)[0][1]], # count of most common price
        mode="lines",
        name="Mean",
        line=dict(color="green", width=4, dash="dash"),
        hovertemplate = "Mean: %{x}<extra></extra>",
    )
])


    # Layout
    fig.update_layout(
    xaxis = dict(
        title = "Willingness to pay (USD)",
        titlefont_size = 18,
        tickfont_size = 16,
        tickformat=".2f",
    ),
    yaxis = dict(
        title = "Frequency",
        titlefont_size = 18,
        tickfont_size = 16,
    ),
    title = dict(
    text =  f"Distribution of {model}'s WTP for beer at the {place} for temperature {temperature}",
    x = 0.5, 
    y = 0.95,
    font_size = 18,
    ),
    legend=dict(
        x=1.01, 
        y=0.9,
        font=dict(family='Arial', size=12, color='black'),
        bordercolor='black',  
        borderwidth=2,  
    ),
    showlegend = True,
    width = 1000,
    margin=dict(t=60)
    )
    # Adjust x-axis labels to show 30+ to symbolize aggregation
    fig.update_xaxes(
    tickvals = sorted(fig.data[0].x),
    ticktext=["$30+" if tick_value == 30.0 else tick_value for tick_value in sorted(set(fig.data[0].x))],
)

    print(f"The maximum WTP stated by {model} for beer at the {place} for temperature {temperature} is ${max}.")
    # Show the plot
    return fig
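
The order of operations in `TU2_plot_results_old` matters: median, mean and maximum are computed on the raw prices, and only afterwards is every value above 30 pooled into one bar. A minimal sketch of that step in isolation, assuming a hypothetical helper `cap_and_count` and made-up toy prices (neither is part of the notebook):

```python
from collections import Counter

import numpy as np


def cap_and_count(prices, cap=30.0):
    """Hypothetical helper: raw-price statistics plus counts of capped prices."""
    # Statistics use the uncapped values, so outliers still affect them
    stats = {
        "median": float(np.median(prices)),
        "mean": float(np.mean(prices)),
        "max": float(np.max(prices)),
    }
    # Pool every price above the cap into a single value for plotting
    capped = [min(price, cap) for price in prices]
    # One counter entry per distinct capped price, i.e. one per bar
    counts = Counter(capped)
    return stats, counts


# Toy data for illustration only, not model output
toy_prices = [5.0, 10.0, 10.0, 45.0, 120.0]
stats, counts = cap_and_count(toy_prices)
```

With these toy values the mean is pulled well above the median by the two outliers, while the plot would show them together in the "$30+" bar.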

In [79]:
df = TU2_results
model = "llama-2-70b"
temperature = 2
place = "hotel"
income = "0"

In [80]:
df = TU2_results[
    (TU2_results["Model"] == model)
    & (TU2_results["Temperature"] == temperature)
    & (TU2_results["Place"] == place)
    & (TU2_results["Income"] == income)
]
df

Unnamed: 0,Experiment_id,Temperature,Model,Place,Income,Answers,Obs.,Configuration
14,TU2_3_1_1,2.0,llama-2-70b,hotel,0,"['$4.50', '$6.50', '$7.50', '$5.00', '$5.50', ...",50,1
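
The filter above selects the single row belonging to one model, temperature, place and income combination. A minimal sketch of the same selection wrapped in a function, assuming a hypothetical helper `select_configuration` and a small made-up frame; comparing `Income` as a string is an assumption made here so that `0` and `"0"` are treated alike:

```python
import pandas as pd


def select_configuration(results, model, temperature, place, income):
    """Hypothetical helper: rows matching one experiment configuration."""
    mask = (
        (results["Model"] == model)
        & (results["Temperature"] == float(temperature))
        & (results["Place"] == place)
        # Assumption: compare as strings so 0 and "0" both match
        & (results["Income"].astype(str) == str(income))
    )
    return results[mask]


# Made-up frame with the same column names as TU2_results
toy_results = pd.DataFrame(
    {
        "Experiment_id": ["A", "B", "C"],
        "Temperature": [2.0, 2.0, 0.5],
        "Model": ["llama-2-70b", "llama-2-70b", "llama-2-70b"],
        "Place": ["hotel", "grocery", "hotel"],
        "Income": ["0", "0", "$120k"],
    }
)
selected = select_configuration(toy_results, "llama-2-70b", 2, "hotel", "0")
```

If the selection comes back empty, a mismatch between the stored dtype of a column and the value compared against it is a likely cause.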


In [52]:
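def TU2_plot_results(df):
    """Plot the WTP distribution of a single-row results DataFrame as a histogram.

    Mean and median are shown in the hover text; place and income appear in
    the legend. Returns the plotly figure.
    """
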
    # Transpose for plotting
    df = df.transpose()
    # Get model name
    model = df.loc["Model"].iloc[0]
    if model == "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3":
        model = "llama-2-70b"
    # Get temperature value 
    temperature = df.loc["Temperature"].iloc[0]
    # Get number of observations 
    n_observations = df.loc["Obs."].iloc[0] 
    # Get place
    place = df.loc["Place"].iloc[0]
    # Adjust name of place for plot title
    if place == "grocery":
        place = "Grocery store"
    if place == "hotel":
        place = "Hotel"
    # Get income
    income = df.loc["Income"].iloc[0]
    if income == "0":
        income = "No information"
    # Apply literal_eval to work with list of strings
    answers = df.loc["Answers"].apply(literal_eval).iloc[0]
    # Get stated WTP
    prices = extract_dollar_amounts(answers)
    sorted_prices = sorted(prices, key=float)
    numeric_prices = [float(price) for price in prices]
    # Get mean and median
    mean = np.round(np.mean(numeric_prices),2).astype(str)
    median = np.round(np.median(numeric_prices),2).astype(str)
    # Get number of unique answers
    num_unique_answers = len(set(prices))
   

    fig = go.Figure(data = [
    go.Histogram(x = sorted_prices,
                    customdata = [n_observations] * num_unique_answers,
                    hovertemplate = "Price asked: $%{x} <br>Frequency: %{y}<br>Total answers: %{customdata}<br>Mean: $" + mean + "<br>Median: $" + median +"<extra></extra>",
                    marker_color = "rgb(55, 83, 109)",
                    name = "Place: " + place + "<br>Income: " + income,
),
    ])

    # Layout
    fig.update_layout(
        xaxis=dict(
            title="Price asked ($)",
            titlefont_size=18,
            tickfont_size=16,
            tickformat=".2f",
        ),
        yaxis=dict(
            title="Frequency",
            titlefont_size=18,
            tickfont_size=16,
        ),
        title=dict(
            text=f"Distribution of {model}'s WTP for temperature {temperature}",
            x=0.5,
            y=0.95,
            font_size=18,
        ),
        legend=dict(
            x=1.01,  
            y=0.9,
            font=dict(family='Arial', size=16, color='black'),
            bordercolor='black',  
            borderwidth=2,           
        ),
        showlegend=True,
        width=1000,
        margin=dict(t=60),
    )
    

    # Show the plot
    return fig
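
Both plotting functions read the `Answers` cell as a string, turn it back into a list with `literal_eval`, and then extract the dollar amounts before computing mean and median. A minimal sketch of that pipeline, assuming a hypothetical stand-in `parse_answers` with a simple regular expression; it is not the notebook's `extract_dollar_amounts`, and the cell content is made up:

```python
import re
from ast import literal_eval

import numpy as np

# Assumption: an answer contains at most one amount written like $5 or $7.50
DOLLAR_PATTERN = re.compile(r"\$(\d+(?:\.\d+)?)")


def parse_answers(cell):
    """Hypothetical stand-in: stringified list of answers -> list of floats."""
    answers = literal_eval(cell)  # string representation back to a Python list
    amounts = []
    for answer in answers:
        match = DOLLAR_PATTERN.search(answer)
        if match:  # answers without a dollar amount are skipped
            amounts.append(float(match.group(1)))
    return amounts


# Made-up cell content for illustration only
cell = "['$5', '$10', 'I would pay $7.50', 'no answer']"
amounts = parse_answers(cell)
mean = round(float(np.mean(amounts)), 2)
median = round(float(np.median(amounts)), 2)
```

Skipping answers without a recognizable amount means the number of parsed prices can be smaller than the value in `Obs.`, which is worth checking before interpreting frequencies.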

In [56]:
TU2_plot_results(test_results)