# Transaction Utility Theory

This notebook aims to recreate some of the empirical findings of Thaler, R. (1985). Mental accounting and consumer choice. Marketing Science, 4(3), 199-214. 
Specifically we are interested in whether LLMs' responses are similar to the original responses in **section 3** of the paper. 

---

In [1]:
from openai import OpenAI
import openai
import matplotlib.pyplot as plt
import os 
import numpy as np
import pandas as pd
from tqdm import tqdm
import replicate

In [35]:
# Get OpenAI API key (previously saved as an environment variable)
openai.api_key = os.environ["OPENAI_API_KEY"]

# Set client
client = OpenAI()

# Set global plot style
plt.style.use('seaborn-v0_8')

# Set plots to be displayed in notebook
%matplotlib inline

---

#### Setting up prompts for the experiment



- LLMs used in the experiment:
    - GPT-3.5-Turbo         (ID = 1)
    - GPT-4-1106-Preview    (ID = 2)
    - Llama-2-70b           (ID = 3)

We can differentiate between the following scenario combinations:

- Initial ticket price:
    - free                  (ID = 1)
    - $5 (as on ticket)     (ID = 2)
    - $10                   (ID = 3)
- Current market price:
    - $5                    (ID = 1)
    - $10                   (ID = 2)
- Selling to:
    - Friend                (ID = 1)
    - Stranger              (ID = 2)



Similar to the Prospect Theory and Decoy Effect notebooks, we will use experiment IDs to run the study. The IDs will be constructed as:

TU_model_initialprice_currentprice_buyer

Therefore, TU_2_2_1_2 would mean we used GPT-4-1106-Preview, an initial ticket price of $5, a current market price of $5 as well and we are selling to a stranger.
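
The ID scheme above can be decoded mechanically. The following is a minimal sketch, assuming the ID tables listed in this section; the names `MODELS`, `INITIAL_PRICES`, `MARKET_PRICES`, `BUYERS` and `decode_experiment_id` are illustrative and do not appear elsewhere in the notebook.

```python
# Lookup tables mirroring the ID lists above (illustrative names, not used elsewhere in the notebook)
MODELS = {1: "GPT-3.5-Turbo", 2: "GPT-4-1106-Preview", 3: "Llama-2-70b"}
INITIAL_PRICES = {1: 0, 2: 5, 3: 10}   # dollars paid for the ticket
MARKET_PRICES = {1: 5, 2: 10}          # current going price in dollars
BUYERS = {1: "friend", 2: "stranger"}


def decode_experiment_id(experiment_id):
    """Split an ID such as 'TU_2_2_1_2' into its four scenario components."""
    prefix, model, initial, market, buyer = experiment_id.split("_")
    if prefix != "TU":
        raise ValueError(f"Unexpected prefix in {experiment_id!r}")
    return {
        "model": MODELS[int(model)],
        "initial_price": INITIAL_PRICES[int(initial)],
        "market_price": MARKET_PRICES[int(market)],
        "buyer": BUYERS[int(buyer)],
    }


print(decode_experiment_id("TU_2_2_1_2"))
```

For `TU_2_2_1_2` this returns GPT-4-1106-Preview, an initial price of 5, a market price of 5 and a stranger as buyer, matching the example above.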

We omit the detail that the respondent is a student from all prompts. In our experience, more concise prompts yield better answers. 
Since the respondent's job status or education level is not of interest here, dropping it lets us formulate clearer prompts. 

In [36]:
# Set up list of initial costs
initial_costs = ["but you were given your tickets for free by a friend.", "which is what you paid for each ticket.", "but you paid $10 each for your tickets when you bought them."]

# Set up list of current ticket prices
orientation_prices = ["$5", "$10"]

# Set up list of potential buyers in a scenario
potential_buyers = ["friend?", "stranger?"]

- Constructing the prompts

In [37]:
TU_prompts = []
for costs in initial_costs:
    for orientation_price in orientation_prices:
        for potential_buyer in potential_buyers:
            prompt = f"""Imagine that you are going to a sold-out Cornell hockey playoff game, and you have an extra ticket to sell or give away. The price marked on the ticket is $5 {costs}
            You get to the game early to make sure you get rid of the ticket. An informal survey of people selling tickets indicates that the going price is {orientation_price}. 
            You find someone who wants the ticket and takes out his wallet to pay you. He asks how much you want for the ticket. 
            Assume that there is no law against charging a price higher than that marked on the ticket. What price do you ask for, if he is a {potential_buyer}"""
            TU_prompts.append(prompt)

- prompts[0]: free, $5, friend  -> 1_1_1 (Configuration 1)
- prompts[1]: free, $5, stranger -> 1_1_2 (Configuration 2)
- prompts[2]: free, $10, friend -> 1_2_1 (Configuration 3)
- prompts[3]: free, $10, stranger -> 1_2_2 (Configuration 4)
- prompts[4]: $5, $5, friend -> 2_1_1 (Configuration 5)
- prompts[5]: $5, $5, stranger -> 2_1_2 (Configuration 6)
- prompts[6]: $5, $10, friend -> 2_2_1 (Configuration 7)
- prompts[7]: $5, $10, stranger -> 2_2_2 (Configuration 8)
- prompts[8]: $10, $5, friend -> 3_1_1  (Configuration 9)
- prompts[9]: $10, $5, stranger -> 3_1_2 (Configuration 10)
- prompts[10]: $10, $10, friend -> 3_2_1 (Configuration 11)
- prompts[11]: $10, $10, stranger -> 3_2_2 (Configuration 12)
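
The mapping in the list above follows directly from the order of the three nested loops. As a small sketch (the helper `prompt_index` is hypothetical, not part of the notebook), the list position can be computed from the three scenario IDs and checked against the enumeration order:

```python
from itertools import product

# Nested loops over 3 initial costs, 2 market prices and 2 buyers visit the
# combinations in the same order as itertools.product over the ID ranges.
configurations = list(product([1, 2, 3], [1, 2], [1, 2]))


def prompt_index(initial_id, price_id, buyer_id):
    """Position in TU_prompts of the prompt for a given ID triple (1-based IDs)."""
    return 4 * (initial_id - 1) + 2 * (price_id - 1) + (buyer_id - 1)


for index, (initial_id, price_id, buyer_id) in enumerate(configurations):
    # The closed-form index must agree with the enumeration order
    assert prompt_index(initial_id, price_id, buyer_id) == index
    print(f"TU_prompts[{index}] -> {initial_id}_{price_id}_{buyer_id} (Configuration {index + 1})")
```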


- Setting up instructions the model should abide by

In [38]:
instructions = "Answer by only giving a single price in dollars and cents without an explanation."

---

#### Dictionaries to extract information about the different experiments

In [39]:
# Dictionary to look up prompt for a given experiment id. key: experiment id, value: prompt
TU_experiment_prompts_dict = {
    "TU_1_1_1_1": TU_prompts[0],
    "TU_1_1_1_2": TU_prompts[1],
    "TU_1_1_2_1": TU_prompts[2],
    "TU_1_1_2_2": TU_prompts[3],
    "TU_1_2_1_1": TU_prompts[4],
    "TU_1_2_1_2": TU_prompts[5],
    "TU_1_2_2_1": TU_prompts[6],
    "TU_1_2_2_2": TU_prompts[7],
    "TU_1_3_1_1": TU_prompts[8],
    "TU_1_3_1_2": TU_prompts[9],
    "TU_1_3_2_1": TU_prompts[10],
    "TU_1_3_2_2": TU_prompts[11],
    "TU_2_1_1_1": TU_prompts[0],
    "TU_2_1_1_2": TU_prompts[1],
    "TU_2_1_2_1": TU_prompts[2],
    "TU_2_1_2_2": TU_prompts[3],
    "TU_2_2_1_1": TU_prompts[4],
    "TU_2_2_1_2": TU_prompts[5],
    "TU_2_2_2_1": TU_prompts[6],
    "TU_2_2_2_2": TU_prompts[7],
    "TU_2_3_1_1": TU_prompts[8],
    "TU_2_3_1_2": TU_prompts[9],
    "TU_2_3_2_1": TU_prompts[10],
    "TU_2_3_2_2": TU_prompts[11],
    "TU_3_1_1_1": TU_prompts[0],
    "TU_3_1_1_2": TU_prompts[1],
    "TU_3_1_2_1": TU_prompts[2],
    "TU_3_1_2_2": TU_prompts[3],
    "TU_3_2_1_1": TU_prompts[4],
    "TU_3_2_1_2": TU_prompts[5],
    "TU_3_2_2_1": TU_prompts[6],
    "TU_3_2_2_2": TU_prompts[7],
    "TU_3_3_1_1": TU_prompts[8],
    "TU_3_3_1_2": TU_prompts[9],
    "TU_3_3_2_1": TU_prompts[10],
    "TU_3_3_2_2": TU_prompts[11],
}

# Dictionary to look up which model to use for a given experiment id. key: experiment id, value: model name
TU_model_dict = {
    "TU_1_1_1_1": "gpt-3.5-turbo",
    "TU_1_1_1_2": "gpt-3.5-turbo",
    "TU_1_1_2_1": "gpt-3.5-turbo",
    "TU_1_1_2_2": "gpt-3.5-turbo",
    "TU_1_2_1_1": "gpt-3.5-turbo",
    "TU_1_2_1_2": "gpt-3.5-turbo",
    "TU_1_2_2_1": "gpt-3.5-turbo",
    "TU_1_2_2_2": "gpt-3.5-turbo",
    "TU_1_3_1_1": "gpt-3.5-turbo",
    "TU_1_3_1_2": "gpt-3.5-turbo",
    "TU_1_3_2_1": "gpt-3.5-turbo",
    "TU_1_3_2_2": "gpt-3.5-turbo",
    "TU_2_1_1_1": "gpt-4-1106-preview",
    "TU_2_1_1_2": "gpt-4-1106-preview",
    "TU_2_1_2_1": "gpt-4-1106-preview",
    "TU_2_1_2_2": "gpt-4-1106-preview",
    "TU_2_2_1_1": "gpt-4-1106-preview",
    "TU_2_2_1_2": "gpt-4-1106-preview",
    "TU_2_2_2_1": "gpt-4-1106-preview",
    "TU_2_2_2_2": "gpt-4-1106-preview",
    "TU_2_3_1_1": "gpt-4-1106-preview",
    "TU_2_3_1_2": "gpt-4-1106-preview",
    "TU_2_3_2_1": "gpt-4-1106-preview",
    "TU_2_3_2_2": "gpt-4-1106-preview",
    "TU_3_1_1_1": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU_3_1_1_2": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU_3_1_2_1": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU_3_1_2_2": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU_3_2_1_1": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU_3_2_1_2": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU_3_2_2_1": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU_3_2_2_2": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU_3_3_1_1": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU_3_3_1_2": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU_3_3_2_1": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU_3_3_2_2": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
}

# Dictionary to look up what prompt was used for a given experiment id. key: experiment id, value: prompt variable name
TU_prompt_ids_dict = {
    "TU_1_1_1_1": "TU_prompts[0]",
    "TU_1_1_1_2": "TU_prompts[1]",
    "TU_1_1_2_1": "TU_prompts[2]",
    "TU_1_1_2_2": "TU_prompts[3]",
    "TU_1_2_1_1": "TU_prompts[4]",
    "TU_1_2_1_2": "TU_prompts[5]",
    "TU_1_2_2_1": "TU_prompts[6]",
    "TU_1_2_2_2": "TU_prompts[7]",
    "TU_1_3_1_1": "TU_prompts[8]",
    "TU_1_3_1_2": "TU_prompts[9]",
    "TU_1_3_2_1": "TU_prompts[10]",
    "TU_1_3_2_2": "TU_prompts[11]",
    "TU_2_1_1_1": "TU_prompts[0]",
    "TU_2_1_1_2": "TU_prompts[1]",
    "TU_2_1_2_1": "TU_prompts[2]",
    "TU_2_1_2_2": "TU_prompts[3]",
    "TU_2_2_1_1": "TU_prompts[4]",
    "TU_2_2_1_2": "TU_prompts[5]",
    "TU_2_2_2_1": "TU_prompts[6]",
    "TU_2_2_2_2": "TU_prompts[7]",
    "TU_2_3_1_1": "TU_prompts[8]",
    "TU_2_3_1_2": "TU_prompts[9]",
    "TU_2_3_2_1": "TU_prompts[10]",
    "TU_2_3_2_2": "TU_prompts[11]",
    "TU_3_1_1_1": "TU_prompts[0]",
    "TU_3_1_1_2": "TU_prompts[1]",
    "TU_3_1_2_1": "TU_prompts[2]",
    "TU_3_1_2_2": "TU_prompts[3]",
    "TU_3_2_1_1": "TU_prompts[4]",
    "TU_3_2_1_2": "TU_prompts[5]",
    "TU_3_2_2_1": "TU_prompts[6]",
    "TU_3_2_2_2": "TU_prompts[7]",
    "TU_3_3_1_1": "TU_prompts[8]",
    "TU_3_3_1_2": "TU_prompts[9]",
    "TU_3_3_2_1": "TU_prompts[10]",
    "TU_3_3_2_2": "TU_prompts[11]",
    }

# Dictionary to look up initial ticket costs for a given experiment id. key: experiment id, value: initial costs
TU_initial_costs_dict = {
    "TU_1_1_1_1": 0,
    "TU_1_1_1_2": 0,
    "TU_1_1_2_1": 0,
    "TU_1_1_2_2": 0,
    "TU_1_2_1_1": 5,
    "TU_1_2_1_2": 5,
    "TU_1_2_2_1": 5,
    "TU_1_2_2_2": 5,
    "TU_1_3_1_1": 10,
    "TU_1_3_1_2": 10,
    "TU_1_3_2_1": 10,
    "TU_1_3_2_2": 10,
    "TU_2_1_1_1": 0,
    "TU_2_1_1_2": 0,
    "TU_2_1_2_1": 0,
    "TU_2_1_2_2": 0,
    "TU_2_2_1_1": 5,
    "TU_2_2_1_2": 5,
    "TU_2_2_2_1": 5,
    "TU_2_2_2_2": 5,
    "TU_2_3_1_1": 10,
    "TU_2_3_1_2": 10,
    "TU_2_3_2_1": 10,
    "TU_2_3_2_2": 10,
    "TU_3_1_1_1": 0,
    "TU_3_1_1_2": 0,
    "TU_3_1_2_1": 0,
    "TU_3_1_2_2": 0,
    "TU_3_2_1_1": 5,
    "TU_3_2_1_2": 5,
    "TU_3_2_2_1": 5,
    "TU_3_2_2_2": 5,
    "TU_3_3_1_1": 10,
    "TU_3_3_1_2": 10,
    "TU_3_3_2_1": 10,
    "TU_3_3_2_2": 10,
    }

# Dictionary to look up orientation prices for a given experiment id. key: experiment id, value: orientation price
TU_orientation_prices_dict = {
    "TU_1_1_1_1": 5,
    "TU_1_1_1_2": 5,
    "TU_1_1_2_1": 10,
    "TU_1_1_2_2": 10,
    "TU_1_2_1_1": 5,
    "TU_1_2_1_2": 5,
    "TU_1_2_2_1": 10,
    "TU_1_2_2_2": 10,
    "TU_1_3_1_1": 5,
    "TU_1_3_1_2": 5,
    "TU_1_3_2_1": 10,
    "TU_1_3_2_2": 10,
    "TU_2_1_1_1": 5,
    "TU_2_1_1_2": 5,
    "TU_2_1_2_1": 10,
    "TU_2_1_2_2": 10,
    "TU_2_2_1_1": 5,
    "TU_2_2_1_2": 5,
    "TU_2_2_2_1": 10,
    "TU_2_2_2_2": 10,
    "TU_2_3_1_1": 5,
    "TU_2_3_1_2": 5,
    "TU_2_3_2_1": 10,
    "TU_2_3_2_2": 10,
    "TU_3_1_1_1": 5,
    "TU_3_1_1_2": 5,
    "TU_3_1_2_1": 10,
    "TU_3_1_2_2": 10,
    "TU_3_2_1_1": 5,
    "TU_3_2_1_2": 5,
    "TU_3_2_2_1": 10,
    "TU_3_2_2_2": 10,
    "TU_3_3_1_1": 5,
    "TU_3_3_1_2": 5,
    "TU_3_3_2_1": 10,
    "TU_3_3_2_2": 10,
    }   

# Dictionary to look up potential buyers for a given experiment id. key: experiment id, value: potential buyer
TU_buyers_dict = {
    "TU_1_1_1_1": "friend",
    "TU_1_1_1_2": "stranger",
    "TU_1_1_2_1": "friend",
    "TU_1_1_2_2": "stranger",
    "TU_1_2_1_1": "friend",
    "TU_1_2_1_2": "stranger",
    "TU_1_2_2_1": "friend",
    "TU_1_2_2_2": "stranger",
    "TU_1_3_1_1": "friend",
    "TU_1_3_1_2": "stranger",
    "TU_1_3_2_1": "friend",
    "TU_1_3_2_2": "stranger",
    "TU_2_1_1_1": "friend",
    "TU_2_1_1_2": "stranger",
    "TU_2_1_2_1": "friend",
    "TU_2_1_2_2": "stranger",
    "TU_2_2_1_1": "friend",
    "TU_2_2_1_2": "stranger",
    "TU_2_2_2_1": "friend",
    "TU_2_2_2_2": "stranger",
    "TU_2_3_1_1": "friend",
    "TU_2_3_1_2": "stranger",
    "TU_2_3_2_1": "friend",
    "TU_2_3_2_2": "stranger",
    "TU_3_1_1_1": "friend",
    "TU_3_1_1_2": "stranger",
    "TU_3_1_2_1": "friend",
    "TU_3_1_2_2": "stranger",
    "TU_3_2_1_1": "friend",
    "TU_3_2_1_2": "stranger",
    "TU_3_2_2_1": "friend",
    "TU_3_2_2_2": "stranger",
    "TU_3_3_1_1": "friend",
    "TU_3_3_1_2": "stranger",
    "TU_3_3_2_1": "friend",
    "TU_3_3_2_2": "stranger",
}
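
Because every lookup dictionary follows the same ID pattern, the same tables could also be generated in a loop instead of being written out by hand. The sketch below is one possible way to do that; the `*_by_id` and `*_lookup` names are hypothetical and are not used elsewhere in the notebook.

```python
from itertools import product

# ID -> value tables taken from the scenario description (variable names are hypothetical)
model_names = {
    1: "gpt-3.5-turbo",
    2: "gpt-4-1106-preview",
    3: "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
}
initial_costs_by_id = {1: 0, 2: 5, 3: 10}
orientation_prices_by_id = {1: 5, 2: 10}
buyers_by_id = {1: "friend", 2: "stranger"}

model_lookup, cost_lookup, price_lookup, buyer_lookup, prompt_index_lookup = {}, {}, {}, {}, {}
for m, c, p, b in product(model_names, initial_costs_by_id, orientation_prices_by_id, buyers_by_id):
    experiment_id = f"TU_{m}_{c}_{p}_{b}"
    model_lookup[experiment_id] = model_names[m]
    cost_lookup[experiment_id] = initial_costs_by_id[c]
    price_lookup[experiment_id] = orientation_prices_by_id[p]
    buyer_lookup[experiment_id] = buyers_by_id[b]
    # Index into TU_prompts, following the order of the nested prompt loops
    prompt_index_lookup[experiment_id] = 4 * (c - 1) + 2 * (p - 1) + (b - 1)

print(len(buyer_lookup), "experiment IDs generated")
```

Generating the tables this way guarantees that each of the 3 x 3 x 2 x 2 = 36 IDs appears in every dictionary.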

---

#### Setting up functions to repeatedly prompt the LLMs

- Helper function to extract dollar amount of given answers

In [40]:
# Function to extract the dollar amount of the answer from LLMs
def extract_dollar_amounts(answers):
    # Only return values that start with "$"
    valid_prices = [item for item in answers if item.startswith("$")]
    # Delete the "$" from the beginning of each price
    prices = [item.replace('$', '') for item in valid_prices]
    return prices
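
The helper above returns the prices as strings and only accepts answers that begin with "$". If numeric values are needed later (for example for plotting), a regular expression is one option. This is a sketch under that assumption; `PRICE_PATTERN` and `parse_prices` are hypothetical names. Unlike `extract_dollar_amounts`, it also picks up a dollar amount that appears in the middle of an answer.

```python
import re

# Matches a dollar sign followed by digits and an optional decimal part, e.g. "$5" or "$7.50"
PRICE_PATTERN = re.compile(r"\$\s*(\d+(?:\.\d+)?)")


def parse_prices(answers):
    """Return the first dollar amount of each answer as a float, skipping answers without one."""
    prices = []
    for answer in answers:
        match = PRICE_PATTERN.search(answer)
        if match:
            prices.append(float(match.group(1)))
    return prices


print(parse_prices(["$5", "$7.50", "I would ask for $10.", "five dollars"]))
# -> [5.0, 7.5, 10.0]
```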

- Function to query one prompt n times for OpenAI models

In [41]:
def TU_run_experiment(experiment_id, n, progress_bar, temperature):
    
    answers = []
    for _ in range(n): 
        response = client.chat.completions.create(
            model = TU_model_dict[experiment_id], 
            max_tokens = 2,
            temperature = temperature, # range is 0 to 2
            messages = [
            {"role": "system", "content": "Answer by only giving a single price in dollars and cents without an explanation."},        
            {"role": "user", "content": 
             f"{TU_experiment_prompts_dict[experiment_id]} Answer by only giving a single price in dollars and cents without an explanation."}
                   ])

        # Store the answer in the list
        answer = response.choices[0].message.content
        answers.append(answer.strip())
        # Update progress bar (given from either temperature loop, or set locally)
        progress_bar.update(1)

    # Extract valid prices from answers
    valid_prices = extract_dollar_amounts(answers)

    # Compute number of valid answers
    n_observations = len(valid_prices)


    # Collect results 
    results = [experiment_id, temperature, TU_model_dict[experiment_id], TU_initial_costs_dict[experiment_id], TU_orientation_prices_dict[experiment_id], TU_buyers_dict[experiment_id], answers, n_observations]
    #results = pd.DataFrame(results).set_index(pd.Index(["experiment_id", "temperature", "model", "initial_cost", "orientation_price", "buyer", "answers", "Obs."]))
    # Give out results
    return results

- Function to query one prompt n times (Llama)

In [68]:
def TU_run_experiment_llama(experiment_id, n, progress_bar, temperature):
    answers = []
    for _ in range(n):
        response = replicate.run(
            TU_model_dict[experiment_id],
            input = {
                "system_prompt":  "Answer by only giving a single price in dollars and cents.",
                "temperature": temperature,
                "max_new_tokens": 3, 
                "prompt": f"{TU_experiment_prompts_dict[experiment_id]} Answer by only giving a single price in dollars and cents without an explanation."
            }
        )
        # Grab answer and append to list
        answer = "" # Set to empty string, otherwise it would append the previous answer to the new one
        for item in response:
            answer = answer + item
        answers.append(answer.strip())

        # Update progress bar
        progress_bar.update(1)

    
    # Extract valid prices from answers
    valid_prices = extract_dollar_amounts(answers)

    # Compute number of valid answers
    n_observations = len(valid_prices)

    # Collect results 
    results = [experiment_id, temperature, TU_model_dict[experiment_id], TU_initial_costs_dict[experiment_id], 
               TU_orientation_prices_dict[experiment_id], TU_buyers_dict[experiment_id], answers, n_observations]
    #results = pd.DataFrame(results).set_index(pd.Index(["experiment_id", "temperature", "model", "initial_cost", "orientation_price", "buyer", "answers", "Obs."]))
    # Give out results
    return results

In [69]:
results = TU_run_experiment_llama("TU_3_1_1_1", 10, tqdm(total=10), 0.5)

100%|██████████| 10/10 [00:12<00:00,  1.23s/it]


In [70]:
results

['TU_3_1_1_1',
 0.5,
 'meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3',
 0,
 5,
 'friend',
 ['$1', '$1', '$1', '$1', '$1', '$1', '$1', '$1', '$1', '$1'],
 10]

- Function to loop TU_run_experiment() over a list of temperature values

In [43]:
def TU_temperature_loop(function, experiment_id, temperature_list = [0.5, 1, 1.5], n = 50):
    """
    Function to run an experiment with different temperature values.
    
    Args:
        function (function): Function used to query the model, i.e. TU_run_experiment() or TU_run_experiment_llama()
        experiment_id (str): ID of the experiment to be run. Contains info about prompt and model
        temperature_list (list): List of temperature values to be looped over
        n (int): Number of requests for each prompt per temperature value
        
    Returns:
        results_df: Dataframe with experiment results (one column per temperature value)
    """    
    # Empty list for storing results
    results_list = []

    # Initialize progress bar -> used as input for run_experiment()
    progress_bar = tqdm(range(n*len(temperature_list)))

    # Loop over different temperature values, calling the input function once per value (each call queries the model n times)
    for temperature in temperature_list:
        results = function(experiment_id = experiment_id, n = n, temperature = temperature, progress_bar = progress_bar) 
        results_list.append(results)
       

    # Horizontally concatenate the results, transpose, and set index
    results_df = pd.DataFrame(results_list).transpose().set_index(pd.Index(["experiment_id", "temperature", "model", "initial_cost", "orientation_price", "buyer", "answers", "Obs."]))
  
   
    # Return some information about the experiment as a check
    check = f"In this run, a total of {n*len(temperature_list)} requests were made using {TU_prompt_ids_dict[experiment_id]}."
    # Print information about the experiment
    print(check)
 

    return results_df

---

### Model 1: GPT-3.5-Turbo

In [57]:
# For GPT-3.5-turbo we make N = 200 requests per prompt & temperature value
# Also, we will focus on lower temperature values to get more concise answers. For higher temperature values, the 
# answers almost always contain the same information, but come with an unnecessary explanation (e.g. "I would ask for $10, because that is the price that everyone else is asking for.")
N = 200

- Configuration 1

In [None]:
results_1 = TU_temperature_loop(TU_run_experiment, "TU_1_1_1_1", temperature_list = [0.5, 1, 1.5], n = N)

In [59]:
test = results_1.loc["answers"]

In [None]:
plt.hist(results_1.loc["answers"], bins = 20)
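
The "answers" row holds one list of strings such as "$5" per temperature value, so a numeric summary presumably needs the strings converted first. The sketch below uses made-up placeholder answers (NOT real model output) and a hypothetical helper `price_counts` to count the distinct prices per temperature. It assumes every answer starting with "$" is otherwise a clean number.

```python
from collections import Counter

# Made-up placeholder answers, NOT real model output: one list per temperature value
answers_by_temperature = {
    0.5: ["$5", "$5", "$5", "$0"],
    1.0: ["$5", "$10", "$5", "free"],
    1.5: ["$10", "$5", "$7.50", "$5"],
}


def price_counts(answers):
    """Count numeric prices among the answers that start with '$'."""
    prices = [float(a.replace("$", "")) for a in answers if a.startswith("$")]
    return Counter(prices)


for temperature, answers in answers_by_temperature.items():
    counts = price_counts(answers)
    print(temperature, sorted(counts.items()))
```

The resulting float lists could then be passed to `plt.hist` one temperature at a time.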

- Configuration 2

In [None]:
results_2 = TU_temperature_loop(TU_run_experiment, "TU_1_1_1_2", temperature_list = [0.5, 1, 1.5], n = N)

- Configuration 3

In [None]:
results_3 = TU_temperature_loop(TU_run_experiment, "TU_1_1_2_1", temperature_list = [0.5, 1, 1.5], n = N)

- Configuration 4

In [None]:
results_4 = TU_temperature_loop(TU_run_experiment, "TU_1_1_2_2", temperature_list = [0.5, 1, 1.5], n = N)

- Configuration 5

In [None]:
results_5 = TU_temperature_loop(TU_run_experiment, "TU_1_2_1_1", temperature_list = [0.5, 1, 1.5], n = N)

- Configuration 6

In [None]:
results_6 = TU_temperature_loop(TU_run_experiment, "TU_1_2_1_2", temperature_list = [0.5, 1, 1.5], n = N)

- Configuration 7

In [None]:
results_7 = TU_temperature_loop(TU_run_experiment, "TU_1_2_2_1", temperature_list = [0.5, 1, 1.5], n = N)

- Configuration 8

In [None]:
results_8 = TU_temperature_loop(TU_run_experiment, "TU_1_2_2_2", temperature_list = [0.5, 1, 1.5], n = N)

- Configuration 9

In [None]:
results_9 = TU_temperature_loop(TU_run_experiment, "TU_1_3_1_1", temperature_list = [0.5, 1, 1.5], n = N)

- Configuration 10

In [None]:
results_10 = TU_temperature_loop(TU_run_experiment, "TU_1_3_1_2", temperature_list = [0.5, 1, 1.5], n = N)

- Configuration 11

In [None]:
results_11 = TU_temperature_loop(TU_run_experiment, "TU_1_3_2_1", temperature_list = [0.5, 1, 1.5], n = N)

- Configuration 12

In [None]:
results_12 = TU_temperature_loop(TU_run_experiment, "TU_1_3_2_2", temperature_list = [0.5, 1, 1.5], n = N)

---

### Model 2: GPT-4-1106-Preview

In [None]:
N = 50

- Configuration 1

In [None]:
results_1_2 = TU_temperature_loop(TU_run_experiment, "TU_2_1_1_1", temperature_list = [0.5, 1, 1.5], n = N)
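# DataFrame.append does not modify in place (and was removed in pandas 2.0), so concatenate and reassign
results_1 = pd.concat([results_1, results_1_2], axis = 1)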

- Configuration 2

In [None]:
results_2_2 = TU_temperature_loop(TU_run_experiment, "TU_2_1_1_2", temperature_list = [0.5, 1, 1.5], n = N)
results_2 = pd.concat([results_2, results_2_2], axis = 1)

- Configuration 3

In [None]:
results_3_2 = TU_temperature_loop(TU_run_experiment, "TU_2_1_2_1", temperature_list = [0.5, 1, 1.5], n = N)
results_3 = pd.concat([results_3, results_3_2], axis = 1)

- Configuration 4

In [None]:
results_4_2 = TU_temperature_loop(TU_run_experiment, "TU_2_1_2_2", temperature_list = [0.5, 1, 1.5], n = N)
results_4 = pd.concat([results_4, results_4_2], axis = 1)

- Configuration 5

In [None]:
results_5_2 = TU_temperature_loop(TU_run_experiment, "TU_2_2_1_1", temperature_list = [0.5, 1, 1.5], n = N)
results_5 = pd.concat([results_5, results_5_2], axis = 1)

- Configuration 6

In [None]:
results_6_2 = TU_temperature_loop(TU_run_experiment, "TU_2_2_1_2", temperature_list = [0.5, 1, 1.5], n = N)
results_6 = pd.concat([results_6, results_6_2], axis = 1)

- Configuration 7

In [None]:
results_7_2 = TU_temperature_loop(TU_run_experiment, "TU_2_2_2_1", temperature_list = [0.5, 1, 1.5], n = N)
results_7 = pd.concat([results_7, results_7_2], axis = 1)

- Configuration 8

In [None]:
results_8_2 = TU_temperature_loop(TU_run_experiment, "TU_2_2_2_2", temperature_list = [0.5, 1, 1.5], n = N)
results_8 = pd.concat([results_8, results_8_2], axis = 1)

- Configuration 9

In [None]:
results_9_2 = TU_temperature_loop(TU_run_experiment, "TU_2_3_1_1", temperature_list = [0.5, 1, 1.5], n = N)
results_9 = pd.concat([results_9, results_9_2], axis = 1)

- Configuration 10

In [None]:
results_10_2 = TU_temperature_loop(TU_run_experiment, "TU_2_3_1_2", temperature_list = [0.5, 1, 1.5], n = N)
results_10 = pd.concat([results_10, results_10_2], axis = 1)

- Configuration 11

In [None]:
results_11_2 = TU_temperature_loop(TU_run_experiment, "TU_2_3_2_1", temperature_list = [0.5, 1, 1.5], n = N)
results_11 = pd.concat([results_11, results_11_2], axis = 1)

- Configuration 12

In [None]:
results_12_2 = TU_temperature_loop(TU_run_experiment, "TU_2_3_2_2", temperature_list = [0.5, 1, 1.5], n = N)
results_12 = pd.concat([results_12, results_12_2], axis = 1)

---

### Model 3: Llama-2-70b

In [63]:
N = 50

- Configuration 1

In [None]:
results_1_3 = TU_temperature_loop(TU_run_experiment_llama, "TU_3_1_1_1", temperature_list = [0.5, 1, 1.5], n = N)
results_1 = pd.concat([results_1, results_1_3], axis = 1)

- Configuration 2

In [None]:
results_2_3 = TU_temperature_loop(TU_run_experiment_llama, "TU_3_1_1_2", temperature_list = [0.5, 1, 1.5], n = N)
results_2 = pd.concat([results_2, results_2_3], axis = 1)

- Configuration 3

In [None]:
results_3_3 = TU_temperature_loop(TU_run_experiment_llama, "TU_3_1_2_1", temperature_list = [0.5, 1, 1.5], n = N)
results_3 = pd.concat([results_3, results_3_3], axis = 1)

- Configuration 4

In [None]:
results_4_3 = TU_temperature_loop(TU_run_experiment_llama, "TU_3_1_2_2", temperature_list = [0.5, 1, 1.5], n = N)
results_4 = pd.concat([results_4, results_4_3], axis = 1)

- Configuration 5

In [None]:
results_5_3 = TU_temperature_loop(TU_run_experiment_llama, "TU_3_2_1_1", temperature_list = [0.5, 1, 1.5], n = N)
results_5 = pd.concat([results_5, results_5_3], axis = 1)

- Configuration 6

In [None]:
results_6_3 = TU_temperature_loop(TU_run_experiment_llama, "TU_3_2_1_2", temperature_list = [0.5, 1, 1.5], n = N)
results_6 = pd.concat([results_6, results_6_3], axis = 1)

- Configuration 7

In [None]:
results_7_3 = TU_temperature_loop(TU_run_experiment_llama, "TU_3_2_2_1", temperature_list = [0.5, 1, 1.5], n = N)
results_7 = pd.concat([results_7, results_7_3], axis = 1)

- Configuration 8

In [None]:
results_8_3 = TU_temperature_loop(TU_run_experiment_llama, "TU_3_2_2_2", temperature_list = [0.5, 1, 1.5], n = N)
results_8 = pd.concat([results_8, results_8_3], axis = 1)

- Configuration 9

In [None]:
results_9_3 = TU_temperature_loop(TU_run_experiment_llama, "TU_3_3_1_1", temperature_list = [0.5, 1, 1.5], n = N)
results_9 = pd.concat([results_9, results_9_3], axis = 1)

- Configuration 10

In [None]:
results_10_3 = TU_temperature_loop(TU_run_experiment_llama, "TU_3_3_1_2", temperature_list = [0.5, 1, 1.5], n = N)
results_10 = pd.concat([results_10, results_10_3], axis = 1)

- Configuration 11

In [None]:
results_11_3 = TU_temperature_loop(TU_run_experiment_llama, "TU_3_3_2_1", temperature_list = [0.5, 1, 1.5], n = N)
results_11 = pd.concat([results_11, results_11_3], axis = 1)

- Configuration 12

In [None]:
results_12_3 = TU_temperature_loop(TU_run_experiment_llama, "TU_3_3_2_2", temperature_list = [0.5, 1, 1.5], n = N)
results_12 = pd.concat([results_12, results_12_3], axis = 1)

---

#### Gather all results and save to .csv

In [None]:
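# Each results_k has the 8 result fields as rows and one column per run, so axis = 1 keeps one column per run.
# Transpose the result (TU_results.T) if one row per run is preferred before saving.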

TU_results = pd.concat([results_1, results_2, results_3, results_4, results_5, results_6, results_7, results_8, results_9, results_10, results_11, results_12], axis = 1)
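
To check the layout question raised in the comment above without calling any API, the concatenation can be tried on placeholder frames. The sketch below assumes frames shaped like the output of `TU_temperature_loop` (8 labelled rows, one column per temperature). The helper `fake_results` and its contents are placeholders, not real results, and the CSV is produced as a string so that no file name has to be assumed.

```python
import pandas as pd

row_labels = ["experiment_id", "temperature", "model", "initial_cost",
              "orientation_price", "buyer", "answers", "Obs."]


def fake_results(experiment_id, temperatures):
    """Build a placeholder frame shaped like the output of TU_temperature_loop."""
    rows = [[experiment_id, t, "placeholder-model", 0, 5, "friend", ["$5"], 1] for t in temperatures]
    return pd.DataFrame(rows).transpose().set_index(pd.Index(row_labels))


results_a = fake_results("TU_1_1_1_1", [0.5, 1, 1.5])
results_b = fake_results("TU_1_1_1_2", [0.5, 1, 1.5])

# axis=1 places the runs side by side: 8 rows, one column per run
wide = pd.concat([results_a, results_b], axis=1)

# Transposing gives one row per run, which is a common layout for a CSV file
tidy = wide.transpose().reset_index(drop=True)
csv_text = tidy.to_csv(index=False)

print(wide.shape, tidy.shape)
```

With two placeholder experiments and three temperatures each, `wide` has 8 rows and 6 columns, while `tidy` has 6 rows and 8 columns. Passing a path to `to_csv` instead of reading its return value would write the file to disk.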