# Transaction Utility Theory 2

This notebook aims to recreate some of the empirical findings of Thaler, R. (1985). Mental accounting and consumer choice. Marketing Science, 4(3), 199-214.
Specifically, we are interested in whether the LLMs' responses are similar to the original responses in **section 3** of the paper, regarding the beer question.

However, we extend this experiment with the notion of income sensitivity. The income values are inspired by Brand, James and Israeli, Ayelet and Ngwe, Donald, Using GPT for Market Research (March 21, 2023). Harvard Business School Marketing Unit Working Paper No. 23-062, Available at SSRN: https://ssrn.com/abstract=4395751 or http://dx.doi.org/10.2139/ssrn.4395751. That paper studies the willingness to pay for various goods, prompting the LLM with a stated annual income of $50k, $70k, or $120k.

---

In [42]:
from openai import OpenAI
import openai
import matplotlib.pyplot as plt
import os 
import numpy as np
import pandas as pd
from tqdm import tqdm
import replicate
from ast import literal_eval

In [43]:
# Get OpenAI API key (previously saved as an environment variable)
openai.api_key = os.environ["OPENAI_API_KEY"]

# Set client
client = OpenAI()

# Set global plot style
plt.style.use('seaborn-v0_8')

# Set plots to be displayed in notebook
%matplotlib inline

---

#### Setting up prompts for the experiment

- LLMs used in the experiment:
    - GPT-3.5-Turbo         (ID = 1)
    - GPT-4-1106-Preview    (ID = 2)
    - LLama-70b             (ID = 3)

We can differentiate between the following scenario combinations:

- Place of purchase:
    - A fancy resort hotel              (ID = 1)
    - A small, run-down grocery store   (ID = 2)

- Level of annual income:
    - No information about income       (ID = 1)
    - $50,000                           (ID = 2)
    - $70,000                           (ID = 3)
    - $120,000                          (ID = 4)

Similar to the Prospect Theory and Decoy Effect notebooks, we will use experiment IDs to run the study. The IDs will be constructed as:

*TU2_model_placeofpurchase_income*

Therefore, TU2_3_2_1 would mean we used LLama-70b, the beer is to be purchased at a small, run-down grocery store, and no information about the annual income is given in the prompt.
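The ID scheme can be decoded mechanically from the ID tables above. The following is a minimal sketch; `decode_experiment_id` and the three lookup tables are hypothetical helpers introduced here for illustration and are not part of the notebook's own code.

```python
# Hypothetical lookup tables, copied from the ID lists in this section.
MODELS = {1: "GPT-3.5-Turbo", 2: "GPT-4-1106-Preview", 3: "LLama-70b"}
PLACES = {1: "fancy resort hotel", 2: "small, run-down grocery store"}
INCOMES = {1: "no income information", 2: "$50k", 3: "$70k", 4: "$120k"}


def decode_experiment_id(experiment_id):
    """Split an ID such as 'TU2_2_1_4' into its model, place and income labels."""
    prefix, model_id, place_id, income_id = experiment_id.split("_")
    if prefix != "TU2":
        raise ValueError(f"Unexpected prefix in {experiment_id!r}")
    return {
        "model": MODELS[int(model_id)],
        "place": PLACES[int(place_id)],
        "income": INCOMES[int(income_id)],
    }


print(decode_experiment_id("TU2_2_1_4"))
# {'model': 'GPT-4-1106-Preview', 'place': 'fancy resort hotel', 'income': '$120k'}
```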

**Median answers in the original study (Thaler, 1985): $2.65 for the hotel and $1.50 for the store**

In [44]:
# Set up list of place of purchase
place = ["a fancy resort hotel.", "a small, run-down grocery store."]

# Set up list of annual income levels
incomes = ["", "You have an annual income of $50k.", "You have an annual income of $70k.", "You have an annual income of $120k."]

In [45]:
# Prompts for resort hotel
hotel_prompts = []
for income in incomes:
    prompt = f"""You are lying on the beach on a hot day. All you have to drink is ice water. For the last hour you have been thinking about how much you would enjoy a nice cold bottle 
    of beer. A companion gets up to go make a phone call and offers to bring back a beer from the only nearby place where beer is sold, a fancy resort hotel. He says that the beer might
    be expensive and so asks how much you are willing to pay for the beer. He says that he will buy the beer if it costs as much or less than the price you state. But if it costs more 
    than the price you state he will not buy it. You trust your friend, and there is no possibility of bargaining with the bartender. {income} What price do you tell him?"""
    hotel_prompts.append(prompt)

In [46]:
# Prompts for grocery store 
store_prompts = []
for income in incomes:
    prompt = f"""You are lying on the beach on a hot day. All you have to drink is ice water. For the last hour you have been thinking about how much you would enjoy a nice cold bottle 
    of beer. A companion gets up to go make a phone call and offers to bring back a beer from the only nearby place where beer is sold, a small, run-down grocery store. He says that
    the beer might be expensive and so asks how much you are willing to pay for the beer. He says that he will buy the beer if it costs as much or less than the price you state. 
    But if it costs more than the price you state he will not buy it. You trust your friend, and there is no possibility of bargaining with the store owner. {income} What price do you tell him?"""
    store_prompts.append(prompt)

In [47]:
# Combine into one list
TU2_prompts = hotel_prompts + store_prompts

- TU2_prompts[0]: fancy resort hotel, no info about income (Configuration 1)
- TU2_prompts[1]: fancy resort hotel, $50k annual income (Configuration 2)
- TU2_prompts[2]: fancy resort hotel, $70k annual income (Configuration 3)
- TU2_prompts[3]: fancy resort hotel, $120k annual income (Configuration 4)
- TU2_prompts[4]: store, no info about income (Configuration 5)
- TU2_prompts[5]: store, $50k annual income (Configuration 6)
- TU2_prompts[6]: store, $70k annual income (Configuration 7)
- TU2_prompts[7]: store, $120k annual income (Configuration 8)

In [48]:
# Gather experiment ids
TU2_experiment_ids = ["TU2_1_1_1", "TU2_1_1_2", "TU2_1_1_3", "TU2_1_1_4", "TU2_1_2_1", "TU2_1_2_2", "TU2_1_2_3", "TU2_1_2_4",
                      "TU2_2_1_1", "TU2_2_1_2", "TU2_2_1_3", "TU2_2_1_4", "TU2_2_2_1", "TU2_2_2_2", "TU2_2_2_3", "TU2_2_2_4",
                      "TU2_3_1_1", "TU2_3_1_2", "TU2_3_1_3", "TU2_3_1_4", "TU2_3_2_1", "TU2_3_2_2", "TU2_3_2_3", "TU2_3_2_4"]

- Setting up instructions the model should abide by

In [49]:
instructions = "Answer by only giving a single price in dollars and cents without an explanation."

---

### Dictionaries to store and extract information about the various experiments

In [50]:
# Dictionary to look up prompt for a given experiment id. key: experiment id, value: prompt
TU2_experiment_prompts_dict = {
    "TU2_1_1_1": TU2_prompts[0],
    "TU2_1_1_2": TU2_prompts[1],
    "TU2_1_1_3": TU2_prompts[2],
    "TU2_1_1_4": TU2_prompts[3],
    "TU2_1_2_1": TU2_prompts[4],
    "TU2_1_2_2": TU2_prompts[5],
    "TU2_1_2_3": TU2_prompts[6],
    "TU2_1_2_4": TU2_prompts[7],
    "TU2_2_1_1": TU2_prompts[0],
    "TU2_2_1_2": TU2_prompts[1],
    "TU2_2_1_3": TU2_prompts[2],
    "TU2_2_1_4": TU2_prompts[3],
    "TU2_2_2_1": TU2_prompts[4],
    "TU2_2_2_2": TU2_prompts[5],
    "TU2_2_2_3": TU2_prompts[6],
    "TU2_2_2_4": TU2_prompts[7],
    "TU2_3_1_1": TU2_prompts[0],
    "TU2_3_1_2": TU2_prompts[1],
    "TU2_3_1_3": TU2_prompts[2],
    "TU2_3_1_4": TU2_prompts[3],
    "TU2_3_2_1": TU2_prompts[4],
    "TU2_3_2_2": TU2_prompts[5],
    "TU2_3_2_3": TU2_prompts[6],
    "TU2_3_2_4": TU2_prompts[7]
}

# Dictionary to look up which model to use for a given experiment id. key: experiment id, value: model name
TU2_model_dict = {
    "TU2_1_1_1": "gpt-3.5-turbo",
    "TU2_1_1_2": "gpt-3.5-turbo",
    "TU2_1_1_3": "gpt-3.5-turbo",
    "TU2_1_1_4": "gpt-3.5-turbo",
    "TU2_1_2_1": "gpt-3.5-turbo",
    "TU2_1_2_2": "gpt-3.5-turbo",
    "TU2_1_2_3": "gpt-3.5-turbo",
    "TU2_1_2_4": "gpt-3.5-turbo",
    "TU2_2_1_1": "gpt-4-1106-preview",
    "TU2_2_1_2": "gpt-4-1106-preview",
    "TU2_2_1_3": "gpt-4-1106-preview",
    "TU2_2_1_4": "gpt-4-1106-preview",
    "TU2_2_2_1": "gpt-4-1106-preview",
    "TU2_2_2_2": "gpt-4-1106-preview",
    "TU2_2_2_3": "gpt-4-1106-preview",
    "TU2_2_2_4": "gpt-4-1106-preview",
    "TU2_3_1_1": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU2_3_1_2": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU2_3_1_3": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU2_3_1_4": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU2_3_2_1": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU2_3_2_2": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU2_3_2_3": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    "TU2_3_2_4": "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3"
}

# Dictionary to look up what prompt variable was used in a given experiment. key: experiment id, value: prompt variable
TU2_prompt_ids_dict = {
    "TU2_1_1_1": "TU2_prompts[0]",
    "TU2_1_1_2": "TU2_prompts[1]",
    "TU2_1_1_3": "TU2_prompts[2]",
    "TU2_1_1_4": "TU2_prompts[3]",
    "TU2_1_2_1": "TU2_prompts[4]",
    "TU2_1_2_2": "TU2_prompts[5]",
    "TU2_1_2_3": "TU2_prompts[6]",
    "TU2_1_2_4": "TU2_prompts[7]",
    "TU2_2_1_1": "TU2_prompts[0]",
    "TU2_2_1_2": "TU2_prompts[1]",
    "TU2_2_1_3": "TU2_prompts[2]",
    "TU2_2_1_4": "TU2_prompts[3]",
    "TU2_2_2_1": "TU2_prompts[4]",
    "TU2_2_2_2": "TU2_prompts[5]",
    "TU2_2_2_3": "TU2_prompts[6]",
    "TU2_2_2_4": "TU2_prompts[7]",
    "TU2_3_1_1": "TU2_prompts[0]",
    "TU2_3_1_2": "TU2_prompts[1]",
    "TU2_3_1_3": "TU2_prompts[2]",
    "TU2_3_1_4": "TU2_prompts[3]",
    "TU2_3_2_1": "TU2_prompts[4]",
    "TU2_3_2_2": "TU2_prompts[5]",
    "TU2_3_2_3": "TU2_prompts[6]",
    "TU2_3_2_4": "TU2_prompts[7]"
}

# Dictionary to look up what place variable was used in a given experiment. key: experiment id, value: place variable
TU2_places_dict = {
    "TU2_1_1_1": "hotel",
    "TU2_1_1_2": "hotel",
    "TU2_1_1_3": "hotel",
    "TU2_1_1_4": "hotel",
    "TU2_1_2_1": "grocery",
    "TU2_1_2_2": "grocery",
    "TU2_1_2_3": "grocery",
    "TU2_1_2_4": "grocery",
    "TU2_2_1_1": "hotel",
    "TU2_2_1_2": "hotel",
    "TU2_2_1_3": "hotel",
    "TU2_2_1_4": "hotel",
    "TU2_2_2_1": "grocery",
    "TU2_2_2_2": "grocery",
    "TU2_2_2_3": "grocery",
    "TU2_2_2_4": "grocery",
    "TU2_3_1_1": "hotel",
    "TU2_3_1_2": "hotel",
    "TU2_3_1_3": "hotel",
    "TU2_3_1_4": "hotel",
    "TU2_3_2_1": "grocery",
    "TU2_3_2_2": "grocery",
    "TU2_3_2_3": "grocery",
    "TU2_3_2_4": "grocery"
}   

# Dictionary to look up what income variable was used in a given experiment. key: experiment id, value: income variable
TU2_income_dict = {
    "TU2_1_1_1": "0",
    "TU2_1_1_2": "$50k",
    "TU2_1_1_3": "$70k",
    "TU2_1_1_4": "$120k",
    "TU2_1_2_1": "0",
    "TU2_1_2_2": "$50k",
    "TU2_1_2_3": "$70k",
    "TU2_1_2_4": "$120k",
    "TU2_2_1_1": "0",
    "TU2_2_1_2": "$50k",
    "TU2_2_1_3": "$70k",
    "TU2_2_1_4": "$120k",
    "TU2_2_2_1": "0",
    "TU2_2_2_2": "$50k",
    "TU2_2_2_3": "$70k",
    "TU2_2_2_4": "$120k",
    "TU2_3_1_1": "0",
    "TU2_3_1_2": "$50k",
    "TU2_3_1_3": "$70k",
    "TU2_3_1_4": "$120k",
    "TU2_3_2_1": "0",
    "TU2_3_2_2": "$50k",
    "TU2_3_2_3": "$70k",
    "TU2_3_2_4": "$120k"
}

# Dictionary to look up what configuration was used in a given experiment. key: experiment id, value: configuration
TU2_configuration_dict = {
    "TU2_1_1_1": 1,
    "TU2_1_1_2": 2,
    "TU2_1_1_3": 3,
    "TU2_1_1_4": 4,
    "TU2_1_2_1": 5,
    "TU2_1_2_2": 6,
    "TU2_1_2_3": 7,
    "TU2_1_2_4": 8,
    "TU2_2_1_1": 1,
    "TU2_2_1_2": 2,
    "TU2_2_1_3": 3,
    "TU2_2_1_4": 4,
    "TU2_2_2_1": 5,
    "TU2_2_2_2": 6,
    "TU2_2_2_3": 7,
    "TU2_2_2_4": 8,
    "TU2_3_1_1": 1,
    "TU2_3_1_2": 2,
    "TU2_3_1_3": 3,
    "TU2_3_1_4": 4,
    "TU2_3_2_1": 5,
    "TU2_3_2_2": 6,
    "TU2_3_2_3": 7,
    "TU2_3_2_4": 8 
}
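The lookup dictionaries above follow a regular pattern, so they could also be generated rather than typed out. The sketch below is one possible way to do this with `itertools.product`; all variable names are hypothetical, and the Llama entry uses a shortened label where the notebook stores the full Replicate version string. It assumes the configuration number equals `(place_id - 1) * 4 + income_id`, which matches the hand-written table.

```python
from itertools import product

# Labels copied from the notebook; names are hypothetical.
# "llama-2-70b" is a shortened placeholder for the full Replicate version string.
model_names = {1: "gpt-3.5-turbo", 2: "gpt-4-1106-preview", 3: "llama-2-70b"}
place_names = {1: "hotel", 2: "grocery"}
income_names = {1: "0", 2: "$50k", 3: "$70k", 4: "$120k"}

experiment_ids = []
model_dict, places_dict, income_dict = {}, {}, {}
configuration_dict, prompt_index_dict = {}, {}

for model_id, place_id, income_id in product(model_names, place_names, income_names):
    experiment_id = f"TU2_{model_id}_{place_id}_{income_id}"
    # Configurations 1-4 are the hotel prompts, 5-8 the store prompts.
    configuration = (place_id - 1) * len(income_names) + income_id

    experiment_ids.append(experiment_id)
    model_dict[experiment_id] = model_names[model_id]
    places_dict[experiment_id] = place_names[place_id]
    income_dict[experiment_id] = income_names[income_id]
    configuration_dict[experiment_id] = configuration
    prompt_index_dict[experiment_id] = configuration - 1  # index into TU2_prompts

print(len(experiment_ids))               # 24
print(configuration_dict["TU2_3_2_4"])   # 8
```

Generating the tables this way keeps the ID scheme and the dictionaries in sync if a model or income level is added later.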

---

#### Setting up functions to repeatedly prompt the LLMs

- Helper function to extract dollar amount of given answers

In [51]:
# Function to extract the dollar amount of the answer from LLMs
def extract_dollar_amounts(answers):
    # Only return values that start with "$"
    valid_prices = [item for item in answers if item.startswith("$")]
    # Delete the "$" from the beginning of each price
    prices = [item.replace('$', '') for item in valid_prices]
    return prices
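The helper above keeps only answers that begin with `$` and returns them as strings. For later numeric analysis, a regex-based variant that returns floats may be convenient. The following is a sketch under that assumption; `parse_prices` and `PRICE_PATTERN` are hypothetical names. Unlike the notebook's helper, it also accepts answers where the price is embedded in a sentence, so it would change which answers count as valid.

```python
import re

# An optional "$", optional whitespace, then a number with optional cents.
PRICE_PATTERN = re.compile(r"\$?\s*(\d+(?:\.\d{1,2})?)")


def parse_prices(answers):
    """Return the first dollar amount found in each answer as a float.

    Answers that contain no number are skipped.
    """
    prices = []
    for answer in answers:
        match = PRICE_PATTERN.search(answer)
        if match:
            prices.append(float(match.group(1)))
    return prices


print(parse_prices(["$5.00", "I would say $7.50.", "$10", "No idea"]))
# [5.0, 7.5, 10.0]
```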

- Functions to query 1 prompt n times for OpenAI Models

In [52]:
def TU2_run_experiment(experiment_id, n, progress_bar, temperature):
    
    answers = []
    for _ in range(n): 
        response = client.chat.completions.create(
            model = TU2_model_dict[experiment_id], 
            max_tokens = 2,
            temperature = temperature, # range is 0 to 2
            messages = [
                {"role": "system", "content": instructions},
                {"role": "user", "content": f"{TU2_experiment_prompts_dict[experiment_id]} {instructions}"}
            ])

        # Store the answer in the list
        answer = response.choices[0].message.content
        answers.append(answer.strip())
        # Update progress bar (given from either temperature loop, or set locally)
        progress_bar.update(1)

    # Extract valid prices from answers
    valid_prices = extract_dollar_amounts(answers)

    # Compute number of valid answers
    n_observations = len(valid_prices)

    # Collect results 
    results = [experiment_id, temperature, TU2_model_dict[experiment_id], TU2_places_dict[experiment_id],
                TU2_income_dict[experiment_id], answers, n_observations, TU2_configuration_dict[experiment_id]]

    # Give out results
    return results

- Functions to query 1 prompt n times (LLama)

In [55]:
def TU2_run_experiment_llama(experiment_id, n, progress_bar, temperature):
    answers = []
    for _ in range(n):
        response = replicate.run(
            TU2_model_dict[experiment_id],
            input = {
                "system_prompt": instructions,
                "temperature": temperature,
                "max_new_tokens": 10, 
                "prompt": f"{TU2_experiment_prompts_dict[experiment_id]} {instructions}"
            }
        )
        # Grab answer and append to list
        answer = "" # Set to empty string, otherwise it would append the previous answer to the new one
        for item in response:
            answer = answer + item
        answers.append(answer.strip())

        # Update progress bar
        progress_bar.update(1)

    
    # Extract valid prices from answers
    valid_prices = extract_dollar_amounts(answers)

    # Compute number of valid answers
    n_observations = len(valid_prices)

    # Collect results 
    results = [experiment_id, temperature, TU2_model_dict[experiment_id], TU2_places_dict[experiment_id],
                TU2_income_dict[experiment_id], answers, n_observations, TU2_configuration_dict[experiment_id]]
    
    # Give out results
    return results

- Function to loop run_experiment() over a list of temperature values

In [54]:
def TU2_temperature_loop(function, experiment_id, temperature_list = [0.5, 1, 1.5], n = 50):
    """
    Function to run an experiment with different temperature values.
    
    Args:
        function (function): Function used to query the model, i.e. TU2_run_experiment() or TU2_run_experiment_llama()
        experiment_id (str): ID of the experiment to be run. Contains info about prompt and model
        temperature_list (list): List of temperature values to be looped over
        n (int): Number of requests for each prompt per temperature value
        
    Returns:
        results_df: Dataframe with experiment results (one column per temperature value)
    """    
    # Empty list for storing results
    results_list = []

    # Initialize progress bar -> used as input for run_experiment()
    progress_bar = tqdm(range(n*len(temperature_list)))

    # Loop over different temperature values, calling the input function once per value (i.e. querying the model n times each)
    for temperature in temperature_list:
        results = function(experiment_id = experiment_id, n = n, temperature = temperature, progress_bar = progress_bar) 
        results_list.append(results)
       

    # Build a dataframe with one row per temperature value, transpose it, and label the rows with the field names
    results_df = pd.DataFrame(results_list).transpose().set_index(pd.Index(
        ["experiment_id", "temperature", "model", "place", "income", "answers", "n_observations", "configuration"]))
  
   
    # Return some information about the experiment as a check
    check = f"In this run, a total of {n*len(temperature_list)} requests were made using {TU2_prompt_ids_dict[experiment_id]}."
    # Print information about the experiment
    print(check)
 

    return results_df
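Before spending API calls, the looping logic can be checked with a stub that has the same signature as the query functions. This is a minimal dry-run sketch using only the standard library; `CountingBar`, `fake_run_experiment`, and `dry_run` are hypothetical names, the canned answer is a placeholder rather than a model response, and the loop is a simplified stand-in that skips the dataframe step.

```python
class CountingBar:
    """Hypothetical stand-in for the tqdm progress bar; it only counts updates."""

    def __init__(self):
        self.count = 0

    def update(self, step):
        self.count += step


def fake_run_experiment(experiment_id, n, progress_bar, temperature):
    """Stub with the same signature as TU2_run_experiment, but without an API call."""
    answers = []
    for _ in range(n):
        answers.append("$5.00")  # canned answer instead of a model response
        progress_bar.update(1)
    n_observations = sum(answer.startswith("$") for answer in answers)
    return [experiment_id, temperature, "stub-model", "hotel", "0",
            answers, n_observations, 1]


def dry_run(function, experiment_id, temperature_list=(0.5, 1, 1.5), n=3):
    """Call `function` once per temperature; return the rows and the request count."""
    bar = CountingBar()
    rows = [function(experiment_id=experiment_id, n=n, progress_bar=bar, temperature=t)
            for t in temperature_list]
    return rows, bar.count


rows, total_requests = dry_run(fake_run_experiment, "TU2_1_1_1")
print(total_requests)              # 9
print([row[1] for row in rows])    # [0.5, 1, 1.5]
```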

---

### Model 1: GPT-3.5-Turbo

In [20]:
# For GPT-3.5-turbo we make 100 requests per prompt & temperature value
N = 100

- Configuration 1

In [35]:
results_1 = []
results_1_1 = TU2_temperature_loop(TU2_run_experiment, "TU2_1_1_1", temperature_list = [0.5, 1, 1.5], n = N)
results_1.append(results_1_1)

- Configuration 2

In [34]:
results_2 = []
results_2_1 = TU2_temperature_loop(TU2_run_experiment, "TU2_1_1_2", temperature_list = [0.5, 1, 1.5], n = N)
results_2.append(results_2_1)

- Configuration 3

In [36]:
results_3 = []
results_3_1 = TU2_temperature_loop(TU2_run_experiment, "TU2_1_1_3", temperature_list = [0.5, 1, 1.5], n = N)
results_3.append(results_3_1)

- Configuration 4

In [37]:
results_4 = []
results_4_1 = TU2_temperature_loop(TU2_run_experiment, "TU2_1_1_4", temperature_list = [0.5, 1, 1.5], n = N)
results_4.append(results_4_1)

- Configuration 5

In [38]:
results_5 = []
results_5_1 = TU2_temperature_loop(TU2_run_experiment, "TU2_1_2_1", temperature_list = [0.5, 1, 1.5], n = N)
results_5.append(results_5_1)

- Configuration 6

In [39]:
results_6 = []
results_6_1 = TU2_temperature_loop(TU2_run_experiment, "TU2_1_2_2", temperature_list = [0.5, 1, 1.5], n = N)
results_6.append(results_6_1)

- Configuration 7

In [40]:
results_7 = []
results_7_1 = TU2_temperature_loop(TU2_run_experiment, "TU2_1_2_3", temperature_list = [0.5, 1, 1.5], n = N)
results_7.append(results_7_1)

- Configuration 8

In [41]:
results_8 = []
results_8_1 = TU2_temperature_loop(TU2_run_experiment, "TU2_1_2_4", temperature_list = [0.5, 1, 1.5], n = N)
results_8.append(results_8_1)

---

### Model 2: GPT-4-1106-Preview

In [None]:
N = 50

- Configuration 1

In [None]:
results_1_2 = TU2_temperature_loop(TU2_run_experiment, "TU2_2_1_1", temperature_list = [0.5, 1, 1.5], n = N)
results_1.append(results_1_2)

- Configuration 2

In [None]:
results_2_2 = TU2_temperature_loop(TU2_run_experiment, "TU2_2_1_2", temperature_list = [0.5, 1, 1.5], n = N)
results_2.append(results_2_2)

- Configuration 3

In [None]:
results_3_2 = TU2_temperature_loop(TU2_run_experiment, "TU2_2_1_3", temperature_list = [0.5, 1, 1.5], n = N)
results_3.append(results_3_2)

- Configuration 4

In [None]:
results_4_2 = TU2_temperature_loop(TU2_run_experiment, "TU2_2_1_4", temperature_list = [0.5, 1, 1.5], n = N)
results_4.append(results_4_2)

- Configuration 5

In [None]:
results_5_2 = TU2_temperature_loop(TU2_run_experiment, "TU2_2_2_1", temperature_list = [0.5, 1, 1.5], n = N)
results_5.append(results_5_2)

- Configuration 6

In [None]:
results_6_2 = TU2_temperature_loop(TU2_run_experiment, "TU2_2_2_2", temperature_list = [0.5, 1, 1.5], n = N)
results_6.append(results_6_2)

- Configuration 7

In [None]:
results_7_2 = TU2_temperature_loop(TU2_run_experiment, "TU2_2_2_3", temperature_list = [0.5, 1, 1.5], n = N)
results_7.append(results_7_2)

- Configuration 8

In [None]:
results_8_2 = TU2_temperature_loop(TU2_run_experiment, "TU2_2_2_4", temperature_list = [0.5, 1, 1.5], n = N)
results_8.append(results_8_2)

---

### Model 3: LLama-2-70b

In [None]:
N = 50

- Configuration 1

In [None]:
results_1_3 = TU2_temperature_loop(TU2_run_experiment_llama, "TU2_3_1_1", temperature_list = [0.5, 1, 1.5], n = N)
results_1.append(results_1_3)

- Configuration 2

In [None]:
results_2_3 = TU2_temperature_loop(TU2_run_experiment_llama, "TU2_3_1_2", temperature_list = [0.5, 1, 1.5], n = N)
results_2.append(results_2_3)

- Configuration 3

In [None]:
results_3_3 = TU2_temperature_loop(TU2_run_experiment_llama, "TU2_3_1_3", temperature_list = [0.5, 1, 1.5], n = N)
results_3.append(results_3_3)

- Configuration 4

In [None]:
results_4_3 = TU2_temperature_loop(TU2_run_experiment_llama, "TU2_3_1_4", temperature_list = [0.5, 1, 1.5], n = N)
results_4.append(results_4_3)

- Configuration 5

In [None]:
results_5_3 = TU2_temperature_loop(TU2_run_experiment_llama, "TU2_3_2_1", temperature_list = [0.5, 1, 1.5], n = N)
results_5.append(results_5_3)

- Configuration 6

In [None]:
results_6_3 = TU2_temperature_loop(TU2_run_experiment_llama, "TU2_3_2_2", temperature_list = [0.5, 1, 1.5], n = N)
results_6.append(results_6_3)

- Configuration 7

In [None]:
results_7_3 = TU2_temperature_loop(TU2_run_experiment_llama, "TU2_3_2_3", temperature_list = [0.5, 1, 1.5], n = N)
results_7.append(results_7_3)

- Configuration 8

In [None]:
results_8_3 = TU2_temperature_loop(TU2_run_experiment_llama, "TU2_3_2_4", temperature_list = [0.5, 1, 1.5], n = N)
results_8.append(results_8_3)

---

#### Gather all results and save to .csv

In [None]:
# Concatenate results
results_1_df = pd.concat(results_1, axis = 1).transpose()
results_2_df = pd.concat(results_2, axis = 1).transpose()
results_3_df = pd.concat(results_3, axis = 1).transpose()
results_4_df = pd.concat(results_4, axis = 1).transpose()
results_5_df = pd.concat(results_5, axis = 1).transpose()
results_6_df = pd.concat(results_6, axis = 1).transpose()
results_7_df = pd.concat(results_7, axis = 1).transpose()
results_8_df = pd.concat(results_8, axis = 1).transpose()

# Concatenate all results (there are 8 configurations in total)
TU2_results = pd.concat([results_1_df, results_2_df, results_3_df, results_4_df, results_5_df,
                         results_6_df, results_7_df, results_8_df], axis = 0)

# Rename LLama model
TU2_results['model'] = TU2_results['model'].replace('meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3', 
                                  'llama-2-70b')

# Display results
TU2_results
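Once the answers have been converted to numbers, the medians per place of purchase can be compared with the reference medians quoted earlier in this notebook ($2.65 for the hotel, $1.50 for the store). The following is a sketch of that comparison using only the standard library; `median_by_place` is a hypothetical helper, and the toy numbers are for illustration only and are not experimental results.

```python
from statistics import median

# Reference medians as quoted in this notebook (hotel vs. store).
REFERENCE_MEDIANS = {"hotel": 2.65, "grocery": 1.50}


def median_by_place(rows):
    """rows: iterable of (place, list_of_float_prices) pairs. Returns {place: median}."""
    pooled = {}
    for place, prices in rows:
        pooled.setdefault(place, []).extend(prices)
    return {place: median(values) for place, values in pooled.items() if values}


# Toy numbers for illustration only, not experimental results.
toy_rows = [("hotel", [6.0, 7.0, 8.0]), ("grocery", [3.0, 4.0]), ("hotel", [5.0])]
medians = median_by_place(toy_rows)
print(medians)  # {'hotel': 6.5, 'grocery': 3.5}

# Ratio of hotel to store median, for the toy data and for the reference values.
print(medians["hotel"] / medians["grocery"])
print(REFERENCE_MEDIANS["hotel"] / REFERENCE_MEDIANS["grocery"])
```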