### Decoy Effect

This notebook aims to recreate some findings concerning the **Decoy Effect**. Specifically, we recreate the event described in https://thestrategystory.com/2020/10/02/economist-magazine-a-story-of-clever-decoy-pricing/ with numbers taken from https://en.wikipedia.org/wiki/Decoy_effect.

- Cell runtimes vary widely: 5 temperatures x 50 API calls for GPT-3.5-Turbo took anywhere from ~5 min to ~120 min (24.11.2023)

*Prompt engineering* (or *prompt optimization*) seems very important in this case. A simple check of whether the output is meaningful and the model answered as desired is to verify that all resulting **probabilities add up to 100%**.
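
The sum check described above can be sketched as follows. This is a minimal sketch, assuming the shares arrive as percentage strings such as `'10.00%'` (the format returned by `decoy_function` further down). The helper name `shares_sum_to_100` and the tolerance `tol` are assumptions for illustration and not part of the notebook.

```python
def shares_sum_to_100(shares, tol=0.01):
    """Return True if percentage strings such as '10.00%' add up to 100 (within tol)."""
    total = sum(float(s.rstrip("%")) for s in shares)
    return abs(total - 100.0) <= tol

# Shares from the simple test run in this notebook: 0 + 10 + 90
print(shares_sum_to_100(["0.00%", "10.00%", "90.00%"]))  # True

# Shares that leave a gap point to replies other than A, B or C
print(shares_sum_to_100(["58.00%", "0.00%", "36.00%"]))  # False
```

A `False` here means some replies were not one of the expected letters, so the prompt probably needs adjusting for that model or temperature.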

In [12]:
from openai import OpenAI
import openai
import matplotlib.pyplot as plt
import os 
import numpy as np
import pandas as pd
import re

In [13]:
# Get API key (previously saved as environment variable)
openai.api_key = os.environ["OPENAI_API_KEY"]

# Set client (OpenAI() also reads OPENAI_API_KEY from the environment by default)
client = OpenAI()

# Set global plot style
plt.style.use('seaborn-v0_8')

| Answer option         | Scenario 1 | Scenario 2 (no 2nd option) |
|-----------------------|------------|----------------------------|
| Online subscription   | 16%        | 68%                        |
| Print subscription    | 0%         | 0%                         |
| Combination           | 84%        | 32%                        |

(empirical findings)
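
To make the size of the empirical effect explicit, the shares from the table can be compared directly. This is a small illustrative calculation; the variable names are assumptions and not part of the notebook.

```python
# Empirical shares (in %) from the table above
with_decoy = {"online": 16, "print": 0, "combination": 84}
without_decoy = {"online": 68, "combination": 32}

# Change in share (percentage points) when the decoy is present
shift = {option: with_decoy[option] - without_decoy[option] for option in without_decoy}
print(shift)  # {'online': -52, 'combination': 52}
```

Offering the print-only decoy moves 52 percentage points from the online subscription to the combination.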


The answer options are the same in every prompt:
- A: 1-year Economist.com subscription for $59.00, including access to all articles from 1997 onwards.
- B: 1-year Print subscription to The Economist for $125.00.
- C: Print & web subscription for $125.00: 1-year subscription to the print edition of The Economist and online access to all articles from 1997 onwards.

------------------------------------------------------------------------------------------------------------------------

#### Setting up a function to query ChatGPT with varying parameters

In [14]:
# Function to extract prompt id from prompt name 
def extract_prompt_id(prompt):
    # Just the numerical part is enough 
    match = re.search(r'\d+', prompt)
    if match:
        return int(match.group())
    else:
        return None

In [15]:
def decoy_function(prompt, model, n, temperature = 1, max_tokens = 1):

    """
    Query ChatGPT repeatedly to measure how strongly its choices follow the decoy effect.

    Args:
        prompt: Prompt text, sent as a single user message.
        model: Model to be used for querying ChatGPT, e.g. "gpt-3.5-turbo", "gpt-4", "gpt-4-1106-preview".
        n: Number of queries to be made.
        temperature: Degree of randomness, ranging from 0 (deterministic) to 2 (random).
        max_tokens: Maximum number of tokens in each response.

    Returns:
        results: List [prompt_id, temperature, count of A, count of B, count of C].
        probs: List [prompt_id, temperature, p_a, p_b, p_c], with the shares as percentage strings.
    """

    answers = []
    for _ in range(n): 
        response = client.chat.completions.create(
            model = model, 
            max_tokens = max_tokens,
            temperature = temperature, # range is 0 to 2
            messages = [
                {"role": "user", "content": prompt},
            ])

        # Store the answer in the list
        answer = response.choices[0].message.content
        answers.append(answer.strip())
    
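    # Extract number of prompt
    # NOTE: `prompt` is the full prompt text, so the regex matches the first digits in it
    # (the "1" of "1-year"). Every prompt is therefore reported with id 1 in the outputs.
    prompt_id = extract_prompt_id(prompt)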

    # Counting results
    A = answers.count("A")
    B = answers.count("B")
    C = answers.count("C")

    # Collecting results in a list
    results = [prompt_id, temperature, A, B, C]

    # Getting the percentage of each answer
    # (answers other than "A", "B" or "C" are not counted, so the shares can sum to less than 100%)
    p_a = f"{(A / len(answers)) * 100:.2f}%"
    p_b = f"{(B / len(answers)) * 100:.2f}%"
    p_c = f"{(C / len(answers)) * 100:.2f}%"

    # Collect probabilities in a list
    probs = [prompt_id, temperature, p_a, p_b, p_c]

    # Give out results
    return results, probs
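
`decoy_function` counts only replies that are exactly "A", "B" or "C" after stripping whitespace. One possible way to make all other replies visible is sketched below. The helper names `normalize_answer` and `tally_answers` are hypothetical and not part of the notebook, and the accepted punctuation is an assumption.

```python
from collections import Counter
import re

VALID = ("A", "B", "C")

def normalize_answer(raw):
    """Map a raw reply to 'A', 'B' or 'C', or None if it is not a single option letter."""
    text = raw.strip().upper()
    # Accept a bare letter, optionally followed by one punctuation mark ("a", "B.", "C:")
    match = re.fullmatch(r"([ABC])[.:)]?", text)
    return match.group(1) if match else None

def tally_answers(raw_answers):
    """Count valid options and collect everything else under 'invalid'."""
    counts = Counter({option: 0 for option in VALID})
    counts["invalid"] = 0
    for raw in raw_answers:
        option = normalize_answer(raw)
        counts[option if option else "invalid"] += 1
    return dict(counts)

print(tally_answers(["A", " c ", "B.", "I", "C"]))
# {'A': 1, 'B': 1, 'C': 2, 'invalid': 1}
```

Reporting the `invalid` count next to the shares shows directly why a row does not add up to 100%.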

-----------------------------------------------------------

#### Next we set up the different prompts we are going to use

The prompts may have to be adjusted for some models: the models phrase their answers differently, and we have to make sure that each reply is one of the option letters so that the output is actually meaningful.

- Prompt 1: Unprimed & all answer options

In [16]:
prompt_1 = """You are presented with three different subscription alternatives for the "The Economist" magazine:
         A: 1-year Economist.com subscription for $59.00, including access to all articles from 1997 onwards.
         B: 1-year Print subscription to The Economist for $125.00.
         C: Print & web subscription for 125$: 1-year subscription to the print edition of The Econnomist and online access to all articles from 1997 onwards.
         Which alternative would you choose? Please answer by only giving the letter of the alternative A, B or C you would choose without any reasoning."""

- Prompt 2: Unprimed & second option (decoy) removed

In [17]:
prompt_2 = """You are presented with three different subscription alternatives for the "The Economist" magazine:
         A: 1-year Economist.com subscription for $59.00, including access to all articles from 1997 onwards.
         B: 1-year Print subscription to The Economist for $125.00.
         C: Print & web subscription for 125$: 1-year subscription to the print edition of The Econnomist and online access to all articles from 1997 onwards.
         The Marketing team of The Economist however has now decided to remove option B. 
         Which remaining alternative would you choose? Please answer by only giving the letter of the alternative A, B or C you would choose without any reasoning."""

- Prompt 3: Primed & all answer options

In [18]:
prompt_3 = """You are a market researcher that knows about the Decoy Effect. 
         You are presented with three different subscription alternatives for the "The Economist" magazine:
         A: 1-year Economist.com subscription for $59.00, including access to all articles from 1997 onwards.
         B: 1-year Print subscription to The Economist for $125.00.
         C: Print & web subscription for 125$: 1-year subscription to the print edition of The Econnomist and online access to all articles from 1997 onwards.
         Which alternative would you choose? Please answer by only giving the letter of the alternative A, B or C you would choose without any reasoning."""

- Prompt 4: Primed & second option (decoy) removed

In [19]:
prompt_4 = """You are a market researcher that knows about the Decoy Effect. 
         You are presented with three different subscription alternatives for the "The Economist" magazine:
         A: 1-year Economist.com subscription for $59.00, including access to all articles from 1997 onwards.
         B: 1-year Print subscription to The Economist for $125.00.
         C: Print & web subscription for 125$: 1-year subscription to the print edition of The Econnomist and online access to all articles from 1997 onwards.
         The Marketing team of The Economist however has now decided to remove option B. 
         Which remaining alternative would you choose? Please answer by only giving the letter of the alternative A, B or C you would choose without any reasoning."""

- Set up a prompt dictionary to look up the id in a cleaner way

In [20]:
prompt_ids= {
    "prompt_1": 1,
    "prompt_2": 2,
    "prompt_3": 3,
    "prompt_4": 4,
}
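
`extract_prompt_id` is described as working on the prompt name, but `decoy_function` passes it the prompt text, so it picks up the first digits of the text (the "1" in "1-year"). The sketch below applies the same regex to names instead. The `prompts` mapping and its shortened placeholder texts are assumptions for illustration and not part of the notebook.

```python
import re

def extract_prompt_id(name):
    """Return the first run of digits in `name` as an int, or None if there is none."""
    match = re.search(r"\d+", name)
    return int(match.group()) if match else None

# Hypothetical mapping from prompt name to prompt text (texts shortened here)
prompts = {
    "prompt_1": "A: 1-year Economist.com subscription for $59.00",
    "prompt_2": "A: 1-year Economist.com subscription for $59.00 (option B removed)",
}

for name, text in prompts.items():
    # The id taken from the name differs per prompt; taken from the text it is always 1
    print(name, extract_prompt_id(name), extract_prompt_id(text))
```

Passing the name (or the id itself) alongside the prompt text would keep the `Prompt` row of the result tables distinct for each prompt.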

--------------------------------------------------------

## Comparing different LLMs

#### GPT-3.5-Turbo

Model training ended in September 2021

In [21]:
# Simple test of function
results_1, probs_1 = decoy_function(prompt = prompt_1, model = "gpt-3.5-turbo", n = 10, temperature = 0.5, max_tokens = 1)
probs_1

# prompt, temp, p_a, p_b, p_c

[1, 0.5, '0.00%', '10.00%', '90.00%']

- Prompt 1: Unprimed & all answer options

In [22]:
# Empty lists for storing results
results_1 = []
probs_1 = []

# Loop over 5 different temperature values, making 50 API calls for each
for temperature in [0, 0.5, 1, 1.5, 2]:
    results, probs = decoy_function(prompt = prompt_1, model = "gpt-3.5-turbo", n = 50, temperature = temperature, max_tokens=1)
    results_1.append(results) # vertically appended
    probs_1.append(probs)

# Horizontally concatenate the results, transpose and set index
results_1 = pd.DataFrame(results_1).transpose().set_index(pd.Index(["Prompt", "Temp", "p(A)", "p(B)", "p(C)"]))
probs_1 = pd.DataFrame(probs_1).transpose().set_index(pd.Index(["Prompt", "Temp", "p(A)", "p(B)", "p(C)"]))

# Display results
probs_1

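|        | 0       | 1      | 2      | 3      | 4      |
|--------|---------|--------|--------|--------|--------|
| Prompt | 1       | 1      | 1      | 1      | 1      |
| Temp   | 0.0     | 0.5    | 1.0    | 1.5    | 2.0    |
| p(A)   | 0.00%   | 0.00%  | 0.00%  | 8.00%  | 10.00% |
| p(B)   | 0.00%   | 4.00%  | 6.00%  | 20.00% | 24.00% |
| p(C)   | 100.00% | 96.00% | 94.00% | 72.00% | 58.00% |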
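
The temperature loop above is repeated for every prompt and model in this notebook. One way to factor it out is sketched below, without pandas and with the query function passed in as a parameter so the helper can be tried offline. `temperature_sweep` and `fake_query` are hypothetical names, and `fake_query` merely stands in for `decoy_function`.

```python
def temperature_sweep(query_fn, prompt, model, n=50, temperatures=(0, 0.5, 1, 1.5, 2)):
    """Call query_fn once per temperature and collect the rows it returns."""
    all_results, all_probs = [], []
    for temperature in temperatures:
        results, probs = query_fn(prompt=prompt, model=model, n=n,
                                  temperature=temperature, max_tokens=1)
        all_results.append(results)
        all_probs.append(probs)
    return all_results, all_probs

def fake_query(prompt, model, n, temperature=1, max_tokens=1):
    """Stand-in for decoy_function: pretends every reply was 'C' (no API call is made)."""
    return [1, temperature, 0, 0, n], [1, temperature, "0.00%", "0.00%", "100.00%"]

results, probs = temperature_sweep(fake_query, prompt="placeholder", model="gpt-3.5-turbo", n=10)
print(len(probs), probs[0])
```

With the notebook's own function, `temperature_sweep(decoy_function, prompt_1, "gpt-3.5-turbo")` would return the same two lists that the cell above builds before converting them to DataFrames.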


In [23]:
# probs_1 (from previous run) -> Why do they suddenly choose B?

- Prompt 2: Unprimed & second option (decoy) removed

In [24]:
# Empty lists for storing results
results_2 = []
probs_2 = []

# Loop over 5 different temperature values 
for temperature in [0, 0.5, 1, 1.5, 2]:
    results, probs = decoy_function(prompt = prompt_2, model = "gpt-3.5-turbo", n = 50, temperature = temperature, max_tokens = 1)
    results_2.append(results) # vertically appended
    probs_2.append(probs)

# Horizontally concatenate the results, transpose and set index
results_2 = pd.DataFrame(results_2).transpose().set_index(pd.Index(["Prompt", "Temp", "p(A)", "p(B)", "p(C)"]))
probs_2 = pd.DataFrame(probs_2).transpose().set_index(pd.Index(["Prompt", "Temp", "p(A)", "p(B)", "p(C)"]))

# Display results
probs_2

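|        | 0       | 1      | 2      | 3      | 4      |
|--------|---------|--------|--------|--------|--------|
| Prompt | 1       | 1      | 1      | 1      | 1      |
| Temp   | 0.0     | 0.5    | 1.0    | 1.5    | 2.0    |
| p(A)   | 100.00% | 78.00% | 70.00% | 64.00% | 58.00% |
| p(B)   | 0.00%   | 0.00%  | 0.00%  | 0.00%  | 0.00%  |
| p(C)   | 0.00%   | 22.00% | 30.00% | 36.00% | 36.00% |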
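
For a direct comparison with the empirical shift of 52 percentage points (84% vs. 32% for the combination), the p(C) values for GPT-3.5-Turbo under prompt 1 and prompt 2 can be subtracted per temperature. This is a small illustrative calculation using the values reported in the two outputs; the variable names are assumptions.

```python
temps = [0.0, 0.5, 1.0, 1.5, 2.0]

# p(C) in % for GPT-3.5-Turbo, copied from the outputs of prompt 1 and prompt 2
p_c_with_decoy = [100.0, 96.0, 94.0, 72.0, 58.0]
p_c_without_decoy = [0.0, 22.0, 30.0, 36.0, 36.0]

# Shift in percentage points caused by the presence of the decoy
shift = [w - wo for w, wo in zip(p_c_with_decoy, p_c_without_decoy)]
for t, s in zip(temps, shift):
    print(f"temperature {t}: {s:+.0f} percentage points")
```

The shift shrinks from 100 points at temperature 0 to 22 points at temperature 2. Note that the prompt 2 shares at temperature 2 add up to only 94%, so some replies there were not A, B or C.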


- Prompt 3: Primed & all answer options

In [25]:
# Empty lists for storing results
results_3 = []
probs_3 = []

# Loop over 5 different temperature values 
for temperature in [0, 0.5, 1, 1.5, 2]:
    results, probs = decoy_function(prompt = prompt_3, model = "gpt-3.5-turbo", n = 50, temperature = temperature, max_tokens = 1)
    results_3.append(results) # vertically appended
    probs_3.append(probs)

# Horizontally concatenate the results, transpose and set index
results_3 = pd.DataFrame(results_3).transpose().set_index(pd.Index(["Prompt", "Temp", "p(A)", "p(B)", "p(C)"]))
probs_3 = pd.DataFrame(probs_3).transpose().set_index(pd.Index(["Prompt", "Temp", "p(A)", "p(B)", "p(C)"]))

# Display results
probs_3

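|        | 0      | 1      | 2      | 3      | 4      |
|--------|--------|--------|--------|--------|--------|
| Prompt | 1      | 1      | 1      | 1      | 1      |
| Temp   | 0.0    | 0.5    | 1.0    | 1.5    | 2.0    |
| p(A)   | 0.00%  | 0.00%  | 12.00% | 6.00%  | 14.00% |
| p(B)   | 12.00% | 36.00% | 38.00% | 38.00% | 40.00% |
| p(C)   | 88.00% | 64.00% | 48.00% | 48.00% | 36.00% |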


- Prompt 4: Primed & second option (decoy) removed

In [26]:
# Empty lists for storing results
results_4 = []
probs_4 = []

# Loop over 5 different temperature values 
for temperature in [0, 0.5, 1, 1.5, 2]:
    results, probs = decoy_function(prompt = prompt_4, model = "gpt-3.5-turbo", n = 50, temperature = temperature, max_tokens = 1)
    results_4.append(results) # vertically appended
    probs_4.append(probs)

# Horizontally concatenate the results, transpose and set index
results_4 = pd.DataFrame(results_4).transpose().set_index(pd.Index(["Prompt", "Temp", "p(A)", "p(B)", "p(C)"]))
probs_4 = pd.DataFrame(probs_4).transpose().set_index(pd.Index(["Prompt", "Temp", "p(A)", "p(B)", "p(C)"]))

# Display results
probs_4

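|        | 0       | 1      | 2      | 3      | 4      |
|--------|---------|--------|--------|--------|--------|
| Prompt | 1       | 1      | 1      | 1      | 1      |
| Temp   | 0.0     | 0.5    | 1.0    | 1.5    | 2.0    |
| p(A)   | 0.00%   | 42.00% | 50.00% | 42.00% | 46.00% |
| p(B)   | 0.00%   | 0.00%  | 0.00%  | 0.00%  | 0.00%  |
| p(C)   | 100.00% | 58.00% | 50.00% | 56.00% | 44.00% |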


#### GPT-4-1106-preview

Model training ended in April 2023

- Prompt 1: Unprimed & all answer options

In [27]:
# Empty lists for storing results
results_5 = []
probs_5 = []

# Loop over 5 different temperature values, making 50 API calls for each
for temperature in [0, 0.5, 1, 1.5, 2]:
    results, probs = decoy_function(prompt = prompt_1, model = "gpt-4-1106-preview", n = 50, temperature = temperature, max_tokens=1)
    results_5.append(results) # vertically appended
    probs_5.append(probs)

# Horizontally concatenate the results, transpose and set index
results_5 = pd.DataFrame(results_5).transpose().set_index(pd.Index(["Prompt", "Temp", "p(A)", "p(B)", "p(C)"]))
probs_5 = pd.DataFrame(probs_5).transpose().set_index(pd.Index(["Prompt", "Temp", "p(A)", "p(B)", "p(C)"]))

# Display results
probs_5

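|        | 0       | 1       | 2       | 3      | 4      |
|--------|---------|---------|---------|--------|--------|
| Prompt | 1       | 1       | 1       | 1      | 1      |
| Temp   | 0.0     | 0.5     | 1.0     | 1.5    | 2.0    |
| p(A)   | 0.00%   | 0.00%   | 0.00%   | 2.00%  | 0.00%  |
| p(B)   | 0.00%   | 0.00%   | 0.00%   | 0.00%  | 0.00%  |
| p(C)   | 100.00% | 100.00% | 100.00% | 98.00% | 98.00% |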


- Prompt 2: Unprimed & second option (decoy) removed

In [28]:
# Empty lists for storing results
results_6 = []
probs_6 = []

# Loop over 5 different temperature values, making 50 API calls for each
for temperature in [0, 0.5, 1, 1.5, 2]:
    results, probs = decoy_function(prompt = prompt_2, model = "gpt-4-1106-preview", n = 50, temperature = temperature, max_tokens=1)
    results_6.append(results) # vertically appended
    probs_6.append(probs)

# Horizontally concatenate the results, transpose and set index
results_6 = pd.DataFrame(results_6).transpose().set_index(pd.Index(["Prompt", "Temp", "p(A)", "p(B)", "p(C)"]))
probs_6 = pd.DataFrame(probs_6).transpose().set_index(pd.Index(["Prompt", "Temp", "p(A)", "p(B)", "p(C)"]))

# Display results
probs_6

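|        | 0       | 1      | 2      | 3      | 4      |
|--------|---------|--------|--------|--------|--------|
| Prompt | 1       | 1      | 1      | 1      | 1      |
| Temp   | 0.0     | 0.5    | 1.0    | 1.5    | 2.0    |
| p(A)   | 0.00%   | 2.00%  | 6.00%  | 20.00% | 34.00% |
| p(B)   | 0.00%   | 0.00%  | 0.00%  | 0.00%  | 0.00%  |
| p(C)   | 100.00% | 98.00% | 94.00% | 76.00% | 62.00% |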


- Prompt 3: Primed & all answer options

In [29]:
# Empty lists for storing results
results_7 = []
probs_7 = []

# Loop over 5 different temperature values, making 50 API calls for each
for temperature in [0, 0.5, 1, 1.5, 2]:
    results, probs = decoy_function(prompt = prompt_3, model = "gpt-4-1106-preview", n = 50, temperature = temperature, max_tokens=1)
    results_7.append(results) # vertically appended
    probs_7.append(probs)

# Horizontally concatenate the results, transpose and set index
results_7 = pd.DataFrame(results_7).transpose().set_index(pd.Index(["Prompt", "Temp", "p(A)", "p(B)", "p(C)"]))
probs_7 = pd.DataFrame(probs_7).transpose().set_index(pd.Index(["Prompt", "Temp", "p(A)", "p(B)", "p(C)"]))

# Display results
probs_7

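|        | 0       | 1       | 2       | 3       | 4      |
|--------|---------|---------|---------|---------|--------|
| Prompt | 1       | 1       | 1       | 1       | 1      |
| Temp   | 0.0     | 0.5     | 1.0     | 1.5     | 2.0    |
| p(A)   | 0.00%   | 0.00%   | 0.00%   | 0.00%   | 0.00%  |
| p(B)   | 0.00%   | 0.00%   | 0.00%   | 0.00%   | 0.00%  |
| p(C)   | 100.00% | 100.00% | 100.00% | 100.00% | 98.00% |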


- Prompt 4: Primed & second option (decoy) removed

In [30]:
# Empty lists for storing results
results_8 = []
probs_8 = []

# Loop over 5 different temperature values, making 50 API calls for each
for temperature in [0, 0.5, 1, 1.5, 2]:
    results, probs = decoy_function(prompt = prompt_4, model = "gpt-4-1106-preview", n = 50, temperature = temperature, max_tokens=1)
    results_8.append(results) # vertically appended
    probs_8.append(probs)

# Horizontally concatenate the results, transpose and set index
results_8 = pd.DataFrame(results_8).transpose().set_index(pd.Index(["Prompt", "Temp", "p(A)", "p(B)", "p(C)"]))
probs_8 = pd.DataFrame(probs_8).transpose().set_index(pd.Index(["Prompt", "Temp", "p(A)", "p(B)", "p(C)"]))

# Display results
probs_8

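|        | 0      | 1      | 2      | 3      | 4      |
|--------|--------|--------|--------|--------|--------|
| Prompt | 1      | 1      | 1      | 1      | 1      |
| Temp   | 0.0    | 0.5    | 1.0    | 1.5    | 2.0    |
| p(A)   | 18.00% | 30.00% | 44.00% | 40.00% | 30.00% |
| p(B)   | 0.00%  | 0.00%  | 0.00%  | 0.00%  | 0.00%  |
| p(C)   | 82.00% | 70.00% | 56.00% | 58.00% | 52.00% |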
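
To see how priming changes the picture for GPT-4-1106-preview, the shift in p(C) between the "all options" and "decoy removed" prompts can be computed for both conditions. This is an illustrative calculation on the values reported in the four outputs for this model; `decoy_shift` is a hypothetical helper name.

```python
def decoy_shift(with_decoy, without_decoy):
    """Element-wise difference in percentage points (with decoy minus without)."""
    return [w - wo for w, wo in zip(with_decoy, without_decoy)]

# p(C) in % at temperatures 0, 0.5, 1, 1.5, 2, copied from the outputs for prompts 1 to 4
unprimed = decoy_shift([100, 100, 100, 98, 98], [100, 98, 94, 76, 62])
primed = decoy_shift([100, 100, 100, 100, 98], [82, 70, 56, 58, 52])

print("unprimed:", unprimed)  # [0, 2, 6, 22, 36]
print("primed:  ", primed)    # [18, 30, 44, 42, 46]
```

In these runs the shift is larger under the primed prompts at every temperature, mainly because p(C) drops further once the decoy is removed.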


Motivated by the I-bias experiment, we now examine whether labeling bias can be mitigated by using letters that have similar frequency in written English. Therefore, instead of assigning to choices the labels “A”, “B”, etc. we assign the following labels: “R”, “S”, “N”, “L”, “O”, “T”, “M”, “P”, “W”, “U”, “Y”, “V” (Mendler-Dünner paper)