# ChatGPT experiment

Concept: Dustin Eirdosh

Code: Julia Findeisen

The experiment consists of 6 exam questions with 5 response options each. Each exam question is tested with 3 instruction types (ALL, BEST, EVAL), which are supplemented with 5 (ALL, BEST) or 1 (EVAL) additional prompts, respectively. Additionally, we run the experiment at 2 different temperatures, where higher temperatures coincide with increased creativity or "chaos". We conduct 10 trials per condition, i.e. the ChatGPT response is regenerated 10 times per distinct prompt and temperature. The conditions and responses generated by ChatGPT are stored in an SQLite database and exported in CSV format.
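The design above implies a fixed number of API calls. As a quick sanity check, the sketch below recomputes that number. It assumes, as the experiment loop further down does, that every instruction type is also run once without a supplemental prompt and that EVAL is applied to each response option separately. The variable names are illustrative and not part of the notebook.

```python
# Hypothetical bookkeeping variables; the values come from the description above
n_questions = 6
n_options = 5
n_temperatures = 2
n_trials = 10

# ALL and BEST: one baseline prompt plus 5 supplemental prompts each,
# with all response options shown at once
prompts_all_best = 2 * (1 + 5)

# EVAL: one baseline prompt plus 1 supplemental prompt, per response option
prompts_eval = n_options * (1 + 1)

prompts_per_question = prompts_all_best + prompts_eval
total_calls = n_questions * n_temperatures * prompts_per_question * n_trials
print(prompts_per_question, total_calls)  # 22 2640
```

Under these assumptions the results table should contain 2640 rows once the experiment has finished.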

# Preparation

Import modules and set OpenAI API key

In [None]:
!pip install openai

In [1]:
import openai
import json, sqlite3
import numpy as np
import pandas as pd
from pandas import DataFrame

In [4]:
# Set your OpenAI API key
openai.api_key = "YOUR KEY HERE"

<h3> Load exam questions </h3>

First we load the EvoFlex exam questions and print them in human-readable format. There are six different questions with five corresponding response options each.

In [2]:
# Load JSON file containing the EvoFlex questions from GitHub repo
url = 'https://raw.githubusercontent.com/dustineirdosh/evoflex/main/evoflex_en.json'
exam_data = pd.read_json(url)

In [3]:
# Print questions in readable format
for i,question in enumerate(exam_data['questions']):
    print (f"\n{i+1}. {question['question']}\n")
    print ("Question:", question['question_text'], "\n")
    for option in question['options']:
        print (option)


1. Peacock's Feathers

Question: Male Peacocks exhibit an incredibly colorful display of feathers. Ancestral males of the modern Peacock did not always have such feathers. How did the male Peacock's feathers evolve? 

A: The ancestors of female peacocks had a preference for males with colorful feathers. Male peacocks that happened to have more colorful feathers happened to breed more than the others and had more offspring.
B: The ancestors of female peacocks realized their offspring would survive and reproduce better if they mated with males who had colorful feathers, because of this, over many generations of genetic changes, the males developed more colorful feathers.
C: The ancestors of male peacocks wanted to attract more females, because of this, over many generations, their descendants continued to develop more colorful feathers that the females were drawn to.
D: The ancestors of male peacocks increasingly needed colorful feathers in order to find a female mate, because of this, 

<h3> Set prompts </h3>

Next we set the prompts and supplemental prompts corresponding to the three instruction types. 

In [7]:
# Set instructions
instructions = {
    'all': "Select ALL scientifically adequate responses, and respond only with the letter(s) of the correct response option(s).",
    'best': "Select the BEST response, and respond only with the letter of the correct response option.",
    'eval': "Critically evaluate this response option in terms of scientific adequacy. Based on your response, categorize this statement as follows: \n0 = not scientifically adequate from the perspective of evolutionary biology\n1 = scientifically adequate from the perspective of evolutionary biology.\nRespond ONLY with the number of the category."
}

In [8]:
# Set supplemental prompts for instruction types ALL and BEST
suppl_prompts = ["Provide a detailed explanation of your reasoning for this choice.", 
                 "Critically evaluate if this is the best response.", 
                 "Think carefully about the instructions, and critically evaluate if your first response is still the best response.",
                "Think carefully about your full range of knowledge regarding evolutionary biology, think through each response option in relation to evolutionary biology, and critically evaluate if your first response is still the best response.",
                 "Think carefully about coherence and compatibility of the various response options, and critically evaluate if your first response is still the best response."]

In [9]:
# Set supplemental prompt for instruction type EVAL
suppl_prompts_eval = ["Act as a teacher responding to a student who selected this response, generate a discussion with them that leads to a more scientifically adequate response."]
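To make the prompt layout concrete, here is a minimal sketch of how an instruction, a question, its options and an optional supplemental prompt are combined for the ALL and BEST instruction types. It mirrors the f-strings used in the experiment loop, but `build_prompt` is a hypothetical helper that does not exist in the notebook, and the question text and options are placeholders rather than EvoFlex items.

```python
def build_prompt(instruction, question_text, options, supplemental=""):
    """Assemble an ALL/BEST-style prompt (hypothetical helper, for illustration only)."""
    prompt = f"{instruction}\nQuestion: {question_text}\nOptions: {', '.join(options)}\n"
    # The baseline condition has no supplemental prompt and omits that line entirely
    if supplemental:
        prompt += f"Supplemental Prompt: {supplemental}\n"
    return prompt


# Placeholder content, not an actual exam question
demo = build_prompt(
    "Select the BEST response.",
    "Placeholder question?",
    ["A: first option", "B: second option"],
    "Critically evaluate if this is the best response.",
)
print(demo)
```

For EVAL the layout is the same, except that a single `Option:` line replaces the joined `Options:` line.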

<h3> Create database </h3>

In [36]:
# Create database and delete existing table
con = sqlite3.connect('evoflex_chatgpt_data.db')
cursor = con.cursor()
cursor.execute("DROP TABLE IF EXISTS chatgpt_interactions")
 
# Creating table
create_table_stmt = """ CREATE TABLE chatgpt_interactions (
            Question VARCHAR(255) NOT NULL,
            Temperature REAL NOT NULL,
            Prompt VARCHAR(255) NOT NULL,
            Iteration INTEGER NOT NULL,
            Response VARCHAR(255)
        ); """
 
cursor.execute(create_table_stmt)
 
print("Table ready")

con.close()

Table ready


# Experiment

Each prompt is run with 2 different temperatures, where higher temperatures coincide with increased creativity or "chaos". We conduct 10 trials per condition, i.e. the ChatGPT response is regenerated 10 times per distinct prompt and temperature.

In [11]:
# Temperatures – high temperatures increase creativity 
temperatures = [0.2, 1.0]

# Number of iterations (regenerated responses) for each distinct prompt
num_iterations = 10  
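Why a higher temperature leads to more varied output can be illustrated with the standard temperature-scaled softmax: token scores are divided by the temperature before being turned into probabilities. This is a toy sketch only. The scores are made up, and how the OpenAI service applies the `temperature` parameter internally is an assumption here, not something this notebook verifies.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up scores for three candidate tokens
toy_logits = [2.0, 1.0, 0.5]
p_low = softmax_with_temperature(toy_logits, 0.2)
p_high = softmax_with_temperature(toy_logits, 1.0)
print([round(p, 3) for p in p_low])
print([round(p, 3) for p in p_high])
```

At temperature 0.2 almost all probability mass sits on the top-scoring token, so repeated trials tend to return the same answer. At 1.0 the distribution is flatter and repeated trials differ more often.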

Connect to database and interact with ChatGPT.

In [22]:
# Connect to the SQLite database
with sqlite3.connect('evoflex_chatgpt_data.db') as connection:
    
    # Create a cursor object to interact with the database
    cursor = connection.cursor()
    
    # Exam question loop:
    for question in exam_data['questions']:
        question_name = question['question']
        
        # Temperature loop
        for temp in temperatures:
            
            # Instruction type loop
            for instr_type in instructions:
                instruction = instructions[instr_type]
                
                # Instruction types ALL and BEST use different supplemental prompts than instruction type EVAL, and don't iterate over responses
                if instr_type != "eval":
                  
                    # Create original prompt consisting of instruction, question text and ALL response options
                    prompt = f"{instruction}\nQuestion: {question['question_text']}\nOptions: {', '.join(question['options'])}\n"

                    # Supplemental prompt loop:
                    for n in range(len(suppl_prompts)+1):

                        # Create combined prompt by appending the supplemental prompt to the original prompt – first iteration without supplemental prompt
                        supplemental_prompt = "" if n==0 else suppl_prompts[n-1]
                        combined_prompt = f"{prompt}Supplemental Prompt: {supplemental_prompt}\n" if supplemental_prompt else prompt
                        
                        # 10 trials per prompt, each trial is stored in the database
                        for iteration in range(num_iterations):
                            
                            # Make API call to ChatGPT
                            response = openai.Completion.create(
                            engine="text-davinci-003", 
                            prompt=combined_prompt,
                            max_tokens=2000,  
                            temperature = temp
                            )
                            
                            # Store response in database
                            cursor.execute('INSERT INTO chatgpt_interactions (question, temperature, prompt, iteration, response) VALUES (?, ?, ?, ?, ?)', (question_name, temp, combined_prompt, iteration+1, response.choices[0].text.strip()))
                        
                else:
                   
                    # For instruction type EVAL, we additionally iterate over the response options, and use different supplemental prompts. The rest of the API call scheme remains identical
                    for option in question['options']:

                        # Create original prompt consisting of instruction, question text and ONE response option
                        prompt = f"{instruction} Question: {question['question_text']}\nOption: {option}\n"
                        
                        # Supplemental prompt loop:
                        for n in range(len(suppl_prompts_eval)+1):

                            # Create combined prompt – run first iteration without supplemental prompt
                            supplemental_prompt = "" if n==0 else suppl_prompts_eval[n-1]
                            combined_prompt = f"{prompt}Supplemental Prompt: {supplemental_prompt}\n" if supplemental_prompt else prompt
                            
                            # Trial loop:
                            for iteration in range(num_iterations):
                                
                                # Make API call to ChatGPT
                                response = openai.Completion.create(
                                engine="text-davinci-003", 
                                prompt=combined_prompt,
                                max_tokens=2000, 
                                temperature = temp
                                )

                                # Store response in database
                                cursor.execute('INSERT INTO chatgpt_interactions (question, temperature, prompt, iteration, response) VALUES (?, ?, ?, ?, ?)', (question_name, temp, combined_prompt, (iteration+1), response.choices[0].text.strip() ))
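Because the loop above issues many API calls, it can help to dry-run the storage logic first. The sketch below does this with an in-memory SQLite database and a stand-in function in place of `openai.Completion.create`. Both `fake_complete` and `run_trials` are hypothetical names introduced for illustration. No network request is made and the returned answer is a fixed dummy value.

```python
import sqlite3


def fake_complete(prompt, temperature):
    """Hypothetical stand-in for the API call; always returns the same dummy answer."""
    return "A"


def run_trials(cursor, question_name, prompt, temperature, num_iterations, complete):
    """Run the trial loop for one prompt and temperature, storing each response."""
    for iteration in range(num_iterations):
        response_text = complete(prompt, temperature)
        cursor.execute(
            "INSERT INTO chatgpt_interactions "
            "(question, temperature, prompt, iteration, response) VALUES (?, ?, ?, ?, ?)",
            (question_name, temperature, prompt, iteration + 1, response_text),
        )


# In-memory database with the same table layout as in the notebook
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute(
    "CREATE TABLE chatgpt_interactions ("
    "Question VARCHAR(255) NOT NULL, Temperature REAL NOT NULL, "
    "Prompt VARCHAR(255) NOT NULL, Iteration INTEGER NOT NULL, Response VARCHAR(255))"
)

# Two temperatures with three trials each
for temp in (0.2, 1.0):
    run_trials(cur, "Demo question", "Demo prompt", temp, 3, fake_complete)
con.commit()

n_rows = cur.execute("SELECT COUNT(*) FROM chatgpt_interactions").fetchone()[0]
print(n_rows)  # 6
con.close()
```

Passing the completion function as an argument keeps the storage code identical for the dry run and the real run.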


# Result

In [38]:
# Query database and load the table into a DataFrame
con = sqlite3.connect('evoflex_chatgpt_data.db')
cursor = con.cursor()
cursor.execute('SELECT question, temperature, prompt, iteration, response FROM chatgpt_interactions')
rows = cursor.fetchall()
con.close()
pd.set_option('display.width', None)
# Build the DataFrame directly from the fetched rows so that numeric columns keep
# their types (np.array would cast every value in these mixed-type rows to a string)
df = DataFrame(rows, columns=['Question', 'Temperature', 'Prompt', 'Iteration', 'Response'])

In [39]:
df

Unnamed: 0,Question,Temperature,Prompt,Iteration,Response
0,Peacock's Feathers,0.2,"Select ALL scientifically adequate responses, ...",1,"A, B, D"
1,Peacock's Feathers,0.2,"Select ALL scientifically adequate responses, ...",2,"A, B, D"
2,Peacock's Feathers,0.2,"Select ALL scientifically adequate responses, ...",3,"A, B, D"
3,Peacock's Feathers,0.2,"Select ALL scientifically adequate responses, ...",4,"A, B, D"
4,Peacock's Feathers,0.2,"Select ALL scientifically adequate responses, ...",5,"A, B, D"
...,...,...,...,...,...
2635,Cheetah running,1.0,Critically evaluate this response option in te...,6,0
2636,Cheetah running,1.0,Critically evaluate this response option in te...,7,0
2637,Cheetah running,1.0,Critically evaluate this response option in te...,8,0
2638,Cheetah running,1.0,Critically evaluate this response option in te...,9,0


<h3> Export table </h3>

In [63]:
df.to_csv("chatgpt_interactions.csv")
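Once exported, the responses can be tallied per question and temperature. The following is a minimal sketch of one way to do that with pandas. It runs on a small made-up table with the same column names as `df`, so the counts it prints are not experimental results.

```python
import pandas as pd

# Made-up rows that reuse the column names of the exported table
toy = pd.DataFrame({
    "Question": ["Q1", "Q1", "Q1", "Q1"],
    "Temperature": [0.2, 0.2, 1.0, 1.0],
    "Response": ["A", "A", "A", "B"],
})

# Count how often each response occurs per question and temperature,
# with one column per distinct response
counts = (
    toy.groupby(["Question", "Temperature"])["Response"]
    .value_counts()
    .unstack(fill_value=0)
)
print(counts)
```

On the real data, the prompt or instruction type would normally be added to the grouping keys, since ALL, BEST and EVAL responses are not directly comparable.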