# OpenAI Simulations for Chat Log Classification

This notebook was developed as part of my work as an undergraduate research assistant in the UC Santa Barbara Computer Science Department. The goal of the project is to evaluate whether large language models, specifically OpenAI’s GPT-4o, can consistently classify human-written summaries of chat logs from multiplayer strategic trading games. These summaries were written by hand from annotated transcripts collected during a socioeconomic experiment.

The code in this notebook supports the simulation workflow used to test classification reliability. Each simulation sends the summaries to GPT-4o as CSV text and saves the model’s predicted category labels. The process is repeated 100 times per model to assess consistency.

This notebook focuses on the simulation setup, output storage, and aggregation steps used to generate results for Models 4 and 6. It was written to make the automation process transparent and reproducible for future analysis.

## Model 4: Multi-Label Classification (5 Categories)

Model 4 was the first prompt design tested. It allowed GPT-4o to assign multiple categories to a single summary. The categories used were:

- Coordination
- Efficiency
- Conflict
- Inequality
- Unknown

The input was a simplified CSV (`id_chat_input.csv`) containing just the summary ID and text. The first 20 simulations were run manually. The remaining 80 were automated using the OpenAI API with `temperature=0.7`.

The outputs from all 100 simulations were saved as individual CSV files and later concatenated into a single file: `sum_model_4.csv`.

In [None]:
import pandas as pd
import os
from time import sleep
from openai import OpenAI

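# Prefer the OPENAI_API_KEY environment variable; fall back to a pasted key if it is unset
api_key = os.environ.get("OPENAI_API_KEY", "*your api key*")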

client = OpenAI(api_key=api_key)

input_df = pd.read_csv("id_chat_input.csv")

# Convert DataFrame to CSV-style string for GPT input
csv_input_string = input_df.to_csv(index=False)

user_prompt = """
Use the CSV file with summaries. I want you to classify each summary based on the following categories. 
The classification is not required to be mutually exclusive—each summary can be classified into multiple categories if needed.
Categories:
Coordination
Efficiency
Conflict
Inequality
Unknown (if it does not fit any of the other categories)
The output should include binary category columns. No additional text and do not explain what you did. I only want csv output.
"""

output_dir = "/Users/*your path*/chatgpt_outputsM4"
os.makedirs(output_dir, exist_ok=True)

for i in range(21, 101):
    try:
        print(f"Running simulation {i}...")

        # GPT-4o API call (Chat Completions endpoint)
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": user_prompt},
                {"role": "user", "content": f"CSV input:\n{csv_input_string}"}
            ],
            temperature=0.7
        )

        response_text = response.choices[0].message.content

        filename = os.path.join(output_dir, f"model4_{i}.csv")
        with open(filename, "w", encoding="utf-8") as f:
            f.write(response_text)

        print(f"Saved: {filename}")
        sleep(1)  # Pause between calls

    except Exception as e:
        print(f"Error during simulation {i}: {e}")
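
The loop above writes the model's reply to disk unchanged. Chat models sometimes wrap CSV output in a Markdown code fence even when asked for CSV only, and a fenced file would fail the field-count validation in the concatenation step. The sketch below shows one way to remove such a fence before saving. The helper name `strip_code_fences` is hypothetical and is not part of the original workflow.

```python
import re


def strip_code_fences(text: str) -> str:
    """Return `text` without a surrounding Markdown code fence, if one is present."""
    stripped = text.strip()
    # Opening fence with an optional language tag, then the body, then the closing fence.
    match = re.fullmatch(r"```[A-Za-z0-9_-]*[ \t]*\n(.*?)\n?```", stripped, flags=re.DOTALL)
    body = match.group(1) if match else stripped
    # Normalize to exactly one trailing newline.
    return body.strip() + "\n"


# Illustrative strings only, not real model output.
fenced = "```csv\nid_chat,text,Coordination\n1,hello,1\n```"
plain = "id_chat,text,Coordination\n1,hello,1"

print(strip_code_fences(fenced) == strip_code_fences(plain))
```

If this were adopted, the call would go between reading `response_text` and writing the file, for example `f.write(strip_code_fences(response_text))`.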

## Concatenating Model 4 Simulation Outputs

Model 4 involved 100 simulations of GPT-4o using a multi-label classification prompt. Each simulation produced a CSV file stored in the `chatgpt_outputsM4/` directory. Within a file, each summary could be assigned one or more of five categories: *Coordination*, *Efficiency*, *Conflict*, *Inequality*, and *Unknown*.

This script reads all simulation outputs, checks that the header and every row of each file contain exactly seven fields, standardizes column names, and adds `Model` and `Simulation` metadata columns. It then merges the individual outputs into a single file.

The final dataset, saved as `sum_model_4.csv`, contains all model predictions across simulations and is used for analyzing category assignment frequency and measuring classification consistency across repeated GPT-4o runs.

In [None]:
import os
import pandas as pd
import re
import sys
import csv

# Define input and output paths
input_folder = 'chatgpt_outputsM4'
output_file = 'sum_model_4.csv'

def print_file_preview(filepath):
    """Print the first few lines of a file for debugging"""
    print(f"\nPreviewing file: {filepath}")
    with open(filepath, 'r', encoding='utf-8') as f:  # outputs were written as UTF-8
        for i, line in enumerate(f):
            if i < 5:
                print(f"Line {i+1}: {line.strip()}")
            else:
                break
    print()

# Initialize list to store all dataframes
dfs = []

# Get all relevant CSV files (sorted as strings, so model4_10.csv comes before model4_2.csv)
csv_files = sorted([f for f in os.listdir(input_folder) if f.endswith('.csv')])

# Process each CSV file
for file in csv_files:
    match = re.search(r'model4_(\d+)\.csv', file)
    if match:
        simulation_num = int(match.group(1))  # store the simulation number as an integer
        filepath = os.path.join(input_folder, file)
        
        try:
            print(f"Processing file: {file}")
            print_file_preview(filepath)

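            # Check structure: every line must have 7 fields. A malformed file aborts
            # the whole run, because sys.exit() is not caught by `except Exception` below.
            with open(filepath, 'r', encoding='utf-8') as f: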
                reader = csv.reader(f)
                header = next(reader)
                if len(header) != 7:
                    print(f"Error: Header has {len(header)} fields instead of 7")
                    print(f"Header: {header}")
                    sys.exit(1)

                for i, row in enumerate(reader, start=2):
                    if len(row) != 7:
                        print(f"Error: Line {i} has {len(row)} fields instead of 7")
                        print(f"Line content: {row}")
                        sys.exit(1)

            # Read and process file
            df = pd.read_csv(filepath)
            # Normalize header capitalization (e.g. 'coordination' -> 'Coordination')
            df.columns = df.columns.str.title()
            # After title-casing, only the ID and text columns need renaming
            df = df.rename(columns={'Id_Chat': 'id_chat', 'Text': 'text'})

            print(f"Columns found: {list(df.columns)}")
            print(f"Number of columns: {len(df.columns)}")
            print(f"Number of rows: {len(df)}\n")

            df['Model'] = 4
            df['Simulation'] = simulation_num

            df = df[['id_chat', 'text', 'Coordination', 'Efficiency', 'Conflict',
                     'Inequality', 'Unknown', 'Model', 'Simulation']]

            dfs.append(df)

        except Exception as e:
            print(f"Error processing file {file}:")
            print(str(e))
            print("Skipping this file...\n")
            continue

if dfs:
    final_df = pd.concat(dfs, ignore_index=True)
    final_df.to_csv(output_file, index=False)
    print(f"Successfully created {output_file} with {len(final_df)} rows from {len(dfs)} input files.")
else:
    print(f"No valid CSV files found in {input_folder}.")
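
The combined file is intended for analyzing category assignment frequency and consistency across runs. The sketch below illustrates one possible way to compute both from a dataframe with the same columns as `sum_model_4.csv`. The function names `category_share` and `majority_agreement` are hypothetical, and the agreement measure is an illustrative choice rather than the metric used in the project. The data is a toy example, not real results.

```python
import pandas as pd

CATEGORIES = ["Coordination", "Efficiency", "Conflict", "Inequality", "Unknown"]


def category_share(df: pd.DataFrame, categories=CATEGORIES) -> pd.DataFrame:
    """Fraction of simulations in which each summary received each category."""
    return df.groupby("id_chat")[list(categories)].mean()


def majority_agreement(share: pd.DataFrame) -> pd.DataFrame:
    """Per cell, the fraction of simulations that agree with the majority label."""
    # Keep the share where 1 is the majority label, otherwise use the share of 0s.
    return share.where(share >= 0.5, 1 - share)


# Toy data: two summaries, four simulations each (assumption: binary 0/1 columns).
toy_m4 = pd.DataFrame({
    "id_chat":      [1, 1, 1, 1, 2, 2, 2, 2],
    "Coordination": [1, 1, 1, 0, 0, 0, 0, 0],
    "Efficiency":   [0, 1, 0, 1, 0, 0, 0, 0],
    "Conflict":     [0, 0, 0, 0, 1, 1, 1, 1],
    "Inequality":   [0, 0, 0, 0, 0, 0, 1, 0],
    "Unknown":      [0, 0, 0, 0, 0, 0, 0, 0],
    "Simulation":   [1, 2, 3, 4, 1, 2, 3, 4],
})

share_m4 = category_share(toy_m4)
agreement_m4 = majority_agreement(share_m4)
print(share_m4)
print(agreement_m4)
```

On the real data, the same functions could be applied to `pd.read_csv("sum_model_4.csv")`. An agreement value of 1.0 means every simulation gave the same answer for that summary and category, while 0.5 means the runs were evenly split.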


## Model 6: Binary Classification (2 Categories)

Model 6 was the final and simplest prompt design tested. It restricted GPT-4o to assigning exactly one of two mutually exclusive categories to each summary:

- Coordination
- Unknown

This design was motivated by results from Model 5, where over 95% of outputs already fell into these two categories. By narrowing the classification scope, the goal was to increase output consistency and reduce ambiguity.

The input remained the same: a simplified CSV (`id_chat_input.csv`) containing just the summary ID and text. As with previous models, the first 20 simulations were run manually, and the remaining 80 were automated using the OpenAI API with `temperature=0.7`.

All 100 outputs were saved as individual CSV files and later concatenated into a single dataset: `sum_model_6.csv`, which was then aggregated to produce `totals_model6.csv` for consistency analysis.

In [None]:
import pandas as pd
import os
from time import sleep
from openai import OpenAI


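# Prefer the OPENAI_API_KEY environment variable; fall back to a pasted key if it is unset
api_key = os.environ.get("OPENAI_API_KEY", "*your api key*")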

client = OpenAI(api_key=api_key)

input_df = pd.read_csv("id_chat_input.csv")

# Convert DataFrame to CSV-style string for GPT input
csv_input_string = input_df.to_csv(index=False)

user_prompt = """
Use the CSV file with summaries. I want you to classify each summary based on the following categories. The classification must be mutually exclusive—each summary can be classified into only one category.
Categories:
Coordination
Unknown (if it does not fit any of the other categories)
The output should include binary category columns. I only want csv output. 
"""

output_dir = "/Users/*your path*/chatgpt_outputsM6"
os.makedirs(output_dir, exist_ok=True)

for i in range(21, 101):
    try:
        print(f"Running simulation {i}...")

        # GPT-4o API call (Chat Completions endpoint)
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": user_prompt},
                {"role": "user", "content": f"CSV input:\n{csv_input_string}"}
            ],
            temperature=0.7
        )

        response_text = response.choices[0].message.content

        filename = os.path.join(output_dir, f"model6_{i}.csv")
        with open(filename, "w", encoding="utf-8") as f:
            f.write(response_text)

        print(f"Saved: {filename}")
        sleep(1)  # Pause between calls

    except Exception as e:
        print(f"Error during simulation {i}: {e}")
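
Because the loop catches exceptions and moves on, a failed API call leaves a gap in the numbered output files. The sketch below shows one way to list the simulation numbers that still need to be run. The function name `missing_simulations` is hypothetical. It also assumes that the manually run simulations follow the same `model6_<n>.csv` naming, which the concatenation step suggests but the notebook does not state.

```python
import os
import re
import tempfile


def missing_simulations(folder: str, prefix: str, expected=range(1, 101)) -> list:
    """Return the expected simulation numbers with no `<prefix>_<n>.csv` file in `folder`."""
    pattern = re.compile(rf"{re.escape(prefix)}_(\d+)\.csv")
    found = set()
    for name in os.listdir(folder):
        match = pattern.fullmatch(name)
        if match:
            found.add(int(match.group(1)))
    return [n for n in expected if n not in found]


# Demonstration on a temporary folder holding three empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for n in (1, 2, 4):
        open(os.path.join(tmp, f"model6_{n}.csv"), "w").close()
    demo_missing = missing_simulations(tmp, "model6", expected=range(1, 6))

print(demo_missing)  # [3, 5]
```

Against the real outputs, the call would look like `missing_simulations(output_dir, "model6")`, and the returned numbers could be fed back into the simulation loop in place of `range(21, 101)`.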

## Concatenating Model 6 Simulation Outputs

The 100 simulations of Model 6 each produced a separate CSV file in the `chatgpt_outputsM6/` directory. Each file contains the classification results from a single GPT-4o run. The input summaries are identical across runs, but the outputs can differ because the model samples at `temperature=0.7`.

This script reads each CSV file, checks that the header and every row contain exactly four fields, and standardizes column names. It then appends simulation metadata (`Model`, `Simulation`) to each dataframe and concatenates them into a unified dataset.

The final output is saved as `sum_model_6.csv`, which consolidates all simulation runs for downstream analysis of classification consistency and category frequency.

In [None]:
import os
import pandas as pd
import re
import sys
import csv

input_folder = 'chatgpt_outputsM6'
output_file = 'sum_model_6.csv'

def print_file_preview(filepath):
    """Print the first few lines of a file for debugging"""
    print(f"\nPreviewing file: {filepath}")
    with open(filepath, 'r', encoding='utf-8') as f:  # outputs were written as UTF-8
        for i, line in enumerate(f):
            if i < 5:  # Print first 5 lines
                print(f"Line {i+1}: {line.strip()}")
            else:
                break
    print()

# Initialize an empty list to store all dataframes
dfs = []

# Get all CSV files from the input folder (sorted as strings, so model6_10.csv comes before model6_2.csv)
csv_files = sorted([f for f in os.listdir(input_folder) if f.endswith('.csv')])

# Process each CSV file
for file in csv_files:
    # Extract simulation number from filename using regex
    match = re.search(r'model6_(\d+)\.csv', file)
    if match:
        simulation_num = int(match.group(1))  # store the simulation number as an integer
        filepath = os.path.join(input_folder, file)
        
        try:
            print(f"Processing file: {file}")
            print_file_preview(filepath)
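            # Check structure: every line must have 4 fields. A malformed file aborts
            # the whole run, because sys.exit() is not caught by `except Exception` below.
            with open(filepath, 'r', encoding='utf-8') as f: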
                reader = csv.reader(f)
                header = next(reader)
                if len(header) != 4:
                    print(f"Error: Header has {len(header)} fields instead of 4")
                    print(f"Header: {header}")
                    sys.exit(1)
                
                for i, row in enumerate(reader, start=2):
                    if len(row) != 4:
                        print(f"Error: Line {i} has {len(row)} fields instead of 4")
                        print(f"Line content: {row}")
                        sys.exit(1)
            
            df = pd.read_csv(filepath)    
            # Normalize header capitalization (e.g. 'coordination' -> 'Coordination')
            df.columns = df.columns.str.title()
            # After title-casing, only the ID and text columns need renaming
            df = df.rename(columns={'Id_Chat': 'id_chat', 'Text': 'text'})
            
            print(f"Columns found: {list(df.columns)}")
            print(f"Number of columns: {len(df.columns)}")
            print(f"Number of rows: {len(df)}\n")
            
            df['Model'] = 6
            df['Simulation'] = simulation_num
            df = df[['id_chat', 'text', 'Coordination', 'Unknown', 'Model', 'Simulation']]
            dfs.append(df)
            
        except Exception as e:
            print(f"Error processing file {file}:")
            print(str(e))
            print("Skipping this file...\n")
            continue

if dfs:
    final_df = pd.concat(dfs, ignore_index=True)
    final_df.to_csv(output_file, index=False)
    print(f"Successfully created {output_file} with {len(final_df)} rows from {len(dfs)} input files.")
else:
    print(f"No CSV files matching the pattern 'model6_*.csv' were found in {input_folder}")
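
The Model 6 prompt asks for exactly one category per summary, and the combined file is later aggregated into `totals_model6.csv`. The notebook does not show that aggregation or the layout of the totals file, so the sketch below is only one plausible version. The function names and the `n_simulations` column are assumptions, and the data is a toy example rather than real results.

```python
import pandas as pd


def rows_violating_exclusivity(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows where Coordination + Unknown is not exactly 1."""
    label_sum = df["Coordination"] + df["Unknown"]
    return df[label_sum != 1]


def totals_per_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count, for each summary, how many simulations chose each category."""
    totals = df.groupby("id_chat")[["Coordination", "Unknown"]].sum()
    totals["n_simulations"] = df.groupby("id_chat")["Simulation"].nunique()
    return totals.reset_index()


# Toy data: two summaries, three simulations each.
toy_m6 = pd.DataFrame({
    "id_chat":      [1, 1, 1, 2, 2, 2],
    "Coordination": [1, 1, 0, 0, 0, 1],
    "Unknown":      [0, 0, 1, 1, 1, 1],  # the last row is labeled both ways
    "Simulation":   [1, 2, 3, 1, 2, 3],
})

violations_m6 = rows_violating_exclusivity(toy_m6)
totals_m6 = totals_per_summary(toy_m6.drop(violations_m6.index))
print(violations_m6)
print(totals_m6)
```

Here the violating row is dropped before counting. Whether to drop, flag, or rerun such rows is a design decision that depends on how the consistency analysis treats invalid outputs.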

---

**Notebook created by:** Anna Gornyitzki  
**Date:** July 2025  
**Affiliation:** Undergraduate Research Assistant, Computer Science, UC Santa Barbara  
