## OpenAI Case Study - Experiment Playground

### Imports & Config

In [1]:
from assistant_helpers import create_assistant, prompt_assistant, evaluate_method
import pandas as pd

df = pd.read_csv('tse_takehome_dataset.csv')

### Naive Assistant

Assistant with no instructions and a weak prompt.

In [12]:
# Create Naive Assistant
naive_assistant = create_assistant()

In [None]:
# Prompt Naive Assistant
naive_response = prompt_assistant(naive_assistant, prompt='Tell me tina escobars favorite city and why', file_path='tse_takehome_dataset.csv')

In [None]:
# Print conversation output. DO NOT RERUN!
print(naive_response[0])


Assistant Response: Let's first take a look at the contents of the uploaded file to find out details about Tina Escobar's favorite city. I'll start by checking the structure and contents of the file.

Tool Call Input:
# Let's first read the file to understand its contents and format.
file_path = '/mnt/data/file-WMiQ1qtEx6sidyriwNX3YG'

# Trying to determine the file type
import magic

file_type = magic.from_file(file_path, mime=True)
file_type

Assistant Response: It seems that the `magic` library isn't available in this environment, so I'll take a different approach to determine the format of the uploaded file. I'll attempt to open it as a text file first, since text formats like CSV and JSON are common. Let's see what it contains.

Tool Call Input:
# Attempting to read the first few lines of the file to determine its format.
try:
    with open(file_path, 'r') as file:
        # Read the first 512 characters to guess the format
        content_preview = file.read(512)
except Exceptio

In [None]:
# Reproduce call to pandas from code interpreter
df.head()

Unnamed: 0,date,name,company_name,description_of_company,favourite_memory,favourite_city_and_why,favourite_food_and_why,occupation,description_of_job,experience_relevant_to_job,growth_plan
0,2023-08-13,James Padilla,Microsoft,An American multinational technology corporati...,Use world sure long fine dinner.,"London, for its historical landmarks and diver...","Pizza, because it's versatile, delicious, and ...",Project Manager,Oversees projects from start to finish to ensu...,Game image scientist girl them receive.,Claim score out local meeting all.
1,2020-06-09,Eric Rogers,Facebook,An American online social media and social net...,Plant especially information clear.,"Paris, for its beautiful architecture for its ...","Chocolate, because it's comforting, decadent, ...",Product Manager,Oversees the development of products from conc...,Decide despite then environment.,Enough order degree appear trial design ten no...
2,2023-12-17,Christopher King,Google,A multinational technology company that specia...,Thus bad front decade.,"Tokyo, for its unique blend of traditional and...","Sushi, for its fresh flavors and artful presen...",Graphic Designer,Creates visual concepts to communicate ideas t...,Very development lead point.,Finally glass officer group fill house hard ca...
3,2023-02-22,Tina Escobar,Microsoft,An American multinational technology corporati...,Their small walk want blood worry wish.,"New York, because of its vibrant city life and...","Pasta, for its comfort, versatility in dishes,...",Data Analyst,Interprets data and turns it into information ...,Raise physical show sing.,Computer either stand responsibility serve eve...
4,2019-12-05,Joshua Lewis,Google,An American multinational technology corporati...,Give friend green eat agent finally fine.,"Paris, for its beautiful architecture for its ...","Pizza, because it's versatile, delicious, and ...",Software Engineer,Develops and maintains software applications.,Foot night morning anything.,Color wind look enjoy our task true thousand t...
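The preview above cuts long fields off with `...`, which is presumably why later variants tell the assistant not to rely on dataframe output. The sketch below reproduces that on a one-row frame, assuming the truncation comes from pandas' `display.max_colwidth` option (50 characters by default). `df_demo` and the explicit display options are illustrative only and are not part of the original notebook.

```python
import pandas as pd

# Full field value for Tina Escobar, as it appears in the dataset.
full_text = (
    "New York, because of its vibrant city life and diversity. "
    "because of its vibrant city life and diversity. "
    "Additionally, It's home to the largest metropolitan zoo in the US."
)

# Illustrative one-row frame (hypothetical, not the real dataset).
df_demo = pd.DataFrame(
    {"name": ["Tina Escobar"], "favourite_city_and_why": [full_text]}
)

# With a 50-character column width, the printed frame shortens long strings.
with pd.option_context(
    "display.max_colwidth", 50, "display.width", 1000, "display.max_columns", 20
):
    truncated_view = repr(df_demo)

# Removing the column-width limit prints the whole field.
with pd.option_context(
    "display.max_colwidth", None, "display.width", 1000, "display.max_columns", 20
):
    full_view = repr(df_demo)

# Reading the cell directly bypasses display formatting entirely.
cell_value = df_demo.loc[0, "favourite_city_and_why"]

print(truncated_view)
print(cell_value)
```

Only the printed representation is shortened; the underlying data is intact, so any approach that reads cell values directly (a dict, `csv.DictReader`, or `.loc`) sees the complete text.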


### Improved Assistant - Better Prompts

Assistant with a more specific prompt.

##### V1 - Instructs not to use dataframe output and to answer completely (Candidate)

In [55]:
improved_prompt_assistant = create_assistant(temperature=0)

In [56]:
# Prompt improved assistant (V1)
improved_prompt_response = prompt_assistant(assistant=improved_prompt_assistant, prompt='Review the attached CSV and extract all information from all fields.  DO NOT use dataframe output to get your answer. Tell me Tina Escobars favorite city and why, including ALL details in the relevant field.', file_path='tse_takehome_dataset.csv')

In [57]:
# Print conversation output. DO NOT RERUN!
print(improved_prompt_response[0])


Tool Call Input:
import csv

# Define the file path
file_path = '/mnt/data/file-5TbUSgVXMd373LgxoRfLDD'

# Read the CSV file and extract all information
data = []
with open(file_path, mode='r', encoding='utf-8') as file:
    csv_reader = csv.reader(file)
    headers = next(csv_reader)  # Read the header
    for row in csv_reader:
        data.append(row)

# Find the row for Tina Escobar and extract her favorite city and the reason
tina_favorite_city_info = None
for row in data:
    if "Tina Escobar" in row:
        tina_favorite_city_info = row
        break

headers, tina_favorite_city_info

Assistant Response: Tina Escobar's favorite city is New York. She loves it because of its vibrant city life and diversity. Additionally, New York is home to the largest metropolitan zoo in the US.


In [None]:
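import csv

# Use the local copy of the dataset (the /mnt/data path only exists in the code interpreter sandbox)
file_path = 'tse_takehome_dataset.csv'

# Read the CSV file and extract all information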
data = []
with open(file_path, mode='r', encoding='utf-8') as file:
    csv_reader = csv.reader(file)
    headers = next(csv_reader)  # Read the header
    for row in csv_reader:
        data.append(row)

# Find the row for Tina Escobar and extract her favorite city and the reason
tina_favorite_city_info = None
for row in data:
    if "Tina Escobar" in row:
        tina_favorite_city_info = row
        break

headers, tina_favorite_city_info

[{'date': '2023-08-13',
  'name': 'James Padilla',
  'company_name': 'Microsoft',
  'description_of_company': 'An American multinational technology corporation which produces computer software, consumer electronics, personal computers, and related services.',
  'favourite_memory': 'Use world sure long fine dinner.',
  'favourite_city_and_why': 'London, for its historical landmarks and diverse cultural scene. for its historical landmarks and diverse cultural scene. Additionally, London has hosted the Summer Olympics three times: in 1908, 1948, and 2012.',
  'favourite_food_and_why': "Pizza, because it's versatile, delicious, and brings people together.",
  'occupation': 'Project Manager',
  'description_of_job': 'Oversees projects from start to finish to ensure they meet company goals.',
  'experience_relevant_to_job': 'Game image scientist girl them receive.',
  'growth_plan': 'Claim score out local meeting all.'},
 {'date': '2020-06-09',
  'name': 'Eric Rogers',
  'company_name': 'Fac

##### V2 - Explicitly directs to extract CSV to Dict

In [None]:
improved_prompt_assistant_v2 = create_assistant()

In [None]:
# Prompt improved assistant (V2)
improved_prompt_response_v2 = prompt_assistant(assistant=improved_prompt_assistant_v2, prompt='Extract all information in the attached CSV into a dict. Tell me Tina Escobars favorite city and why, including ALL details in the relevant field you created.  ', file_path='tse_takehome_dataset.csv')

In [None]:
# Print conversation output. DO NOT RERUN!
print(improved_prompt_response_v2[0])


Assistant Response: First, I will read the contents of the uploaded CSV file to examine its structure and contents. Then, I'll extract the information into a dictionary.

Tool Call Input:
import pandas as pd

# Load the CSV file into a DataFrame
file_path = '/mnt/data/file-HUpmKBj6PyZdSjKxuV7aUY'
df = pd.read_csv(file_path)

# Display the first few rows of the DataFrame to understand its structure
df.head()

Assistant Response: The CSV file contains several columns related to individuals and their preferences, including favorite cities and their reasons. 

Now, I will extract all of this information into a dictionary. Then, I will look specifically for the details regarding Tina Escobar's favorite city. Let's proceed with this extraction and analysis.

Tool Call Input:
# Convert the DataFrame to a list of dictionaries
data_dict = df.to_dict(orient='records')

# Find the entry for Tina Escobar to get information about her favorite city
tina_escobar_info = next((entry for entry in data_

In [None]:
# Reproduce call to pandas from code interpreter
data_dict = df.to_dict(orient='records')

tina_escobar_info = next((entry for entry in data_dict if entry['name'] == 'Tina Escobar'), None)

tina_escobar_info

{'date': '2023-02-22',
 'name': 'Tina Escobar',
 'company_name': 'Microsoft',
 'description_of_company': 'An American multinational technology corporation which produces computer software, consumer electronics, personal computers, and related services.',
 'favourite_memory': 'Their small walk want blood worry wish.',
 'favourite_city_and_why': "New York, because of its vibrant city life and diversity. because of its vibrant city life and diversity. Additionally, It's home to the largest metropolitan zoo in the US.",
 'favourite_food_and_why': 'Pasta, for its comfort, versatility in dishes, and rich flavors.',
 'occupation': 'Data Analyst',
 'description_of_job': 'Interprets data and turns it into information which can offer ways to improve a business.',
 'experience_relevant_to_job': 'Raise physical show sing.',
 'growth_plan': 'Computer either stand responsibility serve ever performance financial ahead cold.'}

##### V3 - Emphasizes completeness.

In [None]:
improved_prompt_assistant_v3 = create_assistant()

In [None]:
# Prompt Assistant
improved_prompt_response_v3 = prompt_assistant(assistant=improved_prompt_assistant_v3, prompt='Tell me Tina Escobars favorite city and why, including ALL details in the relevant field.  You MUST emphasize completeness over being concise. ', file_path='tse_takehome_dataset.csv')

In [None]:
# Print conversation output. DO NOT RERUN!
print(improved_prompt_response_v3[0])


Assistant Response: To assist you with this request, I'll first examine the contents of the uploaded file to locate the relevant information about Tina Escobar's favorite city and the reasons behind her preference. Let's open and review the file.

Tool Call Input:
# Let's start by checking the contents of the uploaded file to understand its format and locate the required information.
file_path = '/mnt/data/file-5mPJhLTuNVFjiTaTsyA7Z9'
with open(file_path, 'r', encoding='utf-8') as file:
    content = file.read()

# Display the first few hundred characters to get an overview of the contents
content[:500]

Assistant Response: The file appears to be in a CSV format, containing fields such as `date`, `name`, `company_name`, `description_of_company`, `favourite_memory`, `favourite_city_and_why`, and so on. Tina Escobar's favorite city and the reason for her preference should be in the `favourite_city_and_why` field. Let's parse the file to extract this specific information for Tina Escobar

In [None]:
# Reproduce the code interpreter's csv lookup locally (csv module, not pandas)
import csv

# Define the target individual
target_name = "Tina Escobar"
favorite_city_info = None

# Read through the CSV file
with open('tse_takehome_dataset.csv', mode='r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        if row['name'] == target_name:
            # Capture the 'favourite_city_and_why' details for Tina Escobar
            favorite_city_info = row.get('favourite_city_and_why', None)
            break

favorite_city_info


"New York, because of its vibrant city life and diversity. because of its vibrant city life and diversity. Additionally, It's home to the largest metropolitan zoo in the US."

### Improved Assistant - Better Instructions

Assistant with more specific instructions and a basic prompt.

##### V1 - Explicit Rules, Lowered Temperature (Candidate)

In [26]:
improved_instruct_assistant = create_assistant(instructions='You are an assistant that MUST provide complete and untruncated answers. You MUST strictly follow these non-negotiable rules:\n- You MUST prioritize completeness in your response over being concise.\n- When reviewing information in a CSV, you can ONLY base your answer on the full and untruncated content of each field.\n- When processing CSV content, you MUST convert CSV content to a dictionary and ONLY base your answer on the dictionary output.\n- You MUST strictly follow all of these rules.\n-You will NEVER use data frame outputs to answer your question in Code Interpreter.\n\nIf you cannot fully comply with these rules, you MUST explicitly state why.', temperature=0.0)

In [33]:
# Prompt Assistant
improved_instruct_response = prompt_assistant(assistant=improved_instruct_assistant, prompt='Tell me what tina escobars favorite city is and why', file_path='tse_takehome_dataset.csv')

In [34]:
# Print conversation output. DO NOT RERUN!
print(improved_instruct_response[0])


Tool Call Input:
import csv

# Read the CSV file and convert it to a list of dictionaries
file_path = '/mnt/data/file-FucjHDGHKmLr2QojdbaNVY'
data = []

with open(file_path, mode='r', encoding='utf-8') as file:
    csv_reader = csv.DictReader(file)
    for row in csv_reader:
        data.append(row)

# Display the data to understand its structure
data

Assistant Response: Tina Escobar's favorite city is New York. She appreciates it for its vibrant city life and diversity. Additionally, she notes that New York is home to the largest metropolitan zoo in the United States.


In [36]:
import csv

# Read the CSV file and convert it to a list of dictionaries
with open('tse_takehome_dataset.csv', mode='r', encoding='utf-8') as file:
    csv_reader = csv.DictReader(file)
    data = [row for row in csv_reader]

# Display the data to understand its structure
data

[{'date': '2023-08-13',
  'name': 'James Padilla',
  'company_name': 'Microsoft',
  'description_of_company': 'An American multinational technology corporation which produces computer software, consumer electronics, personal computers, and related services.',
  'favourite_memory': 'Use world sure long fine dinner.',
  'favourite_city_and_why': 'London, for its historical landmarks and diverse cultural scene. for its historical landmarks and diverse cultural scene. Additionally, London has hosted the Summer Olympics three times: in 1908, 1948, and 2012.',
  'favourite_food_and_why': "Pizza, because it's versatile, delicious, and brings people together.",
  'occupation': 'Project Manager',
  'description_of_job': 'Oversees projects from start to finish to ensure they meet company goals.',
  'experience_relevant_to_job': 'Game image scientist girl them receive.',
  'growth_plan': 'Claim score out local meeting all.'},
 {'date': '2020-06-09',
  'name': 'Eric Rogers',
  'company_name': 'Fac

##### V2 - Instructions direct assistant to extract to dict

In [38]:
improved_instruct_assistant_v2 = create_assistant(instructions='You are tasked with providing complete and untruncated answers to questions.  If you are passed a CSV, you MUST convert this to a dict and ONLY use the dict to answer your questions.')

In [21]:
# Prompt Assistant
improved_instruct_response_v2 = prompt_assistant(assistant=improved_instruct_assistant_v2, prompt='Tell me Tina Escobars favorite city and why', file_path='tse_takehome_dataset.csv')

In [23]:
# Print conversation output. DO NOT RERUN!
print(improved_instruct_response_v2[0])


Assistant Response: I'll review the contents of the uploaded file to try to find information about Tina Escobar's favorite city and the reason why. Let's take a look at the file first.

Tool Call Input:
# Let's open and inspect the contents of the uploaded file to find information about Tina Escobar's favorite city.
file_path = '/mnt/data/file-1mpAgJZXxB8ERL64g3Ubv6'

# Determine the type of file first to see how to approach reading it
import magic

# Use the magic library to determine the file type
file_type = magic.from_file(file_path, mime=True)
file_type

Assistant Response: It appears that the `magic` library is not available in this environment. Without knowing the file type, I'll attempt to read the file as a text file first. If that doesn't work, I'll explore other methods. Let's try reading the file.

Tool Call Input:
# Attempt to open and read the contents of the file as a text file
try:
    with open(file_path, 'r') as file:
        contents = file.read()
except Exception a

In [18]:
improved_instruct_response_v2[2][0].content[0].text.value

"Tina Escobar's favorite city is New York. She loves it because of its vibrant city life and diversity. Additionally, she appreciates that it's home to the largest metropolitan zoo in the United States."

In [None]:
# Parse the CSV content to extract information about Tina Escobar
tina_info = None

with open('tse_takehome_dataset.csv', newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    # Search for Tina Escobar's information
    for row in reader:
        if row['name'].strip() == 'Tina Escobar':
            tina_info = row
            break

tina_info['favourite_city_and_why'] if tina_info else "Tina Escobar's information not found."


"New York, because of its vibrant city life and diversity. because of its vibrant city life and diversity. Additionally, It's home to the largest metropolitan zoo in the US."
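To compare an assistant answer against the source field, a simple keyword check may be enough. The sketch below is a hypothetical helper, not the `evaluate_method` function imported at the top of the notebook (its implementation is not shown here). The name `missing_details` and the keyword list are assumptions chosen for illustration.

```python
def missing_details(reference: str, response: str, keywords: list[str]) -> list[str]:
    """Return keywords present in the reference field but absent from the response."""
    ref, resp = reference.lower(), response.lower()
    return [kw for kw in keywords if kw.lower() in ref and kw.lower() not in resp]


# Full field value from the dataset.
reference_field = (
    "New York, because of its vibrant city life and diversity. "
    "because of its vibrant city life and diversity. "
    "Additionally, It's home to the largest metropolitan zoo in the US."
)

# Final answer returned by the V2 instructions assistant.
full_answer = (
    "Tina Escobar's favorite city is New York. She loves it because of its "
    "vibrant city life and diversity. Additionally, she appreciates that it's "
    "home to the largest metropolitan zoo in the United States."
)

# The same field as it appears in the truncated dataframe preview.
truncated_preview = "New York, because of its vibrant city life and..."

# Hypothetical keywords that a complete answer should mention.
keywords = ["New York", "vibrant", "diversity", "zoo"]

complete_gaps = missing_details(reference_field, full_answer, keywords)
preview_gaps = missing_details(reference_field, truncated_preview, keywords)

print(complete_gaps)
print(preview_gaps)
```

A keyword check is deliberately crude: it flags omitted details such as the zoo fact but does not judge paraphrasing or factual drift, so it works best as a quick first filter.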

##### V3 - Provide steps to get each answer

In [45]:
improved_instruct_assistant_v3 = create_assistant(instructions='You are an assistant that provides full and untruncated responses to questions.  The user will provide a CSV and you will take the following steps to arrive at your answer:\nStep 1 - Convert the CSV to a dictionary.\nStep 2 - Review the FULL contents of the dictionary to understand the context required to answer the question.\nStep 3 - Using ONLY dictionary outputs, answer the user question COMPLETELY and without leaving out ANY details.\n\nFollowing these steps EXACTLY is STRICTLY REQUIRED and NON-NEGOTIABLE.  You MUST NOT use dataframe output in your response.', temperature=0)

In [46]:
# Prompt Assistant
improved_instruct_response_v3 = prompt_assistant(assistant=improved_instruct_assistant_v3, prompt='Tell me Tina Escobars favorite city and why', file_path='tse_takehome_dataset.csv')

In [47]:
# Print conversation output. DO NOT RERUN!
print(improved_instruct_response_v3[0])


Tool Call Input:
import csv

# Define the file path
file_path = '/mnt/data/file-2oDZwYVbNzYpNBXuhLaSaQ'

# Read the CSV file and convert it to a dictionary
with open(file_path, mode='r') as file:
    csv_reader = csv.DictReader(file)
    data = [row for row in csv_reader]

# Display the data to understand its structure
data

Assistant Response: Tina Escobar's favorite city is New York. She loves it because of its vibrant city life and diversity. Additionally, New York is home to the largest metropolitan zoo in the United States.


In [None]:
# Parse the CSV content to extract information about Tina Escobar
tina_info = None

with open('tse_takehome_dataset.csv', newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    # Search for Tina Escobar's information
    for row in reader:
        if row['name'].strip() == 'Tina Escobar':
            tina_info = row
            break

tina_info['favourite_city_and_why'] if tina_info else "Tina Escobar's information not found."


"New York, because of its vibrant city life and diversity. because of its vibrant city life and diversity. Additionally, It's home to the largest metropolitan zoo in the US."

### Improved Assistant - Hybrid

In [None]:
improved_hybrid_assistant = create_assistant(instructions='You are tasked with providing complete and untruncated answers to questions.  You will prioritize completeness over being concise. You are required to convert CSVs into dicts and ONLY use the dict to answer your question.')

In [None]:
# Prompt Assistant
improved_hybrid_response = prompt_assistant(assistant=improved_hybrid_assistant, prompt='Tell me Tina Escobars favorite city and why, including ALL details in the relevant field.', file_path='tse_takehome_dataset.csv')

In [None]:
# Print conversation output. DO NOT RERUN!
print(improved_hybrid_response[0])


Assistant Response: Let's first open and examine the contents of the uploaded file to locate the information about Tina Escobar's favorite city and the reasons behind it.

Tool Call Input:
# Let's open the uploaded file and check its contents to find information about Tina Escobar's favorite city.
file_path = '/mnt/data/file-NmhvG9QhEkVKhG9YCeJoXR'

# First, let's read the uploaded file. We need to determine the format (e.g., text, json, csv) to extract the data properly.
with open(file_path, 'rb') as file:
    file_content = file.read()

# For inspection, let's try to decode the content as text initially, as it's a common format for small files.
try:
    content = file_content.decode('utf-8')
except UnicodeDecodeError:
    # If decoding as utf-8 fails, it might be a binary file or a different encoding.
    content = None

content[:500] if content else "Binary or binary-like file"

Assistant Response: The uploaded file appears to be a CSV file with column headers. The relevant field f

In [None]:
# Extracting the row where Tina Escobar is mentioned
tina_escobar_info = df[df['name'] == 'Tina Escobar']

# Retrieving her favorite city and the reason
tina_favorite_city = tina_escobar_info[['favourite_city_and_why']].values[0][0]
tina_favorite_city

"New York, because of its vibrant city life and diversity. because of its vibrant city life and diversity. Additionally, It's home to the largest metropolitan zoo in the US."

### Advanced Assistant - Function Calling (Candidate)

In [None]:
improved_advanced_assistant = create_assistant(instructions='You are an assistant tasked with providing complete and untruncated answers to questions.  You must prioritize completeness over being concise.  ALL uploaded CSVs need to be converted by the function \'process_csv\' and will be returned as a JSON formatted string.  ALWAYS parse the JSON string before leveraging data to ask a question.', enable_function=True)

In [5]:
# Prompt Assistant
improved_advanced_response = prompt_assistant(assistant=improved_advanced_assistant, prompt='List everyone\'s favorite city and why', file_path='tse_takehome_dataset.csv', debug=True)

Run Status: queued
Run Status: requires_action
Processing CSV file: file-WKsPxrQbgCiBZdStyiaPyo
Function output submitted for tool tool call id: call_mau4ps4eVOcDTOet5Rim7F2z
Run Status: in_progress
Run Status: in_progress
Run Status: in_progress
Run Status: in_progress
Run Status: in_progress
Run Status: in_progress
Run Status: in_progress
Run Status: in_progress
Run Status: in_progress
Run Status: in_progress
Run Status: in_progress
Run Status: in_progress
Run Status: completed
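The instructions above say that `process_csv` returns the CSV as a JSON-formatted string, which the assistant must parse before answering. The real helper lives in `assistant_helpers` and takes a `file_id`; its implementation is not shown. The sketch below only illustrates the CSV-to-JSON step on in-memory text. The name `process_csv_text` and the sample CSV are hypothetical.

```python
import csv
import io
import json


def process_csv_text(csv_text: str) -> str:
    """Convert CSV text into a JSON string holding one object per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps([dict(row) for row in reader])


# Tiny stand-in for the uploaded dataset (hypothetical, field value shortened).
sample_csv = (
    "name,favourite_city_and_why\n"
    'Tina Escobar,"New York, because of its vibrant city life and diversity."\n'
)

payload = process_csv_text(sample_csv)

# Parse the JSON string before using the data, as the instructions require.
rows = json.loads(payload)
print(rows[0]["favourite_city_and_why"])
```

Returning JSON instead of a dataframe preview means every field reaches the model as a complete string, so nothing is lost to display truncation.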


In [None]:
# Print conversation output: Trial 2
print(improved_advanced_response[0])


Function Call: process_csv with arguments {"file_id":"file-WKsPxrQbgCiBZdStyiaPyo"}

Assistant Response: Here's a list of everyone’s favorite city along with the reasons why:

1. **James Padilla**: 
   - **Favorite City**: London
   - **Why**: For its historical landmarks and diverse cultural scene. Additionally, London has hosted the Summer Olympics three times: in 1908, 1948, and 2012.

2. **Eric Rogers**:
   - **Favorite City**: Paris
   - **Why**: For its beautiful architecture. Additionally, The Eiffel Tower was intended to be a temporary installation for the 1889 World Fair.

3. **Christopher King**:
   - **Favorite City**: Tokyo
   - **Why**: For its unique blend of traditional and modern culture. It's considered one of the world's most important and powerful global cities.

4. **Tina Escobar**:
   - **Favorite City**: New York
   - **Why**: Because of its vibrant city life and diversity. Additionally, it's home to the largest metropolitan zoo in the US.

5. **Joshua Lewis**:
 

In [None]:
# Print conversation output: Trial 1
print(improved_advanced_response[0])


Function Call: process_csv with arguments {"file_id":"file-WKsPxrQbgCiBZdStyiaPyo"}

Assistant Response: Here's a list of everyone’s favorite city along with the reasons why:

1. **James Padilla**: 
   - **Favorite City**: London
   - **Why**: For its historical landmarks and diverse cultural scene. Additionally, London has hosted the Summer Olympics three times: in 1908, 1948, and 2012.

2. **Eric Rogers**:
   - **Favorite City**: Paris
   - **Why**: For its beautiful architecture. Additionally, The Eiffel Tower was intended to be a temporary installation for the 1889 World Fair.

3. **Christopher King**:
   - **Favorite City**: Tokyo
   - **Why**: For its unique blend of traditional and modern culture. It's considered one of the world's most important and powerful global cities.

4. **Tina Escobar**:
   - **Favorite City**: New York
   - **Why**: Because of its vibrant city life and diversity. Additionally, it's home to the largest metropolitan zoo in the US.

5. **Joshua Lewis**:
 