## OpenAI Case Study - Experiment Playground

### Imports & Config

In [2]:
from assistant_helpers import create_assistant, prompt_assistant
import pandas as pd
import csv

df = pd.read_csv('tse_takehome_dataset.csv')

### Naive Assistant

Assistant with no instructions and a weak prompt.

In [2]:
# Create Naive Assistant
naive_assistant = create_assistant()

In [8]:
# Prompt Naive Assistant
naive_response = prompt_assistant(naive_assistant, prompt='Tell me tina escobars favorite city and why', file_path='tse_takehome_dataset.csv')

In [9]:
# Print conversation output. DO NOT RERUN!
print(naive_response[0])


Assistant Response: Let's first take a look at the contents of the uploaded file to find out details about Tina Escobar's favorite city. I'll start by checking the structure and contents of the file.

Tool Call Input:
# Let's first read the file to understand its contents and format.
file_path = '/mnt/data/file-WMiQ1qtEx6sidyriwNX3YG'

# Trying to determine the file type
import magic

file_type = magic.from_file(file_path, mime=True)
file_type

Assistant Response: It seems that the `magic` library isn't available in this environment, so I'll take a different approach to determine the format of the uploaded file. I'll attempt to open it as a text file first, since text formats like CSV and JSON are common. Let's see what it contains.

Tool Call Input:
# Attempting to read the first few lines of the file to determine its format.
try:
    with open(file_path, 'r') as file:
        # Read the first 512 characters to guess the format
        content_preview = file.read(512)
except Exceptio

In [None]:
# Reproduce call to pandas from code interpreter
df.head()

| | date | name | company_name | description_of_company | favourite_memory | favourite_city_and_why | favourite_food_and_why | occupation | description_of_job | experience_relevant_to_job | growth_plan |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-08-13 | James Padilla | Microsoft | An American multinational technology corporati... | Use world sure long fine dinner. | London, for its historical landmarks and diver... | Pizza, because it's versatile, delicious, and ... | Project Manager | Oversees projects from start to finish to ensu... | Game image scientist girl them receive. | Claim score out local meeting all. |
| 1 | 2020-06-09 | Eric Rogers | Facebook | An American online social media and social net... | Plant especially information clear. | Paris, for its beautiful architecture for its ... | Chocolate, because it's comforting, decadent, ... | Product Manager | Oversees the development of products from conc... | Decide despite then environment. | Enough order degree appear trial design ten no... |
| 2 | 2023-12-17 | Christopher King | Google | A multinational technology company that specia... | Thus bad front decade. | Tokyo, for its unique blend of traditional and... | Sushi, for its fresh flavors and artful presen... | Graphic Designer | Creates visual concepts to communicate ideas t... | Very development lead point. | Finally glass officer group fill house hard ca... |
| 3 | 2023-02-22 | Tina Escobar | Microsoft | An American multinational technology corporati... | Their small walk want blood worry wish. | New York, because of its vibrant city life and... | Pasta, for its comfort, versatility in dishes,... | Data Analyst | Interprets data and turns it into information ... | Raise physical show sing. | Computer either stand responsibility serve eve... |
| 4 | 2019-12-05 | Joshua Lewis | Google | An American multinational technology corporati... | Give friend green eat agent finally fine. | Paris, for its beautiful architecture for its ... | Pizza, because it's versatile, delicious, and ... | Software Engineer | Develops and maintains software applications. | Foot night morning anything. | Color wind look enjoy our task true thousand t... |
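The `...` suffixes in this preview come from pandas' display layer, not from the data: when a frame is rendered, string cells longer than the `display.max_colwidth` option (50 characters by default) are clipped, so an assistant that reads its answer off a `head()` preview sees clipped text. The sketch below reproduces this on a hypothetical one-row, in-memory stand-in for `tse_takehome_dataset.csv` (the `sample_csv` contents are an assumption that reuses Tina Escobar's field value), using `DataFrame.to_string(max_colwidth=50)` to mimic the default display.

```python
import io

import pandas as pd

# Hypothetical one-row stand-in for tse_takehome_dataset.csv; the field value is
# copied from Tina Escobar's favourite_city_and_why entry.
sample_csv = io.StringIO(
    "name,favourite_city_and_why\n"
    'Tina Escobar,"New York, because of its vibrant city life and diversity. '
    "because of its vibrant city life and diversity. Additionally, It's home "
    'to the largest metropolitan zoo in the US."\n'
)
df_sample = pd.read_csv(sample_csv)
full_value = df_sample.loc[0, "favourite_city_and_why"]

# Rendering with a 50-character column limit (the display default) clips the cell.
clipped_view = df_sample.head().to_string(max_colwidth=50)

# to_string() applies no column-width limit unless one is passed.
full_view = df_sample.head().to_string()

# head() itself returns the untouched values; only the rendering is clipped.
head_value_intact = df_sample.head().loc[0, "favourite_city_and_why"] == full_value

print(clipped_view)
```

In a notebook, `pd.set_option("display.max_colwidth", None)` or a temporary `with pd.option_context("display.max_colwidth", None):` block lifts the limit for rendered frames.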


### Improved Assistant - Better Prompts

Assistant with a more specific prompt.

##### V1 - Instructs the assistant not to use `df.head()`

In [2]:
improved_prompt_assistant = create_assistant()

In [None]:
# Prompt Assistant
improved_prompt_response = prompt_assistant(assistant=improved_prompt_assistant, prompt='Review the attached CSV and extract all information from all fields.  DO NOT use data.head() to get your answer. Tell me Tina Escobars favorite city and why, including ALL details in the relevant field.', file_path='tse_takehome_dataset.csv')

In [17]:
# Print conversation output. DO NOT RERUN!
print(improved_prompt_response[0])


Tool Call Input:
import pandas as pd

# Load the CSV file into a DataFrame
file_path = '/mnt/data/file-TNGD9VCj1Bcn1Cu6NHVzra'
data = pd.read_csv(file_path)

# Display all the information from all fields
data_info = data.to_dict(orient='records')
data_info

Assistant Response: Tina Escobar's favorite city is New York. Here is why she prefers it:

- **City**: New York
- **Reasons**: Because of its vibrant city life and diversity.
- **Additional Detail**: It's home to the largest metropolitan zoo in the US.

This information includes all the details provided in the relevant field for Tina Escobar's favorite city.


In [20]:
# Reproduce call to pandas from code interpreter
data_info = df.to_dict(orient='records')

data_info

[{'date': '2023-08-13',
  'name': 'James Padilla',
  'company_name': 'Microsoft',
  'description_of_company': 'An American multinational technology corporation which produces computer software, consumer electronics, personal computers, and related services.',
  'favourite_memory': 'Use world sure long fine dinner.',
  'favourite_city_and_why': 'London, for its historical landmarks and diverse cultural scene. for its historical landmarks and diverse cultural scene. Additionally, London has hosted the Summer Olympics three times: in 1908, 1948, and 2012.',
  'favourite_food_and_why': "Pizza, because it's versatile, delicious, and brings people together.",
  'occupation': 'Project Manager',
  'description_of_job': 'Oversees projects from start to finish to ensure they meet company goals.',
  'experience_relevant_to_job': 'Game image scientist girl them receive.',
  'growth_plan': 'Claim score out local meeting all.'},
 {'date': '2020-06-09',
  'name': 'Eric Rogers',
  'company_name': 'Fac
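The V1 assistant sidestepped the preview by calling `to_dict(orient='records')`, which returns one plain dict per row holding the raw cell values, so nothing passes through the display formatter. A minimal sketch on a hypothetical two-row frame (names and occupations taken from the dataset preview) shows the shape this produces and contrasts it with the default `orient="dict"`.

```python
import pandas as pd

# Hypothetical two-row frame using values that appear in the dataset preview.
df_sample = pd.DataFrame(
    {
        "name": ["James Padilla", "Tina Escobar"],
        "occupation": ["Project Manager", "Data Analyst"],
    }
)

# orient="records": a list with one {column: value} dict per row.
records = df_sample.to_dict(orient="records")

# Default orient="dict": {column: {index_label: value}}, i.e. column-major.
by_column = df_sample.to_dict()

print(records)
print(by_column)
```

The record-oriented form is the one that makes per-person lookups such as "find Tina Escobar's row" straightforward.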

##### V2 - Explicitly directs the assistant to extract the CSV into a dict

In [None]:
improved_prompt_assistant_v2 = create_assistant()

In [None]:
# Prompt Assistant
improved_prompt_response_v2 = prompt_assistant(assistant=improved_prompt_assistant_v2, prompt='Extract all information in the attached CSV into a dict. Tell me Tina Escobars favorite city and why, including ALL details in the relevant field you created.  ', file_path='tse_takehome_dataset.csv')

In [None]:
# Print conversation output. DO NOT RERUN!
print(improved_prompt_response_v2[0])


Assistant Response: First, I will read the contents of the uploaded CSV file to examine its structure and contents. Then, I'll extract the information into a dictionary.

Tool Call Input:
import pandas as pd

# Load the CSV file into a DataFrame
file_path = '/mnt/data/file-HUpmKBj6PyZdSjKxuV7aUY'
df = pd.read_csv(file_path)

# Display the first few rows of the DataFrame to understand its structure
df.head()

Assistant Response: The CSV file contains several columns related to individuals and their preferences, including favorite cities and their reasons. 

Now, I will extract all of this information into a dictionary. Then, I will look specifically for the details regarding Tina Escobar's favorite city. Let's proceed with this extraction and analysis.

Tool Call Input:
# Convert the DataFrame to a list of dictionaries
data_dict = df.to_dict(orient='records')

# Find the entry for Tina Escobar to get information about her favorite city
tina_escobar_info = next((entry for entry in data_

In [None]:
# Reproduce call to pandas from code interpreter
data_dict = df.to_dict(orient='records')

tina_escobar_info = next((entry for entry in data_dict if entry['name'] == 'Tina Escobar'), None)

tina_escobar_info

{'date': '2023-02-22',
 'name': 'Tina Escobar',
 'company_name': 'Microsoft',
 'description_of_company': 'An American multinational technology corporation which produces computer software, consumer electronics, personal computers, and related services.',
 'favourite_memory': 'Their small walk want blood worry wish.',
 'favourite_city_and_why': "New York, because of its vibrant city life and diversity. because of its vibrant city life and diversity. Additionally, It's home to the largest metropolitan zoo in the US.",
 'favourite_food_and_why': 'Pasta, for its comfort, versatility in dishes, and rich flavors.',
 'occupation': 'Data Analyst',
 'description_of_job': 'Interprets data and turns it into information which can offer ways to improve a business.',
 'experience_relevant_to_job': 'Raise physical show sing.',
 'growth_plan': 'Computer either stand responsibility serve ever performance financial ahead cold.'}
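The `next(...)` lookup in the cell above scans the whole list on every call. If several people need to be looked up, one hypothetical alternative is to index the records by `name` once. This assumes names identify people uniquely, so the `index_by_name` helper (an illustrative name, not part of `assistant_helpers`) raises on duplicates rather than silently keeping the last row.

```python
from collections import Counter

# Hypothetical records in the shape returned by df.to_dict(orient="records");
# the values are shortened from the dataset for illustration.
records = [
    {"name": "James Padilla", "favourite_city_and_why": "London, for its historical landmarks and diverse cultural scene."},
    {"name": "Tina Escobar", "favourite_city_and_why": "New York, because of its vibrant city life and diversity."},
]


def index_by_name(rows):
    """Build a name -> row dict, raising if any name appears more than once."""
    counts = Counter(row["name"] for row in rows)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate names: {duplicates}")
    return {row["name"]: row for row in rows}


people = index_by_name(records)

# dict.get mirrors next(..., None): a missing name yields None instead of raising.
tina_escobar_info = people.get("Tina Escobar")
missing = people.get("Nobody Here")
```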

##### V3 - Emphasizes completeness

In [34]:
improved_prompt_assistant_v3 = create_assistant()

In [None]:
# Prompt Assistant
improved_prompt_response_v3 = prompt_assistant(assistant=improved_prompt_assistant_v3, prompt='Tell me Tina Escobars favorite city and why, including ALL details in the relevant field.  You MUST emphasize completeness over being concise. ', file_path='tse_takehome_dataset.csv')

In [46]:
# Print conversation output. DO NOT RERUN!
print(improved_prompt_response_v3[0])


Assistant Response: To assist you with this request, I'll first examine the contents of the uploaded file to locate the relevant information about Tina Escobar's favorite city and the reasons behind her preference. Let's open and review the file.

Tool Call Input:
# Let's start by checking the contents of the uploaded file to understand its format and locate the required information.
file_path = '/mnt/data/file-5mPJhLTuNVFjiTaTsyA7Z9'
with open(file_path, 'r', encoding='utf-8') as file:
    content = file.read()

# Display the first few hundred characters to get an overview of the contents
content[:500]

Assistant Response: The file appears to be in a CSV format, containing fields such as `date`, `name`, `company_name`, `description_of_company`, `favourite_memory`, `favourite_city_and_why`, and so on. Tina Escobar's favorite city and the reason for her preference should be in the `favourite_city_and_why` field. Let's parse the file to extract this specific information for Tina Escobar

In [48]:
# Reproduce the code interpreter's parsing step locally (csv module, not pandas)
import csv

# Define the target individual
target_name = "Tina Escobar"
favorite_city_info = None

# Read through the CSV file
with open('tse_takehome_dataset.csv', mode='r', encoding='utf-8', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        if row['name'] == target_name:
            # Capture the 'favourite_city_and_why' details for Tina Escobar
            favorite_city_info = row.get('favourite_city_and_why', None)
            break

favorite_city_info


"New York, because of its vibrant city life and diversity. because of its vibrant city life and diversity. Additionally, It's home to the largest metropolitan zoo in the US."
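Both the V3 run and this reproduction parse the file with `csv.DictReader` instead of working from the raw text the assistant previewed with `content[:500]`. That matters because `favourite_city_and_why` values contain commas inside quoted fields: a naive `split(',')` breaks them apart, while the `csv` module honours the quoting. A small sketch, assuming a hypothetical in-memory excerpt shortened from the dataset:

```python
import csv
import io

# Hypothetical two-line excerpt in the dataset's format (header + one row).
raw = (
    "name,favourite_city_and_why,occupation\n"
    'Tina Escobar,"New York, because of its vibrant city life and diversity.",Data Analyst\n'
)

header, data_line = raw.splitlines()

# Naive splitting treats the comma inside the quoted field as a separator,
# producing more fields than the header declares.
naive_fields = data_line.split(",")

# csv.DictReader understands the quotes and keeps the field intact.
rows = list(csv.DictReader(io.StringIO(raw)))
```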

### Improved Assistant - Better Instructions

Assistant with more specific instructions and a basic prompt.

##### V1 - Instructions emphasize completeness

In [10]:
improved_instruct_assistant = create_assistant(instructions='You are tasked with providing complete and untruncated answers to questions.  When reviewing information in a CSV, ensure you have access to the full and untruncated information in each field.  You should prioritize completeness over being concise.')

In [11]:
# Prompt Assistant
improved_instruct_response = prompt_assistant(assistant=improved_instruct_assistant, prompt='Tell me Tina Escobars favorite city and why', file_path='tse_takehome_dataset.csv')

In [12]:
# Print conversation output. DO NOT RERUN!
print(improved_instruct_response[0])


Tool Call Input:
import PyPDF2

# Open the uploaded PDF file
file_path = '/mnt/data/file-DpQGnN8w6fdzztWXB8NoxM'

# Read the PDF content
with open(file_path, "rb") as file:
    pdf_reader = PyPDF2.PdfReader(file)
    # Extract text from each page
    pdf_text = []
    for page in pdf_reader.pages:
        pdf_text.append(page.extract_text())

# Join all text into a single string
pdf_content = " ".join(pdf_text)
pdf_content[:1000]  # Display the first 1000 characters for inspection

Assistant Response: It seems that the file you uploaded is a malformed PDF, and I am unable to extract text using the standard PDF reading library. However, we can try a different approach using a more robust method to view some parts of the content. Let's attempt extracting text using an alternative method.

Tool Call Input:
import fitz  # PyMuPDF

# Open the PDF file with PyMuPDF
pdf_document = fitz.open(file_path)

# Extract text using PyMuPDF
pdf_text_mupdf = []
for page_num in range(len(pdf_document)):

In [None]:
# Extracting the row where Tina Escobar is mentioned
tina_escobar_info = df[df['name'] == 'Tina Escobar']

# Retrieving her favorite city and the reason
tina_favorite_city = tina_escobar_info[['favourite_city_and_why']].values[0][0]
tina_favorite_city

"New York, because of its vibrant city life and diversity. because of its vibrant city life and diversity. Additionally, It's home to the largest metropolitan zoo in the US."
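In this run the assistant treated the uploaded CSV as a PDF, and in the naive run it reached for `magic`, which was not installed in the sandbox. A stdlib-only check can narrow the format before a parser is chosen: PDF files begin with the `%PDF-` signature, and text that decodes as UTF-8 can be handed to `csv.Sniffer`. The `guess_format` helper below is hypothetical and only a rough heuristic (comma-heavy prose could still be classified as CSV), not a replacement for `magic`.

```python
import codecs
import csv
import tempfile
from pathlib import Path


def guess_format(path, sample_size=4096):
    """Heuristically classify a file as 'pdf', 'csv', 'text', or 'binary'."""
    with open(path, "rb") as fh:
        head = fh.read(sample_size)

    # PDF files start with the "%PDF-" signature.
    if head.startswith(b"%PDF-"):
        return "pdf"

    # An incremental decoder tolerates a multi-byte character cut off at the
    # end of the sample but still raises on genuinely invalid UTF-8.
    try:
        text = codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
    except UnicodeDecodeError:
        return "binary"

    # csv.Sniffer raises csv.Error when it cannot find a consistent delimiter.
    try:
        csv.Sniffer().sniff(text, delimiters=",;\t")
    except csv.Error:
        return "text"
    return "csv"


# Usage on small throwaway files (contents are illustrative).
samples = {
    "people.csv": b"name,occupation\nJames Padilla,Project Manager\nTina Escobar,Data Analyst\n",
    "report.pdf": b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n",
    "notes.txt": b"just some notes\nwithout any delimiters\n",
    "blob.bin": b"\x00\xff\x10\x81",
}
with tempfile.TemporaryDirectory() as tmp:
    results = {}
    for filename, payload in samples.items():
        path = Path(tmp) / filename
        path.write_bytes(payload)
        results[filename] = guess_format(path)

print(results)
```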

##### V2 - Instructions direct the assistant to extract the CSV into a dict

In [15]:
improved_instruct_assistant_v2 = create_assistant(instructions='You are tasked with providing complete and untruncated answers to questions.  You are required to convert CSVs into dicts and ONLY use the dict to answer your question.')

In [16]:
# Prompt Assistant
improved_instruct_response_v2 = prompt_assistant(assistant=improved_instruct_assistant_v2, prompt='Tell me Tina Escobars favorite city and why', file_path='tse_takehome_dataset.csv')

In [17]:
# Print conversation output. DO NOT RERUN!
print(improved_instruct_response_v2[0])


Assistant Response: To find out Tina Escobar's favorite city and the reasons why, I'll first need to examine the contents of the uploaded file. Let's read the file to extract the relevant information.

Tool Call Input:
# Let's open and read the contents of the uploaded file to find information about Tina Escobar's favorite city.
file_path = '/mnt/data/file-BA1uB5Zfdx8EmJKw3sCm1C'

with open(file_path, 'r', encoding='utf-8') as file:
    content = file.read()

content[:1000]  # Display the first 1000 characters for inspection.

Assistant Response: The file appears to be a CSV containing detailed information about individuals, including their favorite city and the reasons for their preference. To locate the specific information about Tina Escobar, I'll parse the CSV data and extract her details.

Tool Call Input:
import csv

# Parse the CSV content to extract information about Tina Escobar
tina_info = None

with open(file_path, newline='', encoding='utf-8') as csvfile:
    reader = csv.

In [3]:
# Parse the CSV content to extract information about Tina Escobar
tina_info = None

with open('tse_takehome_dataset.csv', newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    # Search for Tina Escobar's information
    for row in reader:
        if row['name'].strip() == 'Tina Escobar':
            tina_info = row
            break

tina_info['favourite_city_and_why'] if tina_info else "Tina Escobar's information not found."


"New York, because of its vibrant city life and diversity. because of its vibrant city life and diversity. Additionally, It's home to the largest metropolitan zoo in the US."

### Improved Assistant - Hybrid

Assistant combining the completeness and dict-conversion instructions with a more specific prompt.

In [4]:
improved_hybrid_assistant = create_assistant(instructions='You are tasked with providing complete and untruncated answers to questions.  You will prioritize completeness over being concise. You are required to convert CSVs into dicts and ONLY use the dict to answer your question.')

In [5]:
# Prompt Assistant
improved_hybrid_response = prompt_assistant(assistant=improved_hybrid_assistant, prompt='Tell me Tina Escobars favorite city and why, including ALL details in the relevant field.', file_path='tse_takehome_dataset.csv')

In [6]:
# Print conversation output. DO NOT RERUN!
print(improved_hybrid_response[0])


Assistant Response: Let's first open and examine the contents of the uploaded file to locate the information about Tina Escobar's favorite city and the reasons behind it.

Tool Call Input:
# Let's open the uploaded file and check its contents to find information about Tina Escobar's favorite city.
file_path = '/mnt/data/file-NmhvG9QhEkVKhG9YCeJoXR'

# First, let's read the uploaded file. We need to determine the format (e.g., text, json, csv) to extract the data properly.
with open(file_path, 'rb') as file:
    file_content = file.read()

# For inspection, let's try to decode the content as text initially, as it's a common format for small files.
try:
    content = file_content.decode('utf-8')
except UnicodeDecodeError:
    # If decoding as utf-8 fails, it might be a binary file or a different encoding.
    content = None

content[:500] if content else "Binary or binary-like file"

Assistant Response: The uploaded file appears to be a CSV file with column headers. The relevant field f

In [None]:
# Extracting the row where Tina Escobar is mentioned
tina_escobar_info = df[df['name'] == 'Tina Escobar']

# Retrieving her favorite city and the reason
tina_favorite_city = tina_escobar_info[['favourite_city_and_why']].values[0][0]
tina_favorite_city

"New York, because of its vibrant city life and diversity. because of its vibrant city life and diversity. Additionally, It's home to the largest metropolitan zoo in the US."

### Advanced Assistant - Experimental?