# Creating an AI Assistant with GPT-4o

### Before you **begin**

- Make sure you have an **OpenAI developer account**.
- Make sure your OpenAI developer account **has credit on it**.
- **Define an environment variable** named `OPENAI_API_KEY` containing the **API key**.

In [None]:
# Run this to make sure the API key is available
import os

'OPENAI_API_KEY' in os.environ

If the code above returns `False`, follow the steps in `openai-setup.ipynb`.
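If you would rather fail early with a readable message than inspect a bare `True`/`False`, the check can be wrapped in a small function. This is only a sketch: `require_api_key` is a hypothetical helper name, not part of the `openai` package, and it deliberately never prints the key itself.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return True if `name` is set to a non-empty value, else raise.

    Hypothetical notebook helper; not an OpenAI API.
    """
    value = env.get(name, "")
    if not value.strip():
        raise RuntimeError(
            f"{name} is not set. Follow the steps in openai-setup.ipynb first."
        )
    return True
```

Calling `require_api_key()` at the top of the notebook stops execution before any API call is attempted.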

## 0. Setup

You may need to run

```%pip install openai==1.46.0```

If you're using a terminal or script, drop the `%`:

```pip install openai==1.46.0```

In [None]:
# Imports
import openai
import pandas as pd

In [None]:
# Defining an OpenAI client
client = openai.OpenAI()

## 1. Upload the Papers

So that GPT knows about the latest AGI research, we will provide it with some arXiv papers. There are 10 recent papers on AGI stored in the `papers` directory of this workbook.

_The papers were found by searching arXiv for "AGI", then eyeballing recent papers for content on definitions of AGI or progress towards AGI._

The table below shows the filenames and the titles of the papers.

In [6]:
# Creating a DataFrame with paper titles and filenames
papers = pd.DataFrame({
    "filename": [
        "2405.10313v1.pdf",
        "2401.03428v1.pdf",
        "2401.09395v2.pdf",
        "2401.13142v3.pdf",
        "2403.02164v2.pdf",
        "2403.12107v1.pdf",
        "2404.10731v1.pdf",
        "2312.11562v5.pdf",
        "2311.02462v2.pdf",
        "2310.15274v1.pdf"
    ],
    "title": [
        "How Far Are We From AGI?",
        "EXPLORING LARGE LANGUAGE MODEL BASED INTELLIGENT AGENTS: DEFINITIONS, METHODS, AND PROSPECTS",
        "CAUGHT IN THE QUICKSAND OF REASONING, FAR FROM AGI SUMMIT: Evaluating LLMs’ Mathematical and Coding Competency through Ontology-guided Interventions",
        "Unsocial Intelligence: an Investigation of the Assumptions of AGI Discourse",
        "Cognition is All You Need The Next Layer of AI Above Large Language Models",
        "Scenarios for the Transition to AGI",
        "What is Meant by AGI? On the Definition of Artificial General Intelligence",
        "A Survey of Reasoning with Foundation Models",
        "Levels of AGI: Operationalizing Progress on the Path to AGI",
        "Systematic AI Approach for AGI: Addressing Alignment, Energy, and AGI Grand Challenges"
    ]
})
# Adding the path to the filename
papers["filename"] = "papers/" + papers["filename"]
# Displaying the DataFrame
papers

|   | filename | title |
|---|----------|-------|
| 0 | papers/2405.10313v1.pdf | How Far Are We From AGI? |
| 1 | papers/2401.03428v1.pdf | EXPLORING LARGE LANGUAGE MODEL BASED INTELLIGE... |
| 2 | papers/2401.09395v2.pdf | CAUGHT IN THE QUICKSAND OF REASONING, FAR FROM... |
| 3 | papers/2401.13142v3.pdf | Unsocial Intelligence: an Investigation of the... |
| 4 | papers/2403.02164v2.pdf | Cognition is All You Need The Next Layer of AI... |
| 5 | papers/2403.12107v1.pdf | Scenarios for the Transition to AGI |
| 6 | papers/2404.10731v1.pdf | What is Meant by AGI? On the Definition of Art... |
| 7 | papers/2312.11562v5.pdf | A Survey of Reasoning with Foundation Models |
| 8 | papers/2311.02462v2.pdf | Levels of AGI: Operationalizing Progress on th... |
| 9 | papers/2310.15274v1.pdf | Systematic AI Approach for AGI: Addressing Ali... |


In [None]:
# A function to upload files to OpenAI
def upload_file_for_assistant(file_path):
    # Open the file in binary mode (rb = read binary); the with block closes it after the upload
    with open(file_path, "rb") as pdf_file:
        uploaded_file = client.files.create(  # API call to upload the file
            file=pdf_file,
            purpose="assistants",  # For use with the Assistants API. Other purposes include 'fine-tune' and 'batch'
        )

    return uploaded_file.id
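Each upload is a separate API call, so a wrong path halfway through the list leaves you with a partial set of files on your account. A minimal pre-flight check, assuming the paths are relative to the notebook's working directory, might look like this (`find_missing_files` is a hypothetical helper name):

```python
from pathlib import Path


def find_missing_files(paths):
    """Return the paths that do not point to an existing regular file."""
    return [str(p) for p in paths if not Path(p).is_file()]
```

For example, `find_missing_files(papers["filename"])` should return an empty list before you start uploading.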

In [None]:
# Upload each file in the 'filename' column and collect their IDs
uploaded_file_ids = papers["filename"].apply(upload_file_for_assistant).to_list()

# Display the uploaded file IDs
uploaded_file_ids

In [None]:
# Check the files using code
client.files.list()

## 2. Add the Files to a Vector Store

For the assistant to search the documents and return sensible results, the documents need to be split into small chunks and added to a vector database.

The Assistants API handles the chunking stage for you, so you only need to specify the IDs of the files that you want to add to a vector database.
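To make "splitting into chunks" concrete, here is an illustrative sketch of overlapping chunking. It is not OpenAI's implementation: the service chunks for you and works on tokens, whereas this toy version works on characters, and `chunk_text` with its default sizes is a hypothetical helper.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

Each chunk would then be embedded and stored, so that a question can be matched against the most relevant passages rather than against whole papers.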

🚨 WARNING 🚨
<div style="border: 2px solid red; padding: 10px; border-radius: 5px; background-color: #ffe6e6; color: darkred;"> ⚠️ You may get <strong>charged daily</strong> for vector store storage. A store is only removed automatically if an expiration policy applies to it, so do not count on it disappearing after 7 days of inactivity. I <strong>strongly recommend deleting it</strong> right after this code-along to avoid ongoing charges. </div>

In [None]:
# Create a vector store to enable semantic search over the uploaded files.
vstore = client.beta.vector_stores.create(
    file_ids=uploaded_file_ids, # list of uploaded file IDs to include in the store
    name="arxiv_agi_papers" # custom name to help identify this vector store later
)

# Display the created vector store object
vstore

### Check that this worked

View the vector stores in your account at https://platform.openai.com/storage/vector_stores

or:

In [None]:
# Check the vector stores using code
client.beta.vector_stores.list()
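Listing the stores confirms that one exists, but its files may still be processing. The sketch below polls until nothing is in progress. It assumes the `client.beta.vector_stores.retrieve` call and the `file_counts.in_progress` field as exposed by the pinned `openai==1.46.0`; `wait_for_vector_store` itself is a hypothetical helper, and the timeout values are arbitrary.

```python
import time


def wait_for_vector_store(client, vector_store_id, timeout=120.0, poll_interval=2.0):
    """Poll a vector store until none of its files are still being processed.

    Returns the last retrieved vector store object. Raises TimeoutError if
    files are still in progress once `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        store = client.beta.vector_stores.retrieve(vector_store_id)
        if store.file_counts.in_progress == 0:
            return store
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Vector store {vector_store_id} still has files in progress"
            )
        time.sleep(poll_interval)
```

After `store = wait_for_vector_store(client, vstore.id)`, checking `store.file_counts.failed` tells you whether any paper could not be indexed.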

## 3. Create the Assistant

The assistant needs a prompt describing how it should behave. This consists of a few paragraphs of text that give GPT information about what its role is, what it should be talking about, and how to phrase the responses.

### 💡 Pro Tip: Let ChatGPT Help You Prompt Itself

> 🧠 **Yes, you can use ChatGPT (or any LLM) to write prompts _for_ assistants — just like writing anything else.**  
> In fact, the prompt below was written by ChatGPT itself and only lightly edited by a human.

---

#### 📝 The Meta-Prompt I Used

Here’s the actual prompt I gave ChatGPT to generate the instruction for the assistant:

<div style="border-left: 4px solid #6c63ff; padding: 1em; background-color: #f4f4ff; margin: 1em 0; border-radius: 6px; font-family: monospace; font-size: 0.95em;">
I'm going to make a GPT assistant that explains the contents of journal articles about artificial general intelligence.  
The assistant, named <strong>'Aggie'</strong>, must be able to read arXiv papers in PDF form and explain the contents of those papers to an audience of data scientists.  
Please suggest a good instruction prompt for the AI assistant.
</div>

In [10]:
# 👇 This is the core prompt that defines how the assistant will behave.
assistant_prompt = """
You are Aggie, a knowledgeable and articulate AI assistant specializing in artificial general intelligence (AGI). Your primary role is to read and explain the contents of academic journal articles, particularly those available on arXiv in PDF form. Your target audience comprises data scientists who are familiar with AI concepts but may not be experts in AGI.

When explaining the contents of the papers, follow these guidelines:

Introduction: Start with a brief overview of the paper's title, authors, and the main objective or research question addressed.

Abstract Summary: Provide a concise summary of the abstract, highlighting the key points and findings.

Key Sections and Findings: Break down the paper into its main sections (e.g., Introduction, Methods, Results, Discussion). For each section, provide a summary that includes:

- The main points and arguments presented.
- Any important methods or techniques used.
- Key results and findings.
- The significance and implications of these findings.

Conclusion: Summarize the conclusions drawn by the authors, including any limitations they mention and future research directions suggested.

Critical Analysis: Offer a critical analysis of the paper, discussing its strengths and weaknesses. Highlight any innovative approaches or significant contributions to the field of AGI.

Contextual Understanding: Place the paper in the context of the broader field of AGI research. Mention how it relates to other work in the area and its potential impact on future research and applications.

Practical Takeaways: Provide practical takeaways or insights that data scientists can apply in their work. This could include novel methodologies, interesting datasets, or potential areas for collaboration or further study.

Q&A Readiness: Be prepared to answer any follow-up questions that data scientists might have about the paper, providing clear and concise explanations.

Ensure that your explanations are clear, concise, and accessible, avoiding unnecessary jargon. Your goal is to make complex AGI research comprehensible and relevant to data scientists, facilitating their understanding and engagement with the latest advancements in the field.
"""

Now the assistant can be created. You simply give it a name, the prompt, the model to use (in this case GPT-4o), and specify which tools and resources it is allowed to use.

### Instructions

- Define the assistant. Assign to `aggie`.
    - Call it "Aggie" (or another memorable name).
    - Give it the `assistant_prompt`.
    - Set the model to use, `gpt-4o`.
    - Give it access to the file search tool.
    - Give it access to the vector store tool resource.

In [None]:
# Create the assistant using the OpenAI API, specifying the model and the vector store for file search.
aggie = client.beta.assistants.create(
    name="Aggie",
    instructions=assistant_prompt,
    model="gpt-4o",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search":{"vector_store_ids":[vstore.id]}}
)
    
# Display the created assistant object
aggie

### Check that this worked

View the assistants in your account at https://platform.openai.com/playground/assistants

or:

In [None]:
# Check the assistants using code
client.beta.assistants.list()

## 4. Create a Conversation Thread

In [None]:
# Create a thread object. Assign to conversation.
conversation = client.beta.threads.create()

# See the result
conversation

### Instructions

- Add a user message to the conversation. Assign to `msg_what_is_agi`.
    - Give it the thread id.
    - Make it a user message.
    - Ask "What are the most common definitions of AGI?".

In [None]:
# 💬 Add a user message to the conversation thread
msg_what_is_agi = client.beta.threads.messages.create(
    thread_id=conversation.id,
    role="user",
    content="What are the most common definitions of AGI?"
)

# Display the message object. This is the user's message; the assistant has not responded yet.
msg_what_is_agi

## 5. Run the Assistant

Streaming the assistant's output requires an event handler that prints each piece of the response as it arrives. The code looks tricky, but you rarely need to change it. It is taken verbatim from [the OpenAI Assistants documentation](https://platform.openai.com/docs/assistants/overview).

### Instructions

- Run the code to define an event handler.

In [None]:
# Imports
# Run this
from typing_extensions import override
from openai import AssistantEventHandler

In [None]:
# Create a custom event handler class to handle events from the assistant.
class EventHandler(AssistantEventHandler):    
  @override
  def on_text_created(self, text) -> None:
    print(f"\nassistant > ", end="", flush=True)
  @override
  def on_text_delta(self, delta, snapshot):
    print(delta.value, end="", flush=True)
      
  def on_tool_call_created(self, tool_call):
    print(f"\nassistant > {tool_call.type}\n", flush=True)
  
  def on_tool_call_delta(self, delta, snapshot):
    if delta.type == 'code_interpreter':
      if delta.code_interpreter.input:
        print(delta.code_interpreter.input, end="", flush=True)
      if delta.code_interpreter.outputs:
        print(f"\n\noutput >", flush=True)
        for output in delta.code_interpreter.outputs:
          if output.type == "logs":
            print(f"\n{output.logs}", flush=True)

Finally, we are ready to run the assistant to get it to answer our question. The code is the same every time, so we can wrap it in a function.

Streaming responses mean that text is displayed a few words at a time, rather than waiting for the entirety of the text to be generated and printing all at once.
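The difference is easy to see without calling the API at all. In this toy sketch, a generator stands in for the stream of text deltas and each delta is written the moment it arrives, which is roughly the job that `on_text_delta` does in the event handler. `stream_words` and `print_streamed` are hypothetical names used only for illustration.

```python
import sys


def stream_words(text):
    """Yield the text one word at a time, mimicking streamed deltas."""
    for word in text.split():
        yield word + " "


def print_streamed(deltas, out=sys.stdout):
    """Write each delta as soon as it arrives and return the full text."""
    parts = []
    for delta in deltas:
        out.write(delta)
        out.flush()  # push the partial text to the screen immediately
        parts.append(delta)
    return "".join(parts)
```

Calling `print_streamed(stream_words("AGI has many definitions"))` prints word by word; a non-streaming call would instead wait and print the joined string once.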

In [13]:
# Create a function to run the assistant in streaming mode.
# The assistant will respond to user messages in real-time, providing a more interactive experience.
def run_aggie():
    with client.beta.threads.runs.stream(
        thread_id=conversation.id,
        assistant_id=aggie.id,
        event_handler=EventHandler(),
    ) as stream:
        stream.until_done()

In [None]:
# Run the assistant
run_aggie()
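The streamed text is printed but not kept in a variable. If you want the conversation as data afterwards, the messages can be read back from the thread. The sketch below assumes `client.beta.threads.messages.list` with `order="asc"` and text content blocks exposing `.text.value`, as in the pinned SDK version; `thread_transcript` is a hypothetical helper, and it only reads the first page of results.

```python
def thread_transcript(client, thread_id):
    """Return the thread as a list of (role, text) pairs, oldest first.

    Only text content blocks are included; other block types are skipped.
    Reads the first page of messages only.
    """
    page = client.beta.threads.messages.list(thread_id=thread_id, order="asc")
    transcript = []
    for message in page.data:
        text = "".join(
            block.text.value for block in message.content if block.type == "text"
        )
        transcript.append((message.role, text))
    return transcript
```

For instance, `thread_transcript(client, conversation.id)` should give the user question followed by Aggie's answer.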

## 6. Add Another Message and Run It Again

In [None]:
# Read a follow-up question from the user
question = input("Ask Aggie a question: ")
# Add the user question to the conversation thread
msg_question = client.beta.threads.messages.create(
    thread_id=conversation.id, # The ID of the conversation thread where the message will be added
    role="user", # The role of the message sender (in this case, the user)
    content=question # The content of the message (the user's question)
)

# Display the message object. This is the user's message; the assistant responds when the run starts.
msg_question

In [None]:
# Run the assistant again
run_aggie()
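Once you are done, remember the warning in section 2 and remove what this notebook created. The sketch below assumes the `delete` methods on `client.beta.assistants`, `client.beta.vector_stores` and `client.files` from the pinned `openai==1.46.0`; `clean_up` is a hypothetical helper. Deleting the vector store may not remove the uploaded files themselves, so the files are deleted explicitly as well.

```python
def clean_up(client, assistant_id=None, vector_store_id=None, file_ids=()):
    """Delete the assistant, vector store and uploaded files.

    Returns the IDs that a delete call was issued for, in order.
    """
    deleted = []
    if assistant_id:
        client.beta.assistants.delete(assistant_id)
        deleted.append(assistant_id)
    if vector_store_id:
        client.beta.vector_stores.delete(vector_store_id)
        deleted.append(vector_store_id)
    for file_id in file_ids:
        client.files.delete(file_id)
        deleted.append(file_id)
    return deleted
```

In this notebook the call would be `clean_up(client, aggie.id, vstore.id, uploaded_file_ids)`. Afterwards, `client.beta.vector_stores.list()` is a quick way to confirm that nothing is left behind.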