Building an AI Assistant for Data Science with Multimodal Capabilities

In [1]:
# Run this to install a pinned version (1.33.0) of the OpenAI package
!pip install openai==1.33.0

Defaulting to user installation because normal site-packages is not writeable
Collecting openai==1.33.0
  Downloading openai-1.33.0-py3-none-any.whl.metadata (21 kB)
Downloading openai-1.33.0-py3-none-any.whl (325 kB)
Installing collected packages: openai
Successfully installed openai-1.33.0


In [2]:
# Import the os package
import os

# Import the openai package
import openai

# Import the pandas package with an alias
import pandas as pd

In [3]:
# Define an OpenAI client. Assign to client.
client = openai.OpenAI()
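
When `openai.OpenAI()` is called without arguments, the client takes its API key from the `OPENAI_API_KEY` environment variable. A minimal pre-flight check, assuming the key is supplied that way, might look like the sketch below. The helper name `require_api_key` is hypothetical and not part of the OpenAI package.

```python
import os


def require_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key found in the environment, or raise a readable error.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; define it before creating the client."
        )
    return key
```

Calling `require_api_key()` just before `client = openai.OpenAI()` would stop the notebook early with a message that names the missing variable.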

## Task 1: Upload the Arxiv Papers

In [4]:
# List the papers to upload: file names and titles
papers = pd.DataFrame({
    "filename": [
        "2405.10313v1.pdf",
        "2401.03428v1.pdf",
        "2401.09395v2.pdf",
        "2401.13142v3.pdf",
        "2403.02164v2.pdf",
        "2403.12107v1.pdf",
        "2404.10731v1.pdf",
        "2312.11562v5.pdf",
        "2311.02462v2.pdf",
        "2310.15274v1.pdf"
    ],
    "title": [
        "How Far Are We From AGI?",
        "EXPLORING LARGE LANGUAGE MODEL BASED INTELLIGENT AGENTS: DEFINITIONS, METHODS, AND PROSPECTS",
        "CAUGHT IN THE QUICKSAND OF REASONING, FAR FROM AGI SUMMIT: Evaluating LLMs’ Mathematical and Coding Competency through Ontology-guided Interventions",
        "Unsocial Intelligence: an Investigation of the Assumptions of AGI Discourse",
        "Cognition is All You Need The Next Layer of AI Above Large Language Models",
        "Scenarios for the Transition to AGI",
        "What is Meant by AGI? On the Definition of Artificial General Intelligence",
        "A Survey of Reasoning with Foundation Models",
        "Levels of AGI: Operationalizing Progress on the Path to AGI",
        "Systematic AI Approach for AGI: Addressing Alignment, Energy, and AGI Grand Challenges"
    ]
})
papers["filename"] = "papers/" + papers["filename"]
papers

,filename,title
0,papers/2405.10313v1.pdf,How Far Are We From AGI?
1,papers/2401.03428v1.pdf,EXPLORING LARGE LANGUAGE MODEL BASED INTELLIGE...
2,papers/2401.09395v2.pdf,"CAUGHT IN THE QUICKSAND OF REASONING, FAR FROM..."
3,papers/2401.13142v3.pdf,Unsocial Intelligence: an Investigation of the...
4,papers/2403.02164v2.pdf,Cognition is All You Need The Next Layer of AI...
5,papers/2403.12107v1.pdf,Scenarios for the Transition to AGI
6,papers/2404.10731v1.pdf,What is Meant by AGI? On the Definition of Art...
7,papers/2312.11562v5.pdf,A Survey of Reasoning with Foundation Models
8,papers/2311.02462v2.pdf,Levels of AGI: Operationalizing Progress on th...
9,papers/2310.15274v1.pdf,Systematic AI Approach for AGI: Addressing Ali...
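
Every path in the `filename` column has to exist locally before it can be uploaded. As a rough sketch, a check like the following could list any missing PDFs up front. The helper name `find_missing_files` is an assumption made for illustration.

```python
from pathlib import Path


def find_missing_files(paths):
    """Return the paths (as strings) that do not point to an existing file."""
    return [str(p) for p in paths if not Path(p).is_file()]
```

In this notebook it could be called as `find_missing_files(papers["filename"])`; an empty list means every paper is in place.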


In [5]:
# Run this
def upload_file_for_assistant(file_path):
    # Open the file in a context manager so the handle is closed after the upload
    with open(file_path, "rb") as f:
        uploaded_file = client.files.create(
            file=f,
            purpose="assistants"
        )
    return uploaded_file.id

In [6]:
# Upload each paper. Assign the returned file IDs to uploaded_file_ids.
uploaded_file_ids = papers['filename'] \
    .apply(upload_file_for_assistant) \
    .to_list()

# See the result
uploaded_file_ids

['file-ReLGbUAEWYUv9WbcJoPfbh',
 'file-BrK1q3rHRfCtS56H4heYqw',
 'file-96p93rZqUexeNo92qGkCm2',
 'file-2BmefRPou5kSBfYcHQ3Rkm',
 'file-XAERfUYrvuJFrAGZcpr6C4',
 'file-W9mKi8a8PsbdZbcMBfQ8K4',
 'file-VbjkSCV33EwUxQM1zYu6ka',
 'file-DcajDM2tVAYdPHknJF4Prp',
 'file-SNaaEjUaVgCALLoF24AsRp',
 'file-2K96i4uTgQ4MxpnfX7kBer']

### Check that this worked

View the files in your account at https://platform.openai.com/storage/files

## Task 2: Add the Files to a Vector Store

In [7]:
# Create a vector store, associating the uploaded file IDs and naming it.
vstore = client.beta.vector_stores.create(
    file_ids = uploaded_file_ids,
    name = "agi_papers"
)

# See the results
vstore

VectorStore(id='vs_682520dc81b48191970aee913002fe7b', created_at=1747263708, file_counts=FileCounts(cancelled=0, completed=0, failed=0, in_progress=10, total=10), last_active_at=1747263708, metadata={}, name='agi_papers', object='vector_store', status='in_progress', usage_bytes=0, expires_after=None, expires_at=None)
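
The output above reports `status='in_progress'` with all 10 files still being processed, so a search issued right away may not see every paper. One way to wait is to poll the store until its status changes, as in the sketch below. It assumes the client exposes `client.beta.vector_stores.retrieve(...)`, as the package version pinned in this notebook does; the helper name `wait_for_vector_store` and its default intervals are hypothetical.

```python
import time


def wait_for_vector_store(client, vector_store_id, poll_seconds=2.0, timeout=300.0):
    """Poll a vector store until it leaves the 'in_progress' state.

    Assumes `client.beta.vector_stores.retrieve(id)` returns an object
    with a `status` attribute, like the VectorStore printed above.
    """
    deadline = time.monotonic() + timeout
    while True:
        store = client.beta.vector_stores.retrieve(vector_store_id)
        if store.status != "in_progress":
            return store
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Vector store {vector_store_id} still in progress after {timeout} s"
            )
        time.sleep(poll_seconds)
```

Here that would be `vstore = wait_for_vector_store(client, vstore.id)`, after which `vstore.file_counts` shows how many files completed or failed.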

### Check that this worked

View the vector stores in your account at https://platform.openai.com/storage/vector_stores

## Task 3: Create the Assistant

In [8]:
# Creating the Prompt
assistant_prompt = """
You are Aggie, a knowledgeable and articulate AI assistant specializing in artificial general intelligence (AGI). Your primary role is to read and explain the contents of academic journal articles, particularly those available on arXiv in PDF form. Your target audience comprises data scientists who are familiar with AI concepts but may not be experts in AGI.

When explaining the contents of the papers, follow these guidelines:

Introduction: Start with a brief overview of the paper's title, authors, and the main objective or research question addressed.

Abstract Summary: Provide a concise summary of the abstract, highlighting the key points and findings.

Key Sections and Findings: Break down the paper into its main sections (e.g., Introduction, Methods, Results, Discussion). For each section, provide a summary that includes:

The main points and arguments presented.
Any important methods or techniques used.
Key results and findings.
The significance and implications of these findings.
Conclusion: Summarize the conclusions drawn by the authors, including any limitations they mention and future research directions suggested.

Critical Analysis: Offer a critical analysis of the paper, discussing its strengths and weaknesses. Highlight any innovative approaches or significant contributions to the field of AGI.

Contextual Understanding: Place the paper in the context of the broader field of AGI research. Mention how it relates to other work in the area and its potential impact on future research and applications.

Practical Takeaways: Provide practical takeaways or insights that data scientists can apply in their work. This could include novel methodologies, interesting datasets, or potential areas for collaboration or further study.

Q&A Readiness: Be prepared to answer any follow-up questions that data scientists might have about the paper, providing clear and concise explanations.

Ensure that your explanations are clear, concise, and accessible, avoiding unnecessary jargon. Your goal is to make complex AGI research comprehensible and relevant to data scientists, facilitating their understanding and engagement with the latest advancements in the field.
"""

In [9]:
# vstore was created in Task 2; its ID links the assistant to the uploaded papers.

# Define the assistant. Assign to aggie.
aggie = client.beta.assistants.create(
    name="Aggie",
    instructions=assistant_prompt,
    model="gpt-4o",  
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vstore.id]}}
)

# See the result
aggie

Assistant(id='asst_9n4UlaVexhipzSdlntUy2T55', created_at=1747263721, description=None, instructions="\nYou are Aggie, a knowledgeable and articulate AI assistant specializing in artificial general intelligence (AGI). Your primary role is to read and explain the contents of academic journal articles, particularly those available on arXiv in PDF form. Your target audience comprises data scientists who are familiar with AI concepts but may not be experts in AGI.\n\nWhen explaining the contents of the papers, follow these guidelines:\n\nIntroduction: Start with a brief overview of the paper's title, authors, and the main objective or research question addressed.\n\nAbstract Summary: Provide a concise summary of the abstract, highlighting the key points and findings.\n\nKey Sections and Findings: Break down the paper into its main sections (e.g., Introduction, Methods, Results, Discussion). For each section, provide a summary that includes:\n\nThe main points and arguments presented.\nAny i

### Check that this worked

View the assistants in your account at https://platform.openai.com/playground/assistants

## Task 4: Create a Conversation Thread

In [10]:
# Create a thread object. Assign to conversation.
conversation = client.beta.threads.create()

# See the result
conversation

Thread(id='thread_0QuqxXxHpnLGXcFtsrFXgFuw', created_at=1747263729, metadata={}, object='thread', tool_resources=ToolResources(code_interpreter=None, file_search=None))

In [11]:
# Add a user message to the conversation. Assign to msg_what_is_agi.
msg_what_is_agi = client.beta.threads.messages.create(
    thread_id=conversation.id,
    role="user",
    content="What are the most common definitions of AGI?"
)

# See the result
msg_what_is_agi

Message(id='msg_PmGbbnKjqXpWZYxddRRi80PU', assistant_id=None, attachments=[], completed_at=None, content=[TextContentBlock(text=Text(annotations=[], value='What are the most common definitions of AGI?'), type='text')], created_at=1747263730, incomplete_at=None, incomplete_details=None, metadata={}, object='thread.message', role='user', run_id=None, status=None, thread_id='thread_0QuqxXxHpnLGXcFtsrFXgFuw')

## Task 5: Run the Assistant

In [12]:
# Run this
from typing_extensions import override
from openai import AssistantEventHandler
 
# First, we create an EventHandler class to define
# how we want to handle the events in the response stream.

class EventHandler(AssistantEventHandler):
  @override
  def on_text_created(self, text) -> None:
    # Print a prefix when the assistant starts a new text message
    print("\nassistant > ", end="", flush=True)

  @override
  def on_text_delta(self, delta, snapshot):
    # Print each chunk of text as it streams in
    print(delta.value, end="", flush=True)

  @override
  def on_tool_call_created(self, tool_call):
    # Announce which tool the assistant is calling (here, file_search)
    print(f"\nassistant > {tool_call.type}\n", flush=True)

  @override
  def on_tool_call_delta(self, delta, snapshot):
    # This branch only applies to the code_interpreter tool.
    # Aggie uses file_search only, so it is kept for completeness.
    if delta.type == 'code_interpreter':
      if delta.code_interpreter.input:
        print(delta.code_interpreter.input, end="", flush=True)
      if delta.code_interpreter.outputs:
        print("\n\noutput >", flush=True)
        for output in delta.code_interpreter.outputs:
          if output.type == "logs":
            print(f"\n{output.logs}", flush=True)


In [13]:
# Run this
def run_aggie():
    with client.beta.threads.runs.stream(
        thread_id=conversation.id,
        assistant_id=aggie.id,
        event_handler=EventHandler(),
    ) as stream:
        stream.until_done()
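
Asking a question in this notebook takes two steps: adding a user message to the thread, then streaming a run. As a sketch, the two steps could be combined into one helper. The name `ask` and the `make_handler` parameter are assumptions for illustration. The helper takes a factory rather than a handler instance so that each run gets a fresh handler, as `run_aggie()` does.

```python
def ask(client, thread_id, assistant_id, question, make_handler):
    """Add a user message to a thread, then stream the assistant's reply.

    `make_handler` is a zero-argument callable (for example the
    EventHandler class itself) that returns a new handler for this run.
    """
    message = client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=question,
    )
    with client.beta.threads.runs.stream(
        thread_id=thread_id,
        assistant_id=assistant_id,
        event_handler=make_handler(),
    ) as stream:
        # Block until the run finishes; the handler prints as events arrive
        stream.until_done()
    return message
```

With the objects defined in this notebook, a call might look like `ask(client, conversation.id, aggie.id, "What are the most common definitions of AGI?", EventHandler)`.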

In [14]:
# Run the assistant
run_aggie()


assistant > file_search


assistant > The definitions of Artificial General Intelligence (AGI) vary widely, reflecting different focuses and assumptions in AI research. Here are some common definitions of AGI:

1. **Human-Level Cognitive Tasks**: AGI is often defined as a machine capable of performing the cognitive tasks that humans can typically do, without necessarily requiring a physical embodiment【4:1†source】.

2. **General Learning Abilities**: Another common definition of AGI is an AI that is not specialized for specific tasks but can learn to perform a wide range of tasks, similar to humans【4:2†source】.

3. **Economically Valuable Work**: OpenAI defines AGI as highly autonomous systems that outperform humans in most economically valuable work. This definition emphasizes performance and economic impact rather than the processes behind intelligence【4:2†source】.

4. **Flexibility and Generality**: Some definitions highlight the need for AGI to be flexible and general, capable of a
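
The streamed text above contains inline citation markers such as `【4:1†source】`. If the answer is to be shown without them, a regular expression can strip the markers, as sketched below. The pattern is inferred only from the markers visible in this output and is an assumption, not a documented format; the helper name `strip_citation_markers` is hypothetical.

```python
import re

# Matches markers shaped like 【4:1†source】, as seen in the output above.
CITATION_MARKER = re.compile(r"【\d+:\d+†[^】]*】")


def strip_citation_markers(text):
    """Remove file_search citation markers from a string."""
    return CITATION_MARKER.sub("", text)
```

For example, `strip_citation_markers("physical embodiment【4:1†source】.")` leaves only the sentence text and its final period.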

In [15]:
# Create another user message, adding it to the conversation. Assign to msg_how_close_is_agi.
msg_how_close_is_agi = client.beta.threads.messages.create(
    thread_id=conversation.id,
    role="user",
    content="How Far Are We From AGI?"
)

# See the result
msg_how_close_is_agi

Message(id='msg_pTaTkAvYU8pixxMl8RqT2Sbs', assistant_id=None, attachments=[], completed_at=None, content=[TextContentBlock(text=Text(annotations=[], value='How Far Are We From AGI?'), type='text')], created_at=1747263774, incomplete_at=None, incomplete_details=None, metadata={}, object='thread.message', role='user', run_id=None, status=None, thread_id='thread_0QuqxXxHpnLGXcFtsrFXgFuw')

In [16]:
# Run the assistant
run_aggie()


assistant > file_search


assistant > The question of how far we are from achieving Artificial General Intelligence (AGI) does not have a definitive answer, as opinions within the AI research community vary significantly:

1. **Diverse Opinions from Experts**: A poll conducted at the ICLR 2024 "How Far Are We From AGI" workshop showcased a range of opinions among researchers. Approximately 37% predicted that it would take more than 20 years to achieve AGI, while others were more optimistic, suggesting it could be within a few decades【8:0†source】.

2. **Technical and Philosophical Challenges**: Several hurdles remain in achieving AGI, including technical, ethical, and philosophical issues. Current AI systems often operate as "black boxes" with limited explainability and transparency, which is a significant barrier for AGI development【8:19†source】. Additionally, there are unresolved challenges in emulating human-like reasoning, ensuring AI safety, and aligning AI with human values【8:13†