# Text Analysis with the OpenAI Assistants API

A tutorial on how to get GPT-4 to analyse a PDF file with the new Assistants API.

OpenAI tells us "The Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries. The Assistants API currently supports three types of tools: Code Interpreter, Retrieval, and Function calling."

In this tutorial we will use the Retrieval tool (with Code Interpreter also enabled) to read a PDF file; we will then ask the AI a question about its contents and display the messages that are returned.

Some of the code is adapted from the [OpenAI Cookbook](https://github.com/openai/openai-cookbook/blob/main/examples/Assistants_API_overview_python.ipynb) (MIT licence).

In [4]:
import json

# This utility function uses the IPython function 'display' to pretty-print the JSON of the
# various entities created in the Assistants API. The code is copied directly from the OpenAI Cookbook.

def show_json(obj):
    display(json.loads(obj.model_dump_json()))

import time

# This utility function waits for a run to complete. The code is copied directly from the OpenAI Cookbook.

def wait_on_run(run, thread):
    while run.status == "queued" or run.status == "in_progress":
        run = client.beta.threads.runs.retrieve(
            thread_id=thread.id,
            run_id=run.id,
        )
        time.sleep(0.5)
    return run

# Pretty printing helper from the OpenAI Cookbook

def pretty_print(messages):
    print("# Messages")
    for m in messages:
        print(f"{m.role}: {m.content[0].text.value}")
    print()
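
`pretty_print` reads only `m.content[0].text`, which assumes the first content part of every message is text. With Code Interpreter enabled, a message may also carry non-text parts such as images. The following is a minimal sketch of a more tolerant accessor; `message_text` is a hypothetical helper name, not part of the Cookbook code, and it assumes each content part exposes a `type` attribute as in the JSON shown later in this notebook.

```python
def message_text(message):
    """Join the text parts of a message, skipping parts of any other type."""
    parts = [part.text.value for part in message.content if part.type == "text"]
    return "\n".join(parts)
```

Inside `pretty_print`, the line `print(f"{m.role}: {message_text(m)}")` could then replace the direct `m.content[0].text.value` access.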

### Assistants
You can create Assistants directly through the Assistants API. You must provide a name, a model and instructions, as in the code below, and you must, of course, have an OpenAI API key. This notebook asks you to enter that key.

After creating the assistant we display the JSON version of it.

In [5]:
from openai import OpenAI

# If you have set an environment variable with your API key then you do not need
# to pass it to OpenAI
# client = OpenAI()

# Otherwise you need to pass the key as a parameter, e.g.
# api_key = "your key here - don't put this in publicly available code!"
# or enter it interactively
api_key = input("API key")

client = OpenAI(api_key = api_key)
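
Entering the key with `input` shows it on screen as it is typed. A minimal sketch of an alternative follows. It assumes the key may be stored in the `OPENAI_API_KEY` environment variable and falls back to a hidden `getpass` prompt; `load_api_key` is a hypothetical helper name introduced here for illustration.

```python
import os
from getpass import getpass


def load_api_key(env=os.environ, prompt=getpass):
    """Return the API key from the environment, or ask for it without echoing."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # getpass hides the typed characters, unlike input()
        key = prompt("OpenAI API key: ").strip()
    if not key:
        raise ValueError("No API key provided")
    return key
```

The client could then be created with `client = OpenAI(api_key=load_api_key())`.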

In [6]:
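# Upload the PDF, then create an assistant that can read it

# Upload a file (the 'with' block closes the local file handle afterwards)
with open("Rock_parrot.pdf", "rb") as pdf:
    file = client.files.create(
        file=pdf,
        purpose="assistants",
    )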

assistant = client.beta.assistants.create(
    name = "Summariser",
    instructions="You are ornithologist. Use your knowledge base to best respond to queries.",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"},{"type": "code_interpreter"}],
    file_ids=[file.id],
)

### Create a thread

Initially, the thread is empty.


In [7]:
thread = client.beta.threads.create()

Then add a message to the thread:


In [8]:
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What is the habitat of the Rock Parrot",
)
show_json(message)

{'id': 'msg_pEYeshJdzxfsUOndW88DJ9rv',
 'assistant_id': None,
 'content': [{'text': {'annotations': [],
    'value': 'What is the habitat of the Rock Parrot'},
   'type': 'text'}],
 'created_at': 1706007582,
 'file_ids': [],
 'metadata': {},
 'object': 'thread.message',
 'role': 'user',
 'run_id': None,
 'thread_id': 'thread_ERApKc7Z9MfXYGRDk6y3wfNC'}

In [9]:
show_json(thread)

{'id': 'thread_ERApKc7Z9MfXYGRDk6y3wfNC',
 'created_at': 1706007581,
 'metadata': {},
 'object': 'thread'}

### Runs


In [10]:
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)
show_json(run)

{'id': 'run_W0CE9TMS64Irer61eaXGUVua',
 'assistant_id': 'asst_hBnHnRjiJJzbFmCbFgRlQdk8',
 'cancelled_at': None,
 'completed_at': None,
 'created_at': 1706007582,
 'expires_at': 1706008182,
 'failed_at': None,
 'file_ids': ['file-AbUmF6zHEBHHPemN516WuN5j'],
 'instructions': 'You are ornithologist. Use your knowledge base to best respond to queries.',
 'last_error': None,
 'metadata': {},
 'model': 'gpt-4-1106-preview',
 'object': 'thread.run',
 'required_action': None,
 'started_at': None,
 'status': 'queued',
 'thread_id': 'thread_ERApKc7Z9MfXYGRDk6y3wfNC',
 'tools': [{'type': 'retrieval'}, {'type': 'code_interpreter'}],
 'usage': None}

Wait for the run to complete:

In [11]:
run = wait_on_run(run, thread)
show_json(run)

{'id': 'run_W0CE9TMS64Irer61eaXGUVua',
 'assistant_id': 'asst_hBnHnRjiJJzbFmCbFgRlQdk8',
 'cancelled_at': None,
 'completed_at': 1706007592,
 'created_at': 1706007582,
 'expires_at': None,
 'failed_at': None,
 'file_ids': ['file-AbUmF6zHEBHHPemN516WuN5j'],
 'instructions': 'You are ornithologist. Use your knowledge base to best respond to queries.',
 'last_error': None,
 'metadata': {},
 'model': 'gpt-4-1106-preview',
 'object': 'thread.run',
 'required_action': None,
 'started_at': 1706007582,
 'status': 'completed',
 'thread_id': 'thread_ERApKc7Z9MfXYGRDk6y3wfNC',
 'tools': [{'type': 'retrieval'}, {'type': 'code_interpreter'}],
 'usage': {'prompt_tokens': 6064,
  'completion_tokens': 137,
  'total_tokens': 6201}}
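
`wait_on_run` keeps polling for as long as the run is `queued` or `in_progress`, so a run that stalls would block the notebook. The sketch below adds a time limit. It is one possible variant rather than the Cookbook's method: `wait_for_run` and its parameters are hypothetical names, and the function takes a callable that fetches the run, so that it does not depend on a global `client`.

```python
import time


def wait_for_run(fetch_run, timeout=60.0, interval=0.5,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_run() until the run leaves 'queued'/'in_progress' or time runs out."""
    deadline = clock() + timeout
    while True:
        run = fetch_run()
        if run.status not in ("queued", "in_progress"):
            # The run has reached some other state; the caller checks which one
            return run
        if clock() >= deadline:
            raise TimeoutError(f"Run still {run.status} after {timeout} seconds")
        sleep(interval)
```

In this notebook it could be called as `run = wait_for_run(lambda: client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id))`. A returned run is not necessarily successful, so it is worth checking that `run.status == "completed"` and inspecting `run.last_error` otherwise.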

### Messages


In [12]:
messages = client.beta.threads.messages.list(thread_id=thread.id)
show_json(messages)

{'data': [{'id': 'msg_pBq2lZAh7Azb9ncQONuJ408d',
   'assistant_id': 'asst_hBnHnRjiJJzbFmCbFgRlQdk8',
   'content': [{'text': {'annotations': [{'end_index': 387,
        'file_citation': {'file_id': 'file-AbUmF6zHEBHHPemN516WuN5j',
         'quote': 'Rocky islands and coastal dune areas are the preferred habitats for this species which is found from Lake Alexandrina in southeastern South Australia westwards across coastal South and Western Australia to Shark Bay. Unlike other grass parrots it nests in burrows or rocky crevices mostly on offshore islands such as Rottnest Island. Seeds of grasses and succulent plants form the bulk of its diet'},
        'start_index': 377,
        'text': '【7†source】',
        'type': 'file_citation'}],
      'value': 'The Rock Parrot (Neophema petrophila) prefers habitats of rocky islands and coastal dune areas. It can be found from Lake Alexandrina in southeastern South Australia westwards across coastal South and Western Australia to Shark Bay. Notably

In [13]:
pretty_print(messages)

# Messages
assistant: The Rock Parrot (Neophema petrophila) prefers habitats of rocky islands and coastal dune areas. It can be found from Lake Alexandrina in southeastern South Australia westwards across coastal South and Western Australia to Shark Bay. Notably, this species differs from other grass parrots as it nests in burrows or rocky crevices, mostly on offshore islands like Rottnest Island【7†source】.
user: What is the habitat of the Rock Parrot
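
The output above lists the assistant's answer before the user's question, because the messages come back newest first. A small sketch of how they could be put into conversational order is shown here; `in_chronological_order` is a hypothetical helper, and the timestamps in the sample are illustrative stand-ins rather than values from this session.

```python
from types import SimpleNamespace


def in_chronological_order(messages):
    """Return the messages sorted oldest first, using their created_at timestamps."""
    return sorted(messages, key=lambda m: m.created_at)


# Stand-in objects with illustrative timestamps, newest first like the API result
sample = [
    SimpleNamespace(role="assistant", created_at=2),
    SimpleNamespace(role="user", created_at=1),
]
print([m.role for m in in_chronological_order(sample)])  # ['user', 'assistant']
```

Calling `pretty_print(in_chronological_order(messages))` would then print the question before the answer. Alternatively, the list call accepts an `order` argument, so `client.beta.threads.messages.list(thread_id=thread.id, order="asc")` should return the messages oldest first without any sorting on the client side.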



In [49]:
for m in messages:
    print(f"{m.role}: {m.content[0].text.annotations}")

assistant: [TextAnnotationFileCitation(end_index=387, file_citation=TextAnnotationFileCitationFileCitation(file_id='file-AbUmF6zHEBHHPemN516WuN5j', quote='Rocky islands and coastal dune areas are the preferred habitats for this species which is found from Lake Alexandrina in southeastern South Australia westwards across coastal South and Western Australia to Shark Bay. Unlike other grass parrots it nests in burrows or rocky crevices mostly on offshore islands such as Rottnest Island. Seeds of grasses and succulent plants form the bulk of its diet'), start_index=377, text='【7†source】', type='file_citation')]
user: []


In [63]:
"""
# Retrieve the message object
message = client.beta.threads.messages.retrieve(
  thread_id="thread_ERApKc7Z9MfXYGRDk6y3wfNC",
  message_id="msg_pBq2lZAh7Azb9ncQONuJ408d"
)

# Extract the message content
message_content = message.content[0].text
annotations = message_content.annotations
citations = []

# Iterate over the annotations and add footnotes
for index, annotation in enumerate(annotations):
    # Replace the text with a footnote
    message_content.value = message_content.value.replace(annotation.text, f' [{index}]')

    # Gather citations based on annotation attributes
    if (file_citation := getattr(annotation, 'file_citation', None)):
        cited_file = client.files.retrieve(file_citation.file_id)
        citations.append(f'[{index}] {file_citation.quote} from {cited_file.filename}')
    elif (file_path := getattr(annotation, 'file_path', None)):
        cited_file = client.files.retrieve(file_path.file_id)
        citations.append(f'[{index}] Click <here> to download {cited_file.filename}')
        # Note: File download functionality not implemented above for brevity

# Add footnotes to the end of the message before displaying to user
message_content.value += '\n' + '\n'.join(citations)
"""

NotFoundError: Error code: 404 - {'error': {'message': 'No such File object: file-AbUmF6zHEBHHPemN516WuN5j', 'type': 'invalid_request_error', 'param': 'id', 'code': None}}
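
The 404 above is raised by `client.files.retrieve`, presumably because the cited file was no longer available when the cell was run. The annotation already contains the quoted passage, so footnotes can be built without looking the file up. The following is a minimal sketch under that assumption: `add_footnotes` is a hypothetical helper, it works on plain dictionaries shaped like the `show_json` output, and the sample answer and quote are abridged from the output above.

```python
def add_footnotes(text, annotations):
    """Replace citation markers in text with [n] and append the quoted passages."""
    footnotes = []
    for number, annotation in enumerate(annotations, start=1):
        # Swap the raw marker (e.g. the bracketed source tag) for a footnote number
        text = text.replace(annotation["text"], f" [{number}]")
        citation = annotation.get("file_citation")
        if citation:
            footnotes.append(
                f"[{number}] \"{citation['quote']}\" (file {citation['file_id']})"
            )
    if footnotes:
        text += "\n\n" + "\n".join(footnotes)
    return text


# Abridged sample taken from the message shown above
answer = ("It nests in burrows or rocky crevices, mostly on offshore "
          "islands like Rottnest Island【7†source】.")
annotations = [{
    "type": "file_citation",
    "text": "【7†source】",
    "file_citation": {
        "file_id": "file-AbUmF6zHEBHHPemN516WuN5j",
        "quote": "Unlike other grass parrots it nests in burrows or rocky crevices",
    },
}]
print(add_footnotes(answer, annotations))
```

With the live objects, `json.loads(message.model_dump_json())` gives a dictionary of the same shape as the one `show_json` displays.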

In [24]:
message = client.beta.threads.messages.retrieve(thread_id=thread.id, message_id='msg_pBq2lZAh7Azb9ncQONuJ408d')
show_json(message)


{'id': 'msg_pBq2lZAh7Azb9ncQONuJ408d',
 'assistant_id': 'asst_hBnHnRjiJJzbFmCbFgRlQdk8',
 'content': [{'text': {'annotations': [{'end_index': 387,
      'file_citation': {'file_id': 'file-AbUmF6zHEBHHPemN516WuN5j',
       'quote': 'Rocky islands and coastal dune areas are the preferred habitats for this species which is found from Lake Alexandrina in southeastern South Australia westwards across coastal South and Western Australia to Shark Bay. Unlike other grass parrots it nests in burrows or rocky crevices mostly on offshore islands such as Rottnest Island. Seeds of grasses and succulent plants form the bulk of its diet'},
      'start_index': 377,
      'text': '【7†source】',
      'type': 'file_citation'}],
    'value': 'The Rock Parrot (Neophema petrophila) prefers habitats of rocky islands and coastal dune areas. It can be found from Lake Alexandrina in southeastern South Australia westwards across coastal South and Western Australia to Shark Bay. Notably, this species differs fro

In [None]:
# Clean up after the session

deleted_assistant_file = client.beta.assistants.files.delete(
    assistant_id=assistant.id,
    file_id=file.id
)
print(deleted_assistant_file)

response = client.beta.assistants.delete(assistant.id)
print(response)



FileDeleteResponse(id='file-pIm8hsGoApG7PqifZqrtzRy7', deleted=True, object='assistant.file.deleted')
AssistantDeleted(id='asst_TErCng44FcKv6jmRZqFu7Tn5', deleted=True, object='assistant.deleted')
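
If the first deletion in the cell above raises an error, the assistant is never deleted. Also, as far as I understand the API, removing the assistant file only detaches the file from the assistant and leaves the upload itself in place. The sketch below attempts every step and reports what failed. `clean_up` is a hypothetical helper, and the extra `client.files.delete` step is an assumption about what a full tidy-up should include, not something the original cell does.

```python
def clean_up(client, assistant_id, file_id):
    """Try every deletion step, even if an earlier one fails; return the errors."""
    steps = [
        ("assistant file", lambda: client.beta.assistants.files.delete(
            assistant_id=assistant_id, file_id=file_id)),
        ("assistant", lambda: client.beta.assistants.delete(assistant_id)),
        # Assumption: also remove the uploaded file itself
        ("uploaded file", lambda: client.files.delete(file_id)),
    ]
    errors = {}
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # Remember the failure and carry on with the remaining steps
            errors[name] = exc
    return errors
```

Calling `clean_up(client, assistant.id, file.id)` returns an empty dictionary when every step succeeds.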
