# OpenAI Assistants API

The Assistants API lets you build AI assistants into your applications. An assistant follows instructions and uses models, tools, and knowledge to answer user questions. In this notebook we use one of those tools, the retriever, to query two PDF documents that we upload.

The architecture and data-flow diagram below shows how the components of the OpenAI Assistants API interact. Central to it are Threads and the Run runtime, which executes asynchronously, reading messages from a Thread and adding messages to it.

For integrating the Assistants API:

1. Create an Assistant with custom instructions and select a model. Optionally, enable tools like Code Interpreter, Retrieval, and Function Calling.

2. Initiate a Thread for each user conversation.

3. Add user queries as Messages to the Thread.

4. Run the Assistant on the Thread to get responses; the Run automatically uses the enabled tools.

Below we follow these steps to show how to integrate the Assistants API with the Retrieval tool to a) upload a couple of PDF documents and b) use the Assistant to query their contents. Think of this as a mini Retrieval Augmented Generation (RAG) application.

The OpenAI documentation describes [how Assistants work](https://platform.openai.com/docs/assistants/how-it-works) in detail.

<img src="./images/assistant_ai_tools_retriever.png">

**Note**: Much of the code and many of the diagrams are inspired by Randy Michak of [Empowerment AI](https://www.youtube.com/watch?v=yzNG3NnF0YE).


## How to use the Assistants API with tools: a Retriever over multiple documents

In [7]:
!pip install openai
!pip install python-dotenv



In [8]:
import warnings
import os
import time

import openai
from openai import OpenAI

from dotenv import load_dotenv, find_dotenv
from typing import List
from assistant_utils import (print_thread_messages, upload_files,
                             loop_until_completed, create_assistant_run)

Load the *.env* file with the API keys and base URL endpoints. You can use either OpenAI or Anyscale Endpoints.

**Note**: The Assistants API is not yet available on Anyscale Endpoints (which serves only open-source models).

In [9]:
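warnings.filterwarnings('ignore')

_ = load_dotenv(find_dotenv()) # read local .env file

# Prefer Anyscale settings when present; otherwise fall back to OpenAI's.
# Stored on the module for convenience and passed explicitly to OpenAI(...) below.
openai.api_base = os.getenv("ANYSCALE_API_BASE", os.getenv("OPENAI_API_BASE"))
openai.api_key = os.getenv("ANYSCALE_API_KEY", os.getenv("OPENAI_API_KEY"))
MODEL = os.getenv("MODEL")
print(f"Using MODEL={MODEL}; base={openai.api_base}")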

Using MODEL=gpt-4-1106-preview; base=https://api.openai.com/v1


Specify the two PDFs to upload. OpenAI allows up to twenty files.

In [10]:
DOCS_TO_LOAD = ["docs/llm_survey_halluciantions.pdf",
                "docs/1001-math-problems-2nd.pdf"]

In [11]:
from openai import OpenAI

client = OpenAI(
    api_key = openai.api_key,
    base_url = openai.api_base
)
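Before uploading anything, it can help to check `DOCS_TO_LOAD` locally. The sketch below uses a hypothetical helper, `check_docs`; the 20-file cap simply mirrors the limit quoted above, so confirm the actual limit in OpenAI's current documentation.

```python
from pathlib import Path
from typing import List

# Assumption: cap taken from the note above; OpenAI's real limit may differ.
MAX_FILES = 20

def check_docs(paths: List[str], max_files: int = MAX_FILES) -> List[Path]:
    """Return the paths as Path objects, raising if the list is unusable."""
    if len(paths) > max_files:
        raise ValueError(f"{len(paths)} files given; at most {max_files} allowed")
    resolved = [Path(p) for p in paths]
    # Every entry must point at an existing regular file.
    missing = [str(p) for p in resolved if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing files: {missing}")
    return resolved
```

Calling `check_docs(DOCS_TO_LOAD)` before the upload step fails fast on a typo in a path instead of partway through the uploads.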

### Step 1: Create the knowledge base
This entails uploading your PDFs as the knowledge base for the retriever to use. Once you upload a file, the OpenAI Assistant breaks it into smaller chunks, stores and indexes those chunks, and saves their embeddings as vectors.

The retriever uses your query to find the best semantic matches among the vectors in the knowledge base, then feeds those matches to the LLM along with the original query to generate a consolidated, comprehensive answer, much as a large-scale RAG pipeline does.
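To make the chunk, embed, retrieve, and prompt flow concrete, here is a toy sketch. It is not how OpenAI implements retrieval: it swaps learned embeddings for a simple bag-of-words count vector and ranks chunks by cosine similarity, and all function names are illustrative.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def chunk(text: str, size: int = 50) -> List[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector of lowercased tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]

def build_prompt(query: str, hits: List[Tuple[float, str]]) -> str:
    """Combine the retrieved context with the original query for the LLM."""
    context = "\n".join(c for _, c in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

A production pipeline replaces `embed` with an embedding model and the linear scan with a vector index, but the shape of the flow stays the same.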

Upload the data files from your storage.
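The upload cell itself is not shown in this notebook, and the code of `assistant_utils.upload_files` is not included. The following is a minimal sketch of an equivalent that calls the SDK's `client.files.create` with `purpose="assistants"`; the helper name `upload_pdfs` is hypothetical.

```python
from typing import Iterable, List

def upload_pdfs(client, paths: Iterable[str]) -> List[str]:
    """Upload each file for use by Assistants and return the new file ids.

    `client` is an `openai.OpenAI` instance. Hypothetical stand-in for
    assistant_utils.upload_files, whose implementation is not shown.
    """
    file_ids = []
    for path in paths:
        # Files must be opened in binary mode for upload.
        with open(path, "rb") as fh:
            uploaded = client.files.create(file=fh, purpose="assistants")
        file_ids.append(uploaded.id)
    return file_ids
```

With the client created above, `file_ids = upload_pdfs(client, DOCS_TO_LOAD)` would return one id per PDF.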

### Step 2: Create an Assistant
Before the Assistant can carry out any tasks, you need to create an assistant object. Supply it with a model, the tools it may use, and, for retrieval, the file ids that make up its knowledge base.

In [12]:
assistant = client.beta.assistants.create(
  name="Math Tutor",
  instructions="You are a personal math tutor. Write and run code to answer math questions.",
  tools=[{"type": "code_interpreter"}],
  model=MODEL,
)

assistant

Assistant(id='asst_XrHEBSlOf5BFxVCkibWbvu56', created_at=1726655410, description=None, instructions='You are a personal math tutor. Write and run code to answer math questions.', metadata={}, model='gpt-4-1106-preview', name='Math Tutor', object='assistant', tools=[CodeInterpreterTool(type='code_interpreter')], response_format='auto', temperature=1.0, tool_resources=ToolResources(code_interpreter=ToolResourcesCodeInterpreter(file_ids=[]), file_search=None), top_p=1.0)
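The cell above creates a Code Interpreter assistant with no files attached. Below is a minimal sketch of giving an assistant the PDF knowledge base described in Step 1. It assumes the v2 Assistants API (the `tool_resources` and `file_search` fields in the output above suggest that version), where the `vector_stores` helper under `tool_resources.file_search` builds a vector store from uploaded file ids. The function, assistant name, and instructions are hypothetical.

```python
def create_retrieval_assistant(client, model: str, file_ids: list):
    """Create an assistant whose file_search tool can query `file_ids`.

    Assumes Assistants API v2: the vector_stores helper creates a vector
    store from the given files and attaches it to the new assistant.
    """
    return client.beta.assistants.create(
        name="PDF Researcher",  # hypothetical name
        instructions="Answer questions using only the attached documents.",
        model=model,
        tools=[{"type": "file_search"}],
        tool_resources={"file_search": {"vector_stores": [{"file_ids": file_ids}]}},
    )
```

Here `file_ids` would hold the ids returned when uploading `DOCS_TO_LOAD`.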

### Step 3: Create a thread
As the diagram above shows, the Thread is the object that Assistant runs interact with: they fetch messages from it and append messages to it. Think of a Thread as a "conversation session between an Assistant and a user. Threads store Messages and automatically handle truncation to fit content into a model’s context window."

In [13]:
thread = client.beta.threads.create()
thread

Thread(id='thread_DEB9cE0RjOR3ExXXJNwZeEK8', created_at=1726655410, metadata={}, object='thread', tool_resources=ToolResources(code_interpreter=None, file_search=None))

### Step 4: Add your message query to the thread for the Assistant

In [14]:
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="I need to solve the equation `3x + 11 = 14`. Can you help me?",
)
message.model_dump()

{'id': 'msg_7TwBVbZhqeiNdWljCMmjWbVK',
 'assistant_id': None,
 'attachments': [],
 'completed_at': None,
 'content': [{'text': {'annotations': [],
    'value': 'I need to solve the equation `3x + 11 = 14`. Can you help me?'},
   'type': 'text'}],
 'created_at': 1726655410,
 'incomplete_at': None,
 'incomplete_details': None,
 'metadata': {},
 'object': 'thread.message',
 'role': 'user',
 'run_id': None,
 'status': None,
 'thread_id': 'thread_DEB9cE0RjOR3ExXXJNwZeEK8'}

### Step 5: Create a Run for the Assistant
A Run is an invocation of an Assistant on a Thread. The Assistant uses its configuration and the Thread’s Messages to perform tasks by calling models and tools. As part of a Run, the Assistant appends Messages to the Thread.
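The code of `create_assistant_run` is not shown. The Run output in this step lists only the `instruction_msg` text in its `instructions` field, which suggests the helper sends it as `instructions`; the Runs API treats that as a per-run override of the assistant's own instructions, whereas `additional_instructions` appends to them. A minimal sketch with a hypothetical helper, `start_run`, that exposes that choice:

```python
def start_run(client, assistant_id: str, thread_id: str, text: str,
              replace_instructions: bool = True):
    """Start a Run of an assistant on a thread (hypothetical helper).

    replace_instructions=True sends `text` as `instructions`, replacing the
    assistant's own instructions for this Run; False sends it as
    `additional_instructions`, which are appended to them instead.
    """
    field = "instructions" if replace_instructions else "additional_instructions"
    return client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id,
        **{field: text},
    )
```

For example, `start_run(client, assistant.id, thread.id, instruction_msg, replace_instructions=False)` would keep the "personal math tutor" instructions and add the Jules Dmatrix request on top.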

Note that the Assistant runs asynchronously. Over its lifecycle a Run moves through statuses such as `queued`, `in_progress`, `requires_action`, and `cancelling` before ending in a terminal status: `completed`, `failed`, `cancelled`, `expired`, or `incomplete`.
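The statuses listed above can be grouped into "still working", "waiting on the client", and "finished". The sketch below does that grouping; the set names and the `still_running` helper are illustrative, and the status strings follow the OpenAI Assistants documentation.

```python
# Run statuses grouped by what the caller should do next.
ACTIVE = {"queued", "in_progress", "cancelling"}
NEEDS_ACTION = {"requires_action"}  # e.g. the run waits for function-call outputs
TERMINAL = {"completed", "failed", "cancelled", "expired", "incomplete"}

def still_running(status: str) -> bool:
    """Return True while OpenAI is still working on the run."""
    if status in ACTIVE:
        return True
    if status in NEEDS_ACTION or status in TERMINAL:
        return False
    # Fail loudly if the API ever reports a status we do not know about.
    raise ValueError(f"unrecognised run status: {status!r}")
```

Raising on an unknown status is a deliberate choice: a polling loop that silently treats a new status as "still running" could spin until the run expires.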

<img src="https://cdn.openai.com/API/docs/images/diagram-1.png">

In [15]:
instruction_msg = """Please address the user as Jules Dmatrix.
    Do not provide an answer to the question if the information was not retrieved from
    the knowledge base.
"""
run = create_assistant_run(client, assistant, thread, instruction_msg)
print(run.model_dump_json(indent=2))

{
  "id": "run_yhPlfNfrSXZ06TZxAMI2KVso",
  "assistant_id": "asst_XrHEBSlOf5BFxVCkibWbvu56",
  "cancelled_at": null,
  "completed_at": null,
  "created_at": 1726655410,
  "expires_at": 1726656010,
  "failed_at": null,
  "incomplete_details": null,
  "instructions": "Please address the user as Jules Dmatrix.\n    Do not provide an answer to the question if the information was not retrieved from\n    the knowledge base.\n",
  "last_error": null,
  "max_completion_tokens": null,
  "max_prompt_tokens": null,
  "metadata": {},
  "model": "gpt-4-1106-preview",
  "object": "thread.run",
  "parallel_tool_calls": true,
  "required_action": null,
  "response_format": "auto",
  "started_at": null,
  "status": "queued",
  "thread_id": "thread_DEB9cE0RjOR3ExXXJNwZeEK8",
  "tool_choice": "auto",
  "tools": [
    {
      "type": "code_interpreter"
    }
  ],
  "truncation_strategy": {
    "type": "auto",
    "last_messages": null
  },
  "usage": null,
  "temperature": 1.0,
  "top_p": 1.0,
  "tool_res

### Step 6: Loop through the Assistant run until status is 'completed'

In [16]:
run_status = client.beta.threads.runs.retrieve(
    thread_id = thread.id,
    run_id = run.id
)
print(run_status.model_dump_json(indent=4))

{
    "id": "run_yhPlfNfrSXZ06TZxAMI2KVso",
    "assistant_id": "asst_XrHEBSlOf5BFxVCkibWbvu56",
    "cancelled_at": null,
    "completed_at": null,
    "created_at": 1726655410,
    "expires_at": 1726656010,
    "failed_at": null,
    "incomplete_details": null,
    "instructions": "Please address the user as Jules Dmatrix.\n    Do not provide an answer to the question if the information was not retrieved from\n    the knowledge base.\n",
    "last_error": null,
    "max_completion_tokens": null,
    "max_prompt_tokens": null,
    "metadata": {},
    "model": "gpt-4-1106-preview",
    "object": "thread.run",
    "parallel_tool_calls": true,
    "required_action": null,
    "response_format": "auto",
    "started_at": 1726655411,
    "status": "in_progress",
    "thread_id": "thread_DEB9cE0RjOR3ExXXJNwZeEK8",
    "tool_choice": "auto",
    "tools": [
        {
            "type": "code_interpreter"
        }
    ],
    "truncation_strategy": {
        "type": "auto",
        "last_mess

#### Poll until Assistant run is completed

In [17]:
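def loop_until_completed1(clnt: object, thrd: object, run_obj: object) -> None:
    """
    Poll the Assistant run every 10 seconds until it reaches a terminal
    status (completed, failed, cancelled, expired, incomplete) or needs
    client action (requires_action), printing each retrieved Run.
    """
    stop_statuses = {"completed", "failed", "cancelled", "expired",
                     "incomplete", "requires_action"}
    while run_obj.status not in stop_statuses:
        time.sleep(10)
        run_obj = clnt.beta.threads.runs.retrieve(
            thread_id = thrd.id,
            run_id = run_obj.id)
        print(run_obj)
loop_until_completed1(client, thread, run_status)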

Run(id='run_yhPlfNfrSXZ06TZxAMI2KVso', assistant_id='asst_XrHEBSlOf5BFxVCkibWbvu56', cancelled_at=None, completed_at=None, created_at=1726655410, expires_at=1726656010, failed_at=None, incomplete_details=None, instructions='Please address the user as Jules Dmatrix.\n    Do not provide an answer to the question if the information was not retrieved from\n    the knowledge base.\n', last_error=None, max_completion_tokens=None, max_prompt_tokens=None, metadata={}, model='gpt-4-1106-preview', object='thread.run', parallel_tool_calls=True, required_action=None, response_format='auto', started_at=1726655411, status='in_progress', thread_id='thread_DEB9cE0RjOR3ExXXJNwZeEK8', tool_choice='auto', tools=[CodeInterpreterTool(type='code_interpreter')], truncation_strategy=TruncationStrategy(type='auto', last_messages=None), usage=None, temperature=1.0, top_p=1.0, tool_resources={})
Run(id='run_yhPlfNfrSXZ06TZxAMI2KVso', assistant_id='asst_XrHEBSlOf5BFxVCkibWbvu56', cancelled_at=None, completed_at=1
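The loop above polls at a fixed 10-second interval with no upper bound on how long it waits. A sketch of a more general polling helper with a deadline follows. The name `poll` and its defaults are illustrative; the 600-second timeout mirrors the gap between `created_at` and `expires_at` in the Run output above (1726656010 minus 1726655410 is 600).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def poll(fetch: Callable[[], T], done: Callable[[T], bool],
         interval: float = 2.0, timeout: float = 600.0,
         sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch() every `interval` seconds until done(result) or timeout."""
    deadline = time.monotonic() + timeout
    result = fetch()
    while not done(result):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still not done after {timeout} s")
        sleep(interval)
        result = fetch()
    return result

# Example (assumes `client`, `thread` and `run` from the cells above):
# run = poll(lambda: client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id),
#            lambda r: r.status not in {"queued", "in_progress", "cancelling"})
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.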

### Step 7: Retrieve the messages returned by the Assistant
Only once the run is **completed** will the Assistant's full reply be available in the Thread.

In [18]:
print_thread_messages(client, thread)

('assistant:The solution to the equation \\(3x + 11 = 14\\) is \\(x = 1\\), '
 'Jules Dmatrix.')
'user:I need to solve the equation `3x + 11 = 14`. Can you help me?'
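We can check the Assistant's answer independently: from 3x + 11 = 14 we get 3x = 14 − 11 = 3, so x = 1. The small sketch below solves any equation of the form a·x + b = c with exact rational arithmetic; `solve_linear` is an illustrative helper, not part of the notebook.

```python
from fractions import Fraction

def solve_linear(a, b, c) -> Fraction:
    """Solve a*x + b = c for x (a != 0) using exact rational arithmetic."""
    if a == 0:
        raise ValueError("a must be non-zero")
    return Fraction(c - b) / Fraction(a)

x = solve_linear(3, 11, 14)
print(x)  # prints 1
```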


### **Repeat: Add another message to the thread**

In [19]:
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="""
    A cylindrical tank has a radius of 4 meters and a height of 10 meters.

Calculate the volume of the tank.
If water is being filled at a rate of 2 cubic meters per hour, how long will it take to fill the tank completely?
If the tank is filled to three-quarters of its capacity, how much water (in liters) is in the tank?
    """,
)
message

Message(id='msg_wy3XB4x67QihzVmUwGqVbpom', assistant_id=None, attachments=[], completed_at=None, content=[TextContentBlock(text=Text(annotations=[], value='\n    A cylindrical tank has a radius of 4 meters and a height of 10 meters.\n\nCalculate the volume of the tank.\nIf water is being filled at a rate of 2 cubic meters per hour, how long will it take to fill the tank completely?\nIf the tank is filled to three-quarters of its capacity, how much water (in liters) is in the tank?\n    '), type='text')], created_at=1726655433, incomplete_at=None, incomplete_details=None, metadata={}, object='thread.message', role='user', run_id=None, status=None, thread_id='thread_DEB9cE0RjOR3ExXXJNwZeEK8')
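For reference when judging the Assistant's reply, the three parts of this question can be worked directly. The volume is V = πr²h = π · 4² · 10 = 160π m³, filling at 2 m³/h takes 160π / 2 = 80π hours, and three-quarters of the tank holds 120π m³ (1 m³ = 1000 L). The same arithmetic in Python:

```python
import math

RADIUS_M = 4.0
HEIGHT_M = 10.0
FILL_RATE_M3_PER_H = 2.0
LITRES_PER_M3 = 1000.0

volume_m3 = math.pi * RADIUS_M ** 2 * HEIGHT_M             # V = pi r^2 h = 160*pi
fill_hours = volume_m3 / FILL_RATE_M3_PER_H                # 80*pi hours
three_quarters_litres = 0.75 * volume_m3 * LITRES_PER_M3   # 120*pi m^3 in litres

print(f"volume: {volume_m3:.2f} m^3")                          # volume: 502.65 m^3
print(f"time to fill: {fill_hours:.2f} h")                     # time to fill: 251.33 h
print(f"water at 3/4 capacity: {three_quarters_litres:,.0f} L")  # water at 3/4 capacity: 376,991 L
```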

### **Repeat: Create another Run for the second message**


In [20]:
run = create_assistant_run(client, assistant, thread, instruction_msg)
run_status = client.beta.threads.runs.retrieve(
    thread_id = thread.id,
    run_id = run.id
)

print(run_status.status)

queued


In [21]:
loop_until_completed1(client, thread, run_status)

Run(id='run_Y5Bd72CakXUiFJyLfUh5leoa', assistant_id='asst_XrHEBSlOf5BFxVCkibWbvu56', cancelled_at=None, completed_at=None, created_at=1726655433, expires_at=1726656033, failed_at=None, incomplete_details=None, instructions='Please address the user as Jules Dmatrix.\n    Do not provide an answer to the question if the information was not retrieved from\n    the knowledge base.\n', last_error=None, max_completion_tokens=None, max_prompt_tokens=None, metadata={}, model='gpt-4-1106-preview', object='thread.run', parallel_tool_calls=True, required_action=None, response_format='auto', started_at=1726655433, status='in_progress', thread_id='thread_DEB9cE0RjOR3ExXXJNwZeEK8', tool_choice='auto', tools=[CodeInterpreterTool(type='code_interpreter')], truncation_strategy=TruncationStrategy(type='auto', last_messages=None), usage=None, temperature=1.0, top_p=1.0, tool_resources={})
Run(id='run_Y5Bd72CakXUiFJyLfUh5leoa', assistant_id='asst_XrHEBSlOf5BFxVCkibWbvu56', cancelled_at=None, completed_at=N

In [22]:
print_thread_messages(client, thread)

('user:\n'
 '    A cylindrical tank has a radius of 4 meters and a height of 10 meters.\n'
 '\n'
 'Calculate the volume of the tank.\n'
 'If water is being filled at a rate of 2 cubic meters per hour, how long will '
 'it take to fill the tank completely?\n'
 'If the tank is filled to three-quarters of its capacity, how much water (in '
 'liters) is in the tank?\n'
 '    ')
('assistant:The solution to the equation \\(3x + 11 = 14\\) is \\(x = 1\\), '
 'Jules Dmatrix.')
'user:I need to solve the equation `3x + 11 = 14`. Can you help me?'
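Judging by its output, `print_thread_messages` lists the newest message first. Below is a sketch of a hypothetical alternative, `thread_transcript`, that returns the conversation oldest-first. It asks the Messages list endpoint for `order="asc"` and joins the text parts of each message. Iterating the returned page object relies on the openai SDK's auto-pagination.

```python
def thread_transcript(client, thread_id: str) -> list:
    """Return 'role: text' strings for a thread, oldest message first.

    Hypothetical alternative to assistant_utils.print_thread_messages.
    """
    lines = []
    # order="asc" requests oldest-first; iterating the page fetches later pages too.
    for msg in client.beta.threads.messages.list(thread_id=thread_id, order="asc"):
        # Keep only text parts; other part types (e.g. images) are skipped.
        text = "".join(part.text.value for part in msg.content if part.type == "text")
        lines.append(f"{msg.role}: {text}")
    return lines
```

For example, `for line in thread_transcript(client, thread.id): print(line)` prints the thread as a chronological transcript.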
