# OpenAI Assistants APIs

The Assistants API lets you build AI assistants into your applications. An assistant follows instructions and uses models, tools, and knowledge to answer user questions. In this notebook we use one of those tools, the retriever,
to query two PDF documents that we upload.

The architecture and data-flow diagram below depicts how the components
of the OpenAI Assistants API interact. The central pieces to understand are the
Thread and the Runtime, which executes asynchronously, adding messages to and
reading messages from the Thread.

The OpenAI documentation describes in detail [how Assistants work](https://platform.openai.com/docs/assistants/how-it-works).

<img src="./images/assistant_ai_tools_retriever.png">


## How to use the Assistants API with Tools: Retriever

In [1]:
import warnings
import os

import openai
from openai import OpenAI

from dotenv import load_dotenv, find_dotenv

Load the .env file with the respective API keys and base URL endpoints. Here you can use either OpenAI or Anyscale Endpoints. **Note**: the Assistants API is not yet available on Anyscale Endpoints, which serve only open-source models, so leave the `ANYSCALE_*` variables unset for this notebook.

In [4]:
warnings.filterwarnings('ignore')

_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_base = os.getenv("ANYSCALE_API_BASE", os.getenv("OPENAI_API_BASE"))
openai.api_key = os.getenv("ANYSCALE_API_KEY", os.getenv("OPENAI_API_KEY"))
MODEL = os.getenv("MODEL")
print(f"Using MODEL={MODEL}; base={openai.api_base}")

Using MODEL=gpt-4-1106-preview; base=https://api.openai.com/v1
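A missing key usually surfaces only later, as an authentication error on the first API call, so it can help to validate the settings right after loading them. The sketch below is one way to do that. `first_set` and `require_settings` are hypothetical helper names (they are not part of `openai` or `python-dotenv`), and the variable names are the ones used in the cell above. Unlike the nested `os.getenv` calls, this sketch treats an empty value as unset.

```python
import os
from typing import Mapping, Optional, Sequence


def first_set(env: Mapping[str, str], names: Sequence[str]) -> Optional[str]:
    """Return the value of the first variable in `names` that is set and non-empty."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


def require_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve the settings used in this notebook, or raise if one is missing."""
    settings = {
        "api_base": first_set(env, ["ANYSCALE_API_BASE", "OPENAI_API_BASE"]),
        "api_key": first_set(env, ["ANYSCALE_API_KEY", "OPENAI_API_KEY"]),
        "model": first_set(env, ["MODEL"]),
    }
    missing = [key for key, value in settings.items() if value is None]
    if missing:
        raise RuntimeError(f"Missing settings in environment/.env: {missing}")
    return settings
```

Calling `require_settings()` after `load_dotenv(...)` would fail fast with the list of missing settings instead of failing on the first request.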


In [5]:
DOCS_TO_LOAD = ["docs/HAI_AI-Index-Report_2023.pdf", 
                "docs/llm_survey_halluciantions.pdf"]

Create an OpenAI client instance.

In [6]:
from openai import OpenAI

client = OpenAI(
    api_key = openai.api_key,
    base_url = openai.api_base
)

### Create our knowledgebase
This entails uploading your PDFs as the knowledgebase for the retriever to use.
Most likely, these documents are chunked and indexed as vectors; that part is hidden from you.

The retriever uses our query to find the best semantic matches among the vectors in the knowledgebase and feeds them, along with the original query, to the LLM to generate a consolidated and comprehensive answer. It's like a mini Retrieval-Augmented Generation (RAG) pipeline.
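To make the retrieve-then-generate idea concrete, here is a toy sketch of the flow described above. It is not how OpenAI implements retrieval, which is hidden from us. It assumes sentence-sized chunks, uses word counts as a stand-in for real embedding vectors, and all function names and the sample text are illustrative.

```python
import math
import re
from collections import Counter


def chunk(text: str) -> list[str]:
    """Split text into sentence-sized chunks."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real system uses dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved chunks and the original query into one prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


document = (
    "Industry produced more significant machine learning models than academia. "
    "Training large models has a measurable carbon footprint. "
    "Hallucinations are outputs that are not grounded in the source material."
)
question = "What are hallucinations?"
top = retrieve(question, chunk(document), k=1)
print(build_prompt(question, top))
```

The prompt printed at the end is what would be handed to the LLM: the best-matching chunk plus the original question.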

In [18]:
file_objects = []
for doc in DOCS_TO_LOAD:
    print(f"Uploading doc:{doc}...")
    with open(doc, "rb") as f:  # close the file handle once the upload finishes
        file_obj = client.files.create(file=f, purpose="assistants")
    file_objects.append(file_obj)

Uploading doc:docs/HAI_AI-Index-Report_2023.pdf...
Uploading doc:docs/llm_survey_halluciantions.pdf...


In [17]:
file_objects

[FileObject(id='file-tsRWdKT1tj4X0k0OImkNcNvi', bytes=25318310, created_at=1703028209, filename='HAI_AI-Index-Report_2023.pdf', object='file', purpose='assistants', status='processed', status_details=None),
 FileObject(id='file-vA4recYS28J71huYBoRedW1M', bytes=1870663, created_at=1703028212, filename='llm_survey_halluciantions.pdf', object='file', purpose='assistants', status='processed', status_details=None)]

### Create an Assistant 
Before you can start interacting with the assistant to carry out any tasks, you need an Assistant object.

In [30]:
assistant = client.beta.assistants.create(name="AI Report and LLM survey Chatbot",
                                           instructions="""You are a knowledgeable chatbot trained to respond 
                                               inquires about the Standfords HAI Artificial index 2023 report and on survey of why 
                                               LLMs hallucinate. Use a neutral, professional advisory tone, and only
                                               respond by consulting the knowledgbase or files you are granted access to.
                                               Do not make up answers.""",
                                           model=MODEL,
                                           tools = [{'type': 'retrieval'}],
                                           file_ids=[file_objects[0].id, file_objects[1].id]
)                                        
assistant

Assistant(id='asst_wUIbHW1FdTAR4cnC9wwMhW37', created_at=1703029364, description=None, file_ids=['file-JPaJ4gNVSL89Ypdrr6UcdzKl', 'file-qssr6MxqaMZk3FTJPKjsK3SF'], instructions='You are a knowledgeable chatbot trained to respond \n                                               inquires about the Standfords HAI Artificial index 2023 report and on survey of why \n                                               LLMs hallucinate. Use a neutral, professional advisory tone, and only\n                                               respond by consulting the knowledgbase or files you are granted access to.\n                                               Do not make up answers.', metadata={}, model='gpt-4-1106-preview', name='AI Report and LLM survey Chatbot', object='assistant', tools=[ToolRetrieval(type='retrieval')])

### Create a thread 
As the diagram above shows, the Thread is the object through which the Assistant runtime interacts with the user: it fetches messages from the Thread and appends messages to it.

Think of a thread as a "conversation session between an Assistant and a user. Threads store Messages and automatically handle truncation to fit content into a model’s context."
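The truncation strategy that Threads apply is not spelled out here, so the following is only a sketch of one plausible approach: keep the most recent messages that fit a token budget. The `Msg` class, the word-count token estimate, and the budget value are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    role: str
    content: str


def approx_tokens(text: str) -> int:
    """Very rough token estimate: one token per whitespace-separated word."""
    return len(text.split())


def truncate_to_budget(messages: list[Msg], max_tokens: int) -> list[Msg]:
    """Keep the most recent messages whose combined size fits in `max_tokens`."""
    kept: list[Msg] = []
    used = 0
    for msg in reversed(messages):      # walk from newest to oldest
        cost = approx_tokens(msg.content)
        if used + cost > max_tokens:
            break                       # older messages no longer fit
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order


history = [
    Msg("user", "What are the top takeaways of the report?"),
    Msg("assistant", "Industry leads academia in releasing significant models."),
    Msg("user", "Give one example of a hallucination type."),
]
window = truncate_to_budget(history, max_tokens=16)
print([m.role for m in window])
```

With a budget of 16 estimated tokens, the oldest user message is dropped and the two most recent messages are kept.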

In [26]:
thread = client.beta.threads.create()
thread

Thread(id='thread_UW9eo5oAtk0nY8dCTBqpcXbP', created_at=1703028930, metadata={}, object='thread')

### Add our message query to the thread

In [27]:
message_1 = client.beta.threads.messages.create(
    thread_id=thread.id, 
    role="user",
    content="""What are the top 10 takeaways in the Artificial Intelligence Index Report 2023.
    Summarize each in no more three sentences.""",
)
message_1

ThreadMessage(id='msg_pAwRieM7toCEIYvGcwIMeVMQ', assistant_id=None, content=[MessageContentText(text=Text(annotations=[], value='What are the top 10 takeaways in the Artificial Intelligence Index Report 2023.\n    Summarize each in no more three sentences.'), type='text')], created_at=1703028935, file_ids=[], metadata={}, object='thread.message', role='user', run_id=None, thread_id='thread_UW9eo5oAtk0nY8dCTBqpcXbP')

### Step 5: Run the assistant
The Assistant uses its configuration and the Thread’s Messages to perform tasks by calling models and tools. As part of a Run, the Assistant appends Messages to the Thread.

Note that the Assistant runs asynchronously. A Run starts as *queued*, moves to *in_progress*, and ends in one of the terminal states *completed*, *failed*, *cancelled*, or *expired*. Function-calling tools add an intermediate *requires_action* state. A Run object has exactly one status at a time, and that status changes as the Run moves through its lifecycle.

<img src="https://cdn.openai.com/API/docs/images/diagram-1.png">

In [36]:
run = client.beta.threads.runs.create(
    thread_id = thread.id,
    assistant_id = assistant.id,
    instructions = """Please address the user as Jules Dmatrix.  
    Do not provide an answer to the question if the information was not retrieved from the knowledge base.
"""
)
run

Run(id='run_4mzVkwgi6mh3v2g5pN8Zm0dY', assistant_id='asst_wUIbHW1FdTAR4cnC9wwMhW37', cancelled_at=None, completed_at=None, created_at=1703030029, expires_at=1703030629, failed_at=None, file_ids=['file-JPaJ4gNVSL89Ypdrr6UcdzKl', 'file-qssr6MxqaMZk3FTJPKjsK3SF'], instructions='Please address the user as Jules Dmatrix.  \n    Do not provide an answer to the question if the information was not retrieved from the knowledge base.\n', last_error=None, metadata={}, model='gpt-4-1106-preview', object='thread.run', required_action=None, started_at=None, status='queued', thread_id='thread_UW9eo5oAtk0nY8dCTBqpcXbP', tools=[ToolAssistantToolsRetrieval(type='retrieval')])

### Step 6: Retrieve the status 

In [32]:
run = client.beta.threads.runs.retrieve(
    thread_id = thread.id,
    run_id = run.id
)

print(run.status)

completed
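The cell above retrieves the run once, so it only prints `completed` if the run has already finished; right after creation the status is typically `queued`. A small polling helper avoids re-running the cell by hand. This is a minimal sketch: `wait_for_run`, `poll_seconds`, and `timeout_seconds` are hypothetical names, and the only API call it relies on is the `client.beta.threads.runs.retrieve` call already used above.

```python
import time

# Statuses after which a Run will not change any more.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}


def wait_for_run(client, thread_id: str, run_id: str,
                 poll_seconds: float = 1.0, timeout_seconds: float = 300.0):
    """Poll a Run until it reaches a terminal status or needs caller action."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status in TERMINAL_STATUSES or run.status == "requires_action":
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Run {run_id} still '{run.status}' after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)  # wait before asking again
```

In this notebook it would be used as `run = wait_for_run(client, thread.id, run.id)`, followed by a check that `run.status == "completed"` before listing the messages.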


### Step 7: Retrieve the message returned by the assistant

In [34]:
from pprint import pprint
messages = client.beta.threads.messages.list(
    thread_id = thread.id,

)
for each in messages:
  pprint(each.role + ":" + each.content[0].text.value)

('assistant:The top 10 takeaways from the Artificial Intelligence Index Report '
 '2023 are as follows:\n'
 '\n'
 '1. **Industry races ahead of academia**: Since 2014, industry has overtaken '
 'academia in releasing significant machine learning models. By 2022, there '
 'were 32 significant industry-produced models compared to only three by '
 'academia, signaling that industry possesses more resources such as data, '
 'computing power, and funds for AI development【9†source】.\n'
 '\n'
 '2. **Performance saturation on traditional benchmarks**: Although AI '
 'continues to achieve state-of-the-art results, improvements on many '
 'benchmarks are increasingly marginal. This saturation is happening more '
 'rapidly, yet new, comprehensive benchmarks like BIG-bench and HELM have been '
 'introduced【9†source】.\n'
 '\n'
 '3. **AI and environmental impact**: Recent research indicates significant '
 'environmental impacts of AI, like the carbon emissions produced by the BLOOM '
 "model's train

### Add another message for the Assistant

In [35]:
message_2 = client.beta.threads.messages.create(
    thread_id=thread.id, 
    role="user",
    content="""What are the examples of each category of LLM hallucinations types? Give 
    an example of each types?"
    """,
)
message_2

ThreadMessage(id='msg_sSCIAKegqYmwKnGfMpr03cSI', assistant_id=None, content=[MessageContentText(text=Text(annotations=[], value='What are the examples of each category of LLM hallucinations types? Give \n    an example of each types?"\n    '), type='text')], created_at=1703029958, file_ids=[], metadata={}, object='thread.message', role='user', run_id=None, thread_id='thread_UW9eo5oAtk0nY8dCTBqpcXbP')

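### Run the assistant
A new message is not processed until a new Run is created on the thread, so re-execute the run-creation cell from Step 5 (`client.beta.threads.runs.create`) before retrieving the status below.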

In [38]:
run = client.beta.threads.runs.retrieve(
    thread_id = thread.id,
    run_id = run.id
)

print(run.status)

completed


In [39]:
from pprint import pprint
messages = client.beta.threads.messages.list(
    thread_id = thread.id,

)
for each in messages:
  pprint(each.role + ":" + each.content[0].text.value)

('assistant:The examples of each category of Large Language Model (LLM) '
 'hallucination types are as follows:\n'
 '\n'
 '- **Factuality Hallucination:**\n'
 '  - **Factual Inconsistency**: An LLM incorrectly states that "Yuri Gagarin '
 'was the first person to land on the Moon," when in fact it was Neil '
 'Armstrong【16†source】.\n'
 '  - **Factual Fabrication**: Despite the lack of verifiable evidence, an LLM '
 'fabricates a history for unicorns, claiming they "roamed the plains of '
 'Atlantis around 10000 BC" and were associated with royalty【16†source】.\n'
 '\n'
 '- **Faithfulness Hallucination:**\n'
 '  - **Instruction Inconsistency**: An LLM responds to the instruction to '
 'translate "What is the capital of France?" into Spanish by simply stating '
 '"The capital of France is Paris," without performing the '
 'translation【16†source】.\n'
 '  - **Context Inconsistency**: When provided information that the Nile '
 'originates from the Great Lakes region of central Africa, an LLM
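The listings above print the newest message first and leave citation markers such as `【16†source】` inline. The sketch below shows one way to print a thread oldest-first and turn those markers into numbered footnotes. It assumes each citation annotation carries a `text` field (the marker) and a `file_citation.file_id`, which is how I recall the beta API reference describing them; treat that shape, the helper names, and the stand-in data as assumptions.

```python
from types import SimpleNamespace


def resolve_citations(text_obj):
    """Replace citation markers in a message text with [n]; return (text, footnotes)."""
    value, footnotes = text_obj.value, []
    for annotation in text_obj.annotations:
        citation = getattr(annotation, "file_citation", None)
        if citation is None:
            continue  # other annotation kinds are left as they are
        number = len(footnotes) + 1
        value = value.replace(annotation.text, f" [{number}]", 1)
        footnotes.append(f"[{number}] file {citation.file_id}")
    return value, footnotes


def format_transcript(messages):
    """Return printable lines, oldest message first, with citations as footnotes."""
    lines = []
    for message in sorted(messages, key=lambda m: m.created_at):
        for part in message.content:
            if part.type != "text":
                continue  # skip non-text parts such as images
            text, footnotes = resolve_citations(part.text)
            lines.append(f"{message.role}: {text}")
            lines.extend(footnotes)
    return lines


# Stand-in objects shaped like the ThreadMessage output above (illustrative data only).
def fake_message(role, created_at, value, annotations=()):
    text = SimpleNamespace(value=value, annotations=list(annotations))
    return SimpleNamespace(role=role, created_at=created_at,
                           content=[SimpleNamespace(type="text", text=text)])


marker = "【9†source】"
cite = SimpleNamespace(text=marker, file_citation=SimpleNamespace(file_id="file-EXAMPLE"))
newest_first = [
    fake_message("assistant", 2, f"Industry leads academia{marker}.", [cite]),
    fake_message("user", 1, "What are the top takeaways?"),
]
transcript = format_transcript(newest_first)
print("\n".join(transcript))
```

Against the real thread, the call would be `format_transcript(client.beta.threads.messages.list(thread_id=thread.id))`. The list call also accepts `order="asc"` if you prefer the API to do the sorting.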

In [40]:
# Delete the assistant. This will end the runtime associated with it

response = client.beta.assistants.delete(assistant.id)
response

AssistantDeleted(id='asst_wUIbHW1FdTAR4cnC9wwMhW37', deleted=True, object='assistant.deleted')
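Deleting the assistant may leave the uploaded PDFs in your account's file storage, so if you no longer need them it is worth removing them as well. A minimal sketch, assuming the `file_objects` list from the upload step is still available; `delete_uploaded_files` is a hypothetical helper built on `client.files.delete`.

```python
def delete_uploaded_files(client, file_ids):
    """Delete each uploaded file and return {file_id: deleted_flag}."""
    results = {}
    for file_id in file_ids:
        response = client.files.delete(file_id)
        # The response is expected to carry a boolean `deleted` attribute.
        results[file_id] = bool(getattr(response, "deleted", False))
    return results
```

In this notebook the call would be `delete_uploaded_files(client, [f.id for f in file_objects])`.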