# OpenAI Assistant Agent <a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/agent/openai_assistant_agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook shows how to use the LlamaIndex agent abstractions built on top of the [OpenAI Assistant API](https://platform.openai.com/docs/assistants/overview).


In [None]:
!pip install llama-index
!pip install vecs
!pip install pypdf
!pip install python-dotenv

## Assistant Agent with our own Vector Store / Retrieval API

LlamaIndex has 35+ vector database integrations. Instead of relying on OpenAI's built-in Retrieval API, you can run our assistant agent over any of these vector stores.

Here is our full [list of vector store integrations](https://docs.llamaindex.ai/en/stable/module_guides/storing/vector_stores.html). We picked one vector store (Supabase) using a random number generator.

In [25]:
from llama_index.agent import OpenAIAssistantAgent
from llama_index import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
)
from llama_index.vector_stores import SupabaseVectorStore

from llama_index.tools import QueryEngineTool, ToolMetadata

In [13]:
# !mkdir -p 'data/10k/'
# !wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
# !wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'

In [17]:
# load data
import glob
pdf_files = glob.glob('./c-elegans-data/*.pdf')
reader = SimpleDirectoryReader(input_files=pdf_files)

docs = reader.load_data()
for doc in docs:
    doc.id_ = "c-elegans-docs"

In [22]:
import os
from dotenv import load_dotenv
load_dotenv()

vector_store = SupabaseVectorStore(
    postgres_connection_string=(
        f"postgresql://postgres:{os.getenv('VECTOR_DATABASE_PW')}@db.rgvrtfssleyejerbzqbv.supabase.co:5432/postgres"
    ),
    collection_name="research_papers",
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)
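
The connection string above interpolates the password straight into the URL, which can break if the password contains characters such as `@` or `/`. Below is a minimal sketch of one way to guard against that using only the standard library. The helper name `build_postgres_url` and the host `db.example.supabase.co` are hypothetical and not part of LlamaIndex or Supabase.

```python
import os
from urllib.parse import quote


def build_postgres_url(
    password: str,
    host: str,
    port: int = 5432,
    user: str = "postgres",
    database: str = "postgres",
) -> str:
    """Build a PostgreSQL URL, percent-encoding the password (hypothetical helper)."""
    if not password:
        # Fail early instead of sending an empty password to the database.
        raise ValueError("database password is empty; check your .env file")
    # safe="" makes quote() encode every reserved character, including "/" and "@".
    return f"postgresql://{user}:{quote(password, safe='')}@{host}:{port}/{database}"


# The fallback password and the host are placeholders for illustration only.
url = build_postgres_url(
    password=os.getenv("VECTOR_DATABASE_PW", "p@ss/word"),
    host="db.example.supabase.co",
)
print(url)
```

The resulting string could then be passed as `postgres_connection_string`, assuming the store accepts a standard PostgreSQL URL as it does in the cell above.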

In [26]:
# sanity check that the docs are in the vector store
# (get_by_id returns the stored node IDs for the given doc ID)
doc_ids = vector_store.get_by_id("c-elegans-docs", limit=1000)
print(f"Num of docs: {len(doc_ids)}")
print(f"First doc: {doc_ids[0]}")



Num of docs: 52
First doc: 5f25e27e-33a3-4166-94d3-cca43d296cf0


In [27]:
c_elegans_tool = QueryEngineTool(
    query_engine=index.as_query_engine(similarity_top_k=3),
    metadata=ToolMetadata(
        name="c-elegans-research",
        description=(
            "Given a query, find the most relevant interventions for increasing the max lifespan of C. Elegans."
        ),
    ),
)
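
To make `similarity_top_k=3` concrete: the query engine retrieves the three stored chunks whose embeddings score highest against the query embedding. The sketch below illustrates that idea with cosine similarity on toy two-dimensional vectors. It is a conceptual illustration only, not how LlamaIndex or Supabase implement retrieval, and all names and vectors in it are made up.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (chunk_id, score) pairs with the highest similarity to the query."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for real document chunks.
chunks = [
    ("chunk-a", [1.0, 0.0]),
    ("chunk-b", [0.0, 1.0]),
    ("chunk-c", [1.0, 1.0]),
    ("chunk-d", [-1.0, 0.0]),
]
results = top_k([1.0, 0.1], chunks, k=3)
print([chunk_id for chunk_id, _ in results])
```

A larger `k` gives the assistant more context per tool call at the cost of longer tool outputs.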

In [61]:

from typing import Tuple, List

from pydantic import BaseModel, Field  # used to define the schema for the output tool
from llama_index.tools import FunctionTool


# Output tool: formats one proposal as
# (1-3 combined interventions, explanation, predicted multiplier for the max lifespan of C. Elegans).
def output_tool(interventions: str, explanation: str, max_lifespan_increase: float) -> str:
    """
    Format a proposed combination of interventions, the rationale for it,
    and the predicted max-lifespan increase as a single string.
    """
    return (
        f"Interventions: {interventions}\n"
        f"Explanation: {explanation}\n"
        f"Max Lifespan Increase Prediction: {max_lifespan_increase}"
    )

description = """
Output a tuple of interventions, with an explanation of why they were chosen and a prediction of how much they increase the max lifespan of C. Elegans.
"""

class InterventionsOutput(BaseModel):
    interventions: str = Field(..., description="1-3 combined interventions from interventions.txt")
    explanation: str = Field(..., description="Explanation for the choice")
    max_lifespan_increase: float = Field(..., description="Multiplier prediction on how much it would increase the max lifespan of C.Elegans")


output_interventions_tool = FunctionTool.from_defaults(
    fn=output_tool,
    name="output_interventions_tool",
    description=description,
    fn_schema=InterventionsOutput,
)

In [59]:
instructions = """
You are helping longevity researchers choose promising life extending interventions for C. Elegans.
The proposed interventions should be a combination of 1-3 interventions that are listed in the interventions.txt file that you can read with the code interpreter.

You have access to a database of research papers on C. Elegans via the c-elegans-research tool.

Read all the longevity interventions research papers. 
Interpolate from the experiments, hypotheses and results of the paper to propose novel interventions to prolong the lifespan of C. Elegans.

Then, reference check the interventions you propose with the uploaded csv files by writing code to check if they have been proposed before.
Update your hypotheses based on the results of the reference check. Do additional literature review if necessary with the c_elegans_tool.

Based on the data, propose the most promising interventions to prolong the lifespan of C. Elegans.
Each suggestion should include a rationale for its potential efficacy and estimated probabilities of lifespan extension in C.Elegans.
The Assistant ensures that all recommendations are evidence-based and reflect the latest research insights.

You should use the output_interventions_tool to output your proposed interventions in a structured format.
"""

agent = OpenAIAssistantAgent.from_new(
    name="Longevity Scientist Assistant (llama index) - 3",
    instructions=instructions,
    tools=[c_elegans_tool, output_interventions_tool],
    verbose=True,
    run_retrieve_sleep_time=1.0,
    openai_tools=[{"type": "code_interpreter"}],
    files=["./c-elegans-data/interventions.txt", "./c-elegans-data/DrugAge-database.csv"],
)
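
The instructions ask the assistant to write code that checks whether a proposed intervention already appears in the uploaded CSV. A rough sketch of what such a reference check could look like is shown below. The column name `compound_name` and the in-memory sample rows are assumptions made for illustration; the actual layout of `DrugAge-database.csv` may differ.

```python
import csv
import io
from typing import Iterable, List, Set


def load_known_compounds(csv_text: str, column: str = "compound_name") -> Set[str]:
    """Collect normalized compound names from CSV text (the column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip().lower() for row in reader if row.get(column)}


def untested(candidates: Iterable[str], known: Set[str]) -> List[str]:
    """Return the candidates that do not appear in the known set (case-insensitive)."""
    return [c for c in candidates if c.strip().lower() not in known]


# Illustrative sample rows, not taken from the real database file.
sample_csv = (
    "compound_name,species\n"
    "Metformin,Caenorhabditis elegans\n"
    "Rapamycin,Caenorhabditis elegans\n"
)
known = load_known_compounds(sample_csv)
print(untested(["metformin", "Compound X", "Rapamycin "], known))
```

Normalizing case and whitespace before comparing avoids counting `metformin` and `Metformin` as different interventions.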

In [60]:
query = """
Propose interventions to try on C. Elegans to prolong its lifespan.
"""
response = agent.chat(query)

=== Calling Function ===
Calling function: c-elegans-research with args: {"input":"longevity interventions C. Elegans"}




Got output: Some interventions that have been shown to extend longevity in C. elegans include dietary restriction, restriction of energy intake, and certain drugs such as antioxidants, metabolites, and synthetic compounds. These interventions have been found to delay the aging process and increase lifespan in nematodes.
=== Calling Function ===
Calling function: c-elegans-research with args: {"input":"novel combination lifespan extension"}
Got output: The paper discusses a computational method for predicting lifespan-extending drugs in C. elegans. The authors propose a semi-supervised classification algorithm that uses a random walk with restart on a drug-protein bipartite network to predict drugs with lifespan-extending effects. The method aims to narrow down the scope of candidate drugs that need to be verified through wet-lab experiments. The authors also mention that they carried out wet-lab experiments to verify the effectiveness of one of the screened drugs, 2-Bromo-4’-nitroaceto

TypeError: can only concatenate str (not "float") to str
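
The `TypeError` above is what Python raises when a float is concatenated directly onto a string, which can happen if the output tool receives `max_lifespan_increase` as a float (as the `InterventionsOutput` schema declares) and joins it with `+`. Here is a minimal, standalone sketch of the failure and one way to avoid it. The function names are hypothetical and are not part of the notebook.

```python
def format_unsafe(label: str, value: float) -> str:
    """Concatenate with '+'; raises TypeError when value is not a str."""
    return label + value


def format_safe(label: str, value) -> str:
    """Coerce the value to float and let the f-string do the formatting."""
    return f"{label}{float(value):.2f}"


try:
    format_unsafe("Max Lifespan Increase Prediction: ", 1.5)
except TypeError as err:
    print(f"TypeError: {err}")

print(format_safe("Max Lifespan Increase Prediction: ", 1.5))
```

Using an f-string (or wrapping the value in `str()`) keeps the tool working whether the model passes the prediction as a number or as a string.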

In [45]:
print(str(response))

From the analysis, out of 2,120 interventions, 1,950 interventions have not been used before in previous experiments as per the CSV database. Here are five unused interventions for reference: `Abacavir`, `Aberela`, `Abiraterone`, `Abreua`, and `Abscisic`.

I have generated five novel combinations of these unused interventions, which could serve as potential longevity research avenues for C. elegans:

1. Combination of `Ivacaftor` and `Masitinib`
2. Combination of `Phenylbutyric`, `Fenretinide`, and `Trelagliptin`
3. Combination of `Tylosin`, `Doxifluridine`, and `Pranlukast`
4. Combination of `Bebebe`, `Ursodiol`, and `Methenamine`
5. Combination of `Methylcobalamin`

Before suggesting these combinations as interventions, it would be important to assess their potential effectiveness based on existing research. Such research may include the known effects of these compounds on biological pathways associated with aging, their toxicity, and their overall effect on organismal health. 

Do y