<a href="https://colab.research.google.com/github/doukansurel/Retrieval-Augmented-Generation/blob/main/Context_Augmented_OpenAIAgent_with_LlamaIndex.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Setup

In [None]:
!pip install llama-index

In [None]:
!pip install pypdf

In [2]:
import json
from typing import Sequence

from llama_index import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.tools import QueryEngineTool, ToolMetadata

In [4]:
import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")  # never hard-code keys

In [5]:
import warnings
import logging

warnings.filterwarnings("ignore")
logger = logging.getLogger("pypdf")
logger.setLevel(logging.ERROR)

In [6]:
from google.colab import drive

drive.mount("/content/drive")

Mounted at /content/drive


In [31]:
try:
    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/march"
    )
    march_index = load_index_from_storage(storage_context)

    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/june"
    )
    june_index = load_index_from_storage(storage_context)

    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/sept"
    )
    sept_index = load_index_from_storage(storage_context)

    index_loaded = True
except Exception:  # no persisted index yet (or it is unreadable); rebuild below
    index_loaded = False

#Download Data

In [None]:
!mkdir -p 'data/10q/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_march_2022.pdf' -O 'data/10q/uber_10q_march_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_june_2022.pdf' -O 'data/10q/uber_10q_june_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_sept_2022.pdf' -O 'data/10q/uber_10q_sept_2022.pdf'

In [33]:
if not index_loaded:
    # load data
    march_docs = SimpleDirectoryReader(
        input_files=["./data/10q/uber_10q_march_2022.pdf"]
    ).load_data()
    june_docs = SimpleDirectoryReader(
        input_files=["./data/10q/uber_10q_june_2022.pdf"]
    ).load_data()
    sept_docs = SimpleDirectoryReader(
        input_files=["./data/10q/uber_10q_sept_2022.pdf"]
    ).load_data()

    # build index
    march_index = VectorStoreIndex.from_documents(march_docs)
    june_index = VectorStoreIndex.from_documents(june_docs)
    sept_index = VectorStoreIndex.from_documents(sept_docs)

    # persist index
    march_index.storage_context.persist(persist_dir="./storage/march")
    june_index.storage_context.persist(persist_dir="./storage/june")
    sept_index.storage_context.persist(persist_dir="./storage/sept")
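
The load-or-build logic above uses one `index_loaded` flag for all three quarters, so a single missing directory triggers a rebuild of every index. A possible alternative is to decide per directory. The sketch below is only an illustration of that pattern: `load_or_build` and its `load`, `build`, and `persist` parameters are hypothetical names, and the callables would have to wrap the LlamaIndex calls already used in this notebook.

```python
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(
    persist_dir: str,
    load: Callable[[str], T],
    build: Callable[[], T],
    persist: Callable[[T, str], None],
) -> T:
    """Return the persisted object if its directory exists, else build and persist it.

    All names here are hypothetical. In this notebook, `load` would wrap
    StorageContext.from_defaults + load_index_from_storage, `build` would wrap
    VectorStoreIndex.from_documents, and `persist` would wrap
    index.storage_context.persist.
    """
    if Path(persist_dir).is_dir():
        return load(persist_dir)
    obj = build()
    persist(obj, persist_dir)
    return obj
```

Called once per quarter, this would rebuild only the indexes whose storage directory is missing.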

In [34]:
march_engine = march_index.as_query_engine(similarity_top_k=3)
june_engine = june_index.as_query_engine(similarity_top_k=3)
sept_engine = sept_index.as_query_engine(similarity_top_k=3)

In [35]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=march_engine,
        metadata=ToolMetadata(
            name="uber_march_10q",
            description=(
                "Provides information about Uber 10Q filings for March 2022. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=june_engine,
        metadata=ToolMetadata(
            name="uber_june_10q",
            description=(
                "Provides information about Uber financials for June 2022. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=sept_engine,
        metadata=ToolMetadata(
            name="uber_sept_10q",
            description=(
                "Provides information about Uber financials for Sept 2022. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]

#Context-Augmented Agent

In [36]:
from llama_index.schema import Document
from llama_index.agent import ContextRetrieverOpenAIAgent

In [37]:
# toy index - stores a list of abbreviations
texts = [
    "Abbreviation: X = Revenue",
    "Abbreviation: YZ = Risk Factors",
    "Abbreviation: Z = Costs",
]
docs = [Document(text=t) for t in texts]
context_index = VectorStoreIndex.from_documents(docs)

In [38]:
context_agent = ContextRetrieverOpenAIAgent.from_tools_and_retriever(
    query_engine_tools,
    context_index.as_retriever(similarity_top_k=1),
    verbose=True,
)
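
Judging from the verbose log printed by the next cell, each `chat` call first retrieves context from `context_index` and wraps the user message in a fixed template before the model picks a tool. The sketch below reconstructs that wrapping step from the log alone. `CONTEXT_TEMPLATE` and `build_augmented_message` are hypothetical names, and the template is an approximation taken from the printed output, not the library's source.

```python
from typing import List

# Template reconstructed from the verbose log output; treat it as an
# approximation of what the agent sends, not as the library's source.
CONTEXT_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "either pick the corresponding tool or answer the function: {query}\n"
)


def build_augmented_message(retrieved_texts: List[str], query: str) -> str:
    """Join the retrieved snippets and splice them into the template."""
    context = "\n\n".join(retrieved_texts)
    return CONTEXT_TEMPLATE.format(context=context, query=query)


message = build_augmented_message(
    ["Abbreviation: YZ = Risk Factors"],
    "What is the YZ of March 2022?",
)
print(message)
```

With the abbreviation spliced into the message, the model can translate "YZ" into "Risk Factors" before it calls `uber_march_10q`.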

In [39]:
response = context_agent.chat("What is the YZ of March 2022?")

Context information is below.
---------------------
Abbreviation: YZ = Risk Factors
---------------------
Given the context information and not prior knowledge, either pick the corresponding tool or answer the function: What is the YZ of March 2022?

STARTING TURN 1
---------------

=== Calling Function ===
Calling function: uber_march_10q with args: {
  "input": "Risk Factors"
}
Got output: The company faces various risk factors that could have an adverse effect on its business, financial condition, and results of operations. These risk factors include:

1. Economic, social, weather, and regulatory conditions, including the impact of COVID-19, which may negatively affect the company's operations.
2. Failure to offer or successfully implement autonomous vehicle technologies, which could result in a competitive disadvantage.
3. Difficulty in retaining and attracting high-quality personnel, which could impact the company's business.
4. Security or data privacy breaches, unau

In [40]:
print(str(response))

The risk factors (YZ) for Uber in March 2022 include:

1. Economic, social, weather, and regulatory conditions, including the impact of COVID-19.
2. Failure to offer or successfully implement autonomous vehicle technologies.
3. Difficulty in retaining and attracting high-quality personnel.
4. Security or data privacy breaches, unauthorized access, or improper use of proprietary or confidential data.
5. Cyberattacks, such as malware or phishing attacks.
6. Climate change risks.
7. Dependence on third parties for distribution and software.
8. The need for additional capital to support business growth.
9. Challenges in identifying, acquiring, and integrating suitable businesses.
10. Restrictions and modifications to the company's business model in certain jurisdictions.
11. Legal and regulatory risks.
12. Extensive government regulation and oversight related to payment and financial services.
13. Risks related to data collection, use, transfer, and protection.
14. Intellectual property pr

In [41]:
context_agent.chat("What is the X and Z in September 2022?")

Context information is below.
---------------------
Abbreviation: Z = Costs
---------------------
Given the context information and not prior knowledge, either pick the corresponding tool or answer the function: What is the X and Z in September 2022?

STARTING TURN 1
---------------

=== Calling Function ===
Calling function: uber_sept_10q with args: {
  "input": "Costs"
}
Got output: The context information provided discusses various costs incurred by Uber. These costs include cost of revenue, exclusive of depreciation and amortization, operations and support expenses, and sales and marketing expenses. The cost of revenue consists of insurance costs, credit card processing fees, data center and networking expenses, and costs incurred with carriers for transportation services. Operations and support expenses primarily consist of compensation expenses for employees supporting operations and customer support. Sales and marketing expenses include compensation costs, advertisi

AgentChatResponse(response="In September 2022, Uber incurs various costs, including:\n\n1. Cost of revenue, exclusive of depreciation and amortization, which includes insurance costs, credit card processing fees, data center and networking expenses, and costs incurred with carriers for transportation services.\n2. Operations and support expenses, primarily consisting of compensation expenses for employees supporting operations and customer support.\n3. Sales and marketing expenses, including compensation costs, advertising costs, and promotional expenditures.\n4. Research and development expenses related to engineering, design, and product development.\n\nThese costs are incurred as part of Uber's operations and are important factors to consider when evaluating the company's financial performance.", sources=[ToolOutput(content='The context information provided discusses various costs incurred by Uber. These costs include cost of revenue, exclusive of depreciation and amortization, oper
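
Note that the retrieved context for this question contains only `Z = Costs`. The retriever was created with `similarity_top_k=1`, so the abbreviation for X never reaches the agent and the answer covers costs only. The sketch below illustrates the cutoff with a toy word-overlap score standing in for embedding similarity. `retrieve`, `tokens`, and the scoring are hypothetical and are not LlamaIndex's retriever, so which single entry wins at `top_k=1` may differ from the real run.

```python
import re
from typing import List, Set

ABBREVIATIONS = [
    "Abbreviation: X = Revenue",
    "Abbreviation: YZ = Risk Factors",
    "Abbreviation: Z = Costs",
]
QUERY = "What is the X and Z in September 2022?"


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric words of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, texts: List[str], top_k: int) -> List[str]:
    """Rank texts by words shared with the query (a stand-in for vector similarity)."""
    query_tokens = tokens(query)
    ranked = sorted(texts, key=lambda t: len(tokens(t) & query_tokens), reverse=True)
    return ranked[:top_k]


print(retrieve(QUERY, ABBREVIATIONS, top_k=1))  # one abbreviation only
print(retrieve(QUERY, ABBREVIATIONS, top_k=2))  # both X and Z entries
```

Raising `similarity_top_k` on `context_index.as_retriever` would presumably let both abbreviations through in the real agent as well.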

#Use Uber 10-Q as Context, Use Calculator as Tool

In [43]:
from llama_index.tools import BaseTool, FunctionTool


def magic_formula(revenue: int, cost: int) -> int:
    """Runs MAGIC_FORMULA on revenue and cost."""
    return revenue - cost


magic_tool = FunctionTool.from_defaults(fn=magic_formula, name="magic_formula")
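
The agent can only choose and call `magic_formula` based on what the function exposes about itself: its name, its typed parameters, and its docstring. As far as I understand, `FunctionTool.from_defaults` derives the tool description and argument schema from exactly this information, though the precise format is an assumption here. The sketch below uses only the standard library to show what is available to such a wrapper.

```python
import inspect


def magic_formula(revenue: int, cost: int) -> int:
    """Runs MAGIC_FORMULA on revenue and cost."""
    return revenue - cost


signature = inspect.signature(magic_formula)

print(magic_formula.__name__)         # tool name
print(signature)                      # parameter names and type hints
print(inspect.getdoc(magic_formula))  # human-readable description

# Each parameter's annotation is what a schema generator could map to a JSON type.
for name, param in signature.parameters.items():
    print(name, param.annotation.__name__)
```

Clear parameter names and an informative docstring therefore matter: they are the only hints the model gets about how to fill in `revenue` and `cost`.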

In [44]:
context_agent = ContextRetrieverOpenAIAgent.from_tools_and_retriever(
    [magic_tool], sept_index.as_retriever(similarity_top_k=3), verbose=True
)

In [45]:
response = context_agent.chat(
    "Can you run MAGIC_FORMULA on Uber's revenue and cost?"
)

Context information is below.
---------------------
UBER TECHNOLOGIES, INC.
CONDENSED CONSOLIDATED STATEMENTS OF OPERATIONS
(In millions, except share amounts which are reflected in thousands, and per share amounts)
(Unaudited)
Three Months Ended September  30, Nine Months Ended September  30,
2021 2022 2021 2022
Revenue $ 4,845 $ 8,343 $ 11,677 $ 23,270 
Costs and expenses
Cost of revenue, exclusive of depreciation and amortization shown separately
below 2,438 5,173 6,247 14,352 
Operations and support 475 617 1,330 1,808 
Sales and marketing 1,168 1,153 3,527 3,634 
Research and development 493 760 1,496 2,051 
General and administrative 625 908 1,705 2,391 
Depreciation and amortization 218 227 656 724 
Total costs and expenses 5,417 8,838 14,961 24,960 
Loss from operations (572) (495) (3,284) (1,690)
Interest expense (123) (146) (353) (414)
Other income (expense), net (1,832) (535) 1,821 (7,796)
Loss before income taxes and income (loss) from equity method investments (2,

In [46]:
print(response)

The result of running the MAGIC_FORMULA on Uber's revenue and cost is -1690.
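
The value -1690 can be checked against the retrieved statement. It matches the nine-month 2022 column (revenue 23,270 and total costs and expenses 24,960, in millions), which suggests the agent chose that column because the question did not name a period. A quick check, with the figures copied from the context shown above:

```python
def magic_formula(revenue: int, cost: int) -> int:
    """Runs MAGIC_FORMULA on revenue and cost."""
    return revenue - cost


# Figures in millions of USD, copied from the retrieved statement of operations.
nine_months_2022 = magic_formula(revenue=23_270, cost=24_960)
three_months_2022 = magic_formula(revenue=8_343, cost=8_838)

print(nine_months_2022)   # matches the agent's answer
print(three_months_2022)  # what the quarter-only figures would give
```

Both results equal the "Loss from operations" row of the statement, (1,690) and (495), so naming the period in the question would change which answer the agent returns.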


The purpose of the Context-Augmented OpenAI Agent is to make the query engine more knowledgeable by giving it an externally defined function, or to extend it by introducing a new capability.