<a href="https://colab.research.google.com/github/devanshu1204/Working-Agentic_RAG/blob/main/Working_Agentic_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install python-dotenv==1.0.0
!pip install llama-index==0.10.27
!pip install llama-index-readers-file

Collecting python-dotenv==1.0.0
  Downloading python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.0
Collecting llama-index==0.10.27
  Downloading llama_index-0.10.27-py3-none-any.whl (6.9 kB)
Collecting llama-index-agent-openai<0.3.0,>=0.1.4 (from llama-index==0.10.27)
  Downloading llama_index_agent_openai-0.2.8-py3-none-any.whl (13 kB)
Collecting llama-index-cli<0.2.0,>=0.1.2 (from llama-index==0.10.27)
  Downloading llama_index_cli-0.1.12-py3-none-any.whl (26 kB)
Collecting llama-index-core<0.11.0,>=0.10.27 (from llama-index==0.10.27)
  Downloading llama_index_core-0.10.55-py3-none-any.whl (15.5 MB)
Collecting llama-index-embeddings-openai<0.2.0,>=0.1.5 (from llama-index==0.10.27)
  Downloading llama_index_embeddings_openai-0.1.10-py3-none-any.whl (6.2 kB)
Collecting llama-index-i

In [2]:
import nest_asyncio
nest_asyncio.apply()

In [3]:
import os
from google.colab import userdata
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

In [4]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, SummaryIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core.vector_stores import MetadataFilters, FilterCondition
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from typing import List, Optional
import numpy as np

### Chunking

In [6]:
# load documents
documents = SimpleDirectoryReader(input_files=[
    "CrowdStrike_FY2023_Q1.pdf",
    "CrowdStrike_FY2023_Q2.pdf",
    "CrowdStrike_FY2023_Q3.pdf"
]).load_data()

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

# Check if nodes are created successfully
if nodes:
    print("Nodes created successfully.")
    print(f"Total nodes created: {len(nodes)}")

    # Print some nodes for debugging
    print("Printing first 5 nodes:")
    for i, node in enumerate(nodes[:5]):
        print(f"Node {i+1}: {node}")
else:
    print("Failed to create nodes.")

Nodes created successfully.
Total nodes created: 48
Printing first 5 nodes:
Node 1: Node ID: e79a2566-6196-40f7-9d03-cf453605ff9e
Text: CRWD earnings call for the period ending April 30, 2022
CrowdStrike Holdings, Inc. (CRWD -1.01%) Q1 2023 Earnings Call Jun 02,
2022, 5:00 p.m. ET Contents: Prepared Remarks Questions and Answers
Call Participants Prepared Remarks: Operator Hello. Thank you for
standing by, and welcome to the CrowdStrike fiscal first  quarter 2023
results confere...
Node 2: Node ID: 53ac8cb8-4a84-4eb6-b882-9cc4f1099b42
Text: Thank you, Maria, and thank you, all, for joining us. I will
start today's call  by summarizing three key points. First, fiscal
2023 is off to a fantastic start. We believe our Q1 results exemplify
that we have a winning formula that includes scale, growth,
profitability, and free cash flow. Second, we saw strength across the
platform, including...
Node 3: Node ID: f60fee01-e9c2-40e5-8cea-1568a2a08393
Text: adoption metrics quarter after quarter. In
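`SentenceSplitter(chunk_size=1024)` packs whole sentences into chunks of at most 1024 tokens. The sketch below is a simplified stand-in, not LlamaIndex's implementation: it splits on sentence-ending punctuation with a regular expression, counts whitespace-separated words instead of tokens, and omits the chunk overlap that `SentenceSplitter` also applies. `split_sentences` and `pack_sentences` are hypothetical helpers.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split after '.', '!' or '?' followed by whitespace (a rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def pack_sentences(text: str, max_words: int) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        # Close the current chunk if this sentence would exceed the budget.
        if current and current_len + n > max_words:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five. Six seven eight nine. Ten."
    for i, chunk in enumerate(pack_sentences(sample, max_words=5), start=1):
        print(f"Chunk {i}: {chunk}")
```

Keeping sentences whole is the design point: a chunk boundary never cuts a sentence in half, at the cost of chunks that are sometimes shorter than the budget.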

### Creating Vector Embeddings

In [7]:
print("starting vector indexing by creating embeddings using ada 002")

vector_index = VectorStoreIndex(nodes)

print("finished vector indexing")

starting vector indexing by creating embeddings using ada 002
finished vector indexing
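`VectorStoreIndex(nodes)` embeds every chunk (the log above names the ada-002 embedding model), and a query engine built with `similarity_top_k=2` returns the two chunks whose embeddings are closest to the query embedding. The toy sketch below shows that ranking with cosine similarity over hand-made 3-dimensional vectors standing in for real embeddings. The `min_score` cutoff is an added assumption, similar in role to LlamaIndex's `SimilarityPostprocessor(similarity_cutoff=...)`; a cutoff like this is one way an off-topic question, such as the coffee-recipe query further down, could retrieve nothing instead of the nearest unrelated chunks.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: Dict[str, Sequence[float]],
    k: int = 2,
    min_score: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to k (chunk_id, score) pairs, best first, dropping scores below min_score."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored = [pair for pair in scored if pair[1] >= min_score]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical 3-d "embeddings"; real ada-002 vectors have 1536 dimensions.
    chunks = {
        "revenue": [0.9, 0.1, 0.0],
        "arr": [0.8, 0.3, 0.1],
        "analysts": [0.0, 0.2, 0.9],
    }
    print(top_k([1.0, 0.2, 0.0], chunks, k=2))
    print(top_k([0.0, 0.0, 1.0], chunks, k=2, min_score=0.5))
```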


### Storing Vector Embeddings

In [8]:
vector_index.storage_context.persist(persist_dir="./storage")

### Access Vector Embeddings From Storage

In [9]:
from llama_index.core import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="./storage")
vector_index = load_index_from_storage(storage_context=storage_context)
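`persist()` writes the index's stores to a directory (`./storage` by default), and `load_index_from_storage` rebuilds the index from that directory, so a new session does not have to call the embedding API again. The sketch below illustrates only the save-then-reload round trip, using the standard library and a hypothetical `vectors.json` file of chunk texts and embeddings; it is not LlamaIndex's on-disk format.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def save_vectors(records: Dict[str, dict], persist_dir: Path) -> Path:
    """Write {chunk_id: {"text": ..., "embedding": [...]}} to persist_dir/vectors.json."""
    persist_dir.mkdir(parents=True, exist_ok=True)
    path = persist_dir / "vectors.json"
    path.write_text(json.dumps(records), encoding="utf-8")
    return path


def load_vectors(persist_dir: Path) -> Dict[str, dict]:
    """Read back the records written by save_vectors."""
    return json.loads((persist_dir / "vectors.json").read_text(encoding="utf-8"))


if __name__ == "__main__":
    records = {"node-1": {"text": "Q1 prepared remarks", "embedding": [0.1, 0.2, 0.3]}}
    with tempfile.TemporaryDirectory() as tmp:
        save_vectors(records, Path(tmp) / "storage")
        restored = load_vectors(Path(tmp) / "storage")
    print(restored == records)  # True
```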

### Function for Creating Tools

In [10]:
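from typing import List, Optional, Tuple


def get_doc_tools(
    file_path: str,
    name: str,
) -> Tuple[FunctionTool, QueryEngineTool]:
    """Get vector query and summary query tools from a document."""
    # NOTE: file_path is not used below. Both tools query the shared
    # vector_index and the global `nodes`, which cover all three transcripts,
    # so a tool named after one quarter can return text from another quarter.

    def vector_query(
        query: str,
        page_numbers: Optional[List[str]] = None
    ) -> str:
        """Use to answer questions over a given paper.

        Useful if you have specific questions over the paper.
        Always leave page_numbers as None UNLESS there is a specific page you want to search for.

        Args:
            query (str): the string query to be embedded.
            page_numbers (Optional[List[str]]): Filter by set of pages. Leave as None
                to perform a vector search over all pages. Otherwise, filter by the
                set of specified pages.

        """
        # One page_label filter per requested page, combined with OR:
        # a chunk qualifies if it comes from any of the listed pages.
        page_numbers = page_numbers or []
        metadata_dicts = [
            {"key": "page_label", "value": p} for p in page_numbers
        ]

        query_engine = vector_index.as_query_engine(
            similarity_top_k=2,  # retrieve the 2 most similar chunks
            filters=MetadataFilters.from_dicts(
                metadata_dicts,
                condition=FilterCondition.OR
            )
        )
        response = query_engine.query(query)
        # Return plain text to match the declared return type.
        return str(response)

    # Expose vector_query to the agent; without an explicit description,
    # the function's signature and docstring describe the tool to the LLM.
    vector_query_tool = FunctionTool.from_defaults(
        name=f"vector_tool_{name}",
        fn=vector_query
    )

    # SummaryIndex keeps every node; tree_summarize condenses them in stages.
    summary_index = SummaryIndex(nodes)

    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize",
        use_async=True,  # relies on nest_asyncio.apply() in the notebook's event loop
    )
    summary_tool = QueryEngineTool.from_defaults(
        name=f"summary_tool_{name}",
        query_engine=summary_query_engine,
        description=(
            f"Useful for summarization questions related to {name}"
        ),
    )

    return vector_query_tool, summary_tool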
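`vector_query` turns `page_numbers` into one `page_label` filter per page and combines them with `FilterCondition.OR`, so a chunk qualifies if it came from any listed page. The pure-Python sketch below mirrors that logic on plain dictionaries and treats an empty page list as "no page restriction". It also adds one assumption that is not in the notebook: a `file_name` check ANDed with the page condition, which is one way each quarter's tool could be kept to its own transcript, assuming the nodes carry the `file_name` metadata that `SimpleDirectoryReader` typically attaches. `filter_chunks` and the sample chunks are hypothetical.

```python
from typing import Dict, List, Optional


def filter_chunks(
    chunks: List[Dict[str, str]],
    page_numbers: Optional[List[str]] = None,
    file_name: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Keep chunks whose page_label is in page_numbers (OR) and, if given, whose file_name matches (AND)."""
    pages = set(page_numbers or [])
    kept = []
    for chunk in chunks:
        # OR across pages: any listed page matches; no pages means no page restriction.
        page_ok = not pages or chunk["page_label"] in pages
        # Hypothetical per-file scoping, ANDed with the page condition.
        file_ok = file_name is None or chunk["file_name"] == file_name
        if page_ok and file_ok:
            kept.append(chunk)
    return kept


if __name__ == "__main__":
    chunks = [
        {"id": "a", "file_name": "CrowdStrike_FY2023_Q1.pdf", "page_label": "1"},
        {"id": "b", "file_name": "CrowdStrike_FY2023_Q1.pdf", "page_label": "2"},
        {"id": "c", "file_name": "CrowdStrike_FY2023_Q3.pdf", "page_label": "1"},
    ]
    print([c["id"] for c in filter_chunks(chunks, ["1", "2"])])
    print([c["id"] for c in filter_chunks(chunks, ["1"], file_name="CrowdStrike_FY2023_Q1.pdf")])
```

Page labels repeat across the three PDFs (every transcript has a page "1"), which is why a page filter alone cannot tell the quarters apart.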

### Calling the Function for Creating Tools

In [11]:
papers = [
    "CrowdStrike_FY2023_Q1.pdf",
    "CrowdStrike_FY2023_Q2.pdf",
    "CrowdStrike_FY2023_Q3.pdf"
]

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, paper.replace(".pdf", ""))
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]


Getting tools for paper: CrowdStrike_FY2023_Q1.pdf
Getting tools for paper: CrowdStrike_FY2023_Q2.pdf
Getting tools for paper: CrowdStrike_FY2023_Q3.pdf
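Each `summary_tool` created above wraps a `SummaryIndex` query engine with `response_mode="tree_summarize"`. Roughly, the chunks are summarized in groups and those partial summaries are summarized again until a single answer remains. The sketch below shows only that recursive shape: `summarize` is a hypothetical stand-in for an LLM call (it just keeps the first few words), and `group_size` is an assumed fixed fan-in, whereas LlamaIndex groups text by what fits in the prompt.

```python
from typing import Callable, List


def summarize(pieces: List[str], max_words: int = 6) -> str:
    """Hypothetical stand-in for an LLM call: join the pieces and keep the first max_words words."""
    words = " ".join(pieces).split()
    return " ".join(words[:max_words])


def tree_summarize(
    chunks: List[str],
    group_size: int = 2,
    summarize_fn: Callable[[List[str]], str] = summarize,
) -> str:
    """Summarize chunks in groups, then summarize those summaries, until one remains."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each level gets smaller")
    if not chunks:
        return ""
    level = list(chunks)
    while True:
        # One summarizer call per group of up to group_size items.
        level = [
            summarize_fn(level[i : i + group_size])
            for i in range(0, len(level), group_size)
        ]
        if len(level) == 1:
            return level[0]


if __name__ == "__main__":
    show = lambda parts: "(" + " + ".join(parts) + ")"
    print(tree_summarize(["q1", "q2", "q3", "q4"], group_size=2, summarize_fn=show))
    # ((q1 + q2) + (q3 + q4))
```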


In [12]:
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [8]:
print(initial_tools)

[<llama_index.core.tools.function_tool.FunctionTool object at 0x79bd44cd1150>, <llama_index.core.tools.query_engine.QueryEngineTool object at 0x79bd44cd24d0>, <llama_index.core.tools.function_tool.FunctionTool object at 0x79bd44c11360>, <llama_index.core.tools.query_engine.QueryEngineTool object at 0x79bd44c12b00>, <llama_index.core.tools.function_tool.FunctionTool object at 0x79bd434e1090>, <llama_index.core.tools.query_engine.QueryEngineTool object at 0x79bd434e2ec0>]


In [13]:
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-3.5-turbo")

In [14]:
len(initial_tools)

6

In [15]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools,
    llm=llm,
    verbose=True
)
agent = AgentRunner(agent_worker)
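`FunctionCallingAgentWorker` uses the model's function-calling support to pick a tool by name, run it with JSON arguments, read the output, and then either call another tool or answer; `AgentRunner` drives that loop and keeps the chat memory seen in the `Added user message to memory` lines below. The sketch below mimics only the dispatch loop. `scripted_llm` is a hypothetical stub that follows a fixed rule instead of calling OpenAI, and the tool is a stub that reuses the notebook's `vector_tool_...` naming scheme.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple

# A step is ("call", tool_name, json_args) or ("answer", text, None).
Step = Tuple[str, str, Optional[str]]


def run_agent(
    question: str,
    tools: Dict[str, Callable[..., str]],
    llm: Callable[[List[str]], Step],
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    """Loop: ask the (stub) LLM for a step, run the named tool, record its output."""
    history = [f"user: {question}"]
    for _ in range(max_steps):
        kind, payload, args = llm(history)
        if kind == "answer":
            return payload, history
        kwargs = json.loads(args or "{}")
        if payload in tools:
            output = tools[payload](**kwargs)
        else:
            output = f"unknown tool {payload}"
        history.append(f"tool {payload}({args}) -> {output}")
    return "No answer within the step budget.", history


def scripted_llm(history: List[str]) -> Step:
    """Hypothetical LLM stub: call one tool, then answer with that tool's output."""
    tool_lines = [line for line in history if line.startswith("tool ")]
    if not tool_lines:
        return ("call", "vector_tool_CrowdStrike_FY2023_Q1", json.dumps({"query": "CFO"}))
    return ("answer", tool_lines[-1].split(" -> ", 1)[1], None)


if __name__ == "__main__":
    stub_tools = {
        "vector_tool_CrowdStrike_FY2023_Q1": lambda query: f"stub answer for {query!r}",
    }
    answer, history = run_agent("Who is the CFO of the company", stub_tools, scripted_llm)
    print(history)
    print(answer)  # stub answer for 'CFO'
```

The `max_steps` budget is a design choice of this sketch: it stops a model that keeps calling tools without ever answering.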

### QnA

In [16]:
response = agent.query(
    "Tell me about the financial status of the company"
)

Added user message to memory: Tell me about the financial status of the company
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "financial status"}
=== Function Output ===
CrowdStrike reported strong financial results for the third quarter of fiscal year 2023. The company achieved significant growth in ending ARR, net new ARR, total revenue, and subscription revenue. Additionally, CrowdStrike demonstrated strong performance in terms of gross margins, operating expenses, operating income, and net income. The company also highlighted its focus on maintaining a balance between growth and profitability, as well as its commitment to generating free cash flow.
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q2 with args: {"query": "financial status"}
=== Function Output ===
The company reported strong financial results in the third quarter of fiscal year 2023. Total revenue grew by 53% year over year to reach $580.9 

In [18]:
response = agent.query("recipe to make a coffee")

Added user message to memory: recipe to make a coffee
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "recipe to make a coffee"}
=== Function Output ===
Efficiency in sales and customer acquisition is key to success in the market. By leveraging an e-commerce platform with a high magic number, companies can streamline the process of gaining new customers and cross-selling to existing ones. Prioritizing customer needs and offering a seamless buying experience can set a company apart from its competition. Additionally, focusing on providing value and addressing critical issues like ransomware can lead to upselling opportunities and increased deal sizes. Leveraging technology and a strong sales team can drive efficiency and growth in the organization.
=== LLM Response ===
I couldn't find a specific recipe for making coffee in the document. However, the document emphasizes the importance of efficiency in sales and customer acquisition for su

In [50]:
response = agent.query("Who is the CFO of the company")

Added user message to memory: Who is the CFO of the company
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "CFO"}
=== Function Output ===
Burt Podbere
=== LLM Response ===
The CFO of the company is Burt Podbere.


In [49]:
response = agent.query("How was the performance of the company")

Added user message to memory: How was the performance of the company
=== Calling Function ===
Calling function: summary_tool_CrowdStrike_FY2023_Q1 with args: {"input": "performance of the company"}
=== Function Output ===
The company had a strong performance in the fiscal first quarter of 2023, with notable achievements including a record net new ARR of $190 million, ending ARR growth of 61% year over year exceeding $1.9 billion, record non-GAAP operating profit of $83 million, and a free cash flow margin of 32%. The company demonstrated growth, profitability, and cash flow well above industry benchmarks, reflecting a powerful combination of factors contributing to sustained success. Additionally, the company saw strength across its platform, with a significant number of customers adopting multiple modules, showcasing the wide competitive moat and long-term growth potential of the company.
=== LLM Response ===
The company had a strong performance in the fiscal first quarter of 2023, wi

In [12]:
response = agent.query("How does the company performed in quarter 3 compared to quarter 1")

Added user message to memory: How does the company performed in quarter 3 compared to quarter 1
=== Calling Function ===
Calling function: summary_tool_CrowdStrike_FY2023_Q1 with args: {"input": "performance in quarter 1"}
=== Function Output ===
The performance in quarter 1 was marked by strong achievements, including exceeding expectations in net new ARR, achieving a significant year-over-year growth in Ending ARR, recording a record non-GAAP operating profit, and maintaining a healthy free cash flow margin. Despite facing macroeconomic headwinds that led to delays in purchasing decisions from smaller non-enterprise accounts, the company demonstrated resilience by continuing to prioritize investments from larger enterprise customers. The company's ability to maintain strong competitive win rates and consistent average selling prices showcases its confidence in its long-term market position and the cybersecurity market's resiliency.
=== Calling Function ===
Calling function: summary_t

In [14]:
response = agent.query("How does the company performed in quarter 3 compared to quarter 1")
print(str(response))

Added user message to memory: How does the company performed in quarter 3 compared to quarter 1
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "financial performance"}
=== Function Output ===
The company reported strong financial performance in the third quarter of fiscal year 2023. Total revenue grew by 53% over the same quarter last year, reaching $580.9 million. Subscription revenue also saw significant growth, increasing by 53% to reach $547.4 million. Non-GAAP operating income grew by 77% year over year to reach a record $89.7 million, with an operating margin improvement of 2 percentage points to reach 15%. Additionally, non-GAAP net income attributable to the company more than doubled over the prior year, growing to a record $96.1 million. Cash flow from operations and free cash flow also experienced substantial growth, increasing by 53% and 41% year over year, respectively.
=== Calling Function ===
Calling function: vector_tool

In [15]:
response = agent.query("Who are the analyst who joined the call")

Added user message to memory: Who are the analyst who joined the call
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "analyst joined the call"}
=== Function Output ===
Joe Gallo from Jefferies, Saket Kalia from Barclays, and several other analysts joined the call.
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q2 with args: {"query": "analyst joined the call"}
=== Function Output ===
Joe Gallo from Jefferies, Saket Kalia from Barclays, and several other analysts joined the call.
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q3 with args: {"query": "analyst joined the call"}
=== Function Output ===
Joe Gallo from Jefferies, Andrew Nowinski from Well Fargo Securities, Brian Essex from Goldman Sachs, Joel Fishbein from Truist Securities, Matt Hedberg from RBC Capital Markets, Rob Owens from Piper Sandler, Alex Henderson from Needham and Company, Roger Boyd from UBS, Daniel Park from C

In [16]:
response = agent.query("Who are the analyst who joined the call adn list out the analysts who joined the call across the 3 quarters")

Added user message to memory: Who are the analyst who joined the call adn list out the analysts who joined the call across the 3 quarters
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "list of analysts who joined the call"}
=== Function Output ===
Andrew Nowinski, Brian Essex, Joel Fishbein, Matt Hedberg, Rob Owens, Alex Henderson, Roger Boyd, Daniel Park, Joseph Gallo, Shaul Eyal, Fatima Boolani, Brad Reback, Janet Zhang, Sterling Auty, Hamza Fodderwala, Tal Liani, John DiFucci, Brad Zelnick, Joe Gallo
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q2 with args: {"query": "list of analysts who joined the call"}
=== Function Output ===
Andrew Nowinski, Brian Essex, Joel Fishbein, Matt Hedberg, Rob Owens, Alex Henderson, Roger Boyd, Daniel Park, Joseph Gallo, Shaul Eyal, Fatima Boolani, Brad Reback, Janet Zhang, Sterling Auty, Hamza Fodderwala, Tal Liani, John DiFucci, Brad Zelnick, Joe Gallo
=== Calling Func

In [17]:
response = agent.query("How are the focus areas for the company changed from quarter 1 to quarter 3")

Added user message to memory: How are the focus areas for the company changed from quarter 1 to quarter 3
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "focus areas"}
=== Function Output ===
Complete, Spotlight, Identity Threat Detection
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q3 with args: {"query": "focus areas"}
=== Function Output ===
Complete, Spotlight, Identity Threat Detection
=== LLM Response ===
The focus areas for the company remained the same from quarter 1 to quarter 3. The focus areas are "Complete, Spotlight, Identity Threat Detection."


In [18]:
response = agent.query("What are the steps that imporved the performance of the company in quarter 2 if we compare it with quarter 1. What were the the result of those steps")

Added user message to memory: What are the steps that imporved the performance of the company in quarter 2 if we compare it with quarter 1. What were the the result of those steps
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "steps that improved the performance of the company"}
=== Function Output ===
Investing in the e-commerce platform, increasing the number of trials available, prioritizing customer needs, leveraging the trial program to drive sales efficiency, upselling modules through the e-commerce platform, and focusing on go-to-market execution are steps that have improved the performance of the company.
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q2 with args: {"query": "steps that improved the performance of the company"}
=== Function Output ===
Investing in the e-commerce platform, increasing the number of trials available, prioritizing customer needs, leveraging the trial program to drive sal

In [19]:
response = agent.query("Who are the analyst who joined the call adn list out the analysts who joined the call across the 3 quarters. list in tabular format")

Added user message to memory: Who are the analyst who joined the call adn list out the analysts who joined the call across the 3 quarters. list in tabular format
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "analysts who joined the call"}
=== Function Output ===
Brian Essex, Daniel Park, Shaul Eyal, Joseph Gallo, Brad Reback, Janet Zhang, Sterling Auty, Hamza Fodderwala, Tal Liani, John DiFucci, Brad Zelnick
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q2 with args: {"query": "analysts who joined the call"}
=== Function Output ===
Andrew Nowinski, Brian Essex, Joel Fishbein, Matt Hedberg, Rob Owens, Alex Henderson, Roger Boyd, Daniel Park, Joseph Gallo, Shaul Eyal, Fatima Boolani, Brad Reback, Janet Zhang, Sterling Auty, Hamza Fodderwala, Tal Liani, John DiFucci, Brad Zelnick, Joe Gallo
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q3 with args: {"query": "analysts who joined t
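The analyst queries above ask the LLM to merge three per-quarter lists into one table, and the tool outputs repeat names across quarters and list both "Joe Gallo" and "Joseph Gallo". Once the per-quarter lists have been extracted, the merge itself can be done deterministically. The sketch below builds a name-by-quarter presence table in plain Python; the `ALIASES` map and the sample names are hypothetical, not data taken from the transcripts.

```python
from typing import Dict, List, Set

# Hypothetical alias map for collapsing name variants before merging.
ALIASES: Dict[str, str] = {"Joseph Smith": "Joe Smith"}


def presence_table(by_quarter: Dict[str, List[str]]) -> str:
    """Render a Markdown table marking which quarters each (alias-normalized) name appears in."""
    quarters = list(by_quarter)
    seen: Dict[str, Set[str]] = {}
    for quarter, names in by_quarter.items():
        for name in names:
            canonical = ALIASES.get(name.strip(), name.strip())
            seen.setdefault(canonical, set()).add(quarter)
    lines = [
        "| Analyst | " + " | ".join(quarters) + " |",
        "|---|" + "---|" * len(quarters),
    ]
    for name in sorted(seen):
        marks = ["x" if q in seen[name] else "" for q in quarters]
        lines.append(f"| {name} | " + " | ".join(marks) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(presence_table({
        "Q1": ["Ann Lee", "Joseph Smith"],
        "Q2": ["Joe Smith"],
        "Q3": ["Ann Lee", "Bo Chan"],
    }))
```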