<a href="https://colab.research.google.com/github/devanshu1204/Working-Agentic_RAG/blob/main/Working_Agentic_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install python-dotenv==1.0.0
!pip install llama-index==0.10.27
!pip install llama-index-readers-file

In [4]:
import nest_asyncio
nest_asyncio.apply()

In [5]:
import os
from google.colab import userdata
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

In [6]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, SummaryIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core.vector_stores import MetadataFilters, FilterCondition
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from typing import List, Optional
import numpy as np

In [7]:
from IPython.display import Markdown, display

### Chunking

In [8]:
# load documents
documents = SimpleDirectoryReader(input_files=[
    "CrowdStrike_FY2023_Q1.pdf",
    "CrowdStrike_FY2023_Q2.pdf",
    "CrowdStrike_FY2023_Q3.pdf"
]).load_data()

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

# Check that nodes were created successfully
if nodes:
    print("Nodes created successfully.")
    print(f"Total nodes created: {len(nodes)}")

    # Print some nodes for debugging
    print("Printing first 5 nodes:")
    for i, node in enumerate(nodes[:5]):
        print(f"Node {i+1}: {node}")
else:
    print("Failed to create nodes.")

Nodes created successfully.
Total nodes created: 48
Printing first 5 nodes:
Node 1: Node ID: e15553b1-3128-4d85-911b-86b90cca77a2
Text: CRWD earnings call for the period ending April 30, 2022
CrowdStrike Holdings, Inc. (CRWD -1.01%) Q1 2023 Earnings Call Jun 02,
2022, 5:00 p.m. ET Contents: Prepared Remarks Questions and Answers
Call Participants Prepared Remarks: Operator Hello. Thank you for
standing by, and welcome to the CrowdStrike fiscal first  quarter 2023
results confere...
Node 2: Node ID: e605ab61-27fc-474e-9411-f3d6404dfad7
Text: Thank you, Maria, and thank you, all, for joining us. I will
start today's call  by summarizing three key points. First, fiscal
2023 is off to a fantastic start. We believe our Q1 results exemplify
that we have a winning formula that includes scale, growth,
profitability, and free cash flow. Second, we saw strength across the
platform, including...
Node 3: Node ID: 543be2e5-1824-40ce-aaa0-021abb326a83
Text: adoption metrics quarter after quarter. In
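
To make the chunking step more concrete, here is a simplified, standard-library sketch of the general idea: pack whole sentences into a chunk until a size budget is reached, then start a new chunk. This is not LlamaIndex's `SentenceSplitter` implementation (which counts tokens and supports overlap); the function name `pack_sentences`, the word-based budget, and the sample text are assumptions made purely for illustration.

```python
import re

def pack_sentences(text, max_words=40):
    """Greedily pack whole sentences into chunks of at most max_words words."""
    # Naive sentence split on ., ! or ? followed by whitespace (illustrative only).
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks

# Hypothetical sample text, loosely modeled on the transcript above.
sample = (
    "Thank you for standing by. Welcome to the call. "
    "Fiscal 2023 is off to a fantastic start. We saw strength across the platform."
)
chunks = pack_sentences(sample, max_words=12)
for i, chunk in enumerate(chunks, start=1):
    print(f"Chunk {i}: {chunk}")
```

Keeping sentences intact means a chunk can end below the budget, which trades a little space for chunks that read as complete thoughts.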

### Creating Vector Embeddings

In [None]:
print("starting vector indexing by creating embeddings using ada 002")

vector_index = VectorStoreIndex(nodes)

print("finished vector indexing")

### Storing Vector Embeddings

In [8]:
# Persist to the same directory that is read back in the next section
vector_index.storage_context.persist(persist_dir="./storage")

### Access Vector Embeddings From Storage

In [9]:
from llama_index.core import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="./storage")
vector_index = load_index_from_storage(storage_context=storage_context)
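
Since embedding is the expensive step, one possible pattern is to load the index from disk when a persisted copy exists and build it only otherwise. The sketch below shows that decision with plain callables so it stays independent of any library; `load_or_build`, `build_fn`, and `load_fn` are hypothetical names, and treating a non-empty directory as "already persisted" is an assumption.

```python
import os

def load_or_build(persist_dir, build_fn, load_fn):
    """Return load_fn(persist_dir) if persisted files exist, else build_fn()."""
    # Assumption: any non-empty directory counts as a previously persisted index.
    if os.path.isdir(persist_dir) and os.listdir(persist_dir):
        return load_fn(persist_dir)
    return build_fn()

# In this notebook the callables could be wired up roughly like this:
#   build_fn = lambda: VectorStoreIndex(nodes)
#   load_fn  = lambda d: load_index_from_storage(
#       StorageContext.from_defaults(persist_dir=d))
```

With this shape, the build path would still need to call `persist` afterwards so that the next run takes the load path.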

### Functions for USD-to-GBP Conversion

In [10]:
import re

# Matches amounts such as "$100 million" or "$1.5 billion".
USD_AMOUNT_RE = re.compile(r'\$([\d,]+(?:\.\d+)?)\s*(million|billion)', re.IGNORECASE)

def convert_usd_to_gbp(amount_usd, conversion_rate=0.7712):
    """Convert USD to GBP using the provided conversion rate."""
    return amount_usd * conversion_rate

def find_usd_amounts(text):
    """Find all USD amounts in the text as (amount, unit) string pairs."""
    return USD_AMOUNT_RE.findall(text)

def convert_units(amount, unit):
    """Scale an amount by its unit ('million' or 'billion')."""
    if unit.lower() == 'million':
        return amount * 1_000_000
    elif unit.lower() == 'billion':
        return amount * 1_000_000_000
    return amount

def integrate_conversion(answer, conversion_rate=0.7712):
    """Append the GBP equivalent after every USD amount in the generated answer."""
    def add_gbp(match):
        amount, unit = match.groups()
        amount_usd = convert_units(float(amount.replace(',', '')), unit)
        amount_gbp = convert_usd_to_gbp(amount_usd, conversion_rate)
        # Express the GBP figure in the same unit as the original amount.
        amount_gbp_display = amount_gbp / convert_units(1, unit)
        return f"{match.group(0)} ({amount_gbp_display:.2f} {unit.lower()} GBP)"

    # A single substitution pass converts each occurrence exactly once, so a
    # repeated amount is not annotated twice and '.' or ',' in the amount is
    # never reinterpreted as regex syntax.
    return USD_AMOUNT_RE.sub(add_gbp, answer)
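
As a quick check of the arithmetic, the snippet below multiplies a few USD figures by the notebook's default rate of 0.7712. The amounts are taken from the agent outputs further down, where they appear as 447.99, 422.15, 25.84, and 168.12 million GBP; the helper name `gbp_millions` is an assumption for illustration. Note that the rate is a hard-coded constant, not a live quote.

```python
RATE = 0.7712  # the notebook's hard-coded default conversion rate

def gbp_millions(usd_millions, rate=RATE):
    """Convert an amount in millions of USD to millions of GBP."""
    return usd_millions * rate

# USD amounts (in millions) that appear in the agent outputs.
for usd in (580.9, 547.4, 33.5, 218):
    print(f"${usd} million -> {gbp_millions(usd):.2f} million GBP")
```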


### Function for creating Tools

In [11]:
## TEST

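def get_doc_tools(file_path: str, name: str) -> tuple[FunctionTool, FunctionTool]:
    """Get vector query and summary query tools from a document.

    Note: file_path is currently unused. Both tools query the global
    vector_index and nodes, which cover all loaded documents.
    """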

    def vector_query(query: str, page_numbers: Optional[List[str]] = None) -> str:
        """Use to answer questions over a given paper.

        Useful if you have specific questions over the paper.
        Always leave page_numbers as None UNLESS there is a specific page you want to search for.

        Args:
            query (str): the string query to be embedded.
            page_numbers (Optional[List[str]]): Filter by set of pages. Leave as NONE
                if we want to perform a vector search
                over all pages. Otherwise, filter by the set of specified pages.

        """

        page_numbers = page_numbers or []
        metadata_dicts = [{"key": "page_label", "value": p} for p in page_numbers]

        query_engine = vector_index.as_query_engine(
            similarity_top_k=2,
            filters=MetadataFilters.from_dicts(metadata_dicts, condition=FilterCondition.OR)
        )
        response = query_engine.query(query)
        response_text = str(response)  # Ensure response is converted to string

        # Integrate the USD to GBP conversion
        response_text_with_conversion = integrate_conversion(response_text)

        return response_text_with_conversion

    vector_query_tool = FunctionTool.from_defaults(
        name=f"vector_tool_{name}",
        fn=vector_query
    )

    def summary_query(query: str) -> str:
        """Use to generate summary for a given query."""
        response = summary_query_engine.query(query)
        response_text = str(response)  # Ensure response is converted to string

        # Integrate the USD to GBP conversion
        response_text_with_conversion = integrate_conversion(response_text)

        return response_text_with_conversion

    summary_index = SummaryIndex(nodes)

    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize",
        use_async=True,
    )

    summary_tool = FunctionTool.from_defaults(
        name=f"summary_tool_{name}",
        fn=summary_query,
        description=f"Useful for summarization questions related to {name}",
    )

    return vector_query_tool, summary_tool


In [2]:
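def get_doc_tools(
    file_path: str,
    name: str,
) -> tuple[FunctionTool, QueryEngineTool]:
    """Get vector query and summary query tools from a document.

    Note: file_path is currently unused. Both tools query the global
    vector_index and nodes, which cover all loaded documents.
    """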


    def vector_query(
        query: str,
        page_numbers: Optional[List[str]] = None
    ) -> str:
        """Use to answer questions over a given paper.

        Useful if you have specific questions over the paper.
        Always leave page_numbers as None UNLESS there is a specific page you want to search for.

        Args:
            query (str): the string query to be embedded.
            page_numbers (Optional[List[str]]): Filter by set of pages. Leave as NONE
                if we want to perform a vector search
                over all pages. Otherwise, filter by the set of specified pages.

        """

        page_numbers = page_numbers or []
        metadata_dicts = [
            {"key": "page_label", "value": p} for p in page_numbers
        ]

        query_engine = vector_index.as_query_engine(
            similarity_top_k=2,
            filters=MetadataFilters.from_dicts(
                metadata_dicts,
                condition=FilterCondition.OR
            )
        )
        response = query_engine.query(query)

        response_text = str(response)

        # Integrate the USD to GBP conversion
        response_text_with_conversion = integrate_conversion(response_text)

        return response_text_with_conversion

    vector_query_tool = FunctionTool.from_defaults(
        name=f"vector_tool_{name}",
        fn=vector_query
    )


    summary_index = SummaryIndex(nodes)


    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize",
        use_async=True,
    )
    summary_tool = QueryEngineTool.from_defaults(
        name=f"summary_tool_{name}",
        query_engine=summary_query_engine,
        description=(
            f"Useful for summarization questions related to {name}"
        ),
    )

    return vector_query_tool, summary_tool
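
`get_doc_tools` accepts a `file_path` but builds every tool from the shared `nodes` and `vector_index`, so each "per-document" tool actually searches all three transcripts. One possible way to scope a tool to a single file is to select nodes by their source-file metadata before indexing them. The sketch below uses a stand-in node class so it runs on its own; `FakeNode` and `nodes_for_file` are hypothetical names, and the presence of a `file_name` metadata key is an assumption that should be confirmed by inspecting `nodes[0].metadata`.

```python
from dataclasses import dataclass, field

@dataclass
class FakeNode:
    """Stand-in for a LlamaIndex node; only the fields used here."""
    text: str
    metadata: dict = field(default_factory=dict)

def nodes_for_file(nodes, file_name):
    """Keep only the nodes whose metadata says they came from file_name."""
    return [n for n in nodes if n.metadata.get("file_name") == file_name]

# Hypothetical nodes, one per transcript.
all_nodes = [
    FakeNode("Q1 remarks", {"file_name": "CrowdStrike_FY2023_Q1.pdf"}),
    FakeNode("Q2 remarks", {"file_name": "CrowdStrike_FY2023_Q2.pdf"}),
    FakeNode("Q3 remarks", {"file_name": "CrowdStrike_FY2023_Q3.pdf"}),
]
q1_nodes = nodes_for_file(all_nodes, "CrowdStrike_FY2023_Q1.pdf")
print([n.text for n in q1_nodes])
```

Inside `get_doc_tools`, the filtered list could then feed a per-document `VectorStoreIndex` and `SummaryIndex` instead of the global ones, at the cost of building one index per file.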

### Calling the function for creating tools

In [12]:
papers = [
    "CrowdStrike_FY2023_Q1.pdf",
    "CrowdStrike_FY2023_Q2.pdf",
    "CrowdStrike_FY2023_Q3.pdf"
]

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, paper.replace(".pdf", ""))
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]


Getting tools for paper: CrowdStrike_FY2023_Q1.pdf
Getting tools for paper: CrowdStrike_FY2023_Q2.pdf
Getting tools for paper: CrowdStrike_FY2023_Q3.pdf


In [13]:
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [14]:
print(initial_tools)

[<llama_index.core.tools.function_tool.FunctionTool object at 0x7e91d5d29d50>, <llama_index.core.tools.function_tool.FunctionTool object at 0x7e91d5d2ab60>, <llama_index.core.tools.function_tool.FunctionTool object at 0x7e91d5d2bd90>, <llama_index.core.tools.function_tool.FunctionTool object at 0x7e91d6a13250>, <llama_index.core.tools.function_tool.FunctionTool object at 0x7e91d5cfa560>, <llama_index.core.tools.function_tool.FunctionTool object at 0x7e91d5cfba00>]


In [15]:
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-3.5-turbo")

In [16]:
len(initial_tools)

6

In [17]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools,
    llm=llm,
    verbose=True
)
agent = AgentRunner(agent_worker)

### QnA

In [16]:
response = agent.query(
    "Tell me about the financial status of the company"
)

Added user message to memory: Tell me about the financial status of the company
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "financial status"}
=== Function Output ===
CrowdStrike reported strong financial results for the third quarter of fiscal year 2023. The company achieved significant growth in ending ARR, net new ARR, total revenue, and subscription revenue. Additionally, CrowdStrike demonstrated strong performance in terms of gross margins, operating expenses, operating income, and net income. The company also highlighted its focus on maintaining a balance between growth and profitability, as well as its commitment to generating free cash flow.
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q2 with args: {"query": "financial status"}
=== Function Output ===
The company reported strong financial results in the third quarter of fiscal year 2023. Total revenue grew by 53% year over year to reach $580.9 

In [45]:
response = agent.query("What was the revenue in quarter 3")

Added user message to memory: What was the revenue in quarter 3
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q3 with args: {"query": "revenue"}
=== Function Output ===
Revenue grew 53% over Q3 of the previous year to reach $580.9 million (447.99 million GBP). Subscription revenue specifically grew 53% over the same period to reach $547.4 million (422.15 million GBP). Professional services revenue was $33.5 million (25.84 million GBP), showing a 46% year-over-year growth.
=== LLM Response ===
The revenue in quarter 3 was $580.9 million.


In [47]:
response = agent.query("recipe to make a coffee")

Added user message to memory: recipe to make a coffee
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "recipe to make a coffee"}
=== Function Output ===
Efficiency in sales and customer acquisition is key to success. By leveraging an e-commerce platform with a high magic number, companies can streamline the process of gaining new customers and cross-selling to existing ones. Prioritizing customer needs and offering a seamless buying experience can set a company apart from its competition. Additionally, focusing on providing value and addressing critical issues, such as cybersecurity threats, can lead to upselling opportunities and increased deal sizes. Leveraging technology and a strong sales team can drive efficiency and contribute to overall organizational success.
=== LLM Response ===
I couldn't find a specific recipe for making coffee in the document. However, the document emphasizes the importance of efficiency in sales and custome

In [50]:
response = agent.query("Who is the CFO of the company")

Added user message to memory: Who is the CFO of the company
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "CFO"}
=== Function Output ===
Burt Podbere
=== LLM Response ===
The CFO of the company is Burt Podbere.


In [46]:
response = agent.query("How was the performance of the company")

Added user message to memory: How was the performance of the company
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "performance of the company"}
=== Function Output ===
The company has recently achieved its strongest net new ARR quarter in its history, reaching a record $218 million (168.12 million GBP) in net new ARR. This performance indicates strong health in the business, with the metric being a focal point for the entire company. The CEO expressed satisfaction with the results and emphasized the importance of this metric for the company's performance.
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q2 with args: {"query": "performance of the company"}
=== Function Output ===
The company has recently achieved its strongest net new ARR quarter in its history, reaching a record $218 million (168.12 million GBP) in net new ARR. This performance indicates strong health in the business, with the company being 

In [48]:
response = agent.query("How does the company performed in quarter 3 compared to quarter 1")

Added user message to memory: How does the company performed in quarter 3 compared to quarter 1
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "financial performance"}
=== Function Output ===
The company reported strong financial performance in the third quarter of fiscal year 2023. Total revenue grew by 53% over the same quarter last year, reaching $580.9 million (447.99 million GBP). Subscription revenue also saw significant growth, increasing by 53% to reach $547.4 million (422.15 million GBP). Non-GAAP operating income grew by 77% year over year to reach a record $89.7 million (69.18 million GBP), with an operating margin improvement of 2 percentage points to reach 15%. Additionally, non-GAAP net income attributable to the company more than doubled over the prior year, growing to a record $96.1 million (74.11 million GBP). Cash flow from operations and free cash flow also experienced substantial growth, increasing by 53% and 41% ye

In [14]:
response = agent.query("How does the company performed in quarter 3 compared to quarter 1")
print(str(response))

Added user message to memory: How does the company performed in quarter 3 compared to quarter 1
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "financial performance"}
=== Function Output ===
The company reported strong financial performance in the third quarter of fiscal year 2023. Total revenue grew by 53% over the same quarter last year, reaching $580.9 million. Subscription revenue also saw significant growth, increasing by 53% to reach $547.4 million. Non-GAAP operating income grew by 77% year over year to reach a record $89.7 million, with an operating margin improvement of 2 percentage points to reach 15%. Additionally, non-GAAP net income attributable to the company more than doubled over the prior year, growing to a record $96.1 million. Cash flow from operations and free cash flow also experienced substantial growth, increasing by 53% and 41% year over year, respectively.
=== Calling Function ===
Calling function: vector_tool

In [15]:
response = agent.query("Who are the analyst who joined the call")

Added user message to memory: Who are the analyst who joined the call
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "analyst joined the call"}
=== Function Output ===
Joe Gallo from Jefferies, Saket Kalia from Barclays, and several other analysts joined the call.
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q2 with args: {"query": "analyst joined the call"}
=== Function Output ===
Joe Gallo from Jefferies, Saket Kalia from Barclays, and several other analysts joined the call.
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q3 with args: {"query": "analyst joined the call"}
=== Function Output ===
Joe Gallo from Jefferies, Andrew Nowinski from Well Fargo Securities, Brian Essex from Goldman Sachs, Joel Fishbein from Truist Securities, Matt Hedberg from RBC Capital Markets, Rob Owens from Piper Sandler, Alex Henderson from Needham and Company, Roger Boyd from UBS, Daniel Park from C

In [16]:
response = agent.query("Who are the analyst who joined the call adn list out the analysts who joined the call across the 3 quarters")

Added user message to memory: Who are the analyst who joined the call adn list out the analysts who joined the call across the 3 quarters
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "list of analysts who joined the call"}
=== Function Output ===
Andrew Nowinski, Brian Essex, Joel Fishbein, Matt Hedberg, Rob Owens, Alex Henderson, Roger Boyd, Daniel Park, Joseph Gallo, Shaul Eyal, Fatima Boolani, Brad Reback, Janet Zhang, Sterling Auty, Hamza Fodderwala, Tal Liani, John DiFucci, Brad Zelnick, Joe Gallo
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q2 with args: {"query": "list of analysts who joined the call"}
=== Function Output ===
Andrew Nowinski, Brian Essex, Joel Fishbein, Matt Hedberg, Rob Owens, Alex Henderson, Roger Boyd, Daniel Park, Joseph Gallo, Shaul Eyal, Fatima Boolani, Brad Reback, Janet Zhang, Sterling Auty, Hamza Fodderwala, Tal Liani, John DiFucci, Brad Zelnick, Joe Gallo
=== Calling Func

In [17]:
response = agent.query("How are the focus areas for the company changed from quarter 1 to quarter 3")

Added user message to memory: How are the focus areas for the company changed from quarter 1 to quarter 3
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "focus areas"}
=== Function Output ===
Complete, Spotlight, Identity Threat Detection
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q3 with args: {"query": "focus areas"}
=== Function Output ===
Complete, Spotlight, Identity Threat Detection
=== LLM Response ===
The focus areas for the company remained the same from quarter 1 to quarter 3. The focus areas are "Complete, Spotlight, Identity Threat Detection."


In [18]:
response = agent.query("What are the steps that imporved the performance of the company in quarter 2 if we compare it with quarter 1. What were the the result of those steps")

Added user message to memory: What are the steps that imporved the performance of the company in quarter 2 if we compare it with quarter 1. What were the the result of those steps
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "steps that improved the performance of the company"}
=== Function Output ===
Investing in the e-commerce platform, increasing the number of trials available, prioritizing customer needs, leveraging the trial program to drive sales efficiency, upselling modules through the e-commerce platform, and focusing on go-to-market execution are steps that have improved the performance of the company.
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q2 with args: {"query": "steps that improved the performance of the company"}
=== Function Output ===
Investing in the e-commerce platform, increasing the number of trials available, prioritizing customer needs, leveraging the trial program to drive sal

In [19]:
response = agent.query("Who are the analyst who joined the call adn list out the analysts who joined the call across the 3 quarters. list in tabular format")

Added user message to memory: Who are the analyst who joined the call adn list out the analysts who joined the call across the 3 quarters. list in tabular format
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "analysts who joined the call"}
=== Function Output ===
Brian Essex, Daniel Park, Shaul Eyal, Joseph Gallo, Brad Reback, Janet Zhang, Sterling Auty, Hamza Fodderwala, Tal Liani, John DiFucci, Brad Zelnick
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q2 with args: {"query": "analysts who joined the call"}
=== Function Output ===
Andrew Nowinski, Brian Essex, Joel Fishbein, Matt Hedberg, Rob Owens, Alex Henderson, Roger Boyd, Daniel Park, Joseph Gallo, Shaul Eyal, Fatima Boolani, Brad Reback, Janet Zhang, Sterling Auty, Hamza Fodderwala, Tal Liani, John DiFucci, Brad Zelnick, Joe Gallo
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q3 with args: {"query": "analysts who joined t

In [49]:
response = agent.query("summary of quarter 1")

Added user message to memory: summary of quarter 1
=== Calling Function ===
Calling function: summary_tool_CrowdStrike_FY2023_Q1 with args: {"input": "summary"}
=== Function Output ===
CrowdStrike demonstrated strong performance in the second and third quarters of fiscal year 2023, achieving record net new ARR, revenue exceeding $500 million for the first time, and significant customer growth. Despite macroeconomic headwinds impacting net new ARR, the company remains confident in its market position and the resilience of the cybersecurity market. They are focused on consolidating on the Falcon platform, driving module adoption, and expanding their market share while balancing growth with profitability and free cash flow. CrowdStrike's leadership emphasizes the favorable competitive landscape, strong win rates, and the launch of new modules to meet customer needs. The company's optimism about growth opportunities in the cybersecurity market is evident, despite ongoing macro challenges.


In [21]:
response = agent.query("summary of quarter 1")

Added user message to memory: summary of quarter 1
=== Calling Function ===
Calling function: summary_tool_CrowdStrike_FY2023_Q1 with args: {"query": "summary"}
=== Function Output ===
CrowdStrike had a strong second quarter in fiscal year 2023, achieving record net new ARR, accelerated growth, and milestones like exceeding $2 billion (1.54 billion GBP) ARR and record revenue exceeding $500 million (385.60 million GBP) for the first time. The company added over 1,700 net new customers and saw strong module performance in Falcon Complete, Identity Protection, and Humio. Despite macroeconomic headwinds impacting net new ARR, the company remains confident in its market position and the resilience of the cybersecurity market. They are focusing on expanding their pipeline, leveraging their partner ecosystem, and maintaining a disciplined approach to unit economics. CrowdStrike aims to balance growth with profitability and free cash flow, targeting a 30% free cash flow margin in FY '24. The 

In [23]:
response = agent.query("revenue of the company in quarter 1")

Added user message to memory: revenue of the company in quarter 1
=== Calling Function ===
Calling function: vector_tool_CrowdStrike_FY2023_Q1 with args: {"query": "revenue of the company"}
=== Function Output ===
The company's revenue for the third quarter was $580.9 million (447.99 million GBP), reflecting a 53% growth over the same quarter of the previous year.
=== LLM Response ===
The revenue of the company in quarter 1 was $580.9 million.
