# Agentic RAG Application using LlamaIndex - Router Query Engine


In this tutorial, we’ll implement a basic agentic RAG application using LlamaIndex. This is the first tutorial in a series on agentic RAG applications with LlamaIndex. The series is inspired by the DeepLearning.AI course `Agentic Rag Application using LlamaIndex`.


In these tutorials, we will cover:

* **Router Query Engines:** The simplest form of agentic RAG. It adds routing logic that lets the LLM pick the best tool for a given query, based on the query and the descriptions of the available tools.

* **Tool Calling:** We’ll show how to add custom tools to the agentic RAG architecture. This involves creating interfaces for agents to choose a tool and letting the LLM provide the necessary arguments to call these Python functions.

* **Agentic RAG with Multi-step Reasoning:** We’ll enhance the agentic RAG with multi-step reasoning for more complex tasks.

* **Agentic RAG with Multi-step Reasoning and Multiple Documents:** We’ll extend the multi-step reasoning to work with multiple documents, enabling the system to handle diverse and intricate tasks.

# Router Query Engine
In this tutorial we will go through the Router Query Engine, which is the simplest form of agentic RAG in LlamaIndex. It is illustrated in the figure below.

![Router Query Engine](https://raw.githubusercontent.com/abdulsamadkhan/AgenticRag/main/images/RouterQueryEngine.jpg)


# 1. Requirements
**Installing libraries:** The following cell installs the necessary libraries at pinned versions.

In [1]:
!pip install llama-index==0.10.27
!pip install llama-index-llms-openai==0.1.15
!pip install llama-index-embeddings-openai==0.1.7


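We’ll also use the nest-asyncio library, since LlamaIndex uses a lot of asyncio functionality in the background and a notebook already runs its own event loop. `nest_asyncio.apply()` allows those event loops to nest.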

In [2]:
import nest_asyncio
nest_asyncio.apply()
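
Why the patch is needed can be seen with the standard library alone. The sketch below is not LlamaIndex code, and `inner` and `outer` are hypothetical names. It shows that starting a second event loop from inside a running one is refused, which is the situation a notebook cell is in.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    # We are already inside a running event loop here, much like code in a
    # notebook cell. Starting another loop with asyncio.run() is refused.
    coro = inner()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


result = asyncio.run(outer())
print(result)
```

With `nest_asyncio.apply()` in place, nested calls of this kind are permitted instead of raising.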

**Setting the OpenAI key:** Load the OpenAI API key from the Colab secrets store (`userdata`).

In [3]:
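# Load the OpenAI API key (Colab only: `userdata` reads Colab secrets)
!pip install openai
import openai
from google.colab import userdata
from openai import OpenAI

# Set the key on the openai module
openai.api_key = userdata.get('OPENAI_KEY')

# Optional: a plain OpenAI client; it is not used in the rest of this tutorial
client = OpenAI(
    # Pass the key explicitly; if omitted, the client falls back to the
    # OPENAI_API_KEY environment variable
    api_key=openai.api_key,
)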
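
Outside Colab, `google.colab.userdata` is not available. A minimal sketch of an alternative, assuming the key is stored in an environment variable: `load_openai_key` is a hypothetical helper, not part of any library. The variable name `OPENAI_API_KEY` is the one the `openai` client reads by default.

```python
import os


def load_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Hypothetical helper: read the API key from an environment variable."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key


# Usage (assumes the variable was exported before starting Python):
# import openai
# openai.api_key = load_openai_key()
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the first LLM call.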




**PDF download:** This snippet downloads the MetaGPT paper from arXiv. You can substitute a PDF of your own choice.

In [4]:
# Import necessary libraries
import requests

# Define the URL and filename
url = "https://arxiv.org/pdf/2308.00352"
filename = "MetaGPT.pdf"

# Send a GET request to the URL
response = requests.get(url)

# Check for successful response
if response.status_code == 200:
    # Open the file in write binary mode
    with open(filename, "wb") as output_file:
        # Write the content of the response to the file
        output_file.write(response.content)
    print("Successfully downloaded the PDF file")
else:
    print("Error: Failed to download the PDF file")

Successfully downloaded the PDF file


# 2. Loading data and creating chunks
`SimpleDirectoryReader` is a built-in LlamaIndex reader that loads documents from your local machine so they can be indexed and searched.

In [5]:
from llama_index.core import SimpleDirectoryReader
# load documents
documents = SimpleDirectoryReader(input_files=["MetaGPT.pdf"]).load_data()

The `SentenceSplitter` splits our PDF into smaller chunks of at most `1024` tokens each, preferring to break at sentence boundaries. More formally, each of these chunks is called a `node`, which contains the `text` of the given size and `metadata`.  
The function `get_nodes_from_documents` returns a list of nodes.
This is illustrated in the diagram below.
![Node_List](https://raw.githubusercontent.com/abdulsamadkhan/AgenticRag/main/images/Nodes_List.jpg)
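
To make the idea of sentence-aware chunking concrete, here is a toy sketch in plain Python. It is not the `SentenceSplitter` algorithm: it counts whitespace-separated words rather than tokens, has no overlap between chunks, and `ToyNode` and `split_into_nodes` are hypothetical names introduced only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyNode:
    text: str
    metadata: dict = field(default_factory=dict)


def split_into_nodes(
    text: str, max_words: int, metadata: Optional[dict] = None
) -> List[ToyNode]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept whole in its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    nodes: List[ToyNode] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n_words > max_words:
            nodes.append(ToyNode(" ".join(current), dict(metadata or {})))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        nodes.append(ToyNode(" ".join(current), dict(metadata or {})))
    return nodes


sample = (
    "Agents play roles. Roles follow procedures. "
    "Procedures reduce errors. Errors cost time."
)
toy_nodes = split_into_nodes(sample, max_words=6, metadata={"file_name": "sample.txt"})
for node in toy_nodes:
    print(node.metadata, node.text)
```

Each chunk carries a copy of the source metadata, mirroring how every node in the list keeps track of the file it came from.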




In [6]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

In [7]:
# printing the type of nodes and no of elements in the nodes list
print(type(nodes))
print(len(nodes))
#printing the metadata and text of the first node in the list
print(nodes[0].metadata)
print(nodes[0].text)


<class 'list'>
32
{'page_label': '1', 'file_name': 'MetaGPT.pdf', 'file_path': 'MetaGPT.pdf', 'file_type': 'application/pdf', 'file_size': 16715764, 'creation_date': '2024-05-20', 'last_modified_date': '2024-05-20'}
Preprint
METAGPT: M ETA PROGRAMMING FOR A
MULTI -AGENT COLLABORATIVE FRAMEWORK
Sirui Hong1∗, Mingchen Zhuge2∗, Jonathan Chen1, Xiawu Zheng3, Yuheng Cheng4,
Ceyao Zhang4,Jinlin Wang1,Zili Wang ,Steven Ka Shing Yau5,Zijuan Lin4,
Liyang Zhou6,Chenyu Ran1,Lingfeng Xiao1,7,Chenglin Wu1†,J¨urgen Schmidhuber2,8
1DeepWisdom,2AI Initiative, King Abdullah University of Science and Technology,
3Xiamen University,4The Chinese University of Hong Kong, Shenzhen,
5Nanjing University,6University of Pennsylvania,
7University of California, Berkeley,8The Swiss AI Lab IDSIA/USI/SUPSI
ABSTRACT
Remarkable progress has been made on automated problem solving through so-
cieties of agents based on large language models (LLMs). Existing LLM-based
multi-agent systems can already solve simple dialogu

# 3. LLM and embedding model


LlamaIndex needs an LLM to answer queries over both indexes and an embedding model to build the `Vector Index`. Both can be set up in the global `Settings` object.
The code below configures LlamaIndex to:
* Use the OpenAI GPT-3.5 turbo model for Large Language Model tasks.
* Use the OpenAI "text-embedding-ada-002" model for generating text embeddings.


In [8]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

# 4. Creation of Summary Index and Vector Index
We will now create a Summary Index and a Vector Index over our list of nodes.

**Summary Index:**
The summary index stores nodes as a sequential chain. At query time, if no other query parameters are specified, LlamaIndex loads all nodes in the list into the Response Synthesis module, as illustrated in the figure below.


![Node_List](https://raw.githubusercontent.com/abdulsamadkhan/AgenticRag/main/images/Summary_index.jpg)

**Vector Index:**
The vector store index stores each node together with its embedding in a vector store. At query time, it fetches the top-k most similar nodes and passes them to the Response Synthesis module, as illustrated in the figure below.

![Node_List](https://raw.githubusercontent.com/abdulsamadkhan/AgenticRag/main/images/Vector_Index.jpg)
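
The difference between the two retrieval strategies can be sketched without LlamaIndex. The toy below is an illustration under loud assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, and `summary_retrieve` and `vector_retrieve` are hypothetical names, not library functions.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (assumption for the demo)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summary_retrieve(nodes: List[str], query: str) -> List[str]:
    """Summary-index style: ignore the query and return every node in order."""
    return list(nodes)


def vector_retrieve(nodes: List[str], query: str, top_k: int = 2) -> List[str]:
    """Vector-index style: return the top_k nodes most similar to the query."""
    q = embed(query)
    ranked = sorted(nodes, key=lambda n: cosine(embed(n), q), reverse=True)
    return ranked[:top_k]


toy_nodes = [
    "agents publish messages to a shared message pool",
    "the framework assigns roles such as architect and engineer",
    "agents subscribe to messages in the pool based on their profiles",
    "benchmarks include HumanEval and MBPP",
]
query = "how do agents share messages"

all_nodes = summary_retrieve(toy_nodes, query)
best_nodes = vector_retrieve(toy_nodes, query, top_k=2)
print(len(all_nodes), best_nodes)
```

The summary strategy suits questions about the whole document, at the cost of sending every node to the LLM. The top-k strategy is cheaper and suits questions about a specific detail.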



In [9]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)

# 5. Creation of QueryEngines on the Summary Index and Vector Index
Calling `as_query_engine` on an index object, as in `summary_index.as_query_engine(...)`, converts it into a query engine. A query engine takes a natural-language question, retrieves nodes from its index, and uses the LLM to synthesize an answer from them.

In [10]:
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()
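
The `tree_summarize` response mode combines chunk-level answers recursively until a single answer remains. The following is a toy sketch of that idea, not LlamaIndex's implementation: `fake_llm_summarize` is a hypothetical stand-in for an LLM call, and `fan_in` is an assumed group size chosen for the demo.

```python
from typing import Callable, List


def fake_llm_summarize(texts: List[str]) -> str:
    """Hypothetical stand-in for an LLM call: wrap the inputs in parentheses."""
    return "(" + " + ".join(texts) + ")"


def tree_summarize(
    chunks: List[str],
    summarize: Callable[[List[str]], str],
    fan_in: int = 2,
) -> str:
    """Summarize groups of fan_in chunks, then the summaries, until one is left."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not chunks:
        return ""
    level = list(chunks)
    while len(level) > 1:
        # Calls within one level do not depend on each other.
        level = [
            summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)
        ]
    return level[0]


result = tree_summarize(["A", "B", "C", "D", "E"], fake_llm_summarize)
print(result)
```

Because the calls within one level of the tree are independent, they can be issued concurrently, which is the motivation for passing `use_async=True` alongside this mode.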

# 6. Converting QueryEngines into Tools
A query engine tool is simply a query engine with metadata, most importantly a description. The router query engine reads these descriptions to decide which tool to route a given query to.

In [11]:
from llama_index.core.tools import QueryEngineTool


summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to MetaGPT"
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the MetaGPT paper."
    ),
)

# 7. Define Router Query Engine
We will create a router query engine that chooses among the query engine tools we created, as shown in the figure below.

![Node_List](https://raw.githubusercontent.com/abdulsamadkhan/AgenticRag/main/images/RouterQueryEngine_Partial.jpg)


**LLMSingleSelector:** This is a selector that uses the LLM to select a single choice from a list of choices.

In [12]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector


query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)
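
To make the routing step concrete, here is a toy router in plain Python. It is a sketch under stated assumptions: `ToyTool`, `keyword_selector`, and `route` are hypothetical names, and the keyword-overlap selector is a crude substitute for `LLMSingleSelector`, which asks the LLM to choose based on the tool descriptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyTool:
    description: str
    run: Callable[[str], str]


def keyword_selector(query: str, tools: List[ToyTool]) -> int:
    """Hypothetical stand-in for an LLM selector: pick the tool whose
    description shares the most words with the query."""
    q_words = set(query.lower().split())
    scores = [len(q_words & set(t.description.lower().split())) for t in tools]
    return scores.index(max(scores))


def route(query: str, tools: List[ToyTool]) -> str:
    """Select exactly one tool and forward the query to it."""
    choice = keyword_selector(query, tools)
    return tools[choice].run(query)


tools = [
    ToyTool("summarize the whole paper", lambda q: "summary tool"),
    ToyTool("retrieve specific facts and details from the paper", lambda q: "vector tool"),
]

print(route("summarize the paper", tools))
print(route("retrieve specific details about agents", tools))
```

The structure is the same as in the real engine: the selector sees only the query and the descriptions, which is why clear, distinct tool descriptions matter.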

# 8. Testing the code
Let's query the `RouterQueryEngine` to check which tool it selects.

In [18]:
response = query_engine.query("summarize the paper ")
print(str(response))
print(len(response.source_nodes))

Selecting query engine 0: Useful for summarization questions related to MetaGPT.
The paper introduces MetaGPT, a meta-programming framework for LLM-based multi-agent systems that enhances problem-solving processes through collaborative interactions. MetaGPT incorporates Standard Operating Procedures (SOPs) to streamline workflows, assign specialized roles to agents, and improve task decomposition. By utilizing structured communication interfaces and an executive feedback mechanism, MetaGPT achieves state-of-the-art performance in code generation benchmarks. The framework emphasizes role specialization, workflow efficiency, and communication protocols to ensure effective collaboration among agents with diverse expertise. Additionally, the paper discusses the software development process using MetaGPT, detailing the roles of various agents and highlighting its effectiveness in transforming abstract requirements into detailed designs.
32


From the output we can see that the RouterQueryEngine routed the query to the summary tool, and the response used all 32 nodes in the Summary Index.


In [19]:
response = query_engine.query(
    "How do agents share information with other agents?"
)
print(str(response))
print(len(response.source_nodes))

Selecting query engine 1: This choice is more relevant as it specifically mentions retrieving specific context, which is necessary for understanding how agents share information with other agents..
Agents share information with other agents by using a shared message pool to publish structured messages. They can also subscribe to relevant messages based on their profiles. Additionally, agents monitor the environment (i.e., the message pool) to spot important observations, such as messages from other agents, which can either directly trigger actions or assist in completing tasks.
2


From the response we can observe that the RouterQueryEngine routed the query to the vector tool, and the response was generated from only the two most relevant nodes.


# 9. Combining the Code
Now that we understand the basic pipeline, let’s wrap it in a function that we can reuse in the next tutorials:


In [21]:
async def create_router_query_engine(
    document_fp: str,
    verbose: bool = True,
) -> RouterQueryEngine:
    # load the document at document_fp
    documents = SimpleDirectoryReader(input_files=[document_fp]).load_data()

    # chunk_size of 1024 is a good default value
    splitter = SentenceSplitter(chunk_size=1024)
    # Create nodes from documents
    nodes = splitter.get_nodes_from_documents(documents)

    # LLM model
    Settings.llm = OpenAI(model="gpt-3.5-turbo")
    # embedding model
    Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

    # summary index
    summary_index = SummaryIndex(nodes)
    # vector store index
    vector_index = VectorStoreIndex(nodes)

    # summary query engine
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize",
        use_async=True,
    )

    # vector query engine
    vector_query_engine = vector_index.as_query_engine()

    summary_tool = QueryEngineTool.from_defaults(
        query_engine=summary_query_engine,
        description=(
            "Useful for summarization questions related to MetaGPT."
        ),
    )

    vector_tool = QueryEngineTool.from_defaults(
        query_engine=vector_query_engine,
        description=(
            "Useful for retrieving specific context from the MetaGPT paper."
        ),
    )


    query_engine = RouterQueryEngine(
        selector=LLMSingleSelector.from_defaults(),
        query_engine_tools=[
            summary_tool,
            vector_tool,
        ],
        verbose=verbose
    )


    return query_engine


query_engine = await create_router_query_engine("MetaGPT.pdf")
response = query_engine.query("What is the summary of the document?")
print(str(response))

Selecting query engine 0: The document is asking for a summary related to MetaGPT, so choice 1 is more relevant as it mentions summarization questions related to MetaGPT..
The document discusses MetaGPT, a meta-programming framework that utilizes Standard Operating Procedures (SOPs) to enhance the problem-solving capabilities of multi-agent systems based on Large Language Models (LLMs). MetaGPT incorporates role specialization, structured communication interfaces, and workflows based on SOPs to streamline tasks and improve code generation quality. It outperforms previous methods in code generation tasks, demonstrating improved performance in benchmarks like HumanEval and MBPP. The framework facilitates collaboration among agents through efficient sharing mechanisms and workflow management. MetaGPT's executable feedback mechanism enhances code quality during runtime, showcasing state-of-the-art performance in various evaluations and experiments. The document also expl
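
The top-level `await` used above works in a notebook cell but not in a regular Python script. A minimal sketch of the script equivalent, using a hypothetical stub (`create_engine_stub`) in place of `create_router_query_engine` so that the snippet runs without an API key:

```python
import asyncio


async def create_engine_stub(document_fp: str) -> str:
    """Hypothetical stand-in for create_router_query_engine."""
    await asyncio.sleep(0)  # placeholder for real asynchronous work
    return f"engine for {document_fp}"


def main() -> str:
    # In a script there is no running event loop, so asyncio.run() starts one.
    return asyncio.run(create_engine_stub("MetaGPT.pdf"))


engine = main()
print(engine)
```

In a real script, `create_engine_stub` would be replaced by the function defined in this section.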

# Resources
* https://weaviate.io/developers/weaviate/concepts/vector-index
* https://python.langchain.com/v0.1/docs/modules/data_connection/vectorstores/
* https://docs.llamaindex.ai/en/stable/module_guides/indexing/index_guide/
