# Agentic RAG

Agentic RAG describes an AI agent-based implementation of retrieval-augmented generation (RAG).


## Why Agentic RAG?
Typical RAG applications have two considerable limitations:

1. A naive RAG pipeline considers only one external knowledge source. However, some solutions require two or more knowledge sources, and some require external tools and APIs, such as web search.
2. They are one-shot solutions: context is retrieved once, with no reasoning over or validation of the quality of the retrieved context.

An agent addresses both limitations. A RAG agent can reason and act over retrieval decisions such as the following:

1. Decide whether to retrieve information or not
2. Decide which tool to use to retrieve relevant information
3. Formulate the query itself
4. Evaluate the retrieved context and decide whether to retrieve again
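
The four decisions above can be sketched as a small control loop. This is a minimal illustration in plain Python, not the LlamaIndex implementation used later in this notebook: every function and tool name here (`needs_retrieval`, `choose_tool`, `formulate_query`, `is_good_enough`, `vector_store`, `web_search`) is hypothetical, and the hard-coded rules stand in for decisions that an LLM would make in a real agent.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], List[str]]
SMALL_TALK = {"hi", "hello", "thanks"}


def needs_retrieval(question: str) -> bool:
    """Decision 1: skip retrieval for inputs that need no external context."""
    return question.strip().lower() not in SMALL_TALK


def choose_tool(question: str) -> str:
    """Decision 2: pick a tool (a keyword rule stands in for LLM reasoning)."""
    return "web_search" if "latest" in question.lower() else "vector_store"


def formulate_query(question: str, attempt: int) -> str:
    """Decision 3: write the query; on a retry, broaden it."""
    query = question.strip().rstrip("?")
    return query if attempt == 0 else f"{query} overview"


def is_good_enough(context: List[str]) -> bool:
    """Decision 4: validate the context (here: anything non-empty passes)."""
    return len(context) > 0


def agentic_retrieve(question: str, tools: Dict[str, Tool], max_attempts: int = 2) -> List[str]:
    if not needs_retrieval(question):
        return []
    tool = tools[choose_tool(question)]
    for attempt in range(max_attempts):
        context = tool(formulate_query(question, attempt))
        if is_good_enough(context):
            return context
    return []  # give up after max_attempts


# Toy tools so the loop can run without any external service.
def vector_store(query: str) -> List[str]:
    return ["GPT-3 overview passage"] if "overview" in query else []


def web_search(query: str) -> List[str]:
    return [f"web result for: {query}"]


TOOLS: Dict[str, Tool] = {"vector_store": vector_store, "web_search": web_search}

print(agentic_retrieve("What is GPT-3?", TOOLS))
```

In this toy run the first query returns nothing from `vector_store`, so the loop reformulates the query and retrieves again, which is the behaviour a one-shot pipeline lacks.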

In [1]:
%pip install llama-index llama-index-embeddings-langchain llama-index-llms-langchain

### Load Data

In [2]:
import nest_asyncio

nest_asyncio.apply()

In [3]:
from llama_index.core import SimpleDirectoryReader

# load documents
documents = SimpleDirectoryReader(input_files=["2005.14165v4.pdf"]).load_data()

### Define LLM and Embedding model

In [None]:
# llm_call is a project-local helper module (not shown here)
from llm_call import get_embedding_model, get_llm

chat_llm = get_llm()
embedding_model = get_embedding_model()

In [4]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
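
To make the chunking step concrete, here is a simplified sketch of sentence-aware splitting in plain Python. It is not how `SentenceSplitter` is implemented: the helper `split_into_chunks` is hypothetical, it counts whitespace-separated words rather than tokens, and it ignores chunk overlap. It only illustrates the idea of packing whole sentences into chunks up to a size budget.

```python
import re
from typing import List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept intact as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Close the current chunk if adding this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "GPT-3 is large. It has many parameters. It learns from few examples."
for chunk in split_into_chunks(sample, max_words=8):
    print(repr(chunk))
```

Splitting on sentence boundaries avoids cutting a sentence in half, so each node handed to the indexes stays readable on its own.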

In [8]:
from llama_index.core import Settings

Settings.embed_model = embedding_model
Settings.llm = chat_llm

### Define Summary Index and Vector Index over the Same Data

In [9]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)
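
The two indexes differ mainly in what they hand to the LLM at query time: a summary index passes along every node, while a vector index passes along only the nodes most similar to the query. The sketch below illustrates that contrast in plain Python under loud assumptions: `embed` is a toy bag-of-words counter rather than a real embedding model, and `summary_retrieve` and `vector_retrieve` are hypothetical helpers, not LlamaIndex APIs.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summary_retrieve(nodes: List[str], query: str) -> List[str]:
    """Summary-style retrieval: return every node, regardless of the query."""
    return list(nodes)


def vector_retrieve(nodes: List[str], query: str, top_k: int = 1) -> List[str]:
    """Vector-style retrieval: return the top_k nodes most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(nodes, key=lambda node: cosine(embed(node), query_vec), reverse=True)
    return ranked[:top_k]


NODES = [
    "GPT-3 has 175 billion parameters.",
    "Few-shot learning uses a handful of examples.",
    "Translation benchmarks include WMT.",
]

print(len(summary_retrieve(NODES, "Summarize the paper")))
print(vector_retrieve(NODES, "How many parameters does GPT-3 have?"))
```

This is why the summary index suits whole-document questions and the vector index suits targeted lookups, which is the distinction the tool descriptions later rely on.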

### Define Query Engines and Set Metadata

In [10]:
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()

In [11]:
from llama_index.core.tools import QueryEngineTool


summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to Language Models are Few-Shot Learners"
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the Language Models are Few-Shot Learners paper."
    ),
)

### Define Router Query Engine

In [12]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector


query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)
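
Conceptually, a single-selector router reads each tool's description, picks the one tool that best fits the query, and forwards the query to it. The following plain-Python sketch shows that flow under stated assumptions: the `Tool` dataclass, `select_tool`, and `route` are hypothetical names, the descriptions are illustrative, and word overlap stands in for the LLM call that makes the actual choice in the cell above.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Tool:
    description: str
    engine: Callable[[str], str]


def _words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def select_tool(query: str, tools: List[Tool]) -> int:
    """Return the index of the tool whose description shares the most words
    with the query (ties go to the first tool)."""
    query_words = _words(query)
    scores = [len(query_words & _words(tool.description)) for tool in tools]
    return scores.index(max(scores))


def route(query: str, tools: List[Tool]) -> Tuple[int, str]:
    """Select exactly one tool and run the query through its engine."""
    choice = select_tool(query, tools)
    return choice, tools[choice].engine(query)


TOOLS = [
    Tool("Summarize or give an overview of the whole paper",
         lambda q: f"[summary engine] {q}"),
    Tool("Look up specific facts and details in the paper",
         lambda q: f"[vector engine] {q}"),
]

print(route("Give an overview of the paper", TOOLS))
print(route("Look up specific details about training data", TOOLS))
```

Because the selector sees only the descriptions, their wording largely determines routing quality; vague or overlapping descriptions make the choice ambiguous.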

In [3]:
response = query_engine.query("What is the summary of the Language Models are Few-Shot Learners paper in less than 100 words?")
print(str(response))

`Response`

Selecting query engine 0:

Choice 1 is directly related to summarization questions about the 'Language Models are Few-Shot Learners' paper..
The paper "Language Models are Few-Shot Learners" demonstrates that large-scale language models, such as GPT-3 with 175 billion parameters, can perform a wide range of tasks with minimal task-specific training. By leveraging few-shot, one-shot, and zero-shot learning, these models achieve competitive performance across various benchmarks, including translation, question answering, and commonsense reasoning. The study highlights the potential of these models to generalize from limited data, reducing the need for extensive fine-tuning, while also addressing issues like bias, fairness, and energy consumption.

In [2]:
response = query_engine.query(
    "How GPT2 is different from GPT3?"
)
print(str(response))

`Response`

Selecting query engine 1: 

The question 'How GPT2 is different from GPT3?' requires retrieving specific context from the Language Models are Few-Shot Learners paper, which is best addressed by choice 2..
GPT-3 differs from GPT-2 in several key ways. Firstly, GPT-3 has a significantly larger dataset and model size, about two orders of magnitude larger than those used for GPT-2. This includes a large amount of Common Crawl data, which increases the potential for contamination and memorization. Despite this, GPT-3 does not overfit its training set by a significant amount. Additionally, GPT-3's training data includes 7% of text in languages other than English, whereas GPT-2 primarily used an English-only dataset due to capacity concerns. GPT-3 also shows improved performance in various tasks, including translation and question answering, and its performance scales smoothly with model size, indicating that it continues to absorb more knowledge as its capacity increases.