# Agentic-RAG Application - Document Q&A

In this notebook, we build a document Q&A application step by step using an `agentic RAG` pipeline. In RAG applications with a large number of files, an agent can shorten the retrieval step and improve the generation. The idea is that the agent reasons about which PDF document is most likely to answer the query, so retrieval only considers the chunks of that document instead of every chunk in the collection.

To implement this: summarize each document using LangChain, give the LLM these summaries as context to select the most relevant source, and retrieve only from the chunks of that source.

As in the previous notebooks, **Gemini AI models** are used for embedding and for generating answers, and **ChromaDB** is the vector database. For learning purposes, the RAG module is built manually instead of with LangChain.

**LangChain** is still used, but only to ease the integration of components when building the app.

## Getting Started

* Install `langchain-google-genai` (the LangChain integration used to access the `Gemini API`)
* Install `langchain_community` (this package contains third-party integrations, e.g. the PyPDF loaders)

In [18]:
%pip install -qU langchain-google-genai
%pip install -qU langchain_community

Note: you may need to restart the kernel to use updated packages.


## Libraries

In [19]:
# %pip install ipython
# %pip install python-dotenv
# %pip install langchain

In [20]:
import os
import chromadb
import google.generativeai as genai

from dotenv import load_dotenv  # to load environment variables (for API key variable)
from pathlib import Path  

from IPython.display import Markdown  # to get output in Markdown style

from langchain_community.document_loaders import PyPDFLoader, PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter  # langChain text splitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, GoogleGenerativeAI  # langChain access to google GenAI embedding models
from chromadb import Documents, EmbeddingFunction, Embeddings
from google.api_core import retry
from langchain.chains.combine_documents import create_stuff_documents_chain


## Setup Google API key

https://ai.google.dev/gemini-api/docs/api-key 

* Secure your API key in an environment variable file (`.env`) and load it using `load_dotenv()`
* Add the `.env` file to `.gitignore` so it is never committed

In [21]:
# load_dotenv() with no arguments looks for a `.env` file in the current directory and its parents
load_dotenv()

GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY')

## Q&A System - Step by step

### 1 - Load documents
The first step is to load PDF documents into the system. We use `PyPDFLoader` from the `langchain_community` library to achieve this.

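Instead of loading all the files into documents at once, let's load them one by one and call the LLM to get a summary of each. The `map/reduce technique` is the intended approach for long documents; the code below uses the simpler variant of one summarization call per PDF.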
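
For reference, a map/reduce summary splits a long text into pieces, summarizes each piece (map), and then summarizes the concatenated partial summaries (reduce). The sketch below is only an illustration under stated assumptions: `map_reduce_summary`, `piece_size`, and the `summarize` callable are hypothetical names, not part of LangChain, and the stand-in summarizer merely truncates text so the example runs without an API key. In practice, `summarize` could wrap a call to a generative model.

```python
from typing import Callable, List


def map_reduce_summary(text: str, summarize: Callable[[str], str], piece_size: int = 2000) -> str:
    """Summarize `text` piece by piece (map), then summarize the partial summaries (reduce)."""
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    # Map step: cut the text into fixed-size pieces and summarize each one.
    pieces: List[str] = [text[i:i + piece_size] for i in range(0, len(text), piece_size)]
    partial = [summarize(piece) for piece in pieces]
    if len(partial) <= 1:
        # Zero or one piece: there is nothing to reduce.
        return partial[0] if partial else ""
    # Reduce step: summarize the concatenation of the partial summaries.
    return summarize("\n".join(partial))


# Stand-in summarizer (assumption): keeps the first 10 characters of its input.
def fake_summarize(s: str) -> str:
    return s[:10]


short_result = map_reduce_summary("a" * 50, fake_summarize, piece_size=100)
long_result = map_reduce_summary("abcdefghij" * 30, fake_summarize, piece_size=100)
```

Fixed-size pieces keep the sketch short; a real pipeline would more likely split on page or paragraph boundaries.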

In [22]:
# LLM generative model from gemini
llm_generative = GoogleGenerativeAI(model="gemini-2.0-flash-001")

In [23]:
# Function to call the LLM to get a PDF summary
def get_pdf_summary(doc: str) -> str: 

    # Prompt template
    prompt_summary = """
    Provide a clear, concise summary of the provided document in a maximum of 3 lines, without bullet points:

    {context} 
    """

    prompt_summary = prompt_summary.format(context = doc)
    summary = llm_generative.invoke(prompt_summary)

    return summary


In [24]:
DirectoryPath = "../Data/"
fileNames = [f for f in os.listdir(DirectoryPath) if f.endswith('.pdf')]
filePaths = [DirectoryPath + f for f in fileNames]

summaries = {}

for path in filePaths:

    loader = PyPDFLoader(path, mode="single")
    doc = loader.load()

    summary = get_pdf_summary(doc[0].page_content)
    summaries[path] = summary


### 2 - Split the Documents into Chunks

To handle large documents efficiently, we split the documents into smaller chunks using the `RecursiveCharacterTextSplitter` class.
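
To make the roles of chunk size and overlap concrete, here is a simplified fixed-width splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to split on separators such as paragraphs and line breaks); `sliding_chunks` is a hypothetical helper written only for illustration.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut `text` into windows of `chunk_size` characters that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


example = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
```

With an overlap equal to the chunk size the window would never advance, which is why the sketch rejects that combination.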

Every chunk has a `metadata` attribute (a dictionary) whose `source` key holds the path of the PDF it came from.

* This will be used to keep only the chunks of the single PDF that is most likely to contain the answer.

`Remember`: the agent gives the LLM the summaries to determine which PDF is most likely to contain the answer, so the RAG process only uses the chunks from that PDF.
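
The filtering idea can be sketched in plain Python. This is a minimal illustration, assuming chunks are represented as dictionaries with `page_content` and `metadata` keys (stand-ins for LangChain `Document` objects); `filter_by_source` and the file paths are hypothetical.

```python
from typing import Dict, List


def filter_by_source(chunks: List[Dict], source: str) -> List[Dict]:
    """Keep only the chunks whose metadata points at the given source path."""
    return [c for c in chunks if c["metadata"].get("source") == source]


# Toy chunks (assumption): the paths and texts are made up for the example.
toy_chunks = [
    {"page_content": "Agents use tools.", "metadata": {"source": "docs/agents.pdf", "page": 0}},
    {"page_content": "Prompts guide models.", "metadata": {"source": "docs/prompts.pdf", "page": 0}},
    {"page_content": "Agents plan steps.", "metadata": {"source": "docs/agents.pdf", "page": 1}},
]

selected = filter_by_source(toy_chunks, "docs/agents.pdf")
selected_texts = [c["page_content"] for c in selected]
```

In the notebook itself, the same filter is delegated to the vector database through its metadata filter, so the chunks never need to be scanned by hand.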

In [25]:
loader = PyPDFDirectoryLoader("../Data/")  # loads every PDF in the folder at once
docs = loader.load()

# chunk_size: maximum number of characters per chunk
# chunk_overlap: characters shared by consecutive chunks; keep it well below chunk_size (100 is an illustrative value)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = text_splitter.split_documents(docs)

### 3 - Embeddings & Vector Store for Document Retrieval

We now store the document chunks and their embeddings in a vector database, which will allow us to retrieve similar documents based on user queries and filtering by source of metadata.

In this example, we are using Chroma as our vector database. Chroma is one of the many options available for storing and retrieving embeddings efficiently. 

1. Create a Chroma client: `chroma_client = chromadb.Client()`

2. Create a collection: where you'll store your embeddings, documents, and any additional metadata. Collections index your embeddings and documents, and enable efficient retrieval and filtering
    * By default, Chroma uses the **Sentence Transformers** `all-MiniLM-L6-v2` model to create embeddings.
    * To use a custom model, we just need to implement the `EmbeddingFunction` protocol.

3. Add documents to the collection: Chroma will store your text and handle embedding and indexing automatically. You can also customize the embedding model. You must provide unique string IDs for your documents.
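
Conceptually, a filtered vector search ranks the stored embeddings by similarity to the query embedding, restricted to the records whose metadata matches. The toy below shows that idea in pure Python with hand-written 2-D vectors. It is not how Chroma is implemented; `cosine_similarity`, `toy_query`, and the records are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_query(records: List[Dict], query_embedding: Sequence[float],
              source: Optional[str] = None, n_results: int = 2) -> List[str]:
    """Return the ids of the most similar records, optionally restricted to one source."""
    candidates = [r for r in records if source is None or r["metadata"]["source"] == source]
    ranked = sorted(candidates,
                    key=lambda r: cosine_similarity(r["embedding"], query_embedding),
                    reverse=True)
    return [r["id"] for r in ranked[:n_results]]


# Toy records (assumption): ids, vectors, and sources are invented for the example.
records = [
    {"id": "0", "embedding": [1.0, 0.0], "metadata": {"source": "a.pdf"}},
    {"id": "1", "embedding": [0.9, 0.1], "metadata": {"source": "b.pdf"}},
    {"id": "2", "embedding": [0.0, 1.0], "metadata": {"source": "a.pdf"}},
]

unfiltered = toy_query(records, [1.0, 0.0])
filtered = toy_query(records, [1.0, 0.0], source="a.pdf")
```

Restricting the candidates before ranking is what lets the agent narrow the search to a single PDF.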

In [29]:
DB_NAME = "my_rag_db"

genai.configure(api_key=GOOGLE_API_KEY)

# 1. Create a Chroma client
chroma_client = chromadb.Client()

In [30]:
# 2. Collection: custom embedding function
# Define a new class that inherits all properties and methods from the "EmbeddingFunction" class and can add its own
class GeminiEmbeddingFunction(EmbeddingFunction):
    # Class attribute: whether to generate embeddings for documents (True) or for queries (False)
    document_mode = True

    # __call__ makes a class instance callable like a function
    def __call__(self, input: Documents) -> Embeddings:
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"

        retry_policy = {"retry": retry.Retry(predicate=retry.if_transient_error)}

        response = genai.embed_content(
            model="models/text-embedding-004",
            content=input,
            task_type=embedding_task,
            request_options=retry_policy,
        )
        # Response will be a dictionary with metadata and key "embedding" that we are interested in
        return response["embedding"]
    

embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True

db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)

In [31]:
# 3. Add documents to the collection
db.add(documents=[chunk.page_content for chunk in chunks],
       metadatas=[chunk.metadata for chunk in chunks],
       ids=[str(i) for i in range(len(chunks))])

### 5 - Retrieve Documents Based on a Query

1. Query -> Retrieval Agent: the agent takes the summaries and asks the LLM which document is most likely to contain the information.

2. Retrieval Agent: based on the answer, it decides to run a vector search that filters the collection by metadata (`source` = path of the PDF).

3. Retrieval Agent -> Generative LLM: asks the LLM to answer the query using the retrieved documents.

4. Generative LLM -> Answer
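
One practical detail of step 2: the model's reply is free text, so it may carry extra whitespace, quotes, or a whole sentence around the file name, while a metadata filter needs an exact match. The sketch below maps a reply back onto a known source. `resolve_source` is a hypothetical helper and the paths are made up for the example.

```python
from typing import Iterable, Optional


def resolve_source(llm_reply: str, known_sources: Iterable[str]) -> Optional[str]:
    """Map a free-text LLM reply onto exactly one known source path, or return None."""
    sources = list(known_sources)
    # First try an exact match after removing whitespace and surrounding quotes.
    cleaned = llm_reply.strip().strip("`'\"")
    if cleaned in sources:
        return cleaned
    # Fallback: accept the reply only if it mentions exactly one known source.
    mentioned = [s for s in sources if s in llm_reply]
    return mentioned[0] if len(mentioned) == 1 else None


# Example source paths (assumption).
known = ["../Data/agents.pdf", "../Data/prompts.pdf"]

exact = resolve_source("  ../Data/agents.pdf\n", known)
embedded = resolve_source("The most relevant document is ../Data/agents.pdf.", known)
unknown = resolve_source("I am not sure.", known)
```

Returning `None` for an ambiguous or unknown reply lets the caller fall back to an unfiltered search instead of querying with a filter that matches nothing.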

In [81]:
# Agent function to give the summaries and the query to a generative LLM
def get_potential_document(summaries: dict, query: str) -> str:

    prompt_agent = """
    You are an expert assistant able to understand a user query and identify the document that is most relevant to answer it,
    based on the summaries of every document.

    These are the available documents and their summaries, in the format of <document_name>:<summary>.
    {context}

    This is the user query: {query}

    Reply with only the document name, exactly as written above, that is most relevant to the user query:
    """

    prompt_agent = prompt_agent.format(context=summaries, query=query)
    pdf_name = llm_generative.invoke(prompt_agent)

    # Strip surrounding whitespace so the name can match the `source` metadata exactly
    return pdf_name.strip()

In [82]:
# Calling the function with the llm_generative model
query = "Tell me what you know about agents in a few lines, no more than 5"
pdf_name = get_potential_document(summaries, query)

In [78]:
# Vector Search filtering by pdf_name
embed_fn.document_mode = False  # mode for embedding query

retrieved_docs = db.query(query_texts=query, where={'source':pdf_name}, n_results=20)

### 6 - Augmented Generation: build a Question-Answering (Q&A) System

Now that the retrieval step has found the relevant passages in the set of documents, the next step is augmented generation. For that, we use the same Gemini generative model used so far to generate the summaries, `llm_generative`.

In addition, we define a proper prompt to send to the LLM together with the input query and the context.
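
A Chroma query result stores `documents` as a list of lists, with one inner list per query text. The sketch below flattens the result of a single query into one context string before it is placed in the prompt. `build_context` is a hypothetical helper and `fake_result` is a hand-written stand-in for a real query result.

```python
from typing import Dict, List


def build_context(query_result: Dict, separator: str = "\n\n") -> str:
    """Join the documents retrieved for the first (and only) query into one string."""
    documents: List[List[str]] = query_result.get("documents") or []
    first_query_docs = documents[0] if documents else []
    return separator.join(first_query_docs)


# Stand-in for a query result (assumption): only the `documents` key is shown.
fake_result = {"documents": [["Agents observe the world.", "Agents act using tools."]]}

context = build_context(fake_result)
empty_context = build_context({"documents": []})
```

Passing plain text rather than the nested list keeps brackets and quote characters out of the prompt.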

In [84]:
prompt_final_answer= """
You are an AI expert. Provide clear, concise answers based on the provided context.
If the information is not found in the context, state that the answer is unavailable. 
Use a maximum of three sentences.

QUERY: {query}
CONTEXT: {context}
OUTPUT:
"""

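# `documents` is a list of lists (one inner list per query); join the passages of our single query into plain text
prompt_final_answer = prompt_final_answer.format(query=query, context="\n\n".join(retrieved_docs['documents'][0]))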
answer = llm_generative.invoke(prompt_final_answer) 
Markdown(answer)

A Generative AI agent is an application that tries to achieve a goal by observing the world and acting upon it using its available tools. Agents are autonomous and can act independently, especially when given proper goals. They can also proactively reason about what to do next, even without explicit instructions.