<a href="https://colab.research.google.com/github/NadeemMughal/RAG_and_CAG/blob/main/CAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Implementing RAG and CAG: A Comprehensive Guide with FAISS Integration and Document Summarization

## Author
**Muhammad Nadeem**

## Introduction
This notebook provides a detailed implementation of Retrieval-Augmented Generation (RAG) and Context-Aware Generation (CAG), along with their variations:
- **RAG with FAISS**
- **CAG with Summarization of Documents**
- **CAG without Summarization of Documents**

You can use this notebook to understand and execute these methods with your own documents to achieve insightful results.

## Implemented Techniques
### 1. **Retrieval-Augmented Generation (RAG)**
RAG combines retrieval-based approaches with generative models to enhance text generation by leveraging external knowledge sources.

### 2. **Context-Aware Generation (CAG)**
CAG improves generation by considering the broader context of a document or dataset, ensuring more meaningful and relevant outputs.

### 3. **RAG with FAISS**
FAISS (Facebook AI Similarity Search) is integrated into RAG for efficient document retrieval, enabling fast and scalable similarity searches.

### 4. **CAG with Document Summarization**
This approach preprocesses documents by summarizing them before using them for context-aware generation, reducing noise and improving efficiency.

### 5. **CAG without Document Summarization**
In this method, the full document is used for context-aware generation, ensuring that no information is lost during preprocessing.

## How to Use This Code
```markdown
1. Clone or download the notebook.
2. Install the required dependencies.
3. Replace the sample documents with your own.
4. Run the notebook and analyze the results.


# RAG vs CAG Implementation

In [None]:
# Install dependencies
!pip install PyMuPDF langchain_google_genai google-generativeai faiss-cpu sentence-transformers nltk


In [32]:

import fitz  # PyMuPDF for extracting text from PDFs
import nltk
import numpy as np
import faiss
from nltk.tokenize import sent_tokenize
from sentence_transformers import SentenceTransformer
from google.colab import userdata
from langchain_google_genai import ChatGoogleGenerativeAI

# Load Gemini API Key
GEMINI_API_KEY = userdata.get("GEMINI_API_KEY")

# Initialize Gemini model
model = ChatGoogleGenerativeAI(model="gemini-2.0-flash", api_key=GEMINI_API_KEY)

"""### Step 1: Extract Text from PDF"""

def extract_text_from_pdf(pdf_path):
    """Extracts text from a PDF file."""
    doc = fitz.open(pdf_path)
    return "\n".join([page.get_text("text") for page in doc])

pdf_text = extract_text_from_pdf("/content/2412.18199v1.pdf")

"""### Step 2: Chunk Text for Context Window"""

nltk.download('punkt')

def chunk_text(text, max_length=4096):
    """Splits text into manageable chunks for LLM context window."""
    sentences = sent_tokenize(text)
    chunks, current_chunk = [], ""

    for sentence in sentences:
        if len(current_chunk) + len(sentence) < max_length:
            current_chunk += sentence + " "
        else:
            chunks.append(current_chunk.strip())
            current_chunk = sentence + " "

    if current_chunk:
        chunks.append(current_chunk.strip())

    return chunks

text_chunks = chunk_text(pdf_text, max_length=4096)
print(f"Total Chunks: {len(text_chunks)}")

"""### Step 3: Store Context in FAISS for Retrieval"""

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = np.array([embedding_model.encode(chunk) for chunk in text_chunks])

index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

context_chunks = text_chunks  # Store original text chunks

def retrieve_relevant_chunks(query, top_k=3):
    """Retrieves the most relevant chunks from FAISS based on query."""
    query_embedding = embedding_model.encode(query).reshape(1, -1)
    _, indices = index.search(query_embedding, top_k)
    return [context_chunks[i] for i in indices[0]]

"""### Step 4: Query Gemini Model"""

def query_gemini(user_query, context):
    """Sends user query with retrieved document context to Gemini."""
    prompt = f"""
    Given the following context, answer the user's question.

    Context:
    {context}

    User Query:
    {user_query}
    """
    response = model.invoke(prompt)
    return response.content

"""### Step 5: Implement Conversation Memory"""

conversation_history = []

def chatbot_response(user_query):
    """Handles user queries, retrieves relevant document sections, and responds."""
    relevant_chunks = retrieve_relevant_chunks(user_query)
    context = "\n\n".join(relevant_chunks)

    chat_context = "\n".join(conversation_history[-5:])

    full_prompt = f"""
    Previous Conversation:
    {chat_context}

    Context from Document:
    {context}

    User Query:
    {user_query}
    """

    response = query_gemini(user_query, full_prompt)

    conversation_history.append(f"User: {user_query}")
    conversation_history.append(f"Chatbot: {response}")

    return response

"""### Step 6: Run Chatbot Console"""

def chatbot_console():
    """Runs the chatbot in the console."""
    print("\n📄 PDF Chatbot (Retrieval-Augmented Generation - RAG) 📄")
    print("Type 'exit' to stop the conversation.")

    while True:
        user_query = input("\nYou: ")
        if user_query.lower() == "exit":
            print("\nChatbot: Goodbye! 👋")
            break

        response = chatbot_response(user_query)
        print("\nChatbot:", response)

# Run chatbot
chatbot_console()




[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Total Chunks: 8

📄 PDF Chatbot (Retrieval-Augmented Generation - RAG) 📄
Type 'exit' to stop the conversation.

You: what's knowledge in your db?

Chatbot: I can access and process information from the provided document. The document discusses a system for automating medicine name extraction from handwritten prescriptions using deep learning techniques. It details the architecture of the system, which includes a Feature Pyramid Network (FPN) with ResNet-50 for object detection, a Region Proposal Network (RPN) for generating candidate regions, RoI Align for accurate feature extraction, and a TrOCR model for handwritten text recognition. The document also references several other studies related to handwritten text recognition and medicine name extraction.

You: can you give me the authors of it?

Chatbot: The document doesn't explicitly list the authors of the system being discussed for automating medicine name extraction. However, it references several papers. If you're interested in th

## CAG with Summarization of given document

In [30]:
import fitz  # PyMuPDF
from google.colab import userdata
from langchain_google_genai import ChatGoogleGenerativeAI

# Load Gemini API Key
GEMINI_API_KEY = userdata.get("GEMINI_API_KEY")

# Initialize LLM
model = ChatGoogleGenerativeAI(model="gemini-2.0-flash", api_key=GEMINI_API_KEY)

"""### Step 1: Extract Full Document Text"""

def extract_text_from_pdf(pdf_path):
    doc = fitz.open(pdf_path)
    text = ""
    for page in doc:
        text += page.get_text("text") + "\n"
    return text

# Extract full text from PDF
pdf_text = extract_text_from_pdf("/content/2412.18199v1.pdf")

"""### Step 2: Summarize Long Documents (Optional)"""

def summarize_text(text, max_length=8192):
    """
    If the document is too large for the model's context window,
    summarize it using the LLM before using it as context.
    """
    if len(text) < max_length:
        return text  # Use full text if within model limits

    prompt = f"""
    The following text is from a research paper. Summarize it in detail while preserving key points:

    {text}
    """
    response = model.invoke(prompt)
    return response.content  # Return summarized text

# Summarize if needed
document_context = summarize_text(pdf_text)

"""### Step 3: Modify Query Function to Use Full Context"""

conversation_history = []  # Stores previous conversation

def chatbot_response(user_query):
    """
    Uses the entire document as context instead of retrieving chunks.
    """
    # Use last 5 messages for context memory
    chat_context = "\n".join(conversation_history[-5:])

    # Construct final query
    full_prompt = f"""
    Previous Conversation:
    {chat_context}

    Document Context:
    {document_context}  # Entire document or its summary

    User Query:
    {user_query}
    """

    # Get response from Gemini
    response = model.invoke(full_prompt)

    # Store conversation history
    conversation_history.append(f"User: {user_query}")
    conversation_history.append(f"Chatbot: {response}")

    return response

"""### Step 4: Run Chatbot Console"""

def chatbot_console():
    print("\n📄 PDF Chatbot (Context-Augmented Generation - CAG) 📄")
    print("Type 'exit' to stop the conversation.")

    while True:
        user_query = input("\nYou: ")

        if user_query.lower() == "exit":
            print("\nChatbot: Goodbye! 👋")
            break

        response = chatbot_response(user_query)
        print("\nChatbot:", response)

# Run chatbot
chatbot_console()



📄 PDF Chatbot (Context-Augmented Generation - CAG) 📄
Type 'exit' to stop the conversation.

You: what knowledge you have?

Chatbot: content='I have knowledge about a research paper that addresses the challenge of extracting medicine names from handwritten doctor prescriptions. I understand the problem statement, the proposed solution (a hybrid approach combining Mask R-CNN and TrOCR), the roles of each model, the dataset used, the string matching technique, the performance achieved, the literature review, the comparison with other models, and the conclusion of the paper. I can answer questions about these aspects based on the provided document context. Essentially, my knowledge is limited to the content of the research paper you provided.' additional_kwargs={} response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': []} id='run-93ab1405-eca6-472e-94ca-2a15a5708e7d-0' usage_metadata={'input_tokens': 723, 'output_tokens'

#### CAG without Summarization

In [31]:
import fitz  # PyMuPDF
from google.colab import userdata
from langchain_google_genai import ChatGoogleGenerativeAI

# Load Gemini API Key
GEMINI_API_KEY = userdata.get("GEMINI_API_KEY")

# Initialize LLM
model = ChatGoogleGenerativeAI(model="gemini-2.0-flash", api_key=GEMINI_API_KEY)

"""### Step 1: Extract Full Document Text"""

def extract_text_from_pdf(pdf_path):
    """
    Extracts the full text from the PDF without summarization.
    """
    doc = fitz.open(pdf_path)
    text = ""
    for page in doc:
        text += page.get_text("text") + "\n"
    return text

# Extract full text from PDF (No Summarization)
document_context = extract_text_from_pdf("/content/2412.18199v1.pdf")

"""### Step 2: Modify Query Function to Use Full Context"""

conversation_history = []  # Stores previous conversation

def chatbot_response(user_query):
    """
    Uses the entire document as context instead of retrieving chunks.
    """
    # Use last 5 messages for context memory
    chat_context = "\n".join(conversation_history[-5:])

    # Construct final query
    full_prompt = f"""
    Previous Conversation:
    {chat_context}

    Document Context:
    {document_context}  # Entire document is passed here

    User Query:
    {user_query}
    """

    # Get response from Gemini
    response = model.invoke(full_prompt)

    # Store conversation history
    conversation_history.append(f"User: {user_query}")
    conversation_history.append(f"Chatbot: {response}")

    return response

"""### Step 3: Run Chatbot Console"""

def chatbot_console():
    print("\n📄 PDF Chatbot (Context-Augmented Generation - CAG) 📄")
    print("Type 'exit' to stop the conversation.")

    while True:
        user_query = input("\nYou: ")

        if user_query.lower() == "exit":
            print("\nChatbot: Goodbye! 👋")
            break

        response = chatbot_response(user_query)
        print("\nChatbot:", response)

# Run chatbot
chatbot_console()



📄 PDF Chatbot (Context-Augmented Generation - CAG) 📄
Type 'exit' to stop the conversation.

You: who are the authors of this paper?

Chatbot: content='The authors of this paper are:\n\n*   Usman Ali\n*   Sahil Ranmbail\n*   Muhammad Nadeem\n*   Hamid Ishfaq\n*   Muhammad Umer Ramzan\n*   Waqas Ali' additional_kwargs={} response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': []} id='run-e3440842-5607-4055-adfb-7cf2da3b285c-0' usage_metadata={'input_tokens': 7122, 'output_tokens': 49, 'total_tokens': 7171, 'input_token_details': {'cache_read': 0}}

You: give me all references of this document.

Chatbot: content='```\n[1] S. Gupta, A. Gupta, S. Khanna, and S. Arora, “Digitization of handwritten text using deep learning,” in 2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence), 2022, pp. 596–600.\n[2] B. Pattanayak, T. Bibhuti, B. Dash, and S. Patra, “A novel technique for handwri