<a href="https://colab.research.google.com/github/debojit11/ml_nlp_dl_transformers/blob/main/RAG_week_18.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📚 Week 18 – Modular RAG Systems & Improvements

---

## 🎯 Objectives

This week, you'll:
- Understand the modular breakdown of RAG
- Separate retriever and generator logic
- Use text chunking for long documents
- Check how better retrieved context changes the generated answer
- Try cosine similarity in FAISS

---

## 🧱 Overview of a Modular RAG System

A modular RAG system consists of two components:

1. **Retriever**  
   - Converts query to vector
   - Finds relevant chunks from the corpus
   - Uses dense retrieval (e.g., FAISS)

2. **Generator**  
   - Takes retrieved context
   - Answers using a generative model like T5/BART

Keeping the two components separate makes it easy to:
- Swap models easily
- Fine-tune modules independently
- Add more complex retrieval logic
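
The split above can be sketched as two plain function signatures plus a small glue object. This is a minimal illustration, not the notebook's implementation: `RagPipeline`, `keyword_retriever`, and `first_passage_generator` are hypothetical names, and the toy retriever and generator only stand in for FAISS and T5 so the sketch runs without downloading a model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type aliases: any function with these signatures can be plugged in.
Retriever = Callable[[str, int], List[str]]   # (query, k) -> top-k passages
Generator = Callable[[str, List[str]], str]   # (query, passages) -> answer


@dataclass
class RagPipeline:
    """Glue object that knows nothing about FAISS, T5, or any specific model."""
    retriever: Retriever
    generator: Generator
    k: int = 3

    def answer(self, query: str) -> str:
        passages = self.retriever(query, self.k)
        return self.generator(query, passages)


# Toy stand-ins (assumptions for illustration only).
def keyword_retriever(query: str, k: int) -> List[str]:
    docs = [
        "The Eiffel Tower stands in Paris.",
        "FAISS enables fast similarity search over dense vectors.",
    ]
    query_words = set(query.lower().replace("?", "").split())
    # Rank by the number of words shared with the query, highest first.
    ranked = sorted(docs, key=lambda d: -len(query_words & set(d.lower().split())))
    return ranked[:k]


def first_passage_generator(query: str, passages: List[str]) -> str:
    # Placeholder "generator": simply returns the top passage.
    return passages[0] if passages else ""


rag = RagPipeline(retriever=keyword_retriever, generator=first_passage_generator, k=1)
result = rag.answer("What is FAISS?")
print(result)
```

Because `RagPipeline` depends only on the two signatures, swapping in a dense retriever or a different generator means passing a different function, with no change to the glue code.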

---

## 🔧 Setup – Load Corpus and Libraries

In [None]:
!pip install faiss-cpu



In [None]:
from sentence_transformers import SentenceTransformer
from transformers import pipeline
import faiss
import numpy as np

## 📄 Define Your Corpus (Can Scale Later)

In [None]:
corpus = [
    "Transformers use self-attention for sequence modeling.",
    "The Eiffel Tower is a famous monument in Paris.",
    "Python is widely used for data science and machine learning.",
    "FAISS enables fast similarity search over dense vectors.",
    "T5 is a text-to-text transformer developed by Google.",
    "Hugging Face provides pretrained transformer models."
]

## 🔍 Retriever: Embed + Index + Search (Cosine Similarity)

In [None]:
# Load sentence-transformer model
embedder = SentenceTransformer("all-MiniLM-L6-v2")


In [None]:
# Encode corpus
corpus_embeddings = embedder.encode(corpus, convert_to_numpy=True)

In [None]:
# Normalize for cosine similarity
corpus_embeddings = corpus_embeddings / np.linalg.norm(corpus_embeddings, axis=1, keepdims=True)

In [None]:
# Create cosine FAISS index
index = faiss.IndexFlatIP(corpus_embeddings.shape[1])  # IP = inner product, which equals cosine similarity for L2-normalized vectors
index.add(corpus_embeddings)
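
Why normalizing first turns an inner-product index into a cosine index can be checked on two small vectors. The sketch below uses plain Python instead of NumPy so it runs anywhere; the vectors `a` and `b` and the helper names are made up for illustration.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]

raw_ip = dot(a, b)                                   # depends on vector length
ip_after_norm = dot(l2_normalize(a), l2_normalize(b))  # length-independent

print("raw inner product:", raw_ip)
print("cosine similarity:", round(cosine(a, b), 6))
print("inner product after normalization:", round(ip_after_norm, 6))
```

The raw inner product grows with vector length, while the normalized one matches the cosine similarity, which is why both the corpus and the query embeddings are normalized before they touch the index.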

In [None]:
# Modular retriever
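def retrieve(query, k=3):
    """Return the top-k corpus entries ranked by cosine similarity to the query."""
    query_embedding = embedder.encode([query], convert_to_numpy=True)
    # Normalize per row so the inner product equals cosine similarity
    query_embedding = query_embedding / np.linalg.norm(query_embedding, axis=1, keepdims=True)
    D, I = index.search(query_embedding, k)  # D: scores, I: corpus indices
    # FAISS pads results with -1 when k exceeds the number of indexed vectors
    return [corpus[i] for i in I[0] if i != -1]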

## 🧪 Test Retriever

In [None]:
query = "What is T5 used for?"
docs = retrieve(query)
print("Top retrieved documents:")
for doc in docs:
    print("-", doc)

Top retrieved documents:
- T5 is a text-to-text transformer developed by Google.
- Python is widely used for data science and machine learning.
- Hugging Face provides pretrained transformer models.
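
Only the first hit above actually answers the query; the other two are returned simply because `k=3`. The search call also returns similarity scores (`D`), which the retriever currently discards. One possible way to use them is a minimum-score filter, sketched below. `filter_by_score` is a hypothetical helper, and the scores in the example are made-up values for illustration, not output from the index.

```python
from typing import List, Sequence, Tuple


def filter_by_score(
    docs: Sequence[str],
    scores: Sequence[float],
    min_score: float = 0.3,
) -> List[Tuple[str, float]]:
    """Keep (doc, score) pairs with score >= min_score, preserving rank order."""
    return [(doc, float(score)) for doc, score in zip(docs, scores) if score >= min_score]


# Illustrative inputs: in the notebook, the scores would come from D[0].
example_docs = [
    "T5 is a text-to-text transformer developed by Google.",
    "Python is widely used for data science and machine learning.",
    "Hugging Face provides pretrained transformer models.",
]
example_scores = [0.62, 0.21, 0.18]  # made-up numbers, not real similarities

kept = filter_by_score(example_docs, example_scores, min_score=0.3)
for doc, score in kept:
    print(f"{score:.2f}  {doc}")
```

A sensible threshold depends on the embedding model and the corpus, so treat `0.3` as a placeholder to tune rather than a recommended value.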


## 🤖 Generator Module: Use FLAN-T5 for Generation

In [None]:
generator = pipeline("text2text-generation", model="google/flan-t5-base")



In [None]:
def generate_answer(query, context_docs):
    context = " ".join(context_docs)
    prompt = f"question: {query} context: {context}"
    output = generator(prompt, max_length=64, do_sample=False)
    return output[0]['generated_text']
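
`generate_answer` joins every retrieved passage into one prompt. Seq2seq models accept only a limited input length, so with a larger `k` or longer chunks part of the context may no longer fit. A rough way to guard against this is to cap the context before building the prompt. The sketch below is an assumption-laden illustration: `build_prompt` is a hypothetical helper, and it counts whitespace-separated words as a crude proxy for tokens rather than using the model's tokenizer.

```python
from typing import List


def build_prompt(query: str, context_docs: List[str], max_context_words: int = 200) -> str:
    """Add passages in rank order and stop before the word budget is exceeded."""
    selected: List[str] = []
    used = 0
    for doc in context_docs:
        n_words = len(doc.split())
        if used + n_words > max_context_words:
            break  # lower-ranked passages are dropped first
        selected.append(doc)
        used += n_words
    context = " ".join(selected)
    # Same prompt layout as generate_answer
    return f"question: {query} context: {context}"


docs_demo = [
    "FAISS enables fast similarity search over dense vectors.",  # 8 words
    "The Eiffel Tower is a famous monument in Paris.",           # 9 words
]
prompt = build_prompt("What is FAISS?", docs_demo, max_context_words=10)
print(prompt)
```

Since the passages arrive sorted by similarity, cutting from the end keeps the most relevant context.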

## 🧪 End-to-End RAG Demo

In [None]:
query = "What is FAISS?"
docs = retrieve(query)
answer = generate_answer(query, docs)

print("📥 Query:", query)
print("📚 Retrieved Context:")
for doc in docs:
    print("-", doc)
print("🧠 Generated Answer:", answer)

📥 Query: What is FAISS?
📚 Retrieved Context:
- FAISS enables fast similarity search over dense vectors.
- The Eiffel Tower is a famous monument in Paris.
- Python is widely used for data science and machine learning.
🧠 Generated Answer: enables fast similarity search over dense vectors


## 📦 Add Text Chunking for Long Documents (Optional)

In [None]:
from typing import List

In [None]:
def chunk_text(text: str, chunk_size: int = 20) -> List[str]:
    words = text.split()
    return [" ".join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]

In [None]:
# Example: long doc chunking
long_doc = "The Eiffel Tower is located in Paris. It is a global cultural icon of France. Constructed from iron in 1889 for the World's Fair, it stands over 300 meters tall and attracts millions of tourists every year."
chunks = chunk_text(long_doc)
print("Chunks:", chunks)

Chunks: ['The Eiffel Tower is located in Paris. It is a global cultural icon of France. Constructed from iron in 1889', "for the World's Fair, it stands over 300 meters tall and attracts millions of tourists every year."]
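
Note that the fixed-size split cuts a sentence in half: the year 1889 ends one chunk and "for the World's Fair" starts the next. A common mitigation is to let consecutive chunks overlap by a few words. The variant below is a sketch under that assumption; `chunk_text_overlap` and its `overlap` parameter are not part of the original notebook.

```python
from typing import List


def chunk_text_overlap(text: str, chunk_size: int = 20, overlap: int = 5) -> List[str]:
    """Split text into word windows of chunk_size that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


demo = "one two three four five six seven eight nine ten"
overlapping = chunk_text_overlap(demo, chunk_size=4, overlap=2)
print(overlapping)
```

With `overlap=0` the function reduces to the same non-overlapping windows as `chunk_text`. Larger overlaps preserve more context across boundaries at the cost of more chunks to embed and index.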


## 🧪 RAG with Chunked Corpus

In [None]:
# Add chunked long doc to corpus
new_chunks = chunk_text(long_doc)
corpus_extended = corpus + new_chunks

In [None]:
# Recompute embeddings and reindex
corpus_embeddings_ext = embedder.encode(corpus_extended, convert_to_numpy=True)
corpus_embeddings_ext = corpus_embeddings_ext / np.linalg.norm(corpus_embeddings_ext, axis=1, keepdims=True)

In [None]:
index_ext = faiss.IndexFlatIP(corpus_embeddings_ext.shape[1])
index_ext.add(corpus_embeddings_ext)

In [None]:
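def retrieve_ext(query, k=3):
    """Same as retrieve, but searches the extended (chunked) corpus."""
    query_embedding = embedder.encode([query], convert_to_numpy=True)
    # Normalize per row so the inner product equals cosine similarity
    query_embedding = query_embedding / np.linalg.norm(query_embedding, axis=1, keepdims=True)
    D, I = index_ext.search(query_embedding, k)  # D: scores, I: corpus indices
    # FAISS pads results with -1 when k exceeds the number of indexed vectors
    return [corpus_extended[i] for i in I[0] if i != -1]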
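
`retrieve_ext` repeats the original retriever almost line for line; only the corpus and index differ. One way to avoid the duplication is a factory that binds a retriever to whatever texts it is given. The sketch below shows the idea with stand-ins so it runs without any model: `make_retriever` and `toy_embed` are hypothetical, the keyword-count embedding replaces the sentence-transformer, and a brute-force inner product replaces the FAISS index.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def make_retriever(
    texts: List[str],
    embed: Callable[[str], Vector],
) -> Callable[..., List[str]]:
    """Embed `texts` once and return a retrieve(query, k) function bound to them."""

    def normalize(v: Vector) -> List[float]:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
        return [x / norm for x in v]

    doc_vecs = [normalize(embed(t)) for t in texts]

    def retrieve(query: str, k: int = 3) -> List[str]:
        q = normalize(embed(query))
        # Brute-force inner product over normalized vectors (stand-in for FAISS)
        scores = [sum(a * b for a, b in zip(q, d)) for d in doc_vecs]
        top = sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)[:k]
        return [texts[i] for i in top]

    return retrieve


# Toy embedding (assumption): counts of a few hand-picked keywords.
KEYWORDS = ["faiss", "eiffel", "python"]


def toy_embed(text: str) -> List[float]:
    lowered = text.lower()
    return [float(lowered.count(word)) for word in KEYWORDS]


base_texts = ["FAISS enables fast similarity search.", "Python is widely used."]
retrieve_small = make_retriever(base_texts, toy_embed)
retrieve_large = make_retriever(base_texts + ["The Eiffel Tower is in Paris."], toy_embed)

print(retrieve_small("What is FAISS?", k=1))
print(retrieve_large("Where is the Eiffel Tower?", k=1))
```

Each call to `make_retriever` produces an independent retriever, so extending the corpus means building a new one instead of copying the function.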

In [None]:
# Try again with extended corpus
query = "When was Eiffel Tower built?"
docs = retrieve_ext(query)
answer = generate_answer(query, docs)

print("🗼 Query:", query)
print("📚 Retrieved Context:")
for doc in docs:
    print("-", doc)
print("🧠 Generated Answer:", answer)

🗼 Query: When was Eiffel Tower built?
📚 Retrieved Context:
- The Eiffel Tower is a famous monument in Paris.
- The Eiffel Tower is located in Paris. It is a global cultural icon of France. Constructed from iron in 1889
- for the World's Fair, it stands over 300 meters tall and attracts millions of tourists every year.
🧠 Generated Answer: 1889


## 📝 Exercises

1. Swap the embedding model `all-MiniLM-L6-v2` for `all-mpnet-base-v2`
2. Replace the FAISS dense retriever with BM25 (e.g., via `rank_bm25`)
3. Filter the top-k chunks by a minimum similarity score
4. Chunk large PDF/text files and try multi-page RAG

---

➡️ Coming up next: **Week 19 – Advanced RAG: Hybrid Retrieval & Evaluation Metrics**