<a href="https://colab.research.google.com/github/debojit11/ml_nlp_dl_transformers/blob/main/RAG_week_17.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Week 17: Retrieval-Augmented Generation (RAG) – Part 1

# **SECTION 1: Welcome & Objectives**

In [None]:
print("Welcome to Week 17: Retrieval-Augmented Generation!")
print("This week, you'll:")
print("- Understand what RAG is and why it's important")
print("- Learn about dense vs sparse retrieval")
print("- Build a simple QA system with FAISS + Transformers")

Welcome to Week 17: Retrieval-Augmented Generation!
This week, you'll:
- Understand what RAG is and why it's important
- Learn about dense vs sparse retrieval
- Build a simple QA system with FAISS + Transformers


# **SECTION 2: What is RAG?**

# 🧠 Week 17 – Introduction to Retrieval-Augmented Generation (RAG)

---

## 🔍 What is RAG?

**Retrieval-Augmented Generation (RAG)** combines:
- **Retrieval components** (e.g., BM25 for sparse search, or an embedding model with a FAISS index for dense search)
- **Generative models** (e.g., T5, BART)

Workflow:
1. Retrieve the text chunks most relevant to the query (in this notebook, by embedding similarity)
2. Feed them as context to a text generator
3. Generate an answer grounded in the retrieved text

---

## 📊 Why Use RAG?

| Problem               | Traditional Model                    | With RAG                                    |
|-----------------------|--------------------------------------|---------------------------------------------|
| Factual hallucination | ❌ Relies only on what it memorized   | ✅ Can ground answers in external knowledge  |
| Outdated information  | ❌ Knowledge frozen at training time  | ✅ Can query up-to-date data                 |
| Long documents        | ❌ Limited by the input token window  | ✅ Retrieves only the chunks that are needed |

---

## 🧱 Components of RAG

| Component          | Role                                        |
|--------------------|---------------------------------------------|
| Retriever          | Finds relevant documents (dense or sparse)  |
| Reader / Generator | Generates answer based on retrieved docs    |
| Index              | Fast lookup structure (e.g., FAISS)         |

---


## 🔁 Types of Retrieval

| Type    | Method            | Example Libraries          |
|---------|-------------------|----------------------------|
| Sparse  | TF-IDF, BM25       | scikit-learn, Elasticsearch |
| Dense   | Embedding-based    | sentence-transformers, FAISS |
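
To make the sparse row more concrete, here is a minimal pure-Python sketch of BM25 scoring over a tiny corpus. The helper names `tokenize` and `bm25_scores`, the regex tokenizer, and the `k1`/`b` defaults are illustrative assumptions rather than any library's API; a real project would more likely rely on an existing search engine or package.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Length normalization: longer-than-average documents are penalized
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [
    "The capital of France is Paris.",
    "BERT is a transformer-based model developed by Google.",
    "T5 treats every NLP task as text-to-text.",
    "The Eiffel Tower is located in Paris.",
]
scores = bm25_scores("Where is the Eiffel Tower?", docs)
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print([docs[i] for i in ranked[:2]])
```

Because the score depends only on shared words, the Eiffel Tower sentence ranks first here, while a paraphrased query with no overlapping terms would score zero. That gap is what dense, embedding-based retrieval is meant to close.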

---

## 🧪 Mini RAG Pipeline: Dense Retrieval + T5

1. Encode docs using `SentenceTransformer`
2. Store in FAISS index
3. At query time:
   - Encode query
   - Retrieve top-k docs
   - Pass to `T5` for generation

```python
# Example: "Where is the Eiffel Tower?"
retrieved = ["The Eiffel Tower is located in Paris.", "The capital of France is Paris."]  
context = " ".join(retrieved)  
prompt = "question: Where is the Eiffel Tower? context: " + context
```
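
The prompt construction above can be wrapped in a small helper so the same format is reused for every query. This is only a sketch: `build_prompt` and its `separator` parameter are hypothetical names introduced here, and the `question: ... context: ...` layout simply mirrors the example above.

```python
def build_prompt(question, passages, separator=" "):
    """Format a question and retrieved passages as 'question: ... context: ...'."""
    # Strip stray whitespace so joined passages are separated consistently
    context = separator.join(p.strip() for p in passages)
    return f"question: {question.strip()} context: {context}"

retrieved = [
    "The Eiffel Tower is located in Paris.",
    "The capital of France is Paris.",
]
prompt = build_prompt("Where is the Eiffel Tower?", retrieved)
print(prompt)
```

Keeping the formatting in one place makes it easier to experiment later with different separators or passage orderings without touching the retrieval code.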

# **SECTION 3: Setup a Tiny Document Corpus**

In [None]:
corpus = [
    "The capital of France is Paris.",
    "BERT is a transformer-based model developed by Google.",
    "T5 treats every NLP task as text-to-text.",
    "The Eiffel Tower is located in Paris.",
]

# **SECTION 4: Use Sentence Transformers for Embeddings**

In [None]:
!pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.10.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.4 kB)
Downloading faiss_cpu-1.10.0-cp311-cp311-manylinux_2_28_x86_64.whl (30.7 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.10.0


In [None]:
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

In [None]:
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(corpus, convert_to_numpy=True)


In [None]:
# Create a flat (exact, brute-force) FAISS index that compares vectors by L2 distance
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
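
To see what a flat L2 index computes conceptually, the sketch below reproduces an exhaustive nearest-neighbor search in plain NumPy on toy 2-D vectors. The function name `flat_l2_search` and the toy data are assumptions made for illustration; this is not how FAISS is implemented internally, only the same idea of comparing the query against every stored vector and returning squared L2 distances with their indices.

```python
import numpy as np

def flat_l2_search(vectors, queries, k):
    """Brute-force k-nearest-neighbor search by squared L2 distance.

    Returns (distances, indices), each of shape (n_queries, k).
    """
    # Pairwise differences via broadcasting: (n_queries, n_vectors, dim)
    diffs = queries[:, None, :] - vectors[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Sort each row by distance and keep the k closest
    idx = np.argsort(dists, axis=1)[:, :k]
    return np.take_along_axis(dists, idx, axis=1), idx

vectors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]], dtype=np.float32)
queries = np.array([[0.9, 0.1]], dtype=np.float32)

D, I = flat_l2_search(vectors, queries, k=2)
print(I)  # indices of the two closest stored vectors
print(D)  # their squared L2 distances
```

An exhaustive search like this scales linearly with the number of stored vectors, which is fine for a four-sentence corpus but is the reason larger collections usually move to approximate indexes.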

# **SECTION 5: Retrieve Top-k Documents**

In [None]:
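def retrieve(query, k=2):
    """Return the k corpus documents whose embeddings are closest to the query."""
    query_vec = model.encode([query], convert_to_numpy=True)
    # search() returns distances and indices, each of shape (1, k); smaller distance = closer match
    _, indices = index.search(query_vec, k)
    return [corpus[i] for i in indices[0]]

print("Retrieved:", retrieve("Where is the Eiffel Tower?"))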

Retrieved: ['The Eiffel Tower is located in Paris.', 'The capital of France is Paris.']


# **SECTION 6: Generate Answer with T5**

In [None]:
from transformers import pipeline

In [None]:
t5 = pipeline("text2text-generation", model="t5-small")

Device set to use cpu


In [None]:
query = "Where is the Eiffel Tower?"
# Use the same query for retrieval and for the prompt
context = " ".join(retrieve(query))
prompt = f"question: {query} context: {context}"
print(t5(prompt, max_length=50)[0]['generated_text'])

Paris


# **SECTION 7: Exercises**

### 📝 Exercises:
1. Replace `t5-small` with `google/flan-t5-base` or `facebook/bart-large`.
2. Add more documents and test retrieval diversity.
3. Change the similarity metric used for retrieval from L2 distance to cosine similarity.
4. Try RAG for multi-question answering.
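
As a possible starting point for exercise 3, the sketch below ranks toy vectors by cosine similarity in plain NumPy. The function name `cosine_top_k` and the example vectors are assumptions for illustration. With FAISS, one common route is to L2-normalize the embeddings (for example with `faiss.normalize_L2`) and store them in an inner-product index such as `faiss.IndexFlatIP`, since the inner product of unit vectors equals their cosine similarity.

```python
import numpy as np

def cosine_top_k(doc_vecs, query_vec, k=2):
    """Return (similarities, indices) of the k documents most similar to the query."""
    # Scale every vector to unit length so the dot product equals cosine similarity
    doc_unit = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    query_unit = query_vec / np.linalg.norm(query_vec)
    sims = doc_unit @ query_unit
    order = np.argsort(-sims)[:k]  # highest similarity first
    return sims[order], order

# Toy vectors chosen so that direction and magnitude disagree
doc_vecs = np.array([[1.0, 0.0], [10.0, 10.0], [0.0, 3.0]])
query_vec = np.array([2.0, 0.1])

sims, order = cosine_top_k(doc_vecs, query_vec, k=2)
print(order, sims)

# For comparison: the ranking by squared L2 distance on the raw vectors
l2_order = np.argsort(np.sum((doc_vecs - query_vec) ** 2, axis=1))[:2]
print(l2_order)
```

In this toy data the two metrics agree on the best match but disagree on the runner-up, because cosine similarity ignores vector length while L2 distance does not.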

➡️ Coming up in Week 18: Building a Modular RAG System
We'll split the retriever and generator into separate modules and add streaming input, chunking, and accuracy improvements!