# CS 5588 — Week 1: Hands-On Lab
## Mini-RAG Pipeline: Embeddings → Retrieval → Grounded Generation

**Goals:**
- Generate semantic embeddings using a Transformer-based encoder
- Build a vector index for fast similarity search
- Retrieve top-k relevant document chunks
- Inject retrieved context into an LLM prompt for grounded generation

**Workflow:** GitHub → Colab → Hugging Face → Vector Store (FAISS / Chroma) → LLM

---


### GenAI Systems Context (Mini-RAG)
This lab implements a **mini Retrieval-Augmented Generation (RAG)** pipeline:
- A **Transformer encoder** produces semantic embeddings
- A **vector index (FAISS)** enables fast retrieval
- Retrieved context is what a downstream **LLM** would use for grounded generation


## Step 1 — Environment Setup
Install required libraries. This may take ~1 minute.


In [49]:
!pip install -q transformers datasets sentence-transformers faiss-cpu

## Step 2 — Load Dataset & Model from Hugging Face Hub
We load the first 200 training examples of the AG News dataset (`ag_news`) and the `all-MiniLM-L6-v2` sentence-embedding model.


In [50]:
from datasets import load_dataset
from sentence_transformers import SentenceTransformer

dataset = load_dataset("ag_news", split="train[:200]")
model = SentenceTransformer("all-MiniLM-L6-v2")

texts = dataset["text"]

print(f"Loaded {len(texts)} documents")

Loaded 200 documents


## Step 3 — Create Embeddings
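Each document is encoded as a fixed-length vector (384 dimensions for this model). Semantically similar texts map to nearby vectors, which is what makes retrieval possible before generation.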


In [51]:
embeddings = model.encode(texts, show_progress_bar=True)
print('Embedding shape:', embeddings.shape)

Embedding shape: (200, 384)
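
To make "nearby vectors" concrete, the sketch below compares toy 3-dimensional vectors using cosine similarity. The vectors and the `cosine_similarity` helper are hypothetical and hand-picked for illustration; they are not outputs of `all-MiniLM-L6-v2`.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (not real model outputs).
doc_health = [0.9, 0.1, 0.0]
doc_sports = [0.0, 0.2, 0.9]
query = [0.8, 0.2, 0.1]

# The query points in nearly the same direction as doc_health,
# so its similarity to doc_health is higher than to doc_sports.
print(cosine_similarity(query, doc_health))
print(cosine_similarity(query, doc_sports))
```

A retrieval system applies the same comparison, or a distance such as L2, between a query embedding and every document embedding, then keeps the best matches.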


## Step 4 — Build a Vector Index (FAISS)
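`IndexFlatL2` is an exact (brute-force) index that ranks stored vectors by squared Euclidean distance to the query. Here it stands in for the retrieval layer of a RAG system.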


In [52]:
import faiss
import numpy as np

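dim = embeddings.shape[1]          # embedding dimensionality (384 here)
index = faiss.IndexFlatL2(dim)     # exact search by squared L2 distance
index.add(np.asarray(embeddings, dtype="float32"))  # FAISS expects float32
print('Index size:', index.ntotal)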

Index size: 200
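
For intuition, an exact L2 index behaves like a brute-force scan: compute the squared Euclidean distance from the query to every stored vector and keep the `k` smallest. The sketch below reproduces that idea in plain NumPy on hypothetical toy vectors; `brute_force_l2_search` is an illustrative helper, not part of FAISS.

```python
import numpy as np

def brute_force_l2_search(vectors, query, k=3):
    """Return (squared distances, indices) of the k vectors nearest to query."""
    diffs = vectors - query                 # broadcast: (n, dim) - (dim,)
    sq_dists = np.sum(diffs ** 2, axis=1)   # squared L2 distance per row
    top_k = np.argsort(sq_dists)[:k]        # indices of the k smallest distances
    return sq_dists[top_k], top_k

# Hypothetical toy data standing in for real embeddings.
vectors = np.array(
    [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]], dtype="float32"
)
query = np.array([0.9, 0.1], dtype="float32")

distances, indices = brute_force_l2_search(vectors, query, k=2)
print(indices, distances)
```

A scan like this costs time proportional to the number of stored vectors, which is fine for 200 documents; larger collections typically move to approximate indexes.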


## Step 5 — Retrieval Function
Embed the query with the same model, then return the `k` documents whose embeddings are closest to it.


In [53]:
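def search(query, k=3):
    """Return the k documents whose embeddings are nearest to the query."""
    q_emb = model.encode([query])  # shape (1, dim)
    # Distances are unused here; indices[0] holds the row ids of the top-k hits.
    _, indices = index.search(np.asarray(q_emb, dtype="float32"), k)
    return [texts[int(i)] for i in indices[0]]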


## Step 6 — Try It!


In [54]:
search("artificial intelligence in healthcare")

["U.K.'s NHS taps Gartner to help plan \\$9B IT overhaul LONDON -- The U.K.'s National Health Service (NHS) has tapped IT researcher Gartner Inc. to provide market intelligence services as the health organization forges ahead with a mammoth, 5 billion (\\$9.2 billion) project to upgrade its information technology infrastructure.",
 'UK Scientists Allowed to Clone Human Embryos (Reuters) Reuters - British scientists said on Wednesday\\they had received permission to clone human embryos for medical\\research, in what they believe to be the first such license to\\be granted in Europe.',
 'Coming to The Rescue Got a unique problem? Not to worry: you can find a financial planner for every specialized need']
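
The lab stops at retrieval, but the stated goal of injecting retrieved context into an LLM prompt can be sketched as follows. The `build_prompt` helper and the prompt wording are assumptions for illustration, not part of the original notebook; the resulting string would be passed to whichever LLM is used.

```python
def build_prompt(query, chunks):
    """Assemble a grounded prompt from a question and retrieved text chunks."""
    # Number each chunk so the answer can refer back to its sources.
    context = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Hypothetical chunks; in the notebook these would come from search(query).
example_chunks = [
    "NHS taps Gartner to help plan IT overhaul.",
    "UK scientists allowed to clone human embryos.",
]
prompt = build_prompt("How is the NHS upgrading its IT?", example_chunks)
print(prompt)
```

Telling the model to rely only on the supplied context, and to say when that context is insufficient, is one common way to keep the answer grounded. This matters when some retrieved chunks are off-topic, as with the third result above.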

## Reflection
In 1–2 sentences, explain how embeddings enable retrieval before generation in GenAI systems.


<span style="color: #16ac9f"> **Embeddings represent text as dense numerical vectors that encode semantic meaning, so a system can retrieve relevant documents by vector similarity search rather than keyword matching. The retrieved passages can then be supplied to a generative model as context, which helps ground its output in the source documents.**</span>