# Lab 07: RAG Implementation


**AI Demystified: Decoding Models, Compute, and Connectivity**


Welcome! In this lab you will explore **Retrieval-Augmented Generation (RAG)**. It is a simple but powerful approach: documents are stored as embeddings, the **most relevant ones** are retrieved for a query, and a language model uses them as context to generate an answer.

We will:
- Encode a few knowledge base documents as embeddings
- Build a FAISS index for retrieval
- Ask a query, retrieve top documents, and use them as context
- Generate an answer with FLAN-T5

Each cell has only one statement and a brief comment.

## Step 1: Install dependencies

In [73]:
!pip -q install sentence-transformers faiss-cpu transformers torch scikit-learn  # install required libraries

## Step 2: Imports

In [74]:
from sentence_transformers import SentenceTransformer  # embedding model

In [75]:
import faiss  # vector index

In [76]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM  # flan-t5 model

## Step 3: Knowledge base

In [147]:
docs = ["The AI Demystified class is scheduled on the 12th of September 2025.",
        "The capital of India is New Delhi.",
        "The class location is Cisco office, Mumbai.",
        "The instructor is Jagdish Madhavan."]  # small KB

## Step 4: Embedding model

In [148]:
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # load sentence-transformer

## Step 5: Encode documents

In [149]:
doc_embeddings = embedder.encode(docs)  # vectorize docs

## Step 6: Build FAISS index

In [150]:
index = faiss.IndexFlatL2(doc_embeddings.shape[1])  # index with L2 distance

In [151]:
index.add(doc_embeddings)  # add doc vectors
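
For intuition, `IndexFlatL2` performs an exact, exhaustive search: the query is compared against every stored vector and the results are ranked by squared Euclidean distance. A minimal pure-Python sketch of that idea follows. The names `squared_l2` and `flat_l2_search` and the toy 2-D vectors are illustrative assumptions, not part of FAISS, and the real library is far more optimized.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, k):
    """Brute-force nearest-neighbour search (hypothetical helper).

    Returns (distances, indices) of the k closest vectors, nearest first.
    """
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]  # toy data, not real embeddings
toy_dists, toy_idxs = flat_l2_search(toy_vectors, [0.9, 0.1], k=2)
print(toy_dists, toy_idxs)  # nearest vector is listed first
```

Because every vector is scanned, this approach is exact but scales linearly with the number of documents, which is fine for a four-document knowledge base.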

## Step 7: Load FLAN-T5 model

In [152]:
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")  # tokenizer

In [153]:
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")  # seq2seq model

## Step 8: Ask a question

In [154]:
query = "When is the AI Demystified class?"  # user query

In [155]:
query_vec = embedder.encode([query])  # embed query

In [156]:
scores, idxs = index.search(query_vec, k=3)  # retrieve top 3 docs; scores are squared L2 distances (lower = closer)

In [157]:
print(scores)

[[0.4283187 1.5005453 1.520394 ]]


In [158]:
print(idxs)

[[0 2 3]]
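
The values returned by an `IndexFlatL2` search are squared L2 distances, so a smaller number means a closer match. If the embeddings are unit-length (an assumption here; it is worth checking the vector norms yourself), a squared distance `d` relates to cosine similarity by `d = 2 - 2 * cos`. The sketch below applies that conversion to the distances printed above; `l2sq_to_cosine` is a hypothetical helper name.

```python
# Distances copied from the printed search output.
printed_distances = [0.4283187, 1.5005453, 1.520394]


def l2sq_to_cosine(d):
    """Convert a squared L2 distance to cosine similarity.

    Valid only for unit-length vectors, where ||a - b||^2 = 2 - 2 * cos(a, b).
    """
    return 1.0 - d / 2.0


cosines = [round(l2sq_to_cosine(d), 3) for d in printed_distances]
print(cosines)
```

Under that assumption, the first document stands out with a similarity of roughly 0.79, while the other two sit near 0.25 and 0.24, which suggests they are only loosely related to the query.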


In [159]:
retrieved = [docs[i] for i in idxs[0]]  # fetch retrieved docs

In [160]:
print(retrieved)

['The AI Demystified class is scheduled on the 12th of September 2025.', 'The class location is Cisco office, Mumbai.', 'The instructor is Jagdish Madhavan.']


## Step 9: Build prompt

In [161]:
context = " ".join(retrieved)  # join retrieved docs

In [162]:
prompt = f"Answer the question based on the following notes: {context}. Question: {query}. After giving the fact, add one short helpful suggestion or related remark."  # build prompt
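
If you want to reuse the prompt for other queries, one option is to wrap it in a small helper. The sketch below is only an illustration: `build_prompt` is a hypothetical name, and placing one note per line is an assumption that may or may not change how FLAN-T5 responds.

```python
def build_prompt(notes, question):
    """Assemble a RAG prompt from retrieved notes and a question (hypothetical helper)."""
    # One note per line keeps document boundaries visible in the prompt.
    context = "\n".join(f"- {note.strip()}" for note in notes)
    return (
        "Answer the question based on the following notes:\n"
        f"{context}\n"
        f"Question: {question.strip()}"
    )


example_prompt = build_prompt(
    ["The class location is Cisco office, Mumbai.",
     "The instructor is Jagdish Madhavan."],
    "Where is the class held?",
)
print(example_prompt)
```

Keeping prompt construction in one place makes it easier to experiment with wording without touching the retrieval or generation cells.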

## Step 10: Generate answer

In [163]:
inputs = tokenizer(prompt, return_tensors="pt")  # tokenize prompt

In [164]:
outputs = model.generate(**inputs, max_new_tokens=100)  # run model

In [165]:
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)  # decode result

In [166]:
print(answer)  # show answer

The AI Demystified class is scheduled on the 12th of September 2025.


---
### ✅ Wrap-up
- We encoded docs into embeddings and indexed them with FAISS
- A query was embedded and matched against the docs
- Retrieved docs were combined into a context prompt
- The model generated an answer using both the query and context
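
The steps summarized above can be sketched as a single function with pluggable retrieval and generation components. This is a rough outline under stated assumptions: `rag_answer`, `fake_retrieve`, and `fake_generate` are hypothetical names, and the stand-ins exist only so the sketch runs without any models installed. In the lab, retrieval corresponds to the embedder plus the FAISS search, and generation to the FLAN-T5 tokenizer and model.

```python
def rag_answer(query, retrieve, generate, k=3):
    """Retrieve-then-generate pipeline with pluggable components (illustrative)."""
    retrieved = retrieve(query, k)   # e.g. embed the query, then search the index
    context = " ".join(retrieved)    # same joining strategy as Step 9
    prompt = (
        f"Answer the question based on the following notes: {context} "
        f"Question: {query}"
    )
    return generate(prompt)          # e.g. tokenize, generate, then decode


# Stand-in components (assumptions for demonstration only).
def fake_retrieve(query, k):
    kb = ["The class location is Cisco office, Mumbai.",
          "The instructor is Jagdish Madhavan."]
    return kb[:k]


def fake_generate(prompt):
    return f"[model output for a prompt of {len(prompt)} characters]"


result = rag_answer("Where is the class?", fake_retrieve, fake_generate, k=1)
print(result)
```

Passing the components in as arguments keeps the pipeline logic separate from any particular embedding model, index, or language model.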