### RAG - FUNDAMENTALS



## ⏳ The Case of the Outdated Manual

Imagine an employee named **Alex**. Alex works for a major technology company that has thousands of internal documents (manuals, policies, and quarterly reports) spread across ancient shared drives.

Alex is building a critical report for a client and needs to know the warranty period for a specific product line, the **Alpha 7** model.

* **Alex turns to the company's AI Chatbot.** This chatbot was trained years ago on a massive, general dataset.
* **Query:** "What is the warranty period for the Alpha 7?"
* **The AI's Reply (without RAG):** "The standard warranty is 18 months, according to the general hardware terms I was trained on."

Alex feels confident... until the real world hits. The Alpha 7 was part of a special program, launched *last month*, that **extended the warranty to 36 months**. The AI sounded certain, but it was confidently **wrong**. It was stuck in the past.

***

**(Slide 2: Transition Slide - The Problem and the Solution)**

This challenge, the confident, well-spoken, yet factually outdated AI, is the problem we face when we rely on Large Language Models (LLMs) alone.

**The Solution?** We don't need to retrain the entire AI every time a new document is published. We need to give it a modern **library card**.

We need **Retrieval-Augmented Generation (RAG).**

### RAG: The Real-Time Librarian

RAG gives the AI a mechanism to look up the most current and relevant information (like that updated 36-month warranty policy) from our private, live knowledge base **before** it generates the answer.

***

**(Slide 3: Real-World Usefulness of RAG)**

### 🌎 Beyond the Manual: Where RAG is a Must-Have

RAG is how companies are making AI safe, accurate, and truly useful today.

| Domain | The Challenge Solved by RAG | Example of Grounded Answer |
| :---: | :---: | :---: |
| **Customer Support** | Getting real-time answers about new products, returns, and FAQs. | *Instead of a generic answer,* a chatbot answers with the **exact steps** from the **latest** refund policy document. |
| **Finance & Legal** | Keeping analysts updated on changing regulations and reports. | *Instead of summarizing a general concept,* a system summarizes the **new compliance ruling** from the **Q4 2024** regulatory filing. |
| **Healthcare** | Grounding diagnoses in the very latest medical research. | *Instead of a general description,* an AI provides a diagnosis supported by **citations** from a medical journal published **yesterday**. |

**In short: RAG ensures our LLMs stop relying on outdated memory and start grounding their intelligence in verifiable, up-to-the-minute facts.**

---

### 🧠 LLMs: The RAG Advantage

| Without RAG | With RAG |
| :---: | :---: |
| **THE ISOLATED STUDENT** | **THE INFORMED EXPERT** |
| ❌ **Relying Solely on Memory** | ✅ **Accessing the "Open Textbook"** |
| Knowledge is **Outdated** or **Fuzzy** | Knowledge is **Current** and **Fact-Checked** |
| **Result:** Potential for **Hallucination** | **Result:** Enhanced **Accuracy** and **Trust** |

---


#### RAG ARCHITECTURE
<div style="background-color: white; text-align: center;">
<img src="../../assests/02-RAG_Systems/RAG_ARCH.png" alt="Plot 1" width="700" style="background-color: white;">
</div>

# 📚 Understanding RAG: The Architecture
**Retrieval-Augmented Generation (RAG)** is a technique that gives Large Language Models (LLMs) access to private or up-to-date data that wasn't in their training set.

Think of it like an **Open Book Exam**:
* **The LLM:** The student (smart, but doesn't know specific internal details).
* **The Vector DB:** The textbook.
* **RAG:** The process of looking up the relevant chapter before answering the question.

### The Workflow at a Glance
Referencing the architecture diagram above, we will break the process into three distinct stages:
1.  **Ingestion:** Preparing the "textbook" (Yellow/Green top path).
2.  **Retrieval:** Finding the right "page" (Blue middle box).
3.  **Generation:** Writing the answer (Blue/Grey bottom box).

## 🏗️ Phase 1: Ingestion (Preprocessing)
*Refer to the top branch of the diagram.*

Before we can search our data, we must prepare it. LLMs have a "Context Window" limit, meaning we cannot feed them an entire book at once. We must break it down.

### Key Steps:
1.  **Documents:** We load raw text (PDFs, TXT, HTML).
2.  **Chunking Algorithm (🔴 Red Box):** We split the text into smaller, manageable pieces (e.g., 500 words).
    * *Why?* If chunks are too small, we lose context. If too large, we retrieve irrelevant noise.
3.  **Embedding Model:** This converts text into **Vectors** (lists of numbers).
    * *Concept:* $$\text{"Concrete"} \rightarrow [0.1, 0.5, 0.9]$$
4.  **Vector DB:** We store these vectors, alongside the chunk text they came from, for fast similarity search.

> **Note:** The quality of your RAG system is heavily dependent on your **Chunking Strategy**.
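
To make the chunking step concrete, here is a minimal sketch of a word-based splitter. The function name `chunk_words` and the `overlap` parameter are illustrative assumptions, not part of any particular library; the 500-word default mirrors the example size given in the steps above. Real pipelines often split on sentences or tokens instead of whitespace-separated words.

```python
def chunk_words(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words, so text near a chunk
    boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Usage: a made-up 1200-word "document"
sample = " ".join(f"word{i}" for i in range(1200))
chunks = chunk_words(sample)
print(len(chunks), "chunks")
```

Shrinking `chunk_size` gives more precise but less self-contained chunks; growing it does the opposite. That is the trade-off described in the *Why?* bullet above.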

## 🔍 Phase 2: Retrieval
*Refer to the middle "Retrieval" box in the diagram.*

When a user asks a question, we don't send it straight to the LLM yet. We first need to find relevant information.

### The "Semantic Search" Process:
1.  **User Question:** The user inputs a query.
2.  **Input Embedding:** We use the **same** embedding model from Phase 1 to turn the question into a vector.
3.  **Retrieval Algorithm (🔴 Red Box):** We compare the *Question Vector* against our database of *Document Vectors*.
    * We use math (typically **Cosine Similarity**) to find the closest matches.
    * $$\text{similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \|B\|}$$
4.  **Top-N Relevant Content:** The system returns the top 3-5 chunks that are mathematically closest in meaning to the question.
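
The comparison in step 3 can be sketched in plain Python. The three-dimensional vectors and chunk names below are made up for illustration (real embeddings have hundreds or thousands of dimensions), and `top_n` is a hypothetical helper that scans every stored vector, whereas a real vector DB uses an index to avoid a full scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return cos(theta) = (A . B) / (||A|| * ||B||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def top_n(question_vector: list[float],
          document_vectors: dict[str, list[float]],
          n: int = 3) -> list[tuple[str, float]]:
    """Score every chunk against the question and keep the n best."""
    scored = [
        (chunk_id, cosine_similarity(question_vector, vector))
        for chunk_id, vector in document_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy data: hand-written vectors standing in for real embeddings
document_vectors = {
    "warranty_policy": [0.9, 0.1, 0.0],
    "refund_policy": [0.2, 0.8, 0.1],
    "office_hours": [0.0, 0.1, 0.9],
}
question_vector = [1.0, 0.2, 0.0]

results = top_n(question_vector, document_vectors, n=2)
for chunk_id, score in results:
    print(f"{chunk_id}: {score:.3f}")
```

With these toy vectors, `warranty_policy` ranks first because it points in almost the same direction as the question vector. Cosine similarity ignores vector length and compares direction only.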

## 🧠 Phase 3: Augmentation & Generation
*Refer to the bottom "Augmentation" and "Generation" boxes.*

This is where the magic happens. We combine the user's intent with the retrieved data.

### 1. Augmentation (Prompt Construction)
We don't just paste the chunks. We wrap them in a **Prompt Template** to guide the LLM.
A typical structure looks like this:

```text
You are a helpful assistant. Answer the user question based ONLY on the context provided below.

Context:
{Top_N_Relevant_Content}

Question:
{User_Question}
```

## 💻 Pseudo-Code for a Simple RAG System
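```python
# Pseudo-code: load_pdfs, split_text, store_embeddings, embed, and llm are
# placeholders for whichever loader, splitter, vector DB, embedding model,
# and LLM client you choose.

# 1. Ingestion: load, chunk, embed, and index the documents
documents = load_pdfs("data/")
chunks = split_text(documents, chunk_size=500)
vector_db = store_embeddings(chunks)  # embeds each chunk, then stores it

# 2. Retrieval: embed the question with the SAME model, fetch the closest chunks
user_question = "What is the warranty period?"
question_vector = embed(user_question)
relevant_chunks = vector_db.similarity_search(question_vector, k=3)

# 3. Augmentation & Generation: wrap the chunks in a prompt, then call the LLM
context = "\n\n".join(relevant_chunks)  # assumes each chunk is a plain string
prompt = f"""
Answer the user question based ONLY on the context provided below.

Context:
{context}

Question:
{user_question}
"""
response = llm.generate(prompt)

print(response)
```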