# ✅ **What is RAG?**

---

**RAG** stands for **Retrieval-Augmented Generation**.  
It combines **LLMs** with **external knowledge sources** to generate more accurate and grounded responses.

**Purpose**:  
Retrieve relevant information from a knowledge base and feed it to the LLM for context-aware generation.

---

# Components of RAG

RAG typically uses **four key components**:

---

## 1. Document Loaders

- **Function**: Load data from various sources (PDFs, websites, Notion, files, etc.)
- **Purpose**: Convert raw, unstructured content into a structured list of `Document` objects used in LangChain pipelines.

- **Examples**:

- `PyPDFLoader` – Loads content from PDF files  
- `WebBaseLoader` – Loads content from websites  
- `NotionLoader` – Loads content from Notion pages  
- `TextLoader` – Loads plain `.txt` files  
- `DirectoryLoader` – Loads all documents from a directory (can combine with any loader type)  
- `CSVLoader` – Loads data from CSV files (one document per row)  
- **Custom Document Loader** – You can subclass `BaseLoader` to define your own loader for any data source (e.g., internal tools, APIs)

---

### Useful glob patterns for `DirectoryLoader`
| **Glob Pattern** | **What It Loads**                                          |
| ---------------- | ---------------------------------------------------------- |
| `"**/*.txt"`     | All `.txt` files in all folders and subfolders (recursive) |
| `"*.pdf"`        | All `.pdf` files in the root directory only                |
| `"data/*.csv"`   | All `.csv` files inside the `data/` folder (not recursive) |
| `"**/*"`         | All files of any type in all folders and subfolders        |
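
The patterns in the table follow ordinary glob semantics, so they can be checked without LangChain. Below is a minimal sketch using Python's `pathlib` on a throwaway directory tree. The file names and the `match_files` helper are made up for illustration, and it assumes the loader hands the pattern to standard glob matching:

```python
import tempfile
from pathlib import Path


def match_files(root: Path, pattern: str) -> list[str]:
    """Return sorted relative paths of files under root that match a glob pattern."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.glob(pattern)
        if p.is_file()  # "**/*" also matches directories, so keep files only
    )


def build_sample_tree(root: Path) -> None:
    """Create a small, hypothetical directory layout for the demo."""
    for rel in ["a.txt", "report.pdf", "data/x.csv", "data/sub/y.csv", "notes/b.txt"]:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("sample")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_sample_tree(root)
    txt_recursive = match_files(root, "**/*.txt")  # root and all subfolders
    pdf_root_only = match_files(root, "*.pdf")     # root directory only
    csv_in_data = match_files(root, "data/*.csv")  # data/ only, not data/sub/
    everything = match_files(root, "**/*")         # every file, any type
```

Note that `data/sub/y.csv` is not picked up by `"data/*.csv"`, which is the "not recursive" behaviour described in the table.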

---

## 2. Text Splitters
- **Function**: Split large texts into smaller, manageable chunks.
- **Purpose**: Improve chunk-level retrieval accuracy and avoid exceeding the LLM's context window.

- **Examples**:

| **Splitter Type**        | **What It Does**                      | **Examples**                                            |
| ------------------------ | ------------------------------------- | ------------------------------------------------------- |
| Length-Based             | Splits by size (tokens/chars)         | Fixed-size, overlapping, recursive character splitters  |
| Text Structure-Based     | Uses natural text layout              | Sentence-based, paragraph-based, custom delimiter split |
| Document Structure-Based | Uses formal layout structure          | Headings in Markdown, PDF sections, slide breaks        |
| Semantic Meaning-Based   | Splits where topic or meaning changes | Embedding-based splits, semantic chunkers               |
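
To make the length-based row concrete, here is a minimal sketch of a fixed-size character splitter with overlap. The function name `split_text` and its parameters are illustrative choices, not a LangChain API; library splitters add refinements such as preferring paragraph or sentence boundaries.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters so that content
    cut at a boundary still appears whole in one of the chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks


chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

With `chunk_size=4` and `chunk_overlap=1`, each chunk starts with the last character of the previous one.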

---

## 3. Vector Databases
- **Function**: Store text chunks as **embeddings** (vector representations).
- **Purpose**: Enable efficient **similarity search** based on vector distances.
- **Examples**:
  - FAISS  
  - Pinecone  
  - Chroma  
  - Weaviate  
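
The core idea behind these stores can be sketched in a few lines: keep (vector, text) pairs and rank them by cosine similarity to a query vector. The `InMemoryVectorStore` class below is a hypothetical toy with hand-written 2D vectors, not the API of any of the libraries listed. Real vector databases typically use specialised indexes rather than this linear scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy store: brute-force similarity search over a Python list."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        self._items.append((vector, text))

    def similarity_search(self, query: list[float], k: int = 2) -> list[tuple[str, float]]:
        """Return the k stored texts most similar to the query, best first."""
        scored = [(text, cosine_similarity(query, vec)) for vec, text in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add([1.0, 0.0], "cats")
store.add([0.0, 1.0], "stocks")
store.add([0.9, 0.1], "kittens")

results = store.similarity_search([1.0, 0.0], k=2)
```

Here the made-up vectors place "cats" and "kittens" close together, so both rank above "stocks" for a query pointing in the "cats" direction.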

---

## 4. Retrievers
A **Retriever** is a component in a RAG (Retrieval-Augmented Generation) system that:
- Accepts a **query** from the user.
- Searches a **data source** to find relevant documents.
- Returns these documents to be used by a language model for generating answers.
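
The three responsibilities above amount to a small interface: query in, documents out. Here is a minimal sketch with a hypothetical `Retriever` protocol and a keyword-overlap implementation. Both names are assumptions for illustration and are not LangChain classes. A vector-based retriever would expose the same shape but rank by embedding similarity instead.

```python
from typing import Protocol


class Retriever(Protocol):
    """Anything that maps a query string to a list of relevant documents."""

    def retrieve(self, query: str, k: int = 2) -> list[str]: ...


class KeywordRetriever:
    """Ranks documents by how many query words they contain."""

    def __init__(self, documents: list[str]) -> None:
        self.documents = documents

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        query_terms = set(query.lower().split())
        scored = []
        for doc in self.documents:
            overlap = len(query_terms & set(doc.lower().split()))
            if overlap:  # skip documents that share no words with the query
                scored.append((overlap, doc))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:k]]


retriever: Retriever = KeywordRetriever([
    "RAG combines retrieval and generation",
    "FAISS is a vector index",
    "Retrievers return relevant documents",
])
hits = retriever.retrieve("what is a vector index")
```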

---

## Types of Retrievers

Retrievers can be classified in two main ways:

---

### 1️⃣ Based on **Data Source**

These determine **where** the retriever looks for information.

- 🔸 **Wikipedia Retriever**  
  Uses Wikipedia articles as the source of knowledge.

- 🔸 **Vector Store Retriever**  
  Retrieves documents based on vector similarity (e.g., using document embeddings).

- 🔸 **arXiv Retriever**  
  Searches academic papers on the arXiv platform.


---

### 2️⃣ Based on **Search Strategy**

These define **how** the retriever searches and ranks information.

- 🔹 **MMR (Maximal Marginal Relevance)**  
  Balances relevance and diversity to avoid redundant results.

- 🔹 **Multi-query Retrieval**  
  Uses multiple reworded versions of a query to gather broader or more accurate results.

- 🔹 **Contextual Compression**  
  Compresses the retrieved documents before passing them to the model (e.g., by summarizing or trimming).
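
As an illustration of the first strategy, here is a minimal sketch of MMR selection. At each step it picks the candidate with the best trade-off between similarity to the query and similarity to documents already chosen. The function names, the `lambda_mult` parameter name, and the toy 2D vectors are assumptions for this example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec: list[float], doc_vecs: list[list[float]], k: int,
        lambda_mult: float = 0.5) -> list[int]:
    """Return indices of up to k documents chosen by Maximal Marginal Relevance.

    lambda_mult=1.0 ranks purely by relevance; lower values penalise
    candidates that resemble documents already selected.
    """
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = candidates[0], -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


query = [1.0, 1.0]
docs = [[1.0, 0.9], [1.0, 0.8], [0.2, 1.0]]  # docs 0 and 1 are near-duplicates

diverse = mmr(query, docs, k=2, lambda_mult=0.5)
relevance_only = mmr(query, docs, k=2, lambda_mult=1.0)
```

With `lambda_mult=1.0` the two near-duplicates are returned together. With `0.5`, the second slot goes to the less similar third document.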

---

# ✅ **load() vs lazy_load()**

- `load()`  
  - **Eager loading**  
  - Reads and processes **all documents at once**  
  - Returns a list of `Document` objects  
  - Good for small/medium datasets

- `lazy_load()`  
  - **Lazy (streamed) loading**  
  - Returns a **generator** instead of a list  
  - Memory efficient for large datasets  
  - Useful when you want to process documents one-by-one or in batches
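
The difference comes down to a generator versus a list. The sketch below uses a hypothetical `LineLoader` and a stand-in `Document` dataclass (neither is a LangChain class) to show the pattern: `lazy_load()` yields one document at a time, and `load()` simply collects everything the generator produces.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator


@dataclass
class Document:
    """Stand-in for a document object: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class LineLoader:
    """Hypothetical loader: one Document per non-empty line of a text file."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def lazy_load(self) -> Iterator[Document]:
        # Only one line is held in memory at a time.
        with self.path.open(encoding="utf-8") as handle:
            for number, line in enumerate(handle, start=1):
                if line.strip():
                    yield Document(line.strip(), {"source": str(self.path), "line": number})

    def load(self) -> list[Document]:
        # Eager variant: exhaust the generator and keep everything.
        return list(self.lazy_load())


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_text("first\n\nsecond\n", encoding="utf-8")
    loader = LineLoader(str(sample))

    docs = loader.load()        # whole file read up front
    stream = loader.lazy_load() # nothing is read yet
    first_doc = next(stream)    # reads just enough to produce one document
    stream.close()              # release the file handle early
```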

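```python
# Example
# Recent LangChain releases ship loaders in langchain_community;
# older releases expose the same class under langchain.document_loaders.
from langchain_community.document_loaders import TextLoader

loader = TextLoader("notes.txt")
docs = loader.load()            # Reads everything now and returns a list of Document objects
docs_lazy = loader.lazy_load()  # Returns a generator; documents are read as you iterate
```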

---

# ✅ **Understanding RAG (Retrieval-Augmented Generation)**

RAG combines the power of:
- **Information Retrieval**  
- **Text Generation**

> Goal: Provide **accurate**, **context-aware** responses by enhancing LLMs with external knowledge.

---

# Overview of the Process


      Query       Context
        ⬇           ⬇
         \         /
          ➡️   Prompt
                  ⬇
                 LLM
                  ⬇
               Response


---

# 4 Key Phases of RAG

---

## 1️⃣ Indexing

> *Preparation step (done before queries are made)*

- Raw documents (PDFs, articles, notes, etc.) are:
  - Split into **chunks**
  - Converted into **embeddings**
  - Stored in a **vector database** (like FAISS or Chroma)

Enables fast, efficient similarity search.

---

## 2️⃣ Retrieval

> *Happens when the user submits a query*

- The system:
  - Converts the query to an embedding
  - Retrieves **relevant document chunks** from the vector DB
- These retrieved chunks form the **Context**

- Supplies the LLM with real-world/domain-specific knowledge.

---

## 3️⃣ Augmentation

> *The core idea of RAG*

- The system merges:
  - The user **Query**
  - The **Context** from retrieval
- Together, they form a **Prompt** for the LLM

- Gives the LLM extra context for grounded answers.
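
In code, augmentation is mostly string formatting. Below is a minimal sketch. The template wording and the `build_prompt` helper are illustrative assumptions, and real prompts are usually tuned per application.

```python
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context is insufficient, say you do not know.

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(question: str, chunks: list[str]) -> str:
    """Merge the user query and the retrieved chunks into one prompt string."""
    # Number the chunks so the model (and a human reader) can tell them apart.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_prompt(
    "What does RAG stand for?",
    ["RAG stands for Retrieval-Augmented Generation."],
)
print(prompt)
```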

---

## 4️⃣ Generation

> *Final step: answer creation*

- The LLM receives the **augmented prompt**
- Generates a **context-aware, relevant response**
- Response is returned to the user

- Draws on both the LLM's own knowledge and the retrieved data.

---

# Summary 

| Step             | What Happens                                         | Diagram Block  |
|------------------|------------------------------------------------------|----------------|
| **Indexing**     | Prepare documents: chunk → embed → store             | *Implied*      |
| **Retrieval**    | Fetch top relevant chunks from vector store          | Context        |
| **Augmentation** | Combine Query + Context into LLM prompt              | Prompt         |
| **Generation**   | LLM produces final answer from the prompt            | LLM → Response |
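
The four steps in the table can be wired together in a toy, dependency-free sketch. Everything here is a stand-in: `embed` is a simple word-count vector rather than a learned embedding model, the "vector store" is a Python list, and `echo_llm` replaces a real model call by returning the first context line.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy embedding: word counts. Real systems use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def index(chunks: list[str]) -> list[tuple[Counter, str]]:
    """Indexing: embed each chunk and store it alongside its text."""
    return [(embed(chunk), chunk) for chunk in chunks]


def retrieve(store: list[tuple[Counter, str]], query: str, k: int = 1) -> list[str]:
    """Retrieval: return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]


def augment(query: str, context: list[str]) -> str:
    """Augmentation: combine query and context into one prompt."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"


def generate(prompt: str, llm: Callable[[str], str]) -> str:
    """Generation: hand the augmented prompt to the model."""
    return llm(prompt)


def echo_llm(prompt: str) -> str:
    """Stand-in for a real LLM: returns the first context line of the prompt."""
    return prompt.splitlines()[1]


store = index([
    "faiss stores vectors for similarity search",
    "text splitters cut documents into chunks",
])
question = "how are documents cut into chunks"
context = retrieve(store, question, k=1)
answer = generate(augment(question, context), echo_llm)
```

Swapping `embed` for a real embedding model and `echo_llm` for an actual model client would leave the overall flow unchanged.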

---

**RAG = Smart retrieval + powerful generation.**  
It grounds LLM answers in your own data, making them more factual and domain-specific.

---

