# RAG - Retrieval-Augmented Generation

### 1. Document Loading

* Use **Document Loaders** in LangChain.
* Examples: `PyPDFLoader`, `TextLoader`, `UnstructuredFileLoader`.
* Purpose: Bring external data (PDFs, text files, web pages) into LangChain as `Document` objects (text plus metadata such as the source).
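
To make the loading step concrete, here is a minimal standard-library sketch of what a text loader does. It is not LangChain's code: the `Doc` class and the `load_text` function are hypothetical stand-ins, and only the `page_content` / `metadata` field names are borrowed from LangChain's `Document`.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str, encoding: str = "utf-8") -> list[Doc]:
    """Read one text file and wrap it in a single Doc (illustrative helper)."""
    text = Path(path).read_text(encoding=encoding)
    return [Doc(page_content=text, metadata={"source": path})]


# Demo: write a small file to a temporary directory, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "notes.txt"
    sample_path.write_text("RAG combines retrieval with generation.", encoding="utf-8")
    docs = load_text(str(sample_path))

print(docs[0].page_content)
print(docs[0].metadata)
```

Keeping the source in the metadata is what lets a later step report where a retrieved chunk came from.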

---

### 2. Document Splitting

* Use **Text Splitters** like `CharacterTextSplitter` or `RecursiveCharacterTextSplitter`.
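* Purpose: Break large documents into **small chunks** (e.g., a `chunk_size` of 500–1000; these splitters measure length in characters by default, so counting tokens requires a token-based length function).
* Why: Chunks must fit the LLM's context window, and smaller chunks give more focused embeddings, so retrieval returns only the relevant passages.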
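
The idea of chunking can be sketched with a fixed-size sliding window. This is a simplified illustration, not the logic of `RecursiveCharacterTextSplitter` (which also tries to break on separators such as paragraphs and sentences). The function name `split_text` is hypothetical; the parameter names `chunk_size` and `chunk_overlap` follow LangChain's naming.

```python
def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into character chunks; consecutive chunks share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window has reached the end of the text
    return chunks


for chunk in split_text("abcdefghij", chunk_size=4, chunk_overlap=1):
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.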

---

### 3. Document Embedding

* Use **Embedding models** (e.g., `OpenAIEmbeddings`, `HuggingFaceEmbeddings`).
* Each chunk → vector (list of numbers).
* **Cosine similarity** measures how close two vectors (and therefore two chunks) are: the closer the score is to 1, the more similar the text.

  * Example: Question vector vs stored chunk vectors → find the most similar ones.
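
Cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal pure-Python sketch follows; the function name is illustrative, and returning 0.0 for a zero vector is a convention chosen here, not something the notes above specify.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 0], [1, 0]))   # same direction
print(cosine_similarity([1, 0], [0, 1]))   # orthogonal
print(cosine_similarity([1, 0], [-1, 0]))  # opposite direction
```

Because the score depends only on direction, a short chunk and a long chunk about the same topic can still score close to 1.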

---

### 4. Document Storing

* Store embeddings in a **Vector Store** (like FAISS, Chroma, Pinecone).
* Purpose: Efficient search & retrieval of similar chunks later.
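
Conceptually, a vector store keeps (vector, text) pairs and answers "which stored vectors are closest to this query vector?". The toy class below does that with a brute-force scan. It is only an illustration under that assumption: `TinyVectorStore` is a hypothetical name, and real stores such as FAISS, Chroma, or Pinecone use indexes designed for fast nearest-neighbour search over large collections.

```python
import math


def _cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """In-memory store that ranks every stored chunk against the query."""

    def __init__(self):
        self._items = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self._items.append((list(vector), text))

    def search(self, query, k=2):
        """Return the k best (score, text) pairs, highest score first."""
        scored = [(_cosine(query, vec), text) for vec, text in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = TinyVectorStore()
store.add([1.0, 0.0], "cats")
store.add([0.0, 1.0], "stocks")
store.add([0.9, 0.1], "kittens")
print(store.search([1.0, 0.0], k=2))
```

The scan costs time proportional to the number of stored chunks per query, which is the cost that dedicated vector stores are built to avoid.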

---

### 5. Retrieval + Generation (RAG)

* When a user asks a question:

  * Convert question → embedding.
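  * Retrieve the top-k most **relevant** chunks from the Vector Store (ranked by cosine similarity). If the results should also be **diverse**, use a diversity-aware search such as maximal marginal relevance (MMR), since plain similarity ranking can return near-duplicates.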
  * Send question + retrieved context to LLM via a **RetrievalQA chain**.
  * LLM generates the final answer.
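
The retrieval steps above can be sketched end to end without any external service. Everything here is an assumption made for illustration: `embed` is a toy word-count embedding standing in for a real embedding model, `retrieve` and `build_prompt` are hypothetical helpers, and the prompt wording is not the template that `RetrievalQA` uses. The LLM call itself is left out; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine the retrieved context and the question into one prompt."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


chunks = [
    "Paris is the capital of France.",
    "FAISS is a library for vector search.",
    "The Eiffel Tower is in Paris.",
]
question = "What is the capital of France?"
print(build_prompt(question, retrieve(question, chunks, k=2)))
```

In a real pipeline the question and the chunks must be embedded with the same model, otherwise their vectors are not comparable.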

---

So in LangChain terms:
**Loader → Splitter → Embeddings → Vector Store → RetrievalQA Chain**

---