# LangChain Fundamentals: From Manual RAG to Framework


**You just built:** A complete RAG system from scratch (8 steps, 200+ lines of code)

**LangChain does:** The same thing in ~20 lines!

## Why LangChain?

### What You Built Manually:
```python
# Load PDF
reader = PdfReader(pdf_path)
pages = [page.extract_text() for page in reader.pages]

# Chunk text
chunks = chunk_text(pages, chunk_size=500)

# Create embeddings + store
collection.add(documents=chunks)

# Retrieve + Generate
retrieved = collection.query(question)
answer = llm.generate(question, retrieved)
```

### With LangChain:
```python
# Load + chunk + embed + store
loader = PyPDFLoader("file.pdf")
docs = loader.load_and_split()
vectorstore = Chroma.from_documents(docs, embeddings)

# Retrieve + generate (ONE LINE!)
qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectorstore.as_retriever())
answer = qa_chain.invoke({"query": question})["result"]
```

**Same result in roughly a tenth of the code!**

## What is LangChain?

**Framework for building LLM applications** with:
- Pre-built components (loaders, splitters, retrievers)
- Modular design (swap components easily)
- Production patterns (error handling, monitoring)
- Industry standard (used by thousands of companies)

## Your Learning Path

```
Manual RAG (You built) → LangChain (Now) → Production Apps
     ↓                        ↓                    ↓
Understanding           Speed & Scale      Real Products
```

---

## Setup & Installation

In [None]:
# Install LangChain and dependencies (uncomment if needed)
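# !pip install langchain langchain-classic langchain-openai langchain-community langchain-chroma langchain-huggingface langchain-text-splitters langchain-google-genai sentence-transformers pypdf chromadb python-dotenv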

In [1]:
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Verify it loaded (avoid printing any part of the key itself)
print(f"API Key found: {'OPENAI_API_KEY' in os.environ}")
print("✅ Environment loaded")

API Key found: True
✅ Environment loaded


---

# Part 1: Core LangChain Components

Let's learn the building blocks one by one.

## Component 1: Document Loaders

**What you did manually:** Read PDF with PyPDF2, extract text, handle pages

**LangChain way:** One line!

In [2]:
from langchain_community.document_loaders import PyPDFLoader

# Load PDF (automatically handles pages, metadata, etc.)
loader = PyPDFLoader("llm_fundamentals.pdf")
documents = loader.load()

print(f"✅ Loaded {len(documents)} pages")
print("\nSample document:")
print(f"Content preview: {documents[0].page_content[:200]}...")
print(f"Metadata: {documents[0].metadata}")

✅ Loaded 8 pages

Sample document:
Content preview: @genieincodebottle 
Instagram | GitHub | Medium | YouTube 
How to Be Better Than Most in GenAI 
 
Contents 
 
Core LLM Building Blocks ....................................................................
Metadata: {'producer': 'Microsoft® Word 2019', 'creator': 'Microsoft® Word 2019', 'creationdate': '2025-09-02T20:12:32+05:30', 'author': 'Rajesh Srivastava', 'moddate': '2025-09-02T20:12:32+05:30', 'source': 'llm_fundamentals.pdf', 'total_pages': 8, 'page': 0, 'page_label': '1'}


**What just happened?**

- ✅ Loaded PDF
- ✅ Extracted text from all pages
- ✅ Created `Document` objects with metadata
- ✅ All in 2 lines!

**Document object:** Standard format LangChain uses everywhere
```python
Document(
    page_content="text here",
    metadata={"source": "file.pdf", "page": 1}
)
```
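
To make that shape concrete, here is a minimal stand-in built from a plain dataclass. It is only an illustration, not LangChain's actual `Document` class: `SimpleDocument` and `pages_to_documents` are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def pages_to_documents(pages: list[str], source: str) -> list[SimpleDocument]:
    """Wrap each page's text in a record that remembers where it came from."""
    return [
        # Page numbers are 0-based here, matching the 'page' key in the loader output.
        SimpleDocument(page_content=text, metadata={"source": source, "page": i})
        for i, text in enumerate(pages)
    ]


docs = pages_to_documents(["Intro text", "Attention text"], source="file.pdf")
print(docs[1].metadata)
```

Because every downstream step (splitting, embedding, retrieval) passes this metadata along, a retrieved chunk can still report which file and page it came from.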

### Other Loaders (LangChain has 100+ loaders!)

In [3]:
# Examples (don't run, just see the pattern)
from langchain_community.document_loaders import (
    TextLoader,           # .txt files
    CSVLoader,            # .csv files
    UnstructuredMarkdownLoader,  # .md files
    WebBaseLoader,        # Web pages
    DirectoryLoader,      # Entire folders
    NotionDBLoader,       # Notion databases
    SlackDirectoryLoader, # Slack messages
)

print("LangChain supports 100+ data sources out of the box!")

USER_AGENT environment variable not set, consider setting it to identify your requests.


LangChain supports 100+ data sources out of the box!


---

## Component 2: Text Splitters

**What you did manually:** Custom `chunk_text()` function with overlap logic

**LangChain way:** Pre-built splitters with best practices!

In [5]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Create splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,        # Max characters per chunk
    chunk_overlap=50,      # Overlap between chunks
    length_function=len,   # How to measure length
    separators=["\n\n", "\n", " ", ""]  # Try these in order
)

# Split documents
chunks = text_splitter.split_documents(documents)

print(f"✅ Split {len(documents)} pages into {len(chunks)} chunks")
print("\nSample chunk:")
print(chunks[5].page_content)
print(f"\nMetadata: {chunks[5].metadata}")

✅ Split 8 pages into 37 chunks

Sample chunk:
5. Attention → Highlights the most relevant tokens in context 
6. Self-Attention → Each token attends to every other token for context 
7. Cross-Attention → Connect encoder and decoder (in encoder-decoder models) 
8. Multi-Head Attention → Several attention heads capture different patterns in parallel 
9. Feed-Forward Networks → Nonlinear layers that transform representations between 
attention blocks 
10. Residual Connections → Shortcut links that preserve signals and help gradient flow

Metadata: {'producer': 'Microsoft® Word 2019', 'creator': 'Microsoft® Word 2019', 'creationdate': '2025-09-02T20:12:32+05:30', 'author': 'Rajesh Srivastava', 'moddate': '2025-09-02T20:12:32+05:30', 'source': 'llm_fundamentals.pdf', 'total_pages': 8, 'page': 1, 'page_label': '2'}


**Why RecursiveCharacterTextSplitter?**

It tries separators in order:
1. First try paragraph breaks (`\n\n`)
2. Then line breaks (`\n`)
3. Then word breaks (` `)
4. Finally characters if needed

**Result:** Smart, context-preserving chunks!
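
The "try separators in order" idea can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not LangChain's implementation: `recursive_split` is a hypothetical helper, it ignores `chunk_overlap`, and it drops the separator at chunk boundaries.

```python
def recursive_split(text: str, chunk_size: int, separators: list[str]) -> list[str]:
    """Split text into pieces of at most chunk_size characters (no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []

    sep = separators[0] if separators else ""
    rest = separators[1:]
    if sep == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep merging neighbours
            continue
        if current:
            chunks.append(current)  # flush what has been merged so far
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too big: retry it with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


text = "Para one is short.\n\nPara two is a bit longer than the limit allows.\n\nEnd."
chunks = recursive_split(text, chunk_size=30, separators=["\n\n", "\n", " ", ""])
for chunk in chunks:
    print(repr(chunk))
```

Short paragraphs survive intact, and only the oversized paragraph falls back to word-level splitting, which is why this strategy tends to keep related sentences together.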

### Other Splitters

In [6]:
from langchain_text_splitters import (
    CharacterTextSplitter,     # Simple character-based
    TokenTextSplitter,         # Token-based (for LLMs)
    MarkdownTextSplitter,      # Markdown-aware
    PythonCodeTextSplitter,    # Code-aware
)

print("Different splitters for different content types!")

Different splitters for different content types!


---

## Component 3: Embeddings

**What you did manually:** SentenceTransformer model, manual encoding

**LangChain way:** Unified interface for any embedding model!

In [7]:
from langchain_huggingface import HuggingFaceEmbeddings

# Same model you used before!
embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2"
)

# Test it
test_text = "LangChain makes RAG development easy"
embedding_vector = embeddings.embed_query(test_text)

print("✅ Embeddings model loaded")
print(f"Embedding dimensions: {len(embedding_vector)}")
print(f"First 5 values: {embedding_vector[:5]}")

✅ Embeddings model loaded
Embedding dimensions: 384
First 5 values: [-0.06979136914014816, 0.04351036995649338, 0.07203606516122818, -0.0365731455385685, -0.08926713466644287]


### Swap Models Easily!

In [8]:
from langchain_openai import OpenAIEmbeddings
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Just change one line to switch providers!
# embeddings = OpenAIEmbeddings()  # Use OpenAI instead
# embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")  # Or Google

print("Same interface, different providers - that's the power of LangChain!")

Same interface, different providers - that's the power of LangChain!
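
Why does swapping take only one line? Because calling code depends on a shared interface rather than on a specific provider. Here is a minimal sketch of that idea with toy classes; `EmbeddingModel`, `LengthEmbedder`, and `VowelEmbedder` are hypothetical names, and the "embeddings" are deliberately silly so the example runs without any model download.

```python
from typing import Protocol


class EmbeddingModel(Protocol):
    """Hypothetical interface: anything with embed_query can be plugged in."""

    def embed_query(self, text: str) -> list[float]: ...


class LengthEmbedder:
    """Toy provider A: encodes character count and word count."""

    def embed_query(self, text: str) -> list[float]:
        return [float(len(text)), float(len(text.split()))]


class VowelEmbedder:
    """Toy provider B: encodes vowel count and consonant count."""

    def embed_query(self, text: str) -> list[float]:
        vowels = sum(ch in "aeiou" for ch in text.lower())
        letters = sum(ch.isalpha() for ch in text)
        return [float(vowels), float(letters - vowels)]


def embedding_size(model: EmbeddingModel, text: str) -> int:
    """Caller code relies only on the interface, never on the provider."""
    return len(model.embed_query(text))


embedder: EmbeddingModel = LengthEmbedder()  # swap this single line ...
# embedder = VowelEmbedder()                 # ... to change providers
print(embedding_size(embedder, "LangChain makes RAG development easy"))
```

One caveat that holds for real models too: vectors from different providers are not comparable, so switching embedding models means re-embedding the stored documents.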


---

## Component 4: Vector Stores

**What you did manually:** ChromaDB client, collections, manual add/query

**LangChain way:** Unified interface for any vector store!

In [9]:
from langchain_chroma import Chroma

# Create vector store from documents (one line!)
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_name="langchain_llm_fundamentals"
)

print(f"✅ Vector store created with {vectorstore._collection.count()} chunks")

✅ Vector store created with 37 chunks


**What just happened?**

1. ✅ Created embeddings for all chunks
2. ✅ Stored in ChromaDB
3. ✅ Built similarity search index
4. ✅ All automatic!

**Your manual version:** ~30 lines of code

**LangChain:** 1 line!
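
If you want a reminder of what that one line hides, here is a toy in-memory version of "embed, store, search". It is a sketch under loud assumptions: `ToyVectorStore` and `toy_embed` are hypothetical, the "embedding" is a word count over a four-word vocabulary, and the search is brute force, whereas real vector stores typically use an approximate nearest-neighbor index.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


VOCAB = ["lora", "attention", "quantization", "tokens"]  # toy vocabulary (assumption)


def toy_embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: counts vocabulary words."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


class ToyVectorStore:
    """Hypothetical in-memory store: embeds on add, ranks by cosine on search."""

    def __init__(self, embed):
        self.embed = embed
        self.rows: list[tuple[list[float], str]] = []

    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self.rows.append((self.embed(text), text))

    def similarity_search(self, query: str, k: int = 3) -> list[str]:
        query_vec = self.embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(query_vec, row[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = ToyVectorStore(toy_embed)
store.add_texts([
    "lora adapters for fine-tuning",
    "attention weighs tokens",
    "quantization shrinks weights",
])
print(store.similarity_search("how does attention use tokens", k=1))
```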

### Test Similarity Search

In [10]:
# Search for relevant chunks
query = "What is RAG?"
results = vectorstore.similarity_search(query, k=3)

print(f"Query: {query}\n")
for i, doc in enumerate(results, 1):
    print(f"Result {i}:")
    print(f"{doc.page_content[:150]}...")
    print(f"Source: Page {doc.metadata['page']}\n")

Query: What is RAG?

Result 1:
17. ALiBi / Relative Positional Encoding → Alternative to RoPE for long contexts 
18. Linear / Performer Attention → Efficient attention variants for ...
Source: Page 1

Result 2:
evaluation 
4. Human Evaluation → Collect human judgments for accuracy, coherence, and safety 
5. Factuality / Truthfulness Metrics → Specialized eval...
Source: Page 5

Result 3:
@genieincodebottle 
Instagram | GitHub | Medium | YouTube 
How to Be Better Than Most in GenAI 
 
Contents 
 
Core LLM Building Blocks ..................
Source: Page 0



### Swap Vector Stores Easily!

In [None]:
# from langchain_pinecone import PineconeVectorStore
# from langchain_community.vectorstores import FAISS, Weaviate

# Same pattern, just change the import!
# vectorstore = FAISS.from_documents(chunks, embeddings)
# vectorstore = PineconeVectorStore.from_documents(chunks, embeddings, index_name="my-index")  # "my-index" is a placeholder

print("Change one line to switch vector databases - modularity!")

---

## Component 5: LLMs

**What you did manually:** OpenAI client, manual API calls, prompt formatting

**LangChain way:** Unified interface for all LLMs!

In [11]:
from langchain_openai import ChatOpenAI
import os

# Initialize LLM
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.3,
    api_key=os.environ["OPENAI_API_KEY"]
)

# Test it
response = llm.invoke("Explain RAG in one sentence")

print("✅ LLM initialized")
print(f"Response: {response.content}")

✅ LLM initialized
Response: RAG, or Retrieval-Augmented Generation, is a machine learning approach that combines retrieval of relevant information from a knowledge base with generative models to produce more accurate and contextually relevant responses in natural language processing tasks.


### Swap LLMs Easily!

In [None]:
# from langchain_google_genai import ChatGoogleGenerativeAI
# from langchain_anthropic import ChatAnthropic

# Just change one line!
# llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
# llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")

print("Same interface for OpenAI, Google, Claude, and 50+ other providers!")

---

# Part 2: Building RAG with LangChain

Now let's combine everything into a complete RAG system!

## Method 1: Using RetrievalQA Chain

**The simplest way** - LangChain handles everything!

In [13]:
from langchain_classic.chains.retrieval_qa.base import RetrievalQA
# Create QA chain (combines retriever + LLM)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # "stuff" = put all context in one prompt
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True
)

print("✅ RAG chain created!")

✅ RAG chain created!


In [14]:
# Ask a question
question = "What is LoRA and why is it useful?"
result = qa_chain.invoke({"query": question})

print(f"Question: {question}\n")
print(f"Answer:\n{result['result']}\n")
print("="*80)
print(f"Sources ({len(result['source_documents'])} chunks):")
for i, doc in enumerate(result['source_documents'], 1):
    print(f"\n{i}. Page {doc.metadata['page']}:")
    print(f"   {doc.page_content[:100]}...")

Question: What is LoRA and why is it useful?

Answer:
LoRA (Low-Rank Adaptation) is a method that involves using parameter-efficient adapters for fine-tuning large models. It allows for the adaptation of models to specific tasks or domains without the need to retrain the entire model, making it a cost-effective and efficient approach. LoRA is particularly useful because it enables fine-tuning on modest hardware, which is beneficial for users who may not have access to extensive computational resources.

Sources (3 chunks):

1. Page 2:
   9. QLoRA → LoRA + quantization, enabling fine-tuning of huge models on modest hardware 
10. PEFT → F...

2. Page 2:
   3. Sharded / Distributed Training → Scale across multiple GPUs/nodes 
4. Continual / Lifelong Learni...

3. Page 0:
   @genieincodebottle 
Instagram | GitHub | Medium | YouTube 
How to Be Better Than Most in GenAI 
 
Co...


**Compare this to your manual code:**

```python
# Your manual RAG (simplified)
retrieved = collection.query(question)
context = "\n".join([doc for doc, _ in retrieved])
prompt = f"Context: {context}\n\nQuestion: {question}\n\nAnswer:"
response = openai_client.chat.completions.create(...)
answer = response.choices[0].message.content

# LangChain
answer = qa_chain.invoke({"query": question})["result"]
```

**Same result, way cleaner!** ✨
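
What does `chain_type="stuff"` actually do with the retrieved chunks? Roughly this: it pastes all of them into one prompt. The sketch below is an illustration only; `build_stuff_prompt` is a hypothetical helper and the prompt wording is an assumption, not the text LangChain ships with.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate every retrieved chunk into a single prompt ("stuff" strategy)."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\nAnswer:"
    )


retrieved = [
    "LoRA adds small trainable adapters.",
    "QLoRA combines LoRA with quantization.",
]
prompt = build_stuff_prompt("What is LoRA?", retrieved)
print(prompt)
```

Since everything lands in one prompt, the chunks plus the question must fit in the model's context window. That is one reason to keep `k` small.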

## Method 2: Using LCEL (LangChain Expression Language)

**More control** - Build your own chain!

In [15]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Custom prompt template
template = """You are an AI assistant helping users understand LLM fundamentals.
Answer the question based ONLY on the provided context. Cite page numbers when possible.

Context:
{context}

Question: {question}

Answer:"""

prompt = ChatPromptTemplate.from_template(template)

# Helper function to format retrieved docs
def format_docs(docs):
    return "\n\n".join([
        f"[Page {doc.metadata['page']}]\n{doc.page_content}"
        for doc in docs
    ])

# Build the chain using LCEL
rag_chain = (
    {"context": vectorstore.as_retriever() | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print("✅ Custom RAG chain created with LCEL!")

✅ Custom RAG chain created with LCEL!


In [None]:
# Use the custom chain
question = "What is attention mechanism?"
answer = rag_chain.invoke(question)

print(f"Question: {question}\n")
print(f"Answer:\n{answer}")

Question: What is attention mechanism?

Answer:
The attention mechanism highlights the most relevant tokens in context, allowing the model to focus on specific parts of the input when generating outputs. It enables each token to attend to every other token for context (self-attention) and connects the encoder and decoder in encoder-decoder models (cross-attention). Additionally, multi-head attention captures different patterns in parallel by using several attention heads. This mechanism is crucial for improving the model's ability to understand and generate language effectively. (Page 1)


**LCEL (|) Explained:**

```python
# The pipe (|) chains operations
chain = step1 | step2 | step3 | step4

# Same as:
result = step1(input)
result = step2(result)
result = step3(result)
output = step4(result)
```

**Our chain:**
```
question → retriever → format → prompt → llm → parse → answer
```

Clean, readable, composable! 🎯
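
Curious how `|` can chain objects at all? Python lets a class define `__or__`, so the pipe syntax is ordinary operator overloading. Here is a tiny sketch of the idea; `Step` is a hypothetical class for illustration, far simpler than LangChain's real runnables (which, for example, also accept plain functions and dicts as steps).

```python
class Step:
    """Hypothetical miniature of a runnable: wraps a function, supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-ins for the real retriever, formatter, and prompt template.
retrieve = Step(lambda question: [f"chunk about {question}"])
format_docs = Step(lambda docs: "\n".join(docs))
make_prompt = Step(lambda context: f"Context: {context}\nAnswer:")

chain = retrieve | format_docs | make_prompt
print(chain.invoke("attention"))
```

Each `|` returns another `Step`, so a finished chain can itself be piped into something else. That is what "composable" means here.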

---

# Part 3: Advanced RAG Features

LangChain makes advanced techniques easy!

## Feature 1: Conversational RAG (with Memory)

**Remember previous questions!**

In [18]:
from langchain_classic.chains import ConversationalRetrievalChain
from langchain_classic.memory import ConversationBufferMemory

# Create memory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key="answer"
)

# Create conversational chain
conversational_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    memory=memory,
    return_source_documents=True
)

print("✅ Conversational RAG chain created with memory!")

✅ Conversational RAG chain created with memory!



In [19]:
# First question
result1 = conversational_chain.invoke({"question": "What is LoRA?"})
print("Q1: What is LoRA?")
print(f"A1: {result1['answer']}\n")
print("="*80 + "\n")

# Follow-up question (uses context from previous!)
result2 = conversational_chain.invoke({"question": "What's the difference between that and QLoRA?"})
print("Q2: What's the difference between that and QLoRA?")
print(f"A2: {result2['answer']}")

print("\n" + "="*80)
print("Notice: The model knew 'that' = LoRA from previous question!")

Q1: What is LoRA?
A1: LoRA (Low-Rank Adaptation) is a method used in machine learning that allows for the fine-tuning of large models by updating only a small number of parameters. This approach helps to make the fine-tuning process more efficient, particularly when working with large models on limited hardware.


Q2: What's the difference between that and QLoRA?
A2: LoRA (Low-Rank Adaptation) is a method that allows for the efficient fine-tuning of large models by updating only a small number of parameters. QLoRA (Quantized LoRA) builds on this by incorporating quantization, which enables the fine-tuning of huge models on more modest hardware. Essentially, QLoRA combines the principles of LoRA with quantization techniques to make the process more resource-efficient.

Notice: The model knew 'that' = LoRA from previous question!


**Without memory:**
- "What's the difference between that and QLoRA?" → Doesn't know what "that" is

**With memory:**
- Remembers "that" = LoRA from previous question!

**Perfect for chatbots!** 💬
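
How does the chain resolve "that"? The usual approach is to store each turn, then ask the LLM to rewrite the follow-up as a standalone question before retrieving. The sketch below shows only the bookkeeping and prompt-building half of that idea. `BufferMemory` and `condense_prompt` are hypothetical names, and the prompt wording is an assumption rather than LangChain's built-in template.

```python
class BufferMemory:
    """Hypothetical minimal memory: keeps every (question, answer) turn."""

    def __init__(self):
        self.turns: list[tuple[str, str]] = []

    def save(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_text(self) -> str:
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.turns)


def condense_prompt(memory: BufferMemory, follow_up: str) -> str:
    """Build a prompt asking an LLM to rewrite a follow-up as a standalone question."""
    return (
        "Given the conversation below, rephrase the follow-up question "
        "so it can be understood on its own.\n\n"
        f"Chat history:\n{memory.as_text()}\n\n"
        f"Follow-up: {follow_up}\nStandalone question:"
    )


memory = BufferMemory()
memory.save("What is LoRA?", "LoRA is a parameter-efficient fine-tuning method.")
prompt = condense_prompt(memory, "What's the difference between that and QLoRA?")
print(prompt)
```

The rewritten question (something like "What is the difference between LoRA and QLoRA?") is what gets sent to the retriever, so the vector search sees the full topic instead of the word "that".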

## Feature 2: Multiple Retrieval Strategies

In [20]:
# 1. Similarity Search (default)
retriever_similarity = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})

# 2. MMR (Maximal Marginal Relevance) - diverse results
retriever_mmr = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 3, "fetch_k": 10})

# 3. Similarity with score threshold
retriever_threshold = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.5}
)

# Test MMR (gets diverse results)
docs_mmr = retriever_mmr.invoke("What is fine-tuning?")
print("✅ MMR Retrieval (diverse results):")
for i, doc in enumerate(docs_mmr, 1):
    print(f"{i}. Page {doc.metadata['page']}: {doc.page_content[:80]}...")

✅ MMR Retrieval (diverse results):
1. Page 2: 3. Sharded / Distributed Training → Scale across multiple GPUs/nodes 
4. Continu...
2. Page 3: 3. Top-k / Top-p → Sampling filters, Higher = safer, looser = more diverse 
4. R...
3. Page 0: Training & Tuning .................................................................


**Retrieval Strategies:**

| Strategy | Best For |
|----------|----------|
| **similarity** | Most relevant results |
| **mmr** | Diverse results (avoid duplicates) |
| **similarity_score_threshold** | Only high-confidence matches |

**Swap with one parameter!** 🔄
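
To see why MMR returns more varied results than plain similarity, here is a small pure-Python sketch of the selection loop. Treat it as an illustration, not the library's code: `mmr` is a hypothetical function, the 2-D vectors are made up, and `lambda_mult` (the relevance-versus-diversity weight) is set to 0.5 as an assumption.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5) -> list[int]:
    """Pick k document indices, trading relevance against redundancy."""
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))

    def score(i: int) -> float:
        relevance = cosine(query_vec, doc_vecs[i])
        # Penalise similarity to anything already selected.
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [
    [1.0, 0.10],   # 0: very relevant
    [1.0, 0.12],   # 1: near-duplicate of 0
    [0.8, -0.60],  # 2: less relevant, but different
]

by_similarity = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)[:2]
print("similarity:", by_similarity)
print("mmr:       ", mmr(query, docs, k=2))
```

Plain similarity keeps both near-duplicates, while MMR swaps the second one for the less similar document. Raising `lambda_mult` toward 1 makes MMR behave more like plain similarity.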

## Feature 3: Document Metadata Filtering

In [21]:
# Search only specific pages
docs_filtered = vectorstore.similarity_search(
    "What is attention?",
    k=3,
    filter={"page": 2}  # Only search page 2
)

print("Results from page 2 only:")
for doc in docs_filtered:
    print(f"Page {doc.metadata['page']}: {doc.page_content[:100]}...\n")

Results from page 2 only:
Page 2: 3. Sharded / Distributed Training → Scale across multiple GPUs/nodes 
4. Continual / Lifelong Learni...

Page 2: 9. QLoRA → LoRA + quantization, enabling fine-tuning of huge models on modest hardware 
10. PEFT → F...

Page 2: 15. Distillation → Transfer knowledge from a large model into a smaller one 
16. Gradient Descent & ...



**Use cases:**
- Search only recent documents (filter by date)
- Search specific sections (filter by chapter)
- User-specific data (filter by user_id)

**Production essential!** 🎯
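
Conceptually, a metadata filter narrows the candidate set before ranking happens. Here is a minimal sketch of that "filter, then rank" order using plain dicts. Everything in it is hypothetical (`matches`, `filtered_search`, the sample chunks, the `user_id` values), and ranking by shared words stands in for real vector similarity.

```python
def matches(metadata: dict, where: dict) -> bool:
    """True when every filter key is present in the metadata with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in where.items())


def filtered_search(chunks: list[dict], query: str, where: dict, k: int = 3) -> list[dict]:
    """Drop chunks that fail the filter, then rank the rest by shared query words."""
    query_words = set(query.lower().split())

    def overlap(chunk: dict) -> int:
        return len(query_words & set(chunk["text"].lower().split()))

    allowed = [chunk for chunk in chunks if matches(chunk["metadata"], where)]
    return sorted(allowed, key=overlap, reverse=True)[:k]


chunks = [
    {"text": "attention highlights relevant tokens", "metadata": {"page": 1, "user_id": "alice"}},
    {"text": "lora enables cheap fine-tuning", "metadata": {"page": 2, "user_id": "alice"}},
    {"text": "quantization reduces precision", "metadata": {"page": 2, "user_id": "bob"}},
]

hits = filtered_search(chunks, "what is attention", where={"page": 2, "user_id": "alice"})
print([hit["text"] for hit in hits])
```

Notice that the best match for "attention" is excluded because it sits on page 1. The same thing happened in the output above: a filter restricts where the search looks, and it does not make the remaining chunks more relevant.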

---

# Part 4: Complete LangChain RAG Class

Let's build a production-ready RAG system using LangChain!

In [22]:
from typing import List, Dict

class LangChainRAG:
    """
    Production RAG system powered by LangChain.
    
    Why LangChain?
        - Roughly 75% less code than the manual implementation
        - Easy to swap components (LLMs, vector stores, embeddings)
        - Built-in features (memory, streaming, error handling)
        - Production-tested by thousands of companies
    """
    
    def __init__(
        self,
        pdf_path: str,
        llm_model: str = "gpt-4o-mini",
        embedding_model: str = "all-MiniLM-L6-v2",
        chunk_size: int = 500,
        chunk_overlap: int = 50
    ):
        """
        Initialize RAG system from a PDF.
        
        Args:
            pdf_path: Path to PDF file
            llm_model: OpenAI model name
            embedding_model: HuggingFace embedding model
            chunk_size: Characters per chunk
            chunk_overlap: Overlap between chunks
        """
        print("Initializing LangChain RAG system...")
        
        # Load documents
        print(f"Loading PDF: {pdf_path}")
        loader = PyPDFLoader(pdf_path)
        documents = loader.load()
        
        # Split documents
        print(f"Splitting into chunks (size={chunk_size}, overlap={chunk_overlap})")
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size,
            chunk_overlap=chunk_overlap
        )
        self.chunks = text_splitter.split_documents(documents)
        
        # Setup embeddings
        print(f"Loading embedding model: {embedding_model}")
        self.embeddings = HuggingFaceEmbeddings(model_name=embedding_model)
        
        # Create vector store
        print("Creating vector store...")
        self.vectorstore = Chroma.from_documents(
            documents=self.chunks,
            embedding=self.embeddings,
            collection_name="langchain_rag"
        )
        
        # Setup LLM
        print(f"Initializing LLM: {llm_model}")
        self.llm = ChatOpenAI(
            model=llm_model,
            temperature=0.3,
            api_key=os.environ["OPENAI_API_KEY"]
        )
        
        # Create QA chain
        self.qa_chain = RetrievalQA.from_chain_type(
            llm=self.llm,
            chain_type="stuff",
            retriever=self.vectorstore.as_retriever(search_kwargs={"k": 3}),
            return_source_documents=True
        )
        
        print(f"✅ RAG system ready! ({len(self.chunks)} chunks indexed)\n")
    
    def ask(self, question: str) -> Dict:
        """
        Ask a question and get an answer with sources.
        
        Args:
            question: User's question
            
        Returns:
            Dictionary with answer and source information
        """
        result = self.qa_chain.invoke({"query": question})
        
        return {
            "question": question,
            "answer": result['result'],
            "sources": [
                {
                    "page": doc.metadata.get('page', 'N/A'),
                    "text": doc.page_content[:150] + "..."
                }
                for doc in result['source_documents']
            ]
        }
    
    def ask_multiple(self, questions: List[str]) -> List[Dict]:
        """
        Ask multiple questions at once.
        
        Args:
            questions: List of questions
            
        Returns:
            List of results
        """
        return [self.ask(q) for q in questions]

print("✅ LangChainRAG class defined")

✅ LangChainRAG class defined


## Use the LangChain RAG System

In [23]:
# Initialize (does everything automatically!)
rag = LangChainRAG(
    pdf_path="llm_fundamentals.pdf",
    llm_model="gpt-4o-mini"
)

Initializing LangChain RAG system...
Loading PDF: llm_fundamentals.pdf
Splitting into chunks (size=500, overlap=50)
Loading embedding model: all-MiniLM-L6-v2
Creating vector store...
Initializing LLM: gpt-4o-mini
✅ RAG system ready! (37 chunks indexed)



In [24]:
# Ask questions
result = rag.ask("What are the main components of transformer architecture?")

print(f"Question: {result['question']}\n")
print(f"Answer:\n{result['answer']}\n")
print("="*80)
print(f"Sources ({len(result['sources'])} chunks):")
for i, source in enumerate(result['sources'], 1):
    print(f"\n{i}. Page {source['page']}:")
    print(f"   {source['text']}")

Question: What are the main components of transformer architecture?

Answer:
The main components of transformer architecture include:

1. **Encoder and Decoder**: The architecture consists of an encoder that processes the input and a decoder that generates the output. Some models use only the encoder (like BERT), while others use only the decoder (like GPT) or both (like T5).

2. **Self-Attention Mechanism**: This allows the model to weigh the importance of different words in a sentence when encoding or decoding.

3. **Feedforward Neural Networks**: Each layer of the encoder and decoder contains a feedforward neural network that processes the output of the self-attention mechanism.

4. **Layer Normalization**: This is applied to stabilize and speed up the training process.

5. **Positional Encoding**: Since transformers do not have a built-in sense of order, positional encoding is added to the input embeddings to provide information about the position of tokens in the sequence.

6. **M

In [25]:
# Multiple questions
questions = [
    "What is RLHF?",
    "Explain quantization",
    "What are vector databases?"
]

results = rag.ask_multiple(questions)

for result in results:
    print(f"\n{'='*80}")
    print(f"Q: {result['question']}")
    print(f"A: {result['answer']}")
    print(f"Sources: Pages {[s['page'] for s in result['sources']]}")


Q: What is RLHF?
A: RLHF stands for Reinforcement Learning from Human Feedback. It is a method used to align model outputs with human preferences by incorporating feedback from humans into the training process. This approach helps ensure that the model's responses are more in line with what users expect or prefer.
Sources: Pages [3, 2, 7]

Q: Explain quantization
A: Quantization is a technique used in machine learning and neural networks to reduce the precision of the numbers used to represent model parameters and activations. This process involves converting floating-point numbers (which typically use 32 bits) into lower-bit representations, such as 16-bit or 8-bit integers. The main goals of quantization are to decrease the model size, reduce memory bandwidth requirements, and improve inference speed, especially on hardware with limited computational resources.

There are different types of quantization, including:

1. **Post-training quantization**: This is applied after the model 

---

# Part 5: Comparison - Manual vs LangChain

## Code Comparison

### Your Manual RAG:
```python
# ~200+ lines of code

# Load PDF
reader = PdfReader(pdf_path)
pages = [...]

# Chunk
def chunk_text(...):
    # 20 lines of logic
chunks = chunk_text(pages)

# Embeddings + Store
model = SentenceTransformer(...)
client = chromadb.Client()
collection = client.create_collection(...)
collection.add(...)

# Retrieve + Generate
def retrieve(...):
    # 15 lines
def generate(...):
    # 25 lines

class SimpleRAG:
    # 100+ lines
```

### LangChain RAG:
```python
# ~50 lines of code

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI
from langchain_classic.chains import RetrievalQA

# Load + Split
loader = PyPDFLoader("file.pdf")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500)
chunks = splitter.split_documents(docs)

# Embed + Store
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(chunks, embeddings)

# LLM + Chain
llm = ChatOpenAI(model="gpt-4o-mini")
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever()
)

# Use it
result = qa_chain.invoke({"query": "question"})
```

**75% less code!** 🎉

## Feature Comparison

| Feature | Manual | LangChain |
|---------|--------|------------|
| **Lines of code** | 200+ | ~50 |
| **PDF loading** | Manual | ✅ Built-in |
| **Text splitting** | Custom logic | ✅ Pre-built splitters |
| **Embeddings** | Manual encode | ✅ Unified interface |
| **Vector store** | Manual setup | ✅ One-liner |
| **Retrieval** | Custom similarity | ✅ Multiple strategies |
| **Generation** | Manual prompts | ✅ Chains |
| **Memory** | None | ✅ Built-in |
| **Swap components** | Rewrite code | ✅ Change one line |
| **Error handling** | Manual | ✅ Built-in |
| **Streaming** | Manual | ✅ Built-in |
| **Monitoring** | Manual | ✅ LangSmith |

## When to Use Each?

### Use Manual RAG when:
- ✅ Learning fundamentals (you did this!)
- ✅ Need complete control
- ✅ Very simple, specific use case

### Use LangChain when:
- ✅ **Production applications**
- ✅ **Need to iterate fast**
- ✅ **Want to swap components easily**
- ✅ **Team collaboration**
- ✅ **Most real-world projects**

**90% of the time → Use LangChain!** 🚀

---

# Summary: Why LangChain?

## What You Learned

### Core Components:
1. ✅ **Document Loaders** - PDF, CSV, Web, 100+ sources
2. ✅ **Text Splitters** - Smart chunking strategies
3. ✅ **Embeddings** - Unified interface for any model
4. ✅ **Vector Stores** - ChromaDB, FAISS, Pinecone, etc.
5. ✅ **LLMs** - OpenAI, Google, Claude, 50+ providers
6. ✅ **Chains** - RetrievalQA, ConversationalRetrievalChain
7. ✅ **Memory** - Conversation history
8. ✅ **LCEL** - Composable chains with `|`

### Key Benefits:

✅ **75% less code** - Focus on logic, not boilerplate  
✅ **Modular** - Swap any component easily  
✅ **Production-tested** - Used by thousands of companies  
✅ **Well-documented** - Great community support  
✅ **Fast iteration** - Try ideas quickly  
✅ **Built-in features** - Memory, streaming, monitoring  

## Your Learning Journey

```
✅ Day 1: Embeddings (manual)
✅ Day 2: LLM APIs (manual)
✅ Day 3: Basic RAG (manual)
✅ Day 4: Production RAG (manual)
✅ Day 5: LangChain (framework) ← You are here!
```

**You now know:**
1. ✅ How RAG works under the hood (manual implementation)
2. ✅ How to build production RAG fast (LangChain)

**This is powerful!** Most people only know #2. You know both! 💪

## Next Steps

1. ✅ **Practice** - Build a LangChain RAG with your own PDFs
2. 🔜 **Advanced RAG** - Multi-query, parent-child chunks, hybrid search
3. 🔜 **LangChain Agents** - LLMs that use tools and make decisions
4. 🔜 **LangSmith** - Monitor and debug LangChain apps
5. 🔜 **Deploy** - Build a web UI (Streamlit) or API (FastAPI)

## Practice Exercise

**Challenge:** Build a multi-document RAG system

1. Load 3 different PDFs
2. Store them in the same vector store
3. Add metadata to track which PDF each chunk came from
4. Allow users to filter by source document

**Hint:** Use `DirectoryLoader` and metadata filtering!

---

## You're Now a LangChain Developer! 🎉

You can:
- ✅ Build RAG systems in minutes (not hours)
- ✅ Swap LLMs, embeddings, vector stores easily
- ✅ Add memory and conversation history
- ✅ Use production-ready patterns
- ✅ Understand what's happening under the hood

**That last point is crucial** - because you built RAG manually first, you're not just using LangChain blindly. You understand every component! 🧠

Keep building! 🚀