# 🎓 Week 15 - Day 1: Retrieval-Augmented Generation (RAG)

## Today's Goals:
✅ Understand RAG concepts and architecture

✅ Learn about embeddings and vector stores

✅ Build a complete RAG system with LangChain

✅ Query your own documents using natural language

---

## 🔧 Part 1: Setup - Install & Import All Libraries

**IMPORTANT:** Run ALL cells in this part sequentially!

In [1]:
# STEP 1: Install required packages
print("📦 Installing packages... (this may take 1-2 minutes)\n")

# The imports in STEP 2 (for example `langchain.chains.RetrievalQA`) target the
# pre-1.0 LangChain API. Pinning the core packages below 1.0 is an assumption
# made here so that pip resolves one mutually compatible set; mixing
# langchain 0.1.x with langchain-core 1.x fails at import time with
# "No module named 'langchain_core.memory'".
!pip install -q "langchain>=0.3,<1.0" "langchain-core>=0.3,<1.0" "langchain-community>=0.3,<1.0" "langchain-text-splitters<1.0"
!pip install -q "langchain-core>=0.3,<1.0" langchain-huggingface langchain-groq
!pip install -q faiss-cpu sentence-transformers python-dotenv chromadb tiktoken

print("\n✅ Installation step finished. Check the output above for pip errors.")

In [2]:
# STEP 2: Import ALL libraries
import os
import warnings
warnings.filterwarnings('ignore')

# LangChain Core
from langchain_text_splitters import RecursiveCharacterTextSplitter, CharacterTextSplitter
from langchain_core.documents import Document

# Document Loaders
from langchain_community.document_loaders import TextLoader, DirectoryLoader

# Embeddings
from langchain_huggingface import HuggingFaceEmbeddings

# Vector Stores
from langchain_community.vectorstores import FAISS, Chroma

# LLM
from langchain_groq import ChatGroq

# Chains
from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate

print("✅ All libraries imported successfully!")

In [None]:
# STEP 3: Set up API Key for Groq (Free LLM API)
# Get your free API key from: https://console.groq.com/

# Option 1: Set directly (for learning only; don't do this in production!)
GROQ_API_KEY = "your-groq-api-key-here"  # Replace with your actual key

# Option 2 (recommended): set GROQ_API_KEY in your environment before starting
# the notebook. setdefault() keeps an existing value instead of overwriting it.
os.environ.setdefault("GROQ_API_KEY", GROQ_API_KEY)

print("✅ API key configured!")
print("\n💡 Get a FREE Groq API key at: https://console.groq.com/")
print("🚀 Ready to build RAG!")

---

## 📝 Part 2: Understanding the RAG Pipeline

Before we code, let's understand what we're building!

### 🎯 What is RAG?

**RAG = Retrieval-Augmented Generation**

It's a technique that grounds an LLM's answers in your own documents by:
1. **Retrieving** relevant information from your documents
2. **Augmenting** the LLM's prompt with that information
3. **Generating** an answer grounded in that retrieved context

### 🔄 The RAG Pipeline:

```
📄 Documents → ✂️ Chunks → 🔢 Embeddings → 💾 Vector Store
                                                    ↓
💬 Query → 🔢 Embed Query → 🔍 Search → 📥 Get Top-K → 🤖 LLM → ✅ Answer
```
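
To make the diagram concrete, here is a minimal, dependency-free sketch of the same flow. It is not how LangChain implements anything: word-count vectors stand in for real embeddings, a plain Python list stands in for the vector store, and the LLM step is left as a prompt string. The names `embed`, `retrieve`, and `build_prompt` are hypothetical and only used for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (an assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, store, k=2):
    """Return the k chunks whose vectors are most similar to the query vector."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query, context_chunks):
    """Augment the question with the retrieved context."""
    context = "\n\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"

chunks = [
    "All employees receive 20 days of paid annual leave per year.",
    "Employees can work remotely up to 3 days per week.",
    "To reset your password, go to portal.techcorp.com/reset.",
]
store = [(chunk, embed(chunk)) for chunk in chunks]  # the "vector store"

question = "How many days of annual leave do I get?"
top = retrieve(question, store, k=1)
prompt = build_prompt(question, top)
print(prompt)  # this string is what would be sent to the LLM
```

The rest of the notebook replaces each toy piece with a real component: a sentence-embedding model, FAISS, and a hosted LLM.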

---

## 📄 Part 3: Creating Sample Documents

Let's create some sample documents to work with. In a real scenario, you'd load PDFs, Word docs, or text files.

In [None]:
# Create sample documents about a fictional company
# In real projects, you'd load actual files!

sample_documents = [
    Document(
        page_content="""
        TechCorp Employee Handbook - Chapter 1: Company Overview
        
        TechCorp was founded in 2015 by Sarah Chen and Michael Rodriguez. 
        Our headquarters is located in San Francisco, California. 
        We have over 500 employees across 3 offices: San Francisco, New York, and London.
        
        Our mission is to make AI accessible to everyone through innovative products.
        Our core values are: Innovation, Integrity, Inclusivity, and Impact.
        """,
        metadata={"source": "employee_handbook.pdf", "chapter": 1}
    ),
    Document(
        page_content="""
        TechCorp Employee Handbook - Chapter 2: Leave Policy
        
        Annual Leave: All employees receive 20 days of paid annual leave per year.
        Sick Leave: Employees can take up to 10 days of paid sick leave annually.
        Parental Leave: New parents receive 16 weeks of paid parental leave.
        
        To request leave, submit a request through the HR portal at least 2 weeks in advance.
        Emergency leave can be requested by emailing hr@techcorp.com.
        """,
        metadata={"source": "employee_handbook.pdf", "chapter": 2}
    ),
    Document(
        page_content="""
        TechCorp Employee Handbook - Chapter 3: Remote Work Policy
        
        TechCorp supports hybrid work arrangements. Employees can work remotely 
        up to 3 days per week. Core hours are 10 AM to 4 PM in your local timezone.
        
        To set up remote work:
        1. Get approval from your manager
        2. Ensure you have reliable internet (minimum 50 Mbps)
        3. Set up your home office following our ergonomics guide
        4. Install the company VPN for secure access
        
        Remote employees must attend in-person meetings when required.
        """,
        metadata={"source": "employee_handbook.pdf", "chapter": 3}
    ),
    Document(
        page_content="""
        TechCorp IT Support Guide - Password Reset
        
        To reset your password:
        1. Go to portal.techcorp.com/reset
        2. Enter your employee ID and registered email
        3. Click 'Send Reset Link'
        4. Check your email for the reset link (valid for 24 hours)
        5. Create a new password following our security requirements:
           - Minimum 12 characters
           - At least one uppercase letter
           - At least one number
           - At least one special character
        
        If you can't access your email, contact IT support at it-help@techcorp.com
        or call the helpdesk at extension 5555.
        """,
        metadata={"source": "it_guide.pdf", "topic": "password"}
    ),
    Document(
        page_content="""
        TechCorp Benefits Summary - Health Insurance
        
        All full-time employees are eligible for comprehensive health insurance.
        
        Plans offered:
        - Basic Plan: $0 monthly premium, $2000 deductible
        - Standard Plan: $50 monthly premium, $1000 deductible
        - Premium Plan: $150 monthly premium, $500 deductible
        
        Dental and vision coverage is included in all plans.
        Family coverage is available at additional cost.
        
        Enrollment period is January 1-31 each year.
        New employees can enroll within 30 days of their start date.
        """,
        metadata={"source": "benefits_guide.pdf", "topic": "health"}
    )
]

print(f"📚 Created {len(sample_documents)} sample documents!")
print("\n📄 Documents created:")
for i, doc in enumerate(sample_documents, 1):
    print(f"   {i}. {doc.metadata.get('source', 'unknown')} - {doc.page_content[:50]}...")

---

## ✂️ Part 4: Text Splitting (Chunking)

Documents need to be split into smaller chunks for effective retrieval.

### Why Chunking?
- LLMs have context limits
- Smaller chunks = more precise retrieval
- Better for finding specific information
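
As a rough illustration of the idea, the sketch below cuts a string into fixed-size character windows that share a few characters with their neighbours. It is a simplified stand-in, not the splitter used later in this notebook, and `chunk_text` is a hypothetical helper name.

```python
def chunk_text(text, chunk_size=80, chunk_overlap=20):
    """Split text into character windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks

text = (
    "Annual Leave: All employees receive 20 days of paid annual leave per year. "
    "Sick Leave: Employees can take up to 10 days of paid sick leave annually."
)
chunks = chunk_text(text, chunk_size=80, chunk_overlap=20)
for i, chunk in enumerate(chunks, 1):
    print(i, repr(chunk))
```

Notice that a fixed window can cut through the middle of a word or sentence, which is the weakness that separator-aware splitters try to avoid.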

In [None]:
# Create a text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,        # Maximum characters per chunk
    chunk_overlap=50,      # Overlap between chunks (preserves context)
    length_function=len,
    separators=["\n\n", "\n", " ", ""]  # Split priorities
)

# Split the documents
chunks = text_splitter.split_documents(sample_documents)

print(f"✂️ Split {len(sample_documents)} documents into {len(chunks)} chunks!")
print("\n📊 Chunk Statistics:")
print(f"   Average chunk size: {sum(len(c.page_content) for c in chunks) // len(chunks)} characters")
print(f"   Smallest chunk: {min(len(c.page_content) for c in chunks)} characters")
print(f"   Largest chunk: {max(len(c.page_content) for c in chunks)} characters")

In [None]:
# Let's examine a few chunks
print("📋 Sample Chunks:\n")
print("=" * 60)

for i, chunk in enumerate(chunks[:3], 1):
    print(f"\n🔹 Chunk {i}:")
    print(f"   Source: {chunk.metadata.get('source', 'unknown')}")
    print(f"   Length: {len(chunk.page_content)} chars")
    print(f"   Content: {chunk.page_content[:150]}...")
    print("-" * 60)

### 💡 Key Insights:

✅ **RecursiveCharacterTextSplitter** tries the separators in order, so paragraphs and lines stay together whenever they fit in a chunk

✅ **Chunk overlap** reduces the chance that context is lost at chunk boundaries

✅ **Metadata is preserved** - we can track which document each chunk came from
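
The recursive idea can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not LangChain's actual implementation: it ignores overlap and metadata, and `recursive_split` is a hypothetical name. It splits on the coarsest separator first, falls back to finer separators only for pieces that are still too long, then greedily merges neighbours that fit together.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split on coarse separators first; use finer ones only for oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]

    pieces = []
    for part in text.split(sep):
        if len(part) > chunk_size:
            pieces.extend(recursive_split(part, chunk_size, rest))
        elif part:
            pieces.append(part)

    # Greedily merge neighbouring pieces while the result still fits.
    merged, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                merged.append(current)
            current = piece
    if current:
        merged.append(current)
    return merged

sample = (
    "Para one is short.\n\n"
    "Para two is a bit longer and will not fit in one chunk.\n\n"
    "End."
)
result = recursive_split(sample, chunk_size=32)
for chunk in result:
    print(repr(chunk))
```

In this sample the short first paragraph stays whole, the long second paragraph is broken at a word boundary, and its tail is merged with the final paragraph because the two fit within the limit.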

---

## 🔢 Part 5: Creating Embeddings

Embeddings convert text into numerical vectors that capture semantic meaning.

Similar meanings = Similar vectors!

In [None]:
# Create an embedding model
# We use a free, open-source model from HuggingFace
print("⏳ Loading embedding model... (first time may take 1-2 minutes)\n")

embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2",  # Small, fast, and effective!
    model_kwargs={'device': 'cpu'},
    encode_kwargs={'normalize_embeddings': True}
)

print("✅ Embedding model loaded successfully!")

In [None]:
# Let's see what an embedding looks like!
sample_text = "How do I reset my password?"
sample_embedding = embeddings.embed_query(sample_text)

print(f"📝 Text: '{sample_text}'")
print(f"\n🔢 Embedding (first 10 values):")
print(f"   {sample_embedding[:10]}")
print(f"\n📊 Embedding dimensions: {len(sample_embedding)}")
print(f"\n💡 This {len(sample_embedding)}-dimensional vector captures the 'meaning' of the text!")

In [None]:
# Demonstrate semantic similarity
import numpy as np

def cosine_similarity(vec1, vec2):
    """Calculate cosine similarity between two vectors"""
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Three sentences to compare
sentences = [
    "How do I reset my password?",
    "I forgot my login credentials",  # Similar meaning!
    "What is the weather today?"       # Different meaning!
]

# Get embeddings for all
embeddings_list = [embeddings.embed_query(s) for s in sentences]

print("🔍 Semantic Similarity Demo:\n")
print(f"Sentence 1: '{sentences[0]}'")
print(f"Sentence 2: '{sentences[1]}'")
print(f"Sentence 3: '{sentences[2]}'")

sim_1_2 = cosine_similarity(embeddings_list[0], embeddings_list[1])
sim_1_3 = cosine_similarity(embeddings_list[0], embeddings_list[2])

print(f"\n📊 Similarity Scores:")
print(f"   Sentence 1 ↔ Sentence 2: {sim_1_2:.4f} (Similar meaning!)")
print(f"   Sentence 1 ↔ Sentence 3: {sim_1_3:.4f} (Different meaning!)")
print("\n✅ Higher score = More similar meaning!")

### 💡 Key Insights:

✅ **Embeddings capture meaning**, not just keywords

✅ **"Password" and "credentials"** are recognized as similar concepts

✅ **Cosine similarity** measures how aligned two vectors are (from -1 for opposite directions to 1 for the same direction; text embeddings usually score between 0 and 1)
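
To see the full range of cosine similarity, here is a small sketch with hand-made 2-D vectors instead of real embeddings. It uses only the standard library, and the vectors are made up for illustration.

```python
import math

def cosine_similarity(vec1, vec2):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    return dot / (norm1 * norm2)

same = cosine_similarity([1.0, 2.0], [2.0, 4.0])        # same direction, different length
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # unrelated directions
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])  # opposite direction

print(f"same direction:     {same:.2f}")
print(f"orthogonal:         {orthogonal:.2f}")
print(f"opposite direction: {opposite:.2f}")
```

Only the direction matters: scaling a vector up or down leaves its cosine similarity unchanged.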

---

## 💾 Part 6: Creating a Vector Store

A vector store is a specialized database for storing and searching embeddings.

We'll use **FAISS** - Facebook's fast similarity search library!
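
Conceptually, the simplest vector store keeps each vector next to its metadata and compares the query against every stored vector. The sketch below shows that exhaustive search with squared Euclidean distance. `TinyVectorStore` is a hypothetical class for illustration only; real libraries such as FAISS add optimized and approximate index types on top of this idea.

```python
import heapq

class TinyVectorStore:
    """Hypothetical in-memory store: exhaustive search by squared L2 distance."""

    def __init__(self):
        self._rows = []  # list of (vector, metadata) pairs

    def add(self, vector, metadata):
        self._rows.append((list(vector), dict(metadata)))

    def search(self, query, k=3):
        """Return the k nearest rows as (metadata, distance), closest first."""
        scored = (
            (sum((q - v) ** 2 for q, v in zip(query, vector)), i)
            for i, (vector, _) in enumerate(self._rows)
        )
        return [(self._rows[i][1], dist) for dist, i in heapq.nsmallest(k, scored)]

store = TinyVectorStore()
store.add([1.0, 0.0, 0.0], {"source": "employee_handbook.pdf", "chapter": 2})
store.add([0.0, 1.0, 0.0], {"source": "it_guide.pdf"})
store.add([0.0, 0.0, 1.0], {"source": "benefits_guide.pdf"})

hits = store.search([0.9, 0.1, 0.0], k=2)
for metadata, distance in hits:
    print(metadata["source"], round(distance, 3))
```

A smaller distance means a closer match, so results come back in ascending order of distance.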

In [None]:
# Create a FAISS vector store from our chunks
print("⏳ Creating vector store...\n")

# This automatically:
# 1. Converts all chunks to embeddings
# 2. Stores them in FAISS index
# 3. Keeps track of metadata

vectorstore = FAISS.from_documents(
    documents=chunks,
    embedding=embeddings
)

print("✅ Vector store created successfully!")
print(f"📊 Indexed {len(chunks)} chunks")

In [None]:
# Let's test the retrieval!
query = "How many days of annual leave do I get?"

print(f"🔍 Query: '{query}'\n")
print("=" * 60)

# Search for similar documents
results = vectorstore.similarity_search_with_score(query, k=3)

print(f"\n📥 Top 3 Retrieved Chunks:\n")
for i, (doc, score) in enumerate(results, 1):
    # FAISS returns a distance (lower = closer), not a similarity. Assuming the default
    # squared-L2 index and unit-length embeddings, cosine similarity = 1 - distance / 2.
    print(f"\n🔹 Result {i} (Similarity: {1 - score / 2:.4f}):")
    print(f"   Source: {doc.metadata.get('source', 'unknown')}")
    print(f"   Content: {doc.page_content[:200]}...")
    print("-" * 60)

### 💡 Key Insights:

✅ **FAISS.from_documents()** handles embedding creation automatically

✅ **similarity_search_with_score()** returns both documents AND a distance score (lower = more similar)

✅ The most relevant chunks about "leave policy" are retrieved!
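
A quick numeric check helps when reading those scores. For unit-length vectors, squared Euclidean distance and cosine similarity are tied together by `distance = 2 - 2 * cosine`. The sketch below verifies that identity with made-up 2-D vectors; it assumes the index reports squared L2 distance and that embeddings are normalized (as `normalize_embeddings=True` requests in this notebook).

```python
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

distance = squared_l2(a, b)
cosine_from_distance = 1 - distance / 2  # rearranged from distance = 2 - 2 * cosine
cosine_direct = dot(a, b)                # for unit vectors, cosine equals the dot product

print(f"squared L2 distance:      {distance:.4f}")
print(f"cosine via the distance:  {cosine_from_distance:.4f}")
print(f"cosine computed directly: {cosine_direct:.4f}")
```

If the vectors are not normalized, or the index uses a different metric, this conversion does not apply.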

---

## 🤖 Part 7: Building the Complete RAG Chain

Now let's connect everything: Vector Store + LLM = RAG!

We'll use **Groq** for fast, free LLM inference.

In [None]:
# Initialize the LLM
llm = ChatGroq(
    model="llama-3.1-8b-instant",  # Fast and capable!
    temperature=0,  # Deterministic outputs
    max_tokens=500
)

print("✅ LLM initialized!")
print("📊 Model: Llama 3.1 8B (via Groq)")

In [None]:
# Create a custom prompt template for RAG
prompt_template = """
You are a helpful assistant for TechCorp employees. 
Answer the question based ONLY on the following context.
If you don't know the answer based on the context, say "I don't have that information in my knowledge base."

Context:
{context}

Question: {question}

Answer:"""

PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"]
)

print("✅ Prompt template created!")

In [None]:
# Create the retriever from our vector store
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}  # Return top 3 most similar chunks
)

print("✅ Retriever created!")
print("📊 Will retrieve top 3 most relevant chunks for each query")

In [None]:
# Build the RetrievalQA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # "stuff" = put all retrieved docs into prompt
    retriever=retriever,
    return_source_documents=True,  # Also return the source chunks!
    chain_type_kwargs={"prompt": PROMPT}
)

print("✅ RAG Chain built successfully!")
print("\n🎉 Your RAG system is ready to answer questions!")
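
To demystify the "stuff" strategy, the sketch below shows roughly what it amounts to: join every retrieved chunk into one context string and fill the prompt template. This is an illustration with plain dictionaries, not the chain's real internals; `stuff_prompt` and the `"\n\n"` separator are assumptions.

```python
def stuff_prompt(template, docs, question, separator="\n\n"):
    """Concatenate every retrieved chunk into one context string and fill the template."""
    context = separator.join(doc["page_content"] for doc in docs)
    return template.format(context=context, question=question)

template = (
    "Answer the question based ONLY on the following context.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n\n"
    "Answer:"
)

retrieved = [
    {"page_content": "Annual Leave: All employees receive 20 days of paid annual leave per year."},
    {"page_content": "Sick Leave: Employees can take up to 10 days of paid sick leave annually."},
]

final_prompt = stuff_prompt(template, retrieved, "How many days of annual leave do employees get?")
print(final_prompt)
```

The trade-off is simple: one LLM call and full context for the model, but all retrieved chunks together must fit inside the model's context window.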

---

## 💬 Part 8: Testing Your RAG System!

Let's ask some questions and see RAG in action!

In [None]:
def ask_question(question):
    """Ask a question and display the answer with sources"""
    print(f"\n❓ Question: {question}")
    print("=" * 60)
    
    # Get the answer
    result = qa_chain.invoke({"query": question})
    
    print(f"\n✅ Answer:\n{result['result']}")
    
    print(f"\n📚 Sources Used:")
    for i, doc in enumerate(result['source_documents'], 1):
        print(f"   {i}. {doc.metadata.get('source', 'unknown')}")
    
    print("\n" + "=" * 60)
    return result

In [None]:
# Question 1: Leave Policy
result1 = ask_question("How many days of annual leave do employees get?")

In [None]:
# Question 2: Password Reset
result2 = ask_question("How do I reset my password?")

In [None]:
# Question 3: Remote Work
result3 = ask_question("Can I work from home? What are the requirements?")

In [None]:
# Question 4: Health Insurance
result4 = ask_question("What health insurance plans are available?")

In [None]:
# Question 5: Test with question NOT in documents
result5 = ask_question("What is the company's stock price?")

### 💡 Key Observations:

✅ **Accurate answers** - The system pulls information directly from the documents

✅ **Source attribution** - We can see which documents were used

✅ **Handles unknown queries** - When information isn't in the documents, it says so!
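
Because the retriever returns the top 3 chunks, the same file can show up several times in the source list. A small helper can collapse the repeats while keeping the order in which sources first appeared. `unique_sources` is a hypothetical helper, and `SimpleNamespace` objects stand in for the documents so the sketch runs on its own.

```python
from types import SimpleNamespace

def unique_sources(source_documents):
    """Return each source once, in first-seen order."""
    names = (doc.metadata.get("source", "unknown") for doc in source_documents)
    return list(dict.fromkeys(names))

# Stand-ins for retrieved documents: only the .metadata attribute is used here.
docs = [
    SimpleNamespace(metadata={"source": "employee_handbook.pdf", "chapter": 2}),
    SimpleNamespace(metadata={"source": "employee_handbook.pdf", "chapter": 3}),
    SimpleNamespace(metadata={"source": "it_guide.pdf", "topic": "password"}),
]

print(unique_sources(docs))
```

The same function should work on `result['source_documents']`, since those objects also expose a `metadata` dictionary.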

---

## 🎯 Part 9: Mini Challenge

### 🏆 Challenge Tasks:

**Your Mission:**
1. Create your own document about a topic you're interested in
2. Add it to the vector store
3. Ask questions about your new document!

**Hints:**
```python
# Create a new document
my_doc = Document(
    page_content="Your content here...",
    metadata={"source": "my_document.txt"}
)

# Add to vector store
vectorstore.add_documents([my_doc])
```

**Expected Outcome:**
- Your RAG system should now answer questions about your new content!

In [None]:
# Your code here!
# Try adding your own document and asking questions about it

# Step 1: Create your document
# my_doc = Document(...)

# Step 2: Add to vector store
# vectorstore.add_documents([my_doc])

# Step 3: Ask questions!
# ask_question("Your question about your document")

pass

---

## 📚 Summary - What We Learned Today

### 1. RAG Fundamentals 🎯
- RAG = Retrieval-Augmented Generation
- Combines search with LLM generation
- Grounds LLM answers in your actual data

### 2. Document Processing ✂️
- Split documents into manageable chunks
- Use overlap to preserve context
- Keep metadata for source tracking

### 3. Embeddings 🔢
- Convert text to numerical vectors
- Similar meanings = similar vectors
- Enable semantic search (not just keyword matching)

### 4. Vector Stores 💾
- FAISS for fast similarity search
- Index documents for quick retrieval
- Return most relevant chunks

### 5. RAG Chain 🔗
- Connect retriever + LLM
- Custom prompts guide the LLM
- Return answers with sources

---

## 🎯 Key Takeaways

✅ **RAG makes LLMs more accurate** by grounding answers in your data

✅ **Chunk size matters** - experiment to find what works best

✅ **Good prompts are crucial** - tell the LLM to use ONLY the context

✅ **Source attribution** builds trust in AI answers

✅ **LangChain simplifies** the entire RAG pipeline

---

## 💡 Pro Tips

1. **Start with quality data** - Clean, well-structured documents work best
2. **Tune chunk size** - Try 500-1000 characters to start
3. **Use metadata** - Track sources for debugging and citations
4. **Test retrieval first** - Make sure the right chunks are being found
5. **Iterate on prompts** - The prompt template greatly affects output quality
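
One way to act on tip 4 is to measure retrieval on its own before involving the LLM. The sketch below computes a simple hit rate: the share of test questions whose expected source appears in the top-k results. The keyword-based `fake_retrieve` is a stand-in so the example runs without a vector store, and both helper names are hypothetical.

```python
def hit_rate_at_k(retrieve, labeled_queries, k=3):
    """Fraction of queries whose expected source appears among the top-k retrieved sources."""
    hits = 0
    for query, expected_source in labeled_queries:
        if expected_source in retrieve(query, k):
            hits += 1
    return hits / len(labeled_queries)

def fake_retrieve(query, k):
    """Stand-in retriever (an assumption): keyword lookup instead of vector search."""
    routes = {
        "leave": "employee_handbook.pdf",
        "password": "it_guide.pdf",
        "insurance": "benefits_guide.pdf",
    }
    found = [source for word, source in routes.items() if word in query.lower()]
    return found[:k]

labeled = [
    ("How many days of annual leave do I get?", "employee_handbook.pdf"),
    ("How do I reset my password?", "it_guide.pdf"),
    ("What is the dental coverage?", "benefits_guide.pdf"),  # no matching keyword: a miss
]

score = hit_rate_at_k(fake_retrieve, labeled, k=3)
print(f"hit rate@3 = {score:.2f}")
```

With the real vector store from this notebook, the retriever argument could be a small wrapper such as `lambda q, k: [d.metadata["source"] for d in vectorstore.similarity_search(q, k=k)]`.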

---

## 🚀 Next Steps - Tomorrow!

**Day 2: Advanced RAG**
- Hybrid retrieval (keyword + semantic)
- Re-ranking strategies
- Handling large document collections
- Production-ready optimizations

**Get ready to take your RAG skills to the next level! 🚀**

---

## 🎉 Congratulations!

You've built your first RAG system!

You now know how to:
- ✅ Process documents into chunks
- ✅ Create embeddings for semantic search
- ✅ Store and search vectors with FAISS
- ✅ Build a complete RAG chain with LangChain
- ✅ Answer questions from your own documents!

**Keep practicing and see you tomorrow! 🚀**