# üéì Week 15 - Day 2: Advanced RAG Techniques

## Today's Goals:
‚úÖ Implement **Hybrid Search** (BM25 + Semantic)

‚úÖ Add **Re-ranking** for improved precision

‚úÖ Use **Multi-Query Retrieval** for better coverage

‚úÖ Build **Parent Document Retriever** for hierarchical context

‚úÖ Compare basic vs advanced RAG performance

---

## üîß Part 1: Setup - Install & Import All Libraries

**IMPORTANT:** Run ALL cells in this part before continuing!

In [1]:
# STEP 1: Install required packages (including new ones for advanced RAG)
print("üì¶ Installing packages... (this may take 2-3 minutes)\n")

!pip install -q langchain langchain-community langchain-huggingface
!pip install -q faiss-cpu sentence-transformers
!pip install -q langchain-groq python-dotenv
!pip install -q rank-bm25  # For BM25 keyword search
!pip install -q chromadb tiktoken

print("\n‚úÖ All packages installed successfully!")

üì¶ Installing packages... (this may take 2-3 minutes)



ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-groq 1.1.0 requires langchain-core<2.0.0,>=1.1.0, but you have langchain-core 0.1.53 which is incompatible.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain 0.1.20 requires langchain-core<0.2.0,>=0.1.52, but you have langchain-core 1.1.0 which is incompatible.
langchain 0.1.20 requires langsmith<0.2.0,>=0.1.17, but you have langsmith 0.4.49 which is incompatible.
langchain-community 0.0.38 requires langchain-core<0.2.0,>=0.1.52, but you have langchain-core 1.1.0 which is incompatible.
langchain-community 0.0.38 requires langsmith<0.2.0,>=0.1.0, but you have langsmith 0.4.49 which is incompatible.
langchain-huggingface 0.0.3 requires langchain-core<0.3,>=0.1.52, but you h


‚úÖ All packages installed successfully!


In [2]:
# STEP 2: Import ALL libraries
import os
import warnings
warnings.filterwarnings('ignore')

# Core
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document

# Document Loaders
from langchain_community.document_loaders import TextLoader

# Embeddings
from langchain_huggingface import HuggingFaceEmbeddings

# Vector Stores
from langchain_community.vectorstores import FAISS

# Retrievers - NEW for Day 2!
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore

# Re-ranking
from sentence_transformers import CrossEncoder

# LLM
from langchain_groq import ChatGroq

# Chains
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# BM25 for keyword search
from rank_bm25 import BM25Okapi

import numpy as np

print("‚úÖ All libraries imported successfully!")

ModuleNotFoundError: No module named 'langchain_core.memory'

In [None]:
# STEP 3: Set up API Key
GROQ_API_KEY = "your-groq-api-key-here"  # Replace with your actual key
os.environ["GROQ_API_KEY"] = GROQ_API_KEY

print("‚úÖ API key configured!")
print("üí° Get a FREE Groq API key at: https://console.groq.com/")

---

## üìÑ Part 2: Create Sample Documents

We'll use the same TechCorp documents from Day 1, plus some additional ones to test advanced techniques.

In [None]:
# Create sample documents
sample_documents = [
    Document(
        page_content="""
        TechCorp Employee Handbook - Chapter 1: Company Overview
        
        TechCorp was founded in 2015 by Sarah Chen and Michael Rodriguez. 
        Our headquarters is located in San Francisco, California. 
        We have over 500 employees across 3 offices: San Francisco, New York, and London.
        
        Our mission is to make AI accessible to everyone through innovative products.
        Our core values are: Innovation, Integrity, Inclusivity, and Impact.
        The company ID for TechCorp in the registry is TC-2015-SF.
        """,
        metadata={"source": "employee_handbook.pdf", "chapter": 1, "topic": "company"}
    ),
    Document(
        page_content="""
        TechCorp Employee Handbook - Chapter 2: Leave Policy
        
        Annual Leave: All employees receive 20 days of paid annual leave per year.
        Sick Leave: Employees can take up to 10 days of paid sick leave annually.
        Parental Leave: New parents receive 16 weeks of paid parental leave.
        
        To request leave, submit a request through the HR portal at least 2 weeks in advance.
        Emergency leave can be requested by emailing hr@techcorp.com.
        Leave policy reference number: LP-2023-V2.
        """,
        metadata={"source": "employee_handbook.pdf", "chapter": 2, "topic": "leave"}
    ),
    Document(
        page_content="""
        TechCorp Employee Handbook - Chapter 3: Remote Work Policy
        
        TechCorp supports hybrid work arrangements. Employees can work remotely 
        up to 3 days per week. Core hours are 10 AM to 4 PM in your local timezone.
        
        To set up remote work:
        1. Get approval from your manager
        2. Ensure you have reliable internet (minimum 50 Mbps)
        3. Set up your home office following our ergonomics guide
        4. Install the company VPN for secure access
        
        Remote work policy code: RW-POL-2024.
        """,
        metadata={"source": "employee_handbook.pdf", "chapter": 3, "topic": "remote"}
    ),
    Document(
        page_content="""
        TechCorp IT Support Guide - Password Reset
        
        To reset your password:
        1. Go to portal.techcorp.com/reset
        2. Enter your employee ID and registered email
        3. Click 'Send Reset Link'
        4. Check your email for the reset link (valid for 24 hours)
        5. Create a new password following our security requirements
        
        Password requirements: Minimum 12 characters, uppercase, number, special char.
        IT ticket category: PWD-RESET-001
        """,
        metadata={"source": "it_guide.pdf", "topic": "password"}
    ),
    Document(
        page_content="""
        TechCorp Benefits Summary - Health Insurance
        
        All full-time employees are eligible for comprehensive health insurance.
        
        Plans offered:
        - Basic Plan (Plan ID: HI-BASIC-01): $0 monthly premium, $2000 deductible
        - Standard Plan (Plan ID: HI-STD-02): $50 monthly premium, $1000 deductible
        - Premium Plan (Plan ID: HI-PREM-03): $150 monthly premium, $500 deductible
        
        Dental and vision coverage is included in all plans.
        Enrollment period is January 1-31 each year.
        """,
        metadata={"source": "benefits_guide.pdf", "topic": "health"}
    ),
    Document(
        page_content="""
        TechCorp Expense Policy - Travel Reimbursement
        
        Business travel expenses are reimbursable following these guidelines:
        - Flights: Economy class for trips under 6 hours, business class for longer
        - Hotels: Up to $200/night in major cities, $150/night elsewhere
        - Meals: Up to $75/day per diem
        - Ground transport: Actual costs with receipts
        
        Submit expenses within 30 days through Expensify.
        Policy reference: EXP-TRV-2024-V1
        """,
        metadata={"source": "expense_policy.pdf", "topic": "travel"}
    ),
    Document(
        page_content="""
        TechCorp Product Documentation - API Reference
        
        TechCorp AI API v2.0 Documentation
        
        Base URL: https://api.techcorp.com/v2/
        Authentication: Bearer token in Authorization header
        Rate limits: 1000 requests/minute for standard tier
        
        Endpoints:
        - POST /analyze: Submit text for analysis
        - GET /results/{id}: Retrieve analysis results
        - DELETE /jobs/{id}: Cancel a running job
        
        Error code API-429: Rate limit exceeded
        Error code API-401: Invalid authentication
        """,
        metadata={"source": "api_docs.pdf", "topic": "api"}
    )
]

print(f"üìö Created {len(sample_documents)} sample documents!")
print("\nüìÑ Documents:")
for i, doc in enumerate(sample_documents, 1):
    print(f"   {i}. {doc.metadata.get('source')} - {doc.metadata.get('topic', 'general')}")

In [None]:
# Create text splitter and chunk documents
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=50,
    length_function=len,
)

chunks = text_splitter.split_documents(sample_documents)
print(f"‚úÇÔ∏è Split into {len(chunks)} chunks")

In [None]:
# Create embeddings model
print("‚è≥ Loading embedding model...\n")

embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'},
    encode_kwargs={'normalize_embeddings': True}
)

print("‚úÖ Embedding model loaded!")

In [None]:
# Create FAISS vector store (basic retriever)
vectorstore = FAISS.from_documents(chunks, embeddings)
basic_retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

print("‚úÖ Basic vector store created!")

---

## üîÄ Part 3: Hybrid Search (BM25 + Semantic)

Hybrid search combines:
- **BM25 (keyword search)**: Great for exact matches, IDs, specific terms
- **Semantic search**: Great for meaning, synonyms, context

Together = Best of both worlds! üéØ

In [None]:
# Create BM25 Retriever for keyword search
bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 4  # Return top 4 results

print("‚úÖ BM25 Retriever created!")
print("üìä Uses keyword matching (TF-IDF based)")

In [None]:
# Create Ensemble Retriever (Hybrid Search!)
# This combines BM25 and Vector search with equal weights

hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, basic_retriever],
    weights=[0.5, 0.5]  # Equal weight to keyword and semantic
)

print("‚úÖ Hybrid (Ensemble) Retriever created!")
print("üìä Combines: 50% BM25 + 50% Semantic Search")

In [None]:
# Let's test: Query with a specific ID (BM25 will help!)
query = "What is policy LP-2023-V2 about?"

print(f"üîç Query: '{query}'\n")
print("=" * 60)

# Compare: Semantic only vs Hybrid
print("\nüìä Semantic Search Results:")
semantic_results = basic_retriever.invoke(query)
for i, doc in enumerate(semantic_results[:2], 1):
    print(f"   {i}. {doc.metadata.get('source')} - {doc.page_content[:80]}...")

print("\nüîÄ Hybrid Search Results:")
hybrid_results = hybrid_retriever.invoke(query)
for i, doc in enumerate(hybrid_results[:2], 1):
    print(f"   {i}. {doc.metadata.get('source')} - {doc.page_content[:80]}...")

print("\nüí° Notice: Hybrid search finds the exact policy ID better!")

### üí° Key Insight:

When the query contains specific identifiers (like "LP-2023-V2"), BM25 excels at finding exact matches.
Semantic search might miss these because embeddings don't capture arbitrary IDs well.

**Hybrid = Best coverage!**

---

## üéØ Part 4: Re-ranking for Better Precision

**Problem**: Initial retrieval returns candidates, but the order might not be optimal.

**Solution**: Use a **Cross-Encoder** to re-score and re-order the results.

Cross-encoders are more accurate than bi-encoders because they see query AND document together!

In [None]:
# Load a Cross-Encoder for re-ranking
print("‚è≥ Loading Cross-Encoder re-ranker...\n")

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

print("‚úÖ Re-ranker loaded!")
print("üìä Model: MS-MARCO MiniLM (trained on search relevance)")

In [None]:
def rerank_documents(query, documents, top_k=3):
    """
    Re-rank documents using a cross-encoder.
    
    Args:
        query: The search query
        documents: List of retrieved documents
        top_k: Number of top results to return
    
    Returns:
        Re-ranked documents (best first)
    """
    # Create query-document pairs for scoring
    pairs = [[query, doc.page_content] for doc in documents]
    
    # Get relevance scores from cross-encoder
    scores = reranker.predict(pairs)
    
    # Sort documents by score (descending)
    scored_docs = list(zip(documents, scores))
    scored_docs.sort(key=lambda x: x[1], reverse=True)
    
    # Return top-k documents
    return [(doc, score) for doc, score in scored_docs[:top_k]]

print("‚úÖ Re-ranking function created!")

In [None]:
# Test re-ranking
query = "How many vacation days do I get per year?"

print(f"üîç Query: '{query}'\n")
print("=" * 60)

# Step 1: Initial retrieval (get more candidates)
initial_results = hybrid_retriever.invoke(query)

print("üì• Initial Retrieval (Top 4):")
for i, doc in enumerate(initial_results[:4], 1):
    print(f"   {i}. [{doc.metadata.get('topic', '?')}] {doc.page_content[:60]}...")

# Step 2: Re-rank
reranked = rerank_documents(query, initial_results, top_k=3)

print("\nüéØ After Re-ranking (Top 3):")
for i, (doc, score) in enumerate(reranked, 1):
    print(f"   {i}. [Score: {score:.3f}] [{doc.metadata.get('topic', '?')}] {doc.page_content[:50]}...")

print("\n‚úÖ Re-ranking improved result ordering!")

### üí° Key Insight:

The cross-encoder gives a **relevance score** for each query-document pair.
This helps push the most relevant documents to the top!

**Best Practice**: Retrieve 20-50 candidates, re-rank to top 3-5.

---

## üîÑ Part 5: Multi-Query Retrieval

**Problem**: User queries are often vague or incomplete.

**Solution**: Use an LLM to generate multiple variations of the query, then retrieve for all of them!

This increases the chance of finding relevant documents.

In [None]:
# Initialize LLM for query generation
llm = ChatGroq(
    model="llama-3.1-8b-instant",
    temperature=0.3,  # Slightly creative for diverse queries
    max_tokens=200
)

print("‚úÖ LLM initialized for multi-query generation!")

In [None]:
# Create Multi-Query Retriever
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=basic_retriever,
    llm=llm
)

print("‚úÖ Multi-Query Retriever created!")
print("üìä Will generate multiple query variations automatically")

In [None]:
# Test multi-query retrieval
import logging
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.DEBUG)

query = "time off"

print(f"üîç Original Query: '{query}'\n")
print("=" * 60)

# This will generate multiple queries and retrieve for all
results = multi_query_retriever.invoke(query)

print(f"\nüì• Retrieved {len(results)} unique documents!")
print("\nüîπ Sample Results:")
for i, doc in enumerate(results[:3], 1):
    print(f"   {i}. [{doc.metadata.get('topic', '?')}] {doc.page_content[:60]}...")

### üí° Key Insight:

The LLM automatically generated queries like:
- "vacation policy"
- "annual leave days"
- "PTO policy"

This catches more relevant documents than just searching "time off"!

---

## üå≥ Part 6: Parent Document Retriever

**Problem**: Small chunks = precise search, but limited context for LLM.

**Solution**: 
- **Search** on small child chunks (precise matching)
- **Return** larger parent chunks (rich context)

Best of both worlds!

In [None]:
# Create text splitters for parent and child
parent_splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,   # Larger chunks for context
    chunk_overlap=100
)

child_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,   # Smaller chunks for precise search
    chunk_overlap=20
)

print("‚úÖ Text splitters created!")
print("üìä Parent: 800 chars | Child: 200 chars")

In [None]:
# Create a new vector store for parent-child retrieval
parent_vectorstore = FAISS.from_documents(
    child_splitter.split_documents(sample_documents),
    embeddings
)

# In-memory store for parent documents
docstore = InMemoryStore()

# Create Parent Document Retriever
parent_retriever = ParentDocumentRetriever(
    vectorstore=parent_vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

# Add documents
parent_retriever.add_documents(sample_documents)

print("‚úÖ Parent Document Retriever created!")

In [None]:
# Test Parent Document Retriever
query = "password reset steps"

print(f"üîç Query: '{query}'\n")
print("=" * 60)

# Normal retrieval (small chunks)
normal_results = basic_retriever.invoke(query)
print("üìÑ Normal Retrieval (small chunks):")
print(f"   Chunk length: {len(normal_results[0].page_content)} chars")
print(f"   Content: {normal_results[0].page_content[:100]}...")

# Parent document retrieval (returns larger context)
parent_results = parent_retriever.invoke(query)
print("\nüå≥ Parent Document Retrieval (full context):")
print(f"   Chunk length: {len(parent_results[0].page_content)} chars")
print(f"   Content: {parent_results[0].page_content[:200]}...")

print("\n‚úÖ Parent retriever returns more context for the LLM!")

---

## üèÜ Part 7: Build Complete Advanced RAG Chain

Now let's put it all together into a production-ready RAG system!

In [None]:
class AdvancedRAG:
    """
    Advanced RAG system with:
    - Hybrid search (BM25 + Semantic)
    - Re-ranking with cross-encoder
    - Customizable retrieval
    """
    
    def __init__(self, documents, embeddings, llm, reranker):
        # Text splitting
        splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)
        self.chunks = splitter.split_documents(documents)
        
        # Create retrievers
        self.vectorstore = FAISS.from_documents(self.chunks, embeddings)
        self.semantic_retriever = self.vectorstore.as_retriever(search_kwargs={"k": 10})
        self.bm25_retriever = BM25Retriever.from_documents(self.chunks)
        self.bm25_retriever.k = 10
        
        # Hybrid retriever
        self.hybrid_retriever = EnsembleRetriever(
            retrievers=[self.bm25_retriever, self.semantic_retriever],
            weights=[0.4, 0.6]  # Slightly favor semantic
        )
        
        # Re-ranker and LLM
        self.reranker = reranker
        self.llm = llm
        
        # Prompt template
        self.prompt = PromptTemplate(
            template="""
You are a helpful assistant. Answer the question based ONLY on the following context.
If you cannot find the answer in the context, say "I don't have that information."

Context:
{context}

Question: {question}

Answer:""",
            input_variables=["context", "question"]
        )
        
        print("‚úÖ Advanced RAG system initialized!")
    
    def retrieve_and_rerank(self, query, top_k=3):
        """Retrieve documents and rerank them"""
        # Step 1: Hybrid retrieval
        docs = self.hybrid_retriever.invoke(query)
        
        # Step 2: Re-rank
        pairs = [[query, doc.page_content] for doc in docs]
        scores = self.reranker.predict(pairs)
        
        # Sort by score
        scored = sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)
        
        return [doc for doc, score in scored[:top_k]]
    
    def answer(self, question):
        """Get answer using advanced RAG"""
        # Retrieve and rerank
        docs = self.retrieve_and_rerank(question, top_k=3)
        
        # Combine context
        context = "\n\n".join([doc.page_content for doc in docs])
        
        # Generate answer
        prompt = self.prompt.format(context=context, question=question)
        response = self.llm.invoke(prompt)
        
        return {
            "answer": response.content,
            "sources": [doc.metadata.get('source', 'unknown') for doc in docs]
        }

# Create Advanced RAG system
advanced_rag = AdvancedRAG(
    documents=sample_documents,
    embeddings=embeddings,
    llm=llm,
    reranker=reranker
)

In [None]:
# Test the Advanced RAG system!
def ask_advanced(question):
    print(f"\n‚ùì Question: {question}")
    print("=" * 60)
    
    result = advanced_rag.answer(question)
    
    print(f"\n‚úÖ Answer:\n{result['answer']}")
    print(f"\nüìö Sources: {', '.join(result['sources'])}")
    print("=" * 60)
    return result

In [None]:
# Test with various queries
ask_advanced("What is policy LP-2023-V2?")

In [None]:
ask_advanced("How do I reset my password and what are the requirements?")

In [None]:
ask_advanced("What health insurance options are available and their costs?")

In [None]:
ask_advanced("Tell me about the API rate limits and error codes")

---

## üéØ Part 8: Mini Challenge

### üèÜ Challenge: Customize Your Advanced RAG

**Your Tasks:**
1. Add a new document about a topic of your choice
2. Experiment with different hybrid search weights
3. Compare results with basic vs advanced RAG

**Hints:**
```python
# Try different weights:
# More keyword focus: weights=[0.7, 0.3]
# More semantic focus: weights=[0.3, 0.7]
```

**Bonus:** Try implementing query expansion manually!

In [None]:
# Your code here!
# Try customizing the Advanced RAG system

pass

---

## üìö Summary - What We Learned Today

### 1. Hybrid Search üîÄ
- Combines BM25 (keyword) + Semantic (embedding) search
- Best for diverse query types
- Use `EnsembleRetriever` in LangChain

### 2. Re-ranking üéØ
- Cross-encoders provide more accurate relevance scores
- Retrieve more, re-rank to top-k
- Improves precision significantly

### 3. Multi-Query Retrieval üîÑ
- LLM generates query variations
- Increases recall (finds more relevant docs)
- Great for vague or ambiguous queries

### 4. Parent Document Retriever üå≥
- Search on small chunks, return large context
- Best of both: precise search + rich context
- Use for complex documents

### 5. Scaling Strategies üìà
- Metadata filtering reduces search space
- ANN algorithms for large datasets
- Choose right vector DB for your scale

---

## üéØ Key Takeaways

‚úÖ **Hybrid search** handles both exact matches and semantic similarity

‚úÖ **Re-ranking** improves result quality with minimal latency cost

‚úÖ **Query transformation** helps with vague user inputs

‚úÖ **Start simple, add complexity as needed**

‚úÖ **Always evaluate** - measure retrieval AND generation quality

---

## üí° Pro Tips

1. **Hybrid weights** depend on your use case - experiment!
2. **Re-rank top 20-50** candidates, return top 3-5
3. **Use metadata filtering** before vector search for large datasets
4. **Monitor latency** - re-ranking adds overhead
5. **Test with real queries** from your users

---

## üöÄ Next Steps

**Day 3: Communication Skills**
- Presenting AI projects to stakeholders
- Technical storytelling

**Week Project:**
- Build a chatbot that answers from uploaded documents
- Apply all the techniques learned this week!

---

## üéâ Congratulations!

You now have **production-ready RAG skills**!

You can:
- ‚úÖ Implement hybrid search for robust retrieval
- ‚úÖ Add re-ranking for better precision
- ‚úÖ Use query transformation for better coverage
- ‚úÖ Build hierarchical document retrieval
- ‚úÖ Choose the right techniques for your use case

**Keep building and see you tomorrow! üöÄ**