# RAG for Document Quality Improvement

**Retrieval-Augmented Generation (RAG)** combines information retrieval with LLM generation to produce better outputs.

## The Problem

Our DocumentGenerator agent creates bid documents, but they're generic. The LLM doesn't know:
- What good procurement documents look like for our domain
- Common patterns in successful bids
- Domain-specific terminology and structure

**Result**: Documents are grammatically correct but lack the quality and specificity of human-written bids.

## The Solution: RAG

Instead of generating from scratch, we:
1. **Build a knowledge base** of high-quality example documents
2. **Retrieve relevant examples** similar to the current tender
3. **Augment the prompt** with these examples before generation

This shows the LLM what a strong bid looks like at generation time, so the output can follow the structure, terminology, and level of detail of proven documents.

---

## Learning Objectives

By the end of this notebook, you'll understand:
1. How RAG works (theory → practice)
2. Building and querying vector databases
3. Embeddings and similarity search
4. Prompt augmentation techniques
5. Measuring RAG impact on output quality

Let's dive in!

## Part 1: Understanding RAG

### Traditional Generation (No RAG)

```
User: "Generate a bid for cybersecurity tender"
       ↓
LLM: Uses only its training data
       ↓
Output: Generic security document
```

**Problem**: LLM has general knowledge but not your specific domain expertise.

### RAG-Enhanced Generation

```
User: "Generate a bid for cybersecurity tender"
       ↓
1. Search knowledge base for similar tenders
       ↓
2. Retrieve: 3 high-quality cybersecurity bid examples
       ↓
3. Augment prompt with retrieved examples
       ↓
LLM: "Here are examples of excellent bids... Now generate:"
       ↓
Output: Domain-specific, high-quality document
```

**Key insight**: RAG gives the LLM access to your institutional knowledge.

### How Similarity Search Works

Question: How do we find "similar" documents?

Answer: **Embeddings** - convert text into numerical vectors that capture meaning.

```
Text: "AI cybersecurity threat detection"
  ↓ (embedding model)
Vector: [0.23, -0.45, 0.89, ..., 0.12]  (1536 dimensions)

Text: "ML-based security monitoring"
  ↓ (embedding model)
Vector: [0.25, -0.43, 0.91, ..., 0.15]  (similar!)

Text: "Office furniture procurement"
  ↓ (embedding model)
Vector: [-0.67, 0.12, -0.34, ..., 0.78]  (very different)
```

Similar concepts have similar vectors. We measure similarity using **cosine similarity**. As a rough guide (exact thresholds depend on the embedding model):
- 1.0 = identical
- 0.9+ = very similar
- 0.7-0.9 = somewhat similar
- 0.5-0.7 = weakly related
- <0.5 = unrelated
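
To make the scale concrete, here is a minimal sketch using hand-made 3-dimensional vectors. Real embeddings have hundreds or thousands of dimensions; the vectors and the `cosine` helper below are illustrative assumptions, not output from an embedding model.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (made up for illustration):
# the first two point in nearly the same direction, the third does not
security_ai = [0.9, 0.1, 0.3]
security_ml = [0.8, 0.2, 0.4]
furniture = [-0.5, 0.9, -0.2]

print(f"security_ai vs security_ml: {cosine(security_ai, security_ml):.3f}")
print(f"security_ai vs furniture:   {cosine(security_ai, furniture):.3f}")
```

The first pair scores close to 1, while the second pair scores below 0 because the vectors point in roughly opposite directions.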

## Part 2: Setting Up the Environment

We'll use:
- **ChromaDB**: Simple, fast vector database (no server needed)
- **OpenAI-compatible embeddings**: Convert text to vectors (served by our LM Studio endpoint)
- **Existing agents**: Filter and DocumentGenerator

In [None]:
# Setup
import sys
sys.path.insert(0, '../src')

import asyncio
from typing import List, Dict
from procurement_ai.config import Config

# Initialize
config = Config()

print("✅ RAG environment initialized!")
print(f"LLM Endpoint: {config.LLM_BASE_URL}")
print(f"Model: {config.LLM_MODEL}")

## Part 3: Building a Knowledge Base

First, let's create sample high-quality documents. In production, these would be real successful bids.

### What Makes a Good Knowledge Base Entry?

Each document should have:
1. **Content**: The actual high-quality text
2. **Metadata**: Category, success rate, date (for filtering)
3. **Context**: When/why this example is relevant
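
Before loading entries, it can help to check that each one has the expected shape. The sketch below assumes the field names used by the sample entries in the next cell (`id`, `category`, `title`, `content`, `success_rate`, `year`); `validate_entry` is a hypothetical helper, not part of the project.

```python
from typing import Any, Dict, List

# Expected fields and types, mirroring the sample entries in this notebook
REQUIRED_FIELDS = {
    "id": str,
    "category": str,
    "title": str,
    "content": str,
    "success_rate": float,
    "year": int,
}

def validate_entry(entry: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the entry looks usable."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    rate = entry.get("success_rate")
    if isinstance(rate, float) and not 0.0 <= rate <= 1.0:
        problems.append("success_rate should be between 0 and 1")
    return problems
```

Running `validate_entry(doc)` over every entry before embedding catches missing content or malformed metadata early, when it is cheap to fix.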

Let's create five examples covering different procurement categories.

In [None]:
# Sample knowledge base: high-quality bid excerpts
KNOWLEDGE_BASE = [
    {
        "id": "kb_001",
        "category": "cybersecurity",
        "title": "AI-Powered Threat Detection Implementation",
        "content": """## Executive Summary

Our proposed solution delivers enterprise-grade threat detection leveraging machine learning algorithms specifically tuned for government security requirements. The system processes 10M+ events daily, automatically categorizing threats by severity and triggering appropriate response protocols.

## Technical Approach

We employ a three-tier detection architecture:

1. **Real-time Pattern Recognition**: Neural network models trained on 5+ years of government sector threat data, achieving 99.2% accuracy with <0.1% false positive rate.

2. **Behavioral Analytics**: Anomaly detection using unsupervised learning to identify zero-day threats and insider risks before they escalate.

3. **Automated Response**: Integration with existing SIEM infrastructure, enabling immediate containment actions while alerting security teams.

## Implementation Timeline

Phase 1 (Weeks 1-4): Infrastructure setup and data integration
Phase 2 (Weeks 5-8): Model training and validation
Phase 3 (Weeks 9-12): Deployment and team training

## Compliance & Certification

Our team holds ISO 27001, SOC 2 Type II, and government clearance. All data processing occurs on-premises, ensuring complete data sovereignty.""",
        "success_rate": 0.95,
        "year": 2025
    },
    {
        "id": "kb_002",
        "category": "ai",
        "title": "Predictive Maintenance AI Platform",
        "content": """## Solution Overview

Our predictive maintenance platform reduces equipment downtime by 40% through advanced AI-driven failure prediction. The system monitors 200+ sensor parameters in real-time, predicting failures 72 hours in advance with 94% accuracy.

## Technical Architecture

Built on proven open-source frameworks (TensorFlow, FastAPI), ensuring full transparency and no vendor lock-in. The platform consists of:

**Data Pipeline**: Ingests sensor data at 1000 Hz, processes using Apache Kafka for real-time streaming

**ML Models**: Ensemble of LSTM networks and Random Forests, retrained weekly with new operational data

**Alert System**: Multi-channel notifications (email, SMS, dashboard) with configurable thresholds

## Business Value

Based on pilot deployments:
- 40% reduction in unplanned downtime
- 25% decrease in maintenance costs
- ROI achieved within 8 months

## Team Expertise

Our engineers have deployed AI systems in 15+ industrial facilities. Lead architect: PhD in ML from MIT, 10 years industry experience.""",
        "success_rate": 0.88,
        "year": 2025
    },
    {
        "id": "kb_003",
        "category": "software",
        "title": "Custom CRM System Development",
        "content": """## Project Scope

We will develop a cloud-native CRM system tailored to your organization's unique workflow, replacing legacy systems while preserving 15 years of customer data integrity.

## Development Methodology

**Agile with Weekly Sprints**: Client reviews every Friday, ensuring alignment and early issue detection

**Technology Stack**:
- Frontend: React with TypeScript for type safety
- Backend: Python FastAPI, PostgreSQL database
- Cloud: AWS with auto-scaling (handles 10x traffic spikes)
- Security: OAuth2, encryption at rest and in transit

## Migration Strategy

Zero data loss guaranteed:
1. Automated backup before each migration step
2. Parallel running (old and new systems) for 30 days
3. Validation: 100% data reconciliation before legacy shutdown

## Support & Maintenance

Year 1: Included in project cost (24/7 support)
Year 2+: Optional SLA packages (99.9% uptime guarantee)

## References

Similar CRM projects completed for [Client A] and [Client B], both reporting >30% productivity gains post-implementation.""",
        "success_rate": 0.92,
        "year": 2024
    },
    {
        "id": "kb_004",
        "category": "data_analytics",
        "title": "Business Intelligence Dashboard Suite",
        "content": """## Vision

Transform your raw operational data into actionable insights with real-time dashboards that executives actually use. Our BI platform consolidates 15 data sources into unified, drill-down visualizations.

## Key Features

**Executive Dashboard**: 6 KPIs updated every 5 minutes
- Revenue trends with forecasting
- Operational efficiency metrics
- Customer satisfaction tracking

**Departmental Views**: Customized for sales, operations, finance

**Mobile-First Design**: Full functionality on tablets and phones

## Technical Implementation

Built on Tableau/Power BI (client preference), with custom connectors for your ERP, CRM, and legacy systems.

**Data Pipeline**: ETL processes running hourly, validating data quality at each step

**Performance**: 2-second load times even with 5-year historical data

## Training & Adoption

3-day workshop for power users + recorded tutorials for all staff. Our track record: 85% adoption rate within first month.""",
        "success_rate": 0.87,
        "year": 2025
    },
    {
        "id": "kb_005",
        "category": "cloud",
        "title": "Cloud Migration and Infrastructure Modernization",
        "content": """## Migration Strategy

We execute low-risk, phased cloud migrations that maintain business continuity. No downtime for critical services.

## Assessment Phase (Weeks 1-2)

- Inventory all applications and dependencies
- Categorize by migration complexity (lift-and-shift vs. re-architecture)
- Identify cost optimization opportunities

## Migration Approach

**Wave 1**: Non-critical systems (build confidence)
**Wave 2**: Business applications (during low-usage windows)
**Wave 3**: Mission-critical services (with full rollback plans)

## Cloud Architecture

**Multi-AZ Deployment**: High availability across 3 availability zones
**Auto-Scaling**: Handles traffic spikes automatically, reduces costs during low usage
**Disaster Recovery**: RPO=1 hour, RTO=4 hours

## Cost Governance

Budget alerts, rightsizing recommendations, and reserved instance strategies typically reduce cloud spend by 30-40% compared to on-demand pricing.

## Security & Compliance

All cloud resources configured following CIS benchmarks. Continuous compliance monitoring via AWS Security Hub / Azure Security Center.""",
        "success_rate": 0.91,
        "year": 2025
    }
]

print(f"✅ Knowledge base created with {len(KNOWLEDGE_BASE)} high-quality examples")
print("\nCategories:")
for doc in KNOWLEDGE_BASE:
    print(f"  - {doc['category']}: {doc['title']} (Success rate: {doc['success_rate']:.0%})")

## Part 4: Creating Embeddings

Now we convert each document into a vector (embedding). This lets us perform similarity search.

### What Are Embeddings?

Think of embeddings as "GPS coordinates" for meaning:
- Similar concepts are close together in vector space
- Unrelated concepts are far apart
- The model learned these relationships from massive text datasets

We'll use our local LLM's embedding endpoint (compatible with OpenAI API).

In [None]:
import httpx
import numpy as np
from typing import List

class EmbeddingService:
    """Simple wrapper for creating text embeddings"""
    
    def __init__(self, config: Config):
        self.config = config
        self.base_url = config.LLM_BASE_URL
        
    async def create_embedding(self, text: str) -> List[float]:
        """Create embedding for a single text"""
        async with httpx.AsyncClient(timeout=30.0) as client:
            response = await client.post(
                f"{self.base_url}/embeddings",
                json={
                    "input": text,
                    "model": self.config.LLM_MODEL
                }
            )
            response.raise_for_status()
            data = response.json()
            return data['data'][0]['embedding']
    
    async def create_embeddings(self, texts: List[str]) -> List[List[float]]:
        """Create embeddings for multiple texts"""
        # Requests are sent one at a time: simple, but slow for large batches
        embeddings = []
        for text in texts:
            emb = await self.create_embedding(text)
            embeddings.append(emb)
        return embeddings

# Initialize
embedding_service = EmbeddingService(config)

# Test it
test_text = "AI-powered cybersecurity threat detection system"
test_embedding = await embedding_service.create_embedding(test_text)

print(f"✅ Embedding service working!")
print(f"   Text: '{test_text}'")
print(f"   Embedding dimensions: {len(test_embedding)}")
print(f"   First 5 values: {test_embedding[:5]}")
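
For larger knowledge bases, embedding requests can overlap instead of running one after another. The following is a minimal sketch, assuming the endpoint tolerates a few concurrent requests; `embed_many` and `max_concurrency` are hypothetical names, and `embed_one` stands for any async function that maps a text to a vector.

```python
import asyncio
from typing import Awaitable, Callable, List

async def embed_many(
    texts: List[str],
    embed_one: Callable[[str], Awaitable[List[float]]],
    max_concurrency: int = 4,
) -> List[List[float]]:
    """Embed texts concurrently with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(text: str) -> List[float]:
        # The semaphore caps how many requests run at the same time
        async with semaphore:
            return await embed_one(text)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(t) for t in texts))
```

In this notebook it could be called as `await embed_many(texts, embedding_service.create_embedding)`. Whether this is actually faster depends on how the local server schedules requests.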

### Understanding Cosine Similarity

To find similar documents, we compare their embeddings using cosine similarity:

```
similarity = dot_product(vec1, vec2) / (||vec1|| * ||vec2||)
```

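Result is between -1 and 1. As a rough guide (exact thresholds depend on the embedding model):
- **1.0**: Identical meaning (vectors point in the same direction)
- **0.9-1.0**: Very similar
- **0.7-0.9**: Somewhat related
- **0.5-0.7**: Weakly related
- **<0.5**: Unrelated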

Let's implement it:

In [None]:
def cosine_similarity(vec1: List[float], vec2: List[float]) -> float:
    """Calculate cosine similarity between two vectors"""
    vec1_np = np.array(vec1)
    vec2_np = np.array(vec2)
    
    dot_product = np.dot(vec1_np, vec2_np)
    norm1 = np.linalg.norm(vec1_np)
    norm2 = np.linalg.norm(vec2_np)
    
    return dot_product / (norm1 * norm2)

# Test similarity between different concepts
text1 = "AI cybersecurity threat detection"
text2 = "ML-based security monitoring system"
text3 = "Office furniture and interior design"

emb1 = await embedding_service.create_embedding(text1)
emb2 = await embedding_service.create_embedding(text2)
emb3 = await embedding_service.create_embedding(text3)

sim_1_2 = cosine_similarity(emb1, emb2)
sim_1_3 = cosine_similarity(emb1, emb3)

print("üìä Similarity Test:")
print(f"\n  '{text1}'")
print(f"  vs")
print(f"  '{text2}'")
print(f"  ‚Üí Similarity: {sim_1_2:.3f} (Very similar! ‚úÖ)\n")

print(f"  '{text1}'")
print(f"  vs")
print(f"  '{text3}'")
print(f"  ‚Üí Similarity: {sim_1_3:.3f} (Very different, as expected ‚úÖ)")

## Part 5: Building the Vector Store

Now let's store our knowledge base documents with their embeddings in ChromaDB.

### Why Use a Vector Database?

We could calculate similarity against every document manually, but:
- A brute-force scan gets slow with 1000+ documents
- Every query has to re-score the entire collection

Vector databases like Chroma:
- Index embeddings for fast search (milliseconds)
- Handle filtering by metadata
- Persist data to disk
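
To see what the vector database replaces, here is a minimal sketch of the manual approach: score the query against every stored vector and keep the best matches. `brute_force_top_k` is a hypothetical helper written for illustration; its cost grows linearly with the number of documents, which is the scan an index avoids.

```python
import math
from typing import List, Sequence, Tuple

def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def brute_force_top_k(
    query: Sequence[float],
    vectors: List[Sequence[float]],
    k: int = 2,
) -> List[Tuple[int, float]]:
    """Return the k best (index, similarity) pairs by scanning every vector."""
    scored = [(i, _cosine(query, v)) for i, v in enumerate(vectors)]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```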

Let's set it up:

In [None]:
# Install ChromaDB if needed
# !pip install chromadb

import chromadb
from chromadb.config import Settings

# Initialize Chroma (in-memory for this notebook)
chroma_client = chromadb.Client(Settings(
    anonymized_telemetry=False,
    allow_reset=True
))

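# Create or get collection
# "hnsw:space": "cosine" makes Chroma return cosine distance, so that
# `1 - distance` in later cells is cosine similarity (the default is squared L2)
collection = chroma_client.get_or_create_collection(
    name="procurement_knowledge_base",
    metadata={
        "description": "High-quality bid examples for RAG",
        "hnsw:space": "cosine"
    }
)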

print("✅ ChromaDB initialized")
print(f"   Collection: {collection.name}")
print(f"   Documents: {collection.count()}")

### Loading Documents into ChromaDB

For each document, we need:
1. Document text (what to retrieve)
2. Embedding (for similarity search)
3. Metadata (for filtering)
4. Unique ID

In [None]:
async def load_knowledge_base():
    """Load documents into ChromaDB"""
    
    print("üîÑ Creating embeddings for knowledge base...")
    print(f"   (This may take 30-60 seconds for {len(KNOWLEDGE_BASE)} documents)\n")
    
    for i, doc in enumerate(KNOWLEDGE_BASE):
        # Create embedding
        embedding = await embedding_service.create_embedding(doc['content'])
        
        # Add to ChromaDB
        collection.add(
            documents=[doc['content']],
            embeddings=[embedding],
            metadatas=[{
                "category": doc['category'],
                "title": doc['title'],
                "success_rate": doc['success_rate'],
                "year": doc['year']
            }],
            ids=[doc['id']]
        )
        
        print(f"   ✓ {i+1}/{len(KNOWLEDGE_BASE)}: {doc['title']}")
    
    print(f"\n✅ Knowledge base loaded: {collection.count()} documents")

# Load it
await load_knowledge_base()

## Part 6: Querying the Vector Store

Now for the magic: semantic search!

### How It Works

1. User has a tender (e.g., "AI cybersecurity project")
2. We embed the tender description
3. ChromaDB finds documents with similar embeddings
4. We get back the most relevant examples

Let's test it:

In [None]:
async def search_knowledge_base(query: str, n_results: int = 2):
    """Search for relevant documents"""
    
    # Create embedding for query
    query_embedding = await embedding_service.create_embedding(query)
    
    # Search
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results
    )
    
    return results

# Test Query 1: Cybersecurity
query1 = "We need to implement AI-based threat detection for government security."
print(f"üîç Query: '{query1}'\n")

results1 = await search_knowledge_base(query1, n_results=2)

print("Top 2 matches:")
for i, (doc_id, metadata, distance) in enumerate(zip(
    results1['ids'][0],
    results1['metadatas'][0],
    results1['distances'][0]
)):
    print(f"\n{i+1}. {metadata['title']}")
    print(f"   Category: {metadata['category']}")
    print(f"   Similarity: {1 - distance:.3f}")
    print(f"   Content preview: {results1['documents'][0][i][:150]}...")

In [None]:
# Test Query 2: Software Development
query2 = "Custom CRM application development with cloud hosting"
print(f"üîç Query: '{query2}'\n")

results2 = await search_knowledge_base(query2, n_results=2)

print("Top 2 matches:")
for i, (metadata, distance) in enumerate(zip(
    results2['metadatas'][0],
    results2['distances'][0]
)):
    print(f"\n{i+1}. {metadata['title']}")
    print(f"   Category: {metadata['category']}")
    print(f"   Similarity: {1 - distance:.3f}")

### Observations

Notice how:
- Cybersecurity query → Cybersecurity document has highest similarity
- CRM query → Software/Cloud documents rank highest
- The model understands semantic relationships (not just keyword matching)

This is the power of embeddings!
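
One caveat on the `1 - distance` conversion in the cells above: it yields cosine similarity only when the collection uses cosine distance. Chroma's documentation describes three distance spaces: `l2` (squared L2, the default), `ip` (inner product), and `cosine`. The sketch below shows a conversion for each; `distance_to_similarity` is a hypothetical helper, and the `l2` branch is only valid under the stated assumption that vectors are unit length.

```python
def distance_to_similarity(distance: float, space: str = "cosine") -> float:
    """Convert a Chroma distance into a similarity score for the given space."""
    if space == "cosine":
        # Chroma reports 1 - cosine similarity
        return 1.0 - distance
    if space == "ip":
        # Chroma reports 1 - dot product
        return 1.0 - distance
    if space == "l2":
        # Squared L2 equals 2 - 2*cos ONLY for unit-length vectors (assumption)
        return 1.0 - distance / 2.0
    raise ValueError(f"unknown distance space: {space}")
```

If the printed "similarity" values look odd (for example, negative or far below the expected range), checking which space the collection was created with is a good first step.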

## Part 7: RAG-Enhanced Document Generation

Now let's put it all together: use retrieved examples to improve document generation.

### The RAG Pipeline

```python
async def generate_with_rag(tender):
    # 1. Retrieve relevant examples (search_knowledge_base is async)
    examples = await search_knowledge_base(tender.description)
    
    # 2. Augment prompt
    prompt = f"""
    Here are examples of excellent bids:
    {examples}
    
    Now generate a bid for:
    {tender}
    """
    
    # 3. Generate
    return await llm.generate(prompt)
```

Let's implement it:

In [None]:
from procurement_ai.services.llm import LLMService

llm = LLMService(config)

async def generate_without_rag(tender_description: str) -> str:
    """Baseline: Generate without RAG"""
    
    prompt = f"""Generate a professional bid document for this tender:

{tender_description}

Include:
- Executive summary
- Technical approach
- Timeline
- Team qualifications

Keep it concise (300-400 words)."""
    
    response = await llm.generate(
        prompt=prompt,
        temperature=0.7,
        max_tokens=500
    )
    
    return response

async def generate_with_rag(tender_description: str, n_examples: int = 2) -> str:
    """Enhanced: Generate with RAG"""
    
    # 1. Retrieve relevant examples
    results = await search_knowledge_base(tender_description, n_results=n_examples)
    
    # 2. Format examples
    examples_text = ""
    for i, (doc, meta) in enumerate(zip(results['documents'][0], results['metadatas'][0])):
        examples_text += f"\n### Example {i+1}: {meta['title']}\n{doc}\n"
    
    # 3. Augmented prompt
    prompt = f"""You are writing a bid document. First, study these examples of excellent bids:

---
{examples_text}
---

Now, using the same level of professionalism and detail as the examples above, generate a bid for:

{tender_description}

Include:
- Executive summary
- Technical approach
- Timeline
- Team qualifications

Match the quality and structure of the example documents. Keep it concise (300-400 words)."""
    
    response = await llm.generate(
        prompt=prompt,
        temperature=0.7,
        max_tokens=500
    )
    
    return response

print("✅ RAG generation pipeline ready!")

## Part 8: Comparing Results (Before vs. After RAG)

Let's test both approaches and compare quality.

In [None]:
# Test tender
test_tender = """Healthcare AI Advisory System

National Health Service requires an AI-powered system to provide real-time clinical decision support. 
The system should analyze patient data, suggest diagnoses, and recommend treatment plans while 
maintaining full compliance with medical privacy regulations.

Budget: €1.5M
Timeline: 12 months
Requirements: ISO 27001, HIPAA compliance"""

print("üè• Test Tender:")
print(test_tender)
print("\n" + "="*80 + "\n")

In [None]:
# Generate WITHOUT RAG
print("üìÑ Generating WITHOUT RAG...\n")
doc_without_rag = await generate_without_rag(test_tender)

print("RESULT (No RAG):")
print("="*80)
print(doc_without_rag)
print("="*80)

In [None]:
# Generate WITH RAG
print("üìÑ Generating WITH RAG...\n")
doc_with_rag = await generate_with_rag(test_tender)

print("RESULT (With RAG):")
print("="*80)
print(doc_with_rag)
print("="*80)

### Quality Comparison

Look for these improvements in the RAG version:

**Structure**:
- ❓ Does it follow the example format?
- ❓ Are sections better organized?

**Technical Detail**:
- ❓ More specific technical approaches?
- ❓ Concrete numbers and metrics?

**Professionalism**:
- ❓ Sounds more like a real bid?
- ❓ Addresses compliance explicitly?

**Persuasiveness**:
- ❓ Mentions relevant experience?
- ❓ Provides reassurance on key concerns?

When retrieval surfaces relevant examples, the RAG version usually improves on several of these points, but verify this on your own outputs rather than assuming it.

## Part 9: Measuring RAG Impact

We can't just eyeball quality. Let's measure improvement objectively.

### Metrics to Track

1. **Retrieval Quality**: Are we finding relevant examples?
   - Measure: Average similarity score of retrieved docs
   - Good: >0.75 similarity

2. **Generation Quality**: Is the output better?
   - Use LLM-as-judge to rate documents on 1-10 scale
   - Compare: with-RAG vs. without-RAG

3. **Relevance**: Does generated content incorporate retrieved examples?
   - Check for similar terminology/structure

Let's implement basic measurement:

In [None]:
async def evaluate_rag_quality(n_tests: int = 3):
    """Evaluate retrieval quality on a fixed set of test queries"""
    
    print(f"🔬 Running RAG evaluation on {n_tests} test cases...\n")
    
    test_cases = [
        "AI-powered cybersecurity threat detection for government systems",
        "Custom CRM software development with cloud hosting",
        "Predictive maintenance AI platform for industrial equipment"
    ][:n_tests]
    
    results = []
    
    for i, test_case in enumerate(test_cases, 1):
        print(f"Test {i}/{n_tests}: {test_case[:60]}...")
        
        # Check retrieval quality
        search_results = await search_knowledge_base(test_case, n_results=2)
        avg_similarity = sum(1 - d for d in search_results['distances'][0]) / len(search_results['distances'][0])
        
        results.append({
            'case': test_case,
            'retrieval_similarity': avg_similarity,
            'retrieved_categories': [m['category'] for m in search_results['metadatas'][0]]
        })
        
        print(f"  → Retrieval similarity: {avg_similarity:.3f}")
        print(f"  → Retrieved: {', '.join(results[-1]['retrieved_categories'])}\n")
    
    # Summary
    avg_ret = sum(r['retrieval_similarity'] for r in results) / len(results)
    
    print("\nüìä Results:")
    print(f"  Average retrieval similarity: {avg_ret:.3f}")
    if avg_ret > 0.75:
        print("  ‚úÖ Excellent retrieval quality!")
    elif avg_ret > 0.65:
        print("  ‚úì Good retrieval quality")
    else:
        print("  ‚ö†Ô∏è  Retrieval quality needs improvement")
    
    return results

eval_results = await evaluate_rag_quality()
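
The cell above covers only the first metric (retrieval quality). For the second metric, LLM-as-judge scoring, a minimal sketch might build a rating prompt and parse the score from the reply. Both function names and the prompt wording are assumptions made for illustration; the judge model's reply format is not guaranteed, so the parser returns `None` when it finds no usable score.

```python
import re
from typing import Optional

def build_judge_prompt(tender: str, document: str) -> str:
    """Build a prompt asking the judge model for a single 1-10 rating."""
    return (
        "You are reviewing a bid document written for the tender below.\n\n"
        f"TENDER:\n{tender}\n\n"
        f"BID DOCUMENT:\n{document}\n\n"
        "Rate the bid from 1 (poor) to 10 (excellent) for structure, "
        "technical detail, and professionalism. Reply with the number only."
    )

def parse_score(reply: str) -> Optional[int]:
    """Extract the first standalone integer from 1 to 10, or None if absent."""
    match = re.search(r"\b(10|[1-9])\b", reply)
    return int(match.group(1)) if match else None
```

With the `llm` service from Part 7, usage could look like `parse_score(await llm.generate(prompt=build_judge_prompt(test_tender, doc_with_rag), temperature=0.0, max_tokens=10))`. Scoring both versions several times and comparing averages is more trustworthy than a single rating.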

## Part 10: Advanced RAG Techniques (Optional)

Our basic RAG is working! Here are enhancements for production:

### 1. Metadata Filtering

Retrieve only from specific categories:

```python
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=3,
    where={"category": "cybersecurity"}  # Filter!
)
```

### 2. Hybrid Search

Combine semantic search with keyword matching:
- Semantic: Finds conceptually similar docs
- Keyword: Ensures specific terms are present
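
One simple way to combine the two signals is a weighted sum of the semantic similarity and a keyword-overlap score. This is a sketch of one possible scheme, not the only one; the helper names, the tokenizer, and the default weight `alpha=0.7` are assumptions.

```python
import re
from typing import Set

def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_overlap(query: str, document: str) -> float:
    """Fraction of query terms that also appear in the document (0 to 1)."""
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(document)) / len(query_terms)

def hybrid_score(
    semantic_similarity: float,
    query: str,
    document: str,
    alpha: float = 0.7,
) -> float:
    """Blend semantic similarity with keyword overlap; alpha weights the former."""
    return alpha * semantic_similarity + (1 - alpha) * keyword_overlap(query, document)
```

Production systems often use a proper lexical ranker such as BM25 for the keyword side; plain term overlap is used here only to keep the idea visible.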

### 3. Re-ranking

After retrieval, re-rank by:
- Recency (prefer newer examples)
- Success rate (prefer proven approaches)
- Specificity (prefer detailed over generic)
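
As a sketch of re-ranking by the first two criteria, the function below blends similarity, success rate, and recency into one score. It assumes each candidate is a dict with `similarity`, `success_rate`, and `year` keys (matching the metadata stored in this notebook); the function name and the weights are illustrative assumptions that would need tuning.

```python
from typing import Any, Dict, List

def rerank(
    candidates: List[Dict[str, Any]],
    current_year: int,
    w_similarity: float = 0.6,
    w_success: float = 0.3,
    w_recency: float = 0.1,
) -> List[Dict[str, Any]]:
    """Sort candidates by a weighted blend of similarity, success rate, and recency."""
    def score(candidate: Dict[str, Any]) -> float:
        age = max(0, current_year - candidate["year"])
        recency = 1.0 / (1.0 + age)  # 1.0 for this year, decaying with age
        return (
            w_similarity * candidate["similarity"]
            + w_success * candidate["success_rate"]
            + w_recency * recency
        )

    return sorted(candidates, key=score, reverse=True)
```

With these weights, a slightly less similar example with a much better track record can outrank a closer but weaker one.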

### 4. Dynamic k

Retrieve variable numbers of examples:
- Complex tender → More examples (k=3-4)
- Simple tender → Fewer examples (k=1-2)
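
How to judge "complexity" is left open here. As a deliberately crude sketch, the function below uses the length of the tender description as a proxy; `choose_k` and the `words_per_step` threshold are assumptions for illustration, and a real system might count requirements or use an LLM classifier instead.

```python
def choose_k(
    tender_description: str,
    min_k: int = 1,
    max_k: int = 4,
    words_per_step: int = 40,
) -> int:
    """Pick the number of examples to retrieve from the description length."""
    n_words = len(tender_description.split())
    # One extra example per `words_per_step` words, clamped to [min_k, max_k]
    k = min_k + n_words // words_per_step
    return max(min_k, min(max_k, k))
```

The result can be passed straight through as `n_results=choose_k(tender_description)` when searching the knowledge base.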

### 5. Chunk Optimization

Instead of whole documents:
- Split into sections (executive summary, technical approach, etc.)
- Retrieve most relevant sections from different documents
- Mix and match best parts
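
Because the sample documents use `## ` headings, section-level chunks can be produced by splitting on those headings. A minimal sketch follows, with `split_into_sections` and the chunk ID scheme as hypothetical choices; each resulting chunk would then be embedded and stored as its own entry.

```python
import re
from typing import Dict, List

def split_into_sections(doc_id: str, content: str) -> List[Dict[str, str]]:
    """Split a markdown document into one chunk per '## ' heading."""
    sections: List[Dict[str, str]] = []
    # Split just before every line that starts with "## "
    for part in re.split(r"(?m)^(?=## )", content):
        part = part.strip()
        if not part:
            continue
        if part.startswith("## "):
            heading_line, _, body = part.partition("\n")
            heading = heading_line[3:].strip()
        else:
            # Text before the first heading has no title
            heading, body = "", part
        sections.append({
            "id": f"{doc_id}_s{len(sections)}",
            "heading": heading,
            "text": body.strip(),
        })
    return sections
```

Storing the heading in each chunk's metadata would make it possible to retrieve, for example, only "Technical Approach" sections across documents.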

## Summary: What We Learned

### RAG Workflow

1. **Build Knowledge Base**: Collect high-quality example documents
2. **Create Embeddings**: Convert text to vectors using embedding models
3. **Store in Vector DB**: Use ChromaDB for efficient similarity search
4. **Retrieve**: Find relevant examples for each new request
5. **Augment**: Add examples to LLM prompt
6. **Generate**: LLM produces better output informed by examples

### Key Benefits

✅ **Quality Improvement**: Documents match professional standards  
✅ **Domain Adaptation**: LLM learns your specific patterns  
✅ **Consistency**: Examples ensure similar structure/tone  
✅ **Reduced Hallucination**: Grounded in real examples  

### When to Use RAG

RAG is powerful when:
- You have high-quality example documents
- Output quality matters (not just correctness)
- Domain-specific knowledge is needed
- You want to incorporate institutional knowledge

### Next Steps

Now that you understand RAG, we'll:
1. Integrate it into the DocumentGenerator agent
2. Build a knowledge base management system
3. Add evaluation to measure RAG impact
4. Deploy to production

---

## Exercises

Try these to deepen your understanding:

1. **Add more examples**: Create 2-3 more knowledge base entries in different categories

2. **Test edge cases**: What happens when you query about a category not in the KB?

3. **Experiment with k**: Try n_results=1, 2, 3, 4. How does it affect output?

4. **Metadata filtering**: Modify search to only retrieve from one category

5. **Quality comparison**: Generate 3 documents (with/without RAG) and compare side-by-side

---

*This notebook demonstrated RAG fundamentals. In the next phase, we'll build production-ready RAG infrastructure for the procurement system.*