# Module 06 - Notebook 01: RAG Fundamentals

## Learning Objectives
- Understand what RAG (Retrieval-Augmented Generation) is
- Compare RAG vs fine-tuning
- Learn RAG architecture and components
- Build your first simple RAG system

---

## 1. What is RAG?

**Retrieval-Augmented Generation (RAG)** combines:
- **Retrieval**: Finding relevant information from a knowledge base
- **Generation**: Using an LLM to generate answers based on retrieved context

### The RAG Pipeline:
```
User Query
    ↓
1. Retrieve relevant documents (vector search)
    ↓
2. Construct prompt with context
    ↓
3. Generate answer with LLM
    ↓
Answer + Sources
```
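
The three steps above can be sketched without any external services. This is a toy illustration only: `toy_retrieve` ranks by shared words rather than by vector search, and `fake_llm` is a hypothetical placeholder for a real model call. All function names here are made up for this example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Step 1: rank documents by shared words (a stand-in for vector search)."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Step 2: place the retrieved text in front of the question."""
    context_text = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{context_text}\n\nQuestion: {query}\nAnswer:"


def fake_llm(prompt: str) -> str:
    """Step 3: placeholder for a real LLM call; it only reports the prompt size."""
    return f"(an LLM would answer here, given a {len(prompt)}-character prompt)"


docs = [
    "Python was created by Guido van Rossum.",
    "The pandas library is used for data analysis.",
    "Neural networks are inspired by biological neurons.",
]
query = "Who created Python?"
context = toy_retrieve(query, docs, k=1)
prompt = build_prompt(query, context)

print(context)
print(fake_llm(prompt))
```

The rest of the notebook replaces the word-overlap ranking with embeddings and the placeholder with an actual chat completion, but the shape of the pipeline stays the same.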

### Why RAG?
- ✅ Use **your own data** without retraining
- ✅ **Up-to-date** information (just update the knowledge base)
- ✅ **Citable sources** for transparency
- ✅ **Lower cost** than fine-tuning
- ✅ **Quick to implement** and iterate

## 2. RAG vs Fine-Tuning

| Aspect | RAG | Fine-Tuning |
|--------|-----|-------------|
| **Setup Time** | Hours | Days/Weeks |
| **Cost** | Low (API + storage) | High (training compute) |
| **Updates** | Add to knowledge base | Retrain model |
| **Sources** | Citable | No citations |
| **Use Case** | Q&A, documentation, support | Style, tone, specialized tasks |
| **Flexibility** | Very flexible | Fixed after training |

### When to Use RAG:
- Frequently changing information
- Need source citations
- Multiple knowledge domains
- Quick deployment needed

### When to Fine-Tune:
- Specific writing style needed
- Domain-specific language
- Lower latency required
- Static knowledge base

## 3. Setup

In [None]:
!pip install -q openai chromadb sentence-transformers python-dotenv

In [None]:
import os
from openai import OpenAI
import chromadb
from sentence_transformers import SentenceTransformer
from dotenv import load_dotenv

load_dotenv()
openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
chroma_client = chromadb.Client()
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

print("✓ All systems ready for RAG!")

## 4. Building Your First RAG System

In [None]:
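import uuid

class SimpleRAG:
    """A minimal RAG system to understand the basics."""
    
    def __init__(self):
        self.openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        self.chroma_client = chromadb.Client()
        # In-memory Chroma clients can share state, so give each instance its own
        # collection name; reusing "simple_rag" can fail on a second instance.
        collection_name = f"simple_rag_{uuid.uuid4().hex[:8]}"
        self.collection = self.chroma_client.create_collection(collection_name)
        self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')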
    
    def add_documents(self, documents: list):
        """Add documents to the knowledge base."""
        # Generate embeddings
        embeddings = self.embedding_model.encode(documents).tolist()
        
        # Offset IDs by the current collection size so that a second call
        # does not reuse doc_0, doc_1, ... from an earlier call
        start = self.collection.count()
        
        # Store in vector DB
        self.collection.add(
            documents=documents,
            embeddings=embeddings,
            ids=[f"doc_{start + i}" for i in range(len(documents))]
        )
        
        print(f"✓ Added {len(documents)} documents")
    
    def retrieve(self, query: str, n_results: int = 3) -> list:
        """Retrieve relevant documents."""
        # Generate query embedding
        query_embedding = self.embedding_model.encode([query])[0].tolist()
        
        # Search
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=n_results
        )
        
        return results['documents'][0]
    
    def generate(self, query: str, context: list) -> str:
        """Generate answer using retrieved context."""
        # Construct prompt
        context_text = "\n\n".join(context)
        
        prompt = f"""Answer the question based on the context below. If the answer is not in the context, say "I don't have enough information to answer that."

Context:
{context_text}

Question: {query}

Answer:"""
        
        # Generate with LLM
        response = self.openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
            max_tokens=200
        )
        
        return response.choices[0].message.content
    
    def query(self, question: str) -> dict:
        """Complete RAG: retrieve + generate."""
        # Step 1: Retrieve
        context = self.retrieve(question)
        
        # Step 2: Generate
        answer = self.generate(question, context)
        
        return {
            "question": question,
            "answer": answer,
            "sources": context
        }

# Demo the RAG system
rag = SimpleRAG()

# Add knowledge base
knowledge = [
    "Python was created by Guido van Rossum and released in 1991.",
    "Python is known for its simple and readable syntax.",
    "Machine learning is a subset of artificial intelligence.",
    "Neural networks are inspired by biological neurons in the brain.",
    "The pandas library is used for data analysis in Python.",
    "TensorFlow and PyTorch are popular deep learning frameworks."
]

rag.add_documents(knowledge)

# Ask questions
questions = [
    "Who created Python?",
    "What is machine learning?",
    "What is the capital of France?"  # Not in knowledge base
]

print("\n" + "="*60 + "\n")
for q in questions:
    result = rag.query(q)
    print(f"Q: {result['question']}")
    print(f"A: {result['answer']}")
    print("\nSources:")
    # Show only the top two of the retrieved sources to keep the output short
    for i, source in enumerate(result['sources'][:2], 1):
        print(f"  {i}. {source}")
    print("\n" + "="*60 + "\n")
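
The vector search inside `retrieve` ranks documents by how close their embeddings are to the query embedding. The sketch below uses hand-picked 3-dimensional vectors (hypothetical values, not real model output) to show two common measures, cosine similarity and squared L2 distance. It assumes unit-length vectors, in which case the two give the same ranking because squared L2 distance equals `2 - 2 * cosine`. Which metric the index actually uses depends on the collection's configuration, which this notebook leaves at its default.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b (higher means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance (lower means more similar)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy "embeddings": made-up numbers for illustration only
query_vec = normalize([1.0, 0.2, 0.0])
doc_vecs = {
    "doc_python": normalize([0.9, 0.3, 0.1]),
    "doc_pandas": normalize([0.2, 0.9, 0.3]),
    "doc_neurons": normalize([0.0, 0.1, 1.0]),
}

by_cosine = sorted(
    doc_vecs, key=lambda k: cosine_similarity(query_vec, doc_vecs[k]), reverse=True
)
by_l2 = sorted(doc_vecs, key=lambda k: squared_l2(query_vec, doc_vecs[k]))

print("Ranked by cosine similarity:", by_cosine)
print("Ranked by squared L2:       ", by_l2)
```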

## 5. Understanding the RAG Flow

In [None]:
# Let's trace through a single query
test_query = "What frameworks are used for deep learning?"

print("Step 1: Retrieve relevant context")
context_docs = rag.retrieve(test_query, n_results=2)
print(f"Found {len(context_docs)} relevant documents:")
for i, doc in enumerate(context_docs, 1):
    print(f"  {i}. {doc}")

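print("\nStep 2: Construct prompt with context")
# Simplified prompt for illustration only; rag.generate() in Step 3
# builds its own, more detailed prompt from the same documents
context_text = "\n".join(context_docs)
prompt = f"""Context: {context_text}

Question: {test_query}

Answer based on the context:"""
print(prompt)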

print("\nStep 3: Generate answer")
answer = rag.generate(test_query, context_docs)
print(f"Answer: {answer}")
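
Step 1 of the trace always returns the nearest documents, even when none of them is actually relevant to the question. One possible mitigation is to drop hits whose distance exceeds a cutoff before building the prompt. The sketch below works on a hypothetical `Hit` structure with hand-written distances. In a real system the distances would come from the vector store (assumed here to be available as `results['distances'][0]` in Chroma's query output), and the cutoff would need tuning for the embedding model and metric in use.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    text: str
    distance: float  # smaller means closer to the query


def filter_hits(hits: list[Hit], max_distance: float) -> list[Hit]:
    """Keep only hits at or below the distance cutoff, nearest first."""
    kept = [h for h in hits if h.distance <= max_distance]
    return sorted(kept, key=lambda h: h.distance)


# Hypothetical distances, chosen by hand for illustration
hits = [
    Hit("TensorFlow and PyTorch are popular deep learning frameworks.", 0.45),
    Hit("Machine learning is a subset of artificial intelligence.", 1.10),
    Hit("The pandas library is used for data analysis in Python.", 1.62),
]

relevant = filter_hits(hits, max_distance=1.2)
for h in relevant:
    print(f"{h.distance:.2f}  {h.text}")

if not filter_hits(hits, max_distance=0.1):
    print("No hit is close enough; the system could decline to answer.")
```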

## 6. RAG Challenges and Limitations

In [None]:
# Challenge 1: No relevant context
print("Challenge 1: Question outside knowledge base")
result = rag.query("What is quantum mechanics?")
print(f"Q: {result['question']}")
print(f"A: {result['answer']}")
print("Notice: System should admit it doesn't know\n")

# Challenge 2: Potentially conflicting information
print("\nChallenge 2: Potentially conflicting documents")
# Add documents that give different release years for different Python versions
rag.add_documents([
    "Python version 2 was released in 2000.",
    "Python 3 was released in 2008."
])
result = rag.query("When was Python released?")
print(f"Q: {result['question']}")
print(f"A: {result['answer']}")
print("Notice: May need to clarify which version\n")

# Challenge 3: Context size limits
print("\nChallenge 3: Too much context")
print("LLMs have context-window limits (e.g., 4,096, 8,192, or 128k tokens)")
print("Must balance: more retrieved context vs. the token budget")
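
For Challenge 3, a simple way to respect a token budget is to add retrieved documents in ranked order until the next one would no longer fit. The sketch below is a rough illustration: `estimate_tokens` assumes about four characters per token, which is only a heuristic, and a real system would count tokens with the model's own tokenizer. Both function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def pack_context(docs: list[str], max_tokens: int) -> list[str]:
    """Take documents in ranked order until the next one would exceed the budget."""
    packed: list[str] = []
    used = 0
    for doc in docs:
        cost = estimate_tokens(doc)
        if used + cost > max_tokens:
            break  # stop at the first document that does not fit
        packed.append(doc)
        used += cost
    return packed


# Documents assumed to be sorted from most to least relevant
ranked_docs = [
    "Python was created by Guido van Rossum and released in 1991.",
    "Python is known for its simple and readable syntax.",
    "The pandas library is used for data analysis in Python.",
]

selected = pack_context(ranked_docs, max_tokens=30)
print(f"Kept {len(selected)} of {len(ranked_docs)} documents:")
for doc in selected:
    print(f"  - {doc}")
```

Stopping at the first document that does not fit keeps the highest-ranked context intact. Skipping it and trying smaller, lower-ranked documents is another option, at the cost of including less relevant text.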

## 7. Improving RAG Quality

In [None]:
class ImprovedRAG(SimpleRAG):
    """RAG with quality improvements."""
    
    def generate(self, query: str, context: list) -> str:
        """Enhanced generation with better prompting."""
        context_text = "\n\n".join([
            f"[Source {i+1}]: {doc}" 
            for i, doc in enumerate(context)
        ])
        
        # Improved prompt
        prompt = f"""You are a helpful assistant that answers questions based on provided context.

IMPORTANT INSTRUCTIONS:
1. Answer ONLY using information from the context below
2. If the answer is not in the context, say "I don't have enough information"
3. Cite your sources using [Source N] notation
4. Be concise and accurate

Context:
{context_text}

Question: {query}

Answer (with source citations):"""
        
        response = self.openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3,  # Lower temp for factual accuracy
            max_tokens=300
        )
        
        return response.choices[0].message.content

# Test improved RAG
improved_rag = ImprovedRAG()
improved_rag.add_documents(knowledge)

result = improved_rag.query("Tell me about Python")
print("Improved RAG Response:")
print(result['answer'])
print("\nNotice: Should include [Source N] citations!")
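
Asking for `[Source N]` citations does not guarantee the model produces them, or that the numbers it produces are valid. A minimal sketch of a post-generation check follows. `check_citations` is a hypothetical helper, and `sample_answer` is hand-written text rather than real model output.

```python
import re


def check_citations(answer: str, num_sources: int) -> dict:
    """Find [Source N] markers and report which ones point at a real source."""
    cited = sorted({int(n) for n in re.findall(r"\[Source (\d+)\]", answer)})
    valid = [n for n in cited if 1 <= n <= num_sources]
    invalid = [n for n in cited if n not in valid]
    return {
        "cited": cited,
        "valid": valid,
        "invalid": invalid,
        "has_citation": bool(valid),
    }


# Hand-written example answer; the third citation is deliberately out of range
sample_answer = (
    "Python was created by Guido van Rossum [Source 1] and is known for "
    "its readable syntax [Source 2]. It is also very fast [Source 7]."
)

report = check_citations(sample_answer, num_sources=3)
print(report)
```

A check like this only confirms that a citation marker refers to an existing source. It does not verify that the cited source actually supports the claim.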

## 8. Exercise: Enhance the RAG System

Extend `SimpleRAG` with the additional features outlined in the stubs below.

In [None]:
# TODO: Complete this exercise
class EnhancedRAG(SimpleRAG):
    """
    Enhanced RAG with:
    - Confidence scoring
    - Query preprocessing
    - Response validation
    - Better error handling
    """
    
    def preprocess_query(self, query: str) -> str:
        """Clean and enhance the query."""
        # TODO: Implement
        # - Remove extra whitespace
        # - Expand abbreviations
        # - Add context if needed
        pass
    
    def assess_confidence(self, query: str, context: list) -> float:
        """Estimate confidence in the answer."""
        # TODO: Implement
        # - Check context relevance
        # - Measure context overlap
        # - Return 0-1 confidence score
        pass
    
    def query(self, question: str) -> dict:
        """Enhanced query with preprocessing and confidence."""
        # TODO: Implement
        # - Preprocess query
        # - Retrieve context
        # - Assess confidence
        # - Generate answer
        # - Return result with confidence
        pass

# Test your implementation
# enhanced = EnhancedRAG()
# enhanced.add_documents(knowledge)
# result = enhanced.query("Your question")
# print(f"Answer: {result['answer']}")
# print(f"Confidence: {result['confidence']}")

## Summary

You learned:
- ✅ What RAG is and why it's useful
- ✅ RAG vs fine-tuning tradeoffs
- ✅ Basic RAG architecture
- ✅ Building a simple RAG system
- ✅ Common challenges and solutions

## Key Takeaways

1. **RAG = Retrieve + Generate**: Find relevant info, then generate with LLM
2. **Better than raw LLM**: Provides current, specific, citable information
3. **Easier than fine-tuning**: Quick to build, easy to update
4. **Quality matters**: Good retrieval → good answers
5. **Prompt engineering**: Critical for RAG quality

## Next Steps
- 📘 Notebook 02: Document Processing Pipeline
- 🔗 Read about [RAG patterns](https://www.pinecone.io/learn/retrieval-augmented-generation/)