# RAG (Retrieval Augmented Generation) - From Scratch

This notebook connects:
1. **Data** → Prepare and chunk documents
2. **Embeddings** → Convert text to vectors (Day 1)
3. **LLMs** → Generate intelligent responses (Day 2)
4. **RAG** → Combine them to answer questions using your own data

## What is RAG?

**Problem:** LLMs don't know about:
- Your company's internal documents
- Recent events (after training cutoff)
- Your personal data

**Solution:** RAG = Retrieval + Generation
1. **Retrieve** relevant documents using embeddings (semantic search)
2. **Augment** the prompt with retrieved context
3. **Generate** response using LLM with context
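The three steps can be sketched end to end without any external service. This is a toy, assuming word-count vectors as a stand-in for real embeddings and a hypothetical `fake_llm` function in place of an LLM API call. The rest of the notebook uses sentence-transformers and real LLM APIs instead.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rag_answer(query: str, docs: List[str], llm: Callable[[str], str], top_k: int = 1) -> str:
    # 1. Retrieve: rank documents by similarity to the query
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]
    # 2. Augment: put the retrieved text into the prompt
    prompt = "Context:\n" + "\n\n".join(ranked) + f"\n\nQuestion: {query}\nAnswer:"
    # 3. Generate: hand the augmented prompt to the model
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: returns the first line of context."""
    return prompt.splitlines()[1]


docs = [
    "Paris is the capital of France.",
    "The mitochondria is the powerhouse of the cell.",
]
print(rag_answer("What is the capital of France?", docs, fake_llm))
```

Swapping `embed` for a real embedding model and `fake_llm` for an API call gives the pipeline built in the rest of this notebook.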

## Production Code Practices

We'll use:
- **Functions** - Reusable, testable code blocks
- **Classes** - When we need to maintain state (like a RAG system)
- **Type hints** - Makes code more readable and catches errors
- **Docstrings** - Explains what functions do

**Why?** In real companies, code needs to be maintainable, reusable, and understandable by teams.

---

# Setup

In [None]:
# Install required packages (uncomment if needed)
# !pip install openai google-genai anthropic sentence-transformers chromadb python-dotenv numpy

In [1]:
from dotenv import load_dotenv
import os
from typing import List, Dict, Tuple
import numpy as np

# Load environment variables
load_dotenv()

print("✅ Environment loaded")

✅ Environment loaded


---

# Part 1: Understanding RAG Components

Let's build RAG step-by-step, starting with simple examples.

## Step 1: Sample Data (Your Knowledge Base)

In production, this could be:
- Company documents (PDFs, docs)
- Customer support tickets
- Product documentation
- Database records

For learning, we'll use simple text chunks.
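The knowledge base below is already split into short passages, but real documents usually need to be chunked first. A minimal sketch, assuming fixed-size word windows with overlap. This is one common strategy among several, and the `chunk_size` and `overlap` values here are illustrative, not tuned:

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_text(sample, chunk_size=5, overlap=2):
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.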

In [2]:
# Sample knowledge base about AI/ML topics
knowledge_base = [
    "Machine learning is a subset of artificial intelligence that enables computers to learn from data without explicit programming. It uses algorithms to identify patterns and make predictions.",
    
    "Deep learning is a type of machine learning that uses neural networks with multiple layers. It's particularly effective for image recognition, natural language processing, and complex pattern recognition tasks.",
    
    "Natural Language Processing (NLP) is a field of AI that focuses on the interaction between computers and human language. It enables machines to understand, interpret, and generate human language.",
    
    "Embeddings are numerical representations of text that capture semantic meaning. Similar texts have similar embedding vectors, which enables semantic search and similarity comparison.",
    
    "RAG (Retrieval Augmented Generation) combines information retrieval with text generation. It retrieves relevant context from a knowledge base and uses it to generate more accurate and informed responses.",
    
    "OpenAI's GPT models are large language models trained on diverse internet text. They can perform various tasks like text generation, summarization, translation, and question answering.",
    
    "Vector databases store embeddings and enable fast similarity search. Popular options include Chroma, Pinecone, Weaviate, and FAISS. They're essential for production RAG systems.",
    
    "Fine-tuning is the process of adapting a pre-trained model to a specific task by training it on domain-specific data. It's useful when you need specialized behavior beyond what prompting can achieve."
]

print(f"Knowledge base has {len(knowledge_base)} documents")
print(f"\nExample document:\n{knowledge_base[0]}")

Knowledge base has 8 documents

Example document:
Machine learning is a subset of artificial intelligence that enables computers to learn from data without explicit programming. It uses algorithms to identify patterns and make predictions.


## Step 2: Create Embeddings

**Why a function?** We'll need to embed both:
- The knowledge base (once, at setup)
- User queries (every time they ask a question)

Using a function avoids code duplication.

In [7]:
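from functools import lru_cache

from sentence_transformers import SentenceTransformer


@lru_cache(maxsize=None)
def load_embedding_model(model_name: str) -> SentenceTransformer:
    """Load a sentence transformer once and reuse it (loading is slow)."""
    return SentenceTransformer(model_name)


def create_embeddings(texts: List[str], model_name: str = "all-MiniLM-L6-v2") -> np.ndarray:
    """
    Convert texts to embedding vectors.
    
    Args:
        texts: List of text strings to embed
        model_name: Name of the sentence transformer model
        
    Returns:
        NumPy array of embeddings (shape: [num_texts, embedding_dim])
        
    Why this is useful:
        - Reusable for both knowledge base and queries
        - Easy to test independently
        - Can swap embedding models easily
    """
    # The model is cached, so embedding each new query doesn't reload it from disk
    model = load_embedding_model(model_name)
    embeddings = model.encode(texts)
    return embeddings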

# Create embeddings for our knowledge base
kb_embeddings = create_embeddings(knowledge_base)

print(f"Created embeddings with shape: {kb_embeddings.shape}")
print(f"Each document is represented as a {kb_embeddings.shape[1]}-dimensional vector")

Created embeddings with shape: (8, 384)
Each document is represented as a 384-dimensional vector


## Step 3: Semantic Search (Retrieval)

Find the most relevant documents for a query using cosine similarity.

**Why a function?** This is the core retrieval logic that we'll use repeatedly.
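Cosine similarity is the dot product of two vectors divided by the product of their lengths: cos(a, b) = a·b / (‖a‖ ‖b‖). The NumPy sketch below shows what `cosine_similarity` plus `argsort` compute inside `retrieve_relevant_docs`, using made-up 3-dimensional vectors rather than real embeddings:

```python
import numpy as np


def top_k_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int) -> list:
    """Return (index, score) pairs for the k documents most similar to the query."""
    # Normalize to unit length so a plain dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    # argsort is ascending; reverse for highest-first, then keep k
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]


docs = np.array([
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.1, 0.0])
print(top_k_cosine(query, docs, k=2))
```

The third document is orthogonal to the query, so its score is 0 and it is dropped by `k=2`.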

In [12]:
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_relevant_docs(
    query: str,
    knowledge_base: List[str],
    kb_embeddings: np.ndarray,
    top_k: int = 3
) -> List[Tuple[str, float]]:
    """
    Retrieve the most relevant documents for a query.
    
    Args:
        query: User's question
        knowledge_base: List of document texts
        kb_embeddings: Pre-computed embeddings of knowledge base
        top_k: Number of documents to retrieve
        
    Returns:
        List of (document_text, similarity_score) tuples
        
    How it works:
        1. Embed the query
        2. Calculate similarity with all documents
        3. Return top-k most similar
    """
    # Embed the query
    query_embedding = create_embeddings([query])
    
    # Calculate similarity scores
    similarities = cosine_similarity(query_embedding, kb_embeddings)[0]
    
    # Get top-k indices
    top_indices = np.argsort(similarities)[::-1][:top_k]  # argsort sorts ascending; [::-1] puts the highest scores first; [:top_k] keeps the best k
    
    # Return documents with scores
    results = [(knowledge_base[i], similarities[i]) for i in top_indices]
    return results

# Test retrieval
query = "What is deep learning?"
relevant_docs = retrieve_relevant_docs(query, knowledge_base, kb_embeddings, top_k=3)

print(f"Query: {query}\n")
for i, (doc, score) in enumerate(relevant_docs, 1):
    print(f"Result {i} (similarity: {score:.4f}):")
    print(f"{doc}\n")

Query: What is deep learning?

Result 1 (similarity: 0.8377):
Deep learning is a type of machine learning that uses neural networks with multiple layers. It's particularly effective for image recognition, natural language processing, and complex pattern recognition tasks.

Result 2 (similarity: 0.5488):
Machine learning is a subset of artificial intelligence that enables computers to learn from data without explicit programming. It uses algorithms to identify patterns and make predictions.

Result 3 (similarity: 0.3874):
Natural Language Processing (NLP) is a field of AI that focuses on the interaction between computers and human language. It enables machines to understand, interpret, and generate human language.
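The third result (about NLP) scores well below the first and is only loosely related to the question. One common refinement is to drop retrieved documents below a minimum similarity so weak matches don't dilute the prompt. A sketch, assuming a hypothetical `min_score` cutoff applied to the `(document, score)` list that `retrieve_relevant_docs` returns. A good threshold depends on the embedding model and has to be chosen empirically:

```python
from typing import List, Tuple


def filter_by_score(
    results: List[Tuple[str, float]], min_score: float = 0.5
) -> List[Tuple[str, float]]:
    """Keep only (document, score) pairs whose similarity reaches min_score."""
    return [(doc, score) for doc, score in results if score >= min_score]


# Scores copied from the retrieval output above (document text shortened)
results = [
    ("Deep learning is a type of machine learning...", 0.8377),
    ("Machine learning is a subset of artificial intelligence...", 0.5488),
    ("Natural Language Processing (NLP) is a field of AI...", 0.3874),
]
for doc, score in filter_by_score(results, min_score=0.5):
    print(f"{score:.4f}  {doc}")
```

If every result falls below the cutoff, the empty list is a useful signal to answer "I don't know" instead of calling the LLM.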



## Step 4: Generate Answer with LLM

Now we use the retrieved context to generate an informed answer.

**Why a function?** Clean separation of concerns - retrieval vs generation.
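One way to push that separation further is to build the chat messages in a pure function, so the prompt can be unit-tested without calling the API. A sketch mirroring the prompt layout used in `generate_answer`; `build_messages` is a hypothetical helper name, not part of the OpenAI SDK:

```python
from typing import Dict, List, Tuple

SYSTEM_PROMPT = (
    "You are a helpful AI assistant. Answer the user's question based on the provided context. "
    "If the context doesn't contain relevant information, say so rather than making up an answer."
)


def build_messages(query: str, context_docs: List[Tuple[str, float]]) -> List[Dict[str, str]]:
    """Assemble OpenAI-style chat messages from a query and retrieved (doc, score) pairs."""
    context = "\n\n".join(doc for doc, _ in context_docs)
    user_prompt = f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer based on the context above:"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("What is RAG?", [("RAG combines retrieval with generation.", 0.9)])
print(messages[1]["content"])
```

The returned list has the shape expected by the `messages=` argument of `client.chat.completions.create(...)`.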

In [None]:
from typing import Any

from openai import OpenAI

def generate_answer(
    query: str,
    context_docs: List[Tuple[str, float]],
    model: str = "gpt-4o-mini"
) -> Dict[str, Any]:
    """
    Generate an answer using retrieved context.
    
    Args:
        query: User's question
        context_docs: Retrieved documents with similarity scores
        model: OpenAI model to use
        
    Returns:
        Dictionary with answer and metadata
        
    Production tip:
        Return metadata (tokens, sources) for debugging and cost tracking
    """
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    
    # Build context from retrieved documents
    context = "\n\n".join([doc for doc, _ in context_docs])
    
    # Create prompt with context
    system_prompt = """You are a helpful AI assistant. Answer the user's question based on the provided context. 
If the context doesn't contain relevant information, say so rather than making up an answer."""
    
    user_prompt = f"""Context:
{context}

Question: {query}

Answer based on the context above:"""
    
    # Generate response
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0.7,
        max_tokens=300
    )
    
    return {
        "answer": response.choices[0].message.content,
        "tokens_used": response.usage.total_tokens,
        "sources": [doc for doc, _ in context_docs],
        "similarity_scores": [score for _, score in context_docs]
    }

# Test RAG pipeline
query = "What is deep learning and what is it good for?"
relevant_docs = retrieve_relevant_docs(query, knowledge_base, kb_embeddings, top_k=2)
result = generate_answer(query, relevant_docs)

print(f"Query: {query}\n")
print(f"Answer:\n{result['answer']}\n")
print(f"Tokens used: {result['tokens_used']}")
print(f"\nSources used (similarity scores):")
for i, (source, score) in enumerate(zip(result['sources'], result['similarity_scores']), 1):
    print(f"{i}. [{score:.3f}] {source[:80]}...")

---

# Part 2: Production-Ready RAG System (Class-Based)

## Why Use a Class?

**Problem with functions:** We keep passing the same data around:
- `knowledge_base`
- `kb_embeddings`
- Model configurations

**Solution:** A class keeps related data and functions together.

**Benefits:**
- **State management** - Store knowledge base, embeddings once
- **Cleaner code** - No need to pass same parameters repeatedly
- **Reusable** - Create multiple RAG systems with different configs
- **Production-ready** - Easy to test, maintain, and extend
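The "easy to test" benefit depends on not hard-wiring network calls into the class. A minimal sketch of that idea, assuming the embedding model and the LLM are passed in as plain callables. `TinyRAG` and `fake_embed` are hypothetical names for illustration; the `SimpleRAG` class below creates its clients internally instead:

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


class TinyRAG:
    """Toy RAG pipeline whose embedder and LLM are injected, so tests can fake them."""

    def __init__(self, embed: Callable[[str], Vector], llm: Callable[[str], str], top_k: int = 1):
        self.embed, self.llm, self.top_k = embed, llm, top_k
        self.docs: List[str] = []
        self.vectors: List[Vector] = []

    def add_documents(self, docs: List[str]) -> None:
        self.docs.extend(docs)
        self.vectors.extend(self.embed(d) for d in docs)

    @staticmethod
    def _cosine(a: Vector, b: Vector) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def query(self, question: str) -> str:
        q = self.embed(question)
        # Rank document indices by cosine similarity to the question
        ranked = sorted(
            range(len(self.docs)),
            key=lambda i: self._cosine(q, self.vectors[i]),
            reverse=True,
        )
        context = "\n\n".join(self.docs[i] for i in ranked[: self.top_k])
        return self.llm(f"Context:\n{context}\n\nQuestion: {question}")


def fake_embed(text: str) -> Vector:
    """Fake 2-d 'embedding' keyed on two words; no model download needed."""
    t = text.lower()
    return [float("cat" in t), float("dog" in t)]


# The fake LLM simply returns its prompt, so the test can inspect what was retrieved
rag = TinyRAG(embed=fake_embed, llm=lambda prompt: prompt)
rag.add_documents(["Cats purr.", "Dogs bark."])
print(rag.query("Why does my dog bark?"))
```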

In [None]:
class SimpleRAG:
    """
    A production-quality RAG system.
    
    This class encapsulates the entire RAG pipeline:
    - Document storage and embedding
    - Semantic search/retrieval
    - Answer generation
    
    Why use a class?
    - Maintains state (knowledge base, embeddings)
    - Provides a clean API (add_documents, query)
    - Easy to configure and reuse
    """
    
    def __init__(
        self,
        embedding_model: str = "all-MiniLM-L6-v2",
        llm_model: str = "gpt-4o-mini",
        top_k: int = 3
    ):
        """
        Initialize the RAG system.
        
        Args:
            embedding_model: Sentence transformer model name
            llm_model: OpenAI model for generation
            top_k: Number of documents to retrieve
        """
        self.embedding_model = SentenceTransformer(embedding_model)
        self.llm_model = llm_model
        self.top_k = top_k
        self.openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
        
        # State: stores documents and their embeddings
        self.documents: List[str] = []
        self.embeddings: np.ndarray = None
    
    def add_documents(self, documents: List[str]) -> None:
        """
        Index documents as the knowledge base (replaces any previously added documents).
        
        Args:
            documents: List of text documents to add
            
        In production:
            - This might load from database/files
            - Could handle incremental updates
            - Might include document metadata
        """
        self.documents = documents
        print(f"Embedding {len(documents)} documents...")
        self.embeddings = self.embedding_model.encode(documents)
        print(f"✅ {len(documents)} documents indexed")
    
    def retrieve(self, query: str) -> List[Tuple[str, float]]:
        """
        Retrieve most relevant documents for a query.
        
        Args:
            query: User's question
            
        Returns:
            List of (document, similarity_score) tuples
        """
        if self.embeddings is None:
            raise ValueError("No documents added. Call add_documents() first.")
        
        # Embed query
        query_embedding = self.embedding_model.encode([query])
        
        # Calculate similarities
        similarities = cosine_similarity(query_embedding, self.embeddings)[0]
        
        # Get top-k
        top_indices = np.argsort(similarities)[::-1][:self.top_k]
        
        return [(self.documents[i], similarities[i]) for i in top_indices]
    
    def generate(self, query: str, context_docs: List[Tuple[str, float]]) -> Dict:
        """
        Generate answer using retrieved context.
        
        Args:
            query: User's question
            context_docs: Retrieved documents with scores
            
        Returns:
            Dictionary with answer and metadata
        """
        # Build context
        context = "\n\n".join([doc for doc, _ in context_docs])
        
        # Create prompt
        system_prompt = """You are a helpful AI assistant. Answer questions based on the provided context.
If the context doesn't contain enough information, acknowledge this limitation."""
        
        user_prompt = f"""Context:
{context}

Question: {query}

Answer:"""
        
        # Generate
        response = self.openai_client.chat.completions.create(
            model=self.llm_model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            temperature=0.7,
            max_tokens=300
        )
        
        return {
            "answer": response.choices[0].message.content,
            "tokens": response.usage.total_tokens,
            "sources": context_docs
        }
    
    def query(self, question: str, return_sources: bool = True) -> Dict:
        """
        Main method: Query the RAG system.
        
        This is the public API - simple to use!
        
        Args:
            question: User's question
            return_sources: Whether to include source documents
            
        Returns:
            Dictionary with answer and optional metadata
        """
        # Step 1: Retrieve
        relevant_docs = self.retrieve(question)
        
        # Step 2: Generate
        result = self.generate(question, relevant_docs)
        
        # Format response
        response = {
            "question": question,
            "answer": result["answer"],
            "tokens_used": result["tokens"]
        }
        
        if return_sources:
            response["sources"] = [
                {"text": doc, "similarity": float(score)}
                for doc, score in result["sources"]
            ]
        
        return response

print("✅ SimpleRAG class defined")

## Using the RAG System

See how clean the API is now!

In [None]:
# Initialize RAG system
rag = SimpleRAG(
    embedding_model="all-MiniLM-L6-v2",
    llm_model="gpt-4o-mini",
    top_k=3
)

# Add knowledge base
rag.add_documents(knowledge_base)

# Query the system
result = rag.query("What is RAG and why is it useful?")

print(f"Question: {result['question']}\n")
print(f"Answer:\n{result['answer']}\n")
print(f"Tokens used: {result['tokens_used']}\n")
print("Sources:")
for i, source in enumerate(result['sources'], 1):
    print(f"{i}. [Similarity: {source['similarity']:.3f}]")
    print(f"   {source['text'][:100]}...\n")

---

# Part 3: Multiple Queries Demo

Let's test different types of questions to see how RAG performs.
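Reading answers by eye is a start; a more repeatable check is to measure whether the retriever returns the document you expect. A sketch of hit rate@k, assuming a hand-labelled mapping from question to the index of its expected document in `knowledge_base`, plus any `retrieve` function that returns ranked document indices. The rankings in `canned` are made up to exercise the metric, not real output:

```python
from typing import Callable, Dict, List


def hit_rate_at_k(
    labelled: Dict[str, int],
    retrieve: Callable[[str, int], List[int]],
    k: int = 3,
) -> float:
    """Fraction of questions whose expected document index appears in the top-k results."""
    if not labelled:
        return 0.0
    hits = sum(1 for q, expected in labelled.items() if expected in retrieve(q, k))
    return hits / len(labelled)


# Hypothetical labels: question -> index of the document that should be retrieved
labelled = {
    "What are embeddings used for?": 3,
    "Which vector databases are mentioned?": 6,
    "What is fine-tuning?": 7,
}

# Stub retriever with made-up rankings; in the notebook you would map rag.retrieve(...) back to indices
canned = {
    "What are embeddings used for?": [3, 6, 4],
    "Which vector databases are mentioned?": [6, 3, 4],
    "What is fine-tuning?": [5, 0, 1],
}
print(hit_rate_at_k(labelled, lambda q, k: canned[q][:k], k=3))
```

Because this only measures retrieval, it is cheap to run after every change to chunking, embedding model, or `top_k`, without spending LLM tokens.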

In [None]:
questions = [
    "What is the difference between machine learning and deep learning?",
    "What are embeddings used for?",
    "Which vector databases are mentioned?",
    "What is fine-tuning?"
]

for i, question in enumerate(questions, 1):
    print(f"\n{'='*80}")
    print(f"Question {i}: {question}")
    print('='*80)
    
    result = rag.query(question, return_sources=False)
    print(f"\nAnswer:\n{result['answer']}")
    print(f"\nTokens: {result['tokens_used']}")

---

# Part 4: RAG with Different LLM Providers

Let's extend our class to support multiple LLM providers (OpenAI, Gemini, Claude).
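Supporting several providers comes down to routing one `generate` call to provider-specific code. Besides the `if/elif` chain used below, a common alternative is a registry dict that maps provider names to functions: unknown providers fail loudly and adding one is a one-line change. A sketch with hypothetical stub functions standing in for the real OpenAI, Gemini, and Claude calls:

```python
from typing import Callable, Dict


# Hypothetical stubs; real versions would call each provider's SDK
def _openai_stub(prompt: str) -> str:
    return f"[openai] {prompt}"


def _gemini_stub(prompt: str) -> str:
    return f"[gemini] {prompt}"


def _claude_stub(prompt: str) -> str:
    return f"[claude] {prompt}"


GENERATORS: Dict[str, Callable[[str], str]] = {
    "openai": _openai_stub,
    "gemini": _gemini_stub,
    "claude": _claude_stub,
}


def generate(provider: str, prompt: str) -> str:
    """Route the prompt to the registered generator for `provider`."""
    try:
        generator = GENERATORS[provider]
    except KeyError:
        raise ValueError(f"Unknown provider {provider!r}; choose from {sorted(GENERATORS)}") from None
    return generator(prompt)


print(generate("claude", "Hello"))
```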

In [None]:
from google import genai
from google.genai import types
from anthropic import Anthropic

class MultiProviderRAG(SimpleRAG):
    """
    Extended RAG system supporting multiple LLM providers.
    
    Why extend the class?
    - Reuses all retrieval logic from SimpleRAG
    - Only adds multi-provider support
    - Demonstrates Object-Oriented Programming (inheritance)
    """
    
    def __init__(
        self,
        provider: str = "openai",  # "openai", "gemini", or "claude"
        embedding_model: str = "all-MiniLM-L6-v2",
        llm_model: str = None,
        top_k: int = 3
    ):
        """
        Initialize multi-provider RAG.
        
        Args:
            provider: Which LLM provider to use
            embedding_model: Sentence transformer model
            llm_model: Provider-specific model (auto-set if None)
            top_k: Number of documents to retrieve
        """
        # Set default models per provider
        default_models = {
            "openai": "gpt-4o-mini",
            "gemini": "gemini-2.5-flash",
            "claude": "claude-3-5-haiku-20241022"
        }
        
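        if provider not in default_models:
            raise ValueError(f"Unknown provider {provider!r}; choose from {list(default_models)}")
        
        self.provider = provider
        llm_model = llm_model or default_models[provider]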
        
        # Initialize parent class
        super().__init__(embedding_model, llm_model, top_k)
        
        # Initialize provider-specific clients
        if provider == "gemini":
            self.gemini_client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
        elif provider == "claude":
            self.claude_client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    
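    def generate(self, query: str, context_docs: List[Tuple[str, float]]) -> Dict:
        """
        Generate answer using the configured provider.
        
        This overrides the parent method to support multiple providers.
        """
        context = "\n\n".join([doc for doc, _ in context_docs])
        
        if self.provider == "openai":
            result = self._generate_openai(query, context)
        elif self.provider == "gemini":
            result = self._generate_gemini(query, context)
        elif self.provider == "claude":
            result = self._generate_claude(query, context)
        else:
            raise ValueError(f"Unsupported provider: {self.provider}")
        
        # The inherited query() expects (document, score) pairs under "sources",
        # so return those rather than the joined context string
        result["sources"] = context_docs
        return result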
    
    def _generate_openai(self, query: str, context: str) -> Dict:
        """Generate using OpenAI."""
        response = self.openai_client.chat.completions.create(
            model=self.llm_model,
            messages=[
                {"role": "system", "content": "Answer based on the provided context."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"}
            ],
            temperature=0.7,
            max_tokens=300
        )
        return {
            "answer": response.choices[0].message.content,
            "tokens": response.usage.total_tokens,
            "sources": context
        }
    
    def _generate_gemini(self, query: str, context: str) -> Dict:
        """Generate using Gemini."""
        prompt = f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer based on the context:"
        
        response = self.gemini_client.models.generate_content(
            model=self.llm_model,
            contents=prompt,
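            config=types.GenerateContentConfig(
                temperature=0.7,
                max_output_tokens=300,
                # gemini-2.5-flash "thinks" by default and those tokens count toward
                # max_output_tokens; a budget of 0 turns thinking off for this short answer
                thinking_config=types.ThinkingConfig(thinking_budget=0)
            )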
        )
        return {
            "answer": response.text,
            "tokens": response.usage_metadata.total_token_count,
            "sources": context
        }
    
    def _generate_claude(self, query: str, context: str) -> Dict:
        """Generate using Claude."""
        prompt = f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer based on the context above:"
        
        response = self.claude_client.messages.create(
            model=self.llm_model,
            max_tokens=300,
            temperature=0.7,
            messages=[{"role": "user", "content": prompt}]
        )
        return {
            "answer": response.content[0].text,
            "tokens": response.usage.input_tokens + response.usage.output_tokens,
            "sources": context
        }

print("✅ MultiProviderRAG class defined")

## Compare Different LLM Providers

In [None]:
# Test question
test_question = "What is deep learning and what are its applications?"

providers = ["openai", "gemini", "claude"]

for provider in providers:
    print(f"\n{'='*80}")
    print(f"Testing with {provider.upper()}")
    print('='*80)
    
    try:
        # Create RAG system for this provider
        rag = MultiProviderRAG(provider=provider, top_k=2)
        rag.add_documents(knowledge_base)
        
        # Query
        result = rag.query(test_question, return_sources=False)
        
        print(f"\nAnswer:\n{result['answer']}")
        print(f"\nTokens used: {result['tokens_used']}")
        
    except Exception as e:
        print(f"Error with {provider}: {e}")
        print("Make sure the provider's API key (plus OPENAI_API_KEY, which SimpleRAG always loads) is set in your .env file")

---

# Part 5: RAG with Vector Database (ChromaDB)

## Why Vector Databases?

Our in-memory approach works for small datasets, but in production:
- **Scalability**: Millions of documents
- **Persistence**: Data survives restarts
- **Speed**: Optimized similarity search
- **Features**: Filtering, metadata, updates

ChromaDB is a good fit for learning: it has a simple API and runs locally.
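Under the hood, a vector store keeps each embedding together with its text and metadata, optionally filters on that metadata, and ranks what remains by similarity. A minimal in-memory sketch of the idea in plain Python. `ToyVectorStore` and its `where` equality filter are hypothetical, loosely modelled on the kind of metadata filter ChromaDB offers; real databases add approximate-nearest-neighbour indexes and persistence on top:

```python
import math
from typing import Any, Dict, List, Optional, Sequence, Tuple


class ToyVectorStore:
    """In-memory store: exact cosine search with an optional metadata equality filter."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, Sequence[float], Dict[str, Any]]] = []

    def add(self, text: str, vector: Sequence[float], metadata: Dict[str, Any]) -> None:
        self._rows.append((text, vector, metadata))

    def query(
        self, vector: Sequence[float], n_results: int = 3, where: Optional[Dict[str, Any]] = None
    ) -> List[Tuple[str, float]]:
        def cosine(a: Sequence[float], b: Sequence[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.hypot(*a) * math.hypot(*b)
            return dot / norm if norm else 0.0

        # 1. Filter: keep rows whose metadata matches every key/value in `where`
        rows = [
            r for r in self._rows
            if not where or all(r[2].get(k) == v for k, v in where.items())
        ]
        # 2. Rank the survivors by similarity (exact search; real DBs use ANN indexes)
        scored = [(text, cosine(vector, vec)) for text, vec, _ in rows]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n_results]


store = ToyVectorStore()
store.add("Intro to neural networks", [0.9, 0.1], {"topic": "AI"})
store.add("Sourdough starter guide", [0.8, 0.2], {"topic": "cooking"})
store.add("Transformers explained", [0.7, 0.3], {"topic": "AI"})
print(store.query([1.0, 0.0], n_results=2, where={"topic": "AI"}))
```

Without the filter, the cooking document would take second place because its vector happens to be close to the query; the metadata filter removes it before ranking.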

In [None]:
import chromadb
from chromadb.utils import embedding_functions

class ProductionRAG:
    """
    Production-ready RAG using ChromaDB.
    
    Why this is better for production:
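    - Can persist to disk (this demo uses the in-memory chromadb.Client(); chromadb.PersistentClient(path=...) writes to disk)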
    - Handles large datasets efficiently
    - Built-in embedding generation
    - Metadata filtering
    """
    
    def __init__(
        self,
        collection_name: str = "knowledge_base",
        llm_provider: str = "openai",
        llm_model: str = "gpt-4o-mini",
        top_k: int = 3
    ):
        """
        Initialize production RAG system.
        
        Args:
            collection_name: Name for the vector collection
            llm_provider: LLM provider (only 'openai' is implemented in this class)
            llm_model: Model name
            top_k: Number of results to retrieve
        """
        # Initialize ChromaDB (in-memory; use chromadb.PersistentClient(path="...") to keep data across restarts)
        self.client = chromadb.Client()
        
        # Use sentence transformers for embeddings
        self.embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(
            model_name="all-MiniLM-L6-v2"
        )
        
        # Create or get collection
        self.collection = self.client.get_or_create_collection(
            name=collection_name,
            embedding_function=self.embedding_function
        )
        
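        if llm_provider != "openai":
            raise NotImplementedError("ProductionRAG currently only supports llm_provider='openai'")
        self.llm_provider = llm_provider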
        self.llm_model = llm_model
        self.top_k = top_k
        
        # Initialize LLM client
        self.openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    
    def add_documents(
        self,
        documents: List[str],
        metadatas: List[Dict] = None,
        ids: List[str] = None
    ) -> None:
        """
        Add documents to the vector database.
        
        Args:
            documents: List of text documents
            metadatas: Optional metadata for each document
            ids: Optional IDs (default: doc_0, doc_1, ...; pass unique IDs when adding more than one batch)
        """
        if ids is None:
            ids = [f"doc_{i}" for i in range(len(documents))]
        
        if metadatas is None:
            metadatas = [{"source": "knowledge_base"} for _ in documents]
        
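        # upsert (instead of add) inserts new IDs and updates existing ones, so re-running this cell is safe
        self.collection.upsert(
            documents=documents,
            metadatas=metadatas,
            ids=ids
        )
        print(f"✅ Added {len(documents)} documents to ChromaDB")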
    
    def query(self, question: str) -> Dict:
        """
        Query the RAG system.
        
        Args:
            question: User's question
            
        Returns:
            Dictionary with answer and metadata
        """
        # Retrieve from ChromaDB
        results = self.collection.query(
            query_texts=[question],
            n_results=self.top_k
        )
        
        # Extract documents and distances
        documents = results['documents'][0]
        distances = results['distances'][0]
        
        # Build context
        context = "\n\n".join(documents)
        
        # Generate answer
        prompt = f"""Context:
{context}

Question: {question}

Answer based on the context above:"""
        
        response = self.openai_client.chat.completions.create(
            model=self.llm_model,
            messages=[
                {"role": "system", "content": "Answer based on the provided context."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.7,
            max_tokens=300
        )
        
        return {
            "question": question,
            "answer": response.choices[0].message.content,
            "tokens_used": response.usage.total_tokens,
            "sources": [
                {"text": doc, "distance": dist}
                for doc, dist in zip(documents, distances)
            ]
        }

print("✅ ProductionRAG class defined")

In [None]:
# Create production RAG system
prod_rag = ProductionRAG(
    collection_name="ai_knowledge",
    llm_model="gpt-4o-mini",
    top_k=3
)

# Add documents with metadata
metadatas = [{"topic": "AI", "index": i} for i in range(len(knowledge_base))]
prod_rag.add_documents(knowledge_base, metadatas=metadatas)

# Query
result = prod_rag.query("What are vector databases and why are they important for RAG?")

print(f"Question: {result['question']}\n")
print(f"Answer:\n{result['answer']}\n")
print(f"Tokens used: {result['tokens_used']}\n")
print("Sources (lower distance = more relevant):")
for i, source in enumerate(result['sources'], 1):
    print(f"{i}. [Distance: {source['distance']:.3f}]")
    print(f"   {source['text'][:100]}...\n")
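ChromaDB reports a distance rather than a similarity. Unless a collection is configured otherwise (Chroma lets you pick the space per collection, e.g. `metadata={"hnsw:space": "cosine"}`; check the docs for your version), it uses squared Euclidean (L2) distance. For unit-length vectors the two measures are tied together: ‖a − b‖² = ‖a‖² + ‖b‖² − 2a·b = 2 − 2·cos(a, b), so sorting by ascending distance gives the same order as sorting by descending cosine similarity. all-MiniLM-L6-v2 includes a normalization step, so its vectors should be unit length; verify this before relying on it with other models. A quick NumPy check of the identity with random unit vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random vectors scaled to unit length (like normalized sentence embeddings)
a = rng.normal(size=384)
b = rng.normal(size=384)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

cos_sim = float(a @ b)
squared_l2 = float(np.sum((a - b) ** 2))

# For unit vectors: squared L2 distance == 2 - 2 * cosine similarity
print(squared_l2, 2 - 2 * cos_sim)
```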

---

# Summary: Your Complete RAG Journey

## What You Learned

### 1. **The RAG Pipeline**
```
Data → Embeddings → Vector Store → Retrieval → LLM → Answer
```

### 2. **Production Code Practices**

**Functions:**
- ✅ Reusable code blocks
- ✅ Easy to test
- ✅ Clear inputs/outputs

**Classes:**
- ✅ Manage state (knowledge base, embeddings)
- ✅ Clean API (`rag.query()`)
- ✅ Extensible (inheritance for multi-provider)

**Type Hints & Docstrings:**
- ✅ Self-documenting code
- ✅ Catches errors early (when checked with a type checker such as mypy)
- ✅ Better IDE support

### 3. **From Simple to Production**

| Approach | Best For | Limitations |
|----------|----------|-------------|
| **Functions** | Learning, prototypes | Passing data repeatedly |
| **SimpleRAG Class** | Small projects | In-memory only |
| **MultiProviderRAG** | Flexibility | Still in-memory |
| **ProductionRAG** | Real applications | Requires vector DB |

### 4. **Key Concepts**

- **Semantic Search**: Find relevant info using meaning, not keywords
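- **Context Window**: The maximum number of tokens the LLM can process in one request (prompt plus response)
- **Top-K Retrieval**: Get K most relevant documents
- **Cosine Similarity**: Measure how similar two vectors are, from -1 (opposite directions) to 1 (same direction)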
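The context window is the reason top-k stays small: every retrieved document costs prompt tokens. A sketch of packing ranked documents into a fixed budget, assuming a rough heuristic of about 4 characters per token for English text. A real system would count with the model's own tokenizer (for example `tiktoken` for OpenAI models):

```python
from typing import List


def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (heuristic, not a tokenizer)."""
    return max(1, len(text) // 4)


def pack_context(ranked_docs: List[str], max_tokens: int) -> List[str]:
    """Take documents in rank order until the next one would exceed the budget."""
    packed, used = [], 0
    for doc in ranked_docs:
        cost = approx_tokens(doc)
        if used + cost > max_tokens:
            break
        packed.append(doc)
        used += cost
    return packed


docs = ["a" * 400, "b" * 400, "c" * 400]  # ~100 tokens each under the heuristic
print(len(pack_context(docs, max_tokens=250)))
```

Stopping at the first document that doesn't fit keeps strictly the highest-ranked material; skipping ahead to smaller documents would fill the budget more fully at the cost of rank order.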

## Next Steps

1. ✅ **Practice with your own data**
   - PDFs, documentation, notes
   - Use `pypdf` (the maintained successor to `PyPDF2`) or `langchain` document loaders

2. ✅ **Advanced RAG Techniques**
   - Chunking strategies
   - Hybrid search (keyword + semantic)
   - Re-ranking retrieved documents
   - Parent-child chunking

3. ✅ **Production Enhancements**
   - Error handling
   - Logging and monitoring
   - Caching for repeated queries
   - Rate limiting

4. ✅ **Learn Advanced Tools**
   - LangChain (RAG framework)
   - LlamaIndex (data framework)
   - Vector databases (Pinecone, Weaviate)
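For the hybrid-search item above, a widely used way to merge a keyword ranking (such as BM25) with a semantic ranking is Reciprocal Rank Fusion (RRF): each document scores the sum of 1/(k + rank) over the lists it appears in, with k = 60 as a commonly used default. A sketch over two hypothetical rankings of document IDs:

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document IDs; rank positions start at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


keyword_ranking = ["doc_2", "doc_0", "doc_5"]   # hypothetical BM25 order
semantic_ranking = ["doc_0", "doc_3", "doc_2"]  # hypothetical embedding order
print(reciprocal_rank_fusion([keyword_ranking, semantic_ranking]))
```

RRF only uses rank positions, so it sidesteps the problem that BM25 scores and cosine similarities live on different scales.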

## Your AI Engineering Path

```
✅ Day 1: Embeddings
✅ Day 2: LLM APIs
✅ Day 3: RAG
🔜 Advanced Prompting
🔜 Agents & Function Calling
🔜 Fine-tuning
🔜 Production Deployment
```

You're building real AI engineering skills! 🚀