# RAG Engine Mini - Hands-On Learning Notebook

Welcome to the RAG Engine Mini hands-on learning notebook! This notebook will guide you through the key concepts and implementation details of the RAG system.

## Table of Contents
1. [Project Overview](#overview)
2. [Architecture Deep Dive](#architecture)
3. [Domain Layer Exploration](#domain)
4. [Application Layer Walkthrough](#application)
5. [Adapters Layer Analysis](#adapters)
6. [API Layer Interaction](#api)
7. [Workers Layer Understanding](#workers)
8. [Complete RAG Pipeline](#pipeline)
9. [Practical Exercises](#exercises)

<a id="overview"></a>
## 1. Project Overview

RAG Engine Mini is a production-grade Retrieval-Augmented Generation (RAG) starter template that bridges the gap between notebook experiments and real-world AI systems. Built with Clean Architecture principles, it provides a solid foundation for building intelligent document Q&A systems.

In [None]:
# Let's start by exploring the project structure
import os
from pathlib import Path

# Define the project root
project_root = Path("../../../")  # Adjust based on your notebook location
print(f"Project Root: {project_root}")

# List main directories
main_dirs = [d for d in project_root.iterdir() if d.is_dir()]
print("\nMain directories:")
for d in main_dirs:
    print(f"  - {d.name}")

In [None]:
# Explore the source code structure
src_dir = project_root / "src"
if src_dir.exists():
    print("\nSource code structure:")
    for d in src_dir.iterdir():
        if d.is_dir():
            print(f"  - {d.name}")
else:
    print("Source directory not found")

<a id="architecture"></a>
## 2. Architecture Deep Dive

The RAG Engine Mini follows Clean Architecture principles with four main layers (background jobs run in a separate workers layer, covered in Section 7):

1. **Domain Layer**: Pure business logic with no external dependencies
2. **Application Layer**: Use cases, services, and ports (interfaces)
3. **Adapters Layer**: Concrete implementations (DB, vector store, LLM, etc.)
4. **API Layer**: FastAPI routes and controllers
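To make the layering concrete, here is a minimal sketch of the ports-and-adapters idea in plain Python. All names (`VectorStorePort`, `AskQuestionUseCase`, `InMemoryKeywordStore`) are hypothetical and not taken from the project. What matters is the dependency direction: the use case depends only on a `typing.Protocol` port, and the adapter satisfies that port structurally, so an outer layer can be swapped without touching the inner ones.

```python
from dataclasses import dataclass
from typing import List, Protocol


# Domain layer: a plain entity with no framework dependencies
@dataclass(frozen=True)
class Chunk:
    id: str
    text: str


# Application layer: a port (interface) that the use case depends on
class VectorStorePort(Protocol):
    def search(self, query: str, top_k: int) -> List[Chunk]: ...


# Application layer: a use case that only knows about the port
class AskQuestionUseCase:
    def __init__(self, store: VectorStorePort) -> None:
        self._store = store

    def execute(self, question: str, top_k: int = 2) -> List[str]:
        return [chunk.text for chunk in self._store.search(question, top_k)]


# Adapters layer: one concrete implementation of the port
class InMemoryKeywordStore:
    def __init__(self, chunks: List[Chunk]) -> None:
        self._chunks = chunks

    def search(self, query: str, top_k: int) -> List[Chunk]:
        words = set(query.lower().split())
        # Rank chunks by how many query words they contain
        ranked = sorted(
            self._chunks,
            key=lambda c: len(words & set(c.text.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]


# The outermost layer (API or a test) wires the adapter into the use case
store = InMemoryKeywordStore([
    Chunk("c1", "qdrant stores vectors"),
    Chunk("c2", "celery runs background tasks"),
])
use_case = AskQuestionUseCase(store)
print(use_case.execute("where are vectors stored", top_k=1))  # ['qdrant stores vectors']
```

Replacing `InMemoryKeywordStore` with a Qdrant-backed class that exposes the same `search` signature would require no change to `AskQuestionUseCase`.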

In [None]:
# Let's examine the main.py file to understand the application structure
main_py_path = src_dir / "main.py"
if main_py_path.exists():
    with open(main_py_path, 'r', encoding='utf-8') as f:
        content = f.read()
        
    # Print first 50 lines to understand the structure
    lines = content.split('\n')
    print("First 50 lines of main.py:")
    for i, line in enumerate(lines[:50]):
        print(f"{i+1:2d}: {line}")
else:
    print("main.py not found")

In [None]:
# Let's look at the bootstrap module to understand dependency injection
bootstrap_path = src_dir / "core" / "bootstrap.py"
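if bootstrap_path.exists():
    with open(bootstrap_path, 'r', encoding='utf-8') as f:
        lines = f.read().split('\n')

    # Find the get_container function and print it together with its decorators
    def_idx = next((i for i, line in enumerate(lines) if line.startswith('def get_container')), None)
    if def_idx is None:
        print("get_container not found in bootstrap.py")
    else:
        # Walk upward to include decorators such as @lru_cache
        start = def_idx
        while start > 0 and lines[start - 1].startswith('@'):
            start -= 1
        # Python functions have no closing brace: the function ends at the next
        # top-level line (non-blank and not indented; a ')' closing a multi-line
        # signature still belongs to the function)
        end = def_idx + 1
        while end < len(lines) and (not lines[end].strip() or lines[end][0] in ' \t)'):
            end += 1
        for i in range(start, end):
            print(f"{i+1:3d}: {lines[i]}")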
else:
    print("bootstrap.py not found")

<a id="domain"></a>
## 3. Domain Layer Exploration

The domain layer contains pure business logic and entities. Let's explore the key domain concepts.

In [None]:
# Look at the domain entities
domain_entities_path = src_dir / "domain" / "entities.py"
if domain_entities_path.exists():
    with open(domain_entities_path, 'r', encoding='utf-8') as f:
        content = f.read()
        
    print("Domain Entities:")
    print(content[:1000])  # Show first 1000 characters
else:
    print("entities.py not found")

In [None]:
# Let's examine the TenantId value object
import re

if domain_entities_path.exists():
    with open(domain_entities_path, 'r', encoding='utf-8') as f:
        content = f.read() + '\n'  # ensure the last line ends with a newline

    # Python classes have no closing brace, so match any decorators, the
    # class line, and then every following line that is indented or blank
    tenant_id_pattern = r"^(?:@[^\n]*\n)*class TenantId\b[^\n]*\n(?:[ \t]+[^\n]*\n|[ \t]*\n)*"
    found = re.search(tenant_id_pattern, content, re.MULTILINE)

    if found:
        print("TenantId class definition:")
        print(found.group(0).rstrip())
    else:
        print("TenantId class not found in the expected format")
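The extraction cell looks for a `@dataclass(frozen=True)` class named `TenantId`, which suggests a value object. A minimal sketch of such a value object follows; the empty-string check in `__post_init__` is an assumed validation rule, not necessarily what `entities.py` does.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class TenantId:
    """Sketch of a value object: immutable and compared by value."""
    value: str

    def __post_init__(self) -> None:
        # Assumed validation rule: reject empty or whitespace-only identifiers
        if not self.value.strip():
            raise ValueError("TenantId cannot be empty")


tenant = TenantId("tenant_a")
print(tenant == TenantId("tenant_a"))  # True: equality compares values

try:
    tenant.value = "tenant_b"  # type: ignore[misc]
except FrozenInstanceError:
    print("TenantId is immutable")
```

Because `frozen=True` together with the default `eq=True` makes the dataclass generate `__hash__`, a `TenantId` can also serve as a dictionary key or set member.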

<a id="application"></a>
## 4. Application Layer Walkthrough

The application layer orchestrates business logic through use cases and services. Let's explore the key components.

In [None]:
# Look at the ask question hybrid use case
ask_hybrid_path = src_dir / "application" / "use_cases" / "ask_question_hybrid.py"
if ask_hybrid_path.exists():
    with open(ask_hybrid_path, 'r', encoding='utf-8') as f:
        content = f.read()
        
    print("AskQuestionHybridUseCase - First 100 lines:")
    lines = content.split('\n')
    for i, line in enumerate(lines[:100]):
        print(f"{i+1:3d}: {line}")
else:
    print("ask_question_hybrid.py not found")

In [None]:
# Look at the AskHybridRequest class
if ask_hybrid_path.exists():
    with open(ask_hybrid_path, 'r', encoding='utf-8') as f:
        content = f.read()
        
    # Extract AskHybridRequest class
    request_pattern = r"(@dataclass\s+class AskHybridRequest:[^}]+?\})"
    matches = re.findall(request_pattern, content, re.DOTALL)
    
    if matches:
        print("AskHybridRequest class definition:")
        print(matches[0])
    else:
        print("AskHybridRequest class not found in the expected format")
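A hybrid use case typically runs a semantic (vector) search and a keyword search, then merges the two result lists. One widely used merging technique is Reciprocal Rank Fusion (RRF), sketched below with the commonly cited constant `k = 60`. Whether `AskQuestionHybridUseCase` uses RRF, weighted score blending, or something else is not shown in this notebook; `reciprocal_rank_fusion` and the chunk IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of chunk IDs into one ranking.

    Each list adds 1 / (k + rank) to a chunk's score (rank starts at 1),
    so chunks ranked highly by several retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)


vector_hits = ["c3", "c1", "c2"]   # e.g. from embedding similarity search
keyword_hits = ["c1", "c4", "c3"]  # e.g. from keyword (BM25-style) search
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))  # ['c1', 'c3', 'c4', 'c2']
```

RRF looks only at ranks, so it needs no normalization between the incompatible score scales of vector and keyword search; `c1` wins here because both retrievers place it near the top.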

<a id="adapters"></a>
## 5. Adapters Layer Analysis

The adapters layer provides concrete implementations for the ports defined in the application layer. Let's explore some key adapters.

In [None]:
# Look at the Qdrant vector store adapter
qdrant_adapter_path = src_dir / "adapters" / "vector" / "qdrant_store.py"
if qdrant_adapter_path.exists():
    with open(qdrant_adapter_path, 'r', encoding='utf-8') as f:
        content = f.read()
        
    print("QdrantVectorStore - First 50 lines:")
    lines = content.split('\n')
    for i, line in enumerate(lines[:50]):
        print(f"{i+1:2d}: {line}")
else:
    print("qdrant_store.py not found")
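To see what a vector-store adapter has to do, here is a hypothetical in-memory stand-in that ranks by cosine similarity, cos(a, b) = (a · b) / (‖a‖ ‖b‖), and filters by tenant before scoring. The class and method names are assumptions; the real `QdrantVectorStore` interface may differ, and Qdrant itself supports payload filters that a real adapter could use for the same isolation.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StoredPoint:
    tenant_id: str
    chunk_id: str
    vector: List[float]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """(a . b) / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector-store adapter such as QdrantVectorStore."""

    def __init__(self) -> None:
        self._points: List[StoredPoint] = []

    def add(self, tenant_id: str, chunk_id: str, vector: List[float]) -> None:
        self._points.append(StoredPoint(tenant_id, chunk_id, vector))

    def search(self, tenant_id: str, query: List[float], top_k: int = 3) -> List[Tuple[str, float]]:
        # Filter by tenant before scoring, so other tenants' chunks never compete
        scored = [
            (p.chunk_id, cosine_similarity(query, p.vector))
            for p in self._points
            if p.tenant_id == tenant_id
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]


store = InMemoryVectorStore()
store.add("tenant_a", "a1", [1.0, 0.0])
store.add("tenant_a", "a2", [0.0, 1.0])
store.add("tenant_b", "b1", [1.0, 0.0])
print([chunk_id for chunk_id, _ in store.search("tenant_a", [0.9, 0.1])])  # ['a1', 'a2']
```

Filtering before ranking, rather than dropping other tenants' hits afterwards, also guarantees that all `top_k` results belong to the caller.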

In [None]:
# Look at the OpenAI LLM adapter
openai_adapter_path = src_dir / "adapters" / "llm" / "openai_llm.py"
if openai_adapter_path.exists():
    with open(openai_adapter_path, 'r', encoding='utf-8') as f:
        content = f.read()
        
    print("OpenAILLM - First 50 lines:")
    lines = content.split('\n')
    for i, line in enumerate(lines[:50]):
        print(f"{i+1:2d}: {line}")
else:
    print("OpenAI LLM adapter not found")
    
    # Look for any LLM adapter
    llm_dir = src_dir / "adapters" / "llm"
    if llm_dir.exists():
        print("\nAvailable LLM adapters:")
        for adapter in llm_dir.iterdir():
            if adapter.suffix == '.py':
                print(f"  - {adapter.name}")
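Whichever LLM adapter is available, the use case only needs something that turns a prompt into text. The sketch below shows a hypothetical `LLMPort` protocol, a `build_prompt` helper that numbers the retrieved chunks so the answer can cite them, and a `StubLLM` test double. The prompt wording is an illustrative assumption, not the project's template.

```python
from typing import List, Protocol


class LLMPort(Protocol):
    """Hypothetical port: any object with generate(prompt) -> str can act as the LLM."""

    def generate(self, prompt: str) -> str: ...


def build_prompt(question: str, chunks: List[str]) -> str:
    # Number each chunk so the model can cite its sources as [1], [2], ...
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


class StubLLM:
    """Hypothetical test double: records the prompt and returns a canned answer."""

    def __init__(self, answer: str) -> None:
        self.answer = answer
        self.last_prompt = ""

    def generate(self, prompt: str) -> str:
        self.last_prompt = prompt
        return self.answer


llm: LLMPort = StubLLM("RAG retrieves context and then generates an answer [1][2].")
prompt = build_prompt("What is RAG?", ["RAG retrieves relevant chunks.", "An LLM writes the answer."])
print(prompt)
print(llm.generate(prompt))
```

Swapping `StubLLM` for the OpenAI adapter changes nothing in `build_prompt` or in the code that calls `generate`, which is what makes offline tests of the pipeline possible.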

<a id="api"></a>
## 6. API Layer Interaction

The API layer exposes functionality through FastAPI routes. Let's explore the key endpoints.

In [None]:
# Look at the ask route
ask_route_path = src_dir / "api" / "v1" / "routes_ask.py"
if ask_route_path.exists():
    with open(ask_route_path, 'r', encoding='utf-8') as f:
        content = f.read()
        
    print("Routes Ask - Full content:")
    print(content)
else:
    print("routes_ask.py not found")

In [None]:
# Look at the dependencies module
deps_path = src_dir / "api" / "v1" / "deps.py"
if deps_path.exists():
    with open(deps_path, 'r', encoding='utf-8') as f:
        content = f.read()
        
    print("API Dependencies:")
    print(content)
else:
    print("deps.py not found")
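Multi-tenant APIs commonly resolve the calling tenant in a per-request dependency. The sketch below assumes the tenant is looked up from an API key; `API_KEYS`, `AuthError`, and `get_tenant_id` are hypothetical and may not match what `deps.py` actually does. In FastAPI, a function like this would typically read the key with `Header()`, be injected into routes with `Depends()`, and raise `HTTPException(status_code=401)` instead of `AuthError`.

```python
from typing import Dict, Optional


class AuthError(Exception):
    """Hypothetical error type; a FastAPI dependency would raise HTTPException(401)."""


# Hypothetical key registry; a real service would query a database or secret store
API_KEYS: Dict[str, str] = {"key-abc": "tenant_a", "key-xyz": "tenant_b"}


def get_tenant_id(api_key: Optional[str]) -> str:
    """Resolve the calling tenant from the API key sent with the request."""
    if not api_key:
        raise AuthError("missing API key")
    tenant_id = API_KEYS.get(api_key)
    if tenant_id is None:
        raise AuthError("invalid API key")
    return tenant_id


print(get_tenant_id("key-abc"))  # tenant_a
```

Every downstream use case then receives an explicit tenant ID, which is the value the domain's `TenantId` wraps.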

<a id="workers"></a>
## 7. Workers Layer Understanding

The workers layer handles background processing tasks using Celery. Let's explore the key tasks.

In [None]:
# Look at the tasks module
tasks_path = src_dir / "workers" / "tasks.py"
if tasks_path.exists():
    with open(tasks_path, 'r', encoding='utf-8') as f:
        content = f.read()
        
    print("Celery Tasks - First 100 lines:")
    lines = content.split('\n')
    for i, line in enumerate(lines[:100]):
        print(f"{i+1:3d}: {line}")
else:
    print("tasks.py not found")
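Background indexing jobs should leave each document in a clear state and be safe to run twice, because a broker can deliver the same task more than once (for example after a worker crash with late acknowledgement). The sketch below is plain Python, not Celery code: `index_document_task` and the `'failed'` status are assumptions, while `'uploaded'`, `'processing'`, and `'indexed'` mirror the simulator in Section 8. In Celery the body would be wrapped in a task decorator, and options such as `bind=True` with `self.retry(...)` can add retries.

```python
from typing import Callable, Dict, List


def index_document_task(
    doc_id: str,
    documents: Dict[str, dict],
    chunker: Callable[[str], List[str]],
    embed: Callable[[str], List[float]],
    vectors: Dict[str, List[float]],
) -> str:
    """Hypothetical indexing job: chunk, embed, and track status transitions."""
    doc = documents[doc_id]
    if doc["status"] == "indexed":
        return doc["status"]  # idempotent: a duplicate delivery does no extra work

    doc["status"] = "processing"
    try:
        for i, text in enumerate(chunker(doc["content"])):
            # Deterministic IDs mean a retry overwrites vectors instead of duplicating them
            vectors[f"{doc_id}_chunk_{i}"] = embed(text)
    except Exception:
        doc["status"] = "failed"  # assumed terminal state the API can report
        raise
    doc["status"] = "indexed"
    return doc["status"]


docs = {"doc-1": {"content": "RAG retrieves. Then it generates", "status": "uploaded"}}
vectors: Dict[str, List[float]] = {}
index_document_task("doc-1", docs, lambda t: t.split(". "), lambda t: [float(len(t))], vectors)
print(docs["doc-1"]["status"], sorted(vectors))  # indexed ['doc-1_chunk_0', 'doc-1_chunk_1']
```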

<a id="pipeline"></a>
## 8. Complete RAG Pipeline

Now let's simulate the complete RAG pipeline by walking through the key steps.

In [None]:
# Let's create a simplified simulation of the RAG pipeline
import time
from dataclasses import dataclass
from typing import List, Optional

# Simplified domain entities
@dataclass
class DocumentId:
    value: str

@dataclass
class TenantId:
    value: str

@dataclass
class Chunk:
    id: str
    text: str
    score: float = 0.0

# Simplified RAG pipeline simulation
class RAGPipelineSimulator:
    def __init__(self):
        self.documents = {}
        self.chunks = []
        self.vectors = {}  # Mock vector store
        
    def upload_document(self, tenant_id: str, content: str, filename: str):
        print(f"Step 1: Uploading document '{filename}' for tenant {tenant_id}")
        
        # Create document ID
        import uuid
        doc_id = str(uuid.uuid4())
        
        # Store document
        self.documents[doc_id] = {
            'tenant_id': tenant_id,
            'content': content,
            'filename': filename,
            'status': 'uploaded'
        }
        
        print(f"  Created document ID: {doc_id}")
        return doc_id
    
    def index_document(self, doc_id: str):
        print(f"\nStep 2: Indexing document {doc_id}")
        
        # Update status
        self.documents[doc_id]['status'] = 'processing'
        print("  Status updated to 'processing'")
        
        # Simulate text extraction
        content = self.documents[doc_id]['content']
        print(f"  Extracted text (first 50 chars): {content[:50]}...")
        
        # Simulate chunking
        print("  Performing chunking...")
        # Simple chunking: split into sentences, treating newlines as spaces
        # and skipping empty fragments
        sentences = [s.strip() for s in content.replace('\n', ' ').split('. ') if s.strip()]
        chunks = []
        for i, sentence in enumerate(sentences[:5]):  # Only the first 5 for the demo
            chunk_id = f"{doc_id}_chunk_{i}"
            chunk = Chunk(id=chunk_id, text=sentence)
            chunks.append(chunk)

            # Simulate embedding with a toy vector built from character codes;
            # a real system would call an embedding model or API here
            vector = [float(ord(c) % 100) for c in sentence[:20]]
            self.vectors[chunk_id] = vector
        
        self.chunks.extend(chunks)
        print(f"  Created {len(chunks)} chunks and stored vectors")
        
        # Update status
        self.documents[doc_id]['status'] = 'indexed'
        print("  Status updated to 'indexed'")
        
    def search(self, query: str, top_k: int = 3):
        print(f"\nStep 3: Searching for query: '{query}'")
        
        # Simulate query embedding
        query_vector = [float(ord(c) % 100) for c in query[:20]]  # Mock vector
        print(f"  Generated query vector (first 10 dims): {query_vector[:10]}")
        
        # Simulate similarity search
        print("  Performing similarity search...")
        
        # Calculate mock similarities
        results = []
        for chunk in self.chunks:
            # Mock similarity; a real vector store would typically use cosine similarity or a dot product
            similarity = sum(min(q, v) for q, v in zip(query_vector, self.vectors[chunk.id]))
            chunk.score = similarity
            results.append(chunk)
        
        # Sort by score and return top-k
        results.sort(key=lambda x: x.score, reverse=True)
        top_results = results[:top_k]
        
        print(f"  Found {len(top_results)} relevant chunks")
        return top_results
    
    def generate_answer(self, query: str, context_chunks: List[Chunk]):
        print(f"\nStep 4: Generating answer for query: '{query}'")
        
        # Build context from chunks
        context = " ".join([chunk.text for chunk in context_chunks])
        print(f"  Context length: {len(context)} characters")
        
        # Simulate LLM generation
        print("  Generating answer with LLM...")
        time.sleep(0.5)  # Simulate processing time
        
        # Mock answer generation
        answer = f"Based on the provided context, the answer to '{query}' is: This is a simulated answer based on the context: '{context[:100]}...'"
        
        print(f"  Generated answer: {answer[:100]}...")
        return answer
    
    def rag_pipeline(self, tenant_id: str, query: str, doc_content: Optional[str] = None):
        print("="*60)
        print("STARTING RAG PIPELINE SIMULATION")
        print("="*60)
        
        # Step 1: Upload document (if provided)
        doc_id = None
        if doc_content:
            doc_id = self.upload_document(tenant_id, doc_content, "sample_doc.txt")
        
        # Step 2: Index document (if uploaded)
        if doc_id:
            self.index_document(doc_id)
        
        # Step 3: Search
        search_results = self.search(query)
        
        # Step 4: Generate answer
        answer = self.generate_answer(query, search_results)
        
        print("\n" + "="*60)
        print("RAG PIPELINE COMPLETED")
        print("="*60)
        
        return {
            'answer': answer,
            'sources': [chunk.id for chunk in search_results],
            'query': query
        }
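The simulator splits on `'. '` and keeps only five sentences. Production pipelines more often cut text into fixed-size windows that overlap, so that a fact straddling a boundary still appears whole in at least one chunk. The sketch below is a hypothetical `chunk_words` helper that counts whitespace-separated words as a stand-in for model tokens.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into windows of chunk_size words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_words("one two three four five six seven", chunk_size=4, overlap=1))
# ['one two three four', 'four five six seven']
```

Larger overlaps improve recall at boundaries but store more near-duplicate vectors; `chunk_size` and `overlap` are the usual tuning knobs.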

In [None]:
# Run the RAG pipeline simulation
simulator = RAGPipelineSimulator()

# Sample document content
sample_doc = """
Artificial Intelligence (AI) is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals. Leading AI textbooks define the field as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.

Colloquially, the term "artificial intelligence" is often used to describe machines that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving".

As machines become increasingly capable, tasks considered to require "intelligence" are often removed from the definition of AI, a phenomenon known as the AI effect. A quip in Tesler's Theorem says "AI is whatever hasn't been done yet."
"""

# Run the pipeline
result = simulator.rag_pipeline(
    tenant_id="user_123",
    query="What is artificial intelligence?",
    doc_content=sample_doc
)

<a id="exercises"></a>
## 9. Practical Exercises

Now let's try some exercises to reinforce your understanding.

In [None]:
# Exercise 1: Modify the simulator to add reranking
print("EXERCISE 1: Enhance the RAG simulator with reranking functionality")
print("\nCurrently, our simulator only does basic similarity search.")
print("Try to implement a reranking step that reorders results based on")
print("how well they match the query keywords.")

# TODO: Implement reranking functionality
# Hint: You could implement a simple keyword matching score
# and combine it with the similarity score
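One possible starting point for Exercise 1, not the only solution: score each result by the fraction of query words it contains, and blend that with the similarity score after dividing by the top score. `tokenize`, `keyword_overlap`, `rerank`, and the weight `alpha` are hypothetical. The sketch works on `(text, score)` pairs, so adapting it to the simulator's `Chunk` objects is left to you.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric words, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_overlap(query: str, text: str) -> float:
    """Fraction of distinct query words that also appear in the text (0.0 to 1.0)."""
    query_words = tokenize(query)
    if not query_words:
        return 0.0
    return len(query_words & tokenize(text)) / len(query_words)


def rerank(query: str, results: List[Tuple[str, float]], alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Blend scaled similarity (weight alpha) with keyword overlap (weight 1 - alpha)."""
    top_score = max((score for _, score in results), default=0.0) or 1.0  # avoid dividing by 0
    combined = [
        (text, alpha * score / top_score + (1 - alpha) * keyword_overlap(query, text))
        for text, score in results
    ]
    return sorted(combined, key=lambda item: item[1], reverse=True)


results = [
    ("AI effect is a phenomenon", 10.0),
    ("Artificial intelligence is intelligence demonstrated by machines", 8.0),
]
for text, score in rerank("What is artificial intelligence?", results):
    print(f"{score:.3f}  {text}")
```

Here the second result has the lower similarity score but matches three of the four query words, so reranking moves it to the top.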

In [None]:
# Exercise 2: Add multi-tenancy validation
print("EXERCISE 2: Add tenant isolation validation")
print("\nModify the simulator to ensure that one tenant cannot access")
print("another tenant's documents or vectors.")

# TODO: Add tenant validation to all methods
# Hint: Check tenant_id in all operations

In [None]:
# Exercise 3: Implement caching
print("EXERCISE 3: Add embedding caching")
print("\nAdd a cache to store previously computed embeddings")
print("to avoid recomputing them for the same text.")

# TODO: Add embedding cache to the simulator
# Hint: Use a dictionary to store text -> vector mappings
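A possible starting point for Exercise 3: wrap the embedding function so that repeated texts are served from a dictionary. `CachedEmbedder` and `mock_embed` are hypothetical (`mock_embed` reproduces the simulator's toy vector). In a real system the cache key should also include the embedding model name, since vectors from different models are not interchangeable. For a quick alternative, `functools.lru_cache` can memoize any function whose arguments are hashable, such as a `str`.

```python
from typing import Callable, Dict, List


def mock_embed(text: str) -> List[float]:
    """The simulator's toy embedding: character codes of the first 20 characters."""
    return [float(ord(c) % 100) for c in text[:20]]


class CachedEmbedder:
    """Wrap an embed(text) function and reuse vectors for texts seen before."""

    def __init__(self, embed: Callable[[str], List[float]]) -> None:
        self._embed = embed
        self._cache: Dict[str, List[float]] = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, text: str) -> List[float]:
        if text in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[text] = self._embed(text)
        # Return a copy so callers cannot mutate the cached vector
        return list(self._cache[text])


embedder = CachedEmbedder(mock_embed)
embedder("What is artificial intelligence?")
embedder("What is artificial intelligence?")
print(embedder.hits, embedder.misses)  # 1 1
```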

## Summary

In this notebook, we've explored:

1. **Project Overview**: Understanding the RAG Engine Mini architecture
2. **Clean Architecture**: How the system separates concerns into layers
3. **Domain Layer**: Pure business logic with tenant isolation
4. **Application Layer**: Use cases that orchestrate business logic
5. **Adapters Layer**: Concrete implementations of external services
6. **API Layer**: FastAPI endpoints exposing functionality
7. **Workers Layer**: Background processing with Celery
8. **Complete RAG Pipeline**: From document upload to answer generation
9. **Hands-on Practice**: Simulating the RAG pipeline and exercises

This system demonstrates production-ready patterns for building scalable RAG applications with proper separation of concerns, multi-tenancy, and observability.