# 🎯 Adaptive RAG Implementation

## Dynamic Strategy Selection for Intelligent AI Systems

This notebook demonstrates an **Adaptive RAG system** that selects a retrieval strategy according to query complexity. It covers three strategies, the router that chooses among them, and a tutoring example:

- 🚫 **No Retrieval**: Direct answers for simple factual questions
- 🔍 **Single-step RAG**: Standard retrieval for straightforward queries
- 🔄 **Multi-step RAG**: Complex retrieval for multi-faceted questions
- 🧠 **Intelligent Routing**: Dynamic strategy selection
- 🎓 **Educational Focus**: Tutor system example

### Key Benefits of Adaptive Architecture
- **Efficiency**: No unnecessary retrieval for simple questions
- **Quality**: Appropriate depth for query complexity
- **Speed**: Faster responses for basic queries
- **Accuracy**: Deep analysis for complex problems
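
The routing idea behind these benefits can be sketched in a few lines. The mapping below is an assumption inferred from the strategy descriptions above (simple to no retrieval, moderate to single-step, complex to multi-step); the name `choose_strategy` is hypothetical and not part of the notebook's code.

```python
# Illustrative sketch only: maps a complexity label to a retrieval strategy.
# The mapping and the name `choose_strategy` are assumptions for illustration.
STRATEGY_BY_COMPLEXITY = {
    "simple": "no_retrieval",    # answer directly, skip the retriever
    "moderate": "single_step",   # one retrieval pass
    "complex": "multi_step",     # several retrieval passes
}

def choose_strategy(complexity: str) -> str:
    """Return the strategy for a complexity label, defaulting to single-step."""
    return STRATEGY_BY_COMPLEXITY.get(complexity, "single_step")

for label in ("simple", "moderate", "complex"):
    print(f"{label} -> {choose_strategy(label)}")
```

Falling back to single-step for unknown labels is a conservative choice in this sketch: one retrieval pass costs little and avoids answering without context.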

In [1]:
# Install required packages
!pip install sentence-transformers faiss-cpu google-generativeai rank-bm25 transformers scikit-learn numpy python-dotenv sympy matplotlib


In [2]:
# Import libraries
import numpy as np
import json
import re
import os
import time
import uuid
import math
import sympy as sp
from typing import List, Dict, Tuple, Optional, Any, Union
from dataclasses import dataclass, field
from abc import ABC, abstractmethod
from enum import Enum
from datetime import datetime, timedelta

from sentence_transformers import SentenceTransformer
import faiss
import google.generativeai as genai
from rank_bm25 import BM25Okapi
from sklearn.feature_extraction.text import TfidfVectorizer
from dotenv import load_dotenv

load_dotenv()
print("📚 Libraries imported successfully!")


📚 Libraries imported successfully!


## 🏗️ Core Data Structures

Define the foundational structures for our adaptive system:

In [3]:
# Enums for system configuration
class QueryComplexity(Enum):
    SIMPLE = "simple"           # Direct factual questions
    MODERATE = "moderate"       # Single-step retrieval needed
    COMPLEX = "complex"         # Multi-step analysis required

class QueryType(Enum):
    FACTUAL = "factual"         # "What is 2+2?"
    DEFINITION = "definition"   # "What is photosynthesis?"
    CALCULATION = "calculation" # "Calculate the area of a circle"
    EXPLANATION = "explanation" # "Explain how photosynthesis works"
    PROBLEM_SOLVING = "problem_solving" # "Solve this physics problem"
    COMPARATIVE = "comparative" # "Compare mitosis and meiosis"
    ANALYTICAL = "analytical"   # "Analyze the causes of WWII"

class AdaptiveStrategy(Enum):
    NO_RETRIEVAL = "no_retrieval"     # Answer directly
    SINGLE_STEP = "single_step"       # One retrieval pass
    MULTI_STEP = "multi_step"         # Multiple retrieval passes

class Subject(Enum):
    MATHEMATICS = "mathematics"
    SCIENCE = "science"
    HISTORY = "history"
    LITERATURE = "literature"
    GENERAL = "general"

# Core data structures
@dataclass
class Document:
    id: str
    title: str
    content: str
    subject: Subject
    difficulty_level: int  # 1-5 scale
    keywords: List[str] = field(default_factory=list)
    embedding: Optional[np.ndarray] = None

@dataclass
class Query:
    id: str
    text: str
    query_type: QueryType
    complexity: QueryComplexity
    subject: Subject
    confidence_score: float = 0.0
    user_id: Optional[str] = None

@dataclass
class RetrievalResult:
    document: Document
    score: float
    rank: int
    retrieval_step: int = 1  # Which step of multi-step retrieval

@dataclass
class AdaptiveResponse:
    query: Query
    strategy_used: AdaptiveStrategy
    retrieved_documents: List[RetrievalResult]
    generated_answer: str
    confidence_score: float
    processing_steps: List[str]
    processing_time: float
    reasoning_chain: List[str] = field(default_factory=list)  # For multi-step

print("🏗️ Data structures defined!")

🏗️ Data structures defined!
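
The list-valued fields above use `field(default_factory=list)` rather than a bare `[]` default, which `dataclass` rejects for mutable types. A minimal standalone sketch of the effect, using a hypothetical `Note` dataclass that is not part of the notebook:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Note:  # hypothetical example class, for illustration only
    text: str
    tags: List[str] = field(default_factory=list)  # fresh list per instance

a = Note("first")
b = Note("second")
a.tags.append("math")

# Each instance owns its own list, so mutating one does not affect the other.
print(a.tags, b.tags)
```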


## 📚 Educational Knowledge Base

Create a small sample knowledge base (nine documents across mathematics, science, history, and literature) for our educational tutor:

In [4]:
# Educational knowledge base
educational_knowledge = [
    # Mathematics
    {
        "id": "math_001",
        "title": "Basic Arithmetic Operations",
        "subject": "mathematics",
        "difficulty_level": 1,
        "content": "Arithmetic operations include addition (+), subtraction (-), multiplication (×), and division (÷). Addition combines numbers to find their sum. Subtraction finds the difference between numbers. Multiplication is repeated addition. Division splits a number into equal parts. Order of operations (PEMDAS): Parentheses, Exponents, Multiplication/Division (left to right), Addition/Subtraction (left to right).",
        "keywords": ["addition", "subtraction", "multiplication", "division", "arithmetic", "PEMDAS"]
    },
    {
        "id": "math_002",
        "title": "Geometry Fundamentals",
        "subject": "mathematics",
        "difficulty_level": 2,
        "content": "Geometry studies shapes, sizes, and properties of figures. Key concepts: Area measures the space inside a shape. Perimeter is the distance around a shape. For circles: Area = πr², Circumference = 2πr. For rectangles: Area = length × width, Perimeter = 2(length + width). For triangles: Area = ½ × base × height. The Pythagorean theorem: a² + b² = c² for right triangles.",
        "keywords": ["geometry", "area", "perimeter", "circle", "rectangle", "triangle", "pythagorean"]
    },
    {
        "id": "math_003",
        "title": "Algebra Basics",
        "subject": "mathematics",
        "difficulty_level": 3,
        "content": "Algebra uses letters (variables) to represent unknown numbers. Linear equations have the form ax + b = c. To solve: isolate the variable by performing inverse operations on both sides. Quadratic equations have the form ax² + bx + c = 0. Solutions use the quadratic formula: x = (-b ± √(b²-4ac))/(2a). Functions represent relationships between variables, written as f(x) = expression.",
        "keywords": ["algebra", "variables", "equations", "linear", "quadratic", "functions"]
    },
    
    # Science
    {
        "id": "sci_001",
        "title": "Photosynthesis Process",
        "subject": "science",
        "difficulty_level": 2,
        "content": "Photosynthesis is the process by which plants convert sunlight, carbon dioxide, and water into glucose and oxygen. The equation: 6CO₂ + 6H₂O + light energy → C₆H₁₂O₆ + 6O₂. It occurs in chloroplasts containing chlorophyll. Two stages: Light reactions (in thylakoids) capture energy and split water. Calvin cycle (in stroma) uses energy to make glucose from CO₂. This process is crucial for life on Earth as it produces oxygen and food.",
        "keywords": ["photosynthesis", "chloroplast", "chlorophyll", "glucose", "oxygen", "carbon dioxide"]
    },
    {
        "id": "sci_002",
        "title": "Cell Division: Mitosis and Meiosis",
        "subject": "science",
        "difficulty_level": 3,
        "content": "Cell division produces new cells. Mitosis creates two identical diploid cells for growth and repair. Stages: Prophase (chromosomes condense), Metaphase (chromosomes align), Anaphase (chromosomes separate), Telophase (nuclei reform). Meiosis creates four genetically different haploid gametes for reproduction. Has two divisions (meiosis I and II). Crossing over in prophase I increases genetic diversity. Mitosis maintains chromosome number; meiosis reduces it by half.",
        "keywords": ["mitosis", "meiosis", "cell division", "chromosomes", "diploid", "haploid", "gametes"]
    },
    {
        "id": "sci_003",
        "title": "Newton's Laws of Motion",
        "subject": "science",
        "difficulty_level": 3,
        "content": "Newton's three laws describe motion and forces. First Law (Inertia): Objects at rest stay at rest, objects in motion stay in motion, unless acted upon by a force. Second Law: Force equals mass times acceleration (F = ma). Greater force or less mass means greater acceleration. Third Law: For every action, there is an equal and opposite reaction. When you push on something, it pushes back with equal force. These laws explain everything from walking to rocket propulsion.",
        "keywords": ["newton", "laws", "motion", "force", "inertia", "acceleration", "mass"]
    },
    
    # History
    {
        "id": "hist_001",
        "title": "World War II Causes and Timeline",
        "subject": "history",
        "difficulty_level": 4,
        "content": "World War II (1939-1945) was caused by multiple factors: Treaty of Versailles created resentment in Germany, global economic depression destabilized governments, rise of totalitarian regimes (Nazi Germany, Fascist Italy, Imperial Japan), failure of League of Nations to maintain peace. Key events: Germany invaded Poland (Sept 1939), Pearl Harbor attack (Dec 1941), D-Day invasion (June 1944), atomic bombs on Japan (Aug 1945). The war resulted in 70-85 million deaths and reshaped global politics.",
        "keywords": ["world war ii", "hitler", "nazi", "holocaust", "pearl harbor", "allies", "axis"]
    },
    {
        "id": "hist_002",
        "title": "American Revolution",
        "subject": "history",
        "difficulty_level": 3,
        "content": "The American Revolution (1775-1783) was fought between Great Britain and thirteen American colonies. Causes included taxation without representation, restrictive laws like the Stamp Act and Tea Act, and British military presence. Key events: Boston Tea Party (1773), Lexington and Concord battles (1775), Declaration of Independence (1776), Valley Forge winter (1777-78), Yorktown victory (1781). The war established the United States as an independent nation and influenced democratic movements worldwide.",
        "keywords": ["american revolution", "independence", "boston tea party", "declaration", "washington"]
    },
    
    # Literature
    {
        "id": "lit_001",
        "title": "Shakespeare's Literary Techniques",
        "subject": "literature",
        "difficulty_level": 4,
        "content": "William Shakespeare used various literary techniques: Iambic pentameter (10 syllables per line with alternating unstressed/stressed pattern), metaphors and similes for vivid imagery, dramatic irony (audience knows what characters don't), soliloquies reveal inner thoughts, foreshadowing hints at future events. His plays blend comedy and tragedy, explore universal themes like love, power, betrayal, and redemption. Character development shows psychological depth and moral complexity.",
        "keywords": ["shakespeare", "iambic pentameter", "metaphor", "irony", "soliloquy", "tragedy", "comedy"]
    }
]

print(f"📚 Educational knowledge base created with {len(educational_knowledge)} documents")
print(f"🎓 Subjects: {set(doc['subject'] for doc in educational_knowledge)}")
print(f"📊 Difficulty levels: {sorted(set(doc['difficulty_level'] for doc in educational_knowledge))}")

📚 Educational knowledge base created with 9 documents
🎓 Subjects: {'history', 'science', 'mathematics', 'literature'}
📊 Difficulty levels: [1, 2, 3, 4]
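
Before indexing, it can help to sanity-check entries shaped like these. The sketch below is a hedged example: `REQUIRED_KEYS`, `validate_entry`, and the two sample entries are hypothetical, and the allowed subjects simply mirror the values of the `Subject` enum defined earlier.

```python
from typing import Dict, List

REQUIRED_KEYS = {"id", "title", "subject", "difficulty_level", "content"}
ALLOWED_SUBJECTS = {"mathematics", "science", "history", "literature", "general"}

def validate_entry(entry: Dict) -> List[str]:
    """Return a list of problems found in one knowledge-base entry."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(entry))]
    if "subject" in entry and entry["subject"] not in ALLOWED_SUBJECTS:
        problems.append(f"unknown subject: {entry['subject']}")
    level = entry.get("difficulty_level")
    if level is not None and not (isinstance(level, int) and 1 <= level <= 5):
        problems.append(f"difficulty_level out of range 1-5: {level}")
    return problems

# Made-up entries: one valid, one with three deliberate problems.
sample_entries = [
    {"id": "ok_001", "title": "Valid", "subject": "science",
     "difficulty_level": 2, "content": "Some text."},
    {"id": "bad_001", "title": "Invalid", "subject": "astrology",
     "difficulty_level": 9},
]

for entry in sample_entries:
    print(entry["id"], validate_entry(entry))
```

Catching an unknown subject here gives a clearer message than the `ValueError` an enum lookup would raise later during indexing.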


## 🧩 Base Module Classes

Define the abstract base class shared by all adaptive modules:

In [5]:
# Base module class
class BaseAdaptiveModule(ABC):
    def __init__(self, name: str):
        self.name = name
        self.call_count = 0
        self.strategy_usage = {strategy: 0 for strategy in AdaptiveStrategy}
        self.created_at = datetime.now()
    
    @abstractmethod
    def process(self, input_data: Any) -> Any:
        pass
    
    def update_stats(self, strategy_used: Optional[AdaptiveStrategy] = None):
        self.call_count += 1
        if strategy_used:
            self.strategy_usage[strategy_used] += 1
    
    def get_info(self):
        return {
            'name': self.name,
            'calls': self.call_count,
            'strategy_usage': dict(self.strategy_usage),
            'created': self.created_at
        }

print("🧩 Base adaptive module class defined!")

🧩 Base adaptive module class defined!
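
Because `process` is declared with `@abstractmethod`, a subclass must implement it before it can be instantiated. A compact standalone sketch of that behaviour, using hypothetical `BaseModule`, `EchoModule`, and `BrokenModule` classes rather than the notebook's own:

```python
from abc import ABC, abstractmethod
from typing import Any

class BaseModule(ABC):  # hypothetical stand-in for an abstract module base
    def __init__(self, name: str):
        self.name = name
        self.call_count = 0

    @abstractmethod
    def process(self, input_data: Any) -> Any:
        ...

class EchoModule(BaseModule):
    def process(self, input_data: Any) -> Any:
        self.call_count += 1  # track usage, then return the input unchanged
        return input_data

class BrokenModule(BaseModule):
    pass  # forgets to implement process()

echo = EchoModule("echo")
print(echo.process("hello"), echo.call_count)

try:
    BrokenModule("broken")
    instantiation_failed = False
except TypeError:
    instantiation_failed = True  # abstract classes cannot be instantiated
print(instantiation_failed)
```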


## 🧠 Query Analysis Module

Rule-based query analysis: regex patterns detect the query type, a heuristic score sets the complexity, and keyword matching picks the subject:

In [6]:
class QueryAnalysisModule(BaseAdaptiveModule):
    def __init__(self):
        super().__init__("QueryAnalysisModule")
        
        # Simple patterns that don't need retrieval
        self.simple_patterns = {
            'basic_math': [r'\d+\s*[+\-*/]\s*\d+', r'what is \d+ [+\-*/] \d+'],
            'basic_facts': [r'what is \d+ \+ \d+', r'how much is', r'what does \w+ mean']
        }
        
        # Query type patterns
        self.type_patterns = {
            QueryType.CALCULATION: [r'calculate', r'compute', r'find the (area|volume|perimeter)', r'solve \d+'],
            QueryType.DEFINITION: [r'what is', r'define', r'meaning of'],
            QueryType.EXPLANATION: [r'explain', r'how does', r'why does', r'describe'],
            QueryType.PROBLEM_SOLVING: [r'solve', r'find the solution', r'how to solve'],
            QueryType.COMPARATIVE: [r'compare', r'difference', r'versus', r'\bvs\b'],  # \b keeps 'vs' from matching inside other words
            QueryType.ANALYTICAL: [r'analyze', r'examine', r'discuss', r'evaluate'],
            QueryType.FACTUAL: [r'when', r'where', r'who', r'which']
        }
        
        # Subject keywords
        self.subject_keywords = {
            Subject.MATHEMATICS: ["math", "algebra", "geometry", "calculus", "equation", "formula", "calculate"],
            Subject.SCIENCE: ["biology", "chemistry", "physics", "cell", "molecule", "force", "energy"],
            Subject.HISTORY: ["war", "revolution", "century", "empire", "battle", "treaty", "civilization"],
            Subject.LITERATURE: ["shakespeare", "novel", "poem", "author", "character", "theme", "metaphor"]
        }
    
    def process(self, query_text: str) -> Query:
        self.update_stats()
        
        query_id = str(uuid.uuid4())[:8]
        query_type = self._detect_type(query_text)
        complexity = self._assess_complexity(query_text, query_type)
        subject = self._detect_subject(query_text)
        confidence = self._calculate_confidence(query_text, query_type, complexity)
        
        return Query(
            id=query_id,
            text=query_text,
            query_type=query_type,
            complexity=complexity,
            subject=subject,
            confidence_score=confidence
        )
    
    def _detect_type(self, text: str) -> QueryType:
        text_lower = text.lower()
        
        for query_type, patterns in self.type_patterns.items():
            for pattern in patterns:
                if re.search(pattern, text_lower):
                    return query_type
        
        return QueryType.FACTUAL
    
    def _assess_complexity(self, text: str, query_type: QueryType) -> QueryComplexity:
        text_lower = text.lower()
        complexity_score = 0
        
        # Check for simple patterns first
        for category, patterns in self.simple_patterns.items():
            for pattern in patterns:
                if re.search(pattern, text_lower):
                    return QueryComplexity.SIMPLE
        
        # Length-based complexity
        word_count = len(text.split())
        if word_count > 20:
            complexity_score += 2
        elif word_count > 10:
            complexity_score += 1
        
        # Query type complexity
        complex_types = [QueryType.ANALYTICAL, QueryType.PROBLEM_SOLVING, QueryType.COMPARATIVE]
        moderate_types = [QueryType.EXPLANATION, QueryType.CALCULATION]
        
        if query_type in complex_types:
            complexity_score += 2
        elif query_type in moderate_types:
            complexity_score += 1
        
        # Complexity indicators
        complex_indicators = ['analyze', 'evaluate', 'compare', 'contrast', 'multiple', 'various', 'several']
        complexity_score += sum(1 for indicator in complex_indicators if indicator in text_lower)
        
        # Mathematical complexity
        if re.search(r'solve.*equation|find.*derivative|integral|theorem|proof', text_lower):
            complexity_score += 2
        
        # Classification
        if complexity_score >= 3:
            return QueryComplexity.COMPLEX
        elif complexity_score >= 1:
            return QueryComplexity.MODERATE
        else:
            return QueryComplexity.SIMPLE
    
    def _detect_subject(self, text: str) -> Subject:
        text_lower = text.lower()
        subject_scores = {}
        
        for subject, keywords in self.subject_keywords.items():
            score = sum(1 for keyword in keywords if keyword in text_lower)
            subject_scores[subject] = score
        
        if max(subject_scores.values()) > 0:
            return max(subject_scores, key=subject_scores.get)
        return Subject.GENERAL
    
    def _calculate_confidence(self, text: str, query_type: QueryType, complexity: QueryComplexity) -> float:
        confidence = 0.5  # Base confidence
        
        # Higher confidence for clear patterns
        if any(re.search(pattern, text.lower()) for patterns in self.type_patterns.values() for pattern in patterns):
            confidence += 0.3
        
        # Subject-specific confidence
        text_lower = text.lower()
        subject_matches = sum(1 for keywords in self.subject_keywords.values() for keyword in keywords if keyword in text_lower)
        confidence += min(0.2, subject_matches * 0.05)
        
        return min(1.0, confidence)

# Test query analysis module
query_analysis = QueryAnalysisModule()

test_queries = [
    "What is 5 + 3?",
    "Explain photosynthesis",
    "Compare mitosis and meiosis and analyze their roles in genetic diversity"
]

print("🧠 Testing Query Analysis Module:")
for query_text in test_queries:
    query = query_analysis.process(query_text)
    print(f"   '{query_text}'")
    print(f"   → Type: {query.query_type.value}, Complexity: {query.complexity.value}, Subject: {query.subject.value}")

print("\n✅ Query Analysis Module ready!")

🧠 Testing Query Analysis Module:
   'What is 5 + 3?'
   → Type: definition, Complexity: simple, Subject: general
   'Explain photosynthesis'
   → Type: explanation, Complexity: moderate, Subject: general
   'Compare mitosis and meiosis and analyze their roles in genetic diversity'
   → Type: comparative, Complexity: complex, Subject: general

✅ Query Analysis Module ready!
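
In the output above, 'What is 5 + 3?' is typed as `definition` because pattern groups are tried in order and the first match wins: `what is` matches, and no type pattern targets a bare arithmetic expression. The sketch below reproduces that effect with a reduced, hypothetical pattern table and shows one possible remedy, adding an arithmetic pattern to the first group. The names `detect_type`, `ORIGINAL_ORDER`, and `ARITHMETIC_FIRST` are illustrative assumptions.

```python
import re
from typing import Dict, List

def detect_type(text: str, patterns: Dict[str, List[str]]) -> str:
    """Return the first type whose patterns match; dict order decides ties."""
    text_lower = text.lower()
    for type_name, type_patterns in patterns.items():
        if any(re.search(p, text_lower) for p in type_patterns):
            return type_name
    return "factual"

# Reduced table: the calculation group has no arithmetic pattern,
# so 'what is' in the definition group is the first match.
ORIGINAL_ORDER = {
    "calculation": [r"calculate", r"compute"],
    "definition": [r"what is", r"define"],
}

# Hypothetical alternative: an explicit arithmetic pattern in the first group.
ARITHMETIC_FIRST = {
    "calculation": [r"\d+\s*[+\-*/]\s*\d+", r"calculate", r"compute"],
    "definition": [r"what is", r"define"],
}

query = "What is 5 + 3?"
print(detect_type(query, ORIGINAL_ORDER))
print(detect_type(query, ARITHMETIC_FIRST))
```

The same reasoning explains the `general` subject in all three results: none of the test queries contains a word from the subject keyword lists.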


## 🚫 No-Retrieval Handler

Direct answer generation for simple queries:

In [7]:
class NoRetrievalHandler(BaseAdaptiveModule):
    def __init__(self):
        super().__init__("NoRetrievalHandler")
        
        # Basic math operations
        self.math_operations = {
            '+': lambda x, y: x + y,
            '-': lambda x, y: x - y,
            '*': lambda x, y: x * y,
            '/': lambda x, y: x / y if y != 0 else "Error: Division by zero",
            '**': lambda x, y: x ** y,
            '%': lambda x, y: x % y if y != 0 else "Error: Division by zero"
        }
        
        # Simple facts database
        self.simple_facts = {
            'pi': 3.14159,
            'e': 2.71828,
            'speed of light': '299,792,458 m/s',
            'gravity': '9.8 m/s²',
            'absolute zero': '-273.15°C or 0 Kelvin'
        }
    
    def process(self, query: Query) -> Tuple[str, float]:
        self.update_stats(AdaptiveStrategy.NO_RETRIEVAL)
        
        text_lower = query.text.lower()
        
        # Handle basic math. No type/subject gate: the analyzer labels a query
        # like "What is 5 + 3?" as DEFINITION/GENERAL, and the helper returns
        # None when the text contains no arithmetic expression.
        math_result = self._handle_basic_math(query.text)
        if math_result:
            return math_result, 0.95
        
        # Handle simple facts
        fact_result = self._handle_simple_facts(text_lower)
        if fact_result:
            return fact_result, 0.9
        
        # Handle basic definitions
        definition_result = self._handle_basic_definitions(text_lower)
        if definition_result:
            return definition_result, 0.8
        
        # Fallback
        return "I can answer this directly, but I need more specific information. Could you rephrase your question?", 0.3
    
    def _handle_basic_math(self, text: str) -> Optional[str]:
        # Extract simple math expressions
        patterns = [
            r'(\d+(?:\.\d+)?)\s*([+\-*/])\s*(\d+(?:\.\d+)?)',
            r'what is (\d+(?:\.\d+)?)\s*([+\-*/])\s*(\d+(?:\.\d+)?)',
            r'(\d+(?:\.\d+)?)\s*(\+|plus|add)\s*(\d+(?:\.\d+)?)',
            r'(\d+(?:\.\d+)?)\s*(\-|minus|subtract)\s*(\d+(?:\.\d+)?)',
            r'(\d+(?:\.\d+)?)\s*(\*|×|times|multiply)\s*(\d+(?:\.\d+)?)',
            r'(\d+(?:\.\d+)?)\s*(\/|÷|divided by)\s*(\d+(?:\.\d+)?)'
        ]
        
        for pattern in patterns:
            match = re.search(pattern, text.lower())
            if match:
                try:
                    num1 = float(match.group(1))
                    operator = match.group(2)
                    num2 = float(match.group(3))
                    
                    # Normalize operator
                    op_map = {
                        'plus': '+', 'add': '+',
                        'minus': '-', 'subtract': '-',
                        'times': '*', 'multiply': '*', '×': '*',
                        'divided by': '/', '÷': '/'
                    }
                    operator = op_map.get(operator, operator)
                    
                    if operator in self.math_operations:
                        result = self.math_operations[operator](num1, num2)
                        if isinstance(result, str):  # Error case
                            return result
                        
                        # Format result nicely
                        if result == int(result):
                            return f"**{num1} {operator} {num2} = {int(result)}**"
                        else:
                            return f"**{num1} {operator} {num2} = {result:.4f}**"
                            
                except (ValueError, ZeroDivisionError):
                    continue
        
        return None
    
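    def _handle_simple_facts(self, text: str) -> Optional[str]:
        for fact, value in self.simple_facts.items():
            # Match whole words only; a plain substring test would let short
            # keys such as 'e' or 'pi' fire on almost any text.
            if re.search(rf'\b{re.escape(fact)}\b', text):
                return f"**{fact.title()}**: {value}"
        return None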
    
    def _handle_basic_definitions(self, text: str) -> Optional[str]:
        basic_definitions = {
            'photosynthesis': "Photosynthesis is the process by which plants use sunlight to make food from carbon dioxide and water, producing oxygen as a byproduct.",
            'gravity': "Gravity is the force that attracts objects toward each other, most noticeably pulling objects toward the Earth.",
            'mitosis': "Mitosis is cell division that produces two identical cells from one parent cell.",
            'democracy': "Democracy is a system of government where people have the power to choose their representatives through voting."
        }
        
        for term, definition in basic_definitions.items():
            if term in text and ('what is' in text or 'define' in text):
                return f"**{term.title()}**: {definition}"
        
        return None

# Test no-retrieval handler
no_retrieval = NoRetrievalHandler()

test_simple_queries = [
    Query("1", "What is 15 + 27?", QueryType.FACTUAL, QueryComplexity.SIMPLE, Subject.MATHEMATICS),
    Query("2", "What is gravity?", QueryType.DEFINITION, QueryComplexity.SIMPLE, Subject.SCIENCE)
]

print("🚫 Testing No-Retrieval Handler:")
for query in test_simple_queries:
    answer, confidence = no_retrieval.process(query)
    print(f"   '{query.text}' → {answer} (Confidence: {confidence:.2f})")

print("\n✅ No-Retrieval Handler ready!")

🚫 Testing No-Retrieval Handler:
   'What is 15 + 27?' → **15.0 + 27.0 = 42** (Confidence: 0.95)
   'What is gravity?' → **Gravity**: 9.8 m/s² (Confidence: 0.90)

✅ No-Retrieval Handler ready!
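
Short fact keys such as `pi` and `e` are easy to match by accident when a lookup relies on plain substring tests. A hedged sketch of a word-boundary lookup follows; `find_fact` and the tiny `facts` table are hypothetical and only illustrate the matching technique.

```python
import re
from typing import Dict, Optional

facts: Dict[str, str] = {"pi": "3.14159", "e": "2.71828", "gravity": "9.8 m/s²"}

def find_fact(text: str, table: Dict[str, str]) -> Optional[str]:
    """Return the first fact key that appears in text as a whole word or phrase."""
    text_lower = text.lower()
    for key in table:
        # \b anchors keep 'e' from matching inside words like 'describe'.
        if re.search(rf"\b{re.escape(key)}\b", text_lower):
            return key
    return None

print("e" in "describe the scene")              # substring test: accidental hit
print(find_fact("describe the scene", facts))   # whole-word test: no match
print(find_fact("what is the value of pi?", facts))
```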


## 🔍 Single-Step Retrieval Module

Hybrid retrieval (FAISS semantic search plus BM25 keyword search) for moderate-complexity queries:

In [8]:
class SingleStepRetrievalModule(BaseAdaptiveModule):
    def __init__(self):
        super().__init__("SingleStepRetrievalModule")
        
        # Initialize components
        self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
        self.documents = []
        self.semantic_index = None
        self.bm25_index = None
        
        print("🔍 Single-step retrieval module initialized")
    
    def index_documents(self, documents: List[Dict]):
        print(f"📚 Indexing {len(documents)} documents...")
        
        # Convert to Document objects
        self.documents = [
            Document(
                id=doc['id'],
                title=doc['title'],
                content=doc['content'],
                subject=Subject(doc['subject']),
                difficulty_level=doc['difficulty_level'],
                keywords=doc.get('keywords', [])
            )
            for doc in documents
        ]
        
        # Build semantic index
        self._build_semantic_index()
        
        # Build keyword index
        self._build_keyword_index()
        
        print("✅ Documents indexed successfully!")
    
    def _build_semantic_index(self):
        doc_texts = [f"{doc.title} {doc.content}" for doc in self.documents]
        embeddings = self.embedding_model.encode(doc_texts)
        
        # Store embeddings
        for doc, embedding in zip(self.documents, embeddings):
            doc.embedding = embedding
        
        # Create FAISS index; inner product over L2-normalized vectors equals cosine similarity
        dimension = embeddings.shape[1]
        self.semantic_index = faiss.IndexFlatIP(dimension)
        faiss.normalize_L2(embeddings)
        self.semantic_index.add(embeddings.astype('float32'))
    
    def _build_keyword_index(self):
        doc_texts = [f"{doc.title} {doc.content}" for doc in self.documents]
        tokenized_docs = [text.lower().split() for text in doc_texts]
        self.bm25_index = BM25Okapi(tokenized_docs)
    
    def process(self, query: Query, top_k: int = 3) -> List[RetrievalResult]:
        self.update_stats(AdaptiveStrategy.SINGLE_STEP)
        
        # Filter by subject if specific
        relevant_docs = self._filter_by_subject(query.subject)
        
        if not relevant_docs:
            relevant_docs = list(range(len(self.documents)))
        
        # Perform hybrid retrieval
        semantic_results = self._semantic_search(query, top_k * 2, relevant_docs)
        keyword_results = self._keyword_search(query, top_k * 2, relevant_docs)
        
        # Combine results
        combined_results = self._combine_results(semantic_results, keyword_results, top_k)
        
        return combined_results
    
    def _filter_by_subject(self, subject: Subject) -> List[int]:
        if subject == Subject.GENERAL:
            return list(range(len(self.documents)))
        
        return [i for i, doc in enumerate(self.documents) if doc.subject == subject]
    
    def _semantic_search(self, query: Query, top_k: int, doc_indices: List[int]) -> List[RetrievalResult]:
        query_embedding = self.embedding_model.encode([query.text])
        faiss.normalize_L2(query_embedding)
        
        # Get all scores and filter by indices
        scores, indices = self.semantic_index.search(query_embedding.astype('float32'), len(self.documents))
        
        filtered_results = []
        for score, idx in zip(scores[0], indices[0]):
            if idx in doc_indices:
                filtered_results.append((score, idx))
                if len(filtered_results) >= top_k:
                    break
        
        results = []
        for i, (score, idx) in enumerate(filtered_results):
            results.append(RetrievalResult(
                document=self.documents[idx],
                score=float(score),
                rank=i + 1,
                retrieval_step=1
            ))
        
        return results
    
    def _keyword_search(self, query: Query, top_k: int, doc_indices: List[int]) -> List[RetrievalResult]:
        query_tokens = query.text.lower().split()
        scores = self.bm25_index.get_scores(query_tokens)
        
        # Filter and sort
        filtered_scores = [(scores[i], i) for i in doc_indices]
        filtered_scores.sort(key=lambda x: x[0], reverse=True)
        
        results = []
        for i, (score, idx) in enumerate(filtered_scores[:top_k]):
            results.append(RetrievalResult(
                document=self.documents[idx],
                score=float(score),
                rank=i + 1,
                retrieval_step=1
            ))
        
        return results
    
    def _combine_results(self, semantic_results: List[RetrievalResult], 
                        keyword_results: List[RetrievalResult], top_k: int) -> List[RetrievalResult]:
        # Normalize scores
        self._normalize_scores(semantic_results)
        self._normalize_scores(keyword_results)
        
        # Combine with weights
        combined_scores = {}
        semantic_weight = 0.7
        keyword_weight = 0.3
        
        for result in semantic_results:
            doc_id = result.document.id
            combined_scores[doc_id] = {
                'document': result.document,
                'semantic_score': result.score,
                'keyword_score': 0.0
            }
        
        for result in keyword_results:
            doc_id = result.document.id
            if doc_id in combined_scores:
                combined_scores[doc_id]['keyword_score'] = result.score
            else:
                combined_scores[doc_id] = {
                    'document': result.document,
                    'semantic_score': 0.0,
                    'keyword_score': result.score
                }
        
        # Calculate final scores
        final_results = []
        for doc_id, scores in combined_scores.items():
            final_score = (semantic_weight * scores['semantic_score'] + 
                          keyword_weight * scores['keyword_score'])
            
            final_results.append(RetrievalResult(
                document=scores['document'],
                score=final_score,
                rank=0,
                retrieval_step=1
            ))
        
        # Sort and assign ranks
        final_results.sort(key=lambda x: x.score, reverse=True)
        for i, result in enumerate(final_results[:top_k]):
            result.rank = i + 1
        
        return final_results[:top_k]
    
    def _normalize_scores(self, results: List[RetrievalResult]):
        if not results:
            return
        
        scores = [result.score for result in results]
        min_score, max_score = min(scores), max(scores)
        
        if max_score > min_score:
            for result in results:
                result.score = (result.score - min_score) / (max_score - min_score)

# Initialize single-step retrieval module
single_step = SingleStepRetrievalModule()
single_step.index_documents(educational_knowledge)

print("✅ Single-Step Retrieval Module ready!")

🔍 Single-step retrieval module initialized
📚 Indexing 9 documents...
✅ Documents indexed successfully!
✅ Single-Step Retrieval Module ready!
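
The hybrid ranking combines semantic and keyword scores after min-max normalization, weighting them 0.7 and 0.3. The standalone sketch below walks through that arithmetic on made-up scores; `min_max` and `fuse` are hypothetical helpers written for illustration, not the module's methods.

```python
from typing import Dict, List

def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; leave them unchanged if all are equal."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return dict(scores)
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def fuse(semantic: Dict[str, float], keyword: Dict[str, float],
         semantic_weight: float = 0.7, keyword_weight: float = 0.3) -> List[tuple]:
    """Weighted sum of normalized scores; a missing score counts as 0."""
    sem, kw = min_max(semantic), min_max(keyword)
    doc_ids = set(sem) | set(kw)
    combined = {
        doc_id: semantic_weight * sem.get(doc_id, 0.0) + keyword_weight * kw.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)

# Made-up scores for three documents (cosine similarity and BM25 scales differ).
semantic_scores = {"sci_001": 0.82, "sci_002": 0.40, "sci_003": 0.19}
keyword_scores = {"sci_001": 1.2, "sci_003": 4.8}

ranking = fuse(semantic_scores, keyword_scores)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Normalizing first matters because BM25 scores are unbounded while cosine similarities lie in [-1, 1]; without it the keyword term could dominate regardless of the weights.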


## 🔄 Multi-Step Retrieval Module

Complex retrieval with iterative refinement:

In [9]:
class MultiStepRetrievalModule(BaseAdaptiveModule):
    def __init__(self, single_step_module: SingleStepRetrievalModule):
        super().__init__("MultiStepRetrievalModule")
        self.single_step = single_step_module
        self.max_steps = 3
    
    def process(self, query: Query) -> Tuple[List[RetrievalResult], List[str]]:
        self.update_stats(AdaptiveStrategy.MULTI_STEP)
        
        reasoning_chain = []
        all_results = []
        
        # Step 1: Initial retrieval
        reasoning_chain.append(f"Step 1: Initial retrieval for '{query.text}'")
        initial_results = self.single_step.process(query, top_k=3)
        all_results.extend(initial_results)
        
        if not initial_results:
            reasoning_chain.append("No relevant documents found")
            return [], reasoning_chain
        
        # Step 2: Extract key concepts and expand search
        key_concepts = self._extract_key_concepts(query, initial_results)
        reasoning_chain.append(f"Step 2: Identified key concepts: {', '.join(key_concepts)}")
        
        for concept in key_concepts[:2]:  # Limit to avoid too many queries
            concept_query = Query(
                id=f"{query.id}_concept",
                text=concept,
                query_type=QueryType.DEFINITION,
                complexity=QueryComplexity.MODERATE,
                subject=query.subject
            )
            
            concept_results = self.single_step.process(concept_query, top_k=2)
            for result in concept_results:
                result.retrieval_step = 2
            all_results.extend(concept_results)
            reasoning_chain.append(f"Step 2.{len(reasoning_chain)-1}: Retrieved {len(concept_results)} docs for '{concept}'")
        
        # Step 3: Cross-reference and validate
        if query.query_type in [QueryType.COMPARATIVE, QueryType.ANALYTICAL]:
            cross_ref_results = self._cross_reference_search(query, all_results)
            all_results.extend(cross_ref_results)
            reasoning_chain.append(f"Step 3: Cross-referenced topics, found {len(cross_ref_results)} additional docs")
        
        # Deduplicate and rank
        final_results = self._deduplicate_and_rank(all_results)
        reasoning_chain.append(f"Final: Consolidated to {len(final_results)} unique documents")
        
        return final_results, reasoning_chain
    
    def _extract_key_concepts(self, query: Query, results: List[RetrievalResult]) -> List[str]:
        # Extract important terms from the query and the top documents.
        # A list with a membership check keeps first-seen order, so the
        # cut-off below is deterministic (iteration order of a set is not).
        stop_words = {'what', 'how', 'why', 'when', 'where', 'compare', 'analyze'}
        concepts: List[str] = []
        
        def add(term: str):
            if term and term not in concepts:
                concepts.append(term)
        
        # From query (strip surrounding punctuation such as a trailing '?')
        for word in query.text.split():
            term = word.lower().strip('.,;:!?"\'()')
            if len(term) > 3 and term not in stop_words:
                add(term)
        
        # From top documents
        for result in results[:2]:  # Top 2 documents
            for kw in result.document.keywords:
                add(kw.lower())
        
        return concepts[:5]  # Limit to 5 concepts
    
    def _cross_reference_search(self, query: Query, existing_results: List[RetrievalResult]) -> List[RetrievalResult]:
        # For comparative/analytical queries, search for related topics
        cross_ref_results = []
        
        if query.query_type == QueryType.COMPARATIVE:
            # Look for comparison terms
            comparison_terms = ['difference', 'similarity', 'versus', 'contrast']
            for term in comparison_terms:
                term_query = Query(
                    id=f"{query.id}_cross",
                    text=f"{term} {query.text}",
                    query_type=QueryType.EXPLANATION,
                    complexity=QueryComplexity.MODERATE,
                    subject=query.subject
                )
                
                results = self.single_step.process(term_query, top_k=1)
                for result in results:
                    result.retrieval_step = 3
                cross_ref_results.extend(results)
        
        return cross_ref_results
    
    def _deduplicate_and_rank(self, results: List[RetrievalResult]) -> List[RetrievalResult]:
        # Remove duplicates by document ID
        seen_docs = set()
        unique_results = []
        
        for result in results:
            if result.document.id not in seen_docs:
                seen_docs.add(result.document.id)
                unique_results.append(result)
        
        # Re-rank: earlier retrieval steps first, then higher score within a step
        unique_results.sort(key=lambda x: (x.retrieval_step, -x.score))
        
        # Assign new ranks
        for i, result in enumerate(unique_results[:7]):  # Top 7 for complex queries
            result.rank = i + 1
        
        return unique_results[:7]

# Initialize multi-step retrieval module
multi_step = MultiStepRetrievalModule(single_step)

print("✅ Multi-Step Retrieval Module ready!")

✅ Multi-Step Retrieval Module ready!
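
The consolidation at the end of multi-step retrieval is easy to check on toy data. The following is a minimal sketch of the same idea, assuming a simplified `Hit` dataclass as a hypothetical stand-in for `RetrievalResult`; the document IDs and scores are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """Hypothetical stand-in for RetrievalResult."""
    doc_id: str
    score: float
    retrieval_step: int
    rank: int = 0


def deduplicate_and_rank(hits: List[Hit], limit: int = 7) -> List[Hit]:
    seen = set()
    unique = []
    for hit in hits:
        if hit.doc_id not in seen:  # first occurrence wins
            seen.add(hit.doc_id)
            unique.append(hit)
    # Earlier steps first; within a step, higher score first
    unique.sort(key=lambda h: (h.retrieval_step, -h.score))
    top = unique[:limit]
    for i, hit in enumerate(top, start=1):
        hit.rank = i
    return top


hits = [
    Hit("bio_1", 0.62, 1),
    Hit("bio_2", 0.55, 1),
    Hit("bio_3", 0.91, 2),
    Hit("bio_1", 0.88, 2),  # duplicate of a step-1 document, dropped
]
for hit in deduplicate_and_rank(hits):
    print(hit.rank, hit.doc_id, hit.retrieval_step, hit.score)
```

Because the sort key puts the retrieval step first, a step-2 document never outranks a step-1 document, even with a higher score. That favors documents matching the original question over those found through expansion, which may or may not be the desired trade-off for a given corpus.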


## 🤖 Adaptive Generation Module

Answer generation that adapts to the chosen strategy, using Gemini when an API key is configured and falling back to templates otherwise:

In [10]:
class AdaptiveGenerationModule(BaseAdaptiveModule):
    def __init__(self):
        super().__init__("AdaptiveGenerationModule")
        
        # Try to initialize Gemini
        api_key = os.getenv('GEMINI_API_KEY')
        if api_key:
            try:
                genai.configure(api_key=api_key)
                self.model = genai.GenerativeModel('gemini-1.5-flash')
                self.has_llm = True
                print("🤖 Gemini API configured")
            except Exception as e:
                print(f"⚠️ Gemini error: {e}")
                self.has_llm = False
        else:
            print("⚠️ No Gemini API key. Using template generation.")
            self.has_llm = False
    
    def process(self, query: Query, strategy: AdaptiveStrategy, 
               retrieved_docs: Optional[List[RetrievalResult]] = None, 
               reasoning_chain: Optional[List[str]] = None,
               direct_answer: Optional[str] = None) -> Tuple[str, float]:
        self.update_stats(strategy)
        
        if strategy == AdaptiveStrategy.NO_RETRIEVAL:
            return self._generate_direct_answer(query, direct_answer)
        elif strategy == AdaptiveStrategy.SINGLE_STEP:
            return self._generate_single_step_answer(query, retrieved_docs)
        else:  # MULTI_STEP
            return self._generate_multi_step_answer(query, retrieved_docs, reasoning_chain)
    
    def _generate_direct_answer(self, query: Query, direct_answer: str) -> Tuple[str, float]:
        if direct_answer and "Error" not in direct_answer:
            return direct_answer, 0.95
        
        # Educational explanation for direct answers
        if query.subject == Subject.MATHEMATICS:
            return self._generate_math_explanation(query)
        
        return "I can provide a direct answer, but I need more specific information.", 0.5
    
    def _generate_single_step_answer(self, query: Query, retrieved_docs: List[RetrievalResult]) -> Tuple[str, float]:
        if not retrieved_docs:
            return "I couldn't find relevant information to answer your question.", 0.2
        
        if self.has_llm:
            return self._generate_with_llm_single(query, retrieved_docs)
        else:
            return self._generate_template_single(query, retrieved_docs)
    
    def _generate_multi_step_answer(self, query: Query, retrieved_docs: List[RetrievalResult], 
                                   reasoning_chain: List[str]) -> Tuple[str, float]:
        if not retrieved_docs:
            return "I couldn't find enough information for a comprehensive analysis.", 0.2
        
        if self.has_llm:
            return self._generate_with_llm_multi(query, retrieved_docs, reasoning_chain)
        else:
            return self._generate_template_multi(query, retrieved_docs, reasoning_chain)
    
    def _generate_math_explanation(self, query: Query) -> Tuple[str, float]:
        # Extract numbers and operation from query
        text = query.text.lower()
        
        if "area" in text and "circle" in text:
            return "To find the area of a circle, use the formula: **Area = πr²**, where r is the radius. π ≈ 3.14159.", 0.9
        elif "circumference" in text or ("perimeter" in text and "circle" in text):
            return "To find the circumference of a circle, use: **Circumference = 2πr** or **πd**, where r is radius and d is diameter.", 0.9
        
        return "I can help with basic math calculations and geometry formulas.", 0.6
    
    def _generate_with_llm_single(self, query: Query, retrieved_docs: List[RetrievalResult]) -> Tuple[str, float]:
        # Prepare context
        context = "\n\n".join([f"**{doc.document.title}**: {doc.document.content}" 
                              for doc in retrieved_docs[:3]])
        
        prompt = f"""You are an educational tutor expert in {query.subject.value}. 
Answer the student's question using the provided educational content.

Context:
{context}

Student Question: {query.text}
Question Type: {query.query_type.value}

Provide a clear, educational answer that helps the student understand the concept.
Use examples when helpful and explain step-by-step for complex topics.

Answer:"""
        
        try:
            response = self.model.generate_content(prompt)
            return response.text, 0.9
        except Exception as e:
            return f"Error generating response: {str(e)}", 0.0
    
    def _generate_with_llm_multi(self, query: Query, retrieved_docs: List[RetrievalResult], 
                                reasoning_chain: List[str]) -> Tuple[str, float]:
        # Group documents by retrieval step
        step_docs = {}
        for doc in retrieved_docs:
            step = doc.retrieval_step
            if step not in step_docs:
                step_docs[step] = []
            step_docs[step].append(doc)
        
        context_parts = []
        for step in sorted(step_docs.keys()):
            context_parts.append(f"**Step {step} Sources:**")
            for doc in step_docs[step][:2]:  # Limit per step
                context_parts.append(f"- {doc.document.title}: {doc.document.content[:300]}...")
        
        context = "\n".join(context_parts)
        reasoning = "\n".join([f"- {step}" for step in reasoning_chain])
        
        prompt = f"""You are an educational tutor providing a comprehensive analysis for {query.subject.value}.
This is a complex question requiring multi-step reasoning.

Research Process:
{reasoning}

Educational Sources:
{context}

Complex Question: {query.text}
Question Type: {query.query_type.value}

Provide a comprehensive, well-structured answer that:
1. Addresses all aspects of the question
2. Uses multiple sources to build the complete picture
3. Explains connections between concepts
4. Provides educational insights and analysis

Answer:"""
        
        try:
            response = self.model.generate_content(prompt)
            return response.text, 0.95
        except Exception as e:
            return f"Error generating response: {str(e)}", 0.0
    
    def _generate_template_single(self, query: Query, retrieved_docs: List[RetrievalResult]) -> Tuple[str, float]:
        top_doc = retrieved_docs[0].document
        
        if query.query_type == QueryType.DEFINITION:
            answer = f"**{query.text.replace('What is', '').replace('Define', '').strip().title()}**\n\n"
            answer += f"{top_doc.content}"
            
            if len(retrieved_docs) > 1:
                answer += f"\n\n**Additional Information:** {retrieved_docs[1].document.content[:150]}..."
        
        elif query.query_type == QueryType.EXPLANATION:
            answer = f"**Explanation: {query.text}**\n\n"
            answer += f"{top_doc.content}\n\n"
            answer += "This is a fundamental concept that's important to understand for further learning."
        
        else:
            answer = f"Based on educational resources about {query.subject.value}:\n\n"
            answer += f"**{top_doc.title}**: {top_doc.content}"
        
        confidence = min(0.8, 0.5 + (len(retrieved_docs) * 0.1))
        return answer, confidence
    
    def _generate_template_multi(self, query: Query, retrieved_docs: List[RetrievalResult], 
                                reasoning_chain: List[str]) -> Tuple[str, float]:
        answer = f"**Comprehensive Analysis: {query.text}**\n\n"
        
        # Add reasoning process
        answer += "**Research Process:**\n"
        for step in reasoning_chain[-3:]:  # Last 3 steps
            answer += f"• {step}\n"
        
        answer += "\n**Analysis:**\n\n"
        
        # Group by retrieval step
        step_docs = {}
        for doc in retrieved_docs:
            step = doc.retrieval_step
            if step not in step_docs:
                step_docs[step] = []
            step_docs[step].append(doc)
        
        for step in sorted(step_docs.keys()):
            if step == 1:
                answer += "**Primary Analysis:**\n"
            elif step == 2:
                answer += "\n**Supporting Concepts:**\n"
            else:
                answer += "\n**Cross-Referenced Information:**\n"
            
            for doc in step_docs[step][:2]:  # Top 2 per step
                answer += f"• **{doc.document.title}**: {doc.document.content[:200]}...\n"
        
        answer += "\n**Conclusion:** This multi-faceted analysis provides a comprehensive understanding of the topic from multiple perspectives."
        
        confidence = min(0.9, 0.6 + (len(retrieved_docs) * 0.05))
        return answer, confidence

# Initialize adaptive generation module
adaptive_generation = AdaptiveGenerationModule()
print("✅ Adaptive Generation Module ready!")

🤖 Gemini API configured
✅ Adaptive Generation Module ready!
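
When no LLM is available, the template generators derive confidence from the number of retrieved documents alone. The sketch below reproduces those two heuristics as standalone functions so their behavior is easy to see; the constants come from the code above, while the function names are hypothetical.

```python
def template_confidence_single(num_docs: int) -> float:
    """Single-step template: 0.5 base, +0.1 per document, capped at 0.8."""
    return min(0.8, 0.5 + num_docs * 0.1)


def template_confidence_multi(num_docs: int) -> float:
    """Multi-step template: 0.6 base, +0.05 per document, capped at 0.9."""
    return min(0.9, 0.6 + num_docs * 0.05)


for n in (1, 3, 7):
    print(n,
          round(template_confidence_single(n), 2),
          round(template_confidence_multi(n), 2))
```

With `top_k=3` the single-step path already reaches its cap, and the multi-step path reaches its cap at six documents. These values reflect how many documents came back rather than how relevant they are, so they are best read as rough indicators.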


## 🚦 Adaptive Router

Rule-based strategy selection that starts from query complexity, then applies query-type overrides, subject adjustments, and a few special-case rules:

In [11]:
class AdaptiveRouter(BaseAdaptiveModule):
    def __init__(self):
        super().__init__("AdaptiveRouter")
        
        # Strategy mapping based on complexity
        self.complexity_strategy_map = {
            QueryComplexity.SIMPLE: AdaptiveStrategy.NO_RETRIEVAL,
            QueryComplexity.MODERATE: AdaptiveStrategy.SINGLE_STEP,
            QueryComplexity.COMPLEX: AdaptiveStrategy.MULTI_STEP
        }
        
        # Override rules for specific query types
        self.query_type_overrides = {
            QueryType.CALCULATION: AdaptiveStrategy.NO_RETRIEVAL,  # Most calculations can be done directly
            QueryType.FACTUAL: AdaptiveStrategy.NO_RETRIEVAL,     # Simple facts don't need retrieval
            QueryType.COMPARATIVE: AdaptiveStrategy.MULTI_STEP,   # Comparisons benefit from multi-step
            QueryType.ANALYTICAL: AdaptiveStrategy.MULTI_STEP     # Analysis needs comprehensive info
        }
        
        # Subject-specific adjustments
        self.subject_adjustments = {
            Subject.MATHEMATICS: -1,  # Math often doesn't need retrieval
            Subject.SCIENCE: 0,       # Science varies
            Subject.HISTORY: 1,       # History often needs context
            Subject.LITERATURE: 1     # Literature needs context
        }
    
    def process(self, query: Query) -> Dict[str, Any]:
        self.update_stats()
        
        # Start with complexity-based strategy
        strategy = self.complexity_strategy_map[query.complexity]
        
        # Apply query type overrides
        if query.query_type in self.query_type_overrides:
            override_strategy = self.query_type_overrides[query.query_type]
            
            # But respect complexity for certain overrides
            if query.complexity == QueryComplexity.COMPLEX and override_strategy == AdaptiveStrategy.NO_RETRIEVAL:
                strategy = AdaptiveStrategy.SINGLE_STEP  # Compromise
            else:
                strategy = override_strategy
        
        # Apply subject adjustments
        if query.subject in self.subject_adjustments:
            adjustment = self.subject_adjustments[query.subject]
            strategy = self._adjust_strategy(strategy, adjustment)
        
        # Special cases
        strategy = self._apply_special_rules(query, strategy)
        
        # Calculate confidence and estimated time
        confidence = self._calculate_routing_confidence(query, strategy)
        estimated_time = self._estimate_processing_time(strategy)
        
        return {
            'strategy': strategy,
            'confidence': confidence,
            'estimated_time': estimated_time,
            'reasoning': self._explain_routing_decision(query, strategy)
        }
    
    def _adjust_strategy(self, strategy: AdaptiveStrategy, adjustment: int) -> AdaptiveStrategy:
        strategies = [AdaptiveStrategy.NO_RETRIEVAL, AdaptiveStrategy.SINGLE_STEP, AdaptiveStrategy.MULTI_STEP]
        current_idx = strategies.index(strategy)
        new_idx = max(0, min(len(strategies) - 1, current_idx + adjustment))
        return strategies[new_idx]
    
    def _apply_special_rules(self, query: Query, strategy: AdaptiveStrategy) -> AdaptiveStrategy:
        text_lower = query.text.lower()
        
        # Force no retrieval for very simple math
        if re.search(r'^\s*\d+\s*[+\-*/]\s*\d+\s*[=?]?\s*$', text_lower):
            return AdaptiveStrategy.NO_RETRIEVAL
        
        # Force multi-step for complex comparisons
        comparison_words = ['compare', 'contrast', 'analyze differences', 'evaluate']
        if any(word in text_lower for word in comparison_words) and len(query.text.split()) > 8:
            return AdaptiveStrategy.MULTI_STEP
        
        # Force single-step for definitions if not already no-retrieval
        if query.query_type == QueryType.DEFINITION and strategy != AdaptiveStrategy.NO_RETRIEVAL:
            return AdaptiveStrategy.SINGLE_STEP
        
        return strategy
    
    def _calculate_routing_confidence(self, query: Query, strategy: AdaptiveStrategy) -> float:
        confidence = 0.7  # Base confidence
        
        # Higher confidence for clear patterns
        if query.confidence_score > 0.8:
            confidence += 0.2
        
        # Strategy-specific adjustments
        if strategy == AdaptiveStrategy.NO_RETRIEVAL and query.complexity == QueryComplexity.SIMPLE:
            confidence += 0.2
        elif strategy == AdaptiveStrategy.MULTI_STEP and query.complexity == QueryComplexity.COMPLEX:
            confidence += 0.1
        
        return min(1.0, confidence)
    
    def _estimate_processing_time(self, strategy: AdaptiveStrategy) -> float:
        time_estimates = {
            AdaptiveStrategy.NO_RETRIEVAL: 0.5,
            AdaptiveStrategy.SINGLE_STEP: 2.0,
            AdaptiveStrategy.MULTI_STEP: 4.5
        }
        return time_estimates[strategy]
    
    def _explain_routing_decision(self, query: Query, strategy: AdaptiveStrategy) -> str:
        explanations = {
            AdaptiveStrategy.NO_RETRIEVAL: f"Direct answer suitable for {query.complexity.value} {query.query_type.value} question",
            AdaptiveStrategy.SINGLE_STEP: f"Standard retrieval for {query.complexity.value} {query.query_type.value} about {query.subject.value}",
            AdaptiveStrategy.MULTI_STEP: f"Multi-step analysis needed for {query.complexity.value} {query.query_type.value} question"
        }
        return explanations[strategy]

# Test adaptive router
adaptive_router = AdaptiveRouter()

test_routing_queries = [
    Query("1", "What is 8 * 7?", QueryType.CALCULATION, QueryComplexity.SIMPLE, Subject.MATHEMATICS, 0.9),
    Query("2", "Explain photosynthesis", QueryType.EXPLANATION, QueryComplexity.MODERATE, Subject.SCIENCE, 0.8),
    Query("3", "Compare and analyze the causes and effects of mitosis versus meiosis in genetic diversity", 
          QueryType.COMPARATIVE, QueryComplexity.COMPLEX, Subject.SCIENCE, 0.85)
]

print("🚦 Testing Adaptive Router:")
for query in test_routing_queries:
    routing = adaptive_router.process(query)
    print(f"   '{query.text[:50]}...'")
    print(f"   → Strategy: {routing['strategy'].value}, Time: {routing['estimated_time']:.1f}s")
    print(f"   → Reasoning: {routing['reasoning']}")
    print()

print("✅ Adaptive Router ready!")

🚦 Testing Adaptive Router:
   'What is 8 * 7?...'
   → Strategy: no_retrieval, Time: 0.5s
   → Reasoning: Direct answer suitable for simple calculation question

   'Explain photosynthesis...'
   → Strategy: single_step, Time: 2.0s
   → Reasoning: Standard retrieval for moderate explanation about science

   'Compare and analyze the causes and effects of mito...'
   → Strategy: multi_step, Time: 4.5s
   → Reasoning: Multi-step analysis needed for complex comparative question

✅ Adaptive Router ready!
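
Two of the router's rules are worth trying on their own: the clamped subject adjustment and the bare-arithmetic pattern. This is a minimal sketch, assuming a local `Strategy` enum as a hypothetical stand-in for `AdaptiveStrategy`; the regular expression is copied from `_apply_special_rules`.

```python
import re
from enum import Enum


class Strategy(Enum):
    """Hypothetical stand-in for AdaptiveStrategy."""
    NO_RETRIEVAL = "no_retrieval"
    SINGLE_STEP = "single_step"
    MULTI_STEP = "multi_step"


ORDER = [Strategy.NO_RETRIEVAL, Strategy.SINGLE_STEP, Strategy.MULTI_STEP]
SIMPLE_MATH = re.compile(r'^\s*\d+\s*[+\-*/]\s*\d+\s*[=?]?\s*$')


def adjust(strategy: Strategy, adjustment: int) -> Strategy:
    """Shift along ORDER by `adjustment`, clamping at both ends."""
    idx = ORDER.index(strategy) + adjustment
    return ORDER[max(0, min(len(ORDER) - 1, idx))]


print(adjust(Strategy.SINGLE_STEP, -1).value)            # mathematics: one level down
print(adjust(Strategy.MULTI_STEP, 1).value)              # history: already at the top
print(SIMPLE_MATH.search("8 * 7?") is not None)          # bare arithmetic matches
print(SIMPLE_MATH.search("What is 8 * 7?") is not None)  # leading words do not
```

The pattern is anchored at both ends, so it only fires for input that is nothing but an arithmetic expression. A question such as "What is 8 * 7?" still reaches `no_retrieval` in the test run above, but through the `CALCULATION` override rather than this rule. Subject adjustments also run after the query-type overrides, so a `FACTUAL` history question ends up at `single_step`.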


## 🎓 Complete Adaptive RAG System

The educational tutor wires the modules together: analyze the question, route it to a strategy, run that strategy, and generate the answer:

In [12]:
class AdaptiveRAGSystem:
    def __init__(self):
        print("🎓 Initializing Adaptive RAG Educational Tutor...")
        
        # Initialize all modules
        self.query_analysis = query_analysis
        self.adaptive_router = adaptive_router
        self.no_retrieval = no_retrieval
        self.single_step = single_step
        self.multi_step = multi_step
        self.generation = adaptive_generation
        
        # System metrics
        self.total_queries = 0
        self.strategy_usage = {strategy: 0 for strategy in AdaptiveStrategy}
        self.avg_processing_time = 0.0
        self.student_sessions = {}
        
        print("✅ Adaptive RAG Educational Tutor initialized!")
        print("🧠 Modules: Query Analysis, Router, No-Retrieval, Single-Step, Multi-Step, Generation")
    
    def tutor_response(self, question: str, student_id: str = None) -> AdaptiveResponse:
        start_time = time.time()
        
        try:
            print(f"\n🎓 Student Question: '{question}'")
            print("=" * 60)
            
            # Step 1: Analyze the question
            print("🧠 Step 1: Analyzing question...")
            query = self.query_analysis.process(question)
            query.user_id = student_id
            
            print(f"   📊 Analysis: {query.query_type.value} | {query.complexity.value} | {query.subject.value}")
            print(f"   🎯 Confidence: {query.confidence_score:.2f}")
            
            # Step 2: Route to appropriate strategy
            print("🚦 Step 2: Selecting teaching strategy...")
            routing = self.adaptive_router.process(query)
            strategy = routing['strategy']
            
            print(f"   📋 Strategy: {strategy.value}")
            print(f"   ⏱️ Estimated time: {routing['estimated_time']:.1f}s")
            print(f"   💭 Reasoning: {routing['reasoning']}")
            
            # Step 3: Execute strategy
            retrieved_docs = []
            reasoning_chain = []
            direct_answer = None
            processing_steps = ['analysis', 'routing']
            
            if strategy == AdaptiveStrategy.NO_RETRIEVAL:
                print("🚫 Step 3: Generating direct answer...")
                direct_answer, confidence = self.no_retrieval.process(query)
                processing_steps.append('direct_generation')
                
            elif strategy == AdaptiveStrategy.SINGLE_STEP:
                print("🔍 Step 3: Single-step retrieval...")
                retrieved_docs = self.single_step.process(query, top_k=3)
                print(f"   📚 Retrieved {len(retrieved_docs)} documents")
                processing_steps.extend(['single_retrieval', 'generation'])
                
            else:  # MULTI_STEP
                print("🔄 Step 3: Multi-step retrieval and analysis...")
                retrieved_docs, reasoning_chain = self.multi_step.process(query)
                print(f"   📚 Retrieved {len(retrieved_docs)} documents across {len(reasoning_chain)} steps")
                print(f"   🧠 Reasoning steps: {len(reasoning_chain)}")
                processing_steps.extend(['multi_retrieval', 'analysis', 'generation'])
            
            # Step 4: Generate educational response
            print("🤖 Step 4: Generating educational response...")
            answer, confidence = self.generation.process(
                query, strategy, retrieved_docs, reasoning_chain, direct_answer
            )
            
            print(f"   ✨ Generated response (Confidence: {confidence:.2f})")
            
            # Create response
            processing_time = time.time() - start_time
            
            response = AdaptiveResponse(
                query=query,
                strategy_used=strategy,
                retrieved_documents=retrieved_docs,
                generated_answer=answer,
                confidence_score=confidence,
                processing_steps=processing_steps,
                processing_time=processing_time,
                reasoning_chain=reasoning_chain
            )
            
            # Update metrics
            self._update_metrics(response, student_id)
            
            print(f"\n✅ Response generated in {processing_time:.2f}s")
            return response
            
        except Exception as e:
            print(f"❌ Error: {str(e)}")
            error_query = Query("error", question, QueryType.FACTUAL, QueryComplexity.SIMPLE, Subject.GENERAL, 0.0, student_id)
            return AdaptiveResponse(
                query=error_query,
                strategy_used=AdaptiveStrategy.NO_RETRIEVAL,
                retrieved_documents=[],
                generated_answer=f"I encountered an error while processing your question: {str(e)}",
                confidence_score=0.0,
                processing_steps=['error'],
                processing_time=time.time() - start_time
            )
    
    def _update_metrics(self, response: AdaptiveResponse, student_id: str):
        self.total_queries += 1
        self.strategy_usage[response.strategy_used] += 1
        
        # Update running average
        self.avg_processing_time = ((self.avg_processing_time * (self.total_queries - 1)) + 
                                   response.processing_time) / self.total_queries
        
        # Track student sessions
        if student_id:
            if student_id not in self.student_sessions:
                self.student_sessions[student_id] = {
                    'questions': [],
                    'subjects': {},
                    'strategies_used': {strategy: 0 for strategy in AdaptiveStrategy},
                    'avg_confidence': 0.0
                }
            
            session = self.student_sessions[student_id]
            session['questions'].append({
                'question': response.query.text,
                'subject': response.query.subject.value,
                'strategy': response.strategy_used.value,
                'confidence': response.confidence_score,
                'timestamp': datetime.now()
            })
            
            # Update subject tracking
            subject = response.query.subject.value
            if subject not in session['subjects']:
                session['subjects'][subject] = 0
            session['subjects'][subject] += 1
            
            # Update strategy usage
            session['strategies_used'][response.strategy_used] += 1
            
            # Update average confidence
            confidences = [q['confidence'] for q in session['questions']]
            session['avg_confidence'] = sum(confidences) / len(confidences)
    
    def get_system_stats(self) -> Dict:
        return {
            'total_queries': self.total_queries,
            'strategy_usage': dict(self.strategy_usage),
            'avg_processing_time': self.avg_processing_time,
            'active_students': len(self.student_sessions),
            'module_stats': {
                'query_analysis': self.query_analysis.get_info(),
                'router': self.adaptive_router.get_info(),
                'no_retrieval': self.no_retrieval.get_info(),
                'single_step': self.single_step.get_info(),
                'multi_step': self.multi_step.get_info(),
                'generation': self.generation.get_info()
            }
        }
    
    def get_student_progress(self, student_id: str) -> Dict:
        if student_id not in self.student_sessions:
            return {"error": "Student not found"}
        
        session = self.student_sessions[student_id]
        return {
            'total_questions': len(session['questions']),
            'subjects_studied': session['subjects'],
            'learning_strategies': session['strategies_used'],
            'avg_confidence': session['avg_confidence'],
            'recent_questions': session['questions'][-5:]  # Last 5 questions
        }

# Initialize complete adaptive system
adaptive_tutor = AdaptiveRAGSystem()
print("\n🚀 Adaptive RAG Educational Tutor ready!")

🎓 Initializing Adaptive RAG Educational Tutor...
✅ Adaptive RAG Educational Tutor initialized!
🧠 Modules: Query Analysis, Router, No-Retrieval, Single-Step, Multi-Step, Generation

🚀 Adaptive RAG Educational Tutor ready!
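
The `_update_metrics` method keeps an average processing time without storing every measurement. The sketch below isolates that running-mean update in a small helper; `RunningMean` is a hypothetical name, and the sample timings are made up.

```python
class RunningMean:
    """Tracks a mean incrementally, without keeping past values."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        # Same update as _update_metrics: rebuild the previous total,
        # add the new value, divide by the new count.
        self.mean = (self.mean * (self.count - 1) + value) / self.count
        return self.mean


tracker = RunningMean()
for seconds in (0.5, 2.0, 4.5):
    print(round(tracker.update(seconds), 3))
```

An algebraically equivalent form is `mean += (value - mean) / count`, which avoids rebuilding the total on every update.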


In [13]:
# unittest is part of the Python standard library; no pip install is needed.

In [14]:
import unittest
from datetime import datetime
from typing import List, Dict

class TestAdaptiveRAGSystem(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        """Initialize the system once for all tests"""
        cls.tutor = AdaptiveRAGSystem()
        cls.test_student_id = "test_student_123"
        
        # Sample documents for verification
        cls.documents_by_id = {doc['id']: doc for doc in educational_knowledge}
    
    def test_no_retrieval_scenarios(self):
        """Test simple queries that should be answered directly"""
        print("\n=== Testing No-Retrieval Scenarios ===")
        
        test_cases = [
            ("What is 15 + 27?", "42", "Simple math calculation"),
            ("What is gravity?", "gravity", "Simple definition"),
            ("Calculate 3 * 7", "21", "Direct calculation"),
            ("What is the value of pi?", "3.14159", "Simple fact")
        ]
        
        for question, expected, description in test_cases:
            with self.subTest(description):
                response = self.tutor.tutor_response(question, self.test_student_id)
                
                # Verify strategy
                self.assertEqual(response.strategy_used, AdaptiveStrategy.NO_RETRIEVAL,
                               f"Should use no-retrieval for: {description}")
                
                # Verify answer contains expected content (case-insensitive)
                self.assertIn(expected.lower(), response.generated_answer.lower(),
                            f"Answer for '{question}' should contain '{expected}'")
                
                # Verify no documents were retrieved
                self.assertEqual(len(response.retrieved_documents), 0,
                               "No documents should be retrieved for direct answers")
                
                print(f"✓ {description}: {response.strategy_used.value} (Confidence: {response.confidence_score:.2f})")

    def test_single_step_scenarios(self):
        """Test moderate complexity queries needing single retrieval"""
        print("\n=== Testing Single-Step Retrieval Scenarios ===")
        
        test_cases = [
            ("Explain photosynthesis", "photosynthesis", "chloroplast", "Concept explanation"),
            ("What is mitosis?", "mitosis", "cell division", "Definition query"),
            ("Describe Newton's first law", "Newton", "inertia", "Physics concept"),
            ("How does cell division work?", "cell division", "mitosis", "Process explanation")
        ]
        
        for question, term1, term2, description in test_cases:
            with self.subTest(description):
                response = self.tutor.tutor_response(question, self.test_student_id)
                
                # Verify strategy
                self.assertEqual(response.strategy_used, AdaptiveStrategy.SINGLE_STEP,
                               f"Should use single-step for: {description}")
                
                # Verify documents were retrieved
                self.assertGreater(len(response.retrieved_documents), 0,
                                 "Should retrieve documents for single-step")
                
                # Verify answer contains relevant terms
                self.assertIn(term1.lower(), response.generated_answer.lower(),
                            f"Answer should mention '{term1}'")
                self.assertIn(term2.lower(), response.generated_answer.lower(),
                            f"Answer should mention '{term2}'")
                
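                # Verify document relevance: the term may appear in the content or the title.
                # (`content.lower() or title.lower()` only ever checked the content
                # whenever the content was non-empty.)
                top_doc = response.retrieved_documents[0].document
                doc_text = f"{top_doc.content} {top_doc.title}".lower()
                self.assertIn(term1.lower(), doc_text,
                            f"Top doc should be about '{term1}'")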
                
                print(f"✓ {description}: Retrieved {len(response.retrieved_documents)} docs (Confidence: {response.confidence_score:.2f})")

    def test_multi_step_scenarios(self):
        """Test complex queries needing multi-step retrieval"""
        print("\n=== Testing Multi-Step Retrieval Scenarios ===")
        
        test_cases = [
            ("Compare and contrast mitosis and meiosis", "mitosis", "meiosis", "genetic", "Comparison query"),
            ("Analyze the causes and effects of World War II", "World War II", "causes", "effects", "Historical analysis"),
            ("Explain how photosynthesis and cellular respiration are related", "photosynthesis", "respiration", "energy", "Concept relationship"),
            ("Discuss the similarities and differences between plant and animal cells", "plant", "animal", "cells", "Comparative analysis")
        ]
        
        for question, term1, term2, term3, description in test_cases:
            with self.subTest(description):
                response = self.tutor.tutor_response(question, self.test_student_id)
                
                # Verify strategy
                self.assertEqual(response.strategy_used, AdaptiveStrategy.MULTI_STEP,
                               f"Should use multi-step for: {description}")
                
                # Verify multiple documents were retrieved (strictly more than two)
                self.assertGreater(len(response.retrieved_documents), 2,
                                 "Should retrieve multiple docs for multi-step")
                
                # Verify reasoning steps
                self.assertGreater(len(response.reasoning_chain), 1,
                                 "Should have multiple reasoning steps")
                
                # Verify answer contains relevant terms
                answer_lower = response.generated_answer.lower()
                self.assertIn(term1.lower(), answer_lower,
                            f"Answer should mention '{term1}'")
                self.assertIn(term2.lower(), answer_lower,
                            f"Answer should mention '{term2}'")
                self.assertIn(term3.lower(), answer_lower,
                            f"Answer should mention '{term3}'")
                
                print(f"✓ {description}: {len(response.reasoning_chain)} steps, {len(response.retrieved_documents)} docs")

    def test_query_analysis(self):
        """Test the query analysis module"""
        print("\n=== Testing Query Analysis ===")
        
        test_cases = [
            ("What is 5 + 3?", QueryType.FACTUAL, QueryComplexity.SIMPLE, Subject.MATHEMATICS),
            ("Explain the process of photosynthesis", QueryType.EXPLANATION, QueryComplexity.MODERATE, Subject.SCIENCE),
            ("Compare mitosis and meiosis", QueryType.COMPARATIVE, QueryComplexity.COMPLEX, Subject.SCIENCE),
            ("Analyze the causes of World War II", QueryType.ANALYTICAL, QueryComplexity.COMPLEX, Subject.HISTORY)
        ]
        
        for question, expected_type, expected_complexity, expected_subject in test_cases:
            with self.subTest(question[:20]):
                query = self.tutor.query_analysis.process(question)
                
                self.assertEqual(query.query_type, expected_type,
                               f"'{question}' should be {expected_type}")
                self.assertEqual(query.complexity, expected_complexity,
                               f"'{question}' complexity should be {expected_complexity}")
                self.assertEqual(query.subject, expected_subject,
                               f"'{question}' subject should be {expected_subject}")
                
                print(f"✓ '{question[:20]}...' → {query.query_type.value}, {query.complexity.value}, {query.subject.value}")

    def test_routing_logic(self):
        """Test the adaptive routing decisions"""
        print("\n=== Testing Routing Logic ===")
        
        test_cases = [
            ("5 + 3", AdaptiveStrategy.NO_RETRIEVAL, "Simple math"),
            ("What is gravity?", AdaptiveStrategy.NO_RETRIEVAL, "Simple fact"),
            ("Explain photosynthesis", AdaptiveStrategy.SINGLE_STEP, "Concept explanation"),
            ("Compare mitosis and meiosis", AdaptiveStrategy.MULTI_STEP, "Complex comparison"),
            ("Analyze the themes in Shakespeare's works", AdaptiveStrategy.MULTI_STEP, "Literary analysis")
        ]
        
        for question, expected_strategy, description in test_cases:
            with self.subTest(description):
                query = self.tutor.query_analysis.process(question)
                routing = self.tutor.adaptive_router.process(query)
                
                self.assertEqual(routing['strategy'], expected_strategy,
                               f"'{question}' should route to {expected_strategy}")
                
                print(f"✓ {description}: {routing['strategy'].value} (Expected: {expected_strategy.value})")

    def test_system_metrics(self):
        """Test that system metrics are being tracked"""
        print("\n=== Testing System Metrics ===")
        
        # Run some test queries to generate metrics
        test_queries = [
            "What is 8 * 7?",
            "Explain photosynthesis",
            "Compare mitosis and meiosis"
        ]
        
        for query in test_queries:
            self.tutor.tutor_response(query, self.test_student_id)
        
        # Check metrics
        stats = self.tutor.get_system_stats()
        
        self.assertGreater(stats['total_queries'], 0, "Should track total queries")
        self.assertGreater(sum(stats['strategy_usage'].values()), 0,
                         "Should track strategy usage")
        
        # Check student progress
        progress = self.tutor.get_student_progress(self.test_student_id)
        self.assertGreater(progress['total_questions'], 0,
                         "Should track student questions")
        self.assertGreater(len(progress['subjects_studied']), 0,
                         "Should track subjects studied")
        
        print("✓ System metrics tracking verified")
        print(f"   Total queries: {stats['total_queries']}")
        print(f"   Strategies used: {stats['strategy_usage']}")

if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False)

🎓 Initializing Adaptive RAG Educational Tutor...
✅ Adaptive RAG Educational Tutor initialized!
🧠 Modules: Query Analysis, Router, No-Retrieval, Single-Step, Multi-Step, Generation

=== Testing Multi-Step Retrieval Scenarios ===

🎓 Student Question: 'Compare and contrast mitosis and meiosis'
🧠 Step 1: Analyzing question...
   📊 Analysis: comparative | complex | general
   🎯 Confidence: 0.80
🚦 Step 2: Selecting teaching strategy...
   📋 Strategy: multi_step
   ⏱️ Estimated time: 4.5s
   💭 Reasoning: Multi-step analysis needed for complex comparative question
🔄 Step 3: Multi-step retrieval and analysis...
   📚 Retrieved 5 documents across 6 steps
   🧠 Reasoning steps: 6
🤖 Step 4: Generating educational response...
   ✨ Generated response (Confidence: 0.95)

✅ Response generated in 6.76s
✓ Comparison query: 6 steps, 5 docs

🎓 Student Question: 'Analyze the causes and effects of World War II'
🧠 Step 1: Analyzing question...
   📊 Analysis: analytical | complex | history
   🎯 Confidence: 0.85

.

   ✨ Generated response (Confidence: 0.95)

✅ Response generated in 5.41s
✓ Comparative analysis: 6 steps, 3 docs

=== Testing No-Retrieval Scenarios ===

🎓 Student Question: 'What is 15 + 27?'
🧠 Step 1: Analyzing question...
   📊 Analysis: definition | simple | general
   🎯 Confidence: 0.80
🚦 Step 2: Selecting teaching strategy...
   📋 Strategy: no_retrieval
   ⏱️ Estimated time: 0.5s
   💭 Reasoning: Direct answer suitable for simple definition question
🚫 Step 3: Generating direct answer...
🤖 Step 4: Generating educational response...
   ✨ Generated response (Confidence: 0.95)

✅ Response generated in 0.00s

🎓 Student Question: 'What is gravity?'
🧠 Step 1: Analyzing question...
   📊 Analysis: definition | simple | general
   🎯 Confidence: 0.80
🚦 Step 2: Selecting teaching strategy...
   📋 Strategy: no_retrieval
   ⏱️ Estimated time: 0.5s
   💭 Reasoning: Direct answer suitable for simple definition question
🚫 Step 3: Generating direct answer...
🤖 Step 4: Generating educational response

.
FAIL: test_multi_step_scenarios (__main__.TestAdaptiveRAGSystem) [Historical analysis]
Test complex queries needing multi-step retrieval
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/ipykernel_203284/288453671.py", line 100, in test_multi_step_scenarios
    self.assertGreater(len(response.retrieved_documents), 2,
AssertionError: 2 not greater than 2 : Should retrieve multiple docs for multi-step

FAIL: test_multi_step_scenarios (__main__.TestAdaptiveRAGSystem) [Concept relationship]
Test complex queries needing multi-step retrieval
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/ipykernel_203284/288453671.py", line 96, in test_multi_step_scenarios
    self.assertEqual(response.strategy_used, AdaptiveStrategy.MULTI_STEP,
AssertionError: <AdaptiveStrategy.SINGLE_STEP: 'single_step'> != <AdaptiveStrategy.MULTI_STEP: 'multi_step'> : Should use mu

   ✨ Generated response (Confidence: 0.95)

✅ Response generated in 5.77s
✓ System metrics tracking verified
   Total queries: 15
   Strategies used: {<AdaptiveStrategy.NO_RETRIEVAL: 'no_retrieval'>: 6, <AdaptiveStrategy.SINGLE_STEP: 'single_step'>: 5, <AdaptiveStrategy.MULTI_STEP: 'multi_step'>: 4}
