# DAT409 | Implement hybrid search with Aurora PostgreSQL for MCP retrieval

## 🚀 Build Production-Ready Hybrid Search with Amazon Aurora PostgreSQL & Amazon Bedrock

---

### 📋 Builder's Session
- **Duration**: 60 minutes
- **Level**: 400 (Expert)
- **Prerequisites**: Aurora PostgreSQL cluster, AWS credentials configured

### 🎯 Your Mission: The Black Friday Playbook

**It's November 1st.** Black Friday is 28 days away. You have engineering logs from 365 days of operations - insights from DBAs, SREs, developers, and data engineers.

**The Challenge**: Different teams describe the same problems differently:
- DBA: "FATAL: remaining connection slots are reserved"
- Developer: "HikariPool-1 - Connection timeout after 30000ms"
- SRE: "CloudWatch Alarm: DatabaseConnections > 990"

**Your Solution**: Build a hybrid search system that understands ALL these perspectives using:
- **Trigram search** (pg_trgm) - Handles typos AND exact matches
- **Semantic search** (pgvector) - Understands concepts
- **ML Reranking** (Cohere) - Optimizes relevance

---

## 📦 Module 1: Environment Setup

Install and import all required packages for the hybrid search implementation.

In [None]:
# Environment setup
import warnings
import os
import sys

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')
os.environ['PYTHONWARNINGS'] = 'ignore'

print("📦 Installing required packages...")
%pip install --quiet --upgrade "psycopg[binary]" pgvector boto3 pandas numpy tqdm python-dotenv ipywidgets 2>/dev/null

# Import all libraries
import json
import time
import hashlib
from datetime import datetime, timedelta
from typing import List, Dict, Tuple, Optional
from difflib import SequenceMatcher
import psycopg
from pgvector.psycopg import register_vector
import boto3
import pandas as pd
import numpy as np
from tqdm import tqdm
from dotenv import load_dotenv
import ipywidgets as widgets
from IPython.display import display, HTML

print("✅ All packages installed successfully!")
print("\n🎯 Ready to build hybrid search for Black Friday preparation!")

## 📊 Module 2: Load and Analyze Historical Data

Load incident logs from the past year and understand the data distribution.

In [None]:
# Load and analyze historical incident data
print("📂 Loading Historical Incident Data")
print("=" * 70)

# Load the dataset
try:
    with open('../data/incident_logs.json', 'r') as f:
        incident_logs = json.load(f)
    print("✅ Loaded dataset")
except FileNotFoundError:
    print("❌ Dataset not found at ../data/incident_logs.json")
    print("   Please ensure the incident_logs.json file is in the data directory")
    raise

print(f"✅ Total logs: {len(incident_logs):,}")
print(f"📅 Timespan: 365 days of engineering observations\n")

# Analyze distribution
personas = {}
severities = {}
critical_samples = []

for log in incident_logs:
    persona = log['mcp_metadata']['persona']
    severity = log['mcp_metadata'].get('severity', 'info')
    personas[persona] = personas.get(persona, 0) + 1
    severities[severity] = severities.get(severity, 0) + 1
    
    if severity == 'critical' and len(critical_samples) < 2:
        critical_samples.append((persona, log['content'][:100]))

# Display analysis
print("👥 Engineering Teams (Different Perspectives):")
for persona, count in sorted(personas.items()):
    print(f"  {persona:15} {count:4} logs")

print("\n⚠️ Severity Distribution:")
for severity, count in sorted(severities.items()):
    emoji = {'critical': '🔴', 'warning': '🟡', 'info': '🔵'}.get(severity, '⚪')
    percentage = (count / len(incident_logs)) * 100
    print(f"  {emoji} {severity:8}: {count:4} logs ({percentage:5.1f}%)")

print("\n🔴 Sample Critical Incidents:")
for persona, content in critical_samples:
    print(f"  [{persona}] {content}...")

print("\n💡 Challenge: Each team uses different terminology for the same issue!")
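
The tally in this cell can also be expressed with `collections.Counter`. The sketch below is only an illustration: `sample_logs` and its persona values are hypothetical records that mirror the `mcp_metadata` fields read above, and `summarize` is a helper name introduced here.

```python
from collections import Counter

# Hypothetical records mirroring the fields read in Module 2
sample_logs = [
    {"content": "FATAL: remaining connection slots are reserved",
     "mcp_metadata": {"persona": "dba", "severity": "critical"}},
    {"content": "HikariPool-1 - Connection timeout after 30000ms",
     "mcp_metadata": {"persona": "developer", "severity": "warning"}},
    {"content": "CloudWatch Alarm: DatabaseConnections > 990",
     "mcp_metadata": {"persona": "sre"}},  # no severity, so it counts as 'info'
]

def summarize(logs):
    """Count logs per persona and per severity (missing severity counts as 'info')."""
    personas = Counter(log["mcp_metadata"]["persona"] for log in logs)
    severities = Counter(log["mcp_metadata"].get("severity", "info") for log in logs)
    return personas, severities

personas, severities = summarize(sample_logs)
for severity, count in severities.most_common():
    print(f"{severity:8} {count:4} logs ({count / len(sample_logs):.1%})")
```

`Counter` removes the manual `dict.get(..., 0) + 1` bookkeeping while keeping the same default for a missing severity.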

## 🗄️ Module 3: Connect to Aurora PostgreSQL

Establish a connection to Aurora PostgreSQL with pgvector support.

In [None]:
# Connect to Aurora PostgreSQL
print("🔌 Connecting to Aurora PostgreSQL")
print("=" * 70)

# Load environment variables
load_dotenv()

# Database configuration
DB_CONFIG = {
    'host': os.getenv('DB_HOST'),
    'dbname': os.getenv('DB_NAME'),
    'user': os.getenv('DB_USER'),
    'password': os.getenv('DB_PASSWORD'),
    'port': int(os.getenv('DB_PORT', '5432'))
}

def get_db_connection():
    """Create a connection to Aurora PostgreSQL with pgvector support"""
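    conn = psycopg.connect(**DB_CONFIG, autocommit=True)
    # register_vector() looks up the vector type, so the extension must exist first
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    register_vector(conn)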
    return conn

# Establish connection
conn = get_db_connection()

# Verify connection and extensions
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    db_version = cur.fetchone()[0]
    
    cur.execute("SELECT extname FROM pg_extension WHERE extname IN ('vector', 'pg_trgm');")
    extensions = [row[0] for row in cur.fetchall()]

print(f"✅ Connected successfully")
print(f"   Version: {db_version.split(',')[0]}")
print(f"   Extensions: {', '.join(extensions) if extensions else 'Will be enabled'}")
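
Because `os.getenv` returns `None` for an unset variable, a missing entry in `.env` can surface later as a confusing connection error. A minimal sketch of a pre-flight check follows; `REQUIRED_SETTINGS`, `missing_db_settings`, and the values in `example_env` are hypothetical names and data introduced for illustration.

```python
import os

REQUIRED_SETTINGS = ("DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD")

def missing_db_settings(env=os.environ):
    """Return the names of required settings that are unset or empty."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]

# Example with an explicit mapping instead of the real environment
example_env = {"DB_HOST": "example-host", "DB_NAME": "workshop", "DB_USER": "builder"}
print(missing_db_settings(example_env))  # ['DB_PASSWORD']
```

Calling `missing_db_settings()` with no argument after `load_dotenv()` would check the real environment before `get_db_connection()` runs.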

## 🏗️ Module 4: Create Schema and Load Data

Create the schema for hybrid search. The data is loaded with embeddings in Module 6.

In [None]:
# Create schema
print("🏗️ Setting Up Database Schema")
print("=" * 70)

with conn.cursor() as cur:
    # Enable extensions
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    cur.execute("CREATE EXTENSION IF NOT EXISTS pg_trgm;")
    print("✅ Extensions enabled: pgvector, pg_trgm")
    
    # Create table
    cur.execute("DROP TABLE IF EXISTS incident_logs CASCADE;")
    cur.execute("""
        CREATE TABLE incident_logs (
            doc_id TEXT PRIMARY KEY,
            content TEXT NOT NULL,
            persona TEXT NOT NULL,
            timestamp TIMESTAMPTZ NOT NULL,
            severity TEXT DEFAULT 'info',
            metrics JSONB,
            content_embedding vector(1024),
            created_at TIMESTAMPTZ DEFAULT NOW()
        );
    """)
    print("✅ Table created with hybrid search capabilities")
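
The `content_embedding` column is declared as `vector(1024)`, so every stored embedding must have exactly 1024 elements. The sketch below shows one way to validate a vector before inserting it; `check_embedding` and `EMBEDDING_DIM` are hypothetical names, and the finite-value check is included on the assumption that non-finite elements are not accepted by the `vector` type.

```python
import math

EMBEDDING_DIM = 1024  # matches vector(1024) in the table definition

def check_embedding(values, expected_dim=EMBEDDING_DIM):
    """Raise ValueError if the embedding has the wrong length or non-finite values."""
    if len(values) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinite values")
    return values

check_embedding([0.0] * 1024)      # passes
try:
    check_embedding([0.0] * 768)   # wrong size for this column
except ValueError as err:
    print(f"rejected: {err}")
```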

## 🧠 Module 5: Generate Embeddings with Amazon Bedrock

Set up Cohere Embed English v3 on Amazon Bedrock to generate 1024-dimensional embeddings for semantic search.

In [None]:
# Initialize Amazon Bedrock
print("🧠 Setting Up Amazon Bedrock with Cohere Models")
print("=" * 70)

# Initialize Bedrock clients
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-west-2'
)

bedrock_agent_runtime = boto3.client(
    service_name='bedrock-agent-runtime',
    region_name='us-west-2'
)

USING_REAL_EMBEDDINGS = True

def generate_embedding(text: str, input_type: str = 'search_document') -> List[float]:
    """Generate embeddings using Cohere Embed v3"""
    global USING_REAL_EMBEDDINGS
    
    if len(text) > 2048:
        text = text[:2048]
    
    try:
        response = bedrock_runtime.invoke_model(
            modelId='cohere.embed-english-v3',
            contentType='application/json',
            accept='application/json',
            body=json.dumps({
                'texts': [text],
                'input_type': input_type,
                'truncate': 'END'
            })
        )
        response_body = json.loads(response['body'].read())
        USING_REAL_EMBEDDINGS = True
        return response_body['embeddings'][0]
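    except Exception as e:
        # Bedrock call failed: fall back to a mock embedding and say so.
        # Seed from a digest because hash() of a str is salted per process.
        USING_REAL_EMBEDDINGS = False
        print(f"⚠️ Bedrock embedding failed ({type(e).__name__}); using a mock embedding")
        seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
        return np.random.default_rng(seed).standard_normal(1024).tolist()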

def generate_embeddings_batch(texts: List[str], input_type: str = 'search_document') -> List[List[float]]:
    """Generate embeddings for multiple texts efficiently"""
    global USING_REAL_EMBEDDINGS
    max_batch_size = 96
    all_embeddings = []
    
    for i in range(0, len(texts), max_batch_size):
        batch_texts = texts[i:i+max_batch_size]
        batch_texts = [text[:2048] if len(text) > 2048 else text for text in batch_texts]
        
        try:
            response = bedrock_runtime.invoke_model(
                modelId='cohere.embed-english-v3',
                contentType='application/json',
                accept='application/json',
                body=json.dumps({
                    'texts': batch_texts,
                    'input_type': input_type,
                    'truncate': 'END'
                })
            )
            response_body = json.loads(response['body'].read())
            all_embeddings.extend(response_body['embeddings'])
            USING_REAL_EMBEDDINGS = True
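        except Exception as e:
            # Bedrock call failed: fill this batch with deterministic mock embeddings
            USING_REAL_EMBEDDINGS = False
            print(f"⚠️ Bedrock batch embedding failed ({type(e).__name__}); using mock embeddings")
            for text in batch_texts:
                seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
                all_embeddings.append(np.random.default_rng(seed).standard_normal(1024).tolist())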
    
    return all_embeddings

# Test embedding
test_embedding = generate_embedding("test", 'search_query')
print(f"✅ Embeddings ready (1024 dimensions)")
print(f"   {'Using REAL Cohere embeddings' if USING_REAL_EMBEDDINGS else 'Using MOCK embeddings (Bedrock unavailable)'}")
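
The batch function above sends at most 96 texts per request and caps each text at 2048 characters. The sketch below isolates that slicing so the resulting request count is easy to see; `plan_batches` and the generated `texts` are hypothetical and make no Bedrock calls.

```python
MAX_TEXTS_PER_REQUEST = 96   # batch size used by generate_embeddings_batch
MAX_CHARS_PER_TEXT = 2048    # character cap applied before each request

def plan_batches(texts, batch_size=MAX_TEXTS_PER_REQUEST, max_chars=MAX_CHARS_PER_TEXT):
    """Yield lists of truncated texts, each list small enough for one request."""
    for start in range(0, len(texts), batch_size):
        yield [text[:max_chars] for text in texts[start:start + batch_size]]

# Hypothetical input: 250 over-long texts
texts = [f"log entry {i} " + "x" * 3000 for i in range(250)]
batches = list(plan_batches(texts))
print([len(batch) for batch in batches])        # [96, 96, 58]
print(max(len(t) for b in batches for t in b))  # 2048
```

For 250 texts this plans three requests instead of 250 single-text calls.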

## 💾 Module 6: Load Data with Embeddings

Load incident logs into Aurora with batch embedding generation for efficiency.

In [None]:
# Load data with embeddings
print("💾 Loading Data with Embeddings")
print("=" * 70)

def load_data_optimized(conn, logs: List[Dict], batch_size: int = 50):
    """Load data with batch embedding generation"""
    
    with conn.cursor() as cur:
        cur.execute("TRUNCATE TABLE incident_logs;")
        conn.commit()
        
        stats = {'loaded': 0, 'errors': 0, 'api_calls': 0}
        
        # Generate embeddings in batches
        print(f"Generating embeddings for {len(logs)} logs...")
        all_embeddings = []
        texts_for_embedding = [log['content'] for log in logs]
        
        with tqdm(total=len(texts_for_embedding), desc="Embeddings", unit="text") as pbar:
            for i in range(0, len(texts_for_embedding), 96):
                batch_texts = texts_for_embedding[i:i+96]
                batch_embeddings = generate_embeddings_batch(batch_texts, 'search_document')
                all_embeddings.extend(batch_embeddings)
                stats['api_calls'] += 1
                pbar.update(len(batch_texts))
        
        # Insert data
        print(f"Inserting into database...")
        
        with tqdm(total=len(logs), desc="Inserting", unit="log") as pbar:
            for i in range(0, len(logs), batch_size):
                batch_logs = logs[i:i+batch_size]
                batch_embeddings = all_embeddings[i:i+batch_size]
                
                insert_data = []
                for log, embedding in zip(batch_logs, batch_embeddings):
                    insert_data.append((
                        log['doc_id'],
                        log['content'],
                        log['mcp_metadata']['persona'],
                        log['mcp_metadata']['timestamp'],
                        log['mcp_metadata'].get('severity', 'info'),
                        json.dumps(log['mcp_metadata'].get('metrics', {})),
                        embedding
                    ))
                
                try:
                    cur.executemany("""
                        INSERT INTO incident_logs (
                            doc_id, content, persona, timestamp,
                            severity, metrics, content_embedding
                        ) VALUES (%s, %s, %s, %s, %s, %s, %s)
                        ON CONFLICT (doc_id) DO NOTHING;
                    """, insert_data)
                    stats['loaded'] += len(insert_data)
                    pbar.update(len(batch_logs))
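                except Exception as e:
                    # Surface the failure instead of silently dropping the batch
                    stats['errors'] += len(batch_logs)
                    print(f"⚠️ Insert failed for {len(batch_logs)} logs: {e}")
                    pbar.update(len(batch_logs))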
        
        return stats

# Load the data
start_time = time.time()
stats = load_data_optimized(conn, incident_logs, batch_size=100)
elapsed = time.time() - start_time

print(f"\n✅ Data Loading Complete!")
print(f"   Loaded: {stats['loaded']:,} logs in {elapsed:.1f}s")
print(f"   Errors: {stats['errors']:,}")
print(f"   API Calls: {stats['api_calls']} (batch processing)")
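
Each log record is flattened into a tuple whose order must match the column list of the `INSERT` statement. The sketch below shows that mapping in isolation; `to_row` is a hypothetical helper and `sample` is an invented record used only for illustration.

```python
import json

def to_row(log, embedding):
    """Map one log record to the column order used by the INSERT statement."""
    meta = log["mcp_metadata"]
    return (
        log["doc_id"],
        log["content"],
        meta["persona"],
        meta["timestamp"],
        meta.get("severity", "info"),         # same default as the table column
        json.dumps(meta.get("metrics", {})),  # JSONB column receives a JSON string
        embedding,
    )

# Hypothetical record for illustration
sample = {
    "doc_id": "doc-001",
    "content": "HikariPool-1 - Connection timeout after 30000ms",
    "mcp_metadata": {"persona": "developer", "timestamp": "2024-11-29T09:15:00Z"},
}
row = to_row(sample, [0.0] * 1024)
print(row[4], row[5])  # info {}
```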

## ⚡ Module 7: Create Search Indexes

Build HNSW, GIN, and B-tree indexes so that vector, trigram, and metadata lookups stay fast.

In [None]:
# Create indexes
print("⚡ Creating High-Performance Indexes")
print("=" * 70)

with conn.cursor() as cur:
    # HNSW index for vectors
    print("Creating HNSW index for semantic search...")
    cur.execute("""
        CREATE INDEX IF NOT EXISTS idx_embedding
        ON incident_logs USING hnsw (content_embedding vector_cosine_ops)
        WITH (m = 16, ef_construction = 64);
    """)
    
    # GIN index for trigrams
    print("Creating GIN index for trigram search...")
    cur.execute("""
        CREATE INDEX IF NOT EXISTS idx_trgm
        ON incident_logs USING gin(content gin_trgm_ops);
    """)
    
    # Metadata indexes
    print("Creating metadata indexes...")
    cur.execute("CREATE INDEX IF NOT EXISTS idx_persona ON incident_logs(persona);")
    cur.execute("CREATE INDEX IF NOT EXISTS idx_severity ON incident_logs(severity);")
    cur.execute("CREATE INDEX IF NOT EXISTS idx_timestamp ON incident_logs(timestamp);")
    
    # Update statistics
    cur.execute("ANALYZE incident_logs;")
    
    print("\n✅ All indexes created successfully!")
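
The HNSW index is built with `vector_cosine_ops`, which pairs with pgvector's `<=>` cosine distance operator. As a rough illustration of what that distance measures, here is a pure-Python sketch; `cosine_distance` is a hypothetical helper, not part of pgvector.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [2.0, 0.0]))   # 0.0  (same direction)
print(cosine_distance([1.0, 0.0], [0.0, 3.0]))   # 1.0  (orthogonal)
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # 2.0  (opposite)
```

The distance ignores vector length and ranges from 0 to 2, so `1 - distance` gives a similarity between -1 and 1.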

## 🔍 Module 8: Implement Core Search Methods

Build trigram and semantic search functions.

In [None]:
# Core search methods
print("🔍 Implementing Core Search Methods")
print("=" * 70)

def trigram_search(query: str, conn, limit: int = 5) -> List[Tuple]:
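    """Trigram search - handles typos AND exact matches.

    Note: the GIN trigram index can serve the <% operator, but the
    function-form similarity() and word_similarity() predicates in the
    OR are evaluated per row, so this query may scan the table.
    """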
    with conn.cursor() as cur:
        cur.execute("""
            SELECT doc_id, content, persona, severity,
                   GREATEST(
                       similarity(%s, content),
                       word_similarity(%s, content),
                       strict_word_similarity(%s, content)
                   ) as score
            FROM incident_logs
            WHERE similarity(%s, content) > 0.1
               OR word_similarity(%s, content) > 0.2
               OR %s <%% content
            ORDER BY score DESC
            LIMIT %s;
        """, (query, query, query, query, query, query, limit))
        return cur.fetchall()

def semantic_search(query: str, conn, limit: int = 5) -> List[Tuple]:
    """Semantic search - understands concepts"""
    query_embedding = generate_embedding(query, 'search_query')
    
    with conn.cursor() as cur:
        cur.execute("""
            SELECT doc_id, content, persona, severity,
                   1 - (content_embedding <=> %s::vector) as score
            FROM incident_logs
            WHERE content_embedding IS NOT NULL
            ORDER BY content_embedding <=> %s::vector
            LIMIT %s;
        """, (query_embedding, query_embedding, limit))
        return cur.fetchall()

# Quick test
test_query = "connection pool exhausted"
print(f"Testing search with: '{test_query}'\n")

trgm = trigram_search(test_query, conn, 3)
print(f"Trigram found {len(trgm)} results")
if trgm:
    print(f"  Top: {trgm[0][1][:60]}...")

sem = semantic_search(test_query, conn, 3)
print(f"\nSemantic found {len(sem)} results")
if sem:
    print(f"  Top: {sem[0][1][:60]}...")

print("\n✅ Search methods implemented!")
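
To see why trigram matching tolerates typos, it helps to compute a similarity by hand. The sketch below approximates the documented behaviour of `pg_trgm`'s `similarity()` in pure Python (lowercase, alphanumeric words, two leading spaces and one trailing space per word). The helper names are hypothetical, and the real extension may differ in edge cases.

```python
import re

def trigrams(text):
    """Approximate pg_trgm: lowercase, split into alphanumeric words,
    pad each word with two leading spaces and one trailing space."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def trigram_similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (Jaccard ratio)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

print(trigram_similarity("conection", "connection"))      # 0.75
print(round(trigram_similarity("word", "two words"), 4))  # 0.3636
```

A single dropped letter changes only a few of the trigrams, which is why "conection" still scores 0.75 against "connection".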

## 🚀 Module 9: Hybrid Search with Reciprocal Rank Fusion

Combine trigram and semantic results into a single ranked list.

In [None]:
# Hybrid search implementation
print("🚀 Implementing Hybrid Search")
print("=" * 70)

def hybrid_search(
    query: str, 
    conn, 
    weights: Optional[Dict] = None, 
    limit: int = 10,
    persona_filter: Optional[str] = None,
    severity_filter: Optional[str] = None
) -> List[Tuple]:
    """Hybrid search combining trigram and semantic with deduplication"""
    
    # Auto-detect weights
    if weights is None:
        query_upper = query.upper()
        if any(term in query_upper for term in ['PG-', 'ERROR:', 'FATAL:']):
            weights = {'semantic': 0.3, 'trigram': 0.7}
        elif len(query.split()) > 5:
            weights = {'semantic': 0.7, 'trigram': 0.3}
        else:
            weights = {'semantic': 0.5, 'trigram': 0.5}
    
    # Build filters
    filter_conditions = []
    filter_params = []
    
    if persona_filter:
        filter_conditions.append("persona = %s")
        filter_params.append(persona_filter)
    
    if severity_filter:
        filter_conditions.append("severity = %s")
        filter_params.append(severity_filter)
    
    where_clause = " AND " + " AND ".join(filter_conditions) if filter_conditions else ""
    
    # Get results from both methods
    fetch_limit = limit * 3
    
    with conn.cursor() as cur:
        # Trigram search
        trigram_query = f"""
            SELECT doc_id, content, persona, severity,
                   GREATEST(
                       similarity(%s, content),
                       word_similarity(%s, content),
                       strict_word_similarity(%s, content)
                   ) as score
            FROM incident_logs
            WHERE (similarity(%s, content) > 0.1
                   OR word_similarity(%s, content) > 0.2
                   OR %s <%% content) {where_clause}
            ORDER BY score DESC
            LIMIT %s;
        """
        cur.execute(trigram_query,
                   [query, query, query, query, query, query] + filter_params + [fetch_limit])
        trgm_results = cur.fetchall()
        
        # Semantic search
        query_embedding = generate_embedding(query, 'search_query')
        semantic_query = f"""
            SELECT doc_id, content, persona, severity,
                   1 - (content_embedding <=> %s::vector) as score
            FROM incident_logs
            WHERE content_embedding IS NOT NULL {where_clause}
            ORDER BY content_embedding <=> %s::vector
            LIMIT %s;
        """
        cur.execute(semantic_query, 
                   [query_embedding] + filter_params + [query_embedding, fetch_limit])
        sem_results = cur.fetchall()
    
    # Combine results: blend raw scores with reciprocal ranks (an RRF-inspired variant)
    all_results = {}
    
    # Process trigram
    for i, (doc_id, content, persona, severity, score) in enumerate(trgm_results):
        all_results[doc_id] = {
            'content': content, 'persona': persona, 'severity': severity,
            'trigram_score': score, 'trigram_rank': 1 / (i + 1),
            'semantic_score': 0, 'semantic_rank': 0,
            'found_by': ['trigram']
        }
    
    # Process semantic
    for i, (doc_id, content, persona, severity, score) in enumerate(sem_results):
        if doc_id not in all_results:
            all_results[doc_id] = {
                'content': content, 'persona': persona, 'severity': severity,
                'trigram_score': 0, 'trigram_rank': 0,
                'semantic_score': 0, 'semantic_rank': 0,
                'found_by': []
            }
        all_results[doc_id]['semantic_score'] = score
        all_results[doc_id]['semantic_rank'] = 1 / (i + 1)
        if 'semantic' not in all_results[doc_id]['found_by']:
            all_results[doc_id]['found_by'].append('semantic')
    
    # Calculate hybrid scores
    for doc_id, data in all_results.items():
        score_component = (
            weights['trigram'] * data['trigram_score'] + 
            weights['semantic'] * data['semantic_score']
        )
        rank_component = (
            weights['trigram'] * data['trigram_rank'] + 
            weights['semantic'] * data['semantic_rank']
        )
        consensus_boost = 1.1 if len(data['found_by']) > 1 else 1.0
        data['hybrid_score'] = (0.7 * score_component + 0.3 * rank_component) * consensus_boost
    
    # Sort and deduplicate
    sorted_results = sorted(
        all_results.items(),
        key=lambda x: x[1]['hybrid_score'],
        reverse=True
    )
    
    # Deduplicate similar content
    deduplicated_results = []
    seen_hashes = set()
    
    for doc_id, data in sorted_results:
        normalized_content = ' '.join(data['content'][:200].lower().split())
        content_hash = hashlib.md5(normalized_content.encode()).hexdigest()
        
        if content_hash not in seen_hashes:
            deduplicated_results.append((doc_id, data))
            seen_hashes.add(content_hash)
            if len(deduplicated_results) >= limit:
                break
    
    return deduplicated_results

# Test hybrid search
test_query = "connection pool exhausted timeout"
results = hybrid_search(test_query, conn, limit=3)

print(f"Hybrid search for: '{test_query}'\n")
print(f"Found {len(results)} results:\n")

for i, (doc_id, data) in enumerate(results, 1):
    emoji = {'critical': '🔴', 'warning': '🟡', 'info': '🔵'}.get(data['severity'], '⚪')
    print(f"{i}. {emoji} [{data['persona']}] Score: {data['hybrid_score']:.3f}")
    print(f"   Found by: {', '.join(data['found_by'])}")
    print(f"   {data['content'][:80]}...\n")

print("✅ Hybrid search implemented!")
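
The fusion step in `hybrid_search` blends raw similarity scores with `1 / rank` terms and a consensus boost. For comparison, here is a minimal sketch of the commonly described form of reciprocal rank fusion, which uses ranks only. The function name, the constant `k = 60`, and the document IDs are assumptions made for this example, not values taken from the workshop code.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Sum 1 / (k + rank) for each document across all ranked lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical result lists, best match first
trigram_ranking = ["doc_a", "doc_b", "doc_c"]
semantic_ranking = ["doc_b", "doc_d", "doc_a"]

for doc_id, score in reciprocal_rank_fusion([trigram_ranking, semantic_ranking]):
    print(f"{doc_id}: {score:.5f}")
```

Because only ranks are used, this variant does not depend on trigram and cosine scores being on comparable scales, whereas the blended formula above also rewards high raw scores.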

## 🎯 Module 10: ML Reranking with Cohere

Add Cohere Rerank v3.5 for optimal relevance scoring.

In [None]:
# ML Reranking
print("🎯 Implementing ML Reranking")
print("=" * 70)

def rerank_with_cohere(query: str, search_results: List[Tuple], limit: int = 5) -> List[Tuple]:
    """Rerank results using Cohere Rerank v3.5"""
    
    if not search_results:
        return []
    
    try:
        # Prepare documents
        documents = []
        for doc_id, result in search_results:
            doc_text = (
                f"[SEVERITY: {result['severity'].upper()}] "
                f"[TEAM: {result['persona'].upper()}] "
                f"{result['content']}"
            )
            documents.append({
                "type": "INLINE",
                "inlineDocumentSource": {
                    "type": "TEXT",
                    "textDocument": {"text": doc_text}
                }
            })
        
        # Call Bedrock Rerank
        model_arn = "arn:aws:bedrock:us-west-2::foundation-model/cohere.rerank-v3-5:0"
        
        response = bedrock_agent_runtime.rerank(
            queries=[{"type": "TEXT", "textQuery": {"text": query}}],
            sources=documents,
            rerankingConfiguration={
                "type": "BEDROCK_RERANKING_MODEL",
                "bedrockRerankingConfiguration": {
                    "numberOfResults": min(limit, len(search_results)),
                    "modelConfiguration": {"modelArn": model_arn}
                }
            }
        )
        
        # Process reranked results
        reranked = []
        for result in response['results']:
            idx = result['index']
            doc_id, original_result = search_results[idx]
            relevance_score = result['relevanceScore']
            reranked.append((doc_id, original_result, relevance_score))
        
        return reranked
        
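    except Exception as e:
        # Fall back to hybrid scores if the Rerank call fails
        print(f"⚠️ Rerank unavailable ({type(e).__name__}); falling back to hybrid scores")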
        fallback = []
        for i, (doc_id, result) in enumerate(search_results[:limit]):
            score = result.get('hybrid_score', 0.5 - i * 0.05)
            fallback.append((doc_id, result, score))
        return fallback

def search_pipeline(
    query: str, 
    conn, 
    limit: int = 5,
    persona_filter: Optional[str] = None,
    severity_filter: Optional[str] = None
) -> List[Tuple]:
    """Complete search pipeline with ML reranking"""
    
    # Step 1: Hybrid search
    hybrid_results = hybrid_search(
        query, conn, 
        limit=limit * 2,
        persona_filter=persona_filter,
        severity_filter=severity_filter
    )
    
    # Step 2: ML Reranking
    reranked = rerank_with_cohere(query, hybrid_results, limit)
    
    return reranked

# Test complete pipeline
test_query = "FATAL connection database error"
print(f"Testing complete pipeline: '{test_query}'\n")

results = search_pipeline(test_query, conn, limit=3, severity_filter='critical')

print(f"Final Results (ML-Reranked):\n")
for i, (doc_id, data, relevance) in enumerate(results, 1):
    emoji = {'critical': '🔴', 'warning': '🟡', 'info': '🔵'}.get(data['severity'], '⚪')
    print(f"{i}. {emoji} [{data['persona']}] Relevance: {relevance:.3f}")
    print(f"   {data['content'][:100]}...\n")

print("✅ Complete pipeline with ML reranking ready!")
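
The reranker returns entries that point back into the submitted document list by `index`, so the candidates must be kept in the same order they were sent. The sketch below replays that mapping offline; `apply_rerank`, the candidates, and the scores in `rerank_results` are hypothetical and only mimic the `index` and `relevanceScore` fields read above.

```python
def apply_rerank(candidates, rerank_results):
    """Attach relevance scores to candidates using the 'index' field of each result."""
    reranked = []
    for item in rerank_results:
        doc_id, data = candidates[item["index"]]
        reranked.append((doc_id, data, item["relevanceScore"]))
    return reranked

# Hypothetical candidates and response entries, for illustration only
candidates = [
    ("doc-1", {"content": "HikariPool-1 - Connection timeout after 30000ms"}),
    ("doc-2", {"content": "FATAL: remaining connection slots are reserved"}),
    ("doc-3", {"content": "CloudWatch Alarm: DatabaseConnections > 990"}),
]
rerank_results = [
    {"index": 1, "relevanceScore": 0.92},
    {"index": 0, "relevanceScore": 0.71},
]
for doc_id, data, score in apply_rerank(candidates, rerank_results):
    print(f"{score:.2f} {doc_id} {data['content'][:40]}")
```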

## 🎮 Interactive Search with Temporal Pattern Analysis

Try different queries with time-based filtering to discover peak event patterns, team-specific insights, and severity correlations.
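
Time-based filtering comes down to adding `timestamp >= %s` and `timestamp <= %s` conditions with bound parameters. A minimal sketch of building such a filter with timezone-aware datetimes follows; `build_filters` is a hypothetical helper and the date window is an example, not a value from the dataset.

```python
from datetime import datetime, timedelta, timezone

def build_filters(persona=None, severity=None, start=None, end=None):
    """Return (sql_fragment, params) for the optional metadata and time filters."""
    conditions, params = [], []
    # Column names are fixed strings; only the values are bound as parameters
    for column, value in (("persona", persona), ("severity", severity)):
        if value:
            conditions.append(f"{column} = %s")
            params.append(value)
    if start:
        conditions.append("timestamp >= %s")
        params.append(start)
    if end:
        conditions.append("timestamp <= %s")
        params.append(end)
    fragment = " AND " + " AND ".join(conditions) if conditions else ""
    return fragment, params

# Hypothetical window: the week ending on 29 November 2024 (UTC)
end = datetime(2024, 11, 29, 23, 59, 59, tzinfo=timezone.utc)
start = end - timedelta(days=7)
fragment, params = build_filters(severity="critical", start=start, end=end)
print(fragment)
print(params)
```

Passing timezone-aware `datetime` objects as parameters leaves the conversion to the database driver, so no timestamp strings need to be assembled by hand.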

In [None]:
# Interactive Search Widget with Temporal Filter and Extended Score Interpretation
print("🎮 Interactive Hybrid Search Tester - Enhanced Version")
print("=" * 70)

import time
import hashlib
from typing import List, Dict, Tuple, Optional
from datetime import datetime, timedelta
import ipywidgets as widgets
from IPython.display import display, HTML, clear_output
import uuid

# Clear any previous widgets
clear_output(wait=True)

print("🎮 Interactive Hybrid Search Tester (with Temporal Filtering)")
print("=" * 70)

# Verify dependencies
if 'conn' not in globals():
    print("❌ Database connection not found. Please run Module 3 first!")
    raise RuntimeError("Database connection required")

# SINGLETON PATTERN
if '_widget_state' not in globals():
    _widget_state = {'initialized': False, 'button': None, 'output': None}

# Clean up existing widget state
if _widget_state['initialized']:
    if _widget_state['button'] is not None:
        _widget_state['button'].close()
    if _widget_state['output'] is not None:
        _widget_state['output'].close()
    _widget_state = {'initialized': False, 'button': None, 'output': None}

# Enhanced search pipeline with temporal filtering
def widget_search_pipeline_enhanced(
    query: str, 
    conn, 
    limit: int = 5,
    persona_filter: Optional[str] = None,
    severity_filter: Optional[str] = None,
    weights: Optional[Dict] = None,
    start_date: Optional[datetime] = None,
    end_date: Optional[datetime] = None
) -> List[Tuple]:
    """Enhanced search pipeline with temporal filtering"""
    
    # Build filters including temporal
    filter_conditions = []
    filter_params = []
    
    if persona_filter:
        filter_conditions.append("persona = %s")
        filter_params.append(persona_filter)
    
    if severity_filter:
        filter_conditions.append("severity = %s")
        filter_params.append(severity_filter)
    
    if start_date:
        filter_conditions.append("timestamp >= %s")
        filter_params.append(start_date.isoformat() + 'Z')
    
    if end_date:
        filter_conditions.append("timestamp <= %s")
        filter_params.append(end_date.isoformat() + 'Z')
    
    where_clause = " AND " + " AND ".join(filter_conditions) if filter_conditions else ""
    
    # Get results with temporal filtering
    fetch_limit = limit * 3
    
    with conn.cursor() as cur:
        # Trigram search with temporal filter
        trigram_query = f"""
            SELECT doc_id, content, persona, severity, timestamp,
                   GREATEST(
                       similarity(%s, content),
                       word_similarity(%s, content),
                       strict_word_similarity(%s, content)
                   ) as score
            FROM incident_logs
            WHERE (similarity(%s, content) > 0.1
                   OR word_similarity(%s, content) > 0.2
                   OR %s <%% content) {where_clause}
            ORDER BY score DESC
            LIMIT %s;
        """
        cur.execute(trigram_query,
                   [query, query, query, query, query, query] + filter_params + [fetch_limit])
        trgm_results = cur.fetchall()
        
        # Semantic search with temporal filter
        query_embedding = generate_embedding(query, 'search_query')
        semantic_query = f"""
            SELECT doc_id, content, persona, severity, timestamp,
                   1 - (content_embedding <=> %s::vector) as score
            FROM incident_logs
            WHERE content_embedding IS NOT NULL {where_clause}
            ORDER BY content_embedding <=> %s::vector
            LIMIT %s;
        """
        cur.execute(semantic_query, 
                   [query_embedding] + filter_params + [query_embedding, fetch_limit])
        sem_results = cur.fetchall()
    
    # Process results
    all_results = {}
    
    # Process trigram
    for i, row in enumerate(trgm_results):
        doc_id, content, persona, severity, timestamp, score = row
        all_results[doc_id] = {
            'content': content, 'persona': persona, 'severity': severity,
            'timestamp': timestamp,
            'trigram_score': score, 'trigram_rank': 1 / (i + 1),
            'semantic_score': 0, 'semantic_rank': 0,
            'found_by': ['trigram']
        }
    
    # Process semantic
    for i, row in enumerate(sem_results):
        doc_id, content, persona, severity, timestamp, score = row
        if doc_id not in all_results:
            all_results[doc_id] = {
                'content': content, 'persona': persona, 'severity': severity,
                'timestamp': timestamp,
                'trigram_score': 0, 'trigram_rank': 0,
                'semantic_score': 0, 'semantic_rank': 0,
                'found_by': []
            }
        all_results[doc_id]['semantic_score'] = score
        all_results[doc_id]['semantic_rank'] = 1 / (i + 1)
        if 'semantic' not in all_results[doc_id]['found_by']:
            all_results[doc_id]['found_by'].append('semantic')
    
    # Calculate hybrid scores
    if weights is None:
        weights = {'semantic': 0.5, 'trigram': 0.5}
    
    for doc_id, data in all_results.items():
        score_component = (
            weights['trigram'] * data['trigram_score'] + 
            weights['semantic'] * data['semantic_score']
        )
        rank_component = (
            weights['trigram'] * data['trigram_rank'] + 
            weights['semantic'] * data['semantic_rank']
        )
        consensus_boost = 1.1 if len(data['found_by']) > 1 else 1.0
        data['hybrid_score'] = (0.7 * score_component + 0.3 * rank_component) * consensus_boost
    
    # Sort and deduplicate
    sorted_results = sorted(
        all_results.items(),
        key=lambda x: x[1]['hybrid_score'],
        reverse=True
    )
    
    # Deduplicate
    deduplicated_results = []
    seen_hashes = set()
    
    for doc_id, data in sorted_results:
        normalized_content = ' '.join(data['content'][:200].lower().split())
        content_hash = hashlib.md5(normalized_content.encode()).hexdigest()
        
        if content_hash not in seen_hashes:
            deduplicated_results.append((doc_id, data))
            seen_hashes.add(content_hash)
            if len(deduplicated_results) >= limit:
                break
    
    # Try ML reranking if available
    if 'rerank_with_cohere' in globals() and deduplicated_results:
        try:
            reranked = rerank_with_cohere(query, deduplicated_results, limit)
            return reranked
        except Exception:
            # Reranking failed; fall through to the hybrid scores below
            pass
    
    # Return with scores
    reranked = []
    for i, (doc_id, result) in enumerate(deduplicated_results[:limit]):
        score = result.get('hybrid_score', 0.5)
        reranked.append((doc_id, result, score))
    
    return reranked

# Enhanced explanation
explanation = """
<div style="background-color: #f0f8ff; padding: 15px; border-radius: 5px; margin-bottom: 15px;">
<h3>🎯 Why This Matters for Black Friday</h3>
<p><b>The Challenge:</b> During peak events, different teams describe the same problem differently:</p>
<ul>
<li>🔧 DBAs say: "FATAL: connection slots reserved"</li>
<li>💻 Developers say: "HikariPool timeout 30000ms"</li>
<li>📊 SREs say: "CloudWatch DatabaseConnections > 990"</li>
</ul>
<p><b>The Solution:</b> Hybrid search with temporal filtering helps you:</p>
<ul>
<li>✅ Find similar incidents from specific time periods (e.g., last Black Friday)</li>
<li>✅ Identify seasonal patterns and recurring issues</li>
<li>✅ Handle typos and misspellings in urgent situations</li>
<li>✅ Discover patterns across team boundaries</li>
<li>✅ Build time-aware playbooks for peak events</li>
</ul>
</div>
"""

# Detailed scenario instructions
scenarios = """
<div style="background-color: #f5f5f5; padding: 15px; border-radius: 5px; margin-bottom: 15px;">
    <h4>💡 Detailed Search Scenarios & Instructions</h4>
    
    <div style="background-color: white; padding: 10px; margin: 10px 0; border-left: 4px solid #2196F3;">
        <h5>1. Typo-Tolerant Search (Trigram Power)</h5>
        <p><b>Query:</b> "conection pol exausted" (intentional typos)</p>
        <p><b>Settings:</b></p>
        <ul>
            <li>Team: All</li>
            <li>Severity: Critical</li>
            <li>Time Range: All time</li>
            <li>Weights: Trigram 70%, Semantic 30%</li>
        </ul>
        <p><b>Expected:</b> Trigram will find "connection pool exhausted" despite typos</p>
    </div>
    
    <div style="background-color: white; padding: 10px; margin: 10px 0; border-left: 4px solid #4CAF50;">
        <h5>2. Error Code Search (Exact Match)</h5>
        <p><b>Query:</b> "PG-53300"</p>
        <p><b>Settings:</b></p>
        <ul>
            <li>Team: DBA (DBAs typically log error codes)</li>
            <li>Severity: Critical</li>
            <li>Time Range: Last 30 days</li>
            <li>Weights: Trigram 80%, Semantic 20%</li>
        </ul>
        <p><b>Expected:</b> Exact error code matches with high precision</p>
    </div>
    
    <div style="background-color: white; padding: 10px; margin: 10px 0; border-left: 4px solid #FF9800;">
        <h5>3. Conceptual Pattern Search (Semantic Focus)</h5>
        <p><b>Query:</b> "resource exhaustion memory pressure"</p>
        <p><b>Settings:</b></p>
        <ul>
            <li>Team: All (different teams describe differently)</li>
            <li>Severity: All</li>
            <li>Time Range: November (Black Friday period)</li>
            <li>Weights: Semantic 70%, Trigram 30%</li>
        </ul>
        <p><b>Expected:</b> Finds related concepts like OOM, swap usage, buffer cache</p>
    </div>
    
    <div style="background-color: white; padding: 10px; margin: 10px 0; border-left: 4px solid #9C27B0;">
        <h5>4. Historical Pattern Analysis (Temporal Focus)</h5>
        <p><b>Query:</b> "performance degradation slow queries"</p>
        <p><b>Settings:</b></p>
        <ul>
            <li>Team: All</li>
            <li>Severity: Critical + Warning</li>
            <li><b>Time Range: Nov 24-30 (Black Friday week)</b></li>
            <li>Weights: Balanced 50/50</li>
        </ul>
        <p><b>Expected:</b> Reveals patterns specific to high-traffic periods</p>
    </div>
    
    <div style="background-color: white; padding: 10px; margin: 10px 0; border-left: 4px solid #F44336;">
        <h5>5. Cross-Team Correlation</h5>
        <p><b>Query:</b> "connection timeout"</p>
        <p><b>Settings:</b></p>
        <ul>
            <li><b>Team: Switch between DBA, Developer, SRE</b></li>
            <li>Severity: All</li>
            <li>Time Range: Last 7 days</li>
            <li>Weights: Balanced 50/50</li>
        </ul>
        <p><b>Expected:</b> See how different teams describe the same issue</p>
    </div>
    
    <div style="background-color: #fff3cd; padding: 10px; margin-top: 15px; border-radius: 5px;">
        <h5>🎯 Pro Tips for Effective Searching:</h5>
        <ul>
            <li><b>Start broad, then narrow:</b> Begin with "All" filters, then refine based on initial results</li>
            <li><b>Use time ranges strategically:</b> Compare peak vs. normal periods to identify patterns</li>
            <li><b>Adjust weights based on query type:</b> Error codes → high trigram, concepts → high semantic</li>
            <li><b>Combine filters:</b> Team + Severity + Time gives the most targeted results</li>
            <li><b>Look for consensus:</b> Results found by BOTH methods (trigram + semantic) are highest confidence</li>
        </ul>
    </div>
</div>
"""

display(widgets.HTML(explanation))
display(widgets.HTML(scenarios))

# Create widgets
query_input = widgets.Text(
    value='connection pool exhausted',
    placeholder='Enter search query',
    description='Query:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='500px')
)

sample_queries = widgets.Dropdown(
    options=[
        'connection pool exhausted',
        'conection pol exausted',  # with typos
        'PG-53300 FATAL',
        'resource exhaustion OOM memory',
        'deadlock detected',
        'performance degradation slow queries',
        'buffer cache hit ratio',
        'autovacuum dead tuples',
        'replication lag',
        'disk space critical'
    ],
    value='connection pool exhausted',
    description='Examples:',
    style={'description_width': 'initial'}
)

persona_filter = widgets.Dropdown(
    options=['All'] + ['dba', 'developer', 'sre', 'data_engineer'],
    value='All',
    description='Team:',
    style={'description_width': 'initial'}
)

severity_filter = widgets.Dropdown(
    options=['All', 'critical', 'warning', 'info'],
    value='All',
    description='Severity:',
    style={'description_width': 'initial'}
)

# Temporal filter options
temporal_presets = widgets.Dropdown(
    options=[
        ('All Time', 'all'),
        ('Last 7 Days', '7d'),
        ('Last 30 Days', '30d'),
        ('November (Black Friday)', 'november'),
        ('Black Friday Week', 'bf_week'),
        ('December (Holiday)', 'december'),
        ('Q4 (Peak Season)', 'q4'),
        ('Custom Range', 'custom')
    ],
    value='all',
    description='Time Range:',
    style={'description_width': 'initial'}
)

# Custom date pickers (initially hidden)
start_date = widgets.DatePicker(
    description='Start:',
    value=datetime(2024, 1, 1).date(),  # DatePicker works with date objects
    style={'description_width': 'initial'}
)

end_date = widgets.DatePicker(
    description='End:',
    value=datetime(2024, 12, 31).date(),
    style={'description_width': 'initial'}
)

custom_dates_box = widgets.HBox([start_date, end_date])
custom_dates_box.layout.display = 'none'

semantic_weight = widgets.FloatSlider(
    value=0.5, min=0, max=1, step=0.1,
    description='Semantic:',
    style={'description_width': 'initial'},
    readout_format='.0%'
)

trigram_weight = widgets.FloatSlider(
    value=0.5, min=0, max=1, step=0.1,
    description='Trigram:',
    style={'description_width': 'initial'},
    readout_format='.0%'
)

result_limit = widgets.IntSlider(
    value=5, min=1, max=10,
    description='Results:',
    style={'description_width': 'initial'}
)

search_button = widgets.Button(
    description='🔍 Search',
    button_style='primary',
    layout=widgets.Layout(width='100px')
)

output = widgets.Output(layout={'border': '1px solid #ddd', 'padding': '10px', 'border-radius': '5px'})

# Store in global state
_widget_state['button'] = search_button
_widget_state['output'] = output
_widget_state['initialized'] = True

# Event handlers
def on_sample_change(change):
    query_input.value = change['new']

def on_temporal_change(change):
    if change['new'] == 'custom':
        custom_dates_box.layout.display = 'flex'
    else:
        custom_dates_box.layout.display = 'none'

sample_queries.observe(on_sample_change, names='value')
temporal_presets.observe(on_temporal_change, names='value')

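def on_semantic_change(change):
    # Round to the slider step so float error (e.g. 1 - 0.7) cannot bounce
    # between the two observers
    trigram_weight.value = round(1 - change['new'], 1)

def on_trigram_change(change):
    semantic_weight.value = round(1 - change['new'], 1)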

semantic_weight.observe(on_semantic_change, names='value')
trigram_weight.observe(on_trigram_change, names='value')

_search_lock = False

def run_search(b):
    """Run the hybrid search with the current widget settings and render the results into `output`."""
    global _search_lock
    
    if _search_lock:
        return
    _search_lock = True
    
    output.clear_output(wait=True)
    
    try:
        query = query_input.value
        if not query:
            with output:
                display(widgets.HTML("<p style='color: red;'>⚠️ Please enter a search query</p>"))
            return
        
        # Prepare parameters
        persona = None if persona_filter.value == 'All' else persona_filter.value
        severity = None if severity_filter.value == 'All' else severity_filter.value
        weights = {
            'semantic': semantic_weight.value,
            'trigram': trigram_weight.value
        }
        
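        # Handle temporal filtering. Preset end dates run to the end of the day so
        # the final day is included, consistent with the custom range below.
        start_dt = None
        end_dt = None
        end_of_day = datetime.max.time()
        black_friday = datetime(2024, 11, 29)  # Black Friday 2024
        
        if temporal_presets.value == '7d':
            start_dt = black_friday - timedelta(days=7)
            end_dt = datetime.combine(black_friday, end_of_day)
        elif temporal_presets.value == '30d':
            start_dt = black_friday - timedelta(days=30)
            end_dt = datetime.combine(black_friday, end_of_day)
        elif temporal_presets.value == 'november':
            start_dt = datetime(2024, 11, 1)
            end_dt = datetime.combine(datetime(2024, 11, 30), end_of_day)
        elif temporal_presets.value == 'bf_week':
            start_dt = datetime(2024, 11, 24)
            end_dt = datetime.combine(datetime(2024, 11, 30), end_of_day)
        elif temporal_presets.value == 'december':
            start_dt = datetime(2024, 12, 1)
            end_dt = datetime.combine(datetime(2024, 12, 31), end_of_day)
        elif temporal_presets.value == 'q4':
            start_dt = datetime(2024, 10, 1)
            end_dt = datetime.combine(datetime(2024, 12, 31), end_of_day)
        elif temporal_presets.value == 'custom':
            # A cleared date picker returns None; report it clearly instead of a TypeError
            if start_date.value is None or end_date.value is None:
                raise ValueError("Select both a start and an end date for a custom range")
            start_dt = datetime.combine(start_date.value, datetime.min.time())
            end_dt = datetime.combine(end_date.value, end_of_day)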
        
        # Run search
        start_time = time.time()
        results = widget_search_pipeline_enhanced(
            query, 
            conn, 
            limit=result_limit.value,
            persona_filter=persona,
            severity_filter=severity,
            weights=weights,
            start_date=start_dt,
            end_date=end_dt
        )
        elapsed = (time.time() - start_time) * 1000
        
        # Build HTML
        html_parts = []
        
        # Header with temporal info
        temporal_desc = temporal_presets.label if temporal_presets.value != 'custom' else f"{start_date.value} to {end_date.value}"
        
        html_parts.append(f"""
        <div style="background-color: #e8f4f8; padding: 10px; border-radius: 5px; margin-bottom: 10px;">
            <h3>🔍 Search Results</h3>
            <table style="width: 100%;">
                <tr><td><b>Query:</b></td><td>{query}</td></tr>
                <tr><td><b>Time Range:</b></td><td>{temporal_desc}</td></tr>
                <tr><td><b>Weights:</b></td><td>Semantic: {semantic_weight.value:.0%}, Trigram: {trigram_weight.value:.0%}</td></tr>
                <tr><td><b>Filters:</b></td><td>Team: {persona_filter.value}, Severity: {severity_filter.value}</td></tr>
            </table>
        </div>
        <p style='color: green;'>✅ Found {len(results)} results in {elapsed:.1f}ms</p>
        """)
        
        if results:
            # Results table
            html_parts.append("""
            <table style="width: 100%; border-collapse: collapse; margin-top: 10px;">
            <thead>
                <tr style="background-color: #f0f0f0;">
                    <th style="padding: 8px; text-align: left; border: 1px solid #ddd;">#</th>
                    <th style="padding: 8px; text-align: left; border: 1px solid #ddd;">Severity</th>
                    <th style="padding: 8px; text-align: left; border: 1px solid #ddd;">Team</th>
                    <th style="padding: 8px; text-align: left; border: 1px solid #ddd;">Date</th>
                    <th style="padding: 8px; text-align: left; border: 1px solid #ddd;">Relevance</th>
                    <th style="padding: 8px; text-align: left; border: 1px solid #ddd;">Methods</th>
                    <th style="padding: 8px; text-align: left; border: 1px solid #ddd;">Content</th>
                </tr>
            </thead>
            <tbody>
            """)
            
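            import html  # stdlib; imported here because the setup cell does not import it
            
            for i, (doc_id, data, relevance) in enumerate(results, 1):
                emoji = {'critical': '🔴', 'warning': '🟡', 'info': '🔵'}.get(data['severity'], '⚪')
                methods = ', '.join(data['found_by'])
                preview = data['content'][:100] + '...' if len(data['content']) > 100 else data['content']
                # Escape log text so characters such as < and & render literally in the table
                content_preview = html.escape(preview)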
                row_color = '#ffffff' if i % 2 == 0 else '#f9f9f9'
                
                # Format timestamp if available
                date_str = 'N/A'
                if 'timestamp' in data and data['timestamp']:
                    try:
                        if isinstance(data['timestamp'], str):
                            dt = datetime.fromisoformat(data['timestamp'].replace('Z', '+00:00'))
                        else:
                            dt = data['timestamp']
                        date_str = dt.strftime('%Y-%m-%d')
                    except (ValueError, TypeError, AttributeError):
                        pass  # Unparseable timestamp: keep the 'N/A' placeholder
                
                html_parts.append(f"""
                <tr style="background-color: {row_color};">
                    <td style="padding: 8px; border: 1px solid #ddd;">{i}</td>
                    <td style="padding: 8px; border: 1px solid #ddd;">{emoji} {data['severity']}</td>
                    <td style="padding: 8px; border: 1px solid #ddd;">{data['persona']}</td>
                    <td style="padding: 8px; border: 1px solid #ddd; font-size: 0.9em;">{date_str}</td>
                    <td style="padding: 8px; border: 1px solid #ddd;">{relevance:.3f}</td>
                    <td style="padding: 8px; border: 1px solid #ddd; font-size: 0.9em;">{methods}</td>
                    <td style="padding: 8px; border: 1px solid #ddd; font-size: 0.9em;">{content_preview}</td>
                </tr>
                """)
            
            html_parts.append("</tbody></table>")
            
            # Enhanced score interpretation
            html_parts.append(f"""
            <div style="background-color: #e6f3ff; padding: 15px; border-radius: 5px; margin-top: 15px; border-left: 4px solid #1976d2;">
                <h4>📊 Understanding Your Relevance Scores</h4>
                
                <p><b>Score Interpretation Guide:</b></p>
                <ul style="margin-bottom: 10px;">
                    <li><b>0.8-1.0 (Exact Match):</b> Nearly identical incidents - Study these first for direct pattern matches</li>
                    <li><b>0.6-0.8 (High Relevance):</b> Same root cause described differently by teams - Critical for cross-team insights</li>
                    <li><b>0.4-0.6 (Moderate Relevance):</b> Related concepts that may reveal cascade effects or dependencies</li>
                    <li><b>0.2-0.4 (Low Relevance):</b> Peripheral matches - Often contain common keywords but different context. 
                        These are still valuable for understanding the broader incident landscape and may reveal unexpected correlations</li>
                    <li><b>&lt; 0.2 (Minimal Relevance):</b> Weak connection - May share only basic terms or be conceptually distant</li>
                </ul>
                
                <div style="background-color: #fff3e0; padding: 10px; border-radius: 5px; margin: 10px 0;">
                    <p><b>💡 Why Lower Scores Matter:</b></p>
                    <p>Scores around 0.2-0.3 usually mark incidents that are conceptually related to your query 
                    rather than direct matches (for example, "memory exhaustion" incidents surfacing for a connection pool query). They're valuable because:</p>
                    <ul>
                        <li>They show different failure modes that occurred during similar conditions</li>
                        <li>Memory issues often precede or follow connection pool problems</li>
                        <li>Understanding these correlations helps build comprehensive runbooks</li>
                    </ul>
                </div>
                
                <p><b>Methods Column Insights:</b></p>
                <ul>
                    <li><b>"trigram, semantic":</b> Found by BOTH methods = highest confidence match</li>
                    <li><b>"trigram" only:</b> Exact keyword/error code match - precise but may miss related issues</li>
                    <li><b>"semantic" only:</b> Conceptually similar - reveals different terminology for same problem</li>
                </ul>
                
                <p><b>Temporal Context:</b></p>
                <p>Your selected time range <b>({temporal_desc})</b> helps identify:</p>
                <ul>
                    <li>Seasonal patterns specific to peak events</li>
                    <li>Recurring issues that appear during high load</li>
                    <li>Evolution of problems over time</li>
                </ul>
                
                <p style="margin-top: 10px; font-weight: bold; color: #1976d2;">
                🎯 Action: Export these results grouped by score ranges to build a comprehensive incident response matrix!
                </p>
            </div>
            """)
        else:
            html_parts.append("""
            <div style="background-color: #fff0f0; padding: 10px; border-radius: 5px;">
                <p>❌ No results found. Try:</p>
                <ul>
                    <li>Expanding the time range</li>
                    <li>Adjusting filter criteria</li>
                    <li>Modifying search weights</li>
                    <li>Using broader search terms</li>
                </ul>
            </div>
            """)
        
        # Display complete HTML
        complete_html = ''.join(html_parts)
        with output:
            display(widgets.HTML(complete_html))
            
    except Exception as e:
        with output:
            display(widgets.HTML(f"""
            <div style="background-color: #fff0f0; padding: 10px; border-radius: 5px;">
                <p style='color: red;'>❌ Error: {str(e)[:200]}</p>
            </div>
            """))
    finally:
        time.sleep(0.1)
        _search_lock = False

# Attach handler
if hasattr(search_button, '_click_handlers'):
    search_button._click_handlers.callbacks.clear()
search_button.on_click(run_search)

# Layout
query_section = widgets.VBox([
    widgets.HTML("<h4>🔍 Search Query</h4>"),
    widgets.HBox([query_input, sample_queries])
])

filter_section = widgets.VBox([
    widgets.HTML("<h4>🔧 Filters & Time Range</h4>"),
    widgets.HBox([persona_filter, severity_filter]),
    widgets.HBox([temporal_presets, result_limit]),
    custom_dates_box
])

weight_section = widgets.VBox([
    widgets.HTML("<h4>⚖️ Search Method Weights</h4>"),
    widgets.HBox([semantic_weight, trigram_weight]),
    widgets.HTML("<p style='font-size: 0.9em; color: #666;'>Adjust to prioritize semantic understanding vs exact/typo matching</p>")
])

control_panel = widgets.VBox([
    query_section,
    filter_section,
    weight_section,
    widgets.HTML("<br>"),
    search_button
], layout={'padding': '10px', 'border': '1px solid #ddd', 'border-radius': '5px'})

# Display everything
display(control_panel)
display(widgets.HTML("<br>"))
display(output)

print("\n✅ Enhanced widget with temporal filtering loaded successfully!")
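
The preset handling inside `run_search` can also be written as a small pure function, which makes the date logic testable without the widget. This is a minimal sketch, not the notebook's implementation: `resolve_time_range` and `REFERENCE_DATE` are hypothetical names, and it assumes the search pipeline treats `end_date` as inclusive.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Assumption: "now" is pinned to Black Friday 2024, as in the widget code.
REFERENCE_DATE = datetime(2024, 11, 29)


def resolve_time_range(
    preset: str, reference: datetime = REFERENCE_DATE
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Map a preset key to an inclusive (start, end) datetime range."""

    def day_end(d: datetime) -> datetime:
        # Push the end bound to 23:59:59.999999 so the last day is included
        return datetime.combine(d.date(), datetime.max.time())

    fixed = {
        'november': (datetime(2024, 11, 1), datetime(2024, 11, 30)),
        'bf_week': (datetime(2024, 11, 24), datetime(2024, 11, 30)),
        'december': (datetime(2024, 12, 1), datetime(2024, 12, 31)),
        'q4': (datetime(2024, 10, 1), datetime(2024, 12, 31)),
    }
    rolling_days = {'7d': 7, '30d': 30}

    if preset in fixed:
        start, end = fixed[preset]
        return start, day_end(end)
    if preset in rolling_days:
        start = reference - timedelta(days=rolling_days[preset])
        return start, day_end(reference)
    # 'all' (or any unknown key) means no temporal filter
    return None, None
```

Keeping the presets in a dictionary means a new range, such as a Cyber Monday window, is one extra entry rather than another `elif` branch.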

## 🏁 Workshop Complete!

### ✅ What You've Built

You've successfully implemented a production-ready hybrid search system with:

- **Trigram Search** - Handles typos and exact matches
- **Semantic Search** - Understands conceptual relationships
- **Hybrid Fusion** - Combines both methods with adjustable weights
- **ML Reranking** - Uses Cohere for relevance optimization
- **Interactive Testing** - Widget-based query exploration
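
The weight sliders in the widget suggest a weighted combination of the two methods. The sketch below shows one way such a fusion could work, assuming each method returns scores already normalized to the 0-1 range. The function name `weighted_fusion` and its input shape are hypothetical, and the notebook's own `hybrid_score` calculation may differ.

```python
from typing import Dict


def weighted_fusion(
    semantic_scores: Dict[str, float],
    trigram_scores: Dict[str, float],
    semantic_weight: float = 0.5,
    trigram_weight: float = 0.5,
) -> Dict[str, dict]:
    """Combine per-document scores from two methods into one hybrid score."""
    total = semantic_weight + trigram_weight
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Normalize so the weights always sum to 1
    w_sem, w_tri = semantic_weight / total, trigram_weight / total

    fused: Dict[str, dict] = {}
    for doc_id in set(semantic_scores) | set(trigram_scores):
        found_by = []
        score = 0.0
        if doc_id in trigram_scores:
            found_by.append('trigram')
            score += w_tri * trigram_scores[doc_id]
        if doc_id in semantic_scores:
            found_by.append('semantic')
            score += w_sem * semantic_scores[doc_id]
        fused[doc_id] = {'hybrid_score': score, 'found_by': found_by}
    return fused
```

A document found by only one method gets only that method's weighted share, so results found by both methods tend to rank higher.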

### 🎯 Key Takeaways

1. **Trigram search** (pg_trgm) handles both typos AND exact matches - no need for separate FTS
2. **Semantic search** (pgvector) finds conceptually related content
3. **Reciprocal rank fusion** combines rankings from methods whose raw scores are not directly comparable
4. **ML reranking** provides the final optimization for relevance
5. **Aurora PostgreSQL** provides all the tools needed for production hybrid search
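
For reference, reciprocal rank fusion can be sketched in a few lines. Each document earns `1 / (k + rank)` from every ranked list it appears in, and the contributions are summed. The function name and the list-of-IDs input are illustrative assumptions, and `k = 60` is the commonly used default rather than a value taken from this workshop.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one scored ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top result contributes 1 / (k + 1)
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


trigram_ranking = ['doc_a', 'doc_b', 'doc_c']
semantic_ranking = ['doc_b', 'doc_a', 'doc_d']
fused = reciprocal_rank_fusion([trigram_ranking, semantic_ranking])
```

Because only ranks are used, trigram similarity and vector distance never need to be put on a common scale.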

### 📚 Next Steps

1. **Deploy to Production**
   - Use Aurora read replicas for search workloads
   - Implement caching for frequent queries
   - Monitor query performance with Performance Insights

2. **Optimize for Your Use Case**
   - Fine-tune weights based on user feedback
   - Add more metadata filters as needed
   - Consider implementing query expansion

3. **Scale for Black Friday**
   - Test with production-scale data
   - Implement connection pooling
   - Set up alerting for search latency
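
As a starting point for caching frequent queries, a small in-process cache with a time-to-live might look like the sketch below. `TTLCache` is a hypothetical helper, not part of the workshop code. It is not thread-safe, and a shared cache would be needed once several application instances serve searches.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache computed values for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # entry is still fresh
        value = compute()  # missing or expired: recompute and store
        self._store[key] = (now, value)
        return value
```

A cache key would normally include everything that changes the result, for example a tuple of the normalized query text, team, severity, weights, and date range.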

### 🙏 Thank You!

Thank you for participating in **DAT409**! You're now ready to implement hybrid search for MCP retrieval and help prevent Black Friday incidents.

---

**Built with ❤️ by AWS Database Specialists for re:Invent 2025**