<div style="background: linear-gradient(135deg, #66bb6a 0%, #43a047 100%); 
            color: white; 
            padding: 20px 25px; 
            border-radius: 10px; 
            border-left: 6px solid #ffd700;
            margin: 25px 0;
            box-shadow: 0 4px 6px rgba(0,0,0,0.1);">
<h2 style="margin: 0 0 12px 0; color: white; font-size: 22px; font-weight: 600;">✅ DAT409 Solutions - Complete Hybrid Search Implementation</h2>
<p style="margin: 0 0 10px 0; font-size: 16px; opacity: 0.95;">
<strong>Reference Implementation:</strong> Production-ready hybrid search with fuzzy, semantic, and RRF fusion
</p>
<p style="margin: 0 0 8px 0; font-size: 16px;">
⏱️ <strong>Runtime:</strong> ~3 minutes (Run All) | <strong>Level:</strong> 400 (Expert)
</p>
<p style="margin: 0; font-size: 16px; opacity: 0.9;">
🚀 <strong>Quick Start:</strong> Select Kernel → Python 3.13.3 → Run All
</p>
</div>

---

### 📋 Implementations Included

| Method | Extension | Key Pattern |
|--------|-----------|-------------|
| Keyword Search | `tsvector` | `ts_rank()` + GIN index |
| Fuzzy Search | `pg_trgm` | `similarity()` + `%` operator (escaped as `%%` in psycopg) |
| Semantic Search | `pgvector` | `<=>` cosine distance + HNSW |
| Hybrid RRF | SQL CTEs | `ROW_NUMBER()` + rank fusion |
| Cohere Rerank | Bedrock | Cross-encoder refinement |

---

## ⚙️ Step 0: Verify Python Kernel

In [None]:
import sys
version = sys.version.split()[0]
print(f"🐍 Python version: {version}")
if version.startswith('3.13'):
    print("✅ Correct! You're using Python 3.13")
else:
    print(f"⚠️  WARNING: Expected Python 3.13, but found {version}")
    print("   Please change the kernel: Click top-right → Select Kernel → Python 3.13.3")

## 📦 Step 1: Environment & Database Setup

In [None]:
# ============================================================
# ENVIRONMENT & DATABASE SETUP
# ============================================================

import sys
import os
import warnings
warnings.filterwarnings('ignore')

import boto3
import json
import psycopg
from pgvector.psycopg import register_vector
import pandas as pd
import numpy as np
from pathlib import Path
from typing import Optional
from dotenv import load_dotenv
from IPython.display import display, HTML, clear_output
import ipywidgets as widgets

# Load environment variables
env_path = Path('/workshop/notebooks/.env')
if env_path.exists():
    load_dotenv(env_path, override=True)
    print("✅ Environment loaded")
else:
    print("⚠️  .env file not found - check bootstrap")

# Database configuration
dbhost = os.getenv('DB_HOST')
dbport = os.getenv('DB_PORT', '5432')
dbuser = os.getenv('DB_USER')
dbpass = os.getenv('DB_PASSWORD')
dbname = os.getenv('DB_NAME', 'workshop_db')
aws_region = os.getenv('AWS_REGION', 'us-west-2')

if not all([dbhost, dbuser, dbpass]):
    print("❌ Missing credentials - check .env file")
    sys.exit(1)

print(f"\n📍 Configuration:")
print(f"   Database: {dbuser}@{dbhost}:{dbport}/{dbname}")
print(f"   AWS Region: {aws_region}")

# Initialize Bedrock client
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name=aws_region
)

# Test database connection
try:
    with psycopg.connect(
        host=dbhost, port=dbport, user=dbuser,
        password=dbpass, dbname=dbname, autocommit=True
    ) as conn:
        conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
        register_vector(conn)
        
        pg_version = conn.execute("SELECT version()").fetchone()[0].split(',')[0]
        pgvector_version = conn.execute(
            "SELECT extversion FROM pg_extension WHERE extname = 'vector'"
        ).fetchone()
        
        print(f"   PostgreSQL: {pg_version}")
        print(f"   pgvector: v{pgvector_version[0]}" if pgvector_version else "   ⚠️  pgvector not installed")
        
        result = conn.execute("""
            SELECT COUNT(*) as count, COUNT(embedding) as with_embeddings 
            FROM bedrock_integration.product_catalog
        """).fetchone()
        
        if result and result[0] > 0:
            print(f"   Products: {result[0]:,} ({result[1]:,} with embeddings)")
        else:
            print("   ⚠️  No data found")
            
except Exception as e:
    print(f"❌ Database connection failed: {e}")
    sys.exit(1)

# Embedding generation function
def generate_embedding(text: str, input_type: str = "search_query") -> Optional[list]:
    """Generate embeddings using Cohere Embed v3 via Bedrock."""
    if not text:
        return None
    try:
        response = bedrock_runtime.invoke_model(
            modelId='cohere.embed-english-v3',
            body=json.dumps({
                "texts": [text],
                "input_type": input_type,
                "embedding_types": ["float"]
            })
        )
        result = json.loads(response['body'].read())
        return result['embeddings']['float'][0]
    except Exception as e:
        print(f"❌ Embedding generation failed: {e}")
        return None

# Verify Bedrock
test_embedding = generate_embedding("test", "search_query")
if test_embedding:
    print(f"\n🤖 Bedrock: Cohere Embed v3 ✅")
else:
    print("\n⚠️  Bedrock embedding test failed")

print("\n✅ Setup complete!")

## 📊 Step 2: Data Overview

In [None]:
with psycopg.connect(
    host=dbhost, port=dbport, user=dbuser,
    password=dbpass, dbname=dbname, autocommit=True
) as conn:
    stats = conn.execute("""
        SELECT 
            COUNT(*) as total,
            COUNT(embedding) as with_embeddings,
            COUNT(DISTINCT category_name) as categories
        FROM bedrock_integration.product_catalog;
    """).fetchone()
    
    print(f"📊 Products: {stats[0]:,} | Embeddings: {stats[1]:,} | Categories: {stats[2]}")
    
    indexes = conn.execute("""
        SELECT indexname FROM pg_indexes
        WHERE schemaname = 'bedrock_integration' AND tablename = 'product_catalog';
    """).fetchall()
    
    print(f"🔍 Indexes: {len(indexes)} configured")
    print("\n✅ Database ready!")

---

## 🔍 Step 3: Search Method Implementations

All implementations are production-ready with proper error handling.

### 1. Keyword Search (Baseline)

In [None]:
# ============================================================
# 1. KEYWORD SEARCH - FULL-TEXT (BASELINE)
# ============================================================

def keyword_search(query: str, limit: int = 10) -> list[dict]:
    """Full-text search using PostgreSQL tsvector/tsquery."""
    with psycopg.connect(
        host=dbhost, port=dbport, user=dbuser,
        password=dbpass, dbname=dbname, autocommit=True
    ) as conn:
        results = conn.execute("""
            SELECT 
                "productId",
                product_description,
                category_name,
                price,
                stars,
                reviews,
                imgurl,
                ts_rank(to_tsvector('english', product_description), query) AS score
            FROM bedrock_integration.product_catalog, to_tsquery('english', %(query)s) query
            WHERE to_tsvector('english', product_description) @@ query
            ORDER BY score DESC
            LIMIT %(limit)s;
        """, {'query': ' & '.join(query.split()), 'limit': limit}).fetchall()
        
        return [{
            'productId': r[0],
            'description': r[1][:200] + '...',
            'category': r[2],
            'price': float(r[3]) if r[3] else 0,
            'stars': float(r[4]) if r[4] else 0,
            'reviews': int(r[5]) if r[5] else 0,
            'imgUrl': r[6],
            'score': float(r[7]) if r[7] else 0,
            'method': 'Keyword'
        } for r in results]

print("✅ Keyword search ready")

### 2. Fuzzy Search (Solution)

**Key Pattern**: `similarity()` + `%%` operator with GIN index

```sql
similarity(lower(product_description), lower(%(query)s)) AS sim
WHERE lower(product_description) %% lower(%(query)s)
```

**Why This Works**:
- `similarity()` computes trigram overlap ratio (0.0-1.0)
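- `%%` is pg_trgm's `%` operator, doubled because psycopg reserves a bare `%` for placeholders; it pre-filters rows against `pg_trgm.similarity_threshold` (default 0.3) using the GIN index before the exact similarity is computed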
- `lower()` ensures case-insensitive matching
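
To make the trigram overlap ratio concrete, here is a minimal pure-Python sketch that approximates what `similarity()` computes. The helper names `trigrams` and `trigram_similarity` are hypothetical, and the padding rule (two leading spaces and one trailing space per word) is taken from the pg_trgm documentation; the real extension may differ in edge cases such as non-ASCII text.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm: lowercase, split into words, pad, take 3-grams."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# "wireles" and "wireless" share 7 of 10 distinct trigrams
print(trigram_similarity("wireles", "wireless"))  # 0.7
```

A single dropped letter still leaves most trigrams intact, which is why the misspelled query clears the default threshold.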

In [None]:
# ============================================================
# 2. FUZZY SEARCH - TYPO TOLERANCE (✅ COMPLETE)
# ============================================================

def fuzzy_search(query: str, limit: int = 10) -> list[dict]:
    """Fuzzy search using pg_trgm for typo tolerance."""
    with psycopg.connect(
        host=dbhost, port=dbport, user=dbuser,
        password=dbpass, dbname=dbname, autocommit=True
    ) as conn:
        results = conn.execute("""
            SELECT 
                "productId",
                product_description,
                category_name,
                price,
                stars,
                reviews,
                imgurl,
                similarity(lower(product_description), lower(%(query)s)) AS sim
            FROM bedrock_integration.product_catalog
            WHERE lower(product_description) %% lower(%(query)s)
            ORDER BY sim DESC
            LIMIT %(limit)s;
        """, {'query': query, 'limit': limit}).fetchall()
        
        return [{
            'productId': r[0],
            'description': r[1][:200] + '...',
            'category': r[2],
            'price': float(r[3]) if r[3] else 0,
            'stars': float(r[4]) if r[4] else 0,
            'reviews': int(r[5]) if r[5] else 0,
            'imgUrl': r[6],
            'score': float(r[7]) if r[7] else 0,
            'method': 'Fuzzy'
        } for r in results]

print("✅ Fuzzy search ready")

In [None]:
# Test fuzzy search
results = fuzzy_search("wireles headphon", limit=3)
print(f"🔍 Fuzzy: Found {len(results)} results for 'wireles headphon'")
if results:
    print(f"   Top: {results[0]['description'][:50]}... (sim: {results[0]['score']:.3f})")

### 3. Semantic Search (Solution)

**Key Pattern**: Cosine distance with HNSW index

```sql
(1 - (embedding <=> %(embedding)s::vector)) AS similarity
ORDER BY embedding <=> %(embedding)s::vector
```

**Why This Works**:
- `<=>` returns cosine distance (0=identical, 2=opposite)
- `1 - distance` converts to similarity for display
- ORDER BY distance (not similarity) enables HNSW index scan
- `::vector` cast required for parameter binding
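
The distance-to-similarity conversion can be checked by hand. The sketch below uses only the standard library; `cosine_distance` is a hypothetical helper that mirrors the definition behind `<=>` (one minus the cosine of the angle between two vectors) and is not part of pgvector itself.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cos(angle between a and b); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


same = cosine_distance([1.0, 0.0], [2.0, 0.0])        # same direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])  # unrelated
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])   # opposite direction

# Distance grows as vectors diverge; 1 - distance is the similarity shown in results.
for name, d in [("same", same), ("orthogonal", orthogonal), ("opposite", opposite)]:
    print(f"{name}: distance={d:.1f}, similarity={1 - d:.1f}")
```

Because smaller distance means a better match, sorting ascending by distance gives the same order as sorting descending by similarity.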

In [None]:
# ============================================================
# 3. SEMANTIC SEARCH - VECTOR SIMILARITY (✅ COMPLETE)
# ============================================================

def semantic_search(query: str, limit: int = 10) -> list[dict]:
    """Semantic search using pgvector with Cohere embeddings."""
    query_embedding = generate_embedding(query, "search_query")
    if not query_embedding:
        print("❌ Failed to generate query embedding")
        return []
    
    with psycopg.connect(
        host=dbhost, port=dbport, user=dbuser,
        password=dbpass, dbname=dbname, autocommit=True
    ) as conn:
        register_vector(conn)
        
        results = conn.execute("""
            SELECT 
                "productId",
                product_description,
                category_name,
                price,
                stars,
                reviews,
                imgurl,
                (1 - (embedding <=> %(embedding)s::vector)) AS similarity
            FROM bedrock_integration.product_catalog
            WHERE embedding IS NOT NULL
            ORDER BY embedding <=> %(embedding)s::vector
            LIMIT %(limit)s;
        """, {'embedding': query_embedding, 'limit': limit}).fetchall()
        
        return [{
            'productId': r[0],
            'description': r[1][:200] + '...',
            'category': r[2],
            'price': float(r[3]) if r[3] else 0,
            'stars': float(r[4]) if r[4] else 0,
            'reviews': int(r[5]) if r[5] else 0,
            'imgUrl': r[6],
            'score': float(r[7]) if r[7] else 0,
            'method': 'Semantic'
        } for r in results]

print("✅ Semantic search ready")

In [None]:
# Test semantic search
results = semantic_search("gift for coffee lover", limit=3)
print(f"🧠 Semantic: Found {len(results)} results for 'gift for coffee lover'")
if results:
    print(f"   Top: {results[0]['description'][:50]}... (sim: {results[0]['score']:.3f})")

### 4. Weighted Hybrid Search (Demonstration)

**Purpose**: Show the score normalization problem that RRF solves.

⚠️ **Intentionally Flawed**: Raw scores from different methods have incompatible ranges.
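
A small offline sketch of why the ranges matter. The scores below are hypothetical, chosen only to resemble typical magnitudes (cosine similarity near 1, `ts_rank` values well below 1); `weighted_fusion` is an illustrative helper, not the notebook's function.

```python
# Hypothetical scores, chosen only to illustrate the scale mismatch.
semantic_scores = {"A": 0.85, "B": 0.80, "C": 0.60}  # cosine similarity
keyword_scores = {"C": 0.09, "D": 0.08}              # ts_rank-style values


def weighted_fusion(semantic, keyword, semantic_weight=0.7, keyword_weight=0.3):
    """Weighted sum of raw scores, with no normalization."""
    combined = {}
    for pid, score in semantic.items():
        combined[pid] = combined.get(pid, 0.0) + score * semantic_weight
    for pid, score in keyword.items():
        combined[pid] = combined.get(pid, 0.0) + score * keyword_weight
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


fused = weighted_fusion(semantic_scores, keyword_scores)
for pid, score in fused:
    print(f"{pid}: {score:.3f}")
```

With these numbers the best keyword match (`C`) gains only 0.027 from the keyword side, so the fused order is identical to the semantic-only order. The nominal 70/30 split does not describe each method's real influence.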

In [None]:
# ============================================================
# 4. WEIGHTED HYBRID - DEMONSTRATES SCORE PROBLEM
# ============================================================

def hybrid_search(
    query: str,
    semantic_weight: float = 0.7,
    keyword_weight: float = 0.3,
    limit: int = 10
) -> list[dict]:
    """Weighted hybrid search (demonstrates normalization problem)."""
    total = semantic_weight + keyword_weight
    semantic_weight = semantic_weight / total
    keyword_weight = keyword_weight / total
    
    semantic_results = semantic_search(query, limit * 2)
    keyword_results = keyword_search(query, limit * 2)
    
    product_scores = {}
    product_data = {}
    
    for result in semantic_results:
        pid = result['productId']
        product_scores[pid] = result['score'] * semantic_weight
        product_data[pid] = result
    
    for result in keyword_results:
        pid = result['productId']
        if pid in product_scores:
            product_scores[pid] += result['score'] * keyword_weight
        else:
            product_scores[pid] = result['score'] * keyword_weight
            product_data[pid] = result
    
    sorted_products = sorted(product_scores.items(), key=lambda x: x[1], reverse=True)[:limit]
    
    results = []
    for pid, score in sorted_products:
        product = product_data[pid].copy()
        product['score'] = score
        product['method'] = 'Hybrid'
        results.append(product)
    
    return results

print("✅ Weighted hybrid ready (for comparison)")

### 5. Hybrid RRF Search (Solution)

**Key Pattern**: Reciprocal Rank Fusion with SQL CTEs

```sql
ROW_NUMBER() OVER (ORDER BY embedding <=> %(embedding)s::vector) AS rank
...
(1.0 / (60 + COALESCE(s.rank, 1000))) + (1.0 / (60 + COALESCE(k.rank, 1000))) AS rrf_score
```

**Why This Works**:
- `ROW_NUMBER()` assigns ranks 1, 2, 3... independent of score magnitude
- RRF formula `1/(k+rank)` creates comparable contributions from each method
- `COALESCE(rank, 1000)` penalizes products appearing in only one method
- `FULL OUTER JOIN` captures products from either method
- k=60 is the constant used in the original RRF paper (Cormack, Clarke, and Büttcher, SIGIR 2009) and is a widely used default
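
The same fusion can be sketched in plain Python to see how ranks, not scores, drive the result. `rrf_fuse` and its `missing_rank` argument are hypothetical names; `missing_rank=1000` mirrors the `COALESCE(rank, 1000)` fallback in the SQL pattern above.

```python
def rrf_fuse(rankings, k=60, missing_rank=1000):
    """Fuse ranked id lists (best first) with Reciprocal Rank Fusion.

    An id absent from a list is treated as if it sat at `missing_rank`.
    """
    all_ids = {pid for ranking in rankings for pid in ranking}
    scores = {}
    for pid in all_ids:
        total = 0.0
        for ranking in rankings:
            rank = ranking.index(pid) + 1 if pid in ranking else missing_rank
            total += 1.0 / (k + rank)
        scores[pid] = total
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


semantic_ranking = ["a", "b", "c"]  # hypothetical ids, best first
keyword_ranking = ["c", "a", "d"]

for pid, score in rrf_fuse([semantic_ranking, keyword_ranking]):
    print(f"{pid}: {score:.4f}")
```

Ids found by both methods (`a`, `c`) land ahead of ids found by only one (`b`, `d`), regardless of how large the underlying raw scores were.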

In [None]:
# ============================================================
# 5. HYBRID RRF SEARCH (✅ COMPLETE)
# ============================================================

def hybrid_search_rrf(query: str, k: int = 60, limit: int = 10) -> list[dict]:
    """Hybrid search using Reciprocal Rank Fusion."""
    query_embedding = generate_embedding(query, "search_query")
    if not query_embedding:
        return []
    
    with psycopg.connect(
        host=dbhost, port=dbport, user=dbuser,
        password=dbpass, dbname=dbname, autocommit=True
    ) as conn:
        register_vector(conn)
        
        results = conn.execute("""
            WITH semantic_results AS (
                SELECT 
                    "productId",
                    ROW_NUMBER() OVER (ORDER BY embedding <=> %(embedding)s::vector) AS rank
                FROM bedrock_integration.product_catalog
                WHERE embedding IS NOT NULL
                ORDER BY embedding <=> %(embedding)s::vector
                LIMIT 50
            ),
            keyword_results AS (
                SELECT 
                    "productId",
                    ROW_NUMBER() OVER (ORDER BY ts_rank(to_tsvector('english', product_description), query) DESC) AS rank
                FROM bedrock_integration.product_catalog, to_tsquery('english', %(keyword_query)s) query
                WHERE to_tsvector('english', product_description) @@ query
                ORDER BY rank  -- keep the 50 best-ranked rows; LIMIT alone may keep arbitrary rows
                LIMIT 50
            )
            SELECT 
                COALESCE(s."productId", k."productId") AS "productId",
                p.product_description,
                p.category_name,
                p.price,
                p.stars,
                p.reviews,
                p.imgurl,
                (1.0 / (%(k)s::int + COALESCE(s.rank, 1000))) + (1.0 / (%(k)s::int + COALESCE(k.rank, 1000))) AS rrf_score
            FROM semantic_results s
            FULL OUTER JOIN keyword_results k ON s."productId" = k."productId"
            JOIN bedrock_integration.product_catalog p ON COALESCE(s."productId", k."productId") = p."productId"
            ORDER BY rrf_score DESC
            LIMIT %(limit)s;
        """, {
            'embedding': query_embedding,
            'keyword_query': ' & '.join(query.split()),
            'k': k,  # RRF smoothing constant taken from the function argument
            'limit': limit
        }).fetchall()
        
        return [{
            'productId': r[0],
            'description': r[1][:200] + '...',
            'category': r[2],
            'price': float(r[3]) if r[3] else 0,
            'stars': float(r[4]) if r[4] else 0,
            'reviews': int(r[5]) if r[5] else 0,
            'imgUrl': r[6],
            'score': float(r[7]) if r[7] else 0,
            'method': 'Hybrid-RRF'
        } for r in results]

print("✅ Hybrid RRF search ready")

In [None]:
# Test RRF search
results = hybrid_search_rrf("affordable wireless bluetooth headphones", limit=3)
print(f"⚖️ RRF: Found {len(results)} results")
for i, r in enumerate(results, 1):
    print(f"   {i}. {r['description'][:45]}... (rrf: {r['score']:.4f})")

### 6. Cohere Rerank (Bonus)

In [None]:
# ============================================================
# 6. COHERE RERANK (BONUS)
# ============================================================

def rerank_results(query: str, results: list[dict], top_n: int = 10) -> list[dict]:
    """Rerank search results using Cohere Rerank v3.5."""
    if not results:
        return results
    
    try:
        documents = [r['description'] for r in results]
        
        response = bedrock_runtime.invoke_model(
            modelId='cohere.rerank-v3-5:0',
            body=json.dumps({
                "api_version": 2,
                "query": query,
                "documents": documents,
                "top_n": min(top_n, len(documents))
            })
        )
        
        rerank_response = json.loads(response['body'].read())
        reranked = []
        
        for item in rerank_response['results']:
            idx = item['index']
            result = results[idx].copy()
            result['rerank_score'] = item['relevance_score']
            reranked.append(result)
        
        return reranked
    except Exception as e:
        print(f"Rerank failed: {e}")
        return results

print("✅ Rerank function ready")

---

## 🎮 Step 4: Interactive Search Interface

See all implementations working side-by-side.

In [None]:
# ============================================================
# INTERACTIVE SEARCH INTERFACE
# ============================================================

def create_search_interface():
    """Create an interactive search interface with proper product display"""
    import ipywidgets as widgets
    from IPython.display import display, HTML
    
    # Professional style definitions
    style = """
    <style>
        .search-container { padding: 20px; background: #f8f9fa; border-radius: 10px; }
        .result-card { 
            margin: 15px 0; padding: 20px; background: white; 
            border-radius: 8px; border: 1px solid #e3e6e8;
            transition: all 0.3s; position: relative;
            box-shadow: 0 1px 2px rgba(0,0,0,0.05);
        }
        .result-card:hover { 
            box-shadow: 0 8px 20px rgba(0,0,0,0.12); 
            transform: translateY(-2px);
            border-color: #ff9900;
        }
        .method-badge {
            position: absolute; top: 15px; right: 15px;
            padding: 5px 12px; border-radius: 20px;
            font-size: 11px; font-weight: bold;
            text-transform: uppercase;
        }
        .keyword { background: #e3f2fd; color: #1565c0; }
        .fuzzy { background: #fce4ec; color: #c2185b; }
        .semantic { background: #e8f5e9; color: #2e7d32; }
        .hybrid { background: #fff3e0; color: #e65100; }
        
        .product-content { display: flex; gap: 20px; }
        .product-image {
            flex-shrink: 0; width: 150px; height: 150px;
            object-fit: contain; border: 1px solid #e3e6e8;
            border-radius: 4px; padding: 10px; background: white;
        }
        .product-details { flex-grow: 1; }
        .product-title {
            font-size: 16px; color: #0066c0; text-decoration: none;
            font-weight: 500; line-height: 1.4; display: block; margin-bottom: 8px;
        }
        .product-title:hover { color: #c7511f; text-decoration: underline; }
        .product-price {
            font-size: 21px; color: #B12704; font-weight: 500; margin: 8px 0;
        }
        .product-rating {
            display: flex; align-items: center; gap: 8px; margin: 8px 0;
        }
        .stars { color: #ff9900; }
        .product-category { color: #565959; font-size: 12px; margin-top: 8px; }
        .score-info {
            margin-top: 12px; padding-top: 12px; border-top: 1px solid #e3e6e8;
            display: flex; justify-content: space-between; align-items: center;
        }
        .score-bar {
            height: 6px; background: #e9ecef; border-radius: 3px;
            overflow: hidden; flex-grow: 1; margin-right: 10px; max-width: 200px;
        }
        .score-fill {
            height: 100%; background: linear-gradient(90deg, #ff9900, #ff6600);
            transition: width 0.5s;
        }
        .score-text { color: #565959; font-size: 12px; font-weight: 500; }
        .comparison-grid {
            display: grid; grid-template-columns: repeat(auto-fit, minmax(400px, 1fr));
            gap: 20px; margin-top: 20px;
        }
        .no-results {
            padding: 40px; text-align: center; color: #565959;
            background: #f7f8f8; border-radius: 8px;
        }
    </style>
    """
    
    # Widget definitions
    query_input = widgets.Text(
        value='',
        placeholder='Try "Apple AirPods" or "coffee maker" or "laptop bag"...',
        description='Search:',
        style={'description_width': '80px'},
        layout=widgets.Layout(width='700px')
    )
    
    search_method = widgets.RadioButtons(
        options=[
            ('Keyword (Exact Match)', 'keyword'),
            ('Fuzzy (Typo Tolerance)', 'fuzzy'),
            ('Semantic (Conceptual)', 'semantic'),
            ('Hybrid (Combined)', 'hybrid'),
            ('Hybrid-RRF (Rank Fusion)', 'hybrid_rrf'),
            ('🔍 Compare All Methods', 'compare')
        ],
        value='compare',
        description='Method:',
        style={'description_width': '80px'}
    )
    
    # Hybrid search weight sliders
    semantic_weight = widgets.FloatSlider(
        value=0.7, min=0, max=1, step=0.1,
        description='Semantic:',
        style={'description_width': '80px'},
        layout=widgets.Layout(width='350px')
    )
    
    keyword_weight = widgets.FloatSlider(
        value=0.3, min=0, max=1, step=0.1,
        description='Keyword:',
        style={'description_width': '80px'},
        layout=widgets.Layout(width='350px')
    )
    
    results_limit = widgets.IntSlider(
        value=3, min=1, max=10, step=1,
        description='Results:',
        style={'description_width': '80px'},
        layout=widgets.Layout(width='300px')
    )
    
    search_button = widgets.Button(
        description='🔍 Search Products',
        button_style='primary',
        layout=widgets.Layout(width='200px', height='40px')
    )
    
    rerank_checkbox = widgets.Checkbox(
        value=False,
        description='Use Cohere Rerank',
        style={'description_width': 'initial'}
    )
    
    results_output = widgets.Output()
    
    # Example queries that demonstrate real differences
    example_queries = [
        # Exact keyword matches
        ("wireless bluetooth headphones", "Common Terms", "keyword"),
        ("stainless steel water bottle", "Product Type", "keyword"),
        
        # Conceptual searches
        ("something to keep coffee hot all day", "Problem Solving", "semantic"),
        ("gift for someone who loves cooking", "Gift Ideas", "semantic"),
        
        # Typo tolerance
        ("wireles blutooth hedphones", "With Typos", "fuzzy"),
        ("stainles steel watter botle", "Misspellings", "fuzzy"),
        
        # Balanced hybrid (RRF excels here)
        ("durable laptop backpack with USB charging", "Multi-Feature", "hybrid_rrf"),
        ("ergonomic office chair under 300 dollars", "Specs + Price", "hybrid_rrf"),
        
        # Mixed queries
        ("organic sustainable water bottle", "Features + Product", "hybrid"),
        ("affordable noise canceling headphones under 200", "Specs + Budget", "hybrid"),
        
        # Activity based
        ("equipment for home yoga practice", "Activity Based", "semantic"),
        ("tools for remote work from home", "Use Case", "semantic")
    ]
    
    def format_result(result: dict, method_class: str = '') -> str:
        """Format a single search result with full product display"""
        # Extract product details
        product_id = result.get('productId', 'Unknown')
        description = result.get('description', 'No description available')
        price = result.get('price', 0)
        stars = result.get('stars', 0)
        reviews = result.get('reviews', 0)
        category = result.get('category', 'Unknown Category')
        score = result.get('score', 0)
        rerank_score = result.get('rerank_score', None)
        img_url = result.get('imgUrl', '')  # key is 'imgUrl' (capital U), as returned by the search functions
        
        # Create star display
        star_display = '★' * int(stars) + '☆' * (5 - int(stars))
        
        # Generate Amazon search link
        search_terms = description.split()[:5]
        link_url = f"https://www.amazon.com/s?k={'+'.join(search_terms)}"
        
        # Calculate score percentage for visual bar
        display_score = rerank_score if rerank_score is not None else score
        score_percent = min(display_score * 100, 100) if display_score > 0 else 0
        
        # Score label
        score_label = "Rerank Score" if rerank_score is not None else "Relevance"
        
        # Simple direct image embed exactly like Part 2 notebook
        return f"""
        <div class="result-card">
            <div class="method-badge {method_class}">{result.get('method', 'Unknown')}</div>
            
            <div class="product-content">
                <img src="{img_url}" style="width: 150px; height: 150px; object-fit: contain; border: 1px solid #e3e6e8; border-radius: 4px; padding: 10px; background: white;">
                
                <div class="product-details">
                    <a href="{link_url}" target="_blank" class="product-title">
                        {description}
                    </a>
                    
                    <div class="product-price">${price:.2f}</div>
                    
                    <div class="product-rating">
                        <span class="stars">{star_display}</span>
                        <span style="color: #007185; font-size: 14px;">({reviews:,} reviews)</span>
                    </div>
                    
                    <div class="product-category">Category: {category}</div>
                    
                    <div class="score-info">
                        <div style="display: flex; align-items: center; flex-grow: 1;">
                            <div class="score-bar">
                                <div class="score-fill" style="width: {score_percent}%"></div>
                            </div>
                            <span class="score-text">{score_label}: {display_score:.3f}</span>
                        </div>
                        <a href="{link_url}" target="_blank" style="color: #ff9900; text-decoration: none; font-size: 13px;">
                            View on Amazon →
                        </a>
                    </div>
                </div>
            </div>
        </div>
        """
    
    def set_example_query(query: str, method: str | None = None):
        """Set an example query and optionally the search method"""
        query_input.value = query
        if method:
            search_method.value = method
    
    # Create example buttons
    example_buttons = []
    for query, label, best_method in example_queries:
        btn = widgets.Button(
            description=f"{label}: {query[:30]}..." if len(query) > 30 else f"{label}: {query}",
            layout=widgets.Layout(width='auto', margin='2px'),
            tooltip=f"Best with: {best_method}"
        )
        btn.on_click(lambda b, q=query, m=best_method: set_example_query(q, m))
        example_buttons.append(btn)
    
    def on_search_clicked(b):
        """Handle search button click"""
        results_output.clear_output()
        
        with results_output:
            display(HTML(style))
            
            query = query_input.value
            method = search_method.value
            limit = results_limit.value
            use_rerank = rerank_checkbox.value
            
            if not query:
                display(HTML('<div class="no-results">Please enter a search query!</div>'))
                return
            
            display(HTML(f'<h3 style="color: #0f1111;">🔍 Results for: "{query}"</h3>'))
            
            if method == 'compare':
                # Compare all methods
                methods_to_compare = [
                    ('Keyword (Exact)', keyword_search, 'keyword'),
                    ('Fuzzy (Typos)', fuzzy_search, 'fuzzy'),
                    ('Semantic (Cohere)', semantic_search, 'semantic'),
                    ('Hybrid (70/30)', lambda q, l: hybrid_search(q, 0.7, 0.3, l), 'hybrid'),
                    ('Hybrid-RRF (k=60)', lambda q, l: hybrid_search_rrf(q, 60, l), 'hybrid')
                ]
                
                # Method colors
                method_colors = {
                    'keyword': '1565c0',
                    'fuzzy': 'c2185b',
                    'semantic': '2e7d32',
                    'hybrid': 'e65100'
                }
                
                html_output = '<div class="comparison-grid">'
                
                for method_name, func, css_class in methods_to_compare:
                    border_color = method_colors.get(css_class, '666666')
                    html_output += f'<div><h4 style="color: #0f1111; border-bottom: 2px solid #{border_color}; padding-bottom: 8px; margin-bottom: 15px;">{method_name}</h4>'
                    
                    try:
                        import time
                        start = time.time()
                        results = func(query, limit)
                        elapsed = time.time() - start
                        
                        # Apply reranking if enabled
                        if use_rerank and results:
                            results = rerank_results(query, results, min(len(results), 2))
                        
                        if results:
                            html_output += f'<p style="color: #565959; font-size: 12px;">Found {len(results)} results in {elapsed:.3f}s</p>'
                            for result in results[:2]:  # Show top 2 per method
                                html_output += format_result(result, css_class)
                        else:
                            html_output += '<div class="no-results">No results found with this method</div>'
                            
                    except Exception as e:
                        html_output += f'<div class="no-results">Error: {str(e)}</div>'
                    
                    html_output += '</div>'
                
                html_output += '</div>'
                display(HTML(html_output))
                
            else:
                # Single method search
                try:
                    import time
                    start = time.time()
                    
                    if method == 'keyword':
                        results = keyword_search(query, limit)
                        css_class = 'keyword'
                        method_name = 'Keyword (Exact Match)'
                    elif method == 'fuzzy':
                        results = fuzzy_search(query, limit)
                        css_class = 'fuzzy'
                        method_name = 'Fuzzy (Typo Tolerance)'
                    elif method == 'semantic':
                        results = semantic_search(query, limit)
                        css_class = 'semantic'
                        method_name = 'Semantic Search (Cohere)'
                    elif method == 'hybrid':
                        results = hybrid_search(
                            query, 
                            semantic_weight.value,
                            keyword_weight.value,
                            limit
                        )
                        css_class = 'hybrid'
                        method_name = f'Hybrid (S:{semantic_weight.value:.1f}/K:{keyword_weight.value:.1f})'
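                    elif method == 'hybrid_rrf':
                        results = hybrid_search_rrf(query, 60, limit)
                        css_class = 'hybrid'
                        method_name = 'Hybrid-RRF (k=60)'
                    else:
                        # Fail with a clear message instead of a NameError on `results`
                        raise ValueError(f"Unknown search method: {method}")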
                    
                    elapsed = time.time() - start
                    
                    # Apply reranking if enabled
                    if use_rerank and results:
                        rerank_start = time.time()
                        results = rerank_results(query, results, len(results))
                        rerank_time = time.time() - rerank_start
                        total_time = elapsed + rerank_time
                        
                        display(HTML(f'''
                            <p style="color: #565959;">
                                Method: <strong>{method_name}</strong> | 
                                Search: <strong>{elapsed:.3f}s</strong> | 
                                Rerank: <strong>{rerank_time:.3f}s</strong> |
                                Total: <strong>{total_time:.3f}s</strong> | 
                                Results: <strong>{len(results)}</strong>
                            </p>
                        '''))
                    else:
                        display(HTML(f'''
                            <p style="color: #565959;">
                                Method: <strong>{method_name}</strong> | 
                                Time: <strong>{elapsed:.3f}s</strong> | 
                                Results: <strong>{len(results)}</strong>
                            </p>
                        '''))
                    
                    if results:
                        for result in results:
                            display(HTML(format_result(result, css_class)))
                    else:
                        display(HTML('<div class="no-results">No products found. Try a different search term or method.</div>'))
                        
                except Exception as e:
                    display(HTML(f'<div class="no-results">Error: {str(e)}</div>'))
                    import traceback
                    print(traceback.format_exc())
    
    search_button.on_click(on_search_clicked)
    
    # Create status display for weights
    weight_status = widgets.HTML(
        value="<div style='padding: 5px; font-size: 0.9em; color: #2E8B57;'>✓ Weights sum to 1.0</div>"
    )

    def validate_and_update_weights(change):
        current_sum = round(semantic_weight.value + keyword_weight.value, 1)
        
        if current_sum > 1:
            # If semantic weight was changed
            if change.owner == semantic_weight:
                keyword_weight.value = max(0, round(1 - semantic_weight.value, 1))
            # If keyword weight was changed
            else:
                semantic_weight.value = max(0, round(1 - keyword_weight.value, 1))
            
            current_sum = round(semantic_weight.value + keyword_weight.value, 1)
        
        # Update status display
        if current_sum > 1:
            weight_status.value = f"<div style='padding: 5px; font-size: 0.9em; color: #DC143C;'>⚠️ Sum exceeds 1 (Current: {current_sum})</div>"
        elif current_sum == 1:
            weight_status.value = f"<div style='padding: 5px; font-size: 0.9em; color: #2E8B57;'>✓ Weights sum to {current_sum}</div>"
        else:
            weight_status.value = f"<div style='padding: 5px; font-size: 0.9em; color: #DAA520;'>ℹ️ Sum is {current_sum}</div>"

    # Observe changes in both sliders
    semantic_weight.observe(validate_and_update_weights, names='value')
    keyword_weight.observe(validate_and_update_weights, names='value')
    
    # Layout
    display(HTML("""
        <style>
            .adaptive-title {
                color: #000000;
            }
            @media (prefers-color-scheme: dark) {
                .adaptive-title { color: #ffffff; }
            }
            body.vscode-dark .adaptive-title,
            body.vscode-high-contrast .adaptive-title,
            .jp-Notebook-dark .adaptive-title {
                color: #ffffff;
            }
        </style>
        <h2 class="adaptive-title">🛍️ Amazon Product Search Comparison</h2>
        <div style="background: #f7f8f8; padding: 15px; border-radius: 8px; margin: 15px 0;">
            <h4 style="color: #0f1111; margin-top: 0;">Search Method Strengths:</h4>
            <ul style="color: #565959; margin: 10px 0;">
                <li><strong style="color: #1565c0;">Keyword:</strong> Perfect for exact product names, SKUs, brand searches</li>
                <li><strong style="color: #c2185b;">Fuzzy:</strong> Handles typos and misspellings</li>
                <li><strong style="color: #2e7d32;">Semantic:</strong> Understands intent and concepts using Cohere embeddings</li>
                <li><strong style="color: #e65100;">Hybrid:</strong> Best overall - combines keyword matching with semantic understanding</li>
            </ul>
            <div style="border-left: 4px solid #4CAF50; padding-left: 10px; margin-top: 10px; color: black;">
                <strong>🤖 Cohere Models:</strong> embed-english-v3 (embeddings) • rerank-v3-5:0 (re-ranking)
            </div>
            <div style="background: #fff3e0; border-left: 4px solid #e65100; padding: 12px; margin-top: 15px; border-radius: 4px;">
                <h4 style="color: #2e7d32; margin-top: 0; margin-bottom: 8px;">💡 Understanding Hybrid Search Approaches</h4>
                <p style="color: #1b5e20; margin: 8px 0; font-size: 13px;">
                    <strong>Challenge:</strong> Search methods score on very different scales (for example, semantic similarity around 0.7-1.0 versus keyword <code>ts_rank</code> around 0.01-0.1), so a weighted sum of raw scores lets one method dominate.
                </p>
                <p style="color: #1b5e20; margin: 8px 0; font-size: 13px;">
                    <strong>Solutions Demonstrated:</strong>
                </p>
                <ul style="color: #1b5e20; margin: 8px 0; font-size: 13px; padding-left: 20px;">
                    <li><strong>Hybrid (70/30):</strong> Weighted score fusion - simple to implement, but the weights need careful tuning</li>
                    <li><strong>Hybrid-RRF:</strong> Rank-based fusion - robust, no score normalization needed ✨</li>
                    <li><strong>Cohere Rerank:</strong> Cross-encoder re-ranking of the candidates - the most sophisticated approach, at the cost of an extra model call</li>
                </ul>
                <div style="background: #e8f5e9; border-left: 4px solid #4caf50; padding: 10px; margin: 8px 0; font-size: 13px; color: #1b5e20;">
                    <strong>💡 Try the examples below</strong> to see how each method handles different query types!
                </div>
            </div>
        </div>
    """))
    
    # Display interface
    display(widgets.VBox([
        widgets.HTML('<h4 style="color: #0f1111; margin: 15px 0;">📝 Example Searches (Click to Try):</h4>'),
        widgets.GridBox(
            example_buttons,
            layout=widgets.Layout(
                grid_template_columns='repeat(3, 1fr)',
                grid_gap='5px'
            )
        ),
        widgets.HTML('<hr style="margin: 20px 0; border-color: #e3e6e8;">'),
        query_input,
        search_method,
        widgets.HTML('<h4 style="color: #0f1111;">⚙️ Options:</h4>'),
        widgets.HBox([
            widgets.VBox([
                widgets.HTML('<strong>Hybrid Weights:</strong>'),
                semantic_weight,
                keyword_weight,
                weight_status
            ]),
            widgets.VBox([
                results_limit,
                rerank_checkbox
            ])
        ]),
        search_button,
        results_output
    ]))

# Performance note: each search reports its query time in the UI below.
# Target: <100 ms for production; with the HNSW index, queries typically take ~20-50 ms.

# Create and display the interface
create_search_interface()
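
The single timings shown in the UI vary from run to run. To compare against a latency target, it helps to repeat a query and look at the median and a high percentile. The helper below is a minimal sketch of that idea; `measure_latency_ms` and `fake_search` are hypothetical names introduced here for illustration, and `fake_search` only stands in for a real search function that accepts `(query, limit)`.

```python
import statistics
import time


def measure_latency_ms(search_fn, query, runs=20, limit=10):
    """Return (median, approximate p95) latency in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution timer
        search_fn(query, limit)
        timings.append((time.perf_counter() - start) * 1000)
    timings.sort()
    # Nearest-rank style estimate of the 95th percentile
    p95 = timings[min(len(timings) - 1, int(0.95 * len(timings)))]
    return statistics.median(timings), p95


def fake_search(query, limit):
    """Stand-in for a real search call (assumption: ~2 ms of work)."""
    time.sleep(0.002)
    return []


median_ms, p95_ms = measure_latency_ms(fake_search, "wireless headphones")
print(f"median: {median_ms:.1f} ms | p95: {p95_ms:.1f} ms")
```

In this notebook, any of the search functions that take `(query, limit)`, such as `keyword_search` or `semantic_search`, could be passed in place of `fake_search`.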

---

## 🔬 Step 5: Method Comparison

In [None]:
def compare_methods(query: str, limit: int = 5):
    """Compare all search methods side-by-side."""
    print(f"🔍 Query: \"{query}\"\n")
    print("=" * 75)
    
    methods = [
        ('Keyword', keyword_search),
        ('Fuzzy', fuzzy_search),
        ('Semantic', semantic_search),
        ('Hybrid RRF', lambda q, l: hybrid_search_rrf(q, 60, l))
    ]
    
    for name, func in methods:
        try:
            results = func(query, limit)
            print(f"\n📊 {name.upper()} ({len(results)} results)")
            print("-" * 40)
            if results:
                for i, r in enumerate(results[:3], 1):
                    print(f"  {i}. {r['description'][:45]}... ({r['score']:.3f})")
            else:
                print("  No results")
        except Exception as e:
            print(f"\n📊 {name.upper()}: Error - {e}")

print("✅ Comparison function ready")

In [None]:
# Compare methods with different query types
compare_methods("wireles hedphones")  # Typos → Fuzzy shines

In [None]:
compare_methods("gift for coffee lover")  # Conceptual → Semantic shines

In [None]:
compare_methods("affordable noise canceling under 200")  # Complex → RRF shines
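
Reading the printed lists side by side shows which method "shines", but a single number can make the disagreement between two methods easier to see. The sketch below computes the Jaccard overlap of the top-k results from two methods. It assumes each result is a dictionary with a `description` key, as `compare_methods` uses; `top_k_overlap` and the sample results are hypothetical and only illustrate the calculation.

```python
def top_k_overlap(results_a, results_b, key="description", k=3):
    """Jaccard overlap (0.0-1.0) between the top-k items of two result lists."""
    top_a = {r[key] for r in results_a[:k]}
    top_b = {r[key] for r in results_b[:k]}
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0


# Hypothetical results, shaped like the dictionaries the search functions return
fuzzy_hits = [
    {"description": "Wireless Headphones X", "score": 0.61},
    {"description": "Wired Headset Y", "score": 0.40},
]
semantic_hits = [
    {"description": "Wireless Headphones X", "score": 0.83},
    {"description": "Bluetooth Earbuds Z", "score": 0.79},
]

overlap = top_k_overlap(fuzzy_hits, semantic_hits)
print(f"Top-3 overlap: {overlap:.2f}")
```

A low overlap means the two methods surface different products for the query, which is the situation where fusing them is most likely to help.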

---

## 📊 Step 6: Score Distribution Analysis

Print the raw score ranges from semantic and keyword search to see why rank-based fusion (RRF) is more robust than a weighted sum of raw scores.

In [None]:
# Show score ranges from different methods
query = "wireless bluetooth headphones"

semantic_results = semantic_search(query, 10)
keyword_results = keyword_search(query, 10)

print("📊 SCORE DISTRIBUTION COMPARISON")
print("=" * 50)
print(f"\nQuery: \"{query}\"\n")

if semantic_results:
    sem_scores = [r['score'] for r in semantic_results]
    print("SEMANTIC SCORES (similarity 0-1):")
    print(f"  Range: {min(sem_scores):.3f} - {max(sem_scores):.3f}")
    print(f"  Spread: {max(sem_scores) - min(sem_scores):.3f}")

if keyword_results:
    kw_scores = [r['score'] for r in keyword_results]
    print("\nKEYWORD SCORES (ts_rank):")
    print(f"  Range: {min(kw_scores):.4f} - {max(kw_scores):.4f}")
    print(f"  Spread: {max(kw_scores) - min(kw_scores):.4f}")

print("\n⚠️  PROBLEM: These ranges are incompatible!")
print("   Typical values: semantic ~0.7-0.9 vs keyword ~0.01-0.08")
print("   A weighted sum of raw scores would let semantic dominate.")
print("\n✅ SOLUTION: RRF uses ranks (1,2,3...) instead of scores")
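
To make the rank-based idea concrete, here is a minimal pure-Python sketch of Reciprocal Rank Fusion. It assumes the standard formulation, in which each document receives `1 / (k + rank)` from every list it appears in, and is not a copy of the SQL used by `hybrid_search_rrf`. The function name `rrf_fuse` and the product IDs are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of document IDs using Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks start at 1; only the position matters, never the raw score
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical product IDs, best match first in each list
semantic_ranking = ["B01", "B02", "B03"]
keyword_ranking = ["B03", "B01", "B04"]

fused = rrf_fuse([semantic_ranking, keyword_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that appear near the top of both lists (`B01`, `B03`) end up ahead of documents that appear in only one, regardless of how large or small the original scores were.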

---

## 🎯 Key Takeaways

### Method Selection Guide

| Method | Latency | Best For | Index Type |
|--------|---------|----------|------------|
| Keyword | <10ms | Exact terms, SKUs | GIN (tsvector) |
| Fuzzy | <50ms | Typos, user input | GIN (trigram) |
| Semantic | <100ms | Concepts, intent | HNSW (vector) |
| Hybrid RRF | <150ms | Mixed patterns | Combined |

### RRF vs Weighted Fusion

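| Aspect | Weighted | RRF |
|--------|----------|-----|
| Score normalization | Required | Not needed |
| Weight tuning | Per-domain | k=60 is a common default that rarely needs tuning |
| Implementation | Weighted sum, plus per-method score normalization | Sum of reciprocal ranks |
| Production use | Needs tuning per dataset | Built into search engines such as Elasticsearch and OpenSearch |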
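
The "Score normalization" row can be illustrated with a small sketch. The scores below are hypothetical values in the ranges this notebook reports, and `min_max` and `weighted_fuse` are illustrative helpers, not functions from the notebook. Even with 70% of the weight on keyword search, the raw weighted sum still follows the semantic order; min-max normalization is one common way to put both methods on the same scale first.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the 0-1 range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_fuse(semantic, keyword, w_sem, w_kw):
    """Weighted sum of two score mappings; a missing document scores 0."""
    docs = set(semantic) | set(keyword)
    return {
        doc: w_sem * semantic.get(doc, 0.0) + w_kw * keyword.get(doc, 0.0)
        for doc in docs
    }


# Hypothetical scores: semantic prefers A, keyword prefers C
semantic = {"A": 0.90, "B": 0.80, "C": 0.70}
keyword = {"A": 0.01, "B": 0.04, "C": 0.08}

raw = weighted_fuse(semantic, keyword, w_sem=0.3, w_kw=0.7)
norm = weighted_fuse(min_max(semantic), min_max(keyword), w_sem=0.3, w_kw=0.7)

raw_order = sorted(raw, key=raw.get, reverse=True)
norm_order = sorted(norm, key=norm.get, reverse=True)
print("Raw scores:       ", raw_order)
print("Normalized scores:", norm_order)
```

Without normalization the keyword weight has almost no effect on the ordering; after normalization the 70% keyword weight decides it. RRF sidesteps this step because it never looks at the raw scores.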

---

<div style="background: linear-gradient(135deg, #66bb6a 0%, #43a047 100%); 
            color: white; 
            padding: 20px 30px; 
            border-radius: 12px; 
            margin: 25px 0;
            border-left: 5px solid #ffd700;">
<h2 style="margin: 0 0 12px 0; color: white;">✅ Solutions Complete!</h2>
<p style="margin: 0 0 15px 0; font-size: 15px;">
All search methods implemented and tested. Use this notebook as a reference.
</p>
<p style="margin: 0; font-size: 14px;">
🚀 <strong>Next:</strong> Explore the Streamlit demo for MCP integration and RLS!
</p>
</div>

### 🚀 Launch Demo
```bash
demo
streamlit run streamlit_app.py
```

### 📚 Resources
**GitHub:** [github.com/aws-samples/sample-dat409-hybrid-search-aurora-mcp](https://github.com/aws-samples/sample-dat409-hybrid-search-aurora-mcp)