# Part 1: AI-Powered Semantic Product Search
### Configuration, Vector Embeddings and Data Ingestion with pgvector and Amazon Bedrock

<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 20px; border-radius: 10px; color: white; margin: 20px 0;">
    <h3 style="margin: 0 0 10px 0;">⏱️ Estimated Time: 20 Minutes</h3>
    <p style="margin: 0; font-size: 0.95em;">This hands-on lab introduces semantic search fundamentals through a real-world product catalog application.</p>
</div>

---

## 🎯 Learning Objectives

By the end of this lab, you will:

1. **Understand Vector Embeddings**: Learn how AI models convert text into high-dimensional vectors that capture semantic meaning
2. **Implement pgvector**: Set up and use PostgreSQL's vector extension for similarity search
3. **Compare Search Methods**: See the dramatic difference between keyword and semantic search
4. **Optimize Performance**: Create HNSW indexes for fast vector similarity queries
5. **Build Interactive Search**: Create an interactive notebook search interface with real-time results

---

## Overview

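Blaize Bazaar uses semantic search to help customers find products using natural language queries. The system matches queries to products by meaning rather than by shared keywords, so a query such as "something to keep my drinks cold" can surface coolers and insulated bottles even when no words overlap.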

**Tech Stack**:
- **Embeddings**: Amazon Titan Text Embeddings V2 (1024 dimensions)
- **Database**: Aurora PostgreSQL with pgvector extension
- **Search**: Cosine similarity matching via HNSW indexing

## Dataset

- **Source**: [Amazon Products Dataset 2023](https://www.kaggle.com/datasets/asaniczka/amazon-products-dataset-2023-1-4m-products/data)
- **Size**: 21,704 curated products with verified ratings

## Product Catalog Schema

Table: `bedrock_integration.product_catalog`

| Column | Type | Description | 
|--------|------|-------------|
| productId | VARCHAR(255) PK | Unique product identifier |
| product_description | TEXT | Product details |
| imgurl, producturl | TEXT | Product media links |
| stars, reviews | NUMERIC, INT | Rating metrics |
| price | NUMERIC | Product price |
| category_id, category_name | INT, VARCHAR(255) | Category info |
| isbestseller, boughtinlastmonth | BOOLEAN, INT | Sales metrics |
| quantity | INT | Stock level |
| **embedding** | **VECTOR(1024)** | **Semantic search vector** |

**Search Optimization Indexes**:

| Index Type | Column | Purpose |
|------------|--------|---------|
| HNSW | `embedding` | Vector similarity (m=16, ef_construction=64, cosine) |
| GIN | `product_description` | Full-text search (English) |
| B-tree | `category_name` | Category filtering |
| B-tree | `price` | Price range queries (partial index, `price > 0`) |
| B-tree (PK) | `productId` | Unique lookups |

*The secondary indexes are created with `IF NOT EXISTS`; the primary-key index is created implicitly with the table. `VACUUM ANALYZE` then refreshes the planner statistics.*

## Architecture

![Product Search Architecture](../static/Product_Catalog.png)

**Search Flow**:
1. Generate embeddings from product descriptions (Bedrock Titan)
2. Store vectors in PostgreSQL with pgvector
3. Convert customer queries to embeddings
4. Return top matches via cosine similarity

<div style="background: #d3f9d8; border: 2px solid #51cf66; border-radius: 8px; padding: 15px; margin: 20px 0;">
    <div style="font-weight: bold; color: #000000; margin-bottom: 8px;">🚀 Quick Start Guide</div>
    <div style="color: #000000; line-height: 1.6;">
        <strong>First time here?</strong> Simply run all cells from top to bottom:
        <ol style="margin: 8px 0;">
            <li><strong>Menu → Cell → Run All</strong> (or Shift+Enter through each cell)</li>
            <li>Wait ~6-7 minutes for all processing to complete</li>
            <li>Try the interactive search widget that appears</li>
            <li>Compare keyword vs. semantic search results</li>
        </ol>
        <strong>⏱️ Time breakdown:</strong> Setup (1 min) → Embedding generation (3 min) → Indexing (2 min) → Interactive exploration (remaining time)
    </div>
</div>

## Setup

Install dependencies with pinned versions for reproducibility:

In [None]:
# ========================================================================
# DAT406 Workshop - Dependency Installation
# ========================================================================
# Installs all required packages from requirements.txt
# 
# Includes:
#   - AWS SDK (boto3)
#   - PostgreSQL driver with pgvector support (psycopg)
#   - Data science stack (pandas, numpy, matplotlib, seaborn)
#   - FastAPI for backend APIs
#   - Jupyter widgets and utilities
# ========================================================================

%pip install -q -r requirements.txt
print("✅ Setup complete! Ready to start the workshop.")

<div style="background: #e7f5ff; border-left: 4px solid #1971c2; padding: 12px; margin: 15px 0; border-radius: 4px;">
    <strong style="color: #000000;">✅ Checkpoint 1 of 5:</strong> <span style="color: #000000;">Dependencies installed. Next up: Database configuration.</span>
</div>

## Import Libraries & Initialize Services

In [None]:
import pandas as pd
import numpy as np
import boto3
import json
import psycopg
from psycopg_pool import ConnectionPool
from pgvector.psycopg import register_vector
from pandarallel import pandarallel
from tqdm.notebook import tqdm
import time
from datetime import datetime
import ipywidgets as widgets
from IPython.display import display, HTML, clear_output
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from sklearn.manifold import TSNE

warnings.filterwarnings('ignore')

# Configure visualization
sns.set_style('darkgrid')
plt.rcParams['figure.figsize'] = (12, 6)

# Initialize Bedrock client with retry configuration
bedrock_runtime = boto3.client(
    'bedrock-runtime',
    config=boto3.session.Config(
        retries={'max_attempts': 3, 'mode': 'adaptive'}
    )
)

print("✅ Libraries initialized")

## Database Configuration

Initialize connection pool for optimized throughput:

In [None]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Get DB secret ARN from environment (set by bootstrap script)
DB_SECRET_ARN = os.getenv('DB_SECRET_ARN', 'apg-pgvector-secret-dat406')

# Retrieve credentials from Secrets Manager
client = boto3.client('secretsmanager')
response = client.get_secret_value(SecretId=DB_SECRET_ARN)
db_secrets = json.loads(response['SecretString'])

# Database connection string
DB_CONNINFO = f"host={db_secrets['host']} port={db_secrets['port']} " \
              f"user={db_secrets['username']} password={db_secrets['password']} dbname=postgres"

# Initialize or reinitialize connection pool
def create_pool():
    """Build a connection pool with the workshop's standard settings"""
    return ConnectionPool(
        conninfo=DB_CONNINFO,
        min_size=5,
        max_size=10,
        timeout=30,
        max_waiting=20
    )

try:
    if 'pool' not in globals():
        # First time initialization
        pool = create_pool()
    elif pool.closed:
        print("⚠️ Pool was closed, reinitializing...")
        pool = create_pool()
    else:
        print("✅ Using existing pool connection")
except Exception as e:
    print(f"Reinitializing pool due to: {e}")
    pool = create_pool()

print(f"✅ Connection pool ready: {db_secrets['host']}")
print(f"   Pool size: 5-10 connections | Timeout: 30s")
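
The connection string above interpolates the secret values directly. In libpq's keyword/value format, a value that is empty or contains spaces has to be wrapped in single quotes, with backslashes and single quotes escaped by a backslash, so a password containing a space would break the string as written. The sketch below shows one way to quote every value defensively. `quote_conninfo_value` and `build_conninfo` are hypothetical helper names, not part of the workshop code, and the sample credentials are made up.

```python
def quote_conninfo_value(value):
    """Quote one value for a libpq keyword/value connection string."""
    text = str(value)
    # Escape backslashes first, then single quotes, then wrap in single quotes
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_conninfo(**params):
    """Join keyword=value pairs, quoting every value."""
    return " ".join(
        f"{key}={quote_conninfo_value(val)}" for key, val in params.items()
    )


# Made-up credentials, including a password with a space and a quote
conninfo = build_conninfo(
    host="db.example.internal",
    port=5432,
    user="workshop",
    password="p@ss word'1",
    dbname="postgres",
)
print(conninfo)
```

psycopg also ships `psycopg.conninfo.make_conninfo`, which may be a better fit than a hand-written helper if you prefer to rely on the driver for escaping.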

<div style="background: #e7f5ff; border-left: 4px solid #1971c2; padding: 12px; margin: 15px 0; border-radius: 4px;">
    <strong style="color: #000000;">✅ Checkpoint 2 of 5:</strong> <span style="color: #000000;">Database connected. Next: Setting up schema with vector support.</span>
</div>

## Schema Setup with Optimized Indexes

In [None]:
def setup_database():
    """Create the pgvector extension, schema, and product_catalog table (indexes are built in a later step)"""
    with pool.connection() as conn:
        with conn.cursor() as cur:
            # Enable extensions
            cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
            register_vector(conn)
            
            # Create schema
            cur.execute("CREATE SCHEMA IF NOT EXISTS bedrock_integration;")
            
            # Drop existing table for clean slate
            cur.execute("DROP TABLE IF EXISTS bedrock_integration.product_catalog CASCADE;")
            
            # Create table with optimized data types
            cur.execute("""
                CREATE TABLE bedrock_integration.product_catalog (
                    "productId" VARCHAR(255) PRIMARY KEY,
                    product_description TEXT NOT NULL,
                    imgurl TEXT,
                    producturl TEXT,
                    stars NUMERIC(3,2),
                    reviews INTEGER,
                    price NUMERIC(10,2),
                    category_id INTEGER,
                    isbestseller BOOLEAN DEFAULT FALSE,
                    boughtinlastmonth INTEGER,
                    category_name VARCHAR(255),
                    quantity INTEGER DEFAULT 0,
                    embedding vector(1024),
                    created_at TIMESTAMP DEFAULT NOW()
                );
            """)
            
            conn.commit()
            
    print("✅ Database schema created")

setup_database()

<div style="background: #e7f5ff; border-left: 4px solid #1971c2; padding: 12px; margin: 15px 0; border-radius: 4px;">
    <strong style="color: #000000;">✅ Checkpoint 3 of 5:</strong> <span style="color: #000000;">Schema created with pgvector extension. Next: Loading product data.</span>
</div>

## Load & Validate Product Data

In [None]:
# Load dataset
df = pd.read_csv('../data/amazon-products-sample.csv')

# Data quality checks
initial_count = len(df)
df = df.dropna(subset=['product_description'])
df = df.drop_duplicates(subset=['productId'])

# Fill missing values with sensible defaults
df = df.fillna({
    'stars': 0.0,
    'reviews': 0,
    'price': 0.0,
    'category_id': 0,
    'isbestseller': False,
    'boughtinlastmonth': 0,
    'category_name': 'Uncategorized',
    'quantity': 0,
    'imgUrl': '',
    'productURL': '',
    'isBestSeller': False,
    'boughtInLastMonth': 0,
    'producturl': ''
})

# Truncate long descriptions for embedding efficiency
df['product_description'] = df['product_description'].str[:2000]

print(f"📊 Data Quality Report:")
print(f"   Initial rows: {initial_count:,}")
print(f"   Valid products: {len(df):,}")
print(f"   Categories: {df['category_name'].nunique()}")
print(f"   Avg price: ${df['price'].mean():.2f}")
print(f"   Avg rating: {df['stars'].mean():.2f}/5.0")

df.head(3)

## Parallel Embedding Generation

Generate 1024-dimensional embeddings using Amazon Titan V2 with parallel processing:

<div style="background: #e7f5ff; border-left: 4px solid #1971c2; padding: 15px; margin: 20px 0; border-radius: 4px;">
    <div style="font-weight: bold; color: #000000; margin-bottom: 8px;">💡 Understanding Embeddings</div>
    <div style="color: #000000; line-height: 1.6;">
        Embeddings are numerical representations of text in high-dimensional space (1024 dimensions in our case). 
        Similar concepts are placed close together, enabling semantic search. For example:
        <ul style="margin: 8px 0;">
            <li>"wireless headphones" and "Bluetooth earbuds" → Similar vectors (close together)</li>
            <li>"headphones" and "refrigerator" → Dissimilar vectors (far apart)</li>
        </ul>
        This process takes ~3 minutes for 21,704 products using parallel processing.
    </div>
</div>
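
To make "close together" concrete, here is a small sketch of cosine similarity in plain Python. The three-dimensional vectors are hand-made for illustration and are not real Titan output; real embeddings in this lab have 1024 dimensions. The search query later in the notebook computes `1 - (embedding <=> query)`, which corresponds to this similarity because pgvector's `<=>` operator returns cosine distance.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors (assumption: NOT produced by an embedding model)
toy_vectors = {
    "wireless headphones": [0.9, 0.8, 0.1],
    "Bluetooth earbuds": [0.8, 0.9, 0.2],
    "refrigerator": [0.1, 0.2, 0.9],
}

anchor = toy_vectors["wireless headphones"]
for name, vec in toy_vectors.items():
    print(f"{name:20s} similarity={cosine_similarity(anchor, vec):.3f}")
```

With these toy values, the two audio products score close to 1.0 against each other, while the refrigerator scores much lower.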

In [None]:
def generate_embedding(text):
    """Generate a 1024-dimensional Titan V2 embedding; returns a zero vector on failure"""
    try:
        response = bedrock_runtime.invoke_model(
            body=json.dumps({
                # Cap input length (same 2,000-character cap applied to the DataFrame)
                'inputText': str(text)[:2000],
                'dimensions': 1024,
                'normalize': True
            }),
            modelId='amazon.titan-embed-text-v2:0',
            accept="application/json",
            contentType="application/json"
        )
        return json.loads(response['body'].read())['embedding']
    except Exception as e:
        print(f"⚠️  Embedding error: {str(e)[:50]}")
        # Zero-vector placeholder: cosine distance is undefined for it,
        # so rows that hit this path should be regenerated before searching
        return [0.0] * 1024

# Initialize parallel processing (10 workers; tune to your CPU count and Bedrock request quota)
pandarallel.initialize(progress_bar=True, nb_workers=10, verbose=0)

# Generate embeddings in parallel
print("🔄 Generating embeddings... (ETA: ~3 minutes)")
start = time.time()

df['embedding'] = df['product_description'].parallel_apply(generate_embedding)

elapsed = time.time() - start
print(f"\n✅ Embeddings generated in {elapsed:.1f}s ({len(df)/elapsed:.1f} products/sec)")
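
Because `generate_embedding` falls back to a zero vector when a call fails, it can help to count failed rows before inserting them. The check below is a minimal sketch, not part of the workshop code: `is_valid_embedding` is a hypothetical helper, and the sample vectors are stand-ins rather than real model output.

```python
import math

EXPECTED_DIM = 1024  # matches the vector(1024) column in the schema


def is_valid_embedding(vec, dim=EXPECTED_DIM):
    """Return True for a vector of the expected length with a non-zero norm."""
    if vec is None or len(vec) != dim:
        return False
    norm = math.sqrt(sum(x * x for x in vec))
    return norm > 0.0


# Stand-in data: two unit-length vectors and one zero-vector placeholder
good = [1.0 / math.sqrt(EXPECTED_DIM)] * EXPECTED_DIM
failed = [0.0] * EXPECTED_DIM
sample = [good, failed, good]

failed_count = sum(not is_valid_embedding(v) for v in sample)
print(f"{failed_count} of {len(sample)} embeddings need to be regenerated")
```

In the notebook, something like `df['embedding'].apply(is_valid_embedding)` would give a boolean mask of rows that are safe to insert.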

## Optimized Batch Insertion

Insert data using `executemany` with UPSERT for idempotency:

In [None]:
def batch_insert_products(df, batch_size=1000):
    """Optimized batch insertion with progress tracking"""
    start_time = time.time()
    
    with pool.connection() as conn:
        with conn.cursor() as cur:
            # Build one parameter tuple per product row (chunked into batches below)
            batches = []
            for _, row in df.iterrows():
                batches.append(tuple([
                    row['productId'],
                    row['product_description'],
                    row.get('imgUrl', row.get('imgurl', '')),
                    row.get('productURL', row.get('producturl', '')),
                    float(row['stars']),
                    int(row['reviews']),
                    float(row['price']),
                    int(row['category_id']),
                    bool(row.get('isBestSeller', row.get('isbestseller', False))),
                    int(row.get('boughtInLastMonth', row.get('boughtinlastmonth', 0))),
                    row['category_name'],
                    int(row['quantity']),
                    row['embedding']
                ]))
            
            # Execute batch insert with UPSERT
            insert_sql = """
                INSERT INTO bedrock_integration.product_catalog 
                ("productId", product_description, imgurl, producturl, stars, reviews, 
                 price, category_id, isbestseller, boughtinlastmonth, category_name, 
                 quantity, embedding)
                VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
                ON CONFLICT ("productId") DO UPDATE SET
                    product_description = EXCLUDED.product_description,
                    embedding = EXCLUDED.embedding
            """
            
            # Process in chunks for progress feedback
            with tqdm(total=len(batches), desc="Inserting rows") as pbar:
                for i in range(0, len(batches), batch_size):
                    chunk = batches[i:i+batch_size]
                    cur.executemany(insert_sql, chunk)
                    conn.commit()
                    pbar.update(len(chunk))
            
            # Verify insertion
            cur.execute("SELECT COUNT(*) FROM bedrock_integration.product_catalog")
            count = cur.fetchone()[0]
            
    elapsed = time.time() - start_time
    print(f"\n✅ Inserted {count:,} products in {elapsed:.1f}s ({count/elapsed:.0f} rows/sec)")
    return count

batch_insert_products(df)
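
The idempotency that the UPSERT provides can be seen in isolation with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL (an assumption made only so the example runs without a database server; SQLite accepts a similar `ON CONFLICT ... DO UPDATE` clause), and the two-column table and product rows are made up.

```python
import sqlite3

# In-memory SQLite database standing in for Aurora PostgreSQL
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_catalog ("
    "product_id TEXT PRIMARY KEY, product_description TEXT NOT NULL)"
)

upsert_sql = """
    INSERT INTO product_catalog (product_id, product_description)
    VALUES (?, ?)
    ON CONFLICT (product_id) DO UPDATE SET
        product_description = excluded.product_description
"""

first_run = [("B001", "Insulated water bottle"), ("B002", "Wireless earbuds")]
second_run = [("B001", "Insulated steel water bottle"), ("B002", "Wireless earbuds")]

# Running the load twice updates existing rows instead of duplicating them
conn.executemany(upsert_sql, first_run)
conn.executemany(upsert_sql, second_run)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM product_catalog").fetchone()[0]
description = conn.execute(
    "SELECT product_description FROM product_catalog WHERE product_id = ?",
    ("B001",),
).fetchone()[0]
print(count, description)
```

Note that the notebook's statement refreshes only `product_description` and `embedding` on conflict, so columns such as `price` keep their original values when the cell is rerun against an existing row.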

## Create Performance Indexes

Build HNSW vector index + supporting indexes for optimal query performance:

<div style="background: #e7f5ff; border-left: 4px solid #1971c2; padding: 15px; margin: 20px 0; border-radius: 4px;">
    <div style="font-weight: bold; color: #000000; margin-bottom: 8px;">💡 Why HNSW Index?</div>
    <div style="color: #000000; line-height: 1.6;">
        HNSW (Hierarchical Navigable Small World) is a graph-based algorithm that enables fast approximate nearest neighbor search:
        <ul style="margin: 8px 0;">
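            <li><strong>m=16:</strong> Maximum number of connections per node in each graph layer (higher = better recall, but slower builds and a larger index)</li>
            <li><strong>ef_construction=64:</strong> Size of the candidate list used while building the graph (higher = better recall, slower build)</li>
            <li><strong>Result:</strong> Query time grows far more slowly than catalog size; actual latency depends on hardware, dataset, and query settings</li>
        </ul>
        Without this index, every query would be compared against every product's vector (an exact sequential scan whose cost grows linearly with catalog size).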
    </div>
</div>
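
For contrast, the exact search that the index avoids can be sketched in a few lines: score every stored vector against the query and keep the best `k`. The catalog below is a made-up set of three-dimensional toy vectors, and `exact_top_k` is a hypothetical name used only for illustration.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 minus cosine similarity, the quantity an index on cosine ops orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(query, catalog, k=2):
    """Brute-force search: one distance computation per catalog entry."""
    return heapq.nsmallest(
        k, catalog.items(), key=lambda item: cosine_distance(query, item[1])
    )


# Toy catalog (assumption: hand-made vectors, not real embeddings)
catalog = {
    "cooler": [0.9, 0.1, 0.2],
    "insulated bottle": [0.8, 0.2, 0.1],
    "door lock": [0.1, 0.9, 0.3],
    "paperback novel": [0.2, 0.1, 0.9],
}
query = [0.85, 0.15, 0.15]

for name, vec in exact_top_k(query, catalog):
    print(name, round(cosine_distance(query, vec), 4))
```

This scan always returns the true nearest neighbors, but its work is proportional to the number of products. HNSW trades a small amount of recall for visiting only a fraction of the vectors.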

In [None]:
def create_indexes():
    """Create optimized indexes with timing"""
    with pool.connection() as conn:
        with conn.cursor() as cur:
            indexes = [
                ("HNSW Vector Index", """
                    CREATE INDEX IF NOT EXISTS idx_product_embedding_hnsw 
                    ON bedrock_integration.product_catalog 
                    USING hnsw (embedding vector_cosine_ops)
                    WITH (m = 16, ef_construction = 64)
                """),
                ("Full-Text Search (GIN)", """
                    CREATE INDEX IF NOT EXISTS idx_product_fts 
                    ON bedrock_integration.product_catalog
                    USING GIN (to_tsvector('english', product_description))
                """),
                ("Category B-Tree", """
                    CREATE INDEX IF NOT EXISTS idx_product_category 
                    ON bedrock_integration.product_catalog(category_name)
                """),
                ("Price Range", """
                    CREATE INDEX IF NOT EXISTS idx_product_price 
                    ON bedrock_integration.product_catalog(price) 
                    WHERE price > 0
                """)
            ]
            
            for name, sql in indexes:
                start = time.time()
                cur.execute(sql)
                elapsed = time.time() - start
                print(f"✅ {name}: {elapsed:.2f}s")
            
            conn.commit()
    
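    # Run VACUUM ANALYZE outside a transaction (VACUUM cannot run inside one)
    with pool.connection() as conn:
        conn.autocommit = True
        try:
            with conn.cursor() as cur:
                cur.execute("VACUUM ANALYZE bedrock_integration.product_catalog")
        finally:
            # Restore the default before the connection goes back to the pool
            conn.autocommit = False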
    
    print("\n✅ All indexes created and analyzed")

create_indexes()

<div style="background: #e7f5ff; border-left: 4px solid #1971c2; padding: 12px; margin: 15px 0; border-radius: 4px;">
    <strong style="color: #000000;">✅ Checkpoint 4 of 5:</strong> <span style="color: #000000;">Indexes optimized. Ready for interactive search!</span>
</div>

## Interactive Semantic Search Widget

**⚠️ Important**: Run this cell only once. If you see duplicate results, restart the kernel and run all cells again.

Explore the product catalog with real-time semantic search:

In [None]:
# Create interactive UI with side-by-side comparison
search_input = widgets.Text(
    value='something to keep my drinks cold',
    placeholder='Enter search query...',
    description='Search:',
    style={'description_width': '80px'},
    layout=widgets.Layout(width='70%')
)

limit_slider = widgets.IntSlider(
    value=5,
    min=1,
    max=10,
    step=1,
    description='Results:',
    style={'description_width': '80px'},
    layout=widgets.Layout(width='30%')
)

search_button = widgets.Button(
    description='🔍 Search',
    button_style='primary',
    layout=widgets.Layout(width='150px', height='40px')
)

loading_indicator = widgets.HTML(value="")

output = widgets.Output(
    layout=widgets.Layout(
        border='1px solid #ddd',
        padding='15px',
        margin='15px 0',
        min_height='200px'
    )
)

# Example query buttons - designed to show keyword search failures
example_queries = [
    "something to keep my drinks cold",
    "gift for someone who loves reading",
    "make my home more secure",
    "wireless noise cancelling headphones",
    "portable power for camping"
]

def create_example_button(query_text):
    """Create a button for example queries"""
    button = widgets.Button(
        description=query_text[:35] + '...' if len(query_text) > 35 else query_text,
        layout=widgets.Layout(width='auto', margin='2px'),
        style={'button_color': '#e9ecef'},
        tooltip=query_text
    )
    
    def on_click(b):
        search_input.value = query_text
    
    button.on_click(on_click)
    return button

example_buttons = [create_example_button(q) for q in example_queries]

def keyword_search_simulation(query, num_results=5):
    """Simulate traditional keyword search using PostgreSQL full-text search"""
    try:
        with pool.connection() as conn:
            with conn.cursor() as cur:
                cur.execute("""
                    SELECT 
                        "productId",
                        product_description,
                        imgurl,
                        producturl,
                        price,
                        stars,
                        reviews,
                        category_name,
                        ts_rank(to_tsvector('english', product_description), 
                               plainto_tsquery('english', %s)) as keyword_score
                    FROM bedrock_integration.product_catalog
                    WHERE to_tsvector('english', product_description) @@ plainto_tsquery('english', %s)
                    ORDER BY keyword_score DESC
                    LIMIT %s
                """, (query, query, num_results))
                
                return cur.fetchall()
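    except Exception as e:
        # Report the failure rather than silently presenting it as "no results"
        print(f"⚠️  Keyword search error: {e}")
        return []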

def perform_unified_search(b):
    """Run semantic and keyword search and render the results side by side"""
    loading_indicator.value = "<h4 style='color: #007bff; text-align: center;'>🔍 Searching...</h4>"
    
    with output:
        clear_output(wait=True)
        query = search_input.value
        limit = limit_slider.value
        
        if not query.strip():
            display(HTML("<p style='color: #dc3545;'>⚠️ Please enter a search query</p>"))
            loading_indicator.value = ""
            return
        
        try:
            # Generate query embedding for semantic search
            query_embedding = generate_embedding(query)
            
            # Execute semantic search
            with pool.connection() as conn:
                with conn.cursor() as cur:
                    start = time.time()
                    cur.execute("""
                        SELECT 
                            "productId",
                            product_description,
                            imgurl,
                            producturl,
                            price,
                            stars,
                            reviews,
                            category_name,
                            1 - (embedding <=> %s::vector) as similarity
                        FROM bedrock_integration.product_catalog
                        WHERE embedding IS NOT NULL
                        ORDER BY embedding <=> %s::vector
                        LIMIT %s
                    """, (query_embedding, query_embedding, limit))
                    
                    semantic_results = cur.fetchall()
                    semantic_latency = (time.time() - start) * 1000
            
            # Execute keyword search
            start = time.time()
            keyword_results = keyword_search_simulation(query, limit)
            keyword_latency = (time.time() - start) * 1000
            
            # Build enhanced HTML output with side-by-side comparison
            html_output = f"""
            <style>
                .search-container {{
                    font-family: 'Amazon Ember', Arial, sans-serif;
                    max-width: 1400px;
                }}
                .comparison-header {{
                    background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
                    color: white;
                    padding: 20px;
                    border-radius: 10px;
                    margin-bottom: 20px;
                    box-shadow: 0 4px 6px rgba(0,0,0,0.1);
                }}
                .search-stats {{
                    font-size: 0.95em;
                    opacity: 0.95;
                    margin-top: 10px;
                }}
                .comparison-grid {{
                    display: grid;
                    grid-template-columns: 1fr 1fr;
                    gap: 20px;
                    margin: 20px 0;
                }}
                .search-column {{
                    border: 3px solid;
                    border-radius: 12px;
                    padding: 0;
                    background: white;
                    box-shadow: 0 4px 12px rgba(0,0,0,0.1);
                }}
                .keyword-column {{
                    border-color: #ff6b6b;
                }}
                .semantic-column {{
                    border-color: #51cf66;
                }}
                .column-header {{
                    padding: 15px;
                    font-size: 1.2em;
                    font-weight: bold;
                    text-align: center;
                    color: white;
                    border-radius: 8px 8px 0 0;
                }}
                .keyword-header {{
                    background: #ff6b6b;
                }}
                .semantic-header {{
                    background: #51cf66;
                }}
                .column-subheader {{
                    font-size: 0.75em;
                    font-weight: normal;
                    opacity: 0.95;
                    margin-top: 4px;
                }}
                .results-container {{
                    padding: 15px;
                    min-height: 200px;
                }}
                .product-card {{
                    margin: 15px 0;
                    padding: 15px;
                    border: 1px solid #ddd;
                    border-radius: 10px;
                    box-shadow: 0 2px 8px rgba(0,0,0,0.08);
                    background: white;
                    transition: all 0.3s ease;
                }}
                .product-card:hover {{
                    transform: translateY(-3px);
                    box-shadow: 0 6px 16px rgba(0,0,0,0.15);
                    border-color: #ba68c8;
                }}
                .product-grid {{
                    display: grid;
                    grid-template-columns: 140px 1fr;
                    gap: 15px;
                    align-items: start;
                }}
                .product-image {{
                    width: 100%;
                    height: auto;
                    border-radius: 6px;
                    border: 1px solid #eee;
                }}
                .product-info {{
                    display: flex;
                    flex-direction: column;
                    gap: 8px;
                }}
                .product-title {{
                    font-size: 1em;
                    font-weight: 600;
                    color: #232f3e;
                    line-height: 1.4;
                    margin: 0 0 6px 0;
                }}
                .product-price {{
                    color: #B12704;
                    font-weight: bold;
                    font-size: 1.3em;
                }}
                .product-rating {{
                    display: flex;
                    align-items: center;
                    gap: 6px;
                    font-size: 0.9em;
                }}
                .stars {{
                    color: #FFA41C;
                    font-size: 1em;
                }}
                .reviews {{
                    color: #007185;
                }}
                .product-category {{
                    color: #565959;
                    font-size: 0.85em;
                    background: #f0f2f5;
                    padding: 3px 8px;
                    border-radius: 4px;
                    display: inline-block;
                }}
                .similarity-badge {{
                    background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
                    color: white;
                    padding: 4px 10px;
                    border-radius: 20px;
                    font-weight: bold;
                    font-size: 0.85em;
                    display: inline-block;
                }}
                .product-link {{
                    background: #ff9900;
                    color: white;
                    padding: 6px 12px;
                    border-radius: 6px;
                    text-decoration: none;
                    display: inline-block;
                    margin-top: 6px;
                    font-weight: 500;
                    font-size: 0.85em;
                    transition: background 0.3s ease;
                }}
                .product-link:hover {{
                    background: #e88b00;
                }}
                .rank-badge {{
                    background: #232f3e;
                    color: white;
                    padding: 3px 8px;
                    border-radius: 4px;
                    font-weight: bold;
                    font-size: 0.85em;
                    display: inline-block;
                    margin-bottom: 8px;
                }}
                .no-results {{
                    text-align: center;
                    padding: 60px 20px;
                    color: #666;
                }}
                .no-results-icon {{
                    font-size: 4em;
                    margin-bottom: 15px;
                    opacity: 0.3;
                }}
                .no-results-text {{
                    font-size: 1.1em;
                    font-weight: 500;
                    color: #444;
                    margin-bottom: 8px;
                }}
                .no-results-reason {{
                    font-size: 0.9em;
                    color: #666;
                    font-style: italic;
                }}
                .insight-box {{
                    background: #fff3cd;
                    border: 2px solid #ffc107;
                    border-radius: 8px;
                    padding: 15px;
                    margin: 20px 0;
                }}
                .insight-title {{
                    font-weight: bold;
                    color: #856404;
                    margin-bottom: 8px;
                    font-size: 1.05em;
                }}
            </style>
            
            <div class="search-container">
                <div class="comparison-header">
                    <h2 style="margin: 0 0 8px 0;">🔍 Search Comparison: Keyword vs. Semantic</h2>
                    <div class="search-stats">
                        Query: <strong>"{query}"</strong><br>
                        Keyword Results: <strong>{len(keyword_results)}</strong> in {keyword_latency:.1f}ms | 
                        Semantic Results: <strong>{len(semantic_results)}</strong> in {semantic_latency:.1f}ms
                    </div>
                </div>
                
                <div class="insight-box">
                    <div class="insight-title">💡 The Key Difference:</div>
                    <div style="color: #856404; line-height: 1.6;">
                        <div style="background: white; padding: 12px; border-radius: 6px; margin: 10px 0;">
                            <strong style="color: #c92a2a;">❌ Keyword Search (Red):</strong> 
                            <ul style="margin: 8px 0; padding-left: 25px;">
                                <li><strong>Conceptual queries often return nothing</strong> - try "something to keep my drinks cold" or "make my home more secure"</li>
                                <li><strong>Matches only when every query term (after stemming and stop-word removal) appears in a description</strong> - "wireless noise cancelling headphones" succeeds because product listings use those same words</li>
                                <li><strong>Depends on shared vocabulary</strong> - users must know the terms that product descriptions use</li>
                            </ul>
                        </div>
                        <div style="background: white; padding: 12px; border-radius: 6px; margin: 10px 0;">
                            <strong style="color: #2b8a3e;">✅ Semantic Search (Green):</strong>
                            <ul style="margin: 8px 0; padding-left: 25px;">
                                <li><strong>Handles every query style</strong> - natural language, technical terms, or conceptual requests</li>
                                <li><strong>Understands intent</strong> - "keep drinks cold" finds coolers, insulated bottles, ice packs</li>
                                <li><strong>No vocabulary guessing required</strong> - users can describe what they want, not what it's called</li>
                            </ul>
                        </div>
                        <div style="background: #fff5e6; padding: 10px; border-left: 4px solid #ff9900; margin: 10px 0;">
                            <strong>🎯 Key Insight:</strong> Notice how some queries work for both methods while others only work for semantic search. 
                            This shows keyword search isn't completely broken—it just forces users to speak "product language" instead of natural language.
                        </div>
                    </div>
                </div>
                
                <div class="comparison-grid">
                    <div class="search-column keyword-column">
                        <div class="column-header keyword-header">
                            🔤 Keyword Search
                            <div class="column-subheader">Traditional exact text matching</div>
                        </div>
                        <div class="results-container">
            """
            
            # Keyword search results
            if not keyword_results:
                html_output += """
                    <div class="no-results">
                        <div class="no-results-icon">❌</div>
                        <div class="no-results-text">No Results Found</div>
                        <div class="no-results-reason">Query words not found in product descriptions</div>
                    </div>
                """
            else:
                for i, row in enumerate(keyword_results, 1):
                    pid, desc, img_url, prod_url, price, stars, reviews, cat, score = row
                    
                    img_url = img_url or 'https://via.placeholder.com/140x140?text=No+Image'
                    prod_url = prod_url or '#'
                    price = price or 0.0
                    stars = stars or 0.0
                    reviews = reviews or 0
                    
                    full_stars = int(stars)
                    half_star = 1 if (stars - full_stars) >= 0.5 else 0
                    empty_stars = 5 - full_stars - half_star
                    star_display = '⭐' * full_stars + ('⭐' * half_star) + '☆' * empty_stars
                    
                    html_output += f"""
                    <div class="product-card">
                        <div class="rank-badge">#{i}</div>
                        <div class="product-grid">
                            <div>
                                <img src="{img_url}" class="product-image" alt="Product image">
                            </div>
                            <div class="product-info">
                                <div class="product-title">{desc[:120]}{'...' if len(desc) > 120 else ''}</div>
                                <div class="product-price">${price:.2f}</div>
                                <div class="product-rating">
                                    <span class="stars">{star_display}</span>
                                    <span style="color: #232f3e; font-weight: 500;">{stars:.1f}</span>
                                    <span class="reviews">({reviews:,})</span>
                                </div>
                                <div>
                                    <span class="product-category">📦 {cat}</span>
                                </div>
                                {f'<a href="{prod_url}" target="_blank" class="product-link">View on Amazon →</a>' if prod_url != '#' else ''}
                            </div>
                        </div>
                    </div>
                    """
            
            html_output += """
                        </div>
                    </div>
                    
                    <div class="search-column semantic-column">
                        <div class="column-header semantic-header">
                            🧠 Semantic Search
                            <div class="column-subheader">AI-powered meaning-based matching</div>
                        </div>
                        <div class="results-container">
            """
            
            # Semantic search results
            if not semantic_results:
                html_output += '<div class="no-results">No results found</div>'
            else:
                for i, row in enumerate(semantic_results, 1):
                    pid, desc, img_url, prod_url, price, stars, reviews, cat, sim = row
                    
                    img_url = img_url or 'https://via.placeholder.com/140x140?text=No+Image'
                    prod_url = prod_url or '#'
                    price = price or 0.0
                    stars = stars or 0.0
                    reviews = reviews or 0
                    
                    full_stars = int(stars)
                    half_star = 1 if (stars - full_stars) >= 0.5 else 0
                    empty_stars = 5 - full_stars - half_star
                    star_display = '⭐' * full_stars + ('⭐' * half_star) + '☆' * empty_stars
                    
                    similarity_pct = sim * 100 if sim else 0
                    
                    html_output += f"""
                    <div class="product-card">
                        <div class="rank-badge">#{i}</div>
                        <div class="product-grid">
                            <div>
                                <img src="{img_url}" class="product-image" alt="Product image">
                            </div>
                            <div class="product-info">
                                <div class="product-title">{desc[:120]}{'...' if len(desc) > 120 else ''}</div>
                                <div class="product-price">${price:.2f}</div>
                                <div class="product-rating">
                                    <span class="stars">{star_display}</span>
                                    <span style="color: #232f3e; font-weight: 500;">{stars:.1f}</span>
                                    <span class="reviews">({reviews:,})</span>
                                </div>
                                <div>
                                    <span class="product-category">📦 {cat}</span>
                                </div>
                                <div>
                                    <span class="similarity-badge">🎯 {similarity_pct:.1f}% Match</span>
                                </div>
                                {f'<a href="{prod_url}" target="_blank" class="product-link">View on Amazon →</a>' if prod_url != '#' else ''}
                            </div>
                        </div>
                    </div>
                    """
            
            html_output += """
                        </div>
                    </div>
                </div>
            </div>
            """
            
            display(HTML(html_output))
            
        except Exception as e:
            display(HTML(f"<p style='color: #dc3545;'>❌ Error: {str(e)}</p>"))
    
    loading_indicator.value = ""

search_button.on_click(perform_unified_search)

# Display unified interface
display(widgets.VBox([
    widgets.HTML("""
        <div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); 
                    padding: 20px; border-radius: 10px; margin-bottom: 20px; 
                    box-shadow: 0 4px 6px rgba(0,0,0,0.1);">
            <h2 style="margin: 0; color: white; text-align: center;">
                🔍 Interactive Search Comparison
            </h2>
            <p style="margin: 10px 0 0 0; color: rgba(255,255,255,0.9); text-align: center;">
                See the difference: Keyword vs. Semantic Search
            </p>
        </div>
    """),
    widgets.HTML("<h4 style='margin: 15px 0 8px 0;'>Try these queries (designed to show keyword search limitations):</h4>"),
    widgets.HBox(example_buttons, layout=widgets.Layout(flex_flow='row wrap')),
    widgets.HTML("<h4 style='margin: 20px 0 10px 0;'>Or enter your own search:</h4>"),
    widgets.HBox([search_input, limit_slider]),
    widgets.HBox([search_button], layout=widgets.Layout(justify_content='center', margin='10px 0')),
    loading_indicator,
    output
], layout=widgets.Layout(padding='15px')))

## 🎓 Optional Advanced Topics

<div style="background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%); padding: 20px; border-radius: 10px; color: white; margin: 20px 0; border: 3px solid #e91e63;">
    <h3 style="margin: 0 0 10px 0; color: white;">⚠️ OPTIONAL SECTION - Skip for Now</h3>
    <p style="margin: 0; font-size: 1em; line-height: 1.6; color: white;">
        <strong>Time Management:</strong> These advanced topics are <strong>NOT required</strong> for Lab 2.<br>
        <strong>Recommendation:</strong> Complete Lab 2 first, then return here if time permits.<br>
        <strong>Purpose:</strong> Deep dive into embedding visualization and threshold tuning for interested participants.
    </p>
</div>

---

The following sections provide advanced insights into vector embeddings and similarity search optimization. These are supplementary materials for deeper exploration.

---

### 1. Embedding Space Visualization: Understanding Cross-Category Similarity

This visualization projects the 1024-dimensional product embeddings down to 2D with t-SNE so you can see how they cluster. Products with similar meanings land close together, even when they belong to different categories, which is why semantic search can find relevant products across category boundaries.
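
To make the cross-category idea concrete before running t-SNE on real data, here is a minimal sketch using hand-made 3-dimensional vectors. The vectors, product names, and the `nearest_neighbors` helper are hypothetical illustrations, not values from the catalog or output from Titan; only the category names are borrowed from the query in the next cell.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings": name -> (category, vector).
# Real Titan embeddings have 1024 dimensions; 3 are used here for readability.
toy_products = {
    "security camera": ("Smart Home: Security Cameras and Systems", [0.9, 0.1, 0.2]),
    "voice assistant hub": ("Smart Home: Voice Assistants and Hubs", [0.8, 0.2, 0.3]),
    "insulated water bottle": ("Outdoor Recreation", [0.1, 0.9, 0.2]),
    "camping cooler": ("Outdoor Recreation", [0.2, 0.8, 0.1]),
    "kitchen ice bucket": ("Kitchen & Dining", [0.1, 0.8, 0.4]),
}

def nearest_neighbors(name, k=2):
    """Return the k most similar other products as (name, category, score)."""
    _, query_vec = toy_products[name]
    scored = [
        (other, category, cosine_similarity(query_vec, vec))
        for other, (category, vec) in toy_products.items()
        if other != name
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)[:k]

for other, category, score in nearest_neighbors("camping cooler"):
    print(f"{score:.3f}  {other}  ({category})")
```

With these toy vectors, the neighbors of the camping cooler include an item from a different category, because ranking depends only on vector direction and never consults the category label.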

In [None]:
def visualize_embedding_space():
    """Visualize product embeddings in 2D space using dimensionality reduction"""
    # Using sklearn.manifold.TSNE imported at notebook start
    
    print("🔍 Fetching product embeddings for visualization...")
    
    with pool.connection() as conn:
        # Get sample of products with embeddings from different categories
        query = """
            WITH category_samples AS (
                SELECT 
                    "productId",
                    product_description,
                    category_name,
                    embedding,
                    ROW_NUMBER() OVER (PARTITION BY category_name ORDER BY RANDOM()) as rn
                FROM bedrock_integration.product_catalog
                WHERE embedding IS NOT NULL 
                AND category_name IN (
                    'Smart Home: Security Cameras and Systems',
                    'Smart Home: Voice Assistants and Hubs',
                    'Kitchen & Dining',
                    'Outdoor Recreation',
                    'Hair Care Products'
                )
            )
            SELECT 
                "productId",
                product_description,
                category_name,
                embedding
            FROM category_samples
            WHERE rn <= 30
            ORDER BY category_name
        """
        df_viz = pd.read_sql_query(query, conn)
    
    print(f"✅ Loaded {len(df_viz)} products from {df_viz['category_name'].nunique()} categories")
    
    # Convert each embedding to a NumPy array (handles both string and array formats)
    def parse_embedding(emb):
        if isinstance(emb, str):
            # Text format such as "[0.1,0.2,...]": strip brackets and parse as floats
            return np.array([float(x) for x in emb.strip('[]').split(',')])
        # Lists and array-like values convert directly
        return np.array(emb)
    
    embeddings_matrix = np.array([parse_embedding(emb) for emb in df_viz['embedding']])
    
    print("🧮 Reducing 1024 dimensions to 2D using t-SNE (this may take a minute)...")
    tsne = TSNE(n_components=2, random_state=42, perplexity=min(30, len(df_viz)-1))
    embeddings_2d = tsne.fit_transform(embeddings_matrix)
    
    # Create interactive visualization with tooltips using matplotlib with mplcursors
    try:
        import mplcursors
        has_mplcursors = True
    except ImportError:
        has_mplcursors = False
    
    fig, ax = plt.subplots(figsize=(14, 10))
    
    # Color map for categories
    categories = df_viz['category_name'].unique()
    colors = plt.cm.Set3(np.linspace(0, 1, len(categories)))
    category_colors = dict(zip(categories, colors))
    
    # Plot each category and store scatter objects
    scatter_objects = []
    for category in categories:
        mask = df_viz['category_name'] == category
        scatter = ax.scatter(
            embeddings_2d[mask, 0],
            embeddings_2d[mask, 1],
            c=[category_colors[category]],
            label=category,
            s=100,
            alpha=0.7,
            edgecolors='black',
            linewidth=0.5
        )
        scatter_objects.append((scatter, df_viz[mask]))
    
    # Add interactive tooltips if mplcursors is available
    if has_mplcursors:
        for scatter, data in scatter_objects:
            cursor = mplcursors.cursor(scatter, hover=True)
            
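            # Bind this category's rows as a default argument; without it, every
            # callback would read the last `data` from the loop (late binding)
            @cursor.connect("add")
            def on_add(sel, data=data):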
                idx = sel.index
                row = data.iloc[idx]
                sel.annotation.set_text(
                    f"Category: {row['category_name']}\n"
                    f"Product: {row['product_description'][:60]}...\n"
                    f"💡 Tip: Products close together have similar meanings!"
                )
                sel.annotation.get_bbox_patch().set(fc="white", alpha=0.95)
                sel.annotation.set_fontsize(9)
    
    ax.set_xlabel('t-SNE Dimension 1', fontsize=12, fontweight='bold')
    ax.set_ylabel('t-SNE Dimension 2', fontsize=12, fontweight='bold')
    ax.set_title('Product Embedding Space Visualization (1024D → 2D)\nHover over points to see product details', 
                 fontsize=14, fontweight='bold', pad=20)
    ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=9)
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    if not has_mplcursors:
        print("\n💡 Tip: Install mplcursors for interactive tooltips: pip install mplcursors")
    
    # Analysis
    print("\n📊 Key Observations:")
    print("=" * 80)
    print("• Products cluster by semantic meaning, not just category")
    print("• Similar products from different categories can be close together")
    print("• This explains why semantic search finds relevant items across categories")
    print("• Dense clusters indicate strong semantic similarity within product types")
    print("\n💡 Why This Matters:")
    print("When a user searches for 'home automation', semantic search can find relevant")
    print("products in both 'Security Cameras' and 'Voice Assistants' categories because")
    print("their embeddings are nearby in this high-dimensional space!")
    
visualize_embedding_space()

### 2. Interactive Similarity Threshold Tuning: Finding Your Sweet Spot

Similarity thresholds control the quality vs. quantity tradeoff in search results. This interactive tool helps you understand how different thresholds affect result precision.
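
As a rough illustration of what the tool below computes, the following sketch splits a list of scored results at a threshold and counts how many survive at several candidate thresholds. The product names and scores are made up for illustration, and `split_by_threshold` and `sweep_thresholds` are hypothetical helper names, not part of the notebook.

```python
def split_by_threshold(scored_results, threshold):
    """Partition (name, similarity) pairs into accepted and rejected lists."""
    accepted = [(name, sim) for name, sim in scored_results if sim >= threshold]
    rejected = [(name, sim) for name, sim in scored_results if sim < threshold]
    return accepted, rejected

def sweep_thresholds(scored_results, thresholds):
    """Count how many results would be returned at each candidate threshold."""
    return {
        t: sum(1 for _, sim in scored_results if sim >= t)
        for t in thresholds
    }

# Hypothetical results for a query like "smart home devices" (scores are invented)
toy_results = [
    ("smart plug", 0.82),
    ("video doorbell", 0.74),
    ("smart bulb", 0.69),
    ("extension cord", 0.58),
    ("desk lamp", 0.47),
]

accepted, rejected = split_by_threshold(toy_results, 0.65)
print("Accepted:", [name for name, _ in accepted])
print("Rejected:", [name for name, _ in rejected])
print("Counts by threshold:", sweep_thresholds(toy_results, [0.5, 0.65, 0.8]))
```

Raising the threshold can only shrink the accepted list, so the count-versus-threshold curve never increases as the threshold grows.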

In [None]:
def interactive_threshold_tuning():
    """Interactive tool for understanding similarity threshold effects"""
    
    # Create widgets
    query_input = widgets.Text(
        value='smart home devices',
        placeholder='Enter search query...',
        description='Query:',
        style={'description_width': '100px'},
        layout=widgets.Layout(width='60%')
    )
    
    threshold_slider = widgets.FloatSlider(
        value=0.65,
        min=0.4,
        max=0.9,
        step=0.05,
        description='Threshold:',
        style={'description_width': '100px'},
        layout=widgets.Layout(width='60%'),
        readout_format='.2f'
    )
    
    max_results_slider = widgets.IntSlider(
        value=20,
        min=10,
        max=50,
        step=5,
        description='Max Results:',
        style={'description_width': '100px'},
        layout=widgets.Layout(width='60%')
    )
    
    analyze_button = widgets.Button(
        description='🔍 Analyze',
        button_style='primary',
        layout=widgets.Layout(width='150px')
    )
    
    output = widgets.Output(
        layout=widgets.Layout(
            border='1px solid #ddd',
            padding='15px',
            margin='15px 0'
        )
    )
    
    def analyze_threshold(b):
        with output:
            clear_output(wait=True)
            plt.close('all')  # Clear any previous plots
            query = query_input.value
            threshold = threshold_slider.value
            max_results = max_results_slider.value
            
            if not query.strip():
                print("⚠️ Please enter a query")
                return
            
            # Generate embedding
            query_embedding = generate_embedding(query)
            
            # Get results with similarity scores
            with pool.connection() as conn:
                with conn.cursor() as cur:
                    cur.execute("""
                        SELECT 
                            "productId",
                            product_description,
                            category_name,
                            price,
                            1 - (embedding <=> %s::vector) as similarity
                        FROM bedrock_integration.product_catalog
                        WHERE embedding IS NOT NULL
                        ORDER BY embedding <=> %s::vector
                        LIMIT %s
                    """, (query_embedding, query_embedding, max_results))
                    
                    results = cur.fetchall()
            
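            # Nothing to plot or summarize if the query returned no rows
            if not results:
                print("⚠️ No results returned for this query")
                return
            
            # Analyze results
            similarities = [r[4] for r in results]
            above_threshold = [s for s in similarities if s >= threshold]
            below_threshold = [s for s in similarities if s < threshold]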
            
            # Create single comprehensive visualization with detailed explanation
            fig = plt.figure(figsize=(16, 6))
            gs = fig.add_gridspec(1, 2, width_ratios=[1.2, 1], hspace=0.3)
            ax1 = fig.add_subplot(gs[0])
            ax2 = fig.add_subplot(gs[1])
            
            # LEFT: Similarity distribution with threshold line
            ax1.hist(similarities, bins=25, color='#ba68c8', alpha=0.7, edgecolor='black', linewidth=1.5)
            ax1.axvline(threshold, color='#d32f2f', linestyle='--', linewidth=3, 
                       label=f'Current Threshold: {threshold:.2f}', zorder=10)
            
            # Fill regions
            ylim = ax1.get_ylim()[1]
            ax1.fill_between([threshold, max(similarities)], 0, ylim, 
                            alpha=0.15, color='green', label=f'✅ Accepted ({len(above_threshold)})')
            ax1.fill_between([min(similarities), threshold], 0, ylim, 
                            alpha=0.15, color='red', label=f'❌ Rejected ({len(below_threshold)})')
            
            ax1.set_xlabel('Similarity Score (0 = unrelated, 1 = identical)', fontsize=11, fontweight='bold')
            ax1.set_ylabel('Number of Products', fontsize=11, fontweight='bold')
            ax1.set_title(f'Similarity Distribution: "{query}"\nProducts to the right of the line are returned', 
                         fontsize=12, fontweight='bold', pad=15)
            ax1.legend(loc='upper left', fontsize=10)
            ax1.grid(True, alpha=0.3, linestyle=':')
            
            # Add annotation explaining the distribution
            ax1.text(0.98, 0.97, 
                    f'💡 Higher threshold = Fewer but\nmore precise results',
                    transform=ax1.transAxes,
                    fontsize=9,
                    verticalalignment='top',
                    horizontalalignment='right',
                    bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))
            
            # RIGHT: Threshold impact curve
            thresholds = np.arange(0.4, 0.91, 0.05)
            result_counts = []
            
            for t in thresholds:
                count = sum(1 for s in similarities if s >= t)
                result_counts.append(count)
            
            ax2.plot(thresholds, result_counts, marker='o', linewidth=3, 
                    markersize=10, color='#6a1b9a', label='Result Count')
            ax2.axvline(threshold, color='#d32f2f', linestyle='--', linewidth=3,
                       label=f'Your Setting: {threshold:.2f}', zorder=10)
            ax2.axhline(len(above_threshold), color='#d32f2f', linestyle=':', 
                       linewidth=2, alpha=0.5)
            
            # Highlight the current point
            current_count = sum(1 for s in similarities if s >= threshold)
            ax2.scatter([threshold], [current_count], s=200, color='#d32f2f', 
                       zorder=11, edgecolors='black', linewidth=2)
            
            ax2.set_xlabel('Similarity Threshold', fontsize=11, fontweight='bold')
            ax2.set_ylabel('Number of Results', fontsize=11, fontweight='bold')
            ax2.set_title('Precision vs. Recall Tradeoff\nMove the slider and click Analyze to update', 
                         fontsize=12, fontweight='bold', pad=15)
            ax2.legend(loc='upper right', fontsize=10)
            ax2.grid(True, alpha=0.3, linestyle=':')
            
            # Add annotation
            ax2.text(0.02, 0.97, 
                    f'📊 At {threshold:.2f}:\nReturning {len(above_threshold)} results',
                    transform=ax2.transAxes,
                    fontsize=9,
                    verticalalignment='top',
                    bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.8))
            
            plt.tight_layout()
            plt.show()
            plt.close(fig)  # Explicitly close this figure
            
            # Print analysis
            print(f"\n📊 Analysis for threshold {threshold:.2f}:")
            print("=" * 80)
            print(f"✅ Results accepted: {len(above_threshold)} ({len(above_threshold)/len(results)*100:.1f}%)")
            print(f"❌ Results rejected: {len(below_threshold)} ({len(below_threshold)/len(results)*100:.1f}%)")
            
            if above_threshold:
                print(f"\n📈 Accepted Results Statistics:")
                print(f"   • Highest similarity: {max(above_threshold):.3f}")
                print(f"   • Lowest similarity: {min(above_threshold):.3f}")
                print(f"   • Average similarity: {np.mean(above_threshold):.3f}")
            
            # Recommendations
            print(f"\n💡 Threshold Recommendations:")
            print("=" * 80)
            
            if threshold < 0.5:
                print("⚠️  VERY LOW threshold (< 0.50): Extremely inclusive, many loosely related products")
                print("   Use for: Maximum exploration, brainstorming, discovery mode")
            elif threshold < 0.65:
                print("📊 LOW threshold (0.50-0.65): Inclusive, casts a wide net")
                print("   Use for: Broad searches, finding unexpected connections")
            elif threshold < 0.75:
                print("✅ MEDIUM threshold (0.65-0.75): Balanced precision and recall - RECOMMENDED")
                print("   Use for: General search, good default for most use cases")
            elif threshold < 0.85:
                print("🎯 HIGH threshold (0.75-0.85): Good precision, focused results")
                print("   Use for: When you want more relevant, focused matches")
            else:
                print("🔒 VERY HIGH threshold (≥ 0.85): Extremely strict, only very close matches")
                print("   Use for: Finding near-duplicates or highly specific matches")
            
            # Show top accepted and rejected items
            print(f"\n📋 Sample Results:")
            print("=" * 80)
            
            if above_threshold:
                print(f"\n✅ Top 3 ACCEPTED (≥ {threshold:.2f}):")
                accepted_items = [r for r in results if r[4] >= threshold]
                for i, (pid, desc, cat, price, sim) in enumerate(accepted_items[:3], 1):
                    print(f"   {i}. [{sim:.3f}] {desc[:80]}...")
                    # Treat a missing price as 0 so formatting does not fail
                    print(f"      ${(price or 0):.2f} | {cat}")
            
            if below_threshold:
                print(f"\n❌ First 3 REJECTED (< {threshold:.2f}):")
                rejected_items = [r for r in results if r[4] < threshold]
                for i, (pid, desc, cat, price, sim) in enumerate(rejected_items[:3], 1):
                    print(f"   {i}. [{sim:.3f}] {desc[:80]}...")
                    print(f"      ${(price or 0):.2f} | {cat}")
    
    analyze_button.on_click(analyze_threshold)
    
    # Display interface
    display(widgets.VBox([
        widgets.HTML("""
            <div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); 
                        padding: 15px; border-radius: 8px; margin-bottom: 15px;">
                <h3 style="margin: 0; color: white;">🎯 Interactive Threshold Tuning</h3>
                <p style="margin: 8px 0 0 0; color: rgba(255,255,255,0.9); font-size: 0.9em;">
                    Adjust the threshold and click Analyze to see how it affects result precision and recall
                </p>
            </div>
        """),
        query_input,
        threshold_slider,
        max_results_slider,
        widgets.HBox([analyze_button], layout=widgets.Layout(justify_content='center', margin='10px 0')),
        output
    ], layout=widgets.Layout(padding='15px')))

interactive_threshold_tuning()

## 🎉 Congratulations! You've Built a Semantic Search System

<div style="background: #d3f9d8; border: 2px solid #51cf66; border-radius: 10px; padding: 20px; margin: 20px 0;">
    <h3 style="margin: 0 0 10px 0; color: #000000;">✅ What You Accomplished</h3>
    <ul style="margin: 0; line-height: 1.8; color: #000000;">
        <li>Loaded and validated 21,704 products</li>
        <li>Generated 1024-dimensional embeddings using Amazon Bedrock Titan V2</li>
        <li>Created optimized HNSW vector index for fast similarity search</li>
        <li>Built an interactive search interface comparing keyword vs. semantic search</li>
        <li>Optionally explored advanced topics: embedding visualization and threshold tuning</li>
    </ul>
</div>

### 🎯 Key Takeaways

1. **Semantic Search is Powerful**: Unlike keyword search, it understands intent and finds relevant results even with different vocabulary
2. **Vector Embeddings Enable AI**: Converting text to vectors allows mathematical similarity comparisons
3. **pgvector is Production-Ready**: With proper indexing (HNSW), vector search scales to millions of items
4. **Amazon Bedrock Simplifies AI**: No model training or hosting required - just API calls
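
For reference, the similarity scores in this lab come from pgvector's `<=>` operator, which returns cosine distance; the queries convert it with `1 - (embedding <=> query)`. Written out for a query vector $q$ and a product embedding $p$ (symbols chosen here for illustration):

$$
\text{similarity}(q, p) \;=\; 1 - d_{\cos}(q, p) \;=\; \frac{q \cdot p}{\lVert q \rVert \, \lVert p \rVert}
$$

A score of 1 means the two vectors point in the same direction, and lower scores mean the texts are less related, which is why the threshold tuning tool keeps only rows at or above the chosen cutoff.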

### 🚀 What's Next?

Now that you have a semantic search foundation, you can:

1. **Proceed to Lab 2**: Build on this with advanced filtering and hybrid search
2. **Experiment Further**: Try different queries in the interactive widget above
3. **Explore Advanced Topics**: Dive into embedding visualization and threshold tuning (optional sections)
4. **Consider Production**: This architecture scales to real-world e-commerce applications

### 💡 Real-World Applications

This same pattern powers:
- **E-commerce**: Product discovery and recommendations
- **Customer Support**: Finding relevant documentation and articles
- **Content Platforms**: Personalized content recommendations
- **Enterprise Search**: Finding documents across large repositories

### 📚 Additional Resources

- [Amazon Bedrock Documentation](https://docs.aws.amazon.com/bedrock/)
- [pgvector GitHub](https://github.com/pgvector/pgvector)
- [Aurora PostgreSQL](https://aws.amazon.com/rds/aurora/)

---

**Ready for more?** Continue to Lab 2 to build agentic AI applications with reasoning and tool use!

In [None]:
# Note: Pool will remain open for continued use
# If you need to close it, uncomment the line below:
# pool.close()

print("✅ Notebook complete! Pool remains open for continued exploration.")