# Part 3: Building AI-Powered Hybrid Product Search with pgvector and Amazon Bedrock
### Building and Validating Hybrid Search

Welcome to Part 3 of our workshop on building an AI-powered hybrid search system. In this section, we'll bring together Amazon Bedrock's embedding capabilities and Aurora PostgreSQL's pgvector extension to create a context-aware product search engine. The hybrid search implementation combines semantic (vector-based) search with keyword (full-text) search: it retrieves products that are semantically related to the query as well as products whose descriptions contain the query terms, then reranks the merged candidates, which can improve both precision and recall.

## Contents
1. Basic Semantic Search: Learn how to implement pure vector similarity search
2. Hybrid Search: Combine semantic search with keyword-based search
3. Advanced Search with Filters: Combine semantic search with traditional database filters
4. Example Queries and Testing: Explore real-world applications and test the system

## Hybrid Search Implementation
Our implementation brings together several key components to create an intuitive search experience:

1. **Embedding Generation**: We use Amazon Bedrock's Titan model to convert text queries into high-dimensional vectors that capture semantic meaning. These embeddings allow us to find products based on contextual similarity.

2. **Vector Similarity Search**: Using pgvector's specialized operators, we can efficiently find the closest matching products in our database. The `<=>` operator computes the cosine distance between vectors, so ordering by it in ascending order returns the closest products first, and `1 - distance` gives the cosine similarity we display as a match score.

3. **Keyword-Based Search**: We use PostgreSQL full-text search (`to_tsvector` and `plainto_tsquery`) to find products whose descriptions contain the query terms, and rank the matches with `ts_rank_cd`.

4. **Retrieval and Reranking**: The results of the semantic and keyword-based searches are merged and deduplicated. The merged set is then reranked with the Cohere Rerank model on Amazon Bedrock so that the most relevant products appear first.

5. **Interactive Interface**: We've created a user-friendly interface with both basic and advanced search capabilities:
   - Basic Search: Simple query input with adjustable number of results
   - Advanced Search: Additional filters for price, ratings, and categories
   - Example Queries: Quick-access buttons to demonstrate various search scenarios
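
The relationship between the `<=>` operator in item 2 and the similarity score shown in the results can be illustrated without a database. The following is a minimal sketch in plain Python; the helper names and the sample vectors are illustrative assumptions, not part of the workshop code.

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def cosine_similarity(a, b):
    """Mirrors the SQL expression 1 - (embedding <=> query) used for the match score."""
    return 1.0 - cosine_distance(a, b)

# Hypothetical two-dimensional vectors, chosen only to make the arithmetic obvious
query_vec = [1.0, 0.0]
same_direction = [2.0, 0.0]   # same direction, different length
orthogonal = [0.0, 3.0]       # unrelated direction

print(cosine_distance(query_vec, same_direction))
print(cosine_similarity(query_vec, orthogonal))
```

Because cosine distance ignores vector length, the first pair has distance 0 even though the magnitudes differ, which is why sorting by `<=>` in ascending order puts the best matches first.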

## Implementation Details
Our search interface combines several sophisticated features:

1. **Search Modes**:
   - Semantic and hybrid modes for quick, straightforward searches
   - Advanced mode with filters for refined product discovery

2. **Real-time Feedback**:
   - Loading indicators during searches
   - Clear result displays with product details
   - Similarity scores to show match relevance

3. **Enhanced User Experience**:
   - Hover effects on product cards
   - Star ratings visualization
   - Price and category highlighting

## Results Display
The search results are presented in an easy-to-scan format, with each product card showing:
- Product image and description
- Price and rating information
- Number of reviews
- Category classification
- Semantic match score
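
As a rough illustration of how one such card could be assembled, here is a minimal sketch that escapes text fields with the standard library's `html.escape` before placing them in markup. The function name `render_product_card`, its parameters, and the sample product are hypothetical; the interface code later in this notebook builds its cards inline instead.

```python
import html

def render_product_card(description, price, stars, reviews, category, max_len=200):
    """Return an HTML snippet for one product, escaping all free-text fields."""
    # Add an ellipsis only when the description was actually cut short
    truncated = description[:max_len] + ("..." if len(description) > max_len else "")
    # One star icon per whole star, matching the notebook's rating display
    star_icons = "⭐" * int(float(stars)) if stars else ""
    return (
        '<div class="product-card">'
        f"<h3>{html.escape(truncated)}</h3>"
        f'<div class="product-price">${float(price):.2f}</div>'
        f'<div class="product-stars">{star_icons}</div>'
        f'<div class="product-reviews">({int(reviews)} reviews)</div>'
        f'<div class="product-category">Category: {html.escape(category)}</div>'
        "</div>"
    )

# Hypothetical product used only to exercise the function
card = render_product_card("Tent <4-person> & tarp", 89.5, 4.5, 120, "Outdoor Recreation")
print(card)
```

Escaping matters here because product descriptions come from catalog data and may contain characters such as `<` or `&` that would otherwise be interpreted as markup.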

The interface updates dynamically as users:
- Switch between basic and advanced search
- Adjust filter parameters
- Try different example queries
- Explore search results

In [None]:
# Install Required Libraries
%pip install setuptools==65.5.0
%pip install "psycopg[binary]" pgvector pandarallel boto3 tqdm numpy ipywidgets cohere

# Import Libraries and Set Up Connections
import boto3
import json
import psycopg
from pgvector.psycopg import register_vector
from IPython.display import display, HTML, clear_output
import ipywidgets as widgets
from tqdm.notebook import tqdm

# Initialize AWS and database connections
client = boto3.client('secretsmanager')
response = client.get_secret_value(SecretId='apgpg-pgvector-secret')
database_secrets = json.loads(response['SecretString'])

dbhost = database_secrets['host']
dbport = database_secrets['port']
dbuser = database_secrets['username']
dbpass = database_secrets['password']

# Initialize Bedrock client
bedrock_runtime = boto3.client('bedrock-runtime')

In [None]:
def generate_embedding(text):
    """Generate embedding for a single text using Amazon Titan"""
    try:
        payload = json.dumps({'inputText': text})
        response = bedrock_runtime.invoke_model(
            body=payload,
            modelId='amazon.titan-embed-text-v2:0',
            accept="application/json",
            contentType="application/json"
        )
        response_body = json.loads(response.get("body").read())
        return response_body.get("embedding")
    except Exception as e:
        print(f"Error generating embedding: {str(e)}")
        return None
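
Since `generate_embedding` returns `None` on failure, callers may want to check the result before passing it to SQL. The sketch below shows one possible guard. The helper name `require_embedding` is hypothetical, and the default of 1024 dimensions is an assumption based on the default output size of Titan Text Embeddings V2; it must match the dimension of the `embedding` column in your table.

```python
def require_embedding(embedding, expected_dim=1024):
    """Return the embedding as a list of floats, or raise ValueError.

    expected_dim=1024 is an assumption; adjust it to your vector column.
    """
    if embedding is None:
        raise ValueError("embedding generation failed; no vector to search with")
    if len(embedding) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(embedding)}")
    return [float(x) for x in embedding]

# Stand-in vector; in the notebook this would come from generate_embedding(query)
vector = require_embedding([0.1] * 1024)
print(len(vector))
```

Failing early with a clear message is usually easier to diagnose than a database error raised when a `NULL` or wrongly sized vector reaches the `::vector` cast.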

In [None]:
def semantic_search(conn, query, num_results=50):
    """Basic semantic search for products, ordered by cosine distance"""
    query_embedding = generate_embedding(query)
    if query_embedding is None:
        # generate_embedding() has already reported the error
        return []
    sql_query = """
        SELECT 
            \"productId\",
            product_description,
            category_name,
            imgUrl,
            stars,
            reviews,
            price,
            1 - (embedding <=> %s::vector) as similarity
        FROM bedrock_integration.product_catalog
        ORDER BY embedding <=> %s::vector
        LIMIT %s;"""
    # The context manager closes the cursor even if execute() raises
    with conn.cursor() as cursor:
        cursor.execute(sql_query, (query_embedding, query_embedding, num_results))
        semantic_search_results = cursor.fetchall()
    return semantic_search_results

In [None]:
def fulltext_search(conn, query, num_results=50):
    """Keyword search using PostgreSQL full-text search, ranked by ts_rank_cd"""
    cursor = conn.cursor()
    sql_query = """
            select \"productId\",
            product_description,
            category_name,
            imgUrl,
            stars,
            reviews,
            price
    from bedrock_integration.product_catalog
    WHERE to_tsvector('english', coalesce(product_description, '')) @@ plainto_tsquery('english', %s)
    ORDER BY ts_rank_cd(to_tsvector('english', coalesce(product_description, '')), plainto_tsquery('english', %s))
    DESC LIMIT %s;"""
    cursor.execute(sql_query, (query, query, num_results))
    keyword_search_results = cursor.fetchall()
    cursor.close()

    return keyword_search_results


In [None]:
def hybrid_search(query, conn, num_results=50):
    """Merge semantic and keyword results, deduplicate them, and rerank with Cohere"""
    try:
        modelId = "cohere.rerank-v3-5:0"
        model_package_arn = f"arn:aws:bedrock:us-west-2::foundation-model/{modelId}"

        sem_search = semantic_search(conn, query)
        key_search = fulltext_search(conn, query)

        # Both queries return productId, product_description, category_name,
        # imgUrl, stars, reviews and price in positions 0-6. semantic_search
        # adds a similarity column at position 7, which the reranker does not need.
        combined_search_result = []
        for i in list(sem_search) + list(key_search):
            combined_search_result.append({
                "type": "INLINE",
                "inlineDocumentSource": {
                    "type": "JSON",
                    "jsonDocument": {
                        "productId": i[0],
                        "imgUrl": i[3],
                        "category_name": i[2],
                        "product_description": i[1],
                        "stars": str(i[4]),
                        "reviews": i[5],
                        "price": str(i[6])
                    }
                }
            })

        # Drop products that were returned by both searches
        seen = set()
        combined_search_result_unique = []
        for d in combined_search_result:
            hashable_dict = frozenset(d['inlineDocumentSource']['jsonDocument'].items())
            if hashable_dict not in seen:
                seen.add(hashable_dict)
                combined_search_result_unique.append(d)

        # Nothing to rerank if neither search returned a product
        if not combined_search_result_unique:
            return [], []

        # Never request more results than there are documents to rerank
        top_n = min(num_results, len(combined_search_result_unique))
        rerank_output = rerank_results(query, combined_search_result_unique, top_n, model_package_arn)
        return rerank_output, combined_search_result_unique

    except Exception as err:
        print("Error in hybrid search: " + str(err))
        return [], []
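
`hybrid_search` removes duplicates by comparing every field of each document. An alternative, sketched below, is to key on the product identifier alone. This assumes that `productId` (the first column of each row) uniquely identifies a product; the helper name `dedupe_by_product_id` and the sample rows are hypothetical.

```python
def dedupe_by_product_id(rows):
    """Keep the first occurrence of each productId, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        product_id = row[0]  # assumption: productId is the first column
        if product_id not in seen:
            seen.add(product_id)
            unique.append(row)
    return unique

# Hypothetical rows: one product appears in both result sets
semantic_rows = [("B001", "USB-C charger"), ("B002", "Phone case")]
keyword_rows = [("B002", "Phone case"), ("B003", "Charging cable")]

merged = dedupe_by_product_id(semantic_rows + keyword_rows)
print([row[0] for row in merged])
```

Keying on the identifier still treats two rows as the same product when a non-key field differs slightly between the two queries, for example because of formatting, which a full-field comparison would not.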

In [None]:
## Cohere re-ranking model to rerank the combined result and generate a ranking of the most relevant documents for the query.
## modelId = "cohere.rerank-v3-5:0"

def rerank_results(query, combined_search_result, num_results, model_package_arn):
    bedrock_agent_runtime = boto3.client('bedrock-agent-runtime',region_name='us-west-2')
    response = bedrock_agent_runtime.rerank(
        queries=[
            {
                "type": "TEXT",
                "textQuery": {
                    "text": query
                }
            }
        ],
        sources=combined_search_result,
        rerankingConfiguration={
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                "numberOfResults": num_results,
                "modelConfiguration": {
                    "modelArn": model_package_arn
                }
            }
        }
    )
    return response['results']
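
Each entry returned by `rerank_results` refers to its source document by position through the `index` field, and carries a `relevanceScore`. The following sketch shows how the two lists can be joined back together, using hand-written stand-in data shaped like the values `hybrid_search` returns. The helper name `attach_scores`, the product IDs, and the scores are illustrative assumptions, not real model output.

```python
def attach_scores(rerank_output, source_documents):
    """Pair each reranked entry with the product document it points to."""
    ranked = []
    for entry in rerank_output:
        doc = source_documents[entry["index"]]
        product = doc["inlineDocumentSource"]["jsonDocument"]
        ranked.append((product, entry.get("relevanceScore", 0)))
    return ranked

# Hypothetical source documents in the shape built by hybrid_search
sources = [
    {"type": "INLINE", "inlineDocumentSource": {"type": "JSON",
        "jsonDocument": {"productId": "B001", "product_description": "Phone case"}}},
    {"type": "INLINE", "inlineDocumentSource": {"type": "JSON",
        "jsonDocument": {"productId": "B002", "product_description": "USB-C charger"}}},
]
# Hypothetical reranker output: the second document is ranked first
rerank_output = [
    {"index": 1, "relevanceScore": 0.91},
    {"index": 0, "relevanceScore": 0.12},
]

ranked = attach_scores(rerank_output, sources)
for product, score in ranked:
    print(product["productId"], score)
```

The display code later in this notebook performs the same lookup inline when it builds the hybrid result cards.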

In [None]:
def advanced_search(query, category=None, max_price=None, min_stars=None, num_results=3):
    """Advanced search with multiple filters"""
    query_embedding = generate_embedding(query)

    conn = psycopg.connect(
        host=dbhost,
        port=dbport,
        user=dbuser,
        password=dbpass,
        autocommit=True
    )

    register_vector(conn)

    sql = """
        SELECT 
            \"productId\",
            product_description,
            imgUrl,
            stars,
            reviews,
            price,
            category_name,
            1 - (embedding <=> %s::vector) as similarity
        FROM bedrock_integration.product_catalog
        WHERE 1=1
    """
    params = [query_embedding]

    if category and category != 'All Categories':
        sql += " AND category_name = %s"
        params.append(category)
    if max_price:
        sql += " AND price <= %s"
        params.append(max_price)
    if min_stars:
        sql += " AND stars >= %s"
        params.append(min_stars)

    sql += """
        ORDER BY embedding <=> %s::vector
        LIMIT %s;
    """
    params.extend([query_embedding, num_results])

    results = conn.execute(sql, params).fetchall()
    conn.close()
    return results
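
The way `advanced_search` grows its SQL string and parameter list together can be isolated and tested without a database. The sketch below is one possible refactoring, with a hypothetical helper name `build_filters`. Unlike the truthiness checks above, it uses `is not None`, so a bound of `0` is treated as a real filter; that is a design choice of this sketch rather than the notebook's behavior.

```python
def build_filters(category=None, max_price=None, min_stars=None):
    """Return a WHERE fragment with %s placeholders and the matching parameters."""
    clauses, params = [], []
    if category and category != "All Categories":
        clauses.append("category_name = %s")
        params.append(category)
    if max_price is not None:
        clauses.append("price <= %s")
        params.append(max_price)
    if min_stars is not None:
        clauses.append("stars >= %s")
        params.append(min_stars)
    where_sql = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    return where_sql, params

where_sql, params = build_filters(category="Gift Cards", max_price=0)
print(where_sql)
print(params)
```

Values are always passed as parameters and never interpolated into the SQL text, which keeps user-supplied filter values from being interpreted as SQL.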

In [None]:
def search_products(query, num_results=5):
    """Basic semantic search for products"""
    query_embedding = generate_embedding(query)

    conn = psycopg.connect(
        host=dbhost,
        port=dbport,
        user=dbuser,
        password=dbpass,
        autocommit=True
    )

    register_vector(conn)

    results = conn.execute("""
        SELECT 
            \"productId\",
            product_description,
            imgUrl,
            stars,
            reviews,
            price,
            category_name,
            1 - (embedding <=> %s::vector) as similarity
        FROM bedrock_integration.product_catalog
        ORDER BY embedding <=> %s::vector
        LIMIT %s;
    """, (query_embedding, query_embedding, num_results)).fetchall()

    conn.close()
    return results

In [None]:
def create_search_interface():
    """Create and display the interactive search interface"""
    # Create results area for displaying search results
    results_area = widgets.Output(
        layout=widgets.Layout(
            border='1px solid #ddd',
            padding='10px',
            margin='10px 0',
            min_height='100px'
        )
    )

    # Create search widgets for basic search
    basic_search_text = widgets.Text(
        value='',
        placeholder='Enter your search query...',
        description='Search:',
        layout=widgets.Layout(width='80%')
    )

    basic_results_slider = widgets.IntSlider(
        value=3,
        min=1,
        max=10,
        step=1,
        description='Results:',
        continuous_update=False
    )

    # Create widgets for advanced search
    advanced_search_text = widgets.Text(
        value='',
        placeholder='Enter your search query...',
        description='Search:',
        layout=widgets.Layout(width='80%')
    )

    category_dropdown = widgets.Dropdown(
        options=['All Categories', 
                'Smart Home: Security Cameras and Systems',
                'Smart Home: Voice Assistants and Hubs', 
                'Household Supplies',
                'Kitchen & Dining', 
                'Outdoor Recreation', 
                'Hair Care Products',
                'Gift Cards', 
                'Skin Care Products'],
        value='All Categories',
        description='Category:'
    )

    max_price_slider = widgets.FloatSlider(
        value=100,
        min=0,
        max=200,
        step=5,
        description='Max Price:$',
        continuous_update=False
    )

    min_stars_slider = widgets.FloatSlider(
        value=3.0,
        min=0,
        max=5,
        step=0.5,
        description='Min Stars:',
        continuous_update=False
    )

    advanced_results_slider = widgets.IntSlider(
        value=3,
        min=1,
        max=10,
        step=1,
        description='Results:',
        continuous_update=False
    )

    # Create search tabs
    basic_search_box = widgets.VBox([
        widgets.HTML(value="<h3>Basic Search</h3>"),
        basic_search_text,
        basic_results_slider
    ])

    hybrid_search_box = widgets.VBox([
        widgets.HTML(value="<h3>Hybrid Search</h3>"),
        basic_search_text,
        basic_results_slider
    ])

    advanced_search_box = widgets.VBox([
        widgets.HTML(value="<h3>Advanced Search</h3>"),
        advanced_search_text,
        category_dropdown,
        max_price_slider,
        min_stars_slider,
        advanced_results_slider
    ])

    search_type_tabs = widgets.Tab(children=[basic_search_box, hybrid_search_box,advanced_search_box])
    search_type_tabs.set_title(0, 'Semantic Search')
    search_type_tabs.set_title(1, 'Hybrid Search')
    search_type_tabs.set_title(2, 'Advanced Search')

    # Create search button and loading indicator
    search_button = widgets.Button(
        description='Search',
        button_style='primary',
        tooltip='Click to search',
        layout=widgets.Layout(width='150px')
    )

    loading_indicator = widgets.HTML(value="")

    def display_hybrid_results(hybrid_search_results):
        """Display hybrid search results from Cohere reranking"""
        results_area.clear_output()
        
        with results_area:
            html_output = """
            <style>
                .search-results {
                    margin-top: 20px;
                    padding: 10px;
                }
                .product-card { 
                    margin: 15px 0; 
                    padding: 20px; 
                    border: 1px solid #ddd; 
                    border-radius: 8px; 
                    box-shadow: 0 2px 4px rgba(0,0,0,0.1);
                    transition: transform 0.2s ease-in-out;
                    background-color: white;
                }
                .product-card:hover {
                    transform: translateY(-5px);
                    box-shadow: 0 4px 8px rgba(0,0,0,0.2);
                }
                .product-grid { 
                    display: grid; 
                    grid-template-columns: 200px 1fr; 
                    gap: 20px; 
                }
                .product-info {
                    display: flex;
                    flex-direction: column;
                    gap: 8px;
                }
                .product-price { 
                    color: #B12704; 
                    font-weight: bold; 
                    font-size: 1.2em; 
                }
                .product-stars { color: #FFA41C; }
                .product-reviews { color: #007185; }
                .product-category { 
                    color: #565959; 
                    font-size: 0.9em;
                }
                .rerank-score { 
                    color: #007600; 
                    font-weight: bold;
                    background: #f0f8f0;
                    padding: 5px 10px;
                    border-radius: 4px;
                    display: inline-block;
                }
                .results-header {
                    color: #444;
                    margin-bottom: 20px;
                    padding-bottom: 10px;
                    border-bottom: 2px solid #eee;
                }
            </style>
            <div class="search-results">
                <h3 class="results-header">Hybrid Search Results</h3>
            """
            
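            # hybrid_search returns (rerank_output, source_documents); the tuple
            # itself is never empty, so check the reranked list as well
            if not hybrid_search_results or not hybrid_search_results[0]: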
                html_output += "<p>No results found.</p>"
            else:
                matching_indices = [result["index"] for result in hybrid_search_results[0]]
                relevance_scores = [result.get("relevanceScore", 0) for result in hybrid_search_results[0]]
                matching_documents = [hybrid_search_results[1][i] for i in matching_indices]
                for i, doc in enumerate(matching_documents):
                    if isinstance(doc, str) and doc.startswith("{") and doc.endswith("}"):
                        doc = json.loads(doc)
                    product = doc['inlineDocumentSource']['jsonDocument']
                    img_url = product['imgUrl'].split("|")[0]
                    relevance = relevance_scores[i]
                    stars = "⭐" * int(float(product['stars'])) if product['stars'] else ""
                    html_output += f"""
                    <div class="product-card">
                        <div class="product-grid">
                            <div>
                                <img src="{img_url}" style="max-width: 180px; height: auto;">
                            </div>
                            <div class="product-info">
                                <h3>{product['product_description'][:200]}{'...' if len(product['product_description']) > 200 else ''}</h3>
                                <div class="product-price">Price:${product['price']}</div>
                                <div class="product-stars">{stars}</div>
                                <div class="product-reviews">({product['reviews']} reviews)</div>
                                <div class="product-category">Category:{product['category_name']}</div>
                                <div class="rerank-score">Relevance Score: {relevance:.2%}</div>
                            </div>
                        </div>
                    </div>
                    """
            html_output += "</div>"
            display(HTML(html_output))
    
    def display_results(results):
        """Display search results with enhanced styling"""
        results_area.clear_output()

        with results_area:
            html_output = """
            <style>
                .search-results {
                    margin-top: 20px;
                    padding: 10px;
                }
                .product-card { 
                    margin: 15px 0; 
                    padding: 20px; 
                    border: 1px solid #ddd; 
                    border-radius: 8px; 
                    box-shadow: 0 2px 4px rgba(0,0,0,0.1);
                    transition: transform 0.2s ease-in-out;
                    background-color: white;
                }
                .product-card:hover {
                    transform: translateY(-5px);
                    box-shadow: 0 4px 8px rgba(0,0,0,0.2);
                }
                .product-grid { 
                    display: grid; 
                    grid-template-columns: 200px 1fr; 
                    gap: 20px; 
                }
                .product-info {
                    display: flex;
                    flex-direction: column;
                    gap: 8px;
                }
                .product-price { 
                    color: #B12704; 
                    font-weight: bold; 
                    font-size: 1.2em; 
                }
                .product-stars { color: #FFA41C; }
                .product-reviews { color: #007185; }
                .product-category { 
                    color: #565959; 
                    font-size: 0.9em;
                }
                .similarity-score { 
                    color: #007600; 
                    font-weight: bold;
                    background: #f0f8f0;
                    padding: 5px 10px;
                    border-radius: 4px;
                    display: inline-block;
                }
                .results-header {
                    color: #444;
                    margin-bottom: 20px;
                    padding-bottom: 10px;
                    border-bottom: 2px solid #eee;
                }
            </style>
            <div class="search-results">
                <h3 class="results-header">Search Results</h3>
            """

            if not results:
                html_output += "<p>No results found.</p>"
            else:
                for row in results:
                    similarity = round((row[-1] or 0) * 100, 2)
                    stars = "⭐" * int(row[3]) if row[3] else ""

                    html_output += f"""
                    <div class="product-card">
                        <div class="product-grid">
                            <div>
                                <img src="{row[2]}" style="max-width: 180px; height: auto;">
                            </div>
                            <div class="product-info">
                                <h3>{row[1][:200]}{'...' if len(row[1]) > 200 else ''}</h3>
                                <div class="product-price">${row[5]:.2f}</div>
                                <div class="product-stars">{stars}</div>
                                <div class="product-reviews">({row[4]} reviews)</div>
                                <div class="product-category">Category: {row[6]}</div>
                                <div class="similarity-score">Match Score: {similarity}%</div>
                            </div>
                        </div>
                    </div>
                    """

            html_output += "</div>"
            display(HTML(html_output))

    def on_search_button_clicked(b):
        """Handle search button clicks"""
        loading_indicator.value = "<h4 style='color: #007bff'>🔍 Searching...</h4>"
        try:
            # Create database connection
            with psycopg.connect(
                host=dbhost,
                port=dbport,
                user=dbuser,
                password=dbpass,
                autocommit=True
            ) as conn:
                register_vector(conn)
                
                if search_type_tabs.selected_index == 1:
                    # Hybrid search: returns (rerank_output, source_documents).
                    # hybrid_search handles its own errors and returns ([], []).
                    results = hybrid_search(
                        basic_search_text.value,
                        conn,
                        basic_results_slider.value
                    )
                elif search_type_tabs.selected_index == 0:
                    # Semantic search
                    results = search_products(
                        basic_search_text.value,
                        basic_results_slider.value
                    )
                else:
                    # Advanced search
                    results = advanced_search(
                        advanced_search_text.value,
                        category=category_dropdown.value,
                        max_price=max_price_slider.value,
                        min_stars=min_stars_slider.value,
                        num_results=advanced_results_slider.value
                    )
            # Use the display function that matches the shape of the results
            if search_type_tabs.selected_index == 1:
                # Hybrid results carry reranker indices and relevance scores
                display_hybrid_results(results)
            else:
                # Semantic and advanced results are plain database rows
                display_results(results)
        except Exception as e:
            loading_indicator.value = f"<h4 style='color: #dc3545'>❌ Error: {str(e)}</h4>"
            return
        loading_indicator.value = ""

    search_button.on_click(on_search_button_clicked)

    # Create example queries
    example_queries = [
        "phone charger and case",
        "smart home automation",
        "outdoor camping gear",
        "pet supplies and toys",
        "home office essentials"
    ]

    def create_example_button(query):
        """Create a button for an example query"""
        button = widgets.Button(
            description=query,
            layout=widgets.Layout(width='auto'),
            style={'button_color': '#e9ecef'}
        )

        def on_click(b):
            basic_search_text.value = query
            advanced_search_text.value = query

        button.on_click(on_click)
        return button

    example_buttons = [create_example_button(query) for query in example_queries]

    examples_box = widgets.VBox([
        widgets.HTML(value="<h4>Try these examples:</h4>"),
        widgets.HBox(example_buttons)
    ])

    # Combine all elements
    main_interface = widgets.VBox([
        search_type_tabs,
        widgets.HBox([search_button], layout=widgets.Layout(justify_content='center')),
        loading_indicator,
        examples_box,
        results_area
    ], layout=widgets.Layout(padding='10px'))

    display(main_interface)

# Initialize the interface
create_search_interface()