# Hybrid Search

This notebook extends Vector_Search_Hybrid by combining vector search with two hybrid search features:
1. An index analyzer on the product description, to filter on matching words
2. A Storage-Attached Index (SAI) on the product price, to filter on price ranges

This notebook uses a table called `products_table_hybrid`.

## Imports

In [1]:
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.query import dict_factory
from cassandra.query import SimpleStatement
import openai
import pandas as pd

## Keys & Environment Variables

In [2]:
import os
from dotenv import load_dotenv

load_dotenv()

# Astra DB
ASTRA_DB_KEYSPACE = os.environ['ASTRA_DB_KEYSPACE']
ASTRA_DB_SECURE_BUNDLE_PATH = os.environ['ASTRA_DB_SECURE_BUNDLE_PATH']
ASTRA_DB_APPLICATION_TOKEN = os.environ['ASTRA_DB_APPLICATION_TOKEN']

# OpenAI Token
openai_api_key = os.environ['OPENAI_API_KEY']
openai.api_key = openai_api_key

## Select a model to compute embeddings

Embeddings are numerical representations of concepts, expressed as sequences of numbers, which make it easy for computers to measure the relationships between those concepts.

OpenAI's `text-embedding-ada-002` embedding model replaces five separate models for text search, text similarity, and code search. According to OpenAI, it outperforms their previous most capable model, Davinci, at most tasks while being priced 99.8% lower. It returns 1536-dimensional vectors, which is why the schema in this notebook declares its embedding columns as `vector<float, 1536>`.
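
As a rough illustration of how "relationships between concepts" become numbers, the sketch below compares toy 3-dimensional vectors with cosine similarity. The vectors and their values are hypothetical stand-ins chosen for readability; real `text-embedding-ada-002` embeddings have 1536 dimensions and are not hand-written.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical toy "embeddings"; not produced by any real model.
camera = [0.9, 0.1, 0.0]
dslr = [0.8, 0.2, 0.1]
keyboard = [0.0, 0.2, 0.9]

sim_camera_dslr = cosine_similarity(camera, dslr)
sim_camera_keyboard = cosine_similarity(camera, keyboard)

# Related concepts should score higher than unrelated ones.
print(f"camera vs dslr:     {sim_camera_dslr:.3f}")
print(f"camera vs keyboard: {sim_camera_keyboard:.3f}")
```

The same idea underlies the similarity scores returned by the vector queries later in this notebook.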

In [3]:
model_id = "text-embedding-ada-002"

## Connect to Astra DB

In [4]:
cloud_config= {
  'secure_connect_bundle': ASTRA_DB_SECURE_BUNDLE_PATH
}
auth_provider = PlainTextAuthProvider('token', ASTRA_DB_APPLICATION_TOKEN)
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
session = cluster.connect()
session.set_keyspace(ASTRA_DB_KEYSPACE)
session

<cassandra.cluster.Session at 0x1292cb580>

## Database Schema

### Drop Schema

> **Note:** Only run this block when you want to DROP the schema.

In [5]:
# only use this to DROP the schema
session.execute(f"""DROP INDEX IF EXISTS idx_product_desc""")
session.execute(f"""DROP INDEX IF EXISTS idx_consumer_desc""")
session.execute(f"""DROP INDEX IF EXISTS idx_analyzer_description""")
session.execute(f"""DROP INDEX IF EXISTS idx_price""")

session.execute(f"""DROP TABLE IF EXISTS products_table_hybrid""")

<cassandra.cluster.ResultSet at 0x129389820>

### Create Schema

> **Note:** Only run this block when you want to CREATE the schema.


- Note the data type `vector` in the schema below.

- Note the indexes on the two `embedding` columns. These enable Vector Similarity Search on the embeddings.

- Note the index on the `price` column. This will allow filtering based on price.

- Note the Index Analyzer on the `description` column. This will allow filtering on any word (with stemming).
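
The analyzer options are a JSON document embedded inside a CQL string, which is easy to get wrong when escaped by hand. The following is a minimal sketch of one way to build that statement with `json.dumps`; the helper name `build_analyzer_index_cql` is hypothetical, while the tokenizer and filter names are the ones used in this notebook's schema.

```python
import json

def build_analyzer_index_cql(table, column, index_name):
    """Return a CREATE CUSTOM INDEX statement with an index_analyzer option.

    Hypothetical helper: it only assembles a string and does not talk
    to the database.
    """
    analyzer = {
        "tokenizer": {"name": "standard"},
        "filters": [{"name": "porterstem"}],
    }
    # json.dumps emits double quotes only, so the JSON can sit safely
    # inside the single-quoted CQL option value.
    options = json.dumps(analyzer)
    return (
        f"CREATE CUSTOM INDEX IF NOT EXISTS {index_name} ON {table}({column}) "
        "USING 'org.apache.cassandra.index.sai.StorageAttachedIndex' "
        f"WITH OPTIONS = {{'index_analyzer': '{options}'}};"
    )

cql = build_analyzer_index_cql(
    "products_table_hybrid", "description", "idx_analyzer_description"
)
print(cql)
```

The resulting string could then be passed to `session.execute(cql)` in place of the hand-escaped statement.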

In [7]:
# CREATE the schema

session.execute(f"""CREATE TABLE IF NOT EXISTS products_table_hybrid
(product_id int,
 chunk_id int,

 product_name text,
 description text,
 consumer_description text,
 price int,
 
 product_description_embedding vector<float, 1536>,
 consumer_description_embedding vector<float, 1536>,

 PRIMARY KEY (product_id, chunk_id))""")

# Create Vector Indexes
session.execute("CREATE CUSTOM INDEX IF NOT EXISTS idx_product_desc ON products_table_hybrid (product_description_embedding) \
                USING 'org.apache.cassandra.index.sai.StorageAttachedIndex'")
session.execute("CREATE CUSTOM INDEX IF NOT EXISTS idx_consumer_desc ON products_table_hybrid (consumer_description_embedding) \
                USING 'org.apache.cassandra.index.sai.StorageAttachedIndex'")

# Create Index on Price
session.execute("CREATE CUSTOM INDEX IF NOT EXISTS idx_price ON products_table_hybrid(price) \
                USING 'org.apache.cassandra.index.sai.StorageAttachedIndex'")

# Create Analyzer index on Description
session.execute("""CREATE CUSTOM INDEX IF NOT EXISTS idx_analyzer_description ON products_table_hybrid(description) \
                USING 'org.apache.cassandra.index.sai.StorageAttachedIndex' WITH OPTIONS = {'index_analyzer': '{ \
                    "tokenizer" : {"name" : "standard"}, \
                    "filters" : [{"name" : "porterstem"}] \
                }'};""")


<cassandra.cluster.ResultSet at 0x12965fee0>

## Create embeddings and Store in DB 

### Read CSV file

In [8]:
products_list = pd.read_csv('ProductDatasetCombined.csv')
products_list

| Unnamed: 0.1 | Unnamed: 0 | product_id | product_name | description | price | consumer_description |
|---|---|---|---|---|---|---|
| 0 | 0 | 37162 | Canon PIXMA Photo All-In-One Printer - MP620 | Canon PIXMA Photo All-In-One Printer - MP620/ ... | $149.00 | The Canon PIXMA Photo All-In-One Printer - MP6... |
| 1 | 1 | 37174 | TiVo HD XL Black Digital Video Recorder - TCD6... | TiVo HD XL Black Digital Video Recorder - TCD6... | $599.00 | The TiVo HD XL Black Digital Video Recorder is... |
| 2 | 2 | 37181 | Apple 8GB Black 2nd Generation iPod Touch - MB... | Apple 8GB Black 2nd Generation iPod Touch - MB... | $229.00 | The Apple 8GB Black 2nd Generation iPod Touch ... |
| 3 | 3 | 37182 | Apple 16GB Black 2nd Generation iPod Touch - M... | Apple 16GB Black 2nd Generation iPod Touch - M... | $299.00 | The Apple 16GB Black 2nd Generation iPod Touch... |
| 4 | 4 | 37183 | Apple 32GB Black 2nd Generation iPod Touch - M... | Apple 32GB Black 2nd Generation iPod Touch - M... | $399.00 | The Apple 32GB Black 2nd Generation iPod Touch... |
| ... | ... | ... | ... | ... | ... | ... |
| 167 | 167 | 39088 | Logitech Cordless Desktop Wave Keyboard And Mo... | Logitech Cordless Desktop Wave Keyboard And Mo... | $79.00 | The Logitech Cordless Desktop Wave Keyboard an... |
| 168 | 168 | 39090 | Mitsubishi DLP Black TV Stand - MBS73V | Mitsubishi DLP Black TV Stand - MBS73V/ Matchi... | $549.00 | The Mitsubishi DLP Black TV Stand - MBS73V is ... |
| 169 | 169 | 39175 | Logitech Digital Precision PC Gaming Headset -... | Logitech Digital Precision PC Gaming Headset -... | $49.00 | The Logitech Digital Precision PC Gaming Heads... |
| 170 | 170 | 39176 | Logitech 2.1 Multimedia Silver Speaker System ... | Logitech 2.1 Multimedia Silver Speaker System ... | | The Logitech 2.1 Multimedia Silver Speaker Sys... |


### Generate embeddings from OpenAI and store data

In [16]:
import re

# Iterate over products
for _, row in products_list.iterrows():

    print (row.product_name)

    ### GENERATE EMBEDDINGS ###
    print ("    - generating embeddings")
    
    # Get price as whole dollars, e.g. "$149.00" -> 149.
    # Missing or unparseable prices are stored as 0 so the int column always receives an int.
    pricevalue = 0
    if isinstance(row.price, str):
        match = re.search(r'\$(\d+)', row.price)
        if match:
            pricevalue = int(match.group(1))

    # append price to description
    original = f"{row.description} price: {pricevalue}"
    # append price to consumer description
    consumer = f"{row.consumer_description} price: {pricevalue}"
    
    # Create  embedding
    embedding_product = openai.Embedding.create(input=original, model=model_id)['data'][0]['embedding']
    # Create consumer embedding
    embedding_consumer = openai.Embedding.create(input=consumer, model=model_id)['data'][0]['embedding']


    ### WRITE TO DATABASE ###
    print ("    - writing to database")
    
    # Insert Data and Embedding into database
    query = SimpleStatement(
                f"""
                INSERT INTO products_table_hybrid
                (product_id, chunk_id, product_name, description, consumer_description, price, product_description_embedding, consumer_description_embedding)
                VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
                """
            )
    session.execute(query, (row.product_id, 0, row.product_name, row.description, row.consumer_description, pricevalue, embedding_product, embedding_consumer))


Canon PIXMA Photo All-In-One Printer - MP620
    - generating embeddings
    - writing to database
TiVo HD XL Black Digital Video Recorder - TCD658000
    - generating embeddings
    - writing to database
Apple 8GB Black 2nd Generation iPod Touch - MB528LLA
    - generating embeddings
    - writing to database
Apple 16GB Black 2nd Generation iPod Touch - MB531LLA
    - generating embeddings
    - writing to database
Apple 32GB Black 2nd Generation iPod Touch - MB533LLA
    - generating embeddings
    - writing to database
Apple 8GB Silver 4th Generation iPod Nano - MB598LLA
    - generating embeddings
    - writing to database
Apple 8GB Blue 4th Generation iPod Nano - MB732LLA
    - generating embeddings
    - writing to database
Apple 8GB Pink 4th Generation iPod Nano - MB735LLA
    - generating embeddings
    - writing to database
Apple 8GB Purple 4th Generation iPod Nano - MB739LLA
    - generating embeddings
    - writing to database
Apple 16GB Green 4th Generation iPod Nano - MB91
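
Because the loop above stores a missing price as 0, those rows satisfy `price < MAX_PRICE` in the later queries, and several results in this notebook do show `Price: 0`. One alternative, sketched here under the assumption that a null price is acceptable in the table, is a parser that returns `None` for missing or unparseable values. The function name `parse_price` is hypothetical, and whether rows with a null price are then skipped by the price filter is worth verifying against your database version.

```python
import re

def parse_price(raw):
    """Return the whole-dollar price as an int, or None if unavailable.

    Hypothetical helper: pandas represents an empty CSV cell as a float
    NaN, so anything that is not a string is treated as missing.
    """
    if not isinstance(raw, str):
        return None
    # Optional dollar sign, then digits with optional thousands separators.
    match = re.search(r"\$?(\d[\d,]*)", raw)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))

print(parse_price("$149.00"))
print(parse_price(float("nan")))
```

Cents are truncated rather than rounded, which matches the behavior of the original loop.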

## Searching

### Convert a query string into a text embedding to use as part of the query

In [17]:
MAX_PRICE = 500

customer_input = f"recommend a camera suitable for a beginner photographer that costs less than ${MAX_PRICE}"
embedding = openai.Embedding.create(input=customer_input, model=model_id)['data'][0]['embedding']
display(embedding)

[-0.002006965223699808,
 0.011957081034779549,
 -0.0191182978451252,
 -0.01761958934366703,
 -0.01617301069200039,
 0.014205142855644226,
 -0.014439723454415798,
 -0.05410986393690109,
 -0.011155598796904087,
 -0.013201660476624966,
 -0.007206829264760017,
 0.004414671566337347,
 -0.011820242740213871,
 -0.025556225329637527,
 -0.0027660932391881943,
 0.006203346885740757,
 0.01558656059205532,
 -0.008347149938344955,
 -0.012295919470489025,
 -0.009181213565170765,
 -0.016290301457047462,
 0.005379057489335537,
 -0.00600460497662425,
 -0.010321535170078278,
 -0.01979597471654415,
 -0.002147061750292778,
 0.011709468439221382,
 0.00426805904135108,
 -0.017841137945652008,
 0.003968317527323961,
 0.025061000138521194,
 -0.020134812220931053,
 -0.020630037412047386,
 -0.036646660417318344,
 -0.00011036678915843368,
 0.008464440703392029,
 0.007043926510959864,
 0.0038477692287415266,
 0.006340185180306435,
 -0.008431860245764256,
 0.024995839223265648,
 0.005245476961135864,
 0.0088814720

### Vector Search Only
Query using only Vector Similarity Search, returning the top 3 approximate nearest neighbor (ANN) matches.
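
To make the idea of "top 3 by similarity" concrete, here is a minimal sketch of an exact, brute-force version of the same ranking in plain Python. It is not how the database executes an ANN query (which uses an index and may return approximate results); the function name and the toy vectors are hypothetical.

```python
import heapq

def top_k_by_dot_product(query, rows, k=3):
    """Return the k (score, name) pairs with the highest dot product.

    rows is an iterable of (name, vector) pairs. This scans every row,
    so it is exact but does not scale like an ANN index.
    """
    scored = (
        (sum(q * x for q, x in zip(query, vector)), name)
        for name, vector in rows
    )
    return heapq.nlargest(k, scored)

# Hypothetical 2-dimensional vectors standing in for real embeddings.
toy_rows = [
    ("a", [0.9, 0.1]),
    ("b", [0.1, 0.9]),
    ("c", [0.5, 0.5]),
    ("d", [0.7, 0.3]),
]
top3 = top_k_by_dot_product([1.0, 0.0], toy_rows, k=3)
for score, name in top3:
    print(name, score)
```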

In [20]:
query = SimpleStatement(
    f"""
    SELECT product_id, product_name, description, consumer_description, price, similarity_dot_product(consumer_description_embedding, {embedding}) as sim
    FROM products_table_hybrid
    ORDER BY consumer_description_embedding ANN OF {embedding} LIMIT 3;
    """
    )
results = session.execute(query)
products = list(results)

for row in products:
  print(f"""Similarity: {row.sim}\nProduct Name: {row.product_name}\nPrice: {row.price}\n{row.description}\n\n""")

Similarity: 0.9164789915084839
Product Name: Canon Digital EOS Rebel XS Starter Kit - 9320A010
Price: 0
Canon Digital EOS Rebel XS Starter Kit - 9320A010/ Includes Digital Gadget Bag 200DG, Battery Pack NB-2LH, 58mm UV Haze Filter


Similarity: 0.9137437343597412
Product Name: Canon EOS Rebel XSi Silver Digital SLR Camera - XSIREB1855S
Price: 799
Canon EOS Rebel XSi Silver Digital SLR Camera - XSIREB1855S/ 12.2 Megapixel/ DIGIC III Image Processor/ Extensive Noise Reduction Technology/ Auto Optimization/ 3.0' LCD Monitor/ Compatible With Compact SD And SDHC Memory Cards/ EOS Integrated Cleaning System/ 18-55MM Lens Included/ 2756B003/ Silver Finish


Similarity: 0.9073257446289062
Product Name: Canon Black EOS 50D Digital SLR Camera With 28-135MM Lens - 50D28135
Price: 0
Canon Black EOS 50D Digital SLR Camera With 28-135MM Lens - 50D28135/ 15.1 Megapixel CMOS Sensor/ DIGIC 4 Image Processor/ 3.0' Clear View LCD/ 9 Cross-Type High-Precision Sensors/ Enhanced Live View/ EOS Integrated Cl

### Predicates Only Search

#### Filter on "price"

In [19]:
query = SimpleStatement(
    f"""
    SELECT product_id, product_name, description, consumer_description, price
    FROM products_table_hybrid
    WHERE price < {MAX_PRICE} LIMIT 3;
    """
    )
results = session.execute(query)
products = list(results)

for row in products:
  print(f"""Product Name: {row.product_name}\nPrice: {row.price}\n{row.consumer_description}\n\n""")

Product Name: Apple 120GB Black 7th Generation iPod Classic - MB565LLA
Price: 0
The Apple 120GB Black 7th Generation iPod Classic - MB565LLA is a sleek and portable gadget that allows you to carry your entire music library with you wherever you go. With its generous storage capacity of 120GB, you can store up to 30,000 songs, ensuring that you never run out of music to enjoy. The black and elegant design adds a touch of sophistication, while the user-friendly interface and click wheel make it easy to navigate through your playlists. With a long battery life, you can listen to your favorite tunes for hours on end without needing to recharge. Whether you're commuting, exercising, or relaxing at home, this iPod Classic is the perfect companion for music lovers on the go.


Product Name: Audiovox 7'  Acrylic Digital Photo Frame - DPF701
Price: 0
The Audiovox 7' Acrylic Digital Photo Frame, also known as the DPF701, is a sleek and stylish device designed to display your cherished memories w

#### Analyzer on "description"

In [21]:
query = SimpleStatement(
    f"""
    SELECT product_id, product_name, description, consumer_description, price
    FROM products_table_hybrid
    WHERE description : 'Black' LIMIT 3;
    """
    )
results = session.execute(query)
products = list(results)

for row in products:
  print(f"""Product Name: {row.product_name}\nPrice: {row.price}\n{row.consumer_description}\n\n""")

Product Name: Apple 120GB Black 7th Generation iPod Classic - MB565LLA
Price: 0
The Apple 120GB Black 7th Generation iPod Classic - MB565LLA is a sleek and portable gadget that allows you to carry your entire music library with you wherever you go. With its generous storage capacity of 120GB, you can store up to 30,000 songs, ensuring that you never run out of music to enjoy. The black and elegant design adds a touch of sophistication, while the user-friendly interface and click wheel make it easy to navigate through your playlists. With a long battery life, you can listen to your favorite tunes for hours on end without needing to recharge. Whether you're commuting, exercising, or relaxing at home, this iPod Classic is the perfect companion for music lovers on the go.


Product Name: Audiovox 7'  Acrylic Digital Photo Frame - DPF701
Price: 0
The Audiovox 7' Acrylic Digital Photo Frame, also known as the DPF701, is a sleek and stylish device designed to display your cherished memories w

### Hybrid Search

#### Vector Search and Basic Filter
Query using a price predicate together with Vector Similarity Search, returning the top 3 ANN matches.
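
Logically, this kind of hybrid query keeps only the rows that pass the predicate and then ranks them by similarity. The sketch below mimics that behavior over a few in-memory rows; it illustrates the expected result, not the database's execution strategy. The function name and all row values are hypothetical. One row carries a price of 0 to mirror how this notebook stores missing prices, which is why such rows pass a `price < MAX_PRICE` filter.

```python
def hybrid_search(rows, query, max_price, k=3):
    """Filter rows by price, then rank the survivors by dot product."""
    def score(row):
        return sum(q * x for q, x in zip(query, row["embedding"]))

    candidates = [row for row in rows if row["price"] < max_price]
    candidates.sort(key=score, reverse=True)
    return candidates[:k]

# Hypothetical rows; price 0 stands for "price missing in the CSV".
toy_products = [
    {"name": "dslr kit", "price": 0, "embedding": [0.9, 0.1]},
    {"name": "dslr pro", "price": 799, "embedding": [0.95, 0.05]},
    {"name": "compact", "price": 199, "embedding": [0.7, 0.3]},
    {"name": "keyboard", "price": 79, "embedding": [0.1, 0.9]},
]
matches = hybrid_search(toy_products, [1.0, 0.0], max_price=500, k=2)
for row in matches:
    print(row["name"], row["price"])
```

The most similar item overall ("dslr pro") is excluded by the price filter, while the item with an unknown price is not.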

In [22]:
query = SimpleStatement(
    f"""
    SELECT product_id, product_name, description, consumer_description, price
    FROM products_table_hybrid
    WHERE price < {MAX_PRICE} 
    ORDER BY consumer_description_embedding ANN OF {embedding} LIMIT 3;
    """
    )
results = session.execute(query)
products = list(results)

for row in products:
  print(f"""Product Name: {row.product_name}\nPrice: {row.price}\n{row.consumer_description}\n\n""")

Product Name: Canon Digital EOS Rebel XS Starter Kit - 9320A010
Price: 0
The Canon Digital EOS Rebel XS Starter Kit (9320A010) is an all-inclusive package that is perfect for beginners looking to explore the world of photography. The kit includes the Canon EOS Rebel XS digital camera, which boasts an impressive 10.1-megapixel image sensor for stunning photos and an easy-to-use interface. It also comes with an 18-55mm f/3.5-5.6 IS lens, providing versatility for capturing a range of subjects. Additionally, the kit includes a handy camera bag to protect and carry your gear, as well as a 4GB SD memory card to store your captured moments. With this starter kit, you'll have everything you need to dive into the world of photography and capture beautiful, high-quality images.


Product Name: Canon Black EOS 50D Digital SLR Camera With 28-135MM Lens - 50D28135
Price: 0
The Canon Black EOS 50D Digital SLR Camera with 28-135mm Lens is a high-quality camera designed for photography enthusiasts. T

#### Vector Search and Filter and Analyzer

Query using an analyzer match and a price predicate together with Vector Similarity Search, returning the top 3 ANN matches.
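
The queries in this notebook interpolate values directly into the CQL text with f-strings. As a sketch of an alternative, the hypothetical helper below assembles the same statement with `%s` placeholders and a separate parameter tuple, so that optional filters can be combined freely. It assumes the driver's client-side `%s` formatting renders a Python list as a bracketed literal that the ANN clause accepts, which is the same mechanism the INSERT cell earlier relies on for the embedding columns; treat that as an assumption to verify rather than documented behavior.

```python
def build_hybrid_query(embedding, max_price=None, term=None, limit=3):
    """Return (cql, params) for an ANN query with optional filters.

    Hypothetical helper: it only builds the statement text and the
    parameter tuple; nothing is sent to the database.
    """
    clauses, params = [], []
    if max_price is not None:
        clauses.append("price < %s")
        params.append(max_price)
    if term is not None:
        clauses.append("description : %s")
        params.append(term)
    where = f"WHERE {' AND '.join(clauses)} " if clauses else ""
    cql = (
        "SELECT product_id, product_name, description, consumer_description, price "
        "FROM products_table_hybrid "
        f"{where}"
        "ORDER BY consumer_description_embedding ANN OF %s "
        f"LIMIT {int(limit)}"
    )
    params.append(embedding)
    return cql, tuple(params)

cql, params = build_hybrid_query([0.1, 0.2], max_price=500, term="Black")
print(cql)
print(params)
```

Under that assumption, the pair could be executed as `session.execute(cql, params)`.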

In [23]:
query = SimpleStatement(
    f"""
    SELECT product_id, product_name, description, consumer_description, price
    FROM products_table_hybrid
    WHERE 
      price < {MAX_PRICE} AND
      description : 'Black'
    ORDER BY consumer_description_embedding ANN OF {embedding} LIMIT 3;
    """
    )
results = session.execute(query)
products = list(results)

for row in products:
  print(f"""Product Name: {row.product_name}\nPrice: {row.price}\n{row.consumer_description}\n\n""")

Product Name: Canon Black EOS 50D Digital SLR Camera With 28-135MM Lens - 50D28135
Price: 0
The Canon Black EOS 50D Digital SLR Camera with 28-135mm Lens is a high-quality camera designed for photography enthusiasts. This camera features a powerful 15.1 megapixel image sensor that captures stunningly sharp and detailed photos. With its wide range of ISO sensitivity, users can confidently shoot in various lighting conditions, from bright daylight to low-light environments. The 28-135mm lens provides flexibility in shooting different subjects, from wide-angle landscapes to zoomed-in portraits. The camera also offers a fast and accurate autofocus system, ensuring that every shot is focused and well-balanced. Additionally, it has a large LCD screen for easy reviewing and navigating through menus. Whether you're a beginner or an experienced photographer, the Canon 50D is a reliable and versatile camera that delivers exceptional image quality.


Product Name: Canon PowerShot Black 14.7 Megap