# Hybrid Search

This demo is based on the original Vector Search demo for a product catalogue.

The data model is enhanced to levergae hybrid search features:
1. Analyzer Matches on product description to enhance filtering


## Imports

In [1]:
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.query import dict_factory
from cassandra.query import SimpleStatement
import openai
import pandas as pd

## Keys & Environment Variables

In [2]:
import os
from dotenv import load_dotenv

load_dotenv()

# Astra DB
ASTRA_DB_KEYSPACE = os.environ['ASTRA_DB_KEYSPACE']
ASTRA_DB_SECURE_BUNDLE_PATH = os.environ['ASTRA_DB_SECURE_BUNDLE_PATH']
ASTRA_DB_APPLICATION_TOKEN = os.environ['ASTRA_DB_APPLICATION_TOKEN']

# OpenAI Token
openai_api_key = os.environ['OPENAI_API_KEY']
openai.api_key = openai_api_key

## Select a model to compute embeddings

Embeddings are numerical representations of concepts converted to number sequences, which make it easy for computers to understand the relationships between those concepts.

This new embedding model from openAI - `text-embedding-ada-002` - replaces five separate models for text search, text similarity, and code search, and outperforms our previous most capable model, Davinci, at most tasks, while being priced 99.8% lower.

In [3]:
model_id = "text-embedding-ada-002"

## Connect to Astra DB

In [4]:
cloud_config= {
  'secure_connect_bundle': ASTRA_DB_SECURE_BUNDLE_PATH
}
auth_provider = PlainTextAuthProvider('token', ASTRA_DB_APPLICATION_TOKEN)
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
session = cluster.connect()
session.set_keyspace(ASTRA_DB_KEYSPACE)
session

<cassandra.cluster.Session at 0x107e1bc70>

## Database Schema

### Drop Schema

> **Note:** Only run this block when you want to DROP the schema.

In [None]:
# only use this to DROP the schema
session.execute(f"""DROP INDEX IF EXISTS idx_product_desc""")
session.execute(f"""DROP INDEX IF EXISTS idx_consumer_desc""")
session.execute(f"""DROP INDEX IF EXISTS idx_analyzer_description""")
session.execute(f"""DROP INDEX IF EXISTS idx_price""")

session.execute(f"""DROP TABLE IF EXISTS products_table""")

### Create Schema

> **Note:** Only run this block when you want to CREATE the schema.


- Note the data type `vector` in the schema below.

- Note the index on the `embedding` columns. These will Vector Similraity Search on the embeddings.

- Note the index on the `price` column. This will allow filtering based on price.

- Note the Index Analyzer on the `description` column. This will allow filtering on any word (with stemming).

In [None]:
# CREATE the schema

session.execute(f"""CREATE TABLE IF NOT EXISTS products_table
(product_id int,
 chunk_id int,

 product_name text,
 description text,
 consumer_description text,
 price int,
 
 product_description_embedding vector<float, 1536>,
 consumer_description_embedding vector<float, 1536>,

 PRIMARY KEY (product_id, chunk_id))""")

# Create Vector Indexes
session.execute("CREATE CUSTOM INDEX IF NOT EXISTS idx_product_desc ON products_table (product_description_embedding) \
                USING 'org.apache.cassandra.index.sai.StorageAttachedIndex'")
session.execute("CREATE CUSTOM INDEX IF NOT EXISTS idx_consumer_desc ON products_table (consumer_description_embedding) \
                USING 'org.apache.cassandra.index.sai.StorageAttachedIndex'")

# Create Index on Price
session.execute("CREATE CUSTOM INDEX IF NOT EXISTS idx_price ON products_table(price) \
                USING 'org.apache.cassandra.index.sai.StorageAttachedIndex'")

# Create Analyser on Description
session.execute("""CREATE CUSTOM INDEX IF NOT EXISTS idx_analyzer_description ON products_table(description) \
                USING 'org.apache.cassandra.index.sai.StorageAttachedIndex' WITH OPTIONS = {'index_analyzer': '{ \
                    "tokenizer" : {"name" : "standard"}, \
                    "filters" : [{"name" : "porterstem"}] \
                }'};""")


## Create embeddings and Store in DB 

### Read CSV file

In [None]:
products_list = pd.read_csv('ProductDatasetCombined.csv')
products_list

### Generate embeddings from openAI and store data

In [None]:
# Iterate over products
for id, row in products_list.iterrows():

    print (row.product_name)

    ### GENERATE EMBEDDINGS ###
    print ("    - generating embeddings")
    
    # Get price
    pricevalue = f"${row.price}" if isinstance(row.price, str) else ""

    # append price to description
    original = f"{row.description} price: {pricevalue}"
    # append price to consumer description
    consumer = f"{row.consumer_description} price: {pricevalue}"
    
    # Create  embedding
    embedding_product = openai.Embedding.create(input=original, model=model_id)['data'][0]['embedding']
    # Create consumer embedding
    embedding_consumer = openai.Embedding.create(input=consumer, model=model_id)['data'][0]['embedding']


    ### WRITE TO DATABASE ###
    print ("    - writing to database")
    
    # Insert Data and Embedding into database
    query = SimpleStatement(
                f"""
                INSERT INTO products_table
                (product_id, chunk_id, product_name, description, consumer_description, price, product_description_embedding, consumer_description_embedding)
                VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
                """
            )
    session.execute(query, (row.product_id, 0, row.product_name, row.description, row.consumer_description, row.price, embedding_product, embedding_consumer))


## Searching

### Convert a query string into a text embedding to use as part of the query

In [5]:
MAX_PRICE = 500

customer_input = f"recommend a camera suitable for a beginner photographer that costs less than ${MAX_PRICE}"
embedding = openai.Embedding.create(input=customer_input, model=model_id)['data'][0]['embedding']
#display(embedding)

### Vector Search Only
Query using only Vector Similarity Search with ANN top 3

In [6]:
query = SimpleStatement(
    f"""
    SELECT product_id, product_name, description, consumer_description, price, similarity_dot_product(consumer_description_embedding, {embedding}) as sim
    FROM products_table
    ORDER BY consumer_description_embedding ANN OF {embedding} LIMIT 3;
    """
    )
results = session.execute(query)
products = results._current_rows

for row in products:
  print(f"""Similarity: {row.sim}\nProduct Name: {row.product_name}\nPrice: {row.price}\n{row.consumer_description}\n\n""")

Similarity: 0.9212608337402344
Product Name: Canon Digital EOS Rebel XS Starter Kit - 9320A010
Price: 499
The Canon Digital EOS Rebel XS Starter Kit (9320A010) is an all-inclusive package that is perfect for beginners looking to explore the world of photography. The kit includes the Canon EOS Rebel XS digital camera, which boasts an impressive 10.1-megapixel image sensor for stunning photos and an easy-to-use interface. It also comes with an 18-55mm f/3.5-5.6 IS lens, providing versatility for capturing a range of subjects. Additionally, the kit includes a handy camera bag to protect and carry your gear, as well as a 4GB SD memory card to store your captured moments. With this starter kit, you'll have everything you need to dive into the world of photography and capture beautiful, high-quality images.


Similarity: 0.913399338722229
Product Name: Canon EOS Rebel XSi Silver Digital SLR Camera - XSIREB1855S
Price: 799
The Canon EOS Rebel XSi Silver Digital SLR Camera is a high-quality ca

### Predicates Only Search

#### Filter on "price"

In [7]:
query = SimpleStatement(
    f"""
    SELECT product_id, product_name, description, consumer_description, price
    FROM products_table
    WHERE price < {MAX_PRICE} LIMIT 3;
    """
    )
results = session.execute(query)
products = results._current_rows

for row in products:
  print(f"""Product Name: {row.product_name}\nPrice: {row.price}\n{row.consumer_description}\n\n""")

Product Name: Apple 120GB Black 7th Generation iPod Classic - MB565LLA
Price: 349
The Apple 120GB Black 7th Generation iPod Classic - MB565LLA is a sleek and portable gadget that allows you to carry your entire music library with you wherever you go. With its generous storage capacity of 120GB, you can store up to 30,000 songs, ensuring that you never run out of music to enjoy. The black and elegant design adds a touch of sophistication, while the user-friendly interface and click wheel make it easy to navigate through your playlists. With a long battery life, you can listen to your favorite tunes for hours on end without needing to recharge. Whether you're commuting, exercising, or relaxing at home, this iPod Classic is the perfect companion for music lovers on the go.


Product Name: Audiovox 7'  Acrylic Digital Photo Frame - DPF701
Price: 0
The Audiovox 7' Acrylic Digital Photo Frame, also known as the DPF701, is a sleek and stylish device designed to display your cherished memories

#### Analyzer on "description"

In [8]:
query = SimpleStatement(
    f"""
    SELECT product_id, product_name, description, consumer_description, price
    FROM products_table
    WHERE description : 'Black' LIMIT 3;
    """
    )
results = session.execute(query)
products = results._current_rows

for row in products:
  print(f"""Product Name: {row.product_name}\nPrice: {row.price}\n{row.consumer_description}\n\n""")

Product Name: Apple 120GB Black 7th Generation iPod Classic - MB565LLA
Price: 349
The Apple 120GB Black 7th Generation iPod Classic - MB565LLA is a sleek and portable gadget that allows you to carry your entire music library with you wherever you go. With its generous storage capacity of 120GB, you can store up to 30,000 songs, ensuring that you never run out of music to enjoy. The black and elegant design adds a touch of sophistication, while the user-friendly interface and click wheel make it easy to navigate through your playlists. With a long battery life, you can listen to your favorite tunes for hours on end without needing to recharge. Whether you're commuting, exercising, or relaxing at home, this iPod Classic is the perfect companion for music lovers on the go.


Product Name: Audiovox 7'  Acrylic Digital Photo Frame - DPF701
Price: 0
The Audiovox 7' Acrylic Digital Photo Frame, also known as the DPF701, is a sleek and stylish device designed to display your cherished memories

### Hybrid Search

#### Basic Filter
Query using Predicates and Vector Similarity Search with ANN top 3

In [9]:
query = SimpleStatement(
    f"""
    SELECT product_id, product_name, description, consumer_description, price
    FROM products_table
    WHERE price < {MAX_PRICE} 
    ORDER BY consumer_description_embedding ANN OF {embedding} LIMIT 3;
    """
    )
results = session.execute(query)
products = results._current_rows

for row in products:
  print(f"""Product Name: {row.product_name}\nPrice: {row.price}\n{row.consumer_description}\n\n""")

Product Name: Canon Digital EOS Rebel XS Starter Kit - 9320A010
Price: 499
The Canon Digital EOS Rebel XS Starter Kit (9320A010) is an all-inclusive package that is perfect for beginners looking to explore the world of photography. The kit includes the Canon EOS Rebel XS digital camera, which boasts an impressive 10.1-megapixel image sensor for stunning photos and an easy-to-use interface. It also comes with an 18-55mm f/3.5-5.6 IS lens, providing versatility for capturing a range of subjects. Additionally, the kit includes a handy camera bag to protect and carry your gear, as well as a 4GB SD memory card to store your captured moments. With this starter kit, you'll have everything you need to dive into the world of photography and capture beautiful, high-quality images.


Product Name: Canon PowerShot Black 14.7 Megapixel Digital Camera - SD990ISB
Price: 380
The Canon PowerShot Black 14.7 Megapixel Digital Camera - SD990ISB is an advanced yet user-friendly camera designed for everyday

#### Filter and Analyzer

Query using Analyzer and Predicates and Vector Similarity Search with ANN top 3

In [10]:
query = SimpleStatement(
    f"""
    SELECT product_id, product_name, description, consumer_description, price
    FROM products_table
    WHERE 
      price < {MAX_PRICE} AND
      description : 'Black'
    ORDER BY consumer_description_embedding ANN OF {embedding} LIMIT 3;
    """
    )
results = session.execute(query)
products = results._current_rows

for row in products:
  print(f"""Product Name: {row.product_name}\nPrice: {row.price}\n{row.consumer_description}\n\n""")

Product Name: Canon PowerShot Black 14.7 Megapixel Digital Camera - SD990ISB
Price: 380
The Canon PowerShot Black 14.7 Megapixel Digital Camera - SD990ISB is an advanced yet user-friendly camera designed for everyday photography enthusiasts. With its impressive 14.7-megapixel resolution, this camera captures stunning high-quality images with clarity and detail. The sleek black design adds a touch of elegance, while its compact size makes it convenient to carry around. Equipped with advanced features like image stabilization, a large LCD screen, and various shooting modes, this camera makes it easy to quickly and effortlessly capture professional-looking photos. Whether you are capturing memorable family moments or exploring your creative side, the Canon PowerShot SD990ISB is an excellent choice for capturing beautiful, high-resolution images.


Product Name: Canon Black EOS 50D Digital SLR Camera Body - EOS50DBODY
Price: 148
The Canon Black EOS 50D Digital SLR Camera Body is a high-per