<div id="singlestore-header" style="display: flex; background-color: rgba(209, 153, 255, 0.25); padding: 5px;">
    <div id="icon-image" style="width: 90px; height: 90px;">
        <img width="100%" height="100%" src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/vector-circle.png" />
    </div>
    <div id="text" style="padding: 5px; margin-left: 10px;">
        <div id="badge" style="display: inline-block; background-color: rgba(0, 0, 0, 0.15); border-radius: 4px; padding: 4px 8px; align-items: center; margin-top: 6px; margin-bottom: -2px; font-size: 80%">SingleStore Notebooks</div>
        <h1 style="font-weight: 500; margin: 8px 0 0 4px;">Hybrid Search</h1>
    </div>
</div>

*Source*: [OpenAI Cookbook](https://github.com/openai/openai-cookbook/blob/main/examples/data/AG_news_samples.csv)

Hybrid search combines keyword-based search with semantic search, so results benefit from both exact term matching and similarity in meaning. This notebook shows how to perform hybrid search with SingleStore's database and notebooks.

## Setup
First, install the required libraries.

In [None]:
!pip install matplotlib --quiet
!pip install plotly --quiet
!pip install scikit-learn --quiet
!pip install tabulate --quiet
!pip install tiktoken --quiet
!pip install wget --quiet
!pip install openai --quiet

In [None]:
import pandas as pd
import os
import wget
import json

In [None]:
# Import the library for vectorizing the data (Up to 2 minutes)
!pip install sentence-transformers --quiet

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('flax-sentence-embeddings/all_datasets_v3_mpnet-base')

## Import data from CSV file
This CSV file holds the title, description, and category of approximately 2,000 news articles.

In [None]:
# Download the news articles CSV file if it is not already present locally
csv_file_path = "https://raw.githubusercontent.com/openai/openai-cookbook/main/examples/data/AG_news_samples.csv"
file_path = "AG_news_samples.csv"

if not os.path.exists(file_path):
    wget.download(csv_file_path, file_path)
    print("File downloaded successfully.")
else:
    print("File already exists in the local file system.")

In [None]:
df = pd.read_csv('AG_news_samples.csv')
df.pop('label_int')
df

In [None]:
data = df.values.tolist()

## Set up the database

Set up the SingleStoreDB database that will hold your data.

In [None]:
%%sql
DROP DATABASE IF EXISTS news;
CREATE DATABASE IF NOT EXISTS news;

<div class="alert alert-block alert-warning" style="display: flex; background-color: rgba(255, 224, 177, 0.85); padding: 15px;">
    <b class="fa fa-solid fa-exclamation-circle"></b>
    <div>        
        <p><b>Action Required</b></p>        
        <p>Make sure to select the <tt>news</tt> database from the drop-down menu at the top of this notebook. It updates the <tt>connection_url</tt> which is used by the <tt>%%sql</tt> magic command and SQLAlchemy to make connections to the selected database.
        </p>    
    </div>
</div>

In [None]:
%%sql
DROP TABLE IF EXISTS news_articles;
CREATE TABLE IF NOT EXISTS news_articles (
    title TEXT,
    description TEXT,
    genre TEXT,
    embedding BLOB,
    FULLTEXT (title, description)
);

Connect to your SingleStoreDB Cloud workspace using SQLAlchemy.

In [None]:
from singlestoredb import create_engine

db_connection = create_engine().connect()

### Get embeddings for every row based on the description column

In [None]:
# Takes around 3.5 minutes to compute embeddings for all 2,000 rows

# row[1] is the description column of each article
descriptions = [row[1] for row in data]
all_embeddings = model.encode(descriptions)
all_embeddings.shape

In [None]:
combined_data = [tuple(row) + (embedding,) for embedding, row in zip(all_embeddings, data)]
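
The `embedding` column is declared as `BLOB` because the vector is stored as raw bytes. As a minimal sketch of what such a byte layout can look like, the block below packs a list of floats into 32-bit little-endian values using only the standard library. The helper names `pack_vector` and `unpack_vector` are hypothetical, and the float32 little-endian layout is an assumption to verify against your SingleStore version, not something this notebook confirms.

```python
import struct


def pack_vector(values):
    """Pack a sequence of floats into little-endian float32 bytes (assumed layout)."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_vector(blob):
    """Inverse of pack_vector: decode float32 bytes back into a list of floats."""
    count = len(blob) // 4  # 4 bytes per float32 value
    return list(struct.unpack(f"<{count}f", blob))


# Values chosen so that they are exactly representable as float32.
example = [0.5, -1.25, 2.0]
blob = pack_vector(example)

print(len(blob))            # number of bytes: 4 per element
print(unpack_vector(blob))  # round-trips back to the original values
```

Each element takes four bytes, so a 768-dimensional embedding would occupy 3,072 bytes in this layout.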

### Populate the database

In [None]:
%sql TRUNCATE TABLE news_articles;

statement = '''
        INSERT INTO news.news_articles (
            title,
            description,
            genre,
            embedding
        )
        VALUES (
            %s,
            %s,
            %s,
            %s
        )
    '''

for i, row in enumerate(combined_data):
    try:
        db_connection.execute(statement, row)
    except Exception as e:
        print("Error inserting row {}: {}".format(i, e))

## Semantic search

### Connect to OpenAI

In [None]:
import openai

# models
EMBEDDING_MODEL = "text-embedding-ada-002"
GPT_MODEL = "gpt-3.5-turbo"

In [None]:
openai.api_key = 'YOUR_API_KEY_HERE'

### Run semantic search and get scores

In [None]:
# The query is embedded with the same SentenceTransformer model used for the
# articles, so its vector is comparable with the stored embeddings.
search_query = "Articles about Aussie captures"
search_embedding = model.encode(search_query)

# Create the SQL statement.
query_statement = """
    SELECT
        title,
        description,
        genre,
        DOT_PRODUCT(embedding, %(embedding)s) AS score
    FROM news.news_articles
    ORDER BY score DESC
    LIMIT 10
    """

# Execute the SQL statement.
results = pd.DataFrame(db_connection.execute(query_statement, dict(embedding=search_embedding)))
results
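
To make the ranking step concrete, here is a small pure-Python sketch of ordering items by dot product with a query vector, which is what `ORDER BY score DESC` does with the `DOT_PRODUCT` result. The three-dimensional vectors and article names are made up for illustration; real embeddings from the model have many more dimensions.

```python
def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy "embeddings", not real model output.
articles = {
    "cricket": [1.0, 0.0, 0.0],
    "finance": [0.0, 1.0, 0.0],
    "rugby": [0.5, 0.0, 0.5],
}
query = [1.0, 0.0, 0.5]

# Highest dot product first, mirroring ORDER BY score DESC.
ranked = sorted(
    articles,
    key=lambda name: dot_product(articles[name], query),
    reverse=True,
)
print(ranked)
```

A larger dot product means the article vector points in a direction closer to the query vector, so those articles appear first.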

## Hybrid search

This search averages the semantic search score and the keyword search score for each article, then sorts the articles by that combined score.

In [None]:
hyb_query = "Articles about Aussie captures"
hyb_embedding = model.encode(hyb_query)

# Create the SQL statement.
hyb_statement = """
    SELECT
        title,
        description,
        genre,
        DOT_PRODUCT(embedding, %(embedding)s) AS semantic_score,
        MATCH(title, description) AGAINST (%(query)s) AS keyword_score,
        (semantic_score + keyword_score) / 2 AS combined_score
    FROM news.news_articles
    ORDER BY combined_score DESC
    LIMIT 10
    """

# Execute the SQL statement.
hyb_results = pd.DataFrame(db_connection.execute(hyb_statement, dict(embedding=hyb_embedding, query=hyb_query)))
hyb_results
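
The averaging done in the SQL statement can be sketched in plain Python as follows. The rows and their scores are hypothetical values chosen for illustration, not output from the query above, and `combined_score` is a helper name introduced only for this example.

```python
def combined_score(semantic_score, keyword_score):
    """Average of the two scores, mirroring (semantic_score + keyword_score) / 2."""
    return (semantic_score + keyword_score) / 2


# Hypothetical (title, semantic_score, keyword_score) rows.
rows = [
    ("A", 0.75, 0.0),
    ("B", 0.5, 1.0),
    ("C", 0.25, 0.25),
]

# Sort by the combined score, highest first, as the SQL ORDER BY does.
ranked = sorted(rows, key=lambda r: combined_score(r[1], r[2]), reverse=True)
top_titles = [r[0] for r in ranked]
print(top_titles)
```

In this toy data, article B ranks above article A even though A has the higher semantic score, because B also matches the query keywords.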

<div id="singlestore-footer" style="background-color: rgba(194, 193, 199, 0.25); height:2px; margin-bottom:10px"></div>
<div><img src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-logo-grey.png" style="padding: 0px; margin: 0px; height: 24px"/></div>