# Similarity Search with Redis
### Redis as a Vector Database

with Brian Sam-Bodden

## The "Unstructured Data" Problem

- The **balance** of data has changed radically... 
- **~80%** of the data generated by organizations is **Unstructured**<sup>(IDC report, 2020)</sup>
- This percentage is estimated to keep growing <sup>(with a CAGR of 36.5% between 2020 and 2025)</sup>




## But what is "Unstructured" Data?

- Data that does not conform to a **pre-defined** data model
- Data that cannot be easily **"indexed"** by a search engine
- Data is typically **high-dimensional** and **semantically** rich
- Examples include **images**, **videos**, **free-form text**, and **audio**


![data pyramid](./images/data-balance.png)

## Dealing with Unstructured Data

- Unstructured data must be **transformed**
- To deal with the **high-dimensional** nature we extract **"features"**
- Traditional extraction techniques include **labelling**, **tagging**, and **one-hot encoding** 
- The extracted features are commonly encoded as **vectors** 
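As a small illustration of the last two points, here is a minimal sketch of one-hot encoding a categorical feature into a vector. The `one_hot` helper and the `BIKE_TYPES` list are hypothetical names chosen for this example (`Road Bikes` in particular is an invented category), and the code uses only the standard library:

```python
def one_hot(value, vocabulary):
    """Return a list with 1.0 at the position of `value` and 0.0 elsewhere."""
    if value not in vocabulary:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if item == value else 0.0 for item in vocabulary]


# Hypothetical category list, loosely modeled on bike types
BIKE_TYPES = ["Kids Mountain Bikes", "Enduro Bikes", "Road Bikes"]

encoded = one_hot("Enduro Bikes", BIKE_TYPES)
print(encoded)  # [0.0, 1.0, 0.0]
```

Each category adds one dimension, which is one reason hand-built encodings tend to grow into large, sparse vectors.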


## Manual Image Feature Extraction

![manual image feature extraction](./images/image-manual-feature-extraction.png)

## Manual Text Feature Extraction

![manual text feature extraction](./images/text-manual-feature-extraction.png)

## Vectors

- They are a **Numeric representation** of something in **N-dimensional** space
- Can represent **anything**... entire documents, images, video, audio 
- Quantifies **features** or **characteristics** of the item
- More importantly... they are **comparable**

## Vectors

- A Vector is a tuple of one or more **values** called **scalars**
- Each **scalar** represents the measure of a **feature**
- Different frameworks use different data types to represent them:
  - In **NumPy** they are **NumPy arrays** (`np.ndarray`)
  - In **TensorFlow** they are **Tensors** (`tf.Tensor`)
  - In **PyTorch** they are also **Tensors** (`torch.Tensor`)
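To make "comparable" concrete, here is a minimal sketch that treats three bikes as tuples of scalars and finds which one sits closest to a reference bike. The feature names and scores are invented for illustration, and plain Python tuples stand in for the framework types listed above:

```python
import math

# Hypothetical features, each scored from 0.0 to 1.0: (comfort, durability, value)
bike_a = (0.9, 0.7, 0.4)
bike_b = (0.8, 0.6, 0.5)
bike_c = (0.1, 0.9, 0.9)

# math.dist computes the Euclidean distance between two points (Python 3.8+)
closest = min((bike_b, bike_c), key=lambda other: math.dist(bike_a, other))
print(closest)  # (0.8, 0.6, 0.5)
```

The smaller the distance between two vectors, the more alike the items they describe.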

## 3 "Bicycle Reviews" Features as a Vector

![representation of a vector](./images/bicycle_vector.png)

## 🧨 Issues with Feature Engineering

- **Time-consuming**: Might require domain knowledge and expertise.
- **High dimensionality**: Can lead to a high-dimensional feature space.
- **Lack of scalability**: Not easily scalable, more data **==** more people.

## Enter "Vector Embeddings"

- **Machine Learning** / **Deep Learning** have leaped forward in the last decade 
- ML models **outperform** humans in many tasks nowadays
  - 🔥 **CV** (Computer Vision) models excel at detection/classification
  - 🔥 **LLMs** (Large Language Models) have advanced exponentially
- Today, most vectors are **generated** using pre-trained **ML Models**

## Enter "Vector Embeddings"

- ML models can **extract contextual meaning** from unstructured data
- They reduce semantically rich, high-dimensional inputs and **"flatten"** them 
- Flattened representations retain the semantic information and make for ideal vectors
- Once in vector form, the world of **linear algebra** lets us operate on the vectors

## Vector Embeddings from a CV Model

![vector embedding extraction](./images/embedding-extraction.png)

## Enter "Vector Databases"

- Pure Vector Databases **efficiently store** Vectors (along with **metadata**)
- Enable **searching** for vectors using **"similarity"** and **"distance"** metrics
- Enable **hybrid searches** combining vectors and metadata

## Redis as a Vector Database

- Redis provides **Search Capabilities** for structured/semi-structured data
- Redis supports `TEXT`, `NUMERIC`, `TAG`, `GEO` and `GEOSHAPE` fields
- Redis introduces the **`VECTOR`** schema field type for vector support 
- **`VECTOR`** field allows **indexing**, and **querying** vectors in **Hashes** or **JSON**
- Redis's **in-memory** approach provides **fast** and **efficient** vector searches





## Redis as a Vector Database

- Capabilities:
  - **3** distance metrics: **Euclidean**, **Inner Product** and **Cosine**
  - **2** indexing methods: **HNSW** and **Flat**
  - **Hybrid queries** combined with `GEO`, `TAG`, `TEXT` or `NUMERIC`
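For intuition, the three metrics can be sketched in a few lines of plain Python. These are the textbook definitions, written with hypothetical helper names; the exact values Redis reports for each metric may be scaled or offset differently:

```python
import math


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def inner_product(u, v):
    """Sum of the element-wise products of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_distance(u, v):
    """1 minus the cosine of the angle between two non-zero vectors."""
    norm_u = math.sqrt(inner_product(u, u))
    norm_v = math.sqrt(inner_product(v, v))
    return 1.0 - inner_product(u, v) / (norm_u * norm_v)


u = [1.0, 0.0]
v = [0.0, 1.0]  # perpendicular to u
w = [2.0, 0.0]  # same direction as u, twice the length

print(inner_product(u, v))  # 0.0
print(l2_distance(u, w))    # 1.0
print(cosine_distance(u, w))
```

Cosine distance ignores vector length (`u` and `w` point the same way), while Euclidean distance does not.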

## 🛠️ Demo
### Adding Similarity Search to the **Redis Bike Company**

![bikeshop](./images/bike_shop.png)

## Connecting to Redis Stack

* **Redis Stack** instance running locally
* Import `redis-py` client library
* Create a **client connection**

In [3]:
import redis
client = redis.Redis(host = 'localhost', port=6379, decode_responses=True)

* Use the `PING` command to check that Redis is up and running:

In [4]:
client.ping()

True

## Inspect the Bikes

* Use the `JSON.GET` command to retrieve the bike with key `redisbikeco:bike:rbc00067`:

In [5]:
bike067 = client.json().get('redisbikeco:bike:rbc00067')
bike067

{'stockcode': 'RBC00067',
 'model': 'Mars',
 'brand': 'Bicyk',
 'price': 17350,
 'type': 'Kids Mountain Bikes',
 'specs': {'material': 'aluminium', 'weight': 9.5},
 'description': "Kids want to ride with as little weight as possible. Especially on an incline! The Shimano gear system effectively does away with an external cassette, so is super low maintenance in terms of wear and tear. All bikes are great in their own way, but this bike will be one of the best you've ridden."}

## Generating Embeddings with ML

![ML Models for embeddings](./images/target-model-embeddings-redis.png)

## Where to find pre-trained models?

![Model Zoos](./images/model-zoos.png)

## Sentence Transformers

![SBERT](./images/sbert-net.png)

- We use **SentenceTransformers** to **generate embeddings** for the bikes' **descriptions** 
- **Sentence-BERT** (**SBERT**) produces **contextually rich** sentence embeddings
- Embeddings provide **efficient sentence-level** semantic similarity
- Improves tasks like **semantic search** and **text grouping**

## Selecting a suitable pre-trained Model

- We must pick a **suitable model** for **generating embeddings**
- We want to query for bicycles using **short queries** against the **longer** bicycle **descriptions**
- This is referred to as **"Asymmetric Semantic Search"** 
- Used when the **search query** and the **documents** being searched are of a **different nature or structure**

## Selecting a suitable pre-trained Model

- For **asymmetric semantic search** suitable models include pre-trained **MS MARCO** Models
- Optimized for understanding **real-world queries** and producing **relevant responses**
- **Highest performing** MS MARCO model is **`msmarco-distilbert-base-v4`**
  - which is tuned for **cosine-similarity** 

In [6]:
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer('msmarco-distilbert-base-v4')

## Extract the Bike's Description

- Let's extract the `description` into the `sample_description` var:

In [7]:
sample_description = bike067['description']
sample_description

"Kids want to ride with as little weight as possible. Especially on an incline! The Shimano gear system effectively does away with an external cassette, so is super low maintenance in terms of wear and tear. All bikes are great in their own way, but this bike will be one of the best you've ridden."

## Generating an Embedding Vector

- To generate the vector embeddings, we use the `encode` function:

In [8]:
embedding = embedder.encode(sample_description)
VECTOR_DIMENSION = len(embedding)
VECTOR_DIMENSION

768

- Let's take a peek at the first **5** elements of the generated vector:

In [9]:
print(embedding.tolist()[:5])

[-0.5880345106124878, -0.11593937873840332, -0.06424460560083389, 0.5289014577865601, 0.011814583092927933]


## Generate Embeddings for the Bikes' Description

* To vectorize all the descriptions in the database, we will first collect all the Redis keys for the bikes:



In [10]:
keys = sorted(client.keys('redisbikeco:bike:*'))
len(keys)

111

In [11]:
print(keys[:3])

['redisbikeco:bike:rbc00001', 'redisbikeco:bike:rbc00002', 'redisbikeco:bike:rbc00003']


## Generate Embeddings for the Bikes' Description

* With the keys in `keys` we can use the Redis `JSON.MGET` command to retrieve just the `description` field
* We'll store all the descriptions in the `descriptions` variable
* The `encode` method can take a List of text passages to encode

In [12]:
import numpy as np

descriptions = client.json().mget(keys, '$.description')
descriptions = [item for sublist in descriptions for item in sublist]
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()

* Let's check that we've generated the correct number of embedding vectors:

In [13]:
len(embeddings)

111
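The flattening step exists because a JSONPath query such as `$.description` yields a list of matches per key. The sketch below uses simulated data (no Redis connection) and assumes that a missing key comes back as `None`; it pairs each key with its description so the two stay aligned even if an entry has no match:

```python
# Simulated return value of client.json().mget(keys, '$.description')
keys = [
    'redisbikeco:bike:rbc00001',
    'redisbikeco:bike:rbc00002',
    'redisbikeco:bike:rbc00003',
]
raw = [
    ['first description'],
    None,                   # assumption: the key does not exist
    ['third description'],
]

# Keep only the keys that returned a match, together with the matched text
pairs = [(key, matches[0]) for key, matches in zip(keys, raw) if matches]

for key, description in pairs:
    print(key, '->', description)
```

If every document has a description, as in this dataset, the plain flattening comprehension used above gives the same result.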

## Add the embeddings to the JSON documents

- Now we can add the vectorized descriptions to the JSON documents in Redis
- Use the `JSON.SET` command to insert a new field in each of the documents at `$.description_embeddings`
- Use Redis' **pipeline** mode to minimize the round-trip times:

In [14]:
pipeline = client.pipeline()

for key, embedding in zip(keys, embeddings):
    pipeline.json().set(key, '$.description_embeddings', embedding)

pipeline.execute()
print('Vector Embeddings Saved!')

Vector Embeddings Saved!


## Inspect the Bikes' Documents

- Let's inspect one of the vectorized bike documents using the `JSON.GET` command:

In [15]:
import json

print(json.dumps(client.json().get('redisbikeco:bike:rbc00001'), indent=2)) 

{
  "stockcode": "RBC00001",
  "model": "Deimos",
  "brand": "Ergonom",
  "price": 184950,
  "type": "Enduro Bikes",
  "specs": {
    "material": "alloy",
    "weight": 14.0
  },
  "description": "Redesigned for the 2020 model year, this bike impressed our testers and is the best all-around trail bike we've ever tested. It has a lightweight frame and all-carbon fork, with cables routed internally. It's for the rider who wants both efficiency and capability.",
  "description_embeddings": [
    0.28845763206481934,
    -0.4658890068531037,
    0.4761373996734619,
    0.13525675237178802,
    0.2378234565258026,
    -0.4099092781543732,
    -0.12295658141374588,
    -0.8986800909042358,
    0.2880902588367462,
    -0.763296365737915,
    0.4726661145687103,
    -0.1130569502711296,
    -0.17979376018047333,
    -0.14443010091781616,
    0.678691029548645,
    -0.7022264003753662,
    0.3896012008190155,
    -0.16891467571258545,
    0.28193211555480957,
    0.3375028967857361,
    0.79082

## Create Search Index for the Bikes Collection

- To define the index we'll import the `IndexDefinition` and the `IndexType`
- To define the schema fields we'll use the classes `TagField`, `TextField`, `NumericField`, and **`VectorField`**
- We'll create an index named **`idx:bikes_vss`**

In [16]:
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.field import TagField, TextField, NumericField, VectorField
from redis.commands.search.query import Query

INDEX_NAME = 'idx:bikes_vss'
DOC_PREFIX = 'redisbikeco:bike:'

## The Search Index Schema

In [17]:
try:
    client.ft(INDEX_NAME).info()
    print('Index already exists!')
except redis.exceptions.ResponseError:
    # FT.INFO fails when the index does not exist yet, so we create it
    schema = (
        TextField('$.model', no_stem=True, as_name='model'),  
        TextField('$.brand', no_stem=True, as_name='brand'),
        NumericField('$.price', as_name='price'),
        TagField('$.type', as_name='type'),
        TextField('$.description', as_name='description'),
        VectorField('$.description_embeddings', 'FLAT', {
          'TYPE': 'FLOAT32',
          'DIM': VECTOR_DIMENSION,
          'DISTANCE_METRIC': 'COSINE',
        },  as_name='vector'),
    )

    # index Definition
    definition = IndexDefinition(prefix=[DOC_PREFIX], index_type=IndexType.JSON)

    # create Index
    client.ft(INDEX_NAME).create_index(fields=schema, definition=definition)

## `VECTOR` Schema Field Definition

* **Indexing method**: `FLAT` **(brute-force indexing)** or `HNSW` **(Hierarchical Navigable Small World)**
* **Vector Type**: `FLOAT32` or `FLOAT64`.
* **Vector Dimension**: The length or dimension of our embeddings (`768`).
* **Distance Metric**: `L2` **(Euclidean distance)**, `IP` **(Inner Product)**, or `COSINE` **(Cosine Distance)** 
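For comparison with the `FLAT` definition used in this demo, here is a sketch of what the attributes for an `HNSW` field might look like. `M` and `EF_CONSTRUCTION` are optional HNSW tuning attributes; the values shown are illustrative, not recommendations:

```python
VECTOR_DIMENSION = 768  # length of the embeddings produced earlier

hnsw_attributes = {
    'TYPE': 'FLOAT32',
    'DIM': VECTOR_DIMENSION,
    'DISTANCE_METRIC': 'COSINE',
    'M': 16,                 # max outgoing edges per graph node (illustrative)
    'EF_CONSTRUCTION': 200,  # candidates considered while building the graph (illustrative)
}

# With redis-py, this dictionary would take the place of the FLAT attributes:
# VectorField('$.description_embeddings', 'HNSW', hnsw_attributes, as_name='vector')
print(hnsw_attributes)
```

`FLAT` compares the query against every vector, which is exact and works well for a collection this small; `HNSW` trades some accuracy for speed on larger collections.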

## Check the state of the Index

- `FT.CREATE` creates the index
- The **indexing process** is automatically started in the **background**
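- For a collection this small, the JSON documents are indexed and ready to be searched almost immediately
- To check progress, we use the **`FT.INFO`** command (a `percent_indexed` value below 100% means indexing is still running):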

In [18]:
info = client.ft(INDEX_NAME).info()

num_docs = info['num_docs']
indexing_failures = info['hash_indexing_failures']
total_indexing_time = info['total_indexing_time']
percent_indexed = float(info['percent_indexed']) * 100


print(f"{num_docs} docs ({percent_indexed}%) indexed w/ {indexing_failures} failures in {float(total_indexing_time):.2f} msecs")

21 docs (18.103448275862068%) indexed w/ 0 failures in 1.97 msecs
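The sample output above was captured while background indexing was still running. If later steps need the complete index, one option is to poll `FT.INFO` until it reports completion. This is a minimal sketch: `wait_for_index` and its parameters are hypothetical names, not part of redis-py, and it only assumes an object whose `info()` returns a `percent_indexed` value between 0 and 1:

```python
import time


def wait_for_index(index, timeout=10.0, poll_interval=0.1):
    """Poll index.info() until percent_indexed reaches 1 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        info = index.info()
        if float(info['percent_indexed']) >= 1.0:
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError('index is still being built')
        time.sleep(poll_interval)


# Usage with the names from this notebook:
# info = wait_for_index(client.ft(INDEX_NAME))
```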


## Structured Data Searches with Redis

- Let's test the non-vector part of the index first:

- Retrieve all bikes where the `brand` is `Peaknetic`

In [19]:
query = (
    Query('@brand:Peaknetic').return_fields('id', 'brand', 'model', 'price')
)
client.ft(INDEX_NAME).search(query).docs

[Document {'id': 'redisbikeco:bike:rbc00041', 'payload': None, 'brand': 'Peaknetic', 'model': 'Orcus', 'price': '229872'},
 Document {'id': 'redisbikeco:bike:rbc00108', 'payload': None, 'brand': 'Peaknetic', 'model': 'Soothe Electric bike', 'price': '160641'},
 Document {'id': 'redisbikeco:bike:rbc00083', 'payload': None, 'brand': 'Peaknetic', 'model': 'Vesta', 'price': '17204'},
 Document {'id': 'redisbikeco:bike:rbc00026', 'payload': None, 'brand': 'Peaknetic', 'model': 'Nereid', 'price': '74814'},
 Document {'id': 'redisbikeco:bike:rbc00069', 'payload': None, 'brand': 'Peaknetic', 'model': 'Hiiaka', 'price': '153895'}]

- Find all `Peaknetic` bikes priced at or below `10000`

In [20]:
query = (
    Query('@brand:Peaknetic @price:[0 10000]').return_fields('id', 'brand', 'model', 'price')
)
client.ft(INDEX_NAME).search(query).docs

[Document {'id': 'redisbikeco:bike:rbc00058', 'payload': None, 'brand': 'Peaknetic', 'model': 'Quaoar', 'price': '7764'}]

## Semantic Queries

- We want to query for bikes using short query prompts
- Let's put our queries in a list so we can vectorize them and execute them in bulk:

In [21]:
queries = [
    'Bike for small kids',
    'Best Mountain bikes for kids',
    'Cheap Mountain bike for kids',
    'Female specific mountain bike',
    'Road bike for beginners',
    'Commuter bike for people over 60',
    'Comfortable commuter bike',
    'Good bike for college students',
    'Mountain bike for beginners',
    'Vintage bike',
    'Comfortable city bike'
]

In [22]:
encoded_queries = embedder.encode(queries)
len(encoded_queries)

11

## Visualizing Embeddings

- The image below was generated using **t-distributed stochastic neighbor embedding** (**t-SNE**) and a small subset of the embeddings
- **t-SNE** is a dimensionality reduction technique that maps the higher-dimensional embeddings to a 2-D or 3-D space

![TSNE Visualization](./images/embeddings-tsne.png)

## Constructing a "Pure KNN" VSS Query

- We'll start with a **K-nearest neighbors** (KNN) query 
- The goal of KNN is to find the **most similar** items to a given query item
- KNN calculates the **distance** between the query vector and each vector in the database
- Returns 'K' items with the **smallest** distances
- These are considered to be the most similar items
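The steps above can be sketched as a brute-force search in plain Python. This illustrates the idea only, not how Redis implements it; the `knn` helper, the keys, and the 2-D vectors are invented for the example:

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def knn(query_vector, vectors, k):
    """Return the k (distance, key) pairs with the smallest distances."""
    scored = [(cosine_distance(query_vector, vec), key) for key, vec in vectors.items()]
    scored.sort()  # smallest distance first
    return scored[:k]


# Hypothetical 2-D vectors keyed by document id
vectors = {
    'bike:1': [1.0, 0.0],
    'bike:2': [0.9, 0.1],
    'bike:3': [0.0, 1.0],
    'bike:4': [-1.0, 0.0],
}

nearest = knn([1.0, 0.0], vectors, k=2)
print([key for _, key in nearest])  # ['bike:1', 'bike:2']
```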

## Constructing a "Pure KNN" VSS Query

In [23]:
query = (
    Query('(*)=>[KNN 3 @vector $query_vector AS vector_score]')
     .sort_by('vector_score')
     .return_fields('vector_score', 'id', 'brand', 'model', 'description')
     .dialect(2)
)

- The syntax for KNN queries is `(*)=>[<vector_similarity_query>]` 
  - where the `(*)` (the `*` meaning all) is the filter query for the search engine.
  - `$query_vector` represents the query parameter we'll use to pass the vectorized query prompt.
  - results are sorted by `vector_score`
  - Query returns the `vector_score`, the `id` of the matched documents, the `$.brand`, `$.model`, and `$.description`
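The `$query_vector` parameter is passed as raw bytes rather than as a list of numbers. The notebook builds those bytes with NumPy; the standard-library sketch below shows the same idea for a `FLOAT32` field, assuming a little-endian layout (which matches what NumPy produces on common hardware). The `to_float32_blob` helper is a hypothetical name:

```python
import struct


def to_float32_blob(vector):
    """Pack a sequence of floats as consecutive little-endian 32-bit floats."""
    return struct.pack(f'<{len(vector)}f', *vector)


blob = to_float32_blob([0.5, -1.0, 2.0])
print(len(blob))  # 12, i.e. 3 dimensions x 4 bytes each
```

For the index defined earlier, the blob length would therefore be `VECTOR_DIMENSION * 4` bytes.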

## 🩼 Pretty-printing Query Results

- We want to run the queries in bulk 
- Visualize the results in a nice table
- We've added a utility function `create_query_table`

In [37]:
import pandas as pd
from IPython.display import display, HTML

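def create_query_table(query, queries, encoded_queries, extra_params=None):
    """Run `query` once per encoded prompt and display the matches as an HTML table."""
    extra_params = extra_params or {}  # avoid a shared mutable default argument
    results_list = []
    for i, encoded_query in enumerate(encoded_queries):
        # The query vector is sent to Redis as raw FLOAT32 bytes
        query_params = {'query_vector': np.array(encoded_query, dtype=np.float32).tobytes()} | extra_params
        result_docs = client.ft(INDEX_NAME).search(query, query_params).docs
        for doc in result_docs:
            # Convert the cosine distance returned by Redis into a similarity score
            vector_score = round(1 - float(doc.vector_score), 2)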
            results_list.append({
                'query': queries[i], 
                'score': vector_score, 
                'id': doc.id,
                'brand': doc.brand,
                'model': doc.model,
                'description': doc.description
            })

    # Pretty-print the table
    queries_table = pd.DataFrame(results_list)
    queries_table.sort_values(by=['query', 'score'], ascending=[True, False], inplace=True)
    queries_table['query'] = queries_table.groupby('query')['query'].transform(lambda x: [x.iloc[0]] + ['']*(len(x)-1))
    queries_table['description'] = queries_table['description'].apply(lambda x: (x[:497] + '...') if len(x) > 500 else x)
    html = queries_table.to_html(index=False, classes='striped_table')  
    display(HTML(html))

## 🏃🏾‍♀️Running the Query

- With the Query prepared in `query`
- and the query prompts in `queries` 
- and the encoded queries in `encoded_queries`
- we can use the `create_query_table` function to generate a table of results:

## 🏃🏾‍♀️Running the Query

In [38]:
create_query_table(query, queries, encoded_queries)

query,score,id,brand,model,description
Best Mountain bikes for kids,0.66,redisbikeco:bike:rbc00039,ScramBikes,Mercury,"This bike is an entry-level kids mountain bike that is a good choice if your MTB enthusiast is just taking to the trails and wants good suspension and easy gearing, without the cost of some more expensive models. A set of powerful Shimano hydraulic disc brakes provide ample stopping ability. If you're after a budget option, this is one of the best bikes you could get."
,0.63,redisbikeco:bike:rbc00079,7th Generation,Mars,"This bike is an entry-level kids mountain bike that is a good choice if your MTB enthusiast is just taking to the trails and wants good suspension and easy gearing, without the cost of some more expensive models. The Shimano Claris 8-speed groupset gives plenty of gear range to tackle hills and there's room for mudguards and a rack too. All bikes are great in their own way, but this bike will be one of the best you've ridden."
,0.63,redisbikeco:bike:rbc00040,Velorim,Polydeuces,"This bike gives kids aged six years and older a durable and uberlight mountain bike for their first experience on tracks and easy cruising through forests and fields. The Plush saddle softens over time with use. The included Seatpost, however, is easily adjustable and adds to this bike's fantastic rating, as do the hydraulic disc brakes from Tektro. If you're after a budget option, this is one of the best bikes you could get."
Bike for small kids,0.52,redisbikeco:bike:rbc00101,Velorim,Jigger,"Small and powerful, the Jigger is the best ride for the smallest of tikes! This is the tiniest kids’ pedal bike on the market available without a coaster brake, the Jigger is the vehicle of choice for the rare tenacious little rider raring to go. We say rare because this smokin’ little bike is not ideal for a nervous first-time rider, but it’s a true giddy up for a true speedster. The Jigger is a 12 inch lightweight kids bicycle and it will meet your little one’s need for speed. It’s a single..."
,0.48,redisbikeco:bike:rbc00033,nHill,Telesto,"Small and powerful, this bike is the best ride for the smallest of tikes. The carefully crafted 50-34 tooth chainset and 11-32 tooth cassette give an easy-on-the-legs bottom gear for climbing, and the high-quality Vittoria Zaffiro tires give balance and grip. It includes a low-step frame, our memory foam seat, bump-resistant shocks and conveniently placed thumb throttle. If you're after a budget option, this is one of the best bikes you could get."
,0.47,redisbikeco:bike:rbc00056,nHill,Vanth,"The innovative braking system on this bike has been a game changer in the kids' bike world. It has a lightweight frame and all-carbon fork, with cables routed internally. If you're after a budget option, this is one of the best bikes you could get."
Cheap Mountain bike for kids,0.51,redisbikeco:bike:rbc00035,nHill,Ncc1702,"This bike is an entry-level kids mountain bike that is a good choice if your MTB enthusiast is just taking to the trails and wants good suspension and easy gearing, without the cost of some more expensive models. It has a lightweight frame and all-carbon fork, with cables routed internally. It comes fully assembled (no convoluted instructions!) and includes a sturdy helmet at no cost."
,0.49,redisbikeco:bike:rbc00103,Nord,Chook air 5,"The Chook Air 5 gives kids aged six years and older a durable and uberlight mountain bike for their first experience on tracks and easy cruising through forests and fields. The lower top tube makes it easy to mount and dismount in any situation, giving your kids greater safety on the trails. The Chook Air 5 is the perfect intro to mountain biking."
,0.49,redisbikeco:bike:rbc00079,7th Generation,Mars,"This bike is an entry-level kids mountain bike that is a good choice if your MTB enthusiast is just taking to the trails and wants good suspension and easy gearing, without the cost of some more expensive models. The Shimano Claris 8-speed groupset gives plenty of gear range to tackle hills and there's room for mudguards and a rack too. All bikes are great in their own way, but this bike will be one of the best you've ridden."
Comfortable city bike,0.52,redisbikeco:bike:rbc00053,Velorim,Pallas,"Urban riding, gentle off-road ebike. The Plush saddle softens over time with use. The included Seatpost, however, is easily adjustable and adds to this bike's fantastic rating, as do the hydraulic disc brakes from Tektro. If you're after a budget option, this is one of the best bikes you could get."


## Hybrid Queries

- "Pure KNN" queries evaluate a query against the **whole space of vectors**
- The larger the collection, the more **computationally expensive** the search
- Unstructured data does not live in isolation
- Rich search experiences must allow searching all data (structured and unstructured) 

## Hybrid Queries

- For example, users might arrive at your search interface with a brand preference in mind
- Redis can use this information to pre-filter the search space
- In the hybrid query definition below:
  - we pre-filter using the `brand` to consider only `Peaknetic` brand bikes 
  - previously, our primary filter query was `(*)`, i.e. everything
  - we narrow the search space using `(@brand:Peaknetic)` before the KNN query

In [26]:
hybrid_query = (
    Query('(@brand:Peaknetic)=>[KNN 3 @vector $query_vector AS vector_score]')
     .sort_by('vector_score')
     .return_fields('vector_score', 'id', 'brand', 'model', 'description')
     .dialect(2)
)
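Conceptually, the pre-filter shrinks the candidate set before any distances are compared. The sketch below shows that idea in plain Python with invented documents and a hypothetical `hybrid_knn` helper; Redis decides its own execution strategy for hybrid queries, so this is not a description of its internals:

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


# Hypothetical documents with a brand and a 2-D vector
bikes = [
    {'id': 'bike:1', 'brand': 'Peaknetic', 'vector': [0.2, 0.8]},
    {'id': 'bike:2', 'brand': 'Velorim', 'vector': [1.0, 0.0]},
    {'id': 'bike:3', 'brand': 'Peaknetic', 'vector': [0.9, 0.1]},
]


def hybrid_knn(query_vector, docs, brand, k):
    """Filter by brand first, then rank the survivors by distance."""
    candidates = [doc for doc in docs if doc['brand'] == brand]
    candidates.sort(key=lambda doc: cosine_distance(query_vector, doc['vector']))
    return [doc['id'] for doc in candidates[:k]]


result = hybrid_knn([1.0, 0.0], bikes, 'Peaknetic', k=1)
print(result)  # ['bike:3']
```

Note that `bike:2` is the closest vector overall, but it is never considered because it fails the brand filter.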

## 🏃🏾‍♀️Running the Query

In [27]:
create_query_table(hybrid_query, queries, encoded_queries)

query,score,id,brand,model,description
Best Mountain bikes for kids,0.5,redisbikeco:bike:rbc00022,Peaknetic,Quaoar,"This bike gives kids aged six years and older a durable and uberlight mountain bike for their first experience on tracks and easy cruising through forests and fields. At this price point, you get a Shimano 105 hydraulic groupset with a RS510 crank set. The wheels have had a slight upgrade for 2022, so you're now getting DT Swiss R470 rims with the Formula hubs. It comes fully assembled (no convoluted instructions!) and includes a sturdy helmet at no cost."
,0.48,redisbikeco:bike:rbc00026,Peaknetic,Nereid,"Redesigned for the 2020 model year, this bike impressed our testers and is the best all-around trail bike we've ever tested. The Shimano Claris 8-speed groupset gives plenty of gear range to tackle hills and there's room for mudguards and a rack too. That said, we feel this bike is a fantastic option for the rider seeking the versatility that this highly adjustable bike provides."
,0.47,redisbikeco:bike:rbc00082,Peaknetic,Jupiter,"Redesigned for the 2020 model year, this bike impressed our testers and is the best all-around trail bike we've ever tested. The carefully crafted 50-34 tooth chainset and 11-32 tooth cassette give an easy-on-the-legs bottom gear for climbing, and the high-quality Vittoria Zaffiro tires give balance and grip.It includes a low-step frame, our memory foam seat, bump-resistant shocks and conveniently placed thumb throttle. Put it all together and you get a bike that helps redefine what can be do..."
Bike for small kids,0.41,redisbikeco:bike:rbc00093,Peaknetic,Titan,"Easy, intuitive, and very lightweight, these bikes are carefully designed to make bike riding as natural as possible. The carefully crafted 50-34 tooth chainset and 11-32 tooth cassette give an easy-on-the-legs bottom gear for climbing, and the high-quality Vittoria Zaffiro tires give balance and grip. It includes a low-step frame, our memory foam seat, bump-resistant shocks and conveniently placed thumb throttle. It's for the rider who wants both efficiency and capability."
,0.39,redisbikeco:bike:rbc00058,Peaknetic,Quaoar,"For shy or agressive riders, paved or dirt trails, this bike boasts kid-friendly geometry and strong quality parts at a minimal price point. The carefully crafted 50-34 tooth chainset and 11-32 tooth cassette give an easy-on-the-legs bottom gear for climbing, and the high-quality Vittoria Zaffiro tires give balance and grip.It includes a low-step frame, our memory foam seat, bump-resistant shocks and conveniently placed thumb throttle. Put it all together and you get a bike that helps redefin..."
,0.37,redisbikeco:bike:rbc00108,Peaknetic,Soothe Electric bike,"The Soothe is an everyday electric bike, from the makers of Exercycle bikes, that conveys style while you get around the city. The Soothe lives up to its name by keeping your posture upright and relaxed for the ride ahead, keeping those aches and pains from riding at bay. It includes a low-step frame , our memory foam seat, bump-resistant shocks and conveniently placed thumb throttle."
Cheap Mountain bike for kids,0.39,redisbikeco:bike:rbc00022,Peaknetic,Quaoar,"This bike gives kids aged six years and older a durable and uberlight mountain bike for their first experience on tracks and easy cruising through forests and fields. At this price point, you get a Shimano 105 hydraulic groupset with a RS510 crank set. The wheels have had a slight upgrade for 2022, so you're now getting DT Swiss R470 rims with the Formula hubs. It comes fully assembled (no convoluted instructions!) and includes a sturdy helmet at no cost."
,0.32,redisbikeco:bike:rbc00058,Peaknetic,Quaoar,"For shy or agressive riders, paved or dirt trails, this bike boasts kid-friendly geometry and strong quality parts at a minimal price point. The carefully crafted 50-34 tooth chainset and 11-32 tooth cassette give an easy-on-the-legs bottom gear for climbing, and the high-quality Vittoria Zaffiro tires give balance and grip.It includes a low-step frame, our memory foam seat, bump-resistant shocks and conveniently placed thumb throttle. Put it all together and you get a bike that helps redefin..."
,0.3,redisbikeco:bike:rbc00083,Peaknetic,Vesta,"For shy or aggressive riders, paved or dirt trails, this bike boasts kid-friendly geometry and strong quality parts at a minimal price point. The Plush saddle softens over time with use. The included Seatpost, however, is easily adjustable and adds to this bike's fantastic rating, as do the hydraulic disc brakes from Tektro. Put it all together and you get a bike that helps redefine what can be done for this price."
Comfortable city bike,0.46,redisbikeco:bike:rbc00041,Peaknetic,Orcus,"A city eBike that could double as a short-haul commuter. The hydraulic disc brakes provide powerful and modulated braking even in wet conditions, whilst the 3x8 drivetrain offers a huge choice of gears. This is the bike for the rider who wants trail manners with low fuss ownership."


## Range Queries

- Range queries retrieve items within a specific **distance** from a query vector
- We consider **"distance"** to be the **measure of similarity** 
- The smaller the distance, the more similar the items
- For example, to return the top `4` bikes within a `0.55` radius of the query: 

```
FT.SEARCH idx:bikes_vss 
  @vector:[VECTOR_RANGE $range $query_vector]=>{$YIELD_DISTANCE_AS: vector_score} 
  SORTBY vector_score ASC
  LIMIT 0 4 
  DIALECT 2 
  PARAMS 4 range 0.55 query_vector "\x9d|\x99>bV#\xbfm\x86\x8a\xbd\xa7~$?*...."
```

## Range Queries

- In Python:

In [28]:
range_query = (
    Query('@vector:[VECTOR_RANGE $range $query_vector]=>{$YIELD_DISTANCE_AS: vector_score}') 
    .sort_by('vector_score')
    .return_fields('vector_score', 'id', 'brand', 'model', 'description')
    .paging(0, 4)
    .dialect(2)
)
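As a rough mental model, a range query keeps every vector whose distance falls within the radius, sorts the hits, and then applies the paging limit. The plain-Python sketch below uses invented vectors and a hypothetical `range_search` helper; treating the boundary as inclusive is an assumption of this example:

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def range_search(query_vector, vectors, radius, limit):
    """Return up to `limit` (distance, key) pairs with distance <= radius."""
    hits = []
    for key, vec in vectors.items():
        distance = cosine_distance(query_vector, vec)
        if distance <= radius:  # inclusive boundary is an assumption
            hits.append((distance, key))
    hits.sort()  # closest first, like SORTBY vector_score ASC
    return hits[:limit]


# Hypothetical 2-D vectors keyed by document id
vectors = {
    'bike:1': [1.0, 0.0],
    'bike:2': [0.9, 0.1],
    'bike:3': [0.5, 0.5],
    'bike:4': [0.0, 1.0],
}

hits = range_search([1.0, 0.0], vectors, radius=0.55, limit=4)
print([key for _, key in hits])  # ['bike:1', 'bike:2', 'bike:3']
```

Unlike KNN, the number of results is not fixed: only three of the four vectors fall inside the radius here, so only three are returned.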

## 🏃🏾‍♀️Running the Query

In [29]:
create_query_table(range_query, queries[:1], encoded_queries[:1], {'range': 0.55})

query,score,id,brand,model,description
Bike for small kids,0.52,redisbikeco:bike:rbc00101,Velorim,Jigger,"Small and powerful, the Jigger is the best ride for the smallest of tikes! This is the tiniest kids’ pedal bike on the market available without a coaster brake, the Jigger is the vehicle of choice for the rare tenacious little rider raring to go. We say rare because this smokin’ little bike is not ideal for a nervous first-time rider, but it’s a true giddy up for a true speedster. The Jigger is a 12 inch lightweight kids bicycle and it will meet your little one’s need for speed. It’s a single..."
,0.48,redisbikeco:bike:rbc00033,nHill,Telesto,"Small and powerful, this bike is the best ride for the smallest of tikes. The carefully crafted 50-34 tooth chainset and 11-32 tooth cassette give an easy-on-the-legs bottom gear for climbing, and the high-quality Vittoria Zaffiro tires give balance and grip. It includes a low-step frame, our memory foam seat, bump-resistant shocks and conveniently placed thumb throttle. If you're after a budget option, this is one of the best bikes you could get."
,0.47,redisbikeco:bike:rbc00056,nHill,Vanth,"The innovative braking system on this bike has been a game changer in the kids' bike world. It has a lightweight frame and all-carbon fork, with cables routed internally. If you're after a budget option, this is one of the best bikes you could get."
,0.46,redisbikeco:bike:rbc00057,nHill,Nereid,"Kids want to ride with as little weight as possible. Especially on an incline! The carefully crafted 50-34 tooth chainset and 11-32 tooth cassette give an easy-on-the-legs bottom gear for climbing, and the high-quality Vittoria Zaffiro tires give balance and grip.It includes a low-step frame , our memory foam seat, bump-resistant shocks and conveniently placed thumb throttle. All bikes are great in their own way, but this bike will be one of the best you've ridden."


## Visualizing High-dimensional vectors with dimensionality reduction

In [30]:
%%html
<iframe src="https://projector.tensorflow.org/" width="1920" height="540"></iframe>

## Recap

- The tools and techniques to unlock the value in **Unstructured Data** have evolved greatly...
- Redis's **in-memory first** approach makes it a strong fit for vector similarity searches
- Redis natively supports vector searches over **Hashes** and **JSON**
- Redis combines the power of searching over semi-structured and unstructured data
  - with the performance you've come to expect from Redis 



## Learn more at Redis University

## `https://university.redis.com`

![Redis U](./images/redis_university.png)

## Thank You!

![Simon and BSB](./images/simon_and_bsb.png)