### Install the required Python packages
Create a Python virtual environment and install the following dependencies using pip:

- redis: You can find further details about the redis-py client library in the clients section of this documentation site.
- pandas: Pandas is a data analysis library.
- sentence-transformers: You will use the SentenceTransformers framework to generate embeddings from full-text descriptions.
- tabulate: pandas uses tabulate to render tables as Markdown.
- requests: used to download the demo dataset.

You will also need the following imports in your Python code:

In [1]:
import json
import time

import numpy as np
import pandas as pd
import requests
import redis
from redis.commands.search.field import (
    NumericField, 
    TagField,
    TextField,
    VectorField
)

from redis.commands.search.indexDefinition import IndexDefinition
from redis.commands.search.query import Query

from sentence_transformers import SentenceTransformer



In [2]:
client = redis.Redis(host="localhost", port=6379, decode_responses=True)

### 1. Fetch the demo data
First, fetch the demo dataset as a JSON array:

In [3]:
URL = ("https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started/main/data/bikes.json")
response = requests.get(URL, timeout=10)
response.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
bikes = response.json()

The json.dumps function serializes a Python object (here, the first bike dictionary) into a JSON-formatted string; indent=2 pretty-prints it:

In [4]:
json.dumps(bikes[0], indent=2)

'{\n  "model": "Jigger",\n  "brand": "Velorim",\n  "price": 270,\n  "type": "Kids bikes",\n  "specs": {\n    "material": "aluminium",\n    "weight": "10"\n  },\n  "description": "Small and powerful, the Jigger is the best ride for the smallest of tikes! This is the tiniest kids\\u2019 pedal bike on the market available without a coaster brake, the Jigger is the vehicle of choice for the rare tenacious little rider raring to go. We say rare because this smokin\\u2019 little bike is not ideal for a nervous first-time rider, but it\\u2019s a true giddy up for a true speedster. The Jigger is a 12 inch lightweight kids bicycle and it will meet your little one\\u2019s need for speed. It\\u2019s a single speed bike that makes learning to pump pedals simple and intuitive. It even has  a handle in the bottom of the saddle so you can easily help your child during training!  The Jigger is among the most lightweight children\\u2019s bikes on the planet. It is designed so that 2-3 year-olds fit com
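To make the direction of the conversion concrete, here is a minimal sketch using a small, made-up bike dictionary (not taken from the dataset): json.dumps turns a Python object into a string, and json.loads parses it back. It also shows why the curly apostrophes in the output above appear as \u2019: json.dumps escapes non-ASCII characters unless you pass ensure_ascii=False.

```python
import json

# Hypothetical record shaped like the entries in the demo dataset
sample_bike = {"model": "Example", "brand": "Acme", "price": 500, "specs": {"weight": "9"}}

serialized = json.dumps(sample_bike, indent=2)  # dict -> JSON-formatted str
restored = json.loads(serialized)               # JSON str -> dict
print(type(serialized).__name__)  # str
print(restored == sample_bike)    # True

# Non-ASCII characters are escaped by default (ensure_ascii=True)
print(json.dumps("kids\u2019 bike"))                      # "kids\u2019 bike"
print(json.dumps("kids\u2019 bike", ensure_ascii=False))  # "kids’ bike"
```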

### 2. Store the demo data in Redis
In Redis, pipelines let you batch multiple commands and send them to the server in one go. This improves performance by reducing the number of network round trips.

enumerate(bikes, start=1) yields both the index (i) and the bike object from the bikes list; start=1 makes the index start at 1 instead of 0.
The Redis key for each bike is built from that index: f"bikes:{i:03}" formats i as a three-digit number padded with zeros, producing keys such as "bikes:001", "bikes:002", and so on.


Inside the loop, pipeline.json().set() queues a JSON.SET command (provided by the RedisJSON module) that stores a JSON document in Redis.
Its arguments are the key (redis_key), the JSONPath "$" (the root of the document), and the value, so each bike object is stored as a complete JSON document under its own key (bikes:001, bikes:002, etc.).

After all the JSON.SET commands have been queued, pipeline.execute() sends the whole batch to Redis.
res contains the results of the commands in the order they were added to the pipeline. By default, client.pipeline() also wraps the batch in a MULTI/EXEC transaction; pass transaction=False if you only need batching.

In [5]:
pipeline = client.pipeline()
for i, bike in enumerate(bikes, start=1):
    redis_key = f"bikes:{i:03}"
    pipeline.json().set(redis_key, "$", bike)
res = pipeline.execute()
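The key naming scheme can be checked without a Redis server. This sketch uses a hypothetical three-item list in place of bikes and only builds the keys that the loop above would write to:

```python
# Hypothetical stand-in for the downloaded bikes list
bikes_sample = [{"model": "A"}, {"model": "B"}, {"model": "C"}]

# Same key format as the pipeline loop: 1-based index, zero-padded to 3 digits
sample_keys = [f"bikes:{i:03}" for i, _ in enumerate(bikes_sample, start=1)]
print(sample_keys)  # ['bikes:001', 'bikes:002', 'bikes:003']

# Zero padding makes lexicographic order match numeric order (up to 999 items)
print(sorted(["bikes:010", "bikes:002"]))  # ['bikes:002', 'bikes:010']
```

The padding matters later, when the keys are sorted as strings: without it, "bikes:10" would sort before "bikes:2".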


Once the data is loaded, you can retrieve a specific attribute from one of the JSON documents using a JSONPath expression. Because $.model is a JSONPath query, the result is a list of matches:

In [6]:
res = client.json().get("bikes:001", "$.model")
res

['Jigger']

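### 3. Select a text embedding model
Hugging Face has a large catalog of text embedding models that can be served locally through the SentenceTransformers framework. Here we use msmarco-distilbert-base-v4, a DistilBERT model trained on the MS MARCO passage-ranking dataset. Models of this kind are built for semantic search, matching short queries against longer passages, which suits searching bike descriptions.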

In [7]:
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer('msmarco-distilbert-base-v4')

### 4. Generate text embeddings
Retrieve all the Redis keys with the prefix bikes: and sort them so their order is predictable:

In [8]:
keys = sorted(client.keys("bikes:*"))
keys

['bikes:001',
 'bikes:002',
 'bikes:003',
 'bikes:004',
 'bikes:005',
 'bikes:006',
 'bikes:007',
 'bikes:008',
 'bikes:009',
 'bikes:010',
 'bikes:011']
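KEYS walks the whole keyspace in a single blocking call, which is fine for 11 documents but is discouraged on large production databases. A minimal sketch of an alternative, assuming a redis-py client like the one created above, uses scan_iter, which iterates the keyspace incrementally with SCAN. The helper name get_sorted_keys is hypothetical:

```python
def get_sorted_keys(client, pattern="bikes:*", count=100):
    """Return all keys matching pattern, collected incrementally with SCAN.

    client is assumed to be a redis-py client (redis.Redis); count is a hint
    for how much work Redis does per SCAN call, not a limit on the results.
    """
    # scan_iter issues repeated SCAN calls until the cursor returns to 0.
    # SCAN may return the same key more than once, so deduplicate with set().
    return sorted(set(client.scan_iter(match=pattern, count=count)))
```

Usage in this notebook would be keys = get_sorted_keys(client).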

Use the keys as input to the JSON.MGET command, along with the $.description field, to collect the descriptions in a list. Then, pass the list of descriptions to the .encode() method:

In [9]:
descriptions = client.json().mget(keys, "$.description")
# Each key returns a list of JSONPath matches, so flatten into one list of strings
descriptions = [item for sublist in descriptions for item in sublist]
# Reuse the model loaded in step 3 rather than loading it a second time
embedder

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
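JSON.MGET with a JSONPath returns one list of matches per key, which is why the cell above flattens the result. A sketch with hypothetical data showing the shape before and after, plus an equivalent itertools.chain formulation:

```python
from itertools import chain

# Hypothetical JSON.MGET result for 3 keys with the path "$.description":
# each key contributes a list of matches (one match per document here)
mget_result = [["First description"], ["Second description"], ["Third description"]]

flat = [item for sublist in mget_result for item in sublist]
flat_chain = list(chain.from_iterable(mget_result))

print(flat)                # ['First description', 'Second description', 'Third description']
print(flat == flat_chain)  # True
```

The flattening assumes every key exists and has exactly one description. JSON.MGET returns a null reply (None in Python) for a missing key, which would make the comprehension fail and, if filtered out silently, would misalign the later zip(keys, embeddings).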

In [10]:
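# Encode all descriptions in one batch and convert to plain lists of float32 values for JSON storage
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
VECTOR_DIMENSION = len(embeddings[0])  # 768 for this model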

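Store each embedding in its JSON document under a new description_embeddings field, again batching the writes in a pipeline:

In [11]: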
pipeline = client.pipeline()
for key, embedding in zip(keys, embeddings):
    pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()


[True, True, True, True, True, True, True, True, True, True, True]

In [12]:
res = client.json().get("bikes:010")
res

{'model': 'Summit',
 'brand': 'nHill',
 'price': 1200,
 'type': 'Mountain Bike',
 'specs': {'material': 'alloy', 'weight': '11.3'},
 'description': 'This budget mountain bike from nHill performs well both on bike paths and on the trail. The fork with 100mm of travel absorbs rough terrain. Fat Kenda Booster tires give you grip in corners and on wet trails. The Shimano Tourney drivetrain offered enough gears for finding a comfortable pace to ride uphill, and the Tektro hydraulic disc brakes break smoothly. Whether you want an affordable bike that you can take to work, but also take trail riding on the weekends or you’re just after a stable, comfortable ride for the bike path, the Summit gives a good value for money.',
 'description_embeddings': [-0.5381148457527161,
  -0.4946589469909668,
  -0.025176772847771645,
  0.6540349721908569,
  -0.062414076179265976,
  -0.6898808479309082,
  -0.543022096157074,
  -0.5903494954109192,
  0.5061325430870056,
  0.2008494883775711,
  0.80156397819519

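### 5. Create an index with a vector field
Create a search index over the JSON documents with the bikes: prefix. The schema maps JSONPath expressions to field types and aliases: model, brand, and description are full-text fields (no_stem=True disables stemming for model and brand names), price is numeric, type is a tag, and description_embeddings is a FLAT vector field of VECTOR_DIMENSION FLOAT32 values compared by cosine distance.

In [15]: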
from redis.commands.search.indexDefinition import IndexType
schema = (
    TextField("$.model", no_stem=True, as_name="model"),
    TextField("$.brand", no_stem=True, as_name="brand"),
    NumericField("$.price", as_name="price"),
    TagField("$.type", as_name="type"),
    TextField("$.description", as_name="description"),
    VectorField(
        "$.description_embeddings",
        "FLAT",
        {
            "TYPE": "FLOAT32",
            "DIM": VECTOR_DIMENSION,
            "DISTANCE_METRIC": "COSINE",
        },
        as_name="vector",
    ),
)
definition = IndexDefinition(prefix=["bikes:"], index_type=IndexType.JSON)
res = client.ft("idx:bikes_vss").create_index(fields=schema, definition=definition)
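A FLAT vector index performs exact (brute-force) search: at query time it compares the query vector with every stored vector. The following numpy sketch illustrates that idea with cosine distance on small, made-up 3-dimensional vectors (the real vectors have 768 components); it shows the computation, not how Redis implements it internally:

```python
import numpy as np

def knn_cosine(query, vectors, k):
    """Exact k-nearest-neighbor search by cosine distance (1 - cosine similarity).

    Assumes no zero vectors, since cosine similarity is undefined for them.
    """
    query = np.asarray(query, dtype=np.float32)
    vectors = np.asarray(vectors, dtype=np.float32)
    # Cosine similarity between the query and every stored vector
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    distances = 1.0 - sims
    # Indices of the k smallest distances, closest first
    order = np.argsort(distances)[:k]
    return order, distances[order]

# Hypothetical stored vectors and query
stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
idx, dist = knn_cosine([1.0, 0.0, 0.0], stored, k=2)
print(idx)  # [0 2]
```

For larger datasets, the approximate HNSW algorithm trades exactness for speed; you would pass "HNSW" instead of "FLAT" to VectorField.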


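Check the state of the index. num_docs should match the number of bike documents, and hash_indexing_failures should be 0:

In [18]:
info = client.ft("idx:bikes_vss").info()
num_docs = info["num_docs"]
indexing_failures = info["hash_indexing_failures"]
print(f"{num_docs} documents indexed with {indexing_failures} failures")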
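Documents that already exist when the index is created are indexed in the background, so info() called right after create_index may report fewer documents. A minimal sketch of a polling helper, assuming the FT.INFO dictionary exposes a percent_indexed field where 1 means 100%; the helper name wait_for_index is hypothetical:

```python
import time

def wait_for_index(get_info, timeout=10.0, interval=0.1):
    """Poll until the index reports that it has finished indexing.

    get_info is assumed to be a callable returning the FT.INFO dictionary,
    e.g. client.ft("idx:bikes_vss").info; percent_indexed may arrive as a
    string such as "1", so it is converted with float().
    """
    deadline = time.monotonic() + timeout
    while True:
        info = get_info()
        if float(info.get("percent_indexed", 0)) >= 1.0:
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError("index is still being built")
        time.sleep(interval)
```

In this notebook it would be called as info = wait_for_index(client.ft("idx:bikes_vss").info).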

### 6. Search the index
Define a set of natural-language queries to test the index:

In [19]:
queries = [
    "Bike for small kids",
    "Best Mountain bikes for kids",
    "Cheap Mountain bike for kids",
    "Female specific mountain bike",
    "Road bike for beginners",
    "Commuter bike for people over 60",
    "Comfortable commuter bike",
    "Good bike for college students",
    "Mountain bike for beginners",
    "Vintage bike",
    "Comfortable city bike",
]

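Encode the queries with the same embedding model used for the descriptions, so queries and documents are compared in the same vector space:

In [20]: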
encoded_queries = embedder.encode(queries)
len(encoded_queries)

11

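Build a KNN query. (*) matches every document in the index, and =>[KNN 3 @vector $query_vector AS vector_score] returns the 3 nearest neighbors of the vector passed in the query_vector parameter, storing each distance under the alias vector_score. Sorting by vector_score lists the closest matches first, and dialect 2 is required for KNN queries like this one:

In [21]:
query = (
    Query('(*)=>[KNN 3 @vector $query_vector AS vector_score]')
    .sort_by('vector_score')
    .return_fields('vector_score', 'id', 'brand', 'model', 'description')
    .dialect(2)
)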
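The query string can also be built programmatically, for example to vary k or to narrow the candidate set with a pre-filter before the KNN step (a hybrid query). The helper below is hypothetical and only builds strings; the filter @price:[0 1000] uses the standard numeric-range query syntax, but verify it against your Redis version:

```python
def knn_query_string(k=3, pre_filter="*", vector_param="query_vector",
                     field="vector", score_alias="vector_score"):
    """Build a KNN query string such as '(*)=>[KNN 3 @vector $query_vector AS vector_score]'."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return f"({pre_filter})=>[KNN {k} @{field} ${vector_param} AS {score_alias}]"

print(knn_query_string())
# (*)=>[KNN 3 @vector $query_vector AS vector_score]
print(knn_query_string(k=5, pre_filter="@price:[0 1000]"))
# (@price:[0 1000])=>[KNN 5 @vector $query_vector AS vector_score]
```

The returned string can be passed to Query(...) in place of the literal, keeping .sort_by, .return_fields, and .dialect(2) unchanged.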

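Run the query for the first encoded query ("Bike for small kids"), passing the vector as a FLOAT32 byte string in the query_vector parameter:

In [22]: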
encoded_query = encoded_queries[0]
client.ft('idx:bikes_vss').search(
    query,
    {
      'query_vector': np.array(encoded_query, dtype=np.float32).tobytes()
    }
).docs

[Document {'id': 'bikes:001', 'payload': None, 'vector_score': '0.476149082184', 'brand': 'Velorim', 'model': 'Jigger', 'description': 'Small and powerful, the Jigger is the best ride for the smallest of tikes! This is the tiniest kids’ pedal bike on the market available without a coaster brake, the Jigger is the vehicle of choice for the rare tenacious little rider raring to go. We say rare because this smokin’ little bike is not ideal for a nervous first-time rider, but it’s a true giddy up for a true speedster. The Jigger is a 12 inch lightweight kids bicycle and it will meet your little one’s need for speed. It’s a single speed bike that makes learning to pump pedals simple and intuitive. It even has  a handle in the bottom of the saddle so you can easily help your child during training!  The Jigger is among the most lightweight children’s bikes on the planet. It is designed so that 2-3 year-olds fit comfortably in a molded ride position that allows for efficient riding, balanced h
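The query vector is sent to Redis as raw bytes, which is why the cell above converts it with np.array(..., dtype=np.float32).tobytes(). The byte string should match the index definition (TYPE FLOAT32, DIM VECTOR_DIMENSION), that is, 4 bytes per component. A small sketch with a made-up 4-dimensional vector:

```python
import numpy as np

vector = [0.25, -0.5, 1.0, 0.0]  # hypothetical embedding

blob = np.array(vector, dtype=np.float32).tobytes()
print(len(blob))  # 16: 4 components x 4 bytes each

# np.frombuffer reverses the conversion (native byte order on both sides)
restored = np.frombuffer(blob, dtype=np.float32)
print(restored.tolist())  # [0.25, -0.5, 1.0, 0.0]
```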

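With DISTANCE_METRIC set to COSINE, vector_score is a cosine distance (0 means the vectors point in the same direction), so the helper below converts it to a similarity with 1 - vector_score. It runs every query and collects the results in a table (the | operator used to merge the parameter dictionaries requires Python 3.9 or later):

In [23]:
def create_query_table(query, queries, encoded_queries, extra_params=None):
    """
    Run the KNN query for each encoded query and return the results
    as a Markdown table, one row per matching document.
    """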
    results_list = []
    for i, encoded_query in enumerate(encoded_queries):
        result_docs = (
            client.ft("idx:bikes_vss")
            .search(
                query,
                {"query_vector": np.array(encoded_query, dtype=np.float32).tobytes()}
                | (extra_params if extra_params else {}),
            )
            .docs
        )
        for doc in result_docs:
            vector_score = round(1 - float(doc.vector_score), 2)
            results_list.append(
                {
                    "query": queries[i],
                    "score": vector_score,
                    "id": doc.id,
                    "brand": doc.brand,
                    "model": doc.model,
                    "description": doc.description,
                }
            )

    # Optional: convert the table to Markdown using Pandas
    queries_table = pd.DataFrame(results_list)
    queries_table.sort_values(
        by=["query", "score"], ascending=[True, False], inplace=True
    )
    queries_table["query"] = queries_table.groupby("query")["query"].transform(
        lambda x: [x.iloc[0]] + [""] * (len(x) - 1)
    )
    queries_table["description"] = queries_table["description"].apply(
        lambda x: (x[:497] + "...") if len(x) > 500 else x
    )
    return queries_table.to_markdown(index=False)
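The groupby(...).transform(...) step blanks out repeated query labels so each query appears only once in the printed table. On a table whose rows are already grouped by query, Series.duplicated gives an equivalent and arguably simpler formulation. A sketch on a hypothetical three-row DataFrame, also applying the same 500-character description cap:

```python
import pandas as pd

# Hypothetical results, already sorted by query (ascending) and score (descending)
sample_table = pd.DataFrame(
    {
        "query": ["Road bike for beginners", "Vintage bike", "Vintage bike"],
        "score": [0.48, 0.61, 0.55],
        "description": ["also short", "x" * 600, "short"],
    }
)

# Keep the first occurrence of each query label and blank out the repeats
sample_table["query"] = sample_table["query"].where(~sample_table["query"].duplicated(), "")

# Cap descriptions at 500 characters (497 + "...") as in create_query_table
sample_table["description"] = sample_table["description"].apply(
    lambda x: (x[:497] + "...") if len(x) > 500 else x
)

print(sample_table["query"].tolist())  # ['Road bike for beginners', 'Vintage bike', '']
```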

Run all the queries and print the results as a Markdown table:

In [26]:
query = (
    Query("(*)=>[KNN 3 @vector $query_vector AS vector_score]")
    .sort_by("vector_score")
    .return_fields("vector_score", "id", "brand", "model", "description")
    .dialect(2)
)

table = create_query_table(query, queries, encoded_queries)
print(table)

| query                            |   score | id        | brand      | model                | description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|:---------------------------------|--------:|:----------|:-----------|:---------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------