# Getting Started with RedisVL

RedisVL is a Python library with a dedicated CLI to help load and create vector search indices within Redis.

This notebook will walk through:
1. Preparing a dataset with vectors
2. Writing a data schema for ``redisvl``
3. Loading the data and creating a vector search index
4. Performing queries

Before running this notebook, be sure to:
1. Install ``redisvl`` and activate that environment for this notebook.
2. Have a running Redis instance with RediSearch > 2.4.

## Data Preparation

For this example, we will use the following overly simplified dataset, defined in the second cell below. The first cell sets the Redis connection details.


In [1]:
import os

# set redis address
username = "default"
host = "<enter your redis host here>"
port = "<enter your redis port here>"
password = "<enter your redis password here>"


REDIS_URL = f"redis://{username}:{password}@{host}:{port}"
os.environ["REDIS_URL"] = REDIS_URL
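If the password or username contains characters that are reserved in URLs (such as ``@``, ``/``, or ``:``), the f-string above produces a URL that cannot be parsed reliably. One possible guard, sketched here with the standard library's ``urllib.parse.quote``, is to percent-encode the credentials first. The helper name ``build_redis_url`` and the sample credentials are made up for illustration and are not part of ``redisvl``. The client is assumed to decode the credentials again when it parses the URL.

```python
from urllib.parse import quote


def build_redis_url(username, password, host, port):
    """Hypothetical helper: percent-encode credentials before building the URL."""
    # safe="" makes quote() escape every reserved character, including "/"
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"redis://{user}:{pwd}@{host}:{port}"


# made-up credentials, for illustration only
example_url = build_redis_url("default", "p@ss/word", "localhost", 6379)
print(example_url)
```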

In [2]:
import os

import numpy as np
from jupyterutils import table_print

data = [
    {'user': 'john', 'age': 1, 'job': 'engineer', 'credit_score': 'high'},
    {'user': 'mary', 'age': 2, 'job': 'doctor', 'credit_score': 'low'},
    {'user': 'joe', 'age': 3, 'job': 'dentist', 'credit_score': 'medium'}
]

This will create 3 entries in Redis (hashes), each with 4 fields (``user``, ``age``, ``job``, ``credit_score``).

Now, we want to add vectors to represent each user. These are just dummy vectors to illustrate the point, but more complex vectors can be created and used as well. For more information on creating embeddings, see this [article](https://mlops.community/vector-similarity-search-from-basics-to-production/).

As seen below, the sample vectors need to be turned into bytes before they can be loaded into Redis. Using ``NumPy``, this is fairly trivial.

In [3]:
# converted to bytes for redis
vectors = [
    np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes(),
    np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes(),
    np.array([0.9, 0.9, 0.1], dtype=np.float32).tobytes(),
]

for record, vector in zip(data, vectors):
    record["user_embedding"] = vector

table_print(data)

user,age,job,credit_score,user_embedding
john,1,engineer,high,b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'
mary,2,doctor,low,b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'
joe,3,dentist,medium,b'fff?fff?\xcd\xcc\xcc='
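The byte strings in the table are the raw 32-bit floats laid end to end. As a small cross-check (not part of the original notebook), the sketch below rebuilds the first embedding with the standard library's ``struct`` module and decodes it again. It assumes little-endian byte order, which is what the output above shows.

```python
import struct

# pack three little-endian float32 values, mirroring
# np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()
raw = struct.pack("<3f", 0.1, 0.1, 0.5)
print(raw)

# 3 floats * 4 bytes each
print(len(raw))

# decode the bytes back into Python floats (values carry float32 rounding)
decoded = struct.unpack("<3f", raw)
print(decoded)
```

Because 0.1 has no exact float32 representation, the decoded values are only approximately equal to the originals.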



Our dataset is now ready to be used with ``redisvl``. The next step is to define the schema for the data.

## Define Index Schema

To keep ``redisvl`` flexible across many types of data, it uses a schema specified in either a Python dictionary or a YAML file. A schema has two main components:

1. index specification
2. field specification

The index specification determines how data will be stored in Redis. This includes:
- ``name``: the name of the index
- (Optional) ``prefix``: key prefix for each loaded entry

The field specification determines which fields within the dataset will be available for queries. Each field corresponds to the name of a **column** within the dataset. The options listed under each field are used when the index is created and correspond directly to ``redis-py`` arguments.

For example, given the dataset above, the following schema can be used:


```yaml

index:
  name: user_index
  prefix: v1

fields:
    # define tag fields
    tag:
        - name: user
        - name: job
        - name: credit_score
    # define numeric fields
    numeric:
        - name: age
    # define vector fields
    vector:
        - name: user_embedding
          algorithm: hnsw
          distance_metric: cosine

```

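A schema can also be written as a Python dictionary. The dictionary below is the one used in the rest of this notebook. Note that, unlike the YAML example, it indexes ``job`` as a text field and uses the ``flat`` algorithm: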

In [4]:
schema = {
    "index": {
        "name": "user_index",
        "prefix": "v1",
    },
    "fields": {
        "tag": [{"name": "credit_score"}],
        "text": [{"name": "job"}],
        "numeric": [{"name": "age"}],
        "vector": [{
                "name": "user_embedding",
                "dims": 3,
                "distance_metric": "cosine",
                "algorithm": "flat",
                "datatype": "float32"}
        ]
    },
}
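Because field names in the schema must match the column names in the dataset, a misspelled name is easy to miss. The sketch below is one way to check this before loading any data. The helpers ``schema_field_names`` and ``find_missing_fields`` are hypothetical and not part of ``redisvl``, and the embedding bytes are placeholders.

```python
def schema_field_names(schema):
    """Hypothetical helper: collect every field name declared in a schema dict."""
    return {
        field["name"]
        for field_list in schema["fields"].values()
        for field in field_list
    }


def find_missing_fields(schema, records):
    """Map record position -> schema fields that the record does not contain."""
    names = schema_field_names(schema)
    missing = {}
    for position, record in enumerate(records):
        absent = sorted(names - set(record))
        if absent:
            missing[position] = absent
    return missing


example_schema = {
    "index": {"name": "user_index", "prefix": "v1"},
    "fields": {
        "tag": [{"name": "credit_score"}],
        "text": [{"name": "job"}],
        "numeric": [{"name": "age"}],
        "vector": [{"name": "user_embedding", "dims": 3,
                    "distance_metric": "cosine", "algorithm": "flat",
                    "datatype": "float32"}],
    },
}

# one record shaped like the dataset above; the vector bytes are a placeholder
example_records = [
    {"user": "john", "age": 1, "job": "engineer",
     "credit_score": "high", "user_embedding": b"\x00" * 12},
]

# an empty dict means every schema field is present in every record
print(find_missing_fields(example_schema, example_records))
```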


## Create a ``SearchIndex``

With the data and the index schema defined, we can now use ``redisvl`` as a library to create a search index as follows.

Note that at this point the index has no entries. This is fine: Redis indexes new entries in the background as they are added, provided their keys match the index prefix and they follow the schema.

In [5]:
from redisvl.index import SearchIndex

# construct a search index from the schema
index = SearchIndex.from_dict(schema) # or SearchIndex.from_yaml("schema.yaml") for yaml files

# connect to the redis instance at REDIS_URL
index.connect(REDIS_URL)

# create the index (no data yet)
index.create(overwrite=True)

In [6]:
# use the CLI to see the created index
!rvl index listall

01:26:16 [RedisVL] INFO   Using Redis address from environment variable, REDIS_URL
01:26:16 [RedisVL] INFO   Indices:
01:26:16 [RedisVL] INFO   1. user_index


In [7]:
# use the CLI to print fields in the index
!rvl index info -i user_index

01:26:17 [RedisVL] INFO   Using Redis address from environment variable, REDIS_URL


Index Information:
╭──────────────┬────────────────┬────────────┬─────────────────┬────────────╮
│ Index Name   │ Storage Type   │ Prefixes   │ Index Options   │   Indexing │
├──────────────┼────────────────┼────────────┼─────────────────┼────────────┤
│ user_index   │ HASH           │ ['v1']     │ []              │          0 │
╰──────────────┴────────────────┴────────────┴─────────────────┴────────────╯
Index Fields:
╭────────────────┬────────────────┬─────────┬────────────────┬────────────────╮
│ Name           │ Attribute      │ Type    │ Field Option   │ Option Value   │
├────────────────┼────────────────┼─────────┼────────────────┼────────────────┤
│ credit_score   │ credit_score   │ TAG     │ SEPARATOR      │ ,              │
│ job            │ job            │ TEXT    │ WEIGHT         │ 1              │
│ age            │ age            │ NUMERIC │                │ 

## Load Data into the Index

Now that an index exists, data can be loaded into Redis with the ``SearchIndex.load()`` method.

In [8]:
# load expects an iterable of dictionaries
index.load(data)

## Executing Queries

Next, we will run a vector query on our newly populated index. This example uses a simple vector to demonstrate how vector similarity works. Vectors in production will be much larger than 3 floats and are usually created with machine learning models (e.g., Hugging Face sentence transformers) or an embeddings API (e.g., Cohere, OpenAI).

In [9]:
from redisvl.query import VectorQuery
from jupyterutils import result_print

# create a vector query returning a number of results
# with specific fields to return.
query = VectorQuery(
    vector=[0.1, 0.1, 0.5],
    vector_field_name="user_embedding",
    return_fields=["user", "age", "job", "credit_score", "vector_distance"],
    num_results=3
)

# use the SearchIndex instance (or Redis client) to execute the query
results = index.query(query)
result_print(results)

vector_distance,user,age,job,credit_score
0.0,john,1,engineer,high
0.0,mary,2,doctor,low
0.653301358223,joe,3,dentist,medium
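The ``vector_distance`` values follow from the ``cosine`` distance metric in the schema: the distance is one minus the cosine similarity between the query vector and the stored vector. The sketch below recomputes it in plain Python to show where ``0.0`` and roughly ``0.6533`` come from. It is an illustration, not how Redis computes the value internally, and the function name ``cosine_distance`` is made up here.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query_vector = [0.1, 0.1, 0.5]

# john and mary share the query vector, so their distance is (numerically) zero
same = cosine_distance(query_vector, [0.1, 0.1, 0.5])

# joe's vector points in a different direction
joe = cosine_distance(query_vector, [0.9, 0.9, 0.1])

print(round(same, 6), round(joe, 4))
```

Small differences in the last digits compared with the Redis output are expected, since Redis stores the vectors as ``float32`` while Python floats are 64-bit.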


## Connecting to an Existing Index

If you have an existing index, you can connect to it using the ``SearchIndex.from_existing()`` function. This will return a ``SearchIndex`` object that can be used to execute queries.


In [10]:
# create a new SearchIndex instance from an existing index
existing_index = SearchIndex.from_existing("user_index", REDIS_URL)

# run the same query
results = existing_index.query(query)
result_print(results)

vector_distance,user,age,job,credit_score
0.0,john,1,engineer,high
0.0,mary,2,doctor,low
0.653301358223,joe,3,dentist,medium


## Asynchronous Search

The ``AsyncSearchIndex`` class allows queries, index creation, and data loading to be done asynchronously. This is useful for large datasets that take a long time to load into Redis, for long-running queries, and for asynchronous applications, such as a FastAPI application, that need to execute queries in the background.

In [11]:
from redisvl.index import AsyncSearchIndex

# construct a search index from the schema
index = AsyncSearchIndex.from_dict(schema)

# connect to local redis instance
index.connect("redis://localhost:6379")

# create the index (no data yet)
await index.create(overwrite=True)

# load the data again but asynchronously
await index.load(data)

# run the same vector query but asynchronously
results = await index.query(query)
result_print(results)

vector_distance,user,age,job,credit_score
0.0,john,1,engineer,high
0.0,mary,2,doctor,low
0.653301358223,joe,3,dentist,medium
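The asynchronous API pays off when several operations are awaited concurrently rather than one after another. The sketch below shows the pattern with ``asyncio.gather``. To stay runnable without a Redis instance, it uses a stand-in coroutine, ``fake_query`` (hypothetical), that simulates latency with ``asyncio.sleep`` in place of a real ``AsyncSearchIndex`` call.

```python
import asyncio


async def fake_query(name, delay):
    """Stand-in for an awaitable such as an async index query (hypothetical)."""
    await asyncio.sleep(delay)  # simulates network latency
    return f"{name} done"


async def main():
    # the three coroutines run concurrently; gather returns results in call order
    return await asyncio.gather(
        fake_query("query-a", 0.03),
        fake_query("query-b", 0.01),
        fake_query("query-c", 0.02),
    )


# as a script; inside a notebook cell, use `outcomes = await main()` instead
outcomes = asyncio.run(main())
print(outcomes)
```

The results come back in the order the coroutines were passed to ``gather``, regardless of which one finishes first.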


In [12]:
# We can also use the CLI to check the stats for the index we just used
!rvl stats -i user_index

01:26:18 [RedisVL] INFO   Using Redis address from environment variable, REDIS_URL

Statistics:
╭─────────────────────────────┬─────────────╮
│ Stat Key                    │ Value       │
├─────────────────────────────┼─────────────┤
│ num_docs                    │ 3           │
│ num_terms                   │ 4           │
│ max_doc_id                  │ 3           │
│ num_records                 │ 13          │
│ percent_indexed             │ 1           │
│ hash_indexing_failures      │ 0           │
│ number_of_uses              │ 8           │
│ bytes_per_record_avg        │ 2.46154     │
│ doc_table_size_mb           │ 0.000294685 │
│ inverted_sz_mb              │ 3.05176e-05 │
│ key_table_size_mb           │ 0.000110626 │
│ offset_bits_per_record_avg  │ 8           │
│ offset_vectors_sz_mb        │ 3.8147e-06  │
│ offsets_per_term_avg        │ 0.307692    │
│ records_per_doc_avg         │ 4.33333     │
│ sortable_values_size_mb     │ 0           │
│

In [13]:
# clean up the index
await index.delete()