# Vector Similarity for RediSearch

This document describes the vector similarity capabilities that RediSearch offers in the private preview build:
1. Realtime vector indexing
2. Realtime vector update
3. Realtime vector deletion
4. TOP-K query

## Indexing capabilities
In the private preview build, two index algorithms and three distance metrics are supported:

### Index algorithms
1. Brute force (Flat Index)
2. HNSW (Hierarchical Navigable Small World)

### Distance metrics
1. L2 - Euclidean distance between two vectors
2. IP - Inner product of two vectors
3. COSINE - Cosine similarity of two vectors
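
As a rough illustration of the three metrics, here is a minimal NumPy sketch. The function names are hypothetical and are not part of RediSearch. It only shows the underlying math; this document does not state how the returned score is derived from these quantities, so the sketch makes no claim about that.

```python
import numpy as np

def l2_distance(a, b):
    """Euclidean (L2) distance between two vectors."""
    return float(np.linalg.norm(a - b))

def inner_product(a, b):
    """Inner (dot) product of two vectors."""
    return float(np.dot(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([3.0, 4.0], dtype=np.float32)
b = np.array([4.0, 3.0], dtype=np.float32)

print(l2_distance(a, b))        # sqrt(2)
print(inner_product(a, b))      # 3*4 + 4*3
print(cosine_similarity(a, b))  # 24 / (5 * 5)
# Cosine similarity is the inner product of the normalized vectors.
print(inner_product(normalize(a), normalize(b)))
```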

## Creating an index
To create a vector index, invoke the index creation command `FT.CREATE` and declare the vector field in the schema with the new reserved word `VECTOR`.

Command format:
```
FT.CREATE <index_name> SCHEMA <vector field name> VECTOR <type> <dimension> <distance metric> <index algorithm> <algorithm parameters>
```
### General indexing mandatory parameters
Parameters must be passed to the index creation command in the following order:
* type - vector data type. Currently only `FLOAT32` is supported.
* dimension - vector dimension.
* distance metric - one of `L2` (Euclidean distance), `IP` (inner product) or `COSINE` (cosine similarity). Note that when `COSINE` is selected, the indexed vectors are normalized upon indexing and the query vector is normalized upon query.
* index algorithm - either `BF` for brute force or `HNSW` for the HNSW indexing algorithm.
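
Because the parameters are positional, it can help to assemble the command in one place. The helper below is a hypothetical sketch (`build_vector_create_args` is not part of any client library) that returns the argument list in the order described above. Algorithm-specific parameters such as `INITIAL_CAP` are appended as name/value pairs.

```python
def build_vector_create_args(index_name, field_name, dim, metric, algorithm, **algo_params):
    """Return the FT.CREATE arguments for a vector field, in the required order."""
    if metric not in ("L2", "IP", "COSINE"):
        raise ValueError(f"unsupported distance metric: {metric}")
    if algorithm not in ("BF", "HNSW"):
        raise ValueError(f"unsupported index algorithm: {algorithm}")
    # type, dimension, distance metric, index algorithm
    args = ["FT.CREATE", index_name, "SCHEMA", field_name,
            "VECTOR", "FLOAT32", dim, metric, algorithm]
    # algorithm parameters follow as NAME VALUE pairs, in the order given
    for name, value in algo_params.items():
        args += [name, value]
    return args

args = build_vector_create_args("my_flat_index", "my_vector_field", 128, "L2", "BF",
                                INITIAL_CAP=1000000)
print(" ".join(str(a) for a in args))
```

With a redis-py connection, the list could be sent as `redis_conn.execute_command(*args)`.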

### Brute force (Flat index)
This index compares the query vector against every indexed vector and returns the top-k most similar vectors, according to the given distance metric.
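
The idea can be sketched in a few lines of NumPy. This is an illustration of exhaustive top-k search under the L2 metric, not RediSearch's actual implementation; `brute_force_top_k` is a hypothetical name.

```python
import numpy as np

def brute_force_top_k(vectors, query, k):
    """Return (row index, L2 distance) for the k rows of `vectors` closest to `query`."""
    # Distance from the query to every indexed vector.
    dists = np.linalg.norm(vectors - query, axis=1)
    k = min(k, len(dists))
    # Sort all distances and keep the k smallest.
    nearest = np.argsort(dists)[:k]
    return [(int(i), float(dists[i])) for i in nearest]

vectors = np.random.rand(1000, 128).astype(np.float32)
query = np.random.rand(128).astype(np.float32)
for idx, dist in brute_force_top_k(vectors, query, 10):
    print(idx, dist)
```

Every query touches all indexed vectors, so the cost grows linearly with the index size, but the result is exact.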

#### Index specific parameters
* `INITIAL_CAP` - initial index capacity (number of vectors). The index pre-allocates space for this many vectors, so that indexing up to this capacity requires no additional allocations (optimization).

An example of creating a brute force index with an initial capacity of 1 million vectors of 128 floats, using the L2 distance metric:

```
FT.CREATE my_flat_index SCHEMA my_vector_field VECTOR FLOAT32 128 L2 BF INITIAL_CAP 1000000
```

### HNSW
This index algorithm is a modified version of [nmslib/hnswlib](https://github.com/nmslib/hnswlib), which is the authors' implementation of [Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs](https://arxiv.org/ftp/arxiv/papers/1603/1603.09320.pdf).

#### Index specific parameters
* `INITIAL_CAP` - initial index capacity (number of vectors). The index pre-allocates space for this many vectors, so that indexing up to this capacity requires no additional allocations (optimization).
* `M` - maximum number of outbound connections for each node in the graph.
* `EF` - maximum number of potential candidates to connect while building the graph.

An example of creating an HNSW index with an initial capacity of 1 million vectors of 128 floats, using the L2 distance metric, with `EF` set to 200 and `M` set to 40:

```
FT.CREATE my_hnsw_index SCHEMA my_vector_field VECTOR FLOAT32 128 L2 HNSW INITIAL_CAP 1000000 M 40 EF 200
```

## Query
To execute a top-k vector query, invoke the search command `FT.SEARCH` and pass the vector blob as a query parameter.

Command format:
```
FT.SEARCH <index name> "@<vector field name>:[$<vector blob parameter name> TOPK <k>]" RETURN 1 <vector field name>_score SORTBY <vector field name>_score LIMIT 0 <k>  PARAMS 2 <vector blob parameter name> <vector blob>
```
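
The vector blob is the raw byte representation of the vector. A minimal sketch of producing it with NumPy follows, assuming the `FLOAT32` type and the `tobytes()` approach used in the Python examples in this document; `to_blob` and `from_blob` are hypothetical helper names.

```python
import numpy as np

def to_blob(vector, dim):
    """Serialize a vector to raw float32 bytes, checking its dimension first."""
    arr = np.asarray(vector, dtype=np.float32).reshape(-1)
    if arr.size != dim:
        raise ValueError(f"expected {dim} components, got {arr.size}")
    return arr.tobytes()

def from_blob(blob):
    """Decode raw float32 bytes back into a NumPy vector (useful for debugging)."""
    return np.frombuffer(blob, dtype=np.float32)

blob = to_blob([0.1] * 128, 128)
print(len(blob))  # 128 components * 4 bytes each = 512
```

Checking the dimension before sending helps catch a query vector whose size does not match the dimension declared at index creation.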

### Query tuning parameters
#### HNSW
* `EFRUNTIME` - maximum number of potential top-k candidates to collect while querying the graph. `EFRUNTIME` should be greater than or equal to `k`.

An example of a top-10 query over an HNSW-indexed dataset with `EFRUNTIME` set to 150:

```
FT.SEARCH my_hnsw_index "@my_vector_field:[$vec TOPK 10] => {$EFRUNTIME:150}" RETURN 1 my_vector_field_score SORTBY my_vector_field_score LIMIT 0 10 PARAMS 2 vec <vector blob>
```
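
The query string can also be assembled programmatically. The helper below is a hypothetical sketch that follows the query syntax shown above and enforces the `EFRUNTIME >= k` constraint before the query is sent.

```python
def build_topk_query(field_name, param_name, k, ef_runtime=None):
    """Build the query string for a top-k vector search, optionally with EFRUNTIME."""
    query = f"@{field_name}:[${param_name} TOPK {k}]"
    if ef_runtime is not None:
        if ef_runtime < k:
            raise ValueError("EFRUNTIME should be greater than or equal to k")
        # Doubled braces produce literal { and } in an f-string.
        query += f" => {{$EFRUNTIME:{ef_runtime}}}"
    return query

print(build_topk_query("my_vector_field", "vec", 10, ef_runtime=150))
```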

## Python examples

### Packages

In [None]:
!pip install git+https://github.com/RediSearch/redisearch-py.git@params
!pip install numpy

In [4]:
import numpy as np
from redis import Redis
import redisearch

Create Redis Client

In [None]:
host = "localhost"
port = 6379

redis_conn = Redis(host = host, port = port)

In [8]:
n_vec = 1000000
dim = 128
M = 40
EF = 200
vector_field_name = "vector"
k = 10

In [None]:
def load_vectors(client : Redis, n, d,  field_name):
    for i in range(n):
        np_vector = np.random.rand(1, d).astype(np.float32)
        client.hset(i, field_name, np_vector.tobytes())
        
def delete_data(client: Redis):
    client.flushall()
        

### Brute Force

In [7]:
# build index
bf_index = redisearch.Client("my_flat_index", conn=redis_conn)
bf_index.redis.execute_command("FT.CREATE", "my_flat_index", "SCHEMA", vector_field_name, "VECTOR", "FLOAT32", dim, "L2", "BF", "INITIAL_CAP", n_vec)
# load vectors
load_vectors(bf_index.redis, n_vec, dim, vector_field_name)
# query
query_vector = np.random.rand(1, dim).astype(np.float32)
q = redisearch.Query(f'@{vector_field_name}:[$vec_param TOPK {k}]').sort_by(f'{vector_field_name}_score').paging(0,k).return_field(f'{vector_field_name}_score')
res = bf_index.search(q, query_params = {'vec_param': query_vector.tobytes()})
docs = [int(doc.id) for doc in res.docs]
# the score is returned in the field named <vector field name>_score
rs_dists = [float(getattr(doc, f'{vector_field_name}_score')) for doc in res.docs]
print(docs)
print(rs_dists)
# cleanup
delete_data(bf_index.redis)

### HNSW

In [9]:
# build index
hnsw_index = redisearch.Client("my_hnsw_index", conn=redis_conn)
hnsw_index.redis.execute_command("FT.CREATE", "my_hnsw_index", "SCHEMA", vector_field_name, "VECTOR", "FLOAT32", dim, "L2", "HNSW", "INITIAL_CAP", n_vec, "M", M, "EF", EF)
# load vectors
load_vectors(hnsw_index.redis, n_vec, dim, vector_field_name)
# query
hnsw_EFRUNTIME = 150  # must be >= k; same value as the EFRUNTIME query example in this document
query_vector = np.random.rand(1, dim).astype(np.float32)
q = redisearch.Query(f'@{vector_field_name}:[$vec_param TOPK {k}] => {{$EFRUNTIME : {hnsw_EFRUNTIME}}}').sort_by(f'{vector_field_name}_score').paging(0,k).return_field(f'{vector_field_name}_score')
res = hnsw_index.search(q, query_params = {'vec_param': query_vector.tobytes()})
docs = [int(doc.id) for doc in res.docs]
# the score is returned in the field named <vector field name>_score
rs_dists = [float(getattr(doc, f'{vector_field_name}_score')) for doc in res.docs]
print(docs)
print(rs_dists)
# cleanup
delete_data(hnsw_index.redis)
