# ANN with enVector

This example demonstrates how ANN (Approximate Nearest Neighbor) can be performed with enVector, focusing on IVF-FLAT.

IVF-FLAT is one of the most popular indexing algorithms for improving search efficiency.
enVector supports IVF (Inverted File) with FLAT for encrypted similarity search.
Because the enVector server receives only ciphertexts, it generally cannot run an indexing algorithm to find the centroids for ANN on the server side.
Therefore, the enVector SDK creates the IVF clusters on the client side before encryption, so that the nearest clusters can be found on the client before insertion and search.
This step ensures that no information is exposed when finding the nearest neighbors.

## Import SDK

Before we start, install the `es2` package and import it to use the enVector Python APIs.

In [None]:
import es2

## Prepare Dataset

First, we generate a large sample dataset of random, L2-normalized vectors for test purposes.

In [None]:
import numpy as np

num_vectors = 1_000_000
dim = 1536

vectors = np.random.rand(num_vectors, dim).astype(np.float32)
vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
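
Note that `np.random.rand` produces float64 values, so the intermediate array above is about twice the size of the final float32 array (roughly 12.3 GB versus 6.1 GB for 1,000,000 vectors of dimension 1536). The following is a minimal sketch of a lower-memory alternative, assuming NumPy's `Generator.random` with `dtype=np.float32`. The helper name `make_unit_vectors` is hypothetical and not part of enVector.

```python
import numpy as np

def make_unit_vectors(n, dim, seed=0):
    """Generate n random float32 vectors of length dim, scaled to unit L2 norm."""
    rng = np.random.default_rng(seed)
    # Sample directly in float32 to avoid a float64 intermediate array
    vecs = rng.random((n, dim), dtype=np.float32)
    # Divide each row by its L2 norm, in place
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs

# Small-scale demo; the full dataset would use n=1_000_000
sample = make_unit_vectors(1_000, 1536)
print(sample.dtype, sample.shape)
print(f"full float32 dataset size: {1_000_000 * 1536 * 4 / 1e9:.1f} GB")
```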

### Create Centroids

To perform IVF-FLAT, we train the centroids with the given dataset by running k-means clustering.
The `n_lists` variable sets the number of clusters in the IVF index, that is, how many partitions (each represented by one centroid) the dataset is divided into.

In [None]:
from sklearn.cluster import KMeans
# For GPU acceleration, cuml.cluster.KMeans can be used instead

n_lists = 500

kmeans = KMeans(n_clusters=n_lists, n_init=1)
kmeans.fit(vectors)
centroids = kmeans.cluster_centers_.copy()
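
Since an IVF search scans every vector in the probed clusters, it can be useful to check how evenly k-means has partitioned the data. The following is a small-scale sketch of such a check using scikit-learn's `labels_` attribute. The dataset size, the variable names, and `random_state=0` are assumptions chosen for illustration, not values from the enVector documentation.

```python
import numpy as np
from sklearn.cluster import KMeans

# Small synthetic dataset so the example runs quickly
rng = np.random.default_rng(0)
small = rng.random((2_000, 16), dtype=np.float32)
small /= np.linalg.norm(small, axis=1, keepdims=True)

n_lists_small = 8
km = KMeans(n_clusters=n_lists_small, n_init=1, random_state=0).fit(small)

# km.labels_[i] is the index of the cluster that small[i] was assigned to
sizes = np.bincount(km.labels_, minlength=n_lists_small)
print("cluster sizes:", sizes)
print("largest / mean size:", sizes.max() / sizes.mean())
```

A ratio far above 1 would suggest that some clusters are much larger than others, in which case queries that probe those clusters scan more vectors.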

## Vector Search

### 1. Initialize

To use the enVector service, initialization is required to connect to the server and register keys.

In [None]:
es2.init(
    address="localhost:50050",
    access_token="...",
    key_path="./keys",
    key_id="example_key",
    auto_key_setup=True,
)

### 2. Create Index

For ANN, we create an index with specified indexing parameters.
For IVF-FLAT, `nlist` and `default_nprobe` should be specified, and the `centroids` are registered on the client side.

In [None]:
# Configure index parameters
index_params = {
    "index_type": "IVF_FLAT",
    "nlist": n_lists,
    "default_nprobe": 1,
    "centroids": centroids.tolist()
}

# Create index
index = es2.create_index(
    "ivf_flat_index", 
    dim=dim,
    index_params=index_params
)
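
Because the centroids are passed as a plain nested list, a shape mismatch between `nlist`, `dim`, and the centroid array is easy to miss. The following is a sketch of a pre-flight check that could be run before building `index_params`. The helper `check_centroids` is hypothetical and not part of the `es2` API, and whether the SDK performs a similar validation itself is not stated here.

```python
import numpy as np

def check_centroids(centroids, n_lists, dim):
    """Hypothetical helper: validate centroids and return them as a nested list."""
    arr = np.asarray(centroids)
    if arr.shape != (n_lists, dim):
        raise ValueError(f"expected shape {(n_lists, dim)}, got {arr.shape}")
    if not np.isfinite(arr).all():
        raise ValueError("centroids contain NaN or infinite values")
    return arr.tolist()

# Demo with a tiny random centroid matrix
demo = np.random.default_rng(0).random((4, 8))
as_list = check_centroids(demo, n_lists=4, dim=8)
print(len(as_list), len(as_list[0]))
```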

### 3. Insert Data

To populate the index, we insert data into it.
Note that each vector is assigned to its nearest cluster by client-side calculations during the insertion step.

In [None]:
metadata = [f"Metadata-{i+1}" for i in range(num_vectors)]
index.insert(vectors, metadata)
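
To illustrate what a client-side cluster assignment involves, the following is a minimal NumPy sketch that maps each vector to its nearest centroid. It assumes squared Euclidean distance, as used by k-means. The function `assign_clusters` and its `batch_size` parameter are hypothetical, and the SDK's internal implementation and distance metric may differ.

```python
import numpy as np

def assign_clusters(vectors, centroids, batch_size=4096):
    """Return the index of the nearest centroid (squared L2) for each vector."""
    c_sq = (centroids ** 2).sum(axis=1)
    labels = np.empty(len(vectors), dtype=np.int64)
    for start in range(0, len(vectors), batch_size):
        batch = vectors[start:start + batch_size]
        # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2; the ||x||^2 term is
        # constant per row, so it does not affect the argmin and is dropped
        scores = c_sq[None, :] - 2.0 * (batch @ centroids.T)
        labels[start:start + batch_size] = scores.argmin(axis=1)
    return labels

rng = np.random.default_rng(0)
demo_centroids = rng.random((5, 8))
# Points placed very close to centroids 3, 0, and 4
demo_vectors = demo_centroids[[3, 0, 4]] + 1e-3
demo_labels = assign_clusters(demo_vectors, demo_centroids)
print(demo_labels)
```

Processing the vectors in batches keeps the intermediate score matrix at `batch_size x n_lists` entries instead of `num_vectors x n_lists`.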

### 4. Encrypted Similarity Search

To perform a similarity search with ANN, we send a query vector to find the most similar vectors in the index.
For IVF-FLAT, the search parameters (`search_params`) include `nprobe`, which indicates how many of the nearest clusters to search.
`nprobe` can differ from the `default_nprobe` specified when creating the index.
Note that the clusters to probe are determined for the query vector by client-side calculations before the query is sent to the server.

In [None]:
query = vectors[0]

results = index.search(
    query, 
    top_k=3, 
    output_fields=["metadata"],
    search_params={"nprobe": 16}
)
print(results)
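
The client-side selection of clusters to probe can be pictured as choosing the `nprobe` centroids closest to the query. The following is a minimal NumPy sketch under the assumption of squared Euclidean distance. The function `nearest_clusters` is hypothetical and only illustrates the idea, and the SDK may select clusters differently.

```python
import numpy as np

def nearest_clusters(query, centroids, nprobe):
    """Indices of the nprobe centroids closest to the query, nearest first."""
    dists = ((centroids - query) ** 2).sum(axis=1)
    nprobe = min(nprobe, len(centroids))
    # argpartition finds the nprobe smallest distances without a full sort
    cand = np.argpartition(dists, nprobe - 1)[:nprobe]
    # Order the selected candidates from nearest to farthest
    return cand[np.argsort(dists[cand])]

rng = np.random.default_rng(0)
demo_centroids = rng.random((10, 4))
# The query sits right next to centroid 7, so that cluster is probed first
demo_query = demo_centroids[7] + 1e-3
probed = nearest_clusters(demo_query, demo_centroids, nprobe=3)
print(probed)
```

A larger `nprobe` scans more clusters, which generally improves recall at the cost of more work per query.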

### 5. Clean Up

We can delete the created index and the registered key when they are no longer needed.

In [None]:
es2.drop_index("ivf_flat_index")

In [None]:
es2.delete_key("example_key")