# Simple Encrypted RAG with enVector

In this tutorial, we will walk through the steps to use the enVector SDK for Encrypted Retrieval-Augmented Generation (Encrypted RAG) using fully homomorphic encryption (FHE).

## Import SDK

First, install and import the `es2` package, which provides the enVector Python APIs.
Before installing, make sure Python 3.12 and a virtual environment are available on your system.

In [None]:
import es2

## Initialize

The enVector service must be initialized before use.

The initialization step below establishes a connection to the enVector server and configures the cryptographic settings needed for vector search.

In [None]:
es2.init(
    address="localhost:50050",
    # access_token="...", # if needed
    key_path="./keys",
    key_id="rag_key_id",
)
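
When the server address or key location differs between environments, one option is to read the initialization settings from environment variables instead of hard-coding them. The sketch below only builds the keyword arguments already shown in the cell above (`address`, `key_path`, `key_id`, and the optional `access_token`); the `ENVECTOR_*` variable names and the helper `init_kwargs_from_env` are hypothetical and not part of the SDK.

```python
import os


def init_kwargs_from_env(env=None):
    """Collect keyword arguments for es2.init from environment variables.

    The ENVECTOR_* names are assumptions made for this sketch; the defaults
    mirror the values used in this tutorial.
    """
    env = os.environ if env is None else env
    kwargs = {
        "address": env.get("ENVECTOR_ADDRESS", "localhost:50050"),
        "key_path": env.get("ENVECTOR_KEY_PATH", "./keys"),
        "key_id": env.get("ENVECTOR_KEY_ID", "rag_key_id"),
    }
    token = env.get("ENVECTOR_ACCESS_TOKEN")
    if token:  # pass a token only when one is configured
        kwargs["access_token"] = token
    return kwargs


# Usage (requires a reachable enVector server):
# es2.init(**init_kwargs_from_env())
```

Keeping the helper free of any SDK call makes it easy to check the resolved settings before connecting.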

## Prepare Data

### Prepare Plaintext Vectors

To perform RAG, we first need plaintext text embedding vectors.

Note that these vectors should be normalized, because the similarity metric is cosine similarity. The example below uses sentence-transformers for text embedding; you can also use your own embedding model to generate vectors from your text dataset.

In [None]:
from typing import List, Union
from sentence_transformers import SentenceTransformer
import numpy as np

# 1. Load a pretrained Sentence Transformer model
model = SentenceTransformer("all-MiniLM-L6-v2")

# 2. Calculate embeddings by calling model.encode()
def get_embedding(texts: Union[str, List[str]], dim=None) -> np.ndarray:
    """Embed a single string (1-D array) or a list of strings (2-D array)."""
    BATCH_SIZE = 128
    if dim is None:
        dim = model.get_sentence_embedding_dimension()
    if isinstance(texts, list):
        # Encode in batches and stack the results row by row
        embeddings = np.empty((0, dim))
        for i in range(0, len(texts), BATCH_SIZE):
            batch_texts = texts[i : i + BATCH_SIZE]
            batch_embeddings = model.encode(batch_texts)
            embeddings = np.vstack([embeddings, batch_embeddings])
        return embeddings
    else:
        # A single string is encoded directly
        return model.encode(texts)

In [None]:
# Prepare vectors to be indexed
texts = [
    "The capital of France is Paris.",
    "The capital of Germany is Berlin.",
    "The capital of Italy is Rome.",
    "The capital of Canada is Ottawa.",
    "The capital of South Korea is Seoul.",
]

# Get embeddings
vectors = get_embedding(texts)
dim = vectors.shape[1]

print(f"Vector Dimension: {dim}")
print(f"Number of Vectors: {vectors.shape[0]}")
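
The normalization requirement in this section matters because, for unit-length vectors, the inner product equals the cosine similarity. The following is a minimal plain-Python sketch of that relationship; the helper names (`l2_normalize`, `dot`, `cosine`) and the two toy vectors are illustrative and not part of the SDK.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [1.0, 0.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# For unit vectors, the inner product and the cosine similarity agree
print(dot(a_unit, b_unit))
print(cosine(a, b))
```

For the NumPy array `vectors` produced above, a comparable check is that `np.linalg.norm(vectors, axis=1)` is close to 1 for every row. If your embedding model does not return unit-length vectors, sentence-transformers' `encode` accepts `normalize_embeddings=True`.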

## Create Index and Insert Data

For encrypted similarity search, we first prepare a vector index, called `Index`, to store encrypted vectors and their metadata in the enVector system.
An index is defined by its name and the dimensionality of the vectors it will store.
The dimensionality must match the size of the vectors you plan to insert.
This step ensures the index is properly configured to handle your data.

Once the index is ready, you can insert data into it.
Insertion first encrypts the vectors with the generated encryption keys and then stores them in the index.
Each vector can carry associated metadata that provides additional context, such as its source or the original text.
Attaching metadata is essential for RAG, because the metadata is what is later retrieved and passed to the LLM.

In [None]:
index = es2.create_index("rag_index", dim=dim)

In [None]:
index.insert(vectors, metadata=texts)
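
Because every vector must match the index dimensionality and each vector is paired with one metadata entry, it can help to validate a batch on the client before inserting it. The helper below is a hypothetical sketch (`check_batch` is not an SDK function); it works on lists of lists as well as 2-D NumPy arrays.

```python
def check_batch(vectors, metadata, dim):
    """Raise ValueError if a batch is inconsistent with the index dimension.

    Returns the number of vectors when the batch is consistent.
    """
    if len(vectors) != len(metadata):
        raise ValueError(
            f"{len(vectors)} vectors but {len(metadata)} metadata entries"
        )
    for i, vec in enumerate(vectors):
        if len(vec) != dim:
            raise ValueError(
                f"vector {i} has dimension {len(vec)}, expected {dim}"
            )
    return len(vectors)


# Usage with the variables from this tutorial:
# check_batch(vectors, texts, dim)
```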

## Encrypted Similarity Search

### Prepare Query

First, prepare a query for encrypted search.

In [None]:
query_text = "What is the capital of France?"

query_vector = get_embedding(query_text)

### Encrypted Search on the Index

Let's perform an encrypted similarity search for encrypted RAG.

Once the encrypted vector index and the encrypted query vector are ready, we can perform a similarity search on the encrypted data without decrypting it.
The enVector server returns encrypted scores, which the client decrypts using the decryption key held by the `index` object.
The client then selects the top-k most relevant results and their indices from the decrypted scores.
Finally, the encrypted documents at those indices are retrieved and decrypted to obtain the plaintext.

This process keeps the similarity search secure, since the server only ever operates on encrypted data.

In [None]:
result = index.search(query_vector, top_k=1, output_fields=["metadata"])[0]
result
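
The client-side selection step described above can be illustrated in plain Python: once the scores are decrypted, top-k selection is an ordinary ranking over (index, score) pairs. This is only a sketch of the idea, not the SDK's implementation; `top_k_indices` is a hypothetical helper and the score values are made up for illustration.

```python
import heapq


def top_k_indices(scores, k):
    """Return (index, score) pairs for the k highest scores, best first."""
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])


# Made-up decrypted scores, one per indexed document (illustration only)
decrypted_scores = [0.91, 0.42, 0.38, 0.35, 0.30]
documents = [
    "The capital of France is Paris.",
    "The capital of Germany is Berlin.",
    "The capital of Italy is Rome.",
    "The capital of Canada is Ottawa.",
    "The capital of South Korea is Seoul.",
]

for idx, score in top_k_indices(decrypted_scores, k=2):
    print(idx, score, documents[idx])
```

In the tutorial itself, `index.search` performs the decryption, selection, and document retrieval for you.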

### Generate Answers with Retrieval-augmented Context

Once the decrypted documents are retrieved, we can use an LLM (e.g. OpenAI's GPT) to generate answers based on the retrieved documents.

In this example, we use the gpt-oss model running locally with Ollama.

In [None]:
retrieved_docs = [res["metadata"] for res in result]
retrieved_docs

In [None]:
import requests

def generate_answer(docs, query, model="gpt-oss"):
    # The instruction is sent once, as the system message; the user message
    # carries the retrieved documents and the question.
    instruction = "You are an assistant that answers questions based on the provided documents."
    prompt = "[Documents]\n"
    for doc in docs:
        prompt += f"- {doc}\n"
    prompt += f"\n[Question]\n{query}\n[Answer]\n"

    response = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": model,
            "messages": [
                {"role": "system", "content": instruction},
                {"role": "user", "content": prompt}
            ],
            "stream": False
        }
    )
    response.raise_for_status()
    return response.json()["message"]["content"].strip()

In [None]:
# Example usage
answer = generate_answer(retrieved_docs, query_text)
print(f"Generated Answer: \n{answer}")
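
To inspect what the model receives without a running Ollama server, the prompt assembly can be factored into a pure function. The sketch below reproduces the `[Documents]` / `[Question]` / `[Answer]` layout used in this section; `build_prompt` is a hypothetical helper introduced only for illustration.

```python
def build_prompt(docs, query):
    """Assemble the user prompt from retrieved documents and the question."""
    doc_lines = "\n".join(f"- {doc}" for doc in docs)
    return f"[Documents]\n{doc_lines}\n\n[Question]\n{query}\n[Answer]\n"


print(
    build_prompt(
        ["The capital of France is Paris."],
        "What is the capital of France?",
    )
)
```

Separating prompt construction from the HTTP call also makes it straightforward to test the prompt format on its own.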

### Clean Up

We can delete the created index and the registered key when they are no longer needed.

In [None]:
es2.drop_index("rag_index")

In [None]:
es2.delete_key("rag_key_id")