# Encrypted RAG with ES2

This tutorial walks through using the ES2 SDK to perform Encrypted Retrieval-Augmented Generation (Encrypted RAG) with fully homomorphic encryption (FHE).

## Import ES2

To use the ES2 SDK, you need to install it first. Before installing, make sure conda is installed on your system. For more details, see the `SDK installation` section in `Get Started`. After installation, you can import the ES2 SDK in your Python code.

```python
# !pip install es2
```

```python
import es2
```

## Initialize ES2

Initialization is required before using the ES2 service. It consists of three steps:

1. establishing a connection to the ES2 server,
2. configuring the cryptographic settings required for vector search, and
3. registering evaluation keys that enable the ES2 server to perform secure operations.

You can also configure the path and ID of the encryption key, as well as settings such as the operation preset, query encryption, database encryption, and index type. The example below sets only the connection and key parameters.

```python
es2.init(
    host="localhost",
    port=50050,
    key_path="./keys",
    key_id="rag_key_id",
)
```

## Prepare Data

### Prepare Plaintext Vectors

To perform RAG, we first need plaintext text-embedding vectors. These vectors should be normalized to unit length, because the similarity metric is cosine similarity. The OpenAI embedding model used below is just one example; you can also use your own embedding model to generate vectors from your text dataset.
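Cosine similarity is the inner product divided by the product of the two vector norms, so once every vector has unit length the two quantities coincide. The NumPy sketch below checks this on random vectors; the helper names `normalize` and `cosine_similarity` and the dimension 8 are illustrative choices, not part of ES2.

```python
import numpy as np

def normalize(vec: np.ndarray) -> np.ndarray:
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = np.linalg.norm(vec)
    return vec / norm if norm != 0 else vec

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Inner product divided by the product of the norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
a = rng.normal(size=8).astype(np.float32)
b = rng.normal(size=8).astype(np.float32)

# After normalization, a plain inner product gives the cosine similarity.
a_unit, b_unit = normalize(a), normalize(b)
print(cosine_similarity(a, b))
print(float(np.dot(a_unit, b_unit)))
```

The two printed values agree up to floating-point rounding, which is why the `get_embedding` function below divides each embedding by its norm.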

```python
# !pip install openai
```

```python
# Uncomment to set your OpenAI API key for this session
# import os
# os.environ["OPENAI_API_KEY"] = "sk-***"
```

```python
import os
import numpy as np
from openai import OpenAI

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")

client = OpenAI(api_key=OPENAI_API_KEY)

# embedding function: returns a unit-length float32 vector
def get_embedding(text, model="text-embedding-3-small"):
    text = text.replace("\n", " ")
    embedding = client.embeddings.create(input=[text], model=model).data[0].embedding
    vec = np.array(embedding, dtype=np.float32)
    norm = np.linalg.norm(vec)
    if norm != 0:
        vec /= norm
    return vec

# Prepare vectors to be indexed
db_text = [
    "The capital of France is Paris.",
    "The capital of Germany is Berlin.",
    "The capital of Italy is Rome.",
    "The capital of Canada is Ottawa.",
    "The capital of South Korea is Seoul.",
]

# get embeddings
db_vectors = np.stack([get_embedding(txt) for txt in db_text])
dim = db_vectors.shape[1]

print(f"Vector Dimension: {dim}")
print(f"Number of Vectors: {db_vectors.shape[0]}")
```

## Create Index and Insert Data

For encrypted similarity search, we first create a vector index, called `Index`, that stores encrypted vectors and their metadata in the ES2 system. An index is defined by its name and the dimensionality of the vectors it will store, which must match the size of the vectors you plan to insert.

Once the index is created, you can insert data into it. Insertion first **encrypts the vectors** using the generated encryption keys and then **inserts** them into the index on the ES2 server. Each vector must match the dimensionality specified when the index was created.

Each vector can also carry metadata that provides additional context, such as its source or relevance. This metadata is essential for RAG: here it holds the original sentences, which are later passed to the LLM as context.
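To make these rules concrete, here is a minimal plaintext stand-in written only for illustration. The `PlainIndex` class is hypothetical: it is not the ES2 API and it encrypts nothing. It only mirrors the two rules described above: the dimensionality is fixed when the index is created, and every inserted vector is stored alongside its metadata.

```python
import numpy as np

class PlainIndex:
    """Hypothetical plaintext stand-in for an index (NOT the ES2 API, no encryption)."""

    def __init__(self, name: str, dim: int):
        self.name = name
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.metadata: list[str] = []

    def insert(self, vectors: np.ndarray, metadata: list[str]) -> None:
        vectors = np.asarray(vectors, dtype=np.float32)
        # Reject vectors whose dimensionality differs from the one fixed at creation.
        if vectors.ndim != 2 or vectors.shape[1] != self.dim:
            raise ValueError(f"expected shape (n, {self.dim}), got {vectors.shape}")
        if len(metadata) != vectors.shape[0]:
            raise ValueError("need exactly one metadata entry per vector")
        # Keep vectors and metadata aligned row by row, so a hit at row i
        # maps back to metadata[i].
        self.vectors = np.vstack([self.vectors, vectors])
        self.metadata.extend(metadata)

idx = PlainIndex("demo_index", dim=4)
idx.insert(np.eye(4)[:2], metadata=["doc A", "doc B"])
print(idx.vectors.shape, idx.metadata)  # (2, 4) ['doc A', 'doc B']

try:
    idx.insert(np.ones((1, 3)), metadata=["wrong dim"])
except ValueError as err:
    print("rejected:", err)
```

Keeping vectors and metadata aligned by position is what lets a search result's index be turned back into the text that is handed to the LLM.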


```python
index = es2.create_index("rag_index", dim=dim)
```

```python
index.insert(db_vectors, metadata=db_text)
```

## Encrypted Similarity Search

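### Prepare the Query

First, embed the query with the same `get_embedding` function used for the database vectors, so that it comes from the same model and is normalized the same way.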

```python
query_text = "What is the capital of France?"

query_vector = get_embedding(query_text)
```

### Encrypted Search on the Index

Let's perform an encrypted similarity search for encrypted RAG.

Once the encrypted index and the encrypted query are ready, we can run a similarity search on the encrypted data without decrypting it. The ES2 server computes the similarity scores and returns them in encrypted form. The `index` object holds the decryption key on the client side, so the client decrypts the scores and selects the top-`k` relevant results along with their indices. Using these indices, we then retrieve the corresponding encrypted documents and decrypt them to obtain the plaintext.

As a result, the server performs the search without seeing the plaintext vectors, scores, or documents; only the client, which holds the decryption key, can recover them.
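The client-side step of this flow (taking decrypted scores, selecting the top-`k`, and mapping indices back to documents) can be sketched in plain NumPy. This is an illustration under simplifying assumptions: the scores are computed here in plaintext as inner products of unit vectors, standing in for the values the client would obtain after decryption, and `top_k_from_scores` is a hypothetical helper, not part of ES2.

```python
import numpy as np

def top_k_from_scores(scores: np.ndarray, k: int) -> list[int]:
    """Hypothetical helper: indices of the k highest scores, best first."""
    k = min(k, scores.shape[0])
    if k <= 0:
        return []
    # argpartition isolates the k largest scores without a full sort...
    candidates = np.argpartition(-scores, k - 1)[:k]
    # ...then only those k candidates are sorted in descending score order.
    return candidates[np.argsort(-scores[candidates])].tolist()

docs = ["doc 0", "doc 1", "doc 2"]
db = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]], dtype=np.float32)
query = np.array([0.8, 0.6], dtype=np.float32)

# Stand-in for decrypted scores: inner products of unit vectors (cosine similarity).
scores = db @ query
best = top_k_from_scores(scores, k=2)
print(best, [docs[i] for i in best])  # [2, 0] ['doc 2', 'doc 0']
```

Using `argpartition` keeps selection linear in the number of stored vectors, which matters more than it does here once an index holds many entries.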

```python
result = index.search(query_vector, top_k=1, output_fields=["metadata"])[0]
result
```

### Generate Answers with Retrieval-Augmented Context

Once the decrypted documents are retrieved, we can use an LLM (e.g., OpenAI's GPT) to generate answers grounded in the retrieved documents.

```python
retrieved_docs = [res["metadata"] for res in result]
```

```python
def generate_answer(docs, query, model="gpt-4"):
    instruction = "You are an assistant that answers questions based on the provided documents."
    prompt = f"""{instruction}:\n\n[Documents]\n"""
    for doc in docs:
        prompt += f"- {doc}\n"
    prompt += f"\n[Question]\n{query}\n[Answer]\n"

    response = client.chat.completions.create(
        model=model,  # Chat model
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": prompt}
        ],
        max_tokens=128,
        temperature=0
    )
    return response.choices[0].message.content.strip()

answer = generate_answer(retrieved_docs, query_text)
print(f"Generated Answer: \n{answer}")
```

## Clean Up

When you are finished, drop the index and release the key registered during initialization.

```python
es2.drop_index("rag_index")
```

```python
es2.release_key("rag_key_id")
```