# Simple RAG example using Facebook AI Similarity Search (FAISS)

In this example, we'll demonstrate how to use [**FAISS**](https://faiss.ai/) for similarity-based document retrieval. We will create a small **mock dataset** of fictional documents and use **Sentence Transformers** to encode them into vectors. We will then build a **FAISS index** to enable fast similarity search. Finally, we will simulate a **RAG** system in which we retrieve the most relevant documents and use them to generate an answer.


## Steps Overview:
1. **Create a mock dataset**: We create a small set of fictional documents.
2. **Generate embeddings**: We use the **SentenceTransformer** model to convert these documents into vector embeddings.
3. **Build FAISS index**: We build a **FAISS index** that will store these vector embeddings for fast similarity search.
4. **Search and retrieve**: We perform a similarity search based on a query and retrieve the most relevant document(s).
5. **Answer generation**: Using the retrieved document(s), we simulate a **RAG pipeline** that generates a response through the OpenAI-compatible API that Aitta provides.

This example demonstrates how **FAISS** can be used for efficient document retrieval, and how **RAG** can help generate contextually relevant answers from these documents.
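
Before bringing in the real libraries, the retrieval steps above can be sketched with NumPy alone. This is only an illustration under loud assumptions: `toy_embed` is a hypothetical bag-of-words stand-in for the SentenceTransformer encoder, and a plain matrix product stands in for the FAISS index. Neither is what the rest of the notebook uses.

```python
import numpy as np

def toy_embed(texts, vocabulary):
    """Hypothetical stand-in for a sentence encoder: normalized word counts."""
    vectors = np.zeros((len(texts), len(vocabulary)), dtype=np.float32)
    for row, text in enumerate(texts):
        for word in text.lower().split():
            word = word.strip(".,?!")
            if word in vocabulary:
                vectors[row, vocabulary[word]] += 1.0
    # L2-normalize each row so that the inner product equals cosine similarity
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, 1e-12)

# Step 1: mock dataset
docs = ["Cacapapadadas are grey, 10cm long worms.",
        "The moon is actually made of a soft cheese."]

# Step 2: embeddings (the vocabulary is built from the documents themselves)
words = sorted({w.strip(".,?!") for d in docs for w in d.lower().split()})
vocabulary = {w: i for i, w in enumerate(words)}
doc_vectors = toy_embed(docs, vocabulary)

# Steps 3 and 4: exhaustive inner-product search over all documents
query = "What is the moon made of?"
query_vector = toy_embed([query], vocabulary)
scores = doc_vectors @ query_vector[0]
best = int(np.argmax(scores))  # the highest score is the most similar document

# Step 5: put the retrieved document into the prompt for the LLM
toy_prompt = f"Document: {docs[best]}\n\nQuestion: {query}\nAnswer:"
print(toy_prompt)
```

Word overlap is a crude notion of similarity, which is why the notebook uses a trained sentence encoder instead. The overall flow of embedding, searching, and prompting stays the same.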


In [None]:
import numpy as np
import openai
import faiss
from sentence_transformers import SentenceTransformer
from aitta_client import Model, Client, StaticAccessTokenSource

In [None]:
api_key = "<API-KEY>"

In [None]:
# configure Client instance with API URL and access token
token_source = StaticAccessTokenSource(api_key)
aitta_client = Client("https://api-staging-aitta.2.rahtiapp.fi", token_source)

# load the LumiOpen/Poro-34B-chat model
poro_model = Model.load("LumiOpen/Poro-34B-chat", aitta_client)
print(poro_model.description)

# configure OpenAI client to use the Aitta OpenAI compatibility endpoints
client = openai.OpenAI(api_key=token_source.get_access_token(), base_url=poro_model.openai_api_url)


In [None]:
# Create a mock dataset as a list of "documents"
documents = ["Cacapapadadas are grey, 10cm long worms.",
"The moon is actually made of a soft cheese."]


In [None]:
# Initialize the SentenceTransformer model as the encoder and generate vector embeddings
# (SentenceTransformer was already imported in the first cell)
encoder = SentenceTransformer("all-MiniLM-L6-v2")
vectors = encoder.encode(documents)  # NumPy array of shape (n_documents, embedding_dimension)

In [None]:
vectors.shape

In [None]:
# Build a FAISS index from vectors

# Determine the dimensionality of the vector embeddings
vector_dimension = vectors.shape[1]

# Initialize a flat (exact) FAISS index that ranks results by inner product (IP)
index = faiss.IndexFlatIP(vector_dimension)
# Alternatively, you could use IndexFlatL2 for Euclidean distance-based similarity


# Normalize the vectors in place to unit length so that the inner product equals cosine similarity
faiss.normalize_L2(vectors)


# Add the vectors to the FAISS index
index.add(vectors)

# Check the type of the index to ensure it's properly created
type(index)
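
Why normalize before adding vectors to an inner-product index? The following NumPy-only sketch checks the two identities this relies on. It uses random vectors rather than real embeddings, and the dimension 384 is chosen to match what `vectors.shape` should report for `all-MiniLM-L6-v2`.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=384)
b = rng.normal(size=384)

# Cosine similarity computed directly from the unnormalized vectors
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# After scaling both vectors to unit length, the plain inner product is the cosine
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
inner_product = float(a_unit @ b_unit)

# For unit vectors, squared Euclidean distance is a decreasing function of the cosine
squared_l2 = float(np.sum((a_unit - b_unit) ** 2))

print(np.isclose(cosine, inner_product))               # True
print(np.isclose(squared_l2, 2 - 2 * inner_product))   # True
```

The second identity means that, on normalized vectors, an inner-product index and a Euclidean index rank documents in the same order. Only the meaning of the returned number differs: a higher inner product is better, while a lower distance is better.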


In [None]:
# Create a search vector

# Define the query text for searching in the FAISS index
search_text = 'What is the moon made of?'

# Convert the query text into an embedding (vector)
search_vector = encoder.encode(search_text)

# Wrap the query embedding in a 2D array of shape (1, dimension), as FAISS expects, and normalize it
search_vector = np.array([search_vector])
faiss.normalize_L2(search_vector)

# Perform a search in the FAISS index to find the most similar documents
# k is the number of nearest neighbors to return; setting it to the total number
# of documents (2 here) shows how similar every document is to the query
k = index.ntotal
distances, indices = index.search(search_vector, k=k)  # Perform the search


# Print the similarity scores (inner products) and corresponding indices of the retrieved documents
print(distances)
print(indices)

In [None]:
# Print each of the retrieved documents along with its similarity score
for i, idx in enumerate(indices[0]):
    print(f"Rank {i+1}:")
    print("Text:", documents[idx])  # Retrieve the document text by its index
    print("Similarity score:", distances[0][i])  # Inner product of normalized vectors (higher means more similar)
    print("-" * 50)
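
To make the ranking concrete, here is a rough NumPy sketch of what an exhaustive inner-product search computes. `brute_force_search` is a hypothetical helper written for this illustration, not a FAISS function, and the 2D vectors are made up so the result can be checked by hand.

```python
import numpy as np

def brute_force_search(index_vectors, query_vectors, k):
    """Return (scores, indices) of the k highest inner products for each query."""
    all_scores = query_vectors @ index_vectors.T           # shape (n_queries, n_documents)
    top = np.argsort(-all_scores, axis=1)[:, :k]           # sort in descending order
    top_scores = np.take_along_axis(all_scores, top, axis=1)
    return top_scores, top

# Three unit-length "documents" and one unit-length query
toy_index = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
toy_query = np.array([[0.8, 0.6]])

toy_scores, toy_indices = brute_force_search(toy_index, toy_query, k=3)
print(toy_indices)  # document 2 ranks first, then 0, then 1
print(toy_scores)   # approximately 0.96, 0.8, 0.6
```

The scores come back sorted from highest to lowest, so rank 1 is the document whose direction is closest to the query.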


In [None]:
input_query = "What are Cacapapadadas?"


# Embed the query
query_embedding = encoder.encode(input_query)

query_embedding = np.array([query_embedding])  # FAISS expects a 2D array; without this you get "IndexError: tuple index out of range"
faiss.normalize_L2(query_embedding)

# Perform similarity search on the FAISS index
k = 1  # Number of nearest neighbors to retrieve
distances, indices = index.search(query_embedding, k)

# Retrieve the document(s) corresponding to the top index
retrieved_documents = [documents[i] for i in indices[0]]
print(retrieved_documents)

# Show the index and similarity score of the best match
print("Most similar document index:", indices)
print("Similarity score:", distances)


# Prepare the prompt, joining the retrieved documents into plain text
context = "\n".join(retrieved_documents)
prompt = f"Given the following document, answer the question:\n\nDocument: {context}\n\nQuestion: {input_query}\nAnswer:"
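
If you retrieve more than one document (`k > 1`), it can help to list each one separately in the prompt. The helper below is a minimal sketch of one way to do that. `build_prompt` and its wording are assumptions made for illustration, not part of FAISS or the Aitta client.

```python
def build_prompt(retrieved_docs, question):
    """Format the retrieved documents as a bulleted context block followed by the question."""
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        "Given the following documents, answer the question:\n\n"
        f"Documents:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

example_prompt = build_prompt(
    ["Cacapapadadas are grey, 10cm long worms."],
    "What are Cacapapadadas?",
)
print(example_prompt)
```

Formatting the documents as plain text keeps Python list syntax such as brackets and quotes out of the text the model sees.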

In [None]:
input_query

In [None]:
# Call the model through Aitta's OpenAI-compatible API
response = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": prompt
        }
    ],
    model=poro_model.id,
    stream=False  # response streaming is currently not supported by Aitta, now you get the full response in one go
)

# Display the answer
answer = response.choices[0].message.content
print("Answer:", answer)

## LLM usage without RAG

Now, let's test how the model responds to the query without relying on an external data source.

In [None]:
input_query = "What are Cacapapadadas?"


# Call the model through Aitta's OpenAI-compatible API
response = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": input_query
        }
    ],
    model=poro_model.id,
    stream=False  # response streaming is currently not supported by Aitta, now you get the full response in one go
)

# Display the answer
answer = response.choices[0].message.content
print("Answer:", answer)

## Did the model hallucinate? 

Without the retrieved document, the model can only answer from patterns in its training data, so for a fictional term like "Cacapapadadas" the response may well be invented. To reduce the chances of hallucination without using RAG, we can try a more specific prompt that tells the model to answer only when it is sure.

In [None]:
input_query = "What are Cacapapadadas?"

prompt = f"Answer the query only if you know the answer for sure. Query: {input_query}"

# Call the model through Aitta's OpenAI-compatible API
response = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": prompt
        }
    ],
    model=poro_model.id,
    stream=False  # response streaming is currently not supported by Aitta, now you get the full response in one go
)

# Display the answer
answer = response.choices[0].message.content
print("Answer:", answer)

## Would you like to see a more advanced example?

Check out the [repository](https://github.com/shanshanwangcsc/simple_chatbot/tree/aitta_integration) that integrates Aitta into a RAG pipeline for a simple chatbot. That example uses ChromaDB for vector storage and retrieval and accesses the LLM through LangChain.