# Building a Basic RAG Chatbot with FAISS and Microsoft Phi3

This notebook demonstrates how to build a basic Retrieval-Augmented Generation (RAG) chatbot using FAISS for document retrieval and Microsoft Phi3 for generating responses. It walks through setting up the environment, indexing documents, and serving the chatbot through a Flask web API.

## Step 1: Set up Conda Environment and Install Dependencies

Create a new Conda environment and install the required libraries by running the following PowerShell commands in a terminal (not in a Python notebook cell). FAISS setup instructions are available at https://github.com/facebookresearch/faiss/blob/main/INSTALL.md. Pay attention to the GPU setup: if you only have a CPU, run the CPU-only command. Performance of a CPU-only setup will not be as good as that of a GPU setup.

```powershell
# Define the environment name
$envName = "BasicRAGChatBot"

# Check if the environment already exists
$envExists = conda info --envs | Select-String $envName

if ($envExists) {
    Write-Output "Environment '$envName' already exists."
} else {
    Write-Output "Creating environment '$envName'..."
    conda create -n $envName python=3.9 -y
}

# Activate the environment (run `conda init powershell` once first if this fails)
conda activate $envName

# Install FAISS: keep exactly one of the following options active
# CPU-only version
conda install -c pytorch faiss-cpu=1.9.0

# GPU(+CPU) version
# conda install -c pytorch -c nvidia faiss-gpu=1.9.0

# GPU(+CPU) version with NVIDIA RAFT: see INSTALL.md (linked above) for the command

# Install the Python dependencies used in this notebook
# (FAISS comes from conda above, so it is not reinstalled with pip)
pip install onnxruntime fastapi uvicorn
pip install -q flask transformers torch sentence-transformers openai
```

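After the installs finish, it may be worth confirming from Python (with the new environment's kernel selected) that the packages this notebook imports can be found. The sketch below uses only the standard library; the module list is an assumption based on the imports used later in the notebook, so adjust it if your copy differs.

```python
import importlib.util

# Top-level modules imported later in this notebook (assumed list; adjust as needed)
REQUIRED_MODULES = ["faiss", "numpy", "sentence_transformers", "flask", "openai"]


def find_missing_modules(names):
    """Return the names from `names` that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing_modules(REQUIRED_MODULES)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

`find_spec` only checks that a module can be located; it does not import it, so a broken native build (for example a FAISS GPU package without matching CUDA libraries) would only show up when you actually run `import faiss`.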

You will also need to set up Phi3. The easiest way I found to use Phi3 is the AI Toolkit for Visual Studio Code. Detailed instructions, current at the time of writing this notebook, are available at https://github.com/microsoft/vscode-ai-toolkit. Again, pay attention to whether you are installing the CPU-only version or the GPU version.

I downloaded microsoft/Phi-3-mini-128k-instruct-onnx because I have a CPU-only laptop. Once the model is loaded in AI Toolkit, you can quickly test it through the toolkit's local OpenAI-compatible endpoint:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:5272/v1/",  # local endpoint served by AI Toolkit
    api_key="x",  # required by the client library but not used
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "what is the golden ratio?",
        }
    ],
    # Replace with the exact name of the model you loaded in AI Toolkit
    # (for a CPU-only setup this will be a CPU build, not this CUDA one)
    model="Phi-3-mini-4k-cuda-int4-onnx",
)

print(chat_completion.choices[0].message.content)
```

## Step 2: Import Libraries

Import the libraries used throughout the notebook.

```python
import faiss
import numpy as np
from flask import Flask, request, jsonify
from sentence_transformers import SentenceTransformer
```

## Step 3: Create FAISS Index

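Use FAISS to index a document file. We'll use `documentation.txt` (assumed to be in the same directory) as a sample document. Its text is based on the public SAP rule engine documentation available at https://help.sap.com/docs/SAP_COMMERCE/9d346683b0084da2938be8a285c0c27a/ba076fa614e549309578fba7159fe628.html

First, we define a function that reads `documentation.txt`, embeds each non-empty line, and builds a FAISS index from those embeddings. The same cell also defines `search_faiss_index`, which Step 4 uses. Its implementation is not given here, so the version below is a minimal sketch that assumes it returns the top-k matching lines joined into one string.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer


class EmbeddingModel:
    """Loads the sentence-embedding model once and reuses it for indexing and search."""
    _model_instance = None

    @classmethod
    def get_model(cls):
        if cls._model_instance is None:
            cls._model_instance = SentenceTransformer('all-MiniLM-L6-v2')  # Load the model only once
            print("Model loaded successfully.")
        return cls._model_instance


# Shared state: the index and the lines it was built from (index position -> text)
_faiss_index = None
_indexed_texts = []


def create_faiss_index(file_path, index_path='faiss_index.bin'):
    global _faiss_index, _indexed_texts

    # Step 1: Read the non-empty lines of the text file; each line becomes one document
    with open(file_path, 'r', encoding='utf-8') as file:
        texts = [line.strip() for line in file if line.strip()]

    # Step 2: Generate an embedding for each line
    model = EmbeddingModel.get_model()
    embeddings = model.encode(texts)

    # Step 3: Convert embeddings to a numpy array (FAISS requires float32)
    embeddings_np = np.asarray(embeddings, dtype='float32')

    # Step 4: Set up an exact index that compares vectors by L2 distance
    embedding_dim = embeddings_np.shape[1]  # Dimensionality of embeddings
    index = faiss.IndexFlatL2(embedding_dim)

    # Step 5: Add embeddings to the FAISS index
    index.add(embeddings_np)
    print(f"Indexed {index.ntotal} documents into FAISS")

    # Step 6: Save the index to disk (optional; only the vectors are saved, not the texts)
    faiss.write_index(index, index_path)

    _faiss_index, _indexed_texts = index, texts
    return index


def search_faiss_index(query, k=3):
    """Minimal sketch: return the k indexed lines closest to `query`, one per line."""
    if _faiss_index is None:
        raise RuntimeError("Call create_faiss_index() before searching.")
    query_np = np.asarray(EmbeddingModel.get_model().encode([query]), dtype='float32')
    distances, indices = _faiss_index.search(query_np, k)
    # FAISS returns -1 for missing results when fewer than k vectors are indexed
    return "\n".join(_indexed_texts[i] for i in indices[0] if i != -1)
```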
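`IndexFlatL2` performs exact, exhaustive search: every query is compared with every stored vector, and FAISS reports squared Euclidean distances. For intuition, the NumPy sketch below reproduces that computation on toy data. It illustrates the semantics only, not how FAISS implements it internally, and `flat_l2_search` is a hypothetical helper.

```python
import numpy as np


def flat_l2_search(database, queries, k):
    """Exhaustive k-nearest-neighbour search by squared L2 distance.

    Returns (distances, indices), each of shape (n_queries, k), like
    IndexFlatL2.search. Assumes k <= number of stored vectors (FAISS would
    pad missing results with -1 instead).
    """
    database = np.asarray(database, dtype=np.float32)
    queries = np.asarray(queries, dtype=np.float32)
    # ||q - d||^2 = ||q||^2 - 2 q.d + ||d||^2, for every (query, row) pair
    sq_dists = (
        (queries ** 2).sum(axis=1, keepdims=True)
        - 2.0 * queries @ database.T
        + (database ** 2).sum(axis=1)
    )
    order = np.argsort(sq_dists, axis=1)[:, :k]
    return np.take_along_axis(sq_dists, order, axis=1), order


db = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
dists, idx = flat_l2_search(db, [[0.9, 0.1]], k=2)
print(idx, dists)  # row 1 is nearest, then row 0
```

Because every query scans the whole table, search cost grows linearly with the number of lines in `documentation.txt`; for a small document this is the simplest choice, and FAISS's approximate indexes are typically considered only once the collection grows large.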

The next cell calls this function to build the FAISS vector index from `documentation.txt`.

```python
# Creating the FAISS Index
faiss_index = create_faiss_index("documentation.txt")
print(f"FAISS index created successfully: {faiss_index}")
```

## Step 4: Test Querying the FAISS Index

To test the retrieval function, use `search_faiss_index` to query the index with a sample question.

```python
# Sample query to test the FAISS index
query = "Explain the rewards feature in detail"
search_results = search_faiss_index(query)
print("Top 3 search results:", search_results)
```
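The ranking behind these results uses L2 distance, while sentence embeddings are often compared by cosine similarity. For unit-length vectors the two give the same ranking, because `||a - b||² = 2 - 2·cos(a, b)`. If you want to be sure the embeddings are unit length, `SentenceTransformer.encode` accepts `normalize_embeddings=True`. The check below verifies the identity numerically with random 384-dimensional vectors (the output size of `all-MiniLM-L6-v2`); `l2_and_cosine` is an illustrative helper.

```python
import numpy as np


def l2_and_cosine(a, b):
    """Return (squared L2 distance, cosine similarity) after scaling a and b to unit length."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.sum((a - b) ** 2)), float(a @ b)


rng = np.random.default_rng(0)
sq_l2, cos_sim = l2_and_cosine(rng.normal(size=384), rng.normal(size=384))
print(sq_l2, 2 - 2 * cos_sim)  # equal up to floating-point rounding
```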

## Step 5: Prompt Engineering

Define the prompt for the chatbot to generate responses as a 'Rewards Feature Expert'.

```python
def process_question(question):
    # Retrieve the most relevant documentation lines for the question
    result = search_faiss_index(question)
    prompt = (
        f"""
        [Context: Rewards Feature]
        [Role: Reward feature expert]
        Assume you are a helpful assistant for Reward feature.
        Please do not hallucinate.
        Say 'I don't know' if you don't know the answer.
        Considering all given above, analyze reward feature and summarize following : {result}
        """
    )
    # Return the retrieved context followed by the model's answer
    return result + "\n\n" + generate_response('tunedModels/rewardchatbot', prompt=prompt)
```
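As written, the prompt includes the retrieved lines but not the user's question itself, so the model is asked to summarize the context rather than answer what was asked. A variant worth trying keeps similar instructions and appends the question explicitly. `build_rag_prompt` is a hypothetical helper, not part of the pipeline above; to use it, you would build `prompt` with it inside `process_question`.

```python
def build_rag_prompt(question, context):
    """Combine instructions, retrieved context (newline-separated), and the question."""
    instructions = [
        "[Context: Rewards Feature]",
        "[Role: Reward feature expert]",
        "You are a helpful assistant for the Reward feature.",
        "Answer using only the documentation excerpts below.",
        "Say 'I don't know' if the excerpts do not contain the answer.",
    ]
    # One bullet per retrieved line, so the model can tell excerpts apart
    excerpts = [f"- {line}" for line in context.splitlines() if line.strip()]
    return "\n".join(
        instructions
        + ["", "Documentation excerpts:"]
        + excerpts
        + ["", f"Question: {question}"]
    )


print(build_rag_prompt("What is a reward?", "line one\nline two"))
```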

## Step 6: Generate Responses with Language Model

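Define the `generate_response` function that sends the prompt to a language model. The cell below is a placeholder that only echoes the prompt back, so the pipeline can be tested end to end; to get real answers, connect `generate_content` to Microsoft Phi3 (for example, the local endpoint tested in Step 1) or any other offline LLM.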

```python
class GenerativeModel:
    """Placeholder model: echoes the prompt instead of calling a real LLM."""

    def __init__(self, model_name):
        self.model_name = model_name

    def generate_content(self, prompt):
        # Return a lightweight object with a `.text` attribute, like a real client would
        return type('GeneratedContent', (object,), {'text': f"Generated response for: {prompt}"})


def generate_response(model_name, prompt):
    model = GenerativeModel(model_name)
    model_output = model.generate_content(prompt)
    return model_output.text
```
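To replace the placeholder with real answers, `generate_content` needs to send the prompt to the model. The sketch below assumes the AI Toolkit endpoint from the Step 1 test (`http://127.0.0.1:5272/v1`) accepts the OpenAI-style chat completions request that the `openai` client sends there, and uses only the standard library. `Phi3GenerativeModel` is a hypothetical name; it keeps the same `generate_content(prompt).text` interface as `GenerativeModel`.

```python
import json
import urllib.request
from types import SimpleNamespace

# Assumption: AI Toolkit serves an OpenAI-compatible API at this address
PHI3_BASE_URL = "http://127.0.0.1:5272/v1"


class Phi3GenerativeModel:
    """Alternative to the placeholder GenerativeModel that calls a local Phi-3 server."""

    def __init__(self, model_name, base_url=PHI3_BASE_URL, timeout=120):
        self.model_name = model_name  # must match a model loaded in AI Toolkit
        self.base_url = base_url
        self.timeout = timeout

    def build_request(self, prompt):
        """Build (without sending) an OpenAI-style chat completions request."""
        body = {"model": self.model_name,
                "messages": [{"role": "user", "content": prompt}]}
        return urllib.request.Request(
            f"{self.base_url}/chat/completions",
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    def generate_content(self, prompt):
        """Send the prompt and return an object with a `.text` attribute."""
        with urllib.request.urlopen(self.build_request(prompt), timeout=self.timeout) as resp:
            payload = json.load(resp)
        return SimpleNamespace(text=payload["choices"][0]["message"]["content"])
```

To switch over, construct `Phi3GenerativeModel(model_name)` instead of `GenerativeModel(model_name)` in `generate_response`, and pass the model name shown in AI Toolkit rather than `'tunedModels/rewardchatbot'`. Building the request separately from sending it lets you inspect the payload without a running server.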

```python
# Test generating a response
test_response = process_question("What is the purpose of the rewards feature?")
print("Generated Response:", test_response)
```

## Step 7: Create the Flask API

Set up a Flask API to serve as an endpoint for the chatbot, allowing external applications to query it.

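```python
app = Flask(__name__)


@app.route('/chat', methods=['POST'])
def chat():
    if not request.is_json:
        return jsonify({"error": "Request must be JSON formatted"}), 400
    data = request.get_json()
    question = data.get('query') if isinstance(data, dict) else None
    # Reject requests that do not carry a non-empty "query" string
    if not isinstance(question, str) or not question.strip():
        return jsonify({"error": "JSON body must contain a non-empty 'query' string"}), 400
    answer = process_question(question)
    return jsonify({"response": answer})
```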

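`flask run` loads the app from a Python file, not from a notebook. To start the Flask app, copy the code cells from Steps 2, 3, 5, 6 and 7 (except the test cell in Step 6) into a file named `app.py` in the same directory as `documentation.txt`; Flask picks up `app.py` automatically when `FLASK_APP` is not set. Then run the following command in a terminal from that directory: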
```bash
flask run --host=0.0.0.0 --port=5000
```
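External applications can then call the endpoint with any HTTP client. A standard-library sketch, assuming the server runs locally on port 5000 as started above; `build_chat_query` and `ask_chatbot` are illustrative names. The `Content-Type: application/json` header matters because the route rejects requests for which `request.is_json` is false.

```python
import json
import urllib.request

CHAT_URL = "http://127.0.0.1:5000/chat"  # host/port from the flask run command above


def build_chat_query(question, url=CHAT_URL):
    """Build a POST request with the JSON body the /chat endpoint expects."""
    return urllib.request.Request(
        url,
        data=json.dumps({"query": question}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_chatbot(question, timeout=300):
    """Send the question to the running server and return its "response" field."""
    with urllib.request.urlopen(build_chat_query(question), timeout=timeout) as resp:
        return json.load(resp)["response"]


# With the server running:
# print(ask_chatbot("What is the purpose of the rewards feature?"))
```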


## Conclusion

You now have an offline RAG chatbot built with FAISS and Microsoft Phi3. Once `generate_response` is connected to your locally hosted Phi3 model, the setup retrieves relevant information from your documents and answers with that model, which makes it a good fit for customer support or FAQ applications.