# Orchestrate a RAG system

In this notebook, you'll ingest and preprocess data, create embeddings, and build a FAISS index. You'll then use that index to retrieve relevant context for a retrieval-augmented generation (RAG) system.

## Before you start

Install the necessary libraries:

In [None]:
! pip install langchain langchain-openai langchain-community openai faiss-cpu transformers nltk pandas

Next, define the values used when submitting a chat completion request through the API endpoint.

In [2]:
# Define the base URL for your Azure OpenAI Service endpoint
# Replace 'Your Azure OpenAI Service Endpoint' with your actual endpoint URL obtained previously
api_base = 'Your Azure OpenAI Service Endpoint'

# Define the API key for your Azure OpenAI Service
# Replace 'Your Azure OpenAI Service API Key' with your actual API key obtained previously
api_key = 'Your Azure OpenAI Service API Key'

# Define the name of the model deployed in your Azure OpenAI Service
model_name = 'gpt-4'

# Define the API version to use for the Azure OpenAI Service
api_version = '2024-08-01-preview'
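
Hardcoding a key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the endpoint and key could instead be read from environment variables. The variable names `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_KEY` and the helper `load_setting` are assumptions for illustration, not part of the original exercise.

```python
import os

def load_setting(env_name, placeholder):
    """Return the environment variable's value, or the placeholder if it is unset or empty."""
    value = os.environ.get(env_name, "").strip()
    return value if value else placeholder

# Assumed variable names; adjust them to match your environment
api_base = load_setting("AZURE_OPENAI_ENDPOINT", "Your Azure OpenAI Service Endpoint")
api_key = load_setting("AZURE_OPENAI_API_KEY", "Your Azure OpenAI Service API Key")
```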


Next, load your dataset into the notebook and preprocess it. Then create embeddings for each document and build a FAISS index for efficient similarity search:

In [3]:
import re

import nltk
import pandas as pd
from nltk.corpus import stopwords
from langchain_openai import AzureOpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Load your data, skipping rows with no review text
data = pd.read_csv('app_hotel_reviews.csv')
documents = data['User Reviews'].dropna().astype(str).tolist()

# Preprocess the text
nltk.download('stopwords')
# Build the stopword set once instead of on every call
stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    text = re.sub(r'\W', ' ', text)  # Replace non-word characters with spaces
    text = re.sub(r'\d', ' ', text)  # Replace digits with spaces
    text = text.lower()
    return ' '.join(word for word in text.split() if word not in stop_words)

cleaned_documents = [preprocess_text(doc) for doc in documents]

# Create embeddings
embeddings = AzureOpenAIEmbeddings(
    deployment="text-embedding-ada-002",
    model="text-embedding-ada-002",
    azure_endpoint=api_base,
    openai_api_key=api_key,
    openai_api_version=api_version,
    chunk_size=1
)
document_embeddings = embeddings.embed_documents(cleaned_documents)

# Build the FAISS index from the (text, embedding) pairs
faiss_index = FAISS.from_embeddings(
    list(zip(cleaned_documents, document_embeddings)),
    embeddings
)
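
To see what the preprocessing step does to a single review, here is a small standalone sketch of the same cleaning logic. It assumes a tiny hand-written stopword set (`SAMPLE_STOP_WORDS`, a hypothetical stand-in) so that it runs without downloading the NLTK data; the real stopword list is much longer.

```python
import re

# A tiny stand-in stopword list so the example runs without downloading NLTK data
SAMPLE_STOP_WORDS = {"the", "was", "and", "a", "is"}

def preprocess_text(text, stop_words=SAMPLE_STOP_WORDS):
    text = re.sub(r'\W', ' ', text)   # replace punctuation and symbols with spaces
    text = re.sub(r'\d', ' ', text)   # replace digits with spaces
    text = text.lower()
    return ' '.join(word for word in text.split() if word not in stop_words)

print(preprocess_text("The room was GREAT, and the view is 10/10!"))
# prints: room great view
```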

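Conceptually, the simplest kind of similarity search compares the query vector against every stored vector and returns the closest ones. The sketch below illustrates that idea in plain Python with toy two-dimensional vectors. It is not how FAISS is implemented, and the helper names `l2_distance` and `nearest` are hypothetical.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query_vector, indexed_vectors, k=2):
    """Return (position, distance) pairs for the k closest vectors, nearest first."""
    scored = [(i, l2_distance(query_vector, v)) for i, v in enumerate(indexed_vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:k]

# Toy 2-dimensional "embeddings"; real embedding vectors have many more dimensions
toy_vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(nearest([0.9, 1.2], toy_vectors, k=2))
```
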
Now you'll create an instance of the AzureOpenAI client to interact with your Azure OpenAI Service and obtain the chat completion response.

In [8]:
# Import the AzureOpenAI class from the openai library
from openai import AzureOpenAI

# Create an instance of the AzureOpenAI client to interact with the Azure OpenAI Service
client = AzureOpenAI(
    # Use the API key for authentication
    api_key=api_key,  
    
    # Specify the API version to use
    api_version=api_version,
    
    # Construct the base URL for the deployment from the API base and deployment name
    # (rstrip avoids a doubled or missing slash, whichever way the endpoint was pasted)
    base_url=f"{api_base.rstrip('/')}/openai/deployments/{model_name}",
)

In [None]:
# Define the messages to send to the model
messages=[
    { 
        "role": "user", 
        "content": [  
            { 
                # Specify the type of content as text
                "type": "text", 
                    
                # Provide the text content for the model to process
                "text": "Where can I stay in London?" 
            }
        ] 
    } 
]

Set up the pipeline to retrieve relevant information from FAISS and generate responses with Azure OpenAI’s GPT model:

In [4]:
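# Define the RAG pipeline: retrieve similar reviews from FAISS, then generate a response
def rag_answer(query, k=3):
    # faiss_index is the LangChain FAISS vector store built earlier
    docs = faiss_index.similarity_search(query, k=k)
    context = "\n".join(doc.page_content for doc in docs)

    # Ask the chat model to answer using the retrieved reviews as context
    completion = client.chat.completions.create(
        model=model_name,
        messages=[
            {"role": "system", "content": "Answer using only the hotel reviews provided as context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
        max_tokens=2000,
    )
    return completion.choices[0].message.content

# Example query
query = "Where can I stay in London?"
response = rag_answer(query)
print(response)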
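
To check the retrieve-then-generate flow without calling any service, the two steps can be sketched with stand-in functions. Everything here is hypothetical and for illustration only: `build_rag_messages`, `run_rag`, `fake_retrieve`, and `fake_generate` are not LangChain or OpenAI APIs. In a real pipeline, the retriever would query the FAISS index and the generator would call the chat model.

```python
def build_rag_messages(query, retrieved_texts):
    """Combine retrieved passages and the user's question into chat messages."""
    context = "\n".join(f"- {text}" for text in retrieved_texts)
    return [
        {"role": "system", "content": "Answer using only the context provided."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ]

def run_rag(query, retrieve, generate, k=3):
    """retrieve(query, k) returns a list of strings; generate(messages) returns a string."""
    return generate(build_rag_messages(query, retrieve(query, k)))

# Stand-ins so the sketch runs offline
fake_retrieve = lambda query, k: ["Hotel A in London was clean and central."][:k]
fake_generate = lambda messages: f"(model reply based on {len(messages)} messages)"

print(run_rag("Where can I stay in London?", fake_retrieve, fake_generate))
# prints: (model reply based on 2 messages)
```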


In [9]:
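# Create the chat completion requests using the AzureOpenAI clients
# Assumes client1/client2, model_name1/model_name2, and messages1/messages2
# are defined for the two deployments being compared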
response1 = client1.chat.completions.create(
    # Specify the model to use for generating the response
    model=model_name1,
    
    # Define the messages to send to the model
    messages=messages1,
    
    # Set the maximum number of tokens to generate in the response
    max_tokens=2000 
)

response2 = client2.chat.completions.create(
    model=model_name2,
    messages=messages2,
    max_tokens=2000 
)

In [10]:
# The response contains multiple choices, and we are accessing the first one as our result
result1 = response1.choices[0].message.content
result2 = response2.choices[0].message.content

The variables `result1` and `result2` now contain the content of the first choice from their respective responses: the text or code that each model generated from the input messages. Print each result, copy the code block it contains, run each block in a new code cell, and compare the outputs. Do the scripts or their outputs differ in any way?

In [None]:
print(result1)

In [None]:
print(result2)

You can submit further requests to have the code modified. Doing so further demonstrates the difference between the models and makes the metrics observed later more significant. However, for the models to keep track of the prompt history, you need to append their responses and the new prompts to the `messages1` and `messages2` variables used so far.

In [None]:
# Add the responses to the messages with the assistant role
messages1.append({"role": "assistant", "content": result1})
messages2.append({"role": "assistant", "content": result2})

# Define the new prompt that will develop the chat completion further
new_prompt = "Add a legend to the plot replacing the labels"

# Add the new prompt to the messages with the user role
messages1.append({"role": "user", "content": new_prompt})
messages2.append({"role": "user", "content": new_prompt})

In [None]:
# Submit the new chat completion requests
response1 = client1.chat.completions.create(
    model=model_name1,
    messages=messages1,
    max_tokens=2000 
)
response2 = client2.chat.completions.create(
    model=model_name2,
    messages=messages2,
    max_tokens=2000 
)
result1 = response1.choices[0].message.content
result2 = response2.choices[0].message.content

## Conclusion

After reviewing the plot and recalling the benchmark values in the Accuracy vs. Cost chart you saw earlier, can you conclude which model is best for your use case? Does the difference in the outputs' accuracy outweigh the difference in tokens generated, and therefore in cost?
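
To put numbers on the cost side of that question, the token counts reported with each response can be turned into a rough estimate. In the OpenAI Python SDK, these counts are available as `response.usage.prompt_tokens` and `response.usage.completion_tokens`. The sketch below is a minimal illustration: `estimate_cost` is a hypothetical helper, and the token counts and per-1,000-token prices are invented, so substitute the actual prices for your deployments.

```python
from types import SimpleNamespace

def estimate_cost(usage, prompt_price_per_1k, completion_price_per_1k):
    """Estimate the cost of one request from its token counts and per-1,000-token prices."""
    return (usage.prompt_tokens * prompt_price_per_1k
            + usage.completion_tokens * completion_price_per_1k) / 1000

# Stand-in for response.usage; the token counts and prices below are made up for illustration
usage = SimpleNamespace(prompt_tokens=120, completion_tokens=480)
print(estimate_cost(usage, prompt_price_per_1k=0.5, completion_price_per_1k=1.5))
# prints: 0.78
```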

## Clean up

If you've finished the exercise, you should delete the resources you have created to avoid incurring unnecessary Azure costs.

1. Return to the browser tab containing the Azure portal (or re-open the [Azure portal](https://portal.azure.com?azure-portal=true) in a new browser tab) and view the contents of the resource group where you deployed the resources used in this exercise.
1. On the toolbar, select **Delete resource group**.
1. Enter the resource group name and confirm that you want to delete it.