# Review Genie - A Conversational Chatbot for an Eco-Tourism Application




## Basics of RAG

Before we start coding, let's go over a few questions to clarify the basics.

### 1. What is RAG?
Imagine you're writing an article about climate change, but instead of relying only on what you remember, you search online for recent studies and data to support your writing. RAG (Retrieval-Augmented Generation) works the same way: it combines a powerful LLM with a search system that retrieves relevant information from an external knowledge source, like a database or documents. This helps generate more accurate and informative responses.

### 2. Why is RAG important?
RAG is important because LLMs, like the ones behind ChatGPT, can sometimes "hallucinate" or provide outdated or incorrect information. By retrieving facts from trusted sources before generating a response, RAG makes the answers more reliable, up to date, and contextually relevant.

### 3. What’s the difference between RAG and a standard LLM chatbot?
A standard chatbot relies only on pre-trained knowledge, which may be limited or outdated. RAG-enhanced chatbots, however, actively retrieve fresh, relevant information from external sources, ensuring better accuracy and up-to-date insights.

### 4. What are the key components of RAG?
RAG consists of two main parts:

- Retriever: Finds the most relevant documents or data from a knowledge base (e.g., a search engine or database).
- Generator: Uses the retrieved information to produce a coherent and accurate response.

### 5. Real-life use cases
- Customer Support: LLM chatbots retrieve knowledge base articles to provide better responses to customer queries.
- Healthcare: Doctors can get AI-assisted summaries of patient records and the latest medical research.
- Legal Services: Lawyers can search through legal documents and case studies to build stronger cases.



Key Objectives - our goals are as follows:

- Understanding the overall business problem
- Identifying the key milestones that we have to close
- Understanding and applying the key tools that we need here
- Building a simple POC app, that will set the groundwork for the subsequent weeks.



**Structure of this Notebook**

- Building ReviewGenie POC
  - Data Preparation
  - Vector Store Setup
  - Building the chatbot
  - Building a simple Gradio UI


### Understanding the Limitation of the LLM

### Interpretation:
Why is the LLM not able to answer the query?



#### Do it yourself:
Try changing the model to `gpt-4o-mini` and observe how the output changes!

### Making the LLM context-aware

Next step: Let's check Uber's [financial report](https://s23.q4cdn.com/407969754/files/doc_events/2024/May/06/2023-annual-report.pdf).

On page 54, the document states:

"Revenue was 37.3 billion, up 17% year-over-year. Mobility revenue increased 5.8 billion primarily attributable to an increase in Mobility Gross Bookings of......"

### Basic RAG app architecture

In the previous example, we manually retrieved the context from the given file, which is impractical for any real application.

Therefore, we have to devise a strategy that enables us to:

- Take the query from the user
- Identify the documents from the external source that might be relevant for the query.
- Pass those documents' information as context to the LLM
- LLM then generates the final response
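
The four steps above can be sketched without any external service. This is a toy illustration, not the pipeline built later in this notebook: `retrieve`, `build_prompt`, and `answer` are hypothetical helper names, the retriever simply counts shared words, the second document is made-up filler, and the "LLM" is a stand-in function that echoes its prompt.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=1):
    """Rank documents by words shared with the query (toy stand-in for a retriever)."""
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, context_docs):
    """Place the retrieved documents in the prompt as context."""
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def answer(query, documents, generate):
    """Take the query, retrieve documents, augment the prompt, and generate."""
    context_docs = retrieve(query, documents)
    return generate(build_prompt(query, context_docs))

docs = [
    "Revenue was 37.3 billion, up 17% year-over-year.",  # quoted from the report above
    "Office hours are 9 am to 5 pm.",                     # made-up filler document
]

# A fake "LLM" that echoes its prompt, so the flow can be inspected offline.
print(answer("What was the revenue?", docs, generate=lambda prompt: prompt))
```

Printing the echoed prompt shows that only the revenue sentence was selected as context. In a real app, `generate` would call an LLM and `retrieve` would use embeddings instead of word overlap.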

To do the above, we can follow a simple standard architecture as shown below (Image source - https://huyenchip.com/2024/07/25/genai-platform.html)

<center><img src="https://huyenchip.com/assets/pics/genai-platform/3-rag.png" width=500 height=400/></center>

As you can see in the above image, the retriever would be the key component of this entire architecture.

To build the retriever, we have to follow these steps:

- Connect to the document source
- Break the documents down into manageable chunks, because passing the entire document source as context can exceed the token limit of the LLM. This process is called **chunking**.
- Perform a search for the most relevant chunks based on the given query.
- Pass those relevant chunks to the LLM.

For performing the search or retrieval process, we will be following an **embedding-based approach.**

<center><img src="https://cdn.prod.website-files.com/640248e1fd70b63c09bd3d09/653fd23f1565c0c1da063efc_Semantic%20Search%20Text%20Embeddings%20(1).png" width =500/></center>

### Understanding the embedding-based approach


In the embedding-based approach, we:

- Convert the document chunks in the database to vector embeddings and store them in a vector store.

- Convert the given user query to an embedding.

- Find the document chunks whose vector embeddings are closest to the query embedding, using a vector search library such as FAISS (Facebook AI Similarity Search).

<center><img src="https://miro.medium.com/v2/resize:fit:1400/1*h_btyitJX79d-gFE8RaMQg.png" width=500/></center>
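
To make "closest" concrete, here is a minimal sketch that ranks chunks by cosine similarity, one common closeness measure. The 3-dimensional vectors and the chunk texts are invented for illustration (a real embedding model returns vectors with hundreds or thousands of dimensions), and `top_k` is a hypothetical helper, not a FAISS function.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made "embeddings" for three chunks (assumed values, not model output).
chunk_embeddings = {
    "busy summer day at LOC_001": [0.9, 0.1, 0.0],
    "quiet winter night at LOC_003": [0.1, 0.9, 0.0],
    "poor air quality reading": [0.0, 0.2, 0.9],
}

# Pretend embedding of the query "when is it crowded?"
query_embedding = [0.8, 0.2, 0.1]

def top_k(query_vec, store, k=2):
    """Return the k chunk texts whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store.items()]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

print(top_k(query_embedding, chunk_embeddings))
# ['busy summer day at LOC_001', 'quiet winter night at LOC_003']
```

This brute-force loop compares the query against every chunk. Libraries such as FAISS perform the same kind of nearest-neighbour search efficiently over large collections.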

### Tools for building the RAG App

Now that we are familiar with the overall architecture, let's list the tools that we'll use for the upcoming demonstration:

- OpenAI LLM (model: `gpt-4o-mini`): This will be our primary model for generating the responses.
- LangChain: LangChain is a framework for orchestrating the different layers of a RAG app. We will use it to build the retriever end-to-end and to connect with other tools for tasks such as
    - Chunking - RecursiveCharacterTextSplitter
    - Embedding Model - OpenAIEmbeddings
    - Vector Search Model - FAISS
- Gradio: This will help in building a simple UI at the end.


*A basic chatbot that can answer customer queries*



<center><img src="https://www.pranathiss.com/static/assets/images/ai-powered-chatBot.webp" width=500/></center>





### Steps:

1. **Data Preparation**:
   - Load and process the dataset using pandas

2. **Vector Store Setup**:
   - Convert the text description of each dataset row into embeddings.
   - Store embeddings in a vector database.

3. **Building the Chatbot**:
   - Use LangChain to create an LLM pipeline.
   - Develop a simple chatbot to answer queries about the tourism dataset.

4. **Creating a UI**:
   - Implement a Gradio-based UI for user interaction.

In [1]:

!pip install --upgrade langchain langchain-core langchain-community langchain-openai langchain-text-splitters faiss-cpu openai tiktoken gradio




In [2]:
# Importing the KaggleHub library to interact with datasets and models available on Kaggle.
import kagglehub

# Importing the CSV module for reading and writing CSV files.
import csv

# Importing pandas for data manipulation and analysis.
import pandas as pd

# Importing numpy for numerical operations and handling arrays efficiently.
import numpy as np

# Importing os to interact with the operating system, such as environment variables and file paths.
import os

# Importing getpass to securely handle user input (e.g., API keys or passwords).
import getpass


### STEP 1: Data Preparation

In [3]:
# Mount Google Drive
from google.colab import drive
drive.mount("/content/gdrive")

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [4]:
# Loading the data
df = pd.read_csv("/content/gdrive/MyDrive/tourism_resource_dataset.csv",index_col=0)

In [5]:
# Viewing the data
df.head(10)

| timestamp | location_id | visitor_count | resource_usage_rate | temperature | air_quality_index | noise_level | season | peak_hour_flag | visitor_satisfaction | sensor_noise_flag | resource_prediction | resource_allocation | t_sne_dim1 | t_sne_dim2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2024-12-01 00:00:00 | LOC_003 | 808 | 0.907638 | 19.368864 | 127 | 51.506727 | summer | 0 | 5.502615 | 1 | 0.857819 | high | -4.576337 | 0.582736 |
| 2024-12-01 01:00:00 | LOC_001 | 948 | 0.974266 | 17.404945 | 37 | 55.901717 | autumn | 0 | 4.736401 | 0 | 0.961133 | high | -28.314085 | 20.02282 |
| 2024-12-01 02:00:00 | LOC_003 | 292 | 0.321912 | 16.366819 | 113 | 68.533024 | winter | 1 | 2.522827 | 0 | 0.306956 | low | 1.329948 | 5.881103 |
| 2024-12-01 03:00:00 | LOC_003 | 592 | 0.811889 | 20.266316 | 52 | 85.301039 | autumn | 1 | 2.687745 | 1 | 0.701945 | medium | -11.921675 | 20.376535 |
| 2024-12-01 04:00:00 | LOC_001 | 89 | 0.936667 | 15.922471 | 145 | 52.258779 | summer | 1 | 1.094965 | 1 | 0.512834 | medium | -6.068825 | -4.793058 |
| 2024-12-01 05:00:00 | LOC_001 | 278 | 0.900152 | 26.343416 | 141 | 70.581182 | spring | 1 | 1.822836 | 0 | 0.589076 | medium | -30.603321 | 2.003588 |
| 2024-12-01 06:00:00 | LOC_003 | 597 | 0.975164 | 20.676569 | 134 | 54.19375 | summer | 1 | 6.011548 | 0 | 0.786082 | medium | -1.928188 | -13.306153 |
| 2024-12-01 07:00:00 | LOC_002 | 969 | 0.832066 | 31.490696 | 47 | 87.103858 | spring | 0 | 9.026858 | 0 | 0.900533 | high | -22.911472 | 6.416325 |
| 2024-12-01 08:00:00 | LOC_003 | 272 | 0.609609 | 16.484817 | 30 | 62.670967 | spring | 0 | 3.584483 | 0 | 0.440805 | low | 15.556006 | 3.516501 |
| 2024-12-01 09:00:00 | LOC_003 | 55 | 0.518366 | 19.52703 | 66 | 35.370598 | autumn | 0 | 3.949725 | 1 | 0.286683 | low | -4.017957 | -26.208872 |


**Constructing the text data**

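This dataset has no single free-text column, so we combine several columns of each row into one readable sentence. To help downstream models understand what each value means, we add a short label before every value. So each row should look like

```
Location {location_id} had {visitor_count} visitors. Resource usage rate: {resource_usage_rate}%. Temperature: {temperature}°C, AQI: {air_quality_index}, Noise level: {noise_level} dB. Season: {season}, Peak hour: {peak_hour_flag}. Visitor satisfaction: {visitor_satisfaction}, Resource prediction: {resource_prediction} units.
```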

### STEP 2: Vector Store Setup

Let's answer a few basic questions about vector stores before we start using one.

### What is a vector store?
A vector store is a specialized database that stores data in the form of numerical vectors, allowing efficient searching and retrieval based on similarity rather than exact matches.

### Why do we need a vector store?
Traditional databases rely on exact keyword matches, which can miss relevant information. A vector store helps find similar content by understanding relationships and meaning in data.

### How does a vector store work?
It converts text, images, or other data into numerical vectors using ML models, then stores these vectors and retrieves similar ones using techniques like cosine similarity.

### How does a vector store improve search results?
It enables searches based on meaning rather than just keywords, providing more relevant results even if the exact terms don't match.

### What are some popular vector store tools?
- FAISS (Facebook AI Similarity Search)
- Pinecone
- Weaviate
- Chroma

### What is an embedding, and how does it relate to a vector store?
An embedding is a numerical representation of data (e.g., text, image) that captures its meaning. These embeddings are stored in a vector store for efficient retrieval.


Our next steps are to:
-  split the row descriptions into chunks
-  convert each chunk to an embedding
-  store the embeddings in a vector store for searching

As discussed earlier, we shall use `LangChain` to perform these steps.

LangChain is a framework that helps developers build applications powered by large language models (LLMs) such as GPT. It provides tools for common tasks such as retrieving relevant information from databases.

In [6]:
# Importing RecursiveCharacterTextSplitter from LangChain for chunking large text into smaller, manageable pieces.
# This helps in optimizing text for processing and retrieval.
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Importing OpenAIEmbeddings from LangChain to generate numerical vector representations (embeddings) of text.
# These embeddings capture the semantic meaning of the text for efficient similarity searches.
from langchain_openai import OpenAIEmbeddings

# Importing FAISS (Facebook AI Similarity Search) from LangChain's community package.
# FAISS is used for storing and retrieving embeddings efficiently by finding similar vectors.
from langchain_community.vectorstores import FAISS

# Importing the OpenAI library to interact with OpenAI's API services.
from openai import OpenAI

import os  # Importing the os module to interact with environment variables
import getpass  # Importing getpass to securely input sensitive information

# Prompting the user to securely enter their OpenAI API key without displaying it on the screen
OPENAI_API_KEY = getpass.getpass("Enter your OpenAI API key: ")

# Setting the OpenAI API key as an environment variable.
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

Enter your OpenAI API key: ··········


In [7]:
# Convert each row into a readable string for embeddings
texts = df.apply(lambda row: (
    f"Location {row['location_id']} had {row['visitor_count']} visitors. "
    f"Resource usage rate: {row['resource_usage_rate']}%. "
    f"Temperature: {row['temperature']}°C, AQI: {row['air_quality_index']}, Noise level: {row['noise_level']} dB. "
    f"Season: {row['season']}, Peak hour: {row['peak_hour_flag']}. "
    f"Visitor satisfaction: {row['visitor_satisfaction']}, Resource prediction: {row['resource_prediction']} units."
), axis=1).tolist()

# Split text into documents
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=250,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False,
)

documents = text_splitter.create_documents(texts)

# Show a sample document
print(documents[0])


page_content='Location LOC_003 had 808 visitors. Resource usage rate: 0.9076380257726048%. Temperature: 19.368863657666143°C, AQI: 127, Noise level: 51.50672732934044 dB. Season: summer, Peak hour: 0. Visitor satisfaction: 5.502614668103616, Resource prediction:'


### Code Explanation:
The above code initializes a `RecursiveCharacterTextSplitter` to break each row description in `texts` into chunks of at most 250 characters, with an overlap of up to 20 characters to preserve context between chunks. The `create_documents` function processes the text list and returns structured document chunks for retrieval. Note that the sample chunk printed above ends at "Resource prediction:": the full row description is longer than 250 characters, so its tail lands in a second chunk.

### Why do we need overlap?
Overlap is needed to ensure continuity and preserve context between chunks, preventing important information from being cut off at chunk boundaries. This helps LLMs better understand the text when processing each chunk independently, improving retrieval accuracy and response quality.
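
The effect of overlap is easiest to see with a simplified sliding-window splitter. This is only a sketch: `chunk_with_overlap` is a hypothetical helper that cuts at fixed character positions, whereas `RecursiveCharacterTextSplitter` tries to split on separators such as newlines and spaces first. The small `chunk_size` and the sample sentence are chosen just to keep the output readable.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into character windows where neighbours share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

text = "Visitor satisfaction: 5.5, Resource prediction: 0.86 units."
chunks = chunk_with_overlap(text, chunk_size=30, chunk_overlap=10)
for chunk in chunks:
    print(repr(chunk))

# The last 10 characters of one chunk reappear at the start of the next one.
print(chunks[0][-10:] == chunks[1][:10])  # True
```

Because neighbouring chunks share their boundary characters, a value that is cut at the end of one chunk still appears together with some of its surrounding text in the next one.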

In [8]:
# Create an embedding model using LangChain.
# One option is using https://python.langchain.com/docs/integrations/text_embedding/openai/
# See https://python.langchain.com/docs/integrations/text_embedding/ for a list of available embedding models on LangChain
embeddings = OpenAIEmbeddings()


In [9]:
# Create a vector store using the created chunks and the embeddings model
vector = FAISS.from_documents(documents, embeddings)

### What have we done so far?
1. Data Preparation: Converted each row of the dataset into a text description
2. Data Chunking: Converted the entire data into multiple manageable chunks
3. Chunks to Embeddings: Converted the broken down chunks into embeddings
4. Storage in a Vector DB: Stored the resulting embeddings of chunks in a vector store for effective retrieval.


### What is remaining?
- Building the chatbot
- Building the Gradio UI

### STEP 3: Building the chatbot

Now that we have converted the documents to embeddings, our next step is to
- build a retriever that uses the vector store to retrieve the documents
- create a prompt template that contains the augmented context using the retrieved documents

In [10]:
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableMap


#### Code Explanation:
- `ChatOpenAI` – Used to access OpenAI models for chatbot functionality.
- `ChatPromptTemplate` – Helps structure queries to ensure better responses.
- `OpenAIEmbeddings` – Converts text into vector form for similarity-based retrieval.
- `RunnableMap` – Runs several functions on the same input and collects their results into a dictionary; we use it to assemble the prompt variables (question, chat history, and retrieved context).
- `StrOutputParser` - For processing the output of language models, ensuring that the output is returned as a plain string

In [11]:
# Initializing the ChatOpenAI model to interact with OpenAI's GPT model.
llm = ChatOpenAI(api_key=os.environ["OPENAI_API_KEY"], model = 'gpt-4o-mini')

In [12]:
# Creating the output parser that converts the model's response into a plain string.
output_parser = StrOutputParser()
# In-memory list of (question, answer) tuples for the current session.
chat_history = []
# Creating a prompt template that instructs the LLM to act as a helpful assistant.
# The prompt takes three parameters:
#   1. {context} - Relevant information retrieved from the vector store.
#   2. {chat_history} - Earlier questions and answers from this session.
#   3. {input} - The user's question.
prompt = ChatPromptTemplate.from_template(
    """You are a helpful assistant. Use the following context and chat history to answer the question.

<context>
{context}
</context>

<chat_history>
{chat_history}
</chat_history>

Question: {input}"""
)

# Create a retriever from the vector store for fetching relevant documents
# See https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/vectorstore/
retriever = vector.as_retriever()

# Building the RAG chain: RunnableMap assembles the prompt variables (the
# retrieved chunks joined into one context string, plus the question and the
# chat history), then the prompt, the LLM, and the output parser run in sequence.

rag_chain = (
    RunnableMap({
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "context": lambda x: "\n".join([doc.page_content for doc in retriever.invoke(x["input"])])
    })
    | prompt
    | llm
    | output_parser
)

def format_chat_history(history):
    return "\n".join([f"User: {q}\nBot: {a}" for q, a in history])


In [13]:
# Invoking the RAG chain to process the user's query.
# The chain first retrieves the most relevant chunks from the vector store,
# then passes them, together with the chat history, to the LLM to generate a response.
user_query = "what location has the top visitors?"
formatted_history = format_chat_history(chat_history)

response = rag_chain.invoke({
    "input": user_query,
    "chat_history": formatted_history
})

chat_history.append((user_query, response))
print(response)



Location LOC_001 has the top number of visitors, with 970 visitors.


In [14]:
# Asking a follow-up question. The chain returns a plain string (via StrOutputParser),
# and the previous question and answer are passed along as chat history.
user_query = "what season has the top visitors?"
formatted_history = format_chat_history(chat_history)

response = rag_chain.invoke({
    "input": user_query,
    "chat_history": formatted_history
})

chat_history.append((user_query, response))
print(response)

The season with the top visitors is summer, as it had 970 visitors at Location LOC_001.
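
To see what the prompt receives in `{chat_history}`, the helper from the cell above can be run on its own. The two turns below are shortened, hypothetical examples rather than the actual responses.

```python
def format_chat_history(history):
    """Same helper as in the notebook: one 'User/Bot' pair per turn."""
    return "\n".join([f"User: {q}\nBot: {a}" for q, a in history])

# chat_history is a list of (question, answer) tuples.
chat_history = []
chat_history.append(("what location has the top visitors?", "LOC_001"))
chat_history.append(("and in which season?", "summer"))

print(format_chat_history(chat_history))
# User: what location has the top visitors?
# Bot: LOC_001
# User: and in which season?
# Bot: summer
```

Note that in `rag_chain` the history is only inserted into the prompt; the retriever is still invoked with the latest question alone, so a vague follow-up can retrieve less relevant chunks.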


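Now we have the answers! Keep in mind that the LLM only sees the handful of chunks returned by the retriever, so answers to aggregate questions such as "top visitors" reflect those retrieved rows rather than the whole dataset. Also, the formatting is not very good, right? Let's create a simple UI for our bot.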

### STEP 4: Building a simple Gradio UI

Gradio is an open-source Python library that makes it easy to build interactive user interfaces for machine learning models, APIs, and data science workflows. It allows developers to create shareable web-based UIs with just a few lines of code.

To build the Gradio app, we'll follow these steps:

- Modularize the entire RAG pipeline into a single function
- Create the building blocks for the UI.
- Connect the UI with the function

In [15]:


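# Define a function that runs the full RAG process step by step.
# It performs the same stages as rag_chain and is shown to make each stage explicit.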
def final_response(user_query):
    formatted_history = format_chat_history(chat_history)

    # Step 1: Retrieve context
    context_docs = retriever.invoke(user_query)
    context_text = "\n".join([doc.page_content for doc in context_docs])

    # Step 2: Format the prompt with all variables
    prompt_input = {
        "input": user_query,
        "context": context_text,
        "chat_history": formatted_history
    }
    formatted_prompt = prompt.invoke(prompt_input)

    # Step 3: Run LLM on formatted prompt
    response = llm.invoke(formatted_prompt)
    parsed_response = output_parser.invoke(response)

    # Step 4: Save to history
    chat_history.append((user_query, parsed_response))

    return parsed_response

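def chatbot_fn(message, history):
    # Gradio passes its own `history`, but we format the global chat_history
    # so the UI continues the conversation started in the cells above.
    formatted_history = format_chat_history(chat_history)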

    # Run RAG pipeline
    response = rag_chain.invoke({
        "input": message,
        "chat_history": formatted_history
    })

    # Save to history
    chat_history.append((message, response))

    return response

In [17]:
import gradio as gr
chatbot_ui = gr.ChatInterface(
    fn=chatbot_fn,
    title="Eco-Tourism RAG Bot",
    description="Ask me anything about eco-tourism and sustainability!",
    theme="soft",
)

chatbot_ui.launch()

It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://6b4a6f94603c892130.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




### Next Steps for Experimentation

In the above demonstration, we created a very basic GUI-based RAG app. Before our next class, you are encouraged to experiment with the following:

- Try asking 10-15 questions and check the accuracy of the response.
- What would be a strategy that you might follow to evaluate the app's overall responses?
- Ask a few out-of-scope questions (like "What is the weather today in Seattle?")
- Try asking the same question multiple times with a different wording
- In the same chat history, ask questions which are related to the response that was just provided.
- Use a different dataset (perhaps in a different format like pdf) and see how the architecture might change.
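
As one possible starting point for the evaluation question above, the sketch below scores a bot by checking whether each answer contains an expected keyword. Everything here is an assumption for illustration: `evaluate` and `fake_bot` are hypothetical names, the expected keywords are placeholders that you should verify against the dataset yourself, and keyword matching is only a crude proxy for answer quality.

```python
def evaluate(answer_fn, test_cases):
    """Return the fraction of questions whose answer contains the expected keyword."""
    hits = 0
    for question, expected_keyword in test_cases:
        answer = answer_fn(question)
        if expected_keyword.lower() in answer.lower():
            hits += 1
    return hits / len(test_cases)

# Placeholder test set: confirm the expected keywords against the data first.
test_cases = [
    ("what location has the top visitors?", "LOC_001"),
    ("what season has the top visitors?", "summer"),
]

# Stand-in for the chatbot so the sketch runs offline. In the notebook, you
# could pass a function that calls rag_chain.invoke(...) instead.
def fake_bot(question):
    return "Location LOC_001 has the top number of visitors."

print(evaluate(fake_bot, test_cases))  # 0.5
```

The stand-in bot gives the same answer to both questions, so it scores one hit out of two. Growing the test set and rerunning it after each change to the chunking or the prompt gives a simple way to track whether the app improves.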

<hr> <hr>