<a href="https://colab.research.google.com/github/aheshmat/MantisAI/blob/main/Workshop_6_Building_RAG_Chatbot_for_FAQs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Workshop 6: Building a RAG Chatbot for FAQ Responses
### InterSystems AI for Software Developers

In this workshop, we will build a Retrieval-Augmented Generation (RAG) system for responding to e-commerce FAQs.
The goal is to create a simple chatbot that can retrieve relevant FAQ responses based on user queries.
By the end of this session, we hope you will have a chatbot deployed on Hugging Face Spaces.

We'll load a [dataset of e-commerce FAQs](https://huggingface.co/datasets/NebulaByte/E-Commerce_FAQs) using the Hugging Face `datasets` library.

Here’s a quick breakdown of relevant fields:
* `question`: Contains the FAQ question text, ideal as input for retrieval.
* `answer`: Contains the answer to each FAQ, which we can use as the target output in our response generation.
* `category`: Could be useful for context if we want to segment responses by topic or apply specific embeddings for different categories.
* `que_ans`: Combination of question and answer, which may serve as a good retrieval field, especially if we want to capture both question structure and response context.

### Suggested Pipeline Adaptation
1. **Document Store**: Store each `que_ans` field as a document in the vector database. This allows the RAG pipeline to retrieve the most contextually relevant question-answer pairs.
2. **Retrieval**: Use the user's question as the query to retrieve the most similar FAQs from the store, so the best match is ranked first.
3. **Generation**: Generate responses or refine the retrieved answers based on context from the query.
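
As a rough illustration of these three stages, the sketch below uses plain Python with no vector database. The two FAQ entries are made up for illustration, token overlap (Jaccard similarity) stands in for embedding similarity, and `generate` simply returns the best retrieved text instead of calling a language model.

```python
import re

# Hypothetical FAQ rows mimicking the dataset's `question`/`answer` fields
faqs = [
    {"question": "How do I return an item?",
     "answer": "You can return items within 30 days."},
    {"question": "How long does shipping take?",
     "answer": "Shipping usually takes 3 to 5 business days."},
]

def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

# 1. Document store: one combined question-answer string per FAQ
store = [f"{row['question']} {row['answer']}" for row in faqs]

def retrieve(query, top_k=1):
    """2. Retrieval: rank stored documents by Jaccard overlap with the query."""
    q = tokenize(query)

    def score(doc):
        d = tokenize(doc)
        return len(q & d) / len(q | d)

    return sorted(store, key=score, reverse=True)[:top_k]

def generate(query, context):
    """3. Generation: placeholder that returns the best retrieved document."""
    return context[0]

query = "How long will shipping take?"
print(generate(query, retrieve(query)))
```

In the workshop itself, the list becomes a Chroma document store, the overlap score becomes embedding similarity, and the placeholder becomes a text generator.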

In [None]:
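# Install necessary libraries (package names assume Haystack 2.x and its Chroma integration)
%pip install -q haystack-ai chroma-haystack sentence-transformers "transformers[torch]" datasets gradio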



## Part 1: Data Preparation

### Steps:
- Load the FAQ dataset.
- Preprocess it for use in a RAG system.

### Expected Output:
- `documents`: A list of `Document` objects containing FAQs.


In [None]:
from datasets import load_dataset

# Load the FAQ dataset

ds = load_dataset("NebulaByte/E-Commerce_FAQs")

# Preprocess if necessary


### Preparing Data for RAG
Here we’ll structure the data for use in RAG, creating a list of `Document` objects. Each document combines `question` and `answer` for better retrieval context.


In [None]:
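from haystack import Document

# Convert each FAQ to a Document object for RAG
# (assumes the dataset exposes a single "train" split)
documents = [
    Document(content=row["que_ans"], meta={"category": row["category"]})
    for row in ds["train"]
]

print(f"Total documents prepared: {len(documents)}")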

## Part 2: Document Store and Embedding Setup
We will store our FAQ documents in a vector database and set up embeddings for efficient retrieval.

### Steps:
- Initialize a Chroma document store.
- Populate the store with embedded FAQ documents.

In [None]:
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

# Initialize document store


# Add documents to the document store

print(f"Total documents in the store: {document_store.count_documents()}")

### Embedding the Documents
We’ll now create an embedding pipeline to add vector embeddings for each document in the store.
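
To make the idea of vector embeddings concrete, the sketch below compares hand-written three-dimensional vectors with cosine similarity, a measure commonly used to compare sentence embeddings. The vectors and labels are invented for illustration; a real model such as `all-MiniLM-L6-v2` produces 384-dimensional vectors.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings; real ones come from the embedding model
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "returns FAQ": [0.8, 0.2, 0.1],
    "shipping FAQ": [0.1, 0.9, 0.3],
}

# Rank documents from most to least similar to the query
ranked = sorted(
    doc_vecs,
    key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
    reverse=True,
)
print(ranked)
```

Documents whose vectors point in nearly the same direction as the query vector rank first, which is what the retriever relies on later.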


In [None]:
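# Initialize the embedding model (Haystack 2.x takes the model name via `model`)
embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
embedder.warm_up()  # load the model weights before the first run

# Haystack 2.x document stores have no `update_embeddings` method:
# embed the documents first, then write the embedded copies to the store.
# Assumption: these documents were not already written to the store without embeddings.
embedded_documents = embedder.run(documents=documents)["documents"]
document_store.write_documents(embedded_documents)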

## Part 3: Build a Basic RAG Pipeline
We will create a RAG pipeline that combines retrieval and generation to answer user queries.

### Steps:
- Set up retrieval using the Chroma document store.
- Create a generator component to rephrase retrieved answers.
- Assemble a RAG pipeline that uses retrieval for context.
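
How retrieved context feeds the generator can be sketched without any framework. The helper below, `build_prompt`, is a hypothetical name and its prompt wording is only one possible choice. It places the retrieved FAQ texts ahead of the user's question so the generator can rephrase an answer grounded in them.

```python
def build_prompt(query, retrieved_docs):
    """Combine retrieved FAQ texts and the user query into one prompt string."""
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        "Answer the customer's question using only the FAQ entries below.\n"
        f"FAQ entries:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )

# Illustrative retrieved text; in the workshop this comes from the retriever
docs = ["How do I return an item? Use the returns page in your account."]
prompt = build_prompt("What is the return policy?", docs)
print(prompt)
```

Ending the prompt with `Answer:` nudges a text-generation model to continue with the reply rather than restating the question.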


In [None]:
from haystack import Pipeline
from haystack.components.generators import HuggingFaceLocalGenerator
from haystack_integrations.components.retrievers.chroma import ChromaEmbeddingRetriever

# Initialize retriever and generator


# Build RAG pipeline


### Testing the RAG Pipeline
Now, let’s test the RAG pipeline by asking it a question from the FAQ dataset.


In [None]:
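# Sample query
query = "What is the return policy?"
# Assumes the pipeline from the previous cell is named `rag_pipeline` and registers
# components called "text_embedder", "prompt_builder", and "generator"
prediction = rag_pipeline.run(
    {"text_embedder": {"text": query}, "prompt_builder": {"query": query}}
)
print("Response:", prediction["generator"]["replies"][0])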

## Part 4: Deploying as a Chatbot on Hugging Face Spaces
Finally, we’ll deploy this RAG pipeline as an interactive chatbot on Hugging Face Spaces, so you can test it with real questions.

### Steps:
- Define a Gradio interface for the chatbot (search for `gradio chatbot` to find the documentation).
- Deploy it to Hugging Face Spaces.
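
Gradio's `gr.ChatInterface` calls a function with the new message and the chat history, then displays the string it returns. The sketch below shows only that function, so it runs without Gradio installed. `answer_query` is a hypothetical stand-in for running the RAG pipeline, and the fallback message is just an example.

```python
def answer_query(query):
    """Hypothetical stand-in for the RAG pipeline call."""
    return f"(pipeline answer for: {query})"

def respond(message, history):
    """Chat callback: takes the user's message and prior turns, returns the reply text."""
    if not message.strip():
        return "Please type a question about your order, shipping, or returns."
    return answer_query(message)

print(respond("Do you offer next-day delivery?", []))
# With Gradio installed, the same function could be passed as gr.ChatInterface(fn=respond)
```

Keeping the callback free of Gradio-specific code makes it easy to test the chatbot logic before wiring up the interface.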

In [None]:
import gradio as gr

# Define chatbot function


# Create Gradio interface


# Launch interface


## Potential questions to test

**Shipping and Delivery**:
* "Hey! How long will my order take to get here?"
* "Do you guys do next-day delivery?"
* "Can I get updates on where my package is right now?"

**Returns and Refunds**:
* "Hi, what's the deal with returns if I don’t like something?"
* "How do I go about getting a refund?"
* "Is there a fee if I want to send something back?"

**Account and Orders**:
* "Hey, I forgot my password. Can you help me reset it?"
* "I just placed an order—any chance I can cancel it?"
* "How do I check what I’ve ordered in the past?"

**Product Information**:
* "Are your products eco-friendly by any chance?"
* "Do I get a discount if I buy a bunch at once?"
* "Got any tips on choosing the right size?"

**Payment and Security**:
* "What types of payment do you guys take?"
* "Is my info safe when I pay here?"
* "Do you offer payment plans, like monthly installments?"

**General Inquiries**:
* "How can I get in touch with someone from your team?"
* "Do you have gift wrapping options?"
* "What should I do if something arrives damaged?"


## Conclusion and Next Steps
You’ve successfully created and deployed a chatbot that answers e-commerce FAQs using RAG.

Consider these additional steps:
- Improve retrieval accuracy with additional data.
- Experiment with different embedding models.
- Add advanced features, like re-ranking retrieved answers.