# Workshop 6: Building a RAG Chatbot for FAQ Responses
### InterSystems AI for Software Developers

In this workshop, we will build a Retrieval-Augmented Generation (RAG) system for responding to e-commerce FAQs.
The goal is to create a simple chatbot that can retrieve relevant FAQ responses based on user queries.
By the end of this session, we hope you will have a chatbot deployed on Hugging Face Spaces.

We'll load a [dataset of e-commerce FAQs](https://huggingface.co/datasets/NebulaByte/E-Commerce_FAQs) using the Hugging Face `datasets` library.

Here’s a quick breakdown of relevant fields:
* `question`: Contains the FAQ question text, ideal as input for retrieval.
* `answer`: Contains the answer to each FAQ, which we can use as the target output in our response generation.
* `category`: Could be useful for context if we want to segment responses by topic or apply specific embeddings for different categories.
* `que_ans`: Combination of question and answer, which may serve as a good retrieval field, especially if we want to capture both question structure and response context.

### Suggested Pipeline Adaptation
1. **Document Store**: Store each `que_ans` field as a document in the vector database. This allows the RAG pipeline to retrieve the most contextually relevant question-answer pairs.
2. **Retrieval**: Use `question` as input for retrieving similar FAQs, which will help refine the search to show the best match.
3. **Generation**: Generate responses or refine the retrieved answers based on context from the query.
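
The three stages above can be illustrated with a toy, standard-library-only sketch. It is not the Haystack pipeline built later in this workshop: the FAQ strings are made up, the `Question: ... Answer: ...` layout is an assumption about how a combined field might look, and word overlap stands in for embedding similarity.

```python
import re
from typing import List, Tuple

def tokenize(text: str) -> set:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: set, b: set) -> float:
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# 1. Document store: one combined question-answer string per FAQ (made-up examples).
faq_store: List[str] = [
    "Question: How do I reset my password? Answer: Use the 'Forgot password' link on the login page.",
    "Question: How long does delivery take? Answer: Most orders arrive within 3 to 5 business days.",
    "Question: Can I return an item? Answer: Items can be returned within 30 days of delivery.",
]

def retrieve(query: str, store: List[str], top_k: int = 1) -> List[Tuple[float, str]]:
    """2. Retrieval: rank stored documents by word overlap with the query."""
    q = tokenize(query)
    scored = [(jaccard(q, tokenize(doc)), doc) for doc in store]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]

def generate(hits: List[Tuple[float, str]]) -> str:
    """3. Generation stand-in: return the answer part of the best hit."""
    best_doc = hits[0][1]
    return best_doc.split("Answer:", 1)[1].strip()

query = "How long will delivery of my order take?"
hits = retrieve(query, faq_store)
print(generate(hits))
```

In the real pipeline, the overlap score is replaced by embedding similarity and the `generate` stand-in by a language model that sees the retrieved text in its prompt.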

In [None]:
# Install necessary libraries
!pip install haystack-ai sentence-transformers datasets chroma-haystack

## Part 1: Data Preparation

### Steps:
- Load the FAQ dataset.
- Preprocess it for use in a RAG system.

### Expected Output:
- `documents`: A list of `Document` objects containing FAQs.


### Preparing Data for RAG
Here we’ll structure the data for use in RAG, creating a list of `Document` objects. Each document combines `question` and `answer` for better retrieval context.


In [None]:
import random
from typing import List, Dict, Any
from datasets import load_dataset
from haystack import Document

# Load the FAQ dataset
dataset = load_dataset("NebulaByte/E-Commerce_FAQs", split='train')
dataset = dataset.select(range(50))
documents = [Document(content=doc['que_ans']) for doc in dataset]
questions = [doc['question'] for doc in dataset]
answers = [doc['answer'] for doc in dataset]

print(f"Total documents prepared: {len(documents)}")



Total documents prepared: 50


Alternatively, the same steps can be wrapped in a reusable function:

In [3]:
import random
from typing import List, Dict, Any, Tuple
from datasets import load_dataset
from haystack import Document

# Load and preprocess the FAQ dataset
def load_and_preprocess(dataset_name: str) -> Tuple[List[Document], List[str], List[str]]:
    """Load the FAQ dataset and prepare it for the RAG system."""
    dataset = load_dataset(dataset_name, split="train")
    dataset = dataset.select(range(50))
    documents = [Document(content=doc['que_ans']) for doc in dataset]
    questions = [doc['question'] for doc in dataset]
    answers = [doc['answer'] for doc in dataset]

    # Sample 25 examples
    # questions, answers, documents = zip(*random.sample(list(zip(questions, answers, documents)), 25))

    return documents, questions, answers

documents, questions, answers = load_and_preprocess("NebulaByte/E-Commerce_FAQs")
print(f"Total documents prepared: {len(documents)}")

Total documents prepared: 50


## Part 2: Document Store and Embedding Setup
We will store our FAQ documents in a vector database and set up embeddings for efficient retrieval.

### Steps:
- Initialize a Chroma document store.
- Populate the store with embedded FAQ documents.
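
To make "embedding-based retrieval" concrete, here is a small standard-library sketch of how unit-length vectors are compared. The 3-dimensional vectors and the names `normalize` and `dot` are illustrative assumptions; real embeddings from `all-mpnet-base-v2` have 768 dimensions. With unit-length vectors the dot product equals cosine similarity, which is why the embedder in the next cell is configured with `normalize_embeddings=True`. The sketch also shows why query and document embeddings must come from models with the same output dimension.

```python
import math
from typing import Dict, List, Sequence

def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; for unit-length inputs this equals cosine similarity."""
    if len(a) != len(b):
        # Happens when documents and queries are embedded with different models.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))

# Toy "document embeddings" (hypothetical values, 3 dimensions for readability).
doc_vectors: Dict[str, List[float]] = {
    "delivery": normalize([0.9, 0.1, 0.0]),
    "returns": normalize([0.1, 0.9, 0.1]),
}
query_vector = normalize([0.8, 0.2, 0.1])

# Score every document against the query and pick the closest one.
scores = {name: dot(query_vector, vec) for name, vec in doc_vectors.items()}
best = max(scores, key=scores.get)
print(best)
```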

In [4]:
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

# Added
from haystack.document_stores.types import DuplicatePolicy

# Instantiate the ChromaDocumentStore
document_store_ = ChromaDocumentStore(
    collection_name="my_collection",
    # embedding_function="sentence_transformers",
    persist_path="./chroma_store",
)

# Instantiate the Embedder
embedder_ = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-mpnet-base-v2",
    # Avoid "sentence-transformers/all-MiniLM-L6-v2" here: it produces 384-dimensional embeddings,
    # while the default query embedder model ("all-mpnet-base-v2") produces 768-dimensional ones.
    # model="sentence-transformers/all-MiniLM-L6-v2",
    batch_size=16,
    normalize_embeddings=True,
    progress_bar=True
)

# Load embedding model
embedder_.warm_up()

def embed_documents(documents):
    for doc in documents:
        if doc.embedding is None:
            embedded_docs = embedder_.run([doc])
            doc.embedding = embedded_docs['documents'][0].embedding
    return documents

documents_ = embed_documents(documents)

num_docs_written = document_store_.write_documents(documents_)

print(f"\nSuccessfully wrote {num_docs_written} documents to the Chroma document store.")

result = document_store_._collection.get(include=["documents", "embeddings"])
# print(f"IDs{result['ids']}")
print(f"\nEmbedding vector size:\n{len(result['embeddings'][0])}\n")
print(f"Embedding vectors:\n{result['embeddings']}")


Batches:   0%|          | 0/1 [00:00<?, ?it/s]


Add of existing embedding ID: 4429840492d1f422bd962143e22a3f2012ba06412e706d10916e33290416163e
Add of existing embedding ID: 153da4e5ec2a074e4fcf47da8d7636ae93f3f5e4758ac0ae3ed239df343c811b
Add of existing embedding ID: ec5305bb64a080ef53de78daeb1d22426c7e666069c1c0bc712eef9df02b7a33
Add of existing embedding ID: 92f6259a15806601882eb0d1921ade93ab323f998fb0ccf5592d81ea996f4dbb
Add of existing embedding ID: a8f92f52253b61581ce092ff0a48772ca34e3253c937d56b4605b19990eb51a0
Add of existing embedding ID: e1b544740b1ce2cf31f6bb205aa9a0b979705baeaa11bb00ec76c9c9cf0ecace
Add of existing embedding ID: 2a529cbcf42598a62a2df58e968d016c5d63ede8e0cfa2ce105cfff4ab97879b
Add of existing embedding ID: 33fc3bd460ce8a7fcbcef3ccd51db086608f3b82d36b89e2d59263fb1a5ac6d6
Add of existing embedding ID: 6dee948c0d9ede0d48f07e45298816178ae459947e405d6304caac8de0d8f978
Add of existing embedding ID: 532c7211fe06a12473facb0717dfc48b95429fcaa1883d81b59627a9466fe7d7
Add of existing embedding ID: e403064f92424cfb227d


Successfully wrote 50 documents to the Chroma document store.

Embedding vector size:
768

Embedding vectors:
[[ 2.63867360e-02 -9.44992620e-03 -2.69568488e-02 ...  2.41002049e-02
  -2.41391244e-03  9.41916090e-03]
 [ 3.82876843e-02  1.24860005e-02 -2.19820794e-02 ... -3.46753858e-02
  -8.12453145e-05  8.41767341e-03]
 [ 8.50318000e-03  4.70687263e-02 -2.63809357e-02 ...  1.25584826e-02
   5.48392497e-02 -8.00929498e-03]
 ...
 [ 7.38279298e-02 -2.30819527e-02 -1.38115948e-02 ... -3.40640661e-03
  -2.37715580e-02 -2.56267581e-02]
 [ 5.23505658e-02 -2.37718653e-02 -1.19062653e-02 ...  2.86244359e-02
   1.00457929e-02 -2.81636957e-02]
 [ 1.09630890e-01 -1.72770862e-02 -1.24559943e-02 ...  1.30443135e-02
  -5.17518893e-02 -4.05901968e-02]]


Alternatively, embed and write the documents with a Haystack indexing pipeline:


In [5]:
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder

# Added
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy
from haystack import Pipeline

def setup_document_store(documents: List[Document]):
    """ Initialize and populate ChromaDB document store """
    # Initialize document store
    document_store = ChromaDocumentStore()

    # Create indexing pipeline
    indexing = Pipeline()
    # Add embedder to generate document embeddings
    indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
    # Add document writer to store documents and embeddings in the document store
    indexing.add_component("writer", DocumentWriter(document_store, policy=DuplicatePolicy.OVERWRITE))
    # Connect embedder output to writer input
    indexing.connect("embedder.documents", "writer.documents")
    # Run indexing pipeline to embed and store documents
    indexing.run({"embedder": {"documents": list(documents)}})
    return document_store

# Initialize document store
document_store = setup_document_store(documents)

# Check the embedding vectors
result = document_store._collection.get(include=["documents", "embeddings"])
# print(f"IDs{result['ids']}")
print(f"\nEmbedding vector size:\n{len(result['embeddings'][0])}\n")
print(f"Embedding vectors:\n{result['embeddings']}")

Batches:   0%|          | 0/2 [00:00<?, ?it/s]


Embedding vector size:
768

Embedding vectors:
[[ 2.63867248e-02 -9.44987591e-03 -2.69568302e-02 ...  2.41001826e-02
  -2.41393945e-03  9.41917486e-03]
 [ 3.82877253e-02  1.24859819e-02 -2.19820403e-02 ... -3.46753784e-02
  -8.12341532e-05  8.41767900e-03]
 [ 8.50318745e-03  4.70686704e-02 -2.63809282e-02 ...  1.25584723e-02
   5.48393466e-02 -8.00927915e-03]
 ...
 [ 7.38279894e-02 -2.30819583e-02 -1.38115929e-02 ... -3.40642035e-03
  -2.37715840e-02 -2.56267302e-02]
 [ 5.23505583e-02 -2.37718429e-02 -1.19062616e-02 ...  2.86244228e-02
   1.00458078e-02 -2.81636752e-02]
 [ 1.09630935e-01 -1.72771998e-02 -1.24559877e-02 ...  1.30443461e-02
  -5.17520607e-02 -4.05901819e-02]]


### Embedding the Documents
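The documents were already embedded when they were written to the store above, so this step is kept only for reference. The cell below is commented out; it relies on `update_embeddings`, a Haystack 1.x call that Haystack 2.x document stores do not appear to provide.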


In [6]:
# # Initialize the embedding model
# embedder = SentenceTransformersDocumentEmbedder(model='all-MiniLM-L6-v2')
# # embedder = SentenceTransformersDocumentEmbedder()

# # Embed documents in the document store
# document_store.update_embeddings(embedder)

## Part 3: Build a Basic RAG Pipeline
We will create a RAG pipeline that combines retrieval and generation to answer user queries.

### Steps:
- Set up retrieval using the Chroma document store.
- Create a generator component to rephrase retrieved answers.
- Assemble a RAG pipeline that uses retrieval for context.
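
The last step comes down to pasting the retrieved document texts and the user's question into one prompt string for the generator. Below is a minimal plain-Python sketch of that assembly, without Haystack or Jinja2. The function name `build_prompt`, the instruction wording, and the sample texts are all hypothetical; the notebook itself uses a `PromptBuilder` template for this.

```python
from typing import List

def build_prompt(question: str, documents: List[str]) -> str:
    """Join retrieved document texts and the user question into one prompt."""
    # One indented line per retrieved document, in retrieval order.
    doc_block = "\n".join(f"    {text}" for text in documents)
    return (
        "Answer the question using only the information in these documents.\n"
        "Documents:\n"
        f"{doc_block}\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Made-up retrieval results standing in for the retriever's output.
retrieved = [
    "Question: How long does delivery take? Answer: Most orders arrive within 3 to 5 business days.",
    "Question: Can I track my order? Answer: Yes, use the tracking link in the confirmation email.",
]

prompt = build_prompt("Do you offer next-day delivery?", retrieved)
print(prompt)
```

Ending the prompt with `Answer:` nudges the model to continue with the answer itself rather than restating the question.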


In [None]:
from haystack import Pipeline
from haystack.components.generators import HuggingFaceLocalGenerator
from haystack_integrations.components.retrievers.chroma import ChromaEmbeddingRetriever
# Added
from haystack.components.builders import AnswerBuilder # For structuring the answer
from haystack.components.builders.prompt_builder import PromptBuilder  # For constructing prompts
# OpenAI
from haystack.components.generators import OpenAIGenerator  # For using OpenAI's language model
from haystack.utils import Secret  # For securely storing the OpenAI API key

# Initialize retriever and generator
retriever = ChromaEmbeddingRetriever(document_store=document_store)

# # Initialize a local Hugging Face generator
# generator = HuggingFaceLocalGenerator(
#     model="google/flan-t5-large", # Specify the pre-trained model to use
#     task="text2text-generation", # Define the task type for the model
#     generation_kwargs={ # Set parameters for text generation
#         "max_new_tokens": 1024, # Maximum number of tokens to generate
#         # "temperature": 0.9, # Controls the randomness of the generated text
#     },
# )

# OpenAI: the key is read from the OPENAI_API_KEY environment variable rather than hardcoded
generator = OpenAIGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY"))

# Define the prompt template using Jinja2 syntax
prompt_template = """
Find the question in these documents and answer it following the information given in that document.
Documents:
{% for doc in documents %}
    {{ doc.content }}
{% endfor %}
Question: {{question}}
Answer:
"""

prompt_builder = PromptBuilder(template=prompt_template)
# answer_builder = AnswerBuilder()
# The query embedder must use the same model as the document embedder (default: all-mpnet-base-v2)
query_embedder = SentenceTransformersTextEmbedder()

### Building Pipeline

In [13]:
# Build RAG pipeline
RAG = Pipeline()
# Add components
RAG.add_component("query_embedder", query_embedder)
RAG.add_component('retriever', retriever)
RAG.add_component("prompt_builder", prompt_builder)
RAG.add_component('llm', generator)
# RAG.add_component('answer_builder', answer_builder)

# Connect components
RAG.connect("query_embedder", "retriever.query_embedding")
# Connect the retriever's output to the prompt builder's 'documents' input
RAG.connect("retriever", "prompt_builder.documents")
# Connect the prompt builder's output to the new 'llm' component
RAG.connect("prompt_builder", "llm")

# Optional: add an answer builder
# # Connect the llm's replies to the answer builder's 'replies' input
# RAG.connect("llm.replies", "answer_builder.replies")
# # Connect the retriever's output to the answer builder's 'documents' input
# RAG.connect("retriever", "answer_builder.documents")

<haystack.core.pipeline.pipeline.Pipeline object at 0x31e61e560>
🚅 Components
  - query_embedder: SentenceTransformersTextEmbedder
  - retriever: ChromaEmbeddingRetriever
  - prompt_builder: PromptBuilder
  - llm: OpenAIGenerator
🛤️ Connections
  - query_embedder.embedding -> retriever.query_embedding (List[float])
  - retriever.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> llm.prompt (str)

### Testing the RAG Pipeline
Now, let’s test the RAG pipeline by asking it a question that the FAQ data should cover.


In [14]:
# Sample query
query = "Do you guys do next-day delivery?"
# query = "I missed the delivery of my order today. What should I do?"
# query = "What is the return policy?"
# query = "Hey! How long will my order take to get here?"
# prediction = "The courier service delivering your order usually tries to deliver on the next business day in case you miss a delivery.\nYou can check your SMS for more details on when the courier service will try to deliver again.\n"

# Run the pipeline with embedder and retriever in one execution
results = RAG.run(
    {
        "query_embedder": {"text": query}, #send query to embedder
        "prompt_builder": {"question": query}
        # "answer_builder": {"query": query}
    }
)

print(results['llm']['replies'][0])

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

No, next-day delivery is not specifically mentioned. However, couriers attempt to re-deliver the order on the next working day if it was missed. The estimated delivery time is based on the product page and sellers typically ship orders 1-2 business days before the promised delivery date.


## Part 4: Deploying as a Chatbot on Hugging Face Spaces
Finally, we’ll deploy this RAG model as an interactive chatbot on Hugging Face Spaces. This allows you to test it with real questions.

### Steps:
- Define a Gradio interface for the chatbot (search for `gradio chatbot` to find the documentation).
- Deploy it to Hugging Face Spaces.
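
One possible shape for the first step is sketched below, as a starting point rather than the reference solution. It assumes the pipeline exposes `run()` with the same inputs and outputs as the `RAG` pipeline above. `make_chat_fn`, `build_demo`, and `EchoPipeline` are hypothetical names; `EchoPipeline` is a stub so the sketch runs without models or API keys. `gr.ChatInterface` expects a function that takes the new message and the chat history and returns the reply.

```python
from typing import Any, Callable, Dict, List

def make_chat_fn(pipeline: Any) -> Callable[[str, List], str]:
    """Wrap an object with a run(dict) method into a Gradio-style chat function."""
    def respond(message: str, history: List) -> str:
        # Same inputs as the test cell above: the query goes to the embedder
        # and to the prompt builder. The chat history is ignored in this sketch.
        results = pipeline.run(
            {
                "query_embedder": {"text": message},
                "prompt_builder": {"question": message},
            }
        )
        return results["llm"]["replies"][0]
    return respond

class EchoPipeline:
    """Stub standing in for the RAG pipeline (hypothetical, for local testing)."""
    def run(self, data: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, List[str]]]:
        question = data["prompt_builder"]["question"]
        return {"llm": {"replies": [f"(stub answer to: {question})"]}}

def build_demo(pipeline: Any):
    """Create the chat UI; gradio is imported lazily so the rest runs without it."""
    import gradio as gr
    return gr.ChatInterface(fn=make_chat_fn(pipeline), title="E-commerce FAQ chatbot")

chat = make_chat_fn(EchoPipeline())
print(chat("Do you offer next-day delivery?", []))

# In the notebook, pass the real pipeline and start the app:
# build_demo(RAG).launch()
```

Keeping the chat function separate from the UI makes it easy to check the wiring with a stub before loading the embedding model or calling the LLM.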

In [None]:
import gradio as gr

# Define chatbot function


# Create Gradio interface


# Launch interface


## Potential questions to test

**Shipping and Delivery**:
* "Hey! How long will my order take to get here?"
* "Do you guys do next-day delivery?"
* "Can I get updates on where my package is right now?"

**Returns and Refunds**:
* "Hi, what's the deal with returns if I don’t like something?"
* "How do I go about getting a refund?"
* "Is there a fee if I want to send something back?"

**Account and Orders**:
* "Hey, I forgot my password. Can you help me reset it?"
* "I just placed an order—any chance I can cancel it?"
* "How do I check what I’ve ordered in the past?"

**Product Information**:
* "Are your products eco-friendly by any chance?"
* "Do I get a discount if I buy a bunch at once?"
* "Got any tips on choosing the right size?"

**Payment and Security**:
* "What types of payment do you guys take?"
* "Is my info safe when I pay here?"
* "Do you offer payment plans, like monthly installments?"

**General Inquiries**:
* "How can I get in touch with someone from your team?"
* "Do you have gift wrapping options?"
* "What should I do if something arrives damaged?"


## Conclusion and Next Steps
You’ve successfully created and deployed a chatbot that answers e-commerce FAQs using RAG.

Consider these additional steps:
- Improve retrieval accuracy with additional data.
- Experiment with different embedding models.
- Add advanced features, like re-ranking retrieved answers.
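
As a rough illustration of the re-ranking idea, the sketch below reorders already-retrieved candidates with a second scoring pass. Counting shared words is only a stand-in for a learned re-ranking model, and the function names and FAQ strings are made up for the example.

```python
import re
from typing import List

def tokens(text: str) -> set:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rerank(query: str, candidates: List[str], top_k: int = 2) -> List[str]:
    """Reorder retrieved candidates by how many query words each one contains."""
    q = tokens(query)
    ranked = sorted(candidates, key=lambda doc: len(q & tokens(doc)), reverse=True)
    return ranked[:top_k]

# Made-up first-stage retrieval results, in their original order.
retrieved = [
    "Question: Can I change my delivery address? Answer: Yes, before the order ships.",
    "Question: Is there a fee for returns? Answer: Returns are free within 30 days.",
    "Question: How do I track my order? Answer: Use the tracking link in your email.",
]

top = rerank("Is there a fee to return an item?", retrieved, top_k=1)
print(top[0])
```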