# Customer-Support Chatbot for an E-Commerce Store

## Roadmap  
We will build a RAG-based chatbot in **six** steps:

1. **Environment setup**
2. **Data preparation**  
   a. Load source documents  
   b. Chunk the text  
3. **Build a retriever**  
   a. Generate embeddings  
   b. Build a FAISS vector index  
4. **Build a generation engine**. Load the *Gemma3-1B* model through Ollama and run a sanity check.  
5. **Build a RAG**. Connect the system prompt, retriever, and LLM together. 
6. **Streamlit UI**. Wrap everything in a simple web app so users can chat with the bot.


## 1 - Environment setup

We use conda to manage our project dependencies and ensure everyone has a consistent setup. Conda is an open-source package and environment manager that makes it easy to install libraries and switch between isolated environments.

Let's import the required libraries and print a confirmation message if no packages are missing.

In [None]:
# Import standard libraries for file handling and text processing
import os, pathlib, textwrap, glob

# Load documents from various sources (URLs, text files, PDFs)
from langchain_community.document_loaders import UnstructuredURLLoader, TextLoader, PyPDFLoader

# Split long texts into smaller, manageable chunks for embedding
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Vector store to store and retrieve embeddings efficiently using FAISS
from langchain.vectorstores import FAISS

# Generate text embeddings using OpenAI or Hugging Face models
from langchain.embeddings import OpenAIEmbeddings, HuggingFaceEmbeddings, SentenceTransformerEmbeddings

# Use local LLMs (e.g., via Ollama) for response generation
from langchain.llms import Ollama

# Build a retrieval chain that combines a retriever, a prompt, and an LLM
from langchain.chains import ConversationalRetrievalChain

# Create prompts for the RAG system
from langchain.prompts import PromptTemplate

print("✅ Libraries imported! You're good to go!")
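
The import cell stops at the first missing package. As a complementary check, the sketch below reports every missing module in one pass. The module names in `REQUIRED` are assumptions inferred from the imports in this notebook, so adjust them to match your environment.

```python
import importlib.util

# Top-level module names this notebook is assumed to need (adjust as required).
REQUIRED = ["langchain", "langchain_community", "faiss", "sentence_transformers", "streamlit"]

def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

missing = find_missing(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If anything is reported as missing, install it into the active conda environment before continuing.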

## 2 - Data preparation
The goal of this step is to turn all reference documents into small chunks of text that a retriever can index and search. These documents typically come from:
* PDF files: local documents such as policies, user manuals, or guides.
* Web pages (HTML): online documentation, blog posts, or help articles.

In this step, we perform two actions:
* **Ingesting**: load every PDF and collect the raw text in a list named `raw_docs`.
* **Chunking**: split each document into small, overlapping chunks so later steps can match a user query to the most relevant passage.

In [None]:
pdf_paths = glob.glob("data/Everstorm_*.pdf")
raw_docs = []

for pdf_path in pdf_paths:
    raw_docs.extend(PyPDFLoader(pdf_path).load())

print(f"Loaded {len(raw_docs)} PDF pages from {len(pdf_paths)} files.")

### 2.1 - Load web pages
You can also pull content straight from the web. Various libraries support reading and parsing web pages directly into text, which is useful for building custom knowledge bases. One example is **UnstructuredURLLoader** from LangChain, which can extract readable content from raw HTML pages and return them in a structured format. To learn more, see: https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.url.UnstructuredURLLoader.html

To practice, load each HTML page below and append the results to the `raw_docs` list. We've included a few sample URLs, but you can replace them with any links you prefer.

In [None]:
URLS = [
    # --- BigCommerce: shipping & refunds ---
    "https://developer.bigcommerce.com/docs/store-operations/shipping",
    "https://developer.bigcommerce.com/docs/store-operations/orders/refunds",
    # --- Stripe: disputes & chargebacks ---
    # "https://docs.stripe.com/disputes",
    # --- WooCommerce: REST API reference ---
    # "https://woocommerce.github.io/woocommerce-rest-api-docs/v3.html",
]

try:
    loader = UnstructuredURLLoader(urls=URLS)
    web_docs = loader.load()
    raw_docs.extend(web_docs)
    # Report only the web documents, not the PDF pages loaded earlier
    print(f"Fetched {len(web_docs)} documents from the web.")
except Exception as e:
    # Fall back to the local PDFs only if the web fetch fails
    print("⚠️  Web fetch failed, using offline copies:", e)
    raw_docs = []
    for pdf_path in pdf_paths:
        raw_docs.extend(PyPDFLoader(pdf_path).load())
    print(f"Loaded {len(raw_docs)} offline documents.")

### 2.2 - Chunk the text

Long documents won't work well directly with most LLMs. They can easily exceed the model's context window, making it impossible for the model to read or reason over the full text at once. Even if they fit, processing long inputs can be inefficient and lead to weaker retrieval results.

To handle this, we split large documents into smaller, overlapping chunks. Several libraries can help with text splitting, each designed to preserve structure or balance chunk size. A popular choice is `RecursiveCharacterTextSplitter` from LangChain, which tries to split on paragraph and sentence boundaries before falling back to smaller separators. To familiarize yourself with the library, visit: https://python.langchain.com/api_reference/text_splitters/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html

In [None]:
chunks = []
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=30)
chunks = text_splitter.split_documents(raw_docs)
print(f"✅ {len(chunks)} chunks ready for embedding")
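
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window chunker in plain Python. It is only an illustration of overlapping chunks, not how `RecursiveCharacterTextSplitter` works internally, since that class prefers natural separators. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=300, chunk_overlap=30):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

sample = "x" * 700
pieces = chunk_text(sample)
print([len(p) for p in pieces])  # [300, 300, 160]
```

Each chunk repeats the last 30 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk most of the time.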

## 3 - Build a retriever

A *retriever* lets the RAG pipeline efficiently look up small, relevant pieces of context at query time. This step has two parts:
1. **Load a model to generate embeddings**: convert each text chunk from the reference documents into a fixed-length vector that captures its semantic meaning.  
2. **Build a vector database**: store these embeddings in a vector database.
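
Conceptually, retrieval means comparing the query's embedding with every stored embedding and keeping the closest matches. The sketch below does this by brute force with cosine similarity on made-up 3-dimensional vectors. It is a teaching aid under those assumptions: FAISS uses optimized index structures and may rank by a different distance metric.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k document vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy "embeddings" invented for illustration; real ones have hundreds of dimensions.
doc_vecs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query_vec = [1.0, 0.05, 0.0]
print(top_k(query_vec, doc_vecs))  # [0, 1]
```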


### 3.1 - Load a model to generate embeddings

In [None]:
embedding_vector = []

# Embed the sentence "Hello world!" and store the result in embedding_vector.
embeddings = SentenceTransformerEmbeddings(model_name="thenlper/gte-small")
embedding_vector = embeddings.embed_query("Hello world!")
print(len(embedding_vector))

### 3.2 - Build a vector database



In [None]:
# Expected steps:
    # 1. Build the FAISS index from the list of document chunks and their embeddings.
    # 2. Create a retriever object with a suitable k value (e.g., 8).
    # 3. Save the vector store locally (e.g., under "faiss_index").
    # 4. Print a short confirmation showing how many embeddings were stored.

vectordb = FAISS.from_documents(documents=chunks, embedding=embeddings)
retriever = vectordb.as_retriever(search_kwargs={"k": 8})
vectordb.save_local("faiss_index")

print("✅ Vector store with", vectordb.index.ntotal, "embeddings")

## 4 - Build the generation engine
At the core of any RAG system lies an **LLM**. The retriever finds relevant information, and the LLM uses that information to generate coherent, context-aware responses.

In this project, we'll use **Gemma 3** (1B), a small but capable open-weight model, and run it entirely on your local machine using Ollama. This means you won't need API keys or internet access to generate responses once the model is downloaded.


### 4.1 - Install `ollama` and serve `gemma3`

Follow these steps to set up Ollama and start the model server:

**1 - Install**
```bash
# macOS (Homebrew)
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
```

If you're on Windows, install using the official installer from https://ollama.com/download.

**2 - Start the Ollama server (keep this terminal open)**
```bash
ollama serve
```
This command launches a local server at http://localhost:11434, which keeps running for as long as this terminal stays open.


**3 - Pull the Gemma model (or the model of your choice) in a new terminal**
```bash
ollama pull gemma3:1b
```

This downloads the 1B version of Gemma 3, a compact model suitable for running on most modern laptops. Once downloaded, Ollama will automatically handle model loading and caching.


After this setup, your system is ready to generate responses locally using the Gemma model through the Ollama API.
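
Before moving on, it can help to confirm that the server is reachable from Python. The sketch below only checks that something answers over HTTP at the address given above. It does not verify that the model has been pulled, and the helper name `ollama_is_running` is hypothetical.

```python
import urllib.error
import urllib.request

def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers at base_url, False otherwise."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False

print("Ollama reachable:", ollama_is_running())
```

If this prints `False`, check that `ollama serve` is still running in its terminal.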


### 4.2 - Test an LLM with a random prompt (Sanity check)


In [None]:
# Expected steps:
    # 1. Initialize the model (for example, gemma3:1b) with a low temperature such as 0.1 for more factual outputs.
    # 2. Use llm.invoke() with a short test prompt and print the response to verify that the model runs successfully.

llm = Ollama(model="gemma3:1b", temperature=0.1)
print(llm.invoke("What is the capital of France?"))

## 5 - Build a RAG

### 5.1 - Define a system prompt

At this stage, we need to tell the model how to behave when generating answers. The **system prompt** acts as the model's rulebook. It should clearly instruct the model to answer only using the retrieved context and to admit when it doesn't know the answer. This helps prevent hallucination and keeps the responses grounded in the provided documents.

In [None]:
SYSTEM_TEMPLATE = """
You are a **Customer Support Chatbot**. Use only the information in CONTEXT to answer.
If the answer is not in CONTEXT, respond with "I'm not sure from the docs."

Rules:
1) Use ONLY the provided <context> to answer.
2) If the answer is not in the context, say: "I don't know based on the retrieved documents."
3) Be concise and accurate. Prefer quoting key phrases from the context.
4) When possible, cite sources as [source: source] using the metadata.

CONTEXT:
{context}

USER:
{question}
"""
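
The `{context}` and `{question}` placeholders are filled in at query time. The sketch below shows the idea with plain `str.format` on a shortened stand-in template. The names `DEMO_TEMPLATE` and `render_prompt` and the sample chunks are invented for illustration; LangChain's `PromptTemplate` performs a conceptually similar substitution.

```python
# Shortened stand-in for SYSTEM_TEMPLATE, used only for this illustration.
DEMO_TEMPLATE = """Use only the information in CONTEXT to answer.

CONTEXT:
{context}

USER:
{question}
"""

def render_prompt(template, chunks, question):
    """Join retrieved chunks and substitute them, with the question, into the template."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)

# Made-up chunks standing in for retriever output.
demo_chunks = ["Returns are accepted within 30 days.", "Refunds go to the original payment method."]
print(render_prompt(DEMO_TEMPLATE, demo_chunks, "What is the refund window?"))
```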

### 5.2 - Create a RAG chain
Now that we have a retriever, a prompt, and a language model, we can connect them into a single RAG pipeline. The retriever finds the most relevant chunks from our vector index, the prompt injects those chunks into the system message, and the LLM uses that context to produce the final answer. (retriever → prompt → model)

This connection is handled through LangChain's `ConversationalRetrievalChain`, which combines retrieval and generation. To familiarize yourself with the library, visit: https://python.langchain.com/api_reference/langchain/chains/langchain.chains.conversational_retrieval.base.ConversationalRetrievalChain.html

In [None]:
# Expected steps:
    # 1. Create a PromptTemplate that uses the SYSTEM_TEMPLATE you defined earlier, with input variables for "context" and "question".
    # 2. Initialize your LLM using Ollama with the gemma3:1b model and a low temperature (e.g., 0.1) for reliable, grounded responses.
    # 3. Build a ConversationalRetrievalChain by combining the LLM, the retriever, and your custom prompt and name it "chain".

prompt = PromptTemplate(template=SYSTEM_TEMPLATE, input_variables=["context", "question"])
llm = Ollama(model="gemma3:1b", temperature=0.1)
chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, combine_docs_chain_kwargs={"prompt": prompt})

When you ask a question, the retriever pulls the top few relevant text chunks, the model reads them through the system prompt, and then it generates an answer based on that context.
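
The flow just described can be sketched with stand-in functions so it runs without a model or index. Everything here is hypothetical and simplified: the real chain does more, for example it can take the chat history into account when forming the search query.

```python
def answer_question(question, retrieve, render, generate):
    """Run retriever -> prompt -> model using caller-supplied functions."""
    chunks = retrieve(question)
    prompt = render(chunks, question)
    return generate(prompt)

# Stand-in components so the flow can run without FAISS or Ollama.
def fake_retrieve(question):
    return ["Standard shipping takes 3-5 business days."]  # made-up chunk

def fake_render(chunks, question):
    return "CONTEXT:\n" + "\n".join(chunks) + "\n\nUSER:\n" + question

def fake_generate(prompt):
    # A real LLM would read the whole prompt; this stub echoes the first context line.
    return prompt.splitlines()[1]

print(answer_question("How long does shipping take?", fake_retrieve, fake_render, fake_generate))
```

Keeping the three stages as separate callables makes it easy to swap in a different retriever or model later.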


### 5.3 - Validate the RAG chain

We run a few questions to make sure everything behaves as expected. Experiment by adding your own questions.

In [None]:
test_questions = [
    "If I'm not happy with my purchase, what is your refund policy and how do I start a return?",
    "How long will delivery take for a standard order, and where can I track my package once it ships?",
    "What's the quickest way to contact your support team, and what are your operating hours?",
]

# Expected steps:
    # 1. Initialize an empty chat_history list.
    # 2. Loop through test_questions, pass each question and the current chat history to the chain, and append the new answer.
    # 3. Print each question and the LLM's response to verify it's working correctly.

chat_history = []
for question in test_questions:
    print(f"\n❓ Question: {question}")
    result = chain.invoke({"question": question, "chat_history": chat_history})
    answer = result["answer"]
    print(f"💬 Answer: {answer}")
    chat_history.append((question, answer))

## 6 - Build the Streamlit UI (optional)

The goal here is to create a tiny demo so you can interact with your RAG system. The focus is not on UI design. We will build a very small interface only to demonstrate the end-to-end flow.

In [None]:
import streamlit as st
from langchain.vectorstores import FAISS
from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.llms import Ollama
from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import PromptTemplate

# Page config
st.set_page_config(page_title="Everstorm Support Chat", page_icon="🛍️")
st.title("🛍️ Everstorm Outfitters Support")
st.caption("Ask me anything about our policies, shipping, or returns!")

# System prompt
SYSTEM_TEMPLATE = """
You are a **Customer Support Chatbot**. Use only the information in CONTEXT to answer.
If the answer is not in CONTEXT, respond with "I'm not sure from the docs."

Rules:
1) Use ONLY the provided <context> to answer.
2) If the answer is not in the context, say: "I don't know based on the retrieved documents."
3) Be concise and accurate. Prefer quoting key phrases from the context.
4) When possible, cite sources as [source: source] using the metadata.

CONTEXT:
{context}

USER:
{question}
"""

# Load RAG components (cached to avoid reloading)
@st.cache_resource
def load_rag_chain():
    # Load embeddings
    embeddings = SentenceTransformerEmbeddings(model_name="thenlper/gte-small")
    
    # Load FAISS index
    vectordb = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
    retriever = vectordb.as_retriever(search_kwargs={"k": 8})
    
    # Initialize LLM
    llm = Ollama(model="gemma3:1b", temperature=0.1)
    
    # Create prompt and chain
    prompt = PromptTemplate(template=SYSTEM_TEMPLATE, input_variables=["context", "question"])
    chain = ConversationalRetrievalChain.from_llm(
        llm=llm, 
        retriever=retriever, 
        combine_docs_chain_kwargs={"prompt": prompt}
    )
    
    return chain

# Initialize chain
chain = load_rag_chain()

# Initialize chat history in session state
if "messages" not in st.session_state:
    st.session_state.messages = []
    st.session_state.chat_history = []

# Display chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Chat input
if prompt := st.chat_input("Ask about our refund policy, shipping times, or support hours..."):
    # Add user message
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    
    # Generate response
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            result = chain.invoke({"question": prompt, "chat_history": st.session_state.chat_history})
            answer = result["answer"]
            st.markdown(answer)
    
    # Add assistant message
    st.session_state.messages.append({"role": "assistant", "content": answer})
    st.session_state.chat_history.append((prompt, answer))

# Sidebar with info
with st.sidebar:
    st.header("About")
    st.info("This chatbot answers questions about Everstorm Outfitters using RAG (Retrieval-Augmented Generation).")
    
    if st.button("Clear Chat History"):
        st.session_state.messages = []
        st.session_state.chat_history = []
        st.rerun()
    
    st.markdown("---")
    st.caption("Powered by Gemma 3 (1B) via Ollama")

# Save this to app.py
with open("app.py", "w", encoding="utf-8") as f:
    f.write("""import streamlit as st
from langchain.vectorstores import FAISS
from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.llms import Ollama
from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import PromptTemplate

# Page config
st.set_page_config(page_title="Everstorm Support Chat", page_icon="🛍️")
st.title("🛍️ Everstorm Outfitters Support")
st.caption("Ask me anything about our policies, shipping, or returns!")

# System prompt
SYSTEM_TEMPLATE = \"\"\"
You are a **Customer Support Chatbot**. Use only the information in CONTEXT to answer.
If the answer is not in CONTEXT, respond with "I'm not sure from the docs."

Rules:
1) Use ONLY the provided <context> to answer.
2) If the answer is not in the context, say: "I don't know based on the retrieved documents."
3) Be concise and accurate. Prefer quoting key phrases from the context.
4) When possible, cite sources as [source: source] using the metadata.

CONTEXT:
{context}

USER:
{question}
\"\"\"

# Load RAG components (cached to avoid reloading)
@st.cache_resource
def load_rag_chain():
    # Load embeddings
    embeddings = SentenceTransformerEmbeddings(model_name="thenlper/gte-small")
    
    # Load FAISS index
    vectordb = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
    retriever = vectordb.as_retriever(search_kwargs={"k": 8})
    
    # Initialize LLM
    llm = Ollama(model="gemma3:1b", temperature=0.1)
    
    # Create prompt and chain
    prompt = PromptTemplate(template=SYSTEM_TEMPLATE, input_variables=["context", "question"])
    chain = ConversationalRetrievalChain.from_llm(
        llm=llm, 
        retriever=retriever, 
        combine_docs_chain_kwargs={"prompt": prompt}
    )
    
    return chain

# Initialize chain
chain = load_rag_chain()

# Initialize chat history in session state
if "messages" not in st.session_state:
    st.session_state.messages = []
    st.session_state.chat_history = []

# Display chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Chat input
if prompt := st.chat_input("Ask about our refund policy, shipping times, or support hours..."):
    # Add user message
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    
    # Generate response
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            result = chain.invoke({"question": prompt, "chat_history": st.session_state.chat_history})
            answer = result["answer"]
            st.markdown(answer)
    
    # Add assistant message
    st.session_state.messages.append({"role": "assistant", "content": answer})
    st.session_state.chat_history.append((prompt, answer))

# Sidebar with info
with st.sidebar:
    st.header("About")
    st.info("This chatbot answers questions about Everstorm Outfitters using RAG (Retrieval-Augmented Generation).")
    
    if st.button("Clear Chat History"):
        st.session_state.messages = []
        st.session_state.chat_history = []
        st.rerun()
    
    st.markdown("---")
    st.caption("Powered by Gemma 3 (1B) via Ollama")
""")

print("✅ app.py created! Run: streamlit run app.py")

Run `streamlit run app.py` from your terminal.
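
Because `app.py` is written from a string, a quick syntax check before launching can catch quoting mistakes early. The sketch below is an optional convenience, not part of the app, and the helper name `is_valid_python` is hypothetical. It only parses the file and does not run it.

```python
import ast
from pathlib import Path

def is_valid_python(path):
    """Return True if the file at path exists and parses as Python source."""
    file = Path(path)
    if not file.is_file():
        return False
    try:
        ast.parse(file.read_text(encoding="utf-8"))
        return True
    except SyntaxError:
        return False

print("app.py looks valid:", is_valid_python("app.py"))
```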