# Project 2: Customer-Support Chatbot for an E-Commerce Store

## 🎯 What is RAG (Retrieval-Augmented Generation)?

**RAG** is a technique that combines:
1. **Retrieval** - Finding relevant documents from a knowledge base
2. **Augmented** - Adding retrieved context to the prompt
3. **Generation** - LLM generates answer based on context

```
User Question → Retriever → Relevant Docs → LLM + Context → Answer
```

**Why RAG?**
- LLMs have knowledge cutoff dates (can't know recent info)
- LLMs may hallucinate (make up facts)
- RAG grounds responses in YOUR actual documents
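
The three stages can be illustrated without any libraries. The following is a minimal sketch, not the pipeline built later in this notebook: the retriever is a toy word-overlap ranker (an assumption standing in for embedding search), the two knowledge-base sentences are illustrative, and the generation step is left out, so the code stops at the augmented prompt.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, docs, k=1):
    """Toy retriever: rank docs by how many words they share with the question."""
    q_words = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q_words & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, context_docs):
    """Augmentation step: prepend the retrieved context to the question."""
    context = "\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Hypothetical mini knowledge base, for illustration only
docs = [
    "Items can be returned within 30 days of delivery.",
    "Orders placed before 14:00 ship the same day.",
]
question = "Can items be returned after delivery?"

top_docs = retrieve(question, docs, k=1)   # Retrieval
prompt = build_prompt(question, top_docs)  # Augmentation
print(prompt)                              # This prompt would be sent to the LLM (Generation)
```

In a real system the word-overlap ranker is replaced by vector similarity search, which also matches paraphrases that share no words with the question.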

## 1 - Environment Setup & Imports

In [2]:
# ============================================================================
# IMPORT LIBRARIES - Each library serves a specific purpose in our RAG pipeline
# ============================================================================

# -----------------------------------------------------------------------------
# STANDARD PYTHON LIBRARIES
# -----------------------------------------------------------------------------
# os: Interact with operating system (file paths, environment variables)
# Example: os.getenv('API_KEY') gets environment variable
import os

# pathlib: Modern way to handle file paths (cross-platform compatible)
# Example: Path('/home/user') / 'documents' / 'file.txt' = '/home/user/documents/file.txt'
import pathlib

# textwrap: Format text for display (wrap long lines, dedent)
# Example: textwrap.wrap('long text...', width=50) breaks into lines of 50 chars
import textwrap

# glob: Find files matching a pattern (like shell wildcards)
# Example: glob.glob('*.pdf') finds all PDF files in current directory
import glob

# -----------------------------------------------------------------------------
# DOCUMENT LOADERS - Load different types of documents
# -----------------------------------------------------------------------------
# These loaders convert raw files into LangChain Document objects
# Each Document has: page_content (text) + metadata (source, page number, etc.)

# UnstructuredURLLoader: Fetches and parses web pages into text
# Example: loader = UnstructuredURLLoader(['https://example.com'])
from langchain_community.document_loaders import UnstructuredURLLoader

# TextLoader: Loads plain text files (.txt)
# Example: loader = TextLoader('readme.txt')
from langchain_community.document_loaders import TextLoader

# PyPDFLoader: Extracts text from PDF files (page by page)
# Example: loader = PyPDFLoader('manual.pdf')
from langchain_community.document_loaders import PyPDFLoader

# -----------------------------------------------------------------------------
# TEXT SPLITTER - Break documents into smaller chunks
# -----------------------------------------------------------------------------
# Why split? LLMs have context limits, and smaller chunks = better retrieval
# RecursiveCharacterTextSplitter tries to split at natural boundaries:
# First by paragraphs (\n\n), then lines (\n), then words
from langchain.text_splitter import RecursiveCharacterTextSplitter

# -----------------------------------------------------------------------------
# VECTOR STORE - Store and search embeddings efficiently
# -----------------------------------------------------------------------------
# FAISS (Facebook AI Similarity Search) is a library for similarity search
# It stores vectors and finds nearest neighbors very fast
# Example: Given query vector, find 5 most similar document vectors
from langchain.vectorstores import FAISS

# -----------------------------------------------------------------------------
# EMBEDDINGS - Convert text to numerical vectors
# -----------------------------------------------------------------------------
# Embeddings capture semantic meaning: similar meanings = similar vectors
# "king" - "man" + "woman" ≈ "queen" (famous example of semantic arithmetic)

# OpenAIEmbeddings: Uses OpenAI's API (requires API key, costs money)
from langchain.embeddings import OpenAIEmbeddings

# HuggingFaceEmbeddings: Uses local HuggingFace models (free, runs locally)
from langchain.embeddings import HuggingFaceEmbeddings

# SentenceTransformerEmbeddings: Another wrapper for sentence-transformers
from langchain.embeddings import SentenceTransformerEmbeddings

# -----------------------------------------------------------------------------
# LLM (Large Language Model) - The "brain" that generates responses
# -----------------------------------------------------------------------------
# Ollama: Runs open-source LLMs locally on your machine
# No API key needed, privacy-friendly, works offline
from langchain.llms import Ollama

# -----------------------------------------------------------------------------
# CHAINS - Connect components together into a pipeline
# -----------------------------------------------------------------------------
# ConversationalRetrievalChain: Combines retriever + LLM + memory
# It handles: query → retrieve docs → format prompt → generate answer
from langchain.chains import ConversationalRetrievalChain

# -----------------------------------------------------------------------------
# PROMPTS - Templates that structure how we talk to the LLM
# -----------------------------------------------------------------------------
# PromptTemplate: Create reusable prompts with variables
# Example: PromptTemplate.from_template("Hello {name}!").format(name="Alice") → "Hello Alice!"
from langchain.prompts import PromptTemplate

print("✅ Libraries imported! You're good to go!")


✅ Libraries imported! You're good to go!


## 2 - Data Preparation

### Theory: The Document Pipeline
```
Raw Files (PDF/HTML/TXT)
         ↓
    [LOAD] → Document objects with text + metadata
         ↓
    [CHUNK] → Smaller pieces (typically 300-500 tokens; this notebook uses 300 characters)
         ↓
Ready for embedding!
```

### 2.1 - Ingest source documents (Load PDFs)

In [3]:
# ============================================================================
# STEP 2.1: LOAD PDF DOCUMENTS
# ============================================================================
# Goal: Read all PDF files and extract their text content
#
# How glob works:
# - glob.glob("pattern") returns list of matching file paths
# - "data/Everstorm_*.pdf" matches any PDF starting with "Everstorm_"
# - Example matches: data/Everstorm_Shipping.pdf, data/Everstorm_Returns.pdf
# ============================================================================

# Find all PDF files matching our pattern
# The * is a wildcard that matches any characters
pdf_paths = glob.glob("data/Everstorm_*.pdf")

# Initialize empty list to store all document pages
# Each page becomes a separate Document object
raw_docs = []

# -----------------------------------------------------------------------------
# SOLUTION: Loop through each PDF and load its pages
# -----------------------------------------------------------------------------
# PyPDFLoader.load() returns a list of Document objects (one per page)
# Each Document has:
#   - page_content: The actual text from that page
#   - metadata: Dict with 'source' (file path) and 'page' (page number)
#
# Example Document:
# Document(
#     page_content="Welcome to our store...",
#     metadata={'source': 'data/Everstorm_Shipping.pdf', 'page': 0}
# )
# -----------------------------------------------------------------------------

for pdf_path in pdf_paths:
    # Create a loader for this specific PDF file
    # PyPDFLoader uses the PyPDF library under the hood
    loader = PyPDFLoader(pdf_path)

    # Load all pages from the PDF once and reuse the result
    # .load() reads the file and extracts text from each page
    pages = loader.load()
    print(pages)

    # .extend() adds all items from the list (vs .append() which adds the list itself)
    raw_docs.extend(pages)

# Verify loading worked
print(f"Loaded {len(raw_docs)} PDF pages from {len(pdf_paths)} files.")

# Let's peek at each loaded page to understand the structure
for raw_doc in raw_docs:
    print(f"\n📄 Sample document metadata: {raw_doc.metadata}")
    print(f"📝 First 200 chars: {raw_doc.page_content[:200]}...")

Ignoring wrong pointing object 80 0 (offset 0)
Ignoring wrong pointing object 80 0 (offset 0)


[Document(metadata={'producer': 'Skia/PDF m138 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'Shipping_and_Delivery_Policy', 'source': 'data/Everstorm_Shipping_and_Delivery_Policy.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}, page_content='Everstorm  Outfitters    SHIPPING  &  DELIVERY  POLICY    Revision  4.0  —  Effective  18  May  2025   1\u2002Who  We  Ship  To    •  United  States,  Canada,  EU/EEA,  UK,  Australia,  New  Zealand,  Japan,  Singapore    •  No  PO  boxes,  freight  forwarders,  or  sanctioned  destinations  (OFAC  list).   2\u2002Fulfilment  Centers    •  Reno,  NV  (US  West)\u2003•  Harrisburg,  PA  (US  East)\u2003•  Rotterdam,  NL  (EU)    Orders  route  automatically  to  the  closest  node  with  stock.   3\u2002Processing  Times    Mon–Fri  orders  placed  before  14:00  local  warehouse  time  ship  the  same  day.    Weekend  orders  ship  Monday,  except  public  holidays.   4\u2002Service  Levels  &  Transi

Ignoring wrong pointing object 76 0 (offset 0)
Ignoring wrong pointing object 76 0 (offset 0)


[Document(metadata={'producer': 'Skia/PDF m138 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'Product_sizing_and_care_guide', 'source': 'data/Everstorm_Product_sizing_and_care_guide.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}, page_content='Everstorm  Outfitters    PRODUCT  SIZING  &  CARE  GUIDE    Rev  2.1  —  18  May  2025   A\u2002Apparel  Size  Charts  (inches)   |  Unisex  Tee  |  Chest  |  Body  Length  |  |------------|-------|-------------|  |  XS          |  34     |  27           |  |  S           |  36     |  28           |  |  M           |  40     |  29           |  |  L           |  44     |  30           |  |  XL          |  48     |  31           |  |  XXL         |  52     |  32           |   Fit  note:  Tees  are  athletic-cut;  size  up  for  a  relaxed  fit.   B\u2002Outerwear  Measurements   |  Jacket  Size  |  Chest  (in)  |  Sleeve  (from  center  back)  |  |-------------|-----------|---------------------------|  |  S       

Ignoring wrong pointing object 81 0 (offset 0)


[Document(metadata={'producer': 'Skia/PDF m138 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'Payment_refund_and_security', 'source': 'data/Everstorm_Payment_refund_and_security.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}, page_content='FAQs  \n Which  payment  methods  do  you  accept?   Visa,  MasterCard,  AmEx,  Discover,  Apple  Pay,  Google  Pay,  PayPal,  Shop  Pay.  Installments  (US  \nonly),\n \nKlarna\n \nPay-in-4\n \n(selected\n \nEU\n \ncountries).\n  What  is  3-D  Secure  and  why  did  I  see  a  pop-up?    3-D  Secure  (also  “Verified  by  Visa”  /  “Mastercard  Identity  Check”)  adds  an  extra  one-time  code  \nfor\n \nEU\n \nPSD2\n \ncompliance.\n \nYour\n \nbank\n \ncontrols\n \nthat\n \npop-up.\n  Is  my  data  safe?   All  checkout  traffic  uses  TLS  1.3.  We  never  store  full  card  numbers.  Our  store  is  Level  1  \nPCI-DSS\n \ncompliant.\n  How  long  do  refunds  take?    We  issue  refunds  the  same  day 

Ignoring wrong pointing object 81 0 (offset 0)


[Document(metadata={'producer': 'Skia/PDF m138 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'Return_and_exchange_policy', 'source': 'data/Everstorm_Return_and_exchange_policy.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}, page_content='Everstorm  Outfitters    RETURN  &  EXCHANGE  POLICY    Document  ROX-2025-05   Easy-Fit  Promise    If  your  gear  doesn’t  fit  or  just  isn’t  your  vibe,  send  it  back  within  **30  days**  of  delivery  for  a  refund  or  free  size  exchange.   Eligibility  Checklist    ●  Unworn,  unwashed,  no  odors,  tags  attached    ●  Original  shoe  box  (footwear)  placed  inside  outer  carton    ●  Electronics  (power-banks,  headlamps)  unopened  unless  faulty   How  to  Start    ●  Visit  everstorm.example/returns  →  enter  order  #  and  email.    ●  Select  “Refund”  or  “Exchange.”    ●  Print  prepaid  label;  pack  securely.  Multiple  items  can  share  one  box.   Instant  Exch

### (Optional) 2.1b - Load web pages

In [8]:
# ============================================================================
# OPTIONAL: LOAD WEB PAGES
# ============================================================================
# UnstructuredURLLoader fetches HTML pages and extracts readable text
# It removes HTML tags, scripts, styles, leaving just the content
# 
# Note: Web scraping can fail due to:
#   - Network issues
#   - Blocked requests (rate limiting, bot detection)
#   - Page structure changes
# Always have fallback logic!
# ============================================================================

URLS = [
    # BigCommerce documentation about shipping
    "https://developer.bigcommerce.com/docs/store-operations/shipping",
    # BigCommerce documentation about refunds
    "https://developer.bigcommerce.com/docs/store-operations/orders/refunds",
]

try:
    # -----------------------------------------------------------------------------
    # SOLUTION: Load web pages using UnstructuredURLLoader
    # -----------------------------------------------------------------------------
    # Create loader with list of URLs to fetch
    url_loader = UnstructuredURLLoader(urls=URLS)
    
    # Fetch and parse all URLs (may take a few seconds)
    web_docs = url_loader.load()
    print(web_docs)
    
    # Add web documents to our raw_docs collection
    raw_docs.extend(web_docs)
    
    print(f"Fetched {len(web_docs)} documents from the web.")
    
except Exception as e:
    # If web fetch fails, fall back to local PDFs only
    print("⚠️  Web fetch failed, using offline copies:", e)
    
    # -----------------------------------------------------------------------------
    # FALLBACK: Just use the PDFs we already loaded
    # -----------------------------------------------------------------------------
    # In production, you might load cached HTML files here
    print(f"Continuing with {len(raw_docs)} offline documents.")

[Document(metadata={'source': 'https://developer.bigcommerce.com/docs/store-operations/shipping'}, page_content='Home\n\nStore operations\n\nOverview\n\nStore Configuration\n\nProducts overview\n\nOverview\n\nProduct basic information\n\nProduct SEO information\n\nProduct variant options\n\nProduct modifier options\n\nProduct URL\n\nProduct attributes\n\nProduct custom fields\n\nProduct images\n\nContextual filters\n\nInventory adjustments\n\nInventory locations\n\nOverview\n\nComparison of operations\n\nOverview\n\nHow currencies work\n\nPrice lists\n\nPricing calculations\n\nOverview\n\nCurrency-specific promotions\n\nMulti-currency percentage promotions\n\nAPI and UI feature differences\n\nBrand promotions\n\nCategory promotions\n\nCustomer promotions\n\nOrder promotions\n\nProduct promotions\n\nShipping promotions\n\nStorewide promotions\n\nUsing logical operators\n\nUsing multiple rules\n\nStore configuration\n\nLocales configuration\n\nStore logs\n\nData layer\n\nOverview\n\nProd

### 2.2 - Chunk the text

### Theory: Why Chunking Matters

**Problem**: Documents can be thousands of tokens, but:
1. LLMs have context limits (e.g., 4K, 8K, 128K tokens)
2. Retrieval works better with focused chunks
3. Embedding quality degrades for very long texts

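**Solution**: Split into chunks with overlap. 300-500 tokens per chunk is a common target; the cell below uses a smaller setting of 300 characters (roughly 75 tokens).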

```
Original: [========================================]
                          ↓ split with overlap
Chunks:   [=====]     ← Chunk 1
             [=====]  ← Chunk 2 (overlaps with 1)
                [=====] ← Chunk 3 (overlaps with 2)
```

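**Why overlap?** A chunk boundary can fall in the middle of a sentence. Overlap repeats the text around each boundary in both neighboring chunks, so the context on either side of the cut is not lost.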
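
To make the overlap idea concrete, here is a minimal sketch of a fixed-size sliding window over characters. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally (that splitter also looks for natural separators); the function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into character windows where neighbors share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(text, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each chunk starts with the last three characters of the previous one, which is the shared context the diagram above depicts.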

In [7]:
# ============================================================================
# STEP 2.2: SPLIT DOCUMENTS INTO CHUNKS
# ============================================================================
# RecursiveCharacterTextSplitter tries to split at natural boundaries:
#   1. First tries to split at "\n\n" (paragraph breaks)
#   2. If still too long, splits at "\n" (line breaks)
#   3. If still too long, splits at " " (spaces between words)
#   4. Last resort: splits at character level
#
# Parameters:
#   - chunk_size: Maximum characters per chunk (300 = ~75 tokens)
#   - chunk_overlap: Characters shared between chunks (30 = ~7 tokens)
#
# Rule of thumb: 1 token ≈ 4 characters in English
# ============================================================================

# Initialize empty list for our text chunks
chunks = []

# -----------------------------------------------------------------------------
# SOLUTION: Create splitter and split documents
# -----------------------------------------------------------------------------

# Create the text splitter with our chosen parameters
# chunk_size=300: Each chunk will be at most 300 characters
# chunk_overlap=30: Adjacent chunks share 30 characters (prevents lost context)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,      # Max characters per chunk
    chunk_overlap=30,    # Overlap between chunks
    length_function=len, # How to measure length (character count)
)

# Split all documents into chunks
# .split_documents() preserves metadata from original documents
# Each chunk knows which source file it came from!
chunks = text_splitter.split_documents(raw_docs)

print(f"✅ {len(chunks)} chunks ready for embedding")

# Let's see how chunking changed our data
print(f"\n📊 Before: {len(raw_docs)} documents")
print(f"📊 After: {len(chunks)} chunks")

# Peek at the first two chunks (guarded so short inputs don't raise IndexError)
if len(chunks) >= 2:
    print("\n📄 Sample chunk 1 (first 300 chars):")
    print(f"   {chunks[0].page_content[:300]}...")
    print("\n📄 Sample chunk 2 (first 300 chars):")
    print(f"   {chunks[1].page_content[:300]}...")
    print(f"   Source: {chunks[0].metadata.get('source', 'Unknown')}")

✅ 42 chunks ready for embedding

📊 Before: 8 documents
📊 After: 42 chunks

📄 Sample chunk 1 (first 300 chars):
   Everstorm  Outfitters    SHIPPING  &  DELIVERY  POLICY    Revision  4.0  —  Effective  18  May  2025   1 Who  We  Ship  To    •  United  States,  Canada,  EU/EEA,  UK,  Australia,  New  Zealand,  Japan,  Singapore    •  No  PO  boxes,  freight  forwarders,  or  sanctioned  destinations  (OFAC...

📄 Sample chunk 2 (first 300 chars):
   destinations  (OFAC  list).   2 Fulfilment  Centers    •  Reno,  NV  (US  West) •  Harrisburg,  PA  (US  East) •  Rotterdam,  NL  (EU)    Orders  route  automatically  to  the  closest  node  with  stock.   3 Processing  Times    Mon–Fri  orders  placed  before  14:00  local  warehouse  time  ship...
   Source: data/Everstorm_Shipping_and_Delivery_Policy.pdf


## 3 - Build a Retriever

### Theory: How Semantic Search Works

```
Text: "What is your return policy?"
         ↓ Embedding Model
Vector: [0.23, -0.45, 0.12, ..., 0.78]  (768 dimensions for all-mpnet-base-v2)
```

**Key Insight**: Similar meanings → Similar vectors!
- "return policy" and "refund policy" have similar vectors
- "return policy" and "weather forecast" have different vectors

**Cosine Similarity**: Measures angle between vectors
- 1.0 = identical direction (same meaning)
- 0.0 = perpendicular (unrelated)
- -1.0 = opposite (opposite meaning)
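
The three reference values can be checked by hand with small 2-D vectors. This is a dependency-free sketch of the standard cosine formula (dot product divided by the product of the vector lengths); the example vectors are made up for illustration and are not real embeddings.

```python
import math

def cosine_similarity(v1, v2):
    """Cosine of the angle between v1 and v2: dot(v1, v2) / (|v1| * |v2|)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)

same = cosine_similarity([1.0, 2.0], [2.0, 4.0])      # same direction, different length
perp = cosine_similarity([1.0, 0.0], [0.0, 1.0])      # perpendicular
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])  # opposite direction

print(round(same, 3), round(perp, 3), round(opposite, 3))
```

Note that the first pair differs in length but still scores about 1.0: cosine similarity compares direction only, not magnitude.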

### 3.1 - Load embedding model and test it

In [37]:
# ============================================================================
# STEP 3.1: CREATE EMBEDDING MODEL
# ============================================================================
# We use 'sentence-transformers/all-mpnet-base-v2' - a general-purpose embedding model
#   - 768-dimensional embeddings
#   - Free to use, runs locally
#   - Larger and slower than the small models listed below
#
# Alternative models:
#   - 'thenlper/gte-small': Lightweight (33 million parameters), 384 dims
#   - 'all-MiniLM-L6-v2': Popular, 384 dims, very fast
#   - OpenAI 'text-embedding-3-small': Hosted API, requires API key
# ============================================================================

# Initialize the embedding model
# First time running this downloads the model weights (a few hundred MB for this model)
# SentenceTransformerEmbeddings wraps the sentence-transformers library
embedder = SentenceTransformerEmbeddings(
    # model_name="thenlper/gte-small"  # Smaller, faster alternative (384 dims)
    model_name="sentence-transformers/all-mpnet-base-v2"  # 768-dim model used in this run
)

# -----------------------------------------------------------------------------
# SOLUTION: Test the embedding model
# -----------------------------------------------------------------------------
# embed_query() converts a single text string into a vector
# The vector is a list of floats representing semantic meaning

# Create embedding for a test sentence
test_text = "Hello world!"
embedding_vector = embedder.embed_query(test_text)

# Check the embedding dimensions
print("✅ Embedding model loaded!")
print(f"📊 Text: '{test_text}'")
print(f"📊 Embedding dimension: {len(embedding_vector)}")
print(f"📊 First 5 values: {embedding_vector[:5]}")

# Bonus: Show how similar texts get similar embeddings
text_a = "What is your return policy?"
text_b = "How can I return an item?"
text_c = "I do not wanted to know any policy."

vec_a = embedder.embed_query(text_a)
vec_b = embedder.embed_query(text_b)
vec_c = embedder.embed_query(text_c)

# Simple cosine similarity calculation
import numpy as np
def cosine_similarity(v1, v2):
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

print("\n🔍 Similarity Demo:")
print(f"   '{text_a}' vs '{text_b}': {cosine_similarity(vec_a, vec_b):.3f}")
print(f"   '{text_a}' vs '{text_c}': {cosine_similarity(vec_a, vec_c):.3f}")


✅ Embedding model loaded!
📊 Text: 'Hello world!'
📊 Embedding dimension: 768
📊 First 5 values: [0.019173715263605118, 0.028736531734466553, -0.012354082427918911, 0.015822136774659157, 0.0790899470448494]

🔍 Similarity Demo:
   'What is your return policy?' vs 'How can I return an item?': 0.682
   'What is your return policy?' vs 'I do not wanted to know any policy.': 0.249


### 3.2 - Build FAISS vector database

In [23]:
# ============================================================================
# STEP 3.2: BUILD FAISS VECTOR INDEX
# ============================================================================
# FAISS (Facebook AI Similarity Search) efficiently stores and searches vectors
#
# How it works:
#   1. We give FAISS all our chunk embeddings
#   2. FAISS builds an optimized index structure
#   3. At query time, FAISS quickly finds the k nearest neighbors
#
# Why FAISS?
#   - Handles millions of vectors
#   - Very fast similarity search (milliseconds)
#   - Supports GPU acceleration
#   - Easy to save/load from disk
# ============================================================================

# -----------------------------------------------------------------------------
# SOLUTION: Create vector store from chunks
# -----------------------------------------------------------------------------

# Step 1: Build the FAISS index from documents
# FAISS.from_documents() does three things:
#   a) Embeds all chunk texts using our embedding model
#   b) Stores the vectors in a FAISS index
#   c) Links each vector to its original Document (text + metadata)
vectordb = FAISS.from_documents(
    documents=chunks,      # Our text chunks (list of Document objects)
    embedding=embedder     # The embedding model to use
)

# Step 2: Create a retriever from the vector store
# The retriever wraps the vector store with a simple interface
# search_kwargs={'k': 8} means return the 8 most similar chunks
retriever = vectordb.as_retriever(
    search_kwargs={'k': 8}  # Return top 8 most similar chunks
)

# Step 3: Save the index for later use (optional but recommended)
# This saves both the vectors and the document metadata
# Next time, you can load with: FAISS.load_local('faiss_index', embedder)
# (recent LangChain versions also require allow_dangerous_deserialization=True)
vectordb.save_local("faiss_index")

# Step 4: Verify the index was created correctly
print("✅ Vector store with", vectordb.index.ntotal, "embeddings")
print("📁 Index saved to 'faiss_index/' directory")

# Test the retriever with a sample query
test_query = "What is the return policy?"
test_results = retriever.invoke(test_query)
print(f"\n🔍 Test query: '{test_query}'")
print(f"📊 Retrieved {len(test_results)} chunks")
if test_results:
    print(f"📄 Top result preview: {test_results[0].page_content[:150]}...")

✅ Vector store with 113 embeddings
📁 Index saved to 'faiss_index/' directory

🔍 Test query: 'What is the return policy?'
📊 Retrieved 8 chunks
📄 Top result preview: and  custom-embroidered  items:  no  return  unless  defective.   Refund  Timeline    ●  Warehouse  receipt  →  inspection  ≤  3  business  days  →  r...
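
Conceptually, the "find the k nearest neighbors" step can be pictured as an exhaustive scan. The sketch below is a pure-Python stand-in under that assumption: it compares the query against every stored vector by Euclidean distance and keeps the k closest. FAISS itself offers several index types and is far more optimized; the vectors and IDs here are hypothetical.

```python
import heapq
import math

def l2_distance(v1, v2):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

def top_k(query_vec, indexed, k):
    """Return the k (distance, doc_id) pairs closest to query_vec, nearest first."""
    scored = ((l2_distance(query_vec, vec), doc_id) for doc_id, vec in indexed.items())
    return heapq.nsmallest(k, scored)

# Hypothetical 2-D "embeddings" keyed by chunk ID
index = {
    "returns": [0.9, 0.1],
    "shipping": [0.1, 0.9],
    "sizing": [0.5, 0.5],
}

results = top_k([1.0, 0.0], index, k=2)
for distance, doc_id in results:
    print(f"{doc_id}: {distance:.3f}")
```

An exhaustive scan costs time proportional to the number of stored vectors per query, which is why dedicated libraries matter once the collection grows large.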


## 4 - Build the Generation Engine

### Theory: Ollama Local LLM Server

Ollama runs LLMs locally on your machine:
- **No API key needed** - completely free
- **Privacy** - data never leaves your computer
- **Offline capable** - works without internet
- **Many models** - Gemma, Llama, Mistral, etc.

```bash
# Install (one-time)
curl -fsSL https://ollama.com/install.sh | sh

# Start server (keep running in background)
ollama serve

# Download model (one-time per model)
ollama pull gemma3:latest
```
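
Before running the LLM cells it can help to confirm that the server is reachable. The following is a hedged sketch using only the standard library; it assumes Ollama is listening on its default address `http://localhost:11434`, and the helper name `is_ollama_running` is hypothetical.

```python
import http.client
import urllib.request

def is_ollama_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, HTTP error status, etc.
        return False

print("Ollama reachable:", is_ollama_running())
```

If this prints `False`, start the server with `ollama serve` and try again. A `True` result only shows that something answers on that port, not that a particular model has been pulled.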

### 4.1 - Test LLM with Ollama (Sanity Check)

In [24]:
# ============================================================================
# STEP 4: LOAD AND TEST THE LLM
# ============================================================================
# Make sure Ollama is running before executing this cell!
# In terminal: ollama serve
# In another terminal: ollama pull gemma3:latest
#
# Temperature controls randomness:
#   - 0.0 = deterministic (always same output)
#   - 0.1 = mostly consistent (good for factual Q&A)
#   - 0.7 = balanced (good for creative tasks)
#   - 1.0 = very random (highly creative/unpredictable)
# ============================================================================

# -----------------------------------------------------------------------------
# SOLUTION: Initialize and test the LLM
# -----------------------------------------------------------------------------

# Initialize Ollama LLM client
# model: which model to use (must be downloaded first)
# temperature: controls randomness (low = more consistent)
llm = Ollama(
    model="gemma3:latest",   # Gemma 3 Latest model
    temperature=0.1      # Low temp for factual responses
)

# Test with a simple prompt
# .invoke() sends the prompt to Ollama and returns the response
test_prompt = "What is 2 + 2? Answer in one word."
response = llm.invoke(test_prompt)

print("✅ LLM is working!")
print(f"❓ Prompt: {test_prompt}")
print(f"🤖 Response: {response}")


✅ LLM is working!
❓ Prompt: What is 2 + 2? Answer in one word.
🤖 Response: Four
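
The temperature values listed in the cell above can be made more tangible with the common formulation, in which token scores (logits) are divided by the temperature before a softmax. This is a sketch of that general idea, not a description of how Ollama or Gemma implement sampling internally; the logits are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive (0 is usually handled as greedy argmax)")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.1, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.1 almost all probability lands on the top-scoring token, which is why low temperatures give consistent answers; at 1.0 the alternatives keep a meaningful share.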


## 5 - Build the RAG Chain

### Theory: The Complete RAG Pipeline

```
User Question: "What is your return policy?"
         ↓
    [EMBED] → Query vector
         ↓
    [RETRIEVE] → Top k similar chunks from FAISS
         ↓
    [FORMAT PROMPT] → System prompt + Context + Question
         ↓
    [GENERATE] → LLM creates answer from context
         ↓
Answer: "You can return items within 30 days..."
```
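
The [FORMAT PROMPT] step is plain string substitution. As a minimal sketch, assuming retrieved chunks arrive as `(source, text)` pairs (a simplification of LangChain's `Document` objects) and using a shortened, hypothetical template:

```python
TEMPLATE = """Use ONLY the information in <context> to answer.

<context>
{context}
</context>

USER QUESTION:
{question}
"""

def format_context(chunks):
    """Join (source, text) pairs into one context string, labelling each chunk."""
    return "\n\n".join(f"[{source}] {text}" for source, text in chunks)

# Hypothetical retrieved chunks, for illustration only
retrieved = [
    ("returns.pdf", "Send it back within 30 days of delivery."),
    ("shipping.pdf", "Weekend orders ship Monday."),
]

prompt = TEMPLATE.format(
    context=format_context(retrieved),
    question="What is your return policy?",
)
print(prompt)
```

Labelling each chunk with its source gives the model something concrete to cite when the prompt asks for source references.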

### 5.1 - Define a system prompt

In [31]:
# ============================================================================
# STEP 5.1: DEFINE THE SYSTEM PROMPT
# ============================================================================
# The system prompt is CRITICAL for RAG quality!
# It tells the LLM:
#   1. What role to play (Customer Support Chatbot)
#   2. What information to use (only the provided context)
#   3. What to do when unsure (admit it doesn't know)
#   4. How to format the response (concise, cite sources)
#
# A good RAG prompt prevents hallucination by:
#   - Explicitly limiting the LLM to the provided context
#   - Giving a clear fallback for unknown questions
# ============================================================================

# Define the system prompt template
# {context} will be replaced with retrieved document chunks
# {question} will be replaced with the user's question
SYSTEM_TEMPLATE = """
You are a **Customer Support Chatbot** for Everstorm Outfitters.

IMPORTANT RULES:
1. Use ONLY the information in <context> to answer.
2. If the answer is NOT in the context, say: "I don't know based on the retrieved documents."
3. Be concise and accurate. Quote key phrases from the context when helpful.
4. When possible, cite the source document.
5. Do NOT make up information or use outside knowledge.
6. Answer must be in German.

<context>
{context}
</context>

USER QUESTION:
{question}

ASSISTANT:
"""

print("✅ System prompt defined!")
print(f"📝 Template has {len(SYSTEM_TEMPLATE)} characters")
print("\n📋 Template preview:")
print(SYSTEM_TEMPLATE[:300] + "...")

✅ System prompt defined!
📝 Template has 507 characters

📋 Template preview:

You are a **Customer Support Chatbot** for Everstorm Outfitters.

IMPORTANT RULES:
1. Use ONLY the information in <context> to answer.
2. If the answer is NOT in the context, say: "I don't know based on the retrieved documents."
3. Be concise and accurate. Quote key phrases from the context when he...


### 5.2 - Create the RAG chain

In [32]:
# ============================================================================
# STEP 5.2: BUILD THE RAG CHAIN
# ============================================================================
# ConversationalRetrievalChain connects all components:
#   - Retriever: finds relevant documents
#   - LLM: generates the answer
#   - Prompt: structures how we ask the LLM
#   - Memory: tracks conversation history (optional)
#
# Flow: question → retrieve docs → format prompt → LLM → answer
# ============================================================================

# -----------------------------------------------------------------------------
# SOLUTION: Create the complete RAG chain
# -----------------------------------------------------------------------------

# Step 1: Create a PromptTemplate from our system template
# input_variables tells LangChain which parts to fill in
rag_prompt = PromptTemplate(
    template=SYSTEM_TEMPLATE,
    input_variables=["context", "question"]  # Variables to fill in
)

# Step 2: Ensure LLM is initialized (re-initialize for clarity)
llm = Ollama(
    model="gemma3:latest",
    temperature=0.1  # Low temperature for consistent, factual answers
)

# Step 3: Build the ConversationalRetrievalChain
# This chains together: retriever → prompt → LLM
chain = ConversationalRetrievalChain.from_llm(
    llm=llm,                              # The language model
    retriever=retriever,                  # Our FAISS retriever
    return_source_documents=True,         # Return the docs used (for debugging)
    combine_docs_chain_kwargs={
        "prompt": rag_prompt              # Our custom prompt template
    }
)

print("✅ RAG chain created!")
print("🔗 Components connected: Retriever → Prompt → LLM")

✅ RAG chain created!
🔗 Components connected: Retriever → Prompt → LLM


### 5.3 - Validate the RAG chain

In [33]:
# ============================================================================
# STEP 5.3: TEST THE RAG CHAIN WITH SAMPLE QUESTIONS
# ============================================================================
# chat_history tracks previous Q&A pairs for context
# Format: list of (question, answer) tuples
# This allows follow-up questions like "Can you tell me more about that?"
# ============================================================================

# Sample questions to test our chatbot
test_questions = [
    "If I'm not happy with my purchase, what is your refund policy and how do I start a return?",
    "How long will delivery take for a standard order, and where can I track my package once it ships?",
    "What's the quickest way to contact your support team, and what are your operating hours?",
]

# -----------------------------------------------------------------------------
# SOLUTION: Test the RAG chain
# -----------------------------------------------------------------------------

# Initialize empty chat history
# As we ask questions, we'll add (question, answer) tuples
chat_history = []

print("="*60)
print("🤖 RAG CHATBOT TEST")
print("="*60)

# Loop through each test question
for i, question in enumerate(test_questions, 1):
    print(f"\n❓ Question {i}: {question}")
    print("-"*40)
    
    # Invoke the chain with the question and chat history
    # The chain returns a dict with 'answer' and 'source_documents'
    result = chain.invoke({
        "question": question,
        "chat_history": chat_history
    })
    
    # Extract the answer
    answer = result["answer"]
    
    # Print the answer
    print(f"🤖 Answer: {answer}")
    
    # Show which sources were used (helpful for debugging)
    if "source_documents" in result:
        sources = set([doc.metadata.get('source', 'Unknown') for doc in result["source_documents"]])
        print(f"📚 Sources: {', '.join([s.split('/')[-1] for s in sources])}")
    
    # Update chat history for context in follow-up questions
    chat_history.append((question, answer))

print("\n" + "="*60)
print("✅ RAG chatbot test complete!")
print("="*60)

🤖 RAG CHATBOT TEST

❓ Question 1: If I'm not happy with my purchase, what is your refund policy and how do I start a return?
----------------------------------------
🤖 Answer: Wenn Sie mit Ihrem Kauf nicht zufrieden sind, erhalten Sie für Geschenke Gutschein-Credits. “**Refund** Timeline” ● “Warehouse receipt → inspection ≤ 3 business days → refund initiates.” (Dokument 1) Um einen Rückgabeantrag zu stellen, wählen Sie “Refund” oder “Exchange” und drucken Sie die Prepaid-Etikett. (Dokument 2)
📚 Sources: Everstorm_Return_and_exchange_policy.pdf, Everstorm_Payment_refund_and_security.pdf, refunds

❓ Question 2: How long will delivery take for a standard order, and where can I track my package once it ships?
----------------------------------------
🤖 Answer: Ich weiß das nicht anhand der abgerufenen Dokumente.
📚 Sources: Everstorm_Product_sizing_and_care_guide.pdf, Everstorm_Return_and_exchange_policy.pdf, shipping, refunds

❓ Question 3: W
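
The `chat_history` list of `(question, answer)` tuples is what lets the chain resolve a follow-up such as "How long does that take?". `ConversationalRetrievalChain` first asks the LLM to rewrite the follow-up as a standalone question, then retrieves with that rewritten question. The sketch below shows the idea in plain Python. The helper names, the prompt wording, and the sample history are hypothetical; the wording is a paraphrase, not LangChain's actual template.

```python
# Illustrative sketch: turning (question, answer) tuples into a
# "rewrite this follow-up" prompt. Wording is a paraphrase, not LangChain's template.
def format_history(chat_history):
    """Render (question, answer) tuples as alternating Human/Assistant lines."""
    lines = []
    for question, answer in chat_history:
        lines.append(f"Human: {question}")
        lines.append(f"Assistant: {answer}")
    return "\n".join(lines)

def build_condense_prompt(chat_history, follow_up):
    """Return the follow-up unchanged when there is no history to resolve against."""
    if not chat_history:
        return follow_up
    return (
        "Rephrase the follow-up as a standalone question.\n\n"
        f"Chat history:\n{format_history(chat_history)}\n"
        f"Follow-up: {follow_up}\n"
        "Standalone question:"
    )

# Hypothetical history, for illustration only
history = [("What is your refund policy?", "Refunds start after inspection.")]
print(build_condense_prompt([], "How long does that take?"))
print(build_condense_prompt(history, "How long does that take?"))
```

With an empty history the question passes through untouched, which is why the first test question in the loop needs no rewriting step.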

## 6 - Build Streamlit UI (Optional)

This creates a web interface for your chatbot. Run with: `streamlit run app.py`

In [17]:
# ============================================================================
# STEP 6: CREATE STREAMLIT WEB APP
# ============================================================================
# This cell writes a complete Streamlit app to app.py
# Run it with: streamlit run app.py
# ============================================================================

streamlit_code = '''
# ===========================================================================
# STREAMLIT RAG CHATBOT APP
# ===========================================================================
# A simple web interface for our customer support chatbot
# Run with: streamlit run app.py
# ===========================================================================

import streamlit as st
from langchain.vectorstores import FAISS
from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.llms import Ollama
from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import PromptTemplate

# ---------------------------------------------------------------------------
# PAGE CONFIGURATION
# ---------------------------------------------------------------------------
st.set_page_config(
    page_title="Customer Support Chatbot",
    page_icon="🛍️",
    layout="centered"
)

st.title("🛍️ Everstorm Outfitters Support")
st.caption("Ask me about shipping, returns, payments, and more!")

# ---------------------------------------------------------------------------
# LOAD RAG COMPONENTS (cached for performance)
# ---------------------------------------------------------------------------
@st.cache_resource
def load_chain():
    """Load and cache the RAG chain components."""
    # Load embeddings model
    embedder = SentenceTransformerEmbeddings(model_name="thenlper/gte-small")
    
    # Load saved FAISS index
    vectordb = FAISS.load_local(
        "faiss_index", 
        embedder,
        allow_dangerous_deserialization=True  # Index metadata is a pickle file; only load an index you created yourself
    )
    retriever = vectordb.as_retriever(search_kwargs={"k": 8})
    
    # Initialize LLM
    llm = Ollama(model="gemma3:1b", temperature=0.1)
    
    # System prompt
    SYSTEM_TEMPLATE = """
    You are a helpful Customer Support Chatbot for Everstorm Outfitters.
    
    Rules:
    1. Use ONLY the provided context to answer.
    2. If unsure, say "I don\'t know based on the documents."
    3. Be concise and helpful.
    
    Context: {context}
    
    Question: {question}
    """
    
    prompt = PromptTemplate(
        template=SYSTEM_TEMPLATE,
        input_variables=["context", "question"]
    )
    
    # Build chain
    chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        return_source_documents=True,
        combine_docs_chain_kwargs={"prompt": prompt}
    )
    return chain

# Load the chain
chain = load_chain()

# ---------------------------------------------------------------------------
# CHAT INTERFACE
# ---------------------------------------------------------------------------

# Initialize session state for chat history
if "messages" not in st.session_state:
    st.session_state.messages = []
if "chat_history" not in st.session_state:
    st.session_state.chat_history = []

# Display chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Chat input
if prompt := st.chat_input("Ask a question about our policies..."):
    # Add user message to chat
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    
    # Generate response
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            result = chain.invoke({
                "question": prompt,
                "chat_history": st.session_state.chat_history
            })
            response = result["answer"]
            st.markdown(response)
    
    # Update history
    st.session_state.messages.append({"role": "assistant", "content": response})
    st.session_state.chat_history.append((prompt, response))

# Sidebar with info
with st.sidebar:
    st.header("About")
    st.write("This chatbot answers questions using RAG.")
    st.write("**Powered by:**")
    st.write("- 🦜 LangChain")
    st.write("- 📊 FAISS")
    st.write("- 🤖 Gemma 3 (via Ollama)")
    
    if st.button("Clear Chat"):
        st.session_state.messages = []
        st.session_state.chat_history = []
        st.rerun()
'''

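# Write the Streamlit app to a file
# encoding="utf-8" is explicit because the app source contains emoji, which the
# platform default encoding (e.g. cp1252 on Windows) may not be able to encode
with open("app.py", "w", encoding="utf-8") as f:
    f.write(streamlit_code)

print("✅ Streamlit app saved to app.py")
print("\n🚀 To run the chatbot UI:")
print("   streamlit run app.py")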

✅ Streamlit app saved to app.py

🚀 To run the chatbot UI:
   streamlit run app.py
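
Because `app.py` loads the vector store from the `faiss_index` folder, it can help to confirm the index is on disk before launching the UI. The check below is a small sketch, not part of the original notebook. It assumes the index was saved with `FAISS.save_local("faiss_index")` using the default index name, which writes `index.faiss` and `index.pkl`; `index_ready` is a hypothetical helper name.

```python
from pathlib import Path

def index_ready(index_dir="faiss_index"):
    """Return True when both files written by FAISS.save_local (default index name) are present."""
    folder = Path(index_dir)
    return all((folder / name).is_file() for name in ("index.faiss", "index.pkl"))

if index_ready():
    print("Index found: run `streamlit run app.py`")
else:
    print("No index in ./faiss_index: save the vector store before launching the app")
```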


## 🎉 Congratulations!

You've built a complete **RAG-based customer support chatbot**!

### What You Learned:

| Concept | What It Does | Tool Used |
|---------|--------------|----------|
| **Document Loading** | Extracts text from PDFs/URLs | PyPDFLoader, UnstructuredURLLoader |
| **Chunking** | Splits text into searchable pieces | RecursiveCharacterTextSplitter |
| **Embeddings** | Converts text to semantic vectors | SentenceTransformerEmbeddings |
| **Vector Store** | Stores & searches embeddings | FAISS |
| **LLM** | Generates natural language answers | Ollama (Gemma 3) |
| **RAG Chain** | Connects retrieval to generation | ConversationalRetrievalChain |

### Next Steps:
1. Try different embedding models
2. Experiment with chunk sizes
3. Test different LLMs (Llama, Mistral)
4. Add more documents to your knowledge base
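
For step 2, you can get a rough feel for how chunk size and overlap change the number of chunks without rerunning the whole pipeline. The sketch below uses a deliberately simplified fixed-width character splitter. It is not `RecursiveCharacterTextSplitter`, which also tries to break on paragraph and sentence boundaries, and both `chunk_text` and the sample text are hypothetical.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Hypothetical sample text, repeated to give the splitter something to work on
sample = "Returns are accepted within the policy window. " * 20

for size, overlap in [(100, 0), (100, 20), (300, 30)]:
    chunks = chunk_text(sample, size, overlap)
    print(f"chunk_size={size:>3} overlap={overlap:>2} -> {len(chunks)} chunks")
```

Smaller chunks give the retriever more precise passages but less surrounding context per passage. More overlap reduces the chance of splitting an answer across two chunks, at the cost of a larger index.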