# Wood Pecker Retail Chatbot
This notebook walks through building a minimal-yet-practical chatbot interface for a small retail store. The focus is on clarity, lightweight dependencies, and leaving obvious hooks for future ML-powered upgrades.

## Learning Goals and Scope
- stay manual for now: fixed responses for 15-20 common retail questions (orders, sales, returns, hours, locations, shipping, loyalty, etc.)
- surface explainable logic so students can see how answers are picked and where ML would slot in later
- keep the UI lightweight by using Gradio only (pure Python, no Node toolchain) while still giving clear input/output coloration
- call out optional improvements (RAG, intent classifiers, analytics) to encourage experimentation once the basics feel comfortable

## Design Outline
1. Capture a small FAQ-style knowledge base with 18 everyday customer questions covering orders, shipping, hours, promos, gift cards, memberships, etc.
2. Route each user prompt to a simple keyword-overlap scorer so the logic stays transparent; default with a friendly fallback when nothing matches.
3. Build a Gradio Blocks UI where the input textbox and the chatbot transcript use contrasting colors for the user vs Wood Pecker messages.
4. Provide helper hooks (`set_ml_handler`) so a future ML model or API can override the rule-based answerer without reworking the UI.
5. Document improvement ideas (semantic search, analytics, context windows) for when students are ready to go beyond the manual baseline.

In [None]:
# Minimal dependency check: Gradio gives us a quick UI without extra frontend tooling.
import importlib  # Used to check if a library is installed
import subprocess  # Used to run shell commands (like pip install)
import sys  # Used to access system-specific parameters (like the python executable path)


def ensure(package: str) -> None:
    """Install `package` via pip if it is missing."""
    if importlib.util.find_spec(package) is None:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])


ensure("gradio")
ensure("transformers")
ensure("nltk")
ensure("spacy")

import gradio as gr  # Gradio: A library for creating machine learning demos and web UIs quickly
from transformers import pipeline  # Transformers: Provides state-of-the-art pre-trained models for NLP tasks

GREETING_MESSAGE = "Hello, how can I help you today?"

  from .autonotebook import tqdm as notebook_tqdm


In [6]:
import nltk  # NLTK (Natural Language Toolkit): A suite of libraries for symbolic and statistical NLP
import spacy  # Spacy: An industrial-strength NLP library for advanced text processing

# Download NLTK data
try:
    nltk.data.find('tokenizers/punkt_tab')
except LookupError:
    print("Downloading NLTK 'punkt_tab'...")
    nltk.download('punkt_tab')

try:
    nltk.data.find('taggers/averaged_perceptron_tagger_eng')
except LookupError:
    print("Downloading NLTK 'averaged_perceptron_tagger_eng'...")
    nltk.download('averaged_perceptron_tagger_eng')

# Download Spacy model if needed
try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    print("Downloading Spacy model 'en_core_web_sm'...")
    subprocess.check_call([sys.executable, "-m", "spacy", "download", "en_core_web_sm"])
    nlp = spacy.load("en_core_web_sm")

print("NLP components loaded successfully!")

NLP components loaded successfully!


In [None]:
from datetime import datetime  # Used for handling date and time objects (e.g., timestamps)

faq_items = [
    {
        "question": "What are your store hours?",
        "answer": "We are open 10 AM - 8 PM Monday through Saturday, and 11 AM - 6 PM on Sundays.",
        "keywords": ["hours", "open", "close", "time"],
    },
    {
        "question": "Where is the store located?",
        "answer": "Wood Pecker is inside the Riverwalk Mall, first floor near the fountain.",
        "keywords": ["where", "location", "address", "store"],
    },
    {
        "question": "Do you offer curbside pickup?",
        "answer": "Yes, select curbside pickup at checkout and call us when you arrive at bay C.",
        "keywords": ["curbside", "pickup", "collect"],
    },
    {
        "question": "How can I check my order status?",
        "answer": "Go to woodpecker.com/orders and enter your email plus order number to see live status.",
        "keywords": ["order", "status", "track"],
    },
    {
        "question": "What are the shipping options?",
        "answer": "Standard (5-7 days), Expedited (2-3 days), and Overnight are available at checkout.",
        "keywords": ["shipping", "delivery", "ship"],
    },
    {
        "question": "How long does shipping take?",
        "answer": "Standard shipping arrives within 5-7 business days; peak holidays may add a day.",
        "keywords": ["how long", "shipping", "arrive", "time"],
    },
    {
        "question": "Can I change my shipping address after ordering?",
        "answer": "We can update the address within 30 minutes of purchase—contact support via chat or phone.",
        "keywords": ["change", "address", "order"],
    },
    {
        "question": "Do you have any sales right now?",
        "answer": "Weekly specials post every Friday on our homepage plus 15% off clearance in-store.",
        "keywords": ["sale", "discount", "deal", "promo"],
    },
    {
        "question": "How do returns work?",
        "answer": "Returns are free within 30 days with receipt; bring the item or ship it back with the prepaid label.",
        "keywords": ["return", "policy", "refund"],
    },
    {
        "question": "Can I exchange a gift?",
        "answer": "Absolutely—bring the gift receipt or sender name within 60 days for an exchange or store credit.",
        "keywords": ["exchange", "gift"],
    },
    {
        "question": "Do you sell gift cards?",
        "answer": "Gift cards are available from $10 to $500 both online and at the register.",
        "keywords": ["gift", "card", "voucher"],
    },
    {
        "question": "How do I apply a promo code?",
        "answer": "Enter the code in the Promo Code box at checkout; it will show the discount instantly.",
        "keywords": ["promo", "code", "coupon"],
    },
    {
        "question": "What sizes do you carry?",
        "answer": "Most apparel ranges from XS-3XL, and footwear spans sizes 5-13 for adults.",
        "keywords": ["size", "sizing", "fit"],
    },
    {
        "question": "Is there a loyalty program?",
        "answer": "Yes—Woodland Rewards gives 1 point per dollar and $5 back for every 100 points.",
        "keywords": ["loyalty", "membership", "rewards"],
    },
    {
        "question": "Can I talk to a human agent?",
        "answer": "Sure thing. Call 800-555-7425 or use the Help > Live Chat option for a teammate.",
        "keywords": ["agent", "human", "representative", "contact"],
    },
    {
        "question": "Do you offer repairs or alterations?",
        "answer": "Basic hemming and zipper repairs are available for $15—drop items at the service desk.",
        "keywords": ["repair", "alteration", "tailor"],
    },
    {
        "question": "How do I cancel an order?",
        "answer": "You can cancel within 30 minutes via your order page or by calling customer care.",
        "keywords": ["cancel", "order"],
    },
    {
        "question": "Are holiday items in stock?",
        "answer": "Seasonal decor is refreshed every Thursday; check the Holiday tab for live inventory badges.",
        "keywords": ["holiday", "seasonal", "stock"],
    },
    {
        "question": "Do you price match?",
        "answer": "Yes, we match major retailers within 14 days of purchase—bring the ad or link.",
        "keywords": ["price match", "match", "price"],
    },
    {
        "question": "What payment methods are accepted?",
        "answer": "We accept major cards, PayPal, Apple Pay, Google Pay, and contactless in-store.",
        "keywords": ["payment", "pay", "methods"],
    },
    {
        "question": "How do I get notified about new products?",
        "answer": "Subscribe to our newsletter or enable push alerts in the Wood Pecker mobile app.",
        "keywords": ["notify", "new", "products", "newsletter"],
    },
    {
        "question": "Can I buy online and return in store?",
        "answer": "Yes, bring the packing slip or email receipt to any Wood Pecker location.",
        "keywords": ["buy online", "return", "store"],
    },
    {
        "question": "Is same-day delivery available?",
        "answer": "Same-day courier delivery is available within 15 miles for $12 on orders placed before 2 PM.",
        "keywords": ["same day", "delivery", "courier"],
    },
    {
        "question": "Do you have eco-friendly packaging?",
        "answer": "We default to recyclable packaging and offer a $1 rebate when you opt for minimal packing.",
        "keywords": ["eco", "sustainable", "packaging"],
    },
    {
        "question": "Can I reserve items to try later?",
        "answer": "Use the Reserve & Try feature online to hold items for 24 hours at your chosen store.",
        "keywords": ["reserve", "hold", "try"],
    },
    {
        "question": "How do I contact support after hours?",
        "answer": "Leave us a voicemail at 800-555-7425 or email support@woodpecker.com—we reply next morning.",
        "keywords": ["support", "after hours", "contact"],
    },
    {
        "question": "Where can I see my loyalty balance?",
        "answer": "Log in to your Wood Pecker account and look for the Rewards tab for real-time points.",
        "keywords": ["loyalty", "points", "balance"],
    },
    {
        "question": "What is Wood Pecker's return address?",
        "answer": "Ship returns to Wood Pecker Returns, 422 Orchard Lane, Columbus, OH 43215.",
        "keywords": ["return", "address", "ship back"],
    },
    {
        "question": "Do you assemble furniture?",
        "answer": "We partner with HandyCo for in-home assembly starting at $49; schedule at checkout.",
        "keywords": ["assemble", "assembly", "furniture"],
    },
]

## Component Explanations
- `faq_items`: manual knowledge base with 26 everyday retail prompts so students can inspect every rule.
- `SimpleRetailBrain`: keyword overlap scoring keeps reasoning auditable; `set_ml_handler` is the placeholder for future models.
- `sentiment_analyzer`: Hugging Face pipeline that powers the optional transformers mode so students can compare manual vs ML.
- `format_answer`: wraps each response with timing + provenance so testers know whether the FAQ, fallback, or ML hook answered.
- `respond`: glue code between Gradio and both logic paths—easy to extend with analytics or context windows.
- Gradio Blocks UI: fastest way to get a colored chat interface without custom JS; CSS snippets apply different palettes to the shopper vs Wood Pecker.

### NER Reference Snapshot
- Latest spaCy `en_core_web_sm` (v3.x) is trained to recognize the standard OntoNotes labels: PERSON, NORP, FAC, ORG, GPE, LOC, PRODUCT, EVENT, WORK_OF_ART, LAW, LANGUAGE, DATE, TIME, PERCENT, MONEY, QUANTITY, ORDINAL, and CARDINAL.
- `PRODUCT` and `EVENT` are the key buckets for hardware/software (e.g., "Galaxy S24", "software update") and happenings (launches, outages, conferences).
- Small models can miss lowercase nouns like "phone" or "update", so we pair NER with lightweight noun-chunk heuristics to keep device/event mentions visible for learners.

In [17]:
from typing import Callable, Dict, List, Optional, Tuple  # Typing: Provides support for type hints to improve code readability and debugging


class SimpleRetailBrain:
    """Transparent keyword-based router with an optional ML override hook."""

    def __init__(self, faqs: List[Dict[str, List[str]]]):
        self.faqs = faqs
        self.ml_handler: Optional[Callable[[str], Optional[str]]] = None

    def set_ml_handler(self, handler: Callable[[str], Optional[str]]) -> None:
        """Allow future ML models (LLMs, vector search, etc.) to override answers."""
        self.ml_handler = handler

    def _score(self, normalized_prompt: str, keywords: List[str]) -> int:
        return sum(1 for keyword in keywords if keyword in normalized_prompt)

    def answer(self, prompt: str) -> Tuple[str, str]:
        normalized = prompt.lower().strip()
        if self.ml_handler:  # plug in semantic search or an LLM later
            ml_answer = self.ml_handler(prompt)
            if ml_answer:
                return ml_answer, "ml_handler"

        best_answer = None
        best_question = None
        best_score = 0
        for item in self.faqs:
            score = self._score(normalized, item["keywords"])
            if score > best_score:
                best_score = score
                best_answer = item["answer"]
                best_question = item["question"]

        if best_answer is None:
            fallback = (
                "I do not have that in my notes yet. Could you try rephrasing or "
                "contact our team at support@woodpecker.com?"
            )
            return fallback, "fallback"

        return best_answer, best_question or "faq"


brain = SimpleRetailBrain(faq_items)

In [18]:
import nltk
import spacy
from collections import defaultdict
from transformers import pipeline

# --- 1. Setup & Initialization ---
# Load models once at the start. Rely on spaCy's latest `en_core_web_sm` to surface all OntoNotes labels.

sentiment_analyzer = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    from spacy.cli import download
    download("en_core_web_sm")
    nlp = spacy.load("en_core_web_sm")

nltk.download("punkt_tab", quiet=True)
nltk.download("averaged_perceptron_tagger_eng", quiet=True)

# --- 2. Helper Functions ---

def format_answer(answer: str, source: str) -> str:
    """Adds a timestamp and source label to the bot's reply."""
    timestamp = datetime.now().strftime("%H:%M:%S")
    return f"[{timestamp}] ({source}) {answer}"


def summarize_entities(doc: spacy.tokens.Doc) -> Dict[str, List[str]]:
    """Group detected entities by their spaCy labels (PRODUCT, EVENT, etc.)."""
    summary: Dict[str, List[str]] = defaultdict(list)
    for ent in doc.ents:
        summary[ent.label_].append(ent.text)
    # Deduplicate while preserving order for readability
    return {label: sorted(set(values), key=values.index) for label, values in summary.items()}


def analyze_text_simple(text: str) -> str:
    """
    Runs a simple NLP pipeline for educational purposes.
    Steps: Tokenize -> POS Tag -> NER -> Sentiment
    """
    if not text:
        return "Please enter text to analyze."

    # 1. Tokenization (NLTK)
    tokens = nltk.word_tokenize(text)

    # 2. POS Tagging (NLTK)
    pos_map = {"NN": "Noun", "VB": "Verb", "JJ": "Adj", "DT": "Det"}
    tagged = nltk.pos_tag(tokens)
    readable_pos = [f"{word} ({pos_map.get(tag[:2], tag)})" for word, tag in tagged]

    # 3. NER & Key Phrases (spaCy)
    doc = nlp(text)
    entities_by_label = summarize_entities(doc)
    noun_chunks = [chunk.text for chunk in doc.noun_chunks]

    # 4. Sentiment (Transformers)
    sentiment = sentiment_analyzer(text)[0]

    return (
        f"### Analysis of: '{text}'\n\n"
        f"**1. Tokens:** {tokens}\n\n"
        f"**2. POS Tags:** {', '.join(readable_pos)}\n\n"
        f"**3. Entities (NER grouped by label):** {entities_by_label or 'None'}\n"
        f"   *Key Phrases:* {noun_chunks}\n\n"
        f"**4. Sentiment:** {sentiment['label']} ({sentiment['score']:.2f})"
    )

# --- 3. Chatbot Logic ---

def respond(message: str, history: List[Dict[str, str]], mode: str):
    """Main function called by Gradio when user sends a message."""
    history = history or [{"role": "assistant", "content": GREETING_MESSAGE}]

    if mode == "Sentiment pipeline":
        result = sentiment_analyzer(message)[0]
        reply = f"Sentiment: {result['label']} (Score: {result['score']:.2f})"
        source = "sentiment-model"
    else:
        reply, source = brain.answer(message)

    history.append({"role": "user", "content": message})
    history.append({"role": "assistant", "content": format_answer(reply, source)})
    return history, ""


def reset_chat():
    return [{"role": "assistant", "content": GREETING_MESSAGE}], ""

# --- 4. UI Configuration ---
CUSTOM_CSS = """
#chatbot .message.user { background-color: #fff4cf; color: #663c00; }
#chatbot .message.bot { background-color: #dceeff; color: #003355; }
"""
STYLE_TAG = f"<style>{CUSTOM_CSS}</style>"


Device set to use cpu



--- Analyzing: 'My Samsung phone battery drains too fast after the last software update' ---
1. Tokenization:
   ['My', 'Samsung', 'phone', 'battery', 'drains', 'too', 'fast', 'after', 'the', 'last', 'software', 'update']
2. POS Tagging:
   [('My', 'PRP$'), ('Samsung', 'NNP'), ('phone', 'NN'), ('battery', 'NN'), ('drains', 'VBZ'), ('too', 'RB'), ('fast', 'RB'), ('after', 'IN'), ('the', 'DT'), ('last', 'JJ'), ('software', 'NN'), ('update', 'NN')]
3a. NER:
   [('Samsung', 'ORG')]
3b. Key Phrases (Noun Chunks):
   ['My Samsung phone battery', 'the last software update']
4. Sentiment Analysis:
   Label: NEGATIVE, Score: 0.9993


In [None]:
with gr.Blocks() as demo:
    gr.HTML(STYLE_TAG)
    gr.Markdown("""
**How to use this sandbox**
1. **Chat**: Ask Wood Pecker about orders, returns, promos, shipping, hours, etc.
2. **Modes**: Choose **Manual data (FAQs)** for rule-based answers or **Sentiment pipeline** for the transformers option.
3. **NLP Practice**: Use the **User feedback** box on the right to test the integrated NLP pipeline (Tokenization -> POS -> NER -> Sentiment). Try a sentence like: *"Serena Williams won Wimbledon in 2016."*
""")
    with gr.Row():
        with gr.Column(scale=2):
            gr.Markdown("### Chat with Wood Pecker")
            chatbot = gr.Chatbot(
                label="Wood Pecker",
                height=420,
                elem_id="chatbot",
                value=[{"role": "assistant", "content": GREETING_MESSAGE}],
            )
            with gr.Row():
                msg = gr.Textbox(
                    label="Ask a question",
                    placeholder="e.g., How fast is shipping?",
                    elem_classes=["input-box"],
                    scale=3,
                )
                mode = gr.Radio(
                    choices=[MODE_MANUAL, MODE_SENTIMENT],
                    value=MODE_MANUAL,
                    label="Answer mode",
                    info="Option 1 = manual data, Option 2 = transformers sentiment pipeline.",
                    scale=1,
                )
            msg.submit(respond, inputs=[msg, chatbot, mode], outputs=[chatbot, msg])
            clear = gr.Button("Clear conversation")
            clear.click(reset_chat, None, [chatbot, msg], queue=False)
        with gr.Column(scale=1):
            gr.Markdown("### User feedback / NLP Lab")
            feedback_input = gr.Textbox(
                label="Share your experience (or test text)",
                placeholder="Type any text here to analyze...",
                lines=4,
                elem_classes=["input-box"],
            )
            feedback_button = gr.Button("Analyze NLP Pipeline")
            feedback_result = gr.Markdown(value="Awaiting input...")
            feedback_button.click(analyze_feedback, inputs=feedback_input, outputs=feedback_result)
    gr.Markdown("""
_Tip_: to test a future ML model, pass a callable to `brain.set_ml_handler` that takes the prompt and returns a custom string. Leave it `None` to stay purely rule-based.
""")

# Launch on any available port to avoid conflicts when rerunning in notebooks.
demo.launch(server_name="0.0.0.0")

ERROR:    [Errno 10048] error while attempting to bind on address ('0.0.0.0', 7860): [winerror 10048] only one usage of each socket address (protocol/network address/port) is normally permitted
ERROR:    [Errno 10048] error while attempting to bind on address ('0.0.0.0', 7861): [winerror 10048] only one usage of each socket address (protocol/network address/port) is normally permitted
ERROR:    [Errno 10048] error while attempting to bind on address ('0.0.0.0', 7861): [winerror 10048] only one usage of each socket address (protocol/network address/port) is normally permitted


* Running on local URL:  http://0.0.0.0:7862
* To create a public link, set `share=True` in `launch()`.
* To create a public link, set `share=True` in `launch()`.





--- Analyzing: 'My phone battery drains too fast after the last software update' ---
1. Tokenization:
   ['My', 'phone', 'battery', 'drains', 'too', 'fast', 'after', 'the', 'last', 'software', 'update']
2. POS Tagging:
   [('My', 'PRP$'), ('phone', 'NN'), ('battery', 'NN'), ('drains', 'VBZ'), ('too', 'RB'), ('fast', 'RB'), ('after', 'IN'), ('the', 'DT'), ('last', 'JJ'), ('software', 'NN'), ('update', 'NN')]
3a. NER:
   None detected
3b. Key Phrases (Noun Chunks):
   ['My phone battery', 'the last software update']
4. Sentiment Analysis:
   Label: NEGATIVE, Score: 0.9995

--- Analyzing: 'My Samsung phone battery drains too fast after the last software update' ---
1. Tokenization:
   ['My', 'Samsung', 'phone', 'battery', 'drains', 'too', 'fast', 'after', 'the', 'last', 'software', 'update']
2. POS Tagging:
   [('My', 'PRP$'), ('Samsung', 'NNP'), ('phone', 'NN'), ('battery', 'NN'), ('drains', 'VBZ'), ('too', 'RB'), ('fast', 'RB'), ('after', 'IN'), ('the', 'DT'), ('last', 'JJ'), ('softwar

## Improvement Ideas
- **Semantic matching**: plug sentence-transformer embeddings into `brain.set_ml_handler` to rank answers by vector similarity instead of keywords.
- **Analytics hooks**: log unanswered questions to a CSV so the retail team knows what info to add next.
- **Order API integration**: swap the placeholder shipping/order status responses with live API calls once credentials are available.
- **Persona tuning**: expand the `format_answer` helper with tone controls, emojis, or markdown templates for more personality.
- **Context memory**: keep track of previous answers in a state object if you want Wood Pecker to handle follow-up questions like “What about curbside?”

In [24]:
# Safety net: close any previously running Gradio demo before relaunching.
try:
    demo.close()
except NameError:
    pass