# Debug Drill 09: The Hallucinating Bot

**Symptom:** Your colleague deployed a customer support chatbot. A customer asked about the return policy, and the bot confidently stated "You have 90 days to return any item for a full refund." The actual policy is 30 days for unopened items and 14 days for opened ones. The bot hallucinated: it stated a fabricated fact as if it were true.

**Your task:** Identify hallucination patterns, design detection strategies, and write a postmortem.

**Time:** 15 minutes

In [None]:
import pandas as pd
import numpy as np

In [None]:
# Simulated chatbot responses and ground truth
# In real life, you'd have actual LLM outputs

test_cases = [
    {
        "question": "What is your return policy?",
        "bot_response": "You have 90 days to return any item for a full refund, no questions asked.",
        "ground_truth": "30-day return window for unopened items. Opened items: 14 days.",
        "is_hallucination": True
    },
    {
        "question": "How do I cancel my subscription?",
        "bot_response": "Go to Account Settings > Subscription > Cancel. Your subscription ends at the billing cycle.",
        "ground_truth": "Account Settings > Subscription > Cancel. Ends at billing cycle.",
        "is_hallucination": False
    },
    {
        "question": "What payment methods do you accept?",
        "bot_response": "We accept Visa, Mastercard, Amex, PayPal, Apple Pay, Google Pay, Bitcoin, and wire transfers.",
        "ground_truth": "Visa, Mastercard, Amex, PayPal, Apple Pay.",
        "is_hallucination": True  # Added Google Pay, Bitcoin, and wire transfers
    },
    {
        "question": "Do you ship internationally?",
        "bot_response": "Yes! We ship to over 150 countries with free shipping on orders over $100.",
        "ground_truth": "We ship to US and Canada only.",
        "is_hallucination": True
    },
    {
        "question": "What are your support hours?",
        "bot_response": "Our support team is available Monday-Friday, 9am-6pm EST.",
        "ground_truth": "Monday-Friday, 9am-6pm EST.",
        "is_hallucination": False
    }
]

df = pd.DataFrame(test_cases)

In [None]:
# ===== COLLEAGUE'S CODE (NO HALLUCINATION DETECTION) =====

def get_bot_response(question):
    """Simulated bot - just returns the response without checking"""
    for case in test_cases:
        if case['question'] == question:
            return case['bot_response']
    return "I don't know."

# No verification, no confidence scoring, no grounding
print("Bot response (no safeguards):")
print(get_bot_response("What is your return policy?"))

## Your Investigation

**Q1:** Look at the hallucinations in the test cases. What patterns do you see?

In [None]:
# TODO: Identify hallucination patterns
print("=== HALLUCINATIONS ===")
for _, row in df[df['is_hallucination']].iterrows():
    print(f"\nQ: {row['question']}")
    print(f"Bot said: {row['bot_response']}")
    print(f"Truth: {row['ground_truth']}")

# TODO: What patterns do you notice?
# Pattern 1: 
# Pattern 2: 
# Pattern 3: 

**Q2:** Why do LLMs hallucinate? List at least 2 reasons.

In [None]:
# TODO: Your explanation
# Reason 1: 
# Reason 2: 

## Fix the Bug

**Q3:** Design a hallucination detection/prevention system.

In [None]:
# Knowledge base (ground truth)
KNOWLEDGE_BASE = {
    "return_policy": "30-day return window for unopened items. Opened items: 14 days.",
    "cancel_subscription": "Account Settings > Subscription > Cancel. Ends at billing cycle.",
    "payment_methods": "Visa, Mastercard, Amex, PayPal, Apple Pay.",
    "shipping": "We ship to US and Canada only.",
    "support_hours": "Monday-Friday, 9am-6pm EST."
}

def detect_hallucination_simple(bot_response, topic):
    """
    Simple fact-checking against knowledge base.
    Returns (is_safe, confidence, issues)
    """
    issues = []
    
    if topic not in KNOWLEDGE_BASE:
        return False, 0.0, ["Topic not in knowledge base"]
    
    ground_truth = KNOWLEDGE_BASE[topic].lower()
    response = bot_response.lower()
    
    # Check for specific number mismatches
    import re
    response_numbers = set(re.findall(r'\d+', response))
    truth_numbers = set(re.findall(r'\d+', ground_truth))
    
    wrong_numbers = response_numbers - truth_numbers
    if wrong_numbers:
        issues.append(f"Suspicious numbers: {wrong_numbers}")
    
    # Check for absolute terms (often hallucinated). Match whole words only,
    # so 'all' does not fire on words such as 'install'.
    red_flags = ['all', 'every', 'always', 'never', 'unlimited', 'free', 'guaranteed']
    for flag in red_flags:
        pattern = rf'\b{flag}\b'
        if re.search(pattern, response) and not re.search(pattern, ground_truth):
            issues.append(f"Added absolute term: '{flag}'")
    
    confidence = 1.0 - (len(issues) * 0.3)
    confidence = max(0.0, min(1.0, confidence))
    
    is_safe = len(issues) == 0
    return is_safe, confidence, issues
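
The number and absolute-term checks above do not flag the payment-methods test case, because that response only adds extra list items. The sketch below shows one possible way to catch such additions by comparing comma-separated items. The helper names `split_items` and `find_unsupported_items` are hypothetical, and the approach assumes both texts are simple comma-separated lists.

```python
import re

def split_items(text):
    """Split a comma-separated sentence into lowercase items without punctuation."""
    items = []
    for chunk in text.lower().split(","):
        chunk = re.sub(r"^\s*and\s+", "", chunk)  # drop a leading "and"
        chunk = chunk.strip(" .!?")
        if chunk:
            items.append(chunk)
    return items

def find_unsupported_items(bot_response, ground_truth):
    """Return response items that contain no item from the ground truth."""
    truth_items = split_items(ground_truth)
    unsupported = []
    for item in split_items(bot_response):
        # An item is supported if any ground-truth item appears in it as whole words
        supported = any(
            re.search(rf"\b{re.escape(truth)}\b", item) for truth in truth_items
        )
        if not supported:
            unsupported.append(item)
    return unsupported

extra = find_unsupported_items(
    "We accept Visa, Mastercard, Amex, PayPal, Apple Pay, Google Pay, Bitcoin, and wire transfers.",
    "Visa, Mastercard, Amex, PayPal, Apple Pay.",
)
print(extra)  # ['google pay', 'bitcoin', 'wire transfers']
```

A non-empty result could be appended to `issues` inside `detect_hallucination_simple`. This is string matching only, so paraphrased items (for example "American Express" for "Amex") would be reported as unsupported.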

In [None]:
# TODO: Build a safer response function

def get_safe_response(question, topic):
    """Get response with hallucination checking"""
    
    # Step 1: Get bot response
    bot_response = get_bot_response(question)
    
    # Step 2: Check for hallucination
    is_safe, confidence, issues = detect_hallucination_simple(bot_response, topic)
    
    # Step 3: Decide what to return
    if is_safe and confidence > 0.7:
        return {
            "response": bot_response,
            "confidence": confidence,
            "source": "bot"
        }
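    elif topic in KNOWLEDGE_BASE:
        # Fall back to the verified knowledge base entry
        return {
            "response": KNOWLEDGE_BASE[topic],
            "confidence": 1.0,
            "source": "knowledge_base",
            "issues_detected": issues
        }
    else:
        # Unknown topic: nothing to verify against, so hand off to a human
        return {
            "response": "I'm not sure. Let me connect you with a human agent.",
            "confidence": 0.0,
            "source": "human_handoff",
            "issues_detected": issues
        }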

# Test the safe response system
print("=== SAFE RESPONSE SYSTEM ===")

test_queries = [
    ("What is your return policy?", "return_policy"),
    ("How do I cancel my subscription?", "cancel_subscription"),
    ("Do you ship internationally?", "shipping")
]

for question, topic in test_queries:
    result = get_safe_response(question, topic)
    print(f"\nQ: {question}")
    print(f"Response: {result['response']}")
    print(f"Source: {result['source']}, Confidence: {result['confidence']:.1%}")
    if 'issues_detected' in result:
        print(f"Issues: {result['issues_detected']}")
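
The fallback above catches bad answers after they are generated. The colleague's code also had no grounding, so a complementary prevention step is to put the relevant policy text into the prompt and tell the model to answer only from it. The sketch below only builds such a prompt and does not call any model. The function name `build_grounded_prompt` and the exact instruction wording are assumptions, and grounding reduces hallucinations rather than eliminating them, so the post-generation check is still worth keeping.

```python
# Handoff wording reused from the fallback response; treat the constant name as hypothetical.
HANDOFF_MESSAGE = "I'm not sure. Let me connect you with a human agent."

def build_grounded_prompt(question, policy_text):
    """Build a prompt that restricts the model to the supplied policy text."""
    return (
        "Answer the customer's question using ONLY the policy text below.\n"
        "Do not add numbers, options, or promises that are not in it.\n"
        f"If the policy text does not answer the question, reply exactly: {HANDOFF_MESSAGE}\n\n"
        f"Policy text:\n{policy_text}\n\n"
        f"Customer question: {question}\n"
    )

prompt = build_grounded_prompt(
    "What is your return policy?",
    "30-day return window for unopened items. Opened items: 14 days.",
)
print(prompt)
```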

## Self-Check

In [None]:
# Verify the return-policy hallucination is caught
return_result = get_safe_response("What is your return policy?", "return_policy")
assert return_result['source'] == 'knowledge_base', "Should fall back to KB for hallucinated response"
assert '30' in return_result['response'], "Should return correct policy"

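cancel_result = get_safe_response("How do I cancel my subscription?", "cancel_subscription")
# This one was correct, so either source is fine; only check the content
assert 'Cancel' in cancel_result['response'], "Should still explain how to cancel"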

print("PASS: Hallucination detection working!")
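
The self-check above covers a single hallucination. To see how many a detector misses, it can be scored against all labeled cases. The sketch below is a minimal, self-contained version: `evaluate_detector` and the toy `number_mismatch` detector are hypothetical names, and the three cases are shortened stand-ins for the notebook's `test_cases`.

```python
import re

def evaluate_detector(cases, flags_hallucination):
    """Score a detector against labeled cases.

    flags_hallucination(case) should return True when the detector
    flags the case's bot_response as a hallucination.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for case in cases:
        flagged = flags_hallucination(case)
        actual = case["is_hallucination"]
        if flagged and actual:
            counts["tp"] += 1
        elif flagged:
            counts["fp"] += 1
        elif actual:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    flagged_total = counts["tp"] + counts["fp"]
    actual_total = counts["tp"] + counts["fn"]
    precision = counts["tp"] / flagged_total if flagged_total else 0.0
    recall = counts["tp"] / actual_total if actual_total else 0.0
    return counts, precision, recall

def number_mismatch(case):
    """Toy detector: flag responses containing numbers absent from the truth."""
    response_numbers = set(re.findall(r"\d+", case["bot_response"]))
    truth_numbers = set(re.findall(r"\d+", case["ground_truth"]))
    return bool(response_numbers - truth_numbers)

labeled_cases = [
    {"bot_response": "You have 90 days to return any item.",
     "ground_truth": "30-day return window.", "is_hallucination": True},
    {"bot_response": "We accept Visa and Bitcoin.",
     "ground_truth": "Visa.", "is_hallucination": True},
    {"bot_response": "Support is open 9am-6pm EST.",
     "ground_truth": "Monday-Friday, 9am-6pm EST.", "is_hallucination": False},
]

counts, precision, recall = evaluate_detector(labeled_cases, number_mismatch)
print(counts, f"precision={precision:.2f}", f"recall={recall:.2f}")
```

Here the toy detector reaches a recall of 0.50 because it misses the added payment method. To score `detect_hallucination_simple` the same way, each test case would need a topic key so the right knowledge base entry can be looked up; that key is not in the original data.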

## Postmortem

Write 3 bullets:
1. **Root cause:** 
2. **How we detected it:** 
3. **Prevention for next time:** 