# ANSWER KEY: Debug Drill 09 - Hallucination Detection

**Bug:** LLM chatbot confidently states incorrect information ("90-day return policy" when the actual policy is 30 days).

**Key Lesson:** LLMs hallucinate. Ground responses in a knowledge base and add verification.

In [None]:
import re
from typing import Dict, Tuple, List

## The Bug (Colleague's Code)

In [None]:
# ===== BUGGY CODE =====
# Colleague's chatbot with NO fact-checking

def buggy_chatbot(question: str) -> str:
    """Simulates an LLM response without verification."""
    # These are simulated LLM responses (which may hallucinate)
    responses = {
        "return policy": "Our return policy allows returns within 90 days of purchase.",  # WRONG! Actual is 30 days
        "shipping time": "Orders typically arrive within 1-2 business days.",  # WRONG! Actual is 3-5 days
        "price match": "We offer a 200% price match guarantee!",  # WRONG! We don't offer this
    }
    
    for topic, response in responses.items():
        if topic in question.lower():
            return response
    
    return "I'm not sure about that. Let me connect you with support."

# Test the buggy chatbot with the same question that is printed
question = "What is your return policy?"
print("Buggy chatbot responses:")
print(f"Q: {question}")
print(f"A: {buggy_chatbot(question)}")
print("\nThis is WRONG! The actual return policy is 30 days, not 90!")

## Why This Is Wrong

**LLMs hallucinate because:**
1. They generate plausible-sounding text, not verified facts
2. They don't "know" your specific business policies
3. They may have seen similar policies during training and mix them up

**Business impact:**
- Customer expects 90-day return → actually gets 30 days → bad experience
- Company might be legally liable for incorrect claims
- Trust in the chatbot (and brand) erodes

## The Fix: Grounding + Verification

In [None]:
# ===== FIXED CODE =====

# Step 1: Define the AUTHORITATIVE knowledge base
KNOWLEDGE_BASE = {
    "return_policy": {
        "text": "We offer a 30-day return window for all products in original condition.",
        "numbers": [30],
        "keywords": ["return", "refund", "days"]
    },
    "shipping": {
        "text": "Standard shipping takes 3-5 business days. Express shipping is 1-2 days.",
        "numbers": [3, 5, 1, 2],
        "keywords": ["shipping", "delivery", "days"]
    },
    "price_match": {
        "text": "We do not offer price matching. Our prices are final.",
        "numbers": [],
        "keywords": ["price", "match", "guarantee"]
    },
    "warranty": {
        "text": "All electronics come with a 1-year manufacturer warranty.",
        "numbers": [1],
        "keywords": ["warranty", "year", "guarantee"]
    }
}

print("Knowledge Base loaded with", len(KNOWLEDGE_BASE), "topics")
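The `numbers` field duplicates information that is already in `text`, so the two can drift apart when a policy changes and only one of them is updated. A minimal sketch of a load-time consistency check, assuming the KB keeps the structure above; `check_kb_consistency` is a hypothetical helper, not part of the original drill:

```python
import re
from typing import Dict, List

# Hypothetical helper (not part of the original notebook): checks that each
# entry's hand-maintained "numbers" list matches the numbers in its "text".
def check_kb_consistency(kb: Dict[str, dict]) -> List[str]:
    """Return one message per inconsistent entry; an empty list means OK."""
    problems = []
    for topic, entry in kb.items():
        # Same integer pattern that detect_hallucination applies to responses
        in_text = sorted(int(n) for n in re.findall(r"\b(\d+)\b", entry["text"]))
        declared = sorted(entry["numbers"])
        if in_text != declared:
            problems.append(f"{topic}: text has {in_text}, 'numbers' lists {declared}")
    return problems

# Example: the shipping entry declares a 7 that its text never mentions
example_kb = {
    "return_policy": {"text": "We offer a 30-day return window.", "numbers": [30]},
    "shipping": {"text": "Standard shipping takes 3-5 business days.", "numbers": [3, 5, 7]},
}
print(check_kb_consistency(example_kb))
# ["shipping: text has [3, 5], 'numbers' lists [3, 5, 7]"]
```

Run against the `KNOWLEDGE_BASE` above, this check should return an empty list. An alternative design is to derive `numbers` from `text` at load time, so there is nothing to keep in sync.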

In [None]:
# Step 2: Create hallucination detector

def detect_hallucination(response: str, topic: str) -> Tuple[bool, float, List[str]]:
    """
    Check if a response contains potential hallucinations.
    Returns: (is_safe, confidence, issues)
    """
    issues = []
    confidence = 1.0
    
    if topic not in KNOWLEDGE_BASE:
        return False, 0.0, ["Topic not in knowledge base"]
    
    kb_entry = KNOWLEDGE_BASE[topic]
    kb_text = kb_entry["text"].lower()
    kb_numbers = kb_entry["numbers"]
    response_lower = response.lower()
    
    # Check 1: Number verification. Require an exact match with the KB:
    # a "31-day" return window or a "2-year" warranty is as wrong as "90 days".
    response_numbers = [int(n) for n in re.findall(r'\b(\d+)\b', response)]
    for num in response_numbers:
        if num > 0 and num not in kb_numbers:
            issues.append(f"Number {num} not in knowledge base (expected: {kb_numbers})")
            confidence -= 0.3
    
    # Check 2: Superlative claims (often hallucinated)
    superlatives = ["best", "guaranteed", "always", "never", "100%", "unlimited"]
    for sup in superlatives:
        if sup in response_lower and sup not in kb_text:
            issues.append(f"Superlative '{sup}' not supported by knowledge base")
            confidence -= 0.2
    
    # Check 3: Contradiction detection (simple phrase heuristics)
    negation_pairs = [
        ("do not offer", "offer"),
        ("no ", "yes"),
    ]
    for neg, pos in negation_pairs:
        if neg in kb_text and pos in response_lower and neg not in response_lower:
            issues.append(f"Response says '{pos}' but knowledge base says '{neg.strip()}'")
            confidence -= 0.4
    
    is_safe = confidence >= 0.7 and len(issues) == 0
    return is_safe, max(0.0, confidence), issues
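Check 2 uses plain substring matching (`sup in response_lower`), so harmless words can trip it: "bestseller" contains "best" and "nevertheless" contains "never". A minimal sketch of whole-word matching, assuming the same superlative list; `find_terms` is a hypothetical helper. It uses lookarounds rather than `\b` because `\b` does not match after a symbol such as the `%` in "100%" when a space follows.

```python
import re
from typing import List

SUPERLATIVES = ["best", "guaranteed", "always", "never", "100%", "unlimited"]

def find_terms(text: str, terms: List[str]) -> List[str]:
    """Return the terms that occur in text as whole words (case-insensitive)."""
    lowered = text.lower()
    found = []
    for term in terms:
        # (?<!\w) and (?!\w): no letter, digit, or underscore directly before or after
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.append(term)
    return found

print("best" in "our bestseller ships free")                     # True (substring check)
print(find_terms("Our bestseller ships free", SUPERLATIVES))      # []
print(find_terms("Nevertheless, we always ship", SUPERLATIVES))   # ['always']
print(find_terms("A 100% refund, guaranteed!", SUPERLATIVES))     # ['guaranteed', '100%']
```

The same whole-word test would also apply to the `sup not in kb_text` side of Check 2, so both sides of the comparison use one definition of "contains".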

In [None]:
# Step 3: Create safe response function

def get_safe_response(question: str, llm_response: str) -> Dict:
    """
    Verify LLM response against knowledge base.
    Fall back to KB if hallucination detected.
    """
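    # Determine topic from question: score every KB topic by how many of its
    # keywords appear, and keep the best one. Taking the first match instead
    # would send "How many days does delivery take?" to return_policy,
    # because "days" is also one of its keywords.
    question_lower = question.lower()
    scores = {
        kb_topic: sum(kw in question_lower for kw in entry["keywords"])
        for kb_topic, entry in KNOWLEDGE_BASE.items()
    }
    best_topic = max(scores, key=scores.get)
    topic = best_topic if scores[best_topic] > 0 else None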
    
    if topic is None:
        return {
            "response": "I'm not sure about that. Let me connect you with support.",
            "source": "fallback",
            "verified": False
        }
    
    # Check for hallucinations
    is_safe, confidence, issues = detect_hallucination(llm_response, topic)
    
    if is_safe:
        return {
            "response": llm_response,
            "source": "llm_verified",
            "verified": True,
            "confidence": confidence
        }
    else:
        # Fall back to knowledge base
        return {
            "response": KNOWLEDGE_BASE[topic]["text"],
            "source": "knowledge_base",
            "verified": True,
            "issues_detected": issues,
            "original_response": llm_response
        }
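Topic routing depends on the `keywords` lists, and two of them overlap in the KB above: `"days"` appears under both `return_policy` and `shipping`, and `"guarantee"` under both `price_match` and `warranty`. A question that hits only a shared keyword goes to whichever topic is checked first. A minimal sketch of a diagnostic that lists such shared keywords; `shared_keywords` is a hypothetical helper, and the dictionary below copies only the `keywords` fields of the KB:

```python
from collections import defaultdict
from typing import Dict, List

def shared_keywords(kb: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each keyword used by more than one topic to the topics that use it."""
    owners = defaultdict(list)
    for topic, entry in kb.items():
        for kw in entry["keywords"]:
            owners[kw].append(topic)
    return {kw: topics for kw, topics in owners.items() if len(topics) > 1}

kb_keywords = {
    "return_policy": {"keywords": ["return", "refund", "days"]},
    "shipping": {"keywords": ["shipping", "delivery", "days"]},
    "price_match": {"keywords": ["price", "match", "guarantee"]},
    "warranty": {"keywords": ["warranty", "year", "guarantee"]},
}
print(shared_keywords(kb_keywords))
# {'days': ['return_policy', 'shipping'], 'guarantee': ['price_match', 'warranty']}
```

Running a check like this whenever the KB changes makes ambiguous keywords visible before they cause a misrouted answer.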

In [None]:
# Test the fixed system
print("="*60)
print("FIXED CHATBOT WITH HALLUCINATION DETECTION")
print("="*60)

test_cases = [
    ("What is your return policy?", "Our return policy allows returns within 90 days."),  # Hallucination!
    ("How long does shipping take?", "Shipping takes 3-5 business days."),  # Correct
    ("Do you price match?", "Yes, we offer a 200% price match guarantee!"),  # Hallucination!
]

for question, llm_response in test_cases:
    print(f"\nQ: {question}")
    print(f"LLM said: {llm_response}")
    
    result = get_safe_response(question, llm_response)
    print(f"Safe response: {result['response']}")
    print(f"Source: {result['source']}")
    if 'issues_detected' in result:
        print(f"Issues caught: {result['issues_detected']}")

In [None]:
# Self-check
# Verify hallucination was caught
result = get_safe_response("return policy", "90 day returns")
assert result['source'] == 'knowledge_base', "Should fall back to KB for hallucination"
assert '30' in result['response'], "Should return correct 30-day policy"

print("\nPASS: Hallucination detection working correctly!")

## Hallucination Prevention Strategies

| Strategy | How It Works | When to Use |
|----------|--------------|-------------|
| RAG | Retrieve docs before generating | Factual Q&A |
| Knowledge Base Fallback | Verify against KB, fall back if wrong | Policy questions |
| Number Verification | Check numbers against source | Financial, dates |
| Confidence Scoring | Only surface high-confidence responses | All production use |
| Human Escalation | Route uncertain responses to humans | High-stakes queries |
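The notebook verifies answers after generation. The RAG row instead grounds the model before it answers. Below is a minimal sketch of the retrieval and prompt-assembly half of RAG, with the model call itself omitted. Everything here is an assumption for illustration: the word-overlap retriever and tiny stopword list stand in for a real search or embedding index, and `retrieve` / `build_grounded_prompt` are hypothetical helpers.

```python
import re
from typing import Dict, List, Optional, Set

# Tiny illustrative stopword list (an assumption; real systems use larger ones)
STOPWORDS = {"what", "is", "your", "the", "a", "do", "you", "how", "does",
             "are", "we", "our", "for", "in", "of", "to"}

def words(text: str) -> Set[str]:
    """Lowercase alphabetic tokens, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def retrieve(question: str, docs: Dict[str, str], k: int = 1) -> List[str]:
    """Toy retriever: rank documents by word overlap with the question."""
    q = words(question)
    scored = [(len(q & words(text)), doc_id) for doc_id, text in docs.items()]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [docs[doc_id] for _, doc_id in scored[:k]]

def build_grounded_prompt(question: str, docs: Dict[str, str]) -> Optional[str]:
    """Put retrieved facts in front of the question; None means nothing relevant."""
    context = retrieve(question, docs)
    if not context:
        return None  # escalate instead of letting the model guess
    facts = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using ONLY the facts below. If they do not answer the question, "
        "say you don't know.\n"
        f"Facts:\n{facts}\n"
        f"Question: {question}"
    )

DOCS = {
    "return_policy": "We offer a 30-day return window for all products in original condition.",
    "shipping": "Standard shipping takes 3-5 business days. Express shipping is 1-2 days.",
    "price_match": "We do not offer price matching. Our prices are final.",
}
print(build_grounded_prompt("What is your return policy?", DOCS))
print(build_grounded_prompt("Do you sell gift cards?", DOCS))  # None
```

Returning `None` when nothing relevant is retrieved gives the caller a natural hook for the Human Escalation row, rather than letting the model answer without grounding.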

## Completed Postmortem

### What happened:
- Chatbot confidently stated "90-day return policy" when the actual policy is 30 days
- Customers received incorrect information, damaging trust

### Root cause:
- LLM generates plausible text without access to actual business facts
- No verification layer between LLM output and customer

### How to prevent:
- Always ground LLM responses in an authoritative knowledge base
- Implement verification checks (numbers, contradictions, superlatives)
- Fall back to the KB when a hallucination is detected
- Log and review flagged responses for continuous improvement
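One way to make flagged responses reviewable is to write each one as a structured JSON line. A minimal sketch using the standard `logging` and `json` modules; the logger name `hallucination_review`, the record fields, and `log_flagged` are assumptions, not part of the original drill:

```python
import json
import logging
from datetime import datetime, timezone
from typing import Dict, List

logger = logging.getLogger("hallucination_review")

def log_flagged(question: str, llm_response: str, issues: List[str]) -> str:
    """Serialize a flagged response as one JSON line and log it at WARNING level."""
    record: Dict[str, object] = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "llm_response": llm_response,
        "issues": issues,
    }
    line = json.dumps(record)
    logger.warning(line)
    return line

logging.basicConfig(level=logging.INFO, format="%(message)s")
entry = log_flagged(
    "What is your return policy?",
    "Our return policy allows returns within 90 days.",
    ["Number 90 not in knowledge base (expected: [30])"],
)
print(json.loads(entry)["issues"])
```

In `get_safe_response`, the `knowledge_base` branch already has `issues` and `llm_response` in hand, so it would be a natural place to call a hook like this. One JSON object per line keeps the log easy to filter and aggregate during review.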