# Debug Drill: The Retrieval Failure

**Scenario:**
Your support team's search system is failing users.

"When I search 'get my money back', I get articles about payment methods!" a user complains.

The search is keyword-based, and it's not finding the right documents.

**Your Task:**
1. Diagnose why keyword search is failing
2. Implement semantic search to fix it
3. Evaluate the improvement with metrics
4. Write a 3-bullet postmortem

---

In [None]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

np.random.seed(42)

In [None]:
# Knowledge base with common support articles
documents = [
    {"id": 1, "title": "Refund Policy", 
     "content": "We offer full refunds within 30 days of purchase. To request a refund, go to Order History and select the item."},
    {"id": 2, "title": "How to Cancel Subscription", 
     "content": "Cancel your subscription from Account Settings. Click Manage Subscription then Cancel."},
    {"id": 3, "title": "Password Reset Guide", 
     "content": "Reset your password by clicking Forgot Password on the login page. We'll send a reset link."},
    {"id": 4, "title": "Return an Item", 
     "content": "Start a return from Order History. Select the item, choose Return, and print the prepaid label."},
    {"id": 5, "title": "Payment Methods", 
     "content": "We accept credit cards, debit cards, and PayPal. Add or update payment methods in Billing Settings."},
    {"id": 6, "title": "Shipping Information", 
     "content": "Standard shipping takes 5-7 business days. Express shipping delivers in 2-3 days."},
]

# Test queries with expected relevant documents
test_cases = [
    {"query": "get my money back", "relevant": [1, 4], "note": "User means refund, doesn't use that word"},
    {"query": "can't remember my login", "relevant": [3], "note": "Password reset without 'password'"},
    {"query": "end my membership", "relevant": [2], "note": "Cancel subscription without 'cancel'"},
    {"query": "how long until my order arrives", "relevant": [6], "note": "Shipping without 'ship'"},
]

df = pd.DataFrame(documents)
df['text'] = df['title'] + ' ' + df['content']

print(f"Knowledge base: {len(df)} documents")
print(f"Test queries: {len(test_cases)}")

In [None]:
# ===== COLLEAGUE'S CODE (BUG: Keyword-only search) =====

# Build TF-IDF index for keyword search
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(df['text'])

def keyword_search(query, k=3):
    """Search using TF-IDF keyword matching."""
    query_vec = tfidf.transform([query])
    scores = cosine_similarity(query_vec, tfidf_matrix).flatten()
    top_k_idx = scores.argsort()[::-1][:k]
    # Cast NumPy scalars to plain Python types so printed results stay readable
    return [(int(df.iloc[i]['id']), float(scores[i])) for i in top_k_idx]

# Test the broken search
print("=== Colleague's Keyword Search (FAILING) ===")
for case in test_cases:
    results = keyword_search(case['query'], k=3)
    top_ids = [r[0] for r in results]
    relevant = case['relevant']
    
    # Check if any relevant doc is in top 3.
    # Caveat: if every score is 0, the top 3 are arbitrary and a hit can be accidental.
    found = any(r in top_ids for r in relevant)
    status = "✓" if found else "❌"
    
    print(f"\n{status} Query: '{case['query']}'")
    print(f"  Expected: docs {relevant}")
    print(f"  Got: docs {top_ids}")
    if not found:
        print(f"  Problem: {case['note']}")

---

## Your Investigation

### Step 1: Understand why keyword search fails

In [None]:
print("=== Why Keyword Search Fails ===")
print()
print("Query: 'get my money back'")
print("Problem: User wants a REFUND but doesn't use that word")
print()
print("TF-IDF scores documents by the terms they share with the query:")
print("  'money' appears in: no document in the knowledge base")
print("  'refund' appears in: Refund Policy (correct doc), but the query never uses it")
print()
print("❌ The query shares no terms with Refund Policy, so its similarity score is 0")
print("✓ Semantic search is meant to match on meaning rather than exact words")

### Step 2: TODO - Implement semantic search

In [None]:
# TODO: Build semantic search with dense embeddings

# Uncomment and complete:

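# # Create dense embeddings with SVD over TF-IDF (LSA); in production, use sentence-transformers.
# # With only 6 documents the TF-IDF matrix has rank <= 6, so components beyond that
# # carry no information; 4 is an illustrative choice, not a tuned value.
# # Caveat: LSA cannot place query words that never occur in the corpus (e.g. "money").
# svd = TruncatedSVD(n_components=4, random_state=42)
# semantic_embeddings = svd.fit_transform(tfidf_matrix)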
# 
# def semantic_search(query, k=3):
#     """Search using semantic embeddings."""
#     query_vec = tfidf.transform([query])
#     query_emb = svd.transform(query_vec)
#     scores = cosine_similarity(query_emb, semantic_embeddings).flatten()
#     top_k_idx = scores.argsort()[::-1][:k]
#     return [(int(df.iloc[i]['id']), float(scores[i])) for i in top_k_idx]
# 
# print("✓ Semantic search implemented")

In [None]:
# TODO: Test semantic search on failing queries

# Uncomment:

# print("=== Semantic Search Results ===")
# for case in test_cases:
#     results = semantic_search(case['query'], k=3)
#     top_ids = [r[0] for r in results]
#     relevant = case['relevant']
#     
#     found = any(r in top_ids for r in relevant)
#     status = "✓" if found else "❌"
#     
#     print(f"\n{status} Query: '{case['query']}'")
#     print(f"  Expected: docs {relevant}")
#     print(f"  Got: docs {top_ids}")

In [None]:
# TODO: Calculate metrics comparing both methods

# Uncomment:

# def recall_at_k(retrieved, relevant, k):
#     """What fraction of relevant docs are in top K?"""
#     top_k = [r[0] for r in retrieved[:k]]
#     hits = len(set(top_k) & set(relevant))
#     return hits / len(relevant) if relevant else 0
# 
# keyword_recalls = []
# semantic_recalls = []
# 
# for case in test_cases:
#     keyword_results = keyword_search(case['query'], k=3)
#     semantic_results = semantic_search(case['query'], k=3)
#     
#     keyword_recalls.append(recall_at_k(keyword_results, case['relevant'], 3))
#     semantic_recalls.append(recall_at_k(semantic_results, case['relevant'], 3))
# 
# print("=== Recall@3 Comparison ===")
# print(f"Keyword Search: {np.mean(keyword_recalls):.1%}")
# print(f"Semantic Search: {np.mean(semantic_recalls):.1%}")
# print(f"\nImprovement: {np.mean(semantic_recalls) - np.mean(keyword_recalls):+.1%}")
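To see how Recall@K differs from the "any relevant doc in top 3" check used earlier, here is a small worked example. It is a sketch on a made-up ranking, not output from either search function; `recall_at_k_ids` is a hypothetical variant that takes plain document IDs rather than `(id, score)` tuples.

```python
def recall_at_k_ids(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k retrieved IDs."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

# Hypothetical ranking for illustration only.
retrieved = [6, 5, 4]
relevant = [1, 4]

print(recall_at_k_ids(retrieved, relevant, k=3))  # 0.5: doc 4 found, doc 1 missed
print(recall_at_k_ids(retrieved, relevant, k=2))  # 0.0: neither relevant doc in the top 2
```

The "any hit" check would mark this query as a success at k=3, while Recall@3 reports that only half of the relevant documents were retrieved.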

In [None]:
# ============================================
# SELF-CHECK
# ============================================

# Uncomment:

# assert callable(semantic_search), "Should have implemented semantic_search"
# assert np.mean(semantic_recalls) > np.mean(keyword_recalls), "Semantic should beat keyword"
# 
# print("✓ Retrieval fixed!")
# print(f"✓ Keyword Recall@3: {np.mean(keyword_recalls):.1%}")
# print(f"✓ Semantic Recall@3: {np.mean(semantic_recalls):.1%}")

### Step 3: Write your postmortem

In [None]:
postmortem = """
## Postmortem: The Retrieval Failure

### What happened:
- (Your answer: What user complaints indicated the search was broken?)

### Root cause:
- (Your answer: Why does keyword search fail on synonym queries?)

### How to prevent:
- (Your answer: What approach handles vocabulary mismatch?)

"""

print(postmortem)

---

## ✅ Drill Complete!

**Key lessons:**

1. **Keyword search fails on synonyms.** Users don't use your vocabulary.

2. **Semantic search matches on meaning.** With a pretrained embedding model, "money back" lands near "refund" in embedding space.

3. **Use retrieval metrics.** Recall@K measures what fraction of the relevant docs appear in the top K results.

4. **Hybrid search combines both.** Keywords for exact matches + semantics for meaning.
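One common way to combine the two signals from lesson 4 is a weighted sum of normalized scores. The sketch below assumes min-max normalization and a keyword weight `alpha`; both are illustrative choices, and the score values are made up rather than taken from the drill.

```python
import numpy as np

def min_max(scores):
    """Rescale scores to [0, 1]; a constant vector maps to all zeros."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span

def hybrid_scores(keyword_scores, semantic_scores, alpha=0.5):
    """Weighted sum of normalized scores; alpha is the weight given to keyword scores."""
    return alpha * min_max(keyword_scores) + (1 - alpha) * min_max(semantic_scores)

# Made-up scores for three documents, for illustration only.
keyword = [0.0, 0.0, 0.8]    # doc 2 contains an exact term match
semantic = [0.9, 0.2, 0.1]   # doc 0 is closest in meaning

combined = hybrid_scores(keyword, semantic, alpha=0.4)
ranking = np.argsort(-combined)   # indices sorted by descending combined score

print(combined)   # approximately [0.6, 0.075, 0.4]
print(ranking)    # [0 2 1]
```

Normalizing first matters because keyword and semantic scores usually live on different scales; without it, one signal can dominate regardless of `alpha`.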

---

## Search Strategy Guide

| Query Type | Best Approach | Example |
|------------|---------------|----------|
| Exact term | Keyword | "ORD-12345" |
| Synonym | Semantic | "money back" → refund |
| Mixed | Hybrid | "refund ORD-12345" |
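
The table can be turned into a simple routing rule. This is a rough sketch, assuming order IDs follow the `ORD-<digits>` pattern from the example column; `choose_strategy` and the regular expression are hypothetical and would need adapting to real identifier formats.

```python
import re

# Assumed ID format, modeled on the "ORD-12345" example in the table.
ORDER_ID = re.compile(r"\bORD-\d+\b", re.IGNORECASE)

def choose_strategy(query):
    """Pick a search strategy from the surface form of the query."""
    has_id = ORDER_ID.search(query) is not None
    remainder = ORDER_ID.sub("", query).strip()
    if has_id and not remainder:
        return "keyword"    # the query is only an exact identifier
    if has_id:
        return "hybrid"     # identifier plus free text
    return "semantic"       # free text only

for q in ["ORD-12345", "money back", "refund ORD-12345"]:
    print(q, "->", choose_strategy(q))
# ORD-12345 -> keyword
# money back -> semantic
# refund ORD-12345 -> hybrid
```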