# Debug Drill 06: The Wrong Threshold

**Symptom:** The retention team can only call 200 customers per week. Your colleague deployed a churn model using the default 0.5 threshold. The team is frustrated: "We're calling people who aren't churning!"

**Your task:** Find the right threshold, explain why 0.5 is wrong, and write a postmortem.

**Time:** 15 minutes

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, confusion_matrix
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Load and prepare data
try:
    df = pd.read_csv('https://raw.githubusercontent.com/189investmentai/ml-foundations-interactive/main/streamcart_customers.csv')
except Exception:
    # Fall back to the local copy if the remote file is unavailable
    df = pd.read_csv('../../data/streamcart_customers.csv')

features = ['tenure_months', 'logins_last_30d', 'orders_last_30d', 
            'support_tickets_last_30d', 'nps_score']
X = df[features].fillna(0)
y = df['churn_30d']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]

In [None]:
# ===== COLLEAGUE'S CODE (CONTAINS BUG) =====

# Using default threshold of 0.5
threshold = 0.5
predictions = (probs >= threshold).astype(int)

print(f"Using threshold: {threshold}")
print(f"Customers flagged for calling: {predictions.sum()}")
print(f"Precision: {precision_score(y_test, predictions):.2%}")
print(f"Recall: {recall_score(y_test, predictions):.2%}")

print("\nRetention team capacity: 200 calls/week")
print(f"Customers flagged: {predictions.sum()}")
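A threshold sweep shows how the flagged count trades off against precision and recall as the cutoff moves. This is a minimal sketch: `threshold_sweep` is a hypothetical helper, and the toy arrays stand in for `y_test` and `probs`.

```python
import numpy as np
import pandas as pd

def threshold_sweep(y_true, scores, thresholds):
    """Hypothetical helper: flagged count, precision, and recall at each threshold."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    n_pos = int((y_true == 1).sum())
    rows = []
    for t in thresholds:
        flagged = scores >= t                      # customers we would call at this cutoff
        n_flagged = int(flagged.sum())
        true_pos = int((flagged & (y_true == 1)).sum())
        rows.append({
            "threshold": t,
            "flagged": n_flagged,
            "precision": true_pos / n_flagged if n_flagged else float("nan"),
            "recall": true_pos / n_pos if n_pos else float("nan"),
        })
    return pd.DataFrame(rows)

# Toy data: 10 customers, 3 of whom churned
y_toy = np.array([1, 0, 1, 0, 0, 1, 0, 0, 0, 0])
s_toy = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.2, 0.1, 0.05])
print(threshold_sweep(y_toy, s_toy, [0.5, 0.3, 0.1]))
```

In the notebook, `threshold_sweep(y_test, probs, [0.5, 0.3, 0.2, 0.1])` would show how far each cutoff lands from the 200-call budget.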

## Your Investigation

**Q1:** What's the base churn rate? Why does this make 0.5 a bad threshold?

In [None]:
# TODO: Calculate base churn rate
base_rate = y_test.mean()
print(f"Base churn rate: {base_rate:.2%}")

# TODO: Explain why a 0.5 threshold is problematic
# Your explanation: 
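
One way to see why 0.5 is arbitrary: when churn is rare, a logistic regression's predicted probabilities average out near the base rate on the data it was fit to, so few customers ever cross 0.5. The sketch below uses simulated data under an assumed data-generating process (one feature, roughly 10% churn), not the StreamCart file.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed data-generating process: one feature, rare churn (~10%)
rng = np.random.default_rng(0)
n = 5000
x_sim = rng.normal(size=(n, 1))
true_logit = -2.5 + 1.0 * x_sim[:, 0]
true_prob = 1 / (1 + np.exp(-true_logit))
y_sim = (rng.random(n) < true_prob).astype(int)

# Fit the same kind of model the notebook uses
sim_model = LogisticRegression(max_iter=1000).fit(x_sim, y_sim)
p_sim = sim_model.predict_proba(x_sim)[:, 1]

base_rate_sim = y_sim.mean()
share_above_half = (p_sim >= 0.5).mean()
print(f"Simulated base rate:        {base_rate_sim:.2%}")
print(f"Mean predicted probability: {p_sim.mean():.2%}")
print(f"Share scoring >= 0.5:       {share_above_half:.2%}")
```

In this simulation only a small fraction of customers clears 0.5 even though about one in ten churns. The number above 0.5 is a property of the score distribution and the base rate, not of the team's 200-call budget, so depending on the data it can land far above or far below capacity.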

**Q2:** What threshold would give us approximately 200 customers?

In [None]:
# TODO: Find threshold that selects ~200 customers
# Hint: Sort probabilities and find the cutoff

target_calls = 200

# Sort probabilities in descending order
sorted_probs = np.sort(probs)[::-1]

# The score of the target_calls-th highest customer becomes the cutoff
if len(sorted_probs) >= target_calls:
    optimal_threshold = sorted_probs[target_calls - 1]
else:
    # Fewer customers than available calls: flag everyone
    optimal_threshold = sorted_probs[-1]

print(f"Threshold for {target_calls} calls: {optimal_threshold:.4f}")
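One caveat with using `sorted_probs[target_calls - 1]` as the cutoff: if several customers share that exact score, `probs >= optimal_threshold` flags all of them, so the count can exceed the target. A toy example with hypothetical scores:

```python
import numpy as np

scores = np.array([0.9, 0.7, 0.4, 0.4, 0.4, 0.1])
k = 3

cutoff = np.sort(scores)[::-1][k - 1]        # 3rd-highest score = 0.4
n_threshold = int((scores >= cutoff).sum())  # every tied 0.4 is included

top_k = np.argsort(scores)[::-1][:k]         # exactly k positions; which tied customer is kept is arbitrary
n_top_k = len(top_k)

print(f"cutoff={cutoff}, flagged by threshold={n_threshold}, flagged by top-k={n_top_k}")
```

With continuous logistic-regression scores exact ties are rare, but this is why a top-K selection is the safer choice when the budget is a hard number of calls.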

## Fix the Bug

**Q3:** Apply the correct threshold and compare metrics.

In [None]:
# TODO: Use the optimal threshold (or just take top 200)

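# Method 1: Threshold-based (can flag more than target_calls if scores tie at the cutoff)
predictions_fixed = (probs >= optimal_threshold).astype(int)
print(f"Threshold method flags: {predictions_fixed.sum()}")

# Method 2: Top-K based (flags exactly target_calls whenever the test set has at least that many customers,
# so it is more reliable for capacity constraints)
top_k_indices = np.argsort(probs)[::-1][:target_calls]
predictions_topk = np.zeros(len(probs), dtype=int)
predictions_topk[top_k_indices] = 1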

print("=== FIXED (Top 200) ===")
print(f"Customers flagged: {int(predictions_topk.sum())}")
print(f"Precision@200: {precision_score(y_test, predictions_topk):.2%}")
print(f"Actual churners in top 200: {int(y_test.iloc[top_k_indices].sum())}")

print("\n=== COMPARISON ===")
print(f"Old (0.5 threshold): {predictions.sum()} flagged, {precision_score(y_test, predictions):.2%} precision")
print(f"New (top 200): {int(predictions_topk.sum())} flagged, {precision_score(y_test, predictions_topk):.2%} precision")
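The comparison above comes down to precision at a fixed budget K. A small reusable helper, sketched here as a hypothetical `precision_recall_at_k` function (not part of scikit-learn), makes that metric explicit and also reports what share of all churners the calls reach:

```python
import numpy as np

def precision_recall_at_k(y_true, scores, k):
    """Hypothetical helper: precision and recall when only the top-k scores are acted on."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    k = min(k, len(scores))
    top_k = np.argsort(scores)[::-1][:k]     # indices of the k highest scores
    hits = int(y_true[top_k].sum())          # churners among the customers called
    total_pos = int(y_true.sum())
    precision = hits / k if k else float("nan")
    recall = hits / total_pos if total_pos else float("nan")
    return precision, recall

y_toy = np.array([1, 0, 1, 0, 0, 1, 0, 0, 0, 0])
s_toy = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.2, 0.1, 0.05])
print(precision_recall_at_k(y_toy, s_toy, k=3))
```

Because it ranks the same `probs` array the same way, `precision_recall_at_k(y_test, probs, 200)` should reproduce the Precision@200 printed above, with recall@200 as a bonus.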

## Self-Check

In [None]:
# Verify fix
assert int(predictions_topk.sum()) == 200, "Should flag exactly 200"
assert precision_score(y_test, predictions_topk) > precision_score(y_test, predictions), "Precision should improve"
print("PASS: Threshold optimized for capacity!")
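The self-check runs only after the fact. A pre-deployment guardrail along the same lines could refuse any scoring rule that flags more customers than the team can call. The sketch below is hypothetical: `check_capacity` and raising `ValueError` are illustrative choices, not an existing API.

```python
import numpy as np

def check_capacity(predictions, capacity):
    """Hypothetical guardrail: fail loudly if the model flags more customers than can be contacted."""
    n_flagged = int(np.asarray(predictions).sum())
    if n_flagged > capacity:
        raise ValueError(
            f"Model flags {n_flagged} customers but capacity is {capacity}; "
            "pick the cutoff from the capacity (top-K) instead of a fixed probability."
        )
    return n_flagged

# Passes: 2 flagged, capacity 3
print(check_capacity(np.array([1, 0, 1, 0]), capacity=3))

# Fails: 4 flagged, capacity 3
try:
    check_capacity(np.array([1, 1, 1, 1]), capacity=3)
except ValueError as err:
    print(err)
```

A variant could also warn when far fewer than `capacity` customers are flagged, since unused calls are wasted retention effort.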

## Postmortem

Write 3 bullets:
1. **Root cause:** 
2. **How we detected it:** 
3. **Prevention for next time:** 