# Debug Drill: The Hidden Segments

**Scenario:**
A colleague built an LTV prediction model for StreamCart. They're thrilled with the results.

"R² = 0.85 and MAE = $40!" they report. "We're ready to deploy!"

But when the marketing team uses it, they complain: "Your predictions for our premium customers are terrible!"

**Your Task:**
1. Run the model and verify the overall metrics look good
2. Investigate segment-level performance
3. Diagnose why certain segments fail
4. Write a 3-bullet postmortem

---

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

np.random.seed(42)

In [None]:
# Generate synthetic customer data with hidden segment structure
n_samples = 1000

# Create three customer segments with different relationships
segments = np.random.choice(['Standard', 'Premium', 'Enterprise'], n_samples, p=[0.7, 0.2, 0.1])

# Features
tenure_months = np.random.uniform(1, 36, n_samples)
monthly_spend = np.random.uniform(20, 200, n_samples)
orders = np.random.poisson(5, n_samples)

# Generate LTV with DIFFERENT patterns per segment (the hidden bug)
ltv = np.zeros(n_samples)

for i, seg in enumerate(segments):
    if seg == 'Standard':
        # Linear relationship
        ltv[i] = 50 + 10 * tenure_months[i] + 2 * monthly_spend[i] + np.random.normal(0, 30)
    elif seg == 'Premium':
        # NON-LINEAR relationship: tenure enters as tenure**1.5 (a linear model can't capture this!)
        ltv[i] = 200 + 5 * tenure_months[i]**1.5 + 3 * monthly_spend[i] + np.random.normal(0, 50)
    else:  # Enterprise
        # Very high, different scale
        ltv[i] = 1000 + 50 * tenure_months[i] + 10 * monthly_spend[i] + np.random.normal(0, 200)

df = pd.DataFrame({
    'tenure_months': tenure_months,
    'monthly_spend': monthly_spend,
    'orders': orders,
    'segment': segments,
    'ltv': ltv
})

print(f"Dataset: {len(df)} customers")
print(f"\nSegment distribution:")
print(df['segment'].value_counts())
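
The generating formulas above imply very different LTV scales per segment. As a rough sanity check, the expected LTV of each segment can be worked out directly from the uniform feature ranges, since the noise terms have mean zero. This is only a sketch; the variable names (`mean_tenure_pow`, `expected_ltv`) are illustrative and not part of the drill.

```python
# Expected LTV per segment, derived from the data-generating formulas.
# tenure ~ Uniform(1, 36), monthly_spend ~ Uniform(20, 200); noise has mean 0.
a, b = 1.0, 36.0
mean_tenure = (a + b) / 2            # 18.5
mean_spend = (20.0 + 200.0) / 2      # 110.0

# For t ~ Uniform(a, b): E[t**1.5] = (b**2.5 - a**2.5) / (2.5 * (b - a))
mean_tenure_pow = (b**2.5 - a**2.5) / (2.5 * (b - a))

expected_ltv = {
    "Standard": 50 + 10 * mean_tenure + 2 * mean_spend,
    "Premium": 200 + 5 * mean_tenure_pow + 3 * mean_spend,
    "Enterprise": 1000 + 50 * mean_tenure + 10 * mean_spend,
}

for seg, value in expected_ltv.items():
    print(f"{seg:<12} expected LTV ~ ${value:,.0f}")
```

Under these formulas the Enterprise mean is roughly six to seven times the Standard mean, so a model fit on the pooled data is pulled between very different targets.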

In [None]:
# ===== COLLEAGUE'S CODE =====

# Train a simple linear regression on ALL data
X = df[['tenure_months', 'monthly_spend', 'orders']]
y = df['ltv']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Overall metrics look great!
print("=== Colleague's Report ===")
print(f"\nOverall Test Metrics:")
print(f"  R²:  {r2_score(y_test, y_pred):.3f}")
print(f"  MAE: ${mean_absolute_error(y_test, y_pred):.2f}")
print(f"  RMSE: ${np.sqrt(mean_squared_error(y_test, y_pred)):.2f}")
print("\n✅ Looks good! Ready to deploy...")

---

## Your Investigation

The marketing team says the model fails for premium customers. Let's dig deeper.

### Step 1: Analyze by segment

In [None]:
# Get segment labels for test set
test_idx = X_test.index
test_segments = df.loc[test_idx, 'segment']

# Calculate metrics per segment
print("=== Segment-Level Performance ===")
print(f"{'Segment':<12} {'Count':<8} {'MAE':<12} {'RMSE':<12} {'R²':<8}")
print("-" * 55)

for seg in ['Standard', 'Premium', 'Enterprise']:
    mask = test_segments == seg
    if mask.sum() > 0:
        y_true_seg = y_test[mask]
        y_pred_seg = y_pred[mask]
        
        mae = mean_absolute_error(y_true_seg, y_pred_seg)
        rmse = np.sqrt(mean_squared_error(y_true_seg, y_pred_seg))
        r2 = r2_score(y_true_seg, y_pred_seg)
        
        flag = "❌" if r2 < 0.5 else "✓"
        print(f"{seg:<12} {mask.sum():<8} ${mae:<10.2f} ${rmse:<10.2f} {r2:<8.3f} {flag}")

print("\n🔍 Key Finding: Performance varies DRAMATICALLY by segment!")
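
The per-segment loop above can also be packaged as a reusable helper, which makes it easier to rerun the same check on any model. The sketch below is one possible shape for such a helper; the function name `segment_report` and the toy numbers are made up for illustration, and it adds a signed mean residual ("bias") column that the loop above does not compute.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def segment_report(y_true, y_pred, segments):
    """Return count, MAE, RMSE, R² and mean residual (bias) per segment."""
    frame = pd.DataFrame({
        "y_true": np.asarray(y_true, dtype=float),
        "y_pred": np.asarray(y_pred, dtype=float),
        "segment": np.asarray(segments),
    })
    rows = []
    for seg, grp in frame.groupby("segment", sort=False):
        rows.append({
            "segment": seg,
            "count": len(grp),
            "mae": mean_absolute_error(grp["y_true"], grp["y_pred"]),
            "rmse": np.sqrt(mean_squared_error(grp["y_true"], grp["y_pred"])),
            "r2": r2_score(grp["y_true"], grp["y_pred"]),
            # Positive bias means the model under-predicts on average.
            "bias": (grp["y_true"] - grp["y_pred"]).mean(),
        })
    return pd.DataFrame(rows).set_index("segment")


# Toy data (made up): segment A is predicted exactly,
# segment B is under-predicted by 100 on every row.
toy = segment_report(
    y_true=[100, 200, 300, 1000, 2000, 3000],
    y_pred=[100, 200, 300, 900, 1900, 2900],
    segments=["A", "A", "A", "B", "B", "B"],
)
print(toy.round(3))
```

In this notebook the equivalent call would be `segment_report(y_test, y_pred, test_segments)`.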

In [None]:
# Visualize the problem: residuals by segment
residuals = y_test - y_pred

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Box plot of residuals by segment
ax1 = axes[0]
segment_residuals = [residuals[test_segments == seg].values for seg in ['Standard', 'Premium', 'Enterprise']]
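# Set tick labels separately: boxplot's `labels=` keyword is deprecated in Matplotlib 3.9+
ax1.boxplot(segment_residuals)
ax1.set_xticklabels(['Standard', 'Premium', 'Enterprise'])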
ax1.axhline(y=0, color='red', linestyle='--', alpha=0.7)
ax1.set_ylabel('Residual (Actual - Predicted)')
ax1.set_title('Residual Distribution by Segment')
ax1.grid(True, alpha=0.3)

# Scatter: predicted vs actual by segment
ax2 = axes[1]
colors = {'Standard': '#3b82f6', 'Premium': '#f97316', 'Enterprise': '#22c55e'}
for seg in ['Standard', 'Premium', 'Enterprise']:
    mask = test_segments == seg
    ax2.scatter(y_test[mask], y_pred[mask], alpha=0.5, label=seg, c=colors[seg])

# Perfect prediction line
ax2.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', label='Perfect')
ax2.set_xlabel('Actual LTV')
ax2.set_ylabel('Predicted LTV')
ax2.set_title('Predicted vs Actual by Segment')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("🔍 Notice: Enterprise predictions are systematically LOW (below the diagonal)")

### Step 2: Diagnose the root cause

In [None]:
# Check the actual LTV distribution by segment
print("=== LTV Statistics by Segment ===")
print(df.groupby('segment')['ltv'].describe().round(1))

print("\n🔍 Root Cause Analysis:")
print("  1. Enterprise customers have MUCH higher LTV (different scale)")
print("  2. Premium customers have a non-linear LTV pattern")
print("  3. A single linear model with no segment feature can't capture both patterns")

### Step 3: TODO - Propose a fix

In [None]:
# TODO: Train separate models per segment and compare

# Uncomment and complete:

# print("=== Segment-Specific Models ===")
# 
# y_pred_fixed = pd.Series(index=X_test.index, dtype=float)
# 
# for seg in ['Standard', 'Premium', 'Enterprise']:
#     # Train on segment
#     train_mask = df.loc[X_train.index, 'segment'] == seg
#     test_mask = df.loc[X_test.index, 'segment'] == seg
#     
#     if train_mask.sum() > 10:  # Need enough samples
#         seg_model = LinearRegression()
#         seg_model.fit(X_train[train_mask], y_train[train_mask])
#         
#         seg_pred = seg_model.predict(X_test[test_mask])
#         y_pred_fixed[test_mask] = seg_pred
#         
#         mae = mean_absolute_error(y_test[test_mask], seg_pred)
#         r2 = r2_score(y_test[test_mask], seg_pred)
#         print(f"  {seg}: MAE=${mae:.2f}, R²={r2:.3f}")
# 
# # Compare overall
# print(f"\nOverall (segment models): MAE=${mean_absolute_error(y_test, y_pred_fixed):.2f}")
# print(f"Overall (single model):   MAE=${mean_absolute_error(y_test, y_pred):.2f}")

In [None]:
# ============================================
# SELF-CHECK: Did you improve segment performance?
# ============================================

# Uncomment after completing:

# assert 'y_pred_fixed' in dir(), "Should have created segment-specific predictions"
# mae_original = mean_absolute_error(y_test, y_pred)
# mae_fixed = mean_absolute_error(y_test, y_pred_fixed)
# assert mae_fixed < mae_original, f"Segment models ({mae_fixed:.1f}) should beat single model ({mae_original:.1f})"
# 
# print("✓ Segment-specific models improved overall MAE!")
# print(f"  Original: ${mae_original:.2f}")
# print(f"  Fixed: ${mae_fixed:.2f}")
# print(f"  Improvement: {(mae_original - mae_fixed) / mae_original * 100:.1f}%")

### Step 4: Write your postmortem

In [None]:
postmortem = """
## Postmortem: The Hidden Segments

### What happened:
- (Your answer: What symptom did the marketing team observe?)

### Root cause:
- (Your answer: Why did good overall metrics hide poor segment performance?)

### How to prevent:
- (Your answer: What should we check before deploying a regression model?)

"""

print(postmortem)

---

## ✅ Drill Complete!

**Key lessons:**

1. **Overall metrics can hide segment-level failures.** A model can have great R² overall but terrible performance on important subgroups.

2. **Always check performance by segment.** Business users care about specific segments, not just averages.

3. **Consider segment-specific models** when different groups have different patterns.

4. **Beware of scale differences.** Enterprise customers, whose LTV is several times higher than that of Standard customers, can dominate MAE/RMSE.
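
The first lesson can be reproduced with a handful of numbers. The toy example below is made up for illustration and is not the drill's dataset: two segments sit on very different LTV scales, and a hypothetical model gets each segment's level right but the ordering within each segment exactly backwards. Because most of the total variance is between segments, the pooled R² still looks excellent.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up toy data: two segments on very different LTV scales.
y_true = np.array([100, 110, 120, 130, 1000, 1010, 1020, 1030], dtype=float)
# The "model" gets each segment's level right but the within-segment
# ordering exactly backwards.
y_pred = np.array([130, 120, 110, 100, 1030, 1020, 1010, 1000], dtype=float)
segments = np.array(["A"] * 4 + ["B"] * 4)

overall_r2 = r2_score(y_true, y_pred)
segment_r2 = {
    seg: r2_score(y_true[segments == seg], y_pred[segments == seg])
    for seg in ("A", "B")
}

print(f"Overall R²: {overall_r2:.4f}")
for seg, value in segment_r2.items():
    print(f"Segment {seg} R²: {value:.1f}")
```

Here the pooled R² is above 0.99 while each segment's R² is -3.0, meaning the model is worse within a segment than simply predicting that segment's mean.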

---

## Segment Analysis Checklist

| Check | Why It Matters |
|-------|----------------|
| MAE by segment | The same dollar error matters more for low-LTV groups than for high-LTV ones |
| R² by segment | Model may not capture patterns for all groups |
| Residual distribution | Systematic bias indicates model limitations |
| MAPE by segment | Percentage error normalizes across scales |
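
The notebook computes MAE, RMSE, and R² per segment but not MAPE. A minimal sketch of the last checklist row follows, assuming every actual LTV is non-zero (MAPE is undefined otherwise). The helper name `mape_by_segment` and the numbers are hypothetical.

```python
import numpy as np
import pandas as pd


def mape_by_segment(y_true, y_pred, segments):
    """Mean absolute percentage error (in %) per segment.

    Assumes every y_true is non-zero; MAPE is undefined otherwise.
    """
    frame = pd.DataFrame({
        "y_true": np.asarray(y_true, dtype=float),
        "y_pred": np.asarray(y_pred, dtype=float),
        "segment": np.asarray(segments),
    })
    abs_err = (frame["y_true"] - frame["y_pred"]).abs()
    frame["ape"] = abs_err / frame["y_true"].abs() * 100
    return frame.groupby("segment")["ape"].mean()


# Made-up numbers: both segments miss by $50 on every row, but that is
# 10% of a $500 LTV and only 1% of a $5,000 LTV.
result = mape_by_segment(
    y_true=[500, 500, 5000, 5000],
    y_pred=[450, 550, 4950, 5050],
    segments=["Standard", "Standard", "Enterprise", "Enterprise"],
)
print(result)
```

In this notebook the equivalent call would be `mape_by_segment(y_test, y_pred, test_segments)`. Identical MAE across segments can therefore correspond to very different relative errors, which is why the checklist lists both.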