# Debug Drill: The Unreproducible Model

**Scenario:**
Your team deployed a model that performed great in testing but poorly in production.

"I can't reproduce the training results!" a colleague says. "Every run gives different metrics!"

The training pipeline has reproducibility bugs that make debugging impossible.

**Your Task:**
1. Identify why training isn't reproducible
2. Fix the pipeline to ensure identical results
3. Verify reproducibility across multiple runs
4. Write a 3-bullet postmortem

---

In [None]:
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

In [None]:
# Generate sample data
np.random.seed(42)  # Seed for data generation only
n_samples = 1000

data = pd.DataFrame({
    'feature_1': np.random.normal(0, 1, n_samples),
    'feature_2': np.random.normal(0, 1, n_samples),
    'feature_3': np.random.normal(0, 1, n_samples),
})
data['target'] = ((data['feature_1'] + data['feature_2'] * 0.5) > 0).astype(int)

print(f"Data shape: {data.shape}")

In [None]:
# ===== COLLEAGUE'S CODE (BUGS: Missing random seeds!) =====

def broken_train_pipeline(df):
    """Training pipeline with reproducibility bugs."""
    
    features = ['feature_1', 'feature_2', 'feature_3']
    X = df[features]
    y = df['target']
    
    # BUG 1: No random_state in train_test_split!
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2
    )  # <-- Missing random_state!
    
    # BUG 2: No random_state in model!
    model = GradientBoostingClassifier(
        n_estimators=100,
        max_depth=3
    )  # <-- Missing random_state!
    
    model.fit(X_train, y_train)
    
    # Evaluate
    y_prob = model.predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, y_prob)
    
    return auc

# Test reproducibility
print("=== Testing Broken Pipeline ===")
results = [broken_train_pipeline(data) for _ in range(5)]

print(f"\nAUC across 5 runs: {results}")
print(f"Variance: {np.var(results):.6f}")
is_reproducible = len(set(results)) == 1
print(f"\n{'✓ Reproducible!' if is_reproducible else '❌ Results differ between runs. Not reproducible.'}")
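Why do the runs differ even though `np.random.seed(42)` was called in the data cell? When `random_state=None`, scikit-learn draws from NumPy's global random state. The data-generation cell has already advanced that state, and every pipeline call advances it further. The sketch below uses an illustrative toy array `y` (not part of the drill). It shows that reseeding the global state right before a call repeats an unseeded split, while an integer `random_state` repeats the split regardless of the global state.

```python
import numpy as np
from sklearn.model_selection import train_test_split

y = np.arange(10)  # toy data, for illustration only

# random_state=None: the shuffle draws from NumPy's global RNG,
# so reseeding the global state right before each call repeats the split
np.random.seed(0)
_, test_a = train_test_split(y, test_size=0.3)
np.random.seed(0)
_, test_b = train_test_split(y, test_size=0.3)
print("Global reseed repeats split:", np.array_equal(test_a, test_b))

# random_state=int: a fresh generator per call, independent of global state
np.random.seed(0)
_, test_c = train_test_split(y, test_size=0.3, random_state=42)
np.random.seed(123)
np.random.rand(100)  # unrelated draws that would disturb an unseeded split
_, test_d = train_test_split(y, test_size=0.3, random_state=42)
print("Integer seed repeats split:", np.array_equal(test_c, test_d))
```

Reseeding the global state works, but it is fragile: any extra random draw between the seed and the split (for example, new data generation) silently changes the result. An explicit integer seed avoids that coupling.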

---

## Your Investigation

### Step 1: Identify the reproducibility bugs

In [None]:
print("=== Reproducibility Bug Analysis ===")
print()
print("BUG 1: train_test_split without random_state")
print("  - Each run creates different train/test splits")
print("  - Model sees different data each time")
print()
print("BUG 2: Model without random_state")
print("  - GradientBoostingClassifier randomly permutes features at each split")
print("    (and subsamples rows when subsample < 1.0)")
print("  - Same training data can produce different models")
print()
print("Combined effect:")
print("  - Impossible to debug or compare experiments")
print("  - Can't verify if a code change helped or hurt")
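One way to confirm the analysis above is to seed each random component independently and observe which one moves the metric. The sketch below relies on a hypothetical helper, `auc_with_seeds`, that accepts separate seeds for the split and the model (`None` means unseeded). It regenerates the same synthetic data so that it runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Same synthetic data as the setup cell
np.random.seed(42)
n_samples = 1000
data = pd.DataFrame({
    'feature_1': np.random.normal(0, 1, n_samples),
    'feature_2': np.random.normal(0, 1, n_samples),
    'feature_3': np.random.normal(0, 1, n_samples),
})
data['target'] = ((data['feature_1'] + data['feature_2'] * 0.5) > 0).astype(int)

def auc_with_seeds(df, split_seed, model_seed):
    """Hypothetical helper: seed the split and the model independently (None = unseeded)."""
    X = df[['feature_1', 'feature_2', 'feature_3']]
    y = df['target']
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=split_seed
    )
    model = GradientBoostingClassifier(
        n_estimators=100, max_depth=3, random_state=model_seed
    )
    model.fit(X_train, y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

split_only = [auc_with_seeds(data, None, 0) for _ in range(3)]  # only the split varies
model_only = [auc_with_seeds(data, 0, None) for _ in range(3)]  # only the model varies
both_fixed = [auc_with_seeds(data, 0, 0) for _ in range(3)]     # nothing varies

print(f"Split varies: variance = {np.var(split_only):.6f}")
print(f"Model varies: variance = {np.var(model_only):.6f}")
print(f"Both seeded:  identical = {len(set(both_fixed)) == 1}")
```

On continuous features like these, the model-only variance may be very small or even zero. With `subsample=1.0`, the model's randomness comes from permuting features at each split, and that permutation mainly matters when candidate splits tie. Both components still need a seed, because ties and subsampling can appear in other data or configurations.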

### Step 2: TODO - Fix the pipeline

In [None]:
# TODO: Create reproducible training pipeline

# Uncomment and complete:

# def fixed_train_pipeline(df, random_state=42):
#     """Reproducible training pipeline."""
#     
#     features = ['feature_1', 'feature_2', 'feature_3']
#     X = df[features]
#     y = df['target']
#     
#     # FIX 1: Add random_state to train_test_split
#     X_train, X_test, y_train, y_test = train_test_split(
#         X, y, test_size=0.2, random_state=random_state
#     )
#     
#     # FIX 2: Add random_state to model
#     model = GradientBoostingClassifier(
#         n_estimators=100,
#         max_depth=3,
#         random_state=random_state
#     )
#     
#     model.fit(X_train, y_train)
#     
#     # Evaluate
#     y_prob = model.predict_proba(X_test)[:, 1]
#     auc = roc_auc_score(y_test, y_prob)
#     
#     return auc
# 
# print("✓ Fixed pipeline defined")

In [None]:
# TODO: Verify reproducibility

# Uncomment:

# print("=== Testing Fixed Pipeline ===")
# fixed_results = [fixed_train_pipeline(data) for _ in range(5)]
# 
# print(f"\nAUC across 5 runs: {fixed_results}")
# print(f"Variance: {np.var(fixed_results):.6f}")
# 
# is_reproducible = len(set(fixed_results)) == 1
# print(f"\n{'✓ Reproducible!' if is_reproducible else '❌ Still not reproducible'}")

In [None]:
# ============================================
# SELF-CHECK
# ============================================

# Uncomment:

# # Run 3 times and verify identical results
# results = [fixed_train_pipeline(data) for _ in range(3)]
# assert len(set(results)) == 1, f"Results should be identical: {results}"
# 
# print("✓ Reproducibility fixed!")
# print(f"✓ Consistent AUC: {results[0]:.4f}")
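Matching AUCs are a fairly weak signal, since two different models can reach the same score. A stricter check compares the predicted probabilities element by element across runs. In the sketch below, `train_and_predict` is a hypothetical variant of `fixed_train_pipeline` that returns the test-set probabilities instead of the AUC. The data generation is repeated so the cell runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Same synthetic data as the setup cell
np.random.seed(42)
n_samples = 1000
data = pd.DataFrame({
    'feature_1': np.random.normal(0, 1, n_samples),
    'feature_2': np.random.normal(0, 1, n_samples),
    'feature_3': np.random.normal(0, 1, n_samples),
})
data['target'] = ((data['feature_1'] + data['feature_2'] * 0.5) > 0).astype(int)

def train_and_predict(df, random_state=42):
    """Hypothetical variant of fixed_train_pipeline that returns test-set probabilities."""
    X = df[['feature_1', 'feature_2', 'feature_3']]
    y = df['target']
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    model = GradientBoostingClassifier(
        n_estimators=100, max_depth=3, random_state=random_state
    )
    model.fit(X_train, y_train)
    return model.predict_proba(X_test)[:, 1]

runs = [train_and_predict(data) for _ in range(3)]

# Exact equality: same split and same model, so every probability should match
assert all(np.array_equal(runs[0], r) for r in runs[1:]), "Predictions differ between runs"
print("✓ Predictions identical across runs")
```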

### Step 3: Write your postmortem

In [None]:
postmortem = """
## Postmortem: The Unreproducible Model

### What happened:
- (Your answer: What symptoms indicated non-reproducibility?)

### Root cause:
- (Your answer: What was missing from the pipeline?)

### How to prevent:
- (Your answer: What should always be set for reproducibility?)

"""

print(postmortem)

---

## ✅ Drill Complete!

**Key lessons:**

1. **Set random_state everywhere.** Pass it to `train_test_split`, to every model, and to any other operation that draws random numbers.

2. **Test reproducibility explicitly.** Run the pipeline several times and assert that the metrics are identical.

3. **Use configuration files.** Store all hyperparameters, including random seeds, in one place.

4. **Version-control seeds.** Committing them with the code makes experiments traceable and repeatable.
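Lessons 3 and 4 can be combined by keeping every seed in a small config object that is saved alongside the results and committed with the code. The sketch below assumes a hypothetical `TrainConfig` dataclass serialized to JSON; a YAML file or an experiment tracker would serve the same purpose.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical config: every hyperparameter and seed in one place."""
    test_size: float = 0.2
    n_estimators: int = 100
    max_depth: int = 3
    split_seed: int = 42
    model_seed: int = 42

def save_config(cfg, path):
    """Write the config as JSON so it can be committed next to the code."""
    with open(path, "w") as f:
        json.dump(asdict(cfg), f, indent=2, sort_keys=True)

def load_config(path):
    """Rebuild the config from JSON to rerun an experiment exactly."""
    with open(path) as f:
        return TrainConfig(**json.load(f))

cfg = TrainConfig()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train_config.json")
    save_config(cfg, path)
    restored = load_config(path)

print(restored == cfg)  # the round trip preserves every seed and hyperparameter
```

In the pipeline, these fields would feed `train_test_split(..., random_state=cfg.split_seed)` and `GradientBoostingClassifier(..., random_state=cfg.model_seed)`. Keeping separate seeds for the split and the model makes it possible to vary one while holding the other fixed.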

---

## Reproducibility Checklist

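| Component | How to seed |
|-----------|-------------|
| `train_test_split` | `random_state` (required) |
| `KFold` / cross-validation splitters | `random_state` (required when `shuffle=True`) |
| RandomForest | `random_state` (required) |
| GradientBoosting | `random_state` (required) |
| Neural networks | `random_state` in scikit-learn's `MLPClassifier`; `torch.manual_seed` in PyTorch |
| Data shuffling | `random_state` (e.g. `DataFrame.sample(frac=1, random_state=...)`) |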
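For the cross-validation and data-shuffling rows, the sketch below shows what seeding looks like in scikit-learn and pandas. `KFold` only uses `random_state` when `shuffle=True`. With the default `shuffle=False`, the folds are already deterministic, and recent scikit-learn versions reject a `random_state` in that case. The toy arrays and the `fold_test_indices` helper are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # toy data, for illustration only

def fold_test_indices(cv, X):
    """Hypothetical helper: collect each fold's test indices so splitters can be compared."""
    return [test_idx.tolist() for _, test_idx in cv.split(X)]

# Shuffled folds: reproducible only with a fixed random_state
folds_a = fold_test_indices(KFold(n_splits=5, shuffle=True, random_state=0), X)
folds_b = fold_test_indices(KFold(n_splits=5, shuffle=True, random_state=0), X)
print("Seeded KFold repeats:", folds_a == folds_b)

# Unshuffled folds: deterministic contiguous blocks, no seed needed
print(fold_test_indices(KFold(n_splits=5), X))  # → [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]

# Row shuffling in pandas: pass random_state to DataFrame.sample
df = pd.DataFrame({'x': range(10)})
shuf_a = df.sample(frac=1, random_state=0)
shuf_b = df.sample(frac=1, random_state=0)
print("Seeded shuffle repeats:", shuf_a.index.equals(shuf_b.index))
```

Passing a seeded splitter object as `cv=` to `cross_val_score` carries the same guarantee into cross-validation.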