# Monitoring CLV Model Drift

This notebook demonstrates how to monitor the performance of customer lifetime value (CLV) models over time and detect when they need retraining.

**What you'll learn:**
- Time-based train/test splits for CLV models
- Monitoring prediction accuracy over time
- Detecting statistical drift in customer behavior
- Setting thresholds for model retraining
- Rolling window validation

**Why monitoring matters:**
- Customer behavior changes over time (seasonality, market shifts, product changes)
- Models trained on old data become stale
- Early detection prevents bad business decisions

In [None]:
from datetime import date, timedelta
from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from customer_base_audit.synthetic.generator import generate_customers, generate_transactions
from customer_base_audit.synthetic.scenarios import BASELINE_SCENARIO, HIGH_CHURN_SCENARIO
from customer_base_audit.foundation.data_mart import build_data_mart
from customer_base_audit.foundation.rfm import calculate_rfm
from customer_base_audit.models.bgnbd import BGNBD
from customer_base_audit.models.gamma_gamma import GammaGamma

## Step 1: Generate Data with Regime Change

We'll simulate a business that starts stable, then experiences a regime change (increased churn).
This mimics real scenarios like:
- New competitor enters market
- Product quality issues
- Price increase
- Economic downturn
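To make "regime change" concrete, here is a minimal sketch of the mechanism under a simplified process. It is not the `customer_base_audit` generator, whose internals are not shown here. Each customer buys at a Poisson rate while active and drops out with a fixed monthly churn probability, and that probability jumps at a change point. The function name and all parameter values are hypothetical.

```python
import numpy as np


def simulate_monthly_purchases(n_customers, n_months, change_month,
                               purchase_rate, churn_before, churn_after, seed=0):
    """Hypothetical toy simulator: monthly purchase counts for a fixed cohort.

    Returns an array of shape (n_months, n_customers). The monthly churn
    probability switches from churn_before to churn_after at change_month.
    """
    rng = np.random.default_rng(seed)
    active = np.ones(n_customers, dtype=bool)
    counts = np.zeros((n_months, n_customers), dtype=int)
    for month in range(n_months):
        churn = churn_before if month < change_month else churn_after
        # Active customers may churn at the start of each month; churn is permanent
        active &= rng.random(n_customers) >= churn
        # Active customers buy at a Poisson rate; churned customers buy nothing
        counts[month] = np.where(active, rng.poisson(purchase_rate, n_customers), 0)
    return counts


# 18 stable months followed by 6 high-churn months, mirroring the split used below
counts = simulate_monthly_purchases(
    n_customers=500, n_months=24, change_month=18,
    purchase_rate=0.8, churn_before=0.02, churn_after=0.15,
)
print(counts[:18].mean(), counts[18:].mean())
```

The drop in the mean after month 18 comes from the cohort shrinking faster, not from active customers buying less, which is the kind of shift a model fit on the stable period cannot anticipate.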

In [None]:
# Generate 18 months of stable data (training period)
train_start = date(2023, 1, 1)
train_end = date(2024, 6, 30)

customers_train = generate_customers(500, train_start, train_end, seed=100)
txns_train = generate_transactions(
    customers_train, train_start, train_end, 
    scenario=BASELINE_SCENARIO  # Stable baseline behavior
)

# Generate 6 months with regime change (high churn)
regime_start = date(2024, 7, 1)
regime_end = date(2024, 12, 31)

# Continue with same customers but different behavior
txns_regime = generate_transactions(
    customers_train, regime_start, regime_end,
    scenario=HIGH_CHURN_SCENARIO  # Changed behavior
)

# Combine all transactions
all_txns = txns_train + txns_regime

print(f"Training period: {train_start} to {train_end} ({len(txns_train)} txns)")
print(f"Regime change: {regime_start} to {regime_end} ({len(txns_regime)} txns)")
print(f"Total transactions: {len(all_txns)}")

## Step 2: Train Model on Historical Data

Train BG/NBD + Gamma-Gamma models using only the stable period.
This represents the scenario where you trained a model and deployed it.
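BG/NBD is usually fit on three per-customer summaries measured at a snapshot date. In the convention of Fader, Hardie and Lee (2005), also used by the `lifetimes` library, *frequency* is the number of repeat purchases (distinct purchase days after the first), *recency* is the time from the first to the last purchase, and *T* is the time from the first purchase to the snapshot. That recency is not "days since last purchase", so whether `calculate_rfm` converts `recency_col='days_since_last_purchase'` into this convention is worth checking against its documentation. The sketch below computes the summaries in days from raw `(customer_id, date)` pairs; `bgnbd_summary` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date


def bgnbd_summary(purchases, snapshot):
    """Hypothetical helper: per-customer (frequency, recency, T) in days.

    purchases: iterable of (customer_id, purchase_date) pairs.
    Purchases after `snapshot` are ignored; multiple purchases on one day count once.
    """
    purchase_days = defaultdict(set)
    for customer_id, purchase_date in purchases:
        if purchase_date <= snapshot:
            purchase_days[customer_id].add(purchase_date)

    summary = {}
    for customer_id, days in purchase_days.items():
        first, last = min(days), max(days)
        frequency = len(days) - 1          # repeat purchases only
        recency = (last - first).days      # first purchase -> last purchase
        age = (snapshot - first).days      # first purchase -> snapshot (T)
        summary[customer_id] = (frequency, recency, age)
    return summary


purchases = [
    ("a", date(2024, 1, 1)), ("a", date(2024, 1, 11)), ("a", date(2024, 1, 31)),
    ("b", date(2024, 2, 1)),
]
print(bgnbd_summary(purchases, snapshot=date(2024, 2, 10)))
# {'a': (2, 30, 40), 'b': (0, 0, 9)}
```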

In [None]:
# Build data mart for training period
mart_train = build_data_mart(
    customers_train, 
    txns_train, 
    snapshot_date=train_end
)

# Calculate RFM
rfm_train = calculate_rfm(
    mart_train,
    customer_id_col='customer_id',
    frequency_col='purchase_count',
    recency_col='days_since_last_purchase',
    monetary_col='avg_order_value',
    t_col='customer_age_days'
)

# Train BG/NBD model
print("Training BG/NBD model on stable period...")
bgnbd = BGNBD()
bgnbd.fit(
    rfm_train['frequency'],
    rfm_train['recency'],
    rfm_train['T']
)

# Train Gamma-Gamma model (only on repeat customers)
rfm_repeat = rfm_train[rfm_train['frequency'] > 0]
print(f"Training Gamma-Gamma on {len(rfm_repeat)} repeat customers...")
gg = GammaGamma()
gg.fit(
    rfm_repeat['frequency'],
    rfm_repeat['monetary_value']
)

print("\nTrained model parameters:")
print(f"BG/NBD: r={bgnbd.params_['r']:.4f}, alpha={bgnbd.params_['alpha']:.4f}, a={bgnbd.params_['a']:.4f}, b={bgnbd.params_['b']:.4f}")
print(f"Gamma-Gamma: p={gg.params_['p']:.4f}, q={gg.params_['q']:.4f}, v={gg.params_['v']:.4f}")

## Step 3: Rolling Window Monitoring

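Keep the trained model frozen and score it on each calendar month after training, comparing predicted with actual purchase counts per customer.
Growing error, or a widening gap between predicted and actual purchase rates, signals drift.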
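The monthly forecast rests on one identity: for a model frozen at the snapshot, the expected number of purchases in a window from `s` to `e` time units later is E[X(e)] - E[X(s)], the difference of two cumulative conditional expectations. As an illustration independent of the `BGNBD` class (whose internals are not shown), the sketch below codes the BG/NBD conditional expectation from Fader, Hardie and Lee (2005) with `scipy.special.hyp2f1` and applies that identity. The parameter values are made up, and all times must use the unit the model was fit in.

```python
from scipy.special import hyp2f1


def bgnbd_expected_purchases(t, x, t_x, T, r, alpha, a, b):
    """Expected purchases in (T, T + t] for a customer with x repeat purchases,
    recency t_x and age T, under BG/NBD parameters (r, alpha, a, b)."""
    if t <= 0:
        return 0.0
    z = t / (alpha + T + t)
    first = (a + b + x - 1) / (a - 1)
    second = 1 - ((alpha + T) / (alpha + T + t)) ** (r + x) * hyp2f1(r + x, b + x, a + b + x - 1, z)
    # The "may already have dropped out" correction only applies to repeat buyers
    denominator = 1.0
    if x > 0:
        denominator += (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return first * second / denominator


def expected_in_window(start, end, x, t_x, T, params):
    """Expected purchases between `start` and `end` time units after the snapshot."""
    return (bgnbd_expected_purchases(end, x, t_x, T, **params)
            - bgnbd_expected_purchases(start, x, t_x, T, **params))


params = dict(r=0.5, alpha=10.0, a=1.5, b=3.0)  # hypothetical fitted values (days)
# A customer with 3 repeat purchases, the last on day 90, observed for 100 days
july = expected_in_window(0, 31, x=3, t_x=90, T=100, params=params)
august = expected_in_window(31, 62, x=3, t_x=90, T=100, params=params)
print(round(july, 3), round(august, 3))
```

Reusing a single fixed-horizon forecast for every month would compare later months against a forecast for the first month only.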

In [None]:
def calculate_monthly_actuals(transactions, customers, month_start, month_end):
    """Count actual purchases per customer between month_start and month_end (inclusive).

    Every customer in `customers` gets an entry, 0 if they made no purchase in the month.
    """
    actual_counts = {c.customer_id: 0 for c in customers}
    for t in transactions:
        if month_start <= t.transaction_date <= month_end:
            actual_counts[t.customer_id] = actual_counts.get(t.customer_id, 0) + 1
    return actual_counts


def cumulative_forecast(days_ahead):
    """Expected purchases per customer in the `days_ahead` days after train_end (frozen model)."""
    if days_ahead <= 0:
        return np.zeros(len(rfm_train))
    # np.asarray drops any pandas index so values align positionally with rfm_train rows
    return np.asarray(
        bgnbd.predict(
            rfm_train['frequency'],
            rfm_train['recency'],
            rfm_train['T'],
            t=days_ahead
        ),
        dtype=float
    )


# Monitor performance on each of the 6 months after training (July-December 2024)
monitoring_results = []

for month_offset in range(6):
    month_start = date(2024, 7 + month_offset, 1)
    # Last day of the month: the day before the 1st of the next month (December handled directly)
    if month_offset < 5:
        month_end = date(2024, 7 + month_offset + 1, 1) - timedelta(days=1)
    else:
        month_end = date(2024, 12, 31)

    # The model stays frozen at train_end, so the forecast for this calendar month is the
    # difference of two cumulative forecasts, with offsets counted in days from train_end.
    # A single 30-day forecast reused for every month would only approximate the first month.
    start_offset = (month_start - train_end).days - 1
    end_offset = (month_end - train_end).days
    pred_values = cumulative_forecast(end_offset) - cumulative_forecast(start_offset)

    # Actual purchases in this month, in the same customer order as rfm_train
    actuals = calculate_monthly_actuals(all_txns, customers_train, month_start, month_end)
    actual_values = np.array([actuals[cid] for cid in rfm_train['customer_id']], dtype=float)

    # Error metrics
    errors = pred_values - actual_values
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))

    # Mean predicted vs actual purchases; positive bias means the model over-predicts
    mean_pred = pred_values.mean()
    mean_actual = actual_values.mean()
    bias = mean_pred - mean_actual

    monitoring_results.append({
        'month': month_start.strftime('%Y-%m'),
        'mae': mae,
        'rmse': rmse,
        'mean_predicted': mean_pred,
        'mean_actual': mean_actual,
        'bias': bias
    })

monitor_df = pd.DataFrame(monitoring_results)
print("\nMonthly Monitoring Results:")
print(monitor_df.to_string(index=False))

## Step 4: Visualize Model Drift

Plot prediction error over time to detect drift.
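The dashed alert line and the drift check in the next cell both use the first monitored month as the baseline, and the check only tests the last month. A hedged variant, sketched with a hypothetical `flag_error_drift` helper, flags every month that breaches the threshold, which makes a one-off spike easier to tell apart from a sustained rise.

```python
def flag_error_drift(maes, max_increase_pct=50.0):
    """Hypothetical helper: flag months whose MAE exceeds the first month's
    MAE by more than max_increase_pct percent."""
    if len(maes) == 0:
        raise ValueError("need at least one month of MAE values")
    baseline = maes[0]
    if baseline <= 0:
        raise ValueError("baseline MAE must be positive")
    limit = baseline * (1 + max_increase_pct / 100)
    return [mae > limit for mae in maes]


print(flag_error_drift([0.20, 0.25, 0.31, 0.29]))  # [False, False, True, False]
```

In the notebook this could be called as `flag_error_drift(monitor_df['mae'].tolist())`.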

In [None]:
fig, axes = plt.subplots(2, 1, figsize=(12, 8))

# Plot 1: MAE and RMSE over time
axes[0].plot(monitor_df['month'], monitor_df['mae'], 'o-', label='MAE', linewidth=2)
axes[0].plot(monitor_df['month'], monitor_df['rmse'], 's-', label='RMSE', linewidth=2)
axes[0].axhline(y=monitor_df['mae'].iloc[0] * 1.5, color='r', linestyle='--', label='Alert Threshold (50% increase)')
axes[0].set_xlabel('Month')
axes[0].set_ylabel('Error')
axes[0].set_title('Prediction Error Over Time')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
axes[0].tick_params(axis='x', rotation=45)

# Plot 2: Predicted vs Actual mean purchases
axes[1].plot(monitor_df['month'], monitor_df['mean_predicted'], 'o-', label='Predicted', linewidth=2)
axes[1].plot(monitor_df['month'], monitor_df['mean_actual'], 's-', label='Actual', linewidth=2)
axes[1].set_xlabel('Month')
axes[1].set_ylabel('Mean Purchases per Customer')
axes[1].set_title('Predicted vs Actual Purchase Rate')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
axes[1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

# Detect drift
baseline_mae = monitor_df['mae'].iloc[0]
current_mae = monitor_df['mae'].iloc[-1]
drift_pct = ((current_mae - baseline_mae) / baseline_mae) * 100

print(f"\n🔍 Drift Detection:")
print(f"Baseline MAE (first month): {baseline_mae:.4f}")
print(f"Current MAE (last month): {current_mae:.4f}")
print(f"Drift: {drift_pct:+.1f}%")

if drift_pct > 50:
    print("\n⚠️  ALERT: Model drift detected! Consider retraining.")
else:
    print("\n✅ Model performance stable.")

## Step 5: Monitor RFM Distribution Drift

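Track how the RFM distributions change between snapshots.
Significant shifts can indicate changing customer behavior, but these are cumulative metrics for a fixed cohort: customer age grows with every snapshot, frequency can only increase, and recency moves as time passes. Some shift therefore appears even when behavior is unchanged.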
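The histograms give a visual comparison. To put a single number on the size of a shift, one common option, not used elsewhere in this notebook, is the Population Stability Index (PSI): the sum over bins of (current share - baseline share) × ln(current share / baseline share). The sketch below computes it on shared equal-width bins; the bin count and the `eps` floor for empty bins are assumptions, and NaNs (for example monetary values of one-time buyers, if stored that way) should be dropped first. A commonly quoted rule of thumb treats PSI above about 0.25 as a large shift, but that cutoff is a convention rather than a statistical test.

```python
import numpy as np


def population_stability_index(baseline, current, bins=10, eps=1e-6):
    """PSI between two samples, binned on shared equal-width edges.

    eps keeps empty bins from producing log(0) or division by zero.
    """
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)
    edges = np.histogram_bin_edges(np.concatenate([baseline, current]), bins=bins)
    p_base = np.histogram(baseline, bins=edges)[0] / len(baseline)
    p_cur = np.histogram(current, bins=edges)[0] / len(current)
    p_base = np.clip(p_base, eps, None)
    p_cur = np.clip(p_cur, eps, None)
    # Each term is >= 0, so PSI is 0 only when the binned shares match exactly
    return float(np.sum((p_cur - p_base) * np.log(p_cur / p_base)))


rng = np.random.default_rng(0)
same = population_stability_index(rng.normal(0, 1, 5000), rng.normal(0, 1, 5000))
shifted = population_stability_index(rng.normal(0, 1, 5000), rng.normal(1, 1, 5000))
print(round(same, 3), round(shifted, 3))
```

Unlike a KS p-value, PSI does not shrink toward "significant" just because the sample is large, which makes it easier to compare across months with different customer counts.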

In [None]:
import calendar

# Calculate RFM at three points: training end, +3 months, +6 months
rfm_snapshots = []

for month_offset in [0, 3, 6]:
    if month_offset == 0:
        snapshot_date = train_end
        txns_subset = txns_train
        label = "Training (2024-06)"
    else:
        # Snapshot on the last day of the month (2024-09-30 and 2024-12-31)
        snapshot_month = 6 + month_offset
        last_day = calendar.monthrange(2024, snapshot_month)[1]
        snapshot_date = date(2024, snapshot_month, last_day)
        txns_subset = [t for t in all_txns if t.transaction_date <= snapshot_date]
        label = f"+{month_offset}mo (2024-{snapshot_month:02d})"
    
    mart_snapshot = build_data_mart(customers_train, txns_subset, snapshot_date=snapshot_date)
    rfm_snapshot = calculate_rfm(
        mart_snapshot,
        customer_id_col='customer_id',
        frequency_col='purchase_count',
        recency_col='days_since_last_purchase',
        monetary_col='avg_order_value',
        t_col='customer_age_days'
    )
    
    rfm_snapshots.append({
        'label': label,
        'date': snapshot_date,
        'rfm': rfm_snapshot
    })

# Compare distributions
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

metrics = ['frequency', 'recency', 'monetary_value']
titles = ['Frequency Distribution', 'Recency Distribution', 'Monetary Value Distribution']

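for idx, (metric, title) in enumerate(zip(metrics, titles)):
    # Shared bin edges (NaNs dropped) so the overlaid histograms are directly comparable
    pooled = np.concatenate([np.asarray(s['rfm'][metric], dtype=float) for s in rfm_snapshots])
    shared_bins = np.histogram_bin_edges(pooled[~np.isnan(pooled)], bins=20)
    for snapshot in rfm_snapshots:
        axes[idx].hist(
            snapshot['rfm'][metric],
            bins=shared_bins,
            alpha=0.5,
            label=snapshot['label'],
            edgecolor='black'
        )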
    axes[idx].set_xlabel(metric.replace('_', ' ').title())
    axes[idx].set_ylabel('Count')
    axes[idx].set_title(title)
    axes[idx].legend()
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate distribution statistics
print("\nRFM Distribution Statistics:")
print("=" * 80)
for snapshot in rfm_snapshots:
    print(f"\n{snapshot['label']}:")
    for metric in metrics:
        mean_val = snapshot['rfm'][metric].mean()
        std_val = snapshot['rfm'][metric].std()
        print(f"  {metric:20s}: mean={mean_val:8.2f}, std={std_val:8.2f}")

## Step 6: Statistical Drift Detection (Kolmogorov-Smirnov Test)

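Use the two-sample Kolmogorov-Smirnov (KS) test to check whether each RFM metric's distribution has changed significantly.

Treat the p-values as rough signals rather than exact error rates: the test assumes continuous data in two independent samples, whereas frequency is a discrete count and both snapshots describe the same customers. Testing three metrics at p < 0.05 also raises the chance of at least one false alarm; a Bonferroni cutoff of 0.05 / 3 is a simple, conservative correction.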
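The KS statistic itself is simple: the largest vertical gap between the two samples' empirical cumulative distribution functions. The sketch below computes it directly with NumPy (the `ks_statistic` name and sample values are illustrative) and checks it against `scipy.stats.ks_2samp`, which reports the same quantity alongside a p-value.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_statistic(x, y):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    grid = np.concatenate([x, y])
    # Fraction of each sample that is <= every point on the pooled grid
    cdf_x = np.searchsorted(x, grid, side="right") / len(x)
    cdf_y = np.searchsorted(y, grid, side="right") / len(y)
    return float(np.max(np.abs(cdf_x - cdf_y)))


baseline_freq = [0, 0, 1, 1, 2, 3, 5, 8]
current_freq = [0, 0, 0, 0, 1, 1, 2, 3]
print(ks_statistic(baseline_freq, current_freq))           # 0.25
print(ks_2samp(baseline_freq, current_freq).statistic)     # 0.25
```

Because the statistic is a distance while the p-value also depends on sample size, reporting the KS statistic next to the p-value helps separate "large shift" from "large sample".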

In [None]:
from scipy.stats import ks_2samp

# Compare training distribution vs latest distribution
baseline_rfm = rfm_snapshots[0]['rfm']
current_rfm = rfm_snapshots[-1]['rfm']

print("Kolmogorov-Smirnov Test Results:")
print("=" * 60)
print("H0: Distributions are the same")
print("Ha: Distributions are different (drift detected)")
print()

for metric in metrics:
    statistic, pvalue = ks_2samp(
        baseline_rfm[metric],
        current_rfm[metric]
    )
    
    significance = "⚠️  DRIFT DETECTED" if pvalue < 0.05 else "✅ No drift"
    
    print(f"{metric.upper():20s}: KS={statistic:.4f}, p={pvalue:.4f} {significance}")

## Step 7: Retraining Decision Framework

Establish clear criteria for when to retrain models.

In [None]:
def should_retrain(monitor_df, baseline_rfm, current_rfm, thresholds=None):
    """
    Determine if model should be retrained based on multiple criteria.
    
    Args:
        monitor_df: DataFrame with monitoring metrics over time
        baseline_rfm: RFM dataframe from training period
        current_rfm: RFM dataframe from current period
        thresholds: Dict with threshold values
    
    Returns:
        tuple: (should_retrain: bool, reasons: list)
    """
    if thresholds is None:
        thresholds = {
            'mae_increase_pct': 50,  # Retrain if MAE increases >50%
            'ks_pvalue': 0.05,        # Retrain if any RFM distribution differs (p<0.05)
            'bias_threshold': 0.5,     # Retrain if bias > 0.5 purchases/customer
        }
    
    reasons = []
    
    # Check 1: MAE increase
    baseline_mae = monitor_df['mae'].iloc[0]
    current_mae = monitor_df['mae'].iloc[-1]
    mae_increase_pct = ((current_mae - baseline_mae) / baseline_mae) * 100
    
    if mae_increase_pct > thresholds['mae_increase_pct']:
        reasons.append(f"MAE increased by {mae_increase_pct:.1f}% (threshold: {thresholds['mae_increase_pct']}%)")
    
    # Check 2: Distribution drift (KS test)
    for metric in ['frequency', 'recency', 'monetary_value']:
        _, pvalue = ks_2samp(baseline_rfm[metric], current_rfm[metric])
        if pvalue < thresholds['ks_pvalue']:
            reasons.append(f"{metric} distribution changed significantly (p={pvalue:.4f})")
    
    # Check 3: Prediction bias
    current_bias = abs(monitor_df['bias'].iloc[-1])
    if current_bias > thresholds['bias_threshold']:
        reasons.append(f"Prediction bias too high ({current_bias:.2f} purchases/customer)")
    
    retrain_needed = len(reasons) > 0
    
    return retrain_needed, reasons

# Run retraining decision
retrain, reasons = should_retrain(monitor_df, baseline_rfm, current_rfm)

print("\n" + "=" * 70)
print("RETRAINING DECISION")
print("=" * 70)

if retrain:
    print("\n⚠️  RECOMMENDATION: RETRAIN MODEL\n")
    print("Reasons:")
    for i, reason in enumerate(reasons, 1):
        print(f"  {i}. {reason}")
    print("\nNext steps:")
    print("  1. Retrain BG/NBD and Gamma-Gamma models on recent data")
    print("  2. Validate new model on holdout set")
    print("  3. A/B test old vs new model predictions")
    print("  4. Deploy new model if performance improves")
else:
    print("\n✅ RECOMMENDATION: KEEP CURRENT MODEL\n")
    print("Model performance is stable. Continue monitoring.")

## Step 8: Retrain on Recent Data

If drift is detected, retrain models on more recent data.
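The cell below hardcodes a 12-month window. If the check runs on a schedule, the window start can instead be derived from the snapshot date. A minimal sketch with hypothetical helper names; it counts whole calendar months, so a snapshot on 2024-12-31 with `months=12` starts on 2024-01-01, matching the dates used in this step.

```python
from datetime import date


def training_window_start(snapshot: date, months: int) -> date:
    """First day of a window covering `months` calendar months up to `snapshot`."""
    if months < 1:
        raise ValueError("months must be at least 1")
    # Count months from year 0 so the subtraction wraps across year boundaries
    month_index = snapshot.year * 12 + (snapshot.month - 1) - (months - 1)
    return date(month_index // 12, month_index % 12 + 1, 1)


def in_training_window(txn_date: date, snapshot: date, months: int) -> bool:
    """True if txn_date falls inside the `months`-month window ending at snapshot."""
    return training_window_start(snapshot, months) <= txn_date <= snapshot


print(training_window_start(date(2024, 12, 31), 12))  # 2024-01-01
print(training_window_start(date(2024, 6, 30), 12))   # 2023-07-01
```

With this helper the filter below would read `[t for t in all_txns if in_training_window(t.transaction_date, retrain_end, 12)]`.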

In [None]:
if retrain:
    print("Retraining models on recent data (last 12 months)...\n")
    
    # Use last 12 months for retraining
    retrain_start = date(2024, 1, 1)
    retrain_end = date(2024, 12, 31)
    
    txns_retrain = [t for t in all_txns if retrain_start <= t.transaction_date <= retrain_end]
    
    # Build new data mart
    mart_retrain = build_data_mart(customers_train, txns_retrain, snapshot_date=retrain_end)
    rfm_retrain = calculate_rfm(
        mart_retrain,
        customer_id_col='customer_id',
        frequency_col='purchase_count',
        recency_col='days_since_last_purchase',
        monetary_col='avg_order_value',
        t_col='customer_age_days'
    )
    
    # Train new models
    bgnbd_new = BGNBD()
    bgnbd_new.fit(
        rfm_retrain['frequency'],
        rfm_retrain['recency'],
        rfm_retrain['T']
    )
    
    rfm_repeat_new = rfm_retrain[rfm_retrain['frequency'] > 0]
    gg_new = GammaGamma()
    gg_new.fit(
        rfm_repeat_new['frequency'],
        rfm_repeat_new['monetary_value']
    )
    
    print("\nParameter Comparison:")
    print("=" * 70)
    print("\nBG/NBD:")
    print(f"  Old model: r={bgnbd.params_['r']:.4f}, alpha={bgnbd.params_['alpha']:.4f}, a={bgnbd.params_['a']:.4f}, b={bgnbd.params_['b']:.4f}")
    print(f"  New model: r={bgnbd_new.params_['r']:.4f}, alpha={bgnbd_new.params_['alpha']:.4f}, a={bgnbd_new.params_['a']:.4f}, b={bgnbd_new.params_['b']:.4f}")
    
    print("\nGamma-Gamma:")
    print(f"  Old model: p={gg.params_['p']:.4f}, q={gg.params_['q']:.4f}, v={gg.params_['v']:.4f}")
    print(f"  New model: p={gg_new.params_['p']:.4f}, q={gg_new.params_['q']:.4f}, v={gg_new.params_['v']:.4f}")
    
    print("\n✅ Models retrained successfully!")
else:
    print("No retraining needed at this time.")

## Key Takeaways

### When to Monitor:
- **Monthly**: For businesses with stable customer bases
- **Weekly**: For fast-moving consumer goods or seasonal businesses
- **After major events**: Product launches, price changes, competitor actions

### What to Monitor:
1. **Prediction accuracy** (MAE, RMSE)
2. **Prediction bias** (over/under prediction)
3. **RFM distribution changes** (KS tests)
4. **Business metrics** (actual purchase rates, revenue)

### Retraining Triggers:
- Prediction error (MAE) rises more than 50% above its post-deployment baseline
- Statistically significant RFM drift (p < 0.05)
- Persistent prediction bias
- Major business changes (new product, pricing, market conditions)
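`should_retrain` in Step 7 checks the bias of the latest month only. One way to encode "persistent" is to require the bias to breach the threshold in the same direction for several consecutive months, which also guards against retraining on a single noisy month. A sketch; the helper name and the default of 3 months are assumptions, and in the notebook it could be fed `monitor_df['bias']`.

```python
def persistent_bias(biases, threshold, min_consecutive=3):
    """Hypothetical rule: True if the last `min_consecutive` biases all exceed
    `threshold` in the same direction (all over- or all under-predicting)."""
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    recent = list(biases)[-min_consecutive:]
    if len(recent) < min_consecutive:
        return False  # not enough history yet
    return all(b > threshold for b in recent) or all(b < -threshold for b in recent)


print(persistent_bias([0.1, 0.6, 0.7, 0.8], threshold=0.5))   # True
print(persistent_bias([0.6, -0.7, 0.8], threshold=0.5))       # False
```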

### Best Practices:
- Set up automated monitoring dashboards
- Document baseline metrics when deploying models
- Use rolling windows for training (e.g., last 12 months)
- A/B test old vs new models before full deployment
- Keep historical model versions for comparison
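Baseline metrics and model versions are easiest to compare later if every deployment is logged the same way. A minimal sketch, assuming a JSON Lines file (one JSON object per line) is an acceptable store; the function names, fields and example values are illustrative, and in this notebook `params` could be the `params_` dict printed in Step 2.

```python
import json
from datetime import date


def deployment_record(model_name, trained_through, params, baseline_mae):
    """Hypothetical record of what was deployed and how it performed at deployment."""
    return {
        "model": model_name,
        "trained_through": trained_through.isoformat(),
        # float() turns NumPy scalars into plain floats that json can serialize
        "params": {k: float(v) for k, v in params.items()},
        "baseline_mae": float(baseline_mae),
    }


def append_record(path, record):
    """Append one JSON object per line, so earlier model versions are never overwritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


record = deployment_record("bgnbd", date(2024, 6, 30),
                           {"r": 0.5, "alpha": 10.0, "a": 1.5, "b": 3.0},
                           baseline_mae=0.21)
print(json.dumps(record, sort_keys=True))
```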

### Common Pitfalls:
- ❌ Waiting too long to retrain (stale models hurt business)
- ❌ Retraining too frequently (adds complexity, may overfit noise)
- ❌ Ignoring seasonality when detecting drift
- ❌ Not validating retrained models before deployment
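A simple guard against mistaking seasonality for drift is to compare each month with the same calendar month one year earlier, rather than with the previous month or the first monitored month. This sketch assumes at least a year of stored monitoring history, which the six-month example above does not have; the function name is hypothetical.

```python
def year_over_year_change(monthly_metric):
    """Percent change of each month's metric versus the same calendar month a year earlier.

    monthly_metric maps 'YYYY-MM' strings (the format of monitor_df['month']) to values.
    Months without a prior-year value (or with a zero prior value) are skipped.
    """
    changes = {}
    for month, value in monthly_metric.items():
        year, mon = month.split("-")
        prior = monthly_metric.get(f"{int(year) - 1}-{mon}")
        if prior:  # skips missing (None) and zero baselines
            changes[month] = (value - prior) / prior * 100
    return changes


mae_by_month = {"2023-12": 0.20, "2024-11": 0.26, "2024-12": 0.30}
print(year_over_year_change(mae_by_month))
```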