## 9. OFFLINE LEARNING & DEPLOYMENT ISSUES

### Definition
**Offline Learning Problem**: A model trained once on a fixed snapshot of historical data is frozen at deployment, so it cannot adapt when real-world patterns change.

### The Problem: Concept Drift

```
Training (Month 1):
  - Users prefer Product A
  - Model learns: "Recommend A"

Month 3 (Real World):
  - Preferences changed!
  - Users now prefer Product B
  - But model still recommends A (based on month 1 data)
  - → Poor performance!
```
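
The scenario above can be reproduced with a toy example. This is a minimal sketch, assuming a "model" that simply recommends the most popular product in its training data; the preference counts are invented for illustration.

```python
from collections import Counter


def train_majority_recommender(preferences):
    """Return the most frequently preferred product in the training data."""
    return Counter(preferences).most_common(1)[0][0]


def accuracy(recommendation, preferences):
    """Fraction of users whose preference matches the fixed recommendation."""
    return sum(p == recommendation for p in preferences) / len(preferences)


# Toy data (assumed for illustration): preferences flip between months.
month_1 = ["A"] * 80 + ["B"] * 20
month_3 = ["A"] * 25 + ["B"] * 75

# Trained once on month 1 and never updated afterwards.
frozen_model = train_majority_recommender(month_1)

print(f"Recommends: {frozen_model}")
print(f"Month 1 accuracy: {accuracy(frozen_model, month_1):.2f}")
print(f"Month 3 accuracy: {accuracy(frozen_model, month_3):.2f}")
```

The model itself did not change between the two evaluations; only the data did, which is what makes drift hard to notice without monitoring.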

### Real-World Examples:


In [None]:
# Example 1: Stock Price Prediction
# Trained: 2019 data (normal market)
# Deployed: 2020 during COVID crash
# Result: Model fails (market behavior completely different)

# Example 2: Spam Detection
# Trained: 2020 spam patterns
# Deployed: 2024, new spam tactics
# Result: Misses new spam types

# Example 3: Product Demand
# Trained: Pre-pandemic shopping
# Deployed: During pandemic (online boom)
# Result: Predictions way off

# Example 4: Credit Scoring
# Trained: Traditional credit patterns
# Deployed: Gig-economy income, crypto, and new payment methods are now common
# Result: Can be unfair to applicants with non-traditional credit histories


### Detecting Concept Drift:


In [None]:
import numpy as np
from sklearn.metrics import accuracy_score

def detect_concept_drift(y_true, y_pred, window_size=100, threshold=0.05):
    """
    Flag a drop in accuracy between the two most recent windows of labeled
    predictions. A drop is a symptom of possible concept drift, not proof.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    if len(y_true) < 2 * window_size:
        raise ValueError("Need at least 2 * window_size labeled predictions")

    # Compare the most recent window with the one just before it
    recent_accuracy = accuracy_score(y_true[-window_size:], y_pred[-window_size:])
    old_accuracy = accuracy_score(y_true[-2 * window_size:-window_size],
                                  y_pred[-2 * window_size:-window_size])

    drift_magnitude = old_accuracy - recent_accuracy

    print(f"Old accuracy: {old_accuracy:.3f}")
    print(f"Recent accuracy: {recent_accuracy:.3f}")
    print(f"Performance drop: {drift_magnitude:.3f}")

    if drift_magnitude > threshold:  # default: a drop of 5 percentage points
        print("⚠️  CONCEPT DRIFT DETECTED!")
        return True
    else:
        print("✅ No significant drift")
        return False

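# Statistical test: two-sample Kolmogorov-Smirnov test, one feature at a time
from scipy.stats import ks_2samp

def detect_data_drift(X_train, X_recent, alpha=0.05):
    """
    Detect whether the input distribution changed (data drift).
    Returns the indices of the features flagged as drifted.
    """
    X_train, X_recent = np.asarray(X_train), np.asarray(X_recent)
    n_features = X_train.shape[1]

    # Bonferroni correction: testing many features inflates false alarms
    corrected_alpha = alpha / n_features
    drifted = []

    print("Feature drift p-values:")
    for i in range(n_features):
        _, p_value = ks_2samp(X_train[:, i], X_recent[:, i])
        if p_value < corrected_alpha:
            drifted.append(i)
            status = "⚠️  DRIFTED"
        else:
            status = "✅"
        print(f"  Feature {i}: {p_value:.4f} {status}")

    return drifted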
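
In production, labels usually arrive one at a time rather than as complete arrays. The same accuracy comparison can be run as a rolling check over a stream. The sketch below is one possible way to do that, not a standard library API: the class name `RollingAccuracyMonitor`, its parameters, and the synthetic stream are all assumptions made for illustration.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track accuracy over the last `window_size` predictions and flag drops."""

    def __init__(self, baseline_accuracy, window_size=100, max_drop=0.05):
        self.baseline_accuracy = baseline_accuracy
        self.max_drop = max_drop
        self.hits = deque(maxlen=window_size)  # 1 = correct, 0 = wrong

    def update(self, y_true, y_pred):
        """Record one labeled prediction; return True if drift is suspected."""
        self.hits.append(int(y_true == y_pred))
        if len(self.hits) < self.hits.maxlen:
            return False  # window not full yet, too little evidence
        rolling_accuracy = sum(self.hits) / len(self.hits)
        return (self.baseline_accuracy - rolling_accuracy) > self.max_drop


# Synthetic stream (assumed): a stand-in model that always predicts class 1.
monitor = RollingAccuracyMonitor(baseline_accuracy=0.9, window_size=100)
first_alarm = None

for t in range(400):
    y_pred = 1
    if t < 200:
        y_true = 0 if t % 10 == 0 else 1  # model is right 90% of the time
    else:
        y_true = t % 2                    # concept changed: right 50% of the time
    if monitor.update(y_true, y_pred) and first_alarm is None:
        first_alarm = t

print(f"Drift first flagged at step {first_alarm}")
```

The alarm lags the actual change because the window has to fill with enough post-drift samples. A smaller window reacts faster but is noisier.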


### Solutions to Concept Drift:

#### 1. **Online Learning**


In [None]:
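import numpy as np
from sklearn.linear_model import SGDClassifier

# Online learning: the model is updated one mini-batch at a time.
# loss='log_loss' is logistic regression (named 'log' before scikit-learn 1.1).
online_model = SGDClassifier(loss='log_loss')
classes = np.array([0, 1])  # partial_fit needs every class label on its first call

# Placeholders assumed to be defined elsewhere: streaming_data,
# recent_test_data, recent_test_labels, threshold, retrain_from_scratch()
for new_data, new_labels in streaming_data:
    # Incremental update with the new samples
    # (passing the same classes on every call is allowed)
    online_model.partial_fit(new_data, new_labels, classes=classes)

    # Monitor performance on a recent labeled hold-out set
    recent_performance = online_model.score(recent_test_data, recent_test_labels)

    if recent_performance < threshold:
        print("Performance degraded, retraining...")
        online_model = retrain_from_scratch()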
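
To see online updates recover from drift without any external dependencies, here is a minimal sketch of test-then-train (prequential) evaluation. It assumes a one-weight logistic model with no bias term as a stand-in for `SGDClassifier.partial_fit`; the data stream, drift point, and learning rate are invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Deterministic toy stream (assumed): one feature, label rule flips at step 200.
X_CYCLE = [0.9, -0.7, 0.5, -0.3, 0.8, -0.6, 0.4, -0.2]
DRIFT_STEP, N_STEPS, LEARNING_RATE = 200, 400, 0.5


def true_label(x, step):
    if step < DRIFT_STEP:
        return int(x > 0)  # old concept: positive x means class 1
    return int(x < 0)      # new concept: the rule is reversed


w = 0.0       # single weight, no bias term
correct = []  # 1 if the prediction made BEFORE the update was right

for step in range(N_STEPS):
    x = X_CYCLE[step % len(X_CYCLE)]
    y = true_label(x, step)

    # 1) Test: predict with the current weight.
    y_hat = int(w * x > 0)
    correct.append(int(y_hat == y))

    # 2) Train: one SGD step on the log loss for this single sample.
    w -= LEARNING_RATE * (sigmoid(w * x) - y) * x


def window_accuracy(start, end):
    return sum(correct[start:end]) / (end - start)


print(f"Before drift   (steps 100-199): {window_accuracy(100, 200):.2f}")
print(f"Right after    (steps 200-219): {window_accuracy(200, 220):.2f}")
print(f"After adapting (steps 300-399): {window_accuracy(300, 400):.2f}")
```

Scoring each sample before training on it gives an honest running estimate, because the model is always evaluated on data it has not yet seen.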


#### 2. **Scheduled Retraining**


In [None]:
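import schedule
import time

def retrain_model():
    """Retrain on recent data; deploy only if the new model wins"""
    print("Retraining model...")

    # load_recent_data, train_model, validate, load_model and save_model
    # are placeholders for your own pipeline code.

    # Get recent data
    recent_data = load_recent_data(days=30)

    # Retrain
    new_model = train_model(recent_data)

    # Validate both models on the same data so the comparison is fair
    val_score = validate(new_model)
    current_score = validate(load_model())

    if val_score > current_score:
        # Deploy new model
        save_model(new_model)
        print("✅ New model deployed")
    else:
        print("❌ New model underperforms, keeping old one")

# Schedule daily retraining
schedule.every().day.at("02:00").do(retrain_model)

# Blocks forever: run this in a dedicated process, not in a notebook cell
while True:
    schedule.run_pending()
    time.sleep(60)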
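
The "deploy only if better" decision is easier to test when it is separated from the scheduler. Below is a minimal sketch of such a promotion gate, assuming both models are scored on the same holdout set. The names `choose_model` and `min_gain`, and the threshold-rule "models", are hypothetical stand-ins.

```python
def accuracy(model, X, y):
    """Fraction of holdout rows the model labels correctly."""
    return sum(model(x) == label for x, label in zip(X, y)) / len(y)


def choose_model(champion, challenger, X_holdout, y_holdout, min_gain=0.01):
    """Return (winner_name, champion_score, challenger_score).

    The challenger is promoted only if it beats the champion by at least
    `min_gain` on the SAME holdout set.
    """
    champion_score = accuracy(champion, X_holdout, y_holdout)
    challenger_score = accuracy(challenger, X_holdout, y_holdout)
    if challenger_score >= champion_score + min_gain:
        return "challenger", champion_score, challenger_score
    return "champion", champion_score, challenger_score


# Toy stand-ins (assumed): each "model" is just a threshold rule.
def champion(x):
    return int(x > 0.5)  # fitted to older data


def challenger(x):
    return int(x > 0.3)  # fitted to recent data


X_holdout = [0.1, 0.2, 0.35, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9, 0.25]
y_holdout = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0]

winner, old_score, new_score = choose_model(champion, challenger, X_holdout, y_holdout)
print(f"Champion: {old_score:.2f}, challenger: {new_score:.2f}, deploy: {winner}")
```

Requiring a small margin (`min_gain`) avoids swapping models over differences that are likely just noise in the holdout set.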


#### 3. **Incremental Learning**


In [None]:
# Combine batch + online: Benefits of both
from datetime import date

import numpy as np
from sklearn.linear_model import SGDClassifier

class IncrementalModel:
    def __init__(self, classes=(0, 1)):
        self.classes = np.array(classes)  # needed by the first partial_fit call
        self.model = SGDClassifier()

    def retrain_batch(self, X_historical, y_historical):
        """Batch training: refit from scratch on the full history"""
        print("Batch retraining on full history...")
        self.model = SGDClassifier().fit(X_historical, y_historical)

    def update_online(self, X_new, y_new):
        """Online update: continue from the current weights"""
        self.model.partial_fit(X_new, y_new, classes=self.classes)

    def predict(self, X):
        """Use the latest model (batch fit plus online updates)"""
        return self.model.predict(X)

model = IncrementalModel()

# Placeholders assumed to exist: all_historical_data, all_labels,
# today_data, today_labels, X_test

# Batch training monthly (on the 1st)
if date.today().day == 1:
    model.retrain_batch(all_historical_data, all_labels)

# Online updates daily
model.update_online(today_data, today_labels)

# Predictions always use latest model
predictions = model.predict(X_test)


#### 4. **Model Ensemble**


In [None]:
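# Use multiple models, combine predictions
import numpy as np

# Train models on different time periods
# (train_on_data and the datasets are placeholders; every model must
# support predict_proba and share the same class labels)
model_recent = train_on_data(recent_6_months)
model_medium = train_on_data(past_1_year)
model_long = train_on_data(past_5_years)

models = [model_recent, model_medium, model_long]
weights = [0.5, 0.3, 0.2]  # Recent model has more weight

# Weighted soft voting over the already-trained models.
# VotingClassifier is not used here: its fit() retrains clones of every
# estimator on one dataset, which would discard the per-period training.
probabilities = np.average(
    [m.predict_proba(X) for m in models], axis=0, weights=weights
)
ensemble_pred = model_recent.classes_[np.argmax(probabilities, axis=1)]
# Often more robust to concept drift than any single model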
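
The weighting itself is plain arithmetic and can be checked by hand. Here is a small sketch of weighted soft voting for a single sample; the function name `weighted_soft_vote` and the per-model probabilities are made up for illustration.

```python
def weighted_soft_vote(probabilities, weights):
    """Average per-model class probabilities using the given weights.

    probabilities: one list of class probabilities per model.
    weights: one non-negative weight per model (normalized internally).
    """
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]


# Hypothetical [P(class 0), P(class 1)] from each model for one sample.
p_recent = [0.2, 0.8]
p_medium = [0.6, 0.4]
p_long = [0.7, 0.3]

combined = weighted_soft_vote([p_recent, p_medium, p_long], weights=[0.5, 0.3, 0.2])
predicted_class = max(range(len(combined)), key=combined.__getitem__)

print([round(p, 2) for p in combined], predicted_class)
```

In this example two of the three models favor class 0, so an unweighted majority vote would pick class 0. The extra weight on the recent model tips the combined prediction to class 1.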


### Offline Learning Deployment Workflow:

```
Monday 8 AM:
  ├─ Extract data from production database (past month)
  ├─ Clean and validate data
  ├─ Train new model
  ├─ Test on hold-out set
  ├─ Compare to current model
  └─ If better: Deploy new model
      ├─ Update serving system
      └─ Log deployment

Monday-Sunday:
  └─ Serve predictions with deployed model (no updates)

Sunday 11 PM:
  └─ Start next week's training pipeline

Problem: The model is frozen between retrains, so by Sunday its newest training data is about a week old and its oldest is over a month old!
Solution: Use online updates between batch trainings
```
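
How stale the deployed model is on a given day follows directly from the schedule. The sketch below works through that arithmetic, assuming the training data ends on the deployment day and covers the previous 30 days; the function name `staleness` and the example dates are assumptions.

```python
from datetime import date, timedelta


def staleness(deploy_day, today, training_window_days=30):
    """Return (age of newest, age of oldest) training sample in days."""
    newest_age = (today - deploy_day).days
    oldest_age = newest_age + training_window_days
    return newest_age, oldest_age


deploy_day = date(2024, 1, 1)  # a Monday

for offset in (0, 3, 6):  # Monday, Thursday, Sunday of the same week
    today = deploy_day + timedelta(days=offset)
    newest, oldest = staleness(deploy_day, today)
    print(f"{today:%a}: newest sample {newest}d old, oldest {oldest}d old")
```

Shortening either the retraining interval or the training window reduces staleness, but only online updates remove the gap between deployments.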

---
