# 🎯 Production-Grade Recommendation System Tutorial

**Complete End-to-End Implementation for Staff-Level ML Interviews**

This notebook covers:
1. Data Generation & Exploration
2. Feature Engineering
3. Time-Based Train/Test Split
4. Embedding Models (Matrix Factorization, Two-Tower)
5. Candidate Generation with ANN Search
6. Evaluation Metrics
7. Business Logic & Diversity
8. Monitoring & Drift Detection
9. A/B Testing

---

## 📦 Setup & Imports

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
from typing import List, Dict, Tuple
import warnings
warnings.filterwarnings('ignore')

# Deep learning
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

# Random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)

print("✅ All imports successful!")

---
## 1️⃣ Data Generation & Exploration

### Key Interview Topic: Data Characteristics in RecSys
- **Implicit vs Explicit Feedback**
- **Sparsity** (typically well over 99% of user-item pairs have no interaction)
- **Power Law Distribution** (few popular items, long tail)
- **Temporal Patterns** (seasonality, trends)
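The generator in the next cell uses `np.random.power`, which yields bounded, only mildly skewed weights rather than a true power law. For comparison, here is a minimal sketch of Zipf-style sampling weights, where the item at popularity rank r gets weight proportional to 1/r^alpha. The helper name `zipf_probabilities` and the default `alpha=1.0` are illustrative assumptions, not values used elsewhere in this tutorial.

```python
import numpy as np

def zipf_probabilities(n: int, alpha: float = 1.0) -> np.ndarray:
    """Sampling probabilities where the item at popularity rank r gets weight 1 / r**alpha."""
    ranks = np.arange(1, n + 1)
    weights = 1.0 / ranks**alpha
    return weights / weights.sum()

probs = zipf_probabilities(5000, alpha=1.0)
top_1pct = len(probs) // 100
print(f"Top 1% of items receive {probs[:top_1pct].sum():.1%} of the sampling mass")

# Shuffle so popularity rank is not correlated with item ID, then sample interactions
rng = np.random.default_rng(42)
item_probs_zipf = rng.permutation(probs)
sampled_items = rng.choice(len(probs), size=100_000, p=item_probs_zipf)
```

Raising `alpha` concentrates even more mass on the head; `alpha=0` reduces to uniform sampling.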

In [None]:
# Generate synthetic user-item interactions
num_users = 10000
num_items = 5000
num_interactions = 100000

print(f"Generating dataset:")
print(f"  - Users: {num_users:,}")
print(f"  - Items: {num_items:,}")
print(f"  - Interactions: {num_interactions:,}")
print(f"  - Sparsity: {100 * (1 - num_interactions/(num_users*num_items)):.2f}%")

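# Create interactions with a skewed distribution.
# np.random.power(0.5) draws weights on [0, 1] with density proportional to x^-0.5,
# so weights pile up near 0 and per-user activity / per-item popularity is right-skewed.
# This only loosely resembles a power law: weights are bounded by 1, so there is no extreme head.
user_activity = np.random.power(0.5, num_users)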
user_probs = user_activity / user_activity.sum()

item_popularity = np.random.power(0.5, num_items)
item_probs = item_popularity / item_popularity.sum()

interactions_df = pd.DataFrame({
    'user_id': np.random.choice(num_users, num_interactions, p=user_probs),
    'item_id': np.random.choice(num_items, num_interactions, p=item_probs),
    'timestamp': [datetime.now() - timedelta(days=np.random.randint(0, 90)) 
                  for _ in range(num_interactions)],
    'event_type': np.random.choice(
        ['view', 'click', 'add_to_cart', 'purchase'],
        num_interactions,
        p=[0.6, 0.25, 0.1, 0.05]
    )
})

# Add synthetic explicit ratings (1-5) to every interaction.
# In real logs, only a small fraction of interactions carry an explicit rating.
interactions_df['rating'] = np.random.randint(1, 6, num_interactions)

print(f"\n✅ Generated {len(interactions_df):,} interactions")
interactions_df.head()
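The log above mixes implicit signals (`view`, `click`, `add_to_cart`, `purchase`) with synthetic explicit ratings. One common way to use implicit feedback is to collapse events into a per-(user, item) confidence score. A minimal sketch; the `EVENT_WEIGHTS` values are hypothetical and would normally be tuned or learned.

```python
import pandas as pd

# Hypothetical confidence weights: stronger purchase intent -> larger weight
EVENT_WEIGHTS = {"view": 1.0, "click": 2.0, "add_to_cart": 4.0, "purchase": 8.0}

def implicit_confidence(events: pd.DataFrame) -> pd.DataFrame:
    """Aggregate an event log into one (user_id, item_id, confidence) row per pair."""
    weighted = events.assign(weight=events["event_type"].map(EVENT_WEIGHTS))
    return (
        weighted.groupby(["user_id", "item_id"], as_index=False)["weight"].sum()
        .rename(columns={"weight": "confidence"})
    )

toy = pd.DataFrame({
    "user_id": [0, 0, 0, 1],
    "item_id": [7, 7, 9, 7],
    "event_type": ["view", "purchase", "click", "view"],
})
print(implicit_confidence(toy))
```

Applied to `interactions_df`, the `confidence` column could replace the random `rating` as a training target or sample weight.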

### 📊 Data Exploration & Visualization

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# 1. Event type distribution
event_counts = interactions_df['event_type'].value_counts()
axes[0, 0].bar(event_counts.index, event_counts.values, color='skyblue')
axes[0, 0].set_title('Event Type Distribution', fontsize=14, fontweight='bold')
axes[0, 0].set_ylabel('Count')
axes[0, 0].grid(axis='y', alpha=0.3)

# 2. User activity distribution (power law)
user_counts = interactions_df['user_id'].value_counts().values
axes[0, 1].hist(user_counts, bins=50, color='coral', alpha=0.7)
axes[0, 1].set_title('User Activity Distribution (Power Law)', fontsize=14, fontweight='bold')
axes[0, 1].set_xlabel('Interactions per User')
axes[0, 1].set_ylabel('Number of Users')
axes[0, 1].set_yscale('log')
axes[0, 1].grid(alpha=0.3)

# 3. Item popularity distribution
item_counts = interactions_df['item_id'].value_counts().values
axes[1, 0].hist(item_counts, bins=50, color='lightgreen', alpha=0.7)
axes[1, 0].set_title('Item Popularity Distribution', fontsize=14, fontweight='bold')
axes[1, 0].set_xlabel('Interactions per Item')
axes[1, 0].set_ylabel('Number of Items')
axes[1, 0].set_yscale('log')
axes[1, 0].grid(alpha=0.3)

# 4. Temporal pattern
interactions_df['date'] = interactions_df['timestamp'].dt.date
daily_counts = interactions_df.groupby('date').size()
axes[1, 1].plot(daily_counts.index, daily_counts.values, marker='o', linewidth=2, markersize=4)
axes[1, 1].set_title('Daily Interaction Volume', fontsize=14, fontweight='bold')
axes[1, 1].set_xlabel('Date')
axes[1, 1].set_ylabel('Interactions')
axes[1, 1].grid(alpha=0.3)
axes[1, 1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

print("\n📊 Key Statistics:")
print(f"  - Most active user: {user_counts.max()} interactions")
print(f"  - Most popular item: {item_counts.max()} interactions")
print(f"  - Avg interactions per user: {user_counts.mean():.1f}")
print(f"  - Avg interactions per item: {item_counts.mean():.1f}")
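The histograms show skew visually. A single number such as the Gini coefficient makes it easier to compare datasets or track popularity concentration over time. A minimal sketch (the helper `gini` is not part of the original notebook); it could be applied to `item_counts` from the plotting cell, keeping in mind that `value_counts()` only includes items with at least one interaction.

```python
import numpy as np

def gini(counts) -> float:
    """Gini coefficient of non-negative counts: 0 = perfectly even, approaching 1 = highly concentrated."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    cum = np.cumsum(x)
    # Lorenz-curve formulation for values sorted in ascending order
    return float((n + 1 - 2 * cum.sum() / cum[-1]) / n)

print(gini([5, 5, 5, 5]))   # perfectly even distribution
print(gini([0, 0, 0, 20]))  # all mass on a single item
```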

---
## 2️⃣ Feature Engineering

### Interview Topic: What makes good features?
- **User Features**: Demographics, behavior, preferences
- **Item Features**: Metadata, popularity, quality
- **Context Features**: Time, device, location
- **Interaction Features**: User-item affinity, similarity

In [None]:
# User Features
print("🔧 Engineering User Features...")

user_features = interactions_df.groupby('user_id').agg({
    'item_id': 'count',  # Total interactions
    'timestamp': ['min', 'max'],  # First and last interaction
    'rating': 'mean'  # Average rating given
}).reset_index()

user_features.columns = ['user_id', 'total_interactions', 'first_seen', 'last_seen', 'avg_rating']

# Derived features
user_features['recency_days'] = (datetime.now() - user_features['last_seen']).dt.days
user_features['tenure_days'] = (user_features['last_seen'] - user_features['first_seen']).dt.days
user_features['activity_rate'] = user_features['total_interactions'] / (user_features['tenure_days'] + 1)

# Event type distribution per user
event_dist = interactions_df.groupby(['user_id', 'event_type']).size().unstack(fill_value=0)
event_dist.columns = [f'{col}_count' for col in event_dist.columns]
user_features = user_features.merge(event_dist, on='user_id', how='left').fillna(0)

# Conversion rate
user_features['conversion_rate'] = user_features['purchase_count'] / (user_features['click_count'] + 1)

print(f"✅ Created {len(user_features)} user feature vectors")
print(f"   Features: {list(user_features.columns)}")
user_features.head()
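`total_interactions` weights a 90-day-old click the same as one from yesterday. A common refinement is an exponentially time-decayed count, where each event contributes `0.5 ** (age / half_life)`. A minimal sketch; the 14-day half-life is an arbitrary assumption to be tuned per product.

```python
import pandas as pd

def decayed_activity(events: pd.DataFrame, now: pd.Timestamp,
                     half_life_days: float = 14.0) -> pd.Series:
    """Per-user sum of 0.5 ** (event_age / half_life) over all events."""
    age_days = (now - events["timestamp"]).dt.total_seconds() / 86400.0
    weights = 0.5 ** (age_days / half_life_days)
    return weights.groupby(events["user_id"]).sum().rename("decayed_activity")

now = pd.Timestamp("2024-03-01")
toy = pd.DataFrame({
    "user_id": [0, 0, 1],
    "timestamp": pd.to_datetime(["2024-03-01", "2024-02-16", "2024-02-02"]),
})
print(decayed_activity(toy, now))
```

An event exactly one half-life old counts 0.5, two half-lives old counts 0.25, and so on.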

In [None]:
# Item Features
print("🔧 Engineering Item Features...")

item_features = interactions_df.groupby('item_id').agg({
    'user_id': 'count',  # Total interactions (all event types)
    'rating': 'mean',  # Average rating
    'timestamp': 'max'  # Last interaction
}).reset_index()

item_features.columns = ['item_id', 'popularity', 'avg_rating', 'last_interaction']

# Derived features
item_features['log_popularity'] = np.log1p(item_features['popularity'])
item_features['days_since_interaction'] = (datetime.now() - item_features['last_interaction']).dt.days

# Event conversion metrics per item
item_events = interactions_df.groupby(['item_id', 'event_type']).size().unstack(fill_value=0)
item_events['ctr'] = item_events['click'] / (item_events['view'] + 1)
item_events['conversion_rate'] = item_events['purchase'] / (item_events['click'] + 1)

item_features = item_features.merge(
    item_events[['ctr', 'conversion_rate']], 
    on='item_id', 
    how='left'
).fillna(0)

print(f"✅ Created {len(item_features)} item feature vectors")
print(f"   Features: {list(item_features.columns)}")
item_features.head()
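The `+ 1` in the rate denominators avoids division by zero but still gives noisy rates for items with few events (one click on one view yields a CTR of 0.5). A common alternative is Bayesian smoothing with a Beta prior, which shrinks low-count items toward a global rate. A minimal sketch; `prior_strength=20` is an assumed pseudo-count, not a tuned value.

```python
import numpy as np

def smoothed_rate(successes, trials, prior_rate: float, prior_strength: float = 20.0):
    """Posterior mean under a Beta(prior_rate * s, (1 - prior_rate) * s) prior, s = prior_strength."""
    successes = np.asarray(successes, dtype=float)
    trials = np.asarray(trials, dtype=float)
    return (successes + prior_rate * prior_strength) / (trials + prior_strength)

global_ctr = 0.3  # illustrative global click-through rate
print(smoothed_rate([0, 1, 500], [0, 1, 1000], prior_rate=global_ctr))
```

An item with no data falls back to the global rate, while an item with 1,000 views is barely affected by the prior. In this notebook, `prior_rate` could be `item_events['click'].sum() / item_events['view'].sum()`.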

### 📊 Feature Correlations

In [None]:
# Analyze user feature correlations (user_id is an identifier, not a feature)
numeric_cols = user_features.select_dtypes(include=[np.number]).columns.drop('user_id')
correlation_matrix = user_features[numeric_cols].corr()

plt.figure(figsize=(12, 10))
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', center=0)
plt.title('User Feature Correlation Matrix', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n💡 Interview Insight:")
print("High correlations (>0.7) may indicate redundant features.")
print("Consider PCA or feature selection for efficiency.")
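Reading a large heatmap by eye is error-prone; listing the feature pairs above a threshold is easier to act on. A minimal sketch (helper name illustrative); the 0.7 cut-off is a heuristic, not a hard rule.

```python
import numpy as np
import pandas as pd

def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.7):
    """Return (feature_a, feature_b, |corr|) for numeric pairs above threshold, strongest first."""
    corr = df.select_dtypes(include=[np.number]).corr().abs()
    cols = list(corr.columns)
    pairs = [
        (cols[i], cols[j], float(corr.iloc[i, j]))
        for i in range(len(cols))
        for j in range(i + 1, len(cols))  # upper triangle only: skip self-pairs and duplicates
        if corr.iloc[i, j] > threshold
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)

toy = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, -1, -1, 1]})
print(highly_correlated_pairs(toy))
```

On this notebook's data it could be called as `highly_correlated_pairs(user_features.drop(columns=['user_id']))`; for each flagged pair, one feature is a candidate for removal.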

---
## 3️⃣ Train/Test Split

### Interview Topic: Why Time-Based Split?
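- ❌ **Random split**: Data leakage (using future interactions to predict past ones)
- ✅ **Time-based split**: Realistic (predict the future from the past)
- Simulates the production scenario
- Features must also come from the training window only: the user/item features in Section 2 were computed on the full log (including the test week), so recompute them from `train_df` before using them for training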

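A global time cutoff is one time-aware protocol. Another common one, especially for sequential recommenders, is leave-last-out: hold out each user's most recent interaction(s). A minimal sketch (helper name illustrative); note that users with only one interaction end up with nothing in the training set.

```python
import pandas as pd

def leave_last_out(events: pd.DataFrame, n_last: int = 1):
    """Split each user's history: the n_last most recent events go to test, the rest to train."""
    ordered = events.sort_values(["user_id", "timestamp"])
    # 0 = most recent event of each user, 1 = second most recent, ...
    rank_from_end = ordered.groupby("user_id").cumcount(ascending=False)
    return ordered[rank_from_end >= n_last], ordered[rank_from_end < n_last]

toy = pd.DataFrame({
    "user_id": [0, 0, 0, 1],
    "item_id": [10, 11, 12, 20],
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-02", "2024-01-05"]),
})
train, test = leave_last_out(toy)
print(test[["user_id", "item_id"]])
```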
In [None]:
# Time-based split: last 7 days for test
cutoff_date = datetime.now() - timedelta(days=7)

train_df = interactions_df[interactions_df['timestamp'] < cutoff_date].copy()
test_df = interactions_df[interactions_df['timestamp'] >= cutoff_date].copy()

print(f"📊 Dataset Split:")
print(f"  - Train: {len(train_df):,} interactions ({len(train_df)/len(interactions_df)*100:.1f}%)")
print(f"  - Test:  {len(test_df):,} interactions ({len(test_df)/len(interactions_df)*100:.1f}%)")
print(f"\n  - Train date range: {train_df['timestamp'].min()} to {train_df['timestamp'].max()}")
print(f"  - Test date range:  {test_df['timestamp'].min()} to {test_df['timestamp'].max()}")

# Visualize split
fig, ax = plt.subplots(figsize=(12, 4))
train_daily = train_df.groupby(train_df['timestamp'].dt.date).size()
test_daily = test_df.groupby(test_df['timestamp'].dt.date).size()

ax.bar(train_daily.index, train_daily.values, label='Train', alpha=0.7, color='steelblue')
ax.bar(test_daily.index, test_daily.values, label='Test', alpha=0.7, color='coral')
ax.axvline(cutoff_date.date(), color='red', linestyle='--', linewidth=2, label='Split Point')
ax.set_xlabel('Date')
ax.set_ylabel('Interactions')
ax.set_title('Time-Based Train/Test Split', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
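A time-based split also exposes cold start: some users and items in the test window never appear in training, and an ID-based model such as matrix factorization has only a randomly initialized embedding for them. A minimal sketch for measuring that share (helper name illustrative), e.g. `cold_start_share(train_df['user_id'], test_df['user_id'])`.

```python
import numpy as np

def cold_start_share(train_ids, test_ids) -> float:
    """Fraction of distinct test IDs that never appear in training."""
    test_unique = np.unique(np.asarray(test_ids))
    if test_unique.size == 0:
        return 0.0
    unseen = ~np.isin(test_unique, np.asarray(train_ids))
    return float(unseen.mean())

print(cold_start_share([1, 2, 3], [2, 3, 4, 5]))
```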

---
## 4️⃣ Embedding Models

### Strategy 1: Matrix Factorization (Baseline)

In [None]:
class MatrixFactorization(nn.Module):
    """
    Biased Matrix Factorization (Funk-SVD style: factors are learned by
    gradient descent, not computed with an actual SVD)
    
    Interview Point: When to use MF vs Deep Learning?
    - MF: Simple, interpretable, good baseline
    - DL: Better for complex patterns, side information
    """
    def __init__(self, num_users, num_items, embedding_dim=64):
        super().__init__()
        self.user_embeddings = nn.Embedding(num_users, embedding_dim)
        self.item_embeddings = nn.Embedding(num_items, embedding_dim)
        self.user_bias = nn.Embedding(num_users, 1)
        self.item_bias = nn.Embedding(num_items, 1)
        self.global_bias = nn.Parameter(torch.zeros(1))
        
        # Xavier initialization
        nn.init.xavier_uniform_(self.user_embeddings.weight)
        nn.init.xavier_uniform_(self.item_embeddings.weight)
        nn.init.zeros_(self.user_bias.weight)
        nn.init.zeros_(self.item_bias.weight)
    
    def forward(self, user_ids, item_ids):
        user_emb = self.user_embeddings(user_ids)
        item_emb = self.item_embeddings(item_ids)
        
        # Dot product + biases
        scores = (user_emb * item_emb).sum(dim=1)
        # squeeze(-1) drops only the trailing dim, so a batch of size 1 keeps its batch dim
        scores += self.user_bias(user_ids).squeeze(-1)
        scores += self.item_bias(item_ids).squeeze(-1)
        scores += self.global_bias
        
        return scores

# Initialize model
embedding_dim = 64
mf_model = MatrixFactorization(num_users, num_items, embedding_dim)

print(f"✅ Matrix Factorization Model Created")
print(f"   - Users: {num_users:,}")
print(f"   - Items: {num_items:,}")
print(f"   - Embedding dim: {embedding_dim}")
print(f"   - Parameters: {sum(p.numel() for p in mf_model.parameters()):,}")
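The `forward` method scores a pair as a global bias plus user and item biases plus the dot product of the two factor vectors. In the notation assumed here, μ is `global_bias`, b_u and b_i are the bias embeddings, and p_u and q_i are rows of the user and item embedding tables P and Q:

$$
\hat{r}_{ui} = \mu + b_u + b_i + \mathbf{p}_u^{\top}\mathbf{q}_i
$$

The training cell that follows minimizes mini-batch MSE with `torch.optim.Adam(..., weight_decay=1e-5)`. PyTorch's `Adam` adds `weight_decay * θ` to every gradient, which is the gradient of an L2 penalty, so for a batch B the effective objective is:

$$
\mathcal{L} = \frac{1}{|\mathcal{B}|}\sum_{(u,i)\in\mathcal{B}}\left(r_{ui}-\hat{r}_{ui}\right)^2 + \frac{\lambda}{2}\left(\lVert\mathbf{P}\rVert_F^2+\lVert\mathbf{Q}\rVert_F^2+\lVert\mathbf{b}_U\rVert_2^2+\lVert\mathbf{b}_I\rVert_2^2+\mu^2\right),\qquad \lambda = 10^{-5}
$$

The penalty applies to every embedding row at each step, not only to the users and items present in the batch.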

### Training Matrix Factorization

In [None]:
# Prepare training data
train_users = torch.LongTensor(train_df['user_id'].values)
train_items = torch.LongTensor(train_df['item_id'].values)
train_ratings = torch.FloatTensor(train_df['rating'].values)

# Training loop
optimizer = torch.optim.Adam(mf_model.parameters(), lr=0.01, weight_decay=1e-5)
criterion = nn.MSELoss()

batch_size = 1024
num_epochs = 5
train_losses = []

print("🚀 Training Matrix Factorization...\n")

num_samples = len(train_users)
num_batches = (num_samples + batch_size - 1) // batch_size  # include the final partial batch

for epoch in range(num_epochs):
    mf_model.train()
    epoch_loss = 0.0
    perm = torch.randperm(num_samples)  # reshuffle every epoch
    
    for i in range(num_batches):
        batch_idx = perm[i * batch_size:(i + 1) * batch_size]
        
        batch_users = train_users[batch_idx]
        batch_items = train_items[batch_idx]
        batch_ratings = train_ratings[batch_idx]
        
        optimizer.zero_grad()
        predictions = mf_model(batch_users, batch_items)
        loss = criterion(predictions, batch_ratings)
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
    
    avg_loss = epoch_loss / num_batches
    train_losses.append(avg_loss)
    print(f"Epoch {epoch+1}/{num_epochs} - Loss: {avg_loss:.4f}")

# Plot training curve
plt.figure(figsize=(10, 5))
plt.plot(range(1, num_epochs+1), train_losses, marker='o', linewidth=2, markersize=8)
plt.xlabel('Epoch')
plt.ylabel('MSE Loss')
plt.title('Matrix Factorization Training Curve', fontsize=14, fontweight='bold')
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

print(f"\n✅ Training Complete! Final Loss: {train_losses[-1]:.4f}")

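The ratings here are synthetic; most production logs contain only implicit feedback (clicks, purchases) with no explicit negatives. A common objective for that setting is Bayesian Personalized Ranking (BPR), which pushes each observed item's score above a sampled unobserved item's score. A minimal sketch, assuming uniform negative sampling (which can occasionally pick an item the user did interact with); the toy embedding tables stand in for `mf_model`'s.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def bpr_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    """BPR: -log sigmoid(score(u, positive) - score(u, negative)), averaged over the batch."""
    return -F.logsigmoid(pos_scores - neg_scores).mean()

torch.manual_seed(0)
n_users, n_items, dim, batch = 100, 50, 16, 32
user_emb = nn.Embedding(n_users, dim)
item_emb = nn.Embedding(n_items, dim)
optimizer = torch.optim.Adam(list(user_emb.parameters()) + list(item_emb.parameters()), lr=0.01)

users = torch.randint(0, n_users, (batch,))
pos_items = torch.randint(0, n_items, (batch,))  # toy stand-in for observed interactions
neg_items = torch.randint(0, n_items, (batch,))  # uniform negative samples

u = user_emb(users)
pos = (u * item_emb(pos_items)).sum(dim=1)
neg = (u * item_emb(neg_items)).sum(dim=1)
loss = bpr_loss(pos, neg)

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"BPR loss: {loss.item():.4f}")
```

Unlike MSE on ratings, BPR only cares about the relative order of items for each user, which matches how the embeddings are later used for top-k retrieval.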
### Extract & Visualize Embeddings

In [None]:
# Extract embeddings
mf_model.eval()
with torch.no_grad():
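    # .detach() is required: .numpy() raises on tensors that require grad, even inside no_grad()
    user_embeddings = mf_model.user_embeddings.weight.detach().cpu().numpy()
    item_embeddings = mf_model.item_embeddings.weight.detach().cpu().numpy()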

print(f"📊 Embedding Shapes:")
print(f"  - User embeddings: {user_embeddings.shape}")
print(f"  - Item embeddings: {item_embeddings.shape}")

# Visualize embedding distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# User embeddings
axes[0].hist(user_embeddings.flatten(), bins=50, alpha=0.7, color='steelblue')
axes[0].set_title('User Embedding Distribution', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Embedding Value')
axes[0].set_ylabel('Frequency')
axes[0].grid(alpha=0.3)

# Item embeddings
axes[1].hist(item_embeddings.flatten(), bins=50, alpha=0.7, color='coral')
axes[1].set_title('Item Embedding Distribution', fontsize=12, fontweight='bold')
axes[1].set_xlabel('Embedding Value')
axes[1].set_ylabel('Frequency')
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\n📈 Embedding Statistics:")
print(f"  User embeddings - Mean: {user_embeddings.mean():.4f}, Std: {user_embeddings.std():.4f}")
print(f"  Item embeddings - Mean: {item_embeddings.mean():.4f}, Std: {item_embeddings.std():.4f}")
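Histograms of raw values say little about structure. A 2-D projection is a quick sanity check (for example, whether items form clusters). A minimal PCA sketch using NumPy's SVD; t-SNE or UMAP are common alternatives but need extra libraries. With embeddings trained on random synthetic ratings, no meaningful structure should be expected here.

```python
import numpy as np

def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project rows onto the top-2 principal components."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
demo = rng.normal(size=(1000, 64))  # stand-in for item_embeddings
coords = pca_2d(demo)
print(coords.shape)
```

On the notebook's data, `coords = pca_2d(item_embeddings)` followed by `plt.scatter(coords[:, 0], coords[:, 1], s=2)` gives the picture.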

### Strategy 2: Two-Tower Neural Network

In [None]:
class TwoTowerModel(nn.Module):
    """
    Two-Tower Architecture (Industry Standard)
    
    Interview Point: Why Two-Tower?
    - Separate user/item encoding
    - Cache item embeddings (static)
    - Fast serving with ANN search
    - Used by YouTube, Google, Meta
    """
    def __init__(self, user_feature_dim, item_feature_dim, embedding_dim=128):
        super().__init__()
        
        # User tower
        self.user_tower = nn.Sequential(
            nn.Linear(user_feature_dim, 256),
            nn.ReLU(),
            nn.BatchNorm1d(256),
            nn.Dropout(0.2),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, embedding_dim)
        )
        
        # Item tower
        self.item_tower = nn.Sequential(
            nn.Linear(item_feature_dim, 256),
            nn.ReLU(),
            nn.BatchNorm1d(256),
            nn.Dropout(0.2),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, embedding_dim)
        )
        
        # Learnable temperature, stored as a log so it stays positive during training
        self.log_temperature = nn.Parameter(torch.log(torch.tensor(0.07)))
    
    def forward(self, user_features, item_features):
        user_emb = self.user_tower(user_features)
        item_emb = self.item_tower(item_features)
        
        # L2 normalize so the dot product equals cosine similarity
        user_emb = F.normalize(user_emb, p=2, dim=1)
        item_emb = F.normalize(item_emb, p=2, dim=1)
        
        # Temperature-scaled cosine similarity
        scores = (user_emb * item_emb).sum(dim=1) / self.log_temperature.exp()
        return scores

# Example initialization
two_tower = TwoTowerModel(user_feature_dim=10, item_feature_dim=8, embedding_dim=128)

print("✅ Two-Tower Model Created")
print(f"   - Parameters: {sum(p.numel() for p in two_tower.parameters()):,}")
print(f"\n💡 Key Advantage: Item embeddings can be pre-computed and cached!")

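The class above defines scoring but no training objective. Two-tower models are commonly trained with an in-batch softmax: each user's paired item is the positive, and the other items in the same batch act as negatives. A minimal sketch that could be applied to the two tower outputs; it omits the popularity (log-Q) correction often used in practice, since in-batch negatives over-sample popular items.

```python
import torch
import torch.nn.functional as F

def in_batch_softmax_loss(user_emb: torch.Tensor, item_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Row b of user_emb is paired with row b of item_emb; all other rows serve as negatives."""
    user_emb = F.normalize(user_emb, dim=1)
    item_emb = F.normalize(item_emb, dim=1)
    logits = user_emb @ item_emb.T / temperature                  # [B, B] similarity matrix
    labels = torch.arange(logits.size(0), device=logits.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels)

torch.manual_seed(0)
u = torch.randn(8, 32)
print(f"Perfectly matched pairs: {in_batch_softmax_loss(u, u.clone()).item():.4f}")
print(f"Random pairs:            {in_batch_softmax_loss(u, torch.randn(8, 32)).item():.4f}")
```

Larger batches supply more negatives per step, which is one reason two-tower training typically uses large batch sizes.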
---
## 5️⃣ Candidate Generation with ANN Search

### Interview Topic: Why ANN (Approximate Nearest Neighbor)?
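- **Brute force**: O(n · d) per query (score every item) - too slow for very large catalogs under tight latency budgets
- **ANN**: sub-linear search in practice (e.g., graph-based HNSW or inverted-file indexes), trading a little recall for speed
- **FAISS**: open-source similarity-search library with CPU and GPU indexes; millisecond-level queries over millions of vectors are achievable, depending on index type and hardware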

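To make "sub-linear" concrete, here is a toy inverted-file (IVF) index in plain NumPy, the same basic idea behind FAISS's `IndexIVFFlat`: cluster the item vectors, then at query time scan only the `n_probe` clusters whose centroids are closest to the query. It is an illustrative sketch (the class name and parameters are hypothetical), not a substitute for FAISS.

```python
import numpy as np

class TinyIVFIndex:
    """Toy IVF index over cosine similarity: spherical k-means + probing the closest lists."""

    def __init__(self, n_lists: int = 16, n_iter: int = 10, seed: int = 0):
        self.n_lists = n_lists
        self.n_iter = n_iter
        self.rng = np.random.default_rng(seed)

    @staticmethod
    def _normalize(x: np.ndarray) -> np.ndarray:
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    def fit(self, vectors: np.ndarray) -> "TinyIVFIndex":
        self.vectors = self._normalize(vectors)
        start = self.rng.choice(len(self.vectors), self.n_lists, replace=False)
        centroids = self.vectors[start].copy()
        for _ in range(self.n_iter):  # Lloyd iterations on the unit sphere
            assign = np.argmax(self.vectors @ centroids.T, axis=1)
            for c in range(self.n_lists):
                members = self.vectors[assign == c]
                if len(members) > 0:  # empty clusters keep their previous centroid
                    centroids[c] = members.mean(axis=0)
            centroids = self._normalize(centroids)
        self.centroids = centroids
        assign = np.argmax(self.vectors @ centroids.T, axis=1)
        # Inverted lists: item indices belonging to each cluster
        self.lists = [np.flatnonzero(assign == c) for c in range(self.n_lists)]
        return self

    def search(self, query: np.ndarray, k: int = 10, n_probe: int = 4):
        q = self._normalize(query)
        probe = np.argsort(-(self.centroids @ q))[:n_probe]  # closest clusters only
        candidates = np.concatenate([self.lists[c] for c in probe])
        scores = self.vectors[candidates] @ q  # exact scoring within probed lists
        order = np.argsort(-scores)[:k]
        return candidates[order], scores[order]

rng = np.random.default_rng(0)
items = rng.normal(size=(5000, 64))
index = TinyIVFIndex(n_lists=32).fit(items)
ids, scores = index.search(rng.normal(size=64), k=10, n_probe=4)
print(ids, scores.round(3))
```

With `n_probe=4` of 32 lists, each query scores roughly an eighth of the catalog; setting `n_probe` equal to `n_lists` recovers exact brute-force results. That knob is the recall/latency trade-off.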
In [None]:
def get_top_k_similar_items(user_emb, item_embs, k=50):
    """
    Retrieve top-k similar items using cosine similarity
    
    In production: Use FAISS for sub-linear search
    """
    # Normalize embeddings
    user_emb_norm = user_emb / np.linalg.norm(user_emb)
    item_embs_norm = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    
    # Compute cosine similarities
    similarities = item_embs_norm @ user_emb_norm
    
    # Get top-k
    top_k_indices = np.argsort(-similarities)[:k]
    top_k_scores = similarities[top_k_indices]
    
    return top_k_indices, top_k_scores

# Example: Get recommendations for user 0
test_user_id = 0
test_user_emb = user_embeddings[test_user_id]

candidate_items, candidate_scores = get_top_k_similar_items(
    test_user_emb, 
    item_embeddings, 
    k=50
)

print(f"🎯 Generated {len(candidate_items)} candidates for User {test_user_id}")
print(f"\nTop 10 recommendations:")
print(f"{'Rank':<6} {'Item ID':<10} {'Score':<10}")
print("-" * 30)
for i, (item_id, score) in enumerate(zip(candidate_items[:10], candidate_scores[:10]), 1):
    print(f"{i:<6} {item_id:<10} {score:.4f}")

# Visualize score distribution
plt.figure(figsize=(10, 5))
plt.bar(range(len(candidate_scores)), candidate_scores, color='steelblue', alpha=0.7)
plt.xlabel('Candidate Rank')
plt.ylabel('Similarity Score')
plt.title(f'Candidate Scores for User {test_user_id}', fontsize=14, fontweight='bold')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
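`get_top_k_similar_items` fully sorts all n scores (O(n log n)). When only the top k are needed, `np.argpartition` selects them in linear time, and only those k are then sorted. A minimal sketch (helper name illustrative):

```python
import numpy as np

def top_k_indices(scores: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k largest scores, ordered by descending score."""
    k = min(k, scores.size)
    if k == 0:
        return np.array([], dtype=int)
    top = np.argpartition(-scores, k - 1)[:k]  # unordered top-k in O(n)
    return top[np.argsort(-scores[top])]       # sort just those k

rng = np.random.default_rng(0)
scores = rng.normal(size=100_000)
print(top_k_indices(scores, 5))
```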

---
## 6Ô∏è‚É£ Evaluation Metrics

### Interview Topic: What metrics matter for RecSys?
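For reference, the definitions that `compute_metrics` in the next cell implements, with binary relevance rel_i ∈ {0, 1} for the item at (1-based) rank i and R the set of relevant test items. The code's 0-based loop uses `log2(i + 2)`, which is the same denominator.

$$
\mathrm{DCG@}K = \sum_{i=1}^{K} \frac{\mathrm{rel}_i}{\log_2(i+1)}, \qquad
\mathrm{IDCG@}K = \sum_{i=1}^{\min(K,\,|R|)} \frac{1}{\log_2(i+1)}, \qquad
\mathrm{NDCG@}K = \frac{\mathrm{DCG@}K}{\mathrm{IDCG@}K}
$$

Worked example: two relevant items, retrieved at ranks 1 and 3, with K = 3:

$$
\mathrm{DCG@}3 = \frac{1}{\log_2 2} + \frac{1}{\log_2 4} = 1.5, \qquad
\mathrm{IDCG@}3 = \frac{1}{\log_2 2} + \frac{1}{\log_2 3} \approx 1.631, \qquad
\mathrm{NDCG@}3 \approx 0.920
$$

Precision@3 is 2/3 and Recall@3 is 1 in both this ranking and the ideal one; only NDCG registers that the second hit sits at rank 3 rather than rank 2.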

In [None]:
def compute_metrics(predictions, actuals, k=10):
    """
    Compute ranking metrics
    
    Metrics:
    - Precision@K: % of recommended items that are relevant
    - Recall@K: % of relevant items that are recommended
    - NDCG@K: Normalized Discounted Cumulative Gain (position-aware)
    """
    relevant = set(actuals)
    recommended = set(predictions[:k])
    
    # Precision@K
    precision = len(relevant & recommended) / k if k > 0 else 0
    
    # Recall@K
    recall = len(relevant & recommended) / len(relevant) if len(relevant) > 0 else 0
    
    # NDCG@K
    dcg = sum([1.0 / np.log2(i + 2) if predictions[i] in relevant else 0
               for i in range(min(k, len(predictions)))])
    idcg = sum([1.0 / np.log2(i + 2) for i in range(min(k, len(relevant)))])
    ndcg = dcg / idcg if idcg > 0 else 0
    
    return {
        'precision@k': precision,
        'recall@k': recall,
        'ndcg@k': ndcg,
        'f1@k': 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
    }

# Evaluate on test set
k_values = [5, 10, 20, 50]
all_metrics = {k: {'precision': [], 'recall': [], 'ndcg': []} for k in k_values}

# Evaluate the first 100 test users (in order of appearance, not a random sample)
test_users = test_df['user_id'].unique()[:100]

for user_id in test_users:
    # Get user's test items
    user_test_items = test_df[test_df['user_id'] == user_id]['item_id'].values
    
    if len(user_test_items) == 0:
        continue
    
    # Get recommendations
    user_emb = user_embeddings[user_id]
    candidate_items, _ = get_top_k_similar_items(user_emb, item_embeddings, k=max(k_values))
    
    # Compute metrics for different k values
    for k in k_values:
        metrics = compute_metrics(candidate_items, user_test_items, k=k)
        all_metrics[k]['precision'].append(metrics['precision@k'])
        all_metrics[k]['recall'].append(metrics['recall@k'])
        all_metrics[k]['ndcg'].append(metrics['ndcg@k'])

# Aggregate results
results_df = pd.DataFrame([
    {
        'K': k,
        'Precision@K': np.mean(all_metrics[k]['precision']),
        'Recall@K': np.mean(all_metrics[k]['recall']),
        'NDCG@K': np.mean(all_metrics[k]['ndcg'])
    }
    for k in k_values
])

print("\n📊 Evaluation Results:")
print(results_df.to_string(index=False))

# Visualize metrics
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
metrics_to_plot = ['Precision@K', 'Recall@K', 'NDCG@K']

for i, metric in enumerate(metrics_to_plot):
    axes[i].plot(results_df['K'], results_df[metric], marker='o', linewidth=2, markersize=8)
    axes[i].set_xlabel('K (Number of Recommendations)')
    axes[i].set_ylabel(metric)
    axes[i].set_title(metric, fontsize=12, fontweight='bold')
    axes[i].grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 Interview Insight:")
print("NDCG is often preferred over Precision/Recall when order matters, because it is position-aware.")
print("Top-ranked items matter more - NDCG penalizes placing relevant items lower.")
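Precision, recall and NDCG are per-user accuracy metrics. Two other common ones are Hit Rate@K (did at least one relevant item appear in the top K?) and MRR (the reciprocal rank of the first relevant item, averaged over users). Catalog coverage (the fraction of items ever recommended) is a system-level check against collapsing onto popular items. A minimal sketch; helper names are illustrative.

```python
import numpy as np

def hit_rate_at_k(recommended, relevant, k: int) -> float:
    """1.0 if any relevant item is in the top k, else 0.0."""
    return float(len(set(recommended[:k]) & set(relevant)) > 0)

def reciprocal_rank(recommended, relevant) -> float:
    """1 / rank of the first relevant item (1-based), or 0.0 if none is found."""
    relevant = set(relevant)
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def catalog_coverage(all_recommendations, num_items: int, k: int) -> float:
    """Fraction of the catalog appearing in at least one user's top k."""
    shown = set()
    for recs in all_recommendations:
        shown.update(recs[:k])
    return len(shown) / num_items

recs = [[3, 7, 1], [2, 9, 4]]
truth = [[1], [5]]
print(np.mean([hit_rate_at_k(r, t, 3) for r, t in zip(recs, truth)]))
print(np.mean([reciprocal_rank(r, t) for r, t in zip(recs, truth)]))
print(catalog_coverage(recs, num_items=10, k=3))
```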

---
## 7️⃣ Business Logic & Diversity

### Interview Topic: Why post-processing?
- Model optimizes for engagement, not business goals
- Need diversity to avoid filter bubbles
- Boost fresh content
- Remove recently shown items
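The cell below enforces diversity with a hard per-category cap. A softer, widely used alternative is Maximal Marginal Relevance (MMR), which greedily picks the item with the best trade-off between its relevance score and its highest similarity to items already chosen. A minimal sketch; `lam` (the relevance weight) is an assumed knob to tune, and `item_embs` could be `item_embeddings[candidate_items]`.

```python
import numpy as np

def mmr_rerank(scores, item_embs, k: int = 10, lam: float = 0.7):
    """Greedy MMR: return positions (into scores/item_embs) of the k selected items."""
    scores = np.asarray(scores, dtype=float)
    embs = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    selected, remaining = [], list(range(len(scores)))
    while remaining and len(selected) < k:
        if selected:
            # Highest cosine similarity of each remaining item to anything already picked
            redundancy = (embs[remaining] @ embs[selected].T).max(axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        mmr = lam * scores[remaining] - (1 - lam) * redundancy
        best = remaining[int(np.argmax(mmr))]
        selected.append(best)
        remaining.remove(best)
    return selected

embs = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]])
scores = [1.0, 0.99, 0.8]
print(mmr_rerank(scores, embs, k=2, lam=1.0))  # relevance only
print(mmr_rerank(scores, embs, k=2, lam=0.5))  # trades some relevance for diversity
```

The returned positions map back to item IDs via `candidate_items[selected]`. `lam=1.0` reproduces pure score ordering, and lower values penalize near-duplicates more strongly.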

In [None]:
def apply_diversity_rules(item_ids, scores, max_similar=3):
    """
    Enforce diversity by capping how many items per category are kept
    (single greedy pass over the input, which is already in score order)
    
    Interview Point: Diversity vs Relevance trade-off
    - Too diverse → lower engagement
    - Too similar → filter bubble
    - The right balance is product-specific; tune it with online experiments
    """
    # Simulate categories (in production, fetch from metadata)
    categories = {item_id: item_id % 10 for item_id in item_ids}
    
    diverse_items = []
    diverse_scores = []
    category_counts = {}
    
    for item_id, score in zip(item_ids, scores):
        category = categories[item_id]
        count = category_counts.get(category, 0)
        
        if count < max_similar:
            diverse_items.append(item_id)
            diverse_scores.append(score)
            category_counts[category] = count + 1
    
    return diverse_items, diverse_scores

# Apply diversity
diverse_items, diverse_scores = apply_diversity_rules(
    candidate_items, 
    candidate_scores, 
    max_similar=3
)

print(f"📊 Diversity Impact:")
print(f"  - Before: {len(candidate_items)} items")
print(f"  - After:  {len(diverse_items)} items")
print(f"  - Filtered: {len(candidate_items) - len(diverse_items)} items")

# Compare score distributions
fig, axes = plt.subplots(1, 2, figsize=(14, 4))

axes[0].hist(candidate_scores, bins=20, alpha=0.7, color='steelblue')
axes[0].set_title('Before Diversity Filter', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Score')
axes[0].set_ylabel('Frequency')
axes[0].grid(alpha=0.3)

axes[1].hist(diverse_scores, bins=20, alpha=0.7, color='coral')
axes[1].set_title('After Diversity Filter', fontsize=12, fontweight='bold')
axes[1].set_xlabel('Score')
axes[1].set_ylabel('Frequency')
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()
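The section's bullet list also mentions boosting fresh content and removing recently shown items, which the notebook does not implement. A minimal sketch of both as a post-processing step; the inputs `recently_shown` and `item_age_days` and the additive `fresh_boost` are hypothetical, and in production these values would come from serving logs and item metadata.

```python
def apply_business_rules(item_ids, scores, recently_shown, item_age_days,
                         fresh_days: float = 7.0, fresh_boost: float = 0.1):
    """Drop recently shown items, boost fresh ones, and re-sort by adjusted score."""
    adjusted = []
    for item_id, score in zip(item_ids, scores):
        if item_id in recently_shown:
            continue  # avoid re-showing the same item so soon
        if item_age_days.get(item_id, float("inf")) <= fresh_days:
            score += fresh_boost  # hypothetical additive freshness boost
        adjusted.append((item_id, score))
    adjusted.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in adjusted], [s for _, s in adjusted]

ids, new_scores = apply_business_rules(
    item_ids=[10, 11, 12, 13],
    scores=[0.90, 0.85, 0.80, 0.70],
    recently_shown={10},
    item_age_days={12: 2, 13: 30},
)
print(ids, new_scores)
```

Items with unknown age receive no boost; a multiplicative boost or a decay by age are common variants.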

---
## 8️⃣ Monitoring & Data Drift Detection

### Interview Topic: Why monitor ML models?
- Data distribution changes over time
- Model performance degrades
- Need to trigger retraining

In [None]:
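def compute_psi(baseline, current, bins=10):
    """
    Population Stability Index (PSI)
    
    Common rule-of-thumb interpretation:
    - PSI < 0.1: No significant change
    - 0.1 <= PSI < 0.2: Moderate change
    - PSI >= 0.2: Significant drift → investigate and likely retrain
    
    Interview Point: Widely used for drift monitoring (common in credit-risk models)
    """
    # Create bins from baseline quantiles
    breakpoints = np.percentile(baseline, np.linspace(0, 100, bins + 1))
    breakpoints = np.unique(breakpoints)
    
    # Clip current values into the baseline range so out-of-range values fall
    # into the edge bins instead of being silently dropped by np.histogram
    current = np.clip(current, breakpoints[0], breakpoints[-1])
    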
    # Count samples in each bin
    baseline_counts = np.histogram(baseline, bins=breakpoints)[0]
    current_counts = np.histogram(current, bins=breakpoints)[0]
    
    # Convert to percentages
    baseline_pct = baseline_counts / len(baseline)
    current_pct = current_counts / len(current)
    
    # Avoid division by zero
    baseline_pct = np.where(baseline_pct == 0, 0.0001, baseline_pct)
    current_pct = np.where(current_pct == 0, 0.0001, current_pct)
    
    # Calculate PSI
    psi = np.sum((current_pct - baseline_pct) * np.log(current_pct / baseline_pct))
    
    return psi

# Simulate drift
baseline_data = np.random.normal(0, 1, 10000)
no_drift_data = np.random.normal(0, 1, 10000)
moderate_drift_data = np.random.normal(0.4, 1, 10000)  # ~0.4 SD mean shift: PSI typically lands in 0.1-0.2
significant_drift_data = np.random.normal(0.5, 1.2, 10000)

psi_no_drift = compute_psi(baseline_data, no_drift_data)
psi_moderate = compute_psi(baseline_data, moderate_drift_data)
psi_significant = compute_psi(baseline_data, significant_drift_data)

print("📊 PSI Drift Detection Examples:")
print(f"  - No drift:         PSI = {psi_no_drift:.4f} {'✅ OK' if psi_no_drift < 0.1 else ''}")
print(f"  - Moderate drift:   PSI = {psi_moderate:.4f} {'⚠️  Monitor' if 0.1 <= psi_moderate < 0.2 else ''}")
print(f"  - Significant drift: PSI = {psi_significant:.4f} {'🚨 RETRAIN!' if psi_significant >= 0.2 else ''}")

# Visualize distributions
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Baseline
axes[0, 0].hist(baseline_data, bins=50, alpha=0.7, color='gray', label='Baseline')
axes[0, 0].set_title('Baseline Distribution', fontsize=12, fontweight='bold')
axes[0, 0].legend()
axes[0, 0].grid(alpha=0.3)

# No drift
axes[0, 1].hist(baseline_data, bins=50, alpha=0.5, color='gray', label='Baseline')
axes[0, 1].hist(no_drift_data, bins=50, alpha=0.5, color='green', label='Current')
axes[0, 1].set_title(f'No Drift (PSI={psi_no_drift:.4f})', fontsize=12, fontweight='bold')
axes[0, 1].legend()
axes[0, 1].grid(alpha=0.3)

# Moderate drift
axes[1, 0].hist(baseline_data, bins=50, alpha=0.5, color='gray', label='Baseline')
axes[1, 0].hist(moderate_drift_data, bins=50, alpha=0.5, color='orange', label='Current')
axes[1, 0].set_title(f'Moderate Drift (PSI={psi_moderate:.4f})', fontsize=12, fontweight='bold')
axes[1, 0].legend()
axes[1, 0].grid(alpha=0.3)

# Significant drift
axes[1, 1].hist(baseline_data, bins=50, alpha=0.5, color='gray', label='Baseline')
axes[1, 1].hist(significant_drift_data, bins=50, alpha=0.5, color='red', label='Current')
axes[1, 1].set_title(f'Significant Drift (PSI={psi_significant:.4f})', fontsize=12, fontweight='bold')
axes[1, 1].legend()
axes[1, 1].grid(alpha=0.3)

plt.tight_layout()
plt.show()
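PSI depends on the binning. A bin-free complement for a single numeric feature is the two-sample Kolmogorov-Smirnov test, available as `scipy.stats.ks_2samp`. With large samples its p-value flags even tiny shifts, so it is usually paired with a threshold on the KS statistic itself; the 0.1 threshold below is an assumption, not a standard.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5000)
same = rng.normal(0.0, 1.0, 5000)
shifted = rng.normal(0.5, 1.0, 5000)

for name, current in [("no drift", same), ("shifted", shifted)]:
    result = stats.ks_2samp(baseline, current)
    # Require both statistical significance and an assumed minimum effect size
    flagged = result.pvalue < 0.01 and result.statistic > 0.1
    print(f"{name:9s} KS={result.statistic:.3f}  p={result.pvalue:.2e}  drift={flagged}")
```

The KS statistic is the largest gap between the two empirical CDFs, so it stays interpretable regardless of sample size.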

---
## 9️⃣ A/B Testing Framework

### Interview Topic: Statistical rigor in experimentation
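"Don't stop early" is easier to follow when the required sample size is fixed before launch. A minimal sketch of the standard normal-approximation formula for comparing two proportions, n per group = (z_{1-α/2} + z_{1-β})² · [p₁(1 − p₁) + p₂(1 − p₂)] / (p₁ − p₂)²; the 3.0% to 3.5% rates are illustrative.

```python
import math
from scipy.stats import norm

def sample_size_per_group(p_control: float, p_treatment: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed per arm to detect p_control -> p_treatment (two-sided z-test, normal approx.)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

print(sample_size_per_group(0.030, 0.035))
```

This comes to roughly 19.7k users per arm. With 10,000 users per arm, the power to detect this lift is only about 50%, so a "significant" result at that size deserves extra scrutiny and a non-significant one says little.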

In [None]:
from scipy import stats

def analyze_ab_test(control_conversions, control_samples, 
                    treatment_conversions, treatment_samples):
    """
    Analyze A/B test results
    
    Interview Points:
    - Sample size matters (don't stop early!)
    - Statistical significance ≠ business significance
    - Multiple testing correction
    """
    control_rate = control_conversions / control_samples
    treatment_rate = treatment_conversions / treatment_samples
    
    # Relative lift
    relative_lift = (treatment_rate - control_rate) / control_rate if control_rate > 0 else 0
    
    # Z-test for proportions
    pooled_rate = (control_conversions + treatment_conversions) / (control_samples + treatment_samples)
    se = np.sqrt(pooled_rate * (1 - pooled_rate) * (1/control_samples + 1/treatment_samples))
    z_score = (treatment_rate - control_rate) / se if se > 0 else 0
    p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))
    
    # Confidence interval
    se_diff = np.sqrt(
        control_rate * (1 - control_rate) / control_samples +
        treatment_rate * (1 - treatment_rate) / treatment_samples
    )
    ci_lower = (treatment_rate - control_rate) - 1.96 * se_diff
    ci_upper = (treatment_rate - control_rate) + 1.96 * se_diff
    
    return {
        'control_rate': control_rate,
        'treatment_rate': treatment_rate,
        'relative_lift': relative_lift,
        'p_value': p_value,
        'is_significant': p_value < 0.05,
        'ci_lower': ci_lower,
        'ci_upper': ci_upper
    }

# Example A/B test
results = analyze_ab_test(
    control_conversions=300,
    control_samples=10000,
    treatment_conversions=350,
    treatment_samples=10000
)

print("\n🧪 A/B Test Results:")
print(f"  Control rate:    {results['control_rate']*100:.2f}%")
print(f"  Treatment rate:  {results['treatment_rate']*100:.2f}%")
print(f"  Relative Lift:   {results['relative_lift']*100:+.2f}%")
print(f"  P-value:         {results['p_value']:.4f}")
print(f"  Significant?     {'✅ YES' if results['is_significant'] else '❌ NO'}")
print(f"  95% CI (abs. diff): [{results['ci_lower']*100:.2f} pp, {results['ci_upper']*100:.2f} pp]")

if results['is_significant'] and results['relative_lift'] > 0:
    print("\n🚀 Recommendation: SHIP the treatment variant!")
else:
    print("\n⏸️  Recommendation: Do NOT ship. Keep control.")

# Visualize
fig, ax = plt.subplots(figsize=(10, 6))

variants = ['Control', 'Treatment']
rates = [results['control_rate']*100, results['treatment_rate']*100]
colors = ['steelblue', 'coral']

bars = ax.bar(variants, rates, color=colors, alpha=0.7)
ax.set_ylabel('Conversion Rate (%)')
ax.set_title('A/B Test Results', fontsize=14, fontweight='bold')
ax.grid(axis='y', alpha=0.3)

# Add value labels
for bar, rate in zip(bars, rates):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{rate:.2f}%',
            ha='center', va='bottom', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.show()
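The docstring of `analyze_ab_test` mentions multiple testing correction: when several metrics or variants are tested at once, some will look significant by chance. A minimal sketch of the Holm-Bonferroni step-down procedure, which controls the family-wise error rate and never rejects fewer hypotheses than plain Bonferroni; the p-values are made up for illustration.

```python
def holm_bonferroni(p_values, alpha: float = 0.05):
    """Return a reject (True) / keep (False) decision per hypothesis, controlling FWER at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: once one test fails, all larger p-values are kept
    return reject

# Hypothetical p-values for four metrics from one experiment
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))
```

Here 0.03 and 0.04 would pass an uncorrected 0.05 threshold but are not rejected after correction.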

---
## 🎓 Interview Summary

### Key Talking Points You Can Now Discuss:

1. **System Architecture**
   - ✅ Two-stage retrieval (candidate generation + ranking)
   - ✅ Why it scales to billions
   - ✅ Latency optimization strategies

2. **Data & Features**
   - ✅ Time-based train/test split (no leakage!)
   - ✅ Feature engineering for mixed data types
   - ✅ Handling sparsity and power-law distributions

3. **Models**
   - ✅ Matrix Factorization (baseline)
   - ✅ Two-tower neural network (industry standard)
   - ✅ When to use what model

4. **Production**
   - ✅ Embedding generation and ANN search
   - ✅ Business logic and diversity
   - ✅ Monitoring and drift detection
   - ✅ A/B testing framework

### Next Steps:

1. Review `INTERVIEW_GUIDE.md` for specific questions
2. Study `CHEAT_SHEET.md` for quick reference
3. Practice explaining each component
4. Run this notebook multiple times, experiment with parameters

---

**You're now ready for your staff-level ML interview!** 🚀

Good luck!