# 08 - Deployment, Serving, and Monitoring

---

## What the Chapter Says

This notebook covers **Deployment and Serving** with:

### H1) Cloud vs On-Device Comparison
| Aspect | Cloud | On-Device |
|--------|-------|----------|
| Simplicity | Easier | Harder |
| Cost | Higher | Lower |
| Network Latency | Yes | No |
| Inference Latency | Variable | Consistent |
| Hardware | Powerful | Limited |
| Privacy | Less | More |
| Internet Dependency | Required | Not required |

### H2) Model Compression
- Distillation, Pruning, Quantization

### H3) Testing in Production
- Shadow deployment, A/B testing, Canary, Interleaving, Bandits

### H4) Prediction Pipeline
- Batch vs Online prediction
- Personalized news feed system diagram

---

## Meta Interview Signal

| Level | Expectations |
|-------|-------------|
| **E5** | Knows cloud vs on-device tradeoffs. Understands batch vs online. Can explain A/B testing. |
| **E6** | Designs serving architecture for scale. Proposes model compression strategies. Designs monitoring and feedback loops. Discusses rollout strategies. |

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)

---

## H1) Cloud vs On-Device Comparison (Chapter Table)

In [None]:
# Chapter's exact comparison table
cloud_vs_device = pd.DataFrame({
    'Aspect': [
        'Simplicity',
        'Cost',
        'Network Latency',
        'Inference Latency',
        'Hardware',
        'Privacy',
        'Internet Dependency'
    ],
    'Cloud': [
        'Easier to deploy and update',
        'Higher (server costs)',
        'Yes (network round-trip)',
        'Variable (depends on load)',
        'Powerful (GPUs, TPUs)',
        'Less (data leaves device)',
        'Required'
    ],
    'On-Device': [
        'Harder (device fragmentation)',
        'Lower (no server costs)',
        'No',
        'Consistent',
        'Limited (mobile CPU/GPU)',
        'More (data stays on device)',
        'Not required'
    ]
})

print("="*80)
print("CLOUD vs ON-DEVICE COMPARISON (Chapter Table)")
print("="*80)
print(cloud_vs_device.to_string(index=False))
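The "Variable" vs "Consistent" inference-latency row is easier to see with a small simulation. This is a sketch built on made-up numbers: the round-trip time, queueing delay, server inference time and device inference time below are hypothetical and were not measured. It compares end-to-end latency, because on the cloud path the network hop and load-dependent queueing add to the (fast) model call.

```python
import numpy as np

rng = np.random.default_rng(0)
n_requests = 10_000

# Hypothetical cloud path: network round trip + queueing under load + fast server inference
network_rtt_ms = np.maximum(rng.normal(40, 10, n_requests), 5)
queueing_ms = rng.exponential(8, n_requests)               # long tail when servers are busy
server_inference_ms = np.maximum(rng.normal(5, 1, n_requests), 1)
cloud_total_ms = network_rtt_ms + queueing_ms + server_inference_ms

# Hypothetical on-device path: no network hop, slower but steadier mobile inference
device_total_ms = np.maximum(rng.normal(30, 3, n_requests), 1)

def latency_summary(latencies_ms):
    """Return p50 and p99 latency in milliseconds."""
    return {"p50": float(np.percentile(latencies_ms, 50)),
            "p99": float(np.percentile(latencies_ms, 99))}

cloud_stats = latency_summary(cloud_total_ms)
device_stats = latency_summary(device_total_ms)
for name, stats in [("Cloud", cloud_stats), ("On-device", device_stats)]:
    # p99/p50 ratio: how much wider the tail is than a typical request
    print(f"{name:10s} p50={stats['p50']:.1f} ms  p99={stats['p99']:.1f} ms  "
          f"tail ratio={stats['p99'] / stats['p50']:.2f}")
```

With these assumed numbers the device path wins end to end and has a tighter tail. A heavier model or a faster network could flip the first conclusion, so it is worth plugging in real measurements for the product in question.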

In [None]:
# Visual comparison
fig, ax = plt.subplots(figsize=(14, 6))
ax.axis('off')
ax.set_title('Cloud vs On-Device Deployment', fontsize=14, fontweight='bold')

# Cloud side
rect = mpatches.FancyBboxPatch((1, 1), 5, 4, boxstyle='round,pad=0.1',
                                facecolor='#BBDEFB', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(3.5, 4.5, 'CLOUD', ha='center', va='center', fontsize=14, fontweight='bold')
ax.text(3.5, 3.5, '+ Easy updates', ha='center', va='center', fontsize=10, color='green')
ax.text(3.5, 3, '+ Powerful hardware', ha='center', va='center', fontsize=10, color='green')
ax.text(3.5, 2.5, '+ Complex models OK', ha='center', va='center', fontsize=10, color='green')
ax.text(3.5, 1.8, '- Network latency', ha='center', va='center', fontsize=10, color='red')
ax.text(3.5, 1.3, '- Privacy concerns', ha='center', va='center', fontsize=10, color='red')

# On-Device side
rect = mpatches.FancyBboxPatch((8, 1), 5, 4, boxstyle='round,pad=0.1',
                                facecolor='#C8E6C9', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(10.5, 4.5, 'ON-DEVICE', ha='center', va='center', fontsize=14, fontweight='bold')
ax.text(10.5, 3.5, '+ No network latency', ha='center', va='center', fontsize=10, color='green')
ax.text(10.5, 3, '+ Privacy preserving', ha='center', va='center', fontsize=10, color='green')
ax.text(10.5, 2.5, '+ Works offline', ha='center', va='center', fontsize=10, color='green')
ax.text(10.5, 1.8, '- Limited hardware', ha='center', va='center', fontsize=10, color='red')
ax.text(10.5, 1.3, '- Hard to update', ha='center', va='center', fontsize=10, color='red')

# VS in middle
ax.text(7, 3, 'vs', ha='center', va='center', fontsize=20, fontweight='bold')

ax.set_xlim(0, 14)
ax.set_ylim(0, 6)
plt.tight_layout()
plt.show()

In [None]:
# Use case mapping
deployment_use_cases = pd.DataFrame({
    'Use Case': [
        'Feed ranking',
        'Keyboard autocomplete',
        'Search results',
        'Face unlock',
        'Ad targeting',
        'Voice assistant (initial)',
        'Spam filtering'
    ],
    'Best Deployment': [
        'Cloud',
        'On-Device',
        'Cloud',
        'On-Device',
        'Cloud',
        'Hybrid',
        'Cloud'
    ],
    'Reason': [
        'Needs latest content, complex model',
        'Privacy (keystrokes), low latency, offline',
        'Large index, complex ranking',
        'Privacy (biometrics), must work offline',
        'Needs cross-user data, complex targeting',
        'Wake word on-device, NLU in cloud',
        'Needs global spam patterns'
    ]
})

print("\n" + "="*70)
print("DEPLOYMENT CHOICE BY USE CASE")
print("="*70)
print(deployment_use_cases.to_string(index=False))
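The reasoning in the table can be written down as a toy rule of thumb, including the "Hybrid" voice-assistant row. The profile flags and the rule below are illustrative assumptions and do not come from the chapter. Real decisions also weigh battery, memory, and how often the model must be updated.

```python
from dataclasses import dataclass

@dataclass
class UseCaseProfile:
    """Hypothetical requirement flags for one product surface."""
    name: str
    sensitive_on_device_data: bool   # e.g., keystrokes, biometrics, raw audio
    must_work_offline: bool
    needs_cross_user_data: bool      # e.g., global spam patterns, ad auctions
    model_fits_on_device: bool       # after compression

def choose_deployment(p: UseCaseProfile) -> str:
    """Toy rule mirroring the table above; not a definitive policy."""
    wants_device = p.sensitive_on_device_data or p.must_work_offline
    wants_cloud = p.needs_cross_user_data or not p.model_fits_on_device
    if wants_device and wants_cloud:
        return "Hybrid"      # light stage on device (e.g., wake word), heavy stage in the cloud
    if wants_device:
        return "On-Device"
    return "Cloud"

profiles = [
    UseCaseProfile("Keyboard autocomplete", True, True, False, True),
    UseCaseProfile("Feed ranking", False, False, True, False),
    UseCaseProfile("Voice assistant (initial)", True, True, False, False),
]
for p in profiles:
    print(f"{p.name:28s} -> {choose_deployment(p)}")
```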

---

## H2) Model Compression (Chapter Content)

In [None]:
# Model compression techniques from chapter
compression_techniques = pd.DataFrame({
    'Technique': ['Distillation', 'Pruning', 'Quantization'],
    'Description': [
        'Train smaller "student" model to mimic larger "teacher" model',
        'Remove unimportant weights/neurons from model',
        'Reduce precision of weights (e.g., FP32 → INT8)'
    ],
    'Size Reduction': [
        '2-10x smaller',
        '2-10x smaller',
        '2-4x smaller'
    ],
    'Accuracy Impact': [
        'Usually small if the student has enough capacity',
        'Small to moderate',
        'Usually small'
    ]
})

print("="*80)
print("MODEL COMPRESSION TECHNIQUES (Chapter Content)")
print("="*80)
print(compression_techniques.to_string(index=False))
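The distillation row can be made concrete as a training loss. The sketch below follows the widely used Hinton-style formulation, which combines a KL term between temperature-softened teacher and student distributions with ordinary cross-entropy on the hard labels. The temperature `T=2.0` and weight `alpha=0.5` are illustrative defaults and were not taken from the chapter. Only the loss is shown; the optimizer loop is omitted.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Return (total, kl, ce) where
    total = alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(labels, student)."""
    eps = 1e-12
    p_teacher = softmax(teacher_logits, T)
    p_student_soft = softmax(student_logits, T)
    # KL divergence between the temperature-softened distributions, averaged over the batch
    kl = np.mean(np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student_soft + eps)), axis=-1))
    # Ordinary cross-entropy against the hard labels at T = 1
    p_student = softmax(student_logits)
    ce = -np.mean(np.log(p_student[np.arange(len(labels)), labels] + eps))
    # Multiplying by T^2 keeps the soft-target gradients on a similar scale as T changes
    total = alpha * T**2 * kl + (1 - alpha) * ce
    return float(total), float(kl), float(ce)

rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(8, 5)) * 3     # batch of 8 examples, 5 classes
labels = teacher_logits.argmax(axis=1)           # pretend the teacher is always right

total_same, kl_same, ce_same = distillation_loss(teacher_logits.copy(), teacher_logits, labels)
total_rand, kl_rand, ce_rand = distillation_loss(rng.normal(size=(8, 5)), teacher_logits, labels)
print(f"student == teacher: KL={kl_same:.6f}  CE={ce_same:.3f}  total={total_same:.3f}")
print(f"random student:     KL={kl_rand:.4f}  CE={ce_rand:.3f}  total={total_rand:.3f}")
```

The soft targets carry the teacher's relative confidence across wrong classes. This extra signal is the usual explanation for why a small student can track the teacher better than it would by training on hard labels alone.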

In [None]:
# Visualize compression techniques
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Distillation
ax = axes[0]
ax.axis('off')
ax.set_title('Knowledge Distillation', fontsize=12, fontweight='bold')

# Teacher
rect = mpatches.FancyBboxPatch((0.5, 2), 2, 2, boxstyle='round,pad=0.1',
                                facecolor='#FFCCBC', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(1.5, 3, 'Teacher\n(Large)', ha='center', va='center', fontsize=10)

# Arrow
ax.annotate('soft labels', xy=(4, 3), xytext=(2.5, 3),
           arrowprops=dict(arrowstyle='->', color='black', lw=2))

# Student
rect = mpatches.FancyBboxPatch((4, 2.5), 1.5, 1, boxstyle='round,pad=0.1',
                                facecolor='#C8E6C9', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(4.75, 3, 'Student\n(Small)', ha='center', va='center', fontsize=10)

ax.set_xlim(0, 6)
ax.set_ylim(0, 5)

# Pruning
ax = axes[1]
ax.axis('off')
ax.set_title('Pruning', fontsize=12, fontweight='bold')

# Original network (dense)
for i in range(4):
    for j in range(3):
        circle = plt.Circle((1 + j, 3.5 - i*0.8), 0.2, color='#BBDEFB', ec='black')
        ax.add_patch(circle)

ax.text(2, 0.8, 'Dense → Sparse', ha='center', fontsize=10)

# Pruned network (sparse)
for i in range(4):
    for j in range(3):
        if np.random.random() > 0.4:  # Some removed
            circle = plt.Circle((4 + j, 3.5 - i*0.8), 0.2, color='#C8E6C9', ec='black')
            ax.add_patch(circle)

ax.annotate('', xy=(3.5, 2), xytext=(3, 2),
           arrowprops=dict(arrowstyle='->', color='black', lw=2))

ax.set_xlim(0, 7)
ax.set_ylim(0, 5)

# Quantization
ax = axes[2]
ax.axis('off')
ax.set_title('Quantization', fontsize=12, fontweight='bold')

# FP32
rect = mpatches.FancyBboxPatch((0.5, 2.5), 2, 1.5, boxstyle='round,pad=0.1',
                                facecolor='#FFCCBC', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(1.5, 3.25, 'FP32\n32 bits/weight', ha='center', va='center', fontsize=10)

# Arrow
ax.annotate('', xy=(3.5, 3.25), xytext=(2.5, 3.25),
           arrowprops=dict(arrowstyle='->', color='black', lw=2))

# INT8
rect = mpatches.FancyBboxPatch((3.5, 2.5), 2, 1.5, boxstyle='round,pad=0.1',
                                facecolor='#C8E6C9', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(4.5, 3.25, 'INT8\n8 bits/weight', ha='center', va='center', fontsize=10)

ax.text(3, 1.5, '4x smaller!', ha='center', fontsize=11, color='green', fontweight='bold')

ax.set_xlim(0, 6)
ax.set_ylim(0, 5)

plt.tight_layout()
plt.show()
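The "4x smaller" FP32 → INT8 claim can be checked directly. The sketch below applies per-tensor symmetric post-training quantization, which is one common scheme among several; per-channel scales and asymmetric (zero-point) variants also exist. The random weight matrix stands in for a real layer.

```python
import numpy as np

def quantize_int8_symmetric(weights):
    """Per-tensor symmetric quantization: w ≈ scale * q with q in [-127, 127]."""
    scale = float(np.abs(weights).max()) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale reconstructs it exactly
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, np.float32(scale)

def dequantize(q, scale):
    """Map INT8 codes back to approximate FP32 weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w_fp32 = rng.normal(0, 0.05, size=(256, 256)).astype(np.float32)  # stand-in layer weights

q, scale = quantize_int8_symmetric(w_fp32)
w_restored = dequantize(q, scale)

print(f"FP32: {w_fp32.nbytes:,} bytes | INT8: {q.nbytes:,} bytes "
      f"({w_fp32.nbytes / q.nbytes:.0f}x smaller, plus one stored scale)")
max_err = np.abs(w_fp32 - w_restored).max()
print(f"Max abs reconstruction error: {max_err:.6f} (rounding bound scale/2 = {scale / 2:.6f})")
```

Rounding limits each weight's error to half a quantization step. How much that costs in accuracy depends on the model, which is why the table lists the impact as "usually small" and not as zero.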

---

## H3) Testing in Production (Chapter Content)

In [None]:
# Testing strategies from chapter
testing_strategies = pd.DataFrame({
    'Strategy': [
        'Shadow Deployment',
        'A/B Testing',
        'Canary Release',
        'Interleaving',
        'Bandits'
    ],
    'Description': [
        "Send every request to both models; only the old model's prediction is served",
        'Random split of traffic between models',
        'Gradual rollout to small % first',
        'Mix results from both models in single response',
        'Dynamically allocate traffic based on performance'
    ],
    'Pros': [
        'No user impact, test real traffic',
        'Statistically rigorous, clear winner',
        'Quick rollback, lower risk',
        'More sensitive for ranking evaluation',
        'Faster convergence, less regret'
    ],
    'Cons': [
        'Double inference cost',
        'Need to run long enough for significance',
        'May miss issues at scale',
        'More complex to implement',
        'Harder to interpret statistically'
    ]
})

print("="*90)
print("TESTING IN PRODUCTION STRATEGIES (Chapter Content)")
print("="*90)
print(testing_strategies.to_string(index=False))
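The "Bandits" row ("dynamically allocate traffic based on performance") can be illustrated with Beta-Bernoulli Thompson sampling, which is one standard bandit algorithm among several (epsilon-greedy and UCB are others). The CTRs passed in are hypothetical and serve only to simulate clicks.

```python
import numpy as np

def thompson_sampling(true_ctrs, n_rounds=10_000, seed=0):
    """Beta-Bernoulli Thompson sampling over model variants ("arms").

    true_ctrs are hypothetical click-through rates used only to simulate clicks.
    Returns the number of requests routed to each arm.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_ctrs)
    successes = np.ones(n_arms)   # Beta(1, 1) = uniform prior on each arm's CTR
    failures = np.ones(n_arms)
    pulls = np.zeros(n_arms, dtype=int)
    for _ in range(n_rounds):
        # Draw one plausible CTR per arm from its posterior; serve the arm with the highest draw
        arm = int(np.argmax(rng.beta(successes, failures)))
        clicked = float(rng.random() < true_ctrs[arm])
        successes[arm] += clicked
        failures[arm] += 1.0 - clicked
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.03, 0.05])   # assumed CTRs: old model vs new model
print("Traffic share (old, new):", np.round(pulls / pulls.sum(), 3))
```

Unlike a fixed 50/50 A/B split, traffic drifts toward the better arm while the experiment runs, which produces less regret. Because the allocation depends on the outcomes, the final split is harder to analyze with a standard fixed-sample significance test.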

In [None]:
# Visual: Shadow Deployment (from chapter)
fig, ax = plt.subplots(figsize=(12, 5))
ax.axis('off')
ax.set_title('Shadow Deployment (Chapter)', fontsize=14, fontweight='bold')

# Request
rect = mpatches.FancyBboxPatch((0.5, 2), 2, 1.5, boxstyle='round,pad=0.1',
                                facecolor='#BBDEFB', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(1.5, 2.75, 'User\nRequest', ha='center', va='center', fontsize=11)

# Router
rect = mpatches.FancyBboxPatch((3.5, 2), 2, 1.5, boxstyle='round,pad=0.1',
                                facecolor='#FFF9C4', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(4.5, 2.75, 'Router', ha='center', va='center', fontsize=11)

ax.annotate('', xy=(3.5, 2.75), xytext=(2.5, 2.75),
           arrowprops=dict(arrowstyle='->', color='black', lw=2))

# Old Model (serves)
rect = mpatches.FancyBboxPatch((7, 3.5), 2.5, 1.2, boxstyle='round,pad=0.1',
                                facecolor='#C8E6C9', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(8.25, 4.1, 'Old Model\n(SERVES)', ha='center', va='center', fontsize=10, fontweight='bold')

ax.annotate('', xy=(7, 4.1), xytext=(5.5, 3),
           arrowprops=dict(arrowstyle='->', color='black', lw=2))

# New Model (shadow)
rect = mpatches.FancyBboxPatch((7, 1), 2.5, 1.2, boxstyle='round,pad=0.1',
                                facecolor='#FFCCBC', edgecolor='black', linewidth=2, linestyle='--')
ax.add_patch(rect)
ax.text(8.25, 1.6, 'New Model\n(SHADOW)', ha='center', va='center', fontsize=10)

ax.annotate('', xy=(7, 1.6), xytext=(5.5, 2.5),
           arrowprops=dict(arrowstyle='->', color='gray', lw=2, linestyle='--'))

# Response
ax.annotate('Response to user', xy=(11, 4.1), xytext=(9.5, 4.1),
           arrowprops=dict(arrowstyle='->', color='green', lw=2))

# Logging
ax.annotate('Log predictions\n(for comparison)', xy=(10.5, 1.6), xytext=(9.5, 1.6),
           arrowprops=dict(arrowstyle='->', color='gray', lw=1.5, linestyle='--'))

ax.text(6.5, 0.3, "Shadow: every request goes to both models; only the old model's response is served",
        ha='center', fontsize=10, style='italic')
ax.text(6.5, -0.1, 'Cost: double inference (both models run on every request)',
        ha='center', fontsize=10, style='italic', color='red')

ax.set_xlim(0, 13)
ax.set_ylim(-0.5, 6)
plt.tight_layout()
plt.show()
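The routing logic in the shadow diagram can be sketched in a few lines. Both "models" below are hypothetical scoring functions, and the shadow log is a plain list standing in for a real logging sink. A production system would usually run the shadow call asynchronously so that it adds no user-facing latency; the call here is synchronous to keep the sketch simple.

```python
import random
from statistics import mean

def old_model(features):
    """Hypothetical production model."""
    return 0.6 * features["affinity"] + 0.4 * features["recency"]

def new_model(features):
    """Hypothetical candidate model under shadow evaluation."""
    return 0.5 * features["affinity"] + 0.3 * features["recency"] + 0.2 * features["quality"]

shadow_log = []   # stand-in for an offline log used to compare the two models later

def handle_request(features):
    served = old_model(features)          # only this prediction reaches the user
    try:
        shadow = new_model(features)      # same input, never served
        shadow_log.append({"served": served, "shadow": shadow})
    except Exception as exc:              # a failing shadow model must not break serving
        shadow_log.append({"served": served, "shadow_error": repr(exc)})
    return served

random.seed(0)
requests = [{"affinity": random.random(), "recency": random.random(), "quality": random.random()}
            for _ in range(1_000)]
responses = [handle_request(r) for r in requests]

diffs = [abs(e["served"] - e["shadow"]) for e in shadow_log if "shadow" in e]
print(f"Requests served: {len(responses)}, shadow predictions logged: {len(diffs)}")
print(f"Mean |old - new| score difference: {mean(diffs):.3f}")
```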

In [None]:
# Visual: A/B Testing (from chapter)
fig, ax = plt.subplots(figsize=(12, 5))
ax.axis('off')
ax.set_title('A/B Testing (Chapter)', fontsize=14, fontweight='bold')

# Users
rect = mpatches.FancyBboxPatch((0.5, 2), 2, 2, boxstyle='round,pad=0.1',
                                facecolor='#BBDEFB', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(1.5, 3, 'All Users\n(100%)', ha='center', va='center', fontsize=11)

# Random split
ax.text(4, 3, 'Random\nSplit', ha='center', va='center', fontsize=10, fontweight='bold')

# Control
ax.annotate('50%', xy=(5.5, 4), xytext=(3, 3.5),
           arrowprops=dict(arrowstyle='->', color='blue', lw=2))
rect = mpatches.FancyBboxPatch((5.5, 3.5), 2.5, 1.2, boxstyle='round,pad=0.1',
                                facecolor='#C8E6C9', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(6.75, 4.1, 'Control\n(Old Model)', ha='center', va='center', fontsize=10)

# Treatment
ax.annotate('50%', xy=(5.5, 2), xytext=(3, 2.5),
           arrowprops=dict(arrowstyle='->', color='orange', lw=2))
rect = mpatches.FancyBboxPatch((5.5, 1.5), 2.5, 1.2, boxstyle='round,pad=0.1',
                                facecolor='#FFCCBC', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(6.75, 2.1, 'Treatment\n(New Model)', ha='center', va='center', fontsize=10)

# Metrics comparison
rect = mpatches.FancyBboxPatch((9, 2.5), 3, 1.5, boxstyle='round,pad=0.1',
                                facecolor='#E1BEE7', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(10.5, 3.25, 'Compare Metrics\n(Statistical Significance)', ha='center', va='center', fontsize=10)

ax.annotate('', xy=(9, 3.25), xytext=(8, 4.1),
           arrowprops=dict(arrowstyle='->', color='gray', lw=1.5))
ax.annotate('', xy=(9, 3.25), xytext=(8, 2.1),
           arrowprops=dict(arrowstyle='->', color='gray', lw=1.5))

ax.text(6.5, 0.5, 'A/B test: random traffic split; compare metrics after running long enough for significance',
        ha='center', fontsize=10, style='italic')

ax.set_xlim(0, 13)
ax.set_ylim(0, 5.5)
plt.tight_layout()
plt.show()
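The "Compare Metrics (Statistical Significance)" box, together with "run long enough", can be made concrete with two textbook formulas for a CTR metric: a two-proportion z-test and the usual normal-approximation sample size per arm. The click counts and baseline CTR below are made-up numbers. Real experiments also need to handle issues such as multiple metrics, novelty effects, and correlated sessions, and the sketch does not cover them.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a CTR difference, using the pooled standard error."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def required_sample_size_per_arm(p_base, relative_lift, alpha=0.05, power=0.8):
    """Approximate users per arm to detect p_base -> p_base * (1 + relative_lift)."""
    p_new = p_base * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return int((z_alpha + z_beta) ** 2 * variance / (p_new - p_base) ** 2) + 1

# Hypothetical results: control CTR 2.8%, treatment CTR 3.0%, 100k users per arm
z_stat, p_value = two_proportion_z_test(clicks_a=2_800, n_a=100_000, clicks_b=3_000, n_b=100_000)
print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}")

n_per_arm = required_sample_size_per_arm(p_base=0.028, relative_lift=0.05)
print(f"Users per arm to detect a 5% relative CTR lift: ~{n_per_arm:,}")
```

With these numbers the 2.8% → 3.0% difference is significant at α = 0.05 (z ≈ 2.7). Detecting a smaller 5% relative lift on the same baseline, however, needs over 200k users per arm, which is why small improvements take long to confirm.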

---

## H4) Prediction Pipeline: Batch vs Online (Chapter Content)

In [None]:
# Batch vs Online from chapter
batch_vs_online = pd.DataFrame({
    'Aspect': [
        'When computed',
        'Freshness',
        'Latency',
        'Cost',
        'Requirement'
    ],
    'Batch': [
        'Periodic (hourly/daily)',
        'Less responsive (stale until next batch)',
        'Only a fast lookup at request time (precomputed)',
        'Lower (can use spot instances)',
        'Must know what to precompute'
    ],
    'Online': [
        'At request time',
        'Real-time (latest features)',
        'Model inference adds latency at request time',
        'Higher (always-on inference)',
        'Fast model + feature lookup'
    ]
})

print("="*80)
print("BATCH vs ONLINE PREDICTION (Chapter Content)")
print("="*80)
print(batch_vs_online.to_string(index=False))

In [None]:
# Use cases for batch vs online
print("\n" + "="*60)
print("WHEN TO USE BATCH vs ONLINE")
print("="*60)

print("""
BATCH PREDICTION:
  - Email recommendations (daily digest)
  - Precomputed "users who bought X also bought Y"
  - Candidate generation for recommendations
  - Weekly reports/dashboards

ONLINE PREDICTION:
  - Search ranking (user's query is unknown ahead of time)
  - Feed ranking (new content constantly arriving)
  - Fraud detection (must decide immediately)
  - Ad targeting (context matters)

HYBRID (Common in practice):
  - Batch: precompute candidate items for each user
  - Online: rank the candidates at request time
""")

---

## Personalized News Feed System Diagram (Chapter Figure)

In [None]:
# Chapter's personalized news feed system diagram
fig, ax = plt.subplots(figsize=(16, 10))
ax.axis('off')
ax.set_title('Personalized News Feed System (Chapter Figure)', fontsize=16, fontweight='bold')

# User request
rect = mpatches.FancyBboxPatch((0.5, 5), 2, 1.2, boxstyle='round,pad=0.1',
                                facecolor='#BBDEFB', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(1.5, 5.6, 'User\nRequest', ha='center', va='center', fontsize=10, fontweight='bold')

# RETRIEVAL
rect = mpatches.FancyBboxPatch((3.5, 5), 2.5, 1.2, boxstyle='round,pad=0.1',
                                facecolor='#FFF9C4', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(4.75, 5.6, 'Retrieval\n(Candidate Gen)', ha='center', va='center', fontsize=10)

ax.annotate('', xy=(3.5, 5.6), xytext=(2.5, 5.6),
           arrowprops=dict(arrowstyle='->', color='black', lw=2))

# RANKING (ML Model)
rect = mpatches.FancyBboxPatch((7, 5), 2.5, 1.2, boxstyle='round,pad=0.1',
                                facecolor='#FFCCBC', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(8.25, 5.6, 'Ranking\n(ML Model)', ha='center', va='center', fontsize=10, fontweight='bold')

ax.annotate('~1000 candidates', xy=(7, 5.6), xytext=(6, 5.6),
           arrowprops=dict(arrowstyle='->', color='black', lw=2))

# RE-RANKING
rect = mpatches.FancyBboxPatch((10.5, 5), 2.5, 1.2, boxstyle='round,pad=0.1',
                                facecolor='#C8E6C9', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(11.75, 5.6, 'Re-Ranking\n(Business Rules)', ha='center', va='center', fontsize=10)

ax.annotate('~100 ranked', xy=(10.5, 5.6), xytext=(9.5, 5.6),
           arrowprops=dict(arrowstyle='->', color='black', lw=2))

# Response
rect = mpatches.FancyBboxPatch((14, 5), 2, 1.2, boxstyle='round,pad=0.1',
                                facecolor='#E1BEE7', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(15, 5.6, 'Feed\nResponse', ha='center', va='center', fontsize=10)

ax.annotate('Top-N', xy=(14, 5.6), xytext=(13, 5.6),
           arrowprops=dict(arrowstyle='->', color='black', lw=2))

# FEATURE STORE (Online features)
rect = mpatches.FancyBboxPatch((6.5, 7.5), 3.5, 1.2, boxstyle='round,pad=0.1',
                                facecolor='#B2DFDB', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(8.25, 8.1, 'Feature Store\n(Online Features)', ha='center', va='center', fontsize=10)

# Ranking reads features from the store, so the arrow points down into Ranking
ax.annotate('', xy=(8.25, 6.2), xytext=(8.25, 7.5),
           arrowprops=dict(arrowstyle='->', color='green', lw=2))

# Online Feature Computation
rect = mpatches.FancyBboxPatch((3, 8.5), 3, 1.2, boxstyle='round,pad=0.1',
                                facecolor='#B2DFDB', edgecolor='black', linewidth=1.5)
ax.add_patch(rect)
ax.text(4.5, 9.1, 'Online Feature\nComputation', ha='center', va='center', fontsize=9)

ax.annotate('', xy=(6.5, 8.1), xytext=(6, 9.1),
           arrowprops=dict(arrowstyle='->', color='green', lw=1.5))

# Data Store
rect = mpatches.FancyBboxPatch((6.5, 1.5), 3.5, 1.2, boxstyle='round,pad=0.1',
                                facecolor='#F8BBD9', edgecolor='black', linewidth=2)
ax.add_patch(rect)
ax.text(8.25, 2.1, 'Data Store\n(Logs, Events)', ha='center', va='center', fontsize=10)

# Batch Feature Computation
rect = mpatches.FancyBboxPatch((11, 1.5), 3, 1.2, boxstyle='round,pad=0.1',
                                facecolor='#F8BBD9', edgecolor='black', linewidth=1.5)
ax.add_patch(rect)
ax.text(12.5, 2.1, 'Batch Feature\nComputation', ha='center', va='center', fontsize=9)

ax.annotate('', xy=(11, 2.1), xytext=(10, 2.1),
           arrowprops=dict(arrowstyle='->', color='purple', lw=1.5))

ax.annotate('', xy=(8.25, 7.5), xytext=(12.5, 2.7),
           arrowprops=dict(arrowstyle='->', color='purple', lw=1.5))

# User Interactions stream
rect = mpatches.FancyBboxPatch((2.5, 1.5), 3, 1.2, boxstyle='round,pad=0.1',
                                facecolor='#BBDEFB', edgecolor='black', linewidth=1.5)
ax.add_patch(rect)
ax.text(4, 2.1, 'User Interactions\n(Stream)', ha='center', va='center', fontsize=9)

ax.annotate('', xy=(6.5, 2.1), xytext=(5.5, 2.1),
           arrowprops=dict(arrowstyle='->', color='blue', lw=1.5))

ax.annotate('', xy=(4, 8.5), xytext=(4, 2.7),
           arrowprops=dict(arrowstyle='->', color='blue', lw=1.5))

# Legend
ax.text(0.5, 0.5, 'Blue = User data flow | Green = Online features | Purple = Batch features',
        fontsize=10, style='italic')

ax.set_xlim(0, 17)
ax.set_ylim(0, 10.5)
plt.tight_layout()
plt.show()
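The Retrieval → Ranking → Re-Ranking flow in the diagram can be sketched end to end at the diagram's scale (~1000 candidates, ~100 ranked, top-N served). Everything here is a stand-in. Retrieval is a brute-force dot product where a real system would query an ANN index. The "ranking model" is similarity plus noise, and the only business rule is a per-author cap.

```python
import numpy as np

rng = np.random.default_rng(1)
n_posts, dim = 100_000, 32
post_emb = rng.normal(size=(n_posts, dim)).astype(np.float32)   # hypothetical post embeddings
post_author = rng.integers(0, 2_000, size=n_posts)              # hypothetical author ids
user_vec = rng.normal(size=dim).astype(np.float32)

def retrieve(user_vec, k=1_000):
    """Stage 1: cheap candidate generation (brute force here, an ANN index in production)."""
    sims = post_emb @ user_vec
    return np.argpartition(-sims, k)[:k]        # top-k by similarity, unordered

def rank(candidates, top_n=100):
    """Stage 2: stand-in for the heavy ranking model (similarity plus noise
    representing the extra features a real model would use)."""
    scores = post_emb[candidates] @ user_vec + 0.1 * rng.normal(size=len(candidates))
    return candidates[np.argsort(-scores)[:top_n]]

def rerank(ranked, max_per_author=2, top_n=20):
    """Stage 3 (business rule): at most `max_per_author` posts per author in the feed."""
    per_author, feed = {}, []
    for post in ranked:
        author = int(post_author[post])
        if per_author.get(author, 0) < max_per_author:
            feed.append(int(post))
            per_author[author] = per_author.get(author, 0) + 1
        if len(feed) == top_n:
            break
    return feed

candidates = retrieve(user_vec)
ranked = rank(candidates)
feed = rerank(ranked)
print(f"{n_posts:,} posts -> {len(candidates)} candidates -> {len(ranked)} ranked -> {len(feed)} in feed")
```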

In [None]:
# Explain the pipeline components
print("\n" + "="*70)
print("NEWS FEED PIPELINE COMPONENTS (Chapter Figure Explained)")
print("="*70)

print("""
1. RETRIEVAL (Candidate Generation)
   - Fast, simple models to get ~1000 candidates from millions
   - E.g., ANN search, simple heuristics, collaborative filtering

2. RANKING (ML Model)
   - Complex ML model to score ~1000 candidates
   - Uses rich features: user, item, context
   - Optimizes for engagement (CTR, watch time)

3. RE-RANKING (Business Rules)
   - Apply business logic on top of ML ranking
   - Diversity: don't show 10 posts from same author
   - Freshness: boost recent content
   - Policy: demote borderline content

4. FEATURE STORE
   - Central repository for features
   - Online features: updated in (near) real time from the interaction stream (e.g., user's last 5 actions)
   - Batch features: precomputed periodically (e.g., user's 30-day engagement history)

5. DATA STORE
   - Logs all user interactions
   - Feeds into both online and batch feature computation
   - Used for model training and evaluation
""")

---

## Monitoring and Feedback Loops

In [None]:
# Monitoring considerations
print("="*70)
print("MONITORING IN PRODUCTION (Framework Step 7)")
print("="*70)

monitoring_aspects = pd.DataFrame({
    'What to Monitor': [
        'Model metrics',
        'Feature drift',
        'Data quality',
        'System performance',
        'Business metrics'
    ],
    'Examples': [
        'Precision, recall, AUC over time',
        'Feature distributions shifting from training',
        'Missing values, null rates, unexpected values',
        'Latency p99, throughput, error rates',
        'CTR, revenue, user satisfaction'
    ],
    'Alert Threshold': [
        '>10% drop from baseline',
        'Distribution significantly different (KS test)',
        'Null rate > 1% or new categories appearing',
        'p99 latency > SLA',
        'Statistically significant drop'
    ]
})

print(monitoring_aspects.to_string(index=False))
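The "Feature drift" row mentions a KS test. The sketch below computes the two-sample KS statistic directly with NumPy, and adds the Population Stability Index (PSI) as a common alternative, which is an addition of this sketch and does not come from the chapter. The "training" and "live" feature samples are synthetic.

```python
import numpy as np

def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the empirical CDFs."""
    reference, current = np.sort(reference), np.sort(current)
    grid = np.concatenate([reference, current])
    cdf_ref = np.searchsorted(reference, grid, side="right") / len(reference)
    cdf_cur = np.searchsorted(current, grid, side="right") / len(current)
    return float(np.max(np.abs(cdf_ref - cdf_cur)))

def psi(reference, current, n_bins=10):
    """Population Stability Index over quantile bins of the reference (training) data."""
    inner_edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])
    ref_counts = np.bincount(np.searchsorted(inner_edges, reference, side="right"), minlength=n_bins)
    cur_counts = np.bincount(np.searchsorted(inner_edges, current, side="right"), minlength=n_bins)
    # Small floor avoids log(0) when a bin is empty
    ref_frac = np.maximum(ref_counts / len(reference), 1e-6)
    cur_frac = np.maximum(cur_counts / len(current), 1e-6)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 5_000)   # synthetic feature values seen in training
same_dist = rng.normal(0.0, 1.0, 5_000)       # live traffic, no drift
shifted = rng.normal(0.5, 1.0, 5_000)         # live traffic, mean shifted by 0.5 std

for name, live in [("no drift", same_dist), ("mean shift", shifted)]:
    print(f"{name:10s} KS={ks_statistic(train_feature, live):.3f}  PSI={psi(train_feature, live):.3f}")
```

A frequently quoted rule of thumb treats PSI above about 0.1 as a moderate shift and above about 0.25 as a major one. These are industry conventions, so alert thresholds should still be tuned per feature.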

In [None]:
# Simulate monitoring dashboard
np.random.seed(42)
n_days = 30

# Simulate metrics over time
dates = pd.date_range('2024-01-01', periods=n_days, freq='D')

monitoring_data = pd.DataFrame({
    'date': dates,
    'precision': np.random.normal(0.72, 0.02, n_days),
    'recall': np.random.normal(0.65, 0.03, n_days),
    'p99_latency_ms': np.random.normal(45, 5, n_days),
    'ctr': np.random.normal(0.028, 0.002, n_days),
})

# Introduce a drift on day 20
monitoring_data.loc[20:, 'precision'] -= 0.08  # Precision drops
monitoring_data.loc[20:, 'p99_latency_ms'] += 15  # Latency increases

fig, axes = plt.subplots(2, 2, figsize=(14, 8))

# Precision
axes[0, 0].plot(monitoring_data['date'], monitoring_data['precision'], 'b-', linewidth=2)
axes[0, 0].axhline(y=0.72, color='green', linestyle='--', label='Baseline')
axes[0, 0].axhline(y=0.72*0.9, color='red', linestyle='--', label='Alert threshold (-10%)')
axes[0, 0].axvline(x=dates[20], color='orange', linestyle=':', label='Drift start')
axes[0, 0].set_title('Precision Over Time')
axes[0, 0].set_ylabel('Precision')
axes[0, 0].legend(loc='lower left')
axes[0, 0].grid(True, alpha=0.3)

# Recall
axes[0, 1].plot(monitoring_data['date'], monitoring_data['recall'], 'g-', linewidth=2)
axes[0, 1].axhline(y=0.65, color='green', linestyle='--', label='Baseline')
axes[0, 1].set_title('Recall Over Time')
axes[0, 1].set_ylabel('Recall')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Latency
axes[1, 0].plot(monitoring_data['date'], monitoring_data['p99_latency_ms'], 'r-', linewidth=2)
axes[1, 0].axhline(y=50, color='green', linestyle='--', label='SLA (50ms)')
axes[1, 0].axhline(y=60, color='red', linestyle='--', label='Alert threshold')
axes[1, 0].axvline(x=dates[20], color='orange', linestyle=':', label='Drift start')
axes[1, 0].set_title('P99 Latency Over Time')
axes[1, 0].set_ylabel('Latency (ms)')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# CTR
axes[1, 1].plot(monitoring_data['date'], monitoring_data['ctr']*100, 'purple', linewidth=2)
axes[1, 1].set_title('CTR Over Time')
axes[1, 1].set_ylabel('CTR (%)')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.suptitle('Production Monitoring Dashboard', fontsize=14, fontweight='bold', y=1.02)
plt.show()

print("\n[Alert]: Precision dropped below threshold on day 20!")
print("[Alert]: P99 latency exceeded SLA on day 20!")
print("[Action]: Investigate feature pipeline, check for data issues, consider rollback")

---

## Tradeoffs (Chapter-Aligned)

| Tradeoff | Discussion | Interview Signal |
|----------|------------|------------------|
| **Cloud vs On-Device** | Power vs privacy/latency | E5: Knows table. E6: Proposes hybrid |
| **Batch vs Online** | Freshness vs cost | E5: Knows when each applies. E6: Designs hybrid pipeline |
| **Model Size vs Latency** | Accuracy vs speed | E5: Knows compression. E6: Proposes specific technique |
| **Shadow vs A/B** | Safety vs speed | E5: Knows both. E6: Proposes rollout strategy |

---

## Meta Interview Signal (Detailed)

### E5 Answer Expectations

- Knows cloud vs on-device comparison table
- Can explain batch vs online prediction
- Understands A/B testing and shadow deployment
- Knows model compression techniques exist

### E6 Additions

- **Serving architecture**: "We use a two-stage pipeline: retrieval with ANN, then ranking with a DNN. Feature store for low-latency feature lookup."
- **Rollout strategy**: "Start with shadow deployment to validate, then canary to 1%, A/B test for statistical significance, then full rollout."
- **Monitoring**: "We monitor feature drift, model metrics, and business metrics. Automated alerts for >10% metric drops."
- **Feedback loops**: "User interactions feed back into training data. We need to be careful about feedback loops affecting model behavior."

---

## Interview Drills

### Drill 1: Cloud vs On-Device
Reproduce the 7-row comparison table from memory.

### Drill 2: Deployment Decision
For each use case, decide cloud vs on-device:
- Spam email filtering
- Keyboard next-word prediction
- Search ranking
- Face recognition unlock

### Drill 3: Testing Strategy
You're deploying a new ranking model. Design the rollout:
- What testing strategy first?
- What metrics to track?
- What's your rollback plan?

### Drill 4: Pipeline Design
Draw the news feed pipeline from memory:
- Retrieval → Ranking → Re-ranking
- Online features vs batch features
- Feature store role

### Drill 5: Monitoring
For a harmful content detection model:
- What 3 model metrics would you monitor?
- What 2 system metrics?
- What alert thresholds?

---

## Summary: The Complete Framework

This concludes the 8-notebook series, which together covers the chapter's 7-step framework:

1. **Clarifying Requirements** - Business objective, data, constraints, scale, performance, privacy
2. **Framing as ML Task** - ML objective, inputs/outputs, ML category
3. **Data Preparation** - ETL, storage types, feature engineering
4. **Model Development** - Selection, training, class imbalance
5. **Evaluation** - Offline metrics, online metrics, fairness
6. **Deployment & Serving** - Cloud vs device, compression, batch vs online
7. **Monitoring** - Metrics tracking, alerts, feedback loops

**Remember for Meta interviews:**
- E5: Clear end-to-end structure, metrics reasoning, practical tradeoffs
- E6: Scale considerations, failure modes, iteration velocity, feedback loops