# Tutorial 00: Introduction to ML Systems

Welcome to the first tutorial in our comprehensive ML System Design series! This tutorial provides the foundation for understanding the difference between building ML models and building production-ready ML systems.

---

## Learning Objectives

By the end of this tutorial, you will be able to:

1. **Understand** the difference between ML algorithms and ML systems
2. **Identify** components of production-ready ML systems
3. **Learn** the 7-step ML system design framework
4. **Decide** when to use ML vs traditional approaches

---

## Setup

Let's start by importing the necessary libraries for this tutorial.

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Rich display of DataFrames and HTML/Markdown output in the notebook
from IPython.display import display, HTML, Markdown

# Set plotting style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print("✅ Setup complete!")

---

## 1. ML Algorithms vs ML Systems

### What's the Difference?

When many people think about machine learning, they focus on the **algorithm** - the model that learns patterns from data. However, in production environments, the algorithm is just one piece of a much larger puzzle.

| Aspect | ML Algorithm | ML System |
|--------|-------------|----------|
| **Focus** | Mathematical model | End-to-end solution |
| **Input** | Clean, prepared data | Raw, messy real-world data |
| **Output** | Predictions | Actionable results |
| **Environment** | Jupyter notebook | Production infrastructure |
| **Concern** | Accuracy | Accuracy + Latency + Cost + Reliability |
| **Lifecycle** | Train once | Continuous improvement |

### The "Tiny Box" Analogy

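In the well-known paper *Hidden Technical Debt in Machine Learning Systems* (Sculley et al., Google, 2015), the authors point out that ML code (the algorithm) is only a small fraction of a real-world ML system. Their figure is qualitative, so the percentages in the chart below are illustrative rather than measured: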

In [None]:
# Visualization: ML Code as a fraction of ML System

fig, ax = plt.subplots(1, 1, figsize=(12, 8))

# Components of an ML system with illustrative relative sizes (not measured values)
components = {
    'Data Collection': 15,
    'Data Verification': 10,
    'Feature Engineering': 15,
    'ML Code': 5,  # The algorithm itself
    'Configuration': 8,
    'Serving Infrastructure': 15,
    'Monitoring': 12,
    'Resource Management': 10,
    'Process Management': 10
}

# Create a horizontal bar chart
colors = ['#3498db' if k != 'ML Code' else '#e74c3c' for k in components.keys()]
bars = ax.barh(list(components.keys()), list(components.values()), color=colors)

# Highlight the ML Code bar
ax.axvline(x=5, color='#e74c3c', linestyle='--', alpha=0.5)

ax.set_xlabel('Relative Effort/Complexity (%)', fontsize=12)
ax.set_title('Components of a Production ML System\n(ML Code is just ~5% of the total effort!)', fontsize=14)

# Add value labels
for bar, value in zip(bars, components.values()):
    ax.text(value + 0.5, bar.get_y() + bar.get_height()/2, f'{value}%', 
            va='center', fontsize=10)

plt.tight_layout()
plt.show()

print("\n💡 Key Insight: The ML algorithm itself is only a small part of a production ML system (about 5% in this illustrative breakdown)!")

### Real-World Example: Movie Recommendation

Let's compare building a recommendation algorithm vs. a recommendation system:

In [None]:
# Example: Algorithm-focused approach
class SimpleRecommendationAlgorithm:
    """A simple collaborative filtering algorithm - just the 'ML Code' part"""
    
    def __init__(self):
        self.user_item_matrix = None
        self.similarity_matrix = None
    
    def fit(self, ratings_matrix):
        """Train the model on user-item ratings"""
        self.user_item_matrix = ratings_matrix
        # Calculate item-item similarity using cosine similarity
        from sklearn.metrics.pairwise import cosine_similarity
        self.similarity_matrix = cosine_similarity(ratings_matrix.T)
        return self
    
    def predict(self, user_id, item_id):
        """Predict rating for a user-item pair"""
        user_ratings = np.asarray(self.user_item_matrix[user_id])
        similarities = self.similarity_matrix[item_id]
        # Use only items the user has rated (0 is treated as "unrated"),
        # and leave out the target item itself
        rated = user_ratings > 0
        rated[item_id] = False
        # Similarity-weighted average of the user's known ratings
        weighted_sum = np.dot(user_ratings[rated], similarities[rated])
        similarity_sum = np.sum(np.abs(similarities[rated]))
        return weighted_sum / similarity_sum if similarity_sum > 0 else 0.0

# Demo with sample data
np.random.seed(42)
sample_ratings = np.random.randint(0, 6, (10, 20))  # 10 users, 20 movies

algo = SimpleRecommendationAlgorithm()
algo.fit(sample_ratings)

# Predict rating for user 0, item 5
predicted_rating = algo.predict(0, 5)
print(f"📊 Algorithm predicts user 0 would rate movie 5: {predicted_rating:.2f}")
print("\n⚠️  This is JUST the algorithm. A production system needs much more!")

In [None]:
# Example: System-focused approach (conceptual outline)
class RecommendationSystemOutline:
    """Conceptual outline of a production recommendation system"""
    
    def __init__(self):
        self.components = {
            'data_ingestion': {
                'description': 'Collect user interactions, ratings, clicks',
                'technologies': ['Kafka', 'Kinesis', 'Spark Streaming'],
                'considerations': ['Real-time vs batch', 'Data volume', 'Schema evolution']
            },
            'data_storage': {
                'description': 'Store raw and processed data',
                'technologies': ['PostgreSQL', 'Redis', 'S3', 'Feature Store'],
                'considerations': ['Query patterns', 'Storage costs', 'Data retention']
            },
            'feature_engineering': {
                'description': 'Transform raw data into ML features',
                'technologies': ['Spark', 'Pandas', 'Feature Store'],
                'considerations': ['Feature freshness', 'Consistency', 'Scalability']
            },
            'model_training': {
                'description': 'Train and update recommendation models',
                'technologies': ['PyTorch', 'TensorFlow', 'XGBoost'],
                'considerations': ['Training frequency', 'Hyperparameter tuning', 'Versioning']
            },
            'model_serving': {
                'description': 'Serve predictions with low latency',
                'technologies': ['TorchServe', 'TensorFlow Serving', 'FastAPI'],
                'considerations': ['Latency SLA', 'Throughput', 'Caching']
            },
            'monitoring': {
                'description': 'Track system health and model performance',
                'technologies': ['Prometheus', 'Grafana', 'DataDog'],
                'considerations': ['Alerting', 'Drift detection', 'A/B testing']
            }
        }
    
    def describe(self):
        """Print system components"""
        print("🏗️  Production Recommendation System Components:\n")
        for name, details in self.components.items():
            print(f"📦 {name.upper()}")
            print(f"   Description: {details['description']}")
            print(f"   Technologies: {', '.join(details['technologies'])}")
            print(f"   Key Considerations: {', '.join(details['considerations'])}")
            print()

system = RecommendationSystemOutline()
system.describe()
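
To make the gap between the algorithm and the system a little more concrete, the sketch below wraps a model with three of the concerns listed above: caching, a fallback when the model fails, and latency tracking. The names `ServingWrapper`, `ToyModel`, `fallback_score`, and `cache_size` are assumptions made up for this illustration; a real serving layer would likely rely on dedicated infrastructure rather than an in-process dictionary.

```python
import time
from collections import OrderedDict


class ServingWrapper:
    """Hypothetical wrapper adding caching, a fallback, and latency tracking
    around any object that exposes predict(user_id, item_id)."""

    def __init__(self, model, fallback_score=0.0, cache_size=1000):
        self.model = model
        self.fallback_score = fallback_score  # returned when the model fails
        self.cache_size = cache_size
        self.cache = OrderedDict()            # simple LRU cache
        self.latencies_ms = []                # one entry per request

    def predict(self, user_id, item_id):
        start = time.perf_counter()
        key = (user_id, item_id)
        if key in self.cache:
            self.cache.move_to_end(key)       # mark as recently used
            score = self.cache[key]
        else:
            try:
                score = self.model.predict(user_id, item_id)
            except Exception:
                # Degrade gracefully instead of failing the request
                score = self.fallback_score
            else:
                self.cache[key] = score
                if len(self.cache) > self.cache_size:
                    self.cache.popitem(last=False)  # evict least recently used
        self.latencies_ms.append((time.perf_counter() - start) * 1000)
        return score


class ToyModel:
    """Stand-in model: fails for unknown users to exercise the fallback."""

    def predict(self, user_id, item_id):
        if user_id < 0:
            raise KeyError("unknown user")
        return 3.5


wrapper = ServingWrapper(ToyModel(), fallback_score=2.5)
print(wrapper.predict(0, 5))    # model prediction
print(wrapper.predict(0, 5))    # served from cache
print(wrapper.predict(-1, 5))   # fallback value
print(len(wrapper.latencies_ms), "requests timed")
```

Failed predictions are deliberately not cached here, so a temporary model error does not keep returning the fallback once the model recovers.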

---

## 2. Components of Production ML Systems

A production ML system consists of several interconnected components. Let's explore each one:

In [None]:
# Visualization: ML System Architecture

fig, ax = plt.subplots(figsize=(14, 10))
ax.set_xlim(0, 100)
ax.set_ylim(0, 100)
ax.axis('off')

# Define boxes for each component
boxes = [
    # Data Stack
    {'name': 'Data Sources', 'x': 5, 'y': 75, 'w': 18, 'h': 20, 'color': '#3498db'},
    {'name': 'Data\nProcessing', 'x': 28, 'y': 75, 'w': 18, 'h': 20, 'color': '#3498db'},
    {'name': 'Feature\nStore', 'x': 51, 'y': 75, 'w': 18, 'h': 20, 'color': '#3498db'},
    
    # Model Stack
    {'name': 'Model\nTraining', 'x': 28, 'y': 45, 'w': 18, 'h': 20, 'color': '#e74c3c'},
    {'name': 'Model\nRegistry', 'x': 51, 'y': 45, 'w': 18, 'h': 20, 'color': '#e74c3c'},
    
    # Serving Stack
    {'name': 'Model\nServing', 'x': 74, 'y': 60, 'w': 18, 'h': 25, 'color': '#2ecc71'},
    
    # Monitoring Stack
    {'name': 'Monitoring\n& Logging', 'x': 28, 'y': 10, 'w': 41, 'h': 20, 'color': '#9b59b6'},
]

for box in boxes:
    rect = plt.Rectangle((box['x'], box['y']), box['w'], box['h'], 
                         facecolor=box['color'], edgecolor='white', 
                         linewidth=2, alpha=0.8)
    ax.add_patch(rect)
    ax.text(box['x'] + box['w']/2, box['y'] + box['h']/2, box['name'],
            ha='center', va='center', fontsize=11, color='white', fontweight='bold')

# Add arrows
arrows = [
    (23, 85, 5, 0),      # Data Sources -> Data Processing
    (46, 85, 5, 0),      # Data Processing -> Feature Store
    (53, 75, -9, -10),   # Feature Store -> Model Training (ends on top of the Model Training box)
    (46, 55, 5, 0),      # Model Training -> Model Registry
    (69, 55, 5, 5),      # Model Registry -> Model Serving
    (69, 82, 5, 0),      # Feature Store -> Model Serving
    (83, 60, -14, -40),  # Model Serving -> Monitoring (ends on the right edge of the Monitoring box)
]

for arrow in arrows:
    ax.annotate('', xy=(arrow[0]+arrow[2], arrow[1]+arrow[3]), 
                xytext=(arrow[0], arrow[1]),
                arrowprops=dict(arrowstyle='->', color='gray', lw=2))

# Add labels for stacks
ax.text(5, 98, 'DATA STACK', fontsize=12, fontweight='bold', color='#3498db')
ax.text(28, 68, 'MODEL STACK', fontsize=12, fontweight='bold', color='#e74c3c')
ax.text(74, 88, 'SERVING\nSTACK', fontsize=12, fontweight='bold', color='#2ecc71')
ax.text(28, 33, 'MONITORING STACK', fontsize=12, fontweight='bold', color='#9b59b6')

ax.set_title('Production ML System Architecture', fontsize=16, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()

### 2.1 Data Stack

The **Data Stack** handles everything related to data collection, processing, and storage.

In [None]:
# Example: Data Stack Components

data_stack = {
    "Data Sources": {
        "types": [
            "User interactions (clicks, views, purchases)",
            "Content metadata (titles, descriptions)",
            "User profiles (demographics, preferences)",
            "External data (3rd party, APIs)"
        ],
        "key_questions": [
            "Where does the data come from?",
            "What's the data freshness requirement?",
            "How much data do we have?"
        ]
    },
    "Data Processing": {
        "steps": [
            "Data validation and cleaning",
            "Data transformation (ETL)",
            "Data aggregation",
            "Feature computation"
        ],
        "tools": ["Apache Spark", "Apache Flink", "dbt", "Airflow"]
    },
    "Feature Store": {
        "purpose": "Centralized repository for ML features",
        "benefits": [
            "Feature reuse across models",
            "Consistency between training and serving",
            "Feature versioning",
            "Real-time feature serving"
        ],
        "tools": ["Feast", "Tecton", "Hopsworks"]
    }
}

for component, details in data_stack.items():
    print(f"\n🔹 {component}")
    print("=" * 40)
    for key, values in details.items():
        print(f"  {key.title()}:")
        if isinstance(values, list):
            for v in values:
                print(f"    • {v}")
        else:
            print(f"    {values}")
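
As a rough illustration of the "data validation and cleaning" step, the sketch below checks raw interaction events against a small schema before they would be passed on. The field names, the allowed event types, and the `validate_event` helper are all assumptions for this example, not part of any specific tool listed above.

```python
# Hypothetical schema for one raw interaction event
REQUIRED_FIELDS = {"user_id": int, "item_id": int, "event": str, "timestamp": float}
ALLOWED_EVENTS = {"click", "view", "purchase"}


def validate_event(event):
    """Return a list of problems found in one raw event (empty list = valid)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            problems.append(f"wrong type for {field}: {type(event[field]).__name__}")
    if isinstance(event.get("event"), str) and event["event"] not in ALLOWED_EVENTS:
        problems.append(f"unknown event type: {event['event']}")
    return problems


raw_events = [
    {"user_id": 1, "item_id": 10, "event": "click", "timestamp": 1700000000.0},
    {"user_id": 2, "item_id": "10", "event": "view", "timestamp": 1700000001.0},
    {"user_id": 3, "event": "hover", "timestamp": 1700000002.0},
]

# Split the batch into events that pass and events that are rejected with reasons
valid, rejected = [], []
for e in raw_events:
    issues = validate_event(e)
    (rejected if issues else valid).append((e, issues))

print(f"valid: {len(valid)}, rejected: {len(rejected)}")
for e, issues in rejected:
    print(e.get("user_id"), issues)
```

In a production pipeline the rejected events would typically be logged or routed to a quarantine location so that schema violations show up in monitoring.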

### 2.2 Serving Infrastructure

The **Serving Infrastructure** is responsible for deploying models and serving predictions.

In [None]:
# Example: Serving Infrastructure Considerations

serving_infrastructure = pd.DataFrame({
    "Aspect": [
        "Latency",
        "Throughput", 
        "Availability",
        "Scalability",
        "Cost"
    ],
    "Description": [
        "Time to return prediction (e.g., <100ms)",
        "Requests per second (e.g., 10K RPS)",
        "Uptime guarantee (e.g., 99.9%)",
        "Ability to handle load spikes",
        "Infrastructure and compute costs"
    ],
    "Typical Target": [
        "p50 < 50ms, p99 < 200ms",
        "1K - 100K RPS",
        "99.9% - 99.99%",
        "Auto-scaling 2-10x",
        "$0.001 - $0.01 per prediction"
    ]
})

print("🚀 Serving Infrastructure Requirements:")
print("=" * 70)
display(serving_infrastructure)
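
The targets in the table can be checked with a few lines of arithmetic. The sketch below computes p50 and p99 latency with a nearest-rank percentile and converts an availability target into a yearly downtime budget. The latencies are synthetic (drawn from a log-normal distribution chosen only for illustration), and `percentile` and `allowed_downtime_hours` are helper names made up for this example.

```python
import math
import random


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def allowed_downtime_hours(availability_pct, hours_per_year=365 * 24):
    """Yearly downtime budget implied by an availability target."""
    return (1 - availability_pct / 100) * hours_per_year


# Synthetic request latencies in milliseconds (illustration only)
random.seed(0)
latencies_ms = [random.lognormvariate(3.3, 0.5) for _ in range(10_000)]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"p50 = {p50:.1f} ms (target < 50 ms): {'OK' if p50 < 50 else 'MISS'}")
print(f"p99 = {p99:.1f} ms (target < 200 ms): {'OK' if p99 < 200 else 'MISS'}")

for target in (99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% availability allows about {hours:.2f} hours of downtime per year")
```

With a 365-day year, 99.9% availability leaves about 8.76 hours of downtime per year, and 99.99% leaves about 0.88 hours (roughly 53 minutes).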

### 2.3 Evaluation Pipeline

The **Evaluation Pipeline** ensures models perform well both offline and online.

In [None]:
# Example: Evaluation Pipeline Components

evaluation_types = {
    "Offline Evaluation": {
        "when": "Before deployment",
        "metrics": ["Accuracy", "Precision/Recall", "AUC-ROC", "NDCG"],
        "data": "Historical test set",
        "pros": "Fast, cheap, reproducible",
        "cons": "May not reflect real-world performance"
    },
    "Online Evaluation": {
        "when": "During/after deployment",
        "metrics": ["Click-through rate", "Conversion rate", "Revenue", "Engagement"],
        "data": "Live user traffic",
        "pros": "Measures real impact",
        "cons": "Slow, expensive, risky"
    }
}

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

for idx, (eval_type, details) in enumerate(evaluation_types.items()):
    ax = axes[idx]
    ax.axis('off')
    
    # Create a text box
    text = f"{eval_type}\n\n"
    text += f"When: {details['when']}\n\n"
    text += "Metrics:\n"
    for m in details['metrics']:
        text += f"  • {m}\n"
    text += f"\nData: {details['data']}\n"
    text += f"\n✅ Pros: {details['pros']}\n"
    text += f"❌ Cons: {details['cons']}"
    
    color = '#3498db' if idx == 0 else '#2ecc71'
    ax.text(0.5, 0.5, text, transform=ax.transAxes, fontsize=11,
            verticalalignment='center', horizontalalignment='center',
            bbox=dict(boxstyle='round,pad=1', facecolor=color, alpha=0.2))
    ax.set_title(eval_type, fontsize=14, fontweight='bold')

plt.suptitle('Evaluation Pipeline: Offline vs Online', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()
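
Two of the offline metrics named above, Recall@K and NDCG, are easy to compute by hand for a single ranked list. The following is a minimal sketch that assumes graded relevance labels and the linear-gain form of DCG (some references use an exponential gain instead); the item names and labels are made up.

```python
import math


def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)


def dcg_at_k(relevances, k):
    """Discounted cumulative gain; pos is 0-based, so the discount is log2(pos + 2)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_items, relevance_by_item, k):
    """DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    gains = [relevance_by_item.get(item, 0) for item in ranked_items]
    ideal = sorted(relevance_by_item.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


relevance = {"A": 3, "B": 2, "C": 1}      # graded relevance labels (made up)
ranking = ["B", "X", "A", "C", "Y"]       # model output, best first

print(f"Recall@3: {recall_at_k(ranking, set(relevance), 3):.3f}")
print(f"NDCG@3:   {ndcg_at_k(ranking, relevance, 3):.3f}")
```

Here two of the three relevant items appear in the top 3, so Recall@3 is 2/3, while NDCG@3 also penalizes the ranking for placing the most relevant item third.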

### 2.4 Monitoring Systems

**Monitoring** is critical for maintaining ML system health in production.

In [None]:
# Example: What to Monitor in ML Systems

monitoring_categories = {
    "System Metrics": {
        "icon": "⚙️",
        "metrics": [
            "CPU/GPU utilization",
            "Memory usage",
            "Request latency",
            "Error rates",
            "Throughput"
        ]
    },
    "Data Metrics": {
        "icon": "📊",
        "metrics": [
            "Input data distribution",
            "Missing values",
            "Feature statistics",
            "Data freshness",
            "Schema violations"
        ]
    },
    "Model Metrics": {
        "icon": "🤖",
        "metrics": [
            "Prediction distribution",
            "Model confidence scores",
            "Online accuracy (if labels available)",
            "Model drift",
            "Feature importance changes"
        ]
    },
    "Business Metrics": {
        "icon": "💰",
        "metrics": [
            "Click-through rate",
            "Conversion rate",
            "Revenue impact",
            "User satisfaction",
            "Engagement metrics"
        ]
    }
}

print("📈 ML System Monitoring Categories:")
print("=" * 50)

for category, details in monitoring_categories.items():
    print(f"\n{details['icon']} {category}")
    for metric in details['metrics']:
        print(f"   • {metric}")
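
One possible way to turn "input data distribution" monitoring into a number is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in serving traffic. PSI is used here as an assumed example technique; the tutorial does not prescribe a specific drift measure, and the data below is synthetic.

```python
import math
import random


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a current sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant reference sample

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1  # clamp out-of-range values
        # eps avoids log(0) and division by zero for empty bins
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


random.seed(42)
training = [random.gauss(0.0, 1.0) for _ in range(5000)]
serving_same = [random.gauss(0.0, 1.0) for _ in range(5000)]
serving_shifted = [random.gauss(1.0, 1.0) for _ in range(5000)]  # mean has drifted

print(f"PSI (no drift):   {psi(training, serving_same):.3f}")
print(f"PSI (mean shift): {psi(training, serving_shifted):.3f}")
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but suitable alert thresholds depend on the feature and should be tuned per system.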

---

## 3. The 7-Step ML System Design Framework

This framework provides a structured approach to designing ML systems. Each step builds on the previous one.

In [None]:
# Visualization: 7-Step Framework

steps = [
    {"step": 1, "name": "Clarify Requirements", "color": "#e74c3c", 
     "description": "Understand business goals, constraints, and scale"},
    {"step": 2, "name": "Frame the Problem", "color": "#e67e22",
     "description": "Translate to ML objectives, define I/O"},
    {"step": 3, "name": "Data Preparation", "color": "#f1c40f",
     "description": "ETL, feature engineering, data pipeline"},
    {"step": 4, "name": "Model Development", "color": "#2ecc71",
     "description": "Select, train, and tune models"},
    {"step": 5, "name": "Evaluation", "color": "#3498db",
     "description": "Offline & online metrics, A/B testing"},
    {"step": 6, "name": "Deployment", "color": "#9b59b6",
     "description": "Serving infrastructure, scaling"},
    {"step": 7, "name": "Monitoring", "color": "#34495e",
     "description": "Track performance, detect drift, maintain"}
]

fig, ax = plt.subplots(figsize=(14, 8))
ax.set_xlim(0, 100)
ax.set_ylim(0, 100)
ax.axis('off')

# Draw steps as a circular flow
import math

center_x, center_y = 50, 45
radius = 35

for i, step in enumerate(steps):
    angle = (i / len(steps)) * 2 * math.pi - math.pi/2
    x = center_x + radius * math.cos(angle)
    y = center_y + radius * math.sin(angle)
    
    # Draw circle for step
    circle = plt.Circle((x, y), 8, color=step['color'], alpha=0.9)
    ax.add_patch(circle)
    
    # Add step number
    ax.text(x, y, str(step['step']), fontsize=20, fontweight='bold',
            ha='center', va='center', color='white')
    
    # Add step name (outside circle)
    text_x = center_x + (radius + 15) * math.cos(angle)
    text_y = center_y + (radius + 15) * math.sin(angle)
    
    # Adjust text alignment based on position
    ha = 'center'
    if x < center_x - 5:
        ha = 'right'
    elif x > center_x + 5:
        ha = 'left'
    
    ax.text(text_x, text_y, f"{step['name']}\n{step['description']}",
            fontsize=9, ha=ha, va='center',
            bbox=dict(boxstyle='round,pad=0.3', facecolor=step['color'], alpha=0.2))

# Draw arrows between steps
for i in range(len(steps)):
    angle1 = (i / len(steps)) * 2 * math.pi - math.pi/2
    angle2 = ((i + 1) / len(steps)) * 2 * math.pi - math.pi/2
    
    x1 = center_x + radius * math.cos(angle1 + 0.3)
    y1 = center_y + radius * math.sin(angle1 + 0.3)
    x2 = center_x + radius * math.cos(angle2 - 0.3)
    y2 = center_y + radius * math.sin(angle2 - 0.3)
    
    ax.annotate('', xy=(x2, y2), xytext=(x1, y1),
                arrowprops=dict(arrowstyle='->', color='gray', lw=1.5))

ax.set_title('The 7-Step ML System Design Framework', fontsize=16, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()

In [None]:
# Framework Summary Table

framework_df = pd.DataFrame({
    "Step": [1, 2, 3, 4, 5, 6, 7],
    "Name": [
        "Clarify Requirements",
        "Frame the Problem",
        "Data Preparation",
        "Model Development",
        "Evaluation",
        "Deployment",
        "Monitoring"
    ],
    "Key Questions": [
        "What are we optimizing? What are the constraints?",
        "What type of ML problem is this? What are inputs/outputs?",
        "What data do we have? How do we engineer features?",
        "Which models to try? How to train them?",
        "How do we measure success? What experiments to run?",
        "How to serve predictions? How to scale?",
        "How to detect issues? When to retrain?"
    ],
    "Key Outputs": [
        "Requirements document, success metrics",
        "ML objective, I/O specification, task type",
        "ETL pipeline, feature store, dataset",
        "Trained model, hyperparameters",
        "Evaluation report, experiment results",
        "Serving infrastructure, APIs",
        "Dashboards, alerts, retraining triggers"
    ]
})

print("📋 7-Step Framework Summary:")
display(framework_df)
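
Step 5 lists A/B testing as part of evaluation. A minimal sketch of how such an experiment might be analyzed is a two-proportion z-test on click-through rate, shown below. The counts are made up, `two_proportion_z_test` is a helper written for this example, and real experiments usually need more care (sample-size planning, multiple metrics, repeated looks at the data).

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference in click-through rate between two groups."""
    rate_a, rate_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal CDF, written with erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Made-up counts: control (A) vs. new model (B)
z, p = two_proportion_z_test(clicks_a=1000, views_a=20000, clicks_b=1100, views_b=20000)
print(f"CTR A = {1000 / 20000:.2%}, CTR B = {1100 / 20000:.2%}")
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

With these counts the lift from 5.0% to 5.5% CTR gives a z-score of about 2.24, which would pass a conventional 5% significance threshold.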

---

## 4. When to Use ML vs Traditional Approaches

Not every problem needs ML! Here's a framework to help you decide.

In [None]:
# Decision Framework: When to Use ML

def should_use_ml(problem):
    """
    A decision framework to determine if ML is appropriate for a problem.
    
    Parameters:
    problem: dict with keys describing the problem characteristics
    
    Returns:
    (recommendation, reasons): a str and a list of str explaining the scores
    """
    
    scores = {
        'ml_favorable': 0,
        'traditional_favorable': 0
    }
    
    reasons = []
    
    # Check: Is there a pattern to learn?
    if problem.get('has_patterns', False):
        scores['ml_favorable'] += 2
        reasons.append("✅ Problem has patterns that can be learned")
    else:
        scores['traditional_favorable'] += 2
        reasons.append("❌ No clear patterns - rules might work better")
    
    # Check: Do you have enough data?
    data_size = problem.get('data_size', 'small')
    if data_size == 'large':
        scores['ml_favorable'] += 2
        reasons.append("✅ Large dataset available for training")
    elif data_size == 'medium':
        scores['ml_favorable'] += 1
        reasons.append("⚠️ Medium-sized dataset - might need simple models")
    else:
        scores['traditional_favorable'] += 2
        reasons.append("❌ Small dataset - ML may overfit")
    
    # Check: Is the problem too complex for rules?
    if problem.get('rule_complexity', 'low') == 'high':
        scores['ml_favorable'] += 2
        reasons.append("✅ Too complex for hand-crafted rules")
    else:
        scores['traditional_favorable'] += 1
        reasons.append("⚠️ Simple rules might suffice")
    
    # Check: Do you need to handle unseen cases?
    if problem.get('needs_generalization', False):
        scores['ml_favorable'] += 2
        reasons.append("✅ Need to generalize to new cases")
    else:
        scores['traditional_favorable'] += 1
        reasons.append("⚠️ Known cases can be handled with rules")
    
    # Check: Is interpretability critical?
    if problem.get('needs_interpretability', False):
        scores['traditional_favorable'] += 1
        reasons.append("⚠️ Interpretability needed - consider simple models or rules")
    
    # Calculate recommendation
    if scores['ml_favorable'] > scores['traditional_favorable']:
        recommendation = "👍 ML Recommended"
    elif scores['traditional_favorable'] > scores['ml_favorable']:
        recommendation = "📏 Traditional Approach Recommended"
    else:
        recommendation = "🤔 Consider both approaches"
    
    return recommendation, reasons

# Example problems
problems = [
    {
        'name': 'Email Spam Detection',
        'has_patterns': True,
        'data_size': 'large',
        'rule_complexity': 'high',
        'needs_generalization': True,
        'needs_interpretability': False
    },
    {
        'name': 'Tax Calculation',
        'has_patterns': False,
        'data_size': 'small',
        'rule_complexity': 'low',
        'needs_generalization': False,
        'needs_interpretability': True
    },
    {
        'name': 'Image Classification',
        'has_patterns': True,
        'data_size': 'large',
        'rule_complexity': 'high',
        'needs_generalization': True,
        'needs_interpretability': False
    }
]

print("🎯 ML vs Traditional Approach Decision Framework")
print("=" * 60)

for problem in problems:
    print(f"\n📌 Problem: {problem['name']}")
    recommendation, reasons = should_use_ml(problem)
    print(f"   Recommendation: {recommendation}")
    print("   Analysis:")
    for reason in reasons:
        print(f"      {reason}")

In [None]:
# Visualization: ML vs Traditional Decision Matrix

decision_matrix = pd.DataFrame({
    'Criteria': [
        'Pattern Complexity',
        'Data Availability',
        'Need for Generalization',
        'Rule Maintainability',
        'Interpretability Requirement',
        'Accuracy Requirement'
    ],
    'Favors ML': [
        'Complex patterns, non-linear relationships',
        'Large amounts of labeled data',
        'Must handle unseen cases',
        'Rules would be hard to maintain',
        'Black-box acceptable',
        'High accuracy critical'
    ],
    'Favors Traditional': [
        'Simple, well-defined rules',
        'Limited or no data',
        'Known, finite set of cases',
        'Rules are simple and stable',
        'Must explain every decision',
        'Good enough is sufficient'
    ]
})

print("📊 ML vs Traditional Approach Decision Matrix:")
display(decision_matrix)
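
The tax calculation example from the decision framework shows what the "traditional" column looks like in code: when the rules are simple, stable, and must be explainable, a short deterministic function is enough. The brackets and rates below are hypothetical and do not correspond to any real tax system.

```python
# Hypothetical brackets: (upper limit of bracket, marginal rate)
BRACKETS = [(10_000, 0.00), (40_000, 0.10), (100_000, 0.20), (float("inf"), 0.30)]


def income_tax(income):
    """Progressive tax: each slice of income is taxed at its bracket's rate."""
    tax, lower = 0.0, 0.0
    for upper, rate in BRACKETS:
        if income <= lower:
            break  # no income left in this or any higher bracket
        tax += (min(income, upper) - lower) * rate
        lower = upper
    return tax


for income in (8_000, 25_000, 120_000):
    print(f"income {income:>7,} -> tax {income_tax(income):>9,.2f}")
```

Every output can be traced back to a specific bracket, which is the kind of interpretability an ML model would struggle to match for a problem like this.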

---

## 5. Hands-On Exercise: Analyze a Recommendation System

Let's apply what we've learned by analyzing the architecture of a movie recommendation system.

In [None]:
# Exercise: Map a Recommendation System to the 7-Step Framework

class RecommendationSystemAnalysis:
    """
    Analyze a movie recommendation system using the 7-step framework.
    """
    
    def __init__(self, name):
        self.name = name
        self.steps = {}
    
    def add_step(self, step_num, step_name, details):
        """Add analysis for a framework step"""
        self.steps[step_num] = {
            'name': step_name,
            'details': details
        }
    
    def display_analysis(self):
        """Display the complete analysis"""
        print(f"\n🎬 System Analysis: {self.name}")
        print("=" * 60)
        
        for step_num in sorted(self.steps.keys()):
            step = self.steps[step_num]
            print(f"\nStep {step_num}: {step['name']}")
            print("-" * 40)
            for key, value in step['details'].items():
                print(f"  • {key}: {value}")

# Create analysis for Netflix-like recommendation system
netflix = RecommendationSystemAnalysis("Netflix-like Movie Recommendations")

netflix.add_step(1, "Clarify Requirements", {
    "Business Goal": "Increase user engagement and retention",
    "Success Metric": "Time spent watching / Subscriber retention",
    "Scale": "200M+ users, 15K+ titles",
    "Latency Requirement": "< 200ms for recommendations",
    "Personalization Level": "User-level personalization"
})

netflix.add_step(2, "Frame the Problem", {
    "ML Objective": "Predict user engagement score for each movie",
    "Input": "User features + Movie features + Context",
    "Output": "Ranked list of movies",
    "Task Type": "Learning-to-Rank / Regression"
})

netflix.add_step(3, "Data Preparation", {
    "Data Sources": "Watch history, ratings, search queries, browse behavior",
    "Features": "User embeddings, movie embeddings, temporal features",
    "Data Pipeline": "Streaming ingestion + Batch processing"
})

netflix.add_step(4, "Model Development", {
    "Model Architecture": "Two-tower neural network",
    "Training Frequency": "Daily batch + Real-time updates",
    "Candidate Generation": "Approximate nearest neighbor search"
})

netflix.add_step(5, "Evaluation", {
    "Offline Metrics": "Recall@K, NDCG, Hit Rate",
    "Online Metrics": "CTR, Watch Time, Retention",
    "Testing": "A/B testing with holdout groups"
})

netflix.add_step(6, "Deployment", {
    "Serving Strategy": "Two-stage: Candidate retrieval + Ranking",
    "Infrastructure": "Distributed model serving",
    "Caching": "Pre-computed candidates for cold start"
})

netflix.add_step(7, "Monitoring", {
    "System Metrics": "Latency, throughput, error rates",
    "Model Metrics": "Prediction distribution, feature drift",
    "Business Metrics": "Daily active users, engagement trends"
})

netflix.display_analysis()
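
The two-stage serving strategy from Step 6 (candidate retrieval followed by ranking) can be sketched on toy data. Everything below is an assumption for illustration: the 2-d embeddings, the popularity signal, and the scoring formula are made up, and the brute-force search stands in for the approximate nearest neighbor index a large system would use.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Toy 2-d embeddings; in a two-tower setup these would come from the trained towers
ITEM_EMBEDDINGS = {
    "action_1": [0.9, 0.1],
    "action_2": [0.8, 0.3],
    "comedy_1": [0.1, 0.9],
    "comedy_2": [0.2, 0.8],
    "drama_1": [0.5, 0.5],
}
# Hypothetical extra signal that only the ranking stage looks at
ITEM_POPULARITY = {"action_1": 0.2, "action_2": 0.9, "comedy_1": 0.5,
                   "comedy_2": 0.4, "drama_1": 0.7}


def retrieve_candidates(user_embedding, n_candidates):
    """Stage 1: cheap similarity search over the whole catalog (brute force here)."""
    scored = sorted(ITEM_EMBEDDINGS,
                    key=lambda item: dot(user_embedding, ITEM_EMBEDDINGS[item]),
                    reverse=True)
    return scored[:n_candidates]


def rank(user_embedding, candidates, popularity_weight=0.5):
    """Stage 2: a richer (here still very simple) score on the short candidate list."""
    def score(item):
        similarity = dot(user_embedding, ITEM_EMBEDDINGS[item])
        return similarity + popularity_weight * ITEM_POPULARITY[item]
    return sorted(candidates, key=score, reverse=True)


user = [1.0, 0.2]  # a user who mostly watches action titles
candidates = retrieve_candidates(user, n_candidates=3)
print("candidates:", candidates)
print("ranked:    ", rank(user, candidates))
```

The split keeps the expensive scoring step limited to a handful of candidates, which is one way such systems can stay within a latency budget like the one in Step 1.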

### Try It Yourself! 🚀

Now it's your turn! Complete the analysis for a different ML system.

In [None]:
# TODO: Create your own ML system analysis
# Choose one of the following systems:
# - Fraud Detection System (for a bank)
# - Search Ranking System (for an e-commerce site)
# - Content Moderation System (for a social media platform)

# Uncomment and complete the code below:

# your_system = RecommendationSystemAnalysis("Your System Name")

# your_system.add_step(1, "Clarify Requirements", {
#     "Business Goal": "...",
#     "Success Metric": "...",
#     "Scale": "...",
#     "Latency Requirement": "...",
# })

# # Add steps 2-7...

# your_system.display_analysis()

print("💡 Tip: Think about what makes each system unique in terms of:")
print("   - What data is available?")
print("   - What are the latency constraints?")
print("   - What happens if the model is wrong?")
print("   - How do you measure success?")

---

## Summary

### Key Takeaways

1. **ML Algorithm ≠ ML System**: The algorithm is only a small part of a production ML system (about 5% in this tutorial's illustrative breakdown). The rest includes data pipelines, serving infrastructure, monitoring, and more.

2. **Four Main Components** of production ML systems:
   - 📊 **Data Stack**: Collection, processing, and storage
   - 🚀 **Serving Infrastructure**: Low-latency predictions at scale
   - 📏 **Evaluation Pipeline**: Offline and online metrics
   - 📈 **Monitoring Systems**: Track health and detect issues

3. **The 7-Step Framework** provides a systematic approach:
   1. Clarify Requirements
   2. Frame the Problem
   3. Data Preparation
   4. Model Development
   5. Evaluation
   6. Deployment
   7. Monitoring

4. **ML is not always the answer**: Consider traditional approaches when:
   - Data is scarce
   - Rules are simple and well-defined
   - Interpretability is critical
   - The problem is deterministic

### What's Next?

In the next tutorial, we'll dive deep into **Step 1: Clarifying Requirements** - learning how to ask the right questions to understand business objectives, constraints, and scale requirements.

---

## Additional Resources

- 📚 [Hidden Technical Debt in Machine Learning Systems](https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf) - Google's seminal paper
- 📚 [Machine Learning System Design Interview](https://www.amazon.com/Machine-Learning-System-Design-Interview/dp/1736049127) - Book reference
- 🎥 [Made With ML](https://madewithml.com/) - Practical MLOps course

In [None]:
# Final Review: Quick Self-Assessment Quiz

quiz_questions = [
    {
        "question": "In this tutorial's illustrative breakdown, roughly what share of a production ML system is the ML algorithm itself?",
        "options": ["A) 50%", "B) 25%", "C) 5%", "D) 80%"],
        "answer": "C"
    },
    {
        "question": "Which component is responsible for serving predictions with low latency?",
        "options": ["A) Data Stack", "B) Serving Infrastructure", "C) Evaluation Pipeline", "D) Feature Store"],
        "answer": "B"
    },
    {
        "question": "When should you prefer traditional approaches over ML?",
        "options": ["A) When you have lots of data", "B) When patterns are complex", 
                   "C) When rules are simple and well-defined", "D) When you need generalization"],
        "answer": "C"
    }
]

print("📝 Quick Self-Assessment Quiz")
print("=" * 50)

for i, q in enumerate(quiz_questions, 1):
    print(f"\nQ{i}: {q['question']}")
    for opt in q['options']:
        print(f"   {opt}")

print("\n" + "=" * 50)
print("Answers: 1-C, 2-B, 3-C")