# Tutorial 02: Framing Problems as ML Tasks

Welcome to Tutorial 02 of our ML System Design series! This tutorial focuses on one of the most critical skills in ML system design: **framing business problems as ML tasks**.

---

## Learning Objectives

By the end of this tutorial, you will be able to:

1. **Translate** business objectives to ML objectives
2. **Define** appropriate input/output specifications
3. **Choose** between single and multi-model architectures
4. **Design** effective model pipeline architectures

---

In [None]:
# Setup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Tuple, Any
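from enum import Enum
from IPython.display import display  # explicit import so display() also works outside a notebook kernel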

plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")
print("Setup complete!")

## Why Problem Framing Matters

Problem framing is the bridge between business requirements and technical implementation. A good frame:

- **Simplifies** the problem to its essential components
- **Defines** clear success criteria
- **Enables** appropriate model selection
- **Facilitates** evaluation and iteration

A poor frame can lead to:

- Building the wrong model
- Optimizing for the wrong metric
- Impossible-to-solve problems
- Wasted engineering effort

In [None]:
# Visualization: Problem Framing in the ML Pipeline
fig, ax = plt.subplots(figsize=(14, 4))
ax.set_xlim(0, 100)
ax.set_ylim(0, 20)
ax.axis('off')

# Draw pipeline stages
stages = [
    ('Business\nProblem', 5, '#e74c3c'),
    ('Problem\nFraming', 25, '#f39c12'),
    ('ML\nObjective', 45, '#27ae60'),
    ('Model\nDesign', 65, '#3498db'),
    ('Implementation', 85, '#9b59b6')
]

for name, x, color in stages:
    rect = plt.Rectangle((x-8, 5), 16, 10, facecolor=color, edgecolor='white', linewidth=2)
    ax.add_patch(rect)
    ax.text(x, 10, name, ha='center', va='center', fontsize=10, color='white', fontweight='bold')

# Draw arrows
for i in range(len(stages)-1):
    ax.annotate('', xy=(stages[i+1][1]-9, 10), xytext=(stages[i][1]+9, 10),
               arrowprops=dict(arrowstyle='->', color='gray', lw=2))

# Highlight problem framing
ax.annotate('This Tutorial!', xy=(25, 16), ha='center', fontsize=12, 
            fontweight='bold', color='#f39c12')

ax.set_title('Problem Framing in the ML System Design Pipeline', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

---

## 1. Business to ML Objective Translation

The first step in problem framing is translating a business objective into an ML objective that can be optimized.

In [None]:
# Framework for translating business objectives

@dataclass
class ObjectiveTranslation:
    """Represents the translation from business to ML objective"""
    business_objective: str
    proxy_metric: str
    ml_objective: str
    optimization_target: str
    potential_issues: List[str]

# Common translations
translations = [
    ObjectiveTranslation(
        business_objective="Increase ticket sales for events",
        proxy_metric="Event registration rate",
        ml_objective="Predict which events a user will register for",
        optimization_target="P(registration | user, event)",
        potential_issues=["Cold start for new events", "Seasonality effects"]
    ),
    ObjectiveTranslation(
        business_objective="Maximize user engagement on platform",
        proxy_metric="Time spent / sessions per day",
        ml_objective="Predict content that maximizes watch time",
        optimization_target="E[watch_time | user, content]",
        potential_issues=["Clickbait optimization", "Filter bubbles"]
    ),
    ObjectiveTranslation(
        business_objective="Reduce customer support costs",
        proxy_metric="Ticket resolution rate, deflection rate",
        ml_objective="Classify and route tickets, suggest answers",
        optimization_target="P(correct_category | ticket_text)",
        potential_issues=["Edge cases", "User frustration with automation"]
    ),
    ObjectiveTranslation(
        business_objective="Prevent fraudulent transactions",
        proxy_metric="Fraud rate, false positive rate",
        ml_objective="Classify transactions as fraud/legitimate",
        optimization_target="P(fraud | transaction_features)",
        potential_issues=["Class imbalance", "Adversarial behavior"]
    )
]

# Display as table
df = pd.DataFrame([{
    'Business Objective': t.business_objective,
    'Proxy Metric': t.proxy_metric,
    'ML Objective': t.ml_objective,
    'Optimization Target': t.optimization_target
} for t in translations])

print("Business to ML Objective Translation Examples:")
display(df)
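
To make the `Optimization Target` column concrete: a target such as `P(registration | user, event)` is typically fit by minimizing binary cross-entropy (log loss) between predicted probabilities and observed outcomes. The sketch below is a minimal illustration; the labels, the predictions, and the `log_loss` helper are made-up assumptions, not data from a real system.

```python
import numpy as np

def log_loss(y_true: np.ndarray, p_pred: np.ndarray, eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    p = np.clip(p_pred, eps, 1 - eps)  # keep log() away from 0
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# Hypothetical outcomes: 1 = user registered for the event, 0 = did not
y_true = np.array([1, 0, 0, 1, 0])
confident_model = np.array([0.9, 0.1, 0.2, 0.8, 0.1])
uninformed_model = np.full(5, 0.4)  # always predicts the base rate (2 of 5)

print(f"Confident model log loss:  {log_loss(y_true, confident_model):.3f}")
print(f"Uninformed model log loss: {log_loss(y_true, uninformed_model):.3f}")
```

A lower log loss means the predicted probabilities sit closer to what actually happened, which is what "optimizing `P(registration | user, event)`" amounts to during training.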

In [None]:
# Interactive Translation Framework

class ObjectiveTranslator:
    """Helper class to translate business objectives to ML objectives"""
    
    def __init__(self):
        self.templates = {
            "increase": "Predict {target} that maximizes {metric}",
            "reduce": "Predict {target} that minimizes {metric}",
            "optimize": "Predict optimal {target} for {metric}",
            "detect": "Classify {target} as {classes}",
            "recommend": "Rank {items} by predicted {metric} for {user}",
            "generate": "Generate {output} given {input}"
        }
    
    def translate(self, business_obj: str, context: Dict[str, str]) -> str:
        """Translate a business objective to ML objective"""
        # Find matching template
        for keyword, template in self.templates.items():
            if keyword in business_obj.lower():
                return template.format(**context)
        return f"Predict outcome for: {business_obj}"
    
    def suggest_metrics(self, ml_objective: str) -> List[str]:
        """Suggest appropriate metrics for an ML objective"""
        metrics = {
            "predict": ["MSE", "MAE", "R-squared"],
            "classify": ["Precision", "Recall", "F1", "AUC-ROC"],
            "rank": ["NDCG", "MRR", "Precision@K"],
            "generate": ["BLEU", "ROUGE", "Perplexity"]
        }
        
        # Match on the leading verb only: a substring test would send
        # "Rank ... by predicted ..." to the "predict" (regression) metrics.
        objective = ml_objective.lower()
        for task_type, task_metrics in metrics.items():
            if objective.startswith(task_type):
                return task_metrics
        return ["Custom metric needed"]

# Example usage
translator = ObjectiveTranslator()

examples = [
    ("Increase user retention", {"target": "content", "metric": "retention"}),
    ("Detect spam emails", {"target": "emails", "classes": "spam/not-spam"}),
    ("Recommend products to users", {"items": "products", "metric": "purchase probability", "user": "user"})
]

print("Objective Translation Examples:")
print("=" * 60)
for business_obj, context in examples:
    ml_obj = translator.translate(business_obj, context)
    metrics = translator.suggest_metrics(ml_obj)
    print(f"\nBusiness: {business_obj}")
    print(f"ML Objective: {ml_obj}")
    print(f"Suggested Metrics: {', '.join(metrics)}")
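
Two of the ranking metrics that `suggest_metrics` can return, Precision@K and MRR, are simple enough to compute by hand. The following is a minimal sketch for a single ranked list; the function names and the item IDs are hypothetical, and MRR proper is the mean of `reciprocal_rank` over many queries or users.

```python
from typing import List, Set

def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    top_k = ranked[:k]
    return sum(item in relevant for item in top_k) / k

def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant item, or 0.0 if none appears."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0

# Hypothetical ranked recommendations and the items the user actually bought
ranked_items = ["p7", "p2", "p9", "p4", "p1"]
purchased = {"p2", "p4"}

print(f"Precision@3:     {precision_at_k(ranked_items, purchased, k=3):.3f}")
print(f"Reciprocal rank: {reciprocal_rank(ranked_items, purchased):.3f}")
```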

### Common Pitfalls in Objective Translation

| Pitfall | Example | Solution |
|---------|---------|----------|
| Proxy Mismatch | Optimizing for clicks instead of conversions | Align proxy with true business value |
| Short-term Focus | Maximizing immediate engagement | Include long-term metrics (retention) |
| Gaming Risk | Clickbait titles for CTR | Multiple objectives, constraints |
| Missing Constraints | High accuracy but 1-hour latency | Specify all requirements upfront |

---

## 2. Defining System Input/Output

Once we have an ML objective, we need to precisely define what the system takes as input and produces as output.

In [None]:
# Input/Output Specification Framework

@dataclass
class IOSpecification:
    """Defines the input/output specification for an ML system"""
    name: str
    input_type: str
    input_schema: Dict[str, str]
    output_type: str
    output_schema: Dict[str, str]
    context: List[str]
    
    def display(self):
        print(f"\n{'='*60}")
        print(f"System: {self.name}")
        print(f"{'='*60}")
        print(f"\nINPUT ({self.input_type}):")
        # Loop variable is `key`, not `field`, to avoid shadowing dataclasses.field
        for key, dtype in self.input_schema.items():
            print(f"  - {key}: {dtype}")
        print("\nCONTEXT:")
        for ctx in self.context:
            print(f"  - {ctx}")
        print(f"\nOUTPUT ({self.output_type}):")
        for key, dtype in self.output_schema.items():
            print(f"  - {key}: {dtype}")

# Example: Video Recommendation System
video_rec_io = IOSpecification(
    name="Video Recommendation System",
    input_type="User Request",
    input_schema={
        "user_id": "string",
        "device_type": "enum[mobile, web, tv]",
        "timestamp": "datetime",
        "page_context": "enum[home, search, watch]"
    },
    output_type="Ranked List",
    output_schema={
        "video_ids": "List[string]",
        "scores": "List[float]",
        "explanations": "List[string] (optional)"
    },
    context=[
        "User's watch history (last 100 videos)",
        "User's profile (age, location, preferences)",
        "Current trending videos",
        "Time of day, day of week"
    ]
)

video_rec_io.display()

In [None]:
# Example: Content Moderation System
content_mod_io = IOSpecification(
    name="Content Moderation System",
    input_type="Content Item",
    input_schema={
        "content_id": "string",
        "content_type": "enum[text, image, video]",
        "content_data": "bytes or string",
        "author_id": "string",
        "metadata": "Dict[string, any]"
    },
    output_type="Moderation Decision",
    output_schema={
        "decision": "enum[approve, review, reject]",
        "violation_types": "List[string]",
        "confidence_score": "float",
        "explanation": "string"
    },
    context=[
        "Author's history and trust score",
        "Community guidelines and policies",
        "Regional content restrictions"
    ]
)

content_mod_io.display()
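
String schemas like the ones above are useful for discussion, but in code the same contract can be enforced with types. The sketch below shows one possible typed version of the moderation output. The class names and both validation rules (score in [0, 1], a reject must name a violation) are assumptions added for illustration, not part of the specification above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Decision(Enum):
    APPROVE = "approve"
    REVIEW = "review"
    REJECT = "reject"

@dataclass
class ModerationOutput:
    decision: Decision
    confidence_score: float
    violation_types: List[str] = field(default_factory=list)
    explanation: str = ""

    def __post_init__(self):
        # Assumed rule: confidence is a probability-like score
        if not 0.0 <= self.confidence_score <= 1.0:
            raise ValueError("confidence_score must be in [0, 1]")
        # Assumed rule: a rejection has to say what was violated
        if self.decision is Decision.REJECT and not self.violation_types:
            raise ValueError("a reject decision must list at least one violation type")

ok = ModerationOutput(Decision.REJECT, 0.97, ["spam"], "Repeated promotional links")
print(ok)
```

Validating at construction time means a malformed output fails at the system boundary rather than somewhere downstream.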

In [None]:
# Input/Output Design Patterns

io_patterns = pd.DataFrame({
    'Pattern': [
        'Single Item -> Single Label',
        'Single Item -> Multiple Labels',
        'Single Item -> Ranked List',
        'Multiple Items -> Scores',
        'Query + Context -> Response',
        'Sequence -> Sequence'
    ],
    'Example Use Case': [
        'Spam detection (email -> spam/not spam)',
        'Content tagging (image -> list of tags)',
        'Recommendations (user -> ranked items)',
        'Batch scoring (candidates -> relevance scores)',
        'Search (query + user -> ranked results)',
        'Translation (source text -> target text)'
    ],
    'ML Task Type': [
        'Binary Classification',
        'Multi-label Classification',
        'Ranking / Retrieval',
        'Scoring / Regression',
        'Contextual Ranking',
        'Sequence-to-Sequence'
    ]
})

print("Common I/O Patterns in ML Systems:")
display(io_patterns)
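
The first three patterns differ mainly in how raw model scores are turned into an output. A minimal sketch, assuming a hypothetical model that emits one score per label (the labels, scores, and the 0.5 threshold are all illustrative):

```python
import numpy as np

labels = ["sports", "politics", "tech", "music"]
scores = np.array([0.10, 0.65, 0.80, 0.30])  # hypothetical per-label scores

# Single item -> single label: take the highest-scoring class
single_label = labels[int(np.argmax(scores))]

# Single item -> multiple labels: keep every class above a threshold (0.5 is an assumption)
multi_labels = [label for label, s in zip(labels, scores) if s >= 0.5]

# Single item -> ranked list: sort by descending score and keep the top K
top_k = 3
ranked = [labels[i] for i in np.argsort(-scores)[:top_k]]

print("Single label:", single_label)
print("Multi-label: ", multi_labels)
print("Ranked top-3:", ranked)
```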

---

## 3. Single vs Multi-Model Architectures

A key design decision is whether to use a single model or multiple models working together.

In [None]:
# Visualization: Single vs Multi-Model

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Single Model
ax = axes[0]
ax.set_xlim(0, 100)
ax.set_ylim(0, 100)
ax.axis('off')

# Input
rect = plt.Rectangle((10, 40), 20, 20, facecolor='#3498db', edgecolor='white', lw=2)
ax.add_patch(rect)
ax.text(20, 50, 'Input', ha='center', va='center', color='white', fontweight='bold')

# Model
rect = plt.Rectangle((40, 35), 25, 30, facecolor='#e74c3c', edgecolor='white', lw=2)
ax.add_patch(rect)
ax.text(52.5, 50, 'Single\nModel', ha='center', va='center', color='white', fontweight='bold')

# Output
rect = plt.Rectangle((75, 40), 20, 20, facecolor='#27ae60', edgecolor='white', lw=2)
ax.add_patch(rect)
ax.text(85, 50, 'Output', ha='center', va='center', color='white', fontweight='bold')

# Arrows
ax.annotate('', xy=(40, 50), xytext=(30, 50), arrowprops=dict(arrowstyle='->', lw=2))
ax.annotate('', xy=(75, 50), xytext=(65, 50), arrowprops=dict(arrowstyle='->', lw=2))

ax.set_title('Single Model Architecture', fontsize=12, fontweight='bold')

# Multi-Model
ax = axes[1]
ax.set_xlim(0, 100)
ax.set_ylim(0, 100)
ax.axis('off')

# Input
rect = plt.Rectangle((5, 40), 15, 20, facecolor='#3498db', edgecolor='white', lw=2)
ax.add_patch(rect)
ax.text(12.5, 50, 'Input', ha='center', va='center', color='white', fontweight='bold', fontsize=9)

# Model 1: Candidate Generation
rect = plt.Rectangle((25, 55), 20, 25, facecolor='#e74c3c', edgecolor='white', lw=2)
ax.add_patch(rect)
ax.text(35, 67.5, 'Candidate\nGen', ha='center', va='center', color='white', fontweight='bold', fontsize=8)

# Model 2: Feature Extraction
rect = plt.Rectangle((25, 20), 20, 25, facecolor='#9b59b6', edgecolor='white', lw=2)
ax.add_patch(rect)
ax.text(35, 32.5, 'Feature\nExtract', ha='center', va='center', color='white', fontweight='bold', fontsize=8)

# Model 3: Ranking
rect = plt.Rectangle((55, 35), 20, 30, facecolor='#f39c12', edgecolor='white', lw=2)
ax.add_patch(rect)
ax.text(65, 50, 'Ranking\nModel', ha='center', va='center', color='white', fontweight='bold', fontsize=8)

# Output
rect = plt.Rectangle((80, 40), 15, 20, facecolor='#27ae60', edgecolor='white', lw=2)
ax.add_patch(rect)
ax.text(87.5, 50, 'Output', ha='center', va='center', color='white', fontweight='bold', fontsize=9)

# Arrows
ax.annotate('', xy=(25, 55), xytext=(20, 50), arrowprops=dict(arrowstyle='->', lw=1.5))
ax.annotate('', xy=(25, 35), xytext=(20, 50), arrowprops=dict(arrowstyle='->', lw=1.5))
ax.annotate('', xy=(55, 55), xytext=(45, 60), arrowprops=dict(arrowstyle='->', lw=1.5))
ax.annotate('', xy=(55, 45), xytext=(45, 35), arrowprops=dict(arrowstyle='->', lw=1.5))
ax.annotate('', xy=(80, 50), xytext=(75, 50), arrowprops=dict(arrowstyle='->', lw=1.5))

ax.set_title('Multi-Model Architecture', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.show()
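
The candidate-generation and ranking boxes of a multi-model pipeline can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions: the embeddings are random, the function names are hypothetical, and cosine similarity merely stands in for a more expensive ranking model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 10_000, 16
item_embeddings = rng.normal(size=(n_items, dim))  # random stand-in catalog
user_embedding = rng.normal(size=dim)

def retrieve_candidates(user_vec, item_vecs, n_candidates=200):
    """Stage 1: cheap dot-product scoring over the whole catalog."""
    scores = item_vecs @ user_vec
    # argpartition selects the top n_candidates without fully sorting all items
    return np.argpartition(-scores, n_candidates)[:n_candidates]

def rank_candidates(user_vec, item_vecs, candidate_ids, top_k=10):
    """Stage 2: placeholder for a costlier ranker, applied only to the candidates."""
    cand = item_vecs[candidate_ids]
    # Cosine similarity as a stand-in; a real ranker would use richer features
    scores = (cand @ user_vec) / (np.linalg.norm(cand, axis=1) * np.linalg.norm(user_vec))
    order = np.argsort(-scores)[:top_k]
    return candidate_ids[order], scores[order]

candidates = retrieve_candidates(user_embedding, item_embeddings)
top_ids, top_scores = rank_candidates(user_embedding, item_embeddings, candidates)
print(f"Catalog: {n_items} -> candidates: {len(candidates)} -> final: {len(top_ids)}")
```

The expensive stage only ever sees 200 items, which is what keeps the overall latency manageable as the catalog grows.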

In [None]:
# Comparison: Single vs Multi-Model

comparison = pd.DataFrame({
    'Aspect': [
        'Complexity',
        'Latency',
        'Maintainability',
        'Flexibility',
        'Debugging',
        'Scalability',
        'Best For'
    ],
    'Single Model': [
        'Lower',
        'Lower (single inference)',
        'Easier',
        'Less flexible',
        'Easier',
        'Scales as one unit',
        'Simple tasks, low latency requirements'
    ],
    'Multi-Model': [
        'Higher',
        'Higher (multiple inferences)',
        'More complex',
        'More flexible',
        'Harder (cascading errors)',
        'Each component scales independently',
        'Complex tasks, large scale systems'
    ]
})

print("Single vs Multi-Model Architecture Comparison:")
display(comparison)
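
The latency row follows from simple arithmetic: stages that run in sequence add their latencies, while branches that run in parallel are bounded by the slowest one. A minimal sketch with made-up millisecond figures (all numbers and function names are assumptions, and network or serialization overhead is ignored):

```python
from typing import List

def sequential_latency_ms(stage_latencies: List[float]) -> float:
    """Stages that run one after another add their latencies."""
    return sum(stage_latencies)

def parallel_latency_ms(branch_latencies: List[float]) -> float:
    """Branches that run concurrently are bounded by the slowest branch."""
    return max(branch_latencies)

# Hypothetical single model: one 40 ms inference
single_model = sequential_latency_ms([40.0])

# Hypothetical pipeline: candidate generation (15 ms) and feature extraction (25 ms)
# run in parallel, followed by a 30 ms ranking model
multi_model = parallel_latency_ms([15.0, 25.0]) + sequential_latency_ms([30.0])

print(f"Single model: {single_model:.0f} ms")
print(f"Multi-model:  {multi_model:.0f} ms")
```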

In [None]:
# Multi-Model Architecture Patterns

@dataclass
class MultiModelArchitecture:
    """Represents a multi-model architecture pattern"""
    name: str
    components: List[str]
    flow: str
    use_case: str
    benefits: List[str]

architectures = [
    MultiModelArchitecture(
        name="Two-Stage Retrieval + Ranking",
        components=["Candidate Generator", "Ranker"],
        flow="All items -> Fast Retrieval (thousands of candidates) -> Precise Ranking (tens of results)",
        use_case="Recommendation systems, Search",
        benefits=["Fast candidate generation", "Precise final ranking", "Scalable"]
    ),
    MultiModelArchitecture(
        name="Cascade Classification",
        components=["Fast Filter", "Detailed Classifier"],
        flow="Items -> Quick Filter -> Deep Analysis on subset",
        use_case="Content moderation, Spam detection",
        benefits=["Efficient processing", "Cost reduction", "High precision"]
    ),
    MultiModelArchitecture(
        name="Ensemble",
        components=["Model A", "Model B", "Model C", "Aggregator"],
        flow="Input -> Multiple Models in parallel -> Combine outputs",
        use_case="High-stakes predictions, Competitions",
        benefits=["Higher accuracy", "Robustness", "Diversity"]
    ),
    MultiModelArchitecture(
        name="Feature Extraction + Classifier",
        components=["Embedding Model", "Task-Specific Classifier"],
        flow="Raw Input -> Embeddings -> Classification",
        use_case="Transfer learning, Multi-task systems",
        benefits=["Reusable features", "Faster training", "Modularity"]
    )
]

print("Common Multi-Model Architecture Patterns:")
print("=" * 60)

for arch in architectures:
    print(f"\n{arch.name}")
    # Join with commas: ' -> ' would wrongly imply a sequential chain for the parallel Ensemble pattern
    print(f"  Components: {', '.join(arch.components)}")
    print(f"  Flow: {arch.flow}")
    print(f"  Use Case: {arch.use_case}")
    print(f"  Benefits: {', '.join(arch.benefits)}")
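
As an illustration of the cascade pattern, the sketch below keeps the fast filter's score whenever it is confident and escalates only the uncertain middle band to a slower model. The thresholds, the scores, and the `slow_model` stand-in are assumptions chosen for the example.

```python
import numpy as np

def cascade_classify(fast_scores, slow_model, low=0.1, high=0.9):
    """Return final scores and a mask of the items escalated to slow_model.

    Items with a fast score <= low or >= high keep that score; only the
    uncertain band in between is re-scored by the slower model.
    """
    fast_scores = np.asarray(fast_scores, dtype=float)
    uncertain = (fast_scores > low) & (fast_scores < high)
    final = fast_scores.copy()
    if uncertain.any():
        final[uncertain] = slow_model(np.flatnonzero(uncertain))
    return final, uncertain

def slow_model(indices):
    """Stand-in for an expensive classifier: takes item indices, returns scores."""
    return np.array([0.8 if i % 2 == 0 else 0.3 for i in indices])

# Hypothetical fast-filter spam scores for six messages
fast = [0.02, 0.95, 0.50, 0.07, 0.60, 0.99]

final_scores, escalated = cascade_classify(fast, slow_model)
print(f"Escalated {int(escalated.sum())} of {len(fast)} items to the detailed classifier")
print("Final scores:", final_scores)
```

The cost saving comes from the escalation rate: the detailed classifier runs on two of the six items here rather than all of them.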

---

## 4. Practical Examples: Problem Framing

Let's work through several real-world examples of problem framing.

In [None]:
# Example 1: E-commerce Product Recommendation

class ProblemFrame:
    """Complete problem framing for an ML system"""
    
    def __init__(self, name: str):
        self.name = name
        self.business_objective = ""
        self.ml_objective = ""
        self.input_spec = {}
        self.output_spec = {}
        self.architecture = ""
        self.models = []
        self.metrics = []
    
    def display(self):
        print(f"\n{'='*70}")
        print(f"PROBLEM FRAME: {self.name}")
        print(f"{'='*70}")
        print(f"\nBusiness Objective: {self.business_objective}")
        print(f"ML Objective: {self.ml_objective}")
        print(f"\nInput Specification:")
        for k, v in self.input_spec.items():
            print(f"  - {k}: {v}")
        print(f"\nOutput Specification:")
        for k, v in self.output_spec.items():
            print(f"  - {k}: {v}")
        print(f"\nArchitecture: {self.architecture}")
        print(f"Models: {', '.join(self.models)}")
        print(f"Evaluation Metrics: {', '.join(self.metrics)}")

# E-commerce recommendation
ecommerce = ProblemFrame("E-commerce Product Recommendation")
ecommerce.business_objective = "Increase revenue by showing relevant products"
ecommerce.ml_objective = "Predict purchase probability for user-product pairs, rank by expected revenue"
ecommerce.input_spec = {
    "user_id": "string",
    "current_page": "enum[home, category, product, cart]",
    "browsing_history": "List[product_id]",
    "user_segment": "string"
}
ecommerce.output_spec = {
    "product_ids": "List[string] (top-K products)",
    "scores": "List[float]",
    "explanation": "string (e.g., 'Based on your browsing')"
}
ecommerce.architecture = "Two-stage: Candidate Generation + Ranking"
ecommerce.models = ["Collaborative Filtering (candidates)", "Gradient Boosted Trees (ranking)"]
ecommerce.metrics = ["Precision@K", "Revenue per impression", "CTR"]

ecommerce.display()
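
The ML objective above has two parts: predict purchase probability, then rank by expected revenue. The second step is just a multiplication, but it can change the ordering noticeably. A minimal sketch with made-up probabilities and prices:

```python
import pandas as pd

candidates = pd.DataFrame({
    "product_id": ["A", "B", "C", "D"],
    "p_purchase": [0.20, 0.05, 0.10, 0.02],  # hypothetical model outputs
    "price": [10.0, 100.0, 30.0, 400.0],     # hypothetical prices
})
# Expected revenue per impression = P(purchase) * price
candidates["expected_revenue"] = candidates["p_purchase"] * candidates["price"]

by_probability = candidates.sort_values("p_purchase", ascending=False)["product_id"].tolist()
by_revenue = candidates.sort_values("expected_revenue", ascending=False)["product_id"].tolist()

print("Ranked by purchase probability:", by_probability)
print("Ranked by expected revenue:    ", by_revenue)
```

Which ordering is appropriate depends on the business objective; ranking purely by expected revenue can push expensive, rarely bought items to the top, so it is often combined with relevance constraints.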

In [None]:
# Example 2: Ride-Sharing ETA Prediction

eta_prediction = ProblemFrame("Ride-Sharing ETA Prediction")
eta_prediction.business_objective = "Provide accurate arrival time estimates to improve user experience"
eta_prediction.ml_objective = "Predict trip duration given origin, destination, and current conditions"
eta_prediction.input_spec = {
    "origin": "(latitude, longitude)",
    "destination": "(latitude, longitude)",
    "request_time": "datetime",
    "current_traffic": "traffic_data",
    "weather": "weather_data"
}
eta_prediction.output_spec = {
    "estimated_duration_seconds": "int",
    "confidence_interval": "(lower, upper)",
    "breakdown": "Dict[segment, duration]"
}
eta_prediction.architecture = "Single model with feature engineering"
eta_prediction.models = ["Gradient Boosted Trees or Neural Network"]
eta_prediction.metrics = ["MAE", "MAPE", "90th percentile error"]

eta_prediction.display()
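
The three evaluation metrics listed for ETA prediction can be computed directly with NumPy. The trip durations below are made up for illustration.

```python
import numpy as np

# Hypothetical trip durations in seconds
actual = np.array([600, 900, 1200, 300, 1500], dtype=float)
predicted = np.array([660, 870, 1500, 330, 1350], dtype=float)

abs_error = np.abs(predicted - actual)
mae = abs_error.mean()                     # mean absolute error, in seconds
mape = (abs_error / actual).mean() * 100   # mean absolute percentage error
p90_error = np.percentile(abs_error, 90)   # 90th percentile of absolute error

print(f"MAE:  {mae:.1f} s")
print(f"MAPE: {mape:.2f} %")
print(f"P90 absolute error: {p90_error:.1f} s")
```

MAE and MAPE describe typical error, while the percentile metric captures the bad tail, which matters because a few badly wrong ETAs can hurt user experience more than many slightly wrong ones.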

In [None]:
# Example 3: Social Media Feed Ranking

feed_ranking = ProblemFrame("Social Media Feed Ranking")
feed_ranking.business_objective = "Maximize user engagement and time spent on platform"
feed_ranking.ml_objective = "Predict engagement score for each post, rank to maximize session engagement"
feed_ranking.input_spec = {
    "user_id": "string",
    "candidate_posts": "List[post_id]",
    "user_context": "Dict (device, time, session_length)",
    "social_graph": "Graph (friends, follows)"
}
feed_ranking.output_spec = {
    "ranked_posts": "List[post_id]",
    "predicted_engagement": "List[float]",
    "diversity_score": "float"
}
feed_ranking.architecture = "Multi-stage: Candidate Gen -> Ranking -> Diversity Reranking"
feed_ranking.models = ["Embedding model (candidates)", "Deep ranking model", "Diversity optimizer"]
feed_ranking.metrics = ["Engagement rate", "Time spent", "Content diversity"]

feed_ranking.display()
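
The diversity reranking stage can take many forms. The sketch below uses a simple greedy heuristic in the spirit of maximal marginal relevance: each time a topic has already been shown, further posts on that topic are penalized. The function name, the penalty value, and the post data are assumptions for illustration, not a description of any production system.

```python
from typing import Dict, List

def diversity_rerank(scores: Dict[str, float], topics: Dict[str, str],
                     penalty: float = 0.3) -> List[str]:
    """Greedily pick the post with the best score after a repeated-topic penalty."""
    remaining = set(scores)
    seen_topics: Dict[str, int] = {}
    ranked: List[str] = []
    while remaining:
        best = max(
            remaining,
            # Penalize by how often the post's topic was already selected;
            # the post ID breaks ties deterministically
            key=lambda post: (scores[post] - penalty * seen_topics.get(topics[post], 0), post),
        )
        ranked.append(best)
        remaining.remove(best)
        seen_topics[topics[best]] = seen_topics.get(topics[best], 0) + 1
    return ranked

# Hypothetical predicted engagement and topic per post
scores = {"p1": 0.90, "p2": 0.85, "p3": 0.80, "p4": 0.60}
topics = {"p1": "sports", "p2": "sports", "p3": "sports", "p4": "news"}

by_score = sorted(scores, key=scores.get, reverse=True)
reranked = diversity_rerank(scores, topics)
print("By engagement only:", by_score)
print("After reranking:   ", reranked)
```

The news post moves from last place to second, trading a little predicted engagement for a more varied feed.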

---

## 5. Problem Framing Decision Tree

Here's a systematic approach to framing ML problems.

In [None]:
# Problem Framing Decision Framework

def frame_problem(answers: Dict[str, str]) -> Dict[str, str]:
    """
    Given answers to key questions, suggest a problem framing.
    
    Questions:
    - output_type: What type of output is needed? (label, score, ranking, sequence)
    - item_count: How many items to process? (single, batch, millions)
    - latency: What's the latency requirement? (real-time, near-real-time, batch)
    - complexity: How complex is the task? (simple, moderate, complex)
    """
    
    recommendation = {}
    
    # Determine task type
    output_type = answers.get('output_type', 'label')
    if output_type == 'label':
        recommendation['task_type'] = 'Classification'
    elif output_type == 'score':
        recommendation['task_type'] = 'Regression'
    elif output_type == 'ranking':
        recommendation['task_type'] = 'Learning to Rank'
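    elif output_type == 'sequence':
        recommendation['task_type'] = 'Sequence Generation'
    else:
        # Fail loudly on unrecognized values instead of leaving 'task_type' unset
        raise ValueError(f"Unknown output_type: {output_type!r}")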
    
    # Determine architecture
    item_count = answers.get('item_count', 'single')
    latency = answers.get('latency', 'real-time')
    
    if item_count == 'millions' and latency == 'real-time':
        recommendation['architecture'] = 'Two-stage (Retrieval + Ranking)'
        recommendation['reasoning'] = 'Large item space with real-time needs requires efficient retrieval'
    elif item_count == 'millions' and latency == 'batch':
        recommendation['architecture'] = 'Single model with batch processing'
        recommendation['reasoning'] = 'Batch processing allows more complex models'
    else:
        recommendation['architecture'] = 'Single model'
        recommendation['reasoning'] = 'Simple case, single model sufficient'
    
    # Model complexity
    complexity = answers.get('complexity', 'simple')
    if complexity == 'simple':
        recommendation['model_suggestion'] = 'Logistic Regression, Decision Trees'
    elif complexity == 'moderate':
        recommendation['model_suggestion'] = 'Gradient Boosted Trees, Random Forest'
    else:
        recommendation['model_suggestion'] = 'Neural Networks, Transformers'
    
    return recommendation

# Example usage
scenarios = [
    {
        'name': 'Email Spam Detection',
        'answers': {'output_type': 'label', 'item_count': 'single', 'latency': 'real-time', 'complexity': 'moderate'}
    },
    {
        'name': 'Product Recommendation',
        'answers': {'output_type': 'ranking', 'item_count': 'millions', 'latency': 'real-time', 'complexity': 'complex'}
    },
    {
        'name': 'Price Prediction',
        'answers': {'output_type': 'score', 'item_count': 'single', 'latency': 'batch', 'complexity': 'moderate'}
    }
]

print("Problem Framing Recommendations:")
print("=" * 60)

for scenario in scenarios:
    result = frame_problem(scenario['answers'])
    print(f"\n{scenario['name']}:")
    print(f"  Task Type: {result['task_type']}")
    print(f"  Architecture: {result['architecture']}")
    print(f"  Reasoning: {result['reasoning']}")
    print(f"  Suggested Models: {result['model_suggestion']}")

---

## 6. Hands-On Exercise

Practice framing ML problems for the following scenarios.

In [None]:
# Exercise: Frame these problems

exercises = [
    {
        "scenario": "Music Streaming Service",
        "business_goal": "Help users discover new music they'll love",
        "hints": ["Similar to video recommendation", "Consider both discovery and familiar songs"]
    },
    {
        "scenario": "Job Matching Platform",
        "business_goal": "Connect job seekers with relevant opportunities",
        "hints": ["Two-sided marketplace", "Both jobs to users and users to jobs"]
    },
    {
        "scenario": "Smart Home Assistant",
        "business_goal": "Understand and respond to voice commands",
        "hints": ["Multi-stage: ASR -> NLU -> Action", "Consider latency requirements"]
    }
]

print("Problem Framing Exercises")
print("=" * 60)

for i, ex in enumerate(exercises, 1):
    print(f"\nExercise {i}: {ex['scenario']}")
    print(f"Business Goal: {ex['business_goal']}")
    print(f"Hints: {', '.join(ex['hints'])}")
    print("\nYour Task:")
    print("  1. Define the ML objective")
    print("  2. Specify input/output")
    print("  3. Choose single vs multi-model")
    print("  4. Suggest evaluation metrics")

In [None]:
# Template for your solutions

# Uncomment and complete:

# music_rec = ProblemFrame("Music Streaming Recommendation")
# music_rec.business_objective = "..."
# music_rec.ml_objective = "..."
# music_rec.input_spec = {...}
# music_rec.output_spec = {...}
# music_rec.architecture = "..."
# music_rec.models = [...]
# music_rec.metrics = [...]
# music_rec.display()

print("Complete the exercises above by uncommenting and filling in the template.")

---

## Summary

### Key Takeaways

1. **Business to ML Translation**: Always start by translating business objectives to measurable ML objectives

2. **I/O Specification**: Clearly define what goes in and what comes out
   - Input schema (types, formats)
   - Output schema (types, formats)
   - Context requirements

3. **Architecture Choice**:
   - **Single Model**: Simpler, lower latency, easier to maintain
   - **Multi-Model**: More flexible, scalable, better for complex tasks
   - Common patterns: Two-stage retrieval, Cascade, Ensemble

4. **Framing Process**:
   - Identify output type (label, score, ranking, sequence)
   - Consider scale (items, latency, complexity)
   - Choose appropriate architecture
   - Define evaluation metrics

### Common Patterns

| Problem Type | Typical Framing |
|--------------|----------------|
| Recommendation | Two-stage retrieval + ranking |
| Content Moderation | Cascade classification |
| Search | Query understanding + Retrieval + Ranking |
| Fraud Detection | Multi-model ensemble |

### Next Steps

In the next tutorial, we'll dive deep into **ML Categories** - understanding the different types of ML tasks and when to use each.

---

In [None]:
# Quick Quiz
quiz = [
    ("What should you do FIRST when framing an ML problem?", "A",
     ["A) Translate business objective to ML objective", "B) Choose a model", 
      "C) Collect data", "D) Set up infrastructure"]),
    ("When should you use a two-stage architecture?", "C",
     ["A) When you have little data", "B) When latency doesn't matter",
      "C) When searching over millions of items in real-time", "D) Always"]),
    ("What's the main benefit of multi-model architectures?", "B",
     ["A) Lower latency", "B) Better scalability and flexibility",
      "C) Easier debugging", "D) Lower cost"])
]

print("Quick Self-Assessment")
print("=" * 50)
for i, (q, a, opts) in enumerate(quiz, 1):
    print(f"\nQ{i}: {q}")
    for opt in opts:
        print(f"   {opt}")

print("\n" + "=" * 50)
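# Build the answer key from `quiz` so it cannot drift from the questions
print("Answers: " + ", ".join(f"{i}-{a}" for i, (_, a, _) in enumerate(quiz, 1)))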