# 06 - Model Development: Selection and Training

---

## What the Chapter Says

The chapter covers **Model Development** with these components:

### F1) Model Selection Process
1. **Baseline** (e.g., most popular videos)
2. **Simple models** (e.g., logistic regression)
3. **Complex models** (e.g., deep neural networks)
4. **Ensemble** options: bagging, boosting, stacking

### F2) Model Training Topics
- Dataset construction (5 steps)
- Labels: hand labeling vs natural labeling
- Class imbalance handling
- Loss function selection
- Training from scratch vs fine-tuning
- Distributed training: data parallelism vs model parallelism

---

## Meta Interview Signal

| Level | Expectations |
|-------|-------------|
| **E5** | Knows model selection progression. Can implement baseline → simple → complex. Understands class imbalance mitigation. |
| **E6** | Proposes ensemble strategies. Discusses distributed training at scale. Designs label collection pipelines. Considers feedback loops and iteration velocity. |

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)

---

## F1) Model Selection Process (Chapter Content)

The chapter specifies this exact progression:

```
Baseline → Simple Models → Complex Models → Ensemble
```

In [None]:
# Visualize model selection progression
fig, ax = plt.subplots(figsize=(14, 5))
ax.axis('off')
ax.set_title('Model Selection Progression (Chapter)', fontsize=14, fontweight='bold')

stages = [
    ('BASELINE', 1, '#BBDEFB', 'Most popular videos\nRandom prediction\nHeuristic rules'),
    ('SIMPLE', 4.5, '#C8E6C9', 'Logistic Regression\nLinear Regression\nNaive Bayes\nShallow Decision Trees'),
    ('COMPLEX', 8, '#FFF9C4', 'GBDT / Random Forest\nFactorization Machines\nNeural Networks'),
    ('ENSEMBLE', 11.5, '#FFCCBC', 'Bagging\nBoosting\nStacking'),
]

for (label, x, color, examples) in stages:
    rect = mpatches.FancyBboxPatch((x, 1), 2.5, 3, boxstyle='round,pad=0.1',
                                    facecolor=color, edgecolor='black', linewidth=2)
    ax.add_patch(rect)
    ax.text(x + 1.25, 3.5, label, ha='center', va='center', fontsize=12, fontweight='bold')
    ax.text(x + 1.25, 2, examples, ha='center', va='center', fontsize=9)

# Arrows
for x in [3.5, 7, 10.5]:
    ax.annotate('', xy=(x + 1, 2.5), xytext=(x, 2.5),
               arrowprops=dict(arrowstyle='->', color='black', lw=2))

ax.set_xlim(0, 15)
ax.set_ylim(0, 5)
plt.tight_layout()
plt.show()

In [None]:
# Chapter's model options list
model_options = pd.DataFrame({
    'Model': [
        'Logistic Regression',
        'Linear Regression',
        'Decision Trees',
        'GBDT / Random Forests',
        'SVM',
        'Naive Bayes',
        'Factorization Machines',
        'Neural Networks'
    ],
    'Type': [
        'Simple (Classification)',
        'Simple (Regression)',
        'Simple/Medium',
        'Complex (Ensemble)',
        'Simple/Medium',
        'Simple',
        'Complex',
        'Complex'
    ],
    'Use Case': [
        'Binary/multiclass classification, baseline for CTR',
        'Continuous output, watch time prediction',
        'Interpretable, handles non-linear',
        'Tabular data, feature interactions',
        'Small data, high-dimensional',
        'Text classification, fast training',
        'Recommendations, sparse features',
        'Complex patterns, multimodal data'
    ]
})

print("="*80)
print("MODEL OPTIONS LIST (Chapter Content)")
print("="*80)
print(model_options.to_string(index=False))

In [None]:
# Model selection considerations (from chapter)
considerations = pd.DataFrame({
    'Consideration': [
        'Data needs',
        'Training speed',
        'Hyperparameters / Tuning',
        'Continual learning',
        'Compute requirements',
        'Interpretability'
    ],
    'Description': [
        'How much data does the model need to perform well?',
        'How fast can we iterate? Training time matters.',
        'How many hyperparameters? How hard to tune?',
        'Can the model be updated incrementally?',
        'CPU vs GPU? Memory requirements?',
        'Can we explain predictions to users/regulators?'
    ],
    'Simple Models': [
        'Low - works with small datasets',
        'Fast - minutes to hours',
        'Few - easy to tune',
        'Often yes (online learning)',
        'Low - CPU sufficient',
        'High - coefficients are meaningful'
    ],
    'Complex Models': [
        'High - needs large datasets',
        'Slow - hours to days',
        'Many - requires extensive tuning',
        'Sometimes (fine-tuning)',
        'High - GPU required',
        'Low - black box'
    ]
})

print("\n" + "="*100)
print("MODEL SELECTION CONSIDERATIONS (Chapter Content)")
print("="*100)
print(considerations.to_string(index=False))

---

## Hands-On: Model Selection Progression

In [None]:
# Create synthetic CTR dataset
np.random.seed(42)
n = 10000

# Features
data = pd.DataFrame({
    'user_engagement_score': np.random.beta(2, 5, n),
    'item_popularity': np.random.beta(2, 3, n),
    'time_on_platform_days': np.random.exponential(30, n),
    'num_past_clicks': np.random.poisson(5, n),
    'is_weekend': np.random.choice([0, 1], n, p=[0.71, 0.29]),
    'device_mobile': np.random.choice([0, 1], n, p=[0.35, 0.65]),
})

# Target: click label (click = 1, no click = 0)
# Logistic relationship to the features, plus Gaussian noise on the logit
prob = 1 / (1 + np.exp(-(
    -2 + 
    3 * data['user_engagement_score'] + 
    2 * data['item_popularity'] + 
    0.01 * data['num_past_clicks'] +
    0.5 * data['device_mobile'] +
    np.random.normal(0, 0.5, n)
)))
data['clicked'] = (np.random.random(n) < prob).astype(int)

# Create class imbalance: keep only ~15% of the positives (the printed click rate shows the result)
imbalance_mask = np.random.random(n) < 0.15
data.loc[~imbalance_mask & (data['clicked'] == 1), 'clicked'] = 0

print("SYNTHETIC CTR DATASET")
print("="*60)
print(f"Shape: {data.shape}")
print(f"Click rate: {data['clicked'].mean()*100:.2f}%")
print(f"\nFeatures: {list(data.columns[:-1])}")
print(f"\nSample:")
print(data.head())

In [None]:
# Prepare data
X = data.drop('clicked', axis=1)
y = data['clicked']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print(f"Training set: {len(X_train)} samples")
print(f"Test set: {len(X_test)} samples")
print(f"\nClick distribution - Train: {y_train.mean()*100:.2f}%, Test: {y_test.mean()*100:.2f}%")

In [None]:
# Step 1: BASELINE (from chapter)
print("="*60)
print("STEP 1: BASELINE")
print("="*60)

# Baseline 1: Always predict majority class (no click)
baseline_always_zero = np.zeros(len(y_test))

# Baseline 2: Random prediction based on class distribution
click_rate = y_train.mean()
baseline_random = (np.random.random(len(y_test)) < click_rate).astype(int)

# Baseline 3: Most popular items heuristic (item_popularity > threshold)
threshold = X_train['item_popularity'].quantile(0.7)
baseline_popular = (X_test['item_popularity'] > threshold).astype(int)

print("\nBaseline 1: Always predict 'no click'")
print(f"  Accuracy: {accuracy_score(y_test, baseline_always_zero):.4f}")
print(f"  Precision: {precision_score(y_test, baseline_always_zero, zero_division=0):.4f}")
print(f"  Recall: {recall_score(y_test, baseline_always_zero, zero_division=0):.4f}")

print("\nBaseline 2: Random (based on training click rate)")
print(f"  Accuracy: {accuracy_score(y_test, baseline_random):.4f}")
print(f"  Precision: {precision_score(y_test, baseline_random):.4f}")
print(f"  Recall: {recall_score(y_test, baseline_random):.4f}")

print("\nBaseline 3: Popularity heuristic (high popularity → click)")
print(f"  Accuracy: {accuracy_score(y_test, baseline_popular):.4f}")
print(f"  Precision: {precision_score(y_test, baseline_popular):.4f}")
print(f"  Recall: {recall_score(y_test, baseline_popular):.4f}")

In [None]:
# Step 2: SIMPLE MODELS (from chapter)
print("\n" + "="*60)
print("STEP 2: SIMPLE MODELS")
print("="*60)

# Logistic Regression
lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, y_train)
lr_pred = lr.predict(X_test)
lr_proba = lr.predict_proba(X_test)[:, 1]

print("\nLogistic Regression:")
print(f"  Accuracy: {accuracy_score(y_test, lr_pred):.4f}")
print(f"  Precision: {precision_score(y_test, lr_pred):.4f}")
print(f"  Recall: {recall_score(y_test, lr_pred):.4f}")
print(f"  ROC-AUC: {roc_auc_score(y_test, lr_proba):.4f}")

# Decision Tree (simple version)
dt = DecisionTreeClassifier(max_depth=5, random_state=42)
dt.fit(X_train, y_train)
dt_pred = dt.predict(X_test)
dt_proba = dt.predict_proba(X_test)[:, 1]

print("\nDecision Tree (max_depth=5):")
print(f"  Accuracy: {accuracy_score(y_test, dt_pred):.4f}")
print(f"  Precision: {precision_score(y_test, dt_pred):.4f}")
print(f"  Recall: {recall_score(y_test, dt_pred):.4f}")
print(f"  ROC-AUC: {roc_auc_score(y_test, dt_proba):.4f}")

In [None]:
# Step 3: COMPLEX MODELS (from chapter)
print("\n" + "="*60)
print("STEP 3: COMPLEX MODELS")
print("="*60)

# Random Forest
rf = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)
rf_proba = rf.predict_proba(X_test)[:, 1]

print("\nRandom Forest (100 trees):")
print(f"  Accuracy: {accuracy_score(y_test, rf_pred):.4f}")
print(f"  Precision: {precision_score(y_test, rf_pred):.4f}")
print(f"  Recall: {recall_score(y_test, rf_pred):.4f}")
print(f"  ROC-AUC: {roc_auc_score(y_test, rf_proba):.4f}")

# Gradient Boosting
gbdt = GradientBoostingClassifier(n_estimators=100, max_depth=5, random_state=42)
gbdt.fit(X_train, y_train)
gbdt_pred = gbdt.predict(X_test)
gbdt_proba = gbdt.predict_proba(X_test)[:, 1]

print("\nGradient Boosting (GBDT):")
print(f"  Accuracy: {accuracy_score(y_test, gbdt_pred):.4f}")
print(f"  Precision: {precision_score(y_test, gbdt_pred):.4f}")
print(f"  Recall: {recall_score(y_test, gbdt_pred):.4f}")
print(f"  ROC-AUC: {roc_auc_score(y_test, gbdt_proba):.4f}")

In [None]:
# Step 4: ENSEMBLE (from chapter - bagging, boosting, stacking)
print("\n" + "="*60)
print("STEP 4: ENSEMBLE METHODS")
print("="*60)

print("""
Ensemble Types (from chapter):

1. BAGGING (Bootstrap Aggregating)
   - Train multiple models on random subsets of data
   - Combine by voting (classification) or averaging (regression)
   - Example: Random Forest

2. BOOSTING
   - Train models sequentially, focusing on errors of previous models
   - Each model learns from mistakes of predecessors
   - Example: GBDT, XGBoost, LightGBM

3. STACKING
   - Train multiple diverse models
   - Use a meta-model to combine their predictions
   - Leverages strengths of different model types
""")

# Simple averaging example (soft voting)
print("\nSimple Averaging Example:")

# Base-model probabilities, one column per model
stacking_features = np.column_stack([
    lr_proba,
    dt_proba,
    rf_proba,
    gbdt_proba
])

# Averaging is soft voting, not stacking: true stacking would fit a
# meta-model on these columns using out-of-fold predictions.
avg_proba = stacking_features.mean(axis=1)
avg_pred = (avg_proba > 0.5).astype(int)

print("Average Ensemble (LR + DT + RF + GBDT):")
print(f"  Accuracy: {accuracy_score(y_test, avg_pred):.4f}")
print(f"  Precision: {precision_score(y_test, avg_pred):.4f}")
print(f"  Recall: {recall_score(y_test, avg_pred):.4f}")
print(f"  ROC-AUC: {roc_auc_score(y_test, avg_proba):.4f}")
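
Averaging probabilities is a form of voting; stacking as described in the list above trains a meta-model on the base models' predictions. A minimal sketch, assuming scikit-learn's `StackingClassifier` and a synthetic stand-in dataset from `make_classification` (the data and hyperparameters are illustrative, not from the chapter):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): roughly 10% positives
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=42
)

stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=50, max_depth=8, random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # the meta-model
    cv=5,  # meta-model is trained on out-of-fold base predictions
    stack_method="predict_proba",
)
stack.fit(X_tr, y_tr)

stack_proba = stack.predict_proba(X_te)[:, 1]
stack_auc = roc_auc_score(y_te, stack_proba)
print(f"Stacking ROC-AUC: {stack_auc:.4f}")
```

The `cv` argument matters: fitting the meta-model on in-sample base predictions would leak the training labels and overstate the ensemble's quality.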

In [None]:
# Summary comparison
results = pd.DataFrame({
    'Model': ['Baseline (popular)', 'Logistic Regression', 'Decision Tree', 
              'Random Forest', 'GBDT', 'Ensemble (avg)'],
    'Accuracy': [
        accuracy_score(y_test, baseline_popular),
        accuracy_score(y_test, lr_pred),
        accuracy_score(y_test, dt_pred),
        accuracy_score(y_test, rf_pred),
        accuracy_score(y_test, gbdt_pred),
        accuracy_score(y_test, avg_pred)
    ],
    'Precision': [
        precision_score(y_test, baseline_popular),
        precision_score(y_test, lr_pred),
        precision_score(y_test, dt_pred),
        precision_score(y_test, rf_pred),
        precision_score(y_test, gbdt_pred),
        precision_score(y_test, avg_pred)
    ],
    'Recall': [
        recall_score(y_test, baseline_popular),
        recall_score(y_test, lr_pred),
        recall_score(y_test, dt_pred),
        recall_score(y_test, rf_pred),
        recall_score(y_test, gbdt_pred),
        recall_score(y_test, avg_pred)
    ],
    'ROC-AUC': [
        roc_auc_score(y_test, X_test['item_popularity']),  # rank by the raw score, not the thresholded label
        roc_auc_score(y_test, lr_proba),
        roc_auc_score(y_test, dt_proba),
        roc_auc_score(y_test, rf_proba),
        roc_auc_score(y_test, gbdt_proba),
        roc_auc_score(y_test, avg_proba)
    ]
})

print("\n" + "="*60)
print("MODEL COMPARISON SUMMARY")
print("="*60)
print(results.round(4).to_string(index=False))
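
On imbalanced data, hard predictions at the default 0.5 threshold can be almost entirely negative, so precision and recall in a table like this depend heavily on the threshold. A hedged sketch of two common complements: PR-AUC (average precision) as a threshold-free metric, and picking the F1-maximizing threshold on a validation split. The dataset is a synthetic stand-in and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): roughly 7% positives
X_demo, y_demo = make_classification(
    n_samples=5000, n_features=6, n_informative=4,
    weights=[0.93, 0.07], random_state=0,
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
val_proba = clf.predict_proba(X_val)[:, 1]

# Threshold-free summary of the precision/recall tradeoff
pr_auc = average_precision_score(y_val, val_proba)

# precision and recall have one more entry than thresholds; drop the final point
precision, recall, thresholds = precision_recall_curve(y_val, val_proba)
f1 = 2 * precision[:-1] * recall[:-1] / np.clip(precision[:-1] + recall[:-1], 1e-12, None)
best_threshold = thresholds[np.argmax(f1)]

print(f"PR-AUC: {pr_auc:.4f} (positive rate: {y_val.mean():.4f})")
print(f"F1-maximizing threshold: {best_threshold:.3f}")
```

The threshold is chosen on validation data here; choosing it on the test set would bias the reported test metrics.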

---

## F2) Dataset Construction (Chapter - 5 Steps)

In [None]:
# Chapter's 5 dataset construction steps
dataset_steps = pd.DataFrame({
    'Step': ['1', '2', '3', '4', '5'],
    'Action': [
        'Collect raw data',
        'Identify features & labels',
        'Select sampling strategy',
        'Split data',
        'Address class imbalance'
    ],
    'Details': [
        'Gather from logs, databases, APIs, external sources',
        'What predicts the target? What is the target?',
        'Random, stratified, time-based, importance sampling',
        'Train/validation/test split, avoid leakage',
        'Resampling, loss weighting, etc.'
    ]
})

print("="*70)
print("DATASET CONSTRUCTION STEPS (Chapter)")
print("="*70)
print(dataset_steps.to_string(index=False))
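
Steps 3 and 4 mention time-based sampling and avoiding leakage. One way to sketch a chronological split, assuming a hypothetical interaction log with an `event_time` column (the schema and the 70/15/15 fractions are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical interaction log (assumption): one row per impression over 30 days
offsets = pd.to_timedelta(rng.integers(0, 30 * 24 * 3600, size=1000), unit="s")
log = pd.DataFrame({
    "event_time": pd.Timestamp("2024-01-01") + offsets,
    "feature": rng.normal(size=1000),
    "clicked": rng.integers(0, 2, size=1000),
})

def time_based_split(df, time_col, train_frac=0.7, val_frac=0.15):
    """Split chronologically so validation/test rows are never earlier than training rows."""
    ordered = df.sort_values(time_col).reset_index(drop=True)
    train_end = int(len(ordered) * train_frac)
    val_end = train_end + int(len(ordered) * val_frac)
    return ordered.iloc[:train_end], ordered.iloc[train_end:val_end], ordered.iloc[val_end:]

train_df, val_df, test_df = time_based_split(log, "event_time")
print(len(train_df), len(val_df), len(test_df))
print("Train ends:", train_df["event_time"].max())
print("Test starts:", test_df["event_time"].min())
```

A random split on this kind of log would let the model train on events that happened after some test events, which tends to inflate offline metrics.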

---

## Labels: Hand Labeling vs Natural Labeling (Chapter Content)

In [None]:
# Chapter's labeling types
labeling_types = pd.DataFrame({
    'Type': ['Hand Labeling', 'Natural Labeling'],
    'Description': [
        'Human annotators manually label data',
        'Labels derived from user behavior/system events'
    ],
    'Example': [
        'Annotators label images as "cat" or "dog"',
        'User liked post → label=1, else label=0 (feed relevance)'
    ],
    'Pros': [
        'High quality, flexible definition',
        'Cheap, scalable, real-time'
    ],
    'Cons': [
        'Expensive, slow, limited scale',
        'Noisy, may not capture true intent'
    ]
})

print("="*80)
print("LABELING TYPES (Chapter Content)")
print("="*80)
print(labeling_types.to_string(index=False))

print("\n" + "-"*60)
print("CHAPTER EXAMPLE: Natural Label for Feed Relevance")
print("-"*60)
print("""
Input: (user, post)
Label: 1 if user liked the post, else 0

This is a 'natural' label because:
- No human annotation required
- Derived directly from user behavior
- Scales to billions of examples
""")

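The feed-relevance example above can be sketched as a join between two logs. This assumes hypothetical `impressions` and `likes` tables keyed by `user_id` and `post_id`; the schema is illustrative, and a production pipeline would typically also wait for an attribution window before assigning label 0.

```python
import pandas as pd

# Hypothetical logs (assumption): posts shown to users, and like events
impressions = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "post_id": [10, 11, 10, 12, 11],
})
likes = pd.DataFrame({
    "user_id": [1, 2],
    "post_id": [11, 10],
})

# Left-join impressions to likes: an impression with a matching like gets label 1
labeled = impressions.merge(
    likes.drop_duplicates().assign(label=1),
    on=["user_id", "post_id"],
    how="left",
)
# Impressions with no like have NaN after the join; treat them as negatives
labeled["label"] = labeled["label"].fillna(0).astype(int)
print(labeled)
```
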
---

## Class Imbalance (Chapter Content)

In [None]:
# Class imbalance mitigation strategies from chapter
print("="*70)
print("CLASS IMBALANCE MITIGATION (Chapter Content)")
print("="*70)

print("""
Problem: Majority class (no click) >> Minority class (click)

Chapter Mitigation Strategies:

1. RESAMPLING:
   - Oversample minority: Duplicate minority class samples
   - Undersample majority: Remove majority class samples

2. ALTER LOSS FUNCTION:
   - Weight minority more: Give higher loss weight to minority class
   - Focal loss: Focus on hard examples
   - Class-balanced loss: Inverse frequency weighting
""")

In [None]:
# Demonstrate class imbalance handling
print("Original class distribution:")
print(y_train.value_counts())
print(f"Imbalance ratio: {(y_train==0).sum() / (y_train==1).sum():.1f}:1")
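
The imbalance ratio maps directly onto inverse-frequency class weights. A small sketch of the heuristic behind scikit-learn's `class_weight='balanced'`, `n_samples / (n_classes * count_per_class)`, checked against `compute_class_weight`; the 9:1 label vector is illustrative.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_demo = np.array([0] * 90 + [1] * 10)  # 9:1 imbalance (illustrative)
classes = np.array([0, 1])

# 'balanced' heuristic: n_samples / (n_classes * count_per_class)
manual_weights = len(y_demo) / (len(classes) * np.bincount(y_demo))
sklearn_weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_demo)

print("Manual: ", manual_weights)
print("sklearn:", sklearn_weights)
```

With a 9:1 ratio, each minority example counts nine times as much as a majority example in the loss.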

In [None]:
# Method 1: Resampling - Oversample minority
print("\n" + "-"*40)
print("METHOD 1: Oversample Minority")
print("-"*40)

from sklearn.utils import resample

# Separate classes
X_majority = X_train[y_train == 0]
X_minority = X_train[y_train == 1]
y_majority = y_train[y_train == 0]
y_minority = y_train[y_train == 1]

# Oversample minority
X_minority_upsampled, y_minority_upsampled = resample(
    X_minority, y_minority,
    replace=True,
    n_samples=len(X_majority),
    random_state=42
)

X_balanced = pd.concat([X_majority, X_minority_upsampled])
y_balanced = pd.concat([y_majority, y_minority_upsampled])

print(f"After oversampling:")
print(y_balanced.value_counts())

# Train and evaluate
lr_balanced = LogisticRegression(max_iter=1000)
lr_balanced.fit(X_balanced, y_balanced)
lr_balanced_pred = lr_balanced.predict(X_test)
lr_balanced_proba = lr_balanced.predict_proba(X_test)[:, 1]

print(f"\nLogistic Regression with Oversampling:")
print(f"  Recall: {recall_score(y_test, lr_balanced_pred):.4f} (was {recall_score(y_test, lr_pred):.4f})")
print(f"  Precision: {precision_score(y_test, lr_balanced_pred):.4f}")
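
The other resampling option from the chapter, undersampling the majority class, follows the same pattern. A minimal sketch on a synthetic stand-in frame (the column names and the roughly 5% positive rate are assumptions):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic stand-in (assumption): 1,000 rows, roughly 5% positives
demo = pd.DataFrame({
    "feature": rng.normal(size=1000),
    "clicked": (rng.random(1000) < 0.05).astype(int),
})

minority = demo[demo["clicked"] == 1]
majority = demo[demo["clicked"] == 0]

# Keep every minority row; sample the majority down to the same count
majority_down = majority.sample(n=len(minority), replace=False, random_state=42)

# Shuffle so the two classes are interleaved
undersampled = pd.concat([majority_down, minority]).sample(frac=1.0, random_state=42)
print(undersampled["clicked"].value_counts())
```

Undersampling discards data and changes the base rate the model sees, so predicted probabilities usually need recalibration before they are used as click-rate estimates.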

In [None]:
# Method 2: Class Weights
print("\n" + "-"*40)
print("METHOD 2: Class Weights (Alter Loss)")
print("-"*40)

# Using class_weight='balanced' automatically weights by inverse frequency
lr_weighted = LogisticRegression(max_iter=1000, class_weight='balanced')
lr_weighted.fit(X_train, y_train)
lr_weighted_pred = lr_weighted.predict(X_test)
lr_weighted_proba = lr_weighted.predict_proba(X_test)[:, 1]

print(f"Logistic Regression with Class Weights:")
print(f"  Recall: {recall_score(y_test, lr_weighted_pred):.4f} (was {recall_score(y_test, lr_pred):.4f})")
print(f"  Precision: {precision_score(y_test, lr_weighted_pred):.4f}")
print(f"  F1: {f1_score(y_test, lr_weighted_pred):.4f}")

print("\n[Chapter Mentions]: focal loss and class-balanced loss for deep learning")
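
The focal loss mentioned here can be written out for the binary case. A minimal NumPy sketch of the standard form `FL = -alpha_t * (1 - p_t)^gamma * log(p_t)`; the function name `binary_focal_loss` and its defaults are illustrative choices, not taken from the chapter.

```python
import numpy as np

def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=None, eps=1e-7):
    """Per-example binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    # p_t is the probability the model assigns to the true class
    p_t = np.where(y_true == 1, p_pred, 1 - p_pred)
    loss = -((1 - p_t) ** gamma) * np.log(p_t)
    if alpha is not None:
        # alpha weights positives, (1 - alpha) weights negatives
        loss = np.where(y_true == 1, alpha, 1 - alpha) * loss
    return loss

y = np.array([1, 1, 0])
p = np.array([0.9, 0.1, 0.1])  # easy positive, hard positive, easy negative

cross_entropy = binary_focal_loss(y, p, gamma=0.0)  # gamma=0 reduces to cross-entropy
focal = binary_focal_loss(y, p, gamma=2.0)

print("Cross-entropy:", cross_entropy.round(4))
print("Focal (g=2):  ", focal.round(4))
print("Focal / CE:   ", (focal / cross_entropy).round(4))
```

The ratio shows the effect: confidently correct examples are scaled down by `(1 - p_t)^gamma`, while hard examples keep most of their loss.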

---

## Training: Scratch vs Fine-tuning, Distributed Training (Chapter Content)

In [None]:
print("="*70)
print("TRAINING APPROACHES (Chapter Content - High Level)")
print("="*70)

print("""
TRAINING FROM SCRATCH vs FINE-TUNING:
--------------------------------------
• Training from scratch: Initialize weights randomly, train on your data
  - Use when: Lots of domain-specific data, unique task
  
• Fine-tuning: Start from pre-trained weights, adapt to your task
  - Use when: Limited data, task similar to pre-training objective
  - Examples: BERT for text, ResNet for images


DISTRIBUTED TRAINING:
---------------------
Two main approaches:

1. DATA PARALLELISM:
   - Same model copied to multiple workers
   - Each worker trains on different data shard
   - Gradients are aggregated
   - Use when: Model fits in single GPU memory

2. MODEL PARALLELISM:
   - Model split across multiple workers
   - Each worker holds part of the model
   - Use when: Model too large for single GPU (e.g., GPT-3)
""")

In [None]:
# Visual: Data Parallelism vs Model Parallelism
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Data Parallelism
ax1 = axes[0]
ax1.axis('off')
ax1.set_title('Data Parallelism', fontsize=14, fontweight='bold')

# Data shards
for i in range(3):
    rect = mpatches.FancyBboxPatch((0.5, 3-i*1), 1.5, 0.8, boxstyle='round,pad=0.1',
                                    facecolor='#BBDEFB', edgecolor='black', linewidth=1)
    ax1.add_patch(rect)
    ax1.text(1.25, 3.4-i*1, f'Data Shard {i+1}', ha='center', va='center', fontsize=9)

# Model copies
for i in range(3):
    ax1.annotate('', xy=(3, 3.4-i*1), xytext=(2, 3.4-i*1),
                arrowprops=dict(arrowstyle='->', color='gray', lw=1))
    rect = mpatches.FancyBboxPatch((3, 3-i*1), 1.5, 0.8, boxstyle='round,pad=0.1',
                                    facecolor='#C8E6C9', edgecolor='black', linewidth=1)
    ax1.add_patch(rect)
    ax1.text(3.75, 3.4-i*1, f'Model Copy', ha='center', va='center', fontsize=9)

# Gradient aggregation
for i in range(3):
    ax1.annotate('', xy=(5.75, 2.4), xytext=(4.5, 3.4-i*1),
                arrowprops=dict(arrowstyle='->', color='gray', lw=1))

rect = mpatches.FancyBboxPatch((5.5, 2), 2, 0.8, boxstyle='round,pad=0.1',
                                facecolor='#FFCCBC', edgecolor='black', linewidth=2)
ax1.add_patch(rect)
ax1.text(6.5, 2.4, 'Aggregate\nGradients', ha='center', va='center', fontsize=9)

ax1.set_xlim(0, 8)
ax1.set_ylim(0, 5)

# Model Parallelism
ax2 = axes[1]
ax2.axis('off')
ax2.set_title('Model Parallelism', fontsize=14, fontweight='bold')

# Input
rect = mpatches.FancyBboxPatch((0.5, 2), 1.5, 1, boxstyle='round,pad=0.1',
                                facecolor='#BBDEFB', edgecolor='black', linewidth=1)
ax2.add_patch(rect)
ax2.text(1.25, 2.5, 'Input', ha='center', va='center', fontsize=10)

# Model parts on different GPUs
parts = [('GPU 1\nLayers 1-10', 2.5, '#C8E6C9'), 
         ('GPU 2\nLayers 11-20', 4.5, '#FFF9C4'),
         ('GPU 3\nLayers 21-30', 6.5, '#E1BEE7')]

for i, (label, x, color) in enumerate(parts):
    if i > 0:
        ax2.annotate('', xy=(x, 2.5), xytext=(x-1.5, 2.5),
                    arrowprops=dict(arrowstyle='->', color='black', lw=2))
    else:
        ax2.annotate('', xy=(x, 2.5), xytext=(2, 2.5),
                    arrowprops=dict(arrowstyle='->', color='black', lw=2))
    rect = mpatches.FancyBboxPatch((x, 2), 1.3, 1, boxstyle='round,pad=0.1',
                                    facecolor=color, edgecolor='black', linewidth=1)
    ax2.add_patch(rect)
    ax2.text(x+0.65, 2.5, label, ha='center', va='center', fontsize=8)

ax2.set_xlim(0, 8.5)
ax2.set_ylim(0, 5)

plt.tight_layout()
plt.show()
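
The data-parallel picture can be checked numerically. In this sketch, simulated workers each compute a gradient on their own shard, and averaging those gradients (the role played by an all-reduce) matches the full-batch gradient when shards are equal in size. The linear model, shard count, and helper name `mse_gradient` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1200, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=1200)

def mse_gradient(w, X_part, y_part):
    """Gradient of mean squared error for a linear model on one data shard."""
    residual = X_part @ w - y_part
    return 2.0 * X_part.T @ residual / len(y_part)

w = np.zeros(4)       # every worker holds the same copy of the weights
num_workers = 3
X_shards = np.array_split(X, num_workers)
y_shards = np.array_split(y, num_workers)

# Each simulated worker computes a gradient on its own shard
worker_grads = [mse_gradient(w, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]

# Averaging across workers stands in for the gradient aggregation step
aggregated_grad = np.mean(worker_grads, axis=0)
full_batch_grad = mse_gradient(w, X, y)

print("Aggregated:", aggregated_grad.round(4))
print("Full batch:", full_batch_grad.round(4))
```

Because every worker applies the same aggregated gradient, the model copies stay identical after each step.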

---

## F3) Model Dev Talking Points Checklist (Chapter Content)

In [None]:
model_dev_checklist = pd.DataFrame({
    'Topic': [
        'Model suitability tradeoffs',
        'Labels quality & feedback loops',
        'Imbalance handling',
        'Overfitting/underfitting',
        'Continual learning cadence'
    ],
    'Questions to Address': [
        'Training time? Data needs? Compute? Latency? On-device? Interpretability?',
        'How are labels obtained? Time-to-label for natural labels?',
        'Class ratio? Resampling or loss weighting?',
        'Bias/variance tradeoff? Regularization?',
        'Retrain daily/weekly/monthly? Online learning?'
    ],
    'E5 Answer': [
        'GBDT is fast to train, works well on tabular data, and exposes feature importances',
        'Natural labels from clicks, 1-day delay for label collection',
        '10:1 imbalance, using class_weight to handle',
        'Added L2 regularization to prevent overfitting',
        'Retrain weekly with fresh data'
    ],
    'E6 Addition': [
        'Latency budget is 10ms, so we cache embeddings and use lightweight model',
        'Feedback loop: today\'s predictions affect tomorrow\'s labels, need to monitor',
        'Focal loss down-weights easy examples so training concentrates on hard ones, without duplicating rows as oversampling does',
        'Early stopping on validation set, monitor calibration',
        'Continuous learning with streaming data, A/B test before full rollout'
    ]
})

print("="*100)
print("MODEL DEVELOPMENT TALKING POINTS CHECKLIST (Chapter)")
print("="*100)
for _, row in model_dev_checklist.iterrows():
    print(f"\n{row['Topic']}")
    print(f"  Q: {row['Questions to Address']}")
    print(f"  E5: {row['E5 Answer']}")
    print(f"  E6: {row['E6 Addition']}")

---

## Tradeoffs (Chapter-Aligned)

| Tradeoff | Discussion | Interview Signal |
|----------|------------|------------------|
| **Simple vs Complex** | Interpretability vs performance | E5: Knows progression. E6: Justifies model choice for use case |
| **Hand vs Natural Labels** | Quality vs scale | E5: Understands both. E6: Discusses label noise handling |
| **Oversample vs Weight Loss** | Duplicating minority rows vs reweighting the loss | E5: Can implement. E6: Discusses pros/cons at scale |
| **Data vs Model Parallelism** | Scale compute vs scale model | E5: Knows difference. E6: Proposes for specific model size |
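
For the last row, a back-of-the-envelope memory estimate is one way to propose a strategy for a specific model size. This sketch assumes fp32 training with Adam (4 bytes of weights, 4 of gradients, 8 of optimizer state per parameter), ignores activations, and assumes an 80 GB accelerator; the function names and the threshold are hypothetical, and real systems often combine both strategies.

```python
def training_memory_gb(num_params, bytes_per_param=16):
    """Rough memory for weights + gradients + Adam state, ignoring activations.

    Assumes fp32 with Adam: 4 (weights) + 4 (gradients) + 8 (two moment
    estimates) = 16 bytes per parameter.
    """
    return num_params * bytes_per_param / 1e9

def suggest_parallelism(num_params, gpu_memory_gb=80.0):
    """Heuristic only: split the model when its training state cannot fit on one device."""
    needed = training_memory_gb(num_params)
    return "data parallelism" if needed < gpu_memory_gb else "model parallelism"

for name, n_params in [("1B parameters", 1e9), ("175B parameters (GPT-3 scale)", 175e9)]:
    print(f"{name}: ~{training_memory_gb(n_params):,.0f} GB -> {suggest_parallelism(n_params)}")
```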

---

## Meta Interview Signal (Detailed)

### E5 Answer Expectations

- Follows baseline → simple → complex progression
- Implements class imbalance mitigation
- Understands hand vs natural labeling
- Can explain model selection considerations

### E6 Additions

- **Iteration velocity**: "We need fast experimentation, so we start with GBDT before trying neural nets"
- **Feedback loops**: "User behavior changes based on model predictions - we need to monitor for distribution shift"
- **Scale**: "At Meta scale, we use data parallelism across thousands of GPUs with synchronous SGD"
- **Label quality**: "We sample 1% of natural labels for human review to estimate label noise"

---

## Interview Drills

### Drill 1: Model Progression
For CTR prediction, walk through:
- What baseline would you use?
- What simple model would you try first?
- What complex model would you consider?

### Drill 2: Class Imbalance
You have a fraud detection dataset with 0.1% fraud rate. Explain:
- Why is this problematic?
- What two approaches would you try?
- What metrics would you use (not accuracy)?

### Drill 3: Labeling Strategy
For each task, decide hand labeling vs natural labeling:
- Feed ranking (relevant posts)
- Harmful content detection
- Image classification (objects)

### Drill 4: Distributed Training
You need to train a model with:
- 10B training examples
- 500M parameters
Which parallelism strategy? Why?

### Drill 5: Model Dev Checklist
Walk through all 5 talking points for a video recommendation system.