# Lesson 1: Model Selection Strategies

**Module 4: Model Development & Optimization**  
**Estimated Time**: 1-2 hours  
**Difficulty**: Beginner-Intermediate

---

## 🎯 Learning Objectives

By the end of this lesson, you will:

✅ Understand how to systematically select the right model architecture  
✅ Learn the "Occam's Razor" of ML: Start Simple  
✅ Compare Linear vs. Tree-based vs. Deep Learning models  
✅ Master the trade-offs: Interpretability vs. Accuracy vs. Latency  
✅ Answer interview questions on model selection logic  

---

## 📚 Table of Contents

1. [The Hierarchy of Complexity](#1-the-hierarchy-of-complexity)
2. [Trade-off Triangle: Speed, Accuracy, Interpretability](#2-trade-off-triangle)
3. [Hands-On: Benchmarking Candidates](#3-hands-on-benchmarking-candidates)
4. [Interview Preparation](#4-interview-preparation)

---

## 1. The Hierarchy of Complexity

When faced with a new problem, **never** start with a Transformer or Deep Neural Network. Follow this hierarchy:

### Level 1: The Baselines (Sanity Check)
- **Mean/Mode**: Predict the average target value (regression) or the most frequent class (classification).
- **Heuristics**: A simple hand-written rule, e.g., "If price < $10, buy it."
- **Why?**: Baselines set a performance floor. If your model can't beat them, it adds no value.
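A minimal Level 1 sketch with scikit-learn's `DummyClassifier`, assuming synthetic, imbalanced data (about 80% of labels in class 0). The threshold rule on feature 0 is a hypothetical heuristic chosen only for illustration; for a regression target, `DummyRegressor(strategy="mean")` plays the same role as the most-frequent baseline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data (~80% class 0) so the "mode" baseline is non-trivial
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Baseline 1: always predict the most frequent class seen in training
mode_baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
mode_acc = mode_baseline.score(X_test, y_test)

# Baseline 2: a hypothetical hand-written rule on a single feature
heuristic_preds = (X_test[:, 0] > 0).astype(int)
heuristic_acc = float(np.mean(heuristic_preds == y_test))

print(f"Most-frequent baseline accuracy: {mode_acc:.3f}")
print(f"Heuristic baseline accuracy:     {heuristic_acc:.3f}")
```

Any candidate model you train later should clearly beat both numbers on the same test split.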

### Level 2: The Interpretable Models
- **Linear/Logistic Regression**
- **Decision Trees** (Depth < 5)
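- **Why?**: Business stakeholders often need to know *why* a decision was made. Linear coefficients show the direction and size of each feature's effect (holding the other features fixed; magnitudes are comparable only after scaling), and a shallow tree reads as a short chain of if/else rules.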
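The sketch below fits both Level 2 models on synthetic data and prints what a stakeholder would actually read. It assumes standardized inputs so coefficient magnitudes are comparable, and the feature names `f0` to `f4` are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=5, n_informative=3,
                           n_redundant=0, random_state=1)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # placeholder names

# Standardize first so coefficient magnitudes are comparable across features
logreg = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
coefs = logreg.named_steps["logisticregression"].coef_[0]
for name, w in sorted(zip(feature_names, coefs), key=lambda t: -abs(t[1])):
    # Sign = direction of effect on the log-odds; magnitude = change per 1 std. dev.
    print(f"{name}: {w:+.3f}")

# A shallow tree: each prediction follows at most 3 human-readable if/else tests
tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)
rules = export_text(tree, feature_names=feature_names)
print(rules)
```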

### Level 3: The Workhorses (Tabular SOTA)
- **XGBoost / LightGBM / CatBoost**
- **Random Forests**
- **Why?**: For structured (tabular) data, Gradient Boosted Trees are typically State-of-the-Art (SOTA). They handle non-linearities and interactions well.

### Level 4: The Deep Learners (Unstructured SOTA)
- **ResNet / EfficientNet** (Images)
- **BERT / GPT** (Text)
- **Why?**: Reserve these for images, audio, and text, or for very large tabular datasets (millions of rows) where you need to squeeze out the last 0.1% of accuracy.

## 2. Trade-off Triangle

You can rarely optimize all three. Pick two:

1. **Accuracy**: How well does it predict?
2. **Latency (Speed)**: How fast does it predict? (Critical for real-time APIs.)
3. **Interpretability**: Can I explain it to a human?

| Model | Accuracy | Latency (rough, per prediction) | Interpretability |
|-------|----------|---------------------------------|------------------|
| Logistic Regression | Low | Very Low (<1ms) | High |
| XGBoost | High | Low (~10ms) | Medium (SHAP) |
| Transformer | Very High | High (>100ms) | Low |
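The latency column refers to a single prediction, which is what one real-time API request sees. One way to measure it is to time one-row `predict` calls and report the median (p50) and 95th percentile (p95). This sketch assumes a Random Forest as the tree-ensemble stand-in; absolute numbers depend heavily on hardware, library version, and model size.

```python
import statistics
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000).fit(X, y),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y),
}


def single_request_latency_ms(model, X, n_requests=100):
    """Return (p50, p95) latency in ms when predicting ONE row per call."""
    timings = []
    for i in range(n_requests):
        row = X[i : i + 1]  # shape (1, n_features): one request
        start = time.perf_counter()
        model.predict(row)
        timings.append((time.perf_counter() - start) * 1000)
    p50 = statistics.median(timings)
    p95 = statistics.quantiles(timings, n=20)[18]  # 19 cut points; index 18 = 95th pct
    return p50, p95


results = {name: single_request_latency_ms(m, X) for name, m in models.items()}
for name, (p50, p95) in results.items():
    print(f"{name:<20} p50={p50:.3f} ms  p95={p95:.3f} ms")
```

Reporting p95 alongside the median matters because latency budgets for APIs are usually set on tail latency, not the average.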

## 3. Hands-On: Benchmarking Candidates

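Let's compare Logistic Regression against a Random Forest on a synthetic classification problem. The Random Forest stands in for the Level 3 tree ensembles and ships with scikit-learn, so no extra install is needed.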

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Create data (10k rows); fix random_state so results are reproducible
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Candidate 1: Logistic Regression
start = time.perf_counter()
lr = LogisticRegression()
lr.fit(X_train, y_train)
train_time_lr = time.perf_counter() - start

# Time prediction only (scoring is excluded). This is batch time divided by
# batch size, i.e. amortized latency, not the latency of a single request.
start = time.perf_counter()
preds_lr = lr.predict(X_test)
infer_time_lr = (time.perf_counter() - start) / len(X_test) * 1000  # ms per sample
acc_lr = accuracy_score(y_test, preds_lr)

# 3. Candidate 2: Random Forest
start = time.perf_counter()
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
train_time_rf = time.perf_counter() - start

start = time.perf_counter()
preds_rf = rf.predict(X_test)
infer_time_rf = (time.perf_counter() - start) / len(X_test) * 1000  # ms per sample
acc_rf = accuracy_score(y_test, preds_rf)

print(f"{'Model':<15} {'Accuracy':<10} {'Train Time(s)':<15} {'Infer Latency(ms)':<20}")
print("-" * 60)
print(f"{'Logistic':<15} {acc_lr:<10.3f} {train_time_lr:<15.4f} {infer_time_lr:<20.4f}")
print(f"{'RandomForest':<15} {acc_rf:<10.3f} {train_time_rf:<15.4f} {infer_time_rf:<20.4f}")

print("\nDecision: Is the extra accuracy of RF worth the increased latency?")
```
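The closing question can be made explicit with a decision rule. This sketch assumes a hypothetical helper `pick_model` with two project-specific knobs: a latency budget and the largest accuracy drop you will accept in exchange for a simpler model. The numbers in the usage example are made-up inputs, not benchmark results.

```python
def pick_model(candidates, latency_budget_ms, max_accuracy_drop=0.01):
    """Hypothetical helper: return the simplest model that fits the latency budget
    and is within `max_accuracy_drop` of the best feasible accuracy.

    `candidates` is a list of (name, accuracy, latency_ms), ordered simplest first.
    Returns None if no candidate meets the latency budget.
    """
    feasible = [c for c in candidates if c[2] <= latency_budget_ms]
    if not feasible:
        return None  # nothing fits: relax the budget or optimize a model
    best_acc = max(acc for _, acc, _ in feasible)
    # Walk simplest-first and stop at the first "good enough" model
    for name, acc, _ in feasible:
        if acc >= best_acc - max_accuracy_drop:
            return name


# Made-up inputs for illustration only (not measured results)
candidates = [("Logistic", 0.85, 0.01), ("RandomForest", 0.92, 0.05)]
print(pick_model(candidates, latency_budget_ms=1.0))                          # RandomForest
print(pick_model(candidates, latency_budget_ms=1.0, max_accuracy_drop=0.10))  # Logistic
print(pick_model(candidates, latency_budget_ms=0.02))                         # Logistic
```

Ordering candidates simplest-first encodes the "start simple" principle: complexity wins only when it is both affordable and meaningfully better.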

## 4. Interview Preparation

### Common Questions

#### Q1: "How do you choose a model for a new project?"
**Answer Framework**:
1. **Baseline**: "I always start with a heuristic or simple baseline (Logistic Regression) to establish a performance floor."
2. **Constraint Check**: "I check deployment constraints. Do we need <10ms latency? Does it need to run on a phone? This filters out massive models."
3. **Complexity Ladder**: "For tabular data, I then move to gradient-boosted trees such as XGBoost. If that isn't enough, and I have unstructured data or massive scale, I consider Deep Learning."

#### Q2: "When would you choose a Decision Tree over a Neural Network?"
**Answer**: "When interpretability is key (e.g., credit-scoring regulations require explaining rejections), or when the data is small and tabular. Neural Networks are overkill for small tabular datasets and are 'black boxes' by default."
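To make the "explain a rejection" point concrete, this sketch lists the exact rules a shallow `DecisionTreeClassifier` applied to one sample, using `decision_path` and the fitted `tree_` arrays. The data are synthetic and the credit-style feature names are hypothetical labels attached for readability.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical credit-style names attached to synthetic features (illustration only)
feature_names = ["income", "debt_ratio", "late_payments", "account_age"]
X, y = make_classification(n_samples=1000, n_features=4, n_informative=3,
                           n_redundant=0, random_state=7)
tree = DecisionTreeClassifier(max_depth=3, random_state=7).fit(X, y)


def explain(tree, x, feature_names):
    """Return the split tests (root to leaf) the tree applied to one sample x."""
    node_ids = tree.decision_path(x.reshape(1, -1)).indices  # nodes visited
    leaf_id = tree.apply(x.reshape(1, -1))[0]
    rules = []
    for node in node_ids:
        if node == leaf_id:
            continue  # leaves hold the prediction, not a split test
        f = tree.tree_.feature[node]
        t = tree.tree_.threshold[node]
        op = "<=" if x[f] <= t else ">"
        rules.append(f"{feature_names[f]} = {x[f]:.2f} {op} {t:.2f}")
    return rules


sample = X[0]
rules = explain(tree, sample, feature_names)
print("Prediction:", tree.predict(sample.reshape(1, -1))[0])
for rule in rules:
    print("  because", rule)
```

With `max_depth=3`, every decision is justified by at most three threshold tests, which is the kind of explanation a reviewer or regulator can check by hand.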