# Model Selection, Underfitting, Overfitting, and the Bias–Variance Tradeoff

---

## 1. What Does “Best Model” Mean?

A **best model** is **NOT** the model that fits training data perfectly.

A best model is one that:
- Learns the true underlying pattern
- Performs well on **unseen data**
- Generalizes beyond training samples

Formally, we want to minimize **generalization error**, not training error.
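One common way to make this precise, assuming a loss function $L$ and $n$ training pairs $(x_i, y_i)$ drawn from an unknown data distribution $\mathcal{D}$ (notation introduced here for illustration), for a model $g$ fitted on those pairs:

$$
\text{Training error} = \frac{1}{n}\sum_{i=1}^{n} L\big(y_i,\, g(x_i)\big)
$$

$$
\text{Generalization error} = \mathbb{E}_{(x,\,y)\sim\mathcal{D}}\Big[L\big(y,\, g(x)\big)\Big]
$$

The training error is computed on the same points used to fit $g$, so it tends to be an optimistic estimate of the generalization error.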

---

## 2. Model Complexity

Model complexity refers to:
- Number of parameters
- Flexibility of the model
- Ability to fit complicated patterns

Examples:
- Linear regression → low complexity
- Polynomial regression (high degree) → high complexity
- Deep neural networks → very high complexity

---

## 3. Underfitting

### Definition

A model **underfits** when it is **too simple** to capture the true relationship in data.

---

### Mathematical View

True relationship:
$$
y = f(x)
$$

Model approximation:
$$
\hat{y} = g(x)
$$

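If the model family is too simple to represent $f$, then even the best choice of $g$ in that family satisfies
$$
g(x) \neq f(x)
$$

systematically, not just by chance → **underfitting**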

---

### Characteristics of Underfitting

- High training error
- High test error
- Model ignores important patterns
- Bias is high

---

### Example

Trying to fit a straight line to quadratic data:
$$
y = 3x^2 + 2x + 1
$$

Using:
$$
y = \beta_0 + \beta_1 x
$$

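This **cannot work**, no matter how much data you have: a straight line has no term that can represent the curvature $3x^2$, so both training and test error stay high.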
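A small numerical sketch of this example, assuming NumPy, inputs drawn uniformly from $[-2, 2]$, and noise-free targets (all choices made here for illustration). The helper `line_fit_mse` is hypothetical; it fits the best straight line and reports its training error as the sample grows.

```python
import numpy as np

def quadratic(x):
    # True relationship from the example: y = 3x^2 + 2x + 1
    return 3 * x**2 + 2 * x + 1

def line_fit_mse(n, seed=0):
    """Fit y = b0 + b1*x by least squares on n noise-free points; return training MSE."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2, 2, size=n)
    y = quadratic(x)
    b1, b0 = np.polyfit(x, y, deg=1)  # polyfit returns the highest-degree coefficient first
    residuals = y - (b0 + b1 * x)
    return np.mean(residuals**2)

for n in [10, 100, 10_000]:
    print(f"n = {n:>6}: training MSE of best line = {line_fit_mse(n):.3f}")
```

Because the targets contain no noise, all of this error comes from the model being too simple. For large $n$ the MSE settles near a positive constant (close to 12.8 for this input range) instead of shrinking toward zero.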

---

## 4. Overfitting

### Definition

A model **overfits** when it learns:
- Noise
- Random fluctuations
- Training-specific patterns

instead of the true underlying function.

---

### Mathematical View

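Observed data are generated as:
$$
y = f(x) + \epsilon
$$

where:
$$
\epsilon = \text{noise}
$$

An overfit model reproduces the noisy training targets:
$$
g(x_i) \approx f(x_i) + \epsilon_i
$$

instead of the true function:
$$
g(x) \approx f(x)
$$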

---

### Characteristics of Overfitting

- Very low training error
- High test error
- Model is unstable: small changes in the training data change its predictions a lot
- Variance is high

---

### Example

Using a very high-degree polynomial:

$$
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_{30} x^{30}
$$

The curve:
- Passes through almost every training point
- Fails badly on new data
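A sketch comparing polynomial degrees on noisy samples of the quadratic from Section 3, assuming NumPy's `Polynomial.fit` for least-squares fitting. Degree 15 stands in for the degree-30 example to keep the fit numerically well behaved with only 20 training points; the degrees, noise level, and sample sizes are arbitrary illustrative choices.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)

def make_data(n, noise=0.5):
    # Noisy samples of the true function f(x) = 3x^2 + 2x + 1
    x = rng.uniform(-1, 1, size=n)
    y = 3 * x**2 + 2 * x + 1 + rng.normal(0, noise, size=n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(1000)

def mse(model, x, y):
    return np.mean((y - model(x)) ** 2)

results = {}
for degree in [1, 2, 15]:
    model = Polynomial.fit(x_train, y_train, deg=degree)  # least-squares polynomial fit
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree:>2}: train MSE = {results[degree][0]:.3f}, "
          f"test MSE = {results[degree][1]:.3f}")
```

Training MSE can only fall as the degree rises, because each polynomial family contains the lower-degree ones. The degree-15 test MSE, by contrast, is typically well above the degree-2 test MSE.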

---

## 5. Bias–Variance Decomposition

For squared-error loss, the expected prediction error at an input $x$ can be decomposed as:

$$
\text{Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Error}
$$
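Written out term by term, assuming data generated as $y = f(x) + \epsilon$ with $\mathbb{E}[\epsilon] = 0$ and $\operatorname{Var}(\epsilon) = \sigma^2$ ($\sigma^2$ is notation introduced here), noise independent of the training set, and the expectation taken over random training sets (which determine $g$) and over the noise in the new observation $y$:

$$
\mathbb{E}\Big[\big(y - g(x)\big)^2\Big]
= \underbrace{\big(\mathbb{E}[g(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}\Big[\big(g(x) - \mathbb{E}[g(x)]\big)^2\Big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{Irreducible Error}}
$$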

---

### Bias

Bias measures:
$$
\text{How far the average model prediction is from the true function}
$$

High bias → underfitting

---

### Variance

Variance measures:
$$
\text{How much the model prediction changes with different training data}
$$

High variance → overfitting

---

### Irreducible Error

Caused by:
- Measurement noise
- Randomness in data

No model can eliminate it: it is the variance of the noise $\epsilon$.
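The three terms can be estimated by simulation. This sketch assumes NumPy and the quadratic $f$ from Section 3; the noise level, sample sizes, degrees, and the hypothetical helper `bias_variance` are illustrative choices. It refits each degree on many independent training sets and measures the predictions on a fixed grid of inputs (kept slightly inside $[-1, 1]$ to limit extrapolation at the edges).

```python
import numpy as np
from numpy.polynomial import Polynomial

def f(x):
    # True function from the underfitting example
    return 3 * x**2 + 2 * x + 1

def bias_variance(degree, n_train=30, n_repeats=500, noise=0.5, seed=0):
    """Monte Carlo estimate of Bias^2 and Variance, averaged over a grid of x values."""
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(-0.9, 0.9, 50)
    preds = np.empty((n_repeats, x_grid.size))
    for r in range(n_repeats):
        # Draw a fresh training set: y = f(x) + eps
        x = rng.uniform(-1, 1, size=n_train)
        y = f(x) + rng.normal(0, noise, size=n_train)
        preds[r] = Polynomial.fit(x, y, deg=degree)(x_grid)
    mean_pred = preds.mean(axis=0)                   # estimate of E[g(x)] at each grid point
    bias_sq = np.mean((mean_pred - f(x_grid)) ** 2)  # (E[g(x)] - f(x))^2, averaged over grid
    variance = np.mean(preds.var(axis=0))            # E[(g(x) - E[g(x)])^2], averaged over grid
    return bias_sq, variance

noise = 0.5
for degree in [1, 2, 8]:
    b, v = bias_variance(degree, noise=noise)
    print(f"degree {degree}: bias^2 = {b:.3f}, variance = {v:.3f}, "
          f"irreducible = {noise**2:.3f}, total = {b + v + noise**2:.3f}")
```

The straight line should show large bias and small variance, the quadratic near-zero bias, and degree 8 a larger variance than degree 2.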

---

## 6. Bias–Variance Tradeoff

As model complexity increases:

- Bias ↓
- Variance ↑

As model complexity decreases:

- Bias ↑
- Variance ↓

There is typically an **optimal complexity** at which total error is minimized.

---

## 7. Graphical Intuition (Conceptual)

| Model Type | Bias | Variance | Fit |
|-----------|------|----------|-----|
| Too Simple | High | Low | Underfitting |
| Optimal | Balanced | Balanced | Best |
| Too Complex | Low | High | Overfitting |

---

## 8. Why Training Accuracy Is Misleading

A model can achieve:
$$
100\% \text{ training accuracy}
$$

and still be **useless**.

Reason:
$$
\text{Training error} \neq \text{Generalization error}
$$
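A quick way to see this, assuming NumPy and a hand-written 1-nearest-neighbour classifier (a model not covered in the lecture, chosen here only because it memorizes its training set). On labels that are pure coin flips, it scores 100% on the training data and about 50% on new data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Features carry no information: labels are random coin flips
X_train = rng.normal(size=(200, 5))
y_train = rng.integers(0, 2, size=200)
X_test = rng.normal(size=(1000, 5))
y_test = rng.integers(0, 2, size=1000)

def predict_1nn(X_fit, y_fit, X_query):
    """Predict each query point's label as the label of its nearest training point."""
    # Pairwise squared Euclidean distances, shape (n_query, n_fit)
    d2 = ((X_query[:, None, :] - X_fit[None, :, :]) ** 2).sum(axis=2)
    return y_fit[np.argmin(d2, axis=1)]

train_acc = np.mean(predict_1nn(X_train, y_train, X_train) == y_train)
test_acc = np.mean(predict_1nn(X_train, y_train, X_test) == y_test)
print(f"training accuracy = {train_acc:.0%}, test accuracy = {test_acc:.0%}")
```

Every training point is its own nearest neighbour (distance zero), which is why the training accuracy is perfect even though there is nothing to learn.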

---

## 9. How to Choose the Right Model

### Correct Approach

1. Split data:
   - Training set
   - Validation set
   - Test set

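2. Train models with different complexities on the training set

3. Compare their **validation error**

4. Select the model with the minimum validation error, then evaluate it once on the test set to estimate its generalization error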
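These steps can be sketched with polynomial degree as the complexity knob, assuming NumPy, noisy samples of the quadratic from Section 3, and a 60/20/20 split (all illustrative choices):

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

# 1. Generate and split data: y = 3x^2 + 2x + 1 + noise
x = rng.uniform(-1, 1, size=200)
y = 3 * x**2 + 2 * x + 1 + rng.normal(0, 0.5, size=200)
idx = rng.permutation(len(x))
train_idx, val_idx, test_idx = idx[:120], idx[120:160], idx[160:]

def mse(model, i):
    return np.mean((y[i] - model(x[i])) ** 2)

# 2. Train one model per complexity level, on the training set only
models = {d: Polynomial.fit(x[train_idx], y[train_idx], deg=d) for d in range(1, 11)}

# 3. Compare validation error
val_errors = {d: mse(m, val_idx) for d, m in models.items()}

# 4. Select the degree with minimum validation error, then report test error once
best_degree = min(val_errors, key=val_errors.get)
print(f"best degree = {best_degree}, validation MSE = {val_errors[best_degree]:.3f}, "
      f"test MSE = {mse(models[best_degree], test_idx):.3f}")
```

The test set plays no part in choosing the degree, so its error remains an unbiased check on the selected model.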

---

## 10. Cross-Validation (Brief)

Instead of one split:

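- Split data into K folds
- Train on K−1 folds
- Validate on the remaining fold
- Repeat K times, so that each fold serves as the validation fold exactly once

Final error (average of the K validation errors):
$$
\text{CV error} = \frac{1}{K}\sum_{k=1}^{K} \text{Err}_k
$$

where $\text{Err}_k$ is the validation error on fold $k$.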
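A minimal K-fold sketch in NumPy, assuming the same polynomial-degree setup and K = 5 (both illustrative). The helper `kfold_cv_mse` is hypothetical, written here rather than taken from a library:

```python
import numpy as np
from numpy.polynomial import Polynomial

def kfold_cv_mse(x, y, degree, k=5, seed=0):
    """Average validation MSE of a degree-`degree` polynomial over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)  # k disjoint index sets
    errors = []
    for i in range(k):
        val_idx = folds[i]
        # Train on the other k-1 folds
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = Polynomial.fit(x[train_idx], y[train_idx], deg=degree)
        errors.append(np.mean((y[val_idx] - model(x[val_idx])) ** 2))
    return np.mean(errors)  # final error = average over the k validation folds

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=100)
y = 3 * x**2 + 2 * x + 1 + rng.normal(0, 0.5, size=100)

cv_errors = {d: kfold_cv_mse(x, y, d) for d in range(1, 9)}
best_degree = min(cv_errors, key=cv_errors.get)
print(f"CV-selected degree: {best_degree}")
```

Every point is used for validation exactly once, so the estimate relies less on one lucky or unlucky split than a single holdout set does.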

---

## 11. Why Simple Models Often Win

Reasons:
- Lower variance
- More stable predictions
- Easier to interpret
- Often generalize better when data are limited

Rule of thumb:

$$
\text{Choose the simplest model that works}
$$

---

## 12. Practical Insight from the Lecture

- Complex models look impressive
- Simple models work reliably
- Overfitting is more dangerous than underfitting in exams and production: underfitting shows up as high training error, while overfitting hides behind low training error

---

## 13. Exam-Ready One-Line Definitions

**Underfitting**  
> Model is too simple to learn the data pattern.

**Overfitting**  
> Model fits noise instead of the true relationship.

**Bias–Variance Tradeoff**  
> Increasing model complexity reduces bias but increases variance.

---

## 14. Final Takeaway

The goal of machine learning is **not perfection on training data**.

The real goal is:

$$
\text{Good performance on unseen data}
$$

A well-balanced model generally beats a flashy, overfit one.

---
