# 📈 Overfitting and Underfitting

---

## Visual Notes

<img src='img/overfitting-underfitting-1.png'>
<br>
<img src='img/overfitting-underfitting-2.png'>

---


## What are Overfitting and Underfitting?

- **Overfitting** occurs when a model learns the training data too well, including its noise and outliers. It performs very well on the training set but poorly on new, unseen data (test set). This means the model has **low bias** but **high variance**.
- **Underfitting** happens when a model is too simple to capture the underlying pattern of the data. It performs poorly on both the training and test sets. This means the model has **high bias** and **low variance**.

---

## Bias and Variance

- **Bias** is the error due to overly simplistic assumptions in the learning algorithm. High bias can cause the model to miss relevant relations (underfitting).
- **Variance** is the error due to the model's sensitivity to small fluctuations in the training data, which typically grows with model complexity. High variance can cause the model to fit the random noise in the training data (overfitting).
- The goal is to find a balance: **Low Bias, Low Variance** (a generalized model).
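
For squared-error loss, the trade-off can be written out explicitly. The decomposition below is the standard textbook form rather than something stated in these notes. It assumes the data follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and $\hat{f}$ denotes a model fitted on a randomly drawn training set (all symbols are introduced here for illustration):

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

The expectation is taken over training sets and noise. Only the first two terms depend on the model, which is why reducing one at the expense of the other does not necessarily lower the total error.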

---

## Train, Test, and Validation Sets

- The dataset is typically split into:
  - **Training set:** Used to train the model.
  - **Validation set:** Used to tune hyperparameters and validate the model during training.
  - **Test set:** Used to evaluate the final model's performance on unseen data.
- A common split is **70% training, 30% test**.
- When hyperparameters need tuning, the training set is further split into training and validation sets, so the test set stays untouched until the final evaluation.
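
The split described above can be sketched in plain Python. This is a minimal illustration, not a prescribed method: the function name `split_dataset`, its parameters, and the 60/20/20 proportions in the second call are assumptions made for the example.

```python
import random


def split_dataset(rows, train_frac=0.7, val_frac=0.0, seed=0):
    """Shuffle rows and split them into (train, validation, test) lists.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1:
        raise ValueError("fractions must lie in [0, 1)")
    if train_frac + val_frac >= 1:
        raise ValueError("train_frac + val_frac must leave room for a test set")

    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # seeded shuffle for reproducibility

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # whatever remains is the test set
    return train, val, test


rows = list(range(100))

# 70% training, 30% test, no validation set
train, val, test = split_dataset(rows)

# Assumed three-way split: 60% training, 20% validation, 20% test
train3, val3, test3 = split_dataset(rows, train_frac=0.6, val_frac=0.2)
```

Shuffling before slicing matters when the rows are stored in a meaningful order (for example, sorted by label); for time-ordered data a chronological split is usually more appropriate than a random one.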

---

## How to Identify Overfitting and Underfitting

- **Generalized Model:**
  - High accuracy on both train and test sets (e.g., 90% train, 88% test)
  - Indicates low bias and low variance.
- **Overfitting:**
  - High accuracy on train set (e.g., 90%) but low accuracy on test set (e.g., 50%)
  - Indicates low bias, high variance.
- **Underfitting:**
  - Low accuracy on both train and test sets
  - Indicates high bias, low variance.
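
The checklist above can be turned into a rough heuristic. In this sketch the function name `diagnose_fit` and the thresholds `min_acc` and `max_gap` are assumptions chosen for illustration; sensible values depend on the task and on what accuracy is achievable for the dataset.

```python
def diagnose_fit(train_acc, test_acc, min_acc=0.8, max_gap=0.1):
    """Classify a model from its train/test accuracy (both in [0, 1]).

    Hypothetical helper: the thresholds are illustrative, not standard values.
    """
    if train_acc < min_acc:
        # The model cannot even fit the data it was trained on.
        return "underfitting"
    if train_acc - test_acc > max_gap:
        # Good on training data, much worse on unseen data.
        return "overfitting"
    return "generalized"


print(diagnose_fit(0.90, 0.88))  # small gap, high accuracy
print(diagnose_fit(0.90, 0.50))  # large train/test gap
print(diagnose_fit(0.55, 0.52))  # low accuracy on both
```

The three calls use the example figures from the list above and print `generalized`, `overfitting`, and `underfitting` respectively.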

---

## Visual Summary

- A generalized model fits the data well without capturing noise (low bias, low variance).
- An overfitted model fits the training data too closely, failing to generalize (low bias, high variance).
- An underfitted model fails to capture the underlying trend (high bias, low variance).

---

**Key Takeaway:**
- Always aim for a model that generalizes well: not too simple (underfitting) and not too complex (overfitting).
- Use validation techniques and regularization to help achieve this balance.
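
As one concrete illustration of regularization, the sketch below uses an L2 (ridge) penalty on a one-feature linear model with no intercept. The notes do not say which regularizer to use, so the choice of ridge, the function name `ridge_slope`, and the toy data are all assumptions. For this model, minimizing `sum((y - w*x)**2) + lam * w**2` has the closed-form solution `w = sum(x*y) / (sum(x*x) + lam)`.

```python
def ridge_slope(xs, ys, lam):
    """Slope w minimizing sum((y - w*x)^2) + lam * w^2 (no intercept).

    Hypothetical helper for illustration; lam is the penalty strength.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    if sxx + lam == 0:
        raise ValueError("slope is undefined when all x are zero and lam is 0")
    return sxy / (sxx + lam)


# Toy data lying exactly on y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_plain = ridge_slope(xs, ys, lam=0.0)    # 28 / 14 = 2.0, ordinary least squares
w_ridge = ridge_slope(xs, ys, lam=14.0)   # 28 / 28 = 1.0, shrunk toward zero
```

A larger `lam` pulls the weight toward zero, trading a little extra bias for lower variance. In practice `lam` is a hyperparameter, so it would be chosen by comparing scores on the validation set rather than on the test set.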