# Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE)

## Formulas

* `MSE = (1/n) * Σ(y - y_hat)^2`
* `MAE = (1/n) * Σ|y - y_hat|`
* `RMSE = sqrt(MSE)`
* `RMSE = sqrt((1/n) * Σ(y - y_hat)^2)`
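
As a quick check on the formulas above, here is a minimal pure-Python sketch. The function names (`mse`, `mae`, `rmse`) and the sample values are illustrative assumptions, not part of any library.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared residuals."""
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n

def mae(y_true, y_pred):
    """Mean of the absolute residuals."""
    n = len(y_true)
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / n

def rmse(y_true, y_pred):
    """Square root of MSE, in the same unit as the target."""
    return math.sqrt(mse(y_true, y_pred))

# Made-up values; the residuals are 1, 0, -2, 3.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 4.0]

print(mse(y_true, y_pred))   # 3.5  -> (1 + 0 + 4 + 9) / 4
print(mae(y_true, y_pred))   # 1.5  -> (1 + 0 + 2 + 3) / 4
print(rmse(y_true, y_pred))  # about 1.871 -> sqrt(3.5)
```

Note how the single residual of 3 accounts for 9 of the 14 squared units in MSE but only 3 of the 6 absolute units in MAE.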

## Summary

* **Mean Squared Error (MSE)** measures the average squared difference between the predicted and actual values, emphasizing larger errors by squaring them.
* **Mean Absolute Error (MAE)** calculates the average of the absolute differences between prediction and actual observation, providing a linear score that weights all differences equally.
* **Root Mean Squared Error (RMSE)** is the square root of MSE, bringing the error metric back to the same unit as the target variable for easier interpretation.
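* **MSE** is differentiable everywhere, and its gradient shrinks as the error shrinks, so gradient descent typically converges faster than with MAE. The trade-off is sensitivity to outliers.
* **MAE** is robust to outliers but harder to optimize with gradient methods: it is not differentiable where the error is zero, and its gradient has the same magnitude everywhere else.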

## Exam Notes

### Comparing MSE and MAE

**Question**: When should you use **MSE** compared to **MAE**?

**Answer**:  
Use **MSE** when you need a loss function that is **differentiable** at all points and want faster convergence. As a function of the predictions (and of the parameters of a linear model), MSE is a **quadratic curve** (convex function) with a single global minimum, making optimization efficient. However, avoid MSE if your dataset has many **outliers**, as squaring the error penalizes them heavily and skews the model.

**Question**: When is **MAE** preferred over **MSE**?

**Answer**:  
**MAE** is preferred when your dataset contains **outliers** that you do not want to heavily influence the model. It is **robust to outliers** because it takes the absolute difference rather than squaring it. The trade-off is that MAE is **not differentiable where the error is zero** (optimizers fall back on sub-gradients) and typically takes longer to converge.

---

## Mean Squared Error (MSE)

**Mean Squared Error (MSE)** is a common loss function defined by the formula:

$$
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

### Advantages of MSE

1. **Differentiable**:  
   The squared term makes the loss a quadratic (a parabola in each prediction). A quadratic is smooth, so it is differentiable at all points and gradient descent can always compute a slope. It is also a **convex function**.

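2. **Single Global Minimum**:  
   Because MSE is convex in the predictions (and in the parameters of a linear model), it has one **global minimum** and no separate local minima, so the optimizer cannot get stuck. This guarantee does not carry over to non-linear models such as neural networks.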

3. **Faster Convergence**:  
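   Gradient descent tends to converge faster than with MAE because the gradient is proportional to the error, so the steps shrink smoothly as the fit approaches the minimum.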
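
To make the gradient-descent behaviour concrete, here is a small sketch that fits a single constant `c` to some made-up targets by minimizing MSE. The helper names, learning rate, and step count are assumptions chosen for illustration, not a recommended configuration.

```python
def mse_gradient(c, ys):
    # d/dc of (1/n) * sum((y - c)^2) is -(2/n) * sum(y - c)
    return -2.0 * sum(y - c for y in ys) / len(ys)

def fit_constant_mse(ys, lr=0.1, steps=200):
    """Plain gradient descent on MSE for a constant predictor c."""
    c = 0.0
    for _ in range(steps):
        c -= lr * mse_gradient(c, ys)
    return c

ys = [2.0, 4.0, 6.0, 8.0]
c_hat = fit_constant_mse(ys)
print(c_hat)  # very close to 5.0, the mean of ys
```

With a fixed learning rate, each step here closes a constant fraction of the remaining gap to the mean, because the gradient is proportional to the current error.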

### Disadvantages of MSE

1. **Not Robust to Outliers**:  
   Squaring magnifies large errors, so a few outliers can dominate the loss and significantly distort the best-fit line.

2. **Unit Mismatch**:  
   The unit of MSE is the square of the target variable’s unit, making interpretation less intuitive.
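
One way to see the outlier sensitivity, sketched for the simplest case of a constant predictor: the constant that minimizes MSE is the mean of the targets, while the constant that minimizes MAE is the median. The data below is made up for illustration.

```python
import statistics

clean = [10.0, 11.0, 12.0, 13.0, 14.0]
with_outlier = clean + [100.0]  # one extreme value added

# How far each "best constant" moves once the outlier is included.
mean_shift = statistics.mean(with_outlier) - statistics.mean(clean)
median_shift = statistics.median(with_outlier) - statistics.median(clean)

print(mean_shift)    # about 14.67: the MSE-optimal constant is dragged upward
print(median_shift)  # 0.5: the MAE-optimal constant barely moves
```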

## Mean Absolute Error (MAE)

**Mean Absolute Error (MAE)** calculates the average absolute difference between predicted and actual values:

$$
MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
$$

### Advantages of MAE

1. **Robust to Outliers**:  
   Each error contributes in proportion to its size rather than its square, which keeps a few large errors from dominating the loss.

2. **Same Unit**:  
   Error is expressed in the same unit as the target variable, making it easy to interpret.

### Disadvantages of MAE

1. **Slower Convergence**:  
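   Gradient-based optimization is generally slower than with MSE. The gradient has the same magnitude however small the error is, so the step size has to be reduced over time for the fit to settle at the minimum.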

2. **Not Differentiable at Zero**:  
   The absolute value function has a sharp point (a kink) where the error is zero, so the derivative is undefined there and optimization relies on **sub-gradient methods**.
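
A rough sketch of what a sub-gradient method can look like for MAE, again fitting a single constant `c`. The helper names, the `1/sqrt(t)` step-size schedule, and the data are illustrative assumptions. Using `sign(0) = 0` picks one valid sub-gradient at the kink.

```python
def sign(x):
    # Returns -1, 0, or 1. At x == 0 any value in [-1, 1] is a valid
    # sub-gradient of |x|; this sketch picks 0.
    return (x > 0) - (x < 0)

def mae_subgradient(c, ys):
    # A sub-gradient of (1/n) * sum(|y - c|) with respect to c.
    return -sum(sign(y - c) for y in ys) / len(ys)

def fit_constant_mae(ys, lr=1.0, steps=2000):
    """Sub-gradient descent with a step size that decays as 1/sqrt(t)."""
    c = 0.0
    for t in range(1, steps + 1):
        c -= (lr / t ** 0.5) * mae_subgradient(c, ys)
    return c

ys = [1.0, 2.0, 3.0, 4.0, 100.0]
c_hat = fit_constant_mae(ys)
print(c_hat)  # close to 3.0, the median, despite the outlier at 100
```

Because the sub-gradient does not shrink near the minimum, the iterate hops back and forth across the median, and only the decaying step size lets it settle. That is the slower convergence noted above.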

## Root Mean Squared Error (RMSE)

**Root Mean Squared Error (RMSE)** is the square root of MSE:

$$
RMSE = \sqrt{MSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
$$

### Key Characteristics

* **Same Unit**:  
  RMSE restores the original unit of the target variable, improving interpretability.

* **Differentiable**:  
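  RMSE is differentiable wherever MSE is greater than zero. Because the square root is monotonic, minimizing RMSE gives the same solution as minimizing MSE.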

* **Not Robust to Outliers**:  
  Since it is derived from MSE, it remains sensitive to outliers.
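
As a short worked derivation (using the same symbols as the formulas above), the chain rule relates the gradient of RMSE to the gradient of MSE:

$$
\frac{\partial\, RMSE}{\partial \hat{y}_i} = \frac{1}{2\sqrt{MSE}} \cdot \frac{\partial\, MSE}{\partial \hat{y}_i} = -\frac{y_i - \hat{y}_i}{n \cdot RMSE}
$$

The RMSE gradient points in the same direction as the MSE gradient, scaled by $\frac{1}{2\,RMSE}$. The expression is undefined only when $RMSE = 0$, that is, when every prediction is exact.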

## Summary Table

| Metric | Outlier Robustness | Differentiable? | Unit Match? | Convergence Speed |
| :--- | :--- | :--- | :--- | :--- |
| **MSE** | No (Sensitive) | Yes | No (Squared) | Fast |
| **MAE** | Yes (Robust) | No (at 0) | Yes | Slow |
| **RMSE** | No (Sensitive) | Yes (where MSE > 0) | Yes | Fast |
