# Introduction

Huber Loss combines the benefits of Mean Squared Error (MSE) and Mean Absolute Error (MAE) in regression tasks. It is more robust to outliers than MSE because it pairs the quadratic penalty of MSE for small errors with the linear penalty of MAE for large ones.

Huber Loss behaves like MSE when the error is small and, beyond a threshold, grows linearly like MAE. As a result, large errors contribute far less to the loss than they would under MSE, while small errors keep the smooth behavior of MSE near zero.

**Mathematical Formulation.** For a given threshold $\delta > 0$, the Huber Loss is defined as:

$$\mathcal{L}(y, \hat{y}) = \begin{cases}
\frac{1}{2}(y - \hat{y})^2 & \text{for } |y - \hat{y}| \leq \delta \\
\delta |y - \hat{y}| - \frac{1}{2}\delta^2 & \text{otherwise}
\end{cases}$$

where:
- $y$ is the true value
- $\hat{y}$ is the predicted value
- $\delta$ is the threshold at which the loss switches from the quadratic branch to the linear branch
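The piecewise definition translates directly into code. This is a minimal sketch in plain Python; the names `huber_loss` and `mean_huber_loss` are illustrative, and averaging over samples is an assumed reduction (summing is also common).

```python
def huber_loss(y_true: float, y_pred: float, delta: float = 1.0) -> float:
    """Huber loss for a single prediction, following the piecewise definition."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    error = abs(y_true - y_pred)
    if error <= delta:
        # Quadratic (MSE-like) branch for small errors
        return 0.5 * error ** 2
    # Linear (MAE-like) branch for large errors, offset so both branches meet at delta
    return delta * error - 0.5 * delta ** 2


def mean_huber_loss(y_true, y_pred, delta: float = 1.0) -> float:
    """Average the per-sample Huber loss over a dataset (assumed 'mean' reduction)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(huber_loss(t, p, delta) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For instance, `huber_loss(1.0, 0.5)` returns `0.125` and `huber_loss(3.0, 1.0)` returns `1.5`, matching the two worked examples in this article.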

# How it works

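For errors with $|y - \hat{y}| \leq \delta$, the loss is quadratic, like MSE. For larger errors it grows only linearly, like MAE, so large outliers receive a linear rather than a quadratic penalty and have less influence on the fit. The offset $-\frac{1}{2}\delta^2$ makes the two branches meet at $|y - \hat{y}| = \delta$ (both equal $\frac{1}{2}\delta^2$ there), so the loss is continuous when it switches regimes.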
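The reduced influence of outliers is easiest to see in the gradient: in the linear branch its magnitude is capped at $\delta$. A minimal sketch, assuming we differentiate with respect to the residual $r = y - \hat{y}$ (the name `huber_grad` is illustrative):

```python
import math


def huber_grad(residual: float, delta: float = 1.0) -> float:
    """Derivative of the Huber loss with respect to the residual r = y - y_hat."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    if abs(residual) <= delta:
        # Quadratic branch: d/dr (r**2 / 2) = r
        return residual
    # Linear branch: d/dr (delta*|r| - delta**2 / 2) = delta * sign(r),
    # so the gradient magnitude never exceeds delta
    return math.copysign(delta, residual)


for r in (0.5, 1.0, 2.0, 10.0, -10.0):
    print(f"residual={r:+.1f}  gradient={huber_grad(r):+.1f}")
```

A residual of 10 produces the same gradient as a residual of 2, whereas the gradient of $\frac{1}{2}r^2$ would keep growing with the residual. The gradient with respect to $\hat{y}$ is the negative of this value.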

**Example**

If the error $|y - \hat{y}|$ is 0.5 and $\delta$ is 1, the loss is calculated as:

$$\mathcal{L}(y, \hat{y}) = \frac{1}{2}(0.5)^2 = 0.125$$

If the error $|y - \hat{y}|$ is 2 and $\delta$ is 1, the loss is calculated as:

$$\mathcal{L}(y, \hat{y}) = 1 \times 2 - \frac{1}{2} \times 1^2 = 1.5$$
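For comparison, applying the quadratic branch alone (half the squared error) to large errors shows how much the linear branch dampens them, still with $\delta = 1$:

$$\begin{aligned}
|y - \hat{y}| = 2:&\quad \tfrac{1}{2}(2)^2 = 2 \quad\text{vs.}\quad \mathcal{L}(y, \hat{y}) = 1 \times 2 - \tfrac{1}{2} \times 1^2 = 1.5 \\
|y - \hat{y}| = 10:&\quad \tfrac{1}{2}(10)^2 = 50 \quad\text{vs.}\quad \mathcal{L}(y, \hat{y}) = 1 \times 10 - \tfrac{1}{2} \times 1^2 = 9.5
\end{aligned}$$

The gap widens as the error grows, which is where the robustness to outliers comes from.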

### Pros and Cons

Pros:
- Combines the benefits of MSE (smooth gradients) and MAE (robustness to outliers).
- Effective in regression tasks where both small and large errors occur.
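- More robust to outliers than MSE: the gradient from any single error has magnitude at most $\delta$, so one large outlier cannot dominate the update.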

Cons:
- Requires tuning of the threshold $\delta$.
- May not perform as well as MSE for datasets without outliers.
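To see why $\delta$ needs tuning, the sketch below evaluates the mean Huber Loss on a hypothetical set of residuals (four small errors and one outlier, values made up for illustration) at several $\delta$ values. When $\delta$ exceeds every residual, the loss equals half the MSE; as $\delta$ shrinks, the loss divided by $\delta$ approaches the MAE.

```python
import math


def huber(residual: float, delta: float) -> float:
    """Per-sample Huber loss as defined in this article."""
    a = abs(residual)
    return 0.5 * a * a if a <= delta else delta * a - 0.5 * delta * delta


# Hypothetical residuals y - y_hat: four small errors and one outlier
residuals = [0.1, -0.1, 0.2, -0.2, 10.0]
n = len(residuals)

half_mse = 0.5 * sum(r * r for r in residuals) / n  # MSE scaled by 1/2 to match the quadratic branch
mae = sum(abs(r) for r in residuals) / n


def mean_huber(delta: float) -> float:
    return sum(huber(r, delta) for r in residuals) / n


for delta in (1e-6, 0.5, 1.0, 5.0, 100.0):
    print(f"delta={delta:g}: mean Huber={mean_huber(delta):.4f}")
print(f"half MSE={half_mse:.4f}, MAE={mae:.4f}")
```

With $\delta = 100$ every residual falls in the quadratic branch, so the result matches half the MSE; with $\delta = 10^{-6}$, the mean loss divided by $\delta$ is within $\delta/2$ of the MAE. Values in between trade off the two behaviors, which is why $\delta$ is usually chosen relative to the scale of typical errors.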

# Use cases

- Well suited to regression tasks where outliers are present.
- Useful when you want the smoothness of MSE but need robustness to large errors.
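As an illustration of the first use case, the sketch below fits a straight line by plain gradient descent on hypothetical data from $y = 2x + 1$ with one corrupted point, once with the quadratic penalty $\frac{1}{2}r^2$ and once with Huber Loss ($\delta = 1$). The data, learning rate, step count, and helper names (`fit_line`, `huber_psi`, `squared_psi`) are assumptions chosen for the example, not a recommended training setup.

```python
def huber_psi(r: float, delta: float = 1.0) -> float:
    """Derivative of the Huber loss w.r.t. the residual: r clipped to [-delta, delta]."""
    return max(-delta, min(delta, r))


def squared_psi(r: float) -> float:
    """Derivative of r**2 / 2, the MSE-style quadratic penalty."""
    return r


def fit_line(xs, ys, psi, lr=0.02, steps=20000):
    """Fit y ~ w*x + b by gradient descent on the mean per-sample loss.

    psi is the derivative of the per-sample loss w.r.t. the residual y - (w*x + b).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g = [psi(y - (w * x + b)) for x, y in zip(xs, ys)]
        # Chain rule: d/dw = -psi(r) * x and d/db = -psi(r), averaged over samples
        grad_w = -sum(gi * x for gi, x in zip(g, xs)) / n
        grad_b = -sum(g) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data on the line y = 2x + 1, with the last point corrupted by +50
xs = [float(x) for x in range(10)]
ys = [2 * x + 1 for x in xs]
ys[-1] += 50.0

w_mse, b_mse = fit_line(xs, ys, squared_psi)
w_hub, b_hub = fit_line(xs, ys, huber_psi)
print(f"quadratic loss: w={w_mse:.3f}, b={b_mse:.3f}")
print(f"Huber loss:     w={w_hub:.3f}, b={b_hub:.3f}")
```

Working out the optimality conditions by hand for this data, the quadratic fit's slope should settle near $52/11 \approx 4.73$, dragged by the single outlier, while the Huber fit's slope should settle near $25/12 \approx 2.08$, close to the true slope of 2.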