### Regression Metrics

#### 1) MAE - Mean Absolute Error

# 📌 Mean Absolute Error (MAE) — Regression Metric

## 🧠 Definition

**Mean Absolute Error (MAE)** is a regression metric that measures the average magnitude of the errors between predicted and actual values, without considering their direction (i.e., positive or negative).

It is defined as:

$$
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
$$

Where:
- $n$ is the total number of data points
- $y_i$ is the true value
- $\hat{y}_i$ is the predicted value
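
The formula above can be written out directly with NumPy. This is a minimal sketch rather than a reference implementation; the function name `mae_from_scratch` and the sample numbers are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def mae_from_scratch(y_true, y_pred):
    """Mean of the absolute differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical data: one over-prediction and one under-prediction by 2.
# The absolute value removes the sign, so both count equally.
mae_value = mae_from_scratch([10.0, 10.0], [12.0, 8.0])
print(f"MAE: {mae_value:.1f}")
```

Because the sign is discarded, the two errors do not cancel out, which is what "without considering their direction" means in practice.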

---

## 🔍 Intuition

- MAE gives equal weight to all errors.
- It is a **linear score**, meaning each individual difference contributes proportionally to the total error.
- Compared with Mean Squared Error (MSE), **MAE is more robust to outliers**, because errors are not squared before averaging.
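
The robustness claim can be checked with a small experiment. The sketch below uses made-up numbers (an assumption for illustration): five predictions that are each off by 1, then the same predictions with a single error of 20.

```python
import numpy as np

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Hypothetical targets and predictions; every error has magnitude 1.
y_true = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
y_pred_clean = np.array([11.0, 11.0, 12.0, 12.0, 13.0])

# Same predictions, but the last one is now off by 20 (an outlier).
y_pred_outlier = y_pred_clean.copy()
y_pred_outlier[-1] = 32.0

mae_clean, mae_out = mae(y_true, y_pred_clean), mae(y_true, y_pred_outlier)
mse_clean, mse_out = mse(y_true, y_pred_clean), mse(y_true, y_pred_outlier)

print(f"MAE: {mae_clean:.1f} -> {mae_out:.1f}")  # MAE: 1.0 -> 4.8
print(f"MSE: {mse_clean:.1f} -> {mse_out:.1f}")  # MSE: 1.0 -> 80.8
```

With these numbers a single outlier multiplies MAE by about 5 and MSE by about 81, which is the sense in which MAE is the more robust of the two.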

---

## 🧮 Example

Suppose we have the following true and predicted values:

| $y$ (True) | $\hat{y}$ (Predicted) |
|-----------|------------------------|
| 3         | 2.5                    |
| -0.5      | 0.0                    |
| 2         | 2                      |
| 7         | 8                      |

Then the MAE is:

$$
\text{MAE} = \frac{1}{4} \left( |3 - 2.5| + |-0.5 - 0.0| + |2 - 2| + |7 - 8| \right) = \frac{1}{4}(0.5 + 0.5 + 0 + 1) = 0.5
$$

---

## ✅ Pros

- Easy to understand and interpret.
- Robust to outliers compared to MSE.
- Same unit as target variable.

---

## ❌ Cons

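- Not differentiable where the error is exactly zero, and the gradient has the same magnitude for every non-zero error, which makes it less convenient for gradient-based optimization in some models.
- Does not penalize large errors more than proportionally (unlike MSE), so a few large mistakes can go unnoticed in the average.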
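
To make the gradient point concrete, the sketch below compares the derivative of MAE and of MSE with respect to each prediction. The helper names are hypothetical, and returning 0 at a zero error is just one valid subgradient choice, not the only one.

```python
import numpy as np

def mae_subgradient(y_true, y_pred):
    # d/d(y_pred) of mean(|y_true - y_pred|).
    # np.sign gives 0 at a zero error, one valid subgradient at the kink.
    return np.sign(y_pred - y_true) / y_true.size

def mse_gradient(y_true, y_pred):
    # d/d(y_pred) of mean((y_true - y_pred)^2).
    return 2.0 * (y_pred - y_true) / y_true.size

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

g_mae = mae_subgradient(y_true, y_pred)
g_mse = mse_gradient(y_true, y_pred)

print(g_mae)  # same magnitude (0.25) for every non-zero error
print(g_mse)  # magnitude grows with the size of the error
```

The MAE gradient only carries the sign of each error, so an optimizer receives the same push for an error of 0.5 as for an error of 1, while the MSE gradient scales with the error.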

---

## 🛠️ Code (Python Example)

```python
from sklearn.metrics import mean_absolute_error

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

mae = mean_absolute_error(y_true, y_pred)
print(f"MAE: {mae}")
# Output:
# MAE: 0.5
```


#### 2) MSE - Mean Squared Error

# 📌 Mean Squared Error (MSE) — Regression Metric

## 🧠 Definition

**Mean Squared Error (MSE)** is a commonly used regression metric that measures the average of the squares of the errors between predicted and actual values.

It is defined as:

$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$

Where:
- $n$ is the number of data points,
- $y_i$ is the actual (true) value,
- $\hat{y}_i$ is the predicted value.

---

## 🔍 Intuition

- Squaring the errors ensures they are positive and penalizes larger errors more heavily than smaller ones.
- MSE is **sensitive to outliers**, since large errors are squared and thus magnified.
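
One way to see the magnification is to look at how much each sample contributes to the total. The sketch below uses the four value pairs from this section's example; the variable names are illustrative assumptions.

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

abs_err = np.abs(y_true - y_pred)   # [0.5, 0.5, 0.0, 1.0]
sq_err = abs_err ** 2               # [0.25, 0.25, 0.0, 1.0]

# Fraction of the total error contributed by each sample.
share_in_mae = abs_err / abs_err.sum()
share_in_mse = sq_err / sq_err.sum()

print(np.round(share_in_mae, 3))
print(np.round(share_in_mse, 3))
```

The largest error (7 vs. 8) accounts for half of the absolute error but two thirds of the squared error, so it weighs more heavily on MSE than on MAE.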

---

## 🧮 Example

Given:

| $y$ (True) | $\hat{y}$ (Predicted) |
|-----------|------------------------|
| 3         | 2.5                    |
| -0.5      | 0.0                    |
| 2         | 2                      |
| 7         | 8                      |

Calculate:

$$
\begin{align*}
\text{MSE} &= \frac{1}{4} \left( (3 - 2.5)^2 + (-0.5 - 0)^2 + (2 - 2)^2 + (7 - 8)^2 \right) \\
&= \frac{1}{4} \left( 0.25 + 0.25 + 0 + 1 \right) = \frac{1.5}{4} = 0.375
\end{align*}
$$

---

## ✅ Pros

- Penalizes large errors more, which is useful when large mistakes are more costly.
- Differentiable everywhere, with a gradient proportional to the error, which suits many optimization algorithms (e.g., gradient descent).

---

## ❌ Cons

- Not robust to outliers — large errors dominate the metric.
- The result is in **squared units** of the target variable, which can be less interpretable.

---

## 🛠️ Code (Python Example)

```python
from sklearn.metrics import mean_squared_error

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

mse = mean_squared_error(y_true, y_pred)
print(f"MSE: {mse}")
# Output:
# MSE: 0.375
```


#### 3) RMSE - Root Mean Squared Error

# 📌 Root Mean Squared Error (RMSE) — Regression Metric

## 🧠 Definition

**Root Mean Squared Error (RMSE)** is the square root of the Mean Squared Error (MSE). It summarizes the typical magnitude of the error, expressed in the same units as the target variable.

The formula is:

$$
\text{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 }
$$

Where:
- $n$ is the number of observations,
- $y_i$ is the actual value,
- $\hat{y}_i$ is the predicted value.

---

## 🔍 Intuition

- RMSE is **sensitive to large errors**: because errors are squared before averaging, large errors are penalized more than small ones.
- Since the square root is applied, RMSE has the **same units as the target**, making it easier to interpret than MSE.
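
The point about units can be illustrated by expressing the same data in two different units. The distances below are hypothetical, chosen only for this sketch.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    return float(np.sqrt(mse(y_true, y_pred)))

# Hypothetical distances in kilometres.
y_true_km = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_km = np.array([1.5, 1.5, 3.0, 5.0])

# The same data converted to metres.
y_true_m = y_true_km * 1000.0
y_pred_m = y_pred_km * 1000.0

mse_ratio = mse(y_true_m, y_pred_m) / mse(y_true_km, y_pred_km)
rmse_ratio = rmse(y_true_m, y_pred_m) / rmse(y_true_km, y_pred_km)

print(f"MSE scales by {mse_ratio:.0f}")    # squared units: factor 1000^2
print(f"RMSE scales by {rmse_ratio:.0f}")  # target units: factor 1000
```

RMSE scales by the same factor as the target (1000), while MSE scales by its square, which is why RMSE can be read directly as "kilometres" or "metres" of error.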

---

## 🧮 Example

Let’s use the same values as in the MSE example:

| $y$ (True) | $\hat{y}$ (Predicted) |
|-----------|------------------------|
| 3         | 2.5                    |
| -0.5      | 0.0                    |
| 2         | 2                      |
| 7         | 8                      |

We already calculated the MSE:

$$
\text{MSE} = 0.375
$$

So RMSE is:

$$
\text{RMSE} = \sqrt{0.375} \approx 0.612
$$

---

## ✅ Pros

- Same unit as the target variable — easy to interpret.
- Useful when large errors are especially undesirable (e.g., forecasting problems).

---

## ❌ Cons

- Like MSE, it is **sensitive to outliers**.
- Does not give information about direction of error (i.e., overestimation vs. underestimation).

---

## 🛠️ Code (Python Example)

```python
from sklearn.metrics import mean_squared_error
import numpy as np

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

# RMSE is the square root of MSE
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(f"RMSE: {rmse}")
# Output:
# RMSE: 0.6123724356957945
```


#### 4) R2 Score

# 📌 $R^2$ Score — Coefficient of Determination

## 🧠 Definition

The **$R^2$ Score** measures the proportion of variance in the dependent variable that is predictable from the independent variables.

It is defined as:

$$
R^2 = 1 - \frac{ \sum_{i=1}^n (y_i - \hat{y}_i)^2 }{ \sum_{i=1}^n (y_i - \bar{y})^2 }
$$

Where:
- $y_i$ is the actual value,
- $\hat{y}_i$ is the predicted value,
- $\bar{y}$ is the mean of the actual values,
- $n$ is the number of samples.

---

## 🔍 Intuition

- The numerator is the **Residual Sum of Squares (RSS)**.
- The denominator is the **Total Sum of Squares (TSS)**.
- $R^2$ tells us **how well the model explains the variance** in the data.

---

## 🔢 Interpretation

- $R^2 = 1$: Perfect prediction.
- $R^2 = 0$: Model is no better than predicting the mean $\bar{y}$.
- $R^2 < 0$: Model is worse than simply using the mean.

> A higher $R^2$ indicates a better fit, but **does not guarantee** a good model (especially if assumptions are violated).
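
The three cases above can be reproduced with `r2_score`. The prediction vectors below are hypothetical and chosen only to hit each case: a perfect predictor, a constant predictor equal to the mean, and a constant predictor far from the data.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])

y_perfect = y_true.copy()                    # predictions equal the targets
y_mean = np.full_like(y_true, y_true.mean()) # always predict the mean
y_bad = np.full_like(y_true, 10.0)           # constant far from the data

r2_perfect = r2_score(y_true, y_perfect)
r2_mean = r2_score(y_true, y_mean)
r2_bad = r2_score(y_true, y_bad)

print(f"perfect: {r2_perfect:.3f}")
print(f"mean:    {r2_mean:.3f}")
print(f"bad:     {r2_bad:.3f}")  # negative: worse than predicting the mean
```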

---

## 🧮 Example

Let:

- $y = [3, -0.5, 2, 7]$
- $\hat{y} = [2.5, 0.0, 2, 8]$

Step 1: Compute mean of true values:

$$
\bar{y} = \frac{3 + (-0.5) + 2 + 7}{4} = 2.875
$$

Step 2: Compute TSS and RSS:

$$
\text{TSS} = \sum (y_i - \bar{y})^2 = (3 - 2.875)^2 + (-0.5 - 2.875)^2 + (2 - 2.875)^2 + (7 - 2.875)^2 = 29.1875
$$

$$
\text{RSS} = \sum (y_i - \hat{y}_i)^2 = 0.25 + 0.25 + 0 + 1 = 1.5
$$

Step 3: Compute $R^2$:

$$
R^2 = 1 - \frac{1.5}{29.1875} \approx 0.9486
$$
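
The three steps can be checked numerically. This is a minimal NumPy sketch of the same arithmetic, with variable names chosen here for illustration.

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Step 1: mean of the true values
y_bar = y_true.mean()

# Step 2: total and residual sums of squares
tss = np.sum((y_true - y_bar) ** 2)
rss = np.sum((y_true - y_pred) ** 2)

# Step 3: coefficient of determination
r2 = 1.0 - rss / tss

print(f"mean = {y_bar}, TSS = {tss}, RSS = {rss}")
print(f"R^2 = {r2:.4f}")  # R^2 = 0.9486
```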

---

## ✅ Pros

- Indicates **how well the model fits** the data.
- Easy to interpret — closer to 1 means better performance.

---

## ❌ Cons

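- **Less reliable for non-linear models**: the "fraction of variance explained" reading holds cleanly for least-squares linear models with an intercept, evaluated on the training data.
- **Can be misleading** when used alone; pair it with an error metric such as MAE or RMSE.
- **Does not indicate** whether predictions are systematically biased, or how large the errors are in the units of the target.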

---

## 🛠️ Code (Python Example)

```python
from sklearn.metrics import r2_score

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

r2 = r2_score(y_true, y_pred)
print(f"R^2 Score: {r2}")
# Output:
# R^2 Score: 0.9486081370449679
```


#### 5) Adjusted R2 Score

# 📌 Adjusted $R^2$ Score — Corrected Coefficient of Determination

## 🧠 What is $R^2$?

The **$R^2$ score** (Coefficient of Determination) measures how well the regression model explains the variance in the target variable. It is defined as:

$$
R^2 = 1 - \frac{ \sum_{i=1}^n (y_i - \hat{y}_i)^2 }{ \sum_{i=1}^n (y_i - \bar{y})^2 }
$$

Where:
- $y_i$ = actual value,
- $\hat{y}_i$ = predicted value,
- $\bar{y}$ = mean of actual values,
- The numerator is the **Residual Sum of Squares (RSS)**,
- The denominator is the **Total Sum of Squares (TSS)**.

> An $R^2$ of 1 means perfect predictions. An $R^2$ of 0 means predictions are as good as the mean. Negative values imply worse than mean prediction.

---

## 🧠 What is Adjusted $R^2$?

For a least-squares linear model, the training $R^2$ never decreases when a feature is added, even if that feature is irrelevant. **Adjusted $R^2$ penalizes unnecessary features** to discourage overfitting: it corrects $R^2$ for the number of predictors used.

The formula is:

$$
R^2_{\text{adj}} = 1 - \left( \frac{(1 - R^2)(n - 1)}{n - p - 1} \right)
$$

Where:
- $R^2$ = regular coefficient of determination,
- $n$ = number of samples,
- $p$ = number of features (independent variables).

---

## 🔍 Intuition

- If adding a feature doesn't improve $R^2$ much, Adjusted $R^2$ **goes down**.
- Encourages **parsimonious models** (simpler with fewer variables).
- Helpful for comparing models with different numbers of features.
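
The sketch below illustrates this on synthetic data: an ordinary least-squares fit with one useful feature, then the same fit with an extra feature that is pure noise. Everything here (the data, the seed, the helper names `fit_r2` and `adjusted_r2`) is an assumption made for illustration. Whether adjusted $R^2$ actually drops depends on the random draw, so no specific output is claimed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x_useful = rng.normal(size=n)
x_noise = rng.normal(size=n)  # unrelated to the target
y = 2.0 * x_useful + rng.normal(scale=0.5, size=n)

def fit_r2(features, y):
    """Fit OLS with an intercept and return the training R^2."""
    X = np.column_stack([np.ones(len(y))] + list(features))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ coef) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - rss / tss

def adjusted_r2(r2, n, p):
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

r2_one = fit_r2([x_useful], y)
r2_two = fit_r2([x_useful, x_noise], y)

print(f"1 feature:  R^2 = {r2_one:.4f}, adjusted = {adjusted_r2(r2_one, n, 1):.4f}")
print(f"2 features: R^2 = {r2_two:.4f}, adjusted = {adjusted_r2(r2_two, n, 2):.4f}")
```

The plain $R^2$ of the two-feature model cannot be lower than that of the one-feature model, while the adjusted value is always at or below the plain one and only rises if the gain outweighs the penalty for the extra predictor.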

---

## 🔢 Example

Assume:
- $R^2 = 0.90$
- $n = 100$ samples
- $p = 5$ features

Then:

$$
R^2_{\text{adj}} = 1 - \left( \frac{(1 - 0.90)(100 - 1)}{100 - 5 - 1} \right)
= 1 - \left( \frac{0.10 \cdot 99}{94} \right)
= 1 - \left( \frac{9.9}{94} \right)
\approx 1 - 0.1053 = 0.8947
$$

So, adjusted $R^2 \approx 0.895$.
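
The same arithmetic in Python, as a small sketch. The helper name `adjusted_r2_from_r2` is hypothetical; it takes an already-computed $R^2$ rather than raw predictions.

```python
def adjusted_r2_from_r2(r2, n, p):
    """Adjusted R^2 from a known R^2, n samples and p features."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R^2 needs n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

adj = adjusted_r2_from_r2(0.90, n=100, p=5)
print(f"Adjusted R^2: {adj:.4f}")  # Adjusted R^2: 0.8947

# Holding R^2 fixed, the penalty grows with the number of features.
for p in (1, 5, 20, 50):
    print(p, round(adjusted_r2_from_r2(0.90, n=100, p=p), 4))
```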

---

## ✅ Pros

- **Discourages overfitting** by penalizing extra features, although it cannot prevent it on its own.
- Useful when comparing models with **different numbers of predictors**.

---

## ❌ Cons

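- Derived for **linear regression** with an intercept; for other model types the number of predictors $p$ is not a clear measure of complexity, so the correction is only a rough guide.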
- Still doesn't show how large or biased the errors are.

---

## 🛠️ Code (Python Example)

```python
from sklearn.metrics import r2_score

def adjusted_r2(y_true, y_pred, n, p):
    """Adjusted R^2 for n samples and p features (requires n > p + 1)."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R^2 needs n > p + 1")
    r2 = r2_score(y_true, y_pred)
    return 1 - ((1 - r2) * (n - 1)) / (n - p - 1)

# Example
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

n = len(y_true)  # Number of samples
p = 2  # Number of features used

adj_r2 = adjusted_r2(y_true, y_pred, n, p)
print(f"Adjusted R^2: {adj_r2}")
```
