# Model Evaluation Metrics

Choosing the right metric is as important as choosing the right model. The metric defines what "good" means for your problem, and therefore what model selection and tuning end up optimizing.

---

## 1. Regression Metrics (Continuous Targets)

### Error-based Metrics
These measure the distance between predicted values ($\hat{y}$) and actual values ($y$).

**1. MAE (Mean Absolute Error)**
The average of absolute differences.
$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
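* **Pros:** Less sensitive to outliers than MSE/RMSE, since each error contributes linearly; easy to interpret in the target's original units.
* **Cons:** Not differentiable where the error is zero, which complicates gradient-based optimization.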
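
A minimal pure-Python sketch of MAE, assuming plain lists of numbers; the name `mae` is illustrative rather than a library API (scikit-learn, for instance, provides `sklearn.metrics.mean_absolute_error`).

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average of |y_i - y_hat_i|, in the target's units."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

# Errors of 1, 2 and 3 units average to an MAE of 2 units.
print(mae([10.0, 20.0, 30.0], [11.0, 18.0, 33.0]))  # 2.0
```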

**2. MSE (Mean Squared Error)**
The average of squared differences.
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
* **Pros:** Differentiable everywhere; convenient as a training loss.
* **Cons:** Squaring penalizes large errors heavily (sensitive to outliers), and the result is in squared units (e.g., squared dollars).
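
The sketch below (hypothetical helper `mse`, plain Python lists) shows how squaring lets one large error dominate: three errors of 1 and one error of 10 give an MSE of 25.75, while the MAE of the same errors would be 3.25.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of (y_i - y_hat_i)**2, in squared target units."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

y_true = [0.0, 0.0, 0.0, 0.0]
y_pred = [1.0, 1.0, 1.0, 10.0]  # three small errors and one outlier

# The single error of 10 contributes 100 of the 103 squared-error units.
print(mse(y_true, y_pred))  # 25.75
```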

**3. RMSE (Root Mean Squared Error)**
The square root of MSE.
$$RMSE = \sqrt{MSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
* **Pros:** Same units as the target variable (e.g., dollars, meters).
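* **Cons:** Inherits MSE's sensitivity to outliers; RMSE is always at least as large as MAE, and the gap widens as errors become more uneven.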
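
A sketch of RMSE on made-up house prices (illustrative data, hypothetical helper `rmse`), showing that the result comes back in dollars. With errors of 1,000 and 7,000 dollars, RMSE is 5,000 while MAE would be 4,000.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error: sqrt(MSE), expressed in the target's units."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

prices = [200_000.0, 350_000.0]  # actual prices in dollars (illustrative)
preds = [201_000.0, 343_000.0]   # predictions off by 1,000 and 7,000 dollars

print(rmse(prices, preds))  # 5000.0 (dollars)
```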

**4. RMSLE (Root Mean Squared Logarithmic Error)**
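Applies $\log(1 + \cdot)$ to predictions and actuals before computing RMSE, so it measures relative rather than absolute error.
$$RMSLE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\log(\hat{y}_i + 1) - \log(y_i + 1))^2}$$
* **Pros:** Penalizes underestimates more than overestimates of the same size; suited to targets that span several orders of magnitude (e.g., exponential growth).
* **Cons:** Requires values greater than $-1$ (in practice, non-negative targets and predictions).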
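
The asymmetry claim can be checked with a small sketch (hypothetical helper `rmsle`, using `math.log1p` for $\log(1 + x)$): the same absolute miss of 50 on a true value of 100 costs more as an underestimate than as an overestimate.

```python
import math

def rmsle(y_true, y_pred):
    """RMSLE: RMSE computed on log(1 + value); values must exceed -1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    total = sum((math.log1p(p) - math.log1p(y)) ** 2 for y, p in zip(y_true, y_pred))
    return math.sqrt(total / len(y_true))

under = rmsle([100.0], [50.0])   # underestimate by 50, ≈ 0.683
over = rmsle([100.0], [150.0])   # overestimate by 50, ≈ 0.402
print(under > over)  # True
```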

---

### Percentage Error Metrics
Useful for explaining error to non-technical stakeholders (e.g., "We are off by 5%").

**1. MAPE (Mean Absolute Percentage Error)**
$$MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
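* **Cons:** Undefined when any $y_i = 0$ and unstable near zero. Asymmetric: for non-negative forecasts, an under-forecast costs at most 100%, while an over-forecast has no upper bound, so minimizing MAPE tends to bias forecasts low.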
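
A sketch of MAPE (hypothetical helper `mape`) that refuses zero actuals and illustrates the asymmetry: for a non-negative forecast, under-forecasting a true value of 100 can cost at most 100%, while over-forecasting has no ceiling.

```python
def mape(y_true, y_pred):
    """MAPE in percent; raises if any actual value is zero (the metric is undefined there)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    if any(y == 0 for y in y_true):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)

# Under-forecasting a positive target costs at most 100% ...
print(mape([100.0], [0.0]))    # 100.0
# ... while over-forecasting is unbounded.
print(mape([100.0], [300.0]))  # 200.0
```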

**2. sMAPE (Symmetric MAPE)**
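Normalizes each error by the average magnitude of the actual and predicted values, which bounds the result between 0% and 200%. Still undefined for a term where $y_i = \hat{y}_i = 0$.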
$$sMAPE = \frac{200\%}{n} \sum_{i=1}^{n} \frac{|\hat{y}_i - y_i|}{|y_i| + |\hat{y}_i|}$$
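
A matching sketch of sMAPE in the $200\%/n$ form above (hypothetical helper `smape`). How to score a term where both $y_i$ and $\hat{y}_i$ are 0 is a convention, not part of the formula; the code assumes zero error there.

```python
def smape(y_true, y_pred):
    """Symmetric MAPE in percent, using the 200%/n form (range 0% to 200%)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        denom = abs(y) + abs(p)
        if denom == 0:
            # Assumed convention: a 0/0 term (both values zero) counts as zero error.
            continue
        total += abs(p - y) / denom
    return 200.0 * total / len(y_true)

print(smape([100.0], [0.0]))    # 200.0, the maximum
print(smape([100.0], [300.0]))  # 100.0
```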

---

### Goodness-of-fit Metrics

**1. $R^2$ (Coefficient of Determination)**
Proportion of variance in the dependent variable explained by the model.
$$R^2 = 1 - \frac{SS_{residual}}{SS_{total}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}$$
* **Range:** $(-\infty, 1]$. An $R^2$ of 1 is perfect; 0 matches the baseline of always predicting the mean; negative values mean the model does worse than that baseline.
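
A direct translation of the $R^2$ formula (hypothetical helper `r_squared`), with three sets of predictions that land at 1, at 0, and below 0.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need at least two paired values")
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, y))                     # 1.0  (perfect)
print(r_squared(y, [2.5, 2.5, 2.5, 2.5]))  # 0.0  (always predicting the mean)
print(r_squared(y, [4.0, 3.0, 2.0, 1.0]))  # -3.0 (worse than the mean)
```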

**2. Adjusted $R^2$**
Penalizes adding useless features. (Standard $R^2$ on the training data never decreases when you add features to a least-squares model, even if they are pure noise.)
$$R^2_{adj} = 1 - (1-R^2) \frac{n-1}{n-p-1}$$
* $n$: Number of samples.
* $p$: Number of predictors/features.
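
A sketch of the adjustment (hypothetical helper `adjusted_r2`); the raw $R^2$ of 0.80 and the sample sizes are made-up inputs chosen to show the penalty growing with $p$.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n samples and p predictors (requires n > p + 1)."""
    if n <= p + 1:
        raise ValueError("need more samples than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Same raw R^2 of 0.80 on 50 samples: more predictors, lower adjusted value.
print(adjusted_r2(0.80, n=50, p=2))   # ≈ 0.791
print(adjusted_r2(0.80, n=50, p=20))  # ≈ 0.662
```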

---

## 2. Classification Metrics (Categorical Targets)

### The Confusion Matrix Components

* **TP:** True Positive (Correctly predicted Yes)
* **TN:** True Negative (Correctly predicted No)
* **FP:** False Positive (Incorrectly predicted Yes; Type I Error)
* **FN:** False Negative (Incorrectly predicted No; Type II Error)
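
A sketch that tallies the four cells from paired binary labels (hypothetical helper `confusion_counts`; labels are assumed to be 1 for Yes and 0 for No).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN), treating `positive` as the Yes class."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be the same length")
    tp = tn = fp = fn = 0
    for y, p in zip(y_true, y_pred):
        if p == positive:
            if y == positive:
                tp += 1
            else:
                fp += 1  # predicted Yes, actually No (Type I error)
        else:
            if y == positive:
                fn += 1  # predicted No, actually Yes (Type II error)
            else:
                tn += 1
    return tp, tn, fp, fn

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))  # (3, 3, 1, 1)
```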

### Core Metrics

**1. Precision**
Accuracy of positive predictions. "Of all the ones I labeled Positive, how many actually were?"
$$Precision = \frac{TP}{TP + FP}$$

**2. Recall (Sensitivity)**
Coverage of actual positives. "Of all the real Positives, how many did I find?"
$$Recall = \frac{TP}{TP + FN}$$

**3. F1-Score**
The harmonic mean of Precision and Recall. Use this when you need a balance between the two.
$$F1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$$

**4. Specificity**
Coverage of actual negatives. "Of all the real Negatives, how many did I correctly label Negative?" High specificity means few false alarms among the negatives.
$$Specificity = \frac{TN}{TN + FP}$$
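
The four formulas above, plus accuracy (listed in the summary table), computed from confusion-matrix counts in one sketch. The helper name and the counts are illustrative, and returning 0.0 on a zero denominator is an assumed convention; libraries handle that case differently. The example shows accuracy looking strong on imbalanced data while recall is mediocre.

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, F1, specificity and accuracy from confusion-matrix counts."""
    def safe_div(num, den):
        # Assumed convention: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    specificity = safe_div(tn, tn + fp)
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
        "accuracy": accuracy,
    }

# Illustrative counts: 60 actual positives among 1,000 samples.
m = classification_metrics(tp=40, tn=930, fp=10, fn=20)
print(f"accuracy={m['accuracy']:.2f}, recall={m['recall']:.2f}")  # accuracy=0.97, recall=0.67
```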

---

### Threshold-Independent Metrics

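**1. ROC-AUC (Area Under the Receiver Operating Characteristic Curve)**
* Plots **TPR (Recall)** vs **FPR ($1 - Specificity$)** across all classification thresholds.
* **Use:** General measure of ranking performance: the probability that a randomly chosen positive is scored above a randomly chosen negative. $0.5$ is random, $1.0$ is perfect.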
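
A sketch of ROC-AUC via its ranking interpretation (hypothetical helper `roc_auc`): compare every positive score with every negative score, counting ties as half. This is $O(P \times N)$ and meant for illustration; practical implementations sort the scores instead.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as P(random positive outranks random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# 3 of the 4 positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```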

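**2. PR-AUC (Area Under the Precision-Recall Curve)**
* Plots **Precision** vs **Recall** across classification thresholds.
* **Use:** More informative than ROC-AUC for **highly imbalanced datasets** (e.g., Fraud Detection). With many negatives, FPR stays small even when false positives outnumber true positives, whereas precision exposes them directly. A random ranker scores roughly the positive-class prevalence, not $0.5$.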
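
PR-AUC is commonly estimated step-wise as *average precision*: the mean of the precision values at the rank of each positive. The sketch below (hypothetical helper `average_precision`) assumes distinct scores; ties would require grouping predictions by threshold.

```python
def average_precision(y_true, scores):
    """Step-wise PR-AUC estimate: mean precision at the rank of each positive.

    Assumes distinct scores; tied scores would need threshold-level grouping.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision when the cut-off includes this positive
    return total / n_pos

# Positives land at ranks 1 and 3: precision 1/1 and 2/3, averaged.
print(round(average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]), 3))  # 0.833
```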

---

### Probabilistic & Advanced Metrics

**1. Log Loss (Binary Cross-Entropy)**
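Evaluates predicted probabilities $\hat{y}_i \in (0, 1)$ and penalizes confident wrong predictions heavily: the loss grows without bound as the probability assigned to the true class approaches 0.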
$$LogLoss = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i)]$$
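
A sketch of log loss (hypothetical helper `log_loss`) that clips probabilities away from 0 and 1 so the logarithm stays finite; the clipping constant `eps` is an assumption. For contrast it also computes the Brier score from the summary table, the mean squared difference between predicted probability and label, which stays bounded where log loss explodes.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy; probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

# A positive example (y = 1) predicted at 0.4 versus at 0.01:
print(log_loss([1], [0.4]), log_loss([1], [0.01]))        # ≈ 0.916, ≈ 4.605
print(brier_score([1], [0.4]), brier_score([1], [0.01]))  # ≈ 0.36, ≈ 0.98
```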

**2. Matthews Correlation Coefficient (MCC)**
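Uses all four confusion-matrix cells, so it stays informative under class imbalance. Range $[-1, 1]$: $1$ is perfect, $0$ is no better than random, and $-1$ is total disagreement.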
$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$
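
A sketch of MCC from confusion-matrix counts (hypothetical helper `mcc`). Returning 0.0 when a marginal sum is zero is an assumed convention. The example is a classifier that always predicts No on 95 negatives and 5 positives.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: 0.0 when any row or column of the matrix is empty.
    return num / den if den else 0.0

# Always predicting "No": accuracy would be 0.95, but MCC shows no skill.
print(mcc(tp=0, tn=95, fp=0, fn=5))  # 0.0
print(mcc(tp=5, tn=95, fp=0, fn=0))  # 1.0 (perfect)
```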

---

## Summary: When to Use Which?

| Scenario | Recommended Metric |
| :--- | :--- |
| **Regression (Standard)** | **RMSE** (Interpretable), **$R^2$** (Fit quality) |
| **Regression (Outliers)** | **MAE** (Robust) |
| **Regression (Business)** | **MAPE** (Percentage error is easy to explain) |
| **Classification (Balanced)** | **Accuracy**, **F1-Score** |
| **Classification (Imbalanced)** | **PR-AUC**, **F1-Score**, **MCC** |
| **Minimize False Alarms** | **Precision** (e.g., Spam Detection) |
| **Minimize Missed Cases** | **Recall** (e.g., Cancer Diagnosis) |
| **Probabilistic Calibration** | **Log Loss**, **Brier Score** |