# 📜 Evaluation Metrics in Regression (AI/ML/DL)

---

## 🔹 1. Error-Based Metrics
These measure the deviation between predicted ($\hat{y}$) and true ($y$) values.

- **Mean Absolute Error (MAE)**  
  $$
  MAE = \frac{1}{n}\sum |y - \hat{y}|
  $$  
  ✅ Less sensitive to outliers than MSE  
  ❌ Doesn’t penalize large errors strongly  

- **Mean Squared Error (MSE)**  
  $$
  MSE = \frac{1}{n}\sum (y - \hat{y})^2
  $$  
  ✅ Penalizes large errors heavily  
  ❌ Sensitive to outliers  

- **Root Mean Squared Error (RMSE)**  
  $$
  RMSE = \sqrt{MSE}
  $$  
  ✅ Interpretable in original units  
  ❌ Same weaknesses as MSE  

- **Mean Absolute Percentage Error (MAPE)**  
  $$
  MAPE = \frac{100}{n}\sum \left|\frac{y - \hat{y}}{y}\right|
  $$  
  ✅ Scale-independent, percentage-based  
  ❌ Undefined if $y=0$; inflates when $y$ is near zero and penalizes over-forecasts more than under-forecasts  

- **Symmetric MAPE (sMAPE)**  
  $$
  sMAPE = \frac{100}{n}\sum \frac{|y - \hat{y}|}{(|y|+|\hat{y}|)/2}
  $$  
  ✅ Bounded between 0 and 200%, common in forecasting  
  ❌ Still unstable near zero  
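
The five formulas above can be sketched with NumPy as follows. This is a minimal illustration, not a reference implementation: the function name `basic_error_metrics` and the sample values are made up for the example, and the code assumes no target equals zero (otherwise MAPE divides by zero).

```python
import numpy as np

def basic_error_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, MAPE and sMAPE (the last two in percent)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    # MAPE divides by y, so it is only meaningful when no target is zero.
    mape = 100.0 * np.mean(np.abs(err / y_true))
    # sMAPE divides by the mean of |y| and |y_hat|; it breaks when both are zero.
    smape = 100.0 * np.mean(np.abs(err) / ((np.abs(y_true) + np.abs(y_pred)) / 2.0))
    return {"mae": mae, "mse": mse, "rmse": rmse, "mape": mape, "smape": smape}

# Hypothetical targets and predictions, chosen only for illustration.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
metrics = basic_error_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On this toy data the single error of 20 dominates MSE (and therefore RMSE), while MAE weights both errors linearly.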

---

## 🔹 2. Relative Error Metrics
- **Mean Relative Error (MRE):** Average ratio of error to true value.  
- **Normalized RMSE (NRMSE):** RMSE divided by data range or mean.  
- **Relative Absolute Error (RAE):** Error relative to a baseline (e.g., predicting the mean).  
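
A rough sketch of these three metrics, assuming MRE is the mean of $|y - \hat{y}| / |y|$ and that NRMSE is reported with both normalizations mentioned above (range and mean). The function name `relative_metrics` and the data are illustrative only.

```python
import numpy as np

def relative_metrics(y_true, y_pred):
    """Return MRE, NRMSE (range- and mean-normalized) and RAE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)

    # MRE: assumes every target is nonzero.
    mre = np.mean(abs_err / np.abs(y_true))
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    nrmse_range = rmse / (y_true.max() - y_true.min())
    nrmse_mean = rmse / np.mean(y_true)
    # RAE: total absolute error relative to always predicting the mean of y.
    rae = abs_err.sum() / np.abs(y_true - y_true.mean()).sum()
    return {"mre": mre, "nrmse_range": nrmse_range,
            "nrmse_mean": nrmse_mean, "rae": rae}

# Hypothetical data for illustration.
rel = relative_metrics([2.0, 4.0, 6.0, 8.0], [3.0, 4.0, 5.0, 8.0])
print(rel)
```

An RAE below 1 means the model beats the mean-predictor baseline; the two NRMSE variants differ, so the normalization should always be stated alongside the number.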

---

## 🔹 3. Goodness-of-Fit Metrics
- **$R^2$ (Coefficient of Determination)**  
  $$
  R^2 = 1 - \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}
  $$  
  ✅ Measures the share of variance explained  
  ❌ Can be negative on held-out data and misleading for nonlinear models  

- **Adjusted $R^2$**: Corrects $R^2$ for the number of predictors $p$ given $n$ samples: $1-(1-R^2)\frac{n-1}{n-p-1}$.  
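
As a small sketch of both quantities (the helper names `r_squared` and `adjusted_r_squared` and the data are assumptions for this example), the second call shows that $R^2$ turns negative when a model does worse than predicting the mean:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, matching the formula above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjust R^2 for p predictors fitted on n samples (requires n > p + 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
good = r_squared(y_true, [1.1, 1.9, 3.2, 3.8, 5.0])  # close fit
bad = r_squared(y_true, [10.0] * 5)                  # worse than the mean
adj = adjusted_r_squared(good, n=5, p=1)
print(good, bad, adj)
```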

---

## 🔹 4. Robust Regression Metrics
- **Median Absolute Error:** Median of $|y - \hat{y}|$.  
  ✅ More robust than MAE  

- **Quantile Loss (Pinball):** Used for quantile regression; $r = y - \hat{y}$ is the residual and $\tau \in (0, 1)$ the target quantile.  
  $$
  L_\tau(r) = \max(\tau r, (\tau-1) r)
  $$  

- **Huber Loss (as evaluation):** Quadratic for errors up to a threshold $\delta$, linear beyond it.  
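
The three robust metrics can be sketched as below. The Huber form used here is the standard piecewise definition with threshold `delta`; the function names and the toy data (one large outlier in the last position) are assumptions for illustration.

```python
import numpy as np

def median_absolute_error(y_true, y_pred):
    return np.median(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float)))

def pinball_loss(y_true, y_pred, tau):
    """Mean of max(tau * r, (tau - 1) * r) with r = y - y_hat."""
    r = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return np.mean(np.maximum(tau * r, (tau - 1.0) * r))

def huber_loss(y_true, y_pred, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond it."""
    r = np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)
    return np.mean(np.where(r <= delta, quadratic, linear))

# Hypothetical data: the last target is an outlier the model misses by 95.
y_true = [1.0, 2.0, 3.0, 4.0, 100.0]
y_pred = [1.5, 2.5, 2.5, 4.5, 5.0]

med_ae = median_absolute_error(y_true, y_pred)
pin_50 = pinball_loss(y_true, y_pred, tau=0.5)
pin_90 = pinball_loss(y_true, y_pred, tau=0.9)
hub = huber_loss(y_true, y_pred, delta=1.0)
print(med_ae, pin_50, pin_90, hub)
```

The median absolute error ignores the outlier entirely, while the pinball loss at $\tau = 0.5$ equals half the MAE and the Huber loss grows only linearly with the outlier.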

---

## 🔹 5. Forecasting & Time-Series Metrics
- **Mean Absolute Scaled Error (MASE):** Forecast MAE divided by the in-sample MAE of a naive (last-value) forecast.  
- **RMSSE (Root Mean Squared Scaled Error):** Squared-error analogue of MASE; used in the M5 forecasting competition.  
- **Theil’s U Statistic:** Compares forecast accuracy to naive model.  
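
A minimal sketch of the three scaled metrics, under stated assumptions: MASE and RMSSE scale by the in-sample one-step naive forecast (each training value predicted by its predecessor), and Theil's U is taken in the simplified "RMSE over naive RMSE" form, with the naive forecast being the previous actual value. Other variants of Theil's U exist; function names and data are illustrative.

```python
import numpy as np

def mase(y_train, y_test, y_pred):
    y_train, y_test, y_pred = (np.asarray(a, float) for a in (y_train, y_test, y_pred))
    scale = np.mean(np.abs(np.diff(y_train)))  # in-sample naive MAE
    return np.mean(np.abs(y_test - y_pred)) / scale

def rmsse(y_train, y_test, y_pred):
    y_train, y_test, y_pred = (np.asarray(a, float) for a in (y_train, y_test, y_pred))
    scale = np.mean(np.diff(y_train) ** 2)  # in-sample naive MSE
    return np.sqrt(np.mean((y_test - y_pred) ** 2) / scale)

def theils_u(y_train, y_test, y_pred):
    y_train, y_test, y_pred = (np.asarray(a, float) for a in (y_train, y_test, y_pred))
    # Naive one-step forecast on the test window: the previous actual value.
    previous = np.concatenate(([y_train[-1]], y_test[:-1]))
    rmse_model = np.sqrt(np.mean((y_test - y_pred) ** 2))
    rmse_naive = np.sqrt(np.mean((y_test - previous) ** 2))
    return rmse_model / rmse_naive

# Hypothetical series for illustration.
y_train = [10.0, 12.0, 11.0, 13.0, 15.0]
y_test = [16.0, 18.0]
y_pred = [15.0, 17.0]

mase_value = mase(y_train, y_test, y_pred)
rmsse_value = rmsse(y_train, y_test, y_pred)
u_value = theils_u(y_train, y_test, y_pred)
print(mase_value, rmsse_value, u_value)
```

For all three, a value below 1 indicates the forecast beats the naive baseline.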

---

## 🔹 6. Probabilistic Regression Metrics
Used when predicting distributions instead of point estimates.

- **Negative Log Likelihood (NLL):** Penalizes predicted distributions that assign low likelihood to the observed values.  
- **CRPS (Continuous Ranked Probability Score):** Compares the predicted cumulative distribution to the actual outcome.  
- **Calibration Metrics (Brier Score variants):** Evaluate how well predicted probabilities match observed frequencies.  

---

## 🔹 7. Domain-Specific Metrics
- **Cosine Similarity / Geometric Losses:** For embeddings and vector regression.  
- **IoU / Dice Score (adapted):** Used when regression predicts structured outputs (e.g., bounding boxes).  
- **Perceptual Metrics:**  
  - **PSNR** (Peak Signal-to-Noise Ratio)  
  - **SSIM** (Structural Similarity)  
  Used in image and audio regression.  
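
A short sketch of three of these metrics. Assumptions for the example: boxes are axis-aligned `(x1, y1, x2, y2)` tuples, and images are arrays scaled to `[0, 1]` so `max_val=1.0`. SSIM is left out because it needs windowed local statistics, which would not fit a short example.

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def box_iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def psnr(target, pred, max_val=1.0):
    """10 * log10(MAX^2 / MSE); infinite when the signals are identical."""
    mse = np.mean((np.asarray(target, float) - np.asarray(pred, float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

cos = cosine_similarity([1.0, 0.0], [1.0, 1.0])
iou = box_iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
db = psnr(np.full((2, 2), 0.5), np.full((2, 2), 0.6))
print(cos, iou, db)
```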

---

## ✅ Key Usage Scenarios
- **General regression:** MAE, MSE, RMSE, $R^2$  
- **Differing scales / relative errors:** MAPE, sMAPE, NRMSE  
- **Outliers:** Median Absolute Error, Huber, quantile loss  
- **Forecasting:** MASE, RMSSE, Theil’s U  
- **Uncertainty-aware:** NLL, CRPS  
- **Images/audio:** PSNR, SSIM, perceptual losses  


# 📊 Comparative Table: Evaluation Metrics for Regression (AI/ML/DL)

| Metric                                   | Formula (simplified)                                       | Intuition                      | Pros                            | Cons                                           | When to Use                         |
| ---------------------------------------- | ---------------------------------------------------------- | ------------------------------ | ------------------------------- | ---------------------------------------------- | ----------------------------------- |
| **MAE (Mean Absolute Error)**            | $$MAE = \frac{1}{n}\sum \lvert y - \hat{y}\rvert$$         | Average absolute deviation     | Less sensitive to outliers than MSE, interpretable | Doesn’t penalize large errors strongly        | General regression, skewed data     |
| **MSE (Mean Squared Error)**             | $$MSE = \frac{1}{n}\sum (y - \hat{y})^2$$                  | Penalizes squared deviations   | Smooth gradients, standard loss | Sensitive to outliers                          | Gaussian noise, stable datasets     |
| **RMSE (Root MSE)**                      | $$RMSE = \sqrt{\frac{1}{n}\sum (y - \hat{y})^2}$$          | Error in same units as target  | Easy interpretation             | Same outlier issues as MSE                     | Reporting performance               |
| **Median Absolute Error**                | $$\text{MedAE} = \text{median}(\lvert y - \hat{y}\rvert)$$ | Median of absolute errors      | Very robust to outliers          | Ignores the magnitude of large errors          | Heavy-tailed noise                   |
| **MAPE (Mean Abs. Percentage Error)**    | $$MAPE = \frac{100}{n}\sum \left\lvert\frac{y-\hat{y}}{y}\right\rvert$$ | Percent error               | Scale-free, intuitive           | Undefined if $$y=0$$, inflated when values are small | Business/forecasting (nonzero y)    |
| **sMAPE (Symmetric MAPE)**               | $$sMAPE = \frac{200}{n}\sum \frac{\lvert y-\hat{y}\rvert}{\lvert y\rvert+\lvert\hat{y}\rvert}$$ | Symmetric percentage error | Scale-free, handles symmetry    | Still unstable near zero values                | Time-series forecasting             |
| **RAE (Relative Abs. Error)**            | $$RAE = \frac{\sum \lvert y-\hat{y}\rvert}{\sum \lvert y-\bar{y}\rvert}$$ | Error vs baseline (mean)       | Compares to naive model         | Needs meaningful baseline                      | Model comparison                    |
| **NRMSE (Normalized RMSE)**              | $$NRMSE = \frac{RMSE}{\max(y)-\min(y)}$$                   | RMSE scaled by range/mean      | Dimensionless                   | Depends on normalization choice                | Comparing across datasets           |
| **R² (Coefficient of Determination)**    | $$R^2 = 1 - \frac{\sum (y-\hat{y})^2}{\sum (y-\bar{y})^2}$$ | Variance explained             | Intuitive                       | Can be negative, misleading in nonlinear cases | Model fit in regression tasks       |
| **Adjusted R²**                          | $$1-(1-R^2)\frac{n-1}{n-p-1}$$                             | R² adjusted for features       | Penalizes overfitting           | Only valid for linear models                   | Feature selection evaluation        |
| **Huber Error**                          | $$L = \begin{cases} \tfrac{1}{2}r^2 & \lvert r\rvert\le \delta \\ \delta(\lvert r\rvert-\tfrac{1}{2}\delta) & \text{otherwise} \end{cases}$$ | Blend of MSE & MAE | Robust to outliers, smooth | Requires δ tuning | Regression with moderate outliers |
| **RMSLE (Root Mean Squared Log Error)**  | $$RMSLE = \sqrt{\frac{1}{n}\sum(\log(1+y)-\log(1+\hat{y}))^2}$$ | Penalizes relative error | Handles exponential growth      | Undefined if $$y \le -1$$ or $$\hat{y} \le -1$$ | Growth prediction, finance          |
| **MASE (Mean Abs. Scaled Error)**        | $$MASE = \frac{MAE}{MAE_{\text{naive}}}$$                  | Error scaled by naive forecast | Scale-free, interpretable       | Needs baseline                                 | Forecasting competitions            |
| **RMSSE (Root MSE Scaled Error)**        | $$RMSSE = \sqrt{\frac{MSE}{MSE_{\text{naive}}}}$$          | RMSE relative to naive         | Used in forecasting contests    | Needs an in-sample naive baseline              | Time-series (M5, Kaggle)            |
| **Theil’s U**                            | $$U = \frac{RMSE}{RMSE_{\text{naive}}}$$                   | Compare vs naive predictor     | Dimensionless                   | Limited interpretability                       | Economic/finance forecasts          |
| **Negative Log Likelihood (NLL)**        | $$NLL = -\sum \log p(y \mid \hat{\theta})$$                | Probabilistic fit quality      | Models uncertainty              | Needs distribution assumption                  | Probabilistic regression            |
| **CRPS (Continuous Ranked Prob. Score)** | $$CRPS = \int (F(t)-\mathbb{1}(t\ge y))^2 dt$$             | Distributional accuracy        | Proper scoring rule             | Costly to compute                              | Probabilistic forecasting           |
| **Brier Score (adapted)**                | $$\text{Brier} = \frac{1}{n}\sum (p-y)^2$$                 | Probabilistic calibration      | Easy to interpret               | Limited to [0,1] regression                    | Risk models, probability regression |
| **PSNR (Peak Signal-to-Noise Ratio)**    | $$PSNR = 10\log_{10}\frac{MAX^2}{MSE}$$                    | Signal reconstruction quality  | Simple, standard for images     | Not task-general                               | Image/signal regression             |
| **SSIM (Structural Similarity Index)**   | Structural overlap function                               | Captures perceptual quality    | Correlates with human vision    | Non-convex, complex                            | Vision, denoising, generative tasks |

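RMSLE appears only in the table above, so a small sketch may help show what "penalizes relative error" means in practice. The function name `rmsle` and the numbers are illustrative; `np.log1p(x)` computes $\log(1+x)$, and the guard reflects that the logarithm is undefined at or below $-1$.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared error between log(1 + y) and log(1 + y_hat)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true <= -1) or np.any(y_pred <= -1):
        raise ValueError("RMSLE needs y > -1 and y_hat > -1")
    return np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))

# A 10% over-prediction costs about the same at very different scales.
small = rmsle([100.0], [110.0])
large = rmsle([1000.0], [1100.0])
print(small, large)
```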
---

## ✅ Key Takeaways

- **General regression:** MAE, MSE, RMSE, R²  
- **Outliers present:** Median Abs. Error, Huber  
- **Relative/scale-free:** MAPE, sMAPE, RAE, NRMSE  
- **Forecasting:** MASE, RMSSE, Theil’s U  
- **Uncertainty-aware:** NLL, CRPS  
- **Images/audio:** PSNR, SSIM  


# 📜 Unified Evaluation Atlas: Classification vs Regression (AI/ML/DL)

---

## 🔹 Classification Metrics

| Metric            | Formula (simplified)                                      | Intuition                           | Pros                        | Cons                              | When to Use                          |
| ----------------- | --------------------------------------------------------- | ----------------------------------- | --------------------------- | --------------------------------- | ------------------------------------ |
| **Accuracy**      | $Acc = \frac{TP+TN}{TP+TN+FP+FN}$                         | Overall correctness                 | Simple, intuitive           | Misleading with imbalance          | Balanced datasets                    |
| **Precision**     | $Prec = \frac{TP}{TP+FP}$                                 | Correctness of positives            | Reduces false alarms        | Ignores false negatives            | Fraud detection, spam filters        |
| **Recall (TPR)**  | $Rec = \frac{TP}{TP+FN}$                                  | Coverage of positives                | Captures completeness       | Ignores false positives            | Medical diagnosis                    |
| **F1-Score**      | $F1 = 2 \cdot \frac{Prec \cdot Rec}{Prec+Rec}$            | Harmonic mean of P & R               | Balances precision & recall | Harder to interpret for business   | Imbalanced datasets, NLP, CV         |
| **Specificity**   | $Spec = \frac{TN}{TN+FP}$                                 | Correct rejection of negatives       | Complements recall          | Ignores false negatives            | Screening tests                      |
| **ROC-AUC**       | Area under ROC (TPR vs FPR)                               | Threshold-free separability          | Threshold-independent       | Can look optimistic under heavy imbalance | Binary classification, ranking       |
| **PR-AUC**        | Area under Precision–Recall curve                         | Positive class focus                 | Good for imbalance          | Unstable at low recall             | Rare-event detection                 |
| **MCC**           | $\frac{TP\cdot TN - FP\cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$ | Correlation between preds & truth | Works with imbalance        | Complex formula                    | Medicine, bioinformatics             |
| **Cohen’s Kappa** | $\kappa=\frac{p_o-p_e}{1-p_e}$                            | Chance-corrected accuracy            | Adjusts for imbalance       | Less common in DL                  | Multi-class imbalance                |
| **Top-K Accuracy**| Correct if true label $\in$ top-K predictions             | Ranking correctness                  | Useful for multi-class      | Not useful for binary tasks        | ImageNet, NLP vocab classification   |
| **Log-Loss**      | $-\frac1n\sum[y\log p+(1-y)\log(1-p)]$                    | Probabilistic confidence             | Good for calibration        | Heavily penalizes confident errors | Calibration of classifiers           |
| **Brier Score**   | $\frac1n\sum(p-y)^2$                                      | Probability calibration              | Interpretable               | Not scale-free                     | Risk & reliability models            |
| **Jaccard Index** | $\frac{TP}{TP+FP+FN}$                                     | Overlap measure                      | Good for multi-label        | Ignores true negatives             | Segmentation, multi-label tasks      |

---

## 🔹 Regression Metrics

| Metric               | Formula (simplified)                                      | Intuition                        | Pros                             | Cons                                    | When to Use                           |
| -------------------- | --------------------------------------------------------- | -------------------------------- | -------------------------------- | --------------------------------------- | ------------------------------------- |
| **MAE**              | $\frac{1}{n}\sum \lvert y-\hat{y}\rvert$                  | Avg. absolute error              | Less sensitive to outliers than MSE | Doesn’t penalize large errors strongly  | General regression, skewed data       |
| **MSE**              | $\frac{1}{n}\sum (y-\hat{y})^2$                           | Penalizes large errors           | Smooth gradients, standard loss  | Sensitive to outliers                   | Gaussian noise, stable datasets        |
| **RMSE**             | $\sqrt{\frac{1}{n}\sum (y-\hat{y})^2}$                    | Error in same units as target    | Interpretable                    | Same issues as MSE                      | Reporting model fit                    |
| **Median AE**        | $\text{median}(\lvert y-\hat{y}\rvert)$                   | Typical (median) error           | Very robust to extreme outliers  | Ignores variance of large errors        | Heavy-tailed noise                     |
| **MAPE**             | $\frac{100}{n}\sum \left\lvert\frac{y-\hat{y}}{y}\right\rvert$ | Percent error               | Scale-free, intuitive            | Undefined if $y=0$                      | Forecasting/business KPIs              |
| **sMAPE**            | $\frac{200}{n}\sum \frac{\lvert y-\hat{y}\rvert}{\lvert y\rvert+\lvert\hat{y}\rvert}$ | Symmetric % error | Scale-free, handles symmetry     | Still unstable near 0                   | Time-series forecasting                |
| **R²**               | $1 - \frac{\sum (y-\hat{y})^2}{\sum (y-\bar{y})^2}$       | Variance explained               | Intuitive fit measure            | Misleading for nonlinear models         | Model fit evaluation                   |
| **Adjusted R²**      | $1-(1-R^2)\frac{n-1}{n-p-1}$                              | Penalized R²                     | Penalizes overfitting            | Only valid for linear models            | Feature set evaluation                 |
| **Huber Error**      | Quadratic for small errors, linear for large ones         | Hybrid of L1 & L2                | Robust + smooth optimization     | Requires δ tuning                       | Outlier-prone regression               |
| **RMSLE**            | $\sqrt{\frac{1}{n}\sum(\log(1+y)-\log(1+\hat{y}))^2}$     | Penalizes relative errors        | Handles exponential growth       | Undefined if $y \le -1$ or $\hat{y} \le -1$ | Finance, growth prediction             |
| **MASE**             | $\frac{MAE}{MAE_{\text{naive}}}$                          | Error relative to naive forecast | Scale-free, interpretable        | Needs baseline                          | Forecasting competitions               |
| **CRPS**             | $\int (F(t)-\mathbb{1}(t\ge y))^2 dt$                     | Distributional accuracy          | Proper probabilistic metric      | Costly to compute                       | Probabilistic forecasting              |
| **PSNR / SSIM**      | $PSNR=10\log_{10}\frac{MAX^2}{MSE}$; SSIM = structure fn. | Signal/structure similarity      | Correlates with human perception | Task-specific, non-convex               | Images, audio, generative tasks        |

---

## ✅ Cross-Family Insights

- **Classification** → discrete outcomes: metrics come from the **confusion matrix** & **probability calibration**.  
- **Regression** → continuous outcomes: metrics focus on **error magnitudes, variance explained, and scale-free comparisons**.  
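
To illustrate how the classification metrics fall out of the confusion matrix, here is a sketch that derives several of the table's formulas from the four counts. The function name `confusion_metrics` and the counts are made up for the example, and the code assumes every denominator is nonzero.

```python
import math

def confusion_metrics(tp, fp, fn, tn):
    """Derive common classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
        "jaccard": tp / (tp + fp + fn),
    }

# Hypothetical counts for illustration.
cm = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in cm.items():
    print(f"{name}: {value:.4f}")
```

Here accuracy is 0.70 while MCC is only about 0.41, a reminder that the metrics summarize the same four counts very differently.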

**When data is imbalanced (classification) or skewed by outliers (regression):**  
- Classification → **PR-AUC, F1, MCC**.  
- Regression → **Median AE, Huber, RMSLE**.  

**When uncertainty matters:**  
- Classification → **Log-Loss, Brier Score**.  
- Regression → **NLL, CRPS**.  
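
For the classification side of this pairing, a minimal sketch of binary Log-Loss and Brier Score. The clipping constant `eps` is an assumption added to keep the logarithm finite at probabilities of exactly 0 or 1; names and data are illustrative.

```python
import numpy as np

def log_loss(y_true, p, eps=1e-15):
    """Binary cross-entropy; p is the predicted probability of class 1."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def brier_score(y_true, p):
    """Mean squared difference between predicted probability and outcome."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p, dtype=float)
    return np.mean((p - y) ** 2)

# Hypothetical labels and predicted probabilities.
labels = [1, 0, 1]
probs = [0.9, 0.1, 0.5]
ll = log_loss(labels, probs)
bs = brier_score(labels, probs)
print(ll, bs)
```

Log-Loss grows without bound as a confident prediction turns out wrong, whereas each Brier term is capped at 1.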

**When interpretability for business matters:**  
- Classification → **Accuracy, Precision, Recall**.  
- Regression → **MAE, MAPE, RMSE**.  
