# Forecasting Metrics

[Link to the video](https://www.youtube.com/watch?v=jhh4tHYmVew&list=PLKmQjl_R9bYd32uHImJxQSFZU5LPuXfQe&index=8)

- [Mean Absolute Error](#mean-absolute-error)
- [Mean Squared Error](#mean-squared-error)
- [Root Mean Square Error](#root-mean-square-error)
- [Mean Absolute Percentage Error](#mean-absolute-percentage-error)
- [Symmetric Mean Absolute Percentage Error](#symmetric-mean-absolute-percentage-error)
- [Mean Squared Logarithm Error](#mean-squared-logarithm-error)

## Mean Absolute Error

$
MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
$

### 🔍 Where
- n = number of data points
- $ y_i $ = Actual Value
- $ \hat{y}_i $ = predicted value
- $ |y_i - \hat{y}_i| $ = absolute error for each point

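### ⁉️ Pros

- Intuitive
- Errors are in the same units as the forecast

### ⚠️ Cons
- Does not penalize outliers more heavily: every unit of error counts the same
- Scale dependent, so values cannot be compared across series with different magnitudes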
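
The MAE formula above can be sketched in plain Python. The function name `mae` and the sample numbers are illustrative assumptions, not taken from the video.

```python
def mae(actual, predicted):
    """Mean absolute error of two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)


# Hypothetical sample data, for illustration only.
actual = [100, 120, 130, 110]
predicted = [90, 125, 130, 150]

# Absolute errors are 10, 5, 0 and 40, so the mean is 13.75.
print(mae(actual, predicted))  # 13.75
```

The result is in the same units as the data, which is why MAE is easy to read.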

## Mean Squared Error

$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

### 🔍 Where
- n = number of data points
- $ y_i $ = Actual Value of the $ i^{\text{th}} $ observation (data point)
- $ \hat{y}_i $ = predicted value of the $ i^{\text{th}} $ observation
- $ y_i - \hat{y}_i $ = The error for the $ i^{\text{th}} $ point (difference between actual and predicted)
- $ (y_i - \hat{y}_i)^2 $ = The squared error, which ensures all errors are positive and penalizes large errors more heavily
- $ \sum_{i=1}^{n} $ = Sum the squared errors over all n observations
- $ \frac{1}{n} $ = Take the mean (average) of the squared errors

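### ⁉️ Pros
- Penalizes outliers, because squaring makes large errors count disproportionately

### ⚠️ Cons
- Less interpretable: the error is in squared units of the forecast
- Scale dependent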
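
To see the outlier penalty in action, here is a small sketch in plain Python. The function name `mse` and the two made-up forecasts are assumptions for illustration. Both forecasts have the same average absolute miss (10), but MSE rates the one with a single large miss as four times worse.

```python
def mse(actual, predicted):
    """Mean squared error of two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


# Hypothetical sample data, for illustration only.
actual = [100, 100, 100, 100]
even_misses = [110, 90, 110, 90]     # four errors of 10
one_big_miss = [100, 100, 100, 140]  # one error of 40, three perfect forecasts

print(mse(actual, even_misses))   # 100.0
print(mse(actual, one_big_miss))  # 400.0
```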

## Root Mean Square Error

$\text{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 }$

### 🔍 Where
- n = Number of observations
- $ y_i $ = Actual Value
- $ \hat{y}_i $ = Predicted Value
- $(y_i - \hat{y}_i)^2 $ = Squared error for each prediction
- $ \sum_{i=1}^{n} $ = Sum all squared errors
- $ \frac{1}{n} $ = Mean of squared errors (this is the MSE)
- $ \sqrt{ ... } $ = Take the square root of the MSE

### ⁉️ Pros
- Punishes outliers
- Error is in forecast units
- Best of both worlds: the outlier penalty of MSE with the readable units of MAE

### ⚠️ Cons
- Less interpretable than MAE, since it is not a plain average of the errors
- Scale dependent
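
A minimal sketch of RMSE in plain Python, assuming the hypothetical function name `rmse` and made-up sample data. Taking the square root brings the error back to forecast units while keeping the extra weight on the large miss.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error of two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mean_squared = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)
    return math.sqrt(mean_squared)


# Hypothetical sample data, for illustration only.
actual = [100, 100, 100, 100]
even_misses = [110, 90, 110, 90]     # four errors of 10
one_big_miss = [100, 100, 100, 140]  # one error of 40, three perfect forecasts

# Both forecasts have a mean absolute error of 10, but RMSE separates them.
print(rmse(actual, even_misses))   # 10.0
print(rmse(actual, one_big_miss))  # 20.0
```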

## Mean Absolute Percentage Error

$
\text{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
$

### 🔍 Where
- n = Number of observations
- $ y_i $ = Actual Value
- $ \hat{y}_i $ = Predicted Value
- $ y_i - \hat{y}_i $ = Prediction error
- $ \sum $ = Sum all absolute percentage errors
- $ \frac{100\%}{n} $ = Take the mean and express as a percentage


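### ⁉️ Pros
- Easy to interpret
- Scale independent

### ⚠️ Cons
- Undefined when an actual value is zero, and extremely large when it is near zero
- Biased toward under-forecasting: with non-negative forecasts, an under-forecast can be at most 100% off, while an over-forecast has no upper limit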

### 📐 What it tells you

        “On average, my prediction was X% off from the actual value.”

### ⚠️ Caution
MAPE can be misleading when:
- $ y_i = 0 $ (division by zero -- boom 💥)
- the actual values are very small, which causes huge percentage swings
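
The behaviour described above can be sketched in plain Python. The function name `mape`, the choice to raise an error on zero actuals, and the sample numbers are all assumptions made for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    if any(y == 0 for y in actual):
        # Assumption: fail loudly instead of returning infinity.
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((y - y_hat) / y) for y, y_hat in zip(actual, predicted))
    return 100 * total / len(actual)


# Hypothetical sample data: every forecast is 10% off, so MAPE is about 10.
print(mape([100, 200, 50], [110, 180, 55]))

# The same absolute miss of 1 looks very different when the actual value is small.
print(mape([100], [101]))  # about 1
print(mape([0.1], [1.1]))  # about 1000

# With non-negative forecasts, an under-forecast tops out at 100%; an over-forecast does not.
print(mape([100], [0]))    # 100.0
print(mape([100], [300]))  # 200.0
```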


## Symmetric Mean Absolute Percentage Error

$
\text{SMAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \frac{ \left| y_i - \hat{y}_i \right| }{ \left( \left| y_i \right| + \left| \hat{y}_i \right| \right) / 2 }
$

### 🔍 Where
- n = Number of data points
- $ y_i $ = Actual Value
- $ \hat{y}_i $ = Predicted Value
- $ 100\% $ = Convert it into a percentage
- ⛵ Symmetric: handles over- and under-predictions more fairly
- 💣 Does not explode the way MAPE can when the actual value is near zero (unless the prediction is near zero too)

### ⁉️ Pros
- No longer favors under-forecasting

### ⚠️ Caution
- MAPE: unstable when actual values are near 0
- SMAPE: fixes that by dividing by the average magnitude of the actual and predicted values
- Values range from 0% to 200%
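
A minimal sketch of the SMAPE formula in plain Python. The function name `smape`, the sample numbers, and the decision to raise an error when both values are zero are assumptions for illustration.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        denominator = (abs(y) + abs(y_hat)) / 2
        if denominator == 0:
            # Assumption: treat the 0/0 case as an error rather than picking a convention.
            raise ValueError("SMAPE is undefined when actual and predicted are both zero")
        total += abs(y - y_hat) / denominator
    return 100 * total / len(actual)


# Swapping actual and predicted gives the same score (hypothetical numbers).
print(smape([100], [150]))
print(smape([150], [100]))

# A zero actual no longer divides by zero; the score hits the 200% ceiling instead.
print(smape([0], [5]))    # 200.0
print(smape([100], [0]))  # 200.0
```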

## Mean Squared Logarithm Error

$
\text{MSLE} = \frac{1}{n} \sum_{i=1}^{n} \left( \log(1 + y_i) - \log(1 + \hat{y}_i) \right)^2
$

### 🔍 Where
- n = Number of data points
- $ y_i $ = Actual Value
- $ \hat{y}_i $ = Predicted Value
- $\log(1 + x)$ = natural log (base e) of x + 1; adding 1 prevents log(0) when x = 0
- $\left( \log(1 + y_i) - \log(1 + \hat{y}_i) \right)^2$ = Squared log difference
- $ \sum $ = Sum over all data points
- $ \frac{1}{n} $ = Average the squared differences

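### ⁉️ Explanation

- Less sensitive to large absolute errors, because the log difference measures relative rather than absolute error. Great when you don't want to penalize misses on large values too harshly.

- Penalizes underestimates more than overestimates: for the same absolute miss, the log difference is larger when the prediction is too low (the log squashes large values).

- Works best when both actual and predicted values are non-negative.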
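
The points above can be checked with a short sketch in plain Python. The function name `msle`, the rejection of negative inputs, and the sample numbers are assumptions for illustration; `math.log1p(x)` computes the natural log of 1 + x.

```python
import math


def msle(actual, predicted):
    """Mean squared logarithmic error of two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    if any(v < 0 for v in actual) or any(v < 0 for v in predicted):
        # Assumption: reject negative inputs outright.
        raise ValueError("this sketch expects non-negative values")
    squared_log_diffs = [
        (math.log1p(y) - math.log1p(y_hat)) ** 2
        for y, y_hat in zip(actual, predicted)
    ]
    return sum(squared_log_diffs) / len(actual)


# Same absolute miss of 50, but the underestimate scores worse (hypothetical numbers).
print(msle([100], [50]))   # underestimate, larger error
print(msle([100], [150]))  # overestimate, smaller error

# Doubling the actual value costs about the same at very different scales.
print(msle([10], [20]))
print(msle([1000], [2000]))
```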