# Day-48: Time Series Evaluation Metrics

We've built ARIMA and Holt-Winters models, but how do we know which one is better? Today is all about becoming the judge and jury of your forecasts. We're diving into the essential Time Series Evaluation Metrics and, crucially, the only correct way to validate your models: Walk-Forward Validation.

## Topics Covered:

## Time Series Evaluation Metrics

In regression, we use metrics like $R^2$, but for forecasting, we focus on the error: the difference between the actual value $(Y_t)$ and the forecasted value $(\hat{Y}_t)$.

### 1. RMSE: Root Mean Squared Error

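RMSE is one of the most common metrics for regression and forecasting. It is the square root of the average squared error, so it summarizes the typical size of the error while weighting large errors more heavily.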

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2}$$

- `Analogy`: The Speeding Ticket. RMSE is like calculating your average speeding fine. Because you square the error, large errors are penalized much more heavily than small errors. This makes RMSE sensitive to outliers.

- Use Case: When large forecasting errors are disproportionately costly (e.g., predicting an inventory shortage for a critical part).

- Units: It's in the same units as the target variable (e.g., if you're predicting sales in dollars, RMSE is in dollars).
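To see the squaring effect in numbers, here is a minimal sketch using NumPy. The `rmse` helper and the sample values are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error, written out from the formula above."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((actual - predicted) ** 2))

# Made-up actual values for illustration
actual = np.array([100.0, 102.0, 98.0, 101.0])

# Four forecasts, each off by 2 units (total absolute error: 8)
even_errors = actual + np.array([2.0, -2.0, 2.0, -2.0])

# Same total absolute error (8), but concentrated in one forecast
one_big_error = actual + np.array([8.0, 0.0, 0.0, 0.0])

print(rmse(actual, even_errors))    # 2.0
print(rmse(actual, one_big_error))  # 4.0
```

Both forecasts miss by the same total amount, yet the one with a single large miss has double the RMSE.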

### 2. MAE: Mean Absolute Error

MAE measures the average magnitude of the error using absolute values.

$$\text{MAE} = \frac{1}{n}\sum_{t=1}^{n} |Y_t - \hat{Y}_t|$$

- `Analogy`: The Distance Traveled. MAE is simpler. It's the average absolute difference between your prediction and the truth. A large error counts just as much as two small errors that add up to the same amount. It is less sensitive to outliers than RMSE.

- Use Case: When you want a robust, easily understandable metric of average error.

- Units: It's in the same units as the target variable.
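The claim that one large error counts the same as two small ones can be checked directly. This is a small sketch with a hypothetical `mae` helper and made-up values; the RMSE lines are included only for contrast.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error, written out from the formula above."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs(actual - predicted))

# Made-up values for illustration
actual = np.array([50.0, 50.0, 50.0, 50.0])
two_small = np.array([53.0, 47.0, 50.0, 50.0])  # two errors of 3
one_large = np.array([56.0, 50.0, 50.0, 50.0])  # one error of 6

print(mae(actual, two_small))  # 1.5
print(mae(actual, one_large))  # 1.5

# RMSE, by contrast, ranks the single large error as worse
rmse_two_small = np.sqrt(np.mean((actual - two_small) ** 2))
rmse_one_large = np.sqrt(np.mean((actual - one_large) ** 2))
print(round(rmse_two_small, 3))  # 2.121
print(round(rmse_one_large, 3))  # 3.0
```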

### 3. MAPE: Mean Absolute Percentage Error

MAPE measures the average absolute error as a percentage of the actual value.

$$\text{MAPE} = \frac{100}{n}\sum_{t=1}^{n} \left|\frac{Y_t - \hat{Y}_t}{Y_t}\right|$$

- `Analogy`: The Relative Cost. MAPE answers the question: "On average, how far off was my forecast as a percentage of the actual value?" It's great for comparing performance across different time series with vastly different scales (e.g., comparing sales forecasts for a $5 item and a $5,000 item).

- `Warning`: MAPE is undefined when $Y_t$ is zero and unstable when $Y_t$ is close to zero.
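Both the scale independence and the near-zero problem show up in a short experiment. The sketch below uses a hypothetical `mape` helper that returns a percentage and, as a design assumption, raises an error on zero actuals rather than returning infinity. All values are made up.

```python
import numpy as np

def mape(actual, predicted):
    """MAPE in percent; raises if any actual value is zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

# A $5 item and a $5,000 item, each forecast 10% off at every step
cheap_actual = np.array([5.0, 5.0, 5.0])
cheap_forecast = np.array([5.5, 4.5, 5.5])
pricey_actual = np.array([5000.0, 5000.0, 5000.0])
pricey_forecast = np.array([5500.0, 4500.0, 5500.0])

print(mape(cheap_actual, cheap_forecast))    # approximately 10
print(mape(pricey_actual, pricey_forecast))  # approximately 10

# One actual value close to zero dominates the average
near_zero_actual = np.array([0.01, 5.0, 5.0])
near_zero_forecast = np.array([0.51, 5.0, 5.0])
print(mape(near_zero_actual, near_zero_forecast))  # roughly 1667
```

In the last case the forecast is off by only 0.5 units at one step and exact elsewhere, yet MAPE reports an error above 1,000%.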

## Walk-Forward Validation

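When validating a time series model, you CANNOT use a randomly shuffled train/test split (e.g., 80% train, 20% test drawn at random). Shuffling lets the model train on observations that come after the ones it is tested on, which leaks future information. Time is sequential, and your model must only be trained on data from before the forecast date.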

Walk-Forward Validation (also called Rolling Origin Evaluation) is the correct technique.

### The Process:

1. *Initial Split*:
   - Define your initial training set and a small testing window (e.g., 1 month).

2. *Train & Forecast*:
   - *Step 1*: Train the model on the data up to $T_0$.
   - *Step 2*: Forecast $T_1$. Record the error.

3. *Walk Forward*:
   - *Step 3*: Retrain the model by adding the actual data point $T_1$ to the training set.
   - *Step 4*: Forecast $T_2$. Record the error.

4. *Repeat*: The process "walks forward" one time step at a time, retraining the model with the latest actual data before each new forecast.

*Analogy*: The Pilot's Logbook. A pilot doesn't just train on old flights and then guess. They fly a short leg, observe the results, update their navigation based on the actual conditions they just flew through, and then forecast the next short leg. This ensures the model is always using the most recent information.
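The steps above can be sketched independently of any particular model. In this minimal example, `walk_forward`, `last_value`, and `window_mean` are hypothetical helpers, and the two simple forecasters stand in for real models such as ARIMA or Holt-Winters. The series is made up.

```python
import numpy as np

def walk_forward(series, n_test, forecast_fn):
    """One-step-ahead walk-forward validation with an expanding window.

    forecast_fn receives the history so far (1-D array) and returns a
    forecast for the next time step.
    """
    series = np.asarray(series, dtype=float)
    split = len(series) - n_test
    history = list(series[:split])      # initial training set
    predictions = []
    for t in range(split, len(series)):
        # "Train" and forecast using only data before time t
        predictions.append(forecast_fn(np.array(history)))
        # Walk forward: reveal the actual value and add it to history
        history.append(series[t])
    return series[split:], np.array(predictions)

def last_value(history):
    """Naive forecast: repeat the most recent observation."""
    return history[-1]

def window_mean(history, window=3):
    """Forecast the mean of the last `window` observations."""
    return history[-window:].mean()

series = np.array([10.0, 12.0, 11.0, 13.0, 14.0, 15.0, 16.0, 18.0])
actual, pred_naive = walk_forward(series, n_test=3, forecast_fn=last_value)
_, pred_mean = walk_forward(series, n_test=3, forecast_fn=window_mean)

mae_naive = np.mean(np.abs(actual - pred_naive))
mae_mean = np.mean(np.abs(actual - pred_mean))

print(pred_naive)           # [14. 15. 16.]
print(round(mae_naive, 3))  # 1.333
print(round(mae_mean, 3))   # 2.444
```

Because every candidate is scored on the same sequence of one-step forecasts, the resulting errors can be compared directly when choosing between models.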

```python
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error, mean_absolute_percentage_error, mean_absolute_error

# --- 1. Recreate Time Series Data ---
np.random.seed(42)
# 'ME' (month-end) requires pandas >= 2.2; older versions use 'M'
index = pd.date_range(start='2020', periods=100, freq='ME')
# Sine wave + linear trend + Gaussian noise
data = 5 * np.sin(np.linspace(0, 3*np.pi, 100)) + np.arange(100) * 0.5 + np.random.randn(100) * 2
ts = pd.Series(data, index=index)

# --- 2. Setup Walk-Forward Validation ---
# Chronological split: first 80% for training, last 20% for testing
split_point = int(len(ts) * 0.8)
train = ts[:split_point]
test = ts[split_point:]

history = list(train.values)  # grows by one observation per step
predictions = []

arima_order = (1, 1, 0)
print(f"--- Starting Walk-Forward Validation (Test Size: {len(test)} periods) ---")

# Walk through each step in the test set
for t in range(len(test)):
    # 1. TRAIN: Fit the model using all data in 'history'
    model = ARIMA(history, order=arima_order)
    model_fit = model.fit()

    # 2. FORECAST: Make a single-step forecast
    yhat = model_fit.forecast(steps=1)[0]
    predictions.append(yhat)

    # 3. WALK FORWARD: Get the actual observation and add it to history
    obs = test.values[t]
    history.append(obs)

# Convert test and prediction lists to arrays
actual_values = test.values
predicted_values = np.array(predictions)

# --- 3. Evaluate Metrics (Using Scikit-learn Functions) ---
print("\n--- Evaluation Metrics (Walk-Forward) ---")

# RMSE: take the square root of sklearn's mean squared error
rmse = np.sqrt(mean_squared_error(actual_values, predicted_values))
print(f"RMSE (Root Mean Squared Error): {rmse:.3f}")

# MAE (Using sklearn.metrics.mean_absolute_error)
mae = mean_absolute_error(actual_values, predicted_values)
print(f"MAE (Mean Absolute Error): {mae:.3f}")

# MAPE (Using sklearn.metrics.mean_absolute_percentage_error)
# Note: The sklearn function returns MAPE as a fraction (e.g., 0.05 for 5%), so we multiply by 100
mape = mean_absolute_percentage_error(actual_values, predicted_values) * 100
print(f"MAPE (Mean Absolute Percentage Error): {mape:.3f}%")
```

```text
--- Starting Walk-Forward Validation (Test Size: 20 periods) ---

--- Evaluation Metrics (Walk-Forward) ---
RMSE (Root Mean Squared Error): 2.058
MAE (Mean Absolute Error): 1.747
MAPE (Mean Absolute Percentage Error): 3.659%
```


## Summary of Day 48


Today, you armed yourself with the metrics to evaluate and compare models (RMSE, MAE, MAPE) and, most importantly, learned the correct validation strategy: Walk-Forward Validation.

## What's Next (Day 49)


Tomorrow, on Day 49, we are putting everything together in a Comprehensive Time Series Forecasting Project! We'll use the diagnostic tools from Day 47, the models from Days 45 and 46, and the metrics and validation from today to build and select the ultimate forecasting solution!