# Measuring Point Forecast Error

**In this lecture you will learn:**
    
* How to partition your time series data into training and test sets
* The definition of a point forecast error
* The difference between scale-dependent and relative error measures
* How to compute *mean absolute error*
* How to compute *mean absolute percentage error*
* The difference between in-sample and out-of-sample error

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates

import seaborn as sns 
import matplotlib.style as style
style.use('ggplot')

## Load data for this lecture

In [None]:
ed_month = pd.read_csv('data/ed_mth_ts.csv', index_col='date', parse_dates=True)
ed_month.index.freq='MS'
arrival_rate = ed_month['arrivals'] / ed_month.index.days_in_month
arrival_rate.shape

## Train-Test Split

Just like in 'standard' machine learning problems, it is important to separate the data used for model training from the data used for model testing.  A key difference with time series forecasting is that you must take the temporal ordering of the data into account: the test set must come after the training set in time.

The good news is that pandas makes a train-test split very simple.  There are two options:

1. Split the dataframe using `DataFrame.iloc[start:end]` 
2. Split the dataframe using dates.

**Method 1:**

In [None]:
arrival_rate.shape[0]

In [None]:
train_length = arrival_rate.shape[0] - 12
train, test = arrival_rate.iloc[:train_length], arrival_rate.iloc[train_length:]

In [None]:
train.shape

In [None]:
test.shape

**Method 2:**

In [None]:
SPLIT_DATE = '2016-06-01'
train = arrival_rate.loc[arrival_rate.index < SPLIT_DATE]
test = arrival_rate.loc[arrival_rate.index >= SPLIT_DATE]

In [None]:
train.shape

In [None]:
test.shape
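As a quick check that the two methods agree, the sketch below builds a synthetic monthly series and splits it both ways. The values, the date range and the names `train_a`, `train_b`, `test_a` and `test_b` are made up for illustration; this is not the ED data.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series for illustration only (not the ED data)
idx = pd.date_range(start='2015-01-01', periods=36, freq='MS')
y = pd.Series(np.arange(36, dtype=float), index=idx)

# Method 1: positional split, holding back the final 12 months
train_length = y.shape[0] - 12
train_a, test_a = y.iloc[:train_length], y.iloc[train_length:]

# Method 2: split on a date (the first month of the test set)
split_date = '2017-01-01'
train_b = y.loc[y.index < split_date]
test_b = y.loc[y.index >= split_date]

# Both approaches give the same partition
print(train_a.equals(train_b), test_a.equals(test_b))

# The training data ends before the test data begins
print(train_a.index.max() < test_a.index.min())
```

Splitting by position is convenient when you know how many periods to hold back; splitting by date is easier to read when the holdout period is defined by the calendar.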

### IMPORTANT - DO NOT LOOK AT THE TEST SET!

We need to **hold back** a proportion of our data.  This is so we can simulate real forecasting conditions and check a model's accuracy on **unseen** data.  We don't want to know what the test set looks like, as that would introduce bias into the forecasting process and lead us to overfit our model to the data we hold.

**Remember - there is no such thing as real time data from the future!**

In [None]:
ax = train.plot(figsize=(12,4))
ax.set_ylabel('ed arrivals')
ax.legend(['training data'])

# Point Forecasts

The numbers we produced using the baseline methods in the last lecture are called **point forecasts**.  They are actually the mean value of a **forecast distribution**.  As a reminder:


In [None]:
from forecast.baseline import SNaive

In [None]:
snf = SNaive(period=12)
snf.fit(train)
preds = snf.predict(horizon=12)
preds

The values in `preds` are point forecasts.  For the time being we will focus on point forecasts.  We will revisit forecast distributions in a future lecture.

## Point Forecast Errors

The point forecast is our best estimate of future observations of the time series.  We use our test set (sometimes called a holdout set) to simulate real-world forecasting.  As our forecasting method has not seen this data before, we can measure the difference between the forecast and the ground-truth observed value.  

**Problem: Errors can be both positive and negative so just taking the average will mask the true size of the errors.**  

* There are a large number of forecast error metrics available.  Each has its own pros and cons.  Here we review some of the most used in practice.
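To see why a plain average of the errors is misleading, here is a minimal sketch using made-up numbers (the arrays `y_true` and `y_pred` are hypothetical, not taken from the ED data). Every forecast is off by 10, yet the mean error is zero.

```python
import numpy as np

# Hypothetical observed values and forecasts (illustrative numbers only)
y_true = np.array([100.0, 120.0, 90.0, 110.0])
y_pred = np.array([110.0, 110.0, 100.0, 100.0])

# Forecast error: e_t = y_t - forecast
errors = y_true - y_pred

# Positive and negative errors cancel when averaged
mean_error = errors.mean()

# Taking absolute values first keeps the size of each error
mean_abs_error = np.abs(errors).mean()

print(mean_error)      # 0.0
print(mean_abs_error)  # 10.0
```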

### MAE and MSE

* A simple way to remedy the problem with the average error is to use **Mean Absolute Error (MAE)** or **Mean Squared Error (MSE)**.  
* There's a bit of a debate about whether you should take the median or the mean of the errors, but here we will just use the mean.

In [None]:
from sklearn.metrics import mean_squared_error, mean_absolute_error

In [None]:
mean_squared_error(y_true=test, y_pred=preds)

In [None]:
mean_absolute_error(y_true=test, y_pred=preds)

### RMSE

* Mean absolute error is conceptually easier to understand than MSE. 
    * The units of MSE are ED arrivals (per day) squared, which is hard to interpret!  
* One way to remedy this units issue is the **Root Mean Squared Error (RMSE)**.

RMSE = $\sqrt{mean(e_t^2)}$ where $e_t$ is the error in predicting $y_t$.
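The formula can be worked through by hand with NumPy. This is a minimal sketch on made-up numbers (the arrays are hypothetical, not the ED data), intended only to show each step of the calculation.

```python
import numpy as np

# Hypothetical observed values and forecasts (illustrative numbers only)
y_true = np.array([100.0, 120.0, 90.0, 110.0])
y_pred = np.array([110.0, 115.0, 100.0, 95.0])

# e_t = y_t - forecast, giving -10, 5, -10, 15
errors = y_true - y_pred

# MSE is in squared units: (100 + 25 + 100 + 225) / 4
mse = np.mean(errors ** 2)

# Taking the square root returns to the units of the original series
rmse = np.sqrt(mse)

print(mse)   # 112.5
print(rmse)
```

Because the errors are squared before averaging, the single error of 15 contributes more to RMSE than it would to MAE.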


In [None]:
np.sqrt(mean_squared_error(y_true=test, y_pred=preds))

### MAPE

RMSE and MAE are called 'scale-dependent' measures as the units and magnitude are specific to the problem and context.  An alternative approach is to use a scale-invariant measure such as the **mean absolute percentage error (MAPE)**.

The percentage error is given by $p_t = \frac{100e_t}{y_t}$ where $e_t$ is the error in predicting $y_t$.  

Therefore, MAPE = $mean(|p_t|)$. 
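The definition translates directly into a few lines of NumPy. The function `mape_sketch` below is a hypothetical helper written for illustration; it is not the implementation in `forecast.metrics`, and it assumes the observed values contain no zeros.

```python
import numpy as np

def mape_sketch(y_true, y_pred):
    """
    Illustrative MAPE: mean(|100 * e_t / y_t|).

    Assumes y_true contains no zeros (division by zero otherwise).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Percentage error for each observation: p_t = 100 * e_t / y_t
    pct_errors = 100.0 * (y_true - y_pred) / y_true

    return np.mean(np.abs(pct_errors))

# Hypothetical values: percentage errors are -10%, 10% and 0%
y_true = [100.0, 200.0, 50.0]
y_pred = [110.0, 180.0, 50.0]

mape = mape_sketch(y_true, y_pred)
print(f'{mape:.2f}%')
```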

In [None]:
from forecast.metrics import mean_absolute_percentage_error

In [None]:
mean_absolute_percentage_error(y_true=test, y_pred=preds)

A limitation of MAPE is that it is inflated when the denominator is small relative to the absolute forecast error (such as in the case of outliers or extreme unexpected events). It also penalises negative errors (over-forecasts) more than positive errors (under-forecasts).  A consequence of this property is that MAPE can lead to selecting a model that tends to under-forecast.  The two following examples illustrate the issue.

$$APE_{1} = \left| \frac{y_t - \hat{y}_t}{y_t} \right|= \left| \frac{150 - 100}{150} \right| = \frac{50}{150} = 33.33\%$$  

$$APE_{2} = \left| \frac{100 - 150}{100} \right| = \frac{50}{100} = 50\%$$
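The same two cases can be checked numerically. The helper `ape` below is a hypothetical function written for this illustration only. Both forecasts miss by 50 units, but the over-forecast is penalised more because it is divided by the smaller observed value.

```python
def ape(y_true, y_pred):
    """Absolute percentage error for a single observation (illustrative)."""
    return 100.0 * abs((y_true - y_pred) / y_true)

# Under-forecast: observed 150, forecast 100 (error = +50)
ape_under = ape(150, 100)

# Over-forecast: observed 100, forecast 150 (error = -50)
ape_over = ape(100, 150)

print(f'{ape_under:.2f}%')  # 33.33%
print(f'{ape_over:.2f}%')   # 50.00%
```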

## A note on the difference between in-sample and out-of-sample error

### In-sample errors

These errors are often called the model's **residuals**.  They represent the difference between the training data (the data the model has seen) and the model's fitted values.  For example, let's take a look at the residuals of the SNaive model fitted to the ED arrival data and then calculate the in-sample MAE.

In [None]:
snf._fitted.tail(10)

You can access the in-sample predictions via the `fittedvalues` property.  Notice that the first 12 observations do not have a prediction.  This is because of the way SNaive works, i.e. carrying forward the previous 12 observations.  You cannot carry forward observations that do not exist!

In [None]:
snf.fittedvalues

You can access the residuals via the `resid` property:

In [None]:
snf.resid

In [None]:
mean_absolute_error(y_true=train[12:], y_pred=snf.fittedvalues[12:])
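The missing first 12 fitted values can be reproduced with a plain pandas `shift`. The sketch below is an assumption about how a seasonal naive fit behaves, not the code inside `forecast.baseline.SNaive`, and it uses a synthetic series rather than the ED data.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series that rises by 1 each month (illustration only)
idx = pd.date_range(start='2015-01-01', periods=36, freq='MS')
y = pd.Series(np.arange(36, dtype=float), index=idx)

period = 12

# Seasonal naive fitted value: the observation from the same month last year
fitted = y.shift(period)

# Residual: observed value minus fitted value
resid = y - fitted

# The first `period` fitted values are missing: nothing to carry forward
n_missing = int(fitted.isna().sum())
print(n_missing)  # 12

# In-sample MAE over the observations that do have a fitted value
in_sample_mae = resid.dropna().abs().mean()
print(in_sample_mae)  # 12.0
```

Dropping the missing values here plays the same role as slicing with `[12:]` when computing the in-sample MAE.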

### Out-of-sample errors

* Out-of-sample errors are based on predictions of observations the model has not seen (those in the test set).  
* These are the point forecast errors we have already calculated.  
* You should typically expect the out-of-sample errors to be larger than the in-sample errors.

In [None]:
mean_absolute_error(y_true=test, y_pred=preds)

## Comparing forecasting methods using a test (holdout) set

Let's compare the MAE of the methods on the ED dataset.

In [None]:
#convenience function for creating all objects quickly
from forecast.baseline import baseline_estimators

In [None]:
models = baseline_estimators(seasonal_periods=12)

In [None]:
models

In [None]:
HORIZON = len(test)

print(f'{HORIZON}-Step MAE\n----------')
for model_name, model in models.items():
    model.fit(train)
    preds = model.predict(HORIZON)
    mae = mean_absolute_error(y_true=test, y_pred=preds)
    print(f'{model_name}: {mae:.1f}')