# SWAST Forecasting Tool

An ensemble of Regression with ARIMA Errors and Facebook Prophet

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


import warnings
warnings.filterwarnings('ignore')

Check the statsmodels version and the current working directory.

In [2]:
import statsmodels as sm
print(sm.__version__)

import os

os.getcwd()

0.11.1


'/home/tom/Documents/code/swast-forecast-tool'

In [3]:
from swast_forecast.utility import (pre_process_daily_data, 
                                    default_ensemble,
                                    forecast, 
                                    multi_region_forecast)

Importing plotly failed. Interactive plots will not work.


## Constants

In [4]:
PATH = '../ambo_data/Daily_Responses_5_Years_2019_full.csv'

## Read in the data

In [5]:
clean = pre_process_daily_data(path=PATH, 
                               observation_col='Actual_Value', 
                               index_col='Actual_dt')
clean.head()

| actual_dt  | BNSSG | Cornwall | Devon | Dorset | Gloucestershire | OOA | Somerset |  Trust | Wiltshire |
|:-----------|------:|---------:|------:|-------:|----------------:|----:|---------:|-------:|----------:|
| 2013-12-30 | 415.0 |    220.0 | 502.0 |  336.0 |           129.0 | NaN |    183.0 | 2042.0 |     255.0 |
| 2013-12-31 | 420.0 |    236.0 | 468.0 |  302.0 |           128.0 | NaN |    180.0 | 1996.0 |     260.0 |
| 2014-01-01 | 549.0 |    341.0 | 566.0 |  392.0 |           157.0 | NaN |    213.0 | 2570.0 |     351.0 |
| 2014-01-02 | 450.0 |    218.0 | 499.0 |  301.0 |           115.0 | NaN |    167.0 | 2013.0 |     258.0 |
| 2014-01-03 | 419.0 |    229.0 | 503.0 |  304.0 |           135.0 | NaN |    195.0 | 2056.0 |     269.0 |


## Creating and fitting an Ensemble model to a region

The easiest way to create an ensemble model is to call the `default_ensemble()` function from the `utility` module.  This returns the best known forecasting model.

In [6]:
model = default_ensemble()
model

ProphetARIMAEnsemble(order=(1, 1, 3), seasonal_order=(1, 0, 1, 7), prophet_default_alpha=0.05)

The output above shows that the ensemble includes a Regression model with ARIMA errors with order (1, 1, 3) and seasonal order (1, 0, 1, 7).  Prophet on its own defaults to an 80% prediction interval; here `prophet_default_alpha=0.05` means the Prophet component produces a 100(1 - alpha)% = 95% interval by default.

To fit the model, call the `.fit()` method and pass in a `pd.Series` (or `pd.DataFrame`) that contains the historical observations.  You do not need to pass in holidays.  The ensemble models New Year's Day automatically (via Prophet's holidays function and as a dummy variable in the Regression with ARIMA errors).

In [7]:
#example - fitting Wiltshire - this will take a few seconds.
model.fit(clean['Wiltshire'])
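
The New Year's Day dummy variable mentioned above can be illustrated with a short sketch. This is not the package's own code: the function name `new_years_day_dummy` is hypothetical, and the sketch only assumes that the dummy is 1 on 1 January and 0 on every other day.

```python
import pandas as pd


def new_years_day_dummy(index: pd.DatetimeIndex) -> pd.Series:
    """Return a 0/1 indicator that is 1 only on 1 January.

    Hypothetical helper for illustration; the ensemble builds its
    own holiday regressor internally.
    """
    is_new_year = (index.month == 1) & (index.day == 1)
    return pd.Series(is_new_year.astype(int), index=index, name="new_year")


# Five daily dates spanning the turn of the year, as in the training data.
dates = pd.date_range("2013-12-30", periods=5, freq="D")
dummy = new_years_day_dummy(dates)
```

An indicator like this gives a regression model a separate coefficient for the New Year's Day spike, which is visible in the 2014-01-01 row of the data above.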

## Forecasting an individual region.

Use the `.predict()` method to make a forecast.  The method takes 3 parameters:

* **horizon**: int - the forecast horizon e.g. 84 days
* **alpha**: float, optional (default=0.05) - a value between 0 and 1 used to construct a 100(1 - alpha)% prediction interval. E.g. alpha=0.2 returns an 80% interval.  
* **return_all_models**: bool, optional (default=False). If set to True, returns the ensemble prediction AND the individual Prophet and Regression predictions.
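
The relationship between `alpha` and the width of the prediction interval is simple arithmetic. The helper below is a hypothetical illustration (`coverage_percent` is not part of the package) of the 100(1 - alpha) rule described in the list above.

```python
def coverage_percent(alpha: float) -> float:
    """Convert alpha into the nominal coverage of a prediction interval.

    Hypothetical helper: implements 100 * (1 - alpha) and rejects
    values outside the open interval (0, 1).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return 100.0 * (1.0 - alpha)


# alpha=0.2 corresponds to an 80% interval, alpha=0.05 to a 95% interval.
eighty = coverage_percent(0.2)
ninety_five = coverage_percent(0.05)
```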

In [10]:
#example 1: predict 7 days ahead - remember we have fitted Wiltshire training data.
forecast_frame = model.predict(horizon=7)
print(forecast_frame.to_markdown())

| ds                  |    yhat |   yhat_lower_95 |   yhat_upper95 |
|:--------------------|--------:|----------------:|---------------:|
| 2020-01-01 00:00:00 | 389.034 |         353.899 |        424.194 |
| 2020-01-02 00:00:00 | 328.488 |         291.634 |        365.817 |
| 2020-01-03 00:00:00 | 332.563 |         295.256 |        369.517 |
| 2020-01-04 00:00:00 | 348.734 |         311.209 |        384.396 |
| 2020-01-05 00:00:00 | 348.858 |         310.358 |        386.455 |
| 2020-01-06 00:00:00 | 335.639 |         297.341 |        372.774 |
| 2020-01-07 00:00:00 | 326.683 |         290.259 |        363.039 |


The method returns a `pd.DataFrame` containing the mean forecast (`yhat`) and the lower and upper limits of the prediction interval.  The code below demonstrates how to also return the predictions from the individual ARIMA and Prophet models.  It passes `alpha` explicitly as well; change this value to obtain a different prediction interval.

In [None]:
#example 2: predict 5 days ahead, return 95% PI and individual model preds
forecast_frame = model.predict(horizon=5, alpha=0.05, return_all_models=True)
forecast_frame
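
This notebook does not state how the two component forecasts are combined into the ensemble prediction. One common choice is an unweighted mean of the point forecasts, sketched below as an assumption rather than as the package's method. The function name `average_forecasts` is hypothetical.

```python
from typing import List, Sequence


def average_forecasts(prophet_pred: Sequence[float],
                      arima_pred: Sequence[float]) -> List[float]:
    """Combine two point forecasts with an unweighted mean.

    Assumption for illustration only: the real ensemble may weight
    or combine its component models differently.
    """
    if len(prophet_pred) != len(arima_pred):
        raise ValueError("forecasts must cover the same horizon")
    return [(p + a) / 2 for p, a in zip(prophet_pred, arima_pred)]


# Two made-up component forecasts over a 2-day horizon.
combined = average_forecasts([300.0, 320.0], [310.0, 340.0])
```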

## An 'all in one' forecast function

As an alternative to the above, the `utility` module contains a convenience function called `forecast()`.  This is an all-in-one function: pass in your training data (for a single time series) and a horizon, and it returns the forecast.

In [None]:
forecast(clean['Wiltshire'], 
         horizon=6, 
         alpha=0.2,
         return_all_models=False)

## Forecasting multiple regions in one go.

If there are multiple regions to forecast, put all of the training data into the same frame (see `clean`) and pass this to the `multi_region_forecast()` function from the `utility` module.

This function is efficient because it runs the forecasts in parallel across your CPU cores.  E.g. if you have 4 cores then 4 regions will be forecast simultaneously.  This reduces model run time (assuming you have more than one core).

In [None]:
#note: depending on your machine this may take around 20 seconds to run.
forecasts = multi_region_forecast(y_train=clean, horizon=8)
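
The idea of forecasting one region per CPU core can be sketched with the standard library alone. This is not how `multi_region_forecast()` is implemented; `naive_forecast` is a hypothetical stand-in for the ensemble, and `forecast_regions` is an illustrative name.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Dict, List, Optional, Sequence


def naive_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Hypothetical stand-in model: repeat the last observation."""
    return [history[-1]] * horizon


def forecast_regions(regions: Dict[str, Sequence[float]],
                     horizon: int,
                     model: Callable[[Sequence[float], int], List[float]] = naive_forecast,
                     executor_cls=ProcessPoolExecutor,
                     max_workers: Optional[int] = None) -> List[List[float]]:
    """Forecast every region concurrently and keep the input order.

    With ProcessPoolExecutor and max_workers=None, the pool size
    defaults to the number of processors on the machine.
    """
    with executor_cls(max_workers=max_workers) as pool:
        # Submit one task per region; each runs in its own worker.
        futures = [pool.submit(model, series, horizon)
                   for series in regions.values()]
        # Collect results in submission order, not completion order.
        return [future.result() for future in futures]
```

Collecting results in submission order means the output list lines up with the order of the regions in the input, whichever worker finishes first.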

In [None]:
#the function returns a list of pd.DataFrame objects
type(forecasts)

In [None]:
#results for BNSSG
forecasts[0]

In [None]:
#results for Cornwall are at index 1, and so on in column order.
forecasts[1]
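
Looking results up by position is easy to get wrong, so it may help to key them by region name. The sketch below assumes that the returned list follows the column order of the training frame, which matches BNSSG at index 0 and Cornwall at index 1 above but is not otherwise confirmed here. The helper `label_forecasts` is hypothetical.

```python
from typing import Dict, Iterable, TypeVar

T = TypeVar("T")


def label_forecasts(region_names: Iterable[str],
                    forecasts: Iterable[T]) -> Dict[str, T]:
    """Pair each forecast with its region name.

    Assumes both inputs are in the same order; raises if the
    lengths differ so a mismatch is not silently truncated.
    """
    names = list(region_names)
    results = list(forecasts)
    if len(names) != len(results):
        raise ValueError("need exactly one forecast per region")
    return dict(zip(names, results))


# Placeholder strings stand in for the forecast data frames.
by_region = label_forecasts(["BNSSG", "Cornwall"], ["frame_0", "frame_1"])
```

With the objects from this notebook, the equivalent call would be `label_forecasts(clean.columns, forecasts)`, after which `by_region['Cornwall']` replaces `forecasts[1]`.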