# SWAST Forecasting Tool

An ensemble of Regression with ARIMA Errors and Facebook Prophet

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


import warnings
warnings.filterwarnings('ignore')

Check the installed statsmodels version and the current working directory.

In [5]:
import statsmodels as sm
print(sm.__version__)

import os

os.getcwd()

0.14.2


'/home/alison/Dropbox/code/swast-forecast-tool/swast-forecast-tool'

In [6]:
from swast_forecast.utility import (pre_process_daily_data, 
                                    default_ensemble,
                                    forecast, 
                                    multi_region_forecast)

## Constants

In [7]:
PATH = 'data/Daily_Responses_5_Years_2019_full.csv'

## Read in the data

In [8]:
clean = pre_process_daily_data(path=PATH, 
                               observation_col='Actual_Value', 
                               index_col='Actual_dt')
clean.head()

FileNotFoundError: [Errno 2] No such file or directory: '../code/ambo_data/Daily_Responses_5_Years_2019_full.csv'

## Creating and fitting an Ensemble model to a region

The easiest way to create an ensemble model is to call the `default_ensemble()` function from the `utility` module.  This returns the best known forecasting model.

In [9]:
model = default_ensemble()
model

ProphetARIMAEnsemble(order=(1, 1, 3), seasonal_order=(1, 0, 1, 7), prophet_default_alpha={self.alpha})

The output above shows that the ensemble includes a Regression model with ARIMA errors with parameters (1, 1, 3)(1, 0, 1, 7).  By default the Prophet model creates an 80\% prediction interval, i.e. a 100(1 - alpha)\% interval with alpha = 0.2.

To fit the model, call the `.fit()` method and pass in a `pd.Series` (or `pd.DataFrame`) that contains the historical observations.  By default you do not need to pass in holidays.  The ensemble models New Year's Day automatically (via Prophet's holidays function and as a dummy variable in the Regression with ARIMA errors).

In [23]:
#example - fitting Wiltshire - this will take a few seconds.
model.fit(clean['Wiltshire'])
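
Before fitting, it can help to confirm that the training series looks the way a daily model is likely to expect. The sketch below is not part of the tool: it builds a hypothetical stand-in series with synthetic values (the real `clean['Wiltshire']` comes from the CSV) and assumes the index should be a gap-free daily `DatetimeIndex` with no missing observations.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one region's history: 28 days of synthetic counts.
# These values are made up for illustration, not taken from the SWAST data.
rng = np.random.default_rng(42)
index = pd.date_range("2019-12-01", periods=28, freq="D", name="Actual_dt")
y_train = pd.Series(rng.poisson(lam=330, size=28), index=index, name="Wiltshire")

# Assumed requirement: every calendar day between the first and last
# observation is present in the index.
expected_days = pd.date_range(y_train.index.min(), y_train.index.max(), freq="D")
missing_days = expected_days.difference(y_train.index)

# Assumed requirement: no missing observations.
has_missing_values = bool(y_train.isna().any())

print(f"missing days: {len(missing_days)}, missing values: {has_missing_values}")
```

If either check fails on real data, the gaps would need handling before the series is passed to `.fit()`.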

## Forecasting an individual region.

Use the `.predict()` method to make a forecast.  The method takes 3 parameters:

* **horizon**: int - the forecast horizon e.g. 84 days
* **alpha**: float, optional (default=0.05) - a value between 0 and 1 used to construct a 100(1 - alpha)\% prediction interval. E.g. alpha=0.2 returns an 80\% interval.  
* **return_all_models**: bool, optional (default=False). If set to True, returns the ensemble prediction AND the Prophet and Regression predictions.
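
The relationship between `alpha` and the interval level can be sketched in a few lines. The helper `interval_level` below is hypothetical and not part of the `swast_forecast` package; it only reproduces the 100(1 - alpha) arithmetic described above.

```python
def interval_level(alpha: float) -> int:
    """Return the prediction interval level, in percent, implied by alpha.

    Hypothetical helper for illustration only; not part of swast_forecast.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # 100(1 - alpha), rounded to the nearest whole percent.
    return round(100 * (1 - alpha))


for alpha in (0.05, 0.2):
    print(f"alpha={alpha} -> {interval_level(alpha)}% prediction interval")
```

The resulting levels correspond to the suffixes on the interval columns in the outputs in this notebook (for example `yhat_lower_95` and `yhat_lower_80`).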

In [8]:
#example 1: predict 7 days ahead - remember we have fitted Wiltshire training data.
forecast_frame = model.predict(horizon=7)
forecast_frame

ds,yhat,yhat_lower_95,yhat_upper95
2020-01-01,389.033671,351.919364,425.335036
2020-01-02,328.488021,292.561903,365.33434
2020-01-03,332.563272,294.420597,369.72371
2020-01-04,348.733984,311.278243,385.298249
2020-01-05,348.858335,311.9749,384.266166
2020-01-06,335.638559,298.904823,373.145688
2020-01-07,326.682607,289.616191,364.330569


The method returns a `pd.DataFrame` containing the mean forecast (yhat) and the lower and upper bounds of the prediction interval.  The code below demonstrates how to return predictions from both the ARIMA and Prophet models.  We will also return a different prediction interval.

In [9]:
#example 2: predict 7 days ahead, return 80% PI and individual model preds
forecast_frame = model.predict(horizon=7, alpha=0.2, return_all_models=True)
forecast_frame

ds,yhat,yhat_lower_80,yhat_upper80,arima_mean,arima_lower_80,arima_upper_80,prophet_mean,prophet_lower_80,prophet_upper_80
2020-01-01,389.033671,365.903063,412.236253,381.652918,357.811543,405.494293,396.414423,373.994583,418.978212
2020-01-02,328.488021,304.336866,352.826624,328.893489,304.534189,353.252789,328.082554,304.139543,352.400458
2020-01-03,332.563272,307.712967,355.896359,333.117808,308.470773,357.764844,332.008735,306.955161,354.027874
2020-01-04,348.733984,323.96889,372.477904,347.836168,322.927592,372.744744,349.631799,325.010187,372.211064
2020-01-05,348.858335,324.436039,373.209843,348.568032,323.420965,373.715099,349.148638,325.451113,372.704588
2020-01-06,335.638559,310.761936,360.863522,337.437881,312.07267,362.803092,333.839236,309.451202,358.923952
2020-01-07,326.682607,302.811351,350.021557,330.057359,304.492022,355.622696,323.307855,301.13068,344.420418
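
In the output above, `yhat` matches the unweighted mean of `arima_mean` and `prophet_mean` to the displayed precision, and the same holds for the interval bounds. The check below uses values copied from the first three rows of the table; it suggests the ensemble is a simple average of the two models, although the package internals are not shown in this notebook, so treat that as an inference rather than a documented guarantee.

```python
# Values copied from the first three rows of the table above.
arima_mean = [381.652918, 328.893489, 333.117808]
prophet_mean = [396.414423, 328.082554, 332.008735]
reported_yhat = [389.033671, 328.488021, 332.563272]

# Unweighted mean of the two component forecasts, day by day.
averaged = [(a + p) / 2 for a, p in zip(arima_mean, prophet_mean)]

# Agreement to within rounding of the six displayed decimals.
max_error = max(abs(ours - reported)
                for ours, reported in zip(averaged, reported_yhat))

# The same check for the lower 80% bound on 2020-01-01.
averaged_lower = (357.811543 + 373.994583) / 2
lower_error = abs(averaged_lower - 365.903063)

print(f"largest yhat difference: {max_error:.1e}")
print(f"lower bound difference:  {lower_error:.1e}")
```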


## An 'all in one' forecast function

As an alternative to the above, the `utility` module contains a convenience function called `forecast`.  This is an all-in-one function: pass in your training data (for a single time series) and a horizon, and it returns the forecast.

In [10]:
forecast(clean['Wiltshire'], 
         horizon=6, 
         alpha=0.2,
         return_all_models=False)

ds,yhat,yhat_lower_80,yhat_upper80
2020-01-01,389.033671,365.330475,413.303734
2020-01-02,328.488021,305.111123,353.09287
2020-01-03,332.563272,308.295871,357.659112
2020-01-04,348.733984,324.316497,373.217938
2020-01-05,348.858335,324.066967,373.264873
2020-01-06,335.638559,311.172885,360.231234


## Forecasting multiple regions in one go.

If there are multiple regions to forecast, put all of the training data into the same frame (see `clean`) and pass this to the `multi_region_forecast()` function from the `utility` module.

This is an efficient function as it runs the forecasts in parallel across your CPU cores.  E.g. if you have 4 cores then 4 regions will be forecast simultaneously.  This will reduce model run time (assuming you have more than one core).

In [11]:
#note: depending on your machine, this may take around 20 seconds to run.
forecasts = multi_region_forecast(y_train=clean, horizon=7)
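
The idea of forecasting one region per core can be sketched with the standard library. This is not how `multi_region_forecast()` is implemented (its internals are not shown here); `naive_forecast` and `forecast_regions` are hypothetical names, the per-region "model" simply repeats the historical mean, and the training values are toy numbers.

```python
from concurrent.futures import ProcessPoolExecutor
import os


def naive_forecast(item):
    """Hypothetical stand-in for a per-region model: repeat the mean of history."""
    region, history, horizon = item
    mean = sum(history) / len(history)
    return region, [mean] * horizon


def forecast_regions(training, horizon, max_workers=None):
    """Forecast each region in a separate worker process."""
    items = [(region, history, horizon) for region, history in training.items()]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(naive_forecast, items))


if __name__ == "__main__":
    # Toy values for illustration only; not SWAST data.
    training = {"region_a": [1.0, 2.0, 3.0], "region_b": [10.0, 20.0, 30.0]}
    results = forecast_regions(training, horizon=2, max_workers=os.cpu_count())
    for region, predictions in results:
        print(region, predictions)
```

With more regions than cores, the extra regions wait until a worker process becomes free, so the speed-up is bounded by the number of cores.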

In [12]:
#the function returns a list of pd.DataFrame objects
type(forecasts)

list

In [13]:
#results for BNSSG
forecasts[0]

ds,yhat,yhat_lower_95,yhat_upper95
2020-01-01,667.046404,619.578479,713.810816
2020-01-02,545.976607,498.663481,593.498654
2020-01-03,553.843168,504.492593,601.66142
2020-01-04,588.031241,539.018238,635.780397
2020-01-05,579.76199,528.787633,627.440772
2020-01-06,554.88195,505.055762,603.628492
2020-01-07,537.738373,487.071878,585.8506


In [14]:
#results for Cornwall are at index 1, etc.
forecasts[1]

ds,yhat,yhat_lower_95,yhat_upper95
2020-01-01,317.561603,285.27376,349.390984
2020-01-02,252.695551,220.00244,284.571467
2020-01-03,255.66932,222.614295,288.650042
2020-01-04,271.179998,238.173673,303.84805
2020-01-05,274.010956,241.990156,306.655835
2020-01-06,261.538837,228.260275,294.836532
2020-01-07,250.725712,218.823505,283.241629
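
Looking results up by list position is easy to get wrong, so it may be convenient to key them by region name. The sketch below assumes the list order follows the column order of the training frame, as the BNSSG and Cornwall examples above suggest. The placeholder strings stand in for the real DataFrames, and `region_names` would be `list(clean.columns)` in the notebook.

```python
# Assumed to mirror the column order of the training frame.
region_names = ["BNSSG", "Cornwall"]

# Placeholders standing in for the DataFrames returned by the function.
forecasts = ["<BNSSG forecast frame>", "<Cornwall forecast frame>"]

if len(region_names) != len(forecasts):
    raise ValueError("expected one forecast per region")

# Pair each region name with its forecast, preserving order.
forecasts_by_region = dict(zip(region_names, forecasts))

print(forecasts_by_region["Cornwall"])
```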
