# Assignment 1: Time Series Forecast With Python (Seasonal ARIMA)

**Lecturer:** Vincent Claes<br>
**Authors:** Bryan Honof, Jeffrey Gorissen<br>
**Start Date:** 19/10/2018
    
**Objective:** Visualize and predict the future temperatures via ARIMA

**Description:** In this notebook we search for the best ARIMA and Seasonal ARIMA parameters for our model

In [14]:
import math
import warnings
import datetime

import pandas            as pd
import itertools         as it
import statsmodels.api   as sm
import matplotlib.pyplot as plt
import numpy             as np

from sklearn.metrics             import mean_absolute_error
from statsmodels.tsa.arima.model import ARIMA  # arima_model was removed in newer statsmodels releases

warnings.filterwarnings("ignore") # specify to ignore warning messages

In [2]:
data_csv = pd.read_csv('./data/rolmean_data.csv')
data = pd.DataFrame()

# Convert the dateTime column to datetime64
data['dateTime'] = pd.to_datetime(data_csv['dateTime'])
# Convert the temperature column to float
data['temperature'] = pd.to_numeric(data_csv['temperature'])

# Set the dateTime column as index
data = data.set_index(['dateTime'])

# Sort the dataFrame just to be sure...
data = data.sort_index()

data = data.dropna()

# Double check the results
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 892 entries, 2018-11-11 03:00:00 to 2018-11-20 09:45:00
Data columns (total 1 columns):
temperature    892 non-null float64
dtypes: float64(1)
memory usage: 13.9 KB


In [3]:
data.tail(5)

dateTime,temperature
2018-11-20 08:45:00,16.54
2018-11-20 09:00:00,16.88
2018-11-20 09:15:00,17.16
2018-11-20 09:30:00,17.41
2018-11-20 09:45:00,17.97


## Search for best parameters


```p``` is the auto-regressive part of the model. It lets us incorporate the effect of past values into the model. Intuitively, this is similar to stating that it is likely to be warm tomorrow if it has been warm for the past 3 days.<br>
```d``` is the integrated part of the model. It sets the amount of differencing (i.e. the number of times the previous value is subtracted from the current value) to apply to the time series. Intuitively, this is similar to stating that it is likely to be the same temperature tomorrow if the difference in temperature over the last three days has been very small.<br>
```q``` is the moving average part of the model. It lets us model the error as a linear combination of the error values observed at previous time points.
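
To make the role of ```d``` more concrete, here is a minimal sketch of differencing with pandas. The temperature values are hypothetical and only chosen to show a steady trend; they are not taken from our data set.

```python
import pandas as pd

# Hypothetical temperature readings with a steady upward trend
temps = pd.Series([15.0, 15.5, 16.0, 16.5, 17.0])

# d = 1: subtract the previous value from each value
first_diff = temps.diff().dropna()

# d = 2: difference the already differenced series once more
second_diff = first_diff.diff().dropna()

print(first_diff.tolist())   # [0.5, 0.5, 0.5, 0.5]
print(second_diff.tolist())  # [0.0, 0.0, 0.0]
```

After one round of differencing the trend is gone and the series is constant, which is the kind of stationary input the auto-regressive and moving average parts expect.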

In [27]:
# Tune ARIMA model
# Define the p, d and q parameters to take any value between 0 and 9
p = d = q = range(0, 10)

# Generate all different combinations of p, d and q triplets
pdq = list(it.product(p, d, q))
#print(pdq)

print('Examples of parameter combinations for ARIMA...')
print('ARIMA: {}'.format(pdq[1]))
print('ARIMA: {}'.format(pdq[2]))

Examples of parameter combinations for ARIMA...
ARIMA: (0, 0, 1)
ARIMA: (0, 0, 2)
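
Every combination in the grid means one model fit, so it helps to know how large the grid is before starting the search. The sketch below only counts combinations for the ranges used in this notebook; the variable names are our own and it does not fit any model.

```python
import itertools as it

# Non-seasonal grid: p, d and q each take the values 0..9
arima_grid = list(it.product(range(0, 10), repeat=3))

# Seasonal grid: 8 (p, d, q) triplets combined with 8 seasonal triplets
small = list(it.product(range(0, 2), repeat=3))
sarimax_grid = list(it.product(small, small))

print(len(arima_grid), len(sarimax_grid))  # 1000 64
```

The ARIMA search therefore attempts 1000 fits, which explains why that cell takes a long time to run.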


In [None]:
result_list = []
AIC         = []
MAE         = []
parm_       = []

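for param in pdq:
    try:
        mod     = ARIMA(data, order=param)
        results = mod.fit()

        # In-sample predictions, compared only at timestamps that have one
        predict = results.predict()
        mae     = mean_absolute_error(data.loc[predict.index], predict)

        # Append only after every step succeeded so the lists stay aligned
        AIC.append(results.aic)
        MAE.append(mae)
        parm_.append(param)

        print('ARIMA{} - AIC:{} - MAE:{}'.format(param, round(results.aic, 2), round(mae, 2)))
        result_list.extend([param, round(results.aic, 2)])

    except Exception as err:
        # Some orders fail to fit; report which one and why, then move on
        print('error!', param, err)
        continue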
            
print('Done!')

In [None]:
pos = AIC.index(min(AIC))
print(parm_[pos], min(AIC))

In [None]:
pos = MAE.index(min(MAE))
print(parm_[pos], min(MAE))
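
Looking up the best parameters through parallel lists works, but it relies on every list having the same length. As a possible alternative, the scores could be kept in one DataFrame. The sketch below uses hypothetical AIC and MAE values purely to show the bookkeeping; they are not results from our data.

```python
import pandas as pd

# Hypothetical scores, only to illustrate the bookkeeping; not real results
records = [
    {'order': (0, 0, 1), 'aic': 1200.0, 'mae': 0.90},
    {'order': (1, 1, 1), 'aic': 950.0,  'mae': 0.40},
    {'order': (2, 1, 0), 'aic': 980.0,  'mae': 0.35},
]
scores = pd.DataFrame(records)

# idxmin returns the row label of the smallest value in a column
best_by_aic = scores.loc[scores['aic'].idxmin(), 'order']
best_by_mae = scores.loc[scores['mae'].idxmin(), 'order']

print(best_by_aic, best_by_mae)  # (1, 1, 1) (2, 1, 0)
```

As the hypothetical numbers show, the lowest AIC and the lowest MAE do not have to point to the same order, which is why we look at both.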

In [None]:
# Tune Seasonal ARIMA model
# Define the p, d and q parameters to take the value 0 or 1
p = d = q = range(0,2)

# Generate all different combinations of p, d and q triplets
pdq = list(it.product(p, d, q))
print(pdq)

# Generate all different combinations of seasonal P, D and Q triplets
# The seasonal period is 96: one day of 15-minute samples (24 h x 4 per hour)
seasonal_pdq = [(x[0], x[1], x[2], 96) for x in list(it.product(p, d, q))]

print('Examples of parameter combinations for Seasonal ARIMA...')
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[1]))
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[2]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[3]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[4]))
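
The seasonal period of 96 corresponds to one day of measurements at 15-minute intervals. A minimal sketch of how that number could be derived from a DatetimeIndex is shown below; the index is a hypothetical one built with the same 15-minute spacing as our data, and the variable names are our own.

```python
import pandas as pd

# Hypothetical index with the same 15-minute spacing as our data
index = pd.date_range('2018-11-11 03:00:00', periods=8, freq='15min')

# Most common spacing between consecutive timestamps
step = index.to_series().diff().dropna().mode()[0]

# Number of samples that fit in one day
samples_per_day = int(pd.Timedelta(days=1) / step)

print(samples_per_day)  # 96
```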

In [None]:
result_list = []
AIC         = []
parm_       = []
parm_s      = []

for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(data,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)
            results = mod.fit()
            AIC.append(results.aic)
            parm_.append(param)
            parm_s.append(param_seasonal)
            
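            print('SARIMAX{}x{} - AIC:{}'.format(param, param_seasonal, round(results.aic,2)))
            result_list.extend([param, param_seasonal, round(results.aic,2)])
        except Exception as err:
            # Report which combination failed and why, then move on
            print('error', param, param_seasonal, err)
            continue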
            
print('Done!')

In [None]:
min(AIC)
pos = AIC.index(min(AIC))
print(parm_[pos], parm_s[pos], min(AIC))

[next notebook](./5_fitting_and_predicting.ipynb)