# __AR(I)MA__

# Introduction to ARIMA Models
We'll investigate a variety of different forecasting models in upcoming sections, all of which are based on ARIMA.

<b>ARIMA</b>, or <i>Autoregressive Integrated Moving Average</i>, is a combination of 3 models:
* <b>AR(p)</b> Autoregression - a regression model that utilizes the dependent relationship between a current observation and a number of lagged (previous) observations
* <b>I(d)</b> Integration - uses differencing of observations (subtracting the observation at the previous time step from the current observation) in order to make the time series stationary
* <b>MA(q)</b> Moving Average - a model that expresses the current observation in terms of the current and past error terms, i.e. the residuals of previous time steps.

#### __Autoregression _AR(p)___

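$y_{t} = c + \phi_{1}y_{t-1} + \phi_{2}y_{t-2} + \dots + \phi_{p}y_{t-p} + \varepsilon_{t}$ <br/>
<br/>
$c$: constant <br/>
$\phi_{1},\dots,\phi_{p}$: autoregressive coefficients <br/>
$\varepsilon_{t}$: white-noise error term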
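To make the AR(p) equation concrete, here is a minimal sketch of a one-step prediction. The helper `ar_predict_one_step` and the coefficients are hypothetical, chosen for illustration only; the error term is replaced by its expected value of zero.

```python
import numpy as np

def ar_predict_one_step(c, phi, history):
    """Hypothetical helper: one-step AR(p) prediction.

    Computes c + phi_1*y_{t-1} + ... + phi_p*y_{t-p}; `history` is ordered
    oldest -> newest and the error term is set to its expected value, zero.
    """
    p = len(phi)
    lags = np.asarray(history[-p:], dtype=float)[::-1]  # y_{t-1}, y_{t-2}, ..., y_{t-p}
    return c + float(np.dot(phi, lags))

# Hypothetical AR(2) coefficients: 1 + 0.5*10 + 0.2*8
print(ar_predict_one_step(c=1.0, phi=[0.5, 0.2], history=[3.0, 8.0, 10.0]))  # ≈ 7.6
```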

#### __Differencing _I(d)___

$y'_{t} = y_t - y_{t-1}$ <br/>
<br/>
$y''_{t} = y'_{t} - y'_{t-1} = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2})$
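The two differencing formulas map directly onto pandas' `Series.diff()`. In this small sketch the toy series $y_t = t^2$ is an assumption for illustration: its quadratic trend survives one difference and only becomes constant after the second, which is the situation where $d=2$ is needed.

```python
import pandas as pd

y = pd.Series([1.0, 4.0, 9.0, 16.0, 25.0])  # toy series y_t = t^2 (t = 1..5)

d1 = y.diff()          # y'_t  = y_t - y_{t-1}; first value is NaN
d2 = y.diff().diff()   # y''_t = y'_t - y'_{t-1}; first two values are NaN

# The second difference expands to y_t - 2*y_{t-1} + y_{t-2}
expanded = y - 2 * y.shift(1) + y.shift(2)

print(d1.dropna().tolist())  # [3.0, 5.0, 7.0, 9.0]
print(d2.dropna().tolist())  # [2.0, 2.0, 2.0]
```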

#### __Moving average model _MA(q)___

$y_{t} = c + \varepsilon_{t} + \theta_{1}\varepsilon_{t-1} + \theta_{2}\varepsilon_{t-2} + \dots + \theta_{q}\varepsilon_{t-q}$ <br/>
<br/>
$c$: mean <br/>
$\theta_{1},\dots,\theta_{q}$: moving average coefficients <br/>
$\varepsilon$: white-noise error term

__Example__

$\Delta lemonade_t = \varepsilon_t - 0.5\varepsilon_{t-1}$

where $\Delta lemonade_t$ is the change in lemonade consumption and $\varepsilon_t$ represents a change in temperature $\Delta T$, both at time $t$. Now assume a temperature increase at time $t$ ($\Delta T > 0$, so $\varepsilon_t > 0$), while the temperature remains constant at time $t+1$ ($\Delta T = 0$, so $\varepsilon_{t+1} = 0$). Then we get:

$\Delta lemonade_{t+1} = \varepsilon_{t+1} - 0.5\varepsilon_{t} = -0.5\varepsilon_{t} < 0$

Hence, lemonade sales decrease at time $t+1$.
One way to interpret this example is that when the temperature increased at time $t$, people purchased more lemonade than they could consume at time $t$. They consumed some of that lemonade at time $t+1$ and therefore purchased less.
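The lemonade example can be replayed numerically as a sketch. The shock size ($\Delta T = 2$) is an assumption for illustration; the MA(1) coefficient $-0.5$ comes from the equation above. Note that the shock affects the output for exactly one period after it occurs: an MA(q) process has a memory of $q$ steps.

```python
import numpy as np

theta = -0.5                             # MA(1) coefficient from the lemonade example
eps = np.array([0.0, 2.0, 0.0, 0.0])     # hypothetical shock: Delta T = 2 at index 1 only

# Delta lemonade_t = eps_t + theta * eps_{t-1}; the error before index 0 is taken as 0
eps_lag = np.concatenate(([0.0], eps[:-1]))
delta_lemonade = eps + theta * eps_lag

print(delta_lemonade.tolist())  # [0.0, 2.0, -1.0, 0.0]
```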

#### __ARIMA(p,d,q)__

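ARIMA combines the three components above: the AR and MA terms are applied to the series after it has been differenced $d$ times, where $y'_{t}$ below stands for that differenced series.

$y'_{t} = c + \phi_{1}y'_{t-1} + \dots + \phi_{p}y'_{t-p} + \theta_{1}\varepsilon_{t-1} + \dots + \theta_{q}\varepsilon_{t-q} + \varepsilon_{t}$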
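A minimal sketch of how an ARIMA point forecast is assembled, for the simple case ARIMA(1,1,0) (no MA terms, to keep it short). The function `arima_110_forecast` and its parameter values are hypothetical; statsmodels estimates $c$ and $\phi$ from the data, whereas here they are fixed by hand. The three steps are: difference once, forecast the differenced series with AR(1), then add each forecast change back onto the last level.

```python
import numpy as np

def arima_110_forecast(y, c, phi, steps):
    """Hypothetical hand-rolled ARIMA(1,1,0) point forecast.

    1. Difference once:          y'_t = y_t - y_{t-1}
    2. Forecast y' with AR(1):   y'_t = c + phi * y'_{t-1}  (future errors set to 0)
    3. Integrate back:           y_t  = y_{t-1} + y'_t
    """
    y = np.asarray(y, dtype=float)
    last_diff = y[-1] - y[-2]   # most recent observed first difference
    level = y[-1]               # most recent observed level
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # forecast of the differenced series
        level = level + last_diff        # undo the differencing
        forecasts.append(level)
    return np.array(forecasts)

# Hypothetical parameters, chosen for illustration only
print(arima_110_forecast([10.0, 12.0, 15.0], c=0.0, phi=0.5, steps=3))
```

With these values the forecast changes shrink (1.5, 0.75, 0.375), so the levels rise toward a plateau instead of extrapolating the last change linearly.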

In [None]:
import pandas as pd
import numpy as np

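# NOTE: statsmodels.tsa.arima_model is the legacy API and was removed in statsmodels 0.13,
# so this notebook needs statsmodels < 0.13 (its replacement is statsmodels.tsa.arima.model.ARIMA)
from statsmodels.tsa.arima_model import ARMA,ARMAResults,ARIMA,ARIMAResults
from statsmodels.graphics.tsaplots import plot_acf,plot_pacf
from pmdarima import auto_arima # in order to determine ARIMA orders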

import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

In [None]:
df1 = pd.read_csv('../data/DailyTotalFemaleBirths.csv',index_col='Date',parse_dates=True)
df1.index.freq = 'D'
df1 = df1[:120]  # we only want the first four months

df2 = pd.read_csv('../data/TradeInventories.csv',index_col='Date',parse_dates=True)
df2.index.freq='MS'

In [None]:
print(df1.info())
df1.head()

In [None]:
print(df2.info())
df2.head()

## __Augmented Dickey-Fuller Test__
The Augmented Dickey-Fuller Test checks a time series for stationarity. Note that the ARMA part of ARIMA assumes a stationary series; the I(d) part supplies the differencing needed to get there, and more than one round of differencing may be necessary.

In [None]:
# this is a convenient function to conduct the Augmented Dickey-Fuller Test

from statsmodels.tsa.stattools import adfuller

def adf_test(series,title=''):
    """
    Pass in a time series and an optional title, returns an ADF report
    """
    print(f'Augmented Dickey-Fuller Test: {title}')
    result = adfuller(series.dropna(),autolag='AIC') # .dropna() since differencing produces NaNs
    
    labels = ['ADF test statistic','p-value','# lags used','# observations']
    out = pd.Series(result[0:4],index=labels)

    for key,val in result[4].items():
        out[f'critical value ({key})']=val
        
    print(out.to_string())          # .to_string() removes the line "dtype: float64"
    
    if result[1] <= 0.05:
        print("Strong evidence against the null hypothesis")
        print("Reject the null hypothesis")
        print("Data has no unit root and is stationary")
    else:
        print("Weak evidence against the null hypothesis")
        print("Fail to reject the null hypothesis")
        print("Data has a unit root and is non-stationary")

## __ARMA__

ARMA(p,q) is just the special case ARIMA(p,0,q): it applies to stationary data that need no differencing.

In [None]:
plt.style.use('ggplot')

df1['Births'].plot(figsize=(10,5), color='blue');

### __Dickey-Fuller Test to check for stationarity__

$H_0$: Time series has a unit root and IS NOT stationary (we fail to reject $H_0$ if p-value > 0.05)

$H_1$: Time series IS stationary (we reject $H_0$ in favour of $H_1$ if p-value <= 0.05)

In [None]:
adf_test(df1['Births'])

### __Determine the (p,q) ARMA orders using pmdarima.auto_arima__
This tool searches over candidate orders and recommends the best p and q, analogous to a grid search in scikit-learn (by default it searches stepwise rather than exhaustively).

__Usage of Akaike Information Criterion (AIC)__

AIC evaluates a collection of models and estimates the quality of each model __relative__ to the others. __Penalties__ are applied for the number of parameters used, in an effort to prevent overfitting.

The preferred model is the one with the lowest AIC among the candidates.
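AIC is defined as $\text{AIC} = 2k - 2\ln(\hat{L})$, where $k$ is the number of estimated parameters and $\hat{L}$ the maximized likelihood. The sketch below compares two candidates; the log-likelihoods and parameter counts are made-up numbers, not results for this dataset. It shows the penalty at work: the larger model fits slightly better (higher log-likelihood) but still loses on AIC.

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion: AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

# Hypothetical log-likelihoods and parameter counts (illustration only)
candidates = {
    "ARMA(1,1)": aic(log_likelihood=-410.0, k=4),
    "ARMA(2,2)": aic(log_likelihood=-409.0, k=6),
}
best = min(candidates, key=candidates.get)  # lowest AIC wins
print(candidates, best)  # {'ARMA(1,1)': 828.0, 'ARMA(2,2)': 830.0} ARMA(1,1)
```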

In [None]:
auto_arima(df1['Births'],seasonal=False).summary()

### __Test / train split__
Univariate time series forecasts leave little room for feature engineering: apart from providing the data, there is nothing to tweak. Hence, the risk of overfitting to the existing dataset is small, which is why we do not split the dataset into train / validation / test here, but only into train and test data.

Rule of thumb: set the length of your test set equal to your intended forecast horizon. Here: one month (the last 30 of the 120 days).
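The rule of thumb translates into a chronological split that holds out the last `horizon` rows; unlike a random split, it never lets the training data see the future. The helper `split_by_horizon` and the toy frame are hypothetical, built only to mirror the 120-day slice of `df1`.

```python
import pandas as pd

def split_by_horizon(df, horizon):
    """Hypothetical helper: hold out the last `horizon` rows as the test set."""
    return df.iloc[:-horizon], df.iloc[-horizon:]

# Toy daily frame with 120 rows (made-up values, arbitrary start date)
toy = pd.DataFrame({"Births": range(120)},
                   index=pd.date_range("2000-01-01", periods=120, freq="D"))
train_toy, test_toy = split_by_horizon(toy, horizon=30)
print(len(train_toy), len(test_toy))  # 90 30
```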

In [None]:
train = df1.iloc[:90]
test = df1.iloc[90:]

### __Fit ARMA(p,q) model__
Also check out help(ARMA) to learn what incoming arguments are available/expected, and what's being returned.

In [None]:
model = ARMA(train['Births'],order=(2,2))
results = model.fit()
results.summary()

The order (2,2) follows the auto_arima recommendation above; the summary lists the fitted coefficients of the ARMA(2,2) model.

### __Predicted values for single month__

In [None]:
start=len(train)
end=len(train)+len(test)-1
predictions = results.predict(start=start, end=end).rename('ARMA(2,2) Predictions')

### __Plot predictions vs actuals__

In [None]:
title = 'Daily Total Female Births'
ylabel='Births'
xlabel='' # we don't really need a label here

ax = test['Births'].plot(legend=True,figsize=(12,6),title=title)
predictions.plot(legend=True)
ax.autoscale(axis='x',tight=True)
ax.set(xlabel=xlabel, ylabel=ylabel);

Since our starting dataset exhibited no trend or seasonal component, this prediction makes sense: the forecast of a stationary ARMA model reverts toward the series mean as the horizon grows.
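The mean reversion can be seen with a hand-computed AR(1) forecast. This is a sketch with made-up parameters ($c = 10$, $\phi = 0.8$), not the fitted ARMA(2,2): each step sets the future error to zero, and because $|\phi| < 1$ the forecast converges to the long-run mean $c / (1 - \phi)$.

```python
# Hypothetical AR(1): y_t = c + phi * y_{t-1} + eps_t with |phi| < 1
c, phi = 10.0, 0.8
mean = c / (1 - phi)          # long-run mean of the process, about 50

forecast = []
y_prev = 70.0                 # last observed value (made up), above the mean
for _ in range(30):
    y_prev = c + phi * y_prev # expected next value; future errors set to 0
    forecast.append(y_prev)

print(mean, forecast[0], forecast[-1])
```

The gap to the mean shrinks by the factor $\phi$ at every step, so after 30 steps only $20 \cdot 0.8^{30}$ of the initial gap remains.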

## __ARIMA__

Now we will use a non-stationary dataset, which therefore requires differencing (_I_).

In [None]:
import matplotlib.ticker as ticker 
formatter = ticker.StrMethodFormatter('{x:,.0f}') # format y-axis tick labels with thousands separators

title = 'Real Manufacturing and Trade Inventories'
ylabel='Chained 2012 Dollars'
xlabel='' # we don't really need a label here

ax = df2['Inventories'].plot(figsize=(12,5),title=title, color='blue')
ax.autoscale(axis='x',tight=True)
ax.set(xlabel=xlabel, ylabel=ylabel)
ax.yaxis.set_major_formatter(formatter);

### __Run seasonal_decompose to check for potential seasonality__

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

result = seasonal_decompose(df2['Inventories'], model='additive')  # model='add' also works
result.plot();

The decomposition detects a seasonal component, but its magnitude is small, so its effect is minor. We therefore treat our dataset as non-seasonal.
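"Small" can be quantified. One option, popularized in Hyndman and Athanasopoulos' *Forecasting: Principles and Practice*, is the strength of seasonality $F_S = \max\left(0,\ 1 - \frac{\text{Var}(R_t)}{\text{Var}(S_t + R_t)}\right)$, computed from the seasonal ($S_t$) and residual ($R_t$) components of an additive decomposition. The function below is a sketch of that measure; its name is our own, and no threshold for "non-seasonal" is implied by the original analysis.

```python
import numpy as np

def seasonal_strength(seasonal, resid):
    """Hypothetical helper: strength of seasonality F_S = max(0, 1 - Var(R) / Var(S + R)).

    Values near 0 indicate weak seasonality, values near 1 strong seasonality.
    """
    seasonal = np.asarray(seasonal, dtype=float)
    resid = np.asarray(resid, dtype=float)
    mask = ~(np.isnan(seasonal) | np.isnan(resid))  # decomposition edges contain NaNs
    s, r = seasonal[mask], resid[mask]
    return max(0.0, 1.0 - np.var(r) / np.var(s + r))
```

Applied to the decomposition above, this would be `seasonal_strength(result.seasonal, result.resid)`.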

In [None]:
auto_arima(df2['Inventories'],seasonal=False).summary()

This suggests we should use an ARIMA(1,1,1) to fit our data.

### __Let's check all this manually (demonstration purpose only)__

In [None]:
#let's difference once as suggested by auto_arima and see what the adf_test tells us
from statsmodels.tsa.statespace.tools import diff
df2['d1'] = diff(df2['Inventories'],k_diff=1)

adf_test(df2['d1'],'Real Manufacturing and Trade Inventories')

This suggests that we reached stationarity after the first difference as expected from auto_arima.

### __Run the ACF and PACF plots__
A pacf plot can reveal recommended AR(p) orders, and an acf plot can do the same for MA(q) orders.
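For reference, the sample autocorrelation at lag $k$ is $r_k = \sum_t (x_t - \bar{x})(x_{t-k} - \bar{x}) / \sum_t (x_t - \bar{x})^2$, the standard estimator behind ACF plots such as `plot_acf`. The sketch below (the function name `sample_acf` and the simulated series are our own) computes it for a simulated AR(1) with $\phi = 0.7$, whose theoretical ACF $0.7^k$ decays gradually. That gradual decay, paired with a PACF that cuts off after lag 1, is the "AR signature" described below.

```python
import numpy as np

def sample_acf(x, nlags):
    """Hypothetical helper: sample ACF r_0..r_nlags (denominator: total sum of squares)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    # d[k:] pairs x_t with x_{t-k} in d[:-k]
    return np.array([1.0] + [np.dot(d[k:], d[:-k]) / denom for k in range(1, nlags + 1)])

# Simulated AR(1) with phi = 0.7 (made-up data); theoretical ACF is 0.7**k
rng = np.random.default_rng(0)
n, phi = 5000, 0.7
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.standard_normal()

print(np.round(sample_acf(y, 3), 2))
```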

In [None]:
title = 'Autocorrelation: Real Manufacturing and Trade Inventories'
lags = 40
plot_acf(df2['Inventories'],title=title,lags=lags);

In [None]:
title = 'Partial Autocorrelation: Real Manufacturing and Trade Inventories'
lags = 40
plot_pacf(df2['Inventories'],title=title,lags=lags);

This tells us that the AR component should be more important than MA. From the <a href='https://people.duke.edu/~rnau/411arim3.htm'>Duke University Statistical Forecasting site</a>:<br>
> <em>If the PACF displays a sharp cutoff while the ACF decays more slowly (i.e., has significant spikes at higher lags), we say that the stationarized series displays an "AR signature," meaning that the autocorrelation pattern can be explained more easily by adding AR terms than by adding MA terms.</em><br>

Let's run <tt>pmdarima.auto_arima</tt> stepwise to see whether using the same $p$ and $q$ terms still makes sense:

In [None]:
stepwise_fit = auto_arima(df2['Inventories'], start_p=0, start_q=0,
                          max_p=2, max_q=2, m=12,
                          seasonal=False,
                          d=None, trace=True,
                          error_action='ignore',   # we don't want to know if an order does not work
                          suppress_warnings=True,  # we don't want convergence warnings
                          stepwise=True)           # set to stepwise

stepwise_fit.summary()

Our manual checks of the p, d and q orders confirm the outcome of the initial auto_arima run.

### __train / test split__

In [None]:
len(df2)

In [None]:
# Set one year for testing
train = df2.iloc[:252]
test = df2.iloc[252:]

### __Fit an ARIMA(1,1,1) model__

In [None]:
model = ARIMA(train['Inventories'],order=(1,1,1))
results = model.fit()
results.summary()

In [None]:
# Obtain predicted values
start=len(train)
end=len(train)+len(test)-1
predictions = results.predict(start=start, end=end, dynamic=False, typ='levels').rename('ARIMA(1,1,1) Predictions')

Passing dynamic=False means that forecasts at each point are generated using the full history up to that point (all lagged values).
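The difference between static and dynamic prediction can be sketched with a zero-mean AR(1). The function `ar1_in_sample_predictions`, the coefficient and the toy data are hypothetical; the sketch only mirrors the idea behind the `dynamic` argument, not statsmodels' implementation. With `dynamic=False` every step uses the actual previous observation; with `dynamic=True` each step after `start` feeds on the previous prediction.

```python
import numpy as np

def ar1_in_sample_predictions(y, phi, start, dynamic):
    """Hypothetical helper: predictions for y[start:] from y_t = phi * y_{t-1}.

    dynamic=False: every step uses the actual previous observation.
    dynamic=True:  after `start`, each step uses the previous prediction.
    """
    y = np.asarray(y, dtype=float)
    preds = []
    prev = y[start - 1]
    for t in range(start, len(y)):
        pred = phi * prev
        preds.append(pred)
        prev = pred if dynamic else y[t]
    return np.array(preds)

y = np.array([4.0, 2.0, 3.0, 1.0, 2.0])  # toy data
print(ar1_in_sample_predictions(y, phi=0.5, start=2, dynamic=False).tolist())  # [1.0, 1.5, 0.5]
print(ar1_in_sample_predictions(y, phi=0.5, start=2, dynamic=True).tolist())   # [1.0, 0.5, 0.25]
```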

Passing typ='levels' predicts the levels of the original endogenous variables. If we'd used the default typ='linear' we would have seen linear predictions in terms of the differenced endogenous variables.

More information on these arguments: https://www.statsmodels.org/stable/generated/statsmodels.tsa.arima_model.ARIMAResults.predict.html
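The relationship between the two output types can be sketched for $d = 1$: a one-step prediction of the level is the previous actual level plus the predicted change. The numbers below are made up, and `diff_pred` stands in for what `typ='linear'` would return; this illustrates the arithmetic, not the library's internals.

```python
import numpy as np

y = np.array([100.0, 103.0, 101.0, 106.0])  # observed levels (toy data)
diff_pred = np.array([2.0, -1.0, 4.0])      # hypothetical predicted changes y_t - y_{t-1}, t = 1..3

# Level prediction = previous actual level + predicted change
level_pred = y[:-1] + diff_pred
print(level_pred.tolist())  # [102.0, 102.0, 105.0]
```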

In [None]:
#compare predictions to expected values
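for i in range(len(predictions)):
    # .iloc gives positional access; plain [i] on a DatetimeIndex Series is deprecated in recent pandas
    print(f"predicted={round(predictions.iloc[i], 3)}, expected={test['Inventories'].iloc[i]}")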

In [None]:
#plot predictions vs actuals
title = 'Real Manufacturing and Trade Inventories'
ylabel='Chained 2012 Dollars'
xlabel = ''  # no x-axis label needed

ax = test['Inventories'].plot(legend=True,figsize=(12,6),title=title)
predictions.plot(legend=True)
ax.autoscale(axis='x',tight=True)
ax.set(xlabel=xlabel, ylabel=ylabel)
ax.yaxis.set_major_formatter(formatter);

### __Model evaluation__

In [None]:
from sklearn.metrics import mean_squared_error

error = mean_squared_error(test['Inventories'], predictions)
print(f'ARIMA(1,1,1) MSE Error: {error:11.10}')

In [None]:
from statsmodels.tools.eval_measures import rmse

error = rmse(test['Inventories'], predictions)
print(f'ARIMA(1,1,1) RMSE Error: {error:11.10}')

In [None]:
relative_error = error / predictions.mean()  # RMSE as a fraction of the mean predicted level
relative_error

In [None]:
# mean absolute percentage error, in percent
mape = ((test['Inventories'] - predictions) / test['Inventories']).abs().mean() * 100
mape

__Remember MAPE represents a percentage value!__
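For reference, the three error measures used above can be written out in a few lines of NumPy. This is a sketch with made-up numbers; `forecast_errors` is a hypothetical helper, not part of scikit-learn or statsmodels. MSE and RMSE are in squared and original units respectively, while MAPE is a percentage and is undefined whenever an actual value is zero.

```python
import numpy as np

def forecast_errors(actual, predicted):
    """Hypothetical helper: return MSE, RMSE and MAPE (in percent)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mape = np.mean(np.abs(err / actual)) * 100  # undefined if any actual value is 0
    return mse, rmse, mape

mse, rmse, mape = forecast_errors([100.0, 200.0], [110.0, 190.0])  # toy values
print(mse, rmse, round(mape, 6))  # 100.0 10.0 7.5
```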

### __Apply model to complete dataset and forecast into future!__

In [None]:
model = ARIMA(df2['Inventories'],order=(1,1,1))
results = model.fit()
# forecast 12 time steps (the end index of predict is inclusive)
forecast = results.predict(len(df2),len(df2)+11,typ='levels').rename('ARIMA(1,1,1) Forecast')

In [None]:
# Plot predictions against known values
title = 'Real Manufacturing and Trade Inventories'
ylabel='Chained 2012 Dollars'
xlabel='' # we don't really need a label here

ax = df2['Inventories'].plot(legend=True,figsize=(12,6),title=title)
forecast.plot(legend=True)
ax.autoscale(axis='x',tight=True)
ax.set(xlabel=xlabel, ylabel=ylabel)
ax.yaxis.set_major_formatter(formatter);