
## ARIMA and Timeseries Modeling

Week 10 | Lesson 3.1

---

Throughout this lesson, we are going to build up to the **ARIMA** time-series model. 

This model combines differencing with two models we will see below: **AR** or autoregressive models and **MA** or moving average models.



In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime

sns.set_style('whitegrid')

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

In [4]:
data = pd.read_csv('/Users/kiefer/github-repos/DSI-SF-2/datasets/unemployment_timeseries/seasonally-adjusted-quarterly-us.csv')

---

## Autoregressive (AR) models

Autoregressive (AR) models use data from previous time points to predict the next time point. These are essentially regression models where the predictors are previous values of the outcome.

Typically, AR models are denoted `AR(p)`, where _p_ indicates the number of previous time points to incorporate. `AR(1)` is the most common.

In an autoregressive model we learn regression coefficients on the features that are the previous _p_ values.

### $$y_i = c + \beta_1 * y_{i-1} + \beta_2 * y_{i-2} + \cdots + \beta_p * y_{i-p} + \epsilon_i$$
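
Because an AR(p) model is a regression on lagged values, its coefficients can be estimated with ordinary least squares. The following is a minimal sketch under stated assumptions: `fit_ar` is a hypothetical helper written for this lesson (not a library function), and the series is simulated with made-up parameters rather than taken from the unemployment data.

```python
import numpy as np

def fit_ar(y, p):
    """Estimate AR(p) coefficients by ordinary least squares.

    Returns (c, betas), where betas[k - 1] multiplies y_{i-k}.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Column k holds the lag-k values, aligned with the targets y[p:].
    lags = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])
    X = np.column_stack([np.ones(n - p), lags])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef[0], coef[1:]

# Simulate an AR(1) series with known parameters (illustrative values).
rng = np.random.default_rng(0)
c_true, beta_true = 2.0, 0.7
sim = np.empty(2000)
sim[0] = c_true / (1 - beta_true)   # start at the long-run mean
for i in range(1, len(sim)):
    sim[i] = c_true + beta_true * sim[i - 1] + rng.normal()

c_hat, beta_hat = fit_ar(sim, p=1)
print(c_hat, beta_hat)              # should land near 2.0 and [0.7]
```

With enough observations, the estimates should land close to the values used in the simulation.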



As with other linear models, interpretation becomes more complex as we add more predictors. Going from AR(1) to AR(2), the lagged values are themselves strongly correlated with one another, so we begin to have significant _multicollinearity_.

Recall that _autocorrelation_ is the correlation of a series with a lagged copy of itself. A time series with high autocorrelation is highly dependent on its previous values, so an autoregressive model is likely to perform well.
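
To make the idea concrete, here is a small sketch that computes lag-1 autocorrelation for two simulated series. The `autocorr` helper is written here for illustration; pandas provides `Series.autocorr` for the same calculation. The data are random draws, not the lesson's dataset.

```python
import numpy as np

def autocorr(y, lag=1):
    """Pearson correlation between a series and itself shifted by `lag`."""
    y = np.asarray(y, dtype=float)
    return np.corrcoef(y[lag:], y[:-lag])[0, 1]

rng = np.random.default_rng(1)
noise = rng.normal(size=1000)   # each value is independent of the past
walk = np.cumsum(noise)         # each value builds on the previous one

noise_ac = autocorr(noise)
walk_ac = autocorr(walk)
print(noise_ac, walk_ac)
```

The independent noise should show autocorrelation near zero, while the cumulative series should show autocorrelation close to one.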

Autoregressive models are useful for learning gradual rises or falls in a series. Typically, this model type suits small-scale trends, such as an increase in demand that gradually raises the series.

---

## Moving Average (MA) models

**Moving average models** take previous _error terms_ as inputs. They predict the next value based on the overall average and how incorrect our previous predictions were. This is useful for modeling a sudden occurrence - like something going out of stock affecting sales or a sudden rise in popularity.

As in autoregressive models, we have an order term, _q_, and we refer to our model as `MA(q)`.  This moving average model is dependent on the last _q_ errors. If we have a time series of sales per week, $y_i$, we can regress each $y_i$ on the last _q_ error terms.

### $$y_i = mean + \epsilon_i + \beta_1 * \epsilon_{i-1} + \cdots + \beta_q * \epsilon_{i-q}$$
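
A quick way to build intuition is to simulate an `MA(1)` series, where each value is the mean plus the current error plus a weighted copy of the previous error. This sketch uses made-up values for the mean and coefficient. A known property of MA(1) processes is that the lag-1 autocorrelation equals $\beta_1 / (1 + \beta_1^2)$ and the autocorrelation at longer lags is zero, which the simulation should roughly reproduce.

```python
import numpy as np

rng = np.random.default_rng(2)
mean, beta = 10.0, 0.6           # illustrative values, not from the dataset
eps = rng.normal(size=5001)      # one extra draw so every y_i has a previous error
y = mean + eps[1:] + beta * eps[:-1]

def autocorr(y, lag):
    """Pearson correlation between a series and itself shifted by `lag`."""
    return np.corrcoef(y[lag:], y[:-lag])[0, 1]

lag1 = autocorr(y, 1)            # theory: beta / (1 + beta**2)
lag2 = autocorr(y, 2)            # theory: 0, errors two steps back play no role
print(y.mean(), lag1, lag2)
```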

MA models require a more complex fitting procedure than AR models because the error terms are not observed directly: we iteratively fit a model, compute the errors, and then refit, over and over again.
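
The loop of "guess a coefficient, compute the errors, judge the fit" can be sketched for `MA(1)` as follows. This is a deliberately simplified stand-in for what statistical libraries do: it tries coefficients on a fixed grid and keeps the one with the smallest sum of squared errors, assuming the error before the first observation is zero. The helper names `ma1_errors` and `fit_ma1` are hypothetical, and the data are simulated.

```python
import numpy as np

def ma1_errors(y, mean, beta):
    """Rebuild the MA(1) error terms implied by a candidate beta."""
    errors = np.zeros(len(y))
    prev = 0.0                      # assumption: the pre-sample error is zero
    for i, value in enumerate(y):
        errors[i] = value - mean - beta * prev
        prev = errors[i]
    return errors

def fit_ma1(y):
    """Grid-search the beta that minimizes the sum of squared errors."""
    y = np.asarray(y, dtype=float)
    mean = y.mean()
    grid = np.linspace(-0.95, 0.95, 96)   # candidate coefficients, step 0.02
    sse = [np.sum(ma1_errors(y, mean, b) ** 2) for b in grid]
    return mean, float(grid[int(np.argmin(sse))])

# Simulated MA(1) data with known parameters (illustrative values).
rng = np.random.default_rng(3)
eps = rng.normal(size=2001)
y = 5.0 + eps[1:] + 0.5 * eps[:-1]

mean_hat, beta_hat = fit_ma1(y)
print(mean_hat, beta_hat)
```

Each candidate coefficient requires recomputing every error in sequence, which is why fitting an MA model costs more than a single regression.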

MA includes the mean of the time series. The behavior of the model is therefore characterized by random jumps around the mean value.

In an `MA(1)` model, there is a single coefficient, which weights the error of our previous prediction when we estimate the next value in the time series.

---

## ARMA models

The final stepping stone before the **ARIMA** model is the **ARMA** model.

_ARMA_ models combine autoregressive and moving average models. The behavior of the model is parameterized by `p` and `q` terms, corresponding to the `AR(p)` and `MA(q)` components.

Autoregressive models slowly incorporate changes in preferences, tastes, and patterns. Moving average models base their prediction not on the prior value but the prior error, allowing us to correct sudden changes based on random events - supply, popularity spikes, etc.
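
The sketch below shows how the two pieces work together in an `ARMA(1, 1)` model. It simulates a series with assumed parameters (`c`, `phi`, and `theta` are illustrative, not estimated), then makes one-step-ahead predictions that use both the prior value (AR part) and the prior prediction error (MA part).

```python
import numpy as np

# Illustrative ARMA(1, 1) parameters (assumed, not estimated from data).
c, phi, theta = 1.0, 0.8, 0.4
mean = c / (1 - phi)               # long-run mean of the series

rng = np.random.default_rng(4)
eps = rng.normal(size=300)

# Simulate: the AR part uses the prior value, the MA part the prior error.
y = np.empty(len(eps))
y[0] = mean + eps[0]               # start at the mean with no earlier error
for i in range(1, len(y)):
    y[i] = c + phi * y[i - 1] + eps[i] + theta * eps[i - 1]

# One-step-ahead predictions, tracking our own prediction errors as we go.
pred = np.empty(len(y))
err = np.empty(len(y))
pred[0] = mean
err[0] = y[0] - pred[0]
for i in range(1, len(y)):
    pred[i] = c + phi * y[i - 1] + theta * err[i - 1]
    err[i] = y[i] - pred[i]

print(np.allclose(err, eps))       # True
```

When the parameters are known exactly, the prediction errors recover the random shocks that generated the series. In practice the parameters must be estimated, so the recovered errors are only approximations.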


---

## Autoregressive Integrated Moving Average (ARIMA) models

ARIMA is just like the `ARMA(p, q)` model, but instead of predicting the value of the series, it predicts the _differenced_ series, that is, the changes in the series. The order of differencing is set by a _d_ term, as in `ARIMA(p, d, q)`. Alternatively, you can fit an `ARMA(p, q)` model on a differenced time series.
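
As a rough illustration of "fit an ARMA model on a differenced series", the sketch below builds an `ARIMA(1, 1, 0)` forecast by hand: difference once, fit an AR(1) to the changes with least squares, then add the predicted change back to the last observed level. The data are simulated with assumed parameters, and this is a teaching sketch rather than the estimation method a library would use.

```python
import numpy as np

rng = np.random.default_rng(5)
# Simulated levels whose *changes* follow an AR(1) process (illustrative values).
n, c, phi = 1500, 0.2, 0.6
changes = np.empty(n)
changes[0] = c / (1 - phi)
for i in range(1, n):
    changes[i] = c + phi * changes[i - 1] + rng.normal(scale=0.5)
y = 50.0 + np.cumsum(changes)

# Step 1 (the "I"): difference once, d = 1.
dy = np.diff(y)

# Step 2 (the "AR"): regress each change on the previous change.
X = np.column_stack([np.ones(len(dy) - 1), dy[:-1]])
(c_hat, phi_hat), *_ = np.linalg.lstsq(X, dy[1:], rcond=None)

# Step 3: forecast the next change, then add it back to the last level.
next_change = c_hat + phi_hat * dy[-1]
next_level = y[-1] + next_change
print(phi_hat, next_change, next_level)
```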

Recall the pandas `diff` function. This computes the difference between two consecutive values. In an ARIMA model, we attempt to predict this difference instead of the actual values.
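
As a quick reminder of what `diff` returns, here is a toy example with made-up numbers. It also shows that differencing can be undone by accumulating the changes from the first level, which is how predictions of differences are turned back into predictions of the original series.

```python
import pandas as pd

y = pd.Series([100.0, 103.0, 107.0, 106.0, 110.0])  # made-up values
changes = y.diff()                 # first entry is NaN: nothing precedes it
print(changes.tolist())            # [nan, 3.0, 4.0, -1.0, 4.0]

# Undo the differencing: start from the first level and accumulate the changes.
rebuilt = y.iloc[0] + changes.fillna(0).cumsum()
print(rebuilt.equals(y))           # True
```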

### $$y_i - y_{i-1} = ARMA(p, q)$$

This handles the stationarity assumption: instead of detrending or differencing manually, the model does this via the differencing term.

For a higher value of _d_, for example, d=2, an `ARIMA(p, 2, q)` model is equivalent to:

    diff(diff(y)) = ARMA(p, q)

An order of differencing _d_ is the same as applying the `diff` function _d_ times.
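
To see why a second difference can be needed, consider a series with a quadratic trend (coefficients below are made up for illustration). One round of `diff` still leaves a trend, while two rounds leave a constant.

```python
import numpy as np
import pandas as pd

t = np.arange(8)
y = pd.Series(5.0 + 2.0 * t + 1.5 * t**2)   # quadratic trend, made-up coefficients

d1 = y.diff()            # still trending upward
d2 = y.diff().diff()     # constant: 2 * 1.5 = 3.0

print(d1.dropna().tolist())   # [3.5, 6.5, 9.5, 12.5, 15.5, 18.5, 21.5]
print(d2.dropna().tolist())   # [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
```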

Unlike an ARMA model, an ARIMA model does not rely on the underlying series being stationary. The differencing operation can _convert_ the series into one that is stationary.
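
A simple, informal check of this claim is to compare the mean of the first and second halves of a series before and after differencing. The sketch below uses a simulated random walk with drift (the drift of 0.5 is an assumed value). Comparing half-sample means is only a rough heuristic, not a formal stationarity test.

```python
import numpy as np

rng = np.random.default_rng(6)
steps = rng.normal(loc=0.5, scale=1.0, size=1000)  # drift of 0.5 per step (assumed)
y = np.cumsum(steps)                               # random walk with drift
dy = np.diff(y)                                    # first difference, d = 1

def half_means(x):
    """Mean of the first and second half of a series."""
    mid = len(x) // 2
    return x[:mid].mean(), x[mid:].mean()

level_first, level_second = half_means(y)
diff_first, diff_second = half_means(dy)
print(level_first, level_second)   # far apart: the level has no constant mean
print(diff_first, diff_second)     # both close to the drift of 0.5
```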

Since ARIMA models include differencing automatically, we can apply them to a broader set of data without assuming a constant mean.