# Lab 2: ARMA Processes and Stationarity

For this lab session, we'll be focusing on ARMA($p$, $q$) models and the material covered in the first two weeks of class. We will also discuss solutions to **Problem Set 1**. 

First, we import all the necessary Python packages/libraries. 

```python
### imports
from __future__ import print_function, division

import pandas as pd
import numpy as np
import scipy.optimize
import scipy.signal
import scipy.stats

import statsmodels.api as sm
from statsmodels import tsa
from datetime import date, datetime, timedelta
import copy

### Web data reader (not needed in this lab). pandas.io.data was removed from pandas and
### distutils is gone in Python 3.12+, so import pandas_datareader directly if it is available.
try:
    from pandas_datareader import data, wb
except Exception:  # not installed, or incompatible with the installed pandas
    data = wb = None

from cycler import cycler
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib as mpl
import matplotlib.dates

### Plotting and display options
np.set_printoptions(precision=3)
pd.set_option('display.precision', 2)  # the bare 'precision' key raises an error in newer pandas
pd.set_option('display.float_format', lambda x: '%.2f' % x)

plt.style.use('ggplot')

mpl.rcParams['lines.linewidth'] = 1.5
mpl.rcParams['lines.color'] = 'blue'
mpl.rcParams['axes.prop_cycle'] = cycler('color', ['#30a2da', '#e5ae38', '#fc4f30', '#6d904f', '#8b8b8b'])
mpl.rcParams['legend.fancybox'] = True
mpl.rcParams['legend.fontsize'] = 14
mpl.rcParams['axes.facecolor'] = '#f0f0f0'
mpl.rcParams['axes.labelsize'] = 15
mpl.rcParams['axes.axisbelow'] = True
mpl.rcParams['axes.linewidth'] = 1.2
mpl.rcParams['axes.labelpad'] = 0.0
mpl.rcParams['axes.xmargin'] = 0.05  # x margin; see `axes.Axes.margins`
mpl.rcParams['axes.ymargin'] = 0.05  # y margin; see `axes.Axes.margins`
mpl.rcParams['xtick.labelsize'] = 14
mpl.rcParams['ytick.labelsize'] = 14
mpl.rcParams['figure.subplot.left'] = 0.08
mpl.rcParams['figure.subplot.right'] = 0.95
mpl.rcParams['figure.subplot.bottom'] = 0.07

### figure configuration
fsize = (10, 7.5)  # figure size
tsize = 18  # title font size
lsize = 16  # legend font size
csize = 14  # comment font size
grid = True  # grid

### this allows plots to appear directly in the notebook
%matplotlib inline
```

## ARMA($p$,$q$) models

**Recall:** a stochastic process $\{X_t\}$ is an ARMA($p$,$q$) process (autoregressive moving average process with AR order $p$ and MA order $q$) if
$$
X_t = \varepsilon_{t} + \sum_{i=1}^{p} \phi_i X_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}
$$
where $\varepsilon_{t}$ is white noise, i.e. $E[\varepsilon_{t}]=0$, $Var[\varepsilon_{t}]=\sigma^2$, and $Cov[\varepsilon_{t}, \varepsilon_{s}]=0$ for $t \neq s$.

If the coefficients $\phi_i \equiv 0$, then the ARMA($p$,$q$) process collapses to an MA($q$) process. Similarly, if $\theta_i\equiv 0$ then the ARMA($p$,$q$) process collapses to an AR($p$) process.

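A slightly more general formulation, also in common use, is
$$
\sum_{i=0}^{p} \phi_i X_{t-i} = \sum_{i=0}^{q} \theta_i \varepsilon_{t-i}
$$
This formulation introduces $\phi_0$ and $\theta_0$ terms, which were implicitly set to 1 above. Further, the AR coefficients change sign relative to the first formulation: with $\phi_0 = 1$, the coefficient on $X_{t-i}$ here equals $-\phi_i$ from the first formulation. Statistical computing packages frequently use this convention; for example, `sm.tsa.ArmaProcess` in `statsmodels` expects the AR lag polynomial $[1, -\phi_1, \dots, -\phi_p]$ and the MA lag polynomial $[1, \theta_1, \dots, \theta_q]$.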


Let's simulate some ARMA($p$,$q$) processes and examine their autocorrelation functions (ACFs).

### Exercise 1: Simulate ARMA($p$,$q$) models

**Problem:** Write a function that simulates an ARMA($p$,$q$) process given parameter inputs.

**Note:** `statsmodels` has a function that generates ARMA($p$,$q$) processes, but for now, write your own function.
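One possible from-scratch approach is to run the defining recursion directly. The sketch below assumes Gaussian white noise, zero pre-sample values, and a burn-in period (the `burn` parameter, a hypothetical choice) whose draws are discarded so that the start-up values have little influence.

```python
import numpy as np

def simulate_arma(phi, theta, n, sigma=1.0, burn=500, seed=None):
    """Simulate X_t = eps_t + sum_i phi_i X_{t-i} + sum_j theta_j eps_{t-j}.

    Gaussian white noise with standard deviation sigma; pre-sample values of
    X and eps are treated as zero, and the first `burn` values are discarded.
    """
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    p, q = len(phi), len(theta)
    total = n + burn
    eps = rng.normal(0.0, sigma, size=total)
    x = np.zeros(total)
    for t in range(total):
        # AR part: phi_1 X_{t-1} + ... + phi_p X_{t-p} (pre-sample terms are zero)
        ar_part = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        # MA part: theta_1 eps_{t-1} + ... + theta_q eps_{t-q}
        ma_part = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        x[t] = eps[t] + ar_part + ma_part
    return x[burn:]

# Example: an ARMA(2,1) path of length 1000
x = simulate_arma([0.5, -0.25], [0.4], n=1000, seed=0)
```

The explicit loop is slow but mirrors the equation term by term, which makes it easy to check; for many repetitions, a filtering routine such as `scipy.signal.lfilter` computes the same recursion much faster.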

```python
### For simulating ARMA processes, specifying the roots can be more informative than coefficients.
def _roots2coef(roots):
    """Given the roots of a lag polynomial, return its coefficients (zero lag normalized to 1)"""
    ### sympy: package for symbolic computation
    from sympy import symbols, expand
    N_roots = len(roots)
    L = symbols("L")
    expr = expand(1)
    for r in roots:
        expr *= (r - L)
    expr = expand(expr)
    coef_list = [expr.coeff(L, n) for n in range(N_roots + 1)]
    ### convert to numpy floats
    coefs = np.array(coef_list).astype(float)
    ### normalize zero lag to 1
    coefs /= coefs[0]
    return coefs

def arma_from_roots(ar_roots=(), ma_roots=()):
    """Create a statsmodels ArmaProcess from the roots of its AR and MA lag polynomials"""
    ar_coef = _roots2coef(ar_roots)
    ma_coef = _roots2coef(ma_roots)
    ### ArmaProcess takes lag-polynomial coefficients: [1, -phi_1, ...] and [1, theta_1, ...]
    arma_process = sm.tsa.ArmaProcess(ar_coef, ma_coef)
    ### note that now arma_process has many helpful methods:
    ### arcoefs, macoefs, generate_sample, ...
    return arma_process
```

### Exercise 2: OLS Estimation

**Problem:** Write a function that returns the parameter estimates and residuals from a simple OLS regression.

**Note:** It's helpful to include an option that deals with adding a constant within the function, although this is up to you. If you do add a constant within your function, it can be useful to return the new matrix as well. 

Once again, `statsmodels` and other Python libraries have functions for running linear regressions. For now, write your own function.

```python
def _sm_calc_ols(y, x, addcon=True):
    """Wrapper for statsmodels OLS regression; returns (beta_hat, residuals, design matrix)"""
    X = sm.add_constant(x) if addcon else x
    ols_results = sm.OLS(y, X).fit()
    beta_hat = ols_results.params
    resids = ols_results.resid
    return beta_hat, resids, X
```
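For comparison with the statsmodels wrapper, one way to write the OLS function yourself is to solve the least-squares problem $\min_\beta \lVert y - X\beta \rVert^2$ directly with NumPy. This sketch mirrors the signature of `_sm_calc_ols` (the name `calc_ols` is hypothetical). It uses `np.linalg.lstsq` rather than explicitly inverting $X'X$, which is numerically safer when regressors are nearly collinear.

```python
import numpy as np

def calc_ols(y, x, addcon=True):
    """OLS by least squares; returns (beta_hat, residuals, design matrix)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(x, dtype=float)
    if X.ndim == 1:
        X = X[:, np.newaxis]  # treat a vector as a single regressor
    if addcon:
        X = np.column_stack([np.ones(len(X)), X])  # constant in the first column
    # Solves min ||y - X b||^2; equals (X'X)^{-1} X'y when X has full column rank
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resids = y - X @ beta_hat
    return beta_hat, resids, X
```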

### Exercise 3: Estimate the *ACF* and *PACF* by OLS

Using your OLS function, write functions that both estimate and plot the *ACF* and *PACF* by OLS.

**Note:** compare with the `statsmodels` methods:
* `sm.tsa.stattools.acf`, `sm.graphics.tsa.plot_acf`
* `sm.tsa.stattools.pacf`, `sm.graphics.tsa.plot_pacf`

`pacf` (and `plot_pacf`) supports several estimation methods, including OLS via `method='ols'`, while `acf` computes the sample autocorrelations directly. We'll stick with OLS.
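A minimal sketch of OLS-based estimators (all function names are hypothetical). The lag-$k$ *ACF* is estimated as the slope from regressing $x_t$ on a constant and $x_{t-k}$; this is close to, but not identical with, the usual sample ACF from `sm.tsa.stattools.acf`. The lag-$k$ *PACF* is the last slope from regressing $x_t$ on a constant and $x_{t-1}, \dots, x_{t-k}$, which is the idea behind `method='ols'` in `sm.tsa.stattools.pacf`. The plotting helper draws approximate 95% bands of $\pm 1.96/\sqrt{T}$, which are appropriate under a white-noise null.

```python
import numpy as np
import matplotlib.pyplot as plt

def _ols_slopes(y, lags):
    """Regress y on a constant and the columns of `lags`; return the slope coefficients."""
    X = np.column_stack([np.ones(len(y)), lags])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

def acf_ols(x, nlags):
    """Lag-k autocorrelation estimated as the OLS slope of x_t on x_{t-k}, k = 0..nlags."""
    x = np.asarray(x, dtype=float)
    return np.array([1.0] + [_ols_slopes(x[k:], x[:-k])[0] for k in range(1, nlags + 1)])

def pacf_ols(x, nlags):
    """Lag-k partial autocorrelation: last OLS slope of x_t on x_{t-1}, ..., x_{t-k}."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = [1.0]
    for k in range(1, nlags + 1):
        # column j-1 holds x_{t-j} for t = k, ..., n-1
        lags = np.column_stack([x[k - j:n - j] for j in range(1, k + 1)])
        out.append(_ols_slopes(x[k:], lags)[-1])
    return np.array(out)

def plot_corr(values, nobs, title, ax=None):
    """Stem-style plot of estimated correlations with approximate 95% white-noise bands."""
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 4))
    lags = np.arange(len(values))
    ax.vlines(lags, 0, values)
    ax.plot(lags, values, 'o')
    band = 1.96 / np.sqrt(nobs)
    ax.axhspan(-band, band, alpha=0.2)
    ax.axhline(0, color='k', linewidth=0.8)
    ax.set_title(title)
    ax.set_xlabel('lag')
    return ax
```

Usage: `plot_corr(pacf_ols(x, 20), len(x), 'PACF (OLS)')`. Note that at lag 1 the two estimators coincide, since both run the same regression.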

### Exercise 4: Determine the *order of integration* of an ARIMA($p$,$n$,$q$) series

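**Problem:** Determine the *order of integration* of the simulated data series in `arima_sims.csv`. Look at the *ACF* and *PACF* of the raw data, as well as of successive differences. Be sure to watch for signs that the data have been _over-differenced_: a pronounced negative spike at lag 1 in the *ACF* of the differenced series (for differenced white noise, the theoretical value is exactly $-0.5$) is a typical symptom.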

```python
### Read in the data
df_arima = pd.read_csv('arima_sims.csv')
df_arima.head()
```

| Unnamed: 0 | y1 | y2 |
|---:|---:|---:|
| 0 | 145.69 | 17.53 |
| 1 | 148.59 | 16.42 |
| 2 | 151.83 | 14.08 |
| 3 | 155.14 | 14.93 |
| 4 | 158.78 | 16.01 |
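A quick numerical companion to the *ACF* plots is to track the lag-1 autocorrelation as you difference. The sketch below uses a synthetic random walk, not the lab data, and the helper names are hypothetical. For this I(1) series, the level should be near 1, the first difference (white noise) near 0, and the second difference near $-0.5$. Differencing white noise yields an MA(1) with $\theta = -1$, whose lag-1 autocorrelation is $\theta/(1+\theta^2) = -1/2$.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation (mean-adjusted)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    return np.dot(xc[1:], xc[:-1]) / np.dot(xc, xc)

def difference_profile(x, max_d=2):
    """Lag-1 autocorrelation of the d-th difference, for d = 0, ..., max_d."""
    return [lag1_autocorr(np.diff(x, n=d)) for d in range(max_d + 1)]

# Synthetic I(1) example: a Gaussian random walk (not the data in arima_sims.csv)
rng = np.random.default_rng(42)
walk = np.cumsum(rng.standard_normal(5000))
print(difference_profile(walk))
```

To apply it to the lab data, pass a column such as `df_arima['y1'].values`.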


### Exercise 5: Parsimonious MA($1$) model

Simulate the following MA($1$) model:
$$
y_t = \varepsilon_t - \varepsilon_{t-1} 
$$

What is a particularly parsimonious way in which we can transform this model?
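One way to start the simulation, sketched with NumPy, is to generate one extra noise draw so that $\varepsilon_0$ exists, then take differences. The script also computes the sample lag-1 autocorrelation, which for an MA(1) should be near $\theta/(1+\theta^2) = -0.5$ here, and checks numerically what cumulating $y_t$ does. The seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000
eps = rng.standard_normal(T + 1)  # eps_0, ..., eps_T
y = eps[1:] - eps[:-1]            # y_t = eps_t - eps_{t-1}, t = 1, ..., T

# Sample lag-1 autocorrelation; theory for MA(1): theta / (1 + theta^2) = -0.5
yc = y - y.mean()
r1 = np.dot(yc[1:], yc[:-1]) / np.dot(yc, yc)
print(round(r1, 3))

# Cumulating y telescopes: y_1 + ... + y_t = eps_t - eps_0
recovered = np.cumsum(y)
print(np.allclose(recovered, eps[1:] - eps[0]))
```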

### Exercise 6: Diagnostic tests

Given the residuals of an estimated ARMA($p$,$q$) model, write Python functions that calculate the following:
* **Durbin-Watson** statistic
* **Breusch-Godfrey** (LM test for serial correlation) statistic and p-values
* **AIC** and **BIC**

### Exercise 7: Determine the lag order of time series data

Using your OLS, *ACF*/*PACF*, and diagnostic functions, determine the ARMA($p$,$q$) orders of the series in `arma_sims.csv`, starting from their *ACF* and *PACF* plots.

*Steps*:
* Plot *ACF* and *PACF* of the data
* Estimate AR($p$) models by OLS for $p = 1, 2, \dots, 5$
* Plot *ACF* and *PACF* of the residuals
* **DW** and **Breusch-Godfrey** tests for serial correlation in the residuals
* **AIC** and **BIC**

**Note:** If you get a warning like: ``FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated...``, then this is an issue with the ``scipy`` library. 

Please update this library by running the following code in your Jupyter notebook:
``!pip install --upgrade scipy``

```python
### Load in the data
df_arma = pd.read_csv('arma_sims.csv')
df_arma.head()
```

### Exercise 8: AR($1$) simulation and OLS estimation

Write a function that simulates and then estimates an AR($1$) process by OLS.

Use the functions you wrote above, or make use of the following statsmodels functions:
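* `sm.tsa.arma_generate_sample` (more efficient than our own code, so much faster for repeated simulations; it uses the lag-polynomial sign convention, so an AR($1$) with coefficient $\phi$ is passed as `ar=[1, -phi], ma=[1]`)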
* `sm.OLS`

### Exercise 9: AR($1$) OLS distribution - Part 1

Plot the distribution of the OLS estimator $\hat{\phi}$ over repeated simulations.
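A vectorized sketch: simulate many AR($1$) paths at once with `scipy.signal.lfilter`, which applies the recursion $x_t = \phi x_{t-1} + \varepsilon_t$ along each row. Then compute the OLS slope (with an intercept) for each path and histogram the estimates. The values of $\phi$, $T$, and the number of simulations are arbitrary, and `mc_ar1_ols` is a hypothetical name. With small $T$ and $\phi$ near 1, expect the histogram to be skewed left and centered below the true $\phi$, reflecting the finite-sample downward bias of OLS in autoregressions.

```python
import numpy as np
import scipy.signal
import matplotlib.pyplot as plt

def mc_ar1_ols(phi, T, n_sims=5000, burn=200, seed=0):
    """OLS estimates of phi from n_sims simulated AR(1) paths of length T."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_sims, T + burn))
    # lfilter with a = [1, -phi] computes x_t = phi x_{t-1} + eps_t along each row
    x = scipy.signal.lfilter([1.0], [1.0, -phi], eps, axis=1)[:, burn:]
    y, ylag = x[:, 1:], x[:, :-1]
    # OLS slope with an intercept: demeaned cross-product over demeaned sum of squares
    yc = y - y.mean(axis=1, keepdims=True)
    lc = ylag - ylag.mean(axis=1, keepdims=True)
    return np.sum(lc * yc, axis=1) / np.sum(lc ** 2, axis=1)

phi, T = 0.9, 50
phi_hats = mc_ar1_ols(phi, T)

fig, ax = plt.subplots(figsize=(10, 7.5))
ax.hist(phi_hats, bins=60, density=True, alpha=0.7)
ax.axvline(phi, color='k', linestyle='--', label=r'true $\phi$')
ax.axvline(phi_hats.mean(), color='r', label=r'mean of $\hat{\phi}$')
ax.set_title(r'OLS estimates of $\phi$ ($T = %d$)' % T)
ax.legend()
```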

### Exercise 10: AR($1$) OLS distribution - Part 2

Plot the distribution of $\sqrt{T}(\hat{\phi} - \phi)$. For a stationary AR($1$) ($|\phi| < 1$), it converges in distribution to $N(0, 1 - \phi^2)$.

Note: to compare with the theoretical normal PDF, use:
* `scipy.stats.norm.pdf` (the older `mpl.mlab.normpdf` was removed in Matplotlib 3.1)
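A sketch that overlays the asymptotic $N(0, 1-\phi^2)$ density on the simulated distribution of $\sqrt{T}(\hat{\phi} - \phi)$. It repeats the vectorized `lfilter` simulation so the block is self-contained; `scaled_ar1_errors` is a hypothetical name, and $\phi$, $T$, and the number of simulations are arbitrary. For $\phi$ close to 1 or small $T$, expect a visible left skew and shift, since the normal approximation is poor there.

```python
import numpy as np
import scipy.signal
import scipy.stats
import matplotlib.pyplot as plt

def scaled_ar1_errors(phi, T, n_sims=5000, burn=200, seed=0):
    """sqrt(T) * (phi_hat - phi) over n_sims simulated AR(1) paths of length T."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_sims, T + burn))
    # x_t = phi x_{t-1} + eps_t along each row, burn-in discarded
    x = scipy.signal.lfilter([1.0], [1.0, -phi], eps, axis=1)[:, burn:]
    y, ylag = x[:, 1:], x[:, :-1]
    yc = y - y.mean(axis=1, keepdims=True)
    lc = ylag - ylag.mean(axis=1, keepdims=True)
    phi_hats = np.sum(lc * yc, axis=1) / np.sum(lc ** 2, axis=1)  # OLS slope with intercept
    return np.sqrt(T) * (phi_hats - phi)

phi, T = 0.5, 500
z = scaled_ar1_errors(phi, T)
grid = np.linspace(z.min(), z.max(), 200)

fig, ax = plt.subplots(figsize=(10, 7.5))
ax.hist(z, bins=60, density=True, alpha=0.7, label='simulated')
# Asymptotic distribution for |phi| < 1: N(0, 1 - phi^2)
ax.plot(grid, scipy.stats.norm.pdf(grid, loc=0.0, scale=np.sqrt(1 - phi ** 2)),
        label=r'$N(0, 1 - \phi^2)$')
ax.set_title(r'$\sqrt{T}(\hat{\phi} - \phi)$, $T = %d$' % T)
ax.legend()
```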

That is all for today! 😎