#### https://online.stat.psu.edu/stat510/lesson/2

In [1]:
import pandas as pd 
from statsmodels.tsa.stattools import adfuller
import matplotlib.pyplot as plt
import seaborn as sns 
from statsmodels.tsa.seasonal import seasonal_decompose
import numpy as np
from sklearn.metrics import mean_squared_error
from IPython.display import Image

#### Moving Average Models

In an AR(1) model, the variable $x_t$ is related to its lagged value $x_{t-1}$:

$$
x_t = \delta + \phi_1 x_{t-1} + \omega_t
$$
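A minimal numpy sketch that simulates an AR(1) series. It assumes Gaussian white noise $\omega_t \sim N(0, \sigma^2)$ and $|\phi_1| < 1$. The helper name `simulate_ar1` and the parameter values are illustrative only. For a stationary AR(1), the mean is $\delta / (1 - \phi_1)$ and the variance is $\sigma^2 / (1 - \phi_1^2)$, so the sample moments should land near those values.

```python
import numpy as np

def simulate_ar1(delta, phi1, sigma, n, seed=0):
    """Simulate x_t = delta + phi1 * x_{t-1} + w_t with w_t ~ N(0, sigma^2) (assumed Gaussian)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma, size=n)
    x = np.empty(n)
    # Start at the stationary mean delta / (1 - phi1); this requires |phi1| < 1
    x_prev = delta / (1.0 - phi1)
    for t in range(n):
        x[t] = delta + phi1 * x_prev + w[t]
        x_prev = x[t]
    return x

x = simulate_ar1(delta=2.0, phi1=0.6, sigma=1.0, n=50_000)
# Theoretical mean: 2.0 / (1 - 0.6) = 5.0; theoretical variance: 1 / (1 - 0.36) = 1.5625
print(x.mean(), x.var())
```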

In an MA(1) model, $x_t$ is related to the past error term $\omega_{t-1}$:

$$
x_t = \mu + \theta_1 \omega_{t-1} + \omega_t
$$
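A matching sketch for MA(1), again assuming Gaussian white noise and illustrative parameters (`simulate_ma1` is a hypothetical helper). An MA(1) has mean $\mu$, variance $\sigma^2(1 + \theta_1^2)$ and lag-1 autocorrelation $\theta_1 / (1 + \theta_1^2)$, which the sample values should approximate.

```python
import numpy as np

def simulate_ma1(mu, theta1, sigma, n, seed=0):
    """Simulate x_t = mu + theta1 * w_{t-1} + w_t with w_t ~ N(0, sigma^2) (assumed Gaussian)."""
    rng = np.random.default_rng(seed)
    # Draw one extra noise term so that w_{t-1} exists for the first observation
    w = rng.normal(0.0, sigma, size=n + 1)
    # w[1:] is the current error w_t, w[:-1] is the previous error w_{t-1}
    return mu + theta1 * w[:-1] + w[1:]

x = simulate_ma1(mu=10.0, theta1=0.7, sigma=1.0, n=50_000)
# Theoretical moments: mean = 10, variance = 1 + 0.7^2 = 1.49, lag-1 ACF = 0.7 / 1.49
print(x.mean(), x.var(), np.corrcoef(x[:-1], x[1:])[0, 1])
```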

In [2]:
Image(url="MA1_properties.png", width=600, height=600)

Note that the ACF of an MA(1) process cuts off after lag 1; more generally, the ACF of an MA(q) process is zero for all lags beyond q.

In [3]:
Image(url="MA1_ACF.gif", width=600, height=600)

In [4]:
Image(url="MA2_ACF.gif", width=600, height=600)
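The cutoff can be checked from the theoretical ACF of an MA(q) process. Writing $\psi = (1, \theta_1, \dots, \theta_q)$, the autocorrelation at lag $h$ is $\rho_h = \sum_j \psi_j \psi_{j+h} / \sum_j \psi_j^2$ for $h \le q$ and $0$ otherwise. This sketch assumes the $+\theta_j$ sign convention used in these notes; `ma_acf` is a hypothetical helper, not a library function (statsmodels' `ArmaProcess` class has an `acf` method that serves a similar purpose).

```python
import numpy as np

def ma_acf(thetas, max_lag):
    """Theoretical ACF (lags 0..max_lag) of x_t = mu + w_t + theta_1 w_{t-1} + ... + theta_q w_{t-q}."""
    psi = np.r_[1.0, np.asarray(thetas, dtype=float)]
    q = len(psi) - 1
    gamma0 = np.sum(psi ** 2)          # variance divided by sigma^2
    rho = np.zeros(max_lag + 1)
    for h in range(min(q, max_lag) + 1):
        # Sum of psi_j * psi_{j+h} over the overlapping coefficients
        rho[h] = np.sum(psi[: q + 1 - h] * psi[h:]) / gamma0
    return rho                         # lags beyond q stay exactly zero

print(ma_acf([0.7], 4))        # MA(1): nonzero only at lag 1
print(ma_acf([0.5, 0.3], 4))   # MA(2): nonzero only at lags 1 and 2
```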

Invertibility of MA models: an MA model is said to be invertible if it is algebraically equivalent to a converging infinite-order AR model. For MA(1), this requires $|\theta_1| < 1$.
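To make invertibility concrete: for MA(1), $\omega_t = (1 + \theta_1 B)^{-1}(x_t - \mu) = \sum_{j \ge 0} (-\theta_1)^j (x_{t-j} - \mu)$, and this AR($\infty$) sum converges only when $|\theta_1| < 1$. More generally, an MA(q) is invertible when every root of $1 + \theta_1 z + \dots + \theta_q z^q$ lies outside the unit circle. The sketch below (numpy only; `is_invertible` is a hypothetical helper and the parameters are illustrative) checks the root condition and recovers one noise term from past observations with a truncated AR($\infty$) expansion.

```python
import numpy as np

def is_invertible(thetas):
    """True if all roots of 1 + theta_1 z + ... + theta_q z^q lie strictly outside the unit circle."""
    # np.roots expects coefficients from the highest power down to the constant
    coeffs = np.r_[1.0, np.asarray(thetas, dtype=float)][::-1]
    return bool(np.all(np.abs(np.roots(coeffs)) > 1.0))

print(is_invertible([0.5]))   # root at z = -2
print(is_invertible([2.0]))   # root at z = -0.5

# Invertible MA(1): rebuild w_t from past x values using truncated AR(infinity) weights
rng = np.random.default_rng(1)
theta1, mu = 0.5, 0.0
w = rng.normal(size=501)
x = mu + w[1:] + theta1 * w[:-1]       # x[t] uses w[t + 1] (current) and w[t] (previous)
pi = (-theta1) ** np.arange(40)        # AR(infinity) weights, truncated at 40 terms
t = len(x) - 1
w_hat = np.sum(pi * (x[t - np.arange(40)] - mu))
print(w_hat, w[t + 1])                 # the two values agree up to a tiny truncation error
```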

#### Partial Autocorrelation Function (PACF)

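The partial autocorrelation at lag $k$ is the correlation between $x_t$ and $x_{t-k}$ after removing the linear effect of the intervening values $x_{t-1}, \dots, x_{t-k+1}$. To see the idea, consider a regression context in which $y$ is the response variable and $x_1$, $x_2$, $x_3$ are predictor variables.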

The goal is to find the partial correlation between $y$ and $x_3$:

1. Regress $y$ on $x_1$ and $x_2$.
2. Regress $x_3$ on $x_1$ and $x_2$.
3. Correlate the residuals, i.e. the parts of $y$ and $x_3$ that are not explained by $x_1$ and $x_2$:

$$
\frac{\operatorname{Cov}(y, x_3 \mid x_1, x_2)}{\sqrt{\operatorname{Var}(y \mid x_1, x_2)\,\operatorname{Var}(x_3 \mid x_1, x_2)}}
$$
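The three regression steps above can be sketched directly with least squares. The data are simulated purely for illustration: $x_3$ depends on $x_1$, and $y$ depends on $x_1$ and $x_2$ but not on $x_3$, so the ordinary correlation of $y$ and $x_3$ is clearly nonzero while the partial correlation should be near zero. `partial_corr` is a hypothetical helper name; the regressions include an intercept.

```python
import numpy as np

def partial_corr(y, x3, controls):
    """Correlation between the parts of y and x3 not explained by the controls (OLS with intercept)."""
    Z = np.column_stack([np.ones(len(y)), controls])
    # Step 1: residuals of y regressed on (1, x1, x2)
    res_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    # Step 2: residuals of x3 regressed on (1, x1, x2)
    res_x3 = x3 - Z @ np.linalg.lstsq(Z, x3, rcond=None)[0]
    # Step 3: correlate the two residual series
    return np.corrcoef(res_y, res_x3)[0, 1]

rng = np.random.default_rng(0)
n = 5_000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = x1 + rng.normal(size=n)            # x3 depends on x1 only
y = 2 * x1 - x2 + rng.normal(size=n)    # y has no direct dependence on x3

print(np.corrcoef(y, x3)[0, 1])                        # ordinary correlation: clearly nonzero
print(partial_corr(y, x3, np.column_stack([x1, x2])))  # partial correlation: near 0
```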

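In practice, matrix manipulations involving the covariance matrix of a multivariate distribution are used to estimate the PACF, for example by solving the Yule-Walker equations for successively larger orders (the Durbin-Levinson recursion does this efficiently).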
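One standard route from autocorrelations $\rho_0, \dots, \rho_K$ to the PACF is the Durbin-Levinson recursion, which solves the Yule-Walker equations for orders $1, 2, \dots, K$ without inverting a matrix each time. This is a numpy sketch under that assumption (`pacf_durbin_levinson` is a hypothetical name; statsmodels' `pacf` function exposes Yule-Walker, OLS and Levinson-Durbin estimators through its `method` argument). Fed the theoretical ACF of an AR(1), it returns a PACF that cuts off after lag 1; fed the ACF of an MA(1), it returns a PACF that tails off.

```python
import numpy as np

def pacf_durbin_levinson(rho):
    """PACF at lags 1..K from autocorrelations rho[0..K] (rho[0] == 1) via Durbin-Levinson."""
    rho = np.asarray(rho, dtype=float)
    K = len(rho) - 1
    pacf = np.zeros(K)
    phi_prev = np.zeros(0)   # coefficients phi_{k-1, 1..k-1} of the order-(k-1) fit
    for k in range(1, K + 1):
        # phi_kk = (rho_k - sum_j phi_{k-1,j} rho_{k-j}) / (1 - sum_j phi_{k-1,j} rho_j)
        num = rho[k] - np.sum(phi_prev * rho[k - 1:0:-1])
        den = 1.0 - np.sum(phi_prev * rho[1:k])
        phi_kk = num / den
        # Update: phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}, then append phi_kk
        phi_prev = np.r_[phi_prev - phi_kk * phi_prev[::-1], phi_kk]
        pacf[k - 1] = phi_kk
    return pacf

# AR(1) with phi_1 = 0.6 has rho_h = 0.6^h: PACF is 0.6 at lag 1, then zero
print(pacf_durbin_levinson(0.6 ** np.arange(6)))
# MA(1) with theta_1 = 0.7 has rho_1 = 0.7 / 1.49, zero afterwards: PACF tails off
print(pacf_durbin_levinson(np.r_[1.0, 0.7 / 1.49, np.zeros(4)]))
```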

In [5]:
Image(url="MA1_PACF.gif", width=600, height=600)

In [6]:
Image(url="AR1_PACF.gif", width=600, height=600)

#### Therefore, use the ACF to identify MA lags and the PACF to identify AR lags
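The same rule can be seen on simulated data: the sample ACF of an MA(1) stands out only at lag 1, and the sample PACF of an AR(1) stands out only at lag 1. This sketch uses hypothetical helpers written with numpy. The PACF at lag $k$ is estimated as the last coefficient of an OLS regression of $x_t$ on $x_{t-1}, \dots, x_{t-k}$, which mirrors the regression idea above. The $\pm 2/\sqrt{n}$ band is the usual rough significance guide, and $\theta_1 = \phi_1 = 0.7$ are illustrative choices.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (denominator uses the full sum of squares)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[h:] * x[: len(x) - h]) / denom for h in range(max_lag + 1)])

def sample_pacf_ols(x, max_lag):
    """Lag-k PACF: last OLS coefficient when regressing x_t on 1, x_{t-1}, ..., x_{t-k}."""
    x = np.asarray(x, dtype=float)
    out = []
    for k in range(1, max_lag + 1):
        y = x[k:]
        lags = np.column_stack([x[k - j : len(x) - j] for j in range(1, k + 1)])
        Z = np.column_stack([np.ones(len(y)), lags])
        out.append(np.linalg.lstsq(Z, y, rcond=None)[0][-1])
    return np.array(out)

rng = np.random.default_rng(42)
n = 20_000
w = rng.normal(size=n + 1)
ma1 = w[1:] + 0.7 * w[:-1]               # MA(1) with theta_1 = 0.7
ar1 = np.empty(n)
ar1[0] = w[0]
for t in range(1, n):
    ar1[t] = 0.7 * ar1[t - 1] + w[t]     # AR(1) with phi_1 = 0.7

bound = 2 / np.sqrt(n)                   # rough significance band
print(np.round(sample_acf(ma1, 5)[1:], 3), bound)    # ACF of MA(1): only lag 1 stands out
print(np.round(sample_pacf_ols(ar1, 5), 3), bound)   # PACF of AR(1): only lag 1 stands out
```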

#### Useful Notation

**Backshift Operator**
$$
B x_t = x_{t-1}
$$
A power of $B$ means applying the backshift repeatedly:
$$
B^2 x_t = x_{t-2}
$$
The backshift operator $B$ leaves constants and coefficients unchanged:
$$
B \theta_1 = \theta_1
$$
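On a finite array, $B^k$ is just a shift by $k$ positions, with the first $k$ values undefined. A small numpy sketch (`backshift` is a hypothetical helper; pandas' `Series.shift(k)` behaves the same way on a Series):

```python
import numpy as np

def backshift(x, k=1):
    """Apply B^k: (B^k x)_t = x_{t-k}. The first k entries have no lagged value, so they are NaN."""
    x = np.asarray(x, dtype=float)
    if k == 0:
        return x.copy()
    out = np.full_like(x, np.nan)
    out[k:] = x[:-k]
    return out

x = np.array([3.0, 5.0, 4.0, 6.0])
print(backshift(x))               # nan, 3, 5, 4
print(backshift(backshift(x)))    # B(B x) = B^2 x: nan, nan, 3, 5
print(backshift(x, 2))            # same as the line above
```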

**AR(p) Model**
$$
x_t = \delta + \phi_1 x_{t-1} + \phi_2 x_{t-2} + ... + \phi_p x_{t-p} + \omega_t
$$
It can be written as
$$
x_t = \delta + \phi_1 B x_t + \phi_2 B^2 x_t + ... + \phi_p B^p x_t + \omega_t
$$

Define **"AR polynomial"** as $\Phi(B) = 1 - \phi_1 B - ... - \phi_p B^p$
<br>Moving the lagged terms to the left-hand side gives:
$$
\Phi(B) x_t = \delta + \omega_t
$$
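Applying $\Phi(B)$ to a series is a linear filter with coefficients $(1, -\phi_1, \dots, -\phi_p)$. As a check of $\Phi(B)\,x_t = \delta + \omega_t$, the sketch below simulates an AR(2) with illustrative parameters and shows that filtering $x_t$ with the AR polynomial returns exactly $\delta + \omega_t$ wherever all lags are available. `np.convolve` in `"valid"` mode computes $x_t - \phi_1 x_{t-1} - \phi_2 x_{t-2}$ for $t = 2, \dots, n-1$.

```python
import numpy as np

rng = np.random.default_rng(0)
delta, phi = 1.0, np.array([0.5, -0.3])   # illustrative AR(2) parameters
n = 200
w = rng.normal(size=n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = delta + phi[0] * x[t - 1] + phi[1] * x[t - 2] + w[t]

# Coefficients of Phi(B) = 1 - phi_1 B - phi_2 B^2, lowest power first
ar_poly = np.r_[1.0, -phi]
# Entry i of the 'valid' convolution is x[i+2] - phi_1 x[i+1] - phi_2 x[i]
filtered = np.convolve(x, ar_poly, mode="valid")
print(np.allclose(filtered, delta + w[2:]))
```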

AR(1) model has $\Phi(B) = 1 - \phi_1 B$ <br>
AR(2) model has $\Phi(B) = 1 - \phi_1 B - \phi_2 B^2$
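The AR polynomial also gives the standard stationarity check: an AR(p) is causal and stationary when every root of $\Phi(z) = 1 - \phi_1 z - \dots - \phi_p z^p$ lies strictly outside the unit circle. For AR(1) this reduces to $|\phi_1| < 1$. A numpy sketch (`is_stationary` is a hypothetical helper; the coefficient values are illustrative):

```python
import numpy as np

def is_stationary(phis):
    """True if all roots of Phi(z) = 1 - phi_1 z - ... - phi_p z^p lie strictly outside the unit circle."""
    coeffs = np.r_[1.0, -np.asarray(phis, dtype=float)]   # lowest power first
    roots = np.roots(coeffs[::-1])                         # np.roots wants the highest power first
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stationary([0.5]))         # AR(1): root z = 2
print(is_stationary([1.0]))         # random walk: root z = 1, not stationary
print(is_stationary([0.5, -0.3]))   # AR(2): complex roots with modulus about 1.83
print(is_stationary([0.5, 0.6]))    # AR(2): one root near 0.94, not stationary
```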

**MA(q) Model**
$$
x_t = \mu + \theta_1 \omega_{t-1} + \theta_2 \omega_{t-2} + ... + \theta_q \omega_{t-q} + \omega_t
$$
It can be written as
$$
x_t = \mu + \theta_1 B \omega_t + \theta_2 B^2 \omega_t + ... + \theta_q B^q \omega_t + \omega_t
$$

Again, we can define the **"MA polynomial"** as $\Theta(B) = 1 + \theta_1 B + ... + \theta_q B^q$<br>
Collecting the $\omega_t$ terms and moving $\mu$ to the left-hand side gives:
$$
x_t - \mu = \Theta(B) \omega_t
$$
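In the same way, $x_t - \mu = \Theta(B)\,\omega_t$ says an MA(q) is white noise passed through a filter with coefficients $(1, \theta_1, \dots, \theta_q)$. This sketch generates an MA(3) with illustrative parameters by convolving the noise with the MA polynomial, then checks one time point against the term-by-term formula. The $q$ extra noise draws are an assumption so that $\omega_{t-1}, \dots, \omega_{t-q}$ exist from the first observation onward.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, theta = 5.0, np.array([0.4, 0.25, -0.2])   # illustrative MA(3) parameters
q, n = len(theta), 200
w = rng.normal(size=n + q)                     # q extra draws supply the initial lagged errors

# Coefficients of Theta(B) = 1 + theta_1 B + ... + theta_q B^q, lowest power first
ma_poly = np.r_[1.0, theta]
# 'valid' keeps only positions where all q lagged errors are available (n values)
x = mu + np.convolve(w, ma_poly, mode="valid")

# Term-by-term check at one index: x[t] uses current error w[t + q] and lags w[t + q - j]
t = 10
explicit = mu + w[t + q] + sum(theta[j - 1] * w[t + q - j] for j in range(1, q + 1))
print(len(x), np.isclose(x[t], explicit))
```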

Note: some conventions negate the $\theta$ coefficients so that the MA polynomial has the same form as the AR polynomial, i.e. $\Theta(B) = 1 - \theta_1 B - ... - \theta_q B^q$.

**Differencing**: We can express $x_t - x_{t-1}$ as $(1-B) x_t = \bigtriangledown x_t$
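For an array, the first difference $(1-B)\,x_t$ is just consecutive subtraction, which is what `np.diff` computes (pandas offers `Series.diff()` for the same purpose). The values below are illustrative.

```python
import numpy as np

x = np.array([10.0, 12.0, 11.0, 15.0, 14.0])
# (1 - B) x_t = x_t - x_{t-1}, defined from the second observation onward
diff1 = x[1:] - x[:-1]
print(diff1)                               # 2, -1, 4, -1
print(np.array_equal(diff1, np.diff(x)))   # np.diff computes the same first difference
```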

#### Differencing with subscript: $\bigtriangledown_{12} x_t = (1 - B^{12}) x_t = x_t - x_{t-12}$

#### Differencing with superscript: $\bigtriangledown^2 x_t = (1-B)^2 x_t = (1 - 2B + B^2) x_t = x_t - 2 x_{t-1} + x_{t-2}$
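The subscript and the superscript mean different things. The subscript gives the lag of a single difference, so the seasonal lag-12 difference is $(1 - B^{12})\,x_t$. The superscript counts how many times the first difference is applied, so $(1-B)^2\,x_t$ differs from the lag-2 difference $x_t - x_{t-2}$. A numpy sketch on an illustrative random-walk series:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=48).cumsum()        # illustrative series, e.g. 4 years of monthly data

# Subscript 12: one difference at lag 12, x_t - x_{t-12}
seasonal = x[12:] - x[:-12]

# Superscript 2: first difference applied twice, x_t - 2 x_{t-1} + x_{t-2}
second = x[2:] - 2 * x[1:-1] + x[:-2]
print(np.allclose(second, np.diff(x, n=2)))   # np.diff with n=2 differences twice

# Differencing twice is not the same as a single lag-2 difference
lag2 = x[2:] - x[:-2]
print(np.allclose(second, lag2))
```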