# Time Series Theory 3

## Moving Average (MA) Processes
* $Y_t = \Sigma_{k=0}^P\theta_ke_{t-k}$, with $\theta_0=1$, where $P$ is the **Model Order**, which tells us how many coefficients are in the model
* What are the unknowns in the MA model? They are $P$ and $\bar{\theta}$, the coefficients (here represented with a vector)
* For a given $P$, the variance is $\text{Var}(Y_t)=\sigma_e^2\Sigma_{k=0}^P\theta_k^2$. If $P$ and $\bar{\theta}$ are fixed, then the variance is fixed. Therefore this is a stationary process for fixed $P$
* Say we have $P=2$, then,
  - $Y_t=\theta_0e_t+\theta_1e_{t-1}+\theta_2e_{t-2}=e_t+\theta_1e_{t-1}+\theta_2e_{t-2}$
  - where $e_t\sim\mathcal{N}(0,\sigma_e^2)$ and all $e$ are iid. $\theta_0=1$
* The mean of the process is still $\mu_{Y_t}=0$. Because the shocks are independent, all covariance terms vanish and the variance is: $\text{Var}(Y_t) = \text{Var}(e_t+\theta_1e_{t-1}+\theta_2e_{t-2})=\text{Var}(e_t)+\theta_1^2\text{Var}(e_{t-1})+\theta_2^2\text{Var}(e_{t-2})$
  - Therefore, we get the variance as: $\sigma_e^2+\theta_1^2\sigma_e^2+\theta_2^2\sigma_e^2$

- Therefore the mean and variance are given by:
  * $E[Y_t]=0$
  * $\text{Var}(Y_t) =\sigma_e^2\Sigma_{k=0}^P\theta_k^2$
---
Therefore, this is a stationary process, as the mean and variance are constant
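
A quick numerical check of the mean and variance formulas can be sketched as follows. This is only an illustration under assumed values: the coefficients `[1.0, 0.6, 0.3]`, the unit shock variance, the seed, and the helper names `simulate_ma` and `sample_variance` are all choices made here, not part of the notes.

```python
import random

def simulate_ma(theta, n, sigma_e=1.0, seed=0):
    """Simulate Y_t = sum_k theta[k] * e_{t-k}, with theta[0] = 1 by convention."""
    rng = random.Random(seed)
    q = len(theta) - 1
    # Draw q extra shocks so every Y_t has a full window of past shocks.
    e = [rng.gauss(0.0, sigma_e) for _ in range(n + q)]
    return [sum(theta[k] * e[t + q - k] for k in range(q + 1)) for t in range(n)]

def sample_variance(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

theta = [1.0, 0.6, 0.3]                               # theta_0, theta_1, theta_2 (assumed)
y = simulate_ma(theta, n=20000)
theoretical_var = 1.0 * sum(c ** 2 for c in theta)    # sigma_e^2 * sum of theta_k^2
print("sample mean:", round(sum(y) / len(y), 3))
print("sample variance:", round(sample_variance(y), 3), "theory:", round(theoretical_var, 3))
```

The sample values should sit close to $0$ and $\sigma_e^2\Sigma\theta_k^2$ and move closer as `n` grows.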

---
**EXAMPLE**
* Suppose you win 1 dollar if a fair coin shows a head and lose 1 dollar if it shows a tail. Denote the outcome on toss $t$ by $e_t$
* $e_t=\left\{\begin{array}{ll} 1 & \text{if head}\\ -1 & \text{if tail}\end{array}\right.$
* The average winning $Y_t$ from the last 4 tosses: $Y_t = \frac{1}{4}e_t+\frac{1}{4}e_{t-1}+\frac{1}{4}e_{t-2}+\frac{1}{4}e_{t-3}$
* This is a moving average process

- Notice that the observed series $Y_t$ is autocorrelated even though the generating series $e_t$ is uncorrelated
- The series $Y_t$ is the weighted aggregation of some uncorrelated random variables
- In Economics, the generating series, $e_t$ is called the random shock
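
The coin example can be simulated to see the autocorrelation appear. The sketch below is an illustration only; the seed, the sample size, and the helper `sample_autocorr` are assumptions made here. With four equal weights, consecutive averages share three tosses, so the lag-$k$ autocorrelation should be near $(4-k)/4$ for $k<4$ and near zero afterwards, while the tosses themselves stay uncorrelated.

```python
import random

rng = random.Random(1)
n = 20000
e = [rng.choice([1, -1]) for _ in range(n)]          # +1 for a head, -1 for a tail
# Y_t is the average of the current toss and the three before it.
y = [sum(e[t - 3:t + 1]) / 4 for t in range(3, n)]

def sample_autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    m = sum(x) / len(x)
    c0 = sum((v - m) ** 2 for v in x)
    ck = sum((x[t] - m) * (x[t + k] - m) for t in range(len(x) - k))
    return ck / c0

print("tosses, lag 1:", round(sample_autocorr(e, 1), 3))
for k in range(1, 6):
    print("averages, lag", k, ":", round(sample_autocorr(y, k), 3))
```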

---
## MA[2] process
##### Lag of 0
* An MA[2] process is $Y_t=e_t+\theta_1e_{t-1}+\theta_2e_{t-2}$
* The autocovariance of this process at lag 0 is $\gamma_{Y_t,Y_{t-0}}=\gamma_0=E[(Y_t-\mu_y)(Y_{t-0}-\mu_y)]$
* $\gamma_0 = E[(e_t+\theta_1e_{t-1}+\theta_2e_{t-2})(e_t+\theta_1e_{t-1}+\theta_2e_{t-2})]$
* $=E[e_t^2+\theta_1^2e_{t-1}^2+\theta_2^2e_{t-2}^2+2(\theta_1e_te_{t-1}+\theta_2e_te_{t-2}+\theta_1\theta_2e_{t-1}e_{t-2})]$
* $=E[e_t^2+\theta_1^2e_{t-1}^2+\theta_2^2e_{t-2}^2]$, as $e_t$ is iid
* $=\sigma_e^2(1+\theta_1^2+\theta_2^2)$


##### Lag of 1
- The autocovariance of this process at lag 1 is $\gamma_{Y_t,Y_{t-1}}=\gamma_1=E[(Y_t-\mu_y)(Y_{t-1}-\mu_y)]$
- $\gamma_1 = E[(e_t+\theta_1e_{t-1}+\theta_2e_{t-2})(e_{t-1}+\theta_1e_{t-2}+\theta_2e_{t-3})]$
- $=E[e_te_{t-1}+\theta_1e_te_{t-2}+\theta_2e_te_{t-3}+\theta_1e_{t-1}^2+\theta_1^2e_{t-1}e_{t-2}+\theta_1\theta_2e_{t-1}e_{t-3}+\theta_2e_{t-2}e_{t-1}+\theta_2\theta_1e_{t-2}^2+\theta_2^2e_{t-2}e_{t-3}]$
- $=E[\theta_1e_{t-1}^2+\theta_1\theta_2e_{t-2}^2]$, as $e_t$ is iid
- $=\sigma_e^2(\theta_1+\theta_1\theta_2)$

##### Lag of 2
*  The autocovariance of this process at lag 2 is $\gamma_{Y_t,Y_{t-2}}=\gamma_2=E[(Y_t-\mu_y)(Y_{t-2}-\mu_y)]$
* $\gamma_2 = E[(e_t+\theta_1e_{t-1}+\theta_2e_{t-2})(e_{t-2}+\theta_1e_{t-3}+\theta_2e_{t-4})]$
* $=E[e_te_{t-2}+\theta_1e_te_{t-3}+\theta_2e_te_{t-4}+\theta_1e_{t-1}e_{t-2}+\theta_1^2e_{t-1}e_{t-3}+\theta_1\theta_2e_{t-1}e_{t-4}+\theta_2e_{t-2}^2+\theta_2\theta_1e_{t-2}e_{t-3}+\theta_2^2e_{t-2}e_{t-4}]$
* $=E[\theta_2e_{t-2}^2]$, as $e_t$ is iid
* $=\sigma_e^2\theta_2$

##### Lag of 3 and onwards
$\gamma_k=0$ for all $k\ge 3$ in an MA[2] process, because $Y_t$ and $Y_{t-k}$ then share no shocks
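
The three lag results follow one pattern: $\gamma_k=\sigma_e^2\Sigma_{i=0}^{q-k}\theta_i\theta_{i+k}$ with $\theta_0=1$. A small sketch of that pattern is given below; the function name `ma_autocovariance` and the coefficient values are assumptions chosen for illustration.

```python
def ma_autocovariance(theta, k, sigma_e2=1.0):
    """gamma_k for Y_t = sum_i theta[i] * e_{t-i}; theta[0] should be 1."""
    q = len(theta) - 1
    k = abs(k)
    if k > q:
        return 0.0          # no shared shocks beyond lag q
    return sigma_e2 * sum(theta[i] * theta[i + k] for i in range(q - k + 1))

theta = [1.0, 0.5, 0.3]     # theta_0, theta_1, theta_2 (assumed values)
for k in range(5):
    print("gamma", k, "=", round(ma_autocovariance(theta, k), 4))
```

For these values the lags 0, 1 and 2 reproduce $\sigma_e^2(1+\theta_1^2+\theta_2^2)$, $\sigma_e^2(\theta_1+\theta_1\theta_2)$ and $\sigma_e^2\theta_2$, and every later lag is zero.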

## Finding the values of $\bar{\theta}$
* $Y_t = e_t+\theta_1e_{t-1}$, an MA[1] process
* $Y_t$ is known, $e_t$ is unknown, but $e_t\sim\mathcal{N}(0,\sigma^2)$, and is iid
* From here we have two paths forward:
  - Iterative Least Squares
    1. Assume starting values for the shocks $e_t$ (so the $e_{t-1}$ values are assumed too)
    2. Fit a least squares model of $Y_t$ on the lagged shocks $e_{t-1}$
    3. Compile residuals: $\xi_t=Y_t-\hat{Y}_t$, $\hat{Y}_t$ is from the Least Squares Model
    4. Let $e_t = \xi_t$. Iterate steps 2 and 3
    5. Stop when ACF of $\xi_t$ is like white noise OR $n$ iterations
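
The iterative least squares steps can be sketched for an MA[1] series as below. This is a rough illustration rather than a definitive estimator: the starting guess $e_t=Y_t$, the zero pre-sample shock, the no-intercept regression, the fixed iteration count, and the name `fit_ma1_iterative` are all assumptions made here, and the stopping rule used is the simple "$n$ iterations" option.

```python
import random

def fit_ma1_iterative(y, n_iter=25):
    """Iteratively estimate theta_1 in Y_t = e_t + theta_1 * e_{t-1}."""
    e = list(y)                        # step 1: assumed starting shocks
    theta = 0.0
    for _ in range(n_iter):
        lagged = [0.0] + e[:-1]        # e_{t-1}; the unknown pre-sample shock is set to 0
        # step 2: no-intercept least squares of Y_t on e_{t-1}
        theta = sum(a * b for a, b in zip(y, lagged)) / sum(b * b for b in lagged)
        # steps 3 and 4: the residuals become the new shock estimates
        e = [a - theta * b for a, b in zip(y, lagged)]
    return theta, e

# Synthetic MA[1] data with a known coefficient, used only to exercise the loop.
rng = random.Random(42)
true_theta = 0.5
shocks = [rng.gauss(0.0, 1.0) for _ in range(5001)]
y = [shocks[t] + true_theta * shocks[t - 1] for t in range(1, 5001)]

theta_hat, residuals = fit_ma1_iterative(y)
print("estimated theta_1:", round(theta_hat, 3))
```

On this synthetic series the estimate should land near the true value of 0.5. Checking the ACF of `residuals` for a white-noise pattern would implement the other stopping rule.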

---

* If the $e_t$s are normal, then so is the process, which is strictly stationary.
* The autocorrelation is:

$\rho_k = \left\{\begin{array}{ll}1 & \text{if}\: k= 0\\\Sigma_{i=0}^{q-k}\theta_i\theta_{i+k}/\Sigma_{i=0}^q\theta_i^2 & \text{if}\:k=1,2,\dots,q \\ 0 & \text{if}\: k\gt q \\ \rho_{-k} & \text{if}\:k\lt 0\end{array}\right.$

* The process is weakly stationary because the mean is constant and the covariance does not depend on $t$. Note that it cuts off at lag $q$

---
### Moving Average Processes
* For general processes introduce the backward shift operator $B$, $B^jy_t=y_{t-j}$
* Then the MA($q$) process is given by
  - $Y_t = (\theta_0+\theta_1B+\theta_2B^2+\dots+\theta_qB^q)e_t=\theta(B)e_t$

---
## MA: Stationarity
* In general, MA processes are stationary regardless of the values of the parameters, but not necessarily 'invertible'
* An MA process is said to be invertible if it can be converted into a stationary AR process of infinite order
* In order to ensure there is a unique MA process for a given ACF, we impose the condition of invertibility
* Therefore, the invertibility condition for an MA process serves two purposes:
  1. It is useful to represent an MA process as an infinite order AR process;
  2. It ensures that for a given ACF, there is a unique MA process
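
The uniqueness point can be illustrated with an MA[1] process, whose lag-1 autocorrelation is $\rho_1=\theta_1/(1+\theta_1^2)$. In the sketch below (the function name `ma1_rho1` and the values 0.5 and 2.0 are assumptions chosen here), $\theta_1$ and $1/\theta_1$ produce the same ACF, and only the one with $|\theta_1|<1$ is invertible.

```python
def ma1_rho1(theta):
    """Lag-1 autocorrelation of Y_t = e_t + theta * e_{t-1}."""
    return theta / (1.0 + theta ** 2)

invertible, non_invertible = 0.5, 2.0      # a theta and its reciprocal
print(ma1_rho1(invertible), ma1_rho1(non_invertible))   # both give the same rho_1
```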

## Auto Regressive Processes
* Assume $\{e_t\}$ is purely random with mean zero and std $\sigma_e$
* Then the AR process of order $p$ or AR($p$) is:

$Y_t = \phi_1y_{t-1}+\phi_2y_{t-2}+\dots+\phi_py_{t-p}+e_t$

$Y_t = \Sigma_{k=1}^p\phi_ky_{t-k} + e_t$

- What is the relation between AR and MA?
  * Let's define $B$ as the backshift operator, $B^jy_t = y_{t-j}$
  * So we can rewrite the AR equation as
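    - $y_t = \Sigma_{k=1}^p\phi_kB^ky_t +e_t$
  * Using algebra to collect the $y_t$ terms on one side, we get
    - $y_t[1-\Sigma_{k=1}^p\phi_kB^k] = e_t$
  * $\frac{y_t}{e_t}=1/[1-\Sigma_{k=1}^p\phi_kB^k]$
  * For an MA process, $y_t = e_t+\Sigma_{k=1}^q \theta_ke_{t-k}$
     - $\frac{y_t}{e_t}=1+\Sigma_{k=1}^q\theta_kB^k$
 * The two forms are reciprocal: the AR operator sits in the denominator and the MA operator in the numerator, so a finite-order AR process corresponds to an infinite-order MA process (and an invertible MA process to an infinite-order AR process). In this sense they complement each other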
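
For an AR[1] process the operator form gives $1/(1-\phi_1B)=1+\phi_1B+\phi_1^2B^2+\dots$, so the equivalent MA weights are $\phi_1^j$. A minimal sketch of this, assuming $\phi_1=0.6$ and a hypothetical helper `ar1_impulse_response`, feeds a single unit shock through the AR recursion and compares the result with the closed form.

```python
def ar1_impulse_response(phi, n_terms):
    """Response of y_t = phi * y_{t-1} + e_t to a single unit shock at t = 0."""
    y_prev, weights = 0.0, []
    for t in range(n_terms):
        shock = 1.0 if t == 0 else 0.0
        y_prev = phi * y_prev + shock
        weights.append(y_prev)
    return weights

phi = 0.6
psi = ar1_impulse_response(phi, 6)
print([round(w, 4) for w in psi])                    # weights from the recursion
print([round(phi ** j, 4) for j in range(6)])        # closed form phi^j
```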

---
#### Finding first two moments of the AR[1] process
$y_t =\Sigma_{k=1}^p y_{t-k}\phi_k + e_t$, the AR process

$\mu_y =0$, $\gamma_{Y_t,Y_{t+0}}=\gamma_0=\sigma_y^2$ and $e_t\sim\mathcal{N}(0,\sigma_e^2)$

Let's take an AR[1] process: $Y_t = \phi_1y_{t-1} +e_t$

$\sigma_y^2 = E[(\phi_1y_{t-1}+e_t)(\phi_1y_{t-1}+e_t)]= E[\phi_1^2y_{t-1}^2+e_t^2+2\phi_1y_{t-1}e_t]$

Given $y_{t-1}=y_{t-2}\phi_1+e_{t-1}$, $y_{t-1}$ depends only on $e_{t-1}$ and earlier shocks. Since the shocks are iid, there is no linear relation between $y_{t-1}$ and $e_t$, so $E[y_{t-1}e_t]=0$

$\sigma_y^2= \phi_1^2\sigma_y^2 + \sigma_e^2 + 0\rightarrow \sigma_y^2 = \sigma_e^2/(1-\phi_1^2)$

#### Autocovariance of the AR[1] process (lag1)
$\gamma_1 = E[(Y_t-\mu_y)(Y_{t-1}-\mu_y)]=E[(\phi_1y_{t-1}+e_t-\mu_y)(\phi_1y_{t-2}+e_{t-1}-\mu_y)]$

$\gamma_1 = E[\phi_1^2y_{t-1}y_{t-2}+e_te_{t-1}+\phi_1y_{t-1}e_{t-1}+\phi_1y_{t-2}e_t]$

$\gamma_1 = \phi_1^2\gamma_1+0 + \phi_1\sigma_e^2+0\rightarrow \gamma_1 = \phi_1\sigma_e^2/(1-\phi_1^2)$

As $|\phi_1|$ is less than one for a stationary process, the autocovariance is reduced by a factor of $\phi_1$ at lag 1 in an AR[1] process:
$\gamma_1 = \phi_1\gamma_0=\phi_1\sigma_y^2$

#### Autocovariance of the AR[1] process (lag2)
$\gamma_2 = E[(Y_t-\mu_y)(Y_{t-2}-\mu_y)]=E[(\phi_1y_{t-1}+e_t-\mu_y)(\phi_1y_{t-3}+e_{t-2}-\mu_y)]$

$\gamma_2 = E[\phi_1^2y_{t-1}y_{t-3}+e_te_{t-2}+\phi_1y_{t-1}e_{t-2}+\phi_1y_{t-3}e_t]$

if we expand the equation for $y_{t-1}$, we get $y_{t-1} =\phi_1(\phi_1y_{t-3}+e_{t-2})+e_{t-1}$

$\gamma_2 = \phi_1^2\gamma_2+0 +\phi_1^2\sigma_e^2+0\rightarrow \gamma_2 = \phi_1^2\sigma_e^2/(1-\phi_1^2)$

As $|\phi_1|$ is less than one for a stationary process, the autocovariance is reduced by a factor of $\phi_1^2$ at lag 2 in an AR[1] process:
$\gamma_2 = \phi_1^2\gamma_0=\phi_1^2\sigma_y^2$


Similarly, $\gamma_3 = \phi_1^3\sigma_e^2/(1-\phi_1^2)$, and so on
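
The pattern $\gamma_k=\phi_1^k\sigma_e^2/(1-\phi_1^2)$ can be checked against a simulated series. The sketch below is illustrative only; $\phi_1=0.7$, the unit shock variance, the seed, the burn-in length, and the helper names are assumptions made here.

```python
import random

def simulate_ar1(phi, n, sigma_e=1.0, seed=0, burn_in=500):
    """Simulate y_t = phi * y_{t-1} + e_t, discarding the start-up transient."""
    rng = random.Random(seed)
    y, out = 0.0, []
    for t in range(n + burn_in):
        y = phi * y + rng.gauss(0.0, sigma_e)
        if t >= burn_in:
            out.append(y)
    return out

def sample_autocov(x, k):
    m = sum(x) / len(x)
    return sum((x[t] - m) * (x[t + k] - m) for t in range(len(x) - k)) / len(x)

phi = 0.7
y = simulate_ar1(phi, n=50000)
gamma0_theory = 1.0 / (1.0 - phi ** 2)        # sigma_e^2 / (1 - phi^2) with sigma_e = 1
for k in range(4):
    print("lag", k, "sample:", round(sample_autocov(y, k), 3),
          "theory:", round(phi ** k * gamma0_theory, 3))
```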

---
* Since the ACF of an AR process decays geometrically but never cuts off to exactly zero, we cannot use the ACF to determine the model order of an AR process
* We see there's an indirect relationship between $y_t$ and $y_{t-k}$ in the AR model, passing through the intermediate values
* The ACF cannot distinguish between the indirect and direct pathways
* For that, we use the PACF (partial autocorrelation function), which first removes the indirect relationships and then measures the correlation, so only the direct effect remains

### Steps to calculate PACF
NOTE: for a stationary series, $\hat{\alpha}=\hat{\beta} = \rho_1$
1. Use the Least Squares method to model $\hat{y}_t = \hat{\alpha}y_{t-1}$
2. Use the Least Squares method to model $\hat{y}_{t-2} = \hat{\beta}y_{t-1}$
3. Compute residuals:
   - $\xi_t = y_t-\hat{y}_t$, so $\xi_t$ is uncorrelated with $y_{t-1}$
   - $\xi_{t-2} = y_{t-2}-\hat{y}_{t-2}$, so $\xi_{t-2}$ is uncorrelated with $y_{t-1}$
4. $\text{PACF}[2] =\Phi_2 = \text{Corr}(\xi_t,\xi_{t-2})$
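
The four steps can be sketched directly. This is a minimal illustration, assuming a zero-mean series (so the regressions have no intercept); the helper names `ls_slope`, `corr`, `pacf_lag2` and the AR[1] test series with $\phi_1=0.7$ are choices made here. For an AR[1] series the lag-2 partial autocorrelation should come out close to zero, because $y_{t-2}$ only affects $y_t$ through $y_{t-1}$.

```python
import random

def ls_slope(target, regressor):
    """No-intercept least squares slope of target on regressor."""
    return sum(a * b for a, b in zip(target, regressor)) / sum(b * b for b in regressor)

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / (va * vb) ** 0.5

def pacf_lag2(y):
    y_t, y_t1, y_t2 = y[2:], y[1:-1], y[:-2]                # aligned y_t, y_{t-1}, y_{t-2}
    alpha = ls_slope(y_t, y_t1)                             # step 1
    beta = ls_slope(y_t2, y_t1)                             # step 2
    xi_t = [a - alpha * b for a, b in zip(y_t, y_t1)]       # step 3
    xi_t2 = [a - beta * b for a, b in zip(y_t2, y_t1)]
    return corr(xi_t, xi_t2)                                # step 4

rng = random.Random(7)
y, prev = [], 0.0
for _ in range(10000):
    prev = 0.7 * prev + rng.gauss(0.0, 1.0)                 # AR[1] test series
    y.append(prev)
print("PACF[2] of an AR[1] series:", round(pacf_lag2(y), 3))
```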

## ARMA process
* Combine AR and MA processes
* *A random process that contains both auto-regressive and moving average parts is said to be an ARMA($L_{AR},L_{MA}$) process*
* An ARMA process of order $(p,q)$ is given by:

$Y_t = \alpha_1y_{t-1}+\dots+\alpha_py_{t-p}+e_t+\beta_1e_{t-1}+\dots+\beta_qe_{t-q}$

Alternate expressions are possible using the backshift operator

$\Phi(B)y_t = \Theta(B)e_t$

* where
  - $\Phi(B)=1-\alpha_1B-\dots-\alpha_pB^p$
  - $\Theta(B) = 1+\beta_1B+\dots+\beta_qB^q$
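
The ARMA recursion can be sketched as a filter applied to a shock sequence. The helper `arma_filter`, the zero pre-sample values, and the coefficients $\alpha_1=0.5$, $\beta_1=0.4$ are assumptions used only for illustration. Feeding in a unit impulse shows the AR part carrying the response forward and the MA part adding the lagged shock.

```python
def arma_filter(alpha, beta, shocks):
    """Build y_t = sum_i alpha[i]*y_{t-1-i} + e_t + sum_j beta[j]*e_{t-1-j}.

    Values before the start of the series are taken to be zero (an assumption).
    """
    y = []
    for t, e_t in enumerate(shocks):
        ar_part = sum(a * y[t - 1 - i] for i, a in enumerate(alpha) if t - 1 - i >= 0)
        ma_part = sum(b * shocks[t - 1 - j] for j, b in enumerate(beta) if t - 1 - j >= 0)
        y.append(ar_part + e_t + ma_part)
    return y

# Unit impulse through an ARMA(1,1) with alpha_1 = 0.5 and beta_1 = 0.4
response = arma_filter([0.5], [0.4], [1.0, 0.0, 0.0, 0.0])
print([round(v, 4) for v in response])
```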

<img src='acpacf.jpg'>

## ARIMA processes
* General autoregressive integrated moving average processes are called ARIMA processes
* Typically used on non-stationary time series
* When differenced, say $d$ times, the process is an ARMA process
* Call the differenced process $W_t$. Then $W_t$ is an ARMA process and
  - $W_t(\text{ARMA})=\Delta^dy_t(\text{ARIMA})=(1-B)^dy_t$
 
* It has 3 terms: $p,d,q$
  - $p$ is the AR term
  - $d$ is the difference order
  - $q$ is the MA term
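
The operator $(1-B)^d$ is just repeated first differencing. A minimal sketch, using a hypothetical helper `difference` and a noise-free quadratic trend as assumed input, shows one pass leaving a linear trend and a second pass leaving a constant.

```python
def difference(series, d=1):
    """Apply (1 - B) to the series d times; each pass shortens it by one value."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out

y = [t ** 2 for t in range(1, 7)]     # 1, 4, 9, 16, 25, 36: a quadratic trend
print(difference(y, d=1))             # [3, 5, 7, 9, 11]
print(difference(y, d=2))             # [2, 2, 2, 2]
```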

## Box-Jenkins Methods (ARIMA models)
* The Box-Jenkins methodology refers to a set of procedures for identifying and estimating time series models within the class of autoregressive integrated moving average (ARIMA) models
* ARIMA models are regression models that use lagged values of the dependent variable and/or random disturbance terms as explanatory variables
* ARIMA models rely heavily on the autocorrelation pattern in the data
* This method applies to both seasonal and non-seasonal data

---
### Box-Jenkins methods- 5 Steps
1. Stationarity checking and Differencing- extracting the non-stationarity and removing trends
2. Model Identification - is it an AR, MA or ARMA (also identify order)
3. Parameter Estimation - find the $\phi$s and $\theta$s
4. Diagnostic Checking - validate the model
5. Forecasting - predict the required data

---
#### Differencing
1. If the process is non-stationary, then first differences of the series are computed to determine if that operation results in a stationary series
2. The process is continued until a stationary time series is found
3. This then determines the value of $d$
4. Sometimes, transformations, like log or some variance stabilizing transformations are made before differencing

---
#### Diagnostic Checking
* Often it is not straightforward to determine a single model that most adequately represents the data generating process, and it is not uncommon to estimate several models at the initial stage. The model that is finally chosen is the one considered best based on a set of diagnostic checking criteria. These criteria include:
  - t-tests for coefficient significance
  - residual analysis
  - model selection criteria

### Model Selection Criterion
* Akaike Information Criterion (AIC)
  - AIC $=-2\ln(L)+2k$
* Schwarz Bayesian Criterion (SBC)
  - SBC $=-2\ln(L)+k\ln(n)$
* Here,
  - $L$ is the maximized value of the likelihood function
  - $k$ is the number of parameters to be estimated
  - $n$ is the number of observations
*  Ideally, the AIC and SBC should be as small as possible
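
Both criteria are simple to compute once a model's maximized log-likelihood is known. In the sketch below the candidate labels, log-likelihood values, parameter counts and sample size are hypothetical numbers invented purely to show the comparison; they do not come from any fitted model.

```python
import math

def aic(log_likelihood, k):
    """AIC = -2 ln(L) + 2k."""
    return -2.0 * log_likelihood + 2.0 * k

def sbc(log_likelihood, k, n):
    """SBC = -2 ln(L) + k ln(n)."""
    return -2.0 * log_likelihood + k * math.log(n)

# Hypothetical candidates: (label, maximized log-likelihood, number of parameters)
candidates = [("AR(1)", -152.3, 2), ("ARMA(1,1)", -150.9, 3), ("ARMA(2,2)", -150.1, 5)]
n_obs = 100
for label, ll, k in candidates:
    print(label, "AIC:", round(aic(ll, k), 2), "SBC:", round(sbc(ll, k, n_obs), 2))

best_by_aic = min(candidates, key=lambda c: aic(c[1], c[2]))
best_by_sbc = min(candidates, key=lambda c: sbc(c[1], c[2], n_obs))
print("lowest AIC:", best_by_aic[0], "| lowest SBC:", best_by_sbc[0])
```

Because $\ln(n)>2$ once $n\ge 8$, SBC penalizes extra parameters more heavily than AIC, so the two criteria can pick different models.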