# Time Series: Statistical Models & Fitting

<hr>

**Recap**<br>

Weak stationarity is defined as:

1. Mean and variance are independent of $t$: $\mu_X (t) = \mu$, $var_X (t) = var_X$

2. The autocovariance is a function of the time distance only, *e.g. the autocovariance between Jan and Feb should be the same as between Oct and Nov*: $\gamma_X (s, t) = \gamma(\lvert s - t \rvert)$

To check whether a series is white noise: for a white noise series $W_t$ of length $n$, under appropriate technical conditions, the sample autocorrelation at any lag $h > 0$ is approximately distributed as

$\hat\rho_W (h) \sim N(0, \frac{1}{n})$

which means that, for white noise, the sample autocorrelation at lags $h > 0$ is expected to be close to zero (roughly 95% of lags fall within $\pm 2/\sqrt{n}$). If the autocorrelation function does not decay to zero at all, or decays to zero very slowly, it is an indication of nonstationarity.

<img alt="Stationarity" src="assets/stationarity.png" width="300">
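A minimal sketch of this diagnostic, assuming numpy is available: it computes the sample ACF (with the common $1/n$ normalisation) for simulated white noise and for a simulated random walk, and compares it with the approximate $\pm 2/\sqrt{n}$ band that holds for white noise. The helper name `sample_acf`, the seed and the series length are illustrative choices, not part of the notes.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation rho_hat(h) for h = 0..max_lag (1/n normalisation)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma0 = np.dot(xc, xc) / n
    return np.array([np.dot(xc[h:], xc[:n - h]) / n / gamma0 for h in range(max_lag + 1)])

rng = np.random.default_rng(0)
n = 1000
white_noise = rng.normal(0.0, 1.0, size=n)
random_walk = np.cumsum(rng.normal(0.0, 1.0, size=n))  # nonstationary

band = 2 / np.sqrt(n)  # approx. 95% band for white noise, since rho_hat(h) ~ N(0, 1/n)
acf_wn = sample_acf(white_noise, 20)
acf_rw = sample_acf(random_walk, 20)

print("white noise: fraction of lags 1..20 inside band:", np.mean(np.abs(acf_wn[1:]) < band))
print("random walk ACF at lags 1, 10, 20:", acf_rw[[1, 10, 20]].round(2))  # decays very slowly
```

For the random walk the ACF stays far outside the band for many lags, which is the "decays very slowly" symptom of nonstationarity.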


<hr>

**Model 1: Autoregressive, $AR(p)$**

$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + W_t$

An AR(p) model regresses $X_t$ on its own previous $p$ values, with white noise $W_t$. For example, AR(1) uses only $X_{t-1}$ to predict $X_t$, and $\phi_1$ is estimated to minimize the prediction error.

The model is usually not very useful for long-term prediction, as its forecasts converge to a constant value (the unconditional mean of the time series). To get longer-term predictions, increase the time step between measurements. For example, with daily data it may be hard to predict 30 days ahead, but if we first average the daily data into weekly data, we might be able to predict the value for next month (about 4 time steps ahead). Another way is to model the trend and seasonality separately; this assumes that the trend and seasonality persist in the long term.
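The sketch below illustrates "estimating $\phi$ that minimizes the error" for a zero-mean AR(p), assuming numpy and Gaussian white noise. The helpers `simulate_ar` and `fit_ar_least_squares` are hypothetical names; the fit is plain least squares on a matrix of lagged values (one reasonable estimator, not the only one), and a burn-in period is discarded so the simulated series is close to stationary.

```python
import numpy as np

def simulate_ar(phi, n, sigma=1.0, burn_in=500, seed=0):
    """Simulate a zero-mean AR(p): X_t = phi_1 X_{t-1} + ... + phi_p X_{t-p} + W_t."""
    rng = np.random.default_rng(seed)
    p = len(phi)
    x = np.zeros(n + burn_in)
    w = rng.normal(0.0, sigma, size=n + burn_in)  # Gaussian white noise W_t
    for t in range(p, n + burn_in):
        # x[t - p:t][::-1] is [X_{t-1}, ..., X_{t-p}]
        x[t] = np.dot(phi, x[t - p:t][::-1]) + w[t]
    return x[burn_in:]  # drop the start-up transient

def fit_ar_least_squares(x, p):
    """Choose phi to minimise sum_t (X_t - phi_1 X_{t-1} - ... - phi_p X_{t-p})^2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column k-1 holds the lag-k values X_{t-k} for t = p, ..., n-1
    Z = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
    y = x[p:]
    phi_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return phi_hat

x = simulate_ar([0.6], n=5000, seed=1)
phi_hat = fit_ar_least_squares(x, p=1)
print(phi_hat)  # close to [0.6]
```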

****

**Model 2: Random Walk**

$X_t = X_{t-1} + W_t + \delta$, where the drift $\delta$ is a deterministic constant increase at each time step ($\delta = 0$ gives the random walk without drift)

Without drift, unrolling the recursion shows that it is a sum of white noise random variables:

$X_t = X_{t-1} + W_t = X_{t-2} + W_{t-1} + W_t = X_0 + W_1 + \dots + W_t$

With drift (linear increase):

$X_t = t \cdot \delta + X_0 + \sum_{s=1}^{t} W_s$

<img alt="Random Walk with Drift" src="assets/random_walk_drifted.png" width="300">

Properties:

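- $\mathbb {E}[X_t] = t \cdot \delta + \mathbb {E}[X_0]$
- $var(X_t) = var(X_0) + t \cdot \sigma_W^2$ (just $t \cdot \sigma_W^2$ when $X_0$ is a fixed constant); the drift does not affect the variance
- Not stationary, because its variance (and, with drift, its expectation) grows with $t$
- but the differenced series $\nabla X_t = X_t - X_{t-1} = \delta + W_t$ is stationary
- autocovariance: $\gamma_X (s,t) = cov(X_s, X_t) = var(X_0) + \min(s,t) \cdot \sigma_W^2$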
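A quick simulation check of these properties, assuming numpy; the drift, noise level, starting value and path counts are arbitrary illustrative values. It builds one drifted path from the closed form above, differences it, and then estimates $var(X_t)$ across many driftless paths started at $X_0 = 0$.

```python
import numpy as np

rng = np.random.default_rng(42)
n, delta, sigma_w, x0 = 2000, 0.5, 1.0, 10.0

# One path: X_t = t * delta + X_0 + sum_{s=1}^t W_s, for t = 1..n
w = rng.normal(0.0, sigma_w, size=n)
x = x0 + delta * np.arange(1, n + 1) + np.cumsum(w)

# Differencing removes the stochastic trend: X_t - X_{t-1} = delta + W_t
dx = np.diff(np.concatenate(([x0], x)))
print("mean, var of differences:", dx.mean(), dx.var())  # near delta and sigma_w^2

# Many driftless paths (X_0 = 0): the variance across paths grows like t * sigma_w^2
paths = np.cumsum(rng.normal(0.0, sigma_w, size=(5000, 100)), axis=1)
var_t25, var_t100 = paths[:, 24].var(), paths[:, 99].var()
print("var at t=25, t=100:", var_t25, var_t100)  # near 25 and 100
```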

****

**Model 3: Moving Average, $MA(q)$**

Given $q$, $X_t$ is a linear combination of the current white noise term and the previous $q$ white noise terms:

$X_t = W_t + \theta_1 W_{t-1} + \dots + \theta_q W_{t-q}$

Properties:

- $\mathbb {E}[X_t] = 0$
- the autocovariance $\gamma (s, t)$ depends only on $\lvert s - t \rvert$, so the MA(q) process is (weakly) stationary; with $\theta_0 = 1$ and $0 \le h \le q$:

    $\gamma_X (h) = Cov(\sum_{j=0}^{q} \theta_j W_{t-j}, \sum_{k=0}^{q} \theta_k W_{t+h-k}) = \sum_{j = 0}^{q-h} \theta_j \theta_{j+h} \sigma_W^2$
    
    
- ACF reflects order: $\gamma(s,t) = 0$ if $\lvert s - t \rvert > q$
- The ACF distinguishes MA from AR models: for MA(q) models the ACF cuts off to 0 once the lag exceeds the order $q$, whereas for AR models it decays exponentially as the lag increases
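The autocovariance formula can be coded directly and compared with a simulation. This is a sketch assuming numpy; `ma_autocov` is a hypothetical helper, and the MA(2) coefficients $\theta = (0.5, 0.4)$ with $\sigma_W^2 = 1$ are an arbitrary example. The theoretical values are $\gamma(0) = 1.41$, $\gamma(1) = 0.7$, $\gamma(2) = 0.4$ and zero beyond lag 2.

```python
import numpy as np

def ma_autocov(theta, sigma2, max_lag):
    """Theoretical autocovariance of X_t = W_t + theta_1 W_{t-1} + ... + theta_q W_{t-q}.

    Uses theta_0 = 1 and gamma(h) = sigma2 * sum_{j=0}^{q-h} theta_j theta_{j+h} for h <= q;
    gamma(h) = 0 for h > q (the cut-off that identifies the MA order).
    """
    th = np.concatenate(([1.0], np.asarray(theta, dtype=float)))  # prepend theta_0 = 1
    q = len(th) - 1
    gamma = np.zeros(max_lag + 1)
    for h in range(min(q, max_lag) + 1):
        gamma[h] = sigma2 * np.dot(th[:q - h + 1], th[h:])
    return gamma

# Simulate X_t = W_t + 0.5 W_{t-1} + 0.4 W_{t-2} and compare sample autocovariances
rng = np.random.default_rng(0)
w = rng.normal(size=100_000)
x = w[2:] + 0.5 * w[1:-1] + 0.4 * w[:-2]
xc = x - x.mean()
sample = np.array([np.dot(xc[h:], xc[:len(xc) - h]) / len(xc) for h in range(5)])

print(ma_autocov([0.5, 0.4], 1.0, 4))  # theoretical, zero for lags 3 and 4
print(sample.round(2))                 # sample estimates, close to the theoretical values
```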

****

**Model 4: ARMA(p,q)**

$X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + W_t + \theta_1 W_{t-1} + \dots + \theta_q W_{t-q}$

**Model 5: ARIMA(p,d,q)**

ARIMA adds a differencing order $d$ to the autoregressive and moving average terms: the series is differenced $d$ times, and the differenced series is modeled as ARMA(p, q).
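A small illustration of the role of $d$, assuming numpy: an ARIMA(0,1,1) series is built by cumulatively summing an MA(1) series, and differencing it once ($d = 1$, via `np.diff`) recovers the MA(1) part. The value $\theta = 0.6$ and the helper `lag_corr` are illustrative; an MA(1) has lag-1 autocorrelation $\theta / (1 + \theta^2) \approx 0.44$ and zero autocorrelation beyond lag 1.

```python
import numpy as np

rng = np.random.default_rng(3)
n, theta = 20_000, 0.6
w = rng.normal(size=n + 1)
y = w[1:] + theta * w[:-1]   # MA(1): Y_t = W_t + theta W_{t-1}
x = np.cumsum(y)             # integrate once, so X_t - X_{t-1} = Y_t: X is ARIMA(0,1,1)

d = 1
dx = np.diff(x, n=d)         # differencing d times leaves the ARMA part

def lag_corr(z, h):
    """Sample autocorrelation of z at lag h."""
    zc = z - z.mean()
    return np.dot(zc[h:], zc[:len(zc) - h]) / np.dot(zc, zc)

print(lag_corr(dx, 1), theta / (1 + theta**2))  # both about 0.44
print(lag_corr(dx, 2))                          # about 0: the MA(1) ACF cuts off after lag 1
```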

****

**Regression x Time series**

- General model: $X_t = \beta^T \cdot z_t + W_t$
- linear trend: $X_t = \beta_1 + \beta_2 t + W_t$
- AR(2) model: $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + W_t$
- external regressors: $X_t = \beta_1 X_{t-1} + \beta_2 Y_t + W_t$, where $Y_t$ is an external variable

Using the least squares estimate $\min_{\beta} \sum_{t} (x_t - \beta^T z_t)^2$; however, the errors may be correlated over time because successive observations are not independent, so the usual OLS standard errors may be unreliable.

How to decide which model to use? Use the ACF as a diagnostic tool.

Example: $X_t = T_t + Y_t$, sum of
- linear trend: $T_t = 50 + 3t$
- AR(1) model: $Y_t = 0.8Y_{t-1} + W_t$, $\sigma_W = 20$

<img alt="Sample Time Series" src="assets/sample_time_series.jpg" width="300">
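The example above can be reproduced in a few lines, assuming numpy; the series length $n = 500$ and the seed are illustrative choices. Least squares recovers the trend, but the residuals keep the AR(1) structure (lag-1 autocorrelation near 0.8), which is why the ACF of the residuals is the natural next diagnostic.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
t = np.arange(n)

# Y_t = 0.8 Y_{t-1} + W_t with sigma_W = 20
y = np.zeros(n)
w = rng.normal(0.0, 20.0, size=n)
for i in range(1, n):
    y[i] = 0.8 * y[i - 1] + w[i]

x = 50 + 3 * t + y  # X_t = T_t + Y_t with T_t = 50 + 3t

# Least squares fit of the linear trend: X_t = beta_1 + beta_2 t + error
Z = np.column_stack([np.ones(n), t])
beta_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)
resid = x - Z @ beta_hat

# The residuals are not white noise: their lag-1 autocorrelation is close to 0.8
rc = resid - resid.mean()
rho1 = np.dot(rc[1:], rc[:-1]) / np.dot(rc, rc)
print(beta_hat, rho1)
```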

To decide which external variables (and which lags of them) to include, look at the largest cross-covariance terms:

<img alt="Cross Covariance" src="assets/cross_covariance.jpg" width="300">
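A sketch of lag selection by cross-correlation, assuming numpy. The data are hypothetical: $Y_t = 2 X_{t-3} + \text{noise}$, so the sample cross-correlation $corr(X_{t-h}, Y_t)$ should peak at $h = 3$ (theoretical value $2/\sqrt{5} \approx 0.89$). `sample_ccf` is an illustrative helper name.

```python
import numpy as np

def sample_ccf(x, y, max_lag):
    """Sample cross-correlation corr(X_{t-h}, Y_t) for h = 0..max_lag (1/n normalisation)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xc, yc = x - x.mean(), y - y.mean()
    denom = n * x.std() * y.std()
    return np.array([np.dot(yc[h:], xc[:n - h]) / denom for h in range(max_lag + 1)])

rng = np.random.default_rng(11)
n = 2000
x = rng.normal(size=n)
y = np.zeros(n)
y[3:] = 2.0 * x[:-3]          # hypothetical: Y_t depends on X_{t-3}
y += rng.normal(size=n)

ccf = sample_ccf(x, y, 10)
best_lag = int(np.argmax(np.abs(ccf)))
print(best_lag)  # expected to be 3, so X_{t-3} is the regressor to include
```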

Generally, follow these steps when fitting a time series:

1. Transform time series to make it stationary:
    - log-transform
    - remove trends / seasonality
    - differencing
2. Check with the ACF whether the (transformed) series already looks like white noise
3. Otherwise, fit an MA(q) model if the ACF cuts off after lag $q$, or an AR model if the ACF decays exponentially
4. Estimate the coefficients and compute the residuals; check that the residuals look like white noise
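The four steps can be sketched end to end, assuming numpy. The data-generating process is hypothetical (a positive, trending series whose log-differences follow an AR(1) with $\phi = 0.5$), and the AR(1) is fitted by least squares on demeaned data; the white-noise check uses the approximate $\pm 2/\sqrt{n}$ band.

```python
import numpy as np

def acf(z, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    zc = np.asarray(z, dtype=float) - np.mean(z)
    return np.array([np.dot(zc[h:], zc[:len(zc) - h]) for h in range(max_lag + 1)]) / np.dot(zc, zc)

rng = np.random.default_rng(5)
n = 3000
# Hypothetical data: log-differences are 0.001 + u_t, with u_t an AR(1) (phi = 0.5)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + rng.normal(0.0, 0.01)
series = 100 * np.exp(np.cumsum(0.001 + u))   # positive, trending, nonstationary

# 1. Transform: log, then first difference
z = np.diff(np.log(series))
band = 2 / np.sqrt(len(z))

# 2. Is it already white noise? A lag-1 ACF far outside the band says no
print("lag-1 ACF:", acf(z, 1)[1], "band:", band)

# 3. The ACF decays geometrically, so try AR(1); least squares on demeaned data
zc = z - z.mean()
phi_hat = np.dot(zc[1:], zc[:-1]) / np.dot(zc[:-1], zc[:-1])

# 4. Residuals should look like white noise
resid = zc[1:] - phi_hat * zc[:-1]
inside = np.mean(np.abs(acf(resid, 20)[1:]) < band)
print("phi_hat:", phi_hat, "fraction of residual ACF lags inside band:", inside)
```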

****

**Parameter estimation for stationary AR(p), using Yule-Walker**

Estimate the parameters $\hat\phi$, $\hat\sigma_W^2$ with the Yule-Walker equations (a method-of-moments approach):

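1. Estimate the autocovariances $\gamma(h)$ for $h = 0, 1, \dots, p$ from sample averages: $\hat\gamma(h) = \frac{1}{n} \sum_{t=1}^{n-h} (x_{t+h} - \bar x)(x_t - \bar x)$
2. Solve the system of linear equations for $h = 1, \dots, p$, using the estimates of $\gamma(h)$: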
    
    $\gamma(h) = \phi_1 \gamma(h-1) + \phi_2 \gamma(h-2) + \dots + \phi_p \gamma(h-p)$

    $\sigma_W^2 = \gamma(0) - \phi_1 \gamma(1) - \phi_2 \gamma(2) - \dots - \phi_p \gamma(p)$
    
- In matrix form, let $\Gamma_p$ be the $p \times p$ covariance matrix whose $(i, j)$th entry is $\gamma (i - j)$
- and let $\gamma_p = [\gamma(1), \dots, \gamma(p)]^T$ be a column vector
- Solve for $\phi$ with this equation:

    $\gamma_p = \Gamma_p \phi$
    
    $\phi = \Gamma_p^{-1} \gamma_p$
    
    
- Solve for $\sigma_W^2$ with this equation:

    $\sigma_W^2 = \gamma(0) - \phi_1 \gamma(1) - \phi_2 \gamma(2) - \dots - \phi_p \gamma(p) = \gamma(0) - \hat\phi^T \gamma_p = \gamma(0) - \gamma_p^T \Gamma_p^{-1} \gamma_p$
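A direct translation of the matrix form, assuming numpy; `sample_autocov` and `yule_walker` are hypothetical helper names, and the AR(2) with $\phi = (0.5, -0.3)$, $\sigma_W^2 = 1$ is an illustrative test case. `np.linalg.solve` is used instead of forming $\Gamma_p^{-1}$ explicitly, which gives the same $\hat\phi$ with better numerical behaviour.

```python
import numpy as np

def sample_autocov(x, max_lag):
    """gamma_hat(h) = (1/n) sum_{t=1}^{n-h} (x_{t+h} - xbar)(x_t - xbar), h = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.dot(xc[h:], xc[:n - h]) / n for h in range(max_lag + 1)])

def yule_walker(x, p):
    """Method-of-moments estimates (phi_hat, sigma2_hat) for a stationary AR(p)."""
    g = sample_autocov(x, p)
    # Gamma_p[i, j] = gamma(i - j); gamma is even, so index with |i - j|
    idx = np.arange(p)
    Gamma_p = g[np.abs(idx[:, None] - idx[None, :])]
    gamma_p = g[1:p + 1]                          # [gamma(1), ..., gamma(p)]
    phi_hat = np.linalg.solve(Gamma_p, gamma_p)   # phi = Gamma_p^{-1} gamma_p
    sigma2_hat = g[0] - phi_hat @ gamma_p         # gamma(0) - phi^T gamma_p
    return phi_hat, sigma2_hat

# Simulate X_t = 0.5 X_{t-1} - 0.3 X_{t-2} + W_t with sigma_W^2 = 1
rng = np.random.default_rng(0)
n, burn = 20_000, 500
x = np.zeros(n + burn)
w = rng.normal(size=n + burn)
for t in range(2, n + burn):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + w[t]
x = x[burn:]

phi_hat, sigma2_hat = yule_walker(x, p=2)
print(phi_hat, sigma2_hat)  # roughly [0.5, -0.3] and 1.0
```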

****

**Forecasting with AR(p) model**

- Forecast $m$ steps into the future based on $n$ observations $X_1, \dots, X_n$
- Estimate coefficients $\hat\phi_1, \dots, \hat\phi_p$ and plug in:

    $\hat X_{n+1} = \hat\phi_1 X_n + \dots +\hat\phi_p X_{n-p+1}$
    
    $\hat X_{n+2} = \hat\phi_1 \hat X_{n+1} + \hat\phi_2 X_n + \dots +\hat\phi_p X_{n-p+2}$

    Note: plugging in $\hat X_{n+1}$ is the same as plugging in $\hat\phi_1 X_n + \dots +\hat\phi_p X_{n-p+1}$
    
    $\therefore$ it is always a linear combination of last $p$ observations, $X_n, \dots, X_{n-p+1}$
    
    **Caution**: this is only useful for short horizons $m$. For a stationary AR(p), the forecast converges to the mean of the series as $m$ grows, so long-horizon forecasts carry little information
    
    <img alt="Convergence to the mean" src="assets/converges_to_mean.jpg" width="300">
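The recursive plug-in scheme can be sketched as follows, assuming numpy. `ar_forecast` is a hypothetical helper; it works on deviations from a known mean `mu` (an assumption; in practice the sample mean would be used) so that the convergence of forecasts to the mean is visible.

```python
import numpy as np

def ar_forecast(x, phi, m, mu=0.0):
    """Iterated 1..m-step forecasts for an AR(p) with mean mu and coefficients phi.

    Each new forecast reuses earlier forecasts in place of not-yet-observed values.
    """
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    history = list(np.asarray(x, dtype=float)[-p:] - mu)  # deviations from the mean
    forecasts = []
    for _ in range(m):
        recent = history[::-1][:p]          # most recent first: X_n, X_{n-1}, ... (or forecasts)
        x_next = float(np.dot(phi, recent))
        forecasts.append(x_next + mu)
        history.append(x_next)
    return np.array(forecasts)

# AR(1) with phi = 0.8, mean 10, last observation 15
print(ar_forecast([15.0], [0.8], 10, mu=10.0))  # 14.0, 13.2, 12.56, ... approaching mu = 10
```

Every forecast is a linear combination of the last $p$ observations, and the deviation from the mean shrinks geometrically with the horizon.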

****

**Using Partial Autocorrelation Function, PACF, to determine order, $p$ for AR(p)**

For an AR(p) model, the ACF decays gradually towards zero rather than cutting off, so, unlike for an MA(q) model, it does not directly tell us which order to use.

Example: 

$AR(1): X_t = \phi X_{t-1} + W_t = \phi^2 X_{t-2} + \phi W_{t-1} + W_t$

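$Cov(X_t, X_{t-2}) = Cov(\phi^2 X_{t-2} + \phi W_{t-1} + W_t, X_{t-2}) = \phi^2 \gamma(0)$, so $Corr(X_t, X_{t-2}) = \phi^2 \neq 0$

Even though $X_t$ depends on $X_{t-2}$ only through $X_{t-1}$, the lag-2 autocorrelation is nonzero: the ACF mixes direct and indirect dependence, and the PACF removes the indirect part.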
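A numerical check of this calculation, assuming numpy; $\phi = 0.7$, the sample size and the seed are illustrative. The sample lag-2 covariance should be close to $\phi^2 \gamma(0)$ and the lag-2 correlation close to $\phi^2 = 0.49$.

```python
import numpy as np

rng = np.random.default_rng(1)
phi, n = 0.7, 50_000
x = np.zeros(n)
w = rng.normal(size=n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + w[t]   # AR(1)

xc = x - x.mean()
gamma0 = np.dot(xc, xc) / n
cov2 = np.dot(xc[2:], xc[:-2]) / n

print("Cov(X_t, X_{t-2}):", cov2, "vs phi^2 * gamma(0):", phi**2 * gamma0)
print("Corr(X_t, X_{t-2}):", cov2 / gamma0, "vs phi^2:", phi**2)
```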


Partial correlation of $X$, $Y$ given $Z$:
- Regress $X$ on $Z$ and $Y$ on $Z$ to obtain the fitted values $\hat X$ and $\hat Y$
- $\rho_{XY \vert Z} = corr(X - \hat X, Y - \hat Y)$, which captures the relationship between $X$ and $Y$ beyond what is (linearly) explained by $Z$
- Formally, the partial autocorrelation of time series $X_t$ at lag $h$ is:

    $\alpha_X (h) := Corr(X_h - \hat X_h^{lin_{h-1}}, X_0 - \hat X_0^{lin_{h-1}})$
    
    where $\hat X_h^{lin_{h-1}}$ is the linear regression projection of $X_h$ on $X_1, \dots, X_{h-1}$ and $\hat X_0^{lin_{h-1}}$ is the linear regression projection of $X_0$ on $X_1, \dots, X_{h-1}$.
    
    $\therefore$ $\alpha_X (h)$ is the correlation between $X_h$ and $X_0$ after removing the linear predictions based on the intermediate terms of the series $X_1, \dots, X_{h-1}$
    
    A convenient way to compute the partial autocorrelation $\alpha_X (h)$ is the Frisch-Waugh-Lovell (FWL) theorem, which implies that $\alpha_X (h)$ is the coefficient on $X_{t-h}$ in the regression of $X_t$ on $X_{t-h}$ together with all intermediate terms $X_{t-1}, \dots, X_{t-h+1}$, i.e.
    
    $X_t = \phi_1 X_{t-1} + \dots + \phi_h X_{t-h} + \tilde X_t$
    
    then $\alpha_X (h) = \phi_h$ where the regression coefficients $\phi_1, \dots, \phi_h$ can be estimated by the method of moments with Yule-Walker equations.
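Both routes to the PACF can be sketched and compared, assuming numpy. `pacf_yule_walker` takes the last Yule-Walker coefficient of an AR(h) fit for each $h$; `pacf_by_regression` follows the definition by correlating residuals after regressing on the intermediate lags. Both names are hypothetical, and the AR(2) test case with $\phi = (0.5, -0.3)$ is illustrative: its PACF should be about $0.5/1.3 \approx 0.38$ at lag 1, about $-0.3$ at lag 2, and near zero afterwards, which is how the PACF reveals $p$.

```python
import numpy as np

def sample_autocov(x, max_lag):
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.dot(xc[h:], xc[:n - h]) / n for h in range(max_lag + 1)])

def pacf_yule_walker(x, max_lag):
    """alpha(h) = last coefficient phi_h of the AR(h) Yule-Walker fit, for h = 1..max_lag."""
    g = sample_autocov(x, max_lag)
    alphas = []
    for h in range(1, max_lag + 1):
        idx = np.arange(h)
        Gamma_h = g[np.abs(idx[:, None] - idx[None, :])]   # entries gamma(i - j)
        phi = np.linalg.solve(Gamma_h, g[1:h + 1])
        alphas.append(phi[-1])
    return np.array(alphas)

def pacf_by_regression(x, h):
    """alpha(h) from the definition: correlate the residuals of X_t and X_{t-h}
    after regressing each on the intermediate values X_{t-1}, ..., X_{t-h+1}."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    target, lagged = x[h:], x[:n - h]          # X_t and X_{t-h}
    if h == 1:
        return np.corrcoef(target, lagged)[0, 1]
    Z = np.column_stack([x[h - k:n - k] for k in range(1, h)])  # X_{t-1}, ..., X_{t-h+1}
    res_t = target - Z @ np.linalg.lstsq(Z, target, rcond=None)[0]
    res_l = lagged - Z @ np.linalg.lstsq(Z, lagged, rcond=None)[0]
    return np.corrcoef(res_t, res_l)[0, 1]

# Simulate X_t = 0.5 X_{t-1} - 0.3 X_{t-2} + W_t
rng = np.random.default_rng(0)
n, burn = 20_000, 500
x = np.zeros(n + burn)
w = rng.normal(size=n + burn)
for t in range(2, n + burn):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + w[t]
x = x[burn:]

pacf = pacf_yule_walker(x, 6)
print(pacf.round(3))               # cuts off after lag 2
print(pacf_by_regression(x, 2))    # agrees with pacf[1]
```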
    
    

****

# Basic code
A `minimal, reproducible example`