# Time Series Models

<hr>

## Exponential Smoothing

Randomness (*or random variation*) is present in almost any observed series, and exponential smoothing helps separate it from the underlying baseline.

Suppose $S_t$ is the expected baseline response at time period $t$ and $x_t$ is the observed response. What is the real response over time without variation? Is there a real increase in baseline or is this a random event?

Two ways to answer:

- $S_t = x_t$
- $S_t = S_{t-1}$

The exponential smoothing method combines both: $S_t = \alpha x_t + (1-\alpha) S_{t-1}$, where $0 < \alpha < 1$. This is a weighted average of the current observation and the previous baseline, controlled by $\alpha$, such that:

- $\alpha \to 0$: a lot of randomness in the system, so the previous baseline is trusted over the current reading
- $\alpha \to 1$: not much randomness in the system, so the current reading is the best predictor

But this basic equation does not deal with trends or cyclical variations.
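
As a minimal sketch of the basic update $S_t = \alpha x_t + (1-\alpha) S_{t-1}$, the function below walks through a list of observations. The function name, the sample data, and the starting value $S_1 = x_1$ are assumptions made for illustration; the text does not specify an initial baseline.

```python
def exponential_smoothing(xs, alpha):
    """Return the baseline estimates S_t for the observations xs.

    Assumes S_1 = x_1 as the starting value (not specified in the notes).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    baseline = [xs[0]]
    for x in xs[1:]:
        # Weighted average of the new observation and the previous baseline.
        baseline.append(alpha * x + (1 - alpha) * baseline[-1])
    return baseline


observations = [10.0, 12.0, 11.0, 15.0, 14.0]  # made-up data
smooth_low = exponential_smoothing(observations, alpha=0.1)   # leans on the previous baseline
smooth_high = exponential_smoothing(observations, alpha=0.9)  # leans on the latest observation
```

With a small $\alpha$ the baseline moves slowly away from its starting value, while a large $\alpha$ makes it follow the observations closely.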

****

**Dealing with trends and cyclic effects**

- *Trend patterns*

    Suppose $T_t$ is the trend at a time period $t$ then we can modify the exponential smoothing method to the following:

    $S_t = \alpha x_t + (1-\alpha) (S_{t-1} + T_{t-1})$
    
    This adjusts for trend as an additive component to the original formula.

    We can compute $T_t$ as a weighted average, controlled by $\beta$, such that:

    $T_t = \beta (S_t - S_{t-1}) + (1 - \beta) T_{t-1}$ with an initial condition of $T_1 = 0$

    
- *Cyclic/seasonal patterns*

    To adjust for seasonality, we can use a multiplicative approach:
    
    $S_t = \frac{\alpha x_t}{C_{t-L}} + (1-\alpha) (S_{t-1} + T_{t-1})$, i.e. *Holt-Winters method*
    
    where:
    - $L$: the length of a cycle
    - $C_t$: the multiplicative seasonality factor for time $t$ to inflate or deflate the observation
    
    To compute $C_t$, we can update the seasonal/cyclic factor in a similar way:
    
    $C_t = \gamma (\frac{x_t}{S_t}) + (1 - \gamma) (C_{t-L})$
    
    where the initial $C_1, \dots, C_L = 1$
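
The three update equations for $S_t$, $T_t$, and $C_t$ can be sketched together as follows. This is an illustrative implementation under stated assumptions, not a reference one: the function name is hypothetical, the starting baseline is taken as $S_1 = x_1$ (the text only fixes $T_1 = 0$ and $C_1, \dots, C_L = 1$), and the observations are assumed to be positive so that the division is safe.

```python
def holt_winters(xs, cycle_length, alpha, beta, gamma):
    """Run the multiplicative Holt-Winters updates over xs.

    Returns three lists (S, T, C) with one value per time period.
    Assumptions: S_1 = x_1, T_1 = 0, and C_t = 1 throughout the first cycle.
    """
    S = [xs[0]]
    T = [0.0]
    C = [1.0]
    for t in range(1, len(xs)):
        in_first_cycle = t < cycle_length
        # Seasonal factor from one cycle ago; 1 while no earlier cycle exists.
        c_prev = 1.0 if in_first_cycle else C[t - cycle_length]
        s_t = alpha * xs[t] / c_prev + (1 - alpha) * (S[t - 1] + T[t - 1])
        t_t = beta * (s_t - S[t - 1]) + (1 - beta) * T[t - 1]
        # Seasonal factors stay at their initial value of 1 during the first cycle.
        c_t = 1.0 if in_first_cycle else gamma * (xs[t] / s_t) + (1 - gamma) * c_prev
        S.append(s_t)
        T.append(t_t)
        C.append(c_t)
    return S, T, C
```

On a series that alternates between low and high values with `cycle_length=2`, the seasonal factors for the low periods drift below 1 and those for the high periods rise above 1, which is the inflate/deflate behaviour described for $C_t$.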

****

**Forecasting**

Suppose $F_{t+1}$ is our forecast for one time unit ahead and our prediction follows:

- Forecasting with basic exponential smoothing

    $F_{t+1} = \alpha x_{t+1} + (1-\alpha) S_t$
    
    Since $x_{t+1}$ has not been observed yet, our best guess for it is $S_t$, which gives $F_{t+1} = \alpha S_t + (1-\alpha) S_t$
    
    $\therefore F_{t+1} = S_t$
    
    
- Forecasting with trend (additive)

    $S_t = \alpha x_t + (1-\alpha) (S_{t-1} + T_{t-1})$
    
    $T_t = \beta (S_t - S_{t-1}) + (1-\beta) T_{t-1}$

    $\therefore F_{t+1} = S_t + T_t$
    
    This generalizes to later periods because we assume the current trend is our best guess for the future trend.
    
    $F_{t+k} = S_t + k \cdot T_t$ where $k = 1, 2, \dots$
    
    
- Forecasting with seasonality (multiplicative)

    $S_t = \frac{\alpha x_t}{C_{t-L}} + (1-\alpha) (S_{t-1} + T_{t-1})$
    
    The best estimate of the next time period's seasonal factor will be:
    
    $C_{t+1} = C_{(t+1)-L}$
    
    $\therefore$ Our forecast for time period $t+1$ will be:
    
    $F_{t+1} = (S_t + T_t) \cdot C_{(t+1)-L}$
    
    
The best values of $\alpha, \beta, \gamma$ are the set of parameters that minimizes the squared error between forecast and observation, $(F_t - x_t)^2$, summed over the observed time periods.
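
One simple way to choose a parameter is a grid search over candidate values, scoring each by its squared one-step-ahead forecast errors. The sketch below does this for $\alpha$ in the basic model only, and adds a helper for the $k$-step trend forecast $F_{t+k} = S_t + k \cdot T_t$. The function names, the candidate grid, and the starting value $S_1 = x_1$ are assumptions for illustration; in practice a numerical optimizer would usually replace the grid.

```python
def one_step_sse(xs, alpha):
    """Sum of squared one-step-ahead errors (F_t - x_t)^2, using F_{t+1} = S_t."""
    baseline = xs[0]  # assumed starting value S_1 = x_1
    sse = 0.0
    for x in xs[1:]:
        forecast = baseline  # the forecast for this period is the previous baseline
        sse += (forecast - x) ** 2
        baseline = alpha * x + (1 - alpha) * baseline
    return sse


def best_alpha(xs, candidates):
    """Return the candidate alpha with the smallest one-step SSE."""
    return min(candidates, key=lambda a: one_step_sse(xs, a))


def trend_forecast(level, trend, k):
    """Forecast k periods ahead from the latest baseline and trend: S_t + k * T_t."""
    return level + k * trend
```

For a steadily rising series the search favours a large $\alpha$, while for a series that only bounces around a constant level it favours a small one, matching the interpretation of $\alpha$ given earlier.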

****

## ARIMA

*Autoregressive integrated moving average*

Three key parts:

1. **Differences**

    - Exponential smoothing basic equation: $S_t = \alpha x_t + (1-\alpha) S_{t-1}$
    - Expanding the recursion: $S_t = \alpha x_t + (1-\alpha) \alpha x_{t-1} + (1-\alpha)^2 \alpha x_{t-2} + \dots$
        - Estimates $S_t$ based on $x_t, x_{t-1}$, etc.
        - Works well when data is *stationary, i.e. if the mean, variance, and other measures are all expected to be constant over time*
    - Often, data is not stationary but the differences might be stationary:
        - First-order difference, $D_1$: difference of consecutive observations such that, $D_1 = (x_t - x_{t-1})$
        - Second-order difference, $D_2$: difference of the differences, such that, $D_2 = (x_t - x_{t-1}) - (x_{t-1} - x_{t-2})$
        - In general, $d$th-order differences repeat this differencing $d$ times
  
  
2. **Autoregression**

    - Predicting the current value based on previous time periods' values, i.e. using earlier values to predict current value
    - Order-$p$ autoregressive model: $S_t$ is a function of $\{ x_t, x_{t-1}, \dots, x_{t-(p-1)}\}$
    

3. **Moving Average**

    - Using previous errors as predictors, $\epsilon_t = (\bar x_t - x_t)$, i.e. the predicted value minus the observed value is the error
    - Order-$q$ moving average goes back $q$ time periods, $\epsilon_{t-1}, \dots, \epsilon_{t-q}$
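
To make the differencing step concrete, here is a small sketch that applies consecutive differences repeatedly. The helper name and the quadratic example series are chosen for illustration only.

```python
def difference(xs, order=1):
    """Apply consecutive differencing `order` times."""
    result = list(xs)
    for _ in range(order):
        # Each pass replaces the series with x_t - x_{t-1}.
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


quadratic = [t * t for t in range(6)]    # 0, 1, 4, 9, 16, 25
first = difference(quadratic, order=1)   # 1, 3, 5, 7, 9
second = difference(quadratic, order=2)  # 2, 2, 2, 2
```

The original series and its first differences both keep growing, but the second differences are constant, which is the kind of series the later ARIMA steps assume. Each pass shortens the series by one value.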
    
****
    
**Putting it all together**

The ARIMA model has three parameters, $p, d, q$ and can be represented as $\text{ARIMA(p,d,q)}$, such that:

$D_{(d)t} = \mu + \sum_{i=1}^{p} \alpha_i D_{(d)t - i} - \sum_{i=1}^{q} \theta_i (\bar x_{t-i} - x_{t-i})$

where

- $D_{(d)t}$ is the differenced value at time $t$
- $\mu$ is the mean value of the stationary distribution
- The second term is the autoregressive part, based on previous differenced terms
- The last term is the moving average part, based on previous errors


The parameters $p, d, q$ can be optimized for the best fit to the data. Different values of $p, d, q$ can represent different types of models:

- $\text{ARIMA(0,0,0)}$ is a white noise model
- $\text{ARIMA(0,1,0)}$ is a random walk model
- $\text{ARIMA(p,0,0)}$ is an AR (autoregressive) model
- $\text{ARIMA(0,0,q)}$ is a moving average model
- $\text{ARIMA(0,1,1)}$ is an exponential smoothing model

ARIMA is best used for short-term forecasting and tends to do better than exponential smoothing when the data is more stable, with fewer peaks/valleys and outliers. As a rule of thumb, ARIMA needs about 40 data points to work well.
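
The link between $\text{ARIMA(0,1,1)}$ and exponential smoothing can be checked numerically. The sketch below applies the ARIMA equation above with $\mu = 0$, $p = 0$, $d = 1$, $q = 1$ and the error convention used in these notes (predicted minus observed). Under that convention, setting $\theta_1 = \alpha - 1$ gives the prediction $\alpha x_{t-1} + (1-\alpha) \bar x_{t-1}$, which is the exponential smoothing update. The function names, the sample data, and the choice that the first prediction equals the first observation are assumptions for illustration.

```python
def arima_011_predictions(xs, theta):
    """One-step predictions from the ARIMA(0,1,1) case with mu = 0.

    predictions[i] is the prediction for xs[i]; the final entry is the
    forecast one period past the data. Assumes predictions[0] = xs[0].
    """
    predictions = [xs[0]]
    for i in range(1, len(xs) + 1):
        prev_error = predictions[i - 1] - xs[i - 1]  # predicted minus observed
        predicted_difference = -theta * prev_error   # moving average term only
        predictions.append(xs[i - 1] + predicted_difference)
    return predictions


def smoothing_baselines(xs, alpha):
    """Exponential smoothing baselines S_t, assuming S_1 = x_1."""
    baseline = [xs[0]]
    for x in xs[1:]:
        baseline.append(alpha * x + (1 - alpha) * baseline[-1])
    return baseline


series = [10.0, 12.0, 11.0, 15.0, 14.0]  # made-up data
alpha = 0.3
arima_side = arima_011_predictions(series, theta=alpha - 1)[1:]  # predictions for periods 2, 3, ...
smoothing_side = smoothing_baselines(series, alpha)              # S_t is the forecast for period t+1
```

The two lists agree up to floating-point rounding. Other references write the moving average term with the opposite sign, in which case the matching value is $\theta_1 = 1 - \alpha$.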

****

## GARCH
*Generalized Autoregressive Conditional Heteroskedasticity*

One common way to estimate the variance of a time series is to use *GARCH*, which is structurally similar to the ARIMA model.

$\sigma_t^2 = \omega + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2 + \sum_{i=1}^{q} \gamma_i \epsilon_{t-i}^2$

Two differences from ARIMA:

1. Estimates variances/squared errors, not observations or linear errors
2. Estimates raw variances and not differences of variances
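
The GARCH recursion can be sketched directly from the equation above. This example only evaluates the variance sequence for given parameters; it does not estimate $\omega$, $\beta_i$, or $\gamma_i$ from data. The function name and the `initial_variance` argument, used to fill in history before the first period, are assumptions made for this sketch.

```python
def garch_variances(errors, omega, betas, gammas, initial_variance):
    """Conditional variances sigma_t^2 from the GARCH recursion.

    betas weight the p previous variances; gammas weight the q previous
    squared errors. Values before the first period are replaced by
    initial_variance (an assumption of this sketch).
    """
    variances = []
    for t in range(len(errors)):
        value = omega
        for i, beta in enumerate(betas, start=1):
            past_variance = variances[t - i] if t - i >= 0 else initial_variance
            value += beta * past_variance
        for i, gamma in enumerate(gammas, start=1):
            past_squared_error = errors[t - i] ** 2 if t - i >= 0 else initial_variance
            value += gamma * past_squared_error
        variances.append(value)
    return variances


calm = garch_variances([0.0, 0.0, 0.0], 1.0, [0.5], [0.25], 4.0)
shocked = garch_variances([0.0, 2.0, 0.0], 1.0, [0.5], [0.25], 4.0)
```

In the `shocked` run, the large error in the second period raises the estimated variance for the third period relative to the `calm` run, which illustrates how past squared errors feed into the current variance estimate.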
    
<hr>

# Basic code
A `minimal, reproducible example`