---
layout: post
title:  "Time series: ARIMA"
date:   2023-05-19 10:14:54 +0700
categories: MachineLearning
---

# TOC

# Time series

A time series is a sequence of data points measured in order over an interval of time. We study the series to infer what has happened and possibly to forecast what will happen. Our approach is to assume that an underlying process generates those data points according to some statistical distribution, so the data points are realisations of random variables. For that reason, a time series can also be called a discrete-time stochastic process.

Examples of time series include stock prices and sunspot counts over time.

There are three features of time series that we should get acquainted with: trend, seasonality and serial dependence. A trend is a consistent directional movement in a time series. If there is an underlying rationale for the trend, it is deterministic; otherwise it is stochastic. Seasonality is variation that repeats according to the time of year, as temperature does. Serial dependence (serial correlation) is the dependence among data points, especially those that are close in time.
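
To make the three features concrete, here is a minimal sketch that builds a synthetic monthly series as the sum of a linear trend, a 12-step seasonal cycle and serially dependent noise. The coefficients (slope 0.05, amplitude 2, carry-over 0.7) are illustrative assumptions, not values from any real data.

```python
import math
import random

random.seed(0)
n = 120  # ten "years" of monthly observations

# Deterministic linear trend
trend = [0.05 * t for t in range(n)]
# Seasonality: a sine wave that repeats every 12 steps
season = [2.0 * math.sin(2 * math.pi * t / 12) for t in range(n)]

# Serial dependence: each noise value keeps 70% of the previous one plus a fresh shock
noise = [0.0] * n
for t in range(1, n):
    noise[t] = 0.7 * noise[t - 1] + random.gauss(0.0, 1.0)

series = [trend[t] + season[t] + noise[t] for t in range(n)]
print([round(v, 2) for v in series[:12]])  # first "year" of the synthetic series
```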

# Stationarity

Let's first define expectation, variance and covariance. The expected value or expectation $$ E(x) $$ of a random variable x is its mean value in the population. We denote $$ E(x) = \mu $$. The variance measures how spread out the variable is: it is the expectation of the squared deviations of the variable from the mean, $$ \sigma^2(x) = E{[(x-\mu)^2]} $$. The standard deviation $$ \sigma(x) $$ is the square root of the variance. The covariance of two random variables x and y tells us how linearly related they are: $$ \sigma(x,y) = E{[(x - \mu_x)(y-\mu_y)]} $$. It can be estimated from a sample of n pairs: $$ Cov(x,y) = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) $$.

Correlation is the covariance normalised by both standard deviations, so it measures how two variables co-vary on a scale from -1 to 1: $$ \rho(x,y) = \frac{E{[(x-\mu_x)(y-\mu_y)]}}{\sigma_x \sigma_y} = \frac{\sigma(x,y)}{\sigma_x \sigma_y} $$. A correlation of 1 means that the two variables have an exact positive linear association. A correlation of 0 indicates no linear association (the variables may still have a nonlinear relationship). A correlation of -1 indicates an exact negative linear association. The sample correlation is $$ Cor(x,y) = \frac{Cov(x,y)}{sd(x)sd(y)} $$, where sd is the sample standard deviation.
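
The sample formulas above translate directly into code. This is a minimal sketch using only the standard library; `sample_cov` and `sample_cor` are hypothetical helper names. The third case shows a perfect nonlinear (quadratic) relationship that still has zero correlation.

```python
import math

def sample_cov(x, y):
    """Sample covariance with the unbiased 1/(n-1) factor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def sample_cor(x, y):
    """Sample correlation: covariance divided by both sample standard deviations."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_cor(x, [2 * v + 1 for v in x]))     # exact positive linear association
print(sample_cor(x, [-3 * v for v in x]))        # exact negative linear association
print(sample_cor(x, [(v - 3) ** 2 for v in x]))  # quadratic, symmetric around the mean: zero
```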

For a time series, the mean and variance are defined at each time step. The mean of a time series $$ x_t $$ is the expectation $$ E(x_t) = \mu(t) $$, which may depend on time. If we remove deterministic trends and seasonal effects, we can assume that the series is stationary in the mean, $$ \mu(t) = \mu $$, independent of time. Then $$ \mu $$ can be estimated with the sample mean $$ \bar{x} = \sum_{t=1}^n \frac{x_t}{n} $$. The variance of a time series that is stationary in the mean is $$ \sigma^2(t) = E{[(x_t - \mu)^2]} $$. If we also assume the variance is constant (stationary in variance), $$ \sigma^2(t) = \sigma^2 $$, then we can estimate it with $$ Var(x) = \frac{\sum(x_t - \bar{x})^2}{n-1} $$.

A time series that is stationary in the mean and variance is called second-order stationary if the correlation between observations depends only on the lag k, the number of time steps separating them. Then the serial covariance (autocovariance) at lag k is $$ C_k = E{[(x_t - \mu)(x_{t+k} - \mu)]} $$ and the serial correlation (autocorrelation) at lag k is $$ \rho_k = \frac{C_k}{\sigma^2} $$. The sample autocovariance is $$ c_k = \frac{1}{n} \sum_{t=1}^{n-k} (x_t - \bar{x}) (x_{t+k} - \bar{x}) $$ and the sample autocorrelation is $$ r_k = \frac{c_k}{c_0} $$.
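
The sample autocovariance and autocorrelation can be computed in a few lines. This is a minimal sketch with a hypothetical helper `sample_acf`, using the 1/n factor from the formula above. An alternating series is used as input because its lag-1 correlation is strongly negative.

```python
def sample_acf(x, max_lag):
    """Return [r_0, ..., r_max_lag] with r_k = c_k / c_0 and the 1/n autocovariance c_k."""
    n = len(x)
    x_bar = sum(x) / n
    dev = [v - x_bar for v in x]

    def autocov(k):
        # c_k = (1/n) * sum_{t=1}^{n-k} (x_t - x_bar)(x_{t+k} - x_bar)
        return sum(dev[t] * dev[t + k] for t in range(n - k)) / n

    c0 = autocov(0)
    return [autocov(k) / c0 for k in range(max_lag + 1)]

# An alternating series: neighbours move in opposite directions
r = sample_acf([1, -1, 1, -1, 1, -1], 2)
print([round(v, 3) for v in r])  # [1.0, -0.833, 0.667]
```

Because $$ c_k $$ divides by n rather than n - k, the estimates shrink towards zero at large lags, which is the conventional choice.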

# White noise

Let's consider two operators: the backward shift operator and the difference operator. The backward shift (lag) operator B takes an element and returns the previous one: $$ B x_t = x_{t-1} $$. Applying B n times steps back n times: $$ B^n x_t = x_{t-n} $$. The difference operator $$ \nabla $$ takes an element and returns the difference between it and the previous element: $$ \nabla x_t = x_t - x_{t-1} = (1 - B) x_t $$. Applied n times, it becomes $$ \nabla^n = (1 - B)^n $$.
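
Here is a minimal sketch of both operators on a plain Python list; `backshift` and `diff` are hypothetical helper names. It also checks that differencing twice matches the expansion $$ (1 - B)^2 x_t = x_t - 2x_{t-1} + x_{t-2} $$, and shows that second differences turn a quadratic trend into a constant.

```python
def backshift(x, n=1):
    """B^n: the series lagged by n steps (assumes 0 <= n <= len(x)).
    The first n positions have no predecessor and are filled with None."""
    return [None] * n + x[:len(x) - n]

def diff(x, n=1):
    """nabla^n: apply the first difference (1 - B) n times; each pass drops one element."""
    for _ in range(n):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x

x = [1, 4, 9, 16, 25, 36]  # x_t = t^2 for t = 1..6
print(backshift(x, 2))     # [None, None, 1, 4, 9, 16]
print(diff(x))             # [3, 5, 7, 9, 11]
print(diff(x, 2))          # [2, 2, 2, 2]
# (1 - B)^2 x_t expanded by hand gives the same result as differencing twice
print([x[t] - 2 * x[t - 1] + x[t - 2] for t in range(2, len(x))])  # [2, 2, 2, 2]
```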

A good time series model fits the series so that the remaining (residual) series has no autocorrelation. The residual error series is $$ x_t = y_t - \hat{y}_t $$. (Discrete) white noise is a series whose elements $$ \{w_t: t=1,...,n\} $$ are independent and identically distributed (iid) with mean zero and variance $$ \sigma^2 $$, and therefore have no autocorrelation. If the elements are drawn from a normal distribution $$ N(0,\sigma^2) $$, the series is Gaussian white noise. Here are the properties of discrete white noise:

$$ \mu_w = E(w_t) = 0 $$

$$ \rho_k = Cor(w_t, w_{t+k}) = \begin{cases}
1 \text{ if } k = 0 \\
0 \text{ if } k \neq 0 \\
\end{cases} $$
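
A quick way to see these properties is to simulate Gaussian white noise and look at its sample autocorrelations. This is a minimal sketch assuming $$ \sigma = 1 $$ and n = 2000; `sample_acf` is a hypothetical helper implementing $$ r_k = c_k / c_0 $$. The band $$ \pm 1.96/\sqrt{n} $$ is the usual approximate 95% range for $$ r_k $$ (k ≥ 1) under white noise.

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag using the 1/n autocovariance."""
    n = len(x)
    x_bar = sum(x) / n
    dev = [v - x_bar for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]

random.seed(1)
n = 2000
w = [random.gauss(0.0, 1.0) for _ in range(n)]  # Gaussian white noise, sigma = 1

r = sample_acf(w, 5)
bound = 1.96 / n ** 0.5  # approximate 95% band for r_k, k >= 1
print(round(sum(w) / n, 3))          # sample mean, close to mu_w = 0
print([round(v, 3) for v in r[1:]])  # r_1..r_5 should usually lie within +/- bound
print(round(bound, 3))
```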

 
# Random walk

A random walk is another time series model in which the current element is the previous one plus a random step up or down: $$ x_t = x_{t-1} + w_t $$, where $$ w_t $$ is a discrete white noise series. With the backward shift operator this reads $$ x_t = B x_t + w_t $$, so $$ (1 - B) x_t = w_t $$ and $$ x_t = (1 - B)^{-1} w_t = (1 + B + B^2 + ...) w_t $$. Substituting back to the start of the series (taking $$ x_0 = 0 $$) gives $$ x_t = w_t + w_{t-1} + ... + w_1 $$. So the random walk is the cumulative sum of the elements of a discrete white noise series. Hence $$ \mu_x = 0 $$ and $$ Cov(x_t, x_{t+k}) = t \sigma^2 $$, which depends on the time t. In other words, the random walk is non-stationary.
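
The growth of the variance with t can be checked by simulating many independent random walks and measuring the spread across them at fixed times. This is a minimal sketch assuming $$ x_0 = 0 $$ and Gaussian steps with $$ \sigma = 1 $$, so $$ Var(x_t) = Cov(x_t, x_t) $$ should be close to t.

```python
import random
import statistics

random.seed(2)
n_paths, n_steps = 5000, 100

# Each path starts at x_0 = 0 and adds one white-noise step (sigma = 1) per time step
paths = []
for _ in range(n_paths):
    x, path = 0.0, []
    for _ in range(n_steps):
        x += random.gauss(0.0, 1.0)
        path.append(x)
    paths.append(path)

# Variance across paths at time t should be close to t * sigma^2 = t
for t in (10, 50, 100):
    values = [p[t - 1] for p in paths]
    print(t, round(statistics.pvariance(values), 1))
```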

# Autoregressive model - AR(p)

Let's define a strictly stationary series: a series whose statistical properties are unchanged by any shift in time. A time series model $$ \{x_t\} $$ is strictly stationary if the joint statistical distribution of the elements $$ x_{t_1}, ..., x_{t_n} $$ is the same as that of $$ x_{t_1+m}, ..., x_{t_n+m} $$ for all n, $$ t_i $$ and m.

We also need a way to choose among models: the Akaike Information Criterion (AIC). If L is the maximum value of the likelihood function for a statistical model with k parameters, then $$ AIC = -2\log(L) + 2k $$. The 2k term penalises extra parameters, so AIC trades goodness of fit against model complexity. The best model is the one with the smallest AIC.
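
As a small illustration, the sketch below compares two Gaussian models on data simulated with mean 3: $$ N(0, \sigma^2) $$ with k = 1 parameter and $$ N(\mu, \sigma^2) $$ with k = 2. It assumes both are fitted by maximum likelihood, where the maximised log-likelihood has the closed form $$ -\frac{n}{2}(\log(2\pi\hat{\sigma}^2) + 1) $$ with the natural log. The helper names `gaussian_loglik` and `aic` are hypothetical.

```python
import math
import random

def gaussian_loglik(residuals):
    """Maximised Gaussian log-likelihood, with the variance at its MLE mean(residual^2)."""
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

def aic(log_lik, k):
    """AIC = -2 log(L) + 2k for a model with k fitted parameters."""
    return -2 * log_lik + 2 * k

random.seed(3)
x = [random.gauss(3.0, 1.0) for _ in range(200)]  # simulated data with true mean 3

# Model A: N(0, sigma^2), one parameter (sigma^2)
aic_a = aic(gaussian_loglik(x), k=1)
# Model B: N(mu, sigma^2), two parameters (mu, sigma^2)
mu_hat = sum(x) / len(x)
aic_b = aic(gaussian_loglik([v - mu_hat for v in x]), k=2)
print(round(aic_a, 1), round(aic_b, 1))  # the smaller AIC indicates the preferred model
```

Model B pays a penalty of 2 for its extra parameter but fits far better, so its AIC is lower.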

Now we dive into the autoregressive model. It extends the random walk by adding terms further back in time (a random walk is an AR(1) model with $$ \alpha_1 = 1 $$). A time series model $$ \{ x_t \} $$ is an autoregressive model of order p, AR(p), if $$ x_t = \alpha_1 x_{t-1} + ... + \alpha_p x_{t-p} + w_t = \sum_{i=1}^p \alpha_i x_{t-i} + w_t $$, where $$ \{w_t\} $$ is white noise and $$ \alpha_i \in \mathbb{R}, \alpha_p \neq 0 $$. To predict with AR(p), for any time t: $$ \hat{x}_t = \alpha_1 x_{t-1} + ... + \alpha_p x_{t-p} $$.
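
To see AR(p) estimation and prediction end to end, here is a minimal sketch for p = 2. It assumes the illustrative coefficients $$ \alpha_1 = 0.5 $$ and $$ \alpha_2 = -0.3 $$ (which give a stationary process) and estimates them by ordinary least squares, one common choice among several (Yule-Walker and maximum likelihood are alternatives). It then computes the one-step forecast $$ \hat{x}_{n+1} $$.

```python
import random

random.seed(4)
alpha1, alpha2 = 0.5, -0.3  # illustrative, stationary AR(2) coefficients
n, burn_in = 5000, 200

# Simulate x_t = alpha1 x_{t-1} + alpha2 x_{t-2} + w_t and drop a burn-in period
x = [0.0, 0.0]
for _ in range(n + burn_in):
    x.append(alpha1 * x[-1] + alpha2 * x[-2] + random.gauss(0.0, 1.0))
x = x[2 + burn_in:]

# Least squares: regress x_t on (x_{t-1}, x_{t-2}) through the 2x2 normal equations
s11 = s12 = s22 = b1 = b2 = 0.0
for t in range(2, len(x)):
    u, v, y = x[t - 1], x[t - 2], x[t]
    s11 += u * u
    s12 += u * v
    s22 += v * v
    b1 += u * y
    b2 += v * y
det = s11 * s22 - s12 * s12
a1_hat = (b1 * s22 - b2 * s12) / det  # Cramer's rule
a2_hat = (s11 * b2 - s12 * b1) / det

# One-step-ahead forecast: x_hat_{n+1} = a1_hat x_n + a2_hat x_{n-1}
forecast = a1_hat * x[-1] + a2_hat * x[-2]
print(round(a1_hat, 3), round(a2_hat, 3), round(forecast, 3))
```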

# Moving average - MA(q)



# ARMA

ARMA stands for autoregressive moving average. It is a model for understanding and predicting a time series. The AR part regresses the variable $$ X_t $$ on its own lagged (past) values. The MA part models the error term as a linear combination of white noise error terms at the current and past times. The model is a stochastic process described by two polynomials: p is the order of the AR part and q is the order of the MA part.

## AR

Model for AR(p): $$ X_t = \sum_{i=1}^p \phi_i X_{t-i} + \epsilon_t $$, where $$ \phi_1,...,\phi_p $$ are parameters and the random variable $$ \epsilon_t $$ is white noise.

## MA

Model for MA(q): $$ X_t = \mu + \epsilon_t + \sum_{i=1}^q \theta_i \epsilon_{t-i} $$, where $$ \theta_1,...,\theta_q $$ are parameters, $$ \mu $$ is the expectation of $$ X_t $$ (often assumed to be 0), and $$ \epsilon_t, \epsilon_{t-1}, ... $$ are white noise error terms.
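
A useful property of MA(q) is that its autocorrelation is exactly zero beyond lag q. For MA(1), $$ \rho_1 = \theta_1 / (1 + \theta_1^2) $$. The sketch below simulates MA(1) with the illustrative choices $$ \theta_1 = 0.6 $$ and $$ \mu = 0 $$ and checks this cut-off with the sample ACF; `sample_acf` is a hypothetical helper implementing $$ r_k = c_k / c_0 $$.

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag using the 1/n autocovariance."""
    n = len(x)
    x_bar = sum(x) / n
    dev = [v - x_bar for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]

random.seed(5)
theta, n = 0.6, 20000

# MA(1) with mu = 0: X_t = eps_t + theta * eps_{t-1}
eps = [random.gauss(0.0, 1.0) for _ in range(n + 1)]
x = [eps[t] + theta * eps[t - 1] for t in range(1, n + 1)]

r = sample_acf(x, 3)
print(round(theta / (1 + theta ** 2), 3))  # theoretical rho_1: 0.441
print([round(v, 3) for v in r[1:]])        # r_1 near 0.441, r_2 and r_3 near 0
```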

## ARMA

The combined model ARMA(p,q) has p autoregressive terms and q moving average terms:

$$ X_t = \epsilon_t + \sum_{i=1}^p \phi_i X_{t-i} + \sum_{i=1}^q \theta_i \epsilon_{t-i} $$
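
Unlike MA(q), the autocorrelation of an ARMA process decays gradually instead of cutting off. For ARMA(1,1), $$ \rho_1 = \frac{(1+\phi_1\theta_1)(\phi_1+\theta_1)}{1+2\phi_1\theta_1+\theta_1^2} $$ and $$ \rho_k = \phi_1 \rho_{k-1} $$ for k ≥ 2. This minimal sketch simulates ARMA(1,1) with the illustrative values $$ \phi_1 = 0.5 $$ and $$ \theta_1 = 0.4 $$ and compares the sample ACF with those values; `sample_acf` is again a hypothetical helper.

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag using the 1/n autocovariance."""
    n = len(x)
    x_bar = sum(x) / n
    dev = [v - x_bar for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]

random.seed(6)
phi, theta = 0.5, 0.4  # illustrative ARMA(1,1) parameters
n, burn_in = 20000, 200

# X_t = phi X_{t-1} + eps_t + theta eps_{t-1}, discarding the first burn_in values
x_prev, eps_prev, x = 0.0, 0.0, []
for t in range(n + burn_in):
    eps = random.gauss(0.0, 1.0)
    x_t = phi * x_prev + eps + theta * eps_prev
    if t >= burn_in:
        x.append(x_t)
    x_prev, eps_prev = x_t, eps

r = sample_acf(x, 3)
rho1 = (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta ** 2)
print(round(rho1, 3), round(phi * rho1, 3))  # theoretical rho_1, rho_2: 0.692 0.346
print([round(v, 3) for v in r[1:]])          # sample r_1..r_3, decaying by about phi per lag
```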