## Problem formulation

## Basic ARMA model

The basic ARMA(p, q) model can be written as:
    \begin{equation}
    x_t = \mu + \varepsilon_t +  \sum_{i=1}^p \phi_i x_{t-i} + \sum_{i=1}^q \theta_i \varepsilon_{t-i}
    \end{equation}
where $\varepsilon_t$ are error terms assumed to form a white noise process (i.e. independent, identically distributed random variables with $\varepsilon_t \sim N(0, \sigma^2)$).
The key obstacle in analysing and fitting an ARMA model is that the error terms $\varepsilon_t$ are not observable, nor can they be computed directly from the data as in, for example, linear regression ($\varepsilon = Y - \alpha - \beta X$). Rearranging the equation above gives
    \begin{equation}
    \varepsilon_t = x_t - \left(\mu + \sum_{i=1}^p \phi_i x_{t-i} + \sum_{i=1}^q \theta_i \varepsilon_{t-i}\right)
    \end{equation}
which essentially makes $\varepsilon_t$ a function of $\{\varepsilon_{t-1}, \dots, \varepsilon_{t-q}\}$, so it can only be calculated recursively from previous values of $\varepsilon$.
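
The recursion can be illustrated with a minimal sketch. It assumes that all pre-sample values of $x$ and $\varepsilon$ are zero (a common but not the only possible initialisation), and the function names `simulate_arma` and `arma_residuals` are hypothetical helpers introduced only for this example:

```python
import numpy as np

def simulate_arma(eps, mu, phi, theta):
    """Generate x_t from given errors, assuming zero pre-sample x and eps."""
    x = np.zeros(len(eps))
    for t in range(len(eps)):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[i] * eps[t - 1 - i] for i in range(len(theta)) if t - 1 - i >= 0)
        x[t] = mu + eps[t] + ar + ma
    return x

def arma_residuals(x, mu, phi, theta):
    """Recover eps_t recursively: each eps_t needs the previously recovered eps."""
    eps = np.zeros(len(x))
    for t in range(len(x)):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[i] * eps[t - 1 - i] for i in range(len(theta)) if t - 1 - i >= 0)
        eps[t] = x[t] - (mu + ar + ma)
    return eps

# Illustrative parameter values (assumptions, not estimates from any data set)
rng = np.random.default_rng(0)
true_eps = rng.normal(0.0, 1.0, size=200)
mu, phi, theta = 0.1, [0.5, -0.2], [0.4]

x = simulate_arma(true_eps, mu, phi, theta)
recovered = arma_residuals(x, mu, phi, theta)
print(np.allclose(recovered, true_eps))
```

With known parameters the errors are recovered exactly under this initialisation; when fitting, the parameters are unknown, so the recursion has to be re-run for every candidate parameter vector.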

## State-space representation



The problem can be reformulated in matrix form using a state-space representation with $m$-dimensional state vectors $\alpha_t$ whose $i$-th component $\alpha_t^i$ satisfies:
    \begin{equation}
    \alpha_t^i = \phi_i x_{t-1} + \alpha_{t-1}^{i+1} + \theta_{i-1} \varepsilon_t
    \end{equation}
where $m = \max(p, q+1)$, $\theta_0 = 1$, $\alpha_t^{m+1} = 0$, and the coefficients $\phi_i$ for $p<i\le m$ and $\theta_i$ for $q<i\le m$ are set to $0$.

This allows us to split the system into a **transition equation**, which can be interpreted as the true state of the system under the assumed ARMA(p, q) model:
    \begin{equation}
    \alpha_t = K\alpha_{t-1} + R \varepsilon_t*
    \end{equation}
where:
    \begin{equation}
    K=
      \begin{bmatrix}
        \phi_1 & 1 & 0 & \cdots & 0 \\
        \phi_2 & 0 & 1 & \cdots & 0 \\
        \vdots & \vdots & \vdots & \ddots  & \vdots \\ 
        \phi_m & 0 & 0 & \cdots & 0
      \end{bmatrix}
       \quad \text{and} \quad
     R=
      \begin{bmatrix}
        1 & 0 & 0 & \cdots & 0 \\
        0 & \theta_1 & 0 & \cdots & 0 \\
        \vdots & \vdots & \vdots & \ddots  & \vdots \\ 
        0 & 0 & 0 & \cdots & \theta_{m-1}
      \end{bmatrix}
    \end{equation}
  and a **measurement equation**, where 
      \begin{equation}
    \alpha_t^1 = \phi_1 x_{t-1} + \alpha_{t-1}^2 + \varepsilon_t =
        \phi_1 x_{t-1} + \phi_2 x_{t-2} + \alpha_{t-2}^3 + \theta_1 \varepsilon_{t-1} +  \varepsilon_t = \dots =
        x_t
    \end{equation}
  or $x_t = Z \alpha_t$ where $Z = \begin{bmatrix}1 & 0 & \cdots & 0 \end{bmatrix}$. The final equality takes the constant $\mu$ to be $0$, for instance after demeaning the series.
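
As a sanity check, the transition and measurement equations can be simulated and compared with the direct ARMA recursion. The sketch below makes several assumptions: $\mu = 0$, a zero initial state, a diagonal $R$ with entries $(1, \theta_1, \dots, \theta_{m-1})$, and an error vector whose entries all equal the same scalar draw. The helper name `arma_state_space` is hypothetical.

```python
import numpy as np

def arma_state_space(phi, theta):
    """Build K, R, Z for an ARMA(p, q) model with m = max(p, q + 1)."""
    p, q = len(phi), len(theta)
    m = max(p, q + 1)
    phi_pad = np.concatenate((phi, np.zeros(m - p)))               # phi_i = 0 for i > p
    theta_pad = np.concatenate(([1.0], theta, np.zeros(m - 1 - q)))  # theta_0 = 1
    K = np.zeros((m, m))
    K[:, 0] = phi_pad              # first column holds the AR coefficients
    K[:-1, 1:] = np.eye(m - 1)     # ones on the superdiagonal shift the state
    R = np.diag(theta_pad)
    Z = np.zeros(m)
    Z[0] = 1.0                     # the observation is the first state component
    return K, R, Z

# Illustrative coefficients (assumptions chosen for the example)
phi, theta = [0.5, -0.2], [0.4, 0.3]
K, R, Z = arma_state_space(phi, theta)
m = len(Z)

rng = np.random.default_rng(1)
eps = rng.normal(size=300)

# State-space simulation: alpha_t = K alpha_{t-1} + R eps_t, x_t = Z alpha_t
alpha = np.zeros(m)
x_ss = np.empty(len(eps))
for t, e in enumerate(eps):
    alpha = K @ alpha + R @ np.full(m, e)
    x_ss[t] = Z @ alpha

# Direct ARMA recursion with zero pre-sample values and mu = 0
x_direct = np.zeros(len(eps))
for t in range(len(eps)):
    ar = sum(phi[i] * x_direct[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
    ma = sum(theta[i] * eps[t - 1 - i] for i in range(len(theta)) if t - 1 - i >= 0)
    x_direct[t] = eps[t] + ar + ma

print(np.allclose(x_ss, x_direct))
```

Under these assumptions, both paths are driven by the same errors, so any difference between them would point to an indexing mistake in $K$ or $R$.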
    
    
<font color='red'>*Here I interpret $\varepsilon_t$ as a vector of errors, while $R$ is a square matrix. In the literature, $R$ is also shown as a vector of $\theta$ coefficients multiplied by a single scalar error, but in that case the Kalman filter update equation would have a dimensionality mismatch.</font>
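
The two conventions mentioned in the note can be compared numerically. This is only a sketch of the transition noise term and its covariance, not a full Kalman filter. It assumes that the error vector has identical entries (so its covariance is $\sigma^2 \mathbf{1}\mathbf{1}^T$), and the values of `theta`, `e` and `sigma2` are arbitrary illustrations:

```python
import numpy as np

theta = np.array([0.4, 0.3])
r = np.concatenate(([1.0], theta))   # (1, theta_1, ..., theta_{m-1})
m = len(r)
R_diag = np.diag(r)                  # square-matrix convention

e = 0.7          # one scalar error draw
sigma2 = 2.0     # error variance

# Noise term entering the transition equation under each convention
noise_diag = R_diag @ np.full(m, e)  # diagonal R times a vector of identical errors
noise_vec = r * e                    # column vector r times a scalar error

# Implied m x m covariance of the state noise under each convention
Q_diag = R_diag @ (sigma2 * np.ones((m, m))) @ R_diag.T
Q_vec = sigma2 * np.outer(r, r)

print(np.allclose(noise_diag, noise_vec), np.allclose(Q_diag, Q_vec))
print(Q_vec.shape)
```

In this sketch, both conventions give the same noise term and the same $m \times m$ state-noise covariance; which one is more convenient depends on how the filter code expects its inputs to be shaped.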

### Materials:
 - https://otexts.com/fpp2/non-seasonal-arima.html - basic problem formulation
 - https://uh.edu/~bsorense/kalman.pdf
 - https://www.stat.purdue.edu/~chong/stat520/ps/statespace.pdf
 - http://www.stat.ucla.edu/~frederic/221/W17/221ch3.pdf
 - https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python - general information about filters