In [0]:
import numpy as np
import matplotlib.pyplot as plt

**Definition - Autoregressive model**

An autoregressive model of order p, abbreviated AR(p), is defined as

$$x_t=\phi_1 x_{t-1} + \phi_2 x_{t-2} + ... + \phi_p x_{t-p} + w_t$$

Here \\(x_t\\) is stationary and \\(\phi_1, \phi_2, \dots, \phi_p\\) are constants with \\(\phi_p \neq 0\\).
\\(w_t\\) is assumed to be a Gaussian noise series with zero mean and variance \\(\sigma_w^2\\).

Expanding the AR(1) model \\(x_t = \phi x_{t-1} + w_t\\) recursively gives

$$x_t = \phi^k x_{t-k} + \sum_{j=0}^{k-1} \phi^j w_{t-j}$$

This follows from \\(x_t = \phi x_{t-1} + w_t = \phi (\phi x_{t-2} + w_{t-1}) + w_t = \phi^2 x_{t-2} + \phi w_{t-1} + w_t\\); repeating the substitution \\(k\\) times yields the general form.
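
The finite expansion can be checked numerically. The following is a minimal sketch; the values of `phi`, `k`, `n` and the seed are illustrative assumptions and not taken from the text.

```python
import numpy as np

# Illustrative values (assumptions, not from the text)
phi, k, n = 0.7, 5, 50
rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, n)

# Generate x by the AR(1) recursion x_t = phi * x_{t-1} + w_t, starting at x_0 = w_0
x = np.zeros(n)
x[0] = w[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + w[t]

# Rebuild the last value from x_{t-k} and the k most recent noise terms
t = n - 1
expanded = phi**k * x[t - k] + sum(phi**j * w[t - j] for j in range(k))

print(x[t], expanded)  # the two numbers agree up to rounding
```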

If \\(|\phi|<1\\) and \\(x_t\\) is stationary, the term \\(\phi^k x_{t-k}\\) goes to zero as \\(k \to \infty\\), and the AR(1) model can be written as the linear process

$$x_t = \sum_{j=0}^{\infty} \phi^j w_{t-j}$$

This process has zero mean, due to the linearity of the expectation operator and \\(E[w_t]=0\\).

The autocovariance of an AR(1) process is

$$\gamma(h)=\frac{\sigma_w^2 \phi^h}{1 - \phi^2}, \quad h \geq 0$$
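
As a sanity check, the closed form can be compared with the autocovariance of a linear process, \\(\gamma(h)=\sigma_w^2 \sum_{j} \psi_j \psi_{j+h}\\), using \\(\psi_j=\phi^j\\) and a truncated sum. This is only a sketch; the function names, the truncation length and the parameter values are assumptions made for illustration.

```python
import numpy as np

def ar1_autocov_closed_form(phi, sigma2, h):
    """gamma(h) = sigma_w^2 * phi^h / (1 - phi^2) for h >= 0."""
    return sigma2 * phi**h / (1.0 - phi**2)

def ar1_autocov_from_psi(phi, sigma2, h, n_terms=500):
    """Truncated sum sigma_w^2 * sum_j psi_j * psi_{j+h} with psi_j = phi^j."""
    j = np.arange(n_terms)
    return sigma2 * np.sum(phi**j * phi**(j + h))

phi, sigma2 = 0.9, 1.0  # illustrative values
for h in range(4):
    print(h, ar1_autocov_closed_form(phi, sigma2, h), ar1_autocov_from_psi(phi, sigma2, h))
```

The two columns should agree closely because the neglected tail of the sum is of order \\(\phi^{2 \cdot 500}\\).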

**Definition - The autoregressive operator**

The autoregressive operator is defined as 

$$\phi(B)=1 - \phi_1 B - \phi_2 B^2 - ... - \phi_p B^p$$
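
To make the operator concrete, the sketch below applies \\(\phi(B)\\) to a simulated series and recovers the noise, since \\(\phi(B) x_t = w_t\\) for an AR(p) process. The helper name `apply_ar_operator` and the AR(2) coefficients are assumptions chosen for illustration.

```python
import numpy as np

def apply_ar_operator(x, phi_coeffs):
    """Apply phi(B) = 1 - phi_1 B - ... - phi_p B^p to x.

    Returns phi(B) x_t for t = p, ..., len(x) - 1. The first p values would
    need observations from before the start of the series, so they are dropped.
    """
    x = np.asarray(x, dtype=float)
    p = len(phi_coeffs)
    out = x[p:].copy()
    for k, phi_k in enumerate(phi_coeffs, start=1):
        # B^k x_t = x_{t-k}
        out -= phi_k * x[p - k:len(x) - k]
    return out

rng = np.random.default_rng(1)
w = rng.normal(size=200)
phi = [0.5, -0.3]  # illustrative AR(2) coefficients
x = np.zeros(200)
for t in range(2, 200):
    x[t] = phi[0] * x[t - 1] + phi[1] * x[t - 2] + w[t]

recovered = apply_ar_operator(x, phi)
print(np.allclose(recovered, w[2:]))
```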




In [0]:
## Example - Realization of autoregressive model
'''
In this example it is seen how the realization of an AR(1) model depends on phi
If 0<phi<1 the realization is stationary and the acf shows that the linear dependence between samples declines steadily
If -1<phi<0 the realization is stationary and the acf shows that the linear dependence between samples alternates in sign while decreasing in magnitude
If |phi|>=1 the realization is non-stationary (phi=1 is the random walk)
'''

from statsmodels.tsa.stattools import acf

def ar1_realization(phi):
    N = 100
    mu, sigma = 0, 1
    w = np.random.normal(mu, sigma, N)
    x = np.zeros(N)

    for i in range(1, N):
        x[i] = phi * x[i-1] + w[i]

    plt.figure()
    plt.plot(x)
    plt.title(f"Realization of AR(1) - Phi = {phi}")
    plt.xlabel("Sample")
    plt.ylabel("Value")

    plt.figure()
    plt.plot(acf(x))
    plt.title(f"Autocorrelation of AR(1) - Phi = {phi}")
    plt.xlabel("Lag")
    plt.ylabel("Value")

ar1_realization(0.9)
ar1_realization(-0.9)
ar1_realization(1)


**Definition - Moving average model**

The moving average model of order q, **MA**(q), is defined as

$$ x_t = w_t + \theta_{1} w_{t-1} + \theta_{2} w_{t-2} + ... + \theta_{q} w_{t-q}$$

Here \\(\theta_1, \theta_2, \dots, \theta_q\\) are the parameters of the model, with \\(\theta_q \neq 0\\). \\(w_t\\) is Gaussian noise.
The moving average process can also be written as

$$ x_t = \theta(B) w_t $$

where \\(\theta(B)\\) is the moving average operator.

**Definition - The moving average operator**

The moving average operator is defined as 

$$ \theta(B) = 1 + \theta_1 B + \theta_2 B^2 + ... + \theta_q B^q $$

**Example - MA(1) process**

The **MA**(1) process is defined as 

$$ x_t = w_t + \theta w_{t-1} $$

Here \\(E[x_t]=0\\) and the autocorrelation is \\(\rho(1)=\frac{\theta}{1+\theta^2}\\) at lag \\(h = 1\\) and \\(\rho(h)=0\\) for \\(h > 1\\).
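
The theoretical autocorrelation of the MA(1) process can be written as a small function. This is a sketch only; the name `ma1_acf` is an assumption introduced here.

```python
def ma1_acf(theta, h):
    """Theoretical autocorrelation of x_t = w_t + theta * w_{t-1} at lag h."""
    h = abs(h)
    if h == 0:
        return 1.0
    if h == 1:
        return theta / (1.0 + theta**2)
    return 0.0

for theta in (0.9, -0.9, 10):
    print(theta, [ma1_acf(theta, h) for h in range(4)])
```

Note that \\(\theta\\) and \\(1/\theta\\) give the same value of \\(\rho(1)\\), for example `ma1_acf(2, 1)` and `ma1_acf(0.5, 1)` are both 0.4.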

In [0]:
## Example - Realization of moving average model
'''
In this example it is seen how the realization of an MA(1) model depends on theta
The MA process is always stationary regardless of theta
If 0<theta<1 the acf shows that the linear dependence between samples is positive at lag 1 and zero after
If -1<theta<0 the acf shows that the linear dependence between samples is negative (due to the negative theta) at lag 1 and zero after
If 1<theta the realization is still stationary
'''

from statsmodels.tsa.stattools import acf

def ma1_realization(theta):
    N = 1000
    mu, sigma = 0, 1
    w = np.random.normal(mu, sigma, N)
    x = np.zeros(N)

    # x[0] is left at 0 because the previous error w[-1] is not available
    for i in range(1, N):
        x[i] = theta * w[i-1] + w[i]

    plt.figure()
    plt.plot(x)
    plt.title(f"Realization of MA(1) - Theta = {theta}")
    plt.xlabel("Sample")
    plt.ylabel("Value")

    plt.figure()
    plt.plot(acf(x), "-*")
    plt.title(f"Autocorrelation of MA(1) - Theta = {theta}")
    plt.xlabel("Lag")
    plt.ylabel("Value")

ma1_realization(0.9)
ma1_realization(-0.9)
ma1_realization(10)



**Example - Causality**

The moving average process will always be stationary regardless of the \\(\theta\\) values, as shown above. The autoregressive process is different, however. Consider the autoregressive process

$$ x_{t} = \phi x_{t-1} + w_t $$

Which can be recursively rewritten to

$$ x_{t} = \phi^p x_{t-p} + \sum_{j=0}^{p-1} \phi^j w_{t-j}  $$

If \\(| \phi | < 1\\), this converges as \\(p \to \infty\\) to

$$ x_{t} = \sum_{j=0}^{\infty} \phi^j w_{t-j}  $$

This shows that an **AR**(1) process can be represented as a linear process with coefficients \\(\psi_j=\phi^j\\). For a derivation of this, see "Time Series Analysis and Its Applications" Shumway & Stoffer page 88.

Consider a non-stationary process, the random walk, which is an **AR**(1) process with \\(\phi=1\\). Autoregressive processes with \\(|\phi|>1\\) are called explosive and cannot be expressed as a linear process of past errors. This is because the term \\(\phi^p x_{t-p}\\) does not converge to 0.

However, the process can be rewritten to another form, to allow representation as a linear process. 

First a shift forward in time is performed

$$ x_{t+1} = \phi x_{t} + w_{t+1}  $$

Then we rewrite to express in terms of \\(x_{t}\\)

$$  x_{t} =  - \phi^{-1} w_{t+1} + \phi^{-1} x_{t+1}  $$

Now, recursively substituting \\(x_{t+1}\\) yields

$$ x_{t} = \phi^{-p} x_{t+p} - \sum_{j=1}^{p} \phi^{-j} w_{t+j}  $$

It is seen that for \\(|\phi|> 1\\) the term \\(\phi^{-p} x_{t+p}\\) converges to 0 and we are back at expressing \\(x_t\\) as a linear process. But instead of describing \\(x_t\\) as a linear process of past errors, it is expressed in terms of future errors. This is then a non-causal system.
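
The future-error representation can be checked numerically: a series built only from future noise still satisfies the AR(1) recursion. This is a minimal sketch in which the truncation length `J`, the series length `n` and the seed are assumptions for illustration.

```python
import numpy as np

phi = 2.0          # explosive AR(1) coefficient, |phi| > 1
n, J = 100, 60     # series length and truncation of the infinite sum (illustrative)
rng = np.random.default_rng(2)
w = rng.normal(size=n + J)   # noise is needed up to J steps into the future

# x_t = - sum_{j=1}^{J} phi^{-j} w_{t+j}
weights = phi ** (-np.arange(1, J + 1, dtype=float))
x = np.array([-np.dot(weights, w[t + 1:t + J + 1]) for t in range(n)])

# The series built from future noise still obeys x_t = phi * x_{t-1} + w_t
residual = x[1:] - phi * x[:-1] - w[1:n]
print(np.max(np.abs(residual)))  # tiny: only truncation and rounding error remain
```

Unlike the forward recursion with \\(\phi=2\\), this construction stays bounded, but every \\(x_t\\) requires noise values that have not yet occurred.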




**Definition - AR and MA polynomials**

The ARMA process can be written as

$$ \phi(B) x_t = \theta(B) w_t $$

Where \\(\phi(B)\\) and \\(\theta(B)\\) are the autoregressive and moving average operators.

The AR and MA polynomials are defined as 

$$ \phi(z) = 1 - \phi_1 z - \phi_2 z^2 - ... - \phi_p z^p $$

$$ \theta(z) = 1 + \theta_1 z + \theta_2 z^2 + ... + \theta_q z^q $$

They are obtained from the operators by replacing the backshift operator \\(B\\) with a complex number \\(z\\). The equations \\(\phi(z)=0\\) and \\(\theta(z)=0\\) are the characteristic equations.

**Definition - Causality**

For an **ARMA**(p,q) process to be causal, it needs to be representable as a linear process of past errors.

$$ x_{t} = \sum_{j=0}^{\infty} \psi_{j} w_{t-j} = \psi(B) w_t $$

Since \\( \phi(B) x_t = \theta(B) w_t \\), or equivalently \\(x_t = \frac{\theta(B)}{\phi(B)} w_t\\), we can rewrite this to

$$ \frac{\theta(B)}{\phi(B)} w_t = \sum_{j=0}^{\infty} \psi_{j} w_{t-j} = \psi(B) w_t $$

Writing \\(w_{t-j}\\) as \\(B^j w_t\\) and comparing the operators acting on \\(w_t\\)

$$ \frac{\theta(B)}{\phi(B)} = \sum_{j=0}^{\infty} \psi_{j} B^j = \psi(B) $$

Substituting \\(B\\) by \\(z\\) 

$$ \frac{\theta(z)}{\phi(z)} = \sum_{j=0}^{\infty} \psi_{j} z^j = \psi(z) $$
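
The \\(\psi\\)-weights can be computed by matching coefficients in \\(\phi(z)\psi(z)=\theta(z)\\), which gives \\(\psi_0=1\\) and \\(\psi_j=\theta_j+\sum_{k=1}^{\min(j,p)}\phi_k\psi_{j-k}\\) with \\(\theta_j=0\\) for \\(j>q\\). The sketch below implements this recursion; the function name `psi_weights` and the example coefficients are assumptions for illustration.

```python
import numpy as np

def psi_weights(phi, theta, n_weights):
    """First n_weights coefficients of psi(z) = theta(z) / phi(z).

    phi   : [phi_1, ..., phi_p]     with phi(z)   = 1 - phi_1 z - ... - phi_p z^p
    theta : [theta_1, ..., theta_q] with theta(z) = 1 + theta_1 z + ... + theta_q z^q
    """
    psi = np.zeros(n_weights)
    psi[0] = 1.0
    for j in range(1, n_weights):
        theta_j = theta[j - 1] if j <= len(theta) else 0.0
        ar_part = sum(phi[k - 1] * psi[j - k] for k in range(1, min(j, len(phi)) + 1))
        psi[j] = theta_j + ar_part
    return psi

print(psi_weights([0.9], [], 5))       # AR(1): psi_j = 0.9**j
print(psi_weights([0.9], [0.5], 5))    # an illustrative ARMA(1,1)
```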

The process is causal, meaning that we can write it as a linear process of past errors, if and only if all roots of the characteristic equation \\(\phi(z)=0\\) lie outside the unit circle, that is \\(|z|>1\\) for every root. For an **AR**(1) process this is the same as requiring \\(|\phi|<1\\); for higher orders the condition is on the roots, not on the individual AR parameters.

**Example - Causal AR(1) process**

Consider the causal **AR**(1) process 

$$ x_t = 0.3x_{t-1} + w_t $$

We know it is causal, since \\(|\phi| < 1\\)

Here \\(\phi(B)=1-0.3B\\) hence \\(\phi(z)=1-0.3z\\). The characteristic equation has the root \\(z=\frac{1}{0.3} \approx 3.33\\) which is outside the unit circle. 

**Example - Non-causal AR(1) process**

Let's go the other way and define a characteristic equation

$$ \phi(z) = (1-2z) $$

This has the root \\(z=\frac{1}{2}\\), which lies inside the unit circle. Writing out the **AR**(1) process that generates this characteristic equation

$$ (1-2B)x_t=w_t $$

$$ x_t = 2x_{t-1} + w_t $$

Here we see that \\(\phi=2\\), so the process cannot be expressed as a linear combination of past errors, as shown before.

**Example - Non-causal AR(2) process**

Defining a characteristic equation with one root outside (\\(z=10\\)) and one root inside (\\(z=-\frac{1}{2}\\)) the unit circle

$$ \phi(z)=(1-0.1z)(1+2z)=1+2z-0.1z-0.2z^2=1+1.9z-0.2z^2$$

This will result in the AR process

$$ (1+1.9B-0.2B^2)x_t = w_t $$

Rewriting to

$$ x_t = - 1.9 x_{t-1} + 0.2 x_{t-2} + w_t $$

Since the root \\(z=-\frac{1}{2}\\) lies inside the unit circle, this is a non-causal system.
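
The three examples above can be checked numerically by computing the roots of the AR polynomial. This is a minimal sketch; the helper names `ar_roots` and `is_causal` are assumptions, and the last coefficient pair is a hypothetical extra case not taken from the text.

```python
import numpy as np

def ar_roots(phi):
    """Roots of phi(z) = 1 - phi_1 z - ... - phi_p z^p."""
    # np.roots expects coefficients ordered from the highest power down to the constant
    coeffs = [-c for c in reversed(phi)] + [1.0]
    return np.roots(coeffs)

def is_causal(phi):
    """True when every root of phi(z) lies strictly outside the unit circle."""
    return bool(np.all(np.abs(ar_roots(phi)) > 1.0))

# The last entry is a hypothetical AR(2) with |phi_1| > 1
for phi in ([0.3], [2.0], [-1.9, 0.2], [1.5, -0.75]):
    print(phi, ar_roots(phi), is_causal(phi))
```

The hypothetical case `[1.5, -0.75]` has both roots outside the unit circle even though \\(|\phi_1|>1\\), which illustrates that the condition concerns the roots rather than the size of a single coefficient.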

**Intuition - Causality**

The reason we care whether an AR process is causal is that we want the current observation \\(x_t\\) to depend only on current and past errors, and not on future ones. This means we can forecast new values, since \\(x_{t+1}\\) will only depend on values that are already available plus one new error.
In signal processing, real-time filtering can only depend on previous and current samples, hence a causal system is necessary.
Similarly in control theory, we want our system output to only depend on past and present inputs and not future inputs.

**Example - Invertibility**

Consider the **MA**(1) process

$$ x_t = \theta w_{t-1} + w_t $$

Here the current value depends on the previous error \\(w_{t-1}\\) and the current error \\(w_{t}\\). Since we only have the observations readily available, and not the errors, we want to model this in terms of past observations \\(x_{t-n}\\) instead.

This leads to the notion of an **AR**(\\(\infty\\)) process.

Rewriting the process to

$$ x_t = \theta w_{t-1} + w_t = (1+\theta B) w_t $$

$$ \frac{1}{1+\theta B} x_t = w_t $$

We recognize that \\(\frac{1}{1+\theta B}\\) can be expressed as an infinite geometric series with first term 1 and common ratio \\(-\theta B\\), which converges when \\(|\theta|<1\\).

$$ \frac{1}{1+\theta B} = \sum_{j=0}^\infty (-\theta)^j B^j$$

$$ \sum_{j=0}^\infty (-\theta)^j B^j x_t = w_t $$

Which if written out will become

$$ (1 + (-\theta) B + (-\theta)^2 B^2 + ...) x_t = w_t $$

Which we recognize as an infinite AR process **AR**(\\(\infty\\)).

This is much more useful than the actual **MA**(1) process, since here we only depend on the previous observations and the current error.
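
The inversion can be illustrated by recovering the errors from the observations with the weights \\(\pi_j=(-\theta)^j\\). This is only a sketch: the value of `theta`, the series length, the seed and the starting assumption \\(w_{-1}=0\\) are all choices made here for illustration.

```python
import numpy as np

theta, n = 0.4, 200          # illustrative invertible MA(1)
rng = np.random.default_rng(3)
w = rng.normal(size=n)

# x_t = w_t + theta * w_{t-1}, assuming w_{-1} = 0 so that x_0 = w_0
x = w.copy()
x[1:] += theta * w[:-1]

# Truncated AR(infinity) weights pi_j = (-theta)^j applied to current and past x
pi = (-theta) ** np.arange(n)
w_hat = np.convolve(x, pi)[:n]   # w_hat[t] = sum_{j=0}^{t} pi_j * x[t-j]

print(np.max(np.abs(w_hat - w)))  # close to zero
```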

**Definition - Invertibility**

A time series is said to be invertible if it can be written as 

$$ \pi(B)x_{t} = \sum_{j=0}^{\infty} \pi_{j} x_{t-j} = w_t $$

Meaning that the current observation is a linear combination of all previous observations and the current error.

An **ARMA**(p,q) process is invertible if and only if \\(\theta(z)\\) has all its roots outside of the unit circle.
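
The root condition can be tested with a few lines of code. The helper names `ma_roots` and `is_invertible` are assumptions, and the two \\(\theta\\) values are illustrative.

```python
import numpy as np

def ma_roots(theta):
    """Roots of theta(z) = 1 + theta_1 z + ... + theta_q z^q."""
    # np.roots expects coefficients ordered from the highest power down to the constant
    coeffs = list(reversed(theta)) + [1.0]
    return np.roots(coeffs)

def is_invertible(theta):
    """True when every root of theta(z) lies strictly outside the unit circle."""
    return bool(np.all(np.abs(ma_roots(theta)) > 1.0))

for theta in ([0.4], [2.0]):
    print(theta, ma_roots(theta), is_invertible(theta))
```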

**Example - Invertible MA(1) process**

Consider the **MA**(1) process 

$$ x_t = 0.4 w_{t-1} + w_t $$

This can be rewritten to the **AR**(\\(\infty\\)) process 

$$ \sum_{j=0}^\infty (-0.4)^j B^j x_t = w_t $$

$$ x_t = 0.4 x_{t-1} - 0.4^2 x_{t-2} + 0.4^3 x_{t-3} - ... + w_t $$

The weights \\((-0.4)^j\\) decay geometrically since \\(|\theta| < 1\\), so the sum converges. The original MA process will have the MA polynomial

$$ \theta(z) = (1+0.4z) $$

With the root \\(z=-\frac{1}{0.4}=-2.5\\) which lies outside of the unit circle.

**Example - Non-invertible MA(1) process**

$$ x_t = 2 w_{t-1} + w_t $$

Formally rewriting this to an **AR**(\\(\infty\\)) process gives

$$ \sum_{j=0}^\infty (-2)^j B^j x_t = w_t $$

$$ x_t = 2 x_{t-1} - 2^2 x_{t-2} + 2^3 x_{t-3} - ... + w_t $$

Here the weights \\((-2)^j\\) grow without bound since \\(|\theta| > 1\\), so the sum does not converge and the representation is not valid. The original MA process will have the MA polynomial

$$ \theta(z) = (1+2z) $$

With the root \\(z=-\frac{1}{2}\\) which lies inside of the unit circle.

**Intuition - Invertibility**

We want to be able to express our time series as a linear combination of current and past observations. If \\(|\theta| < 1\\) the current observation \\(x_t\\) will depend more on recent observations than on distant past observations. If \\(|\theta| = 1\\) all past observations will have the same weight as the most recent one. If \\(|\theta| > 1\\) distant past observations will have more weight than recent ones.
In many systems (but not all) the latter two are not desirable properties, hence we favour invertible systems.


**Definition - Parameter redundancy**

An ARMA model has no redundant parameters if the AR and MA polynomials \\(\phi(z)\\) and \\(\theta(z)\\) have no common roots. A common root corresponds to a common factor that cancels on both sides of \\(\phi(B) x_t = \theta(B) w_t\\).
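
A simple numerical check for redundancy is to compare the roots of the two polynomials. This is a sketch under stated assumptions: the helper name `common_roots`, the tolerance and both example models are hypothetical and chosen only for illustration.

```python
import numpy as np

def common_roots(phi, theta, tol=1e-8):
    """Roots shared by phi(z) = 1 - phi_1 z - ... and theta(z) = 1 + theta_1 z + ..."""
    # np.roots expects coefficients ordered from the highest power down to the constant
    ar = np.roots([-c for c in reversed(phi)] + [1.0])
    ma = np.roots(list(reversed(theta)) + [1.0])
    return [r for r in ar if np.any(np.abs(ma - r) < tol)]

# Hypothetical over-parameterized model: x_t = 0.5 x_{t-1} + w_t - 0.5 w_{t-1}
print(common_roots([0.5], [-0.5]))   # both polynomials are 1 - 0.5 z
# Hypothetical model without redundancy
print(common_roots([0.5], [0.4]))
```

In the first case the common factor \\((1-0.5B)\\) cancels and the model reduces to \\(x_t = w_t\\), so both parameters are redundant.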