(Turn on equation numbering first.)

In [None]:
%%javascript
MathJax.Hub.Config({
  TeX: { equationNumbers: { autoNumber: "AMS" } }
});

This notebook covers some theoretical basics of time series and their analysis. _Stationary_ time series play an important role in time series analysis, so let's start with a definition: a time series $\{y_t\}$ is _covariance stationary_ (or _weakly stationary_) if its expectation value and (auto-)covariances are time independent:
\begin{align}
\mathrm{E}[y_t] &= \mu \label{eq:stationary1} \\
\mathrm{E}[(y_t - \mu)(y_{t-j} - \mu)] &= \gamma_j \label{eq:stationary2}
\end{align}
For the rest of this notebook, we will refer to _covariance stationary_ using the shorthand (but somewhat imprecise) _stationary_.

As an example consider a simple random walk model
\begin{equation}
y_t = y_{t-1} + u_t \label{eq:rw}
\end{equation}
where $u_t$ is Gaussian white noise ($u_t \sim \mathcal{N}(0,\sigma^2)$). For simplicity we define $y_0 = 0$ which simplifies \eqref{eq:rw} to 
\begin{equation}
y_t = \sum_{i=1}^t u_i. \label{eq:rw_simple}
\end{equation}
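As a quick sanity check, the following minimal sketch compares the explicit recursion $y_t = y_{t-1} + u_t$ (with $y_0 = 0$) against the cumulative sum $y_t = \sum_{i=1}^t u_i$. The names `rng`, `t_max`, `y_rec` and `y_sum` as well as the seed are chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
t_max = 100
u = rng.normal(0.0, 1.0, size=t_max)  # u[i - 1] plays the role of u_i

# explicit recursion y_t = y_{t-1} + u_t, starting from y_0 = 0
y_rec = np.zeros(t_max + 1)
for t in range(1, t_max + 1):
    y_rec[t] = y_rec[t - 1] + u[t - 1]

# closed form y_t = sum_{i=1}^t u_i for t = 1, ..., t_max
y_sum = np.cumsum(u)

# both constructions should agree up to floating point rounding
print(np.allclose(y_rec[1:], y_sum))
```

This is why `np.cumsum` is enough to generate a random walk from a vector of white noise.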

So let's start by generating an example time series for 100 time steps and plotting it.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from math import sqrt
%matplotlib inline

In [None]:
timesteps = 100
# set mean and variance of the noise
mean = 0
var = 1
# generate the Gaussian white noise
noise = np.random.normal(mean,sqrt(var),size=timesteps)
# construct and plot the time series
y = np.cumsum(noise)
plt.plot(y)
plt.xlabel('$t$')
plt.ylabel('$y_t$');

The question now is whether time series generated by such a model are _stationary_. To answer this question, we need to check the conditions \eqref{eq:stationary1} and \eqref{eq:stationary2} outlined above. This can be done by generating an ensemble of time series for this model and evaluating the mean and the (auto-)covariances as a function of the timestep $t$.

In [None]:
ensemble_size = 1000
# generate an ensemble of the Gaussian white noise
noise = np.random.normal(mean,sqrt(var),size=(timesteps,ensemble_size))
# summing along the first axis yields the different time series
y = np.cumsum(noise,axis=0)
# make an example plot for the first 5 time series of the ensemble
plt.plot(y[:,:5])
plt.xlabel('$t$')
plt.ylabel('$y_t$');

Now that we have an ensemble of time series generated by this model, we can test the conditions for stationarity.

In [None]:
# get the mean and variance as a function of time
# this time we average over the second axis, which corresponds to the different time series in the ensemble
means = np.mean(y,axis=1)
# calculate variance from ensemble (correct for bias)
# (named `variances` to avoid shadowing the built-in `vars`)
variances = np.var(y,axis=1,ddof=1)
_,(ax1,ax2) = plt.subplots(nrows = 2, sharex=True,figsize=(8,8))
ax1.plot(means)
ax1.set_xlabel('$t$')
# raw strings keep the LaTeX backslashes from being read as escape sequences
ax1.set_ylabel(r'$\mathrm{E}[y_t]$')
ax2.plot(variances)
ax2.set_xlabel('$t$')
ax2.set_ylabel(r'$\sigma^2(y_t)$');

As one can see, the ensemble means of $y_t$ stay close to zero while the variance $\sigma^2 (y_t)$ increases linearly with $t$. This follows directly from equation \eqref{eq:rw_simple} above: the $u_i$ are independent, so their variances add, $$\sigma^2 (y_t) = \sum_{i=1}^t \sigma^2 (u_i) = t \sigma^2.$$
Therefore, random walk models do **not** produce stationary time series.
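The second stationarity condition concerns the (auto-)covariances, not only the variance. For the random walk with $y_0 = 0$, $y_t$ and $y_{t-j}$ share the first $t-j$ noise terms, so $\mathrm{E}[y_t y_{t-j}] = (t-j)\sigma^2$, which also depends on $t$. The sketch below estimates this from an ensemble. The helper `ensemble_autocov`, the seed and the ensemble size are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, only for reproducibility
n_steps, n_series, sigma2 = 100, 20000, 1.0
u = rng.normal(0.0, np.sqrt(sigma2), size=(n_steps, n_series))
walks = np.cumsum(u, axis=0)  # row k holds y_{k+1} for every ensemble member

def ensemble_autocov(series, t, j):
    """Estimate Cov(y_t, y_{t-j}) across the ensemble (t is 1-based)."""
    a = series[t - 1]
    b = series[t - 1 - j]
    return np.mean((a - a.mean()) * (b - b.mean()))

lag = 10
estimates = {}
for t in (20, 50, 100):
    estimates[t] = ensemble_autocov(walks, t, lag)
    # compare the ensemble estimate with the expected (t - lag) * sigma^2
    print(t, estimates[t], (t - lag) * sigma2)
```

The estimates should grow with $t$ at fixed lag, which is a second way of seeing that the random walk violates \eqref{eq:stationary2}.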

_Moving average processes of order $q$_ form another class of time series and are called MA(q). They are defined by

\begin{equation}
y_t = \mu + u_t + \sum_{i=1}^q \theta_i u_{t - i} = \mu + \theta^\intercal u \label{eq:maq}
\end{equation}

with $\theta^\intercal = \left(\theta_0 = 1, \theta_1 \dots \theta_q \right)$ and $u = \left(u_t, u_{t-1} \dots u_{t-q} \right)^\intercal$ where $\mu$ and $\theta_i$ are constants while $\{u_t\}$ is white noise ($\sim iid(0,\sigma^2)$). From \eqref{eq:maq} one can easily derive the expectation value and (auto-)covariances for $y_t$:

\begin{align}
\mathrm{E}[y_t] &= \mu \\
\gamma_0 &= \sigma^2 \sum_{i=0}^q \theta_i^2 \\
\gamma_j &= \begin{cases} \sigma^2 \sum_{i=0}^{q-j} \theta_i \theta_{i+j} & \text{if } 1 \le j \le q \\ 0 & \text{if } j > q\end{cases}
\end{align}

Since these are all time-independent, MA(q) models generate stationary time series.
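To make the autocovariances concrete, here is a small sketch that evaluates the standard MA(q) result $\gamma_j = \sigma^2 \sum_{i=0}^{q-j} \theta_i \theta_{i+j}$ for $0 \le j \le q$ (and $\gamma_j = 0$ for $j > q$). The function name `ma_autocovariance` and the coefficients in `theta_example` are hypothetical and used for illustration only.

```python
import numpy as np

def ma_autocovariance(theta, sigma2, j):
    """Theoretical lag-j autocovariance of an MA(q) process.

    theta must be (theta_0 = 1, theta_1, ..., theta_q).
    """
    theta = np.asarray(theta, dtype=float)
    q = len(theta) - 1
    j = abs(j)  # the autocovariance is symmetric in the lag
    if j > q:
        return 0.0  # no shared noise terms beyond lag q
    # overlap sum: theta_0*theta_j + theta_1*theta_{j+1} + ... + theta_{q-j}*theta_q
    return sigma2 * np.sum(theta[: q + 1 - j] * theta[j:])

theta_example = [1.0, 0.5, 0.25]  # hypothetical MA(2) coefficients
gammas = [ma_autocovariance(theta_example, 1.0, j) for j in range(4)]
print(gammas)
```

For these coefficients and $\sigma^2 = 1$ the formula gives $\gamma_0 = 1.3125$, $\gamma_1 = 0.625$, $\gamma_2 = 0.25$ and $\gamma_3 = 0$; none of these values depends on $t$.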

In [None]:
# set coefficients (up to 15) and add theta_0 = 1 at the beginning
thetas = np.random.rand(15)
thetas = np.insert(thetas,0,1)

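def ma_q(mean,variance,theta,q=3):
    # theta = (theta_0 = 1, theta_1, ..., theta_q) must hold q + 1 coefficients
    theta = np.asarray(theta)
    if theta.shape[0] != q + 1:
        raise ValueError('theta must contain q + 1 coefficients')
    y = np.zeros((timesteps,ensemble_size))
    # noise buffer ordered as u = (u_t, u_{t-1}, ..., u_{t-q}), newest first;
    # pre-sample values are drawn at random so that the first q steps are not damped
    ut = np.random.normal(0,sqrt(variance),size=(q + 1,ensemble_size))
    for t in np.arange(timesteps):
        # shift the buffer by one step and draw the new u_t
        ut = np.roll(ut,1,axis=0)
        ut[0,:] = np.random.normal(0,sqrt(variance),size=ensemble_size)
        y[t,:] = mean + np.dot(theta,ut)

    return y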
    
y = ma_q(0.7,1,thetas[:4])
# make an example plot for the first 5 time series of the ensemble
plt.plot(y[:,:5])
plt.xlabel('$t$')
plt.ylabel('$y_t$');

In [None]:
# get the mean and variance as a function of time
means = np.mean(y,axis=1)
# calculate variance from ensemble (correct for bias)
# (named `variances` to avoid shadowing the built-in `vars`)
variances = np.var(y,axis=1,ddof=1)
_,(ax1,ax2) = plt.subplots(nrows = 2, sharex=True,figsize=(8,8))
ax1.plot(means)
ax1.set_xlabel('$t$')
# raw strings keep the LaTeX backslashes from being read as escape sequences
ax1.set_ylabel(r'$\mathrm{E}[y_t]$')
ax2.plot(variances)
ax2.set_xlabel('$t$')
ax2.set_ylabel(r'$\sigma^2(y_t)$');

In [None]:
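# theoretical variance gamma_0 = sigma^2 * sum_i theta_i^2 (sigma^2 = 1 here),
# to be compared with the ensemble variance plotted above
print(np.sum(thetas[:4]**2))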
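The lagged covariances can be checked against the ensemble in the same way as the variance. The sketch below is self-contained and does not reuse `ma_q`: it builds an MA(3) ensemble from shifted slices of one noise array and compares the estimated $\mathrm{Cov}(y_t, y_{t-j})$ at a fixed $t$ with $\sigma^2 \sum_{i=0}^{q-j} \theta_i \theta_{i+j}$. The coefficients, seed, ensemble size and the chosen time index are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed, only for reproducibility
theta = np.array([1.0, 0.6, 0.3, 0.1])  # hypothetical (theta_0 = 1, ..., theta_3)
q, sigma2, mu = len(theta) - 1, 1.0, 0.7
n_steps, n_series = 50, 20000

# draw q extra noise rows so that every y_t has a full history of q past terms
u = rng.normal(0.0, np.sqrt(sigma2), size=(n_steps + q, n_series))
# y_t = mu + sum_i theta_i * u_{t-i}; row q + k of u is the current noise of y[k]
y = mu + sum(theta[i] * u[q - i : q - i + n_steps] for i in range(q + 1))

t = 30  # any index with t - j >= 0 works
results = {}
for j in range(q + 3):
    a, b = y[t], y[t - j]
    est = np.mean((a - a.mean()) * (b - b.mean()))
    theory = sigma2 * np.sum(theta[: q + 1 - j] * theta[j:]) if j <= q else 0.0
    results[j] = (est, theory)
    print(j, round(est, 3), round(theory, 3))
```

Up to sampling noise, the estimates should match the theoretical values for $j \le q$ and vanish for $j > q$, which is the characteristic cutoff of an MA(q) process.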