# Linear State Space Models


<a id='index-0'></a>

> “We may regard the present state of the universe as the effect of its past and the cause of its future” – Marquis de Laplace


## Overview

This lecture introduces the **linear state space** dynamic system.

The linear state space system is a generalization of the scalar AR(1) process [we studied before](https://python.quantecon.org/ar1_processes.html).

This model is a workhorse that carries a powerful theory of prediction.

Its many applications include:

- representing dynamics of higher-order linear systems  
- predicting the position of a system $ j $ steps into the future  
- predicting a geometric sum of future values of a variable like  
  - non-financial income  
  - dividends on a stock  
  - the money supply  
  - a government deficit or surplus, etc.  

Let’s start with some imports:

In [None]:
import random

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
from quantecon import LinearStateSpace

%matplotlib inline

plt.rcParams["figure.figsize"] = (11, 5)  # set default figure size

## The Linear State Space Model


<a id='index-1'></a>
The objects in play are:

- An $ n \times 1 $ vector $ x_t $ denoting the **state** at time $ t = 0, 1, 2, \ldots $.  
- An IID sequence of $ m \times 1 $ random vectors $ w_t \sim N(0,I) $.  
- A $ k \times 1 $ vector $ y_t $ of **observations** at time $ t = 0, 1, 2, \ldots $.  
- An $ n \times n $ matrix $ A $  called the **transition matrix**.  
- An $ n \times m $ matrix $ C $  called the **volatility matrix**.  
- A $ k \times n $ matrix $ G $ sometimes called the **output matrix**.  


Here is the linear state-space system


<a id='equation-st-space-rep'></a>
$$
\begin{aligned}
    x_{t+1} & =  A x_t + C w_{t+1}   \\
    y_t &  =  G x_t \nonumber \\
    x_0 & \sim N(\mu_0, \Sigma_0) \nonumber
\end{aligned} \tag{1}
$$


<a id='lss-pgs'></a>

### Primitives

The primitives of the model are

1. the matrices $ A, C, G $  
1. the shock distribution, which we have specialized to $ N(0,I) $  
1. the distribution of the initial condition $ x_0 $, which we have set to $ N(\mu_0, \Sigma_0) $  


Given $ A, C, G $ and draws of $ x_0 $ and $ w_1, w_2, \ldots $, the
model [(14.1)](#equation-st-space-rep) pins down the values of the sequences $ \{x_t\} $ and $ \{y_t\} $.

Even without these draws, the primitives 1–3 pin down the *probability distributions* of $ \{x_t\} $ and $ \{y_t\} $.

Later we’ll see how to compute these distributions and their moments.
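As a minimal sketch of how draws of $ x_0 $ and the shocks pin down the paths in [(14.1)](#equation-st-space-rep), the function below iterates the system directly with NumPy. The name `simulate_lss`, its arguments, the seeded generator and the example matrices are illustrative assumptions; this is not the `quantecon` implementation used elsewhere in the lecture.

```python
import numpy as np

def simulate_lss(A, C, G, x0, ts_length, seed=0):
    """Iterate x_{t+1} = A x_t + C w_{t+1}, y_t = G x_t (illustrative helper).

    A, C and G must be passed as 2-D arrays or nested lists.
    """
    A, C, G = map(np.atleast_2d, (A, C, G))
    n, m = C.shape
    rng = np.random.default_rng(seed)
    x = np.empty((n, ts_length))
    x[:, 0] = x0
    for t in range(ts_length - 1):
        w = rng.standard_normal(m)           # w_{t+1} ~ N(0, I)
        x[:, t + 1] = A @ x[:, t] + C @ w
    y = G @ x                                # y_t = G x_t for every t
    return x, y

# A two-state example with one shock and one observation (arbitrary values)
A_ex = [[0.9, 0.1],
        [0.0, 0.5]]
C_ex = [[0.2],
        [0.1]]
G_ex = [[1.0, 0.0]]
x_path, y_path = simulate_lss(A_ex, C_ex, G_ex, x0=[1.0, 1.0], ts_length=10)
```

Fixing the seed fixes the draws of $ w_1, w_2, \ldots $, so calling the function twice with the same arguments returns the same paths.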

### Examples

By appropriate choice of the primitives, a variety of dynamics can be represented in terms of the linear state space model.

The following examples help to highlight this point.

They also illustrate the wise dictum *finding the state is an art*.


<a id='lss-sode'></a>

#### Second-order Difference Equation

Let $ \{y_t\} $ be a deterministic sequence that satisfies


<a id='equation-st-ex-1'></a>
$$
y_{t+1} =  \phi_0 + \phi_1 y_t + \phi_2 y_{t-1}
\quad \text{s.t.} \quad
y_0, y_{-1} \text{ given} \tag{2}
$$

To map [(14.2)](#equation-st-ex-1) into our state space system [(14.1)](#equation-st-space-rep), we set

$$
x_t=
\begin{bmatrix}
    1 \\
    y_t \\
    y_{t-1}
\end{bmatrix}
\qquad
A = \begin{bmatrix}
          1 & 0 & 0 \\
          \phi_0 & \phi_1 & \phi_2  \\
          0 & 1 & 0
    \end{bmatrix}
\qquad
C= \begin{bmatrix}
    0 \\
    0 \\
    0
    \end{bmatrix}
\qquad
G = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}
$$

You can confirm that under these definitions, [(14.1)](#equation-st-space-rep) and [(14.2)](#equation-st-ex-1) agree.
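One way to carry out this confirmation numerically is sketched below: the scalar recursion is iterated directly and compared with the state space iteration. The variable names (`A_sode`, `G_sode`, `y_direct`, `y_state`) and the horizon of 20 periods are illustrative choices.

```python
import numpy as np

phi_0, phi_1, phi_2 = 1.1, 0.8, -0.8
A_sode = np.array([[1.0,   0.0,   0.0],
                   [phi_0, phi_1, phi_2],
                   [0.0,   1.0,   0.0]])
G_sode = np.array([0.0, 1.0, 0.0])
T = 20

# Direct iteration of the scalar recursion, with y_0 = y_{-1} = 1
y_prev, y_curr = 1.0, 1.0
y_direct = [y_curr]
for _ in range(T):
    y_prev, y_curr = y_curr, phi_0 + phi_1 * y_curr + phi_2 * y_prev
    y_direct.append(y_curr)

# State space iteration starting from x_0 = (1, y_0, y_{-1})
x = np.array([1.0, 1.0, 1.0])
y_state = [G_sode @ x]
for _ in range(T):
    x = A_sode @ x               # C = 0, so there is no shock term
    y_state.append(G_sode @ x)

paths_agree = np.allclose(y_direct, y_state)
```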

The next figure shows the dynamics of this process when $ \phi_0 = 1.1, \phi_1=0.8, \phi_2 = -0.8, y_0 = y_{-1} = 1 $.


<a id='lss-sode-fig'></a>

In [None]:
def plot_lss(A,
             C,
             G,
             n=3,
             ts_length=50):

    ar = LinearStateSpace(A, C, G, mu_0=np.ones(n))
    x, y = ar.simulate(ts_length)

    fig, ax = plt.subplots()
    y = y.flatten()
    ax.plot(y, 'b-', lw=2, alpha=0.7)
    ax.grid()
    ax.set_xlabel('time', fontsize=12)
    ax.set_ylabel('$y_t$', fontsize=12)
    plt.show()

In [None]:
ϕ_0, ϕ_1, ϕ_2 = 1.1, 0.8, -0.8

A = [[1,     0,     0  ],
     [ϕ_0,   ϕ_1,   ϕ_2],
     [0,     1,     0  ]]
C = np.zeros((3, 1))
G = [0, 1, 0]

plot_lss(A, C, G)

#### Univariate Autoregressive Processes


<a id='index-3'></a>
We can use [(14.1)](#equation-st-space-rep) to represent the model


<a id='equation-eq-ar-rep'></a>
$$
y_{t+1} = \phi_1 y_{t} + \phi_2 y_{t-1} + \phi_3 y_{t-2} + \phi_4  y_{t-3} + \sigma w_{t+1} \tag{3}
$$

where $ \{w_t\} $ is IID and standard normal.

To put this in the linear state space format we take $ x_t = \begin{bmatrix} y_t & y_{t-1} &  y_{t-2} &  y_{t-3} \end{bmatrix}' $ and

$$
A =
\begin{bmatrix}
    \phi_1 & \phi_2 & \phi_3 & \phi_4 \\
    1 & 0 & 0 & 0 \\
    0 & 1 & 0 & 0 \\
    0 & 0 & 1 & 0
\end{bmatrix}
\qquad
C = \begin{bmatrix}
        \sigma \\
        0 \\
        0 \\
        0
    \end{bmatrix}
\qquad
 G = \begin{bmatrix}
         1 & 0  & 0 & 0
     \end{bmatrix}
$$

The matrix $ A $ has the form of the *companion matrix* to the vector
$ \begin{bmatrix}\phi_1 &  \phi_2 & \phi_3 & \phi_4 \end{bmatrix} $.
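The sketch below builds such a companion matrix for an arbitrary coefficient vector and checks, using a common set of shocks, that the state space iteration reproduces the AR(4) recursion in [(14.3)](#equation-eq-ar-rep). The helper name `companion`, the seed and the sample length are assumptions made for illustration.

```python
import numpy as np

def companion(phi):
    """Companion matrix for coefficients phi = (phi_1, ..., phi_p)."""
    p = len(phi)
    A = np.zeros((p, p))
    A[0, :] = phi                  # first row holds the AR coefficients
    A[1:, :-1] = np.eye(p - 1)     # subdiagonal of ones shifts the lags down
    return A

phi = [0.5, -0.2, 0.0, 0.5]
sigma = 0.2
A_ar = companion(phi)
C_ar = np.array([sigma, 0.0, 0.0, 0.0])

rng = np.random.default_rng(1)
w = rng.standard_normal(50)        # shocks w_1, ..., w_50

# State space iteration from x_0 = (1, 1, 1, 1)
x = np.ones(4)
y_state = [x[0]]
for t in range(50):
    x = A_ar @ x + C_ar * w[t]
    y_state.append(x[0])

# Direct iteration of the AR(4) recursion with the same shocks
lags = [1.0, 1.0, 1.0, 1.0]        # y_t, y_{t-1}, y_{t-2}, y_{t-3}
y_direct = [lags[0]]
for t in range(50):
    y_next = sum(c * v for c, v in zip(phi, lags)) + sigma * w[t]
    lags = [y_next] + lags[:3]
    y_direct.append(y_next)

paths_agree = np.allclose(y_state, y_direct)
```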

The next figure shows the dynamics of this process when

$$
\phi_1 = 0.5, \phi_2 = -0.2, \phi_3 = 0, \phi_4 = 0.5, \sigma = 0.2, y_0 = y_{-1} = y_{-2} =
y_{-3} = 1
$$


<a id='lss-uap-fig'></a>

In [None]:
ϕ_1, ϕ_2, ϕ_3, ϕ_4 = 0.5, -0.2, 0, 0.5
σ = 0.2

A_1 = [[ϕ_1,   ϕ_2,   ϕ_3,   ϕ_4],
       [1,     0,     0,     0  ],
       [0,     1,     0,     0  ],
       [0,     0,     1,     0  ]]

C_1 = [[σ],
       [0],
       [0],
       [0]]

G_1 = [1, 0, 0, 0]

plot_lss(A_1, C_1, G_1, n=4, ts_length=200)

#### Seasonals


<a id='index-5'></a>
We can use [(14.1)](#equation-st-space-rep) to represent

1. the *deterministic seasonal* $ y_t = y_{t-4} $  
1. the *indeterministic seasonal* $ y_t = \phi_4 y_{t-4} + w_t $  


In fact, both are special cases of [(14.3)](#equation-eq-ar-rep).

With the deterministic seasonal, the transition matrix becomes

$$
A = \begin{bmatrix}
        0 & 0 & 0 & 1 \\
        1 & 0 & 0 & 0 \\
        0 & 1 & 0 & 0 \\
        0 & 0 & 1 & 0
    \end{bmatrix}
$$

It is easy to check that $ A^4 = I $, which implies that $ x_t $ is strictly periodic with period 4:<sup><a href=#foot1 id=foot1-link>[1]</a></sup>

$$
x_{t+4} = x_t
$$

Such an $ x_t $ process can be used to model deterministic seasonals in quarterly time series.
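The periodicity claim for the deterministic seasonal can be checked in a few lines. The starting values in `x0` below are arbitrary placeholders for four quarterly observations.

```python
import numpy as np

A_seas = np.array([[0, 0, 0, 1],
                   [1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 1, 0]])

# A^4 should equal the identity matrix
A4 = np.linalg.matrix_power(A_seas, 4)
is_identity = np.array_equal(A4, np.eye(4, dtype=int))

# Starting from arbitrary quarterly values, the state repeats every 4 periods
x0 = np.array([3, 1, 4, 2])        # (y_0, y_{-1}, y_{-2}, y_{-3})
path = [x0]
for _ in range(8):
    path.append(A_seas @ path[-1])
y_vals = [int(x[0]) for x in path]  # [3, 2, 4, 1, 3, 2, 4, 1, 3]
```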

The *indeterministic* seasonal produces recurrent, but aperiodic, seasonal fluctuations.

#### Time Trends


<a id='index-6'></a>
The model $ y_t = a t + b $ is known as a *linear time trend*.

We can represent this model in the linear state space form by taking


<a id='equation-lss-ltt'></a>
$$
A
= \begin{bmatrix}
    1 & 1  \\
    0 & 1
  \end{bmatrix}
\qquad
C
= \begin{bmatrix}
        0 \\
        0
  \end{bmatrix}
\qquad
G
= \begin{bmatrix}
        a & b
  \end{bmatrix} \tag{4}
$$

and starting at initial condition $ x_0 = \begin{bmatrix} 0 & 1\end{bmatrix}' $.

In fact, it’s possible to use the state-space system to represent polynomial trends of any order.

For instance, we can represent the model $ y_t = a t^2 + bt + c $ in the linear state space form by taking

$$
A
= \begin{bmatrix}
    1 & 1 & 0 \\
    0 & 1 & 1 \\
    0 & 0 & 1
  \end{bmatrix}
\qquad
C
= \begin{bmatrix}
        0 \\
        0 \\
        0
  \end{bmatrix}
\qquad
G
= \begin{bmatrix}
        2a & a + b & c
  \end{bmatrix}
$$

and starting at initial condition $ x_0 = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}' $.

It follows that

$$
A^t =
\begin{bmatrix}
 1 & t & t(t-1)/2 \\
 0 & 1 & t \\
 0 & 0 & 1
\end{bmatrix}
$$

Then $ x_t^\prime = \begin{bmatrix} t(t-1)/2 &t & 1 \end{bmatrix} $. You can now confirm that $ y_t = G x_t $ has the correct form.
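The following sketch performs this confirmation for the quadratic trend, checking both the formula for $ A^t $ and the implied values of $ y_t $. The coefficient values and the variable names are arbitrary illustrative choices.

```python
import numpy as np

a, b, c = 0.5, -1.0, 2.0            # arbitrary illustrative coefficients
A_tr = np.array([[1, 1, 0],
                 [0, 1, 1],
                 [0, 0, 1]])
G_tr = np.array([2 * a, a + b, c])
x_0 = np.array([0, 0, 1])

powers_match = True
y_from_state, y_direct = [], []
for t in range(8):
    A_t = np.linalg.matrix_power(A_tr, t)
    closed_form = np.array([[1, t, t * (t - 1) // 2],
                            [0, 1, t],
                            [0, 0, 1]])
    powers_match &= np.array_equal(A_t, closed_form)
    y_from_state.append(G_tr @ A_t @ x_0)      # y_t = G A^t x_0
    y_direct.append(a * t**2 + b * t + c)      # y_t = a t^2 + b t + c

trend_matches = np.allclose(y_from_state, y_direct)
```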

## Distributions and Moments


<a id='index-9'></a>

### Unconditional Moments

Using [(14.1)](#equation-st-space-rep), it’s easy to obtain expressions for the
(unconditional) means of $ x_t $ and $ y_t $.

We’ll explain what *unconditional* and *conditional* mean soon.

Letting $ \mu_t := \mathbb{E} [x_t] $ and using linearity of expectations, we
find that


<a id='equation-lss-mut-linear-models'></a>
$$
\mu_{t+1} = A \mu_t
\quad \text{with} \quad \mu_0 \text{ given} \tag{6}
$$

Here $ \mu_0 $ is a primitive given in [(14.1)](#equation-st-space-rep).

The variance-covariance matrix of $ x_t $ is $ \Sigma_t := \mathbb{E} [ (x_t - \mu_t) (x_t - \mu_t)'] $.

Using $ x_{t+1} - \mu_{t+1} = A (x_t - \mu_t) + C w_{t+1} $, we can
determine this matrix recursively via


<a id='equation-eqsigmalaw-linear-models'></a>
$$
\Sigma_{t+1}  = A \Sigma_t A' + C C'
\quad \text{with} \quad \Sigma_0 \text{ given} \tag{7}
$$

As with $ \mu_0 $, the matrix $ \Sigma_0 $ is a primitive given in [(14.1)](#equation-st-space-rep).

As a matter of terminology, we will sometimes call

- $ \mu_t $ the *unconditional mean*  of $ x_t $  
- $ \Sigma_t $ the *unconditional variance-covariance matrix*  of $ x_t $  


This is to distinguish $ \mu_t $ and $ \Sigma_t $ from related objects that use conditioning
information, to be defined below.

However, you should be aware that these “unconditional” moments do depend on
the initial distribution $ N(\mu_0, \Sigma_0) $.
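The two recursions [(14.6)](#equation-lss-mut-linear-models) and [(14.7)](#equation-eqsigmalaw-linear-models) translate almost line for line into code. The helper `moment_path` below is a hypothetical name for a minimal NumPy sketch, and the scalar example is chosen only because its moments are easy to verify by hand.

```python
import numpy as np

def moment_path(A, C, mu_0, Sigma_0, T):
    """Return the lists [mu_0, ..., mu_T] and [Sigma_0, ..., Sigma_T]."""
    A, C = np.asarray(A, dtype=float), np.asarray(C, dtype=float)
    mu = np.asarray(mu_0, dtype=float)
    Sigma = np.asarray(Sigma_0, dtype=float)
    mus, Sigmas = [mu], [Sigma]
    for _ in range(T):
        mu = A @ mu                           # mean recursion
        Sigma = A @ Sigma @ A.T + C @ C.T     # covariance recursion
        mus.append(mu)
        Sigmas.append(Sigma)
    return mus, Sigmas

# Scalar check: x_{t+1} = 0.5 x_t + w_{t+1}, with x_0 = 1 known for certain
mus, Sigmas = moment_path([[0.5]], [[1.0]], [1.0], [[0.0]], T=3)
# means:     1, 0.5, 0.25, 0.125
# variances: 0, 1, 1.25, 1.3125
```

Changing `mu_0` or `Sigma_0` changes every element of both sequences, which illustrates the dependence on the initial distribution noted above.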

#### Moments of the Observations

Using linearity of expectations again we have


<a id='equation-lss-umy'></a>
$$
\mathbb{E} [y_t] = \mathbb{E} [G x_t] = G \mu_t \tag{8}
$$

The variance-covariance matrix of $ y_t $ is easily shown to be


<a id='equation-lss-uvy'></a>
$$
\textrm{Var} [y_t] = \textrm{Var} [G x_t] = G \Sigma_t G' \tag{9}
$$

### Distributions


<a id='index-10'></a>
In general, knowing the mean and variance-covariance matrix of a random vector
is not quite as good as knowing the full distribution.

However, there are some situations where these moments alone tell us all we
need to know.

These are situations in which the mean vector and covariance matrix are **sufficient statistics** for the population distribution.

(Sufficient statistics form a list of objects that characterize a population distribution.)

One such situation is when the vector in question is Gaussian (i.e., normally
distributed).

This is the case here, given

1. our Gaussian assumptions on the primitives  
1. the fact that normality is preserved under linear operations  


In fact, it’s [well-known](https://en.wikipedia.org/wiki/Multivariate_normal_distribution#Affine_transformation) that


<a id='equation-lss-glig'></a>
$$
u \sim N(\bar u, S)
\quad \text{and} \quad
v = a + B u
\implies
v \sim N(a + B \bar u, B S B') \tag{10}
$$
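The mean and covariance in [(14.10)](#equation-lss-glig) can be checked by simulation. In the sketch below, the values of `u_bar`, `S`, `a` and `B`, the seed and the number of draws are all arbitrary illustrative assumptions. The check covers only the first two moments, not normality itself.

```python
import numpy as np

rng = np.random.default_rng(0)
u_bar = np.array([1.0, -1.0])
S = np.array([[1.0, 0.3],
              [0.3, 0.5]])
a = np.array([0.5, 0.0, 2.0])
B = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [-1.0, 0.5]])

u = rng.multivariate_normal(u_bar, S, size=200_000)   # each row is a draw of u
v = a + u @ B.T                                        # v = a + B u, row by row

# Largest absolute gaps between sample moments and the formula
mean_err = np.max(np.abs(v.mean(axis=0) - (a + B @ u_bar)))
cov_err = np.max(np.abs(np.cov(v, rowvar=False) - B @ S @ B.T))
```

Both errors should shrink toward zero as the number of draws grows.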

In particular, given our Gaussian assumptions on the primitives and the
linearity of [(14.1)](#equation-st-space-rep) we can see immediately that both $ x_t $ and
$ y_t $ are Gaussian for all $ t \geq 0 $ <sup><a href=#fn-ag id=fn-ag-link>[2]</a></sup>.

Since $ x_t $ is Gaussian, to find the distribution, all we need to do is
find its mean and variance-covariance matrix.

But in fact we’ve already done this, in [(14.6)](#equation-lss-mut-linear-models) and [(14.7)](#equation-eqsigmalaw-linear-models).

Letting $ \mu_t $ and $ \Sigma_t $ be as defined by these equations,
we have


<a id='equation-lss-mgs-x'></a>
$$
x_t \sim N(\mu_t, \Sigma_t) \tag{11}
$$

By similar reasoning combined with [(14.8)](#equation-lss-umy) and [(14.9)](#equation-lss-uvy),


<a id='equation-lss-mgs-y'></a>
$$
y_t \sim N(G \mu_t, G \Sigma_t G') \tag{12}
$$
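As a sketch of how [(14.12)](#equation-lss-mgs-y) can be evaluated in practice, the code below computes $ G \mu_T $ and $ G \Sigma_T G' $ for the autoregressive example [(14.3)](#equation-eq-ar-rep) and wraps them in a Gaussian density. The horizon $ T = 20 $, the choice $ \sigma = 0.1 $, the degenerate initial condition $ \Sigma_0 = 0 $ and the helper name `density_y` are assumptions made for illustration.

```python
import numpy as np

phi = [0.5, -0.2, 0.0, 0.5]
sigma = 0.1
A_d = np.zeros((4, 4))
A_d[0, :] = phi
A_d[1:, :-1] = np.eye(3)
C_d = np.array([[sigma], [0.0], [0.0], [0.0]])
G_d = np.array([[1.0, 0.0, 0.0, 0.0]])

# x_0 = (1, 1, 1, 1) with certainty, so mu_0 = 1 and Sigma_0 = 0
mu, Sigma = np.ones(4), np.zeros((4, 4))
T = 20
for _ in range(T):
    mu = A_d @ mu
    Sigma = A_d @ Sigma @ A_d.T + C_d @ C_d.T

mean_y = (G_d @ mu).item()               # G mu_T
var_y = (G_d @ Sigma @ G_d.T).item()     # G Sigma_T G'

def density_y(y):
    """Gaussian density of y_T with mean G mu_T and variance G Sigma_T G'."""
    return np.exp(-(y - mean_y)**2 / (2 * var_y)) / np.sqrt(2 * np.pi * var_y)
```

Since $ y_t $ is scalar here, the density needs only the two numbers `mean_y` and `var_y`.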

### Ensemble Interpretations

How should we interpret the distributions defined by [(14.11)](#equation-lss-mgs-x)–[(14.12)](#equation-lss-mgs-y)?

Intuitively, the probabilities in a distribution correspond to relative frequencies in a large population drawn from that distribution.

Let’s apply this idea to our setting, focusing on the distribution of $ y_T $ for fixed $ T $.

We can generate independent draws of $ y_T $ by repeatedly simulating the
evolution of the system up to time $ T $, using an independent set of
shocks each time.

The next figure shows 20 simulations, producing 20 time series for $ \{y_t\} $, and hence 20 draws of $ y_T $.

The system in question is the univariate autoregressive model [(14.3)](#equation-eq-ar-rep).

The values of $ y_T $ are represented by black dots in the left-hand figure.

In [None]:
def cross_section_plot(A,
                       C,
                       G,
                       T=20,                 # Set the time
                       ymin=-0.8,
                       ymax=1.25,
                       sample_size=20,       # 20 observations/simulations
                       n=4):                 # The number of dimensions for the initial x0

    ar = LinearStateSpace(A, C, G, mu_0=np.ones(n))

    fig, axes = plt.subplots(1, 2, figsize=(16, 5))

    for ax in axes:
        ax.grid(alpha=0.4)
        ax.set_ylim(ymin, ymax)

    ax = axes[0]
    ax.set_ylim(ymin, ymax)
    ax.set_ylabel('$y_t$', fontsize=12)
    ax.set_xlabel('time', fontsize=12)
    ax.vlines((T,), -1.5, 1.5)

    ax.set_xticks((T,))
    ax.set_xticklabels(('$T$',))

    sample = []
    for i in range(sample_size):
        rcolor = random.choice(('c', 'g', 'b', 'k'))
        x, y = ar.simulate(ts_length=T+15)
        y = y.flatten()
        ax.plot(y, color=rcolor, lw=1, alpha=0.5)
        ax.plot((T,), (y[T],), 'ko', alpha=0.5)
        sample.append(y[T])

    y = y.flatten()
    axes[1].set_ylim(ymin, ymax)
    axes[1].set_ylabel('$y_t$', fontsize=12)
    axes[1].set_xlabel('relative frequency', fontsize=12)
    axes[1].hist(sample, bins=16, density=True, orientation='horizontal', alpha=0.5)
    plt.show()

In [None]:
ϕ_1, ϕ_2, ϕ_3, ϕ_4 = 0.5, -0.2, 0, 0.5
σ = 0.1

A_2 = [[ϕ_1, ϕ_2, ϕ_3, ϕ_4],
       [1,     0,     0,     0],
       [0,     1,     0,     0],
       [0,     0,     1,     0]]

C_2 = [[σ], [0], [0], [0]]

G_2 = [1, 0, 0, 0]

cross_section_plot(A_2, C_2, G_2)

In the right-hand figure, these values are converted into a rotated histogram
that shows relative frequencies from our sample of 20 $ y_T $’s.

Here is another figure, this time with 100 observations

In [None]:
# Keep T at its default and raise the number of simulated paths to 100
cross_section_plot(A_2, C_2, G_2, sample_size=100)

Let’s now try with 500,000 observations, showing only the histogram (without rotation)

In [None]:
T = 100
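ymin = -0.8
ymax = 1.25
sample_size = 500_000

ar = LinearStateSpace(A_2, C_2, G_2, mu_0=np.ones(4))
fig, ax = plt.subplots()
# Simulate one long path and pool its observations for the histogram
x, y = ar.simulate(sample_size)
mu_x, mu_y, Sigma_x, Sigma_y = ar.stationary_distributions()
# .item() extracts the scalar; float() on a 1x1 array is deprecated in recent NumPy
f_y = norm(loc=mu_y.item(), scale=np.sqrt(Sigma_y).item())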
y = y.flatten()
ygrid = np.linspace(ymin, ymax, 150)

ax.hist(y, bins=50, density=True, alpha=0.4)
ax.plot(ygrid, f_y.pdf(ygrid), 'k-', lw=2, alpha=0.8, label=r'true density')
ax.set_xlim(ymin, ymax)
ax.set_xlabel('$y_t$', fontsize=12)
ax.set_ylabel('relative frequency', fontsize=12)
ax.legend(fontsize=12)
plt.show()

The black line is a Gaussian density of the form given in [(14.12)](#equation-lss-mgs-y), with the mean and variance of $ y_t $ taken from `ar.stationary_distributions()`.

The histogram and population distribution are close, as expected.

By looking at the figures and experimenting with parameters, you will gain a
feel for how the population distribution depends on the model primitives [listed above](#lss-pgs), as intermediated by
the distribution’s sufficient statistics.

#### Ensemble Means

In the preceding figure, we approximated the population distribution of $ y_T $ by

1. generating $ I $ sample paths (i.e., time series) where $ I $ is a large number  
1. recording each observation $ y^i_T $  
1. histogramming this sample  


Just as the histogram approximates the population distribution, the *ensemble* or
*cross-sectional average*

$$
\bar y_T := \frac{1}{I} \sum_{i=1}^I y_T^i
$$

approximates the expectation $ \mathbb{E} [y_T] = G \mu_T $ (as implied by the law of large numbers).

Here’s a simulation comparing the ensemble averages and population means at time points $ t=0,\ldots,49 $.

The parameters are the same as for the preceding figures,
and the sample size is relatively small ($ I=20 $).


<a id='lss-em-fig'></a>

In [None]:
I = 20
T = 50
ymin = -0.5
ymax = 1.15

ar = LinearStateSpace(A_2, C_2, G_2, mu_0=np.ones(4))

fig, ax = plt.subplots()

ensemble_mean = np.zeros(T)
for i in range(I):
    x, y = ar.simulate(ts_length=T)
    y = y.flatten()
    ax.plot(y, 'c-', lw=0.8, alpha=0.5)
    ensemble_mean = ensemble_mean + y

ensemble_mean = ensemble_mean / I
ax.plot(ensemble_mean, color='b', lw=2, alpha=0.8, label='$\\bar y_t$')
m = ar.moment_sequence()

population_means = []
for t in range(T):
    μ_x, μ_y, Σ_x, Σ_y = next(m)
    population_means.append(μ_y.item())  # .item() extracts the scalar from the 1x1 array

ax.plot(population_means, color='g', lw=2, alpha=0.8, label=r'$G\mu_t$')
ax.set_ylim(ymin, ymax)
ax.set_xlabel('time', fontsize=12)
ax.set_ylabel('$y_t$', fontsize=12)
ax.legend(ncol=2)
plt.show()

The ensemble mean for $ x_t $ is

$$
\bar x_T := \frac{1}{I} \sum_{i=1}^I x_T^i \to \mu_T
\qquad (I \to \infty)
$$

The limit $ \mu_T $ is a “long-run average”.

(By *long-run average* we mean the average for an infinite ($ I = \infty $) number of sample $ x_T $’s.)

Another application of the law of large numbers assures us that

$$
\frac{1}{I} \sum_{i=1}^I (x_T^i - \bar x_T) (x_T^i - \bar x_T)' \to \Sigma_T
\qquad (I \to \infty)
$$
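Both limits can be illustrated with a vectorized simulation that advances all $ I $ paths at once. The two-state system, the seed, the horizon and the value of $ I $ below are illustrative assumptions, and $ \Sigma_0 = 0 $ is imposed so that every path starts from the same $ x_0 $.

```python
import numpy as np

rng = np.random.default_rng(42)
A_e = np.array([[0.8, 0.1],
                [0.0, 0.5]])
C_e = np.array([[0.3, 0.0],
                [0.1, 0.2]])
mu_0 = np.array([1.0, 1.0])
T, I = 10, 100_000

# Population moments from the recursions, starting from Sigma_0 = 0
mu_T, Sigma_T = mu_0.copy(), np.zeros((2, 2))
for _ in range(T):
    mu_T = A_e @ mu_T
    Sigma_T = A_e @ Sigma_T @ A_e.T + C_e @ C_e.T

# Simulate I independent paths at once; row i of X holds x_t^i
X = np.tile(mu_0, (I, 1))
for _ in range(T):
    W = rng.standard_normal((I, 2))       # one fresh shock vector per path
    X = X @ A_e.T + W @ C_e.T

x_bar = X.mean(axis=0)                    # ensemble mean of x_T
dev = X - x_bar
Sigma_hat = dev.T @ dev / I               # ensemble covariance of x_T

mean_gap = np.max(np.abs(x_bar - mu_T))
cov_gap = np.max(np.abs(Sigma_hat - Sigma_T))
```

Rerunning with a larger `I` should make both gaps smaller, in line with the law of large numbers.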

### Joint Distributions

In the preceding discussion, we looked at the distributions of $ x_t $ and
$ y_t $ in isolation.

This gives us useful information but doesn’t allow us to answer questions like

- what’s the probability that $ x_t \geq 0 $ for all $ t $?  
- what’s the probability that the process $ \{y_t\} $ exceeds some value $ a $ before falling below $ b $?  
- etc., etc.  


Such questions concern the *joint distributions* of these sequences.
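One simple way to approximate probabilities of this kind, used here as an illustration rather than as the method developed in this lecture, is Monte Carlo over whole sample paths. The sketch below estimates the probability that a scalar process exceeds $ a $ before falling below $ b $ within a finite horizon. The function name, the scalar specification $ y_t = x_t $ and all parameter values are assumptions.

```python
import numpy as np

def prob_hits_a_before_b(rho, sigma, x0, a, b, T, num_paths, seed=0):
    """Monte Carlo estimate of P(path exceeds a before falling below b by time T)
    for the scalar system x_{t+1} = rho x_t + sigma w_{t+1}, y_t = x_t."""
    rng = np.random.default_rng(seed)
    x = np.full(num_paths, float(x0))
    active = np.ones(num_paths, dtype=bool)        # paths that hit neither level yet
    hit_a_first = np.zeros(num_paths, dtype=bool)
    for _ in range(T):
        x = rho * x + sigma * rng.standard_normal(num_paths)
        hit_a_first |= active & (x > a)            # record first exits through a
        active &= (x <= a) & (x >= b)              # retire paths that have exited
    return hit_a_first.mean()

# Symmetric barriers around a zero start, so the estimate should be near 1/2
p_sym = prob_hits_a_before_b(rho=0.9, sigma=0.5, x0=0.0, a=1.0, b=-1.0,
                             T=200, num_paths=50_000)
```

The event depends on the whole path up to the exit time, which is why the marginal distributions of the individual $ y_t $ are not enough to compute it.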

To compute the joint distribution of $ x_0, x_1, \ldots, x_T $, recall
that joint and conditional densities are linked by the rule

$$
p(x, y) = p(y \, | \, x) p(x)
\qquad \text{(joint }=\text{ conditional }\times\text{ marginal)}
$$

From this rule we get $ p(x_0, x_1) = p(x_1 \,|\, x_0) p(x_0) $.

The Markov property $ p(x_t \,|\, x_{t-1}, \ldots, x_0) =  p(x_t \,|\, x_{t-1}) $ and repeated applications of the preceding rule lead us to

$$
p(x_0, x_1, \ldots, x_T) =  p(x_0) \prod_{t=0}^{T-1} p(x_{t+1} \,|\, x_t)
$$

The marginal $ p(x_0) $ is just the primitive $ N(\mu_0, \Sigma_0) $.

In view of [(14.1)](#equation-st-space-rep), the conditional densities are

$$
p(x_{t+1} \,|\, x_t) = N(Ax_t, C C')
$$
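Taking logs turns the product above into a sum, which is how the joint density is usually evaluated numerically. The sketch below assumes that $ \Sigma_0 $ and $ CC' $ are nonsingular so that ordinary Gaussian densities exist. This fails for the companion-form examples earlier in the lecture, where $ C $ has a single nonzero row. The function names are hypothetical.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log density of N(mean, cov) at x, for a nonsingular covariance matrix."""
    d = x - mean
    k = len(x)
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + quad)

def joint_logpdf(path, A, C, mu_0, Sigma_0):
    """log p(x_0, ..., x_T) = log p(x_0) + sum over t of log p(x_{t+1} | x_t)."""
    CC = C @ C.T
    total = gaussian_logpdf(path[0], mu_0, Sigma_0)
    for x_t, x_next in zip(path[:-1], path[1:]):
        total += gaussian_logpdf(x_next, A @ x_t, CC)   # x_{t+1} | x_t ~ N(A x_t, CC')
    return total

# Scalar example: a path that stays at zero for three dates
A_j = np.array([[0.5]])
C_j = np.array([[1.0]])
path = [np.array([0.0]), np.array([0.0]), np.array([0.0])]
logp = joint_logpdf(path, A_j, C_j, mu_0=np.array([0.0]), Sigma_0=np.array([[1.0]]))
```

In this example each of the three factors is a standard normal density evaluated at zero, so `logp` equals $ -\tfrac{3}{2} \log(2\pi) $.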

## Noisy Observations

In some settings, the observation equation $ y_t = Gx_t $ is modified to
include an error term.

Often this error term represents the idea that the true state can only be
observed imperfectly.

To include an error term in the observation we introduce

- An IID sequence of $ \ell \times 1 $ random vectors $ v_t \sim N(0,I) $.  
- A $ k \times \ell $ matrix $ H $.  


and extend the linear state-space system to


<a id='equation-st-space-rep-noisy'></a>
$$
\begin{aligned}
    x_{t+1} & =  A x_t + C w_{t+1}   \\
    y_t &  =  G x_t + H v_t \nonumber \\
    x_0 & \sim N(\mu_0, \Sigma_0) \nonumber
\end{aligned} \tag{17}
$$

The sequence $ \{v_t\} $ is assumed to be independent of $ \{w_t\} $.

The process $ \{x_t\} $ is not modified by noise in the observation
equation and its moments, distributions and stability properties remain the same.

The unconditional moments of $ y_t $ from [(14.8)](#equation-lss-umy) and [(14.9)](#equation-lss-uvy)
now become


<a id='equation-lss-umy-2'></a>
$$
\mathbb{E} [y_t] = \mathbb{E} [G x_t + H v_t] = G \mu_t \tag{18}
$$

The variance-covariance matrix of $ y_t $ is easily shown to be


<a id='equation-lss-uvy-2'></a>
$$
\textrm{Var} [y_t] = \textrm{Var} [G x_t + H v_t] = G \Sigma_t G' + HH' \tag{19}
$$

The distribution of $ y_t $ is therefore

$$
y_t \sim N(G \mu_t, G \Sigma_t G' + HH')
$$
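A quick simulation can be used to check the mean and variance of the noisy observation. In the sketch below, the matrices `A_n`, `C_n`, `G_n` and `H_n`, the seed, the horizon and the number of paths are illustrative assumptions, and $ v_T $ is drawn independently of everything else.

```python
import numpy as np

rng = np.random.default_rng(7)
A_n = np.array([[0.8, 0.1],
                [0.0, 0.5]])
C_n = np.array([[0.3],
                [0.2]])
G_n = np.array([[1.0, 1.0]])
H_n = np.array([[0.4]])
mu_0 = np.array([1.0, 0.0])
T, I = 8, 200_000

# Population moments of x_T, starting from Sigma_0 = 0
mu_T, Sigma_T = mu_0.copy(), np.zeros((2, 2))
for _ in range(T):
    mu_T = A_n @ mu_T
    Sigma_T = A_n @ Sigma_T @ A_n.T + C_n @ C_n.T

mean_y = (G_n @ mu_T).item()                              # G mu_T
var_y = (G_n @ Sigma_T @ G_n.T + H_n @ H_n.T).item()      # G Sigma_T G' + HH'

# Simulate I independent copies of x_T, then add observation noise
X = np.tile(mu_0, (I, 1))
for _ in range(T):
    X = X @ A_n.T + rng.standard_normal((I, 1)) @ C_n.T
Y = X @ G_n.T + rng.standard_normal((I, 1)) @ H_n.T       # y_T = G x_T + H v_T

mean_gap = abs(Y.mean() - mean_y)
var_gap = abs(Y.var() - var_y)
```

Setting `H_n` to zero recovers the noiseless case, where the variance reduces to $ G \Sigma_T G' $.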