***

*Course:* [Math 535](https://people.math.wisc.edu/~roch/mmids/) - Mathematical Methods in Data Science (MMiDS)  
*Chapter:* 7-Probabilistic models   
*Author:* [Sebastien Roch](https://people.math.wisc.edu/~roch/), Department of Mathematics, University of Wisconsin-Madison  
*Updated:* Jan 8, 2024   
*Copyright:* &copy; 2024 Sebastien Roch

***

In [None]:
# IF RUNNING ON GOOGLE COLAB, UNCOMMENT THE FOLLOWING CODE CELL
# When prompted, upload: 
#     * mmids.py
# from your local file system
# Files at: https://github.com/MMiDS-textbook/MMiDS-textbook.github.io/tree/main/utils
# Alternative instructions: https://colab.research.google.com/notebooks/io.ipynb

In [None]:
# from google.colab import files
#
# uploaded = files.upload()
#
# for fn in uploaded.keys():
#     print('User uploaded file "{name}" with length {length} bytes'.format(
#       name=fn, length=len(uploaded[fn])))

In [None]:
# PYTHON 3
import numpy as np
from numpy import linalg as LA
from numpy.random import default_rng
rng = default_rng(535)
import matplotlib.pyplot as plt
import pandas as pd
import networkx as nx
import mmids

## Motivating example: location tracking

Suppose we let loose a cyborg corgi in a large park. We would like to know where it is at all times. For this purpose, it has an implanted location device that sends a signal to a tracking app.

Here is an example of the data we might have. The red dots are recorded locations at regular time intervals. The dotted line helps keep track of the time order of the recordings. (We will explain later how this dataset is generated.)

In [None]:
ss = 4
os = 2
F = np.array([[1., 0., 1., 0.],[0., 1., 0., 1.],[0., 0., 1., 0.],[0., 0., 0., 1.]]) 
H = np.array([[1., 0., 0., 0.],[0., 1, 0., 0.]])
Q = 0.1 * np.diag(np.ones(ss))
R = 10 * np.diag(np.ones(os))
x_0 = np.array([0., 0., 1., 1.])
T = 50
x, y = mmids.lgSamplePath(ss, os, F, H, Q, R, x_0, T)
plt.plot(y[0,:], y[1,:], marker='o', c='r', linestyle='dotted')
plt.xlim((np.min(y[0,:])-5, np.max(y[0,:])+5)) 
plt.ylim((np.min(y[1,:])-5, np.max(y[1,:])+5))
plt.show()

By convention, we start at $(0,0)$. Notice how squiggly the trajectory is. One issue might be that the times at which the location is recorded are too far apart. But, in fact, there is another issue: the tracking device is *inaccurate*.

To get a better estimate of the true trajectory, it is natural to try to model the noise in the measurement as well as the dynamics itself. Probabilistic models are perfectly suited for this.

In this chapter, we will encounter a variety of such models and show how to take advantage of them to estimate unknown states (or parameters). In particular, conditional independence will play a key role.

We will come back to location tracking later in the chapter.

$\newcommand{\P}{\mathbb{P}}$
$\newcommand{\E}{\mathbb{E}}$
$\newcommand{\S}{\mathcal{S}}$
$\newcommand{\X}{\mathcal{X}}$
$\newcommand{\var}{\mathrm{Var}}$
$\newcommand{\btheta}{\boldsymbol{\theta}}$
$\newcommand{\bbeta}{\boldsymbol{\beta}}$
$\newcommand{\bphi}{\boldsymbol{\phi}}$
$\newcommand{\bpi}{\boldsymbol{\pi}}$
$\newcommand{\bmu}{\boldsymbol{\mu}}$
$\newcommand{\bSigma}{\boldsymbol{\Sigma}}$
$\newcommand{\balpha}{\boldsymbol{\alpha}}$
$\newcommand{\indep}{\perp\!\!\!\perp}$  

## Background: introduction to parametric families, generalized linear models and maximum likelihood estimation

**NUMERICAL CORNER** In NumPy, the module [numpy.random](https://numpy.org/doc/stable/reference/random/index.html) provides a way to sample from a variety of standard distributions. We first initialize the [pseudorandom number generator](https://en.wikipedia.org/wiki/Pseudorandom_number_generator) with a [random seed](https://en.wikipedia.org/wiki/Random_seed). The seed makes the results reproducible: using the same seed produces the same results again.

In [None]:
seed = 535
rng = np.random.default_rng(seed)
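
As a quick illustration of reproducibility, here is a minimal sketch that creates two generators from the same seed and checks that they produce identical draws. The names `rng_a`, `rng_b`, `sample_a` and `sample_b` are introduced here for illustration only.

```python
import numpy as np

# Two independent generator objects initialized with the same seed
rng_a = np.random.default_rng(535)
rng_b = np.random.default_rng(535)

# Each generator produces the same sequence of pseudorandom numbers
sample_a = rng_a.normal(size=3)
sample_b = rng_b.normal(size=3)
print(np.array_equal(sample_a, sample_b))
```

Note that drawing again from the same generator object gives new values; it is re-creating the generator with the same seed that restarts the sequence.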

We then set the distribution and its parameters. Here is a list of the available [probability distributions](https://numpy.org/doc/stable/reference/random/generator.html#distributions).

In [None]:
p = 0.1 # probability of success
N = 5 # number of samples
rng.binomial(1,p,size=N) # Bernoulli is special case of binomial with 1 trial

Here are a few other examples.

In [None]:
p = [0.1, 0.2, 0.7]
n = 100
rng.multinomial(n,p,size=N)

In [None]:
mu = np.array([0.1, -0.3])
sig = np.array([[2., 0.],[0., 3.]])
rng.multivariate_normal(mu,sig,size=N)
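
To get a feel for what these samplers return, the following sketch checks two basic properties: each multinomial draw has counts summing to the number of trials, and with many draws the empirical mean and covariance of multivariate normal samples come close to the specified parameters. The names `rng_demo`, `counts` and `draws`, and the sample size of 100,000, are choices made here for illustration.

```python
import numpy as np

rng_demo = np.random.default_rng(535)

# Each row is one multinomial draw; its entries count outcomes over 100 trials
counts = rng_demo.multinomial(100, [0.1, 0.2, 0.7], size=5)
print(counts.sum(axis=1))  # every row sums to the number of trials

# Many multivariate normal draws: one sample per row
mu_demo = np.array([0.1, -0.3])
sig_demo = np.array([[2., 0.], [0., 3.]])
draws = rng_demo.multivariate_normal(mu_demo, sig_demo, size=100_000)

# Empirical mean and covariance should be close to mu_demo and sig_demo
print(draws.mean(axis=0))
print(np.cov(draws, rowvar=False))
```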

$\unlhd$

$\newcommand{\P}{\mathbb{P}}$
$\newcommand{\E}{\mathbb{E}}$
$\newcommand{\S}{\mathcal{S}}$
$\newcommand{\var}{\mathrm{Var}}$
$\newcommand{\bmu}{\boldsymbol{\mu}}$
$\newcommand{\bSigma}{\boldsymbol{\Sigma}}$
$\newcommand{\btheta}{\boldsymbol{\theta}}$
$\newcommand{\bpi}{\boldsymbol{\pi}}$
$\newcommand{\indep}{\perp\!\!\!\perp}$
$\newcommand{\bp}{\mathbf{p}}$
$\newcommand{\bx}{\mathbf{x}}$
$\newcommand{\bX}{\mathbf{X}}$
$\newcommand{\by}{\mathbf{y}}$
$\newcommand{\bY}{\mathbf{Y}}$
$\newcommand{\bz}{\mathbf{z}}$
$\newcommand{\bZ}{\mathbf{Z}}$
$\newcommand{\bw}{\mathbf{w}}$
$\newcommand{\bW}{\mathbf{W}}$
$\newcommand{\bv}{\mathbf{v}}$
$\newcommand{\bV}{\mathbf{V}}$

## Linear-Gaussian models and Kalman filtering

**Implementing the Kalman filter** We implement the Kalman filter as described above with known covariance matrices. We take $\Delta = 1$ for simplicity. The code is adapted from [[Mur](https://github.com/probml)].

We will test Kalman filtering on a simulated path drawn from the linear-Gaussian model above. The following function creates such a path and its noisy observations.

In [None]:
seed = 535
rng = np.random.default_rng(seed)

In [None]:
def lgSamplePath(ss, os, F, H, Q, R, x_0, T):
    x = np.zeros((ss,T)) # hidden states, one column per time step
    y = np.zeros((os,T)) # observations, one column per time step
    x[:,0] = x_0
    ey = rng.multivariate_normal(np.zeros(os),R) # noise on y_0
    y[:,0] = H @ x[:,0] + ey
    
    for t in range(1,T):
        ex = rng.multivariate_normal(np.zeros(ss),Q) # noise on x_t
        x[:,t] = F @ x[:,t-1] + ex
        ey = rng.multivariate_normal(np.zeros(os),R) # noise on y_t
        y[:,t] = H @ x[:,t] + ey
    
    return x, y
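
Read as equations, the function above appears to simulate the following recursion. The symbols $\boldsymbol{\epsilon}_t$ and $\boldsymbol{\eta}_t$ are notation introduced here for illustration; they correspond to `ex` and `ey` in the code.

$$
\mathbf{X}_0 = \mathbf{x}_0, 
\qquad 
\mathbf{X}_t = F \,\mathbf{X}_{t-1} + \boldsymbol{\epsilon}_t, 
\quad \boldsymbol{\epsilon}_t \sim N(\mathbf{0}, Q), 
\qquad t = 1,\ldots,T-1,
$$

$$
\mathbf{Y}_t = H \,\mathbf{X}_t + \boldsymbol{\eta}_t, 
\quad \boldsymbol{\eta}_t \sim N(\mathbf{0}, R), 
\qquad t = 0,\ldots,T-1,
$$

with all noise terms drawn independently of each other.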

Here is an example. Here $\bSigma$ is denoted as $V$. In the plot, the green crosses are the unobserved true path and the red dots are the noisy observations.

In [None]:
ss = 4 # state size
os = 2 # observation size
F = np.array([[1., 0., 1., 0.],[0., 1., 0., 1.],[0., 0., 1., 0.],[0., 0., 0., 1.]]) 
H = np.array([[1., 0., 0., 0.],[0., 1, 0., 0.]])
Q = 0.1 * np.diag(np.ones(ss))
R = 10 * np.diag(np.ones(os))
x_0 = np.array([0., 0., 1., 1.]) # initial state
T = 50
x, y = lgSamplePath(ss, os, F, H, Q, R, x_0, T)
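
To see what the matrix $F$ does, here is a small noise-free sketch. It assumes the natural reading of the state as two position coordinates followed by two velocity coordinates, an interpretation suggested by the structure of $F$ and $H$ rather than stated in the code. The names `F_demo`, `state` and `trajectory` are hypothetical.

```python
import numpy as np

F_demo = np.array([[1., 0., 1., 0.],
                   [0., 1., 0., 1.],
                   [0., 0., 1., 0.],
                   [0., 0., 0., 1.]])

# Start at the origin with unit velocity in both directions
state = np.array([0., 0., 1., 1.])
trajectory = [state]
for t in range(3):
    # Without noise, each step adds the velocity to the position
    # and leaves the velocity unchanged
    state = F_demo @ state
    trajectory.append(state)

print(np.array(trajectory))
```

Under this reading, the process noise with covariance $Q$ perturbs both position and velocity at each step, while $H$ extracts only the position for the observation.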

In [None]:
plt.plot(y[0,:], y[1,:], marker='o', c='r', linestyle='dotted')
plt.xlim((np.min(y[0,:])-5, np.max(y[0,:])+5)) 
plt.ylim((np.min(y[1,:])-5, np.max(y[1,:])+5))
plt.show()

In [None]:
plt.plot(x[0,:], x[1,:], marker='x', c='g', linestyle='dashed', alpha=0.5)
plt.xlim((np.min(x[0,:])-5, np.max(x[0,:])+5)) 
plt.ylim((np.min(x[1,:])-5, np.max(x[1,:])+5))
plt.scatter(y[0,:], y[1,:], c='r')
plt.show()

The following function implements the Kalman filter. Here $A$ is $F$ and $C$ is $H$. The full recursion is broken up into several steps.

In [None]:
def kalmanUpdate(ss, A, C, Q, R, y_t, mu_prev, Sig_prev):
    mu_pred = A @ mu_prev
    Sig_pred = A @ Sig_prev @ A.T + Q
    e_t = y_t - C @ mu_pred # error at time t
    S = C @ Sig_pred @ C.T + R
    Sinv = LA.inv(S)
    K = Sig_pred @ C.T @ Sinv # Kalman gain matrix
    mu_new = mu_pred + K @ e_t
    Sig_new = (np.diag(np.ones(ss)) - K @ C) @ Sig_pred
    return mu_new, Sig_new
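
As a sanity check on the update step, here is a minimal self-contained sketch of the same recursion applied to a one-dimensional example that can be verified by hand. The helper name `kalman_update_sketch` is hypothetical. It also swaps the explicit inverse for a linear solve with `LA.solve`, a substitution made here and not part of the function above.

```python
import numpy as np
from numpy import linalg as LA

def kalman_update_sketch(A, C, Q, R, y_t, mu_prev, Sig_prev):
    # Prediction step
    mu_pred = A @ mu_prev
    Sig_pred = A @ Sig_prev @ A.T + Q
    # Update step
    e_t = y_t - C @ mu_pred
    S = C @ Sig_pred @ C.T + R
    # K = Sig_pred C^T S^{-1}; since S and Sig_pred are symmetric,
    # K^T = S^{-1} C Sig_pred, which is obtained with a linear solve
    K = LA.solve(S, C @ Sig_pred).T
    mu_new = mu_pred + K @ e_t
    Sig_new = (np.eye(len(mu_prev)) - K @ C) @ Sig_pred
    return mu_new, Sig_new

# Scalar case: A = C = 1, Q = 0, R = 1, prior N(0, 1), observation y = 2
one = np.array([[1.]])
mu_new, Sig_new = kalman_update_sketch(
    one, one, 0. * one, one, np.array([2.]), np.array([0.]), one
)
print(mu_new, Sig_new)
```

In this scalar case the gain is $1/2$, so the posterior mean lands halfway between the prior mean $0$ and the observation $2$, and the variance is halved.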

In [None]:
def kalmanFilter(ss, os, y, A, C, Q, R, init_mu, init_Sig, T):
    mu = np.zeros((ss, T))
    Sig = np.zeros((ss, ss, T))
    mu[:,0] = init_mu
    Sig[:,:,0] = init_Sig

    for t in range(1,T):
        mu[:,t], Sig[:,:,t] = kalmanUpdate(ss, A, C, Q, R, y[:,t], mu[:,t-1], Sig[:,:,t-1])

    return mu, Sig

We apply this to the example above. The inferred unobserved states are shown as squares joined by a solid line.

In [None]:
init_mu = x_0
init_Sig = 1 * np.diag(np.ones(ss))
mu, Sig = kalmanFilter(ss, os, y, F, H, Q, R, init_mu, init_Sig, T)

In [None]:
plt.plot(x[0,:], x[1,:], marker='x', c='g', linestyle='dashed', alpha=0.5)
plt.xlim((np.min(x[0,:])-5, np.max(x[0,:])+5)) 
plt.ylim((np.min(x[1,:])-5, np.max(x[1,:])+5))
plt.scatter(y[0,:], y[1,:], c='r')
plt.plot(mu[0,:], mu[1,:], marker='s', linewidth=2)
plt.show()

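To quantify the improvement in the inference compared to the observations, we compute the mean squared error of the position (the first two coordinates of the state) in both cases.

In [None]:
dobs = x[0:2,:] - y[0:2,:] # position error of the raw observations
mse_obs = np.mean(np.sum(dobs**2, axis=0)) # mean squared distance over time
print(mse_obs)

In [None]:
dfilt = x[0:2,:] - mu[0:2,:] # position error of the filtered means
mse_filt = np.mean(np.sum(dfilt**2, axis=0)) # mean squared distance over time
print(mse_filt)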

We indeed observe a reduction.

**Missing data** We can also allow for the possibility that some observations are missing. Imagine for instance losing GPS signal while going through a tunnel. The recursions above are still valid, with the only modification that the $\bY_t$ and $H$ terms are dropped at those times $t$ where there is no observation. In NumPy, we can mark missing observations with [`NaN`](https://numpy.org/doc/stable/reference/constants.html#numpy.nan). (Alternatively, one can use the [numpy.ma](https://numpy.org/doc/stable/reference/maskedarray.generic.html) module.)

We sample a new path from the same model as above, this time with smaller process noise and a shorter time horizon, and mask the observations at times $t=10,\ldots,19$.

In [None]:
ss = 4
os = 2
F = np.array([[1., 0., 1., 0.],[0., 1., 0., 1.],[0., 0., 1., 0.],[0., 0., 0., 1.]]) 
H = np.array([[1., 0., 0., 0.],[0., 1, 0., 0.]])
Q = 0.01 * np.diag(np.ones(ss))
R = 10 * np.diag(np.ones(os))
x_0 = np.array([0., 0., 1., 1.])
T = 30
x, y = lgSamplePath(ss, os, F, H, Q, R, x_0, T)

In [None]:
for i in range(10,20):
    y[0,i] = np.nan
    y[1,i] = np.nan
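
The same masking can be written without a loop, and the masked-array alternative mentioned above can be built from the `NaN` entries. The sketch below uses a placeholder array `y_demo` in place of the simulated observations; the names `y_demo`, `y_masked` and `missing_times` are introduced here for illustration.

```python
import numpy as np

# Placeholder observations: 2 coordinates, 30 time steps
y_demo = np.arange(60, dtype=float).reshape(2, 30)

# Mark both coordinates as missing at times t = 10,...,19
y_demo[:, 10:20] = np.nan

# Recover the times at which at least one coordinate is missing
missing_times = np.flatnonzero(np.isnan(y_demo).any(axis=0))
print(missing_times)

# Masked-array view of the same data: NaN entries become masked
y_masked = np.ma.masked_invalid(y_demo)
print(y_masked.count())  # number of unmasked (observed) entries
```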

Here is the sample we are aiming to infer.

In [None]:
plt.plot(x[0,:], x[1,:], marker='x', c='g', linestyle='dashed', alpha=0.5)
plt.xlim((np.min(x[0,:])-5, np.max(x[0,:])+5)) 
plt.ylim((np.min(x[1,:])-5, np.max(x[1,:])+5))
plt.scatter(y[0,:], y[1,:], c='r')
plt.show()

We modify the recursion accordingly.

In [None]:
def kalmanUpdate(ss, A, C, Q, R, y_t, mu_prev, Sig_prev):
    mu_pred = A @ mu_prev
    Sig_pred = A @ Sig_prev @ A.T + Q
    if np.any(np.isnan(y_t)): # missing observation: skip the update step
        return mu_pred, Sig_pred
    else:
        e_t = y_t - C @ mu_pred # error at time t
        S = C @ Sig_pred @ C.T + R
        Sinv = LA.inv(S)
        K = Sig_pred @ C.T @ Sinv # Kalman gain matrix
        mu_new = mu_pred + K @ e_t
        Sig_new = (np.diag(np.ones(ss)) - K @ C) @ Sig_pred
        return mu_new, Sig_new

In [None]:
init_mu = x_0
init_Sig = 1 * np.diag(np.ones(ss))
mu, Sig = kalmanFilter(ss, os, y, F, H, Q, R, init_mu, init_Sig, T)

In [None]:
plt.plot(x[0,:], x[1,:], marker='x', c='g', alpha=0.5)
plt.xlim((np.min(x[0,:])-5, np.max(x[0,:])+5)) 
plt.ylim((np.min(x[1,:])-5, np.max(x[1,:])+5))
plt.scatter(y[0,:], y[1,:], c='r', alpha=0.5)
plt.plot(mu[0,:], mu[1,:], marker='s', linewidth=2)
plt.show()
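
One consequence of skipping the update step is that the uncertainty of the estimate grows while observations are missing, since only the prediction step runs. The following sketch illustrates this on a scalar random walk (taking $A = C = 1$), a simplification chosen here so that the variance recursion is easy to follow. The function name `scalar_filter_variance` and its parameters are hypothetical.

```python
import numpy as np

def scalar_filter_variance(q, r, sig0, observed):
    """Track the filtering variance of a scalar random walk (A = C = 1)."""
    sig = sig0
    history = [sig]
    for has_obs in observed[1:]:
        sig_pred = sig + q                  # prediction step always runs
        if has_obs:
            k = sig_pred / (sig_pred + r)   # scalar Kalman gain
            sig = (1.0 - k) * sig_pred      # update step shrinks the variance
        else:
            sig = sig_pred                  # no observation: keep the prediction
        history.append(sig)
    return np.array(history)

# Observations are missing at times t = 10,...,19
observed = np.ones(30, dtype=bool)
observed[10:20] = False
var = scalar_filter_variance(q=0.01, r=10.0, sig0=1.0, observed=observed)
print(var[9], var[19], var[20])
```

During the gap the variance increases by `q` at every step, and it starts shrinking again once observations resume.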