# CS5340 Lecture 8: HMMs #

Lecturer: Harold Soh (harold@comp.nus.edu.sg)

Graduate TAs: Abdul Fatir Ansari and Chen Kaiqi (AY19/20)

This notebook is a supplement to Lecture 8 of CS5340: Uncertainty Modeling in AI.

The material uses the hmmlearn package and is based on the hmmlearn tutorial (https://hmmlearn.readthedocs.io/en/latest/tutorial.html).

For installation instructions, see https://github.com/hmmlearn/hmmlearn
Typically, the following is enough:

```pip install --upgrade --user hmmlearn```


In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

from hmmlearn import hmm
from scipy.optimize import linear_sum_assignment
from sklearn.metrics.pairwise import euclidean_distances

In [None]:
# for printing
np.set_printoptions(formatter={'float': lambda x: "{0:0.2f}".format(x)})

## Creating our HMM ##

Let us first create a Hidden Markov Model, which we call HMM_A, whose parameters we know exactly.


In [None]:
# the start probabilities (pi)
startprob = np.array([0.6, 0.3, 0.1, 0.0])

# The transition matrix (A)
# row i holds the probabilities of moving from component i to each component; every row sums to 1
transmat =  np.array([[0.7, 0.3, 0.0, 0.0],
                      [0.4, 0.1, 0.3, 0.2],
                      [0.1, 0.1, 0.7, 0.1],
                      [0.4, 0.0, 0.1, 0.5]])

# Next come the emission distributions (\phi): one 2-D Gaussian per component
# The means of each component
means = np.array([[0.0,  5.0],
                  [5.0, 5.0],
                  [0.0, 0.0],
                  [5.0, 0.0]])

# The covariance of each component
var_param = 1.0 # you can play with this parameter to increase/decrease the spread of the observations
covars = var_param * np.tile(np.identity(2), (4, 1, 1))

# Build our HMM with the parameters above
HMM_A = hmm.GaussianHMM(n_components=4, covariance_type="full")

# Instead of fitting the model to data, we directly set the known parameters:
# the start probabilities, the transition matrix, and the means and covariances of the components
HMM_A.startprob_ = startprob
HMM_A.transmat_ = transmat
HMM_A.means_ = means
HMM_A.covars_ = covars
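
Before using hand-set parameters, it can be worth checking that they describe a valid HMM: pi must be a distribution over the four components, every row of A must sum to 1, and each covariance must be symmetric positive definite. The sketch below re-declares the same arrays (with `var_param = 1.0`) so it runs on its own; `check_hmm_params` is a hypothetical helper written for illustration, not an hmmlearn function.

```python
import numpy as np

# Same hand-set parameters as HMM_A (var_param = 1.0)
startprob = np.array([0.6, 0.3, 0.1, 0.0])
transmat = np.array([[0.7, 0.3, 0.0, 0.0],
                     [0.4, 0.1, 0.3, 0.2],
                     [0.1, 0.1, 0.7, 0.1],
                     [0.4, 0.0, 0.1, 0.5]])
covars = 1.0 * np.tile(np.identity(2), (4, 1, 1))


def check_hmm_params(startprob, transmat, covars):
    """Hypothetical helper: True if (pi, A, covariances) define a valid Gaussian HMM."""
    K = len(startprob)
    # pi: non-negative and sums to 1
    ok_pi = np.all(startprob >= 0) and np.isclose(startprob.sum(), 1.0)
    # A: K x K, non-negative, every row sums to 1
    ok_A = (transmat.shape == (K, K) and np.all(transmat >= 0)
            and np.allclose(transmat.sum(axis=1), 1.0))
    # one symmetric positive-definite covariance matrix per component
    ok_cov = covars.shape[0] == K and all(
        np.allclose(c, c.T) and np.all(np.linalg.eigvalsh(c) > 0) for c in covars)
    return bool(ok_pi and ok_A and ok_cov)


print(covars.shape)                                   # (4, 2, 2)
print(check_hmm_params(startprob, transmat, covars))  # True
```

Changing a single entry of A, for example setting `transmat[0, 0] = 0.9`, makes the first row sum to 1.2 and the check fail.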

## Sample from our HMM ##

We can then sample trajectories (sequences of observations and hidden states) from HMM_A.

In [None]:
# Generate one sequence of 20 time steps (X: observations, Z: hidden components)
X, Z = HMM_A.sample(20)

# Plot the sampled data
plt.plot(X[:, 0], X[:, 1], ".-", label="observations", ms=6,
         mfc="orange", alpha=0.7)

# Label each component mean with the room it represents
rooms = ["bedroom", "toilet", "living room", "kitchen"]
for i, m in enumerate(means):
    plt.text(m[0], m[1], '%s' % rooms[i],
             size=17, horizontalalignment='center',
             bbox=dict(alpha=.7, facecolor='w'))
plt.legend(loc='best')
plt.show()
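
Conceptually, sampling from an HMM is ancestral sampling: draw the first component from pi, each later component from the row of A indexed by the previous component, and each observation from the Gaussian of the current component. Below is a minimal NumPy sketch of that process, assuming the same parameters as HMM_A; `sample_hmm` is a hypothetical name, and hmmlearn's own `sample` may differ in details such as random-number handling.

```python
import numpy as np

# Same parameters as HMM_A
startprob = np.array([0.6, 0.3, 0.1, 0.0])
transmat = np.array([[0.7, 0.3, 0.0, 0.0],
                     [0.4, 0.1, 0.3, 0.2],
                     [0.1, 0.1, 0.7, 0.1],
                     [0.4, 0.0, 0.1, 0.5]])
means = np.array([[0.0, 5.0], [5.0, 5.0], [0.0, 0.0], [5.0, 0.0]])
covars = np.tile(np.identity(2), (4, 1, 1))


def sample_hmm(startprob, transmat, means, covars, n_steps, seed=0):
    """Hypothetical helper: draw one (X, Z) trajectory by ancestral sampling."""
    rng = np.random.default_rng(seed)
    K, D = means.shape
    Z = np.empty(n_steps, dtype=int)
    X = np.empty((n_steps, D))
    for t in range(n_steps):
        # z_1 ~ pi, then z_t ~ A[z_{t-1}, :]
        probs = startprob if t == 0 else transmat[Z[t - 1]]
        Z[t] = rng.choice(K, p=probs)
        # x_t ~ N(mu_{z_t}, Sigma_{z_t})
        X[t] = rng.multivariate_normal(means[Z[t]], covars[Z[t]])
    return X, Z


X_demo, Z_demo = sample_hmm(startprob, transmat, means, covars, n_steps=20)
print(X_demo.shape)  # (20, 2)
print(Z_demo)
```

Because the kitchen (component 3) has zero start probability and `transmat[0, 2] = transmat[0, 3] = 0`, a sampled path never starts in the kitchen and never jumps directly from the bedroom to the living room or kitchen.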

## Learn a new HMM from data ##

Here, we learn a new HMM (HMM_B) from data sampled from the known model HMM_A.

In [None]:
# Generate M independent sequences of length N and stack them into one array,
# as hmmlearn expects; L records the length of each sequence
M = 100  # number of sequences
N = 10   # length of each sequence
samples = [HMM_A.sample(N) for _ in range(M)]
X = np.concatenate([x for x, _ in samples])
Z = np.concatenate([z for _, z in samples])
L = np.array([len(x) for x, _ in samples])
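
hmmlearn takes several sequences as one stacked array of shape (total length, n_features) together with a lengths array whose entries sum to `len(X)`; the boundaries between sequences follow from the cumulative lengths. The toy below uses made-up data with three sequences to show that bookkeeping; `X_stacked` and `lengths` are illustrative names standing in for `X` and `L`.

```python
import numpy as np

# Made-up stacked data: 3 sequences of lengths 4, 2 and 3, each step 2-D
lengths = [4, 2, 3]
X_stacked = np.arange(2 * sum(lengths), dtype=float).reshape(-1, 2)

# Sequence boundaries are the cumulative lengths, excluding the final total
boundaries = np.cumsum(lengths)[:-1]          # array([4, 6])
sequences = np.split(X_stacked, boundaries)

print([s.shape for s in sequences])           # [(4, 2), (2, 2), (3, 2)]
```

Without the lengths, the last step of one sequence and the first step of the next would be treated as consecutive time steps of a single chain.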

In [None]:
HMM_B = hmm.GaussianHMM(n_components=4, covariance_type="full", n_iter=100, verbose=True)
HMM_B.fit(X, L)  # L: length of each of the M sequences stacked in X
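
`fit` runs the Baum-Welch (EM) algorithm, and the quantity each iteration increases is the log-likelihood $\log p(X)$, computed with the forward algorithm. The sketch below implements the forward recursion $\alpha_t(j) = b_j(x_t)\sum_i \alpha_{t-1}(i) A_{ij}$ in log space for one sequence, and checks it against brute-force summation over all $K^T$ state paths on a short made-up sequence. `forward_loglik` and `brute_force_loglik` are hypothetical helpers, and scipy's `logsumexp` and `multivariate_normal` are assumed to be available.

```python
import numpy as np
from itertools import product
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

# Same parameters as HMM_A
startprob = np.array([0.6, 0.3, 0.1, 0.0])
transmat = np.array([[0.7, 0.3, 0.0, 0.0],
                     [0.4, 0.1, 0.3, 0.2],
                     [0.1, 0.1, 0.7, 0.1],
                     [0.4, 0.0, 0.1, 0.5]])
means = np.array([[0.0, 5.0], [5.0, 5.0], [0.0, 0.0], [5.0, 0.0]])
covars = np.tile(np.identity(2), (4, 1, 1))


def forward_loglik(X, startprob, transmat, means, covars):
    """log p(x_1, ..., x_T) for ONE sequence, via the forward algorithm in log space."""
    K = len(startprob)
    # log_b[t, k] = log N(x_t; mu_k, Sigma_k)
    log_b = np.column_stack([
        np.atleast_1d(multivariate_normal.logpdf(X, means[k], covars[k]))
        for k in range(K)])
    with np.errstate(divide="ignore"):  # zero probabilities become -inf on purpose
        log_pi, log_A = np.log(startprob), np.log(transmat)
    log_alpha = log_pi + log_b[0]
    for t in range(1, len(X)):
        # log alpha_t(j) = log b_j(x_t) + logsumexp_i [log alpha_{t-1}(i) + log A_ij]
        log_alpha = log_b[t] + logsumexp(log_alpha[:, None] + log_A, axis=0)
    return logsumexp(log_alpha)


def brute_force_loglik(X, startprob, transmat, means, covars):
    """Sum p(z, x) over all K^T state paths z (only feasible for tiny T)."""
    K, T = len(startprob), len(X)
    total = 0.0
    for path in product(range(K), repeat=T):
        p = startprob[path[0]]
        for t in range(1, T):
            p *= transmat[path[t - 1], path[t]]
        for t in range(T):
            p *= multivariate_normal.pdf(X[t], means[path[t]], covars[path[t]])
        total += p
    return np.log(total)


rng = np.random.default_rng(0)
X_short = 2.5 + 3.0 * rng.standard_normal((3, 2))   # made-up 3-step sequence
fwd = forward_loglik(X_short, startprob, transmat, means, covars)
brute = brute_force_loglik(X_short, startprob, transmat, means, covars)
print(fwd, brute)  # the two values should agree
```

The forward recursion costs $O(TK^2)$ instead of $O(K^T)$. With the fitted model, `HMM_B.score(X, L)` should return the log-likelihood summed over the M sequences, which corresponds to summing `forward_loglik` over each sequence using HMM_B's parameters.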

### After Learning ###
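Let's check whether the model has recovered the true parameters.

*Note*: EM recovers the components only up to a relabelling, so learnt component 0 need not correspond to true component 0. To compare parameters, we first pair each true component with a learnt one using the Hungarian algorithm (`linear_sum_assignment`), which finds the one-to-one matching that minimizes the total Euclidean distance between paired means.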
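
The matching can be tried on a toy case where the answer is known. Here the "learnt" means are made up: the true means shuffled by a known permutation and shifted by 0.1. scipy's `cdist` stands in for sklearn's `euclidean_distances`; both produce the same Euclidean distance matrix.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

true_means = np.array([[0.0, 5.0], [5.0, 5.0], [0.0, 0.0], [5.0, 0.0]])
# Made-up "learnt" means: true means shuffled by a known permutation, shifted by 0.1
perm = np.array([2, 0, 3, 1])          # learnt component j came from true component perm[j]
learnt_means = true_means[perm] + 0.1

# cost[i, j] = Euclidean distance between true mean i and learnt mean j
cost_toy = cdist(true_means, learnt_means)
toy_rows, toy_cols = linear_sum_assignment(cost_toy)

print(toy_cols)                        # [1 3 0 2]: learnt index matched to each true component
print(learnt_means[toy_cols] - true_means)  # each row is (approximately) the 0.1 shift
```

The recovered `toy_cols` is the inverse of `perm`, as expected: true component i came out at learnt position `toy_cols[i]`.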

In [None]:
print("Component Means")
print("Learnt")
print(HMM_B.means_)
print("True")
print(HMM_A.means_)

In [None]:
# Match true components (rows) to learnt components (columns) with the Hungarian
# algorithm, minimizing the total Euclidean distance between paired means
cost = euclidean_distances(HMM_A.means_, HMM_B.means_)
row_ind, col_ind = linear_sum_assignment(cost)
# col_ind[i] is the index of the learnt component matched to true component i

def remapMeans(A, ind):
    """Reorder the rows of A so that row i is A[ind[i]]."""
    return np.asarray(A)[ind]

def remapMat(A, ind):
    """Permute rows and columns of a square matrix: B[i, j] = A[ind[i], ind[j]]."""
    return np.asarray(A)[np.ix_(ind, ind)]
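
Applying the same permutation to rows and columns, as `remapMat` does, is conjugation by a permutation matrix: $B = P A P^\top$, where row i of $P$ is the unit vector $e_{\text{ind}[i]}$. Relabelling states this way keeps every row a probability distribution. The sketch below checks this equivalence on the true transition matrix with an example matching `ind` chosen for illustration.

```python
import numpy as np

A = np.array([[0.7, 0.3, 0.0, 0.0],
              [0.4, 0.1, 0.3, 0.2],
              [0.1, 0.1, 0.7, 0.1],
              [0.4, 0.0, 0.1, 0.5]])
ind = np.array([1, 3, 0, 2])          # example matching, like col_ind

# Reindex rows and columns with the same permutation: B[i, j] = A[ind[i], ind[j]]
B_index = A[np.ix_(ind, ind)]

# Same result with a permutation matrix P whose row i is the unit vector e_{ind[i]}
P = np.eye(len(ind))[ind]
B_matrix = P @ A @ P.T

print(np.allclose(B_index, B_matrix))  # True
print(B_index.sum(axis=1))             # each row still sums to 1
```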


In [None]:
means_remap = remapMeans(HMM_B.means_, col_ind)
print("Learnt Means")
print(means_remap)
print("True Means")
print(HMM_A.means_)

plt.scatter(means_remap[:,0], means_remap[:,1])
plt.scatter(HMM_A.means_[:,0], HMM_A.means_[:,1], marker='+' )
plt.legend(["Learnt", "True"])


In [None]:
print("Transition Probabilities")
print("Learnt A")
trans_remap = remapMat(HMM_B.transmat_, col_ind)
print(trans_remap)
print("True A")
print(HMM_A.transmat_)

plt.subplot(121)
plt.imshow(trans_remap, vmin=0.0, vmax=1.0)
plt.title("Learnt Transitions")
plt.colorbar()
plt.subplot(122)
plt.imshow(HMM_A.transmat_, vmin=0.0, vmax=1.0)
plt.title("True Transitions")
plt.colorbar()

In [None]:
# Predict the latent components using the learnt model; passing the lengths L
# decodes each of the M sequences separately instead of as one long chain
Zpred = HMM_B.predict(X, L)
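
By default hmmlearn decodes with the Viterbi algorithm, which finds the single most probable state path for each sequence. The sketch below is a log-space Viterbi for one sequence, checked against exhaustive search over all $4^4$ paths on a short made-up sequence; `viterbi` and `path_logprob` are hypothetical helpers, not hmmlearn functions.

```python
import numpy as np
from itertools import product
from scipy.stats import multivariate_normal

# Same parameters as HMM_A
startprob = np.array([0.6, 0.3, 0.1, 0.0])
transmat = np.array([[0.7, 0.3, 0.0, 0.0],
                     [0.4, 0.1, 0.3, 0.2],
                     [0.1, 0.1, 0.7, 0.1],
                     [0.4, 0.0, 0.1, 0.5]])
means = np.array([[0.0, 5.0], [5.0, 5.0], [0.0, 0.0], [5.0, 0.0]])
covars = np.tile(np.identity(2), (4, 1, 1))
params = (startprob, transmat, means, covars)


def viterbi(X, startprob, transmat, means, covars):
    """Most probable state path argmax_z p(z, x) for ONE sequence (log-space Viterbi)."""
    K, T = len(startprob), len(X)
    log_b = np.column_stack([
        np.atleast_1d(multivariate_normal.logpdf(X, means[k], covars[k]))
        for k in range(K)])
    with np.errstate(divide="ignore"):  # zero probabilities become -inf on purpose
        log_pi, log_A = np.log(startprob), np.log(transmat)
    delta = log_pi + log_b[0]            # best log-score of any path ending in each state
    back = np.zeros((T, K), dtype=int)   # back[t, j]: best predecessor of state j at time t
    for t in range(1, T):
        scores = delta[:, None] + log_A  # scores[i, j]: in state i at t-1, move to j
        back[t] = np.argmax(scores, axis=0)
        delta = scores.max(axis=0) + log_b[t]
    # Trace the back-pointers from the best final state
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta)
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def path_logprob(path, X, startprob, transmat, means, covars):
    """log p(z, x) for a given state path z (used only to check viterbi)."""
    with np.errstate(divide="ignore"):
        lp = np.log(startprob[path[0]])
        for t in range(1, len(path)):
            lp += np.log(transmat[path[t - 1], path[t]])
    lp += sum(multivariate_normal.logpdf(X[t], means[s], covars[s])
              for t, s in enumerate(path))
    return lp


rng = np.random.default_rng(1)
X_short = 2.5 + 3.0 * rng.standard_normal((4, 2))   # made-up 4-step sequence
vpath = viterbi(X_short, *params)
# Exhaustive search over all 4^4 = 256 paths, feasible only for tiny T
best = max(product(range(4), repeat=4), key=lambda z: path_logprob(z, X_short, *params))
print(vpath, best)
```

Viterbi runs on one sequence at a time, which is why the sequence lengths matter when decoding the stacked data.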

In [None]:
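# Zpred uses HMM_B's numbering; map it to HMM_A's (col_ind[i]: learnt match of true i)
Zpred_true = np.argsort(col_ind)[Zpred]
print(Zpred_true)
print(Z)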
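
The matching `col_ind` maps true components to learnt ones, so comparing decoded labels with `Z` requires the inverse map; for a permutation, `np.argsort` gives that inverse. The toy below uses made-up labels to show the relabelling and a simple accuracy score (fraction of time steps whose decoded component matches the true one).

```python
import numpy as np

# Made-up example: toy_match[i] is the learnt state matched to true state i
toy_match = np.array([1, 3, 0, 2])
Z_toy = np.array([0, 0, 1, 2, 2, 3, 0])        # true states
Zpred_toy = np.array([1, 1, 3, 0, 2, 2, 1])    # decoded states, learnt numbering

# Invert the matching: learnt_to_true[toy_match[i]] = i
learnt_to_true = np.argsort(toy_match)         # [2 0 3 1]
Zpred_toy_true = learnt_to_true[Zpred_toy]     # [0 0 1 2 3 3 0]

accuracy = np.mean(Zpred_toy_true == Z_toy)    # 6 of 7 labels agree
print(Zpred_toy_true, accuracy)
```

In the notebook, `col_ind`, `Z` and `HMM_B.predict(X, L)` take the place of the toy arrays.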