# CO495 ASML Coursework 2 - Hidden Markov Models

In this coursework, you are asked to implement filtering, smoothing, and optionally Viterbi decoding for discrete and continuous valued HMMs. Input data and initialization are provided and should be used for reproducibility.

In [None]:
import numpy as np
from collections import namedtuple
from scipy.stats import norm
import math

The functions below are here to guide you in your implementation of the EM and Viterbi algorithms for Hidden Markov Models. We follow Section 17.4 of _Machine Learning: A Probabilistic Perspective_ by Kevin Murphy (2012).

You should write vectorized modular code to promote re-usability, efficiency, and readability.

Your task is to complete the implementation and to report the results obtained from the provided initialization. You are strongly encouraged to explore different initialization schemes for the algorithms.

## Helper functions and classes

You may use this function in your implementation.

In [None]:
def normalize(A, dim=None, precision=1e-9):
    """This function is adapted from Kevin Murphy's code for Machine Learning: a Probabilistic Perspective.

    Make the entries of a (multidimensional) array sum to 1
    A, z = normalize(A) normalize the whole array, where z is the normalizing constant
    A, z = normalize(A, dim)
    If dim is specified, we normalize the specified dimension only.
    dim=0 means each column sums to one
    dim=1 means each row sums to one


    Set any zeros to one before dividing.
    This is valid, since s=0 iff all A(i)=0, so
    we will get 0/1=0

    Adapted from https://github.com/probml/pmtk3"""
    
    if dim is not None and dim > 1:
        raise ValueError("Normalize doesn't support more than two dimensions.")
    
    z = A.sum(dim)
    # If z is a scalar, z.shape is an empty tuple and evaluates to False
    if z.shape:
        z[np.abs(z) < precision] = 1
    elif np.abs(z) < precision:
        return 0, 1
    
    if dim == 1:
        return np.transpose(A.T / z), z
    else:
        return A / z, z

The initial values are provided as namedtuples; for example, initialization.A is the initial value for A.

In [None]:
InitGaussian = namedtuple('InitGaussian', ['A', 'Means', 'Variances', 'pi'])
InitMultinomial = namedtuple('InitMultinomial', ['A', 'B', 'pi'])
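
Since exploring other initialization schemes is encouraged, here is a minimal sketch of one possible random initialization for the Gaussian model. The helper name `random_gaussian_init` and its `seed` parameter are hypothetical and not part of the coursework template; it assumes `Y` is an array of real-valued observations.

```python
import numpy as np
from collections import namedtuple

InitGaussian = namedtuple('InitGaussian', ['A', 'Means', 'Variances', 'pi'])

def random_gaussian_init(Y, Nhidden, seed=0):
    """Hypothetical helper: draw a random starting point for EM."""
    rng = np.random.RandomState(seed)
    # Random transition matrix, rescaled so that each row sums to 1
    A = rng.rand(Nhidden, Nhidden)
    A /= A.sum(axis=1, keepdims=True)
    # Uniform initial state distribution
    pi = np.full(Nhidden, 1.0 / Nhidden)
    # Means picked among the observed values, variances set to the overall variance
    means = rng.choice(np.ravel(Y), size=Nhidden, replace=False)
    variances = np.full(Nhidden, np.var(Y))
    return InitGaussian(A, means, variances, pi)

# Usage on made-up data
init_random = random_gaussian_init(np.arange(10.0).reshape(2, 5), 2, seed=1)
```

Running EM from several such starting points and keeping the run with the highest log-likelihood is one common way to reduce sensitivity to initialization.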

## Filtering and Smoothing

Break down your implementation according to the functions below. Feel free to create additional ones whenever you see fit, but the general flow of the algorithm should remain apparent.

### Observation model

The core of EM estimation for HMMs operates on vectors of probabilities, so the main differences between EM for a Gaussian HMM and for a multinomial HMM are how the observation probabilities are computed and which parameters are estimated.

Complete the two functions below to compute the probabilities of the data for a given observation model and use them in the rest of the algorithm. Your filtering and smoothing steps should be model agnostic.
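
As a small illustration of why the rest of the algorithm can stay model agnostic, the sketch below builds the same kind of `Nhidden x T` local evidence array for both observation models. All numbers are made up, and the discrete symbols are assumed to be 0-based here (subtract 1 first if the data use 1-based symbols).

```python
import numpy as np
from scipy.stats import norm

# Toy Gaussian parameters (illustrative values, not the provided initialization)
means = np.array([0.0, 3.0])
variances = np.array([1.0, 4.0])
y = np.array([0.1, 2.9, 3.5])

# Gaussian local evidence: b[k, t] = N(y_t | mean_k, variance_k).
# norm.pdf takes a standard deviation as its scale, hence the square root.
b_gauss = norm.pdf(y[None, :], loc=means[:, None], scale=np.sqrt(variances)[:, None])

# Discrete local evidence: b[k, t] = B[k, y_t], with 0-based symbols (assumption)
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
y_discrete = np.array([0, 2, 1])
b_disc = B[:, y_discrete]

print(b_gauss.shape, b_disc.shape)
```

Both arrays have one row per hidden state and one column per time step, so filtering and smoothing only ever need to see `b`.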

In [None]:
def computeSmallB_Gaussian(Y, Means, Variances, Nhidden, T):
    """Compute the probabilities for the data points Y for a Gaussian observation model 
        with parameters Means and Variances.
        
        Input parameters:
            - Y: the data (one sequence of length T)
            - Means: vector of the current estimates of the means
            - Variances: vector of the current estimates of the variances
            - Nhidden: number of hidden states
            - T: length of the sequence
        Output:
            - b: Nhidden x T array of observation densities, b[k, t] = N(Y[t] | Means[k], Variances[k])
    """
    Y = np.asarray(Y, dtype=float).reshape(1, T)
    means = np.asarray(Means, dtype=float).reshape(Nhidden, 1)
    # norm.pdf expects a standard deviation as its scale, not a variance
    stds = np.sqrt(np.asarray(Variances, dtype=float)).reshape(Nhidden, 1)
    # Broadcasting evaluates every (state, time) pair at once
    return norm.pdf(Y, loc=means, scale=stds)


In [None]:
def computeSmallB_Discrete(Y, B):
    """Compute the probabilities for the data points Y for a multinomial observation model 
        with observation matrix B
        
        Input parameters:
            - Y: the data (one sequence of symbols, assumed to take values 1..K)
            - B: matrix of observation probabilities (one row per hidden state)
        Output:
            - b: N x T array of observation probabilities, b[k, t] = B[k, Y[t] - 1]
    """
    # Symbols are shifted by one to obtain 0-based column indices into B
    symbols = np.asarray(Y, dtype=int) - 1
    # Fancy indexing selects, for every state, the column of each observed symbol
    return B[:, symbols]

### Smoothing and filtering: Estimation step

The E step involves smoothing and filtering. Refer to the course notes and/or to the recommended readings to implement these steps in the functions below.
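
For reference, the recursions behind these steps can be summarized as follows. This is a sketch in the notation of the functions below, where $b_t$ is the local evidence vector at time $t$ and $Z_t$ is the normalization constant of the forward step; the symbols $\odot$ (elementwise product) and $\xi$ (two-slice marginal) are notation assumed here rather than taken from the notebook.

$$
\begin{aligned}
\alpha_1 &\propto \pi \odot b_1, \\
\alpha_t &\propto b_t \odot \left(A^\top \alpha_{t-1}\right), \qquad t = 2, \dots, T, \\
\beta_T &= \mathbf{1}, \qquad \beta_t = A \left(b_{t+1} \odot \beta_{t+1}\right), \qquad t = T-1, \dots, 1, \\
\gamma_t &\propto \alpha_t \odot \beta_t, \\
\xi_{t,t+1} &\propto A \odot \left(\alpha_t \left(b_{t+1} \odot \beta_{t+1}\right)^\top\right), \\
\log p(x_{1:T}) &= \sum_{t=1}^{T} \log Z_t .
\end{aligned}
$$

Rescaling $\beta_t$ at every step, as is usually done for numerical stability, only changes the proportionality constants in $\gamma_t$ and $\xi_{t,t+1}$, which are normalized anyway.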

In [None]:
def BackwardFiltering(A, b, N, T):
    """Perform backward filtering.
        Input parameters:
            - A: estimated transition matrix (between states)
            - b: estimated observation probabilities (local evidence vector)
            - N: number of hidden states
            - T: length of the sequence
        Output:
            - beta: T x N array of conditional likelihoods of future evidence,
              with each row rescaled to sum to 1 for numerical stability
    """
    beta = np.zeros((T, N))
    beta[T - 1] = np.ones(N)
    for t in range(T - 2, -1, -1):
        # beta_t = A (b_{t+1} * beta_{t+1}), rescaled to avoid underflow
        beta[t] = normalize(np.dot(A, beta[t + 1] * b[:, t + 1]))[0]
    return beta

In [None]:
def ForwardFiltering(A, b, pi, N, T):
    """Filtering using the forward algorithm (Section 17.4.2 of K. Murphy's book)
    Input:
      - A: estimated transition matrix
      - b: estimated observation probabilities (local evidence vector)
      - pi: initial state distribution pi(j) = p(z_1 = j)
    Output:
      - Filtered belief state at time t: alpha = p(z_t|x_1:t)
      - log p(x_1:T)
      - Z: normalization constant"""
    Z = np.zeros(T)
    alpha = np.zeros((T, N))
    # Initial step: prior times the local evidence of the first observation
    alpha[0], Z[0] = normalize(np.reshape(pi, N) * b[:, 0])
    for t in range(1, T):
        # Predict with A^T, then weight by the local evidence at the same time step t
        alpha[t], Z[t] = normalize(b[:, t] * np.dot(A.T, alpha[t - 1]))
    # log p(x_1:T) is the sum of the log normalization constants
    logProb = np.sum(np.log(Z))
    return alpha, logProb, Z

In [None]:
def ForwardBackwardSmoothing(A, b, pi, N, T):
    """Smoothing using the forward-backward algorithm.
    Input:
      - A: estimated transition matrix
      - b: local evidence vector (observation probabilities)
      - pi: initial distribution of states
      - N: number of hidden states
      - T: length of the sequence
    Output:
      - alpha: filtered belief state as defined in ForwardFiltering
      - beta: conditional likelihood of future evidence as defined in BackwardFiltering
      - gamma: gamma_t(j) proportional to alpha_t(j) * beta_t(j)
      - lp: log probability defined in ForwardFiltering
      - Z: constant defined in ForwardFiltering"""
    alpha, logProb, Z = ForwardFiltering(A, b, pi, N, T)
    beta = BackwardFiltering(A, b, N, T)
    # Normalize each row so that gamma[t] is a proper distribution over states
    gamma = normalize(alpha * beta, dim=1)[0]
    return alpha, beta, gamma, logProb, Z

Use the output of SmoothedMarginals in the maximization step for A.

In [None]:
def SmoothedMarginals(A, b, alpha, beta, T, Nhidden):
    "Two-sliced smoothed marginals p(z_t = i, z_t+1 = j | x_1:T)"
    
    marginal = np.zeros((Nhidden, Nhidden, T - 1))

    for t in range(T - 1):
        # Elementwise product of A with the outer product alpha_t (b_{t+1} * beta_{t+1})^T
        marginal[:, :, t] = normalize(A * np.outer(alpha[t], b[:, t + 1] * beta[t + 1]))[0]
    
    return marginal
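
One way the two-slice marginals can feed the maximization step for A is sketched below: the marginals are summed over time and over sequences to get expected transition counts, and each row is then normalized. The helper name `update_transition_matrix` is hypothetical, and the input is assumed to be a list of arrays of shape `(Nhidden, Nhidden, T-1)`, one per sequence.

```python
import numpy as np

def update_transition_matrix(xi_list):
    """Hypothetical helper: M-step for A from two-slice marginals."""
    # Expected number of i -> j transitions, summed over time and sequences
    counts = sum(xi.sum(axis=2) for xi in xi_list)
    # Normalize each row; rows with no expected transitions are left at zero
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    return counts / row_sums

# Usage on a made-up marginal that is constant over three time steps
xi = np.repeat(np.array([[0.4, 0.1], [0.2, 0.3]])[:, :, None], 3, axis=2)
A_new = update_transition_matrix([xi])
```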

## EM estimation

Implement the main algorithm in the skeletons below.
How can you measure the performance of your model and choose an appropriate convergence criterion?
_Hint: the logProb returned by the ForwardBackwardSmoothing function can be used_.
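
A minimal sketch of one possible stopping rule based on that hint: stop when the total log-likelihood changes by less than `epsilon` between two iterations. The function name `has_converged` is hypothetical, and an absolute threshold is only one choice (a relative threshold is another).

```python
import numpy as np

def has_converged(old_log_prob, new_log_prob, epsilon=1e-6):
    """Hypothetical helper: True when the log-likelihood has stopped changing."""
    # Before the first iteration there is no previous value to compare with
    if not np.isfinite(old_log_prob):
        return False
    return abs(new_log_prob - old_log_prob) < epsilon

# Usage on a made-up sequence of log-likelihood values
log_probs = [-120.0, -101.5, -100.2, -100.1999999]
previous = -np.inf
stopped_at = None
for iteration, current in enumerate(log_probs):
    if has_converged(previous, current):
        stopped_at = iteration
        break
    previous = current
```

Because exact EM never decreases the log-likelihood, a drop between iterations is also a useful sign of a bug in the E or M step.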

### Gaussian observation model

In [None]:
def EM_estimate_gaussian(Y, Nhidden, Niter, epsilon, init):
    
    # Dimensions of the data: N sequences of length T
    N, T = Y.shape
    
    # Initialization (copied so that the arrays stored in init are not modified)
    
    # Initial transition matrix should be stochastic (rows sum to 1)
    A = np.array(init.A, dtype=float)
    
    # Initial means and variances of the emission probabilities
    Means = np.array(init.Means, dtype=float).reshape(Nhidden)
    Variances = np.array(init.Variances, dtype=float).reshape(Nhidden)
    
    # Class prior
    pi = np.array(init.pi, dtype=float).reshape(Nhidden)
    
    ###############################################
    # EM algorithm
    
    i = 0
    # Convergence criterion: change in total log-likelihood between iterations
    oldLogProb = -np.inf
    change = np.inf
    while i < Niter and change > epsilon:
        i += 1
        gammas = np.zeros((N, T, Nhidden))
        A_num = np.zeros((Nhidden, Nhidden))
        totalLogProb = 0.0
        
        # E step: smooth every sequence once with the current parameters
        for l in range(N):
            b = computeSmallB_Gaussian(Y[l], Means, Variances, Nhidden, T)
            alpha, beta, gamma, logProb, Z = ForwardBackwardSmoothing(A, b, pi, Nhidden, T)
            # gamma_t is only defined up to a constant, so make each row sum to 1
            gammas[l] = normalize(gamma, dim=1)[0]
            A_num += SmoothedMarginals(A, b, alpha, beta, T, Nhidden).sum(axis=2)
            totalLogProb += logProb
        
        # M step: re-estimate the parameters from the expected sufficient statistics
        weights = gammas.sum(axis=(0, 1))
        pi = normalize(gammas[:, 0, :].sum(axis=0))[0]
        A = normalize(A_num, dim=1)[0]
        Means = np.einsum('ltk,lt->k', gammas, Y) / weights
        Variances = np.einsum('ltk,ltk->k', gammas, (Y[:, :, None] - Means) ** 2) / weights
        
        change = abs(totalLogProb - oldLogProb)
        oldLogProb = totalLogProb

    return A, Means, Variances, pi

### Multinomial observation model

In the maximization step for B you will have to compute a quantity involving indicators on the values of Y. One efficient way to do it is to pre-compute a representation of Y using _one-hot encoding_. In MATLAB:

```matlab
% X sparse coding
Nv = length(unique(Y));
X = zeros(T, Nv);
for i=1:T
    X(i, Y(i)) = 1;
end
% Maximization: emission matrix
B1 = B1 + gamma * X;
```
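
A rough NumPy counterpart of the MATLAB snippet above is sketched here. The helper name `one_hot` and the toy values are illustrative only. It assumes 0-based symbols (subtract 1 first if the data use 1-based symbols) and a `gamma` array of shape `T x Nhidden`, so the product is written with a transpose.

```python
import numpy as np

def one_hot(y, n_symbols):
    """Hypothetical helper: T x n_symbols indicator matrix of the sequence y."""
    y = np.asarray(y, dtype=int)
    X = np.zeros((y.size, n_symbols))
    # Set a single 1 per row, in the column of the observed symbol
    X[np.arange(y.size), y] = 1.0
    return X

# Made-up sequence of 0-based symbols and smoothed marginals (T x Nhidden)
y = np.array([0, 2, 1, 2])
gamma = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.5, 0.5],
                  [0.1, 0.9]])

X = one_hot(y, 3)
# Expected counts of each symbol in each state (Nhidden x n_symbols)
B_num = np.dot(gamma.T, X)
# Row-normalize to obtain the updated emission matrix
B_new = B_num / B_num.sum(axis=1, keepdims=True)
```

With several sequences, `B_num` would be accumulated over all of them before normalizing.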

In [None]:
def EM_estimate_multinomial(Y, Nhidden, Niter, epsilon, init):
    
    # Dimensions of the data
    N, T = Y.shape
    
    # Initialization
    
    # Initial transition matrix should be stochastic (rows sum to 1)
    A = init.A
    
    # Observation matrix B
    B = init.B
    
    # Class prior
    pi = init.pi
    
    ###############################################
    # EM algorithm
    
    i = 0
    # Initialize convergence criterion here
    
    while i<Niter: # and condition on criterion and precision epsilon
        # Iterate here
        break
        
    return A, B, pi

## Viterbi decoding

Viterbi decoding should be performed on the smoothed data and most of the algorithm doesn't depend on the output model. To help you, we identified the steps that are model specific. Implement Viterbi decoding by completing the skeleton below. 'smallB' is a function and should be used in the standard way: smallB(x).

In [None]:
def ViterbiDecode(Y, Nhidden, outModel, init):
    
    if outModel == 'gauss':
        A, Mu, Sigma, Pi = EM_estimate_gaussian(Y, Nhidden, 100, 1e-6, init)
        smallB = lambda X : computeSmallB_Gaussian(X, Mu, Sigma, Nhidden, len(X))
    elif outModel == 'multinomial':
        A, B, Pi = EM_estimate_multinomial(Y, Nhidden, 100, 1e-6, init)
        smallB = lambda X : computeSmallB_Discrete(X, B)
    else:
        raise ValueError('Invalid observation model: must be either "gauss" or "multinomial"')
        
    # Implement Viterbi decoding here.
    
    return S
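
The model-independent part of Viterbi decoding can be sketched as follows for a single sequence, working in log space to avoid underflow. The function name `viterbi_path` is hypothetical; it assumes `b` is the `Nhidden x T` local evidence array returned by `smallB` and returns 0-based state labels, which may need shifting to match the labels in the provided data.

```python
import numpy as np

def viterbi_path(A, b, pi):
    """Hypothetical helper: most probable state sequence for one observation sequence."""
    N, T = b.shape
    tiny = np.finfo(float).tiny  # added before taking logs to avoid log(0)
    logA = np.log(A + tiny)
    logb = np.log(b + tiny)
    delta = np.zeros((T, N))             # best log score of a path ending in each state
    backptr = np.zeros((T, N), dtype=int)
    delta[0] = np.log(np.reshape(pi, N) + tiny) + logb[:, 0]
    for t in range(1, T):
        # scores[i, j]: best path ending in state i at t-1, followed by transition i -> j
        scores = delta[t - 1][:, None] + logA
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logb[:, t]
    # Trace the best path backwards from the best final state
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path

# Usage on made-up parameters
A_toy = np.array([[0.9, 0.1], [0.1, 0.9]])
b_toy = np.array([[0.9, 0.8, 0.2, 0.1], [0.1, 0.2, 0.8, 0.9]])
pi_toy = np.array([0.5, 0.5])
S_toy = viterbi_path(A_toy, b_toy, pi_toy)
```

Applied to the coursework data, such a function would be called once per row of `Y` with `b = smallB(Y[l])`.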

## Demo code

In [None]:
with np.load('init_gaussian.npz') as f:
    init_g = InitGaussian(f['arr_0'], f['arr_1'], f['arr_2'], f['arr_3'])
    
with np.load('init_multinomial.npz') as f:
    init_m = InitMultinomial(f['arr_0'], f['arr_1'], f['arr_2'])

with np.load('data_gaussian.npz') as f:
    Y_c, S_c = f['arr_0'], f['arr_1']

with np.load('data_multinomial.npz') as f:
    Y_d, S_d = f['arr_0'], f['arr_1']
    


A_g, Means_g, Variances_g, Pi_g = EM_estimate_gaussian(Y_c, 2, 10, 1e-6, init_g)
A_m, B_m, Pi_m = EM_estimate_multinomial(Y_d, 2, 100, 1e-6, init_m)
print(A_g, Means_g, Variances_g, Pi_g)

# S_g = ViterbiDecode(Y_c, 2, 'gauss', init_g)
# S_m = ViterbiDecode(Y_d, 2, 'multinomial', init_m)

# print('*** Viterbi decoding accuracy (Gaussian): {}'.format( (S_c == S_g).sum() / S_c.size ))
# print('*** Viterbi decoding accuracy (Multinomial): {}'.format( (S_d == S_m).sum() / S_d.size ))