# Hidden Markov Models (HMMs) in Pomegranate 
## Instructions:


In case you haven't already installed pomegranate on your own machine, you can find out how to install it here: <a href="https://pomegranate.readthedocs.io/en/latest/install.html">https://pomegranate.readthedocs.io/en/latest/install.html</a>

* Go through the notebook and complete the tasks. 
* Make sure you understand the examples given. If you need help, refer to the Essential readings or the documentation link provided, or go to the Topic 9 discussion forum. 
* Save your notebooks when you are done.
 
For information on pomegranate, you might want to refer to the official documentation:
<a href="https://pomegranate.readthedocs.io">https://pomegranate.readthedocs.io</a>
The specific page related to building an HMM can be found here: <a href="https://pomegranate.readthedocs.io/en/latest/HiddenMarkovModel.html">https://pomegranate.readthedocs.io/en/latest/HiddenMarkovModel.html</a>


**Task**
This lab gives you the chance to build a 3-state HMM to solve the activity recognition problem of detecting sitting, standing, or walking, using only a single observed variable (the y-axis accelerometer signal from a person's leg).
 

### 1. Generate some artificial data sequences for sit, stand, and walk
This first section of code simulates data from an accelerometer strapped to a person's upper leg. When the person stands still, the y-axis reading (`acc_y`) is 1 g, due to gravity. When the person sits, the accelerometer is horizontal and reads 0 g. When the person walks, a sine-like pattern around 1 g appears as the accelerometer moves with the leg.

Run the following code to generate (and save to .csv files) two sets of data: a training set, and a separate test set.   

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt

def generate_sequence( activity_ground_truth, wnd_len = 10, noise_div=5 ):
    out = []
    labels = []
    # generate some fake data that follows the state sequence activity_ground_truth
    for act in activity_ground_truth:
        noise = (np.random.ranf(wnd_len)-0.5)/noise_div
        if act=='W': out.extend(np.sin(np.arange(0,wnd_len/2,.5))/2 + noise + 1) 
        if act=='-': out.extend(noise )
        if act=='^': out.extend(noise + 1 )
        labels.extend([act for _ in np.arange(0,wnd_len)])

    assert(len(labels)==len(out))

    index = pd.Series(np.round(np.arange(0,len(labels)/wnd_len,1/wnd_len),2),name='time') #pd.TimedeltaIndex(start='0',periods=len(labels),freq='s',name='time')
    data = pd.DataFrame(data={'acc_y':out,'ground':labels},index=index)
    
    return data


# Class labels:
# - sit
# ^ stand
# W walk

train_ground = np.array(list('---^^^----WWWWWWWWW^^^^^-----^WWWWW^--W^^^WW'))
generate_sequence( train_ground, wnd_len = 10, noise_div=5 ).to_csv('sit_stand_walk.csv')

test_ground = np.array(list('^^^^^^WWWW^^^-------WWWW--^^^^W^^'))
generate_sequence( test_ground, wnd_len = 9, noise_div=2 ).to_csv('sit_stand_walk_test.csv')
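
The two calls above use different `noise_div` values (5 for training, 2 for testing), so the test set is noisier than the training set. As a rough check, and assuming the noise is drawn as `(U - 0.5) / noise_div` with `U` uniform on [0, 1) as in `generate_sequence`, its standard deviation is `1 / (noise_div * sqrt(12))`. The helper name `uniform_noise_std` below is hypothetical and only used for this sketch:

```python
import math

def uniform_noise_std(noise_div):
    # Standard deviation of (U - 0.5) / noise_div, where U ~ Uniform[0, 1).
    # A Uniform[0, 1) variable has variance 1/12.
    return 1.0 / (noise_div * math.sqrt(12))

train_noise_std = uniform_noise_std(5)  # noise_div used for the training set
test_noise_std = uniform_noise_std(2)   # noise_div used for the test set

print(f"training noise std: {train_noise_std:.3f}")
print(f"test noise std:     {test_noise_std:.3f}")
```

Under these assumptions the test noise is 2.5 times larger than the training noise, which is worth keeping in mind when choosing the spread of each state's observation distribution.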


####  Load and plot the datasets
You can now load the artificial dataset and plot the data by running the following cell:

In [None]:
fig = plt.figure(figsize=(15,3))
axes = fig.subplots(1,2)

train=pd.read_csv('sit_stand_walk.csv',index_col='time')
train.plot(ax=axes[0], title='training data')
test=pd.read_csv('sit_stand_walk_test.csv',index_col='time')
test.plot(ax=axes[1], title='test data')

classes = np.unique(train.ground)


### 2. Build an HMM by hand using Pomegranate

Use the pomegranate module to build a single HMM consisting of 3 states: sit, stand, and walk.
Some skeleton code is provided below for you. Fill in the missing parts. 

**Hints**: Each of the 3 states should have a single observation distribution (you can use ``pm.NormalDistribution``). Then add initial transitions within states and between them, estimating the likely transition probabilities. Remember to add an initial ``model.start`` state, from which you can link to all 3 of the main states. Finally, call the ``model.bake`` method to prepare your HMM for use.
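
Before adding transitions to the model, it can help to write the planned probabilities out as a plain table and check that the outgoing probabilities of each state sum to 1. The sketch below does not use pomegranate; the names `planned` and `rows_sum_to_one` are hypothetical, and only the `'-'` row comes from the skeleton code. The other rows are illustrative placeholders, not a recommended answer.

```python
import math

# Hypothetical planning table: outgoing transition probabilities per state.
# The '-' row matches the skeleton code; the other rows are placeholders.
planned = {
    '-': {'-': 0.8, '^': 0.1, 'W': 0.1},
    '^': {'-': 0.1, '^': 0.8, 'W': 0.1},
    'W': {'-': 0.1, '^': 0.1, 'W': 0.8},
}

def rows_sum_to_one(table, tol=1e-9):
    # Each state's outgoing probabilities should form a distribution.
    return {src: math.isclose(sum(row.values()), 1.0, abs_tol=tol)
            for src, row in table.items()}

row_ok = rows_sum_to_one(planned)
print(row_ok)
```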


In [None]:
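# This notebook uses the pomegranate API from before version 1.0
# (HiddenMarkovModel, State, NormalDistribution).
import pomegranate as pm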

# Create an initial HMM with sit, stand, and walk as the hidden states
model = pm.HiddenMarkovModel()

# Define a Gaussian observation distribution for each state
obs_sit = pm.NormalDistribution(0, 0.1) 
obs_stand = # your code here...
obs_walk = # your code here...

# Create the 3 states
sit = pm.State(obs_sit, name='-')
stand = pm.State(obs_stand, name='^')
walk = pm.State(obs_walk, name='W')

model.add_states(sit, stand, walk)
model.add_transition(model.start, sit, .33)
model.add_transition(model.start, stand, .33)
model.add_transition(model.start, walk, .33)

# Possible transition probabilities
model.add_transition(sit, sit, 0.8)
model.add_transition(sit, stand, 0.1)
model.add_transition(sit, walk, 0.1)

# ...your code for more transitions here...

# Finally, once the model is set up you need to 'bake' it
model.bake()



In [None]:
# You can look at the entire structure of the HMM by calling:
print(model)


In [None]:

# Pomegranate's predict function returns the most likely state sequence using
# the numbered indexes of each state. For this reason, it's useful to have a helper
# function that lets us map between those indexes and our class labels
def map_states_to_labels( state_seq, model ):
    # This is a useful helper function to map between the pomegranate model's 
    # assigned state numbers (0,1,2) and our class labels ('-','W','^')
    class_map = {}
    for idx, state in enumerate(model.states):
        class_map[idx] = state.name
    return np.array([class_map[x] for x in state_seq])
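
As a quick illustration of the helper above, the following sketch runs it against a hypothetical stand-in for a baked model. Only a `.states` list of objects with a `.name` attribute is needed; the state ordering used here is made up and may differ from the ordering pomegranate assigns.

```python
from types import SimpleNamespace
import numpy as np

def map_states_to_labels(state_seq, model):
    # Same logic as the helper above, repeated so this sketch runs on its own.
    class_map = {idx: state.name for idx, state in enumerate(model.states)}
    return np.array([class_map[x] for x in state_seq])

# Hypothetical stand-in for a baked model: only .states (objects with a
# .name attribute) matters to the helper. The ordering is made up.
fake_model = SimpleNamespace(states=[
    SimpleNamespace(name='-'),
    SimpleNamespace(name='W'),
    SimpleNamespace(name='^'),
])

labels = map_states_to_labels([0, 0, 2, 1, 1], fake_model)
print(labels)
```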


### 3. Predict the most likely state sequence given the training data

First evaluate the state sequence prediction capabilities of your model on the training data. 
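
One common way to find a most likely state sequence is the Viterbi algorithm. The following is a minimal NumPy sketch with Gaussian observation distributions. It is not pomegranate's implementation, and `model.predict` may use a different decoding algorithm depending on its settings. All parameter values and the names `gaussian_logpdf` and `viterbi` are illustrative assumptions.

```python
import numpy as np

def gaussian_logpdf(x, mean, std):
    # Log density of a normal distribution, evaluated elementwise.
    return -0.5 * np.log(2 * np.pi * std**2) - (x - mean)**2 / (2 * std**2)

def viterbi(obs, log_start, log_trans, means, stds):
    # obs: (T,) observations; log_start: (K,) log initial probabilities;
    # log_trans[i, j] = log P(next state j | current state i).
    obs = np.asarray(obs, dtype=float)
    T, K = len(obs), len(means)
    log_emit = gaussian_logpdf(obs[:, None], means[None, :], stds[None, :])
    score = np.empty((T, K))            # best log-probability ending in each state
    back = np.zeros((T, K), dtype=int)  # best predecessor of each state
    score[0] = log_start + log_emit[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans  # indexed (from, to)
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Illustrative two-state example: state 0 ~ N(0, 0.1), state 1 ~ N(1, 0.1).
means = np.array([0.0, 1.0])
stds = np.array([0.1, 0.1])
log_start = np.log(np.array([0.5, 0.5]))
log_trans = np.log(np.array([[0.9, 0.1],
                             [0.1, 0.9]]))
path = viterbi([0.02, -0.05, 0.97, 1.04, 0.01], log_start, log_trans, means, stds)
print(path)
```

Working in log space avoids numerical underflow on long sequences, which is why the sketch adds log-probabilities instead of multiplying probabilities.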

In [None]:
pred_states = model.predict(train.acc_y)

# convert the numeric prediction sequence to label format
train['pred'] = map_states_to_labels( pred_states, model )

# Calculate the confusion matrix, and per-class precision and recall values
import sklearn.metrics as metrics

print('Training set evaluation')
print(classes)
print(metrics.confusion_matrix(train.ground, train.pred, labels=classes))
print(metrics.classification_report(train.ground, train.pred))
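
To relate the confusion matrix to the precision and recall values in the report, the sketch below computes both directly from a matrix of counts. It assumes the row/column convention of `sklearn.metrics.confusion_matrix` (rows are true classes, columns are predicted classes). The counts are made up for illustration, and `precision_recall` is a hypothetical helper name.

```python
import numpy as np

def precision_recall(cm):
    # cm[i, j] counts samples whose true class is i and predicted class is j.
    cm = np.asarray(cm, dtype=float)
    true_pos = np.diag(cm)
    precision = true_pos / cm.sum(axis=0)  # column sums: all predictions of a class
    recall = true_pos / cm.sum(axis=1)     # row sums: all true members of a class
    return precision, recall

# Made-up counts for three classes, purely for illustration.
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]
precision, recall = precision_recall(cm)
print(precision.round(3), recall.round(3))
```

Note that this simple version divides by zero if a class is never predicted or never occurs, so it is only a sketch of the definitions.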

### 4. Evaluate using the kept-aside test set
Now run the evaluation using the test set to estimate how the model responds to previously unseen data.

In [None]:
test['pred'] = # your code here...

print('Test set evaluation')
print(classes)
print(metrics.confusion_matrix(test.ground, test.pred, labels=classes))
print(metrics.classification_report(test.ground, test.pred))

### 5. Train a new model using the data

Let's copy our initial model (using ``model.copy()``) and properly train its parameters using the training dataset. Evaluate this updated model on both the training and test sets. How does this compare to your completely hand-crafted model?
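
To give a feel for what estimating parameters from data involves, the sketch below derives Gaussian observation parameters and transition probabilities from a labelled sequence by simple counting. This is a simplified, supervised stand-in and not necessarily what `m2.fit` does, which may instead use an iterative algorithm such as Baum-Welch on unlabelled observations. The function name `estimate_from_labels` and the data are hypothetical.

```python
import numpy as np

def estimate_from_labels(values, labels):
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    states = sorted(set(labels.tolist()))
    # Per-state observation parameters: sample mean and standard deviation.
    emissions = {s: (values[labels == s].mean(), values[labels == s].std())
                 for s in states}
    # Transition probabilities: count label-to-label steps, then normalise rows.
    index = {s: i for i, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for a, b in zip(labels[:-1], labels[1:]):
        counts[index[a], index[b]] += 1
    trans = counts / counts.sum(axis=1, keepdims=True)
    return states, emissions, trans

# Tiny made-up sequence: '-' readings near 0, '^' readings near 1.
values = [0.0, 0.1, -0.1, 1.0, 1.1, 0.9, 0.0, 0.05]
labels = ['-', '-', '-', '^', '^', '^', '-', '-']
states, emissions, trans = estimate_from_labels(values, labels)
print(states, emissions, trans, sep="\n")
```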


In [None]:
m2 = model.copy()

# re-train the model to fit the data more closely
# m2.fit( # your code here...
m2.bake()

# your code here...


print('Training set evaluation')

# your code here...


In [None]:
# evaluate its performance on the test set

# your code here...


### 6. Discussion suggestion

Does your hand-crafted model perform better or worse than the trained model (on the kept-aside test set)? Consider why this might be the case.


----


**Answer**  

