### Hidden Markov Models

**OBJECTIVES**

- Introduce Markov Models
- Introduce Hidden Markov Models
- Use HMMs to investigate time series data
- Use HMMs to classify speech



### Markov Models
<center>
<img src = https://upload.wikimedia.org/wikipedia/commons/thumb/2/2b/Markovkate_01.svg/440px-Markovkate_01.svg.png />
</center>
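
As a minimal sketch of the idea, the cell below simulates a two-state Markov chain with NumPy. The state names, transition probabilities, and the helper `simulate_chain` are illustrative assumptions, not values taken from the course data. It compares the long-run fraction of time spent in each state with the stationary distribution computed from the transition matrix.

```python
import numpy as np

# Hypothetical two-state chain; names and probabilities are illustrative only.
states = ["A", "E"]
# transition[i, j] = P(next state = j | current state = i); each row sums to 1.
transition = np.array([[0.6, 0.4],
                       [0.7, 0.3]])

def simulate_chain(transition, n_steps, start=0, seed=0):
    """Sample a state sequence from a first-order Markov chain."""
    rng = np.random.default_rng(seed)
    path = [start]
    for _ in range(n_steps - 1):
        current = path[-1]
        # The next state depends only on the current state (Markov property).
        path.append(int(rng.choice(len(transition), p=transition[current])))
    return np.array(path)

path = simulate_chain(transition, n_steps=10_000)

# Fraction of time spent in each state over the simulated path.
empirical = np.bincount(path, minlength=len(states)) / len(path)

# Stationary distribution: left eigenvector of the transition matrix
# associated with eigenvalue 1, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(transition.T)
stationary = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
stationary = stationary / stationary.sum()

print("empirical :", dict(zip(states, empirical.round(3))))
print("stationary:", dict(zip(states, stationary.round(3))))
```

The two printed distributions should be close to each other, since a long simulated path visits each state roughly in proportion to its stationary probability.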



### Hidden Markov Models

<center>
 <img src = https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/HiddenMarkovModel.svg/600px-HiddenMarkovModel.svg.png />
</center>
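
To make the "hidden" part concrete, here is a small sketch of a toy HMM with two hidden states and three observable symbols. All probabilities and the helper names `sample_hmm` and `forward_likelihood` are assumptions made for illustration. The first function generates hidden states and the observations they emit; the second uses the forward algorithm to compute the probability of an observation sequence by summing over every possible hidden path.

```python
import numpy as np

# Hypothetical toy HMM: 2 hidden states, 3 observation symbols (made-up numbers).
start_prob = np.array([0.6, 0.4])          # P(z_0)
trans_prob = np.array([[0.7, 0.3],         # P(z_t | z_{t-1}), rows sum to 1
                       [0.4, 0.6]])
emit_prob = np.array([[0.5, 0.4, 0.1],     # P(x_t | z_t), rows sum to 1
                      [0.1, 0.3, 0.6]])

def sample_hmm(n_steps, seed=0):
    """Sample hidden states and the observations they emit."""
    rng = np.random.default_rng(seed)
    hidden, observed = [], []
    state = int(rng.choice(2, p=start_prob))
    for _ in range(n_steps):
        hidden.append(state)
        observed.append(int(rng.choice(3, p=emit_prob[state])))
        state = int(rng.choice(2, p=trans_prob[state]))
    return np.array(hidden), np.array(observed)

def forward_likelihood(observed):
    """Forward algorithm: P(observed), summed over all hidden paths."""
    # alpha[i] = P(observations so far, current hidden state = i)
    alpha = start_prob * emit_prob[:, observed[0]]
    for symbol in observed[1:]:
        alpha = (alpha @ trans_prob) * emit_prob[:, symbol]
    return alpha.sum()

hidden, observed = sample_hmm(n_steps=8)
likelihood = forward_likelihood(observed)
print("hidden   :", hidden)     # not available in a real application
print("observed :", observed)   # the only thing we actually see
print("P(observed) =", likelihood)
```

In practice only `observed` is available, and the model parameters themselves have to be estimated from data, which is what a library such as `hmmlearn` does when fitting.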

In [None]:
from IPython.display import Audio
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

In [None]:
import pandas_datareader as pdr

In [None]:
#!pip install pandas_datareader
#!pip install lxml

In [None]:
#get some stock data


In [None]:
#plot it


### HMMLearn

We will use the `hmmlearn` library to implement our hidden Markov model.  Here, we use the `GaussianHMM` class, which models the observations emitted by each hidden state with a Gaussian distribution.  Depending on the nature of your data, you may be interested in a different probability distribution; we will see more on this in *Bayes's Week*!

- **HMM Learn**: [here](https://hmmlearn.readthedocs.io/en/latest/tutorial.html)

In [None]:
# !pip install hmmlearn

In [None]:
from hmmlearn import hmm

In [None]:
#instantiate 


In [None]:
#fit


In [None]:
#predict


In [None]:
#look at our predictions


In [None]:
# #plot against stock data
# plt.plot(X.index, model.predict(X))
# plt.plot(X/(X.max() - X.min()))

In [None]:
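#modularize our efforts
def make_regimes(n, X):
    """Fit an n-state Gaussian HMM to X and plot the predicted regimes over the rescaled data."""
    model = hmm.GaussianHMM(n_components=n)
    model.fit(X)
    #hidden state assigned to each time step
    plt.plot(X.index, model.predict(X))
    #divide the series by its range so it is on a scale comparable to the state labels
    plt.plot(X/(X.max() - X.min()))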

In [None]:
#test it out on some tickers


### Looking at Speech Files

For a deeper dive into HMMs for speech recognition, please see Rabiner's article *A tutorial on hidden Markov models and selected applications in speech recognition* [here](https://web.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf).

In [None]:
from scipy.io import wavfile

In [None]:
!ls sounds/apple

In [None]:
#read in the data and structure


In [None]:
#plot the sound


In [None]:
#look at another sample


In [None]:
#kiwi's perhaps


In [None]:
from IPython.display import Audio

In [None]:
#take a listen to a banana
Audio('sounds/banana/banana02.wav')

### Generating Features from Audio: Mel Frequency Cepstral Coefficients

The big idea here is to extract the features of the audio signal that allow us to identify speech.  For more information on MFCCs, see [here](http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/).

In [None]:
#!pip install python_speech_features

In [None]:
import python_speech_features as features

#extract the mfcc features


In [None]:
#plot them



In [None]:
#determine our x and y


In [None]:
import os

In [None]:
#make a custom markov class to return scores

        
    

In [None]:
# test out on kiwi

### Problem: Applying to all our sounds

To recognize a specific sound, you will train hidden Markov models on the sound files in the `sounds` folder.  When presented with a new sound, you will assign the label of the model that gives it the highest score.  Below, train a collection of models and keep track of them in the list `hmm_models`.  Then, see how the models predict the four sounds we left out of training, given as `in_files` below.

In [None]:
#this code works through the files in the sounds
#directory, extracts the label, builds a model,
#and tracks the scores
import numpy as np
hmm_models = []
labels = []
#looping over files in sounds folder

    #listing the soundfiles with full paths
    
    #loop over the individual sounds, read in, model
    

In [None]:
#write a loop that goes over the files and prints the label based on
#the highest score



### Making Predictions

Now that we have our models, given a new sound we want to score it against each of them and select the most likely label.

In [None]:
in_files = ['sounds/pineapple/pineapple15.wav',
           'sounds/orange/orange15.wav',
           'sounds/apple/apple15.wav',
           'sounds/kiwi/kiwi15.wav']

### Further Reading

- **Textbook**: Marsland's *Machine Learning: An Algorithmic Perspective* has a great overview of HMMs.
- **Time Series Examples**: Check out Aileen Nielsen's tutorial from [SciPy 2019](https://www.youtube.com/watch?v=v5ijNXvlC5A) and her book *Practical Time Series Analysis*.
- **Speech Recognition**: Rabiner's [*A tutorial on hidden Markov models and selected applications in speech recognition*](readings/rabiner.pdf)
- **HMMs and Dynamic Programming**: Avik Das' PyData Talk [*Dynamic Programming for Machine Learning: Hidden Markov Models*](https://www.youtube.com/watch?v=MADX-L75ub8)



<center>
 <img src = 'readings/bhmm.png'  />
</center>
    
    
