# Sequence Models

In this class, we relax the assumption that the data points are independently and identically distributed (i.i.d.) by moving to structured prediction, where the inputs are assumed to have temporal or spatial dependencies.

<b> Exercise 2.1 - Hidden Markov Models (HMM) </b>
<br>

Consider a person who is only interested in four activities: walking in the park (walk), shopping (shop), cleaning the apartment (clean) and playing tennis (tennis). Also, consider that the choice of what the person does on a given day is determined exclusively by the weather on that day, which can be either rainy or sunny. Now, supposing that we observe what the person did on a sequence of days, the question is: can we use that information to predict the weather on each of those days? To tackle this problem, we assume that the weather behaves as a discrete Markov chain: the weather on a given day depends only on the weather on the previous day. The entire system can be described as an HMM, in which the weather is the hidden state and the activity is the observation.
For example, assume we are asked to predict the weather conditions on two different sequences of days. During these two sequences, we observed the person performing the following activities:
<br>
<ul>
<li> “walk walk shop clean” </li>
<li> “clean walk tennis walk” </li>
</ul>
-> This will be our test set.
<br>
<br>
Moreover, to train our model, we are given three different sequences of days that contain both the activities performed by the person and the weather on those days, namely:
<br>
<ul>
<li>“walk/rainy walk/sunny shop/sunny clean/sunny”</li>
<li>“walk/rainy walk/rainy shop/rainy clean/sunny”</li>
<li>“walk/sunny shop/sunny shop/sunny clean/sunny”</li>
</ul>
<br>
-> This will be our training set.
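To make the HMM assumptions concrete, here is a minimal sketch of the generative process behind such data: pick the first day's weather, emit an activity that depends only on that day's weather, then either move to the next day's weather (which depends only on the current weather) or stop. It is plain Python, not part of lxmls; the parameter values are copied by hand from the maximum likelihood estimates reported in Exercise 2.2, and the `STOP` pseudo-state is just an illustrative way of encoding the final probabilities.

```python
import random

# Parameters copied by hand from the ML estimates of Exercise 2.2 (illustrative)
INITIAL = {"rainy": 2 / 3, "sunny": 1 / 3}
# NEXT[prev]: probability of each next state, plus the probability of stopping
NEXT = {
    "rainy": {"rainy": 0.5, "sunny": 0.5, "STOP": 0.0},
    "sunny": {"rainy": 0.0, "sunny": 0.625, "STOP": 0.375},
}
# EMIT[state]: probability of each activity given today's weather
EMIT = {
    "rainy": {"walk": 0.75, "shop": 0.25, "clean": 0.0, "tennis": 0.0},
    "sunny": {"walk": 0.25, "shop": 0.375, "clean": 0.375, "tennis": 0.0},
}


def draw(dist, rng):
    """Sample a key of `dist` with probability proportional to its value."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]


def sample_sequence(rng):
    """Generate one (activities, weather) pair from the HMM."""
    xs, ys = [], []
    state = draw(INITIAL, rng)
    while True:
        ys.append(state)
        xs.append(draw(EMIT[state], rng))  # activity depends only on today's weather
        nxt = draw(NEXT[state], rng)       # next weather depends only on today's weather
        if nxt == "STOP":
            return xs, ys
        state = nxt


rng = random.Random(0)
for _ in range(3):
    xs, ys = sample_sequence(rng)
    print(" ".join(f"{x}/{y}" for x, y in zip(xs, ys)))
```

Every sequence drawn this way ends on a sunny day, because the estimated final probability of the rainy state is zero.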

> Load the simple sequence dataset. From the IPython command line, create a simple sequence object and look at the training and test sets.

```python
import lxmls.readers.simple_sequence as ssr

simple = ssr.SimpleSequence()

print("Train dataset:")
print(simple.train)
print("Test dataset:")
print(simple.test)
```

```
Train dataset:
[walk/rainy walk/sunny shop/sunny clean/sunny , walk/rainy walk/rainy shop/rainy clean/sunny , walk/sunny shop/sunny shop/sunny clean/sunny ]
Test dataset:
[walk/rainy walk/sunny shop/sunny clean/sunny , clean/sunny walk/sunny tennis/sunny walk/sunny ]
```

Note that the test sequences are also stored with their true weather labels; the model only sees the activities, but the labels let us evaluate its predictions.


> Get familiar with the classes used to store the sequences; you will need them for the next exercise. Note that each label is internally stored as a number, which can be used as an index into an array that stores information about that label.

```python
# Each training sequence in readable form
for sequence in simple.train.seq_list:
    print("Each sequence:", sequence)

# The same sequences as integer observation indices (x)...
for sequence in simple.train.seq_list:
    print("Each sequence.x:", sequence.x)

# ...and as integer state indices (y)
for sequence in simple.train.seq_list:
    print("Each sequence.y:", sequence.y)
```

```
Each sequence: walk/rainy walk/sunny shop/sunny clean/sunny
Each sequence: walk/rainy walk/rainy shop/rainy clean/sunny
Each sequence: walk/sunny shop/sunny shop/sunny clean/sunny
Each sequence.x: [0, 0, 1, 2]
Each sequence.x: [0, 0, 1, 2]
Each sequence.x: [0, 1, 1, 2]
Each sequence.y: [0, 1, 1, 1]
Each sequence.y: [0, 0, 0, 1]
Each sequence.y: [1, 1, 1, 1]
```



So, internally, the labels are encoded as follows:
<br>Observations (activities): [walk->0, shop->1, clean->2]; tennis, which only appears in the test set, is 3
<br>States (weather): [rainy->0, sunny->1]
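The sketch below mimics this encoding with plain Python dictionaries; it is an illustration, not lxmls's own label dictionary class. The vocabulary orderings (including `tennis` as index 3) are assumptions chosen to match the indices printed above. It also shows the point made earlier: once labels are integers, they can index arrays directly.

```python
# Training sequences in the "activity/weather" format used by the dataset
train = [
    "walk/rainy walk/sunny shop/sunny clean/sunny",
    "walk/rainy walk/rainy shop/rainy clean/sunny",
    "walk/sunny shop/sunny shop/sunny clean/sunny",
]

# Hypothetical vocabularies: the index of each label is its position in the list
x_vocab = {w: i for i, w in enumerate(["walk", "shop", "clean", "tennis"])}
y_vocab = {t: i for i, t in enumerate(["rainy", "sunny"])}


def encode(line):
    """Split 'activity/weather' tokens and map both parts to integer indices."""
    pairs = [token.split("/") for token in line.split()]
    x = [x_vocab[word] for word, _ in pairs]
    y = [y_vocab[tag] for _, tag in pairs]
    return x, y


encoded = [encode(line) for line in train]
for x, y in encoded:
    print("x:", x, "y:", y)

# Integer labels index arrays directly, e.g. counting how often each weather occurs
state_counts = [0] * len(y_vocab)
for _, y in encoded:
    for tag in y:
        state_counts[tag] += 1
print("state counts [rainy, sunny]:", state_counts)
```

The encoded lists reproduce the `sequence.x` and `sequence.y` values printed above.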
    

<b> Exercise 2.2 - HMM Maximum Likelihood Training</b>
<br>

The function `train_supervised`, provided in the `hmm.py` file, implements the maximum likelihood parameter estimates above. Run this function on the simple dataset and look at the estimated probabilities.
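Maximum likelihood estimation for an HMM amounts to counting and normalizing. The minimal numpy sketch below recomputes the estimates from the integer-encoded training data printed in Exercise 2.1; we do not claim this is how `hmm.py` is written, but it uses the same array layout as the printed results (`transition[current, previous]`, `emission[observation, state]`).

```python
import numpy as np

# Integer-encoded training data (observations x and states y) from Exercise 2.1
xs = [[0, 0, 1, 2], [0, 0, 1, 2], [0, 1, 1, 2]]
ys = [[0, 1, 1, 1], [0, 0, 0, 1], [1, 1, 1, 1]]
n_obs, n_states = 4, 2  # walk, shop, clean, tennis / rainy, sunny

initial_counts = np.zeros(n_states)
transition_counts = np.zeros((n_states, n_states))  # [current, previous]
final_counts = np.zeros(n_states)
emission_counts = np.zeros((n_obs, n_states))       # [observation, state]

for x, y in zip(xs, ys):
    initial_counts[y[0]] += 1
    for t in range(len(x)):
        emission_counts[x[t], y[t]] += 1
        if t > 0:
            transition_counts[y[t], y[t - 1]] += 1
    final_counts[y[-1]] += 1

# ML estimates are normalized counts
initial_probs = initial_counts / initial_counts.sum()
# Leaving a state means moving to some state or stopping, so transition
# and final counts share one normalizer per previous state (broadcast over columns)
leave = transition_counts.sum(axis=0) + final_counts
transition_probs = transition_counts / leave
final_probs = final_counts / leave
emission_probs = emission_counts / emission_counts.sum(axis=0)

print(initial_probs)
print(transition_probs)
print(final_probs)
print(emission_probs)
```

The resulting arrays match the probabilities printed by `train_supervised` in the next cell.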

```python
import lxmls.sequences.hmm as hmmc

# Build an HMM over the observation and state vocabularies, then fit it on the training set
hmm = hmmc.HMM(simple.x_dict, simple.y_dict)
hmm.train_supervised(simple.train)

print("Initial Probabilities:", hmm.initial_probs)
print("Transition Probabilities:", hmm.transition_probs)
print("Final Probabilities:", hmm.final_probs)
print("Emission Probabilities:", hmm.emission_probs)
```

```
Initial Probabilities: [0.66666667 0.33333333]
Transition Probabilities: [[0.5   0.   ]
 [0.5   0.625]]
Final Probabilities: [0.    0.375]
Emission Probabilities: [[0.75  0.25 ]
 [0.25  0.375]
 [0.    0.375]
 [0.    0.   ]]
```



> Are they correct? You can also check the variables ending in `_counts` instead of `_probs` to see the raw counts (for example, typing `hmm.initial_counts` will show you the raw counts of initial states). How are the counts related to the probabilities?

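To check that the implementation is correct, we can run the following sanity checks on the raw counts:
<ul>
<li> Initial counts: should sum to the number of sequences.</li>
<li> Transition and final counts: together, should sum to the number of tokens, since a sequence of length N contributes N-1 transitions and one final transition.</li>
<li> Emission counts: should sum to the number of tokens.</li>
</ul>

The probabilities are these counts, normalized: the initial counts are divided by the number of sequences; for each previous state, the transition counts out of that state and its final count are divided by their total; and for each state, the emission counts are divided by the number of times that state occurs. The printed arrays follow this convention, with `transition_probs` indexed as `[current_state, previous_state]` and `emission_probs` as `[observation, state]`, so each column of `emission_probs`, and each column of `transition_probs` plus the matching entry of `final_probs`, sums to one.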

```python
# Total number of tokens (observations) in the training set
number_of_tokens = sum(len(seq) for seq in simple.train)

print("The initial counts are correct:", sum(hmm.initial_counts) == len(simple.train.seq_list))
print("Transition and final counts are correct:",
      hmm.transition_counts.sum() + hmm.final_counts.sum() == number_of_tokens)
print("Emission counts are correct:", hmm.emission_counts.sum() == number_of_tokens)
```

```
The initial counts are correct: True
Transition and final counts are correct: True
Emission counts are correct: True
```
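The same kind of check can be run on the probabilities themselves: after normalization, each distribution should sum to one. This is a minimal numpy sketch over the values printed in Exercise 2.2, copied by hand rather than read from the `hmm` object.

```python
import numpy as np

# Estimates printed in Exercise 2.2 (copied by hand)
initial_probs = np.array([2 / 3, 1 / 3])
transition_probs = np.array([[0.5, 0.0], [0.5, 0.625]])  # [current, previous]
final_probs = np.array([0.0, 0.375])
emission_probs = np.array([[0.75, 0.25], [0.25, 0.375], [0.0, 0.375], [0.0, 0.0]])

checks = {
    "initial probabilities sum to 1": np.isclose(initial_probs.sum(), 1.0),
    # From each previous state we either move to some state or stop
    "transition + final columns sum to 1": np.allclose(transition_probs.sum(axis=0) + final_probs, 1.0),
    "emission columns sum to 1": np.allclose(emission_probs.sum(axis=0), 1.0),
}
for name, ok in checks.items():
    print(name + ":", bool(ok))
```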


<b> Exercise 2.3 </b>
<br>

> Convince yourself that the score of a path in the trellis (obtained by summing the scores above) is equivalent to the log-probability $\log P(X = x, Y = y)$, as defined in Eq. 2.2. Use the given function `compute_scores` on the first training sequence and confirm that the values are correct. You should get the same values as presented below.
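As a sanity check of this equivalence, the sketch below builds log-scores from the Exercise 2.2 estimates (copied by hand), sums the scores along the gold path of the first training sequence, and compares the result with the log of the product of probabilities. It mirrors the four kinds of scores (initial, transition, final, emission) but is not `compute_scores` itself; the variable names are illustrative.

```python
import math

import numpy as np

initial_probs = np.array([2 / 3, 1 / 3])
transition_probs = np.array([[0.5, 0.0], [0.5, 0.625]])  # [current, previous]
final_probs = np.array([0.0, 0.375])
emission_probs = np.array([[0.75, 0.25], [0.25, 0.375], [0.0, 0.375], [0.0, 0.0]])

# First training sequence: walk/rainy walk/sunny shop/sunny clean/sunny
x = [0, 0, 1, 2]
y = [0, 1, 1, 1]

# Trellis scores are log-probabilities; log(0) = -inf marks impossible events
with np.errstate(divide="ignore"):
    initial_scores = np.log(initial_probs)
    transition_scores = np.log(transition_probs)
    final_scores = np.log(final_probs)
    emission_scores = np.log(emission_probs)

# Score of the path y: sum of the node and edge scores it visits
path_score = initial_scores[y[0]] + emission_scores[x[0], y[0]]
for t in range(1, len(x)):
    path_score += transition_scores[y[t], y[t - 1]] + emission_scores[x[t], y[t]]
path_score += final_scores[y[-1]]

# Direct computation of P(X = x, Y = y) as a product of probabilities
p = initial_probs[y[0]] * emission_probs[x[0], y[0]]
for t in range(1, len(x)):
    p *= transition_probs[y[t], y[t - 1]] * emission_probs[x[t], y[t]]
p *= final_probs[y[-1]]

print(path_score, math.log(p))
```

Summing in log space is also what keeps the trellis computations numerically stable for longer sequences, where the raw product would underflow.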

