# Structured Perceptron

The structured perceptron, particularly in its averaged version, is a
simple algorithm that relies on Viterbi decoding and additive
updates. In practice it is easy to implement and performs
remarkably well on a variety of problems. These two
characteristics make the structured perceptron a natural
first choice when trying out a new problem or a new feature set.
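The "averaged" part can be illustrated with a minimal sketch. It assumes the weight vector is snapshotted once per epoch and the final model is the mean of the snapshots. This is one common variant (others average after every update), and the name `average_weights` is hypothetical, not part of the toolkit.

```python
import numpy as np


def average_weights(snapshots):
    """Return the element-wise mean of a list of weight vectors."""
    return np.mean(np.stack(snapshots), axis=0)


# Three hypothetical per-epoch snapshots of a two-dimensional weight vector.
snapshots = [np.array([1.0, 0.0]), np.array([3.0, -2.0]), np.array([2.0, -1.0])]
averaged = average_weights(snapshots)
print(averaged)
```

Averaging tends to smooth out the epoch-to-epoch oscillation of the raw perceptron weights, which is why the averaged version is usually the one used at test time.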

<img src="../images_for_notebooks/day_3/structured_perceptron.png">



Compared with the multi-class perceptron, there are only two differences, which mirror those already seen
when comparing CRFs with multi-class ME models:

- Instead of explicitly enumerating all possible output configurations (exponentially many of them) to compute
  $$\widehat{y} := \text{argmax}_{y'\in\mathcal{Y}} W \cdot F(x^m,y'),$$
  it finds the best sequence through the Viterbi algorithm.


- Instead of updating the features for the entire $\widehat{y}$,
  it updates only the node and edge features at the positions where the
  labels differ, i.e., where mistakes are made.
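The first difference can be checked on a toy problem. The sketch below is not the toolkit's decoder: it assumes the score $W \cdot F(x,y)$ decomposes into a table of node scores plus a table of edge scores (both randomly generated here), and all function names are hypothetical. It compares explicit enumeration with a small Viterbi implementation.

```python
import itertools
import numpy as np


def sequence_score(node_scores, edge_scores, labels):
    """Score of one label sequence: sum of node scores plus edge scores."""
    total = sum(node_scores[i, y] for i, y in enumerate(labels))
    total += sum(edge_scores[a, b] for a, b in zip(labels[:-1], labels[1:]))
    return total


def brute_force_decode(node_scores, edge_scores):
    """Enumerate all K**N label sequences (only feasible for tiny inputs)."""
    n, k = node_scores.shape
    return max(itertools.product(range(k), repeat=n),
               key=lambda labels: sequence_score(node_scores, edge_scores, labels))


def viterbi_decode(node_scores, edge_scores):
    """Find the same argmax in O(N * K**2) time with dynamic programming."""
    n, k = node_scores.shape
    best = np.zeros((n, k))                # best[i, y]: best prefix score ending in y at i
    backptr = np.zeros((n, k), dtype=int)  # backptr[i, y]: best previous label
    best[0] = node_scores[0]
    for i in range(1, n):
        # candidates[prev, cur] = best[i - 1, prev] + edge_scores[prev, cur]
        candidates = best[i - 1][:, None] + edge_scores
        backptr[i] = candidates.argmax(axis=0)
        best[i] = candidates.max(axis=0) + node_scores[i]
    # Follow the back-pointers from the best final label.
    labels = [int(best[-1].argmax())]
    for i in range(n - 1, 0, -1):
        labels.append(int(backptr[i, labels[-1]]))
    return tuple(reversed(labels))


rng = np.random.RandomState(0)
node_scores = rng.randn(5, 3)   # 5 positions, 3 labels
edge_scores = rng.randn(3, 3)   # score of the label pair (previous, current)
assert viterbi_decode(node_scores, edge_scores) == brute_force_decode(node_scores, edge_scores)
```

Enumeration visits $3^5 = 243$ sequences here, while Viterbi fills a $5 \times 3$ table; the gap grows exponentially with the sequence length.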


In [None]:
%matplotlib inline
%load_ext autoreload
%autoreload 2
import sys
# Append the toolkit path so that the lxmls toolkit can be imported
sys.path.append('../../lxmls-toolkit')

# Exercise 3.3 - Algorithm implementation

**Implement the structured perceptron algorithm.**

**To do this, edit file ```sequences/structured_perceptron.py``` and implement the function ```.perceptron_updates```**

         
This function should apply one round of the perceptron algorithm, updating the weights for a given sequence, and returning the number of predicted labels (which equals the sequence length) and the number of mistaken labels.

Hint: You can try to adapt the function

    def gradient_update(self, sequence, eta):


You will need to replace the computation of posterior marginals by the Viterbi algorithm, and to change the parameter updates according to Algorithm 11. Note the role of the functions

    self.feature_mapper.get_*_features()

in providing the indices for the features obtained for $f(x^m,y^m)$ or $f(x^m,\hat{y}^m)$.
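To make the shape of the update concrete, here is a minimal standalone sketch. It is not the toolkit's implementation: `node_feature_ids` and `edge_feature_ids` are hypothetical stand-ins for the feature mapper, the weights live in a dictionary rather than an indexed vector, and the predicted sequence is given by hand instead of coming from Viterbi decoding.

```python
from collections import defaultdict


def node_feature_ids(words, pos, label):
    """Hypothetical node features: one (word, label) indicator per position."""
    return [("node", words[pos], label)]


def edge_feature_ids(prev_label, label):
    """Hypothetical edge features: one (previous label, label) indicator."""
    return [("edge", prev_label, label)]


def perceptron_update(weights, words, gold, predicted):
    """Apply one additive update in place; return (num_labels, num_mistakes)."""
    num_mistakes = 0
    for pos in range(len(words)):
        if gold[pos] != predicted[pos]:
            num_mistakes += 1
            # Reward the features of the gold label, penalize the predicted ones.
            for f in node_feature_ids(words, pos, gold[pos]):
                weights[f] += 1.0
            for f in node_feature_ids(words, pos, predicted[pos]):
                weights[f] -= 1.0
        # An edge differs if either of its two endpoint labels differs.
        if pos > 0 and (gold[pos - 1], gold[pos]) != (predicted[pos - 1], predicted[pos]):
            for f in edge_feature_ids(gold[pos - 1], gold[pos]):
                weights[f] += 1.0
            for f in edge_feature_ids(predicted[pos - 1], predicted[pos]):
                weights[f] -= 1.0
    return len(words), num_mistakes


weights = defaultdict(float)
words = ["the", "dog", "barks"]
gold = ["det", "noun", "verb"]
predicted = ["det", "noun", "noun"]   # stands in for the Viterbi output
print(perceptron_update(weights, words, gold, predicted))  # (3, 1)
```

Only the features touching the mistaken position change: the node feature at position 2 and the edge entering it. Positions where the prediction is correct leave the weights untouched.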

In [53]:
## Nothing to do, ex 3.3 is about implementing the perceptron update

# Exercise 3.4 - POS Tagging

**Repeat Exercises 3.1–3.2 using the structured perceptron algorithm instead of a CRF. Report the results.**



### Part 1 Run the structured perceptron with the standard feature_mapper
Here is the code for the simple feature set:

In [54]:
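import lxmls.sequences.id_feature

# train_seq (and corpus, used below) are assumed to have been loaded earlier in the notebook.
feature_mapper = lxmls.sequences.id_feature.IDFeatures(train_seq)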
feature_mapper.build_features()

In [55]:
import lxmls.sequences.structured_perceptron as spc

sp = spc.StructuredPerceptron(corpus.word_dict, corpus.tag_dict, feature_mapper)
sp.num_epochs = 20

In [56]:
sp.train_supervised(train_seq)


#### The execution of the previous cell should print
    Epoch: 0 Accuracy: 0.656806
    Epoch: 1 Accuracy: 0.820898
    Epoch: 2 Accuracy: 0.879176
    Epoch: 3 Accuracy: 0.907432
    Epoch: 4 Accuracy: 0.925239
    Epoch: 5 Accuracy: 0.939956
    Epoch: 6 Accuracy: 0.946284
    Epoch: 7 Accuracy: 0.953790
    Epoch: 8 Accuracy: 0.958499
    Epoch: 9 Accuracy: 0.955114
    Epoch: 10 Accuracy: 0.959235
    Epoch: 11 Accuracy: 0.968065
    Epoch: 12 Accuracy: 0.968212
    Epoch: 13 Accuracy: 0.966740
    Epoch: 14 Accuracy: 0.971302
    Epoch: 15 Accuracy: 0.968653
    Epoch: 16 Accuracy: 0.970419
    Epoch: 17 Accuracy: 0.971891
    Epoch: 18 Accuracy: 0.971744
    Epoch: 19 Accuracy: 0.973510

In [57]:
# Make predictions for the various sequences using the trained model.
pred_train = sp.viterbi_decode_corpus(train_seq)
pred_dev = sp.viterbi_decode_corpus(dev_seq)
pred_test = sp.viterbi_decode_corpus(test_seq)

# Evaluate and print accuracies
eval_train = sp.evaluate_corpus(train_seq, pred_train)
eval_dev = sp.evaluate_corpus(dev_seq, pred_dev)
eval_test = sp.evaluate_corpus(test_seq, pred_test)
print("SP -  Accuracy Train: %.3f Dev: %.3f Test: %.3f" % (eval_train, eval_dev, eval_test))


#### The execution of the previous cell should print

    SP -  Accuracy Train: 0.984 Dev: 0.835 Test: 0.840


### Part 2 Run the structured perceptron with the extended feature_mapper_ext



In [60]:
# Build the extended feature set
import lxmls.sequences.extended_feature as exfc
feature_mapper_ext = exfc.ExtendedFeatures(train_seq)
feature_mapper_ext.build_features()

In [61]:
import lxmls.sequences.structured_perceptron as spc

sp_ext = spc.StructuredPerceptron(corpus.word_dict, corpus.tag_dict, feature_mapper_ext)
sp_ext.num_epochs = 20

In [62]:
sp_ext.train_supervised(train_seq)


#### The execution of the previous cell should print
    Epoch: 0 Accuracy: 0.764386
    Epoch: 1 Accuracy: 0.872701
    Epoch: 2 Accuracy: 0.903458
    Epoch: 3 Accuracy: 0.927594
    Epoch: 4 Accuracy: 0.938484
    Epoch: 5 Accuracy: 0.951141
    Epoch: 6 Accuracy: 0.949816
    Epoch: 7 Accuracy: 0.959529
    Epoch: 8 Accuracy: 0.957616
    Epoch: 9 Accuracy: 0.962325
    Epoch: 10 Accuracy: 0.961148
    Epoch: 11 Accuracy: 0.970567
    Epoch: 12 Accuracy: 0.968212
    Epoch: 13 Accuracy: 0.973216
    Epoch: 14 Accuracy: 0.974393
    Epoch: 15 Accuracy: 0.973951
    Epoch: 16 Accuracy: 0.976600
    Epoch: 17 Accuracy: 0.977483
    Epoch: 18 Accuracy: 0.974834
    Epoch: 19 Accuracy: 0.977042


In [63]:
# Make predictions for the various sequences using the trained model.
pred_train = sp_ext.viterbi_decode_corpus(train_seq)
pred_dev = sp_ext.viterbi_decode_corpus(dev_seq)
pred_test = sp_ext.viterbi_decode_corpus(test_seq)

# Evaluate and print accuracies
eval_train = sp_ext.evaluate_corpus(train_seq, pred_train)
eval_dev = sp_ext.evaluate_corpus(dev_seq, pred_dev)
eval_test = sp_ext.evaluate_corpus(test_seq, pred_test)
print("SP_ext -  Accuracy Train: %.3f Dev: %.3f Test: %.3f" % (eval_train, eval_dev, eval_test))


#### The execution of the previous cell should print
    SP_ext -  Accuracy Train: 0.984 Dev: 0.888 Test: 0.890




#### Summary


| Model   | Train acc | Dev acc | Test acc |
| --------| --------- | ------- |--------- |
| crf     |  0.949    | 0.846   |   0.858  |
| crf_ext |  0.984    | 0.899   |   0.894  |
| sp      |  0.984    | 0.835   |   0.840  |
| sp_ext  |  0.984    | 0.888   |   0.890  |



CRF -  Accuracy Train: 0.949 Dev: 0.846 Test: 0.858

CRF_ext -  Accuracy Train: 0.984 Dev: 0.899 Test: 0.894

SP -  Accuracy Train: 0.984 Dev: 0.835 Test: 0.840

SP_ext -  Accuracy Train: 0.984 Dev: 0.888 Test: 0.890