# HW 4 - Neural POS Tagger

In this exercise, you are going to build a set of deep learning models for part-of-speech (POS) tagging using TensorFlow and Keras. TensorFlow is a deep learning framework developed by Google, and Keras is a front-end library built on top of TensorFlow (or Theano, CNTK) that provides an easier way to use standard layers and networks.

To complete this exercise, you will need to build deep learning models for POS tagging in Thai using NECTEC's ORCHID corpus. You will build one model for each of the following types:

- Neural POS Tagging with Word Embedding using Fixed / non-Fixed Pretrained weights
- Neural POS Tagging with Viterbi / Marginal CRF

A pretrained word embedding is already provided for you (albeit a very poor one). Optionally, you can use your best pretrained word embedding from the previous exercise.

We also provide code for data cleaning and preprocessing, plus some Keras starter code, in this notebook, but feel free to modify those parts to suit your needs. You can also complete this exercise using only TensorFlow (without Keras). Feel free to use additional libraries (e.g. scikit-learn) as long as you have a model for each type mentioned above.

### Don't forget to shut down your instance on Gcloud when you are not using it ###

## 1. Setup and Preprocessing

We use POS data from [ORCHID corpus](https://www.nectec.or.th/corpus/index.php?league=pm), which is a POS corpus for Thai language.
A function that reads the corpus into a list of sentences of (word, POS) pairs has already been implemented; example usage is shown below.
We also create a random word vector for unknown words.

In [1]:
from data.orchid_corpus import get_sentences
import numpy as np
import numpy.random
import keras.preprocessing
np.random.seed(42)

Using TensorFlow backend.


In [2]:
unk_emb =np.random.randn(32)
train_data = get_sentences('train')
test_data = get_sentences('test')
print(train_data[0])

[('การ', 'FIXN'), ('ประชุม', 'VACT'), ('ทาง', 'NCMN'), ('วิชาการ', 'NCMN'), ('<space>', 'PUNC'), ('ครั้ง', 'CFQC'), ('ที่ 1', 'DONM')]


Next, we load pretrained weight embedding using pickle. The pretrained weight is a dictionary which map a word to its embedding.

In [1]:
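import pickle

# The pretrained embedding is a dict that maps each word (a string) to its vector.
with open('basic_ff_embedding.pt', 'rb') as fp:
    embeddings = pickle.load(fp)
# embeddings[0] would raise a KeyError because the keys are words, so show one entry instead.
print(next(iter(embeddings.items())))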


The code below builds an indexed dataset (each word is represented by an integer) for the training and test data. Index 0 is reserved for padding, which is needed to handle variable-length sequences. (You can read more about padding here: https://machinelearningmastery.com/data-preparation-variable-length-input-sequences-sequence-prediction/)
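As a minimal, self-contained sketch of the indexing idea (toy sentences; `build_index` is a hypothetical helper, not part of the starter code), index 0 is never assigned to a word, and in this sketch the unknown-word token gets its own fresh index:

```python
def build_index(sentences):
    """Map each distinct word to an index starting at 1 (0 stays free for padding)."""
    word_to_idx = {}
    for sentence in sentences:
        for word, _pos in sentence:
            if word not in word_to_idx:
                word_to_idx[word] = len(word_to_idx) + 1
    # Give the unknown-word token an index no real word uses.
    word_to_idx['UNK'] = len(word_to_idx) + 1
    return word_to_idx

toy = [[('การ', 'FIXN'), ('ประชุม', 'VACT')],
       [('ประชุม', 'VACT'), ('ครั้ง', 'CFQC')]]
idx = build_index(toy)
print(idx)  # {'การ': 1, 'ประชุม': 2, 'ครั้ง': 3, 'UNK': 4}
```

With this scheme the largest index equals `len(word_to_idx)`, so an embedding table would need `len(word_to_idx) + 1` rows to cover index 0 as well.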

## 2. Prepare Data

In [2]:
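word_to_idx = {}
idx_to_word = {}
label_to_idx = {}
for sentence in train_data:
    for word, pos in sentence:
        if word not in word_to_idx:
            # word indices start at 1; 0 is reserved for padding
            word_to_idx[word] = len(word_to_idx) + 1
            idx_to_word[word_to_idx[word]] = word
        if pos not in label_to_idx:
            # label indices also start at 1; 0 is the padding class
            label_to_idx[pos] = len(label_to_idx) + 1
# NOTE: because indices start at 1, len(word_to_idx) here equals the index of the last
# vocabulary word, so 'UNK' shares that index. It is kept this way so that the
# Embedding(len(word_to_idx), ...) layers below cover every index. Giving UNK its own
# index would need len(word_to_idx) + 1 here and Embedding(len(word_to_idx) + 1, ...).
word_to_idx['UNK'] = len(word_to_idx)

# +1 for the padding class 0
n_classes = len(label_to_idx) + 1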


This section is slightly modified from the demo: word2features returns a word index instead of a feature dictionary, sent2features returns the sequence of word indices in the sentence, and sent2labels returns the sequence of label (POS tag) indices.
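A toy sketch of the same conversion, using small hypothetical dictionaries (`toy_word_to_idx`, `toy_label_to_idx`) so it does not overwrite the notebook's real ones. An unseen word falls back to the UNK index, while labels are looked up directly:

```python
import numpy as np

# Hypothetical toy vocabularies; in the notebook these are built from train_data.
toy_word_to_idx = {'การ': 1, 'ประชุม': 2, 'UNK': 3}
toy_label_to_idx = {'FIXN': 1, 'VACT': 2}

def sent_to_word_ids(sent):
    # Words never seen in training map to the UNK index.
    return np.asarray([toy_word_to_idx.get(w, toy_word_to_idx['UNK']) for w, _ in sent])

def sent_to_label_ids(sent):
    # Assumes every tag in the sentence also appears in the training data.
    return np.asarray([toy_label_to_idx[t] for _, t in sent], dtype='int32')

sent = [('การ', 'FIXN'), ('สัมมนา', 'VACT')]   # 'สัมมนา' is not in the toy vocabulary
print(sent_to_word_ids(sent))   # [1 3]
print(sent_to_label_ids(sent))  # [1 2]
```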

In [3]:
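def word2features(sent, i, emb):
    # Return the index of the i-th word; words unseen in training map to 'UNK'.
    # (emb is unused; it is kept so the signature matches the demo.)
    word = sent[i][0]
    return word_to_idx.get(word, word_to_idx['UNK'])

def sent2features(sent, emb_dict):
    # Sequence of word indices for the whole sentence.
    return np.asarray([word2features(sent, i, emb_dict) for i in range(len(sent))])

def sent2labels(sent):
    # Sequence of POS-label indices for the whole sentence.
    return np.asarray([label_to_idx[label] for (word, label) in sent], dtype='int32')

def sent2tokens(sent):
    return [word for (word, label) in sent]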

In [4]:
sent2features(train_data[100], embeddings)


In [None]:
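# Map the indices back to words as a sanity check.
print([idx_to_word[i] for i in sent2features(train_data[100], embeddings)])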

Next, we create the training and test datasets, then use Keras to post-pad every sequence with 0 up to the maximum sequence length (102). The training labels are converted to one-hot vectors.
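To show what the padding and one-hot steps do, here is a NumPy-only sketch (not the Keras implementation; `post_pad` and `one_hot` are hypothetical helpers). It mirrors `padding='post'` and `truncating='pre'`: tokens go on the left, zeros fill the right, and overly long sequences lose items from the front:

```python
import numpy as np

def post_pad(seqs, maxlen, value=0):
    """Right-pad each sequence with `value`; keep only the last `maxlen` items of long ones."""
    out = np.full((len(seqs), maxlen), value, dtype='int32')
    for i, s in enumerate(seqs):
        s = list(s)
        if len(s) > maxlen:
            s = s[len(s) - maxlen:]    # like truncating='pre': drop items from the front
        out[i, :len(s)] = s            # tokens on the left, padding stays on the right
    return out

def one_hot(labels, n_classes):
    # Indexing an identity matrix with an int array appends a one-hot axis.
    return np.eye(n_classes)[labels]

y = post_pad([[3, 1], [2]], maxlen=4)
print(y)
# [[3 1 0 0]
#  [2 0 0 0]]
print(one_hot(y, 4).shape)  # (2, 4, 4)
```

Padding positions become one-hot vectors for class 0, which is why `n_classes` reserves one extra class.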

In [7]:
%%time
x_train = [sent2features(sent, embeddings) for sent in train_data]
y_train = [sent2labels(sent) for sent in train_data]
x_test = [sent2features(sent, embeddings) for sent in test_data]
y_test = [sent2labels(sent) for sent in test_data]

CPU times: user 357 ms, sys: 37 µs, total: 357 ms
Wall time: 357 ms


In [8]:
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=None, dtype='int32', padding='post', truncating='pre', value=0)
y_train = keras.preprocessing.sequence.pad_sequences(y_train, maxlen=None, dtype='int32', padding='post', truncating='pre', value=0)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=102, dtype='int32', padding='post', truncating='pre', value=0)
# One-hot encode the training labels: indexing an identity matrix with the
# (n_sentences, 102) label array gives shape (n_sentences, 102, n_classes).
y_train = np.eye(n_classes)[y_train]

In [9]:
print(x_train[100],x_train.shape)
print(y_train[100][3],y_train.shape)

[ 29 327   5 328   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0] (18500, 102)
[ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.] (18500, 102, 48)


## 3. Evaluate

The output from Keras is a probability distribution over all possible labels at each position. outputToLabel returns the index of the most probable label at each position, up to the true sentence length (padding positions are dropped).
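A vectorized NumPy equivalent of this decoding step, shown on a toy 3-position, 3-class output (`output_to_label` is a hypothetical name, not the notebook's function):

```python
import numpy as np

def output_to_label(probs, seq_len):
    # probs: (max_len, n_classes). Keep only the real tokens, then take the argmax per position.
    return np.argmax(probs[:seq_len], axis=-1).tolist()

probs = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.1, 0.3],
                  [0.9, 0.05, 0.05]])   # the last row plays the role of a padding position
print(output_to_label(probs, 2))  # [1, 0]
```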

evaluation_report is the same as in the demo.

In [10]:
def outputToLabel(yt,seq_len):
    out = []
    for i in range(0,len(yt)):
        if(i==seq_len):
            break
        out.append(np.argmax(yt[i]))
    return out

In [11]:
import pandas as pd
from IPython.display import display

def evaluation_report(y_true, y_pred):
    # collect every tag that appears in y_true or y_pred
    tag_set = set()
    for sent in y_true:
        for tag in sent:
            tag_set.add(tag)
    for sent in y_pred:
        for tag in sent:
            tag_set.add(tag)
    tag_list = sorted(list(tag_set))
    
    # count correct points
    tag_info = dict()
    for tag in tag_list:
        tag_info[tag] = {'correct_tagged': 0, 'y_true': 0, 'y_pred': 0}

    all_correct = 0
    all_count = sum([len(sent) for sent in y_true])
    for sent_true, sent_pred in zip(y_true, y_pred):
        for tag_true, tag_pred in zip(sent_true, sent_pred):
            if tag_true == tag_pred:
                tag_info[tag_true]['correct_tagged'] += 1
                all_correct += 1
            tag_info[tag_true]['y_true'] += 1
            tag_info[tag_pred]['y_pred'] += 1
    accuracy = (all_correct / all_count) * 100
            
    # summarize and make evaluation result
    eval_list = list()
    for tag in tag_list:
        eval_result = dict()
        eval_result['tag'] = tag
        eval_result['correct_count'] = tag_info[tag]['correct_tagged']
        precision = (tag_info[tag]['correct_tagged']/tag_info[tag]['y_pred'])*100 if tag_info[tag]['y_pred'] else '-'
        recall = (tag_info[tag]['correct_tagged']/tag_info[tag]['y_true'])*100 if (tag_info[tag]['y_true'] > 0) else 0
        eval_result['precision'] = precision
        eval_result['recall'] = recall
        eval_result['f_score'] = (2*precision*recall)/(precision+recall) if (type(precision) is float and recall > 0) else '-'
        
        eval_list.append(eval_result)

    eval_list.append({'tag': 'accuracy=%.2f' % accuracy, 'correct_count': '', 'precision': '', 'recall': '', 'f_score': ''})
    
    df = pd.DataFrame.from_dict(eval_list)
    df = df[['tag', 'precision', 'recall', 'f_score', 'correct_count']]
    display(df)
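For reference, the per-tag scores computed by `evaluation_report` correspond to the formulas below. Here $c_t$ is `correct_tagged`, $p_t$ is the `y_pred` count and $n_t$ is the `y_true` count for tag $t$, and $N$ is the total number of gold tokens:

$$
P_t = 100\cdot\frac{c_t}{p_t}, \qquad R_t = 100\cdot\frac{c_t}{n_t}, \qquad F_t = \frac{2\,P_t R_t}{P_t + R_t}, \qquad \text{accuracy} = 100\cdot\frac{\sum_t c_t}{N}
$$

As a worked check, for tag 1 of the example model in 4.1.1, $2 \cdot 99.8092 \cdot 99.3487 / (99.8092 + 99.3487) \approx 99.578$, which matches the reported f_score of 99.5784. When $p_t = 0$, the function reports precision and F1 as '-' instead.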

## 4. Train a model

In [12]:
from keras.models import Sequential, Model
from keras.layers import Embedding, Reshape, Activation, Input, Dense, GRU, TimeDistributed, Bidirectional, Dropout, Masking
from keras_contrib.layers import CRF
from keras.optimizers import Adam

The models in this section are divided into two groups:

- Neural POS Tagger (4.1)
- Neural CRF POS Tagger (4.2)

## 4.1.1 Neural POS Tagger  (Example)

We create a simple neural POS tagger as an example for you. This model doesn't use any pretrained word embedding, so it needs an Embedding layer to learn the word embedding from scratch.
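Conceptually, an Embedding layer is a lookup table: each word index selects one row of a weight matrix that is learned together with the rest of the network. The NumPy sketch below (toy sizes and random weights; not Keras's implementation) shows the lookup and the boolean mask that `mask_zero=True` derives from padding index 0:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 5, 4                      # toy sizes; the notebook uses len(word_to_idx) and 32
E = rng.normal(size=(vocab_size, dim))      # the (trainable) embedding matrix
x = np.array([[3, 1, 4, 0, 0]])             # one post-padded sentence of word indices

vectors = E[x]                              # lookup = row indexing -> shape (1, 5, 4)
mask = x != 0                               # positions later layers should treat as real tokens
print(vectors.shape, mask.tolist())         # (1, 5, 4) [[True, True, True, False, False]]
```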

In [13]:
model = Sequential()
model.add(Embedding(len(word_to_idx),32,input_length=102,mask_zero=True))
model.add(Bidirectional(GRU(32, return_sequences=True)))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(n_classes,activation='softmax')))
model.summary()
adam  = Adam(lr=0.001)
model.compile(optimizer=adam,  loss='categorical_crossentropy', metrics=['categorical_accuracy'])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 102, 32)           480608    
_________________________________________________________________
bidirectional_1 (Bidirection (None, 102, 64)           12480     
_________________________________________________________________
dropout_1 (Dropout)          (None, 102, 64)           0         
_________________________________________________________________
time_distributed_1 (TimeDist (None, 102, 48)           3120      
Total params: 496,208
Trainable params: 496,208
Non-trainable params: 0
_________________________________________________________________


In [None]:
%%time
model.fit(x_train,y_train,batch_size=64,epochs=10,verbose=1)

Epoch 1/10

In [14]:
%%time
#model.save_weights('/data/my_pos_no_crf.h5')
model.load_weights('/data/my_pos_no_crf.h5')
y_pred=model.predict(x_test)
ypred = [outputToLabel(y_pred[i],len(y_test[i])) for i in range(len(y_pred))]
evaluation_report(y_test, ypred)

| | tag | precision | recall | f_score | correct_count |
|---|---|---|---|---|---|
| 0 | 1 | 99.8092 | 99.3487 | 99.5784 | 3661.0 |
| 1 | 2 | 94.7592 | 94.4835 | 94.6212 | 7793.0 |
| 2 | 3 | 91.066 | 96.5066 | 93.7074 | 16299.0 |
| 3 | 4 | 99.9689 | 99.3654 | 99.6662 | 12840.0 |
| 4 | 5 | 91.6667 | 98.5075 | 94.964 | 66.0 |
| 5 | 6 | 99.7817 | 87.5479 | 93.2653 | 457.0 |
| 6 | 7 | 97.6374 | 97.4026 | 97.5199 | 2025.0 |
| 7 | 8 | 67.1687 | 53.7349 | 59.7055 | 223.0 |
| 8 | 9 | 55.8824 | 61.9565 | 58.7629 | 228.0 |
| 9 | 10 | 62.7866 | 42.4315 | 50.6401 | 356.0 |


CPU times: user 29.6 s, sys: 5.63 s, total: 35.2 s
Wall time: 14.3 s


## 4.1.2 Neural POS Tagger - Fixed Weight

### #TODO 1
We would like you to create a neural POS tagger model in Keras that takes the pretrained word embedding as input. The word embedding should stay fixed during training. To finish this exercise, you must train the model and show its evaluation report, as in the example.

(You may want to read about Keras's Masking layer)

Optionally, you can use your own pretrained word embedding from the previous homework.
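To see what "fixed" means in the model summaries, the parameter counts can be reproduced by hand. This sketch assumes Keras 2's default GRU (`reset_after=False`: three gates, each with an input kernel, a recurrent kernel and a bias), which is consistent with the 12,480 parameters reported for the bidirectional layer; the vocabulary size 15,019 is inferred from 480,608 / 32. `gru_params` is a hypothetical helper:

```python
def gru_params(input_dim, units):
    # Assumes Keras 2's GRU with reset_after=False: 3 gates, each with an
    # input kernel (input_dim x units), a recurrent kernel (units x units) and a bias.
    return 3 * (units * (input_dim + units) + units)

vocab_size, emb_dim, gru_units, n_tags = 15019, 32, 32, 48   # 480,608 / 32 = 15,019 rows

embedding = vocab_size * emb_dim                 # 480,608
bi_gru = 2 * gru_params(emb_dim, gru_units)      # 12,480 (forward + backward GRU)
dense = (2 * gru_units) * n_tags + n_tags        # 3,120 (TimeDistributed Dense)

print(embedding + bi_gru + dense)   # 496208: total parameters
print(bi_gru + dense)               # 15600: trainable when the embedding is frozen
```

Freezing the embedding (`trainable=False`) moves its 480,608 weights into the non-trainable count, leaving only the recurrent and output layers to learn.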

In [None]:
# Write your code here (insert the embedding)

In [18]:
embeddings

{'(SAGD': array([-0.02339406, -0.0489365 , -0.00072291, -0.03499512,  0.03551661,
         0.0214556 , -0.0233124 ,  0.0256612 , -0.01898398, -0.02451582,
         0.00913509, -0.02366463,  0.01170618,  0.01845349,  0.01920917,
         0.04426711, -0.03166447, -0.03189739, -0.00567032, -0.00328954,
        -0.02295108, -0.02727159, -0.04683184, -0.04834228,  0.02301944,
        -0.0399333 ,  0.00095583,  0.01402641, -0.02077529, -0.04142363,
        -0.01050837, -0.03602161], dtype=float32),
 '-Active': array([-0.04077761, -0.03709619,  0.00225024, -0.0318418 ,  0.02005713,
         0.0446962 ,  0.04657872,  0.03855249,  0.00883476, -0.01760205,
        -0.02168959,  0.02865219, -0.0499002 ,  0.01523545,  0.00845388,
        -0.00043755, -0.02248582, -0.02173762, -0.00410794, -0.00378044,
         0.02958771,  0.02232884, -0.02125523,  0.036335  , -0.04377437,
         0.0166716 ,  0.04929546, -0.02609433, -0.03324266,  0.02926985,
         0.0239333 , -0.0311361 ], dtype=float32),
 '

In [19]:
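# Build the embedding matrix: row i holds the pretrained vector of the word with index i.
pre_em = [np.zeros(32)]                      # row 0: padding
for i in range(1, len(idx_to_word) + 1):
    word = idx_to_word[i]
    # words missing from the pretrained dictionary get an all-zero vector
    pre_em.append(embeddings[word] if word in embeddings else np.zeros(32))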

In [20]:
# Write your code here
model = Sequential()
model.add(Embedding(len(word_to_idx),32,input_length=102,mask_zero=True, weights=[np.array(pre_em)], trainable=False))
model.add(Bidirectional(GRU(32, return_sequences=True)))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(n_classes,activation='softmax')))
model.summary()
adam  = Adam(lr=0.001)
model.compile(optimizer=adam,  loss='categorical_crossentropy', metrics=['categorical_accuracy'])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 102, 32)           480608    
_________________________________________________________________
bidirectional_2 (Bidirection (None, 102, 64)           12480     
_________________________________________________________________
dropout_2 (Dropout)          (None, 102, 64)           0         
_________________________________________________________________
time_distributed_2 (TimeDist (None, 102, 48)           3120      
Total params: 496,208
Trainable params: 15,600
Non-trainable params: 480,608
_________________________________________________________________


In [21]:
%%time
model.fit(x_train,y_train,batch_size=64,epochs=10,verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
CPU times: user 55min 47s, sys: 8min 57s, total: 1h 4min 44s
Wall time: 20min 11s


<keras.callbacks.History at 0x7f5a0e464780>

In [22]:
%%time
model.save_weights('/data/my_pos_fix_weight.h5')
#model.load_weights('/data/my_pos_fix_weight.h5')
y_pred=model.predict(x_test)
ypred = [outputToLabel(y_pred[i],len(y_test[i])) for i in range(len(y_pred))]
evaluation_report(y_test, ypred)

| | tag | precision | recall | f_score | correct_count |
|---|---|---|---|---|---|
| 0 | 1 | 93.4301 | 99.5658 | 96.4004 | 3669.0 |
| 1 | 2 | 66.025 | 61.5422 | 63.7048 | 5076.0 |
| 2 | 3 | 55.2566 | 61.5253 | 58.2227 | 10391.0 |
| 3 | 4 | 60.6494 | 88.3222 | 71.9156 | 11413.0 |
| 4 | 5 | - | 0.0 | - | 0.0 |
| 5 | 6 | 68.1818 | 2.87356 | 5.51471 | 15.0 |
| 6 | 7 | 93.2432 | 82.9726 | 87.8086 | 1725.0 |
| 7 | 8 | 38.5417 | 8.91566 | 14.4814 | 37.0 |
| 8 | 9 | 30.4348 | 7.6087 | 12.1739 | 28.0 |
| 9 | 10 | - | 0.0 | - | 0.0 |


CPU times: user 43.8 s, sys: 7.15 s, total: 50.9 s
Wall time: 17.7 s


## 4.1.3 Neural POS Tagger - Trainable pretrained weight

### #TODO 2
We would like you to create a neural POS tagger model in Keras that takes the pretrained word embedding as input. However, this time the word embedding is trainable (not fixed). To finish this exercise, you must train the model and show its evaluation report, as in the example.

Please note that the given pretrained word embedding only has weights for the vocabulary of the BEST corpus from the previous homework.
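Because ORCHID words that are missing from the pretrained dictionary get an all-zero row in `pre_em`, it can help to measure how much of the tagger's vocabulary is actually covered. A minimal sketch with toy data (`embedding_coverage` is a hypothetical helper, and the dictionaries below are made up):

```python
def embedding_coverage(word_to_idx, embeddings):
    """Fraction of the tagger vocabulary (excluding 'UNK') that has a pretrained vector."""
    words = [w for w in word_to_idx if w != 'UNK']
    covered = sum(1 for w in words if w in embeddings)
    return covered / len(words) if words else 0.0

# Toy example: 2 of 3 ORCHID-style words appear in a (hypothetical) BEST-based embedding dict.
vocab = {'การ': 1, 'ประชุม': 2, '<space>': 3, 'UNK': 4}
emb = {'การ': [0.1, 0.2], 'ประชุม': [0.3, 0.4], 'ภาษา': [0.5, 0.6]}
print(embedding_coverage(vocab, emb))  # 0.6666666666666666
```

In the notebook, the same call would be `embedding_coverage(word_to_idx, embeddings)`.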

Optionally, you can use your own pretrained word embedding from the previous homework.

In [None]:
# Write your code here

In [23]:
# Write your code here (trainable embedding)
model = Sequential()
model.add(Embedding(len(word_to_idx),32,input_length=102,mask_zero=True, weights=[np.array(pre_em)], trainable=True))
model.add(Bidirectional(GRU(32, return_sequences=True)))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(n_classes,activation='softmax')))
model.summary()
adam  = Adam(lr=0.001)
model.compile(optimizer=adam,  loss='categorical_crossentropy', metrics=['categorical_accuracy'])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 102, 32)           480608    
_________________________________________________________________
bidirectional_3 (Bidirection (None, 102, 64)           12480     
_________________________________________________________________
dropout_3 (Dropout)          (None, 102, 64)           0         
_________________________________________________________________
time_distributed_3 (TimeDist (None, 102, 48)           3120      
Total params: 496,208
Trainable params: 496,208
Non-trainable params: 0
_________________________________________________________________


In [24]:
%%time
model.fit(x_train,y_train,batch_size=64,epochs=10,verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
CPU times: user 53min 44s, sys: 8min 42s, total: 1h 2min 26s
Wall time: 19min 35s


<keras.callbacks.History at 0x7f5a0e464588>

In [25]:
%%time
model.save_weights('/data/my_pos_nonfix_weight.h5')
#model.load_weights('/data/my_pos_nonfix_weight.h5')
y_pred=model.predict(x_test)
ypred = [outputToLabel(y_pred[i],len(y_test[i])) for i in range(len(y_pred))]
evaluation_report(y_test, ypred)

| | tag | precision | recall | f_score | correct_count |
|---|---|---|---|---|---|
| 0 | 1 | 99.8911 | 99.5929 | 99.7418 | 3670.0 |
| 1 | 2 | 94.8165 | 94.92 | 94.8682 | 7829.0 |
| 2 | 3 | 90.8743 | 96.9329 | 93.8059 | 16371.0 |
| 3 | 4 | 99.9379 | 99.6208 | 99.7791 | 12873.0 |
| 4 | 5 | 91.6667 | 98.5075 | 94.964 | 66.0 |
| 5 | 6 | 99.5708 | 88.8889 | 93.9271 | 464.0 |
| 6 | 7 | 96.8556 | 97.7874 | 97.3193 | 2033.0 |
| 7 | 8 | 72.2603 | 50.8434 | 59.6888 | 211.0 |
| 8 | 9 | 74.2857 | 63.587 | 68.5212 | 234.0 |
| 9 | 10 | 62.9696 | 41.9547 | 50.3577 | 352.0 |


CPU times: user 46.6 s, sys: 7.58 s, total: 54.2 s
Wall time: 18.7 s


### #TODO 3
Compare the results of all neural tagger models in 4.1.x and provide a convincing reason and example for their results (which model performs best or worst, and why?)

(If you use your own weight please state so in the answer)

<b>Write your answer here : </b>
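The model that uses the pretrained embedding with trainable weights (4.1.3) performs best. Starting from pretrained vectors gives the network a better initialization than random values, so it is ahead of the example model with randomly initialized embeddings (4.1.1) on most of the tags shown: for example, the F1 score rises from 58.76 to 68.52 for tag 9 and from 94.62 to 94.87 for tag 2.

The model with the fixed embedding layer (4.1.2) performs worst. As required by TODO 1, its embedding weights do not change during training, so the tagging loss cannot adapt the word vectors to POS tagging. In addition, the given embedding is poor and only covers the BEST vocabulary, so every ORCHID word missing from it is mapped to the same all-zero vector and becomes indistinguishable from the other missing words. The report shows the effect: recall for tag 6 drops from 88.89 (4.1.3) to 2.87, and tags 5 and 10 are never predicted (recall 0.0).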

## 4.2.1 CRF Viterbi

Your next two tasks are to incorporate conditional random fields (CRF) into your model. <b>You do not need to use pretrained weights</b>.

A CRF layer for Keras has already been implemented; however, it lives in keras-contrib, the official extension repository for the Keras library. You should read about the keras-contrib CRF layer before attempting this section.
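As a sanity check on the layer's size, and assuming keras-contrib's CRF keeps an input kernel, a tag-transition ("chain") matrix, a bias vector and left/right boundary vectors (its defaults `use_bias=True`, `use_boundary=True`), the parameter count can be reproduced by hand. With 48 input features and 48 tags it gives the 4,752 shown for the CRF layer in the model summaries of this section. `crf_params` is a hypothetical helper:

```python
def crf_params(input_dim, n_tags, use_bias=True, use_boundary=True):
    kernel = input_dim * n_tags                     # emission projection from the previous layer
    chain = n_tags * n_tags                         # tag-to-tag transition scores
    bias = n_tags if use_bias else 0
    boundary = 2 * n_tags if use_boundary else 0    # start (left) and end (right) scores
    return kernel + chain + bias + boundary

print(crf_params(48, 48))  # 4752
```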

### #TODO 4
Use the keras-contrib CRF layer in your model. You should set the layer parameters so that it gives the best test performance using the <b>Viterbi algorithm</b>. Your model must use the CRF for both the loss function and the metric. A CRF is considerably more complex than the previous example model, so you should train it for more epochs so that it can converge.

To finish this exercise, you must train the model and show its evaluation report, as in the example.

Do not forget to save this model's weights.
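To make the Viterbi setting concrete, here is a small NumPy sketch of Viterbi decoding for a linear-chain CRF, given per-position emission scores and a tag-transition matrix. It illustrates the algorithm only; it is not keras-contrib's implementation, and `viterbi_decode` and the toy scores are hypothetical:

```python
import numpy as np

def viterbi_decode(emissions, transitions, start=None, end=None):
    """Highest-scoring tag path for one sentence.

    emissions:   (T, K) per-position tag scores
    transitions: (K, K) score of moving from tag i to tag j
    start, end:  optional (K,) boundary scores
    """
    T, K = emissions.shape
    score = emissions[0] + (start if start is not None else 0)
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j] = best score ending in tag i at t-1, then moving to tag j at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    if end is not None:
        score = score + end
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):   # follow the back-pointers from the last position
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

em = np.array([[2.0, 1.0], [0.0, 0.5], [1.0, 0.0]])
tr = np.array([[0.0, -2.0], [-2.0, 0.0]])   # switching tags is penalized
print(viterbi_decode(em, tr))        # [0, 0, 0]
print(em.argmax(axis=1).tolist())    # [0, 1, 0] (ignoring transitions)
```

Because of the transition penalty, the best whole sequence keeps tag 0 throughout, whereas picking the highest emission score at each position alone would switch to tag 1 in the middle.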

In [26]:
# Write your code here (replace the softmax output)
from keras_contrib.layers import CRF
from keras.callbacks import ReduceLROnPlateau
from keras import regularizers

In [63]:
model = Sequential()
model.add(Embedding(len(word_to_idx),32,input_length=102,mask_zero=True))
model.add(Bidirectional(GRU(32, return_sequences=True)))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(n_classes,activation = 'relu')))
# learn_mode='join' trains with the CRF log-likelihood; test_mode='viterbi' decodes
# the single best tag sequence (keras-contrib's default test mode for 'join').
crf = CRF(n_classes, learn_mode='join', test_mode='viterbi', sparse_target=False)
model.add(crf)
model.summary()
adam  = Adam(lr=0.001)
model.compile(optimizer=adam,  loss=crf.loss_function, metrics=[crf.accuracy])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_14 (Embedding)     (None, 102, 32)           480608    
_________________________________________________________________
bidirectional_14 (Bidirectio (None, 102, 64)           12480     
_________________________________________________________________
dropout_14 (Dropout)         (None, 102, 64)           0         
_________________________________________________________________
time_distributed_13 (TimeDis (None, 102, 48)           3120      
_________________________________________________________________
crf_10 (CRF)                 (None, 102, 48)           4752      
Total params: 500,960
Trainable params: 500,960
Non-trainable params: 0
_________________________________________________________________


In [64]:
%%time
model.fit(x_train,y_train,batch_size=128,epochs=15,verbose=1,shuffle =True,validation_split=0.1)

Train on 16650 samples, validate on 1850 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
CPU times: user 50min 16s, sys: 7min 14s, total: 57min 31s
Wall time: 18min 40s


<keras.callbacks.History at 0x7f5a09a58128>

In [66]:
%%time
model.fit(x_train,y_train,batch_size=256,epochs=5,verbose=1,shuffle =True,validation_split=0.1)

Train on 16650 samples, validate on 1850 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
CPU times: user 9min 7s, sys: 1min 16s, total: 10min 24s
Wall time: 3min 22s


<keras.callbacks.History at 0x7f5a09acc1d0>

In [71]:
%%time
#model.save_weights('/data/CRF_weight.h5')
model.load_weights('/data/CRF_weight.h5')
y_pred=model.predict(x_test)
ypred = [outputToLabel(y_pred[i],len(y_test[i])) for i in range(len(y_pred))]
evaluation_report(y_test, ypred)

| | tag | precision | recall | f_score | correct_count |
|---|---|---|---|---|---|
| 0 | 1 | 99.8912 | 99.6201 | 99.7554 | 3671.0 |
| 1 | 2 | 94.3092 | 94.435 | 94.3721 | 7789.0 |
| 2 | 3 | 91.7714 | 95.0915 | 93.402 | 16060.0 |
| 3 | 4 | 99.9067 | 99.4583 | 99.682 | 12852.0 |
| 4 | 5 | 81.4815 | 98.5075 | 89.1892 | 66.0 |
| 5 | 6 | 98.9177 | 87.5479 | 92.8862 | 457.0 |
| 6 | 7 | 98.2893 | 96.7292 | 97.503 | 2011.0 |
| 7 | 8 | 58.7859 | 44.3373 | 50.5495 | 184.0 |
| 8 | 9 | 69.4051 | 66.5761 | 67.9612 | 245.0 |
| 9 | 10 | 60.8819 | 51.0131 | 55.5123 | 428.0 |


CPU times: user 1min 1s, sys: 10 s, total: 1min 11s
Wall time: 25.1 s


## 4.2.2 CRF Marginal

### #TODO 5

Use the keras-contrib CRF layer in your model. You should set the layer parameters so that it gives the best test performance using <b>marginal probabilities</b>. You <b>must not train the model</b> from scratch; instead, start from the pretrained weights of the previous CRF Viterbi model.

To finish this exercise, you must train the model and show its evaluation report, as in the example.
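For comparison with the Viterbi setting, the NumPy sketch below computes the per-position marginal probabilities P(y_t = k | x) that marginal decoding is based on, using the forward-backward algorithm in log space (no boundary scores, for brevity). As we understand it, with `test_mode='marginal'` the layer outputs such per-position probabilities and the notebook's `outputToLabel` then takes the argmax at each position. `crf_marginals` and `logsumexp` are hypothetical helpers, not keras-contrib code:

```python
import numpy as np

def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def crf_marginals(emissions, transitions):
    """Per-position tag marginals of a linear-chain CRF via forward-backward."""
    T, K = emissions.shape
    alpha = np.zeros((T, K))   # log-score of all prefixes ending in each tag
    beta = np.zeros((T, K))    # log-score of all suffixes starting after each tag
    alpha[0] = emissions[0]
    for t in range(1, T):
        # alpha[t, j] = log sum_i exp(alpha[t-1, i] + transitions[i, j]) + emissions[t, j]
        alpha[t] = logsumexp(alpha[t - 1][:, None] + transitions, axis=0) + emissions[t]
    for t in range(T - 2, -1, -1):
        # beta[t, i] = log sum_j exp(transitions[i, j] + emissions[t+1, j] + beta[t+1, j])
        beta[t] = logsumexp(transitions + (emissions[t + 1] + beta[t + 1])[None, :], axis=1)
    log_z = logsumexp(alpha[-1], axis=0)   # log partition function
    return np.exp(alpha + beta - log_z)

em = np.array([[2.0, 1.0], [0.0, 0.5], [1.0, 0.0]])
tr = np.array([[0.0, -2.0], [-2.0, 0.0]])
marg = crf_marginals(em, tr)
print(marg.sum(axis=1))               # each row sums to 1
print(marg.argmax(axis=1).tolist())   # [0, 0, 0]
```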

In [None]:
# Write your code here

In [72]:
model = Sequential()
model.add(Embedding(len(word_to_idx),32,input_length=102,mask_zero=True))
model.add(Bidirectional(GRU(32, return_sequences=True)))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(n_classes,activation = 'relu')))
# model.add(Masking(mask_value=0.))
crf = CRF (n_classes,
                 learn_mode='marginal',
                 test_mode= 'marginal',
                 sparse_target=False,
                 use_boundary=True,
                 use_bias=True,
                 activation='linear',
                 kernel_initializer='glorot_uniform',
                 chain_initializer='orthogonal',
                 bias_initializer='zeros',
                 boundary_initializer='zeros',
                 kernel_regularizer=None,
                 chain_regularizer=None,
                 boundary_regularizer=None,
                 bias_regularizer=None,
                 kernel_constraint=None,
                 chain_constraint=None,
                 boundary_constraint=None,
                 bias_constraint=None,
                 input_dim=None,
                 unroll=False
          )
model.add(crf)
model.summary()
adam  = Adam(lr=0.001)
model.compile(optimizer=adam,  loss=crf.loss_function, metrics=[crf.accuracy])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_15 (Embedding)     (None, 102, 32)           480608    
_________________________________________________________________
bidirectional_15 (Bidirectio (None, 102, 64)           12480     
_________________________________________________________________
dropout_15 (Dropout)         (None, 102, 64)           0         
_________________________________________________________________
time_distributed_14 (TimeDis (None, 102, 48)           3120      
_________________________________________________________________
crf_11 (CRF)                 (None, 102, 48)           4752      
Total params: 500,960
Trainable params: 500,960
Non-trainable params: 0
_________________________________________________________________


In [73]:
%%time
model.load_weights('/data/CRF_weight.h5')
model.fit(x_train,y_train,batch_size=256,epochs=10,verbose=1,shuffle =True,validation_split=0.1)

Train on 16650 samples, validate on 1850 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
CPU times: user 18min 50s, sys: 2min 49s, total: 21min 39s
Wall time: 7min 49s


In [76]:
%%time
model.fit(x_train,y_train,batch_size=256,epochs=10,verbose=1,shuffle =True,validation_split=0.1)

Train on 16650 samples, validate on 1850 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
CPU times: user 19min 22s, sys: 2min 52s, total: 22min 15s
Wall time: 7min 55s


<keras.callbacks.History at 0x7f5a08bd8128>

In [81]:
%%time
#model.save_weights('/data/CRF_margin_weight.h5')
model.load_weights('/data/CRF_margin_weight.h5')
y_pred=model.predict(x_test)  
ypred = [outputToLabel(y_pred[i],len(y_test[i])) for i in range(len(y_pred))]
evaluation_report(y_test, ypred)

| | tag | precision | recall | f_score | correct_count |
|---|---|---|---|---|---|
| 0 | 1 | 99.7558 | 99.7829 | 99.7694 | 3677.0 |
| 1 | 2 | 93.169 | 93.9258 | 93.5459 | 7747.0 |
| 2 | 3 | 91.8683 | 93.9843 | 92.9142 | 15873.0 |
| 3 | 4 | 99.9689 | 99.5976 | 99.7829 | 12870.0 |
| 4 | 5 | 84.4156 | 97.0149 | 90.2778 | 65.0 |
| 5 | 6 | 92.126 | 89.6552 | 90.8738 | 468.0 |
| 6 | 7 | 96.9624 | 96.7292 | 96.8457 | 2011.0 |
| 7 | 8 | 59.3103 | 41.4458 | 48.7943 | 172.0 |
| 8 | 9 | 67.147 | 63.3152 | 65.1748 | 233.0 |
| 9 | 10 | 59.6932 | 51.0131 | 55.0129 | 428.0 |


CPU times: user 1min 4s, sys: 11.6 s, total: 1min 15s
Wall time: 27.2 s


### #TODO 6

Please pick the best example that shows the difference between a CRF that uses Viterbi decoding and a CRF that uses marginal probabilities. Compare the results and provide a convincing reason (which model performs better, and why?).

<b>Write your answer here :</b>
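The Viterbi model performs slightly better than the marginal-probability model on most of the tags shown. For example, the F1 score for tag 3 is 93.40 vs 92.91 (16,060 vs 15,873 correct tags), and for tag 9 it is 67.96 vs 65.17. Viterbi decoding uses dynamic programming to find the single highest-scoring tag sequence for the whole sentence, so the learned transition scores between adjacent tags are taken into account jointly. Marginal decoding instead computes the marginal probability of each tag at each position and picks the most probable tag position by position (a greedy, per-position choice), so the resulting sequence can contain adjacent tags that form an unlikely transition, even though each tag is individually probable.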
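This difference can be illustrated with a toy, brute-force example (hypothetical scores, 2 positions and 3 tags, small enough to enumerate every path instead of running Viterbi or forward-backward). The single best path is (1, 1). Decoding each position by its marginal probability picks tag 0 first, because two reasonably good paths start with tag 0, and tag 1 second. The result, (0, 1), uses the heavily penalized 0 → 1 transition:

```python
import itertools
import numpy as np

def path_score(path, emissions, transitions):
    # CRF score of one tag sequence: emission scores plus transition scores.
    s = sum(emissions[t, y] for t, y in enumerate(path))
    s += sum(transitions[a, b] for a, b in zip(path, path[1:]))
    return s

def brute_force_decode(emissions, transitions):
    T, K = emissions.shape
    paths = list(itertools.product(range(K), repeat=T))
    scores = np.array([path_score(p, emissions, transitions) for p in paths])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                       # P(path | x) under the CRF
    viterbi = paths[int(scores.argmax())]      # single best whole sequence
    marginals = np.zeros((T, K))
    for p, pr in zip(paths, probs):            # P(y_t = k | x) = sum over matching paths
        for t, y in enumerate(p):
            marginals[t, y] += pr
    marginal_path = tuple(int(k) for k in marginals.argmax(axis=1))
    return viterbi, marginal_path

emissions = np.zeros((2, 3))
transitions = np.array([[1.5, -5.0, 1.5],
                        [-5.0, 2.0, -5.0],
                        [-5.0, -5.0, -5.0]])
vit, marg = brute_force_decode(emissions, transitions)
print(vit, path_score(vit, emissions, transitions))    # (1, 1) 2.0
print(marg, path_score(marg, emissions, transitions))  # (0, 1) -5.0
```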