# HW 4 - Neural POS Tagger

In this exercise, you are going to build a set of deep learning models for part-of-speech (POS) tagging using TensorFlow and Keras. TensorFlow is a deep learning framework developed by Google, and Keras is a frontend library built on top of TensorFlow (or Theano, CNTK) to provide an easier way to use standard layers and networks.

To complete this exercise, you will need to build deep learning models for POS tagging in Thai using NECTEC's ORCHID corpus. You will build one model for each of the following types:

- Neural POS Tagging with Word Embedding using Fixed / non-Fixed Pretrained weights
- Neural POS Tagging with Viterbi / Marginal CRF

Pretrained word embeddings are already provided for you to use (albeit very bad ones). Optionally, you can use your best pretrained word embeddings from the previous exercise.

We also provide the code for data cleaning and preprocessing, plus some starter code for Keras, in this notebook, but feel free to modify those parts to suit your needs. You can also complete this exercise using only TensorFlow (without Keras). Feel free to use additional libraries (e.g. scikit-learn) as long as you have a model for each type mentioned above.

### Don't forget to shut down your instance on Gcloud when you are not using it ###

## 1. Setup and Preprocessing

We use POS data from [ORCHID corpus](https://www.nectec.or.th/corpus/index.php?league=pm), which is a POS corpus for Thai language.
A method that reads the corpus into a list of sentences of (word, POS) pairs has already been implemented. Example usage is shown below.
We also create a random word vector for unknown words.

In [2]:
from data.orchid_corpus import get_sentences
import numpy as np
import numpy.random
import keras.preprocessing
np.random.seed(42)

Using TensorFlow backend.


In [3]:
unk_emb = np.random.randn(32)
train_data = get_sentences('train')
test_data = get_sentences('test')
print(train_data[0])

[('การ', 'FIXN'), ('ประชุม', 'VACT'), ('ทาง', 'NCMN'), ('วิชาการ', 'NCMN'), ('<space>', 'PUNC'), ('ครั้ง', 'CFQC'), ('ที่ 1', 'DONM')]


Next, we load the pretrained embedding weights using pickle. The pretrained weights are stored as a dictionary that maps a word to its embedding.

In [4]:
import pickle
with open('basic_ff_embedding.pt', 'rb') as fp:
    embeddings = pickle.load(fp)

The code below generates an indexed dataset (each word is represented by a number) for the training and testing data. Index 0 is reserved for padding, which helps with variable-length sequences. (You can read more about padding [here](https://machinelearningmastery.com/data-preparation-variable-length-input-sequences-sequence-prediction/).)
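As a minimal sketch of what post-padding with index 0 does, the NumPy snippet below pads toy index sequences to a common length. The helper name `pad_post` and the toy data are hypothetical; the notebook itself relies on Keras's `pad_sequences`.

```python
import numpy as np

def pad_post(sequences, maxlen=None, value=0):
    """Right-pad integer sequences to a common length (hypothetical helper)."""
    if maxlen is None:
        maxlen = max(len(s) for s in sequences)
    out = np.full((len(sequences), maxlen), value, dtype='int32')
    for row, seq in enumerate(sequences):
        seq = list(seq)
        if len(seq) > maxlen:
            # Drop items from the front, mirroring truncating='pre'.
            seq = seq[len(seq) - maxlen:]
        out[row, :len(seq)] = seq
    return out

toy = [[29, 327, 5, 328], [7, 2]]
padded = pad_post(toy)
print(padded)
```

Because 0 never appears as a real word index, the model can later treat it as "no token" (for example through `mask_zero=True` in an Embedding layer).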

## 2. Prepare Data

In [5]:
word_to_idx ={}
idx_to_word ={}
label_to_idx = {}
for sentence in train_data:
    for word,pos in sentence:
        if word not in word_to_idx:
            word_to_idx[word] = len(word_to_idx)+1
            idx_to_word[word_to_idx[word]] = word
        if pos not in label_to_idx:
            label_to_idx[pos] = len(label_to_idx)+1
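# Note: word indices run from 1 to len(word_to_idx), so 'UNK' shares the index of the last word added.
word_to_idx['UNK'] = len(word_to_idx)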

n_classes = len(label_to_idx.keys())+1

This section is tweaked a little from the demo: `word2features` returns a word index instead of features, `sent2features` returns the sequence of word indices in a sentence, and `sent2labels` returns the corresponding sequence of label indices.

In [6]:
def word2features(sent, i, emb):
    word = sent[i][0]
    if word in word_to_idx :
        return word_to_idx[word]
    else :
        return word_to_idx['UNK']

def sent2features(sent, emb_dict):
    return np.asarray([word2features(sent, i, emb_dict) for i in range(len(sent))])

def sent2labels(sent):
    return np.asarray([label_to_idx[label] for (word, label) in sent],dtype='int32')

def sent2tokens(sent):
    return [word for (word, label) in sent]

In [7]:
sent2features(train_data[100],embeddings)

array([ 29, 327,   5, 328])
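The unknown-word fallback in `word2features` can be illustrated on a toy vocabulary. This is only a sketch: `toy_word_to_idx`, `lookup`, and the example sentence are made up for illustration.

```python
# Toy vocabulary (hypothetical); index 0 is left free for padding.
toy_word_to_idx = {'cat': 1, 'sat': 2, 'UNK': 3}

def lookup(word, vocab):
    """Return the word's index, or the 'UNK' index if the word is unseen."""
    return vocab.get(word, vocab['UNK'])

sentence = [('cat', 'N'), ('flew', 'V'), ('sat', 'V')]
indices = [lookup(word, toy_word_to_idx) for word, _ in sentence]
print(indices)  # [1, 3, 2]
```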

Next, we create the train and test datasets, then use Keras to post-pad each sequence with 0 up to the maximum sequence length. The training labels are converted to one-hot vectors.

In [8]:
%%time
x_train = [sent2features(sent, embeddings) for sent in train_data]  # keep as a list: sentences have different lengths
y_train = [sent2labels(sent) for sent in train_data]
x_test = [sent2features(sent, embeddings) for sent in test_data]
y_test = [sent2labels(sent) for sent in test_data]

CPU times: user 349 ms, sys: 65 µs, total: 349 ms
Wall time: 349 ms


In [9]:
x_train=keras.preprocessing.sequence.pad_sequences(x_train, maxlen=None, dtype='int32', padding='post', truncating='pre', value=0.)
y_train=keras.preprocessing.sequence.pad_sequences(y_train, maxlen=None, dtype='int32', padding='post', truncating='pre', value=0.)
x_test=keras.preprocessing.sequence.pad_sequences(x_test, maxlen=102, dtype='int32', padding='post', truncating='pre', value=0.)
y_temp =[]
for i in range(len(y_train)):
    y_temp.append(np.eye(n_classes)[y_train[i]][np.newaxis,:])
y_train = np.asarray(y_temp).reshape(-1,102,n_classes)
del(y_temp)

In [10]:
print(x_train[100],x_train.shape)
print(y_train[100][3],y_train.shape)

[ 29 327   5 328   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0] (18500, 102)
[ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.] (18500, 102, 48)
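The one-hot conversion used above relies on indexing an identity matrix with an integer array. The sketch below shows the same trick on toy labels (the class count and label values are made up); it produces the one-hot tensor in a single step, without an explicit loop.

```python
import numpy as np

n_toy_classes = 4
# Two padded label sequences; 0 is the padding label.
labels = np.array([[1, 3, 0],
                   [2, 0, 0]])

# Row k of the identity matrix is the one-hot vector for class k.
one_hot = np.eye(n_toy_classes)[labels]
print(one_hot.shape)  # (2, 3, 4)
print(one_hot[0, 1])  # [0. 0. 0. 1.]
```

Note that padded positions become the one-hot vector for class 0, which is why the notebook counts one extra class in `n_classes`.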


## 3. Evaluate

The output from Keras is a probability distribution over all possible labels at each position. `outputToLabel` returns the index of the maximum probability at each position of the output sequence, up to the true sequence length.

`evaluation_report` is the same as in the demo.

In [11]:
def outputToLabel(yt,seq_len):
    out = []
    for i in range(0,len(yt)):
        if(i==seq_len):
            break
        out.append(np.argmax(yt[i]))
    return out
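For reference, the same decoding can be written without an explicit loop. This is a sketch of an equivalent, not a replacement for the cell above; the name `output_to_label_vectorized` and the toy probabilities are hypothetical.

```python
import numpy as np

def output_to_label_vectorized(yt, seq_len):
    """Argmax over the label axis for the first seq_len positions."""
    return np.argmax(yt[:seq_len], axis=-1).tolist()

# Four positions, three labels; the last position stands in for padding.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.2, 0.6],
                  [0.9, 0.05, 0.05]])
print(output_to_label_vectorized(probs, 3))  # [1, 0, 2]
```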

In [12]:
import pandas as pd
from IPython.display import display

def evaluation_report(y_true, y_pred):
    # retrieve all tags in y_true
    tag_set = set()
    for sent in y_true:
        for tag in sent:
            tag_set.add(tag)
    for sent in y_pred:
        for tag in sent:
            tag_set.add(tag)
    tag_list = sorted(list(tag_set))
    
    # count correct points
    tag_info = dict()
    for tag in tag_list:
        tag_info[tag] = {'correct_tagged': 0, 'y_true': 0, 'y_pred': 0}

    all_correct = 0
    all_count = sum([len(sent) for sent in y_true])
    for sent_true, sent_pred in zip(y_true, y_pred):
        for tag_true, tag_pred in zip(sent_true, sent_pred):
            if tag_true == tag_pred:
                tag_info[tag_true]['correct_tagged'] += 1
                all_correct += 1
            tag_info[tag_true]['y_true'] += 1
            tag_info[tag_pred]['y_pred'] += 1
    accuracy = (all_correct / all_count) * 100
            
    # summarize and make evaluation result
    eval_list = list()
    for tag in tag_list:
        eval_result = dict()
        eval_result['tag'] = tag
        eval_result['correct_count'] = tag_info[tag]['correct_tagged']
        precision = (tag_info[tag]['correct_tagged']/tag_info[tag]['y_pred'])*100 if tag_info[tag]['y_pred'] else '-'
        recall = (tag_info[tag]['correct_tagged']/tag_info[tag]['y_true'])*100 if (tag_info[tag]['y_true'] > 0) else 0
        eval_result['precision'] = precision
        eval_result['recall'] = recall
        eval_result['f_score'] = (2*precision*recall)/(precision+recall) if (type(precision) is float and recall > 0) else '-'
        
        eval_list.append(eval_result)

    eval_list.append({'tag': 'accuracy=%.2f' % accuracy, 'correct_count': '', 'precision': '', 'recall': '', 'f_score': ''})
    
    df = pd.DataFrame.from_dict(eval_list)
    df = df[['tag', 'precision', 'recall', 'f_score', 'correct_count']]
    display(df)
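To make the per-tag numbers in the report concrete, here is a small sketch that computes precision, recall, and F-score for a single tag on toy data. The function name `tag_scores` and the toy sequences are assumptions for illustration; it follows the same definitions as `evaluation_report` but returns `None` where the report prints `-`.

```python
def tag_scores(y_true, y_pred, tag):
    """Precision, recall and F-score (in percent) for one tag."""
    pairs = [(t, p) for st, sp in zip(y_true, y_pred) for t, p in zip(st, sp)]
    correct = sum(1 for t, p in pairs if t == p == tag)
    n_true = sum(1 for t, _ in pairs if t == tag)
    n_pred = sum(1 for _, p in pairs if p == tag)
    precision = 100 * correct / n_pred if n_pred else None
    recall = 100 * correct / n_true if n_true else 0.0
    if precision and recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = None
    return precision, recall, f_score

y_true_toy = [[1, 2, 2], [2, 1]]
y_pred_toy = [[1, 2, 1], [2, 2]]
print(tag_scores(y_true_toy, y_pred_toy, 2))
```

Here tag 2 is predicted three times and is correct twice, and it occurs three times in the gold labels, so precision and recall are both 2/3.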

## 4. Train a model

In [13]:
from keras.models import Sequential, Model
from keras.layers import Embedding, Reshape, Activation, Input, Dense,GRU,Reshape,TimeDistributed,Bidirectional,Dropout,Masking
from keras_contrib.layers import CRF
from keras.optimizers import Adam

The models in this section are separated into two groups:

- Neural POS Tagger (4.1)
- Neural CRF POS Tagger (4.2)

## 4.1.1 Neural POS Tagger  (Example)

We create a simple neural POS tagger as an example for you. This model doesn't use any pretrained word embedding, so it needs an Embedding layer to train the word embedding from scratch.

In [14]:
model = Sequential()
model.add(Embedding(len(word_to_idx),32,input_length=102,mask_zero=True))
model.add(Bidirectional(GRU(32, return_sequences=True)))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(n_classes,activation='softmax')))
model.summary()
adam  = Adam(lr=0.001)
model.compile(optimizer=adam,  loss='categorical_crossentropy', metrics=['categorical_accuracy'])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 102, 32)           480608    
_________________________________________________________________
bidirectional_1 (Bidirection (None, 102, 64)           12480     
_________________________________________________________________
dropout_1 (Dropout)          (None, 102, 64)           0         
_________________________________________________________________
time_distributed_1 (TimeDist (None, 102, 48)           3120      
Total params: 496,208
Trainable params: 496,208
Non-trainable params: 0
_________________________________________________________________
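The parameter counts in the summary above can be reproduced by hand. The arithmetic below is a sketch under two assumptions: the vocabulary size 15019 is inferred from the reported embedding count (480608 / 32), and each GRU direction is assumed to have three gates, each with an input kernel, a recurrent kernel, and one bias vector, which is consistent with the reported 12,480.

```python
vocab_rows = 15019   # assumed: inferred from 480608 / 32
emb_dim = 32
units = 32
n_tags = 48          # matches the (None, 102, 48) output shape

embedding = vocab_rows * emb_dim
# Three gates, each with input weights, recurrent weights and a bias.
gru_one_direction = 3 * (emb_dim * units + units * units + units)
bigru = 2 * gru_one_direction
# Dense layer on the concatenated forward/backward states (2 * units inputs).
dense = (2 * units) * n_tags + n_tags
total = embedding + bigru + dense
print(embedding, bigru, dense, total)  # 480608 12480 3120 496208
```

Almost all parameters sit in the embedding table, which is why freezing it changes the trainable count so sharply.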


In [24]:
%%time
model.fit(x_train,y_train,batch_size=64,epochs=10,verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
CPU times: user 55min 27s, sys: 9min 13s, total: 1h 4min 40s
Wall time: 20min 28s


<keras.callbacks.History at 0x7f7eb31f1cf8>

In [25]:
%%time
#model.save_weights('/data/my_pos_no_crf.h5')
#model.load_weights('/data/my_pos_no_crf.h5')
y_pred=model.predict(x_test)
ypred = [outputToLabel(y_pred[i],len(y_test[i])) for i in range(len(y_pred))]
evaluation_report(y_test, ypred)

| | tag | precision | recall | f_score | correct_count |
|---|---|---|---|---|---|
| 0 | 1 | 99.8092 | 99.3487 | 99.5784 | 3661.0 |
| 1 | 2 | 94.7938 | 94.4835 | 94.6384 | 7793.0 |
| 2 | 3 | 91.0462 | 96.5125 | 93.6997 | 16300.0 |
| 3 | 4 | 99.9766 | 99.3654 | 99.6701 | 12840.0 |
| 4 | 5 | 91.6667 | 98.5075 | 94.964 | 66.0 |
| 5 | 6 | 99.7817 | 87.5479 | 93.2653 | 457.0 |
| 6 | 7 | 97.6374 | 97.4026 | 97.5199 | 2025.0 |
| 7 | 8 | 67.4627 | 54.4578 | 60.2667 | 226.0 |
| 8 | 9 | 56.4039 | 62.2283 | 59.1731 | 229.0 |
| 9 | 10 | 62.6761 | 42.4315 | 50.6041 | 356.0 |


CPU times: user 45.2 s, sys: 7.31 s, total: 52.5 s
Wall time: 18.3 s


## 4.1.2 Neural POS Tagger - Fix Weight

### #TODO 1
We would like you to create a neural POS tagger model with Keras that takes the pretrained word embedding as input. The word embedding should stay fixed throughout training. To finish this exercise, you must train the model and show its evaluation report, as in the example.

(You may want to read about Keras's Masking layer)

Optionally, you can use your own pretrained word embedding from previous homework

In [35]:
embedding_weights = [np.zeros(32)]
for idx in range(1,len(idx_to_word)+1):
    if(idx_to_word[idx] in embeddings.keys()):
        embedding_weights.append(embeddings[idx_to_word[idx]])
    else:
        embedding_weights.append(np.zeros(32))
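The same construction can be sketched on a toy vocabulary to show what happens to words that have no pretrained vector. All names and values below (`toy_embeddings`, `toy_idx_to_word`, the 4-dimensional vectors) are made up for illustration.

```python
import numpy as np

dim = 4
# Hypothetical pretrained vectors; 'flew' is deliberately missing.
toy_embeddings = {'cat': np.ones(dim), 'sat': np.full(dim, 2.0)}
toy_idx_to_word = {1: 'cat', 2: 'flew', 3: 'sat'}

rows = [np.zeros(dim)]  # row 0 is reserved for padding
missing = 0
for idx in range(1, len(toy_idx_to_word) + 1):
    vec = toy_embeddings.get(toy_idx_to_word[idx])
    if vec is None:
        # No pretrained vector: fall back to zeros, as in the cell above.
        vec = np.zeros(dim)
        missing += 1
    rows.append(vec)

matrix = np.array(rows)
print(matrix.shape, missing)  # (4, 4) 1
```

Counting the missing words is a cheap way to see how much of the vocabulary starts from an all-zero vector, which matters most when the embedding layer is frozen.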

In [36]:
# Write your code here
model = Sequential()
model.add(Embedding(len(word_to_idx),32,input_length=102,mask_zero=True,weights=[np.array(embedding_weights)],trainable=False))
model.add(Bidirectional(GRU(32, return_sequences=True)))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(n_classes,activation='softmax')))
model.summary()
adam  = Adam(lr=0.001)
model.compile(optimizer=adam,  loss='categorical_crossentropy', metrics=['categorical_accuracy'])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (None, 102, 32)           480608    
_________________________________________________________________
bidirectional_2 (Bidirection (None, 102, 64)           12480     
_________________________________________________________________
dropout_2 (Dropout)          (None, 102, 64)           0         
_________________________________________________________________
time_distributed_2 (TimeDist (None, 102, 48)           3120      
Total params: 496,208
Trainable params: 15,600
Non-trainable params: 480,608
_________________________________________________________________


In [37]:
%%time
model.fit(x_train,y_train,batch_size=64,epochs=10,verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
CPU times: user 54min 27s, sys: 8min 58s, total: 1h 3min 25s
Wall time: 19min 53s


<keras.callbacks.History at 0x7faad18d3908>

In [38]:
%%time
model.save_weights('/data/fixw_pos_no_crf.h5')
y_pred=model.predict(x_test)
ypred = [outputToLabel(y_pred[i],len(y_test[i])) for i in range(len(y_pred))]
evaluation_report(y_test, ypred)

| | tag | precision | recall | f_score | correct_count |
|---|---|---|---|---|---|
| 0 | 1 | 93.5969 | 99.5658 | 96.4892 | 3669.0 |
| 1 | 2 | 63.6856 | 65.6159 | 64.6363 | 5412.0 |
| 2 | 3 | 53.5991 | 66.3982 | 59.3161 | 11214.0 |
| 3 | 4 | 62.4492 | 84.3755 | 71.7751 | 10903.0 |
| 4 | 5 | - | 0.0 | - | 0.0 |
| 5 | 6 | 54.5455 | 2.29885 | 4.41176 | 12.0 |
| 6 | 7 | 93.5414 | 83.5979 | 88.2906 | 1738.0 |
| 7 | 8 | 49.2063 | 7.46988 | 12.9707 | 31.0 |
| 8 | 9 | 17.9487 | 1.90217 | 3.4398 | 7.0 |
| 9 | 10 | - | 0.0 | - | 0.0 |


CPU times: user 42.6 s, sys: 6.91 s, total: 49.5 s
Wall time: 17.4 s


## 4.1.3 Neural POS Tagger - Trainable pretrained weight

### #TODO 2
We would like you to create a neural POS tagger model with Keras that takes the pretrained word embedding as input. This time, however, the word embedding is trainable (not fixed). To finish this exercise, you must train the model and show its evaluation report, as in the example.

Please note that the given pretrained word embedding only has weights for the vocabulary of the BEST corpus from the previous homework.

Optionally, you can use your own pretrained word embedding from previous homework.

In [39]:
# Write your code here
model = Sequential()
model.add(Embedding(len(word_to_idx),32,input_length=102,mask_zero=True,weights=[np.array(embedding_weights)],trainable=True))
model.add(Bidirectional(GRU(32, return_sequences=True)))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(n_classes,activation='softmax')))
model.summary()
adam  = Adam(lr=0.001)
model.compile(optimizer=adam,  loss='categorical_crossentropy', metrics=['categorical_accuracy'])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, 102, 32)           480608    
_________________________________________________________________
bidirectional_3 (Bidirection (None, 102, 64)           12480     
_________________________________________________________________
dropout_3 (Dropout)          (None, 102, 64)           0         
_________________________________________________________________
time_distributed_3 (TimeDist (None, 102, 48)           3120      
Total params: 496,208
Trainable params: 496,208
Non-trainable params: 0
_________________________________________________________________


In [40]:
%%time
model.fit(x_train,y_train,batch_size=64,epochs=10,verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
CPU times: user 51min 43s, sys: 8min 32s, total: 1h 15s
Wall time: 18min 58s


<keras.callbacks.History at 0x7faad18d36a0>

In [41]:
%%time
model.save_weights('/data/nfixw_pos_no_crf.h5')
y_pred=model.predict(x_test)
ypred = [outputToLabel(y_pred[i],len(y_test[i])) for i in range(len(y_pred))]
evaluation_report(y_test, ypred)

| | tag | precision | recall | f_score | correct_count |
|---|---|---|---|---|---|
| 0 | 1 | 99.431 | 99.5929 | 99.5119 | 3670.0 |
| 1 | 2 | 94.648 | 95.1988 | 94.9226 | 7852.0 |
| 2 | 3 | 91.382 | 96.9388 | 94.0784 | 16372.0 |
| 3 | 4 | 99.9767 | 99.6208 | 99.7984 | 12873.0 |
| 4 | 5 | 95.6522 | 98.5075 | 97.0588 | 66.0 |
| 5 | 6 | 99.7817 | 87.5479 | 93.2653 | 457.0 |
| 6 | 7 | 97.2275 | 97.8355 | 97.5306 | 2034.0 |
| 7 | 8 | 74.7292 | 49.8795 | 59.8266 | 207.0 |
| 8 | 9 | 76.4259 | 54.6196 | 63.7084 | 201.0 |
| 9 | 10 | 62.4765 | 39.6901 | 48.5423 | 333.0 |


CPU times: user 42.1 s, sys: 6.91 s, total: 49 s
Wall time: 17.1 s


### #TODO 3
Compare the results of all neural tagger models in 4.1.x and provide a convincing reason and example for these results (which model performs best or worst, and why?).

(If you use your own weight please state so in the answer)

<b>Write your answer here :</b>
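The model with fixed weights gives poor results, while the model with trainable weights gives much better results, because the model with fixed weights cannot adapt its embeddings to the training data. For example, the F-score for tag 3 is 59.32 with fixed weights versus 94.08 with trainable weights.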

## 4.2.1 CRF Viterbi

Your next two tasks are to incorporate conditional random fields (CRF) into your model. <b>You do not need to use pretrained weights</b>.

A CRF layer for Keras is already implemented for you. However, you need to use the official extension repository for the Keras library, called keras-contrib. You should read about the keras-contrib CRF layer before attempting this section.

### #TODO 4
Use the keras-contrib CRF layer in your model. You should set the layer parameters so that it gives the best performance at test time using the <b>Viterbi algorithm</b>. Your model must use the CRF for its loss function and metric. The CRF is quite complex compared to the previous example model, so you should train it for more epochs so that it can converge.

To finish this exercise, you must train the model and show its evaluation report, as in the example.

Do not forget to save this model weight.
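For intuition about what Viterbi decoding computes, here is a minimal NumPy sketch for a linear-chain model. It is not the keras-contrib implementation: it ignores masking and boundary terms, and the function name `viterbi_decode` and the toy scores are assumptions for illustration.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Highest-scoring tag path.

    emissions:   (T, K) per-position tag scores
    transitions: (K, K) score of moving from tag i to tag j
    """
    T, K = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score of a path ending in tag i, then moving to j.
        cand = score[:, None] + transitions
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emissions[t]
    # Follow the back-pointers from the best final tag.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

emissions = np.array([[2.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
sticky = np.array([[0.0, -5.0], [-5.0, 0.0]])  # switching tags is penalised
print(viterbi_decode(emissions, sticky))            # [0, 0, 0]
print(viterbi_decode(emissions, np.zeros((2, 2))))  # [0, 1, 0]
```

With zero transition scores the result equals the per-position argmax; with the penalty on switching, the decoder prefers the globally consistent path even though position 1 locally favours tag 1.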

In [44]:
# Write your code here
from keras_contrib.layers import CRF
from keras.callbacks import ReduceLROnPlateau
from keras import regularizers

model = Sequential()
model.add(Embedding(len(word_to_idx),32,input_length=102,mask_zero=True))
model.add(Bidirectional(GRU(32, return_sequences=True)))
model.add(Dropout(0.5))
model.add(TimeDistributed(Dense(n_classes, activation='tanh')))
crf = CRF(n_classes)
model.add(crf)
model.summary()
adam  = Adam(lr=0.001)
model.compile(optimizer=adam,loss=crf.loss_function, metrics=[crf.accuracy])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_7 (Embedding)      (None, 102, 32)           480608    
_________________________________________________________________
bidirectional_5 (Bidirection (None, 102, 64)           12480     
_________________________________________________________________
dropout_5 (Dropout)          (None, 102, 64)           0         
_________________________________________________________________
time_distributed_5 (TimeDist (None, 102, 48)           3120      
_________________________________________________________________
crf_2 (CRF)                  (None, 102, 48)           4752      
Total params: 500,960
Trainable params: 500,960
Non-trainable params: 0
_________________________________________________________________
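The 4,752 parameters reported for the CRF layer can be accounted for as follows. This is one decomposition that is consistent with the reported count, assuming the layer holds an input kernel, a tag-to-tag transition matrix, a bias, and two boundary vectors; check the keras-contrib source for the authoritative list of weights.

```python
n_tags = 48  # matches the (None, 102, 48) output shape

input_kernel = n_tags * n_tags   # maps the 48-dim input to 48 tag scores
transitions = n_tags * n_tags    # score for each tag-to-tag transition
bias = n_tags
boundaries = 2 * n_tags          # assumed: start and end boundary vectors
crf_params = input_kernel + transitions + bias + boundaries
print(crf_params)  # 4752
```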


In [46]:
%%time
model.fit(x_train,y_train,batch_size=128,epochs=20,verbose=1,shuffle=True)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
CPU times: user 1h 13min 20s, sys: 11min 8s, total: 1h 24min 28s
Wall time: 27min 17s


<keras.callbacks.History at 0x7faad05f1240>

In [47]:
%%time
model.save_weights('/data/crf_viterbi.h5')
y_pred=model.predict(x_test)
ypred = [outputToLabel(y_pred[i],len(y_test[i])) for i in range(len(y_pred))]
evaluation_report(y_test, ypred)

| | tag | precision | recall | f_score | correct_count |
|---|---|---|---|---|---|
| 0 | 1 | 99.8911 | 99.5929 | 99.7418 | 3670.0 |
| 1 | 2 | 91.9832 | 95.4292 | 93.6745 | 7871.0 |
| 2 | 3 | 91.0914 | 95.4763 | 93.2323 | 16125.0 |
| 3 | 4 | 99.9222 | 99.3886 | 99.6547 | 12843.0 |
| 4 | 5 | 95.6522 | 98.5075 | 97.0588 | 66.0 |
| 5 | 6 | 100 | 87.5479 | 93.3606 | 457.0 |
| 6 | 7 | 97.8271 | 97.4507 | 97.6386 | 2026.0 |
| 7 | 8 | 66.0494 | 51.5663 | 57.9161 | 214.0 |
| 8 | 9 | 76.1905 | 60.8696 | 67.6737 | 224.0 |
| 9 | 10 | 63.0088 | 42.4315 | 50.7123 | 356.0 |


CPU times: user 1min 5s, sys: 10.7 s, total: 1min 16s
Wall time: 26.5 s


## 4.2.2 CRF Marginal

### #TODO 5

Use the keras-contrib CRF layer in your model. You should set the layer parameters so that it gives the best performance at test time using <b>marginal probabilities</b>. You <b>must not train the model</b> from scratch; instead, start from the pretrained weights of the previous CRF Viterbi model.

To finish this exercise, you must train the model and show its evaluation report, as in the example.
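For intuition about marginal probabilities in a linear-chain CRF, the sketch below computes per-position tag marginals with the forward-backward algorithm in log space. It is not the keras-contrib implementation: masking and boundary terms are left out, and the names `logsumexp` and `crf_marginals` and the toy scores are assumptions for illustration.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def crf_marginals(emissions, transitions):
    """Per-position tag marginals, shape (T, K)."""
    T, K = emissions.shape
    alpha = np.zeros((T, K))  # log score of all prefixes ending in each tag
    beta = np.zeros((T, K))   # log score of all suffixes following each tag
    alpha[0] = emissions[0]
    for t in range(1, T):
        alpha[t] = emissions[t] + logsumexp(alpha[t - 1][:, None] + transitions, axis=0)
    for t in range(T - 2, -1, -1):
        beta[t] = logsumexp(transitions + (emissions[t + 1] + beta[t + 1])[None, :], axis=1)
    log_z = logsumexp(alpha[-1], axis=0)  # log partition function
    return np.exp(alpha + beta - log_z)

emissions = np.array([[2.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
sticky = np.array([[0.0, -5.0], [-5.0, 0.0]])
marginals = crf_marginals(emissions, sticky)
print(marginals.round(3))
print(marginals.argmax(axis=1))
```

Marginal decoding picks the most probable tag at each position separately, whereas Viterbi picks the single most probable whole sequence; the two can disagree on individual positions.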

In [48]:
# Write your code here
from keras_contrib.layers import CRF
from keras.callbacks import ReduceLROnPlateau
from keras import regularizers

model = Sequential()
model.add(Embedding(len(word_to_idx),32,input_length=102,mask_zero=True))
model.add(Bidirectional(GRU(32, return_sequences=True)))
model.add(Dropout(0.5))
model.add(TimeDistributed(Dense(n_classes, activation='tanh')))
crf = CRF(n_classes,learn_mode='marginal',test_mode='marginal')
model.add(crf)
model.summary()
adam  = Adam(lr=0.001)
model.compile(optimizer=adam,loss=crf.loss_function, metrics=[crf.accuracy])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_8 (Embedding)      (None, 102, 32)           480608    
_________________________________________________________________
bidirectional_6 (Bidirection (None, 102, 64)           12480     
_________________________________________________________________
dropout_6 (Dropout)          (None, 102, 64)           0         
_________________________________________________________________
time_distributed_6 (TimeDist (None, 102, 48)           3120      
_________________________________________________________________
crf_3 (CRF)                  (None, 102, 48)           4752      
Total params: 500,960
Trainable params: 500,960
Non-trainable params: 0
_________________________________________________________________


In [49]:
%%time
model.load_weights('/data/crf_viterbi.h5')
model.fit(x_train,y_train,batch_size=128,epochs=20,verbose=1,shuffle=True)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
CPU times: user 1h 16min 29s, sys: 12min 26s, total: 1h 28min 56s
Wall time: 30min 3s


In [50]:
%%time
model.save_weights('/data/crf_marginal.h5')
y_pred=model.predict(x_test)
ypred = [outputToLabel(y_pred[i],len(y_test[i])) for i in range(len(y_pred))]
evaluation_report(y_test, ypred)

| | tag | precision | recall | f_score | correct_count |
|---|---|---|---|---|---|
| 0 | 1 | 99.8641 | 99.7015 | 99.7827 | 3674.0 |
| 1 | 2 | 94.1623 | 93.2832 | 93.7207 | 7694.0 |
| 2 | 3 | 90.3477 | 95.9915 | 93.0841 | 16212.0 |
| 3 | 4 | 100 | 99.3964 | 99.6973 | 12844.0 |
| 4 | 5 | 95.6522 | 98.5075 | 97.0588 | 66.0 |
| 5 | 6 | 100 | 90.2299 | 94.864 | 471.0 |
| 6 | 7 | 98.2985 | 97.2583 | 97.7756 | 2022.0 |
| 7 | 8 | 56.3084 | 58.0723 | 57.1767 | 241.0 |
| 8 | 9 | 76.087 | 57.0652 | 65.2174 | 210.0 |
| 9 | 10 | 61.6838 | 42.789 | 50.5278 | 359.0 |


CPU times: user 1min 4s, sys: 11.9 s, total: 1min 16s
Wall time: 27.8 s


### #TODO 6

Please pick the best example that shows the difference between a CRF that uses Viterbi and a CRF that uses marginal probabilities. Compare the results and provide a convincing reason (which model performs better, and why?).

<b>Write your answer here :</b>