<a id=contents></a>

# Baseline Model building - Conditional Random Field model



[1. ETL and Train Test Split](#ETL)

[2. Modelling with Conditional Random Field](#CRF)

[4. Choice of model architectures](#selection)


[7. Conclusions and model comparison table](#conc)

In [165]:

import pandas as pd
import numpy as np

import pickle

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
sns.set_style("darkgrid")

from sklearn.metrics import make_scorer  # needed for the custom F1 scorers used with GridSearchCV
from sklearn.model_selection import train_test_split, GridSearchCV, cross_validate
from matplotlib import cm

# NOTE: `metrics` refers to sklearn_crfsuite.metrics throughout this notebook
from sklearn_crfsuite import CRF, scorers, metrics
from sklearn.model_selection import cross_val_predict
from sklearn_crfsuite.metrics import flat_classification_report, flat_accuracy_score, flat_f1_score

from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer
import re
import string
tokenizer = RegexpTokenizer(r'\b\w{3,}\b')
stop_words = list(set(stopwords.words("english")))
stop_words += list(string.punctuation)

import warnings
warnings.filterwarnings('ignore')

from scipy import stats as ss
import eli5
#baseline sequential evaluation metrics
from seqeval.metrics import accuracy_score as seq_acc
from seqeval.metrics import classification_report as seq_cr
from seqeval.metrics import f1_score as seq_f1_score

#python package for evaluation in line with 
import nereval


%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


<a id=ETL></a>

## 1. ETL of data and Train-Test Split
    
[LINK to table of contents](#contents)

In [116]:
with open('clean_data/crf_train_data.pkl', 'rb') as f:
    crf_features_train = pickle.load(f)
    
with open('clean_data/crf_test_data.pkl', 'rb') as f:
    crf_features_test = pickle.load(f)
    
with open('clean_data/crf_valid_data.pkl', 'rb') as f:
    crf_features_valid = pickle.load(f)
    
with open('clean_data/crf_valid_targets.pkl', 'rb') as f:
    crf_targets_valid = pickle.load(f)
    
with open('clean_data/crf_train_targets.pkl', 'rb') as f:
    crf_targets_train = pickle.load(f)
    
with open('clean_data/crf_test_targets.pkl', 'rb') as f:
    crf_targets_test = pickle.load(f)
    

In [123]:
# a reminder of how our feature data is structured - 7th word of 1st sentence
crf_features_train[0][6]

{'word.lower()': 'london',
 'word.istitle()': 1,
 'len(word)': 6,
 'word.isupper()': 0,
 'word.isdigit()': 0,
 'word.prefix_2': 'Lo',
 'word.suffix_2': 'on',
 'word.prefix_3': 'Lon',
 'word.suffix_3': 'don',
 'word.frequency': 0,
 'word.+1_POS': 'TO',
 'word.-1_POS': 'IN',
 'word.-2_POS': 'VBN',
 'word.BOS': 0,
 'word.same_POS_-1': 0}

In [163]:
# and our target data
crf_targets_train[0][6]

'B-geo'

Before modelling, here's a quick reminder of the meaning of the target variable Tags:
* B - beginning of NE chunk
* I - inside NE chunk
* O - not an NE


* geo = Geographical Entity
* org = Organization
* per = Person
* gpe = Geopolitical Entity
* tim = Time indicator
* art = Artifact
* eve = Event
* nat = Natural Phenomenon
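
As a small illustration of how the two parts of a tag combine, here is a minimal sketch that turns a tag such as `B-geo` into a readable description. The helper name `describe_tag` is hypothetical (it is not part of the original pipeline); the mappings are copied from the lists above.

```python
# Hypothetical helper for illustration only; mappings copied from the lists above.
PREFIXES = {
    "B": "beginning of NE chunk",
    "I": "inside NE chunk",
    "O": "not an NE",
}
ENTITY_TYPES = {
    "geo": "Geographical Entity",
    "org": "Organization",
    "per": "Person",
    "gpe": "Geopolitical Entity",
    "tim": "Time indicator",
    "art": "Artifact",
    "eve": "Event",
    "nat": "Natural Phenomenon",
}


def describe_tag(tag):
    """Return a human-readable description of a BIO tag such as 'B-geo'."""
    if tag == "O":
        return PREFIXES["O"]
    # Split 'B-geo' into the chunk position ('B') and the entity type ('geo')
    prefix, entity_type = tag.split("-", 1)
    return f"{PREFIXES[prefix]} ({ENTITY_TYPES[entity_type]})"


for tag in ["B-geo", "I-per", "O"]:
    print(tag, "->", describe_tag(tag))
```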



In [134]:
print(f'We have {len(crf_features_train)} sentences in our training data.')
print(f'We have {len(crf_features_valid)} sentences in our validation data.')
print(f'We have {len(crf_features_test)} sentences in our test data.')

We have 1799 sentences in our training data.
We have 600 sentences in our validation data.
We have 600 sentences in our test data.


<a id = 'CRF'></a>

## 2. Baseline modelling with a Conditional Random Field

[LINK to table of contents](#contents)

sklearn-crfsuite's `CRF` requires the input data to be a list of lists of dicts: one list per sentence, containing one feature dict per word. I stored these as pickle files in an earlier notebook and loaded them above.
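
To make that structure concrete, here is a minimal sketch of how such feature dicts could be built from `(word, POS)` pairs. The function names `word2features` and `sent2features` and the example sentence are assumptions for illustration; the actual feature engineering lives in the earlier notebook, and this sketch covers only the features that can be derived from the word and its neighbouring POS tags (it omits `word.frequency` and `word.same_POS_-1`).

```python
def word2features(sentence, i):
    """Build a feature dict for the i-th (word, POS) pair of a sentence."""
    word, _pos = sentence[i]
    return {
        "word.lower()": word.lower(),
        "word.istitle()": int(word.istitle()),
        "len(word)": len(word),
        "word.isupper()": int(word.isupper()),
        "word.isdigit()": int(word.isdigit()),
        "word.prefix_2": word[:2],
        "word.suffix_2": word[-2:],
        "word.prefix_3": word[:3],
        "word.suffix_3": word[-3:],
        # Neighbouring POS tags; an empty string marks a missing neighbour (assumption)
        "word.+1_POS": sentence[i + 1][1] if i + 1 < len(sentence) else "",
        "word.-1_POS": sentence[i - 1][1] if i >= 1 else "",
        "word.-2_POS": sentence[i - 2][1] if i >= 2 else "",
        "word.BOS": int(i == 0),  # 1 at the beginning of the sentence
    }


def sent2features(sentence):
    """One feature dict per word in the sentence."""
    return [word2features(sentence, i) for i in range(len(sentence))]


# Made-up example sentence, already POS-tagged
sentence = [("marched", "VBN"), ("through", "IN"), ("London", "NNP"),
            ("to", "TO"), ("protest", "VB")]

# list (sentences) of lists (words) of dicts (features)
X = [sent2features(sentence)]
print(X[0][2])
```

The targets follow the same nesting, minus the innermost level: a list of sentences, each a list of tag strings.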

In [24]:
crf = CRF(algorithm='lbfgs',
          c1=0.1,
          c2=0.1,
          max_iterations=100,
          all_possible_transitions=True)

In [128]:
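#baseline
# NOTE: a bare `%time` line times an empty statement, which is why the output below
# reports microseconds; write `%time crf.fit(...)` on one line to time the fit itself
%time
crf.fit(crf_features_train, crf_targets_train)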

CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 16.9 µs


CRF(algorithm='lbfgs', all_possible_states=None, all_possible_transitions=True,
    averaging=None, c=None, c1=0.1, c2=0.1, calibration_candidates=None,
    calibration_eta=None, calibration_max_trials=None, calibration_rate=None,
    calibration_samples=None, delta=None, epsilon=None, error_sensitive=None,
    gamma=None, keep_tempfiles=None, linesearch=None, max_iterations=100,
    max_linesearch=None, min_freq=None, model_filename=None, num_memories=None,
    pa_type=None, period=None, trainer_cls=None, variance=None, verbose=False)

In [90]:
# crf object stores our Tag labels
labels = list(crf.classes_)
labels

['O',
 'B-geo',
 'B-gpe',
 'B-per',
 'I-geo',
 'B-org',
 'I-org',
 'B-tim',
 'B-art',
 'I-art',
 'I-per',
 'I-gpe',
 'I-tim',
 'B-nat',
 'B-eve',
 'I-eve',
 'I-nat']

In [164]:
crf_y_pred_train = crf.predict(crf_features_train)
metrics.flat_f1_score(crf_targets_train, crf_y_pred_train,
                      average='weighted')

0.9863591333898722

### Classification Report Interpretation for train and validation data:

The main reported figure is the weighted average F1 Score:

$$ F_{1} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} $$

The support column shows how many instances there are of each class. As we've seen before, this distribution is dominated by 'O' (non-NE), and some classes, such as nat (natural phenomenon), are very rare (there were only 17 instances in the training data).

The table below gives the relevant metrics for our baseline model's performance. I would draw your attention to the bottom row, where we see the support-weighted average F1 score across all the NE categories.
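
As a worked example of the formula, the sketch below recomputes the per-class F1 scores and the two common averages from the precision, recall and support values printed in the training report in the next cell. The numbers are copied from that report (rounded to three decimals, so the results are approximate); the variable names are my own.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# class -> (precision, recall, support), copied from the training report
train_report = {
    "geo": (0.849, 0.936, 1174),
    "gpe": (0.906, 0.858, 808),
    "org": (0.914, 0.810, 784),
    "tim": (0.973, 0.907, 680),
    "per": (0.972, 0.970, 633),
    "nat": (1.000, 0.941, 17),
    "eve": (0.958, 0.958, 24),
    "art": (0.953, 0.891, 46),
}

per_class_f1 = {name: f1(p, r) for name, (p, r, _) in train_report.items()}
total_support = sum(s for _, _, s in train_report.values())

# Macro average: every class counts equally, however rare it is
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Weighted average: each class counts in proportion to its support
weighted_f1 = sum(per_class_f1[name] * s
                  for name, (_, _, s) in train_report.items()) / total_support

print(f"geo F1:      {per_class_f1['geo']:.3f}")
print(f"macro F1:    {macro_f1:.3f}")
print(f"weighted F1: {weighted_f1:.3f}")
```

The weighted figure reproduces the `avg / total` row of the report, while the macro figure gives the rare classes (nat, eve, art) the same say as geo.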

In [140]:
sorted_labels = sorted(
    labels,
    key=lambda name: (name[1:], name[0]))

print(metrics.flat_classification_report(
    crf_targets_train, crf_y_pred_train, labels=sorted_labels, digits=3))

             precision    recall  f1-score   support

        geo      0.849     0.936     0.890      1174
        gpe      0.906     0.858     0.881       808
        org      0.914     0.810     0.859       784
        tim      0.973     0.907     0.939       680
        per      0.972     0.970     0.971       633
        nat      1.000     0.941     0.970        17
        eve      0.958     0.958     0.958        24
        art      0.953     0.891     0.921        46

avg / total      0.913     0.897     0.904      4166



The validation results below show clear evidence of overfitting: our average F1 score drops from about 0.9 on the training data to about 0.7 on the validation data. We will therefore fit a new model using cross-validation and GridSearchCV.

In [145]:
crf_y_pred_valid = crf.predict(crf_features_valid)
metrics.flat_f1_score(crf_targets_valid, crf_y_pred_valid,
                      average='weighted', labels=labels)

0.7003058103975536

In [143]:
sorted_labels = sorted(
    labels,
    key=lambda name: (name[1:], name[0]))

print(metrics.flat_classification_report(
    crf_targets_valid, crf_y_pred_valid, labels=sorted_labels, digits=3))

             precision    recall  f1-score   support

        geo      0.704     0.744     0.724       422
        gpe      0.762     0.794     0.778       218
        per      0.758     0.684     0.719       206
        tim      0.862     0.720     0.785       243
        art      0.000     0.000     0.000         5
        org      0.532     0.482     0.506       226
        eve      0.500     0.267     0.348        15
        nat      0.000     0.000     0.000         1

avg / total      0.716     0.686     0.699      1336



## 3. Optimisation with GridSearchCV and fine-tuning

In [167]:
crf_optim = CRF(algorithm='lbfgs',
          max_iterations=100,
          all_possible_transitions=True)

f1_scorer = make_scorer(flat_f1_score,
                        average='weighted', labels=labels)

crf_params = {'c1': [0.05, 0.1, 0.5, 1.0, 1.5,  2.0],
              'c2': [0.05, 0.1,  0.5, 1.0, 1.5,  2.0]}

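# scoring and n_jobs are passed by keyword (recent scikit-learn versions require this)
grid = GridSearchCV(crf_optim,
                    crf_params,
                    scoring=f1_scorer,
                    n_jobs=-1, cv=5,
                    return_train_score=True,
                    verbose=True)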

grid.fit(crf_features_train, crf_targets_train)

Fitting 5 folds for each of 36 candidates, totalling 180 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:  3.8min
[Parallel(n_jobs=-1)]: Done 180 out of 180 | elapsed: 17.1min finished


GridSearchCV(cv=5, error_score=nan,
             estimator=CRF(algorithm='lbfgs', all_possible_states=None,
                           all_possible_transitions=True, averaging=None,
                           c=None, c1=None, c2=None,
                           calibration_candidates=None, calibration_eta=None,
                           calibration_max_trials=None, calibration_rate=None,
                           calibration_samples=None, delta=None, epsilon=None,
                           error_sensitive=None, gamma=None,
                           keep_tempfi...
             iid='deprecated', n_jobs=-1,
             param_grid={'c1': [0.05, 0.1, 0.5, 1.0, 1.5, 2.0],
                         'c2': [0.05, 0.1, 0.5, 1.0, 1.5, 2.0]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
             scoring=make_scorer(flat_f1_score, average=weighted, labels=['O', 'B-geo', 'B-gpe', 'B-per', 'I-geo', 'B-org', 'I-org', 'B-tim', 'B-art', 'I-art', 'I-per', 'I-gpe', 'I

In [172]:
best_crf = grid.best_estimator_

In [173]:
grid.best_params_

{'c1': 0.1, 'c2': 0.1}
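
The fit count reported by the grid search above follows directly from the size of the grid. A minimal sketch of the arithmetic, using only the standard library (the variable names are my own):

```python
from itertools import product

# The same grid as in the first search above
c1_values = [0.05, 0.1, 0.5, 1.0, 1.5, 2.0]
c2_values = [0.05, 0.1, 0.5, 1.0, 1.5, 2.0]
n_folds = 5

# Every (c1, c2) combination is one candidate model
candidates = [{"c1": c1, "c2": c2} for c1, c2 in product(c1_values, c2_values)]

# Each candidate is trained once per cross-validation fold
n_fits = len(candidates) * n_folds
print(f"{len(candidates)} candidates x {n_folds} folds = {n_fits} fits")
```

With `refit=True`, as shown in the estimator repr above, the best candidate is then trained once more on the full training set, which is the model returned by `best_estimator_`.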

In [186]:
# further fine tuning around the best c1 and c2 penalty terms
crf_optim = CRF(algorithm='lbfgs',
          max_iterations=100,
          all_possible_transitions=True)

f1_scorer = make_scorer(flat_f1_score,
                        average='weighted', labels=labels)

crf_params = {'c1': [0.095, 0.0975, 0.1, 0.1025, 0.105],
              'c2': [0.095, 0.0975, 0.1, 0.1025, 0.105]}

grid_2 = GridSearchCV(crf_optim,
                      crf_params,
                      scoring=f1_scorer,
                      n_jobs=-1, cv=5,
                      return_train_score=True,
                      verbose=True)

grid_2.fit(crf_features_train, crf_targets_train)

# take the best estimator from grid_2 (not from the first grid)
best_crf_2 = grid_2.best_estimator_

Fitting 5 folds for each of 25 candidates, totalling 125 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:  2.5min
[Parallel(n_jobs=-1)]: Done 125 out of 125 | elapsed: 12.5min finished


I am going to use the Python library `seqeval`, which has been specifically designed to work with BIO labels and to help with measuring performance on tasks "[such as named-entity recognition, part-of-speech tagging, semantic role labeling](https://pypi.org/project/seqeval/)".
The `seqeval` package scores whole entities rather than individual tokens, so the B- and I- tags are collapsed into one row per NE type, as seen in the classification reports below.
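
To show why entity-level scoring is stricter than token-level scoring, here is a simplified pure-Python sketch. It is not `seqeval`'s implementation: the function names are hypothetical, the example tags are made up, and the F1 is micro-averaged over all entity types. An entity only counts as correct when both its boundaries and its type match exactly.

```python
def extract_entities(tags):
    """Return the set of (entity_type, start, end) spans in one BIO tag sequence."""
    entities, start, current = set(), None, None
    # The trailing 'O' is a sentinel that closes an entity ending the sentence
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, entity_type = tag.partition("-")
        # An open entity ends at 'O', at any 'B-', or when the type changes
        if current is not None and (prefix != "I" or entity_type != current):
            entities.add((current, start, i - 1))
            start, current = None, None
        # Start a new entity on 'B-' (or, leniently, on a stray 'I-')
        if prefix in ("B", "I") and current is None:
            start, current = i, entity_type
    return entities


def entity_f1(y_true, y_pred):
    """Micro-averaged F1 over exactly matching entity spans."""
    true_spans, pred_spans = set(), set()
    for sent_id, (t, p) in enumerate(zip(y_true, y_pred)):
        true_spans |= {(sent_id, *e) for e in extract_entities(t)}
        pred_spans |= {(sent_id, *e) for e in extract_entities(p)}
    correct = len(true_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(true_spans) if true_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def token_accuracy(y_true, y_pred):
    """Share of individual tokens whose tag is predicted correctly."""
    pairs = [(t, p) for ts, ps in zip(y_true, y_pred) for t, p in zip(ts, ps)]
    return sum(t == p for t, p in pairs) / len(pairs)


# Made-up example: the model cuts a three-token organisation one token short
y_true = [["B-org", "I-org", "I-org", "O", "B-geo"]]
y_pred = [["B-org", "I-org", "O",     "O", "B-geo"]]

print("token accuracy:", token_accuracy(y_true, y_pred))
print("entity F1:     ", entity_f1(y_true, y_pred))
```

A single wrong token costs one token at the token level but invalidates the whole organisation entity at the entity level, which is why the entity-level scores in this section are lower than the flat ones.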

In [187]:
grid_2.best_params_


{'c1': 0.0975, 'c2': 0.1025}

In [189]:
flat_f1_score(crf_targets_train, best_crf_2.predict(crf_features_train), average='weighted')


0.9811511208222455

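Now let's see how this performs on validation data. The weighted F1 score drops from 0.981 on the training data to 0.950 on the validation data, a modest gap by this token-level measure; the per-class reports further down show that the rarer entity classes fare much worse.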


In [190]:
flat_f1_score(crf_targets_valid, best_crf_2.predict(crf_features_valid), average='weighted')


0.9496762857123459

In [191]:
# further fine tuning around the best c1 and c2 values from grid_2
crf_optim = CRF(algorithm='lbfgs',
          max_iterations=100,
          all_possible_transitions=True)

crf_params = {'c1': [0.097, 0.0975, 0.098,],
              'c2': [ 0.103, 0.1025, 0.102]}

grid_3 = GridSearchCV(crf_optim,
                      crf_params,
                      scoring=f1_scorer,
                      n_jobs=-1, cv=10,
                      return_train_score=True,
                      verbose=True)

grid_3.fit(crf_features_train, crf_targets_train)

best_crf_3 = grid_3.best_estimator_


Fitting 10 folds for each of 9 candidates, totalling 90 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:  3.2min
[Parallel(n_jobs=-1)]: Done  90 out of  90 | elapsed:  8.2min finished


In [192]:
crf_y_valid_pred = best_crf_3.predict(crf_features_valid)

print(flat_classification_report(crf_targets_valid, crf_y_valid_pred, digits=3))


              precision    recall  f1-score   support

       B-art      0.000     0.000     0.000         5
       B-eve      0.500     0.267     0.348        15
       B-geo      0.710     0.744     0.727       422
       B-gpe      0.760     0.798     0.779       218
       B-nat      0.000     0.000     0.000         1
       B-org      0.576     0.518     0.545       226
       B-per      0.749     0.694     0.720       206
       B-tim      0.912     0.765     0.832       243
       I-art      0.000     0.000     0.000         7
       I-eve      0.333     0.231     0.273        13
       I-geo      0.693     0.604     0.646       101
       I-gpe      0.111     0.250     0.154         4
       I-org      0.633     0.654     0.643       153
       I-per      0.845     0.914     0.879       257
       I-tim      0.771     0.561     0.649        66
           O      0.984     0.990     0.987     11034

    accuracy                          0.948     12971
   macro avg      0.536   

So we have been able to achieve a weighted average F1 score of **0.947 on the validation data** (0.980 on the training data). There is therefore only a small train-to-validation drop in performance by this measure, but the drop looks much greater when you compare the macro average F1 (validation: 0.484; train: 0.850).

In [73]:
print("Our overall macro-average F1 score is", round(seq_f1_score(crf_targets_valid, crf_y_valid_pred, average='macro'),3))

Our overall macro-average F1 score is 0.729


In [76]:
print("Our overall accuracy is", round(seq_acc(crf_targets_valid, crf_y_valid_pred),4),)


Our overall accuracy is 0.9497


As you'd expect, the results are very different from the flat, token-level scores above. However, this is a much more realistic picture of how well our model is performing. The score is pulled down considerably by the low-frequency classes 'art' (artifacts) and 'nat' (natural phenomena).


In [69]:
print(seq_cr(crf_targets_valid, crf_y_valid_pred))

             precision    recall  f1-score   support

        geo       0.76      0.76      0.76       474
        gpe       0.82      0.81      0.82       204
        tim       0.85      0.74      0.79       237
        org       0.65      0.59      0.62       227
        per       0.69      0.61      0.65       268
        eve       0.50      0.33      0.40         6
        art       0.00      0.00      0.00         2
        nat       0.00      0.00      0.00         2

avg / total       0.75      0.71      0.73      1420



In [100]:
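# And if we check our training classification report:
# assumption: the training predictions come from the same tuned model as crf_y_valid_pred
crf_y_train_pred = best_crf_3.predict(crf_features_train)
print(seq_cr(crf_targets_train, crf_y_train_pred))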

             precision    recall  f1-score   support

        geo       0.60      0.94      0.73      1174
        gpe       0.69      0.79      0.74       808
        org       0.68      0.74      0.71       784
        tim       0.71      0.86      0.78       680
        per       0.71      0.93      0.80       633
        nat       1.00      0.53      0.69        17
        eve       0.56      0.79      0.66        24
        art       0.91      0.65      0.76        46

avg / total       0.67      0.85      0.75      4166



In [101]:
seqeval_scorer = make_scorer(seq_f1_score)


In [102]:
# grid search over c1 and c2 again, this time scored with seqeval's macro-average F1
crf_optim = CRF(algorithm='lbfgs',
          max_iterations=100,
          all_possible_transitions=True)

seqeval_scorer = make_scorer(seq_f1_score, average='macro')


crf_params = {'c1': [0.0375, 0.04, 0.0425,],
              'c2': [ 0.425, 0.45, 0.475]}

grid_seqeval = GridSearchCV(crf_optim,
                            crf_params,
                            scoring=seqeval_scorer,
                            n_jobs=-1, cv=5,
                            return_train_score=True,
                            verbose=True)

grid_seqeval.fit(crf_features_train, crf_targets_train)

best_crf_seqeval = grid_seqeval.best_estimator_


Fitting 5 folds for each of 9 candidates, totalling 45 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  45 out of  45 | elapsed:  3.6min finished


In [103]:
crf_y_valid_seqeval_preds = best_crf_seqeval.predict(crf_features_valid)

print(seq_cr(crf_targets_valid, crf_y_valid_seqeval_preds))


             precision    recall  f1-score   support

        geo       0.71      0.76      0.74       422
        gpe       0.81      0.81      0.81       218
        per       0.76      0.72      0.74       206
        tim       0.89      0.72      0.80       243
        art       0.00      0.00      0.00         5
        org       0.56      0.47      0.51       226
        eve       0.50      0.27      0.35        15
        nat       0.00      0.00      0.00         1

avg / total       0.74      0.70      0.71      1336



In [106]:
print("Model has a macro-average F1 score of", round(seq_f1_score(crf_targets_valid, crf_y_valid_seqeval_preds, average='macro'),3))

Model has a macro-average F1 score of 0.717


In [107]:
print("Our overall accuracy is", round(seq_acc(crf_targets_valid, crf_y_valid_seqeval_preds),3),)


Our overall accuracy is 0.951


<a id=ttsplit></a>

## 3. Investigating our best model's weights
   
[LINK to table of contents](#contents)

In [114]:

fig = eli5.show_weights(best_crf_seqeval, top=30)
fig

| From \ To | O | B-art | I-art | B-eve | I-eve | B-geo | I-geo | B-gpe | I-gpe | B-nat | I-nat | B-org | I-org | B-per | I-per | B-tim | I-tim |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| O | 3.684 | 1.245 | -1.359 | 1.356 | -1.194 | 1.63 | -1.962 | 1.314 | -1.014 | 0.883 | -1.016 | 1.552 | -3.008 | 2.001 | -1.907 | 1.958 | -2.629 |
| B-art | -0.135 | -0.031 | 3.472 | -0.003 | -0.097 | -0.439 | -0.338 | -0.558 | -0.106 | 0.0 | -0.058 | -0.4 | -0.631 | -0.482 | -0.528 | 0.02 | -0.302 |
| I-art | -0.628 | -0.04 | 3.439 | 0.0 | -0.06 | -0.301 | -0.181 | -0.224 | -0.052 | 0.0 | -0.021 | -0.254 | -0.352 | -0.571 | -0.546 | 0.003 | -0.12 |
| B-eve | -0.748 | -0.013 | -0.101 | -0.053 | 3.637 | -0.36 | -0.293 | -0.412 | -0.093 | 0.0 | -0.061 | -0.303 | -0.616 | -0.367 | -0.388 | -0.267 | -0.18 |
| I-eve | -0.066 | 0.0 | -0.032 | -0.26 | 2.253 | -0.113 | -0.099 | -0.112 | -0.033 | 0.0 | 0.0 | -0.174 | -0.213 | -0.139 | -0.334 | -0.11 | -0.055 |
| B-geo | 0.697 | -0.303 | -0.734 | -0.183 | -0.533 | -1.239 | 3.864 | 0.022 | -0.775 | -0.102 | -0.41 | -1.281 | -1.584 | -1.532 | -1.493 | 1.076 | -1.116 |
| I-geo | 0.222 | -0.118 | -0.206 | -0.012 | -0.1 | -0.851 | 3.142 | -0.802 | -0.256 | -0.02 | -0.105 | -0.462 | -0.905 | -0.741 | -0.834 | 0.279 | -0.524 |
| B-gpe | 0.904 | -0.339 | -0.594 | -0.233 | -0.7 | -1.104 | -1.176 | -1.603 | 2.997 | -0.135 | -0.384 | 1.08 | -1.805 | 0.751 | -1.236 | -1.131 | -1.117 |
| I-gpe | -0.168 | 0.0 | -0.024 | 0.0 | 0.0 | 0.27 | -0.089 | -0.188 | 2.647 | 0.0 | 0.0 | -0.195 | -0.307 | -0.37 | -0.362 | -0.084 | -0.058 |
| B-nat | -0.345 | 0.0 | -0.042 | 0.0 | -0.061 | -0.24 | -0.107 | -0.233 | -0.017 | -0.028 | 2.497 | -0.225 | -0.304 | -0.27 | -0.304 | -0.161 | -0.094 |
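
One way to read the transition table above is to rank the transitions by weight. The sketch below does this for a handful of entries copied from the table; the dict layout, `(from_tag, to_tag) -> weight`, is my own choice for illustration. As far as I know, a fitted sklearn-crfsuite `CRF` exposes its learned weights in a similar form through its `transition_features_` attribute, but treat that as an assumption to check against the library documentation.

```python
# A few (from_tag, to_tag) -> weight entries copied from the transition table above
transitions = {
    ("B-geo", "I-geo"): 3.864,
    ("O", "O"): 3.684,
    ("B-eve", "I-eve"): 3.637,
    ("B-art", "I-art"): 3.472,
    ("O", "I-org"): -3.008,
    ("O", "I-tim"): -2.629,
    ("O", "I-geo"): -1.962,
    ("B-gpe", "I-org"): -1.805,
}

# Sort from the most encouraged to the most penalised transition
ranked = sorted(transitions.items(), key=lambda item: item[1], reverse=True)

for (from_tag, to_tag), weight in ranked:
    print(f"{from_tag:>6} -> {to_tag:<6} {weight:+.3f}")
```

The pattern matches the BIO scheme: moving from a `B-` tag to the `I-` tag of the same type is strongly encouraged, while jumping from `O` straight to an `I-` tag, which is not a valid BIO sequence, is heavily penalised.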

y=O top features
Weight?,Feature
+2.087,word.lower():israeli-palestinian
+2.072,word.lower():a
+1.974,word.+1_POS:JJ
+1.879,word.BOS
+1.879,word.-2_POS:
+1.879,word.-1_POS:
+1.876,word.-1_POS:NNP
+1.842,word.-1_POS:JJS
+1.822,word.+1_POS:VB
+1.652,word.+1_POS:

y=B-art top features
Weight?,Feature
+1.293,word.prefix_3:Top
+1.109,word.prefix_2:Do
+1.101,word.prefix_3:Nob
+0.968,word.suffix_3:oxx
+0.968,word.suffix_2:xx
+0.968,word.lower():vioxx
+0.968,word.prefix_3:Vio
+0.956,word.lower():alhurra
+0.956,word.prefix_3:alH
+0.904,word.prefix_3:Huy

y=I-art top features
Weight?,Feature
+0.747,word.+1_POS:NN
+0.739,word.-1_POS:NNP
+0.682,word.suffix_2:le
+0.674,word.prefix_2:Sp
+0.671,word.prefix_2:3
+0.671,word.suffix_3:3
+0.671,word.prefix_3:3
+0.671,word.lower():3
+0.671,word.suffix_2:3
+0.633,word.prefix_2:Ga

y=B-eve top features
Weight?,Feature
+1.030,word.lower():olympic
+1.030,word.suffix_3:pic
+1.018,word.prefix_3:Oly
+0.955,word.prefix_2:Ol
+0.923,word.suffix_3:II
+0.923,word.lower():ii
+0.923,word.prefix_3:II
+0.919,word.suffix_2:II
+0.919,word.prefix_2:II
+0.900,word.prefix_3:Gam

y=I-eve top features
Weight?,Feature
+0.986,word.-1_POS:NNP
+0.970,word.prefix_3:War
+0.890,word.prefix_2:Wa
+0.709,word.prefix_2:Ol
+0.708,word.suffix_3:War
+0.696,word.lower():war
+0.694,word.prefix_3:Oly
+0.614,word.+1_POS:IN
+0.611,word.istitle()
+0.610,word.isupper()

y=B-geo top features
Weight?,Feature
+2.069,word.istitle()
+1.457,word.suffix_3:tan
+1.420,word.prefix_2:Ba
+1.400,word.suffix_2:ia
+1.310,word.lower():paris
+1.309,word.suffix_2:ta
+1.289,word.suffix_3:ris
+1.251,word.suffix_2:ai
+1.243,word.suffix_3:and
+1.074,word.prefix_3:wes

y=I-geo top features
Weight?,Feature
+1.438,word.suffix_3:tan
+1.265,word.istitle()
+1.052,word.-1_POS:NNP
+1.033,word.suffix_3:ica
+1.028,word.suffix_2:ca
+0.882,word.suffix_3:tes
+0.878,word.prefix_3:Mus
+0.872,word.lower():homeland
+0.855,word.suffix_2:st
+0.842,word.-1_POS:JJ

y=B-gpe top features
Weight?,Feature
+2.737,word.suffix_3:ese
+1.891,word.suffix_2:an
+1.832,word.suffix_3:ans
+1.830,word.suffix_3:ish
+1.674,word.suffix_2:li
+1.654,word.suffix_3:ian
+1.569,word.prefix_2:Sw
+1.451,word.suffix_3:eli
+1.451,word.lower():israeli
+1.397,word.prefix_3:Kor

y=I-gpe top features
Weight?,Feature
+1.361,word.+1_POS:POS
+1.252,word.suffix_3:can
+1.136,word.-1_POS:NNP
+0.975,word.prefix_3:Sta
+0.957,word.-2_POS:CC
+0.894,word.prefix_2:St
+0.759,word.prefix_3:Rep
+0.730,word.lower():republic
+0.721,word.suffix_3:lic
+0.697,word.lower():american

y=B-nat top features
Weight?,Feature
+1.644,word.isupper()
+1.206,word.prefix_3:H5N
+1.206,word.prefix_2:H5
+1.033,word.lower():hurricane
+1.027,word.suffix_3:ane
+1.023,word.prefix_3:Hur
+1.009,word.prefix_2:Hu
+0.838,word.suffix_2:ne
+0.823,word.prefix_2:AI
+0.823,word.lower():aids

y=I-nat top features
Weight?,Feature
+1.041,word.-1_POS:NNP
+0.951,word.lower():katrina
+0.930,word.prefix_3:Kat
+0.868,word.suffix_3:ina
+0.827,word.prefix_2:Ka
+0.823,word.suffix_2:na
+0.759,word.prefix_3:Jin
+0.759,word.lower():jing
+0.740,word.prefix_2:Ji
+0.690,word.lower():syndrome

y=B-org top features
Weight?,Feature
+2.750,word.isupper()
+1.689,word.lower():hamas
+1.529,word.lower():al-qaida
+1.511,word.prefix_3:Ham
+1.492,word.lower():kindhearts
+1.396,word.prefix_3:Mer
+1.330,word.suffix_3:ban
+1.291,word.prefix_3:Kin
+1.286,word.prefix_3:Tal
+1.281,word.lower():singapore

y=I-org top features
Weight?,Feature
+1.143,word.lower():ministry
+1.123,word.-1_POS:CC
+1.122,word.-1_POS:IN
+1.121,word.suffix_3:try
+1.070,word.lower():committee-chairman
+1.055,word.-1_POS:POS
+1.049,word.suffix_3:rce
+1.025,word.lower():nations
+1.013,word.lower():union
+1.005,word.suffix_3:ons

y=B-per top features
Weight?,Feature
+1.470,word.lower():prime
+1.428,word.prefix_3:al-
+1.388,word.prefix_2:Ob
+1.348,word.lower():president
+1.339,word.lower():sperling
+1.321,word.prefix_2:pr
+1.275,word.lower():bush
+1.258,word.suffix_3:ime
+1.225,word.lower():jupiter
+1.225,word.prefix_3:Jup

y=I-per top features
Weight?,Feature
+1.875,word.-1_POS:NNP
+1.189,word.prefix_2:Mu
+0.837,word.lower():condoleezza
+0.830,word.suffix_2:ei
+0.793,word.prefix_3:al-
+0.793,word.-2_POS:NN
+0.765,word.suffix_2:ik
+0.740,word.istitle()
+0.733,word.suffix_3:son
+0.713,word.+1_POS:VBD

y=B-tim top features
Weight?,Feature
+3.073,word.suffix_3:day
+2.907,word.suffix_2:ay
+2.291,word.isdigit()
+2.285,word.suffix_3:ber
+1.907,word.prefix_2:19
+1.653,word.lower():later
+1.542,word.lower():recent
+1.488,word.suffix_2:0s
+1.398,word.lower():january
+1.396,word.prefix_3:rec

y=I-tim top features
Weight?,Feature
+2.196,word.isdigit()
+2.141,word.suffix_2:ay
+2.107,word.suffix_3:day
+1.234,word.+1_POS:TO
+1.151,word.+1_POS:CD
+1.098,word.prefix_2:de
+1.096,word.+1_POS:.
+1.068,word.prefix_2:Ju
+1.065,word.lower():quarter
+1.061,word.suffix_2:ry


<a id=selection></a>

## 4. Choice of model architectures

[LINK to table of contents](#contents)

<a id=one></a>

## 4.1 Model 1
    
[LINK to table of contents](#contents)

<a id=two></a>

## 4.2 Model 2
    
[LINK to table of contents](#contents)

<a id=three></a>

## 4.3 Model 3
    
[LINK to table of contents](#contents)

<a id=four></a>

## 4.4 Model 4
    
[LINK to table of contents](#contents)

<a id=five></a>

## 4.5 Model 5
    
[LINK to table of contents](#contents)

<a id=conc></a>

## 7. Conclusions and model comparison table
    
[LINK to table of contents](#contents)