
##### by Anastasiia Khaburska

## Homework 5

module : **Deep Learning for NLP**


The goal of this homework is to develop a tool for Named Entity Recognition. You need to implement the model **"GloVe word embeddings + BiLSTM + Softmax"** for sequence labeling. Please use the standard PyTorch example for the sequence labeling task, "Sequence models and long-short term memory networks" (https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html#sphx-glr-beginner-nlp-sequence-models-tutorial-py), as the starting code. GloVe word embeddings can be downloaded here (http://neuroner.com/data/word_vectors/glove.6B.100d.zip).

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

import pandas as pd
import numpy as np
import itertools
import gensim
import pickle
import os

### **Task 1**: 
Implement functionality to read and process the CoNLL-2003 English NER shared task data in CoNLL file format; the data will be provided (10% of score).

In [2]:
def process(filename):
    with open(f'data/{filename}.txt') as file:
        size = 0
        for line in file:
            if line != '\n':
                w, _, _, _ = line.split(' ')
                if (w!='-DOCSTART-'):
                    size+=1
    
    df = pd.DataFrame(
            columns=['sentence', 'word', 'POS_tag', 'SCHUNK_tag', 'NE_tag'],
            data=np.zeros((size, 5))
        )
    df[:] = ''
    with open(f'data/{filename}.txt') as file:
        index = 0
        word = 0
        for line in file:
            if line == '\n':
                index += 1
            else:
                w, p, s, n = line.split(' ')
                if (w!='-DOCSTART-'):
                    df.at[word, 'sentence'] = int(index)
                    df.at[word, 'word'] = w.strip()
                    df.at[word, 'POS_tag'] = p.strip()
                    df.at[word, 'SCHUNK_tag'] = s.strip()               
                    df.at[word, 'NE_tag'] = n.strip()
                    word += 1
        return df
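
As a lighter-weight alternative to the two-pass DataFrame approach above, the same file format can be read directly into per-sentence lists. This is only a minimal sketch, not the code used for the results in this notebook: `parse_conll` and the sample lines are hypothetical, and the function assumes four space-separated columns (word, POS tag, chunk tag, NE tag) with blank lines between sentences.

```python
def parse_conll(lines):
    """Group CoNLL-style lines into (words, ne_tags) pairs, one pair per sentence."""
    sentences, words, tags = [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current sentence (if any).
            if words:
                sentences.append((words, tags))
                words, tags = [], []
            continue
        word, _pos, _chunk, ne_tag = line.split(' ')
        if word == '-DOCSTART-':
            continue  # document separator, not a token
        words.append(word)
        tags.append(ne_tag)
    if words:
        # The file may end without a trailing blank line.
        sentences.append((words, tags))
    return sentences


# Hypothetical sample in the assumed four-column layout.
sample = [
    "-DOCSTART- -X- -X- O\n",
    "\n",
    "EU NNP B-NP B-ORG\n",
    "rejects VBZ B-VP O\n",
    "\n",
    "Peter NNP B-NP B-PER\n",
    "Blackburn NNP I-NP I-PER\n",
]
parsed = parse_conll(sample)
print(parsed)
```

Because sentences are grouped while reading, this variant would not need the later `groupby` step to build `(words, tags)` pairs.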

In [3]:
train = process('train')
dev = process('dev')
test = process('test')

In [4]:
print(f"Length of train: {len(train)}")
print(f"Length of  dev:   {len(dev)}")
print(f"Length of  test:  {len(test)}")

Length of train: 203621
Length of  dev:   51362
Length of  test:  46435


In [5]:
print(f"Number of sentences in train: {len(train['sentence'].unique())}")
print(f"Number of sentences in dev:   {len(dev['sentence'].unique())}")
print(f"Number of sentences in test:  {len(test['sentence'].unique())}")

Number of sentences in train: 14041
Number of sentences in dev:   3250
Number of sentences in test:  3453


In [6]:
print("Named entity tags:",sorted(train['NE_tag'].unique()))

Named entity tags: ['B-LOC', 'B-MISC', 'B-ORG', 'B-PER', 'I-LOC', 'I-MISC', 'I-ORG', 'I-PER', 'O']


**Prepare data:**

In [7]:
def prepare_sequence(seq, to_ix):
    
    idxs = [to_ix[w] for w in seq]
    
    return torch.tensor(idxs, dtype=torch.long)

In [8]:
tag_to_ix = {'B-LOC':0, 'B-MISC':1, 'B-ORG':2, 'B-PER':3, 'I-LOC':4, 'I-MISC':5, 'I-ORG':6, 'I-PER':7, 'O':8}

In [9]:
word_to_ix = {}

for word in train['word']:
    if word not in word_to_ix:
        word_to_ix[word] = len(word_to_ix)
for word in dev['word']:
    if word not in word_to_ix:
        word_to_ix[word] = len(word_to_ix)
for word in test['word']:
    if word not in word_to_ix:
        word_to_ix[word] = len(word_to_ix)

In [10]:
vocab_size = len(word_to_ix)

print(vocab_size)

30289


### **Task 2**: 

Implement 3 strategies for loading the embeddings:

In [11]:
glove_embeddings= {}

with open('data/glove.6B.100d.txt', 'r') as file:
    for line in file:
        elements = line.split(' ')
        word = elements[0]
        word_embedding = np.array([float(val) for val in elements[1:]])
        glove_embeddings[word] = word_embedding

In [12]:
embedding_length=len(glove_embeddings['the'])
print("Embedding vector length: ", embedding_length)

Embedding vector length:  100


**(a).** load the embeddings for the original capitalization of words. If no embedding exists for a word, associate it with the UNKNOWN embedding (5% of score).

In [13]:
embeddings_matrix_a = np.zeros((vocab_size, 100))
unknown_a=0
for word, ix in word_to_ix.items():
    try:
        embeddings_matrix_a[ix, :] = glove_embeddings[word]
    except KeyError as e:
        embeddings_matrix_a[ix, :] = glove_embeddings['unknown']
        unknown_a+=1

In [14]:
print("Number of unknown: ", unknown_a )

Number of unknown:  15671


**(b).** load the embeddings for the lowercased words. If no embedding exists for the lowercased word, associate it with the UNKNOWN embedding (5% of score).

In [15]:
embeddings_matrix_b = np.zeros((vocab_size, 100))
unknown_b=0
for word, ix in word_to_ix.items():
    try:
        embeddings_matrix_b[ix, :] = glove_embeddings[word.lower()]
    except KeyError as e:
        embeddings_matrix_b[ix, :] = glove_embeddings['unknown']
        unknown_b+=1        

In [16]:
print("Number of unknown: ", unknown_b)

Number of unknown:  3949


**(c).** load the embeddings for the original capitalization of words. If no embedding exists for a word, try to find the embedding for its lowercased version and associate it with the word in its original capitalization. Otherwise, associate it with the UNKNOWN embedding (20% of score).

In [17]:
embeddings_matrix_c = np.zeros((vocab_size, 100))
unknown_c=0
for word, ix in word_to_ix.items():
    if word in glove_embeddings:
        embeddings_matrix_c[ix, :] = glove_embeddings[word]
    elif word.lower() in glove_embeddings:
        embeddings_matrix_c[ix, :] = glove_embeddings[word.lower()]
    else:
        embeddings_matrix_c[ix, :] = glove_embeddings['unknown']
        unknown_c+=1

In [18]:
print("Number of unknown: ", unknown_c)

Number of unknown:  3949
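
The three strategies differ only in which spellings of a word are tried before falling back to the UNKNOWN vector. The sketch below puts that logic into one hypothetical helper, `lookup_embedding`, and runs it on a made-up toy dictionary instead of the real GloVe vectors. The identical counts for (b) and (c) above are consistent with the glove.6B vocabulary being lowercased, in which case the original-capitalization lookup in (c) only succeeds for words that are already lowercase.

```python
def lookup_embedding(word, embeddings, strategy, unknown_key='unknown'):
    """Return (vector, found) for `word` under strategy 'a', 'b' or 'c'."""
    if strategy == 'a':
        candidates = [word]                # original capitalization only
    elif strategy == 'b':
        candidates = [word.lower()]        # lowercased form only
    elif strategy == 'c':
        candidates = [word, word.lower()]  # original first, then lowercased
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    for candidate in candidates:
        if candidate in embeddings:
            return embeddings[candidate], True
    return embeddings[unknown_key], False


# Toy stand-in for the GloVe dictionary; the vectors are made up.
toy_embeddings = {'german': [0.1, 0.2], 'call': [0.3, 0.4], 'unknown': [0.0, 0.0]}

found_by_strategy = {}
for strategy in 'abc':
    found_by_strategy[strategy] = [
        lookup_embedding(w, toy_embeddings, strategy)[1]
        for w in ['German', 'call', 'Xyzzy']
    ]
    print(strategy, found_by_strategy[strategy])
```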


### **Task 3**: 

Implement training on batches (20% of score).

In [19]:
EMBEDDING_DIM = embedding_length
HIDDEN_DIM = 50
EMBEDDING_MATRIX=embeddings_matrix_c
VOCAB_SIZE=EMBEDDING_MATRIX.shape[0]
TARGET_SIZE=len(tag_to_ix)
DROPOUT=0.2
LSTM_LAYER=2
BATCH_SIZE=25

In [20]:
training_data = []
dev_data = []
test_data = []
train_grouped = train.groupby(['sentence']).agg(lambda x: list(x)).reset_index(drop=True)
for i in range(len(train_grouped)):
    training_data.append((train_grouped.loc[i, 'word'], train_grouped.loc[i, 'NE_tag']))
dev_grouped = dev.groupby(['sentence']).agg(lambda x: list(x)).reset_index(drop=True)
for i in range(len(dev_grouped)):
    dev_data.append((dev_grouped.loc[i, 'word'], dev_grouped.loc[i, 'NE_tag']))
test_grouped = test.groupby(['sentence']).agg(lambda x: list(x)).reset_index(drop=True)
for i in range(len(test_grouped)):
    test_data.append((test_grouped.loc[i, 'word'], test_grouped.loc[i, 'NE_tag']))

In [21]:
training_data [0]

(['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.'],
 ['B-ORG', 'O', 'B-MISC', 'O', 'O', 'O', 'B-MISC', 'O', 'O'])

In [24]:
class BiLSTMTagger(nn.Module):

    def __init__(self,embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(BiLSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim        
        # Load the pretrained GloVe matrix (global EMBEDDING_MATRIX) and keep it frozen.
        self.word_embeddings = nn.Embedding.from_pretrained(
            torch.tensor(EMBEDDING_MATRIX, dtype=torch.float), freeze=True)
        
        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, bidirectional=True)

        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim*2, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores

In [27]:
model = BiLSTMTagger(EMBEDDING_DIM , HIDDEN_DIM, VOCAB_SIZE, TARGET_SIZE)
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
losses=[]
epochs=[]

print("Start training..............:")
model.train()
model.zero_grad()
for epoch in range(10): 
    model.train()
    order=list(range(len(training_data)))
    for i in range((len(training_data)+1)//BATCH_SIZE):
        model.zero_grad()
        
        # Step 1. Get our inputs ready for the network, that is, turn the
        # sentences and tags of this batch into tensors of indices.
        batch_sentences=[prepare_sequence(training_data[idx][0], 
                                          word_to_ix) for idx in order[i*BATCH_SIZE:(i+1)*BATCH_SIZE]]
        batch_tags=[prepare_sequence(training_data[idx][1], 
                                          tag_to_ix) for idx in order[i*BATCH_SIZE:(i+1)*BATCH_SIZE]]
        for sentence_in, targets in zip(batch_sentences,batch_tags):
            # Step 2. PyTorch accumulates gradients, so clear them before each
            # sentence. The parameters are therefore updated once per sentence,
            # not once per batch.
            model.zero_grad()

            # Step 3. Run our forward pass.
            tag_scores = model(sentence_in)

            # Step 4. Compute the loss, gradients, and update the parameters by
            #  calling optimizer.step()
            loss = loss_function(tag_scores, targets)
            loss.backward()
            optimizer.step()
    losses.append(loss.item())
    epochs.append(epoch)
    print(f'Epoch {epoch} :___________________________________')
    results_matrix_=results_matr(model, dev_data)
    micro_aver_precision_ = sum(results_matrix_[:,0])/(sum(results_matrix_[:,0])+sum(results_matrix_[:,2]))
    micro_aver_recall_ = sum(results_matrix_[:,0])/(sum(results_matrix_[:,0])+sum(results_matrix_[:,3]))
    f1_=2*micro_aver_precision_*micro_aver_recall_/(micro_aver_precision_+micro_aver_recall_)
    f05_=1.25*micro_aver_precision_*micro_aver_recall_/((0.25*micro_aver_precision_)+micro_aver_recall_)
    print('Micro-average of precision on dev : ', micro_aver_precision_)
    print('Micro-average of recall on dev : ', micro_aver_recall_)
    print('F1 score on dev : ', f1_)
    print('F0.5 score on dev  : ', f05_)
    results_matrix_=results_matr(model, test_data)
    micro_aver_precision_ = sum(results_matrix_[:,0])/(sum(results_matrix_[:,0])+sum(results_matrix_[:,2]))
    micro_aver_recall_ = sum(results_matrix_[:,0])/(sum(results_matrix_[:,0])+sum(results_matrix_[:,3]))
    f1_=2*micro_aver_precision_*micro_aver_recall_/(micro_aver_precision_+micro_aver_recall_)
    f05_=1.25*micro_aver_precision_*micro_aver_recall_/((0.25*micro_aver_precision_)+micro_aver_recall_)
    print('Micro-average of precision on test : ', micro_aver_precision_)
    print('Micro-average of recall on test : ', micro_aver_recall_)
    print('F1 score on test : ', f1_)
    print('F0.5 score on test  : ', f05_)
    print('____________________________________________')

Start training..............:
Epoch 0 :___________________________________
Micro-average of precision on dev :  0.9543242085588567
Micro-average of recall on dev :  0.9543242085588567
F1 score on dev :  0.9543242085588567
F0.5 score on dev  :  0.9543242085588568
Micro-average of precision on test :  0.9468719715731668
Micro-average of recall on test :  0.9468719715731668
F1 score on test :  0.9468719715731668
F0.5 score on test  :  0.9468719715731668
____________________________________________
Epoch 1 :___________________________________
Micro-average of precision on dev :  0.9616058564697636
Micro-average of recall on dev :  0.9616058564697636
F1 score on dev :  0.9616058564697636
F0.5 score on dev  :  0.9616058564697635
Micro-average of precision on test :  0.954797028103801
Micro-average of recall on test :  0.954797028103801
F1 score on test :  0.954797028103801
F0.5 score on test  :  0.9547970281038011
____________________________________________
Epoch 2 :________________________
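
The loop above groups sentences into batches but still runs the forward pass and the optimizer step one sentence at a time. One common way to process a whole batch in a single forward pass is to pad the sentences to equal length and tell the loss to skip the padded positions. The following is a minimal sketch of that idea on toy tensors, not the code that produced the numbers above; `BatchedBiLSTMTagger`, the `toy_*` names and all sizes are assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

PAD_TAG = -100  # target value that nn.NLLLoss is told to ignore


class BatchedBiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, tagset_size):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim,
                            bidirectional=True, batch_first=True)
        self.hidden2tag = nn.Linear(hidden_dim * 2, tagset_size)

    def forward(self, padded_sentences, lengths):
        embeds = self.word_embeddings(padded_sentences)
        # Packing lets the LSTM skip the padded positions of shorter sentences.
        packed = pack_padded_sequence(embeds, lengths,
                                      batch_first=True, enforce_sorted=False)
        packed_out, _ = self.lstm(packed)
        lstm_out, _ = pad_packed_sequence(packed_out, batch_first=True)
        return torch.log_softmax(self.hidden2tag(lstm_out), dim=-1)


torch.manual_seed(1)
toy_model = BatchedBiLSTMTagger(vocab_size=10, embedding_dim=4,
                                hidden_dim=3, tagset_size=9)
toy_loss_function = nn.NLLLoss(ignore_index=PAD_TAG)
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)

# Two toy sentences of different length, already converted to indices.
sentences = [torch.tensor([1, 2, 3, 4]), torch.tensor([5, 6])]
tags = [torch.tensor([2, 8, 1, 8]), torch.tensor([3, 7])]
lengths = torch.tensor([len(s) for s in sentences])

# The word padding value is arbitrary here because packed positions are skipped.
padded_sentences = pad_sequence(sentences, batch_first=True, padding_value=0)
padded_tags = pad_sequence(tags, batch_first=True, padding_value=PAD_TAG)

toy_optimizer.zero_grad()
tag_scores = toy_model(padded_sentences, lengths)  # (batch, max_len, tagset)
batch_loss = toy_loss_function(tag_scores.reshape(-1, tag_scores.size(-1)),
                               padded_tags.reshape(-1))
batch_loss.backward()
toy_optimizer.step()  # one parameter update for the whole batch
print(tag_scores.shape, batch_loss.item())
```

With this layout the loss is averaged over all real tokens in the batch, so a learning rate tuned for per-sentence updates would probably need to be revisited.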

In [28]:
filename = 'data/bimodel10.pkl'
# Close the output file before reading it back so the pickle is fully written.
with open(filename, 'wb') as output:
    pickle.dump(model, output)
with open(filename, 'rb') as pickle_file:
    model_loaded = pickle.load(pickle_file)

In [29]:
model

BiLSTMTagger(
  (word_embeddings): Embedding(30289, 100)
  (lstm): LSTM(100, 50, bidirectional=True)
  (hidden2tag): Linear(in_features=100, out_features=9, bias=True)
)

In [30]:
model_loaded

BiLSTMTagger(
  (word_embeddings): Embedding(30289, 100)
  (lstm): LSTM(100, 50, bidirectional=True)
  (hidden2tag): Linear(in_features=100, out_features=9, bias=True)
)
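
Pickling the whole module ties the saved file to the exact class definition. A commonly used alternative in PyTorch is to save only the parameters via `state_dict()` and load them into a freshly constructed model. The sketch below illustrates this with a small stand-in layer and an in-memory buffer; `toy_layer`, `restored_layer` and the buffer are assumptions for illustration, and with the real tagger one would pass a file path instead.

```python
import io

import torch
import torch.nn as nn

toy_layer = nn.Linear(4, 2)       # stand-in for the trained tagger

buffer = io.BytesIO()             # in-memory stand-in for a file on disk
torch.save(toy_layer.state_dict(), buffer)
buffer.seek(0)

restored_layer = nn.Linear(4, 2)  # the architecture must be rebuilt before loading
restored_layer.load_state_dict(torch.load(buffer))
print(torch.equal(toy_layer.weight, restored_layer.weight))
```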

### **Task 4**: 

Implement the calculation of token-level Precision / Recall / F1 / F0.5 scores averaged over all classes. IMPORTANT! Please implement the "micro-average" (https://tomaxent.com/2018/04/27/Micro-and-Macro-average-of-Precision-Recall-and-F-Score/) approach. Don't use standard functions from scikit-learn or similar external packages (30% of score).
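
For reference, the quantities computed in this task can be written as follows. These are the standard definitions, with $c$ ranging over the nine tags; the notation is an addition and does not come from the assignment text.

$$
P_{\text{micro}} = \frac{\sum_c TP_c}{\sum_c (TP_c + FP_c)}, \qquad
R_{\text{micro}} = \frac{\sum_c TP_c}{\sum_c (TP_c + FN_c)}, \qquad
F_\beta = \frac{(1 + \beta^2)\, P\, R}{\beta^2 P + R}
$$

Setting $\beta = 1$ gives $F_1 = 2PR / (P + R)$, and $\beta = 0.5$ gives $F_{0.5} = 1.25\, PR / (0.25\, P + R)$, which is where the constants 1.25 and 0.25 in the code come from.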

In [22]:
result_types={'TP':0, 'TN':1, 'FP':2, 'FN':3}
print(tag_to_ix)
print(result_types)
print(len(tag_to_ix))
print(len(result_types))

{'B-LOC': 0, 'B-MISC': 1, 'B-ORG': 2, 'B-PER': 3, 'I-LOC': 4, 'I-MISC': 5, 'I-ORG': 6, 'I-PER': 7, 'O': 8}
{'TP': 0, 'TN': 1, 'FP': 2, 'FN': 3}
9
4


In [23]:
def results_matr(resmodel, testdata):
    results_matrix=np.zeros((len(tag_to_ix), len(result_types)))
    with torch.no_grad():
        for w_t in testdata:
            inputs = prepare_sequence(w_t[0], word_to_ix)
            scores = resmodel(inputs)
            outputs = scores.numpy().argmax(axis=1)
            tags=prepare_sequence(w_t[1], tag_to_ix)
            for i in range(len(outputs)):
                pr=outputs[i]
                tr=tags[i]
                if pr==tr:
                    results_matrix[tr,0]+=1
                    results_matrix[:,1]+=1
                    results_matrix[tr,1]-=1
                else:
                    results_matrix[pr,2]+=1
                    results_matrix[tr,3]+=1
                    results_matrix[:,1]+=1
                    results_matrix[pr,1]-=1
                    results_matrix[tr,1]-=1
    return results_matrix

In [32]:
results_matrix=results_matr(model, dev_data)

In [33]:
results_matrix

array([[1.6590e+03, 4.9426e+04, 9.9000e+01, 1.7800e+02],
       [7.3300e+02, 5.0309e+04, 1.3100e+02, 1.8900e+02],
       [1.1230e+03, 4.9824e+04, 1.9700e+02, 2.1800e+02],
       [1.7390e+03, 4.9406e+04, 1.1400e+02, 1.0300e+02],
       [2.0300e+02, 5.1062e+04, 4.3000e+01, 5.4000e+01],
       [1.9400e+02, 5.0942e+04, 7.4000e+01, 1.5200e+02],
       [5.1900e+02, 5.0478e+04, 1.3300e+02, 2.3200e+02],
       [1.2280e+03, 5.0007e+04, 4.8000e+01, 7.9000e+01],
       [4.2419e+04, 7.8970e+03, 7.0600e+02, 3.4000e+02]])
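
A quick consistency check on this matrix: every token counts as exactly one of TP, TN, FP or FN for each class, so each row should sum to the number of dev tokens (51362, printed earlier as the length of dev). The sketch below re-enters the values shown above by hand; `dev_results` is a hypothetical name used only for this check.

```python
# Rows follow tag_to_ix order; columns are TP, TN, FP, FN (copied from the output above).
dev_results = [
    [1659, 49426, 99, 178],    # B-LOC
    [733, 50309, 131, 189],    # B-MISC
    [1123, 49824, 197, 218],   # B-ORG
    [1739, 49406, 114, 103],   # B-PER
    [203, 51062, 43, 54],      # I-LOC
    [194, 50942, 74, 152],     # I-MISC
    [519, 50478, 133, 232],    # I-ORG
    [1228, 50007, 48, 79],     # I-PER
    [42419, 7897, 706, 340],   # O
]

total_tokens = 51362  # length of dev reported earlier
row_sums = [sum(row) for row in dev_results]
rows_consistent = all(s == total_tokens for s in row_sums)
print(rows_consistent)
```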

**Micro-average Method**

In [34]:
micro_aver_precision = sum(results_matrix[:,0])/(sum(results_matrix[:,0])+sum(results_matrix[:,2]))
micro_aver_recall = sum(results_matrix[:,0])/(sum(results_matrix[:,0])+sum(results_matrix[:,3]))
f1=2*micro_aver_precision*micro_aver_recall/(micro_aver_precision+micro_aver_recall)
f05=1.25*micro_aver_precision*micro_aver_recall/((0.25*micro_aver_precision)+micro_aver_recall)

In [35]:
print('Micro-average of precision : ', micro_aver_precision)
print('Micro-average of recall : ', micro_aver_recall)
print('F1 score : ', f1)
print('F0.5 score : ', f05)

Micro-average of precision :  0.9699193956621627
Micro-average of recall :  0.9699193956621627
F1 score :  0.9699193956621627
F0.5 score :  0.9699193956621627


*If class A is predicted and the true label is B, then there is a FP for A and a FN for B. If the prediction is correct, i.e. class A is predicted and A is also the true label, then there is neither a false positive nor a false negative but only a true positive. So there is no possibility that would increase only FP or FN but not both. That is why precision and recall are always the same when using the micro averaging scheme.* - https://simonhessner.de/why-are-precision-recall-and-f1-score-equal-when-using-micro-averaging-in-a-multi-class-problem/
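
The argument in the quote can be checked on a tiny example. The sketch below counts TP, FP and FN over a made-up pair of tag sequences; `micro_counts` and the sequences are hypothetical and unrelated to the dev data.

```python
from collections import Counter


def micro_counts(predicted, gold):
    """Return total (TP, FP, FN) summed over all classes."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for p, g in zip(predicted, gold):
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[g] += 1  # the true class gains a false negative
    return sum(tp.values()), sum(fp.values()), sum(fn.values())


gold = ['B-ORG', 'O', 'B-MISC', 'O', 'O', 'B-PER']
predicted = ['B-ORG', 'O', 'O', 'O', 'B-LOC', 'B-PER']

tp_total, fp_total, fn_total = micro_counts(predicted, gold)
print(tp_total, fp_total, fn_total)
```

Each wrong token adds one FP and one FN, so the two totals match and micro-averaged precision and recall coincide. The same holds for the dev matrix above, where the FP and FN columns both sum to 1545.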

In [36]:
micro_aver_precision_only_for_classes = sum(results_matrix[:8,0])/(sum(results_matrix[:8,0])+sum(results_matrix[:8,2]))
micro_aver_recall_only_for_classes = sum(results_matrix[:8,0])/(sum(results_matrix[:8,0])+sum(results_matrix[:8,3]))

In [37]:
print('Micro-average of precision only for NE tags : ', micro_aver_precision_only_for_classes)
print('Micro-average of recall only for NE tags : ', micro_aver_recall_only_for_classes)

Micro-average of precision only for NE tags :  0.8981425276192788
Micro-average of recall only for NE tags :  0.8599325816575614


*Besides micro averaging, one might also consider weighted averaging in case of an unequally distributed data set.*
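
A minimal sketch of what such a weighted average could look like, using the per-class counts from the dev matrix above and weighting each class by its support (the number of true tokens of that class, TP + FN). The names `dev_counts`, `support` and `weighted_*` are assumptions for this illustration, and this is one common definition of weighted averaging rather than the only one.

```python
# Per-class (TP, FP, FN) copied from the dev results matrix above.
dev_counts = {
    'B-LOC': (1659, 99, 178),
    'B-MISC': (733, 131, 189),
    'B-ORG': (1123, 197, 218),
    'B-PER': (1739, 114, 103),
    'I-LOC': (203, 43, 54),
    'I-MISC': (194, 74, 152),
    'I-ORG': (519, 133, 232),
    'I-PER': (1228, 48, 79),
    'O': (42419, 706, 340),
}

# Support of a class = number of tokens whose true tag is that class.
support = {tag: tp + fn for tag, (tp, fp, fn) in dev_counts.items()}
total_support = sum(support.values())

weighted_precision = sum(
    support[tag] * tp / (tp + fp) for tag, (tp, fp, fn) in dev_counts.items()
) / total_support
weighted_recall = sum(
    support[tag] * tp / (tp + fn) for tag, (tp, fp, fn) in dev_counts.items()
) / total_support
print(weighted_precision, weighted_recall)
```

With support weights, the weighted recall reduces algebraically to the micro-averaged recall, so the frequent 'O' class dominates it in the same way.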

**Macro-average Method**

In [38]:
precisions=results_matrix[:,0]/(results_matrix[:,0]+results_matrix[:,2])
recalls=results_matrix[:,0]/(results_matrix[:,0]+results_matrix[:,3])
print(precisions)
print(recalls)

[0.94368601 0.84837963 0.85075758 0.93847814 0.82520325 0.7238806
 0.79601227 0.96238245 0.98362899]
[0.90310289 0.79501085 0.83743475 0.94408252 0.78988327 0.56069364
 0.69107856 0.93955624 0.99204846]


In [39]:
macro_aver_precision = sum(precisions)/len(tag_to_ix)
macro_aver_recall = sum(recalls)/len(tag_to_ix)
macro_f1=2*macro_aver_precision*macro_aver_recall/(macro_aver_precision+macro_aver_recall)
macro_f05=1.25*macro_aver_precision*macro_aver_recall/((0.25*macro_aver_precision)+macro_aver_recall)

In [40]:
print('Macro-average of precision : ', macro_aver_precision)
print('Macro-average of recall : ', macro_aver_recall)
print('Macro F1 score : ', macro_f1)
print('Macro F0.5 score : ',macro_f05)

Macro-average of precision :  0.874712100599839
Macro-average of recall :  0.8280990184022845
Macro F1 score :  0.8507675617197139
Macro F0.5 score :  0.8649743471556832


### **Task 5**: 

Report the performance (F1 and F0.5 scores) on the dev/test subsets w.r.t. the epoch number during training, for the first 5 epochs and for each strategy of loading the embeddings (10% of score).

**strategy a**

In [31]:
EMBEDDING_DIM = embedding_length
HIDDEN_DIM = 50
EMBEDDING_MATRIX=embeddings_matrix_a
VOCAB_SIZE=EMBEDDING_MATRIX.shape[0]
TARGET_SIZE=len(tag_to_ix)

In [32]:
class BiLSTMTagger_M(nn.Module):

    def __init__(self,EMBEDDING_MATRIX,embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(BiLSTMTagger_M, self).__init__()
        self.hidden_dim = hidden_dim
        
        # Load the pretrained matrix passed to the constructor and keep it frozen.
        self.word_embeddings = nn.Embedding.from_pretrained(
            torch.tensor(EMBEDDING_MATRIX, dtype=torch.float), freeze=True)
        
        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, bidirectional=True)

        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim*2, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores

In [33]:
model_a = BiLSTMTagger_M(EMBEDDING_MATRIX,EMBEDDING_DIM , HIDDEN_DIM, VOCAB_SIZE, TARGET_SIZE)
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model_a.parameters(), lr=0.1)

print("Start training strategy a..............:")
for epoch in range(5): 
    for sentence, tags in training_data:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance        
        model_a.zero_grad()

        # Step 2. Get our inputs ready for the network, that is, turn them into
        # Tensors of word indices.
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = prepare_sequence(tags, tag_to_ix)

        
        # Step 3. Run our forward pass.
        tag_scores = model_a(sentence_in)

        # Step 4. Compute the loss, gradients, and update the parameters by
        #  calling optimizer.step()
        loss = loss_function(tag_scores, targets)
        loss.backward()
        optimizer.step()
    print(f'Epoch {epoch} :___________________________________')
    results_matrix_=results_matr(model_a, dev_data)
    micro_aver_precision_ = sum(results_matrix_[:,0])/(sum(results_matrix_[:,0])+sum(results_matrix_[:,2]))
    micro_aver_recall_ = sum(results_matrix_[:,0])/(sum(results_matrix_[:,0])+sum(results_matrix_[:,3]))
    f1_=2*micro_aver_precision_*micro_aver_recall_/(micro_aver_precision_+micro_aver_recall_)
    f05_=1.25*micro_aver_precision_*micro_aver_recall_/((0.25*micro_aver_precision_)+micro_aver_recall_)
    print('Micro-average of precision on dev : ', micro_aver_precision_)
    print('Micro-average of recall on dev : ', micro_aver_recall_)
    print('F1 score on dev : ', f1_)
    print('F0.5 score on dev  : ', f05_)
    results_matrix_=results_matr(model_a, test_data)
    micro_aver_precision_ = sum(results_matrix_[:,0])/(sum(results_matrix_[:,0])+sum(results_matrix_[:,2]))
    micro_aver_recall_ = sum(results_matrix_[:,0])/(sum(results_matrix_[:,0])+sum(results_matrix_[:,3]))
    f1_=2*micro_aver_precision_*micro_aver_recall_/(micro_aver_precision_+micro_aver_recall_)
    f05_=1.25*micro_aver_precision_*micro_aver_recall_/((0.25*micro_aver_precision_)+micro_aver_recall_)
    print('Micro-average of precision on test : ', micro_aver_precision_)
    print('Micro-average of recall on test : ', micro_aver_recall_)
    print('F1 score on test : ', f1_)
    print('F0.5 score on test  : ', f05_)
    print('____________________________________________')

Start training strategy a..............:
Epoch 0 :___________________________________
Micro-average of precision on dev :  0.9024375997819399
Micro-average of recall on dev :  0.9024375997819399
F1 score on dev :  0.9024375997819399
F0.5 score on dev  :  0.9024375997819399
Micro-average of precision on test :  0.8970819425002692
Micro-average of recall on test :  0.8970819425002692
F1 score on test :  0.8970819425002692
F0.5 score on test  :  0.8970819425002693
____________________________________________
Epoch 1 :___________________________________
Micro-average of precision on dev :  0.909466142284179
Micro-average of recall on dev :  0.909466142284179
F1 score on dev :  0.909466142284179
F0.5 score on dev  :  0.909466142284179
Micro-average of precision on test :  0.9013675029611284
Micro-average of recall on test :  0.9013675029611284
F1 score on test :  0.9013675029611284
F0.5 score on test  :  0.9013675029611284
____________________________________________
Epoch 2 :______________

**strategy b**

In [34]:
EMBEDDING_DIM = embedding_length
HIDDEN_DIM = 50
EMBEDDING_MATRIX=embeddings_matrix_b
VOCAB_SIZE=EMBEDDING_MATRIX.shape[0]
TARGET_SIZE=len(tag_to_ix)

In [35]:
model_b = BiLSTMTagger_M(EMBEDDING_MATRIX,EMBEDDING_DIM , HIDDEN_DIM, VOCAB_SIZE, TARGET_SIZE)
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model_b.parameters(), lr=0.1)

print("Start training strategy b..............:")
for epoch in range(5): 
    for sentence, tags in training_data:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance        
        model_b.zero_grad()

        # Step 2. Get our inputs ready for the network, that is, turn them into
        # Tensors of word indices.
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = prepare_sequence(tags, tag_to_ix)

        
        # Step 3. Run our forward pass.
        tag_scores = model_b(sentence_in)

        # Step 4. Compute the loss, gradients, and update the parameters by
        #  calling optimizer.step()
        loss = loss_function(tag_scores, targets)
        loss.backward()
        optimizer.step()
    print(f'Epoch {epoch} :___________________________________')
    results_matrix_=results_matr(model_b, dev_data)
    micro_aver_precision_ = sum(results_matrix_[:,0])/(sum(results_matrix_[:,0])+sum(results_matrix_[:,2]))
    micro_aver_recall_ = sum(results_matrix_[:,0])/(sum(results_matrix_[:,0])+sum(results_matrix_[:,3]))
    f1_=2*micro_aver_precision_*micro_aver_recall_/(micro_aver_precision_+micro_aver_recall_)
    f05_=1.25*micro_aver_precision_*micro_aver_recall_/((0.25*micro_aver_precision_)+micro_aver_recall_)
    print('Micro-average of precision on dev : ', micro_aver_precision_)
    print('Micro-average of recall on dev : ', micro_aver_recall_)
    print('F1 score on dev : ', f1_)
    print('F0.5 score on dev  : ', f05_)
    results_matrix_=results_matr(model_b, test_data)
    micro_aver_precision_ = sum(results_matrix_[:,0])/(sum(results_matrix_[:,0])+sum(results_matrix_[:,2]))
    micro_aver_recall_ = sum(results_matrix_[:,0])/(sum(results_matrix_[:,0])+sum(results_matrix_[:,3]))
    f1_=2*micro_aver_precision_*micro_aver_recall_/(micro_aver_precision_+micro_aver_recall_)
    f05_=1.25*micro_aver_precision_*micro_aver_recall_/((0.25*micro_aver_precision_)+micro_aver_recall_)
    print('Micro-average of precision on test : ', micro_aver_precision_)
    print('Micro-average of recall on test : ', micro_aver_recall_)
    print('F1 score on test : ', f1_)
    print('F0.5 score on test  : ', f05_)
    print('____________________________________________')

Start training strategy b..............:
Epoch 0 :___________________________________
Micro-average of precision on dev :  0.9503134613138118
Micro-average of recall on dev :  0.9503134613138118
F1 score on dev :  0.9503134613138118
F0.5 score on dev  :  0.9503134613138118
Micro-average of precision on test :  0.9452998815548617
Micro-average of recall on test :  0.9452998815548617
F1 score on test :  0.9452998815548617
F0.5 score on test  :  0.9452998815548616
____________________________________________
Epoch 1 :___________________________________
Micro-average of precision on dev :  0.9599119971963709
Micro-average of recall on dev :  0.9599119971963709
F1 score on dev :  0.9599119971963709
F0.5 score on dev  :  0.9599119971963709
Micro-average of precision on test :  0.9540432863142027
Micro-average of recall on test :  0.9540432863142027
F1 score on test :  0.9540432863142027
F0.5 score on test  :  0.9540432863142025
____________________________________________
Epoch 2 :__________

**strategy c**

In [36]:
EMBEDDING_DIM = embedding_length
HIDDEN_DIM = 50
EMBEDDING_MATRIX=embeddings_matrix_c
VOCAB_SIZE=EMBEDDING_MATRIX.shape[0]
TARGET_SIZE=len(tag_to_ix)

In [37]:
model_c = BiLSTMTagger_M(EMBEDDING_MATRIX,EMBEDDING_DIM , HIDDEN_DIM, VOCAB_SIZE, TARGET_SIZE)
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model_c.parameters(), lr=0.1)

print("Start training strategy c..............:")
for epoch in range(5): 
    for sentence, tags in training_data:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance        
        model_c.zero_grad()

        # Step 2. Get our inputs ready for the network, that is, turn them into
        # Tensors of word indices.
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = prepare_sequence(tags, tag_to_ix)

        
        # Step 3. Run our forward pass.
        tag_scores = model_c(sentence_in)

        # Step 4. Compute the loss, gradients, and update the parameters by
        #  calling optimizer.step()
        loss = loss_function(tag_scores, targets)
        loss.backward()
        optimizer.step()
    print(f'Epoch {epoch} :___________________________________')
    results_matrix_=results_matr(model_c, dev_data)
    micro_aver_precision_ = sum(results_matrix_[:,0])/(sum(results_matrix_[:,0])+sum(results_matrix_[:,2]))
    micro_aver_recall_ = sum(results_matrix_[:,0])/(sum(results_matrix_[:,0])+sum(results_matrix_[:,3]))
    f1_=2*micro_aver_precision_*micro_aver_recall_/(micro_aver_precision_+micro_aver_recall_)
    f05_=1.25*micro_aver_precision_*micro_aver_recall_/((0.25*micro_aver_precision_)+micro_aver_recall_)
    print('Micro-average of precision on dev : ', micro_aver_precision_)
    print('Micro-average of recall on dev : ', micro_aver_recall_)
    print('F1 score on dev : ', f1_)
    print('F0.5 score on dev  : ', f05_)
    results_matrix_=results_matr(model_c, test_data)
    micro_aver_precision_ = sum(results_matrix_[:,0])/(sum(results_matrix_[:,0])+sum(results_matrix_[:,2]))
    micro_aver_recall_ = sum(results_matrix_[:,0])/(sum(results_matrix_[:,0])+sum(results_matrix_[:,3]))
    f1_=2*micro_aver_precision_*micro_aver_recall_/(micro_aver_precision_+micro_aver_recall_)
    f05_=1.25*micro_aver_precision_*micro_aver_recall_/((0.25*micro_aver_precision_)+micro_aver_recall_)
    print('Micro-average of precision on test : ', micro_aver_precision_)
    print('Micro-average of recall on test : ', micro_aver_recall_)
    print('F1 score on test : ', f1_)
    print('F0.5 score on test  : ', f05_)
    print('____________________________________________')

Start training strategy c..............:
Epoch 0 :___________________________________
Micro-average of precision on dev :  0.9521436081149488
Micro-average of recall on dev :  0.9521436081149488
F1 score on dev :  0.9521436081149488
F0.5 score on dev  :  0.9521436081149489
Micro-average of precision on test :  0.9457736621083235
Micro-average of recall on test :  0.9457736621083235
F1 score on test :  0.9457736621083235
F0.5 score on test  :  0.9457736621083235
____________________________________________
Epoch 1 :___________________________________
Micro-average of precision on dev :  0.9602040418986799
Micro-average of recall on dev :  0.9602040418986799
F1 score on dev :  0.9602040418986799
F0.5 score on dev  :  0.9602040418986798
Micro-average of precision on test :  0.9540863572736082
Micro-average of recall on test :  0.9540863572736082
F1 score on test :  0.9540863572736082
F0.5 score on test  :  0.9540863572736082
____________________________________________
Epoch 2 :__________
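
The three training cells above repeat the same metric computation for dev and test. A small helper could factor this out and collect the per-epoch scores into one table for the report. The sketch below is an assumption about how that might look: `micro_scores` and `format_report` are hypothetical names, and the toy matrix contains made-up counts rather than real results.

```python
def micro_scores(results_matrix):
    """Micro-averaged precision, recall, F1 and F0.5 from rows of [TP, TN, FP, FN]."""
    tp = sum(row[0] for row in results_matrix)
    fp = sum(row[2] for row in results_matrix)
    fn = sum(row[3] for row in results_matrix)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    f05 = 1.25 * precision * recall / (0.25 * precision + recall)
    return {'precision': precision, 'recall': recall, 'F1': f1, 'F0.5': f05}


def format_report(history):
    """Turn a list of (epoch, split, scores) tuples into a plain-text table."""
    lines = ['epoch  split  F1      F0.5']
    for epoch, split, scores in history:
        lines.append(f"{epoch:<6} {split:<6} {scores['F1']:.4f}  {scores['F0.5']:.4f}")
    return '\n'.join(lines)


toy_matrix = [[8, 10, 1, 2], [5, 13, 2, 1]]  # made-up counts for two classes
history = [(0, 'dev', micro_scores(toy_matrix))]
report = format_report(history)
print(report)
```

Inside a training loop, one would append `(epoch, 'dev', micro_scores(results_matr(model, dev_data)))` and the matching test entry after each epoch, then print the table once at the end.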

In [40]:
model_c

BiLSTMTagger_M(
  (word_embeddings): Embedding(30289, 100)
  (lstm): LSTM(100, 50, bidirectional=True)
  (hidden2tag): Linear(in_features=100, out_features=9, bias=True)
)

In [43]:
# Each output file is closed before it is read back so the pickle is fully written.
filename = 'data/bimodel_c5.pkl'
with open(filename, 'wb') as output:
    pickle.dump(model_c, output)
with open(filename, 'rb') as pickle_file:
    model_c_loaded = pickle.load(pickle_file)
filename = 'data/bimodel_a5.pkl'
with open(filename, 'wb') as output:
    pickle.dump(model_a, output)
with open(filename, 'rb') as pickle_file:
    model_a_loaded = pickle.load(pickle_file)
filename = 'data/bimodel_b5.pkl'
with open(filename, 'wb') as output:
    pickle.dump(model_b, output)
with open(filename, 'rb') as pickle_file:
    model_b_loaded = pickle.load(pickle_file)