<a href="https://colab.research.google.com/github/BDR2939/NLP/blob/main/Assignment_3_NER.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment 3
Training a neural named entity recognition (NER) tagger 

In [None]:
# ! pip install alive-progress

In [None]:
import torch
import torch.nn as nn
import os 
import numpy as np
from random import shuffle
from sklearn.metrics import classification_report
from tqdm import tqdm
import pandas as pd
from google.colab import drive
from tqdm.notebook import tqdm_notebook
from IPython.display import display
import warnings
warnings.filterwarnings("ignore")

In [None]:
# general folder path
drive_path = '/content/gdrive'
drive_saving_path = '/content/gdrive/My Drive'
drive.mount(drive_path)

Mounted at /content/gdrive


In [None]:
%cd /content/gdrive/My Drive

/content/gdrive/My Drive


In [None]:
! git clone https://github.com/BDR2939/NLP.git


Cloning into 'NLP'...
remote: Enumerating objects: 320, done.[K
remote: Counting objects: 100% (83/83), done.[K
remote: Compressing objects: 100% (59/59), done.[K
remote: Total 320 (delta 45), reused 61 (delta 24), pack-reused 237[K
Receiving objects: 100% (320/320), 1.41 MiB | 7.17 MiB/s, done.
Resolving deltas: 100% (148/148), done.


In [None]:
USER = 'DRIVE'  # 'OR' / 'RONY'
if USER == 'OR':
    main_folder =main_folder = r'C:\MSC\NLP2\HW3'
elif USER == 'DRIVE' :
    main_folder = folder_path = os.path.join(drive_saving_path, 'NLP/HW3/data')


In this assignment you are required to build a full training and testing pipeline for a neural sequentail tagger for named entities, using LSTM.

The dataset that you will be working on is called ReCoNLL 2003, which is a corrected version of the CoNLL 2003 dataset: https://www.clips.uantwerpen.be/conll2003/ner/

[Train data](https://drive.google.com/file/d/1hG66e_OoezzeVKho1w7ysyAx4yp0ShDz/view?usp=sharing)

[Dev data](https://drive.google.com/file/d/1EAF-VygYowU1XknZhvzMi2CID65I127L/view?usp=sharing)

[Test data](https://drive.google.com/file/d/16gug5wWnf06JdcBXQbcICOZGZypgr4Iu/view?usp=sharing)

As you can see, the annotated texts are labeled according to the IOB annotation scheme, for 3 entity types: Person, Organization, Location.

**Task 1:** Write a funtion for reading the data from a single file (of the ones that are provided above). The function recieves a filepath and then it encodes every sentence individually using a pair of lists, one list contains the words and one list contains the tags. Each list pair will be added to a general list (data), which will be returned back from the function.

## set path's

In [None]:
train_path = os.path.join(main_folder, 'connl03_train.txt')
test_path = os.path.join(main_folder, 'connl03_test.txt')
dev_path = os.path.join(main_folder, 'connl03_dev.txt')

In [None]:
def read_data(filepath):
    data = []
    with open(filepath) as file:
        words = []
        labels = []

        for index, line in enumerate(file, start=1):
            if line != '\n':
                word, label = line.split()
                words.append(word)
                labels.append(label)
            else:
                data.append((words, labels))
                words = []
                labels = []
    
    return data

train = read_data(train_path)
dev = read_data(test_path)
test = read_data(dev_path)

The following Vocab class can be served as a dictionary that maps words and tags into Ids. The UNK_TOKEN should be used for words that are not part of the training data.

In [None]:
UNK_TOKEN = 0


class Vocab:
    def __init__(self):
        """
        tag2id/id2tag  - tags to each other from label to integer number
        n_words - count the # of word in sentence
        """
        self.word2id = {"__unk__": UNK_TOKEN}
        self.id2word = {UNK_TOKEN: "__unk__"}
        self.n_words = 1
        
        self.tag2id = {"O":0, "B-PER":1, "I-PER": 2, "B-LOC": 3, "I-LOC": 4, "B-ORG": 5, "I-ORG": 6}
        self.id2tag = {0:"O", 1:"B-PER", 2:"I-PER", 3:"B-LOC", 4:"I-LOC", 5:"B-ORG", 6:"I-ORG"}
    
    
    def index_words(self, words):
        """
        for given token list get token index in sentence
        """
        word_indexes = [self.index_word(w) for w in words]
        return word_indexes


    def index_tags(self, tags):
        """
        for given label list get label index
        """
        tag_indexes = [self.tag2id[t] for t in tags]
        return tag_indexes
    

    def index_word(self, w):
        """
     
        """
        if w not in self.word2id:
            self.word2id[w] = self.n_words
            self.id2word[self.n_words] = w
            self.n_words += 1
        
        return self.word2id[w]
    

**Task 2:** Write a function prepare_data that takes one of the [train, dev, test] and the Vocab instance, for converting each pair of (words,tags) to a pair of indexes. Each pair should be added to data_sequences, which will be returned back from the function.

In [None]:
vocab = Vocab()

def prepare_data(data, vocab):
    data_sequences = []
    """
    this loop run on the data, for each sequence we generating tesor to
    contain the token of sequence
    """
    for i_words, i_tags in data:
        
        words_indexes_tensor = torch.tensor(vocab.index_words(i_words), dtype=torch.long)
        tags_indexes_tensor = torch.tensor(vocab.index_tags(i_tags), dtype=torch.long)
        # append data and label tensors
        data_sequences.append((words_indexes_tensor, tags_indexes_tensor))

    return data_sequences, vocab

train_sequences, vocab = prepare_data(train, vocab)
dev_sequences, vocab = prepare_data(dev, vocab)
test_sequences, vocab = prepare_data(test, vocab)

**Task 3:** Write NERNet, a PyTorch Module for labeling words with NER tags. 

*input_size:* the size of the vocabulary

*embedding_size:* the size of the embeddings

*hidden_size:* the LSTM hidden size

*output_size:* the number tags we are predicting for

*n_layers:* the number of layers we want to use in LSTM

*directions:* could 1 or 2, indicating unidirectional or bidirectional LSTM, respectively

The input for your forward function should be a single sentence tensor.

*note:* the embeddings in this section are learned embedding. That means that you don't need to use pretrained embedding like the one used in class. You will use them in part 5

In [None]:
class NERNet(nn.Module):
    
    def __init__(self, input_size, embedding_size, hidden_size, output_size, n_layers, directions):
        super(NERNet, self).__init__()
        self.embedding = nn.Embedding(input_size, embedding_size)
        self.lstm = nn.LSTM(embedding_size, hidden_size, n_layers, bidirectional=(True if directions==2 else False))
        self.out = nn.Linear(hidden_size*directions, output_size)

        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.directions = directions


    def forward(self, input_sentence):        
        # get sentence token numbers to understand output & input size
        dimension = len(input_sentence)
        
        # initial the hidden to None because none sentence inter
        hidden = None

        # 1. foward input sentence into the embeding
        embedded = self.embedding(input_sentence)

        # 2. foward embedding to LSTM
        lstm_output, _ = self.lstm(embedded.view(dimension, 1, -1), hidden) # The view function is meant to reshape the tensor https://stackoverflow.com/a/48650355/7786691

        # 3. foward to get predictions  - linear transformation to the incoming data
        output = self.out(lstm_output.view(dimension, -1)) 

        return output


## help function to train % evaluate 

In [None]:
def get_model_results(model, test_sequences):
    """
    

    Parameters
    ----------
    model : Torch model  - 
        DESCRIPTION: LSTM model.
    test_sequences : list
        DESCRIPTION: input list of coupels [[word_tensor, lebel_tensor] , ...]
    
    the function get model results
    
    Returns
    -------
    all_test_words_pred : list
    all_test_words_true : list
    binary_test_words_pred : list
    binary_test_words_true : list
    """
    # generate test tokens prediction
    all_test_words_pred = []
    all_test_words_true = []

    # generate test binnary prediction
    binary_test_words_pred = []
    binary_test_words_true = []
    for sentence, labels in test_sequences:
        sentence_tensor = torch.LongTensor(sentence).cuda()
        labels_tensor = torch.LongTensor(labels).cuda()

        _, pred_labels = model(sentence_tensor).T.max(0)

        all_test_words_pred += pred_labels.tolist()
        all_test_words_true += labels.tolist()

        binary_test_words_pred += [1 if i >= 1 else i for i in all_test_words_pred]
        binary_test_words_true += [1 if i >= 1 else i for i in all_test_words_true]
    return all_test_words_pred, all_test_words_true, binary_test_words_pred, binary_test_words_true

**Task 4:** write a training loop, which takes a model (instance of NERNet) and number of epochs to train on. The loss is always CrossEntropyLoss and the optimizer is always Adam.

In [None]:
def train_loop(model, n_epochs, train_sequences):
    #
    all_target_names = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]
    binary_target_names = ["O", "OTHERS"]
    
    # Loss function
    criterion = nn.CrossEntropyLoss()

    # Optimizer (ADAM is a fancy version of SGD)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
  
    # shuffle data before training phase
    shuffle(train_sequences)
    STEP = 400 
    curr_f1_accuracy_result = 0
    best_f1_accuracy_result = 0
    best_df = pd.DataFrame()
    for e in range(1, n_epochs + 1):
        # print('start ephoc #' + str(e),  flush = True)
        """
        tqdm - add progress bar
        """
        desc = ('Ephoc #' + str(e))
        for sequence_idx in tqdm_notebook(range(train_sequences.__len__()), desc = desc):
            
            
            # get sentence tokens, and labels 
            sentence, labels = train_sequences[sequence_idx]
            
            # check if there is empty sentence
            if labels.__len__() == 0:
                continue
            
            # insert sentence tokens into tensor
            sentence_tensor = torch.LongTensor(sentence).cuda()
            
            # insert sentence labels into tensor
            labels_tensor = torch.LongTensor(labels).cuda()
            
            # Sets the gradients of all optimized to zero.
            model.zero_grad()
            
            # foward sentence to model
            scores = model(sentence_tensor)
            
            # Computes the gradient of current tensor
            criterion(scores, labels_tensor).backward()
            
            # once the gradients are computed use them to optimize model
            optimizer.step()

        
        # print('finshed ephoc #' + str(e) + ', ephoch results:' , flush = True)
        all_train_words_pred, all_train_words_true, \
        binary_train_words_pred, binary_train_words_true = get_model_results(model, train_sequences)
        train_Results_df = pd.DataFrame(classification_report(all_train_words_true, all_train_words_pred, target_names=all_target_names, output_dict = True))
        curr_f1_accuracy_result = train_Results_df.iloc[2]['accuracy']

        
        if curr_f1_accuracy_result > best_f1_accuracy_result:
          improve_string = 'f1-accuracy-score improve from ' + str(best_f1_accuracy_result) + ' to ' + str(curr_f1_accuracy_result) 
          best_f1_accuracy_result = curr_f1_accuracy_result
          best_df = train_Results_df
        else:
          improve_string = 'f1-accuracy-score did not improve from ' + str(best_f1_accuracy_result)  
        print(improve_string, flush = True)
       
    best_string = 'best-f1-accuracy-score is '+ str(best_f1_accuracy_result)  
    print(best_string, flush = True)
    return best_df


**Task 5:** write an evaluation loop on a trained model, using the dev and test datasets. This function print the true positive rate (TPR), also known as Recall and the opposite to false positive rate (FPR), also known as precision, of each label seperately (7 labels in total), and for all the 6 labels (except O) together. The caption argument for the function should be served for printing, so that when you print include it as a prefix.

In [None]:
def evaluate(model, caption, test_sequences, dev_sequences):
    # from Piazza: https://piazza.com/class/klxc3m1tzqz2o8?cid=59

    all_target_names = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]
    binary_target_names = ["O", "OTHERS"]
    
    # self.tag2id = {"O":0, "B-PER":1, "I-PER": 2, "B-LOC": 3, "I-LOC": 4, "B-ORG": 5, "I-ORG": 6}
    # self.id2tag = {0:"O", 1:"B-PER", 2:"I-PER", 3:"B-LOC", 4:"I-LOC", 5:"B-ORG", 6:"I-ORG"}
    
    print(f"****************    Results for {caption}    ****************")



    # generate dev tokens prediction 
    all_dev_words_pred = []
    all_dev_words_true = []
    
    # generate dev binnary prediction 
    binary_dev_words_pred = []
    binary_dev_words_true = []

    # get test results
    all_test_words_pred, all_test_words_true, \
        binary_test_words_pred, binary_test_words_true = get_model_results(model, test_sequences)

    # get dev results
    all_dev_words_pred, all_dev_words_true, \
        binary_dev_words_pred, binary_dev_words_true = get_model_results(model, dev_sequences)

    print("Test Results:")
    Test_Results_dict = pd.DataFrame(classification_report(all_test_words_true, all_test_words_pred, target_names=all_target_names,  output_dict = True))
    display(Test_Results_dict.T)
    print("Dev Results:")
    Dev_Results_dict = pd.DataFrame(classification_report(all_dev_words_true, all_dev_words_pred, target_names=all_target_names, output_dict = True))
    display(Dev_Results_dict.T)

    print("Binary Test Results:")
    Binary_Test_Results = pd.DataFrame(classification_report(binary_test_words_true, binary_test_words_pred, target_names=binary_target_names, output_dict = True))
    display(Binary_Test_Results.T)

    print("Binary Dev Results:")
    Binary_Dev_Results  = pd.DataFrame(classification_report(binary_dev_words_true, binary_dev_words_pred, target_names=binary_target_names, output_dict = True))
    display(Binary_Dev_Results.T)

    return 

**Task 6:** Train and evaluate a few models, all with embedding_size=300, and with the following hyper parameters (you may use that as captions for the models as well):

Model 1: (hidden_size: 500, n_layers: 1, directions: 1)

Model 2: (hidden_size: 500, n_layers: 2, directions: 1)

Model 3: (hidden_size: 500, n_layers: 3, directions: 1)

Model 4: (hidden_size: 500, n_layers: 1, directions: 2)

Model 5: (hidden_size: 500, n_layers: 2, directions: 2)

Model 6: (hidden_size: 500, n_layers: 3, directions: 2)

Model 4: (hidden_size: 800, n_layers: 1, directions: 2)

Model 5: (hidden_size: 800, n_layers: 2, directions: 2)

Model 6: (hidden_size: 800, n_layers: 3, directions: 2)

In [None]:
EMBEDDING_SIZE = 300
EPOCHS = 10
HIDDEN_SIZE  = 500 
INPUT_SIZE = len(vocab.word2id) # 8955
OUTPUT_SIZE = len(vocab.tag2id) # 7

n_layers_array = np.arange(1,4)
directions_array = np.arange(1,3)

# n_layers_array = np.arange(1,2)
# directions_array = np.arange(1,2)
model_list  = []
train_res_list = [] 
for i_n_layers in n_layers_array:
    for i_directions in directions_array:
        print('----------------------------------------------------------')
        print('Train model using:\n' + \
              '  1)hidden_size = ' + str(HIDDEN_SIZE)+'\n'+ \
              '  2)n_layers = ' + str(i_n_layers) + '\n'+ \
              '  3)directions = ' + str(i_directions) , flush = True)
        
        model = NERNet(INPUT_SIZE, EMBEDDING_SIZE, HIDDEN_SIZE, OUTPUT_SIZE, int(i_n_layers), int(i_directions)).cuda()
        train_res = train_loop(model, EPOCHS, train_sequences)
        model_list.append(model)
        train_res_list.append(train_res)

DIRECTION = 2
HIDDEN_SIZE = 800
for i_n_layers in n_layers_array:
        print('----------------------------------------------------------')

        print('Train model using:\n'+ \
              '  1)hidden_size = ' + str(HIDDEN_SIZE)+'\n'+ \
              '  2)n_layers = ' + str(i_n_layers) + '\n'+ \
              '  3)directions = ' + str(i_directions) , flush = True )
        model = NERNet(INPUT_SIZE, EMBEDDING_SIZE, HIDDEN_SIZE, OUTPUT_SIZE, i_n_layers, DIRECTION).cuda()
        train_res = train_loop(model, EPOCHS, train_sequences)
        model_list.append(model)

        

----------------------------------------------------------
Train model using:
  1)hidden_size = 500
  2)n_layers = 1
  3)directions = 1


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.8380098452883263


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.8380098452883263 to 0.889381153305204


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.889381153305204 to 0.9168424753867792


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9168424753867792 to 0.9385372714486638


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9385372714486638 to 0.9571378340365683


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9571378340365683 to 0.9746835443037974


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9746835443037974 to 0.979535864978903


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.979535864978903 to 0.9882559774964839


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9882559774964839 to 0.9939873417721519


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9939873417721519 to 0.9965189873417721
best-f1-accuracy-score is 0.9965189873417721
----------------------------------------------------------
Train model using:
  1)hidden_size = 500
  2)n_layers = 1
  3)directions = 2


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.8730661040787623


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.8730661040787623 to 0.9186708860759494


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9186708860759494 to 0.9479957805907173


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9479957805907173 to 0.9715893108298171


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9715893108298171 to 0.9871659634317862


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9871659634317862 to 0.9940225035161744


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9940225035161744 to 0.9964486638537271


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9964486638537271 to 0.9981715893108298


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9981715893108298 to 0.9990857946554149


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score did not improve from 0.9990857946554149
best-f1-accuracy-score is 0.9990857946554149
----------------------------------------------------------
Train model using:
  1)hidden_size = 500
  2)n_layers = 2
  3)directions = 1


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.834001406469761


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.834001406469761 to 0.8966947960618846


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.8966947960618846 to 0.9339310829817159


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9339310829817159 to 0.9635021097046413


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9635021097046413 to 0.9821026722925457


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9821026722925457 to 0.990014064697609


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.990014064697609 to 0.990295358649789


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.990295358649789 to 0.9952883263009845


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9952883263009845 to 0.9984880450070324


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9984880450070324 to 0.9989451476793249
best-f1-accuracy-score is 0.9989451476793249
----------------------------------------------------------
Train model using:
  1)hidden_size = 500
  2)n_layers = 2
  3)directions = 2


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.895464135021097


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.895464135021097 to 0.9485583684950774


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9485583684950774 to 0.9778481012658228


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9778481012658228 to 0.9838255977496484


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9838255977496484 to 0.9942334739803094


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9942334739803094 to 0.9961673699015471


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score did not improve from 0.9961673699015471


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9961673699015471 to 0.9986286919831223


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score did not improve from 0.9986286919831223


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score did not improve from 0.9986286919831223
best-f1-accuracy-score is 0.9986286919831223
----------------------------------------------------------
Train model using:
  1)hidden_size = 500
  2)n_layers = 3
  3)directions = 1


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.8406821378340366


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.8406821378340366 to 0.8960970464135021


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.8960970464135021 to 0.9310829817158931


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9310829817158931 to 0.960196905766526


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.960196905766526 to 0.975351617440225


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.975351617440225 to 0.9824542897327707


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9824542897327707 to 0.989803094233474


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.989803094233474 to 0.990506329113924


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.990506329113924 to 0.9963783403656822


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9963783403656822 to 0.9978551336146273
best-f1-accuracy-score is 0.9978551336146273
----------------------------------------------------------
Train model using:
  1)hidden_size = 500
  2)n_layers = 3
  3)directions = 2


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.8859353023909986


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.8859353023909986 to 0.9355485232067511


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9355485232067511 to 0.9677215189873418


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9677215189873418 to 0.9855133614627285


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9855133614627285 to 0.9915611814345991


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9915611814345991 to 0.9966244725738397


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score did not improve from 0.9966244725738397


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score did not improve from 0.9966244725738397


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score did not improve from 0.9966244725738397


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9966244725738397 to 0.9986638537271448
best-f1-accuracy-score is 0.9986638537271448
----------------------------------------------------------
Train model using:
  1)hidden_size = 800
  2)n_layers = 1
  3)directions = 2


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.8794303797468355


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.8794303797468355 to 0.9215541490857947


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9215541490857947 to 0.9528481012658228


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9528481012658228 to 0.974507735583685


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.974507735583685 to 0.9879043600562588


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9879043600562588 to 0.9917369901547117


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score did not improve from 0.9917369901547117


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9917369901547117 to 0.9946202531645569


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9946202531645569 to 0.9990857946554149


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9990857946554149 to 0.9996835443037975
best-f1-accuracy-score is 0.9996835443037975
----------------------------------------------------------
Train model using:
  1)hidden_size = 800
  2)n_layers = 2
  3)directions = 2


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.8982067510548524


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.8982067510548524 to 0.9426160337552743


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9426160337552743 to 0.9714838255977496


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9714838255977496 to 0.9868846694796062


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9868846694796062 to 0.9930731364275668


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9930731364275668 to 0.9933895921237693


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9933895921237693 to 0.9979957805907173


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score did not improve from 0.9979957805907173


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score did not improve from 0.9979957805907173


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score did not improve from 0.9979957805907173
best-f1-accuracy-score is 0.9979957805907173
----------------------------------------------------------
Train model using:
  1)hidden_size = 800
  2)n_layers = 3
  3)directions = 2


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.8868495077355837


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.8868495077355837 to 0.9279535864978903


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9279535864978903 to 0.9626582278481013


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9626582278481013 to 0.9755274261603376


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9755274261603376 to 0.9882559774964839


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9882559774964839 to 0.9939873417721519


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9939873417721519 to 0.9946554149085794


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9946554149085794 to 0.995182841068917


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score did not improve from 0.995182841068917


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.995182841068917 to 0.9954992967651195
best-f1-accuracy-score is 0.9954992967651195


## evaluate 


In [None]:
for i, model in enumerate(model_list):
    model_name = "model_"+str(i)
    evaluate(model, model_name, test_sequences, dev_sequences)

****************    Results for model_0    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.915244,0.969638,0.941656,3096.0
B-PER,0.688776,0.675,0.681818,200.0
I-PER,0.849558,0.611465,0.711111,157.0
B-LOC,0.802548,0.688525,0.741176,183.0
I-LOC,0.833333,0.434783,0.571429,23.0
B-ORG,0.65942,0.541667,0.594771,168.0
I-ORG,0.808511,0.327586,0.466258,116.0
accuracy,0.887142,0.887142,0.887142,0.887142
macro avg,0.793913,0.606952,0.672603,3943.0
weighted avg,0.881393,0.887142,0.879067,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.922475,0.969392,0.945352,6567.0
B-PER,0.712082,0.638249,0.673147,434.0
I-PER,0.869565,0.675676,0.760456,296.0
B-LOC,0.797342,0.699708,0.745342,343.0
I-LOC,0.935484,0.54717,0.690476,53.0
B-ORG,0.66323,0.551429,0.602184,350.0
I-ORG,0.62,0.31,0.413333,200.0
accuracy,0.893728,0.893728,0.893728,0.893728
macro avg,0.788597,0.627375,0.690041,8243.0
weighted avg,0.886028,0.893728,0.88694,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.912887,0.968454,0.93985,380877.0
OTHERS,0.867183,0.690278,0.768684,113647.0
accuracy,0.904526,0.904526,0.904526,0.904526
macro avg,0.890035,0.829366,0.854267,494524.0
weighted avg,0.902384,0.904526,0.900514,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.919785,0.967244,0.942917,1616269.0
OTHERS,0.844552,0.678431,0.752431,423980.0
accuracy,0.907226,0.907226,0.907226,0.907226
macro avg,0.882169,0.822837,0.847674,2040249.0
weighted avg,0.904151,0.907226,0.903333,2040249.0


****************    Results for model_1    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.926606,0.978682,0.951932,3096.0
B-PER,0.762431,0.69,0.724409,200.0
I-PER,0.908333,0.694268,0.787004,157.0
B-LOC,0.841772,0.726776,0.780059,183.0
I-LOC,1.0,0.347826,0.516129,23.0
B-ORG,0.672956,0.636905,0.654434,168.0
I-ORG,0.808511,0.327586,0.466258,116.0
accuracy,0.903627,0.903627,0.903627,0.903627
macro avg,0.845801,0.628863,0.697175,3943.0
weighted avg,0.89976,0.903627,0.896342,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.93046,0.969849,0.949746,6567.0
B-PER,0.764398,0.672811,0.715686,434.0
I-PER,0.871245,0.685811,0.767486,296.0
B-LOC,0.825243,0.74344,0.782209,343.0
I-LOC,0.969697,0.603774,0.744186,53.0
B-ORG,0.646377,0.637143,0.641727,350.0
I-ORG,0.697917,0.335,0.452703,200.0
accuracy,0.902705,0.902705,0.902705,0.902705
macro avg,0.815048,0.663975,0.721963,8243.0
weighted avg,0.89776,0.902705,0.897447,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.925706,0.979587,0.951884,380877.0
OTHERS,0.915007,0.736517,0.816117,113647.0
accuracy,0.923727,0.923727,0.923727,0.923727
macro avg,0.920356,0.858052,0.884001,494524.0
weighted avg,0.923247,0.923727,0.920684,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.92809,0.971723,0.949405,1616269.0
OTHERS,0.868664,0.712982,0.783161,423980.0
accuracy,0.917954,0.917954,0.917954,0.9179541
macro avg,0.898377,0.842352,0.866283,2040249.0
weighted avg,0.915741,0.917954,0.914858,2040249.0


****************    Results for model_2    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.926543,0.969638,0.947601,3096.0
B-PER,0.743169,0.68,0.710183,200.0
I-PER,0.833333,0.700637,0.761246,157.0
B-LOC,0.828221,0.737705,0.780347,183.0
I-LOC,0.684211,0.565217,0.619048,23.0
B-ORG,0.614379,0.559524,0.58567,168.0
I-ORG,0.792453,0.362069,0.497041,116.0
accuracy,0.895765,0.895765,0.895765,0.895765
macro avg,0.774616,0.653541,0.700162,3943.0
weighted avg,0.890309,0.895765,0.889783,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.935985,0.964063,0.949816,6567.0
B-PER,0.7525,0.693548,0.721823,434.0
I-PER,0.754448,0.716216,0.734835,296.0
B-LOC,0.829032,0.749271,0.787136,343.0
I-LOC,0.72093,0.584906,0.645833,53.0
B-ORG,0.634615,0.565714,0.598187,350.0
I-ORG,0.571429,0.38,0.456456,200.0
accuracy,0.898459,0.898459,0.898459,0.898459
macro avg,0.742706,0.664817,0.699155,8243.0
weighted avg,0.892331,0.898459,0.894468,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.926208,0.971174,0.948159,380877.0
OTHERS,0.884621,0.740688,0.806282,113647.0
accuracy,0.918206,0.918206,0.918206,0.918206
macro avg,0.905415,0.855931,0.87722,494524.0
weighted avg,0.916651,0.918206,0.915554,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.934438,0.965651,0.949788,1616269.0
OTHERS,0.849951,0.741721,0.792156,423980.0
accuracy,0.919117,0.919117,0.919117,0.9191167
macro avg,0.892195,0.853686,0.870972,2040249.0
weighted avg,0.916881,0.919117,0.917031,2040249.0


****************    Results for model_3    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.955433,0.948643,0.952026,3096.0
B-PER,0.797927,0.77,0.783715,200.0
I-PER,0.836601,0.815287,0.825806,157.0
B-LOC,0.62,0.846995,0.715935,183.0
I-LOC,0.5,0.565217,0.530612,23.0
B-ORG,0.690909,0.678571,0.684685,168.0
I-ORG,0.658537,0.465517,0.545455,116.0
accuracy,0.901598,0.901598,0.901598,0.901598
macro avg,0.722772,0.727176,0.719748,3943.0
weighted avg,0.904482,0.901598,0.901696,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.955497,0.944876,0.950157,6567.0
B-PER,0.758373,0.730415,0.744131,434.0
I-PER,0.730408,0.787162,0.757724,296.0
B-LOC,0.600806,0.868805,0.710369,343.0
I-LOC,0.603448,0.660377,0.630631,53.0
B-ORG,0.693215,0.671429,0.682148,350.0
I-ORG,0.722689,0.43,0.539185,200.0
accuracy,0.898823,0.898823,0.898823,0.898823
macro avg,0.723491,0.72758,0.716335,8243.0
weighted avg,0.903228,0.898823,0.899016,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.953865,0.95393,0.953897,380877.0
OTHERS,0.845566,0.845372,0.845469,113647.0
accuracy,0.928982,0.928982,0.928982,0.928982
macro avg,0.899715,0.899651,0.899683,494524.0
weighted avg,0.928977,0.928982,0.928979,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.953997,0.945675,0.949818,1616269.0
OTHERS,0.79957,0.826159,0.812647,423980.0
accuracy,0.920839,0.920839,0.920839,0.9208386
macro avg,0.876784,0.885917,0.881232,2040249.0
weighted avg,0.921906,0.920839,0.921313,2040249.0


****************    Results for model_4    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.942758,0.962855,0.952701,3096.0
B-PER,0.676923,0.66,0.668354,200.0
I-PER,0.839416,0.732484,0.782313,157.0
B-LOC,0.719388,0.770492,0.744063,183.0
I-LOC,0.666667,0.434783,0.526316,23.0
B-ORG,0.620482,0.613095,0.616766,168.0
I-ORG,0.736111,0.456897,0.56383,116.0
accuracy,0.896525,0.896525,0.896525,0.896525
macro avg,0.743106,0.661515,0.693478,3943.0
weighted avg,0.893371,0.896525,0.89357,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.940773,0.960256,0.950414,6567.0
B-PER,0.733656,0.698157,0.715466,434.0
I-PER,0.806324,0.689189,0.743169,296.0
B-LOC,0.748588,0.772595,0.760402,343.0
I-LOC,0.681818,0.566038,0.618557,53.0
B-ORG,0.62963,0.582857,0.605341,350.0
I-ORG,0.552632,0.42,0.477273,200.0
accuracy,0.897246,0.897246,0.897246,0.897246
macro avg,0.727631,0.66987,0.695803,8243.0
weighted avg,0.892749,0.897246,0.89443,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.940702,0.963841,0.952131,380877.0
OTHERS,0.86793,0.796378,0.830616,113647.0
accuracy,0.925357,0.925357,0.925357,0.925357
macro avg,0.904316,0.88011,0.891373,494524.0
weighted avg,0.923978,0.925357,0.924205,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.938825,0.958899,0.948755,1616269.0
OTHERS,0.829411,0.761805,0.794172,423980.0
accuracy,0.917941,0.917941,0.917941,0.9179409
macro avg,0.884118,0.860352,0.871464,2040249.0
weighted avg,0.916088,0.917941,0.916632,2040249.0


****************    Results for model_5    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.940902,0.977067,0.958644,3096.0
B-PER,0.787234,0.74,0.762887,200.0
I-PER,0.843284,0.719745,0.776632,157.0
B-LOC,0.866667,0.781421,0.821839,183.0
I-LOC,0.75,0.391304,0.514286,23.0
B-ORG,0.732919,0.702381,0.717325,168.0
I-ORG,0.794118,0.465517,0.586957,116.0
accuracy,0.915547,0.915547,0.915547,0.915547
macro avg,0.816446,0.682491,0.734081,3943.0
weighted avg,0.911482,0.915547,0.911309,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.947221,0.975636,0.961218,6567.0
B-PER,0.83558,0.714286,0.770186,434.0
I-PER,0.90535,0.743243,0.816327,296.0
B-LOC,0.804154,0.790087,0.797059,343.0
I-LOC,0.9,0.509434,0.650602,53.0
B-ORG,0.661932,0.665714,0.663818,350.0
I-ORG,0.636986,0.465,0.537572,200.0
accuracy,0.917263,0.917263,0.917263,0.917263
macro avg,0.813032,0.694771,0.742397,8243.0
weighted avg,0.913942,0.917263,0.914223,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.934987,0.978182,0.956097,380877.0
OTHERS,0.913483,0.772049,0.836832,113647.0
accuracy,0.93081,0.93081,0.93081,0.93081
macro avg,0.924235,0.875115,0.896464,494524.0
weighted avg,0.930045,0.93081,0.928688,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.94662,0.978207,0.962155,1616269.0
OTHERS,0.904815,0.789719,0.843358,423980.0
accuracy,0.939038,0.939038,0.939038,0.9390378
macro avg,0.925718,0.883963,0.902756,2040249.0
weighted avg,0.937933,0.939038,0.937468,2040249.0


****************    Results for model_6    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.92956,0.976098,0.952261,3096.0
B-PER,0.833333,0.7,0.76087,200.0
I-PER,0.871212,0.732484,0.795848,157.0
B-LOC,0.842767,0.73224,0.783626,183.0
I-LOC,1.0,0.434783,0.606061,23.0
B-ORG,0.652174,0.625,0.638298,168.0
I-ORG,0.790323,0.422414,0.550562,116.0
accuracy,0.90667,0.90667,0.90667,0.90667
macro avg,0.845624,0.660431,0.726789,3943.0
weighted avg,0.902824,0.90667,0.901284,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.929779,0.971829,0.950339,6567.0
B-PER,0.788462,0.66129,0.719298,434.0
I-PER,0.840467,0.72973,0.781193,296.0
B-LOC,0.791531,0.708455,0.747692,343.0
I-LOC,1.0,0.528302,0.691358,53.0
B-ORG,0.652038,0.594286,0.621824,350.0
I-ORG,0.653846,0.34,0.447368,200.0
accuracy,0.901613,0.901613,0.901613,0.901613
macro avg,0.808017,0.647699,0.708439,8243.0
weighted avg,0.895342,0.901613,0.895851,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.927005,0.976213,0.950973,380877.0
OTHERS,0.903028,0.742378,0.81486,113647.0
accuracy,0.922475,0.922475,0.922475,0.922475
macro avg,0.915016,0.859295,0.882916,494524.0
weighted avg,0.921495,0.922475,0.919693,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.928032,0.973942,0.950433,1616269.0
OTHERS,0.877578,0.712076,0.786211,423980.0
accuracy,0.919525,0.919525,0.919525,0.9195245
macro avg,0.902805,0.843009,0.868322,2040249.0
weighted avg,0.917547,0.919525,0.916307,2040249.0


****************    Results for model_7    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.933887,0.980943,0.956837,3096.0
B-PER,0.810056,0.725,0.765172,200.0
I-PER,0.866197,0.783439,0.822742,157.0
B-LOC,0.828025,0.710383,0.764706,183.0
I-LOC,0.692308,0.391304,0.5,23.0
B-ORG,0.771429,0.642857,0.701299,168.0
I-ORG,0.85,0.439655,0.579545,116.0
accuracy,0.913771,0.913771,0.913771,0.913771
macro avg,0.8217,0.667655,0.727186,3943.0
weighted avg,0.909198,0.913771,0.908207,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.935554,0.97929,0.956923,6567.0
B-PER,0.823684,0.721198,0.769042,434.0
I-PER,0.874016,0.75,0.807273,296.0
B-LOC,0.831715,0.749271,0.788344,343.0
I-LOC,0.878788,0.54717,0.674419,53.0
B-ORG,0.782946,0.577143,0.664474,350.0
I-ORG,0.725926,0.49,0.585075,200.0
accuracy,0.916171,0.916171,0.916171,0.916171
macro avg,0.83609,0.687725,0.749364,8243.0
weighted avg,0.911203,0.916171,0.911386,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.933769,0.981382,0.956984,380877.0
OTHERS,0.924745,0.766716,0.838348,113647.0
accuracy,0.93205,0.93205,0.93205,0.93205
macro avg,0.929257,0.874049,0.897666,494524.0
weighted avg,0.931695,0.93205,0.92972,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.936302,0.979376,0.957355,1616269.0
OTHERS,0.904658,0.746002,0.817705,423980.0
accuracy,0.930879,0.930879,0.930879,0.930879
macro avg,0.92048,0.862689,0.88753,2040249.0
weighted avg,0.929726,0.930879,0.928334,2040249.0


****************    Results for model_8    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.951289,0.965116,0.958153,3096.0
B-PER,0.694064,0.76,0.725537,200.0
I-PER,0.796053,0.770701,0.783172,157.0
B-LOC,0.816667,0.803279,0.809917,183.0
I-LOC,0.764706,0.565217,0.65,23.0
B-ORG,0.689441,0.660714,0.674772,168.0
I-ORG,0.726027,0.456897,0.560847,116.0
accuracy,0.909206,0.909206,0.909206,0.909206
macro avg,0.776892,0.711703,0.737485,3943.0
weighted avg,0.906941,0.909206,0.906947,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.954566,0.956601,0.955583,6567.0
B-PER,0.714286,0.771889,0.741971,434.0
I-PER,0.736842,0.804054,0.768982,296.0
B-LOC,0.749304,0.784257,0.766382,343.0
I-LOC,0.75,0.622642,0.680412,53.0
B-ORG,0.682927,0.64,0.660767,350.0
I-ORG,0.647482,0.45,0.530973,200.0
accuracy,0.906345,0.906345,0.906345,0.906345
macro avg,0.747915,0.718492,0.729296,8243.0
weighted avg,0.905256,0.906345,0.905173,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.947214,0.966102,0.956565,380877.0
OTHERS,0.878258,0.819564,0.847896,113647.0
accuracy,0.932426,0.932426,0.932426,0.932426
macro avg,0.912736,0.892833,0.902231,494524.0
weighted avg,0.931367,0.932426,0.931591,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.953832,0.956756,0.955292,1616269.0
OTHERS,0.833199,0.823463,0.828303,423980.0
accuracy,0.929057,0.929057,0.929057,0.9290567
macro avg,0.893516,0.89011,0.891797,2040249.0
weighted avg,0.928764,0.929057,0.928903,2040249.0


**Task 6:** Download the GloVe embeddings from https://nlp.stanford.edu/projects/glove/ (use the 300-dim vectors from glove.6B.zip). Then intialize the nn.Embedding module in your NERNet with these embeddings, so that you can start your training with pre-trained vectors. Repeat Task 6 and print the results for each model.

Note: make sure that vectors are aligned with the IDs in your Vocab, in other words, make sure that for example the word with ID 0 is the first vector in the GloVe matrix of vectors that you initialize nn.Embedding with. For a dicussion on how to do that, check it this link:
https://discuss.pytorch.org/t/can-we-use-pre-trained-word-embeddings-for-weight-initialization-in-nn-embedding/1222

## get glove data set

In [None]:

GLOVE_PATH = 'glove.6B.300d.txt'

!wget http://nlp.stanford.edu/data/glove.6B.zip
!unzip glove.6B.zip

--2022-07-05 13:45:46--  http://nlp.stanford.edu/data/glove.6B.zip
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/data/glove.6B.zip [following]
--2022-07-05 13:45:46--  https://nlp.stanford.edu/data/glove.6B.zip
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://downloads.cs.stanford.edu/nlp/data/glove.6B.zip [following]
--2022-07-05 13:45:46--  https://downloads.cs.stanford.edu/nlp/data/glove.6B.zip
Resolving downloads.cs.stanford.edu (downloads.cs.stanford.edu)... 171.64.64.22
Connecting to downloads.cs.stanford.edu (downloads.cs.stanford.edu)|171.64.64.22|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 862182613 (822M) [application/zip]
Saving to: ‘glove.6B.zip’


202

# get embding weights

In [None]:
def get_glove_pre_trained_embeddings_weights(input_size, embedding_size, word2id = vocab.word2id):
    # generate zeros initilized embeding mask 
    weights = np.zeros((input_size, embedding_size))

    # parse embeding
    with open(GLOVE_PATH) as glove:
        for line in glove.readlines():
            split = line.split()
            word = split[0]
            word_id = word2id.get(word)

        if word_id:
            weights[word_id] = split[1:]
    tensor = torch.from_numpy(weights).float()
    return tensor

## define Glove net

In [None]:
class GloveNERNet(nn.Module):
    
    def __init__(self, input_size, embedding_size, hidden_size, output_size, n_layers, directions):
        super(GloveNERNet, self).__init__()
        
        # add first layer - the embding layer
        self.embedding = nn.Embedding(input_size, embedding_size)

        # get trained embding weights from the data we loaded
        pre_trained_weights = get_glove_pre_trained_embeddings_weights(input_size, embedding_size)
        
        # load embding weights 
        self.embedding.weight = nn.Parameter(pre_trained_weights)

        # add LSTM layer
        self.lstm = nn.LSTM(embedding_size, hidden_size, n_layers, bidirectional=(True if directions==2 else False))
        
        # Add FC layer
        self.out = nn.Linear(hidden_size*directions, output_size)
    
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.directions = directions

    def forward(self, input_sentence):        
        # get sentence token numbers to understand output & input size
        dimension = len(input_sentence)
        
        # initial the hidden to None because none sentence inter
        hidden = None

        # 1. foward input sentence into the embeding
        embedded = self.embedding(input_sentence)

        # 2. foward embedding to LSTM
        lstm_output, _ = self.lstm(embedded.view(dimension, 1, -1), hidden) # The view function is meant to reshape the tensor https://stackoverflow.com/a/48650355/7786691

        # 3. foward to get predictions  - linear transformation to the incoming data
        output = self.out(lstm_output.view(dimension, -1)) 

        return output

## train model

In [None]:
EMBEDDING_SIZE = 300
EPOCHS = 10
HIDDEN_SIZE  = 500 
INPUT_SIZE = len(vocab.word2id) # 8955
OUTPUT_SIZE = len(vocab.tag2id) # 7

n_layers_array = np.arange(1,4)
directions_array = np.arange(1,3)

# n_layers_array = np.arange(1,2)
# directions_array = np.arange(1,2)
model_list  = []
train_res_list = [] 
for i_n_layers in n_layers_array:
    for i_directions in directions_array:
        print('----------------------------------------------------------')
        print('Train model using:\n' + \
              '  1)hidden_size = ' + str(HIDDEN_SIZE)+'\n'+ \
              '  2)n_layers = ' + str(i_n_layers) + '\n'+ \
              '  3)directions = ' + str(i_directions) , flush = True)
        model = GloveNERNet(INPUT_SIZE, EMBEDDING_SIZE, HIDDEN_SIZE, OUTPUT_SIZE, 1, 1).cuda()
        train_res = train_loop(model, EPOCHS, train_sequences)   
        model_list.append(model)
        train_res_list.append(train_res)

DIRECTION = 2
HIDDEN_SIZE = 800
for i_n_layers in n_layers_array:
        print('----------------------------------------------------------')

        print('Train model using:\n'+ \
              '  1)hidden_size = ' + str(HIDDEN_SIZE)+'\n'+ \
              '  2)n_layers = ' + str(i_n_layers) + '\n'+ \
              '  3)directions = ' + str(i_directions) , flush = True )
        model = GloveNERNet(INPUT_SIZE, EMBEDDING_SIZE, HIDDEN_SIZE, OUTPUT_SIZE, 1, 1).cuda()
        train_res = train_loop(model, EPOCHS, train_sequences)   

       
        model_list.append(model)

        

----------------------------------------------------------
Train model using:
  1)hidden_size = 500
  2)n_layers = 1
  3)directions = 1


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.8487341772151898


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.8487341772151898 to 0.9179676511954993


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9179676511954993 to 0.9536568213783404


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9536568213783404 to 0.9710970464135021


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9710970464135021 to 0.9763361462728551


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9763361462728551 to 0.9828410689170183


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9828410689170183 to 0.9861111111111112


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9861111111111112 to 0.989240506329114


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.989240506329114 to 0.9930379746835443


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9930379746835443 to 0.9935302390998594
best-f1-accuracy-score is 0.9935302390998594
----------------------------------------------------------
Train model using:
  1)hidden_size = 500
  2)n_layers = 1
  3)directions = 2


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.8450070323488045


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.8450070323488045 to 0.915154711673699


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.915154711673699 to 0.9447960618846695


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9447960618846695 to 0.9593178621659635


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9593178621659635 to 0.9739803094233473


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9739803094233473 to 0.9784458509142053


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9784458509142053 to 0.9833684950773558


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9833684950773558 to 0.9876230661040788


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9876230661040788 to 0.9920534458509142


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9920534458509142 to 0.9943389592123769
best-f1-accuracy-score is 0.9943389592123769
----------------------------------------------------------
Train model using:
  1)hidden_size = 500
  2)n_layers = 2
  3)directions = 1


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.854817158931083


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.854817158931083 to 0.9231715893108298


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9231715893108298 to 0.9577355836849508


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9577355836849508 to 0.9737341772151898


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9737341772151898 to 0.9783052039381154


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9783052039381154 to 0.9806610407876231


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9806610407876231 to 0.9856891701828411


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9856891701828411 to 0.9898734177215189


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9898734177215189 to 0.9932841068917019


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9932841068917019 to 0.9941631504922644
best-f1-accuracy-score is 0.9941631504922644
----------------------------------------------------------
Train model using:
  1)hidden_size = 500
  2)n_layers = 2
  3)directions = 2


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.8552742616033755


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.8552742616033755 to 0.9124120956399437


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9124120956399437 to 0.945675105485232


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.945675105485232 to 0.9583684950773559


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9583684950773559 to 0.9690928270042194


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9690928270042194 to 0.9723277074542898


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9723277074542898 to 0.9789029535864979


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9789029535864979 to 0.9810829817158931


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9810829817158931 to 0.9871308016877637


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9871308016877637 to 0.989662447257384
best-f1-accuracy-score is 0.989662447257384
----------------------------------------------------------
Train model using:
  1)hidden_size = 500
  2)n_layers = 3
  3)directions = 1


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.8489803094233473


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.8489803094233473 to 0.9130801687763713


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9130801687763713 to 0.9543600562587904


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9543600562587904 to 0.9686357243319269


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9686357243319269 to 0.9776722925457103


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9776722925457103 to 0.9792194092827005


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9792194092827005 to 0.9845288326300985


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9845288326300985 to 0.9909634317862166


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9909634317862166 to 0.9926511954992968


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9926511954992968 to 0.9937412095639944
best-f1-accuracy-score is 0.9937412095639944
----------------------------------------------------------
Train model using:
  1)hidden_size = 500
  2)n_layers = 3
  3)directions = 2


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.8494374120956399


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.8494374120956399 to 0.914803094233474


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.914803094233474 to 0.9508790436005626


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9508790436005626 to 0.9712376933895921


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9712376933895921 to 0.9797116736990155


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9797116736990155 to 0.9831223628691983


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9831223628691983 to 0.9880450070323488


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9880450070323488 to 0.9921589310829817


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9921589310829817 to 0.9930028129395218


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9930028129395218 to 0.9947257383966245
best-f1-accuracy-score is 0.9947257383966245
----------------------------------------------------------
Train model using:
  1)hidden_size = 800
  2)n_layers = 1
  3)directions = 2


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.8651547116736991


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.8651547116736991 to 0.9253164556962026


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9253164556962026 to 0.9527777777777777


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9527777777777777 to 0.9657524613220816


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9657524613220816 to 0.9728551336146273


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9728551336146273 to 0.9812587904360056


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9812587904360056 to 0.9867791842475386


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9867791842475386 to 0.9903305203938115


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9903305203938115 to 0.9922292545710267


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9922292545710267 to 0.9932137834036568
best-f1-accuracy-score is 0.9932137834036568
----------------------------------------------------------
Train model using:
  1)hidden_size = 800
  2)n_layers = 2
  3)directions = 2


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.8614275668073137


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.8614275668073137 to 0.924085794655415


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.924085794655415 to 0.954254571026723


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.954254571026723 to 0.9666315049226442


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9666315049226442 to 0.9776019690576653


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9776019690576653 to 0.979817158931083


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.979817158931083 to 0.984212376933896


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.984212376933896 to 0.9885372714486639


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9885372714486639 to 0.9892053445850915


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9892053445850915 to 0.9923347398030943
best-f1-accuracy-score is 0.9923347398030943
----------------------------------------------------------
Train model using:
  1)hidden_size = 800
  2)n_layers = 3
  3)directions = 2


Ephoc #1:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0 to 0.865084388185654


Ephoc #2:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.865084388185654 to 0.9253867791842475


Ephoc #3:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9253867791842475 to 0.9620956399437413


Ephoc #4:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9620956399437413 to 0.9744725738396625


Ephoc #5:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9744725738396625 to 0.9788677918424754


Ephoc #6:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9788677918424754 to 0.9823488045007033


Ephoc #7:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9823488045007033 to 0.985196905766526


Ephoc #8:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.985196905766526 to 0.9896976090014065


Ephoc #9:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score improve from 0.9896976090014065 to 0.9911040787623067


Ephoc #10:   0%|          | 0/1750 [00:00<?, ?it/s]

f1-accuracy-score did not improve from 0.9911040787623067
best-f1-accuracy-score is 0.9911040787623067


## evaluate 


In [None]:
for i, model in enumerate(model_list):
    model_name = "model_"+str(i)
    evaluate(model, model_name, test_sequences, dev_sequences)

****************    Results for model_0    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.975953,0.917636,0.945896,3096.0
B-PER,0.909091,0.6,0.722892,200.0
I-PER,0.989474,0.598726,0.746032,157.0
B-LOC,0.740741,0.765027,0.752688,183.0
I-LOC,0.410256,0.695652,0.516129,23.0
B-ORG,0.378613,0.779762,0.509728,168.0
I-ORG,0.350649,0.698276,0.466859,116.0
accuracy,0.868121,0.868121,0.868121,0.868121
macro avg,0.679254,0.722154,0.665746,3943.0
weighted avg,0.915037,0.868121,0.882476,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.984904,0.904066,0.942755,6567.0
B-PER,0.946996,0.617512,0.747559,434.0
I-PER,0.907317,0.628378,0.742515,296.0
B-LOC,0.75,0.778426,0.763948,343.0
I-LOC,0.467532,0.679245,0.553846,53.0
B-ORG,0.374834,0.808571,0.512217,350.0
I-ORG,0.300557,0.81,0.43843,200.0
accuracy,0.866068,0.866068,0.866068,0.866068
macro avg,0.67602,0.7466,0.67161,8243.0
weighted avg,0.924513,0.866068,0.884829,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.974436,0.915702,0.944157,380877.0
OTHERS,0.764963,0.919488,0.835138,113647.0
accuracy,0.916572,0.916572,0.916572,0.916572
macro avg,0.869699,0.917595,0.889647,494524.0
weighted avg,0.926297,0.916572,0.919103,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.984552,0.90418,0.942656,1616269.0
OTHERS,0.721415,0.945917,0.818552,423980.0
accuracy,0.912853,0.912853,0.912853,0.9128533
macro avg,0.852984,0.925049,0.880604,2040249.0
weighted avg,0.92987,0.912853,0.916866,2040249.0


****************    Results for model_1    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.983728,0.898256,0.939051,3096.0
B-PER,0.848485,0.7,0.767123,200.0
I-PER,0.880952,0.707006,0.784452,157.0
B-LOC,0.922581,0.781421,0.846154,183.0
I-LOC,0.705882,0.521739,0.6,23.0
B-ORG,0.337469,0.809524,0.476357,168.0
I-ORG,0.36,0.775862,0.491803,116.0
accuracy,0.865585,0.865585,0.865585,0.865585
macro avg,0.719871,0.741973,0.700706,3943.0
weighted avg,0.922433,0.865585,0.885014,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.988961,0.886706,0.935046,6567.0
B-PER,0.864407,0.705069,0.77665,434.0
I-PER,0.857724,0.712838,0.778598,296.0
B-LOC,0.915129,0.723032,0.807818,343.0
I-LOC,0.695652,0.603774,0.646465,53.0
B-ORG,0.346651,0.842857,0.491257,350.0
I-ORG,0.282794,0.83,0.421855,200.0
accuracy,0.859032,0.859032,0.859032,0.859032
macro avg,0.707331,0.757754,0.693955,8243.0
weighted avg,0.928326,0.859032,0.882644,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.984381,0.89652,0.938399,380877.0
OTHERS,0.73305,0.952326,0.828424,113647.0
accuracy,0.909345,0.909345,0.909345,0.909345
macro avg,0.858716,0.924423,0.883411,494524.0
weighted avg,0.926622,0.909345,0.913125,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.988923,0.88714,0.93527,1616269.0
OTHERS,0.691,0.962118,0.804327,423980.0
accuracy,0.902721,0.902721,0.902721,0.9027212
macro avg,0.839962,0.924629,0.869799,2040249.0
weighted avg,0.927012,0.902721,0.908059,2040249.0


****************    Results for model_2    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.98279,0.885336,0.931521,3096.0
B-PER,0.909091,0.6,0.722892,200.0
I-PER,0.968421,0.585987,0.730159,157.0
B-LOC,0.809524,0.743169,0.774929,183.0
I-LOC,0.55,0.478261,0.511628,23.0
B-ORG,0.300847,0.845238,0.44375,168.0
I-ORG,0.329588,0.758621,0.45953,116.0
accuracy,0.844535,0.844535,0.844535,0.844535
macro avg,0.692894,0.699516,0.653487,3943.0
weighted avg,0.919641,0.844535,0.868536,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.989757,0.868129,0.924961,6567.0
B-PER,0.944637,0.629032,0.755187,434.0
I-PER,0.911765,0.628378,0.744,296.0
B-LOC,0.822981,0.772595,0.796992,343.0
I-LOC,0.64,0.603774,0.621359,53.0
B-ORG,0.295229,0.848571,0.438053,350.0
I-ORG,0.272876,0.835,0.41133,200.0
accuracy,0.839621,0.839621,0.839621,0.839621
macro avg,0.696749,0.740783,0.670269,8243.0
weighted avg,0.928509,0.839621,0.869111,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.982273,0.882088,0.929489,380877.0
OTHERS,0.705497,0.946651,0.808474,113647.0
accuracy,0.896925,0.896925,0.896925,0.896925
macro avg,0.843885,0.914369,0.868981,494524.0
weighted avg,0.918667,0.896925,0.901678,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.990067,0.868535,0.925328,1616269.0
OTHERS,0.658597,0.966784,0.783473,423980.0
accuracy,0.888952,0.888952,0.888952,0.8889523
macro avg,0.824332,0.91766,0.854401,2040249.0
weighted avg,0.921185,0.888952,0.895849,2040249.0


****************    Results for model_3    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.987739,0.88469,0.933379,3096.0
B-PER,0.822857,0.72,0.768,200.0
I-PER,0.915094,0.617834,0.737643,157.0
B-LOC,0.6625,0.868852,0.751773,183.0
I-LOC,0.441176,0.652174,0.526316,23.0
B-ORG,0.355191,0.77381,0.486891,168.0
I-ORG,0.349398,0.75,0.476712,116.0
accuracy,0.854933,0.854933,0.854933,0.854933
macro avg,0.647708,0.75248,0.668673,3943.0
weighted avg,0.91247,0.854933,0.873935,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.991331,0.870717,0.927118,6567.0
B-PER,0.816062,0.725806,0.768293,434.0
I-PER,0.829694,0.641892,0.72381,296.0
B-LOC,0.670762,0.795918,0.728,343.0
I-LOC,0.506329,0.754717,0.606061,53.0
B-ORG,0.337408,0.788571,0.472603,350.0
I-ORG,0.284173,0.79,0.417989,200.0
accuracy,0.845566,0.845566,0.845566,0.845566
macro avg,0.63368,0.766803,0.66341,8243.0
weighted avg,0.914918,0.845566,0.869453,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.986338,0.88103,0.930715,380877.0
OTHERS,0.706355,0.959101,0.81355,113647.0
accuracy,0.898972,0.898972,0.898972,0.898972
macro avg,0.846346,0.920066,0.872132,494524.0
weighted avg,0.921995,0.898972,0.903789,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.991681,0.870542,0.927171,1616269.0
OTHERS,0.663287,0.972161,0.788557,423980.0
accuracy,0.891659,0.891659,0.891659,0.8916593
macro avg,0.827484,0.921352,0.857864,2040249.0
weighted avg,0.923438,0.891659,0.898366,2040249.0


****************    Results for model_4    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.987415,0.886951,0.93449,3096.0
B-PER,0.943548,0.585,0.722222,200.0
I-PER,0.94898,0.592357,0.729412,157.0
B-LOC,0.753846,0.803279,0.777778,183.0
I-LOC,0.413793,0.521739,0.461538,23.0
B-ORG,0.311111,0.833333,0.453074,168.0
I-ORG,0.334586,0.767241,0.465969,116.0
accuracy,0.848085,0.848085,0.848085,0.848085
macro avg,0.670468,0.712843,0.649212,3943.0
weighted avg,0.921452,0.848085,0.871231,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.991331,0.870717,0.927118,6567.0
B-PER,0.970588,0.608295,0.747875,434.0
I-PER,0.899038,0.631757,0.742063,296.0
B-LOC,0.748663,0.816327,0.781032,343.0
I-LOC,0.555556,0.660377,0.603448,53.0
B-ORG,0.298958,0.82,0.438168,350.0
I-ORG,0.284281,0.85,0.426065,200.0
accuracy,0.842048,0.842048,0.842048,0.842048
macro avg,0.678345,0.751068,0.666539,8243.0
weighted avg,0.927472,0.842048,0.869958,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.98614,0.884125,0.93235,380877.0
OTHERS,0.711633,0.958353,0.816768,113647.0
accuracy,0.901184,0.901184,0.901184,0.901184
macro avg,0.848886,0.921239,0.874559,494524.0
weighted avg,0.923055,0.901184,0.905788,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.991008,0.872608,0.928047,1616269.0
OTHERS,0.666333,0.969817,0.789929,423980.0
accuracy,0.892809,0.892809,0.892809,0.8928087
macro avg,0.828671,0.921212,0.858988,2040249.0
weighted avg,0.923538,0.892809,0.899345,2040249.0


****************    Results for model_5    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.983711,0.897287,0.938514,3096.0
B-PER,0.944,0.59,0.726154,200.0
I-PER,0.968421,0.585987,0.730159,157.0
B-LOC,0.701422,0.808743,0.751269,183.0
I-LOC,0.342857,0.521739,0.413793,23.0
B-ORG,0.343434,0.809524,0.48227,168.0
I-ORG,0.338521,0.75,0.466488,116.0
accuracy,0.854933,0.854933,0.854933,0.854933
macro avg,0.660338,0.70904,0.644092,3943.0
weighted avg,0.917987,0.854933,0.874369,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.990911,0.879854,0.932086,6567.0
B-PER,0.95122,0.629032,0.757282,434.0
I-PER,0.919598,0.618243,0.739394,296.0
B-LOC,0.656398,0.80758,0.724183,343.0
I-LOC,0.455556,0.773585,0.573427,53.0
B-ORG,0.33969,0.814286,0.479394,350.0
I-ORG,0.269565,0.775,0.4,200.0
accuracy,0.848235,0.848235,0.848235,0.848235
macro avg,0.654705,0.756797,0.657966,8243.0
weighted avg,0.923745,0.848235,0.872874,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.982715,0.895914,0.937309,380877.0
OTHERS,0.730842,0.947187,0.825068,113647.0
accuracy,0.907697,0.907697,0.907697,0.907697
macro avg,0.856778,0.921551,0.881189,494524.0
weighted avg,0.924832,0.907697,0.911515,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.991593,0.878925,0.931866,1616269.0
OTHERS,0.677943,0.971593,0.79863,423980.0
accuracy,0.898182,0.898182,0.898182,0.898182
macro avg,0.834768,0.925259,0.865248,2040249.0
weighted avg,0.926414,0.898182,0.904178,2040249.0


****************    Results for model_6    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.977615,0.902778,0.938707,3096.0
B-PER,0.928571,0.585,0.717791,200.0
I-PER,0.978022,0.566879,0.717742,157.0
B-LOC,0.7875,0.688525,0.734694,183.0
I-LOC,0.56,0.608696,0.583333,23.0
B-ORG,0.326139,0.809524,0.464957,168.0
I-ORG,0.332075,0.758621,0.461942,116.0
accuracy,0.853411,0.853411,0.853411,0.853411
macro avg,0.69856,0.70286,0.659881,3943.0
weighted avg,0.917135,0.853411,0.872951,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.983629,0.887468,0.933077,6567.0
B-PER,0.934708,0.626728,0.750345,434.0
I-PER,0.909091,0.608108,0.728745,296.0
B-LOC,0.824627,0.644315,0.723404,343.0
I-LOC,0.561404,0.603774,0.581818,53.0
B-ORG,0.329596,0.84,0.47343,350.0
I-ORG,0.263072,0.805,0.396552,200.0
accuracy,0.84775,0.84775,0.84775,0.84775
macro avg,0.686589,0.716485,0.655339,8243.0
weighted avg,0.923792,0.84775,0.872601,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.976372,0.905602,0.939656,380877.0
OTHERS,0.745466,0.926553,0.826203,113647.0
accuracy,0.910417,0.910417,0.910417,0.910417
macro avg,0.860919,0.916078,0.88293,494524.0
weighted avg,0.923307,0.910417,0.913584,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.983332,0.88502,0.931589,1616269.0
OTHERS,0.682637,0.942811,0.791902,423980.0
accuracy,0.897029,0.897029,0.897029,0.8970292
macro avg,0.832984,0.913915,0.861745,2040249.0
weighted avg,0.920845,0.897029,0.902561,2040249.0


****************    Results for model_7    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.976017,0.906977,0.940231,3096.0
B-PER,0.88806,0.595,0.712575,200.0
I-PER,0.968421,0.585987,0.730159,157.0
B-LOC,0.871622,0.704918,0.779456,183.0
I-LOC,0.448276,0.565217,0.5,23.0
B-ORG,0.345269,0.803571,0.483005,168.0
I-ORG,0.334572,0.775862,0.467532,116.0
accuracy,0.858737,0.858737,0.858737,0.858737
macro avg,0.690319,0.705362,0.658994,3943.0
weighted avg,0.917584,0.858737,0.876902,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.9855,0.890056,0.93535,6567.0
B-PER,0.934256,0.62212,0.746888,434.0
I-PER,0.88835,0.618243,0.729084,296.0
B-LOC,0.838384,0.725948,0.778125,343.0
I-LOC,0.527027,0.735849,0.614173,53.0
B-ORG,0.33412,0.808571,0.472849,350.0
I-ORG,0.27379,0.82,0.410513,200.0
accuracy,0.853209,0.853209,0.853209,0.853209
macro avg,0.683061,0.745827,0.669569,8243.0
weighted avg,0.925318,0.853209,0.877041,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.974978,0.907626,0.940097,380877.0
OTHERS,0.748617,0.921934,0.826285,113647.0
accuracy,0.910914,0.910914,0.910914,0.910914
macro avg,0.861798,0.91478,0.883191,494524.0
weighted avg,0.922958,0.910914,0.913942,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.986381,0.890567,0.936028,1616269.0
OTHERS,0.69556,0.953125,0.804224,423980.0
accuracy,0.903567,0.903567,0.903567,0.9035672
macro avg,0.840971,0.921846,0.870126,2040249.0
weighted avg,0.925946,0.903567,0.908638,2040249.0


****************    Results for model_8    ****************
Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.990846,0.874031,0.92878,3096.0
B-PER,0.965517,0.56,0.708861,200.0
I-PER,0.978022,0.566879,0.717742,157.0
B-LOC,0.628205,0.803279,0.705036,183.0
I-LOC,0.3125,0.652174,0.422535,23.0
B-ORG,0.318396,0.803571,0.456081,168.0
I-ORG,0.307692,0.793103,0.443373,116.0
accuracy,0.835912,0.835912,0.835912,0.835912
macro avg,0.643026,0.721862,0.626058,3943.0
weighted avg,0.919514,0.835912,0.861464,3943.0


Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.992422,0.857469,0.920023,6567.0
B-PER,0.979839,0.559908,0.71261,434.0
I-PER,0.906736,0.591216,0.715746,296.0
B-LOC,0.567347,0.810496,0.667467,343.0
I-LOC,0.448276,0.735849,0.557143,53.0
B-ORG,0.323496,0.814286,0.463038,350.0
I-ORG,0.255224,0.855,0.393103,200.0
accuracy,0.827611,0.827611,0.827611,0.827611
macro avg,0.639048,0.746318,0.632733,8243.0
weighted avg,0.921206,0.827611,0.856736,8243.0


Binary Test Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.989044,0.873397,0.92763,380877.0
OTHERS,0.695161,0.967575,0.809053,113647.0
accuracy,0.89504,0.89504,0.89504,0.89504
macro avg,0.842103,0.920486,0.868341,494524.0
weighted avg,0.921506,0.89504,0.90038,494524.0


Binary Dev Results:


Unnamed: 0,precision,recall,f1-score,support
O,0.992756,0.858601,0.920818,1616269.0
OTHERS,0.644238,0.976117,0.77619,423980.0
accuracy,0.883022,0.883022,0.883022,0.8830216
macro avg,0.818497,0.917359,0.848504,2040249.0
weighted avg,0.920331,0.883022,0.890763,2040249.0


**Good luck!**