We are trying to build a POC to prove if retraining SpaCy's pretrained model on private data improves the performance. This logically seems so. But need to see by how much does it improve.

One hurdle in this process is that the annotation labels that SpaCy's NER model is trained is different than that of our dataset. So, we would need to find the equivalence between the labels to make this happen. You can find the labels that SpaCy parses on its [documentation page](https://spacy.io/api/annotation#named-entities). Whereas  [our dataset](https://www.kaggle.com/alaakhaled/conll003-englishversion) has these labels: PER, ORG, LOC, MISC

The following are the dependencies of this notebook. Please make sure they are installed before running this notebook:
* SpaCy
* Sklearn

We need to compare the performance on the following scenarios:
* Performance on spaCy's vanilla pretrained model 'en_core_web_md'
* Performance after retraining vanilla model ('en_core_web_md') on training dataset.
* Performance on a model obtained by training from scratch on a training dataset.
* Performance on spaCy's vanilla pretrained model 'en_core_web_lg' which is the <b>largest</b> model
* Performance after retraining vanilla model ('en_core_web_lg') on training dataset.

In [4]:
import spacy
spacy.prefer_gpu()

False

In [5]:
import json
from pprint import pprint
import spacy 
import random
import copy 
from sklearn.metrics import precision_recall_fscore_support
from spacy.gold import GoldParse
from sklearn.metrics import accuracy_score

In [6]:
import sys
in_ipythonkernel = 'ipykernel' in sys.modules
in_collab = 'google.colab' in sys.modules
if in_ipythonkernel == True:
    in_jupyter = in_collab == False
print('Google Collab :       ', in_collab)
print('In Jupyter Notebook : ', in_jupyter)


Google Collab :        False
In Jupyter Notebook :  True


**Convert dataset files in NER format to JSON**

In [7]:
if in_jupyter == True:
    ! python -m spacy convert -c ner valid.txt > valid.json
    ! python -m spacy convert -c ner test.txt > test.json
    ! python -m spacy convert -c ner train.txt > train.json
elif in_collab == True:
    ! mkdir dir
    ! python -m spacy convert -c ner valid.txt dir
    ! python -m spacy convert -c ner test.txt dir
    ! python -m spacy convert -c ner train.txt dir
    ! mv dir/* . 
    ! rmdir dir

# **Utility functions**

In [8]:
if in_collab == True:
    from IPython.display import display as print

# Define a function to convert the ConLL ner data to a format that spaCy understands
def convert_conll_ner_to_spacy(json_obj):
    if in_collab == True:
        return convert_conll_ner_to_spacy_google_collab_(json_obj)
    else:
        assert in_jupyter == True
        return convert_conll_ner_to_spacy_jupyter_nbook_(json_obj)
        
def process_sentence_token_(sentence_tokens):
    end = None
    byte_count = 0
    entities = []
    set1 = set()
    for i,token in enumerate(sentence_tokens):
        #print('\n\t' + str(token))
        byte_count += len(token['orth'])
        ner_tag_scheme_list = token['ner'].split('-')
        biluo_scheme = ner_tag_scheme_list[0]
        ner_tag = None

        if biluo_scheme == 'O':
            #print(f'\tbc:{byte_count}')
            pass
        else: 
            set1.add(token['ner'].split('-')[1])
            ner_tag = ner_tag_scheme_list[1]
            if biluo_scheme == 'B':
                start = byte_count - len(token['orth'])
            elif biluo_scheme == 'I':
                pass
            elif biluo_scheme == 'L':
                end = byte_count
            elif biluo_scheme == 'U':
                start = byte_count - len(token['orth'])
                end = byte_count
        byte_count += 1      # For a single space between tokens
        if end != None:
            #print(f'\ttoken:{full[start:end]} -- start:{start} end:{end} bc:{byte_count} -- tag:{ner_tag}')
            entities.append((start, end, ner_tag))
            end = None
    return entities
    

def convert_conll_ner_to_spacy_google_collab_(json_obj):
    training_data = []
    sentences = json_obj[0]['paragraphs'][0]['sentences']
    for j,sentence in enumerate(sentences):
        byte_count = 0
        sentence_tokens = sentence['tokens']
        full = " ".join([token['orth'] for token in sentence_tokens])
        entities = process_sentence_token_(sentence_tokens)
        training_data.append((full, {"entities" : entities}))
    #print(set1)
    return training_data

def convert_conll_ner_to_spacy_jupyter_nbook_(json_obj):
    training_data = []
    for j,document in enumerate(json_obj):
        sentence_tokens = document['paragraphs'][0]['sentences'][0]['tokens']
        #print(sentence_tokens)
        full = " ".join([token['orth'] for token in sentence_tokens])
        entities = process_sentence_token_(sentence_tokens)
        training_data.append((full, {"entities" : entities}))
    #print(set1)
    return training_data


# Take convert human annotated examples in spacy format and build a dictionary 
# that maps annotation labels to annotated text. This wil help up in peering 
# into the annotations to find what the labels actually mean.
def spacy_get_annotations_by_labels(examples_in_spacy_fmt, labels='all'):
    if labels != 'all':
        raise('Not implemented for specific label')
    label_to_text_map =  {} 
    for text,annotations in examples_in_spacy_fmt:
        entities = annotations['entities']
        #print(text)
        for (start, end, label) in entities:
            #print('\t',start, end, label, '\''+text[start:end]+'\'')
            if label not in label_to_text_map:
                label_to_text_map[ label ] = set()
            else:
                label_to_text_map[ label ].add( text[start:end] )
    return label_to_text_map

# To display the this annotations label to text dictionary
def display_labels2text_dict(dictionary, num_samples_per_entity):
    temp_dict = {}
    for key, set_ in dictionary.items():
        temp_dict[key] = []
        set_list = list(set_)
        print(f'set_len({key}) = {len(set_)}')
        if len(set_list) < num_samples_per_entity:
            [ temp_dict[key].append(e) for e in set_list ]
        else: 
            for i in range(num_samples_per_entity):
                temp_dict[key].append( set_list[ int(random.random() * len(set_list)) ] )
    pprint(temp_dict, width=200)

def copy_dict(src_dict, dest_dict):
    for label,set_ in src_dict.items():
        if label not in dest_dict:
            dest_dict[label] = set()
        else:
            [ dest_dict[label].add(i) for i in src_dict[label] ]
    return
    
# Help merging the dictionaries for training and test examples
def merge_labels_to_text_dict(dict1, dict2):
    merged_dict = {}
    copy_dict(dict1, merged_dict)
    copy_dict(dict2, merged_dict)
    return merged_dict

import random

# Dictionary that maps entities in the pretrained model (en_core_web_xx) to the unique texts of the test set
# they resolve to. So that we know what all texts match a particular entity label. This will help us 
# in understanding which label maps to the labels of the dataset that is used for pre-training.
def spacy_ner_predictions_to_dict(examples_in_spacy_fmt, model):
    pred_ent_to_text_map = {} 
    for text,_ in examples_in_spacy_fmt:
        pred_doc = model(text)
        for ent in pred_doc.ents: 
            if ent.label_ not in pred_ent_to_text_map:
                pred_ent_to_text_map[ ent.label_ ] = set()
            else:
                pred_ent_to_text_map[ ent.label_ ].add( ent.text )
    return pred_ent_to_text_map

def print_annotaions_and_predictions(spacy_examples):
    for text,annotation in spacy_examples:
        pred_doc = pretrained_nlp(text)
        ypred = [ (ent.label_,ent.text) for ent in pred_doc.ents ]
        
        annot_list = [( ent[2], text[ent[0]:ent[1]] ) for ent in annotation['entities']]
        
        print('Human annotated: ', annot_list)
        print('Predictions    : ', ypred)
        print('\n')

def map_pred_tag_to_domain(pred_bilou_tag, equivalence_map):
    if pred_bilou_tag[0] == 'O':
        return 'O'
    bilou_part = pred_bilou_tag.split('-')[0]
    label_part = pred_bilou_tag.split('-')[1]
        
    if label_part not in equivalence_map.keys():
        return 'O'
    return bilou_part + '-' + equivalence_map[label_part]

def convert_doc_to_bilou_tags(doc):
    list_ = [] 
    for i in range(len(doc)):
        # Process BILOU tag
        if doc[i].ent_iob_ == 'O':
            bilou_tag = 'O'
        else:
            if doc[i].ent_iob_ == 'B':
                bilou_tag = 'U' if (i+1) < len(doc) and doc[i+1].ent_iob_ != 'I' else 'B'
            elif doc[i].ent_iob_ == 'I':
                bilou_tag = 'I' if (i+1) < len(doc) and doc[i+1].ent_iob_ == 'I' else 'L'
            else:
                assert "This is unexpected"
        bilou_tag = 'O' if doc[i].ent_type_ == '' else bilou_tag + '-' + doc[i].ent_type_
        
        list_.append( (bilou_tag, doc[i].text) )    
    #print('--->> ',list_)
    return list_

def perf_measure(y_actual, y_hat, label):
    TP = 0
    FP = 0
    TN = 0
    FN = 0
    for i in range(len(y_hat)):
        if y_actual[i]==y_hat[i]==label:
            TP += 1
        if y_hat[i]==label and y_actual[i]!=label:
            FP += 1
        if y_actual[i]!=label and y_hat[i]!=label:
            TN += 1
        if y_hat[i]!=label and y_actual[i]==label:
            FN += 1
    return(TP, FP, TN, FN)


def compute_scores(spacy_examples, model, label_map):
    perf_stats_per_tag = { }
    for text,annotation in spacy_examples:
        doc = model.make_doc(text)
        gold = GoldParse(doc, entities=annotation['entities'])
        gold_tag_list = [i for i in zip(gold.ner,gold.words)]
        #print('\nGold       : ', gold_tag_list)
        
        ner_tag_predict_doc = model(text)
        ner_tag_predict_list = convert_doc_to_bilou_tags(ner_tag_predict_doc)
        ner_tag_predict_list = list( map(lambda e: (map_pred_tag_to_domain(e[0],PRED_LABELS_EQUIV_MAP), e[1]), 
                                         ner_tag_predict_list) 
                                   )
        #print(  'Predicted  : ', ner_tag_predict_list)
        
        #for i in range(len(ner_tag_predict_list)):
        #    if ner_tag_predict_list[i][0] != gold_tag_list[i][0]:
        #        print('\t',ner_tag_predict_list[i], gold_tag_list[i])
        #        continue
        #    if ner_tag_predict_list[i][1] != gold_tag_list[i][1]:
        #        print('\t',ner_tag_predict_list[i], gold_tag_list[i])
        #        continue
        
        # Compute unique labels and populate y_true and y_pred
        unique_labels = set()
        y_true, y_pred = [],[]
        for t in gold_tag_list:
            if t[0] != 'O':
                unique_labels.add( t[0] )
            y_true.append( t[0] )
        for t in ner_tag_predict_list:
            if t[0] != 'O':
                unique_labels.add( t[0] )
            y_pred.append( t[0] )
            
        #print('\tUnique Labels :', unique_labels)
        #print('\ty_true        :', y_true)
        #print('\ty_pred        :', y_pred)
        for label in unique_labels:
            (TP, FP, TN, FN) = perf_measure(y_true, y_pred, label)
            CNT = len(y_true)
            #print(label,' ',f'(TP:{TP}, FP:{FP}, TN:{TN}, FN:{FN}, CNT:{CNT})')
            
            label_part = label.split('-')[1]
            if label_part not in perf_stats_per_tag:
                perf_stats_per_tag[label_part] = {'TP':0, 'FP':0, 'TN':0, 'FN':0, 'CNT':0}
                
            perf_stats_per_tag[label_part]['TP'] += TP
            perf_stats_per_tag[label_part]['FP'] += FP
            perf_stats_per_tag[label_part]['TN'] += TN
            perf_stats_per_tag[label_part]['FN'] += FN
            perf_stats_per_tag[label_part]['CNT'] += CNT
    return perf_stats_per_tag

def display_perf_stats_per_tag( stats_per_tag ):
    # Now compute the scores
    pprint(stats_per_tag)
    for tag,st in stats_per_tag.items():
        print(f'For label: "{tag}"')
        accuracy = (st['TP'] + st['TN']) / st['CNT']
        print("\tAccuracy : "  + str(accuracy * 100) + "%")
        
        precision = 0
        if (st['TP'] + st['FP']) != 0:
            precision = st['TP'] / (st['TP'] + st['FP'])
        print("\tPrecision : " + str(precision))
        
        recall = 0
        if (st['TP'] + st['FN']) != 0:
            recall = st['TP'] / (st['TP'] + st['FN'])
        print("\tRecall : "    + str(recall))
        
        fscore = 0
        if (precision + recall) != 0:
            fscore = (2 * precision * recall) / (precision + recall)
        print("\tF-score : "   + str(fscore))
        

def conv_dataset_to_match_domain(spacy_examples, dataset_to_model_tag_map):
    spacy_examples = copy.deepcopy(spacy_examples)
    for text,annotations in spacy_examples:
        entities = annotations['entities']
        for i,ent in enumerate(entities):
            if ent[2][0] == 'O':
                continue
            entities[i] = (ent[0],ent[1],dataset_to_model_tag_map[ent[2]])                                        
    return spacy_examples

# The 'model' parameter could either be a pretrained model. Default behavior is to 
# training from scratch. 
def train_spacy_model(train_examples, model=None):
    nlp = model  # create blank Language class
    if nlp == None:
        nlp = spacy.blank('en')
    
    # create the built-in pipeline components and add them to the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if model == None:
        if 'ner' not in nlp.pipe_names:
            ner = nlp.create_pipe('ner')
            nlp.add_pipe(ner, last=True)
        # add labels
        for _, annotations in train_examples:
             for ent in annotations.get('entities'):
                ner.add_label(ent[2])

    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        optimizer = nlp.begin_training()
        for itn in range(10):
            print("Starting iteration " + str(itn))
            random.shuffle(train_examples)
            losses = {}
            for text, annotations in train_examples:
                try:
                    nlp.update(
                        [text],         # batch of texts
                        [annotations],  # batch of annotations
                        drop=0.2,       # dropout - make it harder to memorise data
                        sgd=optimizer,  # callable to update weights
                        losses=losses)
                except:
                    continue
                    #print('Hello ' + str(sys.exc_info()))
            print(losses)
    return nlp

# Serialize model
def save_model_to_file(model, file_name):
    model_bytes = model.to_bytes()
    with open(file_name, 'wb') as f:
        f.write(model_bytes)
        f.flush()

# Deserialize
def load_model_from_file(file_name, nlp_load_into=None):
    with open(file_name, 'rb') as f:
        read_bytes = f.read()
    print(f'#Bytes-read: {len(read_bytes)}')
    #if nlp_load_into == None:
    #    nlp_load_into = spacy.load('en_core_web_md')
    #    #nlp_load_into.remove_pipe('ner')
    #    #for pipe_name in ['tagger', 'parser', 'ner']:
    #    #    if pipe_name not in nlp_load_into.pipe_names:
    #    #        pipe = nlp_load_into.create_pipe(pipe_name)
    #    #        nlp_load_into.add_pipe(pipe)
    nlp_load_into.from_bytes(read_bytes)
    return nlp_load_into


# **Working with datasets**

**Loading datasets**

In [9]:
with open('./train.json', 'r') as f:
    read_bytes = f.read()
print(len(read_bytes))
train_obj = json.loads(read_bytes)

26772142


In [10]:
with open('./test.json', 'r') as f:
    read_bytes = f.read()
print(len(read_bytes))
test_obj = json.loads(read_bytes)

6145341


**Convert dataset json docs into a format that spaCy understands**

In [11]:
training_examples = convert_conll_ner_to_spacy(train_obj)
test_examples = convert_conll_ner_to_spacy(test_obj)

In [12]:
for t in (training_examples + test_examples)[0:1000]:
    print(t)

('-DOCSTART-', {'entities': []})
('EU rejects German call to boycott British lamb .', {'entities': [(0, 2, 'ORG'), (11, 17, 'MISC'), (34, 41, 'MISC')]})
('Peter Blackburn', {'entities': [(0, 15, 'PER')]})
('BRUSSELS 1996-08-22', {'entities': [(0, 8, 'LOC')]})
('The European Commission said on Thursday it disagreed with German advice to consumers to shun British lamb until scientists determine whether mad cow disease can be transmitted to sheep .', {'entities': [(4, 23, 'ORG'), (59, 65, 'MISC'), (94, 101, 'MISC')]})
("Germany 's representative to the European Union 's veterinary committee Werner Zwingmann said on Wednesday consumers should buy sheepmeat from countries other than Britain until the scientific advice was clearer .", {'entities': [(0, 7, 'LOC'), (33, 47, 'ORG'), (72, 88, 'PER'), (164, 171, 'LOC')]})
('" We do n\'t support any such recommendation because we do n\'t see any grounds for it , " the Commission \'s chief spokesman Nikolaus van der Pas told a news briefing .', {'e

# **Download and Load pretrained model**

In [13]:
import spacy.cli
spacy.cli.download("en_core_web_md")
pretrained_nlp = spacy.load('en_core_web_md')

[38;5;2m✔ Download and installation successful[0m
You can now load the model via spacy.load('en_core_web_md')


# **Exploratory analysis of the dataset to find the texts corresponding different labels**

The NER entities of SpaCy's model and the manually annotated labels of the dataset dont match. So we need to map those labels to the closest entities that SpaCy's NER model deals with. 

This analysis will help us figure out the intention of the human annotator such that we can then be able to map the labels from the dataset's domain to the closest labels that the model was trained on. 

Labels is the dataset:
* 'LOC'
* 'PER'
* 'ORG'
* 'MISC'

Labels that the model was trained with:
* 'NORP'       
* 'WORK_OF_ART'
* 'FAC'        
* 'PRODUCT'    
* 'EVENT'      
* 'GPE'        
* 'LOC'        
* 'ORG'        
* 'PERSON'     



**Printing annotated texts corresponding labels in the dataset's domain**

In [14]:
labels2text_dict_train    = spacy_get_annotations_by_labels(training_examples)
labels2text_dict_test     = spacy_get_annotations_by_labels(test_examples)
merged_dict               = merge_labels_to_text_dict(labels2text_dict_train, labels2text_dict_test)

label_ = 'MISC'  # Query this labels
display_labels2text_dict(merged_dict, 50)

set_len(ORG) = 795
set_len(MISC) = 301
set_len(PER) = 1099
set_len(LOC) = 505
{'LOC': ['Portugal',
         'PRAGUE',
         'NZ',
         'Indian Ocean',
         'Greece',
         'LONDON',
         'AMARILLO',
         'DILI',
         'Niger',
         'Belgrade',
         'Colorado',
         'Mauritius',
         'Ariz',
         'Tokyo',
         'Latin America',
         'France',
         'DETROIT',
         'MONTREAL',
         'Caribbean',
         'Republic of Padania',
         'Belgrade',
         'Switzerland',
         'BEIRUT',
         'ABIDJAN',
         'Quequen',
         'Rangoon',
         'Montana',
         'Ottawa',
         'Colorado',
         'RED SEA',
         'Melbourne',
         'WEST INDIES',
         'Chile',
         'Belfast',
         'Hobart',
         'West Indies',
         'UK',
         'Yugoslavia',
         'Bahia Blanca',
         'Kuala Lumpur',
         'Chad',
         'U.S',
         'Kanpur',
         'Let',
         'Milan',
    

**Printing annotated texts corresponding labels in the model's domain**

In [15]:
doc = pretrained_nlp('Apple is a fruit')
print(pretrained_nlp.pipe_names)
for t in doc.ents: print(t.label_)

['tagger', 'parser', 'ner']
ORG


In [16]:
# Following call takes time. Uncommnet if contents of the parameter has changed
pred_dict = spacy_ner_predictions_to_dict(test_examples, pretrained_nlp)

display_labels2text_dict(pred_dict, 80)

set_len(GPE) = 572
set_len(PERSON) = 1099
set_len(ORG) = 794
set_len(DATE) = 618
set_len(EVENT) = 41
set_len(CARDINAL) = 621
set_len(ORDINAL) = 28
set_len(TIME) = 76
set_len(NORP) = 147
set_len(QUANTITY) = 86
set_len(PRODUCT) = 14
set_len(MONEY) = 162
set_len(PERCENT) = 58
set_len(LOC) = 44
set_len(FAC) = 19
set_len(WORK_OF_ART) = 9
set_len(LANGUAGE) = 2
set_len(LAW) = 4
{'CARDINAL': ['195',
              '40.78',
              '192.36',
              'Half',
              '124.2',
              '6',
              '148.20',
              '38.08',
              '59',
              '68.05',
              '1:19.21',
              '158',
              '220',
              '66129502',
              '213',
              '1:18.15',
              '3,800',
              '17,200',
              '323',
              '8.1',
              '121.3',
              '89',
              '1:49.41',
              'SIX',
              '2147.6',
              '136',
              '231.5',
              'roug

In [17]:
print(pred_dict)

{'GPE': {'Masisi', 'France', 'TORONTO', 'UTAH', 'CHICAGO', 'Blida province', 'Mongolia', 'Switzerland', 'Melville', 'NEW YORK', 'Brighton', 'Maccabi', 'MACEDONIA', "the United States '", 'Busang', "Saudi Arabia 's", 'Dallas', 'BOSTON', 'Ayr', 'Oviedo', 'Tilburg', 'Turkey', 'Sampdoria', 'Jakarta', 'New York City', 'Parma', 'Cairns', 'Minn', 'Algiers', 'Netherlands', 'Indianapolis', 'JERUSALEM', 'Athens', 'Tel Aviv', 'NEW ZEALAND', 'East Timor', 'Cairo', 'Lebanon', 'Florida', 'Manchester City', 'Dunfermline', 'BELFAST', 'Sunseeds', 'Mansfield', 'Colorado', 'Shrewsbury', 'AUSTRIA', 'Izingolweni', 'KANSAS CITY', 'TOKYO', 'Valencia', 'Southend', 'Malaysia', 'Warsaw', 'Augusta', 'East Fife', 'Pilsen', 'Vicenza', 'Leeds', 'NEW JERSEY', 'Southampton', 'LONDON', 'Walsall', 'GENEVA', 'New Zealand', 'Atletico Madrid', 'Walikale', 'BRATISLAVA', 'Venezuela', 'Swansea', 'Zaragoza', 'MOSCOW', 'Kenya', 'SAN JOSE', 'Maracaibo', 'Bonn', 'Sheffield United 1 Portsmouth', 'Lisbon', 'WHISTLER', 'Oradea', 'G

**Post analysis mapping of labels from the dataset's domain to the model's domain**
It was found after analysis that :
* the label MISC from the dataset's domain map roughly to model domain labels: 'NORP', 'WORK_OF_ART', 'FAC', 'PRODUCT', 'EVENT'.
* Dataset label 'LOC' can be mapped to model label 'GPE'/'LOC'
* Dataset label 'ORG' can be mapped to model label 'ORG'
* Dataset label 'PER' can be mapped to model label 'PERSON'

In [18]:
PRED_LABELS_EQUIV_MAP = {
    'NORP'       : 'MISC',
    'WORK_OF_ART': 'MISC',
    'FAC'        : 'MISC',
    'PRODUCT'    : 'MISC',
    'EVENT'      : 'MISC',
    'GPE'        : 'LOC',
    'LOC'        : 'LOC',
    'ORG'        : 'ORG', 
    'PERSON'     : 'PER'
}

# **Evaluating on spaCy's pretrained, medium, vanilla model**

This score would be compared with the score on the same model which has been retrained on the training data. We count the
* Number of true positives  - Labelled correctly
* Number of false positives - Labelled and did not got it correct AND those that should not have been labelled.
* Number of false negatives - Those that were not labelled at all

In [19]:
#print_annotaions_and_predictions(test_examples)
stats_per_tag = compute_scores(test_examples, pretrained_nlp, PRED_LABELS_EQUIV_MAP)
display_perf_stats_per_tag(stats_per_tag)

{'LOC': {'CNT': 36306, 'FN': 453, 'FP': 791, 'TN': 33571, 'TP': 1491},
 'MISC': {'CNT': 23206, 'FN': 468, 'FP': 276, 'TN': 21906, 'TP': 556},
 'ORG': {'CNT': 50164, 'FN': 1462, 'FP': 1431, 'TN': 46196, 'TP': 1075},
 'PER': {'CNT': 45528, 'FN': 608, 'FP': 648, 'TN': 42053, 'TP': 2219}}
For label: "LOC"
	Accuracy : 96.57356910703466%
	Precision : 0.6533742331288344
	Recall : 0.7669753086419753
	F-score : 0.705631803123521
For label: "PER"
	Accuracy : 97.24125812686698%
	Precision : 0.77397976979421
	Recall : 0.7849310222851079
	F-score : 0.7794169301018616
For label: "ORG"
	Accuracy : 94.23291603540387%
	Precision : 0.4289704708699122
	Recall : 0.423728813559322
	F-score : 0.42633353162799914
For label: "MISC"
	Accuracy : 96.79393260363699%
	Precision : 0.6682692307692307
	Recall : 0.54296875
	F-score : 0.5991379310344827


# **Retraining pretrained-medium SpaCy model and evaluation**

**Convert dataset to match domain**

In [20]:
dataset_to_model_tag_map = { 'MISC': 'NORP',
                             'LOC' : 'LOC',
                             'ORG' : 'ORG',
                             'PER' : 'PERSON'}
mapped_examples = conv_dataset_to_match_domain(training_examples, 
                                               dataset_to_model_tag_map)

In [None]:
train_spacy_model(mapped_examples, pretrained_nlp)

**Evaluating the retrained model on test data** 

In [None]:
--allow-root#nlp_md_retrained = copy.deepcopy(pretrained_nlp) # *ExpensiveResource*. comment after execution

#save_model_to_file(nlp_md_retrained, 'saved_models/retrained__md_spacy_model.bin')

loaded = load_model_from_file('saved_models/retrained__md_spacy_model.bin', 
                              spacy.load('en_core_web_md'))

stats_per_tag = compute_scores(test_examples, 
                               loaded,          ## Need to review before executing the cell
                               PRED_LABELS_EQUIV_MAP)
display_perf_stats_per_tag(stats_per_tag)

# **Training a SpaCy model from scratch and its evaluation**

Now lets train a model from sratch and compute the score. The idea here would be to see if the performance on the retrained spaCy model is better than that of on a model that has been trained from scratch 

In [None]:
#nlp_from_scratch = train_spacy_model(training_examples)   # No model specified i.e spacy.blank(...)

#save_model_to_file(nlp_from_scratch, 'saved_models/space_model_from_scratch.bin')

loaded =  load_model_from_file('saved_models/space_model_from_scratch.bin')

stats_per_tag = compute_scores(test_examples, 
                               loaded,     ## Need to review before executing the cell
                               PRED_LABELS_EQUIV_MAP)
display_perf_stats_per_tag(stats_per_tag)

# **Evaluating on SpaCy's pretrained, large, vanilla model**

Download spaCy's large model (en_core_web_lg) and find how it performs on the test examples. Get the performance scores.

Later, retrain this "large" model and see how it compares with the vanilla and the retrained results on "medium" model. 

In [None]:
# Downloading large model
import spacy.cli
spacy.cli.download("en_core_web_lg")
nlp_lg_pretrained = spacy.load('en_core_web_lg')

In [None]:
stats_per_tag = compute_scores(test_examples, nlp_lg_pretrained, PRED_LABELS_EQUIV_MAP)
display_perf_stats_per_tag(stats_per_tag)

# **Retraining pretrained-large SpaCy model and evaluation**

In [None]:
train_spacy_model(mapped_examples, pretrained_nlp) # Comment after use

In [None]:
#nlp_retrained_lg = copy.deepcopy(pretrained_nlp) # *ExpensiveResource*. comment after execution

#save_model_to_file(nlp_retrained_lg, 'saved_models/space_model_from_scratch.bin')

loaded =  load_model_from_file('saved_models/space_model_from_scratch.bin',  
                               spacy.blank('en'))

stats_per_tag = compute_scores(test_examples, 
                               loaded,     ## Need to review before executing the cell
                               PRED_LABELS_EQUIV_MAP)
display_perf_stats_per_tag(stats_per_tag)

# **Results**
Scores on 

# **Conclusions**
