# Evaluating the Results of Training

The results of training (and its evaluation) will depend on how the data was split into training and testing sets. In this worksheet, we use repeated random subsampling to assess the performance of our trained model.

According to [Wikipedia](https://en.wikipedia.org/wiki/Cross-validation_(statistics)):

> This method, also known as Monte Carlo cross-validation, creates multiple random splits of the dataset into training and validation data. For each such split, the model is fit to the training data, and predictive accuracy is assessed using the validation data. The results are then averaged over the splits. The advantage of this method (over k-fold cross validation) is that the proportion of the training/validation split is not dependent on the number of iterations (folds). The disadvantage of this method is that some observations may never be selected in the validation subsample, whereas others may be selected more than once. In other words, validation subsets may overlap. This method also exhibits Monte Carlo variation, meaning that the results will vary if the analysis is repeated with different random splits.

We will split our data 80-20, using 80% for training and 20% for testing. A new random split is drawn for each training run, so that we can see how much the results vary from split to split and how well the trained model performs on average.
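
The splitting scheme described above can be sketched in a few lines of plain Python. This is only an illustration on toy data: the function name `random_splits`, its parameters and the fixed seed are assumptions made for this example and do not appear in the worksheet's own code.

```python
import random

def random_splits(data, n_splits=10, train_fraction=0.8, seed=0):
    """Yield (train, test) pairs from repeated random subsampling."""
    rng = random.Random(seed)          # seeded so the splits are reproducible
    cut = int(len(data) * train_fraction)
    for _ in range(n_splits):
        shuffled = list(data)          # copy, so the caller's list is left untouched
        rng.shuffle(shuffled)
        yield shuffled[:cut], shuffled[cut:]

# Toy data standing in for the tagged examples
toy_data = list(range(50))
splits = list(random_splits(toy_data))
train, test = splits[0]
print(len(train), len(test))
```

Every split uses all of the data, but an individual example may land in the test portion several times or not at all, which is the overlap that the quoted passage describes.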


In [1]:
#Import necessary modules

from __future__ import unicode_literals, print_function
import spacy
from spacy.lang.es import Spanish 
from spacy.scorer import Scorer
from spacy.gold import GoldParse  # spaCy v2: GoldParse is defined in spacy.gold
from spacy.util import minibatch, compounding

import pandas as pd
import numpy as np
import json
import plac
import random
from sklearn.model_selection import train_test_split
from pathlib import Path
from copy import deepcopy

In [2]:
# Read Tagged Data from JSON file
with open('AMSTrainingII_SF.json', 'r', encoding='utf-8') as fp2:
    TAGGED_DATA = json.load(fp2)

spaCy has a built-in command for evaluating a model's performance from the [command line](https://spacy.io/api/cli#evaluate), but you can also define a function like the one below. It takes an NER model and a list of annotated examples and returns several metrics:

- UAS (Unlabelled Attachment Score)
- LAS (Labelled Attachment Score)
- ents_p
- ents_r
- ents_f
- tags_acc
- token_acc

[According](https://github.com/explosion/spaCy/issues/2405) to one of the creators of spaCy:
>The UAS and LAS are standard metrics to evaluate dependency parsing. UAS is the proportion of tokens whose head has been correctly assigned, LAS is the proportion of tokens whose head has been correctly assigned with the right dependency label (subject, object, etc).
>ents_p, ents_r, ents_f are the precision, recall and fscore for the NER task.
>tags_acc is the POS tagging accuracy.
>token_acc seems to be the precision for token segmentation.

The key metrics for this task are precision, recall and F-score.
**Precision** (ents_p) is the proportion of predicted entities that are correct: True Positives / (True Positives + False Positives).
**Recall** (ents_r) is the proportion of true entities that the model found: True Positives / (True Positives + False Negatives).
The **F-score** (ents_f) is the harmonic mean of precision and recall: 2 × Precision × Recall / (Precision + Recall).

Each of these metrics is reported once aggregated over all entity types (labels) and once for each label separately, on a scale from 0 to 100. We want the values to be as close to 100 as possible.
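
As a small worked example of these definitions, the sketch below computes the three metrics from raw counts. The function name and the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def precision_recall_f(tp, fp, fn):
    """Return precision, recall and F-score as percentages."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f_score

# Hypothetical counts: 8 correct entities, 2 spurious ones, 4 missed ones
p, r, f = precision_recall_f(tp=8, fp=2, fn=4)
arithmetic_mean = (p + r) / 2
print(round(p, 2), round(r, 2), round(f, 2), round(arithmetic_mean, 2))
```

The harmonic mean is pulled towards the lower of the two values, so the F-score here is slightly below the arithmetic mean of precision and recall. A model cannot reach a high F-score by doing well on only one of the two.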

In [3]:
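def evaluate(ner_model, examples):
    # Accumulate scores over all (text, annotations) pairs
    scorer = Scorer()
    for text, annotations in examples:
        # Tokenize only, then attach the gold-standard entity spans
        doc_gold = ner_model.make_doc(text)
        gold = GoldParse(doc_gold, entities=annotations['entities'])
        # Run the full pipeline to get the predicted entities
        pred_value = ner_model(text)
        scorer.score(pred_value, gold)
    return scorer.scores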
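
The per-label scores are what we store later on. The sketch below shows one way to flatten them into rows. It assumes that the returned dictionary has an `ents_per_type` entry that maps each label to a dictionary with the keys `p`, `r` and `f`. The helper name `per_label_rows` and the hand-written `example_scores` are illustrative only, with values taken from the results shown further down.

```python
def per_label_rows(scores, labels):
    """Turn the 'ents_per_type' part of a scores dict into one row per label."""
    per_type = scores.get('ents_per_type', {})
    rows = []
    for label in labels:
        if label not in per_type:      # label absent from this set of scores
            continue
        s = per_type[label]
        rows.append({'ents_p': s['p'], 'ents_r': s['r'], 'ents_f': s['f'], 'label': label})
    return rows

# Hand-written stand-in for the dictionary returned by evaluate()
example_scores = {
    'ents_per_type': {
        'DATE': {'p': 81.651376, 'r': 81.651376, 'f': 81.651376},
        'PER': {'p': 92.446043, 'r': 92.446043, 'f': 92.446043},
    }
}
rows = per_label_rows(example_scores, ['DATE', 'MON', 'PER'])
for row in rows:
    print(row)
```

Looking scores up by key rather than by position keeps the columns correct even if the order of the entries in the dictionary changes.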

Next, we load the spaCy model, define a blank dataframe to store the output of our different trials, and calculate how many examples are needed for the training portion of an 80-20 split.

In [11]:
# Load Spacy Model
nlp= spacy.load('es_core_news_ml_EMS')

In [5]:
#Define a blank dataframe with columns for the information we are interested in

columns=['ents_p', 'ents_r', 'ents_f', 'label']
eval_data = pd.DataFrame(columns=columns)
eval_data = eval_data.fillna(0)

In [6]:
# Calculate 80% of data for an 80-20 split

len(TAGGED_DATA)*0.8

400.0

Finally, we run the training loop ten times, each with a different random 80-20 split, and store the per-label evaluation statistics of our NER model in our dataframe. Each run trains a fresh copy of the loaded model, because we want training to start afresh for each set of training data. If we kept updating the same model, it would eventually have been trained on examples that later fall in the test set, and its scores on the tagged data would overstate how well it performs on the new samples that we are interested in tagging later.
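
The effect of copying the model can be illustrated with a toy example. `ToyModel` is a hypothetical stand-in that only records which examples it has been updated with. It is not a spaCy object.

```python
from copy import deepcopy

class ToyModel:
    """Hypothetical stand-in for a trainable model: it just remembers what it saw."""
    def __init__(self):
        self.seen = []

    def update(self, examples):
        self.seen.extend(examples)

runs = [['a', 'b'], ['c', 'd']]

# Without a copy: the second run inherits everything from the first
shared = ToyModel()
for train in runs:
    shared.update(train)
print(shared.seen)

# With a copy per run: each run starts from the untouched base model
base = ToyModel()
fresh_runs = []
for train in runs:
    model = deepcopy(base)
    model.update(train)
    fresh_runs.append(model.seen)
print(fresh_runs)
```

In the second variant the base model is never modified, so every run is evaluated on data that its own copy has not been trained on.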

In [12]:
# Testing how much the evaluation depends on texts included in testing data

#Loop 10 times
for x in range(0,10):
    
    #Batching the Tagged Data into training and evaluation data (80-20 split)

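    random.shuffle(TAGGED_DATA)
    #Index that separates the first 80% of the shuffled data from the remaining 20%
    split = int(len(TAGGED_DATA) * 0.8)
    train_data = TAGGED_DATA[:split]
    test_data = TAGGED_DATA[split:]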

    #Load the model to be trained (save separately, because we do not want to repeatedly retrain the same model)
    nlp1 = deepcopy(nlp)
    
    #Create object for retrieving the NER pipeline component
    ner=nlp1.get_pipe("ner")

    #Generate new labels for the NER component (if you wish to create new labels)
    #ner.add_label("MON")
    #ner.add_label("MON")
    #ner.add_label("DATE")

    #Train only the NER component: the tagger and the parser, which we are not using here, are disabled inside this block.
    with nlp1.disable_pipes('tagger','parser'):
        #Here we resume training the existing model; alternatively you could use begin_training if you are starting on a new model.
        optimizer= nlp1.resume_training()
        #Generator of minibatch sizes: it starts at 1.0 and is multiplied by 1.001 after each batch, up to a maximum of 4.0
        sizes = compounding(1.0, 4.0, 1.001)
        #Make 10 passes over the training data. On each pass the data is reshuffled and split into minibatches, and the model is updated after every batch.
        for itn in range(10):
            random.shuffle(train_data)
            batches = minibatch(train_data, size=sizes)
            losses = {}
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp1.update(texts, annotations, sgd=optimizer, drop=0.35, losses=losses)
            print("Losses", losses)
    
    #Evaluate the freshly trained model on the held-out test data

    results = evaluate(nlp1,test_data)
    #Per-label scores: {label: {'p': precision, 'r': recall, 'f': F-score}}
    per_type = results['ents_per_type']

    #Store one row per label ('OBJ' is not evaluated here; add it to the list to include it)
    for label in ['DATE', 'MON', 'ORG', 'PER', 'LOC']:
        scores = per_type[label]
        newrow = {'ents_p': scores['p'], 'ents_r': scores['r'], 'ents_f': scores['f'], 'label': label}
        #DataFrame.append requires pandas < 2.0
        eval_data = eval_data.append(newrow, ignore_index=True)


Losses {'ner': 3591.5471024410667}
Losses {'ner': 2690.530075185315}
Losses {'ner': 2293.0892819620667}
Losses {'ner': 1998.821414852835}
Losses {'ner': 1966.5152335241576}
Losses {'ner': 1474.1909109121507}
Losses {'ner': 1610.5948918567722}
Losses {'ner': 1527.3823162800095}
Losses {'ner': 1320.9996693693686}
Losses {'ner': 1172.837904502785}


Losses {'ner': 3173.365915730778}
Losses {'ner': 2605.258725921552}
Losses {'ner': 2231.7535603933584}
Losses {'ner': 1894.8959838595767}
Losses {'ner': 1740.9894983682075}
Losses {'ner': 1921.943867120505}
Losses {'ner': 1543.7140949294603}
Losses {'ner': 1458.7159114796991}
Losses {'ner': 1226.3817615201285}
Losses {'ner': 1176.5516203799134}


Losses {'ner': 3359.824695674141}
Losses {'ner': 2371.2248234036533}
Losses {'ner': 2034.2361413680435}
Losses {'ner': 1972.2211253678627}
Losses {'ner': 1749.111869845779}
Losses {'ner': 1683.5968692939193}
Losses {'ner': 1398.9753319906195}
Losses {'ner': 1385.0082297985693}
Losses {'ner': 1352.7585104838372}
Losses {'ner': 1088.2770128995617}


Losses {'ner': 3265.25447810646}
Losses {'ner': 2355.0180194657164}
Losses {'ner': 2167.2210276501687}
Losses {'ner': 1874.0685026616434}
Losses {'ner': 1795.589476999211}
Losses {'ner': 1541.5740136824422}
Losses {'ner': 1375.9743149334263}
Losses {'ner': 1325.733408356837}
Losses {'ner': 1224.5119927092956}
Losses {'ner': 1147.1287610851045}


Losses {'ner': 3765.1766208544823}
Losses {'ner': 3027.056084658687}
Losses {'ner': 2419.363837527841}
Losses {'ner': 2007.044103132136}
Losses {'ner': 1817.011837738826}
Losses {'ner': 1721.18782892511}
Losses {'ner': 1589.6306925566919}
Losses {'ner': 1531.711822082291}
Losses {'ner': 1281.9176980432537}
Losses {'ner': 1278.0093234992353}


Losses {'ner': 3047.6981249443556}
Losses {'ner': 2595.772253100031}
Losses {'ner': 2163.5630413808294}
Losses {'ner': 2119.484450748131}
Losses {'ner': 1772.286471196419}
Losses {'ner': 1687.3062683072387}
Losses {'ner': 1439.2444582623395}
Losses {'ner': 1351.8477621278744}
Losses {'ner': 1368.1221400195393}
Losses {'ner': 1211.8483465643271}


Losses {'ner': 3355.4537231886716}
Losses {'ner': 2579.0712716618573}
Losses {'ner': 2425.906617046779}
Losses {'ner': 1914.988882358907}
Losses {'ner': 1801.8912939606005}
Losses {'ner': 1682.313634275592}
Losses {'ner': 1554.5427463052836}
Losses {'ner': 1551.337292174777}
Losses {'ner': 1241.3630930274228}
Losses {'ner': 1162.6659162593583}


Losses {'ner': 3394.151698168256}
Losses {'ner': 2376.498884129312}
Losses {'ner': 2014.6121001837673}
Losses {'ner': 1900.183606063558}
Losses {'ner': 1804.73077124402}
Losses {'ner': 1726.4452650258775}
Losses {'ner': 1592.774578937139}
Losses {'ner': 1467.208908037217}
Losses {'ner': 1536.8202473560439}
Losses {'ner': 1327.6897618564385}


Losses {'ner': 3201.4267012121404}
Losses {'ner': 2396.6217873936653}
Losses {'ner': 2140.1321836786483}
Losses {'ner': 1872.5953236758367}
Losses {'ner': 1751.68427697363}
Losses {'ner': 1746.491105319234}
Losses {'ner': 1636.3151750145018}
Losses {'ner': 1565.267411978829}
Losses {'ner': 1443.6309273053087}
Losses {'ner': 1404.0486455916766}


Losses {'ner': 3736.4065660341425}
Losses {'ner': 2602.4843297683783}
Losses {'ner': 2345.6740134651104}
Losses {'ner': 1967.469838794346}
Losses {'ner': 1969.1729780225508}
Losses {'ner': 1765.7099652288762}
Losses {'ner': 1623.769290027325}
Losses {'ner': 1400.5577166877981}
Losses {'ner': 1322.8651575191518}
Losses {'ner': 1334.4372154311782}
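
The batch sizes in the training loop come from `compounding(1.0, 4.0, 1.001)`, and `minibatch` cuts the training data into batches of those sizes. The sketch below mimics that behaviour in plain Python to show what the three numbers do. The names `compounding_sketch` and `minibatch_sketch` are our own, the code is not spaCy's implementation, and it assumes that the start value is below the stop value.

```python
import itertools

def compounding_sketch(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... capped at stop."""
    value = float(start)
    while True:
        yield min(value, stop)
        value *= compound

def minibatch_sketch(items, sizes):
    """Cut items into consecutive batches whose lengths are drawn from sizes."""
    items = iter(items)
    while True:
        batch = list(itertools.islice(items, int(next(sizes))))
        if not batch:
            return
        yield batch

# A toy list of 300 items standing in for the training examples
batch_sizes = compounding_sketch(1.0, 4.0, 1.001)
first_pass = list(minibatch_sketch(range(300), batch_sizes))
print(len(first_pass), len(first_pass[0]))
```

Because the growth factor is only 1.001 and the size is truncated to an integer, the batches stay at a single example for several hundred updates before growing to two. With a few hundred training examples, the whole first pass is therefore made with batches of size one.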


Below, we print the contents of our evaluation dataframe:

In [13]:
print(eval_data)

       ents_p     ents_r     ents_f label
0   81.651376  81.651376  81.651376  DATE
1   87.577640  87.577640  87.577640   MON
2   62.962963  65.891473  64.393939   ORG
3   92.446043  92.446043  92.446043   PER
4   89.772727  80.338983  84.794275   LOC
5   85.046729  81.250000  83.105023  DATE
6   82.758621  90.225564  86.330935   MON
7   70.588235  60.000000  64.864865   ORG
8   91.544118  90.710383  91.125343   PER
9   86.111111  88.888889  87.477954   LOC
10  81.415929  83.636364  82.511211  DATE
11  87.150838  88.636364  87.887324   MON
12  68.421053  53.719008  60.185185   ORG
13  91.218638  93.223443  92.210145   PER
14  85.559567  88.104089  86.813187   LOC
15  88.392857  83.193277  85.714286  DATE
16  61.637931  89.375000  72.959184   MON
17  70.476190  73.267327  71.844660   ORG
18  92.816635  93.702290  93.257360   PER
19  90.573770  87.007874  88.755020   LOC
20  82.178218  82.178218  82.178218  DATE
21  91.709845  93.650794  92.670157   MON
22  62.280702  62.280702  62.28070

In [14]:
#Measure mean and standard deviation of f, p and r scores for each label 
a = eval_data.groupby('label').agg({'ents_f':['mean','std'],'ents_p':['mean','std'],'ents_r':['mean','std']})

In [15]:
a

| label | ents_f mean | ents_f std | ents_p mean | ents_p std | ents_r mean | ents_r std |
|---|---|---|---|---|---|---|
| DATE | 83.172302 | 3.328496 | 84.643803 | 3.063558 | 81.850208 | 4.55444 |
| LOC | 87.391638 | 1.819658 | 88.754376 | 2.59842 | 86.168764 | 3.086064 |
| MON | 84.475452 | 7.070495 | 79.927858 | 11.820308 | 90.654358 | 2.95613 |
| ORG | 62.824572 | 4.71039 | 65.415327 | 5.459031 | 60.696991 | 5.936379 |
| PER | 92.71417 | 1.173462 | 92.83792 | 1.456385 | 92.605774 | 1.479931 |
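
To make the two aggregates concrete, the sketch below recomputes them by hand for a small sample: the PER F-scores of the first four runs printed above. It uses the sample standard deviation, which is also the default of the pandas `std` aggregation. Since only four of the ten runs are included, the result is not expected to match the PER row of the table exactly.

```python
import statistics

# PER F-scores from the first four runs printed above
per_f = [92.446043, 91.125343, 92.210145, 93.257360]

mean_f = statistics.mean(per_f)
std_f = statistics.stdev(per_f)   # sample standard deviation (n - 1 in the denominator)
print(round(mean_f, 3), round(std_f, 3))
```

A small standard deviation relative to the mean, as for PER, indicates that the score depends little on which texts happen to end up in the test set.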


# Evaluating Spelling Normalization

We can apply the same evaluation to a model trained on text whose spelling has been normalized, and so test whether including a normalization dictionary improves the training results.

In [11]:
# Read Norm Exceptions from JSON file
with open('normalizeddict.json', 'r', encoding='utf-8') as fp3:
    NORM_EXCEPTIONS = json.load(fp3)

In [12]:
#Define and add pipeline component that updates .norm attribute

def add_custom_norms(doc):
    for token in doc:
        if token.text in NORM_EXCEPTIONS:
            token.norm_ = NORM_EXCEPTIONS[token.text]
    return doc

#Add component to the pipeline

nlp.add_pipe(add_custom_norms, first=True)
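
The lookup performed by the component can be tried out on a plain list of strings. The two dictionary entries below are hypothetical examples of the kind of mapping that `normalizeddict.json` might contain. The function `normalize_tokens` is only an analogue of the component and does not touch spaCy's `norm_` attribute.

```python
# Hypothetical entries; the real ones are loaded from normalizeddict.json
norm_exceptions = {'dixo': 'dijo', 'hazer': 'hacer'}

def normalize_tokens(tokens, exceptions):
    """Return the normalized form of each token, falling back to the token itself."""
    return [exceptions.get(token, token) for token in tokens]

tokens = ['dixo', 'que', 'quería', 'hazer', 'algo']
print(normalize_tokens(tokens, norm_exceptions))
```

As in the component above, the lookup is an exact, case-sensitive match on the token text, so a capitalized variant is only normalized if it has its own entry in the dictionary.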

In [13]:
#Define a new blank dataframe with columns for the information we are interested in

columns=['ents_p', 'ents_r', 'ents_f', 'label']
eval_data2 = pd.DataFrame(columns=columns)
eval_data2 = eval_data2.fillna(0)

In [14]:
# Train and evaluate Model trained with EMS dictionary

#Loop 10 times
for x in range(0,10):
    
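    random.shuffle(TAGGED_DATA)
    #Index that separates the first 80% of the shuffled data from the remaining 20%
    split = int(len(TAGGED_DATA) * 0.8)
    train_data = TAGGED_DATA[:split]
    test_data = TAGGED_DATA[split:]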
    
    #Load the model to be trained
    nlp2 = deepcopy(nlp)
    
    #Create object for retrieving the NER pipeline component
    ner=nlp2.get_pipe("ner")

    #Generate new labels for the NER component (if you wish to create new labels)
    ner.add_label("OBJ")
    ner.add_label("MON")
    ner.add_label("DATE")

    #Train only the NER component: the tagger and the parser, which we are not using here, are disabled inside this block.
    with nlp2.disable_pipes('tagger','parser'):
        #Here we resume training the existing model; alternatively you could use begin_training if you are starting on a new model.
        optimizer= nlp2.resume_training()
        #Generator of minibatch sizes: it starts at 1.0 and is multiplied by 1.001 after each batch, up to a maximum of 4.0
        sizes = compounding(1.0, 4.0, 1.001)
        #Make 10 passes over the training data. On each pass the data is reshuffled and split into minibatches, and the model is updated after every batch.
        for itn in range(10):
            random.shuffle(train_data)
            batches = minibatch(train_data, size=sizes)
            losses = {}
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp2.update(texts, annotations, sgd=optimizer, drop=0.35, losses=losses)
            print("Losses", losses)
   
    #Evaluate the freshly trained model on the held-out test data

    results = evaluate(nlp2,test_data)
    #Per-label scores: {label: {'p': precision, 'r': recall, 'f': F-score}}
    per_type = results['ents_per_type']

    #Store one row per label
    for label in ['DATE', 'MON', 'OBJ', 'ORG', 'PER', 'LOC']:
        scores = per_type[label]
        newrow = {'ents_p': scores['p'], 'ents_r': scores['r'], 'ents_f': scores['f'], 'label': label}
        #DataFrame.append requires pandas < 2.0
        eval_data2 = eval_data2.append(newrow, ignore_index=True)


Losses {'ner': 29499.62115381894}
Losses {'ner': 27432.198336651258}
Losses {'ner': 27237.476574885826}
Losses {'ner': 26886.309408007888}
Losses {'ner': 26540.757857780347}
Losses {'ner': 26545.37359688431}
Losses {'ner': 26574.1013068147}
Losses {'ner': 26679.740034505725}
Losses {'ner': 26516.571289405227}
Losses {'ner': 26684.599556565285}


Losses {'ner': 30651.302070253238}
Losses {'ner': 28284.725300171824}
Losses {'ner': 27811.04373792749}
Losses {'ner': 27858.08806299469}
Losses {'ner': 27718.562570926966}
Losses {'ner': 27328.900965396315}
Losses {'ner': 27189.881448478438}
Losses {'ner': 27317.36295453459}
Losses {'ner': 27218.52796754241}
Losses {'ner': 27065.22095376253}


Losses {'ner': 29590.74765747037}
Losses {'ner': 27184.64154662579}
Losses {'ner': 27026.793694191245}
Losses {'ner': 26454.55213338224}
Losses {'ner': 26839.336028540452}
Losses {'ner': 26708.574336301535}
Losses {'ner': 26650.113122107927}
Losses {'ner': 26635.132207155228}
Losses {'ner': 26199.49300429225}
Losses {'ner': 26226.21255362034}


Losses {'ner': 28093.531449958275}
Losses {'ner': 25667.964437861654}
Losses {'ner': 24951.78715877945}
Losses {'ner': 25100.753037386166}
Losses {'ner': 24785.410763391832}
Losses {'ner': 24760.63866879698}
Losses {'ner': 24289.866849032464}
Losses {'ner': 24400.87175379321}
Losses {'ner': 24482.260458946228}
Losses {'ner': 24565.933971002698}


Losses {'ner': 28332.016745980498}
Losses {'ner': 26362.2281262184}
Losses {'ner': 25635.542517440506}
Losses {'ner': 25117.322425551873}
Losses {'ner': 25136.405352118192}
Losses {'ner': 25244.448437670246}
Losses {'ner': 24999.708883602172}
Losses {'ner': 25051.532636642456}
Losses {'ner': 24823.247329860926}
Losses {'ner': 24896.657667689025}


Losses {'ner': 28390.664817206565}
Losses {'ner': 26001.15853855208}
Losses {'ner': 25853.95808242492}
Losses {'ner': 25646.67148854825}
Losses {'ner': 25621.01740635198}
Losses {'ner': 25310.714385457337}
Losses {'ner': 25641.85621431656}
Losses {'ner': 25141.601619463414}
Losses {'ner': 25471.13282689452}
Losses {'ner': 25402.038847267628}


Losses {'ner': 28857.829207675437}
Losses {'ner': 26427.575190555293}
Losses {'ner': 25773.047303230134}
Losses {'ner': 25860.04211168643}
Losses {'ner': 25870.547863037034}
Losses {'ner': 25445.04180361796}
Losses {'ner': 25652.979458531365}
Losses {'ner': 25652.355613194406}
Losses {'ner': 25368.51256494224}
Losses {'ner': 25482.41212736629}


Losses {'ner': 31183.07043193781}
Losses {'ner': 28938.966911083655}
Losses {'ner': 28314.097444169314}
Losses {'ner': 27981.325828345012}
Losses {'ner': 27896.63830166764}
Losses {'ner': 27506.291575556505}
Losses {'ner': 27805.07676478289}
Losses {'ner': 27623.085354603827}
Losses {'ner': 27410.957109078765}
Losses {'ner': 27565.289154559374}


Losses {'ner': 29698.50474952518}
Losses {'ner': 28100.42544671885}
Losses {'ner': 27684.143731447766}
Losses {'ner': 27413.299044790547}
Losses {'ner': 27221.78061976153}
Losses {'ner': 27076.371534661856}
Losses {'ner': 26987.433071333915}
Losses {'ner': 27206.584793627262}
Losses {'ner': 26959.954725861317}
Losses {'ner': 27405.67962694168}


Losses {'ner': 27282.973104532}
Losses {'ner': 25546.693975118553}
Losses {'ner': 24631.723043808382}
Losses {'ner': 24399.073844784987}
Losses {'ner': 24632.41723647827}
Losses {'ner': 24773.259103098884}
Losses {'ner': 24214.003735827282}
Losses {'ner': 24327.542434114963}
Losses {'ner': 23951.511757960543}
Losses {'ner': 24057.59735112358}


In [15]:
b= eval_data2.groupby('label').agg({'ents_f':['mean','std'],'ents_p':['mean','std'],'ents_r':['mean','std']})

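Below, we print the statistics for training without (a) and with (b) spelling normalization. With normalization, the mean F-score rises slightly for four of the six labels (LOC, OBJ, ORG and PER) and falls for DATE and MON, while the standard deviation of the F-score decreases for every label. All of the changes in mean F-score are smaller than the corresponding standard deviations, so the improvement should be read as tentative.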

In [16]:
a

| label | ents_f mean | ents_f std | ents_p mean | ents_p std | ents_r mean | ents_r std |
|---|---|---|---|---|---|---|
| DATE | 9.782182 | 9.578643 | 41.666667 | 40.253824 | 5.821155 | 5.936515 |
| LOC | 80.577654 | 3.017314 | 85.26357 | 4.642661 | 76.664881 | 4.973763 |
| MON | 55.384011 | 13.193679 | 65.566869 | 13.052479 | 48.752034 | 13.813612 |
| OBJ | 1.888861 | 4.417157 | 23.333333 | 41.722185 | 1.003731 | 2.375146 |
| ORG | 22.120174 | 14.267794 | 32.923158 | 19.451804 | 16.979782 | 11.556657 |
| PER | 85.964481 | 2.293672 | 87.336613 | 2.670296 | 84.66708 | 2.5638 |


In [17]:
b

| label | ents_f mean | ents_f std | ents_p mean | ents_p std | ents_r mean | ents_r std |
|---|---|---|---|---|---|---|
| DATE | 6.181599 | 9.23811 | 21.666667 | 33.379598 | 3.616384 | 5.377218 |
| LOC | 82.013233 | 2.651031 | 86.110284 | 4.35966 | 78.437365 | 3.290363 |
| MON | 51.947078 | 9.837542 | 70.758458 | 12.463683 | 42.015463 | 10.894793 |
| OBJ | 2.933807 | 2.513359 | 48.333333 | 41.16363 | 1.525103 | 1.316534 |
| ORG | 22.247787 | 8.778882 | 31.986232 | 12.151958 | 17.145127 | 6.982109 |
| PER | 86.558603 | 2.209405 | 88.364413 | 2.650517 | 84.850763 | 2.290407 |
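
The comparison between a and b can also be tabulated programmatically. The sketch below copies the mean and standard deviation of `ents_f` from the two tables and reports the change per label. Comparing the change with the larger of the two standard deviations is only a rough yardstick that we assume here for illustration. It is not a formal significance test.

```python
# Mean and standard deviation of ents_f per label, copied from tables a and b
a_f = {'DATE': (9.782182, 9.578643), 'LOC': (80.577654, 3.017314), 'MON': (55.384011, 13.193679),
       'OBJ': (1.888861, 4.417157), 'ORG': (22.120174, 14.267794), 'PER': (85.964481, 2.293672)}
b_f = {'DATE': (6.181599, 9.23811), 'LOC': (82.013233, 2.651031), 'MON': (51.947078, 9.837542),
       'OBJ': (2.933807, 2.513359), 'ORG': (22.247787, 8.778882), 'PER': (86.558603, 2.209405)}

improved, within_noise = [], []
for label in a_f:
    (mean_a, std_a), (mean_b, std_b) = a_f[label], b_f[label]
    delta = mean_b - mean_a
    if delta > 0:
        improved.append(label)
    # Rough yardstick only: is the change smaller than the run-to-run spread?
    if abs(delta) < max(std_a, std_b):
        within_noise.append(label)
    print(f"{label}: {delta:+.2f} (std {std_a:.2f} -> {std_b:.2f})")

print(improved)
print(within_noise)
```

Running more than ten splits per condition would narrow the spread and make it easier to tell a real effect of normalization from split-to-split variation.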
