In [2]:
%load_ext autoreload
%autoreload 2

In [21]:
import os, sys, ast
import ujson as json
from tqdm import tqdm
import random
import pandas as pd
import numpy as np
from pathlib import Path
from collections import defaultdict

from IPython.display import display, HTML

display(HTML("<style>.container { width:90% !important; }</style>"))
pd.options.display.max_colwidth = 500
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 5000)

### Load SQuAD + Entity Predictions

We are currently exploring how to add entity embeddings from our system Bootleg [https://github.com/HazyResearch/bootleg] to downstream tasks. The model analyzed here is a simple one we trained that adds Bootleg entity embeddings to SQuAD (this is the _very_ first attempt we've made at this). The goal is to do some error analysis to understand whether the entity embeddings are helping and, importantly, what we can do to improve performance.

In [66]:
dump = Path("predictions_dump_filtered_to_incorrect_preds.csv")
df = pd.read_csv(dump)
# The CSV stores arrays as strings, so we need to convert them back to lists ... this is HACKY
# Caveat: numpy prints a string that contains an apostrophe with double quotes, which the
# "' '" replacement misses, so such a mention appears to get merged with its neighbours
# (e.g. "super bowllevi's stadiumsan francisco bay area" in ent_mentions).
df["gold_answer"] = df["gold_answer"].apply(lambda x: ast.literal_eval(x.replace("\n", "").replace("' '", "', '")))
df["ent_mentions"] = df["ent_mentions"].apply(lambda x: ast.literal_eval(x.replace("\n", "").replace("' '", "', '")))
# ast.literal_eval only accepts Python literals, so it is a safer drop-in for eval on these strings
df["ent_qids"] = df["ent_qids"].apply(lambda x: ast.literal_eval(x.replace("\n", "").replace("' '", "', '")))
df["ent_qids_with_unk"] = df["ent_qids_with_unk"].apply(lambda x: ast.literal_eval(x.replace("\n", "").replace("' '", "', '")))
df["ent_spans"] = df["ent_spans"].apply(lambda x: ast.literal_eval(x.replace("\n", "").replace("array", ",").replace("[,", "[")))
df["gold_answer_start"] = df["gold_answer_start"].apply(lambda x: ast.literal_eval(" ".join(x.split()).replace("[ ", "[").replace(" ]", "]").replace(" ", ", ")))
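The string replacements above are sensitive to how numpy happened to print each array. As a rough alternative, here is a minimal sketch that pulls the quoted items out with a regular expression instead. The helper names `parse_str_array` and `parse_int_array` are made up for this illustration, and the sketch assumes that no single string contains both an apostrophe and a double quote (numpy would print such a string with backslash escapes, which this does not handle).

```python
import re

# Matches one single-quoted or one double-quoted item of a printed numpy string array.
_QUOTED = re.compile(r"'([^']*)'|\"([^\"]*)\"")


def parse_str_array(text):
    """Turn a printed string array such as "['a' 'b'\n 'c']" into a list of str.

    Hypothetical helper; does not handle backslash-escaped quotes.
    """
    items = []
    for match in _QUOTED.finditer(text):
        single, double = match.groups()
        items.append(single if single is not None else double)
    return items


def parse_int_array(text):
    """Turn a printed integer array such as "[263 260 250]" into a list of int."""
    return [int(tok) for tok in re.findall(r"-?\d+", text)]


# Possible usage on the columns loaded above:
# df["ent_mentions"] = pd.read_csv(dump)["ent_mentions"].apply(parse_str_array)
```

Because each quoted item is matched on its own, a double-quoted mention next to single-quoted ones stays a separate list element instead of being concatenated.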

### Step 1: Look at a few examples
The goal is just to get a feel for what's going on. As you can see, the raw rows are hard to read. At the moment I just live with it, but I'll show some aggregate statistics later on.

In [67]:
df.sample(30)

Unnamed: 0,f1_score,input_text,prediction,gold_answer,ent_mentions,ent_qids,proportion_toks_as_ents,question,context,ent_qids_with_unk,ent_spans,example_id,gold_answer_start
1749,80.0,"[CLS] in which galleries are the french paintings donated by jones displayed? [SEP] several french paintings entered the collection as part of the 260 paintings and miniatures ( not all the works were french, for example carlo crivelli's virgin and child ) that formed part of the jones bequest of 1882 and as such are displayed in the galleries of continental art 1600 – 1800, including the portrait of francois, duc d'alencon by francois clouet, gaspard dughet and works by francois boucher inc...",continental art,"[continental art 1600–1800, of continental art 1600–1800, galleries of continental art]","[jones, miniatures, virgin and child ), jones, bequest, portrait, francois,, francois clouet,, gaspard dughet, francois boucher, portrait, madame de pompadour, jean francois de troy,, jean - baptiste]","[Q204943, Q282129, Q926743, Q216406, Q211557, Q134307, Q180932, Q336747, Q741375, Q180932, Q134307, Q188965, Q707729, Q347139]",0.101562,In which galleries are the French paintings donated by Jones displayed?,"Several French paintings entered the collection as part of the 260 paintings and miniatures (not all the works were French, for example Carlo Crivelli's Virgin and Child) that formed part of the Jones bequest of 1882 and as such are displayed in the galleries of continental art 1600–1800, including the portrait of François, Duc d'Alençon by François Clouet, Gaspard Dughet and works by François Boucher including his portrait of Madame de Pompadour dated 1758, Jean François de Troy, Jean-Bapti...","[UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q204943, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q282129, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q926743, Q926743, Q926743, Q926743, UNK, UNK, UNK, UNK, UNK, Q216406, Q211557, Q211557, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q134307, UNK, Q180932, Q180932, UNK, UNK, UNK, UNK, 
UNK, UNK, UNK, Q336747, Q336747,...","[[10, 11], [27, 28], [44, 48], [53, 54], [54, 56], [75, 76], [77, 79], [86, 91], [91, 95], [98, 101], [103, 104], [105, 111], [114, 119], [119, 122]]",5726f755708984140094d737,"[263, 260, 250]"
1686,80.0,"[CLS] what does phycoerytherin appear in? [SEP] phycobilins are a third group of pigments found in cyanobacteria, and glaucophyte, red algal, and cryptophyte chloroplasts. phycobilins come in all colors, though phycoerytherin is one of the pigments that makes many red algae red. phycobilins often organize into relatively large protein complexes about 40 nanometers across called phycobilisomes. like photosystem i and atp synthase, phycobilisomes jut into the stroma, preventing thylakoid stack...",red algae red,"[red algae, red algae, algae]","[pigments, cyanobacteria,, algal,, chloroplasts., colors,, pigments, algae, protein complexes, nanometers, atp, jut, stroma,, chloroplasts., chloroplasts, cyanobacteria, pigments]","[Q910979, Q93315, Q37868, Q47263, Q1075, Q161179, Q37868, Q420927, Q178674, Q80863, Q154845, Q557179, Q47263, Q47263, Q93315, Q910979]",0.135417,What does phycoerytherin appear in?,"Phycobilins are a third group of pigments found in cyanobacteria, and glaucophyte, red algal, and cryptophyte chloroplasts. Phycobilins come in all colors, though phycoerytherin is one of the pigments that makes many red algae red. Phycobilins often organize into relatively large protein complexes about 40 nanometers across called phycobilisomes. Like photosystem I and ATP synthase, phycobilisomes jut into the stroma, preventing thylakoid stacking in red algal chloroplasts. 
Cryptophyte chlor...","[UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q910979, Q910979, UNK, UNK, Q93315, Q93315, Q93315, Q93315, Q93315, Q93315, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q37868, Q37868, Q37868, UNK, UNK, UNK, UNK, UNK, Q47263, Q47263, Q47263, Q47263, Q47263, Q47263, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q1075, Q1075, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q161179, Q161179, UNK, UNK, UNK, UNK, Q37868, UNK, UNK, UN...","[[23, 25], [27, 33], [41, 44], [49, 55], [63, 65], [76, 78], [82, 83], [95, 97], [99, 102], [117, 118], [127, 129], [131, 135], [145, 151], [155, 160], [162, 167], [177, 179]]",5729714daf94a219006aa42f,"[217, 217, 221]"
1401,66.666667,"[CLS] what kind of protists are euglenophytes? [SEP] euglenophytes are a group of common flagellated protists that contain chloroplasts derived from a green alga. euglenophyte chloroplasts have three membranes — it is thought that the membrane of the primary endosymbiont was lost, leaving the cyanobacterial membranes, and the secondary host's phagosomal membrane. euglenophyte chloroplasts have a pyrenoid and thylakoids stacked in groups of three. starch is stored in the form of paramylon, wh...",flagellated,"[common flagellated, common flagellated, common flagellated]","[protists, protists, chloroplasts, alga., chloroplasts, membrane, membrane., chloroplasts, starch, membrane - bound]","[Q10892, Q10892, Q47263, Q37868, Q47263, Q29548, Q1587185, Q47263, Q41534, Q1587185]",0.083333,What kind of protists are Euglenophytes?,"Euglenophytes are a group of common flagellated protists that contain chloroplasts derived from a green alga. Euglenophyte chloroplasts have three membranes—it is thought that the membrane of the primary endosymbiont was lost, leaving the cyanobacterial membranes, and the secondary host's phagosomal membrane. Euglenophyte chloroplasts have a pyrenoid and thylakoids stacked in groups of three. Starch is stored in the form of paramylon, which is contained in membrane-bound granules in the cyto...","[UNK, UNK, UNK, UNK, Q10892, Q10892, Q10892, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q10892, Q10892, Q10892, UNK, UNK, Q47263, Q47263, Q47263, Q47263, Q47263, UNK, UNK, UNK, UNK, Q37868, Q37868, Q37868, UNK, UNK, UNK, UNK, UNK, Q47263, Q47263, Q47263, Q47263, Q47263, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q29548, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, ...","[[4, 7], [28, 31], [33, 38], [42, 45], [50, 55], [64, 65], [95, 97], [102, 107], [123, 125], [139, 142]]",572962953f37b319004782f6,"[29, 29, 29]"
1768,80.0,"[CLS] in the case geven v land nordrhein - westfalen, how many hours was the dutch woman in question working in germany? [SEP] economic situation "" of the netherlands. conversely, in geven v land nordrhein - westfalen the court of justice held that a dutch woman living in the netherlands, but working between 3 and 14 hours a week in germany, did not have a right to receive german child benefits, even though the wife of a man who worked full - time in germany but was resident in austria could...",between 3 and 14,"[between 3 and 14 hours a week, 3 and 14 hours a week, between 3 and 14 hours a week]","[nordrhein - westfalen,, netherlands., nordrhein - westfalen, court of justice, netherlands,, child benefits,, germany, austria, free movement, "" public policy,, public security, public health "",, public service "".]","[Q1198, Q55, Q1198, Q1518827, Q55, Q1455934, Q183, Q40, Q1344824, Q1156854, Q294240, Q189603, Q11771944]",0.111979,"In the case Geven v Land Nordrhein-Westfalen, how many hours was the Dutch woman in question working in Germany?","The Free Movement of Workers Regulation articles 1 to 7 set out the main provisions on equal treatment of workers. First, articles 1 to 4 generally require that workers can take up employment, conclude contracts, and not suffer discrimination compared to nationals of the member state. In a famous case, the Belgian Football Association v Bosman, a Belgian footballer named Jean-Marc Bosman claimed that he should be able to transfer from R.F.C. 
de Liège to USL Dunkerque when his contract finish...","[UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q1198, Q1198, Q1198, Q1198, Q1198, Q1198, Q1198, Q1198, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q55, Q55, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q1198, Q1198, Q1198, Q1198, Q1198, Q1198, Q1198, UNK, Q1518827, Q1518827, Q1518827, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q55, Q55, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q1455934, Q1455934, Q1455...","[[8, 16], [35, 37], [44, 51], [52, 55], [63, 65], [85, 88], [101, 102], [106, 107], [115, 117], [127, 131], [131, 133], [134, 138], [156, 160]]",5726bc1add62a815002e8eaa,"[2922, 2930, 2922]"
459,0.0,"[CLS] what can amyloplasts become? [SEP] plastid differentiation is not permanent, in fact many interconversions are possible. chloroplasts may be converted to chromoplasts, which are pigment - filled plastids responsible for the bright colors seen in flowers and ripe fruit. starch storing amyloplasts can also be converted to chromoplasts, and it is possible for proplastids to develop straight into chromoplasts. chromoplasts and amyloplasts can also become chloroplasts, like what happens whe...",chloroplasts,"[chromoplasts, chromoplasts, chromoplasts]","[differentiation, chloroplasts, colors, flowers, fruit., starch, chloroplasts,, carrot, potato, chloroplasts, chloroplast,]","[Q210861, Q47263, Q1075, Q506, Q1364, Q41534, Q47263, Q81, Q10998, Q47263, Q47263]",0.080729,What can amyloplasts become?,"Plastid differentiation is not permanent, in fact many interconversions are possible. Chloroplasts may be converted to chromoplasts, which are pigment-filled plastids responsible for the bright colors seen in flowers and ripe fruit. Starch storing amyloplasts can also be converted to chromoplasts, and it is possible for proplastids to develop straight into chromoplasts. Chromoplasts and amyloplasts can also become chloroplasts, like what happens when a carrot or a potato is illuminated. If a...","[UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q210861, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q47263, Q47263, Q47263, Q47263, Q47263, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q1075, UNK, UNK, Q506, UNK, UNK, Q1364, Q1364, Q41534, Q41534, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK...","[[14, 15], [29, 34], [56, 57], [59, 60], [62, 64], [64, 66], [116, 122], [127, 128], [130, 131], [157, 162], [176, 182]]",572977fbaf94a219006aa4af,"[285, 285, 285]"
117,0.0,"[CLS] how did the yuan come to have the 4 schools of medicine? [SEP] the chinese medical tradition of the yuan had "" four great schools "" that the yuan inherited from the jin dynasty. all four schools were based on the same intellectual foundation, but advocated different theoretical approaches toward medicine. under the mongols, the practice of chinese medicine spread to other parts of the empire. chinese physicians were brought along military campaigns by the mongols as they expanded towar...",based on the same intellectual foundation,"[inherited from the Jin dynasty, inherited from the Jin dynasty, inherited from the Jin dynasty]","[yuan, schools of medicine?, tradition, yuan, yuan, jin dynasty., medicine., mongols,, chinese medicine, mongols, acupuncture,, moxibustion,, elixirs, middle east, medical advances, yuan period., the physician, suspension, joints,, anesthetics., the mongol, diet, treatise.]","[Q7313, Q494230, Q1055765, Q7313, Q7313, Q5066, Q200253, Q12557, Q200253, Q7313, Q121713, Q937737, Q7005626, Q7204, Q380274, Q7313, Q687787, Q1188533, Q9644, Q4990531, Q733059, Q474191, Q3267928]",0.138021,How did the Yuan come to have the 4 schools of medicine?,"The Chinese medical tradition of the Yuan had ""Four Great Schools"" that the Yuan inherited from the Jin dynasty. All four schools were based on the same intellectual foundation, but advocated different theoretical approaches toward medicine. Under the Mongols, the practice of Chinese medicine spread to other parts of the empire. Chinese physicians were brought along military campaigns by the Mongols as they expanded towards the west. 
Chinese medical techniques such as acupuncture, moxibustio...","[UNK, UNK, UNK, UNK, Q7313, UNK, UNK, UNK, UNK, UNK, Q494230, Q494230, Q494230, Q494230, UNK, UNK, UNK, UNK, Q1055765, UNK, UNK, Q7313, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q7313, UNK, UNK, UNK, Q5066, Q5066, Q5066, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q200253, Q200253, UNK, UNK, Q12557, Q12557, UNK, UNK, UNK, Q200253, Q200253, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q7313, UNK, UNK, UNK, UNK, UNK...","[[4, 5], [10, 14], [18, 19], [21, 22], [30, 31], [34, 37], [54, 56], [58, 60], [63, 65], [82, 83], [95, 100], [100, 105], [113, 116], [121, 123], [131, 133], [137, 140], [140, 142], [154, 155], [162, 164], [168, 173], [173, 175], [185, 186], [191, 193]]",572881704b864d1900164a51,"[81, 81, 81]"
1711,80.0,"[CLS] what city did super bowl 50 take place in? [SEP] super bowl 50 was an american football game to determine the champion of the national football league ( nfl ) for the 2015 season. the american football conference ( afc ) champion denver broncos defeated the national football conference ( nfc ) champion carolina panthers 24 – 10 to earn their third super bowl title. the game was played on february 7, 2016, at levi's stadium in the san francisco bay area at santa clara, california. as th...","Santa Clara, California","[Santa Clara, Santa Clara, Santa Clara]","[super bowl 50, super bowl 50, american football, national football league, ( nfl ), 2015 season., american football conference, ( afc ), denver broncos, national football conference, ( nfc ) champion, carolina panthers, super bowllevi's stadiumsan francisco bay area, santa clara, california., "" golden anniversary "", initiatives,, tradition, super bowl, roman numerals, "" super bowl l "" ),, arabic numerals]","[Q7642193, Q7642193, Q41323, Q1215884, Q319007, Q18698858, Q276530, Q431944, Q223507, Q319007, Q6591121, Q330120, Q32096, Q7419343, Q213205, Q159260, Q4948446, Q660064, Q6023792, Q32096, Q38918, Q7642193, Q29961325]",0.190104,What city did Super Bowl 50 take place in?,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. 
As this was the 50th Super Bowl, the league emphasized the ""golden a...","[UNK, UNK, UNK, UNK, Q7642193, Q7642193, Q7642193, UNK, UNK, UNK, UNK, UNK, Q7642193, Q7642193, Q7642193, UNK, UNK, Q41323, Q41323, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q1215884, Q1215884, Q1215884, Q319007, Q319007, Q319007, UNK, UNK, Q18698858, Q18698858, Q18698858, UNK, Q276530, Q276530, Q276530, Q431944, Q431944, Q431944, UNK, Q223507, Q223507, UNK, UNK, Q319007, Q319007, Q319007, Q6591121, Q6591121, Q6591121, Q6591121, Q330120, Q330120, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q32096, Q32096, U...","[[4, 7], [12, 15], [17, 19], [26, 29], [29, 32], [34, 37], [38, 41], [41, 44], [45, 47], [49, 52], [52, 56], [56, 58], [65, 67], [80, 84], [86, 90], [91, 96], [108, 112], [117, 119], [126, 127], [130, 132], [134, 138], [148, 155], [163, 167]]",56beace93aeaaa14008c91e1,"[403, 403, 403]"
918,40.0,"[CLS] how did luther's writings sound as he became less healthy? [SEP] his poor physical health made him short - tempered and even harsher in his writings and comments. his wife katharina was overheard saying, "" dear husband, you are too rude, "" and he responded, "" they are teaching me to be rude. "" in 1545 and 1546 luther preached three times in the market church in halle, staying with his friend justus jonas during christmas. [SEP]",short-tempered and even harsher,"[harsher, harsher, harsher]","[physical health, katharina, luther, church in, halle,, christmas.]","[Q12147, Q77239, Q9554, Q1318624, Q2814, Q19809]",0.03125,How did Luther's writings sound as he became less healthy?,"His poor physical health made him short-tempered and even harsher in his writings and comments. His wife Katharina was overheard saying, ""Dear husband, you are too rude,"" and he responded, ""They are teaching me to be rude."" In 1545 and 1546 Luther preached three times in the Market Church in Halle, staying with his friend Justus Jonas during Christmas.","[UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q12147, Q12147, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q77239, Q77239, Q77239, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q9554, UNK, UNK, UNK, UNK, UNK, UNK, Q1318624, Q1318624, Q2814, Q2814, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q198...","[[17, 19], [36, 39], [73, 74], [80, 82], [82, 84], [92, 94]]",56f8c43d9b226e1400dd0f63,"[58, 58, 58]"
347,0.0,"[CLS] when do bathyctena chuni, euplokamis stationis and eurhamphaea vexilligera excrete secretions? [SEP] when some species, including bathyctena chuni, euplokamis stationis and eurhamphaea vexilligera, are disturbed, they produce secretions ( ink ) that luminesce at much the same wavelengths as their bodies. juveniles will luminesce more brightly in relation to their body size than adults, whose luminescence is diffused over their bodies. detailed statistical investigation has not suggeste...",When some species,"[are disturbed,, disturbed, are disturbed]","[chuni,, chuni,, ( ink ), juveniles, luminescencectenophores 'bioluminescence, correlation, mid - ocean, waters.]","[Q5371103, Q5371103, Q927860, Q1516282, Q184240, Q102778, Q179924, Q310486, Q6840885, Q202008]",0.075521,"When do bathyctena chuni, euplokamis stationis and eurhamphaea vexilligera excrete secretions?","When some species, including Bathyctena chuni, Euplokamis stationis and Eurhamphaea vexilligera, are disturbed, they produce secretions (ink) that luminesce at much the same wavelengths as their bodies. Juveniles will luminesce more brightly in relation to their body size than adults, whose luminescence is diffused over their bodies. 
Detailed statistical investigation has not suggested the function of ctenophores' bioluminescence nor produced any correlation between its exact color and any a...","[UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q5371103, Q5371103, Q5371103, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q5371103, Q5371103, Q5371103, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q927860, Q927860, Q927860, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q1516282, UNK...","[[7, 10], [43, 46], [71, 74], [88, 89], [106, 110], [126, 131], [131, 135], [138, 139], [162, 165], [165, 167]]",572686fc708984140094c8e6,"[97, 101, 97]"
490,0.0,"[CLS] what term do islamists think should be applied to them? [SEP] islamists have asked the question, "" if islam is a way of life, how can we say that those who want to live by its principles in legal, social, political, economic, and political spheres of life are not muslims, but islamists and believe in islamism, not [ just ] islam? "" similarly, a writer for the international crisis group maintains that "" the conception of'political islam'"" is a creation of americans to explain the irania...",political Islam,"[Muslims, Muslims]","[islamists, islamists, islam, way of life,, spheres, muslims,, islamists, islamism,, islam? "", international crisis group, ' political islam'"", americans, iranian islamic revolution, islam, fluke, arab nationalism, islam,, islamism,, explanation.]","[Q189746, Q189746, Q432, Q32090, Q12507, Q47740, Q189746, Q189746, Q432, Q1072857, Q3057291, Q846570, Q126065, Q432, Q1865281, Q114213, Q432, Q189746, Q7958]",0.114583,What term do Islamists think should be applied to them?,"Islamists have asked the question, ""If Islam is a way of life, how can we say that those who want to live by its principles in legal, social, political, economic, and political spheres of life are not Muslims, but Islamists and believe in Islamism, not [just] Islam?"" Similarly, a writer for the International Crisis Group maintains that ""the conception of 'political Islam'"" is a creation of Americans to explain the Iranian Islamic Revolution and apolitical Islam was a historical fluke of the ...","[UNK, UNK, UNK, UNK, Q189746, Q189746, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q189746, Q189746, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q432, UNK, UNK, Q32090, Q32090, Q32090, Q32090, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, UNK, Q12507, UNK, UNK, UNK, UNK, Q47740, Q47740, UNK, Q189746, Q189746, UNK, UNK, UNK, Q189746, Q189746, Q189746, UNK, UNK, UNK, UNK, Q432, Q432, Q432, UNK, UNK, UNK, 
UNK, UNK, UNK, Q1072857, Q1072857,...","[[4, 6], [14, 16], [23, 24], [26, 30], [54, 55], [59, 61], [62, 64], [67, 70], [74, 77], [83, 86], [92, 97], [101, 102], [105, 108], [112, 113], [116, 118], [131, 133], [148, 150], [151, 154], [156, 158]]",572ffabf04bcaa1900d76f9f,"[201, 201]"
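Since the raw rows are hard to read, one option is to print a single example in a compact layout. The sketch below is only an illustration: `format_example` is a hypothetical helper, and it assumes the row exposes the columns shown in the table above (`f1_score`, `question`, `prediction`, `gold_answer`, `ent_mentions`, `ent_qids`, `context`).

```python
import textwrap


def format_example(row, width=100):
    """Render one prediction row (a dict or a pandas Series) as readable text."""
    lines = [
        f"F1: {row['f1_score']:.1f}",
        f"Question:   {row['question']}",
        f"Prediction: {row['prediction']}",
        # Gold answers are often repeated across annotators, so show each once.
        f"Gold:       {sorted(set(row['gold_answer']))}",
        "Entities:",
    ]
    # Pair each mention with its predicted QID; zip stops at the shorter list.
    for mention, qid in zip(row["ent_mentions"], row["ent_qids"]):
        lines.append(f"  {mention!r} -> {qid}")
    lines.append("Context:")
    lines.append(
        textwrap.fill(row["context"], width=width, initial_indent="  ", subsequent_indent="  ")
    )
    return "\n".join(lines)


# Possible usage in the notebook:
# print(format_example(df.sample(1).iloc[0]))
```

Printing a handful of these side by side with the table tends to make it easier to spot whether the prediction and the gold answer differ only in span boundaries.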


### Step 2: Gather some simple aggregates

In [74]:
average_f1 = df["f1_score"].mean()
max_f1 = df["f1_score"].max()

# We approximate whether an entity was in the question by checking the first 7 tokens for a QID (UNK means no entity)
df["is_ent_in_question"] = df.apply(lambda x: any(a != "UNK" for a in x["ent_qids_with_unk"][:7]), axis=1)

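# Note: proportion_toks_as_ents is the fraction of tokens covered by entity mentions,
# so the "Num Ents" values below are proportions, not counts
average_num_ents = df["proportion_toks_as_ents"].mean()
max_num_ents = df["proportion_toks_as_ents"].max()
num_early_ent_qs = df["is_ent_in_question"].sum()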
# Bin f1 score for group by
df["f1_score_bin"] = df["f1_score"].apply(lambda x: int(x/10)*10)

grp_by = df[["f1_score_bin", "proportion_toks_as_ents", "is_ent_in_question"]].groupby(['f1_score_bin']).mean()

print(f"Average F1 {average_f1}, Max F1 {max_f1}")
print(f"Average Num Ents {average_num_ents}, Max Num Ents {max_num_ents}")
print(f"Num Early Entity Questions {num_early_ent_qs}")
display(grp_by)

Average F1 39.16742963701258, Max F1 96.7741935483871
Average Num Ents 0.1166794654797726, Max Num Ents 0.5963541666666666
Num Early Entity Questions 1073


f1_score_bin,proportion_toks_as_ents,is_ent_in_question
0,0.119593,0.553731
10,0.104026,0.554054
20,0.111824,0.544554
30,0.120525,0.535211
40,0.115291,0.491228
50,0.111199,0.531646
60,0.120515,0.509259
70,0.116102,0.595238
80,0.116995,0.548638
90,0.106729,0.483871


There are other metrics I'd consider computing:
* A distribution over the QIDs that appear in the questions. Are some more popular than others? Does popularity correlate with performance?
* A breakdown of performance by the length of the sentence/question.
* Where the answer is located with respect to the question.
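A rough sketch of how these could be computed with pandas is below. The function names and the `min_count` threshold are made up for illustration. The sketch counts QIDs over the whole input (`ent_qids`) rather than only the question, and it measures answer position as the earliest gold character offset divided by the context length. Both are simplifying assumptions.

```python
import pandas as pd


def qid_f1_table(df, min_count=5):
    """Per-QID example count and mean F1, counting each QID once per example."""
    pairs = df[["f1_score", "ent_qids"]].copy()
    pairs["ent_qids"] = pairs["ent_qids"].apply(lambda qids: sorted(set(qids)))
    # explode gives one row per (example, QID); examples without entities become NaN
    exploded = pairs.explode("ent_qids").dropna(subset=["ent_qids"])
    stats = exploded.groupby("ent_qids")["f1_score"].agg(["count", "mean"])
    return stats[stats["count"] >= min_count].sort_values("count", ascending=False)


def add_length_and_position(df):
    """Add question length (whitespace tokens) and relative answer position."""
    out = df.copy()
    out["question_len"] = out["question"].str.split().str.len()
    out["answer_rel_pos"] = out.apply(
        lambda r: min(r["gold_answer_start"]) / max(len(r["context"]), 1), axis=1
    )
    return out


# Possible usage in the notebook:
# display(qid_f1_table(df).head(20))
# extra = add_length_and_position(df)
# display(extra.groupby("f1_score_bin")[["question_len", "answer_rel_pos"]].mean())
```

Grouping the new columns by `f1_score_bin` keeps the output directly comparable with the table from Step 2.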

### Step 3: Add additional metadata

I often have metadata associated with an entity that may be of interest, and I'd like to add that to my analysis. I'm not going to show this, as it's the same as above except that it looks at accuracy over the metadata.

For example, suppose I have a mapping from each QID to the type of that entity (I normally keep this in a JSON file). I would then look up the types of the predicted entities and compute our F1 score with respect to the majority type in the sentence (or something like that).
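A minimal sketch of that idea follows. The mapping `qid2type`, the file name in the usage comment, and the `NO_TYPE` fallback label are all hypothetical. The only thing taken from the notebook is that `ent_qids` holds a list of predicted QIDs per example.

```python
from collections import Counter

import pandas as pd


def majority_type(qids, qid2type, default="NO_TYPE"):
    """Most frequent entity type among the QIDs that have a known type."""
    types = [qid2type[q] for q in qids if q in qid2type]
    if not types:
        return default
    return Counter(types).most_common(1)[0][0]


def f1_by_majority_type(df, qid2type):
    """Example count and mean F1 grouped by each example's majority entity type."""
    majority = df["ent_qids"].apply(lambda qids: majority_type(qids, qid2type))
    return df.groupby(majority.rename("majority_type"))["f1_score"].agg(["count", "mean"])


# Possible usage, assuming a JSON file that maps QID -> type (hypothetical path):
# with open("qid2type.json") as f:
#     qid2type = json.load(f)
# display(f1_by_majority_type(df, qid2type))
```

Types with only a few examples will give noisy means, so the `count` column is worth reading alongside `mean`.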

### Step 4: Share
The final step is to share the results with others. I typically plot the results or simply print tables and add them to a PowerPoint deck or even a Slack conversation.
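For the "print tables" route, a small sketch like the one below can turn an aggregate table into fixed-width text that pastes cleanly into a chat message. `table_as_text` is a hypothetical helper and the rounding to three decimals is an arbitrary choice.

```python
import pandas as pd


def table_as_text(table, decimals=3):
    """Round a DataFrame and render it as aligned plain text."""
    return table.round(decimals).to_string()


# Possible usage with the grouped table from Step 2:
# print(table_as_text(grp_by))
# grp_by.to_csv("f1_bins.csv")  # hypothetical file name, handy for slides
```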