# End-to-End NED Tutorial

In this tutorial, we walk through how to use Bootleg as an end-to-end pipeline to detect and label entities in a set of sentences. First, we show how to use Bootleg to detect and disambiguate mentions to entities. We then compare to an existing system named TAGME. Finally, we show how to use Bootleg to annotate individual sentences on the fly. 

To understand how Bootleg performs on more natural language than we find in Wikipedia, we hand-labelled the mentions and corresponding entities in 50 questions sampled from the [Natural Questions dataset (Google)](https://ai.google.com/research/NaturalQuestions).

### Requirements

You will need to download the following files for this notebook:
- Pretrained Bootleg model and config [here](https://bootleg-emb.s3.amazonaws.com/models/2020_12_09/bootleg_wiki.tar.gz)*
- Sample of Natural Questions with hand-labelled entities [here](https://bootleg-emb.s3.amazonaws.com/data/nq.tar.gz)
- Entity data [here](https://bootleg-emb.s3.amazonaws.com/data/wiki_entity_data.tar.gz)*
- Embedding data [here](https://bootleg-emb.s3.amazonaws.com/data/emb_data.tar.gz)*
- Pretrained BERT model [here](https://bootleg-emb.s3.amazonaws.com/pretrained_bert_models.tar.gz)*

*Same file as in the benchmark tutorial; it does not need to be re-downloaded.

For convenience, you can run the commands below (from the root directory of the repo) to download all the above files and unpack them to `models`, `data`, and `pretrained_bert_models` directories. It will take several minutes to download all the files. 

    bash download_model.sh 
    bash download_data.sh 
    bash download_bert.sh

In [1]:
import numpy as np 
import pandas as pd
import ujson
from utils import load_mentions, tagme_annotate

# set up logging
import sys
import logging
from importlib import reload
reload(logging)
logging.basicConfig(stream=sys.stdout, format='%(asctime)s %(message)s', level=logging.INFO)
logger = logging.getLogger(__name__)

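# set this to the directory that contains the downloaded `data`, `models`, and `pretrained_bert_models` folders
root_dir = ""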
cand_map = f'{root_dir}/data/wiki_entity_data/entity_mappings/alias2qids_wiki.json'

If you have a GPU with at least 12GB of memory available, set the below to `False` to run inference on a GPU. 

In [2]:
use_cpu = True
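
If you are unsure whether your machine meets the GPU requirement, a check along the following lines may help. This is only a sketch: it assumes PyTorch is installed, inspects only the first GPU, and compares the card's total memory (not the memory currently free) against 12 GB. The helper name `gpu_has_enough_memory` and the variable `suggested_use_cpu` are hypothetical and not part of Bootleg.

```python
def gpu_has_enough_memory(min_gb=12):
    """Return True if the first CUDA device reports at least `min_gb` GB in total."""
    try:
        import torch
    except ImportError:
        # Without PyTorch we cannot query the GPU, so fall back to CPU.
        return False
    if not torch.cuda.is_available():
        return False
    total_bytes = torch.cuda.get_device_properties(0).total_memory
    # Use decimal gigabytes, since cards are marketed that way.
    return total_bytes >= min_gb * 10**9


# Suggestion only: copy the value into `use_cpu` if you agree with it.
suggested_use_cpu = not gpu_has_enough_memory()
print(f"suggested use_cpu = {suggested_use_cpu}")
```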

## 1. Detect Mentions
Bootleg uses a simple mention extraction algorithm that extracts mentions using a given candidate map. We will use a Wikipedia candidate map that we mined using Wikipedia anchor links and Wikidata aliases for a total of ~8 million mentions (provided in the Requirements section of this notebook).

As input to the end-to-end pipeline, we assume a jsonlines file in which each line is a single dictionary with the key "sentence" whose value is the text of the sentence. For instance, you may have a file with the lines:

    {"sentence": "who did the voice of the magician in frosty the snowman"}
    {"sentence": "what is considered the outer banks in north carolina"}
    
Below, we have additional keys to keep track of the hand-labelled mentions, but this is purely for evaluating the quality of the end-to-end pipeline and is not needed in the common use cases of using Bootleg to detect and label mentions.
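
If your sentences live in a Python list, a file in this format can be produced with the standard `json` module. The sketch below is illustrative only; the helper names and the `my_sentences.jsonl` file name are hypothetical, and the file is written to a temporary directory so nothing in the repo is touched.

```python
import json
import os
import tempfile


def write_sentences_jsonl(sentences, out_path):
    """Write one {"sentence": ...} dictionary per line."""
    with open(out_path, "w", encoding="utf-8") as f:
        for text in sentences:
            f.write(json.dumps({"sentence": text}) + "\n")


def read_sentences_jsonl(in_path):
    """Read the sentences back, skipping blank lines."""
    with open(in_path, encoding="utf-8") as f:
        return [json.loads(line)["sentence"] for line in f if line.strip()]


sentences = [
    "who did the voice of the magician in frosty the snowman",
    "what is considered the outer banks in north carolina",
]
demo_path = os.path.join(tempfile.mkdtemp(), "my_sentences.jsonl")
write_sentences_jsonl(sentences, demo_path)
print(read_sentences_jsonl(demo_path))
```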

In [3]:
nq_sample_orig = f'{root_dir}/data/nq/test_natural_questions_50.jsonl'
nq_sample_bootleg = f'{root_dir}/data/nq/test_natural_questions_50_bootleg.jsonl'

In [4]:
from bootleg.extract_mentions import extract_mentions
extract_mentions(in_filepath=nq_sample_orig, out_filepath=nq_sample_bootleg, cand_map_file=cand_map, logger=logger)

2020-12-18 00:20:05,534 Loading candidate mapping...


100%|██████████| 8002525/8002525 [00:16<00:00, 485888.15it/s]

2020-12-18 00:20:22,008 Loaded candidate mapping with 8002525 aliases.





2020-12-18 00:20:35,476 Using 8 workers...
2020-12-18 00:20:35,477 Reading in /dfs/scratch0/lorr1/bootleg/bootleg-internal/tutorial_data/data/nq/test_natural_questions_50.jsonl
2020-12-18 00:20:35,771 Wrote out data chunks in 0.29s
2020-12-18 00:20:35,772 Calling subprocess...
2020-12-18 00:20:37,162 Merging files...
2020-12-18 00:20:37,210 Removing temporary files...
2020-12-18 00:20:37,394 Finished in 1.9379315376281738 seconds. Wrote out to /dfs/scratch0/lorr1/bootleg/bootleg-internal/tutorial_data/data/nq/test_natural_questions_50_bootleg.jsonl


By looking at a sample of the extracted mentions, we can compare the mention extraction phase to the hand-labelled mentions.

In [5]:
orig_mentions_df = load_mentions(nq_sample_orig)
bootleg_mentions_df = load_mentions(nq_sample_bootleg)

# join dataframes and sample
pd.merge(orig_mentions_df, bootleg_mentions_df, on=['sentence'], suffixes=['_hand', '_bootleg']).sample(15)

index,sentence,aliases_hand,spans_hand,aliases_bootleg,spans_bootleg
5,the u.s. supreme court hears appeals from circuit courts,"[u.s. supreme court, circuit courts]","[[1, 4], [7, 9]]","[us supreme court, circuit courts]","[[1, 4], [7, 9]]"
2,the nashville sound brought a polished and cosmopolitan sound to country music by,"[nashville sound, country music]","[[1, 3], [10, 12]]","[the nashville sound, cosmopolitan]","[[0, 3], [7, 8]]"
20,what is the worth of the catholic church,[catholic church],"[[6, 8]]",[catholic church],"[[6, 8]]"
23,where is israel located on the world map,"[israel, world map]","[[2, 3], [6, 8]]",[israel],"[[2, 3]]"
10,who plays norman bates in the tv show,[norman bates],"[[2, 4]]",[norman bates],"[[2, 4]]"
12,what was dennis hopper 's bike in easy rider,"[dennis hopper, easy rider]","[[2, 4], [7, 9]]","[dennis hopper, easy rider]","[[2, 4], [7, 9]]"
19,who played the bank robber in dirty harry,[dirty harry],"[[6, 8]]",[dirty harry],"[[6, 8]]"
21,the pair of hand drums used in indian classical music is called,[indian classical music],"[[7, 10]]",[indian classical music],"[[7, 10]]"
37,landmark supreme court cases dealing with the first amendment,"[supreme court, first amendment]","[[1, 3], [7, 9]]","[supreme court, first amendment]","[[1, 3], [7, 9]]"
48,what was the japanese motivation for bombing pearl harbor,"[japanese, pearl harbor]","[[3, 4], [7, 9]]",[pearl harbor],"[[7, 9]]"


In the sample above, we see that Bootleg generally detects the same mentions as the hand labels. However, Bootleg sometimes extracts extra mentions or misses some. This is expected: Bootleg's mention extractor finds all candidate mentions and then uses simple heuristics to filter out those unlikely to be entities. It is the job of the backbone model and postprocessing to filter out any remaining extra mentions, either by thresholding the prediction probability or by predicting a candidate that represents "No Candidate" (we refer to this as "NC").
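
To build intuition for candidate-map-based extraction, here is a toy extractor. It is not Bootleg's implementation: it assumes whitespace tokenization, a plain set of aliases instead of the real alias-to-QID map, greedy longest-match lookup, and no filtering heuristics. The function name `extract_mentions_toy` and the `max_len` parameter are hypothetical. Spans are word offsets with an exclusive end, which matches the tables above.

```python
def extract_mentions_toy(sentence, aliases, max_len=6):
    """Greedy longest-match lookup of word n-grams in an alias set."""
    words = sentence.split()
    found_aliases, found_spans = [], []
    start = 0
    while start < len(words):
        match_end = None
        # Try the longest n-gram first so "norman bates" beats "norman".
        for end in range(min(len(words), start + max_len), start, -1):
            if " ".join(words[start:end]) in aliases:
                match_end = end
                break
        if match_end is None:
            start += 1
        else:
            found_aliases.append(" ".join(words[start:match_end]))
            found_spans.append([start, match_end])
            start = match_end  # do not allow overlapping mentions
    return found_aliases, found_spans


toy_aliases = {"norman bates", "norman", "dennis hopper", "easy rider"}
print(extract_mentions_toy("who plays norman bates in the tv show", toy_aliases))
print(extract_mentions_toy("what was dennis hopper 's bike in easy rider", toy_aliases))
```

A lookup like this will happily return any n-gram that happens to be an alias, which is one reason extra mentions such as "cosmopolitan" can appear.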

## 2. Disambiguate Mentions to Entities

We run inference using a pretrained Bootleg model to disambiguate the extracted mentions to Wikidata QIDs. 

First, load the model config so we can set additional parameters and load the saved model during evaluation. We need to update the config parameters to point to the downloaded model checkpoint and data.

In [6]:
from bootleg import run
from bootleg.utils.parser_utils import get_full_config

config_path = f'{root_dir}/models/bootleg_wiki/bootleg_config.json'
config_args = get_full_config(config_path)

# decrease number of data threads as this is a small file
config_args.run_config.dataset_threads = 2

# set the model checkpoint path 
config_args.run_config.init_checkpoint = f'{root_dir}/models/bootleg_wiki/bootleg_model.pt'

# set the path for the entity db and candidate map
config_args.data_config.entity_dir = f'{root_dir}/data/wiki_entity_data'
config_args.data_config.alias_cand_map = 'alias2qids_wiki.json'

# set the data path for the Natural Questions sample
config_args.data_config.data_dir = f'{root_dir}/data/nq'

# to speed things up for the tutorial, we have already prepped the data with the mentions detected by Bootleg
config_args.data_config.test_dataset.file = 'test_natural_questions_50_bootleg.jsonl'

# set the embedding paths 
config_args.data_config.emb_dir =  f'{root_dir}/data/emb_data'
config_args.data_config.word_embedding.cache_dir =  f'{root_dir}/pretrained_bert_models'

# set the save directory 
config_args.run_config.save_dir = f'{root_dir}/results'

# set whether to run inference on the CPU
config_args.run_config.cpu = use_cpu

Run evaluation in `dump_embs` mode to dump predictions and contextualized entity embeddings. Note that this command is about 10 times slower using a notebook than on the command line. To speed up the next command, run the following on the command line first. Then come back and run the next cell.

```
python3 -m bootleg.run --mode dump_embs \
    --config_script <root_dir>/models/bootleg_wiki/bootleg_config.json \
    --run_config.dataset_threads 2 \
    --run_config.init_checkpoint <root_dir>/models/bootleg_wiki/bootleg_model.pt \
    --data_config.entity_dir <root_dir>/data/wiki_entity_data \
    --data_config.alias_cand_map alias2qids_wiki.json \
    --data_config.data_dir <root_dir>/data/nq \
    --data_config.test_dataset.file test_natural_questions_50_bootleg.jsonl \
    --data_config.emb_dir <root_dir>/data/emb_data \
    --data_config.word_embedding.cache_dir <root_dir>/pretrained_bert_models 
```

In [7]:
bootleg_label_file, bootleg_emb_file = run.model_eval(args=config_args, mode="dump_embs", logger=logger, is_writer=True)

2020-12-18 00:20:38,123 Loading entity_symbols...
2020-12-18 00:21:26,749 Loaded entity_symbols with 5310039 entities.
2020-12-18 00:21:28,078 Loading slices...
2020-12-18 00:22:47,182 Finished loading slices.
2020-12-18 00:23:07,719 Loading dataset...


Building alias table: 100%|██████████| 8002525/8002525 [07:59<00:00, 16705.09it/s]


2020-12-18 00:36:07,234 Finished loading dataset.
2020-12-18 00:36:11,548 Loading embeddings...
2020-12-18 00:36:35,719 Finished loading embeddings.
2020-12-18 00:36:35,816 Loading model from /dfs/scratch0/lorr1/bootleg/bootleg-internal/tutorial_data/models/bootleg_wiki/bootleg_model.pt...




2020-12-18 00:36:42,367 Successfully loaded model from /dfs/scratch0/lorr1/bootleg/bootleg-internal/tutorial_data/models/bootleg_wiki/bootleg_model.pt starting from checkpoint epoch 1 and step 0.
2020-12-18 00:36:42,431 ************************DUMPING PREDICTIONS FOR test_natural_questions_50_bootleg.jsonl************************
2020-12-18 00:36:42,506 64 samples, 4 batches, 49 len dataset
2020-12-18 00:36:47,073 Writing predictions to /dfs/scratch0/lorr1/bootleg/bootleg-internal/tutorial_data/results/20200914_104853/test_natural_questions_50_bootleg/eval/bootleg_model/bootleg_labels.jsonl...
2020-12-18 00:36:47,076 Total number of mentions across all sentences: 73


Reading values for marisa trie: 100%|██████████| 50/50 [00:00<00:00, 169672.49it/s]

2020-12-18 00:36:47,107 Merging sentences together with 2 processes. Starting pool



100%|██████████| 25/25 [00:00<00:00, 5221.47it/s]
100%|██████████| 24/24 [00:00<00:00, 47.08it/s]


2020-12-18 00:36:47,906 Time to merge sub-sentences 0.5228734016418457s


Reading values for marisa trie: 100%|██████████| 73/73 [00:00<00:00, 182469.72it/s]

2020-12-18 00:36:48,178 Starting to write files with 2 processes



Writing data: 100%|██████████| 25/25 [00:00<00:00, 8037.53it/s]
Writing data: 100%|██████████| 25/25 [00:00<00:00, 10172.45it/s]


2020-12-18 00:37:56,478 Time to write files 68.30824828147888s
2020-12-18 00:37:57,040 Saving contextual entity embeddings to /dfs/scratch0/lorr1/bootleg/bootleg-internal/tutorial_data/results/20200914_104853/test_natural_questions_50_bootleg/eval/bootleg_model/bootleg_embs.npy
2020-12-18 00:37:57,041 Wrote predictions to /dfs/scratch0/lorr1/bootleg/bootleg-internal/tutorial_data/results/20200914_104853/test_natural_questions_50_bootleg/eval/bootleg_model/bootleg_labels.jsonl


We can now evaluate the overall quality of the end-to-end pipeline via precision / recall metrics, where the *recall* indicates what proportion of the hand-labelled mentions Bootleg correctly detects and disambiguates, and *precision* indicates what proportion of the mentions that Bootleg labels are correct. For instance, if Bootleg only labelled the few mentions it was very confident in, then it would have a low recall and high precision.

To decide whether a predicted mention matches a hand-labelled mention span, we allow the left span boundary to differ by one word in either direction (e.g., 'the wizard of oz' and 'wizard of oz' are counted as the same mention).
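
The matching rule can be sketched as below. This is an assumption about the rule, not the code in `utils`: in particular, it assumes the right boundary must match exactly, since only the left boundary is described as flexible. The function name `spans_match` and the `left_slack` parameter are hypothetical.

```python
def spans_match(gold_span, pred_span, left_slack=1):
    """Return True if the spans end at the same word and their starts differ by at most `left_slack`."""
    gold_start, gold_end = gold_span
    pred_start, pred_end = pred_span
    return gold_end == pred_end and abs(gold_start - pred_start) <= left_slack


# "nashville sound" [1, 3] vs. "the nashville sound" [0, 3]: same mention
print(spans_match([1, 3], [0, 3]))
# "country music" [10, 12] vs. "cosmopolitan" [7, 8]: different mentions
print(spans_match([10, 12], [7, 8]))
```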

In [8]:
%load_ext autoreload
%autoreload 2
from utils import compute_precision_and_recall

bootleg_errors = compute_precision_and_recall(orig_label_file=nq_sample_orig, 
                                              new_label_file=bootleg_label_file, 
                                              threshold=0.0)

Recall: 0.68 (53/78)
Precision: 0.73 (53/73)
F1: 0.7
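
For reference, the three numbers above follow directly from the counts in parentheses. The sketch below recomputes them with the standard definitions; the helper name `precision_recall_f1` is hypothetical and is not the function in `utils`.

```python
def precision_recall_f1(num_correct, num_gold, num_pred):
    """Standard precision, recall, and F1 from raw counts."""
    precision = num_correct / num_pred if num_pred else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# 53 correct predictions, 78 hand-labelled mentions, 73 predicted mentions
p, r, f1 = precision_recall_f1(num_correct=53, num_gold=78, num_pred=73)
print(f"Recall: {r:.2f}  Precision: {p:.2f}  F1: {f1:.2f}")
# prints: Recall: 0.68  Precision: 0.73  F1: 0.70
```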


We analyze three classes of errors in the end-to-end pipeline below: 
1. *Missing mentions*: Fail to extract the mention 
2. *Wrong entity*: Correctly extract the mention but disambiguate to the wrong candidate  
3. *Extra mentions*: Label a mention that is not hand-labelled as a mention
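
To make the three buckets concrete, the sketch below classifies the predictions for a single sentence. It is a simplified stand-in for the analysis in `utils`, under stated assumptions: mentions are matched with a one-word tolerance on the left boundary and an exact right boundary, and each predicted mention can match at most one hand-labelled mention. The function name `classify_errors` is hypothetical. The example values are those of sentence 2 in the sample.

```python
def classify_errors(gold, pred, left_slack=1):
    """Bucket errors for one sentence; gold and pred are lists of (alias, span, qid)."""

    def same_mention(g_span, p_span):
        return g_span[1] == p_span[1] and abs(g_span[0] - p_span[0]) <= left_slack

    errors = {"missing_mention": [], "wrong_entity": [], "extra_mention": []}
    matched_pred = set()
    for g_alias, g_span, g_qid in gold:
        # Find the first unused prediction that covers the same mention.
        match = next(
            (i for i, (_, p_span, _) in enumerate(pred)
             if i not in matched_pred and same_mention(g_span, p_span)),
            None,
        )
        if match is None:
            errors["missing_mention"].append(g_alias)
            continue
        matched_pred.add(match)
        if pred[match][2] != g_qid:
            errors["wrong_entity"].append(g_alias)
    # Any prediction that matched no hand-labelled mention is an extra mention.
    for i, (p_alias, _, _) in enumerate(pred):
        if i not in matched_pred:
            errors["extra_mention"].append(p_alias)
    return errors


gold = [("nashville sound", [1, 3], "Q1751782"), ("country music", [10, 12], "Q83440")]
pred = [("the nashville sound", [0, 3], "Q30645502"), ("cosmopolitan", [7, 8], "Q190656")]
print(classify_errors(gold, pred))
```

Note that a single sentence can contribute to all three buckets at once, as this one does.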

In [9]:
pd.DataFrame(bootleg_errors['missing_mention'])

index,sent_idx,sentence,gold_aliases,gold_qids,gold_spans,pred_aliases,pred_spans,pred_qids,pred_probs,error
0,2,the nashville sound brought a polished and cosmopolitan sound to country music by,"[nashville sound, country music]","[Q1751782, Q83440]","[[1, 3], [10, 12]]","[the nashville sound, cosmopolitan]","[[0, 3], [7, 8]]","[Q30645502, Q190656]","[1.0, 0.836]",country music
1,11,hitchhiker 's guide to the galaxy slartibartfast quotes,"[hitchhiker 's guide to the galaxy, slartibartfast]","[Q25169, Q779920]","[[0, 6], [6, 7]]",[hitchhikers guide to the galaxy],"[[0, 6]]",[Q25169],[0.55],slartibartfast
2,16,where did britain create colonies for its empire,"[britain, empire]","[Q161885, Q8680]","[[2, 3], [7, 8]]",[its empire],"[[6, 8]]",[Q200464],[0.539],britain
3,18,1970 world cup semi final italy vs germany,"[1970 world cup, italy, germany]","[Q132664, Q676899, Q43310]","[[0, 3], [5, 6], [7, 8]]","[1970 world cup, germany]","[[0, 3], [7, 8]]","[Q132664, Q43310]","[0.967, 0.812]",italy
4,23,where is israel located on the world map,"[israel, world map]","[Q801, Q653848]","[[2, 3], [6, 8]]",[israel],"[[2, 3]]",[Q155321],[0.219],world map
5,30,what is the t rex name in land before time,"[t rex, land before time]","[Q14332, Q192403]","[[3, 5], [7, 10]]",[t rex],"[[3, 5]]",[Q14332],[0.964],land before time
6,35,reasons why south africa should include renewable energy in its energy mix,"[south africa, renewable energy]","[Q258, Q12705]","[[2, 4], [6, 8]]","[south africa, energy mix]","[[2, 4], [10, 12]]","[Q258, Q1341346]","[0.686, 1.0]",renewable energy
7,42,who proposed the coordinate system to describe the position of a point in a plane accurately,[coordinate system],[Q62912],"[[3, 5]]",[],[],[],[],coordinate system
8,43,when was last time england were in a world cup semi final,"[england, world cup]","[Q47762, Q19317]","[[4, 5], [8, 10]]",[england],"[[4, 5]]",[Q47762],[0.282],world cup
9,44,the representative of the british crown in nz,"[british crown, nz]","[Q21941952, Q664]","[[4, 6], [7, 8]]",[british crown],"[[4, 6]]",[Q21941952],[0.372],nz


These mentions were missed either because they are not in our candidate map or because they were filtered out during mention extraction.

In [10]:
pd.DataFrame(bootleg_errors['wrong_entity']).sample(5)

index,sent_idx,sentence,gold_aliases,gold_qids,gold_spans,pred_aliases,pred_spans,pred_qids,pred_probs,error
6,25,which of these was not an export of ancient greece,[ancient greece],[Q11772],"[[8, 10]]",[ancient greece],"[[8, 10]]",[Q1294184],[0.394],ancient greece
7,29,who plays claire underwood 's mom on house of cards,"[claire underwood, house of cards]","[Q14915624, Q3330940]","[[2, 4], [7, 10]]","[claire underwood, house of cards]","[[2, 4], [7, 10]]","[Q14915624, Q578361]","[1.0, 0.556]",house of cards
10,40,where does the last name vigil come from,[vigil],[Q16878937],"[[5, 6]]",[vigil],"[[5, 6]]",[Q1238731],[0.672],vigil
0,0,who did the voice of the magician in frosty the snowman,[frosty the snowman],[Q5506238],"[[8, 11]]","[magician, frosty the snowman]","[[6, 7], [8, 11]]","[Q148442, Q2569914]","[0.701, 0.949]",frosty the snowman
1,2,the nashville sound brought a polished and cosmopolitan sound to country music by,"[nashville sound, country music]","[Q1751782, Q83440]","[[1, 3], [10, 12]]","[the nashville sound, cosmopolitan]","[[0, 3], [7, 8]]","[Q30645502, Q190656]","[1.0, 0.836]",nashville sound


Some of Bootleg's errors come from predicting too general a candidate (e.g., house of cards -- structure made of playing cards -- instead of the political drama). Others are due to ambiguous sentences. A final bucket of errors suggests that we need to boost certain training signals -- this is an area we're actively pursuing in Bootleg with an investigation of model guidability!

In [11]:
pd.DataFrame(bootleg_errors['extra_mention']).sample(5)

index,sent_idx,sentence,gold_aliases,gold_qids,gold_spans,pred_aliases,pred_spans,pred_qids,pred_probs,error
3,24,who played smiley in tinker tailor soldier spy,[tinker tailor soldier spy],[Q681962],"[[4, 8]]","[smiley, tinker tailor soldier spy]","[[2, 3], [4, 8]]","[Q11241, Q582811]","[0.324, 0.532]",smiley
0,0,who did the voice of the magician in frosty the snowman,[frosty the snowman],[Q5506238],"[[8, 11]]","[magician, frosty the snowman]","[[6, 7], [8, 11]]","[Q148442, Q2569914]","[0.701, 0.949]",magician
4,27,i see the river tiber foaming with much blood,[river tiber],[Q13712],"[[3, 5]]","[river tiber, foaming]","[[3, 5], [5, 6]]","[Q13712, Q7243541]","[1.0, 1.0]",foaming
2,8,once upon a time season 6 episode list,[once upon a time season 6],[Q23301616],"[[0, 6]]","[once upon a time season 6, episode list]","[[0, 6], [6, 8]]","[Q23301616, Q7537343]","[1.0, 1.0]",episode list
1,2,the nashville sound brought a polished and cosmopolitan sound to country music by,"[nashville sound, country music]","[Q1751782, Q83440]","[[1, 3], [10, 12]]","[the nashville sound, cosmopolitan]","[[0, 3], [7, 8]]","[Q30645502, Q190656]","[1.0, 0.836]",cosmopolitan


We see that Bootleg may detect and label extraneous mentions that were not hand-labelled. Setting the threshold higher helps to reduce these predictions, as does using a 'NC' candidate for training, which Bootleg also supports. 
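
The effect of a threshold can be illustrated on two rows from the table above. This is a sketch of the idea only, not of how `compute_precision_and_recall` applies its `threshold` argument; the helper name `filter_by_threshold` is hypothetical, and it assumes a prediction is kept only when its probability is strictly greater than the threshold.

```python
def filter_by_threshold(aliases, qids, probs, threshold):
    """Keep only predictions whose probability is greater than `threshold`."""
    return [(a, q, p) for a, q, p in zip(aliases, qids, probs) if p > threshold]


# Sentence 24: the extra mention "smiley" has a low probability and is dropped.
kept_24 = filter_by_threshold(
    ["smiley", "tinker tailor soldier spy"], ["Q11241", "Q582811"], [0.324, 0.532], threshold=0.5
)
# Sentence 27: the extra mention "foaming" has probability 1.0 and survives.
kept_27 = filter_by_threshold(
    ["river tiber", "foaming"], ["Q13712", "Q7243541"], [1.0, 1.0], threshold=0.5
)
print(kept_24)
print(kept_27)
```

As the second sentence shows, thresholding cannot remove an extra mention that the model is confident about, which is where an 'NC' candidate can help.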

## 3. Compare to TAGME 

To get a sense of how Bootleg is doing compared to other systems, we evaluate [TAGME](https://arxiv.org/pdf/1006.3498.pdf), an existing tool to extract and disambiguate mentions. To run TAGME, you need to get a (free) authorization token. Instructions for obtaining a token are [here](https://sobigdata.d4science.org/web/tagme/tagme-help). You will need to verify your account and then follow the "access the VRE" link. We've also provided a file with pre-generated TAGME labels for a given threshold for download, in case you want to skip obtaining an authorization token.

We note that unlike TAGME, Bootleg also outputs contextual entity embeddings which can be loaded for use in downstream tasks (e.g. relation extraction, question answering). Check out the Entity Embedding tutorial for more details! 

In [12]:
import tagme
# Set the authorization token for subsequent calls.
tagme.GCUBE_TOKEN = ""

In [13]:
tagme_label_file = f'{root_dir}/data/nq/test_natural_questions_50_tagme.jsonl'

If you do not have a token, skip the cell below and load the pre-generated TAGME labels. If you do have a token, you can play with changing the threshold below and see how it affects the results. Increasing the threshold increases the precision but decreases the recall, as TAGME will label fewer mentions.

In [14]:
# We use a mapping from Wikipedia pageids to Wikidata QIDs to get the QIDs predicted by TAGME 
wpid2qid = ujson.load(open(f'{root_dir}/data/wiki_entity_data/entity_mappings/wpid2qid.json'))

# As the threshold increases, the precision increases, but the recall decreases
tagme_annotate(in_file=nq_sample_orig, out_file=tagme_label_file, threshold=0.3, wpid2qid=wpid2qid)

In [15]:
from utils import compute_precision_and_recall
tagme_errors = compute_precision_and_recall(orig_label_file=nq_sample_orig, 
                                            new_label_file=tagme_label_file)

Recall: 0.63 (49/78)
Precision: 0.58 (49/84)
F1: 0.6


On this sample, TAGME has lower recall (0.63 vs. 0.68) and lower precision (0.58 vs. 0.73) than Bootleg.

## 4. Annotate On-the-Fly

To annotate individual sentences with Bootleg, we also support an annotate-on-the-fly mode.

**Note that Annotator is not optimized and is only intended to be used for quick experimentation and for demos. We recommend using the above pipeline (`extract_mentions` and `model_eval` functions) for evaluating datasets. These functions leverage multiprocessing, caching of preprocessed data, and batching to speed up evaluation.**

To do this, we create an annotator object. This loads the model and entity databases. We use the `config_args` loaded from the previous step. Note it takes several minutes for the initial load of the model and the entity data. 

In [16]:
%load_ext autoreload
%autoreload 2
from bootleg.annotator import Annotator

ann = Annotator(config_args=config_args, cand_map=cand_map, device='cuda' if not use_cpu else 'cpu')

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
2020-12-18 00:38:42,703 Reading entity database
2020-12-18 00:39:46,212 Reading word tokenizers
2020-12-18 00:39:46,279 Loading model
2020-12-18 00:40:02,269 Loading embeddings...
2020-12-18 00:40:26,525 Finished loading embeddings.
2020-12-18 00:40:27,087 Loading candidate map
2020-12-18 00:40:49,061 Loading candidate mapping...


100%|██████████| 8002525/8002525 [00:18<00:00, 443699.44it/s]

2020-12-18 00:41:07,100 Loaded candidate mapping with 8002525 aliases.
2020-12-18 00:41:20,004 Reading in alias table





Similar to TAGME, we allow setting a threshold so that only mentions whose predicted probability is greater than the threshold are returned.

In [17]:
ann.set_threshold(0.0)

Fill in sentences to see what Bootleg predicts! For each mention, Bootleg outputs
- QIDs (or "NC" for "No Candidate")
- probabilities
- QID title
- mention candidates
- mention candidate probabilities
- spans of mentions
- mentions

The QIDs map to Wikidata -- to look them up you can use https://www.wikidata.org/wiki/Q1454 and replace the QID. "NC" means Bootleg did not find a good match among the candidates in the candidate list given the context. 
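
A small helper can build these lookup URLs for you. This is a convenience sketch; the function name `wikidata_url` is hypothetical, and it returns `None` for "NC" since there is no entity page to link to.

```python
def wikidata_url(qid):
    """Return the Wikidata page for a QID, or None for the "NC" label."""
    if qid == "NC":
        return None
    return f"https://www.wikidata.org/wiki/{qid}"


print(wikidata_url("Q1454"))
print(wikidata_url("NC"))
```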

In [18]:
ann.label_mentions("where is the outer banks in north carolina")[:3]

Prepping data: 100%|██████████| 1/1 [00:00<00:00, 28.88it/s]
Evaluating model: 100%|██████████| 1/1 [00:00<00:00,  2.06it/s]


([['Q1517373', 'Q1454']],
 [[1.0, 0.9986315369606018]],
 [['Outer Banks', 'North Carolina']])
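
The sliced output above is a tuple of parallel nested lists (QIDs, probabilities, titles), with one inner list per sentence. For downstream use it can be convenient to zip these into one record per mention. The sketch below works on the literal output shown above rather than calling the annotator; the helper name `to_records` is hypothetical.

```python
def to_records(qids, probs, titles):
    """Turn parallel per-sentence lists into a list of per-mention dictionaries."""
    records = []
    for sent_qids, sent_probs, sent_titles in zip(qids, probs, titles):
        records.append([
            {"qid": q, "prob": p, "title": t}
            for q, p, t in zip(sent_qids, sent_probs, sent_titles)
        ])
    return records


# Literal copy of the first three outputs shown above.
output = (
    [["Q1517373", "Q1454"]],
    [[1.0, 0.9986315369606018]],
    [["Outer Banks", "North Carolina"]],
)
for mention in to_records(*output)[0]:
    print(mention)
```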

In [19]:
ann.label_mentions("cast of characters in fiddler on the roof")[:3]

Prepping data: 100%|██████████| 1/1 [00:00<00:00, 48.84it/s]
Evaluating model: 100%|██████████| 1/1 [00:00<00:00,  2.21it/s]


([['Q487330']], [[0.8821306228637695]], [['Fiddler on the Roof']])

Sometimes the entity disambiguation problem can be quite tricky -- in the above example we predict the musical "Fiddler on the Roof" instead of the hand-labelled movie (https://www.wikidata.org/wiki/Q934036). Giving additional cues may help though -- for instance, if we add "the movie" to the sentence, the prediction changes to the movie!

In [20]:
ann.label_mentions("cast of characters in the movie fiddler on the roof")[:3]

Prepping data: 100%|██████████| 1/1 [00:00<00:00, 35.71it/s]
Evaluating model: 100%|██████████| 1/1 [00:01<00:00,  1.08s/it]


([['Q934036']], [[0.7687084674835205]], [['Fiddler on the Roof (film)']])