# BERT (from HuggingFace Transformers) for Text Extraction

**Author:** [Apoorv Nandan](https://twitter.com/NandanApoorv)<br>
**Date created:** 2020/05/23<br>
**Last modified:** 2020/05/23<br>
**Description:** Fine tune pretrained BERT from HuggingFace Transformers on SQuAD.

## Introduction

This demonstration uses SQuAD (Stanford Question-Answering Dataset).
In SQuAD, an input consists of a question, and a paragraph for context.
The goal is to find the span of text in the paragraph that answers the question.
We evaluate our performance on this data with the "Exact Match" metric,
which measures the percentage of predictions that exactly match any one of the
ground-truth answers.

We fine-tune a BERT model to perform this task as follows:

1. Feed the context and the question as inputs to BERT.
2. Take two vectors S and T with dimensions equal to that of
   hidden states in BERT.
3. Compute the probability of each token being the start and end of
   the answer span. The probability of a token being the start of
   the answer is given by a dot product between S and the representation
   of the token in the last layer of BERT, followed by a softmax over all tokens.
   The probability of a token being the end of the answer is computed
   similarly with the vector T.
4. Fine-tune BERT and learn S and T along the way.

**References:**

- [BERT](https://arxiv.org/pdf/1810.04805.pdf)
- [SQuAD](https://arxiv.org/abs/1606.05250)


## Setup


In [1]:
!pip install transformers



In [2]:
import os
import re
import json
import string
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tokenizers import BertWordPieceTokenizer
from transformers import BertTokenizer, TFBertModel, BertConfig

EMBEDDING_SIZE = 300
max_len = 384
configuration = BertConfig()  # default parameters and configuration for BERT


## Set-up BERT tokenizer


In [3]:
# Save the slow pretrained tokenizer
slow_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
save_path = "bert_base_uncased/"
if not os.path.exists(save_path):
    os.makedirs(save_path)
slow_tokenizer.save_pretrained(save_path)

# Load the fast tokenizer from saved file
tokenizer = BertWordPieceTokenizer("bert_base_uncased/vocab.txt", lowercase=True)


## Load the data


In [4]:
"""
train_data_url = "https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json"
train_path = keras.utils.get_file("train.json", train_data_url)
eval_data_url = "https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json"
eval_path = keras.utils.get_file("eval.json", eval_data_url)
"""

'\ntrain_data_url = "https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json"\ntrain_path = keras.utils.get_file("train.json", train_data_url)\neval_data_url = "https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json"\neval_path = keras.utils.get_file("eval.json", eval_data_url)\n'

## Preprocess the data

1. Go through the JSON file and store every record as a `SquadExample` object.
2. Go through each `SquadExample` and create `x_train, y_train, x_eval, y_eval`.


In [72]:

class SquadExample:
    def __init__(self, question, context, start_char_idx, answer_text, all_answers):
        self.question = question
        self.context = context
        self.start_char_idx = start_char_idx
        self.answer_text = answer_text
        self.all_answers = all_answers
        self.skip = False

    def preprocess(self):
        context = self.context
        question = self.question
        answer_text = self.answer_text
        start_char_idx = self.start_char_idx

        # Clean context, answer and question
        context = " ".join(str(context).split())
        question = " ".join(str(question).split())
        answer = " ".join(str(answer_text).split())

        # Find end character index of answer in context
        end_char_idx = start_char_idx + len(answer)
        if end_char_idx >= len(context):
            self.skip = True
            return

        # Mark the character indexes in context that are in answer
        is_char_in_ans = [0] * len(context)
        for idx in range(start_char_idx, end_char_idx):
            is_char_in_ans[idx] = 1

        # Tokenize context
        tokenized_context = tokenizer.encode(context)

        # Find tokens that were created from answer characters
        ans_token_idx = []
        for idx, (start, end) in enumerate(tokenized_context.offsets):
            if sum(is_char_in_ans[start:end]) > 0:
                ans_token_idx.append(idx)

        if len(ans_token_idx) == 0:
            self.skip = True
            return

        # Find start and end token index for tokens from answer
        start_token_idx = ans_token_idx[0]
        end_token_idx = ans_token_idx[-1]

        # Tokenize question
        tokenized_question = tokenizer.encode(question)

        # Create inputs
        input_ids = tokenized_context.ids + tokenized_question.ids[1:]
        token_type_ids = [0] * len(tokenized_context.ids) + [1] * len(
            tokenized_question.ids[1:]
        )
        attention_mask = [1] * len(input_ids)

        #count = 0
        # Pad and create attention masks.
        # Skip if truncation is needed
        padding_length = max_len - len(input_ids)
        if padding_length > 0:  # pad
            input_ids = input_ids + ([0] * padding_length)
            attention_mask = attention_mask + ([0] * padding_length)
            token_type_ids = token_type_ids + ([0] * padding_length)
        elif padding_length < 0:  # skip
            #count += 1
            self.skip = True
            return

        self.input_ids = input_ids
        self.token_type_ids = token_type_ids
        self.attention_mask = attention_mask
        self.start_token_idx = start_token_idx
        self.end_token_idx = end_token_idx
        self.context_token_to_char = tokenized_context.offsets
        
       # print(count)




def create_squad_examples(raw_data):
    squad_examples = []
    squad_id = []
    for item in raw_data["data"]:
        for para in item["paragraphs"]:
            context = para["context"]
            for qa in para["qas"]:
                id = qa["id"]
                question = qa["question"]
                answer_text = qa["answers"][0]["text"]
                all_answers = [_["text"] for _ in qa["answers"]]
                start_char_idx = qa["answers"][0]["answer_start"]
                squad_eg = SquadExample(
                    question, context, start_char_idx, answer_text, all_answers
                )
                squad_id.append(id)
                squad_eg.preprocess()
                squad_examples.append(squad_eg)
    return squad_id, squad_examples


def create_inputs_targets(squad_examples):
    dataset_dict = {
        "input_ids": [],
        "token_type_ids": [],
        "attention_mask": [],
        "start_token_idx": [],
        "end_token_idx": [],
    }
    for item in squad_examples:
        if item.skip == False:
            for key in dataset_dict:
                dataset_dict[key].append(getattr(item, key))
    for key in dataset_dict:
        dataset_dict[key] = np.array(dataset_dict[key])

    x = [
        dataset_dict["input_ids"],
        dataset_dict["token_type_ids"],
        dataset_dict["attention_mask"],
    ]
    y = [dataset_dict["start_token_idx"], dataset_dict["end_token_idx"]]
    return x, y





In [74]:
! git clone https://github.com/amrlnic/SQuAD.git

fatal: destination path 'SQuAD' already exists and is not an empty directory.


In [75]:
"""
with open("SQuAD/BERT/BERT_train_set.json") as f:
    raw_train_data = json.load(f)
with open("SQuAD/BERT/BERT_valid_set.json") as f:
    raw_eval_data = json.load(f)
"""
with open("BERT_train_set2.json") as f:
    raw_train_data = json.load(f)
with open("BERT_valid_set2.json") as f:
    raw_eval_data = json.load(f)

train_id, train_squad_examples = create_squad_examples(raw_train_data)
x_train, y_train = create_inputs_targets(train_squad_examples)
print(f"{len(train_squad_examples)} training points created.")



eval_id, eval_squad_examples = create_squad_examples(raw_eval_data)
x_eval, y_eval = create_inputs_targets(eval_squad_examples)
print(f"{len(eval_squad_examples)} evaluation points created.")

69358 training points created.
17348 evaluation points created.


Create the Question-Answering Model using BERT and Functional API


In [8]:

def create_model():
    ## BERT encoder
    encoder = TFBertModel.from_pretrained("bert-base-uncased")

    ## QA Model
    input_ids = layers.Input(shape=(max_len,), dtype=tf.int32)
    token_type_ids = layers.Input(shape=(max_len,), dtype=tf.int32)
    attention_mask = layers.Input(shape=(max_len,), dtype=tf.int32)
    embedding = encoder(
        input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask
    )[0]

    start_logits = layers.Dense(1, name="start_logit", use_bias=False)(embedding)
    start_logits = layers.Flatten()(start_logits)

    end_logits = layers.Dense(1, name="end_logit", use_bias=False)(embedding)
    end_logits = layers.Flatten()(end_logits)

    start_probs = layers.Activation(keras.activations.softmax)(start_logits)
    end_probs = layers.Activation(keras.activations.softmax)(end_logits)

    model = keras.Model(
        inputs=[input_ids, token_type_ids, attention_mask],
        outputs=[start_probs, end_probs],
    )
    loss = keras.losses.SparseCategoricalCrossentropy(from_logits=False)
    optimizer = keras.optimizers.Adam(lr=5e-5)
    model.compile(optimizer=optimizer, loss=[loss, loss])
    return model



This code should preferably be run on Google Colab TPU runtime.
With Colab TPUs, each epoch will take 5-6 minutes.


In [9]:
use_tpu = True
if use_tpu:
    # Create distribution strategy
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)

    # Create model
    with strategy.scope():
        model = create_model()
else:
    model = create_model()

model.summary()


INFO:absl:Entering into master device scope: /job:worker/replica:0/task:0/device:CPU:0


INFO:tensorflow:Initializing the TPU system: grpc://10.14.10.242:8470


INFO:tensorflow:Initializing the TPU system: grpc://10.14.10.242:8470


INFO:tensorflow:Clearing out eager caches


INFO:tensorflow:Clearing out eager caches


INFO:tensorflow:Finished initializing TPU system.


INFO:tensorflow:Finished initializing TPU system.


INFO:tensorflow:Found TPU system:


INFO:tensorflow:Found TPU system:


INFO:tensorflow:*** Num TPU Cores: 8


INFO:tensorflow:*** Num TPU Cores: 8


INFO:tensorflow:*** Num TPU Workers: 1


INFO:tensorflow:*** Num TPU Workers: 1


INFO:tensorflow:*** Num TPU Cores Per Worker: 8


INFO:tensorflow:*** Num TPU Cores Per Worker: 8


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method


Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method


Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method




Cause: while/else statement not yet supported


Cause: while/else statement not yet supported


Cause: while/else statement not yet supported




Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 384)]        0                                            
__________________________________________________________________________________________________
input_3 (InputLayer)            [(None, 384)]        0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, 384)]        0                                            
__________________________________________________________________________________________________
tf_bert_model (TFBertModel)     TFBaseModelOutputWit 109482240   input_1[0][0]                    
                                                                 input_3[0][0]                

## Create evaluation Callback

This callback will compute the exact match score using the validation data
after every epoch.


In [24]:

def normalize_text(text):
    text = text.lower()

    # Remove punctuations
    exclude = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in exclude)

    # Remove articles
    regex = re.compile(r"\b(a|an|the)\b", re.UNICODE)
    text = re.sub(regex, " ", text)

    # Remove extra white space
    text = " ".join(text.split())
    return text


class ExactMatch(keras.callbacks.Callback):
    """
    Each `SquadExample` object contains the character level offsets for each token
    in its input paragraph. We use them to get back the span of text corresponding
    to the tokens between our predicted start and end tokens.
    All the ground-truth answers are also present in each `SquadExample` object.
    We calculate the percentage of data points where the span of text obtained
    from model predictions matches one of the ground-truth answers.
    """

    def __init__(self, x_eval, y_eval):
        self.x_eval = x_eval
        self.y_eval = y_eval

    def on_epoch_end(self, epoch, logs=None):
        pred_start, pred_end = self.model.predict(self.x_eval)
        count = 0
        eval_examples_no_skip = [_ for _ in eval_squad_examples if _.skip == False]
        for idx, (start, end) in enumerate(zip(pred_start, pred_end)):
            squad_eg = eval_examples_no_skip[idx]
            offsets = squad_eg.context_token_to_char
            start = np.argmax(start)
            end = np.argmax(end)
            if start >= len(offsets):
                continue
            pred_char_start = offsets[start][0]
            if end < len(offsets):
                pred_char_end = offsets[end][1]
                pred_ans = squad_eg.context[pred_char_start:pred_char_end]
            else:
                pred_ans = squad_eg.context[pred_char_start:]

            normalized_pred_ans = normalize_text(pred_ans)
            normalized_true_ans = [normalize_text(_) for _ in squad_eg.all_answers]
            if normalized_pred_ans in normalized_true_ans:
                count += 1
        acc = count / len(self.y_eval[0])
        print(f"\nepoch={epoch+1}, exact match score={acc:.2f}")



## Train and Evaluate


In [11]:
exact_match_callback = ExactMatch(x_eval, y_eval)
model.fit(
    x_train,
    y_train,
    epochs=1,  # For demonstration, 3 epochs are recommended
    verbose=1,
    batch_size=64,
    callbacks=[exact_match_callback],
)


Epoch 1/3





































epoch=1, exact match score=0.66
Epoch 2/3

epoch=2, exact match score=0.66
Epoch 3/3

epoch=3, exact match score=0.66


<tensorflow.python.keras.callbacks.History at 0x7f5d83d93750>

In [25]:
x_train[0].shape # INPUT_IDS
x_train[1].shape # TOKEN_TYPE_IDS
x_train[2].shape # ATTENTION_MASK

#x_train[0][0]     # INPUT_IDS
#x_train[1][0]     # TOKEN_TYPE_IDS
#x_train[2][0]     # ATTENTION_MASK

(68568, 384)

In [13]:
print("Model inputs:")
print()
print(x_train[0][0])
print()
print(x_train[1][0])
print()
print(x_train[2][0])

Model inputs:

[  101  1006  1996  5796  1020  1012  1015  8372  2006  2257  2382  1010
  2263  1999  2670 20980  2001  2025  2112  1997  2023  2186  2138  2009
  2001  3303  2011  1037  2367  6346  1012  2156  2263  6090 19436 14691
  8372  2005  4751  1012  1007   102  2339  2001  2009  2025  2443  1999
  1996  2186  1029   102     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0

In [14]:
a = model.predict([x_train[0][0:1],x_train[1][0:1],x_train[2][0:1]])
print(a)  # output shape : (2,384)

[array([[6.1769406e-08, 1.5845340e-04, 1.4947782e-03, 9.0557249e-04,
        3.0789844e-05, 1.1696299e-06, 5.0299918e-06, 1.5274632e-04,
        1.2590793e-05, 3.8245187e-05, 1.9947138e-06, 3.3946799e-07,
        1.4620039e-05, 2.1922810e-05, 1.5795887e-04, 6.4287665e-06,
        1.4152463e-05, 5.8257003e-05, 8.2178376e-06, 1.2058583e-06,
        6.7898113e-06, 1.0345827e-05, 7.2045815e-01, 1.6987588e-01,
        4.9786647e-03, 6.7692444e-02, 8.4279157e-04, 1.7256994e-02,
        1.1833601e-02, 2.2685963e-03, 1.8139956e-05, 5.1103775e-06,
        1.5396252e-04, 1.4527593e-03, 2.2533568e-06, 2.3347093e-06,
        1.8666953e-05, 9.0061673e-07, 2.4103811e-06, 2.0737916e-07,
        4.9990280e-07, 1.4832287e-05, 4.2984669e-07, 1.1212550e-07,
        8.3456982e-08, 1.4654078e-07, 1.5534831e-06, 2.3232158e-07,
        1.3832464e-07, 2.3247892e-07, 1.3661458e-08, 1.5982501e-05,
        1.1966400e-09, 1.1653186e-09, 1.1628662e-09, 1.1496303e-09,
        1.1570837e-09, 1.1713327e-09, 1.1276681

In [15]:
answer_bounds = model.predict([x_train[0][0:1],x_train[1][0:1],x_train[2][0:1]])
start=np.argmax(answer_bounds[0])
end=np.argmax(answer_bounds[1])
print(start,end)  # output shape : (2,384)

22 29


In [16]:
which_input = 31432

question_id = train_id[which_input]
print(question_id)

context = tokenizer.decode([x_train[0][which_input:which_input+1][0][k] for k in range(384) if x_train[1][which_input:which_input+1][0][k] == 0   ])
print(context)
print("\n")

question = tokenizer.decode([x_train[0][which_input:which_input+1][0][k] for k in range(384) if x_train[1][which_input:which_input+1][0][k] == 1   ])
print(question)
print("\n")

#######################################################################################
answer_bounds = model.predict([x_train[0][which_input:which_input+1],x_train[1][which_input:which_input+1],x_train[2][which_input:which_input+1]])
start=np.argmax(answer_bounds[0])
end=np.argmax(answer_bounds[1])
#######################################################################################


tokenized_answer = x_train[0][which_input:which_input+1][0][start:end+1]
#print(tokenized_answer)
decoded = tokenizer.decode(tokenized_answer)
print(decoded)

572b76f734ae481900deae31
incandescent lamps are nearly pure resistive loads with a power factor of 1. this means the actual power consumed ( in watts ) and the apparent power ( in volt - amperes ) are equal. incandescent light bulbs are usually marketed according to the electrical power consumed. this is measured in watts and depends mainly on the resistance of the filament, which in turn depends mainly on the filament's length, thickness, and material. for two bulbs of the same voltage, type, color, and clarity, the higher - powered bulb gives more light.


what does a power factor of 1 mean?


the actual power consumed ( in watts ) and the apparent power ( in volt - amperes ) are equal


In [18]:
print(len(predictions[0]))

17166


In [76]:
predictions = model.predict(x_eval) 

predictions2 = {}
for i in range(len(predictions[0])):
  start=np.argmax(predictions[0][i])
  end=np.argmax(predictions[1][i])
  tokenized_answer = x_eval[0][i:i+1][0][start:end+1]

  decoded = tokenizer.decode(tokenized_answer)

  predictions2[eval_id[i]] = decoded

##### Save model predictions on val set as a .JSON file  #####

import json

with open('pred.json', 'w') as fp:
    json.dump(predictions2, fp)

In [77]:
print(len(y_eval[0]))
print(y_eval[0][0])
print(np.argmax(predictions[0][0]))

17166
14
14


In [78]:
print(predictions2)
print(len(predictions2))

17166


evaluation script

Problem: missing prediction prevents the evaluation

In [88]:
!python3 SQuAD/evaluation/evaluate.py data.json pred.json

Missing prediction for 56d8e071dc89441400fdb376
Missing prediction for 56db0f17e7c41114004b4cea
Missing prediction for 56db6df1e7c41114004b50eb
Missing prediction for 56db2747e7c41114004b4e33
Missing prediction for 56db2747e7c41114004b4e31
Missing prediction for 56d8e665dc89441400fdb3ad
Missing prediction for 570da9c916d0071400510c75
Missing prediction for 570da9c916d0071400510c73
Missing prediction for 570de5f70b85d914000d7b9b
Missing prediction for 570de5f70b85d914000d7b9d
Missing prediction for 5732475e0fdd8d15006c68d1
Missing prediction for 56ded3dac65bf219000b3d5d
Missing prediction for 571ae22232177014007e9f8e
Missing prediction for 571ae22232177014007e9f8a
Missing prediction for 571aab8910f8ca14003052b7
Missing prediction for 57293c133f37b31900478148
Missing prediction for 56beb4023aeaaa14008c9252
Missing prediction for 56bfb502a10cfb140055125c
Missing prediction for 56d4d8702ccc5a1400d83296
Missing prediction for 56d4d8702ccc5a1400d83297
Missing prediction for 56bf97aba10cfb140

In [82]:
!python3 SQuAD/evaluation/evaluate.py BERT_valid_set2.json pred.json

Missing prediction for 570b28f1ec8fbc190045b89a
Missing prediction for 570b2ce2ec8fbc190045b8bc
Missing prediction for 570b1d3d6b8089140040f71e
Missing prediction for 570b2aab6b8089140040f7c6
Missing prediction for 570b2aab6b8089140040f7c5
Missing prediction for 570b2aab6b8089140040f7c2
Missing prediction for 570b33686b8089140040f806
Missing prediction for 570b2dfcec8fbc190045b8c3
Missing prediction for 570b2212ec8fbc190045b84b
Missing prediction for 570b1ef16b8089140040f73d
Missing prediction for 570b1ef16b8089140040f740
Missing prediction for 570b260c6b8089140040f792
Missing prediction for 570b20036b8089140040f751
Missing prediction for 570b20036b8089140040f750
Missing prediction for 570b23406b8089140040f773
Missing prediction for 570b23406b8089140040f772
Missing prediction for 570b29d26b8089140040f7b0
Missing prediction for 570b30386b8089140040f7f5
Missing prediction for 570b30386b8089140040f7f6
Missing prediction for 570b212d6b8089140040f75e
Missing prediction for 570b2f83ec8fbc190