# Question answering with BERT (HuggingFace)

Transformer models have revolutionized deep learning. Transformer-based models such as BERT are widely used in NLP because of the rich numerical representations of text they provide. Here we use HuggingFace's `transformers` library to conveniently explore transformer-based NLP models, and we train a question answering model on the well-known SQuAD v1 dataset.


<table align="left">
    <td>
        <a target="_blank" href="https://colab.research.google.com/github/thushv89/manning_tf2_in_action/blob/master/Ch13-Transormers-with-TF2-and-Huggingface/13.2_Question_answering_with_BERT.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
    </td>
</table>


## Import libraries

In [4]:
import random
import numpy as np
import transformers
from datasets import load_dataset
from transformers import DistilBertTokenizerFast
from transformers import DistilBertConfig, TFDistilBertForQuestionAnswering
import tensorflow as tf
import time

def fix_random_seed(seed):
    """ Setting the random seed of various libraries """
    try:
        np.random.seed(seed)
    except NameError:
        print("Warning: Numpy is not imported. Setting the seed for Numpy failed.")
    try:
        tf.random.set_seed(seed)
    except NameError:
        print("Warning: TensorFlow is not imported. Setting the seed for TensorFlow failed.")
    try:
        random.seed(seed)
    except NameError:
        print("Warning: random module is not imported. Setting the seed for random failed.")
    try:
        transformers.trainer_utils.set_seed(seed)
    except NameError:
        print("Warning: transformers module is not imported. Setting the seed for transformers failed.")
        
# Fixing the random seed
random_seed=4321
fix_random_seed(random_seed)


## Download the dataset

We will use the [SQuAD v1 dataset](https://rajpurkar.github.io/SQuAD-explorer/), a question answering dataset. Each example provides a question, a context (e.g. a paragraph in which the answer to the question can be found) and the answer. The goal is to predict the answer, given the question and the context.

In [5]:
# Section 13.3

from datasets import load_dataset

dataset = load_dataset("squad")

print(dataset)

Reusing dataset squad (C:\Users\thush\.cache\huggingface\datasets\squad\plain_text\1.0.0\d6ec3ceb99ca480ce37cdd35555d6cb2511d223b9150cce08a837ef62ffea453)



DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 87599
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 10570
    })
})


## Print the first 5 samples in the training set

In [6]:
for q, a in zip(dataset["train"]["question"][:5], dataset["train"]["answers"][:5]):
    print(f"{q} -> {a}")

To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France? -> {'text': ['Saint Bernadette Soubirous'], 'answer_start': [515]}
What is in front of the Notre Dame Main Building? -> {'text': ['a copper statue of Christ'], 'answer_start': [188]}
The Basilica of the Sacred heart at Notre Dame is beside to which structure? -> {'text': ['the Main Building'], 'answer_start': [279]}
What is the Grotto at Notre Dame? -> {'text': ['a Marian place of prayer and reflection'], 'answer_start': [381]}
What sits on top of the Main Building at Notre Dame? -> {'text': ['a golden statue of the Virgin Mary'], 'answer_start': [92]}


## Correcting incorrect offsets of the provided answers

Each answer is given as a starting index (`answer_start`) and the answer itself (`text`). We will add `answer_end`, the index at which the answer ends. It is exclusive, so `context[answer_start:answer_end]` equals `text`.

In [9]:
def compute_end_index(answers, contexts):
    """ Add end index to answers """
    
    fixed_answers = []
    for answer, context in zip(answers, contexts):

        gold_text = answer['text'][0]
        answer['text'] = gold_text
        start_idx = answer['answer_start'][0]
        answer['answer_start'] = start_idx
        
        # Make sure the starting index is valid and there is an answer
        assert start_idx >=0 and len(gold_text.strip()) > 0
        
        end_idx = start_idx + len(gold_text)        
        answer['answer_end'] = end_idx
        
        # Make sure the corresponding context matches the actual answer
        assert context[start_idx:end_idx] == gold_text
        
        fixed_answers.append(answer)
    
    return fixed_answers, contexts

train_questions = dataset["train"]["question"]
print("Training data corrections")
train_answers, train_contexts = compute_end_index(
    dataset["train"]["answers"], dataset["train"]["context"]
)
test_questions = dataset["validation"]["question"]
print("\nValidation data correction")
test_answers, test_contexts = compute_end_index(
    dataset["validation"]["answers"], dataset["validation"]["context"]
)

Training data corrections

Validation data correction
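As a small worked example of the end-index computation described above, the sketch below applies the same arithmetic to a single made-up record. The context, the answer and the helper name `add_end_index` are illustrative assumptions, not part of the dataset or the notebook.

```python
def add_end_index(answer, context):
    """Return a flattened copy of a SQuAD-style answer with an exclusive `answer_end`."""
    gold_text = answer["text"][0]
    start_idx = answer["answer_start"][0]
    # The end index is exclusive: start + number of characters in the answer
    end_idx = start_idx + len(gold_text)
    # The slice of the context must reproduce the answer text exactly
    assert context[start_idx:end_idx] == gold_text
    return {"text": gold_text, "answer_start": start_idx, "answer_end": end_idx}


# A made-up record that mimics the layout of dataset["train"]["answers"]
context = "The Eiffel Tower is located in Paris, France."
answer = {"text": ["Paris"], "answer_start": [context.index("Paris")]}

fixed = add_end_index(answer, context)
print(fixed)
print(context[fixed["answer_start"]:fixed["answer_end"]])
```

Unlike `compute_end_index`, this helper returns a new dictionary instead of modifying the answer in place, which keeps the toy input reusable.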


## Question answering with BERT

Next, we train a question answering model. The pretrained model we'll be using is [BERT](https://arxiv.org/pdf/1810.04805.pdf).

### Defining the tokenizer

In [10]:
from transformers import BertTokenizerFast
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

### Convert some text to tokens with the tokenizer

In [11]:
context = "This is the context"
question = "This is the question"

token_ids = tokenizer(context, question, return_tensors='tf')
print(token_ids)
print(tokenizer.convert_ids_to_tokens(token_ids['input_ids'].numpy()[0]))

{'input_ids': <tf.Tensor: shape=(1, 11), dtype=int32, numpy=array([[ 101, 2023, 2003, 1996, 6123,  102, 2023, 2003, 1996, 3160,  102]])>, 'token_type_ids': <tf.Tensor: shape=(1, 11), dtype=int32, numpy=array([[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]])>, 'attention_mask': <tf.Tensor: shape=(1, 11), dtype=int32, numpy=array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])>}
['[CLS]', 'this', 'is', 'the', 'context', '[SEP]', 'this', 'is', 'the', 'question', '[SEP]']
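The printed output shows how the two segments are laid out: `[CLS]`, the first text, `[SEP]`, the second text, `[SEP]`, with `token_type_ids` of 0 for the first segment and 1 for the second. The plain-Python sketch below reproduces only that layout. `build_pair_layout` is a hypothetical helper written for illustration; it does not call the tokenizer and performs no real tokenization.

```python
def build_pair_layout(first_tokens, second_tokens):
    """Mimic the BERT sentence-pair layout: [CLS] first [SEP] second [SEP]."""
    tokens = ["[CLS]"] + first_tokens + ["[SEP]"] + second_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the first text and its [SEP]
    first_len = len(first_tokens) + 2
    # Segment 1 covers the second text and the final [SEP]
    token_type_ids = [0] * first_len + [1] * (len(second_tokens) + 1)
    # Nothing is padded here, so every position is attended to
    attention_mask = [1] * len(tokens)
    return tokens, token_type_ids, attention_mask


tokens, token_type_ids, attention_mask = build_pair_layout(
    ["this", "is", "the", "context"], ["this", "is", "the", "question"]
)
print(tokens)
print(token_type_ids)
print(attention_mask)
```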


## Converting the inputs to tokens

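In addition to converting inputs to tokens and adding special tokens, the tokenizer truncates and pads the inputs. With `truncation=True`, sequences longer than the model's maximum length are cut to that length; with `padding=True`, shorter sequences are padded to the length of the longest sequence in the batch. You can check the maximum length with `tokenizer.model_max_length`.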

In [12]:
# Encode train data
# train_encodings -> transformers.tokenization_utils_base.BatchEncoding
train_encodings = tokenizer(train_contexts, train_questions, truncation=True, padding=True, return_tensors='tf')
print(f"train_encodings.shape: {train_encodings['input_ids'].shape}")
# Encode test data
test_encodings = tokenizer(test_contexts, test_questions, truncation=True, padding=True, return_tensors='tf')
print(f"test_encodings.shape: {test_encodings['input_ids'].shape}")


train_encodings.shape: (87599, 512)
test_encodings.shape: (10570, 512)
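To make the effect of truncation and padding concrete, here is a simplified plain-Python sketch on made-up token IDs. `truncate_and_pad` is a hypothetical helper, not the tokenizer's implementation: it simply cuts from the right and ignores the truncation strategies and special-token handling that the real tokenizer applies. The padding ID of 0 matches the `pad_token_id` of `bert-base-uncased`.

```python
def truncate_and_pad(batch, max_length, pad_id=0):
    """Cut every sequence to max_length, then pad to the longest remaining sequence."""
    truncated = [ids[:max_length] for ids in batch]
    longest = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in truncated]
    # 1 marks real tokens, 0 marks padding the model should ignore
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated]
    return input_ids, attention_mask


# Three made-up sequences of different lengths
batch = [[101, 7, 8, 102], [101, 7, 8, 9, 10, 11, 102], [101, 102]]
input_ids, attention_mask = truncate_and_pad(batch, max_length=5)

for ids, mask in zip(input_ids, attention_mask):
    print(ids, mask)
```

Every row ends up with the same length, which is why the encodings above have a rectangular shape such as `(87599, 512)`.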


### Dealing with truncated answers

In the original dataset, `answer_start` and `answer_end` denote the *character*-level position of the answer. The model, however, works with tokens, so we need the *token*-level position of the answer. For that, we use the `char_to_token` method of the encodings returned by the tokenizer, which converts a character index to a token index.

Because we enforce a maximum sequence length of 512, an answer is inevitably truncated away if it lies beyond the 512th token. Although this is rare, we still need to handle it, as it can otherwise result in numerical errors. Therefore, if a position is `None` (i.e. the answer could not be found), it is set to the last valid position, `tokenizer.model_max_length - 1`.

In [13]:
def replace_char_with_token_indices(encodings, answers):
    start_positions = []
    end_positions = []
    n_updates = 0
    # Go through all the answers
    for i in range(len(answers)):        
        
        # Get the token position for both start end char positions
        start_positions.append(encodings.char_to_token(i, answers[i]['answer_start']))
        end_positions.append(encodings.char_to_token(i, answers[i]['answer_end'] - 1))
        
        if start_positions[-1] is None or end_positions[-1] is None:
            n_updates += 1
        # if start position is None, the answer passage has been truncated
        # In the guide, https://huggingface.co/transformers/custom_datasets.html#qa-squad
        # they set it to model_max_length, but this will result in NaN losses as the last
        # available label is model_max_length-1 (zero-indexed)
        if start_positions[-1] is None:
            start_positions[-1] = tokenizer.model_max_length -1
            
        if end_positions[-1] is None:
            end_positions[-1] = tokenizer.model_max_length -1
            
    print(f"{n_updates}/{len(answers)} had answers truncated")
    encodings.update({'start_positions': start_positions, 'end_positions': end_positions})

replace_char_with_token_indices(train_encodings, train_answers)
replace_char_with_token_indices(test_encodings, test_answers)

10/87599 had answers truncated
8/10570 had answers truncated
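The character-to-token mapping and the fallback for truncated answers can be illustrated without the tokenizer. The sketch below uses whitespace-separated words as stand-in tokens and a made-up sentence. `char_to_token_toy` and `token_span` are hypothetical helpers that only mimic the behaviour described above; they are not the `char_to_token` implementation.

```python
def char_to_token_toy(offsets, char_idx):
    """Return the index of the token whose [start, end) span covers char_idx, else None."""
    for token_idx, (start, end) in enumerate(offsets):
        if start <= char_idx < end:
            return token_idx
    return None


def token_span(text, answer, max_length):
    """Map a character-level answer to token positions, with a fallback when truncated."""
    # Build (start, end) character offsets for whitespace-separated "tokens"
    offsets, pos = [], 0
    for word in text.split(" "):
        offsets.append((pos, pos + len(word)))
        pos += len(word) + 1
    kept = offsets[:max_length]  # simulate truncation to max_length tokens

    start_char = text.index(answer)
    end_char = start_char + len(answer)  # exclusive
    start_tok = char_to_token_toy(kept, start_char)
    end_tok = char_to_token_toy(kept, end_char - 1)

    # Truncated answers fall back to the last valid position (zero-indexed)
    if start_tok is None:
        start_tok = max_length - 1
    if end_tok is None:
        end_tok = max_length - 1
    return start_tok, end_tok


text = "the answer is forty two"
print(token_span(text, "forty two", max_length=5))  # answer fully kept
print(token_span(text, "forty two", max_length=3))  # answer truncated away
```

Note that the end position is looked up at `end_char - 1`, because the end index is exclusive and would otherwise point at the character after the answer.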


### Creating TensorFlow dataset

In [17]:
import tensorflow as tf
from functools import partial


def data_gen(input_ids, attention_mask, start_positions, end_positions):
    """ Generator for data """
    for inps, attn, start_pos, end_pos in zip(input_ids, attention_mask, start_positions, end_positions):
        
        yield (inps, attn), (start_pos, end_pos)
        
print("Creating train data")

# Define the generator as a callable (not the generator itself)
train_data_gen = partial(data_gen,
    input_ids=train_encodings['input_ids'], attention_mask=train_encodings['attention_mask'],
    start_positions=train_encodings['start_positions'], end_positions=train_encodings['end_positions']
)

# Define the dataset
train_dataset = tf.data.Dataset.from_generator(
    train_data_gen, output_types=(('int32', 'int32'), ('int32', 'int32'))
)
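# Shuffling the data. reshuffle_each_iteration=False keeps the shuffled order fixed,
# so the take/skip split below yields the same disjoint subsets in every epoch
train_dataset = train_dataset.shuffle(1000, reshuffle_each_iteration=False)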
print('\tDone')

# Valid set is taken as the first 10000 samples in the shuffled set
valid_dataset = train_dataset.take(10000)
valid_dataset = valid_dataset.batch(1)

# Rest is kept as the training data
train_dataset = train_dataset.skip(10000)
train_dataset = train_dataset.batch(1)

# Creating test data
print("Creating test data")

# Define the generator as a callable
test_data_gen = partial(data_gen,
    input_ids=test_encodings['input_ids'], attention_mask=test_encodings['attention_mask'],
    start_positions=test_encodings['start_positions'], end_positions=test_encodings['end_positions']
)
test_dataset = tf.data.Dataset.from_generator(
    test_data_gen, output_types=(('int32', 'int32'), ('int32', 'int32'))
)
test_dataset = test_dataset.batch(8)
print("\tDone")

Creating train data
	Done
Creating test data
	Done
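The reason for passing a callable rather than a generator object can be seen in plain Python: a generator is exhausted after one pass, whereas a callable built with `partial` produces a fresh generator every time it is called. The sketch below reuses the shape of `data_gen` with a few made-up values and does not involve TensorFlow.

```python
from functools import partial


def data_gen(input_ids, attention_mask, start_positions, end_positions):
    """Yield ((inputs, mask), (start, end)) tuples, one per example."""
    for inps, attn, start_pos, end_pos in zip(
        input_ids, attention_mask, start_positions, end_positions
    ):
        yield (inps, attn), (start_pos, end_pos)


# Made-up values standing in for the real encodings
toy_gen = partial(
    data_gen,
    input_ids=[[101, 5, 102], [101, 6, 102]],
    attention_mask=[[1, 1, 1], [1, 1, 1]],
    start_positions=[1, 1],
    end_positions=[1, 1],
)

# Each call creates a new generator, so the data can be replayed every epoch
first_pass = list(toy_gen())
second_pass = list(toy_gen())
print(first_pass == second_pass)

# A single generator object, by contrast, is empty after one pass
gen = toy_gen()
consumed = list(gen)
leftover = list(gen)
print(len(consumed), len(leftover))
```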


### Defining the model

Here we define a BERT model (specifically the TensorFlow variant, `TFBertForQuestionAnswering`).

In [16]:
from transformers import BertConfig,  TFBertForQuestionAnswering

config = BertConfig.from_pretrained("bert-base-uncased", return_dict=False)
print(config)
model =  TFBertForQuestionAnswering.from_pretrained("bert-base-uncased", config=config)

def tf_wrap_model(model):
    """ Wraps the Hugging Face model in a Keras Functional API model """
    
    # If this is not wrapped in a keras model by taking the correct tensors from
    # TFQuestionAnsweringModelOutput produced, you will get the following error
    # setting return_dict did not seem to work as it should
    
    # TypeError: The two structures don't have the same sequence type. 
    # Input structure has type <class 'tuple'>, while shallow structure has type 
    # <class 'transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput'>.
    
    # Define inputs
    input_ids = tf.keras.layers.Input([None,], dtype=tf.int32, name="input_ids")
    attention_mask = tf.keras.layers.Input([None,], dtype=tf.int32, name="attention_mask")
    
    # Define the output (TFQuestionAnsweringModelOutput)
    out = model([input_ids, attention_mask])
    
    # Get the correct attributes in the produced object to generate an output tuple
    wrap_model = tf.keras.models.Model([input_ids, attention_mask], outputs=(out.start_logits, out.end_logits))
    
    return wrap_model


# Define and compile the model

# Keras will assign a separate loss for each output and add them together. So we'll just use the standard CE loss
# instead of using the built-in model.compute_loss, which expects a dict of outputs and averages the two terms.
# Note that this means the loss will be 2x of when using TFTrainer since we're adding instead of averaging them.
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
acc = tf.keras.metrics.SparseCategoricalAccuracy()
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)

model_v2 = tf_wrap_model(model)
model_v2.compile(optimizer=optimizer, loss=loss, metrics=[acc])


BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "return_dict": false,
  "transformers_version": "4.15.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}



All model checkpoint layers were used when initializing TFBertForQuestionAnswering.

Some layers of TFBertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
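The comments in the cell above note that summing the start and end losses gives a value twice as large as averaging them. The sketch below checks that arithmetic on made-up logits, using a hand-written sparse cross-entropy in plain Python. The logits, labels and the helper `sparse_ce_from_logits` are illustrative assumptions, not the Keras implementation.

```python
import math


def sparse_ce_from_logits(logits, label):
    """Cross-entropy of one example: log-sum-exp of the logits minus the true-class logit."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


# Made-up logits over a four-token sequence, with the true start and end positions
start_logits = [2.0, 0.5, -1.0, 0.0]
end_logits = [0.0, 1.5, 0.2, -0.5]
start_pos, end_pos = 0, 1

start_loss = sparse_ce_from_logits(start_logits, start_pos)
end_loss = sparse_ce_from_logits(end_logits, end_pos)

summed = start_loss + end_loss          # one loss per output, added together
averaged = (start_loss + end_loss) / 2  # mean of the two terms

print(f"summed={summed:.4f} averaged={averaged:.4f} ratio={summed / averaged:.1f}")
```

The factor of two only rescales the loss value; it matters mainly when comparing loss curves across the two setups.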


### Training the model

In [18]:
import time

t1 = time.time()

model_v2.fit(
    train_dataset, 
    validation_data=valid_dataset,    
    epochs=3
)

t2 = time.time()

print(f"It took {t2-t1} seconds to complete the training")

Epoch 1/3


ResourceExhaustedError: 2 root error(s) found.
  (0) RESOURCE_EXHAUSTED:  OOM when allocating tensor with shape[1,12,512,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node model_1/tf_bert_for_question_answering_1/bert/encoder/layer_._5/attention/self/transpose_2
 (defined at C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\transformers\models\bert\modeling_tf_bert.py:244)
]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

	 [[sparse_categorical_crossentropy_1/cond/then/_12/sparse_categorical_crossentropy_1/cond/cond/then/_183/sparse_categorical_crossentropy_1/cond/cond/remove_squeezable_dimensions/Equal_1/_664]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) RESOURCE_EXHAUSTED:  OOM when allocating tensor with shape[1,12,512,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node model_1/tf_bert_for_question_answering_1/bert/encoder/layer_._5/attention/self/transpose_2
 (defined at C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\transformers\models\bert\modeling_tf_bert.py:244)
]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_36517]

Errors may have originated from an input operation.
Input Source operations connected to node model_1/tf_bert_for_question_answering_1/bert/encoder/layer_._5/attention/self/transpose_2:
In[0] model_1/tf_bert_for_question_answering_1/bert/encoder/layer_._5/attention/self/Reshape_2 (defined at C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\transformers\models\bert\modeling_tf_bert.py:241)	
In[1] model_1/tf_bert_for_question_answering_1/bert/encoder/layer_._5/attention/self/transpose_2/perm:

Operation defined at: (most recent call last)
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\runpy.py", line 193, in _run_module_as_main
>>>     "__main__", mod_spec)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\runpy.py", line 85, in _run_code
>>>     exec(code, run_globals)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\ipykernel_launcher.py", line 16, in <module>
>>>     app.launch_new_instance()
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\traitlets\config\application.py", line 846, in launch_instance
>>>     app.start()
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\ipykernel\kernelapp.py", line 677, in start
>>>     self.io_loop.start()
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\tornado\platform\asyncio.py", line 199, in start
>>>     self.asyncio_loop.run_forever()
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\asyncio\base_events.py", line 541, in run_forever
>>>     self._run_once()
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\asyncio\base_events.py", line 1786, in _run_once
>>>     handle._run()
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\asyncio\events.py", line 88, in _run
>>>     self._context.run(self._callback, *self._args)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\ipykernel\kernelbase.py", line 457, in dispatch_queue
>>>     await self.process_one()
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\ipykernel\kernelbase.py", line 446, in process_one
>>>     await dispatch(*args)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\ipykernel\kernelbase.py", line 353, in dispatch_shell
>>>     await result
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\ipykernel\kernelbase.py", line 648, in execute_request
>>>     reply_content = await reply_content
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\ipykernel\ipkernel.py", line 353, in do_execute
>>>     res = shell.run_cell(code, store_history=store_history, silent=silent)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\ipykernel\zmqshell.py", line 532, in run_cell
>>>     return super().run_cell(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\IPython\core\interactiveshell.py", line 2915, in run_cell
>>>     raw_cell, store_history, silent, shell_futures)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\IPython\core\interactiveshell.py", line 2960, in _run_cell
>>>     return runner(coro)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\IPython\core\async_helpers.py", line 78, in _pseudo_sync_runner
>>>     coro.send(None)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\IPython\core\interactiveshell.py", line 3186, in run_cell_async
>>>     interactivity=interactivity, compiler=compiler, result=result)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\IPython\core\interactiveshell.py", line 3377, in run_ast_nodes
>>>     if (await self.run_code(code, result,  async_=asy)):
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\IPython\core\interactiveshell.py", line 3457, in run_code
>>>     exec(code_obj, self.user_global_ns, self.user_ns)
>>> 
>>>   File "C:\Users\thush\AppData\Local\Temp/ipykernel_16572/1305528741.py", line 8, in <module>
>>>     epochs=3
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\engine\training.py", line 1216, in fit
>>>     tmp_logs = self.train_function(iterator)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\engine\training.py", line 878, in train_function
>>>     return step_function(self, iterator)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\engine\training.py", line 867, in step_function
>>>     outputs = model.distribute_strategy.run(run_step, args=(data,))
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\engine\training.py", line 860, in run_step
>>>     outputs = model.train_step(data)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\engine\training.py", line 808, in train_step
>>>     y_pred = self(x, training=True)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\engine\functional.py", line 452, in call
>>>     inputs, training=training, mask=mask)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\engine\functional.py", line 589, in _run_internal_graph
>>>     outputs = node.layer(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\transformers\models\bert\modeling_tf_bert.py", line 2123, in call
>>>     outputs = self.bert(
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\transformers\models\bert\modeling_tf_bert.py", line 868, in call
>>>     encoder_outputs = self.encoder(
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\transformers\models\bert\modeling_tf_bert.py", line 535, in call
>>>     for i, layer_module in enumerate(self.layer):
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\transformers\models\bert\modeling_tf_bert.py", line 541, in call
>>>     layer_outputs = layer_module(
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\transformers\models\bert\modeling_tf_bert.py", line 451, in call
>>>     self_attention_outputs = self.attention(
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\transformers\models\bert\modeling_tf_bert.py", line 367, in call
>>>     self_outputs = self.self_attention(
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\transformers\models\bert\modeling_tf_bert.py", line 283, in call
>>>     query_layer = self.transpose_for_scores(mixed_query_layer, batch_size)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\transformers\models\bert\modeling_tf_bert.py", line 244, in transpose_for_scores
>>>     return tf.transpose(tensor, perm=[0, 2, 1, 3])
>>> 

>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\transformers\models\bert\modeling_tf_bert.py", line 451, in call
>>>     self_attention_outputs = self.attention(
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\transformers\models\bert\modeling_tf_bert.py", line 367, in call
>>>     self_outputs = self.self_attention(
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\transformers\models\bert\modeling_tf_bert.py", line 283, in call
>>>     query_layer = self.transpose_for_scores(mixed_query_layer, batch_size)
>>> 
>>>   File "C:\Anaconda3\envs\packt.nlp.tf2\lib\site-packages\transformers\models\bert\modeling_tf_bert.py", line 244, in transpose_for_scores
>>>     return tf.transpose(tensor, perm=[0, 2, 1, 3])
>>> 

Function call stack:
train_function -> train_function


### Save the model

In [13]:
print(model_v2.summary())

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_ids (InputLayer)         [(None, None)]       0           []                               
                                                                                                  
 attention_mask (InputLayer)    [(None, None)]       0           []                               
                                                                                                  
 tf_bert_for_question_answering  TFQuestionAnswering  108893186  ['input_ids[0][0]',              
  (TFBertForQuestionAnswering)  ModelOutput(loss=No               'attention_mask[0][0]']         
                                ne, start_logits=(N                                               
                                one, None),                                                   

**Note**: We cannot save `model_v2` as is, because saving it raises an error about a missing config for the transformer model layer. Therefore, we will save just the transformer model layer, so that we can call the `tf_wrap_model()` function at any time to get the wrapped model back.

In [15]:
import os

# Create the output folders (no error if they already exist)
os.makedirs('models', exist_ok=True)
os.makedirs('tokenizers', exist_ok=True)

# Save only the transformer layer, not the Keras wrapper around it
model_v2.get_layer("tf_bert_for_question_answering").save_pretrained(os.path.join('models', 'bert_qa'))

# Save the tokenizer
tokenizer.save_pretrained(os.path.join('tokenizers', 'bert_qa'))



('tokenizers\\bert_qa\\tokenizer_config.json',
 'tokenizers\\bert_qa\\special_tokens_map.json',
 'tokenizers\\bert_qa\\vocab.txt',
 'tokenizers\\bert_qa\\added_tokens.json',
 'tokenizers\\bert_qa\\tokenizer.json')
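
Since only the transformer layer is saved, restoring the model means reloading that layer and wrapping it again. The following is a minimal sketch of the reload step, not the notebook's own code: the helper names `saved_dirs` and `load_saved_qa` are hypothetical, and it assumes the layer is a `TFBertForQuestionAnswering` instance (as the model summary suggests) and that `AutoTokenizer` can resolve the saved tokenizer files.

```python
import os

def saved_dirs(name, model_root="models", tokenizer_root="tokenizers"):
    """Return the (model_dir, tokenizer_dir) pair used when saving under `name`."""
    return os.path.join(model_root, name), os.path.join(tokenizer_root, name)

def load_saved_qa(name="bert_qa"):
    """Reload the transformer layer and tokenizer written by save_pretrained()."""
    # Imported lazily so the path helper above works without transformers installed
    from transformers import AutoTokenizer, TFBertForQuestionAnswering

    model_dir, tokenizer_dir = saved_dirs(name)
    # Assumption: the saved layer is a TFBertForQuestionAnswering model
    model = TFBertForQuestionAnswering.from_pretrained(model_dir)
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
    return model, tokenizer

print(saved_dirs("bert_qa"))
```

The reloaded transformer model can then be passed to the notebook's `tf_wrap_model()` function to rebuild the wrapped Keras model.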

### Testing on unseen data

In [14]:
model_v2.evaluate(test_dataset)



[2.4768142700195312,
 1.2703046798706055,
 1.2065105438232422,
 0.657048225402832,
 0.6935666799545288]
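
The five numbers returned by `evaluate()` are easier to read with labels. The sketch below pairs them with hypothetical names, assuming the usual Keras ordering for a two-output model: total loss, then the loss of each output, then the accuracy of each output. The compile call is not shown in this section, so the labels are an assumption. Keras exposes the actual names as `model_v2.metrics_names`.

```python
# Values copied from the evaluate() output above
results = [2.4768142700195312, 1.2703046798706055, 1.2065105438232422,
           0.657048225402832, 0.6935666799545288]

# Hypothetical labels: assumed order is total loss, per-output losses, per-output accuracies
names = ["total_loss", "start_loss", "end_loss", "start_accuracy", "end_accuracy"]
labeled = dict(zip(names, results))

for name, value in labeled.items():
    print(f"{name:>15}: {value:.4f}")

# Sanity check for the assumed order: the first value should equal the sum of the next two
loss_gap = abs(labeled["total_loss"] - (labeled["start_loss"] + labeled["end_loss"]))
print(f"total - (start + end) = {loss_gap:.2e}")
```

The first value does match the sum of the second and third to within floating-point error, which is consistent with the assumed ordering. Under that reading, the model places the answer start correctly for roughly 66% of the test examples and the answer end for roughly 69%.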

## Ask BERT a question ...

In [16]:
i = 5

# Define sample question
sample_q = test_questions[i]
# Define sample context
sample_c = test_contexts[i]
# Define sample answer 
sample_a = test_answers[i]

# Get the input in the format BERT accepts
sample_input = (test_encodings["input_ids"][i:i+1], test_encodings["attention_mask"][i:i+1])

def ask_bert(sample_input, tokenizer, model):
    """ Takes an (input_ids, attention_mask) input, a tokenizer and a model, and returns the predicted answer """
    out = model.predict(sample_input)
    # Most likely start and end token positions for the first (and only) example
    pred_ans_start = tf.argmax(out[0][0])
    pred_ans_end = tf.argmax(out[1][0])
    print(f"{pred_ans_start}-{pred_ans_end} token ids contain the answer")
    # Slice the answer span (end position inclusive) out of the input IDs
    ans_tokens = sample_input[0][0][pred_ans_start:pred_ans_end+1]
    
    return " ".join(tokenizer.convert_ids_to_tokens(ans_tokens))

print("Question")
print("\t", sample_q, "\n")
print("Context")
print("\t", sample_c, "\n")
print("Answer (char indexed)")
print("\t", sample_a, "\n")
print('='*50,'\n')

sample_pred_ans = ask_bert(sample_input, tokenizer, model_v2)

print("Answer (predicted)")
print(sample_pred_ans)
print('='*50,'\n')

Question
	 What was the theme of Super Bowl 50? 

Context
	 Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50. 

Answer (char indexed)
	 {'answer_start': 487, 'text': '"golden anniversary"', 'answer_end': 507} 


98-99 token ids contain the answer
Answer (predicted)
golden a
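
`ask_bert()` joins the predicted tokens with spaces, so any word that BERT's WordPiece tokenizer split into several pieces would appear as separate fragments, with continuation pieces carrying a `##` prefix. A minimal sketch of gluing such pieces back together is shown below. The helper `merge_wordpieces` and the example tokens are hypothetical and are not taken from the prediction above.

```python
def merge_wordpieces(tokens):
    """Join WordPiece tokens into words, attaching '##' continuation pieces to the previous token."""
    words = []
    for tok in tokens:
        if tok.startswith("##") and words:
            # Continuation piece: drop the '##' marker and append to the last word
            words[-1] += tok[2:]
        else:
            words.append(tok)
    return " ".join(words)

# Hypothetical token list for illustration
print(merge_wordpieces(["super", "bowl", "em", "##bed", "##ding"]))
# → super bowl embedding
```

Alternatively, the tokenizer's own `decode()` method can turn the answer token IDs back into text directly.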