In [0]:
# -----------------------------------------------------
# Natural Language Processing
# Assignment 2 - Automatic Sense Making and Explanation
# Task 1 - Validation
# Michael McAleer R00143621
# -----------------------------------------------------

# Note: This has been run and tested on Python 3.6 with TensorFlow 1.15.0

### 1. Install package dependencies

In [0]:
!pip install bert-tensorflow

Collecting bert-tensorflow
[?25l  Downloading https://files.pythonhosted.org/packages/a6/66/7eb4e8b6ea35b7cc54c322c816f976167a43019750279a8473d355800a93/bert_tensorflow-1.0.1-py2.py3-none-any.whl (67kB)
[K     |████▉                           | 10kB 25.2MB/s eta 0:00:01[K     |█████████▊                      | 20kB 1.7MB/s eta 0:00:01[K     |██████████████▋                 | 30kB 2.6MB/s eta 0:00:01[K     |███████████████████▍            | 40kB 1.7MB/s eta 0:00:01[K     |████████████████████████▎       | 51kB 2.1MB/s eta 0:00:01[K     |█████████████████████████████▏  | 61kB 2.5MB/s eta 0:00:01[K     |████████████████████████████████| 71kB 2.2MB/s 
Installing collected packages: bert-tensorflow
Successfully installed bert-tensorflow-1.0.1


### 2. Import libraries

In [0]:
import numpy as np
import os
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow.python.keras as keras

from bert.tokenization import FullTokenizer
from tqdm import tqdm_notebook

### 3. Set paths, download dependencies, set constants

In [0]:
# Set root path dependent on system
# If System is Windows, set ROOT_DIR as current working directory
if os.name == 'nt':
    ROOT_DIR = os.getcwd()
# Else running on CoLab, set ROOT_DIR to match environment path
else:
    from google.colab import drive

    drive.mount('/content/drive')
    ROOT_DIR = '/content/drive/My Drive/Colab Notebooks/'

# Paths to data and model output dir
DATA_DIR = '{root}/data'.format(root=ROOT_DIR)
TRAIN_DIR = '{data}/train'.format(data=DATA_DIR)
TEST_DIR = '{data}/test'.format(data=DATA_DIR)

# Params for bert model and tokenisation
BERT_PATH = 'https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1'
MAX_SEQ_LENGTH = 256

# Initialize TensorFlow session
session = tf.Session()

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


### 4. Import training and test data

In [0]:
# Set training data path
training_data_path = '{td}/subtaskA_data_all.csv'.format(td=TRAIN_DIR)
training_answers_path = '{td}/subtaskA_answers_all.csv'.format(td=TRAIN_DIR)
# Load training data
train_data = pd.read_csv(training_data_path, index_col='id')
train_answers = pd.read_csv(training_answers_path,
                            names=['id', 'most_unlikely'], index_col='id')

# Set test data path
test_data_path = '{td}/taskA_trial_data.csv'.format(td=TEST_DIR)
test_answers_path = '{td}/taskA_trial_answer.csv'.format(td=TEST_DIR)
# Load test data
test_data = pd.read_csv(test_data_path, index_col='id')
test_answers = pd.read_csv(training_answers_path,
                           names=['id', 'most_unlikely'], index_col='id')

# Add labels to the train and test data frames
train_data['most_unlikely'] = train_answers['most_unlikely']
test_data['most_unlikely'] = test_answers['most_unlikely']

### 5. Clean data before bert pre-processing

In [0]:
def clean_raw_data(data):
    """Given raw input data, normalise to lower case and remove all
    punctuation from the string.

    :param data: raw data -- pandas dataframe
    :return: normalised data -- pandas dataframe
    """
    # For each data frame column holding a sentence...
    for col in ['sent0', 'sent1']:
        # Normalise case...
        data[col] = data[col].apply(
            lambda x: ' '.join(w.lower() for w in x.split()))
        # Remove any punctuation or symbols...
        data[col] = data[col].str.replace(r'[^\w\s]', '')
    return data


# Clean both the training and test data - the model used is uncased so it is
# necessary to have our input data also uncased
train_data = clean_raw_data(train_data)
test_data = clean_raw_data(test_data)

### 6. Split training data into training and validation data

In [0]:
# Split -- 95%:5% 
train_size = train_data.shape[0]
split = int((train_size / 100) * 95)
train_split = train_data[:split]
val_split = train_data[split:]

# Output details of split process and new training dataset size
print('Initial training data size: {size}'.format(size=train_size))
print('Training Data Size: {size}'.format(size=train_split.shape[0]))
print('Validation Data: {size}'.format(size=val_split.shape[0]))

Initial training data size: 10000
Training Data Size: 9500
Validation Data: 500


### 7. Prepare data for conversion for Bert use

In [0]:
def transform_data_for_bert_use(data):
    """Take normalised input data frame, reduce the sequence length to half
    of the maxiumum sequence length -2 to allow for two large sentences and
    the required tags for bert to be added (three in total).

    :param data: normalised data -- pandas dataframe
    :return: sentence0, sentence1, labels -- np.array, np.array, list
    """
    def _inner_prep(data_column):
        # Convert the column to a list
        d_s = data[data_column].to_list()
        # Trim each of the column values to (max seq / 2) - 2) if they exceed
        # that size
        d_s = [' '.join(
            t.split()[0:int((MAX_SEQ_LENGTH / 2) - 2)]) for t in d_s]
        # Convert the list to a numpy array and convert to 2D
        d_s = np.array(d_s, dtype=object)[:, np.newaxis]
        return d_s

    return (_inner_prep('sent0'), _inner_prep('sent1'),
            data['most_unlikely'].tolist())


# Transform the training, validation and test data so it ready to become an
# input example and later converted to Bert feature
train_s0, train_s1, train_labels = transform_data_for_bert_use(train_split)
val_s0, val_s1, val_labels = transform_data_for_bert_use(val_split)
test_s0, test_s1, test_labels = transform_data_for_bert_use(test_data)

### 8. Get the Bert tokenizer from TensorFlow Hub

In [0]:
def create_bert_tokeniser():
    """Get the Bert tokeniser for the Bert model in use in this task,
    specifically the lower case vocabulary set.

    :return: Bert Tokeniser
    """
    # Define the Bert module in use
    bert_module = hub.Module(BERT_PATH)
    # Get the Bert tokeniser info
    tokeniser_info = bert_module(signature='tokenization_info',
                                 as_dict=True)
    # Get the lower case Bert tokeniser vocabulary
    vocab_file, do_lower_case = session.run([
        tokeniser_info['vocab_file'], tokeniser_info['do_lower_case']])

    return FullTokenizer(vocab_file=vocab_file, do_lower_case=do_lower_case)


# Instantiate tokenizer
bert_tokeniser = create_bert_tokeniser()

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore








### 9. Create Bert Input 'Examples'

In [0]:
class InputExample(object):
    """A single Bert input 'example'."""

    def __init__(self, text_a, text_b, label=None):
        """Construct an input example.

        :param text_a: The raw text of the first sentence -- str
        :param text_b: The raw text of the second sentence -- str
        :param label: The label of the example -- int [0, 1]
        """
        self.text_a = text_a
        self.text_b = text_b
        self.label = label


def convert_data_to_examples(sent0, sent1, labels):
    """Create input examples from entire input dataset.

    :param sent0: sentence 0 values -- list
    :param sent1: sentence 1 values -- list
    :param labels: sentence label -- list
    :return: InputExamples -- list
    """
    input_examples = list()
    # For each of the rows in the data
    for s0, s1, label in zip(sent0, sent1, labels):
        # Convert to InputExample object and add to list of examples
        input_examples.append(
            InputExample(text_a=' '.join(s0),
                         text_b=' '.join(s1),
                         label=label))

    return input_examples


# Convert data to InputExample format
train_examples = convert_data_to_examples(train_s0, train_s1, train_labels)
val_examples = convert_data_to_examples(val_s0, val_s1, val_labels)
test_examples = convert_data_to_examples(test_s0, test_s1, test_labels)

### 10. Convert input examples to Bert features

In [0]:
def convert_examples_to_features(tokeniser, examples, dataset):
    """Convert input examples into features ready for input into Bert using the
    loaded tokeniser.

    :param tokeniser: Bert tokeniser -- obj
    :param examples: input examples -- list
    :param dataset: dataset being converted -- str
    :return: Bert features (input IDs, input masks, segment IDs, labels)
             -- (np.array, np.array, np.array, np.array)
    """
    # Initialise lists to hold input ids, masks, segments, and labels
    i_ids, i_masks, s_ids, lbls = list(), list(), list(), list()
    # Message to return from progress bar
    msg = 'Feature conversion of {d} dataset in progress...'.format(d=dataset)
    # For each of the examples in the dataset (track progress with tqdm lib)
    for example in tqdm_notebook(examples, desc=msg):
        # Convert the example to a Bert feature
        i_id, i_mask, s_id, lbl = convert_single_example(tokeniser, example)
        # Append the feature values to the respective lists
        i_ids.append(i_id)
        i_masks.append(i_mask)
        s_ids.append(s_id)
        lbls.append(lbl)

    return (np.array(i_ids), np.array(i_masks),
            np.array(s_ids), np.array(lbls).reshape(-1, 1))


def convert_single_example(tokeniser, example):
    """Convert an input example into a Bert feature.

    :param tokeniser: Bert tokeniser -- obj
    :param example: input example -- obj
    :return: input ids, input mask, segment IDs, label -- list, list, list, int
    """
    # Tokenise sentence 0 with Bert tokeniser
    tokens_a = tokeniser.tokenize(example.text_a)
    # Tokenise sentence 1 with Bert tokeniser
    tokens_b = tokeniser.tokenize(example.text_b)
    # Initialise list to hold tokens and tags, and another for the segment IDs
    tokens, segment_ids = list(), list()
    # Start the tokens with the Bert sentence start tag [CLS]
    tokens.append('[CLS]')
    # Append 0 to segment IDs to indicate start of sentence 0
    segment_ids.append(0)
    # For each of the tokens in sentence 0
    for token in tokens_a:
        # Append the word token to the list of tokens
        tokens.append(token)
        # Append 0 to the list of segment IDs
        segment_ids.append(0)
    # After all sentence 0 tokens are processed add the sentence seperator
    # tag [SEP]
    tokens.append('[SEP]')
    # Append 0 to the list of segment IDs for the seperator tag, this is a Bert
    # model requirement
    segment_ids.append(0)
    # For each of the word tokens in sentence 1
    for token in tokens_b:
        # Append the word token to the list of tokens
        tokens.append(token)
        # Append 1 to the list of segment IDs to indicate second sentence
        segment_ids.append(1)
    # After all sentence 1 tokens are processed add the [CLS] tag to indicate
    # the end of the sentence
    tokens.append('[CLS]')
    # Append 1 to the segment IDs for the last token in the sequence
    segment_ids.append(1)
    # Convert the list of tokens to a list of integers indicating their index
    # in the Bert model vocabulary
    input_ids = tokeniser.convert_tokens_to_ids(tokens)
    # Create a mask equal to the length of the tokens with value 1 in each
    # position to tell Bert what tokens to pay attention to, 0 will be ignored
    input_mask = [1] * len(input_ids)
    # Pad each of the sequences with 0 values so they are equal in length to
    # the maximum sequence length of 256
    while len(input_ids) < MAX_SEQ_LENGTH:
        input_ids.append(0)
        input_mask.append(0)
        segment_ids.append(0)
    # Assert each of the sequences are equal to the maximum sequence length
    assert len(input_ids) == MAX_SEQ_LENGTH
    assert len(input_mask) == MAX_SEQ_LENGTH
    assert len(segment_ids) == MAX_SEQ_LENGTH

    return input_ids, input_mask, segment_ids, example.label


# Convert the training input examples into Bert input features
(train_input_ids, train_input_masks, train_segment_ids, train_labels) = (
    convert_examples_to_features(
        bert_tokeniser, train_examples, dataset='Training'))
# Convert the valdation input examples into Bert input features
(val_input_ids, val_input_masks, val_segment_ids, val_labels) = (
    convert_examples_to_features(
        bert_tokeniser, val_examples, dataset='Validation'))
# Convert the test input examples into Bert input features
(test_input_ids, test_input_masks, test_segment_ids, test_labels) = (
    convert_examples_to_features(
        bert_tokeniser, test_examples, dataset='Test'))

HBox(children=(IntProgress(value=0, description='Feature conversion of Training dataset in progress...', max=9…




HBox(children=(IntProgress(value=0, description='Feature conversion of Validation dataset in progress...', max…




HBox(children=(IntProgress(value=0, description='Feature conversion of Test dataset in progress...', max=2021,…




### 11. Transfer the Bert model from TensorFlow Hub and fine-tune for task

In [0]:
class BertLayer(tf.keras.layers.Layer):
    """The transferred Bert Model (uncased/large - L-12_H-768_A-12)."""

    def __init__(self, n_fine_tune_layers=3, **kwargs):
        """Initialise the Bert layer parameters.

        :param n_fine_tune_layers: How many layers of Bert model to
                                   fine-tune -- int
        :param kwargs: additional key-word arguements
        """
        # Amount of layers to fine tune
        self.n_fine_tune_layers = n_fine_tune_layers
        # Is model trainable
        self.trainable = True
        # Model output size
        self.output_size = 768
        # Path to Bert model in TF Hub
        self.bert_path = BERT_PATH
        # Super() the BertLayer
        super(BertLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        """Build the Bert model.
        
        :param input_shape: input feature shape -- int
        """
        # Load trainable Bert model
        self.bert = hub.Module(
            self.bert_path, trainable=self.trainable,
            name='bert_projectA_module')
        # Set model trainable layers
        trainable_layers = ['pooler/dense']
        # Remove unused layers
        trainable_vars = [
            var for var in self.bert.variables if '/cls/' not in var.name]
        # Select all the layers for fine-tuning
        for i in range(self.n_fine_tune_layers):
            trainable_layers.append('encoder/layer_{x}'.format(x=str(11 - i)))
        # Update trainable variables to contain only the specified layers
        trainable_vars = [
            var for var in trainable_vars if (
                any([l in var.name for l in trainable_layers]))]
        # Add trainable variables to trainable weights
        for var in trainable_vars:
            self._trainable_weights.append(var)
        # Add non-trainable variables to non-trainable weights
        for var in self.bert.variables:
            if var not in self._trainable_weights:
                self._non_trainable_weights.append(var)
        # Super() the Bert Layer to build with the input shape
        super(BertLayer, self).build(input_shape)

    def call(self, inputs):
        """Called after build, apply layers to input tensors.

        :param inputs: Bert model feature inputs -- tuple
        :return: Bert model -- TensorFlow model
        """
        # Cast all the inputs as integers
        inputs = [keras.backend.cast(x, dtype='int32') for x in inputs]
        # Extract layer inputs
        input_ids, input_mask, segment_ids = inputs
        # Create input feature dict
        bert_inputs = dict(
            input_ids=input_ids, input_mask=input_mask,
            segment_ids=segment_ids)
        # Return Bert model
        return self.bert(inputs=bert_inputs, signature='tokens',
                         as_dict=True)['pooled_output']

    def compute_output_shape(self, input_shape):
        """Shape transformation logic.

        :param input_shape: input shape -- int
        :return: input shape, output size -- int, int
        """
        return input_shape[0], self.output_size

### 12. Build & Train Model

In [0]:
def build_model():
    """Build the complete Bert transfer model with fine-tuning.

    :return: Bert model -- Keras Model
    """
    # Define input layer to hold feature input IDs
    in_id = tf.keras.layers.Input(shape=(MAX_SEQ_LENGTH,),
                                  name='input_ids')
    # Define input layer to hold feature input masks
    in_mask = tf.keras.layers.Input(shape=(MAX_SEQ_LENGTH,),
                                    name='input_masks')
    # Define input layer to hold feature input segment IDs
    in_segment = tf.keras.layers.Input(shape=(MAX_SEQ_LENGTH,),
                                       name='segment_ids')
    # Colate all input layers
    bert_inputs = [in_id, in_mask, in_segment]

    # A number of attempts made to further fine-tune and reduce over fitting
    # on training data, ultimately this was abandoned due to the impact on
    # accuracy whilst not having the same noticeable impact on loss reduction

    # Add the Bert model layer and the amount of layers to fine-tune, the
    # bert inputs will serve as the inputs into the Bert Layer
    bert_output = BertLayer(n_fine_tune_layers=3)(bert_inputs)
    # Add a dropout layer between the Bert model and the first fully connected
    # dense layer
    drop = tf.keras.layers.Dropout(0.1)(bert_output)
    # Add a fully connected dense layer with 256 neurons and relu activation
    # function
    dense = tf.keras.layers.Dense(256, activation='relu')(drop)

    # drop1 = tf.keras.layers.Dropout(0.2)(bert_output)
    # dense1 = tf.keras.layers.Dense(256, activation='relu')(drop1)
    # drop2 = tf.keras.layers.Dropout(0.1)(dense1)
    # dense2 = tf.keras.layers.Dense(128, activation='relu')(drop2)

    # Add a softmax classifier with 2 neurons to represent [0, 1] labels
    pred = tf.keras.layers.Dense(2, activation='softmax')(dense)
    # Initialise the Keras model, define the input layer and output layer, the
    # defined layers above handle the inter-connects
    bert_model = tf.keras.models.Model(inputs=bert_inputs, outputs=pred)
    bert_adam = keras.optimizers.Adam(lr=2e-5, decay=0.001)

    # bert_model.compile(loss='binary_crossentropy', optimizer=bert_adam,
    #                    metrics=['accuracy'])

    # Compile the model with loss function, optimiser and return accuracy
    bert_model.compile(loss='sparse_categorical_crossentropy',
                       optimizer=bert_adam, metrics=['accuracy'])
    # Output the model summary
    bert_model.summary()

    return bert_model


def initialise_session(s):
    """Iniitialise all the TensorFlow session variables.

    :param s: TensorFlow session -- obj
    """
    # Initialise local variables
    s.run(tf.local_variables_initializer())
    # Initialise global variables
    s.run(tf.global_variables_initializer())
    # Initialise TensorFlow tables
    s.run(tf.tables_initializer())
    # Set session as Keras backend session
    keras.backend.set_session(s)


# Initialise the Bert model
model = build_model()
# Instantiate variables
initialise_session(session)
# Fit the model on the training data and validate on the validation data
model.fit(
    [train_input_ids, train_input_masks, train_segment_ids], train_labels,
    validation_data=(
        [val_input_ids, val_input_masks, val_segment_ids], val_labels),
    epochs=3, batch_size=32)

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Instructions for updating:
If using Keras pass *_constraint arguments to layers.


Instructions for updating:
If using Keras pass *_constraint arguments to layers.


Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_ids (InputLayer)          [(None, 256)]        0                                            
__________________________________________________________________________________________________
input_masks (InputLayer)        [(None, 256)]        0                                            
__________________________________________________________________________________________________
segment_ids (InputLayer)        [(None, 256)]        0                                            
__________________________________________________________________________________________________
bert_layer (BertLayer)          (None, 768)          110104890   input_ids[0][0]                  
                                                                 input_masks[0][0]            

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


Train on 9500 samples, validate on 500 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7f1b4b707048>

### 13. Make predictions on the most unlikely sentence in the test data set

In [0]:
# Make predictions on the test dataset values
predictions = model.predict([test_input_ids,
                             test_input_masks,
                             test_segment_ids])

# Get the predicted label for each test dataset input
predictions_max = np.argmax(predictions, axis=1)

# Create answer dictionary for use by evaluation tool
answers = dict()
for i, pred in enumerate(predictions_max):
    answers[str(i + 1)] = str(pred)

# Output total predicted answer count
print(len(answers.keys()))

2021


### 14. Evaluate Results against gold labels

In [0]:
# Evaluation tools taken from competition website
# There is no need to output the predictions to CSV to load them again, we can
# pass in the expected prediction dictionary directly

import csv
import logging
import sys

def read_gold(filename):
    answers = {}

    with open(filename, "rt", encoding="UTF-8", errors="replace") as f:
        reader = csv.reader(f)
        next(reader)
        try:
            for row in reader:
                try:
                    instance_id = row[0]
                    answer = row[1]
                except IndexError as e:
                    logging.error(
                        "Error reading value from CSV file %s on line %d: %s",
                        filename, reader.line_num, e)
                    sys.exit(EXIT_STATUS_ANSWERS_MALFORMED)

                if instance_id in answers:
                    logging.error("Key %s repeated in %s",
                                  instance_id, filename)
                    sys.exit(EXIT_STATUS_ANSWERS_MALFORMED)

                answers[instance_id] = answer

        except csv.Error as e:
            logging.error('file %s, line %d: %s', filename, reader.line_num, e)
            sys.exit(EXIT_STATUS_ANSWERS_MALFORMED)

    if len(answers) == 0:
        logging.error("No answers found in file %s", filename)
        sys.exit(EXIT_STATUS_ANSWERS_MALFORMED)

    return answers


def calculate_accuracy(gold_labels, predictions):
    score = 0.0

    for instance_id, answer in gold_labels.items():
        try:
            predictions_for_current = predictions[instance_id]
        except KeyError:
            logging.error("Missing prediction for question '%s'.", instance_id)
            sys.exit(EXIT_STATUS_PREDICTION_MISSING)

        if answer == predictions_for_current:
            score += 1.0 / len(predictions_for_current)

        del predictions[instance_id]

    if len(predictions) > 0:
        logging.error("Found %d extra predictions, for example: %s", len(
            predictions), ", ".join(list(predictions.keys())[:3]))
        sys.exit(EXIT_STATUS_PREDICTIONS_EXTRA)

    return score / len(gold_labels)

In [0]:
gold_labels = read_gold(test_answers_path)
accuracy = calculate_accuracy(gold_labels, answers)
print(f'Accuracy: {accuracy * 100:.4f}%')

Accuracy: 73.4785%


### 15. Task-1 Results

The validation task of sense making was undertaken using Google’s BERT model, specifically the uncased large model with 12 layers due to hardware constraints posed by the Colab platform.

Sentence pairs with an associated label indicating the most unlikely sentence in the pair were input into the BERT model after they had been transformed and embedded into the correct format required for BERT.

Example:

[CLS]He put a turkey into the fridge.[SEP]He put an elephant into the fridge.[CLS] [1]

During the training process a variation of model architectures were implemented to gauge the impact on accuracy and loss. These variations 
included

•	Adding fully connected dense layers 

•	Adding dropout between the layers

•	Enabling layers in the base BERT model to become trainable

•	Varying the number of epochs that the BERT model is fine-tuned for

Training BERT on the training data presented a model which was prone to overfitting extremely quickly, after just three epochs the validation loss started to increase at a very fast rate with no improvement seen in accuracy.
To attempt to alleviate the impact of overfitting on the training data additional dense layers with varying rates of dropout were added both before and after the BERT layers but ultimately just a single dropout layer with a rate of 0.1 and a single fully connected layer after the BERT layers provided the best results. It was not possible to reduce overfitting by other traditional methods such as capacity reduction in the hidden layers as we do not want to negatively impact the quality of BERT. 

On the validation data the model obtained a loss of 0.7023 and accuracy of 0.6920 (this was seen to go as high as 0.71 in previous runs), on the test dataset an accuracy of 0.7347 was achieved. Whilst not particularly high accuracy values, they do match or exceed the results achieved by the competition organisers [1]. 

It is my opinion that with further time spent fine-tuning the model and a change in the input format of the sentences this number could be further increased. Implementing regularization on the weights may also have a positive impact on the rate of overfitting. After achieving good results in task-2 by creating more training samples by having an input for every answer in the dataset, a similar approach could be tried here where one sentence at a time input into BERT with a classification label for that one input. This format would double the size of the dataset and potentially improve overall accuracy and loss of the model after training. However, it is also taken into consideration that BERT was built to ‘understand’ sentence pairs so this assumption on inputting single tokens may not be valid.

[1] Cunxiang Wang, Shuailong Liang , Yue Zhang , Xiaonan Li and Tian Gao. Does It Make Sense? And Why? A Pilot Study for Sense Making and Explanation. https://arxiv.org/pdf/1906.00363.pdf
