<b>Chris Huber\
    CSC820, Prof. Anagha Kulkari
    Spring 2020</b>
    
<p>This notebook shows how to fine-tune a BERT model on new data and run inference to predict labels of 0, 1, or 2, which correspond to contradiction, neutral, or entailment, on a test set that was submitted to a Kaggle competition. It is based on a notebook written for the Contradictory, My Dear Watson TPU competition by a user named Marsh, adapted here for GPU/CUDA processing. It requires a CUDA-enabled computer with an appropriate GPU in order to run. It uses a PyTorch neural network to learn the labels in the train data, uses a validation set to measure accuracy and select the checkpoint to keep, and then outputs inference predictions for the test set using that model.</p>
    
<p>My results came out at an F1 score of 0.83738, far higher than could be achieved by randomly selecting classes, which tells us that the experiment was a success. Altering the model or adding or changing features should improve the score further.</p>
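
<p>To put the 0.83738 score in context, the sketch below estimates what uniform random guessing would score on three roughly balanced classes. The labels are synthetic, and macro-averaged F1 is an assumption here, since the averaging used by the competition is not stated above. Both the accuracy and the macro F1 of random guessing should come out near one third.</p>

```python
import random

def macro_f1(y_true, y_pred, classes=(0, 1, 2)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

rng = random.Random(0)
n = 30000
y_true = [rng.randrange(3) for _ in range(n)]   # synthetic, roughly balanced labels
y_pred = [rng.randrange(3) for _ in range(n)]   # uniform random guesses

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
baseline_f1 = macro_f1(y_true, y_pred)
print(round(accuracy, 3), round(baseline_f1, 3))
```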
    
<p>NB: This notebook took approximately 3 days to fully train on the 392702 rows in the train set on a gaming PC, so if you use your own machine be prepared to have it tied up for a while and make sure to turn sleep mode off.</p>

In [1]:
import torch
from transformers import AutoConfig, AutoModel
from transformers import RobertaTokenizer, RobertaForMaskedLM, RobertaForCausalLM, RobertaForSequenceClassification
from torch.utils.data import Dataset, DataLoader
import pandas as pd
import os

# Import only load_dataset so that torch's Dataset (imported above) is not shadowed.
from datasets import load_dataset

# I created a dataset for this project which includes the MNLI 1.0 train dataset, the Kaggle dev dataset for the 
# Matched MultiNLI Competition (https://www.kaggle.com/competitions/multinli-matched-open-evaluation) and an 
# unlabelled test set. 
raw_datasets = load_dataset("chrishuber/kaggle_mnli")
raw_datasets

Using custom data configuration chrishuber--kaggle_mnli-df90bb2b9c35e99b
Reusing dataset json (C:\Users\chris\.cache\huggingface\datasets\json\chrishuber--kaggle_mnli-df90bb2b9c35e99b\0.0.0\ac0ca5f5289a6cf108e706efcf040422dbbfa8e658dee6a819f20d76bb84d26b)


  0%|          | 0/3 [00:00<?, ?it/s]

DatasetDict({
    train: Dataset({
        features: ['annotator_labels', 'genre', 'gold_label', 'pairID', 'promptID', 'sentence1', 'sentence1_binary_parse', 'sentence1_parse', 'sentence2', 'sentence2_binary_parse', 'sentence2_parse'],
        num_rows: 392702
    })
    test: Dataset({
        features: ['annotator_labels', 'genre', 'gold_label', 'pairID', 'promptID', 'sentence1', 'sentence1_binary_parse', 'sentence1_parse', 'sentence2', 'sentence2_binary_parse', 'sentence2_parse'],
        num_rows: 19643
    })
    validation: Dataset({
        features: ['annotator_labels', 'genre', 'gold_label', 'pairID', 'promptID', 'sentence1', 'sentence1_binary_parse', 'sentence1_parse', 'sentence2', 'sentence2_binary_parse', 'sentence2_parse'],
        num_rows: 20000
    })
})

In [2]:
df_train = raw_datasets['train'].to_pandas()
df_train.shape

(392702, 11)

In [3]:
df_val = raw_datasets['validation'].to_pandas()
df_val.shape

(20000, 11)

In [62]:
# this is the incorrect test set, should be 9796 rows
# df_test = raw_datasets['test'].to_pandas()
# df_test.shape

# Unfortunately, the data that I uploaded to my Huggingface.co repo for the test set is not the correct one so I had to
# re-import it locally from a JSON file
df_test = pd.read_json("./multinli_0.9_test_matched_unlabeled.jsonl", lines=True)
df_test.head()

Unnamed: 0,annotator_labels,genre,gold_label,pairID,promptID,sentence1,sentence1_binary_parse,sentence1_parse,sentence2,sentence2_binary_parse,sentence2_parse
0,"[hidden, hidden, hidden, hidden, hidden]",slate,hidden,9847,9847,That which binds together Chinese.,( ( That ( which ( binds ( together Chinese ) ...,(ROOT (FRAG (NP (NP (DT That)) (SBAR (WHNP (WD...,This is a shared value among Chinese people.,( This ( ( is ( ( a ( shared value ) ) ( among...,(ROOT (S (NP (DT This)) (VP (VBZ is) (NP (NP (...
1,"[hidden, hidden, hidden, hidden, hidden]",government,hidden,9848,9848,The actual length of an individual worker's H-...,( ( ( The ( actual length ) ) ( of ( ( an ( in...,(ROOT (S (NP (NP (DT The) (JJ actual) (NN leng...,The location of the employer effects the lengt...,( ( ( The location ) ( of ( the employer ) ) )...,(ROOT (S (NP (NP (DT The) (NN location)) (PP (...
2,"[hidden, hidden, hidden, hidden, hidden]",fiction,hidden,9849,9849,Every man I put down left me empty.,( ( ( Every man ) ( I ( put down ) ) ) ( ( lef...,(ROOT (S (NP (NP (DT Every) (NN man)) (SBAR (S...,I felt empty after every man I put down.,( I ( ( ( felt ( empty ( after ( every man ) )...,(ROOT (S (NP (PRP I)) (VP (VBD felt) (ADJP (JJ...
3,"[hidden, hidden, hidden, hidden, hidden]",telephone,hidden,9850,9850,and uh i really think that if uh like after se...,( and ( ( uh i ) ( really ( think ( ( that if ...,(ROOT (FRAG (CC and) (NP (NP (FW uh) (FW i)) (...,Women wouldn't have gone to work after the sec...,( Women ( ( ( would n't ) ( have ( gone ( to (...,(ROOT (S (NP (NNP Women)) (VP (MD would) (RB n...
4,"[hidden, hidden, hidden, hidden, hidden]",telephone,hidden,9851,9851,yep yeah yeah it was i ended up going into ban...,( ( yep yeah ) ( yeah ( ( ( ( it ( ( was i ) (...,(ROOT (S (NP (NN yep) (NN yeah)) (VP (VBP yeah...,I have no idea what bankruptcy is like.,( I ( ( have ( ( no idea ) ( what ( bankruptcy...,(ROOT (S (NP (PRP I)) (VP (VBP have) (NP (NP (...


In [63]:
len(df_test)

9796

In [64]:
df_test["premise"] = df_test["sentence1"]
df_test["hypothesis"] = df_test["sentence2"]

In [65]:
### truncating datasets for testing
# df_train = df_train.head(3900)
# df_val = df_val.head(200)
# df_test = df_test.head(200)
# df_train.shape

In [6]:
# Integer class for each gold label.
LABEL_TO_INT = {"contradiction": 0, "neutral": 1, "entailment": 2}

def convert_to_int(label):
    # Any label outside the three classes maps to None, as before.
    return LABEL_TO_INT.get(label)

df_train["labels"] = df_train["gold_label"].apply(convert_to_int)
df_val["labels"] = df_val["gold_label"].apply(convert_to_int)
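
<p>The mapping cell above returns None for any label other than the three expected strings, which would likely cause problems when the label is later converted to a tensor. A minimal sketch of a sanity check follows; the helper name find_unmapped and the sample labels (including the "-" value) are hypothetical and only illustrate the idea.</p>

```python
LABEL_TO_INT = {"contradiction": 0, "neutral": 1, "entailment": 2}

def find_unmapped(labels):
    """Return the distinct labels that have no integer class, in first-seen order."""
    unmapped = []
    for label in labels:
        if label not in LABEL_TO_INT and label not in unmapped:
            unmapped.append(label)
    return unmapped

# Hypothetical label column; "-" stands in for any unexpected value.
sample_labels = ["neutral", "entailment", "-", "contradiction", "-"]
print(find_unmapped(sample_labels))
```

<p>In the notebook the same check could be run on the gold_label column of each dataframe before building the datasets.</p>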

In [7]:
df_train["premise"] = df_train["sentence1"]
df_train["hypothesis"] = df_train["sentence2"]
df_val["premise"] = df_val["sentence1"]
df_val["hypothesis"] = df_val["sentence2"]
df_test["premise"] = df_test["sentence1"]
df_test["hypothesis"] = df_test["sentence2"]

In [8]:
df_train.head()

Unnamed: 0,annotator_labels,genre,gold_label,pairID,promptID,sentence1,sentence1_binary_parse,sentence1_parse,sentence2,sentence2_binary_parse,sentence2_parse,labels,premise,hypothesis
0,[neutral],government,neutral,31193n,31193,Conceptually cream skimming has two basic dime...,( ( Conceptually ( cream skimming ) ) ( ( has ...,(ROOT (S (NP (JJ Conceptually) (NN cream) (NN ...,Product and geography are what make cream skim...,( ( ( Product and ) geography ) ( ( are ( what...,(ROOT (S (NP (NN Product) (CC and) (NN geograp...,1,Conceptually cream skimming has two basic dime...,Product and geography are what make cream skim...
1,[entailment],telephone,entailment,101457e,101457,you know during the season and i guess at at y...,( you ( ( know ( during ( ( ( the season ) and...,(ROOT (S (NP (PRP you)) (VP (VBP know) (PP (IN...,You lose the things to the following level if ...,( You ( ( ( ( lose ( the things ) ) ( to ( the...,(ROOT (S (NP (PRP You)) (VP (VBP lose) (NP (DT...,2,you know during the season and i guess at at y...,You lose the things to the following level if ...
2,[entailment],fiction,entailment,134793e,134793,One of our number will carry out your instruct...,( ( One ( of ( our number ) ) ) ( ( will ( ( (...,(ROOT (S (NP (NP (CD One)) (PP (IN of) (NP (PR...,A member of my team will execute your orders w...,( ( ( A member ) ( of ( my team ) ) ) ( ( will...,(ROOT (S (NP (NP (DT A) (NN member)) (PP (IN o...,2,One of our number will carry out your instruct...,A member of my team will execute your orders w...
3,[entailment],fiction,entailment,37397e,37397,How do you know? All this is their information...,( ( How ( ( ( do you ) know ) ? ) ) ( ( All th...,(ROOT (S (SBARQ (WHADVP (WRB How)) (SQ (VBP do...,This information belongs to them.,( ( This information ) ( ( belongs ( to them )...,(ROOT (S (NP (DT This) (NN information)) (VP (...,2,How do you know? All this is their information...,This information belongs to them.
4,[neutral],telephone,neutral,50563n,50563,yeah i tell you what though if you go price so...,( yeah ( i ( ( tell you ) ( what ( ( though ( ...,(ROOT (S (VP (VB yeah) (S (NP (FW i)) (VP (VB ...,The tennis shoes have a range of prices.,( ( The ( tennis shoes ) ) ( ( have ( ( a rang...,(ROOT (S (NP (DT The) (NN tennis) (NNS shoes))...,1,yeah i tell you what though if you go price so...,The tennis shoes have a range of prices.


<h3>Using BERT base uncased as our training model. I had to reduce MAX_LEN to 256 to avoid a CUDA out-of-memory error, which I think hurt accuracy since many of the passages are quite long.</h3>

In [81]:
MODEL_TYPE = 'bert-base-uncased'

NUM_FOLDS = 5

# Saving 5 TPU models will exceed the 4.9GB disk space.
# Therefore, we will only train on 3 folds.
NUM_FOLDS_TO_TRAIN = 3

L_RATE = 1e-5
MAX_LEN = 256
NUM_EPOCHS = 1
BATCH_SIZE = 4
NUM_CORES = os.cpu_count()

NUM_CORES

16

<h3>Verify that CUDA is enabled and being used as the device.</h3>

In [13]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

cuda:0


<h3>We need to use the BertTokenizer with the BERT model</h3>

In [82]:
from transformers import BertTokenizer

# Load the BERT tokenizer.
print('Loading BERT tokenizer...')
tokenizer = BertTokenizer.from_pretrained(MODEL_TYPE, do_lower_case=True)

Loading BERT tokenizer...


In [15]:
class CompDataset(Dataset):

    def __init__(self, df):
        self.df_data = df



    def __getitem__(self, index):

        # get the sentences to compare from the dataframe
        premise = self.df_data.loc[index, 'sentence1']
        hypothesis = self.df_data.loc[index, 'sentence2']
        # optional output if you want to watch it run
        # print(premise)
        # print(hypothesis)
 
        # Process the sentence
        # ---------------------

        encoded_dict = tokenizer.encode_plus(
                    premise, hypothesis,           # Sentences to encode.
                    add_special_tokens = True,      # Add '[CLS]' and '[SEP]'
                    max_length = MAX_LEN,           # Pad or truncate all sentences.
                    padding = 'max_length',         # Pad shorter pairs up to MAX_LEN.
                    truncation = True,              # Explicitly truncate longer pairs.
                    return_attention_mask = True,   # Construct attn. masks.
                    return_tensors = 'pt',          # Return pytorch tensors.
               )  
        
        # These are torch tensors already.
        padded_token_list = encoded_dict['input_ids'][0]
        att_mask = encoded_dict['attention_mask'][0]
        token_type_ids = encoded_dict['token_type_ids'][0]
        
        # Convert the target to a torch tensor
        target = torch.tensor(self.df_data.loc[index, 'labels'])
        target = target.type(torch.LongTensor)
        
        sample = (padded_token_list, att_mask, token_type_ids, target)


        return sample


    def __len__(self):
        return len(self.df_data)

class TestDataset(Dataset):

    def __init__(self, df):
        self.df_data = df

    def __getitem__(self, index):

        # get the sentence from the dataframe
        sentence1 = self.df_data.loc[index, 'sentence1']
        sentence2 = self.df_data.loc[index, 'sentence2']

        # Process the sentence
        # ---------------------

        encoded_dict = tokenizer.encode_plus(
                    sentence1, sentence2,           # Sentence to encode.
                    add_special_tokens = True,      # Add '[CLS]' and '[SEP]'
                    max_length = MAX_LEN,           # Pad or truncate all sentences.
                    padding = 'max_length',         # Pad shorter pairs up to MAX_LEN.
                    truncation = True,              # Explicitly truncate longer pairs.
                    return_attention_mask = True,   # Construct attn. masks.
                    return_tensors = 'pt',          # Return pytorch tensors.
               )
        
        # These are torch tensors already.
        padded_token_list = encoded_dict['input_ids'][0]
        att_mask = encoded_dict['attention_mask'][0]
        token_type_ids = encoded_dict['token_type_ids'][0]
               
        sample = (padded_token_list, att_mask, token_type_ids)

        return sample


    def __len__(self):
        return len(self.df_data)
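
<p>To make the three tensors returned by the dataset classes above more concrete, here is a toy sketch of how a premise/hypothesis pair is laid out. It splits on whitespace instead of using WordPiece, so the function encode_pair is a hypothetical stand-in and not the tokenizer's real implementation. It only illustrates the [CLS] ... [SEP] ... [SEP] layout, the token type ids, the attention mask, padding, and trimming the longer side first.</p>

```python
def encode_pair(premise, hypothesis, max_len):
    """Toy pair encoding: whitespace tokens, longest-first trimming, right padding."""
    a, b = premise.lower().split(), hypothesis.lower().split()
    # Reserve room for [CLS] and two [SEP] tokens, then trim the longer side first.
    while len(a) + len(b) > max_len - 3:
        if len(a) > len(b):
            a.pop()
        else:
            b.pop()
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    # Segment 0 covers [CLS], the premise and the first [SEP]; segment 1 the rest.
    token_type_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    attention_mask = [1] * len(tokens)
    pad = max_len - len(tokens)
    return (tokens + ["[PAD]"] * pad,
            attention_mask + [0] * pad,
            token_type_ids + [0] * pad)

tokens, attention_mask, token_type_ids = encode_pair(
    "Every man I put down left me empty.",
    "I felt empty after every man I put down.",
    max_len=24,
)
print(tokens)
print(attention_mask)
print(token_type_ids)
```

<p>The attention mask is 1 for real tokens and 0 for padding, which is what lets the model ignore the padded positions.</p>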

In [16]:
df_train = df_train.reset_index(drop=True)
df_val = df_val.reset_index(drop=True)

In [17]:
df_train.columns

Index(['annotator_labels', 'genre', 'gold_label', 'pairID', 'promptID',
       'sentence1', 'sentence1_binary_parse', 'sentence1_parse', 'sentence2',
       'sentence2_binary_parse', 'sentence2_parse', 'labels', 'premise',
       'hypothesis'],
      dtype='object')

In [18]:
train_data = CompDataset(df_train)
val_data = CompDataset(df_val)
test_data = TestDataset(df_test)

train_dataloader = torch.utils.data.DataLoader(train_data,
                                        batch_size=BATCH_SIZE,
                                        shuffle=True,
                                       num_workers=0)

val_dataloader = torch.utils.data.DataLoader(val_data,
                                        batch_size=BATCH_SIZE,
                                        shuffle=True,
                                       num_workers=0)

test_dataloader = torch.utils.data.DataLoader(test_data,
                                        batch_size=BATCH_SIZE,
                                        shuffle=False,
                                       num_workers=0)

print(len(train_dataloader))
print(len(val_dataloader))
print(len(test_dataloader))

98176
5000
4911
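
<p>The three counts above are the number of rows divided by the batch size, rounded up, since a DataLoader keeps the last partial batch by default. The sketch below reproduces them. Note that 4911 matches a 19643-row test set. A 9796-row test set would give 2449 batches, which suggests this cell was run while df_test still held the 19643-row split from the Hub.</p>

```python
import math

def num_batches(num_rows, batch_size):
    """Batches a DataLoader yields when the last partial batch is kept."""
    return math.ceil(num_rows / batch_size)

BATCH_SIZE = 4
for name, rows in [("train", 392702), ("validation", 20000), ("test", 19643)]:
    print(name, num_batches(rows, BATCH_SIZE))
```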


In [19]:
# Get one train batch
padded_token_list, att_mask, token_type_ids, target = next(iter(train_dataloader))

print(padded_token_list.shape)
print(att_mask.shape)
print(token_type_ids.shape)
print(target.shape)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


torch.Size([4, 256])
torch.Size([4, 256])
torch.Size([4, 256])
torch.Size([4])




In [20]:
# Get one test batch
padded_token_list, att_mask, token_type_ids = next(iter(test_dataloader))

print(padded_token_list.shape)
print(att_mask.shape)
print(token_type_ids.shape)

torch.Size([4, 256])
torch.Size([4, 256])
torch.Size([4, 256])


<h3>We are using BertForSequenceClassification here, which adds a classification layer on top of BERT and is the appropriate class for producing classification results.</h3>

In [21]:
from transformers import BertForSequenceClassification
# Load BertForSequenceClassification, the pretrained BERT model with a single 
# linear classification layer on top. 
model = BertForSequenceClassification.from_pretrained(
    MODEL_TYPE, 
    num_labels = 3, 
    output_attentions = False,
    output_hidden_states = False)

# Send the model to the device.
model.to(device)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, element

In [22]:
train_dataloader = torch.utils.data.DataLoader(train_data,
                                        batch_size=8,
                                        shuffle=True,
                                       num_workers=0)

batch = next(iter(train_dataloader))

b_input_ids = batch[0].to(device)
b_input_mask = batch[1].to(device)
b_token_type_ids = batch[2].to(device)
b_labels = batch[3].to(device)

In [23]:
outputs = model(b_input_ids, 
                token_type_ids=b_token_type_ids, 
                attention_mask=b_input_mask,
                labels=b_labels)

In [24]:
outputs

SequenceClassifierOutput(loss=tensor(1.0508, device='cuda:0', grad_fn=<NllLossBackward0>), logits=tensor([[-0.7317, -0.5870, -0.2513],
        [-0.4535, -0.4830, -0.2485],
        [-0.7164, -0.5949, -0.2626],
        [-0.5369, -0.4946, -0.2705],
        [-0.7294, -0.5723, -0.2729],
        [-0.5691, -0.5592, -0.2260],
        [-0.6258, -0.5219, -0.2485],
        [-0.6047, -0.4873, -0.3121]], device='cuda:0',
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [25]:
# The output is a SequenceClassifierOutput; only loss and logits are set here, so its length is 2.
len(outputs)

2

In [26]:
# This is the loss.
outputs[0]

tensor(1.0508, device='cuda:0', grad_fn=<NllLossBackward0>)

In [27]:
# These are the logits (raw class scores), not yet class predictions.
outputs[1]

tensor([[-0.7317, -0.5870, -0.2513],
        [-0.4535, -0.4830, -0.2485],
        [-0.7164, -0.5949, -0.2626],
        [-0.5369, -0.4946, -0.2705],
        [-0.7294, -0.5723, -0.2729],
        [-0.5691, -0.5592, -0.2260],
        [-0.6258, -0.5219, -0.2485],
        [-0.6047, -0.4873, -0.3121]], device='cuda:0',
       grad_fn=<AddmmBackward0>)

In [28]:
import numpy as np

preds = outputs[1].detach().cpu().numpy()

y_true = b_labels.detach().cpu().numpy()
y_pred = np.argmax(preds, axis=1)

y_pred

array([2, 2, 2, 2, 2, 2, 2, 2], dtype=int64)

In [29]:
len(y_pred)

8

In [30]:
from sklearn.metrics import accuracy_score

# This is the accuracy on one training batch without any fine tuning.
val_acc = accuracy_score(y_true, y_pred)
val_acc

0.5
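
<p>The logits printed earlier are raw scores, so it can help to turn one row into probabilities. The sketch below applies a softmax to the first logits row and compares the reported loss of 1.0508 with ln 3, the cross-entropy of a uniform guess over three classes. The untrained classification head gives nearly flat probabilities, which is consistent with a loss close to that value.</p>

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    shifted = [x - max(logits) for x in logits]   # subtract max for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]

first_row = [-0.7317, -0.5870, -0.2513]   # first logits row printed above
probs = softmax(first_row)
predicted = probs.index(max(probs))       # same result as np.argmax on this row
uniform_loss = math.log(3)                # cross-entropy of a uniform 3-class guess

print([round(p, 3) for p in probs], predicted, round(uniform_loss, 4))
```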

In [31]:
# The loss and preds are Torch tensors
print(type(outputs[0]))
print(type(outputs[1]))

<class 'torch.Tensor'>
<class 'torch.Tensor'>


In [32]:
train_df = df_train
val_df = df_val
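
<p>The training cell that follows defines NUM_FOLDS but trains each fold model on the full train set, because the fold-specific dataframes are commented out. If genuine k-fold splits were wanted, the row indices could be partitioned along the lines of the sketch below. The helper kfold_indices is hypothetical and is not part of the original notebook.</p>

```python
import random

def kfold_indices(num_rows, num_folds, seed=1024):
    """Yield (train_idx, val_idx) pairs; each row lands in exactly one val fold."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into num_folds groups.
    folds = [indices[i::num_folds] for i in range(num_folds)]
    for k in range(num_folds):
        val_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, val_idx

splits = list(kfold_indices(num_rows=10, num_folds=5))
for train_idx, val_idx in splits:
    print(len(train_idx), sorted(val_idx))
```

<p>With a dataframe, each index list could be passed to df.iloc to build the per-fold train and validation frames.</p>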

In [33]:
import random
import gc

# Set a seed value.
seed_val = 1024

random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

# Store the accuracy scores for each fold model in this list.
# [[model_0 scores], [model_1 scores], [model_2 scores], [model_3 scores], [model_4 scores]]
# [[epoch 1, epoch 2, ...], [epoch 1, epoch 2, ...], [epoch 1, epoch 2, ...], [epoch 1, epoch 2, ...], [epoch 1, epoch 2, ...]]

# Create a list of lists to store the val acc results.
# The number of items in this list will correspond to
# the number of folds that the model is being trained on.
fold_val_acc_list = []
for i in range(0, NUM_FOLDS):
    
    # append an empty list
    fold_val_acc_list.append([])
    
# For each epoch...
for epoch in range(0, NUM_EPOCHS):
    
    print("\nNum folds used for training:", NUM_FOLDS_TO_TRAIN)
    print('======== Epoch {:} / {:} ========'.format(epoch + 1, NUM_EPOCHS))
    
    # Note: this is actually the number of training rows, not the number of folds.
    num_folds = len(train_df)

    # For this epoch, store the val acc scores for each fold in this list.
    # We will use this list to calculate the cv at the end of the epoch.
    epoch_acc_scores_list = []
    
    # For each fold...
    for fold_index in range(0, NUM_FOLDS_TO_TRAIN):
        
        print('\n== Fold Model', fold_index)
        
        # .........................
        # Load the fold model
        # .........................
        
        if epoch == 0:
            # define the model
            model = BertForSequenceClassification.from_pretrained(
                MODEL_TYPE, 
                num_labels = 3,       
                output_attentions = False, 
                output_hidden_states = False,
            )
            
            # Send the model to the device.
            model.to(device)
            
            optimizer = torch.optim.AdamW(model.parameters(),
              lr = L_RATE, 
              eps = 1e-8
            )
            
        else:
            # Get the fold model
            path_model = 'model_' + str(fold_index) + '.bin'
            model.load_state_dict(torch.load(path_model))

            # Send the model to the device.
            model.to(device)
        
        # .....................................
        # Set up the train and val dataloaders
        # .....................................
        
        
        # Fold-specific dataframes (disabled: each fold model uses the full train and val sets)
        # df_train = train_df[fold_index]
        # df_val = val_df_list[fold_index]
        
        # Reset the indices or the dataloader won't work.
        df_train = df_train.reset_index(drop=True)
        df_val = df_val.reset_index(drop=True)
    
        # Create the dataloaders
        train_data = CompDataset(df_train)
        val_data = CompDataset(df_val)

        train_dataloader = torch.utils.data.DataLoader(train_data,
                                                batch_size=BATCH_SIZE,
                                                shuffle=True,
                                               num_workers=0)

        val_dataloader = torch.utils.data.DataLoader(val_data,
                                                batch_size=BATCH_SIZE,
                                                shuffle=True,
                                               num_workers=0)
    
        # ========================================
        #               Training
        # ========================================
        
        stacked_val_labels = []
        targets_list = []

        print('Training...')

        # put the model into train mode
        model.train()

        # This turns gradient calculations on for training.
        torch.set_grad_enabled(True)


        # Reset the total loss for this epoch.
        total_train_loss = 0

        for i, batch in enumerate(train_dataloader):

            train_status = 'Batch ' + str(i+1) + ' of ' + str(len(train_dataloader))

            print(train_status, end='\r')


            b_input_ids = batch[0].to(device)
            b_input_mask = batch[1].to(device)
            b_token_type_ids = batch[2].to(device)
            b_labels = batch[3].to(device)

            model.zero_grad()        


            outputs = model(b_input_ids, 
                        token_type_ids=b_token_type_ids, 
                        attention_mask=b_input_mask,
                        labels=b_labels)

            # Get the loss from the outputs tuple: (loss, logits)
            loss = outputs[0]

            # Convert the loss from a torch tensor to a number.
            # Calculate the total loss.
            total_train_loss = total_train_loss + loss.item()

            # Zero the gradients
            optimizer.zero_grad()

            # Perform a backward pass to calculate the gradients.
            loss.backward()
            
            # Clip the norm of the gradients to 1.0.
            # This is to help prevent the "exploding gradients" problem.
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

            # Use the optimizer to update Weights
            
            # Optimizer for GPU
            optimizer.step() 
            
            # Optimizer for TPU
            # https://pytorch.org/xla/
            # xm.optimizer_step(optimizer, barrier=True)
            
        print('Train loss:' ,total_train_loss)


        # ========================================
        #               Validation
        # ========================================

        print('\nValidation...')

        # Put the model in evaluation mode.
        model.eval()

        # Turn off the gradient calculations.
        # This tells the model not to compute or store gradients.
        # This step saves memory and speeds up validation.
        torch.set_grad_enabled(False)

        # Reset the total loss for this epoch.
        total_val_loss = 0

        for j, val_batch in enumerate(val_dataloader):

            val_status = 'Batch ' + str(j+1) + ' of ' + str(len(val_dataloader))

            print(val_status, end='\r')

            b_input_ids = val_batch[0].to(device)
            b_input_mask = val_batch[1].to(device)
            b_token_type_ids = val_batch[2].to(device)
            b_labels = val_batch[3].to(device)      


            outputs = model(b_input_ids, 
                    token_type_ids=b_token_type_ids, 
                    attention_mask=b_input_mask, 
                    labels=b_labels)

            # Get the loss from the outputs tuple: (loss, logits)
            loss = outputs[0]

            # Convert the loss from a torch tensor to a number.
            # Calculate the total loss.
            total_val_loss = total_val_loss + loss.item()

            # Get the preds
            preds = outputs[1]

            # Move preds to the CPU
            val_preds = preds.detach().cpu().numpy()

            # Move the labels to the cpu
            targets_np = b_labels.to('cpu').numpy()

            # Append the labels to a numpy list
            targets_list.extend(targets_np)

            if j == 0:  # first batch
                stacked_val_preds = val_preds

            else:
                stacked_val_preds = np.vstack((stacked_val_preds, val_preds))
                
                
                
        # .........................................
        # Calculate the val accuracy for this fold
        # .........................................      


        # Calculate the validation accuracy
        y_true = targets_list
        y_pred = np.argmax(stacked_val_preds, axis=1)

        val_acc = accuracy_score(y_true, y_pred)
        
        epoch_acc_scores_list.append(val_acc)

        print('Val loss:' ,total_val_loss)
        print('Val acc: ', val_acc)
        
        # .........................
        # Save the best model
        # .........................
        
        if epoch == 0:
            # Save the Model
            model_name = 'model_' + str(fold_index) + '.bin'
            torch.save(model.state_dict(), model_name)
            print('Saved model as ', model_name)
            
        if epoch != 0:
            val_acc_list = fold_val_acc_list[fold_index]
            best_val_acc = max(val_acc_list)
            
            if val_acc > best_val_acc:
                # save the model
                model_name = 'model_' + str(fold_index) + '.bin'
                torch.save(model.state_dict(), model_name)
                print('Val acc improved. Saved model as ', model_name)
                
                
                
        # .....................................
        # Save the val_acc for this fold model
        # .....................................
        
        # Note: Don't do this before the above 'Save Model' code or 
        # the save model code won't work. This is because the best_val_acc will
        # become current val accuracy.
                
        # fold_val_acc_list is a list of lists.
        # Each fold model has its own list corresponding to the fold index.
        # Here we choose a list corresponding to the fold number and append the acc score to that list.
        fold_val_acc_list[fold_index].append(val_acc)
        
        # Use the garbage collector to save memory.
        gc.collect()
        
        
    # .............................................................
    # Calculate the CV accuracy score over all folds in this epoch
    # .............................................................   
        
        
    # Print the average val accuracy over the folds that were trained
    cv_acc = sum(epoch_acc_scores_list)/NUM_FOLDS_TO_TRAIN
    print("\nCV Acc:", cv_acc)


Num folds used for training: 3

== Fold Model 0


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Training...
Batch 1 of 98176



Batch 22 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 29777 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 32534 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 33049 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 33524 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 34766 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 36577 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 37960 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 38942 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 39543 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 39766 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 40821 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 44137 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 44261 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 44441 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 47961 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 49427 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 49479 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 49698 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 49836 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 49879 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 50855 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 51792 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 52725 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 53263 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 55274 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 55690 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 58029 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 58584 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 60174 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 61276 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 61797 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 62635 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 63506 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 64689 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 65284 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 67896 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 70003 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 73188 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 75512 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 76271 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 78224 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 80028 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 80051 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 81286 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 85917 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 86389 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 89894 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 90644 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 91436 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 92644 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 93465 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 94981 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 95495 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 95686 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Train loss: 62182.2570842671

Validation...
Val loss: 3278.731503564166
Val acc:  0.81405
Saved model as  model_0.bin

== Fold Model 1


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Training...
Batch 509 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 2793 of 98176
Batch 3073 of 98176
Batch 3491 of 98176
Batch 3614 of 98176
Batch 3826 of 98176
Batch 4819 of 98176
Batch 7688 of 98176
Batch 9426 of 98176
Batch 10430 of 98176
Batch 12266 of 98176
Batch 15007 of 98176
Batch 17907 of 98176
Batch 18410 of 98176
Batch 18883 of 98176
Batch 18906 of 98176
Batch 24374 of 98176
Batch 24600 of 98176
Batch 25106 of 98176
Batch 25820 of 98176
Batch 26243 of 98176
Batch 28888 of 98176
Batch 30744 of 98176
Batch 31865 of 98176
Batch 32049 of 98176
Batch 32473 of 98176
Batch 32510 of 98176
Batch 38326 of 98176
Batch 39137 of 98176
Batch 43618 of 98176
Batch 43677 of 98176
Batch 44136 of 98176
Batch 44789 of 98176
Batch 50045 of 98176
Batch 50801 of 98176
Batch 54043 of 98176
Batch 55310 of 98176
Batch 55501 of 98176
Batch 58006 of 98176
Batch 58231 of 98176
Batch 58416 of 98176
Batch 59064 of 98176
Batch 59573 of 98176
Batch 65541 of 98176
Batch 65726 of 98176
Batch 66507 of 98176
Batch 68430 of 98176
Batch 68966 of 98176
Batch 71095 of 98176
Batch 71956 of 98176
Batch 73536 of 98176
Batch 75952 of 98176
Batch 78310 of 98176
Batch 79094 of 98176
Batch 79913 of 98176
Batch 80462 of 98176
Batch 81470 of 98176
Batch 82298 of 98176
Batch 85356 of 98176
Batch 86821 of 98176
Batch 88230 of 98176
Batch 88260 of 98176
Batch 88433 of 98176
Batch 89558 of 98176
Batch 89964 of 98176
Batch 90346 of 98176
Batch 90601 of 98176
Batch 93713 of 98176
Batch 95067 of 98176


Train loss: 63010.6075701816

Validation...
Val loss: 2971.0408669109456
Val acc:  0.80035
Saved model as  model_1.bin

== Fold Model 2


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Training...
Batch 474 of 98176

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Batch 4301 of 98176
Batch 5294 of 98176
Batch 9967 of 98176
Batch 10321 of 98176
Batch 11980 of 98176
Batch 12269 of 98176
Batch 12441 of 98176
Batch 13086 of 98176
Batch 13511 of 98176
Batch 13971 of 98176
Batch 15640 of 98176
Batch 18246 of 98176
Batch 18306 of 98176
Batch 19262 of 98176
Batch 20180 of 98176
Batch 21667 of 98176
Batch 21991 of 98176
Batch 22851 of 98176
Batch 23081 of 98176
Batch 25464 of 98176
Batch 30946 of 98176
Batch 31260 of 98176
Batch 33824 of 98176
Batch 37731 of 98176
Batch 40032 of 98176
Batch 42601 of 98176
Batch 46562 of 98176
Batch 48986 of 98176
Batch 49018 of 98176
Batch 50652 of 98176
Batch 51024 of 98176
Batch 52490 of 98176
Batch 54412 of 98176
Batch 54950 of 98176
Batch 55375 of 98176
Batch 55401 of 98176
Batch 57771 of 98176
Batch 58620 of 98176
Batch 61720 of 98176
Batch 63423 of 98176
Batch 63485 of 98176
Batch 65130 of 98176
Batch 67410 of 98176
Batch 69740 of 98176
Batch 71642 of 98176
Batch 74738 of 98176
Batch 75306 of 98176
Batch 76277 of 98176
Batch 80428 of 98176
Batch 81083 of 98176
Batch 81771 of 98176
Batch 83433 of 98176
Batch 83827 of 98176
Batch 85936 of 98176
Batch 86457 of 98176
Batch 87629 of 98176
Batch 88696 of 98176
Batch 89089 of 98176
Batch 89436 of 98176
Batch 89934 of 98176
Batch 89935 of 98176
Batch 90727 of 98176
Batch 91141 of 98176
Batch 91520 of 98176
Batch 92228 of 98176
Batch 95832 of 98176
Batch 97208 of 98176
Batch 97599 of 98176

Train loss: 61690.020144118695

Validation...
Val loss: 3333.2564018784324
Val acc:  0.81315
Saved model as  model_2.bin

CV Acc: 0.8091833333333334


In [34]:
# Display the validation accuracy scores for each fold model.
# Empty lists belong to folds that were not trained in this run.
# For info:
# Fold model 0 is only training on fold 0 in each epoch.
# The same applies to the other fold models.

fold_val_acc_list

[[0.81405], [0.80035], [0.81315], [], []]
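
The CV accuracy reported at the end of training is the plain mean of the three per-fold validation accuracies listed here. A minimal sketch of that arithmetic, assuming each trained fold contributes its last recorded accuracy and untrained folds stay empty (the helper name `cv_accuracy` is hypothetical and not part of the notebook):

```python
from statistics import mean

# Per-fold validation accuracies copied from the output above.
fold_val_acc_list = [[0.81405], [0.80035], [0.81315], [], []]

def cv_accuracy(acc_lists):
    """Average the last recorded accuracy of every fold that was trained."""
    trained = [accs[-1] for accs in acc_lists if accs]
    return mean(trained)

cv_acc = cv_accuracy(fold_val_acc_list)
print(cv_acc)
```

Skipping the empty lists matters here: dividing by all five folds instead of the three trained ones would understate the score.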

In [66]:
# Create the test dataset (the dataloader is built in the next cell)

test_data = TestDataset(df_test)
len(test_data)

9796

In [67]:
test_dataloader = torch.utils.data.DataLoader(test_data,
                                              batch_size=BATCH_SIZE,
                                              shuffle=False,
                                              num_workers=0)

print(len(test_dataloader))

2449
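
The batch count follows from the dataset size: 9796 test rows yield 2449 batches, which is consistent with a batch size of 4. `BATCH_SIZE` is not shown in this part of the notebook, so the value 4 below is an assumption inferred from those two numbers.

```python
import math

num_rows = 9796   # len(test_data), from the output above
batch_size = 4    # assumed; inferred from 9796 rows -> 2449 batches

# With drop_last=False (the DataLoader default), the number of batches
# is the row count divided by the batch size, rounded up.
num_batches = math.ceil(num_rows / batch_size)
print(num_batches)
```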


In [68]:
# ========================================
#               Test Set
# ========================================

print('\nTest Set...')

model_preds_list = []

print('Total batches:', len(test_dataloader))

# Inference only: turn off gradient calculations to save memory and time.
# Note that this disables gradients globally until they are re-enabled.
torch.set_grad_enabled(False)

for fold_index in range(0, NUM_FOLDS_TO_TRAIN):

    print('\nFold Model', fold_index)

    # Load the saved weights for this fold model.
    path_model = 'model_' + str(fold_index) + '.bin'
    model.load_state_dict(torch.load(path_model))

    # Send the model to the device and put it in evaluation mode.
    model.to(device)
    model.eval()

    for j, h_batch in enumerate(test_dataloader):

        inference_status = 'Batch ' + str(j + 1)
        print(inference_status, end='\r')

        b_input_ids = h_batch[0].to(device)
        b_input_mask = h_batch[1].to(device)
        b_token_type_ids = h_batch[2].to(device)

        outputs = model(b_input_ids,
                        token_type_ids=b_token_type_ids,
                        attention_mask=b_input_mask)

        # No labels are passed, so the first output holds the logits.
        preds = outputs[0]

        # Move the logits to the CPU.
        val_preds = preds.detach().cpu().numpy()

        # Stack the predictions from each batch.
        if j == 0:  # first batch
            stacked_val_preds = val_preds
        else:
            stacked_val_preds = np.vstack((stacked_val_preds, val_preds))

    model_preds_list.append(stacked_val_preds)

print('\nPrediction complete.')


Test Set...
Total batches: 2449

Fold Model 0
Batch 2449
Fold Model 1
Batch 2449
Fold Model 2
Batch 2449
Prediction complete.


In [69]:
model_preds_list

[array([[-2.3344262 ,  0.6812976 ,  1.9206414 ],
        [-1.8758273 , -1.5486668 ,  3.872561  ],
        [-1.9655329 , -1.645029  ,  3.691682  ],
        ...,
        [ 4.492677  , -0.90280604, -3.264653  ],
        [-1.0716397 ,  4.316246  , -3.094708  ],
        [-0.99402064, -0.7566957 ,  2.2035198 ]], dtype=float32),
 array([[-1.8577287 ,  1.8884472 , -0.35532168],
        [-2.1804807 , -0.6227617 ,  3.28483   ],
        [-1.5078266 , -1.1880586 ,  3.0865061 ],
        ...,
        [ 1.9394854 ,  0.17191732, -2.4049466 ],
        [-1.4307463 ,  3.7661881 , -2.6184115 ],
        [ 0.05966856, -0.23168522, -0.12326422]], dtype=float32),
 array([[-3.527332  ,  2.351716  ,  0.957407  ],
        [-2.091532  ,  0.78125805,  1.7542137 ],
        [-2.6522515 ,  0.9257445 ,  2.2227507 ],
        ...,
        [ 3.1736846 ,  0.15085834, -3.5222862 ],
        [-0.84656435,  3.9485471 , -3.6424134 ],
        [-0.72639185,  0.14315908,  1.3065228 ]], dtype=float32)]

In [70]:
# Sum the predictions of all fold models
for i, item in enumerate(model_preds_list):
    if i == 0:
        preds = item
    else:
        # Sum the matrices
        preds = item + preds
        
# Average the predictions
avg_preds = preds/(len(model_preds_list))

test_preds = np.argmax(avg_preds, axis=1)

In [71]:
test_preds

array([1, 2, 2, ..., 0, 1, 2], dtype=int64)
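
The averaging step can be checked by hand on the six rows of logits that are visible in the `model_preds_list` output (the first three and last three rows of each fold model). This sketch copies those rows, rounded to four decimals, averages the three fold models with `np.mean`, and takes the argmax of each row. The array names are illustrative only and do not appear in the notebook.

```python
import numpy as np

# Visible rows from each fold model's logits (first three and last three rows).
fold0 = np.array([[-2.3344,  0.6813,  1.9206],
                  [-1.8758, -1.5487,  3.8726],
                  [-1.9655, -1.6450,  3.6917],
                  [ 4.4927, -0.9028, -3.2647],
                  [-1.0716,  4.3162, -3.0947],
                  [-0.9940, -0.7567,  2.2035]])
fold1 = np.array([[-1.8577,  1.8884, -0.3553],
                  [-2.1805, -0.6228,  3.2848],
                  [-1.5078, -1.1881,  3.0865],
                  [ 1.9395,  0.1719, -2.4049],
                  [-1.4307,  3.7662, -2.6184],
                  [ 0.0597, -0.2317, -0.1233]])
fold2 = np.array([[-3.5273,  2.3517,  0.9574],
                  [-2.0915,  0.7813,  1.7542],
                  [-2.6523,  0.9257,  2.2228],
                  [ 3.1737,  0.1509, -3.5223],
                  [-0.8466,  3.9485, -3.6424],
                  [-0.7264,  0.1432,  1.3065]])

# Average the logits across fold models, then pick the highest-scoring class per row.
avg_logits = np.mean(np.stack([fold0, fold1, fold2]), axis=0)
sample_preds = np.argmax(avg_logits, axis=1)
print(sample_preds.tolist())
```

The result lines up with the visible ends of `test_preds` (`1, 2, 2, ..., 0, 1, 2`). The first row is a useful case: fold model 0 on its own would pick class 2, but the average across the three models picks class 1.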

In [76]:
data = {'pairID': df_test['pairID'], 'gold_label': test_preds}

submission_df = pd.DataFrame(data)
submission_df.head()

Unnamed: 0,pairID,gold_label
0,9847,1
1,9848,2
2,9849,2
3,9850,1
4,9851,0


In [77]:
def convert_to_label(code):
    # Map the numeric class index to the label string used in the submission file.
    if code == 0:
        return "contradiction"
    elif code == 1:
        return "neutral"
    elif code == 2:
        return "entailment"

submission_df["gold_label"] = submission_df["gold_label"].apply(convert_to_label)
submission_df.head()

Unnamed: 0,pairID,gold_label
0,9847,neutral
1,9848,entailment
2,9849,entailment
3,9850,neutral
4,9851,contradiction
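
The same mapping can also be written as a dictionary lookup, which makes an unexpected class index fail loudly instead of silently returning `None`. A small sketch in plain Python; `ID_TO_LABEL` and `to_label` are hypothetical names, not part of the notebook:

```python
ID_TO_LABEL = {0: "contradiction", 1: "neutral", 2: "entailment"}

def to_label(code):
    """Return the label string for a class index; raise KeyError on unknown codes."""
    return ID_TO_LABEL[int(code)]

# First five predictions from the submission preview above.
codes = [1, 2, 2, 1, 0]
labels = [to_label(c) for c in codes]
print(labels)
```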


In [78]:
len(submission_df)

9796

In [79]:
submission_df.to_csv('bert_mnli_kaggle_submission.csv', index=False)
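
Before uploading, it can be worth reading the file back and checking its shape. A minimal sketch using only the standard library; the in-memory CSV stands in for `bert_mnli_kaggle_submission.csv`, and `check_submission` is a hypothetical helper rather than anything the competition provides:

```python
import csv
import io

VALID_LABELS = {"contradiction", "neutral", "entailment"}

def check_submission(handle, expected_rows):
    """Return True if the CSV has the expected header, row count and label values."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != ["pairID", "gold_label"]:
        return False
    rows = list(reader)
    return len(rows) == expected_rows and all(
        row["gold_label"] in VALID_LABELS for row in rows
    )

# Stand-in for open('bert_mnli_kaggle_submission.csv', newline='')
sample = io.StringIO(
    "pairID,gold_label\n9847,neutral\n9848,entailment\n9849,entailment\n"
)
ok = check_submission(sample, expected_rows=3)
print(ok)
```

For the real file, `expected_rows` would be 9796, the length of `submission_df`.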

<h3>My submission from the retrained BERT model scored 83.738%, so improving on it will require changing the model or adding features.</h3>

![title](bert_retrained_result.png)