# Multi Choice Model


*   Take the training set produced by the 4-way self-ensemble step and use it to train a multiple-choice model.
*   Use Hugging Face Transformers: `BertForMultipleChoice`.
*   Preprocess, tokenize, and batch the text.
*   Evaluate the results on the dev set.



### Step 1: File set up. 

In [None]:
#Mount my drive so that I can access the split training sets. 

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
#copy the training data to colab

%cp -R /content/drive/My\ Drive/train_with_split.csv /content/
%cp -R /content/drive/My\ Drive/dev_with_split.csv /content/


### Step 2: Set up GPU and HuggingFace

In [None]:
# Connect to GPU
import torch

if torch.cuda.is_available():     
    device = torch.device("cuda")
    print('There are %d GPU(s) available.' % torch.cuda.device_count())
    print('GPU:', torch.cuda.get_device_name(0))
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")

There are 1 GPU(s) available.
GPU: Tesla P100-PCIE-16GB


In [None]:
!pip install transformers

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/27/3c/91ed8f5c4e7ef3227b4119200fc0ed4b4fd965b1f0172021c25701087825/transformers-3.0.2-py3-none-any.whl (769kB)
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB)
Collecting tokenizers==0.8.1.rc1
  Downloading https://files.pythonhosted.org/packages/40/d0/30d5f8d221a0ed981a186c8eb986ce1c94e3a6e87f994eae9f4aa5250217/tokenizers-0.8.1rc1-cp36-cp36m-manylinux1_x86_64.whl (3.0MB)
Collecting sentencepiece!=0.1.92
  Downloading https://files.pythonhosted.org/packages/d4/a4/d0a884c4300004a78cca907a6ff9a5e9fe4f090f5d95ab341c53d28cbc58/sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl

### Step 3: Load Dataset

In [None]:
import pandas as pd

df = pd.read_csv('train_with_split.csv')
# The empty choice is read back as NaN on reload, so restore it to an empty string.
df['a'] = df['a'].fillna("")

print('Number of training sentences: {:,}\n'.format(df.shape[0]))

# Take an explicit copy so later column assignments do not trigger SettingWithCopyWarning.
mini_df = df.iloc[0:1000].copy()
mini_df

In [None]:
# ------ FOR A MINI TRAINING SET ------

#Get the lists of sentences and their labels.
contexts = mini_df.context.values
questions = mini_df.question.values
choices = mini_df[['a','b','c','d','e']].values
# Missing labels default to index 0; then cast the column to int.
mini_df['correct_index'] = mini_df['correct_index'].fillna(0)
labels = mini_df['correct_index'].astype(int).values


# ------ FOR THE FULL  TRAINING SET ------
# contexts = df.context.values
# questions = df.question.values
# choices = df[['a','b','c','d','e']].values
# #now converted to an INT
# df.correct_index = df.correct_index.fillna(0)
# labels = df.correct_index.astype(int).values

print(labels.shape)
print(torch.tensor(labels).unsqueeze(0))

(1000,)
tensor([[3, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1,
         1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 0, 1, 1, 4, 1, 1, 1,
         2, 1, 4, 1, 2, 1, 0, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 2, 4, 1,
         0, 1, 1, 1, 1, 3, 1, 1, 2, 1, 2, 1, 0, 2, 4, 1, 1, 1, 2, 1, 0, 1, 1, 4,
         0, 1, 0, 1, 1, 1, 2, 1, 1, 3, 1, 1, 1, 3, 1, 1, 0, 2, 2, 1, 2, 1, 0, 1,
         1, 1, 0, 2, 0, 1, 1, 1, 3, 3, 1, 1, 1, 1, 1, 2, 3, 2, 0, 1, 1, 4, 1, 1,
         1, 1, 1, 1, 0, 1, 1, 4, 4, 1, 1, 2, 1, 1, 1, 3, 1, 1, 3, 1, 4, 2, 0, 1,
         1, 3, 2, 0, 1, 1, 1, 1, 1, 2, 3, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 0,
         1, 1, 1, 2, 1, 1, 4, 4, 1, 1, 1, 0, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1,
         3, 1, 2, 4, 1, 2, 1, 1, 1, 1, 2, 4, 0, 1, 2, 1, 4, 1, 0, 1, 2, 3, 1, 3,
         2, 1, 1, 3, 2, 4, 1, 1, 1, 0, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 0, 1, 1,
         4, 1, 1, 1, 2, 4, 2, 1, 1, 1, 2, 1, 1, 0, 3, 1, 2, 1, 1, 3, 3, 1, 2, 1,
         1, 2, 1, 3,

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value


### Step 4: Tokenize the Text

In [None]:
from transformers import BertTokenizer

# Load the BERT tokenizer.
tokenizer = BertTokenizer.from_pretrained('bert-base-cased', do_lower_case=False)





In [None]:
# Each context is paired with a question plus a choice, so a combined
# input may exceed BERT's 512-token limit. Check the context lengths first.

max_len = 0
for sent in contexts:
    input_ids = tokenizer.encode(sent, add_special_tokens=True)
    max_len = max(max_len, len(input_ids))
print('Max sentence length: ', max_len)

# Some inputs are too long, so the tokenizer call below pads and truncates to a fixed max_length.

Max sentence length:  450
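
The check above measures the contexts alone, while the model input is the context plus the question and one choice, plus three special tokens (`[CLS]` and two `[SEP]`). A minimal sketch of counting rows that would exceed a length budget follows. It uses a whitespace split as a stand-in for the BERT tokenizer (an assumption; WordPiece counts will generally be higher), and the helper names `pair_length` and `count_over_budget` are hypothetical.

```python
def pair_length(context, question, choice, tokenize=str.split):
    """Token count of '[CLS] context [SEP] question choice [SEP]'.

    `tokenize` is a stand-in (whitespace split); pass the real
    tokenizer's `tokenize` method to get WordPiece counts.
    """
    second = f"{question} {choice}"
    return len(tokenize(context)) + len(tokenize(second)) + 3  # 3 special tokens


def count_over_budget(rows, max_length=384):
    """Count (context, question, choices) rows where any pair exceeds max_length."""
    over = 0
    for context, question, choices in rows:
        if any(pair_length(context, question, c) > max_length for c in choices):
            over += 1
    return over


# Toy rows: the second context is long enough to exceed the budget.
rows = [
    ("a short context", "what is it?", ["", "this", "that"]),
    ("word " * 400, "which word?", ["", "word"]),
]
print(count_over_budget(rows))
```

Rows counted here are the ones that the `truncation = True` setting would shorten, so the count gives a rough idea of how much context is being cut.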


In [None]:
# Tokenize all of the sentences and map the tokens to their word IDs.

# contexts = mini_df.context.values
# questions = mini_df.question.values
# choices = mini_df[['a','b','c','d','e']].values
# labels = mini_df.correct_index.values

input_ids = []
attention_masks = []
choices_features = []

#---- THIS IS THE LOOP TO COMBINE THE QUESTIONS WITH THE CHOICES ----
for i in range(len(questions)):
    row = list(choices[i])
    temp_list = []
    for choice in row:
      text = (str(questions[i])+' '+str(choice))
      temp_list.append(text)

    encoded_dict = tokenizer(
                        [contexts[i]] * len(temp_list),  # Repeat the context once per choice.
                        temp_list,
                        add_special_tokens = True,      # Add '[CLS]' and '[SEP]'.
                        max_length = 384,               # Pad & truncate all sentences.
                        padding = 'max_length',         # Replaces the deprecated pad_to_max_length=True.
                        return_attention_mask = True,   # Construct attn. masks.
                        return_tensors = 'pt',
                        truncation = True)

# ---- THIS IS THE LOOP TO COMBINE THE QUESTIONS WITH THE CONTEXTS ----   
# for i in range(len(questions)):
#     row = list(choices[i])
#     temp_list = []
#     new_question = (str(questions[i])+' '+str(contexts[i]))
#     for choice in row:
#       text = (str(choice))
#       temp_list.append(text)

#     encoded_dict = tokenizer(
#                         [new_question,new_question,new_question,new_question,new_question],
#                         temp_list,
#                         add_special_tokens = True, # Add '[CLS]' and '[SEP]'
#                         max_length = 384,           # Pad & truncate all sentences.
#                         pad_to_max_length = True,
#                         return_attention_mask = True,   # Construct attn. masks.
#                         return_tensors = 'pt',
#                         truncation = True)



    # Add the encoded example (one row per choice) to the lists.
    input_ids.append(encoded_dict['input_ids'])
    attention_masks.append(encoded_dict['attention_mask'])

# Stack the per-example tensors into shape (num_examples, num_choices, max_length).
input_ids = torch.stack(input_ids)
attention_masks = torch.stack(attention_masks)
labels = torch.tensor(labels).long()

# # Print sentence 0, now as a list of IDs.
# print('Original: ', contexts[0])
# print('Token IDs:', input_ids[0])
# print('Labels', labels[0])
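
The pairing logic in the loop above can be isolated as a small helper. This is a sketch of the same idea in plain Python, not the notebook's exact code: `build_choice_pairs` is a hypothetical name, and the toy strings are made up for illustration. Each example yields one (context, question + choice) pair per choice, which is why the stacked tensors end up with shape (num_examples, 5, 384).

```python
def build_choice_pairs(context, question, choices):
    """Return the two parallel lists passed to the tokenizer for one example.

    first[k]  is the context, repeated once per choice.
    second[k] is the question followed by a space and the k-th choice.
    """
    first = [str(context)] * len(choices)
    second = [f"{question} {choice}" for choice in choices]
    return first, second


first, second = build_choice_pairs(
    "Some passage text.",
    "Who led them?",
    ["", "Rollo", "William", "Harold", "Cnut"],  # the empty string is the no-answer choice
)
for a, b in zip(first, second):
    print(repr(a), "|", repr(b))
```

Note that only `input_ids` and `attention_mask` are kept from the tokenizer output in this notebook; the `token_type_ids` that mark the second segment are not stored.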

In [None]:
# torch.save(input_ids, '/content/drive/My Drive/input_ids_384.pt')
# torch.save(attention_masks, '/content/drive/My Drive/attn_mask_384.pt')
# torch.save(labels, '/content/drive/My Drive/labels_384.pt')

print(input_ids.size(0))
print(attention_masks.size(0))
print(labels.size(0))

1000
1000
1000


In [None]:
input_ids = torch.load('/content/drive/My Drive/input_ids_384.pt')
attention_masks = torch.load('/content/drive/My Drive/attn_mask_384.pt')
labels = torch.load('/content/drive/My Drive/labels_384.pt')

print(input_ids.size(0))
print(attention_masks.size(0))
print(labels.size(0))

130319
130319
130319


In [None]:
# Hold out 10% of the training data as a validation split so that the loss
# can be monitored during training, before evaluating on the dev set.

from torch.utils.data import TensorDataset, random_split

# Combine the training inputs into a TensorDataset.
dataset = TensorDataset(input_ids, attention_masks, labels)

# Calculate the number of samples to include in each set.
train_size = int(0.9 * len(dataset))
val_size = len(dataset) - train_size

# Divide the dataset by randomly selecting samples.
train_dataset, val_dataset = random_split(dataset, [train_size, val_size])

print('{:>5,} training samples'.format(train_size))
print('{:>5,} validation samples'.format(val_size))



117,287 training samples
13,032 validation samples
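
The split sizes printed above follow from `int(0.9 * 130319)`. A minimal sketch of the same 90/10 split on plain indices is shown here, with a fixed seed so the partition can be reproduced. The function name `split_indices` and the seed value are assumptions for illustration; the notebook itself calls `random_split` before any seed is set.

```python
import random


def split_indices(n, train_fraction=0.9, seed=1):
    """Shuffle range(n) with a fixed seed and cut it into train/validation parts."""
    train_size = int(train_fraction * n)
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, does not touch global state
    return indices[:train_size], indices[train_size:]


train_idx, val_idx = split_indices(130319)
print(len(train_idx), len(val_idx))
```

Recent PyTorch versions also accept a `generator` argument to `random_split`, which serves the same purpose of making the split repeatable across runs.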


In [None]:
# Set Up data Loader 

from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

batch_size = 4
# Create the DataLoaders for our training and validation sets.
train_dataloader = DataLoader(
            train_dataset,  # The training samples.
            sampler = RandomSampler(train_dataset), # Select batches randomly
            batch_size = batch_size)

# For validation the order doesn't matter, so we'll just read them sequentially.
validation_dataloader = DataLoader(
            val_dataset, # The validation samples.
            sampler = SequentialSampler(val_dataset), # Pull out batches sequentially.
            batch_size = batch_size)


### Step 5: Load model 

In [None]:
# Load the pretrained Bert Model for multiple choice. 
 
from transformers import BertForMultipleChoice, AdamW, BertConfig

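# BertForMultipleChoice scores each (context, question + choice) pair with a
# single-output classifier and infers the number of choices from the input
# shape (batch, num_choices, seq_len), so no num_labels argument is needed.
model = BertForMultipleChoice.from_pretrained(
    "bert-base-cased",
    output_attentions = False, 
    output_hidden_states = False)

model.to(device)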





Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMultipleChoice: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForMultipleChoice from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForMultipleChoice from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForMultipleChoice were not initialized from the model checkpoint at bert-base-cased and are newly ini

BertForMultipleChoice(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(28996, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_aff
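
To make the multiple-choice head easier to picture: the model flattens a batch of shape (batch, num_choices, seq_len) into batch × num_choices sequences, produces one score per sequence, and reshapes the scores back to (batch, num_choices) before applying a softmax/cross-entropy over the choices. The sketch below mirrors that flow with plain lists. `toy_scorer` is a hypothetical stand-in for BERT plus its single-output linear layer, and the numbers are invented.

```python
import math


def multiple_choice_logits(batch, score_sequence):
    """batch: list of examples, each a list of num_choices token-id sequences."""
    num_choices = len(batch[0])
    flat = [seq for example in batch for seq in example]   # batch * num_choices sequences
    flat_scores = [score_sequence(seq) for seq in flat]    # one scalar per sequence
    # Regroup the flat scores into one row of num_choices logits per example.
    return [flat_scores[i:i + num_choices]
            for i in range(0, len(flat_scores), num_choices)]


def softmax(scores):
    """Numerically stable softmax over one row of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_scorer(seq):
    """Hypothetical stand-in for the encoder and classifier: sum of token ids."""
    return float(sum(seq))


batch = [
    [[1, 2], [3, 4], [0, 0]],
    [[5, 5], [1, 1], [2, 2]],
]
logits = multiple_choice_logits(batch, toy_scorer)
probs = [softmax(row) for row in logits]
predicted = [row.index(max(row)) for row in logits]
print(logits, predicted)
```

The label for each example is simply the index of the correct choice, which is why `correct_index` can be used directly as the training target.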

In [None]:
# Set optimizer 

optimizer = AdamW(model.parameters(),
                  lr = 2e-5, # args.learning_rate 
                  eps = 1e-8 # args.adam_epsilon  
                )

In [None]:
from transformers import get_linear_schedule_with_warmup

epochs = 1
# Total number of training steps is [number of batches] x [number of epochs]. 
total_steps = len(train_dataloader) * epochs

# Create the learning rate scheduler.
scheduler = get_linear_schedule_with_warmup(optimizer, 
                                            num_warmup_steps = 0, # Default value in run_glue.py
                                            num_training_steps = total_steps)
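
For intuition, the linear schedule with warmup multiplies the base learning rate by a factor that rises linearly during warmup and then decays linearly to zero at the last step. The sketch below re-implements that factor in plain Python; `linear_schedule_lr` is a hypothetical helper, not the library function. The batch count assumes the 117,287 training samples and batch size 4 used in this notebook.

```python
import math


def linear_schedule_lr(step, base_lr, total_steps, warmup_steps=0):
    """Learning rate at `step` under linear warmup followed by linear decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


epochs = 1
num_batches = math.ceil(117287 / 4)   # batches per epoch
total_steps = num_batches * epochs

print(total_steps)
for step in (0, total_steps // 2, total_steps):
    print(step, linear_schedule_lr(step, 2e-5, total_steps))
```

With `num_warmup_steps = 0` the rate starts at its full value of 2e-5 and is roughly halved by the midpoint of the single epoch.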

### Step 6: Training loop

In [None]:
# Helper functions for training and timing.

import numpy as np
import time
import datetime
import random
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import os

# Function to calculate the accuracy of our predictions vs labels
def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)

# Format as hh:mm:ss
def format_time(elapsed):
    elapsed_rounded = int(round((elapsed)))
    return str(datetime.timedelta(seconds=elapsed_rounded))



In [None]:
# memory footprint support libraries/code
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install gputil
!pip install psutil
!pip install humanize

import psutil
import humanize
import os
import GPUtil as GPU

GPUs = GPU.getGPUs()
# XXX: only one GPU on Colab and isn’t guaranteed
gpu = GPUs[0]
def printm():
    process = psutil.Process(os.getpid())
    print("Gen RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available), " |     Proc size: " + humanize.naturalsize(process.memory_info().rss))
    print("GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total     {3:.0f}MB".format(gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil*100, gpu.memoryTotal))
printm()

Collecting gputil
  Downloading https://files.pythonhosted.org/packages/ed/0e/5c61eedde9f6c87713e89d794f01e378cfd9565847d4576fa627d758c554/GPUtil-1.4.0.tar.gz
Building wheels for collected packages: gputil
  Building wheel for gputil (setup.py) ... done
  Created wheel for gputil: filename=GPUtil-1.4.0-cp36-none-any.whl size=7413 sha256=8371ad299b82d8a3b25759b229f315337a4098a67b064a33b64d1a457cbb373b
  Stored in directory: /root/.cache/pip/wheels/3d/77/07/80562de4bb0786e5ea186911a2c831fdd0018bda69beab71fd
Successfully built gputil
Installing collected packages: gputil
Successfully installed gputil-1.4.0
Gen RAM Free: 24.2 GB  |     Proc size: 4.3 GB
GPU RAM Free: 15015MB | Used: 1265MB | Util   8% | Total     16280MB


In [None]:
#Set Seed
seed_val = 1
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

training_stats = []
total_t0 = time.time()
#Training Loop
for epoch_i in range(0, epochs):
    print('======== Epoch {:} / {:} ========'.format(epoch_i + 1, epochs))
    print('Training...')

    t0 = time.time()
    total_train_loss = 0 #Reset total loss. 
    model.train() #put model into training mode.

    # Iterate through the batches.
    for step, batch in enumerate(train_dataloader):
        if step % 100 == 0 and not step == 0:
            elapsed = format_time(time.time() - t0)
            print('  Batch {:>5,}  of  {:>5,}.    Elapsed: {:}.'.format(step, len(train_dataloader), elapsed))

        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)
    
        model.zero_grad() #reset gradient       

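        # Index the outputs so this works whether the model returns a tuple or an output object.
        outputs = model(b_input_ids, 
                        token_type_ids=None, 
                        attention_mask=b_input_mask, 
                        labels=b_labels)
        loss, logits = outputs[0], outputs[1]

        total_train_loss += loss.item() # Accumulate the batch loss.
        loss.backward() # Backpropagate to compute gradients.
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0) # Clip the gradient norm to 1.0.
        optimizer.step() # Update parameters.
        scheduler.step() # Update the learning rate.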

    # Calculate the average loss over all of the batches.
    avg_train_loss = total_train_loss / len(train_dataloader)            
    # Measure how long this epoch took.
    training_time = format_time(time.time() - t0)
    print("  Average training loss: {0:.2f}".format(avg_train_loss))
    print("  Training epoch took: {:}".format(training_time))
        
    # Validation
    print("Running Validation...")
    t0 = time.time()
    model.eval() #put the model in evaluation mode. 
    total_eval_accuracy = 0
    total_eval_loss = 0
    nb_eval_steps = 0

    # Evaluate data for one epoch
    for batch in validation_dataloader:
        
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)
        
        with torch.no_grad():        
 
            outputs = model(b_input_ids, 
                            token_type_ids=None, 
                            attention_mask=b_input_mask,
                            labels=b_labels)
            loss, logits = outputs[0], outputs[1]
            
        total_eval_loss += loss.item() # Accumulate the batch loss.
        logits = logits.detach().cpu().numpy() # Move logits and labels to CPU
        label_ids = b_labels.to('cpu').numpy()
        total_eval_accuracy += flat_accuracy(logits, label_ids) # Accumulate per-batch accuracy.
        
    # Report the final accuracy for this validation run.
    avg_val_accuracy = total_eval_accuracy / len(validation_dataloader)
    print("  Accuracy: {0:.2f}".format(avg_val_accuracy))

    # Calculate the average loss over all of the batches.
    avg_val_loss = total_eval_loss / len(validation_dataloader)
    
    # Measure how long the validation run took.
    validation_time = format_time(time.time() - t0)
    
    print("  Validation Loss: {0:.2f}".format(avg_val_loss))
    print("  Validation took: {:}".format(validation_time))

    # Record all statistics from this epoch.
    training_stats.append(
        {
            'epoch': epoch_i + 1,
            'Training Loss': avg_train_loss,
            'Valid. Loss': avg_val_loss,
            'Valid. Accur.': avg_val_accuracy,
            'Training Time': training_time,
            'Validation Time': validation_time
        }
    )

print("")
print("Training complete!")

print("Total training took {:} (h:mm:ss)".format(format_time(time.time()-total_t0)))

Training...
  Batch   100  of  29,322.    Elapsed: 0:01:15.
  Batch   200  of  29,322.    Elapsed: 0:02:29.
  Batch   300  of  29,322.    Elapsed: 0:03:43.
  Batch   400  of  29,322.    Elapsed: 0:04:57.
  Batch   500  of  29,322.    Elapsed: 0:06:12.
  Batch   600  of  29,322.    Elapsed: 0:07:26.
  Batch   700  of  29,322.    Elapsed: 0:08:40.
  Batch   800  of  29,322.    Elapsed: 0:09:55.
  Batch   900  of  29,322.    Elapsed: 0:11:09.
  Batch 1,000  of  29,322.    Elapsed: 0:12:23.
  Batch 1,100  of  29,322.    Elapsed: 0:13:38.
  Batch 1,200  of  29,322.    Elapsed: 0:14:52.
  Batch 1,300  of  29,322.    Elapsed: 0:16:06.
  Batch 1,400  of  29,322.    Elapsed: 0:17:21.
  Batch 1,500  of  29,322.    Elapsed: 0:18:35.
  Batch 1,600  of  29,322.    Elapsed: 0:19:49.
  Batch 1,700  of  29,322.    Elapsed: 0:21:03.
  Batch 1,800  of  29,322.    Elapsed: 0:22:18.
  Batch 1,900  of  29,322.    Elapsed: 0:23:32.
  Batch 2,000  of  29,322.    Elapsed: 0:24:46.
  Batch 2,100  of  29,322.  
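
The validation accuracy in the loop above is the mean of per-batch accuracies. That equals the per-example accuracy only when every batch has the same size. A small sketch of the difference follows, using hypothetical helper names and toy (prediction, label) pairs.

```python
def batch_mean_accuracy(batches):
    """Average of per-batch accuracies (what the loop above accumulates)."""
    per_batch = [sum(p == y for p, y in b) / len(b) for b in batches]
    return sum(per_batch) / len(per_batch)


def example_accuracy(batches):
    """Correct predictions divided by total examples."""
    correct = sum(p == y for b in batches for p, y in b)
    total = sum(len(b) for b in batches)
    return correct / total


batches = [
    [(1, 1), (2, 2), (0, 0), (3, 3)],  # full batch, all correct
    [(1, 0)],                          # final partial batch, wrong
]
print(batch_mean_accuracy(batches), example_accuracy(batches))
```

With 13,032 validation samples and a batch size of 4, every validation batch is full, so the two measures agree for this run. They would diverge if the validation set size were not a multiple of the batch size.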

### Step 7: Visualize Training Results

In [None]:
#Make a dataframe of results. 

# Display floats with two decimal places.
pd.set_option('display.precision', 2)
# Create a DataFrame from our training statistics.
df_stats = pd.DataFrame(data=training_stats)
# Use the 'epoch' as the row index.
df_stats = df_stats.set_index('epoch')
# A hack to force the column headers to wrap.
#df = df.style.set_table_styles([dict(selector="th",props=[('max-width', '70px')])])
# Display the table.
df_stats

In [None]:
#Plot the results from the Dataframe

# Use plot styling from seaborn.
sns.set(style='darkgrid')

# Increase the plot size and font size.
sns.set(font_scale=1.5)
plt.rcParams["figure.figsize"] = (12,6)

# Plot the learning curve.
plt.plot(df_stats['Training Loss'], 'b-o', label="Training")
plt.plot(df_stats['Valid. Loss'], 'g-o', label="Validation")

# Label the plot.
plt.title("Training & Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.xticks(list(df_stats.index))  # One tick per recorded epoch.

plt.show()

### Step 8: Prep Dev Set


In [None]:
import pandas as pd

#import Dev set. 
dev_df = pd.read_csv('dev_with_split.csv')
# The empty choice is read back as NaN on reload, so restore it to an empty string.
dev_df['a'] = dev_df['a'].fillna("")

print('Number of dev sentences: {:,}\n'.format(dev_df.shape[0]))

Number of dev sentences: 11,873



In [None]:
#pull out the relevant columns.

contexts = dev_df.context.values
questions = dev_df.question.values
choices = dev_df[['a','b','c','d','e']].values
#now converted to an INT
dev_df.correct_index = dev_df.correct_index.fillna(1)
labels = dev_df.correct_index.astype(int).values

In [None]:
input_ids = []
attention_masks = []
choices_features = []

#---- THIS IS THE LOOP TO COMBINE THE QUESTIONS WITH THE CHOICES ----
for i in range(len(questions)):
    row = list(choices[i])
    temp_list = []
    for choice in row:
      text = (str(questions[i])+' '+str(choice))
      temp_list.append(text)

    encoded_dict = tokenizer(
                        [contexts[i]] * len(temp_list),  # Repeat the context once per choice.
                        temp_list,
                        add_special_tokens = True,      # Add '[CLS]' and '[SEP]'.
                        max_length = 384,               # Pad & truncate all sentences.
                        padding = 'max_length',         # Replaces the deprecated pad_to_max_length=True.
                        return_attention_mask = True,   # Construct attn. masks.
                        return_tensors = 'pt',
                        truncation = True)

    # Add the encoded example (one row per choice) to the lists.
    input_ids.append(encoded_dict['input_ids'])
    attention_masks.append(encoded_dict['attention_mask'])

# Stack the per-example tensors into shape (num_examples, num_choices, max_length).
input_ids = torch.stack(input_ids)
attention_masks = torch.stack(attention_masks)
labels = torch.tensor(labels).long()

In [None]:
#Check the shape to make sure it worked correctly. 
print(input_ids.size(0))
print(attention_masks.size(0))
print(labels.size(0))

11873
11873
11873


In [None]:
# Set the batch size.  
batch_size = 4  

# Create the DataLoader.
prediction_data = TensorDataset(input_ids, attention_masks, labels)
prediction_sampler = SequentialSampler(prediction_data)
prediction_dataloader = DataLoader(prediction_data, sampler=prediction_sampler, batch_size=batch_size)

### Step 9: Evaluate Dev Set


In [None]:
# Prediction on the dev set (used here as the held-out test set)
print('Predicting labels for {:,} test sentences...'.format(len(input_ids)))

# Put model in evaluation mode
model.eval()
predictions , true_labels = [], []

# Predict 
for batch in prediction_dataloader:
  # Add batch to GPU
  batch = tuple(t.to(device) for t in batch)
  # Unpack the inputs from our dataloader
  b_input_ids, b_input_mask, b_labels = batch
  # Telling the model not to compute or store gradients, saving memory and 
  # speeding up prediction
  with torch.no_grad():
      # Forward pass, calculate logit predictions
      outputs = model(b_input_ids, token_type_ids=None, 
                      attention_mask=b_input_mask)
  logits = outputs[0]

  # Move logits and labels to CPU
  logits = logits.detach().cpu().numpy()
  label_ids = b_labels.to('cpu').numpy()
  
  # Store predictions and true labels
  predictions.append(logits)
  true_labels.append(label_ids)

print('    DONE.')

Predicting labels for 11,873 test sentences...
    DONE.


In [None]:
#len(true_labels)
def get_label_list(true_labels):
  full_label_list = []
  for i in range(len(true_labels)):
    for j in range(len(true_labels[i])):
      full_label_list.append(true_labels[i][j])
  return full_label_list

full_labels = get_label_list(true_labels)
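
The prediction loop stores one array per batch for both the logits and the labels. A minimal sketch of flattening such per-batch lists and turning each row of logits into a predicted choice index is given below. The helper names and toy values are hypothetical, and plain lists stand in for the NumPy arrays used in the notebook.

```python
def flatten_batches(batches):
    """Concatenate a list of per-batch sequences into one flat list of rows."""
    return [row for batch in batches for row in batch]


def argmax_rows(rows):
    """Index of the largest logit in each row (the predicted choice)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


# Two toy batches: sizes 2 and 1, three choices each.
predictions = [
    [[0.1, 2.0, -1.0], [3.0, 0.0, 0.5]],
    [[-0.2, -0.1, 4.0]],
]
true_labels = [[1, 0], [2]]

pred_indices = argmax_rows(flatten_batches(predictions))
gold_indices = flatten_batches(true_labels)
print(pred_indices, gold_indices)
```

Keeping the predicted indices and the gold indices in separate variables makes it harder to pass the wrong one into later steps.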

In [None]:
answer_df = dev_df[['a','b','c','d','e']]

for i in range(10):
  print(answer_df.iloc[i, full_labels[i]])

France
10th and 11th centuries
Denmark, Iceland and Norway
Rollo
10th century




William the Conqueror


In [None]:
import json

def get_pred_dict(full_labels, df):
    # Map each question id to the text of the selected choice.
    pred_dict = {}
    for i in range(len(full_labels)):
        key = str(dev_df['id'][i])
        best_guess = str(df.iloc[i, full_labels[i]])
        pred_dict[key] = best_guess
    return pred_dict 

def output_predictions(predictions):
    # Write the {id: answer} dict that was passed in, rather than relying on a global.
    with open('preds.json', 'w', encoding = 'utf-8') as json_file:
        json.dump(predictions, json_file, ensure_ascii=True)


In [None]:
import json

answer_df = dev_df[['a','b','c','d','e']]

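# Select each answer from the model's logits (argmax over the choices);
# full_labels holds the gold indices and would only reproduce the reference answers.
pred_labels = np.concatenate(predictions, axis=0).argmax(axis=1)
pred_dict = get_pred_dict(pred_labels, answer_df)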
output_predictions(pred_dict)
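
The SQuAD 2.0 evaluation script expects a JSON object mapping each question id to an answer string, with an empty string meaning "no answer". A minimal sketch of building and serializing such a mapping is shown here. The ids, choice rows, and the `build_predictions` helper are made up for illustration.

```python
import json


def build_predictions(ids, choice_rows, chosen):
    """Map each question id to the text of its chosen choice."""
    return {
        str(qid): str(row[idx])
        for qid, row, idx in zip(ids, choice_rows, chosen)
    }


preds = build_predictions(
    ["q1", "q2"],                                   # hypothetical question ids
    [["", "France", "Rollo"], ["", "10th century", "Denmark"]],
    [1, 0],                                         # index 0 selects the empty (no-answer) choice
)
text = json.dumps(preds, ensure_ascii=True)
print(text)
```

Because choice `a` is the empty string in this dataset, selecting index 0 naturally produces the no-answer prediction.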


In [None]:
# save a copy in my drive.

%cp -R /content/preds.json /content/drive/My\ Drive/model_save 

In [None]:
# Clone SQUAD repo for the evaluation file.
# Move the eval file to my content folder 

!git clone https://github.com/white127/SQUAD-2.0-bidaf.git
%mv /content/SQUAD-2.0-bidaf/evaluate-v2.0.py /content/

In [None]:
# Still download the Dev set.
!wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json

--2020-07-17 21:24:10--  https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json
Resolving rajpurkar.github.io (rajpurkar.github.io)... 185.199.108.153, 185.199.109.153, 185.199.110.153, ...
Connecting to rajpurkar.github.io (rajpurkar.github.io)|185.199.108.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4370528 (4.2M) [application/json]
Saving to: ‘dev-v2.0.json’


2020-07-17 21:24:10 (19.9 MB/s) - ‘dev-v2.0.json’ saved [4370528/4370528]



In [1]:
print("Results for SE-4, with 5 way Multi Choice")
!python evaluate-v2.0.py dev-v2.0.json preds.json


### Step 10: Save Fine-Tuned Model 

In [None]:
##### MAKE SURE YOU MOVE A COPY TO YOUR BUCKET.

# Saving best practice: if you use the default file names for the model, you can reload it using from_pretrained()

output_dir = '/content/drive/My Drive/model_save/'

# Create output directory if needed
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

print("Saving model to %s" % output_dir)

# Save a trained model, configuration and tokenizer using `save_pretrained()`.
# They can then be reloaded using `from_pretrained()`
model_to_save = model.module if hasattr(model, 'module') else model  # Take care of distributed/parallel training
model_to_save.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
torch.save(model.state_dict(), '/content/drive/My Drive/model_save/model_state_dict.pth')

# Good practice: save your training arguments together with the trained model
#torch.save(args, os.path.join(output_dir, 'training_args.bin'))

Saving model to /content/drive/My Drive/model_save/


NameError: ignored