# NLP Coursework 2021: Assessing the Funniness of Edited News Headlines

Monika Jotautaite, Anna Hledikova, Candela Martinez Mirat

### README

This notebook contains code for Task 2 of the Codalab competition called *Assessing the Funniness of Edited News Headlines*.

The code for both Approach 1 and Approach 2 of the coursework is included, and the notebook can be run as is. 
However, please note that some cells take a while to run, so we recommend skimming through the saved outputs first (e.g. for hyperparameter search results).

Approach 1 contains 6 versions of the BERT model, with hyperparameter search done for the second version, which showed the most promise.

Approach 2 contains our experiments described in section 4 of the report, namely applying an FFNN to evaluate the 'funniness' of individual words and a CBOW model for headline classification based on the compatibility of the original headline and the edit word.
For more details on the notebook structure, please see the table of contents.

### Initial set-up:

In [None]:
!pip install transformers

In [None]:
# Imports
import random
import torch
import torch.nn as nn
import torch.nn.functional as F
import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import CountVectorizer
from torch.utils.data import Dataset, random_split, DataLoader
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
import torch.optim as optim
import codecs
import tqdm


from transformers import BertForSequenceClassification, AdamW
import re
from transformers import BertTokenizer
from transformers import get_linear_schedule_with_warmup

from torch.autograd import Variable

In [None]:
# Setting random seed and device

def seed(value=1):
    """
    We set a random seed for better reproducibility.
    """
    random.seed(value)
    np.random.seed(value)
    torch.manual_seed(value)
    torch.cuda.manual_seed_all(value)
    torch.backends.cudnn.deterministic = True

seed()

use_cuda = torch.cuda.is_available()
device = torch.device("cuda:0" if use_cuda else "cpu")

In [None]:
# get the data from google drive

# URLs are quoted so that the shell does not treat '&' as a background operator
!wget -O test.csv "https://drive.google.com/u/0/uc?id=1QCY3SPdVj5QMygL0mIZlv8vZ3PMRIcs4&export=download"
!wget -O dev.csv "https://drive.google.com/u/0/uc?id=19B26WPcwh0USNcflb9ab0AVmg88LdCNS&export=download"
!wget -O train_funlines.csv "https://drive.google.com/u/0/uc?id=1FAquNgmICfCuMZRefSB0GNGKnDSskDIh&export=download"
!wget -O train.csv "https://drive.google.com/u/0/uc?id=1sx15OVfmoEelmsNHlY4JtspLjHGRN1f6&export=download"
!wget -O healines_df.csv "https://drive.google.com/u/0/uc?id=1XrACKBhj3xztL63-RZOD84QpRPjAjC8k&export=download"

In [None]:
# Load data
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')
dev_df = pd.read_csv('dev.csv')

# Load additional training data
train_extra_df = pd.read_csv('train_funlines.csv')

extra_headlines = pd.read_csv('healines_df.csv')

## Approach 1: Pre-trained models

The following functions help prepare our data corpus and labels for train, dev and test sets.

In [None]:
# Set hyperparameters

epochs = 4 # BERT authors recommend 2-4
batch_size = 32

In [None]:
def get_orig_headl_and_new_word_tuples(data):
    """
    Takes a pandas data frame as input.
    Each sample of the data set contains a headline article, two edited
    versions of this article and a label indicating which of the two edits is
    funnier.
    Selects relevant columns of the input data, one with the original headlines
    and one with the new word to be inserted instead of one of its words, and 
    converts them into lists of tuples.
    """

    headlines_to_edit_1 = [(original_1, new_word_1) for (original_1, new_word_1) \
                           in zip(data.original1.to_list(), data.edit1.to_list())]

    headlines_to_edit_2 = [(original_2, new_word_2) for (original_2, new_word_2) \
                           in zip(data.original2.to_list(), data.edit2.to_list())]

    labels = data.label.to_list()

    return headlines_to_edit_1, headlines_to_edit_2, labels


def get_edited_headlines(headline_tuples:list)-> list:
    """
    Takes a list of tuples of form (original_headline, new_word) as input.
    Returns a list of edited headlines.
    """
    # list of new edited headlines
    edited_headlines = []

    # The word to be replaced in each sentence is denoted as follows:
    # <word/> to be replaced
    pattern = re.compile(r'\<(.*?)\/\>')
    
    for original, new_word in headline_tuples:
        edited_headline = pattern.sub(new_word, original)
        edited_headlines.append(edited_headline)

    return edited_headlines


def get_original_headlines(data):
    """
    Takes a pandas data frame as input.
    Returns a list of the original headlines without brackets around the word
    to be replaced.
    """

    original_headlines = []
    pattern = re.compile(r'\<(.*?)\/\>')

    for headline in data.original1.to_list():
        # replace the <word/> marker with the bare word it encloses
        orig_headline = pattern.sub(lambda match: match.group(1), headline)
        original_headlines.append(orig_headline)
    
    return original_headlines
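
To make the `<word/>` substitution concrete, here is a minimal, self-contained sketch applied to the example headline whose output is saved further down in this notebook. The helper names `apply_edit` and `strip_markup` are illustrative and not part of the notebook. A function is passed as the replacement so that a backslash in the new word is not interpreted as a regex escape.

```python
import re

# Same marker convention as in the notebook: the word to replace is written <word/>
PATTERN = re.compile(r'<(.*?)/>')


def apply_edit(original, new_word):
    """Return the headline with the marked word replaced by new_word."""
    return PATTERN.sub(lambda match: new_word, original)


def strip_markup(original):
    """Return the headline with the marker removed but the original word kept."""
    return PATTERN.sub(lambda match: match.group(1), original)


headline = "Gene Cernan , Last <Astronaut/> on the Moon , Dies at 82"
edited = apply_edit(headline, "Dancer")
plain = strip_markup(headline)

print(edited)
print(plain)
```
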
    

In [None]:
### process training data

orig_train_headlines = get_original_headlines(train_df)
h_to_edit_1, h_to_edit_2, train_labels = get_orig_headl_and_new_word_tuples(train_df)

edited_headlines_1 = get_edited_headlines(h_to_edit_1)
edited_headlines_2 = get_edited_headlines(h_to_edit_2)

edited_headlines_1_2 = [tup for tup in zip(edited_headlines_1, edited_headlines_2)]

print('Original headline:', h_to_edit_1[0][0])
print('Edited headline:', edited_headlines_1[0])

### process dev data

orig_dev_headlines = get_original_headlines(dev_df)
dev_h_to_edit_1, dev_h_to_edit_2, dev_labels = get_orig_headl_and_new_word_tuples(dev_df)

dev_edited_headlines_1 = get_edited_headlines(dev_h_to_edit_1)
dev_edited_headlines_2 = get_edited_headlines(dev_h_to_edit_2)

### process extended training data

ext_train_df = pd.concat([train_df, train_extra_df])

orig_ext_headlines = get_original_headlines(ext_train_df)
ext_h_to_edit_1, ext_h_to_edit_2, ext_labels = get_orig_headl_and_new_word_tuples(ext_train_df)

ext_edited_headlines_1 = get_edited_headlines(ext_h_to_edit_1)
ext_edited_headlines_2 = get_edited_headlines(ext_h_to_edit_2)

### process test data

orig_test_headlines = get_original_headlines(test_df)
t_h_to_edit_1, t_h_to_edit_2, test_labels = get_orig_headl_and_new_word_tuples(test_df)
                                                                            
test_edited_headlines_1 = get_edited_headlines(t_h_to_edit_1)
test_edited_headlines_2 = get_edited_headlines(t_h_to_edit_2)


Original headline: " Gene Cernan , Last <Astronaut/> on the Moon , Dies at 82 "
Edited headline: " Gene Cernan , Last Dancer on the Moon , Dies at 82 "


Below, we tokenize our corpora.

In [None]:
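# Helper for choosing the padding length, followed by loading the BERT tokenizer.

def find_max_len(sentences, n_components):
    """
    Returns a padding length for inputs made up of n_components sentences:
    the longest whitespace-split sentence length multiplied by n_components,
    plus n_components extra positions.
    """
    max_len = 0
    for sentence in sentences:
        max_len = max(max_len, len(sentence.split()))
    max_len *= n_components
    max_len += n_components
    return max_len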

# get the maximum input length for padding purposes
max_h_len = max(find_max_len(edited_headlines_1, 2), 
                find_max_len(ext_edited_headlines_1,2))

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

### train set

train_set_embedding = tokenizer(edited_headlines_1, edited_headlines_2, 
                            max_length = max_h_len, padding = 'max_length',
                            truncation = True, return_tensors ="pt")

###dev set

dev_set_embedding = tokenizer(dev_edited_headlines_1, dev_edited_headlines_2, 
                            max_length = max_h_len, padding = 'max_length',
                            truncation = True, return_tensors ="pt")

###test set

test_set_embedding = tokenizer(test_edited_headlines_1, test_edited_headlines_2, 
                            max_length = max_h_len, padding = 'max_length',
                            truncation = True, return_tensors ="pt")

### extended train set (incl. funlines)

ext_set_embedding = tokenizer(ext_edited_headlines_1, ext_edited_headlines_2, 
                            max_length = max_h_len, padding = 'max_length',
                            truncation = True, return_tensors ="pt")
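
As a rough illustration of how the padding length is derived, the sketch below reproduces the arithmetic of `find_max_len` on two toy headlines (the headlines are made up for this example). Note that the count is based on whitespace-separated words, whereas BERT's WordPiece tokenizer may split a word into several tokens and wraps a sentence pair in three special tokens (`[CLS]` and two `[SEP]`), so with `truncation = True` an unusually long pair could still be cut.

```python
def find_max_len(sentences, n_components):
    """Longest whitespace-split length, scaled and padded per component."""
    longest = max((len(sentence.split()) for sentence in sentences), default=0)
    return longest * n_components + n_components


# Toy data, not taken from the competition files
toy_headlines = ["Cat wins local election", "Mayor resigns"]

# The longest headline has 4 words, so for a pair of headlines: 4 * 2 + 2
padding_length = find_max_len(toy_headlines, 2)
print(padding_length)
```
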



In [None]:
####################### PROVIDED (ADJUSTED) #######################

# We create a Dataset so we can create minibatches

class Task2Dataset_BERT(Dataset):

    def __init__(self, ids, att_mask, token_type_id, labels):
        self.x1_train = ids.to(device)
        self.x2_train = att_mask.to(device)
        self.x3_train = token_type_id.to(device)
        self.y_train = labels.to(device)

    def __len__(self):
        return len(self.y_train)

    def __getitem__(self, item):
        return self.x1_train[item],self.x2_train[item],self.x3_train[item], self.y_train[item]

Below we create Task2Dataset_BERT class instances and DataLoader objects to be fed into the BERT models.

In [None]:
### train set

train_dataset = Task2Dataset_BERT(train_set_embedding['input_ids'], 
                                  train_set_embedding['attention_mask'], 
                                  train_set_embedding['token_type_ids'], 
                                  torch.tensor(train_labels))

train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)


### dev set

dev_dataset = Task2Dataset_BERT(dev_set_embedding['input_ids'], 
                                  dev_set_embedding['attention_mask'], 
                                  dev_set_embedding['token_type_ids'], 
                                  torch.tensor(dev_labels))

dev_dataloader = DataLoader(dev_dataset, batch_size=batch_size, shuffle=True)

### test set

test_dataset = Task2Dataset_BERT(test_set_embedding['input_ids'], 
                                  test_set_embedding['attention_mask'], 
                                  test_set_embedding['token_type_ids'], 
                                  torch.tensor(test_labels))

test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True)

### extended train set

ext_dataset = Task2Dataset_BERT(ext_set_embedding['input_ids'], 
                                ext_set_embedding['attention_mask'], 
                                ext_set_embedding['token_type_ids'], 
                                torch.tensor(ext_labels))

ext_dataloader = DataLoader(ext_dataset, batch_size=batch_size, shuffle=True)

The following are the functions provided (with slight adjustments) for model training and evaluation.

In [None]:
####################### PROVIDED #######################

def model_performance(output, target, print_output=False):
    """
    Returns accuracy per batch, 
    i.e. if you get 8/10 right, this returns 0.8, NOT 8
    """
    correct_answers = (output == target)
    correct = sum(correct_answers)
    acc = np.true_divide(correct,len(output))

    if print_output:
        print(f'| Acc: {acc:.2f} ')

    return correct, acc

In [None]:
####################### PROVIDED (ADJUSTED) #######################

# Adjustments are mainly the commented out lines 
# (these are left in for the marker's convenience)

def eval(data_iter, model):
    """
    Evaluating model performance on the dev set
    """
    model.eval()
    epoch_loss = 0
    epoch_correct = 0
    pred_all = []
    trg_all = []
    no_observations = 0

    with torch.no_grad():
        for batch in data_iter:
            id, att_mask, token_type_id, labels = batch

            #feature, target = feature.to(device), target.to(device)

            # for RNN:
            #model.batch_size = target.shape[0]
            no_observations = no_observations + labels.shape[0]
            #model.hidden = model.init_hidden()
            out = model(id, attention_mask = att_mask,
                                    token_type_ids = token_type_id,
                                    labels = labels)
            loss = out[0]
            preds = out[1]

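            # We get the number of correct predictions in this batch
            
            correct, __ = model_performance(
                np.argmax(preds.detach().cpu().numpy(), axis=1), 
                labels.cpu().numpy())

            epoch_loss += loss.item()*labels.shape[0]
            epoch_correct += correct
            # move to CPU before collecting, so np.array() also works on GPU runs
            pred_all.extend(preds.detach().cpu().numpy())
            trg_all.extend(labels.detach().cpu().numpy())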

    return epoch_loss/no_observations, epoch_correct/no_observations, np.array(pred_all), np.array(trg_all)
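
The loss returned by the model is a mean over the batch, which is why it is multiplied by the batch size before being summed and then divided by the total number of observations. A minimal sketch of this bookkeeping, using made-up batch statistics and a hypothetical helper `aggregate_epoch`, shows how it differs from naively averaging the per-batch means when the last batch is smaller:

```python
def aggregate_epoch(batches):
    """batches: iterable of (mean_loss, n_correct, batch_size) tuples."""
    total_loss = 0.0
    total_correct = 0
    n_observations = 0
    for mean_loss, n_correct, batch_size in batches:
        total_loss += mean_loss * batch_size  # undo the per-batch averaging
        total_correct += n_correct
        n_observations += batch_size
    return total_loss / n_observations, total_correct / n_observations


# Made-up numbers: one full batch of 32 and a final partial batch of 8
toy_batches = [(1.0, 16, 32), (0.5, 4, 8)]

epoch_loss, epoch_acc = aggregate_epoch(toy_batches)
naive_loss = sum(mean_loss for mean_loss, _, _ in toy_batches) / len(toy_batches)

print(epoch_loss, epoch_acc, naive_loss)
```
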

In [None]:
####################### PROVIDED (ADJUSTED) #######################

# Similarly to the eval function, adjustments are mainly the commented-out lines 
# (& these are left in for the marker's convenience)

def train(train_iter, model, number_epoch, optimizer, scheduler, dev_iter = None):
    """
    Training loop for the model, which calls on eval to evaluate after each epoch
    """
    print("Training model.")
    model = model.to(device)

    for epoch in range(1, number_epoch+1):
        
        model.train()
        
        epoch_loss = 0
        epoch_correct = 0
        no_observations = 0  # Observations used for training so far

        for batch in train_iter:
            id, att_mask, token_type_id, labels = batch

            # for RNN:
            #model.batch_size = target.shape[0]
            #model.hidden = model.init_hidden()
            no_observations = no_observations + labels.shape[0]

            optimizer.zero_grad()
            out = model(id, attention_mask = att_mask,
                                    token_type_ids = token_type_id,
                                    labels = labels)
            
            loss = out[0]
            preds = out[1]

            correct, __ = model_performance(
                np.argmax(preds.detach().cpu().numpy(), axis=1), 
                labels.cpu().numpy())

            loss.backward()
            optimizer.step()
            scheduler.step()

            epoch_loss += loss.item()*labels.shape[0]
            epoch_correct += correct

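        epoch_loss, epoch_acc = epoch_loss / no_observations, epoch_correct / no_observations

        if dev_iter is not None:
            valid_loss, valid_acc, __, __ = eval(dev_iter, model)
            print(f'| Epoch: {epoch:02} | Train Loss: {epoch_loss:.2f} | Train Accuracy: {epoch_acc:.2f} | \
        Val. Loss: {valid_loss:.2f} | Val. Accuracy: {valid_acc:.2f} |')
        else:
            # no dev set supplied: report training metrics only
            print(f'| Epoch: {epoch:02} | Train Loss: {epoch_loss:.2f} | Train Accuracy: {epoch_acc:.2f} |')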

#### BERT Version 1: BertForSequenceClassification() using the base training data

In [None]:
# Load the BertForSequenceClassification model

model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels = 3,   
                                                      output_attentions = False,
                                                      output_hidden_states = False)





Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

In [None]:
# Hyperparameters for BERT v.1:

decay = 1e-2
lr = 1e-05
steps = len(train_dataloader) * epochs
wu = 0.06
wu_steps = int(steps * wu)

no_decay = ['bias', 'LayerNorm.weight']

optimizer_grouped_parameters = [
    {'params': [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': decay},
    {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}]


optimizer = AdamW(optimizer_grouped_parameters, lr=lr)
scheduler = get_linear_schedule_with_warmup(optimizer, 
                                            num_warmup_steps = wu_steps,
                                            num_training_steps = steps)
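
For intuition about the schedule, the following pure-Python sketch computes the learning-rate multiplier of a linear warm-up followed by a linear decay. It mirrors, to our understanding, the behaviour of `get_linear_schedule_with_warmup`, but it is an illustration rather than the library's code; the function name `linear_warmup_factor` and the step counts are assumptions made for this example.

```python
def linear_warmup_factor(step, warmup_steps, total_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # ramp up linearly from 0 to 1 during warm-up
        return step / max(1, warmup_steps)
    # then decay linearly from 1 to 0 over the remaining steps
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Hypothetical run: 100 optimizer steps, of which 6 are warm-up steps
total_steps = 100
warmup_steps = 6
base_lr = 1e-05

schedule = [base_lr * linear_warmup_factor(step, warmup_steps, total_steps)
            for step in range(total_steps + 1)]

print(schedule[0], max(schedule), schedule[-1])
```
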

In [None]:
train(train_dataloader, model, epochs,optimizer, scheduler, dev_dataloader)

In [None]:
# Evaluation on test set

test_loss, test_acc, __, __ = eval(test_dataloader, model)
print(test_acc)

0.46587837837837837


#### BERT Version 2: BertForSequenceClassification() using the base training data, but removing data with label 0

Note: This has not been used in the report for consistency reasons

In [None]:
# Train the same model but this time without the zero label data

# train set

h_to_edit_1, h_to_edit_2, train_labels = get_orig_headl_and_new_word_tuples(train_df)

filtered_train_data = [(h_1, h_2, label) for (h_1, h_2, label) in zip(h_to_edit_1, h_to_edit_2, train_labels) if label != 0 ]

filt_h_to_edit_1 = [tup[0] for tup in filtered_train_data]
filt_h_to_edit_2 = [tup[1] for tup in filtered_train_data]
filt_labels = [tup[2]-1 for tup in filtered_train_data]

filt_edited_headlines_1 = get_edited_headlines(filt_h_to_edit_1)
filt_edited_headlines_2 = get_edited_headlines(filt_h_to_edit_2)

filt_train_set_embedding = tokenizer(filt_edited_headlines_1, filt_edited_headlines_2, 
                            max_length = max_h_len, padding = 'max_length',
                            truncation = True, return_tensors ="pt")


filt_train_dataset = Task2Dataset_BERT(filt_train_set_embedding['input_ids'], 
                                  filt_train_set_embedding['attention_mask'], 
                                  filt_train_set_embedding['token_type_ids'], 
                                  torch.tensor(filt_labels))


filt_train_dataloader = DataLoader(filt_train_dataset, batch_size=batch_size, shuffle=True)

# dev set

dev_h_to_edit_1, dev_h_to_edit_2, dev_labels = get_orig_headl_and_new_word_tuples(dev_df)

filtered_dev_data = [(h_1, h_2, label) for (h_1, h_2, label) in zip(dev_h_to_edit_1, dev_h_to_edit_2, dev_labels) if label != 0 ]

filt_dev_h_to_edit_1 = [tup[0] for tup in filtered_dev_data]
filt_dev_h_to_edit_2 = [tup[1] for tup in filtered_dev_data]
filt_dev_labels = [tup[2]-1 for tup in filtered_dev_data]

filt_dev_edited_headlines_1 = get_edited_headlines(filt_dev_h_to_edit_1)
filt_dev_edited_headlines_2 = get_edited_headlines(filt_dev_h_to_edit_2)

filt_dev_set_embedding = tokenizer(filt_dev_edited_headlines_1, filt_dev_edited_headlines_2, 
                            max_length = max_h_len, padding = 'max_length',
                            truncation = True, return_tensors ="pt")


filt_dev_dataset = Task2Dataset_BERT(filt_dev_set_embedding['input_ids'], 
                                  filt_dev_set_embedding['attention_mask'], 
                                  filt_dev_set_embedding['token_type_ids'], 
                                  torch.tensor(filt_dev_labels))


filt_dev_dataloader = DataLoader(filt_dev_dataset, batch_size=batch_size, shuffle=True)

# test set

test_h_to_edit_1, test_h_to_edit_2, test_labels = get_orig_headl_and_new_word_tuples(test_df)

filtered_test_data = [(h_1, h_2, label) for (h_1, h_2, label) in zip(test_h_to_edit_1, test_h_to_edit_2, test_labels) if label != 0 ]

filt_test_h_to_edit_1 = [tup[0] for tup in filtered_test_data]
filt_test_h_to_edit_2 = [tup[1] for tup in filtered_test_data]
filt_test_labels = [tup[2]-1 for tup in filtered_test_data]

filt_test_edited_headlines_1 = get_edited_headlines(filt_test_h_to_edit_1)
filt_test_edited_headlines_2 = get_edited_headlines(filt_test_h_to_edit_2)

filt_test_set_embedding = tokenizer(filt_test_edited_headlines_1, filt_test_edited_headlines_2, 
                            max_length = max_h_len, padding = 'max_length',
                            truncation = True, return_tensors ="pt")


filt_test_dataset = Task2Dataset_BERT(filt_test_set_embedding['input_ids'], 
                                  filt_test_set_embedding['attention_mask'], 
                                  filt_test_set_embedding['token_type_ids'], 
                                  torch.tensor(filt_test_labels))


filt_test_dataloader = DataLoader(filt_test_dataset, batch_size=batch_size, shuffle=True)
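
The same filter-and-shift step is repeated for the train, dev and test sets above. A minimal sketch of that step as a single helper is given below; the name `drop_zero_and_shift` and the toy inputs are hypothetical. Samples with label 0 are dropped, and the remaining labels 1 and 2 are shifted to 0 and 1 so that they match a classifier with `num_labels = 2`.

```python
def drop_zero_and_shift(items_1, items_2, labels):
    """Drop samples labelled 0 and map the remaining labels 1, 2 to 0, 1."""
    kept = [(item_1, item_2, label - 1)
            for item_1, item_2, label in zip(items_1, items_2, labels)
            if label != 0]
    if not kept:
        return [], [], []
    kept_1, kept_2, kept_labels = zip(*kept)
    return list(kept_1), list(kept_2), list(kept_labels)


# Toy stand-ins for the (original_headline, new_word) tuples
toy_1 = [("h1 <a/>", "x"), ("h2 <b/>", "y"), ("h3 <c/>", "z")]
toy_2 = [("h1 <a/>", "p"), ("h2 <b/>", "q"), ("h3 <c/>", "r")]
toy_labels = [1, 0, 2]

filt_1, filt_2, filt_labels = drop_zero_and_shift(toy_1, toy_2, toy_labels)
print(filt_labels)
```
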


In [None]:
model_filt = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                        num_labels = 2,   
                                                        output_attentions = False,
                                                        output_hidden_states = False)

In [None]:
# Hyperparameters for BERT v.2:

epochs = 4
decay = 1e-2
lr = 1e-05
steps = len(filt_train_dataloader) * epochs  # the scheduler must match the filtered loader used for training
wu = 0.06
wu_steps = int(steps * wu)

no_decay = ['bias', 'LayerNorm.weight']

optimizer_grouped_parameters = [
    {'params': [p for n, p in model_filt.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': decay},
    {'params': [p for n, p in model_filt.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}]

optimizer = AdamW(optimizer_grouped_parameters, lr=lr)
scheduler = get_linear_schedule_with_warmup(optimizer, 
                                            num_warmup_steps = wu_steps,
                                            num_training_steps = steps)

In [None]:
train(filt_train_dataloader, model_filt, epochs,optimizer, scheduler, filt_dev_dataloader)

Training model.
| Epoch: 01 | Train Loss: 0.71 | Train Accuracy: 0.51 |         Val. Loss: 0.69 | Val. Accuracy: 0.51 |
| Epoch: 02 | Train Loss: 0.68 | Train Accuracy: 0.55 |         Val. Loss: 0.71 | Val. Accuracy: 0.51 |
| Epoch: 03 | Train Loss: 0.62 | Train Accuracy: 0.66 |         Val. Loss: 0.73 | Val. Accuracy: 0.55 |
| Epoch: 04 | Train Loss: 0.53 | Train Accuracy: 0.75 |         Val. Loss: 0.76 | Val. Accuracy: 0.56 |
| Epoch: 05 | Train Loss: 0.46 | Train Accuracy: 0.79 |         Val. Loss: 0.82 | Val. Accuracy: 0.56 |
| Epoch: 06 | Train Loss: 0.42 | Train Accuracy: 0.82 |         Val. Loss: 0.84 | Val. Accuracy: 0.57 |


In [None]:
# Evaluation on test set

test_loss, test_acc, __, __ = eval(filt_test_dataloader, model_filt)

print(test_acc)

0.5555555555555556


#### BERT Version 3: BertForSequenceClassification() using the extended training data

In [None]:
# Hyperparameter search for BERT v.2:

decays = [1e-2, 1e-3]
lrs = [1e-05, 1e-04]
steps = len(ext_dataloader) * epochs  # the search trains on the extended data
wus = [0.06, 0.03]
wu_steps = int(steps * wu)

In [None]:
i = 1

for decay in decays:
    for lr in lrs:
        for wu in wus:
            # message
            print(f"Training model number {i} with lr = {lr}, decay = {decay} and wu = {wu}.")

            # initialize model
            model_ext = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                        num_labels = 3,   
                                                        output_attentions = False,
                                                        output_hidden_states = False)
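            # initialize remaining hyperparams
            # (warm-up steps are recomputed here so that each wu value takes effect)
            wu_steps = int(steps * wu)
            no_decay = ['bias', 'LayerNorm.weight']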
            optimizer_grouped_parameters = [
                {'params': [p for n, p in model_ext.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': decay},
                {'params': [p for n, p in model_ext.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}]

            optimizer = AdamW(optimizer_grouped_parameters, lr=lr)
            scheduler = get_linear_schedule_with_warmup(optimizer, 
                                                        num_warmup_steps = wu_steps,
                                                        num_training_steps = steps)
            # train model
            train(ext_dataloader, model_ext, epochs,optimizer, scheduler, dev_dataloader)
            print("")
            i+=1
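
The three nested loops enumerate every combination of weight decay, learning rate and warm-up fraction, i.e. 2 x 2 x 2 = 8 configurations. A compact sketch of the same enumeration with `itertools.product` is shown below, with the number of warm-up steps derived separately for each configuration. The value of `steps` is a made-up placeholder and no model is trained here.

```python
from itertools import product

decays = [1e-2, 1e-3]
lrs = [1e-05, 1e-04]
wus = [0.06, 0.03]
steps = 1000  # hypothetical total number of optimizer steps

configs = []
# product() iterates with the last argument varying fastest,
# which matches the order of the nested loops
for run, (decay, lr, wu) in enumerate(product(decays, lrs, wus), start=1):
    configs.append({
        "run": run,
        "decay": decay,
        "lr": lr,
        "wu": wu,
        "wu_steps": int(steps * wu),  # depends on the current warm-up fraction
    })

for config in configs:
    print(config)
```
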

Training model number 1 with lr = 1e-05, decay = 0.01 and wu = 0.06.



Training model.
| Epoch: 01 | Train Loss: 0.99 | Train Accuracy: 0.44 |         Val. Loss: 0.96 | Val. Accuracy: 0.43 |
| Epoch: 02 | Train Loss: 0.96 | Train Accuracy: 0.46 |         Val. Loss: 0.96 | Val. Accuracy: 0.47 |
| Epoch: 03 | Train Loss: 0.94 | Train Accuracy: 0.50 |         Val. Loss: 0.96 | Val. Accuracy: 0.47 |
| Epoch: 04 | Train Loss: 0.93 | Train Accuracy: 0.54 |         Val. Loss: 0.97 | Val. Accuracy: 0.46 |

Training model number 2 with lr = 1e-05, decay = 0.01 and wu = 0.03.



Training model.
| Epoch: 01 | Train Loss: 0.99 | Train Accuracy: 0.44 |         Val. Loss: 0.96 | Val. Accuracy: 0.46 |
| Epoch: 02 | Train Loss: 0.95 | Train Accuracy: 0.49 |         Val. Loss: 0.97 | Val. Accuracy: 0.45 |
| Epoch: 03 | Train Loss: 0.92 | Train Accuracy: 0.54 |         Val. Loss: 0.98 | Val. Accuracy: 0.46 |
| Epoch: 04 | Train Loss: 0.89 | Train Accuracy: 0.59 |         Val. Loss: 0.98 | Val. Accuracy: 0.46 |

Training model number 3 with lr = 0.0001, decay = 0.01 and wu = 0.06.



Training model.
| Epoch: 01 | Train Loss: 0.99 | Train Accuracy: 0.43 |         Val. Loss: 0.97 | Val. Accuracy: 0.43 |
| Epoch: 02 | Train Loss: 0.96 | Train Accuracy: 0.45 |         Val. Loss: 0.96 | Val. Accuracy: 0.46 |
| Epoch: 03 | Train Loss: 0.96 | Train Accuracy: 0.45 |         Val. Loss: 0.96 | Val. Accuracy: 0.43 |
| Epoch: 04 | Train Loss: 0.96 | Train Accuracy: 0.45 |         Val. Loss: 0.96 | Val. Accuracy: 0.46 |

Training model number 4 with lr = 0.0001, decay = 0.01 and wu = 0.03.



Training model.
| Epoch: 01 | Train Loss: 0.97 | Train Accuracy: 0.45 |         Val. Loss: 0.96 | Val. Accuracy: 0.46 |
| Epoch: 02 | Train Loss: 0.96 | Train Accuracy: 0.46 |         Val. Loss: 0.96 | Val. Accuracy: 0.46 |
| Epoch: 03 | Train Loss: 0.90 | Train Accuracy: 0.57 |         Val. Loss: 0.95 | Val. Accuracy: 0.53 |
| Epoch: 04 | Train Loss: 0.69 | Train Accuracy: 0.73 |         Val. Loss: 1.07 | Val. Accuracy: 0.52 |

Training model number 5 with lr = 1e-05, decay = 0.001 and wu = 0.06.



Training model.
| Epoch: 01 | Train Loss: 0.97 | Train Accuracy: 0.45 |         Val. Loss: 0.96 | Val. Accuracy: 0.46 |
| Epoch: 02 | Train Loss: 0.96 | Train Accuracy: 0.46 |         Val. Loss: 0.96 | Val. Accuracy: 0.45 |
| Epoch: 03 | Train Loss: 0.94 | Train Accuracy: 0.51 |         Val. Loss: 0.97 | Val. Accuracy: 0.46 |
| Epoch: 04 | Train Loss: 0.92 | Train Accuracy: 0.55 |         Val. Loss: 0.97 | Val. Accuracy: 0.46 |

Training model number 6 with lr = 1e-05, decay = 0.001 and wu = 0.03.



Training model.
| Epoch: 01 | Train Loss: 0.98 | Train Accuracy: 0.45 |         Val. Loss: 0.96 | Val. Accuracy: 0.46 |
| Epoch: 02 | Train Loss: 0.95 | Train Accuracy: 0.51 |         Val. Loss: 0.96 | Val. Accuracy: 0.49 |
| Epoch: 03 | Train Loss: 0.88 | Train Accuracy: 0.60 |         Val. Loss: 0.98 | Val. Accuracy: 0.51 |
| Epoch: 04 | Train Loss: 0.84 | Train Accuracy: 0.65 |         Val. Loss: 0.98 | Val. Accuracy: 0.51 |

Training model number 7 with lr = 0.0001, decay = 0.001 and wu = 0.06.



Training model.
| Epoch: 01 | Train Loss: 0.97 | Train Accuracy: 0.44 |         Val. Loss: 0.96 | Val. Accuracy: 0.44 |
| Epoch: 02 | Train Loss: 0.96 | Train Accuracy: 0.46 |         Val. Loss: 0.96 | Val. Accuracy: 0.43 |
| Epoch: 03 | Train Loss: 0.96 | Train Accuracy: 0.45 |         Val. Loss: 0.96 | Val. Accuracy: 0.43 |
| Epoch: 04 | Train Loss: 0.96 | Train Accuracy: 0.45 |         Val. Loss: 0.96 | Val. Accuracy: 0.46 |

Training model number 8 with lr = 0.0001, decay = 0.001 and wu = 0.03.



Training model.
| Epoch: 01 | Train Loss: 0.98 | Train Accuracy: 0.45 |         Val. Loss: 0.98 | Val. Accuracy: 0.46 |
| Epoch: 02 | Train Loss: 0.97 | Train Accuracy: 0.44 |         Val. Loss: 0.96 | Val. Accuracy: 0.46 |
| Epoch: 03 | Train Loss: 0.96 | Train Accuracy: 0.45 |         Val. Loss: 0.96 | Val. Accuracy: 0.44 |
| Epoch: 04 | Train Loss: 0.95 | Train Accuracy: 0.49 |         Val. Loss: 0.97 | Val. Accuracy: 0.43 |



In [None]:
train(ext_dataloader, model_ext, epochs,optimizer, scheduler, dev_dataloader)

Training model.
| Epoch: 01 | Train Loss: 0.99 | Train Accuracy: 0.45 |         Val. Loss: 0.97 | Val. Accuracy: 0.46 |
| Epoch: 02 | Train Loss: 0.95 | Train Accuracy: 0.48 |         Val. Loss: 0.96 | Val. Accuracy: 0.47 |
| Epoch: 03 | Train Loss: 0.91 | Train Accuracy: 0.57 |         Val. Loss: 0.96 | Val. Accuracy: 0.51 |
| Epoch: 04 | Train Loss: 0.82 | Train Accuracy: 0.65 |         Val. Loss: 1.00 | Val. Accuracy: 0.52 |
| Epoch: 05 | Train Loss: 0.76 | Train Accuracy: 0.69 |         Val. Loss: 1.02 | Val. Accuracy: 0.51 |
| Epoch: 06 | Train Loss: 0.74 | Train Accuracy: 0.70 |         Val. Loss: 1.02 | Val. Accuracy: 0.51 |


In [None]:
# Evaluation on test set

test_loss, test_acc, __, __ = eval(test_dataloader, model_ext)

print(test_acc)

0.5155405405405405


#### BERT Version 4: BertForSequenceClassification() using the extended training data and including the original headline

In [None]:
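# GET EMBEDDINGS
# NOTE: the tokenizer encodes at most two sequences (`text` and `text_pair`).
# The third positional argument passed below is therefore not encoded as an
# additional segment of `input_ids`.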

max_h_len = max(find_max_len(edited_headlines_1, 3), find_max_len(ext_edited_headlines_1,3))

#dev set

dev_set_embedding_o = tokenizer(dev_edited_headlines_1, dev_edited_headlines_2, orig_dev_headlines, 
                            max_length = max_h_len, padding = 'max_length',
                            truncation = True, return_tensors ="pt")

#test set

test_set_embedding_o = tokenizer(test_edited_headlines_1, test_edited_headlines_2, orig_test_headlines,
                            max_length = max_h_len, padding = 'max_length',
                            truncation = True, return_tensors ="pt")

# extended train set (incl. funlines)

ext_set_embedding_o = tokenizer(ext_edited_headlines_1, ext_edited_headlines_2, orig_ext_headlines,
                            max_length = max_h_len, padding = 'max_length',
                            truncation = True, return_tensors ="pt")


# GET DATA LOADERS

# dev set

dev_dataset_o = Task2Dataset_BERT(dev_set_embedding_o['input_ids'], 
                                  dev_set_embedding_o['attention_mask'], 
                                  dev_set_embedding_o['token_type_ids'], 
                                  torch.tensor(dev_labels))

dev_dataloader_o = DataLoader(dev_dataset_o, batch_size=batch_size, shuffle=True)

# test set

test_dataset_o = Task2Dataset_BERT(test_set_embedding_o['input_ids'], 
                                  test_set_embedding_o['attention_mask'], 
                                  test_set_embedding_o['token_type_ids'], 
                                  torch.tensor(test_labels))

test_dataloader_o = DataLoader(test_dataset_o, batch_size=batch_size, shuffle=True)

# extended train set

ext_dataset_o = Task2Dataset_BERT(ext_set_embedding_o['input_ids'], 
                                ext_set_embedding_o['attention_mask'], 
                                ext_set_embedding_o['token_type_ids'], 
                                torch.tensor(ext_labels))

ext_dataloader_o = DataLoader(ext_dataset_o, batch_size=batch_size, shuffle=True)

In [None]:
# Load the BertForSequenceClassification model

model_ext_o = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                        num_labels = 3,   
                                                        output_attentions = False,
                                                        output_hidden_states = False)

In [None]:
# Hyperparameters for BERT v.4:

decay = 1e-2
lr = 1e-05
# the schedule length must match the data loader that is actually used for training
steps = len(ext_dataloader_o) * epochs
wu = 0.06
wu_steps = int(steps * wu)

no_decay = ['bias', 'LayerNorm.weight']

optimizer_grouped_parameters = [
    {'params': [p for n, p in model_ext_o.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': decay},
    {'params': [p for n, p in model_ext_o.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}]


optimizer = AdamW(optimizer_grouped_parameters, lr=lr)
scheduler = get_linear_schedule_with_warmup(optimizer, 
                                            num_warmup_steps = wu_steps,
                                            num_training_steps = steps)
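
The schedule configured above multiplies the base learning rate by a factor that rises linearly from 0 to 1 over the warm-up steps and then falls linearly back to 0 at `num_training_steps`. The sketch below reproduces that factor in plain Python as an illustration; `linear_warmup_factor` is a hypothetical helper name, and the step counts are made up.

```python
def linear_warmup_factor(step, warmup_steps, total_steps):
    """Learning-rate multiplier at a given optimizer step."""
    if step < warmup_steps:
        # warm-up phase: grow linearly from 0 to 1
        return step / max(1, warmup_steps)
    # decay phase: shrink linearly from 1 to 0, never below 0
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Illustrative numbers (assumed): 100 steps in total, 6% of them for warm-up
total_steps = 100
warmup_steps = int(total_steps * 0.06)
factors = [linear_warmup_factor(s, warmup_steps, total_steps)
           for s in range(total_steps + 1)]
```

With these numbers the factor peaks at step 6, and any step taken after step 100 would use a learning rate of 0, so `steps` should cover every batch the training loop processes.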

In [None]:
train(ext_dataloader_o, model_ext_o, epochs,optimizer, scheduler, dev_dataloader_o)

Training model.
| Epoch: 01 | Train Loss: 0.98 | Train Accuracy: 0.46 |         Val. Loss: 0.96 | Val. Accuracy: 0.45 |
| Epoch: 02 | Train Loss: 0.95 | Train Accuracy: 0.49 |         Val. Loss: 0.97 | Val. Accuracy: 0.45 |
| Epoch: 03 | Train Loss: 0.91 | Train Accuracy: 0.56 |         Val. Loss: 0.98 | Val. Accuracy: 0.47 |
| Epoch: 04 | Train Loss: 0.83 | Train Accuracy: 0.64 |         Val. Loss: 1.02 | Val. Accuracy: 0.48 |
| Epoch: 05 | Train Loss: 0.78 | Train Accuracy: 0.67 |         Val. Loss: 1.05 | Val. Accuracy: 0.48 |
| Epoch: 06 | Train Loss: 0.76 | Train Accuracy: 0.69 |         Val. Loss: 1.05 | Val. Accuracy: 0.48 |
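
In the log above the validation loss is lowest after the first epoch and rises afterwards while the training loss keeps falling, a pattern that usually indicates overfitting. A minimal sketch of patience-based early stopping on such a log follows; it is not part of the notebook's `train` function, and `early_stopping` and `patience` are hypothetical names.

```python
def early_stopping(val_losses, patience=2):
    """Return (best_epoch, stop_epoch), both 1-based.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


# Validation losses copied from the training log above
val_losses = [0.96, 0.97, 0.98, 1.02, 1.05, 1.05]
best_epoch, stop_epoch = early_stopping(val_losses, patience=2)
```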


In [None]:
# Evaluation on test set

test_loss, test_acc, __, __ = eval(test_dataloader_o, model_ext_o)

print(test_acc)

0.46452702702702703


#### BERT Version 5: Use word pairs only for the prediction

In [None]:
def get_new_words(data):
    """
    Instead of a list of headlines, this function only extracts the edit words.
    """

    edit_words_1 = data.edit1.to_list()
    edit_words_2 = data.edit2.to_list()

    labels = data.label.to_list()
    return edit_words_1, edit_words_2, labels

train_words_1, train_words_2, train_labels = get_new_words(train_extra_df)
dev_words_1, dev_words_2, dev_labels = get_new_words(dev_df)
test_words_1, test_words_2, test_labels = get_new_words(test_df)

In [None]:
# Tokenize

### train set incl. funlines

train_words_embedding = tokenizer(train_words_1, train_words_2, max_length = 4, 
                                  padding = 'max_length',truncation = True, 
                                  return_tensors ="pt")
### dev set

dev_words_embedding = tokenizer(dev_words_1, dev_words_2, max_length = 4, 
                                padding = 'max_length', truncation = True, 
                                return_tensors ="pt")
### test set

test_words_embedding = tokenizer(test_words_1, test_words_2, max_length = 4, 
                                 padding = 'max_length', truncation = True, 
                                 return_tensors ="pt")
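
For a sentence pair, BERT's input has the layout `[CLS] A [SEP] B [SEP]`, so the sequence length must cover the tokens of both parts plus three special tokens. The plain-Python sketch below illustrates this budget; `build_pair_input` is a hypothetical helper, not the tokenizer's API, and the word pieces are made-up examples.

```python
def build_pair_input(tokens_a, tokens_b):
    """Lay out two token lists the way BERT expects a sentence pair."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # segment ids: 0 for [CLS], A and the first [SEP]; 1 for B and the last [SEP]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


# Two edit words that each map to a single word piece (assumed for illustration)
tokens, segment_ids = build_pair_input(["pizza"], ["wig"])
required_length = len(tokens)
```

Here `required_length` is 5, so a `max_length` of 4 cannot hold even two single-piece words without truncation; words that split into several word pieces need more room still.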


In [None]:
### train set incl. funlines

train_dataset_w = Task2Dataset_BERT(train_words_embedding['input_ids'], 
                                  train_words_embedding['attention_mask'], 
                                  train_words_embedding['token_type_ids'], 
                                  torch.tensor(train_labels))

train_dataloader_w = DataLoader(train_dataset_w, batch_size=batch_size, shuffle=True)


### dev set

dev_dataset_w = Task2Dataset_BERT(dev_words_embedding['input_ids'], 
                                  dev_words_embedding['attention_mask'], 
                                  dev_words_embedding['token_type_ids'], 
                                  torch.tensor(dev_labels))

dev_dataloader_w = DataLoader(dev_dataset_w, batch_size=batch_size, shuffle=True)

### test set

test_dataset_w = Task2Dataset_BERT(test_words_embedding['input_ids'], 
                                  test_words_embedding['attention_mask'], 
                                  test_words_embedding['token_type_ids'], 
                                  torch.tensor(test_labels))

test_dataloader_w = DataLoader(test_dataset_w, batch_size=batch_size, shuffle=True)

In [None]:
# Load the BertForSequenceClassification model

model_w = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels = 3,   
                                                      output_attentions = False,
                                                      output_hidden_states = False)

# Hyperparameters for BERT v.5:

decay = 1e-2
lr = 1e-05
# the schedule length must match the data loader that is actually used for training
steps = len(train_dataloader_w) * epochs
wu = 0.06
wu_steps = int(steps * wu)

no_decay = ['bias', 'LayerNorm.weight']

optimizer_grouped_parameters = [
    {'params': [p for n, p in model_w.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': decay},
    {'params': [p for n, p in model_w.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}]


optimizer = AdamW(optimizer_grouped_parameters, lr=lr)
scheduler = get_linear_schedule_with_warmup(optimizer, 
                                            num_warmup_steps = wu_steps,
                                            num_training_steps = steps)


In [None]:
# run 1
train(train_dataloader_w, model_w, epochs,optimizer, scheduler, dev_dataloader_w)

Training model.
| Epoch: 01 | Train Loss: 1.04 | Train Accuracy: 0.45 |         Val. Loss: 0.97 | Val. Accuracy: 0.46 |
| Epoch: 02 | Train Loss: 0.96 | Train Accuracy: 0.45 |         Val. Loss: 0.96 | Val. Accuracy: 0.48 |
| Epoch: 03 | Train Loss: 0.95 | Train Accuracy: 0.47 |         Val. Loss: 0.96 | Val. Accuracy: 0.47 |
| Epoch: 04 | Train Loss: 0.95 | Train Accuracy: 0.49 |         Val. Loss: 0.96 | Val. Accuracy: 0.50 |


In [None]:
# Evaluation on test set

test_loss, test_acc, __, __ = eval(test_dataloader_w, model_w)
print(test_acc)

0.47635135135135137


#### BERT Version 6: Use word pairs & original headlines for the prediction

In [None]:
max_len = find_max_len(orig_ext_headlines,1)+10

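# Tokenize
# NOTE: the tokenizer encodes at most two sequences, so the third positional
# argument passed below does not become part of `input_ids`.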

### train set incl. funlines

train_words_embedding_o = tokenizer(train_words_1, train_words_2, orig_ext_headlines,
                                  max_length = max_len, padding = 'max_length',
                                  truncation = True, return_tensors ="pt")
### dev set

dev_words_embedding_o = tokenizer(dev_words_1, dev_words_2, orig_dev_headlines,
                                max_length = max_len, padding = 'max_length', 
                                truncation = True, return_tensors ="pt")
### test set

test_words_embedding_o = tokenizer(test_words_1, test_words_2, orig_test_headlines,
                                 max_length = max_len,  padding = 'max_length', 
                                 truncation = True, return_tensors ="pt")

In [None]:
### train set incl. funlines

train_dataset_wo = Task2Dataset_BERT(train_words_embedding_o['input_ids'], 
                                  train_words_embedding_o['attention_mask'], 
                                  train_words_embedding_o['token_type_ids'], 
                                  torch.tensor(train_labels))

train_dataloader_wo = DataLoader(train_dataset_wo, batch_size=batch_size, shuffle=True)


### dev set

dev_dataset_wo = Task2Dataset_BERT(dev_words_embedding_o['input_ids'], 
                                  dev_words_embedding_o['attention_mask'], 
                                  dev_words_embedding_o['token_type_ids'], 
                                  torch.tensor(dev_labels))

dev_dataloader_wo = DataLoader(dev_dataset_wo, batch_size=batch_size, shuffle=True)

### test set

test_dataset_wo = Task2Dataset_BERT(test_words_embedding_o['input_ids'], 
                                  test_words_embedding_o['attention_mask'], 
                                  test_words_embedding_o['token_type_ids'], 
                                  torch.tensor(test_labels))

test_dataloader_wo = DataLoader(test_dataset_wo, batch_size=batch_size, shuffle=True)

In [None]:
# Load the BertForSequenceClassification model

model_wo = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels = 3,   
                                                      output_attentions = False,
                                                      output_hidden_states = False)

# Hyperparameters for BERT v.6:

decay = 1e-2
lr = 1e-05
# the schedule length must match the data loader that is actually used for training
steps = len(train_dataloader_wo) * epochs
wu = 0.06
wu_steps = int(steps * wu)

no_decay = ['bias', 'LayerNorm.weight']

optimizer_grouped_parameters = [
    {'params': [p for n, p in model_wo.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': decay},
    {'params': [p for n, p in model_wo.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}]


optimizer = AdamW(optimizer_grouped_parameters, lr=lr)
scheduler = get_linear_schedule_with_warmup(optimizer, 
                                            num_warmup_steps = wu_steps,
                                            num_training_steps = steps)
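
The two parameter groups above are selected by substring matching on parameter names: any parameter whose name contains `bias` or `LayerNorm.weight` is exempt from weight decay. The sketch below applies the same filter to a handful of name strings; the names are illustrative examples in the style of BERT's parameter names, and `split_by_decay` is a hypothetical helper.

```python
def split_by_decay(param_names, no_decay=("bias", "LayerNorm.weight")):
    """Split parameter names into (decayed, not_decayed) by substring matching."""
    decayed = [n for n in param_names if not any(nd in n for nd in no_decay)]
    not_decayed = [n for n in param_names if any(nd in n for nd in no_decay)]
    return decayed, not_decayed


# Example names (assumed, for illustration only)
names = [
    "encoder.layer.0.attention.self.query.weight",
    "encoder.layer.0.attention.self.query.bias",
    "encoder.layer.0.attention.output.LayerNorm.weight",
    "encoder.layer.0.attention.output.LayerNorm.bias",
    "classifier.weight",
]
decayed, not_decayed = split_by_decay(names)
```

Only the two plain weight matrices end up in the decayed group; both LayerNorm parameters and the bias are left unregularized.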

In [None]:
# run 1
train(train_dataloader_wo, model_wo, epochs,optimizer, scheduler, dev_dataloader_wo)

Training model.
| Epoch: 01 | Train Loss: 1.11 | Train Accuracy: 0.40 |         Val. Loss: 0.97 | Val. Accuracy: 0.45 |
| Epoch: 02 | Train Loss: 0.95 | Train Accuracy: 0.49 |         Val. Loss: 0.96 | Val. Accuracy: 0.48 |
| Epoch: 03 | Train Loss: 0.95 | Train Accuracy: 0.51 |         Val. Loss: 0.96 | Val. Accuracy: 0.50 |
| Epoch: 04 | Train Loss: 0.91 | Train Accuracy: 0.56 |         Val. Loss: 0.98 | Val. Accuracy: 0.49 |


In [None]:
# Evaluation on test set

test_loss, test_acc, __, __ = eval(test_dataloader_wo, model_wo)
print(test_acc)

0.46824324324324323


In [None]:
# run 2
train(train_dataloader_wo, model_wo, epochs,optimizer, scheduler, dev_dataloader_wo)

Training model.
| Epoch: 01 | Train Loss: 1.08 | Train Accuracy: 0.36 |         Val. Loss: 0.97 | Val. Accuracy: 0.44 |
| Epoch: 02 | Train Loss: 0.96 | Train Accuracy: 0.47 |         Val. Loss: 0.96 | Val. Accuracy: 0.44 |
| Epoch: 03 | Train Loss: 0.96 | Train Accuracy: 0.47 |         Val. Loss: 0.96 | Val. Accuracy: 0.47 |
| Epoch: 04 | Train Loss: 0.94 | Train Accuracy: 0.50 |         Val. Loss: 0.97 | Val. Accuracy: 0.48 |


In [None]:
# Evaluation on test set

test_loss, test_acc, __, __ = eval(test_dataloader_wo, model_wo)
print(test_acc)

0.46756756756756757


# Approach 2 - No Pre-Trained Embeddings


#### FFNN for word funniness score


In [None]:
# Data formatted for the needs of this part
# Fraction of the training data kept for training; the rest is used as a validation (dev) split
train_proportion = 0.8



training_data = train_df[['edit1','edit2']]
testing_data = test_df[['edit1','edit2']]
training_y = train_df[['meanGrade1','meanGrade2']]
testing_y = test_df[['meanGrade1','meanGrade2']]
training_labels = train_df['label']
testing_labels = test_df['label']

training_data, dev_data, training_y, dev_y, labels, dev_labels = train_test_split(training_data, training_y, training_labels,
                                                                        test_size=(1-train_proportion),
                                                                        random_state=42)
# define training data (the two edit words of each pair are flattened into consecutive entries)
training_data = training_data.to_numpy().reshape(1, 2*len(training_data))[0]
training_y = training_y.to_numpy().reshape(1, 2*len(training_y))[0]
training_labels = labels.to_numpy()

# define validation sets
valid_data = dev_data.to_numpy().reshape(1, 2*len(dev_data))[0]
valid_y = dev_y.to_numpy().reshape(1, 2*len(dev_y))[0]
valid_labels = dev_labels.to_numpy()

# define test data
testing_data = testing_data.to_numpy().reshape(1, 2*len(testing_data))[0]
testing_y = testing_y.to_numpy().reshape(1, 2*len(testing_y))[0]
testing_labels = testing_labels.to_numpy()

In [None]:
# Preprocessing data
def get_tokenized_corpus(corpus):
  # split every sentence on single spaces
  tokenized_corpus = [sentence.split(' ') for sentence in corpus]
  return tokenized_corpus

In [None]:
#  method that returns a word to index dictionary
def get_word2idx(tokenized_corpus):
  vocabulary = []
  for sentence in tokenized_corpus:
    for token in sentence:
        if token not in vocabulary:
            vocabulary.append(token)
  
  word2idx = {w: idx+1 for (idx, w) in enumerate(vocabulary)}
  # we reserve the 0 index for the padding token
  word2idx['<pad>'] = 0
  
 
  return word2idx
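
`get_word2idx` loops over the elements of each sentence, so it expects a tokenized corpus, i.e. a list of token lists. If raw strings are passed instead, the inner loop iterates over characters and the vocabulary becomes character-level. The sketch below shows the difference with a compact re-implementation; `build_word2idx` is a hypothetical stand-in and the two words are made-up examples.

```python
def build_word2idx(tokenized_corpus):
    """Map every distinct token to an index, reserving 0 for padding."""
    word2idx = {"<pad>": 0}
    for sentence in tokenized_corpus:
        for token in sentence:
            if token not in word2idx:
                word2idx[token] = len(word2idx)
    return word2idx


raw_corpus = ["wig", "pizza"]                                   # plain strings
tokenized_corpus = [sentence.split(" ") for sentence in raw_corpus]  # lists of tokens

word_level = build_word2idx(tokenized_corpus)  # keys are whole words
char_level = build_word2idx(raw_corpus)        # keys are single characters
```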

In [None]:
def get_model_inputs(tokenized_corpus, word2idx, labels):
  # we index our sentences
  vectorized_sents = [[word2idx[tok] for tok in sent if tok in word2idx] for sent in tokenized_corpus]

  # Sentence lengths
  sent_lengths = [len(sent) for sent in vectorized_sents]

  # Get maximum length
  max_len = max(sent_lengths)
  
  # we create a tensor of a fixed size filled with zeroes for padding
  sent_tensor = torch.zeros((len(vectorized_sents), max_len)).long()

  # we fill it with our vectorized sentences 
  for idx, (sent, sentlen) in enumerate(zip(vectorized_sents, sent_lengths)):
    sent_tensor[idx, :sentlen] = torch.LongTensor(sent)

  # Label tensor
  label_tensor = torch.FloatTensor(labels)
  return sent_tensor, label_tensor

###

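# tokenize the training words in the same way as the validation and test words
tokenized_corpus = get_tokenized_corpus(training_data)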
word2idx = get_word2idx(tokenized_corpus)


train_sent_tensor, train_label_tensor = get_model_inputs(tokenized_corpus, word2idx, training_y)

print(f'Vocabulary size: {len(word2idx)}')
print('Training set tensor:')
print(train_sent_tensor.shape)

Vocabulary size: 53
Training set tensor:
torch.Size([15008, 17])


In [None]:
class FFNN(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, num_classes):  
        super(FFNN, self).__init__()
        
        # embedding (lookup) layer
        # padding_idx argument makes sure that the 0-th token in the vocabulary
        # is used for padding purposes i.e. its embedding will be a 0-vector
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
    
        
        # hidden layer
        self.fc1 = nn.Linear(embedding_dim, hidden_dim)
    
     
        # activation
        self.relu = nn.LeakyReLU(0.05)

        # output layer
        self.fc2 = nn.Linear(hidden_dim, num_classes)  
 
    
    def forward(self, x):
        # x has shape (batch_size, max_sent_len)

        embedded = self.embedding(x)
        # `embedded` has shape (batch_size, max_sent_len, embedding_dim)

        # Average the embeddings over the actual (non-padding) tokens of each
        # sentence; the result has shape (batch_size, embedding_dim).
        # Padding tokens have index 0 and a zero embedding, so they add nothing
        # to the sum and are excluded from the length.
        # A sentence without any known token has length 0 and yields NaN here.
        sent_lens = x.ne(0).sum(1, keepdim=True)

        averaged = embedded.sum(1) / sent_lens

        out = self.fc1(averaged)
        out = self.relu(out)

        out = self.fc2(out)

        return out
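
The forward pass averages the token embeddings of each sentence while ignoring padding. The plain-Python sketch below illustrates the same idea on a toy embedding table, without PyTorch; `masked_mean` and the table values are assumptions made for illustration.

```python
def masked_mean(token_ids, embeddings, pad_idx=0):
    """Average the embedding vectors of the non-padding tokens of one sentence."""
    vectors = [embeddings[t] for t in token_ids if t != pad_idx]
    if not vectors:
        return None  # no real tokens: the average is undefined (0/0)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Toy embedding table (assumed values); index 0 is the padding token
embeddings = {0: [0.0, 0.0], 1: [1.0, 3.0], 2: [3.0, 5.0]}

padded_sentence = [1, 2, 0, 0]          # two real tokens, two padding tokens
average = masked_mean(padded_sentence, embeddings)
```

Dividing by the full padded length instead would give `[1.0, 2.0]`, diluting the average with padding. A sentence with no known tokens has no defined average, which in the tensor version corresponds to a 0/0 division.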

In [None]:
def accuracy(output, target, labels):
  # get accuracy based on which word of each pair has the greater predicted meanGrade
  # `output` holds the scores of both edits of each pair back to back:
  # [pair0_edit1, pair0_edit2, pair1_edit1, pair1_edit2, ...]
  scores = output.detach().cpu().numpy().reshape(-1, 2)
  first, second = scores[:, 0], scores[:, 1]

  # label 1: first edit is funnier, label 2: second edit is funnier, label 0: tie
  predicted_labels = np.zeros(len(scores), dtype=int)
  predicted_labels[first > second] = 1
  predicted_labels[first < second] = 2

  acc = (predicted_labels == np.asarray(labels).astype(int)).mean()

  return acc

In [None]:
tokenized_valid_corpus = get_tokenized_corpus(valid_data)
valid_sent_tensor, valid_label_tensor = get_model_inputs(tokenized_valid_corpus, word2idx, valid_y)


In [None]:
# we will train for N epochs (The model will see the corpus N times)
EPOCHS = 3

# Learning rate 
LRATE = 0.01

# we define our embedding dimension (dimensionality of the output of the first layer)
EMBEDDING_DIM = 20

# dimensionality of the output of the hidden layer
HIDDEN_DIM = 20

# the model outputs a single value per word: its predicted funniness score (regression)
OUTPUT_DIM = 1

# Construct the model
model = FFNN(EMBEDDING_DIM, HIDDEN_DIM, len(word2idx), OUTPUT_DIM)

# we use the stochastic gradient descent (SGD) optimizer
optimizer = optim.SGD(model.parameters(), lr=LRATE)

loss_fn = nn.MSELoss()

# Input and label tensors for training
feature_train = train_sent_tensor
target_train = train_label_tensor

# Input and label tensors for validation
feature_valid =  valid_sent_tensor
target_valid = valid_label_tensor

################
# Start training
################
print(f'Will train for {EPOCHS} epochs')
for epoch in range(1, EPOCHS + 1):
  model.train()
  
  # we zero the gradients as they are not removed automatically
  optimizer.zero_grad()
  
  # squeeze is needed as the predictions will have the shape (batch size, 1)
  # and we need to remove the dimension of size 1
  predictions = model(feature_train).squeeze(1)

  # Compute the loss
  loss = loss_fn(predictions, target_train)
  train_loss = loss.item()

  # Compute training accuracy
  train_acc = accuracy(predictions, target_train, training_labels)

  # calculate the gradient of each parameter
  loss.backward()

  # update the parameters using the gradients and optimizer algorithm 
  optimizer.step()
  
  # this puts the model in "evaluation mode" (turns off dropout and batch normalization)
  model.eval()

  # we do not compute gradients within this block, i.e. no training
  with torch.no_grad():
    predictions_valid = model(feature_valid).squeeze(1)
    predictions_valid = torch.where(predictions_valid.isnan(), torch.zeros(predictions_valid.shape), predictions_valid)
    torch.set_printoptions(edgeitems=100)
    valid_loss = loss_fn(predictions_valid, target_valid).item()
    valid_acc = accuracy(predictions_valid, target_valid, valid_labels)
  
  #print(f'| Epoch: {epoch:02} | Train Loss: {train_loss:.3f} | Train Acc: {train_acc*100:6.2f}% | Val. Loss: {valid_loss:.3f} | Val. Acc: {valid_acc*100:6.2f}% |')

In [None]:
tokenized_test_corpus = get_tokenized_corpus(testing_data)
test_sent_tensor, test_label_tensor = get_model_inputs(tokenized_test_corpus, word2idx, testing_y)
test_sent_tensor.shape, test_label_tensor.shape

(torch.Size([5920, 0]), torch.Size([5920]))

In [None]:
model.eval()

feature_test = test_sent_tensor
target_test = test_label_tensor

with torch.no_grad():
  predictions = model(feature_test).squeeze(1)
  predictions = torch.where(predictions.isnan(), torch.zeros(predictions.shape), predictions)
  test_loss = loss_fn(predictions, target_test).item()
  test_acc = accuracy(predictions, target_test, testing_labels)

  # Print
  print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')

Test Loss: 1.206 | Test Acc: 43.51%


#### CBOW

In [None]:
# To create our vocab (tokenized, without punctuation and all in lower case)
def create_vocab(data):
    """
    Creating a corpus of all the tokens used
    """
    re_punctuation_string = r"[\s,/.']"  # raw string avoids invalid escape sequences
    tokenized_corpus = [] # Let us put the tokenized corpus in a list

    for sentence in data:

        tokenized_sentence = []

        for token in sentence.split(' '): # simplest tokenization: split on single spaces

            token = token.lower()
            token = re.sub(re_punctuation_string,'', token)
            tokenized_sentence.append(token)

        tokenized_corpus.append(tokenized_sentence)

    # Create single list of all vocabulary
    vocabulary = []  # Let us put all the tokens (mostly words) appearing in the vocabulary in a list

    for sentence in tokenized_corpus:

        for token in sentence:

            if token not in vocabulary:
                vocabulary.append(token)

    return vocabulary, tokenized_corpus
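
The normalization applied to each token above lowercases it and deletes whitespace, commas, slashes, full stops and apostrophes. A small standalone sketch of that step follows; `normalize_token` is a hypothetical helper and the sample tokens are invented.

```python
import re

# Characters removed from every token: whitespace , / . '
PUNCTUATION = re.compile(r"[\s,/.']")


def normalize_token(token):
    """Lowercase a token and strip the punctuation characters listed above."""
    return PUNCTUATION.sub("", token.lower())


samples = ["U.S.", "Trump's", "Yes,", "and/or"]
normalized = [normalize_token(t) for t in samples]
```

Other punctuation, such as `!` or `?`, is left in place by this pattern, and a token consisting only of removed characters becomes an empty string.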

In [None]:
# We evaluate performance on our dev set
def eval(data_iter, model):
    """
    Evaluating model performance on the dev set
    """
    model.eval()
    epoch_loss = 0
    epoch_correct = 0
    pred_all = []
    trg_all = []
    no_observations = 0

    with torch.no_grad():
        for batch in data_iter:
            feature, target = batch

            feature, target = feature.to(device), target.to(device)

            # for RNN:
            model.batch_size = target.shape[0]
            no_observations = no_observations + target.shape[0]
            model.hidden = model.init_hidden()

            predictions = model(feature).squeeze(1)
            loss = loss_fn(predictions, target)

            # Count the correct predictions (argmax over the class scores)
            pred, trg = predictions.detach().cpu().numpy(), target.detach().cpu().numpy()
            correct, __ = model_performance(np.argmax(pred, axis=1), trg)

            epoch_loss += loss.item()*target.shape[0]
            epoch_correct += correct
            pred_all.extend(pred)
            trg_all.extend(trg)

    return epoch_loss/no_observations, epoch_correct/no_observations, np.array(pred_all), np.array(trg_all)
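
`eval` weights each batch loss by the batch size before dividing by the number of observations, which gives the mean loss per example even when the last batch is smaller than the others. A minimal sketch of why this matters follows, assuming the loss function returns the mean over a batch; the numbers and the name `mean_loss_per_example` are illustrative.

```python
def mean_loss_per_example(batches):
    """batches: list of (mean_batch_loss, batch_size) pairs."""
    total_loss = sum(loss * size for loss, size in batches)
    total_examples = sum(size for _, size in batches)
    return total_loss / total_examples


# Two batches of different size (made-up values)
batches = [(0.5, 32), (1.0, 8)]
weighted = mean_loss_per_example(batches)
unweighted = sum(loss for loss, _ in batches) / len(batches)
```

The unweighted average treats the small final batch as if it were as large as the first one and therefore overstates its influence.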

In [None]:
def get_orig_headl_and_new_word_tuples(data):
    """
    Takes a pandas data frame as input.
    Each sample of the data set contains a headline, two edited versions of
    this headline and a label indicating which of the two edits is funnier
    (3 possible values: 0, 1, and 2. 1 and 2 indicate which of the edits is
    funnier, 0 is assigned if they received the same score).
    Selects relevant columns of the input data, one with the original headlines
    and one with the new word to be inserted instead of one of its words, and 
    converts them into lists of tuples.
    Returns two lists.
        - A list of tuples of the form (original_headline, new_word) for the 
          first edited headline
        - A list of tuples of the same form for the second edited headline
    """

    headlines_to_edit_1 = [(original_1, new_word_1) for (original_1, new_word_1) \
                           in zip(data.original1.to_list(), data.edit1.to_list())]

    headlines_to_edit_2 = [(original_2, new_word_2) for (original_2, new_word_2) \
                           in zip(data.original2.to_list(), data.edit2.to_list())]

    
    return headlines_to_edit_1, headlines_to_edit_2

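def get_original_headlines(data):
    """
    Takes a sequence whose items hold the marked-up headline at index 0.
    Returns the headlines with the <word/> markup replaced by the plain word.
    """
    # The word to be replaced in each sentence is denoted as <word/>
    pattern = re.compile(r'<(.*?)/>')
    original_headlines = []
    for sentence in data:
      original_word = pattern.search(sentence[0]).group(1)
      headline = pattern.sub(original_word, sentence[0])
      original_headlines.append(headline)
    return original_headlines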
    

def get_edited_headlines(headline_tuples:list)-> list:
    """
    Takes a list of tuples of form (original_headline, new_word) as input.
    Returns a list of edited headlines.
    """
    # list of new edited headlines
    edited_headlines = []

    # The word to be replaced in each sentence is marked as <word/>
    pattern = re.compile(r'<(.*?)/>')

    for original, new_word in headline_tuples:
        edited_headline = pattern.sub(new_word, original)
        edited_headlines.append(edited_headline)

    return edited_headlines
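
As a quick illustration of the `<word/>` markup handled by the helpers above, the following minimal sketch applies an equivalent regular expression to a single headline. The headline and the replacement word are invented for this example and are not taken from the data set.

```python
import re

# The word to be replaced is marked as <word/> in the original headline
pattern = re.compile(r'<(.*?)/>')

# Hypothetical example row (not from the actual data)
original = "Senate votes to <repeal/> the budget"
new_word = "tickle"

# Recover the original headline by keeping the marked word in place
original_word = pattern.search(original).group(1)
original_headline = pattern.sub(original_word, original)

# Build the edited headline by substituting the new word for the markup
edited_headline = pattern.sub(new_word, original)

print(original_headline)
print(edited_headline)
```

The same marked headline therefore yields both the clean original and the edited version, which is why the helpers only need the `(original_headline, new_word)` tuples.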

In [None]:
# data needed for this part: train_df, train_extra_df, dev_df and
# extra_headlines (extra headlines from an online data set)
train2_df = train_extra_df
# this dev file includes the labels, which are used to compute accuracy but 
# are never used during training (the CBOW model is trained on headline text only)
blind_df = dev_df

# The training data was split into training and validation sets during
# hyperparameter tuning
#training_data, validation_data = train_test_split(train_df,
                                           #test_size=(1-train_proportion),
                                          #random_state=42)

# after hyperparameter tuning - use the original headlines from the three files 
# above for training (the extra headlines are added in a later cell)
training_data = pd.concat([train_df, train2_df, blind_df])
testing_data = blind_df

# to get model performance
training_y = training_data['label'] 
testing_y = blind_df['label']

# new words in headlines  (train)
new_word1_training_y = list(training_data['edit1'])
new_word2_training_y = list(training_data['edit2'])

# new words in headlines (blind-test)
new_word1_test_y = list(testing_data['edit1'])
new_word2_test_y = list(testing_data['edit2'])

h_to_edit_1, h_to_edit_2 = get_orig_headl_and_new_word_tuples(training_data)
h_to_edit_1_test, h_to_edit_2_test = get_orig_headl_and_new_word_tuples(testing_data)

# train 
edited_headlines_1 = get_edited_headlines(h_to_edit_1)
edited_headlines_2 = get_edited_headlines(h_to_edit_2)
original_headlines = get_original_headlines(h_to_edit_1)
# The input below was used to try to increase our data set (details about this 
# are in the report) 
#extra_original_headlines = extra_headlines.Headline.to_list()

# test 
edited_headlines_1_test = get_edited_headlines(h_to_edit_1_test) 
edited_headlines_2_test = get_edited_headlines(h_to_edit_2_test)
original_headlines_test = get_original_headlines(h_to_edit_1_test)


In [None]:
# create vocabulary sets and corpora for training and testing
vocab, corpus = create_vocab(original_headlines)
vocab_test, corpus_test = create_vocab(original_headlines_test)
vocab2, corpus2 = create_vocab(edited_headlines_1)
vocab2_test, corpus2_test = create_vocab(edited_headlines_1_test)
vocab3, corpus3 = create_vocab(edited_headlines_2)
vocab3_test, corpus3_test = create_vocab(edited_headlines_2_test)
# The input below was used to try to increase our data set (details about this 
# are in the report) 
#extra_vocab, extra_corpus = create_vocab(extra_original_headlines)

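# add all the words to the training vocab (including words from the edited
# headlines and the test headlines; the online data set is commented out above)
training_vocabulary = vocab + vocab_test + vocab2 + vocab2_test + vocab3 + vocab3_test
# sort the unique words so that the word-to-index mapping is the same on every run
training_vocabulary = sorted(set(training_vocabulary))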

# add the extra headlines into the training corpus
#corpus += extra_corpus

In [None]:
# CBOW model
torch.manual_seed(1)

# tried different context sizes
context_size = 2
embedding_dim = 200

training_set = corpus

def make_context_vector(context, word_to_idx):
    idxs = [word_to_idx[w] for w in context]
    return torch.tensor(idxs, dtype=torch.long)

vocabulary = training_vocabulary
vocabulary_size = len(vocabulary)
print(vocabulary_size)

word_to_idx = {word: i for i, word in enumerate(vocabulary)}
idx_to_word = {i: word for i, word in enumerate(vocabulary)}

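# (context, target) training pairs; only positions with two words on each
# side are used, so the first and last two words of a headline are never targets
data = []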

for sentence in corpus:
    for i in range(2, len(sentence) - 2):
        context = [sentence[i-2], sentence[i-1],
               sentence[i+1], sentence[i+2]]
        target = sentence[i]
        data.append((context, target))

class CBOW(nn.Module):
    
    def __init__(self, vocabulary_size, embedding_dim):
        super(CBOW, self).__init__()
        self.embeddings = nn.Embedding(vocabulary_size, embedding_dim)
        self.proj = nn.Linear(embedding_dim, 128)
        self.output = nn.Linear(128, vocabulary_size)
        
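    def forward(self, inputs):
        # sum the context word embeddings into a single vector
        embeds = self.embeddings(inputs).sum(dim=0).view(1, -1)
        out = F.relu(self.proj(embeds))
        out = self.output(out)
        # log-probabilities over the vocabulary (the input expected by nn.NLLLoss)
        log_probs = F.log_softmax(out, dim=-1)
        return log_probs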

model = CBOW(vocabulary_size, embedding_dim)
optimizer = optim.SGD(model.parameters(), lr=0.001)

losses = []
loss_function = nn.NLLLoss()

for epoch in range(6):
    total_loss = 0
    print(epoch)
    for context, target in data:
        context_vector = make_context_vector(context, word_to_idx)
        
        model.zero_grad()
        
        nll_prob = model(context_vector)
        # Variable is deprecated; a plain tensor can be passed as the target
        loss = loss_function(nll_prob, torch.tensor([word_to_idx[target]]))
        
        # backpropagation
        loss.backward()
        # update the parameters
        optimizer.step() 
        
        total_loss += loss.item()
        
    losses.append(total_loss)

14891
0
1
2
3
4
5
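
The training pairs used above come from a fixed window of two words on each side of the target. A minimal sketch of the same idea with the window size as a parameter is given below. `make_cbow_pairs` and the toy sentence are hypothetical names introduced for illustration and are not part of the notebook.

```python
def make_cbow_pairs(sentence, window=2):
    """Return (context, target) pairs for positions that have a full window."""
    pairs = []
    for i in range(window, len(sentence) - window):
        # words before and after position i, excluding the target itself
        context = sentence[i - window:i] + sentence[i + 1:i + window + 1]
        pairs.append((context, sentence[i]))
    return pairs


# Toy tokenised headline (invented for this example)
toy_sentence = ["trump", "signs", "new", "trade", "deal", "today"]
pairs = make_cbow_pairs(toy_sentence, window=2)
for context, target in pairs:
    print(context, "->", target)
```

Headlines shorter than `2 * window + 1` tokens contribute no pairs, and the first and last `window` words of a headline never appear as targets.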


In [None]:
####### get predictions and run evaluation #############

# for edited sentences, need to also do all lower case, punctuation removal and lemmatisation
predicted_training = []
predicted_test = []
# raw string so that \s is passed to the regex engine unchanged
re_punctuation_string = r"[\s,/.']"
for i in range(len(edited_headlines_1)):
  headline = edited_headlines_1[i].lower().split(' ')
  tokenized_headline = []
  for word in headline:
    word = re.sub(re_punctuation_string,'', word)
    tokenized_headline.append(word)
  # for each of the edited headlines, create the context vector around the edited word
  #1st edited headline
  new_word1_training_y[i] = new_word1_training_y[i].lower()
  idx = tokenized_headline.index(new_word1_training_y[i])
  # check if edited word is at the beginning or end of the sentence, to make 
  # context smaller than 4 words (given window size = 2)
  if idx == 0:
    context1 = [tokenized_headline[idx+1], tokenized_headline[idx+2]]
  elif idx == 1:
    context1 = [ tokenized_headline[idx-1],
             tokenized_headline[idx+1], tokenized_headline[idx+2]]
  elif idx == len(tokenized_headline) - 1:
    context1 = [tokenized_headline[idx-2], tokenized_headline[idx-1]]
  elif idx == len(tokenized_headline) - 2:
    context1 = [tokenized_headline[idx-2], tokenized_headline[idx-1],
             tokenized_headline[idx+1]]
  else:
    context1 = [tokenized_headline[idx-2], tokenized_headline[idx-1],
             tokenized_headline[idx+1], tokenized_headline[idx+2]]

  context_vector1 = make_context_vector(context1, word_to_idx)
  # 2nd edited headline
  headline2 = edited_headlines_2[i].lower().split(' ')
  tokenized_headline2 = []
  for word in headline2:
    word = re.sub(re_punctuation_string,'', word)
    tokenized_headline2.append(word)
  new_word2_training_y[i] = new_word2_training_y[i].lower()
  idx = tokenized_headline2.index(new_word2_training_y[i])
  if idx == 0:
    context2 = [tokenized_headline2[idx+1], tokenized_headline2[idx+2]]
  elif idx == 1:
    context2 = [ tokenized_headline2[idx-1],
             tokenized_headline2[idx+1], tokenized_headline2[idx+2]]
  elif idx == len(tokenized_headline2) - 1:
    context2 = [tokenized_headline2[idx-2], tokenized_headline2[idx-1]]
  elif idx == len(tokenized_headline2) - 2:
    context2 = [tokenized_headline2[idx-2], tokenized_headline2[idx-1],
             tokenized_headline2[idx+1]]
  else:
    context2 = [tokenized_headline2[idx-2], tokenized_headline2[idx-1],
             tokenized_headline2[idx+1], tokenized_headline2[idx+2]]         
  context_vector2 = make_context_vector(context2, word_to_idx)

  #get the model's predictions for given context
  
  prediction1 = model(context_vector1)
  prediction2 = model(context_vector2)

  #get the loss between the model's prediction for the given context and the
  # word present in the edited headline 
  loss1 = loss_function(prediction1, Variable(torch.tensor([word_to_idx[new_word1_training_y[i]]])))
  loss2 = loss_function(prediction2, Variable(torch.tensor([word_to_idx[new_word2_training_y[i]]])))
  
  # append label prediction to list
  if loss1 > loss2:
    predicted_training.append(1)
  else:
    predicted_training.append(2)
  
###############################################################
# repeat everything for the testing set: for each pair of edited
# headlines, create the context vector around the edited word
###############################################################
for i in range(len(edited_headlines_1_test)):
  headline = edited_headlines_1_test[i].lower().split(' ')
  tokenized_headline = []
  for word in headline:
    word = re.sub(re_punctuation_string,'', word)
    tokenized_headline.append(word)
  new_word1_test_y[i] = new_word1_test_y[i].lower()
  idx = tokenized_headline.index(new_word1_test_y[i])
  if idx == 0:
    context1 = [tokenized_headline[idx+1], tokenized_headline[idx+2]]
  elif idx == 1:
    context1 = [ tokenized_headline[idx-1],
             tokenized_headline[idx+1], tokenized_headline[idx+2]]
  elif idx == len(tokenized_headline) - 1:
    context1 = [tokenized_headline[idx-2], tokenized_headline[idx-1]]
  elif idx == len(tokenized_headline) - 2:
    context1 = [tokenized_headline[idx-2], tokenized_headline[idx-1],
             tokenized_headline[idx+1]]
  else:
    context1 = [tokenized_headline[idx-2], tokenized_headline[idx-1],
             tokenized_headline[idx+1], tokenized_headline[idx+2]]
  context_vector1 = make_context_vector(context1, word_to_idx)
  # 2nd edited headline
  headline2 = edited_headlines_2_test[i].lower().split(' ')
  tokenized_headline2 = []
  for word in headline2:
    word = re.sub(re_punctuation_string,'', word)
    tokenized_headline2.append(word) 
  new_word2_test_y[i] = new_word2_test_y[i].lower()
  idx = tokenized_headline2.index(new_word2_test_y[i])
  if idx == 0:
    context2 = [tokenized_headline2[idx+1], tokenized_headline2[idx+2]]
  elif idx == 1:
    context2 = [tokenized_headline2[idx-1],
             tokenized_headline2[idx+1], tokenized_headline2[idx+2]]
  elif idx == len(tokenized_headline2) - 1:
    context2 = [tokenized_headline2[idx-2], tokenized_headline2[idx-1]]
  elif idx == len(tokenized_headline2) - 2:
    context2 = [tokenized_headline2[idx-2], tokenized_headline2[idx-1],
             tokenized_headline2[idx+1]]
  else:
    context2 = [tokenized_headline2[idx-2], tokenized_headline2[idx-1],
             tokenized_headline2[idx+1], tokenized_headline2[idx+2]]
  context_vector2 = make_context_vector(context2, word_to_idx)

  #get the model's predictions for given context
  prediction1 = model(context_vector1)
  prediction2 = model(context_vector2)

  #get the loss between the model's prediction for the given context and the
  # word present in the edited headline 
  loss1 = loss_function(prediction1, Variable(torch.tensor([word_to_idx[new_word1_test_y[i]]])))
  loss2 = loss_function(prediction2, Variable(torch.tensor([word_to_idx[new_word2_test_y[i]]])))
  
  if loss1 > loss2:
    predicted_test.append(1)
  else:
    predicted_test.append(2)

  
# We run the evaluation:
print("\nTrain performance:")
sse, mse = model_performance(predicted_training, training_y, True)

print("\nDev performance:")
sse, mse = model_performance(predicted_test, testing_y, True)


Train performance:
| Acc: 0.49 

Dev performance:
| Acc: 0.48 
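
The five-way branching on `idx` used to build the evaluation contexts can also be expressed with list slicing, which truncates the window at the headline boundaries automatically. The sketch below is one possible refactoring, not the code that produced the results above. `context_around` is a hypothetical helper name.

```python
def context_around(tokens, idx, window=2):
    """Return up to `window` tokens on each side of tokens[idx]."""
    left = tokens[max(0, idx - window):idx]
    right = tokens[idx + 1:idx + 1 + window]
    return left + right


# Toy token list (invented for this example)
tokens = ["a", "b", "c", "d", "e"]
for position in range(len(tokens)):
    print(position, context_around(tokens, position))
```

For headlines of five or more tokens this gives the same contexts as the explicit branches. It also copes with shorter headlines, where the fixed offsets can run past the end of the list.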
