# Approach
* Combined each question and answer into one string, joined with a sep_token.
* Assigned label 1 to the provided question-answer pairs.
* Generated negative samples (label 0) by either copying the question into the answer slot or shuffling the answers across questions.
* Created a custom classification model that takes the token IDs and attention masks produced by the BERT tokenizer and returns the label.
* Added a BERT layer to the model, which gives the classifier learned, context-aware representations of the words.
* Achieved 51% accuracy, which is close to chance on the balanced test set; it can be improved with generalisation techniques, more data, and more training.
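
The labelling scheme above can be sketched with the standard library alone. This is an illustration on hypothetical toy data, not the notebook's exact code: the helper name `build_pairs`, the separator spacing, and the rotation trick are assumptions. For simplicity it reuses the same questions for all three groups, and it relies on the answers being distinct so that a rotated answer is always a wrong one.

```python
import random

SEP = " [SEP] "  # assumed separator string; the exact spacing is an illustration choice


def build_pairs(questions, answers, seed=0):
    """Return (text, label) tuples: true pairs, shuffled-answer pairs, question-copy pairs."""
    rng = random.Random(seed)
    n = len(questions)

    # Positives: each question with its own answer.
    pairs = [(q + SEP + a, 1.0) for q, a in zip(questions, answers)]

    # Negative type 1: each question with another question's answer.
    # Rotating by a non-zero offset guarantees that no answer stays in place.
    offset = rng.randrange(1, n)
    pairs += [(questions[i] + SEP + answers[(i + offset) % n], 0.0) for i in range(n)]

    # Negative type 2: the question copied into the answer slot.
    pairs += [(q + SEP + q, 0.0) for q in questions]
    return pairs


# Hypothetical toy data, not taken from the assignment CSV.
questions = ["What is H2O?", "Who wrote Hamlet?", "What is 2 + 2?"]
answers = ["water", "Shakespeare", "4"]
pairs = build_pairs(questions, answers)
print(len(pairs))  # 9
```

A plain random shuffle can leave some answers in their original position, which would label a correct pair as 0; the rotation avoids that case.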

# Installs

In [None]:
!pip install datasets

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting datasets
  Downloading datasets-2.6.1-py3-none-any.whl (441 kB)
Collecting xxhash
  Downloading xxhash-3.1.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)
Collecting responses<0.19
  Downloading responses-0.18.0-py3-none-any.whl (38 kB)
Collecting huggingface-hub<1.0.0,>=0.2.0
  Downloading huggingface_hub-0.10.1-py3-none-any.whl (163 kB)
Collecting multiprocess
  Downloading multiprocess-0.70.14-py37-none-any.whl (115 kB)
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
  Downloading urllib3-1.25.11-py2.py3-none-any.whl (127 kB)

In [None]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.23.1-py3-none-any.whl (5.3 MB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Installing collected packages: tokenizers, transformers
Successfully installed tokenizers-0.13.1 transformers-4.23.1


# Imports

In [None]:
from transformers import BertModel,BertTokenizer
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import random
from torch import nn
import torch
from sklearn.metrics import classification_report, confusion_matrix, roc_curve
import matplotlib.pyplot as plt
import seaborn as sns
from torch.optim import Adam

# Loading Data

In [None]:

df = pd.read_csv('./assignment_data (1).csv')

In [None]:
dff = pd.DataFrame({'qa_pair':[None]*1000,'label':[None]*1000})

In [None]:
# rows 0-499: correct question-answer pairs, label 1
anss = list(df['Answer'])[:500]
quess = list(df['Question'])[:500]
qa_pair  = [' [SEP]'.join([i[0],i[1]]) for i in zip(quess,anss)]
dff.iloc[:500,0] = qa_pair
dff.iloc[:500,1] = 1.0

# rows 500-749: questions paired with shuffled answers, label 0
anss = list(df['Answer'])[500:750]
quess = list(df['Question'])[500:750]

random.shuffle(quess)
random.shuffle(anss)
# zip the shuffled lists; zipping the original columns would rebuild the correct pairs
wrng_pairs = [' [SEP]'.join([i[0],i[1]]) for i in zip(quess,anss)]
dff.iloc[500:750,0] =wrng_pairs
dff.iloc[500:750,1] = 0.0

# rows 750-999: the question copied into the answer slot, label 0
anss = list(df['Question'][750:1000])
quess = list(df['Question'][750:1000])
wrng_pairs2 = [' [SEP]'.join([i[0],i[1]]) for i in zip(quess,anss)]
dff.iloc[750:1000,0] =wrng_pairs2
dff.iloc[750:1000,1] = 0.0


In [None]:
df2 = dff.sample(frac=1).reset_index(drop=True)
df2.head()

| | qa_pair | label |
|---|---|---|
| 0 | How much larger is the Earth's diameter? [SEP]... | 1.0 |
| 1 | What does V-fib stand for? [SEP] ventricular f... | 0.0 |
| 2 | What is the contraction of a muscle in respons... | 1.0 |
| 3 | What is the process of crystal formation calle... | 1.0 |
| 4 | What animals are used as livestock in some par... | 0.0 |


# Data Preparation
* Stratified split on label with a 75:25 ratio (train_test_split is called without test_size, so the scikit-learn default of 0.25 applies)
* Encoding with BERT tokenizer
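
A stratified split keeps the label proportions the same in both parts. The sketch below shows the idea in pure Python; it is not scikit-learn's implementation, and the helper name `stratified_split` and the seed are assumptions. The test fraction of 0.25 mirrors the scikit-learn default that applies when no size is passed, which matches the 250 test rows in the classification report further down.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.25, seed=0):
    """Return (train_idx, test_idx) with each label's share preserved in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)                       # random order within one class
        n_test = round(len(idxs) * test_size)   # same fraction taken from every class
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [1.0] * 500 + [0.0] * 500  # same class balance as the 1000-row frame
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))      # 750 250
print(sum(labels[i] for i in test_idx))   # 125.0
```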

In [None]:

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(df2['qa_pair'],df2['label'],stratify=df2['label'],random_state=0)

In [None]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')


In [None]:
# Tokenize the input (takes some time)
# the tokenizer here is loaded from bert-base-uncased
X_train = tokenizer.batch_encode_plus(
    x_train.tolist(),
    add_special_tokens=True,
    max_length=50, # average length
    truncation=True,
    padding='max_length',  # replaces the deprecated pad_to_max_length=True
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)
X_test = tokenizer.batch_encode_plus(
    x_test.tolist(),
    add_special_tokens=True,
    max_length=50,
    truncation=True,
    padding='max_length',  # replaces the deprecated pad_to_max_length=True
    return_token_type_ids = False,
    return_attention_mask = True,
    verbose = True)



In [None]:
tokenizer.decode(X_train['input_ids'][0])

'[CLS] what characteristic of a liquid is usually close to that of a solid? [SEP] what characteristic of a liquid is usually close to that of a solid? [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]'
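
The decoded example above ends in a run of [PAD] tokens because every sequence is brought to exactly 50 positions. The sketch below illustrates how padding, truncation, and the attention mask relate; it is a simplified stand-in for what the tokenizer does, not its actual code. The helper name `pad_or_truncate` and the word-piece IDs in the usage lines are hypothetical, while 101, 102 and 0 are the [CLS], [SEP] and [PAD] IDs of the bert-base-uncased vocabulary.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # special-token IDs in bert-base-uncased


def pad_or_truncate(token_ids, max_length=50):
    """Wrap word-piece IDs in [CLS]/[SEP], then cut or pad to exactly max_length."""
    body = token_ids[: max_length - 2]            # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)         # 1 marks a real token
    n_pad = max_length - len(input_ids)
    # Padding positions get ID 0 and mask 0, so the model ignores them.
    return input_ids + [PAD_ID] * n_pad, attention_mask + [0] * n_pad


# Hypothetical word-piece IDs, chosen only for illustration.
ids, mask = pad_or_truncate([2001, 2002, 2003], max_length=8)
print(ids)   # [101, 2001, 2002, 2003, 102, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]

long_ids, long_mask = pad_or_truncate(list(range(1000, 1100)), max_length=50)
print(len(long_ids), long_ids[-1], sum(long_mask))  # 50 102 50
```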

In [None]:
# for train set
train_seq = torch.tensor(X_train['input_ids'])
train_mask = torch.tensor(X_train['attention_mask'])
train_y = torch.tensor(y_train.tolist())

# for validation set
val_seq = torch.tensor(X_test['input_ids'])
val_mask = torch.tensor(X_test['attention_mask'])
val_y = torch.tensor(y_test.tolist())

train_data = torch.utils.data.TensorDataset(train_seq, train_mask, train_y)
val_data = torch.utils.data.TensorDataset(val_seq, val_mask, val_y)

batch_size = 10
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size)
val_loader = torch.utils.data.DataLoader(val_data, batch_size=batch_size)

In [None]:

print(len(X_train['input_ids'][0]))
print(len(X_test['input_ids'][0]))

50
50


# Model Preparation
#### ***Architecture***


* Input Layer: encoded input words and attention masks from the BERT tokenizer
* Embedding Layer: hidden states of BERT
  * Froze the BERT layer during training to reduce training time.
* MaxPool Layer: reduces the dimension of the BERT output from 3 to 1.
* Dense Layer: reduces the output of the previous layer to 128 neurons with ReLU
* Dropout Layer: drops 30% of the activations for regularisation
* Dense Layer: reduces the output of the previous layer to 32 neurons with ReLU
* Dense Layer: reduces the output of the previous layer to 1 neuron with sigmoid

#### ***Configurations***
* Optimizer: Adam with learning rate 0.0005
* Loss Function: binary cross-entropy
* Epochs: 6
* Batch size: 32
* Metric: accuracy

(Some layers and settings were changed later for performance, so the code below differs from this list: for example, it trains for 3 epochs with a batch size of 10 and a learning rate of 1e-5, and the lines that freeze BERT are commented out.)
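
To make the sigmoid output and the binary cross-entropy loss concrete, here is a minimal pure-Python sketch for a single example. It illustrates the formula only and is not PyTorch's implementation; the helper names and the clamping constant `eps` are assumptions.

```python
import math


def sigmoid(z):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(p, y, eps=1e-12):
    """Loss for one example with predicted probability p and true label y (0 or 1)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


# An uninformative prediction of 0.5 costs ln(2) regardless of the label.
print(round(binary_cross_entropy(sigmoid(0.0), 1.0), 4))  # 0.6931

# A confident prediction is cheap when right and expensive when wrong.
print(round(binary_cross_entropy(sigmoid(2.0), 1.0), 4))  # 0.1269
print(round(binary_cross_entropy(sigmoid(2.0), 0.0), 4))  # 2.1269
```

A mean loss that stays near 0.69 is therefore a sign that the classifier has not moved away from guessing.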

In [None]:
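class BertArch(nn.Module):
    """BERT encoder followed by a small feed-forward head for binary classification."""

    def __init__(self, bert, dropout=0.3):
        super().__init__()

        self.bert = bert
        self.drop = nn.Dropout(dropout)
        # max-pools the 768-wide pooled BERT output down to 328 features
        self.pool = nn.AdaptiveMaxPool1d(328)
        self.fc1 = nn.Linear(328, 128)
        self.fc2 = nn.Linear(128, 32)
        self.layer_out = nn.Linear(32, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, input_id, mask):
        # with return_dict=False, BertModel returns (sequence_output, pooled_output);
        # only the pooled [CLS] representation is used here
        _, pooled_output = self.bert(input_ids=input_id, attention_mask=mask, return_dict=False)
        x = self.pool(pooled_output)
        x = self.fc1(x)
        x = self.drop(x)
        x = self.fc2(x)
        # sigmoid turns the single output into a probability, as nn.BCELoss expects
        x = self.sigmoid(self.layer_out(x))
        return x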
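
As a quick check on the layer sizes in the class above, the sketch below tallies the trainable parameters of the classification head. It covers only the three linear layers (the pooling, dropout and sigmoid steps have no parameters) and leaves out BERT itself, which is far larger. The helper name `linear_params` is an assumption.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


# Layer names and sizes as declared in BertArch.
head = [("fc1", 328, 128), ("fc2", 128, 32), ("layer_out", 32, 1)]

total = 0
for name, n_in, n_out in head:
    n = linear_params(n_in, n_out)
    total += n
    print(f"{name}: {n_in} -> {n_out}, {n} parameters")
print("head total:", total)  # head total: 46273
```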

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# pass the pre-trained BERT to our defined architecture
bert = BertModel.from_pretrained('bert-base-uncased')
# # freeze all the parameters
# for param in bert.parameters():
#     param.requires_grad = False
model = BertArch(bert)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [None]:
epochs = 3
optimizer = Adam(model.parameters(), lr= 1e-5)
cross_entropy = nn.BCELoss()

In [None]:
def save_checkpoint(filename, epoch, model, optimizer):
    state = {
        'epoch': epoch,
        'model': model,
        'optimizer': optimizer,}
    torch.save(state, filename)

In [None]:
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# push the model and the loss to the selected device (GPU when available)
model = model.to(device)
cross_entropy = cross_entropy.to(device)

best_val_acc = 0  # kept across epochs so that only an improved model is saved

for i in range(epochs):
    total_acc_train = 0
    total_loss_train = 0
    model.train()  # enable dropout while training
    for data in train_loader:
        embed, mask, label = data
        embed = embed.to(device)
        mask = mask.to(device)
        label = label.unsqueeze(1).to(device)

        output = model(embed, mask)
        batch_loss = cross_entropy(output, label)
        total_loss_train += batch_loss.item()
        acc = (torch.round(output) == label).sum().item()
        total_acc_train += acc
        model.zero_grad()
        batch_loss.backward()
        optimizer.step()

    total_acc_val = 0
    total_loss_val = 0
    model.eval()  # disable dropout while validating
    with torch.no_grad():
        for data in val_loader:
            valembed, valmask, vallabel = data

            valembed = valembed.to(device)
            valmask = valmask.to(device)
            vallabel = vallabel.unsqueeze(1).to(device)

            valoutput = model(valembed, valmask)
            # score the validation predictions, not the last training batch
            batch_loss = cross_entropy(valoutput, vallabel)
            total_loss_val += batch_loss.item()
            acc = (torch.round(valoutput) == vallabel).sum().item()
            total_acc_val += acc

    # save a checkpoint only when validation accuracy improves
    val_acc = total_acc_val / len(val_data)
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        file_name = 'topic_saved_weights.pt'
        save_checkpoint(file_name, i, model, optimizer)
    # losses are averaged per batch, accuracies per example
    print(f'Epochs: {i + 1} | Train Loss: {round(total_loss_train / len(train_loader), 4)} '
          f'| Train Accuracy: {round(total_acc_train / len(train_data), 4)} '
          f'| Val Loss: {round(total_loss_val / len(val_loader), 4)} '
          f'| Val Accuracy: {round(total_acc_val / len(val_data), 4)}')
    print('----')
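
Saving only the best epoch depends on the running best accuracy surviving from one epoch to the next. The sketch below isolates that bookkeeping in pure Python. The helper name `select_best_epochs` and the accuracy values are hypothetical, and the append stands in for the call to `save_checkpoint`.

```python
def select_best_epochs(val_accuracies):
    """Return the 1-based epochs at which a new best validation accuracy is reached."""
    best_val_acc = 0.0  # initialised once, outside the epoch loop
    saved_at = []
    for epoch, val_acc in enumerate(val_accuracies, start=1):
        if val_acc > best_val_acc:
            best_val_acc = val_acc   # remember the new best
            saved_at.append(epoch)   # this is where a checkpoint would be written
    return saved_at, best_val_acc


history = [0.48, 0.52, 0.50]  # hypothetical per-epoch validation accuracies
print(select_best_epochs(history))  # ([1, 2], 0.52)
```

If the best value were reset to zero inside the loop, every epoch would pass the comparison and the file would always hold the last epoch rather than the best one.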


# Result

In [None]:
# get predictions for test data

test_seq = torch.tensor(X_test['input_ids'])
test_mask = torch.tensor(X_test['attention_mask'])
test_y = torch.tensor(y_test.tolist())
path = 'topic_saved_weights.pt'
checkpoint = torch.load(path, map_location=device)
model = checkpoint.get("model")
model.eval()  # switch off dropout so that predictions are deterministic
with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = torch.round(preds).detach().cpu().numpy()
print(classification_report(test_y, preds))

              precision    recall  f1-score   support

         0.0       0.53      0.22      0.31       125
         1.0       0.51      0.81      0.62       125

    accuracy                           0.51       250
   macro avg       0.52      0.51      0.47       250
weighted avg       0.52      0.51      0.47       250
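
The per-class figures in this report, together with the 0.512 accuracy and the support of 125 per class, pin down the underlying confusion matrix. The counts below are inferred from those rounded values and are not printed by the notebook; the sketch recomputes the report's numbers from them as a consistency check. The helper name `prf` is an assumption.

```python
# Counts inferred from the rounded report (rows: true class, columns: predicted class).
tn, fp = 27, 98    # true label 0.0: predicted 0.0 / predicted 1.0
fn, tp = 24, 101   # true label 1.0: predicted 0.0 / predicted 1.0


def prf(hits, false_alarms, misses):
    """Precision, recall and F1 for one class, rounded like the report."""
    precision = hits / (hits + false_alarms)
    recall = hits / (hits + misses)
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 2), round(recall, 2), round(f1, 2)


print(prf(tn, fn, fp))                  # class 0.0 -> (0.53, 0.22, 0.31)
print(prf(tp, fp, fn))                  # class 1.0 -> (0.51, 0.81, 0.62)
print((tn + tp) / (tn + fp + fn + tp))  # 0.512
print(fp + tp)                          # 199
```

Read this way, 199 of the 250 test examples were predicted as class 1.0, which is why recall is high for class 1.0 and low for class 0.0.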



In [None]:
from sklearn.metrics import accuracy_score
# scikit-learn expects (y_true, y_pred); accuracy itself is the same either way
print(accuracy_score(test_y, preds.squeeze()))

0.512


In [None]:
test_y

tensor([0., 1., 0., 1., 1., 0., 0., 1., 0., 1., 0., 0., 1., 1., 1., 0., 0., 1.,
        1., 1., 0., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1., 0., 1., 0., 1.,
        1., 1., 1., 0., 1., 1., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.,
        0., 0., 1., 0., 1., 0., 0., 0., 1., 1., 0., 1., 0., 0., 0., 1., 0., 1.,
        0., 0., 1., 0., 0., 0., 1., 1., 0., 1., 1., 0., 1., 1., 1., 0., 1., 0.,
        1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 0., 1., 0., 1., 1., 0., 0., 0.,
        0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 0., 0., 1., 1., 0., 0., 1., 1.,
        1., 1., 1., 0., 1., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 1., 0., 0.,
        1., 1., 1., 1., 0., 0., 0., 1., 0., 0., 1., 0., 1., 1., 0., 0., 0., 1.,
        1., 0., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0.,
        0., 1., 1., 0., 0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 1., 0., 1., 1.,
        0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 0., 1., 0.,
        1., 1., 1., 1., 1., 1., 0., 0., 

In [None]:
preds.squeeze()

array([0., 1., 1., 1., 1., 0., 0., 0., 1., 1., 1., 1., 1., 1., 0., 1., 0.,
       1., 0., 1., 1., 1., 0., 0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 0., 1., 1., 1., 1., 0., 1., 0., 1., 1., 1., 1., 1., 1., 1.,
       0., 1., 1., 1., 1., 1., 0., 1., 0., 1., 1., 1., 1., 1., 1., 0., 0.,
       1., 1., 1., 1., 0., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 1.,
       1., 0., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 0., 1.,
       1., 1., 1., 1., 1., 1., 0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 0., 1., 1., 1., 1., 0., 1., 1., 1., 0., 1., 0., 1., 1., 1.,
       1., 1., 0., 1., 0., 0., 1., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 0., 1.,
       1., 1., 1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 1.,
       1., 1., 0., 1., 1.