# Precondition Inference

Fine-tune ALBERT for Precondition Inference
##### Submitted By: Sneha Kedia (2492819731)

### Installation of libraries

In [1]:
!pip install datasets==1.0.1
!pip install transformers==3.1.0

### Imports

In [2]:
import torch
import torch.nn as nn
import os
import matplotlib.pyplot as plt
import copy
import torch.optim as optim
import random
import numpy as np
import pandas as pd
from torch.utils.data import DataLoader, Dataset
from torch.cuda.amp import autocast, GradScaler
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModel, AdamW, get_linear_schedule_with_warmup
from datasets import load_dataset, load_metric

os.environ["TOKENIZERS_PARALLELISM"] = "false"

PyTorch version 1.10.0+cu111 available.
TensorFlow version 2.8.0 available.


### Configuring COLAB GPU

Check how much system RAM and GPU memory is free before training starts.

In [3]:
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip -q install gputil
!pip -q install psutil
!pip -q install humanize
import psutil
import humanize
import os
import GPUtil as GPU
GPUs = GPU.getGPUs()
gpu = GPUs[0]
def printm():
    process = psutil.Process(os.getpid())
    print("Gen RAM Free: " + humanize.naturalsize( psutil.virtual_memory().available ), " | Proc size: " + humanize.naturalsize( process.memory_info().rss))
    print("GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB".format(gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil*100, gpu.memoryTotal))
printm()

Gen RAM Free: 11.9 GB  | Proc size: 1.6 GB
GPU RAM Free: 15109MB | Used: 0MB | Util   0% | Total 15109MB


### Read training, dev and unlabeled test data

The following starter code (Python 3) reads the labeled training and dev sentence pairs, and the unlabeled test sentence pairs, into lists.

In [4]:
from google.colab import files
import csv
import pandas as pd

In [5]:
uploaded = files.upload()

Saving pnli_dev.csv to pnli_dev.csv
Saving pnli_test_unlabeled.csv to pnli_test_unlabeled.csv
Saving pnli_train.csv to pnli_train.csv


In [6]:
def loadDataset(filename, filetype):
    dataset = []
    sentence1 = []
    sentence2 = []
    labels = []

    if filetype == 'train' or filetype == 'valid':
        with open(filename) as fp:
            csvreader = csv.reader(fp)
            for x in csvreader:
                # x[2] will be the label (0 or 1). x[0] and x[1] will be the sentence pairs.
                sentence1.append(x[0].strip())
                sentence2.append(x[1].strip())
                labels.append(int(x[2]))
                dataset.append([x[0].strip(), x[1].strip(), int(x[2])])
        return (dataset, sentence1, sentence2, labels)

    elif filetype == 'test':
        with open(filename) as fp:
            csvreader = csv.reader(fp)
            for x in csvreader:
                # Test rows have no label; x[0] and x[1] are the sentence pair.
                # NOTE: each row is appended twice below (raw, then stripped), so
                # `dataset` holds two consecutive entries per test pair.
                dataset.append(x)
                sentence1.append(x[0].strip())
                sentence2.append(x[1].strip())
                dataset.append([x[0].strip(), x[1].strip()])
        return (dataset, sentence1, sentence2)

train_filename = 'pnli_train.csv'
valid_filename = 'pnli_dev.csv'

(train_dataset, train_sentence1, train_sentence2, train_labels) = loadDataset(train_filename, 'train')
(valid_dataset, valid_sentence1, valid_sentence2, valid_labels) = loadDataset(valid_filename, 'valid')

train_dev_header = ["sentence1","sentence2","label"]
df_train = pd.DataFrame(train_dataset, columns=train_dev_header)
df_val = pd.DataFrame(valid_dataset, columns=train_dev_header)

# df_train['label'] = df_train['label'].astype(int)
# df_val['label'] = df_val['label'].astype(int)

display(df_train.head())
display(df_val.head())

Unnamed: 0,sentence1,sentence2,label
0,Sometimes do exercise.,A person typically desire healthy life.,1
1,Who eats junk foods.,A person typically desire healthy life.,0
2,A person is sick.,A person typically desire healthy life.,1
3,A person is dead.,A person typically desire healthy life.,0
4,A person eats properly and do exercise regularly.,A person typically desire healthy life.,1


Unnamed: 0,sentence1,sentence2,label
0,A person is looking for accuracy.,A person typically desires accurate results.,1
1,A person does not care for accuracy.,A person typically desires accurate results.,0
2,The person double checks their data.,A person typically desires accurate results.,1
3,The person speeds through the experiment.,A person typically desires accurate results.,0
4,A person is studying well.,A person typically desires accurate results.,1


In [7]:
print(df_train.shape)
print(df_val.shape)

(5983, 3)
(1055, 3)


## Main Code Body

You may experiment with different methods in your program. However, you need to embed the training and inference processes here. We will grade your predictions on the unlabeled test data, and check this part to understand how your method produced them.

### Classes and functions for model

In [8]:
class CustomDataset(Dataset):

    def __init__(self, data, maxlen, with_labels=True, bert_model='albert-base-v2'):
        self.data = data
        self.tokenizer = AutoTokenizer.from_pretrained(bert_model)  
        self.maxlen = maxlen
        self.with_labels = with_labels 

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        sent1 = str(self.data.loc[index, 'sentence1'])
        sent2 = str(self.data.loc[index, 'sentence2'])

        encoded_pair = self.tokenizer(sent1, sent2, 
                                      padding='max_length',
                                      truncation=True,
                                      max_length=self.maxlen,  
                                      return_tensors='pt')
        
        token_ids = encoded_pair['input_ids'].squeeze(0)
        attn_masks = encoded_pair['attention_mask'].squeeze(0)
        token_type_ids = encoded_pair['token_type_ids'].squeeze(0)

        if self.with_labels:
            label = self.data.loc[index, 'label']
            return token_ids, attn_masks, token_type_ids, label  
        else:
            return token_ids, attn_masks, token_type_ids
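
To make the three tensors returned by `CustomDataset.__getitem__` more concrete, here is a minimal pure-Python sketch of how a sentence pair is typically laid out for BERT-style models such as ALBERT. It is not the tokenizer's implementation: `encode_pair` is a hypothetical helper, the tokens are whitespace-split strings rather than vocabulary ids, and truncation is deliberately left out.

```python
def encode_pair(tokens_a, tokens_b, maxlen, pad="<pad>"):
    """Toy layout of a sentence pair: [CLS] A [SEP] B [SEP] followed by padding."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment ids: 0 for [CLS] + A + first [SEP], 1 for B + final [SEP].
    token_type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    if len(tokens) > maxlen:
        raise ValueError("pair longer than maxlen; truncation is not modeled here")
    n_pad = maxlen - len(tokens)
    # The attention mask is 1 for real tokens and 0 for padding positions.
    attention_mask = [1] * len(tokens) + [0] * n_pad
    return tokens + [pad] * n_pad, attention_mask, token_type_ids + [0] * n_pad


tokens, mask, type_ids = encode_pair(
    "a person is sick .".split(),
    "a person desires health .".split(),
    maxlen=16,
)
print(tokens)
print(mask)
print(type_ids)
```

All three lists have length `maxlen`, which is what allows the `DataLoader` to stack examples into fixed-size batches.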

In [9]:
class SentencePairClassifier(nn.Module):

    def __init__(self, bert_model="albert-base-v2", freeze_bert=False):
        super(SentencePairClassifier, self).__init__()
        self.bert_layer = AutoModel.from_pretrained(bert_model)

        # Read the hidden size from the loaded model's config (768 for albert-base-v2,
        # 1024 for albert-large-v2, 2048 for albert-xlarge-v2, 4096 for albert-xxlarge-v2)
        # so that any other checkpoint name does not leave hidden_size undefined.
        hidden_size = self.bert_layer.config.hidden_size

        if freeze_bert:
            for p in self.bert_layer.parameters():
                p.requires_grad = False

        self.cls_layer = nn.Linear(hidden_size, 1)
        self.dropout = nn.Dropout(p=0.1)

    @autocast()
    def forward(self, input_ids, attn_masks, token_type_ids):
        cont_reps, pooler_output = self.bert_layer(input_ids, attn_masks, token_type_ids)
        logits = self.cls_layer(self.dropout(pooler_output))
        return logits

In [10]:
def set_seed(seed):
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(seed)
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    

def evaluate_loss(net, device, criterion, dataloader):
    net.eval()

    mean_loss = 0
    count = 0

    with torch.no_grad():
        for it, (seq, attn_masks, token_type_ids, labels) in enumerate(tqdm(dataloader)):
            seq, attn_masks, token_type_ids, labels = \
                seq.to(device), attn_masks.to(device), token_type_ids.to(device), labels.to(device)
            logits = net(seq, attn_masks, token_type_ids)
            mean_loss += criterion(logits.squeeze(-1), labels.float()).item()
            count += 1

    return mean_loss / count
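
The loss evaluated above is `nn.BCEWithLogitsLoss`, which combines a sigmoid with binary cross-entropy. As a rough illustration of what is computed for a single logit, the sketch below uses the standard numerically stable form in plain Python. The helper names `bce_with_logits` and `mean_bce` are made up for this example and are not part of PyTorch.

```python
import math


def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy for one raw logit and a 0/1 label.

    Equivalent to -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))].
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean_bce(logits, labels):
    """Average the per-example losses over a batch."""
    return sum(bce_with_logits(x, y) for x, y in zip(logits, labels)) / len(logits)


print(bce_with_logits(0.0, 1))            # an undecided logit costs log(2)
print(mean_bce([2.0, -1.0], [1, 0]))      # confident and correct: small loss
print(mean_bce([2.0, -1.0], [0, 1]))      # confident and wrong: large loss
```

Note that `evaluate_loss` averages the per-batch means, so its result can differ slightly from the per-example mean when the last batch is smaller than the others (1055 dev examples with a batch size of 32 leave a final batch of 31).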

In [11]:
print("Creation of the models' folder...")
!mkdir models

Creation of the models' folder...


In [12]:
def train_bert(net, criterion, opti, lr, lr_scheduler, train_loader, val_loader, epochs, iters_to_accumulate):

    best_loss = np.inf
    best_ep = 1
    nb_iterations = len(train_loader)
    print_every = nb_iterations // 5
    iters = []
    train_losses = []
    val_losses = []

    scaler = GradScaler()

    for ep in range(epochs):

        net.train()
        running_loss = 0.0
        for it, (seq, attn_masks, token_type_ids, labels) in enumerate(tqdm(train_loader)):
            seq, attn_masks, token_type_ids, labels = \
                seq.to(device), attn_masks.to(device), token_type_ids.to(device), labels.to(device)
            
            with autocast():
                logits = net(seq, attn_masks, token_type_ids)
                loss = criterion(logits.squeeze(-1), labels.float())
                loss = loss / iters_to_accumulate

            scaler.scale(loss).backward()

            if (it + 1) % iters_to_accumulate == 0:
                scaler.step(opti)
                scaler.update()
                lr_scheduler.step()
                opti.zero_grad()

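            # `loss` was already divided by iters_to_accumulate, so the running
            # average printed below is scaled down by that factor.
            running_loss += loss.item()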

            if (it + 1) % print_every == 0:
                print()
                print("Iteration {}/{} of epoch {} complete. Loss : {} "
                      .format(it+1, nb_iterations, ep+1, running_loss / print_every))
                running_loss = 0.0

        val_loss = evaluate_loss(net, device, criterion, val_loader)
        print()
        print("Epoch {} complete! Validation Loss : {}".format(ep+1, val_loss))

        if val_loss < best_loss:
            print("Best validation loss improved from {} to {}".format(best_loss, val_loss))
            print()
            net_copy = copy.deepcopy(net)
            best_loss = val_loss
            best_ep = ep + 1

    path_to_model='models/{}_lr_{}_val_loss_{}_ep_{}.pt'.format(bert_model, lr, round(best_loss, 5), best_ep)
    torch.save(net_copy.state_dict(), path_to_model)
    print("The model has been saved in {}".format(path_to_model))

    del loss
    torch.cuda.empty_cache()
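
The training loop divides each loss by `iters_to_accumulate` and only steps the optimizer every `iters_to_accumulate` batches. The sketch below illustrates why this works, using a toy one-parameter least-squares model in plain Python instead of ALBERT. The functions and data are hypothetical, and the equivalence shown holds exactly only when all micro-batches have the same size.

```python
def mse_grad(w, batch):
    """Gradient of mean((w*x - t)^2) with respect to w over a batch of (x, t) pairs."""
    return sum(2 * (w * x - t) * x for x, t in batch) / len(batch)


def accumulated_grad(w, micro_batches):
    """Sum the gradients of (loss / k) over k micro-batches."""
    k = len(micro_batches)
    grad = 0.0
    for micro_batch in micro_batches:
        # Mirrors `loss = loss / iters_to_accumulate` followed by `.backward()`,
        # which adds into the existing gradient buffer.
        grad += mse_grad(w, micro_batch) / k
    return grad


data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 9.0)]
full = mse_grad(0.5, data)                             # one batch of 4
accum = accumulated_grad(0.5, [data[:2], data[2:]])    # two micro-batches of 2
print(full, accum)
```

Both values agree, so accumulating over two batches of 32 behaves like a single batch of 64 for the gradient, while only one batch of 32 has to fit in GPU memory at a time.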

### Tuning the parameters

In [13]:
bert_model = "albert-large-v2"  # 'albert-base-v2', 'albert-large-v2', 'albert-xlarge-v2', 'albert-xxlarge-v2', 'bert-base-uncased', ...
freeze_bert = False
maxlen = 128
bs = 32
iters_to_accumulate = 2
lr = 2e-5
epochs = 10
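
These settings determine the effective batch size and the number of optimizer steps that the learning-rate scheduler sees. The arithmetic is sketched below in plain Python, using the training-set size of 5983 reported earlier. `linear_schedule_lr` is a hypothetical helper that mimics a linear warmup followed by linear decay to zero, which is the shape of schedule requested in the training cell. It is an illustration, not the library's code.

```python
import math

n_train, bs, iters_to_accumulate, epochs, lr = 5983, 32, 2, 10, 2e-5

batches_per_epoch = math.ceil(n_train / bs)                  # 187
effective_batch_size = bs * iters_to_accumulate              # 64
steps_per_epoch = batches_per_epoch // iters_to_accumulate   # 93
t_total = steps_per_epoch * epochs                           # 930


def linear_schedule_lr(step, base_lr, total_steps, warmup_steps=0):
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


print(batches_per_epoch, effective_batch_size, t_total)
for step in (0, t_total // 2, t_total):
    print(step, linear_schedule_lr(step, lr, t_total))
```

With no warmup, the learning rate starts at `2e-5`, is halved by the midpoint of training, and reaches zero at the final optimizer step.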

### Training and validation

In [14]:
set_seed(1)

print("Reading training data...")
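# Pass bert_model by keyword: the third positional parameter of CustomDataset is with_labels.
train_set = CustomDataset(df_train, maxlen, bert_model=bert_model)
print("Reading validation data...")
val_set = CustomDataset(df_val, maxlen, bert_model=bert_model)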
train_loader = DataLoader(train_set, batch_size=bs, num_workers=5)
val_loader = DataLoader(val_set, batch_size=bs, num_workers=5)


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = SentencePairClassifier(bert_model, freeze_bert=freeze_bert)

if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    net = nn.DataParallel(net)

net.to(device)

criterion = nn.BCEWithLogitsLoss()
opti = AdamW(net.parameters(), lr=lr, weight_decay=1e-2)
num_warmup_steps = 0
num_training_steps = epochs * len(train_loader)
t_total = (len(train_loader) // iters_to_accumulate) * epochs
lr_scheduler = get_linear_schedule_with_warmup(optimizer=opti, num_warmup_steps=num_warmup_steps, num_training_steps=t_total)

train_bert(net, criterion, opti, lr, lr_scheduler, train_loader, val_loader, epochs, iters_to_accumulate)

Reading training data...
Reading validation data...

Iteration 37/187 of epoch 1 complete. Loss : 0.3096947380014368
Iteration 74/187 of epoch 1 complete. Loss : 0.2642440981156117
Iteration 111/187 of epoch 1 complete. Loss : 0.23177079091200958
Iteration 148/187 of epoch 1 complete. Loss : 0.22115241494533178
Iteration 185/187 of epoch 1 complete. Loss : 0.21293429587338422

Epoch 1 complete! Validation Loss : 0.41400139891740045
Best validation loss improved from inf to 0.41400139891740045

Iteration 37/187 of epoch 2 complete. Loss : 0.20310031179640745
Iteration 74/187 of epoch 2 complete. Loss : 0.2129944709909929
Iteration 111/187 of epoch 2 complete. Loss : 0.17467704757645325
Iteration 148/187 of epoch 2 complete. Loss : 0.18996905454912702
Iteration 185/187 of epoch 2 complete. Loss : 0.16863335024666143

Epoch 2 complete! Validation Loss : 0.3649486930984439
Best validation loss improved from 0.41400139891740045 to 0.3649486930984439

Iteration 37/187 of epoch 3 complete. Loss : 0.159418234148541
Iteration 74/187 of epoch 3 complete. Loss : 0.16208702587598078
Iteration 111/187 of epoch 3 complete. Loss : 0.14083634306852882
Iteration 148/187 of epoch 3 complete. Loss : 0.14134396793874535
Iteration 185/187 of epoch 3 complete. Loss : 0.13098932571105054

Epoch 3 complete! Validation Loss : 0.3434623841083411
Best validation loss improved from 0.3649486930984439 to 0.3434623841083411

Iteration 37/187 of epoch 4 complete. Loss : 0.12807224217701602
Iteration 74/187 of epoch 4 complete. Loss : 0.11898109226210697
Iteration 111/187 of epoch 4 complete. Loss : 0.10994577382666033
Iteration 148/187 of epoch 4 complete. Loss : 0.11335912271327264
Iteration 185/187 of epoch 4 complete. Loss : 0.10784145716477085

Epoch 4 complete! Validation Loss : 0.39421116944515344

Iteration 37/187 of epoch 5 complete. Loss : 0.09281898279850548
Iteration 74/187 of epoch 5 complete. Loss : 0.07688312781219547
Iteration 111/187 of epoch 5 complete. Loss : 0.09141195321304572
Iteration 148/187 of epoch 5 complete. Loss : 0.09008491779300007
Iteration 185/187 of epoch 5 complete. Loss : 0.09120750749433362

Epoch 5 complete! Validation Loss : 0.4170467591646946

Iteration 37/187 of epoch 6 complete. Loss : 0.07514344623966797
Iteration 74/187 of epoch 6 complete. Loss : 0.06401765530274527
Iteration 111/187 of epoch 6 complete. Loss : 0.07739141729433795
Iteration 148/187 of epoch 6 complete. Loss : 0.09415011460313925
Iteration 185/187 of epoch 6 complete. Loss : 0.09078848120328542

Epoch 6 complete! Validation Loss : 0.41175023353461065

Iteration 37/187 of epoch 7 complete. Loss : 0.06780752363438541
Iteration 74/187 of epoch 7 complete. Loss : 0.05836113582591753
Iteration 111/187 of epoch 7 complete. Loss : 0.05392784021190695
Iteration 148/187 of epoch 7 complete. Loss : 0.062248720510585887
Iteration 185/187 of epoch 7 complete. Loss : 0.053896648129700006

Epoch 7 complete! Validation Loss : 0.48240599126526806

Iteration 37/187 of epoch 8 complete. Loss : 0.04504957196076174
Iteration 74/187 of epoch 8 complete. Loss : 0.036355540911490854
Iteration 111/187 of epoch 8 complete. Loss : 0.031285696511937154
Iteration 148/187 of epoch 8 complete. Loss : 0.03189814805581763
Iteration 185/187 of epoch 8 complete. Loss : 0.03805752469830819

Epoch 8 complete! Validation Loss : 0.5196360909577572

Iteration 37/187 of epoch 9 complete. Loss : 0.03171226033009589
Iteration 74/187 of epoch 9 complete. Loss : 0.020970824571024324
Iteration 111/187 of epoch 9 complete. Loss : 0.022864087755363936
Iteration 148/187 of epoch 9 complete. Loss : 0.02036288971509281
Iteration 185/187 of epoch 9 complete. Loss : 0.03092137652424139

Epoch 9 complete! Validation Loss : 0.5563212235768636

Iteration 37/187 of epoch 10 complete. Loss : 0.025341721050239897
Iteration 74/187 of epoch 10 complete. Loss : 0.013916522679800118
Iteration 111/187 of epoch 10 complete. Loss : 0.016716930203492176
Iteration 148/187 of epoch 10 complete. Loss : 0.017090223710732284
Iteration 185/187 of epoch 10 complete. Loss : 0.02131367802015833

Epoch 10 complete! Validation Loss : 0.5607575923204422
The model has been saved in models/albert-large-v2_lr_2e-05_val_loss_0.34346_ep_3.pt
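
The checkpoint that gets saved is the one with the lowest validation loss, which the log places at epoch 3. After that the training loss keeps falling while the validation loss rises, a typical sign of overfitting. The sketch below replays that selection on the logged validation losses (rounded to five decimals). `early_stop_epoch` is a hypothetical patience-based stopping rule that the notebook does not use; it is included only to show how many epochs such a rule might have saved.

```python
# Validation losses per epoch, rounded from the training log.
val_losses = [0.41400, 0.36495, 0.34346, 0.39421, 0.41705,
              0.41175, 0.48241, 0.51964, 0.55632, 0.56076]

# Epoch (1-based) with the lowest validation loss.
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__) + 1


def early_stop_epoch(losses, patience):
    """Epoch at which training would stop after `patience` epochs without improvement."""
    best, bad_epochs = float("inf"), 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return len(losses)


print("best epoch:", best_epoch)
print("would stop at epoch:", early_stop_epoch(val_losses, patience=2))
```

Under these assumptions a patience of 2 would have ended the run after epoch 5 with the same best checkpoint.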


In [15]:
printm()

Gen RAM Free: 9.6 GB  | Proc size: 6.1 GB
GPU RAM Free: 15109MB | Used: 0MB | Util   0% | Total 15109MB


### Prediction

In [16]:
print("Creation of the results' folder...")
!mkdir results

Creation of the results' folder...


In [17]:
def get_probs_from_logits(logits):
    probs = torch.sigmoid(logits.unsqueeze(-1))
    return probs.detach().cpu().numpy()

def test_prediction(net, device, dataloader, with_labels=True, result_file="results/output.txt"):
    # `with_labels` is kept for compatibility with existing calls; labeled batches
    # simply carry a fourth element (the label), which is ignored here.
    net.eval()
    probs_all = []

    with torch.no_grad():
        for batch in tqdm(dataloader):
            seq, attn_masks, token_type_ids = (t.to(device) for t in batch[:3])
            logits = net(seq, attn_masks, token_type_ids)
            probs = get_probs_from_logits(logits.squeeze(-1)).squeeze(-1)
            probs_all += probs.tolist()

    # Write one probability per line; the file is closed even if writing fails.
    with open(result_file, 'w') as w:
        w.writelines(str(prob) + '\n' for prob in probs_all)

In [18]:
test_filename = 'pnli_test_unlabeled.csv'
(test_dataset, test_sentence1, test_sentence2) = loadDataset(test_filename, 'test')
test_header = ["sentence1","sentence2"]
df_test = pd.DataFrame(test_dataset, columns=test_header)

print(df_test.shape)
display(df_test.head())

(9700, 2)


Unnamed: 0,sentence1,sentence2
0,The people want to have a romantic and pleasan...,People typically does desire to smell violets.
1,The people want to have a romantic and pleasan...,People typically does desire to smell violets.
2,The contract is to buy products from you.,Getting contract typically cause to make money...
3,The contract is to buy products from you.,Getting contract typically cause to make money...
4,Train station is closed.,Line can typically be used to move train along...


In [19]:
path_to_model = '/content/models/albert-large-v2_lr_2e-05_val_loss_0.34346_ep_3.pt'

path_to_output_file = 'results/output.txt'

print("Reading test data...")
test_set = CustomDataset(data=df_test, maxlen=maxlen, with_labels=False, bert_model=bert_model)
test_loader = DataLoader(test_set, batch_size=bs, num_workers=5)

model = SentencePairClassifier(bert_model)
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    model = nn.DataParallel(model)

print()
print("Loading the weights of the model...")
model.load_state_dict(torch.load(path_to_model))
model.to(device)

print("Predicting on test data...")
test_prediction(net=model, device=device, dataloader=test_loader, with_labels=False, result_file=path_to_output_file)
print()
print("Predictions are available in : {}".format(path_to_output_file))

probs_test = pd.read_csv(path_to_output_file, header=None)[0]
threshold = 0.5
results=(probs_test>=threshold).astype('uint8')

Reading test data...





Loading the weights of the model...
Predicting on test data...


100%|██████████| 304/304 [01:40<00:00,  3.02it/s]


Predictions are available in : results/output.txt





In [20]:
results

0       1
1       1
2       0
3       0
4       0
       ..
9695    0
9696    1
9697    1
9698    0
9699    0
Name: 0, Length: 9700, dtype: uint8
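
The 0/1 series above comes from applying a sigmoid to each logit (in `get_probs_from_logits`) and then comparing the probability with `threshold = 0.5`. A minimal plain-Python sketch of that conversion follows. `logits_to_labels` is a hypothetical helper, and this simple `sigmoid` would overflow for extremely negative inputs, which a production implementation would guard against.

```python
import math


def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logits_to_labels(logits, threshold=0.5):
    """Return 1 where the predicted probability reaches the threshold, else 0."""
    return [int(sigmoid(x) >= threshold) for x in logits]


example_logits = [-2.0, 0.0, 3.0]
print([round(sigmoid(x), 4) for x in example_logits])
print(logits_to_labels(example_logits))
```

With a threshold of 0.5, the decision is equivalent to checking whether the logit is non-negative. Raising or lowering the threshold trades precision against recall.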

In [24]:
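# The test DataFrame holds every pair twice in consecutive rows (see the head above),
# so keep one prediction per pair instead of the first 4850 rows, which would cover
# only the first half of the test set.
results = results.iloc[1::2]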

### Output Prediction Result File

You will need to submit a prediction result file. It should have 4850 lines, one per test instance, and every line should be either 0 or 1: your model's prediction for the respective test instance.

In [25]:
# suppose you had your model's predictions on the 4850 test cases read from pnli_test_unlabeled.csv,
# and those results are in the list called 'results'
assert (len(results) == 4850)

In [26]:
# make sure the results are not floats, but integers 0 and 1
results = [int(x) for x in results]

In [27]:
# write your prediction results to 'upload_predictions.txt' and upload that later
with open('upload_predictions.txt', 'w', encoding = 'utf-8') as fp:
    for x in results:
        fp.write(str(x) + '\n')

### Evaluation

In [28]:
path_to_model = '/content/models/albert-large-v2_lr_2e-05_val_loss_0.34346_ep_3.pt'

path_to_val_output_file = 'results/val_output.txt'

print("Reading test data...")
test_set = CustomDataset(data=df_val, maxlen=maxlen, with_labels=True, bert_model=bert_model)
test_loader = DataLoader(test_set, batch_size=bs, num_workers=5)

model = SentencePairClassifier(bert_model)
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    model = nn.DataParallel(model)

print()
print("Loading the weights of the model...")
model.load_state_dict(torch.load(path_to_model))
model.to(device)

print("Predicting on test data...")
test_prediction(net=model, device=device, dataloader=test_loader, with_labels=True, result_file=path_to_val_output_file)
print()
print("Predictions are available in : {}".format(path_to_val_output_file))

labels_test = df_val['label']

probs_test = pd.read_csv(path_to_val_output_file, header=None)[0]
threshold = 0.5
preds_test=(probs_test>=threshold).astype('uint8')

metric = load_metric("glue", "mrpc")
metric._compute(predictions=preds_test, references=labels_test)

Reading test data...





Loading the weights of the model...
Predicting on test data...


100%|██████████| 33/33 [00:10<00:00,  3.07it/s]



Predictions are available in : results/val_output.txt



{'accuracy': 0.8540284360189574, 'f1': 0.8605072463768116}

In [29]:
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Confusion matrix (rows: true labels, columns: predicted labels)
print(confusion_matrix(labels_test, preds_test))
# Accuracy
print(accuracy_score(labels_test, preds_test))
# Per-class recall
print(recall_score(labels_test, preds_test, average=None))
# Per-class precision
print(precision_score(labels_test, preds_test, average=None))

[[426  75]
 [ 79 475]]
0.8540284360189574
[0.8502994  0.85740072]
[0.84356436 0.86363636]
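
As a cross-check, the reported accuracy, precision, recall and F1 for the positive class can be recomputed by hand from the confusion matrix above. The sketch below does this in plain Python. It assumes the scikit-learn layout, with true labels as rows and predicted labels as columns.

```python
# Confusion matrix from the dev set: rows are true labels, columns are predictions.
cm = [[426, 75],
      [79, 475]]
tn, fp = cm[0]
fn, tp = cm[1]

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of the predicted positives, how many are correct
recall = tp / (tp + fn)      # of the true positives, how many are found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

The results agree with the accuracy and F1 reported by the GLUE metric and with the class-1 entries of the scikit-learn precision and recall arrays.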
