## QNLI: Question Natural Language Inference

The Question Natural Language Inference (QNLI) task is a sentence-pair classification task. It consists of question-sentence pairs drawn from the Stanford Question Answering Dataset (SQuAD), reframed as binary textual entailment: given a question and a sentence, decide whether the sentence contains the answer to the question.

See the [SQuAD website](https://rajpurkar.github.io/SQuAD-explorer/) for more information on the original dataset.
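
To make the reframing concrete, here is a rough sketch of how one SQuAD-style example could be turned into QNLI-style pairs: the question is paired with each sentence of its context, and a pair is labeled `entailment` when the sentence contains the answer text. The function name `squad_to_qnli_pairs`, the naive regex sentence splitter, and the toy context are all assumptions made for illustration. This is not the preprocessing used to build the released dataset, which may apply further filtering.

```python
import re


def squad_to_qnli_pairs(question, context, answer_text):
    """Pair the question with each context sentence; label by answer containment."""
    # Naive sentence splitter (assumption): break after '.', '!' or '?' plus whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    pairs = []
    for sentence in sentences:
        label = "entailment" if answer_text in sentence else "not_entailment"
        pairs.append({"text_a": question, "text_b": sentence, "label": label})
    return pairs


# Toy context invented for this sketch (not taken from SQuAD).
context = (
    "The tower was completed in 1889. "
    "It is located in Paris. "
    "Millions of people visit it every year."
)
pairs = squad_to_qnli_pairs("When was the tower completed?", context, "1889")
for pair in pairs:
    print(pair["label"], "|", pair["text_b"])
```

The `text_a`, `text_b`, and `label` keys mirror the column names used in the rest of this notebook.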

In [1]:
import numpy as np
import pandas as pd
import os
import sys
import csv
from sklearn import metrics
from sklearn.metrics import classification_report

sys.path.append("../") 
from bert_sklearn import BertClassifier
from bert_sklearn import load_model

DATADIR = os.getcwd() + '/glue_data'

In [2]:
%%bash
python3 download_glue_data.py --data_dir glue_data --tasks QNLI 

Downloading and extracting QNLI...
	Completed!


In [3]:
"""
QNLI train data size: 108436 
QNLI dev data size: 5732 
"""
def read_tsv(filename,quotechar=None):
    with open(filename, "r", encoding='utf-8') as f:
        return list(csv.reader(f,delimiter="\t",quotechar=quotechar))
    
def get_qnli_df(filename):
    rows = read_tsv(filename)
    df=pd.DataFrame(rows[1:],columns=rows[0])
    df=df[['question','sentence','label']]
    df = df[pd.notnull(df['label'])]
    df.columns=['text_a','text_b','label']
    return df

def get_qnli_data(train_file = DATADIR+'/QNLI/train.tsv', 
                   dev_file =  DATADIR+'/QNLI/dev.tsv'):
    
    train = get_qnli_df(train_file)
    print("QNLI train data size: %d "%(len(train)))
    dev = get_qnli_df(dev_file)
    print("QNLI dev data size: %d "%(len(dev)))

    label_list = np.unique(train['label'].values)
    return train,dev,label_list  
                  
train,dev,label_list =  get_qnli_data()             

QNLI train data size: 108436 
QNLI dev data size: 5732 


In [4]:
print(label_list)

['entailment' 'not_entailment']


In [12]:
train.head()

|   | text_a | text_b | label |
|---|--------|--------|-------|
| 0 | What is the Grotto at Notre Dame? | Immediately behind the basilica is the Grotto,... | entailment |
| 1 | What is the Grotto at Notre Dame? | It is a replica of the grotto at Lourdes, Fran... | not_entailment |
| 2 | What sits on top of the Main Building at Notre... | Atop the Main Building's gold dome is a golden... | entailment |
| 3 | What sits on top of the Main Building at Notre... | Next to the Main Building is the Basilica of t... | not_entailment |
| 4 | When did the Scholastic Magazine of Notre dame... | Begun as a one-page journal in September 1876,... | entailment |
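
The training cells in this notebook set `max_seq_length = 96`, so longer question-sentence pairs get truncated. A quick way to get a feel for how often that might happen is to count whitespace-separated words per pair. The helper `fraction_over_budget` below is a hypothetical sketch, not part of `bert_sklearn`. It assumes three positions are reserved for the `[CLS]` and two `[SEP]` tokens, and it uses word counts as a stand-in for WordPiece tokens, which usually come out higher, so the result is best read as an optimistic estimate.

```python
def fraction_over_budget(pairs, max_seq_length=96, special_tokens=3):
    """Rough share of pairs whose whitespace word count exceeds the budget."""
    # Room left for text after [CLS] and the two [SEP] markers.
    budget = max_seq_length - special_tokens
    over = sum(1 for a, b in pairs if len(a.split()) + len(b.split()) > budget)
    return over / len(pairs)


# Toy pairs invented for this sketch: one short pair, one deliberately long one.
toy_pairs = [
    ("What is the Grotto?", "It is a place of prayer."),
    ("Why?", "word " * 120),
]
print(fraction_over_budget(toy_pairs))
```

With the DataFrame loaded above, `zip(train['text_a'], train['text_b'])` wrapped in `list(...)` could be passed in place of `toy_pairs`.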


In [4]:
%%time

X_train = train[['text_a','text_b']]
y_train = train['label']

# define model
model = BertClassifier()
model.label_list = label_list
model.epochs = 4
model.validation_fraction = 0.05
model.learning_rate = 4e-5
model.max_seq_length = 96

print('\n',model,'\n')

# fit model
model.fit(X_train, y_train)

# test model on dev
test = dev
X_test = test[['text_a','text_b']]
y_test = test['label']

# make predictions
y_pred = model.predict(X_test)
print("Accuracy: %0.2f%%"%(metrics.accuracy_score(y_test, y_pred) * 100))
print(classification_report(y_test, y_pred, target_names=label_list))

Building sklearn classifier...

 BertClassifier(bert_model='bert-base-uncased', epochs=4, eval_batch_size=8,
        fp16=False, gradient_accumulation_steps=1,
        label_list=array(['entailment', 'not_entailment'], dtype=object),
        learning_rate=4e-05, local_rank=-1, logfile='bert_sklearn.log',
        loss_scale=0, max_seq_length=96, num_mlp_hiddens=500,
        num_mlp_layers=0, random_state=42, restore_file=None,
        train_batch_size=32, use_cuda=True, validation_fraction=0.05,
        warmup_proportion=0.1) 

Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 103015, validation data size: 5421


Training: 100%|██████████| 3220/3220 [44:42<00:00,  1.48it/s, loss=0.406]
                                                             

Epoch 1, Train loss : 0.4062, Val loss: 0.3141, Val accy = 87.42%


Training: 100%|██████████| 3220/3220 [48:23<00:00,  1.40it/s, loss=0.228]
                                                             

Epoch 2, Train loss : 0.2277, Val loss: 0.3287, Val accy = 87.73%


Training: 100%|██████████| 3220/3220 [48:07<00:00,  1.42it/s, loss=0.121]
                                                             

Epoch 3, Train loss : 0.1213, Val loss: 0.4244, Val accy = 87.29%


Training: 100%|██████████| 3220/3220 [47:51<00:00,  1.37it/s, loss=0.08]  
                                                             

Epoch 4, Train loss : 0.0800, Val loss: 0.5211, Val accy = 87.14%


                                                             

Accuracy: 88.57%
                precision    recall  f1-score   support

    entailment       0.88      0.89      0.89      2866
not_entailment       0.89      0.88      0.89      2866

     micro avg       0.89      0.89      0.89      5732
     macro avg       0.89      0.89      0.89      5732
  weighted avg       0.89      0.89      0.89      5732

CPU times: user 2h 55s, sys: 1h 13min 12s, total: 3h 14min 8s
Wall time: 3h 14min 5s
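
In the log above, the training loss falls every epoch while the validation loss rises after the first epoch and validation accuracy stays roughly flat, which suggests the later epochs mostly overfit. As a small sketch, the per-epoch lines can be parsed to see which epoch looks best under each criterion. The regular expression is written against the exact log format printed in this notebook and is an assumption about that format, not a `bert_sklearn` feature.

```python
import re

# Per-epoch lines copied from the training output above.
log = """\
Epoch 1, Train loss : 0.4062, Val loss: 0.3141, Val accy = 87.42%
Epoch 2, Train loss : 0.2277, Val loss: 0.3287, Val accy = 87.73%
Epoch 3, Train loss : 0.1213, Val loss: 0.4244, Val accy = 87.29%
Epoch 4, Train loss : 0.0800, Val loss: 0.5211, Val accy = 87.14%
"""

pattern = re.compile(
    r"Epoch (\d+), Train loss : ([\d.]+), Val loss: ([\d.]+), Val accy = ([\d.]+)%"
)

# Each row is (epoch, train_loss, val_loss, val_accuracy_percent).
history = [
    (int(epoch), float(train_loss), float(val_loss), float(val_acc))
    for epoch, train_loss, val_loss, val_acc in pattern.findall(log)
]

best_by_loss = min(history, key=lambda row: row[2])
best_by_acc = max(history, key=lambda row: row[3])
print("lowest val loss at epoch", best_by_loss[0])
print("highest val accuracy at epoch", best_by_acc[0])
```

The two criteria disagree here (epoch 1 versus epoch 2), but either reading points toward fewer epochs being worth trying for this configuration.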


## With an MLP

In [5]:
%%time

X_train = train[['text_a','text_b']]
y_train = train['label']

# define model
model = BertClassifier()
model.label_list = label_list
model.epochs = 4
model.validation_fraction = 0.05
model.learning_rate = 4e-5
model.max_seq_length = 96
model.num_mlp_layers = 4

print('\n',model,'\n')

# fit model
model.fit(X_train, y_train)

# test model on dev
test = dev
X_test = test[['text_a','text_b']]
y_test = test['label']

# make predictions
y_pred = model.predict(X_test)
print("Accuracy: %0.2f%%"%(metrics.accuracy_score(y_test, y_pred) * 100))
print(classification_report(y_test, y_pred, target_names=label_list))

Building sklearn classifier...

 BertClassifier(bert_model='bert-base-uncased', epochs=4, eval_batch_size=8,
        fp16=False, gradient_accumulation_steps=1,
        label_list=array(['entailment', 'not_entailment'], dtype=object),
        learning_rate=4e-05, local_rank=-1, logfile='bert_sklearn.log',
        loss_scale=0, max_seq_length=96, num_mlp_hiddens=500,
        num_mlp_layers=4, random_state=42, restore_file=None,
        train_batch_size=32, use_cuda=True, validation_fraction=0.05,
        warmup_proportion=0.1) 

Loading bert-base-uncased model...
Using mlp with D=768,H=500,K=2,n=4
train data size: 103015, validation data size: 5421


Training: 100%|██████████| 3219/3219 [44:29<00:00,  1.17it/s, loss=0.419]
                                                             

Epoch 1, Train loss : 0.4188, Val loss: 0.3102, Val accy = 87.46%


Training: 100%|██████████| 3219/3219 [48:26<00:00,  1.14it/s, loss=0.239]
                                                             

Epoch 2, Train loss : 0.2391, Val loss: 0.3060, Val accy = 87.44%


Training: 100%|██████████| 3219/3219 [48:25<00:00,  1.12it/s, loss=0.139]
                                                             

Epoch 3, Train loss : 0.1390, Val loss: 0.3463, Val accy = 88.01%


Training: 100%|██████████| 3219/3219 [48:36<00:00,  1.17it/s, loss=0.0973]
                                                             

Epoch 4, Train loss : 0.0973, Val loss: 0.3834, Val accy = 87.75%


                                                             

Accuracy: 88.75%
                precision    recall  f1-score   support

    entailment       0.89      0.88      0.89      2866
not_entailment       0.89      0.89      0.89      2866

     micro avg       0.89      0.89      0.89      5732
     macro avg       0.89      0.89      0.89      5732
  weighted avg       0.89      0.89      0.89      5732
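
The MLP head scores 88.75% on dev against 88.57% for the linear head, a gap of about 10 examples out of 5732. As a rough sanity check on whether that gap is meaningful, the sketch below computes the binomial standard error of an accuracy estimate on a dev set of this size. It assumes the dev examples are independent, and it looks at a single run of each model. A paired comparison of the two prediction sets would be more informative, but only the aggregate accuracies are used here.

```python
import math


def accuracy_standard_error(accuracy, n):
    """Binomial standard error of an accuracy measured on n independent examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


n_dev = 5732          # dev set size reported above
linear_acc = 0.8857   # linear classifier head
mlp_acc = 0.8875      # MLP head with num_mlp_layers = 4

se = accuracy_standard_error(linear_acc, n_dev)
gap = mlp_acc - linear_acc
print("standard error: %.4f, gap: %.4f" % (se, gap))
```

The gap comes out smaller than one standard error, so under these assumptions the two heads are hard to tell apart on this dev set.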

