## MNLI: Multi-Genre Natural Language Inference

The Multi-Genre Natural Language Inference (MNLI) task is a sentence-pair classification task. It consists of crowdsourced sentence pairs annotated for textual entailment: each pair is labeled `entailment`, `contradiction`, or `neutral`.

See the [website](http://www.nyu.edu/projects/bowman/multinli/) and [paper](http://www.nyu.edu/projects/bowman/multinli/paper.pdf) for more info.

In [1]:
import numpy as np
import pandas as pd
import os
import sys
import csv
from sklearn import metrics
from sklearn.metrics import classification_report

sys.path.append("../") 
from bert_sklearn import BertClassifier

DATADIR = os.getcwd() + '/glue_data'
#DATADIR = '/data/glue_data'

In [2]:
%%time
%%bash
python3 download_glue_data.py --data_dir glue_data --tasks MNLI 

Downloading and extracting MNLI...
	Completed!
CPU times: user 8 ms, sys: 0 ns, total: 8 ms
Wall time: 25.2 s


In [2]:
"""
MNLI train data size: 392702 
MNLI dev_matched data size: 9815 
MNLI dev_mismatched data size: 9832 
"""
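def read_tsv(filename, quotechar=None):
    # quotechar=None makes csv treat quote characters as literal text
    with open(filename, "r", encoding='utf-8') as f:
        return list(csv.reader(f, delimiter="\t", quotechar=quotechar))

def get_mnli_df(filename):
    rows = read_tsv(filename)
    # first row is the header; the remaining rows are examples
    df = pd.DataFrame(rows[1:], columns=rows[0])
    # keep only the sentence pair and its gold label
    df = df[['sentence1', 'sentence2', 'gold_label']]
    df.columns = ['text_a', 'text_b', 'label']
    # drop rows with a missing label
    df = df[pd.notnull(df['label'])]
    return df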

def get_mnli_data(train_file = DATADIR + '/MNLI/train.tsv',
                  dev_matched_file = DATADIR + '/MNLI/dev_matched.tsv',                  
                  dev_mismatched_file = DATADIR + '/MNLI/dev_mismatched.tsv'):
    
    train = get_mnli_df(train_file) 
    print("MNLI train data size: %d "%(len(train)))        
    dev_matched = get_mnli_df(dev_matched_file) 
    print("MNLI dev_matched data size: %d "%(len(dev_matched)))        
    dev_mismatched = get_mnli_df(dev_mismatched_file)
    print("MNLI dev_mismatched data size: %d "%(len(dev_mismatched)))        
    label_list = np.unique(train['label'].values)

    return train,dev_matched,dev_mismatched,label_list

train,dev_matched,dev_mismatched,label_list = get_mnli_data()


MNLI train data size: 392702 
MNLI dev_matched data size: 9815 
MNLI dev_mismatched data size: 9832 
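
A note on `read_tsv` above: it passes `quotechar=None`, presumably because the raw sentences can contain unbalanced double quotes. The following is a minimal sketch of the difference, using a made-up three-line TSV string (the sentences are hypothetical, not taken from MNLI):

```python
import csv
import io

# Hypothetical TSV; the second row starts with an unmatched double quote.
raw = (
    'sentence1\tsentence2\tgold_label\n'
    '"Hello there\tHi.\tentailment\n'
    'Goodbye\tBye.\tneutral\n'
)

# Default dialect: '"' opens a quoted field that swallows tabs and newlines.
default_rows = list(csv.reader(io.StringIO(raw), delimiter="\t"))

# As in read_tsv: quotechar=None keeps '"' as an ordinary character.
literal_rows = list(csv.reader(io.StringIO(raw), delimiter="\t", quotechar=None))

print(len(default_rows), len(literal_rows))
```

With the default dialect the rows after the stray quote are merged into one field, so fewer rows come back than the file contains; with `quotechar=None` every line yields its own three-field row.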


In [5]:
print(label_list)

['contradiction' 'entailment' 'neutral']


In [3]:
train.head()

| | text_a | text_b | label |
|---|---|---|---|
| 0 | Conceptually cream skimming has two basic dime... | Product and geography are what make cream skim... | neutral |
| 1 | you know during the season and i guess at at y... | You lose the things to the following level if ... | entailment |
| 2 | One of our number will carry out your instruct... | A member of my team will execute your orders w... | entailment |
| 3 | How do you know? All this is their information... | This information belongs to them. | entailment |
| 4 | yeah i tell you what though if you go price so... | The tennis shoes have a range of prices. | neutral |


In [27]:
dev_matched.head()

| | text_a | text_b | label |
|---|---|---|---|
| 0 | The new rights are nice enough | Everyone really likes the newest benefits | neutral |
| 1 | This site includes a list of all award winners... | The Government Executive articles housed on th... | contradiction |
| 2 | uh i don't know i i have mixed emotions about ... | I like him for the most part, but would still ... | entailment |
| 3 | yeah i i think my favorite restaurant is alway... | My favorite restaurants are always at least a ... | contradiction |
| 4 | i don't know um do you do a lot of camping | I know exactly. | contradiction |


In [25]:
dev_mismatched.head()

| | text_a | text_b | label |
|---|---|---|---|
| 0 | Your contribution helped make it possible for ... | Your contributions were of no help with our st... | contradiction |
| 1 | The answer has nothing to do with their cause,... | Dictionaries are indeed exercises in bi-unique... | contradiction |
| 2 | We serve a classic Tuscan meal that includes ... | We serve a meal of Florentine terrine. | entailment |
| 3 | A few months ago, Carl Newton and I wrote a le... | Carl Newton and I have never had any other pre... | contradiction |
| 4 | I was on this earth you know, I've lived on th... | I don't yet know the reason why I have lived o... | entailment |


In [3]:
%%time

#nrows = 1000
#train = train.sample(nrows)
#dev_mismatched = dev_mismatched.sample(nrows)
#dev_matched = dev_matched.sample(nrows)

X_train = train[['text_a','text_b']]
y_train = train['label']

# define model
model = BertClassifier()
model.epochs = 4
model.learning_rate = 3e-5
model.max_seq_length = 128
model.validation_fraction = 0.05

print('\n',model,'\n')

# fit model
model.fit(X_train, y_train)

# score model on dev_matched
test = dev_matched
X_test = test[['text_a','text_b']]
y_test = test['label']
m_accy=model.score(X_test, y_test)

# score model on dev_mismatched
test = dev_mismatched
X_test = test[['text_a','text_b']]
y_test = test['label']
mm_accy=model.score(X_test, y_test)

print("Matched/mismatched accuracy: %0.2f/%0.2f %%"%(m_accy,mm_accy))

Building sklearn classifier...

 BertClassifier(bert_model='bert-base-uncased', epochs=4, eval_batch_size=8,
        fp16=False, gradient_accumulation_steps=1, label_list=None,
        learning_rate=3e-05, local_rank=-1, logfile='bert.log',
        loss_scale=0, max_seq_length=128, num_mlp_hiddens=500,
        num_mlp_layers=0, random_state=42, restore_file=None,
        train_batch_size=32, use_cuda=True, validation_fraction=0.05,
        warmup_proportion=0.1) 

Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
train data size: 373067, validation data size: 19635


Training: 100%|██████████| 11659/11659 [2:29:47<00:00,  1.43it/s, loss=0.622] 
                                                               

Epoch 1, Train loss : 0.6216, Val loss: 0.4593, Val accy = 82.55%


Training: 100%|██████████| 11659/11659 [2:29:51<00:00,  1.43it/s, loss=0.37]  
                                                               

Epoch 2, Train loss : 0.3698, Val loss: 0.4371, Val accy = 83.68%


Training: 100%|██████████| 11659/11659 [2:30:03<00:00,  1.43it/s, loss=0.247] 
                                                               

Epoch 3, Train loss : 0.2473, Val loss: 0.4834, Val accy = 84.02%


Training: 100%|██████████| 11659/11659 [2:29:52<00:00,  1.43it/s, loss=0.191] 
                                                               

Epoch 4, Train loss : 0.1914, Val loss: 0.5338, Val accy = 83.95%


                                                            

Matched/mismatched accuracy: 83.73/83.93 %
CPU times: user 8h 49min 19s, sys: 4h 32min 35s, total: 13h 21min 55s
Wall time: 10h 26min 32s
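
The split sizes and step count in the log above can be checked with a little arithmetic. This is a sketch that assumes the validation set is `floor(n * validation_fraction)` rows and that the final partial batch is kept; both are inferences from the logged numbers for this run, not taken from the library source.

```python
import math

n_examples = 392702          # MNLI train rows loaded above
validation_fraction = 0.05   # model.validation_fraction
train_batch_size = 32        # from the printed BertClassifier settings

# Assumption: validation rows are floor(n * fraction); the rest are for training.
n_val = int(n_examples * validation_fraction)
n_train = n_examples - n_val

# Assumption: the last, partially filled batch still counts as a step.
steps_per_epoch = math.ceil(n_train / train_batch_size)

print(n_train, n_val, steps_per_epoch)  # 373067 19635 11659
```

At roughly 2.5 hours per epoch, four epochs account for most of the 10h 26min wall time; the remainder is model loading, validation, and scoring.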




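## With an MLP classifier

This run repeats the experiment with a 4-layer MLP (`num_mlp_layers = 4`) on top of BERT in place of the default linear classifier.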

In [3]:
%%time

#nrows = 1000
#train = train.sample(nrows)
#dev_mismatched = dev_mismatched.sample(nrows)
#dev_matched = dev_matched.sample(nrows)

X_train = train[['text_a','text_b']]
y_train = train['label']

# define model
model = BertClassifier()
model.epochs = 4
model.learning_rate = 3e-5
model.max_seq_length = 128
model.validation_fraction = 0.05
model.num_mlp_layers = 4

print('\n',model,'\n')

# fit model
model.fit(X_train, y_train)

# score model on dev_matched
test = dev_matched
X_test = test[['text_a','text_b']]
y_test = test['label']
m_accy=model.score(X_test, y_test)

# score model on dev_mismatched
test = dev_mismatched
X_test = test[['text_a','text_b']]
y_test = test['label']
mm_accy=model.score(X_test, y_test)

print("Matched/mismatched accuracy: %0.2f/%0.2f %%"%(m_accy,mm_accy))


Building sklearn classifier...

 BertClassifier(bert_model='bert-base-uncased', epochs=4, eval_batch_size=8,
        fp16=False, gradient_accumulation_steps=1, label_list=None,
        learning_rate=3e-05, local_rank=-1, logfile='bert.log',
        loss_scale=0, max_seq_length=128, num_mlp_hiddens=500,
        num_mlp_layers=4, random_state=42, restore_file=None,
        train_batch_size=32, use_cuda=True, validation_fraction=0.05,
        warmup_proportion=0.1) 

Loading bert-base-uncased model...
Using mlp with D=768,H=500,K=3,n=4
train data size: 373067, validation data size: 19635


Training: 100%|██████████| 11658/11658 [2:31:08<00:00,  1.29it/s, loss=0.657] 
                                                               

Epoch 1, Train loss : 0.6571, Val loss: 0.4450, Val accy = 82.78%


Training: 100%|██████████| 11658/11658 [2:31:31<00:00,  1.29it/s, loss=0.384] 
                                                               

Epoch 2, Train loss : 0.3836, Val loss: 0.4183, Val accy = 84.00%


Training: 100%|██████████| 11658/11658 [2:31:39<00:00,  1.27it/s, loss=0.264] 
                                                               

Epoch 3, Train loss : 0.2635, Val loss: 0.4343, Val accy = 84.07%


Training: 100%|██████████| 11658/11658 [2:32:20<00:00,  1.29it/s, loss=0.209] 
                                                               

Epoch 4, Train loss : 0.2090, Val loss: 0.4585, Val accy = 84.20%


                                                            

Matched/mismatched accuracy: 84.51/84.49 %
CPU times: user 8h 57min 53s, sys: 4h 31min 35s, total: 13h 29min 28s
Wall time: 10h 34min 3s
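
A single accuracy number does not show how errors are spread over the three labels. The sketch below computes per-label recall in plain Python on toy label lists (made up for illustration, not model output). In the notebook, `y_pred` would presumably come from `model.predict(X_test)`, which is assumed here from the scikit-learn-style interface and is not shown in the cells above; the imported but unused `classification_report` from scikit-learn gives a similar breakdown.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Return {label: fraction of that label's examples predicted correctly}."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in sorted(totals)}

# Toy labels for illustration only.
y_true = ["entailment", "neutral", "contradiction",
          "neutral", "entailment", "contradiction"]
y_pred = ["entailment", "entailment", "contradiction",
          "neutral", "entailment", "neutral"]

recall = per_class_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(recall)
print("accuracy: %0.2f %%" % (100 * accuracy))
```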


