# Fudan PRML22 Spring Final Project

*Your name and Student ID: [Name], [Student ID]*

*Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet, and a .pdf report file) with your assignment submission.*

**Congratulations, you have come to the last challenge!**

Having finished the past two assignments, we think all you gugs already have a solid foundation in the field of machine learning and deep learning. And now you are qualified to apply machine learning algorithms to the real-world tasks you are interested in, or start your machine learning research. 

**In this final project, you are free to choose a topic you are passionate about. The project can be an application one, a theoretical one or implementing your own amazing machine learning/deep learning framework like a toy pytorch. If you don't have any idea, we will also provide you with a default one you can play with.** 

**! Notice: If you want to work on your own idea, you have to email the TA (lip21[at]m.fudan.edu.cn) to give a simple project proposal first before May 22, 2022.** 

## Default Project: Natural Language Inference

![Sherlock](./img/inference.jpg)

The default final project this semester is a NLP task called "Natural Language Inference". Though deep neural networks have demonstrated astonishing performance in many tasks like text classification and generation, you might somehow think they are just "advanced statistics" but far from *intelligent* machines. One intelligent machine must be able to reason, you may think. And in this default final project, your aim is to design a machine which can conduct inference. The machine can know that "A man inspects the uniform of a figure in some East Asian country" is contradictory to "The man is sleeping", and "a soccer game with multiple males playing." entails "some men are playing a sport".

The dataset we use this time is the Original Chinese Natural Language Inference (OCNLI) dataset[1]. It is a chinese NLI dataset with about 50k training data and 3k development data. The sentence pairs in the dataset are labeled as "entailment", "neutral" and "contradiction". Due to they release the test data without its labels, we select 5k data pairs from the training data as labeled test data, and the other 45k data as your t. You can visit the [GitHub link](https://github.com/CLUEbenchmark/OCNLI) for more information.

After you finished the NLI task with the full 50k training set, you have to complete an advanced challenge. You have to select **at most 5k data** from the training set as labeled training set, leaving the other training data as unlabeled training set, then use these labeled and unlabeled data to finish the same NLI task. You can randomly choosing the 5k training data but can also think up some ideas to select more **important data** as labeled training data. Like assignment1, you may have to think how to use the unlabeled training data.

You can use the deep learning frameworks like paddle, pytorch, tensorflow in your experiment but not more high-level libraries like Huggingface. Please write down the version of them in the './requirements.txt' file.

**! Notice: You CAN NOT use any other people's pretrained model like 'bert-base-chinese' in this default project. You are encouraged to design your own model and algorithm, no matter it looks naive or not.**

NLI is a traditional but promising NLP task, and you can search the Google/Bing for more information. Some key words can be "natural language inference with attention", "training data selection", "semi-surpervised learning", "unsupervised representation learning" and so on.

## 1. Setup

import the libraries and load the dataset here.

In [4]:
# setup code
import json

%load_ext autoreload
%autoreload 2
%matplotlib inline

from __future__ import unicode_literals, print_function, division
from io import open
import unicodedata
import string
import re
import random

import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F
import jieba
import time
import math

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.cuda.is_available()

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


True

In [5]:
dataset_path = './dataset'

train_data_file = dataset_path + '/train.json'
dev_data_file = dataset_path + '/dev.json'

In [6]:
def read_ocnli_file(data_file):
    # read the ocnli file. feel free to change it. 
    print ("loading data from ", data_file)
    
    text_outputs = []
    label_outputs = []
    
    label_to_idx = {"entailment": 0, "neutral": 1, "contradiction": 2}
    
    with open(data_file, 'r', encoding="utf-8") as f:
        line = f.readline()
        while line:
            line = json.loads(line.strip())
            text_a, text_b, label = line['sentence1'], line['sentence2'],line['label']
            label_id = label_to_idx[label.strip()]
            
            text_outputs.append((text_a,text_b))
            label_outputs.append(label_id)

            line = f.readline()
                
    print ("there are ", len(label_outputs), "sentence pairs in this file.")
    return text_outputs, label_outputs


training_data, training_labels = read_ocnli_file(train_data_file)
dev_data, dev_labels = read_ocnli_file(dev_data_file)

stop_words=[]

with open("./saved_weights/stop_words.txt",'r',encoding="utf-8") as f:
    for line in f.readlines():
        line = line.strip('\n')
        stop_words.append(str(line))

loading data from  ./dataset/train.json
there are  45437 sentence pairs in this file.
loading data from  ./dataset/dev.json
there are  2950 sentence pairs in this file.


In [7]:
print ("training data samples: ", training_data[:5])
print ("training labels samples: ", training_labels[:5])

training data samples:  [('对,对,对,对,对,具体的答复.', '要的是抽象的答复'), ('当前国际形势仍处于复杂而深刻的变动之中', '一个月后将发生世界战争'), ('在全县率先推行宅基地有偿使用,全乡20年无须再扩大宅基地', '宅基地有偿使用获得较好成果,将在更大范围实施。'), ('上海马路上的喧声也是老调子', '上海有很多条马路'), ('那你看看第二封信什么时候到吧.', '第一封信已经收到了。')]
training labels samples:  [2, 1, 1, 1, 1]


## 2. Exploratory Data Analysis (5 points)

Your may have to explore the dataset and do some analysis first.

We'll need a unique index per word to use as the inputs and targets of
the networks later. To keep track of all this we will use a helper class
called ``Lang`` which has word → index (``word2index``) and index → word
(``index2word``) dictionaries, as well as a count of each word
``word2count`` which will be used to replace rare words later.

In [8]:
SOS_token = 0
EOS_token = 1


class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS

    def addSentence(self, sentence):
        for word in sentence:
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1

In [9]:
training_premise = []
training_hypothesis = []
test_premise = []
test_hypothesis = []

def is_chinese(uchar):
    if uchar >= u'\u4e00' and uchar <= u'\u9fa5':
        return True
    else:
        return False

def format_str(content):
    content_str = ''
    for i in content:
        if is_chinese(i):
            content_str = content_str + i
    return content_str

for pair in training_data:
    training_premise.append(format_str(pair[0]))
    training_hypothesis.append(format_str(pair[1]))

for pair in dev_data:
    test_premise.append(format_str(pair[0]))
    test_hypothesis.append(format_str(pair[1]))

print(training_premise[:5])
print(len(training_premise))



['对对对对对具体的答复', '当前国际形势仍处于复杂而深刻的变动之中', '在全县率先推行宅基地有偿使用全乡年无须再扩大宅基地', '上海马路上的喧声也是老调子', '那你看看第二封信什么时候到吧']
45437


In [10]:
# randnum = random.randint(0,100)
randnum = 37
print(randnum)
random.seed(randnum)
random.shuffle(training_premise)
random.seed(randnum)
random.shuffle(training_hypothesis)
random.seed(randnum)
random.shuffle(training_labels)

random.seed(randnum)
random.shuffle(test_premise)
random.seed(randnum)
random.shuffle(test_hypothesis)
random.seed(randnum)
random.shuffle(dev_labels)

37


split words

In [11]:
def split_words(datas):
    cut_words = map(lambda s: list(jieba.cut(s)), datas)
    return list(cut_words)

training_premise = split_words(training_premise)
training_hypothesis = split_words(training_hypothesis)
test_premise = split_words(test_premise)
test_hypothesis = split_words(test_hypothesis)

print(training_premise[:5])

def drop_stopwords(contents, stopwords):
    contents_clean = []
    for line in contents:
        line_clean = []
        for word in line:
            if word in stopwords:
                continue
            line_clean.append(word)
        contents_clean.append(line_clean)
    return contents_clean

training_premise = drop_stopwords(training_premise,stop_words)
training_hypothesis = drop_stopwords(training_hypothesis,stop_words)

test_premise = drop_stopwords(test_premise,stop_words)
test_hypothesis = drop_stopwords(test_hypothesis,stop_words)

print(training_premise[:5])
print(training_hypothesis[:5])


Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\use\AppData\Local\Temp\jieba.cache
Loading model cost 0.984 seconds.
Prefix dict has been built successfully.


[['第二', '保持', '经济运行', '处在', '合理', '区间'], ['王琦瑶', '走', '回', '房间', '将', '泡', '好', '的', '茶', '往', '桌上', '一放见', '他', '还', '沉着脸', '就', '说', '不要', '无事生非', '好好', '的', '事情', '倒', '弄', '得', '不好', '了'], ['说真的', '当', '亏了', '我', '回来', '呀'], ['他', '对', '自己', '说', '我', '应该', '怎么办'], ['科学技术', '的', '发展', '给', '商品经济', '注入', '了', '强大', '的', '活力']]
[['第二', '保持', '经济运行', '处在', '合理', '区间'], ['王琦瑶', '走', '回', '房间', '将', '泡', '好', '茶', '往', '桌上', '一放见', '他', '还', '沉着脸', '就', '说', '不要', '无事生非', '好好', '事情', '倒', '弄', '得', '不好'], ['说真的', '当', '亏了', '我', '回来', '呀'], ['他', '对', '自己', '说', '我', '应该', '怎么办'], ['科学技术', '发展', '给', '商品经济', '注入', '强大', '活力']]
[['经济运行', '要', '处在', '合理', '区间'], ['王琦瑶', '是', '个', '双脚', '残废', '残疾人'], ['我', '是', '坐', '公交车', '回来'], ['他', '在', '问', '别人', '他', '该', '怎么办'], ['使', '商品经济', '繁荣', '因素', '有', '多种']]


In [12]:
Max = 0
Sum = 0
count = 0
for sentence in training_premise:
    count += 1
    Max = max(Max,len(sentence))
    Sum += len(sentence)
print("Max",Max)
print("Avg",Sum/count)
print(len(training_premise))

Max 29
Avg 11.471025815964962
45437


filter the train_set

In [13]:
del_list_train = []
del_list_test = []


MAX_LENGTH = 15
for i in range(len(training_premise)):
    if len(training_premise[i]) > MAX_LENGTH or len(training_hypothesis[i]) > MAX_LENGTH :
        del_list_train.append(i)

for idx in sorted(del_list_train, reverse = True):
    del training_premise[idx]
    del training_hypothesis[idx]
    del training_labels[idx]

for i in range(len(test_premise)):
    if len(test_premise[i]) > MAX_LENGTH or len(test_hypothesis[i]) > MAX_LENGTH :
        del_list_test.append(i)

for idx in sorted(del_list_test, reverse = True):
    del test_premise[idx]
    del test_hypothesis[idx]
    del dev_labels[idx]

In [14]:
Max = 0
Sum = 0
count = 0
for sentence in training_premise:
    count += 1
    Max = max(Max,len(sentence))
    Sum += len(sentence)
print("Max",Max)
print("Avg",Sum/count)
print(len(training_premise))

Max 15
Avg 8.94646326962675
34481


In [15]:
Chinese = Lang("CN")
for sentence in training_premise:
    Chinese.addSentence(sentence)
for sentence in training_hypothesis:
    Chinese.addSentence(sentence)
for sentence in test_premise:
    Chinese.addSentence(sentence)
for sentence in test_hypothesis:
    Chinese.addSentence(sentence)

print(Chinese.name, Chinese.n_words)

premise_vec = []
hypothesis_vec = []

test_premise_vec = []
test_hypothesis_vec = []

for sentence in training_premise:
    word_list = []
    for word in sentence:
        word_list.append(Chinese.word2index[word])
    premise_vec.append(word_list)

for sentence in training_hypothesis:
    word_list = []
    for word in sentence:
        word_list.append(Chinese.word2index[word])
    hypothesis_vec.append(word_list)

for sentence in test_premise:
    word_list = []
    for word in sentence:
        word_list.append(Chinese.word2index[word])
    test_premise_vec.append(word_list)

for sentence in test_hypothesis:
    word_list = []
    for word in sentence:
        word_list.append(Chinese.word2index[word])
    test_hypothesis_vec.append(word_list)

CN 23572


In [16]:
label_distribution = {}
label_distribution[0] = 0
label_distribution[1] = 0
label_distribution[2] = 0

for label in training_labels:
    label_distribution[label]+=1
    
print(label_distribution)

{0: 11437, 1: 11841, 2: 11203}


## 3. Build Your Model with Fully 45k Training Data (60 points)

The encoder of this network is a RNN 

In [17]:
class Bi_Lstm(nn.Module):
    def __init__(self):
        super(Bi_Lstm,self).__init__() 
        self.embeding = nn.Embedding(Chinese.n_words+1,100)
        self.lstm = nn.LSTM(input_size = 100, hidden_size = 128,num_layers = 1,bidirectional = True,batch_first=True,dropout=0.2)
        self.l1 = nn.BatchNorm1d(128)
        self.l2 = nn.ReLU()
        self.l3 = nn.Linear(512,3)
        self.l4 = nn.Dropout(0.2)
        self.l5 = nn.BatchNorm1d(3)
    def attention_net(self, lstm_output, final_state):
        batch_size = len(lstm_output)
        hidden = final_state.view(batch_size, -1, 1)   # hidden : [batch_size, n_hidden * num_directions(=2), n_layer(=1)]
        attn_weights = torch.bmm(lstm_output, hidden).squeeze(2) # attn_weights : [batch_size, n_step]
        soft_attn_weights = F.softmax(attn_weights, 1)
        # context : [batch_size, n_hidden * num_directions(=2)]
        context = torch.bmm(lstm_output.transpose(1, 2), soft_attn_weights.unsqueeze(2)).squeeze(2)
        return context, soft_attn_weights 
    def forward(self, x,y):
        x = self.embeding(x)
        y = self .embeding(y)
        #out,_ = self.lstm(x)
        x, (final_hidden_state, final_cell_state) = self.lstm(x)
        y, (final_hidden_state, final_cell_state) = self.lstm(y)
        #output = out.transpose(0, 1) # output : [batch_size, seq_len, n_hidden]
        x_attn_output, x_attention = self.attention_net(x, final_hidden_state)
        y_attn_output, y_attention = self.attention_net(y, final_hidden_state)
        #return self.out(attn_output), attention # model : [batch_size, num_classes]
        z = torch.cat((x_attn_output,y_attn_output),1)
        out = self.l3(z)
        out = self.l4(out)
        out = self.l5(out)
        return out,x_attention,y_attention




train the model

In [18]:
def train(premise_tensor,hypothesis_tensor, bilstm, optimizer, criterion,result, max_length=MAX_LENGTH):

    optimizer.zero_grad()


    loss = 0

    output,x,y=bilstm(premise_tensor,hypothesis_tensor)


    loss+=criterion(output,result) 

    loss.backward()

    optimizer.step()

    return loss.item() 

In [19]:
data_length = len(premise_vec)

def trainIters(bilstm,premise,hypothesis,labels,epoch_num=10, batch_size=64):

    n_iters = len(premise)

    criterion = nn.CrossEntropyLoss()

    result = torch.zeros(len(labels),device=device).long()
    for i in range(len(labels)):
        result[i]=labels[i]

    batch_premise = torch.zeros(batch_size,MAX_LENGTH,device=device).long()
    batch_hypothesis = torch.zeros(batch_size,MAX_LENGTH,device=device).long()


    sum = 0
    count = 0
    for i in range(epoch_num):

        index = 0+i

        while index + batch_size < n_iters:

            for batch in range(batch_size):
                input_tensor = premise[index+batch]
                target_tensor = hypothesis[index+batch]
                # print(input_tensor)
                for iter in range(len(input_tensor)):
                    batch_premise[batch][iter] = input_tensor[iter]
                for iter in range(len(target_tensor)):
                    batch_hypothesis[batch][iter] = target_tensor[iter]


            loss = train(batch_premise,batch_hypothesis, bilstm,optimizer, criterion,result[index:index+batch_size])
            sum += loss
            count += 1
            progress = ((i*n_iters+index)/(epoch_num*n_iters))*100
            print("progress : ",round(progress,2),"%"," loss = ",loss," / ",sum/count)

            index += batch_size


    # showPlot(plot_losses)

Build and evaluate your model here.

In [268]:

bilstm = Bi_Lstm().to(device)
bilstm.train()



optimizer = torch.optim.Adam(bilstm.parameters(),lr=0.01)

trainIters(bilstm, premise_vec,hypothesis_vec, training_labels, epoch_num=50,batch_size=32)
torch.save(bilstm,"model_epoch_num_50.pkl")

bilstm.eval()

progress :  0.0 %  loss =  1.1812331676483154  /  1.1812331676483154
progress :  0.0 %  loss =  1.094835638999939  /  1.1380344033241272
progress :  0.0 %  loss =  1.6100895404815674  /  1.2953861157099407
progress :  0.01 %  loss =  1.346103310585022  /  1.308065414428711
progress :  0.01 %  loss =  1.465364694595337  /  1.3395252704620362
progress :  0.01 %  loss =  1.6709281206130981  /  1.3947590788205464
progress :  0.01 %  loss =  1.259867787361145  /  1.3754888943263464
progress :  0.01 %  loss =  1.277624487876892  /  1.3632558435201645
progress :  0.01 %  loss =  1.1749037504196167  /  1.3423278331756592
progress :  0.02 %  loss =  1.5047588348388672  /  1.35857093334198
progress :  0.02 %  loss =  1.362913966178894  /  1.3589657545089722
progress :  0.02 %  loss =  1.5335495471954346  /  1.3735144038995106
progress :  0.02 %  loss =  1.0878233909606934  /  1.3515381721349864
progress :  0.02 %  loss =  1.416717290878296  /  1.356193823473794
progress :  0.03 %  loss =  1.3070

Bi_Lstm(
  (embeding): Embedding(23573, 100)
  (lstm): LSTM(100, 128, batch_first=True, dropout=0.2, bidirectional=True)
  (l1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (l2): ReLU()
  (l3): Linear(in_features=512, out_features=3, bias=True)
  (l4): Dropout(p=0.2, inplace=False)
  (l5): BatchNorm1d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)

In [20]:
def accuracy(result,label):
    sum = 0

    for i in range(len(result)):
        if result[i] == label[i]:
            sum += 1
        
    return sum/len(result)

def predict(model,premise,hypothesis):
    batch_size = 32
    batch_premise= torch.zeros(batch_size,MAX_LENGTH,device=device).long()
    batch_hypothesis = torch.zeros(batch_size,MAX_LENGTH,device=device).long()
    index = 0

    result = []
    confidence = []

    while index + batch_size < len(premise):

        for batch in range(batch_size):
            input_tensor = premise[index+batch]
            target_tensor = hypothesis[index+batch]
            # print(input_tensor)
            for iter in range(len(input_tensor)):
                batch_premise[batch][iter] = input_tensor[iter]
            for iter in range(len(target_tensor)):
                batch_hypothesis[batch][iter] = target_tensor[iter]

        output,x,y= model(batch_premise,batch_hypothesis)
        for j in range(batch_size):
            result.append(int(output[j].argmax()))
            soft = F.softmax(output[j])
            confidence.append(float(max(soft)))

        index += batch_size
    return result,confidence



In [28]:
train_result , train_confidence = predict(bilstm,premise_vec,hypothesis_vec)
test_result , test_confidence = predict(bilstm,test_premise_vec,test_hypothesis_vec)

print(accuracy(train_result,training_labels))
print(accuracy(test_result,dev_labels))

  soft = F.softmax(output[j])


0.6865134633240483
0.46473214285714287


In [21]:
model = torch.load("./saved_weights/model_epoch_num_50.pkl")

In [23]:
train_result , train_confidence = predict(model,premise_vec,hypothesis_vec)

  soft = F.softmax(output[j])


## 4. Important Data Selection (20 points)

In [40]:
count = 0
for c in train_confidence:
    if c < 0.6:
        count += 1
print(count)

count = 0
for c in train_confidence:
    if c < 0.5:
        count += 1
print(count)

count = 0
for c in train_confidence:
    if c < 0.47:
        count += 1
print(count)

select_index = []
for i in range(len(train_confidence)):
    if train_confidence[i] < 0.47:
        select_index.append(i)

14624
7649
5356


## 5. Build Your Model with 5k Training Data (10 points)

In [41]:
labeled_premise = []
labeled_hypothesis = []
right_labels = []

unlabeled_premise = []
unlabeled_hypothesis = []

for i in range(len(premise_vec)):
    if i in select_index:
        labeled_premise.append(premise_vec[i])
        labeled_hypothesis.append(hypothesis_vec[i])
        right_labels.append(training_labels[i])
    else:
        unlabeled_premise.append(premise_vec[i])
        unlabeled_hypothesis.append(hypothesis_vec[i])

In [47]:
bilstm_5k = Bi_Lstm().to(device)
bilstm_5k.train()

optimizer = torch.optim.Adam(bilstm_5k.parameters(),lr=0.01)

trainIters(bilstm_5k, labeled_premise,labeled_hypothesis,right_labels, epoch_num=20,batch_size=32)
torch.save(bilstm_5k,"./saved_weights/model_5k.pkl")

bilstm_5k.eval()

progress :  0.0 %  loss =  1.3577322959899902  /  1.3577322959899902
progress :  0.03 %  loss =  1.2485884428024292  /  1.3031603693962097
progress :  0.06 %  loss =  1.1198292970657349  /  1.242050011952718
progress :  0.09 %  loss =  1.301358699798584  /  1.2568771839141846
progress :  0.12 %  loss =  1.1583178043365479  /  1.2371653079986573
progress :  0.15 %  loss =  1.1685242652893066  /  1.2257251342137654
progress :  0.18 %  loss =  1.1580581665039062  /  1.2160584245409285
progress :  0.21 %  loss =  1.1567741632461548  /  1.2086478918790817
progress :  0.24 %  loss =  1.120956540107727  /  1.1989044083489313
progress :  0.27 %  loss =  1.1396632194519043  /  1.1929802894592285
progress :  0.3 %  loss =  1.1687794923782349  /  1.19078021699732
progress :  0.33 %  loss =  1.0526316165924072  /  1.179267833630244
progress :  0.36 %  loss =  1.265739917755127  /  1.185919532409081
progress :  0.39 %  loss =  1.1515696048736572  /  1.1834659661565508
progress :  0.42 %  loss =  1.

Bi_Lstm(
  (embeding): Embedding(23573, 100)
  (lstm): LSTM(100, 128, batch_first=True, dropout=0.2, bidirectional=True)
  (l1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (l2): ReLU()
  (l3): Linear(in_features=512, out_features=3, bias=True)
  (l4): Dropout(p=0.2, inplace=False)
  (l5): BatchNorm1d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)

In [58]:
test_result , a = predict(bilstm_5k,test_premise_vec,test_hypothesis_vec)

print(accuracy(test_result,dev_labels))

  soft = F.softmax(output[j])


0.35714285714285715


In [49]:
unlabeled_result , unlabeled_confidence= predict(bilstm_5k,unlabeled_premise,unlabeled_hypothesis)

confidence_index = []
thresh_hold = 0.9
for i in range(len(unlabeled_result)):
    if unlabeled_confidence[i]>thresh_hold :
        confidence_index.append(i)

  soft = F.softmax(output[j])


In [50]:
print(len(unlabeled_confidence))

print(len(confidence_index))

29120
11958


In [51]:
pseudo_labeled_premise = labeled_premise[:]
pseudo_labeled_hypothesis = labeled_hypothesis[:]
pseudo_labels = right_labels[:]

for index in confidence_index:
    pseudo_labeled_premise.append(unlabeled_premise[index])
    pseudo_labeled_hypothesis.append(unlabeled_hypothesis[index])
    pseudo_labels.append(unlabeled_result[index])

In [54]:
bilstm_final = Bi_Lstm().to(device)
bilstm_final.train()

optimizer = torch.optim.Adam(bilstm_final.parameters(),lr=0.01)

trainIters(bilstm_final, pseudo_labeled_premise,pseudo_labeled_hypothesis,pseudo_labels, epoch_num=5,batch_size=32)
trainIters(bilstm_final, labeled_premise,labeled_hypothesis,right_labels, epoch_num=5,batch_size=32)
torch.save(bilstm_final,"model_final.pkl")

bilstm_final.eval()



progress :  0.0 %  loss =  1.5322487354278564  /  1.5322487354278564
progress :  0.04 %  loss =  1.1307035684585571  /  1.3314761519432068
progress :  0.07 %  loss =  1.407393217086792  /  1.3567818403244019
progress :  0.11 %  loss =  1.145969271659851  /  1.3040786981582642
progress :  0.15 %  loss =  1.3111392259597778  /  1.3054908037185669
progress :  0.18 %  loss =  1.2204409837722778  /  1.2913158337275188
progress :  0.22 %  loss =  1.2919636964797974  /  1.2914083855492728
progress :  0.26 %  loss =  1.1859909296035767  /  1.2782312035560608
progress :  0.3 %  loss =  1.1203927993774414  /  1.2606936030917697
progress :  0.33 %  loss =  1.3896276950836182  /  1.2735870122909545
progress :  0.37 %  loss =  1.1086430549621582  /  1.2585921070792458
progress :  0.41 %  loss =  1.1694772243499756  /  1.2511658668518066
progress :  0.44 %  loss =  1.1733882427215576  /  1.2451829726879413
progress :  0.48 %  loss =  1.1423943042755127  /  1.2378409249441964
progress :  0.52 %  loss

Bi_Lstm(
  (embeding): Embedding(23573, 100)
  (lstm): LSTM(100, 128, batch_first=True, dropout=0.2, bidirectional=True)
  (l1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (l2): ReLU()
  (l3): Linear(in_features=512, out_features=3, bias=True)
  (l4): Dropout(p=0.2, inplace=False)
  (l5): BatchNorm1d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)

In [59]:

test_result , a= predict(bilstm_final,test_premise_vec,test_hypothesis_vec)

print(accuracy(test_result,dev_labels))


  soft = F.softmax(output[j])


0.4294642857142857


## 6. Conclusion (5 points)

To sum up, using method of simplified pseudo labels has the risk of overfitting. Though the loss declined quickly when training, and the accuracy on training set is high enough, the accuracy on test set is still unqualified. But this method actually improve the model's preformace (a little), and it may perform much better if I can try more hyperparameters or reduce the epoch_num.

When selecting more important data, I choose those with low confidence for they carry more unknown information to the trained model, i.e. these data are hard to learn from.

And as we can see the origin model performs better, though it's accuracy on training set is under 70%. However, I haven't got a good GPU to have more trials, and there's no enough time. Maybe you can train it on your device to get a much better performance.

## Reference

[1] OCNLI: Original Chinese Natural Language Inference, arxiv: https://arxiv.org/abs/2010.05444 