## Runtime Environment

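* python >= 3.7
* pytorch >= 1.0
* pytorch_transformers
* pandas
* nltk
* numpy
* scikit-learn (imported as `sklearn`)
* scipy
* matplotlib
* ipywidgets
* tqdm
* pickle, json (Python standard library)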

## Set random seed

In [None]:
import torch
import ipywidgets as widgets
torch.manual_seed(17)

# Take a look at the dataset

In [None]:
import pandas as pd
import numpy as np
import pickle

dataset = pd.read_csv('../data/task2_trainset.csv', dtype=str)
dataset.head()

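**Id**: serial number  
**Title**: paper title  
**Abstract**: paper abstract; sentences are separated by **$$$**  
**Authors**: paper authors  
**Categories**: paper categories  
**Created date**: date the paper was uploaded  
**Task 2**: classification labels of the paper; multiple labels are separated by a **space**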
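
As a small illustration of the two separators described above, the sketch below splits one row into sentences and labels. The row itself (including the `Id` value) is made up for the example and is not taken from the dataset; `split_abstract` and `split_labels` are hypothetical helper names.

```python
# Hypothetical row that mimics the column layout described above.
row = {
    "Id": "D00001",
    "Title": "A sample title",
    "Abstract": "First sentence.$$$Second sentence.$$$Third sentence.",
    "Task 2": "THEORETICAL ENGINEERING",
}

def split_abstract(abstract):
    """Split an abstract into sentences on the '$$$' separator."""
    return abstract.split("$$$")

def split_labels(labels):
    """Split a space-separated label string into a list of labels."""
    return labels.split()

sentences = split_abstract(row["Abstract"])
labels = split_labels(row["Task 2"])
print(sentences)
print(labels)
```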

# Data processing

## Remove redundant information  
The dataset keeps a lot of extra information for you to use, but this tutorial does not need all of it, so we first drop the unused columns.

In [None]:
dataset.drop('Categories',axis=1,inplace=True)
dataset.drop('Created Date',axis=1, inplace=True)
dataset.drop('Authors',axis=1,inplace=True)
dataset.drop('CiteEmbed',axis=1,inplace=True)

In [None]:
dataset.head()

## Partition
During training we need a way to check how well the model performs, so we split the training data into a training set and a validation set.
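
To make the idea of a reproducible hold-out split concrete, here is a minimal standard-library sketch. It is not scikit-learn's implementation of `train_test_split`; `simple_split` is a hypothetical helper, and the rounding rule for the validation size is an assumption chosen for the example.

```python
import random

def simple_split(items, test_size=0.1, seed=17):
    """Shuffle a copy of `items` with a fixed seed and hold out a fraction of it."""
    rng = random.Random(seed)          # a fixed seed makes the split repeatable
    shuffled = list(items)             # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_valid = int(round(len(shuffled) * test_size))
    return shuffled[n_valid:], shuffled[:n_valid]

rows = list(range(100))                # stand-in for the rows of the training CSV
train_rows, valid_rows = simple_split(rows)
print(len(train_rows), len(valid_rows))
```

Calling `simple_split(rows)` again with the same seed returns the same two lists, which is the property that `random_state=17` provides in the cell that follows.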

In [None]:
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

train_all = dataset
trainset, validset = train_test_split(dataset, test_size=0.1, random_state=17)

print (len(train_all))
print (len(trainset))
print (len(validset))

train_all.to_csv('trainallset.csv', index=False)
trainset.to_csv('trainset.csv', index=False)
validset.to_csv('validset.csv', index=False)

### For test data

In [None]:
dataset = pd.read_csv('../data/task2_public_testset.csv', dtype=str)
#dataset.drop('Title',axis=1,inplace=True)
dataset.drop('Categories',axis=1,inplace=True)
dataset.drop('Created Date',axis=1, inplace=True)
dataset.drop('Authors',axis=1,inplace=True)
dataset.drop('CiteEmbed',axis=1,inplace=True)
dataset.to_csv('testset.csv',index=False)
dataset.head()

### Data formatting  
After building the dictionary, the next step is to organize the data sample by sample: the input sentences are converted into numbers and the answers into one-hot form (a paper with several labels gets several 1s).  
Here we again use `multiprocessing` to speed this up.  
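
The label conversion can be sketched on its own as follows. The label order matches the `label_dict` used in this notebook; `to_multi_hot` and `from_multi_hot` are hypothetical helper names, and the inverse mapping is added only for illustration.

```python
LABELS = ["THEORETICAL", "ENGINEERING", "EMPIRICAL", "OTHERS"]

def to_multi_hot(label_string):
    """Turn a space-separated label string into a 0/1 vector over LABELS."""
    vector = [0] * len(LABELS)
    for name in label_string.split():
        vector[LABELS.index(name)] = 1
    return vector

def from_multi_hot(vector):
    """Inverse mapping: rebuild the space-separated label string."""
    return " ".join(name for name, flag in zip(LABELS, vector) if flag)

encoded = to_multi_hot("THEORETICAL EMPIRICAL")
decoded = from_multi_hot([0, 1, 0, 1])
print(encoded)
print(decoded)
```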

In [None]:
#For BERT
from tqdm import tqdm_notebook as tqdm
from multiprocessing import Pool

def label_to_onehot(labels):
    """ Convert label to onehot .
        Args:
            labels (string): sentence's labels.
        Return:
            outputs (onehot list): sentence's onehot label.
    """
    label_dict = {'THEORETICAL': 0, 'ENGINEERING':1, 'EMPIRICAL':2, 'OTHERS':3}
    onehot = [0,0,0,0]
    for l in labels.split():
        onehot[label_dict[l]] = 1
    return onehot
    
def get_dataset(data_path, n_workers=4):
    """ Load data and return dataset for training and validating.

    Args:
        data_path (str): Path to the data.
    """
    dataset = pd.read_csv(data_path, dtype=str)

    results = [None] * n_workers
    with Pool(processes=n_workers) as pool:
        for i in range(n_workers):
            batch_start = (len(dataset) // n_workers) * i
            if i == n_workers - 1:
                batch_end = len(dataset)
            else:
                batch_end = (len(dataset) // n_workers) * (i + 1)
            
            batch = dataset[batch_start: batch_end]
            results[i] = pool.apply_async(preprocess_samples, args=(batch,))

        pool.close()
        pool.join()

    processed = []
    for result in results:
        processed += result.get()
    return processed

def preprocess_samples(dataset):
    """ Worker function.

    Args:
        dataset (pandas.DataFrame): slice of rows to process.
    Returns:
        list of processed dict.
    """
    processed = []
    for sample in tqdm(dataset.iterrows(), total=len(dataset)):
        processed.append(preprocess_sample(sample[1]))

    return processed

def preprocess_sample(data):
    """
    Args:
        data (dict)
    Returns:
        dict
    """
    processed = {}
    processed['Abstract'] = [data['Title'] + "."] + [sent for sent in data['Abstract'].split('$$$')]
    #processed['Abstract'] = [sent for sent in data['Abstract'].split('$$$')]
    #print (processed['Abstract'])
    if 'Task 2' in data:
        processed['Label'] = label_to_onehot(data['Task 2'])
        
    return processed

In [None]:
print('[INFO] Start processing trainallset...')
trainall = get_dataset('trainallset.csv', n_workers=4)
print('[INFO] Start processing trainset...')
#train = get_dataset('trainset.csv', embedder, n_workers=4)
train = get_dataset('trainset.csv', n_workers=4)
print('[INFO] Start processing validset...')
#valid = get_dataset('validset.csv', embedder, n_workers=4)
valid = get_dataset('validset.csv', n_workers=4)
print('[INFO] Start processing testset...')
#test = get_dataset('testset.csv', embedder, n_workers=4)
test = get_dataset('testset.csv', n_workers=4)

## 資料封裝 (Data packing)

To make batch training easier, we use PyTorch's built-in [torch.utils.data.DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader).  
To feed data into a `DataLoader`, we subclass [torch.utils.data.Dataset](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) and write a class that fits this dataset.  
`collate_fn` post-processes each batch: after the `DataLoader` has gathered the selected samples into a list, it calls `collate_fn`. This is where variable-length sentences would be padded to the same length so that they fit into a torch tensor (a tensor must be rectangular).  
In the `BertDataset` class below, every sample is already a single fixed-size vector, so `collate_fn` only concatenates the vectors and gathers the labels.
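
The padding step mentioned above can be sketched in plain Python. This is only an illustration of the idea, not the `collate_fn` of this notebook: `pad_batch` is a hypothetical helper, the token ids are made up, and using `0` as the padding id is an assumption.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    # Build new lists so that the original samples are not modified.
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]

batch = [[101, 7, 8, 102], [101, 9, 102], [101, 102]]
padded = pad_batch(batch)
print(padded)
```

Once all rows have the same length, the nested list can be passed to a tensor constructor.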

In [None]:
from torch.utils.data import Dataset
import torch
from pytorch_transformers import *
from multiprocessing import Pool
from tqdm import tqdm_notebook as tqdm
from functools import reduce
import numpy as np
import itertools

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print (device)
#bert-large-cased-whole-word-masking-finetuned-squad 1024
#bert-base-uncased 768 
#bert-large-uncased 1024
#./scibert-scivocab-uncased/
tokenizer = BertTokenizer.from_pretrained('../scibert_scivocab_uncased/')
bert_model = BertModel.from_pretrained('../scibert_scivocab_uncased/')
bert_model.to(device)
bert_model.train(False)
bert_model.eval()

class BertDataset(Dataset):
    def __init__(self, data, max_len = 256, n_workers=1):        
        processed_data = []
        for d in tqdm(data, total=len(data)):
            processed_d = {'Abstract':[], 'Label':[]}
            token_sent = None
            for idx, sentence in enumerate(d['Abstract']):
                if idx==0:
                    token_sent = tokenizer.convert_tokens_to_ids(['[CLS]'] + tokenizer.tokenize(sentence) + ['[SEP]'])
                else: 
                    token_sent += tokenizer.convert_tokens_to_ids(tokenizer.tokenize(sentence) + ['[SEP]'])
            if (len(token_sent) > 511):
                token_sent = token_sent[:511] + tokenizer.convert_tokens_to_ids(['[SEP]'])
            #print (len(token_sent))
            encode_sentence_tensor = torch.tensor([token_sent])
            encode_sentence_tensor = encode_sentence_tensor.to(device)
            with torch.no_grad():
                out = bert_model(encode_sentence_tensor)[0]
                #print (out.shape)
                out = out[:,-1,:]
                #print (out.shape)
            #processed_d['Abstract'] = [list(itertools.chain(out.to('cpu').tolist()[0], DocEmbed_32[d['PaperId']], DocEmbed_64[d['PaperId']], DocEmbed_128[d['PaperId']]))]
            #processed_d['Abstract'] = [list(itertools.chain(out.to('cpu').tolist()[0], DocEmbed_64[d['PaperId']]))]
            #print(DocEmbed_title_32[d['PaperId']].tolist())
#             processed_d['Abstract'] = [list(itertools.chain(out.to('cpu').tolist()[0], DocEmbed_title_32[d['PaperId']].tolist(), CateEmbed_64[d['PaperId']].tolist()))]
            processed_d['Abstract'] = [list(itertools.chain(out.to('cpu').tolist()[0]))]
            #print(len(processed_d['Abstract'][0]))
            #processed_d['Abstract'] = [out.to('cpu').tolist()[0]]
            if 'Label' in d:
                processed_d['Label'] = d['Label']
            processed_data.append(processed_d)
        
        self.data = processed_data
        
        self.max_len = max_len
        
    
    def __len__(self):
        return len(self.data) # number of samples

    def __getitem__(self, index):
        return self.data[index]
     
    
    def collate_fn(self, datas):
        # Each sample already holds one fixed-size vector, so no padding is needed.
        # Build fresh lists: extending a sample's own list in place would corrupt self.data.
        batch_abstract = []
        batch_label = []
        for data in datas:
            batch_abstract.extend(data['Abstract'])
            # gather labels (test samples carry an empty label list)
            if 'Label' in data:
                batch_label.append(data['Label'])
        return torch.FloatTensor(batch_abstract), torch.FloatTensor(batch_label)

In [None]:
# trainallData = BertDataset(trainall)

# trainData = BertDataset(train)

# validData = BertDataset(valid)

testData = BertDataset(test)

# Model

With the data processed, the next step is the core part: the `Model`.  
The tutorial text describes a simple model with one RNN layer and one fully connected (Linear) layer; you can of course make it deeper.  
Because sentence lengths vary while a Linear layer requires a fixed input size, that design averages the hidden states of all tokens so that a single vector represents the sentence.  
Note that the `simpleNet` class below does not contain an RNN: it is a feed-forward network of Linear layers with ReLU, dropout and a skip connection, applied to the fixed-size vector produced by `BertDataset`.

In [None]:
import torch.nn as nn
import torch.nn.functional as F

class simpleNet(nn.Module):
    def __init__(self, vocabulary_size):
        super(simpleNet, self).__init__()
        self.hidden_dim1 = 512
        self.l0 = nn.Linear((vocabulary_size), (vocabulary_size))
        self.b0 = nn.Parameter(torch.zeros(vocabulary_size))
        self.l1 = nn.Linear((vocabulary_size), self.hidden_dim1)
        self.b1 = nn.Parameter(torch.zeros(self.hidden_dim1))
        self.relu1 = nn.ReLU()
        self.l2 = nn.Linear(self.hidden_dim1, 4)
        self.b2 = nn.Parameter(torch.zeros(4))
        self.dropout = nn.Dropout(0.6)
    
    def forward(self, x):
        b,e = x.shape
        x0 = self.relu1(self.l0(x)+self.b0)
        x0 = self.dropout(x0)
        x = self.relu1(self.l1(x+x0)+self.b1)
        x = self.dropout(x)
        x = torch.sigmoid(self.l2(x)+self.b2)
        return x

In [None]:
class Regularization(torch.nn.Module):
    def __init__(self,model,weight_decay,p=2):
        '''
        :param model: the model
        :param weight_decay: regularization coefficient
        :param p: order of the norm, 2 by default;
                  p=2 gives an L2 penalty, p=1 gives an L1 penalty
        '''
        super(Regularization, self).__init__()
        if weight_decay <= 0:
            print("param weight_decay can not <=0")
            exit(0)
        self.model=model
        self.weight_decay=weight_decay
        self.p=p
        self.weight_list=self.get_weight(model)
        self.weight_info(self.weight_list)
 
    def to(self,device):
        '''
        Move the module to the given device.
        :param device: cuda or cpu
        :return: self
        '''
        self.device=device
        super().to(device)
        return self
 
    def forward(self, model):
        self.weight_list=self.get_weight(model)  # fetch the latest weights
        reg_loss = self.regularization_loss(self.weight_list, self.weight_decay, p=self.p)
        return reg_loss
 
    def get_weight(self,model):
        '''
        Collect the model's weight tensors.
        :param model:
        :return: list of (name, parameter) pairs whose name contains 'weight'
        '''
        weight_list = []
        for name, param in model.named_parameters():
            if 'weight' in name:
                weight = (name, param)
                weight_list.append(weight)
        return weight_list
 
    def regularization_loss(self,weight_list, weight_decay, p=2):
        '''
        Compute the weighted sum of the p-norms of the weights.
        :param weight_list:
        :param p: order of the norm, 2 by default
        :param weight_decay:
        :return:
        '''
        # weight_decay=Variable(torch.FloatTensor([weight_decay]).to(self.device),requires_grad=True)
        # reg_loss=Variable(torch.FloatTensor([0.]).to(self.device),requires_grad=True)
        # weight_decay=torch.FloatTensor([weight_decay]).to(self.device)
        # reg_loss=torch.FloatTensor([0.]).to(self.device)
        reg_loss=0
        for name, w in weight_list:
            l2_reg = torch.norm(w, p=p)
            reg_loss = reg_loss + l2_reg
 
        reg_loss=weight_decay*reg_loss
        return reg_loss
 
    def weight_info(self,weight_list):
        '''
        Print the names of the regularized weights.
        :param weight_list:
        :return:
        '''
        print("---------------regularization weight---------------")
        for name ,w in weight_list:
            print(name)
        print("---------------------------------------------------")
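
The penalty computed by `regularization_loss` can be checked by hand with a few numbers. The sketch below mirrors its structure in plain Python, under the assumption that each weight tensor is flattened into a list; `p_norm` and `norm_penalty` are hypothetical names, and the weights are made up. Like the class above, it sums the (unsquared) p-norms, which differs from the squared L2 penalty often called weight decay.

```python
def p_norm(values, p=2):
    """p-norm of a flat list of numbers."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def norm_penalty(weight_groups, weight_decay, p=2):
    """weight_decay times the sum of the p-norms of each weight group."""
    return weight_decay * sum(p_norm(group, p) for group in weight_groups)

weights = [[3.0, 4.0], [1.0, -2.0, 2.0]]          # two made-up weight tensors
penalty_l2 = norm_penalty(weights, 0.015, p=2)    # 0.015 * (5 + 3)
penalty_l1 = norm_penalty(weights, 0.015, p=1)    # 0.015 * (7 + 5)
print(penalty_l2, penalty_l1)
```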


# Training

Specify the device to run on.

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print (device)

Define a scoring function so that we can quickly see how the model performs during training.  
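
Before reading the `F1` class, it may help to see the same scoring idea on a tiny hand-made example. The sketch below thresholds the first three scores, marks `OTHERS` only when none of them fires, and then computes a micro-averaged F1. The helper names `binarize` and `micro_f1` and all the numbers are made up for illustration.

```python
def binarize(probs, thresholds):
    """Threshold the first three scores; set OTHERS only if none of them fires."""
    first_three = [1 if p > t else 0 for p, t in zip(probs[:3], thresholds)]
    others = 1 if sum(first_three) == 0 else 0
    return first_three + [others]

def micro_f1(predictions, targets):
    """Micro-averaged F1 over all samples and labels."""
    n_pred = sum(sum(row) for row in predictions)
    n_true = sum(sum(row) for row in targets)
    n_correct = sum(p * t
                    for pred_row, true_row in zip(predictions, targets)
                    for p, t in zip(pred_row, true_row))
    precision = n_correct / (n_pred + 1e-20)
    recall = n_correct / (n_true + 1e-20)
    return 2 * precision * recall / (precision + recall + 1e-20)

probs = [[0.9, 0.2, 0.6, 0.1], [0.1, 0.3, 0.2, 0.8], [0.4, 0.7, 0.1, 0.0]]
targets = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 1, 1, 0]]
preds = [binarize(p, [0.5, 0.5, 0.5]) for p in probs]
score = micro_f1(preds, targets)
print(preds, round(score, 5))
```

Here 3 of the 4 predicted labels are correct and 3 of the 4 true labels are found, so precision, recall and F1 are all 0.75.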

In [None]:
import itertools
class F1():
    def __init__(self):
        self.threshold = torch.tensor([0.5,0.5,0.5])
        self.n_precision = 0
        self.n_recall = 0
        self.n_corrects = 0
        self.name = 'F1'
        
    def extend_label(self, x):
        idx = torch.where(torch.sum(x[:,:3],1) == 0)[0]
        y = torch.zeros((x.shape[0],4))
        y[:,:3] = x[:,:3]
        y[idx,3] = 1
        return y
        
    def search_best_f1(self, y_pred, y_true):
        pred = y_pred.clone().detach()
        value = 0
        for i in itertools.product(list(np.arange(350, 650, 10)), repeat = 3):
            y_pred = pred.clone().detach()
            #print (i)
            #print (y_pred)
            
            for j in range(len(i)):
                y_pred[:,j] = y_pred[:,j] > i[j]*0.001
            #print (y_pred)
            n_precision = torch.sum(y_pred)
            n_recall = torch.sum(y_true)
            n_corrects = torch.sum(y_pred*y_true)
            recall = n_corrects/n_recall
            precision = n_corrects/(n_precision + 1e-20)
            if(value < 2 * (recall * precision) / (recall + precision + 1e-20)):
                value = 2 * (recall * precision) / (recall + precision + 1e-20)
                best_weight = [int(x)*0.001 for x in list(i)]
                
        #print(best_weight, value)
        self.threshold = torch.tensor(best_weight)
    
    def get_threshold(self):
        return self.threshold
    
    def set_threshold(self, thres):
        self.threshold = thres

    def reset(self):
        self.n_precision = 0
        self.n_recall = 0
        self.n_corrects = 0

    def update(self, predicts, groundTruth):
        # Work on a detached copy so that the caller's tensor is not binarized in place.
        predicts = predicts.clone().detach()
        predicts[:,:3] = (predicts[:,:3] > self.threshold).float()
        predicts = self.extend_label(predicts)
        self.n_precision += torch.sum(predicts).data.item()
        self.n_recall += torch.sum(groundTruth).data.item()
        self.n_corrects += torch.sum(groundTruth.type(torch.uint8) * predicts.type(torch.uint8)).data.item()

    def get_score(self):
        recall = self.n_corrects / self.n_recall
        precision = self.n_corrects / (self.n_precision + 1e-20)
        return 2 * (recall * precision) / (recall + precision + 1e-20)

    def print_score(self):
        score = self.get_score()
        return '{:.5f}'.format(score)


In [None]:
import os

def ExtendLabel(x):
    idx = torch.where(torch.sum(x[:,:3],1) == 0)[0]
    y = torch.zeros((x.shape[0],4))
    y[:,:3] = x[:,:3]
    y[idx,3] = 1
    return y

def _run_epoch(epoch, training, thres=None):
    model.train(training)
    if training:
        description = 'Train'
        dataset = trainData
        shuffle = True
    else:
        description = 'Valid'
        dataset = validData
        shuffle = False
    dataloader = DataLoader(dataset=dataset,
                            batch_size=8,
                            shuffle=shuffle,
                            collate_fn=dataset.collate_fn,
                            num_workers=8)

    trange = tqdm(enumerate(dataloader), total=len(dataloader), desc=description)
    loss = 0
    f1_score = F1()
    labels_all = None
    ys = None
    for i, (x, y) in trange:
        #print (x.shape)
        o_labels, batch_loss = _run_iter(x,y)
        if training:
            opt.zero_grad()
            batch_loss.backward()
            opt.step()

        loss += batch_loss.item()
        o_labels = o_labels.detach().cpu()  # drop the autograd graph before accumulating outputs
        
        if (i == 0):
            labels_all=o_labels
            ys=y
        else:
            labels_all = torch.cat((labels_all, o_labels),dim=0)
            ys = torch.cat((ys, y),dim=0)
            
        f1_score.update(o_labels, y)

        trange.set_postfix(
            loss=loss / (i + 1), f1=f1_score.print_score())
    if training:        
        print ('w/o threshold, training F1:{0:.5f}, training Loss:{1:.5f}'.format(f1_score.get_score(),float(loss/ len(trange))))
        history['train'].append({'f1':f1_score.get_score(), 'loss':loss/ len(trange)})
        f1_score.reset()
        f1_score.search_best_f1(labels_all[:,:3], ys[:,:3])
        f1_score.update(labels_all, ys)
        print(f1_score.threshold)
        print ('w/  threshold, training F1:{0:.5f}, training Loss:{1:.5f}'.format(f1_score.get_score(),float(loss/ len(trange))))
        return f1_score.threshold
    else:        
        print ('w/o threshold, validating F1:{0:.5f}, validating Loss:{1:.5f}'.format(f1_score.get_score(),float(loss/ len(trange))))
        history['valid'].append({'f1':f1_score.get_score(), 'loss':loss/ len(trange)})
        f1_score.reset()
        f1_score.set_threshold(thres)
        f1_score.update(labels_all, ys)
        print ('w/  threshold, validating F1:{0:.5f}, validating Loss:{1:.5f}'.format(f1_score.get_score(),float(loss/ len(trange))))
        return f1_score.get_score()
        
def _run_iter(x,y):
    abstract = x.to(device)
    labels = y.to(device)
    o_labels = model(abstract)
    l_loss = criteria(o_labels, labels)
#     +0.1*F1_loss(o_labels, labels)
#     l_loss = F1_loss(o_labels, labels)
    
    if weight_decay > 0:
        # add the weighted norm penalty on the model weights
        return o_labels, l_loss + reg_loss(model)

    return o_labels, l_loss

def save(epoch):
    if not os.path.exists('model'):
        os.makedirs('model')
    torch.save(model.state_dict(), 'model/model.pkl.'+str(epoch))
    with open('model/history.json', 'w') as f:
        json.dump(history, f, indent=4)

In [None]:
def SubmitGenerator(prediction, sampleFile, public=True, filename='prediction.csv'):
    """
    Args:
        prediction (numpy array)
        sampleFile (str)
        public (boolean)
        filename (str)
    """
    sample = pd.read_csv(sampleFile)
    submit = {}
    submit['order_id'] = list(sample.order_id.values)
    redundant = len(sample) - prediction.shape[0]
    if public:
        submit['THEORETICAL'] = list(prediction[:,0]) + [0]*redundant
        submit['ENGINEERING'] = list(prediction[:,1]) + [0]*redundant
        submit['EMPIRICAL'] = list(prediction[:,2]) + [0]*redundant
        submit['OTHERS'] = list(prediction[:,3]) + [0]*redundant
    else:
        submit['THEORETICAL'] = [0]*redundant + list(prediction[:,0])
        submit['ENGINEERING'] = [0]*redundant + list(prediction[:,1])
        submit['EMPIRICAL'] = [0]*redundant + list(prediction[:,2])
        submit['OTHERS'] = [0]*redundant + list(prediction[:,3])
    df = pd.DataFrame.from_dict(submit) 
    df.to_csv(filename,index=False)

In [None]:
def F1_loss(probas, target, epsilon=1e-7):
    TP = (probas * target).sum(dim=1)
    precision = TP / (probas.sum(dim=1) + epsilon)
    recall = TP / (target.sum(dim=1) + epsilon)
    f1 = 2 * precision * recall / (precision + recall + epsilon)
    f1 = f1.clamp(min=epsilon, max=1-epsilon)
    return 1 - f1.mean()
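
To see how this soft F1 loss behaves, the sketch below applies the same formula to a single sample in plain Python, with lists in place of tensors. `soft_f1_loss` is a hypothetical name and the probabilities are made up for the example.

```python
def soft_f1_loss(probas, target, epsilon=1e-7):
    """Soft F1 loss for one sample, using probabilities instead of hard 0/1 predictions."""
    tp = sum(p * t for p, t in zip(probas, target))
    precision = tp / (sum(probas) + epsilon)
    recall = tp / (sum(target) + epsilon)
    f1 = 2 * precision * recall / (precision + recall + epsilon)
    f1 = min(max(f1, epsilon), 1 - epsilon)   # same clamping as in F1_loss
    return 1 - f1

target = [1, 0, 1, 0]
loss_perfect = soft_f1_loss([1.0, 0.0, 1.0, 0.0], target)
loss_partial = soft_f1_loss([0.8, 0.1, 0.6, 0.1], target)
loss_wrong = soft_f1_loss([0.0, 1.0, 0.0, 1.0], target)
print(loss_perfect, loss_partial, loss_wrong)
```

The loss shrinks as probability mass moves onto the true labels, which is what makes it usable as a differentiable stand-in for F1.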

In [None]:
from torch.utils.data import DataLoader
from tqdm import trange
import json
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold

kf = KFold(n_splits=10, shuffle=True, random_state=50)
for train_index, valid_index in kf.split(trainall):
    model = simpleNet(768)
    opt = torch.optim.Adam(model.parameters(), lr=9e-6)
    criteria = torch.nn.BCELoss()

    weight_decay=0.015 # regularization coefficient
    model.to(device)

    # initialize the regularization term
    if weight_decay>0:
        reg_loss=Regularization(model, weight_decay, p=2).to(device)
    else:
        print("no regularization")
    
    train = [trainall[i] for i in train_index]
    valid = [trainall[i] for i in valid_index]
    trainData = BertDataset(train)
    validData = BertDataset(valid)
    
    max_epoch = 120
    history = {'train':[],'valid':[]}

    thres_all = []

    #print (embedder.get_vocabulary_size())
    #print (embedder.get_dim())
    #embedding = nn.Embedding(embedder.get_vocabulary_size(),embedder.get_dim())
    #embedding.weight = torch.nn.Parameter(embedder.vectors)
    
    for epoch in range(max_epoch):
        print('Epoch: {}'.format(epoch))
        thres = _run_epoch(epoch, True)
        _run_epoch(epoch, False, thres=thres)
        thres_all.append(thres)
        save(epoch)

    %matplotlib inline

    with open('model/history.json', 'r') as f:
        history = json.loads(f.read())

    train_loss = [l['loss'] for l in history['train']]
    valid_loss = [l['loss'] for l in history['valid']]
    train_f1 = [l['f1'] for l in history['train']]
    valid_f1 = [l['f1'] for l in history['valid']]

    plt.figure(figsize=(7,5))
    plt.title('Loss')
    plt.plot(train_loss, label='train')
    plt.plot(valid_loss, label='valid')
    plt.legend()
    plt.show()

    plt.figure(figsize=(7,5))
    plt.title('F1 Score')
    plt.plot(train_f1, label='train')
    plt.plot(valid_f1, label='valid')
    plt.legend()
    plt.show()

    print('Best F1 score ', max([[l['f1'], idx] for idx, l in enumerate(history['valid'])]))
    model_number = max([[l['f1'], idx] for idx, l in enumerate(history['valid'])])[1]

    model.load_state_dict(torch.load('model/model.pkl.{}'.format(model_number)))

    valid_value = _run_epoch(epoch, False, thres=thres_all[model_number])

    print (thres_all[model_number])
    model.train(False)
    dataloader = DataLoader(dataset=testData,
                                batch_size=64,
                                shuffle=False,
                                collate_fn=testData.collate_fn,
                                num_workers=4)
    trange = tqdm(enumerate(dataloader), total=len(dataloader), desc='Predict')
    prediction = []
    result = []
    for i, (x,y) in trange:
        o_labels = model(x.to(device))
        o_labels = o_labels.to('cpu')
        result.append(o_labels.clone())
        #print (o_labels)
        o_labels[:,:3] = o_labels[:,:3] > thres_all[model_number]
        o_labels = ExtendLabel(o_labels)
        prediction.append(o_labels)

    prediction = torch.cat(prediction).detach().numpy().astype(int)
    result = torch.cat(result).detach().numpy()
    print (result)
    import scipy.io
    from datetime import datetime
    now = datetime.now()
    time = now.strftime("%D_%H_%M_%S").replace('/','_')
    # create the output directories if they do not exist yet
    for out_dir in ('Results_Training', 'well-trained-model', '../upload'):
        os.makedirs(out_dir, exist_ok=True)
    scipy.io.savemat('Results_Training/{0}_{1}.mat'.format(valid_value, time), mdict={'result': result, 'prediction': prediction, 'best_weight': thres_all[model_number].tolist()})
#     torch.save(model.state_dict(), 'well-trained model/{0}_{1}.pkl'.format(valid_value, time))
    torch.save(model, 'well-trained-model/{0}_{1}.pkl'.format(valid_value, time))
    
    SubmitGenerator(prediction, 
                    '../data/task2_sample_submission.csv',
                    True, 
                    '../upload/{0}_{1}_task2_submission.csv'.format(valid_value, time))