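# <center>**Multi-class Medical Text Classification in Practice**</center>
### <center>**Coursework for the "Computational Healthcare" Course**</center>
<center>3200100259 沈骏一, College of Control Science and Engineering</center>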

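First, we import the libraries the program needs; any that are not yet installed can be installed with pip.  
The method used in this experiment is RoBERTa, an enhanced version of BERT (a bidirectional Transformer encoder), which is available through the `transformers` package.  
The work mainly consists of calling library functions and tuning parameters to fit the multi-class medical text classification task.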

In [12]:
import pandas as pd 
import numpy as np 
import json, time 
from tqdm import tqdm 
from sklearn.metrics import accuracy_score, classification_report
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from transformers import BertModel, BertConfig, BertTokenizer, AdamW, get_cosine_schedule_with_warmup
import warnings
warnings.filterwarnings('ignore')

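Next we download the pre-trained model together with its matching Chinese vocabulary,  
and load the pre-trained model into the variable `model`. The `hfl/chinese-roberta-wwm-ext-large` checkpoint is loaded here with `BertTokenizer` and `BertModel`.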

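A brief introduction:  
RoBERTa (A Robustly Optimized BERT Pretraining Approach) is an improved version of BERT.  
Compared with BERT, it makes the following changes in training scale, compute, and data:
1. More compute: judging by the training time reported in the paper, the model was trained on 1024 V100 GPUs for about one day.
2. Larger batch size: RoBERTa was trained with larger batches, with sizes from 256 up to 8,000 tried.
3. More training data: 160GB of plain text, including CC-NEWS, whereas the original BERT was trained on 16GB of text from BookCorpus and English Wikipedia.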
 
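RoBERTa also changes the training method:
1. The next sentence prediction (NSP) task is removed.
2. Dynamic masking. BERT relies on randomly masking tokens and predicting them. The original BERT implementation performs masking once during data preprocessing, which yields a static mask. RoBERTa instead uses dynamic masking: a new masking pattern is generated every time a sequence is fed to the model. As large amounts of data pass through, the model sees different masks for the same text and learns richer language representations.  
3. Text encoding. Byte-Pair Encoding (BPE) is a hybrid between character-level and word-level representations that can handle the large vocabularies common in natural-language corpora. The original BERT implementation uses a character-level BPE vocabulary of size 30K, learned after preprocessing the input with heuristic tokenization rules. The Facebook researchers instead trained with a larger byte-level BPE vocabulary containing 50K subword units, without any additional preprocessing or tokenization of the input.  
4. Overall, RoBERTa builds on BERT's language-masking strategy and modifies key hyperparameters: it removes the next-sentence pretraining objective and trains with larger batch sizes and learning rates. RoBERTa is also trained on an order of magnitude more data than BERT, for a longer time, which lets its representations generalize better to downstream tasks.    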
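
The difference between static and dynamic masking can be illustrated with a small sketch. It is a simplification under stated assumptions: the example sentence and the helper `mask_tokens` are hypothetical, each token is masked independently with probability 0.15, and the 80/10/10 replacement rule of real BERT pre-training is left out.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob, rng):
    """Replace each token with [MASK] independently with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]

# Hypothetical example sentence, one token per character
tokens = list("患者咳嗽发热三天伴头痛")

# Static masking: the mask is drawn once during preprocessing and reused every epoch.
static_view = mask_tokens(tokens, 0.15, random.Random(0))
static_epochs = [static_view for _ in range(3)]

# Dynamic masking: a fresh mask is drawn every time the sequence is fed to the model.
rng = random.Random(0)
dynamic_epochs = [mask_tokens(tokens, 0.15, rng) for _ in range(3)]

for epoch, (s, d) in enumerate(zip(static_epochs, dynamic_epochs), start=1):
    print(epoch, "static :", "".join(s))
    print(epoch, "dynamic:", "".join(d))
```

With static masking every epoch sees exactly the same masked positions, while dynamic masking generally produces a different pattern on each pass.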

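After long training, the model in the RoBERTa paper scored 88.5 on the GLUE leaderboard, on par with the 88.4 reported by Yang et al. (2019). It reached state-of-the-art results on 4 of the 9 GLUE tasks: MNLI, QNLI, RTE, and STS-B. RoBERTa also achieved the top scores on the SQuAD and RACE leaderboards.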

In [13]:
model_name = 'hfl/chinese-roberta-wwm-ext-large'
#model_name = 'hfl/chinese-roberta-wwm-ext'     # alternative models
#model_name = 'hfl/chinese-bert-wwm-ext'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name)

Some weights of the model checkpoint at hfl/chinese-roberta-wwm-ext-large were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


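Next, we load the training and test data.  
Note that the data has already been preprocessed in 'textprocession.ipynb' into the uniform format 'class ID + \t + feature description', which simplifies the following steps.  
The code below splits 'train.txt' into a validation set (the first 20,000 lines) and a training set (the remaining lines), roughly a 2:8 ratio, and stores them in the corresponding NumPy arrays.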

In [14]:
input_ids, input_masks, input_types, labels = [], [], [] ,[] # token ids, attention masks, segment (token type) ids, labels
valid_input_ids, valid_input_masks, valid_input_types, valid_labels = [], [], [] ,[] # same four fields for the validation set
train_input_ids, train_input_masks, train_input_types, train_labels = [], [], [] ,[] # same four fields for the training set
maxlen = 100      # a length of 30 already covers 99% of the samples
divide = 20000    # the first 20000 lines form the validation set
with open("train.txt", encoding='utf-8') as f:
    for i, line in tqdm(enumerate(f)): 
        target, context = line.strip().split('\t')
        # encode_plus returns a dict with the encodings under 'input_ids', 'token_type_ids', 'attention_mask'
        # Depending on the arguments, short inputs are padded and long ones truncated
        encode_dict = tokenizer.encode_plus(text=context, max_length = maxlen, 
                                            padding='max_length', truncation=True)
        input_ids.append(encode_dict['input_ids'])
        input_types.append(encode_dict['token_type_ids'])
        input_masks.append(encode_dict['attention_mask'])

        labels.append(int(target))

valid_input_ids, valid_input_types, valid_input_masks = np.array(input_ids[:divide]), np.array(input_types[:divide]), np.array(input_masks[:divide])
valid_labels = np.array(labels[:divide])
train_input_ids, train_input_types, train_input_masks = np.array(input_ids[divide:]), np.array(input_types[divide:]), np.array(input_masks[divide:])
train_labels = np.array(labels[divide:])

print(valid_input_ids.shape, valid_input_types.shape, valid_input_masks.shape, valid_labels.shape)
print(train_input_ids.shape, train_input_types.shape, train_input_masks.shape, train_labels.shape)

106330it [00:49, 2149.69it/s]


(20000, 100) (20000, 100) (20000, 100) (20000,)
(86330, 100) (86330, 100) (86330, 100) (86330,)
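
To make the effect of `padding='max_length'` with `truncation=True` concrete, here is a minimal sketch of the same idea for a single sentence. It does not call the tokenizer: the helper `encode_fixed_length` is hypothetical, and the special-token ids (101 for `[CLS]`, 102 for `[SEP]`, 0 for `[PAD]`) are assumptions that are typical for BERT vocabularies.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # assumed special-token ids

def encode_fixed_length(token_ids, max_length):
    """Mimic padding='max_length' with truncation=True for one sentence."""
    body = token_ids[: max_length - 2]       # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)    # 1 marks real tokens
    pad = max_length - len(input_ids)
    input_ids += [PAD_ID] * pad
    attention_mask += [0] * pad              # 0 marks padding
    token_type_ids = [0] * max_length        # single segment, so all zeros
    return {"input_ids": input_ids,
            "token_type_ids": token_type_ids,
            "attention_mask": attention_mask}

short = encode_fixed_length([2001, 2002, 2003], max_length=8)
long = encode_fixed_length(list(range(3000, 3020)), max_length=8)

print(short["input_ids"])       # [101, 2001, 2002, 2003, 102, 0, 0, 0]
print(short["attention_mask"])  # [1, 1, 1, 1, 1, 0, 0, 0]
print(long["input_ids"])        # [101, 3000, 3001, 3002, 3003, 3004, 3005, 102]
```

Every sample ends up with exactly `max_length` entries, which is why all arrays above have a second dimension of 100.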


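The output above shows the processed dataset, converted to NumPy arrays: token ids, token type ids, attention masks, and labels.  
Each description is padded or truncated to `maxlen` = 100 tokens; `maxlen` is one of the parameters that can be tuned.  
There are 86,330 training samples and 20,000 validation samples.  
We process the test set in the same way.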

In [15]:
test_input_ids, test_input_masks, test_input_types, test_labels = [], [], [] ,[] # token ids, attention masks, segment (token type) ids, labels
maxlen = 100      # a length of 30 already covers 99% of the samples
with open("test.txt", encoding='utf-8') as f:
    for i, line in tqdm(enumerate(f)): 
        target, context = line.strip().split('\t')
        # encode_plus returns a dict with the encodings under 'input_ids', 'token_type_ids', 'attention_mask'
        # Depending on the arguments, short inputs are padded and long ones truncated
        encode_dict = tokenizer.encode_plus(text=context, max_length = maxlen, 
                                            padding='max_length', truncation=True)
        test_input_ids.append(encode_dict['input_ids'])
        test_input_types.append(encode_dict['token_type_ids'])
        test_input_masks.append(encode_dict['attention_mask'])
        test_labels.append(int(target))

test_input_ids, test_input_types, test_input_masks = np.array(test_input_ids), np.array(test_input_types), np.array(test_input_masks)
test_labels = np.array(test_labels)
print(test_input_ids.shape, test_input_types.shape, test_input_masks.shape, test_labels.shape)

26583it [00:12, 2053.44it/s]


(26583, 100) (26583, 100) (26583, 100) (26583,)
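
The comment next to `maxlen` says that a length of 30 already covers 99% of the samples. One way to check a figure like this is sketched below. The token counts are made up for illustration (real values would come from tokenizing the dataset), and the helpers `coverage` and `smallest_maxlen` are hypothetical names.

```python
def coverage(lengths, maxlen):
    """Fraction of samples that fit within maxlen without truncation."""
    return sum(1 for n in lengths if n <= maxlen) / len(lengths)

def smallest_maxlen(lengths, target):
    """Smallest observed length that covers at least `target` of the samples."""
    for candidate in sorted(set(lengths)):
        if coverage(lengths, candidate) >= target:
            return candidate

# Made-up token counts per sample, including [CLS] and [SEP]
lengths = [8, 10, 12, 12, 14, 15, 18, 20, 25, 60]

print(coverage(lengths, 30))          # 0.9
print(smallest_maxlen(lengths, 0.9))  # 25
```

A shorter `maxlen` that still covers nearly all samples saves training time, since the cost of each step grows with sequence length.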


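We set the batch size to 64. The experiment was run on a rented GPU on the AutoDL platform, which has enough memory for a batch of this size.  
The NumPy arrays are converted to tensors so they can be fed to the model.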

In [16]:
BATCH_SIZE = 64  # reduce this if you run out of memory (OOM)
# Training set
train_data = TensorDataset(torch.LongTensor(train_input_ids), 
                           torch.LongTensor(train_input_masks), 
                           torch.LongTensor(train_input_types), 
                           torch.LongTensor(train_labels))
train_sampler = RandomSampler(train_data)  
train_loader = DataLoader(train_data, sampler=train_sampler, batch_size=BATCH_SIZE)

# Validation set
valid_data = TensorDataset(torch.LongTensor(valid_input_ids), 
                          torch.LongTensor(valid_input_masks),
                          torch.LongTensor(valid_input_types), 
                          torch.LongTensor(valid_labels))
valid_sampler = SequentialSampler(valid_data)
valid_loader = DataLoader(valid_data, sampler=valid_sampler, batch_size=BATCH_SIZE)

# Test set (labels are kept separately in test_labels)
test_data = TensorDataset(torch.LongTensor(test_input_ids), 
                          torch.LongTensor(test_input_masks),
                          torch.LongTensor(test_input_types))
test_sampler = SequentialSampler(test_data)
test_loader = DataLoader(test_data, sampler=test_sampler, batch_size=BATCH_SIZE)
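
As a quick sanity check, the number of batches each loader yields follows directly from the sample counts and the batch size. The sketch below is plain arithmetic and assumes the `DataLoader` default of keeping the final partial batch.

```python
import math

BATCH_SIZE = 64
split_sizes = {"train": 86330, "valid": 20000, "test": 26583}  # sample counts printed above

batches = {}
for name, n in split_sizes.items():
    n_batches = math.ceil(n / BATCH_SIZE)           # the last partial batch is kept
    last_batch = n - (n_batches - 1) * BATCH_SIZE   # samples in that final batch
    batches[name] = (n_batches, last_batch)
    print(f"{name}: {n_batches} batches, last batch holds {last_batch} samples")
```

The 1349 training batches match the step counter in the training log, and the 416 test batches match the progress bar of the test run.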

Define the model: a BERT encoder whose pooled output is passed through a fully connected layer to produce the class scores.

In [17]:
# Define the model
class Bert_Model(nn.Module):
    def __init__(self, classes=250):   # 250 target classes, matching the original output layer
        super(Bert_Model, self).__init__()
        self.model_name = 'hfl/chinese-roberta-wwm-ext-large'
        #self.model_name = 'hfl/chinese-roberta-wwm-ext'            # alternative models
        #self.model_name = 'hfl/chinese-bert-wwm'
        self.model = BertModel.from_pretrained(self.model_name)
        self.tokenizer = BertTokenizer.from_pretrained(self.model_name)
        # Fully connected output layer; hidden_size is 1024 for the large model
        self.fc = nn.Linear(self.model.config.hidden_size, classes)

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        outputs = self.model(input_ids, attention_mask, token_type_ids)
        out_pool = outputs[1]   # pooled output [bs, config.hidden_size]
        logit = self.fc(out_pool)   #  [bs, classes]
        return logit

Once the model is built, we count all of its parameters.  
Training runs on the GPU when one is available, for 5 epochs in total.

In [18]:
def get_parameter_number(model):
    # Count the model's total and trainable parameters
    total_num = sum(p.numel() for p in model.parameters())
    trainable_num = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return 'Total parameters: {}, Trainable parameters: {}'.format(total_num, trainable_num)

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
#DEVICE = 'cpu'         # train on the CPU
EPOCHS = 5          # number of training epochs
model = Bert_Model().to(DEVICE)
print(get_parameter_number(model))

Some weights of the model checkpoint at hfl/chinese-roberta-wwm-ext-large were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Total parameters: 325778682, Trainable parameters: 325778682
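
The reported total can be reproduced by hand. The sketch below assumes the usual BERT-large layout (24 layers, hidden size 1024, feed-forward size 4096, 512 positions, 2 segment types) and a vocabulary of 21128 entries. These configuration values are assumptions, as the notebook does not print them, but they add up to exactly the number shown above.

```python
# Assumed configuration of the large checkpoint (not printed in the notebook)
vocab_size, hidden, layers, intermediate = 21128, 1024, 24, 4096
max_positions, type_vocab, classes = 512, 2, 250

def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out

layer_norm = 2 * hidden   # scale and shift vectors

embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
attention = 4 * linear(hidden, hidden) + layer_norm   # Q, K, V and output projections
feed_forward = linear(hidden, intermediate) + linear(intermediate, hidden) + layer_norm
encoder = layers * (attention + feed_forward)
pooler = linear(hidden, hidden)
head = linear(hidden, classes)                        # the nn.Linear(1024, 250) output layer

total = embeddings + encoder + pooler + head
print(head)    # 256250
print(total)   # 325778682
```

The classification head accounts for only about 0.08% of the parameters; almost all of the capacity sits in the pre-trained encoder.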


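We use AdamW as the optimizer (other optimizers would work as well).  
Here we set the learning rate `lr` and a weight decay to guard against overfitting.  
A warmup phase raises the learning rate gradually, which helps the model converge.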

In [19]:
optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=1e-4) # AdamW optimizer
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=len(train_loader),
                                            num_training_steps=EPOCHS*len(train_loader))
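# The learning rate warms up linearly for one epoch, then decays along a cosine curve.
# Warmup raises the learning rate slowly from 0; without it, training may fail to converge.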
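
The shape of this schedule can be written out by hand. The following sketch assumes the default half-cosine setting of `get_cosine_schedule_with_warmup` and reimplements the multiplier in plain Python; `lr_multiplier` is a hypothetical helper, not the library function.

```python
import math

def lr_multiplier(step, warmup_steps, total_steps):
    """Linear warmup from 0 to 1, then half-cosine decay from 1 to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

BASE_LR = 2e-5
steps_per_epoch = 1349             # len(train_loader) in this run
total_steps = 5 * steps_per_epoch  # EPOCHS * len(train_loader)

for step in (0, 675, 1349, 4047, 6745):
    print(step, BASE_LR * lr_multiplier(step, steps_per_epoch, total_steps))
```

The learning rate reaches its peak of 2e-5 at the end of the first epoch, is halved midway through the remaining four epochs, and returns to zero at the last step.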


Define the training, validation, and test functions.  
Accuracy is measured mainly with the `accuracy_score` function from sklearn.

In [20]:
# Evaluate model accuracy on the validation set
def evaluate(model, data_loader, device):
    model.eval()
    val_true, val_pred = [], []
    with torch.no_grad():
        for idx, (ids, att, tpe, y) in (enumerate(data_loader)):
            y_pred = model(ids.to(device), att.to(device), tpe.to(device))
            y_pred = torch.argmax(y_pred, dim=1).detach().cpu().numpy().tolist()
            val_pred.extend(y_pred)
            val_true.extend(y.cpu().numpy().tolist())  # y is already 1-D; squeeze() would fail on a batch of size 1
    
    return accuracy_score(val_true, val_pred)  # return the accuracy


# Return the five highest-ranked classes for every test sample
def predict(model, data_loader, device):
    model.eval()
    val_pred1,val_pred2,val_pred3,val_pred4,val_pred5 = [],[],[],[],[]
    with torch.no_grad():
        for idx, (ids, att, tpe) in tqdm(enumerate(data_loader)):
            y_pred = model(ids.to(device), att.to(device), tpe.to(device))
            #y_pred = torch.argmax(y_pred, dim=1).detach().cpu().numpy().tolist()
            y_pred = torch.argsort(y_pred,dim=1,descending=True).detach().cpu().numpy()
            y_pred1=y_pred[:,0].tolist()
            val_pred1.extend(y_pred1)
            y_pred2=y_pred[:,1].tolist()
            val_pred2.extend(y_pred2)
            y_pred3=y_pred[:,2].tolist()
            val_pred3.extend(y_pred3)
            y_pred4=y_pred[:,3].tolist()
            val_pred4.extend(y_pred4)
            y_pred5=y_pred[:,4].tolist()
            val_pred5.extend(y_pred5)
    return val_pred1,val_pred2,val_pred3,val_pred4,val_pred5


def train_and_eval(model, train_loader, valid_loader, 
                   optimizer, scheduler, device, epoch):
    best_acc = 0.0
    patience = 0
    criterion = nn.CrossEntropyLoss()
    for i in range(epoch):
        """Train the model"""
        start = time.time()
        model.train()
        print("***** Running training epoch {} *****".format(i+1))
        train_loss_sum = 0.0
        for idx, (ids, att, tpe, y) in enumerate(train_loader):
            ids, att, tpe, y = ids.to(device), att.to(device), tpe.to(device), y.to(device)  
            y_pred = model(ids, att, tpe)
            loss = criterion(y_pred, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()   # update the learning rate
            
            train_loss_sum += loss.item()
            if (idx + 1) % (len(train_loader)//5) == 0:    # print only five times per epoch
            #if (idx + 1) % 5 == 0:    # print every 5 steps
                print("Epoch {:04d} | Step {:04d}/{:04d} | Loss {:.4f} | Time {:.4f}".format(
                          i+1, idx+1, len(train_loader), train_loss_sum/(idx+1), time.time() - start))
                # print("Learning rate = {}".format(optimizer.state_dict()['param_groups'][0]['lr']))

        """Validate the model"""
        model.eval()
        acc = evaluate(model, valid_loader, device)  # accuracy on the validation set
        ## Save the best model
        if acc > best_acc:
            best_acc = acc
            torch.save(model.state_dict(), "best_bert_model.pth") 
        
        print("current acc is {:.4f}, best acc is {:.4f}".format(acc, best_acc))
        print("time costed = {}s \n".format(round(time.time() - start, 5)))

With the functions in place, we train and evaluate the model; the number of epochs is set by the `EPOCHS` parameter.

In [21]:
#model.load_state_dict(torch.load("best_bert_model.pth"))
train_and_eval(model, train_loader, valid_loader, optimizer, scheduler, DEVICE, EPOCHS)

***** Running training epoch 1 *****
Epoch 0001 | Step 0269/1349 | Loss 5.0004 | Time 129.7180
Epoch 0001 | Step 0538/1349 | Loss 4.0901 | Time 259.8747
Epoch 0001 | Step 0807/1349 | Loss 3.5005 | Time 389.2240
Epoch 0001 | Step 1076/1349 | Loss 3.0970 | Time 518.6850
Epoch 0001 | Step 1345/1349 | Loss 2.8009 | Time 648.0758
current acc is 0.6015, best acc is 0.6015
time costed = 701.29101s 

***** Running training epoch 2 *****
Epoch 0002 | Step 0269/1349 | Loss 1.4106 | Time 129.4957
Epoch 0002 | Step 0538/1349 | Loss 1.3596 | Time 259.0841
Epoch 0002 | Step 0807/1349 | Loss 1.3147 | Time 388.5630
Epoch 0002 | Step 1076/1349 | Loss 1.2764 | Time 518.1023
Epoch 0002 | Step 1345/1349 | Loss 1.2478 | Time 647.6062
current acc is 0.6625, best acc is 0.6625
time costed = 700.66823s 

***** Running training epoch 3 *****
Epoch 0003 | Step 0269/1349 | Loss 0.9916 | Time 129.5571
Epoch 0003 | Step 0538/1349 | Loss 0.9705 | Time 259.0066
Epoch 0003 | Step 0807/1349 | Loss 0.9636 | Time 388.42
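
The loss values in this log can be put in context with a short worked example. A classifier that spreads its probability evenly over 250 classes has a cross-entropy of ln(250), about 5.52, so the first-epoch running average of roughly 5.0 after 269 steps is only slightly better than chance. The sketch below computes the cross-entropy of a single sample in plain Python; the logits are made up for illustration.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log softmax(logits)[target]."""
    peak = max(logits)   # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]

NUM_CLASSES = 250

# Untrained model: identical logits for every class
uniform_loss = cross_entropy([0.0] * NUM_CLASSES, target=3)
print(round(uniform_loss, 4))   # 5.5215, i.e. ln(250)

# Confident and correct prediction: one large logit on the true class
confident = [0.0] * NUM_CLASSES
confident[3] = 8.0
confident_loss = cross_entropy(confident, target=3)
print(confident_loss < 0.1)     # True
```

A loss near 1.0, as in the later epochs, therefore means the model assigns on average a substantial share of its probability to the correct class.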

Load the best model and evaluate it on the data in 'test.txt'.

In [22]:
# Load the best weights and evaluate on the test set
model.load_state_dict(torch.load("best_bert_model.pth"))
pred1_test,pred2_test,pred3_test,pred4_test,pred5_test = predict(model, test_loader, DEVICE)
print("\n Test Accuracy = {} \n".format(accuracy_score(test_labels, pred1_test)))

top3 = (accuracy_score(test_labels, pred1_test,normalize=False)+accuracy_score(test_labels, pred2_test,normalize=False)+accuracy_score(test_labels, pred3_test,normalize=False))/len(test_labels)
print("\n Top3 Accuracy = {} \n".format(top3))
top5 = (accuracy_score(test_labels, pred1_test,normalize=False)+accuracy_score(test_labels, pred2_test,normalize=False)+accuracy_score(test_labels, pred3_test,normalize=False)+accuracy_score(test_labels, pred4_test,normalize=False)+accuracy_score(test_labels, pred5_test,normalize=False))/len(test_labels)
print("\n Top5 Accuracy = {} \n".format(top5))
print(classification_report(test_labels, pred1_test, digits=4))

416it [01:06,  6.25it/s]



 Test Accuracy = 0.6968363239664447 


 Top3 Accuracy = 0.9109205131098822 


 Top5 Accuracy = 0.9561373810329911 

              precision    recall  f1-score   support

           0     0.6167    0.9250    0.7400        40
           1     1.0000    0.0500    0.0952        20
           2     0.5316    0.5600    0.5455        75
           3     0.4158    0.4158    0.4158       101
           4     0.6452    0.6818    0.6630        88
           5     0.5185    0.4444    0.4786        63
           6     0.0000    0.0000    0.0000        47
           7     0.6184    0.8294    0.7085       170
           8     0.7705    0.9038    0.8319        52
           9     0.5414    0.5000    0.5199       144
          10     1.0000    0.0833    0.1538        24
          11     0.8222    0.8605    0.8409        43
          12     0.5455    0.7083    0.6163       144
          13     0.8286    0.7632    0.7945        38
          14     0.2921    0.3662    0.3250        71
          15     0
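
The top-3 and top-5 figures above are obtained by adding the number of hits at each rank. This works because the ranks produced by `argsort` never repeat a class, so a label can match at most one rank per sample. The sketch below checks this equivalence on made-up rankings; both helper functions are hypothetical names.

```python
def top_k_accuracy(ranked_predictions, labels, k):
    """ranked_predictions[i] lists class ids for sample i, best guess first."""
    hits = sum(1 for ranked, y in zip(ranked_predictions, labels) if y in ranked[:k])
    return hits / len(labels)

def summed_rank_accuracy(ranked_predictions, labels, k):
    """Same quantity computed by adding the hits of each individual rank."""
    hits = 0
    for rank in range(k):
        hits += sum(1 for ranked, y in zip(ranked_predictions, labels) if ranked[rank] == y)
    return hits / len(labels)

# Made-up rankings over 5 classes for 4 samples
ranked = [[2, 0, 1, 4, 3], [1, 3, 0, 2, 4], [4, 2, 3, 1, 0], [0, 1, 2, 3, 4]]
labels = [2, 0, 1, 3]

for k in (1, 3, 5):
    print(k, top_k_accuracy(ranked, labels, k), summed_rank_accuracy(ranked, labels, k))
```

Both functions return 0.25, 0.5, and 1.0 for k = 1, 3, and 5 on this toy data.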

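**Some additional metrics:**  
(I tried these, but the code did not run as expected; the scikit-learn functions below default to binary classification, whereas this task has 250 classes.)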

In [None]:
'''
# Disabled: the score calls need average=... for multi-class labels, and the curve
# functions expect binary targets with predicted scores rather than class ids.
from sklearn.metrics import precision_score
# Pass the true and the predicted labels
result_precision = precision_score(test_labels, pred1_test, average='macro')
# Recall: how many of the true samples were found
from sklearn.metrics import recall_score
result_recall = recall_score(test_labels, pred1_test, average='macro')
# F1-score: combined score
from sklearn.metrics import f1_score
result_f1 = f1_score(test_labels, pred1_test, average='macro')
# Precision-recall curve (binary targets and scores only)
from sklearn.metrics import precision_recall_curve
precisions,recalls,thresholds = precision_recall_curve(test_labels,pred1_test)
 
# Average precision
from sklearn.metrics import average_precision_score
# Arguments: y_true (true labels), y_score (predicted probabilities)
precisions_average = average_precision_score(test_labels,pred1_test)
import matplotlib.pyplot as plt
# Plot with recall on the x-axis and precision on the y-axis
fig,axes = plt.subplots(1,2,figsize=(10,5)) # figure with 1 row and 2 columns
# Draw on the first axes
axes[0].plot(recalls,precisions) # x: recall, y: precision
axes[0].set_title(f'Average precision: {round(precisions_average,2)}')
axes[0].set_xlabel('Recall')
axes[0].set_ylabel('Precision')
# ROC curve (binary targets and scores only)
from sklearn.metrics import roc_curve
# Arguments: y_true (true labels), y_score (predicted probabilities)
# Returns: FPR, TPR, thresholds
fpr,tpr,thresholds = roc_curve(test_labels, pred1_test)
# Compute the AUC
from sklearn.metrics import auc
# Arguments: fpr, tpr
AUC = auc(fpr,tpr) 
 
# Plot
axes[1].plot(fpr,tpr) # x: FPR, y: TPR
axes[1].set_title(f'AUC = {round(AUC,2)}')
axes[1].set_xlabel('FPR')
axes[1].set_ylabel('TPR')
'''
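
For a multi-class task, precision, recall, and F1 are usually computed per class and then averaged. The following sketch shows macro averaging in plain Python on made-up labels. It is intended to correspond to what scikit-learn reports with `average='macro'`, with a score of 0 for classes that are never predicted; the helper names are hypothetical.

```python
def per_class_scores(y_true, y_pred, cls):
    """Precision, recall and F1 for one class treated as 'positive'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0   # 0 when the class is never predicted
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_average(y_true, y_pred):
    """Unweighted mean of the per-class scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = [per_class_scores(y_true, y_pred, c) for c in classes]
    return tuple(sum(column) / len(classes) for column in zip(*scores))

# Made-up labels for three classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

precision, recall, f1 = macro_average(y_true, y_pred)
print(round(precision, 4), round(recall, 4), round(f1, 4))   # 0.7222 0.6667 0.6556
```

Macro averaging weights every class equally, so rare classes with poor scores pull the average down more than they affect overall accuracy.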

**Training results with EPOCHS = 10 and maxlen = 50:**   
![](pics/pic1.png)  
![](pics/Screenshot%20from%202022-10-12%2021-02-52.png)

Next comes the parameter-tuning analysis:  
We first vary `maxlen`; to save time, EPOCHS is reduced to 5.  
**With maxlen = 50**   
![](pics/Screenshot%20from%202022-10-18%2000-13-41.png)  
![](pics/Screenshot%20from%202022-10-18%2000-16-13.png)  
**With maxlen = 100**  
![](pics/Screenshot%20from%202022-10-18%2001-32-17.png)  
![](pics/Screenshot%20from%202022-10-18%2001-32-28.png)  
With maxlen = 50 the results are already good. With maxlen = 100 there is a small gain in the second decimal place, but training takes about twice as long.

Learning-rate adjustment: lr is lowered from 2e-5 to 1e-5, with 10 epochs for this run.  
![](pics/Screenshot%20from%202022-10-19%2003-13-00.png)  
![](pics/Screenshot%20from%202022-10-19%2003-13-13.png)  
For comparison, the learning rate is raised to 3e-5 and the test is run again.  
![](pics/Screenshot%20from%202022-10-20%2001-04-55.png)  
![](pics/Screenshot%20from%202022-10-20%2001-05-07.png)  
**As the learning rate increases, the final accuracy improves by about 0.001, which is not a significant gain.**

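Next, we try changing the network structure:
The single linear output layer is replaced with three layers, with GELU activations added between them to supply non-linearity:  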
![](pics/Screenshot%20from%202022-10-29%2003-54-13.png)   
**The final results are shown below:**
![](pics/L2hvbWUveGlhbzExLy5jb25maWcvRGluZ1RhbGsvd3Vrb25nLzE4NjI5MTcxMDhfdjIvSW1hZ2VGaWxlcy83MzQ2MTgzLzkyMTkyNDk3NTlfNzIxNTU2NjA3MTFfQzAyRTJFREQtNUREQS00NDk5LTgyQzctMDlDNkYzOEZGQzIxLnBuZw==.png)  
![](pics/L2hvbWUveGlhbzExLy5jb25maWcvRGluZ1RhbGsvd3Vrb25nLzE4NjI5MTcxMDhfdjIvSW1hZ2VGaWxlcy83MzQ2MTgzLzkyMTkyNDk3NTlfNzIxNTU1OTM5MTFfNTI4MDFFQkItNDIzRC00RDM4LTg3RTYtOTU2MTk0NjVCQjI5LnBuZw==.png)  
![](pics/L2hvbWUveGlhbzExLy5jb25maWcvRGluZ1RhbGsvd3Vrb25nLzE4NjI5MTcxMDhfdjIvSW1hZ2VGaWxlcy83MzQ2MTgzLzkyMTkyNDk3NTlfNzIxNTU0NjE3OTFfRDhEMkMxMkEtMUQ3Ny00REM0LUI1QjYtREYxNDZEMERFNDQwLnBuZw==.png)  
**This did not bring a meaningful improvement either.**
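
For reference, GELU is an activation function: it weights each input by the standard normal CDF, which makes the stacked layers non-linear but does not regularize the model by itself. A minimal sketch of the exact formula, compared with ReLU:

```python
import math

def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def relu(x):
    return max(0.0, x)

for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, round(gelu(x), 4), relu(x))
```

Unlike ReLU, GELU lets small negative inputs through with a reduced weight, and for large positive inputs it approaches the identity.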

Tuning is a complicated process, so we tried automatic hyperparameter search.  
The tool used here is Optuna.

In [None]:
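import optuna

def objective(trial):
    # Sample the learning rate on a log scale
    learning_rate = trial.suggest_float('lr', 1e-5, 1e-4, log=True)
    # Start every trial from a fresh model, optimizer and scheduler
    trial_model = Bert_Model().to(DEVICE)
    optimizer = AdamW(trial_model.parameters(), lr=learning_rate, weight_decay=1e-4)
    scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=len(train_loader),
                                                num_training_steps=EPOCHS*len(train_loader))
    train_and_eval(trial_model, train_loader, valid_loader, optimizer, scheduler, DEVICE, EPOCHS)
    trial_model.load_state_dict(torch.load("best_bert_model.pth"))
    # Score on the validation set so the test set stays untouched during tuning
    return evaluate(trial_model, valid_loader, DEVICE)

study = optuna.create_study(direction='maximize')   # accuracy should be maximized
study.optimize(objective, n_trials=1000)

study.best_params  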

However, the computational cost is too high, so this search was not actually run in this experiment.
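
The idea behind such a search can be sketched without any training. The example below uses plain random search on a log scale rather than Optuna's sampler, and the objective `toy_validation_accuracy` is a hypothetical stand-in for a full training run: its shape and its peak at 3e-5 are invented for illustration and are not results of this experiment.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def toy_validation_accuracy(lr):
    """Hypothetical stand-in for one training run; invented peak at lr = 3e-5."""
    return 0.70 - 0.05 * (math.log10(lr) - math.log10(3e-5)) ** 2

rng = random.Random(0)
trials = []
for _ in range(20):   # a small budget instead of 1000 trials
    lr = sample_log_uniform(rng, 1e-5, 1e-4)
    trials.append((toy_validation_accuracy(lr), lr))

best_acc, best_lr = max(trials)   # higher accuracy is better
print(f"best lr = {best_lr:.2e}, accuracy = {best_acc:.4f}")
```

Sampling on a log scale spreads the trials evenly across orders of magnitude, which suits learning rates. With each real trial costing close to an hour of GPU time in this setup, even a budget of 20 trials would be substantial.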

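### Summary
In this experiment we used BERT-based natural language processing to explore multi-class classification of medical text.  
Besides building the model and completing the classification task, we examined how the choice of parameters affects the final results.  
Although the tuning analysis was not carried through rigorously, testing shows that the model reaches roughly 70%, 90%, and 95% accuracy for top-1, top-3, and top-5 respectively.  
This is a fairly good result, though it also depends on the structure and selection of the dataset.  
All in all, this experiment was very instructive for me and marks my first real step into NLP and artificial intelligence.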