# LSTM on IMDB Sentiment Analysis

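This notebook performs sentiment classification on IMDB. It is practice in using torchtext and related tools and in setting up a training workflow, and it tests the performance of the following model: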

* LSTM

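Improvements over the previous notebook:

* Use pretrained word vectors and compare the effect of different vector dimensions
* Apply pack and pad operations
* Study the effect of num_layers, dropout, etc.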

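Current progress:

* Done; results are good

Open questions:
* How should BucketIterator, shuffling, and pack/pad be combined? (Currently: shuffle only across batches, put examples of similar length in the same batch, and sort each batch in descending length order.)
* After pack/pad, do hidden and output no longer line up position by position?
* Does Adam not need an explicit learning rate?
* With many hyperparameters to set, can one only pick the current best value one setting at a time, or should one run many strictly controlled experiments?
* Does the maximum vocabulary size need to be specified, and how? Should stop words be removed?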
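
On the hyperparameter question above, one way to see the trade-off is to count runs. The sketch below compares a full grid (every variable strictly controlled) with changing one setting at a time from a baseline. The candidate values, in particular `n_layers=2`, are illustrative assumptions and not necessarily the settings tried in this notebook.

```python
from itertools import product

# Illustrative candidate values (assumed for this sketch).
search_space = {
    "bs": [32, 128],
    "d_embed": [100, 300],
    "dropout": [0.0, 0.5],
    "n_layers": [1, 2],
}

def full_grid(space):
    """Every combination of every value: all variables strictly controlled."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*space.values())]

def one_at_a_time(space, baseline):
    """The baseline plus runs that change exactly one setting from it."""
    runs = [dict(baseline)]
    for key, values in space.items():
        for value in values:
            if value != baseline[key]:
                runs.append({**baseline, key: value})
    return runs

baseline = {"bs": 32, "d_embed": 100, "dropout": 0.5, "n_layers": 1}
grid_runs = full_grid(search_space)
oat_runs = one_at_a_time(search_space, baseline)
print(len(grid_runs), len(oat_runs))  # prints: 16 5
```

One-at-a-time search is much cheaper, but it cannot detect interactions between settings (for example, dropout mattering more with more layers), so it is a pragmatic compromise rather than a strictly controlled study.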


References:
* [Using torchtext -- updated IMDB](https://blog.csdn.net/weixin_43301333/article/details/105745053)

## Requirement
* torchtext==0.6.0

## Import

In [2]:
import torch
from torchtext import datasets
from torchtext import data
import numpy as np
import random
from torch import nn,optim
from sklearn import metrics
import torch.nn.functional as F

use_cuda=torch.cuda.is_available()
device=torch.device("cuda" if use_cuda else "cpu")

SEED = 1234
np.random.seed(SEED)
random.seed(SEED)
torch.manual_seed(SEED)
if use_cuda:
    torch.cuda.manual_seed(SEED)

## Tunable Hyperparameters

In [91]:
bs=32
d_embed=100
d_hidden=256
d_output=2
dropout=0.5
max_epochs=20
require_improvement=3
n_layers=1
bidirectional=True
MAX_VOCAB_SIZE=25000

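## Data Loading and Processing

The torchtext library is used to load and process the data.

Because spaCy could not be run on Colab, we use simple whitespace tokenization here; spaCy will be added later when running on a server.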

In [77]:
tokenize = lambda x: x.split()
TEXT=data.Field(tokenize=tokenize,batch_first=True,include_lengths=True)
LABEL=data.LabelField(dtype=torch.long)
train_data,test_data=datasets.IMDB.splits(TEXT,LABEL)

**The number of examples and one example are shown below.**

In [78]:
print(f'Number of training examples: {len(train_data)}')
print(f'Number of testing examples: {len(test_data)}')
print(vars(train_data.examples[0])['text'])

Number of training examples: 25000
Number of testing examples: 25000
['At', 'one', 'end', 'of', 'the', 'Eighties', 'Warren', 'Beatty', 'created', 'and', 'starred', 'in', 'the', 'literate', 'epic', 'Reds', 'about', 'the', 'founding', 'of', 'the', 'Soviet', 'Union', 'as', 'seen', 'through', 'the', 'eyes', 'of', 'iconoclast', 'radical', 'John', 'Reed.', 'It', 'was', 'a', 'profound', 'film', 'both', 'entertaining', 'and', 'with', 'a', 'message', 'presented', 'by', 'an', 'all', 'star', 'cast.', 'At', 'the', 'end', 'of', 'the', 'decade', 'Warren', 'Beatty', 'created', 'another', 'kind', 'of', 'epic', 'in', 'Dick', 'Tracy', 'that', 'makes', 'no', 'pretense', 'to', 'being', 'anything', 'other', 'than', 'entertainment', 'with', 'a', 'whole', 'bunch', 'of', 'the', 'best', 'actors', 'around', 'just', 'having', 'a', 'great', 'old', 'time', 'hamming', 'it', 'up', 'under', 'tons', 'of', 'makeup.<br', '/><br', '/>That', 'both', 'Reds', 'and', 'Dick', 'Tracy', 'could', 'come', 'from', 'the', 'same', '

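There are 25,000 training examples and 25,000 test examples. A 1:1 ratio is not the usual train/test proportion, but the task is fairly simple, so we keep it as is.

Each example behaves like a dictionary: 'text' holds the list of tokenized words and 'label' holds its label (pos or neg).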
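
The sample above shows a side effect of whitespace splitting: punctuation and HTML line breaks stay glued to words (`'makeup.<br'`, `'/><br'`, `'/>That'`). A minimal sketch of a cleanup tokenizer follows. `clean_tokenize` is a hypothetical helper, not part of torchtext, and its regular expressions are only one reasonable choice.

```python
import re

BR_TAG = re.compile(r"<br\s*/?>")  # HTML line breaks found in IMDB reviews
TOKEN = re.compile(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']")  # a word, or one punctuation mark

def clean_tokenize(text):
    """Hypothetical replacement for x.split(): drop <br /> tags, split off punctuation."""
    text = BR_TAG.sub(" ", text)
    return TOKEN.findall(text)

sample = "tons of makeup.<br /><br />That both Reds and Dick Tracy"
print(sample.split())
print(clean_tokenize(sample))
```

Since `data.Field` accepts any callable as `tokenize`, a function like this could be passed in place of the lambda used here; whether it improves accuracy would need to be checked experimentally.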

**Next we split part of the training set off as a validation set.**

In [79]:
train_data,valid_data=train_data.split(split_ratio=0.8)

print(f'Number of training examples: {len(train_data)}')
print(f'Number of validation examples: {len(valid_data)}')
print(f'Number of testing examples: {len(test_data)}')

Number of training examples: 20000
Number of validation examples: 5000
Number of testing examples: 25000


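**Next we build the vocabulary.**

**We initialize the embeddings with 100-dimensional GloVe vectors.**

Does the maximum vocabulary size need to be specified here?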

In [80]:
TEXT.build_vocab(train_data,vectors='glove.6B.100d',unk_init=torch.Tensor.normal_,max_size=MAX_VOCAB_SIZE)
LABEL.build_vocab(train_data)

d_vocab=len(TEXT.vocab)
print(f"Unique tokens in TEXT vocabulary: {len(TEXT.vocab)}")
print(f"Unique tokens in LABEL vocabulary: {len(LABEL.vocab)}")

print('The 20 most frequent words:')
print(TEXT.vocab.freqs.most_common(20))

Unique tokens in TEXT vocabulary: 25002
Unique tokens in LABEL vocabulary: 2
The 20 most frequent words:
[('the', 229686), ('a', 124475), ('and', 122205), ('of', 114632), ('to', 106377), ('is', 82794), ('in', 68497), ('I', 52823), ('that', 51928), ('this', 45800), ('it', 43628), ('/><br', 41010), ('was', 37436), ('as', 34034), ('with', 33323), ('for', 33022), ('but', 27233), ('The', 27013), ('on', 24694), ('movie', 24354)]


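The test and validation texts may contain words that never appear in the training set, and batched training requires every text in a batch to be padded to the same length. The vocabulary therefore automatically includes two special tokens, `<unk>` and `<pad>`, which stand for unknown words and padding.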
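
This explains why the vocabulary has 25002 entries (25000 words plus two special tokens) and why `<pad>` shows up as index 1 in the batches and in `padding_idx=1`. The sketch below is a simplified stand-in for a capped vocabulary, not torchtext's actual code; `build_vocab` and `numericalize` are hypothetical names.

```python
from collections import Counter

def build_vocab(token_lists, max_size=None, specials=("<unk>", "<pad>")):
    """Simplified capped vocabulary: specials first, then the most frequent words."""
    freqs = Counter(tok for tokens in token_lists for tok in tokens)
    # Most frequent first; ties broken alphabetically for determinism.
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    if max_size is not None:
        ranked = ranked[:max_size]
    itos = list(specials) + [word for word, _ in ranked]
    stoi = {word: idx for idx, word in enumerate(itos)}
    return itos, stoi

def numericalize(tokens, stoi):
    """Map tokens to indices, sending out-of-vocabulary words to <unk>."""
    unk = stoi["<unk>"]
    return [stoi.get(tok, unk) for tok in tokens]

train = [["a", "good", "movie"], ["a", "bad", "movie"], ["a", "movie"]]
itos, stoi = build_vocab(train, max_size=2)
print(itos)                                          # ['<unk>', '<pad>', 'a', 'movie']
print(numericalize(["a", "great", "movie"], stoi))   # [2, 0, 3]
```

With a cap, rare words such as "good" and "bad" in this toy corpus fall back to `<unk>`, which is the trade-off behind the question of whether to set a maximum size at all.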


**Next we build the iterators.**

In [92]:
train_iterator, valid_iterator, test_iterator =data.BucketIterator.splits(
    (train_data,valid_data,test_data),
    batch_size=bs,device=device,sort_within_batch=True)

# sanity check: inspect one batch
for x in train_iterator:
    print(x.text[0].shape)
    print(x.text[0])
    print(x.label)
    break

torch.Size([32, 453])
tensor([[ 1973,     5,    28,  ...,    46,     5,  1340],
        [  133,  1999,   703,  ...,   946, 15968,  3272],
        [ 1675,    87,     9,  ...,    14,   872,     1],
        ...,
        [   49,    21,   154,  ...,     1,     1,     1],
        [   19,  6010,     5,  ...,     1,     1,     1],
        [  122,   166,     3,  ...,     1,     1,     1]], device='cuda:0')
tensor([0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0,
        0, 1, 0, 1, 0, 1, 1, 1], device='cuda:0')


sort_within_batch=True makes the iterator sort the examples in each batch by length in descending order, which pack_padded_sequence requires by default.
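
The batching strategy described in the open questions (similar lengths in one batch, descending order inside the batch, shuffling only across batches) can be sketched in plain Python. This is a simplified illustration, not torchtext's implementation, which differs in detail; `bucket_batches` is a hypothetical helper.

```python
import random

def bucket_batches(lengths, batch_size, seed=0):
    """Return batches of example indices grouped by similar length."""
    # 1. Sort all examples by length so neighbours have similar lengths.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    # 2. Cut the sorted order into consecutive batches.
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    # 3. Inside each batch, sort by descending length (needed for packing).
    batches = [sorted(b, key=lambda i: lengths[i], reverse=True) for b in batches]
    # 4. Shuffle the order of the batches, not their contents.
    random.Random(seed).shuffle(batches)
    return batches

lengths = [12, 450, 30, 28, 445, 15, 460, 33]
for batch in bucket_batches(lengths, batch_size=2):
    print([lengths[i] for i in batch])
```

Grouping by length keeps the amount of padding per batch small, while shuffling whole batches preserves some randomness between epochs.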

Note that **the text in the iterator has already been converted to vocabulary indices**.
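
Because the batches arrive as padded index tensors sorted by length, they can be packed before being fed to the LSTM. The sketch below uses random inputs and small assumed dimensions to look at the open question of how `hidden` and `output` line up after pack/pad for a one-layer bidirectional LSTM.

```python
import torch
from torch import nn

torch.manual_seed(0)
d_in, d_hid = 4, 3
lstm = nn.LSTM(d_in, d_hid, batch_first=True, bidirectional=True)

# Two sequences padded to length 5; true lengths sorted in descending order.
x = torch.randn(2, 5, d_in)
lengths = torch.tensor([5, 3])

packed = nn.utils.rnn.pack_padded_sequence(x, lengths, batch_first=True)
packed_out, (hidden, cell) = lstm(packed)
output, out_lengths = nn.utils.rnn.pad_packed_sequence(packed_out, batch_first=True)

# hidden has shape (num_layers * num_directions, batch, d_hid).
# Forward direction: the final state sits at each sequence's last valid step.
last_step = (lengths - 1).tolist()
forward_last = torch.stack([output[i, t, :d_hid] for i, t in enumerate(last_step)])
# Backward direction: the final state sits at step 0, since it reads right to left.
backward_last = output[:, 0, d_hid:]

print(torch.allclose(hidden[0], forward_last))    # forward state matches
print(torch.allclose(hidden[1], backward_last))   # backward state matches
print(bool((output[1, 3:] == 0).all()))           # padded steps are zero
```

So `hidden` and `output` do correspond, but at different positions per direction. Gathering `output` at index `length-1` recovers the forward final state only; the backward half at that position has seen just one token, which is why the model below concatenates `hidden[-2]` and `hidden[-1]` instead.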


## Model

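We define an LSTM model.

Since pretrained word vectors are used, the embedding layer is created with nn.Embedding.from_pretrained.

**Importantly, we set freeze=False (the default is freeze=True) so that the word vectors are fine-tuned during training; freezing them slows convergence.**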
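
A small sketch of what `freeze` and `padding_idx` do, using a random tensor as a stand-in for `TEXT.vocab.vectors` (the 6 x 4 shape is an assumption made only for this example):

```python
import torch
from torch import nn

vectors = torch.randn(6, 4)  # stand-in for pretrained vectors: (vocab size, dim)

frozen = nn.Embedding.from_pretrained(vectors)  # freeze defaults to True
trainable = nn.Embedding.from_pretrained(vectors, freeze=False, padding_idx=1)

print(frozen.weight.requires_grad)     # False: vectors stay fixed
print(trainable.weight.requires_grad)  # True: vectors are fine-tuned

# One backward pass: the padding row receives no gradient.
tokens = torch.tensor([[2, 3, 1, 1]])
trainable(tokens).sum().backward()
print(trainable.weight.grad[1].abs().sum().item())  # 0.0 for the <pad> row
print(trainable.weight.grad[2].abs().sum().item())  # non-zero for a real token
```

With `freeze=False` the optimizer updates every row except the padding row, so the embeddings can adapt to the sentiment task.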

In [93]:
class simple_rnn(nn.Module):
    
    def __init__(self,d_vocab: int,d_embed:int ,d_hidden:int ,d_output:int,dropout=0,vectors=None,
                 n_layers=1,bidirectional=False,pad_idx=0):
        super(simple_rnn, self).__init__()
        self.bi=2 if bidirectional else 1
        self.n_layers=n_layers
        self.pad_idx=pad_idx
        self.d_hidden=d_hidden
        self.d_output=d_output
        
        if vectors is not None:
            # start from the pretrained vectors and fine-tune them (freeze=False)
            self.embed=nn.Embedding.from_pretrained(vectors,padding_idx=pad_idx,freeze=False)
        else:
            # fall back to randomly initialized embeddings
            self.embed=nn.Embedding(d_vocab,d_embed,padding_idx=pad_idx)
        # nn.LSTM applies dropout only between stacked layers, so with n_layers=1
        # this argument has no effect and PyTorch emits a warning
        self.rnn=nn.LSTM(d_embed,d_hidden,batch_first=True,num_layers=n_layers,bidirectional=bidirectional,dropout=dropout)
        self.fc=nn.Linear(d_hidden*self.bi,d_output)
        self.dropout=nn.Dropout(dropout)
        
    def forward(self,text,text_length):
        # text: (bs,length), text_length: (bs,)
        embedded=self.dropout(self.embed(text)) #(bs,length,d_embed)
        # lengths must live on the CPU and be sorted in descending order (sort_within_batch=True)
        packed=nn.utils.rnn.pack_padded_sequence(embedded,text_length.cpu(),batch_first=True)
        output,(hidden,cell)=self.rnn(packed)
        output,output_len=nn.utils.rnn.pad_packed_sequence(output,batch_first=True)
        # alternative kept for reference: take output at each sequence's last valid step
        #output=torch.gather(output,1,(text_length-1).unsqueeze(-1).unsqueeze(-1).expand(-1,-1,self.d_hidden*self.bi))   #(bs,1,d_hidden*bi)

        # hidden: (n_layers*bi,bs,d_hidden); the last entries belong to the top layer
        if self.bi==2:
            hidden=torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1)   # forward + backward
        else:
            hidden=hidden[-1,:,:]
        hidden=self.dropout(hidden)

        return self.fc(hidden)#(bs,d_output)
    
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]
UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]
model=simple_rnn(d_vocab,d_embed,d_hidden,d_output,dropout,vectors=TEXT.vocab.vectors,
                 n_layers=n_layers,bidirectional=bidirectional,pad_idx=PAD_IDX)
print(model)

# zero the <unk> and <pad> rows so they start without any pretrained signal
model.embed.weight.data[UNK_IDX] = torch.zeros(d_embed)
model.embed.weight.data[PAD_IDX] = torch.zeros(d_embed)
print(model.embed.weight.data)

simple_rnn(
  (embed): Embedding(25002, 100, padding_idx=1)
  (rnn): LSTM(100, 256, batch_first=True, dropout=0.5, bidirectional=True)
  (fc): Linear(in_features=512, out_features=2, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)
tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [-0.0382, -0.2449,  0.7281,  ..., -0.1459,  0.8278,  0.2706],
        ...,
        [-1.3295,  0.4554, -0.1147,  ...,  0.5420,  2.2063,  0.7768],
        [ 0.8933, -0.6821, -0.7016,  ...,  0.4941, -0.0156,  1.4400],
        [-0.1658,  0.3175, -0.3845,  ...,  0.9370, -0.3842, -2.0957]])




A quick check that a forward pass runs:

In [89]:
optimizer = optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()
#criterion=nn.BCEWithLogitsLoss()
if use_cuda:
    criterion.to(device)
    model.to(device)
with torch.no_grad():
    for batch in train_iterator:
        x,l=batch.text
        y=batch.label
        preds=model(x,l)
        print(preds.shape)
        criterion(preds,y)
        break

torch.Size([128, 2])


## Training

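Because this was run on Colab, no model checkpoint was saved, so the final test-set results come from the last-epoch model rather than from the best model (by validation loss).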
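
One way to close that gap without writing to disk is to keep an in-memory snapshot of the best model. The sketch below replays the stopping rule used in this notebook (`epoch - last_improve > require_improvement`) on the validation losses of the dropout=0 run reported under Ablation Studies. The `state` dictionary is only a stand-in for `model.state_dict()`, and `run_early_stopping` is a hypothetical helper.

```python
import copy

def run_early_stopping(val_losses, patience):
    """Replay the stopping rule on a list of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    best_state = None
    for epoch, loss in enumerate(val_losses):
        state = {"epoch": epoch + 1, "val_loss": loss}  # stand-in for model.state_dict()
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            best_state = copy.deepcopy(state)  # snapshot of the best model so far
        if epoch - best_epoch > patience:
            return epoch + 1, best_state       # stop; report 1-based epoch
    return len(val_losses), best_state

losses = [0.56, 0.44, 0.39, 0.34, 0.33, 0.35, 0.43, 0.55, 0.57]
stopped_at, best = run_early_stopping(losses, patience=3)
print(stopped_at, best)  # 9 {'epoch': 5, 'val_loss': 0.33}
```

In the real training loop the same pattern would be `best_state = copy.deepcopy(model.state_dict())` on improvement and `model.load_state_dict(best_state)` before calling `test`, so that the test set is evaluated with the epoch-5 model rather than the epoch-9 one.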

In [84]:
def train(model, train_iter, dev_iter, test_iter):
    model.to(device)
    model.train()
    optimizer = optim.Adam(model.parameters())
    criterion = nn.CrossEntropyLoss()
    #criterion = nn.BCEWithLogitsLoss()
    if use_cuda:
        criterion.cuda()

    # Exponential learning-rate decay: after every epoch, lr = gamma * lr
    # scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
    dev_best_loss = float('inf')
    last_improve = 0  # epoch index at which the validation loss last improved
    #writer = SummaryWriter(log_dir=config.log_path + '/' + time.strftime('%m-%d.%H.%M', time.localtime())+'_'+which_data+'_'+which_model+'_'+which_task+'_'+exp_number)
    
    for epoch in range(max_epochs):
        train_loss=0
        train_correct=0
        # scheduler.step() # learning-rate decay
        for i, batch in enumerate(train_iter):
            optimizer.zero_grad()
            x,l=batch.text
            y=batch.label
            outputs = model(x,l)
            loss = criterion(outputs, y)
            loss.backward()
            optimizer.step()
            # training accuracy
            preds = torch.max(outputs, 1)[1]
            #preds = torch.round(torch.sigmoid(outputs))
            train_correct+=(y==preds).sum().item()   # plain int, not a tensor
            train_loss+=loss.item()
        train_loss/=len(train_iter)   # mean training loss per batch
        train_acc=train_correct/len(train_iter.dataset)   # training accuracy over the epoch
            
        # validation set
        dev_acc, dev_loss = evaluate(model, dev_iter)
        if dev_loss < dev_best_loss:
            dev_best_loss = dev_loss
            improve = '*'
            last_improve=epoch
        else:
            improve = ''
        msg = 'Epoch: {0:>6},  Train Loss: {1:>5.2},  Train Acc: {2:>6.2%},  Val Loss: {3:>5.2},  Val Acc: {4:>6.2%} {5}'
        print(msg.format(epoch+1, train_loss, train_acc, dev_loss, dev_acc, improve))
        #writer.add_scalar("loss/train", loss.item(), total_batch)
        #writer.add_scalar("loss/dev", dev_loss, total_batch)
        #writer.add_scalar("acc/train", train_acc, total_batch)
        #writer.add_scalar("acc/dev", dev_acc, total_batch)

        if epoch - last_improve > require_improvement:
            # stop when the validation loss has not improved for more than require_improvement epochs
            print("No optimization for a long time, auto-stopping...")
            break
    #writer.close()
    test(model, test_iter)

def evaluate(model, data_iter, test=False):
    model.eval()
    loss_total = 0
    predict_all = np.array([], dtype=int)
    labels_all = np.array([], dtype=int)
    with torch.no_grad():
        for batch in data_iter:
            x,l=batch.text
            labels=batch.label
            outputs = model(x,l)
            loss = F.cross_entropy(outputs, labels)
            #loss=criterion(outputs,labels)
            loss_total += loss.item()  # accumulate a Python float, not a tensor
            labels = labels.data.cpu().numpy()
            predic = torch.max(outputs, 1)[1].cpu().numpy()
            #predic=torch.round(torch.sigmoid(outputs)).cpu().numpy()
            labels_all = np.append(labels_all, labels)
            predict_all = np.append(predict_all, predic)
    model.train()
    acc = metrics.accuracy_score(labels_all, predict_all)
    
    if test:
        # take class names from the label vocabulary so they always match indices 0 and 1
        report = metrics.classification_report(labels_all, predict_all, labels=[0,1],target_names=LABEL.vocab.itos, digits=4,output_dict=True)
        confusion = metrics.confusion_matrix(labels_all, predict_all)
        return acc, loss_total / len(data_iter), report, confusion
    
    return acc, loss_total / len(data_iter)


def test(model, test_iter):
    test_acc, test_loss, test_report, test_confusion = evaluate(model, test_iter, test=True)
    msg = 'Test Loss: {0:>5.2},  Test Acc: {1:>6.2%}'
    print(msg.format(test_loss, test_acc))
    print("Precision, Recall and F1-Score...")
    print(test_report)
    print("Confusion Matrix...")
    print(test_confusion)

In [45]:
#original
train(model,train_iterator,valid_iterator,test_iterator)

Epoch:      1,  Train Loss:  0.64,  Train Acc: 62.28%,  Val Loss:  0.67,  Val Acc: 62.84% *
Epoch:      2,  Train Loss:  0.62,  Train Acc: 66.15%,  Val Loss:  0.59,  Val Acc: 70.22% *
Epoch:      3,  Train Loss:   0.6,  Train Acc: 67.28%,  Val Loss:  0.59,  Val Acc: 67.66% 
Epoch:      4,  Train Loss:   0.5,  Train Acc: 76.34%,  Val Loss:  0.44,  Val Acc: 79.78% *
Epoch:      5,  Train Loss:  0.38,  Train Acc: 83.55%,  Val Loss:  0.33,  Val Acc: 85.82% *
Epoch:      6,  Train Loss:  0.31,  Train Acc: 86.75%,  Val Loss:  0.37,  Val Acc: 84.68% 
Epoch:      7,  Train Loss:  0.28,  Train Acc: 88.79%,  Val Loss:   0.3,  Val Acc: 87.46% *
Epoch:      8,  Train Loss:  0.23,  Train Acc: 91.03%,  Val Loss:  0.28,  Val Acc: 88.38% *
Epoch:      9,  Train Loss:  0.21,  Train Acc: 91.81%,  Val Loss:  0.28,  Val Acc: 88.92% *
Epoch:     10,  Train Loss:  0.19,  Train Acc: 92.61%,  Val Loss:   0.3,  Val Acc: 89.00% 
Epoch:     11,  Train Loss:  0.16,  Train Acc: 93.80%,  Val Loss:  0.32,  Val Acc: 

## Ablation Studies

In [49]:
# no limit on the vocabulary size
train(model,train_iterator,valid_iterator,test_iterator)

Epoch:      1,  Train Loss:  0.68,  Train Acc: 56.34%,  Val Loss:   0.7,  Val Acc: 50.78% *
Epoch:      2,  Train Loss:  0.68,  Train Acc: 54.80%,  Val Loss:  0.65,  Val Acc: 63.44% *
Epoch:      3,  Train Loss:  0.65,  Train Acc: 61.18%,  Val Loss:  0.53,  Val Acc: 74.84% *
Epoch:      4,  Train Loss:  0.45,  Train Acc: 79.60%,  Val Loss:  0.35,  Val Acc: 85.00% *
Epoch:      5,  Train Loss:   0.3,  Train Acc: 87.40%,  Val Loss:  0.32,  Val Acc: 86.62% *
Epoch:      6,  Train Loss:  0.23,  Train Acc: 91.01%,  Val Loss:   0.3,  Val Acc: 88.30% *
Epoch:      7,  Train Loss:  0.17,  Train Acc: 93.39%,  Val Loss:  0.31,  Val Acc: 87.76% 
Epoch:      8,  Train Loss:  0.14,  Train Acc: 94.69%,  Val Loss:   0.4,  Val Acc: 86.84% 
Epoch:      9,  Train Loss:  0.11,  Train Acc: 95.93%,  Val Loss:  0.38,  Val Acc: 87.20% 
Epoch:     10,  Train Loss:  0.09,  Train Acc: 96.79%,  Val Loss:  0.38,  Val Acc: 88.28% 
Epoch:     11,  Train Loss: 0.074,  Train Acc: 97.46%,  Val Loss:  0.38,  Val Acc: 8

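Slightly worse, but with no significant difference.

Ideally the model should learn by itself which words are meaningful and which are not. Besides, frequent words are not necessarily valuable (stop words), and rare words are not necessarily worthless. This question needs further investigation.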

In [56]:
#dropout=0
train(model,train_iterator,valid_iterator,test_iterator)

Epoch:      1,  Train Loss:  0.65,  Train Acc: 61.43%,  Val Loss:  0.56,  Val Acc: 71.30% *
Epoch:      2,  Train Loss:   0.5,  Train Acc: 75.11%,  Val Loss:  0.44,  Val Acc: 80.16% *
Epoch:      3,  Train Loss:  0.37,  Train Acc: 84.13%,  Val Loss:  0.39,  Val Acc: 83.44% *
Epoch:      4,  Train Loss:  0.25,  Train Acc: 89.73%,  Val Loss:  0.34,  Val Acc: 85.86% *
Epoch:      5,  Train Loss:  0.17,  Train Acc: 93.65%,  Val Loss:  0.33,  Val Acc: 87.18% *
Epoch:      6,  Train Loss:  0.11,  Train Acc: 95.97%,  Val Loss:  0.35,  Val Acc: 88.08% 
Epoch:      7,  Train Loss: 0.071,  Train Acc: 97.73%,  Val Loss:  0.43,  Val Acc: 87.90% 
Epoch:      8,  Train Loss: 0.045,  Train Acc: 98.56%,  Val Loss:  0.55,  Val Acc: 87.20% 
Epoch:      9,  Train Loss: 0.029,  Train Acc: 99.09%,  Val Loss:  0.57,  Val Acc: 86.86% 
No optimization for a long time, auto-stopping...
Test Loss:  0.62,  Test Acc: 85.63%
Precision, Recall and F1-Score...
{'pos': {'precision': 0.8401435881768884, 'recall': 0.88

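Convergence is faster, but the final performance drops noticeably: the validation loss climbs from 0.33 to 0.57 while training accuracy reaches 99%, which indicates overfitting.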

In [59]:
#layer=1
train(model,train_iterator,valid_iterator,test_iterator)

Epoch:      1,  Train Loss:  0.67,  Train Acc: 59.28%,  Val Loss:  0.61,  Val Acc: 65.86% *
Epoch:      2,  Train Loss:  0.61,  Train Acc: 65.82%,  Val Loss:   0.5,  Val Acc: 76.08% *
Epoch:      3,  Train Loss:  0.48,  Train Acc: 77.40%,  Val Loss:  0.44,  Val Acc: 81.00% *
Epoch:      4,  Train Loss:  0.37,  Train Acc: 83.57%,  Val Loss:  0.37,  Val Acc: 84.86% *
Epoch:      5,  Train Loss:  0.35,  Train Acc: 85.04%,  Val Loss:  0.31,  Val Acc: 87.48% *
Epoch:      6,  Train Loss:  0.28,  Train Acc: 88.73%,  Val Loss:   0.3,  Val Acc: 87.80% *
Epoch:      7,  Train Loss:  0.23,  Train Acc: 90.88%,  Val Loss:  0.29,  Val Acc: 87.98% *
Epoch:      8,  Train Loss:   0.2,  Train Acc: 91.97%,  Val Loss:  0.35,  Val Acc: 88.62% 
Epoch:      9,  Train Loss:  0.18,  Train Acc: 92.92%,  Val Loss:   0.3,  Val Acc: 89.04% 
Epoch:     10,  Train Loss:  0.17,  Train Acc: 93.69%,  Val Loss:  0.29,  Val Acc: 89.06% *
Epoch:     11,  Train Loss:  0.14,  Train Acc: 94.57%,  Val Loss:  0.38,  Val Acc:

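Training is faster and the final performance does not drop. This task may not require deep understanding: one layer is enough to fit it, and more layers may even overfit. We use one layer from here on.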

In [70]:
# 300-dimensional word vectors
train(model,train_iterator,valid_iterator,test_iterator)

Epoch:      1,  Train Loss:  0.66,  Train Acc: 60.86%,  Val Loss:  0.64,  Val Acc: 64.22% *
Epoch:      2,  Train Loss:  0.57,  Train Acc: 70.24%,  Val Loss:  0.53,  Val Acc: 73.10% *
Epoch:      3,  Train Loss:  0.52,  Train Acc: 74.70%,  Val Loss:  0.48,  Val Acc: 78.32% *
Epoch:      4,  Train Loss:  0.48,  Train Acc: 77.45%,  Val Loss:  0.37,  Val Acc: 84.72% *
Epoch:      5,  Train Loss:  0.37,  Train Acc: 83.35%,  Val Loss:  0.34,  Val Acc: 86.76% *
Epoch:      6,  Train Loss:  0.25,  Train Acc: 90.18%,  Val Loss:  0.32,  Val Acc: 87.36% *
Epoch:      7,  Train Loss:  0.19,  Train Acc: 92.47%,  Val Loss:  0.34,  Val Acc: 87.92% 
Epoch:      8,  Train Loss:  0.15,  Train Acc: 94.13%,  Val Loss:  0.33,  Val Acc: 88.60% 
Epoch:      9,  Train Loss:  0.12,  Train Acc: 95.47%,  Val Loss:  0.39,  Val Acc: 88.24% 
Epoch:     10,  Train Loss:   0.1,  Train Acc: 96.27%,  Val Loss:  0.37,  Val Acc: 88.42% 
No optimization for a long time, auto-stopping...
Test Loss:  0.39,  Test Acc: 87.70

Same result and explanation as above.

In [85]:
#CrossEntropy
train(model,train_iterator,valid_iterator,test_iterator)

Epoch:      1,  Train Loss:  0.67,  Train Acc: 58.53%,  Val Loss:  0.59,  Val Acc: 69.10% *
Epoch:      2,  Train Loss:  0.61,  Train Acc: 66.71%,  Val Loss:  0.49,  Val Acc: 77.38% *
Epoch:      3,  Train Loss:  0.48,  Train Acc: 77.37%,  Val Loss:  0.39,  Val Acc: 83.60% *
Epoch:      4,  Train Loss:  0.37,  Train Acc: 84.08%,  Val Loss:  0.33,  Val Acc: 85.98% *
Epoch:      5,  Train Loss:  0.29,  Train Acc: 87.89%,  Val Loss:  0.31,  Val Acc: 87.12% *
Epoch:      6,  Train Loss:  0.26,  Train Acc: 89.48%,  Val Loss:   0.3,  Val Acc: 87.56% *
Epoch:      7,  Train Loss:  0.22,  Train Acc: 91.28%,  Val Loss:  0.29,  Val Acc: 88.60% *
Epoch:      8,  Train Loss:   0.2,  Train Acc: 92.33%,  Val Loss:   0.3,  Val Acc: 88.50% 
Epoch:      9,  Train Loss:  0.17,  Train Acc: 93.61%,  Val Loss:  0.31,  Val Acc: 88.62% 
Epoch:     10,  Train Loss:  0.16,  Train Acc: 94.08%,  Val Loss:  0.33,  Val Acc: 89.40% 
Epoch:     11,  Train Loss:  0.14,  Train Acc: 94.57%,  Val Loss:  0.31,  Val Acc: 

As expected, the results are about the same.
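
A likely reason, assuming the comparison is against the commented-out `BCEWithLogitsLoss` variant with a single output logit: for two classes, softmax cross-entropy over logits `(z0, z1)` equals sigmoid binary cross-entropy on the difference `z1 - z0`. The sketch below checks this numerically with hand-written loss functions (not the PyTorch ones).

```python
import math

def cross_entropy_2class(z0, z1, target):
    """Softmax cross-entropy for logits (z0, z1) and a target in {0, 1}."""
    log_norm = math.log(math.exp(z0) + math.exp(z1))
    return log_norm - (z1 if target == 1 else z0)

def bce_with_logit(z, target):
    """Sigmoid binary cross-entropy for a single logit z and a target in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-z))
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

z0, z1 = 0.3, 1.7
for target in (0, 1):
    ce = cross_entropy_2class(z0, z1, target)
    bce = bce_with_logit(z1 - z0, target)
    print(target, round(ce, 6), round(bce, 6))
```

The two setups therefore describe the same family of classifiers; small differences between runs come from the extra output parameters and from randomness in training.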

In [90]:
#bs=128
train(model,train_iterator,valid_iterator,test_iterator)

Epoch:      1,  Train Loss:  0.67,  Train Acc: 57.86%,  Val Loss:  0.62,  Val Acc: 66.66% *
Epoch:      2,  Train Loss:  0.62,  Train Acc: 66.60%,  Val Loss:  0.69,  Val Acc: 51.74% 
Epoch:      3,  Train Loss:  0.64,  Train Acc: 63.05%,  Val Loss:  0.53,  Val Acc: 74.16% *
Epoch:      4,  Train Loss:  0.65,  Train Acc: 61.83%,  Val Loss:  0.65,  Val Acc: 63.94% 
Epoch:      5,  Train Loss:  0.61,  Train Acc: 67.12%,  Val Loss:  0.56,  Val Acc: 72.78% 
Epoch:      6,  Train Loss:  0.48,  Train Acc: 78.21%,  Val Loss:  0.38,  Val Acc: 83.26% *
Epoch:      7,  Train Loss:  0.36,  Train Acc: 84.25%,  Val Loss:  0.34,  Val Acc: 84.80% *
Epoch:      8,  Train Loss:  0.31,  Train Acc: 87.05%,  Val Loss:  0.33,  Val Acc: 85.62% *
Epoch:      9,  Train Loss:  0.28,  Train Acc: 88.77%,  Val Loss:   0.3,  Val Acc: 87.62% *
Epoch:     10,  Train Loss:  0.26,  Train Acc: 89.76%,  Val Loss:  0.31,  Val Acc: 87.26% 
Epoch:     11,  Train Loss:  0.23,  Train Acc: 90.92%,  Val Loss:  0.29,  Val Acc: 8

No improvement.

In [94]:
#bs=32
train(model,train_iterator,valid_iterator,test_iterator)

Epoch:      1,  Train Loss:  0.68,  Train Acc: 56.91%,  Val Loss:   0.6,  Val Acc: 65.48% *
Epoch:      2,  Train Loss:   0.6,  Train Acc: 67.45%,  Val Loss:  0.57,  Val Acc: 70.26% *
Epoch:      3,  Train Loss:  0.44,  Train Acc: 79.86%,  Val Loss:  0.35,  Val Acc: 85.52% *
Epoch:      4,  Train Loss:  0.32,  Train Acc: 86.63%,  Val Loss:  0.31,  Val Acc: 86.96% *
Epoch:      5,  Train Loss:  0.27,  Train Acc: 89.08%,  Val Loss:  0.28,  Val Acc: 88.22% *
Epoch:      6,  Train Loss:  0.22,  Train Acc: 91.35%,  Val Loss:  0.29,  Val Acc: 88.54% 
Epoch:      7,  Train Loss:  0.19,  Train Acc: 92.83%,  Val Loss:  0.29,  Val Acc: 89.04% 
Epoch:      8,  Train Loss:  0.16,  Train Acc: 93.76%,  Val Loss:  0.28,  Val Acc: 89.12% *
Epoch:      9,  Train Loss:  0.14,  Train Acc: 94.72%,  Val Loss:  0.36,  Val Acc: 89.04% 
Epoch:     10,  Train Loss:  0.13,  Train Acc: 95.11%,  Val Loss:   0.3,  Val Acc: 89.58% 
Epoch:     11,  Train Loss:  0.11,  Train Acc: 96.14%,  Val Loss:  0.33,  Val Acc: 8

Almost identical.

## Results and Analysis

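Training works well and reaches a reasonable level (about 88-89% validation accuracy in the runs above), close to the results listed in the [Benchmark](https://paperswithcode.com/sota/sentiment-analysis-on-imdb).

Several hyperparameter settings were also compared, which yielded some useful conclusions.

Next steps:
* Use BERT for text classification on IMDB
* Improve the framework: save the model and evaluate the best model on the test set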