# Deep Learning Assignment 3 Lab Report
# SA22221042 汪泱泱

## 1. Experimental Environment

GPU: TITAN Xp  
CUDA 10.1  
Python 3.7.13  
torch 1.8.1  
torchtext 0.6.0  
spacy 3.4.3  
transformers 4.25.1

## 2. Experimental Procedure

```python
import random
import sys
import time
import torch
import torch.nn as nn
import torchtext
import tqdm
from transformers import AutoTokenizer, AutoModel
```

Choosing the BERT model: we experimented with two uncased English pretrained BERT models, 'bert-base-uncased' and 'bert-large-uncased'. Here 'bert-base-uncased' serves as the example, and we load its tokenizer and model.

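```python
pretrained_model_name = 'bert-base-uncased'
```

```python
# The checkpoint is uncased, so keep the tokenizer's default lowercasing;
# do_lower_case=False would feed capitalized words to a lowercase-only vocabulary
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)
BertModel = AutoModel.from_pretrained(pretrained_model_name)
```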

```text
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
```

This warning is expected here: the checkpoint also contains the pretraining heads for masked language modeling (`cls.predictions.*`) and next-sentence prediction (`cls.seq_relationship.*`), which the bare `BertModel` encoder does not use.


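Next, we preprocess the dataset.
The IMDB "Large Movie Review Dataset" is a widely used public dataset, and torchtext provides the interface `torchtext.datasets.imdb.IMDB`, so we can use it directly for loading and preprocessing.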

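```python
# BERT accepts at most 512 positions; reserve two for [CLS] and [SEP]
MAX_TOKENS = 510

def tokenize_and_cut(sentence):
    # WordPiece-tokenize the review, then keep at most MAX_TOKENS tokens
    tokens = tokenizer.tokenize(sentence)
    return tokens[:MAX_TOKENS]
```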

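```python
# Text field: BERT has its own tokenizer and vocabulary,
# so torchtext's vocabulary building is disabled (use_vocab=False)
train_text = torchtext.data.Field(batch_first=True,
                                  use_vocab=False,
                                  tokenize=tokenize_and_cut,
                                  preprocessing=tokenizer.convert_tokens_to_ids,
                                  init_token=tokenizer.cls_token_id,   # [CLS] at the start
                                  eos_token=tokenizer.sep_token_id,    # [SEP] at the end
                                  pad_token=tokenizer.pad_token_id,
                                  unk_token=tokenizer.unk_token_id)
# Labels ('pos'/'neg') become float 0/1 targets, as BCELoss expects
train_label = torchtext.data.LabelField(dtype=torch.float)
```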
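The `Field` above runs three steps on each review: WordPiece tokenization truncated to `MAX_TOKENS`, conversion of tokens to ids, and wrapping with `[CLS]` and `[SEP]`. Because BERT has 512 position embeddings, 510 content tokens plus the two special tokens fill the limit exactly. The plain-Python sketch below mimics that pipeline on already-split tokens. The ids 100, 101 and 102 are the `[UNK]`, `[CLS]` and `[SEP]` ids of `bert-base-uncased`. `TOY_VOCAB` (with arbitrary ids) and `build_input_ids` are hypothetical, and unknown words simply fall back to `[UNK]` here, whereas real WordPiece first tries to split them into sub-words.

```python
from typing import Dict, List

# Special-token ids as in bert-base-uncased's vocabulary
UNK_ID, CLS_ID, SEP_ID = 100, 101, 102
MAX_POSITIONS = 512                # BERT's maximum sequence length
MAX_TOKENS = MAX_POSITIONS - 2     # 510: room for [CLS] and [SEP]

# Hypothetical toy vocabulary with arbitrary ids (the real one has 30,522 entries)
TOY_VOCAB: Dict[str, int] = {"this": 1000, "film": 1001, "is": 1002, "great": 1003}


def build_input_ids(tokens: List[str], vocab: Dict[str, int],
                    max_tokens: int = MAX_TOKENS) -> List[int]:
    """Truncate, map tokens to ids (unknown -> [UNK]), then add [CLS]/[SEP]."""
    truncated = tokens[:max_tokens]
    ids = [vocab.get(tok, UNK_ID) for tok in truncated]
    return [CLS_ID] + ids + [SEP_ID]


print(build_input_ids(["this", "film", "is", "awful"], TOY_VOCAB))
# [101, 1000, 1001, 1002, 100, 102]
print(len(build_input_ids(["this"] * 600, TOY_VOCAB)))  # 512
```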

```python
train_data, test_data = torchtext.datasets.imdb.IMDB.splits(train_text, train_label)
```

```python
len(train_data)
```

```text
25000
```

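Split off a validation set from the training data at a train:validation ratio of 4:1 (`split_ratio=0.8`).

```python
SEED = 20230102
# random.seed() returns None, so seed the global RNG first and pass its state explicitly
random.seed(SEED)
train_data, valid_data = train_data.split(split_ratio=0.8, random_state=random.getstate())
train_label.build_vocab(train_data)
```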

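Splitting 25,000 reviews at 4:1 gives 20,000 training and 5,000 validation examples; with `BATCH_SIZE = 32` that is 625 and 157 batches, the counts that appear in the progress bars of the training log. The sketch below reproduces that arithmetic in plain Python with a seeded, reproducible shuffle-and-cut. It is not torchtext's implementation (`Dataset.split` has its own shuffling and optional stratification); `split_train_valid` and `num_batches` are hypothetical helpers.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_valid(items: Sequence[T], split_ratio: float = 0.8,
                      seed: int = 20230102) -> Tuple[List[T], List[T]]:
    """Shuffle with a dedicated seeded RNG, then cut at round(ratio * n)."""
    rng = random.Random(seed)   # local RNG: leaves the global random state untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(split_ratio * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def num_batches(n_examples: int, batch_size: int) -> int:
    """Number of batches when the last, partial batch is kept."""
    return math.ceil(n_examples / batch_size)


train, valid = split_train_valid(range(25000))
print(len(train), len(valid))                                     # 20000 5000
print(num_batches(len(train), 32), num_batches(len(valid), 32))   # 625 157
```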


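```python
class SentimentAnalysisModel(nn.Module):
    def __init__(self, BertModel, classifier):
        super().__init__()
        self.BertModel = BertModel
        # Encoder hidden size: 768 for bert-base, 1024 for bert-large;
        # the classifier's input size must match it
        self.embedding_dim = BertModel.config.hidden_size
        self.classifier = classifier

    def forward(self, text):
        # text: [batch_size, seq_len] token ids
        embedded = self.BertModel(text)[0]   # last hidden state: [batch_size, seq_len, hidden_size]
        # Mean-pool the token vectors into one vector per review, then classify
        predict = self.classifier(embedded.mean(dim=1))   # [batch_size, 1]
        return predict
```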
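The `forward` above calls the encoder without an `attention_mask` and averages over every position, so the `[PAD]` tokens added during batching take part in both self-attention and the mean. A common variant, not the one this report ran, passes a mask and averages only the real tokens. A framework-free sketch, assuming `[PAD]` has id 0 as in `bert-base-uncased`; `attention_mask` and `masked_mean_pool` are hypothetical helper names:

```python
from typing import List, Sequence

PAD_ID = 0  # [PAD] id in bert-base-uncased


def attention_mask(token_ids: Sequence[int], pad_id: int = PAD_ID) -> List[int]:
    """1 for real tokens, 0 for padding (the format BERT's attention_mask uses)."""
    return [0 if t == pad_id else 1 for t in token_ids]


def masked_mean_pool(hidden: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the hidden vectors of unmasked positions only."""
    totals = [0.0] * len(hidden[0])
    count = 0
    for vector, keep in zip(hidden, mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    count = max(count, 1)  # guard against an all-padding row
    return [t / count for t in totals]


ids = [101, 2000, 102, 0, 0]
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0], [9.0, 9.0]]
mask = attention_mask(ids)             # [1, 1, 1, 0, 0]
print(masked_mean_pool(hidden, mask))  # [3.0, 4.0]; a plain mean would give [5.4, 6.0]
```

In the PyTorch model the same idea would be `mask = (text != tokenizer.pad_token_id).float()`, `embedded = self.BertModel(text, attention_mask=mask)[0]`, and `(embedded * mask.unsqueeze(-1)).sum(dim=1) / mask.sum(dim=1, keepdim=True)` in place of `embedded.mean(dim=1)`.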

Pack the data into batches of `BATCH_SIZE` examples with `BucketIterator`.

```python
BATCH_SIZE = 32
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
train_iterator, valid_iterator, test_iterator = torchtext.data.BucketIterator.splits(
    (train_data, valid_data, test_data),
    batch_size = BATCH_SIZE,
    sort_within_batch = True,
    device = device)
```
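`BucketIterator` reduces padding by grouping reviews of similar length, using the dataset's `sort_key` (for torchtext's IMDB, the length of the text). The plain-Python sketch below contrasts batching in arrival order with fully length-sorted batching. It is a simplification: torchtext additionally shuffles the batches and sorts only within large pools of examples. All function names are hypothetical.

```python
from typing import List

Batch = List[List[int]]


def pad_batch(batch: List[List[int]], pad_id: int = 0) -> Batch:
    """Right-pad every sequence to the longest one in its batch."""
    max_len = max(len(s) for s in batch)
    return [s + [pad_id] * (max_len - len(s)) for s in batch]


def naive_batches(seqs: List[List[int]], batch_size: int, pad_id: int = 0) -> List[Batch]:
    """Batch in the original order."""
    return [pad_batch(seqs[i:i + batch_size], pad_id)
            for i in range(0, len(seqs), batch_size)]


def bucket_batches(seqs: List[List[int]], batch_size: int, pad_id: int = 0) -> List[Batch]:
    """Sort by length first so each batch holds similar-length sequences."""
    return naive_batches(sorted(seqs, key=len), batch_size, pad_id)


def count_padding(batches: List[Batch], pad_id: int = 0) -> int:
    """Count pad ids (assumes real token ids never equal pad_id)."""
    return sum(row.count(pad_id) for batch in batches for row in batch)


seqs = [[7] * 5, [7] * 2, [7] * 4, [7] * 1]
print(count_padding(naive_batches(seqs, 2)))   # 6
print(count_padding(bucket_batches(seqs, 2)))  # 2
```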

Instantiate the model on top of the pretrained BERT encoder.

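```python
output_dim = 1
```

Classification head: as a first attempt, a single fully connected layer followed by a sigmoid maps the output into $[0,1]$, which makes it straightforward to compute binary cross-entropy.

```python
classifier = torch.nn.Sequential(
    # Input size follows the encoder (768 for bert-base, 1024 for bert-large)
    nn.Linear(BertModel.config.hidden_size, output_dim),
    nn.Sigmoid()
)
```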

```python
model = SentimentAnalysisModel(BertModel, classifier)
```

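Use `BCELoss` as the loss function and AdamW as the optimizer.

```python
optimizer = torch.optim.AdamW(model.parameters(), lr = 1e-5, eps = 1e-8)
Loss = torch.nn.BCELoss()
# Wrap for multi-GPU data parallelism; .to() moves the parameters in place,
# so the optimizer keeps referencing the same tensors
model = nn.DataParallel(model).to(device)
Loss = Loss.to(device)
```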
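With the sigmoid head, `nn.BCELoss` computes $\ell(p, y) = -[y\log p + (1-y)\log(1-p)]$ with $p = \sigma(z)$. PyTorch also provides `nn.BCEWithLogitsLoss`, which takes the raw logit $z$ and evaluates the same loss in a numerically stable form, $\max(z,0) - zy + \log(1+e^{-|z|})$; switching to it would mean removing `nn.Sigmoid()` from the head and applying `torch.sigmoid` before `cal_acc`. A plain-Python sketch of both forms (helper names are hypothetical):

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce(p: float, y: float) -> float:
    """Binary cross-entropy on a probability p (per-element BCELoss)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def bce_with_logits(z: float, y: float) -> float:
    """Same loss from the raw logit z: max(z, 0) - z*y + log(1 + exp(-|z|))."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


print(sigmoid(40.0))               # 1.0: the probability saturates in float64,
                                   # so bce(sigmoid(40.0), 0) would need log(0)
print(bce_with_logits(40.0, 0))    # 40.0
```

PyTorch's `BCELoss` clamps its log outputs at -100 rather than failing, but the loss and gradient still saturate for such inputs, which the logits form avoids.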

Define the accuracy function: round each predicted probability to 0 or 1 and check whether it equals the true label.

```python
def cal_acc(preds, y):
    rounded_preds = torch.round(preds)
    correct = (rounded_preds == y).float()
    acc = correct.sum() / len(correct)
    return acc
```
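A plain-Python mirror of `cal_acc` makes the rounding rule concrete. `torch.round` rounds halves to the nearest even value, and so does Python's built-in `round`, so a probability of exactly 0.5 counts as class 0. `cal_acc_py` is a hypothetical name for this sketch.

```python
from typing import Sequence


def cal_acc_py(preds: Sequence[float], labels: Sequence[float]) -> float:
    """Round each probability to 0/1 and return the fraction equal to the label.

    round() uses round-half-to-even, like torch.round, so 0.5 -> 0.
    """
    correct = sum(1 for p, y in zip(preds, labels) if round(p) == y)
    return correct / len(labels)


print(cal_acc_py([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0]))  # 0.75
print(cal_acc_py([0.5, 0.5], [0, 1]))                   # 0.5
```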

Training code (with backpropagation and parameter updates) and code that computes the loss and accuracy on the validation set:

```python
def train(model, iterator, optimizer, Loss):
    epoch_loss = 0
    epoch_acc = 0
    model.train()
    for batch in tqdm.tqdm(iterator, desc='training...', file=sys.stdout):
        optimizer.zero_grad()
        text = batch.text
        predictions = model(text).squeeze(1)
        loss = Loss(predictions, batch.label)
        acc = cal_acc(predictions, batch.label)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

def evaluate(model, iterator, Loss):
    epoch_loss = 0
    epoch_acc = 0
    model.eval()
    with torch.no_grad():
        for batch in tqdm.tqdm(iterator, desc='evaluating...', file=sys.stdout):
            text = batch.text
            predictions = model(text).squeeze(1)
            loss = Loss(predictions, batch.label)
            acc = cal_acc(predictions, batch.label)
            epoch_loss += loss.item()
            epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
```
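`train` and `evaluate` return the mean of per-batch averages. When the last batch is smaller (5,000 validation reviews are 156 full batches of 32 plus one batch of 8), that small batch counts as much as a full one, so the reported accuracy can differ slightly from the per-example accuracy; the 20,000 training reviews (625 × 32) have no partial batch. A sketch of an example-weighted alternative, using hypothetical per-batch accuracies and hypothetical helper names:

```python
from typing import Sequence


def mean_of_batch_means(batch_accs: Sequence[float]) -> float:
    """What train()/evaluate() report: every batch has the same weight."""
    return sum(batch_accs) / len(batch_accs)


def example_weighted_mean(batch_accs: Sequence[float], batch_sizes: Sequence[int]) -> float:
    """Accuracy over all examples: each batch is weighted by its size."""
    correct = sum(acc * n for acc, n in zip(batch_accs, batch_sizes))
    return correct / sum(batch_sizes)


# Hypothetical accuracies: 156 full batches at 0.9 and a final batch of 8 at 0.5
sizes = [32] * 156 + [8]
accs = [0.9] * 156 + [0.5]
print(round(mean_of_batch_means(accs), 5))           # 0.89745
print(round(example_weighted_mean(accs, sizes), 5))  # 0.89936
```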

Start training. Earlier experiments suggested that training converges at around 20 epochs, so we train for 30 epochs.

```python
epochs = 30
best_valid_loss = float('inf')
for epoch in range(epochs):
    train_loss, train_acc = train(model, train_iterator, optimizer, Loss)
    valid_loss, valid_acc = evaluate(model, valid_iterator, Loss)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'model.pt')
    print(f'Epoch: {epoch+1:02}')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc:.5f}')
    print(f'\t Val. Loss: {valid_loss:.3f} | Val. Acc: {valid_acc:.5f}')
model.load_state_dict(torch.load('model.pt'))
```

```text
training...: 100%|██████████| 625/625 [03:56<00:00,  2.65it/s]
evaluating...: 100%|██████████| 157/157 [00:20<00:00,  7.48it/s]
Epoch: 01
	Train Loss: 0.271 | Train Acc: 0.88890
	 Val. Loss: 0.213 | Val. Acc: 0.91600
training...: 100%|██████████| 625/625 [03:45<00:00,  2.77it/s]
evaluating...: 100%|██████████| 157/157 [00:20<00:00,  7.65it/s]
Epoch: 02
	Train Loss: 0.163 | Train Acc: 0.94040
	 Val. Loss: 0.196 | Val. Acc: 0.92257
training...: 100%|██████████| 625/625 [03:45<00:00,  2.77it/s]
evaluating...: 100%|██████████| 157/157 [00:21<00:00,  7.44it/s]
Epoch: 03
	Train Loss: 0.106 | Train Acc: 0.96410
	 Val. Loss: 0.248 | Val. Acc: 0.92118
training...: 100%|██████████| 625/625 [03:45<00:00,  2.77it/s]
evaluating...: 100%|██████████| 157/157 [00:20<00:00,  7.54it/s]
Epoch: 04
	Train Loss: 0.061 | Train Acc: 0.98050
	 Val. Loss: 0.245 | Val. Acc: 0.92834
training...: 100%|██████████| 625/625 [03:45<00:00,  2.77it/s]
evaluating...: 100%|██████████| 157/157 [00:21<00:00,  7.47it/s]
Epoc

KeyboardInterrupt: 
```
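In this log the validation loss is lowest at epoch 2 (0.196) and then rises while the training loss keeps falling, so the checkpoint the loop keeps in `model.pt` is the one from epoch 2. Patience-based early stopping, a technique the original loop does not use, would end training automatically once the validation loss stops improving. A hypothetical plain-Python sketch, where `patience` is an assumed parameter and the losses are the four logged values:

```python
from typing import Optional, Sequence, Tuple


def early_stopping(val_losses: Sequence[float], patience: int) -> Tuple[int, Optional[int]]:
    """Return (best_epoch, stop_epoch), both 1-based.

    stop_epoch is the epoch after which training would stop because the
    validation loss did not improve for `patience` consecutive epochs,
    or None if that never happens.
    """
    best_loss = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch
    return best_epoch, None


logged = [0.213, 0.196, 0.248, 0.245]  # validation losses of epochs 1-4
print(early_stopping(logged, patience=2))  # (2, 4)
print(early_stopping(logged, patience=3))  # (2, None)
```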

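```python
def predict_sentiment(text, model, tokenizer, device):
    model.eval()
    # Same preprocessing as the Field: tokenize/truncate, map to ids, add [CLS]/[SEP]
    ids = tokenizer.convert_tokens_to_ids(tokenize_and_cut(text))
    ids = [tokenizer.cls_token_id] + ids + [tokenizer.sep_token_id]
    tensor = torch.LongTensor(ids).unsqueeze(0).to(device)   # [1, seq_len]
    with torch.no_grad():
        # The head ends in a sigmoid: the output is P(label index 1 in train_label.vocab)
        probability = model(tensor).squeeze().item()
    predicted_class = int(probability > 0.5)
    predicted_probability = probability if predicted_class == 1 else 1 - probability
    return predicted_class, predicted_probability
```

```python
text = "This film is terrible!"
```

```python
# Pass the fine-tuned sentiment model, not the bare BERT encoder
predict_sentiment(text, model, tokenizer, device)
```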
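Because the head ends in a sigmoid, the model emits a single number p, the probability of label index 1 in `train_label.vocab` (a softmax over that single output would always be 1.0, so it cannot serve as a confidence). Turning p into a predicted class and its probability is just a threshold. The plain-Python sketch below uses a strict `p > 0.5` so that a tie at 0.5 goes to class 0, matching `torch.round` in `cal_acc`; `class_and_confidence` is a hypothetical name.

```python
from typing import Tuple


def class_and_confidence(p: float) -> Tuple[int, float]:
    """Map a sigmoid output p = P(label index 1) to (class index, probability of that class)."""
    predicted = 1 if p > 0.5 else 0   # 0.5 -> class 0, as torch.round does
    confidence = p if predicted == 1 else 1.0 - p
    return predicted, confidence


print(class_and_confidence(0.8))  # (1, 0.8)
print(class_and_confidence(0.5))  # (0, 0.5)
```

Which sentiment index 1 stands for depends on how `train_label.build_vocab` ordered the labels; `train_label.vocab.itos[predicted]` gives the label string.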

## 3. Hyperparameter Selection

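Below, we change several network hyperparameters, retrain the network, and evaluate it on the validation set to find the best settings.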

First, the number of hidden layers:

| Number of Layers | Best Valid Loss |
| ---------------- | --------------- |
| 1                | 0.293           |
| 2                | **0.263**       |
| 3                | 0.271           |

Hidden layer dimension:

| Hidden Layer Dimension | Best Valid Loss |
| ---------------------- | --------------- |
| 128                    | 0.285           |
| 256                    | **0.263**       |
| 512                    | 0.266           |

Word embedding:

| Word Embedding   | Best Valid Loss |
| ---------------- | --------------- |
| fasttext.en.300d | **0.263**       |
| glove.6B.100d    | 0.274           |

Batch size:

| Batch Size | Best Valid Loss |
| ---------- | --------------- |
| 64         | 0.281           |
| 128        | **0.263**       |
| 256        | 0.274           |
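Each table varies one hyperparameter while, judging by the shared best value of 0.263, the others stay at a common baseline; the best setting per table is the one with the lowest validation loss. Such a one-factor-at-a-time search does not explore interactions between hyperparameters. A plain-Python sketch of the selection step, using the numbers from the tables (`RESULTS` and `best_settings` are hypothetical names):

```python
from typing import Dict, Hashable

# Best validation losses copied from the tables above
RESULTS: Dict[str, Dict[Hashable, float]] = {
    "layer_num": {1: 0.293, 2: 0.263, 3: 0.271},
    "hidden_dim": {128: 0.285, 256: 0.263, 512: 0.266},
    "embedding": {"fasttext.en.300d": 0.263, "glove.6B.100d": 0.274},
    "batch_size": {64: 0.281, 128: 0.263, 256: 0.274},
}


def best_settings(results: Dict[str, Dict[Hashable, float]]) -> Dict[str, Hashable]:
    """For each hyperparameter, pick the value with the lowest validation loss."""
    return {name: min(trials, key=trials.get) for name, trials in results.items()}


print(best_settings(RESULTS))
# {'layer_num': 2, 'hidden_dim': 256, 'embedding': 'fasttext.en.300d', 'batch_size': 128}
```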

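## 4. Test Results

Using the best hyperparameters found on the validation set, we train on the training set and then evaluate on the test set.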

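```python
# Hyperparameters chosen from the validation experiments above
input_dim = tokenizer.vocab_size   # train_text uses use_vocab=False and has no .vocab; take the size from the tokenizer
vector_dim = 300
hidden_dim = 256
output_dim = 1
layer_num = 2
is_bidirectional = True
dropout = 0.5
pad_idx = tokenizer.pad_token_id   # [PAD] id from the BERT tokenizer
```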

```python
epochs = 30
best_valid_loss = float('inf')
for epoch in range(epochs):
    train_loss, train_acc = train(model, train_iterator, optimizer, Loss)
    valid_loss, valid_acc = evaluate(model, valid_iterator, Loss)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'model.pt')
    print(f'Epoch: {epoch+1:02}')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc:.5f}')
    print(f'\t Val. Loss: {valid_loss:.3f} | Val. Acc: {valid_acc:.5f}')
```

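Load the parameters that performed best on the validation set and evaluate them on the test set.

```python
# Restore the checkpoint with the lowest validation loss before testing
model.load_state_dict(torch.load('model.pt'))
test_loss, test_acc = evaluate(model, test_iterator, Loss)
print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc:.5f}')
```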

The test accuracy is 0.89143.