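# Task 1: IMDB-10 Sentiment Classification (adapted from an SNLI template)

This notebook tackles ten-class sentiment classification of IMDB movie reviews (ratings 1-10). The code is adapted from an SNLI template.

The three models are:
1.  **Model 1:** GloVe embeddings + BiLSTM
2.  **Model 2:** BERT-base embeddings + BiLSTM (BERT used as a feature extractor)
3.  **Model 3:** Fine-tuned BERT-base

**Evaluation metrics:** accuracy, macro-averaged F1 (Macro-F1), and root mean squared error (RMSE).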
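As a rough illustration of how the three metrics behave, the sketch below computes them by hand on a made-up set of four predictions. The helper names and the toy labels are assumptions for illustration only; the notebook itself relies on scikit-learn's `accuracy_score`, `f1_score`, and `mean_squared_error`.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def rmse(y_true, y_pred):
    """Root mean squared difference between true and predicted labels."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical toy labels on the 0-9 scale (not taken from the dataset)
y_true = [0, 4, 9, 9]
y_pred = [0, 5, 9, 7]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")
print(f"RMSE     = {rmse(y_true, y_pred):.3f}")
```

Accuracy and Macro-F1 treat every wrong label alike, whereas RMSE grows with the distance between the predicted and the true rating, which makes it a natural complement for an ordinal target. Shifting both label lists by the same constant leaves RMSE unchanged.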

## 1. Environment Setup and Dependency Installation

First, we install the required libraries.

In [2]:
!pip install pandas

## 2. Data Loading and Preprocessing

We load the IMDB dataset from the `.txt.ss` files. Each record contains a movie review text and a rating label.

**Labels:**
- `0-9`: ratings 1-10 shifted down by one

The ratings are converted from 1-10 to 0-9 because `nn.CrossEntropyLoss` expects class indices that start at 0.

In [3]:
import pandas as pd
import re

def load_imdb_data(file_path):
    """Load and parse an IMDB data file, using a regular expression to locate the rating label."""
    texts = []
    labels = []
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            for line in f:
                match = re.search(r'\t\t(\d+)\t\t', line)
                if match:
                    rating = int(match.group(1))
                    text_start_index = match.end()
                    text = line[text_start_index:].strip()
                    text = text.replace('<sssss>', ' ').strip()
                    labels.append(rating - 1)  # shift ratings 1-10 to labels 0-9
                    texts.append(text)
    except FileNotFoundError:
        print(f"错误: 文件未找到 {file_path}。请确保数据文件在当前目录下。")
        return pd.DataFrame({'text': [], 'label': []})
    
    df = pd.DataFrame({'text': texts, 'label': labels})
    return df.reset_index(drop=True)

# File paths
train_file = 'imdb.train.txt.ss'
dev_file = 'imdb.dev.txt.ss'
test_file = 'imdb.test.txt.ss'

# Load all three splits
df_train = load_imdb_data(train_file)
df_val = load_imdb_data(dev_file)
df_test = load_imdb_data(test_file)

if not df_train.empty:
    print(f"训练集大小: {df_train.shape}")
    print(f"验证集大小: {df_val.shape}")
    print(f"测试集大小: {df_test.shape}")
    print("\n数据样本示例:")
    print(df_train.head())
    
    # Show the label distribution
    print("\n训练集标签分布:")
    print(df_train['label'].value_counts().sort_index())
else:
    print("数据加载失败，请检查文件路径！")

训练集大小: (67426, 2)
验证集大小: (8381, 2)
测试集大小: (9112, 2)

数据样本示例:
                                                text  label
0  i excepted a lot from this movie , and it did ...      9
1  this movie is not worth seeing .   has no meri...      0
2  this is a truly remarkable horror movie .   al...      9
3  * minor spoilers * this movie is inept .   so ...      2
4  this is a brilliant horror movie .   fans of t...      9

训练集标签分布:
label
0     1838
1     1589
2     2359
3     3499
4     5653
5     8666
6    12849
7    13673
8     8330
9     8970
Name: count, dtype: int64
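The label distribution above is skewed toward the higher ratings, so it helps to know what a trivial classifier would score. The following sketch is not part of the original notebook: it takes the printed training counts and computes the accuracy and RMSE of always predicting the most frequent label. The variable names are illustrative.

```python
import math

# Training-set label counts copied from the output above (label -> count)
counts = {0: 1838, 1: 1589, 2: 2359, 3: 3499, 4: 5653,
          5: 8666, 6: 12849, 7: 13673, 8: 8330, 9: 8970}

total = sum(counts.values())
majority_label = max(counts, key=counts.get)

# Accuracy of always predicting the most frequent label
baseline_acc = counts[majority_label] / total

# RMSE of that constant prediction, measured on the training labels
squared_error = sum(n * (label - majority_label) ** 2 for label, n in counts.items())
baseline_rmse = math.sqrt(squared_error / total)

print(f"majority label: {majority_label} (rating {majority_label + 1})")
print(f"baseline accuracy: {baseline_acc:.4f}")
print(f"baseline RMSE: {baseline_rmse:.4f}")
```

On the training split this constant predictor reaches roughly 20% accuracy, which is a useful floor to keep in mind when reading the validation and test numbers of the models that follow.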


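## 3. Model 1: GloVe + BiLSTM

This model feeds pretrained GloVe word vectors into a BiLSTM network. Note that in the recorded run below the GloVe file was not found, so the embedding matrix was randomly initialized and then frozen; the Model 1 results therefore do not reflect pretrained GloVe vectors.

### 3.1. GloVe Setup
We use 300-dimensional vectors, taken from either `glove.6B.300d.txt` (smaller) or `glove.840B.300d.txt` (larger).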
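Each line of a GloVe text file holds a token followed by its vector components, separated by spaces. The sketch below shows one way to parse such a line; `parse_glove_line` is a hypothetical helper, not the loader used in this notebook. It splits from the right on the assumption that a token in the larger file may itself contain spaces, so the last `dim` fields are always taken as the vector.

```python
def parse_glove_line(line, dim):
    """Return (word, vector) for one GloVe line, or None if the line is malformed."""
    # Split from the right so that a token containing spaces stays in one piece
    parts = line.rstrip().rsplit(' ', dim)
    if len(parts) != dim + 1:
        return None
    word = parts[0]
    try:
        vector = [float(x) for x in parts[1:]]
    except ValueError:
        # A component that is not a valid float marks a malformed line
        return None
    return word, vector

# Toy 3-dimensional lines (real files use 300 dimensions)
print(parse_glove_line("the 0.1 -0.2 0.3", dim=3))
print(parse_glove_line(". . 0.5 0.5 0.5", dim=3))
print(parse_glove_line("bad 0.1", dim=3))
```

Returning `None` for malformed lines lets the caller skip them explicitly instead of relying on a broad exception handler.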


In [4]:
!pip install scikit-learn

Looking in indexes: http://mirrors.aliyun.com/pypi/simple

In [5]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from collections import Counter
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error
import numpy as np
import os

# --- Configuration ---
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
NUM_CLASSES = 10  # ten-class IMDB
BATCH_SIZE = 64
EMBEDDING_DIM = 300
HIDDEN_DIM = 256
N_LAYERS = 2
DROPOUT = 0.5
EPOCHS = 5
MAX_LEN = 256  # reduced to 256 because of limited compute

# GloVe path
GLOVE_PATH = r"glove.6B.300d.txt"


# Check whether the GloVe file exists
if not os.path.exists(GLOVE_PATH):
    print(f"警告: GloVe文件不存在于: {GLOVE_PATH}")
    print("将使用随机初始化的词向量")
    USE_GLOVE = False
else:
    print(f"找到GloVe文件: {GLOVE_PATH}")
    USE_GLOVE = True

# --- Text preprocessing and vocabulary construction ---
def tokenizer(text):
    text = text.lower()
    text = re.sub(r"[^a-zA-Z0-9' ]+", "", text)
    return text.split()

print("正在构建词汇表...")
word_counts = Counter()
for text in df_train['text']:
    word_counts.update(tokenizer(text))

vocab = sorted(word_counts, key=word_counts.get, reverse=True)
word_to_idx = {word: i+2 for i, word in enumerate(vocab)}
word_to_idx['<pad>'] = 0
word_to_idx['<unk>'] = 1
VOCAB_SIZE = len(word_to_idx)

print(f"词汇表大小: {VOCAB_SIZE}")

# --- GloVe embedding matrix ---
print("正在加载GloVe词向量...")
glove_embeddings = np.zeros((VOCAB_SIZE, EMBEDDING_DIM))
word_found = 0

if USE_GLOVE:
    with open(GLOVE_PATH, 'r', encoding='utf-8', errors='ignore') as f:
        for line_num, line in enumerate(f, 1):
            if line_num % 50000 == 0:
                print(f"已处理 {line_num} 行...")
            
            try:
                parts = line.strip().split(' ')
                word = parts[0]
                
                if word in word_to_idx:
                    if len(parts) >= EMBEDDING_DIM + 1:
                        vector = np.array(parts[1:EMBEDDING_DIM+1], dtype=np.float32)
                        if len(vector) == EMBEDDING_DIM:
                            glove_embeddings[word_to_idx[word]] = vector
                            word_found += 1
            except ValueError:  # skip lines whose vector fields are not valid floats
                continue
    
    print(f"成功加载 {word_found} 个词向量 ({word_found/VOCAB_SIZE*100:.2f}% 的词汇表)")
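else:
    print("使用随机初始化的词向量")
    # Fallback: small random vectors. The model below freezes the embedding layer,
    # so in this case the random vectors are never updated during training.
    glove_embeddings = np.random.normal(0, 0.1, (VOCAB_SIZE, EMBEDDING_DIM))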

glove_embeddings = torch.tensor(glove_embeddings, dtype=torch.float32)

# --- PyTorch dataset ---
class IMDBDataset(Dataset):
    def __init__(self, dataframe, word_to_idx, max_len):
        self.df = dataframe
        self.word_to_idx = word_to_idx
        self.max_len = max_len

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        text = self.df.loc[idx, 'text']
        label = self.df.loc[idx, 'label']

        tokens = [self.word_to_idx.get(word, self.word_to_idx['<unk>']) for word in tokenizer(text)]
        
        # Pad or truncate to max_len
        if len(tokens) < self.max_len:
            tokens.extend([self.word_to_idx['<pad>']] * (self.max_len - len(tokens)))
        else:
            tokens = tokens[:self.max_len]
            
        return torch.tensor(tokens), torch.tensor(label)

# --- Build the data loaders ---
train_dataset = IMDBDataset(df_train, word_to_idx, MAX_LEN)
val_dataset = IMDBDataset(df_val, word_to_idx, MAX_LEN)
test_dataset = IMDBDataset(df_test, word_to_idx, MAX_LEN)

train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE)

print("\n数据准备完成。")

警告: GloVe文件不存在于: glove.6B.300d.txt
将使用随机初始化的词向量
正在构建词汇表...
词汇表大小: 86008
正在加载GloVe词向量...
使用随机初始化的词向量

数据准备完成。
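The padding and truncation step in `IMDBDataset.__getitem__` can be tried in isolation. The sketch below is a standalone approximation with a made-up four-entry vocabulary; `encode` and `toy_vocab` are illustrative names that do not appear in the notebook.

```python
def encode(tokens, word_to_idx, max_len, pad="<pad>", unk="<unk>"):
    """Map tokens to ids, then truncate or right-pad to exactly max_len."""
    ids = [word_to_idx.get(tok, word_to_idx[unk]) for tok in tokens]
    ids = ids[:max_len]                                # truncate long inputs
    ids += [word_to_idx[pad]] * (max_len - len(ids))   # pad short inputs
    return ids

# Hypothetical vocabulary using the same reserved ids as the notebook
toy_vocab = {"<pad>": 0, "<unk>": 1, "great": 2, "movie": 3}

short_ids = encode(["great", "movie"], toy_vocab, max_len=4)
long_ids = encode(["a", "great", "great", "movie", "indeed"], toy_vocab, max_len=4)

print(short_ids)  # short input: padded on the right
print(long_ids)   # long input: unknown words map to <unk>, tail is cut off
```

Because padding is appended on the right and the sequences are not packed, the forward direction of the BiLSTM reads the pad tokens before producing its final state.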


### 3.2. BiLSTM Model Architecture

In [6]:
class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, dropout, pretrained_embeddings):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.embedding.weight.data.copy_(pretrained_embeddings)
        self.embedding.weight.requires_grad = False  # freeze the embeddings

        self.lstm = nn.LSTM(embedding_dim, 
                              hidden_dim, 
                              num_layers=n_layers, 
                              bidirectional=True, 
                              dropout=dropout if n_layers > 1 else 0,
                              batch_first=True)
        
        self.fc = nn.Linear(hidden_dim * 2, output_dim)  # *2 for the two directions of the BiLSTM
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, text):
        embedded = self.dropout(self.embedding(text))
        
        _, (hidden, cell) = self.lstm(embedded)
        
        # Concatenate the last layer's final forward and backward hidden states
        hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1))
            
        return self.fc(hidden)

# Instantiate the model
model1 = BiLSTMClassifier(VOCAB_SIZE, EMBEDDING_DIM, HIDDEN_DIM, NUM_CLASSES, N_LAYERS, DROPOUT, glove_embeddings).to(DEVICE)
print(model1)

BiLSTMClassifier(
  (embedding): Embedding(86008, 300, padding_idx=0)
  (lstm): LSTM(300, 256, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
  (fc): Linear(in_features=512, out_features=10, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)
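The indexing `hidden[-2]` and `hidden[-1]` in `forward` relies on how PyTorch orders the final hidden states of a multi-layer bidirectional LSTM: the first dimension has size `num_layers * 2`, and each layer's forward state is followed by its backward state. The sketch below reproduces that ordering in plain Python; `hidden_row` is a hypothetical helper, not a PyTorch function.

```python
def hidden_row(layer, direction, num_directions=2):
    """Row of h_n holding the final state of `layer` (0-based) in `direction`
    (0 = forward, 1 = backward)."""
    return layer * num_directions + direction

N_LAYERS = 2          # as configured in the notebook
rows = N_LAYERS * 2   # size of the first dimension of h_n

last_forward = hidden_row(N_LAYERS - 1, 0)
last_backward = hidden_row(N_LAYERS - 1, 1)

# The negative indices -2 and -1 address the same two rows
print(last_forward, -2 % rows)
print(last_backward, -1 % rows)
```

Using negative indices keeps the model code correct for any value of `N_LAYERS`, since the top layer's two states are always the last two rows.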


### 3.3. Training and Evaluation Loops

In [7]:
def train_model(model, iterator, optimizer, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.train()
    
    for batch in iterator:
        text, labels = batch
        text, labels = text.to(DEVICE), labels.to(DEVICE)
        
        optimizer.zero_grad()
        predictions = model(text)
        loss = criterion(predictions, labels)
        
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
        
        # Per-batch accuracy
        acc = accuracy_score(labels.cpu(), predictions.argmax(1).cpu())
        epoch_acc += acc
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

def evaluate_model(model, iterator, criterion):
    epoch_loss = 0
    all_preds = []
    all_labels = []
    model.eval()
    
    with torch.no_grad():
        for batch in iterator:
            text, labels = batch
            text, labels = text.to(DEVICE), labels.to(DEVICE)
            
            predictions = model(text)
            loss = criterion(predictions, labels)
            
            epoch_loss += loss.item()
            all_preds.extend(predictions.argmax(1).cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
            
    acc = accuracy_score(all_labels, all_preds)
    f1 = f1_score(all_labels, all_preds, average='macro', zero_division=0)
    
    # RMSE on the original 1-10 rating scale
    preds_rmse = np.array(all_preds) + 1
    labels_rmse = np.array(all_labels) + 1
    rmse = np.sqrt(mean_squared_error(labels_rmse, preds_rmse))
    
    return epoch_loss / len(iterator), acc, f1, rmse

# --- Train Model 1 ---
print("开始训练模型一 (GloVe + BiLSTM)...")
optimizer = optim.Adam(model1.parameters())
criterion = nn.CrossEntropyLoss().to(DEVICE)

for epoch in range(EPOCHS):
    train_loss, train_acc = train_model(model1, train_loader, optimizer, criterion)
    valid_loss, valid_acc, valid_f1, valid_rmse = evaluate_model(model1, val_loader, criterion)
    
    print(f'轮次: {epoch+1:02} | 训练损失: {train_loss:.3f} | 训练准确率: {train_acc*100:.2f}% | 验证损失: {valid_loss:.3f} | 验证准确率: {valid_acc*100:.2f}% | 验证F1: {valid_f1:.3f} | 验证RMSE: {valid_rmse:.3f}')

开始训练模型一 (GloVe + BiLSTM)...
轮次: 01 | 训练损失: 2.100 | 训练准确率: 20.04% | 验证损失: 2.086 | 验证准确率: 20.73% | 验证F1: 0.034 | 验证RMSE: 2.437
轮次: 02 | 训练损失: 2.098 | 训练准确率: 19.99% | 验证损失: 2.086 | 验证准确率: 20.70% | 验证F1: 0.035 | 验证RMSE: 2.437
轮次: 03 | 训练损失: 2.095 | 训练准确率: 20.10% | 验证损失: 2.081 | 验证准确率: 20.73% | 验证F1: 0.034 | 验证RMSE: 2.437
轮次: 04 | 训练损失: 2.096 | 训练准确率: 19.97% | 验证损失: 2.087 | 验证准确率: 20.76% | 验证F1: 0.035 | 验证RMSE: 2.436
轮次: 05 | 训练损失: 2.095 | 训练准确率: 20.11% | 验证损失: 2.078 | 验证准确率: 20.73% | 验证F1: 0.034 | 验证RMSE: 2.437


### 3.4. Final Evaluation of Model 1 on the Test Set

In [8]:
test_loss, test_acc, test_f1, test_rmse = evaluate_model(model1, test_loader, criterion)
print(f'模型一 测试集结果 -> 准确率: {test_acc*100:.2f}% | Macro-F1: {test_f1:.3f} | RMSE: {test_rmse:.3f}')
results = {}
results['模型一 (GloVe + BiLSTM)'] = {'Accuracy': test_acc, 'Macro-F1': test_f1, 'RMSE': test_rmse}

模型一 测试集结果 -> 准确率: 19.62% | Macro-F1: 0.033 | RMSE: 2.495


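## 4. Model 2: BERT Embeddings + BiLSTM

This model uses a pretrained BERT model as a frozen feature extractor: BERT's final-layer token representations are fed into a BiLSTM, and only the BiLSTM and the linear classifier are trained.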

In [9]:
!pip install transformers

Looking in indexes: http://mirrors.aliyun.com/pypi/simple

In [14]:
from transformers import BertTokenizer, BertModel

# --- Configuration ---
BERT_MODEL_NAME = 'bert-base-uncased'
MAX_LEN_BERT = 256  # reduced to 256 because of limited compute
BATCH_SIZE_BERT = 64

# --- BERT tokenizer ---
local_bert_path = "/root/models/bert-base-uncased"

if os.path.exists(local_bert_path):
    print(f"从本地路径加载BERT分词器: {local_bert_path}")
    tokenizer_bert = BertTokenizer.from_pretrained(local_bert_path)
else:
    print(f"从网络下载BERT分词器: {BERT_MODEL_NAME}")
    tokenizer_bert = BertTokenizer.from_pretrained(BERT_MODEL_NAME)

# --- PyTorch dataset for BERT ---
class IMDBDatasetBERT(Dataset):
    def __init__(self, dataframe, tokenizer, max_len):
        self.df = dataframe
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        text = self.df.loc[idx, 'text']
        label = self.df.loc[idx, 'label']

        encoding = self.tokenizer(  # calling the tokenizer directly replaces the deprecated encode_plus
            text,
            add_special_tokens=True,
            max_length=self.max_len,
            return_token_type_ids=False,
            padding='max_length',
            truncation=True,
            return_attention_mask=True,
            return_tensors='pt',
        )

        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(label, dtype=torch.long)
        }

# --- Build the data loaders ---
train_dataset_bert = IMDBDatasetBERT(df_train, tokenizer_bert, MAX_LEN_BERT)
val_dataset_bert = IMDBDatasetBERT(df_val, tokenizer_bert, MAX_LEN_BERT)
test_dataset_bert = IMDBDatasetBERT(df_test, tokenizer_bert, MAX_LEN_BERT)

train_loader_bert = DataLoader(train_dataset_bert, batch_size=BATCH_SIZE_BERT, shuffle=True)
val_loader_bert = DataLoader(val_dataset_bert, batch_size=BATCH_SIZE_BERT)
test_loader_bert = DataLoader(test_dataset_bert, batch_size=BATCH_SIZE_BERT)
print("用于BERT的数据准备完成。")

从本地路径加载BERT分词器: /root/models/bert-base-uncased
用于BERT的数据准备完成。


### 4.1. BERT+BiLSTM Model Architecture

In [15]:
class BertBiLSTMClassifier(nn.Module):
    def __init__(self, bert, hidden_dim, output_dim, n_layers, dropout):
        super().__init__()
        self.bert = bert
        embedding_dim = bert.config.hidden_size

        self.lstm = nn.LSTM(embedding_dim,
                              hidden_dim,
                              num_layers=n_layers,
                              bidirectional=True,
                              dropout=dropout if n_layers > 1 else 0,
                              batch_first=True)

        self.fc = nn.Linear(hidden_dim * 2, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, input_ids, attention_mask):
        # BERT is frozen: run it without tracking gradients
        with torch.no_grad():
            embedded = self.bert(input_ids=input_ids, attention_mask=attention_mask)[0]
        
        _, (hidden, cell) = self.lstm(embedded)
        
        hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1))
        
        return self.fc(hidden)

# Load the pretrained BERT model
if os.path.exists(local_bert_path):
    bert_model = BertModel.from_pretrained(local_bert_path)
else:
    bert_model = BertModel.from_pretrained(BERT_MODEL_NAME)

# Freeze BERT's parameters
for param in bert_model.parameters():
    param.requires_grad = False

# Instantiate the model
model2 = BertBiLSTMClassifier(bert_model, HIDDEN_DIM, NUM_CLASSES, N_LAYERS, DROPOUT).to(DEVICE)
print("模型二已创建")

模型二已创建


### 4.2. Training and Evaluation (Model 2)

In [16]:
def train_bert_bilstm(model, iterator, optimizer, criterion):
    model.train()
    epoch_loss = 0
    for batch in iterator:
        input_ids = batch['input_ids'].to(DEVICE)
        attention_mask = batch['attention_mask'].to(DEVICE)
        labels = batch['labels'].to(DEVICE)

        optimizer.zero_grad()
        predictions = model(input_ids, attention_mask)
        loss = criterion(predictions, labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    return epoch_loss / len(iterator)

def evaluate_bert_bilstm(model, iterator, criterion):
    model.eval()
    epoch_loss = 0
    all_preds = []
    all_labels = []
    with torch.no_grad():
        for batch in iterator:
            input_ids = batch['input_ids'].to(DEVICE)
            attention_mask = batch['attention_mask'].to(DEVICE)
            labels = batch['labels'].to(DEVICE)
            
            predictions = model(input_ids, attention_mask)
            loss = criterion(predictions, labels)
            epoch_loss += loss.item()
            all_preds.extend(predictions.argmax(1).cpu().numpy())
            all_labels.extend(labels.cpu().numpy())

    acc = accuracy_score(all_labels, all_preds)
    f1 = f1_score(all_labels, all_preds, average='macro', zero_division=0)
    
    # RMSE on the original 1-10 rating scale
    preds_rmse = np.array(all_preds) + 1
    labels_rmse = np.array(all_labels) + 1
    rmse = np.sqrt(mean_squared_error(labels_rmse, preds_rmse))
    
    return epoch_loss / len(iterator), acc, f1, rmse

# --- Train Model 2 ---
print("\n开始训练模型二 (BERT + BiLSTM)...")
optimizer = optim.Adam(model2.parameters())
criterion = nn.CrossEntropyLoss().to(DEVICE)

# Fewer training epochs (3 instead of EPOCHS = 5)
for epoch in range(3):
    train_loss = train_bert_bilstm(model2, train_loader_bert, optimizer, criterion)
    valid_loss, valid_acc, valid_f1, valid_rmse = evaluate_bert_bilstm(model2, val_loader_bert, criterion)
    print(f'轮次: {epoch+1:02} | 训练损失: {train_loss:.3f} | 验证损失: {valid_loss:.3f} | 验证准确率: {valid_acc*100:.2f}% | 验证F1: {valid_f1:.3f} | 验证RMSE: {valid_rmse:.3f}')


开始训练模型二 (BERT + BiLSTM)...
轮次: 01 | 训练损失: 1.836 | 验证损失: 1.679 | 验证准确率: 32.35% | 验证F1: 0.253 | 验证RMSE: 1.643
轮次: 02 | 训练损失: 1.697 | 验证损失: 1.647 | 验证准确率: 33.12% | 验证F1: 0.235 | 验证RMSE: 1.595
轮次: 03 | 训练损失: 1.652 | 验证损失: 1.640 | 验证准确率: 34.04% | 验证F1: 0.244 | 验证RMSE: 1.674


### 4.3. Final Evaluation of Model 2 on the Test Set

In [17]:
test_loss, test_acc, test_f1, test_rmse = evaluate_bert_bilstm(model2, test_loader_bert, criterion)
print(f'模型二 测试集结果 -> 准确率: {test_acc*100:.2f}% | Macro-F1: {test_f1:.3f} | RMSE: {test_rmse:.3f}')
results['模型二 (BERT嵌入 + BiLSTM)'] = {'Accuracy': test_acc, 'Macro-F1': test_f1, 'RMSE': test_rmse}

模型二 测试集结果 -> 准确率: 33.70% | Macro-F1: 0.240 | RMSE: 1.729


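## 5. Model 3: Fine-Tuned BERT

Here we take a pretrained BERT model with a classification head and fine-tune the entire model on this task. Of the three models in this notebook, it achieves the best test results (41.17% accuracy, 0.361 Macro-F1, 1.453 RMSE).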

In [18]:
from transformers import BertForSequenceClassification, get_linear_schedule_with_warmup
from torch.optim import AdamW

# --- Load the model ---
if os.path.exists(local_bert_path):
    model3 = BertForSequenceClassification.from_pretrained(
        local_bert_path,
        num_labels=NUM_CLASSES,
        output_attentions=False,
        output_hidden_states=False,
    ).to(DEVICE)
else:
    model3 = BertForSequenceClassification.from_pretrained(
        BERT_MODEL_NAME,
        num_labels=NUM_CLASSES,
        output_attentions=False,
        output_hidden_states=False,
    ).to(DEVICE)

# --- Optimizer and learning-rate scheduler ---
optimizer = AdamW(model3.parameters(), lr=2e-5, eps=1e-8)
EPOCHS_BERT_FINETUNE = 3
total_steps = len(train_loader_bert) * EPOCHS_BERT_FINETUNE
scheduler = get_linear_schedule_with_warmup(optimizer, 
                                            num_warmup_steps=0, 
                                            num_training_steps=total_steps)

criterion = nn.CrossEntropyLoss().to(DEVICE)
print("BERT微调模型已加载。")

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at /root/models/bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BERT微调模型已加载。
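With `num_warmup_steps=0`, the linear schedule configured above lowers the learning rate from `2e-5` to zero over the whole run. The sketch below mirrors my understanding of the multiplier that such a linear warmup-and-decay schedule applies; `linear_schedule_factor` is an illustrative re-implementation, not the library function, and the step count assumes the 67,426 training rows and batch size 64 shown earlier.

```python
import math

def linear_schedule_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # Linear ramp from 0 to 1 during warmup
        return step / max(1, num_warmup_steps)
    # Linear decay from 1 to 0 over the remaining steps
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

BASE_LR = 2e-5
steps_per_epoch = math.ceil(67426 / 64)   # training rows / batch size
total_steps = steps_per_epoch * 3         # EPOCHS_BERT_FINETUNE = 3

for step in (0, total_steps // 2, total_steps):
    lr = BASE_LR * linear_schedule_factor(step, 0, total_steps)
    print(f"step {step:>4}: lr = {lr:.2e}")
```

Since the scheduler is stepped once per batch in the training loop, the rate is halved by the middle of the second epoch and reaches zero at the last batch of the third.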


### 5.1. Training and Evaluation (Model 3)

In [19]:
def train_bert_finetune(model, iterator, optimizer, scheduler, criterion):
    model.train()
    epoch_loss = 0
    for batch in iterator:
        input_ids = batch['input_ids'].to(DEVICE)
        attention_mask = batch['attention_mask'].to(DEVICE)
        labels = batch['labels'].to(DEVICE)

        optimizer.zero_grad()
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        epoch_loss += loss.item()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        scheduler.step()
        
    return epoch_loss / len(iterator)

def evaluate_bert_finetune(model, iterator, criterion):
    model.eval()
    epoch_loss = 0
    all_preds = []
    all_labels = []
    with torch.no_grad():
        for batch in iterator:
            input_ids = batch['input_ids'].to(DEVICE)
            attention_mask = batch['attention_mask'].to(DEVICE)
            labels = batch['labels'].to(DEVICE)
            
            outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs.loss
            logits = outputs.logits
            
            epoch_loss += loss.item()
            all_preds.extend(logits.argmax(1).cpu().numpy())
            all_labels.extend(labels.cpu().numpy())

    acc = accuracy_score(all_labels, all_preds)
    f1 = f1_score(all_labels, all_preds, average='macro', zero_division=0)
    
    # RMSE on the original 1-10 rating scale
    preds_rmse = np.array(all_preds) + 1
    labels_rmse = np.array(all_labels) + 1
    rmse = np.sqrt(mean_squared_error(labels_rmse, preds_rmse))
    
    return epoch_loss / len(iterator), acc, f1, rmse

# --- Train Model 3 ---
print("\n开始训练模型三 (微调BERT)...")
for epoch in range(EPOCHS_BERT_FINETUNE):
    train_loss = train_bert_finetune(model3, train_loader_bert, optimizer, scheduler, criterion)
    valid_loss, valid_acc, valid_f1, valid_rmse = evaluate_bert_finetune(model3, val_loader_bert, criterion)
    print(f'轮次: {epoch+1:02} | 训练损失: {train_loss:.3f} | 验证损失: {valid_loss:.3f} | 验证准确率: {valid_acc*100:.2f}% | 验证F1: {valid_f1:.3f} | 验证RMSE: {valid_rmse:.3f}')


开始训练模型三 (微调BERT)...
轮次: 01 | 训练损失: 1.675 | 验证损失: 1.528 | 验证准确率: 39.08% | 验证F1: 0.312 | 验证RMSE: 1.452
轮次: 02 | 训练损失: 1.442 | 验证损失: 1.494 | 验证准确率: 41.01% | 验证F1: 0.366 | 验证RMSE: 1.415
轮次: 03 | 训练损失: 1.323 | 验证损失: 1.509 | 验证准确率: 40.68% | 验证F1: 0.364 | 验证RMSE: 1.417
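In the log above, validation loss and accuracy are both best after the second epoch and slip slightly in the third. One common option, which this notebook does not implement, is to keep the checkpoint from the best validation epoch. The sketch below shows only the selection step, using the logged numbers; saving and restoring the weights is left out, and the variable names are illustrative.

```python
# (epoch, validation loss, validation accuracy) copied from the log above
val_log = [
    (1, 1.528, 0.3908),
    (2, 1.494, 0.4101),
    (3, 1.509, 0.4068),
]

# Pick the epoch with the lowest validation loss and the one with the highest accuracy
best_by_loss = min(val_log, key=lambda row: row[1])
best_by_acc = max(val_log, key=lambda row: row[2])

print(f"lowest validation loss: epoch {best_by_loss[0]}")
print(f"highest validation accuracy: epoch {best_by_acc[0]}")
```

Both criteria agree here, so either would select the same checkpoint; when they disagree, the choice of criterion should be fixed before looking at the test set.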


### 5.2. Final Evaluation of Model 3 on the Test Set

In [33]:
test_loss, test_acc, test_f1, test_rmse = evaluate_bert_finetune(model3, test_loader_bert, criterion)
print(f'模型三 测试集结果 -> 准确率: {test_acc*100:.2f}% | Macro-F1: {test_f1:.3f} | RMSE: {test_rmse:.3f}')
results['模型三 (微调BERT)'] = {'Accuracy': test_acc, 'Macro-F1': test_f1, 'RMSE': test_rmse}

模型三 测试集结果 -> 准确率: 41.17% | Macro-F1: 0.361 | RMSE: 1.453


## 6. Summary and Performance Comparison

In [28]:
# Build the results DataFrame
df_results = pd.DataFrame(results).T
df_results['Accuracy'] = df_results['Accuracy'].apply(lambda x: f"{x*100:.2f}%")
df_results['Macro-F1'] = df_results['Macro-F1'].apply(lambda x: f"{x:.4f}")
df_results['RMSE'] = df_results['RMSE'].apply(lambda x: f"{x:.4f}")

print("--- IMDB-10测试集最终性能对比 ---")
print(df_results)

# Save the results
df_results.to_csv('imdb_results.csv')
print("\n结果已保存到 imdb_results.csv")

--- IMDB-10测试集最终性能对比 ---
                      Accuracy Macro-F1    RMSE
模型一 (GloVe + BiLSTM)    19.62%   0.0328  2.4954
模型二 (BERT嵌入 + BiLSTM)   33.70%   0.2401  1.7289
模型三 (微调BERT)            41.17%   0.3610  1.4529

结果已保存到 imdb_results.csv
