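# Using BERT + BiLSTM as a Sentiment Analyzer
## Unified preprocessing:
1. Convert letters to lowercase
2. Remove URLs
3. Remove punctuation (all punctuation marks are removed)
4. Remove non-English letters
5. Compute class weights (class_weight)
6. Remove stop words (stop_words), keeping `not`, `very`, and `really`
7. Do not replace usernames (replace_username: False)
8. Do not replace COVID-related terms (replace_covid: False)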

### 2025/05/29 01:49 by sky
## Staged grid search over the following hyperparameters:
### Stage 1
    batch sizes = [64, 128, 256]
    learning rates = [0.01, 0.001, 0.0001]
### Stage 2
    hidden_dim = [128, 256, 512]
    num_layers = [2, 3, 4]
    dropout = [0.1, 0.2, 0.3]
### Stage 3
    max lengths = [30, 50]
### Stage 4
    pooling methods = ['hidden state', 'max pooling', 'mean pooling']
    activation functions = [None, 'ReLU', 'tanh', 'softmax']

## Step 1: Load packages

In [1]:
import torch
torch.cuda.is_available()

True

In [2]:
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from transformers import BertTokenizer, BertModel
import re
import string
from sklearn.utils.class_weight import compute_class_weight
import matplotlib.pyplot as plt
from tqdm import tqdm
from torch.optim.lr_scheduler import CosineAnnealingLR
import nltk
from nltk.corpus import stopwords
import matplotlib.font_manager as fm
from itertools import product
import seaborn as sns
from sklearn.metrics import confusion_matrix, f1_score
import copy

In [3]:
# Use a font with CJK glyphs so Chinese text in plots renders correctly
plt.rcParams['font.sans-serif'] = ['Microsoft JhengHei']
plt.rcParams['axes.unicode_minus'] = False
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\skych\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

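## Step 2: Set parameters

In [4]:
OUTPUT_DIM = 5  # five sentiment classes
EPOCHS = 15  # maximum number of training epochs
PATIENCE = 5  # early-stopping patience (epochs without improvement)
CLIP_GRAD_NORM = 1.0  # maximum gradient norm for clipping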
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [5]:
default_params = {
    'batch_size': 64,
    'learning_rate': 0.0001,
    'hidden_dim': 256,
    'num_layers': 4,
    'dropout': 0.2,
    'max_length': 50,
    'pooling_method': 'max pooling',  # corrected to 'max pooling'
    'activation_function': 'softmax'
}

In [6]:
# Per-stage search grids
stage_params = {
    'stage1': {
        'batch_size': [64, 128, 256],
        'learning_rate': [0.01, 0.001, 0.0001]
    },
    'stage2': {
        'hidden_dim': [128, 256, 512],
        'num_layers': [2, 3, 4],
        'dropout': [0.1, 0.2, 0.3]
    },
    'stage3': {
        'max_length': [30, 50]
    },
    'stage4': {
        'pooling_method': ['hidden state', 'max pooling', 'mean pooling'],
        'activation_function': [None, 'ReLU', 'tanh', 'softmax']
    }
}
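
Searching one stage at a time keeps the number of training runs far below that of a full grid over all eight hyperparameters. The sketch below counts both, using a copy of the `stage_params` grids from the cell above; `count_trials` is a helper name introduced only for this illustration.

```python
from itertools import product
from math import prod

# Copy of the stage_params grids defined in the cell above.
stage_params = {
    'stage1': {'batch_size': [64, 128, 256], 'learning_rate': [0.01, 0.001, 0.0001]},
    'stage2': {'hidden_dim': [128, 256, 512], 'num_layers': [2, 3, 4], 'dropout': [0.1, 0.2, 0.3]},
    'stage3': {'max_length': [30, 50]},
    'stage4': {
        'pooling_method': ['hidden state', 'max pooling', 'mean pooling'],
        'activation_function': [None, 'ReLU', 'tanh', 'softmax'],
    },
}


def count_trials(grid):
    """Number of combinations in one stage's grid (hypothetical helper)."""
    return len(list(product(*grid.values())))


# Staged search: the stages run one after another, so their sizes add up.
staged_trials = sum(count_trials(grid) for grid in stage_params.values())
# Full grid: every combination across all stages, so their sizes multiply.
full_grid_trials = prod(count_trials(grid) for grid in stage_params.values())

print(f"staged search: {staged_trials} training runs")
print(f"full grid:     {full_grid_trials} training runs")
```

The trade-off is that a staged search only finds the best value of each group given the earlier choices, so interactions between stages (for example learning rate and hidden size) are not explored.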

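## Step 3: Data-cleaning function

In [7]:
# Step 3: data-cleaning function
# NOTE: the output recorded under Step 4 came from a run whose patterns used doubled
# backslashes (r'http\\S+', r'[^a-z\\s]'). Those left URL text in place and deleted
# every space, hence the run-together strings and the word count of 1 shown there.
def clean_text(text):
    # Convert to lowercase
    text = text.lower()
    # Remove URLs (in a raw string, \S already means "non-whitespace")
    text = re.sub(r'http\S+|www\S+|https\S+', '', text, flags=re.MULTILINE)
    # Remove punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Remove everything except lowercase English letters and whitespace
    text = re.sub(r'[^a-z\s]', '', text)
    # Custom stop words (keep sentiment-bearing words)
    custom_stop_words = set(stopwords.words('english')) - {'not', 'very', 'really'}
    text = ' '.join(word for word in text.split() if word not in custom_stop_words)
    # Collapse extra whitespace
    text = ' '.join(text.split())
    return text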
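
The difference between `\S` and `\\S` inside a raw string is easy to miss, and it decides whether the cleaned tweets keep their word boundaries. The following minimal sketch applies both spellings to a made-up sample sentence (the sentence and the variable names are assumptions for illustration only).

```python
import re

sample = "Stay safe https://t.co/abc123 and wash your hands"

# Doubled backslashes in a raw string: 'http\\S+' looks for "http", a literal
# backslash and the letter S, so no URL matches. The class [^a-z\\s] keeps only
# a-z, the backslash and the letter "s", so every space is deleted.
doubled = re.sub(r'http\\S+', '', sample.lower())
doubled = re.sub(r'[^a-z\\s]', '', doubled)

# Single backslashes: \S is "non-whitespace" and \s is "whitespace".
single = re.sub(r'http\S+', '', sample.lower())
single = re.sub(r'[^a-z\s]', '', single)
single = ' '.join(single.split())

print(repr(doubled), len(doubled.split()))
print(repr(single), len(single.split()))
```

With the doubled form the whole tweet collapses into one token, which matches a reported mean and maximum length of 1 word.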

## Step 4: Load the data

In [8]:
train_df = pd.read_csv('Corona_NLP_train.csv', encoding='latin1')
test_df = pd.read_csv('Corona_NLP_test.csv', encoding='latin1')

# Clean the text
train_df['clean_text'] = train_df['OriginalTweet'].apply(clean_text)
test_df['clean_text'] = test_df['OriginalTweet'].apply(clean_text)

# Check the length (in words) of the cleaned text
text_lengths = train_df['clean_text'].apply(lambda x: len(x.split()))
print(f"Mean length: {text_lengths.mean():.2f}, max length: {text_lengths.max()}")

print("First 5 rows of the cleaned training data:")
print(train_df[['OriginalTweet', 'clean_text']].head())
print("\nFirst 5 rows of the cleaned test data:")
print(test_df[['OriginalTweet', 'clean_text']].head())

Mean length: 1.00, max length: 1
First 5 rows of the cleaned training data:
                                       OriginalTweet  \
0  @MeNyrbie @Phil_Gahan @Chrisitv https://t.co/i...   
1  advice Talk to your neighbours family to excha...   
2  Coronavirus Australia: Woolworths to give elde...   
3  My food stock is not the only one which is emp...   
4  Me, ready to go at supermarket during the #COV...   

                                          clean_text  
0  menyrbiephilgahanchrisitvhttpstcoifzfanpaandht...  
1  advicetalktoyourneighboursfamilytoexchangephon...  
2  coronavirusaustraliawoolworthstogiveelderlydis...  
3  myfoodstockisnottheonlyonewhichisemptypleasedo...  
4  mereadytogoatsupermarketduringthecovidoutbreak...  

First 5 rows of the cleaned test data:
                                       OriginalTweet  \
0  TRENDING: New Yorkers encounter empty supermar...   
1  When I couldn't find hand sanitizer at Fred Me...   
2  Find out how you can protect yourself and love...   
3  #Panic buying hits #NewYork City as anxious sh...   
4  #to

## Step 5: Label encoding & class weights

In [9]:
sentiment_mapping = {
    'Extremely Negative': 0,
    'Negative': 1,
    'Neutral': 2,
    'Positive': 3,
    'Extremely Positive': 4
}
train_df['label'] = train_df['Sentiment'].map(sentiment_mapping)
test_df['label'] = test_df['Sentiment'].map(sentiment_mapping)

print("\nTraining set class distribution:")
print(train_df['label'].value_counts())
print("\nTest set class distribution:")
print(test_df['label'].value_counts())

# Compute class weights
class_weights = compute_class_weight('balanced', classes=np.unique(train_df['label']), y=train_df['label'])
class_weights = torch.tensor(class_weights, dtype=torch.float).to(device)


Training set class distribution:
label
3    11422
1     9917
2     7713
4     6624
0     5481
Name: count, dtype: int64

Test set class distribution:
label
1    1041
3     947
2     619
4     599
0     592
Name: count, dtype: int64


## Step 6: Create a custom dataset

In [10]:
class TweetDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_len):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_len = max_len
    
    def __len__(self):
        return len(self.texts)
    
    def __getitem__(self, idx):
        text = str(self.texts[idx])
        label = self.labels[idx]
        encoding = self.tokenizer.encode_plus(
            text,
            add_special_tokens=True,
            max_length=self.max_len,
            return_token_type_ids=False,
            padding='max_length',
            truncation=True,
            return_attention_mask=True,
            return_tensors='pt'
        )
        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(label, dtype=torch.long)
        }
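
To make the effect of `max_length`, `padding='max_length'` and `truncation=True` concrete, here is a toy stand-in written in plain Python. It is not the tokenizer itself: `pad_or_truncate` is a hypothetical helper, the word-piece ids 2000 and up are made up, and only the special-token ids (101 for `[CLS]`, 102 for `[SEP]`, 0 for `[PAD]`) follow the `bert-base-uncased` vocabulary.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # special-token ids in bert-base-uncased


def pad_or_truncate(token_ids, max_len):
    """Toy illustration of fixed-length encoding; returns (input_ids, attention_mask)."""
    body = token_ids[: max_len - 2]          # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)    # 1 marks real tokens
    n_pad = max_len - len(input_ids)
    return input_ids + [PAD_ID] * n_pad, attention_mask + [0] * n_pad  # 0 marks padding


# A short tweet is padded, a long one is cut, and both end up with max_len entries.
short_ids, short_mask = pad_or_truncate([2000, 2001, 2002], max_len=8)
long_ids, long_mask = pad_or_truncate(list(range(2000, 2020)), max_len=8)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

A smaller `max_length` therefore shortens every sequence the BiLSTM has to process, at the cost of dropping the tail of longer tweets.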

## Step 7: Define the BERT-BiLSTM model

In [11]:
class BertBiLSTM(nn.Module):
    def __init__(self, hidden_dim, output_dim, num_layers, dropout, pooling_method, activation_function):
        super().__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        # Freeze encoder layers 0-9; the embeddings and the last two encoder layers stay trainable
        for param in self.bert.encoder.layer[:10].parameters():
            param.requires_grad = False
        self.lstm = nn.LSTM(
            input_size=768,
            hidden_size=hidden_dim,
            num_layers=num_layers,
            batch_first=True,
            bidirectional=True
        )
        self.dropout = nn.Dropout(dropout)
        self.pooling_method = pooling_method
        # 'max pooling' concatenates max- and mean-pooled features (4 * hidden_dim);
        # 'hidden state' and 'mean pooling' both produce 2 * hidden_dim features
        if pooling_method == 'max pooling':
            self.fc = nn.Linear(hidden_dim * 4, output_dim)
        else:
            self.fc = nn.Linear(hidden_dim * 2, output_dim)
        self.activation_function = activation_function
        # Optional activation on the classifier output. nn.CrossEntropyLoss applies
        # log-softmax internally, so with 'softmax' the loss receives probabilities
        # in [0, 1] instead of raw logits.
        if activation_function == 'ReLU':
            self.activation = nn.ReLU()
        elif activation_function == 'tanh':
            self.activation = nn.Tanh()
        elif activation_function == 'softmax':
            self.activation = nn.Softmax(dim=1)
        else:
            self.activation = None
    
    def forward(self, input_ids, attention_mask):
        bert_outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        embedded = bert_outputs[0]  # last hidden state: (batch, seq_len, 768)
        lstm_out, (hidden, _) = self.lstm(embedded)
        if self.pooling_method == 'hidden state':
            # Final hidden states of the top layer, forward and backward directions
            pooled = torch.cat((hidden[-2], hidden[-1]), dim=1)
        elif self.pooling_method == 'max pooling':
            # Pooling runs over all time steps, padding positions included
            max_pool, _ = torch.max(lstm_out, dim=1)
            avg_pool = torch.mean(lstm_out, dim=1)
            pooled = torch.cat((max_pool, avg_pool), dim=1)
        elif self.pooling_method == 'mean pooling':
            pooled = torch.mean(lstm_out, dim=1)
        else:
            raise ValueError(f"Invalid pooling_method: {self.pooling_method}. Expected 'hidden state', 'max pooling', or 'mean pooling'.")
        pooled = self.dropout(pooled)
        output = self.fc(pooled)
        if self.activation is not None:
            output = self.activation(output)
        return output
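
The `'softmax'` activation deserves a closer look because the model output is later passed to `nn.CrossEntropyLoss`, which computes `-log(softmax(scores)[target])` on its own. The plain-Python sketch below re-implements that formula (the helper names and the example logits are assumptions for illustration) to show what happens when the scores have already been squashed by a softmax.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, target):
    """Unweighted cross-entropy for one sample: -log(softmax(scores)[target])."""
    return -math.log(softmax(scores)[target])


confident_logits = [10.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical output, true class is 0

loss_from_logits = cross_entropy(confident_logits, target=0)
loss_from_probs = cross_entropy(softmax(confident_logits), target=0)  # softmax applied twice

# Best possible case when the loss is fed probabilities: a perfect one-hot prediction.
floor_with_softmax = cross_entropy([1.0, 0.0, 0.0, 0.0, 0.0], target=0)
uniform_loss = math.log(5)  # loss when all five classes look equally likely

print(f"loss on raw logits:          {loss_from_logits:.4f}")
print(f"loss on softmax output:      {loss_from_probs:.4f}")
print(f"lowest reachable with softmax: {floor_with_softmax:.4f}")
print(f"uniform prediction (ln 5):   {uniform_loss:.4f}")
```

Under these assumptions the loss cannot fall much below 0.90 once a softmax sits in front of it, and a model that outputs near-uniform scores sits at ln 5 ≈ 1.6094. This is offered as a possible contributing factor to flat loss curves, not as a full diagnosis.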

## Step 8: Training and validation functions

In [12]:
def train_epoch(model, data_loader, criterion, optimizer, device):
    model.train()
    total_loss = 0
    correct = 0
    total = 0
    for batch in tqdm(data_loader, desc="Training"):
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)
        optimizer.zero_grad()
        outputs = model(input_ids, attention_mask)
        loss = criterion(outputs, labels)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP_GRAD_NORM)
        optimizer.step()
        total_loss += loss.item()
        _, preds = torch.max(outputs, dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return total_loss / len(data_loader), correct / total

def evaluate(model, data_loader, criterion, device):
    model.eval()
    total_loss = 0
    correct = 0
    total = 0
    all_preds = []
    all_labels = []
    with torch.no_grad():
        for batch in tqdm(data_loader, desc="Validation"):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)
            outputs = model(input_ids, attention_mask)
            loss = criterion(outputs, labels)
            total_loss += loss.item()
            _, preds = torch.max(outputs, dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
    f1 = f1_score(all_labels, all_preds, average='weighted')
    return total_loss / len(data_loader), correct / total, f1, all_preds, all_labels
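
`train_epoch` calls `torch.nn.utils.clip_grad_norm_` with `CLIP_GRAD_NORM = 1.0`. As a rough illustration of what that step does, the sketch below rescales a set of made-up gradients so that their combined L2 norm does not exceed the limit. `clip_by_global_norm` is a hypothetical plain-Python helper that mirrors the documented behaviour, not the PyTorch implementation.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale every gradient by one shared factor so the global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for group in grads for g in group))
    scale = min(1.0, max_norm / (total_norm + eps))  # never scales gradients up
    clipped = [[g * scale for g in group] for group in grads]
    return clipped, total_norm


grads = [[3.0, 4.0], [12.0]]  # hypothetical gradients of two parameter tensors
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for group in clipped for g in group))

print(f"norm before clipping: {norm_before:.4f}")
print(f"norm after clipping:  {norm_after:.4f}")
```

Because all gradients share one scale factor, the update keeps its direction and only its length is limited.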

## Step 9: Staged hyperparameter search

In [13]:
def build_loaders(tokenizer, max_length, batch_size):
    """Build the train and test DataLoaders. The test split doubles as the validation set."""
    train_dataset = TweetDataset(
        texts=train_df['clean_text'].to_numpy(),
        labels=train_df['label'].to_numpy(),
        tokenizer=tokenizer,
        max_len=max_length
    )
    test_dataset = TweetDataset(
        texts=test_df['clean_text'].to_numpy(),
        labels=test_df['label'].to_numpy(),
        tokenizer=tokenizer,
        max_len=max_length
    )
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=batch_size)
    return train_loader, test_loader


def build_training_objects(params):
    """Create the model, loss, optimizer and scheduler for one parameter set."""
    model = BertBiLSTM(
        hidden_dim=params['hidden_dim'],
        output_dim=OUTPUT_DIM,
        num_layers=params['num_layers'],
        dropout=params['dropout'],
        pooling_method=params['pooling_method'],
        activation_function=params['activation_function']
    ).to(device)
    criterion = nn.CrossEntropyLoss(weight=class_weights).to(device)
    optimizer = optim.Adam([
        {'params': model.bert.parameters(), 'lr': params['learning_rate'], 'weight_decay': 1e-4},
        {'params': list(model.lstm.parameters()) + list(model.fc.parameters()), 'lr': params['learning_rate'], 'weight_decay': 1e-4}
    ])
    scheduler = CosineAnnealingLR(optimizer, T_max=10)
    return model, criterion, optimizer, scheduler


def run_trial(params, tokenizer):
    """Train one configuration with early stopping; return the last epoch's val accuracy and F1."""
    train_loader, test_loader = build_loaders(tokenizer, params['max_length'], params['batch_size'])
    model, criterion, optimizer, scheduler = build_training_objects(params)
    best_val_loss = float('inf')
    counter = 0
    for epoch in range(EPOCHS):
        train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
        val_loss, val_acc, val_f1, _, _ = evaluate(model, test_loader, criterion, device)
        scheduler.step()
        print(f'Epoch {epoch+1}/{EPOCHS}, train loss: {train_loss:.4f}, train accuracy: {train_acc:.4f}')
        print(f'Val loss: {val_loss:.4f}, val accuracy: {val_acc:.4f}, F1: {val_f1:.4f}')
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            counter = 0
        else:
            counter += 1
            if counter >= PATIENCE:
                print(f'Early stopping at epoch {epoch+1}')
                break
    return val_acc, val_f1


def staged_search():
    best_params = default_params.copy()
    best_val_acc = 0
    results = []
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

    def try_params(stage, trial):
        """Run one trial on top of the current best parameters and record its result."""
        nonlocal best_val_acc
        val_acc, val_f1 = run_trial({**best_params, **trial}, tokenizer)
        results.append({'stage': stage, **trial, 'val_acc': val_acc, 'val_f1': val_f1})
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            best_params.update(trial)

    # Stage 1: batch_size and learning_rate
    print("\nStage 1: testing batch_size and learning_rate")
    for batch_size, learning_rate in product(stage_params['stage1']['batch_size'], stage_params['stage1']['learning_rate']):
        print(f"\nTesting parameters: batch_size={batch_size}, learning_rate={learning_rate}")
        try_params(1, {'batch_size': batch_size, 'learning_rate': learning_rate})
    print(f"\nBest stage 1 parameters: batch_size={best_params['batch_size']}, learning_rate={best_params['learning_rate']}, val accuracy={best_val_acc:.4f}")

    # Stage 2: hidden_dim, num_layers, dropout
    print("\nStage 2: testing hidden_dim, num_layers, dropout")
    for hidden_dim, num_layers, dropout in product(stage_params['stage2']['hidden_dim'], stage_params['stage2']['num_layers'], stage_params['stage2']['dropout']):
        print(f"\nTesting parameters: hidden_dim={hidden_dim}, num_layers={num_layers}, dropout={dropout}")
        try_params(2, {'hidden_dim': hidden_dim, 'num_layers': num_layers, 'dropout': dropout})
    print(f"\nBest stage 2 parameters: hidden_dim={best_params['hidden_dim']}, num_layers={best_params['num_layers']}, dropout={best_params['dropout']}, val accuracy={best_val_acc:.4f}")

    # Stage 3: max_length
    print("\nStage 3: testing max_length")
    for max_length in stage_params['stage3']['max_length']:
        print(f"\nTesting parameters: max_length={max_length}")
        try_params(3, {'max_length': max_length})
    print(f"\nBest stage 3 parameters: max_length={best_params['max_length']}, val accuracy={best_val_acc:.4f}")

    # Stage 4: pooling_method and activation_function
    print("\nStage 4: testing pooling_method and activation_function")
    for pooling_method, activation_function in product(stage_params['stage4']['pooling_method'], stage_params['stage4']['activation_function']):
        print(f"\nTesting parameters: pooling_method={pooling_method}, activation_function={activation_function}")
        try_params(4, {'pooling_method': pooling_method, 'activation_function': activation_function})
    print(f"\nBest stage 4 parameters: pooling_method={best_params['pooling_method']}, activation_function={best_params['activation_function']}, val accuracy={best_val_acc:.4f}")

    # Save the results
    results_df = pd.DataFrame(results)
    results_df.to_csv('staged_search_results.csv', index=False)
    print(f"\nFinal best parameters: {best_params}")
    print(f"Final best val accuracy: {best_val_acc:.4f}")

    # Final training with the best parameters, followed by the confusion matrix
    print("\nFinal training with the best parameters...")
    train_loader, test_loader = build_loaders(tokenizer, best_params['max_length'], best_params['batch_size'])
    model, criterion, optimizer, scheduler = build_training_objects(best_params)
    train_losses = []
    val_losses = []
    val_accuracies = []
    best_val_loss = float('inf')
    counter = 0
    best_model_state = None

    for epoch in range(EPOCHS):
        train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
        val_loss, val_acc, val_f1, _, _ = evaluate(model, test_loader, criterion, device)
        scheduler.step()
        train_losses.append(train_loss)
        val_losses.append(val_loss)
        val_accuracies.append(val_acc)
        print(f'Epoch {epoch+1}/{EPOCHS}')
        print(f'Train loss: {train_loss:.4f}, train accuracy: {train_acc:.4f}')
        print(f'Val loss: {val_loss:.4f}, val accuracy: {val_acc:.4f}, F1: {val_f1:.4f}')
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            best_model_state = copy.deepcopy(model.state_dict())
            counter = 0
        else:
            counter += 1
            if counter >= PATIENCE:
                print(f'Early stopping at epoch {epoch+1}')
                break

    # Load the best checkpoint and re-evaluate it, so the confusion matrix describes
    # the restored model rather than the last epoch that happened to run
    if best_model_state is not None:
        model.load_state_dict(best_model_state)
    _, _, _, all_preds, all_labels = evaluate(model, test_loader, criterion, device)

    # Confusion matrix
    sentiment_labels = ['Extremely Negative', 'Negative', 'Neutral', 'Positive', 'Extremely Positive']
    cm = confusion_matrix(all_labels, all_preds)
    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=sentiment_labels, yticklabels=sentiment_labels)
    plt.xlabel('Predicted label')
    plt.ylabel('True label')
    plt.title('Confusion matrix (BERT + BiLSTM)')
    plt.tight_layout()
    plt.savefig('confusion_matrix_bert_bilstm.png')
    plt.show()

    # Training/validation curves
    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    plt.plot(range(1, len(train_losses)+1), train_losses, label='Training loss')
    plt.plot(range(1, len(val_losses)+1), val_losses, label='Validation loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Training/validation loss curves')
    plt.legend()
    plt.subplot(1, 2, 2)
    plt.plot(range(1, len(val_accuracies)+1), val_accuracies, label='Validation accuracy', color='green')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.title('Validation accuracy curve')
    plt.legend()
    plt.tight_layout()
    plt.savefig('final_training_curves.png')
    plt.show()
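
Every training run above pairs `CosineAnnealingLR(optimizer, T_max=10)` with up to `EPOCHS = 15` epochs. The sketch below evaluates the closed-form cosine-annealing expression from the PyTorch documentation for that setting, assuming the scheduler keeps following the same expression beyond `T_max` and using `eta_min = 0` (the scheduler default). `cosine_annealing_lr` is a hypothetical helper, and the base rate of 1e-4 is the `learning_rate` from `default_params`.

```python
import math


def cosine_annealing_lr(base_lr, step, t_max, eta_min=0.0):
    """Closed form: eta_min + (base_lr - eta_min) * (1 + cos(pi * step / t_max)) / 2."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2


EPOCHS, T_MAX, BASE_LR = 15, 10, 1e-4

# scheduler.step() runs at the end of each epoch, so epoch e trains with the rate at step e - 1.
schedule = [cosine_annealing_lr(BASE_LR, epoch - 1, T_MAX) for epoch in range(1, EPOCHS + 1)]

for epoch, lr in enumerate(schedule, start=1):
    print(f"epoch {epoch:2d}: lr = {lr:.2e}")
```

Under these assumptions the rate reaches zero for epoch 11 and then climbs again for epochs 12 to 15. Setting `T_max` to the number of epochs would give a single monotonic decay instead, if that is the intended behaviour.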

## Step 10: Final training and visualization

In [14]:
staged_search()


Stage 1: testing batch_size and learning_rate

Testing parameters: batch_size=64, learning_rate=0.01

Training: 100%|██████████| 644/644 [03:06<00:00,  3.45it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.32it/s]

Epoch 1/15, train loss: 1.7002, train accuracy: 0.1989
Val loss: 1.6162, val accuracy: 0.2738, F1: 0.1178

Training: 100%|██████████| 644/644 [03:14<00:00,  3.30it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.29it/s]

Epoch 2/15, train loss: 1.7029, train accuracy: 0.2041
Val loss: 1.6797, val accuracy: 0.2741, F1: 0.1179

Training: 100%|██████████| 644/644 [03:18<00:00,  3.24it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.25it/s]

Epoch 3/15, train loss: 1.7023, train accuracy: 0.2388
Val loss: 1.7097, val accuracy: 0.1577, F1: 0.0430

Training: 100%|██████████| 644/644 [03:18<00:00,  3.25it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.20it/s]

Epoch 4/15, train loss: 1.6550, train accuracy: 0.2041
Val loss: 1.6146, val accuracy: 0.1630, F1: 0.0457

Training: 100%|██████████| 644/644 [03:23<00:00,  3.16it/s]
Validation: 100%|██████████| 60/60 [00:10<00:00,  5.72it/s]

Epoch 5/15, train loss: 1.6851, train accuracy: 0.2162
Val loss: 1.6758, val accuracy: 0.1559, F1: 0.0420

Training: 100%|██████████| 644/644 [03:20<00:00,  3.21it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.22it/s]

Epoch 6/15, train loss: 1.7013, train accuracy: 0.2251
Val loss: 1.7262, val accuracy: 0.2493, F1: 0.0995

Training: 100%|██████████| 644/644 [03:19<00:00,  3.22it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.18it/s]

Epoch 7/15, train loss: 1.6643, train accuracy: 0.2274
Val loss: 1.6091, val accuracy: 0.2493, F1: 0.0995

Training: 100%|██████████| 644/644 [03:21<00:00,  3.20it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.00it/s]

Epoch 8/15, train loss: 1.6099, train accuracy: 0.2169
Val loss: 1.6095, val accuracy: 0.2741, F1: 0.1179

Training: 100%|██████████| 644/644 [03:21<00:00,  3.20it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.14it/s]

Epoch 9/15, train loss: 1.6096, train accuracy: 0.2278
Val loss: 1.6094, val accuracy: 0.2741, F1: 0.1179

Training: 100%|██████████| 644/644 [03:20<00:00,  3.20it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.06it/s]

Epoch 10/15, train loss: 1.6095, train accuracy: 0.2585
Val loss: 1.6095, val accuracy: 0.2741, F1: 0.1179

Training: 100%|██████████| 644/644 [03:21<00:00,  3.20it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.28it/s]

Epoch 11/15, train loss: 1.6094, train accuracy: 0.2410
Val loss: 1.6095, val accuracy: 0.2741, F1: 0.1179

Training: 100%|██████████| 644/644 [03:20<00:00,  3.21it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.24it/s]

Epoch 12/15, train loss: 1.6095, train accuracy: 0.2572
Val loss: 1.6097, val accuracy: 0.2493, F1: 0.0995
Early stopping at epoch 12
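
Several epochs in the log above report a validation accuracy of 0.2741 together with a weighted F1 of 0.1179, and others report 0.2493 with 0.0995. The sketch below checks what a classifier would score if it gave every test tweet the same label, using the test-set counts printed in Step 5. `constant_prediction_scores` is a hypothetical helper, and the weighted F1 follows the support-weighted average that `f1_score(..., average='weighted')` computes, with never-predicted classes counted as F1 = 0.

```python
# Test-set label counts copied from the Step 5 output.
test_counts = {0: 592, 1: 1041, 2: 619, 3: 947, 4: 599}


def constant_prediction_scores(counts, predicted_label):
    """Accuracy and support-weighted F1 when every sample receives the same label."""
    total = sum(counts.values())
    support = counts[predicted_label]
    accuracy = support / total
    precision = support / total  # every sample is predicted as this label
    recall = 1.0                 # so none of its true samples are missed
    f1 = 2 * precision * recall / (precision + recall)
    # The other classes are never predicted, so their F1 is 0 and they drop out.
    weighted_f1 = (support / total) * f1
    return accuracy, weighted_f1


scores = {label: constant_prediction_scores(test_counts, label) for label in sorted(test_counts)}
for label, (acc, f1) in scores.items():
    print(f"always predict {label}: accuracy={acc:.4f}, weighted F1={f1:.4f}")
```

The accuracy and F1 pairs in the log coincide with these constant-prediction values, which suggests (without proving it) that in those epochs the model assigned a single class to the whole test set.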

Testing parameters: batch_size=64, learning_rate=0.001

Training: 100%|██████████| 644/644 [03:21<00:00,  3.20it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.25it/s]

Epoch 1/15, train loss: 1.6027, train accuracy: 0.2054
Val loss: 1.6096, val accuracy: 0.2493, F1: 0.0995

Training: 100%|██████████| 644/644 [03:21<00:00,  3.20it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.05it/s]

Epoch 2/15, train loss: 1.6095, train accuracy: 0.2038
Val loss: 1.6104, val accuracy: 0.2493, F1: 0.0995

Training: 100%|██████████| 644/644 [03:21<00:00,  3.19it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.16it/s]

Epoch 3/15, train loss: 1.6095, train accuracy: 0.2498
Val loss: 1.6098, val accuracy: 0.1577, F1: 0.0430

Training: 100%|██████████| 644/644 [03:18<00:00,  3.24it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.21it/s]

Epoch 4/15, train loss: 1.6095, train accuracy: 0.2432
Val loss: 1.6096, val accuracy: 0.2493, F1: 0.0995

Training: 100%|██████████| 644/644 [03:16<00:00,  3.28it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.23it/s]

Epoch 5/15, train loss: 1.6095, train accuracy: 0.2368
Val loss: 1.6093, val accuracy: 0.2493, F1: 0.0995

Training: 100%|██████████| 644/644 [03:14<00:00,  3.32it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.23it/s]

Epoch 6/15, train loss: 1.6095, train accuracy: 0.2619
Val loss: 1.6096, val accuracy: 0.2493, F1: 0.0995

Training: 100%|██████████| 644/644 [03:12<00:00,  3.34it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.36it/s]

Epoch 7/15, train loss: 1.6094, train accuracy: 0.2754
Val loss: 1.6096, val accuracy: 0.1630, F1: 0.0457

Training: 100%|██████████| 644/644 [03:13<00:00,  3.33it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.31it/s]

Epoch 8/15, train loss: 1.6094, train accuracy: 0.2163
Val loss: 1.6096, val accuracy: 0.2493, F1: 0.0995

Training: 100%|██████████| 644/644 [03:12<00:00,  3.35it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.11it/s]

Epoch 9/15, train loss: 1.6094, train accuracy: 0.2775
Val loss: 1.6096, val accuracy: 0.2493, F1: 0.0995

Training: 100%|██████████| 644/644 [03:12<00:00,  3.34it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.27it/s]

Epoch 10/15, train loss: 1.6094, train accuracy: 0.2775
Val loss: 1.6096, val accuracy: 0.2493, F1: 0.0995
Early stopping at epoch 10

Testing parameters: batch_size=64, learning_rate=0.0001


Training: 100%|██████████| 644/644 [03:19<00:00,  3.23it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.13it/s]


Epoch 1/15, train loss: 1.5840, train accuracy: 0.2062
Validation loss: 1.5756, validation accuracy: 0.2059, F1: 0.1079


Training: 100%|██████████| 644/644 [03:19<00:00,  3.22it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.20it/s]


Epoch 2/15, train loss: 1.5800, train accuracy: 0.2078
Validation loss: 1.5734, validation accuracy: 0.2130, F1: 0.1249


Training: 100%|██████████| 644/644 [03:19<00:00,  3.23it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.16it/s]


Epoch 3/15, train loss: 1.5708, train accuracy: 0.2197
Validation loss: 1.5710, validation accuracy: 0.2104, F1: 0.1253


Training: 100%|██████████| 644/644 [03:19<00:00,  3.24it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.14it/s]


Epoch 4/15, train loss: 1.5638, train accuracy: 0.2308
Validation loss: 1.5706, validation accuracy: 0.2088, F1: 0.1352


Training: 100%|██████████| 644/644 [03:18<00:00,  3.24it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.33it/s]


Epoch 5/15, train loss: 1.5568, train accuracy: 0.2327
Validation loss: 1.5700, validation accuracy: 0.2125, F1: 0.1403


Training: 100%|██████████| 644/644 [03:18<00:00,  3.25it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.18it/s]


Epoch 6/15, train loss: 1.5522, train accuracy: 0.2351
Validation loss: 1.5767, validation accuracy: 0.2088, F1: 0.1327


Training: 100%|██████████| 644/644 [03:18<00:00,  3.24it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.15it/s]


Epoch 7/15, train loss: 1.5490, train accuracy: 0.2383
Validation loss: 1.5749, validation accuracy: 0.2127, F1: 0.1391


Training: 100%|██████████| 644/644 [03:19<00:00,  3.24it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.08it/s]


Epoch 8/15, train loss: 1.5462, train accuracy: 0.2465
Validation loss: 1.5745, validation accuracy: 0.2146, F1: 0.1411


Training: 100%|██████████| 644/644 [03:19<00:00,  3.23it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.22it/s]


Epoch 9/15, train loss: 1.5437, train accuracy: 0.2546
Validation loss: 1.5760, validation accuracy: 0.2114, F1: 0.1376


Training: 100%|██████████| 644/644 [03:19<00:00,  3.23it/s]
Validation: 100%|██████████| 60/60 [00:08<00:00,  7.22it/s]


Epoch 10/15, train loss: 1.5423, train accuracy: 0.2514
Validation loss: 1.5763, validation accuracy: 0.2120, F1: 0.1383
Early stopping at epoch 10
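
The stop at epoch 10 in the batch_size=64, learning_rate=0.0001 run is consistent with early stopping on validation loss with a patience of 5: the lowest validation loss (1.5700) occurs at epoch 5, and the next five epochs do not improve on it. The sketch below replays that rule on the logged validation losses. The function name `early_stop_epoch` is hypothetical, and the assumption that the notebook monitors validation loss (rather than accuracy or F1) is inferred from the log, not taken from the training code.

```python
PATIENCE = 5  # same value as PATIENCE in the notebook's parameter cell


def early_stop_epoch(val_losses, patience=PATIENCE):
    """Return the 1-based epoch at which training stops, or None if it never stops.

    Assumed rule: stop once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Validation losses logged for batch_size=64, learning_rate=0.0001
val_losses = [1.5756, 1.5734, 1.5710, 1.5706, 1.5700,
              1.5767, 1.5749, 1.5745, 1.5760, 1.5763]

print(early_stop_epoch(val_losses))  # 10
```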

Testing parameters: batch_size=128, learning_rate=0.01


Training: 100%|██████████| 322/322 [03:10<00:00,  1.69it/s]
Validation: 100%|██████████| 30/30 [00:08<00:00,  3.73it/s]


Epoch 1/15, train loss: 1.6981, train accuracy: 0.2010
Validation loss: 1.6745, validation accuracy: 0.1559, F1: 0.0420


Training: 100%|██████████| 322/322 [03:10<00:00,  1.69it/s]
Validation: 100%|██████████| 30/30 [00:07<00:00,  3.77it/s]


Epoch 2/15, train loss: 1.7028, train accuracy: 0.1829
Validation loss: 1.7268, validation accuracy: 0.2493, F1: 0.0995


Training: 100%|██████████| 322/322 [03:10<00:00,  1.69it/s]
Validation: 100%|██████████| 30/30 [00:08<00:00,  3.70it/s]


Epoch 3/15, train loss: 1.6491, train accuracy: 0.2188
Validation loss: 1.6380, validation accuracy: 0.1577, F1: 0.0430


Training: 100%|██████████| 322/322 [03:08<00:00,  1.71it/s]
Validation: 100%|██████████| 30/30 [00:08<00:00,  3.74it/s]


Epoch 4/15, train loss: 1.7057, train accuracy: 0.1760
Validation loss: 1.6748, validation accuracy: 0.1559, F1: 0.0420


Training: 100%|██████████| 322/322 [03:09<00:00,  1.70it/s]
Validation: 100%|██████████| 30/30 [00:07<00:00,  3.77it/s]


Epoch 5/15, train loss: 1.7007, train accuracy: 0.2180
Validation loss: 1.7268, validation accuracy: 0.2493, F1: 0.0995


Training: 100%|██████████| 322/322 [03:09<00:00,  1.70it/s]
Validation: 100%|██████████| 30/30 [00:08<00:00,  3.71it/s]


Epoch 6/15, train loss: 1.7060, train accuracy: 0.1929
Validation loss: 1.6792, validation accuracy: 0.2741, F1: 0.1179


Training: 100%|██████████| 322/322 [03:09<00:00,  1.70it/s]
Validation: 100%|██████████| 30/30 [00:07<00:00,  3.84it/s]


Epoch 7/15, train loss: 1.7040, train accuracy: 0.2000
Validation loss: 1.6748, validation accuracy: 0.1559, F1: 0.0420


Training: 100%|██████████| 322/322 [03:06<00:00,  1.73it/s]
Validation: 100%|██████████| 30/30 [00:07<00:00,  3.75it/s]


Epoch 8/15, train loss: 1.7050, train accuracy: 0.1618
Validation loss: 1.7330, validation accuracy: 0.1630, F1: 0.0457
Early stopping at epoch 8
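
In the batch_size=128, learning_rate=0.01 run the validation accuracy takes only five distinct values (0.2493, 0.2741, 0.1559, 0.1630, 0.1577), and these sum to 1.0000. That pattern is what one would expect if the model predicts a single class for every validation example and the five values are the class proportions. The paired F1 values fit the same reading: for a constant predictor on a class with proportion p, precision is p and recall is 1, so that class has F1 = 2p/(1+p) and the support-weighted average over all classes is 2p²/(1+p). The sketch below checks this against the log. The helper name is hypothetical, and the assumption that the notebook reports a support-weighted F1 (for example `f1_score(..., average='weighted')`) is inferred from the numbers, not from the code.

```python
def constant_predictor_weighted_f1(p):
    """Support-weighted F1 when every example is assigned to one class
    whose share of the data is p (all other classes score F1 = 0)."""
    per_class_f1 = 2 * p / (1 + p)  # precision = p, recall = 1
    return p * per_class_f1         # weight the class by its support


# (validation accuracy, F1) pairs as they appear in the log
logged = {
    0.2493: 0.0995,
    0.2741: 0.1179,
    0.1559: 0.0420,
    0.1630: 0.0457,
    0.1577: 0.0430,
}

print(f"sum of accuracies: {sum(logged):.4f}")
for acc, f1 in logged.items():
    predicted = constant_predictor_weighted_f1(acc)
    print(f"acc={acc:.4f}  logged F1={f1:.4f}  "
          f"formula={predicted:.4f}  diff={abs(predicted - f1):.5f}")
```

The formula and the logged F1 agree to within the rounding of the four-decimal accuracies, which suggests these epochs are collapsed predictions rather than genuine learning.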

Testing parameters: batch_size=128, learning_rate=0.001


Training: 100%|██████████| 322/322 [03:09<00:00,  1.70it/s]
Validation: 100%|██████████| 30/30 [00:08<00:00,  3.69it/s]


Epoch 1/15, train loss: 1.6013, train accuracy: 0.2164
Validation loss: 1.6094, validation accuracy: 0.2741, F1: 0.1179


Training: 100%|██████████| 322/322 [03:09<00:00,  1.70it/s]
Validation: 100%|██████████| 30/30 [00:08<00:00,  3.75it/s]


Epoch 2/15, train loss: 1.6096, train accuracy: 0.2173
Validation loss: 1.6097, validation accuracy: 0.2741, F1: 0.1179


Training: 100%|██████████| 322/322 [03:04<00:00,  1.74it/s]
Validation: 100%|██████████| 30/30 [00:07<00:00,  3.77it/s]


Epoch 3/15, train loss: 1.6095, train accuracy: 0.2522
Validation loss: 1.6094, validation accuracy: 0.2741, F1: 0.1179


Training: 100%|██████████| 322/322 [03:01<00:00,  1.77it/s]
Validation: 100%|██████████| 30/30 [00:07<00:00,  3.86it/s]


Epoch 4/15, train loss: 1.6095, train accuracy: 0.2286
Validation loss: 1.6093, validation accuracy: 0.2493, F1: 0.0995


Training: 100%|██████████| 322/322 [03:00<00:00,  1.78it/s]
Validation: 100%|██████████| 30/30 [00:07<00:00,  3.86it/s]


Epoch 5/15, train loss: 1.6095, train accuracy: 0.2592
Validation loss: 1.6096, validation accuracy: 0.2493, F1: 0.0995


Training: 100%|██████████| 322/322 [03:00<00:00,  1.79it/s]
Validation: 100%|██████████| 30/30 [00:07<00:00,  3.80it/s]


Epoch 6/15, train loss: 1.6095, train accuracy: 0.2385
Validation loss: 1.6096, validation accuracy: 0.2493, F1: 0.0995


Training: 100%|██████████| 322/322 [02:59<00:00,  1.79it/s]
Validation: 100%|██████████| 30/30 [00:07<00:00,  3.85it/s]


Epoch 7/15, train loss: 1.6095, train accuracy: 0.2724
Validation loss: 1.6096, validation accuracy: 0.2493, F1: 0.0995


Training: 100%|██████████| 322/322 [02:58<00:00,  1.80it/s]
Validation: 100%|██████████| 30/30 [00:07<00:00,  3.82it/s]


Epoch 8/15, train loss: 1.6094, train accuracy: 0.2761
Validation loss: 1.6096, validation accuracy: 0.2493, F1: 0.0995


Training: 100%|██████████| 322/322 [02:57<00:00,  1.81it/s]
Validation: 100%|██████████| 30/30 [00:07<00:00,  3.90it/s]


Epoch 9/15, train loss: 1.6094, train accuracy: 0.2775
Validation loss: 1.6096, validation accuracy: 0.2493, F1: 0.0995
Early stopping at epoch 9
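
The plateau at a training loss of 1.6094 in the batch_size=128, learning_rate=0.001 run has a simple interpretation: ln 5 ≈ 1.6094 is the cross-entropy of a model that assigns probability 1/5 to each of the five classes. A loss pinned at that value suggests the network is emitting near-uniform outputs and has stopped learning. The same holds with class weights if the batch loss is a weighted mean of per-example cross-entropies, since every example then contributes the same ln 5. The snippet below is a minimal check of that number; `cross_entropy` here is a plain-Python helper written for illustration, not the loss function used in the notebook.

```python
import math

NUM_CLASSES = 5  # matches OUTPUT_DIM in the notebook


def cross_entropy(probs, target):
    """Cross-entropy of one example given predicted class probabilities."""
    return -math.log(probs[target])


# A model with no information spreads probability evenly over the classes.
uniform = [1.0 / NUM_CLASSES] * NUM_CLASSES
chance_loss = cross_entropy(uniform, target=0)

print(f"{chance_loss:.4f}")  # 1.6094
```

Runs whose loss stays at or above this value (learning rates 0.01 and 0.001 in this section) are performing no better than chance by this measure; only the learning_rate=0.0001 runs move below it.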

Testing parameters: batch_size=128, learning_rate=0.0001


Training: 100%|██████████| 322/322 [03:04<00:00,  1.74it/s]
Validation: 100%|██████████| 30/30 [00:07<00:00,  3.81it/s]


Epoch 1/15, train loss: 1.5852, train accuracy: 0.2064
Validation loss: 1.5753, validation accuracy: 0.2059, F1: 0.1079


Training: 100%|██████████| 322/322 [03:08<00:00,  1.71it/s]
Validation: 100%|██████████| 30/30 [00:08<00:00,  3.74it/s]


Epoch 2/15, train loss: 1.5825, train accuracy: 0.2056
Validation loss: 1.5767, validation accuracy: 0.2104, F1: 0.1098


Training: 100%|██████████| 322/322 [03:10<00:00,  1.69it/s]
Validation: 100%|██████████| 30/30 [00:07<00:00,  3.76it/s]


Epoch 3/15, train loss: 1.5782, train accuracy: 0.2098
Validation loss: 1.5727, validation accuracy: 0.2043, F1: 0.1164


Training: 100%|██████████| 322/322 [03:05<00:00,  1.73it/s]
Validation: 100%|██████████| 30/30 [00:07<00:00,  3.82it/s]


Epoch 4/15, train loss: 1.5698, train accuracy: 0.2100
Validation loss: 1.5731, validation accuracy: 0.2143, F1: 0.1332


Training: 100%|██████████| 322/322 [03:02<00:00,  1.76it/s]
Validation: 100%|██████████| 30/30 [00:07<00:00,  3.76it/s]


Epoch 5/15, train loss: 1.5645, train accuracy: 0.2237
Validation loss: 1.5739, validation accuracy: 0.2151, F1: 0.1421


Training: 100%|██████████| 322/322 [03:01<00:00,  1.77it/s]
Validation: 100%|██████████| 30/30 [00:07<00:00,  3.85it/s]


Epoch 6/15, train loss: 1.5589, train accuracy: 0.2290
Validation loss: 1.5749, validation accuracy: 0.2085, F1: 0.1330


Training: 100%|██████████| 322/322 [03:02<00:00,  1.77it/s]
Validation: 100%|██████████| 30/30 [00:07<00:00,  3.80it/s]


Epoch 7/15, train loss: 1.5551, train accuracy: 0.2344
Validation loss: 1.5716, validation accuracy: 0.2135, F1: 0.1392


Training: 100%|██████████| 322/322 [03:10<00:00,  1.69it/s]
Validation: 100%|██████████| 30/30 [00:09<00:00,  3.16it/s]


Epoch 8/15, train loss: 1.5517, train accuracy: 0.2415
Validation loss: 1.5723, validation accuracy: 0.2117, F1: 0.1371


Training: 100%|██████████| 322/322 [03:22<00:00,  1.59it/s]
Validation: 100%|██████████| 30/30 [00:10<00:00,  2.85it/s]


Epoch 9/15, train loss: 1.5498, train accuracy: 0.2371
Validation loss: 1.5732, validation accuracy: 0.2112, F1: 0.1364


Training: 100%|██████████| 322/322 [03:27<00:00,  1.55it/s]
Validation: 100%|██████████| 30/30 [00:09<00:00,  3.18it/s]


Epoch 10/15, train loss: 1.5485, train accuracy: 0.2424
Validation loss: 1.5734, validation accuracy: 0.2112, F1: 0.1361


Training: 100%|██████████| 322/322 [03:25<00:00,  1.57it/s]
Validation: 100%|██████████| 30/30 [00:10<00:00,  2.89it/s]


Epoch 11/15, train loss: 1.5485, train accuracy: 0.2446
Validation loss: 1.5734, validation accuracy: 0.2112, F1: 0.1361


Training: 100%|██████████| 322/322 [03:26<00:00,  1.56it/s]
Validation: 100%|██████████| 30/30 [00:09<00:00,  3.12it/s]


Epoch 12/15, train loss: 1.5477, train accuracy: 0.2447
Validation loss: 1.5735, validation accuracy: 0.2109, F1: 0.1360
Early stopping at epoch 12

Testing parameters: batch_size=256, learning_rate=0.01


Training: 100%|██████████| 161/161 [05:08<00:00,  1.92s/it]
Validation: 100%|██████████| 15/15 [00:09<00:00,  1.56it/s]


Epoch 1/15, train loss: 1.6704, train accuracy: 0.2244
Validation loss: 1.7324, validation accuracy: 0.1630, F1: 0.0457


Training: 100%|██████████| 161/161 [04:59<00:00,  1.86s/it]
Validation: 100%|██████████| 15/15 [00:09<00:00,  1.59it/s]


Epoch 2/15, train loss: 1.7037, train accuracy: 0.2328
Validation loss: 1.7270, validation accuracy: 0.2493, F1: 0.0995


Training: 100%|██████████| 161/161 [05:01<00:00,  1.87s/it]
Validation: 100%|██████████| 15/15 [00:09<00:00,  1.59it/s]


Epoch 3/15, train loss: 1.7044, train accuracy: 0.1690
Validation loss: 1.6736, validation accuracy: 0.1559, F1: 0.0420


Training: 100%|██████████| 161/161 [04:56<00:00,  1.84s/it]
Validation: 100%|██████████| 15/15 [00:09<00:00,  1.52it/s]


Epoch 4/15, train loss: 1.7041, train accuracy: 0.1699
Validation loss: 1.7324, validation accuracy: 0.1630, F1: 0.0457


Training: 100%|██████████| 161/161 [04:54<00:00,  1.83s/it]
Validation: 100%|██████████| 15/15 [00:09<00:00,  1.62it/s]


Epoch 5/15, train loss: 1.6419, train accuracy: 0.2018
Validation loss: 1.6112, validation accuracy: 0.2493, F1: 0.0995


Training: 100%|██████████| 161/161 [04:52<00:00,  1.82s/it]
Validation: 100%|██████████| 15/15 [00:09<00:00,  1.66it/s]


Epoch 6/15, train loss: 1.6104, train accuracy: 0.1936
Validation loss: 1.6100, validation accuracy: 0.1577, F1: 0.0430


Training: 100%|██████████| 161/161 [04:54<00:00,  1.83s/it]
Validation: 100%|██████████| 15/15 [00:09<00:00,  1.60it/s]


Epoch 7/15, train loss: 1.6098, train accuracy: 0.2179
Validation loss: 1.6088, validation accuracy: 0.1559, F1: 0.0420


Training: 100%|██████████| 161/161 [04:47<00:00,  1.79s/it]
Validation: 100%|██████████| 15/15 [00:08<00:00,  1.68it/s]


Epoch 8/15, train loss: 1.6096, train accuracy: 0.1880
Validation loss: 1.6098, validation accuracy: 0.2741, F1: 0.1179


Training: 100%|██████████| 161/161 [04:48<00:00,  1.79s/it]
Validation: 100%|██████████| 15/15 [00:09<00:00,  1.64it/s]


Epoch 9/15, train loss: 1.6095, train accuracy: 0.2234
Validation loss: 1.6101, validation accuracy: 0.2493, F1: 0.0995


Training: 100%|██████████| 161/161 [04:46<00:00,  1.78s/it]
Validation: 100%|██████████| 15/15 [00:08<00:00,  1.67it/s]


Epoch 10/15, train loss: 1.6095, train accuracy: 0.2230
Validation loss: 1.6099, validation accuracy: 0.2493, F1: 0.0995


Training: 100%|██████████| 161/161 [04:48<00:00,  1.79s/it]
Validation: 100%|██████████| 15/15 [00:08<00:00,  1.71it/s]


Epoch 11/15, train loss: 1.6095, train accuracy: 0.2690
Validation loss: 1.6099, validation accuracy: 0.2493, F1: 0.0995


Training: 100%|██████████| 161/161 [04:45<00:00,  1.78s/it]
Validation: 100%|██████████| 15/15 [00:08<00:00,  1.70it/s]


Epoch 12/15, train loss: 1.6095, train accuracy: 0.2352
Validation loss: 1.6097, validation accuracy: 0.2493, F1: 0.0995
Early stopping at epoch 12

Testing parameters: batch_size=256, learning_rate=0.001


Training: 100%|██████████| 161/161 [04:42<00:00,  1.75s/it]
Validation: 100%|██████████| 15/15 [00:07<00:00,  1.90it/s]


Epoch 1/15, train loss: 1.6087, train accuracy: 0.1979
Validation loss: 1.6106, validation accuracy: 0.1577, F1: 0.0430


Training: 100%|██████████| 161/161 [04:38<00:00,  1.73s/it]
Validation: 100%|██████████| 15/15 [00:07<00:00,  1.88it/s]


Epoch 2/15, train loss: 1.6082, train accuracy: 0.2017
Validation loss: 1.6101, validation accuracy: 0.1630, F1: 0.0457


Training: 100%|██████████| 161/161 [04:40<00:00,  1.74s/it]
Validation: 100%|██████████| 15/15 [00:07<00:00,  1.89it/s]


Epoch 3/15, train loss: 1.6095, train accuracy: 0.1942
Validation loss: 1.6092, validation accuracy: 0.1559, F1: 0.0420


Training: 100%|██████████| 161/161 [04:37<00:00,  1.72s/it]
Validation: 100%|██████████| 15/15 [00:07<00:00,  1.89it/s]


Epoch 4/15, train loss: 1.6095, train accuracy: 0.2026
Validation loss: 1.6097, validation accuracy: 0.1630, F1: 0.0457


Training: 100%|██████████| 161/161 [04:37<00:00,  1.72s/it]
Validation: 100%|██████████| 15/15 [00:07<00:00,  1.89it/s]


Epoch 5/15, train loss: 1.6095, train accuracy: 0.1832
Validation loss: 1.6092, validation accuracy: 0.1577, F1: 0.0430


Training: 100%|██████████| 161/161 [04:37<00:00,  1.72s/it]
Validation: 100%|██████████| 15/15 [00:07<00:00,  1.88it/s]


Epoch 6/15, train loss: 1.6095, train accuracy: 0.1981
Validation loss: 1.6093, validation accuracy: 0.1577, F1: 0.0430


Training: 100%|██████████| 161/161 [04:37<00:00,  1.72s/it]
Validation: 100%|██████████| 15/15 [00:07<00:00,  1.90it/s]


Epoch 7/15, train loss: 1.6095, train accuracy: 0.2219
Validation loss: 1.6094, validation accuracy: 0.2741, F1: 0.1179


Training: 100%|██████████| 161/161 [04:36<00:00,  1.72s/it]
Validation: 100%|██████████| 15/15 [00:07<00:00,  1.90it/s]


Epoch 8/15, train loss: 1.6095, train accuracy: 0.1831
Validation loss: 1.6095, validation accuracy: 0.1630, F1: 0.0457


Training: 100%|██████████| 161/161 [04:38<00:00,  1.73s/it]
Validation: 100%|██████████| 15/15 [00:07<00:00,  1.89it/s]


Epoch 9/15, train loss: 1.6094, train accuracy: 0.2156
Validation loss: 1.6095, validation accuracy: 0.1630, F1: 0.0457


Training: 100%|██████████| 161/161 [04:40<00:00,  1.74s/it]
Validation: 100%|██████████| 15/15 [00:07<00:00,  1.88it/s]


Epoch 10/15, train loss: 1.6094, train accuracy: 0.2256
Validation loss: 1.6095, validation accuracy: 0.2741, F1: 0.1179
Early stopping at epoch 10

Testing parameters: batch_size=256, learning_rate=0.0001


Training: 100%|██████████| 161/161 [04:39<00:00,  1.74s/it]
Validation: 100%|██████████| 15/15 [00:07<00:00,  1.89it/s]


Epoch 1/15, train loss: 1.5870, train accuracy: 0.2047
Validation loss: 1.5756, validation accuracy: 0.2104, F1: 0.1098


Training:  84%|████████▍ | 135/161 [03:57<00:45,  1.76s/it]


KeyboardInterrupt: 
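
Because the search was interrupted during the batch_size=256, learning_rate=0.0001 run, the completed runs in this output can still be compared by hand. The sketch below collects the validation losses logged above for the six runs that finished with known parameters and ranks them by their best (lowest) value. Ranking by validation loss is an assumption made here for illustration; the notebook's own selection criterion is not shown in this output, and the variable names are hypothetical.

```python
# Validation losses per epoch, copied from the log above.
# Keys are (batch_size, learning_rate).
val_losses = {
    (64, 0.0001): [1.5756, 1.5734, 1.5710, 1.5706, 1.5700,
                   1.5767, 1.5749, 1.5745, 1.5760, 1.5763],
    (128, 0.01): [1.6745, 1.7268, 1.6380, 1.6748,
                  1.7268, 1.6792, 1.6748, 1.7330],
    (128, 0.001): [1.6094, 1.6097, 1.6094, 1.6093, 1.6096,
                   1.6096, 1.6096, 1.6096, 1.6096],
    (128, 0.0001): [1.5753, 1.5767, 1.5727, 1.5731, 1.5739, 1.5749,
                    1.5716, 1.5723, 1.5732, 1.5734, 1.5734, 1.5735],
    (256, 0.01): [1.7324, 1.7270, 1.6736, 1.7324, 1.6112, 1.6100,
                  1.6088, 1.6098, 1.6101, 1.6099, 1.6099, 1.6097],
    (256, 0.001): [1.6106, 1.6101, 1.6092, 1.6097, 1.6092,
                   1.6093, 1.6094, 1.6095, 1.6095, 1.6095],
}

# Best validation loss reached by each run, sorted from best to worst.
ranking = sorted((min(losses), params) for params, losses in val_losses.items())

for best_loss, (batch_size, lr) in ranking:
    print(f"batch_size={batch_size:<3} learning_rate={lr:<6} "
          f"best val loss={best_loss:.4f}")

best_loss, best_run = ranking[0]
```

On these numbers the two learning_rate=0.0001 runs come out ahead, with batch_size=64 reaching the lowest validation loss (1.5700), although the margin over batch_size=128 (1.5716) is small.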