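# Check Out the Dataset
Dataset: [bbc dataset][def]  
- Task: given an excerpt from an article, classify which news domain the article belongs to
- Five categories: entertainment, sport, tech, business, politics
- Total number of records: 2225

From the table below we can see that:  
* The dataset has five categories, which serve as the labels that BERT will predict  
* Each Text entry is the input to BERT

The goal is to predict the category (Label) of each Text  

[def]: https://www.kaggle.com/datasets/sainijagjit/bbc-dataset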

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

datapath = 'IMDB Dataset.csv'
df = pd.read_csv(datapath)
df.head()

Unnamed: 0,review,sentiment
0,One of the other reviewers has mentioned that ...,positive
1,A wonderful little production. The filming t...,positive
2,I thought this was a wonderful way to spend ti...,positive
3,Basically there's a family where a little boy ...,negative
4,"Petter Mattei's ""Love in the Time of Money"" is...",positive


# Preprocessing Data

## 1. BertTokenizer
[**BertTokenizer**](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertTokenizer):
Tokenizes text with WordPiece (subword) tokenization and adds the three special tokens \[CLS\], \[SEP\], and \[PAD\]  
Usage: ```BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)```   
Here ```PRETRAINED_MODEL_NAME``` can be any pretrained model suited to the task ([list of available models](https://huggingface.co/models))  
This example uses the common English model ```'bert-base-cased'```, as in the code cell below

BertTokenizer inherits from [PreTrainedTokenizer](https://huggingface.co/docs/transformers/v4.23.1/en/main_classes/tokenizer#transformers.PreTrainedTokenizer). The most commonly used parameters are:

* ```text``` / ```text_pair ```: the sequence to tokenize / the second sequence when tokenizing a pair of sequences
* ```padding```: whether and how to add padding (\[PAD\])  
  * ```True``` or ```'longest'```: pad every sequence to the length of the longest sequence in the batch
  * ```'max_length'```: pad to the value given by ```max_length```
  * ```False``` or ```'do_not_pad'```: no padding
* ```max_length```: the maximum sequence length (at most 512 for BERT) 
* ```truncation```: whether to truncate when the total length of the sequence (or sequence pair) exceeds the limit
  * ```True``` or ```'longest_first'```: iteratively remove one token at a time from the longest sequence.  
    Example:  
    seq1 = "a b c", seq2 = "d e", max_length = 2. Since len(seq1) + len(seq2) > 2, truncation is needed  
    (Note: the \[CLS\], \[SEP\], and \[PAD\] tokens are ignored here)  
    Iteration 1: seq1 = "a b", seq2 = "d e"  
    Iteration 2: seq1 = "a", seq2 = "d e"  
    Iteration 3: seq1 = "a", seq2 = "d" => the final output is "a d"  
  * ```'only_first'```: truncate only the first sequence of a pair
  * ```'only_second'```: truncate only the second sequence of a pair
  * ```False``` or ```'do_not_truncate'```: no truncation
* ```return_tensors```: the type of tensor to return
  * ```'tf'```: TensorFlow tf.constant.
  * ```'pt'```: PyTorch torch.Tensor.
  * ```'np'```: Numpy np.ndarray.
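
The ```'longest_first'``` strategy can be sketched in plain Python. This is an illustration only, not the library's implementation: ```truncate_longest_first``` is a hypothetical helper, special tokens are ignored, and ties are broken by trimming the second sequence, which is an assumption. The order of the intermediate steps may therefore differ from the walkthrough above, while the final result is the same.

```python
def truncate_longest_first(first, second, max_length):
    """Drop one token at a time from the end of the longer sequence.

    Hypothetical helper for illustration; on a tie the second
    sequence is trimmed (an assumption, not a documented guarantee).
    """
    first, second = list(first), list(second)
    # Stop once the pair fits, or when there is nothing left to remove
    while len(first) + len(second) > max_length and (first or second):
        if len(first) > len(second):
            first.pop()
        else:
            second.pop()
    return first, second


seq1 = "a b c".split()
seq2 = "d e".split()
print(truncate_longest_first(seq1, seq2, max_length=2))  # (['a'], ['d'])
```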

In [2]:
from transformers import BertTokenizer
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

tokenizer = BertTokenizer.from_pretrained('bert-base-cased')

## 2. Data Class
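1. Decide which model's tokenizer to use  
2. Define the labels for the Dataset class, i.e. map each category in the dataset to a numeric id  
   Here ```'business'```, ```'entertainment'```, ```'sport'```, ```'tech'```, ```'politics'``` are mapped to 0 ~ 4
3. When the Dataset is initialized:
   - Convert the category of each record to its id
   - Pass the texts through BertTokenizer to convert them into the format BERT expects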
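
A minimal sketch of the category-to-id step, using the five categories named above. The sample ```categories``` list is made up for illustration, and ```id_to_label``` is a hypothetical reverse mapping for turning predicted ids back into names.

```python
# Category-to-id mapping for the five classes listed above
labels = {'business': 0, 'entertainment': 1, 'sport': 2, 'tech': 3, 'politics': 4}

# Made-up category column, standing in for the dataset's label column
categories = ['sport', 'tech', 'sport', 'business']

# Convert each category name to its numeric id
label_ids = [labels[c] for c in categories]
print(label_ids)  # [2, 3, 2, 0]

# Reverse mapping (hypothetical helper): id -> category name
id_to_label = {i: name for name, i in labels.items()}
print(id_to_label[label_ids[0]])  # sport
```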

In [3]:
import torch
import numpy as np
from transformers import BertTokenizer

# Choose the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
# Map each class in the dataset to an id
labels = {'negative':0,
          'positive':1,
          }

# Dataset wrapper
class Dataset(torch.utils.data.Dataset):

    def __init__(self, df):
        # Convert the class of every record to its id
        self.labels = [labels[label] for label in df['sentiment']]  
        # Tokenize every record with the BERT tokenizer
        self.texts = [tokenizer(text, 
                               padding='max_length', max_length = 512, truncation=True,
                                return_tensors="pt") for text in df['review']]

    # Return the class ids of the whole dataset
    def classes(self):
        return self.labels

    # Return the number of records
    def __len__(self):
        return len(self.labels)

    # Get the label at the given index
    def get_batch_labels(self, idx):
        # Fetch a batch of labels
        return np.array(self.labels[idx])

    # Get the tokenized text at the given index
    def get_batch_texts(self, idx):
        # Fetch a batch of inputs
        return self.texts[idx]

    def __getitem__(self, idx):
        batch_texts = self.get_batch_texts(idx)
        batch_y = self.get_batch_labels(idx)

        return batch_texts, batch_y

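Split the data into training, validation, and test sets   
The split ratio here is 60:20:20  

Notes:
1. ```df.sample(frac=1, random_state=42)```: randomly samples rows from the DataFrame  
   ```frac```: the fraction of the DataFrame to sample (0 ~ 1); ```frac=1``` returns every row in shuffled order  
   ```random_state```: the seed of the random number generator, so the same shuffle is reproduced on every run  
2. ```np.split(..., [int(.6*len(df)), int(.8*len(df))])```: splits the array  
   The list passed as the second argument gives the positions at which to cut, so the number of pieces = length of the list + 1. In this example:  
   First piece: up to ```int(.6*len(df))```, i.e. rows from 0% to 60% of df  
   Second piece: up to ```int(.8*len(df))```, i.e. rows from 60% to 80% of df  
   Third piece: the remaining rows, from 80% to 100% of df  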
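
The way ```np.split``` interprets the index list can be checked on a small array. This sketch uses a ten-element array in place of the shuffled DataFrame; the cut points 6 and 8 play the role of ```int(.6*len(df))``` and ```int(.8*len(df))```.

```python
import numpy as np

data = np.arange(10)               # stand-in for the shuffled rows
n = len(data)

# Two cut points produce three pieces: [0, 6), [6, 8), [8, 10)
train, val, test = np.split(data, [int(.6 * n), int(.8 * n)])

print(train)  # [0 1 2 3 4 5]
print(val)    # [6 7]
print(test)   # [8 9]
```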

In [4]:
np.random.seed(112)
df_train, df_val, df_test = np.split(df.sample(frac=1, random_state=42), 
                                     [int(.6*len(df)), int(.8*len(df))])

print(len(df_train),len(df_val), len(df_test))

30000 10000 10000


# Model Building
[**BertModel**](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertModel): essentially the Encoder part of the transformer  
It inherits from [PreTrainedModel](https://huggingface.co/docs/transformers/v4.23.1/en/main_classes/model#transformers.PreTrainedModel). The most commonly used parameters of ```forward``` are:  
* ```input_ids```: the input sequence  
* ```attention_mask```: the mask matching the input sequence, telling attention which positions to skip  
  1: the token is a real word, \[CLS\], or \[SEP\]  
  0: the token is \[PAD\], i.e. a position to be masked out
* ```return_dict ```: True: the return value is a [ModelOutput](https://huggingface.co/docs/transformers/v4.23.1/en/main_classes/output#transformers.utils.ModelOutput); False: the return value is a tuple

With ```return_dict = True```, BertModel returns a [transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions](https://huggingface.co/docs/transformers/v4.23.1/en/main_classes/output#transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions)  
In this example ```return_dict = False```, so BertModel returns a ```tuple(torch.FloatTensor)```, which the example unpacks as ```_``` and ```pooled_output```:
* ```_```: the sequence output, i.e. the output of BERT's last hidden layer for every token in the sequence
* ```pooled_output```: the last-layer vector for \[CLS\], further passed through a linear layer and a tanh activation  

So BERT can serve sequence tasks (using the ```_``` output) as well as classification and regression tasks (using ```pooled_output```)

We pass ```pooled_output``` through dropout, a linear layer, and a ReLU activation. The output of the final linear layer is a vector of dimension 5, one entry per category in our dataset (sport, business, politics, entertainment, and tech)
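
How the mask lines up with a padded input can be sketched without the library. The token ids below are made up, ```PAD_ID = 0``` is an assumption for illustration (the real id comes from the tokenizer's vocabulary), and ```pad_with_mask``` is a hypothetical helper that assumes the sequence is no longer than ```max_length```.

```python
PAD_ID = 0  # assumed padding id, for illustration only

def pad_with_mask(token_ids, max_length):
    """Pad to max_length and build the matching attention mask."""
    ids = list(token_ids)
    n_pad = max(max_length - len(ids), 0)
    attention_mask = [1] * len(ids) + [0] * n_pad   # 1 = real token, 0 = padding
    input_ids = ids + [PAD_ID] * n_pad
    return input_ids, attention_mask

# Made-up ids standing for: [CLS] w1 w2 [SEP]
input_ids, attention_mask = pad_with_mask([101, 7, 8, 102], max_length=6)
print(input_ids)       # [101, 7, 8, 102, 0, 0]
print(attention_mask)  # [1, 1, 1, 1, 0, 0]
```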
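
The shape bookkeeping of this head can be sketched with NumPy. This is not the PyTorch module used in the notebook: the random ```pooled_output``` stands in for BERT's output, the weights are random rather than trained, and ```num_classes``` is a placeholder to set to the number of labels in your data.

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size, hidden_size = 3, 768   # 768 is the hidden size of BERT Base
num_classes = 5                    # placeholder: number of labels in the dataset
dropout_p = 0.5

# Stand-in for BERT's pooled [CLS] representation
pooled_output = rng.standard_normal((batch_size, hidden_size))

# Dropout (training mode): zero each unit with probability p, rescale the rest
keep_mask = rng.random((batch_size, hidden_size)) >= dropout_p
dropout_output = pooled_output * keep_mask / (1.0 - dropout_p)

# Linear layer: (batch, 768) @ (768, num_classes) + bias
weight = rng.standard_normal((hidden_size, num_classes)) * 0.02
bias = np.zeros(num_classes)
linear_output = dropout_output @ weight + bias

# ReLU activation
final_layer = np.maximum(linear_output, 0.0)

print(final_layer.shape)                 # (3, 5)
predicted = final_layer.argmax(axis=1)   # one predicted class id per sample
```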

In [5]:
from torch import nn
from transformers import BertModel

class BertClassifier(nn.Module):

    def __init__(self, dropout=0.5, path = "", pretrained_model = ""):
        super(BertClassifier, self).__init__()                      # initialize the nn.Module base class

        self.bert = BertModel.from_pretrained(pretrained_model, num_labels=2)    # load the chosen BertModel
        self.dropout = nn.Dropout(dropout)                          # dropout = 0.5 => randomly zero 50% of the units to reduce overfitting
        self.linear = nn.Linear(768, 2)                             # BERT Base hidden size: 768; each input is assigned one of 2 classes
        self.relu = nn.ReLU()
        self.path = path

    def forward(self, input_id, mask):
        _, pooled_output = self.bert(input_ids= input_id, attention_mask=mask,return_dict=False)
        dropout_output = self.dropout(pooled_output)
        linear_output = self.linear(dropout_output)
        final_layer = self.relu(linear_output)

        return final_layer
    
    def save_model(self):
        self.bert.save_pretrained(self.path)
    

# Training Loop
* Epoch: 8
* Loss Function: Categorical cross entropy (because this is a multi-class classification task)
* Optimizer: Adam
* Learning Rate: 10<sup>-6</sup> (1e-6)
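
For reference, the loss for a single sample can be worked out by hand: take the softmax over the model's scores, then the negative log of the probability assigned to the true class. The scores below are made up, and ```cross_entropy``` is a hypothetical helper that mirrors the per-sample definition rather than PyTorch's implementation.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class for one sample."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

logits = [2.0, 0.5, 0.1]   # made-up scores for three classes

print(cross_entropy(logits, target=0))  # smaller loss: class 0 has the highest score
print(cross_entropy(logits, target=2))  # larger loss: class 2 has the lowest score
```

The loss shrinks as the score of the true class rises relative to the others, which is what the optimizer exploits during training.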

In [None]:
from torch.optim import Adam
from tqdm import tqdm
import os

def train(model, train_data, val_data, learning_rate, epochs, model_name):

    if os.path.isfile("fine_tune_record_IMDB.csv"):
        rec = pd.read_csv("fine_tune_record_IMDB.csv")
    else:
        rec = pd.DataFrame({"model_name":[], "train_acc":[], "train_loss":[], "train_Recall":[], "train_precision":[], "train_f1":[],\
                             "val_acc":[], "val_loss":[], "val_Recall":[], "val_precision":[], "val_f1":[]})

    # Wrap the raw DataFrames in the Dataset class
    train, val = Dataset(train_data), Dataset(val_data)

    # Hand the training and validation sets to DataLoaders, which control batching (e.g. batch_size)
    train_dataloader = torch.utils.data.DataLoader(train, batch_size=3, shuffle=True)
    val_dataloader = torch.utils.data.DataLoader(val, batch_size=3)

    # Use the GPU if one is available
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

    criterion = nn.CrossEntropyLoss()                       # Loss Function: Categorical cross entropy
    optimizer = Adam(model.parameters(), lr= learning_rate) # Optimizer: Adam

    if use_cuda:

            model = model.cuda()
            criterion = criterion.cuda()

    train_acc = []
    train_loss = []
    train_recall = []
    train_precision = []
    train_f1 = []
    val_acc = []
    val_loss = []
    val_recall = []
    val_precision = []
    val_f1 = []
    # Work done in each epoch (one full pass over the training data)
    for epoch_num in range(epochs):

            # ---------- Training ----------
            total_acc_train = 0
            total_loss_train = 0
            all_pred_train = []
            all_label_train = []

            # tqdm only adds a progress bar; logically this is the same as
            # `for train_input, train_label in train_dataloader:`
            for train_input, train_label in tqdm(train_dataloader):

                # .to(device): move the tensors to the selected device (the GPU when available)
                train_label = train_label.to(device)
                mask = train_input['attention_mask'].to(device)
                input_id = train_input['input_ids'].squeeze(1).to(device)

                # Forward pass through BERT
                output = model(input_id, mask)
                
                # Compute the cross-entropy loss
                batch_loss = criterion(output, train_label.long())  # arguments: (model output, target labels)
                total_loss_train += batch_loss.item()               # .item(): tensor -> Python scalar
                
                # Count the samples whose highest-scoring label matches the target
                acc = (output.argmax(dim=1) == train_label).sum().item()
                total_acc_train += acc

                # Collect predictions and labels as plain Python ints so scikit-learn can use them
                all_pred_train += output.argmax(dim=1).cpu().tolist()
                all_label_train += train_label.cpu().tolist()

                model.zero_grad()       # clear the gradients of the previous step
                batch_loss.backward()   # backpropagate the loss
                optimizer.step()        # gradient descent step

            # Compute the epoch-level metrics once, after the last batch
            recall_train = recall_score(y_true=all_label_train, y_pred=all_pred_train)
            f1_train = f1_score(y_true=all_label_train, y_pred=all_pred_train)
            precision_train = precision_score(y_true=all_label_train, y_pred=all_pred_train)
            
            # ---------- Validation ----------
            total_acc_val = 0
            total_loss_val = 0
            all_pred_val = []
            all_label_val = []

            # Same steps as training, but with dropout disabled and no gradient descent
            model.eval()
            with torch.no_grad():

                for val_input, val_label in val_dataloader:

                    val_label = val_label.to(device)
                    mask = val_input['attention_mask'].to(device)
                    input_id = val_input['input_ids'].squeeze(1).to(device)

                    output = model(input_id, mask)

                    batch_loss = criterion(output, val_label.long())
                    total_loss_val += batch_loss.item()
                    
                    acc = (output.argmax(dim=1) == val_label).sum().item()
                    total_acc_val += acc

                    # Collect validation predictions and validation labels (not the training labels)
                    all_pred_val += output.argmax(dim=1).cpu().tolist()
                    all_label_val += val_label.cpu().tolist()

            model.train()   # restore training mode for the next epoch

            # Compute the epoch-level validation metrics once
            recall_val = recall_score(y_true=all_label_val, y_pred=all_pred_val)
            f1_val = f1_score(y_true=all_label_val, y_pred=all_pred_val)
            precision_val = precision_score(y_true=all_label_val, y_pred=all_pred_val)
            
            train_loss.append(total_loss_train / len(train_data))
            train_acc.append(total_acc_train / len(train_data))
            train_recall.append(recall_train)
            train_precision.append(precision_train)
            train_f1.append(f1_train)
            val_loss.append(total_loss_val / len(val_data))
            val_acc.append(total_acc_val / len(val_data))
            val_recall.append(recall_val)
            val_precision.append(precision_val)
            val_f1.append(f1_val)
            print(
                f'Epochs: {epoch_num + 1} | Train Loss: {total_loss_train / len(train_data): .3f} \
                | Train Accuracy: {total_acc_train / len(train_data): .3f} \
                | Train Recall: {recall_train: .3f} \
                | Train precision: {precision_train: .3f} \
                | Train F1: {f1_train: .3f} \
                | Val Loss: {total_loss_val / len(val_data): .3f} \
                | Val Accuracy: {total_acc_val / len(val_data): .3f}\
                | Val Recall: {recall_val: .3f} \
                | Val precision: {precision_val: .3f} \
                | Val F1: {f1_val: .3f}')
            
    # Append this run to the record loaded (or created) at the top of the function,
    # instead of resetting it to an empty DataFrame
    new_rec = pd.concat([rec, pd.DataFrame({'model_name': model_name,\
                                            'train_acc': [train_acc], 'train_loss': [train_loss], "train_Recall":[train_recall], "train_precision":[train_precision], "train_f1":[train_f1],\
                                            'val_acc': [val_acc], 'val_loss': [val_loss], "val_Recall":[val_recall], "val_precision":[val_precision], "val_f1":[val_f1]})], ignore_index=True)
    new_rec.to_csv("fine_tune_record_IMDB.csv", index = None)
    model.save_model()

EPOCHS = 8
model = BertClassifier(path="fine_tuned_model/saved_model_mask_dyn10", pretrained_model = '../saved_model/saved_model_mask_grow')
LR = 1e-6
              
train(model, df_train, df_val, LR, EPOCHS, "mask_dyn10")

In [9]:
EPOCHS = 8
model = BertClassifier(path="fine_tuned_model/saved_model_mask15", pretrained_model = 'saved_model/saved_model_mask15')
LR = 1e-6
              
train(model, df_train, df_val, LR, EPOCHS, "mask15")

Some weights of the model checkpoint at saved_model/saved_model_mask15 were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|██████████| 297/297 [01:27<00:00,  3.41it/s]


Epochs: 1 | Train Loss:  0.272                 | Train Accuracy:  0.255                 | Val Loss:  0.263                 | Val Accuracy:  0.306


100%|██████████| 297/297 [01:28<00:00,  3.37it/s]


Epochs: 2 | Train Loss:  0.245                 | Train Accuracy:  0.388                 | Val Loss:  0.220                 | Val Accuracy:  0.500


100%|██████████| 297/297 [01:27<00:00,  3.38it/s]


Epochs: 3 | Train Loss:  0.198                 | Train Accuracy:  0.599                 | Val Loss:  0.171                 | Val Accuracy:  0.667


100%|██████████| 297/297 [01:27<00:00,  3.38it/s]


Epochs: 4 | Train Loss:  0.150                 | Train Accuracy:  0.753                 | Val Loss:  0.123                 | Val Accuracy:  0.784


100%|██████████| 297/297 [01:28<00:00,  3.34it/s]


Epochs: 5 | Train Loss:  0.099                 | Train Accuracy:  0.874                 | Val Loss:  0.068                 | Val Accuracy:  0.932


100%|██████████| 297/297 [01:28<00:00,  3.37it/s]


Epochs: 6 | Train Loss:  0.058                 | Train Accuracy:  0.947                 | Val Loss:  0.039                 | Val Accuracy:  0.977


100%|██████████| 297/297 [01:29<00:00,  3.30it/s]


Epochs: 7 | Train Loss:  0.036                 | Train Accuracy:  0.968                 | Val Loss:  0.028                 | Val Accuracy:  0.977


100%|██████████| 297/297 [01:29<00:00,  3.33it/s]


Epochs: 8 | Train Loss:  0.024                 | Train Accuracy:  0.980                 | Val Loss:  0.019                 | Val Accuracy:  0.982


In [10]:
EPOCHS = 8
model = BertClassifier(path="fine_tuned_model/saved_model_mask6", pretrained_model = 'saved_model/saved_model_mask6')
LR = 1e-6
              
train(model, df_train, df_val, LR, EPOCHS, "mask6")

Some weights of the model checkpoint at saved_model/saved_model_mask6 were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|██████████| 297/297 [01:27<00:00,  3.41it/s]


Epochs: 1 | Train Loss:  0.271                 | Train Accuracy:  0.235                 | Val Loss:  0.261                 | Val Accuracy:  0.288


100%|██████████| 297/297 [01:28<00:00,  3.36it/s]


Epochs: 2 | Train Loss:  0.256                 | Train Accuracy:  0.314                 | Val Loss:  0.249                 | Val Accuracy:  0.338


100%|██████████| 297/297 [01:28<00:00,  3.37it/s]


Epochs: 3 | Train Loss:  0.219                 | Train Accuracy:  0.520                 | Val Loss:  0.192                 | Val Accuracy:  0.658


100%|██████████| 297/297 [01:28<00:00,  3.36it/s]


Epochs: 4 | Train Loss:  0.152                 | Train Accuracy:  0.795                 | Val Loss:  0.101                 | Val Accuracy:  0.928


100%|██████████| 297/297 [01:28<00:00,  3.36it/s]


Epochs: 5 | Train Loss:  0.082                 | Train Accuracy:  0.951                 | Val Loss:  0.067                 | Val Accuracy:  0.950


100%|██████████| 297/297 [01:27<00:00,  3.39it/s]


Epochs: 6 | Train Loss:  0.052                 | Train Accuracy:  0.971                 | Val Loss:  0.045                 | Val Accuracy:  0.964


100%|██████████| 297/297 [01:26<00:00,  3.42it/s]


Epochs: 7 | Train Loss:  0.036                 | Train Accuracy:  0.979                 | Val Loss:  0.036                 | Val Accuracy:  0.968


100%|██████████| 297/297 [01:27<00:00,  3.41it/s]


Epochs: 8 | Train Loss:  0.027                 | Train Accuracy:  0.981                 | Val Loss:  0.027                 | Val Accuracy:  0.968


In [11]:
EPOCHS = 8
model = BertClassifier(path="fine_tuned_model/saved_model_mask_dyn_grow1", pretrained_model = 'saved_model/saved_model_mask_dyn_grow1')
LR = 1e-6
              
train(model, df_train, df_val, LR, EPOCHS, "mask_dyn_grow1")

Some weights of the model checkpoint at saved_model/saved_model_mask_dyn_grow1 were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|██████████| 297/297 [01:25<00:00,  3.48it/s]


Epochs: 1 | Train Loss:  0.250                 | Train Accuracy:  0.339                 | Val Loss:  0.217                 | Val Accuracy:  0.455


100%|██████████| 297/297 [01:25<00:00,  3.46it/s]


Epochs: 2 | Train Loss:  0.218                 | Train Accuracy:  0.475                 | Val Loss:  0.199                 | Val Accuracy:  0.572


100%|██████████| 297/297 [01:26<00:00,  3.45it/s]


Epochs: 3 | Train Loss:  0.194                 | Train Accuracy:  0.593                 | Val Loss:  0.167                 | Val Accuracy:  0.667


100%|██████████| 297/297 [01:26<00:00,  3.44it/s]


Epochs: 4 | Train Loss:  0.156                 | Train Accuracy:  0.710                 | Val Loss:  0.130                 | Val Accuracy:  0.788


100%|██████████| 297/297 [01:26<00:00,  3.45it/s]


Epochs: 5 | Train Loss:  0.123                 | Train Accuracy:  0.780                 | Val Loss:  0.099                 | Val Accuracy:  0.851


100%|██████████| 297/297 [01:26<00:00,  3.44it/s]


Epochs: 6 | Train Loss:  0.090                 | Train Accuracy:  0.882                 | Val Loss:  0.066                 | Val Accuracy:  0.932


100%|██████████| 297/297 [01:26<00:00,  3.44it/s]


Epochs: 7 | Train Loss:  0.053                 | Train Accuracy:  0.971                 | Val Loss:  0.039                 | Val Accuracy:  0.968


100%|██████████| 297/297 [01:27<00:00,  3.41it/s]


Epochs: 8 | Train Loss:  0.031                 | Train Accuracy:  0.987                 | Val Loss:  0.029                 | Val Accuracy:  0.959


# Evaluate Model on Test Data
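Use the held-out test split (the last 20% of the shuffled data), which the model has not seen, for testing  
The procedure is essentially the same as validation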
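
The accuracy computed in the evaluation loop boils down to comparing the arg-max of each output row with its target. A library-free sketch with made-up scores; ```accuracy``` is a hypothetical helper, not part of the notebook.

```python
def accuracy(outputs, targets):
    """Fraction of rows whose highest-scoring class equals the target id."""
    correct = 0
    for scores, target in zip(outputs, targets):
        predicted = max(range(len(scores)), key=scores.__getitem__)  # arg-max
        correct += int(predicted == target)
    return correct / len(targets)

# Made-up model outputs for four samples and two classes
outputs = [[0.1, 2.3], [1.7, 0.2], [0.4, 0.9], [3.0, 0.0]]
targets = [1, 0, 0, 0]
print(accuracy(outputs, targets))  # 0.75
```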

In [12]:
def evaluate(model, test_data):

    # Wrap the held-out test split in the Dataset class
    test = Dataset(test_data)

    test_dataloader = torch.utils.data.DataLoader(test, batch_size=3)

    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

    # No loss is computed here, so no criterion is needed
    if use_cuda:

        model = model.cuda()

    model.eval()    # disable dropout while evaluating

    total_acc_test = 0
    with torch.no_grad():

        for test_input, test_label in test_dataloader:

            test_label = test_label.to(device)
            mask = test_input['attention_mask'].to(device)
            input_id = test_input['input_ids'].squeeze(1).to(device)

            output = model(input_id, mask)

            # Count the samples whose highest-scoring label matches the target
            acc = (output.argmax(dim=1) == test_label).sum().item()
            total_acc_test += acc
    
    print(f'Test Accuracy: {total_acc_test / len(test_data): .3f}')
    
evaluate(model, df_test)

Test Accuracy:  0.982


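Next, a more demanding test  
We feed in a [**new BBC dataset**](https://www.kaggle.com/datasets/chalikamihiran/bbc-news-classification) as test data to see whether the accuracy stays this high  
(Note: this dataset was preprocessed beforehand, keeping only the 'text' and 'category' columns)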

In [13]:
df2 = pd.read_csv("new_test/BBC_News_processed.csv")
evaluate(model, df2)

Test Accuracy:  0.987


In [14]:
model = None