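### **Topic: Beer Review Rating Prediction - Building a Classification Model**
### **Description:**
Continuing the earlier exercise on the beer review dataset, the goal this time is to treat rating prediction as a classification problem: build a BERT model and use it to predict the score for each attribute (appearance, aroma, overall, palate, taste).<br />
Note that, unlike the example from the course, this task requires predicting several targets at once, which makes it a typical multi-label classification problem.
### **Tasks:**
1. Using the beer data prepared last time, build the corresponding Dataset and DataLoader (complete BeerDataset and create_data_loader below).
2. Build the main model architecture (complete BeerRateClassifier below).
3. Complete the training loop, produce a weights file, and confirm that the model architecture works.
#### **Hint 1: If GPU limits make training slow, consider lowering the number of training epochs or MAX_LEN, or choosing a smaller BERT model.**
#### **Hint 2: If you are unsure how to set up a multi-label problem, see this [example](https://www.learnopencv.com/multi-label-image-classification-with-pytorch/).**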
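
One way to read the "several targets at once" setup is as five single-label classification problems that share one encoder: each attribute gets its own logits, and the per-attribute cross-entropy losses are added into one training loss. The framework-free sketch below illustrates only that idea. The logits, the four-class setup, and the helper names `cross_entropy` and `total_loss` are invented for illustration and are not part of the exercise code.

```python
import math

ASPECTS = ["appearance", "aroma", "overall", "palate", "taste"]

def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def total_loss(logits_by_aspect, targets_by_aspect):
    """Sum the per-aspect losses so a single scalar covers every head."""
    return sum(
        cross_entropy(logits_by_aspect[a], targets_by_aspect[a])
        for a in logits_by_aspect
    )

# Hypothetical logits for one review; every aspect has 4 rating classes.
logits = {aspect: [0.0, 0.0, 0.0, 0.0] for aspect in ASPECTS}
targets = {aspect: 2 for aspect in ASPECTS}

loss = total_loss(logits, targets)
# Per-aspect prediction = index of the largest logit (ties go to the first index).
predictions = {a: max(range(4), key=lambda k: logits[a][k]) for a in ASPECTS}
```

With uniform logits each head contributes ln 4, so the total is 5 ln 4, about 6.93. That is roughly the loss scale to expect from an untrained model with five heads and four classes each.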

In [1]:
import os
import numpy as np
import pandas as pd

import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
from transformers import (
    BertModel,
    BertTokenizer,
    AdamW,
    get_linear_schedule_with_warmup
)

import warnings
warnings.filterwarnings('ignore')

In [2]:
PRE_TRAINED_MODEL_NAME = 'bert-base-cased'
BATCH_SIZE = 16
MAX_LEN = 256
EPOCHS = 10

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
TOKENIZER = BertTokenizer.from_pretrained(PRE_TRAINED_MODEL_NAME)

In [3]:
class BeerDataset(Dataset):
    """
    Wrap the beer review dataframe as a torch Dataset that a DataLoader can consume
    """
    def __init__(self,
                 comments,
                 apperance_target,
                 aroma_target,
                 overall_target,
                 palate_target,
                 taste_target,
                 max_len):
        self.comments = comments
        self.apperance_target = apperance_target
        self.aroma_target = aroma_target
        self.overall_target = overall_target
        self.palate_target = palate_target
        self.taste_target = taste_target
        self.max_len = max_len

    def __len__(self):
        return len(self.comments)

    def __getitem__(self, item):
        comment = str(self.comments[item])
        apperance_target = self.apperance_target[item]
        aroma_target = self.aroma_target[item]
        overall_target = self.overall_target[item]
        palate_target = self.palate_target[item]
        taste_target = self.taste_target[item]
        encoding = TOKENIZER.encode_plus(
            comment,
            max_length=self.max_len,
            truncation=True,
            add_special_tokens=True,
            return_token_type_ids=False,
            padding='max_length',  # replaces the deprecated pad_to_max_length=True
            return_attention_mask=True,
            return_tensors='pt',
        )

        return {
            'comment': comment,
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'apperance_target': torch.LongTensor([apperance_target]),
            'aroma_target': torch.LongTensor([aroma_target]),
            'overall_target': torch.LongTensor([overall_target]),
            'palate_target': torch.LongTensor([palate_target]),
            'taste_target': torch.LongTensor([taste_target])
        }
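
The tokenizer call above truncates long comments and pads short ones so that every `input_ids` tensor has length `max_len`, with the attention mask marking which positions hold real tokens. The sketch below imitates that behavior in plain Python to make the shapes concrete. It is an illustration, not the tokenizer's implementation: the helper `pad_or_truncate` is hypothetical, and the special-token ids are placeholders.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # placeholder special-token ids

def pad_or_truncate(token_ids, max_len):
    """Return (input_ids, attention_mask), both exactly max_len long."""
    body = token_ids[: max_len - 2]        # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)  # 1 = real token
    n_pad = max_len - len(input_ids)
    input_ids += [PAD_ID] * n_pad
    attention_mask += [0] * n_pad          # 0 = padding, ignored by attention
    return input_ids, attention_mask

# A short comment gets padded; a long one gets cut down to fit.
short_ids, short_mask = pad_or_truncate([7, 8, 9], max_len=8)
long_ids, long_mask = pad_or_truncate(list(range(1000, 1020)), max_len=8)
```

Fixed-length outputs are what allow the default DataLoader collation to stack individual samples into a single batch tensor.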

In [4]:
def create_data_loader(dataframe, max_len, batch_size):
    """
    Build a BeerDataset from the dataframe and wrap it in a PyTorch DataLoader
    """
    dataset = BeerDataset(
        comments=dataframe['review/text'],
        apperance_target=dataframe.review_appearance,
        aroma_target=dataframe.review_aroma,
        overall_target=dataframe.review_overall,
        palate_target=dataframe.review_palate,
        taste_target=dataframe.review_taste,
        max_len=max_len
    )

    return DataLoader(
        dataset,
        batch_size=batch_size
    )

In [5]:
class BeerRateClassifier(nn.Module):
    """
    Main model of the beer review rating classifier:
    a shared BERT encoder with one linear output head per aspect
    """
    def __init__(self,
                 apperance_n_classes,
                 aroma_n_classes,
                 overall_n_classes,
                 palate_n_classes,
                 taste_n_classes):
        super(BeerRateClassifier, self).__init__()
        aspects = {
            'apperance': apperance_n_classes,
            'aroma': aroma_n_classes,
            'overall': overall_n_classes,
            'palate': palate_n_classes,
            'taste': taste_n_classes
        }

        self.bert = BertModel.from_pretrained(PRE_TRAINED_MODEL_NAME)
        self.aspect_outs = nn.ModuleDict({
            aspect: nn.Linear(self.bert.config.hidden_size, n_classes)
            for aspect, n_classes in aspects.items()  
        })
        self.drop = nn.Dropout(0.2)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        out = self.drop(outputs.pooler_output)
        aspect_outputs = {
            aspect: aspect_out(out)
            for aspect, aspect_out in self.aspect_outs.items()
        }

        return aspect_outputs

In [6]:
def train_epoch(model,
                data_loader,
                loss_fn,
                optimizer,
                scheduler,
                n_examples):
    """
    Run one training epoch of the classifier.
    Returns (accuracy averaged over the five aspects, mean batch loss).
    """
    model = model.train()

    losses = []
    correct_predictions = 0.
    for batch in data_loader:
        input_ids = batch['input_ids'].to(DEVICE)
        attention_mask = batch['attention_mask'].to(DEVICE)
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)

        preds = {
            aspect: torch.max(output, dim=1)[1]
            for aspect, output in outputs.items()
        }
        targets = {
            aspect: batch[f"{aspect}_target"].view(-1).to(DEVICE)
            for aspect in preds.keys()
        }
        aspect_losses = {
            aspect: loss_fn(outputs[aspect], targets[aspect])
            for aspect in preds.keys()
        }
        correct_predictions += sum([
            torch.sum(preds[aspect] == targets[aspect]).item() for aspect in preds.keys()
        ])

        loss = torch.stack([val for _, val in aspect_losses.items()]).sum()
        losses.append(loss.item())

        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

    return correct_predictions / n_examples / 5, np.mean(losses)
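
The returned accuracy divides the pooled count of correct predictions by the number of examples and then by the number of aspects, which equals the mean of the per-aspect accuracies. A small plain-Python sketch of that arithmetic follows. The helper name `mean_aspect_accuracy` and the toy predictions are hypothetical, and only two aspects are used to keep the numbers easy to check.

```python
def mean_aspect_accuracy(preds, targets):
    """Pool correct predictions over all aspects, then normalize."""
    n_examples = len(next(iter(targets.values())))
    correct = sum(
        sum(p == t for p, t in zip(preds[aspect], targets[aspect]))
        for aspect in preds
    )
    return correct / n_examples / len(preds)

# Toy batch of 4 reviews: aroma has 3 of 4 correct, taste has 2 of 4 correct.
preds = {"aroma": [0, 1, 2, 3], "taste": [1, 1, 1, 1]}
targets = {"aroma": [0, 1, 2, 0], "taste": [1, 0, 1, 0]}

accuracy = mean_aspect_accuracy(preds, targets)  # (3 + 2) / 4 / 2
```

Under this metric a review counts as partly correct when only some of its aspects are predicted correctly. It is not an exact-match accuracy over all five ratings.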

In [7]:
def eval_model(model,
               data_loader,
               loss_fn,
               n_examples):
    """
    Evaluate the classifier after each training epoch, without gradient updates.
    Returns (accuracy averaged over the five aspects, mean batch loss).
    """
    model = model.eval()

    losses = []
    correct_predictions = 0.
    with torch.no_grad():
        for batch in data_loader:
            input_ids = batch['input_ids'].to(DEVICE)
            attention_mask = batch['attention_mask'].to(DEVICE)
            outputs = model(
                input_ids=input_ids,
                attention_mask=attention_mask
            )

            preds = {
                aspect: torch.max(output, dim=1)[1]
                for aspect, output in outputs.items()
            }
            targets = {
                aspect: batch[f"{aspect}_target"].view(-1).to(DEVICE)
                for aspect in preds.keys()
            }
            aspect_losses = {
                aspect: loss_fn(outputs[aspect], targets[aspect])
                for aspect in preds.keys()
            }
            correct_predictions += sum([
                torch.sum(preds[aspect] == targets[aspect]).item() for aspect in preds.keys()
            ])

            loss = torch.stack([val for _, val in aspect_losses.items()]).sum()
            losses.append(loss.item())

    return correct_predictions / n_examples / 5, np.mean(losses)

In [8]:
TRAIN = pd.read_json(os.path.join('data', 'train_set.json'), encoding='utf-8')
TRAIN = TRAIN.sample(frac=1).reset_index(drop=True)
VAL = pd.read_json(os.path.join('data', 'test_set.json'), encoding='utf-8')
VAL = VAL.sample(frac=1).reset_index(drop=True)

In [9]:
MODEL = BeerRateClassifier(4, 4, 4, 4, 4)
MODEL.to(DEVICE)

TRAIN_DATA_LOADER = create_data_loader(TRAIN, MAX_LEN, BATCH_SIZE)
VAL_DATA_LOADER = create_data_loader(VAL, MAX_LEN, BATCH_SIZE)

OPTIMIZER = AdamW(MODEL.parameters(), lr=2e-5, correct_bias=False)
TOTAL_STEPS = len(TRAIN_DATA_LOADER) * EPOCHS
SCHEDULER = get_linear_schedule_with_warmup(
    OPTIMIZER,
    num_warmup_steps=TOTAL_STEPS // 10,
    num_training_steps=TOTAL_STEPS
)
LOSS_FN = nn.CrossEntropyLoss().to(DEVICE)
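
With `num_warmup_steps=TOTAL_STEPS // 10`, the learning rate ramps up linearly over the first tenth of training and then decays linearly to zero. The sketch below reimplements that multiplier in plain Python for illustration; it is not the library code. The function name `linear_warmup_factor` and the step count of 1000 are assumptions chosen for readability.

```python
def linear_warmup_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given step."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)  # linear ramp from 0 to 1
    remaining = num_training_steps - step
    decay_span = max(1, num_training_steps - num_warmup_steps)
    return max(0.0, remaining / decay_span)     # linear decay from 1 to 0

BASE_LR = 2e-5
total_steps = 1000                # hypothetical; the notebook uses len(TRAIN_DATA_LOADER) * EPOCHS
warmup_steps = total_steps // 10  # same 10% warmup ratio as above

# Learning rate at a few checkpoints: start, mid-warmup, peak, mid-decay, end.
schedule = {
    step: BASE_LR * linear_warmup_factor(step, warmup_steps, total_steps)
    for step in (0, 50, 100, 550, 1000)
}
```

Because the schedule is defined per optimizer step, `scheduler.step()` is called once per batch in `train_epoch`, not once per epoch.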

In [10]:
BEST_ACCURACY = 0

for epoch in range(EPOCHS):
    print(f"Epoch {epoch + 1}/{EPOCHS}")
    print('-' * 10)

    train_acc, train_loss = train_epoch(
        MODEL,
        TRAIN_DATA_LOADER,
        LOSS_FN,
        OPTIMIZER,
        SCHEDULER,
        len(TRAIN)
    )
    print(f"Train loss {train_loss}, accuracy {train_acc}")

    val_acc, val_loss = eval_model(
        MODEL,
        VAL_DATA_LOADER,
        LOSS_FN,
        len(VAL)
    )
    print(f"Val   loss {val_loss}, accuracy {val_acc}")
    print()

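    if val_acc > BEST_ACCURACY:
        # Note: this saves only the BERT encoder; the per-aspect linear heads are not included.
        MODEL.bert.save_pretrained('.')
        # Update the tracked best score (the original assigned to a different, lowercase name).
        BEST_ACCURACY = val_acc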

Epoch 1/10
----------
Train loss 4.929799111085221, accuracy 0.5110888888888889
Val   loss 4.869084894466705, accuracy 0.42607999999999996

Epoch 2/10
----------
Train loss 4.400166407116274, accuracy 0.573608888888889
Val   loss 4.941175489760816, accuracy 0.42944000000000004

Epoch 3/10
----------
Train loss 4.071237479669955, accuracy 0.6172311111111111
Val   loss 5.608161236150577, accuracy 0.39332

Epoch 4/10
----------
Train loss 3.736957794696972, accuracy 0.65724
Val   loss 6.949976505182041, accuracy 0.32792

Epoch 5/10
----------
Train loss 3.382561741966901, accuracy 0.6973022222222223
Val   loss 7.725731462716294, accuracy 0.34371999999999997

Epoch 6/10
----------
Train loss 3.051494197280984, accuracy 0.7325066666666666
Val   loss 8.87326494344888, accuracy 0.35308

Epoch 7/10
----------
Train loss 2.7533949413970826, accuracy 0.7626533333333334
Val   loss 9.453801775130982, accuracy 0.37476

Epoch 8/10
----------
Train loss 2.5196749772671234, accuracy 0.7858666666666666