# **Model Tuning**
This notebook walks through model-tuning strategies for a given dataset and compares the effect of different hyperparameter choices.

## Chapter outline
* ### [Loss function](#LossFunction)
* ### [Activation function](#ActivationFunction)
* ### [Optimizer](#Optimizer)
* ### [Learning rate](#LearningRate)
* ### [Model architecture](#ModelArchitecture)

## Import packages

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm.auto import tqdm

# PyTorch packages
import torch
import torch.nn as nn
import torch.nn.functional as F

## Dataset Creating / Loading

In [None]:
# Download and unzip the data
!wget -q https://github.com/TA-aiacademy/course_3.0/releases/download/DL/Data_part3.zip
!unzip -q Data_part3.zip

In [None]:
train_df = pd.read_csv('./Data/News_train.csv')
test_df = pd.read_csv('./Data/News_test.csv')

In [None]:
train_df.head()

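* #### News article dataset
The training and test sets contain 7,728 and 1,907 articles respectively. Each of the 4,081 feature columns corresponds to a common word: the value is 1 if the word appears in the article and 0 otherwise. `y_category` holds the article's category, and there are 11 categories in total.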
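
To make the encoding concrete, here is a minimal sketch of how one binary bag-of-words row could be built. The five-word vocabulary and the `encode` helper are hypothetical and only illustrate the idea; the CSV files used in this notebook are already encoded.

```python
# Hypothetical 5-word vocabulary; the real dataset uses 4,081 common words.
vocab = ["election", "market", "goal", "vaccine", "storm"]


def encode(text, vocab):
    """Return a 0/1 vector: 1 if the vocabulary word occurs in the text, else 0."""
    tokens = set(text.lower().split())
    return [1 if word in tokens else 0 for word in vocab]


row = encode("Late goal decides the match as storm delays kickoff", vocab)
print(row)  # [0, 0, 1, 0, 1]
```

Repeated words do not change the row: only presence or absence is recorded, not the count.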

In [None]:
X_df = train_df.iloc[:, :-1].values
y_df = train_df.y_category.values

In [None]:
X_test = test_df.iloc[:, :-1].values
y_test = test_df.y_category.values

## Data Preprocessing

In [None]:
from sklearn.preprocessing import StandardScaler, MinMaxScaler
# Feature scaling
sc = StandardScaler()
X_scale = sc.fit_transform(X_df, y_df)
X_test_scale = sc.transform(X_test)
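
The scaling step can be reproduced by hand. The sketch below uses made-up numbers and plain NumPy, and assumes the usual z-score definition with the population standard deviation. It shows that the test data is transformed with statistics estimated from the training data only.

```python
import numpy as np

# Toy data (made up for illustration): 4 training rows, 2 binary features.
X_tr = np.array([[0., 1.], [1., 1.], [0., 0.], [1., 0.]])
X_te = np.array([[1., 1.]])

mean = X_tr.mean(axis=0)   # per-feature mean from the training data
std = X_tr.std(axis=0)     # per-feature population standard deviation (ddof=0)
std[std == 0] = 1.0        # guard against constant columns (division by zero)

X_tr_scaled = (X_tr - mean) / std
X_te_scaled = (X_te - mean) / std   # reuse the training statistics

print(X_tr_scaled.mean(axis=0), X_tr_scaled.std(axis=0))
print(X_te_scaled)
```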

In [None]:
# train, valid/test dataset split
from sklearn.model_selection import train_test_split
X_train, X_valid, y_train, y_valid = train_test_split(X_scale, y_df,
                                                      test_size=0.2,
                                                      random_state=5566,
                                                      stratify=y_df)

In [None]:
print(f'X_train shape: {X_train.shape}')
print(f'X_valid shape: {X_valid.shape}')
print(f'y_train shape: {y_train.shape}')
print(f'y_valid shape: {y_valid.shape}')

In [None]:
# build dataset and dataloader
train_ds = torch.utils.data.TensorDataset(torch.tensor(X_train, dtype=torch.float32),
                                          torch.tensor(y_train, dtype=torch.long))
valid_ds = torch.utils.data.TensorDataset(torch.tensor(X_valid, dtype=torch.float32),
                                          torch.tensor(y_valid, dtype=torch.long))

BATCH_SIZE = 64
train_loader = torch.utils.data.DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True)
valid_loader = torch.utils.data.DataLoader(valid_ds, batch_size=BATCH_SIZE)

## Model Building

In [None]:
NUM_CLASS = 11

In [None]:
torch.manual_seed(5566)

def build_model(input_shape, num_class):
    model = nn.Sequential(
        nn.Linear(input_shape, 16),
        nn.Sigmoid(),
        nn.Linear(16, 16),
        nn.Sigmoid(),
        nn.Linear(16, num_class),
    )
    return model

In [None]:
model = build_model(X_train.shape[1], NUM_CLASS)
print(model)

## Model Training

In [None]:
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()  # multi-class classification loss

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'device: {device}')
model = model.to(device)
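
`nn.CrossEntropyLoss` takes raw logits and integer class indices, which is why the model ends with a plain `nn.Linear` and no softmax layer. The following sketch, using made-up logits, checks that it matches `log_softmax` followed by the negative log-likelihood loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up logits for 2 samples and 3 classes, with integer class labels.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

ce = nn.CrossEntropyLoss()(logits, targets)

# Equivalent computation in two explicit steps.
log_probs = F.log_softmax(logits, dim=1)
manual = F.nll_loss(log_probs, targets)

# And fully by hand: mean of -log p(true class).
by_hand = -log_probs[torch.arange(2), targets].mean()

print(ce.item(), manual.item(), by_hand.item())
```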

In [None]:
def train_epoch(model, optimizer, loss_fn, train_dataloader, val_dataloader):
    # Losses other than CrossEntropyLoss receive softmax probabilities
    # and one-hot targets instead of logits and class indices
    use_one_hot = not isinstance(loss_fn, nn.CrossEntropyLoss)

    # Train for one epoch
    model.train()
    total_train_loss = 0
    total_train_correct = 0
    for x, y in tqdm(train_dataloader, leave=False):
        x, y = x.to(device), y.to(device)  # move the batch to the selected device
        y_pred = model(x)  # forward pass
        if use_one_hot:
            y_pred = F.softmax(y_pred, dim=1)
            y = F.one_hot(y, num_classes=NUM_CLASS).float()  # one-hot encoding
        loss = loss_fn(y_pred, y)  # compute the loss
        optimizer.zero_grad()  # reset the gradients
        loss.backward()  # backpropagate to compute the gradients
        optimizer.step()  # update the model parameters

        total_train_loss += loss.item()
        if use_one_hot:
            y = y.argmax(dim=1).long()
        # argmax gives the predicted class; compare it with the label
        total_train_correct += ((y_pred.argmax(dim=1) == y).sum().item())

    # Validate for one epoch
    model.eval()
    total_val_loss = 0
    total_val_correct = 0
    # Disable gradient tracking to speed up evaluation
    with torch.no_grad():
        for x, y in val_dataloader:
            x, y = x.to(device), y.to(device)
            y_pred = model(x)
            if use_one_hot:
                y_pred = F.softmax(y_pred, dim=1)
                y = F.one_hot(y, num_classes=NUM_CLASS).float()  # one-hot encoding
            loss = loss_fn(y_pred, y)
            total_val_loss += loss.item()
            # argmax gives the predicted class; compare it with the label
            if use_one_hot:
                y = y.argmax(dim=1).long()
            total_val_correct += ((y_pred.argmax(dim=1) == y).sum().item())

    # Loss is averaged over batches, accuracy over samples
    avg_train_loss = total_train_loss / len(train_dataloader)
    avg_train_acc = total_train_correct / len(train_dataloader.dataset)
    avg_val_loss = total_val_loss / len(val_dataloader)
    avg_val_acc = total_val_correct / len(val_dataloader.dataset)

    return avg_train_loss, avg_val_loss, avg_train_acc, avg_val_acc

In [None]:
def run(model, optimizer, loss_fn, train_loader, valid_loader, verbose=1):
    train_loss_log = []
    val_loss_log = []
    train_acc_log = []
    val_acc_log = []
    for epoch in tqdm(range(20)):
        avg_train_loss, avg_val_loss, avg_train_acc, avg_val_acc = train_epoch(model, optimizer, loss_fn, train_loader, valid_loader)
        train_loss_log.append(avg_train_loss)
        val_loss_log.append(avg_val_loss)
        train_acc_log.append(avg_train_acc)
        val_acc_log.append(avg_val_acc)
        if verbose == 1:
            print(f'Epoch: {epoch}, Train Loss: {avg_train_loss:.3f}, Val Loss: {avg_val_loss:.3f} | Train Acc: {avg_train_acc:.3f}, Val Acc: {avg_val_acc:.3f}')
    return train_loss_log, train_acc_log, val_loss_log, val_acc_log

In [None]:
train_loss_log, train_acc_log, val_loss_log, val_acc_log = run(model, optimizer, loss_fn, train_loader, valid_loader)

## Model Evaluation

In [None]:
plt.figure(figsize=(15, 4))
plt.subplot(1, 2, 1)
plt.plot(range(len(train_loss_log)), train_loss_log, label='train_loss')
plt.plot(range(len(val_loss_log)), val_loss_log, label='valid_loss')
plt.xlabel('Epochs')
plt.ylabel('Cross-entropy loss')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(range(len(train_acc_log)), train_acc_log, label='train_acc')
plt.plot(range(len(val_acc_log)), val_acc_log, label='valid_acc')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

## Model Tuning

![](https://hackmd.io/_uploads/SyE5RYIbT.png)


<a name="LossFunction"></a>
* ## Loss function
torch.nn Loss function: https://pytorch.org/docs/stable/nn.html#loss-functions
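
For losses other than cross-entropy, `train_epoch` applies a softmax to the logits and one-hot encodes the labels. The sketch below, with made-up logits, evaluates the three losses on one confidently wrong prediction. With mean reduction, MSE and L1 on probabilities cannot exceed 2/C for C classes, while cross-entropy keeps growing as the prediction gets more confidently wrong. That difference is one commonly cited reason why cross-entropy tends to suit classification better, though it is only an illustration and not a result from this notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 3
logits = torch.tensor([[8.0, 0.0, 0.0]])   # confident prediction of class 0
target = torch.tensor([1])                 # but the true class is 1

# Same preprocessing as train_epoch uses for non-cross-entropy losses.
probs = F.softmax(logits, dim=1)
one_hot = F.one_hot(target, num_classes=num_classes).float()

ce = nn.CrossEntropyLoss()(logits, target)   # logits + class index
mse = nn.MSELoss()(probs, one_hot)           # probabilities + one-hot target
mae = nn.L1Loss()(probs, one_hot)            # probabilities + one-hot target

print(f'CE={ce.item():.3f}  MSE={mse.item():.3f}  L1={mae.item():.3f}')
```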

In [None]:
# Loss functions to compare
loss_funcs = [
    nn.MSELoss(),
    nn.CrossEntropyLoss(),
    nn.L1Loss(), # mean absolute error
]

# Two lists that record the training results for each loss function
all_loss, all_acc = [], []

# Train one model per loss function
for loss_fn in loss_funcs:
    print(f'Running model, loss = {loss_fn}')

    # Always train a fresh model rather than continuing from the previous run;
    # reseeding gives every run the same initial weights
    torch.manual_seed(5566)
    model = build_model(X_train.shape[1], NUM_CLASS)

    optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001)
    model = model.to(device)

    # Use the same training settings for every run
    history = run(model, optimizer, loss_fn, train_loader, valid_loader, verbose=0)

    # Record the training history (training loss and training accuracy)
    all_loss.append(history[0])
    all_acc.append(history[1])
print('----------------- training done! -----------------')

In [None]:
# Visualize the training process
plt.figure(figsize=(15, 7))

# Plot the training loss
plt.subplot(121)
for k in range(len(loss_funcs)):
    plt.plot(range(len(all_loss[k])), all_loss[k], label=loss_funcs[k])
plt.title('Loss')

# Plot the training accuracy
plt.subplot(122)
for k in range(len(loss_funcs)):
    plt.plot(range(len(all_acc[k])), all_acc[k], label=loss_funcs[k])
plt.title('Accuracy')

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=2.)
plt.ylim((0, 1))
plt.show()

---
![](https://hackmd.io/_uploads/BknsRtLZa.png)

---

<a name="ActivationFunction"></a>
* ## Activation function
torch.nn: https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity
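
`nn.Identity` appears in the comparison as a baseline without any non-linearity. The sketch below uses a small made-up network to check that stacking linear layers with identity activations is equivalent to a single linear layer, which is why a non-linear activation is needed for the extra layers to add expressive power.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Small illustrative network: two Linear layers with no non-linearity between them.
model = nn.Sequential(nn.Linear(4, 3), nn.Identity(), nn.Linear(3, 2))

# Fold the two layers into one linear map: W = W2 @ W1, b = W2 @ b1 + b2
W1, b1 = model[0].weight, model[0].bias
W2, b2 = model[2].weight, model[2].bias
W = W2 @ W1
b = W2 @ b1 + b2

x = torch.randn(5, 4)
with torch.no_grad():
    same = torch.allclose(model(x), x @ W.T + b, atol=1e-5)
print(same)
```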

In [None]:
def build_model_activation(input_shape, num_class, activation):
    torch.manual_seed(5566)
    # Rebuild the model so that the activation function can be swapped
    model = nn.Sequential(
        nn.Linear(input_shape, 16),
        activation(),
        nn.Linear(16, 16),
        activation(),
        nn.Linear(16, num_class),
    )
    return model

In [None]:
# Activation functions to compare
activation_funcs = [
    nn.Identity,
    nn.Sigmoid,
    nn.Tanh,
    nn.ReLU,
    nn.Softplus,
    nn.LeakyReLU,
    nn.Mish,
]

# Two lists that record the training results for each activation function
all_loss, all_acc = [], []

# Train one model per activation function
for activation_f in activation_funcs:
    print(f'Running model, activation = {activation_f.__name__}')

    # Always train a fresh model rather than continuing from the previous run
    model = build_model_activation(X_train.shape[1],
                                   NUM_CLASS,
                                   activation_f)
    model = model.to(device)
    optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001)
    loss_fn = nn.CrossEntropyLoss()  # multi-class classification loss

    history = run(model, optimizer, loss_fn, train_loader, valid_loader, verbose=0)

    # Record the training history (training loss and training accuracy)
    all_loss.append(history[0])
    all_acc.append(history[1])
print('----------------- training done! -----------------')

In [None]:
# Visualize the training process
plt.figure(figsize=(15, 7))

# Plot the training loss
plt.subplot(121)
for k in range(len(activation_funcs)):
    plt.plot(range(len(all_loss[k])), all_loss[k], label=activation_funcs[k].__name__)
plt.title('Loss')

# Plot the training accuracy
plt.subplot(122)
for k in range(len(activation_funcs)):
    plt.plot(range(len(all_acc[k])), all_acc[k], label=activation_funcs[k].__name__)
plt.title('Accuracy')

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

plt.ylim((0, 1))
plt.show()

In [None]:
# Visualize the training process (zoomed in)
plt.figure(figsize=(15, 4))

# Plot the training loss
plt.subplot(121)
for k in range(len(activation_funcs)):
    plt.plot(range(len(all_loss[k])), all_loss[k], label=activation_funcs[k].__name__)
plt.title('Loss')
plt.ylim((0, 0.3))

# Plot the training accuracy
plt.subplot(122)
for k in range(len(activation_funcs)):
    plt.plot(range(len(all_acc[k])), all_acc[k], label=activation_funcs[k].__name__)
plt.title('Accuracy')

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

plt.ylim((0.95, 0.975))
plt.show()

---
![](https://hackmd.io/_uploads/BJ1pAY8bT.png)

---

<a name="Optimizer"></a>
* ## Optimizer
torch.optim: https://pytorch.org/docs/stable/optim.html#algorithms
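
All of these optimizers build on the basic gradient step. As a minimal sketch with made-up values, the code below checks that one `torch.optim.SGD` step (no momentum, no weight decay) equals the hand-computed update `w - lr * grad`. The learning rate of 0.1 is chosen only for illustration.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
lr = 0.1  # illustrative learning rate

loss = (w ** 2).sum()   # the gradient of this loss is 2 * w
loss.backward()

# Hand-computed update: w_new = w - lr * grad
expected = (w - lr * w.grad).detach().clone()

optimizer = torch.optim.SGD([w], lr=lr)
optimizer.step()        # updates w in place using w.grad

print(w.detach(), expected)
```

RMSprop, Adam and NAdam additionally rescale this step with running statistics of past gradients, so their updates generally differ from the plain rule above.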

In [None]:
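# Optimizers to compare
optimizer_funcs = [
    torch.optim.SGD,
    torch.optim.RMSprop,
    torch.optim.Adam,
    torch.optim.NAdam,
]

# Two lists that record the training results for each optimizer
all_loss, all_acc = [], []

# Train one model per optimizer
for optimizer_f in optimizer_funcs:
    print(f'Running model, optimizer = {optimizer_f.__name__}')

    # Always train a fresh model rather than continuing from the previous run
    model = build_model_activation(X_train.shape[1],
                                   NUM_CLASS,
                                   nn.Tanh)
    model = model.to(device)
    # Each optimizer is created with its own default hyperparameters, so the
    # learning rates are not matched across optimizers. Some older PyTorch
    # releases also require an explicit lr for SGD.
    optimizer = optimizer_f(model.parameters())
    loss_fn = nn.CrossEntropyLoss()  # multi-class classification loss

    # Use the same training settings for every run
    history = run(model, optimizer, loss_fn, train_loader, valid_loader, verbose=0)
    # Record the training history (training loss and training accuracy)
    all_loss.append(history[0])
    all_acc.append(history[1])

print('----------------- training done! -----------------')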

In [None]:
# Visualize the training process
plt.figure(figsize=(15, 7))

# Plot the training loss
plt.subplot(121)
for k in range(len(optimizer_funcs)):
    plt.plot(range(len(all_loss[k])), all_loss[k], label=optimizer_funcs[k].__name__)
plt.title('Loss')

# Plot the training accuracy
plt.subplot(122)
for k in range(len(optimizer_funcs)):
    plt.plot(range(len(all_acc[k])), all_acc[k], label=optimizer_funcs[k].__name__)
plt.title('Accuracy')

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=2.)
plt.ylim((0, 1))
plt.show()

<a name="LearningRate"></a>
* ## Learning rate
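
Before running the comparison, it may help to see the effect of the learning rate on a toy problem. The sketch below minimizes the made-up function f(w) = w² with plain gradient descent. The function and the extra value 1.5 are assumptions chosen for illustration and are not part of the experiment that follows.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2 * w        # derivative of w**2
        w = w - lr * grad   # gradient step
    return w


# Too large a step diverges; too small a step barely moves in 20 iterations.
for lr in [1.5, 0.1, 0.01, 0.001]:
    print(f'lr={lr}: final w = {gradient_descent(lr):.4g}')
```

The minimum is at w = 0, so a final value close to 0 means faster convergence within the 20 steps.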

In [None]:
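# Learning rates to compare
lr_list = [0.1, 0.01, 0.001, 0.0001, 0.00001]

# Two lists that record the training results for each learning rate
all_loss, all_acc = [], []

# Train one model per learning rate
for lr in lr_list:
    print(f'Running model, learning rate = {lr}')

    # Always train a fresh model rather than continuing from the previous run
    model = build_model_activation(X_train.shape[1],
                                   NUM_CLASS,
                                   nn.Tanh)
    model = model.to(device)
    # NAdam is the value that the optimizer loop left in `optimizer_f`;
    # naming it explicitly avoids depending on a leftover loop variable
    optimizer = torch.optim.NAdam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()  # multi-class classification loss

    # Use the same training settings for every run
    history = run(model, optimizer, loss_fn, train_loader, valid_loader, verbose=0)
    # Record the training history (training loss and training accuracy)
    all_loss.append(history[0])
    all_acc.append(history[1])
print('----------------- training done! -----------------')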

In [None]:
# Visualize the training process
plt.figure(figsize=(15, 7))

# Plot the training loss
plt.subplot(121)
for k in range(len(lr_list)):
    plt.plot(range(len(all_loss[k])), all_loss[k], label=lr_list[k])
plt.title('Loss')

# Plot the training accuracy
plt.subplot(122)
for k in range(len(lr_list)):
    plt.plot(range(len(all_acc[k])), all_acc[k], label=lr_list[k])
plt.title('Accuracy')

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=2.)
plt.ylim((0, 1))
plt.show()

<a name="ModelArchitecture"></a>
* ## Model architecture

In [None]:
def build_model_architecture(input_shape, num_class, layer, neuron):
    torch.manual_seed(5566)
    layers = []
    input_dim = input_shape
    for i in range(layer):
        layers.append(nn.Linear(input_dim, neuron))
        layers.append(nn.Tanh())
        input_dim = neuron
    layers.append(nn.Linear(input_dim, num_class))
    model = nn.Sequential(*layers)
    return model
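
To get a feel for how the number of layers and neurons changes model size, the sketch below counts the trainable parameters of the same fully connected layout in plain Python. `count_parameters` is a hypothetical helper, and the 4081 input features and 11 classes are taken from the dataset description above.

```python
def count_parameters(input_dim, num_class, layer, neuron):
    """Count weights and biases for `layer` hidden Linear layers plus the output layer."""
    total = 0
    in_dim = input_dim
    for _ in range(layer):
        total += in_dim * neuron + neuron     # weight matrix + bias vector
        in_dim = neuron
    total += in_dim * num_class + num_class   # output layer
    return total


# Assumed sizes: 4081 input features and 11 classes (from the dataset description).
for layer in [1, 2, 3]:
    for neuron in [16, 32, 64]:
        print((layer, neuron), count_parameters(4081, 11, layer, neuron))
```

Because the input is so wide, the first layer dominates the count: widening the hidden layers changes the total far more than adding another hidden layer does.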

In [None]:
# Numbers of hidden layers / neurons to compare
layers_num = [1, 2, 3]
neurons_num = [16, 32, 64]

# Batch size (64) and number of epochs (20) are fixed by the data loaders and run()
loss_fn = nn.CrossEntropyLoss()  # multi-class classification loss

# Two lists that record the training results for each (layer, neuron) pair
all_loss, all_acc = [], []

# Train one model per (layer, neuron) combination
for layer in layers_num:
    for neuron in neurons_num:
        print(f'Running model, (layer, neuron) = {(layer, neuron)}')

        # Always train a fresh model rather than continuing from the previous run
        model = build_model_architecture(X_train.shape[1],
                                         NUM_CLASS,
                                         layer,
                                         neuron)
        model = model.to(device)
        optimizer = torch.optim.NAdam(model.parameters())

        # Use the same training settings for every run
        history = run(model, optimizer, loss_fn, train_loader, valid_loader, verbose=0)
        # Record the training history (training loss and training accuracy)
        all_loss.append(history[0])
        all_acc.append(history[1])
print('----------------- training done! -----------------')

In [None]:
# (layer, neuron) labels in the same order as the training loops
layer_neuron = [(layer, neuron) for layer in layers_num for neuron in neurons_num]

# Visualize the training process
plt.figure(figsize=(15, 7))

# Plot the training loss
plt.subplot(121)
for k in range(len(layer_neuron)):
    plt.plot(range(len(all_loss[k])), all_loss[k], label=layer_neuron[k])
plt.title('Loss')

# Plot the training accuracy
plt.subplot(122)
for k in range(len(layer_neuron)):
    plt.plot(range(len(all_acc[k])), all_acc[k], label=layer_neuron[k])
plt.title('Accuracy')

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

plt.ylim((0, 1))
plt.show()

In [None]:
# Visualize the training process (zoomed in)
plt.figure(figsize=(15, 4))

# Plot the training loss
plt.subplot(121)
for k in range(len(layer_neuron)):
    plt.plot(range(len(all_loss[k])), all_loss[k], label=layer_neuron[k])
plt.title('Loss')
plt.ylim((0.05, 0.1))

# Plot the training accuracy
plt.subplot(122)
for k in range(len(layer_neuron)):
    plt.plot(range(len(all_acc[k])), all_acc[k], label=layer_neuron[k])
plt.title('Accuracy')

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.ylim((0.96, 1.))
plt.show()

---
### Quiz
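Try using Data/pkgo_train.csv for a multi-class classification problem that predicts five kinds of Pokémon. Tune the model (number of layers, number of neurons, activation function) and the training hyperparameters to reach a higher accuracy.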