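# Optimizing the Model

Now that we have a model and data, it is time to train the model.

Training is split into train, validate, and test phases, which are used to optimize and validate the model parameters (the weights W).

In each epoch (one iteration of the optimization loop, i.e. one pass over the training data), the model computes its guess at the answer from the data it receives.

The gap between the true answer and the model's answer (the loss) is then computed, and gradient descent uses it to optimize the model.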
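
As a minimal sketch of this idea (not part of the original notebook), the following fits a single weight `w` so that `w * x` matches `y`, using hand-written gradient descent. The toy data, the learning rate `lr`, and the step count are all assumptions chosen for illustration.

```python
import torch

# Toy data (assumed for illustration): the true relation is y = 3 * x
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = torch.zeros(1, requires_grad=True)  # the single parameter to learn
lr = 0.01                               # illustrative learning rate

for step in range(200):
    pred = w * x                        # the model's guess
    loss = ((pred - y) ** 2).mean()     # gap between guess and truth (MSE)
    loss.backward()                     # compute d(loss)/d(w)
    with torch.no_grad():
        w -= lr * w.grad                # gradient descent update
    w.grad.zero_()                      # reset the gradient for the next step

print(w.item())  # close to 3.0
```

The rest of this notebook does the same thing at a larger scale, with `torch.optim.SGD` performing the update step.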

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

# Dataset

In [2]:
training_data = datasets.FashionMNIST(
    root='../data',
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root='../data',
    train=False,
    download=True,
    transform=ToTensor()
)

# DataLoader

In [3]:
train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)
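
To see what a `DataLoader` yields without downloading anything, here is a small sketch that uses random tensors as a stand-in for FashionMNIST. The names `toy_data` and `toy_loader` and the dataset size of 150 are assumptions for illustration only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset (assumed): 150 fake 1x28x28 "images" with labels in 0..9
images = torch.rand(150, 1, 28, 28)
labels = torch.randint(0, 10, (150,))
toy_data = TensorDataset(images, labels)

toy_loader = DataLoader(toy_data, batch_size=64)

# Each iteration yields one (inputs, labels) batch
batch_shapes = [tuple(X.shape) for X, y in toy_loader]

print(len(toy_loader))  # number of batches: 3
print(batch_shapes)     # two full batches of 64, then a final batch of 22
```

The last batch is smaller whenever the dataset size is not a multiple of `batch_size`.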

# Model definition

In [4]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

In [5]:
FCModel = NeuralNetwork()

# Setting hyperparameters

In [6]:
learning_rate = 1e-3
batch_size = 64
epochs = 5

## Add an optimization loop

Once we set our hyperparameters, we can then train and optimize our model with an optimization loop. Each 
iteration of the optimization loop is called an **epoch**. 

Each epoch consists of two main parts:
 - **The Train Loop** - iterate over the training dataset and try to converge to optimal parameters.
 - **The Validation/Test Loop** - iterate over the test dataset to check if model performance is improving.

Let's briefly familiarize ourselves with some of the concepts used in the training loop. Jump ahead to 
see the full implementation of the optimization loop.

### Add a loss function

When presented with some training data, our untrained network is unlikely to give the correct 
answer. A **loss function** measures how far the obtained result is from the target value, 
and it is the loss function that we want to minimize during training. To calculate the loss, we make a 
prediction using the inputs of a given data sample and compare it against the true label.

Common loss functions include:
- `nn.MSELoss` (Mean Square Error) used for regression tasks
- `nn.NLLLoss` (Negative Log Likelihood) used for classification
- `nn.CrossEntropyLoss` combines `nn.LogSoftmax` and `nn.NLLLoss`

We pass our model's output logits to `nn.CrossEntropyLoss`, which will normalize the logits and compute the prediction error.
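
The relationship between these loss functions can be checked directly. The sketch below uses made-up logits and targets (assumptions for illustration) and compares `nn.CrossEntropyLoss` with `nn.LogSoftmax` followed by `nn.NLLLoss`.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)            # fake raw scores: 4 samples, 10 classes
targets = torch.tensor([1, 0, 9, 3])   # fake class indices

# One step: cross entropy applied directly to the raw logits
ce = nn.CrossEntropyLoss()(logits, targets)

# Two steps: normalize to log-probabilities, then negative log likelihood
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

print(torch.allclose(ce, nll))  # True
```

This is why the model can return raw logits: the normalization happens inside the loss.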

In [7]:
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()

### Optimization pass

Optimization is the process of adjusting model parameters to reduce model error in each training step. **Optimization algorithms** define how this process is performed (in this example we use Stochastic Gradient Descent).
All optimization logic is encapsulated in the `optimizer` object. Here, we use the SGD optimizer; PyTorch also provides many other optimizers,
such as `Adam` and `RMSprop`, that work better for different kinds of models and data.

We initialize the optimizer by registering the model's parameters that need to be trained, and passing in the learning rate hyperparameter.



In [8]:
optimizer = torch.optim.SGD(FCModel.parameters(), lr = learning_rate)
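
To make the update rule concrete, here is a small sketch on a toy parameter. It assumes plain SGD with no momentum or weight decay; the names `w`, `opt`, and `expected` are illustrative. It compares `optimizer.step()` with the manual update `w - lr * grad`.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)  # toy parameter
lr = 0.1                                           # illustrative learning rate
opt = torch.optim.SGD([w], lr=lr)

loss = (w ** 2).sum()   # toy loss; its gradient is 2 * w
loss.backward()

# Manual update computed before the optimizer modifies w in place
expected = (w - lr * w.grad).detach().clone()
opt.step()

print(w.detach())                            # tensor([ 0.8000, -1.6000])
print(torch.allclose(w.detach(), expected))  # True
```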

Inside the training loop, optimization happens in three steps:
 * Call `optimizer.zero_grad()` to reset the gradients of model parameters. Gradients by default add up; to prevent double-counting, we explicitly zero them at each iteration.
 * Back-propagate the prediction loss with a call to `loss.backward()`. PyTorch deposits the gradients of the loss w.r.t. each parameter.
 * Once we have our gradients, we call `optimizer.step()` to adjust the parameters by the gradients collected in the backward pass.
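
The accumulation behavior behind the first step can be seen on a single scalar. This is a minimal sketch with a made-up parameter `w`; it resets the gradient by hand with `w.grad.zero_()`, which serves the same purpose as `optimizer.zero_grad()` in the training loop.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(w * 3).backward()
first = w.grad.item()        # gradient of 3 * w is 3

(w * 3).backward()           # no reset in between, so gradients add up
accumulated = w.grad.item()

w.grad.zero_()               # reset by hand
(w * 3).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)  # 3.0 6.0 3.0
```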

In [9]:
def train_loop(dataloader : DataLoader, model: nn.Module, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
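        # Run the model's prediction and compute the loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        # Reset all gradients first: loss.backward() does not zero them,
        # so without this step they would accumulate across batches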
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0 :
            loss, current = loss.item(), batch * len(X)
            print(f'loss: {loss:>7f}, [{current:>5d} / {size:>5d}]')

In [10]:
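def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    test_loss, correct = 0, 0
    # Gradients are not needed for evaluation
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            # loss_fn returns the mean loss of the batch
            test_loss += loss_fn(pred, y).item()
            # Count predictions whose top-scoring class matches the label
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    # Sum of per-batch mean losses divided by the number of samples,
    # so this value is roughly batch_size times smaller than a per-batch mean
    test_loss /= size
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")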
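
The accuracy computation in the test loop can be traced on a tiny hand-made batch. The scores and labels below are invented for illustration: `argmax(1)` picks the highest-scoring class per row, and the comparison with `y` counts the matches.

```python
import torch

# Fake model outputs: 4 samples, 3 classes
pred = torch.tensor([[0.1, 2.0, 0.3],
                     [1.5, 0.2, 0.1],
                     [0.0, 0.1, 3.0],
                     [0.9, 0.8, 0.7]])
y = torch.tensor([1, 0, 1, 0])  # fake true labels

predicted_class = pred.argmax(1)  # index of the largest score in each row
correct = (predicted_class == y).type(torch.float).sum().item()
accuracy = correct / len(y)

print(predicted_class.tolist())  # [1, 0, 2, 0]
print(correct, accuracy)         # 3.0 0.75
```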

In [11]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(FCModel.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, FCModel, loss_fn, optimizer)
    test_loop(test_dataloader, FCModel, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.300768, [    0 / 60000]
loss: 2.299737, [ 6400 / 60000]
loss: 2.292356, [12800 / 60000]
loss: 2.289424, [19200 / 60000]
loss: 2.292302, [25600 / 60000]
loss: 2.265813, [32000 / 60000]
loss: 2.274067, [38400 / 60000]
loss: 2.260820, [44800 / 60000]
loss: 2.240176, [51200 / 60000]
loss: 2.232794, [57600 / 60000]
Test Error: 
 Accuracy: 29.0%, Avg loss: 0.035293 

Epoch 2
-------------------------------
loss: 2.235668, [    0 / 60000]
loss: 2.255345, [ 6400 / 60000]
loss: 2.229270, [12800 / 60000]
loss: 2.244750, [19200 / 60000]
loss: 2.245237, [25600 / 60000]
loss: 2.179070, [32000 / 60000]
loss: 2.209753, [38400 / 60000]
loss: 2.176279, [44800 / 60000]
loss: 2.135143, [51200 / 60000]
loss: 2.144182, [57600 / 60000]
Test Error: 
 Accuracy: 37.1%, Avg loss: 0.034074 

Epoch 3
-------------------------------
loss: 2.135780, [    0 / 60000]
loss: 2.181825, [ 6400 / 60000]
loss: 2.144078, [12800 / 60000]
loss: 2.184676, [19200 / 60000]
loss: 2.

## Saving Models

When you are satisfied with the model's performance, you can save it. PyTorch models store the learned parameters in an internal state dictionary, called `state_dict`. This dictionary can be persisted with the `torch.save` method:

In [14]:
torch.save(FCModel.state_dict(), "../data/model.pth")

print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth
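
A saved `state_dict` can later be loaded into a freshly constructed model of the same architecture. The sketch below is one way to do that round trip; it uses a small `nn.Linear` as a stand-in model and a temporary file path, both of which are assumptions for illustration rather than part of the notebook.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 2)  # stand-in model (assumed)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pth")  # hypothetical file location
    torch.save(model.state_dict(), path)

    # Recreate the same architecture, then load the saved parameters
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path))

restored.eval()  # switch to inference mode before making predictions

x = torch.rand(3, 4)
same_output = torch.allclose(model(x), restored(x))
print(same_output)  # True
```

For the model in this notebook, the same pattern would construct a new `NeuralNetwork()` and load the file written above into it.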
