# Optimizing model parameters
Now that we have a model and data, it's time to train, validate, and test our model by optimizing its parameters on our data.
Training a model is an iterative process; in each iteration (called an _epoch_) the model makes a guess about the output, calculates the error in its guess (the _loss_), collects the derivatives of the error with respect to its parameters (as we saw in the [previous section](https://pytorch.org/tutorials/beginner/basics/autograd_tutorial.html)), and **optimizes** these parameters using gradient descent.
For a more detailed walkthrough of this process, check out this video on [backpropagation from 3Blue1Brown](https://www.youtube.com/watch?v=tIeHLnjs5U8).
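
The guess, loss, derivative, update cycle described above can be sketched without any framework. The following is a minimal illustration, not the tutorial's model: it assumes a one-parameter model `y = w * x`, toy data generated by `y = 2 * x`, and a hand-derived gradient of the mean squared error. All names and values here are chosen for the example only.

```python
# Toy data generated by the rule y = 2 * x (an assumption for this sketch).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0               # the single model parameter, deliberately a bad start
learning_rate = 0.01  # step size, chosen by hand for this toy problem
losses = []

for epoch in range(50):
    # 1. Guess: predict an output for every input.
    preds = [w * x for x in xs]
    # 2. Loss: mean squared error between guesses and targets.
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    # 3. Derivative of the loss with respect to w, worked out by hand.
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    # 4. Gradient descent: move w a small step against the gradient.
    w -= learning_rate * grad
    losses.append(loss)

print(f"w = {w:.4f}, first loss = {losses[0]:.4f}, last loss = {losses[-1]:.6f}")
```

In PyTorch, step 3 is what autograd computes for us and step 4 is what the optimizer performs, so neither has to be written by hand for a real network.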

## Prerequisite Code
We load the code from the previous sections on [Datasets & DataLoaders](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html) and [Build Model](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html).

In [2]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda
print(f'torch version: {torch.__version__}')

torch version: 1.8.1


In [3]:
training_data = datasets.FashionMNIST(
    root='data',
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root='data',
    train=False,
    download=True,
    transform=ToTensor()
)

In [4]:
train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

In [5]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )
    
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

In [7]:
model = NeuralNetwork()

## Hyperparameters
Hyperparameters are adjustable parameters that let you control the model optimization process.
Different hyperparameter values can impact model training and convergence rates ([read more](https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html) about hyperparameter tuning).

We define the following hyperparameters for training:

- Number of Epochs - the number of times to iterate over the dataset
- Batch Size - the number of data samples propagated through the network before the parameters are updated
- Learning Rate - how much to update the model's parameters at each batch/epoch. Smaller values yield slow learning speed, while large values may result in unpredictable behavior during training

In [8]:
learning_rate = 1e-3
batch_size = 64
epochs = 5
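
To see how these values interact, the short calculation below counts how many parameter updates they imply. It is a back-of-the-envelope sketch that assumes the 60,000-sample FashionMNIST training split and a `DataLoader` that keeps the final, smaller batch (the default behaviour). The variable names `updates_per_epoch`, `last_batch_size`, and `total_updates` are introduced here for illustration.

```python
import math

num_samples = 60000  # size of the FashionMNIST training split
batch_size = 64
epochs = 5

# The last batch may be smaller than batch_size, so round up.
updates_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (updates_per_epoch - 1) * batch_size
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, last_batch_size, total_updates)  # 938 32 4690
```

Halving the batch size would roughly double the number of updates per epoch, which is one reason batch size and learning rate are usually tuned together.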

## Optimization Loop
Once we set our hyperparameters, we can train and optimize our model with an optimization loop.
Each iteration of the optimization loop is called an **epoch**.

Each epoch consists of two main parts:

- **The Train Loop** - iterate over the training dataset and try to converge to optimal parameters
- **The Validation/Test Loop** - iterate over the test dataset to check if model performance is improving

Let's briefly familiarize ourselves with some of the concepts used in the training loop.
To see the complete optimization loop, jump ahead to the Full Implementation section.

## Loss Function
When presented with some training data, our untrained network is unlikely to give the correct answer.
The **loss function** measures the degree of dissimilarity between the obtained result and the target value, and it is the loss function that we want to minimize during training.
To calculate the loss, we make a prediction using the inputs of a given data sample and compare it against the true label.

Common loss functions include [nn.MSELoss](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss) (Mean Square Error) for regression tasks, and [nn.NLLLoss](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss) (Negative Log Likelihood) for classification.
[nn.CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss) combines `nn.LogSoftmax` and `nn.NLLLoss`.

We pass our model's output logits to `nn.CrossEntropyLoss`, which normalizes the logits and computes the prediction error.

In [9]:
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()
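
To make "normalize the logits and compute the prediction error" concrete, here is a framework-free sketch of the same two stages for a single sample: a log-softmax followed by the negative log likelihood of the true class. The helper names `log_softmax` and `cross_entropy` are defined here for illustration and are not PyTorch functions. Note that `nn.CrossEntropyLoss` additionally averages over the batch by default.

```python
import math

def log_softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def cross_entropy(logits, target):
    # Negative log likelihood of the true class under the softmax distribution.
    return -log_softmax(logits)[target]

# Ten equal logits: the model has no preference among the ten classes.
uniform = cross_entropy([0.0] * 10, target=3)
# A large logit on the true class gives a small loss ...
confident_right = cross_entropy([5.0, 0.0, 0.0], target=0)
# ... and the same logits with a different true class give a large one.
confident_wrong = cross_entropy([5.0, 0.0, 0.0], target=1)

print(f"{uniform:.4f} {confident_right:.4f} {confident_wrong:.4f}")
```

The first value equals ln(10), about 2.3026, which is the loss a ten-class classifier incurs when it spreads its probability evenly. It is a useful reference point when reading early training losses.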

## Optimizer
Optimization is the process of adjusting model parameters to reduce model error in each training step.
**Optimization algorithms** define how this process is performed (in this example we use Stochastic Gradient Descent).
All optimization logic is encapsulated in the `optimizer` object.
Here, we use the SGD optimizer; PyTorch also provides many [different optimizers](https://pytorch.org/docs/stable/optim.html), such as Adam and RMSprop, that work better for different kinds of models and data.

We initialize the optimizer by registering the model's parameters that need to be trained and passing in the learning rate hyperparameter.

In [11]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

Inside the training loop, optimization happens in three steps:

- Call `optimizer.zero_grad()` to reset the gradients of model parameters. Gradients add up by default; to prevent double-counting, we explicitly zero them at each iteration.
- Backpropagate the prediction loss with a call to `loss.backward()`. PyTorch deposits the gradients of the loss w.r.t. each parameter.
- Once we have our gradients, we call `optimizer.step()` to adjust the parameters by the gradients collected in the backward pass.
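
The three steps above, and the reason the reset comes first, can be mimicked with a toy scalar parameter. This sketch does not use PyTorch: `ToyParameter`, `backward`, `zero_grad`, and `step` are hypothetical stand-ins written for this example, assuming a squared-error loss on a single `(x, y)` pair. They only imitate the accumulate-by-default behaviour described in the list.

```python
class ToyParameter:
    """Hypothetical stand-in for a trainable scalar with a gradient buffer."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0

def backward(param, x, y):
    # Loss is (param.value * x - y) ** 2; its derivative is ADDED to the buffer,
    # imitating gradients that accumulate by default.
    param.grad += 2 * (param.value * x - y) * x

def zero_grad(param):
    # Reset the gradient buffer before the next backward pass.
    param.grad = 0.0

def step(param, lr):
    # Plain gradient descent update.
    param.value -= lr * param.grad

w = ToyParameter(1.0)

# Two backward passes without zeroing: the second gradient stacks on the first.
backward(w, x=3.0, y=9.0)
backward(w, x=3.0, y=9.0)
stale_grad = w.grad

# The three steps in order: reset, backpropagate, update.
zero_grad(w)
backward(w, x=3.0, y=9.0)
fresh_grad = w.grad
step(w, lr=0.01)

print(stale_grad, fresh_grad, w.value)
```

Without the reset, the update would have used a gradient twice as large as the one the current sample actually produced.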

## Full Implementation
We define `train_loop`, which loops over our optimization code, and `test_loop`, which evaluates the model's performance against our test data.

In [14]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)
        
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if batch % 100 == 0:
            loss, current = loss.item(), batch*len(X)
            print(f'loss: {loss:>7f} [{current:>5d}/{size:>5d}]')

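def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    test_loss, correct = 0, 0
    
    # Gradients are not needed for evaluation, so disable tracking
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            # loss_fn returns the mean loss of the batch; sum these per-batch means
            test_loss += loss_fn(pred, y).item()
            # Count samples whose highest-scoring class matches the label
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    
    # The summed batch means are divided by the sample count, so the reported
    # loss is scaled down by roughly the batch size
    test_loss /= size
    correct /= size
    print(f'Test Error: \n Accuracy: {(100*correct):>0.1f}%, Ave loss: {test_loss:8f} \n')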
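
The accuracy bookkeeping in `test_loop` above (`pred.argmax(1) == y`, then a sum) may be easier to follow on plain lists. The sketch below reproduces the same idea without tensors. The scores and labels are made up for illustration, and `argmax` is a small helper defined here rather than the tensor method.

```python
def argmax(row):
    # Index of the largest score in one row of model outputs.
    return max(range(len(row)), key=lambda i: row[i])

# Hypothetical scores for a batch of four samples over three classes.
pred = [
    [0.1, 2.3, 0.4],
    [1.9, 0.2, 0.0],
    [0.3, 0.1, 0.8],
    [0.5, 0.6, 0.2],
]
y = [1, 0, 2, 0]  # true class indices

# Predicted class per sample, then the number of matches with the labels.
predicted = [argmax(row) for row in pred]
correct = sum(1 for p, t in zip(predicted, y) if p == t)
accuracy = correct / len(y)

print(predicted, correct, f"{100 * accuracy:.1f}%")  # [1, 0, 2, 1] 3 75.0%
```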

We initialize the loss function and optimizer, and pass them to `train_loop` and `test_loop`.
Feel free to increase the number of epochs to track the model's improving performance.

In [15]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 3
for t in range(epochs):
    print(f'Epoch {t+1}\n---------------------------')
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)

print('Done!')

Epoch 1
---------------------------
loss: 2.227785 [    0/60000]
loss: 2.229826 [ 6400/60000]
loss: 2.211591 [12800/60000]
loss: 2.219120 [19200/60000]
loss: 2.220472 [25600/60000]
loss: 2.161410 [32000/60000]
loss: 2.183613 [38400/60000]
loss: 2.155972 [44800/60000]
loss: 2.125925 [51200/60000]
loss: 2.121494 [57600/60000]
Test Error: 
 Accuracy: 46.5%, Ave loss: 0.033516 

Epoch 2
---------------------------
loss: 2.132346 [    0/60000]
loss: 2.137581 [ 6400/60000]
loss: 2.106991 [12800/60000]
loss: 2.126703 [19200/60000]
loss: 2.140616 [25600/60000]
loss: 2.036841 [32000/60000]
loss: 2.070652 [38400/60000]
loss: 2.023538 [44800/60000]
loss: 1.980736 [51200/60000]
loss: 1.969386 [57600/60000]
Test Error: 
 Accuracy: 47.0%, Ave loss: 0.031295 

Epoch 3
---------------------------
loss: 1.986887 [    0/60000]
loss: 1.984213 [ 6400/60000]
loss: 1.934246 [12800/60000]
loss: 1.984288 [19200/60000]
loss: 1.980724 [25600/60000]
loss: 1.816857 [32000/60000]
loss: 1.883922 [38400/60000]
loss:

## Further Reading

- [Loss Functions](https://pytorch.org/docs/stable/nn.html#loss-functions)
- [torch.optim](https://pytorch.org/docs/stable/optim.html)
- [Warmstart Training a Model](https://pytorch.org/tutorials/recipes/recipes/warmstarting_model_using_parameters_from_a_different_model.html)