## Optimizing Model Parameters
- Training a model is an iterative process.
- Gradient descent is used to optimize the model's parameters.
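To make the idea of gradient descent concrete, here is a minimal sketch in plain Python. The function `f(w) = (w - 3)^2`, the starting point, and the learning rate are all illustrative assumptions, not part of the tutorial's model.

```python
# Minimal gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
def grad(w):
    # Analytic derivative: df/dw = 2 * (w - 3)
    return 2.0 * (w - 3.0)

w = 0.0    # arbitrary starting point (assumption)
lr = 0.1   # learning rate (assumption)
for _ in range(100):
    w -= lr * grad(w)   # step in the direction opposite to the gradient

print(w)   # very close to 3.0
```

Each iteration moves `w` a little closer to the minimum; training a neural network repeats the same kind of step over many parameters at once.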

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()


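## Hyperparameters

- Hyperparameters are the settings that the learning process does not adjust itself, unlike the model's parameters.
- Different hyperparameter values can affect how the model trains.
- Examples include the number of epochs, the batch size, and the learning rate.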


In [2]:
learning_rate = 1e-3
batch_size = 64
epochs = 5
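As a rough illustration of how these hyperparameters interact, the sketch below works out how many parameter updates a run performs. It assumes the 60,000-image FashionMNIST training set and the values from the cell above; the variable names are chosen for this example only.

```python
import math

dataset_size = 60000   # size of the FashionMNIST training set
batch_size = 64
epochs = 5

# One optimizer update per batch; the last batch may be smaller than the rest.
batches_per_epoch = math.ceil(dataset_size / batch_size)
last_batch_size = dataset_size - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * epochs

print(batches_per_epoch, last_batch_size, total_updates)   # 938 32 4690
```

A smaller batch size gives more updates per epoch, and more epochs give more passes over the same data.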

## Optimization Loop
- Each epoch consists of two main parts:
    - The Train Loop: iterates over the training data and optimizes the parameters.
    - The Validation/Test Loop: iterates over the test data to check whether model performance is improving.

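## Loss Function
- Measures how far the model's output is from the target value.
- During training, the goal is to minimize the loss function.
- Common choices: `nn.MSELoss` (Mean Square Error) for regression; `nn.NLLLoss` (Negative Log Likelihood) for classification; `nn.CrossEntropyLoss`, which combines `nn.LogSoftmax` and `nn.NLLLoss`.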

In [3]:
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()
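The relationship between `nn.CrossEntropyLoss`, `nn.LogSoftmax`, and `nn.NLLLoss` can be checked numerically. The sketch below uses random logits and made-up labels (both assumptions for illustration) and compares the two ways of computing the loss.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)            # 4 samples, 10 classes (raw model outputs)
targets = torch.tensor([3, 0, 9, 1])   # made-up class indices

# Cross-entropy applied directly to the raw logits
ce = nn.CrossEntropyLoss()(logits, targets)

# The same quantity in two steps: log-softmax, then negative log likelihood
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

print(ce.item(), nll.item())   # the two values agree
```

This is why the model above returns raw logits: `nn.CrossEntropyLoss` applies the log-softmax internally.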

Optimizer
=========

Optimization is the process of adjusting model parameters to reduce
model error in each training step. **Optimization algorithms** define
how this process is performed (in this example we use Stochastic
Gradient Descent). All optimization logic is encapsulated in the
`optimizer` object. Here, we use the SGD optimizer; PyTorch also provides
many [different optimizers](https://pytorch.org/docs/stable/optim.html),
such as Adam and RMSprop, that work better for different kinds
of models and data.

We initialize the optimizer by registering the model's parameters that
need to be trained, and passing in the learning rate hyperparameter.


In [None]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
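To show what a stochastic gradient descent update does, here is a hand-written sketch in plain Python. It is not how `torch.optim.SGD` is implemented; the toy data (`y = 2x`), the single parameter `w`, and the batch size are assumptions made for this example.

```python
import random

# Toy data for y = 2x; the goal is to recover the slope w = 2.
xs = [0.1 * i for i in range(1, 21)]
ys = [2.0 * x for x in xs]

w = 0.0                  # single trainable parameter
lr = 0.1                 # learning rate
mini_batch = 4           # samples per update
rng = random.Random(0)   # fixed seed so runs are repeatable

indices = list(range(len(xs)))
for epoch in range(50):
    rng.shuffle(indices)   # "stochastic": a new batch order each epoch
    for start in range(0, len(indices), mini_batch):
        batch = indices[start:start + mini_batch]
        # Gradient of the mean squared error over this mini-batch w.r.t. w
        g = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
        w -= lr * g        # SGD update: w <- w - lr * gradient

print(w)   # very close to 2.0
```

Each update uses only a small batch of samples, so the gradient is an estimate rather than the exact gradient over the whole dataset.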

Inside the training loop, optimization happens in three steps:

-   Call `optimizer.zero_grad()` to reset the gradients of model
    parameters. Gradients by default add up; to prevent
    double-counting, we explicitly zero them at each iteration.
-   Backpropagate the prediction loss with a call to
    `loss.backward()`. PyTorch deposits the gradients of the loss
    w.r.t. each parameter.
-   Once we have our gradients, we call `optimizer.step()` to adjust
    the parameters by the gradients collected in the backward pass.


## Optimizer
- Optimization is the process of adjusting model parameters to reduce the model's error at each training step.
- Optimization algorithms define how this process is carried out.
- All optimization logic is encapsulated in the `optimizer` object.
- Here the SGD optimizer is used; many others are listed under [different optimizers](https://pytorch.org/docs/stable/optim.html).

In [4]:
optimizer = torch.optim.SGD(model.parameters(), lr = 1e-3)

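Inside the training loop, optimization happens in three steps:
1. `optimizer.zero_grad()`: resets the gradients of the model's parameters. Gradients accumulate by default, so they must be explicitly zeroed at each iteration to prevent double-counting.
2. `loss.backward()`: backpropagates the prediction loss, storing the gradient of the loss with respect to each parameter.
3. `optimizer.step()`: once the gradients are available, adjusts the parameters using the gradients collected in the backward pass.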
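The three steps can be watched on a single toy parameter. In the sketch below, `w`, `toy_optimizer`, and the loss `w ** 2` are hypothetical stand-ins for the model, its optimizer, and its loss; it also shows why zeroing the gradients matters.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)      # toy parameter
toy_optimizer = torch.optim.SGD([w], lr=0.1)

# Two backward passes without zeroing: the gradients add up.
(w ** 2).backward()
grad_once = w.grad.item()     # d(w^2)/dw at w = 1 is 2.0
(w ** 2).backward()
grad_twice = w.grad.item()    # accumulated to 4.0

# The usual sequence: reset, backpropagate, update.
toy_optimizer.zero_grad()     # discard the accumulated gradient
loss = w ** 2
loss.backward()               # gradient is 2.0 again
toy_optimizer.step()          # w <- w - lr * grad = 1.0 - 0.1 * 2.0
w_after = w.item()

print(grad_once, grad_twice, w_after)
```

Calling `zero_grad()` before `backward()` or after `step()` has the same effect, as long as the gradients are cleared once per iteration.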


## Full Implementation

We define `train_loop` that loops over our optimization code, and
`test_loop` that evaluates the model's performance against our test
data.


In [None]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * batch_size + len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
    # also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
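The effect of `model.train()`, `model.eval()`, and `torch.no_grad()` is easier to see on a layer whose behavior differs between modes. The sketch below uses a standalone `nn.Dropout` and `nn.Linear` layer as assumptions for illustration; the tutorial's network contains no dropout.

```python
import torch
from torch import nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.eval()                   # evaluation mode: dropout passes input through unchanged
eval_out = drop(x)

drop.train()                  # training mode: elements are zeroed at random
train_out = drop(x)           # survivors are scaled by 1 / (1 - p) = 2

layer = nn.Linear(8, 2)
with torch.no_grad():
    no_grad_out = layer(x)    # no autograd graph is recorded
grad_out = layer(x)           # outside the block, gradients are tracked

print(eval_out, train_out)
print(no_grad_out.requires_grad, grad_out.requires_grad)   # False True
```

The mode switch changes layer behavior, while `torch.no_grad()` only disables gradient tracking; the test loop uses both.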

In [6]:
len(train_dataloader)

938

In [17]:
def train_loop(dataloader, model, optimizer, loss_fn):
    size = len(dataloader.dataset)
    model.train()
    for batch, (x, y) in enumerate(dataloader):
        preds = model(x)
        loss = loss_fn(preds, y)

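        # Backpropagation: compute gradients, update parameters, then reset gradients
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # Report progress every 100 batches
        if batch % 100 == 0:
            loss, current = loss.item(), batch * dataloader.batch_size + len(x)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")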


def test_loop(dataloader, model, loss_fn):
    len_batch = len(dataloader)
    total_loss, correct = 0, 0
    model.eval()
    with torch.no_grad():
        for x, y in dataloader:
            preds = model(x)
            total_loss += loss_fn(preds, y).item()
            correct += (preds.argmax(1) == y).type(torch.float).sum().item()

    test_loss = total_loss / len_batch  # average loss per batch
    test_correct = correct / len(dataloader.dataset)  # fraction of correctly classified samples
    print(f"Test Error: \n Accuracy: {(100*test_correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
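The accuracy line in `test_loop` can be traced on a tiny hand-made batch. The logits and labels below are invented for illustration; `argmax(1)` picks the highest-scoring class in each row.

```python
import torch

# Three samples, three classes; values are made up for this example.
toy_logits = torch.tensor([[0.1, 2.0, 0.3],
                           [1.5, 0.2, 0.1],
                           [0.2, 0.1, 0.9]])
toy_labels = torch.tensor([1, 0, 1])

predicted = toy_logits.argmax(1)   # tensor([1, 0, 2])
num_correct = (predicted == toy_labels).type(torch.float).sum().item()
accuracy = num_correct / len(toy_labels)

print(predicted.tolist(), num_correct, accuracy)
```

Here two of the three predictions match the labels, so the accuracy is 2/3.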

- Initialize the loss function and the optimizer, then pass them to `train_loop` and `test_loop`.
- Increase the number of epochs to see the model's performance improve.

In [18]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 1e-3)

epochs = 10
for epoch in range(epochs):
    print(f"Epoch {epoch+1}\n----------------------------")
    train_loop(train_dataloader, model, optimizer, loss_fn)
    test_loop(test_dataloader, model, loss_fn)
print("Done")

Epoch 1
----------------------------
loss: 2.313744  [   64/60000]
loss: 2.297681  [ 6464/60000]
loss: 2.280901  [12864/60000]
loss: 2.259946  [19264/60000]
loss: 2.247251  [25664/60000]
loss: 2.247690  [32064/60000]
loss: 2.217069  [38464/60000]
loss: 2.211756  [44864/60000]
loss: 2.208843  [51264/60000]
loss: 2.169601  [57664/60000]
Test Error: 
 Accuracy: 42.8%, Avg loss: 2.161524 

Epoch 2
----------------------------
loss: 2.152873  [   64/60000]
loss: 2.139109  [ 6464/60000]
loss: 2.133725  [12864/60000]
loss: 2.085512  [19264/60000]
loss: 2.048757  [25664/60000]
loss: 2.032989  [32064/60000]
loss: 2.014278  [38464/60000]
loss: 1.996171  [44864/60000]
loss: 1.953714  [51264/60000]
loss: 1.919172  [57664/60000]
Test Error: 
 Accuracy: 58.7%, Avg loss: 1.908196 

Epoch 3
----------------------------
loss: 1.885417  [   64/60000]
loss: 1.905309  [ 6464/60000]
loss: 1.854254  [12864/60000]
loss: 1.756485  [19264/60000]
loss: 1.743126  [25664/60000]
loss: 1.707375  [32064/60000]
loss:

Further Reading
===============

-   [Loss Functions](https://pytorch.org/docs/stable/nn.html#loss-functions)
-   [torch.optim](https://pytorch.org/docs/stable/optim.html)
-   [Warmstart Training a Model](https://pytorch.org/tutorials/recipes/recipes/warmstarting_model_using_parameters_from_a_different_model.html)
