# Optimizing Model Parameters

 - Now that we have a model and data, it’s time to train, validate, and test our model by optimizing its parameters on our data.
 - Training a model is an iterative process; in each iteration the model makes a guess about the output, calculates the error in its guess (loss), collects the derivatives of the error with respect to its parameters (as we saw in the previous section), and optimizes these parameters using gradient descent.
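The four steps of one iteration (guess, loss, derivative, update) can be sketched on a toy problem. This is a minimal illustration in plain Python, not the PyTorch code used later in this section; the one-parameter model `y_hat = w * x`, the data point, and the learning rate are all assumptions chosen for readability.

```python
# One-parameter model y_hat = w * x, fitted to a single (x, y) pair.
# All names and values here are illustrative assumptions.
x, y = 2.0, 6.0        # input and target (the ideal weight is 3.0)
w = 0.0                # initial parameter
learning_rate = 0.1

losses = []
for step in range(20):
    y_hat = w * x                      # 1. make a guess
    loss = (y_hat - y) ** 2            # 2. measure the error (squared error)
    grad = 2 * (y_hat - y) * x         # 3. derivative of the loss w.r.t. w
    w -= learning_rate * grad          # 4. gradient descent update
    losses.append(loss)

print(f"final w = {w:.4f}, first loss = {losses[0]:.2f}, last loss = {losses[-1]:.2e}")
```

In the rest of this section PyTorch computes step 3 automatically through backpropagation, and the optimizer performs step 4.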

## Prerequisite Code
We load the code from the previous sections on Datasets & DataLoaders and Build Model.

In [2]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()


## Hyperparameters
Hyperparameters are adjustable parameters that let you control the model optimization process. Different hyperparameter values can impact model training and convergence rates. This section uses three:
 - Number of Epochs: the number of times to iterate over the dataset
 - Batch Size: the number of data samples propagated through the network before the parameters are updated
 - Learning Rate: how much to update the model parameters at each batch/epoch
    - Smaller values yield slower learning, while large values may cause unpredictable behavior during training
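To make the relationship between epochs and batch size concrete, the arithmetic below counts how many parameter updates the settings used in this section produce. It is a small sketch in plain Python; the variable names are illustrative, and it assumes the `DataLoader` default of keeping the final, smaller batch.

```python
import math

dataset_size = 60_000   # size of the FashionMNIST training split
batch_size = 64
epochs = 5

# One parameter update happens per batch, so an epoch performs
# ceil(dataset_size / batch_size) updates.
updates_per_epoch = math.ceil(dataset_size / batch_size)             # 938
last_batch_size = dataset_size - (updates_per_epoch - 1) * batch_size  # 32
total_updates = updates_per_epoch * epochs                           # 4690

print(updates_per_epoch, last_batch_size, total_updates)
```

Halving the batch size would roughly double the number of updates per epoch, which is one reason batch size and learning rate are usually tuned together.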

In [4]:
learning_rate = 1e-3
batch_size = 64
epochs = 5

## Optimization Loop
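Once the hyperparameters are set, we can train and optimize the model with an optimization loop. Each iteration of the optimization loop is called an epoch. An epoch has two parts:
 - The Train Loop: iterate over the training dataset and try to converge to optimal parameters
 - The Validation/Test Loop: iterate over the test dataset to check whether model performance is improving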

## Loss Function
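 - The loss function measures how far the obtained result is from the target value; it is the quantity we want to minimize during training
 - To calculate the loss, we make a prediction from the given input data sample and compare it against the true data label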
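As a worked example of comparing a prediction against a label, the sketch below computes cross-entropy for a single sample by hand: log-softmax over the logits followed by the negative log-likelihood of the target class, which is the combination `nn.CrossEntropyLoss` applies. The helper name `cross_entropy` and the logit values are illustrative assumptions, written in plain Python rather than PyTorch.

```python
import math

def cross_entropy(logits, target):
    """Negative log of the softmax probability assigned to the target class."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

logits = [2.0, 0.5, -1.0]            # raw model outputs for three classes
good = cross_entropy(logits, 0)      # label matches the largest logit
bad = cross_entropy(logits, 2)       # label matches the smallest logit

print(f"loss when the label is class 0: {good:.4f}")
print(f"loss when the label is class 2: {bad:.4f}")
```

The loss is small when the model puts most of its probability on the correct class and grows as that probability shrinks. Here the two losses differ by exactly the gap between the two logits involved.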

In [5]:
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()

## Optimizer
Optimization is the process of adjusting model parameters to reduce model error in each training step.
 - Optimization algorithms define how this process is performed

In [6]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

Inside the training loop, optimization happens in three steps:
 - ``optimizer.zero_grad()``: resets the gradients of the model parameters. Gradients add up by default, so we explicitly zero them at each iteration to prevent double-counting
 - ``loss.backward()``: backpropagates the prediction loss. PyTorch stores the gradient of the loss with respect to each parameter
 - ``optimizer.step()``: once we have the gradients, adjusts the parameters using the gradients collected in the backward pass
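The reason for resetting gradients is easiest to see on a single tensor. The sketch below is an illustration only: the tensor `w` and the expression `3 * w` are assumptions, and it resets the gradient manually with `w.grad.zero_()` instead of going through an optimizer.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()        # gradient of 3*w with respect to w

(3 * w).backward()           # no reset in between: the new gradient is added
accumulated = w.grad.item()

w.grad.zero_()               # manual reset of the stored gradient
after_reset = w.grad.item()

print(first, accumulated, after_reset)  # 3.0 6.0 0.0
```

Without the reset, each ``optimizer.step()`` would act on the sum of gradients from all previous batches rather than on the current batch alone.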

## Full Implementation
 - ``train_loop``: loops over the optimization code
 - ``test_loop``: evaluates the model's performance against the test data

In [7]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
    # also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

We initialize the loss function and optimizer, and pass them to ``train_loop`` and ``test_loop``. Feel free to increase the number of epochs to track the model’s improving performance.

In [8]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.311211  [   64/60000]
loss: 2.304592  [ 6464/60000]
loss: 2.286362  [12864/60000]
loss: 2.279516  [19264/60000]
loss: 2.259472  [25664/60000]
loss: 2.236116  [32064/60000]
loss: 2.240813  [38464/60000]
loss: 2.209957  [44864/60000]
loss: 2.209824  [51264/60000]
loss: 2.181401  [57664/60000]
Test Error: 
 Accuracy: 35.9%, Avg loss: 2.176876 

Epoch 2
-------------------------------
loss: 2.185322  [   64/60000]
loss: 2.177688  [ 6464/60000]
loss: 2.125262  [12864/60000]
loss: 2.143444  [19264/60000]
loss: 2.084878  [25664/60000]
loss: 2.037133  [32064/60000]
loss: 2.061258  [38464/60000]
loss: 1.984179  [44864/60000]
loss: 1.999742  [51264/60000]
loss: 1.933973  [57664/60000]
Test Error: 
 Accuracy: 55.0%, Avg loss: 1.926742 

Epoch 3
-------------------------------
loss: 1.951146  [   64/60000]
loss: 1.923960  [ 6464/60000]
loss: 1.814510  [12864/60000]
loss: 1.863097  [19264/60000]
loss: 1.738659  [25664/60000]
loss: 1.696143  [32064/600