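# Optimizing Model Parameters
Now that we have a model and data, we can train, validate, and test the model by optimizing its parameters on the data. Training a model is an iterative process.

In each iteration (called an epoch), the model makes a guess about the output, calculates the error in its guess (the loss), collects the derivatives of the error with respect to its parameters, and optimizes those parameters using gradient descent.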
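
The four steps of one iteration can be sketched without any framework. The following is a minimal illustration, assuming a one-parameter model `y = w * x` and a mean squared error loss; the toy data, the names `w` and `lr`, and the hand-derived gradient are illustrative choices and not part of this tutorial's model.

```python
# Toy data generated by y = 3 * x (hypothetical values for illustration)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = 0.0    # the single model parameter, starting from an arbitrary guess
lr = 0.01  # learning rate

for epoch in range(200):
    # 1. Make a guess about the output
    preds = [w * x for x in xs]
    # 2. Compute the error of the guess (mean squared error)
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    # 3. Derivative of the loss with respect to w
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    # 4. Gradient descent update
    w -= lr * grad

print(f"learned w = {w:.4f}, final loss = {loss:.6f}")
```

With this data the parameter should settle close to 3, the slope used to generate the targets. PyTorch automates step 3 (via autograd) and step 4 (via an optimizer), which is what the rest of this section relies on.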

In [2]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

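# download=False assumes FashionMNIST is already present under ./data/;
# set download=True to fetch it on first use.
training_data = datasets.FashionMNIST(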
    root="./data/",
    train=True,
    download=False,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="./data/",
    train=False,
    download=False,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

device = "cuda" if torch.cuda.is_available() else "cpu"

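class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # Flatten each 28x28 image into a 784-element vector
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            # NOTE: this final ReLU clamps the 10 class scores to be non-negative;
            # models fed into nn.CrossEntropyLoss usually omit it and return raw logits.
            nn.ReLU()
        )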

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)

#### Hyperparameters
Hyperparameters are adjustable parameters that let you control the model optimization process. Different hyperparameter values can affect model training and convergence rates.

We define the following hyperparameters for training:

Number of Epochs - the number of times to iterate over the dataset

Batch Size - the number of data samples propagated through the network before the parameters are updated

Learning Rate - how much to update model parameters at each batch/epoch. Smaller values yield slow learning speed, while large values may result in unpredictable behavior during training.
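
To make the effect of the learning rate concrete, here is a small sketch that minimizes f(w) = w² with plain gradient descent. The function, the starting point, the step count, and the helper name `descend` are assumptions chosen for illustration; they are unrelated to the FashionMNIST model.

```python
def descend(lr, steps=20, w=1.0):
    """Run gradient descent on f(w) = w**2 and return the final w."""
    for _ in range(steps):
        grad = 2 * w      # derivative of w**2
        w -= lr * grad
    return w

small = descend(lr=0.001)    # barely moves away from the starting point
moderate = descend(lr=0.1)   # approaches the minimum at w = 0
large = descend(lr=1.1)      # overshoots on every step and grows in magnitude

print(small, moderate, large)
```

Each update multiplies `w` by `1 - 2 * lr`, so a tiny rate shrinks it very slowly, while a rate above 1 makes the factor larger than 1 in magnitude and the iterates diverge.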

In [3]:
learning_rate = 1e-3
batch_size = 64
epochs = 5

#### Optimization Loop
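Once the hyperparameters are set, we can train and optimize the model with an optimization loop. Each iteration of the optimization loop is called an epoch.

Each epoch consists of two main parts:

    The training loop - iterates over the training dataset and tries to converge to optimal parameters.

    The validation/test loop - iterates over the test dataset to check whether model performance is improving.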

#### Loss Function
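When presented with some training data, our untrained network is likely not to give the correct answer.

The loss function measures the degree of dissimilarity between the obtained result and the target value, and it is the quantity we want to minimize during training.

To calculate the loss, we make a prediction using the inputs of a given data sample and compare it against the true data label value.

Common loss functions include nn.MSELoss (mean squared error) for regression tasks and nn.NLLLoss (negative log likelihood) for classification. nn.CrossEntropyLoss combines nn.LogSoftmax and nn.NLLLoss.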
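
The relationship between these loss functions can be checked numerically. The sketch below assumes random logits for 4 samples and 10 classes and made-up labels (both hypothetical); it compares nn.CrossEntropyLoss with nn.LogSoftmax followed by nn.NLLLoss.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)            # hypothetical raw model outputs: 4 samples, 10 classes
targets = torch.tensor([1, 0, 9, 3])   # hypothetical class labels

# One-step version
ce = nn.CrossEntropyLoss()(logits, targets)

# Two-step version: log-softmax over the class dimension, then negative log likelihood
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

print(ce.item(), nll.item())
print(torch.allclose(ce, nll))
```

Because the normalization happens inside nn.CrossEntropyLoss, the model should return raw logits and not apply a softmax itself.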

In [5]:
# Pass the model's output logits to nn.CrossEntropyLoss, which normalizes the logits and computes the prediction error

# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()

#### Optimizer
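Optimization is the process of adjusting model parameters to reduce the model error in each training step.

All optimization logic is encapsulated in the optimizer object. Here, we use the SGD optimizer.

PyTorch also provides many other optimizers, such as Adam and RMSprop, which work better for different kinds of models and data.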
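
As a rough sketch of how interchangeable the optimizers are, the snippet below builds three of them over the same parameters. The `toy_model` is a hypothetical stand-in for the tutorial's network, and reusing one learning rate for all three is an assumption made for brevity, not a tuning recommendation.

```python
import torch
from torch import nn

toy_model = nn.Linear(4, 2)   # hypothetical stand-in for the tutorial's model
learning_rate = 1e-3

sgd = torch.optim.SGD(toy_model.parameters(), lr=learning_rate)
adam = torch.optim.Adam(toy_model.parameters(), lr=learning_rate)
rmsprop = torch.optim.RMSprop(toy_model.parameters(), lr=learning_rate)

for opt in (sgd, adam, rmsprop):
    # Every optimizer exposes the same zero_grad() / step() interface
    print(type(opt).__name__, opt.param_groups[0]["lr"])
```

Since all of them share the same interface, switching optimizers only changes the line that constructs the optimizer; the training loop stays the same.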

In [8]:
# Initialize the optimizer with the model's parameters and the learning rate
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

Inside the training loop, optimization happens in three steps:

    (1) Call optimizer.zero_grad() to reset the gradients of model parameters. Gradients add up by default; to prevent double-counting, we explicitly zero them at each iteration.
    (2) Backpropagate the prediction loss with a call to loss.backward(). PyTorch deposits the gradients of the loss w.r.t. each parameter.
    (3) Once we have our gradients, we call optimizer.step() to adjust the parameters by the gradients collected in the backward pass.
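
The three steps, and the reason for zeroing gradients, can be seen on a single parameter. This is a minimal sketch assuming a toy loss `3 * w` and a learning rate of 0.1; the tensor `w` and the variable names are hypothetical.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)   # a single hypothetical parameter
optimizer = torch.optim.SGD([w], lr=0.1)

# Without zeroing, gradients from successive backward passes add up
(3 * w).sum().backward()
(3 * w).sum().backward()
accumulated = w.grad.clone()   # 3 + 3

# The three steps in order
optimizer.zero_grad()          # (1) reset gradients
loss = (3 * w).sum()
loss.backward()                # (2) compute d(loss)/dw = 3
fresh = w.grad.clone()
optimizer.step()               # (3) w <- w - lr * grad

print(accumulated, fresh, w)
```

After the reset, the gradient reflects only the most recent backward pass, and the update moves `w` from 1.0 to 1.0 - 0.1 * 3.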

#### Full Implementation
We define train_loop, which loops over our optimization code, and test_loop, which evaluates the model's performance against the test data.

In [16]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


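def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    test_loss, correct = 0, 0

    # Gradients are not needed for evaluation
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            # loss_fn returns the mean loss of the batch
            test_loss += loss_fn(pred, y).item()
            # Count predictions whose highest-scoring class matches the label
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    # NOTE: test_loss is a sum of per-batch means, so dividing by the number of
    # samples makes the reported value roughly batch_size times smaller than the
    # mean per-sample loss; divide by len(dataloader) for a per-batch average.
    test_loss /= size
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")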

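We initialize the loss function and optimizer, and pass them to train_loop and test_loop. Feel free to increase the number of epochs to track the model's improving performance.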

In [18]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 2  # overrides the value of 5 set in the hyperparameters cell
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.182994  [    0/60000]
loss: 2.150644  [ 6400/60000]
loss: 2.070633  [12800/60000]
loss: 2.069875  [19200/60000]
loss: 2.001325  [25600/60000]
loss: 1.993394  [32000/60000]
loss: 2.072837  [38400/60000]
loss: 1.990067  [44800/60000]
loss: 2.092111  [51200/60000]
loss: 1.968934  [57600/60000]
Test Error: 
 Accuracy: 40.3%, Avg loss: 0.030776 

Epoch 2
-------------------------------
loss: 2.100241  [    0/60000]
loss: 2.042052  [ 6400/60000]
loss: 1.923992  [12800/60000]
loss: 1.923524  [19200/60000]
loss: 1.827075  [25600/60000]
loss: 1.847034  [32000/60000]
loss: 1.958503  [38400/60000]
loss: 1.857383  [44800/60000]
loss: 2.008325  [51200/60000]
loss: 1.857908  [57600/60000]
Test Error: 
 Accuracy: 43.9%, Avg loss: 0.028843 

Done!
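
The reported `Avg loss` values (about 0.03) are much smaller than the training losses (about 2.0) because test_loop sums one mean loss per batch and then divides by the number of samples, not the number of batches. As a rough sketch, assuming the standard FashionMNIST test split of 10000 samples and the batch size of 64 used above, the logged values can be rescaled to a per-batch average. The result is approximate, since the logged numbers are rounded and the last batch is smaller than the others.

```python
import math

test_size = 10000     # FashionMNIST test split size
batch_size = 64
num_batches = math.ceil(test_size / batch_size)   # the final partial batch counts too

logged = {"epoch 1": 0.030776, "epoch 2": 0.028843}   # values from the output above

rescaled = {}
for name, avg_per_sample in logged.items():
    summed = avg_per_sample * test_size          # undo the division by size
    rescaled[name] = summed / num_batches        # average of the per-batch mean losses
    print(f"{name}: {rescaled[name]:.3f}")
```

The rescaled values land in the same range as the training losses printed during each epoch, which is what one would expect early in training.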


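60000 / 64 = 937.5, so each epoch has 938 batches (937 full batches plus a final batch of 32 samples) and the training loop iterates 938 times per epoch. The loss is printed whenever batch % 100 == 0, that is, at batches 0, 100, ..., 900, which gives 10 printed lines per epoch.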
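
The batch count can be checked with a few lines of arithmetic. This sketch assumes the DataLoader keeps the final partial batch, which is its default behavior (`drop_last=False`); the variable names are illustrative.

```python
import math

dataset_size = 60000
batch_size = 64
print_every = 100

num_batches = math.ceil(dataset_size / batch_size)               # last batch may be partial
last_batch_size = dataset_size - (num_batches - 1) * batch_size  # samples left for the final batch
printed = [b for b in range(num_batches) if b % print_every == 0]

print(num_batches, last_batch_size, len(printed))   # → 938 32 10
```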