<a href="https://colab.research.google.com/github/YinGuoX/Deep_Learning_Pytorch_WithDeeplizard/blob/master/27_CNN_Training_Loop_Explained_Neural_Network_Code_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CNN Training Loop - Teach A Neural Network

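In this section, we will learn how to build a training loop for a convolutional neural network in Python.
In the previous episode, we learned that training is an iterative process, and that to train a neural network we build what is called a training loop.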

* Prepare the data

* Build the model

* Train the model

  * Build the training loop

* Analyze the model's results

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim

In [None]:
class NetWork(nn.Module):
  def __init__(self):
    super().__init__()
    self.conv1  = nn.Conv2d(in_channels=1,out_channels=6,kernel_size=5)
    self.conv2 = nn.Conv2d(in_channels=6,out_channels=12,kernel_size=5)
    self.fc1 = nn.Linear(in_features=12*4*4,out_features=120)
    self.fc2 = nn.Linear(in_features=120,out_features=60)
    self.out = nn.Linear(in_features=60,out_features=10)


  def forward(self,t):
    t = t  # input layer: identity, shown only for completeness
    t = self.conv1(t)
    t = F.relu(t)
    t = F.max_pool2d(t,kernel_size=2,stride=2)

    t = self.conv2(t)
    t = F.relu(t)
    t = F.max_pool2d(t,kernel_size=2,stride=2)

    t = t.reshape(-1,12*4*4)

    t = self.fc1(t)
    t = F.relu(t)

    t = self.fc2(t)
    t = F.relu(t)

    t = self.out(t)

    return t

In [None]:
train_set = torchvision.datasets.FashionMNIST(
    root='./data',
    train=True,
    download=True,
    transform=transforms.Compose([
        transforms.ToTensor()
    ])
)



### Training with a single batch
---
We can summarize the code for training with a single batch as follows:

In [None]:
network = NetWork()

train_loader = torch.utils.data.DataLoader(train_set,batch_size=100)
optimizer = optim.Adam(network.parameters(),lr=0.01)

# Get one batch of data
batch = next(iter(train_loader))
images,labels = batch

preds = network(images)
loss = F.cross_entropy(preds,labels)

loss.backward()
optimizer.step()

print("loss1:",loss)
preds = network(images)
loss = F.cross_entropy(preds,labels)
print("loss2:",loss)

loss1: tensor(2.2986, grad_fn=<NllLossBackward>)
loss2: tensor(2.2765, grad_fn=<NllLossBackward>)


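One thing you will notice is that we get different results each time we run this code. This is because the model is created at the top each time, and as we know from previous posts, the model weights are randomly initialized.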
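If repeatable numbers are wanted while experimenting, one common option (not used in the original code) is to seed PyTorch's random number generator before creating the model. A minimal sketch, assuming a small `nn.Linear` layer as a stand-in for `NetWork` and an arbitrary seed value of 0; `make_layer` is a helper name invented for this illustration:

```python
import torch
import torch.nn as nn

def make_layer(seed=None):
    # Optionally seed the global RNG, then build a layer whose
    # weights are drawn from that RNG during construction.
    if seed is not None:
        torch.manual_seed(seed)
    return nn.Linear(in_features=4, out_features=2)

# Same seed: identical initial weights
a = make_layer(seed=0)
b = make_layer(seed=0)
same_seed_equal = torch.equal(a.weight, b.weight)

# No reseeding: the RNG has advanced, so the weights differ
c = make_layer()
unseeded_equal = torch.equal(b.weight, c.weight)

print(same_seed_equal, unseeded_equal)
```

Seeding only fixes the starting point; it does not change how training itself works.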

Now let's see how to modify this code to train with all of the batches, and therefore with the entire training set.

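### Training with all batches (a single epoch)
Now, to train on all of the batches available in the data loader, we need to make a few changes and add one extra line of code: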

In [None]:
def get_num_correct(preds,labels):
  return preds.argmax(dim=1).eq(labels).sum().item()
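To see what this helper does, it can be tried on a tiny hand-made batch. The tensors below are made-up values for illustration, not real network output: `argmax(dim=1)` picks the highest-scoring class in each row, `eq` compares that with the labels, and `sum().item()` counts the matches as a plain Python number.

```python
import torch

def get_num_correct(preds, labels):
    return preds.argmax(dim=1).eq(labels).sum().item()

# Three samples, four classes (illustrative scores)
preds = torch.tensor([
    [0.1, 0.7, 0.1, 0.1],    # highest score at class 1
    [0.8, 0.1, 0.05, 0.05],  # highest score at class 0
    [0.2, 0.2, 0.5, 0.1],    # highest score at class 2
])
labels = torch.tensor([1, 3, 2])  # the second sample is misclassified

num_correct = get_num_correct(preds, labels)
print(num_correct)  # 2
```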

In [None]:
network = NetWork()

train_loader = torch.utils.data.DataLoader(train_set,batch_size=100)
optimizer = optim.Adam(network.parameters(),lr=0.01)

total_loss = 0
total_correct = 0

for batch in train_loader:
  images,labels = batch

  preds = network(images)
  loss = F.cross_entropy(preds,labels)
  optimizer.zero_grad()
  loss.backward()
  optimizer.step()

  total_loss +=loss.item()
  total_correct +=get_num_correct(preds,labels)

print(
    "epoch:",0,
    "total_correct:",total_correct,
    "loss:",total_loss
)

epoch: 0 total_correct: 46458 loss: 353.7132867574692


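Instead of getting a single batch from the data loader, we create a for loop that iterates over all of the batches.

Since our training set has 60,000 samples and the batch size is 100, we have 60,000 / 100 = 600 iterations. For this reason, we remove the print statement from inside the loop, keep track of the total loss and the total number of correct predictions, and print them at the end.

Something to note about these 600 iterations is that our weights will have been updated 600 times by the end of the loop. If we raise the batch_size, this number goes down; if we lower the batch_size, it goes up.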
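The relationship between batch size and the number of weight updates per epoch can be checked with a few lines of plain Python. This is only arithmetic on the figures quoted above; `updates_per_epoch` is a helper name made up for this sketch, and it rounds up because a `DataLoader` by default still yields a final, smaller batch when the sizes do not divide evenly.

```python
import math

def updates_per_epoch(num_samples, batch_size):
    # One optimizer step per batch; a partial last batch still counts
    return math.ceil(num_samples / batch_size)

for batch_size in (10, 100, 1000, 10000):
    print(batch_size, updates_per_epoch(60000, batch_size))
```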

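Finally, after we call the backward() method on the loss tensor, we know the gradients are calculated and added to the grad attributes of the network's parameters. Because they are added rather than overwritten, gradients left over from the previous batch would otherwise be mixed into the current ones, so we need to zero them before each backward pass. We can do this with the optimizer's zero_grad() method.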
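The accumulation behavior is easy to observe on a single parameter. The sketch below uses a made-up one-element weight `w` and a toy loss `3 * w`, whose gradient with respect to `w` is always 3. It calls `backward()` twice without zeroing, then once more after zeroing through an SGD optimizer created only for this demonstration.

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

# First backward pass: grad is 3
(3 * w).sum().backward()
first = w.grad.item()

# Second backward pass without zeroing: the new gradient is added, giving 6
(3 * w).sum().backward()
accumulated = w.grad.item()

# Clear the gradients, then backward again: back to 3
optimizer.zero_grad()
(3 * w).sum().backward()
after_zero = w.grad.item()

print(first, accumulated, after_zero)
```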

We are ready to run this code. This time it takes longer, because the loop is working through 600 batches.

We get the results, and we can see that 46,458 of the 60,000 predictions were correct.

In [None]:
total_correct/len(train_set)

0.7743

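That is pretty good for just one epoch (a single full pass over the data). Even though we ran only one epoch, we should keep in mind that the weights were updated 600 times, a number that depends on our batch size. If we made batch_size larger, say 10,000, the weights would be updated only 6 times per epoch, and the results would likely not be as good.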

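### Training with multiple epochs
---
To run multiple epochs, all we have to do is put this code inside a for loop. We also add the epoch number to the print statement.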

In [None]:
network = NetWork()

train_loader = torch.utils.data.DataLoader(train_set,batch_size=100)
optimizer = optim.Adam(network.parameters(),lr=0.01)

for epoch in range(10):
  total_loss = 0
  total_correct = 0

  for batch in train_loader:
    images,labels = batch

    preds = network(images)
    loss = F.cross_entropy(preds,labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    total_loss +=loss.item()
    total_correct +=get_num_correct(preds,labels)

  print(
      "epoch:",epoch,
      "total_correct:",total_correct,
      "loss:",total_loss
  )


epoch: 0 total_correct: 47627 loss: 324.8065667152405
epoch: 1 total_correct: 51924 loss: 218.88622657954693
epoch: 2 total_correct: 52652 loss: 198.63491877913475
epoch: 3 total_correct: 52955 loss: 191.9812345802784
epoch: 4 total_correct: 53075 loss: 186.77816872298717
epoch: 5 total_correct: 53286 loss: 183.8610407039523
epoch: 6 total_correct: 53416 loss: 179.5964128524065
epoch: 7 total_correct: 53388 loss: 178.58316673338413
epoch: 8 total_correct: 53327 loss: 181.0990073978901
epoch: 9 total_correct: 53677 loss: 170.94294719398022


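Running this code gives us the results for each epoch, as shown in the output above.

We can see that, overall, the number of correct predictions goes up and the loss goes down, although not strictly at every epoch.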
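The raw counts are easier to compare once converted to accuracy and average loss per batch. The sketch below reuses the first and last lines of the training log above, together with the 60,000 samples and 600 batches per epoch established earlier; the variable names are illustrative.

```python
# (epoch, total_correct, total_loss) copied from the training log above
log = [
    (0, 47627, 324.8065667152405),
    (9, 53677, 170.94294719398022),
]

num_samples = 60000  # size of the FashionMNIST training set
num_batches = 600    # 60000 / batch_size of 100

summary = []
for epoch, total_correct, total_loss in log:
    accuracy = total_correct / num_samples
    avg_loss = total_loss / num_batches
    summary.append((epoch, accuracy, avg_loss))
    print(f"epoch {epoch}: accuracy={accuracy:.4f} avg_loss={avg_loss:.4f}")
```

Dividing by the number of batches is appropriate here because `F.cross_entropy` averages over each batch by default, so `total_loss` is a sum of per-batch means.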

## 1. Complete training loop
Putting all of this together, we can pull the network, optimizer, and train_loader out of the training loop cell.

In [None]:
network = NetWork()
optimizer = optim.Adam(network.parameters(),lr=0.001)
train_loader = torch.utils.data.DataLoader(
    train_set,
    batch_size=100,
    shuffle=True
)

This way, we can run the training loop without resetting the network's weights.
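One way to make this separation explicit is to wrap the loop in a function that receives the already-built objects. The sketch below is an illustration under stated assumptions rather than the notebook's code: `train` is a hypothetical helper, a small `nn.Linear` model stands in for `NetWork`, and random tensors stand in for FashionMNIST so that it runs without a download.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

def train(network, optimizer, loader, epochs):
    # Runs the same steps as the loop in this section and returns
    # the summed loss of each epoch.
    history = []
    for _ in range(epochs):
        total_loss = 0.0
        for images, labels in loader:
            preds = network(images)
            loss = F.cross_entropy(preds, labels)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            total_loss += loss.item()
        history.append(total_loss)
    return history

# Synthetic stand-in data: 200 samples, 8 features, 3 classes
torch.manual_seed(0)
features = torch.randn(200, 8)
targets = torch.randint(0, 3, (200,))
loader = DataLoader(TensorDataset(features, targets), batch_size=50, shuffle=True)

# Built once, outside the loop
network = nn.Linear(8, 3)
optimizer = optim.Adam(network.parameters(), lr=0.001)

weights_before = network.weight.detach().clone()
first_run = train(network, optimizer, loader, epochs=2)
weights_between = network.weight.detach().clone()
second_run = train(network, optimizer, loader, epochs=2)  # continues from the first run

print(first_run, second_run)
```

Calling `train` a second time picks up from the weights left by the first call, because `network` and `optimizer` are never re-created in between.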

In [None]:
for epoch in range(10):
  total_correct=0
  total_loss = 0
  for batch in train_loader:
    images,labels = batch
    preds = network(images)
    loss = F.cross_entropy(preds,labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    total_loss +=loss.item()
    total_correct += get_num_correct(preds,labels)

  print(
      "epoch:",epoch,
      "total_correct:",total_correct,
      "loss:",total_loss
  )

epoch: 0 total_correct: 51938 loss: 219.5452172011137
epoch: 1 total_correct: 52401 loss: 207.966393917799
epoch: 2 total_correct: 52774 loss: 196.5000238120556
epoch: 3 total_correct: 52957 loss: 190.1710530370474
epoch: 4 total_correct: 53222 loss: 183.92313005030155
epoch: 5 total_correct: 53472 loss: 177.4093141257763
epoch: 6 total_correct: 53633 loss: 171.89684499055147
epoch: 7 total_correct: 53765 loss: 167.30943682044744
epoch: 8 total_correct: 53951 loss: 162.38079644739628
epoch: 9 total_correct: 53970 loss: 160.22220024466515


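## Next up: visualizing the model's results

We should now have a good understanding of training loops and how to build them with PyTorch. The cool thing about PyTorch is that we can debug the training loop code just as we did with the forward() function.

In the next post, we will see how to get predictions for every sample in the training set and use those predictions to create a confusion matrix. See you next time!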