<br>

# PyTorch Basics 2

<br>

## 0. Overview

<br>

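<font color=black size=3 face=雅黑>　　In the previous lab, we worked through a series of exercises on Tensor, PyTorch's basic data type. In this PyTorch basics session, we will cover the following:

<font color=black size=3 face=雅黑>　　　(1) training and testing a model on the GPU, and comparing its training time with that on the CPU;

<font color=black size=3 face=雅黑>　　　(2) taking a closer look at PyTorch's automatic differentiation.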

<br>

## 1. GPU vs. CPU

<br>

<font color=black size=3 face=雅黑>　　In this part, we will learn how to move data and models from the CPU to the GPU, and how to train and test on the GPU.
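A minimal device-agnostic sketch of this idea, assuming you want the same code to run with or without a CUDA GPU: pick a `torch.device` once, then move both the model and every input batch to it with `.to(device)`. The small `nn.Linear` model and the random batch are placeholders, not the network used in this lab.

```python
import torch
import torch.nn as nn

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2)   # placeholder model (not this lab's Network)
model = model.to(device)  # moves all parameters and buffers to `device`

x = torch.randn(8, 4)     # placeholder batch, created on the CPU
x = x.to(device)          # inputs must be on the same device as the model

out = model(x)            # the computation runs on `device`
print(out.device)
```

The lab cells below call `.cuda()` directly, which assumes a GPU is available; `.to(device)` is an equivalent alternative that also runs on a CPU-only machine.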

<br>

### 1.1 Preparing the Dataset

<br>

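<font color=black size=3 face=雅黑>　　We start by preparing the MNIST dataset. If downloading from the original source is too slow, you can, as in the previous lesson, download the four archives yourself from the lab materials, create the directory "./dataset_mnist/MNIST/raw", and upload the archives into it.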
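For reference, the four MNIST archives are the standard files listed below. The helper `missing_mnist_archives` is a hypothetical convenience (not part of torchvision) that reports which of them are not yet in `./dataset_mnist/MNIST/raw`; it only checks file names, not file integrity.

```python
from pathlib import Path

# Standard file names of the four MNIST archives
MNIST_ARCHIVES = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]


def missing_mnist_archives(root="./dataset_mnist"):
    """Hypothetical helper: return the archive names not yet present in <root>/MNIST/raw."""
    raw_dir = Path(root) / "MNIST" / "raw"
    return [name for name in MNIST_ARCHIVES if not (raw_dir / name).is_file()]


print("missing archives:", missing_mnist_archives())
```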

<code>
%%html
<img src = "https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fpic4.zhimg.com%2Fv2-78abf1f3cfb557f9e4dd2fbb9c135ecc_b.jpg&refer=http%3A%2F%2Fpic4.zhimg.com&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1635427177&t=0617da4aedcf40fd21088e98c852a9de", width = 45%>
    
<br>

In [1]:
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import time

<br>

<font color=black size=3 face=雅黑>As in the previous labs, we load the data with torchvision, normalize it, and finally wrap it in a DataLoader.
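A small sketch of what these two preprocessing steps do, using placeholder tensors instead of MNIST: `Normalize((0.1307,), (0.3081,))` maps each pixel x, already scaled to [0, 1] by `ToTensor()`, to (x - 0.1307) / 0.3081, and a `DataLoader` groups samples into batches of `batch_size`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

MEAN, STD = 0.1307, 0.3081  # the values passed to Normalize in the cell below

# Normalize computes (x - mean) / std for every pixel
pixels = torch.tensor([0.0, MEAN, 1.0])
normalized = (pixels - MEAN) / STD
print(normalized)  # a 0 (black) pixel becomes negative, the mean maps to 0

# A DataLoader groups samples into batches of `batch_size`
fake_images = torch.rand(1000, 1, 28, 28)  # placeholder data, not MNIST
fake_labels = torch.randint(0, 10, (1000,))
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=200, shuffle=True)

num_batches = len(loader)  # 1000 / 200 = 5 batches
images, labels = next(iter(loader))
print(num_batches, images.shape, labels.shape)
```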
    
<br>

In [3]:
batch_size = 200  # batch size for both the training and test sets, i.e. the number of samples processed per batch

# Training set
train_set = torchvision.datasets.MNIST('./dataset_mnist', train=True, download=True,
                                       transform=torchvision.transforms.Compose([
                                           torchvision.transforms.ToTensor(),
                                           torchvision.transforms.Normalize(
                                               (0.1307,), (0.3081,)
                                           )
                                       ])
)

# Test set
test_set = torchvision.datasets.MNIST('./dataset_mnist', train=False, download=True,
                                      transform=torchvision.transforms.Compose([
                                          torchvision.transforms.ToTensor(),
                                          torchvision.transforms.Normalize(
                                              (0.1307,), (0.3081,)
                                          )
                                      ]))

train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=True)

### 1.2 Building the Convolutional Neural Network

<br>

<font color=black size=3 face=雅黑>Here we build the same convolutional neural network as in the previous lesson.

<br>

In [4]:
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        
        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)
        
        
    def forward(self, t):
        
        # conv1
        t = self.conv1(t)
        t = F.relu(t) 
        t = F.max_pool2d(t, kernel_size=2, stride=2) 
        
        # conv2
        t = self.conv2(t)
        t = F.relu(t)
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        
        t = t.reshape(-1, 12*4*4)  # flatten to (batch, 192); -1 infers the batch size instead of hard-coding batch_size
        
        # fc1
        t = self.fc1(t)
        t = F.relu(t)
        
        # fc2
        t = self.fc2(t)
        t = F.relu(t)
        
        # output layer
        t = self.out(t)
        
        return t
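The `12*4*4` input size of `fc1` follows from tracing one 28×28 image through the layers: with stride 1 and no padding, a 5×5 convolution shrinks each side by 4 (28 to 24, 12 to 8), and each 2×2 max-pool halves it (24 to 12, 8 to 4). The sketch below checks this on random input with an arbitrary batch size of 7, and shows that flattening with `-1` works for any batch size, whereas a hard-coded `reshape(batch_size, 12*4*4)` only works when every batch has exactly `batch_size` samples.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# The same two conv layers as Network, rebuilt so this sketch runs on its own
conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

x = torch.randn(7, 1, 28, 28)  # 7 random "images"; 7 is deliberately not 200
shapes = [tuple(x.shape)]

x = conv1(x)                                          # 28 - 5 + 1 = 24
shapes.append(tuple(x.shape))
x = F.max_pool2d(F.relu(x), kernel_size=2, stride=2)  # 24 / 2 = 12
shapes.append(tuple(x.shape))
x = conv2(x)                                          # 12 - 5 + 1 = 8
shapes.append(tuple(x.shape))
x = F.max_pool2d(F.relu(x), kernel_size=2, stride=2)  # 8 / 2 = 4
shapes.append(tuple(x.shape))

flat = x.reshape(-1, 12 * 4 * 4)  # -1 lets PyTorch infer the batch dimension
print(shapes)
print(tuple(flat.shape))          # (7, 192)

try:
    x.reshape(200, 12 * 4 * 4)    # what a hard-coded batch_size of 200 would do
    hard_coded_ok = True
except RuntimeError:
    hard_coded_ok = False         # 7 * 192 elements cannot form a 200 x 192 tensor
print(hard_coded_ok)              # False
```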

<br>

### 1.3 Training on the GPU

<br>

In [5]:
network1 = Network()
network1.cuda()  # move the model to the GPU (requires a CUDA-capable GPU)

Network(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)
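One detail behind calling `network1.cuda()` without reassignment, while the training loop below writes `images = images.cuda()`: `nn.Module.cuda()` and `nn.Module.to()` move a module's parameters in place and return the same module, whereas `Tensor.cuda()` and `Tensor.to()` return a tensor on the target device and leave the original tensor unchanged. A minimal sketch, assuming a CPU fallback when no GPU is present:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(3, 1)
returned = model.to(device)  # moves parameters in place and returns the module itself
print(returned is model)     # True

t = torch.zeros(3)
t.to(device)                 # returns a tensor on `device`; `t` itself is not modified
t_moved = t.to(device)       # so a tensor has to be reassigned to use the moved version
print(t.device, t_moved.device)
```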

In [6]:
loss_func = nn.CrossEntropyLoss()  # loss function: cross-entropy loss
optimizer1 = torch.optim.SGD(network1.parameters(), lr=0.1)  # optimizer: stochastic gradient descent

def get_num_correct(preds, labels):  # count the correctly classified samples
    return preds.argmax(dim=1).eq(labels).sum().item()
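A worked example of `get_num_correct` on three made-up predictions, plus a check of what `nn.CrossEntropyLoss` computes: it takes raw scores (logits), applies log-softmax internally, and averages the negative log-likelihood of the true classes, so the network's output layer needs no softmax of its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def get_num_correct(preds, labels):  # same helper as in the cell above
    return preds.argmax(dim=1).eq(labels).sum().item()


# Made-up logits for 3 samples and 2 classes
preds = torch.tensor([[0.1, 0.9],
                      [0.8, 0.2],
                      [0.3, 0.7]])
labels = torch.tensor([1, 0, 0])

# argmax per row -> [1, 0, 1]; compared with labels -> [True, True, False]
print(get_num_correct(preds, labels))  # 2

# CrossEntropyLoss on logits equals NLL loss applied to log-softmax outputs
ce = nn.CrossEntropyLoss()(preds, labels)
manual = F.nll_loss(F.log_softmax(preds, dim=1), labels)
print(torch.allclose(ce, manual))  # True
```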

In [7]:
total_epochs = 20
time_start1 = time.time()

for epoch in range(total_epochs):

    total_loss = 0
    total_train_correct = 0

    for batch in train_loader:  # fetch one batch
        
        # read the sample data
        images, labels = batch
        images = images.cuda()  # move the data to the GPU
        labels = labels.cuda()  # move the labels to the GPU
        
        # forward pass and loss computation
        preds = network1(images)
        loss = loss_func(preds, labels)
        
        # reset the gradients to zero
        optimizer1.zero_grad()
        
        # backward pass
        loss.backward()
        
        # update the parameters
        optimizer1.step()
          
        total_loss += loss.item()
        total_train_correct += get_num_correct(preds, labels)
    
    print("epoch:", epoch, 
          "correct times:", total_train_correct,
          "training accuracy:", "%.3f" %(total_train_correct/len(train_set)*100), "%", 
          "total_loss:", "%.3f" %total_loss)
    
time_end1 = time.time()

epoch: 0 correct times: 47396 training accuracy: 78.993 % total_loss: 195.570
epoch: 1 correct times: 57842 training accuracy: 96.403 % total_loss: 36.052
epoch: 2 correct times: 58575 training accuracy: 97.625 % total_loss: 22.972
epoch: 3 correct times: 58803 training accuracy: 98.005 % total_loss: 18.921
epoch: 4 correct times: 59072 training accuracy: 98.453 % total_loss: 15.306
epoch: 5 correct times: 59179 training accuracy: 98.632 % total_loss: 13.196
epoch: 6 correct times: 59263 training accuracy: 98.772 % total_loss: 11.549
epoch: 7 correct times: 59364 training accuracy: 98.940 % total_loss: 10.523
epoch: 8 correct times: 59417 training accuracy: 99.028 % total_loss: 9.424
epoch: 9 correct times: 59477 training accuracy: 99.128 % total_loss: 8.288
epoch: 10 correct times: 59489 training accuracy: 99.148 % total_loss: 7.907
epoch: 11 correct times: 59545 training accuracy: 99.242 % total_loss: 6.864
epoch: 12 correct times: 59621 training accuracy: 99.368 % total_loss: 6.098


<br>

<font color=black size=3 face=雅黑>Training time on the GPU:

<br>

In [8]:
print("Training time on GPU: ", time_end1 - time_start1)

Training time on GPU:  233.5057246685028


<br>

### 1.4 Training on the CPU and Comparing Training Time

<br>

In [9]:
network2 = Network()
optimizer2 = torch.optim.SGD(network2.parameters(), lr=0.1)

In [10]:
time_start2 = time.time()

for epoch in range(total_epochs):  # training epochs

    total_loss = 0
    total_train_correct = 0

    for batch in train_loader:  # fetch one batch
        
        # read the sample data
        images, labels = batch
        
        # forward pass and loss computation
        preds = network2(images)
        loss = loss_func(preds, labels)
        
        # reset the gradients to zero
        optimizer2.zero_grad()
        
        # backward pass
        loss.backward()
        
        # update the parameters
        optimizer2.step()
          
        total_loss += loss.item()
        total_train_correct += get_num_correct(preds, labels)
    
    print("epoch:", epoch, 
          "correct times:", total_train_correct,
          "training accuracy:", "%.3f" %(total_train_correct/len(train_set)*100), "%", 
          "total_loss:", "%.3f" %total_loss)
    
time_end2 = time.time()

epoch: 0 correct times: 47007 training accuracy: 78.345 % total_loss: 195.887
epoch: 1 correct times: 57978 training accuracy: 96.630 % total_loss: 32.093
epoch: 2 correct times: 58675 training accuracy: 97.792 % total_loss: 21.673
epoch: 3 correct times: 58919 training accuracy: 98.198 % total_loss: 17.294
epoch: 4 correct times: 59143 training accuracy: 98.572 % total_loss: 13.867
epoch: 5 correct times: 59240 training accuracy: 98.733 % total_loss: 11.984
epoch: 6 correct times: 59344 training accuracy: 98.907 % total_loss: 10.398
epoch: 7 correct times: 59429 training accuracy: 99.048 % total_loss: 9.109
epoch: 8 correct times: 59419 training accuracy: 99.032 % total_loss: 9.819
epoch: 9 correct times: 59556 training accuracy: 99.260 % total_loss: 7.433
epoch: 10 correct times: 59590 training accuracy: 99.317 % total_loss: 6.450
epoch: 11 correct times: 59617 training accuracy: 99.362 % total_loss: 5.896
epoch: 12 correct times: 59681 training accuracy: 99.468 % total_loss: 5.325

<br>
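The GPU and CPU loops above differ only in where the model and each batch live. A sketch of one way to avoid duplicating the loop, assuming a hypothetical helper `train_one_epoch` that takes the target device as an argument; the tiny linear model and random data are placeholders for `Network` and MNIST.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, optimizer, loss_func, device):
    """Hypothetical helper: one pass over `loader` on `device`; returns (total_loss, num_correct)."""
    model.train()
    total_loss, total_correct = 0.0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images)
        loss = loss_func(preds, labels)
        optimizer.zero_grad()  # clear gradients left over from the previous batch
        loss.backward()        # backward pass
        optimizer.step()       # update the parameters
        total_loss += loss.item()
        total_correct += preds.argmax(dim=1).eq(labels).sum().item()
    return total_loss, total_correct


# Usage with placeholder data and a tiny model (not MNIST / Network)
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
data = TensorDataset(torch.randn(64, 10), torch.randint(0, 3, (64,)))
loader = DataLoader(data, batch_size=16, shuffle=True)
model = nn.Linear(10, 3).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_sum, correct = train_one_epoch(model, loader, optimizer, nn.CrossEntropyLoss(), device)
print(loss_sum, correct)
```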

<font color=black size=3 face=雅黑>Training time on the CPU:

<br>

In [11]:
print("Training time on CPU: ", time_end2 - time_start2)

Training time on CPU:  764.838308095932


<br>

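<font color=black size=3 face=雅黑>　　Compared with training on the GPU, training on the CPU takes noticeably longer: about 765 s versus about 234 s in this run, roughly 3.3 times as long. Note that because the network in this example is small and the dataset is simple, the speedup from the GPU is relatively modest; on more complex tasks the GPU usually delivers a much larger speedup.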
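A rough sketch for exploring how problem size affects the GPU speedup, assuming a hypothetical benchmark function `time_matmul` that times a square matrix multiplication. CUDA kernels are launched asynchronously, so the sketch calls `torch.cuda.synchronize()` before reading the clock; in the training loops above, `loss.item()` already forces this synchronization on every batch. Small values of `n` generally show a smaller speedup, because fixed overheads such as kernel launches weigh more heavily.

```python
import time

import torch


def time_matmul(device, n=1024, repeats=10):
    """Hypothetical benchmark: average seconds per n x n matrix multiply on `device`."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.matmul(a, b)            # warm-up (the first CUDA call includes setup cost)
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for pending GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the queued kernels before stopping the clock
    return (time.perf_counter() - start) / repeats


cpu_time = time_matmul(torch.device("cpu"))
print(f"CPU: {cpu_time:.5f} s")
if torch.cuda.is_available():
    gpu_time = time_matmul(torch.device("cuda"))
    print(f"GPU: {gpu_time:.5f} s, speedup x{cpu_time / gpu_time:.1f}")
```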

<br>

<br>

<font color=black size=3 face=雅黑>**In your lab report, compare the model's training time on the CPU and on the GPU.**
   
<br>
    
<br>

## 2. PyTorch and Automatic Differentiation

<br>

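<font color=black size=3 face=雅黑>　　In the previous lab we saw that PyTorch provides automatic differentiation: calling "loss.backward()" conveniently performs backpropagation through a neural network. Below we take a closer look at this feature.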

<code>
%%html
<img src = "https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fpic3.zhimg.com%2Fv2-58b9712696a499fd9b01380a9926a3b3_1200x500.jpg&refer=http%3A%2F%2Fpic3.zhimg.com&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1639062767&t=d95e4e63797c3b1a62afcab6c1fa2c0a", width=45%>
<br>

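<font color=black size=3 face=雅黑>　　Tensor is PyTorch's most fundamental data type. In the previous lab we introduced three basic tensor attributes: shape, dtype, and device. Today we introduce three more attributes related to automatic differentiation: requires_grad, grad, and grad_fn.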
    
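<font color=black size=3 face=雅黑>　　requires_grad indicates whether gradient information should be tracked for this tensor during computation (it is tracked when requires_grad=True). For tensors that require gradients, PyTorch stores their gradient information and the operations that produced them. This costs extra memory, so to save memory a newly created tensor does not require gradients unless specified otherwise (i.e., requires_grad defaults to False).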
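A quick sketch of how requires_grad propagates: the result of an operation requires gradients if at least one of its inputs does, and only then is the producing operation recorded.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])                      # requires_grad defaults to False
w = torch.tensor([0.5, 0.5, 0.5], requires_grad=True)

print(x.requires_grad)              # False
print((x * 2).requires_grad)        # False: no input requires grad, nothing is recorded
print((x * w).requires_grad)        # True: one input requires grad, so the result does too
print((x * w).grad_fn is not None)  # True: the producing operation was recorded
```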
    
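<font color=black size=3 face=雅黑>　　The grad attribute holds the tensor's gradient, i.e. the partial derivatives of the output on which backward() was called with respect to this tensor. When requires_grad=False, grad is no longer updated: it stays None or keeps the value computed earlier. Note that by default grad is populated only for leaf tensors, i.e. tensors created directly by the user rather than produced by an operation.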
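A short sketch of this grad behaviour, plus one detail worth knowing: successive backward() calls accumulate (add) into grad rather than overwrite it, which is why the training loops in Part 1 call optimizer.zero_grad() before each loss.backward().

```python
import torch

t = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

(t * 2).sum().backward()
after_one = t.grad.clone()
print(after_one)     # tensor([2., 2., 2.])

(t * 2).sum().backward()
after_two = t.grad.clone()
print(after_two)     # tensor([4., 4., 4.]): the new gradient is ADDED to the stored one

t.requires_grad_(False)
after_freeze = t.grad.clone()
print(after_freeze)  # tensor([4., 4., 4.]): the old value stays but is no longer updated

t.grad.zero_()       # reset the stored gradient in place
print(t.grad)        # tensor([0., 0., 0.])
```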
    
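<font color=black size=3 face=雅黑>　　grad_fn records the operation that produced this tensor, such as an addition or a multiplication; it is None for tensors created directly by the user.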
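A small sketch showing grad_fn on a chain of two operations. Each grad_fn also points back to the nodes that produced its inputs through `next_functions`; this linked chain is the computation graph that `backward()` traverses.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2    # produced by a multiplication
z = y.sum()  # produced by a sum

print(x.grad_fn)  # None: x was created directly by the user (a leaf tensor)
print(y.grad_fn)  # <MulBackward0 object at ...>
print(z.grad_fn)  # <SumBackward0 object at ...>

# The node that produced z links back to the node that produced y
print(z.grad_fn.next_functions[0][0])  # the MulBackward0 node
print(x.is_leaf, y.is_leaf)            # True False
```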
    
<br>

In [12]:
# Create tensor t1
t1 = torch.tensor([1,2,3], dtype=torch.float32)
print("t1: ", t1)

# The three basic attributes covered in the previous lab
print("t1.shape: ", t1.shape)  # shape
print("t1.dtype: ", t1.dtype)  # data type
print("t1.device: ", t1.device)  # defaults to cpu
print("")

# The three attributes related to automatic differentiation
print("t1.requires_grad: ", t1.requires_grad)  # whether gradient information is tracked for this tensor (default: False)
print("t1.grad: ", t1.grad)  # gradient (partial derivatives)
print("t1.grad_fn: ", t1.grad_fn)  # operation that produced this tensor
print("")

# Tensor t2 is computed from t1; its requires_grad follows from t1's by default
t2 = t1*2
print("t2.requires_grad: ", t2.requires_grad)
print("t2.grad: ", t2.grad)  
print("t2.grad_fn: ", t2.grad_fn)  

t1:  tensor([1., 2., 3.])
t1.shape:  torch.Size([3])
t1.dtype:  torch.float32
t1.device:  cpu

t1.requires_grad:  False
t1.grad:  None
t1.grad_fn:  None

t2.requires_grad:  False
t2.grad:  None
t2.grad_fn:  None


In [13]:
# Because t2.requires_grad=False, calling backward() on t2[0] as below raises an error
t2[0].backward()

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

<br>

<font color=black size=3 face=雅黑>Next, we re-create t1 with requires_grad=True and look at the output of the following code.
    
<br>

In [22]:
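# Create tensor t1
t1 = torch.tensor([1,2,3], dtype=torch.float32, requires_grad=True)
print("t1: ", t1)

# Inspect the three attributes related to automatic differentiation
print("t1.requires_grad: ", t1.requires_grad)  # whether gradient information is tracked for this tensor (default: False)
print("t1.grad: ", t1.grad)  # gradient (partial derivatives)
print("t1.grad_fn: ", t1.grad_fn)  # operation that produced this tensor
print("")

# Tensor t2 is computed from t1; its requires_grad follows from t1's by default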
t2 = t1*2
print("t2.requires_grad: ", t2.requires_grad)
print("t2.grad: ", t2.grad)  
print("t2.grad_fn: ", t2.grad_fn) 

t1:  tensor([1., 2., 3.], requires_grad=True)
t1.requires_grad:  True
t1.grad:  None
t1.grad_fn:  None

t2.requires_grad:  True
t2.grad:  None
t2.grad_fn:  <MulBackward0 object at 0x7fa7801c2f28>


<br>

<font color=black size=3 face=雅黑>　　As we can see, t2's requires_grad is also True, and t2.grad_fn records the operation that produced t2 ("Mul" indicates multiplication). Now let's see how automatic differentiation works.
    
<br>

In [23]:
t2[0].backward()  # backpropagate from t2[0]

In [25]:
print("t1.grad: ", t1.grad)
print("t1.grad_fn: ", t1.grad_fn)

t1.grad:  tensor([2., 0., 0.])
t1.grad_fn:  None


<br>

<font color=black size=3 face=雅黑>　　As we can see, t1.grad holds the derivative of t2[0] with respect to t1. Since t2[0] = 2 × t1[0] depends only on t1[0], the derivative is [2, 0, 0].
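t2 itself is a vector, so calling `t2.backward()` without arguments fails: PyTorch can only create the initial gradient implicitly for scalar outputs. Passing a weight vector v instead computes v^T multiplied by the Jacobian of t2 with respect to t1; with v = [1, 0, 0] this selects t2[0] and gives the same result as `t2[0].backward()`. A minimal sketch on a fresh copy of t1:

```python
import torch

t1 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
t2 = t1 * 2  # Jacobian dt2/dt1 is diag(2, 2, 2)

# v = [1, 0, 0] picks out t2[0], so this matches t2[0].backward()
t2.backward(torch.tensor([1.0, 0.0, 0.0]))
print(t1.grad)  # tensor([2., 0., 0.])
```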
    
<br>

<br>

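<font color=black size=3 face=雅黑>　　Now let's look at an example closer to practice. The computation at a **single** neuron of a fully connected layer is shown below, where $a$ is the input activation, $w$ the neuron's weights, and $b$ its bias. In the code, $a$ is a $1 \times 2$ row vector and $w$ a $2 \times 1$ column vector, both stored as 2-D tensors so that `torch.mm` can multiply them. The neuron's linear output is the matrix product $z = aw + b$, which yields a single value (stored as a $1 \times 1$ tensor).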
    
<br>

In [26]:
a = torch.tensor([[1.,2.]], requires_grad=True)
w = torch.tensor([[3.],[4.]], requires_grad=True)
b = torch.tensor([6.], requires_grad=True)

In [27]:
z = torch.mm(a, w) + b
print(z)

tensor([[17.]], grad_fn=<AddBackward0>)


In [28]:
z.backward()

print("a.grad: ", a.grad)
print("w.grad: ", w.grad)
print("b.grad: ", b.grad)

a.grad:  tensor([[3., 4.]])
w.grad:  tensor([[1.],
        [2.]])
b.grad:  tensor([1.])
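The printed gradients can be checked by hand. Writing $a = [a_1, a_2] = [1, 2]$ and $w = [w_1, w_2]^\top = [3, 4]^\top$:

$$
z = a w + b = a_1 w_1 + a_2 w_2 + b = 1 \cdot 3 + 2 \cdot 4 + 6 = 17
$$

$$
\frac{\partial z}{\partial a} = w^\top = \begin{bmatrix} 3 & 4 \end{bmatrix}, \qquad
\frac{\partial z}{\partial w} = a^\top = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \qquad
\frac{\partial z}{\partial b} = 1
$$

Each gradient has the same shape as the tensor it belongs to. Since $\partial z / \partial w = a^\top$ does not depend on $w$, w.grad stays $[1, 2]^\top$ in the next example even though $w$ takes different values there.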


<br>

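<font color=black size=3 face=雅黑>We can change the requires_grad attribute of a leaf tensor (one created directly by the user, such as a below) at any time with the in-place method requires_grad_.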
    
<br>

In [29]:
a = torch.tensor([[1.,2.]], requires_grad=True)
w = torch.tensor([[10.],[20.]], requires_grad=True)
b = torch.tensor([6.], requires_grad=True)

In [30]:
a.requires_grad_(False)  # reset tensor a's requires_grad attribute to False

z = torch.mm(a, w) + b
print(z)

tensor([[56.]], grad_fn=<AddBackward0>)


In [31]:
z.backward()

print("a.grad: ", a.grad)  # None
print("w.grad: ", w.grad)
print("b.grad: ", b.grad)

a.grad:  None
w.grad:  tensor([[1.],
        [2.]])
b.grad:  tensor([1.])
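requires_grad_ works this way on leaf tensors only. A sketch of what happens on a tensor produced by an operation, and of `detach()`, the usual way to obtain a tensor that shares the same data but is cut off from the computation graph:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x * 3  # non-leaf: produced by an operation on x

try:
    y.requires_grad_(False)  # not allowed on a non-leaf tensor
    changed = True
except RuntimeError as err:
    changed = False
    print("RuntimeError:", err)

y_detached = y.detach()  # shares y's data but is excluded from gradient tracking
print(y_detached.requires_grad, y_detached.grad_fn)  # False None
```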


<br>

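<font color=black size=3 face=雅黑>In a real neural network, we can easily inspect the gradients of each layer's parameters. First, we take one batch of samples, pass it through the network, and compute the loss.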
    
<br>

In [32]:
network = Network()
batch = next(iter(train_loader))
images, labels = batch

preds = network(images)
loss = loss_func(preds, labels)
loss.item()

2.3110077381134033

<br>

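<font color=black size=3 face=雅黑>　　Now that we have the loss, the next step is to call loss.backward() to compute the partial derivatives; PyTorch carries out the computation for us automatically. Before calling loss.backward(), let's inspect the first convolutional layer, conv1.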

<br>

In [33]:
# Inspect conv1
print("network.conv1: \n", network.conv1, "\n")

# Inspect conv1's parameters
print("network.conv1.weight.data.shape: \n", network.conv1.weight.data.shape, "\n")
print("network.conv1.weight.data: \n", network.conv1.weight.data, "\n")

# Use network.conv1.weight.grad to inspect the gradient
# Running the line below prints None, meaning no gradient has been computed yet
print("network.conv1.weight.grad: ", network.conv1.weight.grad)

network.conv1: 
 Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1)) 

network.conv1.weight.data.shape: 
 torch.Size([6, 1, 5, 5]) 

network.conv1.weight.data: 
 tensor([[[[-0.0449, -0.0154,  0.1726,  0.1990,  0.0540],
          [ 0.0357, -0.1352,  0.1075,  0.0187,  0.1720],
          [-0.0247, -0.1631, -0.0904,  0.1197,  0.0477],
          [ 0.1917,  0.1586, -0.0250, -0.1857, -0.0512],
          [-0.1307, -0.1811, -0.0955,  0.1714, -0.0448]]],


        [[[ 0.1397,  0.0865,  0.1119,  0.0320,  0.0990],
          [ 0.0637, -0.0431, -0.1158,  0.1260, -0.1406],
          [-0.0442, -0.1311, -0.0283, -0.0788,  0.0498],
          [ 0.1961, -0.0384, -0.1773,  0.1945, -0.0459],
          [-0.0639, -0.1666,  0.1595,  0.1933, -0.0768]]],


        [[[ 0.0877, -0.1281, -0.1778, -0.1731, -0.0442],
          [ 0.0920,  0.0139, -0.1226, -0.0697,  0.0855],
          [-0.1184, -0.0533, -0.0984, -0.1494,  0.1653],
          [ 0.1371, -0.1016, -0.0461, -0.1720,  0.1870],
          [-0.1062,  0.1078,  0.1669

In [34]:
# Now run the backward pass
loss.backward()
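The same check can be run over every parameter of a model with `named_parameters()`. A sketch on a small placeholder `nn.Sequential` model (not `Network`): after `backward()`, each parameter's .grad has the same shape as the parameter itself.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))  # placeholder model
inputs = torch.randn(5, 4)
targets = torch.randint(0, 2, (5,))

loss = nn.CrossEntropyLoss()(model(inputs), targets)
loss.backward()

# After backward(), every parameter that requires grad has a .grad of the same shape
for name, param in model.named_parameters():
    print(name, tuple(param.shape), tuple(param.grad.shape))
```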

In [35]:
# Check again: the gradient has now been computed. It is a 4-D tensor with the same shape as conv1's weight tensor
print("network.conv1.weight.grad.shape: \n", network.conv1.weight.grad.shape, "\n")
print("network.conv1.weight.grad: \n", network.conv1.weight.grad)

network.conv1.weight.grad.shape: 
 torch.Size([6, 1, 5, 5]) 

network.conv1.weight.grad: 
 tensor([[[[-2.8441e-03, -4.1445e-03, -2.8533e-03, -2.0798e-03,  9.2551e-06],
          [-5.6289e-04, -2.2342e-03, -1.6740e-03, -1.1345e-03, -3.2309e-04],
          [-7.9662e-04, -2.5382e-03, -2.1988e-03, -6.8201e-04, -1.0163e-03],
          [-3.2804e-03, -3.7188e-03, -2.8743e-03, -2.1172e-03, -1.1748e-03],
          [-3.5901e-03, -3.2050e-03, -1.6132e-03, -6.6099e-04, -9.6127e-04]]],


        [[[ 2.4007e-03,  1.2730e-03,  4.0646e-04, -2.2383e-03, -2.8743e-03],
          [ 4.0802e-03,  2.4020e-03, -2.0438e-04, -1.9596e-03, -1.5713e-03],
          [ 4.3766e-03,  1.7513e-03, -9.4140e-04, -1.3440e-03, -2.4142e-05],
          [ 3.9015e-03,  2.7723e-03, -5.7056e-04, -7.9472e-04,  2.7943e-04],
          [ 2.6285e-03,  1.5545e-03, -7.4485e-04, -1.0408e-03,  2.2509e-04]]],


        [[[-4.8325e-03, -1.6001e-03, -9.4157e-04, -8.1408e-04, -4.0624e-04],
          [-3.2856e-03, -8.2046e-04, -4.8646e-04, -5.2