# NEURAL NETWORKS

## References

> https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py  


In [2]:
import torch
print(torch.__version__)
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.random.seed()

1.5.0


1128057769511200

## Neural Networks

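A typical training procedure for a neural network looks like this:

- Define the neural network, which has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process each input through the network
- Compute the loss (how far the output is from being correct)
- Propagate gradients back into the network's parameters
- Update the network's weights, typically with a simple update rule: `weight = weight - learning_rate * gradient`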
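The six steps above map onto a short loop. Below is a minimal sketch, assuming a hypothetical toy regression problem (fit `y = 3x + 1` with a single `nn.Linear`) and made-up hyperparameters (`lr=0.1`, 200 steps); none of these choices come from the tutorial.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy dataset: y = 3x + 1 plus a little noise
x = torch.randn(64, 1)
y = 3 * x + 1 + 0.01 * torch.randn(64, 1)

model = nn.Linear(1, 1)          # 1. network with learnable weight and bias
criterion = nn.MSELoss()
learning_rate = 0.1              # assumed value

losses = []
for step in range(200):          # 2. iterate (here: the same batch repeatedly)
    pred = model(x)              # 3. process the input through the network
    loss = criterion(pred, y)    # 4. compute the loss
    model.zero_grad()            # gradients accumulate, so clear them first
    loss.backward()              # 5. propagate gradients back to the parameters
    with torch.no_grad():        # 6. weight = weight - learning_rate * gradient
        for p in model.parameters():
            p -= learning_rate * p.grad
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.6f}")
print(model.weight.item(), model.bias.item())  # close to 3 and 1
```

In practice the data would come from a `DataLoader` and the update from `torch.optim`; the loop is written out by hand here only to mirror the list one step at a time.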

### Define the network


In [3]:
class Net(nn.Module):

    def __init__(self):
        super().__init__()
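        # convolutional layers: 1 -> 6 -> 16 channels, 3x3 kernels
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # fully connected layers: 16*6*6 = 576 -> 120 -> 84 -> 10
        self.fc1 = nn.Linear(16*6*6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)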

    def forward(self, x):
        # 1x32x32 -> 6x30x30 -> 6x15x15
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        # 6x15x15 -> 16x13x13 -> 16x6x6
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # 16x6x6 -> 576
        x = x.view(-1, self.num_flat_feats(x))
        # 576 -> 120
        x = F.relu(self.fc1(x))
        # 120 -> 84
        x = F.relu(self.fc2(x))
        # 84 -> 10
        x = self.fc3(x)
        return x
    
    def num_flat_feats(self, x):
        sz = x.size()[1:]
        num = 1
        for s in sz: num *= s
        return num
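The shape comments in `forward` follow the usual output-size formula for convolution and pooling layers (no dilation): out = floor((in + 2 * padding - kernel) / stride) + 1. A small pure-Python sketch that reproduces the 32 -> 30 -> 15 -> 13 -> 6 chain and the 576 inputs of `fc1`; the helper `out_size` is hypothetical, not a PyTorch function.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Hypothetical helper: spatial output size of a conv/pool layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

h = 32
h = out_size(h, kernel=3)            # conv1: 32 -> 30
h = out_size(h, kernel=2, stride=2)  # max_pool2d(x, 2): 30 -> 15
h = out_size(h, kernel=3)            # conv2: 15 -> 13
h = out_size(h, kernel=2, stride=2)  # max_pool2d(x, 2): 13 -> 6 (floor division)
flat = 16 * h * h                    # 16 channels after conv2
print(h, flat)  # 6 576
```

`F.max_pool2d(x, 2)` uses a 2x2 window whose stride defaults to the window size, which is why `stride=2` is passed for the pooling steps.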

In [4]:
net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


In [9]:
# learnable parameters
params = list(net.parameters())
print(len(params))   # 5 layers x (weight, bias)
for name, param in net.named_parameters():
    print(name, param.size())

10
conv1.weight torch.Size([6, 1, 3, 3])
conv1.bias torch.Size([6])
conv2.weight torch.Size([16, 6, 3, 3])
conv2.bias torch.Size([16])
fc1.weight torch.Size([120, 576])
fc1.bias torch.Size([120])
fc2.weight torch.Size([84, 120])
fc2.bias torch.Size([84])
fc3.weight torch.Size([10, 84])
fc3.bias torch.Size([10])
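To see how many scalar weights these ten tensors hold, multiply out each shape. This sketch only uses the shapes printed above; with the real `net`, `sum(p.numel() for p in net.parameters())` should give the same total.

```python
# Parameter shapes as printed by net.named_parameters() above
shapes = {
    "conv1.weight": (6, 1, 3, 3), "conv1.bias": (6,),
    "conv2.weight": (16, 6, 3, 3), "conv2.bias": (16,),
    "fc1.weight": (120, 576), "fc1.bias": (120,),
    "fc2.weight": (84, 120), "fc2.bias": (84,),
    "fc3.weight": (10, 84), "fc3.bias": (10,),
}

def numel(shape):
    """Number of elements in a tensor of the given shape."""
    n = 1
    for s in shape:
        n *= s
    return n

counts = {name: numel(s) for name, s in shapes.items()}
total = sum(counts.values())
print(total)                 # 81194
print(counts["fc1.weight"])  # 69120
```

`fc1.weight` alone accounts for roughly 85% of the parameters, which is typical for a convolutional stack followed by a wide fully connected layer.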


In [10]:
# random input: a batch of 4 single-channel 32x32 images
inp = torch.randn(4, 1, 32, 32)
outp = net(inp)
print(outp)

tensor([[-0.0638, -0.0186, -0.0626, -0.0354, -0.0327,  0.1017,  0.0080, -0.1287,
          0.0306,  0.1064],
        [-0.0621, -0.0091, -0.0458, -0.0277, -0.0204,  0.1034,  0.0251, -0.1382,
          0.0444,  0.0991],
        [-0.0605, -0.0160, -0.0579, -0.0329, -0.0259,  0.0981,  0.0044, -0.1230,
          0.0498,  0.1089],
        [-0.0605, -0.0278, -0.0669, -0.0455, -0.0112,  0.0977,  0.0273, -0.1443,
          0.0490,  0.1125]], grad_fn=<AddmmBackward>)


In [12]:
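# gradients accumulate, so zero the gradient buffers before backpropagating
net.zero_grad()
# outp is not a scalar, so backward() needs a gradient of the same shape (random here)
outp.backward(torch.randn(4, 10))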
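Why the buffers must be zeroed: `backward()` adds into `.grad` instead of overwriting it. A minimal sketch on a single `nn.Linear`, used here only as a small stand-in for `net`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 2)          # stand-in for net
x = torch.randn(4, 3)
v = torch.ones(4, 2)             # gradient w.r.t. the (non-scalar) output

layer(x).backward(v)
g1 = layer.bias.grad.clone()     # bias gradient = sum of v over the batch

layer(x).backward(v)             # second call adds to the existing .grad
g2 = layer.bias.grad.clone()
print(torch.allclose(g2, 2 * g1))  # True: gradients accumulated

layer.zero_grad()                # reset the buffers
layer(x).backward(v)
print(torch.allclose(layer.bias.grad, g1))  # True: a fresh gradient again
```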

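Note: `torch.nn` only supports mini-batches, so `nn.Conv2d` takes a 4-D tensor of shape nSamples x nChannels x Height x Width,  
where the first dimension is the batch size. If you have a single sample, call `.unsqueeze(0)` to add a fake batch dimension.  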
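For example, a single 1x32x32 image gains a leading batch axis with `.unsqueeze(0)` (indexing with `None` is equivalent). A short sketch:

```python
import torch
import torch.nn as nn

img = torch.randn(1, 32, 32)          # one sample: channels x height x width
batch = img.unsqueeze(0)              # insert a batch axis at position 0
print(batch.shape)                    # torch.Size([1, 1, 32, 32])
print(torch.equal(batch, img[None]))  # True: indexing with None does the same

out = nn.Conv2d(1, 6, 3)(batch)       # same configuration as conv1 in Net
print(out.shape)                      # torch.Size([1, 6, 30, 30])
```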

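Recap:

- `torch.Tensor`: a multi-dimensional array that supports autograd operations such as `backward()` and holds the gradient with respect to itself  
- `nn.Module`: a neural network module; a convenient way to encapsulate parameters, with helpers for moving them to a device (e.g. GPU), exporting, loading, etc.  
- `nn.Parameter`: a kind of Tensor that is automatically registered as a parameter when assigned as an attribute of a `Module`  
- `autograd.Function`: implements the forward and backward definitions of an autograd operation. Every Tensor operation creates at least one `Function` node, which connects to the functions that created the Tensor and encodes its history  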
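The last two bullets in code: a sketch with a hypothetical `Scale` module showing that only `nn.Parameter` attributes are registered, and a hypothetical `Square` `autograd.Function` that supplies its own forward and backward. Both class names are made up for illustration.

```python
import torch
import torch.nn as nn


class Scale(nn.Module):  # hypothetical module
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones(3))  # registered as a parameter
        self.c = torch.zeros(3)               # plain tensor: NOT registered

    def forward(self, x):
        return x * self.w + self.c


class Square(torch.autograd.Function):  # hypothetical custom op: y = x^2
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # keep x for the backward pass
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * 2 * x   # d(x^2)/dx = 2x


m = Scale()
param_names = [name for name, _ in m.named_parameters()]
print(param_names)  # ['w']

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = Square.apply(x)
y.sum().backward()
print(x.grad)       # tensor([2., 4., 6.])
print(y.grad_fn)    # the graph node created for Square (repr varies by version)
```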

### Loss function

For the other loss functions defined in `torch.nn`, see https://pytorch.org/docs/nn.html#loss-functions

In [15]:
outp = net(inp)
tgt = torch.randn(4, 10)   # dummy target with the same shape as outp (4 x 10)
criterion = nn.MSELoss()   # mean squared error loss

In [14]:
loss = criterion(outp, tgt)
print(loss)

tensor(0.7632, grad_fn=<MseLossBackward>)
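With its default `reduction='mean'`, `nn.MSELoss` averages the squared element-wise differences over all elements, which is why the target should have the same shape as the output. A sketch checking this on random tensors shaped like `outp`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
outp = torch.randn(4, 10)
tgt = torch.randn(4, 10)              # same shape as outp: no broadcasting

criterion = nn.MSELoss()              # reduction='mean' by default
manual = ((outp - tgt) ** 2).mean()   # mean over all 4 * 10 elements
print(torch.allclose(criterion(outp, tgt), manual))  # True

# reduction='sum' skips the division by the number of elements
summed = nn.MSELoss(reduction='sum')(outp, tgt)
print(torch.allclose(summed, manual * outp.numel()))  # True
```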


In [44]:
# computation graph recorded during the forward pass;
# loss.backward() traverses it in reverse:
# input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
#       -> view -> linear -> relu -> linear -> relu -> linear
#       -> MSELoss
#       -> loss
print(loss.grad_fn)   # MSELoss
print(loss.grad_fn.next_functions[0][0])   # Linear (fc3, computed with addmm)
print(loss.grad_fn.next_functions[0][0].next_functions[1][0])   # ReLU

<MseLossBackward object at 0x000002801FBB0C88>
<AddmmBackward object at 0x000002801FB157C8>
<ReluBackward0 object at 0x000002801F564F08>
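`next_functions` links each node to the nodes that produced its inputs, so the whole graph can be walked. A minimal sketch on a tiny graph; the helper `walk` is hypothetical, and node class names carry version-dependent suffixes (e.g. `MseLossBackward` in 1.5 versus `MseLossBackward0` in later releases).

```python
import torch


def walk(fn, seen=None):
    """Hypothetical helper: depth-first traversal returning node class names."""
    if seen is None:
        seen = set()
    if fn is None or fn in seen:  # None marks an input that needs no gradient
        return []
    seen.add(fn)
    names = [type(fn).__name__]
    for next_fn, _ in fn.next_functions:
        names.extend(walk(next_fn, seen))
    return names


w = torch.randn(3, requires_grad=True)
loss = (torch.relu(w) * 2).sum()
names = walk(loss.grad_fn)
print(names)  # e.g. ['SumBackward0', 'MulBackward0', 'ReluBackward0', 'AccumulateGrad']
```

`AccumulateGrad` is the leaf node that writes the result into `w.grad`.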


### Backprop


In [45]:
# zero the gradient buffers before backpropagating
net.zero_grad()
print('conv1.bias.grad before backward')
# access a parameter by attribute name, e.g. `net.conv1.weight`, `net.conv1.bias`
# (PyTorch 1.5 fills the buffers with zeros; since 2.0, zero_grad() sets them to None by default)
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([ 0.0079,  0.0031,  0.0038, -0.0004, -0.0031,  0.0023])
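One way to sanity-check the gradients that `backward()` produces is to compare them with finite differences. `torch.autograd.gradcheck` does this and expects double-precision inputs with `requires_grad=True`. A sketch using a small `nn.Linear` followed by `tanh` as a stand-in for `net` (the layer sizes and tolerances are assumptions); it checks the gradient with respect to the input `x`.

```python
import torch
import torch.nn as nn
from torch.autograd import gradcheck

torch.manual_seed(0)
layer = nn.Linear(5, 3).double()  # float64 keeps finite differences accurate


def f(x):
    return torch.tanh(layer(x))   # smooth, so finite differences are reliable


x = torch.randn(4, 5, dtype=torch.double, requires_grad=True)
ok = gradcheck(f, (x,), eps=1e-6, atol=1e-4)
print(ok)  # True: analytical and numerical Jacobians agree
```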


### Update the weights

#### Manual update


In [49]:
learning_rate = 0.01
for param in net.parameters():
    # note: operate on `param.data`, not on `param`, so autograd does not record the update
    param.data.sub_(param.grad.data * learning_rate)

In [47]:
# the weight update must not be tracked by autograd, so apply it to `param.data` or `param.detach()`
param = next(net.parameters())
# difference between `param` and `param.data`
print(param.requires_grad)
print(param.data.requires_grad)
# or equivalently `param.detach().requires_grad`

True
False
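In current PyTorch code the same update is more commonly written inside a `torch.no_grad()` block, which likewise keeps autograd from recording it. A sketch on a stand-alone parameter with made-up values:

```python
import torch
import torch.nn as nn

w = nn.Parameter(torch.tensor([1.0, 2.0]))
(w * w).sum().backward()          # dL/dw = 2w = [2., 4.]

learning_rate = 0.1
with torch.no_grad():             # operations here are not tracked
    w -= learning_rate * w.grad   # in-place update of the leaf tensor

print(w)          # values [0.8, 1.6], still requires_grad=True
print(w.grad_fn)  # None: the update left no history in the graph
```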


In [48]:
# compare conv1.bias before and after a manual update with learning rate 0.1
param = net.conv1.bias
print('conv1.bias before update')
print(param)
param.data.sub_(param.grad.data * 0.1)
print('conv1.bias after update')
print(param)

conv1.bias before update
Parameter containing:
tensor([ 0.0121,  0.2160, -0.0436, -0.2049, -0.3043, -0.1758],
       requires_grad=True)
conv1.bias after update
Parameter containing:
tensor([ 0.0113,  0.2157, -0.0440, -0.2049, -0.3040, -0.1760],
       requires_grad=True)


#### Update with an optimizer


In [51]:
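# torch.optim package: optimizers
# create an optimizer, passing it the network parameters to update
optimizer = optim.SGD(net.parameters(), lr=0.01)
# inside the training loop:
optimizer.zero_grad()   # zero the gradient buffers of all parameters given to the optimizer
outp = net(inp)
loss = criterion(outp, tgt)
loss.backward()     # backpropagate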

param = net.conv1.bias
print('conv1.bias before update')
print(param)

optimizer.step()    # update the weights

print('conv1.bias after update')
print(param)

conv1.bias before update
Parameter containing:
tensor([ 0.0112,  0.2157, -0.0441, -0.2050, -0.3040, -0.1760],
       requires_grad=True)
conv1.bias after update
Parameter containing:
tensor([ 0.0113,  0.2158, -0.0441, -0.2052, -0.3040, -0.1759],
       requires_grad=True)
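With no momentum or weight decay, `optim.SGD.step()` applies the same rule as the manual update, `param = param - lr * param.grad`. A sketch checking this on a small stand-in model (the model size, data and learning rate are assumptions):

```python
import copy

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(4, 2)
manual = copy.deepcopy(model)     # identical copy for the manual update
x, y = torch.randn(8, 4), torch.randn(8, 2)
criterion = nn.MSELoss()
lr = 0.01

# Optimizer path
optimizer = optim.SGD(model.parameters(), lr=lr)
optimizer.zero_grad()
criterion(model(x), y).backward()
optimizer.step()

# Manual path: weight = weight - learning_rate * gradient
manual.zero_grad()
criterion(manual(x), y).backward()
with torch.no_grad():
    for p in manual.parameters():
        p -= lr * p.grad

same = all(torch.allclose(a, b) for a, b in zip(model.parameters(), manual.parameters()))
print(same)  # True
```

The optimizer becomes worthwhile once the rule gets more involved (momentum, Adam, weight decay): swapping `optim.SGD` for another class leaves the `zero_grad()` / `backward()` / `step()` loop unchanged.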
