Features:

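- All convolutions use the same small kernel (3×3), and the receptive field is enlarged by making the network deeper
- Despite the added depth, the small kernels keep the number of parameters from growing too large
- Every few layers, max pooling halves the height and width of the feature map while the number of channels doubles (up to 512); a very simple and regular design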
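
To make the first two points concrete, here is a small pure-Python calculation (a sketch; the helper names are made up for illustration). It assumes stride-1 convolutions that keep a fixed channel count `C` and ignores biases. Stacking n 3×3 convolutions gives a (2n+1)×(2n+1) receptive field while using fewer weights than one convolution with that larger kernel.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field (side length) of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)

def conv_weights(kernel_size, channels, num_layers=1):
    """Weight count, biases ignored, of num_layers convolutions mapping channels -> channels."""
    return num_layers * kernel_size * kernel_size * channels * channels

C = 256  # example channel count (an assumption, not taken from the text)
for n in (2, 3):
    rf = stacked_receptive_field(n)
    stacked = conv_weights(3, C, num_layers=n)
    single = conv_weights(rf, C)
    print(f"{n} stacked 3x3 convs: receptive field {rf}x{rf}, "
          f"{stacked} weights vs {single} for one {rf}x{rf} conv")
```

Two 3×3 layers cover the same 5×5 region as a single 5×5 layer with 18C² instead of 25C² weights; three cover 7×7 with 27C² instead of 49C², and every extra layer also adds another ReLU.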

![VGG](./images/VGG.jpg)

In [1]:
import time
import torch
from torch import nn, optim

import d2lzh_pytorch as d2l

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Define the VGG block

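In essence, the block wraps a variable number (`num_convs`) of convolutional layers, each followed by a ReLU, plus one max-pooling layer at the end, into a single function.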

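In VGG, the convolutional layers differ only in their channel counts (`in_channels`, `out_channels`); all other hyperparameters (3×3 kernel, padding 1) are the same, so this abstraction keeps the code simple.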

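**Note:** a bit of Python used below:  
when calling a function, a single asterisk before a list (e.g. `*blk`) unpacks it into separate positional arguments, one per element;  
two asterisks before a dict unpack it into keyword arguments.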
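
A minimal illustration of both forms of unpacking. `conv_summary` and `count_layers` are hypothetical helpers; `count_layers(*blk)` mirrors how `nn.Sequential(*blk)` below receives each layer in the list as its own argument.

```python
def conv_summary(in_channels, out_channels, kernel_size=3, padding=1):
    """Hypothetical helper that just formats its arguments."""
    return f"Conv {in_channels}->{out_channels}, k={kernel_size}, p={padding}"

args = [1, 64]                              # positional arguments in a list
kwargs = {"kernel_size": 3, "padding": 1}   # keyword arguments in a dict
# Equivalent to conv_summary(1, 64, kernel_size=3, padding=1)
summary = conv_summary(*args, **kwargs)
print(summary)  # Conv 1->64, k=3, p=1

def count_layers(*layers):
    """Hypothetical stand-in for a function that takes any number of positional arguments."""
    return len(layers)

blk = ["conv", "relu", "conv", "relu", "pool"]
print(count_layers(*blk))  # 5: every list element becomes a separate argument
print(count_layers(blk))   # 1: without *, the whole list is a single argument
```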

In [2]:
def vgg_block(num_convs, in_channels, out_channels):
    blk = []
    for i in range(num_convs):
        # Only the first convolution in each VGG block changes the channel count
        if i == 0:
            blk.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        else:
            blk.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))
        blk.append(nn.ReLU())
        
    blk.append(nn.MaxPool2d(2, 2))
    
    return nn.Sequential(*blk)

Below we implement VGG-11 (8 convolutional layers plus 3 fully connected layers). First, define some parameters:

In [3]:
conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))

# 5 max-pooling layers halve the size 5 times: 224 / 2**5 = 7
fc_features = 7 * 7 * 512
fc_hidden_units = 4096
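
The `7` in `fc_features` can be checked with the usual output-size formula `(size + 2*padding - kernel_size) // stride + 1`. A pure-Python sketch (helper names are illustrative): each 3×3 convolution with padding 1 keeps the size, and each 2×2 max pooling with stride 2 halves it.

```python
def conv_out(size, kernel_size=3, padding=1, stride=1):
    """Output side length of a convolution (defaults match vgg_block)."""
    return (size + 2 * padding - kernel_size) // stride + 1

def pool_out(size, kernel_size=2, stride=2):
    """Output side length of unpadded max pooling (defaults match nn.MaxPool2d(2, 2))."""
    return (size - kernel_size) // stride + 1

conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))
size = 224
for num_convs, _, out_channels in conv_arch:
    for _ in range(num_convs):
        size = conv_out(size)  # 3x3 conv with padding 1: size unchanged
    size = pool_out(size)      # 2x2 pooling with stride 2: size halved
    print(f"{out_channels} channels, {size}x{size}")
print(size * size * conv_arch[-1][2])  # 25088, i.e. fc_features
```

This is also why the training cell resizes the 28×28 Fashion-MNIST images to 224: after five halvings a 28×28 input would shrink below 1×1.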

In [4]:
def vgg(conv_arch, fc_features, fc_hidden_units=4096):
    net = nn.Sequential()
    
    # Convolutional part
    for i, (num_convs, in_channels, out_channels) in enumerate(conv_arch):
        net.add_module("vgg_block_" + str(i + 1), vgg_block(num_convs, in_channels, out_channels))
    
    # Fully connected part
    net.add_module("fc", nn.Sequential(
                            d2l.FlattenLayer(),
                            nn.Linear(fc_features, fc_hidden_units), 
                            nn.ReLU(),
                            nn.Dropout(0.5),
                            nn.Linear(fc_hidden_units, fc_hidden_units),
                            nn.ReLU(),
                            nn.Dropout(0.5),
                            nn.Linear(fc_hidden_units, 10)
                            ))
    
    return net
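
To see where the parameters of this VGG-11 live, the sketch below counts them in pure Python from `conv_arch` and the FC sizes (`count_params` is a hypothetical helper; each layer contributes its weights plus biases). For the full-size network built above, the total should agree with `sum(p.numel() for p in net.parameters())`.

```python
def count_params(conv_arch, fc_features, fc_hidden_units=4096, num_classes=10):
    """Hypothetical helper: parameter counts (weights + biases) of the conv and FC parts."""
    conv = 0
    for num_convs, in_channels, out_channels in conv_arch:
        for i in range(num_convs):
            c_in = in_channels if i == 0 else out_channels  # same rule as vgg_block
            conv += c_in * out_channels * 3 * 3 + out_channels  # 3x3 kernel + bias
    fc = ((fc_features * fc_hidden_units + fc_hidden_units)
          + (fc_hidden_units * fc_hidden_units + fc_hidden_units)
          + (fc_hidden_units * num_classes + num_classes))
    return conv, fc

conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))
conv, fc = count_params(conv_arch, 7 * 7 * 512, 4096)
print(conv, fc, conv + fc)  # 9219328 119586826 128806154
print(f"FC share: {fc / (conv + fc):.1%}")
```

Roughly 93% of the parameters sit in the fully connected layers, most of them in the first `Linear(25088, 4096)`; the stacked 3×3 convolutions are comparatively cheap.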

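Construct a single-channel sample with height and width 224 and pass it through the network to observe the output shape of each block.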

In [5]:
net = vgg(conv_arch, fc_features, fc_hidden_units)

X = torch.rand(1, 1, 224, 224)  # the first dimension is the batch size

for name, blk in net.named_children():
    X = blk(X)
    print(name, "output shape:", X.shape)

vgg_block_1 output shape: torch.Size([1, 64, 112, 112])
vgg_block_2 output shape: torch.Size([1, 128, 56, 56])
vgg_block_3 output shape: torch.Size([1, 256, 28, 28])
vgg_block_4 output shape: torch.Size([1, 512, 14, 14])
vgg_block_5 output shape: torch.Size([1, 512, 7, 7])
fc output shape: torch.Size([1, 10])


Build a VGG with fewer channels (every channel count divided by `ratio`) so that it is quicker to test

In [6]:
ratio = 8
small_conv_arch = [(1, 1, 64//ratio), (1, 64//ratio, 128//ratio), (2, 128//ratio, 256//ratio), 
                   (2, 256//ratio, 512//ratio), (2, 512//ratio, 512//ratio)]
net = vgg(small_conv_arch, fc_features // ratio, fc_hidden_units // ratio)
print(net)

Sequential(
  (vgg_block_1): Sequential(
    (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block_2): Sequential(
    (0): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block_3): Sequential(
    (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block_4): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1
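
The `fc_features // ratio` shortcut works only because `ratio` divides the last block's 512 channels evenly: the first FC layer needs `7 * 7 * (512 // ratio)` inputs. The pure-Python check below (`shrink_arch` and `flatten_size` are hypothetical helpers) rebuilds `small_conv_arch`, confirms that consecutive blocks' channel counts chain together, and shows that a ratio such as 3 would give a mismatched `Linear` input size.

```python
def shrink_arch(conv_arch, ratio):
    """Hypothetical helper: divide every channel count by ratio, keeping the 1-channel input."""
    return [(n, c_in if c_in == 1 else c_in // ratio, c_out // ratio)
            for n, c_in, c_out in conv_arch]

def flatten_size(arch, spatial=7):
    """Features entering the first FC layer: spatial * spatial * last out_channels."""
    return spatial * spatial * arch[-1][2]

conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))
fc_features = 7 * 7 * 512

for ratio in (8, 3):
    small = shrink_arch(conv_arch, ratio)
    # Each block's in_channels must equal the previous block's out_channels
    chained = all(prev[2] == cur[1] for prev, cur in zip(small, small[1:]))
    print(ratio, chained, flatten_size(small), fc_features // ratio)
# 8 True 3136 3136  -> shapes match
# 3 True 8330 8362  -> nn.Linear would receive the wrong number of features
```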

Train the network

In [7]:
batch_size = 64

train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)

lr, num_epochs = 0.001, 5
optimizer = optim.Adam(net.parameters(), lr=lr)

# d2l.train_ch5 (from d2lzh_pytorch) already sets the loss to nn.CrossEntropyLoss() internally
d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)

training on  cuda
epoch 1, loss 0.5639, train acc 0.793, test acc 0.873, time 98.4 sec
epoch 2, loss 0.3174, train acc 0.886, test acc 0.891, time 95.4 sec
epoch 3, loss 0.2726, train acc 0.903, test acc 0.911, time 96.3 sec
epoch 4, loss 0.2390, train acc 0.914, test acc 0.917, time 100.5 sec
epoch 5, loss 0.2172, train acc 0.922, test acc 0.915, time 99.2 sec
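
`d2l.train_ch5` hides the training loop. As a rough sketch of what such a loop involves (not the actual d2lzh_pytorch source; `train_one_epoch` and `evaluate` are hypothetical names), one epoch with `nn.CrossEntropyLoss` and an optimizer could look like the code below. Note the `net.train()` / `net.eval()` switch, which matters here because the FC part uses `Dropout`. The usage at the end runs on tiny synthetic data with a linear model instead of VGG, so it finishes in seconds.

```python
import torch
from torch import nn, optim

def train_one_epoch(net, data_iter, optimizer, device):
    """Hypothetical helper: one pass over data_iter; returns (mean batch loss, train accuracy)."""
    loss_fn = nn.CrossEntropyLoss()
    net.train()  # Dropout active during training
    total_loss, correct, count, num_batches = 0.0, 0, 0, 0
    for X, y in data_iter:
        X, y = X.to(device), y.to(device)
        y_hat = net(X)
        loss = loss_fn(y_hat, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        correct += (y_hat.argmax(dim=1) == y).sum().item()
        count += y.shape[0]
        num_batches += 1
    return total_loss / num_batches, correct / count

@torch.no_grad()
def evaluate(net, data_iter, device):
    """Hypothetical helper: accuracy with Dropout disabled and no gradient tracking."""
    net.eval()
    correct, count = 0, 0
    for X, y in data_iter:
        X, y = X.to(device), y.to(device)
        correct += (net(X).argmax(dim=1) == y).sum().item()
        count += y.shape[0]
    return correct / count

# Usage on synthetic data (an assumption for illustration, not Fashion-MNIST)
torch.manual_seed(0)
device = torch.device("cpu")
features = torch.randn(256, 20)
labels = (features[:, 0] > 0).long()  # class depends on the sign of the first feature
data = [(features[i:i + 32], labels[i:i + 32]) for i in range(0, 256, 32)]
net = nn.Sequential(nn.Linear(20, 2)).to(device)
optimizer = optim.Adam(net.parameters(), lr=0.05)
losses = []
for epoch in range(5):
    loss, train_acc = train_one_epoch(net, data, optimizer, device)
    losses.append(loss)
    print(f"epoch {epoch + 1}, loss {loss:.4f}, train acc {train_acc:.3f}")
test_acc = evaluate(net, data, device)
```

In the notebook, `train_iter`, `test_iter` and the small VGG `net` would take the place of the synthetic data and the linear model.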
