## **VGG Networks - Networks Using Repeated Elements**


**Reference:** [1] Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.


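VGG follows a simple pattern: several identical convolutional layers with padding 1 and a $3\times 3$ window are applied in a row, followed by one max-pooling layer with stride 2 and a $2\times 2$ window. The convolutional layers leave the height and width of the input unchanged, while the pooling layer halves them.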

In [1]:
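import d2lzh as d2l
from mxnet import gluon, init, nd
from mxnet.gluon import nn


def vgg_block(num_convs, num_channels):
    """Build one VGG block: num_convs 3x3 convolutions, then a 2x2 max pool."""
    blk = nn.Sequential()
    for _ in range(num_convs):
        # A 3x3 kernel with padding 1 keeps the height and width unchanged
        blk.add(nn.Conv2D(num_channels, kernel_size=3, padding=1,
                          activation='relu'))
    # A 2x2 max pool with stride 2 halves the height and width
    blk.add(nn.MaxPool2D(pool_size=2, strides=2))
    return blk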
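
The shape behaviour of a VGG block follows from the standard output-size formula for convolution and pooling, $\lfloor (n + 2p - k)/s \rfloor + 1$, where $n$ is the input size, $k$ the window size, $p$ the padding and $s$ the stride. The following is a minimal sketch in plain Python (no MXNet needed); the helper name `out_size` is hypothetical and introduced only for illustration.

```python
def out_size(n, kernel, padding=0, stride=1):
    """Output height/width of a conv or pooling layer along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


n = 224
after_conv = out_size(n, kernel=3, padding=1, stride=1)  # 3x3 conv, padding 1
after_pool = out_size(after_conv, kernel=2, stride=2)    # 2x2 max pool, stride 2
print(after_conv, after_pool)  # prints: 224 112
```

The convolution leaves 224 unchanged and the pooling layer halves it to 112, which is why stacking more convolutions inside a block never changes the block's output resolution.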



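Like AlexNet and LeNet, a VGG network consists of a convolutional module followed by a fully connected module. The convolutional module chains several vgg_block instances, whose hyperparameters are defined by the variable conv_arch. For each VGG block, this variable specifies the number of convolutional layers and the number of output channels. The fully connected module is the same as in AlexNet.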

In [2]:
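# Build a VGG network with 5 convolutional blocks: the first 2 blocks use a single
# convolutional layer each and the last 3 blocks use two. The first block has 64
# output channels, and the channel count doubles from block to block until it reaches 512.
# Because the network has 8 convolutional layers and 3 fully connected layers,
# it is commonly called VGG-11.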
conv_arch = ((1,64),(1,128),(2,256),(2,512),(2,512))
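
The name VGG-11 can be checked by counting the layers that carry weights. This is a small sketch in plain Python, assuming the three fully connected layers mentioned in the comment above; `num_fc_layers` is a name introduced here for illustration only.

```python
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

# Each pair is (number of conv layers, output channels) for one VGG block
num_conv_layers = sum(num_convs for num_convs, _ in conv_arch)
num_fc_layers = 3  # assumption: two hidden dense layers plus the output layer
print(num_conv_layers, num_conv_layers + num_fc_layers)  # prints: 8 11
```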

In [3]:
# VGG-11 implementation

def vgg(conv_arch):
    net = nn.Sequential()

    # Convolutional part
    for (num_convs, num_channels) in conv_arch:
        net.add(vgg_block(num_convs, num_channels))
    # Fully connected part
    net.add(nn.Dense(4096, activation='relu'), nn.Dropout(0.5),
            nn.Dense(4096, activation='relu'), nn.Dropout(0.5),
            nn.Dense(10))
    return net

net = vgg(conv_arch)

In [4]:
# Build a single-channel input with height and width 224 to inspect the output shape of each layer
net.initialize()
X = nd.random.uniform(shape=(1, 1, 224, 224))

for blk in net:
    X = blk(X)
    print(blk.name, "output shape:\t", X.shape)

sequential1 output shape:	 (1, 64, 112, 112)
sequential2 output shape:	 (1, 128, 56, 56)
sequential3 output shape:	 (1, 256, 28, 28)
sequential4 output shape:	 (1, 512, 14, 14)
sequential5 output shape:	 (1, 512, 7, 7)
dense0 output shape:	 (1, 4096)
dropout0 output shape:	 (1, 4096)
dense1 output shape:	 (1, 4096)
dropout1 output shape:	 (1, 4096)
dense2 output shape:	 (1, 10)
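
The block output shapes printed above can be reproduced by hand: each block sets the channel count to its num_channels and halves the height and width. Below is a minimal sketch in plain Python; `trace_shapes` is a hypothetical helper and not part of d2lzh or MXNet.

```python
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))


def trace_shapes(conv_arch, batch=1, size=224):
    """Return the output shape of each VGG block for a square input."""
    shapes = []
    for _, num_channels in conv_arch:
        size //= 2  # the convolutions keep the size; the pooling layer halves it
        shapes.append((batch, num_channels, size, size))
    return shapes


shapes = trace_shapes(conv_arch)
for shape in shapes:
    print(shape)

# Number of features the first dense layer receives once the output is flattened
_, channels, height, width = shapes[-1]
print(channels * height * width)  # prints: 25088
```

After five halvings the 224-pixel input shrinks to 7, so the first fully connected layer sees 512 × 7 × 7 = 25088 input features.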


In [6]:
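# Train the network. To keep the computation manageable, first build a narrower
# network by dividing the number of output channels of every block by ratio.
ratio = 4
small_conv_arch = [(pair[0], pair[1] // ratio) for pair in conv_arch]
print(small_conv_arch)

net = vgg(small_conv_arch)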

[(1, 16), (1, 32), (2, 64), (2, 128), (2, 128)]
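
To get a feel for how much the ratio shrinks the model, the parameters of the convolutional layers can be counted for both architectures. This is a rough sketch in plain Python, assuming a single input channel (as in the shape check earlier), $3\times 3$ kernels and one bias per output channel; `conv_params` and `small_arch` are hypothetical names used only here.

```python
def conv_params(conv_arch, in_channels=1, kernel_size=3):
    """Count weights and biases of all conv layers in a VGG-style network."""
    total = 0
    for num_convs, num_channels in conv_arch:
        for _ in range(num_convs):
            # weights: kernel_size x kernel_size x in_channels x out_channels
            total += kernel_size * kernel_size * in_channels * num_channels
            total += num_channels  # one bias per output channel
            in_channels = num_channels
    return total


conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
ratio = 4
small_arch = [(n, c // ratio) for n, c in conv_arch]

full, small = conv_params(conv_arch), conv_params(small_arch)
print(full, small, round(full / small, 1))
```

Because the weight count of a convolution scales with the product of its input and output channels, dividing every channel count by 4 shrinks the convolutional part by roughly a factor of 16.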


In [None]:
lr, num_epochs, batch_size, ctx = 0.05, 5, 128, d2l.try_gpu()

# Initialize the parameters with Xavier initialization on the chosen device
net.initialize(ctx=ctx, init=init.Xavier())
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': lr})

# resize=224 scales the Fashion-MNIST images up to the 224x224 input size used above
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)
d2l.train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx, num_epochs)