## VGG: Very Deep Networks Using Repeated Elements
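AlexNet showed how quickly the number of layers in a network grows. Stacking the layers one by one by hand is tedious even with a high-level library such as Gluon or TF-slim, let alone when implementing everything from scratch. Fortunately, programming languages offer a good remedy for this: functions and loops. If a network contains a lot of repeated structure, we can construct it very compactly. The first deep network to use this kind of structure was VGG.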

### VGG Architecture
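A key idea in VGG is to use several convolutional layers with relatively small kernels ($3\times 3$), followed by a pooling layer, and then to repeat this module many times. We first define such a block: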

In [1]:
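import tensorflow as tf  # TensorFlow 1.x: tf.contrib.slim is not available in TF 2.x
slim = tf.contrib.slim

def vgg_block(inputs, num_convs, num_channels, scope='vgg_block'):
    """num_convs 3x3 conv layers (ReLU, num_channels outputs each), then 2x2 max pooling with stride 2."""
    net = inputs
    with tf.variable_scope(scope):
        for i in range(num_convs):
            # Pad one pixel on each side of H and W so the 'VALID' 3x3 convolution keeps the spatial size
            net = tf.pad(net, [[0, 0], [1, 1], [1, 1], [0, 0]])
            net = slim.conv2d(net, num_channels, [3, 3], padding='VALID', scope='conv' + str(i))
        # 2x2 max pooling with stride 2 halves H and W
        net = slim.max_pool2d(net, [2, 2], 2, scope='max_pool')
    return net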
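Why pad by one pixel before each `'VALID'` $3\times 3$ convolution? The usual output-size formula for a window of size $k$ with padding $p$ and stride $s$ on an input of size $n$ is $\lfloor (n + 2p - k)/s \rfloor + 1$. The helper below is a hypothetical plain-Python sketch (not part of TF-slim) that evaluates this formula for the two operations used in `vgg_block`.

```python
def output_size(n, kernel, pad=0, stride=1):
    """Spatial output size of a conv/pooling window applied with 'VALID' after explicit padding."""
    return (n + 2 * pad - kernel) // stride + 1

# A 3x3 convolution after padding 1 pixel per side keeps the size unchanged.
print(output_size(96, kernel=3, pad=1))     # 96
# 2x2 max pooling with stride 2 halves the size, rounding down.
print(output_size(96, kernel=2, stride=2))  # 48
print(output_size(7, kernel=2, stride=2))   # 3
```

So inside a block only the number of channels changes, and the pooling layer at the end is the only place where height and width shrink.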

Let's instantiate such a block with two convolutional layers, each with 128 output channels:
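Building the block in TensorFlow requires a graph and a session, but its effect on tensor shapes can also be worked out by hand. The sketch below assumes an NHWC input of shape `(2, 16, 16, 3)` (two $16\times 16$ RGB images, chosen only for illustration); `vgg_block_shape` is a hypothetical helper that mirrors the layer sequence of `vgg_block`.

```python
def output_size(n, kernel, pad=0, stride=1):
    return (n + 2 * pad - kernel) // stride + 1

def vgg_block_shape(shape, num_convs, num_channels):
    """Shape (N, H, W, C) after a VGG block: num_convs padded 3x3 convs, then 2x2/2 max pooling."""
    n, h, w, c = shape
    for _ in range(num_convs):
        h, w = output_size(h, 3, pad=1), output_size(w, 3, pad=1)  # spatial size preserved
        c = num_channels                                            # each conv outputs num_channels
    h, w = output_size(h, 2, stride=2), output_size(w, 2, stride=2)  # pooling halves H and W
    return (n, h, w, c)

print(vgg_block_shape((2, 16, 16, 3), num_convs=2, num_channels=128))  # (2, 8, 8, 128)
```

The batch size is untouched, the channel count becomes 128, and height and width are halved by the pooling layer.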



Next, we define how to stack these blocks:



In [2]:
def vgg_stack(inputs, architecture):
    """Chain one vgg_block per (num_convs, num_channels) entry of architecture."""
    net = inputs
    for i, (num_convs, num_channels) in enumerate(architecture):
        net = vgg_block(net, num_convs, num_channels, scope='vgg_block_' + str(i))
    return net
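Because every block halves height and width, five blocks shrink the input by a factor of $2^5 = 32$. The plain-Python sketch below traces the feature-map shape through the five-block `architecture` defined in the next cell, assuming the $96\times 96$ single-channel input used for training later; `stack_shapes` is a hypothetical helper, not part of the notebook.

```python
architecture = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

def stack_shapes(height, width, channels, architecture):
    """Return the (H, W, C) feature-map shape after each VGG block."""
    shapes = []
    for num_convs, num_channels in architecture:
        if num_convs > 0:
            channels = num_channels  # the block's convs output num_channels channels
        # padded 3x3 convs keep H and W; 2x2 stride-2 max pooling floors them to H // 2, W // 2
        height, width = height // 2, width // 2
        shapes.append((height, width, channels))
    return shapes

shapes = stack_shapes(96, 96, 1, architecture)
for i, shape in enumerate(shapes):
    print('vgg_block_%d' % i, shape)
h, w, c = shapes[-1]
print('flattened features:', h * w * c)  # 3 * 3 * 512 = 4608
```

Since 96 is a multiple of 32, no rounding happens and the last block outputs a $3\times 3\times 512$ map, which `slim.flatten` turns into a 4608-dimensional vector.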


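Here we define the simplest VGG configuration. It has 8 convolutional layers and, as in AlexNet, 3 fully connected layers, which is why this network is also called VGG-11.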

In [3]:
num_outputs = 10
architecture = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))  # (num_convs, num_channels) per block

def vggnet(inputs, is_training):
    with tf.variable_scope('vgg'):
        net = vgg_stack(inputs, architecture)  # 8 conv layers in 5 blocks
        net = slim.flatten(net)
        # Two 4096-unit fully connected layers with dropout, as in AlexNet
        net = slim.fully_connected(net, 4096, scope='fc1')
        net = slim.dropout(net, keep_prob=0.5, is_training=is_training)
        net = slim.fully_connected(net, 4096, scope='fc2')
        net = slim.dropout(net, keep_prob=0.5, is_training=is_training)
        # Output layer returns raw logits; softmax is applied inside the loss
        return slim.fully_connected(net, num_outputs, activation_fn=None, scope='fc3')
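To make the "11" concrete, the sketch below counts the layers and trainable parameters of this model. It assumes a $96\times 96\times 1$ input (as in the training cell) and that every conv and fully connected layer has a bias, which is TF-slim's default when no `normalizer_fn` is set; `vgg_param_count` is a hypothetical helper written only for this illustration.

```python
architecture = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

def vgg_param_count(architecture, in_channels=1, input_size=96,
                    fc_units=(4096, 4096), num_outputs=10):
    """Return (number of layers, conv parameters, fully connected parameters)."""
    num_conv, conv_params = 0, 0
    channels, size = in_channels, input_size
    for num_convs, num_channels in architecture:
        for _ in range(num_convs):
            conv_params += 3 * 3 * channels * num_channels + num_channels  # weights + biases
            channels = num_channels
            num_conv += 1
        size //= 2  # 2x2 stride-2 max pooling
    fc_params, fan_in = 0, size * size * channels  # size of the flattened feature vector
    for units in list(fc_units) + [num_outputs]:
        fc_params += fan_in * units + units  # weights + biases
        fan_in = units
    num_layers = num_conv + len(fc_units) + 1  # dropout and pooling have no weights
    return num_layers, conv_params, fc_params

print(vgg_param_count(architecture))  # (11, 9219328, 35700746)
```

Under these assumptions the model has about 44.9 million parameters, roughly four fifths of them in the three fully connected layers, even though the convolutional layers do most of the computation.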
        

### Model Training
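The training code is the same as for AlexNet, except that we upscale the images only to $96\times 96$ to save some computation, and use a somewhat larger learning rate by default.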
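How much does the smaller resolution save? The work of a convolutional layer is proportional to the number of output positions $H\times W$. Taking $224\times 224$, the input size of the original ImageNet VGG models, as a point of comparison (an assumption, not a setting used in this notebook), and noting that both 224 and 96 are multiples of 32 so every feature map shrinks by the same factor, a one-line sketch gives the ratio for the convolutional layers:

```python
def spatial_cost_ratio(large, small):
    """Ratio of per-layer convolution work between two square input sizes (work scales with H * W)."""
    return (large * large) / (small * small)

print(round(spatial_cost_ratio(224, 96), 2))  # 5.44
```

That is, each convolutional layer does roughly 5.4 times less work at $96\times 96$ than at $224\times 224$.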



In [4]:
import sys
import numpy as np
from tensorflow.contrib.learn.python.learn.datasets.mnist import DataSet

sys.path.append('../../utils')
import utils  # helper module shipped with this repository

data_dir = '../../data/fashion_mnist'
train_images, train_labels, test_images, test_labels = utils.load_data_fashion_mnist(data_dir, one_hot=True)
print(train_images.shape)
print(test_images.shape)

train_dataset = DataSet(train_images, train_labels, one_hot=True)  # shuffled mini-batches; reshape=True flattens images
test_dataset = DataSet(test_images, test_labels, one_hot=True)

Extracting ../../data/fashion_mnist/train-images-idx3-ubyte.gz
Extracting ../../data/fashion_mnist/train-labels-idx1-ubyte.gz
Extracting ../../data/fashion_mnist/t10k-images-idx3-ubyte.gz
Extracting ../../data/fashion_mnist/t10k-labels-idx1-ubyte.gz
(60000, 28, 28, 1)
(10000, 28, 28, 1)


In [5]:
learning_rate = 1e-1
max_steps = 1000
batch_size = 64
height = width = 28
num_channels = 1
num_outputs = 10

tf.reset_default_graph()

input_placeholder = tf.placeholder(tf.float32, [None, height, width, num_channels])
resize_input = tf.image.resize_images(input_placeholder, [96, 96])  # upscale 28x28 -> 96x96
gt_placeholder = tf.placeholder(tf.int64, [None, num_outputs])  # one-hot labels
is_training = tf.placeholder(tf.bool)  # enables dropout during training only

logits = vggnet(resize_input, is_training)

loss = tf.losses.softmax_cross_entropy(onehot_labels=gt_placeholder, logits=logits)
acc = utils.accuracy(logits, gt_placeholder)

optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.minimize(loss)

sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

for step in range(max_steps):
    data, label = train_dataset.next_batch(batch_size)
    # DataSet returns flattened images; restore the NHWC shape expected by the network
    data = np.reshape(data, (batch_size, height, width, num_channels))
    feed_dict = {input_placeholder: data, gt_placeholder: label, is_training: True}
    loss_, acc_, _ = sess.run([loss, acc, train_op], feed_dict=feed_dict)
    print("Batch %d, Loss: %f, Train acc %f " % (step, loss_, acc_))

test_losses, test_accs = [], []  # evaluate on all 10,000 test images, 100 per batch
for _ in range(100):
    test_data, test_label = test_dataset.next_batch(100)
    test_data = np.reshape(test_data, (100, height, width, num_channels))
    feed_dict = {input_placeholder: test_data, gt_placeholder: test_label, is_training: False}
    test_loss_, test_acc_ = sess.run([loss, acc], feed_dict=feed_dict)
    test_losses.append(test_loss_)
    test_accs.append(test_acc_)
print("Test Loss: %f, Test acc %f " % (np.mean(test_losses), np.mean(test_accs)))


Batch 0, Loss: 2.302390, Train acc 0.109375 
Batch 1, Loss: 2.301980, Train acc 0.093750 
Batch 2, Loss: 2.303169, Train acc 0.156250 
Batch 3, Loss: 2.302202, Train acc 0.109375 
Batch 4, Loss: 2.301865, Train acc 0.109375 
Batch 5, Loss: 2.302718, Train acc 0.109375 
Batch 6, Loss: 2.300086, Train acc 0.125000 
Batch 7, Loss: 2.297577, Train acc 0.140625 
Batch 8, Loss: 2.303068, Train acc 0.109375 
Batch 9, Loss: 2.304276, Train acc 0.046875 
Batch 10, Loss: 2.306055, Train acc 0.062500 
Batch 11, Loss: 2.295269, Train acc 0.140625 
Batch 12, Loss: 2.298706, Train acc 0.109375 
Batch 13, Loss: 2.301336, Train acc 0.062500 
Batch 14, Loss: 2.298002, Train acc 0.125000 
Batch 15, Loss: 2.297425, Train acc ...
...
Batch 166, Loss: 1.251129, Train acc 0.640625 
Batch 167, Loss: 1.287834, Train acc 0.531250 
Batch 168, Loss: 0.928725, Train acc 0.609375 
Batch 169, Loss: 0.983610, Train acc 0.609375 
Batch 170, Loss: 1.074163, Train acc 0.562500 
Batch 171, Loss: 1.043530, Train acc 0.656250 
Batch 172, Loss: 1.174133, Train acc 0.515625 
Batch 173, Loss: 0.794117, Train acc 0.671875 
Batch 174, Loss: 1.143815, Train acc 0.484375 
Batch 175, Loss: 1.761341, Train acc 0.328125 
Batch 176, Loss: 1.727128, Train acc 0.562500 
Batch 177, Loss: 1.318952, Train acc 0.562500 
Batch 178, Loss: 1.116539, Train acc 0.546875 
Batch 179, Loss: 1.156007, Train acc 0.515625 
Batch 180, Loss: 1.084418, Train acc 0.546875 
Batch 181, Loss: 1.024498, Train acc 0.484375 
Batch 182, Loss: 1.299186, Train acc 0.484375 
Batch 183, Loss: 1.313792, Train acc 0.437500 
Batch 184, Loss: 1.024073, Train acc 0.625000 
Batch 185, Loss: 1.170929, Train acc 0.578125 
Batch 186, Loss: 1.101246, Train acc 0.609375 
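The first batches in the log report a loss of about 2.30 and a training accuracy near 0.1. Both are what an untrained 10-class classifier should produce: if all logits are (nearly) equal, softmax assigns each class probability $1/10$, so the cross-entropy is $-\ln(1/10) = \ln 10 \approx 2.3026$ and accuracy is at chance level. The plain-Python sketch below computes softmax cross-entropy with all-zero logits standing in for an untrained network (an idealization); `softmax_cross_entropy` here is a hypothetical helper, not the TensorFlow function.

```python
import math

def softmax_cross_entropy(logits, onehot):
    """Cross-entropy between softmax(logits) and a one-hot label, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # -sum_k y_k * log softmax(z)_k  =  -sum_k y_k * (z_k - logsumexp(z))
    return -sum(y * (z - log_sum_exp) for z, y in zip(logits, onehot))

label = [0] * 10
label[3] = 1  # any class gives the same result when the logits are uniform
print(softmax_cross_entropy([0.0] * 10, label))  # 2.302585092994046, i.e. ln 10
```

A loss that stays near 2.30 for many steps therefore means the network has not yet started to separate the classes; in the log above it begins to drop well below that value by around batch 170.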