## Network in Network
An important piece of work that followed AlexNet is Network in Network (NiN). It introduced two ideas that influenced later network designs.

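The first observation is that a convolutional neural network usually has two parts: one made up mainly of convolutional layers, and one made up mainly of fully connected layers. With AlexNet we saw how to deepen and widen the convolutional part and the fully connected part separately to obtain a deep network. Another natural idea is to chain together several blocks, each pairing convolutional layers with fully connected layers, to build a deep network.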

![image.png](http://zh.gluon.ai/_images/nin.svg)
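The difficulty is that the input and output of a convolution are 4D tensors, whereas a fully connected layer works on 2D matrices. As noted in the discussion of convolutional neural networks, flattening a 4D tensor to 2D before a fully connected layer also gives that layer a very large number of parameters. NiN solves both problems by applying the fully connected layer only along the channel dimension and sharing its weights across all pixels. In other words, it uses a convolution with a $1×1$ kernel.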
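The equivalence between a $1×1$ convolution and a fully connected layer shared across pixels can be checked numerically. The following is a minimal NumPy sketch, not part of the original notebook: the helper names `conv1x1` and `per_pixel_dense` are hypothetical, the layout is assumed to be batch x height x width x channels, and biases and activations are left out. The sizes in the parameter count are illustrative only.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution on an NHWC tensor: mix the channels at every pixel."""
    # x: (batch, height, width, c_in), w: (c_in, c_out)
    return np.einsum('nhwc,co->nhwo', x, w)

def per_pixel_dense(x, w):
    """The same computation written as one dense layer shared by all pixels."""
    n, h, width, c_in = x.shape
    flat = x.reshape(n * h * width, c_in)      # every pixel becomes one row
    return (flat @ w).reshape(n, h, width, w.shape[1])

rng = np.random.RandomState(0)
x = rng.uniform(size=(2, 4, 4, 3))
w = rng.uniform(size=(3, 5))
same = np.allclose(conv1x1(x, w), per_pixel_dense(x, w))
print(same)  # True

# Weight counts (biases ignored) for an illustrative 6 x 6 x 64 feature map
# mapped to 64 outputs.
flatten_dense_params = 6 * 6 * 64 * 64   # dense layer on the flattened map
conv1x1_params = 64 * 64                 # 1x1 convolution, shared across pixels
print(flatten_dense_params, conv1x1_params)  # 147456 4096
```

Because the weights are shared across pixels, the parameter count of the $1×1$ convolution does not grow with the height and width of the feature map.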

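The code below defines such a block: a regular convolutional layer followed by two convolutional layers with $1×1$ kernels. The last two play the role of two fully connected layers.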



In [1]:
import tensorflow as tf
slim = tf.contrib.slim

def mlpconv(inputs, channels, kernel_size, padding, scope, strides=1, max_pooling=True):
    # Pad height and width explicitly, then convolve with VALID padding so that
    # `padding` alone controls the border (slim.conv2d defaults to SAME).
    inputs = tf.pad(inputs, [[0, 0], [padding, padding], [padding, padding], [0, 0]])
    conv1 = slim.conv2d(inputs, channels, [kernel_size, kernel_size], strides,
                        padding='VALID', scope=scope+'_conv1')
    # The two 1x1 convolutions act as per-pixel fully connected layers,
    # so they always use stride 1.
    conv2 = slim.conv2d(conv1, channels, [1, 1], 1, scope=scope+'_conv2')
    output = slim.conv2d(conv2, channels, [1, 1], 1, scope=scope+'_conv3')
    if max_pooling:
        output = slim.max_pool2d(output, [3, 3], 2, scope=scope+'_max_pool')
    return output


In [2]:
import numpy as np

y = np.random.uniform(size=(32, 16, 16, 3)).astype(np.float32)
print(mlpconv(y, 64, 3, 0, 'haha'))


Tensor("haha_max_pool/MaxPool:0", shape=(32, 6, 6, 64), dtype=float32)


NiN's convolutional layers use parameters similar to AlexNet's, with three different settings:

- kernel: $11×11$, channels: 96
- kernel: $5×5$, channels: 256
- kernel: $3×3$, channels: 384

Besides using $1×1$ convolutions, NiN does not end with fully connected layers. Instead, it uses an mlpconv block whose number of channels equals the number of output classes, followed by an average pooling layer that averages the values in each channel into a single scalar.



In [3]:
def ninnet(inputs, is_training):
    with tf.variable_scope('nin'):
        mlpconv1 = mlpconv(inputs, 96, 11, 0, 'mlpconv1', strides=4)
        mlpconv2 = mlpconv(mlpconv1, 256, 5, 2, 'mlpconv2')
        mlpconv3 = mlpconv(mlpconv2, 384, 3, 1, 'mlpconv3')
        dp = slim.dropout(mlpconv3, keep_prob=0.5, is_training=is_training)
        # 10 target classes
        mlpconv4 = mlpconv(dp, 10, 3, 1, 'mlpconv4', max_pooling=False)
        # For a 224 x 224 input, mlpconv4 is batch_size x 5 x 5 x 10.
        # Average pooling over the full 5 x 5 map gives batch_size x 1 x 1 x 10.
        avg_pool = slim.avg_pool2d(mlpconv4, [5, 5], scope='global_avg_pooling')
        return slim.flatten(avg_pool)
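The spatial sizes inside the network can be traced by hand, which is a useful check on the pooling window used at the end. The sketch below is a NumPy-only illustration under stated assumptions: a 224 x 224 input, convolutions that use only the explicit padding passed to the block (no implicit SAME padding), stride 1 for the $1×1$ convolutions, and unpadded 3 x 3 max pooling with stride 2. The helpers `conv_out` and `mlpconv_out` are hypothetical names, not part of the notebook.

```python
import numpy as np

def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution or pooling window with explicit padding."""
    return (size + 2 * padding - kernel) // stride + 1

def mlpconv_out(size, kernel, padding, stride=1, max_pooling=True):
    """Spatial size after one block; the 1x1 convolutions leave it unchanged."""
    size = conv_out(size, kernel, stride, padding)
    if max_pooling:
        size = conv_out(size, 3, 2)   # 3x3 max pooling, stride 2, no padding
    return size

sizes = [224]
sizes.append(mlpconv_out(sizes[-1], 11, 0, stride=4))
sizes.append(mlpconv_out(sizes[-1], 5, 2))
sizes.append(mlpconv_out(sizes[-1], 3, 1))
sizes.append(mlpconv_out(sizes[-1], 3, 1, max_pooling=False))
print(sizes)  # [224, 26, 12, 5, 5]

# Global average pooling: one scalar per channel, so a map with 10 channels
# turns directly into 10 class scores without any extra weights.
rng = np.random.RandomState(0)
feature_maps = rng.uniform(size=(8, sizes[-1], sizes[-1], 10))
scores = feature_maps.mean(axis=(1, 2))
print(scores.shape)  # (8, 10)
```

Under these assumptions the last block produces a 5 x 5 map per class, so the final average pooling window has to cover 5 x 5 to reduce each channel to a single value.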

### Loading the data and training
The setup is similar to the one used for AlexNet, but with a larger learning rate.

In [4]:
import sys
import numpy as np

sys.path.append('../../utils')
import utils

data_dir = '../../data/fashion_mnist'
train_images, train_labels, test_images, test_labels = utils.load_data_fashion_mnist(data_dir, one_hot=True)
print(train_images.shape)
print(test_images.shape)

from tensorflow.contrib.learn.python.learn.datasets.mnist import DataSet
train_dataset = DataSet(train_images, train_labels, one_hot=True)
test_dataset = DataSet(test_images, test_labels, one_hot=True)

Extracting ../../data/fashion_mnist/train-images-idx3-ubyte.gz
Extracting ../../data/fashion_mnist/train-labels-idx1-ubyte.gz
Extracting ../../data/fashion_mnist/t10k-images-idx3-ubyte.gz
Extracting ../../data/fashion_mnist/t10k-labels-idx1-ubyte.gz
(60000, 28, 28, 1)
(10000, 28, 28, 1)


In [5]:
learning_rate = 1e-1  # larger than the rate used for AlexNet
max_steps = 1000
batch_size = 128
height = width = 28
num_channels = 1
num_outputs = 10

tf.reset_default_graph()

input_placeholder = tf.placeholder(tf.float32, [None, height, width, num_channels])
# Upscale the 28 x 28 images to 224 x 224 before feeding the network.
resize_input = tf.image.resize_images(input_placeholder, [224, 224])
gt_placeholder = tf.placeholder(tf.int64, [None, num_outputs])
is_training = tf.placeholder(tf.bool)

logits = ninnet(resize_input, is_training)
loss = tf.losses.softmax_cross_entropy(logits=logits, onehot_labels=gt_placeholder)
acc = utils.accuracy(logits, gt_placeholder)

optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.minimize(loss)
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)

for step in range(max_steps):
    data, label = train_dataset.next_batch(batch_size)
    data = np.reshape(data, (batch_size, height, width, num_channels))
    feed_dict = {input_placeholder: data, gt_placeholder: label, is_training: True}
    loss_, acc_, _ = sess.run([loss, acc, train_op], feed_dict=feed_dict)
    print("Batch %d, Loss: %f, Train acc %f " % (step, loss_, acc_))
    # Every 100 steps, report loss and accuracy on one test batch.
    if step % 100 == 0 and step != 0:
        test_data, test_label = test_dataset.next_batch(100)
        test_data = np.reshape(test_data, (100, height, width, num_channels))
        test_loss_, test_acc_ = sess.run([loss, acc], feed_dict={input_placeholder: test_data, gt_placeholder: test_label, is_training: False})
        print("Test Loss: %f, Test acc %f " % (test_loss_, test_acc_))

# Final evaluation: 100 batches of 100 test images, averaged over all batches.
test_loss, test_acc = [], []
for i in range(100):
    test_data, test_label = test_dataset.next_batch(100)
    test_data = np.reshape(test_data, (100, height, width, num_channels))
    test_loss_, test_acc_ = sess.run([loss, acc], feed_dict={input_placeholder: test_data, gt_placeholder: test_label, is_training: False})
    test_loss.append(test_loss_)
    test_acc.append(test_acc_)
print("Test Loss: %f, Test acc %f " % (np.mean(test_loss), np.mean(test_acc)))
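The training cell relies on `tf.losses.softmax_cross_entropy` and on `utils.accuracy`, whose source is not shown in this section. As a rough guide to what those two quantities measure, here is a NumPy sketch. The functions `softmax_cross_entropy` and `accuracy_from_logits` are hypothetical stand-ins written for illustration, assuming one-hot labels and a plain mean over the batch; they are not the code in `utils` or in TensorFlow.

```python
import numpy as np

def softmax_cross_entropy(logits, onehot_labels):
    """Mean over the batch of -log softmax(logits)[true class]."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(onehot_labels * log_probs).sum(axis=1).mean())

def accuracy_from_logits(logits, onehot_labels):
    """Fraction of rows whose largest logit matches the one-hot label."""
    predicted = np.argmax(logits, axis=1)
    expected = np.argmax(onehot_labels, axis=1)
    return float(np.mean(predicted == expected))

logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 0.1, 1.5],
                   [0.9, 1.1, 0.0],
                   [0.3, 0.2, 0.1]])
labels = np.eye(3)[[0, 2, 0, 0]]     # one-hot rows for classes 0, 2, 0, 0

batch_acc = accuracy_from_logits(logits, labels)
batch_loss = softmax_cross_entropy(logits, labels)
print(batch_acc)  # 0.75
```

When a metric is reported for a whole test set, the per-batch values are collected and averaged, which gives the exact dataset mean when every batch has the same size.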


