# 深层卷积神经网络 - AlexNet

1.制约深度学习发展的两个因素有
* 数据
* 硬件
深度学习是指通过学习浅层特征层层递进，进而可以表示出高层语义，是一种层进式的结构。

2.**AlexNet**

* 与相对较小的LeNet相比，AlexNet包含8层变换，其中有五层卷积和两层全连接隐含层，以及一个输出层。

* 第一层中卷积核的大小是$11 \times 11$，接着第二层中的是$5 \times 5$，之后都是$3 \times 3$。此外，第一，第二和第五个卷积层之后都跟了有重叠的大小为$3 \times 3$，步距为$2 \times 2$的池化操作。

* 五层卷积层的输出通道数分别为\[96, 256, 384, 384, 256\]

* 紧接着卷积层，原版的AlexNet有每层大小为4096个节点的全连接层，这两个巨大的全连接层带来将近1GB的模型大小。

下面我们实现AlexNet。

我们将图片resize到$224 \times 224$，这是原论文中的图片尺寸，我们使用CIFAR10数据集。

另外科普一下$ImageNet$，包含1000个类别，100万张图片，每张图片大小为$256 \times 256$。

In [1]:
import mxnet as mx

from mxnet import nd
from mxnet import gluon
from mxnet import autograd

import utils

ctx = mx.gpu()

## 加载数据

In [2]:
batch_size = 64
train_data, test_data = utils.load_dataset(batch_size, resize=224, data_type='cifar10')

In [3]:
for data, _ in train_data:
    data = data.as_in_context(ctx)
    print(data.shape)
    break

(64, 3, 224, 224)


## 定义模型

**AlexNet**

* 与相对较小的LeNet相比，AlexNet包含8层变换，其中有五层卷积和两层全连接隐含层，以及一个输出层。

* 第一层中卷积核的大小是$11 \times 11$，接着第二层中的是$5 \times 5$，之后都是$3 \times 3$。此外，第一，第二和第五个卷积层之后都跟了有重叠的大小为$3 \times 3$，步距为$2 \times 2$的池化操作。

* 五层卷积层的输出通道数分别为\[96, 256, 384, 384, 256\]

* 紧接着卷积层，原版的AlexNet有每层大小为4096个节点的全连接层，这两个巨大的全连接层带来将近1GB的模型大小。


In [4]:
def get_net():
    net = gluon.nn.Sequential()
    with net.name_scope():
        # 1st Conv Layer
        net.add(gluon.nn.Conv2D(96, kernel_size=(11, 11), strides=(4,4), activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=(3,3), strides=(2,2)))
        # 2nd Conv Layer
        net.add(gluon.nn.Conv2D(256, kernel_size=(5,5), strides=(1,1), activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=(3,3), strides=(2,2)))
        # 3rd Conv Layer 
        net.add(gluon.nn.Conv2D(384, kernel_size=(3,3), strides=(1,1), activation='relu'))
        # 4th Conv Layer 
        net.add(gluon.nn.Conv2D(384, kernel_size=(3,3), strides=(1,1), activation='relu'))
        # 5th Conv Layer 
        net.add(gluon.nn.Conv2D(256, kernel_size=(3,3), strides=(1,1), activation='relu'))
        # 6th fc1 Layer 
        net.add(gluon.nn.Flatten()) # Flatten
        net.add(gluon.nn.Dense(4096, activation='relu'))
        # 7th fc2 Layer
        net.add(gluon.nn.Dense(4096, activation='relu'))
        # 8th output Layer 
        net.add(gluon.nn.Dense(10, activation='relu'))
    return net

In [5]:
net = get_net()
net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx, force_reinit=True)

In [6]:
net.collect_params()

sequential0_ (
  Parameter sequential0_conv0_weight (shape=(96, 0, 11, 11), dtype=<class 'numpy.float32'>)
  Parameter sequential0_conv0_bias (shape=(96,), dtype=<class 'numpy.float32'>)
  Parameter sequential0_conv1_weight (shape=(256, 0, 5, 5), dtype=<class 'numpy.float32'>)
  Parameter sequential0_conv1_bias (shape=(256,), dtype=<class 'numpy.float32'>)
  Parameter sequential0_conv2_weight (shape=(384, 0, 3, 3), dtype=<class 'numpy.float32'>)
  Parameter sequential0_conv2_bias (shape=(384,), dtype=<class 'numpy.float32'>)
  Parameter sequential0_conv3_weight (shape=(384, 0, 3, 3), dtype=<class 'numpy.float32'>)
  Parameter sequential0_conv3_bias (shape=(384,), dtype=<class 'numpy.float32'>)
  Parameter sequential0_conv4_weight (shape=(256, 0, 3, 3), dtype=<class 'numpy.float32'>)
  Parameter sequential0_conv4_bias (shape=(256,), dtype=<class 'numpy.float32'>)
  Parameter sequential0_dense0_weight (shape=(4096, 0), dtype=<class 'numpy.float32'>)
  Parameter sequential0_dense0_bias (s

In [7]:
net(data).shape

(64, 10)

## 定义优化器和损失函数

In [8]:
softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()

In [9]:
learning_rate = 0.1
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': learning_rate})

## 训练

In [10]:
from time import time

epochs = 10

niter = 0
moving_loss = .0
smoothing_constant = 0.01

print("Start training on ", ctx)
for epoch in range(epochs):
    start = time()
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(ctx)
        label = label.as_in_context(ctx)
        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label)
        loss.backward()
        trainer.step(batch_size)
        
        niter += 1
        curr_loss = nd.mean(loss).asscalar()
        moving_loss = (1 - smoothing_constant) * moving_loss + smoothing_constant * curr_loss
        estimated_loss = moving_loss / (1 - (1-smoothing_constant)**niter)
        
    train_acc = utils.evaluate_accuracy_gluon(train_data, net, ctx)
    test_acc = utils.evaluate_accuracy_gluon(test_data, net, ctx)
    
    print("Epoch %s, Train Moving Loss %s, Train acc %s, Test acc %s, Time consume %s s."
         % (epoch, estimated_loss, train_acc, test_acc, time() - start))

Start training on  gpu(0)
Epoch 0, Train Moving Loss 2.29514025723, Train acc 0.12506, Test acc 0.1222, Time consume 126.42092108726501.
Epoch 1, Train Moving Loss 2.19277653277, Train acc 0.22118, Test acc 0.2248, Time consume 126.10781288146973.
Epoch 2, Train Moving Loss 2.0417418976, Train acc 0.29554, Test acc 0.2965, Time consume 126.2792866230011.
Epoch 3, Train Moving Loss 1.89395673118, Train acc 0.36768, Test acc 0.367, Time consume 125.96690487861633.
Epoch 4, Train Moving Loss 1.79321873896, Train acc 0.39568, Test acc 0.3849, Time consume 125.86386466026306.
Epoch 5, Train Moving Loss 1.70088283089, Train acc 0.42492, Test acc 0.4016, Time consume 126.08402061462402.
Epoch 6, Train Moving Loss 1.6306769425, Train acc 0.48108, Test acc 0.4435, Time consume 125.99323844909668.
Epoch 7, Train Moving Loss 1.54959068667, Train acc 0.4847, Test acc 0.4361, Time consume 126.04888653755188.
Epoch 8, Train Moving Loss 1.49294990862, Train acc 0.52216, Test acc 0.4523, Time consume 

In [11]:
filename = 'models/alexnet-cifar10-0000.params'
net.save_params(filename)