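### Introduction to AlexNet
+ **Author**: Alex Krizhevsky, a student of Hinton
+ **Highlights**: the first successful use of tricks such as ReLU, Dropout, and LRN in a CNN, with GPU acceleration.
+ **Structure:** about 630 million connections, 60 million parameters, and 650,000 neurons. There are 5 convolutional layers, 3 of which are followed by a max-pooling layer, and then 3 fully connected layers.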

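AlexNet can be regarded as the first major statement from neural networks after their long slump. It established the dominance of deep learning in computer vision and also helped push deep learning into speech recognition, natural language processing, reinforcement learning, and other fields.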

### Key new techniques

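1. Successfully used ReLU as the activation function and showed that it outperforms Sigmoid in deeper networks, alleviating the vanishing-gradient problem that Sigmoid suffers from when the network is deep.
2. Used Dropout to avoid overfitting. AlexNet made Dropout practical and confirmed its effectiveness. In AlexNet, Dropout is applied mainly in the last few fully connected layers.
3. Used overlapping max pooling in a CNN. Earlier CNNs generally used average pooling; AlexNet uses max pooling throughout, which avoids the blurring effect of average pooling. **In addition, AlexNet makes the stride smaller than the pooling kernel size, so that neighboring pooling outputs overlap and cover one another, which enriches the features.**
4. Proposed the LRN (Local Response Normalization) layer, which creates a competition mechanism among neighboring responses and improves the model's generalization.
5. Used CUDA to accelerate training on GPUs.
6. Data augmentation. Random 224\*224 regions are cropped from the 256\*256 images and flipped horizontally, which is equivalent to increasing the amount of data by a factor of \\((256-224)^2 \times 2 = 2048\\). **At prediction time, five crops are taken (the four corners plus the center) and each is flipped horizontally, giving 10 images in total; the network predicts on all of them and the 10 results are averaged.** The AlexNet paper also describes applying PCA to the RGB values of the images and perturbing the principal components with Gaussian noise of standard deviation 0.1. This trick lowers the error rate by a further 1%.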
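
The cropping scheme in item 6 can be sketched in NumPy. This is a minimal illustration under stated assumptions rather than the paper's implementation: the function names `random_crop_flip` and `ten_crop` are hypothetical, and an image is assumed to be an array of shape (height, width, channels).

```python
import numpy as np


def random_crop_flip(image, crop=224, rng=None):
    """Training-time augmentation: random crop plus a random horizontal flip."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop + 1)   # upper bound is exclusive
    left = rng.integers(0, w - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]            # flip left-right
    return patch


def ten_crop(image, crop=224):
    """Test-time crops: four corners and the center, plus their mirror images."""
    h, w = image.shape[:2]
    offsets = [
        (0, 0), (0, w - crop),                 # top-left, top-right
        (h - crop, 0), (h - crop, w - crop),   # bottom-left, bottom-right
        ((h - crop) // 2, (w - crop) // 2),    # center
    ]
    crops = [image[t:t + crop, l:l + crop] for t, l in offsets]
    crops += [c[:, ::-1] for c in crops]       # horizontally flipped copies
    return np.stack(crops)


image = np.zeros((256, 256, 3), dtype=np.uint8)  # placeholder image
print(random_crop_flip(image).shape)  # (224, 224, 3)
print(ten_crop(image).shape)          # (10, 224, 224, 3)
```

At prediction time the model would be run on all ten crops and the outputs averaged, for example with `probs.mean(axis=0)` where `probs` is a hypothetical array of shape (10, num_classes).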

### AlexNet architecture

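1. Input image size: 224x224
2. First convolutional layer: kernel size [11,11], stride 4, 96 kernels
3. LRN layer
4. 3x3 max-pooling layer, stride 2
 ... The later convolutional layers all use smaller kernels, either 5x5 or 3x3, with stride 1, while max pooling stays at 3x3 with stride 2.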
 
 The parameters of AlexNet are summarized below:
 ![3](img/3.png)
 
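 As the figure shows, each convolutional layer accounts for less than 1% of the network's parameters, yet removing any one of them significantly degrades the network's classification performance.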
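
The effect of the strides and kernel sizes on the feature-map size can be checked with a little arithmetic. The sketch below assumes TensorFlow's `SAME` and `VALID` padding conventions and the layer settings of the code in this notebook; the helper `out_size` and the `stages` table are hypothetical names introduced only for illustration.

```python
import math


def out_size(size, kernel, stride, padding):
    """Spatial output size of a conv/pool layer under TensorFlow-style padding."""
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return (size - kernel) // stride + 1
    raise ValueError("unknown padding: %r" % padding)


# (name, kernel, stride, padding) for each layer that can change the spatial size
stages = [
    ("conv1", 11, 4, "SAME"),
    ("pool1", 3, 2, "VALID"),
    ("conv2", 5, 1, "SAME"),
    ("pool2", 3, 2, "VALID"),
    ("conv3", 3, 1, "SAME"),
    ("conv4", 3, 1, "SAME"),
    ("conv5", 3, 1, "SAME"),
    ("pool5", 3, 2, "VALID"),
]

size = 224
sizes = {}
for name, kernel, stride, padding in stages:
    size = out_size(size, kernel, stride, padding)
    sizes[name] = size
    print(name, size)
```

Because every pooling layer uses a 3x3 window with stride 2, adjacent windows share one row or column, which is the overlapping pooling described earlier. The resulting sizes (56, 27, 27, 13, 13, 13, 13, 6) agree with the shapes printed by `print_activations` in the benchmark output further down.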

In [1]:
from datetime import datetime
import math
import time
import tensorflow as tf

In [2]:
batch_size = 32
num_batches = 100

In [3]:
def print_activations(t):
    print(t.op.name, " ", t.get_shape().as_list())

In [25]:
def inference(images):
    parameters = []
    
    # First convolutional layer (64 kernels here; the architecture list above gives 96)
    with tf.name_scope('conv1') as scope:
        kernel = tf.Variable(
            tf.truncated_normal([11, 11, 3, 64], dtype=tf.float32, stddev=1e-1),
            name = 'weights')
        conv = tf.nn.conv2d(images, kernel, strides=[1, 4, 4, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32),
                             trainable=True, 
                             name = 'biases')
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name = scope)
        print_activations(conv1)
        parameters += [kernel, biases]
        
    lrn1 = tf.nn.lrn(conv1, 4, bias=1.0, alpha=0.001/9, beta=0.75, name='lrn1')
    pool1 = tf.nn.max_pool(lrn1, 
                            ksize=[1, 3, 3, 1], 
                            strides=[1, 2, 2, 1], 
                            padding='VALID',
                            name = 'pool1')
    print_activations(pool1)
        
    # Second convolutional layer
    with tf.name_scope('conv2') as scope:
        kernel = tf.Variable(
            tf.truncated_normal([5,5,64, 192], dtype=tf.float32, stddev=1e-1),
            name = 'weights')
        conv = tf.nn.conv2d(pool1, kernel, strides=[1,1,1,1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[192], dtype=tf.float32),
                            trainable=True,
                            name = 'biases')
        bias = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(bias, name = scope)
        print_activations(conv2)
        parameters += [kernel, biases]
    lrn2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001/9, beta=0.75, name='lrn2')
    pool2 = tf.nn.max_pool(lrn2, 
                           ksize=[1,3,3,1], 
                           strides=[1,2,2,1], 
                           padding='VALID', 
                           name='pool2')
    print_activations(pool2)
    
    # Third convolutional layer
    with tf.name_scope('conv3') as scope:
        kernel = tf.Variable(
            tf.truncated_normal([3, 3, 192, 384], dtype=tf.float32, stddev=1e-1), 
            name = 'weights')
        conv = tf.nn.conv2d(pool2, kernel, strides=[1,1,1,1], padding='SAME')
        biases = tf.Variable(
            tf.constant(0.0, shape=[384], dtype=tf.float32), 
            trainable=True, 
            name = 'biases')
        bias = tf.nn.bias_add(conv, biases)
        conv3 = tf.nn.relu(bias, name = scope)
        parameters += [kernel, biases]
        print_activations(conv3)
    
    # Fourth convolutional layer
    with tf.name_scope('conv4') as scope:
        kernel = tf.Variable(
            tf.truncated_normal([3,3,384,256], dtype=tf.float32, stddev=1e-1), 
            name='weights')
        conv = tf.nn.conv2d(conv3, kernel, [1,1,1,1], padding='SAME')
        biases = tf.Variable(
            tf.constant(0.0, shape=[256], dtype=tf.float32), 
            trainable=True, 
            name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv4 = tf.nn.relu(bias, name = scope)
        parameters += [kernel, biases]
        print_activations(conv4)
    
    # Fifth convolutional layer
    with tf.name_scope('conv5') as scope:
        kernel = tf.Variable(
            tf.truncated_normal([3,3,256,256], dtype=tf.float32, stddev=1e-1), 
            name = 'weights')
        conv = tf.nn.conv2d(conv4, kernel, [1,1,1,1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), 
                             trainable=True, 
                             name = 'biases')
        bias = tf.nn.bias_add(conv, biases)
        conv5 = tf.nn.relu(bias, name = scope)
        parameters += [kernel, biases]
        print_activations(conv5)
    pool5 = tf.nn.max_pool(conv5, ksize=[1,3,3,1], strides=[1,2,2,1], padding="VALID", name='pool5')
    print_activations(pool5)
    
    reshape = tf.reshape(pool5, shape=[batch_size, -1], name='reshape')
    dim = reshape.get_shape()[1].value
    
    # First fully connected layer
    with tf.name_scope('fat1') as scope:
        weights = tf.Variable(
            tf.truncated_normal([dim, 4096], dtype=tf.float32, stddev=0.04), 
            name = "weights")
        biases = tf.Variable(tf.constant(0.0, shape=[4096]), trainable=True, name='biases')
        fat1 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name= scope)
        parameters += [weights, biases]
        print_activations(fat1)
    
    # Second fully connected layer
    with tf.name_scope('fat2') as scope:
        weights = tf.Variable(
            tf.truncated_normal([4096, 4096], dtype=tf.float32, stddev=0.001), 
            name = 'weights')
        biases = tf.Variable(tf.constant(0.0, shape=[4096]), trainable=True, name = 'biases')
        fat2 = tf.nn.relu(tf.matmul(fat1, weights) + biases, name=scope)
        parameters += [weights, biases]
        print_activations(fat2)
    
    # Third fully connected layer (softmax over 1000 classes)
    with tf.name_scope('fat3') as scope:
        weights = tf.Variable(
            tf.truncated_normal([4096, 1000], dtype=tf.float32, stddev=0.01), 
            name = 'weights')
        biases = tf.Variable(tf.constant(0.0, shape=[1000]), trainable=True, name = 'biases')
        fat3 = tf.nn.softmax(tf.matmul(fat2, weights) + biases, name=scope)
        parameters += [weights, biases]
        print_activations(fat3)
    
    return fat3, parameters     
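
As a rough check on the model size, the variable shapes used in `inference` can be tallied in plain Python without building the graph. This is only a sketch: the `layers` table is copied by hand from the weight shapes above (with the first fully connected layer taking the flattened 6x6x256 output of `pool5`), and `count_params` is a hypothetical helper.

```python
# Weight shapes copied from inference(); each layer also has one bias per output unit.
layers = [
    ("conv1", (11, 11, 3, 64)),
    ("conv2", (5, 5, 64, 192)),
    ("conv3", (3, 3, 192, 384)),
    ("conv4", (3, 3, 384, 256)),
    ("conv5", (3, 3, 256, 256)),
    ("fat1", (6 * 6 * 256, 4096)),
    ("fat2", (4096, 4096)),
    ("fat3", (4096, 1000)),
]


def count_params(shape):
    """Number of weights in a tensor of this shape plus its bias vector."""
    n = 1
    for d in shape:
        n *= d
    return n + shape[-1]


counts = {name: count_params(shape) for name, shape in layers}
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name}: {n:>12,d}  ({100 * n / total:.2f}%)")
print(f"total: {total:,d}")
```

For this variant the total comes to 61,100,840 parameters, in line with the figure of roughly 60 million quoted in the introduction. The three fully connected layers hold the large majority of them, while the five convolutional layers together hold about 4%.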

In [29]:
def time_tensorflow_run(sess, target, info_string):
    num_step_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0
    
    for i in range(num_batches + num_step_burn_in):
        start_time = time.time()
        _ = sess.run(target)
        duration = time.time() - start_time
        if i >= num_step_burn_in:
            if not i % 10:
                print('%s: step %d, duration=%.3f'%(datetime.now(), i - num_step_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    
    mn = total_duration / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec/batch' % (datetime.now(), info_string, num_batches, mn, sd))
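
The timing function gets the mean and standard deviation from two running sums, using the identity Var(x) = E[x^2] - (E[x])^2. The standalone sketch below isolates that calculation; the name `running_mean_std` and the sample durations are made up for illustration, and the clamp to zero is an added safeguard that the function above does not have.

```python
import math
import statistics


def running_mean_std(durations):
    """Mean and population standard deviation from a sum and a sum of squares."""
    total = 0.0
    total_squared = 0.0
    for d in durations:
        total += d
        total_squared += d * d
    n = len(durations)
    mean = total / n
    # Rounding can push the variance slightly below zero when all values are
    # nearly equal, so clamp it before taking the square root.
    variance = max(total_squared / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)


sample = [1.8, 2.0, 1.7, 1.9]  # hypothetical per-batch durations in seconds
mean, sd = running_mean_std(sample)
print("%.3f +/- %.3f sec/batch" % (mean, sd))
print(abs(sd - statistics.pstdev(sample)) < 1e-9)  # agrees with the library result
```

Keeping only two accumulators means the individual durations never need to be stored, which is why the loop in `time_tensorflow_run` updates `total_duration` and `total_duration_squared` on every measured step.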

In [30]:
def run_benchmark():
    with tf.Graph().as_default():
        image_size = 224
        images = tf.Variable(tf.random_normal([batch_size, 
                                               image_size, 
                                               image_size, 3],
                                              dtype=tf.float32, 
                                              stddev=1e-1))
        fat3, parameters = inference(images)
        
        init = tf.global_variables_initializer()
        sess = tf.Session()
        sess.run(init)
        time_tensorflow_run(sess, fat3, "Forward")
        objective = tf.nn.l2_loss(fat3)
        grad = tf.gradients(objective, parameters)
        time_tensorflow_run(sess, grad, "Forward-backward")

In [31]:
run_benchmark()

conv1   [32, 56, 56, 64]
pool1   [32, 27, 27, 64]
conv2   [32, 27, 27, 192]
pool2   [32, 13, 13, 192]
conv3   [32, 13, 13, 384]
conv4   [32, 13, 13, 256]
conv5   [32, 13, 13, 256]
pool5   [32, 6, 6, 256]
fat1   [32, 4096]
fat2   [32, 4096]
Tensor("fat3:0", shape=(32, 1000), dtype=float32)
2018-02-11 16:46:15.826222: step 0, duration=2.228
2018-02-11 16:46:33.553098: step 10, duration=1.797
2018-02-11 16:46:52.707389: step 20, duration=2.057
2018-02-11 16:47:11.014037: step 30, duration=1.753
2018-02-11 16:47:28.597603: step 40, duration=1.784
2018-02-11 16:47:46.148612: step 50, duration=1.760
2018-02-11 16:48:03.999014: step 60, duration=1.758
2018-02-11 16:48:22.689601: step 70, duration=1.946
2018-02-11 16:48:41.990248: step 80, duration=2.436
2018-02-11 16:49:00.337716: step 90, duration=1.860
2018-02-11 16:49:17.557719:Forward across 100 steps, 1.840 +/- 0.1ec/batch
2018-02-11 16:50:37.322553: step 0, duration=6.981
2018-02-11 16:51:48.343563: step 10, duration=7.036
2018-02-11 16