## **LeNet - Convolutional Neural Network**


**Reference:** [1] LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
https://www.researchgate.net/publication/2985446_Gradient-Based_Learning_Applied_to_Document_Recognition

<img src = "LeNet-5.png" width = 800 height = 400></img>

LeNet consists of two parts: a convolutional block and a fully connected block.

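The basic unit of the **convolutional block** is a convolutional layer followed by a max pooling layer. The convolutional layer recognizes spatial patterns in the image, such as lines and local parts of objects, and the max pooling layer that follows reduces the convolutional layer's sensitivity to position. The convolutional block stacks two of these basic units. Every convolutional layer in the block uses a $5\times 5$ window and applies a sigmoid activation function to its output.
The first convolutional layer has 6 output channels, and the second increases this to 16. The input to the second convolutional layer has a smaller height and width than the input to the first, so the number of output channels is increased to keep the sizes of the two layers more comparable. Both max pooling layers in the block use a $2\times 2$ window with a stride of 2. Because the pooling window and the stride have the same size, the regions covered by successive positions of the window on its input do not overlap.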
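
The spatial sizes in the convolutional block can be checked with the usual output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$. The sketch below is plain Python rather than MXNet. The helper names `out_size` and `trace_conv_block` are illustrative, and it assumes no padding and a stride of 1 for the convolutions.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output length along one spatial axis for a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_conv_block(n):
    """Follow a square n x n input through conv5 -> pool2 -> conv5 -> pool2."""
    sizes = [n]
    for kernel, stride in [(5, 1), (2, 2), (5, 1), (2, 2)]:
        n = out_size(n, kernel, stride)
        sizes.append(n)
    return sizes


print(trace_conv_block(28))  # Fashion-MNIST images are 28 x 28
print(trace_conv_block(32))  # 32 x 32, the input size in the LeNet-5 paper
```

With a 28 x 28 input the block ends at 4 x 4, so each sample carries 16 x 4 x 4 = 256 values into the fully connected block. With a 32 x 32 input it ends at 5 x 5.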

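The output of the convolutional block has shape (batch size, channels, height, width). When this output is passed to the fully connected block, the block flattens each sample in the mini-batch. In other words, the input to the fully connected layers becomes two-dimensional: the first dimension indexes the samples in the mini-batch, and the second is the flattened vector representation of each sample, whose length is the product of the channels, height, and width. The fully connected block contains 3 fully connected layers with 120, 84, and 10 outputs respectively, where 10 is the number of output classes.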
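
As a rough illustration of the flattening step, the following plain-Python sketch turns a nested list of shape (batch, channels, height, width) into one flat vector per sample. The function `flatten_batch` and the toy batch are hypothetical and only mimic what a framework does on real tensors.

```python
def flatten_batch(batch):
    """Turn nested lists of shape (batch, channels, height, width)
    into flat per-sample vectors of shape (batch, channels*height*width)."""
    return [[value for channel in sample for row in channel for value in row]
            for sample in batch]


# A toy batch: 2 samples, 3 channels, height 2, width 2 (values are just counters).
batch = [[[[s * 100 + c * 10 + h * 2 + w for w in range(2)]
           for h in range(2)]
          for c in range(3)]
         for s in range(2)]

flat = flatten_batch(batch)
print(len(flat), len(flat[0]))  # 2 samples, each of length 3 * 2 * 2 = 12
```

The batch dimension is untouched; only the channel, height, and width axes are merged.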



#### **Implementing the LeNet Model with the Sequential Class**

In [1]:
import d2lzh as d2l
import mxnet as mx
from mxnet import autograd,gluon,init,nd
from mxnet.gluon import loss as gloss,nn
import time 

net = nn.Sequential()
net.add(nn.Conv2D(channels = 6, kernel_size = 5, activation = "sigmoid"),
        nn.MaxPool2D(pool_size=2, strides = 2),
        nn.Conv2D(channels = 16, kernel_size = 5, activation = "sigmoid"),
        nn.MaxPool2D(pool_size = 2, strides = 2),
        nn.Dense(120,activation = "sigmoid"), # By default, Dense flattens (batch size, channels, height, width) to (batch size, channels*height*width)
        nn.Dense(84,activation = "sigmoid"),
        nn.Dense(10)
       )
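
To get a feel for the size of this network, the sketch below counts its learnable parameters by hand in plain Python. It assumes a single-channel 28 x 28 input (the Fashion-MNIST image size), so the convolutional block outputs 16 x 4 x 4 = 256 values per sample. The helper names are illustrative.

```python
def conv_params(in_channels, out_channels, kernel):
    """Weights plus one bias per output channel for a square kernel."""
    return in_channels * out_channels * kernel * kernel + out_channels


def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units


# Assumes a 1 x 28 x 28 input, so the conv block outputs 16 x 4 x 4 = 256 values.
layer_params = {
    "conv1": conv_params(1, 6, 5),
    "conv2": conv_params(6, 16, 5),
    "dense1": dense_params(16 * 4 * 4, 120),
    "dense2": dense_params(120, 84),
    "dense3": dense_params(84, 10),
}
for name, count in layer_params.items():
    print(name, count)
print("total", sum(layer_params.values()))
```

Under these assumptions most of the parameters sit in the first fully connected layer rather than in the convolutional layers.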

In [2]:
# # Build a single-channel sample with height and width 32 and run the forward pass layer by layer to inspect each layer's output shape
# X = nd.random.uniform(shape = (1,1,32,32))
# net.initialize()
# for layer in net:
#     X = layer(X)
#     print(layer.name,"output shape: \t",X.shape)

In [3]:
# Load the data and train the model
batch_size = 256
train_iter,test_iter = d2l.load_data_fashion_mnist(batch_size = batch_size)

def try_gpu():
    try:
        ctx = mx.gpu()
        _ = nd.zeros((1,),ctx = ctx)
    except mx.base.MXNetError:
        ctx = mx.cpu()
    return ctx 

ctx = try_gpu()
ctx

gpu(0)

In [7]:
def evalluate_accuracy(data_iter,net,ctx):
    acc_sum, n = nd.array([0],ctx = ctx), 0
    for X,y in data_iter:
        # If ctx is a GPU, copy the data to that GPU's memory
        X,y = X.as_in_context(ctx), y.as_in_context(ctx).astype('float32')
        acc_sum += (net(X).argmax(axis = 1) ==y).sum()
        n += y.size
    return acc_sum.asscalar()/n
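
The accuracy computed above boils down to comparing the argmax of each row of scores with the true label. A minimal plain-Python sketch of the same idea, with made-up scores and labels and illustrative helper names, looks like this:

```python
def argmax(row):
    """Index of the largest score in one row."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(scores, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    correct = sum(argmax(row) == label for row, label in zip(scores, labels))
    return correct / len(labels)


# Hypothetical scores for 3 samples over 3 classes.
scores = [[0.1, 0.7, 0.2],
          [0.8, 0.1, 0.1],
          [0.3, 0.3, 0.4]]
labels = [1, 2, 2]
print(accuracy(scores, labels))  # two of the three predictions match
```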

In [8]:
def train_ch5(net, train_iter,test_iter,batch_size, trainer, ctx, num_epochs):
    
    print("trainging on",ctx)
    loss = gloss.SoftmaxCrossEntropyLoss()
    
    for epoch in range(num_epochs):
        train_l_sum,train_acc_sum,n,start = 0.0,0.0,0,time.time()
       
        for X,y in train_iter:
            X,y = X.as_in_context(ctx), y.as_in_context(ctx)
            
            with autograd.record():
                y_hat = net(X)
                l = loss(y_hat,y).sum()
            l.backward()
            trainer.step(batch_size)
            y = y.astype("float32")
            
            train_l_sum += l.asscalar()
            train_acc_sum += (y_hat.argmax(axis = 1) ==y).sum().asscalar()
            
            n += y.size
            
        test_acc = evalluate_accuracy(test_iter, net, ctx)
        print("epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec" 
              %(epoch +1, train_l_sum/n, train_acc_sum/n, test_acc, time.time() - start))
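
In the training loop the loss is summed over the mini-batch and `trainer.step(batch_size)` is then called with the batch size, so the update uses the gradient averaged over the batch. The toy sketch below shows that scaling in plain Python on a scalar problem. The function `sgd_step` and the data are hypothetical and are not part of Gluon.

```python
def sgd_step(weight, summed_grad, lr, batch_size):
    """One SGD update where the gradient of a summed loss is averaged over the batch."""
    return weight - lr * summed_grad / batch_size


# Toy problem (hypothetical): fit a scalar w to minimise sum_i (w - t_i)^2 / 2.
targets = [1.0, 2.0, 3.0, 6.0]
w = 0.0
for _ in range(50):
    summed_grad = sum(w - t for t in targets)  # gradient of the summed loss
    w = sgd_step(w, summed_grad, lr=0.5, batch_size=len(targets))

print(round(w, 4))  # approaches the mean of the targets, 3.0
```

Dividing by the batch size keeps the effective step size independent of how many samples are in each mini-batch.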
        

In [10]:
lr,num_epochs = 0.9,50
net.initialize(force_reinit= True, ctx = ctx, init = init.Xavier())
trainer = gluon.Trainer(net.collect_params(),'sgd',{'learning_rate':lr})

train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx, num_epochs)

trainging on gpu(0)
epoch 1, loss 2.3181, train acc 0.102, test acc 0.100, time 6.7 sec
epoch 2, loss 1.9018, train acc 0.267, test acc 0.585, time 6.7 sec
epoch 3, loss 0.9577, train acc 0.620, test acc 0.640, time 4.9 sec
epoch 4, loss 0.7430, train acc 0.711, test acc 0.745, time 4.4 sec
epoch 5, loss 0.6484, train acc 0.744, test acc 0.750, time 4.3 sec
epoch 6, loss 0.6005, train acc 0.763, test acc 0.782, time 4.5 sec
epoch 7, loss 0.5557, train acc 0.780, test acc 0.804, time 4.4 sec
epoch 8, loss 0.5181, train acc 0.797, test acc 0.818, time 4.4 sec
epoch 9, loss 0.4907, train acc 0.808, test acc 0.820, time 4.4 sec
epoch 10, loss 0.4667, train acc 0.820, test acc 0.831, time 4.3 sec
epoch 11, loss 0.4439, train acc 0.831, test acc 0.841, time 4.4 sec
epoch 12, loss 0.4277, train acc 0.839, test acc 0.842, time 4.4 sec
epoch 13, loss 0.4131, train acc 0.844, test acc 0.852, time 4.4 sec
epoch 14, loss 0.4006, train acc 0.850, test acc 0.857, time 4.4 sec
epoch 15, loss 0.3881, 