### **LeNet: Handwritten Digit Recognition**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm

from datetime import datetime

plt.rcParams['font.family'] = ['sans-serif']
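plt.rcParams['font.sans-serif'] = ['SimHei']  # font that renders Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False
plt.rcParams['figure.figsize'] = (8, 5)  # default figure size

# SVG is a vector format, so figures render crisply in the notebook and do not blur when the page is zoomed.
# This line appears to work only inside .ipynb files.
%config InlineBackend.figure_format = 'svg'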

%matplotlib inline 

import warnings
warnings.filterwarnings('ignore')  # suppress warnings

# Remember: do not enable multiple outputs per cell when training many DL models!

from IPython.display import display
pd.set_option('expand_frame_repr', False)
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
pd.set_option('display.width', 180)

In [None]:
# 0 is the reference index of the first GPU; a second GPU, if present, has index 1
import tensorflow as tf
from tensorflow import keras
tf.config.list_physical_devices('GPU')

In [None]:
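# tf.test.is_gpu_available() is deprecated; check the physical device list instead
len(tf.config.list_physical_devices('GPU')) > 0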

### **Introduction to LeNet**

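#### **LeNet:**
![](./img/LeNet.png)
- LeNet is one of the earliest CNNs, designed for handwritten digit recognition; it is a convolutional neural network trained by gradient descent.
- It has two parts: a convolutional block and a fully connected block. The basic unit of the convolutional block is a convolutional layer followed by a max-pooling layer.
- The convolutional layer detects spatial patterns in the image, such as lines and parts of objects; the max-pooling layer that follows reduces the convolutional layer's sensitivity to position. The convolutional block stacks two of these units.
- Within the convolutional block, each convolutional layer uses a $5\times 5$ window and applies a sigmoid activation to its output.
- The first convolutional layer has 6 output channels and the second has 16. The input to the second convolutional layer has a smaller height and width than the input to the first, so the number of output channels is increased to compensate for the reduced spatial resolution.
- Both max-pooling layers use a $2\times 2$ window with a stride of 2. Because the window and the stride have the same size, the regions covered as the window slides over the input do not overlap.
- When the output of the convolutional block is passed to the fully connected block, each sample in the mini-batch is flattened. The input to the fully connected layers is therefore two-dimensional: the first dimension indexes the samples in the mini-batch, and the second is the flattened vector for each sample, whose length is the product of channels, height, and width. The fully connected block has 3 fully connected layers with 120, 84, and 10 outputs, where 10 is the number of output classes.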
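
The layer sizes listed above determine the number of trainable parameters. The following is a minimal sketch of that arithmetic, assuming a 28x28 single-channel input and unpadded convolutions, so that the flattened vector has 16 x 4 x 4 = 256 entries. The helper names `conv2d_params` and `dense_params` are illustrative and not part of Keras.

```python
def conv2d_params(kernel_size, in_channels, out_channels):
    """Weights plus one bias per output channel for a square-kernel conv layer."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def dense_params(in_features, out_features):
    """Weights plus one bias per output unit for a fully connected layer."""
    return in_features * out_features + out_features


# Assumption: 28x28x1 input, no padding, so the conv block ends at 4x4x16.
flat_len = 16 * 4 * 4

lenet_params = {
    "conv1": conv2d_params(5, 1, 6),
    "conv2": conv2d_params(5, 6, 16),
    "dense1": dense_params(flat_len, 120),
    "dense2": dense_params(120, 84),
    "dense3": dense_params(84, 10),
}
total_params = sum(lenet_params.values())

for name, count in lenet_params.items():
    print(f"{name}: {count}")
print("total:", total_params)
```

Most of the parameters sit in the first fully connected layer, which is typical for networks that flatten a feature map directly into dense layers.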

#### **Note:**

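**The LeNet paper offers just one classic, compact CNN design. Computing power was limited at the time, so the network is relatively simple. The modules and hyperparameters can be modified or extended freely to build your own CNN model on top of it.**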

In [None]:
# Implement LeNet with the Sequential class
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense

net = Sequential([
    Conv2D(filters=6, kernel_size=5, activation='sigmoid', input_shape=(28, 28, 1)),  # input conv layer; 28x28 single-channel (grayscale) image
    MaxPool2D(pool_size=2, strides=2),  # pooling window size and stride
    Conv2D(filters=16, kernel_size=5, activation='sigmoid'),
    MaxPool2D(pool_size=2, strides=2),
    Flatten(),  # flatten into a vector
    Dense(120, activation='sigmoid'),
    Dense(84, activation='sigmoid'),
    Dense(10, activation='softmax')  # softmax gives class probabilities, as sparse_categorical_crossentropy expects
])

In [None]:
# Build a single-channel sample with height and width 28, then run a forward pass layer by layer to inspect each layer's output shape

X = tf.random.uniform((1, 28, 28, 1))
for layer in net.layers:
    X = layer(X)
    print(layer.name, 'output shape\t', X.shape)

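As shown, the height and width of the input shrink layer by layer through the convolutional block. Each convolutional layer uses a 5x5 kernel and so reduces the height and width by 4, while each pooling layer halves them; the number of channels, meanwhile, grows from 1 to 16. The fully connected layers then reduce the number of outputs step by step until it reaches 10, the number of image classes.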
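
The shape changes described above can be reproduced without TensorFlow. This is a small sketch assuming unpadded convolutions with stride 1 and 2x2 pooling with stride 2; the function names `conv_out`, `pool_out`, and `lenet_shapes` are hypothetical helpers introduced here for illustration.

```python
def conv_out(size, kernel, stride=1):
    """Output length along one axis for a convolution without padding."""
    return (size - kernel) // stride + 1


def pool_out(size, window, stride):
    """Output length along one axis for a pooling layer without padding."""
    return (size - window) // stride + 1


def lenet_shapes(height=28, width=28):
    """Trace (layer kind, output shape) pairs through the LeNet layout."""
    shapes = []
    h, w, c = height, width, 1
    for out_channels in (6, 16):
        h, w, c = conv_out(h, 5), conv_out(w, 5), out_channels
        shapes.append(("conv", (h, w, c)))
        h, w = pool_out(h, 2, 2), pool_out(w, 2, 2)
        shapes.append(("pool", (h, w, c)))
    shapes.append(("flatten", (h * w * c,)))
    for units in (120, 84, 10):
        shapes.append(("dense", (units,)))
    return shapes


for kind, shape in lenet_shapes():
    print(kind, shape)
```

Each convolution maps a side length `n` to `n - 4`, and each pooling layer maps it to `n // 2`, which gives 28, 24, 12, 8, 4 along each spatial axis.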

### **Kannada-MNIST**

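The data consists of one training set and two test sets:
* Kannada-MNIST: 28x28 grayscale images, with 60,000 images in the training set and 10,000 in the test set, written by volunteers familiar with the Kannada language
* Dig-MNIST: also 28x28 grayscale images, used as an additional test set of 10,240 images, written by non-native volunteers imitating the script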

#### **Standard LeNet**

In [None]:
def load_data():
    train_images = np.load('./data/Kannada_MNIST/X_kannada_MNIST_train.npz')['arr_0']
    train_labels = np.load('./data/Kannada_MNIST/y_kannada_MNIST_train.npz')['arr_0']
    
    test_images = np.load('./data/Kannada_MNIST/X_kannada_MNIST_test.npz')['arr_0']
    test_labels = np.load('./data/Kannada_MNIST/y_kannada_MNIST_test.npz')['arr_0']
    
    test_images_dig = np.load('./data/Dig_MNIST/X_dig_MNIST.npz')['arr_0']
    test_labels_dig = np.load('./data/Dig_MNIST/y_dig_MNIST.npz')['arr_0']
    
    # Reshape the data
    # The trailing 1 adds an explicit channel dimension
    train_images = np.reshape(train_images, (train_images.shape[0], train_images.shape[1], train_images.shape[2], 1))
    test_images = np.reshape(test_images, (test_images.shape[0], test_images.shape[1], test_images.shape[2], 1))
    test_images_dig = np.reshape(test_images_dig, (test_images_dig.shape[0], test_images_dig.shape[1], test_images_dig.shape[2], 1))
    
    # Scale pixel values to [0, 1]
    train_images = train_images.astype('float32') / 255
    test_images = test_images.astype('float32') / 255
    test_images_dig = test_images_dig.astype('float32') / 255
    
    return train_images, train_labels, test_images, test_labels, test_images_dig, test_labels_dig

train_images, train_labels, test_images, test_labels, test_images_dig, test_labels_dig = load_data()

In [None]:
optimizer = 'adam'

net.compile(optimizer=optimizer,
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy'])

net.fit(train_images, train_labels, epochs=5, validation_split=0.1)
loss, acc = net.evaluate(test_images, test_labels, verbose=2)
acc
loss, acc = net.evaluate(test_images_dig, test_labels_dig, verbose=2)
# Accuracy is lower here because these volunteers are not native writers of the script
acc
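
The loss named in the `compile` call, sparse categorical cross-entropy, takes integer class labels together with per-class probabilities. The sketch below shows the idea in plain NumPy; it is a simplified illustration, not Keras's actual implementation, and the functions `softmax` and `sparse_categorical_crossentropy` defined here are stand-ins written for this example.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Negative log-probability assigned to the true class of each sample."""
    probs = np.clip(probs, eps, 1.0)
    return -np.log(probs[np.arange(len(labels)), labels])


# Toy scores for two samples and three classes (illustrative values only).
logits_demo = np.array([[2.0, 0.5, -1.0],
                        [0.0, 0.0, 0.0]])
labels_demo = np.array([0, 2])

probs_demo = softmax(logits_demo)
losses_demo = sparse_categorical_crossentropy(labels_demo, probs_demo)
print(probs_demo.sum(axis=1))  # each row sums to 1
print(losses_demo)
```

Softmax outputs sum to 1 across classes, which independent sigmoid outputs do not, so softmax is the usual choice for the final layer with this loss.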

#### **Self-defined CNN**

In [None]:
from tensorflow.keras.layers import Dropout

model = Sequential()

# With enough compute, use more convolutional layers and smaller kernels
model.add(Conv2D(filters=32, kernel_size=(5, 5), padding='same',
                 activation='relu', input_shape=(28, 28, 1)))
model.add(Conv2D(filters=32, kernel_size=(5, 5), padding='same',
                 activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(filters=64, kernel_size=(3, 3), padding='same',
                 activation='relu'))
model.add(Conv2D(filters=64, kernel_size=(3, 3), padding='same',
                 activation='relu'))
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(256, activation = "relu"))
model.add(Dropout(0.5))
model.add(Dense(10, activation = "softmax"))

In [None]:
optimizer = 'adam'

model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=5, validation_split=0.1)
loss, acc = model.evaluate(test_images, test_labels, verbose=2)
acc
loss, acc = model.evaluate(test_images_dig, test_labels_dig, verbose=2)
acc

The newly built CNN model performs better on the test sets than the LeNet model.

**Complicated Built-in Keras Models (`https://keras.io/api/applications/`)**

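Keras ships a number of complex built-in network architectures, many with millions of parameters or more, that can be called directly through its API. They are overkill for handwritten digit recognition. Note, however, that each network has its own input size and format requirements.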
* Xception
* VGG
* ResNet
* Inception
* MobileNet
* DenseNet
* NASNet
* EfficientNet
* ......

In [None]:
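# Define a generator that resizes images to the height and width the model requires as the data is read

from skimage.transform import resize
# dataGenerator is reused several times below
# batch_size is the number of samples per batch; the function resizes each batch to resize_size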
def dataGenerator(X_train, y_train, batch_size, resize_size=(224, 224, 1)):    
    total_size = X_train.shape[0]
    
    while True:
        permutation = list(np.random.permutation(total_size))
        for i in range(total_size // batch_size):
            index = permutation[i * batch_size : (i + 1) * batch_size]
            X_batch = X_train[index]
            y_batch = y_train[index]
            
            if resize_size is not None:
                X_batch = resize(X_batch, (X_batch.shape[0], *resize_size))
                    
            yield X_batch, y_batch

train_images, train_labels, test_images, test_labels, test_images_dig, test_labels_dig = load_data()

In [None]:
batch_size = 8
resize_size = (32, 32, 1)

datagen = dataGenerator(train_images, train_labels, batch_size, resize_size)
for X_batch, y_batch in datagen:
    print("X_batch shape:", X_batch.shape, "y_batch shape:", y_batch.shape)
    break

#### **MobileNet**

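- Taking MobileNet as an example: MobileNets are efficient models proposed for mobile and embedded devices.
- MobileNets build lightweight deep neural networks from depthwise separable convolutions (the building block also used by Xception).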
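
To see why depthwise separable convolutions are lightweight, it helps to count weights. The sketch below compares a standard convolution with its depthwise separable counterpart, ignoring biases. The channel counts are arbitrary illustrative values, not taken from MobileNet, and the function names are hypothetical.

```python
def standard_conv_weights(k, c_in, c_out):
    """A standard conv learns one k x k x c_in filter per output channel."""
    return k * k * c_in * c_out


def depthwise_separable_weights(k, c_in, c_out):
    """Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1x1 conv that mixes c_in channels into c_out."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Illustrative sizes only (assumed for this example).
k, c_in, c_out = 3, 64, 128

standard = standard_conv_weights(k, c_in, c_out)
separable = depthwise_separable_weights(k, c_in, c_out)
ratio = separable / standard

print("standard:", standard)
print("separable:", separable)
print("ratio:", round(ratio, 4))
```

The ratio equals `1 / c_out + 1 / k**2`, so with 3x3 kernels the separable version needs roughly one ninth of the weights once the output channel count is large.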

In [None]:
from tensorflow.keras.applications import MobileNet 
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D


# Start from a base model
base_model = MobileNet(input_shape=(32, 32, 1),  # MobileNet requires inputs of at least 32x32; requirements differ between models
                       include_top=False,  # drop the final layer, which targets a 1000-class problem; we add our own to match this dataset
                       weights=None)  # do not use pretrained weights

# Add a global spatial average pooling layer
x = base_model.output
# Take the mean of each channel as the output
x = GlobalAveragePooling2D()(x)
# Add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# Kannada-MNIST has 10 output classes
predictions = Dense(10, activation='softmax')(x)

# No Flatten is used here; global average pooling replaces it. Flatten concatenates all outputs into one vector
# that then feeds the fully connected layer, whereas global average pooling reduces each channel to its mean.

# The model we will actually train
net = Model(inputs=base_model.input, outputs=predictions)

In [None]:
# Inspect the architecture; note that GlobalAveragePooling2D takes the place of Flatten
net.summary()
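
The difference between `GlobalAveragePooling2D` and `Flatten` can be illustrated with plain NumPy. In this sketch the feature map shape `(2, 4, 4, 16)` is a made-up example, not the actual output shape of the MobileNet base model, and the 1024-unit dense layer is assumed only to compare weight counts.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical feature maps: (batch, height, width, channels).
feature_maps = rng.random((2, 4, 4, 16))

# Flatten: concatenate every spatial position and channel into one vector.
flattened = feature_maps.reshape(feature_maps.shape[0], -1)

# Global average pooling: one mean value per channel.
pooled = feature_maps.mean(axis=(1, 2))

# Parameters of a following 1024-unit dense layer (weights + biases).
units = 1024
dense_after_flatten = flattened.shape[1] * units + units
dense_after_gap = pooled.shape[1] * units + units

print("flatten:", flattened.shape, "dense params:", dense_after_flatten)
print("global average pooling:", pooled.shape, "dense params:", dense_after_gap)
```

Because pooling collapses the spatial axes, the following dense layer needs far fewer weights, and its input size no longer depends on the spatial size of the feature map.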

In [None]:
# Training proceeds as before; note that the input images must be resized
batch_size = 8
resize_size = (32, 32, 1)
datagen = dataGenerator(train_images, train_labels, batch_size, resize_size)

optimizer = 'adam'

net.compile(optimizer=optimizer,
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy'])

epochs = 10

net.fit(datagen,
        steps_per_epoch = len(train_images) // batch_size,
        epochs=epochs,
        verbose=1)

# Resize the test set in the same way
batch_size = 8
resize_size = (32, 32, 1)
test_datagen = dataGenerator(test_images, test_labels, batch_size, resize_size)

loss, acc = net.evaluate(test_datagen, steps=len(test_images) // batch_size)
acc

batch_size = 8
resize_size = (32, 32, 1)
test_datagen = dataGenerator(test_images_dig, test_labels_dig, batch_size, resize_size)

loss, acc = net.evaluate(test_datagen, steps=len(test_images_dig) // batch_size)  # step count must match the Dig-MNIST set
acc

**Confusion Matrix**

In [None]:
from sklearn.metrics import confusion_matrix

def plot_cm(y_true, y_pred, figsize=(12, 9), fontsize=14):
    cm = confusion_matrix(y_true, y_pred, labels=np.unique(y_true))
    cm_sum = np.sum(cm, axis=1, keepdims=True)
    cm_perc = cm / cm_sum.astype(float) * 100
    annot = np.empty_like(cm).astype(str)
    nrows, ncols = cm.shape
    for i in range(nrows):
        for j in range(ncols):
            c = cm[i, j]
            p = cm_perc[i, j]
            annot[i, j] = '%.1f%%\n%d' % (p, c)
    
    
    cm = pd.DataFrame(cm_perc / 100, index=np.unique(y_true), columns=np.unique(y_true))
    cm.index.name = 'Actual'
    cm.columns.name = 'Predicted'
    fig, ax = plt.subplots(figsize=figsize)
    # sns.heatmap(cm, cmap= "YlGnBu", annot=annot, fmt='', ax=ax)
    sns.heatmap(cm, cmap="Blues", annot=annot, fmt='', ax=ax, annot_kws={"size": fontsize})  # font size

In [None]:
# Map predictions to strings for plotting
label_mapping = {i:str(i) for i in range(10)}
label_mapping

In [None]:
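# The awkward part: the generator shuffles the samples, so predictions come out in a different order from the original labels; the true labels must therefore be collected from the generator as well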
batch_size = 8
resize_size = (32, 32, 1)
test_datagen = dataGenerator(test_images, test_labels, batch_size, resize_size)
y_pred = []
y_true = []

steps = len(test_images) // batch_size
i = 0
for X_batch, y_batch in test_datagen:
    testPredict = net.predict(X_batch)
    testPredict = np.argmax(testPredict, axis=1)  # argmax already returns integer class indices
    y_true.extend(list(y_batch))
    y_pred.extend(list(testPredict))
    
    i += 1
    if i % 100 == 0:
        print('Progress: ', i, '/', steps)
    if i >= steps:  # stop after exactly `steps` batches
        break

y_true = [label_mapping[i] for i in y_true]
y_pred = [label_mapping[i] for i in y_pred]

In [None]:
plot_cm(y_true, y_pred)
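
Because the generator shuffles and never yields a final partial batch, collecting labels from it is awkward for evaluation. One possible alternative is a single-pass iterator that keeps the original order. The sketch below is an assumption-laden illustration: `ordered_batches` and its `transform` argument are hypothetical names introduced here, and `transform` would be where a resize step could go.

```python
import numpy as np


def ordered_batches(X, y, batch_size, transform=None):
    """Yield (X_batch, y_batch) once, in the original order,
    including a smaller final batch when the data does not divide evenly."""
    for start in range(0, len(X), batch_size):
        X_batch = X[start:start + batch_size]
        y_batch = y[start:start + batch_size]
        if transform is not None:
            X_batch = transform(X_batch)  # e.g. a resize function (assumed)
        yield X_batch, y_batch


# Toy data standing in for images and labels.
X_demo = np.arange(10 * 4, dtype="float32").reshape(10, 2, 2, 1)
y_demo = np.arange(10)

collected = np.concatenate(
    [y_b for _, y_b in ordered_batches(X_demo, y_demo, batch_size=4)]
)
batch_sizes = [len(x_b) for x_b, _ in ordered_batches(X_demo, y_demo, batch_size=4)]

print(collected)
print(batch_sizes)
```

With an iterator like this, predictions line up with the original label array, and every test sample is evaluated exactly once.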

In [None]:
batch_size = 8
resize_size = (32, 32, 1)
test_datagen = dataGenerator(test_images_dig, test_labels_dig, batch_size, resize_size)
y_pred = []
y_true = []

steps = len(test_images_dig) // batch_size
i = 0
for X_batch, y_batch in test_datagen:
    testPredict = net.predict(X_batch)
    testPredict = np.argmax(testPredict, axis=1)  # argmax already returns integer class indices
    y_true.extend(list(y_batch))
    y_pred.extend(list(testPredict))
    
    i += 1
    if i % 100 == 0:
        print('Progress: ', i, '/', steps)
    if i >= steps:  # stop after exactly `steps` batches
        break

y_true = [label_mapping[i] for i in y_true]
y_pred = [label_mapping[i] for i in y_pred]       

In [None]:
plot_cm(y_true, y_pred)

- A large share of the 0s are misclassified as 9 (20.3% of all 0s), followed by those misclassified as 6 (12.0% of all 0s).

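- The same template can be adapted to other built-in Keras models; when switching models, be sure to check the model's required input format and the number of target classes.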