# Introduction to Classic CNNs
1. [CNN](#CNN)
2. [LeNet](#LeNet)
3. [AlexNet](#AlexNet)
4. [Network in Network](#Network-in-Network)
5. [VGGNet](#VGGNet)
6. [SPPNet](#SPPNet)
7. [Hands-on](#Hands-on)
8. [Reference](#Reference)


## CNN
 - [Deep Learning: CNN Principles](https://medium.com/@CinnamonAITaiwan/%E6%B7%B1%E5%BA%A6%E5%AD%B8%E7%BF%92-cnn%E5%8E%9F%E7%90%86-keras%E5%AF%A6%E7%8F%BE-432fd9ea4935)

## LeNet
[_"Gradient-based learning applied to document recognition"_](https://ieeexplore.ieee.org/document/726791)
LeCun, Yann, et al. Proceedings of the IEEE 86.11 (1998): 2278-2324.
- Pattern Recognition
- Case study: handwritten character recognition 


![lenet_1](https://drive.google.com/uc?export=view&id=1hGXdbMVvqZdBXj88IbEIGpUmKcrPsdg1)
![lenet_2](https://drive.google.com/uc?export=view&id=166V2AbS4rNSkDVhPNud1p1aOCUyqLP80)

### Model Architecture
#### Overview
![lenet_3](https://drive.google.com/uc?export=view&id=1VgwvElx42Z8hvdpnMw-Dpo6LzkCBHOKX)
#### Feature map between S2 and C3
- Each column indicates which feature maps in **S2** are combined by the units in a particular feature map of **C3**
![lenet_4](https://drive.google.com/uc?export=view&id=1sV4A_Wcke56jOqWV7SpEJJ5oUai212U_)
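
The effect of this partial connectivity on the parameter count can be checked with a little arithmetic. The sketch below assumes the grouping described in the LeNet-5 paper (six C3 maps reading 3 S2 maps, nine reading 4, one reading all 6) rather than reading it off the figure, and the variable names are illustrative only. It compares the sparse scheme with a fully connected 5×5 convolution from 6 to 16 maps, which is what a standard `Conv2D` layer uses.

```python
# Number of S2 feature maps feeding each of the 16 C3 feature maps
# (grouping assumed from the LeNet-5 paper: 6 maps x 3 inputs,
# 9 maps x 4 inputs, 1 map x 6 inputs).
inputs_per_c3_map = [3] * 6 + [4] * 9 + [6]

kernel_area = 5 * 5  # every S2 -> C3 connection has its own 5x5 kernel

# Sparse scheme: one 5x5 kernel per connected S2 map, plus one bias per C3 map.
sparse_params = sum(n * kernel_area + 1 for n in inputs_per_c3_map)

# Dense scheme: every C3 map is connected to all 6 S2 maps.
dense_params = 16 * (6 * kernel_area + 1)

print(sparse_params, dense_params)
```

The dense count is the one a plain Keras `Conv2D(16, (5, 5))` on 6 input channels reports, so a Keras re-implementation with standard layers does not reproduce the sparse table.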

### Demo
![lenet_5](https://drive.google.com/uc?export=view&id=13cDxc-Ewekqnk6G1JmVN8a-WTx2B1sfR)



In [1]:
from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, MaxPooling2D, Flatten

#Build a LeNet-style model with Keras
def LeNet(width, height, depth, classes):
    # initialize the model
    model = Sequential()

    # first block, convolution and pooling
    # (the original LeNet-5 uses average sub-sampling; max pooling is used here)
    model.add(Conv2D(input_shape=(width, height, depth), kernel_size=(5, 5), filters=6, strides=(1,1), activation='tanh'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

    # second block, convolution and pooling
    # (input_shape is only needed on the first layer; it is inferred afterwards)
    model.add(Conv2D(kernel_size=(5, 5), filters=16, strides=(1,1), activation='tanh'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

    # fully connected layers
    model.add(Flatten())
    model.add(Dense(120,activation = 'tanh'))
    model.add(Dense(84,activation = 'tanh'))

    # softmax classifier
    model.add(Dense(classes))
    model.add(Activation("softmax"))

    return model

LeNet_model = LeNet(224, 224, 3, 100)
LeNet_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 220, 220, 6)       456       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 110, 110, 6)       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 106, 106, 16)      2416      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 53, 53, 16)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 44944)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 120)               5393400   
_________________________________________________________________
dens

---

## AlexNet
[_"Imagenet classification with deep convolutional neural networks."_](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. Advances in neural information processing systems. 2012.

- To learn about thousands of objects from millions of images, we need a model with a large learning capacity 
- Highly-optimized GPU implementation of 2D convolution

### Model Architecture
#### Overview
- 8 layers: 5 Conv + 3 FC
- The output of the last fc layer is fed to a 1000-way softmax (1000 classes)
- The kernels of the 3rd convolutional layer are connected to all kernel maps in the 2nd layer (across both GPUs).
![alexnet_1](https://drive.google.com/uc?export=view&id=1zfo4ehBmgfs7zxoa6YmpATAwV3L7q1kS)

#### tanh vs ReLU
- A four-layer convolutional neural network with _ReLUs_ (solid line) reaches a 25% training error rate on CIFAR-10 six times faster than an equivalent network with _tanh_ neurons (dashed line).

![alexnet_2](https://drive.google.com/uc?export=view&id=1-Lj2xRBcdLM7RgjvHKHw_4HbpzdwvmpB)

### Training Detail
- 2 × GTX 580 GPUs with 3 GB of memory each; 1.2 million training examples
- Local Response Normalization
    - reduces the top-1 and top-5 error rates by 1.4% and 1.2%, respectively
- Overlapping Pooling 
    - size = 3, stride = 2 
    - reduces the top-1 and top-5 error rates by 0.4% and 0.3%, respectively 
- Stochastic gradient descent
    - batch size of 128, momentum of 0.9, and weight decay of 0.0005.
- Data Augmentation
    - Generating image translations (crop) and horizontal reflections 
    - Extracting random 224 x 224 patches from the 256 x 256 images 
    - Altering the intensities of the RGB channels in training images 
- Dropout (0.5) in the first 2 fc layers
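
The crop-and-reflect augmentation listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's pipeline: the function name `random_crop_and_flip`, the 50% flip probability, and the use of `numpy.random.default_rng` are assumptions made for the example.

```python
import numpy as np

def random_crop_and_flip(image, crop_size=224, rng=None):
    """Take a random crop_size x crop_size patch and mirror it half of the time.

    image: array of shape (height, width, channels), e.g. (256, 256, 3).
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape[:2]

    # Pick the top-left corner so that the patch stays inside the image.
    top = rng.integers(0, height - crop_size + 1)
    left = rng.integers(0, width - crop_size + 1)
    patch = image[top:top + crop_size, left:left + crop_size]

    # Horizontal reflection: reverse the width axis (assumed probability 0.5).
    if rng.random() < 0.5:
        patch = patch[:, ::-1]
    return patch

image = np.zeros((256, 256, 3), dtype=np.uint8)
patch = random_crop_and_flip(image, rng=np.random.default_rng(0))
print(patch.shape)
```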

#### Local Response Normalization
![alexnet_3](https://drive.google.com/uc?export=view&id=1MgnMHontPOOkp7Kh7vf1rUu1g0yHrSWK)
![alexnet_4](https://drive.google.com/uc?export=view&id=1do116Pe5bTmY2m2J9rpeVXqxgc2MLnGR)
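
For readers who prefer code to the formula, here is a rough NumPy sketch of local response normalization as defined in the AlexNet paper: each activation is divided by `(k + alpha * sum of squares over n neighbouring channels) ** beta`. The defaults `k=2`, `n=5`, `alpha=1e-4`, `beta=0.75` are the values reported in the paper; the function name and the channels-last layout are assumptions of this example.

```python
import numpy as np

def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize each channel by the activity of its n neighbouring channels.

    a: activations of shape (height, width, channels).
    """
    num_channels = a.shape[-1]
    squared = a.astype(np.float64) ** 2
    b = np.empty(a.shape, dtype=np.float64)
    for i in range(num_channels):
        # Neighbourhood of channel i, clipped at the first and last channel.
        lo = max(0, i - n // 2)
        hi = min(num_channels - 1, i + n // 2)
        denom = (k + alpha * squared[..., lo:hi + 1].sum(axis=-1)) ** beta
        b[..., i] = a[..., i] / denom
    return b

activations = np.ones((2, 2, 8))
normalized = local_response_norm(activations)
print(normalized.shape)
```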

### Results
- ILSVRC-2010
![alexnet_5](https://drive.google.com/uc?export=view&id=1RgdRVPzvuyF9VL1cYZuqGPOTBvYPBIXF)
- ILSVRC-2012
![alexnet_6](https://drive.google.com/uc?export=view&id=1t7hxnNCT6BNc1iwoxG-ZfAAXQCHPaxWO)
![alexnet_7](https://drive.google.com/uc?export=view&id=1zTu4uj3uE86DhBE0RKKq54396RrVVN9S)

In [2]:
from keras.layers import Dropout

#Build AlexNet model
def AlexNet(width, height, depth, classes):
    
    model = Sequential()
    
    #First Convolution and Pooling layer
    model.add(Conv2D(96,(11,11),strides=(4,4),input_shape=(width,height,depth),padding='valid',activation='relu'))
    model.add(MaxPooling2D(pool_size=(3,3),strides=(2,2)))
    
    #Second Convolution and Pooling layer
    model.add(Conv2D(256,(5,5),strides=(1,1),padding='same',activation='relu'))
    model.add(MaxPooling2D(pool_size=(3,3),strides=(2,2)))
    
    #Three convolution layers followed by a pooling layer
    model.add(Conv2D(384,(3,3),strides=(1,1),padding='same',activation='relu'))
    model.add(Conv2D(384,(3,3),strides=(1,1),padding='same',activation='relu'))
    model.add(Conv2D(256,(3,3),strides=(1,1),padding='same',activation='relu'))
    model.add(MaxPooling2D(pool_size=(3,3),strides=(2,2)))
    
    #Fully connected layers (dropout on the first two)
    model.add(Flatten())
    model.add(Dense(4096,activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096,activation='relu'))
    model.add(Dropout(0.5))
    
    #Classification layer (the third fully connected layer, followed by softmax)
    model.add(Dense(classes,activation='softmax'))

    return model
  
AlexNet_model = AlexNet(224, 224, 3, 100)
AlexNet_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_3 (Conv2D)            (None, 54, 54, 96)        34944     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 26, 26, 96)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 26, 26, 256)       614656    
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 12, 12, 256)       0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 12, 12, 384)       885120    
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 12, 12, 384)       1327488   
___________________________

---

## Network in Network
[_"Network in network."_](https://arxiv.org/abs/1312.4400.pdf) Lin, Min, Qiang Chen, and Shuicheng Yan.  arXiv preprint arXiv:1312.4400 (2013). 

- Enhance model discriminability for local patches within the receptive field 

### Model Architecture
#### Overview
![nin_2](https://drive.google.com/uc?export=view&id=1wNbDEBXHQYIvU8RzKuHTrO1NtYT0ggpC)

#### mlpconv
![nin_1](https://drive.google.com/uc?export=view&id=19rednh6LDUSCaC2ELTYOHchHlIYhn2M_)

#### Global Average Pooling
- Generate one feature map for each corresponding category
- Take the average of each feature map, and the resulting vector is fed directly into the softmax layer
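
The two steps above amount to a spatial mean followed by a softmax. The NumPy sketch below assumes a channels-last tensor whose last axis already has one feature map per category; the helper names are made up for the example.

```python
import numpy as np

def global_average_pooling(feature_maps):
    """(batch, height, width, classes) -> (batch, classes)."""
    return feature_maps.mean(axis=(1, 2))

def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Two images, 7x7 feature maps, 10 categories (arbitrary example sizes).
feature_maps = np.random.default_rng(0).normal(size=(2, 7, 7, 10))
probs = softmax(global_average_pooling(feature_maps))
print(probs.shape)
```

No weights are involved in this step, which is why global average pooling adds no parameters in place of the fully connected classifier.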


### Training
#### The regularization effect of dropout in between mlpconv layers
![nin_5](https://drive.google.com/uc?export=view&id=1HZxYpBsbFEN22iol338Ay7wPj1TV_2BP)

### Result - CIFAR-10
![nin_3](https://drive.google.com/uc?export=view&id=1B0fW_Om5n_y2OZxu_gTpH0kbq8EF96hQ)
![nin_4](https://drive.google.com/uc?export=view&id=12Y2w95CO0bKU3bDmFdiCiX9aAy9sBpMz)

### Result - CIFAR-100
![nin_6](https://drive.google.com/uc?export=view&id=1iIjMqW_c5qShrbcML3SKPML3QT1LZmZj)

### Result - SVHN 
![nin_7](https://drive.google.com/uc?export=view&id=1678sieIHCe4YpeD2BOx3SpTTxp43Jyud)
![nin_8](https://drive.google.com/uc?export=view&id=1KuW276jAlfsQLmp17HkMX6D890I6OSi9)

### Result - MNIST 
![nin_9](https://drive.google.com/uc?export=view&id=1JXRnNIB7rFTsz_85fDVMPzLsYXp722W6)

### Global Average Pooling vs fc layer
![nin_10](https://drive.google.com/uc?export=view&id=1MHhYiV5eQcP4louHuY4LiVPE2dyOC7Sn)

### Visualization
![nin_11](https://drive.google.com/uc?export=view&id=1fYBBRRvuH-pY_UyMJlXcUSskkG6iN006)


In [3]:
from keras.layers import GlobalAveragePooling2D, BatchNormalization, MaxPool2D
from keras.regularizers import l2

weight_decay = 1e-6

def mlpconv(model, conv_config):
    
    w, h, c = conv_config['conv2d_0_dim']
    if 'input_shape' in conv_config:
        model.add(Conv2D(c, (w, h), padding='same',
                         kernel_regularizer=l2(weight_decay),
                         kernel_initializer='he_normal',
                         input_shape=conv_config['input_shape']))
    else:
        model.add(Conv2D(c, (w, h), padding='same',
                         kernel_regularizer=l2(weight_decay),
                         kernel_initializer='he_normal'))

    model.add(BatchNormalization())
    model.add(Activation('relu'))

    w, h, c = conv_config['conv2d_1_dim']
    model.add(Conv2D(c, (w, h), padding='same',
                     kernel_regularizer=l2(weight_decay),
                     kernel_initializer='he_normal'))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    w, h, c = conv_config['conv2d_2_dim']    
    model.add(Conv2D(c, (w, h), padding='same',
                     kernel_regularizer=l2(weight_decay),
                     kernel_initializer='he_normal'))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    
    model.add(MaxPool2D(pool_size=(3,3),strides=(2,2),padding='same'))
    model.add(Dropout(0.2))
    return model
    
def NIN(width, height, depth, classes):
    model = Sequential()
    
    conv_config_0 = {
        'input_shape': (width, height, depth),  # shape of the input tensor
        'conv2d_0_dim': (5, 5, 192),  # GLM
        'conv2d_1_dim': (1, 1, 192),  # 1 X 1
        'conv2d_2_dim': (1, 1, 96),  # 1 X 1
    }
    conv_config_1 = {
        'conv2d_0_dim': (5, 5, 192),  # GLM
        'conv2d_1_dim': (1, 1, 192),  # 1 X 1
        'conv2d_2_dim': (1, 1, 96),  # 1 X 1
    }
    conv_config_2 = {
        'conv2d_0_dim': (5, 5, 192),  # GLM
        'conv2d_1_dim': (1, 1, 192),  # 1 X 1
        'conv2d_2_dim': (1, 1, classes),  # 1 X 1
    }
    model = mlpconv(model, conv_config_0)
    model = mlpconv(model, conv_config_1)
    model = mlpconv(model, conv_config_2)
    
    model.add(GlobalAveragePooling2D())
    model.add(Activation('softmax'))
    return model

NIN_model = NIN(224, 224, 3, 100)
NIN_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_8 (Conv2D)            (None, 224, 224, 192)     14592     
_________________________________________________________________
batch_normalization_1 (Batch (None, 224, 224, 192)     768       
_________________________________________________________________
activation_2 (Activation)    (None, 224, 224, 192)     0         
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 224, 224, 192)     37056     
_________________________________________________________________
batch_normalization_2 (Batch (None, 224, 224, 192)     768       
_________________________________________________________________
activation_3 (Activation)    (None, 224, 224, 192)     0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 224, 224, 96)      18528     
__________

---

## VGGNet
[_"Very deep convolutional networks for large-scale image recognition."_](https://arxiv.org/abs/1409.1556/) Simonyan, Karen, and Andrew Zisserman. arXiv preprint arXiv:1409.1556 (2015).

- Increasing **depth** using an architecture with very **small** (3×3) convolution filters

### 3x3 Filters
- 1 layer of 5×5 filters: number of weights per input-output channel pair = 5×5 = 25
- 2 stacked layers of 3×3 filters (same 5×5 receptive field): number of weights = 3×3 + 3×3 = 18

    **Number of parameters is reduced by 28%**
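
The same arithmetic holds once channels are included. The short sketch below assumes stride-1 convolutions with equal input and output channel counts and ignores biases; the channel count of 64 and the helper names are arbitrary choices for illustration.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Weights of one convolution layer, ignoring biases."""
    return kernel_size * kernel_size * in_channels * out_channels

def receptive_field(kernel_size, num_layers):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)

channels = 64  # assumed channel count, the ratio does not depend on it
one_5x5 = conv_weights(5, channels, channels)
two_3x3 = 2 * conv_weights(3, channels, channels)
saving = 1 - two_3x3 / one_5x5

print(receptive_field(5, 1), receptive_field(3, 2))
print(one_5x5, two_3x3, saving)
```

Besides the smaller parameter count, the stacked version applies two non-linearities instead of one over the same receptive field.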

![vgg_1](https://drive.google.com/uc?export=view&id=1N4RqAWIgd4wwnH28w5PHEFwIjn_ttomk)

### Model Architecture
![vgg_2](https://drive.google.com/uc?export=view&id=1h6cMk3K35BHHaXiByz2ElxX8rQLlTp2r)

### Training
- Batch size = 256
- Mini-batch gradient descent with momentum = 0.9
- Weight decay (L2 penalty multiplier set to 5e−4)
- Dropout = 0.5 for the first 2 fc layers
- Learning rate = 0.01, decreased by a factor of 10 when validation accuracy stops improving
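
The learning-rate rule in the last bullet can be written as a tiny schedule. This is only a sketch: it assumes the rate is cut as soon as one epoch fails to beat the best validation accuracy so far (a patience of zero, which the notes do not specify), and the function name is hypothetical.

```python
def plateau_schedule(val_accuracies, initial_lr=0.01, factor=10.0):
    """Return the learning rate used in each epoch.

    After an epoch whose validation accuracy does not improve on the best
    value seen so far, the rate for the following epochs is divided by factor.
    """
    lr = initial_lr
    best = float("-inf")
    rates = []
    for acc in val_accuracies:
        rates.append(lr)
        if acc > best:
            best = acc
        else:
            lr /= factor  # assumed: zero patience
    return rates

print(plateau_schedule([0.50, 0.60, 0.60, 0.65, 0.64]))
```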

#### Data
- Crop size = 224 x 224
- Single-scale training:
    - fixed ‘S’ = 256 or 384, where ‘S’ is the smallest side of the rescaled training image
- Multi-scale training:
    - ‘S’ is sampled from the range [Smin, Smax], Smin = 256, Smax = 512
- Multi-scale models are initialized from a single-scale model with the same configuration, pre-trained with fixed ‘S’ = 384.
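
The scale-jittering procedure above can be sketched with the standard library alone. The code only computes sizes and crop coordinates and does not touch pixels; the function name, the uniform integer sampling of S, and the return layout are assumptions for illustration, and `s_min` is assumed to be at least the crop size.

```python
import random

def sample_training_crop(width, height, s_min=256, s_max=512, crop=224, rng=random):
    """Sample a training scale S and a random crop box on the rescaled image.

    Returns (S, (new_width, new_height), (left, top, right, bottom)).
    Setting s_min == s_max gives single-scale training.
    """
    s = rng.randint(s_min, s_max)      # training scale: target smallest side
    scale = s / min(width, height)     # isotropic rescaling factor
    new_w = round(width * scale)
    new_h = round(height * scale)

    # Random crop position inside the rescaled image.
    left = rng.randint(0, new_w - crop)
    top = rng.randint(0, new_h - crop)
    return s, (new_w, new_h), (left, top, left + crop, top + crop)

s, size, box = sample_training_crop(640, 480, rng=random.Random(0))
print(s, size, box)
```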

### Results
![vgg_3](https://drive.google.com/uc?export=view&id=15md49vDtOx2-6wOvdMW9Uldti6GRJui4)
![vgg_4](https://drive.google.com/uc?export=view&id=1UpKl1AqUWJ-OhsozI__Y-Z1vV_Irmy9b)
- Adding LRN (model A-LRN) does not improve on model A, which has no normalization layers
- Classification error decreases with increased Conv depth

![vgg_4](https://drive.google.com/uc?export=view&id=1sESpvs676Db6UBbSErtOUAax9buOIIYs)


In [4]:
def VGG16Net(width, height, depth, classes):
    
    model = Sequential()
    
    model.add(Conv2D(64,(3,3),strides=(1,1),input_shape=(width,height,depth),padding='same',activation='relu'))
    model.add(Conv2D(64,(3,3),strides=(1,1),padding='same',activation='relu'))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Conv2D(128,(3,3),strides=(1,1),padding='same',activation='relu'))
    model.add(Conv2D(128,(3,3),strides=(1,1),padding='same',activation='relu'))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Conv2D(256,(3,3),strides=(1,1),padding='same',activation='relu'))
    model.add(Conv2D(256,(3,3),strides=(1,1),padding='same',activation='relu'))
    model.add(Conv2D(256,(3,3),strides=(1,1),padding='same',activation='relu'))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Conv2D(512,(3,3),strides=(1,1),padding='same',activation='relu'))
    model.add(Conv2D(512,(3,3),strides=(1,1),padding='same',activation='relu'))
    model.add(Conv2D(512,(3,3),strides=(1,1),padding='same',activation='relu'))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Conv2D(512,(3,3),strides=(1,1),padding='same',activation='relu'))
    model.add(Conv2D(512,(3,3),strides=(1,1),padding='same',activation='relu'))
    model.add(Conv2D(512,(3,3),strides=(1,1),padding='same',activation='relu'))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Flatten())
    model.add(Dense(4096,activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096,activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(classes,activation='softmax'))
    
    return model
  
VGG16_model = VGG16Net(224, 224, 3, 100)
VGG16_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_17 (Conv2D)           (None, 224, 224, 64)      1792      
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 224, 224, 64)      36928     
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 112, 112, 64)      0         
_________________________________________________________________
conv2d_19 (Conv2D)           (None, 112, 112, 128)     73856     
_________________________________________________________________
conv2d_20 (Conv2D)           (None, 112, 112, 128)     147584    
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 56, 56, 128)       0         
_________________________________________________________________
conv2d_21 (Conv2D)           (None, 56, 56, 256)       295168    
__________

---

## SPPNet
[_"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition"_](https://arxiv.org/abs/1406.4729) He, K., Zhang, X., Ren, S. and Sun, J. arXiv:1406.4729 (2014).

- Limitation of conventional CNNs: they require a fixed input image size (e.g., 224×224), which limits both the aspect ratio and the scale of the input image
- SPP-net generates a fixed-length representation regardless of image size/scale

![spp_1](https://drive.google.com/uc?export=view&id=1MP-4L3v8EeV8UnXiTE2gUPhCbi-4uaO2)
<!--
![spp_2](https://drive.google.com/uc?export=view&id=1z77Pu-oYdsRSMAA-jAz8T6Q8vLNGSAs7)
--->

### Model Architecture
![spp_3](https://drive.google.com/uc?export=view&id=1Mg7cmrPz3Lubn9aMtQAb5Z78G6dgV1Xa)

### Training
- Single-size training
    - fixed-size input (224×224) cropped from images
    - compute the bin sizes first

- Multi-size training
    - consider two sizes: 180×180 in addition to 224×224
    - train each full epoch on one network, and then switch to the other one (keeping all weights)
    - resize instead of crop
    
- Data Augmentation
    - Horizontal flipping
    - Color altering
    - Dropout on 2 fc
    
- **NOTE** <br>
The above single/multi-size solutions are for training only. <br>
At the testing stage, it is straightforward to apply SPP-net to images of any size.
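
The "compute the bin sizes first" step of single-size training can be made concrete. Following the rule given in the SPP-net paper, for an a×a feature map and an n×n pyramid level the pooling window is ⌈a/n⌉ and the stride is ⌊a/n⌋. The sketch below applies it to a 13×13 map with a 3-level pyramid; the function name and the dictionary output are choices made for this example.

```python
import math

def spp_windows(feature_size, pyramid_levels=(3, 2, 1)):
    """Map each pyramid level n to its (window, stride) for an a x a feature map."""
    windows = {}
    for n in pyramid_levels:
        window = math.ceil(feature_size / n)
        stride = math.floor(feature_size / n)
        windows[n] = (window, stride)
    return windows

# 13x13 is the conv5 feature-map size for a 224x224 input in the paper's example.
print(spp_windows(13))
```

For multi-size training the same rule is simply re-evaluated for the feature-map size produced by the other input size, so the pooled output keeps the same length.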

### Results
- ImageNet 2012 (standard 10-view)
![spp_4](https://drive.google.com/uc?export=view&id=1N1y-9NhJ3fjCa5W0jbIU4ChxwXhXMDPc)

- ImageNet 2012 (single view)
![spp_5](https://drive.google.com/uc?export=view&id=1YtlIB1zun2yI94BgoORxz-RbbY1Ic7M3)

- ImageNet 2012 (multi-view)
![spp_6](https://drive.google.com/uc?export=view&id=1uDO6EJGKZmLgy6Kr2ftrs5AWOZ7WZQRi)

- ImageNet 2014 :(
![spp_6](https://drive.google.com/uc?export=view&id=1Jx7otVjOm0LyTdbEgpEwkJsIZUsdYaC7)


In [5]:
import numpy as np
import keras.backend as K

def SPP(x, pool_list=[1, 2, 4]):
    x = K.variable(x)
    outputs = []
    input_shape = K.shape(x)
    num_batch, num_rows, num_cols, depth = input_shape[0], input_shape[1], input_shape[2], input_shape[3]
    row_length = [K.cast(num_rows, 'float32') / i for i in pool_list]
    col_length = [K.cast(num_cols, 'float32') / i for i in pool_list]
        
    for pool_num, num_pool_regions in enumerate(pool_list):
        for jy in range(num_pool_regions):
            for ix in range(num_pool_regions):
                x1 = ix * col_length[pool_num]
                x2 = ix * col_length[pool_num] + col_length[pool_num]
                y1 = jy * row_length[pool_num]
                y2 = jy * row_length[pool_num] + row_length[pool_num]

                x1 = K.cast(K.round(x1), 'int32')
                x2 = K.cast(K.round(x2), 'int32')
                y1 = K.cast(K.round(y1), 'int32')
                y2 = K.cast(K.round(y2), 'int32')

                new_shape = [num_batch, y2 - y1, x2 - x1, depth]
                x_crop = x[:, y1:y2, x1:x2, :]

                xm = K.reshape(x_crop, new_shape)
                pooled_val = K.max(xm, axis=(1, 2))
                outputs.append(pooled_val)

    outputs = K.concatenate(outputs)
    return outputs


print(SPP(np.random.rand(1, 224, 224, 1), pool_list=[1, 2, 4]))
print(SPP(np.random.rand(1, 164, 164, 1), pool_list=[1, 2, 4]))

Tensor("concat:0", shape=(1, 21), dtype=float32)
Tensor("concat_1:0", shape=(1, 21), dtype=float32)
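
The cell above builds symbolic tensors, so it prints shapes rather than values. As a cross-check, here is a plain NumPy sketch of the same idea that returns concrete numbers. It is an assumed re-implementation for illustration, not the code from the referenced repository: bin edges are obtained by rounding evenly spaced positions, and each bin is max-pooled.

```python
import numpy as np

def spp_numpy(x, pool_list=(1, 2, 4)):
    """Spatial pyramid max pooling on a (batch, rows, cols, depth) array.

    Returns an array of shape (batch, depth * sum(n * n for n in pool_list)),
    independent of rows and cols.
    """
    _, rows, cols, _ = x.shape
    outputs = []
    for n in pool_list:
        # n + 1 bin edges along each axis, rounded to integer pixel positions.
        row_edges = np.round(np.linspace(0, rows, n + 1)).astype(int)
        col_edges = np.round(np.linspace(0, cols, n + 1)).astype(int)
        for j in range(n):
            for i in range(n):
                region = x[:, row_edges[j]:row_edges[j + 1],
                           col_edges[i]:col_edges[i + 1], :]
                outputs.append(region.max(axis=(1, 2)))
    return np.concatenate(outputs, axis=-1)

rng = np.random.default_rng(0)
out_224 = spp_numpy(rng.random((1, 224, 224, 1)))
out_164 = spp_numpy(rng.random((1, 164, 164, 1)))
print(out_224.shape, out_164.shape)
```

Both inputs give a vector of length 1 + 4 + 16 = 21 per channel, which is the fixed-length property SPP-net relies on.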


---

## Hands-on

---

## Reference
> [[Machine Learning ML NOTE] The Evolution of CNNs (AlexNet, VGG, Inception, ResNet) + Keras Coding](https://medium.com/%E9%9B%9E%E9%9B%9E%E8%88%87%E5%85%94%E5%85%94%E7%9A%84%E5%B7%A5%E7%A8%8B%E4%B8%96%E7%95%8C/%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92-ml-note-cnn%E6%BC%94%E5%8C%96%E5%8F%B2-alexnet-vgg-inception-resnet-keras-coding-668f74879306)

> [LeNet-5, convolutional neural networks](http://yann.lecun.com/exdb/lenet/)

> https://github.com/yhenon/keras-spp