> # CNN Structure

A typical CNN structure looks like the image below.

![CNNstruc.png](attachment:4cef3f84-3bd8-489e-990d-33ed31b51735.png)

The image gets smaller (in height and width) as it passes through the network, but it also gets deeper (more feature maps) because of the convolutional layers.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
import warnings ; warnings.filterwarnings("ignore")

In [2]:
# Load Fashion MNIST: 60,000 training and 10,000 test images of 28x28 pixels
fashion_data = keras.datasets.fashion_mnist
(X_train, y_train), (X_test, y_test) = fashion_data.load_data()

In [12]:
# Scale pixel values to [0, 1] and hold out the first 5,000 training images for validation
X_train_scd, X_valid = (X_train[5000:] / 255.0), (X_train[:5000] / 255.0)
y_train_scd, y_valid = y_train[5000:], y_train[:5000]
X_test_scd = X_test / 255.0

In [13]:
model = keras.models.Sequential([
    keras.layers.Conv2D(64, 7, activation="relu", padding="same", input_shape=[28, 28, 1]),
    keras.layers.MaxPooling2D(2),
    keras.layers.Conv2D(128, 3, activation="relu", padding="same"),
    keras.layers.Conv2D(128, 3, activation="relu", padding="same"),
    keras.layers.MaxPooling2D(2),
    keras.layers.Conv2D(256, 3, activation="relu", padding="same"),
    keras.layers.Conv2D(256, 3, activation="relu", padding="same"),
    keras.layers.MaxPooling2D(2),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation="softmax")
])
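
To make the "smaller but deeper" behavior concrete, the sketch below traces the tensor shape through the convolutional part of the model above without building it. The helper names `conv2d_same` and `max_pool` are hypothetical, and the arithmetic assumes stride 1 with "same" padding for the convolutions and non-overlapping 2×2 pooling, as in the cell above.

```python
def conv2d_same(shape, filters):
    """Conv2D with stride 1 and 'same' padding: height/width kept, depth = filters."""
    height, width, _ = shape
    return (height, width, filters)

def max_pool(shape, pool_size=2):
    """Max pooling with stride == pool_size and no padding: height/width floor-divided."""
    height, width, depth = shape
    return (height // pool_size, width // pool_size, depth)

layers = [
    ("Conv2D(64, 7)", lambda s: conv2d_same(s, 64)),
    ("MaxPooling2D(2)", max_pool),
    ("Conv2D(128, 3)", lambda s: conv2d_same(s, 128)),
    ("Conv2D(128, 3)", lambda s: conv2d_same(s, 128)),
    ("MaxPooling2D(2)", max_pool),
    ("Conv2D(256, 3)", lambda s: conv2d_same(s, 256)),
    ("Conv2D(256, 3)", lambda s: conv2d_same(s, 256)),
    ("MaxPooling2D(2)", max_pool),
]

shape = (28, 28, 1)  # one grayscale Fashion MNIST image
shapes = [shape]
for name, layer in layers:
    shape = layer(shape)
    shapes.append(shape)
    print(f"{name:<16} -> {shape}")

# Size of the vector produced by Flatten() after the last pooling layer
flattened = shape[0] * shape[1] * shape[2]
print("Flatten          ->", flattened)
```

The spatial size shrinks from 28 to 14 to 7 to 3, while the depth grows from 1 to 256.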

In [14]:
model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

In [15]:
history = model.fit(X_train_scd, y_train_scd, epochs=30, validation_data=(X_valid, y_valid))

Epoch 1/30
...
Epoch 30/30


In [18]:
model.save("model.h5")

In [19]:
model.evaluate(X_test_scd, y_test)



[0.29053187370300293, 0.8978000283241272]

There are various CNN architectures.

#### - LeNet-5

LeNet-5 is used for handwritten digit recognition (MNIST).

![lenet5.jpg](attachment:f1f5612c-6634-417e-b6e3-03b184ce5532.jpg)

Zero padding is applied to the images, so each 28×28 image becomes a 32×32 image before it enters the network. In the output layer, each neuron outputs the squared Euclidean distance between its input vector and its weight vector. Each output measures how strongly the image belongs to a particular class.
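
As a rough illustration of these two ideas, the NumPy sketch below pads a 28×28 image to 32×32 and computes a distance-based output layer. The sizes (84 inputs to the output layer, 10 classes) and the random weights are assumptions chosen for the example, not values taken from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero padding: add 2 pixels of zeros on every side, 28x28 -> 32x32
image = rng.random((28, 28))
padded = np.pad(image, pad_width=2, mode="constant", constant_values=0)
print(padded.shape)

# Distance-based output layer (assumed sizes: 84 inputs, 10 classes)
x = rng.normal(size=84)           # input vector to the output layer
W = rng.normal(size=(10, 84))     # one weight vector per class
W[3] = x                          # make class 3 a perfect match for the demo

# Each output is the squared Euclidean distance between x and one weight vector
outputs = np.sum((x - W) ** 2, axis=1)

# The smallest distance marks the best-matching class
predicted_class = int(np.argmin(outputs))
print(predicted_class)
```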

#### - AlexNet

This structure is similar to LeNet-5, but much larger and deeper. It also stacks convolutional layers directly on top of one another, instead of placing a pooling layer after every convolutional layer.

![AlexNet.jpg](attachment:259d3041-33c6-49e0-b503-f61753c61a8c.jpg)

To reduce overfitting, we can use data augmentation. This artificially generates new training samples by shifting, rotating, or flipping the training images. The model then becomes less sensitive to changes in an object's position, orientation, or scale, and the number of training samples increases as well.
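
A minimal NumPy sketch of this idea follows. The helpers `shift_image` and `augment` are hypothetical names introduced here, and only shifts and a horizontal flip are shown; rotations and rescaling are left out to keep the example short.

```python
import numpy as np

def shift_image(img, dx, dy):
    """Shift a 2D image by dx columns and dy rows, filling vacated pixels with zeros."""
    h, w = img.shape
    out = np.zeros_like(img)
    # Part of the source image that stays inside the frame after the shift
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

def augment(img):
    """Return the original image plus a few shifted and flipped variants."""
    return [
        img,
        np.fliplr(img),              # horizontal flip
        shift_image(img, 2, 0),      # 2 pixels to the right
        shift_image(img, -2, 0),     # 2 pixels to the left
        shift_image(img, 0, 2),      # 2 pixels down
        shift_image(img, 0, -2),     # 2 pixels up
    ]

rng = np.random.default_rng(0)
sample = rng.random((28, 28))
variants = augment(sample)
print(len(variants), variants[0].shape)
```

Each variant keeps the label of the original image, so one labeled sample becomes six.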

AlexNet also uses a competitive normalization step called local response normalization (LRN). The most strongly activated neurons inhibit the neurons located at the same position in neighboring feature maps. This encourages the feature maps to specialize and differ from one another, so together they explore a wider range of features, which improves generalization.

$b_i = a_i \left(k+\alpha \displaystyle\sum^{j_{high}}_{j=j_{low}} a_j^2\right)^{-\beta}$

$j_{high} = \min\left(i+\frac{r}{2}, f_n-1\right) \quad \quad j_{low} = \max\left(0,i-\frac{r}{2}\right)$

$b_i$ is the normalized output of the neuron located in the $i$th feature map, at row $u$ and column $v$. Only neurons at the same row and column are involved, so $u$ and $v$ do not appear in the equation. $a_i$ is the activation of that neuron after the ReLU step but before normalization. $k, \alpha, \beta, r$ are hyperparameters: $k$ is the bias and $r$ is the depth radius. $f_n$ is the number of feature maps. If $r=2$ and a neuron is strongly activated, it suppresses the activation of the neurons in the feature maps immediately above and below its own.
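
The formula can be sketched directly in NumPy. The function name `local_response_norm` and the hyperparameter values used in the demo are illustrative assumptions, and $r/2$ is taken as integer division, which is one possible reading when $r$ is odd.

```python
import numpy as np

def local_response_norm(a, k=1.0, alpha=1.0, beta=0.5, r=2):
    """Apply LRN across the last axis (the feature map axis) of `a`.

    Implements b_i = a_i * (k + alpha * sum_{j=j_low}^{j_high} a_j^2) ** (-beta).
    The default hyperparameters are illustrative only.
    """
    a = np.asarray(a, dtype=float)
    f_n = a.shape[-1]                       # number of feature maps
    b = np.empty_like(a)
    for i in range(f_n):
        j_low = max(0, i - r // 2)
        j_high = min(i + r // 2, f_n - 1)
        # Sum of squared activations over the neighboring feature maps
        sq_sum = np.sum(a[..., j_low:j_high + 1] ** 2, axis=-1)
        b[..., i] = a[..., i] * (k + alpha * sq_sum) ** (-beta)
    return b

# One pixel position with three feature maps
activations = np.array([[0.0, 3.0, 4.0]])
normalized = local_response_norm(activations)
print(normalized)
```

With $r=2$, each feature map is normalized using itself and its immediate neighbors, so the strong activation in the third map reduces the output of the second.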

#### - GoogLeNet

This network is built from subnetworks called inception modules, which let it use its parameters much more efficiently.

![inception.jpg](attachment:ab72f892-4d74-4e33-bebe-fee2ed81e865.jpg)

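All layers use a stride of 1, "same" padding, and the ReLU activation function. The input signal is copied and fed to four different layers. The layers in the second set use different kernel sizes, so they can recognize patterns at different scales. Because every layer uses a stride of 1 and "same" padding, all outputs have the same height and width as the input, and they can be concatenated along the depth dimension in the depth concatenation layer.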
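
The parameter savings can be checked with simple arithmetic. The sketch below assumes that the module places a 1×1 convolution in front of its larger kernels, and it uses channel counts (192 input maps; branches of 64, 128, 32, and 32 output maps) chosen for illustration rather than taken from the text above. The helper `conv_params` is a hypothetical name.

```python
def conv_params(kernel, c_in, c_out):
    """Number of weights plus biases in a kernel x kernel convolutional layer."""
    return kernel * kernel * c_in * c_out + c_out

c_in = 192  # assumed number of input feature maps

# A 5x5 convolution applied directly to the input
direct_5x5 = conv_params(5, c_in, 32)

# The same 5x5 convolution placed after a 1x1 layer that first reduces the depth to 16
bottleneck_5x5 = conv_params(1, c_in, 16) + conv_params(5, 16, 32)

print("direct 5x5:       ", direct_5x5)
print("1x1 then 5x5:     ", bottleneck_5x5)

# Depth concatenation: heights and widths match, so the depths simply add up
branch_channels = [64, 128, 32, 32]
concat_depth = sum(branch_channels)
print("concatenated depth:", concat_depth)
```

In this configuration the 1×1 layer cuts the parameter count of the 5×5 branch by roughly a factor of ten.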



![Googlenet.jpg](attachment:425b569b-0bd5-4675-b09d-b2c769ddc0e9.jpg)

The first two layers divide the image's height and width by 4. The LRN layer encourages the previous layers to learn a wide variety of features. Of the next two layers, the first acts as a bottleneck. A max pooling layer then halves the image's height and width. A stack of nine inception modules follows, with two max pooling layers placed between them to reduce dimensionality and speed up computation. The average pooling layer (AvgPool) then outputs the mean of each feature map. Finally, a dropout layer provides regularization, and a dense layer with 1,000 units and a softmax activation outputs the probability of belonging to each class.
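
The last steps (averaging each feature map, then a 1,000-unit softmax layer) can be sketched in NumPy. The 7×7×1024 input shape and the random weights are assumptions made for the example, and dropout is omitted because it is only active during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed output of the last inception module: 7x7 spatial size, 1024 feature maps
feature_maps = rng.normal(size=(7, 7, 1024))

# Average pooling over the whole map: one mean value per feature map
pooled = feature_maps.mean(axis=(0, 1))
print(pooled.shape)

# Dense layer with 1,000 units (random weights stand in for trained ones)
W = rng.normal(scale=0.01, size=(1024, 1000))
b = np.zeros(1000)
logits = pooled @ W + b

# Softmax turns the logits into class probabilities (shifted for numerical stability)
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()
print(probs.shape, probs.sum())
```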

#### - ResNet

The core element of this network is the skip connection: the signal fed into a layer is also added to the output of a layer located higher up the stack.

The goal of training a neural network is to make it model a target function $h(\mathbf{x})$. If we add the input $\mathbf{x}$ to the output of the network, the network is forced to learn $f(\mathbf{x})=h(\mathbf{x})-\mathbf{x}$ instead of $h(\mathbf{x})$. This is called residual learning.

When we initialize a regular neural network, its weights are close to 0, so the network outputs values close to 0. If we add a skip connection, the network instead outputs a copy of its input, so it initially models the identity function. With many skip connections, the network can start making progress even if some layers have not started learning yet. A deep residual network can be seen as a stack of residual units, where each residual unit is a small neural network with a skip connection.
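
This behavior at initialization is easy to reproduce. The sketch below uses a small two-layer network with near-zero random weights; the layer sizes and the weight scale are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                  # a batch of 4 inputs with 8 features

# Near-zero weights, as right after initialization
W1 = rng.normal(scale=1e-3, size=(8, 8))
W2 = rng.normal(scale=1e-3, size=(8, 8))

def f(x):
    """Two-layer network: linear -> ReLU -> linear."""
    return np.maximum(x @ W1, 0) @ W2

plain_out = f(x)            # regular network: output is close to 0
residual_out = x + f(x)     # with a skip connection: output is close to x

print(np.abs(plain_out).max())
print(np.abs(residual_out - x).max())
```

Both printed values are tiny: the plain network outputs almost nothing, while the residual version passes its input through almost unchanged.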

![RI.png](attachment:77e1a734-b5fb-4b69-9bf9-f0f297d61995.png)

This structure starts and ends in the same way as GoogLeNet. Each residual unit consists of two convolutional layers with 3×3 kernels, batch normalization, and ReLU activation, and these layers preserve the spatial dimensions. Every few residual units, the number of feature maps doubles while the height and width are halved. When this happens, the input has a different shape from the output of the residual unit, so the two cannot be added directly. To solve this problem, we pass the input through a 1×1 convolutional layer with a stride of 2 and the same number of output feature maps as the residual unit.
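
The shape fix on the skip connection can be sketched in NumPy. A 1×1 convolution with stride $s$ keeps every $s$th pixel and applies the same linear map across channels at each kept pixel. The helper name `conv1x1`, the tensor sizes, and the random stand-in for the output of the two 3×3 layers are all assumptions for illustration.

```python
import numpy as np

def conv1x1(x, W, stride=1):
    """1x1 convolution without bias: subsample spatially, then mix channels per pixel."""
    return x[::stride, ::stride, :] @ W

rng = np.random.default_rng(0)

x = rng.normal(size=(56, 56, 64))                 # input to the residual unit
main_path_out = rng.normal(size=(28, 28, 128))    # stand-in for the two 3x3 conv layers

# Skip path: a stride of 2 halves height and width; W_skip doubles the depth
W_skip = rng.normal(scale=0.1, size=(64, 128))
shortcut = conv1x1(x, W_skip, stride=2)

# Shapes now match, so the two paths can be added before the final ReLU
unit_out = np.maximum(main_path_out + shortcut, 0)
print(shortcut.shape, unit_out.shape)
```

With a stride of 1 the shortcut would keep its 56×56 size and could not be added to the 28×28 main path.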

![resnet.png](attachment:813977be-ac82-4847-a03b-674ff99818c9.png)

#### - Xception

Xception combines the ideas of GoogLeNet and ResNet, but it replaces the inception modules with depthwise separable convolution layers. Such a layer assumes that spatial patterns and cross-channel patterns can be modeled separately. First, it applies a single spatial filter to each input feature map. Next, it looks for cross-channel patterns. Because a separable layer has only one spatial filter per input channel, we should avoid using it right after a layer with few channels, such as the input layer. For this reason, Xception starts with two regular convolutional layers, and the remaining layers are separable convolution layers.
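
The two steps can be written out in NumPy. This is a minimal sketch that assumes no padding, a stride of 1, and no bias; the function name `depthwise_separable_conv` and the tensor sizes are made up for the example.

```python
import numpy as np

def depthwise_separable_conv(x, depthwise, pointwise):
    """Depthwise separable convolution (no padding, stride 1, no bias).

    x:         (height, width, c_in) input feature maps
    depthwise: (k, k, c_in) one spatial filter per input channel
    pointwise: (c_in, c_out) 1x1 filters that combine channels
    """
    h, w, c_in = x.shape
    k = depthwise.shape[0]
    out_h, out_w = h - k + 1, w - k + 1

    # Step 1: spatial filtering, each channel with its own k x k filter
    spatial = np.zeros((out_h, out_w, c_in))
    for di in range(k):
        for dj in range(k):
            spatial += x[di:di + out_h, dj:dj + out_w, :] * depthwise[di, dj, :]

    # Step 2: cross-channel patterns, a 1x1 convolution at every pixel
    return spatial @ pointwise

x = np.ones((5, 5, 3))
depthwise = np.ones((3, 3, 3))
pointwise = np.ones((3, 4))
out = depthwise_separable_conv(x, depthwise, pointwise)
print(out.shape)

# Weight counts: separable layer versus a regular 3x3 convolution (biases ignored)
separable_weights = depthwise.size + pointwise.size
regular_weights = 3 * 3 * 3 * 4
print(separable_weights, regular_weights)
```

The separable layer needs far fewer weights than a regular convolution with the same kernel size and channel counts, and the gap widens as the number of channels grows.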

#### - SENet

SENet improves the performance of an existing architecture by adding an SE block to every unit of the original structure.

![SE.png](attachment:b8bb59bc-e555-4f62-a97b-d95f4a7e95fa.png)
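
As a rough sketch of what an SE block does, the NumPy code below averages each feature map, passes the result through a small two-layer network ending in a sigmoid, and uses the resulting per-channel values to rescale the feature maps. The function name `se_block`, the layer sizes, and the random weights are assumptions for illustration, and biases are omitted.

```python
import numpy as np

def se_block(x, W1, W2):
    """Rescale each feature map of x (height, width, channels) by a learned factor in (0, 1)."""
    squeezed = x.mean(axis=(0, 1))                # one average per feature map
    hidden = np.maximum(squeezed @ W1, 0)         # small dense layer with ReLU
    scale = 1 / (1 + np.exp(-(hidden @ W2)))      # dense layer with sigmoid, one factor per map
    return x * scale                              # broadcast over height and width

rng = np.random.default_rng(0)
channels, reduced = 16, 2                         # assumed sizes for the demo

x = rng.normal(size=(4, 4, channels))
W1 = rng.normal(size=(channels, reduced))
W2 = rng.normal(size=(reduced, channels))

recalibrated = se_block(x, W1, W2)
print(recalibrated.shape)
```

The output has the same shape as the input, so the block can be attached to an existing unit without changing the rest of the network.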