# ImageNet Classification with Deep Convolutional Neural Networks


---
## Architecture

The following is an excerpt from the paper "ImageNet Classification with Deep Convolutional Neural Networks".

### 3.5 Overall Architecture

Now we are ready to describe the overall architecture of our CNN. As depicted in Figure 2, the net contains eight layers with weights; the first five are convolutional and the remaining three are fully-connected. The output of the last fully-connected layer is fed to a 1000-way softmax which produces a distribution over the 1000 class labels. Our network maximizes the multinomial logistic regression objective, which is equivalent to maximizing the average across training cases of the log-probability of the correct label under the prediction distribution.
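
As a rough illustration of the objective described above, the following sketch computes a softmax over raw scores and the average log-probability of the correct label. The function names and the toy 3-class batch are assumptions made for this example; the paper works with 1000 classes.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_log_prob(batch_logits, labels):
    """Average log-probability assigned to the correct label."""
    log_probs = [math.log(softmax(logits)[y])
                 for logits, y in zip(batch_logits, labels)]
    return sum(log_probs) / len(log_probs)

# Toy batch (hypothetical values): 2 examples, 3 classes.
toy_logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
toy_labels = [0, 2]
objective = mean_log_prob(toy_logits, toy_labels)  # training maximizes this value
```

Maximizing this quantity is the same as minimizing the categorical cross-entropy loss over one-hot labels.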

The kernels of the second, fourth, and fifth convolutional layers are connected only to those kernel maps in the previous layer which reside on the same GPU (see Figure 2). The kernels of the third convolutional layer are connected to all kernel maps in the second layer. The neurons in the fully-connected layers are connected to all neurons in the previous layer. Response-normalization layers follow the first and second convolutional layers. Max-pooling layers, of the kind described in Section 3.4, follow both response-normalization layers as well as the fifth convolutional layer. The ReLU non-linearity is applied to the output of every convolutional and fully-connected layer.

The first convolutional layer filters the 224×224×3 input image with 96 kernels of size 11×11×3 with a stride of 4 pixels (this is the distance between the receptive field centers of neighboring neurons in a kernel map). The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5 × 5 × 48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3 × 3 × 256 connected to the (normalized, pooled) outputs of the second convolutional layer. The fourth convolutional layer has 384 kernels of size 3 × 3 × 192, and the fifth convolutional layer has 256 kernels of size 3 × 3 × 192. The fully-connected layers have 4096 neurons each.
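
The kernel depths quoted above (48, 256, 192, 192) reflect the two-GPU split: layers 2, 4, and 5 see only the maps on their own GPU (half of 96 or 384), while layer 3 sees all 256 maps. The sketch below counts the convolutional weights implied by those kernel sizes. The table and helper names are assumptions for illustration, and biases are left out.

```python
# (layer, number of kernels, kernel height, kernel width, kernel depth), as quoted above.
PAPER_KERNELS = [
    ("conv1", 96, 11, 11, 3),
    ("conv2", 256, 5, 5, 48),
    ("conv3", 384, 3, 3, 256),
    ("conv4", 384, 3, 3, 192),
    ("conv5", 256, 3, 3, 192),
]

def kernel_weights(count, height, width, depth):
    """Number of weights (biases excluded) in one convolutional layer."""
    return count * height * width * depth

weights = {name: kernel_weights(n, h, w, d) for name, n, h, w, d in PAPER_KERNELS}
total_conv_weights = sum(weights.values())

for name, value in weights.items():
    print(f"{name}: {value:,} weights")
print(f"total: {total_conv_weights:,} weights")
```

A single-tower Keras model without the GPU split uses the full input depth in every layer, so its second, fourth, and fifth layers hold twice as many weights as listed here.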

![Figure 2](https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Ft1.daumcdn.net%2Fcfile%2Ftistory%2F99FEB93C5C80B5192E "AlexNet architecture diagram")

Figure 2: An illustration of the architecture of our CNN, explicitly showing the delineation of responsibilities between the two GPUs. One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts at the bottom. The GPUs communicate only at certain layers. The network’s input is 150,528-dimensional, and the number of neurons in the network’s remaining layers is given by 253,440–186,624–64,896–64,896–43,264–4096–4096–1000.
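
Several of the neuron counts in the caption can be checked with simple arithmetic. The sketch below assumes 27×27 feature maps after the second convolutional layer and 13×13 maps for layers three to five; those map sizes are assumptions, since the excerpt does not state them. The first hidden count (253,440) is not reproduced here.

```python
# (layer, map height, map width, channels).
# The 27x27 and 13x13 map sizes are assumptions, not taken from the excerpt.
ASSUMED_MAPS = [
    ("input", 224, 224, 3),
    ("conv2", 27, 27, 256),
    ("conv3", 13, 13, 384),
    ("conv4", 13, 13, 384),
    ("conv5", 13, 13, 256),
]

neuron_counts = {name: h * w * c for name, h, w, c in ASSUMED_MAPS}

for name, count in neuron_counts.items():
    print(f"{name}: {count:,}")
```

Under these assumptions the results agree with the caption: 150,528 for the input, then 186,624, 64,896, 64,896, and 43,264.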

---

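Based on the excerpt above, the structure of AlexNet can be summarized as follows.

* Eight layers with weights in total
    * 5 convolutional layers
    * 3 fully-connected layers
* The output of the last layer is fed to a 1000-way softmax, which yields a distribution over the class labels.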


See also: [AlexNet implementation using Keras](https://engmrk.com/alexnet-implementation-using-keras/)


---

## Local Response Normalization

Keras reportedly no longer provides a Local Response Normalization layer, so this notebook reuses the LRN implementation from [this page](https://datascienceschool.net/view-notebook/d19e803640094f76b93f11b850b920a4/).

References:
* [Difference between Local Response Normalization and Batch Normalization](https://towardsdatascience.com/difference-between-local-response-normalization-and-batch-normalization-272308c034ac)
* [AlexNet - Data Science School](https://datascienceschool.net/view-notebook/d19e803640094f76b93f11b850b920a4/)

In [1]:
from tensorflow.keras.layers import Layer
from tensorflow.keras import backend

class LocalResponseNormalization(Layer):

    def __init__(self, n=5, alpha=1e-4, beta=0.75, k=2, **kwargs):
        self.n = n
        self.alpha = alpha
        self.beta = beta
        self.k = k
        super(LocalResponseNormalization, self).__init__(**kwargs)

    def build(self, input_shape):
        self.shape = input_shape
        super(LocalResponseNormalization, self).build(input_shape)

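    def call(self, x):
        f = self.shape[-1]  # number of channels (channels-last input)
        squared = backend.square(x)
        # Average the squared activations over an n x n spatial window ...
        pooled = backend.pool2d(squared, (self.n, self.n), strides=(1,1), padding="same", pool_mode='avg')
        # ... then sum over all channels and broadcast the result back to f channels.
        summed = backend.sum(pooled, axis=3, keepdims=True)
        averaged = self.alpha * backend.repeat_elements(summed, f, axis=3)
        # Note: the paper's LRN instead sums over n adjacent channels at the same position.
        denom = backend.pow(self.k + averaged, self.beta)
        return x / denom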
    
    def compute_output_shape(self, input_shape):
        return input_shape 
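
For comparison, here is a minimal pure-Python sketch of response normalization across adjacent channels, following the formula in Section 3.3 of the paper as I understand it (with k = 2, n = 5, α = 1e-4, β = 0.75). It handles the channel vector at a single spatial position. The function name is an assumption, and this is not the layer used in this notebook.

```python
def lrn_across_channels(activations, n=5, alpha=1e-4, beta=0.75, k=2.0):
    """Normalize one spatial position's channel vector over n adjacent channels."""
    num_channels = len(activations)
    half = n // 2
    normalized = []
    for i, a in enumerate(activations):
        lo = max(0, i - half)                  # clip the window at the first channel
        hi = min(num_channels - 1, i + half)   # clip the window at the last channel
        sum_sq = sum(activations[j] ** 2 for j in range(lo, hi + 1))
        normalized.append(a / (k + alpha * sum_sq) ** beta)
    return normalized

# Toy channel vector (hypothetical values) at one pixel.
example = lrn_across_channels([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

Each activation is divided by a term that grows with the squared activity of its neighboring channels, so strongly responding neighbors suppress one another.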

---

## AlexNet using Keras

In [2]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D

model = Sequential()

### The first convolutional layer

@ 3.5 Overall Architecture
> The first convolutional layer filters the 224×224×3 input image with 96 kernels of size 11×11×3 with a stride of 4 pixels (this is the distance between the receptive field centers of neighboring neurons in a kernel map).

Summary:
* input_shape = (224, 224, 3)
* kernel_size = (11, 11, 3)
* kernels = 96
* strides = 4

In [3]:
# The first convolutional layer
model.add(Conv2D(
    input_shape=(224,224,3),
    kernel_size=(11,11),
    filters=96,
    strides=(4,4),
    padding='valid',
    activation='relu'))
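
The spatial output size of this layer follows from the usual convolution arithmetic. The helper below is a small sketch with an assumed name. With `padding='valid'` there is no padding, so a 224-pixel input, an 11-pixel kernel, and a stride of 4 give 54 positions per side.

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Output positions along one spatial axis of a convolution or pooling layer."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# 'valid' padding in Keras corresponds to padding=0 here.
first_layer_size = conv_output_size(224, 11, 4)
print(first_layer_size)  # 54
```

This agrees with the `(None, 54, 54, 96)` output shape that `model.summary()` reports for `conv2d`.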

### The second convolutional layer

@ 3.5 Overall Architecture
> The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5 × 5 × 48.

Summary:
* input = response-normalized, pooled output of the 1st layer
* kernel_size = (5, 5, 48)
* kernels = 256

In [4]:
# The second convolutional layer
model.add(LocalResponseNormalization())
model.add(MaxPooling2D(
    pool_size=(5,5),
    strides=(1,1),
    padding='valid'))
model.add(Conv2D(
    filters=256,
    kernel_size=(5,5),
    padding='valid',
    activation='relu'))
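
Section 3.4 of the paper describes overlapping pooling, where the window (size 3) is wider than the stride (2), whereas the cell above uses a 5×5 window with stride 1. The one-dimensional sketch below contrasts overlapping and non-overlapping max pooling. The helper name and the toy row are assumptions for illustration.

```python
def max_pool_1d(values, window, stride):
    """Slide a window over values and keep the maximum of each window."""
    last_start = len(values) - window
    return [max(values[start:start + window])
            for start in range(0, last_start + 1, stride)]

row = [1, 3, 2, 5, 4, 0, 6]  # hypothetical activations along one axis
overlapping = max_pool_1d(row, window=3, stride=2)      # neighboring windows share one element
non_overlapping = max_pool_1d(row, window=2, stride=2)  # windows do not share elements

print(overlapping)      # [3, 5, 6]
print(non_overlapping)  # [3, 5, 4]
```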

### The third convolutional layer

@ 3.5 Overall Architecture
> The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3 × 3 × 256 connected to the (normalized, pooled) outputs of the second convolutional layer.

Summary:
* input = response-normalized, pooled output of the 2nd layer
* kernel_size = (3, 3, 256)
* kernels = 384

In [5]:
# The third convolutional layer
model.add(LocalResponseNormalization())
model.add(MaxPooling2D(
    pool_size=(3,3),
    strides=(1,1),
    padding='valid'))
model.add(Conv2D(
    filters=384,
    kernel_size=(3,3),
    padding='valid',
    activation='relu'))
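
To check how the shapes evolve through the layers added so far, the sketch below traces the spatial size, channel count, and parameter count under `'valid'` padding. The normalization layers are skipped because they keep the shape and have no parameters. The layer names follow Keras' default naming; `conv2d_2` in particular is an assumption.

```python
def valid_output_size(size, window, stride):
    """Output positions along one axis with 'valid' (no) padding."""
    return (size - window) // stride + 1

# (layer name, window size, stride, filters or None for pooling layers)
LAYERS = [
    ("conv2d", 11, 4, 96),
    ("max_pooling2d", 5, 1, None),
    ("conv2d_1", 5, 1, 256),
    ("max_pooling2d_1", 3, 1, None),
    ("conv2d_2", 3, 1, 384),
]

def trace(input_size, input_channels, layers):
    """Return (name, output shape, parameter count) for each layer."""
    size, channels, rows = input_size, input_channels, []
    for name, window, stride, filters in layers:
        size = valid_output_size(size, window, stride)
        params = 0
        if filters is not None:
            # weights plus one bias per filter
            params = window * window * channels * filters + filters
            channels = filters
        rows.append((name, (size, size, channels), params))
    return rows

rows = trace(224, 3, LAYERS)
for name, shape, params in rows:
    print(f"{name:18s} {shape!s:18s} {params}")
```

The first four rows agree with the shapes and parameter counts that `model.summary()` prints for this model: 54, 50, 46, and 44 pixels per side, with 34944 and 614656 parameters in the two convolutional layers.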

### The fourth convolutional layer

@ 3.5 Overall Architecture
> The fourth convolutional layer has 384 kernels of size 3 × 3 × 192 

Summary:
* kernel_size = (3, 3, 192)
* kernels = 384

In [6]:
# The fourth convolutional layer
model.add(
    Conv2D(
        filters=384,
        kernel_size=(3,3),
        padding='valid',
        activation='relu'))

### The fifth convolutional layer

@ 3.5 Overall Architecture
> ... , and the fifth convolutional layer has 256 kernels of size 3 × 3 × 192.

Summary:
* kernel_size = (3, 3, 192)
* kernels = 256

In [7]:
# The fifth convolutional layer
model.add(Conv2D(
    filters=256,
    kernel_size=(3,3),
    padding='valid',
    activation='relu'))

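In [8]:
# Max-pooling follows the fifth convolutional layer (Section 3.5)
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='valid'))
# Flatten the feature maps before the fully-connected layers
model.add(Flatten())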

### Fully-connected layers

@ 3.5 Overall Architecture
>  The fully-connected layers have 4096 neurons each.

@ 4.2 Dropout
> We use dropout in the first two fully-connected layers of Figure 2. Without dropout, our network exhibits substantial overfitting. Dropout roughly doubles the number of iterations required to converge

In [9]:
# 6th Layer: Fully-Connected
model.add(Dense(units=4096, activation='relu'))
model.add(Dropout(rate=0.5))  # Section 4.2 of the paper drops each unit with probability 0.5
# 7th Layer: Fully-Connected
model.add(Dense(units=4096, activation='relu'))
model.add(Dropout(rate=0.5))

In [10]:
# 8th Layer: Fully-Connected output layer with a 1000-way softmax
model.add(Dense(units=1000, activation='softmax'))
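
The following sketch illustrates what a dropout layer does during training. It uses the "inverted" variant, which rescales the surviving activations by 1 / (1 - rate) so that nothing needs to change at test time; the paper instead multiplies the outputs by 0.5 at test time. The function name and toy input are assumptions, and the rate of 0.5 is the value reported in Section 4.2 of the paper.

```python
import random

def inverted_dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
dropped = inverted_dropout([1.0] * 8, rate=0.5, rng=rng)
print(dropped)  # every entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```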

In [11]:
# End of layers
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'])
model.summary(line_length=72, positions=[.5, .86, 1., 1.])
model.save('./models/AlexNet.no-division.model')

Model: "sequential"
________________________________________________________________________
Layer (type)                        Output Shape             Param #    
conv2d (Conv2D)                     (None, 54, 54, 96)       34944      
________________________________________________________________________
local_response_normalization (Local (None, 54, 54, 96)       0          
________________________________________________________________________
max_pooling2d (MaxPooling2D)        (None, 50, 50, 96)       0          
________________________________________________________________________
conv2d_1 (Conv2D)                   (None, 46, 46, 256)      614656     
________________________________________________________________________
local_response_normalization_1 (Loc (None, 46, 46, 256)      0          
________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)      (None, 44, 44, 256)      0          
_______________________________