# ImageNet Classification with Deep Convolutional Neural Networks



## Architecture

---

The following is an excerpt from the paper "ImageNet Classification with Deep Convolutional Neural Networks".

### 3.5 Overall Architecture

Now we are ready to describe the overall architecture of our CNN. As depicted in Figure 2, the net contains eight layers with weights; the first five are convolutional and the remaining three are fully-connected. The output of the last fully-connected layer is fed to a 1000-way softmax which produces a distribution over the 1000 class labels. Our network maximizes the multinomial logistic regression objective, which is equivalent to maximizing the average across training cases of the log-probability of the correct label under the prediction distribution.
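
Note: to make the objective concrete, here is a minimal NumPy sketch of a softmax followed by the average log-probability of the correct label. The function names `softmax` and `mean_log_likelihood` and the toy logits are illustrative assumptions, not code from the paper.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_log_likelihood(logits, labels):
    """Average over training cases of log p(correct label)."""
    probs = softmax(logits)
    correct = probs[np.arange(len(labels)), labels]
    return np.log(correct).mean()

# Toy example: 2 training cases, 3 classes (hypothetical values).
logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])
objective = mean_log_likelihood(logits, labels)
print(objective)
```

Maximizing this quantity is the same as minimizing the cross-entropy between the one-hot labels and the predicted distribution, which is what a cross-entropy loss computes during training.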

The kernels of the second, fourth, and fifth convolutional layers are connected only to those kernel maps in the previous layer which reside on the same GPU (see Figure 2). The kernels of the third convolutional layer are connected to all kernel maps in the second layer. The neurons in the fully-connected layers are connected to all neurons in the previous layer. Response-normalization layers follow the first and second convolutional layers. Max-pooling layers, of the kind described in Section 3.4, follow both response-normalization layers as well as the fifth convolutional layer. The ReLU non-linearity is applied to the output of every convolutional and fully-connected layer.

The first convolutional layer filters the 224×224×3 input image with 96 kernels of size 11×11×3 with a stride of 4 pixels (this is the distance between the receptive field centers of neighboring neurons in a kernel map). The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5 × 5 × 48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3 × 3 × 256 connected to the (normalized, pooled) outputs of the second convolutional layer. The fourth convolutional layer has 384 kernels of size 3 × 3 × 192, and the fifth convolutional layer has 256 kernels of size 3 × 3 × 192. The fully-connected layers have 4096 neurons each.
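
Note: the kernel sizes above determine the number of parameters in each convolutional layer. The sketch below does that arithmetic; it assumes one bias per kernel, which the excerpt does not state, and the names `conv_layers` and `conv_params` are made up for illustration. The kernel depths of 48 and 192 reflect the two-GPU split, where each kernel sees only the maps on its own GPU.

```python
# (number of kernels, kernel height, kernel width, input depth seen by each kernel)
conv_layers = {
    "conv1": (96, 11, 11, 3),
    "conv2": (256, 5, 5, 48),
    "conv3": (384, 3, 3, 256),
    "conv4": (384, 3, 3, 192),
    "conv5": (256, 3, 3, 192),
}

def conv_params(num_kernels, height, width, depth):
    """Weights plus one bias per kernel (the bias term is an assumption)."""
    return num_kernels * height * width * depth + num_kernels

param_counts = {name: conv_params(*spec) for name, spec in conv_layers.items()}
for name, count in param_counts.items():
    print(f"{name}: {count:,}")
print(f"total: {sum(param_counts.values()):,}")
```

In a single-GPU model every kernel sees all maps of the previous layer, so the counts for the second, fourth, and fifth layers would be larger than the ones computed here.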

![Figure 2](https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Ft1.daumcdn.net%2Fcfile%2Ftistory%2F99FEB93C5C80B5192E "AlexNet architecture diagram")

Figure 2: An illustration of the architecture of our CNN, explicitly showing the delineation of responsibilities between the two GPUs. One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts at the bottom. The GPUs communicate only at certain layers. The network’s input is 150,528-dimensional, and the number of neurons in the network’s remaining layers is given by 253,440–186,624–64,896–64,896–43,264–4096–4096–1000.

---
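
Several of the neuron counts in the Figure 2 caption can be reproduced with simple arithmetic. The sketch below checks the input dimensionality and the counts for the second through fifth convolutional layers. The feature-map shapes in `assumed_shapes` are assumptions (they are not stated in the excerpt), and the first convolutional layer is left out because its count depends on details the excerpt does not give.

```python
# Input: 224 x 224 RGB image.
input_dim = 224 * 224 * 3

# Assumed feature-map shapes (height, width, channels); not stated in the excerpt.
assumed_shapes = {
    "conv2": (27, 27, 256),
    "conv3": (13, 13, 384),
    "conv4": (13, 13, 384),
    "conv5": (13, 13, 256),
}
neuron_counts = {name: h * w * c for name, (h, w, c) in assumed_shapes.items()}

print(input_dim)      # 150528
print(neuron_counts)
```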

Based on the above, the structure of AlexNet can be summarized as follows.

* 8 layers with weights in total
    * 5 convolutional layers
    * 3 fully-connected layers
* The output is a 1000-way softmax distribution computed from the last layer.


Further reference: [url](https://engmrk.com/alexnet-implementation-using-keras/)


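* `strides`: the step size of the convolution kernel
* `padding`: how the input borders are handled; `'valid'` applies no padding, while `'same'` zero-pads the input so that the output size is the input size divided by the stride (rounded up)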
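
The effect of `strides` and `padding` on the output size can be sketched with a small helper. `conv_output_size` is a hypothetical function written for this note, not part of Keras; it follows the usual rules for `'valid'` and `'same'` padding along one spatial axis.

```python
import math

def conv_output_size(size, kernel, stride, padding="valid"):
    """Spatial output size along one axis for 'valid' or 'same' padding."""
    if padding == "valid":
        # No padding: only positions where the kernel fits entirely.
        return (size - kernel) // stride + 1
    if padding == "same":
        # Zero-padding so that the output covers ceil(size / stride) positions.
        return math.ceil(size / stride)
    raise ValueError(f"unknown padding: {padding}")

# First AlexNet layer: 224-pixel input, 11x11 kernel, stride 4.
valid_out = conv_output_size(224, kernel=11, stride=4, padding="valid")
same_out = conv_output_size(224, kernel=11, stride=4, padding="same")
print(valid_out, same_out)  # 54 56
```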

In [6]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
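    # Single-GPU variant: the two-GPU split and response normalization are omitted,
    # and pooling is 2x2 with stride 2 rather than the paper's overlapping 3x3 with stride 2.
    # 1st Layer: Convolutional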
    Conv2D(
        filters=96,
        input_shape=(224,224,3),
        kernel_size=(11,11),
        strides=(4,4),
        padding='valid',
        activation='relu'),
    MaxPooling2D(
        pool_size=(2,2),
        strides=(2,2),
        padding='valid'),

    # 2nd Layer: Convolutional (11x11 kernels here; the paper specifies 5x5)
    Conv2D(
        filters=256,
        kernel_size=(11,11),
        strides=(1,1),
        padding='valid',
        activation='relu'),
    MaxPooling2D(
        pool_size=(2,2),
        strides=(2,2),
        padding='valid'),

    # 3rd Layer: Convolutional
    Conv2D(
        filters=384,
        kernel_size=(3,3),
        strides=(1,1),
        padding='valid',
        activation='relu'),

    # 4th Layer: Convolutional
    Conv2D(
        filters=384,
        kernel_size=(3,3),
        strides=(1,1),
        padding='valid',
        activation='relu'),

    # 5th Layer: Convolutional
    Conv2D(
        filters=256,
        kernel_size=(3,3),
        strides=(1,1),
        padding='valid',
        activation='relu'),
    MaxPooling2D(
        pool_size=(2,2),
        strides=(2,2),
        padding='valid'),

    # Connect between Convolutional layers and Fully-Connected layers
    Flatten(),
    
    # 6th Layer: Fully-Connected
    Dense(
        units=4096,
        activation='relu'),
    Dropout(
        0.4),
    
    # 7th Layer: Fully-Connected
    Dense(
        4096,
        activation='relu'),
    Dropout(
        0.4),

    # 8th Layer: Fully-Connected, 1000-way softmax output
    Dense(1000, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()
model.save('./models/AlexNet.no-division.model')

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_5 (Conv2D)            (None, 54, 54, 96)        34944     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 27, 27, 96)        0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 17, 17, 256)       2973952   
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 8, 8, 256)         0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 6, 6, 384)         885120    
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 4, 4, 384)         1327488   
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 2, 2, 256)        

---

# CIFAR-100

This time, we apply the above to build a CIFAR-100 classifier using a variant of AlexNet.

CIFAR-100 input images have shape (32, 32, 3), so the model's kernel sizes and filter counts are adjusted accordingly.

| shape | ImageNet | CIFAR-100 |
| :-- | :-: | :-: |
| input | 224x224x3 (150,528) | 32x32x3 (3,072) |
| layer1 | 55x55x96 |  |
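
Because the input is only 32×32, the feature maps shrink quickly, which constrains the choice of kernel size and padding. The sketch below traces the spatial size through an AlexNet-style stack of five convolutions and three 2×2 poolings. The layer list and the helper `trace_sizes` are illustrative assumptions; a recorded size of 0 means the kernel no longer fits and the trace stops there.

```python
import math

def trace_sizes(size, layers, conv_padding):
    """Return the spatial size after each layer; stop once a layer no longer fits."""
    sizes = []
    for kind, kernel, stride in layers:
        # Pooling is treated as unpadded; only the conv layers use conv_padding.
        padding = conv_padding if kind == "conv" else "valid"
        if padding == "same":
            size = math.ceil(size / stride)
        else:
            size = (size - kernel) // stride + 1
        if size < 1:
            sizes.append(0)
            break
        sizes.append(size)
    return sizes

# (kind, kernel size, stride) for a five-conv, three-pool stack.
layers = [
    ("conv", 7, 2), ("pool", 2, 2),
    ("conv", 7, 1), ("pool", 2, 2),
    ("conv", 3, 1), ("conv", 3, 1), ("conv", 3, 1), ("pool", 2, 2),
]

print(trace_sizes(32, layers, "valid"))  # [13, 6, 0]
print(trace_sizes(32, layers, "same"))   # [16, 8, 8, 4, 4, 4, 4, 2]
```

With `'valid'` padding the second 7×7 convolution receives a 6×6 map and cannot be applied, whereas `'same'` padding on the convolutions leaves a 2×2 map before flattening.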


In [3]:
from tensorflow.keras.datasets import cifar100

(x_train, y_train), (x_test, y_test) = cifar100.load_data()
print(x_train[0].shape)

(32, 32, 3)


In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    # 1st Layer: Convolutional
    # The conv layers use 'same' padding: with 'valid', the 32x32 input shrinks to
    # 6x6 after the first pooling, which is too small for the 7x7 kernel of layer 2.
    Conv2D(
        input_shape=(32,32,3),
        kernel_size=(7,7),
        strides=(2,2),
        filters=96,
        padding='same',
        activation='relu'),
    MaxPooling2D(
        pool_size=(2,2),
        strides=(2,2),
        padding='valid'),
    # 2nd Layer: Convolutional
    Conv2D(
        kernel_size=(7,7),
        strides=(1,1),
        filters=256,
        padding='same',
        activation='relu'),
    MaxPooling2D(
        pool_size=(2,2),
        strides=(2,2),
        padding='valid'),
    # 3rd Layer: Convolutional
    Conv2D(
        filters=384,
        kernel_size=(3,3),
        strides=(1,1),
        padding='same',
        activation='relu'),
    # 4th Layer: Convolutional
    Conv2D(
        filters=384,
        kernel_size=(3,3),
        strides=(1,1),
        padding='same',
        activation='relu'),

    # 5th Layer: Convolutional
    Conv2D(
        filters=256,
        kernel_size=(3,3),
        strides=(1,1),
        padding='same',
        activation='relu'),
    MaxPooling2D(
        pool_size=(2,2),
        strides=(2,2),
        padding='valid'),
    # Connect between Convolutional layers and Fully-Connected layers
    Flatten(),
    # 6th Layer: Fully-Connected
    Dense(
        units=4096,
        activation='relu'),
    Dropout(
        0.4),
    # 7th Layer: Fully-Connected
    Dense(
        4096,
        activation='relu'),
    Dropout(
        0.4),
    # 8th Layer: Fully-Connected softmax output (CIFAR-100 has 100 classes)
    Dense(100, activation='softmax')
])
model.compile(
    optimizer='adam',
    # cifar100.load_data() returns integer class labels, so use the sparse loss
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])
model.summary()
model.save('./models/AlexNet.CIFAR-100.model')

from tensorflow.keras.datasets import cifar100

# Load the data and scale pixel values from [0, 255] to [0, 1]
(x_train, y_train), (x_test, y_test) = cifar100.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
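
`cifar100.load_data()` returns integer class labels, while `categorical_crossentropy` expects one-hot targets (`sparse_categorical_crossentropy` takes the integers directly). If one-hot targets are needed, the conversion can be sketched in NumPy as follows. The helper `one_hot` and the stand-in array `y_example` are assumptions for illustration; Keras also provides `tensorflow.keras.utils.to_categorical` for the same purpose.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Convert integer labels of shape (N,) or (N, 1) to one-hot rows of shape (N, num_classes)."""
    flat = np.asarray(labels).reshape(-1)
    encoded = np.zeros((flat.size, num_classes), dtype=np.float32)
    encoded[np.arange(flat.size), flat] = 1.0
    return encoded

# Stand-in for y_train: CIFAR-100 labels are integers in [0, 99] with shape (N, 1).
y_example = np.array([[19], [0], [99]])
y_one_hot = one_hot(y_example, num_classes=100)
print(y_one_hot.shape)  # (3, 100)
```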

