### Reproducing LeNet and AlexNet in Keras

In previous notebooks, I reproduced example CNNs and simple feed-forward NNs on the CIFAR-10 images in Keras. In this notebook, my goal is to build CNNs with the structure of historically groundbreaking convolutional neural networks.

The first historical CNN I will reproduce here will have the structure of LeNet:

    INPUT => CONV => RELU => POOL => CONV => RELU => POOL => FC => RELU => FC
    
The second will be AlexNet, the winner of the 2012 ImageNet Large-Scale Visual Recognition Challenge:
    
    INPUT => CONV => RELU => POOL => CONV => RELU => POOL => CONV => RELU => CONV => RELU => CONV => RELU => FC => RELU => FC => RELU => FC => SOFTMAX

#### LeNet

LeNet's structure has existed since 1998, years before CNNs were recognized as state of the art in image classification.

The gap between how long CNNs have existed and how long they have been recognized as state of the art stems from processing power: enough of it had to become available before the theoretical frameworks could be put to work in deep, complex structures. LeNet, which was functional as early as 1998, was simply not deep enough to be considered state of the art compared to other methods of image classification.

In [8]:
from keras.layers.normalization import BatchNormalization

In [1]:
%run __init__.py

Using TensorFlow backend.


X_train: (50000, 32, 32, 3), y_train: (50000, 10)
X_test: (10000, 32, 32, 3), y_test: (10000, 10)
Class labels: ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']


In [14]:
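def print_summary(model, verbose=False):
    """Print the model summary; with verbose=True, first list each layer's shapes."""
    if verbose:
        for layer in model.layers:
            print(layer.name, layer.input_shape, '==>', layer.output_shape)

    # model.summary() prints the table itself and returns None,
    # so wrapping it in print() would emit a stray "None" line.
    model.summary()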

In [3]:
model = Sequential()

model.add(Conv2D(32, (3,3), padding='same', input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
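# NOTE: this model ends in raw scores (no softmax), yet it is compiled with
# 'categorical_crossentropy', which expects class probabilities by default.
# A final Activation('softmax'), as in the AlexNet cells, is the usual fix.
model.add(Dense(10))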

In [4]:
print_summary(model)

conv2d_1 (None, 32, 32, 3) ==> (None, 32, 32, 32)
activation_1 (None, 32, 32, 32) ==> (None, 32, 32, 32)
max_pooling2d_1 (None, 32, 32, 32) ==> (None, 16, 16, 32)
conv2d_2 (None, 16, 16, 32) ==> (None, 14, 14, 64)
activation_2 (None, 14, 14, 64) ==> (None, 14, 14, 64)
max_pooling2d_2 (None, 14, 14, 64) ==> (None, 7, 7, 64)
flatten_1 (None, 7, 7, 64) ==> (None, 3136)
dense_1 (None, 3136) ==> (None, 512)
activation_3 (None, 512) ==> (None, 512)
dense_2 (None, 512) ==> (None, 10)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 32, 32, 32)        896       
_________________________________________________________________
activation_1 (Activation)    (None, 32, 32, 32)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 32)        0         
________________________________________________________
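
The shapes listed above, and the parameter count of the first convolution, can be checked by hand. The sketch below is plain Python rather than Keras. The helper names `conv_out`, `pool_out`, and `conv_params` are my own, and it assumes stride-1 convolutions with 3x3 kernels and non-overlapping 2x2 pooling, as in the model above.

```python
def conv_out(size, kernel=3, padding="valid"):
    """Spatial output size of a stride-1 convolution."""
    if padding == "same":
        return size
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial output size of non-overlapping pooling (floor division)."""
    return size // pool


def conv_params(in_channels, filters, kernel=3):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


size = 32
size = conv_out(size, padding="same")  # 32
size = pool_out(size)                  # 16
size = conv_out(size)                  # 14
size = pool_out(size)                  # 7
flat = size * size * 64                # 3136 inputs to the first Dense layer

print(size, flat)
print(conv_params(3, 32))   # 896, as reported for conv2d_1
print(conv_params(32, 64))  # 18496 for the second convolution
```

The second convolution loses two pixels per side because it uses the default padding ('valid'), which is why the feature map shrinks from 16 to 14 before pooling.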

In [5]:
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

In [6]:
model.fit(X_train, y_train,
              batch_size=32,
              epochs=25,
              validation_data=(X_test, y_test),
              shuffle=True)

Train on 50000 samples, validate on 10000 samples
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.callbacks.History at 0x7f76cc646da0>

#### AlexNet

The following Keras code implements a model similar to AlexNet, the architecture introduced in 2012. AlexNet was the first CNN to win the ImageNet challenge, and I hope it will produce much better results here than the models I have tried so far.

In [9]:
model = Sequential()

model.add(Conv2D(32, (3,3), padding='same', input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(BatchNormalization())

model.add(Conv2D(32, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(BatchNormalization())

model.add(Conv2D(64, (3,3), padding='same'))
model.add(Activation('relu'))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dense(3072))
model.add(Activation('relu'))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

In [10]:
print_summary(model)

conv2d_4 (None, 32, 32, 3) ==> (None, 32, 32, 32)
activation_5 (None, 32, 32, 32) ==> (None, 32, 32, 32)
max_pooling2d_4 (None, 32, 32, 32) ==> (None, 16, 16, 32)
batch_normalization_1 (None, 16, 16, 32) ==> (None, 16, 16, 32)
conv2d_5 (None, 16, 16, 32) ==> (None, 14, 14, 32)
activation_6 (None, 14, 14, 32) ==> (None, 14, 14, 32)
max_pooling2d_5 (None, 14, 14, 32) ==> (None, 7, 7, 32)
batch_normalization_2 (None, 7, 7, 32) ==> (None, 7, 7, 32)
conv2d_6 (None, 7, 7, 32) ==> (None, 7, 7, 64)
activation_7 (None, 7, 7, 64) ==> (None, 7, 7, 64)
conv2d_7 (None, 7, 7, 64) ==> (None, 5, 5, 64)
activation_8 (None, 5, 5, 64) ==> (None, 5, 5, 64)
conv2d_8 (None, 5, 5, 64) ==> (None, 3, 3, 64)
activation_9 (None, 3, 3, 64) ==> (None, 3, 3, 64)
max_pooling2d_6 (None, 3, 3, 64) ==> (None, 1, 1, 64)
flatten_2 (None, 1, 1, 64) ==> (None, 64)
dense_3 (None, 64) ==> (None, 3072)
activation_10 (None, 3072) ==> (None, 3072)
dense_4 (None, 3072) ==> (None, 512)
activation_11 (None, 512) ==> (None, 512)
de

In [11]:
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

In [12]:
model.fit(X_train, y_train,
              batch_size=32,
              epochs=25,
              validation_data=(X_test, y_test),
              shuffle=True)

Train on 50000 samples, validate on 10000 samples
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.callbacks.History at 0x7f76783bc978>

In [13]:
model.evaluate(X_test, y_test)



[1.7638811379432677, 0.66490000000000005]

This model begins to offer promising results. On the training data it performs extremely well, reaching an accuracy of .91 with a loss of only .23. The validation scores, however, show that it is not generalizing any better: validation accuracy peaks around epoch 13 at .67, a sign that the model is overfitting the training data.
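
One way to act on a validation score that peaks partway through training is to track the best epoch and stop once the score has failed to improve for a few epochs. The sketch below shows that logic in plain Python. The function names `best_epoch` and `stop_epoch` are hypothetical, and the `val_acc` list holds made-up numbers for illustration only, not the scores from the run above.

```python
def best_epoch(val_scores):
    """Return (1-based epoch, score) for the highest validation score."""
    best = max(range(len(val_scores)), key=lambda i: val_scores[i])
    return best + 1, val_scores[best]


def stop_epoch(val_scores, patience=3):
    """Epoch at which training stops after `patience` epochs without improvement."""
    best, waited = float("-inf"), 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best, waited = score, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_scores)


# Made-up validation accuracies, for illustration only
val_acc = [0.50, 0.58, 0.63, 0.66, 0.67, 0.66, 0.66, 0.65, 0.66]

print(best_epoch(val_acc))
print(stop_epoch(val_acc, patience=3))
```

Keras provides an `EarlyStopping` callback in `keras.callbacks`, with `monitor` and `patience` arguments, that applies the same idea during `model.fit`. It was not used in the runs in this notebook.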

Compared to the baseline models I have considered so far, this model is not doing terribly, but there is still a lot of room for improvement.

A notable difference between this model and the example model fit in notebook one is the presence of batch normalization layers instead of dropout layers. Both kinds of layer can help against overfitting, although batch normalization is primarily a way to stabilize and speed up training, and its regularizing effect is a side effect.
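
To make concrete what a batch normalization layer computes during training, here is a minimal NumPy sketch. It is not the Keras implementation: the function name `batch_norm_train` is my own, the input is random stand-in data, and the sketch covers only the training-time calculation (at inference time, Keras uses moving averages of the mean and variance instead of batch statistics).

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each channel (last axis) over the batch and spatial axes,
    then apply a learned scale (gamma) and shift (beta)."""
    axes = tuple(range(x.ndim - 1))            # every axis except channels
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)    # eps guards against division by zero
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
# Random stand-in for a batch of 8 feature maps of shape 16x16 with 32 channels
x = rng.normal(loc=5.0, scale=3.0, size=(8, 16, 16, 32))
out = batch_norm_train(x, gamma=np.ones(32), beta=np.zeros(32))

print(out.mean(axis=(0, 1, 2))[:3])  # each channel mean is close to 0
print(out.std(axis=(0, 1, 2))[:3])   # each channel std is close to 1
```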

Dropout layers prevent overfitting by randomly setting a fraction of a layer's inputs to zero at each training step. Because the network can never rely on any single input being present, it is pushed to learn features that hold up when some details are missing. Think of it as randomly removing part of the detail in an image so that the true features that make up a shape can be extracted. For example, human faces are easily recognizable to us, yet no two faces have exactly the same shape. By randomly dropping some of the uniqueness of each input, the model is encouraged to extract the features the inputs share.
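
The zeroing step can be sketched in a few lines of NumPy. This is an illustration of the common "inverted dropout" formulation, not the Keras source: the function name `dropout` and the all-ones input are assumptions made for the example. The surviving values are scaled up so that the expected activation stays the same, and nothing is dropped outside training.

```python
import numpy as np


def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of entries and rescale the rest."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate   # True for the entries that survive
    return x * keep / (1.0 - rate)       # rescale so the expected value is unchanged


rng = np.random.default_rng(0)
x = np.ones((1000, 64))                  # stand-in activations
y = dropout(x, rate=0.25, rng=rng)

print((y == 0).mean())  # fraction of zeroed entries, roughly 0.25
print(y.mean())         # mean activation, roughly 1.0
```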

As a final addition to this notebook, I will now fit a variation of the AlexNet model that incorporates these dropout layers while removing the normalization layers in an attempt to produce better results. 

In [15]:
model = Sequential()

model.add(Conv2D(32, (3,3), padding='same', input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))

model.add(Conv2D(32, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3,3), padding='same'))
model.add(Activation('relu'))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dense(3072))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

In [16]:
print_summary(model)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_9 (Conv2D)            (None, 32, 32, 32)        896       
_________________________________________________________________
activation_13 (Activation)   (None, 32, 32, 32)        0         
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 16, 16, 32)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 14, 14, 32)        9248      
_________________________________________________________________
activation_14 (Activation)   (None, 14, 14, 32)        0         
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 7, 7, 32)          0         
__________

In [17]:
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

In [18]:
model.fit(X_train, y_train,
              batch_size=32,
              epochs=25,
              validation_data=(X_test, y_test),
              shuffle=True)

Train on 50000 samples, validate on 10000 samples
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.callbacks.History at 0x7f76ccae2978>

Adding dropout does help the validation score slightly here, boosting it from .66 in the first iteration of the AlexNet model to just above .70 in this iteration. 

After reading through the slides located here http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf in greater depth, I made a few more changes to the model. The normalization layers in the first two convolutional blocks are now applied before max pooling instead of after it. I also adjusted the size of the fully connected layers at the end and added two new Dropout(0.5) layers, one after each, as the published AlexNet paper suggests in order to decrease overfitting.

By the end of the five convolutional layers, 64 filters have been created. Because the final pooling layer reduces each feature map to a single value, each of these 64 filters contributes one value that connects to every neuron in the first fully connected layer. The neurons learn to give different weightings to each of the 64 shape filters in order to put together ideas about which shapes combine to indicate a given image class. This process happens through two fully connected layers, and then the final classification is made. Dropout is introduced in these layers to prevent overfitting during training.
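
The size of these fully connected layers can be worked out by hand. The sketch below traces the spatial size through the convolution and pooling stack and then counts the weights of each dense layer. It is plain Python arithmetic, not Keras, and the helper name `dense_params` is my own. It assumes the layer settings of the model in the next cell: 3x3 kernels, stride 1, and 2x2 pooling.

```python
def dense_params(n_in, n_out):
    """One weight per input-output pair plus one bias per output unit."""
    return n_in * n_out + n_out


size = 32      # CIFAR-10 images are 32x32
               # conv, padding='same' -> 32
size //= 2     # pool                 -> 16
size -= 2      # conv, no padding     -> 14
size //= 2     # pool                 -> 7
               # conv, padding='same' -> 7
size -= 2      # conv, no padding     -> 5
size -= 2      # conv, no padding     -> 3
size //= 2     # pool                 -> 1
flat = size * size * 64   # 64 values reach the first Dense layer

layer_sizes = [flat, 4096, 4096, 10]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(flat)    # 64
print(counts)  # [266240, 16781312, 40970]
```

Nearly all of the dense parameters sit in the connection between the two 4096-unit layers, which is one reason dropout is applied there.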

In [19]:
model = Sequential()

model.add(Conv2D(32, (3,3), padding='same', input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(32, (3,3)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(64, (3,3), padding='same'))
model.add(Activation('relu'))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dense(4096))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(4096))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))

In [20]:
print_summary(model)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_14 (Conv2D)           (None, 32, 32, 32)        896       
_________________________________________________________________
activation_21 (Activation)   (None, 32, 32, 32)        0         
_________________________________________________________________
batch_normalization_3 (Batch (None, 32, 32, 32)        128       
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_15 (Conv2D)           (None, 14, 14, 32)        9248      
_________________________________________________________________
activation_22 (Activation)   (None, 14, 14, 32)        0         
_________________________________________________________________
batch_normalization_4 (Batch (None, 14, 14, 32)        128       
__________

In [None]:
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

In [None]:
model.fit(X_train, y_train,
              batch_size=32,
              epochs=25,
              validation_data=(X_test, y_test),
              shuffle=True)