### Reproducing LeNet and AlexNet in Keras

In previous notebooks, I reproduced example CNNs and simple feed-forward NNs on the CIFAR-10 images in Keras. In this notebook, my goal is to build CNNs with the structure of historically groundbreaking convolutional neural networks.

The first historical CNN I will reproduce here will have the structure of LeNet:

    INPUT => CONV => RELU => POOL => CONV => RELU => POOL => FC => RELU => FC
    
The second will be AlexNet, the winner of the 2012 ImageNet Large-Scale Visual Recognition Challenge:
    
    INPUT => CONV => RELU => POOL => CONV => RELU => POOL => CONV => RELU => CONV => RELU => CONV => RELU => FC => RELU => FC => RELU => FC => SOFTMAX
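
To make the difference in depth between the two layer sequences above concrete, here is a small sketch that counts the layer types in each. The constant and function names (`LENET`, `ALEXNET`, `layer_counts`) are my own illustration and not part of Keras.

```python
from collections import Counter

# Layer sequences copied from the text above
LENET = "INPUT => CONV => RELU => POOL => CONV => RELU => POOL => FC => RELU => FC"
ALEXNET = (
    "INPUT => CONV => RELU => POOL => CONV => RELU => POOL => "
    "CONV => RELU => CONV => RELU => CONV => RELU => "
    "FC => RELU => FC => RELU => FC => SOFTMAX"
)

def layer_counts(spec):
    """Count how often each layer type appears in an 'A => B => C' spec."""
    return Counter(part.strip() for part in spec.split("=>"))

lenet_counts = layer_counts(LENET)
alexnet_counts = layer_counts(ALEXNET)

print("LeNet  :", lenet_counts["CONV"], "conv,", lenet_counts["FC"], "fully connected")
print("AlexNet:", alexnet_counts["CONV"], "conv,", alexnet_counts["FC"], "fully connected")
```

AlexNet stacks more convolutional and fully connected layers than LeNet, which is the main structural difference explored in the rest of this notebook.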

I will also implement TensorBoard functionality in this notebook and use it to track my model performance. 

#### LeNet

LeNet's structure has existed since 1998, years before CNNs were recognized as state of the art in image classification.

The gap between how long CNNs have existed and how long they have been recognized as state of the art stems from the availability of the processing power needed to put the theoretical frameworks to work and build deep, complex structures. LeNet, which was functional as early as 1998, simply was not deep enough to be considered state of the art compared to other methods of image classification.

In [1]:
%run __initremote__.py

Using TensorFlow backend.


x_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples


In [2]:
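# Stop training once validation accuracy has not improved for 5 consecutive epochs
early_stop = keras.callbacks.EarlyStopping(monitor='val_acc', 
                                           min_delta=0, 
                                           patience=5, 
                                           verbose=0, 
                                           mode='auto')

# Write training logs to ./logs/ so they can be inspected in TensorBoard
tb = TensorBoard(log_dir='./logs/', embeddings_layer_names='emb')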
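
The early-stopping rule configured above can be sketched in plain Python. This is a simplified illustration of the patience logic, not Keras's actual implementation, and the validation accuracies in `toy_val_acc` are made up for the example.

```python
def early_stop_epoch(history, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the monitored value has failed to improve on its
    best value by more than `min_delta` for `patience` epochs in a row.
    """
    best = float("-inf")
    wait = 0
    for epoch, value in enumerate(history, start=1):
        if value - min_delta > best:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation accuracies, one per epoch
toy_val_acc = [0.40, 0.50, 0.55, 0.54, 0.55, 0.53, 0.52, 0.51, 0.60]
stopped_at = early_stop_epoch(toy_val_acc, patience=5, min_delta=0.0)
print("training would stop after epoch", stopped_at)
```

In this toy run the best value is reached at epoch 3, so the later jump at epoch 9 is never seen. That is the trade-off of a small `patience`.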

In [32]:
model = Sequential()

model.add(Conv2D(32, (3,3), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

In [33]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_27 (Conv2D)           (None, 32, 32, 32)        896       
_________________________________________________________________
activation_38 (Activation)   (None, 32, 32, 32)        0         
_________________________________________________________________
max_pooling2d_12 (MaxPooling (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_28 (Conv2D)           (None, 14, 14, 64)        18496     
_________________________________________________________________
activation_39 (Activation)   (None, 14, 14, 64)        0         
_________________________________________________________________
max_pooling2d_13 (MaxPooling (None, 7, 7, 64)          0         
_________________________________________________________________
flatten_10 (Flatten)         (None, 3136)              0         
__________
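
The shapes and parameter counts in the summary above can be checked by hand. The sketch below assumes stride-1 convolutions and 2x2 pooling, as in the model cell. The helper names are my own.

```python
def conv2d_out(size, kernel, padding):
    """Spatial output size of a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1

def conv2d_params(kernel, in_channels, filters):
    """kernel*kernel*in_channels weights plus one bias, per filter."""
    return (kernel * kernel * in_channels + 1) * filters

# CIFAR-10 input: 32x32 pixels, 3 color channels
size, channels = 32, 3

# (type, kernel, filters, padding) for conv layers, (type, pool size) for pooling
layers = [
    ("conv", 3, 32, "same"),
    ("pool", 2),
    ("conv", 3, 64, "valid"),
    ("pool", 2),
]

rows = []
for layer in layers:
    if layer[0] == "conv":
        _, kernel, filters, padding = layer
        params = conv2d_params(kernel, channels, filters)
        size, channels = conv2d_out(size, kernel, padding), filters
    else:
        params = 0
        size //= layer[1]
    rows.append((layer[0], (size, size, channels), params))
    print(rows[-1])
```

The second convolution has no `padding='same'`, which is why its output shrinks from 16 to 14 pixels per side.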

In [34]:
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
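
The `compile` call above uses categorical cross-entropy as the loss. As a rough guide to reading the loss values later in this notebook, here is a minimal single-sample sketch. The prediction vectors are invented for illustration.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for one sample: -sum(t * log(p))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

# One-hot label for class 3 out of 10 classes
one_hot = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

# A model that has learned nothing spreads probability evenly
uniform = [0.1] * 10

# A hypothetical confident and correct prediction
confident = [0.01] * 10
confident[3] = 0.91

chance_loss = categorical_crossentropy(one_hot, uniform)
good_loss = categorical_crossentropy(one_hot, confident)
print("uniform guess:", round(chance_loss, 4))
print("confident and correct:", round(good_loss, 4))
```

A loss near ln(10) therefore corresponds to guessing uniformly across the ten CIFAR-10 classes.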

In [35]:
lenet_hist = model.fit(x_train, y_train,
              batch_size=32,
              epochs=100,
              validation_data=(x_test, y_test),
              callbacks=[early_stop, tb],
              shuffle=True)

Train on 50000 samples, validate on 10000 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100




In [39]:
model.evaluate(x_test, y_test)



[1.3201976045608521, 0.69359999999999999]

By the standards of today's neural network architectures, LeNet appears ancient. It lacks layers such as Dropout and Normalization that help prevent overfitting. The network is clearly overfitting here: training accuracy reaches 91% while test accuracy stays below 70%. It may also need adjustments to the number of MaxPooling layers, the filter size, and the padding.
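
One way to quantify this kind of overfitting is the gap between training and validation accuracy at the best validation epoch. The sketch below assumes a dictionary shaped like `lenet_hist.history` with the keys `'acc'` and `'val_acc'` (the names this Keras version appears to use, given `monitor='val_acc'`). The numbers in `toy_history` are hypothetical, not the actual training log.

```python
def generalization_gap(history):
    """Return (best_epoch, train_acc, val_acc, gap) at the best validation epoch."""
    val = history["val_acc"]
    best = max(range(len(val)), key=val.__getitem__)
    train_acc, val_acc = history["acc"][best], val[best]
    return best + 1, train_acc, val_acc, train_acc - val_acc

# Hypothetical curves: training accuracy keeps rising, validation plateaus
toy_history = {
    "acc":     [0.45, 0.60, 0.72, 0.83, 0.91],
    "val_acc": [0.48, 0.60, 0.67, 0.69, 0.68],
}

epoch, train_acc, val_acc, gap = generalization_gap(toy_history)
print("best epoch", epoch, "train", train_acc, "val", val_acc, "gap", round(gap, 2))
```

With a real run, `generalization_gap(lenet_hist.history)` would give the same summary, assuming those key names.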

Still, on a complex problem such as image classification, it does reasonably well. Test accuracy tops out at 69%, considerably better than the best fully connected neural networks of the previous notebook. Adding just two Conv2D layers raised accuracy by 19 percentage points.

Let's see whether examining LeNet's simple architecture, adding some dropout, and ensuring that the image size does not shrink too much can improve this network's ability to classify images.

In [4]:
model = Sequential()

model.add(Conv2D(32, (5,5), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(.25))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))

model.add(Flatten())
model.add(Dense(512))
model.add(Dropout(.5))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

In [5]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 32, 32, 32)        2432      
_________________________________________________________________
activation_1 (Activation)    (None, 32, 32, 32)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 32)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 14, 14, 64)        18496     
_________________________________________________________________
activation_2 (Activation)    (None, 14, 14, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 12544)             0         
__________

In [6]:
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

lenet_hist_2 = model.fit(x_train, y_train,
              batch_size=32,
              epochs=100,
              validation_data=(x_test, y_test),
              callbacks=[early_stop, tb],
              shuffle=True)

Train on 50000 samples, validate on 10000 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100


We are not yet seeing any increase in performance. In fact, adding these more complex features to a relatively simple network hurt its overall validation score. Let's try adding just one more Conv2D layer and see what happens.
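
Besides adding dropout, this variant also dropped the second MaxPooling layer, so the flattened vector grew from 3136 to 12544 values (see the two summaries above). The sketch below shows how much that inflates the first Dense layer. It may be one reason the network did not improve, though I have not verified that.

```python
def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return inputs * units + units

# Flattened sizes taken from the two model summaries
with_second_pool = 7 * 7 * 64       # 3136, first LeNet variant
without_second_pool = 14 * 14 * 64  # 12544, second LeNet variant

first = dense_params(with_second_pool, 512)
second = dense_params(without_second_pool, 512)

print("Dense(512) after 7x7x64  :", first)
print("Dense(512) after 14x14x64:", second)
print("ratio:", round(second / first, 2))
```

Roughly four times as many weights have to be learned from the same 50,000 training images.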

In [5]:
model = Sequential()

model.add(Conv2D(32, (5,5), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(.25))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))

model.add(Flatten())
model.add(Dense(512))
model.add(Dropout(.5))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_4 (Conv2D)            (None, 32, 32, 32)        2432      
_________________________________________________________________
activation_6 (Activation)    (None, 32, 32, 32)        0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 16, 16, 32)        0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 14, 14, 64)        18496     
_________________________________________________________________
activation_7 (Activation)    (None, 14, 14, 64)        0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 12, 12, 64)        36928     
__________

In [6]:
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

lenet_hist_3 = model.fit(x_train, y_train,
              batch_size=32,
              epochs=100,
              validation_data=(x_test, y_test),
              callbacks=[early_stop, tb],
              shuffle=True)

Train on 50000 samples, validate on 10000 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100


Small changes can have huge effects on a neural network's ability to learn the data. Previously, I tried the exact same architecture as above, but included `padding='same'` on the second Conv2D layer. That caused the model to fail, producing only 10% accuracy, which is chance level for ten classes. I do not know why removing padding on this layer so greatly affected the model's ability to learn, but I conjecture that the zeros added along the edges of the feature maps in the second layer caused information to be lost.
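
To see what zero padding actually does, here is a one-dimensional sketch of the two padding modes. It is a toy stand-in for Conv2D (stride 1, odd kernel length assumed), not the Keras implementation.

```python
def conv1d(signal, kernel, padding="valid"):
    """Stride-1 1D cross-correlation with 'valid' or zero-padded 'same' mode."""
    k = len(kernel)
    if padding == "same":
        pad = k // 2  # assumes an odd kernel length
        signal = [0] * pad + list(signal) + [0] * pad
    return [
        sum(s * w for s, w in zip(signal[i:i + k], kernel))
        for i in range(len(signal) - k + 1)
    ]

signal = [1, 2, 3, 4, 5]
kernel = [1, 1, 1]

valid = conv1d(signal, kernel, "valid")
same = conv1d(signal, kernel, "same")
print("valid:", valid)
print("same :", same)
```

The interior outputs are identical in both modes, and only the border values differ. This suggests, without proving it, that padding by itself is unlikely to erase much information, so the failed run may have had another cause.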

In [27]:
model = Sequential()

model.add(Conv2D(32, (5,5), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Dropout(.25))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))

model.add(Conv2D(64, (3,3), padding='same'))
model.add(Activation('relu'))

model.add(Conv2D(32, (3,3)))
model.add(Activation('relu'))

model.add(Flatten())
model.add(Dense(512))
model.add(Dropout(.5))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_47 (Conv2D)           (None, 32, 32, 32)        2432      
_________________________________________________________________
activation_72 (Activation)   (None, 32, 32, 32)        0         
_________________________________________________________________
dropout_28 (Dropout)         (None, 32, 32, 32)        0         
_________________________________________________________________
max_pooling2d_13 (MaxPooling (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_48 (Conv2D)           (None, 14, 14, 64)        18496     
_________________________________________________________________
activation_73 (Activation)   (None, 14, 14, 64)        0         
_________________________________________________________________
conv2d_49 (Conv2D)           (None, 14, 14, 64)        36928     
__________

In [31]:
opt = keras.optimizers.RMSprop(lr=0.1, decay=1e-6)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

lenet_hist_4 = model.fit(x_train, y_train,
              batch_size=32,
              epochs=10,
              validation_data=(x_test, y_test),
              callbacks=[early_stop, tb],
              shuffle=True)

Train on 50000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10


Strangely, adding just one more Conv2D layer kills this model's ability to learn the images. Note, however, that this run also changed the optimizer settings: the learning rate was 0.1 rather than 0.0001 and training was capped at 10 epochs, so the failure cannot be attributed to the extra layer alone. Neural networks, although powerful, are clearly sensitive to small changes in their architecture and hyperparameters. With so many variations and parameters to tune, they are not something to throw at just any problem a data scientist might encounter.
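
The cell above sets `lr=0.1`, whereas the other runs use `lr=0.0001`. The sketch below is a simplified single-weight version of the RMSprop update, with `rho=0.9` and `eps=1e-7` assumed as defaults. It illustrates why that matters: the update size is driven by the learning rate almost regardless of the gradient's magnitude.

```python
import math

def rmsprop_first_step(grad, lr, rho=0.9, eps=1e-7):
    """Size of the first RMSprop update for a single weight.

    The running average of squared gradients starts at zero, so after
    one step it equals (1 - rho) * grad**2.
    """
    avg_sq = (1 - rho) * grad ** 2
    return lr * grad / (math.sqrt(avg_sq) + eps)

for lr in (0.0001, 0.1):
    steps = [rmsprop_first_step(g, lr) for g in (0.001, 0.1, 10.0)]
    print("lr =", lr, "->", [round(s, 5) for s in steps])
```

With `lr=0.1` the first update moves each weight by roughly 0.3. That is large compared with typical initial weight values, so it is a plausible explanation for the stalled training.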

Before we add another Conv2D layer, what would happen if we added a second fully connected layer?

In [32]:
model = Sequential()

model.add(Conv2D(32, (5,5), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(.25))

model.add(Conv2D(32, (3,3)))
model.add(Activation('relu'))
model.add(Dropout(.25))

model.add(Conv2D(64, (3,3), padding='same'))
model.add(Activation('relu'))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))

model.add(Flatten())
model.add(Dense(1024))
model.add(Dropout(.5))
model.add(Activation('relu'))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_55 (Conv2D)           (None, 32, 32, 32)        2432      
_________________________________________________________________
activation_84 (Activation)   (None, 32, 32, 32)        0         
_________________________________________________________________
max_pooling2d_15 (MaxPooling (None, 16, 16, 32)        0         
_________________________________________________________________
dropout_32 (Dropout)         (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_56 (Conv2D)           (None, 14, 14, 32)        9248      
_________________________________________________________________
activation_85 (Activation)   (None, 14, 14, 32)        0         
_________________________________________________________________
dropout_33 (Dropout)         (None, 14, 14, 32)        0         
__________

In [33]:
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

lenet_hist_5 = model.fit(x_train, y_train,
              batch_size=32,
              epochs=100,
              validation_data=(x_test, y_test),
              callbacks=[early_stop, tb],
              shuffle=True)

Train on 50000 samples, validate on 10000 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100


#### AlexNet

In [34]:
model = Sequential()

model.add(Conv2D(32, (3,3), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(BatchNormalization())

model.add(Conv2D(32, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(BatchNormalization())

model.add(Conv2D(64, (3,3), padding='same'))
model.add(Activation('relu'))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dense(3072))
model.add(Activation('relu'))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

In [35]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_59 (Conv2D)           (None, 32, 32, 32)        896       
_________________________________________________________________
activation_91 (Activation)   (None, 32, 32, 32)        0         
_________________________________________________________________
max_pooling2d_16 (MaxPooling (None, 16, 16, 32)        0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 16, 16, 32)        128       
_________________________________________________________________
conv2d_60 (Conv2D)           (None, 14, 14, 32)        9248      
_________________________________________________________________
activation_92 (Activation)   (None, 14, 14, 32)        0         
_________________________________________________________________
max_pooling2d_17 (MaxPooling (None, 7, 7, 32)          0         
__________

In [36]:
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

In [39]:
model.fit(x_train, y_train,
              batch_size=32,
              epochs=25,
              validation_data=(x_test, y_test),
              callbacks=[early_stop],
              shuffle=True)

Train on 50000 samples, validate on 10000 samples
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25


<keras.callbacks.History at 0x7f3e32cb57b8>

In [41]:
model.evaluate(x_test, y_test)



[1.6516393534660339, 0.65480000000000005]

This model begins to offer promising results. On the training data it performs extremely well, reaching an accuracy of .91 and a loss of only .23. The validation scores, however, do not keep improving: they peak around epoch 13 at .67.

Compared to the baseline models I have considered so far, this model is not doing badly, but a lot of room for improvement remains.

A notable difference between this model and the example model fit in the first notebook is the presence of batch normalization layers instead of dropout layers. Both can help against overfitting, although batch normalization is primarily a tool for stabilizing and speeding up training, and its regularizing effect is usually mild.
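
As a rough illustration of what a batch normalization layer computes during training, here is a per-channel sketch in plain Python. It leaves out the moving averages used at inference time, and `eps=1e-3` is assumed as the default.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one channel over a batch, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

def batch_norm_params(channels):
    """gamma, beta, moving mean, and moving variance: four values per channel."""
    return 4 * channels

# Made-up activations of one channel across a batch of four samples
normalized = batch_norm([2.0, 4.0, 6.0, 8.0])
print([round(v, 3) for v in normalized])
print("parameters for 32 channels:", batch_norm_params(32))
```

The count for 32 channels matches the 128 parameters reported for `batch_normalization_1` in the summary above.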

Dropout layers prevent overfitting by randomly setting a fraction of a layer's inputs to zero at each training step. This helps by reducing the noise (think of it as decreasing the resolution or sharpness of the image) so that the true features that make up a shape can be extracted. For example, human faces are easily recognizable to us, yet no two human faces have exactly the same shape. By randomly dropping some of the uniqueness of each input, the network is pushed toward the features the faces share.
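
The mechanism can be sketched in a few lines. This is the common "inverted dropout" formulation, written in plain Python for illustration. It applies only during training, and the function name and inputs are my own.

```python
import random

def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`
    and scale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

rng = random.Random(0)        # fixed seed so the run is repeatable
activations = [1.0] * 10000   # made-up layer inputs

dropped = dropout(activations, rate=0.25, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)

print("fraction zeroed:", round(zero_fraction, 3))
print("mean after dropout:", round(mean_after, 3))
```

Scaling the surviving values keeps the expected activation unchanged, so no rescaling is needed when dropout is switched off at test time.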

As a final addition to this notebook, I will now fit a variation of the AlexNet model that incorporates these dropout layers while removing the normalization layers in an attempt to produce better results. 

In [44]:
model = Sequential()

model.add(Conv2D(32, (3,3), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))

model.add(Conv2D(32, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3,3), padding='same'))
model.add(Activation('relu'))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))

model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dense(3072))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

In [45]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_64 (Conv2D)           (None, 32, 32, 32)        896       
_________________________________________________________________
activation_99 (Activation)   (None, 32, 32, 32)        0         
_________________________________________________________________
max_pooling2d_19 (MaxPooling (None, 16, 16, 32)        0         
_________________________________________________________________
dropout_35 (Dropout)         (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_65 (Conv2D)           (None, 14, 14, 32)        9248      
_________________________________________________________________
activation_100 (Activation)  (None, 14, 14, 32)        0         
_________________________________________________________________
max_pooling2d_20 (MaxPooling (None, 7, 7, 32)          0         
__________

In [46]:
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

In [50]:
model.fit(x_train, y_train,
              batch_size=32,
              epochs=100,
              validation_data=(x_test, y_test),
              callbacks=[early_stop],
              shuffle=True)

Train on 50000 samples, validate on 10000 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100


<keras.callbacks.History at 0x7f3e32098b38>

Adding dropout does help the validation score slightly here, boosting it from .66 in the first iteration of the AlexNet model to just above .70 in this iteration. 

After reading through the AlexNet slides located here http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf in greater depth, I changed the model again. I now apply Normalization in the first two convolutional blocks before MaxPooling instead of after. I also adjusted the size of the fully connected layers at the end and added two new Dropout .5 layers, one after each, as the published AlexNet paper suggests for reducing overfitting.

By the end of the convolutional layers, each filter of the final layer has produced a feature map (64 filters in the earlier variants, 32 in the model below). After flattening, every value in these feature maps connects to each neuron of the first fully connected layer. The neurons learn to weight the different shape filters in order to work out which combinations of shapes indicate which image class. This happens across two fully connected layers before the final decision is made. Dropout is introduced in these layers to prevent overfitting during training.

In [51]:
model = Sequential()

model.add(Conv2D(24, (5,5), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(BatchNormalization())
#model.add(MaxPooling2D(pool_size=(2,2)))


model.add(Conv2D(64, (5,5)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(64, (3,3), padding='same'))
model.add(Activation('relu'))

model.add(Conv2D(64, (3,3), padding="same"))
model.add(Activation('relu'))

model.add(Conv2D(32, (3,3), padding="same"))
model.add(Activation('relu'))
#model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))

In [52]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_69 (Conv2D)           (None, 32, 32, 24)        1824      
_________________________________________________________________
activation_107 (Activation)  (None, 32, 32, 24)        0         
_________________________________________________________________
batch_normalization_3 (Batch (None, 32, 32, 24)        96        
_________________________________________________________________
conv2d_70 (Conv2D)           (None, 28, 28, 64)        38464     
_________________________________________________________________
activation_108 (Activation)  (None, 28, 28, 64)        0         
_________________________________________________________________
batch_normalization_4 (Batch (None, 28, 28, 64)        256       
_________________________________________________________________
max_pooling2d_22 (MaxPooling (None, 14, 14, 64)        0         
__________

In [53]:
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

In [55]:
model.fit(x_train, y_train,
              batch_size=32,
              epochs=100,
              validation_data=(x_test, y_test),
              callbacks=[early_stop],
              shuffle=True)

Train on 50000 samples, validate on 10000 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100


<keras.callbacks.History at 0x7f3e31888048>

In [58]:
model = Sequential()

model.add(Conv2D(64, (5,5), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2), padding="same"))
model.add(BatchNormalization())
model.add(Dropout(0.25))

model.add(Conv2D(64, (5,5), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2), padding='same'))

model.add(Flatten())
model.add(Dense(4096))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(4096))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))

In [59]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_74 (Conv2D)           (None, 32, 32, 64)        4864      
_________________________________________________________________
activation_115 (Activation)  (None, 32, 32, 64)        0         
_________________________________________________________________
max_pooling2d_23 (MaxPooling (None, 16, 16, 64)        0         
_________________________________________________________________
batch_normalization_5 (Batch (None, 16, 16, 64)        256       
_________________________________________________________________
dropout_40 (Dropout)         (None, 16, 16, 64)        0         
_________________________________________________________________
conv2d_75 (Conv2D)           (None, 16, 16, 64)        102464    
_________________________________________________________________
activation_116 (Activation)  (None, 16, 16, 64)        0         
__________

In [60]:
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

In [62]:
model.fit(x_train, y_train,
              batch_size=32,
              epochs=100,
              validation_data=(x_test, y_test),
              callbacks=[early_stop],
              shuffle=True)

Train on 50000 samples, validate on 10000 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100


<keras.callbacks.History at 0x7f3e30f5aac8>

In [63]:
model = Sequential()

model.add(Conv2D(32, (5,5), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (5,5)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(64, (3,3), padding='same'))
model.add(Activation('relu'))

model.add(Conv2D(64, (3,3), padding="same"))
model.add(Activation('relu'))

model.add(Conv2D(32, (3,3), padding="same"))
model.add(Activation('relu'))
#model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))

In [64]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_76 (Conv2D)           (None, 32, 32, 32)        2432      
_________________________________________________________________
activation_120 (Activation)  (None, 32, 32, 32)        0         
_________________________________________________________________
batch_normalization_7 (Batch (None, 32, 32, 32)        128       
_________________________________________________________________
max_pooling2d_25 (MaxPooling (None, 16, 16, 32)        0         
_________________________________________________________________
dropout_43 (Dropout)         (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_77 (Conv2D)           (None, 12, 12, 64)        51264     
_________________________________________________________________
activation_121 (Activation)  (None, 12, 12, 64)        0         
__________

In [65]:
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(x_train, y_train,
              batch_size=32,
              epochs=100,
              validation_data=(x_test, y_test),
              callbacks=[early_stop],
              shuffle=True)

Train on 50000 samples, validate on 10000 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100


<keras.callbacks.History at 0x7f3e3079c6a0>

In [66]:
model.evaluate(x_test, y_test)



[1.1390913041114807, 0.69530000000000003]
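
To compare the runs side by side, the sketch below collects the `[loss, accuracy]` pairs printed by the `model.evaluate` calls in this notebook. Accuracies are rounded to four decimals and the labels are my own shorthand.

```python
# [loss, accuracy] pairs copied from the model.evaluate outputs above
results = {
    "LeNet (first version)": [1.3201976045608521, 0.6936],
    "AlexNet-style, batch norm": [1.6516393534660339, 0.6548],
    "AlexNet-style, final variant": [1.1390913041114807, 0.6953],
}

# Rank the models by test accuracy, best first
ranked = sorted(results.items(), key=lambda item: item[1][1], reverse=True)
for name, (loss, acc) in ranked:
    print(f"{name:30s} loss={loss:.3f} acc={acc:.4f}")

best = ranked[0][0]
```

The final variant has both the lowest loss and the highest accuracy of the three, although its accuracy margin over the first LeNet version is small.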

In [None]:
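# save() requires a target path; the file name here is a hypothetical placeholder
model.save('alexnet_cifar10.h5')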