## Simple Convolutional Neural Network

In [1]:
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras import backend as K
K.set_image_dim_ordering('th')

Using TensorFlow backend.


In [2]:
seed = 7
np.random.seed(seed)

In [3]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to [samples][channels][height][width] (channels-first, matching the 'th' ordering set above)
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')
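
The reshape above only adds a channel axis of length 1; no pixel values move. A minimal sketch of the same step on a stand-in array (`fake_images` is a hypothetical substitute for the MNIST arrays, so the snippet runs without downloading the dataset):

```python
import numpy as np

# Hypothetical stand-in for the MNIST images: 5 grayscale images of 28x28, uint8.
fake_images = np.random.randint(0, 256, size=(5, 28, 28)).astype('uint8')

# Insert a channel axis of length 1 right after the sample axis (channels-first).
channels_first = fake_images.reshape(fake_images.shape[0], 1, 28, 28).astype('float32')

print(channels_first.shape)  # (5, 1, 28, 28)
# The pixel data itself is unchanged; channel 0 is still the original image.
print(np.array_equal(channels_first[:, 0], fake_images.astype('float32')))  # True
```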

In [4]:
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
print(num_classes)

10
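
To make the two preprocessing steps concrete, here is a NumPy-only sketch of pixel scaling and one-hot encoding. The helper `one_hot` is a hypothetical illustration of what `np_utils.to_categorical` is understood to produce, not its actual implementation:

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Turn integer class labels into a (samples, classes) matrix of 0s and 1s."""
    labels = np.asarray(labels, dtype='int64')
    if num_classes is None:
        # Assumption: classes are numbered 0..max(label).
        num_classes = int(labels.max()) + 1
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# Scaling: uint8 pixel values 0..255 become floats in 0..1.
pixels = np.array([0, 128, 255], dtype='uint8')
scaled = pixels.astype('float32') / 255

# Encoding: each digit label becomes a row with a single 1 in its class column.
encoded = one_hot([3, 0, 9], num_classes=10)
print(encoded.shape)  # (3, 10)
print(encoded[0])     # 1.0 at index 3, zeros elsewhere
```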


### Network architecture
* input layer: Convolution2D
 * 32 feature maps
 * size 5\*5
 * rectifier activation function
* pooling layer: MaxPooling2D
 * pool size: 2\*2
* regularization layer: Dropout
 * randomly exclude 20% of neurons in the layer to reduce overfitting
* Flatten
 * converts the stack of 2D feature maps to a single vector
 * allows the output to be processed by standard fully connected layers
* fully connected layer
 * 128 neurons
 * rectifier activation function
* output layer
 * 10 neurons for 10 classes
 * softmax activation function to output probability-like predictions for each class
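
The layer list above can be traced through with plain arithmetic. This sketch assumes an unpadded convolution with stride 1 and non-overlapping pooling windows; the helper names `conv_valid` and `pool` are illustrative only:

```python
def conv_valid(size, kernel):
    """Output side length of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool(size, window):
    """Output side length of non-overlapping max pooling."""
    return size // window

side = 28                    # MNIST images are 28x28
side = conv_valid(side, 5)   # 24 after the 5x5 convolution
side = pool(side, 2)         # 12 after 2x2 max pooling
flat = 32 * side * side      # 32 feature maps flattened: 4608 values

# Trainable parameters per layer (weights + biases).
conv_params = 32 * (5 * 5 * 1 + 1)    # 832
dense_params = flat * 128 + 128        # 589952
output_params = 128 * 10 + 10          # 1290
total_params = conv_params + dense_params + output_params

print(flat)          # 4608
print(total_params)  # 592074
```

Dropout, pooling and flattening add no trainable parameters, so almost all of the weights sit in the 128-neuron fully connected layer.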

In [5]:
# the model is trained with categorical cross-entropy (logarithmic loss) and the Adam optimizer
def baseline_model():
    # create model
    model = Sequential()
    model.add(Convolution2D(32, 5, 5, border_mode='valid', input_shape=(1, 28, 28), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
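    # compile model; the keyword is `metrics` (plural). A misspelled `metric=` is not
    # recognised, so no accuracy is tracked and evaluate() returns only the scalar loss
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])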
    return model
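
For reference, the softmax output and the categorical cross-entropy loss named in the cell above can be written out in a few lines of NumPy. This is a sketch of the standard formulas, not Keras's own implementation; the `eps` clipping value and the toy `logits` are assumptions chosen for illustration:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))

# Toy example: 2 samples, 3 classes (made-up numbers).
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]], dtype='float32')

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs.sum(axis=1))  # each row sums to 1
print(loss)
```

The loss shrinks towards 0 as the probability assigned to the correct class approaches 1, which is why it is a natural fit for a softmax output layer.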

In [6]:
# the CNN is fit over 10 epochs with a batch size of 200
# build the model
model = baseline_model()
# fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), nb_epoch=10, batch_size=200, verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)

kwargs passed to function are ignored with Tensorflow backend


Train on 60000 samples, validate on 10000 samples
Epoch 1/10
389s - loss: 0.2670 - val_loss: 0.0974
Epoch 2/10
411s - loss: 0.0862 - val_loss: 0.0642
Epoch 3/10
411s - loss: 0.0606 - val_loss: 0.0467
Epoch 4/10
345s - loss: 0.0477 - val_loss: 0.0403
Epoch 5/10
394s - loss: 0.0385 - val_loss: 0.0396
Epoch 6/10
439s - loss: 0.0323 - val_loss: 0.0362
Epoch 7/10
409s - loss: 0.0273 - val_loss: 0.0358
Epoch 8/10
350s - loss: 0.0217 - val_loss: 0.0336
Epoch 9/10
410s - loss: 0.0214 - val_loss: 0.0372
Epoch 10/10
373s - loss: 0.0170 - val_loss: 0.0383
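
A quick way to read the log above is to copy its numbers into arrays and look for the epoch with the lowest validation loss. The values below are transcribed from the log; the variable names are illustrative:

```python
import numpy as np

# Per-epoch values transcribed from the training log above.
train_loss = np.array([0.2670, 0.0862, 0.0606, 0.0477, 0.0385,
                       0.0323, 0.0273, 0.0217, 0.0214, 0.0170])
val_loss = np.array([0.0974, 0.0642, 0.0467, 0.0403, 0.0396,
                     0.0362, 0.0358, 0.0336, 0.0372, 0.0383])
epoch_seconds = np.array([389, 411, 411, 345, 394, 439, 409, 350, 410, 373])

best_epoch = int(np.argmin(val_loss)) + 1   # epochs are numbered from 1
total_minutes = epoch_seconds.sum() / 60.0
updates_per_epoch = 60000 // 200            # training samples / batch size

print(best_epoch)                # 8
print(round(total_minutes, 1))   # 65.5
print(updates_per_epoch)         # 300
```

Training loss falls at every epoch, while validation loss bottoms out at epoch 8 and then rises slightly, which suggests the model is starting to overfit in the last two epochs.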


IndexError: invalid index to scalar variable.

In [8]:
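print(scores)
# scores is a bare loss value here because no accuracy metric was registered at compile time;
# with metrics=['accuracy'], evaluate() returns [loss, accuracy] and the line below works
# print('Baseline error: %.2f%%' % (100-scores[1]*100))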

0.038305208453
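
The `IndexError` seen earlier is what NumPy raises when a scalar is indexed, which is consistent with `scores` being a single loss value rather than a `[loss, accuracy]` pair. A small sketch that reproduces this and guards against it; `baseline_error` is a hypothetical helper, and the `[0.04, 0.5]` pair is a placeholder, not a measured result:

```python
import numpy as np

def baseline_error(scores):
    """Return the error rate in percent, or None when no accuracy is available."""
    values = np.atleast_1d(scores)
    if values.size < 2:
        return None  # only a loss was returned
    return 100.0 - float(values[1]) * 100.0

loss_only = np.float64(0.038305208453)  # the scalar printed above

# Indexing a NumPy scalar fails, just like scores[1] did.
try:
    loss_only[1]
    indexing_failed = False
except IndexError:
    indexing_failed = True

print(indexing_failed)                # True
print(baseline_error(loss_only))      # None
# Placeholder [loss, accuracy] pair, NOT a measured result.
print(baseline_error([0.04, 0.5]))    # 50.0
```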
