## Convolutional Neural Networks: MNIST classification with Keras

The MNIST classification task is a classic machine learning benchmark. The data consists of 70,000 grayscale images of handwritten digits, and the task is to identify which digit each image shows. The digits run from 0 to 9, so this is a multiclass classification problem with 10 possible classes.

The MNIST classification task is sort of like a "hello world" for computer vision, so a solution can be implemented quickly with an off-the-shelf machine learning library.

Since convolutional neural networks have thus far proven to be the best at computer vision tasks, we'll use the Keras library to implement a convolutional neural net as our solution. Keras provides a well-designed and readable API on top of the fast Theano and TensorFlow backends, so we'll be done in a surprisingly small number of steps!

Because MNIST is such a common task, the dataset is included with many machine learning libraries. With Keras, you can load the dataset with just a couple of lines:

In [6]:
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

n_train, height, width = X_train.shape
n_test, _, _ = X_test.shape

n_train, n_test, height, width


[5 0 4 ..., 5 6 8]


We have 60,000 28x28 grayscale training images and 10,000 28x28 grayscale test images. Some preprocessing steps are required to get the data into the proper form for the CNN.

In [9]:
from keras.utils.np_utils import to_categorical

# we have to preprocess the data into the right form:
# (samples, channels, height, width), with a single grayscale channel
X_train = X_train.reshape(n_train, 1, height, width).astype('float32')
X_test = X_test.reshape(n_test, 1, height, width).astype('float32')

# normalize from [0, 255] to [0, 1]
X_train = X_train / 255
X_test = X_test / 255

# digits 0 through 9, so ten classes
n_classes = 10

# one-hot encode the labels
y_train = to_categorical(y_train, n_classes)
y_test = to_categorical(y_test, n_classes)
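
To make the label step concrete, here is a minimal pure-Python sketch of one-hot encoding, the kind of transformation `to_categorical` applies to the labels. The helper name `one_hot` is hypothetical and used only for illustration; Keras returns a NumPy array rather than nested lists.

```python
def one_hot(labels, n_classes):
    """Return one row per label with 1.0 at the label's index and 0.0 elsewhere."""
    encoded = []
    for label in labels:
        row = [0.0] * n_classes
        row[label] = 1.0
        encoded.append(row)
    return encoded


# three example digit labels, encoded over ten classes
example_labels = [5, 0, 4]
encoded_labels = one_hot(example_labels, 10)
for row in encoded_labels:
    print(row)
```

Each row has exactly one non-zero entry, which is the shape of target that a 10-way softmax output is compared against.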

Keras makes it very easy to define a neural network. We first instantiate a sequential Keras model, meaning the components of the model come one after the other, i.e., layer by layer.

In [11]:
from keras.models import Sequential

model = Sequential()

The general architecture of a convolutional neural network is:

* convolution layers, followed by pooling layers
* fully-connected layers
* a final fully-connected softmax layer

We'll follow this same basic structure and interweave some other components, such as dropout, to improve performance.

To begin, we start with our convolution layers. We first need to specify some architecture hyperparameters:

* How many filters do we want for our convolution layers? Like most hyperparameters, this is chosen through a mix of intuition and tuning. A rough rule of thumb is: the more complex the task, the more filters. (Note that we don't need to have the same number of filters for each convolution layer, but we are doing so here for convenience.)
* What size should our convolution filters be? We don't want filters to be too large or the resulting matrix might not be very meaningful. For instance, a useless filter size in this task would be a 28x28 filter since it covers the whole image. We also don't want filters to be too small for a similar reason, e.g. a 1x1 filter just returns each pixel.
* What size should our pooling window be? Again, we don't want pooling windows to be too large or we'll be throwing away information. However, for larger images, a larger pooling window might be appropriate (same goes for convolution filters).
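
The effect of filter size can be seen in a minimal pure-Python sketch of a "valid" 2D convolution (strictly speaking, the cross-correlation that deep learning libraries compute). The function `conv2d_valid`, the toy 4x4 image, and the hand-written edge filter are all hypothetical and only meant for illustration; in a real network the filter weights are learned.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and return the feature map."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = img_h - k_h + 1, img_w - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# toy image: dark everywhere except a bright right-hand column
image = [
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]

# a 1x1 filter with weight 1 simply copies each pixel
identity = conv2d_valid(image, [[1]])

# a 3x3 vertical-edge filter responds where intensity rises from left to right
edge_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
edges = conv2d_valid(image, edge_kernel)

# a filter as large as the image collapses it to a single number
whole = conv2d_valid(image, [[1] * 4 for _ in range(4)])

print(identity)
print(edges)
print(whole)
```

The 3x3 filter produces a 2x2 map that is non-zero only next to the bright column, while the two extreme filter sizes either reproduce the input or discard all spatial information.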


In [12]:
# number of convolutional filters
n_filters = 32

# convolutional filter size,
# i.e. we will use an n_conv x n_conv filter
n_conv = 3

# pooling window size,
# i.e. we will use an n_pool x n_pool pooling window
n_pool = 2

Now we can begin adding our convolution and pooling layers.

We're using only two convolutional layers because this is a relatively simple task. Generally for more complex tasks you may want more convolution layers to extract higher and higher level features.

For our convolution activation functions we use ReLU, which is common and effective.

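The particular pooling layer we're using is a max pooling layer, which keeps only the largest activation in each pooling window. It can be thought of as a "feature detector": it reports whether a feature was found anywhere in that window.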

In [15]:
from keras.layers import Activation
from keras.layers.convolutional import Convolution2D, MaxPooling2D

model.add(Convolution2D(
    n_filters, n_conv, n_conv,

    # apply the filter to only full parts of the image
    # (i.e. do not "spill over" the border);
    # this is called a narrow convolution
    border_mode='valid',

    # we have a 28x28 single-channel (grayscale) image,
    # so the input shape should be (1, 28, 28)
    input_shape=(1, height, width)
))
model.add(Activation('relu'))

model.add(Convolution2D(n_filters, n_conv, n_conv))
model.add(Activation('relu'))

# then we apply pooling to summarize the features
# extracted thus far
model.add(MaxPooling2D(pool_size=(n_pool, n_pool)))
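
As a rough illustration of the two operations applied around the convolutions, the sketch below implements ReLU and non-overlapping max pooling in pure Python. The names `relu` and `max_pool` and the toy 4x4 feature map are hypothetical; Keras performs the same kind of operation on whole batches of multi-channel tensors.

```python
def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def max_pool(feature_map, pool):
    """Keep the largest value in each non-overlapping pool x pool window."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, height - pool + 1, pool):
        row = []
        for j in range(0, width - pool + 1, pool):
            window = [
                feature_map[i + di][j + dj]
                for di in range(pool)
                for dj in range(pool)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [-1.0, 2.0, 0.5, -3.0],
    [4.0, -2.0, 1.5, 0.0],
    [0.0, 0.0, -1.0, -1.0],
    [3.0, 1.0, -2.0, 6.0],
]

activated = relu(feature_map)
pooled = max_pool(activated, 2)
print(pooled)
```

With a 2x2 window the 4x4 map shrinks to 2x2, keeping only the strongest response in each quadrant.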

Then we can add dropout and our dense and output (softmax) layers.

In [17]:
from keras.layers import Dropout, Flatten, Dense

model.add(Dropout(0.25))

# flatten the data for the 1D (dense) layers
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))

# the softmax output layer gives us a probability for each class
model.add(Dense(n_classes))
model.add(Activation('softmax'))
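
To see how the layers fit together, the following sketch traces the feature-map sizes and trainable parameter counts for this architecture using plain arithmetic. It assumes stride-1 "valid" convolutions for both convolution layers and non-overlapping pooling windows; the helper names are hypothetical and not part of Keras.

```python
n_filters, n_conv, n_pool = 32, 3, 2
height = width = 28
n_classes = 10


def valid_conv_size(size, kernel):
    """Output length of a stride-1 convolution without padding."""
    return size - kernel + 1


def conv_params(in_channels, out_channels, kernel):
    """kernel x kernel weights per input/output channel pair, plus one bias per filter."""
    return out_channels * (in_channels * kernel * kernel) + out_channels


def dense_params(n_in, n_out):
    """One weight per input/output pair, plus one bias per output."""
    return n_in * n_out + n_out


after_conv1 = valid_conv_size(height, n_conv)        # 28 -> 26
after_conv2 = valid_conv_size(after_conv1, n_conv)   # 26 -> 24
after_pool = after_conv2 // n_pool                   # 24 -> 12
flattened = n_filters * after_pool * after_pool      # 32 * 12 * 12 = 4608

total_params = (
    conv_params(1, n_filters, n_conv)
    + conv_params(n_filters, n_filters, n_conv)
    + dense_params(flattened, 128)
    + dense_params(128, n_classes)
)

print(after_conv1, after_conv2, after_pool, flattened)
print(total_params)
```

Under these assumptions, nearly all of the parameters sit in the first dense layer, which connects the 4,608 flattened features to 128 hidden units.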

We tell Keras to compile the model using whatever backend we have configured (Theano or TensorFlow). At this stage we specify the loss function we want to optimize. Here we're using categorical cross-entropy, which is the standard loss function for multiclass classification.
We also specify the particular optimization method we want to use. Here we're using Adam, which adapts the learning rate for each parameter as training progresses and typically needs little manual tuning.

In [18]:
model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
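
For intuition about the loss, here is a minimal sketch of softmax followed by categorical cross-entropy for a single example, written in plain Python. The scores and the 3-class setup are made up for illustration, and the function names are hypothetical; Keras computes the same quantity averaged over each batch.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    largest = max(scores)
    # subtracting the largest score keeps exp() numerically stable
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log-probability assigned to the true class."""
    return -sum(
        t * math.log(p)
        for t, p in zip(one_hot_target, probabilities)
        if t > 0
    )


scores = [2.0, 1.0, 0.1]   # hypothetical raw outputs for a 3-class problem
target = [1.0, 0.0, 0.0]   # the true class is class 0

probs = softmax(scores)
example_loss = categorical_crossentropy(target, probs)
print(probs)
print(example_loss)
```

The loss shrinks toward zero as the probability assigned to the true class approaches 1, and grows without bound as that probability approaches 0.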

Now that the model is defined and compiled, we can begin training and fit the model to our training data.
Here we're training for only 2 epochs. That is enough to reach good accuracy on this task, but more difficult tasks may need many more epochs.
Training will take quite a while if you are running this on a CPU. Generally with neural networks, and especially with convolutional neural networks, you want to train on a GPU for much, much faster training times. Again, this dataset is relatively small so it won't take a terrible amount of time, but more than you might want to sit around and wait for.

In [20]:
# how many examples to look at during each training iteration
batch_size = 128

# how many times to run through the full set of examples
n_epochs = 2

model.fit(
    X_train, y_train,
    batch_size=batch_size,
    nb_epoch=n_epochs,
    validation_data=(X_test, y_test)
)


Train on 60000 samples, validate on 10000 samples
Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7ff2c0ce1190>
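
The relationship between batch size, epochs, and the number of weight updates can be worked out with a little arithmetic. The sketch below assumes the final, smaller batch of each epoch is still used for an update; the variable names beyond `batch_size` and `n_epochs` are hypothetical.

```python
n_train = 60000
batch_size = 128
n_epochs = 2

# integer ceiling division: the last batch of an epoch may be smaller
batches_per_epoch = (n_train + batch_size - 1) // batch_size
last_batch_size = n_train - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * n_epochs

print(batches_per_epoch, last_batch_size, total_updates)
```

A larger batch size means fewer, smoother updates per epoch; a smaller one means more, noisier updates.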

Now we can evaluate the model on the test data to get a sense of how we did.

In [21]:
# how'd we do?
loss, accuracy = model.evaluate(X_test, y_test)
print('loss:', loss)
print('accuracy', accuracy)

('loss:', 0.039719996613910187)
('accuracy', 0.98719999999999997)
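
To make the accuracy figure concrete, here is a small sketch of how classification accuracy can be computed from softmax outputs and one-hot targets: take the most probable class for each example and count the matches. The function names and the toy 3-class predictions are hypothetical and are not output from the model above.

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy_score(predicted_probs, one_hot_targets):
    """Fraction of examples whose most probable class matches the true class."""
    correct = sum(
        1
        for probs, target in zip(predicted_probs, one_hot_targets)
        if argmax(probs) == argmax(target)
    )
    return correct / float(len(one_hot_targets))


# hypothetical softmax outputs for four examples of a 3-class problem
predicted = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
targets = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],   # misclassified: the model prefers class 0
]

toy_accuracy = accuracy_score(predicted, targets)
print(toy_accuracy)
```

By the same reasoning, an accuracy of 0.9872 on the 10,000 test images corresponds to 128 misclassified digits.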
