### MNIST Handwritten Digit Classification

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from keras.datasets import mnist

##### Load the images and reshape the data arrays to have a single color channel.

In [5]:
# Load Dataset
(trainX,trainy),(testX,testy) = mnist.load_data()

In [14]:
trainX.shape

(60000, 28, 28)

In [18]:
print('Train: X=%s, y=%s' % (trainX.shape,trainy.shape))

Train: X=(60000, 28, 28), y=(60000,)


In [29]:
#Reshape dataset to have single channel

trainX=trainX.reshape(trainX.shape[0],28,28,1)
testX=testX.reshape(testX.shape[0],28,28,1)


In [32]:
testX.shape

(10000, 28, 28, 1)

In [34]:
#One hot encode target values
from keras.utils import to_categorical
trainy = to_categorical(trainy)
testy = to_categorical(testy)
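
To make the effect of the encoding step concrete, here is a minimal NumPy-only sketch of what one hot encoding does to integer labels. The helper name `one_hot` is hypothetical and is used only for illustration; the notebook itself relies on `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Illustrative one hot encoder (hypothetical helper, not part of Keras)."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    # Set a single 1.0 in each row, at the column given by the class label
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

example = one_hot([5, 0, 4])
print(example.shape)
print(example[0])
```

Each label becomes a row of ten values with a single 1.0 at the index of its class, which is the target format that a 10-node softmax output layer is trained against.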

##### The load_dataset() function implements these steps and can be used to load the dataset.

In [35]:
def load_dataset():
    #Load dataset
    (trainX,trainY),(testX,testY) = mnist.load_data()
    
    #Reshape images to have a single channel
    trainX=trainX.reshape(trainX.shape[0],28,28,1)
    testX=testX.reshape(testX.shape[0],28,28,1)
    
    #One hot encode target values
    trainY = to_categorical(trainY)
    testY = to_categorical(testY)
    
    return trainX,trainY,testX,testY

### Prepare Pixel Data

A good starting point is to normalize the pixel values of the grayscale images, 
e.g. rescale them to the range [0,1]. This involves first converting the data type from unsigned 
integers to floats, then dividing the pixel values by the maximum value, 255.

In [37]:
#Convert from integer to float
train_norm = trainX.astype('float32')
test_norm = testX.astype('float32')

#normalize to range 0 to 1
train_norm = train_norm / 255.0
test_norm = test_norm / 255.0

In [39]:
#Put everything in prep_pixels() function
#Scale_pixels
def prep_pixels(train,test):
    #Convert from integer to float
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')

    #normalize to range 0 to 1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    
    return train_norm,test_norm

The model has two main aspects: the feature extraction front end comprised of convolutional and pooling layers, and the classifier backend that will make a prediction.

For the convolutional front end, we can start with a single convolutional layer with a small filter size (3,3) and a modest number of filters (32), followed by a max pooling layer. The resulting feature maps can then be flattened to provide features to the classifier.

Given that the problem is a multi-class classification task, we know that we will require an output layer with 10 nodes in order to predict the probability distribution of an image belonging to each of the 10 classes. This will also require the use of a softmax activation function. Between the feature extractor and the output layer, we can add a dense layer to interpret the features, in this case with 100 nodes.
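
The layer sizes described above can be checked with a little arithmetic. The sketch below assumes the convolution uses no padding and a stride of 1, and that the pooling window is 2x2 with a matching stride (the Keras defaults for `Conv2D` and `MaxPooling2D((2,2))`). The helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(size, kernel, stride=1):
    """Output width or height of a convolution with no padding."""
    return (size - kernel) // stride + 1

# Spatial size after a 3x3 convolution on a 28x28 image, then 2x2 max pooling
conv_side = conv_output_size(28, 3)
pool_side = conv_side // 2
flat_features = pool_side * pool_side * 32

# Trainable parameters: weights plus one bias per filter or node
conv_params = 32 * (3 * 3 * 1 + 1)
dense_params = flat_features * 100 + 100
output_params = 100 * 10 + 10
total_params = conv_params + dense_params + output_params

print(conv_side, pool_side, flat_features)
print(conv_params, dense_params, output_params, total_params)
```

Under these assumptions, almost all of the parameters sit in the 100-node dense layer rather than in the convolutional layer.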

The convolutional layer and the hidden dense layer will use the ReLU activation function and the He weight initialization scheme, both best practices; the output layer uses softmax instead.

We will use a conservative configuration for the stochastic gradient descent optimizer with a learning rate of 0.01 and a momentum of 0.9. The categorical cross-entropy loss function will be optimized, suitable for multi-class classification, and we will monitor the classification accuracy metric, which is appropriate given we have the same number of examples in each of the 10 classes.
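
As a rough illustration of what the output layer and the loss compute, the following NumPy-only sketch implements softmax and categorical cross-entropy by hand. It is not how Keras implements them internally; the function names and the `eps` clipping value are assumptions made for this example.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into a probability distribution over classes."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean cross-entropy between one hot targets and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(-np.sum(y_true * np.log(y_prob), axis=-1).mean())

# A model that is completely unsure predicts 0.1 for each of the 10 classes
uniform_probs = softmax(np.zeros((1, 10)))
target = np.zeros((1, 10))
target[0, 3] = 1.0  # true class is the digit 3

uniform_loss = categorical_cross_entropy(target, uniform_probs)
print(uniform_probs.sum(), uniform_loss)
```

A model that guesses uniformly across 10 classes has a loss of ln(10), roughly 2.30, which is a useful reference point when reading the loss early in training.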

In [43]:
from keras.models import Sequential
from keras.layers import Conv2D,Dense,MaxPooling2D,Flatten
from keras.optimizers import Adam, SGD

### Define CNN Model

In [44]:
def define_model():
    model = Sequential()
    model.add(Conv2D(32,(3,3),activation='relu',kernel_initializer='he_uniform',input_shape=(28,28,1)))
    model.add(MaxPooling2D((2,2)))
    model.add(Flatten())
    model.add(Dense(100,activation='relu',kernel_initializer='he_uniform'))
    model.add(Dense(10,activation='softmax'))
    
    #Compile model
    opt = SGD(learning_rate=0.01,momentum=0.9)
    model.compile(optimizer=opt,loss='categorical_crossentropy',metrics=['accuracy'])
    
    return model

The model will be evaluated using five-fold cross-validation. The value of k=5 was chosen to provide enough repeated evaluations for a baseline without being so large as to require a long running time. Each test set will be 20% of the training dataset, or 12,000 examples, close to the size of the actual test set for this problem.

The training dataset is shuffled prior to being split, and the same shuffling is applied each time by fixing the random seed, so that any model we evaluate will have the same train and test datasets in each fold, providing an apples-to-apples comparison between models.
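
The reproducibility of the folds can be checked on a small stand-in array. This sketch assumes scikit-learn's `KFold` with `shuffle=True` and a fixed `random_state`; the helper name `fold_indices` and the 60-element array (standing in for the 60,000 training images) are illustrative only.

```python
import numpy as np
from sklearn.model_selection import KFold

data = np.arange(60)  # small stand-in for the 60,000 training images

def fold_indices(n_splits=5, seed=1):
    """Return the (train, test) index arrays for every fold."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return [(train_ix.copy(), test_ix.copy()) for train_ix, test_ix in kfold.split(data)]

first = fold_indices()
second = fold_indices()

# With the same seed, both runs produce identical test folds
same_folds = all(np.array_equal(a[1], b[1]) for a, b in zip(first, second))
test_sizes = [len(test_ix) for _, test_ix in first]
print(same_folds, test_sizes)
```

Each test fold holds one fifth of the rows, and repeating the split with the same seed gives the same row assignment, which is what allows different models to be compared on identical folds.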

We will train the baseline model for a modest 10 training epochs with a default batch size of 32 examples. The test set for each fold will be used to evaluate the model both during each epoch of the training run, so that we can later create learning curves, and at the end of the run, so that we can estimate the performance of the model. As such, we will keep track of the resulting history from each run, as well as the classification accuracy of the fold.

The model_evaluate() function below implements these behaviors, taking the training images and labels as arguments and returning a list of accuracy scores and a list of training histories that can be later summarized.

In [45]:
from sklearn.model_selection import KFold

### Evaluate Model

In [48]:
#Evaluate a model using K-fold cross validation
def model_evaluate(dataX,dataY,n_fold=5):
    scores,histories = list(),list()
    
    #Prepare cross validation
    kfold = KFold(n_splits=n_fold,shuffle=True,random_state=1)
    
    #enumerate splits
    for train_ix,test_ix in kfold.split(dataX):
        #define model
        model = define_model()
        
        #select rows for train and test
        trainX,trainY,testX,testY = dataX[train_ix],dataY[train_ix],dataX[test_ix],dataY[test_ix]
        
        #fit model
        history = model.fit(trainX,trainY,epochs=10,batch_size=32,validation_data=(testX,testY),verbose=0)
        
        #evaluate model
        _,acc = model.evaluate(testX,testY,verbose=0)
        print('> %.3f' % (acc*100.0))
        
        #store scores
        scores.append(acc)
        histories.append(history)
        
    #return results only after all folds have been evaluated
    return scores,histories
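
One way the returned accuracy scores might be summarized is with their mean and standard deviation across folds. The sketch below is an assumption about how that could look; `summarize_scores` is a hypothetical helper, and the input values are arbitrary placeholders, not results from this model.

```python
import numpy as np

def summarize_scores(scores):
    """Return mean and standard deviation of fold accuracies, in percent."""
    scores = np.asarray(scores, dtype="float64") * 100.0
    return scores.mean(), scores.std(), len(scores)

# Arbitrary placeholder accuracies, used only to exercise the function
mean_acc, std_acc, n_folds = summarize_scores([0.5, 0.75, 1.0])
print('Accuracy: mean=%.3f std=%.3f, n=%d' % (mean_acc, std_acc, n_folds))
```

In practice the list passed in would be the `scores` value returned by the cross-validation function above.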