# MNIST Accuracy = 99.79%
It's amazing that convolutional neural networks can classify handwritten digits so accurately. In this notebook, we witness an ensemble of 15 CNNs classify MNIST's 10,000 test images after training on MNIST's 60,000 training images plus 25 million more images created by rotating, scaling, and shifting MNIST's training images. Learning from 25,060,000 images, this ensemble of CNNs achieves 99.79% classification accuracy (with an average accuracy of 99.745% and a standard deviation of 0.020 over 100 trials). This accuracy rivals the best reported to date. This notebook uses ideas from the best published models found on the internet. Advanced techniques include data augmentation, nonlinear convolution layers, learnable pooling layers, ReLU activation, ensembling, bagging, decaying learning rates, dropout, batch normalization, and Adam optimization.

More information about this ensemble of CNNs can be found [here][1].
[1]:https://www.kaggle.com/cdeotte/25-million-images-0-99757-mnist

In [None]:
# LOAD LIBRARIES
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
# USE KERAS WITH DEFAULT TENSORFLOW BACKEND
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, BatchNormalization
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import LearningRateScheduler
from keras.datasets import mnist

# Load MNIST's 60,000 training images

In [None]:
# LOAD MNIST DATASET AS 60K TRAIN AND 10K TEST
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
# PREPARE DATA FOR NEURAL NETWORK
X_train = x_train / 255.0
X_test = x_test / 255.0
X_train = X_train.reshape(-1,28,28,1)
X_test = X_test.reshape(-1,28,28,1)
Y_train = to_categorical(y_train, num_classes = 10)
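
To make the label encoding concrete, here is a minimal NumPy sketch of what one-hot encoding produces for a few digit labels. The helper name `one_hot` and the sample labels are illustrative assumptions and not part of this notebook, which calls `to_categorical` instead.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Hypothetical helper: row i is all zeros except a 1 at column labels[i]."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype="float32")[labels]

sample_labels = [5, 0, 4]            # made-up labels, for illustration only
encoded = one_hot(sample_labels)

print(encoded.shape)                 # one row per label, one column per digit class
print(encoded.argmax(axis=1))        # argmax recovers the original labels
```

Each row sums to 1, which matches the shape of the softmax output the networks are trained to produce.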

# Generate 25 million more images!!
by randomly rotating, scaling, and shifting MNIST's 60,000 training images.

In [None]:
# CREATE MORE IMAGES WITH DATA AUGMENTATION
datagen = ImageDataGenerator(
        rotation_range=15,
        zoom_range = 0.15,  
        width_shift_range=0.1, 
        height_shift_range=0.1)
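
As a rough sanity check on the "25 million" figure, the sketch below counts how many augmented images the generator is asked to produce, using the settings from the training cell of this notebook (15 networks, 30 epochs, batch size 64, 10% validation split). It assumes every training step yields a full batch of freshly augmented images, so treat the result as an estimate.

```python
# Settings taken from the training cell of this notebook
nets = 15              # number of CNNs in the ensemble
epochs = 30            # epochs per CNN
batch_size = 64
train_images = 60000

# 10% of the training images are held out for validation (test_size = 0.1)
val_images = train_images // 10
images_per_net = train_images - val_images          # 54000
steps_per_epoch = images_per_net // batch_size      # 843

# Assumption: each step draws one full batch of newly augmented images
augmented_total = nets * epochs * steps_per_epoch * batch_size
print(f"{augmented_total:,} augmented images")      # 24,278,400
```

Under these assumptions the count comes to about 24.3 million, in line with the rounded figure quoted in the heading.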

# Build 15 Convolutional Neural Networks!

In [None]:
# BUILD CONVOLUTIONAL NEURAL NETWORKS
nets = 15
model = [0] *nets
for j in range(nets):
    model[j] = Sequential()

    model[j].add(Conv2D(32, kernel_size = 3, activation='relu', input_shape = (28, 28, 1)))
    model[j].add(BatchNormalization())
    model[j].add(Conv2D(32, kernel_size = 3, activation='relu'))
    model[j].add(BatchNormalization())
    model[j].add(Conv2D(32, kernel_size = 5, strides=2, padding='same', activation='relu'))
    model[j].add(BatchNormalization())
    model[j].add(Dropout(0.4))

    model[j].add(Conv2D(64, kernel_size = 3, activation='relu'))
    model[j].add(BatchNormalization())
    model[j].add(Conv2D(64, kernel_size = 3, activation='relu'))
    model[j].add(BatchNormalization())
    model[j].add(Conv2D(64, kernel_size = 5, strides=2, padding='same', activation='relu'))
    model[j].add(BatchNormalization())
    model[j].add(Dropout(0.4))

    model[j].add(Conv2D(128, kernel_size = 4, activation='relu'))
    model[j].add(BatchNormalization())
    model[j].add(Flatten())
    model[j].add(Dropout(0.4))
    model[j].add(Dense(10, activation='softmax'))

    # COMPILE WITH ADAM OPTIMIZER AND CROSS ENTROPY COST
    model[j].compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
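
To see why the architecture ends in a `Flatten` layer with only 128 values, the following sketch traces the spatial size of a 28x28 input through the seven `Conv2D` layers. The helper `conv_out` is a hypothetical name introduced here; it applies the standard output-size rules for `'valid'` padding (the Keras default) and `'same'` padding.

```python
import math

def conv_out(size, kernel, stride=1, padding="valid"):
    """Output size of a convolution along one spatial dimension."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1

# (kernel_size, stride, padding) for each Conv2D layer above, in order
layers = [
    (3, 1, "valid"), (3, 1, "valid"), (5, 2, "same"),
    (3, 1, "valid"), (3, 1, "valid"), (5, 2, "same"),
    (4, 1, "valid"),
]

size = 28
sizes = [size]
for kernel, stride, padding in layers:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)                               # [28, 26, 24, 12, 10, 8, 4, 1]
flattened = sizes[-1] * sizes[-1] * 128    # 128 filters in the last Conv2D
print(flattened)                           # 128
```

The two stride-2 convolutions take the place of pooling layers: each halves the feature map (24 to 12, then 8 to 4) with learned weights, and the final 4x4 convolution collapses the map to 1x1.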

# Train 15 CNNs

In [None]:
# MULTIPLY LEARNING RATE BY 0.95 EACH EPOCH, STARTING FROM 1E-3
annealer = LearningRateScheduler(lambda x: 1e-3 * 0.95 ** x)

# TRAIN CNNs AND DISPLAY ACCURACIES
epochs = 30
history = [0] * nets
results = [0] * nets
for j in range(nets):
    X_train2, X_val2, Y_train2, Y_val2 = train_test_split(X_train, Y_train, test_size = 0.1)
    history[j] = model[j].fit_generator(datagen.flow(X_train2,Y_train2, batch_size=64),
      epochs = epochs, steps_per_epoch = X_train2.shape[0]//64,
      validation_data = (X_val2,Y_val2), callbacks=[annealer], verbose=0)
    print("CNN {0:d}: Epochs={1:d}, Train accuracy={2:.5f}, Validation accuracy={3:.5f}".format
      (j+1,epochs,history[j].history['acc'][epochs-1],history[j].history['val_acc'][epochs-1]))
    
    # PREDICT DIGITS FOR CNN J ON MNIST 10K TEST
    results[j] = model[j].predict(X_test)
    results2 = np.argmax(results[j],axis = 1)

    # CALCULATE ACCURACY OF CNN J ON MNIST 10K TEST
    acc = np.mean(results2 == y_test)
    print("CNN %d: Test accuracy = %f" % (j+1, acc))
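
The `annealer` callback above evaluates `1e-3 * 0.95 ** x` at the start of each epoch, where `x` is the zero-based epoch index. The sketch below tabulates that schedule in plain Python; the function name `learning_rate` is an illustrative assumption.

```python
def learning_rate(epoch, initial=1e-3, decay=0.95):
    """Same formula as the annealer lambda: initial * decay ** epoch."""
    return initial * decay ** epoch

epochs = 30
schedule = [learning_rate(e) for e in range(epochs)]

print(f"epoch 0:  {schedule[0]:.6f}")
print(f"epoch 29: {schedule[-1]:.6f}")
print(f"final / initial: {schedule[-1] / schedule[0]:.3f}")
```

By the last epoch, the learning rate has fallen to roughly a quarter of its starting value.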

# Ensemble 15 CNNs and Predict

In [None]:
# PREDICT DIGITS FOR ENSEMBLE ON MNIST 10K TEST
results2 = np.zeros( (X_test.shape[0],10) )
for j in range(nets):
    results2 = results2 + results[j]
results2 = np.argmax(results2,axis = 1)
 
# CALCULATE ACCURACY OF ENSEMBLE ON MNIST 10K TEST SET    
acc = np.mean(results2 == y_test)
print("Ensemble Accuracy = %f" % acc)
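
The ensemble cell above sums the softmax outputs of all networks and then takes the argmax, which is often called soft voting. The toy sketch below, using made-up probabilities for 3 models, 2 images, and 3 classes, shows that this can differ from a simple majority vote over each model's predicted label.

```python
import numpy as np

# Made-up softmax outputs: one array per model, one row per image
preds = [
    np.array([[0.6, 0.3, 0.1], [0.40, 0.45, 0.15]]),
    np.array([[0.5, 0.4, 0.1], [0.50, 0.30, 0.20]]),
    np.array([[0.2, 0.7, 0.1], [0.45, 0.35, 0.20]]),
]

# Soft voting: add the probabilities, then pick the largest total per image
summed = np.sum(preds, axis=0)
ensemble_labels = np.argmax(summed, axis=1)

# Each model's own prediction, for comparison
individual = [np.argmax(p, axis=1).tolist() for p in preds]

print(individual)                  # [[0, 1], [0, 0], [1, 0]]
print(ensemble_labels.tolist())    # [1, 0]
```

For the first image, two of the three models pick class 0, yet the summed probabilities favor class 1 because the third model is much more confident. Summing probabilities lets confident models outweigh hesitant ones.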

# Performance
A neural network learns different weights and biases each time you train it. Therefore, this notebook was executed 100 times to assess performance. This ensemble's average accuracy classifying MNIST's 10,000 test images is 99.745% with a standard deviation of 0.020 and a maximum accuracy of 99.79%. An individual CNN's average accuracy is 99.641% with a standard deviation of 0.047 and a maximum accuracy of 99.81%. The accuracy of this code for classifying MNIST test images rivals the best reported to date. A list of the best classifiers can be found [here][1]. More information about this ensemble of CNNs can be found [here][2].
[1]:http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html
[2]:https://www.kaggle.com/cdeotte/25-million-images-0-99757-mnist
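
To put these percentages in perspective, the sketch below converts them into counts of misclassified images out of the 10,000 test images. It assumes the reported standard deviations are in percentage points; the function name `errors` is illustrative.

```python
TEST_IMAGES = 10000

def errors(accuracy_percent):
    """Number of misclassified test images implied by an accuracy in percent."""
    return TEST_IMAGES * (100 - accuracy_percent) / 100

print(round(errors(99.79)))        # best ensemble run: 21 errors
print(round(errors(99.745), 1))    # average ensemble run: 25.5 errors
print(round(errors(99.641), 1))    # average single CNN: 35.9 errors

# Assumption: a standard deviation of 0.020 means 0.020 percentage points
std_in_images = TEST_IMAGES * 0.020 / 100
print(round(std_in_images))        # about 2 images
```

On average, then, ensembling removes about 10 of the roughly 36 errors that a single CNN makes.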