## Handwritten Digit Recognition using SVM and CNN with TensorFlow and Keras

### MNIST Dataset
The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training image processing systems and for training and testing machine learning models. It was created by "re-mixing" the samples from NIST's original datasets. The images were normalized in size and centered, and each one is a 28x28 square (784 pixels). The dataset provides 60,000 images for training a model and 10,000 images for testing it.
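As a quick illustration of the layout described above, the following sketch builds placeholder arrays with the same shapes as the two MNIST splits and flattens each image into 784 values. The arrays are all zeros rather than real digits, and the names `train_images`, `test_images` and `train_flat` are made up for this example.

```python
import numpy as np

# Placeholder arrays shaped like the MNIST splits (all zeros, not real digits).
train_images = np.zeros((60000, 28, 28), dtype=np.uint8)
test_images = np.zeros((10000, 28, 28), dtype=np.uint8)

# Flatten every 28x28 image into one row of 784 pixel values.
train_flat = train_images.reshape(len(train_images), -1)

print(train_images.shape)  # (60000, 28, 28)
print(train_flat.shape)    # (60000, 784)
```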

### Digit Recognition using Convolutional Neural Networks

##### Imports

In [5]:
from matplotlib import pyplot 
import numpy

We need to import several things from Keras.

In [7]:
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.utils import np_utils
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D

In [8]:
# fix dimension ordering issue
from keras import backend as K
K.set_image_dim_ordering('th')
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [9]:
# reshape to be [samples][channels][height][width]
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')

In [10]:
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
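The reshape and scaling steps can be checked on a small stand-in batch without downloading the dataset. This is only a sketch: `raw` is a hypothetical batch of four random images, not MNIST data.

```python
import numpy as np

# Hypothetical stand-in for raw MNIST images: integers in the range 0..255.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Add a channel axis in channels-first order: [samples][channels][rows][cols].
x = raw.reshape(raw.shape[0], 1, 28, 28).astype('float32')

# Scale pixel intensities from 0..255 down to 0..1.
x = x / 255

print(x.shape, x.dtype)
print(float(x.min()) >= 0.0, float(x.max()) <= 1.0)
```

Keeping the inputs in a small, fixed range generally helps gradient-based training behave more predictably.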

The class-labels are One-Hot encoded, which means that each label is a vector with 10 elements, all of which are zero except for one element. The index of this one element is the class-number, that is, the digit shown in the associated image.

In [12]:
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
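For readers who want to see what one-hot encoding does without Keras, here is a minimal NumPy sketch. The helper `one_hot` is written for this illustration and is not part of Keras.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype='float32')
    # Set the column given by each label to 1 in the matching row.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

labels = [5, 0, 9]
encoded = one_hot(labels, num_classes=10)
print(encoded[0])              # 1 at index 5, zeros elsewhere
print(encoded.argmax(axis=1))  # recovers the class numbers
```

Taking the `argmax` of each row gives back the original digit, which is also how a predicted probability vector is turned into a class number.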

#### Simple Convolutional Neural Network
Here we use one convolutional layer, one max pooling layer and one hidden (fully connected) layer.

##### The neural network structure is shown below

Visible Layer (1x28x28 Inputs) >> Convolutional Layer (32 maps, 5x5) >> Max Pooling Layer (2x2) >> Dropout Layer (20%) >> Flatten Layer >> Hidden Layer (128 Neurons) >> Output Layer (10 Outputs)
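The following sketch traces how the tensor size changes through this structure and counts the trainable parameters. It assumes the Keras defaults of an unpadded convolution with stride 1 and a pooling stride equal to the pool size. The helper `conv_out` and the variable names are illustrative only.

```python
def conv_out(size, kernel):
    """Output width/height of an unpadded, stride-1 convolution."""
    return size - kernel + 1

side = conv_out(28, 5)               # 24x24 after the 5x5 convolution
conv_params = 32 * (5 * 5 * 1 + 1)   # 32 filters, each with 5x5x1 weights plus a bias
side = side // 2                     # 12x12 after 2x2 max pooling
flat = 32 * side * side              # 4608 values after flattening
hidden_params = flat * 128 + 128     # weights plus biases of the 128-neuron layer
output_params = 128 * 10 + 10        # weights plus biases of the 10-way output layer
total_params = conv_params + hidden_params + output_params

print(side, flat, total_params)
```

Dropout and pooling have no trainable parameters, so almost all of the weights sit in the first dense layer.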

##### Sequential Model
The Keras API has two modes of constructing Neural Networks. The simplest is the Sequential Model which only allows for the layers to be added in sequence.

In [16]:
# define a simple CNN model
def baseline_model():
    # Create the model. Sequential only allows layers to be added in sequence.
    model = Sequential()
    # Convolutional layer with ReLU activation, followed by max pooling.
    model.add(Conv2D(32, (5, 5), input_shape=(1, 28, 28), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Randomly drop 20% of the activations during training to reduce overfitting.
    model.add(Dropout(0.2))
    # Flatten the rank-4 output of the convolutional layers to rank 2 so that it
    # can be fed to a fully connected (dense) layer.
    model.add(Flatten())
    # First fully connected (dense) layer with ReLU activation.
    model.add(Dense(128, activation='relu'))
    # Last fully connected (dense) layer with softmax activation for classification.
    model.add(Dense(num_classes, activation='softmax'))
    # Compile the model: finalize it by adding a loss function, an optimizer and
    # performance metrics. MNIST has 10 possible classes, so the loss function is
    # categorical_crossentropy; the metric of interest is classification accuracy.
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model


###### Build, Fit and Final evaluation of the model

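Training runs for 10 epochs, that is, 10 full passes over the 60,000 training images, with a batch size of 200 (300 weight updates per epoch). The test set is passed as validation data, so its loss and accuracy are reported after every epoch.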
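To make the relationship between epochs and batch size concrete, this small calculation derives the number of weight updates from the values used in the training call. The variable names are chosen for this example.

```python
import math

num_train = 60000   # MNIST training images
batch_size = 200
epochs = 10

# One update per batch; the last batch may be smaller if the sizes do not divide evenly.
steps_per_epoch = math.ceil(num_train / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)  # 300 3000
```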

In [19]:
# build the model
model = baseline_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("CNN Error for the Simple Convolutional Network is: %.2f%%" % (100-scores[1]*100))

##### Summary of a model

In [21]:
model.summary()

#### Large Convolutional Neural Network
Here we use two convolutional layers, two max pooling layers and two hidden (fully connected) layers.

##### The neural network structure is shown below

Visible Layer (1x28x28 Inputs) >> Convolutional Layer (30 maps, 5x5) >> Max Pooling Layer (2x2) >> Convolutional Layer (15 maps, 3x3) >> Max Pooling Layer (2x2) >> Dropout Layer (20%) >> Flatten Layer >> Hidden Layer (128 Neurons) >> Hidden Layer (50 Neurons) >> Output Layer (10 Outputs)
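A short sketch can walk this layer sequence and report the size after each step along with the total number of trainable parameters. It includes the Flatten step that the code applies before the first hidden layer and leaves out Dropout, which changes neither shapes nor parameter counts. It assumes unpadded, stride-1 convolutions (the Keras default). The `layers` list is a hand-written description for this example, not a Keras object.

```python
# Each entry is (layer kind, settings) for the larger network.
layers = [
    ("conv", {"filters": 30, "kernel": 5}),
    ("pool", {"size": 2}),
    ("conv", {"filters": 15, "kernel": 3}),
    ("pool", {"size": 2}),
    ("flatten", {}),
    ("dense", {"units": 128}),
    ("dense", {"units": 50}),
    ("dense", {"units": 10}),
]

channels, side, features, total_params = 1, 28, None, 0
for kind, cfg in layers:
    if kind == "conv":
        # Each filter has kernel*kernel*channels weights plus one bias.
        total_params += cfg["filters"] * (cfg["kernel"] ** 2 * channels + 1)
        channels, side = cfg["filters"], side - cfg["kernel"] + 1
    elif kind == "pool":
        side //= cfg["size"]
    elif kind == "flatten":
        features = channels * side * side
    elif kind == "dense":
        total_params += features * cfg["units"] + cfg["units"]
        features = cfg["units"]
    print(kind, (channels, side, side) if features is None else features)

print("total parameters:", total_params)
```

Despite having more layers, this network ends up with far fewer parameters than the simple one, because the second pooling step shrinks the flattened vector to 375 values before the first dense layer.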

##### Sequential Model
As with the simple network, the model is built with the Keras Sequential API, which only allows layers to be added in sequence.

In [25]:
# define the larger CNN model
def larger_model():
    # Create the model. Sequential only allows layers to be added in sequence.
    model = Sequential()
    # First convolutional layer with ReLU activation, followed by max pooling.
    model.add(Conv2D(30, (5, 5), input_shape=(1, 28, 28), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Second convolutional layer with ReLU activation, followed by max pooling.
    model.add(Conv2D(15, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Randomly drop 20% of the activations during training to reduce overfitting.
    model.add(Dropout(0.2))
    # Flatten the rank-4 output of the convolutional layers to rank 2 so that it
    # can be fed to a fully connected (dense) layer.
    model.add(Flatten())
    # First fully connected (dense) layer with ReLU activation.
    model.add(Dense(128, activation='relu'))
    # Second fully connected (dense) layer with ReLU activation.
    model.add(Dense(50, activation='relu'))
    # Last fully connected (dense) layer with softmax activation for classification.
    model.add(Dense(num_classes, activation='softmax'))
    # Compile the model: finalize it by adding a loss function, an optimizer and
    # performance metrics. MNIST has 10 possible classes, so the loss function is
    # categorical_crossentropy; the metric of interest is classification accuracy.
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model


###### Build, Fit and final evaluation of the model

As with the simple network, training runs for 10 epochs (10 full passes over the training set) with a batch size of 200.

In [28]:
# build the model
model = larger_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("CNN Error for the Large Convolutional Network is: %.2f%%" % (100-scores[1]*100))

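#### Convolutional Neural Network with Random Image Rotation up to 180 Degrees
Different people write digits at different angles. Here the training images are randomly rotated by up to 180 degrees using Keras' `ImageDataGenerator`. Note that the model trained in this section reuses the simple one-convolutional-layer architecture (`baseline_model`), not the larger network.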
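To get a feel for the most extreme case, the sketch below rotates a tiny array by exactly 180 degrees with NumPy. This is a simplification: the Keras generator picks an arbitrary angle within the configured range and interpolates pixel values, which this example does not attempt to reproduce. The 3x3 `image` is a hypothetical stand-in for a digit.

```python
import numpy as np

# Tiny hypothetical 3x3 "image" so the effect is easy to read.
image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# A 180-degree rotation is two successive 90-degree turns...
rotated = np.rot90(image, k=2)

# ...which is the same as reversing both the row order and the column order.
assert np.array_equal(rotated, image[::-1, ::-1])
print(rotated)
```

A half turn can make some digits look like others (a rotated "6" resembles a "9"), which is one plausible reason why such strong augmentation may lower test accuracy on upright digits.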

In [30]:
# Simple CNN for the MNIST Dataset
import numpy
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
# fix dimension ordering issue
from keras import backend as K
K.set_image_dim_ordering('th')
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][channels][width][height]
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
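# define data preparation: rotate each training image by a random angle of up to 180 degrees
# (ImageDataGenerator has to be imported before it can be used)
from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(rotation_range=180)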

In [31]:

# define a simple CNN model
def baseline_model():
	# create model
	model = Sequential()
	model.add(Conv2D(32, (5, 5), input_shape=(1, 28, 28), activation='relu'))
	model.add(MaxPooling2D(pool_size=(2, 2)))
	model.add(Dropout(0.2))
	model.add(Flatten())
	model.add(Dense(128, activation='relu'))
	model.add(Dense(num_classes, activation='softmax'))
	# Compile model
	model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
# build the model
model = baseline_model()
# Fit the model
model.fit_generator(datagen.flow(X_train.reshape(60000,1,28,28), y_train, batch_size=200), validation_data=(X_test.reshape(10000,1,28,28), y_test), epochs=10)


In [32]:
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("CNN Error for the Convolutional Network with random rotation up to 180 degrees: %.2f%%" % (100-scores[1]*100))

In [33]:
model.summary()

### Support Vector Machine (SVM)

##### Imports

In [36]:
import matplotlib.pyplot as plt
import numpy as np
import time
import datetime as dt

In [37]:
#fetch original mnist dataset
from sklearn.datasets import fetch_mldata

In [38]:
# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics

##### Fetching of data from MNIST

In [40]:
mnist = fetch_mldata('MNIST original', data_home='./')

##### Looking up keys present in data

In [42]:

mnist.keys()

In [43]:
#data field is 70k x 784 array, each row represents pixels from 28x28=784 image
images = mnist.data
targets = mnist.target

In [44]:
#full dataset classification
X_data = images/255.0
Y = targets

In [45]:
#split data to train and test
#from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_data, Y, test_size=0.15, random_state=42)
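The split sizes that result from `test_size=0.15` on the 70,000-row array can be worked out by hand. This is only a back-of-the-envelope sketch with made-up variable names; 15% of 70,000 is a whole number, so rounding does not matter here.

```python
n_samples = 70000   # rows in the full MNIST array
test_size = 0.15

n_test = round(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 59500 10500
```

The SVM is therefore evaluated on a random 10,500-image split rather than on the standard 10,000-image MNIST test set used for the CNNs, so the accuracy figures are not measured on exactly the same images.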

In [46]:
# Create a classifier: a support vector classifier

param_C = 5
param_gamma = 0.05
classifier = svm.SVC(C=param_C,gamma=param_gamma)

# We learn the digits on train part
start_time = dt.datetime.now()
print('Start learning at {}'.format(str(start_time)))
classifier.fit(X_train, y_train)
end_time = dt.datetime.now() 
print('Stop learning {}'.format(str(end_time)))
elapsed_time= end_time - start_time
print('Elapsed learning {}'.format(str(elapsed_time)))
expected = y_test
predicted = classifier.predict(X_test)
print("Classification report for classifier %s:\n%s\n"
      % (classifier, metrics.classification_report(expected, predicted)))
      
cm = metrics.confusion_matrix(expected, predicted)
print("Confusion matrix:\n%s" % cm)

print("Accuracy={}".format(metrics.accuracy_score(expected, predicted)))
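The classifier above is created without a `kernel` argument, so scikit-learn's default RBF kernel applies, with `gamma` controlling how quickly similarity falls off with distance. The sketch below computes that kernel for two hypothetical images. The function `rbf_kernel` and the arrays `a` and `b` are written for this illustration and are not taken from scikit-learn.

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """Similarity exp(-gamma * ||x - y||^2) between two feature vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

gamma = 0.05  # same value as param_gamma above

# Two hypothetical 784-pixel images that differ in 20 pixels by the full 0..1 range.
a = np.zeros(784)
b = np.zeros(784)
b[:20] = 1.0

print(rbf_kernel(a, a, gamma))  # identical images give 1.0
print(rbf_kernel(a, b, gamma))  # exp(-0.05 * 20) = exp(-1)
```

A larger `gamma` makes the similarity decay faster, so each training image influences a smaller neighborhood, while `C` trades off margin width against training errors.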

##### Citations

https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/03C_Keras_API.ipynb

## Results

##### Accuracy of finding the digit using the Simple Convolutional Network: 99.07 %

##### Accuracy of finding the digit using the Large Convolutional Network: 99.26 %

##### Accuracy of finding the digit using the Convolutional Network with random rotation up to 180 degrees: 96.46 %

##### Accuracy of finding the digit using the Support Vector Machine (SVM): 98.52 %

The text in the document by Rajasekhar Reddy Duddugunta is licensed under CC BY 3.0 https://creativecommons.org/licenses/by/3.0/us/

The code in the document by Rajasekhar Reddy Duddugunta is licensed under the MIT License https://opensource.org/licenses/MIT

Copyright 2018 Rajasekhar Reddy Duddugunta

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.