# Digit Identification

Recognising a few strokes on a piece of paper as a number may seem like a piece of cake to us humans, thanks to millions of years of evolution and a highly evolved brain, but for a computer this task can be really daunting.

In this project, I trained a convolutional neural network to identify digits. Then I took it a step further by writing another program that locates the digits in a picture and then identifies them.

![Image identification example](https://www.concordia.ca/students/birks/student-id/8-digit-student-id-card/_jcr_content/content-main/image.img.jpg/1449603985466.jpg)

Follow this comprehensive notebook to gain a deeper understanding of the whole process.


In [6]:
import warnings
warnings.filterwarnings('ignore')

import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.metrics import accuracy_score
import numpy as np

from keras.datasets import mnist
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.models import model_from_json  # Rebuilds a saved network from its JSON description
from keras import backend as K
K.set_image_dim_ordering('th')

We start off by suppressing the warnings that would otherwise show up in the Jupyter Notebook output. They tend to be irritating at times.

We import __matplotlib__ to display images in the notebook. _%matplotlib inline_ is an IPython magic function that instructs the notebook to render each plot right below the cell that produced it.

We import the training data from Keras' dataset collection.

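An RGB image has three dimensions: rows, columns and the number of colour channels, which is three (red, green and blue). Keras can represent a 64x64 RGB image either as (64, 64, 3), with the channels last, or as (3, 64, 64), with the channels first.
The __set_image_dim_ordering('th')__ function specifies whether the colour channel comes first or last. Here 'th' (the Theano convention) puts it first, which is why the single-channel MNIST images are later reshaped to (1, 28, 28).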
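To make the two layouts concrete, here is a small NumPy sketch. It is not part of the original notebook, and the array names are purely illustrative. It builds a dummy image in channels-last layout and moves the channel axis to the front, which is the layout selected by 'th'.

```python
import numpy as np

# Dummy 64x64 RGB image in channels-last layout: (rows, cols, channels).
image_channels_last = np.zeros((64, 64, 3), dtype="float32")

# Move the channel axis to the front: (channels, rows, cols).
image_channels_first = np.moveaxis(image_channels_last, -1, 0)

print(image_channels_last.shape)   # (64, 64, 3)
print(image_channels_first.shape)  # (3, 64, 64)

# A grayscale digit has a single channel, so channels-first gives (1, 28, 28).
digit = np.zeros((28, 28), dtype="float32")
digit_channels_first = digit[np.newaxis, :, :]
print(digit_channels_first.shape)  # (1, 28, 28)
```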


In [17]:
def acquire_data():
    
    (train_data, train_labels), (test_data, test_labels) = mnist.load_data()
    
    train_data = train_data.reshape(train_data.shape[0], 1, 28, 28).astype('float32')/255
    test_data = test_data.reshape(test_data.shape[0], 1, 28, 28).astype('float32')/255
    
    train_labels = np_utils.to_categorical(train_labels)
    test_labels = np_utils.to_categorical(test_labels)
    num_categories = test_labels.shape[1]
    
    return train_data, train_labels, test_data, test_labels, num_categories

A neural network requires a lot of data to train on, and leaning on an existing dataset saves us a lot of time. We will use the MNIST dataset of handwritten digits, which contains 70,000 grayscale images of digits (60,000 for training and 10,000 for testing), each 28 px wide and 28 px tall.

![MNIST sample image](https://corochann.com/wp-content/uploads/2017/02/mnist_plot.png)

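We load the data into the respective variables, reshape the images to (1, 28, 28), cast them to float32 and normalise them by dividing by 255 so that every pixel value lies between 0 and 1. Rescaling the pixel values can speed up training. The labels are one-hot encoded, and the width of the encoded label matrix gives us the number of categories (10).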
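The two preprocessing steps can be tried on a toy example without downloading MNIST. The sketch below uses plain NumPy and made-up data. It only mimics what `to_categorical` produces, and it assumes the number of classes is fixed at 10 rather than inferred from the labels.

```python
import numpy as np

# Four fake 2x2 "images" with 8-bit pixel values, standing in for MNIST digits.
raw_images = np.array([[[0, 255], [128, 64]]] * 4, dtype="uint8")
raw_labels = np.array([3, 0, 9, 3])

# Cast to float and rescale from [0, 255] to [0, 1].
images = raw_images.astype("float32") / 255

# One-hot encode: row i has a 1 in column raw_labels[i] and 0 elsewhere.
num_categories = 10  # assumed here; to_categorical infers it from the labels
one_hot = np.eye(num_categories, dtype="float32")[raw_labels]

print(images.min(), images.max())  # smallest and largest pixel after scaling
print(one_hot[0])                  # 1 at index 3, zeros elsewhere
```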

In [20]:
def define_model():
    
    model = Sequential()
    model.add(Conv2D(30, (5, 5), input_shape=(1, 28, 28), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(15, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(num_categories, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    model.summary()  # prints the layer table itself; it returns None
    return model

This function is invoked to build the neural network. Please refer to my previous notebook for explanations of how each layer works. Given below is a diagram of this CNN's architecture.
![CNN Architecture](Cnn_architecture.jpg)

We have two convolution layers that scan the image patch by patch to learn the features associated with each digit. These layers are followed by a fully connected three-layer network which classifies the digit. We receive the output as probabilities, and the digit with the highest probability is chosen.
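The feature-map sizes and parameter counts of the two convolution blocks can be checked by hand. The sketch below uses hypothetical helper functions that are not part of the notebook. It assumes the Keras defaults used in `define_model`: unpadded ("valid") convolutions with stride 1 and non-overlapping 2x2 pooling.

```python
def conv_output_size(size, kernel):
    """Side length after an unpadded convolution with stride 1."""
    return size - kernel + 1

def pool_output_size(size, pool):
    """Side length after non-overlapping pooling."""
    return size // pool

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

size = 28
size = conv_output_size(size, 5)  # 24 after the 5x5 convolution
size = pool_output_size(size, 2)  # 12 after the first pooling layer
size = conv_output_size(size, 3)  # 10 after the 3x3 convolution
size = pool_output_size(size, 2)  # 5 after the second pooling layer

flattened = 15 * size * size      # inputs to the first Dense layer
print(size, flattened)            # 5 375
print(conv_params(5, 1, 30))      # 780
print(conv_params(3, 30, 15))     # 4065
```

These numbers should agree with the output shapes and parameter counts that `model.summary()` reports for the convolution, pooling and flatten layers.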


In [10]:
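def save_model(model):
    import os
    path = "Models"
    os.makedirs(path, exist_ok=True)  # create the folder if it is missing
    # The architecture is stored as JSON and the weights as HDF5.
    with open(os.path.join(path, "model.json"), "w") as json_file:
        json_file.write(model.to_json())
    model.save_weights(os.path.join(path, "model_weights.h5"))
    print("Neural Network saved to disk. Path:", path)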

In [22]:
train_data, train_labels, test_data, test_labels, num_categories = acquire_data()

print ("Train data size = ", len(train_data))
print ("Test data size = ", len(test_data))

model = define_model()
model.fit(train_data, train_labels, batch_size = 200, epochs = 30)


Train data size =  60000
Test data size =  10000
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_10 (Conv2D)           (None, 30, 24, 24)        780       
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 30, 12, 12)        0         
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 15, 10, 10)        4065      
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 15, 5, 5)          0         
_________________________________________________________________
dropout_5 (Dropout)          (None, 15, 5, 5)          0         
_________________________________________________________________
flatten_5 (Flatten)          (None, 375)               0         
_________________________________________________________________
dense_12 (Dense)           

<keras.callbacks.History at 0x153024a5320>

In [27]:
predicted_labels = np.argmax(model.predict(test_data), axis = 1)
test_labels = np.argmax(test_labels, axis = 1)
print("Accuracy = ", accuracy_score(test_labels, predicted_labels)*100,"%")

Accuracy =  99.32 %
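The evaluation cell relies on `np.argmax` to turn each row of class probabilities (and each one-hot label) back into a single digit. Here is a minimal sketch of that step with made-up probabilities for three images. The values are illustrative and are not outputs of the trained model.

```python
import numpy as np

# Made-up softmax outputs for three images over the classes 0-9.
probabilities = np.array([
    [0.01, 0.01, 0.90, 0.01, 0.01, 0.01, 0.01, 0.02, 0.01, 0.01],
    [0.05, 0.70, 0.05, 0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02],
    [0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.15, 0.05],
])
true_labels = np.array([2, 1, 9])

# The predicted digit is the index of the largest probability in each row.
predicted = np.argmax(probabilities, axis=1)

# Accuracy is the fraction of predictions that match the true labels.
accuracy = np.mean(predicted == true_labels)

print(predicted)  # [2 1 8]
print(accuracy)   # two of the three predictions are correct
```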


In [160]:
save_model(model)

Saved model to disk


We achieved an accuracy of 99.32% on the test set. This is good, but there is room for improvement: higher accuracies can be achieved by using a different network or by tuning the hyper-parameters (the number of epochs and batch_size in this case). The model can now predict single digits accurately whenever it is tasked to do so. In real life, though, digits usually appear in groups, so what should we do in that case? Check digit_locater_and_identifier.ipynb for the solution.