# Digit recognizer using a convolutional neural network

Here we use a dataset from a Kaggle training competition: the MNIST dataset of handwritten digits. The data can actually be classified very well even with simple algorithms such as k-nearest neighbours, so the aim of this work is to get familiar with the Keras library and convolutional neural networks.

In [17]:
#importing libraries
import pandas as pd
import numpy as np
from keras.models import Sequential
# Flatten is used in the network below, so it is imported with the other layers
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras.utils import to_categorical
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

In [2]:
#importing datasets
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
print(train.shape, test.shape)
train.head()

(42000, 785) (28000, 784)


Unnamed: 0,label,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,4,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## Data Preprocessing

A little data preprocessing is needed here. There are no missing values, so all we have to do is scale the features (MinMaxScaler could also be used, but dividing by 255 is simpler), reshape the input arrays into 28x28x1 images, and transform the vector of class labels into a one-hot matrix.

In [22]:
#preprocessing
X = train.iloc[:,1:].values / 255
X_test = test.values / 255
y = train['label'].values

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size = 0.1, random_state = 147, stratify = y)

X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_val = X_val.reshape(X_val.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)

y_train = to_categorical(y_train)
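
To make the three preprocessing steps concrete, here is a minimal sketch on a tiny made-up array instead of the real `train.csv`. The names `pixels`, `labels`, `X_demo` and `y_demo` are hypothetical and exist only for this illustration, and `np.eye` is used as a plain NumPy stand-in for the kind of one-hot matrix that `to_categorical` builds.

```python
import numpy as np

# Hypothetical stand-in for a few rows of train.csv: 4 images, 784 pixels each,
# with integer intensities in 0..255 (the real data is not loaded here).
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(4, 784))
labels = np.array([1, 0, 4, 9])

# 1. Scale: dividing by 255 maps the fixed range [0, 255] onto [0, 1].
X_demo = pixels / 255

# 2. Reshape: each flat row of 784 values becomes a 28x28 image with 1 channel.
X_demo = X_demo.reshape(X_demo.shape[0], 28, 28, 1)

# 3. One-hot encode: row i of the 10x10 identity matrix encodes class i.
y_demo = np.eye(10)[labels]

print(X_demo.shape, y_demo.shape)
print(y_demo[2])  # the label 4 becomes a vector with a single 1 at index 4
```

Note that MinMaxScaler would use the minimum and maximum observed in each pixel column rather than the fixed range 0 to 255, so the two approaches are similar in spirit but not guaranteed to give identical values.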

## Creating a neural network

The most important step: we build a convolutional neural network. The structure of the CNN is the following:

* Convolutional layer with 32 filters, kernel window 3x3 and ReLU as activation function
* Pooling layer with pooling window 2x2
* Dropout regularization, dropping 25% 
* Convolutional layer with 64 filters, kernel window 3x3 and ReLU as activation function
* Pooling layer with pooling window 2x2
* Dropout regularization, dropping 25%
* Flatten layer
* Dense layer with 256 output nodes and ReLU as activation function
* Dropout regularization, dropping 25%
* Dense layer (the last one) with 10 output nodes and softmax as activation function

Finally, we compile the CNN using Adam as the optimizer, categorical cross-entropy as the loss function and accuracy as the metric.

In [20]:
#neural network structure
classifier = Sequential()
classifier.add(Conv2D(32, (3, 3), input_shape = (28, 28, 1), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
classifier.add(Dropout(0.25))
classifier.add(Conv2D(64, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
classifier.add(Dropout(0.25))
classifier.add(Flatten())
classifier.add(Dense(units = 256, activation = 'relu'))
classifier.add(Dropout(0.25))
classifier.add(Dense(units = 10, activation = 'softmax'))
classifier.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
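
As a sanity check on the structure above, the following sketch works out by hand the feature-map sizes and the number of trainable parameters in each layer. It assumes the Keras defaults for the layers used here (no padding and stride 1 for `Conv2D`, pooling stride equal to the pool size for `MaxPooling2D`); the helper names `conv_out` and `pool_out` are made up for this illustration.

```python
def conv_out(size, kernel):
    # Unpadded convolution with stride 1 shrinks each side by kernel - 1
    return size - kernel + 1

def pool_out(size, pool):
    # Non-overlapping pooling divides each side by the pool size, rounding down
    return size // pool

side, channels = 28, 1
params = {}

# Conv2D(32, (3, 3)): kernel_h * kernel_w * in_channels * filters weights + 1 bias per filter
params['conv1'] = 3 * 3 * channels * 32 + 32
side, channels = conv_out(side, 3), 32      # 26 x 26 x 32
side = pool_out(side, 2)                    # 13 x 13 x 32

# Conv2D(64, (3, 3))
params['conv2'] = 3 * 3 * channels * 64 + 64
side, channels = conv_out(side, 3), 64      # 11 x 11 x 64
side = pool_out(side, 2)                    # 5 x 5 x 64

# Flatten: all values of the last feature map in one vector
flat = side * side * channels

# Dense layers: inputs * outputs weights + 1 bias per output
params['dense1'] = flat * 256 + 256
params['dense2'] = 256 * 10 + 10

total_params = sum(params.values())
print(flat, params, total_params)
```

Pooling and dropout layers have no trainable weights, so under these assumptions almost all parameters sit in the first dense layer. The total can be compared with the output of `classifier.summary()`.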



## Fitting neural network

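We fit the CNN using mini-batch gradient descent with batch size 64 and pass through all training data 28 times (28 epochs). Keras holds out 10% of the training split as a validation set to monitor during fitting; this is separate from `X_val`, which the model does not see until evaluation.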

In [21]:
#fitting
classifier.fit(X_train, y_train, epochs = 28, batch_size = 64, validation_split = 0.1)

Train on 34020 samples, validate on 3780 samples
Epoch 1/28
...
Epoch 28/28


<keras.callbacks.History at 0x121e03668>
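
The sample counts printed by `fit` can be reproduced with simple arithmetic, which also shows how many weight updates the training performs. This is only a sketch of the bookkeeping; the variable names are made up for this illustration.

```python
import math

n_total = 42000                        # rows in train.csv
n_split = n_total - n_total // 10      # 37800 rows remain after the 10% X_val split
n_val_keras = n_split // 10            # 3780 rows held out by validation_split = 0.1
n_fit = n_split - n_val_keras          # 34020 rows actually used for weight updates

batch_size, epochs = 64, 28
# One weight update per mini-batch; the last batch of an epoch is smaller than 64
steps_per_epoch = math.ceil(n_fit / batch_size)
total_updates = steps_per_epoch * epochs

print(n_fit, n_val_keras, steps_per_epoch, total_updates)
```

The first two numbers match the line "Train on 34020 samples, validate on 3780 samples" above. Keras takes the validation samples from the end of the arrays passed to `fit`, before any shuffling.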

In [23]:
#predicting validation set
y_val_pred = classifier.predict(X_val)
y_val_pred = np.argmax(y_val_pred, axis = 1)
print(accuracy_score(y_val, y_val_pred), confusion_matrix(y_val, y_val_pred))

0.993333333333 [[413   0   0   0   0   0   0   0   0   0]
 [  0 467   0   0   1   0   0   0   0   0]
 [  0   1 414   1   0   0   0   2   0   0]
 [  0   0   3 430   0   1   0   0   0   1]
 [  0   0   0   0 403   0   0   0   0   4]
 [  1   0   0   0   0 374   3   0   1   1]
 [  0   0   0   0   1   0 413   0   0   0]
 [  0   0   1   0   0   0   0 438   0   1]
 [  0   0   0   0   0   0   1   0 403   2]
 [  0   0   0   0   1   0   0   0   1 417]]
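
The confusion matrix carries more information than the single accuracy number. The sketch below copies the matrix printed above and derives from it the overall accuracy, the recall of each digit and the most frequent mistake. In scikit-learn's convention, rows are true labels and columns are predicted labels. The variable names are made up for this illustration.

```python
import numpy as np

# Confusion matrix copied from the validation output (rows: true digit, columns: predicted digit)
cm = np.array([
    [413,   0,   0,   0,   0,   0,   0,   0,   0,   0],
    [  0, 467,   0,   0,   1,   0,   0,   0,   0,   0],
    [  0,   1, 414,   1,   0,   0,   0,   2,   0,   0],
    [  0,   0,   3, 430,   0,   1,   0,   0,   0,   1],
    [  0,   0,   0,   0, 403,   0,   0,   0,   0,   4],
    [  1,   0,   0,   0,   0, 374,   3,   0,   1,   1],
    [  0,   0,   0,   0,   1,   0, 413,   0,   0,   0],
    [  0,   0,   1,   0,   0,   0,   0, 438,   0,   1],
    [  0,   0,   0,   0,   0,   0,   1,   0, 403,   2],
    [  0,   0,   0,   0,   1,   0,   0,   0,   1, 417],
])

# Accuracy: correctly classified samples (the diagonal) over all samples
accuracy = np.trace(cm) / cm.sum()

# Recall per digit: correct predictions over the number of true samples in each row
recall = np.diag(cm) / cm.sum(axis=1)
worst_digit = int(np.argmin(recall))

# Most frequent mistake: the largest off-diagonal entry
errors = cm.copy()
np.fill_diagonal(errors, 0)
true_label, pred_label = np.unravel_index(errors.argmax(), errors.shape)

print(accuracy, worst_digit, recall[worst_digit])
print(true_label, pred_label, errors[true_label, pred_label])
```

On this validation set the digit 5 has the lowest recall, and the most common single error is a true 4 predicted as 9.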


In [26]:
#predicting test set
y_pred = classifier.predict(X_test)
y_pred = np.argmax(y_pred, axis = 1)
prediction = pd.concat([pd.Series(range(1, X_test.shape[0] + 1), name = 'ImageId'), pd.Series(y_pred, name = 'Label')], axis = 1)
prediction.to_csv('prediction.csv', sep = ',', header = True, index = False)
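
The step from network output to submission rows can be seen on a small example. The softmax layer returns one row of class probabilities per image, `np.argmax` picks the most probable class, and the image ids are counted from 1. The probabilities below are invented and use only 3 classes to keep the example readable.

```python
import numpy as np

# Hypothetical softmax output for 3 images and 3 classes (each row sums to 1)
probs = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.2, 0.6],
])

# Index of the largest probability in each row is the predicted class
pred_labels = np.argmax(probs, axis=1)

# Image ids start at 1, not 0
image_ids = np.arange(1, len(pred_labels) + 1)

# Rows in the same ImageId,Label layout as the prediction file
rows = ['ImageId,Label'] + [f'{i},{l}' for i, l in zip(image_ids, pred_labels)]
print('\n'.join(rows))
```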

We got 0.997 accuracy on the training set, 0.993 on the validation set and 0.991 on the test set. This is a good result, which places in the top 400 on Kaggle (actually higher, because there are many false submissions). Nevertheless, the model could be improved by adding new layers and tuning hyperparameters.