# Evaluating the MNIST Dataset With Convolutional Neural Networks

As you may remember, we previously evaluated the MNIST dataset using only densely connected layers. Convolutional neural networks (ConvNets) use filters to capture local patterns in images, such as the edges of objects, color differences, or shapes. In this notebook, we explore convolutional neural networks and use them to obtain a better result.

The convolutional architecture of the model used at the end of this notebook is shown below:

<img src="images/ConvolutionalArchitecture.png">

Each convolutional filter is a 3x3 kernel, while the pooling windows are 2x2. Both slide over the input one window at a time: a convolutional filter multiplies each window element-wise by its weights and sums the products, whereas a max-pooling layer keeps only the largest value in each window.

The figure below shows an example of vertical edge detection (padding=0, stride=1):

<img src="images/Filters.png">
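A minimal NumPy sketch of the same sliding-window computation follows, assuming a single-channel input, no padding, and stride 1. Like Keras's `Conv2D`, it does not flip the kernel (strictly speaking, it computes a cross-correlation). The function name `conv2d_valid`, the 6x6 image, and the kernel values are illustrative choices for this sketch and may not match the figure exactly.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` with no padding; each output is the sum of element-wise products."""
    n_h, n_w = image.shape
    f = kernel.shape[0]
    # Output size follows floor((n + 2p - f) / s) + 1 with p = 0
    out_h = (n_h - f) // stride + 1
    out_w = (n_w - f) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i * stride:i * stride + f, j * stride:j * stride + f]
            out[i, j] = np.sum(window * kernel)  # element-wise multiply, then sum
    return out

# Illustrative 6x6 grayscale image: bright left half (10), dark right half (0)
image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)

# Vertical edge detector: positive weights on the left column, negative on the right
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

result = conv2d_valid(image, kernel)
print(result)
```

The output is 4x4 (6 - 3 + 1 = 4), and every row is `[0, 30, 30, 0]`: the response is large only where the window straddles the boundary between the bright and dark halves, which is exactly the vertical edge.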

Max pooling takes the maximum value in each window. It is generally used with a 2x2 window and stride 2, so the height and width of its output are half those of the input (rounded down when the size is odd).

An example of max pooling:

<img src="images/MaxPooling.png" width=600 height=480>
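Max pooling can be sketched the same way. This minimal NumPy version assumes a single-channel input and, like Keras's default `'valid'` padding, drops any window that would run past the edge. The name `max_pool2d` and the 4x4 input values are hypothetical, chosen only for illustration.

```python
import numpy as np

def max_pool2d(x, pool=2, stride=2):
    """Keep the maximum of each pool x pool window; partial windows at the edge are dropped."""
    out_h = (x.shape[0] - pool) // stride + 1
    out_w = (x.shape[1] - pool) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + pool, j * stride:j * stride + pool]
            out[i, j] = window.max()
    return out

# Illustrative 4x4 input
x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 0],
              [7, 2, 9, 8],
              [1, 0, 3, 4]])

pooled = max_pool2d(x)
print(pooled)
```

Each 2x2 block collapses to its largest value, giving `[[6, 5], [7, 9]]`. With an odd input such as 11x11, the last row and column cannot form a full window, so the output is 5x5.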

### Importing Required Modules and Dataset

In [None]:
from keras import layers, models
from keras.datasets import mnist
from keras.utils import to_categorical

In [None]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz


* Because convolutional layers expect 4D tensors, we reshape the data to the format (samples, height, width, channels). MNIST images are grayscale, so there is a single channel.

* Pixel values are divided by 255 to scale them from the range [0, 255] to [0, 1].

In [None]:
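# Add a channel axis: (samples, height, width, channels)
train_images = train_images.reshape((60000, 28, 28, 1))
# Scale pixel values from [0, 255] to [0, 1]
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

# One-hot encode the digit labels (0-9) into vectors of length 10
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)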

### Creating and Evaluating the Model

In [None]:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

In [None]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
Total params: 55,744
Trainable params: 55,744
Non-trainable params: 0
_________________________________________________________________


In the convolutional part of the model, we have used 3 convolutional layers and 2 max-pooling layers.

* Generally, convolutional layers use 3x3 filters while pooling layers use 2x2 windows. Together they shrink the feature maps passed to the densely connected layers, which keeps those layers small and computationally cheaper. In this example, the input image has 28x28x1 = 784 values, while the output of the convolutional part has 3x3x64 = 576 values. For RGB images with 3 channels, the input would have 28x28x3 = 2352 values, three times as many as our grayscale input, yet the convolutional part would still output 3x3x64 = 576 values.

* The output size of a convolutional layer is `floor((n + 2p - f) / s) + 1`, where `n` is the input length (height or width), `p` the padding, `f` the filter size, and `s` the stride. For the first layer, `n = 28`, `p = 0`, `f = 3`, and `s = 1`, which gives 28 - 3 + 1 = 26.

* Max pooling halves the height and width of its input. In Keras, `MaxPooling2D` uses a stride equal to the pool size unless one is specified, so a 2x2 pool moves with stride 2. Applying the same formula with `f = s = 2` turns 26x26 into 13x13 and 11x11 into 5x5 (the last row and column are dropped).

* While the output shape shrinks, the number of filters in the convolutional layers increases gradually (32, then 64) so that the network can capture more patterns.
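The formula and the halving effect of pooling can be checked with a small pure-Python sketch that replays the convolutional part of the model above. It assumes Keras's default `'valid'` padding (`p = 0`), stride 1 for the convolutions, and a stride equal to the pool size for pooling; the helper names `conv_output_size` and `trace_shapes` are hypothetical.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output length of a convolution or pooling window: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

def trace_shapes(height=28, channels=1):
    """Follow the (height, width, channels) shape through the convolutional part of the model."""
    # (layer name, kind, filter or pool size, number of filters)
    conv_stack = [
        ("conv2d_1", "conv", 3, 32),
        ("max_pooling2d_1", "pool", 2, None),
        ("conv2d_2", "conv", 3, 64),
        ("max_pooling2d_2", "pool", 2, None),
        ("conv2d_3", "conv", 3, 64),
    ]
    n, c = height, channels
    shapes = []
    for name, kind, f, filters in conv_stack:
        if kind == "conv":
            # 'valid' padding, stride 1; the channel count becomes the number of filters
            n, c = conv_output_size(n, f, p=0, s=1), filters
        else:
            # Pooling stride defaults to the pool size; channels are unchanged
            n = conv_output_size(n, f, p=0, s=f)
        shapes.append((name, n, n, c))
    return shapes

for name, h, w, c in trace_shapes():
    print(f"{name:<16} {h}x{w}x{c}")
```

The shapes come out as 26x26x32, 13x13x32, 11x11x64, 5x5x64, and 3x3x64, matching the model summary, so the flattened output has 3 * 3 * 64 = 576 values. Calling `trace_shapes(channels=3)` for RGB input ends at the same 3x3x64 shape, because only the first layer's kernels see the input channels.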

In [None]:
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

In [None]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)               

* After the convolutional layers, we flatten the 3x3x64 output into a 576-element vector, because densely connected layers expect a 1D input for each sample.

* In the end, we have 93,322 trainable parameters.
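The 93,322 figure can be reproduced by hand. A convolutional layer with `filters` kernels of size `f x f` over `c_in` input channels has `(f * f * c_in + 1) * filters` parameters (the `+ 1` is each filter's bias), and a dense layer with `n_in` inputs and `n_out` units has `(n_in + 1) * n_out`. Pooling and `Flatten` layers have no parameters. The helper names below are hypothetical, used only for this sketch.

```python
def conv2d_params(f, c_in, filters):
    """Each filter has f * f * c_in weights plus one bias."""
    return (f * f * c_in + 1) * filters

def dense_params(n_in, n_out):
    """Each output unit has n_in weights plus one bias."""
    return (n_in + 1) * n_out

conv_counts = [
    conv2d_params(3, 1, 32),   # conv2d_1: grayscale input
    conv2d_params(3, 32, 64),  # conv2d_2
    conv2d_params(3, 64, 64),  # conv2d_3
]
dense_counts = [
    dense_params(3 * 3 * 64, 64),  # dense_1, fed by the 576-element Flatten output
    dense_params(64, 10),          # final 10-unit softmax layer
]

total = sum(conv_counts) + sum(dense_counts)
print(conv_counts, sum(conv_counts))
print(dense_counts, total)
```

The convolutional layers contribute 320 + 18,496 + 36,928 = 55,744 parameters, matching the first summary, and the two dense layers add 36,928 + 650, for a total of 93,322.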

In [None]:
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=64)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.callbacks.History at 0x7fe5e01510f0>

In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
test_acc



0.9927999973297119

As a result, thanks to the convolutional layers, the model reaches a test accuracy of about 99.3% (0.9928) on MNIST after only 5 epochs of training.