# CNN MNIST Model - Train and Test

In [53]:
import tensorflow as tf
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

### Load data

x_train is the dataset of 28x28 images of handwritten digits that the model will be trained on.

y_train is the dataset of labels that correspond to x_train.

x_test is the dataset of 28x28 images of handwritten digits that the model will be tested on.

y_test is the dataset of labels that correspond to x_test.

In [54]:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

### Count by digit

In [55]:
unique, counts = np.unique(y_train, return_counts=True)
dict(zip(unique, counts))

{0: 5923,
 1: 6742,
 2: 5958,
 3: 6131,
 4: 5842,
 5: 5421,
 6: 5918,
 7: 6265,
 8: 5851,
 9: 5949}
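
The counts sum to 60,000 training images, and the classes are roughly but not perfectly balanced. A minimal sketch that turns the printed numbers into class shares (the `counts_by_digit` and `shares` names are illustrative, not part of the notebook):

```python
# Counts copied from the output above; variable names are illustrative.
counts_by_digit = {0: 5923, 1: 6742, 2: 5958, 3: 6131, 4: 5842,
                   5: 5421, 6: 5918, 7: 6265, 8: 5851, 9: 5949}

total = sum(counts_by_digit.values())
shares = {digit: count / total for digit, count in counts_by_digit.items()}

# Identify the least and most frequent digits in the training set.
rarest = min(shares, key=shares.get)
most_common = max(shares, key=shares.get)

print(f"total images: {total}")
print(f"rarest digit: {rarest} ({shares[rarest]:.1%})")
print(f"most common digit: {most_common} ({shares[most_common]:.1%})")
```

With shares this close to 10% each, plain accuracy is a reasonable first metric for this dataset.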

### Normalization and Reshaping

##### The first convolution layer expects a single 4D tensor of shape 60000x28x28x1 (samples, height, width, channels), so a channel axis is added to the 60000x28x28 array.

In [56]:
# each image is 28x28 pixels with a single (grayscale) channel
input_shape = (28, 28, 1)

# reshape train: add a trailing channel axis
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], x_train.shape[2], 1)
# reshape test
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], x_test.shape[2], 1)
# normalize pixel values from [0, 255] to [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0
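
To see what the reshape and scaling do without loading the full dataset, here is a small sketch on a made-up batch of four random `uint8` images. The `fake_images` array is a stand-in for `x_train`, not MNIST data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for x_train: 4 grayscale images with integer pixels in [0, 255].
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Add a trailing channel axis, then scale pixels to the [0, 1] range.
reshaped = fake_images.reshape(fake_images.shape[0], 28, 28, 1)
scaled = reshaped / 255.0

print(reshaped.shape)  # (4, 28, 28, 1)
print(scaled.dtype)    # float64
print(scaled.min() >= 0.0, scaled.max() <= 1.0)
```

Dividing by `255.0` also converts the integer pixels to floating point, which is what the network's weights operate on.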

The labels for the training and testing datasets are currently integer class indices (0 to 9). They are categorical, not continuous, so before using them with a categorical loss we convert them to one-hot encodings.

For example, 2 becomes [0,0,1,0,0,0,0,0,0,0] and 7 becomes [0,0,0,0,0,0,0,1,0,0].

Run the following cell to transform the labels into one-hot encodings:

In [57]:
y_train = tf.one_hot(y_train.astype(np.int32), depth=10)
y_test = tf.one_hot(y_test.astype(np.int32), depth=10)
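
For intuition, the same encoding can be reproduced with plain NumPy by indexing into an identity matrix. This is only an illustrative sketch (the `labels` array is made up); the notebook itself uses `tf.one_hot`.

```python
import numpy as np

labels = np.array([2, 7])  # example integer labels

# Row i of the 10x10 identity matrix is the one-hot encoding of class i.
one_hot = np.eye(10, dtype=np.float32)[labels]

print(one_hot[0])  # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
print(one_hot[1])  # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]

# argmax recovers the original integer labels.
print(one_hot.argmax(axis=1))  # [2 7]
```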

### Now let's build our model

In [59]:
# model settings
batch_size = 64
num_classes = 10
epochs = 5
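
As a rough sanity check on these settings, the number of weight updates follows from the batch size. The sketch below assumes all 60,000 training images are used for fitting with no validation split, which the notebook does not state at this point.

```python
import math

batch_size = 64
epochs = 5
num_train = 60000  # assumption: every training image is used, no validation split

# The last batch of each epoch is smaller, so round up.
steps_per_epoch = math.ceil(num_train / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 938 4690
```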

### Model structure description

- Conv2D layers are convolutional layers. Each filter (32 in the first two convolution layers and 64 in the next two convolution layers) transforms a small patch of the image (5x5 for the first two Conv2D layers and 3x3 for the next two Conv2D layers). The same filter is slid across the whole image, so the transformation is applied at every position.

- MaxPool2D is a downsampling filter. It reduces a 2x2 matrix of the image to a single pixel with the maximum value of the 2x2 matrix. The filter aims to conserve the main features of the image while reducing the size.

- Dropout is a regularization layer. In our model, 25% of the nodes in the layer are randomly ignored during training, which pushes the network to learn features that do not depend on any single node. This helps reduce overfitting.

- relu is the rectifier activation function, and it is used to introduce nonlinearity into the model. It returns the input value if the input is >= 0. If the input is negative, it returns 0.

- Flatten converts the tensors into a 1D vector.

- The Dense layers are fully connected layers, the classic artificial neural network (ANN) building block. The last layer has one output per digit and returns the probability that an image is in each class.
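
The relu, 2x2 max-pooling, and flatten steps described above can be sketched in plain NumPy on a tiny made-up feature map. The helper functions here are hypothetical illustrations of the operations, not the Keras layers themselves.

```python
import numpy as np

def relu(x):
    # Keep non-negative values and replace negatives with 0.
    return np.maximum(x, 0)

def max_pool_2x2(x):
    # x has shape (height, width) with even height and width.
    h, w = x.shape
    # Group pixels into non-overlapping 2x2 blocks and take each block's maximum.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Made-up 4x4 feature map standing in for one convolution output channel.
feature_map = np.array([[ 1, -2,  3,  0],
                        [-4,  5, -6,  7],
                        [ 8, -1,  2, -3],
                        [ 0,  4, -5,  6]])

activated = relu(feature_map)     # negatives become 0
pooled = max_pool_2x2(activated)  # 4x4 -> 2x2
flat = pooled.flatten()           # 2x2 -> vector of length 4

print(pooled)  # [[5 7]
               #  [8 6]]
print(flat)    # [5 7 8 6]
```

Pooling halves each spatial dimension, so a 28x28 input becomes 14x14 after one MaxPool2D layer with a 2x2 window and stride 2.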

As this model aims to categorize the images, we will use a categorical_crossentropy loss function.
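
To make the loss concrete, here is a NumPy sketch of categorical cross-entropy for a single image whose true label is 7. The scores in `logits` are made up, and the `eps` clipping value is an assumption of this sketch; Keras computes the loss for you and averages it over the batch.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result sums to 1.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true is a one-hot vector, y_pred a vector of class probabilities.
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.zeros(10)
y_true[7] = 1.0          # one-hot label for the digit 7

logits = np.zeros(10)
logits[7] = 3.0          # made-up scores that favour class 7
probs = softmax(logits)

loss = categorical_crossentropy(y_true, probs)
uniform_loss = categorical_crossentropy(y_true, np.full(10, 0.1))

print(f"confident and correct: {loss:.3f}")
print(f"uniform guess:         {uniform_loss:.3f}")  # ln(10), about 2.303
```

The loss is the negative log of the probability assigned to the true class, so it shrinks toward 0 as the model becomes more confident in the correct digit.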