# Example 1: Simple MNIST CNN Model

GSB 545: Final Project

Stephanie Liu

June 7, 2022


# Simple MNIST convnet (Convolutional Neural Network)

See below for more information on the **original example** this was based on.

**Author:** [fchollet](https://twitter.com/fchollet)<br>
**Date created:** 2015/06/19<br>
**Last modified:** 2020/04/21<br>
**Description:** A simple convnet that achieves ~99% test accuracy on MNIST.

Resources Referenced in this Example:

- Neural Networks: https://sanchit2843.medium.com/introduction-to-neural-networks-660f6909fba9

- CNN: https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53

- MNIST Dataset: http://yann.lecun.com/exdb/mnist/

- Example: https://keras.io/examples/vision/mnist_convnet/


## Introduction

This example walks through how to classify images of handwritten digits from the MNIST dataset into one of ten classes (the digits 0 through 9). It uses a Convolutional Neural Network for classification and shows how to build one in Python using the Keras package.

### **Overview of Neural Networks**

A neural network is the basic building block of deep learning, loosely modeled on aspects of the human brain. It consists of layers stacked together to form a larger architecture. There are 3 types of layers, and each layer contains neurons, or nodes, whose incoming connections each carry their own weight.

- *Input layer*: This takes the independent variables from the data as input (each node is one variable)
- *Hidden layer(s)*: This is the middle section of the model. There can be multiple hidden layers, and you can specify the number of nodes in each one.
- *Output layer*: This is the layer that produces the predicted target variable. In our example, it would produce the predicted class (digit between 0-9) that the observation is assigned to.

Each node computes a weighted sum of its inputs (plus a bias term) and passes the result through an activation function; the outputs of one layer become the inputs of the next. In general, a simple neural network takes the inputs for one observation in the data, passes them through the nodes of each layer of the model, and produces a predicted output for that observation. The model then does this for every observation in the data.
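
As a minimal sketch of that idea, the snippet below pushes one observation through a single hidden layer using NumPy. The input values, weights, bias, and the `dense_forward` helper are all hypothetical and chosen only for illustration; ReLU is used as the activation because it appears later in this example.

```python
import numpy as np

def dense_forward(x, weights, bias):
    """Weighted sum of the inputs plus a bias, followed by a ReLU activation."""
    z = x @ weights + bias      # one weighted sum per node in the layer
    return np.maximum(z, 0.0)   # ReLU: negative sums become 0

# Hypothetical observation with 3 input variables
x = np.array([1.0, 2.0, 3.0])

# Hypothetical weights for a hidden layer with 2 nodes (shape: inputs x nodes)
weights = np.array([[0.5, -1.0],
                    [0.25, 0.5],
                    [-0.5, 0.0]])
bias = np.array([1.0, -0.5])

hidden = dense_forward(x, weights, bias)
print(hidden)  # node 0: 0.5, node 1: 0.0 (its weighted sum was negative)
```

Training a network amounts to adjusting `weights` and `bias` so that the final layer's output matches the known targets as closely as possible.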



### **Convolutional Neural Networks**

A Convolutional Neural Network (CNN) is a class of artificial neural network (ANN) that is commonly applied to analyze visual imagery. It is a Deep Learning algorithm that takes in an input image, assigns importance (i.e. updatable weights) to various objects in the image, and learns to differentiate one from another.


An image is a matrix of pixel values. For instance, an image could be saved as a 3x3 matrix of pixels, and this is how it is stored in the data. By applying the relevant filters, the CNN model is able to capture the *spatial* dependencies in an image, that is, how neighboring pixels relate to one another. The role of the CNN is to reduce the images into a form that is easier to process, without losing the features that are essential to getting a good prediction.


**Convolution Layer:**
The element involved in carrying out the *convolution operation* in the first part of a Convolutional Layer is called the **Kernel/Filter, K**. This is typically much smaller than the original image. For example, for an image of size 5x5x1, we could choose a 3x3x1 kernel. When **Stride Length = 1** (Non-Strided), the kernel visits 9 positions (a 3x3 grid), each time multiplying K element-wise with the portion P of the image over which the kernel is hovering and summing the products into a single value. The filter moves by the Stride Value, repeating the process until the entire image is traversed.
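
The 5x5 image and 3x3 kernel case can be sketched in NumPy as follows. This is an illustrative, unoptimized loop rather than how Keras implements `Conv2D`; the `conv2d_valid` helper and the image and kernel values are made up. Each output value is the sum of the element-wise products between the kernel and the image patch it currently covers.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]  # portion P under the kernel
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # hypothetical 5x5 image
kernel = np.ones((3, 3))                          # hypothetical 3x3 kernel

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3): the kernel visited 9 positions
print(feature_map[0, 0])  # 54.0: sum of the top-left 3x3 patch
```

In a trained CNN the kernel values are not fixed by hand like this; they are the weights the network learns.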


There are two types of results to the convolution operation: one in which the convolved feature has smaller dimensionality than the input (**Valid Padding**), and one in which the input is padded so that the dimensionality either remains the same or is increased (**Same Padding**).
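
A small sketch of how the output size along one dimension follows from the padding choice is given below. The `conv_output_size` helper is hypothetical; the formulas are the ones commonly documented for Keras-style `"valid"` and `"same"` padding, where `"same"` with a stride of 1 keeps the size unchanged.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one spatial dimension for a Keras-style convolution."""
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input
        return (input_size - kernel_size) // stride + 1
    if padding == "same":
        # Input is padded so the output only shrinks by the stride
        return math.ceil(input_size / stride)
    raise ValueError(f"unknown padding: {padding}")

print(conv_output_size(5, 3, padding="valid"))  # 3
print(conv_output_size(5, 3, padding="same"))   # 5
print(conv_output_size(28, 3))                  # 26
```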

**Pooling Layer(s):**

Similar to the Convolutional Layer, the **Pooling layer** reduces the spatial size of the *Convolved Feature*. This decreases the computation required to process the data. It is also useful for extracting dominant features that change little under small rotations or shifts in position.

There are two types of Pooling: 
- **Max Pooling**: returns the *maximum value* from the portion of the image covered by the Kernel.
- **Average Pooling**: returns the *average* of all the values from the portion of the image covered by the Kernel.

Usually, Max Pooling is preferred (and we'll use this approach in the example below).
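
To make the two pooling types concrete, here is a minimal NumPy sketch with a 2x2 window on a made-up 4x4 feature map. The `pool2d` helper is hypothetical and uses non-overlapping windows, which matches the default behavior of a 2x2 pooling layer.

```python
import numpy as np

def pool2d(feature_map, pool_size=2, mode="max"):
    """Non-overlapping pooling over pool_size x pool_size windows."""
    h, w = feature_map.shape
    out_h, out_w = h // pool_size, w // pool_size
    # Drop any rows/columns that do not fill a complete window
    trimmed = feature_map[:out_h * pool_size, :out_w * pool_size]
    # Axes 1 and 3 index the position inside each window
    windows = trimmed.reshape(out_h, pool_size, out_w, pool_size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    if mode == "average":
        return windows.mean(axis=(1, 3))
    raise ValueError(f"unknown mode: {mode}")

feature_map = np.array([[1, 3, 2, 4],
                        [5, 7, 6, 8],
                        [9, 2, 1, 0],
                        [3, 4, 5, 6]], dtype=float)

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)  # [[7. 8.] [9. 6.]]
print(avg_pooled)  # [[4.  5. ] [4.5 3. ]]
```

Either way, the 4x4 input is reduced to a 2x2 output, which is the reduction in spatial size described above.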

Finally, after passing through the convolution and pooling layers described above, we flatten the final output and feed it to a regular Neural Network layer, which produces the classification.




### **About the MNIST Dataset**

The MNIST dataset is a database of handwritten digits. It has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been normalized and centered in a fixed-size image. These images have been saved as a matrix of pixels, which is what we will feed into the model below.

## Setup

Our first step is to load the packages required for the model. This walkthrough uses the *keras* package (which plays a role similar to tidymodels in R), which is included in *tensorflow* and is typically used for building Neural Networks. We'll also import *numpy*, a Python package used for many basic numerical computations (and randomization).

In [1]:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

## Prepare the data


Loading the MNIST data returns a tuple of NumPy arrays: (x_train, y_train), (x_test, y_test).

**x_train**: uint8 NumPy array of grayscale image data with shape (60000, 28, 28), containing the training data. Pixel values range from 0 to 255.

**y_train**: uint8 NumPy array of digit labels (integers in range 0-9) with shape (60000,) for the training data.

**x_test**: uint8 NumPy array of grayscale image data with shape (10000, 28, 28), containing the test data. Pixel values range from 0 to 255.

**y_test**: uint8 NumPy array of digit labels (integers in range 0-9) with shape (10000,) for the test data.


Since the pixel values range from 0 to 255, we scale these values to get values between 0 and 1 by dividing the values by 255. 

The target (y) values range from 0 to 9, to represent one of the ten possible numbers displayed in the image.

Note that **x_train** and **y_train** have 60000 observations, while **x_test** and **y_test** have 10000 observations. This corresponds to roughly an 86-14% train-test split.

In [2]:
# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)

# load mnist data, split between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")


# convert class vectors to binary class matrices (with 0 or 1)
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
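
The scaling and "binary class matrix" (one-hot) steps above can be illustrated on a tiny made-up example. This sketch uses plain NumPy instead of `keras.utils.to_categorical`, so it only approximates what that utility does; the pixel and label values are hypothetical.

```python
import numpy as np

# Hypothetical pixel values covering the full uint8 range
pixels = np.array([[0, 128, 255]], dtype="uint8")
scaled = pixels.astype("float32") / 255   # now between 0 and 1

# Hypothetical digit labels
labels = np.array([3, 0, 9])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(scaled.min(), scaled.max())  # 0.0 1.0
print(one_hot[0])                  # a single 1 at index 3, zeros elsewhere

# Share of all 70000 images that ends up in the training set
train_share = 60000 / (60000 + 10000)
print(round(train_share, 3))       # 0.857
```

Each row of `one_hot` has exactly one 1, in the column matching the digit, which is the format the softmax output layer and the categorical crossentropy loss expect.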


## Build the model

Next, we will build the structure of the CNN model. Note that we will not fit (train) the model on the data until a later step.

The model below has the following layers:
- *Input layer*: with shape (28, 28, 1)
- *Convolutional layer*: with 32 filters, a 3x3 kernel, and relu activation function
- *Pooling layer* (MaxPooling): with a 2x2 pool size
- Second *Convolutional layer*: with 64 filters, a 3x3 kernel, and relu activation function
- Second *Pooling layer* (MaxPooling): with a 2x2 pool size
- *Flatten + Dropout layers*: to flatten the matrices into a single vector before passing it to the output layer, and to randomly set inputs to 0 during training at a given rate (0.5), which helps reduce overfitting
- *Dense layer* (Output): which outputs 10 classes and uses the softmax activation function

You can see a summary of the model structure in the code output below.

In [3]:
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 flatten (Flatten)           (None, 1600)              0         
                                                                 
 dropout (Dropout)           (None, 1600)              0
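
The output shapes and parameter counts in the summary above can be reproduced by hand. The sketch below is plain arithmetic with hypothetical helper names. It assumes the Keras defaults used in this model: no padding, a stride of 1 for the convolutions, and one bias per filter or unit. The dense layer count is simply what the same formula gives for a 10-unit layer on 1600 inputs.

```python
def conv_params(kernel_size, in_channels, filters):
    """Weights (kernel_h * kernel_w * in_channels per filter) plus one bias per filter."""
    kernel_h, kernel_w = kernel_size
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return (inputs + 1) * units

# Spatial size after each layer, starting from a 28x28 input
size = 28
size = size - 3 + 1       # first Conv2D, 3x3 kernel, no padding -> 26
size = size // 2          # first MaxPooling2D, 2x2 -> 13
size = size - 3 + 1       # second Conv2D -> 11
size = size // 2          # second MaxPooling2D -> 5
flat = size * size * 64   # Flatten over 64 channels -> 1600

print(conv_params((3, 3), 1, 32))    # 320
print(conv_params((3, 3), 32, 64))   # 18496
print(flat)                          # 1600
print(dense_params(flat, 10))        # 16010 by the formula above
```

Pooling, flatten, and dropout layers have no trainable weights, which is why they show 0 parameters.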

## Train the model

Now that we've built the model, we can train the model on the training MNIST data.
Before doing so, we will use the ***compile*** function to specify the *loss function* (categorical crossentropy), *optimizer* (adam), and preferred *metrics* (accuracy).

- *categorical crossentropy loss function*: Computes the crossentropy loss between the labels and predictions. Use this crossentropy loss function when there are two or more label classes and the labels are one-hot encoded (in our case we have 10 classes, digits 0-9).

- *adam optimizer*: Optimizer that implements the Adam algorithm. Adam optimization is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments.

- *accuracy metric*: Calculates how often predictions equal labels. Commonly used for classification models, to help determine model performance.
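
As a rough illustration of the loss and metric described above, the following NumPy sketch computes a softmax, the categorical crossentropy, and the accuracy for two made-up observations with three classes. The helper functions are simplified stand-ins, not the Keras implementations, and the logits and labels are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 per row."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over observations of -log(predicted probability of the true class)."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    """Fraction of observations whose most probable class equals the true class."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Hypothetical raw scores for 2 observations and 3 classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
# One-hot labels: true classes are 0 and 2
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
acc = accuracy(y_true, probs)
print(acc)  # 0.5: the first observation is classified correctly, the second is not
```

The loss is small when the model puts high probability on the true class and grows quickly when it does not, which is what the optimizer tries to minimize during training.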

Next, we use the ***fit*** function to fit the model on the train x and y data. Here, we specify the following:

- *batch size = 128*

    Number of samples per gradient update.

- *epochs = 15*

    Number of epochs to train the model. An epoch is an iteration over the entire x and y data provided.

- *validation split = 0.1*

    Float between 0 and 1. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling.
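
To see what these settings imply for the 60000 training images, here is a short arithmetic sketch. The variable names are illustrative, and it assumes the final, smaller batch of each epoch is kept rather than dropped.

```python
import math

num_samples = 60000
validation_split = 0.1
batch_size = 128
epochs = 15

num_val = round(num_samples * validation_split)   # held out for validation
num_train = num_samples - num_val                 # actually used for training

updates_per_epoch = math.ceil(num_train / batch_size)
last_batch_size = num_train - (updates_per_epoch - 1) * batch_size
total_updates = updates_per_epoch * epochs

print(num_train, num_val)   # 54000 6000
print(updates_per_epoch)    # 422
print(last_batch_size)      # 112
print(total_updates)        # 6330
```

So each epoch performs 422 gradient updates on 54000 images, and the remaining 6000 images are used only to report validation loss and accuracy.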



In [4]:
batch_size = 128
epochs = 15

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x7fd20578be50>

## Evaluate the trained model

Now that we've trained the model on the data, we can use the ***evaluate*** function to see how well the model performs (i.e. how well it classifies each image to its corresponding digit). Note that we fit the model above on the train data, and we evaluate here on the test data, which the model has not seen during training. We will print the test loss and accuracy, and use these metrics to help us determine whether this is a good model.



In [5]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

Test loss: 0.024533724412322044
Test accuracy: 0.9914000034332275


We see that the model classified the test images with over **99% accuracy** (about 99.1%). This is in line with the ~99% test accuracy described for the original example, and suggests that the model is well suited to classifying the MNIST images to their corresponding digits.
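
To put that accuracy in concrete terms, the short calculation below converts it into a count of test images. It uses only the accuracy printed above and the size of the test set; the variable names are illustrative.

```python
test_accuracy = 0.9914000034332275   # value printed by the evaluation step
num_test = 10000                     # number of test images

num_correct = round(test_accuracy * num_test)
num_wrong = num_test - num_correct

print(num_correct)  # 9914
print(num_wrong)    # 86
```

In other words, only 86 of the 10000 test digits were assigned to the wrong class.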