### **Goal:**
Test out a quick artificial neural network for the MNIST dataset, using a Keras `Sequential` model to classify the handwritten digits. Definitions vary, but a neural network is usually only considered Deep Learning when it has roughly three or more hidden layers of nodes. This notebook therefore builds two sequential models: a shallower one with two hidden layers, which is not considered Deep Learning, and a deeper one with four hidden layers, which is.

In [26]:
# imports
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# deep learning imports
import keras
from keras.layers import Dense
from keras.models import Sequential
from keras.utils import to_categorical # makes one-hot encoding easy (same package as the other keras imports)
from keras.callbacks import EarlyStopping

# notebook settings
%matplotlib inline
pd.options.display.max_columns = 100

The data could have been loaded directly through Keras (`keras.datasets.mnist`). I chose to load it from OpenML instead, as a reminder that with the `categorical_crossentropy` loss used below, the `fit` function requires a one-hot encoded target, which the `to_categorical` function produces.
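To make the one-hot requirement concrete, here is a minimal NumPy sketch of what `to_categorical` produces for integer labels, together with the cross-entropy that a `categorical_crossentropy` loss computes against a softmax output. The helper names `one_hot`, `softmax`, and `categorical_crossentropy` are hypothetical illustrations, not Keras functions, and Keras's own implementation adds details (such as clipping probabilities) that are omitted here.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: row i is all zeros except a 1.0 at column labels[i]."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype=np.float32)[labels]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (numerically stable form)."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_pred):
    """Mean over samples of -sum(y_true * log(y_pred)); y_true must be one-hot."""
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

labels = np.array([5, 0, 4])                 # example integer digit labels
targets = one_hot(labels, num_classes=10)
print(targets[0])                            # a 1.0 at index 5, zeros elsewhere

# With uniform predictions over 10 classes, the loss equals log(10)
uniform = softmax(np.zeros((3, 10)))
print(categorical_crossentropy(targets, uniform))
```

If one-hot encoding is not wanted, Keras also offers a `sparse_categorical_crossentropy` loss that accepts the integer labels directly.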

In [27]:
# load the MNIST dataset
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1)
mnist.keys()

dict_keys(['data', 'target', 'frame', 'categories', 'feature_names', 'target_names', 'DESCR', 'details', 'url'])

In [28]:
# set X and target matrices
X, y = mnist['data'], mnist['target'].astype('int')

target = to_categorical(y)
target

array([[0., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)

As when training any other model, we use a validation set to estimate how well the model will perform on unseen data. Keras builds this in through the `validation_split` parameter of `fit`, which holds out the last fraction of the samples (selected before any shuffling) as validation data. The drawback is that this split cannot be stratified, so a noticeable class imbalance could make the validation estimate less reliable. The following checks whether the classes are imbalanced before running the model:
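The sketch below contrasts the two approaches on a toy label array. The hypothetical `last_fraction_split` mimics the behavior the Keras documentation describes for `validation_split` (the final fraction of rows, unshuffled, becomes the validation set); Keras's exact rounding of the cut point is an implementation detail and may differ. The hypothetical `stratified_split` is a hand-rolled alternative that keeps each class's share in the validation set. In practice, scikit-learn's `train_test_split(..., stratify=y)` does the same job, and the resulting arrays can be passed to `fit` through its `validation_data` argument instead of `validation_split`.

```python
import numpy as np

def last_fraction_split(n_samples, validation_split):
    """Indices for a Keras-style split: the last fraction of rows, no shuffling."""
    split_at = int(n_samples * (1.0 - validation_split))
    indices = np.arange(n_samples)
    return indices[:split_at], indices[split_at:]

def stratified_split(y, validation_split, seed=0):
    """Indices for a split that keeps each class's share in the validation set."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for cls in np.unique(y):
        # shuffle this class's rows, then send a fixed fraction to validation
        cls_idx = rng.permutation(np.flatnonzero(y == cls))
        n_val = int(round(len(cls_idx) * validation_split))
        val_idx.append(cls_idx[:n_val])
        train_idx.append(cls_idx[n_val:])
    return np.sort(np.concatenate(train_idx)), np.sort(np.concatenate(val_idx))

# Toy labels sorted by class: the worst case for an unshuffled split
y_toy = np.array([0] * 6 + [1] * 6 + [2] * 8)

_, val_last = last_fraction_split(len(y_toy), validation_split=0.25)
_, val_strat = stratified_split(y_toy, validation_split=0.25)

print(np.bincount(y_toy[val_last], minlength=3))   # only class 2 is validated
print(np.bincount(y_toy[val_strat], minlength=3))  # every class is represented
```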

In [29]:
pd.Series(y).value_counts(normalize=True)

1    0.112529
7    0.104186
3    0.102014
2    0.099857
9    0.099400
0    0.098614
6    0.098229
8    0.097500
4    0.097486
5    0.090186
dtype: float64

The class proportions are all close to 10%, ranging from about 9.0% (digit 5) to 11.3% (digit 1), so a stratified validation split is not necessary here.

Deep Learning is typically applied when large amounts of training data are available; however, even this simple model performs surprisingly well when trained on only 20% of the data and validated on the remaining 80%, as shown below:
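For scale, the arithmetic below shows roughly how many of the 70,000 images in OpenML's `mnist_784` land on each side of the two splits used in this notebook, and how many gradient updates that means per epoch at Keras's default `batch_size` of 32 (the `fit` calls here do not set it). The helper `split_summary` is hypothetical, and the counts are approximate because Keras computes its own cut point and may round differently by a sample.

```python
import math

N_SAMPLES = 70_000   # images in OpenML's mnist_784
BATCH_SIZE = 32      # Keras's default when fit() is not given batch_size

def split_summary(train_pct, n_samples=N_SAMPLES, batch_size=BATCH_SIZE):
    """Approximate (train size, validation size, steps per epoch) for a split."""
    n_train = n_samples * train_pct // 100       # integer math avoids float rounding
    n_val = n_samples - n_train
    steps = math.ceil(n_train / batch_size)      # the last batch may be partial
    return n_train, n_val, steps

for train_pct in (20, 75):
    n_train, n_val, steps = split_summary(train_pct)
    print(f"{train_pct}% train: {n_train} train / {n_val} validation, "
          f"{steps} steps per epoch")
```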

In [30]:
# train on the first 20% of the data and validate on the remaining 80%

# build the model
model = Sequential()

# add early stopping 
early_stopping_monitor = EarlyStopping(patience=3)

# add the layers - rough guess to start
model.add(Dense(50, activation='relu', input_shape=(len(mnist['feature_names']),))) # ReLU outputs max(0, x); a common default for hidden layers
model.add(Dense(50, activation='relu'))

# add final layer
model.add(Dense(10, activation='softmax'))

# compile
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# fit and validate
model.fit(X, target, validation_split=.80, epochs=30, callbacks=[early_stopping_monitor])



Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30


<keras.callbacks.History at 0x1584ba580>
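This run stopped after 15 of the 30 requested epochs because of the `EarlyStopping(patience=3)` callback. By default that callback monitors `val_loss` and stops once it has gone `patience` epochs without improving, so a stop at epoch 15 suggests the best validation loss came at epoch 12. The pure-Python sketch below reproduces that basic rule; the function name and the loss values are hypothetical, and it ignores the callback's other options such as `baseline` and `restore_best_weights`. Note that without `restore_best_weights=True`, the model keeps the weights from the last epoch rather than the best one.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never does.

    Sketch of the basic rule used by Keras's EarlyStopping with a loss to minimize:
    an epoch counts as an improvement only if the loss beats the best so far
    by more than min_delta; training stops after `patience` epochs without one.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss        # new best: reset the patience counter
            wait = 0
        else:
            wait += 1          # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: best value at epoch 12, then three epochs without improvement
losses = [0.60, 0.45, 0.38, 0.34, 0.31, 0.30, 0.29, 0.285, 0.28, 0.279, 0.278, 0.275,
          0.276, 0.280, 0.277]
print(early_stopping_epoch(losses))
```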

The model reaches a validation accuracy of about 91% with a 20/80 train/validation split, which is very good for a basic model trained on only 20% of the data. The next model shows the improvement from using a more conventional 75/25 split.

In [31]:
# train on the first 75% of the data and validate on the remaining 25%

# build the model
model = Sequential()

# add early stopping 
early_stopping_monitor = EarlyStopping(patience=3)

# add the layers - rough guess to start
model.add(Dense(50, activation='relu', input_shape=(len(mnist['feature_names']),))) # ReLU outputs max(0, x); a common default for hidden layers
model.add(Dense(50, activation='relu'))

# add final layer
model.add(Dense(10, activation='softmax'))

# compile
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# fit and validate
model.fit(X, target, validation_split=.25, epochs=30, callbacks=[early_stopping_monitor])


Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30


<keras.callbacks.History at 0x15859e550>

The model reaches a validation accuracy of about 95% with a 75/25 split, which is a strong result for such a basic model.

While the previous model is a neural network, it does not have enough layers to be considered a Deep Learning model. The following model increases the number of hidden layers from two to four and the nodes per layer from 50 to 75 to see whether this improves accuracy.
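One way to quantify how much larger the deeper model is: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below applies that formula to both architectures in this notebook (784 input pixels, then 50-50-10 units versus 75-75-75-75-10 units). The helper `dense_param_count` is hypothetical; the totals it returns are what Keras's `model.summary()` would list as trainable parameters for these layer sizes.

```python
def dense_param_count(n_inputs, layer_sizes):
    """Total weights + biases for a stack of fully connected (Dense) layers."""
    total = 0
    for n_units in layer_sizes:
        total += n_inputs * n_units + n_units  # weight matrix + bias vector
        n_inputs = n_units                     # this layer's output feeds the next
    return total

N_PIXELS = 784  # 28 x 28 MNIST images, flattened

shallow = dense_param_count(N_PIXELS, [50, 50, 10])            # first model
deep = dense_param_count(N_PIXELS, [75, 75, 75, 75, 10])       # second model
print(shallow, deep)
```

The deeper model has roughly 1.8 times as many parameters, most of them in its first layer.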

In [32]:
# build the model
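model2 = Sequential()

# add early stopping (redefined so this cell does not rely on an earlier one)
early_stopping_monitor = EarlyStopping(patience=3)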

# add the layers - rough guess to start
model2.add(Dense(75, activation='relu', input_shape=(len(mnist['feature_names']),))) # ReLU outputs max(0, x); a common default for hidden layers
model2.add(Dense(75, activation='relu'))
model2.add(Dense(75, activation='relu'))
model2.add(Dense(75, activation='relu'))

# add final layer
model2.add(Dense(10, activation='softmax'))

# compile
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# fit and validate
model2.fit(X, target, validation_split=.80, epochs=30, callbacks=[early_stopping_monitor])


Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30


<keras.callbacks.History at 0x1586cb6a0>

In [34]:
# build the model
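model2 = Sequential()

# add early stopping (redefined so this cell does not rely on an earlier one)
early_stopping_monitor = EarlyStopping(patience=3)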

# add the layers - rough guess to start
model2.add(Dense(75, activation='relu', input_shape=(len(mnist['feature_names']),))) # ReLU outputs max(0, x); a common default for hidden layers
model2.add(Dense(75, activation='relu'))
model2.add(Dense(75, activation='relu'))
model2.add(Dense(75, activation='relu'))

# add final layer
model2.add(Dense(10, activation='softmax'))

# compile
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# fit and validate
model2.fit(X, target, validation_split=.25, epochs=30, callbacks=[early_stopping_monitor])


Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30


<keras.callbacks.History at 0x1589743a0>

The deeper model improves only slightly on the 20/80 split (about 92% versus 91% validation accuracy), but reaches about 97% validation accuracy on the more conventional 75/25 split, up from 95%.
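Because accuracies near 100% compress differences, it can help to look at error rates instead. Using the rounded accuracies reported for the 75/25 split (95% for the shallow model, 97% for the deeper one), the sketch below converts them to error rates and approximate misclassification counts on a validation set of about 17,500 images (25% of 70,000). The helper `error_summary` is hypothetical, and the figures are only as precise as the rounded accuracies they start from.

```python
VAL_SIZE = 70_000 * 25 // 100   # approximate size of the 25% validation split

def error_summary(accuracy, val_size=VAL_SIZE):
    """Return (error rate, approximate number of misclassified validation images)."""
    error_rate = 1.0 - accuracy
    return error_rate, round(error_rate * val_size)

shallow_err, shallow_miss = error_summary(0.95)
deep_err, deep_miss = error_summary(0.97)
relative_reduction = (shallow_err - deep_err) / shallow_err

print(f"shallow: ~{shallow_miss} errors, deep: ~{deep_miss} errors")
print(f"relative error reduction: {relative_reduction:.0%}")
```

Seen this way, the two-point accuracy gain corresponds to roughly 40% fewer misclassified digits.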