# MNIST Model Training
#### Written by Ervin Mamutov - github/imervin

### Introduction
This is my Jupyter notebook containing the code I used to train my TensorFlow model to predict hand-drawn figures for 808. The model is trained and tested on the MNIST dataset and built using Keras with Python 3.

I will not explain in depth the things I have already covered in my [other notebook](https://github.com/ImErvin/Tensorflow-Problem-Sheet/blob/master/IrisNotebook.ipynb), so make sure to check that out before reading through this one.

I have adapted code from:

[1] https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py

[2] http://parneetk.github.io/blog/cnn-mnist/

### What is MNIST?

MNIST (Modified National Institute of Standards and Technology) is a large database of handwritten digits, built from a subset of the data collected by NIST (National Institute of Standards and Technology). MNIST is used to train image processing systems and is basically the "hello world" of machine learning and computer vision.

MNIST contains 60,000 training images and 10,000 testing images. Training images are used to train a system, and testing images are used to test the trained system.

### Where does the MNIST dataset come from?

The set of images in the MNIST database is a combination of two of NIST's databases: Special Database 1 and Special Database 3. Special Database 1 and Special Database 3 consist of digits written by high school students and employees of the United States Census Bureau, respectively.[1]


### What are TensorFlow and Keras?
TensorFlow is a popular open-source software library for dataflow programming across a range of tasks, developed by the Google Brain team. It is a symbolic math library and is also used for machine learning applications such as neural networks [2]. I will be using TensorFlow's Python API, but it is available for a range of languages.

Keras is an open-source neural network library written in Python, developed by a Google engineer, François Chollet. Keras acts like a "library on top of a library", as it is capable of running on top of MXNet, Deeplearning4j, TensorFlow, CNTK or Theano. Keras takes the functionality in core TensorFlow and adds a higher level of abstraction to it, making it easier to experiment with deep neural networks [3].

### 1. Download the MNIST dataset
Before I can start building my model, I must first get the MNIST dataset and decode it into a format that I can use later on. Luckily, MNIST is quite a popular dataset for machine learning, and Keras comes with a built-in loader for it.

The keras.datasets.mnist.load_data() function returns 2 tuples:

    x_train, x_test: uint8 array of grayscale image data with shape (num_samples, 28, 28).
    y_train, y_test: uint8 array of digit labels (integers in range 0-9) with shape (num_samples,).

I will be renaming x_train, x_test to training_images, testing_images and y_train, y_test to training_labels, testing_labels, but everything works the same as if I had kept the original names.

In [1]:
# Use keras's dataset mnist
from keras.datasets import mnist

# Initiate two tuples, the first with uint8 array of grayscale image data, the other with integers between 0-9
(training_images, training_labels), (testing_images, testing_labels) = mnist.load_data()

print("How many values in a training image?", len(training_images[7]) * len(training_images[7][0]))
print("What the 8th training image looks like:")
print(training_images[7])
print()
print("What the 8th training image looks like as a sequence of dots and hashes:")
# Visualize a training image by printing each non-zero grayscale pixel as a # and each zero pixel as a .
for x in training_images[7]:
    print()
    for y in x:
        if(y != 0):
            print("#", end="")
        else:
            print(".", end="")

Using TensorFlow backend.


How many values in a training image? 784
What the 8th training image looks like:
[[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0  38  43 105 255 253 253 253
  253 253 174   6   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0  43 139 224 226 252 253 252 252 252
  252 252 252 158  14   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0 178 252 252 252 252 253 252 252 252
  252 252 252 252  59   0 

As you can see, the array's values are displayed as integers between 0 and 255. There are 784 values (28 x 28), each representing one pixel in the picture, and the integer is that pixel's grayscale intensity. 0 means black, and anything above 0 is lighter than black, with 255 being the brightest.

### 2. Import relevant libraries
I will need to import NumPy to use its extensive array support, and Keras to create, train and test the model.

In [2]:
import keras as kr #importing keras
import numpy as np #importing numpy

### 3. Preparing the data for training/testing

In [3]:
print("Training/testing Images shape:",training_images.shape,"/",testing_images.shape)
print("Training/testing Labels shape:",training_labels.shape,"/",testing_labels.shape)
print("First 5 training labels and testing labels:", training_labels[:5], "/", testing_labels[:5])

Training/testing Images shape: (60000, 28, 28) / (10000, 28, 28)
Training/testing Labels shape: (60000,) / (10000,)
First 5 training labels and testing labels: [5 0 4 1 9] / [7 2 1 0 4]


As you can see from the output above, the shape of the image arrays is (number_of_images, 28, 28), meaning that there are 28 rows and 28 columns of pixels representing each picture. The shape of the label arrays is (number_of_labels,); each label corresponds to the image at the same index in the image arrays.

To prepare my data I must consider what type of architecture my neural network will have:
1. Basic Neural Network
2. Convolutional Neural Network

After some research I found that a basic NN works fine but tops out at around 96-97% accuracy, while convolutional neural networks (CNNs) hit around 99% accuracy for the same effort, because you reach the point of diminishing returns sooner with a basic NN. [4]

### Convolution

From what I understand, **Convolutional Neural Networks** are made for image recognition and classification purposes, inspired by the animal visual cortex. A CNN works by taking an n x n image (MyImage) and sliding a k x k filter (or convolution kernel) across it. At each position, the kernel is multiplied element by element with the patch of the image underneath it and the products are summed, giving one value of a new matrix that makes up the filtered image. Once we have the new matrix, we add a "pooling" layer, which takes each chunk of it and aggregates the values into a single value (downsampling) [5].

Here are image representations taken from [5].

Convolving the image matrix
![CNN1](https://cambridgespark.com/content/tutorials/convolutional-neural-networks-with-keras/figures/convolve.png)
(I = MyImage and K = Convolution Kernel)
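
To make the sliding-window idea concrete, here is a minimal NumPy sketch of what a single convolution pass does. It is only an illustration under my own assumptions: the function name `convolve2d_valid`, the toy 5x5 image and the kernel of ones are made up, and Keras's real layers are far more optimised and learn their kernel values during training. Like deep learning libraries, the sketch does not flip the kernel.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the elementwise products."""
    n_rows, n_cols = image.shape
    k_rows, k_cols = kernel.shape
    # Without padding the output shrinks by (kernel size - 1) in each dimension
    out = np.zeros((n_rows - k_rows + 1, n_cols - k_cols + 1), dtype=np.float32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + k_rows, j:j + k_cols]
            out[i, j] = np.sum(patch * kernel)
    return out

# Hypothetical toy data: a 5x5 "image" holding 0..24 and a 3x3 kernel of ones
toy_image = np.arange(25, dtype=np.float32).reshape(5, 5)
toy_kernel = np.ones((3, 3), dtype=np.float32)

feature_map = convolve2d_valid(toy_image, toy_kernel)
print(feature_map.shape)  # (3, 3)
print(feature_map[0, 0])  # 54.0, the sum of the top-left 3x3 patch
```

With a 28x28 input and a 3x3 kernel, the same rule gives a 26x26 output.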

Max pooling a chunk of the image
![CNN2](https://cambridgespark.com/content/tutorials/convolutional-neural-networks-with-keras/figures/pool.png)
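
Max pooling can be sketched in NumPy in the same spirit. This is a simplified illustration, not how Keras implements it: the helper `max_pool_2x2` and the toy 4x4 matrix are my own, and I assume non-overlapping 2x2 chunks.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value of every non-overlapping 2x2 chunk."""
    rows, cols = feature_map.shape
    # Drop a trailing odd row/column so the map splits cleanly into 2x2 chunks
    trimmed = feature_map[:rows - rows % 2, :cols - cols % 2]
    # Axes after the reshape: (chunk row, row inside chunk, chunk column, column inside chunk)
    blocks = trimmed.reshape(rows // 2, 2, cols // 2, 2)
    return blocks.max(axis=(1, 3))

# Hypothetical toy feature map
toy_map = np.array([[1, 3, 2, 0],
                    [4, 2, 1, 5],
                    [7, 0, 3, 3],
                    [1, 6, 2, 8]])

pooled = max_pool_2x2(toy_map)
print(pooled.tolist())  # [[4, 5], [7, 8]]
```

Each output value summarises four input values, so the width and height are halved.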

### Pre-processing arrays

As I'm building a CNN, I will need to reshape my data to add a "depth" dimension. A full image with all 3 RGB channels has a depth of 3, however the MNIST images only have a depth of 1 (grayscale). This means I must take the shape of the image arrays (number_of_images, 28, 28) and turn it into (number_of_images, 28, 28, 1) [6]. NumPy's .reshape function allows me to give an array a new shape without changing its data [7]. I will also be converting the data to floats, ready for the normalisation of the data.

In [4]:
training_images_rs = training_images.reshape(training_images.shape[0],28,28,1).astype("float32")
print("Training images shape after reshape =",training_images_rs.shape)
testing_images_rs = testing_images.reshape(testing_images.shape[0],28,28,1).astype("float32")
print("Testing images shape after reshape =",testing_images_rs.shape)

Training images shape after reshape = (60000, 28, 28, 1)
Testing images shape after reshape = (10000, 28, 28, 1)
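
To convince myself that reshaping only adds the depth axis and leaves the pixel values alone, the same operation can be checked on a tiny made-up array. The name `toy_batch` and its 2 x 3 x 3 size are purely illustrative.

```python
import numpy as np

# Hypothetical stand-in for the image array: 2 "images" of 3x3 pixels
toy_batch = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# Same pattern as the cell above: add a depth axis of 1 and convert to float32
toy_batch_rs = toy_batch.reshape(toy_batch.shape[0], 3, 3, 1).astype("float32")

print(toy_batch.shape, "->", toy_batch_rs.shape)  # (2, 3, 3) -> (2, 3, 3, 1)
# Dropping the depth axis again gives back exactly the original values
print(np.array_equal(toy_batch, toy_batch_rs[..., 0]))  # True
```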


Now that I have added a depth channel to the shape and turned every value into a float, I can normalize the data.
A common method of normalizing the data so that each pixel's value is between 0 and 1 is to divide each value by the maximum value it can take (in this case 255). Another method is to divide everything by the largest value present in the dataset. In the MNIST case there are pixels with the value 255, so the two methods give the same result and you can hardcode 255; otherwise you could use np.max to return the largest value in the array.

In [5]:
print("Maximum value in our training/testing images =", np.max(training_images_rs),np.max(testing_images_rs))
print("Maximum value in training images before normalization:", np.max(training_images_rs))

training_images_rs /= 255
testing_images_rs /= 255

print("Maximum value in training images after normalization:", np.max(training_images_rs))

Maximum value in our training/testing images = 255.0 255.0
Maximum value in training images before normalization: 255.0
Maximum value in training images after normalization: 1.0
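
The two normalisation methods described above only differ when the largest value present is below the largest possible value. Here is a small sketch on a made-up array (`toy_pixels` is hypothetical, not MNIST data) whose brightest pixel is 200 rather than 255:

```python
import numpy as np

# Hypothetical pixel values whose maximum (200) is below the possible maximum (255)
toy_pixels = np.array([[0, 64], [128, 200]], dtype="float32")

# Method 1: divide by the maximum value a pixel can take
scaled_by_255 = toy_pixels / 255.0
# Method 2: divide by the largest value actually present
scaled_by_max = toy_pixels / np.max(toy_pixels)

print(float(scaled_by_255.max()))  # roughly 0.78, the brightest pixel stays below 1
print(float(scaled_by_max.max()))  # 1.0
```

For MNIST both methods agree, because np.max of the images is already 255.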


The MNIST problem is a classification problem. This means there are multiple classes that an image can belong to, but it cannot belong to two classes at once, i.e. a hand-drawn image of the number 1 can only be a 1, so its probability of being 0, 2..9 is 0%. This can be tackled using probabilistic classification by producing a single output neuron for each class, each outputting a value which corresponds to the probability of the input being of that particular class. This means that I will need to change the labels of the images (outputs) into "one-hot" encoding.

One-hot encoding turns a class label into a binary vector with one entry per class, where only the entry for that label's class is 1. E.g. with 5 classes each vector has 5 entries; if the classes are fruits (an orange, banana, apple, peach and pear), a peach would be represented as [0,0,0,1,0] using one-hot encoding. This will also be used for the loss function later.

Keras comes with a utility that does exactly that. The utils.to_categorical utility converts a class vector to a binary class matrix [8].
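
The fruit example can be written out by hand in a few lines of NumPy. This is just a sketch to show the idea; the helper name `one_hot` is my own and is not a Keras function.

```python
import numpy as np

fruits = ["orange", "banana", "apple", "peach", "pear"]

def one_hot(index, num_classes):
    """Return a vector of zeros with a single 1 at position `index`."""
    vec = np.zeros(num_classes, dtype="float32")
    vec[index] = 1.0
    return vec

peach = one_hot(fruits.index("peach"), len(fruits))
print(peach.tolist())  # [0.0, 0.0, 0.0, 1.0, 0.0]

# The original class is recovered from the position of the 1
print(fruits[int(np.argmax(peach))])  # peach
```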

In [6]:
training_labels_cats = kr.utils.to_categorical(training_labels,  num_classes=10)
testing_labels_cats = kr.utils.to_categorical(testing_labels,  num_classes=10)

print("The figure",training_labels[500],"is represented as",training_labels_cats[500],"after one-hot encoding")

The figure 3 is represented as [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.] after one-hot encoding


### Building the Model
Now I can begin building my model. I will be using Keras's Sequential model, which is a stack of layers. I will be using Convolution2D layers accompanied by activation, max pooling and hidden layers. The steps involved in building my CNN are:
1. Apply 3x3 convolution kernels to the 28x28 MNIST image with ReLU activation
2. Apply 3x3 convolution kernels to the output of step 1 with ReLU activation
3. Apply a max pool of 2x2 to produce a downsampled image
4. Flatten the dimensions of the output of step 3
5. Perform a basic NN operation on the output of step 4 by adding a Dense layer with ReLU activation
6. Create an output layer with 10 classes and use the softmax activation

*Explanations for the activation functions and dense layers can be found in my [other notebook](https://github.com/ImErvin/Tensorflow-Problem-Sheet/blob/master/IrisNotebook.ipynb)*
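
As a quick reminder of what the softmax in step 6 does to the 10 output neurons, here is a minimal NumPy sketch. The raw scores are made up for illustration, and this is a hand-rolled version rather than the Keras implementation.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into positive values that sum to 1."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw outputs of the 10 output neurons for one image
raw_scores = np.array([1.0, 2.0, 0.5, 4.0, 0.0, -1.0, 0.3, 1.2, 0.1, 2.5])
probs = softmax(raw_scores)

print(probs.sum())            # sums to 1 (up to floating point rounding)
print(int(np.argmax(probs)))  # 3, the class with the highest score
```

The largest raw score always gets the largest probability, so the predicted digit is simply the index of the biggest output.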

In [41]:
# Import all necessary libraries for models
from keras.models import Sequential
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers import Activation, Flatten, Dense

# Stacked layer model
model = Sequential()
# Add a Convolution2D layer with 32 filters, kernel size of 3x3 - activation = relu
model.add(Convolution2D(32, kernel_size=(3, 3),activation='relu',input_shape=(28, 28,1 )))
print("Model Input Shape @ Conv Layer:",model.input_shape)
print("Model Output Shape @ Conv Layer:",model.output_shape,"<--- Changed by Convolution layer")
# Add another Convolution2D layer with 32 filters, kernel size of 3x3 - activation = relu
model.add(Convolution2D(32, kernel_size=(3, 3),activation='relu'))
print("Model Input Shape @ Conv Layer 2:",model.input_shape)
print("Model Output Shape @ Conv Layer 2:",model.output_shape,"<--- Changed by Convolution layer")
# Add a MaxPooling2D pooling layer with pool size of 2x2
model.add(MaxPooling2D(pool_size=(2, 2)))
print("Model Input Shape @ Pooling layer:",model.input_shape)
print("Model Output Shape @ Pooling layer:",model.output_shape,"<--- Changed by Max Pool")
# Flatten the data - (None, 12, 12, 32) into (None, 4608) - multiply the dimensions together (12 * 12 * 32) to get a single dimension.
model.add(Flatten())
print("Model Input Shape @ Flatten layer:",model.input_shape)
print("Model Output Shape @ Flatten layer:",model.output_shape,"<--- Changed by Flatten")
# Add a hidden layer to process flattened image with relu Activation
model.add(Dense(128, activation="relu"))
# Add an output layer with softmax activation
model.add(Dense(10, activation="softmax"))

model.summary()

Model Input Shape @ Conv Layer: (None, 28, 28, 1)
Model Output Shape @ Conv Layer: (None, 26, 26, 32) <--- Changed by Convolution layer
Model Input Shape @ Conv Layer 2: (None, 28, 28, 1)
Model Output Shape @ Conv Layer 2: (None, 24, 24, 32) <--- Changed by Convolution layer
Model Input Shape @ Pooling layer: (None, 28, 28, 1)
Model Output Shape @ Pooling layer: (None, 12, 12, 32) <--- Changed by Max Pool
Model Input Shape @ Flatten layer: (None, 28, 28, 1)
Model Output Shape @ Flatten layer: (None, 4608) <--- Changed by Flatten
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_45 (Conv2D)           (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_46 (Conv2D)           (None, 24, 24, 32)        9248      
_________________________________________________________________
max_pooling2d_21 (MaxPooling (None, 12, 12, 32)        0         
___
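
The output shapes and parameter counts printed above can be reproduced with a little arithmetic. This sketch assumes the Keras defaults of no padding and a stride of 1 for the convolution layers; the helper names `conv_output_size` and `conv_params` are my own.

```python
def conv_output_size(size, kernel):
    """Width/height after a convolution with no padding and stride 1."""
    return size - kernel + 1

def conv_params(kernel, in_channels, filters):
    """Each filter has kernel*kernel*in_channels weights plus 1 bias."""
    return (kernel * kernel * in_channels + 1) * filters

after_conv1 = conv_output_size(28, 3)           # first 3x3 convolution
after_conv2 = conv_output_size(after_conv1, 3)  # second 3x3 convolution
after_pool = after_conv2 // 2                   # 2x2 max pool halves each side
flattened = after_pool * after_pool * 32        # 32 feature maps flattened

print(after_conv1, after_conv2, after_pool, flattened)  # 26 24 12 4608
print(conv_params(3, 1, 32), conv_params(3, 32, 32))    # 320 9248
```

These match the (None, 26, 26, 32), (None, 24, 24, 32), (None, 12, 12, 32) and (None, 4608) shapes, and the 320 and 9248 parameters reported for the two convolution layers.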

Now that I have defined my model, I need to compile the model and train it! I will be using categorical cross-entropy as the loss function; more details about why are in my [other notebook](https://github.com/ImErvin/Tensorflow-Problem-Sheet/blob/master/IrisNotebook.ipynb).
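
To show how categorical cross-entropy uses the one-hot labels, here is a hand-rolled NumPy sketch for a single image. The predictions are invented, the function `categorical_cross_entropy` is my own simplified version, and Keras's built-in loss averages over a whole batch rather than looking at one sample.

```python
import numpy as np

def categorical_cross_entropy(one_hot_label, predicted_probs, eps=1e-7):
    """Negative log of the probability the model gave to the correct class."""
    clipped = np.clip(predicted_probs, eps, 1.0)  # avoid log(0)
    return float(-np.sum(one_hot_label * np.log(clipped)))

# One-hot label for the digit 3
label = np.zeros(10)
label[3] = 1.0

# Hypothetical predictions: one confident and correct, one spread evenly
confident = np.full(10, 0.01)
confident[3] = 0.91
unsure = np.full(10, 0.1)

loss_confident = categorical_cross_entropy(label, confident)
loss_unsure = categorical_cross_entropy(label, unsure)
print(loss_confident)  # roughly 0.09
print(loss_unsure)     # roughly 2.30
```

Because the label is one-hot, only the predicted probability of the correct class contributes, and the loss shrinks as that probability approaches 1.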

### References
[1] https://en.wikipedia.org/wiki/MNIST_database

[2] https://en.wikipedia.org/wiki/TensorFlow

[3] https://en.wikipedia.org/wiki/Keras

[4] https://datascience.stackexchange.com/questions/22173/why-not-use-more-than-3-hidden-layers-for-mnist-classification

[5] https://cambridgespark.com/content/tutorials/convolutional-neural-networks-with-keras/index.html

[6] https://elitedatascience.com/keras-tutorial-deep-learning-in-python

[7] https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.reshape.html

[8] https://keras.io/utils/

### End