<a href="https://colab.research.google.com/github/chadmh/Short-Hands-on-Tutorial-for-Deep-Learning-in-Tensorflow/blob/master/1_Introduction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1.1 A Minimal Deep Network with MNIST

One of the classic machine learning problems is digit recognition using the MNIST dataset. Various solutions can be found on the web, and this notebook loosely follows [a process described on the TensorFlow website](https://colab.research.google.com/github/tensorflow/datasets/blob/master/docs/keras_example.ipynb#scrollTo=nTFoji3INMEM). MNIST contains 70,000 images of the handwritten digits 0 to 9, along with the correct digit label for each image. Because it is a standard benchmark for testing models, the dataset is widely available, and it can be loaded through TensorFlow Datasets (`tensorflow_datasets`) as follows:


```python
# Import the needed TensorFlow components
import tensorflow as tf
import tensorflow_datasets as tfds

# Load the MNIST dataset. tfds.load checks whether the dataset is available locally
# and downloads and prepares it if it cannot be found.
# (The dataset's homepage is http://yann.lecun.com/exdb/mnist.)
(train, test), info = tfds.load('mnist',                  # Pick the MNIST dataset
                                split=['train', 'test'],  # Load both the training and testing splits
                                with_info=True,           # Also return summary information about the dataset
                                as_supervised=True)       # Return each example as an (image, label) tuple

print(info.description)
print(info.splits)
```

```
The MNIST database of handwritten digits.
{'test': <tfds.core.SplitInfo num_examples=10000>, 'train': <tfds.core.SplitInfo num_examples=60000>}
```


As with most datasets, the data must be preprocessed before they are fed to the model that will be trained on them. In the case of MNIST, each input image is 28 by 28 pixels with a single grayscale channel (a 28 x 28 x 1 array) in uint8 encoding, that is, integer grayscale values from 0 to 255. Each label, the correct output for its image, is an integer from 0 to 9.
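A synthetic stand-in makes this format concrete. The NumPy sketch below uses random pixel values (not real MNIST data) with the same shape and dtype as one raw example, then applies the same cast-then-divide scaling to [0, 1] that the preprocessing step performs in TensorFlow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random pixels standing in for one MNIST image: height x width x channels, uint8 in [0, 255]
raw_image = rng.integers(0, 256, size=(28, 28, 1), dtype=np.uint8)
raw_label = 7  # any integer from 0 to 9

# Cast to float first, then divide by 255 to map [0, 255] onto [0, 1]
scaled = raw_image.astype(np.float32) / 255.0

print(raw_image.dtype, raw_image.shape)                        # uint8 (28, 28, 1)
print(scaled.dtype, scaled.min() >= 0.0, scaled.max() <= 1.0)  # float32 True True
```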

`train` and `test` are [TensorFlow Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) objects. TensorFlow does not load all 70,000 images at once; instead, these objects act as accessors that read the needed images from disk as they are requested. While this approach may seem like overkill for a small dataset like MNIST, it is essential once a dataset no longer fits in system memory. TensorFlow is designed to efficiently process datasets that are terabytes or even petabytes in size.

The next step, therefore, is to create the data preprocessing pipeline and attach it to the `train` and `test` objects. TensorFlow then applies the preprocessing steps whenever the data are loaded.
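The idea of attaching steps that run only when data are requested can be illustrated with plain Python generators. The sketch below is an analogy, not how `tf.data` is implemented: `lazy_source`, `lazy_map`, and `lazy_batch` are hypothetical helpers that loosely mirror the roles of the dataset loader, `Dataset.map`, and `Dataset.batch`, and each fake example has only 4 "pixels" to keep it small.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Example = Tuple[List[float], int]
loaded: List[int] = []  # records which examples have been "read from disk"

def lazy_source(n: int) -> Iterator[Example]:
    """Yield n fake (pixels, label) examples, reading each one only when requested."""
    for i in range(n):
        loaded.append(i)                     # simulate reading example i from disk
        yield [float(i % 256)] * 4, i % 10   # 4 fake pixel values and a label in 0-9

def lazy_map(fn: Callable[[List[float], int], Example],
             it: Iterable[Example]) -> Iterator[Example]:
    """Apply fn to each example as it passes through (cf. Dataset.map)."""
    for pixels, label in it:
        yield fn(pixels, label)

def lazy_batch(it: Iterable[Example], batch_size: int) -> Iterator[List[Example]]:
    """Group consecutive examples into lists of batch_size (cf. Dataset.batch)."""
    batch: List[Example] = []
    for example in it:
        batch.append(example)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # the final batch may be smaller

def scale(pixels: List[float], label: int) -> Example:
    """Hypothetical preprocessing step: scale pixel values to [0, 1]."""
    return [p / 255.0 for p in pixels], label

pipeline = lazy_batch(lazy_map(scale, lazy_source(10)), batch_size=4)
print(len(loaded))              # 0: defining the pipeline reads nothing
first = next(pipeline)
print(len(loaded), len(first))  # 4 4: only the first batch has been read
```

Like this sketch, `Dataset.batch` keeps a smaller final batch by default; passing `drop_remainder=True` discards it instead.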

```python
# Define the data preprocessing pipeline. For MNIST, the only preprocessing needed is to convert
# the pixels from uint8 to float. Other datasets typically require more extensive preprocessing.
def preprocess_data(image, label):
  # Convert uint8 values in [0, 255] to float32 values in [0, 1]
  image = tf.cast(image, tf.float32) / 255.0
  return image, label

# Attach the preprocessing pipeline to each dataset: train and test
train = train.map(preprocess_data)
test = test.map(preprocess_data)

# Group consecutive examples into batches of BATCH_SIZE images for processing
BATCH_SIZE = 128
train = train.batch(BATCH_SIZE)
test = test.batch(BATCH_SIZE)

train
```

```
<BatchDataset shapes: ((None, 28, 28, 1), (None,)), types: (tf.float32, tf.int64)>
```
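The `None` in the printed shapes is the batch dimension. It is left unspecified because not every batch has the same size: neither 60,000 nor 10,000 is a multiple of 128, so the last batch of each split is smaller. The helper below (`batch_sizes` is a hypothetical name, not a TensorFlow function) reproduces that arithmetic for `Dataset.batch` with its default `drop_remainder=False`.

```python
def batch_sizes(num_examples: int, batch_size: int) -> list:
    """Sizes of the batches produced when num_examples items are batched, keeping the remainder."""
    full_batches, remainder = divmod(num_examples, batch_size)
    return [batch_size] * full_batches + ([remainder] if remainder else [])

BATCH_SIZE = 128
train_batches = batch_sizes(60000, BATCH_SIZE)
test_batches = batch_sizes(10000, BATCH_SIZE)

print(len(train_batches), train_batches[-1])  # 469 96
print(len(test_batches), test_batches[-1])    # 79 16
```

Since Keras performs one optimizer update per training batch, one epoch over this training set corresponds to 469 updates.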

Now that the datasets are ready for processing, it is time to design the model that will be trained. Image problems like this one are typically solved with some form of convolutional neural network (CNN); for this simple demonstration, however, a basic fully connected deep neural network will do.

```python
# Specify a basic sequential neural network
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),  # Input layer: flatten each 28 x 28 x 1 image into a 784-element vector
    tf.keras.layers.Dense(20, activation='relu'),      # Hidden layer: 20 neurons with ReLU activation
    tf.keras.layers.Dense(10)                          # Output layer: one raw score (logit) per digit
])

# Define the optimization algorithm, the loss to minimize, and the metric to report
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),               # Train the model with the Adam optimizer
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),  # Minimize the sparse categorical cross-entropy of the logits
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()]                 # Report the fraction of correctly classified images
)

model.summary()
```

```
Model: "sequential_20"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_20 (Flatten)         (None, 784)               0         
_________________________________________________________________
dense_40 (Dense)             (None, 20)                15700     
_________________________________________________________________
dense_41 (Dense)             (None, 10)                210       
Total params: 15,910
Trainable params: 15,910
Non-trainable params: 0
_________________________________________________________________
```


The model above is a three-layer network with one input layer, one hidden layer, and one output layer. It is still simple enough to be written compactly as

$$y = w_2 \, \mathrm{ReLU}(w_1 x + b_1) + b_2 \tag{1.1}$$

In this equation, $x$ is a $784 \times 1$ vector, $w_1$ is a $20 \times 784$ matrix, $b_1$ is a $20 \times 1$ vector, $w_2$ is a $10 \times 20$ matrix, and $b_2$ is a $10 \times 1$ vector. The output $y$ is a $10 \times 1$ vector. Mathematically, two transformations connect the three layers: a linear transformation (plus bias) followed by the nonlinear ReLU activation function implements the input-to-hidden-layer computation, and a second linear transformation (plus bias) implements the hidden-to-output-layer computation. While neural networks are generally envisioned and designed in terms of layers of artificial neurons, mathematically they are implemented as transformations between layers.
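Counting the entries of these arrays reproduces the `Param #` column of `model.summary()`: each dense transformation contributes one weight per matrix entry plus one bias per output neuron, while `Flatten` has no parameters. The helper name `dense_params` below is hypothetical, not a Keras function.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Parameters of a dense layer: an n_out x n_in weight matrix plus n_out biases."""
    return n_in * n_out + n_out

hidden_params = dense_params(784, 20)  # w1 (20 x 784) and b1 (20 x 1)
output_params = dense_params(20, 10)   # w2 (10 x 20) and b2 (10 x 1)
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 15700 210 15910
```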

`tf.keras.layers.Flatten` converts each 28 x 28 pixel image into a 784 x 1 column vector suitable for matrix multiplication. The hidden `Dense` layer has 20 neurons with ReLU activation and therefore requires a linear transformation from 784 elements to 20 elements ($w_1$ becomes a 20 x 784 matrix), followed by setting all negative elements to zero (the ReLU function). The final layer has 10 outputs ($w_2$ becomes a 10 x 20 matrix), one for each digit. Because this layer has no activation function, its outputs are unnormalized scores (logits), which is why the loss is created with `from_logits=True`.
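To make Eq. (1.1) concrete, the NumPy sketch below pushes one synthetic image through the same sequence of operations: flatten, affine transformation, ReLU, affine transformation. It assumes random weights and a random "image" (no training and no real MNIST data), and it is an illustration of the math rather than what Keras runs internally. Keras stores each kernel transposed (for example, 784 x 20) and multiplies row vectors, which is the same computation.

```python
import numpy as np

rng = np.random.default_rng(0)

# One synthetic 28 x 28 x 1 "image" with values already scaled to [0, 1]
image = rng.random((28, 28, 1), dtype=np.float32)

# Flatten: 28 x 28 x 1 -> 784 x 1 column vector
x = image.reshape(784, 1)

# Randomly initialized parameters with the shapes used in Eq. (1.1)
w1 = rng.normal(scale=0.05, size=(20, 784)).astype(np.float32)
b1 = np.zeros((20, 1), dtype=np.float32)
w2 = rng.normal(scale=0.05, size=(10, 20)).astype(np.float32)
b2 = np.zeros((10, 1), dtype=np.float32)

def relu(z):
    # Set every negative element to zero
    return np.maximum(z, 0.0)

hidden = relu(w1 @ x + b1)  # 20 x 1 hidden-layer activations
y = w2 @ hidden + b2        # 10 x 1: one logit per digit

print(x.shape, hidden.shape, y.shape)  # (784, 1) (20, 1) (10, 1)
predicted_digit = int(np.argmax(y))    # index of the highest logit
print(predicted_digit)
```

With untrained random weights the predicted digit is arbitrary; training adjusts $w_1$, $b_1$, $w_2$, and $b_2$ so that the largest logit lands on the correct digit.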






```python
# Train for two epochs, evaluating on the test set after each epoch
model.fit(train, epochs=2, validation_data=test)
```

```
Epoch 1/2
Epoch 2/2
<tensorflow.python.keras.callbacks.History at 0x7fa4a9733990>
```

After two epochs (passes through the full training data), the model has an estimated accuracy of 92%. Training for more epochs will raise the accuracy further, although a network with a single 20-neuron hidden layer levels off well before the roughly 99% accuracy that is typical of convolutional models. Those models improve on this one by accounting for the spatial relationships between image pixels rather than flattening each image into a vector.
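The reported accuracy is the fraction of images whose largest logit sits at the correct label, and the loss being minimized is the cross-entropy between the softmax of the logits and the integer labels. The NumPy sketch below computes both for a tiny hand-made batch (three examples and three classes instead of ten, to keep the numbers readable). The function names are illustrative, not Keras APIs, and the loss is averaged over the batch.

```python
import numpy as np

def sparse_categorical_crossentropy_from_logits(logits, labels):
    """Mean cross-entropy between softmax(logits) and integer labels."""
    # Subtract each row's maximum for numerical stability (the softmax is unchanged)
    shifted = logits - logits.max(axis=1, keepdims=True)
    # Log-softmax: log(exp(z_k) / sum_j exp(z_j))
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class for each example, then average
    return -log_probs[np.arange(len(labels)), labels].mean()

def sparse_categorical_accuracy(logits, labels):
    """Fraction of examples whose highest logit is at the label's index."""
    return (logits.argmax(axis=1) == labels).mean()

# Three examples, three classes; each row holds one example's logits
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0],
                   [1.0, 3.0, 0.0]])
labels = np.array([0, 2, 0])  # integer labels, as in MNIST

acc = sparse_categorical_accuracy(logits, labels)
loss = sparse_categorical_crossentropy_from_logits(logits, labels)
print(acc)   # 2 of the 3 predictions are correct
print(loss)
```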

The above code only scratches the surface of deep neural networks, but the overall process is similar for most problems:

1.   Access the data
2.   Clean and preprocess the data
3.   Design the model, choose the optimizer, and select the loss function to minimize
4.   Fit the model to the data and evaluate its performance

Deep learning provides an elegant mechanism for solving complex pattern recognition tasks with minimal programming.  This allows the machine learning developer to focus on the more creative tasks of data cleaning and model design.