<a href="https://colab.research.google.com/github/FernandoZR83/ANN_DL_ML/blob/master/MNIST_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Neural Network for MNIST Classification

We'll apply all the knowledge from the lectures in this section to write a deep neural network. The problem we've chosen is often called the "Hello World" of machine learning because for most students it is their first example. The dataset is called MNIST and consists of handwritten digits. You can find more about it on the website of Yann LeCun (Director of AI Research, Facebook). He is one of the pioneers of what we've been talking about and of more complex approaches that are widely used today, such as convolutional networks. The dataset provides 28x28 images of handwritten digits (one per image), and the goal is to write an algorithm that detects which digit is written. Since there are only 10 digits, this is a classification problem with 10 classes. To illustrate what we've talked about in this section, we will build a network with 2 hidden layers between inputs and outputs.

## Import relevant libraries

In [0]:
#!pip install -q tensorflow-gpu==2.0.0-beta1

In [18]:
!pip show tensorflow

Name: tensorflow
Version: 1.14.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /usr/local/lib/python3.6/dist-packages
Requires: numpy, termcolor, protobuf, gast, tensorflow-estimator, six, wrapt, keras-applications, grpcio, absl-py, keras-preprocessing, google-pasta, astor, tensorboard, wheel
Required-by: stable-baselines, magenta, fancyimpute


In [0]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds


TensorFlow Datasets includes a data provider for MNIST that we'll use.
`tfds.load` automatically downloads the MNIST dataset to a local data directory.
The dataset comes split into training and test subsets. There is no validation subset,
so we will carve one out of the training data ourselves.
Each sample is a 28x28 grayscale image in which every value corresponds to the
intensity of the corresponding pixel, stored as an integer from 0 to 255.
We will scale the values to the range 0 to 1, so a value close to 0 is almost white and a value close to
1 is almost purely black. Before the first dense layer, every image is flattened row by row
into a vector of length 28x28=784. This representation is slightly naive but as you'll see it works surprisingly well.
Since this is a classification problem, our targets are categorical.
Recall from the lecture on that topic that one way to deal with that is to use one-hot encoding.
With it, the target for each individual sample is a vector of length 10
which has nine 0s and a single 1 at the position which corresponds to the correct answer.
For instance, if the true answer is "3", the target will be [0,0,0,1,0,0,0,0,0,0] (counting from 0).
Here the labels are loaded as plain integers from 0 to 9. The `sparse_categorical_crossentropy` loss
used later accepts them in that form, so we don't have to one-hot encode them by hand.
Keep in mind that the very first time you execute this command it might take a little while to run
because it has to download the whole dataset. Subsequent runs only load the local copy, so
they're faster.
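To make the two representations described above concrete, here is a minimal pure-Python sketch (independent of TensorFlow) of flattening an image row by row and of one-hot encoding a label. The helper names `flatten` and `one_hot` are made up for this illustration.

```python
# Illustrative helpers only; these names are not part of TensorFlow.

def flatten(image):
    """Flatten a 2-D image (a list of rows) into one vector, row by row."""
    return [pixel for row in image for pixel in row]

def one_hot(label, num_classes=10):
    """Return a vector with a single 1 at index `label` (counting from 0)."""
    return [1 if i == label else 0 for i in range(num_classes)]

# A blank 28x28 image flattens to a vector of length 784.
blank_image = [[0.0] * 28 for _ in range(28)]
flat = flatten(blank_image)
print(len(flat))   # 784

# The digit 3 becomes a 1 at position 3.
print(one_hot(3))  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```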

In [20]:
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)



In [25]:
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']
#tensorflow_datasets provides no validation split for MNIST, so we reserve 10% of the training set for validation
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)
num_test_samples = 0.1 * mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)

def scale(image, label):
  image = tf.cast(image, tf.float32)
  image /= 255. 
  #The dot at the end makes 255 a float, so the result of the division is a float
  return image, label
#dataset.map(*function*) applies a custom transformation to a given dataset; it takes as input a function which determines the transformation
scaled_train_and_validation_data = mnist_train.map(scale)
test_data = mnist_test.map(scale)

BUFFER_SIZE = 10000

shuffled_train_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

validation_data = shuffled_train_validation_data.take(num_validation_samples)
train_data = shuffled_train_validation_data.skip(num_validation_samples)

#Hyperparameter
batch_size = 100

#dataset.batch(batch_size) is a method that combines consecutive elements of a dataset into batches

train_data = train_data.batch(batch_size)
validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data))

validation_inputs
validation_targets

<tf.Tensor 'IteratorGetNext_2:1' shape=(?,) dtype=int64>
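As a sanity check on the splitting and batching above, the sizes can be worked out by hand. The sketch below assumes the standard MNIST split sizes of 60,000 training and 10,000 test images (the notebook reads these from `mnist_info` rather than hard-coding them), and the variable names are illustrative.

```python
import math

# Assumed split sizes: MNIST ships 60,000 training and 10,000 test images.
num_train_total = 60000
num_test_total = 10000

# 10% of the training data is set aside for validation.
num_validation = num_train_total // 10          # 6000
num_train = num_train_total - num_validation    # 54000

# The test set is batched in chunks of 10% of its size.
test_batch_size = num_test_total // 10          # 1000

batch_size = 100
train_steps = math.ceil(num_train / batch_size)
test_steps = math.ceil(num_test_total / test_batch_size)

print(num_validation, num_train)  # 6000 54000
print(train_steps, test_steps)    # 540 10
```

Under these assumptions each training epoch consists of 540 batches of 100 images, the validation data forms a single batch of 6,000 images, and evaluating on the test set takes 10 batches of 1,000 images.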

## Model

Outline the model

In [0]:
input_size = 784
output_size = 10 #one per digit
hidden_layer_size = 50 #50 nodes per layer
#tf.keras.Sequential is a class used to "stack layers"
#Our model's name is model
model = tf.keras.Sequential([
    #We need to flatten each image into a vector
    #The first layer in the Sequential model declares our input layer
    tf.keras.layers.Flatten(input_shape = (28,28,1)),
    #tf.keras.layers.Dense(units) takes the inputs provided to the layer, calculates the dot product of the
    #inputs and the weights, and adds the bias.
    #This is also where we can apply an activation function
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    #We create the second hidden layer the same way
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    #Output layer: softmax turns the 10 outputs into class probabilities
    tf.keras.layers.Dense(output_size, activation = 'softmax')

])
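The number of trainable parameters in this architecture can be worked out by hand: a dense layer has one weight per input-output pair plus one bias per output unit, and the flatten layer has none. The short sketch below does that arithmetic; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Input size, two hidden layers of 50 units, and 10 outputs.
layer_sizes = [784, 50, 50, 10]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # [39250, 2550, 510]
print(sum(per_layer))  # 42310
```

These figures should agree with what `model.summary()` reports for the three dense layers.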

Choose the optimizer and the loss function

In [38]:
#model.compile(optimizer, loss, metrics) configures the model for training

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics = ['accuracy'])
model.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_4 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_12 (Dense)             (None, 50)                39250     
_________________________________________________________________
dense_13 (Dense)             (None, 50)                2550      
_________________________________________________________________
dense_14 (Dense)             (None, 10)                510       
Total params: 42,310
Trainable params: 42,310
Non-trainable params: 0
_________________________________________________________________
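To see what the softmax output layer and the `sparse_categorical_crossentropy` loss compute for a single sample, here is a textbook sketch in plain Python. It is not Keras's actual implementation, which works on batches and differs in numerical details. The scores are made up and the function names are chosen for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift by the max for stability
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(label, probs):
    """Loss for one sample when the target is an integer class index."""
    return -math.log(probs[label])

def categorical_crossentropy(one_hot_target, probs):
    """The same loss when the target is one-hot encoded."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs))

# Made-up scores for the 10 classes (not real model output).
logits = [0.5, 2.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
probs = softmax(logits)

label = 1
one_hot_target = [1 if i == label else 0 for i in range(10)]

sparse_loss = sparse_categorical_crossentropy(label, probs)
dense_loss = categorical_crossentropy(one_hot_target, probs)
print(sparse_loss, dense_loss)  # the two values coincide
```

The sparse variant simply picks out the probability assigned to the correct class, which is why integer labels can be used without one-hot encoding them first.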


## Training

In [37]:
#Choose number of epochs
NUM_EPOCHS = 5

model.fit(train_data, epochs = NUM_EPOCHS, verbose=1)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f1bbdcf7eb8>

In [42]:
#Store the test loss and test accuracy in variables:
test_loss, test_accuracy = model.evaluate(test_data)

     10/Unknown - 3s 255ms/step - loss: 0.1194 - acc: 0.9636

In [47]:
print('Test loss: {0:.2f}, Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100))

Test loss: 0.12, Test accuracy: 96.36%
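The accuracy metric is the fraction of samples whose most probable class matches the true label. The sketch below shows that calculation on a few made-up predictions; `argmax` and `accuracy` are illustrative helpers, not the Keras implementation.

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predicted_probs, labels):
    """Fraction of samples whose most probable class equals the label."""
    correct = sum(1 for p, y in zip(predicted_probs, labels) if argmax(p) == y)
    return correct / len(labels)

# Three made-up predictions over 3 classes (not real model output).
predictions = [[0.1, 0.8, 0.1], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
labels = [1, 0, 1]

result = accuracy(predictions, labels)
print(result)  # 2 of the 3 predictions are correct
```

By the same logic, an accuracy of 96.36% on a test set of 10,000 images (the standard MNIST test size, assumed here) corresponds to 9,636 correctly classified digits.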
