<a href="https://colab.research.google.com/github/AndreNasci/TinyML/blob/main/IESTI01_List_3_TF_Exploring_DNN_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exploring DNN learning with TensorFlow

In this assignment we'll dive a little deeper with a series of hands-on exercises to better understand DNN learning with TensorFlow. Remember that I could be asking you questions about this assignment in the Quiz!

We will work with the [Fashion MNIST Dataset](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/fashion_mnist/load_data). This is a dataset of 60,000 28x28 grayscale images of 10 fashion categories, along with a test set of 10,000 images. This dataset can be used as a drop-in replacement for MNIST. 

We will define a possible model for you to start working with:

In [26]:
import tensorflow as tf

# Load in fashion MNIST
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

# Define the base model
model = tf.keras.models.Sequential([tf.keras.layers.Flatten(input_shape=(28,28)), 
                                    tf.keras.layers.Dense(512, activation=tf.nn.relu), 
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

# 512 neurons in the second layer (a dense layer), with 'relu' activation
# 10 neurons in the output layer, with 'softmax' activation
# Supervised ML: the training data comes with labels (a "teacher")
# DNN here: a densely connected, feed-forward network in which every neuron of
# one layer connects to every neuron of the next layer
# Classifier: predicts a class (10 classes = 10 neurons in the output layer,
# plus an activation function on the output)
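
To make the shapes in this model concrete, here is a minimal NumPy sketch of the forward pass that `Flatten`, `Dense(512, relu)` and `Dense(10, softmax)` perform. The weights are random placeholders (an assumption for illustration, not the values Keras would initialize or learn), so only the shapes and the softmax property are meaningful.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical batch of 2 grayscale images, already scaled to [0, 1].
batch = rng.random((2, 28, 28))

# Flatten: each 28x28 image becomes a vector of 784 values.
flat = batch.reshape(len(batch), -1)

# Random placeholder weights (NOT trained values).
w1, b1 = rng.normal(scale=0.05, size=(784, 512)), np.zeros(512)
w2, b2 = rng.normal(scale=0.05, size=(512, 10)), np.zeros(10)

# Dense(512, relu): affine transform followed by max(0, z).
hidden = np.maximum(0.0, flat @ w1 + b1)

# Dense(10, softmax): affine transform, then normalize to probabilities.
logits = hidden @ w2 + b2
exp = np.exp(logits - logits.max(axis=1, keepdims=True))  # shift for stability
probs = exp / exp.sum(axis=1, keepdims=True)

print(flat.shape, hidden.shape, probs.shape)  # (2, 784) (2, 512) (2, 10)
print(probs.sum(axis=1))                      # each row sums to 1 (up to rounding)
```

Each image ends up as a row of 10 non-negative numbers that sum to 1, which is why the output layer can be read as one probability per class.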

Neural networks learn best when the data is scaled / normalized to fall in a fixed range. One range practitioners often use is [0,1]. How might you do this to the training and test images used here?

*A hint: these are 8-bit grayscale images, so each pixel uses the same 0-255 range as a channel in the standard [RGB](https://www.rapidtables.com/web/color/RGB_Color.html) format*

**R:** We can normalize the data to the [0,1] range by dividing every pixel value by 255, the maximum value of an 8-bit pixel.
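
The arithmetic behind dividing by 255 can be checked on a small synthetic array. This is a sketch with made-up pixel values (the array `fake_images` is a hypothetical stand-in, not the Fashion MNIST data), assuming the images are stored as unsigned 8-bit integers.

```python
import numpy as np

# Hypothetical stand-in for a batch of 8-bit grayscale images (values 0-255).
rng = np.random.default_rng(seed=0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_images[0, 0, 0] = 0      # make sure both extremes are present
fake_images[0, 0, 1] = 255

# Dividing by the float 255.0 promotes the array to float64 and rescales it.
scaled = fake_images / 255.0

print(fake_images.dtype, fake_images.min(), fake_images.max())  # uint8 0 255
print(scaled.dtype, scaled.min(), scaled.max())                 # float64 0.0 1.0
```

Using `255.0` rather than an integer division operator matters: the result has to be a float array for the values between 0 and 1 to survive.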

In [27]:
# Normalize each pixel value to the [0, 1] range
training_images = training_images / 255.0
test_images = test_images / 255.0

Using these improved images, let's compile our model using an adaptive optimizer to learn faster (`Adam`) and a categorical loss function (`sparse_categorical_crossentropy`) to differentiate between the various classes we are trying to classify. Since this is a very simple dataset we will only train for 5 epochs.
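
As a rough illustration of what `sparse_categorical_crossentropy` measures: for a single sample it is the negative log of the probability the model assigns to the true class, with the label given as an integer index rather than a one-hot vector. The helper `sparse_cce` and the probabilities below are hypothetical, and the sketch skips details such as the clipping Keras applies for numerical safety.

```python
import math

def sparse_cce(probabilities, label):
    """Cross-entropy for one sample: -log of the probability of the true class."""
    return -math.log(probabilities[label])

# Hypothetical softmax outputs for a 3-class problem (illustrative values only).
confident_right = [0.05, 0.90, 0.05]
confident_wrong = [0.90, 0.05, 0.05]
true_label = 1  # integer class index, in the same style as training_labels

print(sparse_cce(confident_right, true_label))  # small loss, about 0.105
print(sparse_cce(confident_wrong, true_label))  # much larger loss, about 3.0
```

The loss reported during training is the average of this quantity over the samples in a batch, so pushing probability toward the correct class drives it down.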

In [28]:
# compile the model
model.compile(optimizer = tf.keras.optimizers.Adam(),
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy']) 
# metrics: the metrics the model evaluates during training and testing

# fit (train) the model on the training data
model.fit(training_images, training_labels, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f135e3566a0>

In [29]:
# test the model on the test data
model.evaluate(test_images, test_labels)



[0.331129252910614, 0.8817999958992004]

Once it's done training, you should see **an accuracy value at the end of the final epoch**. It might look something like 0.8648. This tells you that your neural network is about 86% accurate in classifying the training data, i.e., it figured out a pattern match between the images and the labels that worked 86% of the time. But how would it work with unseen data? That's why we have the test images. We can call ```model.evaluate```, pass in the test images and labels, and it will report back the loss and the accuracy on that data. The accuracy should reach about .8747 or thereabouts, showing about 87% accuracy. Not bad!
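
With `metrics=['accuracy']`, `model.evaluate` returns a list of the form `[loss, accuracy]`. The sketch below shows, on made-up predictions, how an accuracy figure of this kind is obtained: take the most probable class for each sample and count how often it matches the label. The helper name `accuracy` and all numbers are assumptions for illustration.

```python
import numpy as np

def accuracy(probabilities, labels):
    """Fraction of samples whose most probable class equals the true label."""
    predicted = np.argmax(probabilities, axis=1)
    return float(np.mean(predicted == labels))

# Hypothetical softmax outputs for 4 samples and 3 classes.
probs = np.array([
    [0.7, 0.2, 0.1],   # predicts class 0
    [0.1, 0.8, 0.1],   # predicts class 1
    [0.3, 0.3, 0.4],   # predicts class 2
    [0.6, 0.3, 0.1],   # predicts class 0
])
labels = np.array([0, 1, 2, 1])  # the last sample is misclassified

print(accuracy(probs, labels))  # 0.75
```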

But what did it actually learn? If we run inference with the model using ```model.predict``` we get out the following list of values. **What does it represent?**

*A hint: try running ```print(test_labels[0])```*

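**R:** `model.predict` returns one row per image in the test set. Each row holds the 10 softmax outputs, where the value at index *i* is the probability the model assigns to class *i*. The index with the largest value is the predicted class: for the first test image that is index 9 (about 0.96), which matches `test_labels[0]`.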

In [30]:
# Inference on the model
classifications = model.predict(test_images)
print(classifications[0]) # this is the prediction for test_images[0]

[1.9207448e-06 7.0294504e-08 1.2305502e-08 4.7650538e-08 3.1719107e-08 1.1690886e-02 1.2238793e-06 2.7363410e-02 5.5423106e-07 9.6094185e-01]


In [31]:
print(test_labels[0]) # this is the label for test_images[0]

9
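
The two printed outputs above can be compared directly. A small sketch, using the probability vector and label copied from those outputs, picks the most probable class and checks it against the label:

```python
import numpy as np

# Probabilities copied from the printed output for test_images[0].
prediction = np.array([
    1.9207448e-06, 7.0294504e-08, 1.2305502e-08, 4.7650538e-08, 3.1719107e-08,
    1.1690886e-02, 1.2238793e-06, 2.7363410e-02, 5.5423106e-07, 9.6094185e-01,
])
label = 9  # printed value of test_labels[0]

predicted_class = int(np.argmax(prediction))
print(predicted_class, predicted_class == label)  # 9 True
print(round(float(prediction.sum()), 4))          # 1.0
```

The values sum to 1 because the output layer uses softmax, so each entry can be read as the model's confidence in the corresponding class.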


Let's now look at the layers in your model. What happens if you double the number of neurons in the dense layer? What different results do you get for loss, training time, etc.? Why do you think that's the case?

**R:** I expected a longer training time and slightly worse results because of the increase in the number of parameters. However, despite the longer training time, we achieved slightly better accuracy.
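
To put a number on "the increase in the number of parameters", the trainable parameters of each dense layer can be counted by hand: one weight per input-unit pair plus one bias per unit. The helper `dense_params` is a hypothetical name used only for this sketch.

```python
def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units

PIXELS = 28 * 28  # 784 values per flattened image

for hidden in (512, 1024):
    total = dense_params(PIXELS, hidden) + dense_params(hidden, 10)
    print(hidden, total)
# 512 407050
# 1024 814090
```

Doubling the hidden layer roughly doubles the parameter count, which is consistent with the longer training time. A larger parameter count does not by itself imply worse accuracy.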

In [32]:
NUMBER_OF_NEURONS = 512 * 2

# define the new model
model = tf.keras.models.Sequential([tf.keras.layers.Flatten(input_shape=(28,28)),
                                    tf.keras.layers.Dense(NUMBER_OF_NEURONS, activation=tf.nn.relu),
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

# compile fit and evaluate the model again
model.compile(optimizer = tf.keras.optimizers.Adam(),
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5)
model.evaluate(test_images, test_labels)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


[0.34367525577545166, 0.8725000023841858]

Consider the effects of additional layers in the network instead of simply adding more neurons to the same layer. First, update the model to add an additional dense layer between the two existing Dense layers.

In [33]:
# A new layer with 20 neurons
YOUR_NEW_LAYER = tf.keras.layers.Dense(20, activation=tf.nn.relu)

model = tf.keras.models.Sequential([tf.keras.layers.Flatten(input_shape=(28,28)),
                                    tf.keras.layers.Dense(512, activation=tf.nn.relu),
                                    YOUR_NEW_LAYER,
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

Let's then compile, fit, and evaluate our model. What happens to the error? How does this compare to the original model and the model with double the number of neurons?

**R:** Both error and accuracy are worse than the original model. However, we achieved better accuracy and smaller error than the model with twice the number of neurons. 

In [34]:
# compile fit and evaluate the model again
model.compile(optimizer = tf.keras.optimizers.Adam(),
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5)
model.evaluate(test_images, test_labels)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


[0.3417714536190033, 0.8797000050544739]
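
The three `[loss, accuracy]` pairs recorded so far can be lined up side by side. The sketch below copies them from the outputs in this notebook; the dictionary keys are labels made up here for readability.

```python
# [loss, accuracy] pairs copied from the evaluate outputs recorded in this notebook.
results = {
    "base (512)": (0.331129252910614, 0.8817999958992004),
    "wide (1024)": (0.34367525577545166, 0.8725000023841858),
    "extra layer (512 + 20)": (0.3417714536190033, 0.8797000050544739),
}

# Rank by test accuracy, best first.
ranked = sorted(results.items(), key=lambda item: item[1][1], reverse=True)
for name, (loss, acc) in ranked:
    print(f"{name:<24} loss={loss:.4f} accuracy={acc:.4f}")
```

All three runs land within about one percentage point of each other. Each is a single run with random initialization, so differences this small may not be reproducible.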

Before you trained, you normalized the data. What would be the impact of removing that? To see it for yourself fill in the following lines of code to get a non-normalized set of data and then re-fit and evaluate the model using this data.

**R:** I expected slightly worse results. However, it surprises me that the non-normalized dataset gives us such poor results in terms of accuracy and error.
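
One plausible contributing factor (an assumption, not something this notebook measures) is that inputs on the 0-255 scale make the first layer's pre-activations 255 times larger for the same weights, which tends to make the early optimization steps less well behaved. The sketch below shows only that scaling, using random placeholder weights rather than a trained model.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical flattened image in [0, 1] and random placeholder weights.
x_scaled = rng.random((1, 784))
weights = rng.normal(scale=0.05, size=(784, 512))

x_raw = x_scaled * 255  # the same image on the original 0-255 scale

z_scaled = x_scaled @ weights
z_raw = x_raw @ weights

# Mean absolute pre-activation for each input scale.
print(np.abs(z_scaled).mean(), np.abs(z_raw).mean())
print(np.allclose(z_raw, 255 * z_scaled))  # True
```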

In [35]:
# recover the non-normalized Fashion MNIST data by undoing the division by 255
training_images_non = training_images * 255
test_images_non = test_images * 255

# re-compile, re-fit and re-evaluate
model = tf.keras.models.Sequential([tf.keras.layers.Flatten(input_shape=(28,28)),
                                    tf.keras.layers.Dense(512, activation=tf.nn.relu),
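                                    # fresh 20-neuron layer: reusing YOUR_NEW_LAYER would carry over
                                    # the weights it learned in the previous model
                                    tf.keras.layers.Dense(20, activation=tf.nn.relu),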
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])
model.compile(optimizer = tf.keras.optimizers.Adam(),
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(training_images_non, training_labels, epochs=5)
model.evaluate(test_images_non, test_labels)
classifications = model.predict(test_images_non)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
