# Exercises

### 2. The *depth* of the algorithm. Add another hidden layer to the algorithm. This is an extremely important exercise! How does the validation accuracy change? What about the time it took the algorithm to train?
Hint: Be careful with the shapes of the weights and the biases.

**Solution**

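Adding another hidden layer **is done in the same way** as in the lecture. Keras infers the shapes of the new layer's weights and biases from the layer before it, so nothing has to be set by hand.

We simply add one more line to Sequential, before the output layer: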

tf.keras.layers.Dense(hidden_layer_size, activation='relu')
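
The hint about the shapes of the weights and biases can be made concrete with a little arithmetic. The sketch below is plain Python, not Keras code; the helper names `dense_shapes` and `count_parameters` are hypothetical and exist only for this illustration. It assumes the sizes used later in this notebook (784 inputs, 50 units per hidden layer, 10 outputs).

```python
def dense_shapes(input_size, hidden_layer_size, num_hidden_layers, output_size):
    """Return one (weight_shape, bias_shape) pair per Dense layer."""
    layer_sizes = [input_size] + [hidden_layer_size] * num_hidden_layers + [output_size]
    shapes = []
    # Each layer maps fan_in values to fan_out values.
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        shapes.append(((fan_in, fan_out), (fan_out,)))
    return shapes


def count_parameters(shapes):
    """Total number of trainable weights and biases."""
    return sum(w[0] * w[1] + b[0] for w, b in shapes)


for depth in (2, 3):
    shapes = dense_shapes(784, 50, depth, 10)
    print(depth, "hidden layers:", shapes, "->", count_parameters(shapes), "parameters")
```

Each extra hidden layer of the same width adds one more square weight matrix and one more bias vector, which is a small share of the total here because the first layer (784 inputs) dominates.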

We can see that the accuracy of the model ***does not necessarily improve***. This is an important lesson for us: ***fiddling with a single hyperparameter may not be enough***. Sometimes a deeper net also needs to be wider to reach higher accuracy, and it may need more epochs as well.

**ADDITIONAL TASK: Try this deeper model with wider hidden layers (200-500 hidden units). In other words, combine this exercise with the previous one.**

In any case, it takes longer for the algorithm to train.

1.   2 hidden layers  
    Epoch 1/5 - 11s - loss: 0.4105 - accuracy: 0.8822 - val_accuracy: 0.9400  
    Epoch 5/5 - 4s - loss: 0.0940 - accuracy: 0.9714 - val_accuracy: 0.9693
2.   3 hidden layers  
    Epoch 1/5 - 12s - loss: 0.4181 - accuracy: 0.8779 - val_accuracy: 0.9390  
    Epoch 5/5 - 4s - loss: 0.0902 - accuracy: 0.9727 - val_accuracy: 0.9695
3.   4 hidden layers  
    Epoch 1/5 - 13s - loss: 0.3930 - accuracy: 0.8831 - val_accuracy: 0.9385  
    Epoch 5/5 - 4s - loss: 0.0830 - accuracy: 0.9745 - val_accuracy: 0.9723
4.   5 hidden layers  
    Epoch 1/5 - 13s - loss: 0.4242 - accuracy: 0.8753 - val_accuracy: 0.9472  
    Epoch 5/5 - 4s - loss: 0.0916 - accuracy: 0.9715 - val_accuracy: 0.9725
5.   5 hidden layers with hidden layer size = 500  
    Epoch 1/5 - 14s - loss: 0.2348 - accuracy: 0.9277 - val_accuracy: 0.9657  
    Epoch 5/5 - 4s - loss: 0.0493 - accuracy: 0.9856 - val_accuracy: 0.9832

# Deep Neural Network for MNIST Classification

The dataset is called MNIST and refers to handwritten digit recognition. You can find more about it on Yann LeCun's website (Director of AI Research, Facebook). He is one of the pioneers of what we've been talking about and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image).

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes.

The lecture builds a neural network with 2 hidden layers; for this exercise we make it deeper.

## Import the relevant packages

In [1]:
import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds



## Data

This is where we load and preprocess our data.

In [2]:

mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)


def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.

    return image, label

scaled_train_and_validation_data = mnist_train.map(scale)
test_data = mnist_test.map(scale)


BUFFER_SIZE = 10000

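# reshuffle_each_iteration=False keeps the take/skip split below fixed;
# with the default, the data is reshuffled every epoch and validation samples can leak into training
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE, reshuffle_each_iteration=False)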
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)


BATCH_SIZE = 100

train_data = train_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data))
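
The sizes produced by the split above follow from simple arithmetic. The check below is a plain-Python sketch that assumes the standard MNIST split of 60,000 training and 10,000 test images (together the 70,000 images mentioned in the introduction); it does not touch the actual dataset.

```python
import math

# Assumed size of the MNIST 'train' split (training + validation together).
NUM_TRAIN_AND_VALIDATION = 60_000
BATCH_SIZE = 100

# 10% of the training split is held out for validation.
num_validation_samples = int(0.1 * NUM_TRAIN_AND_VALIDATION)
num_train_samples = NUM_TRAIN_AND_VALIDATION - num_validation_samples

# Number of batches (steps) the model sees in one epoch.
steps_per_epoch = math.ceil(num_train_samples / BATCH_SIZE)

print(num_validation_samples, num_train_samples, steps_per_epoch)
```

The resulting number of steps per epoch should match the `540/540` counter in the training log further down.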

## Model

### Outline the model
When thinking about a deep learning algorithm, we mostly imagine building the model. So, let's do it :)

In [3]:
input_size = 784
output_size = 10

hidden_layer_size = 50

model = tf.keras.Sequential([

    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer

    # tf.keras.layers.Dense is basically implementing: output = activation(dot(input, weight) + bias)
    # it takes several arguments, but the most important ones for us are the hidden_layer_size and the activation function
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 3rd hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 4th hidden layer
    # tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 5th hidden layer

    # the final layer is no different, we just make sure to activate it with softmax
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])
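
To make the formula in the comment above, output = activation(dot(input, weight) + bias), more tangible, here is a minimal sketch in plain Python. It is not how Keras implements Dense; the function names and the toy weights are made up for illustration, and the network is tiny (3 inputs, 2 hidden units, 2 outputs) so that the numbers can be followed by hand.

```python
import math


def relu(values):
    """Replace negative values with zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn arbitrary scores into probabilities that sum to 1."""
    shift = max(values)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases, activation):
    """weights[i][j] connects input i to unit j; biases[j] belongs to unit j."""
    pre_activations = [
        sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]
    return activation(pre_activations)


# Toy values, chosen only for illustration.
inputs = [1.0, 0.5, -1.0]
hidden_weights = [[0.2, -0.4], [0.1, 0.3], [-0.5, 0.6]]  # shape (3, 2)
hidden_biases = [0.0, 0.1]                               # shape (2,)
output_weights = [[1.0, -1.0], [0.5, 0.5]]               # shape (2, 2)
output_biases = [0.0, 0.0]                               # shape (2,)

hidden = dense(inputs, hidden_weights, hidden_biases, relu)
probabilities = dense(hidden, output_weights, output_biases, softmax)
print(hidden, probabilities)
```

Stacking more hidden layers simply means calling `dense` more times in a row, each time feeding in the previous layer's output.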

### Choose the optimizer and the loss function

In [4]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
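
As a rough illustration of what 'sparse_categorical_crossentropy' measures: for each sample, the loss is minus the log of the probability the model assigned to the correct class, and the targets are plain integer labels rather than one-hot vectors. The sketch below is plain Python with hypothetical function names; it leaves out details of the Keras version, such as its protection against log(0).

```python
import math


def sparse_categorical_crossentropy(probabilities, target):
    """Loss for one sample: -log of the probability given to the true class."""
    return -math.log(probabilities[target])


def mean_loss(batch_probabilities, batch_targets):
    """Average the per-sample losses over a batch."""
    losses = [
        sparse_categorical_crossentropy(p, t)
        for p, t in zip(batch_probabilities, batch_targets)
    ]
    return sum(losses) / len(losses)


# Two made-up predictions for a 3-class problem; the true class is 1 in both.
confident_and_right = [0.05, 0.90, 0.05]
unsure = [0.30, 0.40, 0.30]

print(sparse_categorical_crossentropy(confident_and_right, 1))
print(sparse_categorical_crossentropy(unsure, 1))
print(mean_loss([confident_and_right, unsure], [1, 1]))
```

The confident, correct prediction gets the smaller loss, which is why the loss keeps falling during training even when accuracy barely moves.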

### Training
This is where we train the model we have built.

In [5]:
NUM_EPOCHS = 5

model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
540/540 - 13s - loss: 0.3930 - accuracy: 0.8831 - val_loss: 0.1995 - val_accuracy: 0.9385 - 13s/epoch - 23ms/step
Epoch 2/5
540/540 - 5s - loss: 0.1579 - accuracy: 0.9531 - val_loss: 0.1359 - val_accuracy: 0.9600 - 5s/epoch - 9ms/step
Epoch 3/5
540/540 - 3s - loss: 0.1192 - accuracy: 0.9641 - val_loss: 0.1097 - val_accuracy: 0.9672 - 3s/epoch - 6ms/step
Epoch 4/5
540/540 - 3s - loss: 0.0976 - accuracy: 0.9697 - val_loss: 0.0969 - val_accuracy: 0.9723 - 3s/epoch - 6ms/step
Epoch 5/5
540/540 - 4s - loss: 0.0830 - accuracy: 0.9745 - val_loss: 0.0967 - val_accuracy: 0.9723 - 4s/epoch - 7ms/step


<keras.src.callbacks.History at 0x7ccba8f6af50>

## Test the model

In [6]:
test_loss, test_accuracy = model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.11. Test accuracy: 96.81%
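
The accuracy reported above is the share of test images whose most probable class matches the label. A minimal plain-Python sketch of that calculation, with a hypothetical `accuracy` helper and made-up outputs for a 3-class problem:

```python
def accuracy(batch_probabilities, batch_targets):
    """Fraction of samples whose highest-probability class equals the target."""
    correct = 0
    for probabilities, target in zip(batch_probabilities, batch_targets):
        # Index of the largest probability (an argmax over the classes).
        predicted = max(range(len(probabilities)), key=probabilities.__getitem__)
        if predicted == target:
            correct += 1
    return correct / len(batch_targets)


# Made-up softmax outputs and labels, for illustration only.
outputs = [
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
]
targets = [1, 0, 2, 1]

print('Accuracy: {0:.2f}%'.format(accuracy(outputs, targets) * 100.))
```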


The runs with 2 to 5 hidden layers are summarized in the exercise solution at the top. One deeper run was also recorded:

6.   7 hidden layers  
    Epoch 1/5 - 24s - loss: 0.2138 - accuracy: 0.9357 - val_accuracy: 0.9687  
    Epoch 5/5 - 12s - loss: 0.0299 - accuracy: 0.9903 - val_accuracy: 0.9843







