# Exercises

### 4. Fiddle with the activation functions. Try applying a sigmoid transformation to both hidden layers. The sigmoid activation is given by the function `tf.nn.sigmoid()`.

**Solution**

Find the part where we stack the layers (`tf.keras.Sequential()`).

Adjust the activations from 'relu' to 'sigmoid'.
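To rerun the comparison without editing the model cell by hand each time, the hidden-layer activation can be turned into a parameter. This is a sketch under assumptions: `build_model` is a hypothetical helper (not part of the original notebook), it assumes TensorFlow 2.x with `tf.keras`, and it keeps the softmax output layer of the original model. Passing the string `'sigmoid'` selects the same sigmoid function the exercise refers to as `tf.nn.sigmoid()`.

```python
import numpy as np
import tensorflow as tf

hidden_layer_size = 50
output_size = 10

def build_model(hidden_activation, num_hidden_layers=2):
    """Hypothetical helper: same architecture as the notebook's model, with a
    configurable activation for the hidden layers (the output layer stays softmax)."""
    stack = [tf.keras.Input(shape=(28, 28, 1)), tf.keras.layers.Flatten()]
    for _ in range(num_hidden_layers):
        stack.append(tf.keras.layers.Dense(hidden_layer_size, activation=hidden_activation))
    stack.append(tf.keras.layers.Dense(output_size, activation='softmax'))
    return tf.keras.Sequential(stack)

# sigmoid in both hidden layers, compiled exactly like the original model
sigmoid_model = build_model('sigmoid')
sigmoid_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
```

`build_model('relu')` gives the baseline and `build_model('softmax')` the "softmax for all layers" variant discussed below.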
    
Generally, we should **reach an inferior solution**. That is because **relu 'cleans' the noise in the data** (think about it: if a value is negative, relu filters it out, while if it is positive, relu passes it through unchanged). For the MNIST dataset, we care only about the intensely black and white parts in the images of the digits, so such filtering proves beneficial.
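To see the filtering effect on concrete numbers, here is a toy NumPy sketch. The input values are arbitrary and chosen only for illustration, and `relu` and `sigmoid` are re-implemented by hand rather than taken from TensorFlow. relu maps every negative input to exactly 0, while sigmoid squashes every input into the open interval (0, 1), so negative inputs still produce a non-zero output.

```python
import numpy as np

def relu(x):
    # negative inputs become exactly 0; positive inputs pass through unchanged
    return np.maximum(x, 0.0)

def sigmoid(x):
    # every input is squashed into (0, 1); nothing becomes exactly 0
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))     # values: 0, 0, 0, 0.5, 3
print(sigmoid(x))  # approx. 0.047, 0.378, 0.5, 0.622, 0.953
```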

**The sigmoid does not filter the signals as well as relu, but it still reaches a respectable result (around 95% validation accuracy).**

**Try using softmax activations for all layers. How does the result change? Can you explain why that happens?**

The model reaches an inferior solution, even worse than with 'sigmoid': the validation accuracy stays below 70% after 5 epochs.

The softmax function ***squashes its input values into a probability distribution***, which means the output values are non-negative and sum to 1. **This property is suitable for the output layer** of a neural network, where we want to obtain probabilities for ***the different classes***. In the hidden layers, however, it works against us:
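For reference, here is the softmax of a vector $\mathbf{z}$ with $K$ entries, and the one-line reason its outputs always sum to 1 (every term is positive because $e^{z_i} > 0$):

$$
\operatorname{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}},
\qquad
\sum_{i=1}^{K} \operatorname{softmax}(\mathbf{z})_i = \frac{\sum_{i=1}^{K} e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} = 1 .
$$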

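- **Loss of information:** softmax only preserves the differences between its inputs. Adding the same constant to all of them leaves the output unchanged, so the absolute size of each feature is discarded.
- **Difficulty in training:** with 50 units whose activations must sum to 1, the values passed to the next layer are small (0.02 on average), which tends to weaken both the signal going forward and the gradients reaching the earlier layers.
- **Reduced capacity for representation:** the units compete with each other. If one unit becomes large, the others must shrink, so a layer cannot signal several strong features at the same time.
- **Non-negativity and sum constraint:** every activation lies between 0 and 1, and the outputs of the layer always sum to 1, so the layer has one fewer degree of freedom than it has units.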
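The constraints listed above can be checked numerically with a small NumPy sketch. The inputs are arbitrary illustrative values and `softmax` is a hand-written version, not TensorFlow's implementation; the 50-unit vector mirrors the hidden layer size used in this notebook.

```python
import numpy as np

def softmax(z):
    # subtracting the max is a standard trick against overflow; it does not change the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
print(p.sum())                              # 1.0 (up to floating-point rounding)
print(np.allclose(softmax(z + 100.0), p))   # True: a common shift is invisible in the output

# a 50-unit hidden layer: the activations always average exactly 1/50 = 0.02
h = softmax(np.random.default_rng(0).normal(size=50))
print(h.mean())

# two equally strong features must split the probability mass instead of both being "on"
print(softmax(np.array([10.0, 10.0])))      # [0.5 0.5]
```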



Results with 2 hidden layers of size 50, trained for 5 epochs (first and last epoch shown):

| Activation | Epoch | Time | Loss | Accuracy | Validation accuracy |
|---|---|---|---|---|---|
| relu | 1/5 | 12s | 0.4274 | 0.8777 | 0.9323 |
| relu | 5/5 | 6s | 0.0930 | 0.9719 | 0.9748 |
| sigmoid | 1/5 | 13s | 0.9970 | 0.7859 | 0.9017 |
| sigmoid | 5/5 | 4s | 0.1702 | 0.9510 | 0.9537 |
| softmax (all layers) | 1/5 | 16s | 2.1969 | 0.3836 | 0.6665 |
| softmax (all layers) | 5/5 | 4s | 0.7911 | 0.6817 | 0.6863 |

# Deep Neural Network for MNIST Classification

The dataset is called MNIST and refers to handwritten digit recognition. You can find more about it on Yann LeCun's website (Director of AI Research, Facebook). He is one of the pioneers of what we've been talking about and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image).

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes.

Our goal is to build a neural network with 2 hidden layers.
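A quick way to get a feel for the size of this network is to count its trainable parameters. The sketch below assumes the layer width of 50 used in the model code further down. Each `Dense` layer has one weight per input-output pair plus one bias per output unit, and the flattening step has no parameters; `dense_params` is a hypothetical helper written for this illustration.

```python
input_size = 784        # 28 * 28 pixels, flattened
hidden_layer_size = 50
output_size = 10

def dense_params(n_in, n_out):
    # weight matrix (n_in x n_out) plus one bias per output unit
    return n_in * n_out + n_out

layer_shapes = [
    (input_size, hidden_layer_size),         # 1st hidden layer
    (hidden_layer_size, hidden_layer_size),  # 2nd hidden layer
    (hidden_layer_size, output_size),        # output layer
]
counts = [dense_params(n_in, n_out) for n_in, n_out in layer_shapes]
print(counts, sum(counts))  # [39250, 2550, 510] 42310
```

The count does not depend on the activation function, so the relu, sigmoid and softmax runs in the exercises all train networks of exactly the same size.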

## Import the relevant packages

```python
import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds
```



## Data

That's where we load and preprocess our data.

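```python
# load MNIST as (image, label) pairs, together with metadata about the splits
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

# use 10% of the training split for validation; cast to an integer for take()/skip()
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)


def scale(image, label):
    # convert pixel values from integers in [0, 255] to floats in [0, 1]
    image = tf.cast(image, tf.float32)
    image /= 255.

    return image, label

scaled_train_and_validation_data = mnist_train.map(scale)
test_data = mnist_test.map(scale)


BUFFER_SIZE = 10000

# reshuffle_each_iteration=False keeps this order fixed across epochs, so take()/skip()
# always select the same examples and validation images never leak into training
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(
    BUFFER_SIZE, reshuffle_each_iteration=False)
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
# reshuffle only the training part at every epoch
train_data = train_data.shuffle(BUFFER_SIZE)


BATCH_SIZE = 100

train_data = train_data.batch(BATCH_SIZE)
# the validation and test sets each go in as one single batch
validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data))
```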
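The split sizes explain the `540/540` that appears in the training log further down. This sketch hard-codes the standard MNIST split sizes (60,000 training and 10,000 test images, matching the 70,000 total mentioned above) instead of reading them from `mnist_info`.

```python
import math

train_examples = 60_000   # size of the MNIST 'train' split
test_examples = 10_000    # size of the MNIST 'test' split
BATCH_SIZE = 100

num_validation_samples = int(0.1 * train_examples)           # 6000
num_train_samples = train_examples - num_validation_samples  # 54000
steps_per_epoch = math.ceil(num_train_samples / BATCH_SIZE)  # 540

print(num_validation_samples, num_train_samples, steps_per_epoch)
```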

## Model

### Outline the model
When thinking about a deep learning algorithm, we mostly imagine building the model. So, let's do it :)

```python
input_size = 784
output_size = 10

hidden_layer_size = 50

model = tf.keras.Sequential([

    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer

    # tf.keras.layers.Dense is basically implementing: output = activation(dot(input, weight) + bias)
    # it takes several arguments, but the most important ones for us are the hidden_layer_size and the activation function
    # softmax in the hidden layers is the "softmax for all layers" experiment; the baseline uses 'relu'
    tf.keras.layers.Dense(hidden_layer_size, activation='softmax'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='softmax'), # 2nd hidden layer
    # tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 3rd hidden layer
    # tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 4th hidden layer
    # tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 5th hidden layer

    # the final layer is no different, we just make sure to activate it with softmax
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])
```
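The comment in the cell above describes each `Dense` layer as `activation(dot(input, weight) + bias)`. The NumPy sketch below spells out that forward pass for the same 784-50-50-10 shape. It is an illustration only: the weights are random rather than trained, the "images" are random pixel values, and relu is used in the hidden layers (the baseline); swapping `relu` for `softmax` in the loop reproduces the all-softmax variant.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, weights, bias, activation):
    # activation(dot(input, weight) + bias), applied to a whole batch at once
    return activation(x @ weights + bias)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # row-wise softmax; subtracting the row max avoids overflow
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

sizes = [784, 50, 50, 10]
params = [(rng.normal(scale=0.05, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

images = rng.random((3, 28, 28, 1))   # 3 fake "images" with pixel values in [0, 1)
x = images.reshape(len(images), -1)   # Flatten: (3, 28, 28, 1) -> (3, 784)
for i, (w, b) in enumerate(params):
    act = softmax if i == len(params) - 1 else relu   # softmax only on the output layer
    x = dense(x, w, b, act)

print(x.shape)        # (3, 10)
print(x.sum(axis=1))  # each row sums to 1
```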

### Choose the optimizer and the loss function

```python
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
```
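`'sparse_categorical_crossentropy'` expects the labels as integer class ids (0 to 9), which is what the MNIST dataset provides, rather than one-hot vectors. Up to the numerical safeguards Keras applies, the loss is the negative log of the probability assigned to the true class, averaged over the batch. The NumPy function below is a hand-written sketch of that formula, not the Keras implementation, and the labels are made up for illustration.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    # pick the predicted probability of the true class for each example,
    # then average -log(p) over the batch
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

uniform = np.full((4, 10), 0.1)   # a model that cannot tell the 10 digits apart
labels = np.array([3, 0, 7, 9])
loss = sparse_categorical_crossentropy(uniform, labels)
print(loss)  # ln(10), approx. 2.3026
```

For comparison, the all-softmax run logs a training loss of 2.1969 in its first epoch, not far below this uniform-guess value.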

### Training


```python
NUM_EPOCHS = 5

model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)
```

```
Epoch 1/5
540/540 - 16s - loss: 2.1969 - accuracy: 0.3836 - val_loss: 1.9796 - val_accuracy: 0.6665 - 16s/epoch - 30ms/step
Epoch 2/5
540/540 - 4s - loss: 1.6254 - accuracy: 0.6652 - val_loss: 1.2954 - val_accuracy: 0.6722 - 4s/epoch - 7ms/step
Epoch 3/5
540/540 - 4s - loss: 1.1071 - accuracy: 0.6715 - val_loss: 0.9688 - val_accuracy: 0.6773 - 4s/epoch - 6ms/step
Epoch 4/5
540/540 - 4s - loss: 0.8946 - accuracy: 0.6766 - val_loss: 0.8309 - val_accuracy: 0.6845 - 4s/epoch - 7ms/step
Epoch 5/5
540/540 - 4s - loss: 0.7911 - accuracy: 0.6817 - val_loss: 0.7580 - val_accuracy: 0.6863 - 4s/epoch - 8ms/step
```



## Test the model


```python
test_loss, test_accuracy = model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))
```

```
Test loss: 0.76. Test accuracy: 67.98%
```
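With integer labels and this loss, the `'accuracy'` metric amounts to taking the class with the highest predicted probability for each image and counting how often it matches the label. The toy NumPy sketch below shows that computation on four made-up predictions over four classes (the real model outputs 10 probabilities per image); the reported 67.98% is the same fraction, times 100, over the test images.

```python
import numpy as np

# made-up softmax outputs for 4 images over 4 classes (each row sums to 1)
probs = np.array([
    [0.05, 0.80, 0.05, 0.10],
    [0.60, 0.20, 0.10, 0.10],
    [0.10, 0.10, 0.30, 0.50],
    [0.25, 0.25, 0.40, 0.10],
])
labels = np.array([1, 0, 3, 1])

predicted = probs.argmax(axis=1)          # class with the highest probability
accuracy = np.mean(predicted == labels)   # fraction of correct predictions
print(predicted, accuracy)  # [1 0 3 2] 0.75
```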



