## Classification Exercises

For these exercises, use the GPU in Google Colab. To enable it, open the **Edit** menu in the top menu bar and select **Notebook settings**. In the window that opens, choose **GPU** from the hardware accelerator dropdown menu.

![alt](https://drive.google.com/uc?id=1rZf9pvb5rqY4rFwYqUhdmPkSrzaXBhPg)

### Introduction

We have already learned about neural networks and discussed multilayer perceptrons in depth. In this exercise, we will test our understanding of the underlying concepts, with special emphasis on [hyperparameter tuning](https://towardsdatascience.com/understanding-hyperparameters-and-its-optimisation-techniques-f0debba07568).

After doing these exercises, you will have a better understanding of:

* The architecture of a neural network
* The trainable parameters of a neural network and how their number changes with the architecture.
* Hyperparameter tuning: batch size, number of hidden units and optimizers.

We encourage you to experiment with other hyperparameters as well, such as the learning rate, the number of layers, and the activation functions. At the end there is an optional exercise, where you can check whether what you observe for the MNIST dataset also holds for another dataset.

The notebook is divided into three parts: Building the model, Reading the dataset, and Hyperparameters. It contains five exercises in total and one additional optional exercise:

* [Exercise 1](#ex_1)
* [Exercise 2](#ex_2)
* [Exercise 3](#ex_3)
* [Exercise 4](#ex_4)
* [Exercise 5](#ex_5)
* [Optional Exercise](#ex_O)


You have to do all five exercises. Run the code given with each exercise and write your answer just below it. We wish you all the best.


### Part 1: Building the model
Below we define a function to build a neural network model using TensorFlow Keras.

In [1]:
import tensorflow as tf
import numpy as np
from tensorflow import keras

def built_model(input_shape, n_hidden, nb_classes, optimizer='SGD'):
  '''
  Builds a fully connected neural network with two hidden layers.

  Arguments:
  input_shape: The number of inputs to the neural network
  n_hidden: Number of neurons in each hidden layer
  nb_classes: Number of neurons in the output layer
  optimizer: The optimizer used to train the model.
  By default we use Stochastic Gradient Descent.

  Returns:
  A compiled model with loss and optimizer defined
  '''
  model = tf.keras.models.Sequential()
  ## First Hidden layer  
  model.add(keras.layers.Dense(n_hidden,
       input_shape=(input_shape,),
       name='dense_layer', activation='relu'))
    
  ## Second Hidden Layer
  model.add(keras.layers.Dense(n_hidden,
        name='dense_layer_2', activation='relu'))
    
  ## Output Layer  
  model.add(keras.layers.Dense(nb_classes,
        name='dense_layer_3', activation='softmax'))
    
  ## Define loss and optimizer 
  model.compile(optimizer=optimizer, 
              loss='categorical_crossentropy',
              metrics=['accuracy'])
  return model


<a id='ex_1'></a>
**Exercise 1** What values should the arguments `INPUT_SHAPE` (the number of input units), `N_HIDDEN` (the number of hidden units), and `NB_CLASSES` (the number of output units) take if we want to use the `built_model` function to build a model with the specifications given in the figure?

![](https://drive.google.com/uc?id=1pcj2sHJK6CmhMjUo43AMNBxnU4ixQne3)



To draw this network we used the TensorFlow Keras `plot_model` function, available in the `utils` module. You can learn more about the function from the [TensorFlow docs](https://www.tensorflow.org/api_docs/python/tf/keras/utils/plot_model).

In [2]:
# Task to do
INPUT_SHAPE = 5
N_HIDDEN = 10
NB_CLASSES = 2 


## Do not change anything below
assert(INPUT_SHAPE == 5), "Input shape incorrect"
assert(N_HIDDEN == 10), "Number of hidden neurons incorrect"
assert(NB_CLASSES == 2), "Number of output units incorrect"

In [3]:
model = built_model(INPUT_SHAPE, N_HIDDEN,NB_CLASSES)

<a id='ex_2'></a>
**Exercise 2** Based on the number of input, hidden and output units, what is the total number of trainable parameters in this model?

In [4]:
# Task to do
trainable_parameters = (2*10+1*2)+(10*10+1*10)+(5*10+1*10)  # n_parameters = (output*input)+1*output

## Do not change anything below
assert trainable_parameters==model.count_params(), "Your answer is incorrect"
print("Number of trainable parameters in the model are", trainable_parameters)

Number of trainable parameters in the model are 192


Good work! Let us now visualize the summary of the model created. 

In [5]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_layer (Dense)          (None, 10)                60        
_________________________________________________________________
dense_layer_2 (Dense)        (None, 10)                110       
_________________________________________________________________
dense_layer_3 (Dense)        (None, 2)                 22        
Total params: 192
Trainable params: 192
Non-trainable params: 0
_________________________________________________________________
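
The per-layer counts in the summary can also be reproduced by hand. The following is a minimal sketch in plain Python; `dense_param_counts` is a hypothetical helper (not part of Keras), and it assumes every layer is fully connected with one bias per output unit.

```python
def dense_param_counts(layer_sizes):
    """Return the parameter count of each dense layer.

    layer_sizes lists the number of units from input to output,
    e.g. [5, 10, 10, 2] for the model built in Exercise 1.
    """
    counts = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # n_in * n_out weights plus one bias per output unit
        counts.append(n_in * n_out + n_out)
    return counts


per_layer = dense_param_counts([5, 10, 10, 2])
print(per_layer)       # [60, 110, 22]
print(sum(per_layer))  # 192
```

The three values match the `Param #` column of the summary, and their sum matches `model.count_params()`.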


### Part 2: Reading the dataset

We will continue with the MNIST dataset. 

###### Just run the cells in this part of the notebook. Do not change anything.

In [6]:
mnist = keras.datasets.mnist
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [7]:
# Processing the data
assert(len(X_train.shape)==3), "The input data is not of the right shape"
RESHAPED = X_train.shape[1]*X_train.shape[2]

X_train = X_train.reshape(60000, RESHAPED)
X_test = X_test.reshape(10000, RESHAPED)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

In [8]:
# Data Normalization
X_train, X_test = X_train / 255.0, X_test / 255.0
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

60000 train samples
10000 test samples


For the MNIST dataset, the numbers of input and output units are fixed. However, we can choose different values for the number of hidden units.

In [9]:
INPUT_SHAPE = RESHAPED
NB_CLASSES = len(set(Y_train))

In [10]:
# one-hot encode
Y_train = tf.keras.utils.to_categorical(Y_train, NB_CLASSES)
Y_test = tf.keras.utils.to_categorical(Y_test, NB_CLASSES)
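
The labels are one-hot encoded because the `categorical_crossentropy` loss used in `built_model` expects one target vector per sample rather than an integer class index. As a rough illustration of what the encoding produces, here is a plain-Python sketch; `one_hot` is a hypothetical helper written for this example, not the Keras implementation.

```python
def one_hot(labels, num_classes):
    """Convert integer class labels into one-hot rows."""
    return [[1.0 if c == label else 0.0 for c in range(num_classes)]
            for label in labels]


encoded = one_hot([0, 2, 1], num_classes=3)
print(encoded)  # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```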

### Part 3: Hyperparameters

<a id='ex_3'></a>
**Exercise 3:** The aim of this exercise is to understand the effect of changing the number of hidden units on the model performance. Change the number of hidden units and train the model. Compare the model performance in terms of accuracy. What do you understand from this?

**Answer** 
I tried five different values, `[32,64,128,256,512]`, and the test accuracy increased with the number of hidden units, ranging from 95.4% to 96.9%. This suggests that, in this setup, the model's accuracy increases with the network's complexity.
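
To put a number on "complexity", the sketch below counts the trainable parameters of the two-hidden-layer architecture for each hidden size tried. It assumes 784 inputs (28 x 28 pixels) and 10 classes, as for MNIST; `mlp_param_count` is a hypothetical helper introduced only for this illustration.

```python
def mlp_param_count(n_inputs, n_hidden, n_classes):
    """Total parameters of a fully connected network with two hidden layers."""
    first = n_inputs * n_hidden + n_hidden     # input -> hidden 1
    second = n_hidden * n_hidden + n_hidden    # hidden 1 -> hidden 2
    output = n_hidden * n_classes + n_classes  # hidden 2 -> output
    return first + second + output


# 784 inputs and 10 classes, as for MNIST
for n_hidden in [32, 64, 128, 256, 512]:
    print(n_hidden, mlp_param_count(784, n_hidden, 10))
```

Going from 32 to 512 hidden units raises the count from 26,506 to 669,706 parameters, while the accuracy gain reported above is about 1.5 percentage points.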

In [11]:
# Task to do: choose different values for the number of hidden units (minimum five different values)
N_HIDDEN = 512

In [12]:
## Do not change anything below
model = built_model(INPUT_SHAPE,N_HIDDEN, NB_CLASSES)
history = model.fit(X_train, Y_train,
		batch_size=128, epochs=50,
		verbose=1, validation_split=0.2)

# Evaluate the model
test_loss, test_acc = model.evaluate(X_test, Y_test)
print('Test accuracy: {:.2f} %'.format(test_acc*100))

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Test accuracy: 96.92 %


<a id='ex_4'></a>
**Exercise 4:** Let us now repeat the same after changing the batch size (minimum 5 different values). Compare the model performance in terms of accuracy. What do you understand from this?

**Answer** I tried five different values, `[32,64,128,256,512]`, and the test accuracy decreased as the batch size increased, from 97.8% down to 92.9%.
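
One plausible contributor to this trend (an assumption, not something the notebook measures) is that, with the number of epochs fixed at 50, a larger batch size means fewer weight updates. The sketch below counts the updates for each batch size, assuming 48,000 training samples remain after the 20% validation split.

```python
import math

N_TRAIN = 48000  # 60000 MNIST samples minus the 20% validation split
EPOCHS = 50


def updates_per_epoch(batch_size):
    """Number of gradient updates in one pass over the training data."""
    return math.ceil(N_TRAIN / batch_size)


for batch_size in [32, 64, 128, 256, 512]:
    per_epoch = updates_per_epoch(batch_size)
    print(batch_size, per_epoch, per_epoch * EPOCHS)
```

With a batch size of 32 the model receives 1500 updates per epoch, against 94 with a batch size of 512, so the comparison is not made at an equal number of updates.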

In [13]:
# Task to do: choose different values for the batch size (minimum five different values)
BATCH_SIZE = 512

In [14]:
## Do not change anything below
model = built_model(INPUT_SHAPE,128, NB_CLASSES)
history = model.fit(X_train, Y_train,
		batch_size=BATCH_SIZE, epochs=50,
		verbose=1, validation_split=0.2)
# Evaluate the model
test_loss, test_acc = model.evaluate(X_test, Y_test)
print('Test accuracy: {:.2f} %'.format(test_acc*100))

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Test accuracy: 92.97 %


<a id='ex_5'></a>
**Exercise 5:** And now we do the same with different [optimizers](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers) available in TensorFlow. Change the optimizers and compare the model performance in terms of accuracy. What do you understand from this?

**Answer** I tried three different optimizers, `[SGD, RMSProp, Adam]`, and their test accuracies were 96.3%, 97.9% and 98%, respectively. Adam and RMSProp gave more accurate results than SGD.
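
To make the difference between the optimizers more concrete, here is a minimal plain-Python sketch of the SGD and Adam update rules applied to a one-dimensional toy loss. It is not the Keras implementation; the function names, the toy loss and the learning rates are all illustrative assumptions chosen for this example.

```python
import math


def sgd_step(w, g, lr=0.1):
    """Plain gradient descent: move against the gradient."""
    return w - lr * g


def adam_step(w, g, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `state` holds the step count and moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g      # running mean of gradients
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g  # running mean of squared gradients
    m_hat = state["m"] / (1 - beta1 ** state["t"])         # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])         # bias-corrected second moment
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def toy_grad(w):
    # Gradient of the toy loss f(w) = 0.5 * w**2
    return w


w_sgd, w_adam = 5.0, 5.0
adam_state = {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(100):
    w_sgd = sgd_step(w_sgd, toy_grad(w_sgd))
    w_adam = adam_step(w_adam, toy_grad(w_adam), adam_state)

print(w_sgd, w_adam)
```

SGD scales each step directly by the gradient, whereas Adam normalizes the step by a running estimate of the gradient magnitude. This toy problem only illustrates the mechanics and says nothing about which optimizer is better on MNIST.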

In [15]:
# Task to do: choose different optimizers
opt = 'Adam'

In [16]:
## Do not change anything below
N_HIDDEN = 128
model = built_model(INPUT_SHAPE,N_HIDDEN, NB_CLASSES, opt)
history = model.fit(X_train, Y_train,
		batch_size=128, epochs=50,
		verbose=1, validation_split=0.2)

# Evaluate the model
test_loss, test_acc = model.evaluate(X_test, Y_test)
print('Test accuracy: {:.2f} %'.format(test_acc*100))

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Test accuracy: 98.20 %


<a id='ex_O'></a>
### Optional Exercise: Fashion MNIST

Repeat the above exercises (3-5) with a different dataset. You can use Fashion MNIST, another popular ML dataset. Are the results the same? Comment.

To download Fashion MNIST you can use the following code:

```
fashion_mnist = keras.datasets.fashion_mnist

(X_train, Y_train), (X_test, Y_test) = fashion_mnist.load_data()
```

In [17]:
# Import data
fashion_mnist = keras.datasets.fashion_mnist
 
(X_train, Y_train), (X_test, Y_test) = fashion_mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


In [18]:
# Processing the data
assert(len(X_train.shape)==3), "The input data is not of the right shape"
RESHAPED = X_train.shape[1]*X_train.shape[2]

X_train = X_train.reshape(60000, RESHAPED)
X_test = X_test.reshape(10000, RESHAPED)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# Data Normalization
X_train, X_test = X_train / 255.0, X_test / 255.0
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

INPUT_SHAPE = RESHAPED
NB_CLASSES = len(set(Y_train))

# one-hot encode
Y_train = tf.keras.utils.to_categorical(Y_train, NB_CLASSES)
Y_test = tf.keras.utils.to_categorical(Y_test, NB_CLASSES)

60000 train samples
10000 test samples


In [19]:
# N_HIDDEN

LIST_N_HIDDEN = [32,64,128,256,512]
N_HIDDEN_TEST_ACC = []

for i in LIST_N_HIDDEN:
  model = built_model(INPUT_SHAPE,i, NB_CLASSES)
  history = model.fit(X_train, Y_train,
      batch_size=128, epochs=50,
      verbose=-1, validation_split=0.2)

  # Evaluate the model
  test_loss, test_acc = model.evaluate(X_test, Y_test)
  print('Test accuracy: {:.2f} %'.format(test_acc*100))
  N_HIDDEN_TEST_ACC.append(test_acc*100)

N_HIDDEN_TEST_ACC

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Test accuracy: 86.02 %
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50

[86.01999878883362,
 86.64000034332275,
 86.44000291824341,
 86.59999966621399,
 86.23999953269958]

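Unlike in the MNIST experiment, increasing the number of hidden units did not clearly improve accuracy here: the test accuracy stayed between 86.0% and 86.6% for all five values, with the highest value at 64 hidden units.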

In [20]:
# BATCH_SIZE

BATCH_SIZE_LIST = [32,64,128,256,512]
BATCH_SIZE_TEST_ACC = []

for j in BATCH_SIZE_LIST:
  model = built_model(INPUT_SHAPE,128, NB_CLASSES)
  history = model.fit(X_train, Y_train,
      batch_size=j, epochs=50,
      verbose=-1, validation_split=0.2)
  # Evaluate the model
  test_loss, test_acc = model.evaluate(X_test, Y_test)
  print('Test accuracy: {:.2f} %'.format(test_acc*100))
  BATCH_SIZE_TEST_ACC.append(test_acc*100)

BATCH_SIZE_TEST_ACC

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Test accuracy: 88.02 %
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50

[88.02000284194946,
 87.58999705314636,
 86.82000041007996,
 84.7100019454956,
 84.27000045776367]

As in the MNIST experiment, the model's accuracy decreased as the batch size increased.

In [21]:
# OPTIMIZER

opt_list = ['SGD', 'RMSProp', 'Adam']
opt_test_acc = []

for k in opt_list:
  N_HIDDEN = 128
  model = built_model(INPUT_SHAPE,N_HIDDEN, NB_CLASSES, k)
  history = model.fit(X_train, Y_train,
      batch_size=128, epochs=50,
      verbose=-1, validation_split=0.2)

  # Evaluate the model
  test_loss, test_acc = model.evaluate(X_test, Y_test)
  print('Test accuracy: {:.2f} %'.format(test_acc*100))
  opt_test_acc.append(test_acc*100)

opt_test_acc

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Test accuracy: 86.49 %
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50

[86.489999294281, 88.20000290870667, 88.96999955177307]

As observed before, SGD was outperformed by Adam and RMSProp and, once again, Adam gave the most accurate results.

In conclusion, the batch size and optimizer trends observed on MNIST also held on Fashion MNIST, whereas the effect of the number of hidden units was not clearly reproduced.
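
Trend claims like these can be checked directly against the recorded numbers. The sketch below does so for the Fashion MNIST test accuracies printed above (rounded to two decimals); the two helper functions are hypothetical and written only for this check.

```python
def is_strictly_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))


def is_strictly_decreasing(values):
    return all(a > b for a, b in zip(values, values[1:]))


# Test accuracies (%) recorded above for Fashion MNIST, rounded to two decimals
hidden_units_acc = [86.02, 86.64, 86.44, 86.60, 86.24]  # 32 ... 512 hidden units
batch_size_acc = [88.02, 87.59, 86.82, 84.71, 84.27]    # batch size 32 ... 512

print(is_strictly_increasing(hidden_units_acc))  # False
print(is_strictly_decreasing(batch_size_acc))    # True
```

Each configuration was trained only once, so differences of a few tenths of a percentage point may simply reflect run-to-run variation.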