## Model optimization by hyperparameter tuning

In the previous example, we used the following hyperparameters:

* 1 hidden layer with 32 neurons 
* 1 hidden layer with 16 neurons 
* learning rate = 0.001 
* batch size = 16 
* nr of epochs = 20 

and obtained the following learning curves:
![Learning curves of the initial model](images/model_optimization.png)
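The exercises below often change several hyperparameters at once, so it can help to record each run's settings explicitly and compare them. The following is a minimal sketch in plain Python. `MLPConfig` and `changed` are hypothetical helpers introduced here for illustration, not part of Keras or any other library.

```python
from dataclasses import asdict, dataclass


# Hypothetical helper: one immutable record per training run.
@dataclass(frozen=True)
class MLPConfig:
    hidden_units: tuple  # neurons per hidden layer, in order
    learning_rate: float
    batch_size: int
    epochs: int
    dropout: float = 0.0  # dropout rate after each hidden layer (0 = none)


def changed(old, new):
    """Return {name: (old_value, new_value)} for every hyperparameter that differs."""
    a, b = asdict(old), asdict(new)
    return {name: (a[name], b[name]) for name in a if a[name] != b[name]}


baseline = MLPConfig(hidden_units=(32, 16), learning_rate=0.001, batch_size=16, epochs=20)
exercise_1 = MLPConfig(hidden_units=(32, 16), learning_rate=0.001, batch_size=512, epochs=50)
print(changed(baseline, exercise_1))  # {'batch_size': (16, 512), 'epochs': (20, 50)}
```

Listing the differences makes it explicit when more than one hyperparameter changes between two runs, which matters when attributing a change in the curves to a single cause.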

#### Conclusion
With this initial model, the learning curves have not converged yet: the accuracy is still increasing and the loss still decreasing at the last epoch, so 20 epochs were not enough. The curves are also very bumpy. To fix both problems, we can increase the number of epochs so that training reaches convergence, and increase the batch size so that each weight update averages the gradient over more samples, which makes the curves smoother.
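One way to make "not converged yet" concrete is to check whether the loss is still improving over the last few epochs. The sketch below is a simple plateau heuristic, not a standard criterion: the name `has_converged`, the window of 5 epochs and the 1% relative tolerance are hypothetical choices. With Keras, the per-epoch values would come from the `History` object returned by `model.fit` (for example `history.history["loss"]`); here they are made-up lists.

```python
def has_converged(losses, window=5, rel_tol=0.01):
    """Plateau heuristic (hypothetical thresholds): True if the loss improved
    by less than `rel_tol` (relative) over the last `window` epochs."""
    if len(losses) <= window:
        return False  # too few epochs to judge
    before, latest = losses[-window - 1], losses[-1]
    if before == 0:
        return True  # loss already at zero, nothing left to improve
    return (before - latest) / abs(before) < rel_tol


# Made-up curves for illustration, not results from this notebook.
still_decreasing = [1.0 / (epoch + 1) for epoch in range(20)]
plateaued = [0.5 + 0.5 * 0.5**epoch for epoch in range(50)]

print(has_converged(still_decreasing))  # False
print(has_converged(plateaued))  # True
```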

### Exercise 1: increase number of epochs and batch size

Use the following new hyperparameters:

* 1 hidden layer with 32 neurons
* 1 hidden layer with 16 neurons
* learning rate = 0.001
* batch size = 512
* nr of epochs = 50
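Why should a larger batch give smoother curves? Each update uses the gradient averaged over the batch, and the spread of an average of B independent samples shrinks like 1/sqrt(B). The toy sketch below illustrates this with random numbers standing in for per-sample gradients; this is a simplifying assumption, since real per-sample gradients are high-dimensional and not independent Gaussians.

```python
import random
import statistics


def batch_mean_spread(batch_size, n_batches=500, seed=0):
    """Standard deviation of mini-batch averages of simulated per-sample
    'gradients' drawn from N(1, 1). A toy stand-in for gradient noise."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_batches):
        batch = [rng.gauss(1.0, 1.0) for _ in range(batch_size)]
        means.append(sum(batch) / batch_size)
    return statistics.stdev(means)


for batch_size in (16, 512):
    # Theory for independent samples: 1 / sqrt(batch_size), i.e. 0.25 and about 0.044.
    print(batch_size, round(batch_mean_spread(batch_size), 3))
```

A side effect to keep in mind: for N training samples, one epoch performs ceil(N / batch_size) weight updates, so a larger batch also means fewer updates per epoch, which is one reason to train for more epochs at the same time.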

Make the new plot: 

#### Conclusion

The curves now have a better shape. However, the training accuracy could probably still be improved with a more complex architecture.

Our goal for now is to make the architecture powerful enough, so we only look at the training accuracy and loss and try to improve them as much as possible. We will explore both the layer width and the network depth.

### Exercise 2: improve the architecture to make it powerful enough


The initial model had 2 hidden layers with 32 and 16 neurons respectively. We first increase the number of neurons to 128 in the first layer and 64 in the second, to see how the model performs with a wider architecture.

Model:
* 1 hidden layer with 128 neurons
* 1 hidden layer with 64 neurons
* learning rate = 0.001
* batch size = 1024
* nr of epochs = 50

Make the new plot: 


#### Conclusion

The training accuracy has increased significantly and is now extremely close to 1, while the training loss is very close to 0. Adding neurons to the hidden layers increases the number of trainable parameters, which gives the model more capacity to fit the training data and explains the better performance. The model is now overfitting, but at this stage our goal is only to increase performance on the training set.
Next, we increase the number of neurons to 512 and 256, and decrease the learning rate to 0.0001.
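The growth in trainable parameters can be computed directly: a fully connected (`Dense`) layer with `fan_in` inputs and `fan_out` units has `(fan_in + 1) * fan_out` parameters (weights plus one bias per unit). The sketch below assumes a hypothetical input of 20 features and a single output unit, since this section does not state the dataset's dimensions; for a real Keras model, `model.count_params()` or `model.summary()` reports the actual count. Dropout layers add no trainable parameters.

```python
def count_dense_params(n_inputs, hidden_widths, n_outputs):
    """Trainable parameters of a stack of Dense layers (weights + biases)."""
    widths = [n_inputs, *hidden_widths, n_outputs]
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(widths, widths[1:]))


# Hypothetical sizes: 20 input features, 1 output unit.
N_IN, N_OUT = 20, 1
architectures = {
    "32-16": [32, 16],
    "128-64": [128, 64],
    "512-256": [512, 256],
    "800-400-200": [800, 400, 200],
}
for name, hidden in architectures.items():
    print(f"{name}: {count_dense_params(N_IN, hidden, N_OUT):,} parameters")
```

Under these assumptions, going from 32-16 to 128-64 multiplies the parameter count by about 9, mostly because the weight matrix between the two hidden layers grows with the product of their widths.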




### Exercise 3: increase number of neurons and decrease learning rate

Model:
* 1 hidden layer with 512 neurons
* 1 hidden layer with 256 neurons
* learning rate = 0.0001
* batch size = 512
* nr of epochs = 50

Make the new plot: 

#### Conclusion

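As expected, the training accuracy has increased further with four times as many neurons per hidden layer (from 128 to 512 and from 64 to 256). Note that the learning rate and the batch size were also changed compared with Exercise 2, so this improvement cannot be attributed to the width alone.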

### Exercise 4: make network deeper

We now make the network deeper by adding a third hidden layer, to see whether depth improves the training performance even further.

Model:
* 1 hidden layer with 512 neurons
* 1 hidden layer with 256 neurons
* 1 hidden layer with 128 neurons
* learning rate = 0.0001
* batch size = 512
* nr of epochs = 50
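Making the network deeper means composing one more affine transformation and nonlinearity on top of the previous layer's output. The pure-Python sketch below shows a forward pass through the Exercise 4 hidden widths. The input size (20), the single output unit, the ReLU activation and the He-style random initialization are illustrative assumptions; this section does not state the task, so the output layer is left without an activation.

```python
import random


def init_mlp(widths, seed=0):
    """Random weights and zero biases for Dense layers widths[0] -> widths[1] -> ...
    (He-style Gaussian init, an illustrative choice)."""
    rng = random.Random(seed)
    layers = []
    for fan_in, fan_out in zip(widths, widths[1:]):
        std = (2.0 / fan_in) ** 0.5
        weights = [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def forward(layers, x):
    """Apply the layers in turn: ReLU after each hidden layer, identity at the output."""
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        x = z if i == len(layers) - 1 else [max(0.0, v) for v in z]
    return x


# Hypothetical sizes: 20 input features, 1 output; hidden widths from Exercise 4.
mlp = init_mlp([20, 512, 256, 128, 1])
print(len(mlp), "Dense layers, output:", forward(mlp, [0.1] * 20))
```

Adding a fourth hidden layer, as in Exercise 7, only means inserting one more width in the list passed to `init_mlp`.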

Make the new plot: 

#### Conclusion

We will try to make our architecture even more complex by increasing the number of neurons.

### Exercise 5: increase number of neurons

Model:
* 1 hidden layer with 800 neurons
* 1 hidden layer with 400 neurons
* 1 hidden layer with 200 neurons
* learning rate = 0.0001
* batch size = 1024
* nr of epochs = 50

Make the new plot: 

#### Conclusion

With 3 hidden layers and even more neurons, we obtain a very powerful model that again reaches perfect accuracy on the training data, and the learning curves have converged.
To speed up training, that is, to reach convergence in fewer epochs, we can try increasing the learning rate from 0.0001 back to 0.001.
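The link between learning rate and convergence speed can be seen on a toy problem. The sketch below runs plain gradient descent on the one-dimensional quadratic f(w) = w²/2 (gradient w), a hypothetical stand-in for the network's loss, and counts the updates needed to get close to the minimum. Each step moves the weight by an amount proportional to the learning rate, so multiplying the rate by 10 here divides the number of steps by about 10.

```python
def steps_to_tolerance(lr, w0=1.0, tol=0.01, max_steps=1_000_000):
    """Gradient descent on f(w) = w**2 / 2, whose gradient is w.
    Returns the number of updates until |w| < tol, or None if never reached."""
    w = w0
    for step in range(1, max_steps + 1):
        w -= lr * w  # w_{t+1} = w_t - lr * f'(w_t)
        if abs(w) < tol:
            return step
    return None


fast = steps_to_tolerance(0.001)
slow = steps_to_tolerance(0.0001)
print(fast, slow)
```

The speed-up is not free: on this quadratic, any learning rate above 2 makes the iterates grow instead of shrink, and in a real network a rate that is too large shows up as unstable, bumpy curves.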

### Exercise 6: increase learning rate

Model:
* 1 hidden layer with 800 neurons
* 1 hidden layer with 400 neurons
* 1 hidden layer with 200 neurons
* learning rate = 0.001
* batch size = 1024
* nr of epochs = 50

Make the new plot: 

#### Conclusion

As expected, training converges faster. The performance of this model is satisfying: very high accuracy and very low loss on the training data.
We now try to deepen the architecture once more by adding a fourth hidden layer, to see how the performance evolves.


### Exercise 7: further deepen architecture
Model:
* 1 hidden layer with 800 neurons
* 1 hidden layer with 400 neurons
* 1 hidden layer with 400 neurons
* 1 hidden layer with 200 neurons
* learning rate = 0.001
* batch size = 1024
* nr of epochs = 50

Make the new plot: 

#### Conclusion

The curves are very bumpy, so we prefer to keep the simpler model with 3 hidden layers (800, 400 and 200 neurons), which already offers satisfying results on the training data.
However, there is a generalization gap between the training and validation accuracy curves, which indicates overfitting. We will therefore add dropout to reduce overfitting.
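The generalization gap can be tracked epoch by epoch as the difference between the training and validation curves. The sketch assumes a dictionary shaped like Keras' `History.history`, which contains `"accuracy"` and `"val_accuracy"` keys when the model is compiled with `metrics=["accuracy"]` and fitted with validation data. The numbers below are made up for illustration and are not results from this notebook.

```python
def generalization_gap(history, metric="accuracy"):
    """Per-epoch difference between the training and validation values of `metric`.
    `history` is a dict like {"accuracy": [...], "val_accuracy": [...]}."""
    train = history[metric]
    val = history["val_" + metric]
    return [t - v for t, v in zip(train, val)]


# Made-up curves for illustration only.
example = {
    "accuracy": [0.80, 0.92, 0.99, 1.00],
    "val_accuracy": [0.78, 0.85, 0.86, 0.86],
}
gaps = generalization_gap(example)
print([round(g, 2) for g in gaps])  # [0.02, 0.07, 0.13, 0.14]
```

A gap that keeps widening while the training accuracy approaches 1 is the overfitting pattern that dropout is meant to reduce.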


### Exercise 8: add dropout 

First, we add dropout to the hidden layers only. Dropout should help the model generalize better to unseen data. During training, a random fraction of the neurons is switched off at each step. Because any neuron may be dropped, the others cannot rely on it, so each neuron is forced to learn features that are useful on their own rather than depending on specific other neurons. This is why dropout reduces the risk of overfitting.

We start with a dropout rate of 0.1, which means that, at each training step, each neuron has a probability of 0.1 of being dropped out.
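The mechanism can be sketched in a few lines of plain Python. This is "inverted" dropout: during training each activation is zeroed with probability `rate` and the kept ones are scaled by 1 / (1 - rate) so that their expected value is unchanged, while at inference the values pass through untouched. As far as we know, Keras' `Dropout(rate)` layer follows the same scheme; the `dropout` function below is a hypothetical illustration, not its implementation.

```python
import random


def dropout(activations, rate, training=True, rng=None):
    """Inverted dropout on a list of activations.

    Training: each value is zeroed with probability `rate`, and the kept values
    are scaled by 1 / (1 - rate) so the expected activation is unchanged.
    Inference (training=False): values pass through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    if rng is None:
        rng = random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


out = dropout([1.0] * 10_000, rate=0.1, rng=random.Random(42))
print(sum(v == 0.0 for v in out) / len(out))  # close to 0.1
print(sum(out) / len(out))  # close to 1.0
```

Because the scaling happens during training, no correction is needed at inference time, which is why the trained model can be evaluated on validation data without dropout.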


Model:
* 1 hidden layer with 800 neurons
* Dropout = 0.1
* 1 hidden layer with 400 neurons
* Dropout = 0.1
* 1 hidden layer with 200 neurons
* Dropout = 0.1
* learning rate = 0.0001
* batch size = 1024
* nr of epochs = 50


Make the new plot: 

#### Conclusion

With a dropout rate of 0.1, the validation loss is lower than without dropout.
