# TensorFlow Playground

The goal of this exercise is to give you some code-free intuition about neural network models. For this purpose we will use a tool developed by Google called the TensorFlow Playground: a code-free, user-friendly interface that lets you test fully-connected neural network models on toy datasets.

Before we start the exercise, follow this <a href="https://playground.tensorflow.org/">link</a> to enter the playground!

Here is a short video explaining the different functionalities of the playground:

Throughout the exercise, leave every setting as it is unless you are specifically asked to modify it.

## Part 1: Classification

### The Circle

1. Set the DATA to Circle, keep 1 hidden layer with one neuron on it, set the Activation to Linear and the learning rate to 0.03. Start training by clicking the play button. What happens? Is this neural network equivalent to a model we have studied before?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_1.PNG" />
</details>

2. Add more neurons to the hidden layer and start training again. Do the results improve? Why?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_2.PNG" />
</details>

3. Try adding additional hidden layers. Does it change the results? Why? What are we missing here?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_3.PNG" />
</details>

4. Let's go back to only one hidden layer with 2 neurons on it, but this time set the activation to ReLU. What changes?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_4.PNG" />
</details>

5. Increase the number of neurons on the hidden layer. What happens? What is the minimum number of neurons we can put on the hidden layer to get acceptable predictions? Can you understand why?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_5.PNG" />
</details>

6. Let's make the problem more difficult: set noise to the max value. What happens to our data?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_6.PNG" />
</details>

7. Let's increase the number of neurons on the hidden layer to the maximum and start training. What happens?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_7.PNG" />
</details>

8. Let's now add a new hidden layer with the maximum number of neurons and start training. What happens? Continue increasing the number of hidden layers until you reach the limit. What happens? What is this phenomenon called?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_8.PNG" />
</details>

9. Try adding new features to this model. Does it solve our problem?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_9.PNG" />
</details>

### The Spiral

1. Let's switch to a more difficult problem: the spiral. Set DATA to Spiral with noise 0, deactivate all features but $x_1$ and $x_2$, keep only one hidden layer with 8 neurons, and start training. What happens?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/spiral_1.PNG" />
</details>

2. Now try adding a second hidden layer to the network with 4, 6, then 8 neurons on it. Do the results improve? What does this tell you about the effect of using multiple layers in the network as opposed to adding neurons to a layer?

<details>
 <summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/spiral_2a.PNG" />
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/spiral_2b.PNG" />
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/spiral_2c.PNG" />
</details>

3. With two hidden layers of 8 neurons each, you should be able to get OK predictions on the spiral. However, the model seems to overfit! What could we try adding to the model to limit this overfitting without changing the architecture (the number of hidden layers and number of neurons)?

<details>
<summary>Spoiler</summary>
Try adding L2 regularization with a rate of 0.01.
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/spiral_3.PNG" />
</details>

4. Try adding noise to the data (30 for example) and start training the same model. Does it still perform well?

5. What is the effect of adding new features to this model?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/spiral_5.PNG" />
</details>

### Regression

1. Try solving the plane regression problem on your own. What is the simplest architecture you can use to get excellent performance?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/reg_1.PNG" />
</details>

2. Switch the data to multi-gaussian. What is the simplest architecture you can find to get good predictions (under 0.03 for train and test loss)?

### Sum it up

Try to summarize what we have learned intuitively here about neural networks:
* What happens if we use a linear activation function?
* What is the effect of adding neurons on a layer?
* What is the effect of adding hidden layers?
* In a model with several layers, is it more useful to add more neurons on the layers near the bottom or near the top?
* If the model overfits, what can we do to limit overfitting?
* Would you say that using neural network models compensates for the need for feature engineering?
* When you use additional features to feed the model, do you need to use as many neurons and layers? Would adding more neurons and layers be an alternative to using additional features?

**What happens if we use a linear activation function?**
* Using a linear activation function results in a linear model, however many neurons or layers you add, and **does not take advantage** of the capabilities of neural networks.
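
To make this concrete, here is a minimal sketch in plain Python. The weights are hypothetical values picked for illustration (they do not come from the playground). It shows that a hidden layer with a linear activation can be collapsed into a single linear model that gives the same outputs.

```python
# Hypothetical weights for a 2-input network with one hidden layer of 2 linear neurons.
W1 = [[0.5, -1.0], [2.0, 0.3]]   # hidden weights, one row per neuron
b1 = [0.1, -0.2]                 # hidden biases
w2 = [1.5, -0.7]                 # output weights
b2 = 0.4                         # output bias


def linear_network(x):
    """Forward pass with a linear (identity) activation on the hidden layer."""
    hidden = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


# Collapse the two layers into one linear model: y = w_eq . x + b_eq
w_eq = [sum(w2[j] * W1[j][i] for j in range(2)) for i in range(2)]
b_eq = sum(w2[j] * b1[j] for j in range(2)) + b2


def single_linear_model(x):
    """A model with no hidden layer at all."""
    return sum(w * xi for w, xi in zip(w_eq, x)) + b_eq


# Both models agree on every input (up to floating-point rounding).
for point in [(0.0, 0.0), (1.0, -2.0), (-3.5, 4.2)]:
    print(point, linear_network(point), single_linear_model(point))
```

The same collapse works for any number of linear layers, which is why adding neurons or layers did not help in the first questions on the Circle dataset.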


**What is the effect of adding neurons on a layer?**
* Adding a neuron to a layer makes it possible for the model to **create an additional "feature"** at a given level of complexity.
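
One way to picture these "features" on the Circle dataset is sketched below. The ReLU neurons are hand-built rather than trained, and their directions, the offset and the two rings of test points are all assumptions made for illustration. Each neuron stays at zero on one side of a straight line, so each extra neuron adds one more straight edge to the region the network can carve out.

```python
import math


def relu(z):
    return max(0.0, z)


def hidden_units(x1, x2, n_neurons, offset=1.0):
    """Hand-picked (not trained) ReLU neurons with evenly spaced weight directions."""
    outputs = []
    for k in range(n_neurons):
        angle = 2 * math.pi * k / n_neurons
        outputs.append(relu(math.cos(angle) * x1 + math.sin(angle) * x2 - offset))
    return outputs


def predicts_inner_class(x1, x2, n_neurons):
    """A point is labelled 'inner' only if no hidden neuron fires."""
    return sum(hidden_units(x1, x2, n_neurons)) == 0.0


def ring(radius, n_points=12):
    """Evenly spaced points on a circle, a toy stand-in for the Circle dataset."""
    return [(radius * math.cos(2 * math.pi * i / n_points),
             radius * math.sin(2 * math.pi * i / n_points)) for i in range(n_points)]


inner, outer = ring(0.5), ring(4.0)


def score(n_neurons):
    """Number of correctly labelled points out of len(inner) + len(outer)."""
    correct = sum(predicts_inner_class(x1, x2, n_neurons) for x1, x2 in inner)
    correct += sum(not predicts_inner_class(x1, x2, n_neurons) for x1, x2 in outer)
    return correct


for n in (2, 3):
    print(n, "neurons:", score(n), "/", len(inner) + len(outer))
```

In this construction two neurons leave an open strip, so some outer points leak into the inner class, while three neurons close a triangle around the inner ring. It is only an illustration of the geometry, not a proof about what training will find.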


**What is the effect of adding hidden layers?**
* Adding a hidden layer lets the model add one more level of non-linearity by applying one more activation function to the previous output, so the complexity of the outputs can grow very quickly (potentially exponentially) with depth.
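
The sketch below illustrates this growth with depth on a one-dimensional input. It uses a classic hand-built "tent" layer made of two ReLU neurons, which is an assumption chosen for illustration and not something the playground trains. Stacking the same layer repeatedly doubles the number of linear pieces of the output each time, whereas a single hidden layer of $m$ ReLU neurons on a one-dimensional input can produce at most $m + 1$ pieces.

```python
def relu(z):
    return max(0.0, z)


def tent_layer(x):
    """Two ReLU neurons: rises from 0 to 1 on [0, 0.5], then falls back to 0 on [0.5, 1]."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def deep_network(x, n_layers):
    """Stack the same two-neuron layer n_layers times."""
    for _ in range(n_layers):
        x = tent_layer(x)
    return x


def count_linear_pieces(n_layers, n_steps=1024):
    """Count linear pieces on [0, 1] by counting changes of direction on a fine grid."""
    values = [deep_network(i / n_steps, n_layers) for i in range(n_steps + 1)]
    going_up = [b > a for a, b in zip(values, values[1:])]
    return 1 + sum(u != v for u, v in zip(going_up, going_up[1:]))


for n_layers in (1, 2, 3, 4):
    print(n_layers, "layer(s):", count_linear_pieces(n_layers), "linear pieces")
```

With only two neurons per layer, the number of pieces grows as a power of two in the number of layers, which is one concrete sense in which depth buys complexity faster than width.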


**In a model with several layers, is it more useful to add more neurons on the layers near the bottom or near the top?**
* It is more useful to increase the number of neurons towards the bottom (the layers closest to the inputs) because the complexity of the outputs of earlier neurons limits the complexity of the outputs of later neurons. It is generally good practice to have more neurons on the bottom layers and progressively decrease the number of neurons going up the network.

**If the model overfits, what can we do to limit overfitting?**
* We can reduce the number of neurons and hidden layers in the network.
* We can also introduce regularization such as Ridge (L2) or Lasso (L1).
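
As a rough sketch of what regularization does to the training objective, the snippet below adds a penalty on the weights to a mean squared error. The function names, the numbers, and the exact scaling of the penalty are assumptions made for illustration; libraries differ on constant factors such as 1/2.

```python
def mean_squared_error(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def l2_penalty(weights):
    """Ridge-style penalty: sum of squared weights."""
    return sum(w * w for w in weights)


def l1_penalty(weights):
    """Lasso-style penalty: sum of absolute weights."""
    return sum(abs(w) for w in weights)


def regularized_loss(predictions, targets, weights, rate, penalty):
    """Data loss plus the penalty scaled by the regularization rate."""
    return mean_squared_error(predictions, targets) + rate * penalty(weights)


# Hypothetical numbers, chosen only for illustration.
weights = [3.0, -2.0, 0.5]
predictions, targets = [0.9, -0.8], [1.0, -1.0]

print(regularized_loss(predictions, targets, weights, rate=0.0, penalty=l2_penalty))
print(regularized_loss(predictions, targets, weights, rate=0.01, penalty=l2_penalty))
print(regularized_loss(predictions, targets, weights, rate=0.01, penalty=l1_penalty))
```

Because large weights now cost something, the optimizer is pushed towards smaller weights and smoother decision boundaries, which is the effect you can observe on the spiral when you set the regularization rate.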

**Would you say that using neural network models compensates for the need for feature engineering?**
* It does. As a matter of fact, the outputs of the neurons in the network may be interpreted as new features that later neurons use to build even more complex features, leading to the final prediction.
* In addition, these "features" are built by neurons whose parameters are optimized according to the loss function, so the network creates features that are linked to the target variable without them having to be explicitly coded!
* In a way this is great, because feature engineering is difficult and neural networks do it for us. The major downside is that it all happens in what may be described as a "black box" model. Depending on the data, neural network models may make predictions using features that are not at all interpretable, or not even well aligned with our final goal!

**When you use additional features to feed the model, do you need to use as many neurons and layers? Would adding more neurons and layers be an alternative to using additional features?**
* Adding new features may let you use less complex architectures. The upside is that you know exactly which input features are used, which makes the model more interpretable. On the other hand, you may be missing some very useful features that the model could have created for you!
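
The trade-off can be sketched on a toy version of the Circle dataset. The two rings, the weights and the helper names below are assumptions picked by hand for illustration, not values learned in the playground. With the engineered features $x_1^2$ and $x_2^2$, a single neuron with no hidden layer separates the classes, because the boundary $x_1^2 + x_2^2 = 4$ is a straight line in that feature space.

```python
import math


def ring(radius, n_points=20):
    """Evenly spaced points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * i / n_points),
             radius * math.sin(2 * math.pi * i / n_points)) for i in range(n_points)]


# Toy stand-in for the Circle dataset: +1 for the inner ring, -1 for the outer ring.
data = [(p, 1) for p in ring(1.0)] + [(p, -1) for p in ring(4.0)]


def single_neuron(features, weights, bias):
    """One linear neuron followed by a sign threshold, with no hidden layer."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if z > 0 else -1


def accuracy(make_features, weights, bias):
    hits = sum(single_neuron(make_features(x1, x2), weights, bias) == label
               for (x1, x2), label in data)
    return hits / len(data)


# Raw inputs only: one hand-picked line, which cuts both rings in half.
raw = accuracy(lambda x1, x2: (x1, x2), weights=(1.0, 1.0), bias=0.0)

# Engineered inputs x1^2 and x2^2: the same single neuron now separates the rings.
squared = accuracy(lambda x1, x2: (x1 * x1, x2 * x2), weights=(-1.0, -1.0), bias=4.0)

print("raw features:", raw)
print("squared features:", squared)
```

Here the squared features do the work that hidden ReLU neurons had to learn earlier in the exercise, at the cost of having to know in advance which features are useful.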

