
![loss](https://machinelearningmastery.com/wp-content/uploads/2018/12/Example-of-Train-and-Validation-Learning-Curves-Showing-An-Overfit-Model.png)
____
# <font color=#FFAA00> Neural Network Playground: ANN Intuition</font>

In this HW, you will use a web app to build your intuition on ANNs.

ANNs have a large number of choices that you need to make; it is not easy to know how to make these choices. Here, you are going to build your intuition about how to make such choices and explain this intuition. To do this, you will run a large number of ANNs with various datasets, with various inputs, for various depths, for various widths, observing the contents of the hidden layers, examining the optimization process and so on. You can't really build your intuition for ANNs without spending **a lot** of time varying all of these choices together. No pain, no gain! 💪🏻

Fortunately, there is an excellent web app for doing just this. And, you will have fun! 🥳

![pen](https://findicons.com/files/icons/766/base_software/128/pencil3.png)


Go to [this](https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.30258&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false) webpage. You will see a dashboard with:
* choice of four datasets at the upper left (DATA),
* below that you see training choices (ratio, noise, batch size),
* at the very right you can see the current output of the NN (OUTPUT),
* at the bottom right you can change what is being displayed - training versus testing, for example,
* along the top you have controls for the regularization, activation, learning rate and problem type,
* note that if you switch the problem type to regression, the choice of datasets at the upper left changes.
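
The link above carries the dashboard's starting configuration in its URL fragment (the part after `#`). As a minimal sketch, the snippet below reads a subset of those settings with Python's standard library; the fragment string is copied from the link, and the interpretation of `networkShape` as neurons per hidden layer is an assumption based on what the dashboard displays.

```python
from urllib.parse import parse_qs

# A subset of the settings copied from the fragment of the link above.
fragment = (
    "activation=tanh&batchSize=10&dataset=circle&learningRate=0.03"
    "&regularizationRate=0&noise=0&networkShape=4,2&percTrainData=50"
)

# parse_qs returns a list of values per key; each key appears once here.
settings = {key: values[0] for key, values in parse_qs(fragment).items()}

# Assumption: networkShape lists the neurons per hidden layer, left to right.
hidden_layers = [int(n) for n in settings["networkShape"].split(",")]

print(settings["activation"], settings["dataset"], hidden_layers)
```

Editing these values in the address bar and reloading is a quick way to return to a known starting point between experiments.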

That is a lot to vary! And, we have not discussed the most interesting part in the center: the NN itself. Note that you also have control over these properties of the NN:
* along the top of the NN you can change the number of hidden layers - this is **true** deep learning!
* you can do feature engineering by choosing what features you use as inputs on the left (FEATURES),
* you can change the width of the NN with the +/- buttons above each hidden layer. 

You run the NN with the button ▶️ at the very upper left, and it will generate a running plot at the upper right with the loss function. Note that to the left of the "play" button is a reset button. And, as it runs you get a view of what is in each "neuron" in the hidden layers. You really get to see everything that is going on.

Play with some of these controls before proceeding. In practice you don't have the luxury of seeing all controls placed in front of you like this. 

Open a markdown cell below so that you are ready to capture the answers to the questions and collect your insights. Be sure to have a response to each of the numbered prompts.


![pen](https://findicons.com/files/icons/766/base_software/128/pencil3.png)

Now, you are going to follow these steps and answer these questions, with about a short paragraph for each:

---
1. Reduce the size of the NN to its minimum size: first input $X^1$, one hidden layer, and one neuron in that hidden layer. For each of the four classification datasets run the NN, remembering to reset it every time. Then, do this with *only* $X^2$. Be sure that regularization is set to *None* for now.  Describe what you see here. 

    Using only $X^1$ as the input works best on the third dataset, which is clearly split into two groups, but it does not work on the other datasets, whose points are distributed differently. Using only $X^2$ works best on the first dataset but not on the others.
---
    
2. Next, repeat what you just did for each of the other possible inputs, from $X^1$ to $\sin(X^2)$ using only one input at a time. Remember to reset every time. Describe the behavior. 

     Different datasets exhibit distinct features, and selecting the appropriate ones for training data is crucial for optimal results. However, some datasets present challenges in identifying distinguishable features. In such cases, relying solely on a single feature may not be effective across all scenarios.
---
3. You are getting the idea: now, vary all of the inputs in many different combinations. This is a form of "feature engineering" where you, the user, get to control what the NN gets trained on. Describe the patterns you see and what conclusions about feature engineering you would draw from this.

    For the first dataset, the feature $X^2$ gives the lowest test loss; for the second dataset, $X_1X_2$ does; and for the third dataset, $X^1$ does. The fourth dataset is hard to fit with any combination of features and may need more layers or neurons to train the NN.
    
    Overall, feature engineering involves a systematic exploration of the data to extract meaningful information and enhance the performance of machine learning models. By understanding the patterns observed during this process, one can make informed decisions about feature selection, transformation, and model architecture to achieve better predictive accuracy and generalization.
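
To make the value of a well-chosen feature concrete, here is a small sketch on the circle dataset (blue center, orange ring). The data are a synthetic stand-in with assumed radii, not the web app's own generator, and the names `make_circle` and `best_threshold_accuracy` are hypothetical helpers. It compares the best single-threshold rule on one raw coordinate with the same rule on the sum of the squared coordinates, which is what one linear neuron with equal weights on the two squared inputs computes.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_circle(n=200):
    """Synthetic stand-in for the circle dataset: inner disk (+1), outer ring (-1)."""
    half = n // 2
    r = np.concatenate([rng.uniform(0.0, 2.0, half),       # inner disk (assumed radii)
                        rng.uniform(3.0, 5.0, n - half)])  # outer ring (assumed radii)
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    y = np.where(r < 2.5, 1, -1)
    return X, y

def best_threshold_accuracy(feature, labels):
    """Best accuracy of a rule 'feature > t' (or its flip) over candidate thresholds."""
    best = 0.0
    for t in np.sort(feature):
        pred = np.where(feature > t, 1, -1)
        acc = float(np.mean(pred == labels))
        best = max(best, acc, 1.0 - acc)  # 1 - acc covers the flipped rule
    return best

X, y = make_circle()
acc_raw = best_threshold_accuracy(X[:, 0], y)                     # one raw coordinate
acc_sq = best_threshold_accuracy(X[:, 0] ** 2 + X[:, 1] ** 2, y)  # sum of squares

print(f"raw coordinate: {acc_raw:.2f}, sum of squares: {acc_sq:.2f}")
```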
---
4. Reset everything and choose the first feature $X^1$ again, still with only one hidden layer. One by one add neurons to the hidden layer and describe what happens. Be sure to do all of these tests for the four datasets and comment on which ones are easier for the NN and which are harder. 

    The NN works best on the third dataset; the second and fourth datasets are harder.


---
5. Now reset, use the same input with two hidden layers with only one neuron in each layer; you will need to remove a neuron in the second layer because it will try to put two there. After noticing what it does, put that second neuron in the second layer and compare. What did you see?

    While adding a second hidden layer increases the model's capacity to capture more complex patterns in the data compared to having only one neuron in each layer, without non-linear activation functions, the model's ability to capture non-linear relationships remains limited.
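
One way to probe why stacking one-neuron layers adds little: a composition of monotone functions is itself monotone, so a chain of single tanh neurons fed one input can change the sign of its output at most once. The sketch below checks this numerically, assuming tanh activations (the setting in the linked configuration) and random illustrative weights; `one_neuron_chain` and `is_monotone` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-6.0, 6.0, 401)  # a sweep over the single input

def one_neuron_chain(x, depth):
    """Stack `depth` one-neuron tanh layers, each with random illustrative weights."""
    h = x
    for _ in range(depth):
        w, b = rng.normal(size=2)
        h = np.tanh(w * h + b)
    return h

def is_monotone(values, tol=1e-12):
    """True if the sequence never changes direction (up to rounding tolerance)."""
    steps = np.diff(values)
    return bool(np.all(steps >= -tol) or np.all(steps <= tol))

# Chains of depth 1 to 3 with many random weight draws.
chains_monotone = all(is_monotone(one_neuron_chain(x, depth))
                      for depth in (1, 2, 3) for _ in range(100))

# Two neurons side by side in one layer can already produce a bump.
bump = np.tanh(x + 2.0) - np.tanh(x - 2.0)

print(chains_monotone, is_monotone(bump))
```

The contrast suggests that the second neuron in a layer, rather than the extra layer itself, is what lets the output rise and then fall along the input.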
    
---
6. Build a deep and wide NN by adding layers with lots of neurons, but use *only* the first input $X^1$ and pay most attention to the data that has blue dots at the center surrounded by a ring of orange dots (upper left). Just using $X^1$ as an input can you build any NN to get the circular separation boundary needed? What if you add $X^2$ to the possible inputs as well? Describe the shape of the boundary.

    When utilizing the $X^1$ feature alone, the resulting boundary will be polygonal. Conversely, opting for the $X^2$ feature will yield a circular boundary. Furthermore, leveraging the $X^2$ feature tends to accelerate convergence, facilitating quicker training of the model.

---
7. This tool allows you to see what is in the hidden layers - what patterns do you see forming there? (If you hover your mouse over an internal neuron it expands it so that you can see it better.)

    To me, each hidden neuron looks like a different combination of feature maps, and the more neurons there are, the more combinations of feature maps appear.

---
8. Repeat step 6 with the two regularizers, varying their strength. What do you see? 

    Increasing the strength of L1 or L2 regularization makes its effect more pronounced. L1 encourages sparsity, aiding feature selection and interpretability, while L2 prevents overfitting by penalizing large weights. By adjusting strengths, a balance is struck between model complexity and generalization, leading to improved performance on unseen data. This approach also boosts robustness to outliers and noisy features. Ultimately, fine-tuning regularization provides flexible control over model complexity, tailoring it to dataset characteristics for optimal performance.
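
The different effects of the two penalties can be sketched on a handful of weights. This is an illustration under stated assumptions, not the web app's own update rule: the weights and `strength` are made up, the L2 update is a plain gradient step on the penalty alone, and the L1 update uses soft-thresholding (the standard proximal step for an absolute-value penalty).

```python
import numpy as np

weights = np.array([0.9, -0.4, 0.05, -0.02, 0.0])  # illustrative values
learning_rate = 0.03   # matches the learning rate in the linked configuration
strength = 1.0         # illustrative regularization rate

def l2_step(w):
    """Gradient step on (strength / 2) * sum(w**2): all weights shrink by one factor."""
    return w * (1.0 - learning_rate * strength)

def l1_step(w):
    """Soft-thresholding for strength * sum(|w|): small weights become exactly zero."""
    shrink = learning_rate * strength
    return np.sign(w) * np.maximum(np.abs(w) - shrink, 0.0)

print("L2:", l2_step(weights))
print("L1:", l1_step(weights))
```

Under L2 every nonzero weight stays nonzero and only gets smaller, while under L1 the weights below the threshold are removed outright, which is the sparsity effect described above.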

---
9. Click on one of the weights; that is, one of the lines that connects two neurons. After hovering, your click will open a box allowing you to change that particular weight. Change some of the weights to see what the consequences are. (It is more instructive if you change the weights by a lot so that you see a bigger impact.)

    Increasing or decreasing weights can affect the model's predictions, convergence speed, and generalization ability.

---
10. Now, finally, focus on the dataset with the spiral - the one at the lower right. What is the minimal deep net you can construct that allows you to find a spiral separation boundary? Watch the graph at the upper right - does it appear to be hopping among various local minima? 

    I constructed a network with 2 hidden layers with five neurons, using L2 regularization with a rate of 0.03 and all of the available features. The test loss is 0.182. The graph shows the local minima.

____