## Deep Neural Networks
Add hidden layers to your network to uncover complex relationships.


### Layers
Neural networks typically organize their neurons into `layers`.

When we collect together linear units that share a common set of inputs, we get a `dense layer`.

![image.png](attachment:image.png)

You could think of each layer in a neural network as performing some kind of relatively simple transformation. Through a deep stack of layers, a neural network can transform its inputs in more and more complex ways. In a well-trained neural network, each layer is a transformation getting us a little bit closer to a solution.
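
To make the idea of a dense layer concrete, here is a minimal sketch in plain Python. The function name `dense` and all the weight and bias values are made up for illustration; a real library would store them as arrays and learn them during training.

```python
def dense(inputs, weights, biases):
    """Apply a dense layer: every unit sees the same inputs,
    but has its own weights and its own bias."""
    return [
        sum(w * x for w, x in zip(unit_weights, inputs)) + b
        for unit_weights, b in zip(weights, biases)
    ]

# Hypothetical values: 2 inputs feeding 3 linear units.
inputs = [1.0, 2.0]
weights = [[0.5, -1.0], [2.0, 0.0], [1.0, 1.0]]  # one row of weights per unit
biases = [0.0, 1.0, -3.0]                        # one bias per unit

outputs = dense(inputs, weights, biases)
print(outputs)  # [-1.5, 3.0, 0.0]
```

Each entry of `outputs` is the result of one linear unit, so the layer maps 2 input values to 3 output values.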

### Activation Functions

It turns out, however, that two dense layers with nothing in between are no better than a single dense layer by itself. Dense layers by themselves can never move us out of the world of lines and planes. What we need is something nonlinear. What we need are activation functions.
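
We can check the claim about stacked linear layers with a small sketch. For simplicity it uses one-input linear units with made-up weights; the helper name `linear` is an assumption for this example. Feeding one linear unit into another gives `w2 * (w1 * x + b1) + b2`, which is just another line with weight `w2 * w1` and bias `w2 * b1 + b2`.

```python
def linear(x, w, b):
    """A single linear unit with one input."""
    return w * x + b

# Hypothetical weights and biases for two stacked linear units.
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

# The equivalent single linear unit.
w, b = w2 * w1, w2 * b1 + b2

xs = [-2.0, 0.0, 1.5]
stacked = [linear(linear(x, w1, b1), w2, b2) for x in xs]
single = [linear(x, w, b) for x in xs]

print(stacked)            # [9.5, -2.5, -11.5]
print(stacked == single)  # True
```

The stacked version never produces anything a single unit could not, which is why we need something nonlinear between the layers.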


![image.png](attachment:image.png)

An `activation function` is simply some function we apply to each of a layer's outputs (its activations). The most common one is the `rectifier` function `max(0, x)`.

![image.png](attachment:image.png)

The rectifier function has a graph that's a line with the negative part "rectified" to zero. Applying the function to the outputs of a neuron will put a bend in the data, moving us away from simple lines.
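
The rectifier is short enough to write out directly. This is a minimal sketch in plain Python; the sample values are arbitrary.

```python
def relu(x):
    """Rectifier: negative values become zero, the rest pass through."""
    return max(0.0, x)

values = [-2.0, -0.5, 0.0, 1.0, 3.5]
rectified = [relu(v) for v in values]
print(rectified)  # [0.0, 0.0, 0.0, 1.0, 3.5]
```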

When we attach the `rectifier` to a `linear unit`, we get a `rectified linear unit` or `ReLU`. (For this reason, it's common to call the rectifier function the `ReLU function`.)

Applying a ReLU activation to a linear unit means the output becomes `max(0, w * x + b)`, which we might draw in a diagram like:

![image-2.png](attachment:image-2.png)
A rectified linear unit.
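
As a rough illustration of where the bend ends up, the sketch below evaluates a rectified linear unit with hypothetical values `w = 2` and `b = -4`. The function name `relu_unit` is made up for this example. The output stays at zero until `w * x + b` becomes positive, which for these values happens at `x = 2`.

```python
def relu_unit(x, w, b):
    """A linear unit followed by the rectifier: max(0, w * x + b)."""
    return max(0.0, w * x + b)

# Hypothetical weight and bias; the bend sits at x = -b / w = 2.
w, b = 2.0, -4.0

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [relu_unit(x, w, b) for x in xs]
print(ys)  # [0.0, 0.0, 0.0, 2.0, 4.0]
```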



### Stacking Dense Layers
Now that we have some nonlinearity, let's see how we can stack layers to get complex data transformations.

![image.png](attachment:image.png)

The layers before the output layer are sometimes called `hidden` since we never see their outputs directly.

Now, notice that the final (output) layer is a linear unit (meaning, no activation function). That makes this network appropriate for a regression task, where we are trying to predict some arbitrary numeric value. Other tasks (like classification) might require an activation function on the output.
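
Here is a sketch of what such a stack computes, written in plain Python so that every step is visible. It assumes a network with 2 inputs, hidden layers of 4 and 3 ReLU units, and a single linear output. The helper names and all the weights are hypothetical; a trained network would have learned its own values.

```python
def relu(x):
    return max(0.0, x)

def dense(inputs, weights, biases, activation=None):
    """Dense layer: one weighted sum per unit, then an optional activation."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(activation(total) if activation else total)
    return outputs

# Hypothetical weights for a 2 -> 4 -> 3 -> 1 network.
hidden1_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
hidden1_b = [0.0, 0.0, -1.0, 0.0]
hidden2_w = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0], [-1.0, 0.0, 0.0, 0.0]]
hidden2_b = [0.0, 0.5, 0.0]
output_w = [[1.0, 2.0, 3.0]]
output_b = [-1.0]

x = [2.0, 1.0]
h1 = dense(x, hidden1_w, hidden1_b, relu)   # first hidden layer (ReLU)
h2 = dense(h1, hidden2_w, hidden2_b, relu)  # second hidden layer (ReLU)
y = dense(h2, output_w, output_b)           # linear output, no activation

print(h1)  # [2.0, 1.0, 2.0, 0.0]
print(h2)  # [3.0, 2.5, 0.0]
print(y)   # [7.0]
```

Only `y` is what we would normally see from the network; `h1` and `h2` are the hidden layers' outputs.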


### Building Sequential Models
The `Sequential` model will connect together a list of layers in order from first to last.

The first layer gets the input, and the last layer produces the output. This creates the model in the figure above:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # the hidden ReLU layers
    layers.Dense(units=4, activation='relu', input_shape=[2]),
    layers.Dense(units=3, activation='relu'),

    # the linear output layer
    layers.Dense(units=1),
])
])
```
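
As a quick sanity check on a model like this, we can count its trainable parameters by hand. The sketch below assumes the layer sizes 2 (input), 4, 3, and 1; each dense layer has one weight per input-unit pair plus one bias per unit.

```python
# Layer sizes: 2 inputs, then dense layers with 4, 3 and 1 units.
layer_sizes = [2, 4, 3, 1]

# Each dense layer has (inputs * units) weights plus one bias per unit.
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # [12, 15, 4]
print(total_params)      # 31
```

If TensorFlow is installed, calling `model.summary()` on the model should report the same counts.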
