<!-- TITLE: Making Networks Deep -->

- [ ] Introduction
- [ ] Illustration: A dense layer
- [ ] Illustration: ReLU graph
- [ ] Illustration: Network with ReLU
- [ ] Illustration: Fully connected network
- [ ] Conclusion

# Introduction #

In this lesson we're going to expand on the one-neuron architecture we saw in Lesson 1. We're going to see how we can build neural networks capable of learning the complex kinds of relationships needed for advanced applications.

# Dense Layers #

Neural networks typically organize their neurons into **layers**. When we collect together linear units that share a common set of inputs, we get a **dense** layer.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/EmHaEOK.png" width="400" alt="A stack of three circles in an input layer connected to two circles in a dense layer.">
<figcaption style="text-align: center; font-style: italic"><center>A dense layer of two linear units receiving two inputs and a bias.
</center></figcaption>
</figure>

Notice here that each unit has a full set of input connections. This is the defining feature of dense layers, and it is why networks made up only of dense layers are sometimes called **fully connected**.

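In Keras, a dense layer is created with `layers.Dense`. The code below defines the layer pictured above: two linear units, each receiving two inputs. To make a linear model that produces three output values instead, we would just change the `units` argument to 3.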

```
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(units=2, input_shape=[2])
])
```
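To make the "full set of input connections" concrete, here is a minimal sketch of the arithmetic a dense layer performs, written in plain NumPy rather than Keras. The weight and bias values are made up for illustration, and `dense` is a hypothetical helper, not a Keras function.

```python
import numpy as np

# Hypothetical parameters for a dense layer with 2 inputs and 2 units.
# Column j of W holds the weights on the connections feeding unit j.
W = np.array([[1.0, 0.5],
              [-2.0, 3.0]])
b = np.array([0.1, -0.2])  # one bias per unit

def dense(x, W, b):
    """Compute all unit outputs at once; every unit sees every input."""
    return x @ W + b

x = np.array([2.0, 1.0])
y = dense(x, W, b)
print(y)  # one output value per unit
```

Changing the number of columns in `W` (and entries in `b`) plays the same role here as changing `units` in the Keras layer.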

More generally, you could think of a layer in Keras as some kind of *data transformation*. To create the kinds of complicated transformations they need, neural networks often use very deep stacks of layers.

It turns out, however, that a deep stack of dense layers with nothing in between can't do anything more than just a single dense layer can. Dense layers can never move us out of the world of lines and planes. What we need is something *nonlinear*. What we need are activation functions.
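The claim about stacked dense layers can be checked numerically. The sketch below uses plain NumPy with random, purely illustrative weights: it passes a batch through two linear layers, then builds one layer whose weights and bias reproduce the same outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two dense layers with no activation between them (random illustrative values).
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)  # 2 inputs -> 4 units
W2, b2 = rng.normal(size=(4, 3)), rng.normal(size=3)  # 4 units  -> 3 units

x = rng.normal(size=(5, 2))  # a batch of 5 examples

# Output of the two-layer stack.
stacked = (x @ W1 + b1) @ W2 + b2

# The same map written as a single dense layer.
W, b = W1 @ W2, b1 @ W2 + b2
single = x @ W + b

print(np.allclose(stacked, single))  # True
```

Because the product of two weight matrices is just another weight matrix, adding more linear layers never leaves the family of linear maps.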

# The Activation Function #

An **activation function** is simply some function we apply to each of a layer's outputs. The most common is the **Rectified Linear Unit** or **ReLU** function.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/.png" width="400" alt="A graph of the ReLU. The line y=x when x>0 and y=0 when x<0, making a 'hinge' shape.">
<figcaption style="text-align: center; font-style: italic"><center>The ReLU activation.
</center></figcaption>
</figure>

The ReLU function has a graph that's a line with the negative part "rectified" to zero: `max(0, x)`. As you can see, applying this ReLU function to the outputs of a neuron will put a *bend* in the data, moving us away from simple lines. In the network diagram, it might look like this:

<figure style="padding: 1em;">
<img src="https://i.imgur.com/.png" width="400" alt="A network diagram with the ReLU activation added. We draw a small circle containing the hinge graph immediately before the output.">
<figcaption style="text-align: center; font-style: italic"><center>Adding the ReLU activation.
</center></figcaption>
</figure>
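As a quick illustration of `max(0, x)`, here is a minimal NumPy sketch of ReLU applied elementwise. The `relu` helper is written for this example and is not the Keras implementation.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: keep positive values, replace negatives with zero."""
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))  # negatives become 0; 0.5 and 2.0 pass through unchanged
```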

## Adding an Activation Function ##

You can include an activation function in its own layer.
```
layers.Dense(units=3),
layers.Activation('relu')
```

Or you can add it as part of the definition of another layer.
```
layers.Dense(units=3, activation='relu')
```

These two ways of adding activations are completely equivalent.

# Stacking Dense Layers #

Now that we have some nonlinearity, let's see how we can stack layers to get complex data transformations.

As we've mentioned, what defines a dense layer is a full set of connections to its inputs. So when we stack on a dense layer, it will "fully connect" all of its neurons to whatever came before -- it could be input data or it could be other neurons.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/x4Cqrh6.png" width="600" alt="An input layer, two hidden layers, and an output.">
<figcaption style="text-align: center; font-style: italic"><center>A stack of dense layers makes a "fully-connected" network.
</center></figcaption>
</figure>

The layers before the output layer are sometimes called **hidden**, since we never see their outputs directly. Though we haven't shown them in this diagram, each of these neurons would also be receiving a bias.

The `Sequential` model we've been using will connect together a list of layers sequentially from first to last: the first layer gets the input, the last layer produces the output. The code below creates the model in the figure above.

```
model = keras.Sequential([
    # the hidden ReLU layers
    layers.Dense(units=4, activation='relu'),
    layers.Dense(units=3, activation='relu'),
    # the linear output layer
    layers.Dense(units=1),
])
```
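To see what "fully connected" means in terms of shapes, here is a rough NumPy sketch of a forward pass through a 4-3-1 stack like this one. The input width of 2 is an assumption made for the example, the weights are random, and `forward` is a hypothetical helper rather than anything Keras provides.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

rng = np.random.default_rng(0)
n_inputs = 2  # assumed input width, chosen only for this illustration

# One (weights, bias) pair per dense layer: 4 units, 3 units, then 1 output.
sizes = [n_inputs, 4, 3, 1]
params = [(rng.normal(size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x, params):
    """Apply ReLU after every hidden layer; leave the output layer linear."""
    *hidden, (W_out, b_out) = params
    for W, b in hidden:
        x = relu(x @ W + b)
    return x @ W_out + b_out

x = rng.normal(size=(5, n_inputs))   # batch of 5 examples
print(forward(x, params).shape)      # (5, 1)
print([W.shape for W, _ in params])  # [(2, 4), (4, 3), (3, 1)]
```

Each weight matrix has one row per incoming value and one column per unit, which is exactly the "every unit connects to everything before it" pattern in the diagram.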

The hidden layers and the output layer serve different functions in the network. You should think about the hidden layers as learning a data transformation that will make it easy for the final layer to produce the correct outputs. The hidden layers learn how to do *feature engineering*, while the final layer learns how to fit a linear model to the features they produce.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/OLSUEYT.png" width="400" alt=" ">
<figcaption style="text-align: center; font-style: italic"><center>The hidden layers can learn to use the ReLU function to put "bends" in the data. This allows the network to learn nonlinear relationships like this one, a quadratic.
</center></figcaption>
</figure>
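The idea of "bends" adding up to a curve can be sketched by hand. In the NumPy example below, the weights are picked manually (not learned by training) so that one hidden layer of four ReLU units plus a linear output traces a piecewise-linear approximation of a quadratic. All names and values are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

# Hidden layer: 1 input -> 4 ReLU units, with bends at x = 0 and x = +/-1.
W1 = np.array([[1.0, -1.0, 1.0, -1.0]])
b1 = np.array([0.0, 0.0, -1.0, -1.0])
# Linear output layer: a weighted sum of the four bent features.
W2 = np.array([[1.0], [1.0], [2.0], [2.0]])
b2 = np.array([0.0])

def network(x):
    h = relu(x @ W1 + b1)  # hidden features, each with a single bend
    return h @ W2 + b2     # linear combination of those features

x = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0]])
print(network(x).ravel())  # matches x**2 at these five points
```

Between those points the network output is a straight segment, so the fit is only approximate; more hidden units would allow more bends and a closer match.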

# Conclusion #