<!-- TITLE: Making Networks Deep -->

# Introduction #

In this lesson we're going to expand on the one-neuron architecture we saw in Lesson 1. We're going to see how we can build neural networks capable of learning the complex kinds of relationships deep neural nets are famous for.

The key idea here is *modularity*, building up a complex network from simpler functional units. We've seen how a linear unit computes a linear function -- now we'll see how to combine and modify these single units to compute more complex nonlinearities. Beyond single neurons, the next stage is the *layer*.

# Dense Layers #

Neural networks typically organize their neurons into **layers**, which you could think about as a collection of neurons having a common set of inputs. When we collect linear units into a layer, we get a **dense** layer.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/EmHaEOK.png" width="300" alt="A stack of three circles in an input layer connected to two circles in a dense layer.">
<figcaption style="textalign: center; font-style: italic"><center>A dense layer of two linear units receiving two inputs and a bias.
</center></figcaption>
</figure>

To create this layer we would use `layers.Dense` with an argument `units=2`.

Notice here that each of these neurons has a full set of input connections. This is the defining feature of dense layers, and it is why networks made up only of dense layers are sometimes called **fully connected**.
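To make the "full set of connections" concrete, here is a minimal NumPy sketch of what a dense layer of two linear units computes on two inputs. The weight and bias values are made up for illustration, and this is only the arithmetic, not how Keras implements `layers.Dense` internally.

```python
import numpy as np

# Hypothetical numbers, chosen only for illustration.
x = np.array([1.0, 2.0])        # two input features
W = np.array([[0.5, -1.0],      # W[i, j]: weight from input i to unit j
              [0.25, 2.0]])
b = np.array([0.1, -0.2])       # one bias per unit

# Every unit sees every input, so one matrix product computes both units.
outputs = x @ W + b
print(outputs)
```

Each column of `W` holds the weights of one unit, which is why a layer with two inputs and two units needs four weights plus two biases.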

It turns out that two dense layers with nothing in between are no better than a single dense layer by itself, because a linear function of a linear function is still linear. Dense layers by themselves can never move us out of the world of lines and planes. What we need is something *nonlinear*. What we need are activation functions.
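A quick numerical check of this claim, as a sketch in NumPy with randomly chosen (hypothetical) weights: two stacked linear layers give exactly the same outputs as one linear layer whose weights are `W1 @ W2` and whose bias is `b1 @ W2 + b2`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked linear layers with hypothetical random weights:
# 2 inputs -> 3 units -> 1 unit, and no activation in between.
W1, b1 = rng.normal(size=(2, 3)), rng.normal(size=3)
W2, b2 = rng.normal(size=(3, 1)), rng.normal(size=1)

# The single linear layer they collapse into.
W = W1 @ W2
b = b1 @ W2 + b2

x = rng.normal(size=(5, 2))  # a batch of five inputs
two_layers = (x @ W1 + b1) @ W2 + b2
one_layer = x @ W + b
print(np.allclose(two_layers, one_layer))
```

The extra layer adds parameters but no new kinds of functions, which is why a nonlinearity is needed between layers.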

<blockquote style="margin-right:auto; margin-left:auto; background-color: #ebf9ff; padding: 1em; margin:24px;">
    <strong>Many Kinds of Layers</strong><br>
A "layer" in Keras is a very general kind of thing. A layer can be, essentially, any kind of *data transformation*. Many layers, like the [convolutional](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D) and [recurrent](https://www.tensorflow.org/api_docs/python/tf/keras/layers/RNN) layers, transform data through use of neurons and differ primarily in the pattern of connections they form. Others though are used for [feature engineering](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding) or just [simple arithmetic](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Add). There's a whole world of layers to discover -- [check them out](https://www.tensorflow.org/api_docs/python/tf/keras/layers)!
</blockquote>

# The Activation Function #

An **activation function** is simply some function we apply to each of a layer's outputs. The most common is the *rectifier* function $\max(0, x)$. When we attach the rectifier to a linear unit, we get a **rectified linear unit** or **ReLU**. For this reason, it's common to call the rectifier function the "ReLU function".

<figure style="padding: 1em;">
<img src="https://i.imgur.com/KcGV019.png" width="400" alt="A graph of the rectifier function. The line y=x when x>0 and y=0 when x<0, making a 'hinge' shape like '_/'.">
<figcaption style="textalign: center; font-style: italic"><center>The ReLU activation.
</center></figcaption>
</figure>

The ReLU function has a graph that's a line with the negative part "rectified" to zero. As you can see, applying this ReLU function to the outputs of a neuron will put a *bend* in the data, moving us away from simple lines.

Applying a ReLU activation to a linear unit means the output becomes `max(0, w * x + b)`, which we might draw in a diagram like:

<figure style="padding: 1em;">
<img src="https://i.imgur.com/wEgBYMj.png" width="250" alt="Diagram of a single ReLU. Like a linear unit, but instead of a '+' symbol we now have a hinge '_/'. ">
<figcaption style="textalign: center; font-style: italic"><center>A rectified linear unit.
</center></figcaption>
</figure>
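As a small illustration of the formula `max(0, w * x + b)`, the sketch below evaluates a single rectified linear unit in NumPy. The helper name `relu_unit` and the weight values are assumptions made for this example, not part of Keras.

```python
import numpy as np

def relu_unit(x, w, b):
    """Output of one rectified linear unit: max(0, w * x + b)."""
    return np.maximum(0.0, w * x + b)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(relu_unit(x, w=1.0, b=0.0))   # negative inputs are clipped to zero
print(relu_unit(x, w=2.0, b=-1.0))  # the bend moves to x = 0.5
```

Changing `w` and `b` changes the slope and the location of the bend, which is what the network adjusts during training.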

## Adding an Activation Function ##

You can include an activation function in its own layer.
```
layers.Dense(units=3),
layers.Activation('relu')
```

Or you can add it as part of the definition of another layer.
```
layers.Dense(units=3, activation='relu')
```

These two ways of adding activations are completely equivalent.

# Stacking Dense Layers #

Now that we have some nonlinearity, let's see how we can stack layers to get those complex data transformations.

As we've mentioned, what defines a dense layer is a full set of connections to its inputs. So when we stack on a dense layer, it will "fully connect" all of its neurons to whatever came before -- it could be input data or it could be other neurons.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/ngr93Oa.png" width="450" alt="An input layer, two hidden layers, and an output.">
<figcaption style="textalign: center; font-style: italic"><center>A stack of dense layers makes a "fully-connected" network.
</center></figcaption>
</figure>

The layers before the output layer are sometimes called **hidden** since we never see their outputs directly. And though we haven't shown them in this diagram, each of these neurons would also be receiving a bias (one bias for each neuron).

<blockquote style="margin-right:auto; margin-left:auto; background-color: #ebf9ff; padding: 1em; margin:24px;">
    <strong>Are the Inputs a Layer?</strong><br>
It's sometimes convenient to think of the input features themselves as being a layer (as in the diagrams). Keras actually has an input layer <code>layers.InputLayer</code>, which would replace the initial <code>input_shape</code> argument in the first layer. Most of the time, however, the "first layer" will refer to whatever comes after the inputs (like a <code>Dense</code> layer). Ultimately, whether or not to regard the input features as a layer is just a matter of convenience.
</blockquote>

## Building Sequential Models ##

The `Sequential` model we've been using will connect together a list of layers in order from first to last: the first layer gets the input, the last layer produces the output. This creates the model in the figure above:

```
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # the hidden ReLU layers
    layers.Dense(units=4, activation='relu', input_shape=[2]),
    layers.Dense(units=3, activation='relu'),
    # the linear output layer
    layers.Dense(units=1),
])
```

Be sure to pass all the layers together in a list, like `[layer, layer, layer, ...]`, instead of as separate arguments.
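To see what this stack computes, here is a rough NumPy sketch of the same 2 → 4 → 3 → 1 architecture: two dense layers followed by ReLU, then a linear output layer. The random weights and the `forward` helper are assumptions made for illustration. A real Keras model would initialize and train its own weights.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [2, 4, 3, 1]  # inputs, two hidden layers, output

# Hypothetical random weights; a trained model would have learned values.
params = [(rng.normal(size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, params):
    *hidden, (W_out, b_out) = params
    for W, b in hidden:
        x = np.maximum(0.0, x @ W + b)  # dense layer followed by ReLU
    return x @ W_out + b_out            # linear output layer

batch = rng.normal(size=(5, 2))         # five examples, two features each
print(forward(batch, params).shape)     # (5, 1)

n_params = sum(W.size + b.size for W, b in params)
print(n_params)                         # 31
```

The count of 31 comes from (2·4 + 4) + (4·3 + 3) + (3·1 + 1): each dense layer has one weight per input-unit pair and one bias per unit.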

# Learning Nonlinearities #

We often think of a layer (or even groups of layers) as having a certain purpose in the network or as performing a certain task on the data that flows through it. As we saw in the first lesson, the job of the final (linear) layer is to perform a regression task -- its job is to fit a line (or plane or hyperplane) to whatever comes into it.

What is the job of the hidden layers? You could think of the hidden layers as performing *feature engineering*. Their role in the regression problem is to transform the original input data into a shape that makes it easy for the final layer to fit the curve.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/OLSUEYT.png" width="400" alt=" ">
<figcaption style="textalign: center; font-style: italic"><center>The hidden layers can learn to use the ReLU function to put "bends" in the data.<br>This allows the network to fit curves, like this parabola.
</center></figcaption>
</figure>

In the figure above, you could think about the input data as being points on a line, the x-axis. The hidden layers take this line and bend and transform it in high-dimensional space and then hand it off to the last layer which fits it to the data points.

So while a single neuron with no activation can learn linear relationships, it takes a hidden layer with an activation function to learn a nonlinear relationship, like the quadratic above.
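The sketch below shows the idea with numbers: one hidden layer of four ReLU units feeding a linear output approximates the parabola `x**2` on the interval [-2, 2]. The weights here are picked by hand for illustration, not learned, and the approximation holds only on that interval.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Hand-picked (not learned) weights for one hidden layer of four ReLU units.
w_hidden = np.array([1.0, 1.0, 1.0, 1.0])
b_hidden = np.array([2.0, 1.0, 0.0, -1.0])  # bends at x = -2, -1, 0, 1
w_out = np.array([-3.0, 2.0, 2.0, 2.0])     # linear output layer weights
b_out = 4.0

def network(x):
    hidden = relu(np.outer(x, w_hidden) + b_hidden)  # shape (len(x), 4)
    return hidden @ w_out + b_out

x = np.linspace(-2.0, 2.0, 9)
print(network(x))                          # piecewise-linear outputs
print(np.max(np.abs(network(x) - x**2)))   # worst-case gap on this grid
```

Each hidden unit contributes one bend, and the output layer only has to combine them linearly. With more units the bends can be placed closer together, so the piecewise-linear curve can follow the parabola more tightly.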

# Conclusion #

**TODO**