<!-- TITLE: Making Networks Deep -->

# Introduction #

In this lesson we're going to expand on the one-neuron architecture we saw in Lesson 1. We're going to see how we can build neural networks capable of learning the complex kinds of relationships deep neural nets are famous for.

The key idea here is *modularity*, building up a complex network from simpler functional units. We've seen how a linear unit computes a linear function -- now we'll see how to combine and modify these single units to model more complex dynamics.

# Dense Layers #

<mark><b>TODO: currently there's a contradiction with the previous tutorial (where the user used a dense layer to define a single linear unit.  need to re-word the motivation from single linear unit to whole layers to not contradict with the idea that technically, a linear unit is a dense layer</b></mark>

Neural networks typically organize their neurons into **layers**, which you could think about as a collection of neurons having a common set of inputs.  When we collect linear units into a layer, we get a **dense** layer.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/EmHaEOK.png" width="300" alt="A stack of three circles in an input layer connected to two circles in a dense layer.">
<figcaption style="textalign: center; font-style: italic"><center>A dense layer of two linear units receiving two inputs and a bias.
</center></figcaption>
</figure>
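
As a rough sketch of what a dense layer computes, the NumPy snippet below treats the layer in the figure as a single matrix multiplication plus a bias vector. The helper name `dense` and all the numeric values are made up for illustration. The weight layout (one column per unit) is an assumption chosen to match how Keras stores a `Dense` kernel.

```python
import numpy as np

def dense(x, W, b):
    """Compute the outputs of a dense layer of linear units.

    x: inputs, shape (n_inputs,)
    W: weights, shape (n_inputs, n_units), one column per unit
    b: biases, shape (n_units,), one bias per unit
    """
    return x @ W + b

# Hypothetical values, chosen only for illustration.
x = np.array([1.0, 2.0])        # two inputs
W = np.array([[0.5, -1.0],
              [0.25, 2.0]])     # column j holds the weights of unit j
b = np.array([0.1, -0.2])       # one bias per unit

outputs = dense(x, W, b)        # one output per linear unit
print(outputs)
```

Every unit sees the same two inputs, but each has its own column of weights and its own bias, so the layer produces one output per unit.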

<mark><b>TODO: add color-coded figure</b></mark>

<blockquote style="margin-right:auto; margin-left:auto; background-color: #ebf9ff; padding: 1em; margin:24px;">
    <strong>In this course, we'll focus on dense layers.  But there are many kinds of layers!</strong><br>
A "layer" in Keras is a very general kind of thing. A layer can be, essentially, any kind of *data transformation*. Many layers, like the [convolutional](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D) and [recurrent](https://www.tensorflow.org/api_docs/python/tf/keras/layers/RNN) layers, transform data through use of neurons and differ primarily in the pattern of connections they form. Others, though, are used for [feature engineering](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding) or just [simple arithmetic](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Add). There's a whole world of layers to discover -- [check them out](https://www.tensorflow.org/api_docs/python/tf/keras/layers)!
</blockquote>


# The Activation Function #

<mark><b>TODO: missing: why would you want to stack layers?</b></mark>

It turns out that two dense layers with nothing in between are no better than a single dense layer by itself. Dense layers by themselves can never move us out of the world of lines and planes. What we need is something *nonlinear*. What we need are activation functions.
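
One way to check this claim numerically is sketched below in NumPy, with random weights standing in for learned ones. The layer sizes (2 inputs, 4 units, then 3 units) and all variable names are assumptions made for the example. The point is that the two layers can be folded into one layer with weights `W1 @ W2` and bias `b1 @ W2 + b2`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked dense layers with no activation: 2 inputs -> 4 units -> 3 units.
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 3)), rng.normal(size=3)

x = rng.normal(size=(5, 2))  # a batch of 5 examples with 2 features each

# Pass the batch through both layers in sequence.
stacked = (x @ W1 + b1) @ W2 + b2

# The same computation folded into one dense layer: 2 inputs -> 3 units.
W = W1 @ W2
b = b1 @ W2 + b2
single = x @ W + b

same = np.allclose(stacked, single)
print(same)  # True: the stack is equivalent to a single dense layer
```

Since the stacked layers and the single layer give the same outputs for any input, adding more purely linear layers adds parameters but no new kinds of functions.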

<mark><b>TODO: connection between "activation" and "activation function"</b></mark>


<figure style="padding: 1em;">
<img src="https://i.imgur.com/OLSUEYT.png" width="400" alt=" ">
<figcaption style="textalign: center; font-style: italic"><center>Without activation functions, neural networks can only learn linear relationships.  In order to fit curves, we'll need to use activation functions. 
</center></figcaption>
</figure>


<mark><b>TODO: what is meant by the "outputs" of a dense layer in the sentence below?</b></mark>


An **activation function** is simply some function we apply to each of a layer's outputs. The most common is the *rectifier* function $\max(0, x)$.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/KcGV019.png" width="400" alt="A graph of the rectifier function. The line y=x when x>0 and y=0 when x<0, making a 'hinge' shape like '_/'.">
<figcaption style="textalign: center; font-style: italic"><center>The rectifier function.
</center></figcaption>
</figure>

The rectifier function has a graph that's a line with the negative part "rectified" to zero. Applying the function to the outputs of a neuron will put a *bend* in the data, moving us away from simple lines.

When we attach the rectifier to a linear unit, we get a **rectified linear unit** or **ReLU**. (For this reason, it's common to call the rectifier function the "ReLU function".) Applying a ReLU activation to a linear unit means the output becomes `max(0, w * x + b)`, which we might draw in a diagram like:

<figure style="padding: 1em;">
<img src="https://i.imgur.com/wEgBYMj.png" width="250" alt="Diagram of a single ReLU. Like a linear unit, but instead of a '+' symbol we now have a hinge '_/'. ">
<figcaption style="textalign: center; font-style: italic"><center>A rectified linear unit.
</center></figcaption>
</figure>
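
As a minimal sketch, a single rectified linear unit can be written in a few lines of NumPy. The function names `relu` and `relu_unit` and the weight and bias values are hypothetical, picked only to show where the bend appears.

```python
import numpy as np

def relu(z):
    """The rectifier: max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

def relu_unit(x, w, b):
    """A linear unit followed by the rectifier: max(0, w * x + b)."""
    return relu(w * x + b)

# Hypothetical weight and bias, chosen only for illustration.
w, b = 2.0, -1.0

x = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])
y = relu_unit(x, w, b)
print(y)  # the linear part is [-3, -1, 0, 1, 3]; negatives are clipped to 0
```

With these values the output stays at zero until `x` reaches 0.5 and then rises along the line `2 * x - 1`, which is the hinge shape from the figure.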


# Stacking Dense Layers #

<mark><b>TODO: decided to remove idea of "fully connectedness"</b></mark>

<mark><b>TODO: more clarification re: how to combine multiple layers</b></mark>

Now that we have some nonlinearity, let's see how we can stack layers to get complex data transformations.

As we've mentioned, what defines a dense layer is a full set of connections to its inputs. So when we stack on a dense layer, it will "fully connect" all of its neurons to whatever came before -- it could be input data or it could be other neurons.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/ngr93Oa.png" width="450" alt="An input layer, two hidden layers, and an output.">
<figcaption style="textalign: center; font-style: italic"><center>A stack of dense layers makes a "fully-connected" network.
</center></figcaption>
</figure>

<mark><b>TODO: define what is meant by "output layer" (final layer in the network?)</b></mark>

<mark><b>TODO: good to add the bias</b></mark>

<mark><b>TODO: mention explicitly why neural network does not use relu in the final layer</b></mark>

The layers before the output layer are sometimes called **hidden** since we never see their outputs directly. And though we haven't shown them in this diagram, each of these neurons would also be receiving a bias (one bias for each neuron).

## Building Sequential Models ##

The `Sequential` model we've been using will connect together a list of layers in order from first to last: the first layer gets the input, the last layer produces the output. This creates the model in the figure above:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # the hidden ReLU layers
    layers.Dense(units=4, activation='relu', input_shape=[2]),
    layers.Dense(units=3, activation='relu'),
    # the linear output layer
    layers.Dense(units=1),
])
```

Be sure to pass all the layers together in a list, like `[layer, layer, layer, ...]`, instead of as separate arguments.
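
To make the data flow through this stack concrete, here is a rough NumPy sketch of the same 2 → 4 → 3 → 1 architecture. It is not how Keras implements the model. The random weights, the zero biases, and the helper names `relu` and `forward` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# (n_inputs, n_units) for each dense layer, mirroring the Keras model above.
layer_sizes = [(2, 4), (4, 3), (3, 1)]

# Hypothetical parameters: random weights and zero biases stand in for
# values that training would normally learn.
params = [(rng.normal(size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in layer_sizes]

def forward(x, params):
    """Run a batch through the hidden ReLU layers, then the linear output layer."""
    *hidden, (W_out, b_out) = params
    for W, b in hidden:
        x = relu(x @ W + b)       # hidden layers apply the rectifier
    return x @ W_out + b_out      # the output layer has no activation

x = rng.normal(size=(5, 2))       # a batch of 5 examples with 2 features each
y = forward(x, params)

# Each layer contributes n_inputs * n_units weights plus n_units biases.
n_params = sum(W.size + b.size for W, b in params)
print(y.shape, n_params)
```

Each layer's outputs become the next layer's inputs, which is the ordering that `Sequential` sets up from the list of layers. The parameter count here is (2·4 + 4) + (4·3 + 3) + (3·1 + 1) = 31.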

<mark><b>TODO: describe how relu is added to each layer in code</b></mark>


# Conclusion #

**TODO**