# Introduction #

In this lesson we're going to expand on the one-neuron architecture we saw in Lesson 1. We're going to see how we can build neural networks capable of learning the complex kinds of relationships needed for advanced applications.

# Dense Layers #

We saw in Lesson 1 that we could create a one-input linear unit with a call like `layers.Dense(units=1)`. What if we wanted more inputs? This is easy enough. We just need to add more input connections.

<!-- FIGURE -->

In fact, there isn't even anything we'd need to change in our code at this point. Keras can figure out what input connections to create the first time you call the model on some data. (It's possible to specify it beforehand, too, with an `input_shape` argument, but we won't bother.)

But what if we wanted more *outputs*? Now we need to add more units. It's common to collect together neurons having a common input into layers. A **dense layer** is just a collection of these linear units.

<!-- FIGURE: 3 in, 2 out -->

Notice here that each unit has a full set of input connections. This is the defining feature of dense layers, and why you sometimes hear of networks with only dense layers as being **fully connected**.
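
To make "a full set of input connections" concrete, here is a minimal NumPy sketch of what a linear dense layer computes for the three-input, two-unit case. The weight and bias values are made up for illustration, and the `dense` helper is a hypothetical stand-in, not how Keras implements the layer internally.

```python
import numpy as np

def dense(x, W, b):
    """Linear dense layer: every unit gets a weight for every input."""
    return x @ W + b

# Hypothetical weights for 3 inputs -> 2 units (one column per unit).
W = np.array([[1.0,  0.0],
              [2.0, -1.0],
              [0.5,  3.0]])
b = np.array([0.1, -0.2])  # one bias per unit

x = np.array([[1.0, 2.0, 4.0]])  # a single sample with 3 features
y = dense(x, W, b)
print(y.shape)  # (1, 2)
```

Each column of `W` holds one unit's weights, so adding a unit means adding a column, while adding an input means adding a row.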

To make a linear model that produces two outputs, we would just change the `units` argument to 2.

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(units=2)
])

More generally, you could think of a layer in Keras as some kind of *data transformation*. To create the kinds of complicated transformations they need, neural networks will often create very deep stacks of layers.

It turns out, however, that a deep stack of dense layers with nothing in between is no more expressive than a single dense layer: composing linear transformations just gives another linear transformation. Dense layers can never move us out of the world of lines and planes. What we need is something *nonlinear*. What we need are activation functions.
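
The claim that stacked linear layers collapse into one can be checked numerically. The sketch below uses NumPy with random weights and arbitrary layer sizes (3 to 4 to 2), chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked linear layers with no activation in between: 3 -> 4 -> 2
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)

x = rng.normal(size=(5, 3))          # 5 samples, 3 features
stacked = (x @ W1 + b1) @ W2 + b2    # output of the two-layer stack

# The same map written as ONE linear layer: 3 -> 2
W = W1 @ W2
b = b1 @ W2 + b2
single = x @ W + b

print(np.allclose(stacked, single))  # True
```

Expanding `(x @ W1 + b1) @ W2 + b2` gives `x @ (W1 @ W2) + (b1 @ W2 + b2)`, which is exactly the form of a single dense layer.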

# The Activation Function #

An **activation function** is simply some function we apply to each of a layer's outputs. The most common is the **Rectified Linear Unit** or **ReLU** function.

<!-- graph of ReLU -->

You could think about the ReLU function as being a line with the negative part "rectified" to zero: `max(0, x)`. As you can see, applying this ReLU function to the outputs of a neuron will allow us to put a *bend* in the data, moving us away from simple lines. In the network diagram, it might look like this:

<!-- activation in network -->
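
As a rough sketch of the function itself, ReLU can be written in one line of NumPy. The `relu` helper below is for illustration only; in a model you would use the Keras activation.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative values become 0, the rest pass through."""
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x).tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
```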

You can include an activation function in its own layer.
```
layers.Dense(8),
layers.Activation('relu')
```

Or you can add it as part of the definition of another layer.
```
layers.Dense(8, activation='relu')
```

These two ways of adding activations are completely equivalent.

# Stacking Dense Layers #

Now that we have some nonlinearity, let's see how we can stack layers to get complex data transformations.

As we've mentioned, what defines a dense layer is a full set of connections to its inputs. So when we stack on a dense layer, it will "fully connect" all of its neurons to whatever came before -- it could be input data or it could be other neurons.

<!-- FIGURE -->

The layers before the output layer are sometimes called **hidden**, since we never see their outputs directly. All of the hidden layers together might be called the **body**, while the output is the **head**.

The `Sequential` model we've been using will connect together a list of layers sequentially from first to last: the first layer gets the input, the last layer produces the output. This will create the model in the figure above.

In [None]:
model = keras.Sequential([
    # The Body of hidden layers
    layers.Dense(units=2, activation='relu'),
    layers.Dense(units=3, activation='relu'),
    # The Head
    layers.Dense(units=1),
])

The body and head serve different functions in the network. You should think about the hidden layers as learning a data transformation that will make it easy for the final layer to produce the correct outputs. The hidden layers learn how to do feature engineering, while the final layer learns how to fit a curve to the outputs.
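
One way to picture the body and head is to write the forward pass out by hand. The NumPy sketch below mirrors the layer sizes of the `Sequential` model above (2 units, 3 units, then 1 output). The number of input features and the random weights are assumptions made for illustration; a trained network would have learned weights instead.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

rng = np.random.default_rng(0)
n_features = 2  # assumption: the real input size depends on your data

# Random stand-ins for learned weights; biases start at zero.
W1, b1 = rng.normal(size=(n_features, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(3)
W3, b3 = rng.normal(size=(3, 1)), np.zeros(1)

def forward(x):
    h1 = relu(x @ W1 + b1)   # body: hidden layer with 2 units
    h2 = relu(h1 @ W2 + b2)  # body: hidden layer with 3 units
    return h2 @ W3 + b3      # head: linear output, no activation

x = rng.normal(size=(4, n_features))  # 4 samples
print(forward(x).shape)  # (4, 1)
```

The head only ever sees `h2`, the features produced by the body, which is why the hidden layers can be thought of as doing the feature engineering.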

<figure style="padding: 1em;">
<img src="https://i.imgur.com/wQRI0FV.gif" width="800" alt="Fitting a quadratic curve with hidden layers and relu activation using SGD.">
<figcaption style="text-align: center; font-style: italic"><center>Learning nonlinearities with hidden layers. ReLU allows us to "bend" the curve.
</center></figcaption>
</figure>

# Example - Red Wine Quality #

Let's return to the *Red Wine Quality* dataset. In Lesson 1 we just used a single feature as a predictor; this time we'll use all eleven.

In [None]:
#$HIDE$
import pandas as pd
from IPython.display import display

red_wine = pd.read_csv('../input/dl-course-data/dl-course-data/red-wine.csv')

# Create training and validation splits
df_train = red_wine.sample(frac=0.7, random_state=0)
df_valid = red_wine.drop(df_train.index)

# Scale to [0, 1]
max_ = df_train.max(axis=0)
min_ = df_train.min(axis=0)
df_train = (df_train - min_) / (max_ - min_)
df_valid = (df_valid - min_) / (max_ - min_)

# Split features and target
X_train = df_train.drop('quality', axis=1)
X_valid = df_valid.drop('quality', axis=1)
y_train = df_train['quality']
y_valid = df_valid['quality']

For this example, we've chosen three hidden layers each having 1024 neural units. The number of hidden layers and how many units they have determine the *capacity* of the network, or "how much" it's able to learn. Choosing these is part of the model development process, which we'll continue to talk about over the course.
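
One rough proxy for capacity is the number of trainable parameters. As a back-of-the-envelope sketch, each dense layer has one weight per input-unit pair plus one bias per unit, so the count for eleven input features and the layer sizes above can be worked out in plain Python. The `dense_params` helper is hypothetical, written just for this calculation.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per unit."""
    return n_in * n_out + n_out

# 11 input features -> three hidden layers of 1024 units -> 1 output
sizes = [11, 1024, 1024, 1024, 1]
per_layer = [dense_params(a, b) for a, b in zip(sizes, sizes[1:])]

print(per_layer)       # [12288, 1049600, 1049600, 1025]
print(sum(per_layer))  # 2112513
```

Almost all of the roughly 2.1 million parameters sit in the two 1024-by-1024 connections between hidden layers, so widening layers grows capacity much faster than adding inputs does.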

We'll also include some validation data this time to check if the model is overfitting during training.

In [None]:
model = keras.Sequential([
    layers.Dense(1024, activation='relu'),
    layers.Dense(1024, activation='relu'),
    layers.Dense(1024, activation='relu'),
    layers.Dense(1),
])
model.compile(
    optimizer='SGD',
    loss='mse',
    metrics=['mse'],  # recorded in the history as 'mse' and 'val_mse'
)

history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=256,
    epochs=100,
)

import pandas as pd
history_df = pd.DataFrame(history.history)
history_df.loc[:, ['loss', 'val_loss']].plot()
history_df.loc[:, ['mse', 'val_mse']].plot();
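
Besides eyeballing the plots, you can read a hint of overfitting straight from the recorded losses. The sketch below uses a small dictionary of made-up numbers shaped like `history.history`; the values are purely illustrative and are not results from the wine model.

```python
# Made-up loss curves for illustration only (NOT output from the model above).
toy_history = {
    'loss':     [0.30, 0.20, 0.15, 0.12, 0.10, 0.09],
    'val_loss': [0.32, 0.24, 0.20, 0.19, 0.21, 0.23],
}

val = toy_history['val_loss']
# Epoch (0-indexed) where the validation loss was lowest.
best_epoch = min(range(len(val)), key=val.__getitem__)
# Gap between validation and training loss at the end of training.
final_gap = val[-1] - toy_history['loss'][-1]

print(best_epoch)      # 3
print(final_gap > 0)   # True
```

In these toy curves the training loss keeps falling while the validation loss bottoms out at epoch 3 and then rises, which is the typical signature of overfitting.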

# Conclusion #


# Your Turn #

Now [move on] to the Exercise, where you'll **TODO**.