# Introduction #

# Understand Dropout #
### Load Data

Load the *Spotify* dataset.

In [None]:
import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import make_column_transformer
from sklearn.model_selection import GroupShuffleSplit

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import callbacks

spotify = pd.read_csv('../input/dl-course-data/spotify.csv')

X = spotify.copy().dropna()
y = X.pop('track_popularity')
artists = X['track_artist']

features_num = ['danceability', 'energy', 'key', 'loudness', 'mode',
                'speechiness', 'acousticness', 'instrumentalness',
                'liveness', 'valence', 'tempo', 'duration_ms']
features_cat = ['playlist_genre']

preprocessor = make_column_transformer(
    (StandardScaler(), features_num),
    (OneHotEncoder(), features_cat),
)

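# Split by artist so that every song by a given artist falls on one side
# of the split, and no artist appears in both training and validation data.
def group_split(X, y, group, train_size=0.75):
    splitter = GroupShuffleSplit(train_size=train_size)
    train, test = next(splitter.split(X, y, groups=group))
    return (X.iloc[train], X.iloc[test], y.iloc[train], y.iloc[test])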

X_train, X_valid, y_train, y_valid = group_split(X, y, artists)

X_train = preprocessor.fit_transform(X_train)
X_valid = preprocessor.transform(X_valid)
# Popularity is on a 0-100 scale, so this rescales the target to 0-1
y_train = y_train / 100
y_valid = y_valid / 100

input_shape = X_train.shape[1]
print("Input shape: [{}]".format(input_shape))

### 1a) Add dropout

Let's recall the model we ended with in the exercises from Lesson 4.

```
model = keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=[input_shape]),
    layers.Dense(128, activation='relu'),    
    layers.Dense(128, activation='relu'),
    layers.Dense(1)
])
```
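
One way to complete this step, shown as a sketch rather than the reference solution, is to put a `Dropout` layer after each hidden `Dense` layer. The rate of 0.3 and the input width of 18 are assumptions for illustration; in the notebook, use the width computed from `X_train`. The sketch declares the input with `keras.Input` instead of the `input_shape` argument.

```python
from tensorflow import keras
from tensorflow.keras import layers

n_features = 18     # placeholder width (assumption); use X_train.shape[1] in the notebook
dropout_rate = 0.3  # assumed starting value, not prescribed by the exercise

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(128, activation='relu'),
    layers.Dropout(dropout_rate),   # randomly zeroes 30% of the previous layer's outputs
    layers.Dense(128, activation='relu'),
    layers.Dropout(dropout_rate),
    layers.Dense(128, activation='relu'),
    layers.Dropout(dropout_rate),
    layers.Dense(1),                # no dropout after the output layer
])
```

Dropout is only active during training; at prediction time the layers pass their inputs through unchanged.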

### 1b) Understand the dropout parameter

Try different values of the dropout `rate` parameter and see how they affect training and validation loss.
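
To make the parameter concrete, here is a minimal pure-Python sketch of what a dropout layer does to its inputs during training. It is an illustration, not Keras code: the `dropout` helper is hypothetical, and the all-ones activations are made up so the effect is easy to read.

```python
import random

def dropout(activations, rate, rng):
    """Inverted dropout in training mode: zero each unit with probability
    `rate` and scale the surviving units by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - rate)
    return [a * keep_scale if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(0)
activations = [1.0] * 10_000  # made-up layer outputs
for rate in (0.1, 0.3, 0.5):
    out = dropout(activations, rate, rng)
    dropped = out.count(0.0) / len(out)
    mean = sum(out) / len(out)
    print(f"rate={rate}: dropped {dropped:.2f} of units, mean activation {mean:.2f}")
```

The fraction of dropped units tracks `rate`, while the rescaling keeps the average activation roughly unchanged, so the next layer sees inputs on a similar scale with or without dropout.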

# Understand Batch Normalization #
### Load Data
Let's see how batch normalization can fix problems in training.

Load the *Concrete* dataset. We won't do any standardization this time so we can see the effect of the batchnorm layers.

Also we'll use the ordinary SGD optimizer, which is more sensitive to differences in scale. (Adam can train this, but batchnorm still improves the outcome.)

### 2a) Train without Batch Normalization

Depending on how the weights were initialized, the training will usually fail completely (loss diverges to inf/nan); when it works at all, it tends to converge at a very large loss.

Unstandardized inputs combined with plain SGD make the training very unstable.
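
The sensitivity to scale can be reproduced on a toy problem. The sketch below is an assumption-laden illustration, not the Concrete experiment: it uses one made-up feature, a hand-written full-batch gradient descent, and a hypothetical `fit_linear` helper, and compares the final loss on raw versus standardized inputs at the same learning rate.

```python
def standardize(values):
    """Rescale a feature to mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

def fit_linear(xs, ys, learning_rate=0.1, steps=30):
    """Fit y ~ w*x + b by gradient descent on MSE and return the final loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    final_errors = [w * x + b - y for x, y in zip(xs, ys)]
    return sum(e * e for e in final_errors) / n

# Made-up data: a feature in the hundreds, a target in single digits
xs = [100.0, 200.0, 300.0, 400.0]
ys = [1.0, 2.0, 3.0, 4.0]

raw_loss = fit_linear(xs, ys)
scaled_loss = fit_linear(standardize(xs), ys)
print("raw inputs:         ", raw_loss)
print("standardized inputs:", scaled_loss)
```

With the raw feature, each update overshoots and the loss grows without bound; with the standardized feature, the same learning rate converges.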

### 2b) Train with Batch Normalization

Now try again with batch normalization. Add a batchnorm layer before each `Dense` layer.
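
As a rough picture of what each batchnorm layer computes in training mode, the following pure-Python sketch standardizes every feature column of a batch and then applies a scale and shift. It is a simplification under stated assumptions: the `batch_norm` helper is hypothetical, `gamma` and `beta` are fixed scalars here (in a real layer they are learned per feature), the running statistics used at inference time are omitted, and the rows are made up.

```python
def batch_norm(batch, gamma=1.0, beta=0.0, epsilon=1e-3):
    """Training-mode batch normalization for a batch given as a list of rows.

    Each column is standardized with the batch mean and variance, then
    multiplied by gamma and shifted by beta.
    """
    n_rows = len(batch)
    normalized_columns = []
    for column in zip(*batch):
        mean = sum(column) / n_rows
        variance = sum((v - mean) ** 2 for v in column) / n_rows
        scale = gamma / (variance + epsilon) ** 0.5
        normalized_columns.append([scale * (v - mean) + beta for v in column])
    # Transpose back from columns to rows
    return [list(row) for row in zip(*normalized_columns)]

# Made-up rows with two features on very different scales
batch = [[540.0, 0.3], [332.5, 0.5], [198.6, 0.7], [266.0, 0.9]]
for row in batch_norm(batch):
    print([round(v, 3) for v in row])
```

After normalization both columns are centered on zero with a comparable spread, which is why placing the layer before each `Dense` layer can stand in for the standardization skipped during data loading.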

# Develop a Regression Model #
### Load Data
In this set of exercises, we'll walk through a complete model development process. We'll start from a simple baseline and work towards a finished model.

Load the *California Housing* dataset.

In [None]:
housing = pd.read_csv('../input/dl-course-data/housing.csv')

### 3a) Linear Baseline

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(1, input_shape=[8]),
])

### 3b) Add Hidden Layers

In [None]:
model = keras.Sequential([
    layers.Dense(512, activation='relu', input_shape=[11]),
    layers.Dense(512, activation='relu'),
    layers.Dense(512, activation='relu'),
    layers.Dense(1),
])

### 3c) Add Dropout

Add a dropout layer with a rate of 50% after each hidden layer.

In [None]:
model = keras.Sequential([
    layers.Dense(512, activation='relu', input_shape=[11]),
    layers.Dropout(0.5),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1),
])

Now run this to train the model.

In [None]:
model.compile(
    optimizer='adam',
    loss='mae',
    metrics=['mae', 'mse'],
)

history = model.fit(
    ds_train,
    validation_data=ds_valid,
    epochs=20,
)

### 3d) Add Batch Normalization

Add batch normalization after each hidden layer.

In [None]:
model = keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=[11]),
    layers.BatchNormalization(),
    layers.Dense(128, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(128, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(1),
])

Now run this to train the model.

In [None]:
model.compile(
    optimizer='adam',
    loss='mae',
)

history = model.fit(
    ds_train,
    validation_data=ds_valid,
    epochs=20,
)

Did training converge in fewer epochs than before?
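
One way to answer this is to look up the epoch with the lowest validation loss in each run. The helper below is a hypothetical sketch; it assumes the list comes from `history.history['val_loss']`, which Keras records when `validation_data` is passed to `fit`. The numbers in the demonstration call are made up.

```python
def epochs_to_best(val_losses):
    """Return the 1-based epoch at which the validation loss was lowest."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1

# In the notebook: epochs_to_best(history.history['val_loss'])
# Made-up values, only to show the call:
print(epochs_to_best([0.9, 0.6, 0.5, 0.55, 0.6]))
```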

### Discussion

You could also try early stopping, other optimizers, or other activation functions.
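
For instance, early stopping could be wired in roughly as follows. This is a sketch: the `min_delta` and `patience` values are assumptions to tune, and the `fit` call is left as a comment because it relies on the `model`, `ds_train`, and `ds_valid` objects from the cells above.

```python
from tensorflow.keras import callbacks

early_stopping = callbacks.EarlyStopping(
    monitor='val_loss',         # watch the validation loss
    min_delta=0.001,            # smallest change that counts as an improvement (assumed)
    patience=5,                 # epochs to wait without improvement (assumed)
    restore_best_weights=True,  # roll back to the best epoch when stopping
)

# history = model.fit(
#     ds_train,
#     validation_data=ds_valid,
#     epochs=100,
#     callbacks=[early_stopping],
# )
```

With the callback in place, `epochs` can be set generously, since training halts once the validation loss stops improving.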

# Conclusion #