<mark>
- [ ] Introduction
- [ ] Exercise 1
  - [ ] Code
  - [ ] Discussion
  - [ ] Checking
- [ ] Exercise 2
  - [ ] Code
  - [ ] Discussion
  - [ ] Checking
- [ ] Exercise 3
  - [ ] Code
  - [ ] Discussion
  - [ ] Checking
- [ ] Conclusion
</mark>

# Introduction #

In these exercises we'll explore some ways of improving training outcomes.

First load the *Spotify* dataset. Your task will be to predict the popularity of a song based on various audio features.

# Load Data #

In [None]:
import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import make_column_transformer
from sklearn.model_selection import GroupShuffleSplit

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import callbacks

spotify = pd.read_csv('../input/dl-course-data/spotify.csv')

X = spotify.copy().dropna()
y = X.pop('track_popularity')
artists = X['track_artist']

features_num = ['danceability', 'energy', 'key', 'loudness', 'mode',
                'speechiness', 'acousticness', 'instrumentalness',
                'liveness', 'valence', 'tempo', 'duration_ms']
features_cat = ['playlist_genre']

preprocessor = make_column_transformer(
    (StandardScaler(), features_num),
    (OneHotEncoder(), features_cat),
)

def group_split(X, y, group, train_size=0.75):
    splitter = GroupShuffleSplit(train_size=train_size)
    train, test = next(splitter.split(X, y, groups=group))
    return (X.iloc[train], X.iloc[test], y.iloc[train], y.iloc[test])

X_train, X_valid, y_train, y_valid = group_split(X, y, artists)

X_train = preprocessor.fit_transform(X_train)
X_valid = preprocessor.transform(X_valid)
y_train = y_train / 100
y_valid = y_valid / 100

input_shape = [X_train.shape[1]]
print("Input shape: {}".format(input_shape))

# Capacity #

We'll start the model development with a linear model. It's not a bad idea to begin with a simple baseline like this. It makes it easier to see what effect changing the network architecture has, and also to make sure that the rest of your pipeline is working as you expect.

Run this next cell without any changes to train a linear model on the *Spotify* dataset.

In [None]:
model = keras.Sequential([
    layers.Dense(1, input_shape=input_shape),
])
model.compile(
    optimizer='adam',
    loss='mae',
)
history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=512,
    epochs=50,
)
history_df = pd.DataFrame(history.history)
history_df.loc[:, ['loss', 'val_loss']].plot()
print("Minimum Validation Loss: {:0.4f}".format(history_df['val_loss'].min()));

# 1) Add capacity

You suspect this model is underfitting the training data, so you want to add capacity. Add one hidden layer with 128 units and ReLU activation. (Remember that the `input_shape` argument should always be on the *first* layer.)

In [None]:
# YOUR CODE HERE
model = ____

# Check your answer
q_1.check()

In [None]:
#%%RM_IF(PROD)%%
model = keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=input_shape),
    layers.Dense(1)
])
q_1.assert_check_passed()

In [None]:
# Lines below will give you a hint or solution code
#_COMMENT_IF(PROD)_
q_1.hint()
#_COMMENT_IF(PROD)_
q_1.solution()

Once you've got the correct answer, run the cell below to train the model and see the learning curves.

In [None]:
model.compile(
    optimizer='adam',
    loss='mae',
)
history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=512,
    epochs=50,
)
history_df = pd.DataFrame(history.history)
history_df.loc[:, ['loss', 'val_loss']].plot()
print("Minimum Validation Loss: {:0.4f}".format(history_df['val_loss'].min()));
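
One rough way to put a number on "capacity" is to count trainable parameters. The sketch below is plain Python rather than Keras, and it assumes 18 input features purely for illustration; in the notebook the real value is `input_shape[0]`. The helper names `dense_params` and `count_params` are hypothetical.

```python
def dense_params(n_inputs, units):
    """Trainable parameters in one dense layer: a weight per input per unit, plus one bias per unit."""
    return n_inputs * units + units


def count_params(n_inputs, layer_units):
    """Total parameters for a stack of dense layers with the given unit counts."""
    total = 0
    for units in layer_units:
        total += dense_params(n_inputs, units)
        n_inputs = units  # the next layer takes this layer's outputs as inputs
    return total


n_features = 18  # assumed for illustration; use input_shape[0] in the notebook

linear = count_params(n_features, [1])                        # linear baseline
one_hidden = count_params(n_features, [128, 1])               # one hidden layer of 128 units
three_hidden = count_params(n_features, [128, 128, 128, 1])   # three hidden layers of 128 units

print(linear, one_hidden, three_hidden)
```

Under that assumption, a single hidden layer already has over a hundred times as many parameters as the linear model. Parameter count is only a proxy for capacity, but it gives a sense of scale when comparing architectures.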

# 2) Add more capacity

Could the model still be underfitting? Try adding two more hidden layers to the network. Your model should have three hidden layers of 128 units and ReLU activation.

In [None]:
model = keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=input_shape),
    layers.Dense(128, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(1)
])

What do you think about these learning curves?

In [None]:
model.compile(
    optimizer='adam',
    loss='mae',
)
history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=512,
    epochs=50,
)
history_df = pd.DataFrame(history.history)
history_df.loc[:, ['loss', 'val_loss']].plot()
print("Minimum Validation Loss: {:0.4f}".format(history_df['val_loss'].min()));
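
When reading learning curves it can help to pull out a few numbers rather than relying on the plot alone. The following is a minimal sketch that works on a dictionary of per-epoch losses shaped like `history.history`. The helper `summarize_curves` is hypothetical and the loss values are made up for illustration.

```python
def summarize_curves(history):
    """Summarize training and validation loss curves stored as lists, one value per epoch."""
    train = history['loss']
    val = history['val_loss']
    # Index of the epoch with the lowest validation loss
    best_epoch = min(range(len(val)), key=val.__getitem__)
    return {
        'best_epoch': best_epoch,
        'best_val_loss': val[best_epoch],
        # A large positive gap at the end suggests overfitting
        'final_gap': val[-1] - train[-1],
        # Epochs spent training after validation loss bottomed out
        'epochs_after_best': len(val) - 1 - best_epoch,
    }


# Made-up curves: training loss keeps falling while validation loss turns upward
toy_history = {
    'loss':     [0.30, 0.25, 0.22, 0.20, 0.18, 0.16],
    'val_loss': [0.31, 0.27, 0.25, 0.26, 0.27, 0.29],
}
summary = summarize_curves(toy_history)
print(summary)
```

In the notebook, the same function could be called as `summarize_curves(history.history)`. A validation loss that bottoms out early while the gap keeps widening is the usual sign of overfitting.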

# Early Stopping #

Let's take the last model and add an early stopping callback.

# 3) Define a callback

Define an early stopping callback that waits 5 epochs (`patience`) for a change in validation loss of at least `0.001` (`min_delta`).

In [None]:
from tensorflow.keras import callbacks

early_stopping = callbacks.EarlyStopping(patience=5, min_delta=0.001)
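
To build intuition for how `patience` and `min_delta` interact, here is a simplified plain-Python sketch of the stopping rule. It is an illustration under stated assumptions, not the Keras implementation: it only handles a loss that should decrease, and the function name `early_stop_epoch` and the loss values are hypothetical.

```python
def early_stop_epoch(val_losses, patience=5, min_delta=0.001):
    """Return the epoch index at which training would stop, or None if it never stops."""
    best = float('inf')
    wait = 0  # epochs since the last meaningful improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improved by more than min_delta: record it and reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses that plateau around 0.22
val_losses = [0.30, 0.25, 0.22, 0.2195, 0.2199, 0.2201, 0.2197, 0.2203, 0.2200, 0.2198]
stopped_at = early_stop_epoch(val_losses, patience=5, min_delta=0.001)
print(stopped_at)
```

Note that the change from 0.22 to 0.2195 does not count as an improvement here because it is smaller than `min_delta`. A larger `min_delta` or a smaller `patience` makes the rule stop sooner.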

# 4) Interpret results

Now train the model.

Did it solve the overfitting problem? Do you think the parameters of the callback could be improved?

In [None]:
# THOUGHT

# Learning Rate Schedules #

Let's make one final improvement to this model. We're going to define what's called a **learning rate schedule**. In Keras we can do this with a callback, just like with early stopping.

# 5) Define a learning rate schedule

You can often get lower loss by decreasing the learning rate during training. Let's define a learning rate scheduler and train a small two-layer network with it.

In [None]:
model = keras.Sequential([
    layers.Dense(8, activation='relu', input_shape=input_shape),
    layers.Dense(8, activation='relu'),
    layers.Dense(1)
])
model.compile(
    optimizer='adam',
    loss='mae'
)

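Now we can add the schedule as a callback. `ReduceLROnPlateau` lowers the learning rate when the monitored metric (by default `val_loss`) stops improving, so the call to `fit` needs validation data.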

In [None]:
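# By default this callback monitors `val_loss`
lr_schedule = keras.callbacks.ReduceLROnPlateau()

history = model.fit(
    X_train, y_train,
    # Validation data is required so that `val_loss` is available to the callback
    validation_data=(X_valid, y_valid),
    batch_size=64,
    epochs=30,
    callbacks=[lr_schedule],
)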

Did we see an improvement?
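
To see what a reduce-on-plateau schedule does to the learning rate, the sketch below simulates the rule in plain Python. It is a simplified illustration, not the Keras code: it ignores the cooldown option, and the function name `simulate_reduce_on_plateau` and the loss values are hypothetical. The Keras defaults are `factor=0.1` and `patience=10`; smaller values are passed in the example so the effect shows up within a few epochs.

```python
def simulate_reduce_on_plateau(val_losses, lr=0.001, factor=0.1,
                               patience=10, min_delta=1e-4, min_lr=0.0):
    """Return the learning rate used in each epoch under a reduce-on-plateau rule."""
    best = float('inf')
    wait = 0  # epochs since the last meaningful improvement
    lrs = []
    for loss in val_losses:
        lrs.append(lr)  # learning rate in effect during this epoch
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the learning rate, but not below min_lr
                lr = max(lr * factor, min_lr)
                wait = 0
    return lrs


# Made-up validation losses with two short plateaus
val_losses = [0.30, 0.25, 0.25, 0.25, 0.24, 0.24, 0.24]
lrs = simulate_reduce_on_plateau(val_losses, lr=0.001, factor=0.5, patience=2)
print(lrs)
```

In this toy run the learning rate is halved after the first plateau, which is followed by a small further drop in the made-up loss. Whether the real model benefits is something the training output has to show.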

# Conclusion #
