<mark>
- [ ] Introduction
- [ ] Exercise 1
  - [x] Code
  - [ ] Discussion
  - [ ] Checking
- [ ] Exercise 2
  - [x] Code
  - [ ] Discussion
  - [ ] Checking
- [ ] Exercise 3
  - [x] Code
  - [ ] Discussion
  - [ ] Checking
- [ ] Conclusion
</mark>

# Introduction #


In [None]:
# Setup plotting
import matplotlib.pyplot as plt

# On Matplotlib >= 3.6 this style is named 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
# Set Matplotlib defaults
plt.rc('figure', autolayout=True)
plt.rc('axes', labelweight='bold', labelsize='large',
       titleweight='bold', titlesize=18, titlepad=10)

# Setup feedback system
from learntools.core import binder
binder.bind(globals())
from learntools.deep_learning_new.ex3 import *

# Preparing Data for a Neural Network #

The data we'll use in this course will be *structured* data, or more specifically, *tabular* data, the kind you'd find in CSV files and Pandas DataFrames. Neural nets usually won't be able to work with the raw data, so you'll need to prepare the data before using it for training.

Neural nets need numeric inputs and produce numeric outputs, and they generally perform best when all the features are on a common scale near 0. This means you'll need to encode any non-numeric features and scale any numeric features. For numeric features, [standardization](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) and [min-max scaling](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler) to $[0, 1]$ can both be good choices. For categorical features with a moderate number of categories, [one-hot encoding](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html) is a good choice. The [preprocessing module](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing) in scikit-learn has almost everything you might need to prepare tabular data for neural networks.
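
As a rough illustration of the three transformations named above, here is a minimal NumPy sketch on made-up columns. The names `engine_size` and `fuel_type` and their values are hypothetical and not taken from the course data. The scikit-learn classes linked above do the same arithmetic, with the added benefit of remembering the statistics fitted on the training set.

```python
import numpy as np

# Hypothetical toy columns (not from the course data)
engine_size = np.array([1.0, 2.0, 3.0, 4.0])
fuel_type = np.array(["gas", "diesel", "gas", "electric"])

# Standardization: subtract the mean, divide by the (population) standard deviation
standardized = (engine_size - engine_size.mean()) / engine_size.std()

# Min-max scaling: map the smallest value to 0 and the largest to 1
min_max = (engine_size - engine_size.min()) / (engine_size.max() - engine_size.min())

# One-hot encoding: one 0/1 column per category
categories = np.unique(fuel_type)  # sorted unique categories
one_hot = (fuel_type[:, None] == categories).astype(float)

print(standardized)
print(min_max)
print(categories)
print(one_hot)
```

Each row of `one_hot` contains a single 1 in the column of its category, so a feature with `k` categories becomes `k` numeric columns.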

<mark><strong>TODO - add resources on Kaggle</strong>
[Data Cleaning](https://www.kaggle.com/learn/data-cleaning)
[Intermediate Machine Learning](https://www.kaggle.com/learn/intermediate-machine-learning)
</mark>

# 1) Preparing the Fuel Economy Dataset

In the *Fuel Economy* dataset, your task is to predict the fuel economy of an automobile given features like its type of engine or the year it was made.

First let's load the *Fuel Economy* dataset. Our target is the `FE` column.

In [None]:
import pandas as pd

fuel = pd.read_csv('../input/dl-course-data/fuel.csv')
display(fuel.head())
fuel.info()  # prints its summary directly and returns None

The features with `object` type are categorical; we will one-hot encode these. We'll standardize the numeric features. It's not as essential that the target be transformed, though doing so can significantly speed up training. Here we take the logarithm of the target rather than standardizing it.

In [None]:
import numpy as np
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import make_column_transformer, make_column_selector
from sklearn.model_selection import train_test_split

X = fuel.copy()
# Remove target
y = X.pop('FE')

preprocessor = make_column_transformer(
    (StandardScaler(),
     make_column_selector(dtype_include=np.number)),
    # Note: `sparse` was renamed to `sparse_output` in scikit-learn 1.2
    (OneHotEncoder(sparse=False),
     make_column_selector(dtype_include=object)),
)

# Split before applying any data-dependent transformations
X_train, X_valid, y_train, y_valid = \
    train_test_split(X, y, train_size=0.75)

X_train = preprocessor.fit_transform(X_train)
X_valid = preprocessor.transform(X_valid)
y_train = np.log(y_train) # log transform target instead of standardizing
y_valid = np.log(y_valid)

And now our data is ready for the network! Run the next cell to get credit for this part.

In [None]:
# Run this cell for credit!
q_1.check()
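
Because the target was log-transformed, anything the network predicts will be in log units. The sketch below, using made-up fuel-economy values rather than rows from the dataset, shows how `np.exp` undoes `np.log` so results can be reported in the original units.

```python
import numpy as np

# Hypothetical fuel-economy values (not taken from the dataset)
y = np.array([20.0, 35.0, 50.0])

y_log = np.log(y)       # the scale the network is trained on
y_back = np.exp(y_log)  # undo the transform to get back the original units

# An absolute error of d in log units is a multiplicative error of exp(d)
# in the original units.
relative_error = np.exp(0.05) - 1

print(y_log)
print(y_back)
print(relative_error)
```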

# 2) Input Shape

What should be the value of `input_shape` in the first layer of the network? (Consider looking at the `shape` attribute of the appropriate dataset.)
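
As a hint at the mechanics only, here is a small sketch on a made-up array. The name `X_toy` and its size are hypothetical. The `shape` of a 2-D array is `(rows, columns)`, and the network's input shape is built from the number of columns, written as a list.

```python
import numpy as np

# Hypothetical feature matrix: 100 rows, 7 features
X_toy = np.zeros((100, 7))

n_rows, n_features = X_toy.shape
toy_input_shape = [n_features]  # a list, not a bare integer

print(X_toy.shape)
print(toy_input_shape)
```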

In [None]:
# YOUR CODE HERE
input_shape = ____

# Check your answer
q_2.check()

In [None]:
#%%RM_IF(PROD)%%
input_shape = [13] # and check 14
q_2.assert_check_failed()

In [None]:
#%%RM_IF(PROD)%%
input_shape = 50
q_2.assert_check_failed()

In [None]:
#%%RM_IF(PROD)%%
input_shape = [50]
q_2.assert_check_passed()

In [None]:
# Lines below will give you a hint or solution code
#_COMMENT_IF(PROD)_
q_2.hint()
#_COMMENT_IF(PROD)_
q_2.solution()

With the input shape worked out, we can define the network.

# 3) Define Neural Network Model

Define a regression model with three hidden dense layers, each having 64 units and a ReLU activation. (Be sure to include the output layer, too!)
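
To get a feel for the size of such a network, the following sketch counts trainable parameters for a stack of dense layers. It assumes 50 input features, which is an assumption for illustration; substitute the value you found for `input_shape`. The helper `dense_param_count` is hypothetical and not part of Keras.

```python
def dense_param_count(n_inputs, n_units):
    """Parameters in a dense layer: one weight per input-unit pair, plus one bias per unit."""
    return n_inputs * n_units + n_units

# Assumed sizes: 50 input features, three hidden layers of 64 units, one output unit
layer_sizes = [50, 64, 64, 64, 1]

counts = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts)  # parameters per layer
print(total)   # total trainable parameters
```

Once your model is defined, `model.summary()` should report per-layer counts you can compare against this arithmetic.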

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

# YOUR CODE HERE
model = ____

# Check your answer
q_3.check()

In [None]:
#%%RM_IF(PROD)%%
from tensorflow import keras
from tensorflow.keras import layers

# Missing the output layer
model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=input_shape),
    layers.Dense(64, activation='relu'),
    layers.Dense(64, activation='relu'),
])
q_3.assert_check_failed()

In [None]:
#%%RM_IF(PROD)%%
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=input_shape),
    layers.Dense(64, activation='relu'),    
    layers.Dense(64, activation='relu'),
    layers.Dense(1),
])
q_3.assert_check_passed()

In [None]:
# Lines below will give you a hint or solution code
#_COMMENT_IF(PROD)_
q_3.hint()
#_COMMENT_IF(PROD)_
q_3.solution()

# 4) Add Loss and Optimizer

Now, using the `compile` method, add the Adam optimizer and MAE loss.
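
For reference, MAE (mean absolute error) is the average of the absolute differences between targets and predictions. The sketch below computes it by hand on made-up numbers, next to MSE for comparison. The values are illustrative only.

```python
import numpy as np

# Hypothetical targets and predictions
y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.0, 4.0])

errors = y_true - y_pred

mae = np.mean(np.abs(errors))  # average size of the errors
mse = np.mean(errors ** 2)     # squares the errors, so large misses weigh more

print(mae)
print(mse)
```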

In [None]:
# YOUR CODE HERE
____

# Check your answer
q_4.check()

In [None]:
#%%RM_IF(PROD)%%
# missing loss or optimizer
model.compile(
    loss='mae'
)
q_4.assert_check_failed()

In [None]:
#%%RM_IF(PROD)%%
# wrong loss or optimizer
model.compile(
    optimizer='sgd',
    loss='mse'
)
q_4.assert_check_failed()

In [None]:
#%%RM_IF(PROD)%%
model.compile(
    optimizer='adam',
    loss='mae'
)
q_4.assert_check_passed()

In [None]:
# Lines below will give you a hint or solution code
#_COMMENT_IF(PROD)_
q_4.hint()
#_COMMENT_IF(PROD)_
q_4.solution()

# 5) Train Model

Now train the network for 100 epochs with a batch size of 128. The input data is `X_train` with target `y_train`; the validation data is `X_valid` and `y_valid`.
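
The batch size and the number of epochs together determine how many weight updates the optimizer makes. Here is a small sketch of that arithmetic, assuming a hypothetical training set of 1,000 rows (the real size depends on the split above). The helper `gradient_steps` is made up for illustration.

```python
import math

def gradient_steps(n_samples, batch_size, epochs):
    """Weight updates in a run: batches per epoch (last partial batch included) times epochs."""
    return math.ceil(n_samples / batch_size) * epochs

n_samples = 1000  # hypothetical number of training rows

for batch_size in (8, 128, 1024):
    print(batch_size, gradient_steps(n_samples, batch_size, epochs=100))
```

With a fixed number of epochs, smaller batches mean many more updates, which is worth keeping in mind when comparing loss curves across batch sizes.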

In [None]:
# YOUR CODE HERE
history = ____

# Check your answer
q_5.check()

In [None]:
#%%RM_IF(PROD)%%
# Wrong arguments
history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=8,
    epochs=4,
)
q_5.assert_check_failed()

In [None]:
#%%RM_IF(PROD)%%
# Missing validation data
history = model.fit(
    X_train, y_train,
    batch_size=128,
    epochs=100,
)
q_5.assert_check_failed()

In [None]:
#%%RM_IF(PROD)%%
history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=128,
    epochs=100,
)
q_5.assert_check_passed()

In [None]:
# Lines below will give you a hint or solution code
#_COMMENT_IF(PROD)_
q_5.hint()
#_COMMENT_IF(PROD)_
q_5.solution()

# 6) Evaluate Training

Finally, run the cell below to get a plot of the learning curves.

In [None]:
import pandas as pd

history_df = pd.DataFrame(history.history)
# Start the plot at epoch 10. You can change this to get a different view.
history_df.loc[10:, ['loss', 'val_loss']].plot()

If you trained the model longer, would you expect the loss to decrease further?

In [None]:
# View the solution (Run this cell to receive credit!)
q_6.solution()

# Learning Rate and Batch Size #

Let's see how the learning rate and batch size affect how the training proceeds.

# 7) Observe Changes in the Loss Curve

Change the values for `learning_rate` and `batch_size` and then run the cell. Pay attention to how the loss curve changes. Try the following combinations, or try some of your own:

| `learning_rate` | `batch_size` |
|-----------------|--------------|
| 0.01            | 128          |
| 0.0001          | 128          |
| 1.0             | 128          |
| 0.01            | 8            |
| 0.01            | 1024         |

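If you'd like to see the same combinations on a problem small enough to reason about, here is a toy sketch. It is not the network trained in this exercise. It fits a single weight `w` to synthetic data `y = 3 * x` with mini-batch SGD on the MAE loss. The function name `sgd_mae_losses`, the data, and the seed are all assumptions made for illustration.

```python
import numpy as np

def sgd_mae_losses(learning_rate, batch_size, epochs=100, seed=0):
    """Fit pred = w * x to y = 3 * x with mini-batch SGD on the MAE loss.

    Returns the full-data MAE recorded after every gradient step.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=1024)  # synthetic inputs
    y = 3.0 * x                # noise-free targets; the ideal weight is 3
    w = 0.0
    losses = []
    for _ in range(epochs):
        order = rng.permutation(len(x))  # reshuffle every epoch
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            residual = w * x[idx] - y[idx]
            grad = np.mean(np.sign(residual) * x[idx])  # d(MAE)/dw on this batch
            w -= learning_rate * grad
            losses.append(np.mean(np.abs(w * x - y)))
    return np.array(losses)

for lr, bs in [(0.01, 128), (0.0001, 128), (1.0, 128), (0.01, 8), (0.01, 1024)]:
    tail = sgd_mae_losses(lr, bs)[-50:].mean()
    print(f"learning_rate={lr:<7} batch_size={bs:<5} mean loss over last 50 steps: {tail:.4f}")
```

Plotting the returned arrays instead of printing their tails gives loss curves you can compare with the ones from the real network below.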

In [None]:
# YOUR CODE HERE: Experiment with different values for the learning rate and batch size
learning_rate = 0.01
batch_size = 2048


#-------------------------------------------------------------------------------#
bias_init = keras.initializers.Constant(y_train.median()) # you can ignore!
model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=input_shape),
    layers.Dense(64, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, bias_initializer=bias_init),
])

optimizer = keras.optimizers.SGD(learning_rate=learning_rate)
model.compile(
    optimizer=optimizer,
    loss='mae'
)
history = model.fit(
    X_train, y_train,
    batch_size=batch_size,
    epochs=100,
    verbose=0, # turn off output
)

history_df = pd.DataFrame(history.history)
history_df.loc[0:, 'loss'].plot()
plt.show();

What effect did changing the learning rate have? What effect did changing the batch size have?

In [None]:
# View the solution (Run this cell to receive credit!)
q_7.solution()