<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px; height: 163px">
</div>

## Intro to Neural Networks with Keras II

Congrats on building your first neural network! In this notebook, we will cover even more topics to improve your model building. After you learn the concepts here, you will apply them to the neural network you just created.

We will use the California Housing Dataset.

Objectives:
   * Data Normalization
   * Custom Metrics
   * Validation data
   * Checkpointing/callbacks
   * Saving Models

In [3]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
import numpy as np
np.random.seed(0)

cal_housing = fetch_california_housing()

# split 80/20 train-test
X_train, X_test, y_train, y_test = train_test_split(cal_housing.data,
                                                        cal_housing.target,
                                                        test_size=0.2,
                                                        random_state=1)

print(cal_housing.DESCR)

Let's take a look at the distribution of our features.

In [5]:
import pandas as pd

xTrainDF = pd.DataFrame(X_train, columns=cal_housing.feature_names)

print(xTrainDF.describe())


## 1. Data Normalization

Because our features are on very different scales, training is harder for our neural network: features with large values tend to dominate the weight updates. Let's normalize each feature independently.

We are going to use the [StandardScaler](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) from scikit-learn, which subtracts each feature's mean and scales it to unit variance. We fit the scaler on the training data only and reuse the same statistics to transform the test data.

$$x' = \frac{x - \bar{x}}{\sigma}$$

In [7]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
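
To make the formula concrete, here is a minimal NumPy sketch of the same computation. The toy arrays `train` and `test` are made up for illustration. The point is that the mean and standard deviation come from the training split only and are then reused on the test split.

```python
import numpy as np

# Hypothetical toy data: 4 training rows and 2 test rows, 2 features each
train = np.array([[1.0, 100.0],
                  [2.0, 200.0],
                  [3.0, 300.0],
                  [4.0, 400.0]])
test = np.array([[2.5, 250.0],
                 [5.0, 500.0]])

# Statistics are computed per feature (column) on the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics

print(train_scaled.mean(axis=0))  # approximately 0 for each feature
print(train_scaled.std(axis=0))   # approximately 1 for each feature
print(test_scaled)
```

Note that the scaled test data will generally not have exactly zero mean or unit variance, since its statistics were never used.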

## Keras Model
![Life Cycle](https://brookewenig.github.io/img/DL/Life-Cycle-for-Neural-Network-Models-in-Keras.png)

In [9]:
X_train.shape

In [10]:
import tensorflow as tf
tf.set_random_seed(42) # For reproducibility

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
  Dense(20, input_dim=8, activation='relu'),
  Dense(20, activation='relu'),
  Dense(1, activation='linear')
])

## 2. Custom Metrics

Up until this point, we used MSE as our loss function and metric of choice. But what if we want to use RMSE?

In [12]:
model.compile(optimizer="adam", loss="rmse")

Looks like we can't use it in our loss function. What about the metrics we print out during the evaluation?

In [14]:
model.compile(optimizer="adam", loss="mse", metrics=["rmse"])

Luckily, Keras allows you to define custom metrics. So, you might implement RMSE as below.

In [16]:
from keras import backend

def rmse(y_true, y_pred):
  # Square root of the mean squared error, averaged over the last axis
  return backend.sqrt(backend.mean(backend.square(y_pred - y_true), axis=-1))

In [17]:
model.compile(optimizer="adam", loss="mse", metrics=["mse", rmse])

## 3. Validation Data

Let's take a look at the [.fit()](https://keras.io/models/sequential/) method in the docs to see all of the options we have available! 

We can either explicitly specify a validation dataset, or we can specify a fraction of our training data to be used as our validation dataset.

The reason we need a validation dataset is to evaluate how well the model performs on unseen data (neural networks will overfit if you train them for too long!).

We can specify `validation_split` to be any value between 0.0 and 1.0 (defaults to 0.0).
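
As a rough sketch of what `validation_split=.2` does: according to the Keras documentation, the validation samples are taken from the end of the arrays you pass in, before any shuffling. The helper `split_last_fraction` below is a hypothetical illustration of that slicing on toy data, not Keras code.

```python
import numpy as np

def split_last_fraction(x, y, validation_split):
    """Hold out the last `validation_split` fraction of the rows (hypothetical helper)."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

x = np.arange(20).reshape(10, 2)  # 10 toy samples with 2 features each
y = np.arange(10)

(x_fit, y_fit), (x_val, y_val) = split_last_fraction(x, y, validation_split=0.2)
print(len(x_fit), len(x_val))
print(y_val)
```

Because the held-out rows are simply the last ones, data that is sorted in some meaningful way should be shuffled before relying on `validation_split`.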

In [19]:
history = model.fit(X_train, y_train, validation_split=.2, epochs=10, verbose=2)

Wow! Look at how much lower our loss is to start, and that it is able to converge more quickly thanks to the data normalization!!

But, let's test: Is that RMSE correct?

In [21]:
import numpy as np

np.sqrt(history.history['mean_squared_error'][-1]) # Square root of the MSE from the last training epoch

#### Gotcha!! 

Because Keras computes the loss and metrics batch by batch and then averages them, our RMSE metric is an average of per-batch square roots. Taking the square root of the overall MSE gives a different number, because the mean of square roots is not the square root of the mean.

You can see François Chollet's [comment](https://github.com/keras-team/keras/issues/1170) on this issue, recommending to stick with MSE. But for teaching purposes, now you see how to write custom metric functions!
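
A small NumPy sketch with made-up squared errors shows why the two numbers differ. It only imitates batch-wise averaging and does not call Keras.

```python
import numpy as np

# Hypothetical squared errors for six samples, split into two batches of three
squared_errors = np.array([0.0, 0.0, 0.0, 9.0, 9.0, 9.0])
batches = squared_errors.reshape(2, 3)

# RMSE computed per batch, then averaged across batches
per_batch_rmse = np.sqrt(batches.mean(axis=1))  # one value per batch: 0.0 and 3.0
mean_of_batch_rmse = per_batch_rmse.mean()      # 1.5

# Square root of the MSE over all samples
sqrt_of_total_mse = np.sqrt(squared_errors.mean())  # sqrt(4.5), about 2.12

print(mean_of_batch_rmse, sqrt_of_total_mse)
```

The averaged per-batch value is smaller here, and in general it can never exceed the square root of the overall MSE when batches are equally sized.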

## 4. Checkpointing

After each epoch, we want to save the model. However, we will pass in the flag `save_best_only=True`, which will only save the model if the validation loss decreased. This way, if our machine crashes or we start to overfit, we can always go back to the "good" state of the model.

To accomplish this, we will use the ModelCheckpoint [callback](https://keras.io/callbacks/). History is an example of a callback that is automatically applied to every Keras model.

In [24]:
from keras.callbacks import ModelCheckpoint

filepath = '/tmp/02Keras_checkpoint_weights.hdf5'
checkpointer = ModelCheckpoint(filepath=filepath, verbose=1, save_best_only=True)

history = model.fit(X_train, y_train, validation_split=.2, epochs=10, verbose=2, callbacks=[checkpointer])
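
The idea behind `save_best_only=True` can be sketched in plain Python. The validation losses below are invented, and appending to `saved_epochs` stands in for writing the model to disk. This is an illustration of the logic, not the callback's actual implementation.

```python
# Hypothetical validation losses, one per epoch
val_losses = [0.60, 0.45, 0.47, 0.40, 0.42]

best = float("inf")
saved_epochs = []  # stands in for writing the model to disk

for epoch, val_loss in enumerate(val_losses, start=1):
    if val_loss < best:
        best = val_loss
        saved_epochs.append(epoch)  # a real callback would save the model here

print(saved_epochs)  # only epochs that improved on the previous best
```

Epochs 3 and 5 are skipped because their validation loss is worse than the best seen so far, so the file on disk always holds the best state.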

## 5. Save Model/Load Model

Whenever you train neural networks, you want to save them. This way, you can reuse them later! With the checkpointing above, we were saving the model weights. Let's try to load them into a new model.

In [26]:
newModel = Sequential()

newModel.load_weights(filepath)

We just saved our model weights with the checkpointing above. However, we also need the model configuration if we want to load the weights into a new model object.

In [28]:
from keras.models import model_from_yaml

yaml_string = model.to_yaml() # Returns a representation of the model as a YAML string (only model architecture, not weights)
newModel = model_from_yaml(yaml_string)

Check that the model architecture is the same.

In [30]:
newModel.summary()

Now we can load in the weights for this model architecture.

In [32]:
newModel.load_weights(filepath)

Let's train it for one more epoch (we need to recompile), and then save those weights.

In [34]:
newModel.compile(optimizer="adam", loss="mse")
newModel.fit(X_train, y_train, validation_split=.2, epochs=1, verbose=2)
newModel.save_weights(filepath)

Now it's your turn to try out these techniques on the Boston Housing Dataset!

&copy; 2018 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>