In this notebook, we will incorporate batch normalization into our models and work through an example of how this is done in practice.

# 1) Import

In [None]:
import tensorflow as tf
print(tf.__version__)

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout

# 2) Load Data 

We will be working with the diabetes dataset. 

Let's load and pre-process the dataset.

In [None]:
# Load the dataset

from sklearn.datasets import load_diabetes

In [None]:
# Call load_diabetes, which returns a dictionary-like object containing the data and information about the dataset

diabetes_dataset = load_diabetes()

In [None]:
# print the keys

print(diabetes_dataset.keys())

In [None]:
# take a look at the dataset description

print(diabetes_dataset['DESCR'])

In [None]:
# Save the input and target variables

data = diabetes_dataset['data']
targets = diabetes_dataset['target']

In [None]:
print("X shape: ", data.shape)
print("y shape: ", targets.shape)

In [None]:
print("minimum y value: ", targets.min())
print("maximum y value: ", targets.max())

In [None]:
# Standardize the targets to zero mean and unit variance (this keeps the loss on a scale of roughly 1, which makes the training curves easier to read)

targets = (targets - targets.mean()) / (targets.std())
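
Standardizing gives the targets zero mean and unit standard deviation; it does not confine them to the interval [0, 1]. A minimal sketch on made-up values illustrates this (the array `toy_targets` is hypothetical and is not the diabetes data):

```python
import numpy as np

# Hypothetical target values, chosen only for illustration
toy_targets = np.array([25.0, 90.0, 150.0, 210.0, 346.0])

# Same transformation as applied to the real targets
standardized = (toy_targets - toy_targets.mean()) / toy_targets.std()

print("mean:", standardized.mean())  # very close to 0
print("std: ", standardized.std())   # very close to 1
print("min: ", standardized.min())   # negative, so values are not confined to [0, 1]
print("max: ", standardized.max())
```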

In [None]:
print("minimum y value: ", targets.min())
print("maximum y value: ", targets.max())

In [None]:
# Split the dataset into training and test datasets 

from sklearn.model_selection import train_test_split

train_data, test_data, train_targets, test_targets = train_test_split(data, targets, test_size=0.1)

# 3) Model without Batch Normalization

In [None]:
# Build the model

model = Sequential([
    Dense(64, input_shape=[train_data.shape[1],], activation="relu"),
    Dropout(0.5),
    Dense(256, activation='relu'),
    Dense(1)
])

In [None]:
# Print the model summary

model.summary()

In [None]:
# Compile the model

model.compile(optimizer='adam',
              loss='mse',
              metrics=['mae'])

In [None]:
# Train the model

history = model.fit(train_data, train_targets, epochs=100, validation_split=0.15, batch_size=64, verbose=2)

In [None]:
# Plot the learning curves

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

frame = pd.DataFrame(history.history)
epochs = np.arange(len(frame))

fig = plt.figure(figsize=(12,4))

# Loss plot
ax = fig.add_subplot(121)
ax.plot(epochs, frame['loss'], label="Train")
ax.plot(epochs, frame['val_loss'], label="Validation")
ax.set_xlabel("Epochs")
ax.set_ylabel("Loss")
ax.set_title("Loss vs Epochs")
ax.legend()

# Mean absolute error plot
ax = fig.add_subplot(122)
ax.plot(epochs, frame['mae'], label="Train")
ax.plot(epochs, frame['val_mae'], label="Validation")
ax.set_xlabel("Epochs")
ax.set_ylabel("Mean Absolute Error")
ax.set_title("Mean Absolute Error vs Epochs")
ax.legend()

# 4) Model with Batch Normalization

We can add batch normalization to our model in the same way as any other layer.

In [None]:
# Build the model
#model = Sequential([
#    Dense(64, input_shape=[train_data.shape[1],], activation="relu"),
#    Dropout(0.5),
#    Dense(256, activation='relu'),
#    Dense(1)
#])

model = Sequential([
    Dense(64, input_shape=[train_data.shape[1],], activation="relu"),
    BatchNormalization(),  # <- Batch normalization layer 1
    Dropout(0.5),
    Dense(256, activation='relu'),
    BatchNormalization(),  # <- Batch normalization layer 2
    Dense(1)
])

In [None]:
# Print the model summary

model.summary()
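
The summary for this model should report some non-trainable parameters, which the model without batch normalization does not have. The sketch below reproduces the expected counts by hand. It assumes the 10 input features of the diabetes dataset and that each batch normalization layer holds four vectors per unit: $\gamma$ and $\beta$ (trainable) plus the moving mean and moving variance (non-trainable). The helper names `dense_params` and `batchnorm_params` are hypothetical and exist only for this illustration.

```python
n_features = 10  # number of input features in the diabetes dataset

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def batchnorm_params(n_units):
    """Return (trainable, non_trainable) counts for one batch normalization layer."""
    trainable = 2 * n_units      # gamma and beta
    non_trainable = 2 * n_units  # moving mean and moving variance
    return trainable, non_trainable

bn1_trainable, bn1_fixed = batchnorm_params(64)
bn2_trainable, bn2_fixed = batchnorm_params(256)

trainable = (dense_params(n_features, 64) + bn1_trainable
             + dense_params(64, 256) + bn2_trainable
             + dense_params(256, 1))
non_trainable = bn1_fixed + bn2_fixed

print("trainable:    ", trainable)
print("non-trainable:", non_trainable)
print("total:        ", trainable + non_trainable)
```

Comparing these numbers with the output of `model.summary()` is a quick way to confirm that both batch normalization layers were added where intended.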

In [None]:
# Compile the model

model.compile(optimizer='adam',
              loss='mse',
              metrics=['mae'])

In [None]:
# Train the model

history = model.fit(train_data, train_targets, epochs=100, validation_split=0.15, batch_size=64, verbose=2)
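
Batch normalization computes its statistics per minibatch, so it helps to know how many samples each training step actually sees. The arithmetic below is a rough sketch. It assumes the 442 samples reported in the dataset description, that `train_test_split` rounds the test set size up, and that Keras rounds the training share left by `validation_split` down. The rounding rules are assumptions, so check `train_data.shape` and the training log for the actual numbers.

```python
import math

n_samples = 442          # size of the diabetes dataset
test_size = 0.1
validation_split = 0.15
batch_size = 64

# Assumption: the test set size is rounded up
n_test = math.ceil(n_samples * test_size)
n_train_full = n_samples - n_test

# Assumption: the training share is rounded down, the rest is used for validation
n_train = math.floor(n_train_full * (1 - validation_split))
n_val = n_train_full - n_train

# Number of minibatches (gradient updates) per epoch and size of the final, partial batch
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch_size = n_train - (steps_per_epoch - 1) * batch_size

print("training samples:  ", n_train)
print("validation samples:", n_val)
print("steps per epoch:   ", steps_per_epoch)
print("last batch size:   ", last_batch_size)
```

Under these assumptions each epoch consists of only a handful of minibatches, and the final one is much smaller than 64, so its batch statistics are noisier than those of the full-size batches.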

In [None]:
# Plot the learning curves

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

frame = pd.DataFrame(history.history)
epochs = np.arange(len(frame))

fig = plt.figure(figsize=(12,4))

# Loss plot
ax = fig.add_subplot(121)
ax.plot(epochs, frame['loss'], label="Train")
ax.plot(epochs, frame['val_loss'], label="Validation")
ax.set_xlabel("Epochs")
ax.set_ylabel("Loss")
ax.set_title("Loss vs Epochs")
ax.legend()

# Mean absolute error plot
ax = fig.add_subplot(122)
ax.plot(epochs, frame['mae'], label="Train")
ax.plot(epochs, frame['val_mae'], label="Validation")
ax.set_xlabel("Epochs")
ax.set_ylabel("Mean Absolute Error")
ax.set_title("Mean Absolute Error vs Epochs")
ax.legend()

### Customising parameters

Recall that there are some parameters and hyperparameters associated with batch normalization.

* The hyperparameter **momentum** is the weight given to the previous running mean and variance when they are updated with a new minibatch. By **default**, it is set to 0.99.

* The hyperparameter **$\epsilon$** is a small constant added to the minibatch variance for numerical stability when performing the normalization. By **default**, it is set to 0.001.

* The parameters **$\beta$** and **$\gamma$** are learned during training and implement an affine transformation after normalization. By **default**, $\beta$ is initialized as an all-zeros vector, and $\gamma$ as an all-ones vector.

These can all be changed (along with various other properties) by adding optional arguments to `tf.keras.layers.BatchNormalization()`.

We can also specify the axis to be normalized (typically the features axis). By default, it is set to -1, the last axis.
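
To make the roles of momentum, $\epsilon$, $\beta$ and $\gamma$ concrete, here is a minimal NumPy sketch of one training-mode batch normalization step over the batch axis. It is an illustration of the quantities described above, not the Keras implementation; the function name `batch_norm_train_step` and the toy input are hypothetical.

```python
import numpy as np

def batch_norm_train_step(x, gamma, beta, moving_mean, moving_var,
                          momentum=0.99, epsilon=0.001):
    """One training-mode step: normalize over the batch axis and update running statistics."""
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)

    # Normalize each feature; epsilon guards against division by a near-zero variance
    x_hat = (x - batch_mean) / np.sqrt(batch_var + epsilon)

    # Affine transformation with the learned scale (gamma) and shift (beta)
    out = gamma * x_hat + beta

    # Running statistics: momentum is the weight kept on the previous value
    new_moving_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean
    new_moving_var = momentum * moving_var + (1.0 - momentum) * batch_var
    return out, new_moving_mean, new_moving_var

# Toy minibatch: 64 samples, 3 features, deliberately not centered
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(64, 3))

gamma = np.ones(3)        # default-style initialization: all ones
beta = np.zeros(3)        # default-style initialization: all zeros
moving_mean = np.zeros(3)
moving_var = np.ones(3)

out, moving_mean, moving_var = batch_norm_train_step(x, gamma, beta, moving_mean, moving_var)

print("output mean per feature:", out.mean(axis=0))
print("output std per feature: ", out.std(axis=0))
print("updated moving mean:    ", moving_mean)
```

With the default-style $\gamma$ and $\beta$, the output has approximately zero mean and a standard deviation just below one per feature (slightly below because of $\epsilon$). At inference time the running statistics are used in place of the batch statistics.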

Let's see an example.

In [None]:
# the previous model
# model = Sequential([
#     Dense(64, input_shape=[train_data.shape[1],], activation="relu"),
#     BatchNormalization(),  # <- Batch normalization layer 1
#     Dropout(0.5),
#     Dense(256, activation='relu'),
#     BatchNormalization(), # <- Batch normalization layer 2
#     Dense(1)
# ])

model = Sequential([
    Dense(64, input_shape=[train_data.shape[1],], activation="relu"),
    BatchNormalization(),  # <- Batch normalization layer 1
    Dropout(0.5),
    Dense(256, activation='relu'),
    #BatchNormalization(),
    #Dense(1)
])

# Note that this model has no output layer yet: we still need to add the second batch normalization layer before it.

In [None]:
# Add a customised batch normalization layer 2

model.add(tf.keras.layers.BatchNormalization(
    momentum=0.95,  # default is 0.99
    epsilon=0.005,  # default is 0.001
    axis=-1,  # default is -1
    beta_initializer=tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.05),  # default is beta_initializer='zeros'
    gamma_initializer=tf.keras.initializers.Constant(value=0.9)  # default is gamma_initializer='ones'
))
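
To get a feel for the `momentum` setting, the sketch below computes how much weight the initial running statistics still carry after a number of updates, given that each update multiplies the weight of the old value by `momentum`. The figure of six updates per epoch is an assumption (a few hundred training samples at `batch_size=64`), not a value taken from the training log.

```python
updates_per_epoch = 6  # assumption: a few hundred samples at batch_size=64

# For each momentum, store the weight left on the initial running statistics
# after one epoch and after ten epochs
remaining = {}
for momentum in (0.99, 0.95):
    after_one_epoch = momentum ** updates_per_epoch
    after_ten_epochs = momentum ** (10 * updates_per_epoch)
    remaining[momentum] = (after_one_epoch, after_ten_epochs)
    print(f"momentum={momentum}: "
          f"{after_one_epoch:.3f} after 1 epoch, "
          f"{after_ten_epochs:.3f} after 10 epochs")
```

A smaller momentum lets the running statistics forget their initial values faster, which can matter when each epoch contains only a few minibatches.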

In [None]:
# Add the output layer

model.add(Dense(1))

In [None]:
# Print the model summary

model.summary()

Let's now compile and fit our model with batch normalization, and track the progress on training and validation sets.

First we compile our model.

In [None]:
# Compile the model

model.compile(optimizer='adam',
              loss='mse',
              metrics=['mae'])

Now we fit the model to the data.

In [None]:
# Train the model

history = model.fit(train_data, train_targets, epochs=100, validation_split=0.15, batch_size=64, verbose=2)

Finally, we plot the training and validation loss and mean absolute error to observe how the error of our model decreases over time.

In [None]:
# Plot the learning curves

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

frame = pd.DataFrame(history.history)
epochs = np.arange(len(frame))

fig = plt.figure(figsize=(12,4))

# Loss plot
ax = fig.add_subplot(121)
ax.plot(epochs, frame['loss'], label="Train")
ax.plot(epochs, frame['val_loss'], label="Validation")
ax.set_xlabel("Epochs")
ax.set_ylabel("Loss")
ax.set_title("Loss vs Epochs")
ax.legend()

# Mean absolute error plot
ax = fig.add_subplot(122)
ax.plot(epochs, frame['mae'], label="Train")
ax.plot(epochs, frame['val_mae'], label="Validation")
ax.set_xlabel("Epochs")
ax.set_ylabel("Mean Absolute Error")
ax.set_title("Mean Absolute Error vs Epochs")
ax.legend()

## Further reading and resources 
* https://keras.io/layers/normalization/
* https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/layers/BatchNormalization