# Callbacks

Launching a big model training run can feel like stepping into unknown territory.

You don't know which learning rate is best or how many epochs you should train for. Doing a dry run and then running the whole training again, just to stop at the best weights, is wasteful.

Keras provides multiple callbacks that help developers with monitoring and give them some control over the training process.

Callbacks are objects passed to the *fit* function; their methods are executed at different points of training.

There are many built-in callbacks, but you can implement your own as well.

https://www.tensorflow.org/api_docs/python/tf/keras/callbacks

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Flatten, Input
from tensorflow.keras.models import Model
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import fashion_mnist

from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

## Set up a base model for demonstration
Load and process the data.

In [None]:
URL = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
COLUMN_NAMES = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight', 'Acceleration', 'Model Year', 'Origin']

dataset = pd.read_csv(
    URL, 
    names=COLUMN_NAMES,
    na_values='?', 
    sep=' ',
    comment='\t',  
    skipinitialspace=True)

In [None]:
dataset = dataset.dropna()

In [None]:
def standardize(data):
    return (data-data.mean())/data.std()

In [None]:
train_dataset = dataset.sample(frac = 0.8, random_state = 42)
test_dataset = dataset.drop(train_dataset.index)

In [None]:
train_labels = train_dataset.pop('MPG')
test_labels = test_dataset.pop('MPG')

In [None]:
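train_features = standardize(train_dataset)
# Scale the test set with the training statistics so that both sets share the same transformation.
test_features = (test_dataset - train_dataset.mean()) / train_dataset.std()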

Define model.

In [None]:
input_layer = Input(shape=(7,))
x = Dense(128, activation='relu')(input_layer)
x = Dense(128, activation='relu')(x)
output_layer = Dense(1)(x)
model = Model(inputs = input_layer, outputs = output_layer)

In [None]:
model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(), metrics=['mean_absolute_error'])

## ModelCheckpoint

One of the most important callbacks - it can save progress after each epoch, or track a selected metric and save only the best weights.

https://keras.io/api/callbacks/model_checkpoint/

Params:
* *filepath* - path where the checkpoint(s) are saved; it can contain formatting placeholders such as '{epoch}', so that each save gets its own file instead of overwriting the previous one
* *monitor* - metric to follow; it can be any value that appears in the training log - loss, val_loss, custom metrics like val_mean_absolute_error...
* *save_best_only* - if True, a checkpoint is saved only when the monitored value improves

The monitored metric has to be passed to the model in *compile* (unless you monitor the plain loss).

When monitoring *val_* metrics, you need to provide validation data to the model.

In [None]:
model_checkpoint = keras.callbacks.ModelCheckpoint(
    filepath="checkpoints",
    monitor='val_mean_absolute_error',
    save_best_only=True,
    verbose=1)
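The Keras documentation describes *filepath* as a template whose named placeholders are filled with the epoch number and the values from the training log. A minimal sketch of how such a template expands, using plain `str.format`; the template string, the helper name `expand_checkpoint_path` and the log values are made up for illustration:

```python
# Hypothetical filepath template; directory and file name are placeholders.
filepath_template = "checkpoints/epoch-{epoch:02d}-val_loss-{val_loss:.2f}"


def expand_checkpoint_path(template, epoch, logs):
    """Fill the template with the epoch number and the logged values."""
    return template.format(epoch=epoch, **logs)


# Illustrative log values, not results from a real run.
example_logs = {"loss": 12.25, "val_loss": 9.5}
print(expand_checkpoint_path(filepath_template, epoch=3, logs=example_logs))
# checkpoints/epoch-03-val_loss-9.50
```

Placeholders that refer to a metric only work if that metric is actually present in the log, for example *val_loss* requires validation data.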

## EarlyStopping

Stops training early when the monitored metric has stopped improving.

https://keras.io/api/callbacks/early_stopping/

Params:
* *monitor* - metric to follow; it can be any value that appears in the training log - loss, val_loss, custom metrics like val_mean_absolute_error...
* *patience* - number of epochs without improvement after which training is stopped
* *min_delta* - minimal change that will be considered an improvement
* *restore_best_weights* - whether to restore the weights from the best epoch; otherwise the model keeps the latest weights from training

In [None]:
early_stopping = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=30)
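How *patience* and *min_delta* interact can be sketched in plain Python. This is a simplified illustration under the assumption that lower values are better, not the Keras implementation; the function name `simulate_early_stopping` and the loss values are made up for the example.

```python
def simulate_early_stopping(val_losses, patience, min_delta=0.0):
    """Return (stopped_epoch, best_epoch) for a sequence of validation losses.

    stopped_epoch is None when the sequence ends before patience runs out.
    The wait counter resets on every improvement larger than min_delta.
    """
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement: remember it and start counting from zero again.
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Illustrative losses: the best value is reached at epoch 2.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73]
print(simulate_early_stopping(losses, patience=3))  # (5, 2)
```

The gap between the two returned epochs is the reason for *restore_best_weights*: when training stops, the latest weights are already *patience* epochs past the best ones.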

## CSVLogger

Streams the training progress to a CSV file.

https://keras.io/api/callbacks/csv_logger/

Params:
* *filename* - Filename of the CSV file, e.g. 'run/log.csv'.
* *separator* - String used to separate elements in the CSV file.
* *append* - Boolean. True: append if file exists (useful for continuing training). False: overwrite existing file.

In [None]:
csv_logger = keras.callbacks.CSVLogger('train.log', separator=',', append=False)
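The resulting file can be read back with the standard library. A small sketch, assuming the log has an `epoch` column plus one column per logged value; the helper name `best_epoch` and the sample rows are made up for illustration:

```python
import csv
import io


def best_epoch(log_file, metric="val_loss"):
    """Return (epoch, value) of the row with the lowest `metric`."""
    rows = list(csv.DictReader(log_file))
    best = min(rows, key=lambda row: float(row[metric]))
    return int(best["epoch"]), float(best[metric])


# Made-up sample rows, not results from a real run.
sample_log = io.StringIO(
    "epoch,loss,val_loss\n"
    "0,25.0,30.0\n"
    "1,12.5,14.0\n"
    "2,10.0,15.5\n"
)
print(best_epoch(sample_log))  # (1, 14.0)
```

For a real run, open the file written by the callback instead, for example `with open('train.log', newline='') as f: best_epoch(f)`.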

## Generic Callback

If you need custom behavior, you can subclass Callback and override any of its prepared methods:
* on_batch_begin
* on_batch_end
* on_epoch_begin
* ...

https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/Callback

Data about the training are provided in the *logs* parameter.
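A minimal sketch of a custom callback, assuming a hypothetical class `EpochTimer` that measures how long each epoch takes and reads the loss from *logs*:

```python
import time

from tensorflow import keras


class EpochTimer(keras.callbacks.Callback):
    """Hypothetical example callback: records the duration of each epoch."""

    def on_train_begin(self, logs=None):
        self.epoch_times = []

    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.perf_counter()

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        elapsed = time.perf_counter() - self._start
        self.epoch_times.append(elapsed)
        # `logs` holds the values computed for this epoch, e.g. the loss.
        print(f"epoch {epoch + 1}: {elapsed:.2f}s, loss={logs.get('loss')}")
```

An instance such as `EpochTimer()` is then passed to *fit* in the same way as the built-in callbacks.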

## Setting callbacks to model

Callbacks are passed to the fit function through the *callbacks* parameter.

To use multiple callbacks, put them into a list and pass the list as the parameter.

In [None]:
callbacks_list = [
    model_checkpoint,
    early_stopping,
    csv_logger
]

Pass the callbacks to fit and run training for up to 700 epochs.

In [None]:
%%time
history = model.fit(
    train_features,
    train_labels,
    batch_size = 16,
    validation_split=0.2,
    verbose=1, 
    callbacks=callbacks_list,
    epochs=700)

Thanks to the early stopping callback, training is interrupted once the validation loss has not improved for 30 epochs, before the model overfits further.

In [None]:
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.xlabel('Epoch')
plt.ylim(0,10)
plt.legend()
plt.grid(True)

## Changing learning rate dynamically

The learning rate can be changed over the course of training.

This technique is used less often with optimizers like Adam, since they already scale the gradients to some degree.

## LearningRateScheduler

Allows users to set custom learning rate decay.

https://keras.io/api/callbacks/learning_rate_scheduler/

Differences between learning rate schedules are easier to demonstrate on more complicated data.

In [None]:
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

In [None]:
X_train, y_train = train_images[:10000] / 255.0, train_labels[:10000]

In [None]:
input_layer = Input(shape=(28, 28))
x = Flatten()(input_layer)
x = Dense(300, activation='relu')(x)
x = Dense(100, activation='relu')(x)
output_layer = Dense(10, activation='softmax')(x)
model = Model(inputs = input_layer, outputs = output_layer)
model.compile(
            optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), 
            loss='sparse_categorical_crossentropy', 
            metrics=['accuracy'])

In [None]:
history = model.fit(
    X_train, 
    y_train, 
    epochs=20, 
    validation_split=0.2, 
    batch_size=64, 
    verbose=1
)

In [None]:
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.xlabel('Epoch')
plt.ylim(0,1.5)
plt.legend()
plt.grid(True)

Set up base model for learning rate scheduling.

In [None]:
def create_model():
    input_layer = Input(shape=(28, 28))
    x = Flatten()(input_layer)
    x = Dense(300, activation='relu')(x)
    x = Dense(100, activation='relu')(x)
    output_layer = Dense(10, activation='softmax')(x)
    model = Model(inputs = input_layer, outputs = output_layer)
    model.compile(
            optimizer='sgd', 
            loss='sparse_categorical_crossentropy', 
            metrics=['accuracy'])
    return model

In [None]:
model = create_model()

### Exponential scheduling

Gradually drops the learning rate - in this variant by a factor of 10 every s epochs.
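Written as a formula, with $\eta_0$ standing for the initial learning rate `lr0` and $t$ for the epoch index (both symbols are introduced here only for illustration):

$$\eta(t) = \eta_0 \cdot 0.1^{t/s}$$

For example, with $\eta_0 = 0.1$ and $s = 10$ the learning rate is $0.1$ at epoch 0, $0.1 \cdot 0.1^{1} = 0.01$ at epoch 10 and $0.1 \cdot 0.1^{2} = 0.001$ at epoch 20.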

In [None]:
def exponential_decay(lr0, s):
    def exponential_decay_fn(epoch):
        return lr0 * 0.1**(epoch / s)
    return exponential_decay_fn

In [None]:
lr_scheduler = keras.callbacks.LearningRateScheduler(exponential_decay(lr0 = 0.1, s = 10))

In [None]:
history = model.fit(
    X_train, 
    y_train, 
    epochs=20, 
    validation_split=0.2, 
    batch_size=64, 
    verbose=1,
    callbacks=[lr_scheduler]
)

In [None]:
plt.plot(history.history['lr'])
plt.title('Learning rate')
plt.xlabel('Epochs')
plt.ylabel('Learning rate')
plt.grid()
plt.show()

Thanks to the gradually decreasing learning rate, the model converges faster.

In [None]:
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.xlabel('Epoch')
plt.ylim(0,1.5)
plt.legend()
plt.grid(True)

### Piecewise constant scheduling

Uses one constant learning rate for a fixed number of epochs, then switches to the next, lower one.

In [None]:
model = create_model()

In [None]:
def piecewise_constant():
    def piecewise_constant_fn(epoch):
        if epoch < 5:
            return 0.01
        elif epoch < 15:
            return 0.005
        else:
            return 0.001
    return piecewise_constant_fn

In [None]:
lr_scheduler = keras.callbacks.LearningRateScheduler(piecewise_constant())
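When a schedule has more steps, the if/elif chain can be replaced by a lookup table. A minimal sketch, assuming a hypothetical helper `make_piecewise_schedule`; it reproduces the same boundaries and rates as the function above:

```python
from bisect import bisect_right


def make_piecewise_schedule(boundaries, rates):
    """Build a schedule(epoch) function from sorted epoch boundaries.

    rates must have one more entry than boundaries: rates[i] is used
    until the epoch reaches boundaries[i].
    """
    if len(rates) != len(boundaries) + 1:
        raise ValueError("rates needs exactly one more entry than boundaries")

    def schedule(epoch):
        # bisect_right counts how many boundaries are <= epoch.
        return rates[bisect_right(boundaries, epoch)]

    return schedule


# Same steps as the if/elif version: 0.01 for epochs 0-4,
# 0.005 for epochs 5-14, 0.001 afterwards.
schedule = make_piecewise_schedule([5, 15], [0.01, 0.005, 0.001])
print([schedule(epoch) for epoch in (0, 4, 5, 14, 15, 19)])
# [0.01, 0.01, 0.005, 0.005, 0.001, 0.001]
```

The returned function takes the epoch index as its only argument, so it can be passed to `LearningRateScheduler` in place of `piecewise_constant()`.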

In [None]:
history = model.fit(
    X_train, 
    y_train, 
    epochs=20, 
    validation_split=0.2, 
    batch_size=64, 
    verbose=1,
    callbacks=[lr_scheduler]
)

In [None]:
plt.plot(history.history['lr'])
plt.title('Learning rate')
plt.xlabel('Epochs')
plt.ylabel('Learning rate')
plt.grid()
plt.show()

In [None]:
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.xlabel('Epoch')
plt.ylim(0,1.5)
plt.legend()
plt.grid(True)

## ReduceLROnPlateau

Reduces the learning rate when the monitored metric stops improving.

Params:
* *monitor* - metric to follow
* *patience* - number of epochs without improvement after which the learning rate is reduced
* *factor* - factor by which the learning rate is multiplied on each reduction

https://keras.io/api/callbacks/reduce_lr_on_plateau/

In [None]:
# default metric is "val_loss"
lr_scheduler = keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5)
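How *factor* and *patience* work together can be sketched in plain Python. This is a simplified illustration, not the Keras implementation: it ignores the callback's `cooldown` and `min_delta` options, and the function name `simulate_reduce_on_plateau` and the loss values are made up for the example.

```python
import math


def simulate_reduce_on_plateau(val_losses, lr0, factor, patience, min_lr=0.0):
    """Return the learning rate used in each epoch.

    After `patience` epochs without a new best loss, the rate is
    multiplied by `factor` (but never drops below `min_lr`).
    """
    lr = lr0
    best = float("inf")
    wait = 0
    used_rates = []
    for loss in val_losses:
        used_rates.append(lr)  # rate in effect during this epoch
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
    return used_rates


# Illustrative losses with a plateau at 0.9.
rates = simulate_reduce_on_plateau(
    [1.0, 0.9, 0.9, 0.9, 0.9, 0.8], lr0=0.1, factor=0.5, patience=2)
print(rates)
```

In this made-up sequence the loss stalls in epochs 2 and 3, so the rate is halved from epoch 4 onward.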

In [None]:
input_layer = Input(shape=(28, 28))
x = Flatten()(input_layer)
x = Dense(300, activation='relu')(x)
x = Dense(100, activation='relu')(x)
output_layer = Dense(10, activation='softmax')(x)
model = Model(inputs = input_layer, outputs = output_layer)
model.compile(
    # start with high learning rate
    optimizer=keras.optimizers.SGD(learning_rate=0.1), 
    loss='sparse_categorical_crossentropy', 
    metrics=['accuracy'])

In [None]:
history = model.fit(
    X_train, 
    y_train, 
    epochs=25,
    validation_split=0.2,
    batch_size=32,
    callbacks=[lr_scheduler])

In [None]:
plt.plot(history.history['lr'])
plt.title('Learning rate')
plt.xlabel('Epochs')
plt.ylabel('Learning rate')
plt.grid()
plt.show()

In [None]:
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.xlabel('Epoch')
plt.ylim(0,1.5)
plt.legend()
plt.grid(True)