# Using Callbacks in Keras

In this notebook, we will see how to use predefined and custom callbacks in Keras for tasks such as checkpointing and learning rate scheduling.

We'll use the same simple dataset and linear model from the previous notebook.


In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing


##### Download the Auto-MPG dataset
 
Download the Auto-MPG dataset (seen in a previous notebook).

In [2]:
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

dataset = pd.read_csv(url, names=column_names, na_values='?', comment='\t', sep=' ', skipinitialspace=True)
dataset = dataset.dropna()
dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})
dataset = pd.get_dummies(dataset, prefix='', prefix_sep='')
dataset.tail()

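| | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model Year | Europe | Japan | USA |
|---|---|---|---|---|---|---|---|---|---|---|
| 393 | 27.0 | 4 | 140.0 | 86.0 | 2790.0 | 15.6 | 82 | 0 | 0 | 1 |
| 394 | 44.0 | 4 | 97.0 | 52.0 | 2130.0 | 24.6 | 82 | 1 | 0 | 0 |
| 395 | 32.0 | 4 | 135.0 | 84.0 | 2295.0 | 11.6 | 82 | 0 | 0 | 1 |
| 396 | 28.0 | 4 | 120.0 | 79.0 | 2625.0 | 18.6 | 82 | 0 | 0 | 1 |
| 397 | 31.0 | 4 | 119.0 | 82.0 | 2720.0 | 19.4 | 82 | 0 | 0 | 1 |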


Split the data into training and test sets, then separate the features from the labels:

In [3]:
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)

train_features = train_dataset.copy()
test_features = test_dataset.copy()

train_labels = train_features.pop('MPG')
test_labels = test_features.pop('MPG')

##### Build the model

Let's build a simple linear regression model (seen in a previous notebook) to test different callbacks during its training.

We use a `get_model()` function so that we can re-create and re-compile the model from scratch multiple times easily:

In [4]:
def get_model(train_features):
    normalizer = preprocessing.Normalization(input_shape=(train_features.shape[1],))
    normalizer.adapt(np.array(train_features))
    
    model = keras.Sequential([
        normalizer,
        layers.Dense(units=1)
    ])
    
    model.compile(
        optimizer=tf.optimizers.Adam(learning_rate=0.1),
        loss='mse', metrics=['mae', 'mse']
    )
    
    return model
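
To make the role of the `Normalization` layer in `get_model()` more concrete, here is a minimal pure-Python sketch of per-feature standardization: each column is shifted by its mean and divided by its standard deviation (population variance). The helper name `standardize_columns` and the toy rows are assumptions made for illustration, and the sketch assumes no column is constant; it is not the layer's actual implementation.

```python
import math
import statistics


def standardize_columns(rows):
    """Standardize each column to zero mean and unit variance.

    Hypothetical helper: assumes every column has non-zero variance.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [math.sqrt(statistics.pvariance(col)) for col in columns]
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, stds)]
        for row in rows
    ]


# Two made-up features on very different scales (think cylinders and weight).
toy_rows = [[4, 2000.0], [6, 3000.0], [8, 4000.0]]
normalized = standardize_columns(toy_rows)

for row in normalized:
    print(row)
```

After standardization both columns take the same values, which is why features on very different scales stop dominating the gradient updates of the linear layer.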

## Early Stopping callback

Use an *early stopping* callback to stop training once the monitored quantity stops improving.

The `monitor` parameter specifies the loss/metric to be monitored, and the `patience` parameter specifies the number of non-improving epochs to wait before stopping:

In [5]:
es_callback = keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, verbose=1)

# re-create the model to restart training every time
model = get_model(train_features)
history = model.fit(train_features, train_labels, epochs=200, validation_split = 0.2, callbacks=[es_callback])

Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoc

As you can see, training stopped after roughly 60 to 70 epochs instead of running for all 200 epochs requested in `fit()`.
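
The effect of `patience` can be illustrated without training anything. The following is a simplified sketch of the counting logic, under the assumption that an epoch counts as an improvement only when the monitored loss drops strictly below the best value seen so far. The function name `stopping_epoch` and the loss values are made up for illustration; this is not the Keras implementation.

```python
def stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop, or None.

    Hypothetical helper: stops after `patience` consecutive epochs
    without a new best validation loss.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss   # new best value: reset the counter
            wait = 0
        else:
            wait += 1     # one more epoch without improvement
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the best value (17.0) is reached at epoch 5.
val_losses = [30.0, 22.0, 18.0, 18.5, 17.0, 17.2, 17.1, 16.0]

print(stopping_epoch(val_losses, patience=2))
print(stopping_epoch(val_losses, patience=3))
```

With `patience=2` the sketch stops before it can see the later improvement at epoch 8, while `patience=3` lets training continue, which shows the trade-off behind the choice of this parameter.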

## Checkpoint Callback

Let's add a second callback to save a model checkpoint after every epoch. Notice that we can pass multiple callbacks at the same time to `fit()`.

In [6]:
cp_callback = keras.callbacks.ModelCheckpoint(
    './callback_test_chkp/chkp_{epoch:02d}',
    # './callback_test_chkp/chkp_best',
    monitor='val_loss',
    verbose=0, 
    save_best_only=False,
    # save_best_only=True,
    save_weights_only=False,
    mode='auto',
    save_freq='epoch'
)
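
The `{epoch:02d}` placeholder in the checkpoint path uses ordinary Python format-string syntax: the epoch number is zero-padded to at least two digits. The short sketch below expands the same template by hand for a few epoch numbers, chosen only for illustration, to show which directory names to expect.

```python
# Same template as the one passed to ModelCheckpoint above.
path_template = './callback_test_chkp/chkp_{epoch:02d}'

# Expand the template for a few example epoch numbers.
paths = [path_template.format(epoch=epoch) for epoch in (1, 10, 73, 100)]

for path in paths:
    print(path)
```

Note that `02d` sets a minimum width, so three-digit epochs are written in full rather than truncated.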

In [7]:
model = get_model(train_features)
history = model.fit(train_features, train_labels, epochs=200, validation_split=0.2,
                    callbacks=[es_callback, cp_callback])

Epoch 1/200
1/8 [==>...........................] - ETA: 1s - loss: 514.4294 - mae: 20.4803 - mse: 514.4294INFO:tensorflow:Assets written to: ./callback_test_chkp/chkp_01/assets
Epoch 2/200
1/8 [==>...........................] - ETA: 0s - loss: 479.4446 - mae: 21.1589 - mse: 479.4446INFO:tensorflow:Assets written to: ./callback_test_chkp/chkp_02/assets
Epoch 3/200
1/8 [==>...........................] - ETA: 0s - loss: 484.2189 - mae: 21.7836 - mse: 484.2189INFO:tensorflow:Assets written to: ./callback_test_chkp/chkp_03/assets
Epoch 4/200
1/8 [==>...........................] - ETA: 0s - loss: 467.8256 - mae: 21.1964 - mse: 467.8256INFO:tensorflow:Assets written to: ./callback_test_chkp/chkp_04/assets
Epoch 5/200
1/8 [==>...........................] - ETA: 0s - loss: 383.8685 - mae: 19.0767 - mse: 383.8685INFO:tensorflow:Assets written to: ./callback_test_chkp/chkp_05/assets
Epoch 6/200
1/8 [==>...........................] - ETA: 0s - loss: 381.0430 - mae: 19.3896 - mse: 381.0430INFO:tens

##### Restore a saved checkpoint

Let's load two of the saved checkpoints and evaluate both on the training data. Each call to `evaluate()` returns the loss followed by the compiled metrics, here `[loss, mae, mse]`.

In [9]:
# model_best = keras.models.load_model('./callback_test_chkp/chkp_best')
# the variable name reflects the checkpoint actually loaded (epoch 73)
model_epoch73 = keras.models.load_model('./callback_test_chkp/chkp_73')
model_epoch73.evaluate(train_features, train_labels)



[11.233522415161133, 2.541062355041504, 11.233522415161133]

In [10]:
model_epoch10 = keras.models.load_model('./callback_test_chkp/chkp_10')
model_epoch10.evaluate(train_features, train_labels)



[263.16888427734375, 15.786527633666992, 263.16888427734375]
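
Because early stopping makes the final epoch number unpredictable, it can help to list which checkpoints exist before loading one. The sketch below is one possible way to do this with the standard library only. The helper `find_checkpoint_epochs` is hypothetical, and the demonstration runs on a temporary directory with empty stand-in folders rather than on real checkpoints.

```python
import re
import tempfile
from pathlib import Path


def find_checkpoint_epochs(directory):
    """Return the sorted epoch numbers of entries named chkp_<number>."""
    pattern = re.compile(r"chkp_(\d+)$")
    epochs = []
    for entry in Path(directory).iterdir():
        match = pattern.match(entry.name)
        if match:
            epochs.append(int(match.group(1)))
    return sorted(epochs)


# Demonstrate on a throwaway directory with empty stand-in checkpoint folders.
with tempfile.TemporaryDirectory() as tmp_dir:
    for epoch in (1, 2, 10):
        (Path(tmp_dir) / f"chkp_{epoch:02d}").mkdir()
    (Path(tmp_dir) / "notes.txt").touch()  # unrelated file, should be ignored
    found_epochs = find_checkpoint_epochs(tmp_dir)

print(found_epochs)
```

Pointing the helper at `./callback_test_chkp` would then give the epochs for which a checkpoint can be passed to `keras.models.load_model`.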

## Learning Rate Scheduling

Let's change the learning rate by reducing it by 0.01 after every epoch, down to a minimum of 0.01. This is only meant to demonstrate LR scheduling; it is not a particularly useful schedule in practice.

In [11]:
def my_schedule(epoch, lr):
    return max(lr - 0.01, 0.01)

Check that the schedule behaves as expected for different input LR values, including the 0.01 floor:

In [12]:
print(my_schedule(1, 0.05))
print(my_schedule(1, 0.01))

0.04
0.01
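
To see the whole trajectory rather than single steps, the schedule can be applied repeatedly starting from the initial learning rate of 0.1 set in `get_model()`. This is a plain-Python simulation of what the scheduler is expected to do, not a run of Keras itself; values are rounded for display because repeated floating-point subtraction leaves small round-off errors.

```python
def my_schedule(epoch, lr):
    # Same schedule as above: subtract 0.01 per epoch, never go below 0.01.
    return max(lr - 0.01, 0.01)


lr = 0.1  # initial learning rate used when compiling the model
simulated_rates = []
for epoch in range(12):
    lr = my_schedule(epoch, lr)
    simulated_rates.append(round(lr, 2))

print(simulated_rates)
```

The rate decreases linearly for the first nine epochs and then stays at the 0.01 floor.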


In [13]:
lr_callback = keras.callbacks.LearningRateScheduler(my_schedule, verbose=1)

In [14]:
model = get_model(train_features)
history = model.fit(train_features, train_labels, epochs=200, validation_split=0.2,
                    callbacks=[lr_callback, es_callback])


Epoch 1: LearningRateScheduler setting learning rate to 0.09000000149011612.
Epoch 1/200

Epoch 2: LearningRateScheduler setting learning rate to 0.08000000357627869.
Epoch 2/200

Epoch 3: LearningRateScheduler setting learning rate to 0.07000000566244126.
Epoch 3/200

Epoch 4: LearningRateScheduler setting learning rate to 0.06000000774860382.
Epoch 4/200

Epoch 5: LearningRateScheduler setting learning rate to 0.05000000610947609.
Epoch 5/200

Epoch 6: LearningRateScheduler setting learning rate to 0.040000004470348356.
Epoch 6/200

Epoch 7: LearningRateScheduler setting learning rate to 0.030000002831220625.
Epoch 7/200

Epoch 8: LearningRateScheduler setting learning rate to 0.020000003054738043.
Epoch 8/200

Epoch 9: LearningRateScheduler setting learning rate to 0.010000003278255462.
Epoch 9/200

Epoch 10: LearningRateScheduler setting learning rate to 0.01.
Epoch 10/200

Epoch 11: LearningRateScheduler setting learning rate to 0.01.
Epoch 11/200

Epoch 12: LearningRateScheduler

## Custom Callback N.1

Let's write a simple custom callback that logs the loss and metrics values after every batch, epoch, etc.

In [15]:
class CustomLogger(keras.callbacks.Callback):
    def on_train_begin(self, logs=None):
        print("Starting training; log content: {}".format(logs))

    def on_train_end(self, logs=None):
        print("Stop training; log content: {}".format(logs))

    def on_epoch_end(self, epoch, logs=None):
        print("End epoch {} of training; log content: {}".format(epoch, logs))

    def on_train_batch_end(self, batch, logs=None):
        print("...Training: end of batch {}; log content: {}".format(batch, logs))
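
The order in which these hooks fire can be illustrated with a toy training loop. The sketch below is a deliberately simplified mock: `RecordingLogger` and `mock_fit` are hypothetical names, the logger is a duck-typed stand-in rather than a `keras.callbacks.Callback` subclass, and the logs are dummy values. It only shows the nesting of the calls, with 0-based epoch and batch indices.

```python
class RecordingLogger:
    """Stand-in for a callback: records hook calls instead of printing them."""

    def __init__(self):
        self.events = []

    def on_train_begin(self, logs=None):
        self.events.append("train_begin")

    def on_train_batch_end(self, batch, logs=None):
        self.events.append(f"batch_end {batch}")

    def on_epoch_end(self, epoch, logs=None):
        self.events.append(f"epoch_end {epoch}")

    def on_train_end(self, logs=None):
        self.events.append("train_end")


def mock_fit(callback, epochs, batches_per_epoch):
    """Toy loop that calls the hooks in the order a training run would."""
    callback.on_train_begin()
    for epoch in range(epochs):
        for batch in range(batches_per_epoch):
            callback.on_train_batch_end(batch, logs={"loss": 0.0})
        callback.on_epoch_end(epoch, logs={"loss": 0.0})
    callback.on_train_end()


recorder = RecordingLogger()
mock_fit(recorder, epochs=2, batches_per_epoch=2)

for event in recorder.events:
    print(event)
```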


In [16]:
log_callback = CustomLogger()

model = get_model(train_features)
history = model.fit(train_features, train_labels, epochs=200,
                    verbose=0,  # verbose=0 to avoid mixing our prints with the default Keras output
                    validation_split=0.2,
                    callbacks=[log_callback])

...Training: end of batch 1; log content: {'loss': 11.813257217407227, 'mae': 2.599332809448242, 'mse': 11.813257217407227}
...Training: end of batch 2; log content: {'loss': 10.3507719039917, 'mae': 2.438214063644409, 'mse': 10.3507719039917}
...Training: end of batch 3; log content: {'loss': 11.39717960357666, 'mae': 2.5712409019470215, 'mse': 11.39717960357666}
...Training: end of batch 4; log content: {'loss': 11.581306457519531, 'mae': 2.6131412982940674, 'mse': 11.581306457519531}
...Training: end of batch 5; log content: {'loss': 11.448307037353516, 'mae': 2.5711774826049805, 'mse': 11.448307037353516}
...Training: end of batch 6; log content: {'loss': 11.949694633483887, 'mae': 2.6010830402374268, 'mse': 11.949694633483887}
...Training: end of batch 7; log content: {'loss': 11.411012649536133, 'mae': 2.556143045425415, 'mse': 11.411012649536133}
End epoch 89 of training; log content: {'loss': 11.411012649536133, 'mae': 2.556143045425415, 'mse': 11.411012649536133, 'val_loss': 1

## Custom Callback N.2

Let's write another example of a custom callback. This time, we implement a custom early stopping mechanism that ends training as soon as the validation MAE drops below a predefined threshold.

In [17]:
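class MyEarlyStopping(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # logs holds the metrics of the epoch that just finished
        logs = logs or {}
        val_mae = logs.get('val_mae')
        if val_mae is not None and val_mae < 10.0:
            print("\nReached MAE < 10.0, so cancelling training!")
            self.model.stop_training = True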
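
The same idea can be generalized so that the monitored metric and the threshold are constructor arguments. The sketch below shows only the decision logic and runs without TensorFlow: `ThresholdStopper` is a hypothetical name, the class is duck-typed instead of subclassing `tf.keras.callbacks.Callback`, and a `SimpleNamespace` stands in for the model that Keras would normally attach. The metric values passed in are made up.

```python
from types import SimpleNamespace


class ThresholdStopper:
    """Sketch of threshold-based stopping with configurable settings.

    For real training this would subclass tf.keras.callbacks.Callback.
    """

    def __init__(self, monitor="val_mae", threshold=10.0):
        self.monitor = monitor
        self.threshold = threshold
        self.model = None  # stand-in; Keras sets this on real callbacks

    def on_epoch_end(self, epoch, logs=None):
        value = (logs or {}).get(self.monitor)
        # Skip silently if the metric is missing (e.g. no validation data).
        if value is not None and value < self.threshold:
            self.model.stop_training = True


stopper = ThresholdStopper(threshold=10.0)
stopper.model = SimpleNamespace(stop_training=False)

stopper.on_epoch_end(0, logs={"val_mae": 15.8})   # above threshold: keep going
stopped_after_first = stopper.model.stop_training

stopper.on_epoch_end(1, logs={"val_mae": 9.7})    # below threshold: request stop
stopped_after_second = stopper.model.stop_training

print(stopped_after_first, stopped_after_second)
```

Reading the metric with `.get()` avoids a `KeyError` when `fit()` is called without validation data, in which case no `val_mae` entry is reported.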


In [18]:
my_es_callback = MyEarlyStopping()

model = get_model(train_features)
history = model.fit(train_features, train_labels, epochs=200, validation_split = 0.2, callbacks=[my_es_callback])

Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
1/8 [==>...........................] - ETA: 0s - loss: 107.9358 - mae: 9.6960 - mse: 107.9358
Reached MAE < 10.0, so cancelling training!

