<div style="text-align:center;font-size:22pt; font-weight:bold;color:white;border:solid black 1.5pt;background-color:#1e7263;">
    Callbacks Overview
</div>

In [1]:
# ======================================================================= #
# Course: Deep Learning Complete Course (CS-501)
# Author: Dr. Saad Laouadi
# Institution: Quant Coding Versity Academy
#
# ==========================================================
# Lesson: Understanding Callbacks in Keras API
#         
# ==========================================================
# ## Learning Objectives
# This guide will enable you to:
# 1. 
# =======================================================================
#          Copyright © Dr. Saad Laouadi 2024
# =======================================================================

In [2]:
# ==================================================== #
#        Load Required Libraries
# ==================================================== #

import os  

# Disable Metal API Validation
os.environ["METAL_DEVICE_WRAPPER_TYPE"] = "0"  

import tensorflow as tf
from tensorflow.keras import callbacks

print("="*72)

%reload_ext watermark
%watermark -a "Dr. Saad Laouadi" -u -d -m

print("="*72)
print("Imported Packages and Their Versions:")
print("="*72)

%watermark -iv
print("="*72)

# Global Config
RANDOM_STATE = 101

Author: Dr. Saad Laouadi

Last updated: 2025-01-04

Compiler    : Clang 14.0.6 
OS          : Darwin
Release     : 24.1.0
Machine     : arm64
Processor   : arm
CPU cores   : 16
Architecture: 64bit

Imported Packages and Their Versions:
tensorflow: 2.16.2
keras     : 3.6.0



In [3]:
# List the public callbacks available in the Keras API
public_callbacks = [name for name in dir(callbacks) if not name.startswith('_')]
for ind, callback in enumerate(public_callbacks, 1):
    print(f"{ind:0>2d} ==> {callback}")

01 ==> BackupAndRestore
02 ==> CSVLogger
03 ==> Callback
04 ==> CallbackList
05 ==> EarlyStopping
06 ==> History
07 ==> LambdaCallback
08 ==> LearningRateScheduler
09 ==> ModelCheckpoint
10 ==> ProgbarLogger
11 ==> ReduceLROnPlateau
12 ==> RemoteMonitor
13 ==> SwapEMAWeights
14 ==> TensorBoard
15 ==> TerminateOnNaN


In [5]:
# Check the help for Early Stopping Callback
?tf.keras.callbacks.EarlyStopping

Init signature:
tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    min_delta=0,
    patience=0,
    verbose=0,
    mode='auto',
    baseline=None,
    restore_best_weights=False,
    start_from_epoch=0,
)
Docstring:
Stop training when a monitored metric has stopped improving.

Assuming the goal of a training is to minimiz

In [17]:
# Check the help for ModelCheckpoint Callback
?tf.keras.callbacks.ModelCheckpoint

Init signature:
tf.keras.callbacks.ModelCheckpoint(
    filepath,
    monitor='val_loss',
    verbose=0,
    save_best_only=False,
    save_weights_only=False,
    mode='auto',
    save_freq='epoch',
    initial_value_threshold=None,
)
Docstring:
Callback to save the Keras model or model weights at some frequency.

`ModelCheckpoint` callback is used in c

In [18]:
# Check the help for ReduceLROnPlateau Callback
?callbacks.ReduceLROnPlateau

Init signature:
callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.1,
    patience=10,
    verbose=0,
    mode='auto',
    min_delta=0.0001,
    cooldown=0,
    min_lr=0.0,
    **kwargs,
)
Docstring:
Reduce learning rate when a metric has stopped improving.

Models often benefit from reducing the learnin

# Overview of Common Keras Callbacks

## Model Checkpointing and Saving

### 1. ModelCheckpoint
- **Purpose**: Saves the full model or only its weights at a chosen frequency during training
- **Key Parameters**:
  - `filepath`: Path to save the model file
  - `monitor`: Metric to monitor
  - `save_best_only`: Only save when model improves
  - `save_weights_only`: Save only weights vs entire model
  - `mode`: 'auto', 'min', or 'max'
  - `save_freq`: 'epoch' or integer batch interval
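
The `filepath` value may contain named formatting fields that are filled in from the epoch number and the logged metrics each time a checkpoint is written. The sketch below mimics that substitution with plain `str.format`, so nothing is actually saved. The template, the metric values, and the helper `checkpoint_name` are illustrative assumptions, and the one-based epoch numbering is an assumption about how Keras names files.

```python
# Illustrative template and log values (assumptions, not from a real run).
template = "model_{epoch:02d}-{val_loss:.2f}.keras"
logs = {"loss": 0.5123, "val_loss": 0.4567}


def checkpoint_name(template: str, epoch: int, logs: dict) -> str:
    """Fill a checkpoint filename template from the epoch index and logs.

    `epoch` is the zero-based index that callbacks receive; it is shifted
    by one here so that the first file is numbered 01.
    """
    return template.format(epoch=epoch + 1, **logs)


name = checkpoint_name(template, epoch=2, logs=logs)
print(name)  # model_03-0.46.keras
```

Any metric used in the template must be present in the logs, so a `{val_loss}` field requires validation data during training.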

### 2. BackupAndRestore
- **Purpose**: Enables fault tolerance in training
- **Key Parameters**:
  - `backup_dir`: Directory for backup files
  - `save_freq`: Frequency of backups
  - `delete_checkpoint`: Whether to delete the backup checkpoint once training finishes

## Training Optimization

### 3. EarlyStopping
- **Purpose**: Stops training when model stops improving
- **Key Parameters**:
  - `monitor`: Metric to monitor
  - `patience`: Number of epochs to wait
  - `restore_best_weights`: Revert to best weights
  - `mode`: 'auto', 'min', or 'max'
  - `min_delta`: Minimum change to qualify as improvement
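
How `patience` and `min_delta` interact is easier to see on a concrete loss curve. The following is a simplified pure-Python sketch of the stopping rule, not the Keras implementation: it assumes `mode='min'`, no `baseline`, and `start_from_epoch=0`. The function name `stopping_epoch` and the loss values are made up for illustration.

```python
import math


def stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return the zero-based epoch at which training would stop, or None.

    An epoch counts as an improvement only if the loss drops below the
    best value seen so far by more than `min_delta`.
    """
    best = math.inf
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
            continue
        wait += 1
        if wait >= patience:
            return epoch
    return None


losses = [1.00, 0.80, 0.79, 0.81, 0.80, 0.82]
print(stopping_epoch(losses, patience=2))                  # 4
print(stopping_epoch(losses, patience=2, min_delta=0.05))  # 3
print(stopping_epoch(losses, patience=5))                  # None
```

With `min_delta=0.05` the small drop from 0.80 to 0.79 no longer counts as an improvement, so the run stops one epoch earlier.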

### 4. ReduceLROnPlateau
- **Purpose**: Reduces learning rate when metrics plateau
- **Key Parameters**:
  - `monitor`: Metric to monitor
  - `factor`: Factor to reduce learning rate by
  - `patience`: Number of epochs to wait
  - `min_lr`: Minimum learning rate
  - `cooldown`: Number of epochs to wait after a reduction before monitoring resumes
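
The arithmetic behind `factor` and `min_lr` can be sketched in a few lines: each reduction multiplies the current rate by `factor`, and the result is never allowed to fall below `min_lr`. This is a hand-rolled illustration under those assumptions, not the callback itself, and the helper `reduce_lr` and the starting rate are hypothetical.

```python
def reduce_lr(lr, factor, min_lr=0.0):
    """Apply one plateau-triggered reduction: scale by `factor`, floor at `min_lr`."""
    return max(lr * factor, min_lr)


lr = 1e-3  # hypothetical starting learning rate
history = [lr]
for _ in range(4):  # pretend four plateaus were detected
    lr = reduce_lr(lr, factor=0.2, min_lr=1e-5)
    history.append(lr)

# Roughly 1e-3, 2e-4, 4e-5, then pinned at the 1e-5 floor.
print(history)
```

Once the floor is reached, further plateaus leave the learning rate unchanged.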

### 5. LearningRateScheduler
- **Purpose**: Dynamically adjusts learning rate
- **Key Parameters**:
  - `schedule`: Function that takes the epoch index and the current learning rate and returns the new learning rate
  - `verbose`: Logging verbosity
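
Because a schedule is an ordinary Python function, its effect can be checked by hand before any training. The sketch below applies a hypothetical schedule (hold for ten epochs, then decay by `exp(-0.1)` per epoch) to a made-up initial rate of 0.01, feeding the returned rate back in at each epoch as the callback is assumed to do.

```python
import math


def schedule(epoch, lr):
    """Keep the rate for the first 10 epochs, then decay by exp(-0.1) per epoch."""
    if epoch < 10:
        return lr
    return lr * math.exp(-0.1)


lr = 0.01  # hypothetical initial learning rate
rates = []
for epoch in range(13):
    lr = schedule(epoch, lr)  # the current rate is passed back in each epoch
    rates.append(lr)

for epoch in (0, 9, 10, 11, 12):
    print(f"epoch {epoch:2d}: lr = {rates[epoch]:.6f}")
```

Returning a plain Python float, as `math.exp` does here, avoids type issues when the function is later passed to `LearningRateScheduler`.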

## Monitoring and Logging

### 6. TensorBoard
- **Purpose**: Enables TensorBoard visualization
- **Key Parameters**:
  - `log_dir`: Directory to save logs
  - `histogram_freq`: Frequency of histogram updates
  - `write_graph`: Whether to visualize graph
  - `write_images`: Log image summaries
  - `update_freq`: 'batch', 'epoch', or integer

### 7. CSVLogger
- **Purpose**: Logs metrics to CSV file
- **Key Parameters**:
  - `filename`: Path to CSV file
  - `separator`: Column separator
  - `append`: Whether to append to existing file

### 8. ProgbarLogger
- **Purpose**: Prints metrics to stdout
- **Key Parameters**: None in Keras 3, the version used in this notebook (Keras 2 accepted `count_mode` and `stateful_metrics`)

## Specialized Callbacks

### 9. TerminateOnNaN
- **Purpose**: Terminates training if loss becomes NaN
- **Key Parameters**: None

### 10. RemoteMonitor
- **Purpose**: Sends metrics to remote server
- **Key Parameters**:
  - `root`: Server root URL
  - `path`: Path for metrics
  - `field`: JSON field name
  - `headers`: Custom HTTP headers

### 11. LambdaCallback
- **Purpose**: Creates simple custom callbacks
- **Key Parameters**:
  - `on_epoch_begin`
  - `on_epoch_end`
  - `on_train_batch_begin`
  - `on_train_batch_end`
  - `on_train_begin`
  - `on_train_end`
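
As a minimal sketch of `LambdaCallback` in use, the example below records the training loss at the end of every epoch. The toy data, the one-layer model, and the names `epoch_losses` and `record_loss` are assumptions made purely for illustration.

```python
import numpy as np
import tensorflow as tf

epoch_losses = []

# The function passed as on_epoch_end receives the epoch index and the logs dict.
record_loss = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: epoch_losses.append(float(logs["loss"]))
)

# Toy regression problem: the target is the sum of four random features.
rng = np.random.default_rng(101)
X = rng.normal(size=(64, 4)).astype("float32")
y = X.sum(axis=1, keepdims=True)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")
model.fit(X, y, epochs=3, batch_size=16, verbose=0, callbacks=[record_loss])

print(len(epoch_losses))  # one entry per epoch
```

For anything that needs state or several hooks, subclassing `Callback` is usually easier to read than a set of lambdas.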

## Base Callback Methods

All callbacks inherit from the base Callback class and can implement these methods:

```python
import tensorflow as tf

class CustomCallback(tf.keras.callbacks.Callback):
    """Skeleton listing the main hooks; override only the ones you need."""
    def on_train_begin(self, logs=None): ...
    def on_train_end(self, logs=None): ...
    def on_epoch_begin(self, epoch, logs=None): ...
    def on_epoch_end(self, epoch, logs=None): ...
    def on_test_begin(self, logs=None): ...
    def on_test_end(self, logs=None): ...
    def on_predict_begin(self, logs=None): ...
    def on_predict_end(self, logs=None): ...
    # Backward-compatible aliases for the training batch hooks
    def on_batch_begin(self, batch, logs=None): ...
    def on_batch_end(self, batch, logs=None): ...
```

## Common Usage Pattern

```python
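from tensorflow.keras.callbacks import (
    CSVLogger, EarlyStopping, ModelCheckpoint, ReduceLROnPlateau, TensorBoard,
)

# Combining multiple callbacks
# (named callback_list so it does not shadow the imported `callbacks` module)
callback_list = [
    ModelCheckpoint(
        'best_model.keras',  # native Keras 3 format; '.h5' is the legacy format
        monitor='val_loss',
        save_best_only=True
    ),
    EarlyStopping(
        monitor='val_loss',
        patience=10,
        restore_best_weights=True
    ),
    ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.2,
        patience=5
    ),
    CSVLogger('training.log'),
    TensorBoard(log_dir='./logs')
]

# Using callbacks in model training
# (assumes `model`, `X_train`, and `y_train` are already defined)
model.fit(
    X_train, y_train,
    epochs=100,
    validation_split=0.2,
    callbacks=callback_list
)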
```

## Best Practices

1. **Order Matters**: Keras calls callbacks in list order, so place callbacks that alter training (like EarlyStopping) before callbacks that only record it

2. **Resource Management**: Be mindful of disk space when using ModelCheckpoint and TensorBoard

3. **Monitoring**: Always include at least one monitoring callback (CSVLogger or TensorBoard)

4. **Fault Tolerance**: Use BackupAndRestore for long training sessions

5. **Custom Metrics**: When monitoring a custom metric, make sure the name passed to `monitor` matches the key under which it is logged (validation metrics carry a `val_` prefix)

## Advanced Usage Tips

1. **Chaining Callbacks**: Multiple callbacks can work together:
```python
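from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Both callbacks monitor 'val_loss' by default
callback_list = [
    EarlyStopping(patience=10),
    ReduceLROnPlateau(patience=5),  # Tries reducing LR before stopping
]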
```

2. **Custom Checkpoint Naming**:
```python
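from tensorflow.keras.callbacks import ModelCheckpoint

# {epoch} and any logged metric (here val_loss) are filled in at save time
ModelCheckpoint(
    filepath='model_{epoch:02d}-{val_loss:.2f}.keras',
    save_best_only=True
)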
```

3. **Dynamic Learning Rate Scheduling**:
```python
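import math
from tensorflow.keras.callbacks import LearningRateScheduler

def schedule(epoch, lr):
    # Hold the rate for the first 10 epochs, then decay by exp(-0.1) per epoch
    if epoch < 10:
        return lr
    # math.exp returns a plain Python float rather than a tensor
    return lr * math.exp(-0.1)

LearningRateScheduler(schedule)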
```

4. **Custom Training Monitoring**:
```python
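import tensorflow as tf

class MetricsHistory(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # 'accuracy' is only logged if the model was compiled with that metric
        accuracy = (logs or {}).get('accuracy')
        if accuracy is not None and accuracy > 0.95:
            print('Reached 95% accuracy, stopping training.')
            self.model.stop_training = True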
```