# Checkpointing and early stopping

This section covers two techniques for making training more economical: checkpointing and early stopping. Checkpointing is useful when a model overtrains and diverges: it lets us recover the model weights from the point of convergence without paying to retrain. You can think of early stopping as an extension of checkpointing. A monitor detects divergence as soon as it occurs and stops training, so we avoid paying for the extra epochs before recovering the checkpoint saved at that point.


## Checkpointing
Checkpointing means periodically saving the learned model parameters and the current hyperparameter values during training. There are two reasons for doing this:
1. To be able to resume training of a model where you left off, instead of restarting the training from the beginning
2. To identify a past point in training where the model gave the best results

In the first case, we might want to split training across sessions as a way of managing resources. For example, you might work in a research organization that has a fixed budget for computing expenses, and your team might be experimenting with a model that is expensive to train. To manage the budget, the team might be allocated a daily computing limit, such as one hour of training per day. At the end of each day's session, training is checkpointed, and the following day it is resumed by restoring from the checkpoint.

Why wouldn't saving the model's weights and biases be enough? Some values that drive training change dynamically as it progresses, such as a decaying learning rate and the optimizer's internal state (for example, the moving averages of the gradients kept by Adam). To resume exactly where training was paused, these must be restored to the values they had at that point.
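A minimal pure-Python sketch (no Keras involved) of why the training step has to be saved along with the weights. It assumes a hypothetical step-decay learning-rate schedule that halves the rate every 10 steps, and the "model" is just a list of numbers nudged toward 1.0. The helpers `lr_at`, `train`, `save_checkpoint`, and `load_checkpoint` are illustrative names, not Keras APIs.

```python
import json
import os
import tempfile


def lr_at(step, initial_lr=0.1, decay=0.5, decay_every=10):
    """Hypothetical step-decay schedule: halve the learning rate every `decay_every` steps."""
    return initial_lr * decay ** (step // decay_every)


def train(weights, start_step, num_steps):
    """Toy update rule: move each weight toward 1.0 by the scheduled learning rate."""
    step = start_step
    for _ in range(num_steps):
        lr = lr_at(step)
        weights = [w + lr * (1.0 - w) for w in weights]
        step += 1
    return weights, step


def save_checkpoint(path, weights, step):
    # Store the step counter with the weights so the schedule can resume correctly.
    with open(path, "w") as f:
        json.dump({"weights": weights, "step": step}, f)


def load_checkpoint(path):
    with open(path) as f:
        state = json.load(f)
    return state["weights"], state["step"]


ckpt_path = os.path.join(tempfile.mkdtemp(), "state.json")

# Session 1: train for 10 steps, then checkpoint.
weights, step = train([0.0, 0.5], start_step=0, num_steps=10)
save_checkpoint(ckpt_path, weights, step)

# Session 2: restore the weights *and* the step, then train 5 more steps.
w_resumed, step_resumed = train(*load_checkpoint(ckpt_path), num_steps=5)

# Reference run: 15 uninterrupted steps.
w_full, _ = train([0.0, 0.5], start_step=0, num_steps=15)

# Restoring only the weights restarts the schedule at step 0 (learning rate 0.1 instead of 0.05).
w_weights_only, _ = train(load_checkpoint(ckpt_path)[0], start_step=0, num_steps=5)

print(w_resumed == w_full)       # True
print(w_weights_only == w_full)  # False
```

The resumed run matches the uninterrupted one exactly, while the weights-only restore replays the wrong learning rate and ends up somewhere else.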

In another scenario, we might implement continuous learning as part of a continuous integration and delivery (CI/CD) process. New labeled images are **continuously added** to the training data, and on each integration cycle we want to **retrain the model incrementally instead of from scratch**.
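A minimal sketch of why warm-starting from a checkpoint pays off when new data arrives. It assumes a toy two-parameter linear model trained by full-batch gradient descent in plain Python; `fit`, `mse`, and the data are hypothetical stand-ins for a real network and its checkpoint files. After new examples are added, five epochs started from the checkpointed parameters reach a much lower loss than five epochs started from scratch.

```python
def mse(params, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fit(params, data, epochs, lr=0.01):
    """Full-batch gradient descent on the MSE, starting from `params`."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
        gb = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * gw, b - lr * gb
    return (w, b)


def target(x):
    return 3 * x + 1


old_data = [(x, target(x)) for x in range(0, 5)]
new_data = [(x, target(x)) for x in range(5, 8)]

# Earlier integration cycle: train on the old data and keep the result as the checkpoint.
checkpointed_params = fit((0.0, 0.0), old_data, epochs=2000)

# New cycle: new labeled examples arrive; train briefly on all the data.
all_data = old_data + new_data
warm = fit(checkpointed_params, all_data, epochs=5)  # resume from the checkpoint
cold = fit((0.0, 0.0), all_data, epochs=5)           # retrain from scratch

print(mse(warm, all_data) < mse(cold, all_data))  # True
```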

In the second case, the model has trained past its best point and started to diverge and/or overfit. Rather than retraining from scratch with fewer epochs (or other hyperparameter changes), we identify the epoch that achieved the best results and restore the learned model parameters to those checkpointed at the end of that epoch.
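Selecting and restoring the best epoch can be sketched in plain Python, assuming each end-of-epoch checkpoint is recorded together with that epoch's validation loss. The validation losses are made up to mimic a model that improves and then overfits, the "training epoch" is a placeholder update, and `best_checkpoint` is a hypothetical helper.

```python
def best_checkpoint(history, monitor="val_loss", mode="min"):
    """Return the history entry with the best monitored value (lowest for 'min', highest for 'max')."""
    pick = min if mode == "min" else max
    return pick(history, key=lambda entry: entry[monitor])


# Made-up validation losses: improvement until epoch 4, then overfitting.
val_losses = [0.90, 0.55, 0.41, 0.38, 0.44, 0.52, 0.61]

history = []
weights = [0.0, 0.0]
for epoch, val_loss in enumerate(val_losses, start=1):
    weights = [w + 0.1 for w in weights]  # placeholder for one real training epoch
    # "Checkpoint" at the end of the epoch: store a copy of the weights with the metric.
    history.append({"epoch": epoch, "val_loss": val_loss, "weights": list(weights)})

best = best_checkpoint(history)
restored_weights = list(best["weights"])  # roll the model back to the best epoch
print(best["epoch"], best["val_loss"])  # 4 0.38
```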




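```python
from keras.callbacks import ModelCheckpoint

# Save a checkpoint at the end of every epoch. {epoch:02d} is replaced by the
# epoch number, so each epoch gets its own file (mymodel_01.ckpt, mymodel_02.ckpt, ...).
file_path = "./saved_model/mymodel_{epoch:02d}.ckpt"
checkpoint = ModelCheckpoint(file_path)
```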
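The `{epoch:02d}` placeholder is an ordinary Python format field. To our understanding, ModelCheckpoint fills the template with `str.format`, passing the 1-based epoch number plus the values in that epoch's logs, so metrics such as `loss` can appear in the file name too. The sketch below reproduces only that substitution in plain Python (no Keras); the `logs` dictionary is made up.

```python
file_path = "./saved_model/mymodel_{epoch:02d}.ckpt"

# The training loop counts epochs from 0; the file name uses epoch + 1.
paths = [file_path.format(epoch=epoch + 1) for epoch in range(3)]
print(paths)
# ['./saved_model/mymodel_01.ckpt', './saved_model/mymodel_02.ckpt', './saved_model/mymodel_03.ckpt']

# Values from the epoch's logs can be included as well (made-up loss value).
logs = {"loss": 0.4321}
named = "./saved_model/mymodel_{epoch:02d}-{loss:.2f}.ckpt".format(epoch=5, **logs)
print(named)  # ./saved_model/mymodel_05-0.43.ckpt
```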

```python
import tensorflow as tf
from keras import Sequential
from keras.layers import Dense

# A single Dense layer that maps each 3-feature row of a (28, 3) input to 3 outputs.
model = Sequential([
    Dense(3, activation="relu", input_shape=(28, 3))
])
# Metrics are passed as a list.
model.compile(loss="mse", optimizer="adam",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
model.summary()
```

```
Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
 dense_4 (Dense)             (None, 28, 3)             12

Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
```


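```python
import numpy as np

# Synthetic regression data: 28 samples, each a (28, 3) array,
# with the quadratic target y = 2x^2 + 0.5x + 10 computed element-wise.
X = np.random.randn(28, 28, 3)
X = tf.convert_to_tensor(X)
y = 2 * X**2 + 0.5 * X + 10
model.fit(X, y, epochs=10, callbacks=[checkpoint])
```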

```
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
```

Alternatively, we can keep only the best checkpoint by setting **save_best_only=True** and setting the **monitor** parameter to the metric the decision should be based on.

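For example, if monitor is set to val_accuracy (named val_acc in older versions of Keras), a checkpoint is written only if the validation accuracy is higher than at the last saved checkpoint. If monitor is set to val_loss, a checkpoint is written only if the validation loss is lower than at the last saved checkpoint. Metrics prefixed with val_ are computed only when validation data is passed to fit() (for example, through validation_data or validation_split):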


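```python
# Keep only the best checkpoint. val_acc is never logged for this regression
# model (it has no accuracy metric), so monitor val_loss instead; it is computed
# on the 20% of samples held out by validation_split.
checkpoint_val_acc = ModelCheckpoint(filepath=file_path, save_best_only=True, monitor="val_loss")
model2 = Sequential([
    Dense(3, activation="relu", input_shape=(28, 3))])
model2.compile(loss="mse", optimizer="adam", metrics=[tf.keras.metrics.RootMeanSquaredError()])
model2.fit(X, y, epochs=10, validation_split=0.2, callbacks=[checkpoint_val_acc])
```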

```
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
```
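The save_best_only rule amounts to remembering the best value seen so far and saving only when a new epoch beats it, where "beats" means lower for a loss and higher for an accuracy. Below is a minimal plain-Python sketch with a hypothetical `BestOnlySaver` class and made-up metric values; no files are written. Keras's own ModelCheckpoint also has a `mode` argument (including `"auto"`, which infers the direction from the metric). If the monitored metric is missing from the logs, as with val_acc for a model that has no accuracy metric, nothing is ever saved.

```python
import math


class BestOnlySaver:
    """Hypothetical sketch of the save_best_only decision (no files are written)."""

    def __init__(self, monitor="val_loss", mode="min"):
        self.monitor = monitor
        self.mode = mode
        self.best = math.inf if mode == "min" else -math.inf
        self.saved_epochs = []  # epochs at which a checkpoint would be written

    def improved(self, current):
        return current < self.best if self.mode == "min" else current > self.best

    def on_epoch_end(self, epoch, logs):
        current = logs.get(self.monitor)
        if current is None:
            return  # metric not logged (e.g. no validation data): nothing to compare
        if self.improved(current):
            self.best = current
            self.saved_epochs.append(epoch)


# Made-up metrics for five epochs.
epoch_logs = [
    {"val_loss": 0.80, "val_accuracy": 0.60},
    {"val_loss": 0.65, "val_accuracy": 0.68},
    {"val_loss": 0.70, "val_accuracy": 0.71},
    {"val_loss": 0.52, "val_accuracy": 0.70},
    {"val_loss": 0.58, "val_accuracy": 0.74},
]

by_loss = BestOnlySaver(monitor="val_loss", mode="min")
by_accuracy = BestOnlySaver(monitor="val_accuracy", mode="max")
by_missing = BestOnlySaver(monitor="val_acc", mode="max")  # not present in these logs

for epoch, logs in enumerate(epoch_logs, start=1):
    for saver in (by_loss, by_accuracy, by_missing):
        saver.on_epoch_end(epoch, logs)

print(by_loss.saved_epochs)      # [1, 2, 4]
print(by_accuracy.saved_epochs)  # [1, 2, 3, 5]
print(by_missing.saved_epochs)   # []
```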

```python
X.shape  # TensorShape([28, 28, 3])
```

## Early stopping
Early stopping sets a condition under which training is *terminated before the set limits* (for example, the number of epochs) are reached.

It is generally used to conserve resources and/or prevent overtraining once a goal is reached, such as a target accuracy or convergence of the evaluation loss.

For example, we might set training to run for 20 epochs averaging 30 minutes each, for a total of 10 hours. If the objective is met after 8 epochs, it would be ideal to terminate training there, saving the 6 hours that the remaining 12 epochs would have taken.

Early stopping is specified in a manner similar to checkpointing: an EarlyStopping object is instantiated, configured with a target goal, and passed to the callbacks parameter of the fit() method. In this example, training stops early as soon as the validation loss stops decreasing:








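```python
from keras.callbacks import EarlyStopping

# Stop as soon as the validation loss fails to improve (patience defaults to 0).
early_stop = EarlyStopping(monitor="val_loss")
model3 = Sequential([Dense(3, activation="relu", input_shape=(28, 3))])
model3.compile(loss="mse", optimizer="adam", metrics=[tf.keras.metrics.RootMeanSquaredError()])
# validation_split holds out 20% of the samples so that val_loss is computed.
model3.fit(X, y, epochs=100, validation_split=0.2, callbacks=[early_stop, checkpoint_val_acc])
```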

```
Epoch 1/100
Epoch 2/100
```
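Callbacks are hooks that fit() calls at fixed points, such as the end of each epoch. Keras's EarlyStopping halts training by setting the model's `stop_training` attribute, which the training loop checks after each epoch. The plain-Python sketch below imitates that mechanism; `ToyModel` and `StopOnNoImprovement` are hypothetical, the validation losses are made up, and the real EarlyStopping has more options than this patience-0 rule.

```python
class ToyModel:
    """Hypothetical stand-in for a model: replays made-up per-epoch validation losses."""

    def __init__(self, val_losses):
        self.val_losses = val_losses
        self.stop_training = False

    def fit(self, epochs, callbacks=()):
        for cb in callbacks:
            cb.model = self  # give each callback a handle back to the model
        epochs_run = 0
        for epoch in range(epochs):
            logs = {"val_loss": self.val_losses[epoch]}  # a real loop would train here
            epochs_run += 1
            for cb in callbacks:
                cb.on_epoch_end(epoch, logs)
            if self.stop_training:  # a callback asked the loop to halt
                break
        return epochs_run


class StopOnNoImprovement:
    """Hypothetical patience-0 callback: stop as soon as val_loss fails to beat its best."""

    def __init__(self):
        self.best = float("inf")
        self.model = None

    def on_epoch_end(self, epoch, logs):
        if logs["val_loss"] < self.best:
            self.best = logs["val_loss"]
        else:
            self.model.stop_training = True


toy = ToyModel([0.90, 0.70, 0.72, 0.50, 0.40])
epochs_run = toy.fit(epochs=5, callbacks=[StopOnNoImprovement()])
print(epochs_run)  # 3
```

Training stops at the small bump in epoch 3, so the later improvements (0.50, 0.40) are never reached; this is the situation the patience parameter is meant to handle.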

```python
# Stop only after 3 consecutive epochs without improvement in the validation loss.
early_stop = EarlyStopping(monitor="val_loss", patience=3)
model4 = Sequential([Dense(3, activation="relu", input_shape=(28, 3))])
model4.compile(loss="mse", optimizer="adam", metrics=[tf.keras.metrics.RootMeanSquaredError()])
model4.fit(X, y, epochs=100, validation_split=0.2, callbacks=[early_stop, checkpoint_val_acc])
```

```
Epoch 1/100
Epoch 2/100
Epoch 3/100
...
Epoch 77/100
Epoch 78/100
```

<img src="img_10.png" />

<img src="img_11.png" />

Two EarlyStopping parameters control the stopping condition:
1. *patience* is the number of epochs with no improvement after which training is stopped, and
2. *min_delta* is the minimum change in the monitored value that counts as an improvement.
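The interaction of the two parameters can be sketched as a plain-Python function that, given a made-up series of loss values, returns the epoch at which training would stop. `stopping_epoch` is hypothetical; its convention (an improvement must beat the best value so far by more than `min_delta`, and training stops once `patience` consecutive epochs fail to improve) is our reading of the documented Keras behavior, not its source code.

```python
def stopping_epoch(values, patience=0, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never stops.

    Lower values are better (e.g. a loss). An epoch counts as an improvement only if
    it beats the best value so far by more than min_delta. Training stops once
    `patience` consecutive epochs have not improved (patience=0 behaves like 1).
    """
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, value in enumerate(values, start=1):
        if best - value > min_delta:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up loss curve: improves, plateaus, wobbles, then improves again at epoch 10.
losses = [1.00, 0.80, 0.70, 0.69, 0.695, 0.68, 0.70, 0.71, 0.72, 0.60]

print(stopping_epoch(losses, patience=0))                  # 5
print(stopping_epoch(losses, patience=3))                  # 9
print(stopping_epoch(losses, patience=3, min_delta=0.05))  # 6
```

A larger patience tolerates temporary plateaus (the patience=3 run survives the bump at epoch 5), while a nonzero min_delta treats tiny gains such as 0.70 to 0.69 as no improvement, so training stops sooner.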