<a href="https://colab.research.google.com/github/Deep-Learning-Challenge/challenge-notebooks/blob/master/1.Multilayer%20Perceptrons/3.Advanced%20Lessons/2.Keep%20The%20Best%20Models%20During%20Training%20With%20Checkpointing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" /></a>

# Keep The Best Models During Training With Checkpointing

Deep learning models can take hours, days, or even weeks to train, and if a training run is stopped unexpectedly, you can lose a lot of work. In this lesson, you will discover how you can checkpoint your deep learning models during training in Python using the Keras library. After completing this lesson, you will know:

* The importance of checkpointing neural network models when training.
* How to checkpoint each improvement to a model during training.
* How to checkpoint the very best model observed during training.

Let's get started.

## Runtime Setup

In [None]:
import sys

dataset_name = "pima-indians-diabetes.data.csv"
if 'google.colab' in sys.modules:
    DATASET = f"https://github.com/Deep-Learning-Challenge/challenge-notebooks/raw/master/datasets/{dataset_name}"
else:
    DATASET = f"../../datasets/{dataset_name}"
    
DATASET

## Checkpointing Neural Network Models

Application checkpointing is a fault tolerance technique for long-running processes. It is an approach where a snapshot of the system's state is taken in case of system failure. If there is a problem, not all is lost. The checkpoint may be used directly or used as the starting point for a new run, picking up where it left off. When training deep learning models, the checkpoint captures the weights of the model. These weights can be used to make predictions as-is or used as the basis for ongoing training.
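The general idea is independent of deep learning. The following is a minimal sketch in plain Python, assuming a toy job that sums integers; the function `run_job`, the state keys, and the file name `job_state.json` are all hypothetical names chosen for illustration. The job writes a snapshot after every step, so a second run can pick up where the first one stopped.

```python
import json
import os
import tempfile


def run_job(total_steps, checkpoint_path, stop_after=None):
    """Sum the integers 0..total_steps-1, saving a snapshot after every step.

    If stop_after is given, the job stops before that step to simulate a failure.
    """
    # Resume from the last snapshot if one exists, otherwise start fresh.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"next_step": 0, "running_total": 0}

    for step in range(state["next_step"], total_steps):
        if stop_after is not None and step >= stop_after:
            return None  # simulated crash; the snapshot on disk survives
        state["running_total"] += step
        state["next_step"] = step + 1
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)
    return state["running_total"]


checkpoint_path = os.path.join(tempfile.mkdtemp(), "job_state.json")
first_attempt = run_job(10, checkpoint_path, stop_after=4)  # interrupted run
second_attempt = run_job(10, checkpoint_path)               # resumes at step 4
print(first_attempt, second_attempt)  # None 45
```

In a training run, the model weights play the role of `state` and an epoch plays the role of a step.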

The Keras library provides checkpointing through its callback API. The `ModelCheckpoint` callback class lets you define where to checkpoint the model weights, how the file should be named, and under what circumstances to make a checkpoint of the model. You can specify which metric to monitor, such as loss or accuracy on the training or validation dataset, and whether an improvement means maximizing or minimizing that score. The filename used to store the weights can include variables such as the epoch number or the metric value. The `ModelCheckpoint` instance is then passed to the training process when calling the model's `fit()` function. Note that you may need to install the `h5py` library to write HDF5 files.

## Checkpoint Neural Network Model Improvements

A good use of checkpointing is to output the model weights each time an improvement is observed during training. The example below creates a small neural network for the Pima Indians onset of diabetes binary classification problem. The example uses 33% of the data for validation.

Checkpointing is set up to save the network weights only when classification accuracy on the validation dataset improves (`monitor='val_accuracy'` and `mode='max'`). The weights are stored in a file whose name includes the epoch number and the score: `weights-improvement-{epoch:02d}-{val_accuracy:.2f}.hdf5`.
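The placeholders in the filename follow Python's format-string syntax. As a small illustration, the snippet below fills the same template by hand with `str.format`; the epoch number and metric values are hypothetical, not results from a real run.

```python
# Same template as the checkpoint filename used in this lesson.
filepath = "weights-improvement-{epoch:02d}-{val_accuracy:.2f}.hdf5"

# Hypothetical metric values for one epoch (not taken from a real run).
logs = {"loss": 0.58, "val_accuracy": 0.7362}

# Keys that the template does not mention (here "loss") are simply ignored.
filename = filepath.format(epoch=7, **logs)
print(filename)  # weights-improvement-07-0.74.hdf5
```

The `:02d` specifier pads the epoch to two digits and `:.2f` rounds the score to two decimals, which is why two different checkpoints can show the same score in their names.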

In [None]:
import tensorflow as tf

# MLP for Pima Indians dataset: checkpoint the weights each time validation accuracy improves
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import ModelCheckpoint
import numpy

# fix random seeds for reproducibility (NumPy and TensorFlow)
seed = 7
numpy.random.seed(seed)
tf.random.set_seed(seed)

# load pima indians dataset
dataset = numpy.loadtxt(DATASET, delimiter=",")  # DATASET is defined in Runtime Setup

# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

# create model
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))

# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# checkpoint: save the weights only, each time validation accuracy improves
filepath = "weights-improvement-{epoch:02d}-{val_accuracy:.2f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1,
                             save_best_only=True, save_weights_only=True, mode='max')
callbacks_list = [checkpoint]

Fitting the model in the cell below produces the checkpoint messages. In the output, you can see cases where an improvement in the model accuracy on the validation dataset resulted in a new weight file being written to disk.

In [None]:
# Fit the model
model.fit(X, Y, validation_split=0.33, epochs=150, batch_size=10,
          callbacks=callbacks_list, verbose=0)

You will also see a number of files in your working directory containing the network weights in HDF5 format. For example:

```
...
weights-improvement-51-0.69.hdf5
weights-improvement-56-0.73.hdf5
weights-improvement-63-0.74.hdf5
weights-improvement-67-0.71.hdf5
weights-improvement-94-0.74.hdf5
```

This is a straightforward checkpointing strategy. It may create many unnecessary checkpoint files if the validation accuracy moves up and down over training epochs. Nevertheless, it will ensure that you have a snapshot of the best model discovered during your run.
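When many such files accumulate, the best one can be picked out from the names alone. The sketch below assumes the naming scheme used in this lesson and parses the epoch and score back out of each filename; `best_checkpoint` and `PATTERN` are hypothetical helper names, and the list repeats the example files shown above rather than reading a real directory.

```python
import re

# Matches names such as weights-improvement-63-0.74.hdf5 (epoch, then score).
PATTERN = re.compile(r"weights-improvement-(\d+)-(\d+\.\d+)\.hdf5$")


def best_checkpoint(filenames):
    """Return the name with the highest score, breaking ties by the latest epoch."""
    candidates = []
    for name in filenames:
        match = PATTERN.search(name)
        if match:
            epoch, score = int(match.group(1)), float(match.group(2))
            candidates.append((score, epoch, name))
    if not candidates:
        return None
    # Tuples compare by score first, then by epoch.
    return max(candidates)[2]


files = [
    "weights-improvement-51-0.69.hdf5",
    "weights-improvement-56-0.73.hdf5",
    "weights-improvement-63-0.74.hdf5",
    "weights-improvement-67-0.71.hdf5",
    "weights-improvement-94-0.74.hdf5",
]
print(best_checkpoint(files))  # weights-improvement-94-0.74.hdf5
```

In practice the list could come from `os.listdir()` or `glob.glob()`. Because the score in the name is rounded to two decimals, the tie-break by epoch is only a heuristic.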

## Checkpoint Best Neural Network Model Only

A simpler checkpoint strategy is to save the model weights to the same file, if and only if the validation accuracy improves. This can be done using the same code as above and changing the output filename to a fixed name with no score or epoch information. In this case, model weights are written to the file `weights.best.hdf5` only if the classification accuracy of the model on the validation dataset improves over the best seen so far.

In [None]:
import tensorflow as tf

# MLP for Pima Indians dataset: checkpoint only the best weights seen during training
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import ModelCheckpoint

import numpy

# fix random seeds for reproducibility (NumPy and TensorFlow)
seed = 7
numpy.random.seed(seed)
tf.random.set_seed(seed)

# load pima indians dataset
dataset = numpy.loadtxt(DATASET, delimiter=",")  # DATASET is defined in Runtime Setup

# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

# create model
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))

# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# checkpoint: overwrite a single file with the best weights seen so far
filepath = "weights.best.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1,
                             save_best_only=True, save_weights_only=True, mode='max')
callbacks_list = [checkpoint]

Fitting the model in the cell below prints a message each time the validation accuracy improves and the weight file is overwritten:

In [None]:
# Fit the model
model.fit(X, Y, validation_split=0.33, epochs=150, batch_size=10,
          callbacks=callbacks_list, verbose=0)

You should see the weight file in your local directory.

`weights.best.hdf5`
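The "improves over the best seen so far" rule can be sketched in a few lines of plain Python. This is an illustration of the idea under the assumption that only a strict improvement triggers a save, not the Keras implementation itself; `epochs_that_save` is a hypothetical helper and the accuracy history is made up.

```python
def epochs_that_save(val_accuracies):
    """Return the 1-based epochs at which a best-only, max-mode checkpoint is written."""
    best = float("-inf")
    saved = []
    for epoch, accuracy in enumerate(val_accuracies, start=1):
        if accuracy > best:  # strict improvement only; a tie does not trigger a save
            best = accuracy
            saved.append(epoch)
    return saved


# Hypothetical validation accuracies for six epochs.
history = [0.65, 0.69, 0.68, 0.73, 0.73, 0.74]
print(epochs_that_save(history))  # [1, 2, 4, 6]
```

With a fixed filename, each of those saves overwrites the previous one, so the file on disk always holds the weights from the last epoch in the list.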

## Loading a Saved Neural Network Model

Now that you have seen how to checkpoint your deep learning models during training, you need to review how to load and use a checkpointed model. The checkpoint only includes the model weights, so you must know the network structure in order to use it. The structure can also be serialized to a file, for example in JSON format (YAML serialization has been removed from recent TensorFlow releases). In the example below, the model structure is known, and the best weights are loaded from the previous experiment, stored in the working directory in the `weights.best.hdf5` file. The model is then evaluated on the entire dataset.

In [None]:
import tensorflow as tf

# MLP for Pima Indians dataset: load the checkpointed weights and evaluate the model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

import numpy

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# create model
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))

# load weights
model.load_weights("weights.best.hdf5")

# Compile model (required to evaluate it with a loss and metrics)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print("Created model and loaded weights from file")

# load pima indians dataset
dataset = numpy.loadtxt(DATASET, delimiter=",")  # DATASET is defined in Runtime Setup

# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

Running the example produces the following output:

In [None]:
# estimate accuracy on whole dataset using loaded weights
scores = model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
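To make the reported number concrete, here is a rough sketch of what classification accuracy means for a sigmoid output. It assumes a decision threshold of 0.5; `binary_accuracy` is a hypothetical helper written for this illustration, and the probabilities and labels are made up.

```python
def binary_accuracy(probabilities, labels, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    predictions = [1 if p > threshold else 0 for p in probabilities]
    correct = sum(1 for pred, y in zip(predictions, labels) if pred == y)
    return correct / len(labels)


# Hypothetical sigmoid outputs and true class labels for four samples.
probabilities = [0.91, 0.12, 0.55, 0.40]
labels = [1, 0, 0, 1]

accuracy = binary_accuracy(probabilities, labels)
print("accuracy: %.2f%%" % (accuracy * 100))  # accuracy: 50.00%
```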

## Summary

In this lesson, you have discovered the importance of checkpointing deep learning models for long training runs. You learned:

* How to use Keras to checkpoint each time an improvement to the model is observed.
* How to only checkpoint the very best model observed during training.
* How to load a checkpointed model from a file and use it later to make predictions.