- Deep learning models can take hours, days, or even weeks to train. If a training run stops unexpectedly, you can lose a lot of work.

# Checkpointing Neural Network Models
- Application checkpointing is a fault tolerance technique for long-running processes. A snapshot of the system's state is taken periodically so that, if the system fails, not all is lost. The checkpoint may be used directly, or as the starting point for a new run that picks up where the previous one left off. When training deep learning models, the checkpoint captures the weights of the model. These weights can be used to make predictions as-is, or as the basis for ongoing training.
- The Keras library provides checkpointing through its callback API. The ModelCheckpoint callback class lets you define where to save the model weights, how to name the file, and under what circumstances to make a checkpoint. You can specify which metric to monitor, such as loss or accuracy on the training or validation dataset, and whether an improvement means maximizing or minimizing that score. The filename used to store the weights can include variables such as the epoch number or the metric value. The ModelCheckpoint instance is then passed to the training process when calling the fit() function on the model. Note that you may need to install the h5py library to save weights in HDF5 format.
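
To make the "save only on improvement" rule and the filename variables concrete, here is a minimal pure-Python sketch. It is not the Keras implementation: the function `plan_checkpoints`, the filename template, and the validation-loss values are all made up for illustration. It only shows the decision logic (compare the monitored score with the best seen so far) and how placeholders like `{epoch:02d}` are filled in with `str.format`.

```python
# Illustrative sketch only: mimics the decision a "save best only" checkpoint
# makes after each epoch. This is NOT the Keras implementation.

def plan_checkpoints(history, filepath_template, monitor, mode="min"):
    """Return the filenames that would be written, one per improving epoch.

    history: list of dicts, one per epoch, mapping metric name -> value.
    mode: "min" if a lower score is better, "max" if a higher score is better.
    """
    if mode == "min":
        improved = lambda new, best: new < best
    else:
        improved = lambda new, best: new > best

    best = None
    written = []
    for epoch, logs in enumerate(history, start=1):
        current = logs[monitor]
        if best is None or improved(current, best):
            best = current
            # Placeholders such as {epoch:02d} are filled in with str.format.
            written.append(filepath_template.format(epoch=epoch, **logs))
    return written


# Made-up validation losses for four epochs (hypothetical values).
history = [
    {"val_loss": 0.65},
    {"val_loss": 0.60},
    {"val_loss": 0.62},  # worse than the best so far, so nothing is saved
    {"val_loss": 0.55},
]
files = plan_checkpoints(history, "weights-{epoch:02d}-{val_loss:.2f}.hdf5",
                         monitor="val_loss", mode="min")
print(files)
# ['weights-01-0.65.hdf5', 'weights-02-0.60.hdf5', 'weights-04-0.55.hdf5']
```

With a fixed filename (no placeholders), each improvement would overwrite the same file, so only the best weights seen so far remain on disk.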

In [1]:
# checkpoint the weights when validation accuracy improves
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint
import matplotlib.pyplot as plt
import numpy
import pandas

Using TensorFlow backend.


In [11]:
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

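# load pima indians dataset
# note: read_csv treats the first row as column names; if the file has no header row, pass header=None
dataframe = pandas.read_csv('pima-indians-diabetes.csv')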
dataset = dataframe.values

# split into input (X) and output (Y) variables
X = dataset[:,:8]
Y = dataset[:,8]

# create model
model = Sequential()
model.add(Dense(12, input_dim = 8, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(8, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(1, kernel_initializer = 'uniform', activation = 'sigmoid'))

# compile model
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

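# checkpoint: save the weights only when validation accuracy improves
# note: the Keras version used here appears to log this metric as 'val_accuracy';
# older releases used 'val_acc', so adjust the name to match your installation
#filepath = "D:/Neural_Network_With_Keras/Model_Checkpoints/best_model_checkpoint.hdf5"
#filepath = 'weights-improvement-{epoch:02d}-{val_accuracy:.2f}.hdf5'
checkpoint = ModelCheckpoint(
    filepath = 'D:/Neural_Network_With_Keras/Model_Checkpoints/weights.hdf5',
    monitor = 'val_accuracy',
    verbose = 1,
    save_best_only = True,
    mode = 'max')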
callbacks_list = [checkpoint]

# fit the model
model.fit(X,Y, validation_split = 0.33, epochs = 150, batch_size = 10, callbacks = callbacks_list, verbose = 0)

<keras.callbacks.callbacks.History at 0x25b59e39e48>
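
The idea of a new run picking up where an interrupted one left off can be sketched without Keras. The toy loop below is only an illustration under stated assumptions: the "model" is a single number, the state is stored as JSON rather than HDF5, and `train`, `stop_after`, and the checkpoint filename are hypothetical names invented for this example.

```python
import json
import os
import tempfile


def train(total_epochs, checkpoint_path, stop_after=None):
    """Toy training loop that saves its state after every epoch.

    The "model" is a single number; a real run would store model weights.
    stop_after simulates an unexpected interruption (hypothetical parameter).
    """
    # Resume from the checkpoint if one exists, otherwise start fresh.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"epoch": 0, "weight": 0.0}

    while state["epoch"] < total_epochs:
        if stop_after is not None and state["epoch"] >= stop_after:
            return state  # simulated failure: the remaining epochs never run
        state["weight"] += 0.5  # stand-in for one epoch of learning
        state["epoch"] += 1
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)  # snapshot the state after each epoch
    return state


checkpoint_path = os.path.join(tempfile.mkdtemp(), "toy_checkpoint.json")
interrupted = train(10, checkpoint_path, stop_after=4)  # stops after 4 epochs
resumed = train(10, checkpoint_path)                    # continues from epoch 4
print(interrupted["epoch"], resumed["epoch"], resumed["weight"])
# 4 10 5.0
```

The second call does not repeat the first four epochs, because it reads them back from the checkpoint file. With a Keras model the same pattern applies, with the saved weights file taking the place of the JSON state.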