# TensorFlow Tutorials
# ML Basics 06: Save and Load Models

We can save models both during and after training. This means we don't have to train a model from scratch every time: we can pick up where we left off.

We can also publish the weights of a trained model along with the code that creates the model.

## Workspace Setup

In [7]:
from __future__ import absolute_import, division, print_function, unicode_literals

# For manipulating filepaths
import os

# TF and Keras API
import tensorflow as tf
from tensorflow import keras

In [8]:
try:
    # %tensorflow_version only exists in Colab
    %tensorflow_version 2.x
except Exception:
    pass

In [10]:
# Python module for importing/exporting weights of trained models as h5 files
import h5py

## Loading Dataset

In [11]:
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

In [12]:
# Limiting ourselves to the first 1000 samples
train_labels = train_labels[:1000]
test_labels = test_labels[:1000]

# Reshape the images as 28 * 28 dimensional vectors, and normalize to [0, 1] range
train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0
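
To make the reshape and normalization step concrete, here is a minimal sketch on synthetic data. The `fake_images` array is a hypothetical stand-in for the MNIST images (same `uint8` type and 28 x 28 shape), not the real dataset.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 3 grayscale images, 28 x 28, uint8 in [0, 255]
fake_images = np.random.RandomState(0).randint(0, 256, size=(3, 28, 28), dtype=np.uint8)

# -1 lets NumPy infer the number of images; each image becomes one 784-long row
flat = fake_images.reshape(-1, 28 * 28) / 255.0

print(flat.shape)              # (3, 784)
print(flat.min(), flat.max())  # both values lie in [0, 1]
```

Dividing by `255.0` also converts the integer pixels to floating point, which is what the dense layers expect.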

## Creating Our First Model

In [14]:
# Define a simple sequential model
def create_model():
    model = tf.keras.models.Sequential([
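        # Hidden layer: 512 ReLU units; each input is a flattened 28 * 28 = 784 pixel image
        keras.layers.Dense(512, activation='relu', input_shape=(784, )),
        # Dropout randomly zeroes 20% of the hidden activations during training
        keras.layers.Dropout(0.2),

        # Output layer: one softmax probability per digit class (0-9)
        keras.layers.Dense(10, activation='softmax')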
    ])
    
    # Compile
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    
    # Return the model 
    return model

In [15]:
# Create a basic model instance
model = create_model()

In [16]:
# Display the model's architecture
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 512)               401920    
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5130      
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
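
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit, and dropout has no parameters. The helper `dense_param_count` below is a hypothetical name used only for this sketch.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden_params = dense_param_count(28 * 28, 512)  # first Dense layer
output_params = dense_param_count(512, 10)       # second Dense layer
total_params = hidden_params + output_params     # Dropout contributes 0

print(hidden_params, output_params, total_params)  # 401920 5130 407050
```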


## Save Checkpoints During Training
Create a `ModelCheckpoint` callback to continually save the model's parameters both during training and at the end of it. This lets us pick up where we left off if training is interrupted.

In [17]:
# Relative path for the checkpoint file for the training process
checkpoint_path = 'training_1/cp.ckpt'

# Directory component of the checkpoint path ('training_1'), relative to the working directory
checkpoint_dir = os.path.dirname(checkpoint_path)
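
As a quick check of what `os.path.dirname` returns here, the sketch below splits the same relative path. Nothing is read from disk; these are pure string operations.

```python
import os

checkpoint_path = 'training_1/cp.ckpt'

checkpoint_dir = os.path.dirname(checkpoint_path)    # directory part only
checkpoint_name = os.path.basename(checkpoint_path)  # file-name part only

print(checkpoint_dir)   # training_1
print(checkpoint_name)  # cp.ckpt

# The directory is still relative; abspath resolves it against the working directory
print(os.path.isabs(checkpoint_dir))                   # False
print(os.path.isabs(os.path.abspath(checkpoint_dir)))  # True
```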

In [18]:
# Create a callback that saves the model's weights
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, 
                                                save_weights_only=True, 
                                                verbose=1)

In [19]:
# Train the model with the new callback
# May generate warnings
model.fit(train_images, train_labels, epochs=10, 
         validation_data=(test_images, test_labels), 
         callbacks=[cp_callback])  # Pass the callback to training

Train on 1000 samples, validate on 1000 samples
Epoch 1/10

Epoch 00001: saving model to training_1/cp.ckpt
Epoch 2/10

Epoch 00002: saving model to training_1/cp.ckpt
Epoch 3/10

Epoch 00003: saving model to training_1/cp.ckpt
Epoch 4/10

Epoch 00004: saving model to training_1/cp.ckpt
Epoch 5/10

Epoch 00005: saving model to training_1/cp.ckpt
Epoch 6/10

Epoch 00006: saving model to training_1/cp.ckpt
Epoch 7/10

Epoch 00007: saving model to training_1/cp.ckpt
Epoch 8/10

Epoch 00008: saving model to training_1/cp.ckpt
Epoch 9/10

Epoch 00009: saving model to training_1/cp.ckpt
Epoch 10/10

Epoch 00010: saving model to training_1/cp.ckpt


<tensorflow.python.keras.callbacks.History at 0xb1f2a7438>

In [20]:
# List the checkpoint files: a single collection that is overwritten at the end of each epoch
!ls {checkpoint_dir}

checkpoint                  cp.ckpt.index
cp.ckpt.data-00000-of-00001


## Untrained Model with Weights
We will create a new, untrained model. When restoring a model from weights-only, the model must have the same architecture as the one that was used to save the weights. Weights can be shared even if the new model is a different instance of the same architecture.

In [21]:
# Create a new instance of a basic model
model = create_model()

# Evaluate the model prior to using pre-trained weights
loss, acc = model.evaluate(test_images, test_labels)
print("Untrained model, accuracy: {:5.2f}%".format(100 * acc))

Untrained model, accuracy: 15.20%


Before loading the pre-trained weights, the model uses randomly initialized weights. This gives a classification accuracy of 15.20%, close to the 10% expected from random guessing over 10 digit classes.

In [23]:
# Load the weights
model.load_weights(checkpoint_path)

# Re-evaluate the model
loss, acc = model.evaluate(test_images, test_labels)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))

Restored model, accuracy: 87.50%


## Checkpoint Callback Options

In [24]:
# Include the epoch number in the file name
checkpoint_path = 'training_2/cp-{epoch:04d}.ckpt'
checkpoint_dir = os.path.dirname(checkpoint_path)
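
The `{epoch:04d}` placeholder is ordinary Python string formatting: the epoch number is zero-padded to four digits. A small sketch of the resulting file names, with epoch values chosen only for illustration:

```python
checkpoint_path = 'training_2/cp-{epoch:04d}.ckpt'

# Fill in the placeholder for a few example epochs
paths = [checkpoint_path.format(epoch=e) for e in (0, 5, 50)]
print(paths)
# ['training_2/cp-0000.ckpt', 'training_2/cp-0005.ckpt', 'training_2/cp-0050.ckpt']

# Zero-padding keeps alphabetical order the same as numeric order
unpadded = ['cp-{}.ckpt'.format(e) for e in (5, 10, 50)]
print(sorted(unpadded))  # ['cp-10.ckpt', 'cp-5.ckpt', 'cp-50.ckpt']
```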

In [26]:
# Create a callback that saves the model's weights every 5 epochs
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    save_weights_only=True,
    period=5)
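
Newer TensorFlow 2.x releases mark `period` as deprecated in favor of `save_freq`, which accepts either `'epoch'` or an integer number of batches. The sketch below only does the arithmetic for an "every 5 epochs" setting, assuming the 1000 training samples used here and the default `fit` batch size of 32. The helper names are hypothetical.

```python
import math

def batches_per_epoch(n_samples, batch_size=32):
    """Number of batches in one epoch; the last batch may be partial."""
    return math.ceil(n_samples / batch_size)

def save_freq_for_epochs(n_epochs, n_samples, batch_size=32):
    """Batch count that corresponds to saving once every n_epochs epochs."""
    return n_epochs * batches_per_epoch(n_samples, batch_size)

steps = batches_per_epoch(1000)             # ceil(1000 / 32) = 32
save_freq = save_freq_for_epochs(5, 1000)   # 5 * 32 = 160
print(steps, save_freq)  # 32 160
```

Under these assumptions, passing `save_freq=160` instead of `period=5` should give a similar schedule; check the `ModelCheckpoint` documentation for your installed version before relying on it.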

In [27]:
# Create a new model instance
model = create_model()

In [28]:
# Save the weights using the `checkpoint_path` format
model.save_weights(checkpoint_path.format(epoch=0))

In [29]:
# Train the model with the new callback
model.fit(train_images,
         train_labels,
         epochs=50, 
         callbacks=[cp_callback], 
         validation_data=(test_images, test_labels), 
         verbose=0)


Epoch 00005: saving model to training_2/cp-0005.ckpt

Epoch 00010: saving model to training_2/cp-0010.ckpt

Epoch 00015: saving model to training_2/cp-0015.ckpt

Epoch 00020: saving model to training_2/cp-0020.ckpt

Epoch 00025: saving model to training_2/cp-0025.ckpt

Epoch 00030: saving model to training_2/cp-0030.ckpt

Epoch 00035: saving model to training_2/cp-0035.ckpt

Epoch 00040: saving model to training_2/cp-0040.ckpt

Epoch 00045: saving model to training_2/cp-0045.ckpt

Epoch 00050: saving model to training_2/cp-0050.ckpt


<tensorflow.python.keras.callbacks.History at 0xb1cf3fbe0>

In [30]:
# List the checkpoint files; by default only the 5 most recent checkpoints are kept
!ls {checkpoint_dir}

checkpoint                       cp-0040.ckpt.index
cp-0030.ckpt.data-00000-of-00001 cp-0045.ckpt.data-00000-of-00001
cp-0030.ckpt.index               cp-0045.ckpt.index
cp-0035.ckpt.data-00000-of-00001 cp-0050.ckpt.data-00000-of-00001
cp-0035.ckpt.index               cp-0050.ckpt.index
cp-0040.ckpt.data-00000-of-00001


In [31]:
# Reference to the latest checkpoint file
latest = tf.train.latest_checkpoint(checkpoint_dir)
latest

'training_2/cp-0050.ckpt'

In [32]:
# Create a new model to test the weights from the latest checkpoint
model = create_model()

# Load the previously saved weights
model.load_weights(latest)

# Re-evaluate the model
loss, acc = model.evaluate(test_images, test_labels)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))

Restored model, accuracy: 86.90%


## What are `ckpt` files?
- The weights of the model during (and after) training are stored as a collection of checkpoint-formatted files.
- They contain **only the trained weights of the model** in a binary format.
- Specifically, a checkpoint consists of
    - one or more shards that contain the model's weights.
    - an index file that indicates which weights are stored in which shard.
- When training a model on a single machine, there will be only one shard.
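
The shard and index naming described above can be seen by grouping a directory listing by checkpoint prefix. This is a sketch that works on a hard-coded list of names copied from the listing earlier, not on the file system; `group_checkpoint_files` is a hypothetical helper.

```python
from collections import defaultdict

file_names = [
    'checkpoint',
    'cp-0045.ckpt.data-00000-of-00001',
    'cp-0045.ckpt.index',
    'cp-0050.ckpt.data-00000-of-00001',
    'cp-0050.ckpt.index',
]

def group_checkpoint_files(names):
    """Map each checkpoint prefix to its index file and list of shard files."""
    groups = defaultdict(lambda: {'index': None, 'shards': []})
    for name in names:
        if name.endswith('.index'):
            groups[name[:-len('.index')]]['index'] = name
        elif '.data-' in name:
            groups[name.split('.data-')[0]]['shards'].append(name)
        # Anything else (e.g. the bookkeeping file 'checkpoint') is skipped
    return dict(groups)

groups = group_checkpoint_files(file_names)
for prefix, files in sorted(groups.items()):
    print(prefix, '->', files['index'], files['shards'])
```

The prefix (for example `cp-0050.ckpt`) is what gets passed to `load_weights`, rather than the name of an individual shard or index file.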

## Manually Save Weights

So far, we have used a `keras` callback object to save our weights automatically at predetermined points during training.

We can also save weights manually at any time using the `save_weights` method. By default, this also saves the weights in the `ckpt` format.

In [33]:
# Save the weights using the keras `save_weights` function
model.save_weights('./checkpoints/my_checkpoint')

# Create a new model instance
model = create_model()

# Restore the weights
model.load_weights('./checkpoints/my_checkpoint')

# Evaluate the model 
loss, acc = model.evaluate(test_images, test_labels)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))

Restored model, accuracy: 86.90%


## Saving Entire Model
Saving only the weights records the values of the model's parameters, but not the architecture of the model itself.

We use `model.save` to save the entire model (architecture, weights, and training configuration) in a single file or folder. This allows the model to be exported and used elsewhere (on a different device, or in a different environment such as TensorFlow.js or TensorFlow Lite).

In [34]:
# Create and train a new model instance 
model = create_model()
model.fit(train_images, train_labels, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0xb1de16d30>

In [35]:
# Save the entire model to an HDF5 file (extension .h5)
model.save('my_model.h5')

In [36]:
# Now recreate the exact same model - weights, config, and optimizer - from the h5 file
new_model = tf.keras.models.load_model('my_model.h5')

# Show the model's summary
new_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_10 (Dense)             (None, 512)               401920    
_________________________________________________________________
dropout_5 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_11 (Dense)             (None, 10)                5130      
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________


In [38]:
# Check the model's accuracy
loss, acc = new_model.evaluate(test_images, test_labels)
print("Restored Model, accuracy: {:5.2f}%".format(100 * acc))

Restored Model, accuracy: 87.20%
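
Beyond comparing accuracy, one way to check a save/load round trip is to confirm that the original and restored models give the same predictions. The sketch below assumes TensorFlow, NumPy, and `h5py` are installed. It uses a deliberately tiny, untrained model (`build_tiny_model` is a hypothetical helper) and a temporary directory so it does not touch `my_model.h5`.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

def build_tiny_model():
    """Small stand-in model: 4 inputs -> 8 hidden units -> 3 classes."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation='relu'),
        tf.keras.layers.Dense(3, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

original = build_tiny_model()
sample = np.random.RandomState(0).rand(5, 4).astype('float32')

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'tiny_model.h5')
    original.save(path)                          # architecture + weights + config
    restored = tf.keras.models.load_model(path)  # no model-building code needed

# Same weights and architecture should give the same outputs
predictions_match = np.allclose(original.predict(sample, verbose=0),
                                restored.predict(sample, verbose=0),
                                atol=1e-6)
print(predictions_match)
```

Unlike `load_weights`, `load_model` does not need `build_tiny_model` to be called again, because the architecture is stored in the file.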
