# SSD300 Training for Deepfake Detection

The parameters are based on the training of the original SSD300 Pascal VOC "07+12" model.

In [1]:
# Loading in the necessary Python modules
from keras.optimizers import Adam, SGD
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, TerminateOnNaN, CSVLogger, TensorBoard
from keras import backend as K
from keras.models import load_model
from math import ceil
import numpy as np
from matplotlib import pyplot as plt

import os
from tensorflow.python.lib.io import file_io
from google.cloud import storage

from models.keras_ssd300 import ssd_300
from keras_loss_function.keras_ssd_loss import SSDLoss
from keras_layers.keras_layer_AnchorBoxes import AnchorBoxes
from keras_layers.keras_layer_DecodeDetections import DecodeDetections
from keras_layers.keras_layer_DecodeDetectionsFast import DecodeDetectionsFast
from keras_layers.keras_layer_L2Normalization import L2Normalization

from ssd_encoder_decoder.ssd_input_encoder import SSDInputEncoder
from ssd_encoder_decoder.ssd_output_decoder import decode_detections, decode_detections_fast

from data_generator.object_detection_2d_data_generator import DataGenerator
from data_generator.object_detection_2d_geometric_ops import Resize
from data_generator.object_detection_2d_photometric_ops import ConvertTo3Channels
from data_generator.data_augmentation_chain_original_ssd import SSDDataAugmentation
from data_generator.object_detection_2d_misc_utils import apply_inverse_transforms

%matplotlib inline

Using TensorFlow backend.


## 0. Preliminary note

All places in the code where I need to make any changes are marked `TODO` and explained accordingly. All code cells that don't contain `TODO` markers just need to be executed.

## 1. Set the model configuration parameters

This section sets the configuration parameters for the model definition. The parameters set here are used both by the `ssd_300()` function that builds the SSD300 model and, further down, by the constructor of the `SSDInputEncoder` object that is needed to run the training. Most of these parameters are needed to define the anchor boxes.

The parameters below are based on the original SSD300 architecture that was trained on the Pascal VOC datasets. Note that the anchor box scaling factors of the original SSD implementation vary depending on the datasets on which the models were trained. The scaling factors used for the MS COCO datasets are smaller than the scaling factors used for the Pascal VOC datasets. The reason why the list of scaling factors has 7 elements while there are only 6 predictor layers is that the last scaling factor is used for the second aspect-ratio-1 box of the last predictor layer.

As mentioned above, the parameters set below are not only needed to build the model, but are also passed to the `SSDInputEncoder` constructor further down, which is responsible for matching and encoding ground truth boxes and anchor boxes during the training. In order to do that, it needs to know the anchor box parameters.
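
To make the role of the seven scaling factors concrete, here is a minimal sketch that derives the anchor box shapes per predictor layer and counts the resulting boxes. It is not the code used by `ssd_300()` or `SSDInputEncoder`. The function name `anchor_shapes` is hypothetical, and the feature map sizes are the standard SSD300 values, assumed here rather than read from the model.

```python
import numpy as np

# Values copied from the configuration cell of this notebook.
img_size = 300
scales = [0.1, 0.2, 0.37, 0.54, 0.71, 0.88, 1.05]
aspect_ratios = [[1.0, 2.0, 0.5],
                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                 [1.0, 2.0, 0.5],
                 [1.0, 2.0, 0.5]]

# Assumption: spatial sizes of the six SSD300 predictor layers (not read from the model).
feature_map_sizes = [38, 19, 10, 5, 3, 1]


def anchor_shapes(layer, two_boxes_for_ar1=True):
    """Return the (width, height) in pixels of every anchor box shape of one predictor layer."""
    this_scale, next_scale = scales[layer], scales[layer + 1]
    shapes = []
    for ar in aspect_ratios[layer]:
        shapes.append((this_scale * img_size * np.sqrt(ar),
                       this_scale * img_size / np.sqrt(ar)))
        if ar == 1.0 and two_boxes_for_ar1:
            # Second square box: geometric mean of this layer's scale and the next one.
            # For the last layer, "the next one" is the seventh scaling factor.
            side = np.sqrt(this_scale * next_scale) * img_size
            shapes.append((side, side))
    return shapes


boxes_per_layer = [fm * fm * len(anchor_shapes(layer))
                   for layer, fm in enumerate(feature_map_sizes)]
total_boxes = sum(boxes_per_layer)
print(boxes_per_layer, total_boxes)
```

Under these assumptions the last layer gets four box shapes, and the second of them is the only place where the scaling factor 1.05 is used.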

In [2]:
# SSD300 model configuration parameters
img_height = 300 # Height of the model input images
img_width = 300 # Width of the model input images
img_channels = 3 # Number of color channels of the model input images
mean_color = [139, 121, 113] # The per-channel mean of the images in the dataset
swap_channels = [2, 1, 0] # The color channel order in the original SSD is BGR, so we'll have the model reverse the color channel order of the input images
n_classes = 2 # Number of positive classes, Real and Fake
scales_pascal = [0.1, 0.2, 0.37, 0.54, 0.71, 0.88, 1.05] # The anchor box scaling factors used in the original SSD300 for the Pascal VOC datasets
scales_coco = [0.07, 0.15, 0.33, 0.51, 0.69, 0.87, 1.05] # The anchor box scaling factors used in the original SSD300 for the MS COCO datasets
scales = scales_pascal
aspect_ratios = [[1.0, 2.0, 0.5],
                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                 [1.0, 2.0, 0.5],
                 [1.0, 2.0, 0.5]] # The anchor box aspect ratios used in the original SSD300; the order matters
two_boxes_for_ar1 = True
steps = [8, 16, 32, 64, 100, 300] # The space between two adjacent anchor box center points for each predictor layer
offsets = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5] # The offsets of the first anchor box center points from the top and left borders of the image as a fraction of the step size for each predictor layer
clip_boxes = False # Whether or not to clip the anchor boxes to lie entirely within the image boundaries
variances = [0.1, 0.1, 0.2, 0.2] # The variances by which the encoded target coordinates are divided as in the original implementation
normalize_coords = True

## 2. Build or load the SSD300 model

Execute either 2.1 (create a new SSD300 model) *or* 2.2 (load a previously saved SSD300 model), **not** both.

### 2.1 Create a new model and load trained VGG-16 weights into it

The code cell below does the following things:
1. It calls the function `ssd_300()` to build the model.
2. It then loads the weights file that is found at `weights_path` into the model. If I want to reproduce the original SSD training, load the pre-trained VGG-16 weights.
3. Finally, it compiles the model for the training. In order to do so, we're defining an optimizer (Adam) and a loss function (SSDLoss) to be passed to the `compile()` method.

The original SSD implementation uses plain SGD with momentum. Because Adam typically needs less learning rate tuning to converge, I will use it instead of SGD. The learning rate schedule below was designed for SGD, so I might need to adjust it slightly for Adam.

Note that the learning rate that is being set here doesn't matter, because further below we'll pass a learning rate scheduler to the training function, which will overwrite any learning rate set here (i.e. what matters are the learning rates that are defined by the learning rate scheduler).

`SSDLoss` is a custom Keras loss function that implements the multi-task loss, which consists of a log loss for classification and a smooth L1 loss for localization. `neg_pos_ratio` and `alpha` are set as in the original SSD paper.
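
The two loss terms can be illustrated with a small NumPy sketch. This is not the `SSDLoss` implementation: the function names are hypothetical, and the hard negative mining that `neg_pos_ratio` controls is left out. It only shows how a smooth L1 term and a log loss term combine with the weight `alpha`.

```python
import numpy as np


def smooth_l1(y_true, y_pred):
    """Smooth L1 loss, summed over the last axis (the four box coordinates)."""
    abs_diff = np.abs(y_true - y_pred)
    per_coord = np.where(abs_diff < 1.0, 0.5 * abs_diff ** 2, abs_diff - 0.5)
    return per_coord.sum(axis=-1)


def log_loss(y_true, y_pred, eps=1e-15):
    """Cross-entropy per box; y_true is one-hot, y_pred holds class probabilities."""
    y_pred = np.maximum(y_pred, eps)  # avoid log(0)
    return -(y_true * np.log(y_pred)).sum(axis=-1)


def multitask_loss(conf_loss, loc_loss, n_positive, alpha=1.0):
    """Weighted sum of both terms, normalized by the number of positive boxes."""
    return (conf_loss + alpha * loc_loss) / max(n_positive, 1)


# One positive box with background plus two positive classes (one-hot target).
y_true_cls = np.array([[0.0, 1.0, 0.0]])
y_pred_cls = np.array([[0.2, 0.5, 0.3]])
y_true_box = np.zeros((1, 4))
y_pred_box = np.array([[0.5, 2.0, 0.0, -1.0]])

total = multitask_loss(log_loss(y_true_cls, y_pred_cls).sum(),
                       smooth_l1(y_true_box, y_pred_box).sum(),
                       n_positive=1)
print(total)
```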

In [3]:
# 1: Build the Keras model.

K.clear_session() # Clear previous models from memory.

model = ssd_300(image_size=(img_height, img_width, img_channels),
                n_classes=n_classes,
                mode='training',
                l2_regularization=0.0005,
                scales=scales,
                aspect_ratios_per_layer=aspect_ratios,
                two_boxes_for_ar1=two_boxes_for_ar1,
                steps=steps,
                offsets=offsets,
                clip_boxes=clip_boxes,
                variances=variances,
                normalize_coords=normalize_coords,
                subtract_mean=mean_color,
                swap_channels=swap_channels)

# 2: Load the VGG-16 weights into the model.

# TODO: Set the path to the VGG-16 weights for loading.
# Copy the weights file from the storage bucket to local disk so that
# `load_weights()` can read it from a local path.
weights_location = './vgg16_weights.h5'
with file_io.FileIO('gs://gcp_storage_bucket/VGG_ILSVRC_16_layers_fc_reduced.h5', mode='rb') as weights_path:
    with open(weights_location, 'wb') as weights_file:
        weights_file.write(weights_path.read())

model.load_weights(weights_location, by_name=True)

# 3: Instantiate an optimizer and the SSD loss function and compile the model.
#    If I want to follow the original Caffe implementation, use the preset SGD
#    optimizer, otherwise it's recommended to use the Adam optimizer.

adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
#sgd = SGD(lr=0.001, momentum=0.9, decay=0.0, nesterov=False)

ssd_loss = SSDLoss(neg_pos_ratio=3, alpha=1.0)

model.compile(optimizer=adam, loss=ssd_loss.compute_loss, metrics=['acc'])

Instructions for updating:
Use `tf.cast` instead.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
Use `tf.cast` instead.


### 2.2 Load a previously created SSD300 model

If I have previously created and saved a model and would now like to load it, execute the next code cell. The only thing I need to do here is to set the path to the saved model HDF5 file that I would like to load.

The SSD300 model contains custom objects: Neither the loss function nor the anchor box or L2-normalization layer types are contained in the Keras core library, so we need to provide them to the model loader.

This next code cell assumes that I want to load a model that was created in 'training' mode. If I want to load a model that was created in 'inference' or 'inference_fast' mode, I'll have to add the `DecodeDetections` or `DecodeDetectionsFast` layer type to the `custom_objects` dictionary below.

In [3]:
# TODO: Set the path to the `.h5` file of the model to be loaded.
#model_path = './model_output/ssd300_epoch-01_loss-3.5443_val_loss-4.1877.h5'

# We need to create an SSDLoss object in order to pass that to the model loader.
#ssd_loss = SSDLoss(neg_pos_ratio=3, alpha=1.0)

#K.clear_session() # Clear previous models from memory.

#model = load_model(model_path, custom_objects={'AnchorBoxes': AnchorBoxes,
#                                               'L2Normalization': L2Normalization,
#                                               'compute_loss': ssd_loss.compute_loss})





## 3. Set up the data generators for the training

The code cells below set up the data generators for the training and validation datasets to train the model. The settings below reproduce the original SSD training and validation.

The only things I need to change here are the filepaths to the datasets.

Note that the generator provides two options to speed up the training. By default, it loads the individual images for a batch from disk. This has two disadvantages. First, for compressed image formats like JPG, this wastes a lot of computation, because every image needs to be decompressed again every time it is loaded. Second, the images are likely not stored in a contiguous block on disk, which may also slow down the loading process.

The first option that `DataGenerator` provides to deal with this is to load the entire dataset into memory, which reduces the access time for any image to a negligible amount, but this is only an option if I have enough free memory to hold the whole dataset. As a second option, `DataGenerator` can convert the dataset into a single HDF5 file. This HDF5 file stores the images as uncompressed arrays in a contiguous block, which dramatically speeds up loading. It's not as good as having the images in memory, but it's a lot better than the default option of loading them from their compressed JPG state every time they are needed. Such an HDF5 dataset may require significantly more disk space than the compressed images. I can later load these HDF5 datasets directly in the constructor.
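
As a rough way to judge whether the in-memory option is feasible, the sketch below estimates the size of a dataset stored as uncompressed 8-bit arrays. The function name and the 300x300 resolution in the example are assumptions for illustration only. The images in this dataset have variable sizes, so the real figure will differ.

```python
def uncompressed_size_gib(n_images, height, width, channels=3, bytes_per_value=1):
    """Approximate size in GiB of n_images stored as uncompressed arrays."""
    return n_images * height * width * channels * bytes_per_value / 1024 ** 3


# Hypothetical example: every training image stored at 300x300 pixels with 3 channels.
estimate = uncompressed_size_gib(n_images=323202, height=300, width=300)
print("Estimated uncompressed training set size: {:.1f} GiB".format(estimate))
```

The same estimate applies to the disk space of an HDF5 file that stores the images uncompressed, which is why such a file can be much larger than the JPG originals.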

The original SSD implementation uses a batch size of 32 for the training. In case I run into GPU memory issues, reduce the batch size accordingly. Training an SSD300 with 20 object classes at a batch size of 32 requires at least 7 GB of free GPU memory, so training with only 2 object classes should need somewhat less.

The `DataGenerator` itself is fairly generic. It doesn't contain any data augmentation or bounding box encoding logic. Instead, I pass a list of image transformations and an encoder for the bounding boxes in the `transformations` and `label_encoder` arguments of the data generator's `generate()` method, and the data generator will then apply those given transformations and the encoding to the data. Everything here is preset already.

The data augmentation settings defined further down reproduce the data augmentation pipeline of the original SSD training. The training generator receives an object `ssd_data_augmentation`, which is a transformation object that is itself composed of a whole chain of transformations that replicate the data augmentation procedure used to train the original Caffe implementation. The validation generator receives an object `resize`, which simply resizes the input images.

An `SSDInputEncoder` object, `ssd_input_encoder`, is passed to both the training and validation generators. As explained above, it matches the ground truth labels to the model's anchor boxes and encodes the box coordinates into the format that the model needs.
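
The matching and encoding steps can be sketched in NumPy as follows. This is a simplification under stated assumptions, not the `SSDInputEncoder` code: the function names are hypothetical, only a single ground truth box is handled, and the real encoder's matching procedure may differ in detail. The threshold and variances are the values from this notebook.

```python
import numpy as np


def iou(box, anchors):
    """IoU between one box and an array of anchors, all as (xmin, ymin, xmax, ymax)."""
    inter_w = np.clip(np.minimum(box[2], anchors[:, 2]) - np.maximum(box[0], anchors[:, 0]), 0, None)
    inter_h = np.clip(np.minimum(box[3], anchors[:, 3]) - np.maximum(box[1], anchors[:, 1]), 0, None)
    inter = inter_w * inter_h
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    anchor_areas = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    return inter / (box_area + anchor_areas - inter)


def match_anchors(gt_box, anchors, pos_iou_threshold=0.5):
    """Indices of the best-overlapping anchor plus all anchors at or above the threshold."""
    overlaps = iou(gt_box, anchors)
    matched = set(np.flatnonzero(overlaps >= pos_iou_threshold).tolist())
    matched.add(int(np.argmax(overlaps)))
    return sorted(matched)


def encode_box(gt, anchor, variances=(0.1, 0.1, 0.2, 0.2)):
    """Encode a ground truth box relative to an anchor; both given as (cx, cy, w, h)."""
    gt, anchor, variances = (np.asarray(a, dtype=float) for a in (gt, anchor, variances))
    offsets = (gt[:2] - anchor[:2]) / anchor[2:] / variances[:2]
    sizes = np.log(gt[2:] / anchor[2:]) / variances[2:]
    return np.concatenate([offsets, sizes])


# Hypothetical toy data: three anchors and one ground truth box, in pixels.
anchors = np.array([[0, 0, 100, 100],
                    [50, 50, 150, 150],
                    [200, 200, 300, 300]], dtype=float)
gt_box = np.array([10, 10, 110, 110], dtype=float)

print(match_anchors(gt_box, anchors))
print(encode_box(gt=(60, 60, 100, 100), anchor=(50, 50, 100, 100)))
```

Dividing by the variances rescales the targets: a center shift of 10% of the anchor width becomes an encoded offset of 1.0.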

In order to train the model on a dataset other than Pascal VOC, choose the `DataGenerator` parser method that corresponds to my data format. This notebook uses `parse_csv()`.

In [4]:
# 1: Instantiate two `DataGenerator` objects: One for training, one for validation.

# Optional: If I have enough memory, consider loading the images into memory for the reasons explained above.

# Use the generators below if there are no HDF5 datasets created yet
#train_dataset = DataGenerator(load_images_into_memory=False, hdf5_dataset_path=None)
#val_dataset = DataGenerator(load_images_into_memory=False, hdf5_dataset_path=None)
# Use the generators below if HDF5 datasets have been previously created
train_dataset = DataGenerator(load_images_into_memory=False, hdf5_dataset_path='dataset_train.h5')
val_dataset = DataGenerator(load_images_into_memory=False, hdf5_dataset_path='dataset_validation.h5')

# 2: Parse the image and label lists for the training and validation datasets.

# TODO: Set the paths to the datasets here.

# The directory that contains the images and CSV files
#if not os.path.exists('./images_dir/py-keras-job-dir/all_videos'):
#    os.makedirs('./images_dir/py-keras-job-dir/all_videos')
#bucket_name = 'gcp_storage_bucket'
#folder_id = 'py-keras-job-dir/all_videos'
#client = storage.Client()
#folder = './images_dir'
#bucket = client.get_bucket(bucket_name)
#blobs = bucket.list_blobs(prefix=folder_id) # List all objects in specified folder.
#for blob in blobs:
#    destination_uri = '{}/{}'.format(folder, blob.name) 
#    blob.download_to_filename(destination_uri)

# Path to images
#images_dir = './images_dir/py-keras-job-dir/all_videos/'

# Path to CSV files of the ground truth annotation labels
#train_labels_filename = './images_dir/py-keras-job-dir/all_videos/labels_train.csv'
#val_labels_filename = './images_dir/py-keras-job-dir/all_videos/labels_validation.csv'

#train_dataset.parse_csv(images_dir=images_dir,
#                        labels_filename=train_labels_filename,
#                        input_format=['image_name', 'xmin', 'xmax', 'ymin', 'ymax', 'class_id'], # This is the order of the first six columns in the CSV file that contains the labels for my dataset.
#                        include_classes='all')

#val_dataset.parse_csv(images_dir=images_dir,
#                      labels_filename=val_labels_filename,
#                      input_format=['image_name', 'xmin', 'xmax', 'ymin', 'ymax', 'class_id'],
#                      include_classes='all')

# Optional: Convert the dataset into an HDF5 dataset. This will require more disk space, but will
# speed up the training. Doing this is not relevant in case I activated the `load_images_into_memory`
# option in the constructor, because in that case the images are in memory already anyway. If I don't
# want to create HDF5 datasets, comment out the subsequent two function calls.

#train_dataset.create_hdf5_dataset(file_path='dataset_train.h5',
#                                  resize=False,
#                                  variable_image_size=True,
#                                  verbose=True)

#val_dataset.create_hdf5_dataset(file_path='dataset_validation.h5',
#                                resize=False,
#                                variable_image_size=True,
#                                verbose=True)

Loading labels: 100%|██████████| 323202/323202 [04:22<00:00, 1229.75it/s]
Loading image IDs: 100%|██████████| 323202/323202 [00:53<00:00, 6051.44it/s]
Loading labels: 100%|██████████| 80855/80855 [01:03<00:00, 1263.55it/s]
Loading image IDs: 100%|██████████| 80855/80855 [00:13<00:00, 6044.01it/s]


In [5]:
# 3: Set the batch size.

batch_size = 32 # Change the batch size if I like, or if I run into GPU memory issues.

# 4: Set the image transformations for pre-processing and data augmentation options.

# For the training generator:
ssd_data_augmentation = SSDDataAugmentation(img_height=img_height,
                                            img_width=img_width,
                                            background=mean_color)

# For the validation generator:
convert_to_3_channels = ConvertTo3Channels()
resize = Resize(height=img_height, width=img_width)

# 5: Instantiate an encoder that can encode ground truth labels into the format needed by the SSD loss function.

# The encoder constructor needs the spatial dimensions of the model's predictor layers to create the anchor boxes.
predictor_sizes = [model.get_layer('conv4_3_norm_mbox_conf').output_shape[1:3],
                   model.get_layer('fc7_mbox_conf').output_shape[1:3],
                   model.get_layer('conv6_2_mbox_conf').output_shape[1:3],
                   model.get_layer('conv7_2_mbox_conf').output_shape[1:3],
                   model.get_layer('conv8_2_mbox_conf').output_shape[1:3],
                   model.get_layer('conv9_2_mbox_conf').output_shape[1:3]]

ssd_input_encoder = SSDInputEncoder(img_height=img_height,
                                    img_width=img_width,
                                    n_classes=n_classes,
                                    predictor_sizes=predictor_sizes,
                                    scales=scales,
                                    aspect_ratios_per_layer=aspect_ratios,
                                    two_boxes_for_ar1=two_boxes_for_ar1,
                                    steps=steps,
                                    offsets=offsets,
                                    clip_boxes=clip_boxes,
                                    variances=variances,
                                    matching_type='multi',
                                    pos_iou_threshold=0.5,
                                    neg_iou_limit=0.5,
                                    normalize_coords=normalize_coords)

# 6: Create the generator handles that will be passed to Keras' `fit_generator()` function.

train_generator = train_dataset.generate(batch_size=batch_size,
                                         shuffle=True,
                                         transformations=[ssd_data_augmentation],
                                         label_encoder=ssd_input_encoder,
                                         returns={'processed_images',
                                                  'encoded_labels'},
                                         keep_images_without_gt=False)

val_generator = val_dataset.generate(batch_size=batch_size,
                                     shuffle=False,
                                     transformations=[convert_to_3_channels,
                                                      resize],
                                     label_encoder=ssd_input_encoder,
                                     returns={'processed_images',
                                              'encoded_labels'},
                                     keep_images_without_gt=False)

# Get the number of samples in the training and validation datasets.
train_dataset_size = train_dataset.get_dataset_size()
val_dataset_size   = val_dataset.get_dataset_size()

print("Number of images in the training dataset:\t{:>6}".format(train_dataset_size))
print("Number of images in the validation dataset:\t{:>6}".format(val_dataset_size))

Number of images in the training dataset:	323202
Number of images in the validation dataset:	 80855


## 4. Set the remaining training parameters

The optimizer and the batch size are already set above, so this section sets the remaining training parameters. One epoch consists of 11,000 training steps. The next code cell defines a learning rate schedule that approximates the schedule of the original Caffe implementation for the training of the SSD300 Pascal VOC "07+12" model. That model was trained for 120,000 steps with a learning rate of 0.001 for the first 80,000 steps, 0.0001 for the next 20,000 steps, and 0.00001 for the last 20,000 steps. When training on a different dataset, I can define the learning rate schedule however I see fit. In this case, I will train for 15 epochs, which corresponds to 165,000 steps. I will use a learning rate of 0.001 for the first 88,000 steps (epochs 0 to 7), 0.0001 for the next 44,000 steps (epochs 8 to 11), and 0.00001 for the last 33,000 steps (epochs 12 to 14).

We'll set only a few essential Keras callbacks below; feel free to add more. We need the learning rate scheduler, and we want to save the model during the training. It also makes sense to stream the training history to a CSV log file after every epoch. Otherwise, if the training terminates with an exception or the kernel of this Jupyter notebook dies, we would lose the entire history of the epochs trained so far. Finally, we'll add a callback that terminates the training if the loss becomes `NaN`. Depending on the optimizer, the loss can become `NaN` during the first iterations of the training; in later iterations it's less of a risk. For example, a `NaN` loss has never been observed when training an SSD with an Adam optimizer, but it has been observed a couple of times during the first few hundred training steps of a new model with an SGD optimizer.

In [6]:
# Define a learning rate schedule.

def lr_schedule(epoch):
    if epoch < 8:
        return 0.001
    elif epoch < 12:
        return 0.0001
    else:
        return 0.00001
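
As a quick sanity check, the following sketch repeats the schedule from the cell above and counts how many training steps run at each learning rate. It assumes 11,000 steps per epoch and 15 epochs, as used in this notebook.

```python
from collections import Counter

steps_per_epoch = 11000
final_epoch = 15


def lr_schedule(epoch):
    # Same schedule as in the cell above; `epoch` is zero-based.
    if epoch < 8:
        return 0.001
    elif epoch < 12:
        return 0.0001
    else:
        return 0.00001


# Accumulate the number of training steps spent at each learning rate.
steps_per_lr = Counter()
for epoch in range(final_epoch):
    steps_per_lr[lr_schedule(epoch)] += steps_per_epoch

for lr, n_steps in steps_per_lr.items():
    print("learning rate {:g}: {:,} steps".format(lr, n_steps))
```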

In [7]:
# Define model callbacks.

# TODO: Set the filepath under which I want to save the model.
if not os.path.exists('./model_output'):
    os.makedirs('./model_output')

model_checkpoint = ModelCheckpoint(filepath='./model_output/ssd300_epoch-{epoch:02d}_loss-{loss:.4f}_val_loss-{val_loss:.4f}.h5',
                                   monitor='val_loss',
                                   verbose=1,
                                   save_best_only=False,
                                   save_weights_only=False,
                                   mode='auto',
                                   period=1)

#model_checkpoint.best = 

csv_logger = CSVLogger(filename='./model_output/ssd300_training_log.csv',
                       separator=',',
                       append=True)

learning_rate_scheduler = LearningRateScheduler(schedule=lr_schedule,
                                                verbose=1)

terminate_on_nan = TerminateOnNaN()

# TODO: Set the filepath under which I want to save the Tensorboard output.
tensorboard_cb = TensorBoard(log_dir='./model_output/keras_tensorboard',
                             histogram_freq=0)

callbacks = [model_checkpoint,
             csv_logger,
             learning_rate_scheduler,
             terminate_on_nan,
             tensorboard_cb]
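
If the kernel dies, the `history` object is lost, but the CSV log survives. The sketch below reads the log back into a dictionary of lists. The helper name `read_training_log` is hypothetical, and the code assumes the file has a single header row and only numeric values, which should be checked against the actual log.

```python
import csv
import os


def read_training_log(path):
    """Read a CSV training log into a dict that maps column names to lists of floats."""
    logged = {}
    with open(path, newline='') as log_file:
        for row in csv.DictReader(log_file):
            for name, value in row.items():
                logged.setdefault(name, []).append(float(value))
    return logged


log_path = './model_output/ssd300_training_log.csv'
if os.path.exists(log_path):
    logged = read_training_log(log_path)
    print("Logged epochs:", len(logged.get('epoch', [])))
    print("Best validation loss so far:", min(logged['val_loss']))
```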

## 5. Train

In order to utilize the full training dataset and approximate the training of the "07+12" model mentioned above, I will train 11,000 steps per epoch for 15 epochs. At 11,000 steps, each epoch is expected to take 14 hours 10 minutes running on a K80 GPU. Training for 15 epochs, this corresponds to 165,000 steps trained over 212.5 hours, or approximately 9 days. Because this is a good chunk of time, I might not want to do all 15 epochs in one go and instead train only for a few epochs at a time.
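
The numbers in this paragraph follow from a few lines of arithmetic. The sketch below uses the dataset sizes printed earlier in this notebook and the stated 14 hours 10 minutes per epoch; the per-step time is derived from those figures, not measured.

```python
from math import ceil

train_images, val_images = 323202, 80855  # dataset sizes printed above
batch_size = 32
steps_per_epoch = 11000
epochs = 15
hours_per_epoch = 14 + 10 / 60  # 14 h 10 min on a K80, as stated above

passes_per_epoch = steps_per_epoch * batch_size / train_images
total_steps = steps_per_epoch * epochs
total_hours = hours_per_epoch * epochs
seconds_per_step = hours_per_epoch * 3600 / steps_per_epoch
validation_steps = ceil(val_images / batch_size)

print("Passes over the training set per epoch: {:.2f}".format(passes_per_epoch))
print("Total steps: {:,}".format(total_steps))
print("Total time: {:.1f} hours ({:.1f} days)".format(total_hours, total_hours / 24))
print("Seconds per step: {:.2f}".format(seconds_per_step))
print("Validation steps per epoch:", validation_steps)
```

With 11,000 steps at a batch size of 32, each epoch covers slightly more than one full pass over the training images.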

In order to only run a partial training and resume smoothly later on, there are a few things to note:
1. Always load the full model if I can, rather than building a new model and loading previously saved weights into it. Optimizers like SGD or Adam keep running averages of past gradient moments internally. If I always save and load full models when resuming a training, then the state of the optimizer is maintained and the training picks up exactly where it left off. If I build a new model and load weights into it, the optimizer is being initialized from scratch, which, especially in the case of Adam, leads to small but unnecessary setbacks every time I resume the training with previously saved weights.
2. In order for the learning rate scheduler callback above to work properly, `fit_generator()` needs to know which epoch we're in, otherwise it will start with epoch 0 every time I resume the training. Set `initial_epoch` to be the next epoch of my training. Note that this parameter is zero-based, i.e. the first epoch is epoch 0. If I had trained for 10 epochs previously and now I want to resume the training from there, I'd set `initial_epoch = 10` (since epoch 10 is the eleventh epoch). Furthermore, set `final_epoch` to the last epoch I want to run. To stick with the previous example, if I had trained for 10 epochs previously and now I want to train for another 10 epochs, I'd set `initial_epoch = 10` and `final_epoch = 20`.
3. If I'm only saving the best models, in order for the model checkpoint callback above to work correctly after a kernel restart, set `model_checkpoint.best` to the best validation loss from the previous training. If I don't do this and a new `ModelCheckpoint` object is created after a kernel restart, that object obviously won't know what the last best validation loss was, so it will always save the weights of the first epoch of my new training and record that loss as its new best loss.
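
The resume settings from the list above can be derived from the checkpoint file names. The sketch below is one possible way to do it, with the hypothetical helper `resume_settings`. It assumes the file name pattern of the `ModelCheckpoint` callback in this notebook, and that the epoch number in the file name is one-based, so that it equals the zero-based `initial_epoch` of the next run. The second file name in the example is made up for illustration.

```python
import re

CHECKPOINT_PATTERN = re.compile(
    r'ssd300_epoch-(\d+)_loss-(\d+\.\d+)_val_loss-(\d+\.\d+)\.h5$')


def resume_settings(checkpoint_names):
    """Return (latest checkpoint name, initial_epoch, best val_loss) from file names."""
    parsed = []
    for name in checkpoint_names:
        match = CHECKPOINT_PATTERN.search(name)
        if match:
            parsed.append((int(match.group(1)), float(match.group(3)), name))
    if not parsed:
        # No checkpoints found: start from scratch.
        return None, 0, float('inf')
    latest_epoch, _, latest_name = max(parsed)
    best_val_loss = min(val_loss for _, val_loss, _ in parsed)
    return latest_name, latest_epoch, best_val_loss


names = ['ssd300_epoch-01_loss-3.5443_val_loss-4.1877.h5',
         'ssd300_epoch-02_loss-3.1000_val_loss-3.9000.h5']  # second name is hypothetical
model_path, initial_epoch, best_val_loss = resume_settings(names)
print(model_path, initial_epoch, best_val_loss)
```

After loading the model from `model_path`, `best_val_loss` would only need to be assigned to `model_checkpoint.best` if `save_best_only=True` is used.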

In [8]:
# If I'm resuming a previous training, set `initial_epoch` and `final_epoch` accordingly.
initial_epoch   = 0
final_epoch     = 15
steps_per_epoch = 11000

history = model.fit_generator(generator=train_generator,
                              steps_per_epoch=steps_per_epoch,
                              epochs=final_epoch,
                              callbacks=callbacks,
                              validation_data=val_generator,
                              validation_steps=ceil(val_dataset_size/batch_size),
                              initial_epoch=initial_epoch)



Epoch 2/2

Epoch 00002: LearningRateScheduler setting learning rate to 0.001.
   43/11000 [..............................] - ETA: 13:45:56 - loss: 2.9706 - acc: 0.5474

In [17]:
# Plotting the training and validation loss across epochs
plt.figure(figsize=(20,12))
plt.title('Keras model loss')
plt.plot(history.history['loss'], label='training')
plt.plot(history.history['val_loss'], label='validation')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(loc='upper right', prop={'size': 24})
plt.savefig('loss.png')
plt.show()

# Plotting the training and validation accuracy across epochs
plt.figure(figsize=(20,12))
plt.title('Keras model accuracy')
plt.plot(history.history['acc'], label='training')
plt.plot(history.history['val_acc'], label='validation')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(loc='lower right', prop={'size': 24})
plt.savefig('accuracy.png')
plt.show()