## Training Routines in TF-Slim

*by Marvin Bertin*
<img src="../images/tensorflow.png" width="400">

**Training Deep Learning Models**

To train a TensorFlow model, you need:
- a model represented as a computational graph.
- a loss function to minimize.
- the gradients of the loss with respect to the model weights, used to backpropagate the error signal.
- a training routine that iteratively performs the steps above and updates the weights accordingly.
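
The four ingredients above can be sketched without TensorFlow at all. The following is a minimal pure-Python illustration, not TF-Slim code: the one-weight model, the toy data, and the names `predict`, `mse_loss`, `mse_grad`, and `train` are all assumptions made for this example.

```python
# Toy data generated from y = 3 * x; the model is a single weight w.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x for x in xs]

def predict(w, x):
    """The 'model': a one-parameter linear function."""
    return w * x

def mse_loss(w):
    """Loss function: mean squared error over the toy data."""
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_grad(w):
    """Analytic gradient of the loss with respect to w."""
    return sum(2.0 * (predict(w, x) - y) * x for x, y in zip(xs, ys)) / len(xs)

def train(w=0.0, learning_rate=0.01, steps=200):
    """Training routine: repeatedly compute the gradient and update w."""
    for _ in range(steps):
        w -= learning_rate * mse_grad(w)
    return w

w_final = train()
print("learned w:", w_final, "final loss:", mse_loss(w_final))
```

In TensorFlow the gradient is derived automatically from the graph rather than written by hand, but the loop has the same shape.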

In the previous lesson we looked at the loss functions provided by TF-Slim.
In this lesson, we'll see that TF-Slim also provides training routines that simplify the training process for neural networks.


<img src="../images/backprop.png" width="800">



In [1]:
import sys  
sys.path.append("../") 

import tensorflow as tf
slim = tf.contrib.slim

%load_ext autoreload
%autoreload 2

**Training Loop**

TF-Slim provides a simple but powerful set of tools for training models.

The TF-Slim training loop takes a `train_op`, built from a user-specified loss function and optimization method, and runs it repeatedly according to the arguments you pass in.

The training loop repeatedly:

1. Measures the loss
2. Computes the gradients
3. Updates the model weights
4. Saves the model to disk (at a configurable interval)


Note that the training loop uses `tf.train.Supervisor`
and its `managed_session` in its implementation, so that worker
processes can recover from failures.
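
To make the save-and-recover idea concrete, here is a minimal pure-Python sketch of a loop that performs the four steps above and resumes from its last checkpoint. It is not how `tf.train.Supervisor` is implemented: the function `run_training`, the JSON checkpoint file, and the one-weight quadratic loss are hypothetical stand-ins chosen for illustration.

```python
import json
import os
import tempfile

def run_training(logdir, number_of_steps, save_every=10):
    """Hypothetical loop: resume from a checkpoint if one exists, then train."""
    ckpt_path = os.path.join(logdir, "checkpoint.json")
    state = {"step": 0, "w": 0.0, "loss": None}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)  # recover after a failure or restart

    while state["step"] < number_of_steps:
        state["loss"] = (state["w"] - 3.0) ** 2   # 1. measure the loss
        grad = 2.0 * (state["w"] - 3.0)           # 2. compute the gradient
        state["w"] -= 0.1 * grad                  # 3. update the weight
        state["step"] += 1
        if state["step"] % save_every == 0:       # 4. save to disk
            with open(ckpt_path, "w") as f:
                json.dump(state, f)
    return state

logdir = tempfile.mkdtemp()
first = run_training(logdir, number_of_steps=20)
resumed = run_training(logdir, number_of_steps=40)  # continues from step 20
print(first["step"], resumed["step"])
```

The second call does not start from scratch: it reads the state written at step 20 and only runs the remaining steps.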

## Training Loop Example

In [None]:
# load data
images, labels = LoadData(...)

# Create a model and make predictions
predictions = MyModel(images)

# Define a loss function
slim.losses.log_loss(predictions, labels)

# Get total model loss and regularization loss
total_loss = slim.losses.get_total_loss()

# Define the optimization method (e.g. SGD, Momentum, RMSProp, AdaGrad, Adam)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)

# create_train_op builds an op that, at each step, runs the update_ops,
# computes the loss and its gradients, and applies the gradients
train_op = slim.learning.create_train_op(total_loss, optimizer)

# Where checkpoints and event files are stored.
logdir = "/logdir/path" 

slim.learning.train(
    train_op,
    logdir,
    number_of_steps=1000, # number of gradient steps
    save_summaries_secs=60, # compute summaries every 60 secs
    save_interval_secs=300) # save model checkpoint every 5 min

## Manipulating Gradients with the Training Operation

TF-Slim's `create_train_op` function also lets you manipulate the gradients before they are applied.

**Gradient norm clipping**

In [None]:
# Create the train_op and clip the gradient norms.
# Gradients whose L2 norm exceeds 4 are rescaled to norm 4,
# which helps avoid 'exploding gradients', especially in RNNs.
train_op = slim.learning.create_train_op(
  total_loss,
  optimizer,
  clip_gradient_norm=4)
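
As an illustration of what norm clipping does to a single gradient, here is a minimal pure-Python sketch. The helper `clip_by_norm` is written for this example and only mirrors the idea; it is not the TF-Slim implementation, and gradients are represented as plain lists of floats.

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale a gradient (list of floats) so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)          # small gradients pass through unchanged
    scale = max_norm / norm
    return [g * scale for g in grad]

small = clip_by_norm([3.0, 0.0], max_norm=4)   # norm 3: left as is
large = clip_by_norm([6.0, 8.0], max_norm=4)   # norm 10: rescaled to norm 4
print(small, large)
```

Clipping changes only the length of the gradient vector, not its direction, so the update still points the same way but takes a bounded step.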

**Gradient scaling**

In [None]:
# Create the train_op and scale the gradients.
# Each listed variable's gradient is multiplied by its coefficient,
# which changes how strongly that variable is updated.

# mapping from variable name to scaling coefficient
gradient_multipliers = {
    'conv1/weights': 2.4,
    'fc8/weights': 5.1,
}

train_op = slim.learning.create_train_op(
  total_loss,
  optimizer,
  gradient_multipliers=gradient_multipliers)
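
The effect of a multiplier map can be sketched in a few lines of pure Python. The function `scale_gradients` and the scalar gradient values below are hypothetical, chosen only to show the name-to-coefficient lookup; variables missing from the map are assumed to keep a multiplier of 1.

```python
def scale_gradients(grads, multipliers):
    """Multiply each named gradient by its coefficient (default 1.0)."""
    return {name: g * multipliers.get(name, 1.0) for name, g in grads.items()}

# Hypothetical scalar gradients keyed by variable name.
grads = {'conv1/weights': 0.5, 'conv2/weights': 0.5, 'fc8/weights': 0.5}
gradient_multipliers = {'conv1/weights': 2.4, 'fc8/weights': 5.1}

scaled = scale_gradients(grads, gradient_multipliers)
print(scaled)
```

With plain gradient descent, multiplying a variable's gradient by a constant has the same effect as giving that variable its own learning rate.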

## Modifying the Update Operation

TF-Slim also provides the option of modifying the update operations. These are the non-gradient operations that run alongside the gradient step at every iteration.

You can:

- Override the default update ops with a custom set of updates.
- Remove the update operations completely.

Update ops matter for layers such as batch normalization, which must perform a series of non-gradient updates during training, such as computing the moving mean and moving variance.

Since BatchNorm is already an implemented layer in TF-Slim, its non-gradient updates are added to the `tf.GraphKeys.UPDATE_OPS` collection, which `create_train_op` runs by default.

In [None]:
# Use an alternative set of update ops:
train_op = slim.learning.create_train_op(
    total_loss,
    optimizer,
    update_ops=my_other_update_ops)

# Force TF-Slim NOT to use ANY update_ops:
train_op = slim.learning.create_train_op(
    total_loss,
    optimizer,
    update_ops=[])
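
To show what a non-gradient update looks like, here is a minimal pure-Python sketch of a moving-mean update of the kind batch normalization performs. The function `update_moving_average`, the decay of 0.9, and the toy batches are assumptions for this example, not values taken from TF-Slim.

```python
def update_moving_average(moving, batch_value, decay=0.9):
    """Non-gradient update: blend the running statistic with the batch value."""
    return decay * moving + (1.0 - decay) * batch_value

moving_mean = 0.0
for batch in ([1.0, 3.0], [2.0, 4.0], [3.0, 5.0]):
    batch_mean = sum(batch) / len(batch)
    moving_mean = update_moving_average(moving_mean, batch_mean)
print(moving_mean)
```

No loss or gradient is involved here, which is why such updates have to be attached to the training step explicitly. Passing `update_ops=[]` would skip them, leaving the moving statistics at their initial values.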

## Load CNN Flower Model

In [2]:
from utils.slim_models import CNNClassifier

image_shape = (64,64,3)
num_class = 5

CNN_model = CNNClassifier("flowers", image_shape , num_class)
CNN_model.examine_model_structure()

Layers
name = CNN_flowers_classifier/conv1/conv1_1/Relu:0             shape = (?, 64, 64, 64)
name = CNN_flowers_classifier/pool1/MaxPool:0                  shape = (?, 32, 32, 64)
name = CNN_flowers_classifier/conv2/conv2_1/Relu:0             shape = (?, 32, 32, 128)
name = CNN_flowers_classifier/conv2/conv2_2/Relu:0             shape = (?, 32, 32, 128)
name = CNN_flowers_classifier/pool2/MaxPool:0                  shape = (?, 16, 16, 128)
name = CNN_flowers_classifier/conv3/conv3_1/Relu:0             shape = (?, 16, 16, 256)
name = CNN_flowers_classifier/conv3/conv3_2/Relu:0             shape = (?, 16, 16, 256)
name = CNN_flowers_classifier/conv3/conv3_3/Relu:0             shape = (?, 16, 16, 256)
name = CNN_flowers_classifier/pool3/MaxPool:0                  shape = (?, 8, 8, 256)
name = CNN_flowers_classifier/fc4/Relu:0                       shape = (?, 1, 1, 1024)
name = CNN_flowers_classifier/fc5/Relu:0                       shape = (?, 1, 1, 1024)
name = CNN_flowers_classifier/p

## Construct a Training Routine

In [None]:
# Make the model.
logits, _ = CNN_model.graph(inputs, weight_decay, dropout)

# Add the loss function to the graph.
one_hot_labels = slim.one_hot_encoding(targets, output_dim)
loss = slim.losses.softmax_cross_entropy(logits, one_hot_labels)

# The total loss is the model's loss plus any regularization losses.
total_loss = slim.losses.get_total_loss()

# Specify the optimizer and create the train op:
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = slim.learning.create_train_op(total_loss, optimizer)

# Run the training inside a session.
final_loss = slim.learning.train(
    train_op,
    logdir=checkpoint_dir,
    number_of_steps=iterations,
    save_summaries_secs=5,
    log_every_n_steps=log_frq)

print("Finished training. Last batch loss:", final_loss)
print("Checkpoint saved in %s" % checkpoint_dir)
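
The loss used in this routine combines one-hot encoding with softmax cross-entropy. The pure-Python sketch below shows the arithmetic for a single example; the helpers `one_hot` and `softmax_cross_entropy` are written for illustration and are not the TF-Slim functions of similar name.

```python
import math

def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def softmax_cross_entropy(logits, one_hot_label):
    """Cross-entropy between softmax(logits) and a one-hot target."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(t * lp for t, lp in zip(one_hot_label, log_probs))

# Five classes, as in the flowers model; the true class is index 2.
uniform_loss = softmax_cross_entropy([0.0] * 5, one_hot(2, 5))
confident_loss = softmax_cross_entropy([0.0, 0.0, 10.0, 0.0, 0.0], one_hot(2, 5))
print(uniform_loss, confident_loss)
```

An untrained five-class model that predicts all classes equally has a cross-entropy of ln(5), about 1.61, which is a useful baseline when reading the loss values during training. The total loss reported by the training loop also includes any regularization losses.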

## Combine Everything Together

In [3]:
from utils.slim_training_evaluation import ModelTrainerEvaluater
from utils.slim_data_provider import DatasetProvider

checkpoint_dir="../models/flowers/"
data_dir = "../data/flowers/"

CNN_trainer = ModelTrainerEvaluater(model=CNN_model,
                           dataset_provider = DatasetProvider(data_dir),
                           data_name="flowers",
                           checkpoint_dir=checkpoint_dir)

## Train CNN

This helper method combines all the training parameters into one call. You can start training by running the command below.

**Training tips**
- Monitor the loss values while training. The model is learning when the loss starts going down.
- Experiment with different parameter configurations; the ones given are just to get you started.
- This is a large neural network, so training can take several hours. It's recommended to train the CNN on a GPU, either locally (if you have one) or in the cloud (e.g. AWS, Google Cloud Platform).

In [5]:
CNN_trainer.train(weight_decay=0.005,
                  dropout=0.5,
                  learning_rate=0.0005,
                  iterations=100,
                  log_frq=100,
                  batch_size=32,
                  data_type='train')

Instructions for updating:
Please switch to tf.summary.FileWriter. The interface and behavior is the same; this is just a rename.
INFO:tensorflow:Starting Session.
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
('Finished training. Last batch loss:', 5.0561266)
Checkpoint saved in ../models/flowers/


## Next Lesson
### Evaluation Metrics in TF-Slim
-  Explore different evaluation metrics provided by TF-Slim

<img src="../images/divider.png" width="100">