### Introduction

In the previous post, we implemented upsampling and verified that it is correct
by comparing it to the implementation in the [scikit-image library](http://scikit-image.org/).
More specifically, we implemented the _FCN-32s_ segmentation network
described in the paper _Fully Convolutional Networks for Semantic Segmentation_.

In this post we perform a simple training run: we take a sample image from the
[PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) dataset along with its annotation,
train our network on it and test the network on the same image. We do it this way
so that everything can also run on a CPU: training takes only 10 iterations to complete.
Another goal of this post is to show that the segmentation our network (FCN-32s) produces is
very coarse, even when we run it on the very image it was trained on. We tackle this problem
with a Conditional Random Field (CRF) post-processing stage, which refines the segmentation
by taking into account both the raw RGB features of the image and the probabilities
produced by our network. The set-up of this post is deliberately simple. A similar approach to segmentation
was described in the paper _Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs_ by Chen et al.
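The CRF stage described above can be illustrated with a deliberately naive sketch. It runs mean-field inference for a fully connected CRF with a Potts compatibility and the two Gaussian kernels used by Chen et al. (an appearance kernel over position and color, and a smoothness kernel over position only). It builds every pairwise kernel value explicitly, so it costs O(N²) time and memory in the number of pixels N and is only usable on tiny images; real images need an efficient implementation such as one based on the permutohedral lattice. The function name `dense_crf_naive` and all parameter values are illustrative assumptions, not the settings used in this post or in the paper.

```python
import numpy as np


def dense_crf_naive(probs, image, n_iters=5, w_appearance=10.0, w_smoothness=3.0,
                    theta_alpha=80.0, theta_beta=13.0, theta_gamma=3.0):
    """Naive mean-field inference for a fully connected CRF with Potts compatibility.

    probs: (H, W, C) per-pixel class probabilities (e.g. the network's softmax output).
    image: (H, W, 3) RGB image.
    Returns refined (H, W, C) probabilities. Memory is O((H*W)^2): tiny images only.
    """
    h, w, c = probs.shape
    n = h * w
    ys, xs = np.mgrid[:h, :w]
    positions = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(np.float64)
    colors = image.reshape(n, 3).astype(np.float64)

    # Squared distances between every pair of pixels, in position and in color.
    d_pos = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(axis=-1)
    d_rgb = ((colors[:, None, :] - colors[None, :, :]) ** 2).sum(axis=-1)

    # Appearance kernel (close pixels with similar color) + smoothness kernel (close pixels).
    kernel = (w_appearance * np.exp(-d_pos / (2 * theta_alpha ** 2) - d_rgb / (2 * theta_beta ** 2))
              + w_smoothness * np.exp(-d_pos / (2 * theta_gamma ** 2)))
    np.fill_diagonal(kernel, 0.0)  # a pixel sends no message to itself

    probs_flat = probs.reshape(n, c).astype(np.float64)
    unary = -np.log(np.clip(probs_flat, 1e-8, 1.0))
    q = probs_flat.copy()
    for _ in range(n_iters):
        messages = kernel @ q  # messages[i, l] = sum_j k(i, j) * Q_j(l)
        # Potts compatibility: label l is penalized by the mass on all other labels.
        pairwise = messages.sum(axis=1, keepdims=True) - messages
        energy = unary + pairwise
        energy -= energy.min(axis=1, keepdims=True)  # numerical stability
        q = np.exp(-energy)
        q /= q.sum(axis=1, keepdims=True)
    return q.reshape(h, w, c)


# Toy example: left half dark, right half bright, one uncertain pixel in the dark half.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:, 2:] = 255
probs = np.zeros((4, 4, 2))
probs[:, :2] = [0.9, 0.1]
probs[:, 2:] = [0.1, 0.9]
probs[0, 0] = [0.4, 0.6]  # the "network" mislabels this pixel
refined = dense_crf_naive(probs, image)
print(probs.argmax(axis=-1)[0, 0], refined.argmax(axis=-1)[0, 0])  # 1 0
```

The mislabeled pixel is pulled back to the label of the neighbors that share its color, which is the kind of boundary-aware cleanup the CRF stage is meant to provide.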

This blog post was created using a Jupyter notebook. After each chunk of code
you can see the result of its evaluation. You can also get the notebook
file from [here](http://google.com). The content of the blog post
is partially borrowed from the [slim walkthrough notebook](https://github.com/tensorflow/models/blob/master/slim/slim_walkthough.ipynb).

### Setup



To run the code, you will need TensorFlow installed. I used _r0.12_.
You will also need [this fork of _tensorflow/models_](https://github.com/tensorflow/models/pull/684).

This tutorial also uses the scikit-image and numpy libraries, plus a few other
dependencies. One way to install them is to download the _Anaconda_
Python distribution.

Follow the other steps described in the previous posts: they show how to download
the _VGG-16_ model and I also forked the

In [1]:
%matplotlib inline

from __future__ import division

import tensorflow as tf
import skimage.io as io
import numpy as np
from nets import vgg
from preprocessing import vgg_preprocessing
from libs.scale_input_image import scale_randomly_image_with_annotation_with_fixed_size_output
from libs.training import get_valid_logits_and_labels
from libs.training import get_labels_from_annotation
# Load the mean pixel values and the function
# that performs the subtraction from each pixel
from preprocessing.vgg_preprocessing import (_mean_image_subtraction,
                                            _R_MEAN, _G_MEAN, _B_MEAN)
import sys
import os
from matplotlib import pyplot as plt
slim = tf.contrib.slim

### Data processing

In [2]:
#0:   background
#1:   aeroplane
#2:   bicycle
#3:   bird
#4:   boat
#5:   bottle
#6:   bus
#7:   car
#8:   cat
#9:   chair
#10:  cow
#11:  diningtable
#12:  dog
#13:  horse
#14:  motorbike
#15:  person
#16:  pottedplant
#17:  sheep
#18:  sofa
#19:  train
#20:  tvmonitor
#255: undefined/don't care
number_of_classes = 21
# Class indices 0-20 followed by 255, the value used for "don't care" pixels.
class_labels = list(range(number_of_classes)) + [255]

### Upsampling helper functions and Image Loading

In this part, we define the helper functions that were used in the previous post.
If you recall, we used upsampling to upsample the downsampled predictions
that we get from our network. The predictions are downsampled because of the max-pooling
layers used in the _VGG-16_ network.
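To see how much the predictions are downsampled, we can track one side length through VGG-16's five max-pooling layers. The sketch below assumes, as in the slim `vgg_16` definition, that every pooling layer is a 2x2 max-pool with stride 2 and `VALID` padding (so odd sizes are rounded down) and that the convolutions use `SAME` padding and keep the size. The helper name `pooled_sizes` is hypothetical. A 500 x 375 input (a common PASCAL VOC size) reproduces exactly the pool4 vs. upsampled-pool5 sizes in the error reported at the end of this notebook.

```python
def pooled_sizes(size, num_pools=5):
    """Side length after each 2x2, stride-2, VALID max-pool (odd sizes are floored)."""
    sizes = []
    for _ in range(num_pools):
        size //= 2
        sizes.append(size)
    return sizes


for side in (500, 375):
    sizes = pooled_sizes(side)
    # sizes[3] is the pool4 size, sizes[4] the pool5 size.
    print(side, sizes, "pool5 * 32 =", sizes[4] * 32,
          "| pool5 * 2 =", sizes[4] * 2, "vs pool4 =", sizes[3])
```

For 500 the sizes are 250, 125, 62, 31, 15: upsampling pool5 by 32 gives 480, not 500, and upsampling it by 2 gives 30 while pool4 has 31 rows.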

We also write the code that loads the image and its ground-truth segmentation.
The code is well-commented, so don't be afraid to read it.

In [3]:
import numpy as np

def get_kernel_size(factor):
    """
    Find the kernel size given the desired factor of upsampling.
    """
    return 2 * factor - factor % 2


def upsample_filt(size):
    """
    Make a 2D bilinear kernel suitable for upsampling of the given (h, w) size.
    """
    factor = (size + 1) // 2
    if size % 2 == 1:
        center = factor - 1
    else:
        center = factor - 0.5
    og = np.ogrid[:size, :size]
    return (1 - abs(og[0] - center) / factor) * \
           (1 - abs(og[1] - center) / factor)


def bilinear_upsample_weights(factor, number_of_classes):
    """
    Create weights matrix for transposed convolution with bilinear filter
    initialization.
    """
    
    filter_size = get_kernel_size(factor)
    
    weights = np.zeros((filter_size,
                        filter_size,
                        number_of_classes,
                        number_of_classes), dtype=np.float32)
    
    upsample_kernel = upsample_filt(filter_size)
    
    for i in range(number_of_classes):
        
        weights[:, :, i, i] = upsample_kernel
    
    return weights
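As a sanity check of these helpers, the sketch below applies the 1D version of the bilinear kernel as a strided transposed convolution in plain numpy. The 2D kernel from `upsample_filt` is the outer product of this 1D kernel with itself, so the 1D case is representative. It assumes the `SAME`-padding alignment of `tf.nn.conv2d_transpose` for an output of exactly `stride * n` samples (half of the `kernel_size - stride` extra samples are dropped at the start). `bilinear_kernel_1d` and `transposed_conv_1d` are hypothetical helpers written only for this illustration. For the final 8x upsampling, `get_kernel_size(8)` gives a 16-tap kernel.

```python
import numpy as np


def get_kernel_size(factor):
    """Kernel size for a given upsampling factor (same formula as above)."""
    return 2 * factor - factor % 2


def bilinear_kernel_1d(size):
    """1D bilinear kernel; upsample_filt(size) equals np.outer of this with itself."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    return 1 - np.abs(np.arange(size) - center) / factor


def transposed_conv_1d(x, kernel, stride):
    """Transposed convolution cropped to len(x) * stride samples, aligned like 'SAME'."""
    n, k = len(x), len(kernel)
    full = np.zeros((n - 1) * stride + k)
    for j, value in enumerate(x):
        full[j * stride:j * stride + k] += value * kernel  # scatter a scaled kernel copy
    pad_before = (k - stride) // 2  # 'SAME' drops half of the extra samples at the start
    return full[pad_before:pad_before + n * stride]


kernel = bilinear_kernel_1d(get_kernel_size(2))
ramp = transposed_conv_1d(np.array([0.0, 1.0, 2.0, 3.0]), kernel, stride=2)
flat = transposed_conv_1d(np.ones(5), bilinear_kernel_1d(get_kernel_size(8)), stride=8)
print(kernel)
print(ramp)
```

Away from the borders, `ramp` rises by 0.5 per output sample (half the input step) and the constant input stays exactly 1, which is what bilinear interpolation should produce; the border samples are attenuated because the transposed convolution implicitly pads with zeros.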

In [4]:
training_filenames = "/home/thalles_silva/DataPublic/PascalVoc2012/train/VOC2012/ImageSets/Segmentation/train.txt"
training_dir = "/home/thalles_silva/DataPublic/PascalVoc2012/train/VOC2012/JPEGImages"
annotations_dir = "/home/thalles_silva/DataPublic/PascalVoc2012/train/VOC2012/SegmentationClass_1D"
log_folder = '/home/thalles_silva/Thalles/image-segmentation/log_folder'
checkpoints_dir = '/home/thalles_silva/Thalles/image-segmentation/vgg'
vgg_checkpoint_path = os.path.join(checkpoints_dir, 'vgg_16.ckpt')

In [5]:
def model_input():
    is_training_placeholder = tf.placeholder(tf.bool)
    return is_training_placeholder

In [6]:
upsample_filter_factor_2_np = bilinear_upsample_weights(factor=2,
                                                        number_of_classes=number_of_classes)
print(upsample_filter_factor_2_np.shape)
# The final upsampling uses stride 8, so it needs the factor-8 kernel (16 x 16).
upsample_filter_factor_8_np = bilinear_upsample_weights(factor=8,
                                                        number_of_classes=number_of_classes)

upsample_filter_factor_2_tensor = tf.constant(upsample_filter_factor_2_np)
upsample_filter_factor_8_tensor = tf.constant(upsample_filter_factor_8_np)

(4, 4, 21, 21)


In [7]:
def model(processed_images, number_of_classes=21, is_training=True):

    with slim.arg_scope(vgg.vgg_arg_scope()):

        _, end_points = vgg.vgg_16(processed_images,
                                   num_classes=number_of_classes,
                                   is_training=is_training,
                                   spatial_squeeze=False,
                                   fc_conv_padding='SAME')


    
    # Use VGG's pool5 feature map instead of the fc8 output. The fc6-fc8 ops are
    # still built, but nothing below depends on them, so they are never evaluated.
    pool5_feature_map = end_points['vgg_16/pool5']

    pool5_logits = slim.conv2d(pool5_feature_map,
                               number_of_classes,
                               [1, 1],
                               activation_fn=None,
                               normalizer_fn=None,
                               scope="seg_vars/pool5",
                               weights_initializer=tf.zeros_initializer)  # (batch, pool5 height, pool5 width, number_of_classes)
    
    pool5_layer_logits_shape = tf.shape(pool5_logits)
    
    # Calculate the output size of the upsampled tensor
    last_layer_upsampled_by_factor_2_logits_shape = tf.stack([
                                                          pool5_layer_logits_shape[0],
                                                          pool5_layer_logits_shape[1] * 2,
                                                          pool5_layer_logits_shape[2] * 2,
                                                          pool5_layer_logits_shape[3]
                                                         ])

    # Perform the upsampling
    last_layer_upsampled_by_factor_2_logits = tf.nn.conv2d_transpose(pool5_logits,
                                                                     upsample_filter_factor_2_tensor,
                                                                     output_shape=last_layer_upsampled_by_factor_2_logits_shape,
                                                                     strides=[1, 2, 2, 1])

    ## Adding the skip here for FCN-16s model

    # VGG end points are keyed by their full scope names, e.g. 'vgg_16/pool4'.
    pool4_features = end_points['vgg_16/pool4']

    # The weights are zero-initialized, so at the start of training this skip
    # contributes nothing and the predictions equal those of the model without it.

    pool4_logits = slim.conv2d(pool4_features,
                               number_of_classes,
                               [1, 1],
                               activation_fn=None,
                               normalizer_fn=None,
                               weights_initializer=tf.zeros_initializer,
                               scope='seg_vars/pool4')

    fused_last_layer_and_pool4_logits = pool4_logits + last_layer_upsampled_by_factor_2_logits

    fused_last_layer_and_pool4_logits_shape = tf.shape(fused_last_layer_and_pool4_logits)

    # Calculate the output size of the upsampled tensor
    fused_last_layer_and_pool4_upsampled_by_factor_2_logits_shape = tf.stack([
                                                                  fused_last_layer_and_pool4_logits_shape[0],
                                                                  fused_last_layer_and_pool4_logits_shape[1] * 2,
                                                                  fused_last_layer_and_pool4_logits_shape[2] * 2,
                                                                  fused_last_layer_and_pool4_logits_shape[3]
                                                                 ])

    # Perform the upsampling
    fused_last_layer_and_pool4_upsampled_by_factor_2_logits = tf.nn.conv2d_transpose(fused_last_layer_and_pool4_logits,
                                                                upsample_filter_factor_2_tensor,
                                                                output_shape=fused_last_layer_and_pool4_upsampled_by_factor_2_logits_shape,
                                                                strides=[1, 2, 2, 1])


    ## Adding the skip here for FCN-8s model
    pool3_features = end_points['vgg_16/pool3']

    # The weights are zero-initialized, so at the start of training this skip
    # contributes nothing and the predictions equal those of the model without it.

    pool3_logits = slim.conv2d(pool3_features,
                               number_of_classes,
                               [1, 1],
                               activation_fn=None,
                               normalizer_fn=None,
                               weights_initializer=tf.zeros_initializer,
                               scope='seg_vars/pool3')


    fused_last_layer_and_pool4_logits_and_pool_3_logits = pool3_logits + \
                                    fused_last_layer_and_pool4_upsampled_by_factor_2_logits


    fused_last_layer_and_pool4_logits_and_pool_3_logits_shape = tf.shape(fused_last_layer_and_pool4_logits_and_pool_3_logits)


    # Calculate the output size of the upsampled tensor
    fused_last_layer_and_pool4_logits_and_pool_3_upsampled_by_factor_8_logits_shape = tf.stack([
                                                                  fused_last_layer_and_pool4_logits_and_pool_3_logits_shape[0],
                                                                  fused_last_layer_and_pool4_logits_and_pool_3_logits_shape[1] * 8,
                                                                  fused_last_layer_and_pool4_logits_and_pool_3_logits_shape[2] * 8,
                                                                  fused_last_layer_and_pool4_logits_and_pool_3_logits_shape[3]
                                                                 ])

    # Perform the upsampling
    fused_last_layer_and_pool4_logits_and_pool_3_upsampled_by_factor_8_logits = tf.nn.conv2d_transpose(fused_last_layer_and_pool4_logits_and_pool_3_logits,
                                                                upsample_filter_factor_8_tensor,
                                                                output_shape=fused_last_layer_and_pool4_logits_and_pool_3_upsampled_by_factor_8_logits_shape,
                                                                strides=[1, 8, 8, 1])
    
    

    return fused_last_layer_and_pool4_logits_and_pool_3_upsampled_by_factor_8_logits
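The three additions in `model` only work when the shapes line up: the 2x-upsampled pool5 scores must match pool4, the 2x-upsampled fused scores must match pool3, and the final 8x upsampling must recover the input size. Assuming, as before, five 2x2 stride-2 `VALID` max-pools that each floor odd sizes, the plain-Python check below shows that this holds exactly when a side length is a multiple of 32, and that padding up to the next multiple of 32 fixes any size. The helper names are hypothetical, and padding is only one possible fix (cropping is another).

```python
def skip_shapes_align(size):
    """Whether the FCN-8s skip additions and the final 8x upsampling line up along one side."""
    pool3 = size // 8  # same as three successive floor divisions by 2
    pool4 = pool3 // 2
    pool5 = pool4 // 2
    return 2 * pool5 == pool4 and 2 * pool4 == pool3 and 8 * pool3 == size


def pad_to_multiple(size, multiple=32):
    """Smallest multiple of `multiple` that is >= size."""
    return (size + multiple - 1) // multiple * multiple


for side in (500, 375, 512):
    padded = pad_to_multiple(side)
    print(side, skip_shapes_align(side), padded, skip_shapes_align(padded))
# 500 False 512 True
# 375 False 384 True
# 512 True 512 True
```

When padding the annotation as well, the padded pixels should get the "don't care" label 255 rather than 0, so they are not treated as background.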

In [8]:
def model_loss(upsampled_by_factor_16_logits, labels):
    #labels = tf.squeeze(labels)
    #valid_labels_batch_tensor, valid_logits_batch_tensor = get_valid_logits_and_labels(annotation_batch_tensor=labels,
    #                                                                                   logits_batch_tensor=upsampled_by_factor_16_logits,
    #                                                                                   class_labels=class_labels)
    


    #cross_entropies = tf.nn.softmax_cross_entropy_with_logits(logits=valid_logits_batch_tensor,
    #                                                          labels=valid_labels_batch_tensor)
    
    cross_entropies = tf.nn.softmax_cross_entropy_with_logits(logits=upsampled_by_factor_16_logits,
                                                              labels=labels)
    cross_entropy_mean = tf.reduce_mean(cross_entropies)
    
    # Add summary op for the loss -- to be able to see it in tensorboard.
    tf.summary.scalar('cross_entropy_loss', cross_entropy_mean)

    # Tensor to get the final prediction for each pixel -- pay 
    # attention that we don't need softmax in this case because
    # we only need the final decision. If we also need the respective
    # probabilities we will have to apply softmax.
    pred = tf.argmax(upsampled_by_factor_16_logits, dimension=3)
    probabilities = tf.nn.softmax(upsampled_by_factor_16_logits)
    
    return cross_entropy_mean, pred, probabilities
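The comment above relies on softmax being monotonic: the argmax of the logits is the same as the argmax of the probabilities. The numpy sketch below checks this on random logits and also shows how a softmax cross-entropy of the form -sum(labels * log(softmax(logits))) treats a pixel whose label row is all zeros: its loss is exactly zero. Assuming `get_labels_from_annotation` (from the `libs` package, not shown here) produces one-hot rows over the 21 real classes, a pixel marked 255 gets such an all-zero row, so it adds nothing to the loss but is still counted by `tf.reduce_mean`. The numpy functions are illustrative re-implementations, not TensorFlow's code.

```python
import numpy as np


def softmax(logits):
    """Softmax over the last (class) axis, shifted by the max for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def softmax_cross_entropy(logits, labels):
    """Per-pixel -sum(labels * log(softmax(logits))) over the class axis."""
    return -(labels * np.log(softmax(logits))).sum(axis=-1)


rng = np.random.RandomState(0)
logits = rng.randn(1, 4, 4, 21)  # (batch, height, width, classes)

# The per-pixel decision is the same with or without softmax.
same_decision = np.array_equal(np.argmax(logits, axis=3), np.argmax(softmax(logits), axis=3))

labels = np.zeros_like(logits)
labels[..., 0] = 1.0         # every pixel labeled "background"...
labels[0, 0, 0, :] = 0.0     # ...except one "don't care" pixel with an all-zero row
per_pixel_loss = softmax_cross_entropy(logits, labels)
print(same_decision, per_pixel_loss.shape)  # True (1, 4, 4)
```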

In [9]:
def model_optimizer(cross_entropy_sum, learning_rate):
    with tf.variable_scope("adam_vars"):
        train_step = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cross_entropy_sum)
    return train_step

In [10]:
# Read the list of training image ids (one per line) and close the file afterwards.
with open(training_filenames, 'r') as f:
    images_filename_list = [line.strip() for line in f]

In [11]:
# get an image for testing the network
image_path = training_dir + "/2008_004453.jpg"
annotation_path = annotations_dir + "/2008_004453.png"

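def pad_to_multiple_of_32(tensor):
    # VGG-16 downsamples by 32 and each max-pool floors odd sizes, so the FCN-8s
    # skip additions only line up when height and width are multiples of 32.
    # Pad at the bottom and right with zeros.
    height, width = tf.shape(tensor)[0], tf.shape(tensor)[1]
    target_height = (height + 31) // 32 * 32
    target_width = (width + 31) // 32 * 32
    return tf.image.pad_to_bounding_box(tensor, 0, 0, target_height, target_width)

image_tensor = tf.read_file(image_path)
image_tensor = tf.image.decode_jpeg(image_tensor, channels=3)
image_tensor = tf.cast(image_tensor, tf.float32)
image_tensor = _mean_image_subtraction(image_tensor, [_R_MEAN, _G_MEAN, _B_MEAN])
# After mean subtraction, zero padding corresponds to the mean pixel.
image_tensor = pad_to_multiple_of_32(image_tensor)
image_tensor = tf.expand_dims(image_tensor, axis=0)  # (1, height, width, 3)

annotation_tensor = tf.read_file(annotation_path)
annotation_tensor = tf.image.decode_png(annotation_tensor, channels=1)  # (height, width, 1)
# Pad with 255 ("don't care") rather than 0 ("background"): pad the
# complement 255 - x with zeros, then take the complement again.
annotation_tensor = 255 - pad_to_multiple_of_32(255 - annotation_tensor)
annotation_masks_tensor = get_labels_from_annotation(tf.squeeze(annotation_tensor), class_labels)
annotation_masks_tensor = tf.expand_dims(annotation_masks_tensor, axis=0)  # (1, height, width, 21)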

In [12]:
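is_training_placeholder = model_input()

# The flag only switches dropout in VGG's fc6/fc7 layers, which are not on the pool5 path used here.
upsampled_by_factor_16_logits = model(image_tensor, number_of_classes=number_of_classes,
                                      is_training=is_training_placeholder)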

cross_entropy_sum, pred, probabilities = model_loss(upsampled_by_factor_16_logits, annotation_masks_tensor)

train_step = model_optimizer(cross_entropy_sum, learning_rate=0.00001)

In [13]:
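# Define the accuracy metric: Mean Intersection Over Union.
# Pixels labeled 255 ("don't care") are outside [0, number_of_classes), so they are
# mapped to class 0 and given zero weight, which excludes them from the metric.
annotation_flat = tf.reshape(annotation_tensor, [-1])
valid_pixels = tf.not_equal(annotation_flat, 255)
miou, update_op = slim.metrics.streaming_mean_iou(predictions=tf.reshape(pred, [-1]),
                                                   labels=tf.where(valid_pixels, annotation_flat,
                                                                   tf.zeros_like(annotation_flat)),
                                                   num_classes=number_of_classes,
                                                   weights=tf.to_float(valid_pixels))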
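Mean intersection over union is computed from a confusion matrix: for each class, IoU = TP / (TP + FP + FN), and the result is the mean over classes. The numpy sketch below makes this explicit and skips pixels labeled 255. It averages only over classes whose denominator is non-zero (classes that appear in neither the labels nor the predictions are left out); we believe TensorFlow 1.x's mean-IoU metric does the same, but treat that detail as an assumption. The `mean_iou` function here is a hypothetical helper, not the `slim` API.

```python
import numpy as np


def mean_iou(labels, predictions, num_classes, ignore_label=255):
    """Mean IoU over classes, ignoring pixels whose label equals ignore_label."""
    labels = np.asarray(labels).ravel()
    predictions = np.asarray(predictions).ravel()
    valid = labels != ignore_label
    # confusion[i, j] counts pixels of true class i predicted as class j.
    codes = num_classes * labels[valid].astype(np.int64) + predictions[valid].astype(np.int64)
    confusion = np.bincount(codes, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    true_positives = np.diag(confusion)
    # TP + FP + FN = predicted count + true count - TP
    denominator = confusion.sum(axis=0) + confusion.sum(axis=1) - true_positives
    present = denominator > 0
    return (true_positives[present] / denominator[present]).mean()


labels = np.array([[0, 0, 1, 1],
                   [0, 255, 1, 1]])
predictions = np.array([[0, 1, 1, 1],
                        [0, 0, 1, 1]])
print(mean_iou(labels, predictions, num_classes=21))
```

Here class 0 has IoU 2/3 and class 1 has IoU 4/5, so the mean is 11/15; the pixel labeled 255 does not count even though its prediction differs.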

In [14]:
# Get the names of all segmentation variables: the 1x1 score layers we add on
# top of pool3, pool4 and pool5 (the upsampling filters are constants).
model_variables = [ var.op.name for var in slim.get_variables(scope="seg_vars") ]

# Now we define a function that loads the weights from the VGG checkpoint
# into our variables when we call it. We exclude the weights of the last layer,
# which is responsible for class predictions, because we predict a different
# number of classes and can't use the old weights as an initialization.
# Our new segmentation variables and the Adam variables are not in the
# checkpoint either, so they are excluded too.
exclude_vars = model_variables + ['vgg_16/fc8', 'adam_vars']
vgg_except_fc8_weights = slim.get_variables_to_restore(exclude=exclude_vars)

# Here we get the variables that belong to the last layer of the network.
# VGG was originally trained on the 1000 ImageNet classes, while
# we predict 21 classes.
vgg_fc8_weights = slim.get_variables_to_restore(include=['vgg_16/fc8'])

adam_optimizer_variables = slim.get_variables_to_restore(include=['adam_vars'])

# Get the segmentation score-layer variables so they can be initialized.
model_variables = slim.get_variables(scope="seg_vars")

# Put all summary ops into one op. Produces string when you run it.
merged_summary_op = tf.summary.merge_all()

# Create the log folder if it doesn't exist yet.
if not os.path.exists(log_folder):
    os.makedirs(log_folder)

# Create the summary writer -- to write all the logs
# into the specified folder. These logs can be later read
# by tensorboard.
summary_string_writer = tf.summary.FileWriter(log_folder)

# Create a function that, when called with a session, restores
# the VGG weights (except fc8) from the checkpoint.
read_vgg_weights_except_fc8_func = slim.assign_from_checkpoint_fn(
                                   vgg_checkpoint_path,
                                   vgg_except_fc8_weights)

In [15]:
# Initializer for the new fc8 weights -- for our 21 classes.
vgg_fc8_weights_initializer = tf.variables_initializer(vgg_fc8_weights)

# Initializer for adam variables
optimization_variables_initializer = tf.variables_initializer(adam_optimizer_variables)

model_vars = tf.variables_initializer(model_variables)

# Create a saver.
saver = tf.train.Saver()

with tf.Session() as sess:

    # Run the initializers.
    read_vgg_weights_except_fc8_func(sess)
    sess.run(vgg_fc8_weights_initializer)
    sess.run(optimization_variables_initializer)
    sess.run(tf.local_variables_initializer())  # counters used by the streaming mIoU metric
    sess.run(model_vars)

    out, train_image, train_annotation = sess.run([upsampled_by_factor_16_logits, image_tensor, annotation_tensor],
                                                  feed_dict={is_training_placeholder: False})
    print(out.shape)

    # Add the channel means back so the input can be shown as RGB values in [0, 1].
    display_image = (train_image[0] + [_R_MEAN, _G_MEAN, _B_MEAN]) / 255.0
    # Show the class indices directly; 255 ("don't care") is clipped to the top color.
    display_annotation = np.squeeze(train_annotation)

    f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
    f.set_figheight(4)
    f.set_figwidth(10)
    ax1.imshow(display_image)
    ax1.set_title('Input image')
    ax2.imshow(display_annotation, vmin=0, vmax=number_of_classes - 1)
    ax2.set_title('Input Ground-Truth Annotation')
    plt.show()

    for step in range(100):

        _, train_loss, pred_np, probabilities_np, _ = sess.run([train_step, cross_entropy_sum, pred, probabilities, update_op],
                                                               feed_dict={is_training_placeholder: True})
        miou_np = sess.run(miou)
        print("Train step:", step, "\tTraining Loss:", train_loss, "\tmIOU:", miou_np)

        f, (ax1, ax2, ax3) = plt.subplots(1, 3, sharey=True)
        f.set_figheight(4)
        f.set_figwidth(16)

        ax1.imshow(pred_np[0], vmin=0, vmax=number_of_classes - 1)
        ax1.set_title('Prediction')
        ax2.imshow(display_annotation, vmin=0, vmax=number_of_classes - 1)
        ax2.set_title('Input Ground-Truth Annotation')
        ax3.imshow(probabilities_np[0, :, :, 0])
        ax3.set_title('Background class probability')
        plt.show()

    summary_string_writer.close()

INFO:tensorflow:Restoring parameters from /home/thalles_silva/Thalles/image-segmentation/vgg/vgg_16.ckpt


InvalidArgumentError: Incompatible shapes: [1,31,23,21] vs. [1,30,22,21]
	 [[Node: add = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](seg_vars/pool4/BiasAdd, conv2d_transpose)]]
	 [[Node: ResizeBilinear/_65 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_331_ResizeBilinear", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op 'add', defined at:
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2698, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2802, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-12-b8cf0e0e9281>", line 3, in <module>
    upsampled_by_factor_16_logits = model(image_tensor, number_of_classes=number_of_classes, is_training=True)
  File "<ipython-input-7-30438eaf8d01>", line 58, in model
    fused_last_layer_and_pool4_logits = pool4_logits + last_layer_upsampled_by_factor_2_logits
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 821, in binary_op_wrapper
    return func(x, y, name=name)
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 73, in add
    result = _op_def_lib.apply_op("Add", x=x, y=y, name=name)
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/thalles_silva/anaconda3/envs/py3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Incompatible shapes: [1,31,23,21] vs. [1,30,22,21]
	 [[Node: add = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](seg_vars/pool4/BiasAdd, conv2d_transpose)]]
	 [[Node: ResizeBilinear/_65 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_331_ResizeBilinear", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
