<h1 style="color:white;background-color:rgb(255, 108, 0);padding-top:1em;padding-bottom:0.7em;padding-left:1em;">Optical flow-based segmentation of moving objects for mobile robot navigation using pre-trained deep learning models</h1>
<hr>

<h2>Auxiliary materials</h2>

This notebook contains helper functions that implement the compound training strategy for binary image segmentation introduced in our paper.
<br>
Training is based on a combination of two loss functions:

- The Dice loss and
- The Cross Entropy loss

<h3>Dice loss</h3>

The Dice loss is based on the Sørensen–Dice coefficient (Dice coefficient), which measures the overlap between two sets A and B:

$$
\text{Dice} = \frac{2|A\cap B|}{|A|+|B|}.
$$
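As a small worked example of the set formula, the sketch below treats each flattened binary mask as the set of indices of its foreground (label 1) pixels. The helper names `dice` and `foreground` and the toy masks are hypothetical, and returning 1.0 for two empty sets is a convention assumed here, not something the formula defines.

```python
def dice(a, b):
    """Dice coefficient of two finite sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # assumed convention: two empty masks match perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def foreground(mask):
    """Indices of the pixels labelled 1 in a flattened binary mask."""
    return {i for i, v in enumerate(mask) if v == 1}


predicted = [1, 1, 0, 0, 1, 0]     # A = {0, 1, 4}
ground_truth = [1, 0, 0, 1, 1, 0]  # B = {0, 3, 4}
print(dice(foreground(predicted), foreground(ground_truth)))
```

Here $|A \cap B| = 2$ and $|A| + |B| = 6$, so the coefficient is $4/6 \approx 0.667$.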

In the case of binary segmentation, A and B are the predicted and ground-truth segmentation maps, respectively. If the pixels of the segmentation maps are interpreted not as binary values but as probabilities, the Soft Dice coefficient can be used instead:

$$
\text{Soft Dice} = \frac{2\sum\limits_{i=1}^N y_i p_i}{\sum\limits_{i=1}^N p_i + \sum\limits_{i=1}^N y_i},
$$

where $N$ is the number of pixels in the output segmentation map,
<br>
$y_i$ is the correct label for the $i^{th}$ output pixel (0 or 1) and
<br>
$p_i$ is the predicted probability of the $i^{th}$ pixel having a correct label of 1.

The Dice loss is then defined from the Soft Dice coefficient as:

$$
L_{SD} = 1 - \text{Soft Dice}.
$$
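To make the two formulas concrete, here is a plain-Python sketch (no TensorFlow) of the Soft Dice coefficient and the Dice loss on a single flattened map of probabilities. The function names and toy values are illustrative only; unlike the TensorFlow implementation in this notebook, this version has no smoothing term, so it is undefined when both $\sum p_i$ and $\sum y_i$ are zero.

```python
def soft_dice(y, p):
    """Soft Dice coefficient: 2 * sum(y_i * p_i) / (sum(p_i) + sum(y_i)).

    y: ground-truth labels (0 or 1), p: predicted foreground probabilities.
    No smoothing term, so it is undefined when both sums are zero.
    """
    num = 2 * sum(yi * pi for yi, pi in zip(y, p))
    den = sum(p) + sum(y)
    return num / den


def dice_loss(y, p):
    """Dice loss L_SD = 1 - Soft Dice."""
    return 1 - soft_dice(y, p)


y = [1, 0, 0, 1, 1, 0]
p = [0.9, 0.8, 0.1, 0.2, 0.7, 0.0]
print(soft_dice(y, p), dice_loss(y, p))

# With hard 0/1 predictions the Soft Dice equals the set-based Dice coefficient
print(soft_dice(y, [1, 1, 0, 0, 1, 0]))
```

For the probabilities above the numerator is $2(0.9 + 0.2 + 0.7) = 3.6$ and the denominator is $2.7 + 3 = 5.7$, giving a Soft Dice of about 0.632.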

<br>
<br>
The following code implements the Soft Dice coefficient and the Dice loss:
<br>
(Please note that we assume a sigmoid activation function in the output layer: the functions take the raw logits and apply the sigmoid internally. If another activation function is used in the output layer, the code should be modified accordingly.)

```python
import tensorflow as tf  # TensorFlow 1.x API (tf.variable_scope, tf.Session)


def dice_coeff(labels, logits):
    '''This function computes the Soft Sørensen–Dice coefficient.
    
    Args:
        labels - (batch, height x width) shaped tensor of the label maps for the binary segmentation problem
        logits - (batch, height x width) shaped tensor of the logits
    
    Returns:
        Tensor of shape () containing the Soft Sørensen–Dice coefficient for labels and logits, averaged over the batch
    '''
    
    with tf.variable_scope('Dice_Coefficient'):
        labels = tf.cast(labels, tf.float32)
        # The sigmoid turns the logits into per-pixel foreground probabilities
        predictions = tf.nn.sigmoid(tf.cast(logits, tf.float32))
        # A small smoothing constant keeps the ratio defined for empty label maps
        num = tf.reduce_sum(2*labels*predictions, axis=1) + tf.constant(0.01)
        den = tf.reduce_sum(labels, axis=1) + tf.reduce_sum(predictions, axis=1) + tf.constant(0.01)
        # Per-sample coefficients are averaged over the batch
        dice_coeff = tf.reduce_mean(num/den, name='dice_coeff')
    return dice_coeff


def dice_loss(labels, logits):
    '''This function computes the Dice loss for binary classification
    
    Args:
        labels - (batch, height x width) shaped tensor of the label maps for the binary segmentation problem
        logits - (batch, height x width) shaped tensor of the logits
        
    Returns:
        Tensor of shape () containing the Dice loss for labels and logits. This value is bounded between 0 and 1.
    '''
    
    with tf.variable_scope('Dice_Loss'):
        dice_loss = tf.identity(1-dice_coeff(labels, logits), name='dice_loss')
    return dice_loss
```
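When unit-testing the graph, it can help to have a reference value computed outside TensorFlow. The sketch below is a hypothetical pure-Python mirror of `dice_coeff` and `dice_loss` (the names `dice_coeff_ref`, `dice_loss_ref` and `SMOOTH` are ours): it applies a sigmoid to the logits, adds the same 0.01 smoothing constant to numerator and denominator, and averages per-sample scores over the batch. Like the TensorFlow functions, it expects each sample to be already flattened to a (height x width) row; if a model produces (batch, height, width, 1) maps, one way to flatten them in TensorFlow is `tf.reshape(x, [tf.shape(x)[0], -1])`.

```python
import math

SMOOTH = 0.01  # same smoothing constant as tf.constant(0.01) in dice_coeff


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dice_coeff_ref(labels, logits):
    """Pure-Python mirror of dice_coeff.

    labels, logits: lists of rows, one row per sample, each row a flattened
    (height x width) map. Returns the smoothed Soft Dice averaged over the batch.
    """
    scores = []
    for y, z in zip(labels, logits):
        p = [sigmoid(v) for v in z]  # sigmoid applied to the logits, as in dice_coeff
        num = sum(2 * yi * pi for yi, pi in zip(y, p)) + SMOOTH
        den = sum(y) + sum(p) + SMOOTH
        scores.append(num / den)
    return sum(scores) / len(scores)  # corresponds to tf.reduce_mean over the batch


def dice_loss_ref(labels, logits):
    """Mirror of dice_loss: 1 - dice_coeff."""
    return 1 - dice_coeff_ref(labels, logits)


# The second sample has an empty label map: the smoothing keeps its score defined
labels = [[1, 0], [0, 0]]
logits = [[20.0, -20.0], [-20.0, -20.0]]
print(dice_coeff_ref(labels, logits))
```

Because of the smoothing term, a confident all-background prediction on an empty label map scores close to 1 instead of producing a division by zero.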

<h3>Compound loss</h3>

The compound loss is a linear combination of the Dice loss and the Cross Entropy loss:

$$
L = (1-\alpha)L_{CE}+\alpha L_{SD},
$$

where $\alpha \in [0, 1]$ is the coefficient that controls how much the Dice loss and the Cross Entropy loss contribute to the compound loss,
<br>
$L_{CE}$ is the Cross Entropy loss and
<br>
$L_{SD}$ is the Dice loss.
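A short numeric sketch of the weighting, in plain Python on probabilities rather than logits: it computes the mean binary Cross Entropy $L_{CE}$ and the unsmoothed Dice loss $L_{SD}$ for a toy four-pixel map and combines them for a few values of $\alpha$. The helper names and values are illustrative assumptions; the probabilities must lie strictly between 0 and 1 for the logarithms to be defined.

```python
import math


def cross_entropy(y, p):
    """Mean binary cross entropy L_CE over pixels; p must lie strictly in (0, 1)."""
    terms = [yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p)]
    return -sum(terms) / len(terms)


def dice_loss(y, p):
    """L_SD = 1 - Soft Dice (no smoothing)."""
    return 1 - 2 * sum(yi * pi for yi, pi in zip(y, p)) / (sum(p) + sum(y))


def compound(y, p, alpha):
    """L = (1 - alpha) * L_CE + alpha * L_SD."""
    return (1 - alpha) * cross_entropy(y, p) + alpha * dice_loss(y, p)


y = [1, 0, 1, 0]
p = [0.8, 0.3, 0.6, 0.1]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, compound(y, p, alpha))
```

At $\alpha = 0$ the result is exactly $L_{CE}$, at $\alpha = 1$ it is exactly $L_{SD}$, and intermediate values interpolate linearly between the two.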

<br>
<br>
The following function implements the compound loss (for the Cross Entropy term we use TensorFlow's built-in `tf.nn.sigmoid_cross_entropy_with_logits`):

```python
def compound_loss(labels, logits):
    '''This function computes the compound loss for the binary classification problem
    
    Args:
        labels - (batch, height x width) shaped tensor of the label maps for the binary segmentation problem
        logits - (batch, height x width) shaped tensor of the logits
                
    Returns:
        Tuple of a tensor of shape () containing the compound loss for labels and logits for a certain alpha and
        a tensor to set the value of alpha during training with the feed_dict parameter of the tf.Session.run()
        call. Alpha is a scalar to control how much the Dice loss and the Cross Entropy loss contribute to the
        compound loss (default value is 0, the value for alpha should be in the [0,1] interval). A value of 0 means
        that the compound loss is entirely computed from the Cross Entropy loss and a value of 1 means that the
        compound loss is entirely computed from the Dice loss. Please note that setting a value for alpha that is
        smaller than 0 or greater than 1 does not raise an error; the value is clipped to 0 or 1, respectively.
    '''
    
    with tf.variable_scope('Compound_Loss'):
        # Scalar that defaults to 0 but can be overridden through feed_dict at every step
        alpha = tf.placeholder_with_default(tf.constant(0.0, dtype=tf.float32), shape=[], name='alpha')
        # Out-of-range values are clipped to [0, 1] instead of raising an error
        loss_weight = tf.clip_by_value(alpha, 0.0, 1.0, name='loss_weight')
        # Mean sigmoid Cross Entropy over all pixels of the batch
        cross_entropy = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
            labels=tf.cast(labels, tf.float32), logits=tf.cast(logits, tf.float32)))
        compound_loss = (1-loss_weight)*cross_entropy + loss_weight*dice_loss(labels, logits)
    return (compound_loss, alpha)
```
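For reference outside the graph, the sketch below mirrors the two remaining pieces of `compound_loss` in plain Python: the per-element Cross Entropy in the numerically stable logits form given in the TensorFlow documentation of `tf.nn.sigmoid_cross_entropy_with_logits`, namely `max(x, 0) - x * z + log(1 + exp(-abs(x)))` for logits `x` and labels `z`, and the clipping of alpha performed by `tf.clip_by_value`. The function names are hypothetical helpers, not part of the notebook's code.

```python
import math


def sigmoid_ce_with_logits(z, x):
    """Per-element loss for label z and logit x in the stable form
    max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))


def mean_ce(labels, logits):
    """Mean over all batch and pixel elements, like tf.reduce_mean(...)."""
    values = [sigmoid_ce_with_logits(z, x)
              for row_z, row_x in zip(labels, logits)
              for z, x in zip(row_z, row_x)]
    return sum(values) / len(values)


def loss_weight(alpha):
    """Mirror of tf.clip_by_value(alpha, 0, 1)."""
    return min(max(alpha, 0.0), 1.0)


# A naive -log(1 - sigmoid(1000)) would evaluate log(0); the stable form does not
print(sigmoid_ce_with_logits(0, 1000.0))
print(loss_weight(-0.2), loss_weight(0.3), loss_weight(1.7))
```

Working directly on logits is why the model outputs are passed to the loss functions without the sigmoid activation.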

<h3>Usage</h3>

The code below shows an example of how the compound loss can be used in a training loop when the number of iterations is known a priori. We found that setting the value of alpha according to

$$
\alpha = \frac{\left(\dfrac{j}{n}\right)^4}{1.6},
$$

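where $n$ is the number of iterations and $j$ is the current iteration number, is a reasonable choice. With this schedule $\alpha$ starts at 0 and approaches $1/1.6 = 0.625$ at the end of training, so the Cross Entropy loss dominates early on and the Dice loss gradually gains weight.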
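Before launching a long run it can be useful to tabulate the schedule. The sketch below evaluates $\alpha = (j/n)^4 / 1.6$ at a few points; the function name `alpha_schedule` and the value `n = 10000` are illustrative assumptions, not settings from the paper.

```python
def alpha_schedule(j, n):
    """alpha = (j / n)^4 / 1.6 for iteration j out of n."""
    return (float(j) / n) ** 4 / 1.6


n = 10000  # illustrative number of iterations
for j in (0, n // 4, n // 2, 3 * n // 4, n - 1):
    print(j, round(alpha_schedule(j, n), 4))
```

The fourth power keeps $\alpha$ small for most of training: at the halfway point it is only $0.5^4 / 1.6 \approx 0.039$.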

```python
# Import the training data or create a data stream and define the tensor for the ground-truth labels:

# ...
labels = ...  # ground-truth label tensor of shape (batch, height x width)


# Define a model and collect its outputs, without the sigmoid activation, in logits:

# ...
logits = ...  # raw model outputs of shape (batch, height x width)


# Define the compound loss and the training step operation (minimize the compound loss):

cl, alpha = compound_loss(labels, logits)
optimizer = ...  # e.g. a tf.train optimizer
train_op = optimizer.minimize(cl)


# Number of training iterations:

n = ...


# Training loop:

with tf.Session() as sess:
    # Initialization operations:
    sess.run(tf.global_variables_initializer())
    # ...

    # Training: alpha is fed at every step according to alpha = (j/n)^4 / 1.6
    for j in range(n):
        sess.run(train_op, feed_dict={alpha: (float(j) / n) ** 4 / 1.6})
```