In [1]:
import torch
from torch import Tensor
import torch.nn.functional as F
import numpy as np

# DICE evaluation metric
In the semantic segmentation lab, you implemented IOU to evaluate the performance of the model. Here, you need to implement a similar evaluation metric called DICE, or the Sørensen–Dice coefficient, which is formulated as: $$ DICE(X, X_{truth}) = \frac{2|X \cap X_{truth}|}{|X| + |X_{truth}|}$$ \
Compared to IOU, DICE counts the intersection twice and normalizes by the sum of the two set sizes rather than by the size of their union. The two metrics are monotonically related, $DICE = \frac{2\,IOU}{1 + IOU}$, so they always rank candidate segmentations of the same target in the same order, but DICE is never smaller than IOU and penalizes a poor overlap less severely.
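
To make the relationship between the two metrics concrete, here is a minimal sketch on binary masks represented as sets of pixel coordinates. The function name `dice_and_iou` and the toy masks are illustrative assumptions, not part of the lab; the sketch only checks numerically that $DICE = \frac{2\,IOU}{1 + IOU}$ on one example.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]

def dice_and_iou(pred: Set[Pixel], truth: Set[Pixel]) -> Tuple[float, float]:
    """Return (DICE, IOU) for two non-empty sets of foreground pixels."""
    inter = len(pred & truth)   # |X ∩ X_truth|
    union = len(pred | truth)   # |X ∪ X_truth|
    dice = 2 * inter / (len(pred) + len(truth))
    iou = inter / union
    return dice, iou

# Hypothetical toy masks: two 2x2 squares that share one column of pixels.
pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 1), (1, 1), (0, 2), (1, 2)}

dice, iou = dice_and_iou(pred, truth)
print(dice, iou)  # intersection = 2, union = 6

# The two metrics are tied together by DICE = 2 * IOU / (1 + IOU).
assert abs(dice - 2 * iou / (1 + iou)) < 1e-12
```

Because the mapping from IOU to DICE is increasing, the two scores agree on which of two predictions is better for a given target; they differ only in scale.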

In [2]:
def DICE(inp : Tensor, tgt : Tensor):
    """
    Arguments:
        inp: Predicted mask (batchsize, number of classes, width, height)
        tgt: Ground truth mask (batchsize, number of classes, width, height)
    Returns:
        DICE coefficient pooled over classes and pixels, averaged over the batch
    """
    eps = 1e-5 # small number to add to denominator to avoid division by zero
    #YOUR CODE START HERE
    sum_dim = (-1, -2, -3)
    # twice the intersection, 2 * |inp ∩ tgt|, summed over class and spatial dims
    inter = 2 *(inp * tgt).sum(dim=sum_dim)

    # calculate the sum of |inp| + |tgt|
    sets_sum = inp.sum(dim=sum_dim) + tgt.sum(dim=sum_dim)
    sets_sum = torch.where(sets_sum == 0, inter, sets_sum)

    # calculate the dice
    dice = (inter + eps) / (sets_sum + eps)
    
    # average the dice batchwise
    return dice.mean()
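
The function above sums over the class dimension together with the spatial dimensions, so all classes are pooled before the ratio is formed. A per-class (macro-averaged) variant is another common choice. The sketch below contrasts the two on flattened label maps in plain Python; the helper names `pooled_dice` and `classwise_dice` and the toy labels are assumptions made for illustration, and the sketch does not operate on one-hot tensors the way `DICE` does.

```python
from typing import List

def pooled_dice(pred: List[int], truth: List[int]) -> float:
    """Dice with all classes pooled: 2 * matches / (|pred| + |truth|)."""
    matches = sum(p == t for p, t in zip(pred, truth))
    return 2 * matches / (len(pred) + len(truth))

def classwise_dice(pred: List[int], truth: List[int]) -> float:
    """Mean of per-class Dice over classes that appear in pred or truth."""
    classes = sorted(set(pred) | set(truth))
    scores = []
    for c in classes:
        inter = sum(p == c and t == c for p, t in zip(pred, truth))
        size = pred.count(c) + truth.count(c)  # never 0 for classes in the union
        scores.append(2 * inter / size)
    return sum(scores) / len(scores)

# Hypothetical flattened label maps with two classes.
pred = [0, 0, 1, 1]
truth = [0, 1, 1, 1]

pooled = pooled_dice(pred, truth)         # 2 * 3 / 8
per_class = classwise_dice(pred, truth)   # mean of 2/3 (class 0) and 4/5 (class 1)
print(pooled, per_class)
```

The two averages generally differ: pooling weights every pixel equally, while the per-class mean weights every class equally, which matters when some classes cover very few pixels.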

### Tests

In [3]:
prediction1 = Tensor([[0, 7, 5, 7, 2],
        [2, 4, 5, 9, 9],
        [2, 8, 5, 1, 8],
        [3, 6, 5, 2, 6],
        [3, 2, 9, 1, 1]]).unsqueeze(0).long()
mask1 = Tensor([[4, 2, 5, 0, 2],
        [8, 2, 9, 8, 5],
        [0, 8, 7, 9, 6],
        [8, 6, 5, 9, 1],
        [3, 2, 9, 0, 6]]).unsqueeze(0).long()
prediction2 = Tensor([[5, 7, 3, 3, 0],
        [0, 2, 8, 2, 7],
        [1, 7, 0, 9, 9],
        [7, 5, 2, 3, 4],
        [6, 0, 9, 0, 1]]).unsqueeze(0).long()
mask2 = Tensor([[4, 6, 8, 3, 0],
        [4, 4, 7, 2, 7],
        [0, 0, 4, 9, 9],
        [5, 2, 3, 3, 4],
        [3, 0, 0, 8, 2]]).unsqueeze(0).long()

#Tests
dice1 = DICE(F.one_hot(prediction1).permute(0, 3, 1, 2).float(), F.one_hot(mask1).permute(0, 3, 1, 2).float()).item()
dice2 = DICE(F.one_hot(prediction2).permute(0, 3, 1, 2).float(), F.one_hot(mask2).permute(0, 3, 1, 2).float()).item()

assert np.isclose(0.3200001120567322, dice1), 'incorrect dice 1!'
assert np.isclose(0.3600001037120819, dice2), 'incorrect dice 2!'
print("\033[92m All tests passed!")

All tests passed!
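
The expected values in the assertions can be checked by hand. With one-hot inputs and the sum taken over the class and spatial dimensions, each mask sums to the number of pixels (25), and the intersection counts the pixels where prediction and mask agree. The sketch below redoes that count in plain Python on the label grids from the test cell; the helper name `dice_from_labels` is an assumption for illustration, and the `eps` term of `DICE` is left out, which is why the asserted values differ from these in the seventh decimal place.

```python
prediction1 = [[0, 7, 5, 7, 2], [2, 4, 5, 9, 9], [2, 8, 5, 1, 8],
               [3, 6, 5, 2, 6], [3, 2, 9, 1, 1]]
mask1 = [[4, 2, 5, 0, 2], [8, 2, 9, 8, 5], [0, 8, 7, 9, 6],
         [8, 6, 5, 9, 1], [3, 2, 9, 0, 6]]
prediction2 = [[5, 7, 3, 3, 0], [0, 2, 8, 2, 7], [1, 7, 0, 9, 9],
               [7, 5, 2, 3, 4], [6, 0, 9, 0, 1]]
mask2 = [[4, 6, 8, 3, 0], [4, 4, 7, 2, 7], [0, 0, 4, 9, 9],
         [5, 2, 3, 3, 4], [3, 0, 0, 8, 2]]

def dice_from_labels(pred, mask):
    """Pooled Dice for label grids: 2 * (#agreeing pixels) / (2 * #pixels)."""
    matches = sum(p == m
                  for pred_row, mask_row in zip(pred, mask)
                  for p, m in zip(pred_row, mask_row))
    n_pixels = sum(len(row) for row in pred)
    return 2 * matches / (n_pixels + n_pixels)

d1 = dice_from_labels(prediction1, mask1)  # 8 agreeing pixels  -> 16 / 50 = 0.32
d2 = dice_from_labels(prediction2, mask2)  # 9 agreeing pixels  -> 18 / 50 = 0.36
print(d1, d2)
```

In this pooled, one-hot setting the score coincides with plain pixel accuracy, since both masks always contain exactly one active class per pixel.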


## Open questions 

1. Based on their formulation, what are the **differences** among Cross-entropy, DICE, and IOU?  

Let us first consider cross-entropy and DICE. Here is a formulation for cross-entropy:

$L=-\sum^n_{i=1}{t_i \log{p_i}}$

This sum runs over all $n$ classes, where $t_i$ is the one-hot target and $p_i$ is the softmax probability for the $i^{th}$ class. In semantic segmentation, this loss is computed independently for each pixel and then averaged over the image, so it takes no account of neighbouring pixels or of whether a boundary falls between them. DICE and IOU, on the other hand, are evaluated at the level of whole regions rather than individual pixels, since both are ratios of summed overlap. IOU divides the area of intersection by the area of the union, whereas DICE divides twice the area of intersection by the sum of the two areas. Neither formula contains squared terms; the two are related by $DICE = \frac{2\,IOU}{1 + IOU}$, so DICE is always at least as large as IOU and the two rank segmentations of the same target in the same order.
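
A small numerical sketch can illustrate the difference between a per-pixel average and an overlap ratio. The toy "image" below has ten pixels, one of which is foreground, and a hypothetical model that predicts a foreground probability of 0.1 everywhere; these numbers and the helper name `pixel_cross_entropy` are assumptions chosen for illustration. Cross-entropy follows the formula above, and the overlap score is a soft version of DICE computed on the foreground probabilities.

```python
import math

def pixel_cross_entropy(probs, target):
    """L = -sum_i t_i log p_i for one pixel, with a one-hot target at index `target`."""
    return -math.log(probs[target])

# Hypothetical toy data: class 0 = background, class 1 = foreground.
targets = [0] * 9 + [1]        # nine background pixels, one foreground pixel
probs = [(0.9, 0.1)] * 10      # the same softmax output at every pixel

# Cross-entropy: an average of independent per-pixel terms.
ce = sum(pixel_cross_entropy(p, t) for p, t in zip(probs, targets)) / len(targets)

# Soft DICE on the foreground class: a ratio of sums over the whole image.
fg_pred = [p[1] for p in probs]
fg_true = [float(t == 1) for t in targets]
inter = sum(a * b for a, b in zip(fg_pred, fg_true))
soft_dice = 2 * inter / (sum(fg_pred) + sum(fg_true))

print(round(ce, 4), round(soft_dice, 4))
```

Here the averaged cross-entropy is dominated by the nine easy background pixels, while the overlap ratio stays low because the single foreground pixel is barely covered.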

2. How might the choice of architecture, such as U-Net or FFN, affect the performance and application suitability of your semantic segmentation model?

The choice of architecture has a large effect on both performance and application suitability. Newer models are also being developed that aim for greater robustness to unseen data. U-Net is an example of a convolutional network with an encoder-decoder structure. This allows the model to learn rapidly, but it is less stable over longer training, and it struggles in particular when boundaries are intricate or when the target class is small (https://arxiv.org/abs/2309.13013#:~:text=The%20findings%20reveal%20that%20the,when%20handling%20fine%20image%20details). This suggests that the model puts more effort into achieving a broad semantic understanding of the scene than into precision on fine details. As such, it may be more suitable for larger-scale tasks than for the biomedical task being investigated.

The choice of architecture should therefore be guided by the task at hand. If precise boundaries are required, greater emphasis may be placed on smaller convolutions with little reduction in hidden layer size, or on additional layers, so that precise boundaries are retained. For other tasks, such as for RGB cameras w