# MONAI Dice Overview

This notebook summarises how the `DiceLoss` and `DiceMetric` classes behave for binary and multi-class segmentation.

In [1]:
import torch
from monai.losses import DiceLoss
from monai.metrics import DiceMetric
from monai.transforms import AddChannel, AsDiscrete, Compose


def print_tensor(name, t):
    print(f"{name}: {t.numpy().tolist()} shape: {tuple(t.shape)}")

## Binary Formulation

Start with the binary case: 

In [2]:
# CHW
grnd = torch.zeros(1, 4, 4)
pred = torch.zeros(1, 4, 4)

# grnd= [0,0,0,0] pred= [0,0,0,0]
#       [0,1,1,0]       [0,1,1,0]
#       [0,1,1,0]       [1,1,0,0]
#       [0,0,0,0]       [0,0,0,0]
grnd[..., 1, 1] = grnd[..., 1, 2] = grnd[..., 2, 1] = grnd[..., 2, 2] = 1
pred[..., 1, 1] = pred[..., 1, 2] = pred[..., 2, 0] = pred[..., 2, 1] = 1

Given this ground truth and prediction, we expect a Dice score of 0.75 and therefore a Dice loss of 0.25. The prediction correctly labels 3 of the 4 foreground pixels, with 1 pixel of oversegmentation and 1 of undersegmentation, so the score is 2 × 3 / (4 + 4) = 0.75.
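
The expected value follows from the overlap definition of Dice, twice the intersection divided by the sum of the two mask sizes. A minimal, dependency-free sketch that checks the arithmetic for the masks above (the helper name `dice_score` is illustrative and not part of MONAI):

```python
# Foreground pixel coordinates (row, col) copied from the example above.
grnd_fg = {(1, 1), (1, 2), (2, 1), (2, 2)}
pred_fg = {(1, 1), (1, 2), (2, 0), (2, 1)}


def dice_score(a, b):
    """Overlap form of Dice: 2 * |intersection| / (|a| + |b|)."""
    return 2 * len(a & b) / (len(a) + len(b))


score = dice_score(grnd_fg, pred_fg)
print(score, 1 - score)  # 0.75 0.25
```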

By default, the loss and metric handle the binary case and produce the expected results, that is, they calculate the Dice loss/score for the foreground:

In [3]:
loss = DiceLoss()
metric = DiceMetric()

# using [None] to add batch dimension
print_tensor("loss", loss(pred[None], grnd[None]))
print_tensor("metric", metric(pred[None], grnd[None]))

loss: 0.24999964237213135 shape: ()
metric: [[0.75]] shape: (1, 1)
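
The loss is 0.24999964 rather than exactly 0.25 because `DiceLoss` adds small smoothing constants to the numerator and denominator (`smooth_nr` and `smooth_dr`, both 1e-5 by default as far as we are aware). A sketch of that arithmetic in plain Python, assuming the form `1 - (2 * intersection + smooth_nr) / (pred_sum + grnd_sum + smooth_dr)`; the function name is hypothetical:

```python
def smoothed_dice_loss(intersection, pred_sum, grnd_sum, smooth_nr=1e-5, smooth_dr=1e-5):
    """Dice loss with smoothing terms; the defaults are assumed to mirror DiceLoss."""
    return 1.0 - (2.0 * intersection + smooth_nr) / (pred_sum + grnd_sum + smooth_dr)


# 3 overlapping foreground pixels, 4 predicted and 4 true foreground pixels
loss_value = smoothed_dice_loss(intersection=3, pred_sum=4, grnd_sum=4)
print(f"{loss_value:.7f}")  # 0.2499997
```

The remaining difference from the printed value comes from float32 rounding in the tensor computation.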


By default the loss assumes that its input has already been activated. If the `sigmoid` argument is True, sigmoid activation is applied to the prediction first; likewise, if `softmax` is True, softmax is applied. The fake predictions used here are treated as if they have already been through activation. The metric also assumes that the prediction it receives has been activated.
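
To illustrate what the `sigmoid` option changes, the sketch below applies a sigmoid to raw network outputs (logits) before computing a soft Dice loss. The logit values are made up for illustration and the helper names are hypothetical; this is plain Python, not MONAI's implementation:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def soft_dice_loss(probs, target, smooth=1e-5):
    """Soft Dice loss over flat lists of predictions and 0/1 targets."""
    intersection = sum(p * t for p, t in zip(probs, target))
    denominator = sum(probs) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


logits = [-4.0, 3.0, 5.0, -2.0]  # hypothetical raw outputs, not probabilities
target = [0.0, 1.0, 1.0, 0.0]

probs = [sigmoid(x) for x in logits]  # the step that sigmoid=True would apply first
print(soft_dice_loss(probs, target))   # small value: activated outputs agree with the target
print(soft_dice_loss(logits, target))  # negative, so not a meaningful loss: logits are not in [0, 1]
```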

This binary data can instead be treated as a two-class segmentation with background and foreground as the classes. This is less efficient than the binary form shown above but produces the same foreground results. If `include_background` is True for both (the default) and `reduction` is `"none"`, the outputs contain the loss/score for each class:

In [4]:
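# make one hot and add batch dimension
# (AddChannel is deprecated in newer MONAI releases; indexing with [None], as above, is an alternative)
make_2_class = Compose([AsDiscrete(to_onehot=2), AddChannel()])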

grnd2 = make_2_class(grnd)
pred2 = make_2_class(pred)

In [5]:
loss2 = DiceLoss(include_background=True, reduction="none")
metric2 = DiceMetric(include_background=True, reduction="none")

print_tensor("loss2", loss2(pred2, grnd2))
print_tensor("metric2", metric2(pred2, grnd2))

loss2: [[[[0.08333331346511841]], [[0.24999964237213135]]]] shape: (1, 2, 1, 1)
metric2: [[0.9166666865348816, 0.75]] shape: (1, 2)


The second value in each output corresponds to the foreground loss/score and matches the binary result above.
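
The background value can be checked by hand as well. The sketch below one-hot encodes the two label maps in plain Python and computes a Dice score per channel; `one_hot` and `dice` are illustrative helpers, not MONAI functions:

```python
grnd_map = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
pred_map = [[0, 0, 0, 0], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]]


def one_hot(label_map, num_classes):
    """Return one flat 0/1 mask per class."""
    flat = [v for row in label_map for v in row]
    return [[1 if v == c else 0 for v in flat] for c in range(num_classes)]


def dice(a, b):
    intersection = sum(x * y for x, y in zip(a, b))
    return 2 * intersection / (sum(a) + sum(b))


scores = [dice(p, g) for p, g in zip(one_hot(pred_map, 2), one_hot(grnd_map, 2))]
print([round(s, 4) for s in scores])  # [0.9167, 0.75]
```

The background score is 2 × 11 / (12 + 12): 11 pixels are background in both maps, and each map has 12 background pixels.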

The loss also accepts the ground truth as a single-channel label map if `to_onehot_y` is True:

In [6]:
loss_onehot = DiceLoss(include_background=True, to_onehot_y=True, reduction="none")

# note grnd[None] instead of grnd2
print_tensor("loss_onehot", loss_onehot(pred2, grnd[None]))

loss_onehot: [[[[0.08333331346511841]], [[0.24999964237213135]]]] shape: (1, 2, 1, 1)


If `include_background` is False then class 0 is discarded from the loss/score calculation. This is useful for loss functions when the background class would otherwise dominate the calculation and lead the network to optimise by simply ignoring small segmentation classes, although it probably only makes sense in true multi-class situations:

In [7]:
loss3 = DiceLoss(include_background=False, reduction="none")
metric3 = DiceMetric(include_background=False, reduction="none")

print_tensor("loss3", loss3(pred2, grnd2))
print_tensor("metric3", metric3(pred2, grnd2))

loss3: [[[[0.24999964237213135]]]] shape: (1, 1, 1, 1)
metric3: [[0.75]] shape: (1, 1)


The results here are the same values as with the binary form of the loss and metric, just with a different shape.

If `include_background` is True and `reduction` is `"mean"`, the loss/score looks different for the same data because the background value is included and averaged with that of the foreground. A pixel/voxel can only be foreground or background (i.e. the two are mutually exclusive), so the loss/metric values for the two categories improve or worsen in lockstep and the background provides no added signal. Including it is therefore unnecessary and can lead to poor training:

In [8]:
loss4 = DiceLoss(include_background=True, reduction="mean")
metric4 = DiceMetric(include_background=True, reduction="mean")

print_tensor("loss4", loss4(pred2, grnd2))
print_tensor("metric4", metric4(pred2, grnd2))

loss4: 0.16666647791862488 shape: ()
metric4: [[0.9166666865348816, 0.75]] shape: (1, 2)
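
With `reduction="mean"` the loss is the average of the per-class losses printed earlier, which is why a well-segmented background pulls the value down. Note that the metric output is still per class: as far as we understand, `DiceMetric` applies its reduction when `aggregate()` is called rather than on each direct call. The arithmetic for the loss, in plain Python:

```python
# Per-class losses copied from the reduction="none" output earlier in the notebook
per_class_loss = [0.08333331346511841, 0.24999964237213135]  # background, foreground

mean_loss = sum(per_class_loss) / len(per_class_loss)
print(round(mean_loss, 6))  # 0.166666
```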


## Multi-class Formulation

Now for the multi-class case. This example has 3 classes, and class 1 is perfectly segmented:

In [9]:
mgrnd = torch.zeros(1, 4, 4)
mpred = torch.zeros(1, 4, 4)

# mgrnd= [0,0,0,0] mpred= [0,0,0,0]
#        [0,1,1,0]        [0,1,1,0]
#        [0,2,2,0]        [2,2,0,0]
#        [0,0,0,0]        [0,0,0,0]
mgrnd[..., 1, 1] = mgrnd[..., 1, 2] = 1
mgrnd[..., 2, 1] = mgrnd[..., 2, 2] = 2

mpred[..., 1, 1] = mpred[..., 1, 2] = 1
mpred[..., 2, 0] = mpred[..., 2, 1] = 2

Passing these label maps directly to the binary formulation gives misleading results, because the label values 1 and 2 are treated as magnitudes rather than class indices:

In [10]:
print_tensor("loss", loss(mpred[None], mgrnd[None]))
print_tensor("metric", metric(mpred[None], mgrnd[None]))

loss: 0.0 shape: ()
metric: [[1.0]] shape: (1, 1)
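
The perfect score is an artefact of multiplying and summing raw label values. A plain-Python sketch of the sums involved (ignoring smoothing, with illustrative variable names) shows how they happen to cancel for this example:

```python
mgrnd_map = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 2, 2, 0], [0, 0, 0, 0]]
mpred_map = [[0, 0, 0, 0], [0, 1, 1, 0], [2, 2, 0, 0], [0, 0, 0, 0]]

g = [v for row in mgrnd_map for v in row]
p = [v for row in mpred_map for v in row]

intersection = sum(x * y for x, y in zip(p, g))  # 1*1 + 1*1 + 2*2 = 6
denominator = sum(p) + sum(g)                    # 6 + 6 = 12
naive_dice = 2 * intersection / denominator
print(naive_dice)  # 1.0, despite the errors in class 2
```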


Instead the prediction and ground truth should be in one-hot format:

In [11]:
# make one hot and add batch dimension
make_3_class = Compose([AsDiscrete(to_onehot=3), AddChannel()])

mgrnd2 = make_3_class(mgrnd)
mpred2 = make_3_class(mpred)

In [12]:
print_tensor("loss", loss(mpred2, mgrnd2))
print_tensor("metric", metric(mpred2, mgrnd2))

loss: 0.19444401562213898 shape: ()
metric: [[0.9166666865348816, 1.0, 0.5]] shape: (1, 3)


Once again, note the assumption that the inputs to both the loss and the metric have already been through activation and are in one-hot format. If the prediction hasn't been activated, `softmax` can be set to True to apply this activation, and `to_onehot_y` can be set to True to convert the ground truth to one-hot.

MONAI's design decision is not to build activation into the final layer of its networks. Instead, activation is applied to predictions either by post-processing transforms or through loss function arguments. Be aware of what your network produces as its final output and whether it should be passed through sigmoid or softmax before the loss or metric is calculated.
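
As a sketch of what this means for a single pixel: raw outputs are passed through softmax to obtain class probabilities, and for metrics these are commonly discretised by taking the argmax and one-hot encoding it. The logits below are invented for illustration and the code is plain Python, not MONAI's transforms:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of per-class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


pixel_logits = [0.5, 2.0, -1.0]  # hypothetical raw outputs for classes 0, 1, 2

class_probs = softmax(pixel_logits)  # activated values, as a Dice loss expects
label = max(range(len(class_probs)), key=class_probs.__getitem__)  # argmax
one_hot_pixel = [1 if c == label else 0 for c in range(len(class_probs))]

print(label, one_hot_pixel)  # 1 [0, 1, 0]
```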

Disabling reduction allows us to see the loss/metric scores for every class:

In [13]:
loss5 = DiceLoss(reduction="none")
metric5 = DiceMetric(reduction="none")

print_tensor("loss5", loss5(mpred2, mgrnd2))
print_tensor("metric5", metric5(mpred2, mgrnd2))

loss5: [[[[0.08333331346511841]], [[0.0]], [[0.4999987483024597]]]] shape: (1, 3, 1, 1)
metric5: [[0.9166666865348816, 1.0, 0.5]] shape: (1, 3)


From both outputs we can see that the background is almost perfectly predicted, class 1 is perfect, and class 2 is half right (i.e. 1 pixel out of 2 is correct). These examples do not cover the case where a foreground pixel is found but assigned the wrong class, but that isn't needed to illustrate the behaviour here.
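
These per-class values can be reproduced by counting pixels. A minimal sketch in plain Python (helper names are illustrative) that treats each class as its own binary mask:

```python
mgrnd_map = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 2, 2, 0], [0, 0, 0, 0]]
mpred_map = [[0, 0, 0, 0], [0, 1, 1, 0], [2, 2, 0, 0], [0, 0, 0, 0]]


def class_pixels(label_map, cls):
    """Set of (row, col) coordinates labelled with the given class."""
    return {(r, c) for r, row in enumerate(label_map) for c, v in enumerate(row) if v == cls}


def class_dice(pred_map, grnd_map, cls):
    p, g = class_pixels(pred_map, cls), class_pixels(grnd_map, cls)
    return 2 * len(p & g) / (len(p) + len(g))


per_class = [class_dice(mpred_map, mgrnd_map, c) for c in range(3)]
print([round(s, 4) for s in per_class])             # [0.9167, 1.0, 0.5]
print(round(sum(1 - s for s in per_class) / 3, 4))  # 0.1944, the mean loss seen earlier
```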

If the ground truth isn't one-hot, the loss function can be configured to do that conversion with `to_onehot_y`:

In [14]:
loss6 = DiceLoss(reduction="none", to_onehot_y=True)
print_tensor("loss6", loss6(mpred2, mgrnd[None]))

loss6: [[[[0.08333331346511841]], [[0.0]], [[0.4999987483024597]]]] shape: (1, 3, 1, 1)
