## DICE COEFFICIENT

* This competition is evaluated on the mean Dice coefficient. 
* The Dice coefficient can be used to compare the pixel-wise agreement between a predicted segmentation and its corresponding ground truth. 
* The Dice coefficient is twice the area of overlap divided by the total number of pixels in both masks (the sum of the two mask areas).

![Dice coeff image](https://miro.medium.com/max/429/1*yUd5ckecHjWZf6hGrdlwzA.png)

The formula is given by:

<center> $ \huge \frac{2|X \cap Y|}{|X|+|Y|} $ </center>

<br>

where X is the set of predicted mask pixels and Y is the set of ground-truth mask pixels.
* The Dice coefficient is defined to be 1 when both X and Y are empty. The leaderboard score is the mean of the Dice coefficients for each image in the test set.

* [Here's](https://towardsdatascience.com/metrics-to-evaluate-your-semantic-segmentation-model-6bcb99639aa2) an interesting article for further reading!
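As a quick sanity check of the formula, here is a minimal sketch that treats X and Y as plain Python sets of (row, col) pixel coordinates. The helper name `dice_from_sets` and the toy sets are hypothetical, made up for this example; the helper also applies the convention that Dice is 1 when both sets are empty.

```python
def dice_from_sets(x, y):
    """Dice coefficient of two sets of pixel coordinates (hypothetical helper)."""
    # Competition convention: two empty masks count as a perfect match
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


# X: predicted pixels, Y: ground-truth pixels, as (row, col) tuples
X = {(0, 0), (0, 1), (1, 0), (1, 1)}
Y = {(0, 1), (1, 1), (2, 1)}

print(dice_from_sets(X, Y))          # 2 * 2 / (4 + 3) = 4/7 ≈ 0.571
print(dice_from_sets(set(), set()))  # 1.0
```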

### Table of Contents

1. [Import libraries](#libimport)
2. [Loading Dataset](#loaddataset)
3. [Writing a Simple DICE Implementation](#simpledice)
4. [Image and the Mask](#imagemask)
5. [How Big are the Glomeruli Masks?](#glomersize)
6. [Dice Coefficient Between Same Masks](#dicesame)
7. [Dice Coefficient Between Shifted Masks](#diceshift)
8. [Plot of Dice Coefficient vs. Shift in Masks](#diceplot)

<div id="libimport"> </div>

### Import the libraries


In [None]:
import collections
import json
import os
import uuid
import gc

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from PIL import Image, ImageDraw
import tifffile as tiff 
import seaborn as sns
import tensorflow as tf
from tqdm.notebook import tqdm

from skimage.measure import label, regionprops
import cv2

<div id="loaddataset"> </div>

### Loading Dataset

In [None]:
TRAIN_PATH = "../input/hubmap-kidney-segmentation/train/"

## Training dataset information
train_df = pd.read_csv("../input/hubmap-kidney-segmentation/train.csv")
print(f"Shape of the Train data - {train_df.shape}")

print(f"First ID - {train_df.iloc[0, 0]}")
print(f"First 100 characters of the encoding - {train_df.iloc[0, 1][:100]}")

<div id="simpledice"> </div>

### Writing a Simple DICE Coefficient Implementation

* As the formula shows, we need the number of pixels in the intersection of the two masks.
* Since our masks consist only of 0s and 1s, element-wise multiplication leaves a 1 exactly where both masks are 1, so summing the product counts the intersection pixels.
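A small illustration on two made-up 2×3 masks: the element-wise product is 1 exactly where both masks are 1, so it matches `np.logical_and`, and its sum is |X ∩ Y|. This assumes 0/1 masks; masks stored as 0/255 should be binarized first (for example with `mask > 0`), because 255 * 255 overflows in `uint8`.

```python
import numpy as np

# Two made-up binary masks
pred = np.array([[1, 1, 0],
                 [0, 1, 0]], dtype=np.uint8)
truth = np.array([[1, 0, 0],
                  [0, 1, 1]], dtype=np.uint8)

# The product is 1 only where both masks are 1
product = pred * truth
assert np.array_equal(product.astype(bool), np.logical_and(pred, truth))

print(product.sum())  # 2 intersection pixels
dice = 2 * product.sum() / (pred.sum() + truth.sum())
print(dice)           # 2 * 2 / (3 + 3) ≈ 0.667
```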

In [None]:
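def DICE_COE(mask1, mask2):
    # For 0/1 masks the element-wise product is 1 only where both masks are 1
    intersect = np.sum(mask1*mask2)
    fsum = np.sum(mask1)
    ssum = np.sum(mask2)
    # Dice is defined to be 1 when both masks are empty (avoids 0/0)
    if fsum + ssum == 0:
        return 1.0
    dice = (2 * intersect) / (fsum + ssum)
    dice = round(float(dice), 3) # for easy reading
    return dice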
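The leaderboard score is the mean of the per-image Dice values, with the empty-masks-score-1 convention applied to each image separately. A minimal sketch of that averaging, using hypothetical helpers `dice_score` and `mean_dice` (not part of the competition kit) and made-up 4×4 masks:

```python
import numpy as np

def dice_score(pred, truth):
    """Dice for one pair of binary masks; 1.0 when both masks are empty."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / total

def mean_dice(pairs):
    """Average the per-image Dice over (prediction, ground truth) pairs."""
    return float(np.mean([dice_score(p, t) for p, t in pairs]))

full = np.ones((4, 4), dtype=np.uint8)
empty = np.zeros((4, 4), dtype=np.uint8)
left = np.zeros((4, 4), dtype=np.uint8)
left[:, :2] = 1
right = 1 - left

pairs = [
    (full, full),    # perfect match  -> 1.0
    (empty, empty),  # both empty     -> 1.0 by convention
    (left, right),   # disjoint masks -> 0.0
]
print(mean_dice(pairs))  # (1.0 + 1.0 + 0.0) / 3 ≈ 0.667
```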

<div id="imagemask"> </div>

### Image and the Mask

In [None]:
## Looking into a single training image
image1 = tiff.imread(TRAIN_PATH + train_df.iloc[4, 0] + ".tiff")

In [None]:
print("Image ID --> ", train_df.iloc[4, 0], "\tTraining image shape -->", image1.shape)

In [None]:
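## The kidney tissue image
## This image is read with a (1, 1, 3, H, W) layout; convert it to (H, W, 3) for plotting
## and leave arrays that are already (H, W, 3) untouched
if image1.ndim == 5:
    image1 = image1[0][0].transpose(1, 2, 0)
plt.figure(figsize=(10, 10))
plt.imshow(image1)
plt.title("Kidney Tissue Image", size=15)
plt.show()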

In [None]:
## We need to decode the mask from encoding column of train.csv
## https://www.kaggle.com/paulorzp/rle-functions-run-lenght-encode-decode
def mask2rle(img):
    '''
    img: numpy array, 1 - mask, 0 - background
    Returns the run-length encoding as a formatted string
    '''
    pixels = img.T.flatten()
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

def rle2mask(mask_rle, shape):
    '''
    mask_rle: run-length encoding as a formatted string (start length)
    shape: (width, height) of the array to return
    Returns numpy array, 1 - mask, 0 - background
    '''
    s = mask_rle.split()
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0:][::2], s[1:][::2])]
    starts -= 1  # RLE starts are 1-indexed
    ends = starts + lengths
    img = np.zeros(shape[0]*shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    return img.reshape(shape).T
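To make the encoding format concrete, here is a hand-sized example that repeats the steps of `mask2rle` and `rle2mask` on a made-up 3×2 mask. Pixels are read column by column (`img.T.flatten()` is the same as Fortran-order flattening), and each run is stored as a 1-indexed start followed by a length.

```python
import numpy as np

# A made-up 3x2 mask: H=3 rows, W=2 columns
img = np.array([[0, 1],
                [1, 1],
                [1, 0]], dtype=np.uint8)

# Transpose + flatten reads pixels column by column (same as order="F")
pixels = img.T.flatten()
print(pixels)  # [0 1 1 1 1 0]

# Encode: pad with zeros, find where the value changes, turn run ends into lengths
padded = np.concatenate([[0], pixels, [0]])
runs = np.where(padded[1:] != padded[:-1])[0] + 1
runs[1::2] -= runs[::2]
rle = " ".join(str(x) for x in runs)
print(rle)  # 2 4  (one run starting at pixel 2, 1-indexed, 4 pixels long)

# Decode: back to 0-indexed starts, fill the runs, reshape as (W, H) and transpose
s = rle.split()
starts = np.asarray(s[0::2], dtype=int) - 1
lengths = np.asarray(s[1::2], dtype=int)
flat = np.zeros(img.shape[0] * img.shape[1], dtype=np.uint8)
for lo, hi in zip(starts, starts + lengths):
    flat[lo:hi] = 1
decoded = flat.reshape((img.shape[1], img.shape[0])).T
print(np.array_equal(decoded, img))  # True
```

The `(W, H)` reshape followed by `.T` is why `rle2mask` is called with `(image1.shape[1], image1.shape[0])`.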

In [None]:
## Decode the mask of all the glomeruli in this kidney
mask = rle2mask(train_df.iloc[4, 1], (image1.shape[1], image1.shape[0])) # shape is passed as (width, height)

In [None]:
## The same kidney image with all the masks
plt.figure(figsize=(10, 10))
plt.imshow(image1)
plt.imshow(mask, alpha=0.5, cmap='plasma')
plt.title("Image with Masks on Glomeruli", size=15)
plt.show()

In [None]:
# SAVING RAM
del train_df
a = gc.collect()

<div id="glomersize"> </div>

### How Big are the Glomeruli Masks?

In [None]:
print(f"Shape of the Full Glomeruli Mask - {mask.shape}")

#### SKIMAGE HANDY FUNCTIONS

* LABEL [(skimage.measure.label)](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.label)
This labels the connected regions of a binary array, giving each region its own integer id (the background stays 0).

* REGION PROPS [(skimage.measure.regionprops)](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops)
This measures properties of each labelled region; here we are interested in the bounding box of each mask.
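A toy example on a made-up 6×8 array shows what `label` and `regionprops` return: two separate blobs get labels 1 and 2, and each region's `bbox` is `(min_row, min_col, max_row, max_col)` with the max values exclusive, so it can be used directly for slicing.

```python
import numpy as np
from skimage.measure import label, regionprops

# Two separate rectangular blobs in a made-up binary mask
toy = np.zeros((6, 8), dtype=np.uint8)
toy[1:3, 1:4] = 1   # 2 x 3 blob near the top left
toy[4:6, 5:8] = 1   # 2 x 3 blob near the bottom right

labelled_toy = label(toy)           # background stays 0, blobs become 1 and 2
props_toy = regionprops(labelled_toy)

for p in props_toy:
    print(p.label, tuple(int(v) for v in p.bbox))
# 1 (1, 1, 3, 4)
# 2 (4, 5, 6, 8)

# Slicing with the bbox recovers exactly the first blob
r0, c0, r1, c1 = props_toy[0].bbox
print(toy[r0:r1, c0:c1].all())  # True
```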

In [None]:
## Identify all the coordinates of the glomeruli in this image
labelled = label(mask) 
props = regionprops(labelled)

print(f"Number of Glomeruli identified - {len(props)}")

Converting the region properties to bounding boxes

In [None]:
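PAD = 20  # extra context around each glomerulus, in pixels
bboxes = []
for prop in props:
    min_row, min_col, max_row, max_col = prop.bbox
    # Clip at 0 so a box near the top/left edge does not get a negative (wrapping) slice index
    bboxes.append([max(min_row - PAD, 0), max(min_col - PAD, 0),
                   max_row + PAD, max_col + PAD])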

Visualizing one of the bounding boxes

In [None]:
plt.figure(figsize=(10, 10))
plt.imshow(image1[bboxes[0][0]:bboxes[0][2], bboxes[0][1]:bboxes[0][3], :])
plt.imshow(mask[bboxes[0][0]:bboxes[0][2], bboxes[0][1]:bboxes[0][3]], alpha=0.5, cmap='viridis')
plt.show()

We can notice a few things:
* The mask is not perfectly aligned with the glomerulus.
* The mask is approximately 400 pixels wide, which gives a sense of the scale of the other masks.

In [None]:
# SAVING RAM
del image1, props, labelled, bboxes
a = gc.collect()

<div id="dicesame"> </div>

### DICE COEFFICIENT BETWEEN SAME MASKS

What do you think will be the output in this case?

In [None]:
print(f"Dice Coefficient of two same masks are {DICE_COE(mask, mask)}")

The answer is 1.0, and a quick calculation shows why.

Let the number of masked pixels be x. Then the intersection of the mask with itself also has x pixels.

$\huge \frac{2\cdot x }{x + x}  = 1.0$

<div id="diceshift"> </div>

### DICE COEFFICIENT OF SHIFTED IMAGES

Let's shift the mask a little, say by 5 pixels, and see the effect on the Dice score.

In [None]:
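# We only shift the mask downwards (along the rows) to keep it simple
def return_shifted(mask, shift=5):
    if shift == 0:
        return mask.copy()  # mask[:-0] would be empty, so handle 0 explicitly
    nmask = np.zeros_like(mask)  # same shape and dtype, avoids a huge float64 copy
    # Row i moves to row i + shift; rows pushed past the bottom are dropped
    nmask[shift:, :] = mask[:-shift, :]
    return nmask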


sh_mask = return_shifted(mask)

In [None]:
print(f"The DICE COEFFICIENT of Same Masks shifted by 5 pixels is {DICE_COE(mask, sh_mask)}")

We can see that the score decreased!
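Why does the score drop so quickly? For intuition, take a single made-up rectangular mask of height h and width w, shifted down by s pixels with s < h. The overlap is (h - s) rows of w pixels, so $\frac{2(h-s)w}{hw + hw} = 1 - \frac{s}{h}$, and the Dice reaches 0 once s ≥ h. The sketch below checks this numerically; `dice` and `shift_down` are hypothetical stand-ins for `DICE_COE` (without rounding) and `return_shifted`.

```python
import numpy as np

def dice(a, b):
    """Unrounded Dice of two binary masks (1.0 when both are empty)."""
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * (a * b).sum() / total

def shift_down(mask, shift):
    """Shift a mask down by `shift` rows, dropping rows pushed past the bottom."""
    if shift == 0:
        return mask.copy()
    out = np.zeros_like(mask)
    out[shift:, :] = mask[:-shift, :]
    return out

h, w = 40, 30
rect = np.zeros((200, 100), dtype=np.uint8)
rect[50:50 + h, 20:20 + w] = 1   # one rectangle, well away from the borders

for s in [0, 5, 10, 20, 40, 60]:
    measured = dice(rect, shift_down(rect, s))
    predicted = max(0.0, 1 - s / h)
    print(f"shift={s:2d}  measured={measured:.3f}  1 - s/h={predicted:.3f}")
```

Glomeruli are roughly round rather than rectangular, so the real curve need not be exactly linear, but the same reasoning suggests the Dice should approach zero once the shift is comparable to the size of the glomeruli.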


Now let's shift it further, by 10, 20 and 30 pixels.

In [None]:
# SAVING RAM
del sh_mask
a = gc.collect()

In [None]:
for shift in [10,20,30]:
    print(f"The DICE COEFFICIENT of Same Masks shifted by {shift} pixels is {DICE_COE(mask, return_shifted(mask, shift=shift))}")

Now let's try large shifts of 400, 500 and 600 pixels.

In [None]:
for shift in [400,500,600]:
    print(f"The DICE COEFFICIENT of Same Masks shifted by {shift} pixels is {DICE_COE(mask, return_shifted(mask, shift=shift))}")

<div id="diceplot"> </div>

Let's go a step further and plot the Dice coefficient against the shift.

In [None]:
nums = list(range(1, 601, 15))
dices  = []
for num in tqdm(nums):
    dices.append(DICE_COE(mask, return_shifted(mask, shift=num)))

In [None]:
plt.figure(figsize=(10, 10))
plt.plot(nums, dices)
plt.title('SHIFT VS DICE COEFFICIENT', size=20)
plt.xlabel('Shift in Pixels', size=15)
plt.ylabel('Dice coefficient', size=15)

plt.show()

#### **We can clearly see how the Dice coefficient decreases as the shift increases and the shifted masks overlap less with the true masks.**

P.S. The small increase at the end can be attributed to shifted masks starting to overlap with other, neighbouring true masks as the shift grows.
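The P.S. can be checked on a made-up example: two identical 20×30 rectangles stacked vertically, with their top edges 50 rows apart. Once the shift equals that spacing, the shifted upper rectangle lands exactly on the lower one, so the Dice climbs back up after having dropped to 0. `dice` and `shift_down` are the same hypothetical stand-ins as in the single-rectangle sketch.

```python
import numpy as np

def dice(a, b):
    """Unrounded Dice of two binary masks (1.0 when both are empty)."""
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * (a * b).sum() / total

def shift_down(mask, shift):
    """Shift a mask down by `shift` rows, dropping rows pushed past the bottom."""
    if shift == 0:
        return mask.copy()
    out = np.zeros_like(mask)
    out[shift:, :] = mask[:-shift, :]
    return out

# Two 20 x 30 rectangles whose top edges are 50 rows apart
two = np.zeros((200, 60), dtype=np.uint8)
two[10:30, 10:40] = 1
two[60:80, 10:40] = 1

for s in [0, 10, 20, 30, 40, 50]:
    print(s, round(float(dice(two, shift_down(two, s))), 3))
```

The printed values go 1.0, 0.5, 0.0, 0.0, 0.25, 0.5: at a shift of 50 only the upper rectangle finds a partner, so half of the mask area overlaps again.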

### References

1. [HuBMAP - Visualize Mask & BBOX 📈](https://www.kaggle.com/ckanth090/hubmap-visualize-mask-bbox)