<h1> HuBMAP - Hacking the Kidney </h1>
<h3> Goal - Mapping the human body at functional tissue unit level - detect glomeruli FTUs in kidney </h3>

Description - Calculate performance metrics for the test-data predictions on the kidney dataset. <br>
Input - submission.csv (CSV file containing the predicted masks in RLE format), test.csv (CSV file containing the original ground-truth masks in RLE format).<br>
Output - Performance metric values: Dice coefficient, Jaccard index, pixel accuracy, Hausdorff distance. <br>
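
The four metrics can be written out as follows. This is a sketch in notation that is not part of the original notebook: $P$ and $T$ are assumed to denote the sets of foreground pixels in the predicted and ground-truth masks, $N$ the total number of pixels, $\epsilon$ the smoothing constant `eps` used in the Dice function, and $A$, $B$ two point sets. The Hausdorff term is the directed form, matching the `directed_hausdorff` call used in this notebook.

```latex
\begin{align}
\mathrm{Dice}(P, T) &= \frac{2\,|P \cap T| + \epsilon}{|P| + |T| + \epsilon} \\
\mathrm{Jaccard}(P, T) &= \frac{|P \cap T|}{|P \cup T|} \\
\mathrm{PixelAcc}(P, T) &= \frac{|P \cap T| + |\overline{P} \cap \overline{T}|}{N} \\
h(A, B) &= \max_{a \in A} \, \min_{b \in B} \, \lVert a - b \rVert_2
\end{align}
```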

<b>How to use?</b><br> 
Set <code>DATA_PATH</code> (and the path to the submission CSV) in the cells below to where your data lives, then run all cells. <br>

<b>How to reproduce on a different dataset?</b><br>
Create train and test folders for the dataset, containing the train images and masks and the test images and masks, respectively. Provide a train.csv with the RLE encodings for the train images and a sample-submission file with the test image names. Create a test.csv with the RLE encodings for the test images, and a prediction CSV from the trained network. 
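
As a rough illustration of the last step, the sketch below builds a test.csv-style table from binary masks. The image ids and tiny masks are hypothetical placeholders, and `rle_encode` is a simplified stand-in written for this example (column-major runs with 1-based starts); only the `id` and `encoding` column names are taken from how this notebook reads test.csv.

```python
import numpy as np
import pandas as pd

def rle_encode(mask):
    """Column-major run-length encoding with 1-based start positions."""
    pixels = mask.T.flatten().astype(np.uint8)
    # Pad with zeros so runs touching either end of the array are detected.
    padded = np.concatenate([[0], pixels, [0]])
    runs = np.where(padded[1:] != padded[:-1])[0] + 1
    # Convert (start, end) pairs into (start, length) pairs.
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

# Hypothetical ground-truth masks keyed by image id (placeholders only).
masks = {
    'image_a': np.array([[0, 1], [0, 1]], dtype=np.uint8),
    'image_b': np.array([[1, 0], [0, 0]], dtype=np.uint8),
}

test_df = pd.DataFrame({
    'id': list(masks),
    'encoding': [rle_encode(m) for m in masks.values()],
})

# With no path, to_csv returns the CSV text; pass a path such as 'test.csv' to write a file.
csv_text = test_df.to_csv(index=False)
print(csv_text)
```

The prediction CSV can be produced the same way from the network's output masks, using whatever column name the evaluation loop expects for predictions.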

<hr>


<h6> Step 1 - Import useful libraries </h6>

In [5]:
import numpy as np
import pandas as pd
from sklearn.metrics import jaccard_score
from scipy.spatial.distance import directed_hausdorff
import json
import cv2
import matplotlib.pyplot as plt
from PIL import Image
from pathlib import Path

In [8]:
DATA_PATH = Path(r'C:/Users/soodn/Downloads/Naveksha/Kaggle HuBMAP/')
df = pd.read_csv('output/submission_df2_reproduced.csv')
rles = pd.read_csv(DATA_PATH/'Data/kidney-data/test.csv')
df_info = pd.read_csv(DATA_PATH/'Data/kidney-data/HuBMAP-20-dataset_information.csv')

In [3]:
path_test = r'C:/Users/soodn/Downloads/Naveksha/Kaggle HuBMAP/Data/hubmap-kidney-segmentation-data/test/'

<h6> Step 2 - Write utility functions </h6> 

In [4]:
def dice_scores_img(pred, truth, eps=1e-8):
    pred = pred.reshape(-1) > 0
    truth = truth.reshape(-1) > 0
    intersect = (pred & truth).sum(-1)
    union = pred.sum(-1) + truth.sum(-1)
    dice = (2.0 * intersect + eps) / (union + eps)
    return dice
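
A small worked example may help to check the Dice computation by hand. The function below is a self-contained copy of the same formula under the hypothetical name `dice`, and the 2x2 masks are made up for illustration.

```python
import numpy as np

def dice(pred, truth, eps=1e-8):
    # Binarize and flatten both masks, then apply (2|P∩T| + eps) / (|P| + |T| + eps).
    pred = pred.reshape(-1) > 0
    truth = truth.reshape(-1) > 0
    intersect = (pred & truth).sum()
    return (2.0 * intersect + eps) / (pred.sum() + truth.sum() + eps)

pred = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [1, 0]])

# One overlapping pixel, two foreground pixels in each mask: 2*1 / (2+2) = 0.5.
score = dice(pred, truth)

# Two empty masks: eps keeps the ratio defined and the score is 1.0.
empty_score = dice(np.zeros((2, 2)), np.zeros((2, 2)))

print(score, empty_score)
```

The `eps` term means an image with no glomeruli in either mask counts as a perfect match rather than a division by zero.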

In [5]:
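def perf_metrics(gt, pred):
    # Pixel accuracy: fraction of pixels where the two masks agree (vectorized).
    pixel_acc = (gt == pred).mean()
    # Jaccard index over the flattened masks.
    ji = jaccard_score(gt.flatten(order='C'), pred.flatten(order='C'))
    # Note: directed_hausdorff treats each row of a 2-D array as one point,
    # so this compares mask rows, not foreground pixel coordinates.
    haus = directed_hausdorff(gt, pred)
    return pixel_acc, ji, haus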
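
Since `directed_hausdorff` expects arrays of point coordinates with shape (n_points, n_dims), one possible alternative is to pass the coordinates of the foreground pixels instead of the raw masks. The sketch below is an assumption about what a coordinate-based variant could look like, not the method used in this notebook; `hausdorff_from_masks` is a hypothetical helper name.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff_from_masks(mask_a, mask_b):
    # Collect (row, col) coordinates of the foreground pixels in each mask.
    pts_a = np.argwhere(mask_a > 0)
    pts_b = np.argwhere(mask_b > 0)
    # The distance is undefined when either mask has no foreground.
    if len(pts_a) == 0 or len(pts_b) == 0:
        return float('nan')
    # Symmetric Hausdorff distance: the larger of the two directed distances.
    d_ab = directed_hausdorff(pts_a, pts_b)[0]
    d_ba = directed_hausdorff(pts_b, pts_a)[0]
    return max(d_ab, d_ba)

a = np.zeros((5, 5), dtype=np.uint8)
a[0, 0] = 1
b = np.zeros((5, 5), dtype=np.uint8)
b[3, 4] = 1

# The single points are 3 rows and 4 columns apart: sqrt(3^2 + 4^2) = 5.
dist = hausdorff_from_masks(a, b)
print(dist)
```

For whole-slide images the coordinate arrays can become very large, so in practice this would likely need to run on boundary pixels or a downsampled mask.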

In [6]:
def read_mask(mask_file, mask_shape):
    # Load polygon annotations from a JSON file and rasterize them into a boolean mask.
    with open(mask_file, "r", encoding='utf-8') as read_file:
        mask_data = json.load(read_file)
    polys = []
    for index in range(len(mask_data)):
        geom = np.array(mask_data[index]['geometry']['coordinates'], dtype=np.int32)
        polys.append(geom)

    mask = np.zeros(mask_shape)
    cv2.fillPoly(mask, polys, 1)
    mask = mask.astype(bool)
    return mask

In [7]:
def rle_encode_less_memory(img):
    pixels = img.T.flatten()
    pixels[0] = 0
    pixels[-1] = 0
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 2
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

def enc2mask(encs, shape):
    img = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for m, enc in enumerate(encs):
        if isinstance(enc, float) and np.isnan(enc):
            continue
        enc_split = enc.split()
        for i in range(len(enc_split) // 2):
            start = int(enc_split[2 * i]) - 1
            length = int(enc_split[2 * i + 1])
            img[start: start + length] = 1 + m

    return img.reshape(shape).T
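
To make the encoding convention concrete, here is a minimal round-trip sketch. `rle_encode` and `rle_decode` are simplified, self-contained stand-ins for the two functions above (single mask, binary labels), and the 3x4 mask is invented for illustration. Note that the decoder takes the shape as (width, height) and returns a (height, width) array, as in `enc2mask`.

```python
import numpy as np

def rle_encode(img):
    # Column-major flattening; the first and last pixels are forced to 0,
    # so foreground at those two positions is not encoded.
    pixels = img.T.flatten()
    pixels[0] = 0
    pixels[-1] = 0
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 2
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

def rle_decode(enc, shape):
    # shape is (width, height); starts in the encoding are 1-based.
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    tokens = enc.split()
    for start, length in zip(tokens[0::2], tokens[1::2]):
        begin = int(start) - 1
        flat[begin: begin + int(length)] = 1
    return flat.reshape(shape).T

mask = np.zeros((3, 4), dtype=np.uint8)  # height 3, width 4
mask[1, 1:3] = 1                         # two foreground pixels in row 1

enc = rle_encode(mask)
decoded = rle_decode(enc, (4, 3))        # pass (width, height)

print(enc)
print(np.array_equal(decoded, mask))
```

Because the pixels are read column by column, the two horizontally adjacent pixels end up as two separate runs of length 1.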

<h6> Step 3 - Calculate mean metrics values for test images </h6> 

In [None]:
sum_score = 0
sum_pa = 0
sum_ji = 0
sum_haus = 0

for img in rles['id'].unique():
    shape = df_info[df_info.image_file == img + ".tiff"][['width_pixels', 'height_pixels']].values.astype(int)[0]
    truth = rles[rles['id'] == img]['encoding']
    mask_truth = enc2mask(truth, shape)
    pred = df[df['id'] == img]['predicted']
    mask_pred = enc2mask(pred, shape)  
    score = dice_scores_img(mask_pred, mask_truth)
    pa, ji, haus = perf_metrics(mask_truth, mask_pred)
    sum_score += score
    sum_pa += pa
    sum_ji += ji
    sum_haus += haus[0]
    print (img, "is done.")



In [None]:
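n_images = rles['id'].nunique()  # average over the images evaluated in the loop above
print('Dice Score:', sum_score/n_images, '\n Pixel Accuracy:', sum_pa/n_images, '\n Jaccard Index:', sum_ji/n_images, '\n Hausdorff Distance:', sum_haus/n_images)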