<h1> HuBMAP - Hacking the Kidney </h1>
<h3> Goal - Mapping the human body at the functional tissue unit (FTU) level - detect glomeruli FTUs in kidney </h3>

Description - Calculate performance metrics for the predicted glomeruli masks on the kidney test data. <br>
Input - submission.csv (CSV file containing the predicted masks in RLE format), test.csv (CSV file containing the ground-truth masks in RLE format).<br>
Output - Performance metric values - Dice coefficient, Jaccard index, pixel accuracy, Hausdorff distance. <br>

<b>How to use?</b><br> 
Set <code>DATA_PATH</code> and <code>path_test</code> to where your data lives and you're good to go. <br>

<b>How to reproduce on a different dataset?</b><br>
Create train and test folders for the dataset, containing the train images and masks and the test images and masks respectively. Provide a train.csv with the RLE masks for the train images and a sample-submission file with the test image names. Then create a test.csv with the RLE masks for the test images, and a predictions CSV from the trained network.

<hr>


<h6> Step 1 - Import useful libraries</h6>

In [8]:
import numpy as np
import pandas as pd
from sklearn.metrics import jaccard_score
from scipy.spatial.distance import directed_hausdorff
import json
import cv2
import matplotlib.pyplot as plt
from PIL import Image

In [11]:
DATA_PATH = r'C:/Users/soodn/Downloads/Naveksha/Kaggle HuBMAP/'
df = pd.read_csv('submission_kidney_pvt_deeplive.csv')
# Read the ground-truth RLE masks into `rles`
rles = pd.read_csv(DATA_PATH + 'Data/kidney-data/private_test.csv')

In [12]:
path_test = DATA_PATH + 'Data/kidney-data/private test/'

<h6> Step 2 - Write utility functions </h6> 

In [13]:
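def dice_scores_img(pred, truth, eps=1e-8):
    # Flatten both masks and binarize: any non-zero label counts as foreground
    pred = pred.reshape(-1) > 0
    truth = truth.reshape(-1) > 0
    intersect = (pred & truth).sum(-1)
    # Sum of the two foreground areas (the Dice denominator, not a set union)
    union = pred.sum(-1) + truth.sum(-1)

    # eps keeps the score defined (and equal to 1) when both masks are empty
    dice = (2.0 * intersect + eps) / (union + eps)
    return dice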
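The output list above also names the Jaccard index and pixel accuracy, but only the Dice utility is defined in this step. The following is a minimal sketch of how those two metrics could be computed in the same style. The function names `jaccard_index` and `pixel_accuracy` are hypothetical and are not part of the original notebook.

```python
import numpy as np

def jaccard_index(pred, truth, eps=1e-8):
    # Hypothetical helper: intersection over union of the foreground pixels
    pred = pred.reshape(-1) > 0
    truth = truth.reshape(-1) > 0
    intersect = (pred & truth).sum()
    union = (pred | truth).sum()
    return (intersect + eps) / (union + eps)

def pixel_accuracy(pred, truth):
    # Hypothetical helper: fraction of pixels where both masks agree
    pred = pred.reshape(-1) > 0
    truth = truth.reshape(-1) > 0
    return (pred == truth).mean()

# Toy masks: two 2x2 squares that overlap in one column (2 pixels)
truth = np.zeros((4, 4), dtype=np.uint8)
truth[0:2, 0:2] = 1
pred = np.zeros((4, 4), dtype=np.uint8)
pred[0:2, 1:3] = 1

jac = jaccard_index(pred, truth)   # 2 shared pixels out of 6 in the union
acc = pixel_accuracy(pred, truth)  # 12 of 16 pixels agree
print(jac, acc)
```

For a single pair of masks the two overlap scores are related by J = D / (2 - D), so a Dice of 0.5 on the toy masks corresponds to a Jaccard index of 1/3.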

In [14]:
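def rle_encode_less_memory(img):
    # Column-major flatten; flatten() returns a copy, so img itself is untouched
    pixels = img.T.flatten()
    # Force the first and last pixel to background so every run has both edges
    pixels[0] = 0
    pixels[-1] = 0
    # 1-based positions where the value changes: run starts and run ends alternate
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 2
    # Turn each end position into a run length
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)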

def enc2mask(encs, shape):
    img = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for m, enc in enumerate(encs):
        if isinstance(enc, float) and np.isnan(enc):
            continue
        enc_split = enc.split()
        for i in range(len(enc_split) // 2):
            start = int(enc_split[2 * i]) - 1
            length = int(enc_split[2 * i + 1])
            img[start: start + length] = 1 + m

    return img.reshape(shape).T
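To make the encoding convention concrete, here is a small round-trip sketch on a toy mask. `rle_encode` repeats the logic of `rle_encode_less_memory`, and `rle_decode` is a hypothetical single-string simplification of `enc2mask` (binary output, one encoding), included only so the block runs on its own.

```python
import numpy as np

def rle_encode(mask):
    # Same logic as rle_encode_less_memory: column-major order, 1-based starts
    pixels = mask.T.flatten()
    pixels[0] = 0
    pixels[-1] = 0
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 2
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

def rle_decode(enc, shape):
    # Hypothetical simplification of enc2mask for one encoding string.
    # As in enc2mask, the result has shape (shape[1], shape[0]).
    img = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    tokens = enc.split()
    for i in range(len(tokens) // 2):
        start = int(tokens[2 * i]) - 1
        length = int(tokens[2 * i + 1])
        img[start: start + length] = 1
    return img.reshape(shape).T

# Toy mask with 4 rows and 5 columns, and a 2x2 foreground square
mask = np.zeros((4, 5), dtype=np.uint8)
mask[1:3, 2:4] = 1

encoded = rle_encode(mask)
# The decoder takes (width, height), i.e. the mask shape reversed
decoded = rle_decode(encoded, mask.shape[::-1])
print(encoded)
print(np.array_equal(decoded, mask))
```

Because the decoder reshapes and then transposes, the `shape` argument has to be (width, height) to recover the original layout. Overlap metrics such as Dice give the same value as long as both masks are decoded with the same `shape`, but distance-based metrics depend on the layout being the true one.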

In [15]:
def read_mask(mask_file, mask_shape):
    # Load the polygon annotations; the context manager closes the file
    with open(mask_file, "r", encoding='utf-8') as read_file:
        mask_data = json.load(read_file)
    polys = []
    for feature in mask_data:
        geom = np.array(feature['geometry']['coordinates'], dtype=np.int32)
        polys.append(geom)

    # Rasterize the polygons; uint8 needs an eighth of the memory of float64
    mask = np.zeros(mask_shape, dtype=np.uint8)
    cv2.fillPoly(mask, polys, 1)
    mask = mask.astype(bool)
    return mask
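Hausdorff distance is listed among the output metrics and `directed_hausdorff` is imported in Step 1, but no utility for it is defined here. The following is a minimal sketch of one way it could be computed, assuming the symmetric distance (the larger of the two directed distances) over foreground pixel coordinates is what is wanted. The function name `hausdorff_distance` is hypothetical.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff_distance(pred, truth):
    # Hypothetical helper: symmetric Hausdorff distance in pixels
    pred_pts = np.argwhere(pred > 0)    # (row, col) of each foreground pixel
    truth_pts = np.argwhere(truth > 0)
    if len(pred_pts) == 0 or len(truth_pts) == 0:
        return float('nan')             # undefined when either mask is empty
    forward = directed_hausdorff(pred_pts, truth_pts)[0]
    backward = directed_hausdorff(truth_pts, pred_pts)[0]
    return max(forward, backward)

# Toy masks: two 2x2 squares, the prediction shifted two columns to the right
truth = np.zeros((6, 6), dtype=np.uint8)
truth[1:3, 1:3] = 1
pred = np.zeros((6, 6), dtype=np.uint8)
pred[1:3, 3:5] = 1

hd = hausdorff_distance(pred, truth)
print(hd)
```

On whole-slide masks of the size used in this notebook, passing every foreground pixel is likely to be too slow and memory-hungry. Restricting the point sets to boundary pixels or working on a downsampled mask are possible workarounds, at the cost of an approximate value.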

<h6> Step 3 - Calculate mean metrics values for test images </h6>

In [17]:
# json, cv2, matplotlib and PIL are already imported in Step 1
from osgeo import gdal
sum_score = 0
scores = []

pvt_test = ['00a67c839', '0749c6ccc', '1eb18739d', '5274ef79a', '5d8b53a68', '9e81e2693', 'a14e495cf', 'bacb03928', 'e464d2f6c',
'ff339c0b2']
for img in pvt_test:
    img_array = gdal.Open(path_test+img+'.tiff').ReadAsArray()
    shape =  img_array.shape[1], img_array.shape[2]
    truth = rles[rles['id'] == img]['expected']
    mask_truth = enc2mask(truth, shape)
    print (mask_truth.shape)
    pred = df[df['id'] == img]['predicted']
    mask_pred = enc2mask(pred, shape)  
    print (mask_pred.shape)
    score = dice_scores_img(mask_pred, mask_truth)
    print (score)
    scores.append(score)
    sum_score += score



(28672, 30400)
(28672, 30400)
0.947306437060565
(26624, 30368)
(26624, 30368)
0.960884703233028
(33103, 20329)
(33103, 20329)
0.9544122761166374
(18491, 22134)
(18491, 22134)
0.923693008092995
(36732, 22153)
(36732, 22153)
0.9331075763117931
(33100, 27642)
(33100, 27642)
0.9476178319718085
(32768, 62688)
(32768, 62688)
0.9671245404897417
(22163, 23968)
(22163, 23968)
0.937107881688568
(40816, 50560)
(40816, 50560)
0.9658314760007455
(38912, 48544)
(38912, 48544)
0.9661910278158478


In [18]:
# To find the mean, divide by the number of test images that were scored
n_images = len(scores)
for s in scores:
    print (round(s, 3))
    
print (round(sum_score/n_images,3))

0.947
0.961
0.954
0.924
0.933
0.948
0.967
0.937
0.966
0.966
0.95


<hr> 
2 - 0.7823584118305178 0.9789166355475171 0.65511773076382 17.892116009274872 <br>
3 - 0.8108048015490154 0.9801827959468621 0.6902205216112944 17.514854634509877 <br>
4 - 0.8177225324575326 0.9790151482771239 0.6976625988788336 18.188822003916233 <br>
