<h1 style="text-align: center; font-family: Verdana; font-size: 32px; font-style: normal; font-weight: bold; text-decoration: none; text-transform: none; font-variant: small-caps; letter-spacing: 3px; color: #468282; background-color: #ffffff;">HuBMAP + HPA - Hacking the Human Body</h1>
<h2 style="text-align: center; font-family: Verdana; font-size: 24px; font-style: normal; font-weight: bold; text-decoration: underline; text-transform: none; letter-spacing: 2px; color: navy; background-color: #ffffff;">Segment multi-organ functional tissue units</h2>

<img src="https://storage.googleapis.com/kaggle-competitions/kaggle/34547/logos/header.png"> 

> # 📌Introduction: 
>> Kaggle has hosted many segmentation competitions recently. This one closely resembles earlier ones such as [Sartorius CIS](https://www.kaggle.com/competitions/sartorius-cell-instance-segmentation/overview) and, to a lesser extent, the ongoing [UW-Madison GI Tract Image Segmentation](https://www.kaggle.com/competitions/uw-madison-gi-tract-image-segmentation), since all of them pose a segmentation problem on medical data. A good starting point is to try the models used in those competitions and to learn the frameworks they relied on, such as MMDetection and Detectron2. The evaluation metric is the mean Dice coefficient; the "mean" most likely refers to averaging the per-image Dice scores over the test set. The Dice coefficient itself is a very popular metric for image segmentation. If you are not familiar with it, check out this [NB](https://www.kaggle.com/code/yerramvarun/understanding-dice-coefficient). For a baseline, use an encoder-decoder model such as U-Net, SegNet, or ENet, then move on to more complex models and to pre-processing and post-processing techniques. Start from the augmentations used in the previous competitions, tune them by trial and error for your model, or come up with new ones.

>> Data for this competition comes from two different consortia, the Human Protein Atlas (HPA) and the Human BioMolecular Atlas Program (HuBMAP). As mentioned in the Data tab, one of the main challenges of this competition will be adapting models to function properly when presented with data collected using a different protocol. The reason is the way the three datasets are split: the training set contains only public HPA data, the public test set is a combination of private HPA data and HuBMAP data, and the private test set contains only HuBMAP data. The images are high resolution, but the number of training images is very small (351).

>> I plan to publish three parts covering data preparation, model training, and model inference. This is a very basic version of the notebook; I will try to improve it over time.
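
The mean Dice coefficient mentioned in the introduction can be sketched in a few lines of NumPy. This is a minimal illustration, not the official scoring code: the function names `dice_coefficient` and `mean_dice` are hypothetical, the "mean" is assumed to be an average over per-image scores, and two empty masks are assumed to count as a perfect match.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return float(2.0 * intersection / total)

def mean_dice(preds, targets):
    """Average the per-image Dice scores (assumed meaning of 'mean')."""
    return float(np.mean([dice_coefficient(p, t) for p, t in zip(preds, targets)]))

# Toy masks: `a` has two foreground pixels, `b` has one, and they share one.
a = np.array([[1, 1, 0], [0, 0, 0]])
b = np.array([[1, 0, 0], [0, 0, 0]])
print(dice_coefficient(a, b))         # 2 * 1 / (2 + 1)
print(mean_dice([a, a], [b, a]))      # average of the two per-image scores
```

A score of 1 means the predicted and ground-truth masks overlap perfectly, and 0 means they do not overlap at all.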

# Imports

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import cv2
import matplotlib.pyplot as plt
import json
from tqdm import tqdm

In [None]:
DIR = "../input/hubmap-organ-segmentation"
train_df = pd.read_csv(os.path.join(DIR,"train.csv"))
train_df.head()

In [None]:
train_df.info()

In [None]:
# source: https://www.kaggle.com/code/julian3833/sartorius-starter-torch-mask-r-cnn-lb-0-273 with small changes
def rle_decode(mask_rle, shape, color=1):
    '''
    mask_rle: run-length encoding as a string of "start length" pairs (starts are 1-indexed)
    shape: (height, width) of the array to return
    Returns a float32 numpy array: `color` (default 1) for mask, 0 for background
    '''
    s = mask_rle.split()
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0::2], s[1::2])]
    starts -= 1  # convert 1-indexed starts to 0-indexed
    ends = starts + lengths
    img = np.zeros(shape[0] * shape[1], dtype=np.float32)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = color
    # The runs follow column-major order. order="F" gives the same result as
    # `reshape(shape).T` for square images and keeps (height, width) for non-square ones.
    return img.reshape(shape, order="F")
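
To make the encoding concrete, here is a small round trip on a toy mask. It is a sketch under the assumption, suggested by the transpose in the decoder above, that runs are 1-indexed and scan the image column by column. `rle_encode` and `rle_decode_colmajor` are hypothetical helpers written for this illustration and are not part of the competition data or the referenced notebook.

```python
import numpy as np

def rle_encode(mask):
    """Encode a binary mask as 'start length' pairs (1-indexed, column-major)."""
    pixels = np.asarray(mask, dtype=np.uint8).flatten(order="F")
    # Pad with zeros so runs touching either end are detected as changes.
    padded = np.concatenate([[0], pixels, [0]])
    changes = np.where(padded[1:] != padded[:-1])[0] + 1
    # Every second entry is a run end; turn it into a run length.
    changes[1::2] -= changes[::2]
    return " ".join(str(x) for x in changes)

def rle_decode_colmajor(mask_rle, shape):
    """Decode 'start length' pairs into a (height, width) uint8 mask."""
    s = np.asarray(mask_rle.split(), dtype=int)
    starts, lengths = s[0::2] - 1, s[1::2]
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for lo, n in zip(starts, lengths):
        flat[lo:lo + n] = 1
    return flat.reshape(shape, order="F")

toy_mask = np.array([[0, 1, 0, 0],
                     [1, 1, 0, 0],
                     [1, 0, 0, 1]], dtype=np.uint8)

encoded = rle_encode(toy_mask)
decoded = rle_decode_colmajor(encoded, toy_mask.shape)
print(encoded)
print(np.array_equal(decoded, toy_mask))
```

Reading the toy mask column by column gives one run of four pixels starting at position 2 and one run of a single pixel at position 12, which is what the encoded string contains.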

# Checking the training data:

In [None]:
# train_df["rle"].iloc[0]

for i in np.random.choice(200, 5):  # sample 5 rows from the first 200
    img_id = train_df["id"].iloc[i]
    rle_img = rle_decode(train_df["rle"].iloc[i],(train_df["img_height"].iloc[i],train_df["img_width"].iloc[i]))
    img_dir = os.path.join(DIR, "train_images", str(img_id) + '.tiff')
    img = plt.imread(img_dir)

    plt.figure(figsize=(16,18))
    plt.subplot(1,2,1)
    plt.imshow(img)
    plt.title(f"id: {img_id} image")  # show the image id, not the row index

    plt.subplot(1,2,2)
    plt.imshow(rle_img)
    plt.title(f"id: {img_id} mask");

# Comparing `train_annotations` folder data with RLE: 

- Reading the RLE from `.csv` and creating the mask.

In [None]:
i = 0
rle_img = rle_decode(train_df["rle"].iloc[i],(train_df["img_height"].iloc[i],train_df["img_width"].iloc[i]))
img_dir = os.path.join("../input/hubmap-organ-segmentation/train_images" , str(train_df["id"].iloc[i]) + '.tiff')
#     print(img_dir)
img = plt.imread(img_dir)

plt.figure(figsize=(16,18))
plt.subplot(1,2,1)
plt.imshow(img)

plt.subplot(1,2,2)
plt.imshow(rle_img);

- Reading the polygon points from the JSON file.

In [None]:
with open("../input/hubmap-organ-segmentation/train_annotations/10044.json") as annotation_json:
    data = json.load(annotation_json)

print(len(data))  # number of polygons in the annotation

In [None]:
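image = np.zeros((3000,3000))  # canvas size is hardcoded for this image
for polygon in data:
    # fillPoly expects integer (int32) point coordinates
    image = cv2.fillPoly(image, pts=[np.array(polygon, dtype=np.int32)], color=(255,255,255))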

plt.figure(figsize=(16,18))
plt.subplot(1,2,1)
plt.imshow(img)

plt.subplot(1,2,2)
plt.imshow(image);

- Both masks are the same.
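
The visual comparison can be backed by a number. The sketch below uses synthetic arrays in place of the real masks, and `mask_agreement` is a hypothetical helper. It thresholds both masks at zero because the RLE mask uses 0/1 values while the polygon mask uses 0/255.

```python
import numpy as np

def mask_agreement(mask_a, mask_b):
    """Return (IoU, fraction of mismatched pixels) for two masks.

    Any value > 0 is treated as foreground, so 0/1 and 0/255 masks compare equal.
    """
    a = np.asarray(mask_a) > 0
    b = np.asarray(mask_b) > 0
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    union = np.logical_or(a, b).sum()
    intersection = np.logical_and(a, b).sum()
    # Assumption: two empty masks are treated as identical (IoU = 1).
    iou = 1.0 if union == 0 else float(intersection / union)
    mismatch = float(np.mean(a != b))
    return iou, mismatch

# Synthetic stand-ins: a 0/1 mask and a 0/255 mask that differ in one pixel.
rle_like = np.zeros((4, 4), dtype=np.float32)
rle_like[1:3, 1:3] = 1
poly_like = rle_like * 255
poly_like[0, 0] = 255

iou, mismatch = mask_agreement(rle_like, poly_like)
print(iou, mismatch)
```

In the notebook, the call would be `mask_agreement(rle_img, image)`. Small differences along object borders are plausible, since the two masks are rasterized in different ways.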

# Saving the masks in folder:

In [None]:
folder1 = "/kaggle/working/train_masks"
folder2 = "/kaggle/working/train_masks_np"

# Create the output folders if they do not exist yet
os.makedirs(folder1, exist_ok=True)
os.makedirs(folder2, exist_ok=True)
    
    
for i in tqdm(range(len(train_df))):
    rle_img = rle_decode(train_df["rle"].iloc[i],(train_df["img_height"].iloc[i],train_df["img_width"].iloc[i]))
    f_name1 = os.path.join(folder1, str(train_df["id"].iloc[i])+'.png')
    f_name2 = os.path.join(folder2, str(train_df["id"].iloc[i])+'.npy')

    cv2.imwrite(f_name1,rle_img)
    np.save(f_name2, rle_img)
    

In [None]:
plt.imshow(np.load(os.path.join(folder2,"10610.npy")))

In [None]:
plt.imshow(plt.imread(os.path.join(folder1,"10610.png")))

In [None]:
len(os.listdir(folder2)), len(os.listdir(folder1))
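
The `.npy` masks above are stored as `float32`, which takes four bytes per pixel. As a rough sketch of a more compact alternative, a binary mask can be cast to `uint8` and written with `np.savez_compressed`. The synthetic mask, file names, and temporary folder below are assumptions for illustration only.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for a decoded mask (float32, values 0 and 1).
mask = np.zeros((512, 512), dtype=np.float32)
mask[100:200, 150:300] = 1.0

# A binary mask loses nothing when cast to uint8.
binary = mask.astype(np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "mask_float32.npy")
    small_path = os.path.join(tmp, "mask_uint8.npz")

    np.save(raw_path, mask)                       # uncompressed float32
    np.savez_compressed(small_path, mask=binary)  # compressed uint8

    raw_size = os.path.getsize(raw_path)
    small_size = os.path.getsize(small_path)

    # Load the compressed archive back and pull out the array by key.
    with np.load(small_path) as archive:
        restored = archive["mask"]

print(f"float32 .npy: {raw_size} bytes, compressed uint8 .npz: {small_size} bytes")
print(np.array_equal(restored, binary))
```

Whether the saving matters depends on the disk quota and on how the masks are loaded during training, so treat this as an option to try.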

> # ⭕ WORK IN PROGRESS ! ! !
<p align="center">
<img src="https://media.giphy.com/media/xThuWu82QD3pj4wvEQ/giphy.gif" width="300">
</p>