# Description of the evaluation metric
This competition is evaluated on the mean **Dice coefficient**. The Dice coefficient can be used to compare the pixel-wise agreement between a predicted segmentation and its corresponding ground truth. The formula is given by:

$$\frac{2 \cdot |X \cap Y|}{|X| + |Y|}$$

where X is the predicted set of pixels and Y is the ground truth. The Dice coefficient is defined to be 1 when both X and Y are empty. The leaderboard score is the mean of the Dice coefficients for each image in the test set.
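
A minimal sketch of this metric in NumPy, assuming the prediction and the ground truth are binary masks of the same shape. The helper name `dice_coefficient` is illustrative and not part of the competition code.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice coefficient of two binary masks (hypothetical helper)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    # By definition the score is 1 when both masks are empty
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / total

pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
# |X| = 3, |Y| = 3, |X ∩ Y| = 2, so the score is 4/6
print(dice_coefficient(pred, truth))
```

The leaderboard score would then be the mean of this value over all test images.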

# Submission File
In order to reduce the submission file size, our metric uses run-length encoding on the pixel values. Instead of submitting an exhaustive list of indices for your segmentation, you will submit pairs of values that contain a start position and a run length. E.g. '1 3' implies starting at pixel 1 and running a total of 3 pixels (1,2,3).

Note that, at the time of encoding, the mask should be binary, meaning the masks for all objects in an image are joined into a single large mask. A value of 0 should indicate pixels that are not masked, and a value of 1 will indicate pixels that are masked.

The competition format requires a space delimited list of pairs. For example, '1 3 10 5' implies pixels 1,2,3,10,11,12,13,14 are to be included in the mask. The metric checks that the pairs are sorted, positive, and the decoded pixel values are not duplicated. The pixels are numbered from top to bottom, then left to right: 1 is pixel (1,1), 2 is pixel (2,1), etc.
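
The numbering convention is easiest to see by decoding the example string. The sketch below assumes a 4 x 4 image purely for illustration; `rle_decode` is a hypothetical helper, not a function provided by the competition.

```python
import numpy as np

def rle_decode(rle, height, width):
    """Decode a 'start length ...' string into a (height, width) binary mask."""
    values = np.array(rle.split(), dtype=int)
    starts = values[0::2] - 1          # pixel numbers are 1-based
    lengths = values[1::2]
    flat = np.zeros(height * width, dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    # Pixels run top to bottom, then left to right (column-major order)
    return flat.reshape((width, height)).T

mask = rle_decode('1 3 10 5', height=4, width=4)
# Recover the 1-based pixel numbers by flattening column by column
pixel_numbers = (np.flatnonzero(mask.T.flatten()) + 1).tolist()
print(pixel_numbers)  # [1, 2, 3, 10, 11, 12, 13, 14]
```

Here `mask[1, 0]` is set because pixel 2 is (2,1), the second row of the first column.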

# Files
* [train/test].csv Metadata for the train/test set. Only the first few rows of the test set are available for download.
  * id - The image ID.
  * organ - The organ that the biopsy sample was taken from.
  * data_source - Whether the image was provided by HuBMAP or HPA.
  * img_height - The height of the image in pixels.
  * img_width - The width of the image in pixels.
  * pixel_size - The height/width of a single pixel from this image in micrometers. All HPA images have a pixel size of 0.4 µm. For HuBMAP imagery the pixel size is 0.5 µm for kidney, 0.2290 µm for large intestine, 0.7562 µm for lung, 0.4945 µm for spleen, and 6.263 µm for prostate.
  * tissue_thickness - The thickness of the biopsy sample in micrometers. All HPA images have a thickness of 4 µm. The HuBMAP samples have tissue slice thicknesses 10 µm for kidney, 8 µm for large intestine, 4 µm for spleen, 5 µm for lung, and 5 µm for prostate.
  * rle - The target column. A run length encoded copy of the annotations. Provided for the training set only.
  * age - The patient's age in years. Provided for the training set only.
  * sex - The sex of the patient. Provided for the training set only.
* sample_submission.csv
  * id - The image ID.
  * rle - A run length encoded mask of the FTUs in the image.
* [train/test]_images/ The images. Expect roughly 550 images in the hidden test set. All HPA images are 3000 x 3000 pixels with a tissue area within the image around 2500 x 2500 pixels. The HuBMAP images range in size from 4500 x 4500 down to 160 x 160 pixels. HPA samples were stained with antibodies visualized with 3,3'-diaminobenzidine (DAB) and counterstained with hematoxylin. HuBMAP images were prepared using Periodic acid-Schiff (PAS)/hematoxylin and eosin (H&E) stains. All images used have at least one FTU. All tissue data used in this competition is from healthy donors that pathologists identified as pathologically unremarkable tissue.
* train_annotations/ The annotations provided in the format of points that define the boundaries of the polygon masks of the FTUs.

In [None]:
import os

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from PIL import Image
import cv2
import tifffile
from fastai.vision.all import *
from fastai.callback.hook import *
from fastai.data.all import *

In [None]:
# Make dir
!mkdir -p /root/.cache/torch/hub/checkpoints


#Resnet34
!cp ../input/resnet34/resnet34-b627a593.pth /root/.cache/torch/hub/checkpoints/resnet34-b627a593.pth
#Resnet50
!cp ../input/resnet50/resnet50-0676ba61.pth /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
#Resnet101
!cp ../input/resnet101/resnet101-63fe2227.pth /root/.cache/torch/hub/checkpoints/resnet101-63fe2227.pth

In [None]:
# Paths
_BASE_DIR = '../input/hubmap-2022-512x512'
_IMG_DIR = os.path.join(_BASE_DIR,'train')
_MSK_DIR = os.path.join(_BASE_DIR,'masks')

# Data
traindf = pd.read_csv('/kaggle/input/hubmap-organ-segmentation/train.csv')
N_SAMPLES = len(traindf)
N_ORGANS = traindf['organ'].nunique()
ORGANS = sorted(traindf['organ'].unique())
# Map each organ name to a 1-based category id
ORGAN_CATS = {organ: i for i, organ in enumerate(ORGANS, start=1)}

# Params
BATCH_SIZE = 2
IMAGE_SIZE = 512
IMAGE_RESIZE = IMAGE_SIZE // 2
THRESHOLD = .39
M_DIR = './model'
OPTIMIZER = ranger
ACTIVATION_F = Mish
TEST_IMG_SIZE = 512

In [None]:
# Print data infos
print('-'*50)
print(f'''
[Info] Organ information:     

        N_SAMPLES: {N_SAMPLES}
        N_ORGANS: {N_ORGANS}
        ORGAN NAMES: {ORGANS}
        ORGAN CATEGORIES: {ORGAN_CATS}

''')
print('-'*50)

display(traindf.info())
display(traindf.describe())

In [None]:
# Func for train images and masks
def get_img_fn(path): return get_image_files(path)

In [None]:
# Test func on images
img_fnames = get_img_fn(_IMG_DIR)
img_fnames[:3]

In [None]:
# Test func on masks
label_fnames = get_image_files(_MSK_DIR)
label_fnames[:3]

In [None]:
# Plot test image
img_f = img_fnames[12]
img = load_image(img_f)
plt.imshow(img)

In [None]:
# Func for defining label path for data block
get_msk_fn = lambda x: f'{_MSK_DIR}/{x.stem}{x.suffix}'

In [None]:
# Test func
get_msk_fn(img_f)

In [None]:
# Plot test mask
msk = Image.open(get_msk_fn(img_f))
plt.imshow(msk)

In [None]:
# Define blocks
blocks = (ImageBlock, MaskBlock(ORGANS))

In [None]:
# Build data block
dblock = DataBlock(blocks    = blocks,
                   get_items = get_img_fn,
                   get_y     = get_msk_fn,
                   splitter  = RandomSplitter(),
                   item_tfms = Resize(IMAGE_RESIZE))

In [None]:
# Print summary of data block
dblock.summary(_IMG_DIR)

In [None]:
# Define data loader
dls = dblock.dataloaders(_IMG_DIR, 
                         bs=BATCH_SIZE)

In [None]:
# Plot batch of 4
dls.train.show_batch(max_n=4, nrows=1)

In [None]:
# Check sizes of tensors
b = dls.one_batch()
len(b), b[0].shape, b[1].shape

In [None]:
# Define training model --- Used model: UNET with Resnet backbone
learn = unet_learner(dls, 
                     resnet101,  
                     model_dir=M_DIR, 
                     self_attention=True, 
                     act_cls=ACTIVATION_F, 
                     opt_func=OPTIMIZER
                    )
learn.path = Path(M_DIR)

In [None]:
# Use fastai method for finding learning rate
learn.lr_find(suggest_funcs=(minimum, steep, valley, slide))

In [None]:
# Fit method: Fit one cycle
learn.fit_one_cycle(1, slice(1e-06,1e-03), pct_start=0.9)
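
Passing `slice(1e-06,1e-03)` asks for discriminative learning rates. As far as fastai's behaviour is understood here, the first parameter group gets the lower bound, the last group gets the upper bound, and the groups in between are spaced geometrically. The sketch below only reproduces that arithmetic in plain NumPy: `spread_lrs` is a hypothetical helper, and the count of three parameter groups is an assumption about this learner, not something checked in this notebook.

```python
import numpy as np

def spread_lrs(start, stop, n_groups):
    """Geometrically spaced learning rates from start to stop (illustrative only)."""
    if n_groups == 1:
        return np.array([stop])
    # Constant multiplicative step between neighbouring groups
    step = (stop / start) ** (1 / (n_groups - 1))
    return np.array([start * step**i for i in range(n_groups)])

# Assumed: three parameter groups (early backbone, late backbone, head)
lrs = spread_lrs(1e-6, 1e-3, n_groups=3)
print(lrs)
```

Under these assumptions the earliest layers are updated about a thousand times more gently than the head.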

In [None]:
# Plot sample results
learn.show_results()

In [None]:
# Unfreeze weights for second training round
learn.unfreeze()

In [None]:
# Use fastai method for finding learning rate 2
learn.lr_find()

In [None]:
# Fit method: Fit one cycle 2
learn.fit_one_cycle(2, slice(1e-5,1e-4), pct_start=0.8)

In [None]:
# Plot sample results 2
learn.show_results()

In [None]:
# Load test image
def get_img(img_path): return tifffile.imread(img_path)

test_img = get_img('../input/hubmap-organ-segmentation/test_images/10078.tiff')

In [None]:
# Pred mask
test_pred = learn.predict(test_img)
mask_pred = list(zip(*test_pred))[0][2]

In [None]:
# Load test dataframe for information purposes
testdf = pd.read_csv('../input/hubmap-organ-segmentation/test.csv')
testdf

In [None]:
# Resize tensor to original shape and make predicted mask area more visible by threshold
def resize_tensor(tensor, size=None, dtype=np.uint8):
    # cv2.resize takes the target size as a (width, height) tuple
    return cv2.resize(tensor, (size, size), interpolation=cv2.INTER_CUBIC).astype(dtype)

mask_pred_resized = resize_tensor(mask_pred.numpy(), size=TEST_IMG_SIZE, dtype=np.float32)
mask_binary = (mask_pred_resized > THRESHOLD).astype(np.int8)

In [None]:
# Plot original test image and predicted binary mask
f, ax = plt.subplots(nrows=1, ncols=2, figsize=(15,15))
# --------------------------------------------------------------
ax[0].imshow(test_img)
ax[0].set_title('Original', size=16)

ax[1].imshow(mask_binary)
ax[1].set_title('Pred_Mask', size=16)
# --------------------------------------------------------------

plt.show()

In [None]:
# Func for converting binary mask image to run length encoding (rle), which is the target variable in this competition
def mask2rle(mask, orig_dim=TEST_IMG_SIZE):
    # Rescale the (square) mask to the original image size
    size = int(len(mask.flatten())**.5)
    n = Image.fromarray(mask.reshape((size, size))*255.0)
    n = n.resize((orig_dim, orig_dim))
    n = np.array(n).astype(np.float32)
    # Flatten column by column (top to bottom, then left to right)
    pixels = n.T.flatten()
    # Binarize at the midpoint of the value range; a constant mask yields all zeros
    lo, hi = pixels.min(), pixels.max()
    pixels = (pixels - lo > (hi - lo) / 2).astype(int)

    # Pad with zeros so that runs touching either end are detected
    pixels = np.concatenate([[0], pixels, [0]])
    # Change points, shifted by 1 because the competition numbers pixels from 1
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    # Turn every (start, end) pair into (start, length)
    runs[1::2] -= runs[::2]

    return ' '.join(str(x) for x in runs)

In [None]:
# Converting binary mask
rle = mask2rle(mask_binary)
rle
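
The padding-and-difference trick behind the encoder can be checked on a tiny mask whose answer is easy to work out by hand. The sketch below is a standalone re-implementation that skips the resizing and re-thresholding steps; `encode_runs` is a hypothetical name, and it follows the 1-based, column-major numbering from the competition description.

```python
import numpy as np

def encode_runs(mask):
    """Run-length encode a binary mask as 'start length ...' (illustrative only)."""
    pixels = mask.T.flatten()                      # column-major order
    padded = np.concatenate([[0], pixels, [0]])    # catch runs at either end
    # Positions where the value flips, converted to 1-based pixel numbers
    changes = np.where(padded[1:] != padded[:-1])[0] + 1
    changes[1::2] -= changes[::2]                  # (start, end) -> (start, length)
    return ' '.join(str(x) for x in changes)

mask = np.array([[1, 0, 0],
                 [1, 0, 1],
                 [0, 0, 1]], dtype=np.uint8)
# Column by column the mask reads 1 1 0 | 0 0 0 | 0 1 1,
# i.e. pixels 1-2 and pixels 8-9 are set
print(encode_runs(mask))  # 1 2 8 2
```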

In [None]:
# Iterate over test images (in this case only one) and build submission dataframe
import gc
df_sample = pd.read_csv('../input/hubmap-organ-segmentation/sample_submission.csv')
TEST_IMG_DIR = '../input/hubmap-organ-segmentation/test_images/'

names,preds = [],[]
for _, row in df_sample.iterrows():
    img_id = str(row['id'])
    ds = get_img(os.path.join(TEST_IMG_DIR, img_id+'.tiff'))
    # Side length of this image instead of a hard-coded value
    # (assumes a square image stored as height x width x channels)
    orig_size = ds.shape[0]
    mp = learn.predict(ds)
    mp = list(zip(*mp))[0][2]
    mp_resized = resize_tensor(mp.numpy(), size=orig_size, dtype=np.float32)
    mp_binary = (mp_resized > THRESHOLD).astype(np.int8)
    # Encode at the original resolution; the default orig_dim would shrink the mask to TEST_IMG_SIZE
    rle = mask2rle(mp_binary, orig_dim=orig_size)
    names.append(img_id)
    preds.append(rle)
    del ds
    gc.collect()

In [None]:
# Make submission dataframe
df = pd.DataFrame({'id':names,'rle':preds})
df.to_csv('submission.csv',index=False)

In [None]:
# Final check
df