## Contents

1. [Introduction](#1)
1. [About Dataset and Data](#2)
1. [Let's see what the kidney images look like](#3)
1. [Modeling](#4)

 <a id="1"></a> <br>
# <div class="alert alert-block alert-info"> Introduction </div>


###  Just as the Human Genome Project mapped the entirety of human DNA, the Human BioMolecular Atlas Program (HuBMAP) is a major endeavor. Sponsored by the National Institutes of Health (NIH), HuBMAP is working to catalyze the development of a framework for mapping the human body at a level of glomeruli functional tissue units for the first time in history. 

## Hoping to become one of the world’s largest collaborative biological projects, HuBMAP aims to be an open map of the human body at the cellular level.


# 🙌 😊 👍 Upvote if you find this Kernel useful 

 <a id="2"></a> <br>
# <div class="alert alert-block alert-info"> About Dataset and Data </div>


# About Dataset


### Size of Data

The data is huge (24.5 GB). The HuBMAP data used in this hackathon includes 11 fresh frozen and 9 Formalin Fixed Paraffin Embedded (FFPE) PAS kidney images. Glomeruli FTU annotations exist for all 20 tissue samples.

The dataset consists of very large TIFF files (from over 500 MB up to 5 GB each). The training set has 8 images, and the public test set has 5. The private test set is larger than the public test set.
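
Images of this size are usually not fed to a network whole. A common workaround is to cut each image into fixed-size tiles. The following is a minimal sketch, assuming non-overlapping 256x256 tiles and zero padding on the bottom and right edges; `tile_image` is a hypothetical helper, not part of the competition code, and the random array only stands in for a real slide.

```python
import numpy as np


def tile_image(image, tile_size=256):
    """Split an (H, W, C) array into non-overlapping square tiles.

    Hypothetical helper: pads the bottom and right edges with zeros so that
    both dimensions become multiples of tile_size.
    Returns an array of shape (n_tiles, tile_size, tile_size, C).
    """
    height, width, channels = image.shape
    pad_h = (-height) % tile_size
    pad_w = (-width) % tile_size
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="constant")

    rows = padded.shape[0] // tile_size
    cols = padded.shape[1] // tile_size
    # Split each spatial axis into (block index, offset inside block) ...
    tiles = padded.reshape(rows, tile_size, cols, tile_size, channels)
    # ... then bring the two block indices together.
    tiles = tiles.transpose(0, 2, 1, 3, 4)
    return tiles.reshape(rows * cols, tile_size, tile_size, channels)


# Small random stand-in for a real slide so the example runs instantly.
rng = np.random.default_rng(0)
demo = rng.integers(0, 256, size=(600, 700, 3), dtype=np.uint8)
demo_tiles = tile_image(demo)
print(demo_tiles.shape)
```

Tiles are ordered row by row, so tile `k` covers block row `k // cols` and block column `k % cols` of the padded image.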

### Train/Test Split

The training set includes annotations in both RLE-encoded and unencoded (JSON) forms. The annotations denote segmentations of glomeruli.

Both the training and public test sets also include anatomical structure segmentations. They are intended to help you identify the various parts of the tissue.




# About Data


### File structure

The JSON files are structured as follows, with each feature having:

- A type (`Feature`) and object type id (`PathAnnotationObject`). Note that these fields are the same between all files and do not offer signal.
- A geometry containing a `Polygon` with coordinates for the feature's enclosing volume.
- Additional properties, including the name and color of the feature in the image.

The `IsLocked` field is the same across file types (locked for glomerulus, unlocked for anatomical structure) and is not signal-bearing.

Note that the objects themselves do NOT have unique IDs. The expected prediction for a given image is an RLE-encoded mask containing ALL objects in the image. The mask, as mentioned on the Evaluation page, should be binary when encoded, with 0 indicating the lack of a masked pixel and 1 indicating a masked pixel.
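
To make the structure concrete, here is a sketch that parses one hand-written feature in this layout. The coordinates and property values are made up for illustration, and the exact key names (`geometry`, `properties`, `classification`, `isLocked`) are assumptions about the files, so check them against a real annotation file before relying on them. `extract_polygons` is a hypothetical helper.

```python
import json

# Hand-written sample in the layout described above (values are invented).
sample_json = """
[
  {
    "type": "Feature",
    "id": "PathAnnotationObject",
    "geometry": {
      "type": "Polygon",
      "coordinates": [[[10, 20], [30, 20], [30, 40], [10, 40], [10, 20]]]
    },
    "properties": {
      "classification": {"name": "glomerulus"},
      "isLocked": true
    }
  }
]
"""


def extract_polygons(features):
    """Return (name, outer_ring) pairs for every Polygon feature."""
    polygons = []
    for feature in features:
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            continue
        name = feature["properties"]["classification"]["name"]
        outer_ring = geometry["coordinates"][0]  # first ring is the outline
        polygons.append((name, outer_ring))
    return polygons


features = json.loads(sample_json)
polygons = extract_polygons(features)
for name, ring in polygons:
    print(name, len(ring), "points")
```

Because the objects carry no unique IDs, the polygons from one file are simply collected into a list here; their order has no meaning.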



We are provided with the following files:

- One JSON file for each training image, structured as described under "File structure" above.
- `train.csv` contains the unique IDs for each image, as well as an RLE-encoded representation of the mask for the objects in the image. See the Evaluation tab for details of the RLE encoding scheme.
- `HuBMAP-20-dataset_information.csv` contains additional information (including anonymized patient data) about each image.

# What are we predicting?

Develop segmentation algorithms that identify glomeruli in the PAS-stained microscopy data, detecting functional tissue units (FTUs) across different tissue preparation pipelines.
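
Predictions for a task like this are typically scored by how well the predicted mask overlaps the ground truth. The sketch below computes the Dice coefficient for two binary masks. Treating Dice as the metric is an assumption here, so the competition's Evaluation page remains the authority; `dice_coefficient` and the toy masks are hypothetical.

```python
import numpy as np


def dice_coefficient(pred, truth, eps=1e-7):
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) for binary masks.

    eps avoids a division by zero when both masks are empty.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps)


# Toy masks: the ground truth covers 4 pixels, the prediction covers 6,
# and all 4 ground-truth pixels are inside the prediction.
truth = np.zeros((4, 4), dtype=np.uint8)
truth[1:3, 1:3] = 1
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:4] = 1

score = dice_coefficient(pred, truth)
print(round(score, 3))  # 0.8
```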

In [None]:
# Thanks to divamgupta for the image-segmentation-keras module.
# Install it first so that the keras_segmentation import in the next cell succeeds.
! pip install git+https://github.com/divamgupta/image-segmentation-keras

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python

import os

import matplotlib.pyplot as plt
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import tifffile as tiff  # reading the large TIFF images

# segmentation
from keras_segmentation.models.unet import vgg_unet
from IPython.display import clear_output

# Input data files are available in the read-only "../input/" directory
# The loop below lists all files under the input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
!ls ../input/hubmap-kidney-segmentation/

In [None]:
!ls ../input/hubmap-kidney-segmentation/test

In [None]:
train = pd.read_csv("../input/hubmap-kidney-segmentation/train.csv")
train.info()

In [None]:
train.head()

 <a id="3"></a> <br>
# <div class="alert alert-block alert-info"> Let's see what the kidney images look like </div>
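
Depending on the file, a TIFF reader may return the pixels with extra singleton axes or in channel-first order, in which case the first two entries of `shape` are no longer height and width. The following is a defensive sketch, assuming every image has exactly 3 color channels; `to_hwc` is a hypothetical helper and the zero arrays stand in for loaded images.

```python
import numpy as np


def to_hwc(image):
    """Return the image as (height, width, 3), whatever layout it arrived in."""
    image = np.squeeze(image)  # drop singleton axes such as (1, 1, 3, H, W)
    if image.ndim != 3:
        raise ValueError(f"expected a 3-D color image, got shape {image.shape}")
    # Channel-first layout: move the channel axis to the end.
    if image.shape[0] == 3 and image.shape[-1] != 3:
        image = np.transpose(image, (1, 2, 0))
    return image


channel_first = np.zeros((1, 1, 3, 40, 50), dtype=np.uint8)
channel_last = np.zeros((40, 50, 3), dtype=np.uint8)
print(to_hwc(channel_first).shape, to_hwc(channel_last).shape)
```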


In [None]:
image = tiff.imread('../input/hubmap-kidney-segmentation/train/' + train.iloc[1,0] + ".tiff")
print("This image's id:", train.iloc[1,0])
image.shape

In [None]:
plt.figure(figsize=(15, 15))
plt.imshow(image)

# Decoding the mask in the image
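
Before decoding a real mask, it helps to see the format on a tiny example. The sketch below assumes the convention used by the functions in this notebook: space-separated `start length` pairs, 1-indexed starts, and pixels numbered down each column before moving to the next. `encode_rle` and `decode_rle` are hypothetical names for compact versions of the same logic.

```python
import numpy as np


def encode_rle(mask):
    """Encode a binary (H, W) mask as 'start length' pairs, column by column."""
    flat = mask.T.flatten()
    padded = np.concatenate([[0], flat, [0]])
    # 1-indexed positions where the value flips; starts and ends alternate.
    changes = np.where(padded[1:] != padded[:-1])[0] + 1
    changes[1::2] -= changes[::2]  # turn each run end into a run length
    return " ".join(str(v) for v in changes)


def decode_rle(rle, height, width):
    """Decode 'start length' pairs back into a binary (H, W) mask."""
    values = np.array(rle.split(), dtype=int)
    starts, lengths = values[0::2] - 1, values[1::2]
    flat = np.zeros(height * width, dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    return flat.reshape(width, height).T


toy = np.array([[0, 1, 0],
                [1, 1, 0],
                [0, 0, 1]], dtype=np.uint8)

rle = encode_rle(toy)
print(rle)  # 2 1 4 2 9 1
restored = decode_rle(rle, *toy.shape)
```

Reading the columns top to bottom gives `0 1 0 | 1 1 0 | 0 0 1`, so the runs start at positions 2, 4, and 9 with lengths 1, 2, and 1.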

In [None]:
# Thanks to - https://www.kaggle.com/paulorzp/rle-functions-run-lenght-encode-decode

def mask2rle(img):
    '''
    img: numpy array, 1 - mask, 0 - background
    Returns the run-length encoding as a formatted string
    '''
    # Flatten column by column, then pad with zeros so every run has a start and an end
    pixels = img.T.flatten()
    pixels = np.concatenate([[0], pixels, [0]])
    # 1-indexed positions where the value changes (run starts and run ends alternate)
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    # Turn each run end into a run length
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

def rle2mask(mask_rle, shape=(1600,256)):
    '''
    mask_rle: run-length encoding as a formatted string (start length)
    shape: (width, height) of array to return
    Returns numpy array, 1 - mask, 0 - background
    '''
    s = mask_rle.split()
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0:][::2], s[1:][::2])]
    starts -= 1  # RLE starts are 1-indexed
    ends = starts + lengths
    img = np.zeros(shape[0]*shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    # Pixels were numbered column by column, so reshape to (width, height) and transpose
    return img.reshape(shape).T

In [None]:
mask = rle2mask(train.iloc[1, 1], (image.shape[1], image.shape[0]))
mask.shape

In [None]:
plt.figure(figsize=(10,10))
plt.imshow(image)
plt.imshow(mask, cmap='coolwarm', alpha=0.5)

 <a id="4"></a> <br>
# <div class="alert alert-block alert-info">Modeling </div>

In [None]:
data_path = '/kaggle/input/'
os.listdir(data_path)

In [None]:
path_train = os.path.join(data_path, 'hubmap-256x256/train')
path_masks = os.path.join(data_path, 'hubmap-256x256/masks')

path_test = os.path.join(data_path, 'hubmap-256x256-test-data')

print(f'No. of training images: {len(os.listdir(path_train))}')
print(f'No. of masks: {len(os.listdir(path_masks))}')
print()
print(f'No. of test images: {len(os.listdir(path_test))}')

In [None]:
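# Compare the two directory listings as sets. The original for/else printed the
# success message even when mismatches were found, because a for loop's else
# branch runs whenever the loop finishes without a break.
train_files = set(os.listdir(path_train))
mask_files = set(os.listdir(path_masks))
missing_masks = sorted(train_files - mask_files)

if missing_masks:
    print(f'{len(missing_masks)} training images have no matching mask, e.g. {missing_masks[:5]}')
else:
    print('All corresponding filenames are same.')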

In [None]:
sample_filename = os.listdir(path_train)[120]

sample_image = plt.imread( os.path.join(path_train, sample_filename))

sample_mask = plt.imread(os.path.join(path_masks, sample_filename))

_, ax = plt.subplots(1, 2)
ax[0].imshow(sample_image)
ax[1].imshow(sample_mask)

In [None]:
model = vgg_unet(n_classes = 2, input_height = 256, input_width = 256)

In [None]:
model.train(train_images = path_train,train_annotations=path_masks,checkpoints_path='/kaggle/working/',epochs=6)

In [None]:
output = model.predict_segmentation(inp = os.path.join(path_train, sample_filename))

In [None]:
plt.imshow(output)

# Model Architecture 

In [None]:
# Only plot_model is needed to draw the architecture
from keras.utils import plot_model

plot_model(model, show_shapes=True, show_layer_names=True)

# ..... Preparing Submission dataset
