![HuBMAP](https://pbs.twimg.com/media/Eg15rqyWAAMU2cJ.png "HuBMAP")


# HuBMAP: Hacking the Kidney
##### **Identify glomeruli in human kidney tissue images**

The goal of the Human BioMolecular Atlas Program (HuBMAP) is to develop an open and global platform to map healthy cells in the human body.  In humans, the proper functioning of organs and tissues is dependent on the interaction, spatial organization, and specialization of all our cells.  Scientists estimate there are 37 trillion cells in an adult human body, so determining the function and relationship among these cells is a monumental undertaking.  Using the latest molecular and cellular biology technologies, HuBMAP researchers are studying the connections that cells have with each other throughout the body. The resulting HuBMAP Atlas will be openly available to accelerate understanding of the relationships between cell and tissue organization and function and human health.


This competition, “Hacking the Kidney,” starts by mapping the human kidney at single cell resolution. The goal is to detect glomeruli functional tissue units (FTUs) across different tissue preparation pipelines.

# Table of Contents
1. [Introduction - The Kidney 📚](#Introduction) 
2. [The Data 💾](#Data)




## Introduction - The Kidney 📚




Kidneys are a pair of red-brown organs attached to the back of the abdominal cavity, surrounded by a thick protective layer of fat and connective tissue. On a human, if you place your hands on your hips, your thumbs are in the approximate position of your kidneys.

The kidneys are bean-shaped organs. An adult kidney is about 10 cm (4 in) long, 6 cm (2 to 3 in) wide and 3 cm (1 to 2 in) thick.

### **What do the kidneys do?**


The main function of the kidneys is to filter water, impurities and wastes from the blood.

The blood from the body enters the kidneys through the renal arteries. Once in the kidney, the blood passes through the nephrons, where waste products and extra water are removed. The clean blood is returned to the body through the renal veins.

The waste products filtered from the blood are then concentrated into urine. The urine is collected in the renal pelvis. The ureters move the urine to the bladder, where it is stored. Urine is passed out of the bladder and the body through the urethra.

![kidney](https://www.cancer.ca/~/media/CCE/1730/49c7ddb4361bde04bfaca0e8a1d09ca9.png)

**Gerota’s fascia** is a thin, fibrous tissue on the outside of the kidney. Below Gerota’s fascia is a layer of fat.

The **renal capsule** is a layer of fibrous tissue that surrounds the body of the kidney inside the layer of fat.

The **cortex** is the tissue just under the renal capsule.

The **medulla** is the inner part of the kidney.

The **renal pelvis** is a hollow area in the centre of each kidney where urine (pee) collects.

The **renal artery** brings blood to the kidney.

The **renal vein** takes blood back to the body after it has passed through the kidney.

The **renal hilum** is the area where the renal artery, renal vein and ureter enter the kidney.

### **The nephrons**

In the cortex, we can find nephrons, the functional units of the kidney.

The **nephrons** are the millions of small tubes inside each kidney. Each nephron has 2 parts. **Tubules** are tiny tubes that collect the waste materials and chemicals from the blood moving through the kidney. The **corpuscles** each contain a clump of tiny blood vessels called a **glomerulus** (plural: glomeruli) that filters the blood as it moves through the kidney. The waste products are passed through the tubules to the collecting ducts, which drain into the renal pelvis.

![nephrons](https://s3-us-west-2.amazonaws.com/courses-images/wp-content/uploads/sites/1223/2017/02/09002635/Figure_41_06_03-1024x508.png)


### **Glomeruli**


Finally, as mentioned above, each **corpuscle** contains a clump of tiny blood vessels called a glomerulus.

The glomerulus (plural: glomeruli) is a loop of capillaries twisted into a ball shape, surrounded by Bowman’s capsule. This is where ultrafiltration of blood occurs, the first step in urine production. The filtration barrier consists of 3 components:

- Endothelial cells of glomerular capillaries
- Glomerular basement membrane
- Epithelial cells of Bowman’s Capsule (podocytes)
    
![teachmephysiology.com - glomerulus](https://teachmephysiology.com/wp-content/uploads/2017/03/glomerulus.jpg)

The goal of this competition is to detect these glomeruli functional tissue units (FTUs).

Note: Cells are the smallest functional units of life, and tissues are groups of similar cells that have a common function. A functional tissue unit is therefore a group of cells with a common goal, as shown in the illustration below.

![source : MOOC CCF Ontology, 3D Reference Objects, and User Interfaces – Creating an Atlas of the Human Body ](https://i.ibb.co/mR1cQ3Z/ccf-ontology-3d-reference-object.png)

# Data 💾

The HuBMAP data used in this hackathon includes 11 fresh frozen and 9 Formalin Fixed Paraffin Embedded (FFPE) PAS kidney images. 

FFPE is a form of preservation and preparation for biopsy specimens that aids in examination, experimental research, and diagnostic/drug development. A tissue sample is first preserved by fixing it in formaldehyde, also known as formalin, to preserve the proteins and vital structures within the tissue. Next, it is embedded in a paraffin wax block; this makes it easier to cut slices of required sizes to mount on a microscopic slide for examination.

Periodic acid–Schiff (PAS) is a staining method. The chemical reaction between the stain and the tissue is what produces the purple or magenta color in our images.


Glomeruli FTU annotations exist for all 20 tissue samples. There are over 600,000 glomeruli in each human kidney (Nyengaard, 1992). Normal glomeruli typically range from 100 to 350 μm in diameter with a roughly spherical shape (Kannan, 2019).
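
To get a feel for how large a glomerulus appears in an image, the diameter range quoted above can be converted to pixels. This is only a rough sketch: the helper `diameter_in_pixels` and the pixel size of 0.5 μm are hypothetical and used for illustration, so replace the pixel size with the real value for your images.

```python
def diameter_in_pixels(diameter_um, pixel_size_um):
    """Convert a physical diameter (micrometres) to a diameter in pixels."""
    return diameter_um / pixel_size_um


# Hypothetical pixel size, NOT taken from the dataset: adjust to your images.
PIXEL_SIZE_UM = 0.5

# Diameter range for normal glomeruli quoted in the text (100 to 350 micrometres).
for diameter_um in (100, 350):
    pixels = diameter_in_pixels(diameter_um, PIXEL_SIZE_UM)
    print(f"{diameter_um} um is about {pixels:.0f} px wide")
```

An estimate like this can help when choosing a tile size, since a tile should comfortably contain a whole glomerulus.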


The dataset consists of very large TIFF files (from more than 500 MB up to 5 GB each). The training set has 8 images, and the public test set has 5. The private test set is larger than the public test set.

The training set includes annotations in both RLE-encoded and unencoded (JSON) forms. The annotations denote segmentations of glomeruli.

Both the training and public test sets also include anatomical structure segmentations. They are intended to help us identify the various parts of the tissue.

In [None]:
import re
import os
import json
import rasterio
import pandas as pd
import matplotlib.pyplot as plt 

In [None]:
train_path = "../input/hubmap-kidney-segmentation/train"
test_path = "../input/hubmap-kidney-segmentation/test"
information_path = "../input/hubmap-kidney-segmentation/HuBMAP-20-dataset_information.csv"
train_csv = "../input/hubmap-kidney-segmentation/train.csv"


# Anatomical structure files carry "anatomical-structure" in their name,
# glomeruli annotations are named <9-character id>.json, and images end in .tiff.
# The dots are escaped so they match a literal "." and not any character.
anatomical_train = [f"{train_path}/{af}" for af in os.listdir(train_path) if "anatomical-structure" in af]
json_train = [f"{train_path}/{jf}" for jf in os.listdir(train_path) if re.search(r'^[a-z0-9]{9}\.json$', jf)]
tiff_train = [f"{train_path}/{t}" for t in os.listdir(train_path) if re.search(r'\.tiff$', t)]

anatomical_test = [f"{test_path}/{af}" for af in os.listdir(test_path) if "anatomical-structure" in af]
json_test = [f"{test_path}/{jf}" for jf in os.listdir(test_path) if re.search(r'^[a-z0-9]{9}\.json$', jf)]
tiff_test = [f"{test_path}/{t}" for t in os.listdir(test_path) if re.search(r'\.tiff$', t)]

In [None]:
print(f'There are {len(tiff_test)} samples in the test dataset and {len(tiff_train)} samples in the train dataset')

### **The JSON and RLE data**

The JSON files are structured as follows, with each feature having:

- A type (Feature) and object type id (PathAnnotationObject). Note that these fields are the same between all files and do not offer signal.
- A geometry containing a Polygon with coordinates for the feature's enclosing outline
- Additional properties, including the name and color of the feature in the image.
- The IsLocked field is the same across file types (locked for glomerulus, unlocked for anatomical structure) and is not signal-bearing.
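
To make the structure described above more concrete, here is a small sketch that walks a list of features and computes a bounding box for each polygon. The sample annotation is synthetic, and the exact nesting of the properties (`classification`, `name`, `colorRGB`, `isLocked`) is an assumption for illustration, so check the keys of a real file before relying on it. The helper `polygon_bounds` is also hypothetical.

```python
import json

# Synthetic annotation written for this example (not taken from the dataset).
# The nesting under "properties" is an assumption.
sample_json = """
[
  {
    "type": "Feature",
    "id": "PathAnnotationObject",
    "geometry": {
      "type": "Polygon",
      "coordinates": [[[10, 20], [30, 20], [30, 50], [10, 50], [10, 20]]]
    },
    "properties": {
      "classification": {"name": "glomerulus", "colorRGB": -3670016},
      "isLocked": true
    }
  }
]
"""


def polygon_bounds(feature):
    """Return (min_x, min_y, max_x, max_y) for the outer ring of a polygon feature."""
    outer_ring = feature["geometry"]["coordinates"][0]
    xs = [x for x, _ in outer_ring]
    ys = [y for _, y in outer_ring]
    return min(xs), min(ys), max(xs), max(ys)


features = json.loads(sample_json)
for feature in features:
    name = feature["properties"]["classification"]["name"]
    print(name, polygon_bounds(feature))
```

The bounding box is a cheap way to locate each glomerulus before cropping the corresponding region from the image.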


In [None]:
with open(json_train[0], 'r') as f:
    json_0 = json.load(f)
        
print(json_0[0].keys())

The test set includes annotations only in RLE-encoded form, not in unencoded (JSON) form.

To understand run-length encoding (RLE), let us take an example: consider a screen containing plain black text on a solid white background. There will be many long runs of white pixels in the blank space, and many short runs of black pixels within the text. A hypothetical scan line, with B representing a black pixel and W representing white, might read as follows:

    WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW 

With a run-length encoding (RLE) data compression algorithm applied to the above hypothetical scan line, it can be rendered as follows:

    12W1B12W3B24W1B14W 

This can be interpreted as a sequence of twelve Ws, one B, twelve Ws, three Bs, and so on.

We also have two CSV files:
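
The same scan line can be encoded and decoded in a few lines of Python. This is a minimal sketch of character-level RLE for the toy example only; the function names are made up for illustration, and the competition masks use a different, pixel-index-based layout.

```python
import re
from itertools import groupby


def rle_encode_text(line):
    """Collapse each run of identical characters into '<count><char>'."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(line))


def rle_decode_text(encoded):
    """Expand '<count><char>' pairs back into the original string."""
    return "".join(char * int(count) for count, char in re.findall(r"(\d+)(\D)", encoded))


# Build the scan line from its runs so the counts are explicit.
scan_line = "W" * 12 + "B" + "W" * 12 + "B" * 3 + "W" * 24 + "B" + "W" * 14

encoded = rle_encode_text(scan_line)
print(encoded)
print(rle_decode_text(encoded) == scan_line)
```

The encoded string is much shorter than the 67-character scan line, which is why RLE suits masks made of long uniform runs.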

- train.csv contains the unique IDs for each image, as well as an RLE-encoded representation of the mask for the objects in the image. See the evaluation tab for details of the RLE encoding scheme.

- HuBMAP-20-dataset_information.csv contains additional information (including anonymized patient data) about each image.
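
To give an idea of how an RLE mask string can be turned back into a binary mask, here is a toy decoder in pure Python. It assumes the layout commonly used for Kaggle segmentation masks: space-separated pairs of start and run length, 1-indexed starts, and pixels numbered top to bottom, then left to right. Treat this as an assumption and confirm it on the evaluation tab. The function name `rle_to_mask` is hypothetical, and for the real images a NumPy-based version would be far faster.

```python
def rle_to_mask(rle, height, width):
    """Decode 'start length start length ...' into a height x width list of 0/1 rows.

    Assumes 1-indexed starts and column-major pixel numbering
    (top to bottom, then left to right).
    """
    tokens = [int(token) for token in rle.split()]
    flat = [0] * (height * width)
    for start, length in zip(tokens[0::2], tokens[1::2]):
        for index in range(start - 1, start - 1 + length):
            flat[index] = 1
    # Pixel (row, col) lives at position col * height + row in the flat list.
    return [[flat[col * height + row] for col in range(width)] for row in range(height)]


# Toy mask of 4 rows by 3 columns: pixels 1-3 and pixels 10-11 are set.
mask = rle_to_mask("1 3 10 2", height=4, width=3)
for row in mask:
    print(row)
```

In this toy case the first run fills the top three pixels of the first column and the second run fills two pixels in the third column.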

In [None]:
info_file = pd.read_csv(information_path)
train_csv_file = pd.read_csv(train_csv)

In [None]:
info_file.head()

In [None]:
train_csv_file.head()

Sources:

- [The kidneys - Canadian Cancer Society](https://www.cancer.ca/en/cancer-information/cancer-type/kidney/kidney-cancer/the-kidneys/?region=qc)
- [HuBMAP- VHMOOC](https://iu.instructure.com/courses/1888216/pages/4-ccf-ontology-3d-reference-objects-and-user-interfaces-creating-an-atlas-of-the-human-body)
- [Structure of a kidney](https://courses.lumenlearning.com/wm-biology2/chapter/kidneys/)
- [Glomerulus](https://teachmephysiology.com/urinary-system/nephron/glomerulus/)
- [FFPE](https://www.biochain.com/general/what-is-ffpe-tissue/)
- [RLE](https://fr.wikipedia.org/wiki/Run-length_encoding)