![Sartorius](https://storage.googleapis.com/kaggle-competitions/kaggle/30201/logos/thumb76_76.png?t=2021-09-03-15-27-57)

If you find this notebook useful, please give it an upvote; it helps keep me motivated. This notebook will be updated frequently, so keep checking for further developments.

Check out my blog for a detailed explanation of this notebook: [Detect single neuronal cells in microscopy images](https://medium.com/@samyukthamantri/detect-single-neuronal-cells-in-microscopy-images-%EF%B8%8F-48b20f072a48)

# Sartorius - Cell Instance Segmentation 👩🏻‍🔬

### Detect single neuronal cells in microscopy images

Neurological disorders, including neurodegenerative diseases such as Alzheimer's and brain tumors, are a leading cause of death and disability across the globe. However, it is hard to quantify how well these deadly disorders respond to treatment. One accepted method is to review neuronal cells via light microscopy, which is both accessible and non-invasive. Unfortunately, segmenting individual neuronal cells in microscopic images can be challenging and time-intensive. Accurate instance segmentation of these cells—with the help of computer vision—could lead to new and effective drug discoveries to treat the millions of people with these disorders.


Current solutions have limited accuracy, particularly for neuronal cells. In internal studies to develop cell instance segmentation models, the neuroblastoma cell line SH-SY5Y consistently exhibited the lowest precision scores of the eight cancer cell types tested. This could be because neuronal cells have a distinctive, irregular, and concave morphology, which makes them challenging to segment with commonly used mask heads.

![Cell segmentation](https://www.marktechpost.com/wp-content/uploads/2021/10/3d-medical-background-with-virus-cells-dna-strand-scaled.jpg)

Sartorius is a partner of the life science research and biopharmaceutical industries. They empower scientists and engineers to simplify and accelerate progress in life science and bioprocessing, enabling the development of new and better therapies and more affordable medicine. They are a dynamic platform that attracts pioneers and leading experts in the field, bringing creative minds together for a common goal: technological breakthroughs that lead to better health for more people.

In this competition, you’ll detect and delineate distinct objects of interest in biological images depicting neuronal cell types commonly used in the study of neurological disorders. More specifically, you'll use phase contrast microscopy images to train and test your model for instance segmentation of neuronal cells. Successful models will do this with a high level of accuracy.

If successful, you'll help further research in neurobiology thanks to the collection of robust quantitative data. Researchers may be able to use this to more easily measure the effects of disease and treatment conditions on neuronal cells. As a result, new drugs could be discovered to treat the millions of people with these leading causes of death and disability.



# What is Neural Cell Segmentation?

Neural cell instance segmentation, which aims at the joint detection and segmentation of every neural cell in a microscopic image, is essential to many neuroscience applications. The task is made difficult by cell adhesion, cell distortion, unclear cell contours, low-contrast cell protrusions, and background impurities. Consequently, current instance segmentation methods generally lack precision.

Accurate cell counting provides key quantitative feedback and plays an important role in biological research as well as in industrial and biomedical applications. Unfortunately, the commonly used manual counting method is time-intensive, poorly standardized, and non-reproducible.

![Cell Detection and Segmentation](https://msquareprojects.in/wp-content/uploads/2021/06/screenshot.gif) Fig: Cell detection and segmentation

The straightforward approach to determining cell counts is to develop an object detection and segmentation model that combines key morphological features such as cell size, color value, and cell spacing. Object detection and segmentation have been an important research focus in computer vision, and many popular algorithms have been proposed in recent years, such as Fast R-CNN, YOLO, and U-Net. Biological image analysis has entered the era of artificial intelligence with approaches such as computer vision methods and deep convolutional networks, yet segmenting small objects remains a notoriously difficult problem.
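To make the counting idea concrete, here is a minimal sketch that assumes a binary foreground mask is already available: label the connected foreground regions with `scipy.ndimage.label` and count them. The helper name `count_cells` and the toy mask are made up for illustration. Touching cells would merge into a single region, which is one reason instance segmentation is needed rather than plain foreground/background segmentation.

```python
import numpy as np
from scipy import ndimage

def count_cells(binary_mask: np.ndarray) -> int:
    """Count connected foreground regions (4-connectivity by default) in a binary mask."""
    _, n_components = ndimage.label(binary_mask > 0)
    return n_components

# Toy 5x6 mask containing three separate blobs
toy_mask = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
])
print(count_cells(toy_mask))  # 3
```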

#MedicalImageAnalysis

# About the dataset

**Data Description**

In this competition we are segmenting neuronal cells in images. The training annotations are provided as run length encoded masks, and the images are in PNG format. The number of images is small, but the number of annotated objects is quite high. The hidden test set is roughly 240 images.
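Each annotation is a string of `start length` pairs. Judging from the decoding code used later in this notebook (which subtracts 1 from every start and reshapes row by row), starts are 1-indexed and pixels are numbered left to right, top to bottom. A minimal round trip on a made-up 3x4 mask; `rle_to_mask` and `mask_to_rle` are illustrative helper names:

```python
import numpy as np

def rle_to_mask(rle: str, shape: tuple) -> np.ndarray:
    """Decode 'start length ...' (1-indexed starts, row-major) into a binary mask."""
    values = np.array(rle.split(), dtype=int)
    starts, lengths = values[0::2] - 1, values[1::2]
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    return flat.reshape(shape)

def mask_to_rle(mask: np.ndarray) -> str:
    """Encode a binary mask as 'start length ...' with 1-indexed starts."""
    pixels = np.concatenate([[0], mask.flatten(), [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1  # 1-indexed run boundaries
    runs[1::2] -= runs[::2]                             # turn run ends into lengths
    return ' '.join(str(x) for x in runs)

rle = '2 3 10 2'  # pixels 2-4 and 10-11 (1-indexed)
mask = rle_to_mask(rle, (3, 4))
print(mask)
# [[0 1 1 1]
#  [0 0 0 0]
#  [0 1 1 0]]
print(mask_to_rle(mask))  # 2 3 10 2
```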

Note: while predictions are not allowed to overlap, the training labels are provided in full (with overlapping portions included). This is to ensure that models are provided the full data for each object. Removing overlap in predictions is a task for the competitor.
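One possible way to remove overlap is sketched below, assuming predictions arrive as a list of binary instance masks ordered by priority (for example, highest predicted confidence first): each pixel is kept only by the first instance that claims it. The helper `remove_overlap` and this tie-breaking rule are assumptions for illustration, not a prescribed method.

```python
import numpy as np

def remove_overlap(instance_masks):
    """Make instance masks disjoint: earlier masks win contested pixels.

    instance_masks: list of 2-D binary arrays, ordered by priority.
    """
    if not instance_masks:
        return []
    taken = np.zeros_like(instance_masks[0], dtype=bool)
    disjoint = []
    for m in instance_masks:
        m = m.astype(bool) & ~taken  # drop pixels already assigned to an earlier mask
        taken |= m
        disjoint.append(m.astype(np.uint8))
    return disjoint

a = np.array([[1, 1, 0],
              [1, 1, 0]])
b = np.array([[0, 1, 1],
              [0, 1, 1]])
a2, b2 = remove_overlap([a, b])
print(b2)
# [[0 0 1]
#  [0 0 1]]
```

Masks that end up empty after this step may need to be dropped before submission.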

**Files**

* `train.csv` - IDs and masks for all training objects. None of this metadata is provided for the test set.
    * `id` - unique identifier for object
    * `annotation` - run length encoded pixels for the identified neuronal cell
    * `width` - source image width
    * `height` - source image height
    * `cell_type` - the cell line
    * `plate_time` - time plate was created
    * `sample_date` - date sample was created
    * `sample_id` - sample identifier
    * `elapsed_timedelta` - time since first image taken of sample
* `sample_submission.csv` - a sample submission file in the correct format
* `train` - train images in PNG format
* `test` - test images in PNG format. Only a few test set images are available for download; the remainder can only be accessed by your notebooks when you submit.
* `train_semi_supervised` - unlabeled images offered in case you want to use additional data for a semi-supervised approach.
* `LIVECell_dataset_2021` - A mirror of the data from the LIVECell dataset. LIVECell is the predecessor dataset to this competition. You will find extra data for the SH-SY5Y cell line, plus several other cell lines not covered in the competition dataset that may be of interest for transfer learning.

<img src = "https://c.tenor.com/HAtJAqCgx5AAAAAC/unlimited-data-fun.gif">

# EDA 💟 💟 💟:

<span style='color:purple'> Let's load the data and look at some images in the dataset </span>

```python
import pandas as pd

train_data = pd.read_csv('../input/sartorius-cell-instance-segmentation/train.csv')
samplesub = pd.read_csv('../input/sartorius-cell-instance-segmentation/sample_submission.csv')

print('Training set shape', train_data.shape)
train_data.head(5)
```

### Train dataset information

```python
# Train dataset information
train_data.info()
```

### Unique values in the train data

```python
train_data.nunique()
```
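Since each row of `train.csv` is one annotated object and `id` names the image file, grouping by `id` gives a per-image view of the data. A minimal sketch; the helper `objects_per_image` and the toy frame are illustrative, and on the real data it would be called as `objects_per_image(train_data)`:

```python
import pandas as pd

def objects_per_image(df: pd.DataFrame) -> pd.DataFrame:
    """One row per image: its cell type and the number of annotated objects."""
    return (df.groupby('id')
              .agg(cell_type=('cell_type', 'first'),
                   n_objects=('annotation', 'count'))
              .sort_values('n_objects', ascending=False))

# Tiny made-up frame with the same column names as train.csv
toy = pd.DataFrame({
    'id': ['img_a', 'img_a', 'img_a', 'img_b'],
    'annotation': ['1 2', '5 3', '9 1', '4 4'],
    'cell_type': ['shsy5y', 'shsy5y', 'shsy5y', 'astro'],
})
print(objects_per_image(toy))
```

Chaining `.groupby('cell_type')['n_objects'].describe()` on the result then summarizes object counts per cell type.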

### Different cell types

```python
# Number of annotated instances per cell type
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)
train_data.cell_type.value_counts().plot.bar(ax=ax, rot=0)  # rot=0 keeps tick labels horizontal
ax.set_ylabel('Number of instances')
ax.set_xlabel('Cell types')
fig.tight_layout()
plt.show()
```

shsy5y is the most frequent cell type by number of annotated instances, followed by cort. astro is the least frequent.

### Images in each directory

```python
# A class to hold the dataset paths, plus a helper to list image files
import os
from termcolor import colored

class config:
    base_dir = "../input/sartorius-cell-instance-segmentation"
    train_path = base_dir + '/train'
    test_path = base_dir + '/test'

def getImagePath(path):
    """Return the full paths of all files found under `path`."""
    image_names = []
    for dirn, _, fnames in os.walk(path):
        for fname in fnames:
            image_names.append(os.path.join(dirn, fname))
    return image_names

train_im_path = getImagePath(config.train_path)
test_im_path = getImagePath(config.test_path)

print('Files in test path\n', test_im_path[0:5], '\n')

print('No. of images in train and test dir\n',
      'Train images:', colored(len(train_im_path), 'blue'), '\n',
      'Test images:', colored(len(test_im_path), 'green'))
```

### Distribution plots

```python
import plotly.express as px

def dist_plot(x):
    """Plot a histogram of one train.csv column."""
    fig = px.histogram(train_data, x=x)
    fig.show()

dist_plot('cell_type')
```

```python
dist_plot('plate_time')
```

The most frequent plate time is 11 h.

### Image gallery 🏞🎶

<img src = "https://c.tenor.com/q4-9f1AqtrYAAAAC/photos-no.gif">

Next, we display multiple images from each directory using OpenCV and Matplotlib.

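```python
import cv2
import numpy as np
import matplotlib.pyplot as plt

def im_show(im_paths, r, c):
    """Display up to r * c images from im_paths in an r x c grid."""
    fig, ax = plt.subplots(nrows=r, ncols=c, figsize=(16, 8))
    axes = np.atleast_1d(ax).ravel()
    for a in axes:
        a.set_axis_off()  # hide axes, including any unused grid cells
    # zip stops at the shorter sequence, so surplus paths are ignored
    for a, im_path in zip(axes, im_paths):
        im = cv2.imread(im_path)
        im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)  # OpenCV loads images as BGR
        a.imshow(im)
    plt.tight_layout()
    plt.show()
```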


```python
# A 5 x 5 grid shows the first 25 training images
im_show(train_im_path[:25], 5, 5)
```

```python
im_show(test_im_path[:25], 5, 5)
```

### Mask plots

* Represent specific values of interest in a plot
* Mask (hide or display) specific features of a colormap, a dataset, or a data structure such as an array or list
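For instance, NumPy supports both ideas directly: a boolean mask selects elements, and `np.ma.masked_where` hides elements so that Matplotlib leaves them blank when plotting. The values below are made up for illustration.

```python
import numpy as np

values = np.array([[0.1, 0.8, 0.3],
                   [0.9, 0.2, 0.7]])

# Boolean mask: True where the value exceeds 0.5
mask = values > 0.5
print(values[mask])    # [0.8 0.9 0.7]

# Masked array: hide the low values; plt.imshow(hidden) would leave them blank
hidden = np.ma.masked_where(~mask, values)
print(hidden.count())  # 3 visible (unmasked) elements
```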

Examples:

1. Medical image processing:

![NVIDIA medical image processing](https://developer-blogs.nvidia.com/wp-content/uploads/2022/01/brain_scan.png)

Ref: [Accelerating medical image processing with NVIDIA DALI](https://developer.nvidia.com/blog/accelerating-medical-image-processing-with-dali/)

2. Earth data

Before masking:

![Remove clouds and shadow](https://earthpy.readthedocs.io/en/latest/_images/sphx_glr_plot_stack_masks_004.png)

After masking:

![Image array with clouds and shadows masked](https://earthpy.readthedocs.io/en/latest/_images/sphx_glr_plot_stack_masks_005.png)

Ref: [Mask and Plot Remote Sensing Data with EarthPy](https://earthpy.readthedocs.io/en/latest/gallery_vignettes/plot_stack_masks.html#mask-and-plot-remote-sensing-data-with-earthpy)

3. Face mask

![Funny ](https://i.ytimg.com/vi/gRAfSF3tBvg/mqdefault.jpg)

<img src = "https://c.tenor.com/YM96puKgUv8AAAAd/face-mask-mask-on.gif">

> It's time to look at the images and the masks now. The figure below shows randomly selected images corresponding to each of the three distinct cell types. Each cell type has its own unique morphological properties.

* astro instances are the largest. They cover a lot of space in the masks.
* cort instances are generally smaller than the other cell types and roughly circular. They don't cover much space in the masks.
* shsy5y instances are slightly larger, more elongated, and more abundant than cort instances, so they cover more space than the cort cells.
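These size differences can be checked numerically without decoding any mask: an object's pixel area is the sum of the run lengths in its RLE string. Because the training labels are given in full, overlapping portions count toward every object they belong to. A minimal sketch with illustrative helper names, shown on a made-up frame that uses `train.csv`'s column names (on the real data: `area_by_cell_type(train_data)`):

```python
import pandas as pd

def rle_area(rle: str) -> int:
    """Pixel area of one object: the sum of the lengths in a 'start length ...' RLE."""
    values = [int(v) for v in rle.split()]
    return sum(values[1::2])

def area_by_cell_type(df: pd.DataFrame) -> pd.Series:
    """Mean object area (in pixels) for each cell type, smallest first."""
    areas = df['annotation'].map(rle_area)
    return areas.groupby(df['cell_type']).mean().sort_values()

toy = pd.DataFrame({
    'annotation': ['1 10 50 5', '100 4', '7 200'],
    'cell_type': ['shsy5y', 'cort', 'astro'],
})
print(area_by_cell_type(toy))
```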

```python
import os
import numpy as np
import matplotlib.pyplot as plt

def make_mask(mask_files, image_shape=(520, 704), color=False):
    """Combine the RLE annotations of one image into a single 2-D mask.

    RLE strings are 'start length start length ...' with 1-indexed starts in
    row-major order. With color=True each object gets its own label (1, 2, ...).
    """
    mask = np.zeros(image_shape[0] * image_shape[1], dtype=np.int32)
    for i, mask_file in enumerate(mask_files, start=1):
        couples = np.array(mask_file.split()).reshape(-1, 2).astype(int)
        starts = couples[:, 0] - 1  # convert 1-indexed starts to 0-indexed
        ends = starts + couples[:, 1]
        for lo, hi in zip(starts, ends):
            mask[lo:hi] = i if color else 1
    return mask.reshape(image_shape)

df_train = train_data

def plot_image(image_id='0030fd0e6378'):
    fig, ax = plt.subplots(1, 2, figsize=(14, 5))
    rows = df_train.loc[df_train['id'] == image_id]
    cell_type = rows['cell_type'].iloc[0]

    file_name = os.path.join(
        '../input/sartorius-cell-instance-segmentation',
        'train', image_id + '.png')
    image = plt.imread(file_name)  # PNGs are read as floats scaled to [0, 1]
    mask = make_mask(rows['annotation'])

    # Clip the display range to the 5th-99th percentiles to boost contrast
    vmin, vmax = np.quantile(image, 0.05), np.quantile(image, 0.99)

    ax[0].imshow(image, cmap='winter', origin='upper', vmin=vmin, vmax=vmax)
    ax[0].set_title(f'Source [{image_id}]')
    ax[0].axis('off')

    ax[1].imshow(image, cmap='winter', origin='upper', vmin=vmin, vmax=vmax)
    # Hide background pixels so only the cells are overlaid on the image
    ax[1].imshow(np.ma.masked_where(mask == 0, mask), alpha=0.5, cmap='seismic')
    ax[1].set_title(f'Source [{image_id}] + Mask [{cell_type}]')
    ax[1].axis('off')
    plt.show()

# Pick one random image id for each cell type
select_image_ids = [
    df_train.loc[df_train['cell_type'] == ct, 'id'].sample(1).iloc[0]
    for ct in ['astro', 'cort', 'shsy5y']
]

for image_id in select_image_ids:
    plot_image(image_id)
```

# Image segmentation with CNN

```python
im_ht, im_width, im_channels = 520, 704, 1

train_ids = train_data['id'].unique().tolist()
test_ids = samplesub['id'].unique().tolist()

# Inputs are float32 because they are shifted by -125 below, which produces
# negative values that a uint8 array cannot hold
x_train = np.zeros((len(train_ids), im_ht, im_width, im_channels), dtype=np.float32)
y_train = np.zeros((len(train_ids), im_ht, im_width, im_channels), dtype=np.uint8)
```

```python
from tqdm import tqdm

TRAIN_PATH = '../input/sartorius-cell-instance-segmentation/train/'

# https://www.kaggle.com/c/sartorius-cell-instance-segmentation/discussion/291627
def rle_decode(mask_rle, shape=(520, 704, 1)):
    """Decode a 'start length ...' RLE string (1-indexed starts) into a binary mask."""
    s = mask_rle.split()
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0::2], s[1::2])]
    starts -= 1  # RLE starts are 1-indexed
    ends = starts + lengths
    img = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    return img.reshape(shape)  # pixels are numbered row by row

def rle_encode(img):
    """Encode a binary mask as a 'start length ...' RLE string (1-indexed starts)."""
    pixels = img.flatten()
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

for n, id_ in tqdm(enumerate(train_ids), total=len(train_ids)):
    img = cv2.imread(TRAIN_PATH + id_ + '.png')
    # Convert to grayscale and shift intensities by a constant offset of 125
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32) - 125
    x_train[n] = np.expand_dims(img, axis=2)

    # Merge all object masks of this image into one binary foreground mask
    labels = train_data.loc[train_data['id'] == id_, 'annotation'].tolist()
    mask = np.zeros((520, 704, 1), dtype=np.uint8)
    for label in labels:
        mask = np.maximum(mask, rle_decode(label, shape=(520, 704, 1)))
    y_train[n] = mask
print("Done")
```

```python
# Load the test images with the same preprocessing as the training images
TEST_PATH = '../input/sartorius-cell-instance-segmentation/test/'
test_images_id = []
# float32 so the negative values produced by the -125 shift are preserved
X_test = np.zeros((len(test_ids), im_ht, im_width, im_channels), dtype=np.float32)
for n, id_ in tqdm(enumerate(test_ids), total=len(test_ids)):
    img = cv2.imread(TEST_PATH + id_ + '.png')
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32) - 125
    X_test[n] = np.expand_dims(img, axis=2)
    test_images_id.append(id_)
print("Done")
```

```python
from tensorflow import keras
from tensorflow.keras.losses import BinaryCrossentropy

model = keras.Sequential([
    keras.layers.Conv2D(filters=20, kernel_size=5, strides=1,
                        padding='same', activation='relu',
                        input_shape=[im_ht, im_width, im_channels]),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(filters=10, kernel_size=5, strides=1,
                        padding='same', activation='relu'),
    keras.layers.BatchNormalization(),
    # 1x1 convolution down to one channel; the sigmoid gives a per-pixel
    # probability, which is what BinaryCrossentropy() expects by default
    keras.layers.Conv2D(filters=1, kernel_size=1, activation='sigmoid'),
])

model.compile(optimizer='adam', loss=BinaryCrossentropy(), metrics=['accuracy'])
model.summary()
```
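A little arithmetic shows how local this baseline's view is. For stride-1 convolutions without dilation, each layer with kernel size k widens the receptive field by k - 1 pixels, so the model above sees 1 + (5 - 1) + (5 - 1) + (1 - 1) = 9, a 9x9 pixel neighbourhood per output pixel. Each prediction therefore relies only on very local texture. A small helper (an illustrative name, not part of Keras) that also handles strides:

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field (in input pixels) of a stack of undilated convolutions."""
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump  # each layer widens the field by (k - 1) * current jump
        jump *= s             # strides compound the spacing between output pixels
    return rf

# Kernel sizes of the three Conv2D layers in the model above
print(receptive_field([5, 5, 1]))  # 9
```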

```python
from tensorflow.keras.utils import plot_model

plot_model(model, show_shapes=True)
```

```python
import os
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

# Save the weights of the best model (lowest validation loss) seen so far
model_output = os.path.join('./', 'model.weights.h5')
model_checkpoint = ModelCheckpoint(
    model_output, monitor='val_loss', save_best_only=True, save_weights_only=True)

# Reduce the learning rate when the validation loss stops improving
lr_reduce = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=10, verbose=0,
                              mode='auto', min_delta=0.0001, cooldown=0, min_lr=0)

# Stop training when the validation loss stops improving
es = EarlyStopping(monitor='val_loss', patience=10, verbose=1)

hist = model.fit(x_train, y_train, batch_size=65,
                 validation_split=0.1,  # hold out the last 10% of images for validation
                 epochs=5, callbacks=[es, model_checkpoint, lr_reduce])
```


<img src = "https://c.tenor.com/l2VFYv-iqUYAAAAC/kittycass-peachcat.gif">

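```python
print(x_train.shape, y_train.shape)
pred = model.predict(x_train)
print(pred.shape)
# Threshold the per-pixel probabilities to get a binary mask
train_preds = (pred > 0.5).astype(np.uint8)
# imshow needs a 2-D array, so drop the trailing channel axis
plt.imshow(np.squeeze(train_preds[0]), cmap='gray')
plt.show()
```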
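Pixel accuracy, the metric compiled into the model above, can look high simply because most pixels are background. Two overlap scores that are less sensitive to that imbalance are intersection over union (IoU) and the Dice coefficient. A minimal sketch; `iou_and_dice` is an illustrative helper that could be applied to, for example, `train_preds[0]` and `y_train[0]`:

```python
import numpy as np

def iou_and_dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """IoU and Dice between two binary masks of the same shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    iou = (inter + eps) / (union + eps)                          # eps avoids 0 / 0
    dice = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    return float(iou), float(dice)

p = np.array([[1, 1, 0, 0]])
t = np.array([[0, 1, 1, 0]])
print(iou_and_dice(p, t))  # approximately (0.333, 0.5)
```

Passing whole arrays, such as `iou_and_dice(train_preds, y_train)`, gives one score pooled over all training pixels.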


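```python
# Plot training (and, if available, validation) loss per epoch
loss = hist.history['loss']
val_loss = hist.history.get('val_loss')

plt.figure()
plt.plot(hist.epoch, loss, 'r', label='Training loss')
if val_loss is not None:
    plt.plot(hist.epoch, val_loss, 'b', label='Validation loss')

plt.title('Model loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend(loc='upper right')
plt.show()
```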

```python
# Ref: https://www.kaggle.com/carlosgut/sartorius-simple-cnn-keras
from random import randint

preds_test = model.predict(X_test, verbose=1)
preds_test_t = (preds_test >= 0.5).astype(np.uint8)

# Show a random test image and its predicted mask (squeeze drops the channel axis)
ix = randint(0, len(preds_test_t) - 1)
print(ix)
plt.imshow(np.squeeze(X_test[ix]), cmap='gray')
plt.show()
plt.imshow(np.squeeze(preds_test_t[ix]), cmap='gray')
plt.show()
```
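The model above produces one foreground/background mask per image, whereas the competition asks for individual instances. Below is a minimal sketch of a naive post-processing step. It assumes that connected components are an acceptable stand-in for instances (touching cells will merge) and that the submission has one row per object with `id` and `predicted` columns; check `sample_submission.csv` for the exact format. The helper names and the `min_size` threshold are illustrative.

```python
import numpy as np
from scipy import ndimage

def mask_to_rle(mask: np.ndarray) -> str:
    """Encode a binary mask as 'start length ...' with 1-indexed, row-major starts."""
    pixels = np.concatenate([[0], mask.flatten(), [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

def binary_to_instance_rows(image_id: str, binary_mask: np.ndarray, min_size: int = 2):
    """Split a binary mask into connected components and RLE-encode each one.

    Components are disjoint by construction, so the resulting rows never overlap.
    """
    labels, n = ndimage.label(np.squeeze(binary_mask) > 0)
    rows = []
    for k in range(1, n + 1):
        component = (labels == k).astype(np.uint8)
        if component.sum() >= min_size:  # drop tiny specks (threshold is arbitrary)
            rows.append({'id': image_id, 'predicted': mask_to_rle(component)})
    return rows

toy = np.array([[1, 1, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 0, 1]])
for row in binary_to_instance_rows('toy_image', toy):
    print(row)
# {'id': 'toy_image', 'predicted': '1 2'}
# {'id': 'toy_image', 'predicted': '8 1 12 1'}
```

With the variables from the cells above, one could loop over `zip(test_images_id, preds_test_t)`, collect the rows, and build a DataFrame from them.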

<img src = "https://c.tenor.com/ad74hv3dCVkAAAAC/its-just-looks-so-good-so-great.gif">


<img src = "https://c.tenor.com/cA8SFvHobcQAAAAC/thank-you-kind-sir-ty-kind-sir.gif">

<iframe width="1424" height="594" src="https://www.youtube.com/embed/pFsPZe_vpO0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
