**Report of Project 3: "Implementation and evaluation of Otsu Thresholding"**

*presented by Elizaveta Chernova, Veronika Schuler, Laura Wächter, Hannah Winter*

**Abstract**

Otsu thresholding is a valuable method for determining an optimal intensity threshold for an image. It is therefore
used frequently in image segmentation for biological and medical purposes. Here we use an implementation of this
algorithm to segment cell nuclei in the datasets N2DH-GOWT1 and N2DL-HeLa of the Cell Tracking Challenge and in the
NIH3T3 images that Coelho and colleagues published for their challenge on nuclear segmentation in microscope cell images.
For pre-processing the images, we used histogram stretching for images with low contrast, a Gaussian filter
and a median filter, as well as two-level Otsu thresholding to exclude reflections in some images. The implemented
Otsu algorithm was then applied to the pre-processed images and evaluated with the Dice score, the mean surface
distance and the Hausdorff metric. We find that...

**Table of contents**

...

**Introduction**

In computer vision, image segmentation is applied to a variety of tasks. These include determining cell sizes and
cell counts in high-throughput cytometry, cell tracking, and the segmentation of brain images.


Finding the correct threshold for image segmentation is complicated by low-contrast images, reflections and random
noise in the images. As a starting point, we therefore preprocess the images before applying the Otsu algorithm.


The performance of the algorithm can be evaluated using the Dice score, the mean surface distance and the Hausdorff
metric, all of which compare the segmented image to the hand-segmented ground truth image.
Furthermore, to put the segmented images to use, we implemented a cell counting algorithm that determines the
number of cells in an image. In addition to measuring the performance of the algorithm, we visualise the results
with an overlay of the segmented image and the ground truth image.

**Description of the datasets**

N2DH-GOWT1 cells

The dataset N2DH-GOWT1 of the Cell Tracking Challenge (Bártová et al., 2011) contains images of GFP-GOWT1 mouse
embryonic stem cells that were acquired by time-lapse confocal microscopy with a Leica TCS SP5 microscope.
The varying brightness of the cells makes it hard to distinguish all of them from the background.

N2DL-HeLa cells

The dataset N2DL-HeLa of the Cell Tracking Challenge (Neumann et al., 2010) contains images of human cervical cancer
epithelial cells. The images were acquired with an Olympus IX81 microscope used for live imaging of
fluorescently labelled chromosomes. The challenge in these images is the widely varying brightness of the cells.

NIH3T3 cells

The dataset NIH3T3 (Coelho et al., 2009) contains images of mouse embryonic fibroblast cells. These images
were also acquired by fluorescence microscopy. The difficulty in segmenting them lies mainly in the bright light
spots, probably caused by the microscope used: they can lead the algorithm to place the threshold between the light
spots and the cells instead of between the cells and the background.


**Import of modules**

In [2]:
from skimage.io import imread
from skimage.io import imshow
from matplotlib import pyplot as plt
from skimage.filters import threshold_otsu
from nuclei_segmentation import pathlist
from nuclei_segmentation import otsu
from nuclei_segmentation import evaluation
import pathlib as pl
import numpy as np  # needed for the np.array calls in the evaluation cells

**Loading images, Preprocessing, Otsu, Evaluation of usage on datasets, Discussion**

To obtain well-segmented images, we try different preprocessing methods on each of our datasets before
applying the Otsu algorithm, and then evaluate the result with the Dice score, the mean surface distance and the
Hausdorff metric.

We tested each of the following on its own: Gaussian filter, median filter and histogram stretching.
In addition, we tested the Gaussian filter and the median filter each combined with histogram stretching.

Descriptions:
For the N2DH-GOWT1 images, the combination... worked best, as can be seen in the following image.
For the N2DL-HeLa images, the combination... worked best, as can be seen in the following image.
For the NIH3T3 images, the combination... worked best, as can be seen in the following image.


Preprocessing - Gaussian filter:
The Gaussian filter is a filter mask that weights the central pixel and its neighboring pixels according to the
Gaussian distribution given below and replaces the central pixel with their weighted mean. It is used for
smoothing out noise in the images.

\begin{align*}
G_σ(x,y) = \frac{1}{2πσ^2} \, e^{-\frac{x^2 + y^2}{2σ^2}}
\end{align*}

\begin{align*}
σ = standard \enspace deviation
\end{align*}
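
The Gaussian weighting described above can be sketched in a few lines of NumPy. This is only an illustration under
stated assumptions, not the code of our `nuclei_segmentation` package: the function names `gaussian_kernel` and
`gaussian_filter`, the default kernel size of 5 and the mirrored border handling are all chosen for the example.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a normalised (size x size) Gaussian kernel; size is assumed to be odd."""
    half = size // 2
    # Coordinates relative to the kernel centre, e.g. -2..2 for size 5.
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    kernel = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    # Renormalise so the weights sum to 1 and the overall brightness is preserved.
    return kernel / kernel.sum()


def gaussian_filter(image: np.ndarray, size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Smooth a 2D image with a Gaussian kernel (borders are mirrored)."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    height, width = image.shape
    padded = np.pad(image.astype(float), half, mode="reflect")
    out = np.zeros((height, width), dtype=float)
    # Sum shifted copies of the image, each weighted by one kernel entry.
    for dy in range(size):
        for dx in range(size):
            out += kernel[dy, dx] * padded[dy:dy + height, dx:dx + width]
    return out
```

A larger σ spreads the weights over more neighbors and therefore smooths more strongly, at the cost of blurring the
nuclei borders.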

Preprocessing - Median filter:
The median filter is a useful tool against salt-and-pepper noise in the images. It replaces the central pixel with the
median intensity of the central pixel and the surrounding pixels in its neighborhood, with every pixel weighted equally.
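
As a rough sketch of this idea (assuming a square window and mirrored borders; `median_filter` is an illustrative
name, not necessarily the function used in our package):

```python
import numpy as np


def median_filter(image: np.ndarray, size: int = 3) -> np.ndarray:
    """Replace every pixel of a 2D image by the median of its size x size neighborhood."""
    half = size // 2
    height, width = image.shape
    padded = np.pad(image, half, mode="reflect")
    # Collect all size*size shifted views of the image, one per window position.
    windows = [
        padded[dy:dy + height, dx:dx + width]
        for dy in range(size)
        for dx in range(size)
    ]
    # The median over the stack is the median of each pixel's neighborhood.
    return np.median(np.stack(windows, axis=0), axis=0)
```

Because an isolated bright or dark pixel is never the median of its neighborhood, it is removed completely, whereas a
mean-based filter would only spread it out.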

Preprocessing - Histogram stretching:
To solve the problem of low-contrast images, we decided to use histogram stretching.
To do that, the minimum and the maximum intensity values of the original image are remapped to 0 and 255,
respectively. All values in between are rescaled linearly.

\begin{align*}
P_{out} = (P_{in} - c) \, \frac{b-a}{d-c} + a
\end{align*}

\begin{align*}
a \enspace = \enspace 0
\end{align*}

\begin{align*}
b \enspace = \enspace 255
\end{align*}

\begin{align*}
c \enspace = \enspace lowest \enspace pixel \enspace intensity \enspace in\enspace  the \enspace image
\end{align*}

\begin{align*}
d \enspace = \enspace highest \enspace pixel \enspace  intensity \enspace in \enspace the \enspace  image
\end{align*}
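
The linear remapping can be written directly from the formula. The following is a minimal sketch; the function name
`stretch_histogram` and the handling of constant images are assumptions made for the example.

```python
import numpy as np


def stretch_histogram(image: np.ndarray, a: float = 0.0, b: float = 255.0) -> np.ndarray:
    """Linearly map the intensity range [c, d] of the image onto [a, b]."""
    img = image.astype(float)
    c, d = img.min(), img.max()
    if d == c:
        # A constant image has no contrast to stretch; map every pixel to a.
        return np.full(img.shape, a)
    return (img - c) * (b - a) / (d - c) + a
```

For example, an image whose intensities lie between 50 and 150 is mapped so that 50 becomes 0, 150 becomes 255 and
100 becomes 127.5.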



Otsu's Thresholding:
The next step after preprocessing our images is the segmentation itself, for which we use Otsu's thresholding.
The Otsu algorithm selects the optimal threshold by testing all possible threshold values k. It works best when the
gray value histogram is approximately bimodal.
The algorithm returns a single intensity threshold that separates all pixels into two classes: foreground and
background. For the algorithm, only the gray value histogram of the image is needed.

Using the mean intensity value (µ) and the probability of occurrence (ω) of each class, the algorithm
computes the between-class variance for every possible threshold value k and selects the value k
that maximizes it. This value is our optimal threshold.


Between-class-variance:

\begin{align*}
    σ_B^2 = ω_0ω_1(µ_1 - µ_0)^2
\end{align*}

\begin{align*}
σ = standard \enspace deviation
\end{align*}

\begin{align*}
    ω_{0,1} = probability \enspace of \enspace class \enspace occurrence
\end{align*}


\begin{align*}
    µ_{0,1} = mean \enspace intensity \enspace values
\end{align*}

Finally, we assign the intensity value 0 to all pixels having a lower or equal value than the computed optimal threshold
value. Likewise, we assign the intensity value 255 to all pixels having a higher intensity value than the computed
threshold.

\begin{align*}
     g_{clip}(x,y)=\left\{\begin{array}{lll}0 & \text{if} & g(x,y) \leq k  \\
     255 & \text{if} & g(x,y) > k\end{array}\right.
\end{align*}

This yields our binary image.
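
The whole procedure, from histogram to binary image, can be sketched as follows. This is an illustration under
stated assumptions rather than the code of our `otsu` module: the names `otsu_threshold` and `clip` are made up for
the example, and the image is assumed to contain non-negative integer gray values (for example `uint8`).

```python
import numpy as np


def otsu_threshold(image: np.ndarray, n_levels: int = 256) -> int:
    """Return the threshold k that maximizes the between-class variance."""
    hist = np.bincount(image.ravel(), minlength=n_levels)
    levels = np.arange(hist.size)
    total = hist.sum()

    count0 = np.cumsum(hist)          # number of pixels with value <= k
    count1 = total - count0           # number of pixels with value > k
    sum0 = np.cumsum(hist * levels)   # summed intensity of the pixels <= k
    sum1 = sum0[-1] - sum0            # summed intensity of the pixels > k

    # Only thresholds that leave both classes non-empty are meaningful.
    valid = (count0 > 0) & (count1 > 0)
    omega0 = count0[valid] / total
    omega1 = count1[valid] / total
    mu0 = sum0[valid] / count0[valid]
    mu1 = sum1[valid] / count1[valid]

    sigma_b2 = np.zeros(hist.size)
    sigma_b2[valid] = omega0 * omega1 * (mu1 - mu0) ** 2
    return int(np.argmax(sigma_b2))


def clip(image: np.ndarray, k: int) -> np.ndarray:
    """Set pixels <= k to 0 and pixels > k to 255."""
    return np.where(image > k, 255, 0).astype(np.uint8)
```

Using cumulative sums means the histogram is traversed only once, so all candidate thresholds are evaluated without
recomputing the class means from scratch for each k.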

Two-level Otsu:
Two-level Otsu works like the normal Otsu algorithm, but instead of computing only one threshold,
it returns two thresholds that separate the pixels into three classes. This method is very useful for separating light
reflections from the cells so that they are not counted as foreground.


\begin{align*}
σ_B^2 = ω_0(µ_0 - µ_T)^2 + ω_1(µ_1 - µ_T)^2 + ω_2(µ_2 - µ_T)^2
\end{align*}

\begin{align*}
σ = standard \enspace deviation
\end{align*}

\begin{align*}
    ω_{0,1,2} = probability \enspace of \enspace class \enspace occurrence
\end{align*}


\begin{align*}
    µ_{0,1,2} = mean \enspace intensity \enspace values \enspace of \enspace the \enspace classes
\end{align*}

\begin{align*}
    µ_T = mean \enspace intensity \enspace of \enspace the \enspace whole \enspace image
\end{align*}
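
A brute-force sketch of the two-threshold search is shown below. It assumes that the two thresholds k1 < k2 split
the histogram into three classes and that the between-class variance is the sum of ω_i(µ_i − µ_T)² over these
classes, with µ_T the mean intensity of the whole image. The function name `two_level_otsu` is illustrative, and
the image is again assumed to hold non-negative integer gray values.

```python
import numpy as np


def two_level_otsu(image: np.ndarray, n_levels: int = 256) -> tuple:
    """Return thresholds (k1, k2) maximizing the three-class between-class variance."""
    hist = np.bincount(image.ravel(), minlength=n_levels).astype(float)
    p = hist / hist.sum()
    n = p.size
    levels = np.arange(n)
    # Prefix sums: P[i] = sum of p[:i], S[i] = sum of (p * levels)[:i].
    P = np.concatenate(([0.0], np.cumsum(p)))
    S = np.concatenate(([0.0], np.cumsum(p * levels)))
    mu_total = S[-1]

    best_var, best_pair = -1.0, (0, 0)
    for k1 in range(n - 2):
        for k2 in range(k1 + 1, n - 1):
            # Classes: [0, k1], (k1, k2] and (k2, n - 1] as half-open index ranges.
            var = 0.0
            for lo, hi in ((0, k1 + 1), (k1 + 1, k2 + 1), (k2 + 1, n)):
                omega = P[hi] - P[lo]
                if omega > 0:
                    mu = (S[hi] - S[lo]) / omega
                    var += omega * (mu - mu_total) ** 2
            if var > best_var:
                best_var, best_pair = var, (k1, k2)
    return best_pair
```

With the reflections isolated in the brightest class (values above k2), one possible use is to treat only the
middle class as nuclei; whether this is appropriate depends on the dataset.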



Evaluation method: Dice Score (DSC)

In this method we look at the overlapping area of the prediction and the ground truth, relative to the sum of the
total area of the prediction and the total area of the ground truth.
As a formula:


\begin{align*}
    DSC = \frac{2 \, |A ∩ B|}{|A| + |B|}
\end{align*}

\begin{align*}
A = GT
\end{align*}

\begin{align*}
B = segmented \enspace image
\end{align*}

The value of the Dice score measures how successful our cell nuclei segmentation was. The best possible score is
1 and the worst is 0.
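
For binary images the score reduces to a few array operations. The sketch below assumes that every non-zero pixel
counts as foreground and, as a convention chosen for the example, returns 1 when both images are empty; `dice_score`
is an illustrative name and not necessarily the one used in our `evaluation` module.

```python
import numpy as np


def dice_score(ground_truth: np.ndarray, segmentation: np.ndarray) -> float:
    """Dice score of two binary images; non-zero pixels are treated as foreground."""
    a = np.asarray(ground_truth) > 0
    b = np.asarray(segmentation) > 0
    denominator = a.sum() + b.sum()
    if denominator == 0:
        # Both images are empty: treated here as a perfect match.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denominator)
```

If the ground truth has two foreground pixels and the segmentation finds one of them and nothing else, the score is
2·1 / (2 + 1) ≈ 0.67.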

Mean Surface Distance (MSD):

This method also evaluates how well our Otsu-based image segmentation performed on the given images.
It measures the distance from the border of the segmented cell nuclei to the border of the ground truth nuclei and
vice versa, and computes the mean of all these distances.
The formula is:

\begin{align*}
    d(p, S') = \min_{p' ∈ S'} ||p - p'||_2
\end{align*}

\begin{align*}
    MSD = \frac{1}{n_S + n_{S'}} \left( \sum_{p ∈ S} d(p,S') + \sum_{p' ∈ S'} d(p',S) \right)
\end{align*}

\begin{align*}
    d = \enspace distance
\end{align*}

\begin{align*}
     S \enspace and \enspace S′= outer \enspace surfaces
\end{align*}

\begin{align*}
p, \enspace p′ = points \enspace on \enspace the \enspace surfaces \enspace S \enspace and \enspace S′
\end{align*}

\begin{align*}
n_S, \enspace n_{S′} = number \enspace of \enspace points \enspace on \enspace S \enspace and \enspace S′
\end{align*}
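
A compact NumPy sketch of this measure follows. Several choices in it are assumptions made for the example: the
border of a mask is taken to be every foreground pixel with at least one background pixel among its four direct
neighbors, both masks are assumed to be non-empty, and the function names are illustrative. The pairwise distance
matrix needs memory proportional to n_S · n_S′, so this version is only suitable for moderately sized borders.

```python
import numpy as np


def border_points(mask: np.ndarray) -> np.ndarray:
    """Coordinates of foreground pixels that touch the background (4-neighborhood)."""
    m = np.asarray(mask) > 0
    padded = np.pad(m, 1, mode="constant", constant_values=False)
    # A pixel is interior if its upper, lower, left and right neighbors are all foreground.
    interior = (
        padded[:-2, 1:-1] & padded[2:, 1:-1] & padded[1:-1, :-2] & padded[1:-1, 2:]
    )
    return np.argwhere(m & ~interior)


def nearest_distances(points_a: np.ndarray, points_b: np.ndarray) -> np.ndarray:
    """For each point in points_a, the Euclidean distance to the closest point in points_b."""
    diff = points_a[:, None, :].astype(float) - points_b[None, :, :].astype(float)
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)


def mean_surface_distance(ground_truth: np.ndarray, segmentation: np.ndarray) -> float:
    """Symmetric mean distance between the borders of two non-empty binary masks."""
    s = border_points(ground_truth)
    s_prime = border_points(segmentation)
    forward = nearest_distances(s, s_prime)    # d(p, S') for every p on S
    backward = nearest_distances(s_prime, s)   # d(p', S) for every p' on S'
    return float((forward.sum() + backward.sum()) / (len(s) + len(s_prime)))
```

The result is given in pixels: identical masks yield 0, and the value grows with the average offset between the two
borders.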


Hausdorff method:

The Hausdorff metric is very similar to the MSD. The only difference is that it takes the maximum instead of the mean
of all surface distances. The formula is:

\begin{align*}
    HD = \max[d(S,S'), \enspace d(S',S)], \qquad d(S,S') = \max_{p ∈ S} d(p,S')
\end{align*}

\begin{align*}
    d = \enspace distance
\end{align*}

\begin{align*}
     S \enspace and \enspace S′= outer \enspace surfaces
\end{align*}
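
To keep the example short, the following sketch works directly on two arrays of border point coordinates (one row
per point) instead of on images. The function name `hausdorff_distance` is illustrative, and both point sets are
assumed to be non-empty.

```python
import numpy as np


def hausdorff_distance(points_s, points_s_prime) -> float:
    """Symmetric Hausdorff distance between two non-empty sets of points."""
    s = np.asarray(points_s, dtype=float)
    s_prime = np.asarray(points_s_prime, dtype=float)
    diff = s[:, None, :] - s_prime[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))   # dist[i, j] = ||p_i - p'_j||
    d_forward = dist.min(axis=1).max()        # largest distance from S to S'
    d_backward = dist.min(axis=0).max()       # largest distance from S' to S
    return float(max(d_forward, d_backward))
```

For S = {(0, 0), (0, 1)} and S′ = {(0, 0), (0, 5)} the directed distances are 1 and 4, so the Hausdorff distance is
4. A single outlying point therefore determines the result, which makes the metric very sensitive to isolated
segmentation errors.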

Cell counting:

...

**Dataset 1**

%##
# Compare the Dice scores obtained with the different preprocessing variants
dice_scores = np.array([dc_gowt, g_dc_gowt, m_dc_gowt, hs_dc_gowt, gs_dc_gowt, dice_mh_GOWT1])
visualisation.comparison_preprocessing(dice_scores)

# Same comparison for the mean surface distance
msd_scores = np.array([msd_gowt, g_msd_gowt, m_msd_gowt, hs_msd_gowt, gs_msd_gowt, msd_mh_GOWT1])
visualisation.comparison_preprocessing(msd_scores, y_label='MSD Value')


%## md
**Dataset 2**

**Dataset 3**

**Conclusion**


The Hausdorff metric is not suitable for our purposes. Because it takes the maximum surface distance, a single
wrongly segmented region far away from any nucleus is enough to produce a very high value. Since our segmentation
methods are still imprecise, the resulting values fluctuate strongly and are sometimes very high. We therefore
consider the Hausdorff metric to be of little use to us and do not apply this evaluation method.

