In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy.ndimage
import pandas as pd
import seaborn as sns
from nuclei_segmentation import otsu as ot


<h1><b>Implementation and Evaluation of Otsu Thresholding</b></h1>


***


<h3><b>Abstract</b></h3>
<p style = 'text-align : justify;'>
This project deals with Otsu's thresholding as a segmentation method. 
Using different preprocessing methods and variations of Otsu's thresholding (meaning two-level Otsu, ...; is this term misleading?), the segmentation of three different datasets (N2DH-GOWT1, N2DL-HeLa, NIH3T3) was optimized. Because each dataset poses different challenges, the datasets were considered individually. To find the best segmentation outcome, the processed images were compared with ground truth images, using the Dice score for evaluation. 

</p>
***


<h3><b>Table of contents</b></h3>

1. Introduction

2. Description of datasets  
2.1 N2DH-GOWT1 dataset  
2.2 N2DL-HeLa dataset  
2.3 NIH3T3 dataset 

3. Methods  
3.1 Packages  
3.2 Preprocessing  
3.3 Thresholding  
3.4 Evaluation  
3.5 Cell nuclei counting  

4. Results  

5. Discussion 
 
6. References 


*** 


<h3><b>1. Introduction</b></h3>
<p style = 'text-align : justify;'>

(Is this more of an abstract? Possibly swap introduction and abstract?)

Image segmentation is a common tool for applications in different areas.... (in progress: https://arxiv.org/pdf/2001.05566.pdf)

Cell segmentation aims to simplify an image so that it can be analyzed more easily. It makes it easier to determine cell boundaries precisely, which in turn helps to examine the phenotype of the cells (e.g. cell size, cell shape), to count the cell nuclei and to localize the cells. In combination with time-lapse microscopy, this provides insights into characteristics of the cells such as cell migration and proliferation (Ulman, V. et al., 2017). There are several image segmentation algorithms, for example thresholding (Otsu, N., 1979), k-means clustering (Dhanachandra, N. et al., 2015) and region growing (Nock, R. and Nielsen, F., 2004). Thresholding is a simple segmentation method: the grayscale image is binarized, meaning that each pixel is assigned to the foreground or the background depending on a threshold. Pixels above the threshold are set to 1 (foreground) and pixels at or below the threshold are set to 0 (background). Otsu's thresholding selects this threshold automatically (Otsu, N., 1979). 

</p>
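
The binarization and automatic threshold selection described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the project's `nuclei_segmentation.otsu` code; the function names `otsu_threshold` and `binarize` are hypothetical, and an integer-valued 8-bit image is assumed.

```python
import numpy as np


def otsu_threshold(image, n_levels=256):
    """Return the intensity t that maximizes the between-class variance.

    Assumes an integer-valued grayscale image with values in [0, n_levels).
    """
    hist = np.bincount(image.ravel(), minlength=n_levels).astype(float)
    prob = hist / hist.sum()                 # probability of each intensity
    levels = np.arange(n_levels)

    omega0 = np.cumsum(prob)                 # weight of the class "<= t"
    omega1 = 1.0 - omega0                    # weight of the class "> t"
    cum_mean = np.cumsum(prob * levels)      # cumulative first moment
    total_mean = cum_mean[-1]

    # Between-class variance for every candidate threshold; thresholds that
    # leave one class empty keep a variance of 0.
    valid = (omega0 > 1e-12) & (omega1 > 1e-12)
    sigma_b = np.zeros(n_levels)
    sigma_b[valid] = (total_mean * omega0[valid] - cum_mean[valid]) ** 2 / (
        omega0[valid] * omega1[valid]
    )
    return int(np.argmax(sigma_b))


def binarize(image, threshold):
    """Pixels above the threshold become 1 (foreground), all others 0."""
    return (image > threshold).astype(np.uint8)
```

A call such as `binarize(img, otsu_threshold(img))` would then yield the segmented mask for a single image.
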
***

<h3><b>2. Description of datasets</b></h3>

For each dataset, different preprocessing methods were used in order to achieve the highest possible Dice score. Furthermore, the cell nuclei counts were verified using a cell nuclei counting algorithm. 

***
<br>
<h3><b>2.1 N2DH-GOWT1 dataset</b></h3>
<p style = 'text-align : justify;'>
This dataset contains 6 images of embryonic stem cells from mouse (Mus musculus). The images have a size of 1024x1024 pixels and contain approximately 10 to 20 cell nuclei each. To visualize the nuclei, Oct4 was tagged with GFP. In the associated study, UV-damaged chromatin in embryonic stem cells was investigated by detecting different levels of Oct4 accumulation. The images were acquired with a Leica TCS SP5 confocal microscope (Bártová, E. et al., 2011). The main challenges of this dataset are the low contrast and the stained parts within the cells. 
</p>
<h3><b>2.2 N2DL-HeLa dataset </b></h3>
<p style = 'text-align : justify;'>
This dataset contains 4 images of human (Homo sapiens) cells derived from cervical cancer. The images have a size of 1100x700 pixels and contain roughly 30 to 50 nuclei each. The cells were modified to stably express H2B-GFP, which makes the nuclei visible. This dataset was used for the development of a phenotypic screening platform for identifying human genes that are important for specific biological functions such as cell division. The images were acquired with an Olympus IX81 microscope (Neumann, B. et al., 2010). One problem that occurs is the varying brightness.
</p>
<h3><b>2.3 NIH3T3 dataset  </b></h3>
<p style = 'text-align : justify;'>
This dataset contains 18 images of mouse (Mus musculus) embryonic fibroblast cells. The images have a size of 1344x1024 pixels and contain about 60 nuclei each. The nuclei were stained with Hoechst (Coelho, L.P. et al., 2009). The challenges here are the varying brightness of the images, reflections and overlapping nuclei.

</p>


***


<h3><b>3. Methods</b></h3>


***
<h3><b>3.1 Packages</b></h3>

<h3><b>3.2 Preprocessing</b></h3>


The larger the filter size, the more the image is smoothed; however, edges are also blurred to a larger extent. It is therefore necessary to find the optimal filter size in order to obtain the best possible filtering result. For our project, the optimal filter size was determined by comparing the Dice scores obtained with various filter sizes.
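
One way to organize this filter-size search is sketched below. The Dice score is taken in its usual form, twice the overlap of two masks divided by the sum of their sizes. The helper names `dice_score` and `best_filter_size` are assumptions made for illustration, as are the `smooth` and `segment` callables, which stand in for whichever filter and thresholding routine are being compared.

```python
import numpy as np


def dice_score(prediction, ground_truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks."""
    prediction = prediction.astype(bool)
    ground_truth = ground_truth.astype(bool)
    total = prediction.sum() + ground_truth.sum()
    if total == 0:
        return 1.0                      # both masks empty: perfect agreement
    return 2.0 * np.logical_and(prediction, ground_truth).sum() / total


def best_filter_size(image, ground_truth, sizes, smooth, segment):
    """Return (best_size, scores); scores maps each size to its Dice score.

    smooth(image, size) and segment(image) are caller-supplied callables,
    for example a median filter and an Otsu-based binarization.
    """
    scores = {
        size: dice_score(segment(smooth(image, size)), ground_truth)
        for size in sizes
    }
    return max(scores, key=scores.get), scores
```

Keeping the filter and the segmentation step as parameters means the same loop can be reused for the Gaussian and the median filter.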

<h4><b>3.2.1 Gaussian filter</b></h4>
<p style = 'text-align : justify;'>
The Gaussian filter is a linear filter, meaning that it is applied by convolving the image with a kernel. 
With the Gaussian filter, noise is reduced by weighting the pixels in the neighbourhood, with the central pixel receiving the highest weight. The noise is reduced, but the edges become blurred as the signal gets distorted. The filter is therefore not suitable against impulsive noise. It is most suitable when the noise is Gaussian distributed. A larger filter size removes more noise, but edges become more blurred. The formula for the 2-dimensional filter is given below (Deng, G. and Cahill, W. 2002 and Nixon, M. S. and Aguado, A. S. 2002) (the second is a book source; cite differently?)
</p>

$$ G_σ(x,y) = \frac{1}{2πσ^2}\, e^{-\frac{x^2+y^2}{2σ^2}} $$

$σ^2$: variance
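
To make the formula concrete, the following sketch samples the kernel on a small grid and applies it to an image using NumPy only. The names `gaussian_kernel` and `gaussian_filter`, the odd kernel size and the mirrored border handling are assumptions for this illustration; in practice `scipy.ndimage.gaussian_filter` offers the same kind of smoothing.

```python
import numpy as np


def gaussian_kernel(size, sigma):
    """Sample G_sigma(x, y) on a size x size grid centred on (0, 0).

    size is assumed to be odd so that the kernel has a central pixel.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    kernel = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return kernel / kernel.sum()    # renormalise so brightness is preserved


def gaussian_filter(image, size=5, sigma=1.0):
    """Convolve a 2D image with the Gaussian kernel; borders are mirrored."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    padded = np.pad(image.astype(float), half, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    # The kernel is symmetric, so correlation and convolution coincide.
    return np.einsum("ijkl,kl->ij", windows, kernel)
```

The central weight is the largest entry of the kernel, which matches the statement that the central pixel is rated highest.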

<br>
<br>
<br>
<h4><b>3.2.2 Median filter</b></h4>
<p style = 'text-align : justify;'>
The median filter is a non-linear filter, so it cannot be expressed as a convolution. 
Here the pixel values in an NxN neighbourhood are sorted, and the median value replaces the central pixel.
This filter is suitable for impulsive noise (salt-and-pepper noise) and preserves edges.
It is less suitable for Gaussian noise (Nixon, M. S. and Aguado, A. S. 2002).
</p>
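
The ranking step can be sketched with NumPy alone, as shown below. The function name `median_filter` and the mirrored border handling are assumptions for this illustration; `scipy.ndimage.median_filter` provides a ready-made equivalent.

```python
import numpy as np


def median_filter(image, size=3):
    """Replace each pixel by the median of its size x size neighbourhood.

    size is assumed to be odd; borders are mirrored.
    """
    half = size // 2
    padded = np.pad(image, half, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1)).astype(image.dtype)
```

On an image with a sharp edge and a single bright outlier, this removes the outlier while leaving the edge in place, which is the behaviour described above.
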
<br>
<br>
<br>
<h4><b>3.2.3 Histogram stretching</b></h4>
<p style = 'text-align : justify;'>
Histogram stretching, also known as histogram normalisation, can be used to enhance the contrast. The intensities of the image are stretched so that the whole range of pixel intensity values (e.g. 0-255) is used. This method is especially useful for images with a restricted range of pixel intensities. It has no effect if pixels with the minimum and maximum values of the range are already present in the image (Nixon, M. S. and Aguado, A. S. 2002).
</p>

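$$ g(x,y) = \frac{f(x,y)-f_{min}}{f_{max}-f_{min}} \cdot (255-0) + 0 $$

$f_{min}$, $f_{max}$: lowest and highest intensity in the original image $f$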
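
A minimal sketch of histogram stretching, assuming the usual min-max form in which the lowest intensity of the image is mapped to 0 and the highest to 255. The function name `stretch_histogram` and the handling of constant images are assumptions for this illustration.

```python
import numpy as np


def stretch_histogram(image, out_min=0, out_max=255):
    """Linearly map [image.min(), image.max()] onto [out_min, out_max]."""
    image = image.astype(float)
    f_min, f_max = image.min(), image.max()
    if f_max == f_min:                  # constant image: nothing to stretch
        return np.full(image.shape, out_min, dtype=np.uint8)
    stretched = (image - f_min) / (f_max - f_min) * (out_max - out_min) + out_min
    return np.round(stretched).astype(np.uint8)
```

If the image already contains both 0 and 255, the mapping is the identity, which is why the method brings no benefit in that case.
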
<br>
<br>
<br>
<br>
<h3><b>3.3 Thresholding</b></h3>
<h4><b>Otsu Thresholding</b></h4>
<br>
<br>
<br>
<br>
<br>
  
<h4><b>Two-level Otsu Thresholding</b></h4>
<p style = 'text-align : justify;'>
Otsu thresholding cannot distinguish cell nuclei from bright spots, reflections or artifacts that arise from image acquisition and data transmission during microscopy. As a result, such pixels are incorrectly assigned to the foreground and thus falsely identified as cell nuclei. To solve this problem, Otsu's method can be extended to multi-threshold problems; in particular, two-level Otsu thresholding can be performed. The fundamental principle is to set two threshold values, which divide the image into three pixel classes. With $ω_k$ and $μ_k$ denoting the probability and mean intensity of class $k$, and $μ_T$ the mean intensity of the whole image, the optimal pair of threshold values is determined by maximizing the between-class variance of the three pixel classes as follows (Otsu, 1979):
</p>
  
$$σ^2_B = ω_1(μ_1 - μ_T)^2 + ω_2(μ_2 - μ_T)^2 + ω_3(μ_3 - μ_T)^2$$
  
<p style = 'text-align : justify;'>
The upper threshold is used to remove bright spots and reflections from the original image, whereas the lower threshold separates the cell nuclei from the background. To obtain a binary image, all pixels with an intensity lower than or equal to the lower threshold, as well as all pixels with an intensity higher than the upper threshold, are set to 0 (background). The remaining pixels, in the range between the lower and upper threshold, are set to 1 (foreground). Furthermore, two-level Otsu thresholding can be used to segment images in which the objects, such as cell nuclei, fall into two groups in terms of their intensity values. This is the case in the images of the N2DH-GOWT1 dataset, in which the cell nuclei seem to be distributed across different intensity levels. Ideally, the two thresholds would separate the two groups of cell nuclei from each other as well as from the background. In this case, all pixels with an intensity higher than the lower threshold, including those above the upper threshold, are set to 1. The pixels with an intensity lower than or equal to the lower threshold are set to 0.
</p>
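
The two-threshold search and both binarization variants can be sketched as follows. This is an exhaustive search written for clarity, not speed, and it is not necessarily how the project implements it; the names `two_level_otsu`, `binarize_between` and `binarize_above` are hypothetical, and an integer-valued 8-bit image is assumed.

```python
import numpy as np


def two_level_otsu(image, n_levels=256):
    """Return (t1, t2) with t1 < t2 maximizing the three-class variance.

    Classes are [0, t1], (t1, t2] and (t2, n_levels - 1].
    """
    hist = np.bincount(image.ravel(), minlength=n_levels).astype(float)
    prob = hist / hist.sum()
    cum_w = np.cumsum(prob)                          # cumulative class weight
    cum_m = np.cumsum(prob * np.arange(n_levels))    # cumulative first moment
    mu_total = cum_m[-1]

    best, best_pair = -1.0, (0, 0)
    for t1 in range(n_levels - 2):
        for t2 in range(t1 + 1, n_levels - 1):
            w1, w2, w3 = cum_w[t1], cum_w[t2] - cum_w[t1], 1.0 - cum_w[t2]
            if min(w1, w2, w3) < 1e-12:
                continue                             # skip empty classes
            mu1 = cum_m[t1] / w1
            mu2 = (cum_m[t2] - cum_m[t1]) / w2
            mu3 = (mu_total - cum_m[t2]) / w3
            sigma_b = (w1 * (mu1 - mu_total) ** 2
                       + w2 * (mu2 - mu_total) ** 2
                       + w3 * (mu3 - mu_total) ** 2)
            if sigma_b > best:
                best, best_pair = sigma_b, (t1, t2)
    return best_pair


def binarize_between(image, t1, t2):
    """Foreground = pixels in (t1, t2]; brighter pixels are discarded."""
    return ((image > t1) & (image <= t2)).astype(np.uint8)


def binarize_above(image, t1):
    """Foreground = every pixel above the lower threshold."""
    return (image > t1).astype(np.uint8)
```

`binarize_between` corresponds to the reflection-removal case, and `binarize_above` to the case in which dim and bright nuclei are both kept.
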
<br>
<br>
<br>
<br>
<h4><b>Local Adaptive Thresholding</b></h4>
<p style = 'text-align : justify;'>
Although global thresholding is very fast and provides reliable results in most cases, it is susceptible to non-uniform illumination, considerable background noise and variation in contrast (Singh <I>et al.</I>, 2011). As a result, pixels cannot be clearly assigned to the foreground or background, which leads to inaccurately segmented images. To overcome this problem, the implemented global segmentation method can be extended to local thresholding, which is a more appropriate and accurate binarization method for such images. In its basic form, local thresholding uses non-overlapping windows to divide the original image into a number of regions and determines the optimal threshold for each of them. Because contrast and illumination vary even within small regions, especially in the NIH3T3 dataset, local adaptive thresholding is performed here with overlapping windows. The key idea is to use a sliding window, a rectangular frame of fixed size, which moves across the image and calculates a separate threshold value at each position by Otsu thresholding, derived from the grayscale information of its local neighbourhood pixels (note: not really what we do in our sliding window) (Singh <I>et al.</I>, 2011). To assign each pixel to the foreground or background, two options were developed. On the one hand, the threshold for a pixel can be determined by averaging the threshold values of all windows that covered the pixel; subsequent image clipping then yields a binary, segmented image. On the other hand, each pixel can be assigned to the foreground or background according to the class it was assigned to more often. To achieve both efficiency and accuracy, it is important to select a suitable step size and window size. Since the threshold has to be calculated from the local neighbourhood for each pixel in the image, local adaptive thresholding algorithms are often slower than global thresholding algorithms. 
</p>
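
The two assignment options can be sketched with a sliding window as follows. This is an illustration under stated assumptions and not necessarily the project's sliding-window code: the names `otsu_threshold`, `window_origins` and `local_otsu`, the `mode` parameter and the default window and step sizes are all hypothetical, and an integer-valued 8-bit image is assumed.

```python
import numpy as np


def otsu_threshold(values, n_levels=256):
    """Global Otsu threshold of integer intensities in [0, n_levels)."""
    prob = np.bincount(values.ravel(), minlength=n_levels) / values.size
    w0 = np.cumsum(prob)
    m0 = np.cumsum(prob * np.arange(n_levels))
    w1 = 1.0 - w0
    valid = (w0 > 1e-12) & (w1 > 1e-12)
    sigma_b = np.zeros(n_levels)
    sigma_b[valid] = (m0[-1] * w0[valid] - m0[valid]) ** 2 / (w0[valid] * w1[valid])
    return int(np.argmax(sigma_b))


def window_origins(length, window, step):
    """Start indices of the windows along one axis."""
    last = max(length - window, 0)
    origins = list(range(0, last + 1, step))
    if origins[-1] != last:
        origins.append(last)        # shift the final window to reach the edge
    return origins


def local_otsu(image, window=32, step=16, mode="average"):
    """Sliding-window Otsu; windows overlap when step < window.

    mode="average": threshold each pixel with the mean of the thresholds of
                    all windows that covered it.
    mode="vote":    binarize inside every window and keep the majority label.
    """
    h, w = image.shape
    thr_sum = np.zeros((h, w))      # sum of window thresholds per pixel
    fg_votes = np.zeros((h, w))     # number of foreground votes per pixel
    count = np.zeros((h, w))        # number of windows covering each pixel
    for r in window_origins(h, window, step):
        for c in window_origins(w, window, step):
            patch = image[r:r + window, c:c + window]
            t = otsu_threshold(patch)
            thr_sum[r:r + window, c:c + window] += t
            fg_votes[r:r + window, c:c + window] += patch > t
            count[r:r + window, c:c + window] += 1
    if mode == "average":
        return (image > thr_sum / count).astype(np.uint8)
    return (fg_votes > count / 2).astype(np.uint8)
```

A smaller step gives more overlapping windows per pixel and therefore more thresholds to average or vote over, at the cost of more Otsu evaluations. Windows that contain only background have no meaningful Otsu threshold, so the window size should be chosen large enough to include both classes.
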


***
