In [None]:
<p style = 'text-align : justify;'>
However, the Dice scores with preprocessing were much lower, and the effect of the clip method was smaller than expected. The negative effect of preprocessing with histogram stretching can be explained by the higher contrast: more cell nuclei were above the second threshold, so some nuclei pixels were probably clipped together with the reflections, resulting in a lower Dice score. Furthermore, the overall effect was smaller because not all images were affected by reflections, and in the unaffected images clipping resulted in a lower Dice score. 
</p>
<br>
<br>
<h4><b>Challenges</b></h4>
<p style = 'text-align : justify;'>
In the following, we discuss some of the challenges that were encountered when programming the Otsu algorithm. 
First, the biggest issue was runtime optimization. To this end, the global Otsu thresholding code as well as the two-level Otsu thresholding code were mostly vectorized, which reduced the runtime from around 1s per image to about 0.1s per image. Furthermore, we decided to calculate only the between-class variance rather than the within-class variance, which further reduced the runtime to around 0.04s per image because the variances of the individual classes did not have to be computed. As a result, the runtime for applying the global Otsu thresholding algorithm to all given images is about 1.2s. The runtime of two-level Otsu thresholding is significantly longer because of the second for-loop and the image clipping, which together take about 20s per image. This long runtime is also the reason why two-level Otsu thresholding was not applied together with local adaptive thresholding on all images, as discussed further in the local thresholding part. 
Secondly, the runtime could be reduced by changing from the matplotlib histogram function to the numpy histogram function, which does not plot the histogram. 
Furthermore, we decided on 256 bins for thresholding after trying out different numbers of bins and examining the intensity value histograms of the images. Two datasets showed a range of 0-255 in their intensity values, and the third one showed the best segmentation outcomes with this number of bins. Therefore, also taking the runtime into account, 256 was chosen as the optimal number of bins, so that a threshold is examined at each integer from 0-255. 
Finally, our last challenge was finding the optimal filter size for each dataset and each preprocessing method. This was solved by comparing the Dice scores obtained with filter sizes from 1 to around 30 (taking the image size into account) and choosing, for each dataset, the filter size that maximized the Dice score. 
</p>
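<p style = 'text-align : justify;'>
The vectorization described above can be illustrated with a minimal sketch. It is not the code used in this project: the function name <code>otsu_threshold</code> and its parameters are hypothetical, and it only assumes a numpy histogram with 256 bins and a search over the between-class variance, as stated in the text.
</p>

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Return the intensity that maximizes the between-class variance.

    Hypothetical helper for illustration; NaN values are ignored.
    """
    values = np.asarray(image, dtype=float).ravel()
    values = values[~np.isnan(values)]
    # np.histogram only counts, it does not plot anything
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    prob = hist / hist.sum()

    w0 = np.cumsum(prob)                  # weight of the lower class per candidate
    w1 = 1.0 - w0                         # weight of the upper class per candidate
    cum_mean = np.cumsum(prob * centers)  # cumulative first moment
    total_mean = cum_mean[-1]

    # Between-class variance for all candidates at once (no Python loop)
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return centers[np.argmax(between)]

# Toy example: 50 dark pixels and 50 bright pixels
toy = np.array([10.0] * 50 + [200.0] * 50).reshape(10, 10)
toy_threshold = otsu_threshold(toy)
toy_mask = toy > toy_threshold
```

<p style = 'text-align : justify;'>
Because only cumulative sums over the histogram are needed, no per-class variance has to be computed for any candidate threshold.
</p>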
<br>
<br>
<br>
<h3><b>5.3 Local adaptive Otsu Thresholding</b></h3>
<p style = 'text-align : justify;'>
The main aim of local adaptive thresholding was the segmentation of pictures with non-uniform illumination. The performance of the algorithm was therefore tested first and foremost on the NIH3T3 dataset, where this problem is often present. Two separate algorithms were developed, as elaborated in Methods. 
During development, some issues arose in both algorithms due to the nature of the sliding window iterations, while each algorithm also showed its own advantages and disadvantages.
</p> 
<br>
<br>
<h4><b>Challenges - unsegmented edges</b></h4>
<p style = 'text-align : justify;'>
The most prominent challenge in both “local adaptive thresholding average” and “local adaptive thresholding counts” was non-segmented picture edges, as can be seen in figure 15. The sliding window iterations always began at the upper left corner of an image. If the size of the image was not a multiple of the chosen stepsize, the sliding window could not reach the right and lower edges, which therefore usually appeared completely white in the output. To deal with this issue, referred to as “the edge problem”, the algorithm was extended by copying the pixel values onto an array larger than the original picture. In this way, a lower and a right edge were attached, with height or width equal to the framesize set by the user and a NaN value in each position. The sliding window could now iterate beyond the bounds of the original image array while still calculating proper threshold values in each frame, as the NaN values could simply be ignored. This method increases the runtime of the algorithm by adding extra iterations, the number of which depends on the ratio of framesize to stepsize. By rounding this ratio down to the closest integer, one can calculate how many additional frames are added per iteration row or column, so the runtime increase can be approximated by the user before executing the algorithm. Although this method considers all pixels in the input image, the intensity assignments for pixels located at the bottom and right are less consistent than for the rest of the image: the more NaNs an iteration frame contains, the fewer values are used for the calculation of the threshold, and the confidence decreases for pixels further out.
</p>
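<p style = 'text-align : justify;'>
The NaN padding described above can be sketched as follows. This is an illustration under stated assumptions, not the project code: the helper names <code>pad_with_nan</code> and <code>window_starts</code> are hypothetical, and the sketch assumes that windows start at multiples of the stepsize and must fit entirely inside the array.
</p>

```python
import numpy as np

def pad_with_nan(image, framesize):
    """Attach `framesize` rows (bottom) and columns (right) filled with NaN."""
    image = np.asarray(image, dtype=float)
    return np.pad(image, ((0, framesize), (0, framesize)),
                  mode="constant", constant_values=np.nan)

def window_starts(length, framesize, stepsize):
    """Start indices of all windows that fit entirely into `length` pixels."""
    return list(range(0, length - framesize + 1, stepsize))

# Toy example: 100 x 100 image, framesize 30, stepsize 20
image = np.arange(100 * 100, dtype=float).reshape(100, 100)
padded = pad_with_nan(image, framesize=30)

starts_plain = window_starts(image.shape[0], 30, 20)    # last window ends before the edge
starts_padded = window_starts(padded.shape[0], 30, 20)  # windows now reach past the edge

# Last original pixel index covered in each case
covered_plain = starts_plain[-1] + 30 - 1
covered_padded = min(starts_padded[-1] + 30 - 1, image.shape[0] - 1)

# NaN-aware statistics ignore the padding inside a frame
frame = padded[80:110, 80:110]
frame_mean = np.nanmean(frame)
```

<p style = 'text-align : justify;'>
In this toy example the unpadded windows stop at pixel 89, whereas the padded array lets the windows cover pixel 99 as well. Frames near the padded edge contain fewer valid values, which corresponds to the reduced confidence mentioned above.
</p>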
<br>
<p style = 'text-align : justify;'>
Other solutions were considered, such as running the algorithm twice, starting the iterations first at the top left pixel and then at the bottom right pixel, thereby defining a “backward local thresholding average”. By combining the two runs, the segmented picture would have cleaner edges, especially at the bottom and right. However, the upper right and bottom left corners would remain fully black, as these are the areas that the sliding window does not reach in either run. Such an algorithm also takes almost twice as long as the previously described solution while still leaving non-segmented areas. One could define an algorithm in which the sliding window is run 4 times, each time beginning from a different corner, yet this would increase the runtime even further and might not be worth implementing unless the user has a small dataset and wants the segmentation to be as accurate as possible.
</p>
<br>
<p style = 'text-align : justify;'>
Another solution that could be implemented in the future would be “patching up” the non-segmented areas after all sliding window iterations. Here one could compute additional threshold values and assign pixel values for additionally defined frames that were previously not considered. Such a method would add only a few seconds up to a few dozen seconds of runtime (depending on framesize and stepsize) and would take into account the same number of pixels for the threshold calculation as the frames in the sliding window iterations. 
</p>

In [None]:
#Figure 15 : img dna-28.png from the NIH3T3 dataset segmented with local adaptive Otsu thresholding without segmentation of picture edges.

dna28_edges = imread(r'Outputs_Boxplots\White edge for report NIH3T3 dna-28 sliding window, stepsize=40, framesize=200, sensitivity=0.5.png')
figure(figsize=(5,5))
plt.axis('off')
imshow(dna28_edges, 'gray')

<br>
<h4><b>Challenges - runtime</b></h4>
<p style = 'text-align : justify;'>
As the local thresholding algorithm performs Otsu thresholding at each iteration, its runtime scales with the runtime of Otsu thresholding itself as well as with the number of iterations. In the simple case (no NaN edges, algorithm run once, forwards only), the number of iterations is approximately the image side length divided by the stepsize, squared. Thus, the greatest reduction in runtime came from the optimisation and vectorization of Otsu thresholding rather than from optimisation of the local thresholding algorithm itself. In any case, by setting a stepsize the user still determines the final runtime, and one has to consider that, depending on how detailed the input image is, the segmentation can take from just under a minute (45 seconds for the “mean” algorithm with NaN edges at stepsize = 50 and framesize = 200 on NIH3T3 images) to a few minutes. The runtime also depends on external factors, for example the processor of the computer, so these values are only indicative. 
</p>
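<p style = 'text-align : justify;'>
To make the dependence on stepsize and framesize concrete, the number of frames can be estimated before running the algorithm. The following sketch is only an approximation under assumptions: the function <code>count_frames</code> is hypothetical, windows are assumed to start at multiples of the stepsize and to fit entirely inside the (optionally NaN-padded) array, and the image size of 1000 x 1000 pixels is chosen purely for illustration.
</p>

```python
def count_frames(height, width, framesize, stepsize, nan_edges=False):
    """Number of sliding window positions (hypothetical helper)."""
    if nan_edges:
        # NaN padding attaches `framesize` extra rows and columns
        height += framesize
        width += framesize
    rows = max(0, (height - framesize) // stepsize + 1)
    cols = max(0, (width - framesize) // stepsize + 1)
    return rows * cols

# Illustrative values only: 1000 x 1000 image, framesize 200, stepsize 50
frames_simple = count_frames(1000, 1000, 200, 50)
frames_padded = count_frames(1000, 1000, 200, 50, nan_edges=True)
rough_estimate = (1000 // 50) ** 2  # (image side length / stepsize) squared
```

<p style = 'text-align : justify;'>
Multiplying such a frame count by the measured runtime of one Otsu thresholding call gives a rough estimate of the total runtime. Halving the stepsize roughly quadruples the number of frames.
</p>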
<br>
<br>
<h4><b>Challenges - random noise</b></h4>
<p style = 'text-align : justify;'>
An issue that arose in all datasets was random noise in areas without distinguishable cell nuclei that were bigger than the iteration frame (framesize x framesize). It is caused by the random assignment of pixels to foreground or background, as can be seen in figure 16. In “local adaptive Otsu thresholding count” this means that for each iteration frame containing only background, a random array of 0’s and 1’s is generated. If only a small number of foreground/background assignments contribute to each pixel (for example at framesize = 150, stepsize = 50, only 3 frames contribute to each pixel), these random assignments can easily influence the final segmentation and lead to large areas of random noise. Because local adaptive Otsu thresholding average assigns a pixel intensity only once the average threshold has been calculated, it allows for a more dynamic segmentation with less or no random noise and leads to higher Dice scores. This algorithm was therefore used as the final local adaptive thresholding algorithm.
</p>
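<p style = 'text-align : justify;'>
The idea behind the “average” variant can be sketched as follows: the thresholds of all frames covering a pixel are averaged first, and the pixel is assigned only afterwards. This is a simplified illustration and not the project code. The function names are hypothetical, NaN padding is omitted, and pixels that no frame reaches are simply treated as background.
</p>

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Compact Otsu threshold via the between-class variance (illustrative)."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    prob = hist / hist.sum()
    w0 = np.cumsum(prob)
    mu = np.cumsum(prob * centers)
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu[-1] * w0 - mu) ** 2 / (w0 * (1.0 - w0))
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return centers[np.argmax(between)]

def local_threshold_average(image, framesize, stepsize):
    """Average the frame thresholds per pixel, then binarize once."""
    image = np.asarray(image, dtype=float)
    thr_sum = np.zeros_like(image)  # sum of thresholds seen by each pixel
    hits = np.zeros_like(image)     # number of frames covering each pixel
    for r in range(0, image.shape[0] - framesize + 1, stepsize):
        for c in range(0, image.shape[1] - framesize + 1, stepsize):
            frame = image[r:r + framesize, c:c + framesize]
            t = otsu_threshold(frame.ravel())
            thr_sum[r:r + framesize, c:c + framesize] += t
            hits[r:r + framesize, c:c + framesize] += 1
    # Pixels never covered get an infinite threshold, i.e. background
    mean_thr = np.divide(thr_sum, hits,
                         out=np.full_like(image, np.inf), where=hits > 0)
    return image > mean_thr

# Toy example: dark background with one bright square seen by every frame
toy = np.full((60, 60), 10.0)
toy[25:35, 25:35] = 200.0
toy_mask = local_threshold_average(toy, framesize=40, stepsize=20)
```

<p style = 'text-align : justify;'>
In the “count” variant each frame would instead be binarized immediately and the per-pixel votes counted, so a single background-only frame can already introduce noisy votes.
</p>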

In [None]:
#Figure 16 : img t-31.tif from the N2DH-GOWT1 dataset segmented with local adaptive Otsu thresholding with random noise.

t_31_noise = imread(r'Outputs_Boxplots\t31 noise.png')
figure(figsize=(5,5))
plt.axis('off')
imshow(t_31_noise, 'gray')

<br>
<br>
<h4><b>Evaluation of local adaptive thresholding</b></h4>
<p style = 'text-align : justify;'>
The random noise artifacts were the main influence on the segmentation quality and led to lower Dice scores, especially when the algorithms were applied to the N2DH-GOWT1 dataset, which generally contains large areas without distinguishable objects. Because this dataset showed no apparent non-uniform illumination and the segmentation quality was greatly reduced in comparison to global thresholding, local thresholding was not analysed further as a segmentation method for this dataset. 
For the other two datasets (N2DL-HeLa and NIH3T3), both local adaptive Otsu thresholding count and local adaptive Otsu thresholding average were applied. An algorithm was implemented to automatically compare the Dice scores of both segmentations for each dataset. For both datasets, with and without any kind of preprocessing, the local adaptive Otsu thresholding average algorithm consistently returned the higher average Dice score, which can be explained by the more dynamic nature of the algorithm. In the following, only the results for this algorithm are discussed, as it was chosen as the better of the two segmentation algorithms.
</p>
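<p style = 'text-align : justify;'>
For reference, the Dice score used for these comparisons can be computed from two binary masks as twice the overlap divided by the sum of both foreground areas. The sketch below is a generic illustration. The function name <code>dice_score</code> is hypothetical, and returning 1.0 for two empty masks is a convention assumed here.
</p>

```python
import numpy as np

def dice_score(segmentation, ground_truth):
    """Dice score of two binary masks: 2 * |A and B| / (|A| + |B|)."""
    seg = np.asarray(segmentation).astype(bool)
    gt = np.asarray(ground_truth).astype(bool)
    denominator = seg.sum() + gt.sum()
    if denominator == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(seg, gt).sum() / denominator

# Toy example: one of two foreground pixels overlaps
seg = np.array([[1, 1, 0, 0]])
gt = np.array([[0, 1, 1, 0]])
score = dice_score(seg, gt)  # 2 * 1 / (2 + 2) = 0.5
```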
<br>
<br>
<h4><b>NIH3T3</b></h4>
<p style = 'text-align : justify;'>
For the NIH3T3 dataset, local adaptive segmentation showed a clear increase in the Dice score in comparison to global Otsu thresholding, as one would expect for images with varying brightness. The median Dice scores without preprocessing were 0.817 and 0.672, respectively. One can see a clear difference in segmentation quality in figure 17. 

In [None]:
#Figure 17 : img dna-37.png from the NIH3T3 dataset raw vs. segmented with local Otsu thresholding 

dna37_comparison = imread(r'Outputs_Boxplots\dna37_local_comparison.png')
figure(figsize=(10,10))
imshow(dna37_comparison, 'gray')
plt.axis('off')

As the preprocessing was performed globally on the whole picture before segmentation, it influenced the thresholding of each frame in the same manner as it influenced the global thresholding. The differences in Dice score compared to segmentation without preprocessing were therefore similar for local and global thresholding and had the same causes. As expected, local adaptive segmentation proved to be the optimal segmentation method for this dataset, even though some random noise was still present at the edges of some segmented images. To increase the segmentation accuracy, one could use a better method for dealing with “the edge problem” and set a smaller stepsize to retrieve a more accurate average threshold. An issue that this algorithm could not deal with in the NIH3T3 dataset was reflections. These are considered background in the ground truth images but cannot be recognised as background by a simple local adaptive thresholding algorithm, as pixels that clearly have a higher intensity than the rest of the image are assigned to the foreground in every frame they appear in. To try to solve this problem, the two-level Otsu thresholding clip function was combined with local adaptive thresholding average: two average thresholds were calculated for each frame, and only pixels with an intensity between the two thresholds were assigned to the foreground. Unfortunately, as there were only a few reflections in each image, most frames treated actual nuclei as reflections and set their intensity values to 0. This resulted in a large decrease in segmentation quality, with a Dice score of around 0.580. An example of such a segmentation is shown in figure 18. 

In [None]:
#Figure 18 : img dna-32.png from the NIH3T3 dataset segmented with two level local adaptive Otsu thresholding clip.

dna32_tlltc = imread(r'Outputs_Boxplots\Local_Two-level_Otsu_thresholding_clip_average_NIH3T3_dna32.png')
figure(figsize=(5,5))
plt.axis('off')
imshow(dna32_tlltc, 'gray')

Larger framesizes could be considered for images in which reflections are more evenly distributed, which might make it possible to deal with both the brightness issue and the reflections, yet the NIH3T3 dataset does not allow for this. 
</p>
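<p style = 'text-align : justify;'>
The two-level assignment rule discussed above can be sketched for a single frame or a whole image. This is an illustration under assumptions, not the project code: the names <code>two_level_otsu</code>, <code>segment_two_level</code> and <code>clip_reflections</code> are hypothetical, and the search maximizes the sum of squared class moments divided by the class weights, which is equivalent to maximizing the between-class variance of three classes.
</p>

```python
import numpy as np

def two_level_otsu(image, bins=256):
    """Return (lower, upper) thresholds separating three intensity classes."""
    values = np.asarray(image, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    prob = hist / hist.sum()
    P = np.cumsum(prob)            # cumulative class weight
    S = np.cumsum(prob * centers)  # cumulative first moment
    best, best_pair = -np.inf, (0, 1)
    for t1 in range(bins - 2):                # loop over the lower threshold
        t2 = np.arange(t1 + 1, bins - 1)      # all upper candidates at once
        w = np.stack([np.full(t2.shape, P[t1]), P[t2] - P[t1], 1.0 - P[t2]])
        s = np.stack([np.full(t2.shape, S[t1]), S[t2] - S[t1], S[-1] - S[t2]])
        with np.errstate(divide="ignore", invalid="ignore"):
            # Splits with an empty class are excluded from the search
            score = np.where((w > 1e-12).all(axis=0),
                             (s ** 2 / w).sum(axis=0), -np.inf)
        k = int(np.argmax(score))
        if score[k] > best:
            best, best_pair = score[k], (t1, int(t2[k]))
    # Use the upper bin edges so that each class consists of whole bins
    return edges[best_pair[0] + 1], edges[best_pair[1] + 1]

def segment_two_level(image, clip_reflections=True):
    """Foreground between both thresholds, or everything above the lower one."""
    lower, upper = two_level_otsu(image)
    image = np.asarray(image, dtype=float)
    if clip_reflections:
        return (image >= lower) & (image < upper)
    return image >= lower

# Toy example: background (10), nuclei (100.3) and a few reflections (250)
toy = np.array([10.0] * 60 + [100.3] * 30 + [250.0] * 10)
lower, upper = two_level_otsu(toy)
clipped = segment_two_level(toy, clip_reflections=True)
unclipped = segment_two_level(toy, clip_reflections=False)
```

<p style = 'text-align : justify;'>
In a frame that contains nuclei but no reflections, the upper threshold necessarily falls inside the nuclei intensities, which illustrates why clipping per frame removed actual nuclei in most frames.
</p>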
<br>
<br>
<h4><b>N2DL-HeLa</b></h4>
<p style = 'text-align : justify;'>
For the N2DL-HeLa dataset, there did not seem to be a significant difference in segmentation quality between global Otsu thresholding and local adaptive Otsu thresholding. As the preprocessing was performed globally on the whole picture before segmentation, it influenced the thresholding of each frame in the same manner as it influenced the global thresholding. The differences in Dice score compared to segmentation without preprocessing were therefore similar for local and global thresholding and had the same causes. The median Dice score for segmentation without preprocessing was 0.738 with global Otsu thresholding and 0.758 with local adaptive Otsu thresholding. As there were only 4 pictures in the dataset, the chosen stepsize and framesize (100 and 300) used in local adaptive thresholding may be optimal only by chance and cannot be extrapolated to other pictures with similar properties. Two further characteristics of the N2DL-HeLa dataset influenced the segmentation. Firstly, two of the pictures contained large areas without distinguishable nuclei, while in the other two the nuclei were evenly distributed. Secondly, the pictured nuclei seemed to be separated into two layers: some clearly appeared brighter and others had a lower intensity, with each class sharing a similar brightness level. Due to the large background areas, a framesize big enough to avoid the emergence of random noise had to be chosen, which in turn led to iteration frames that were too big to recognise the nuclei with lower brightness as foreground. To solve this issue, two-level local adaptive Otsu thresholding average was implemented again, this time assigning all pixels with intensities above the lower threshold to the foreground (two-level local adaptive thresholding average clip). 
Because two-level Otsu thresholding already takes around 20 seconds for one picture and had to be performed in each iteration, the runtime of this algorithm is high (on average 25 minutes to segment one picture). Therefore, only one picture was segmented with this algorithm. It showed a Dice score of 0.865, while the Dice score for this picture was 0.741 with normal local adaptive Otsu thresholding and 0.847 with two-level Otsu thresholding. The segmented picture can be seen in figure 19. 

In [None]:
#Figure 19 : img t75.png from the HeLa dataset segmented with two level local adaptive Otsu thresholding.

t75_tllt = imread(r'Outputs_Boxplots\Local_Two-level_Otsu_thresholding_clip_average_N2DL-HeLa_t75.png')
figure(figsize=(5,5))
plt.axis('off')
imshow(t75_tllt, 'gray')

It seems that it did not matter much whether the image was thresholded locally or globally, which means that there was probably no large difference in background illumination between different areas of the picture. Instead of running an algorithm for more than 20 minutes, one could retrieve a comparably accurate segmentation with simple two-level Otsu thresholding.
</p>
<br>
<br>