# Synopsis

In the previous notebooks, we learnt how to handle images as numpy arrays and do some
analysis on them. We also found some coins, isn't that cool?

Now it is time to do some preprocessing on the images before running our algorithms.

# Import libraries

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

from colorama import Back, Fore, Style
from pathlib import Path

In [None]:
import os
import sys

import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np

from pylab import imread, imshow
from skimage import data, img_as_float, img_as_ubyte, measure
from skimage.filters import rank, threshold_otsu
from skimage.morphology import (
    disk, 
    binary_dilation, binary_erosion, binary_closing, binary_opening,
    remove_small_holes, remove_small_objects,
    flood_fill,
    skeletonize
)

# Thresholding

Sometimes, when we want to find blobs, we just need to differentiate them from the
background. In other words, we want to convert the image from an RGB (or grayscale)
color space to a binary one. For example, zero means background and one means something
we want to find.

Let's take the coins again and try it:

In [None]:
coins_original = img_as_float( data.coins() )

In [None]:
plt.imshow(coins_original, cmap = 'gray', vmin = 0, vmax = 1)
pass

We discussed previously that we need to remove the background. In fact, we already
used thresholding algorithms on it. One of the simplest ones is using the median of the values:

In [None]:
median = np.median(coins_original)
binarized_im = coins_original > median
plt.imshow(binarized_im, cmap="gray")
pass

Well, in this case, `binarized_im` is an array of boolean values, but we can easily
convert it to integer values of 0 and 1:

In [None]:
binarized_im.astype(np.uint8)

**Note:** We are using `uint8` to use just a byte to store the number.

Check the next cell:

In [None]:
a1 = sys.getsizeof(binarized_im.astype(int))
a2 = sys.getsizeof(binarized_im.astype(np.uint8))
print(f"The size of using `int` is {a1}")
print(f"The size of using `np.uint8` is {a2}")
print(f"Yes, it is almost {np.round(a1/a2)} times more expensive!")

Needless to say, with a big image (2048 x 2048) it makes a difference:

- Using bytes takes 4 MB
- Using 8-byte ints takes 32 MB

If you have a stack of images from a confocal microscope (let's say 32 images):

- Using bytes takes 128 MB
- Using 8-byte ints takes 1 GB

Now, imagine that you are processing a recorded video of C. elegans that contains thousands of frames.

>**IMPORTANT NOTE:** Take memory usage seriously!
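
As a rough check of the numbers above, the following sketch computes the memory footprint from the array shape and the item size of the dtype, without allocating anything. The helper name `size_in_mib` is made up for this example, and `np.int64` is assumed as the 8-byte integer type.

```python
import math

import numpy as np


def size_in_mib(shape, dtype):
    """Memory (in MiB) needed to store an array with this shape and dtype."""
    n_values = math.prod(shape)
    return n_values * np.dtype(dtype).itemsize / 2**20


for dtype in (np.uint8, np.int64):
    name = np.dtype(dtype).name
    single = size_in_mib((2048, 2048), dtype)
    stack = size_in_mib((32, 2048, 2048), dtype)
    print(f"{name}: single image {single} MiB, stack of 32 images {stack} MiB")
```

On a real array, the `nbytes` attribute reports the same quantity in bytes.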

# Thresholding, again

Ok, sorry for the interruption, let's continue with the thresholding. On the coins image, this global threshold does not work properly, but it can work in cases where we can control the lighting.

>**Note:** If the mean or the median works, go ahead and use it; avoid adding complexity to 
your code.

We will introduce another thresholding algorithm. It was proposed by Nobuyuki Otsu
and is named after him: Otsu's method.

More info: https://en.wikipedia.org/wiki/Otsu%27s_method
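
To give an idea of what happens inside, here is a minimal sketch of the idea behind Otsu's method: try every split of the histogram and keep the one that maximizes the variance between the two classes. The function name `otsu_threshold_sketch` is hypothetical, the input is assumed to be a non-constant image, and the result may differ slightly from what `threshold_otsu` returns.

```python
import numpy as np


def otsu_threshold_sketch(image, nbins=256):
    """Pick the histogram split that maximizes the between-class variance."""
    counts, bin_edges = np.histogram(np.ravel(image), bins=nbins)
    centers = (bin_edges[:-1] + bin_edges[1:]) / 2
    counts = counts.astype(float)

    # Number of pixels at or below each bin (low class) and at or above it (high class)
    weight_low = np.cumsum(counts)
    weight_high = np.cumsum(counts[::-1])[::-1]

    # Mean intensity of each class for every possible split
    mean_low = np.cumsum(counts * centers) / weight_low
    mean_high = (np.cumsum((counts * centers)[::-1]) / weight_high[::-1])[::-1]

    # Between-class variance when splitting right after bin i
    between = weight_low[:-1] * weight_high[1:] * (mean_low[:-1] - mean_high[1:]) ** 2
    return centers[np.argmax(between)]


# Two well separated groups of values: dark "background" and bright "objects"
pixels = np.concatenate([np.linspace(0.1, 0.3, 50), np.linspace(0.7, 0.9, 50)])
threshold = otsu_threshold_sketch(pixels)
print(f"Threshold found: {threshold:.3f}")
```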

In [None]:
otsu_value = threshold_otsu(coins_original)
otsu_binarized_im = coins_original > otsu_value
plt.imshow(otsu_binarized_im, cmap="gray")
pass

It is not clear that this algorithm improves on the previous one. We could try to find a way
to select the coins after this thresholding, but instead we will introduce more advanced thresholding
algorithms to improve the image before finding the contours.

In the previous notebook, we used a local threshold, computing a value for each
segment of the image (it was cut into 4x4 rectangles).

We will do something similar: create a disk (circle) with a radius of some pixels and find the
local threshold of the pixels that are inside this disk. This is applied to every
pixel of the image, like a convolution.
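
To illustrate the sliding-neighborhood idea with plain numpy, here is a simplified sketch. It is not the algorithm used in this notebook: it assumes a square window instead of a disk and the local mean instead of a local Otsu value, and the names `local_mean_sketch` and `offset` are made up for the example. The synthetic image has uneven lighting (a left-to-right ramp) and two small bright spots.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_mean_sketch(image, radius):
    """Mean of the (2*radius+1) x (2*radius+1) window centered on every pixel."""
    padded = np.pad(image.astype(float), radius, mode="reflect")
    size = 2 * radius + 1
    windows = sliding_window_view(padded, (size, size))
    return windows.mean(axis=(-2, -1))


# Uneven illumination: intensity grows from left (0) to right (100)
image = np.tile(np.linspace(0, 100, 50), (50, 1))
image[10:13, 5:8] += 30     # bright spot in the dark area
image[10:13, 40:43] += 30   # bright spot in the bright area

offset = 10  # hypothetical margin so that the smooth ramp is not detected
global_binarized = image > image.mean()
local_binarized = image > local_mean_sketch(image, radius=5) + offset

print("Pixels kept by the global threshold:", global_binarized.sum())
print("Pixels kept by the local threshold: ", local_binarized.sum())
```

The global threshold keeps the whole bright half of the image and misses the spot in the dark area, while the local one keeps only the two spots.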

Let's keep it simple first. We will crop a small region of the image (55x70 pixels, containing the first coin) and use a disk
with a radius of 10 pixels:

In [None]:
im = img_as_ubyte(coins_original[25:80, 10:80])
radius = 10   # Play changing this value to understand how this works!
d = disk(radius)

fig = plt.figure( figsize = (8, 8))

plt.subplot(131)
plt.imshow(im, cmap="gray")

plt.subplot(132)
w, h = im.shape
pretty_d_im = np.zeros(im.shape)
pretty_d_im[(w//2-radius):(w//2+radius+1), (h//2-radius):(h//2+radius+1)] = d
plt.imshow(pretty_d_im, cmap="gray")

plt.subplot(133)
local_threshold = rank.otsu(im, d)
local_binarized_im = im > local_threshold
plt.imshow(local_binarized_im, cmap="gray")
pass

In [None]:
im = img_as_ubyte(coins_original)
d = disk(50)  # This 50 is cherry picked, could you imagine an algorithm to pick it?

local_threshold = rank.otsu(im, d)
local_binarized_im = im > local_threshold

fig = plt.figure( figsize = (8, 8))
plt.subplot(131)
plt.imshow(im, cmap="gray")
plt.subplot(132)
plt.imshow(local_threshold, cmap="gray")
plt.subplot(133)
plt.imshow(local_binarized_im, cmap="gray")
pass

Hey, I think this is working much better! I will keep this thresholding algorithm
for this example and do some stuff, like cleaning the little black dots inside the
coins.

# Morphology operations

There are some important morphology operations that you should know before you start
working with images.

The first two are `dilation` and `erosion`:

- `dilation` applies a structuring element (like the disk of the previous example) to
  expand the shapes contained in the image. It places the structuring element on
  top of every pixel that is `1` in the original image (similar to a convolution) and sets
  all the covered pixels to `1`, so the resulting image has all the shapes grown at their edges.
  
- `erosion` applies a structuring element to shrink the shapes contained in the
  image. It does the same as `dilation`, but on the background (`0` values).
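
The two definitions above can be sketched with plain numpy. This is only an illustration under some assumptions: the structuring element is symmetric with odd side lengths, pixels outside the image count as background for the dilation and as foreground for the erosion, and the function names are made up.

```python
import numpy as np


def binary_dilation_sketch(image, footprint):
    """A pixel becomes 1 if ANY pixel under the footprint is 1."""
    image = np.asarray(image, dtype=bool)
    footprint = np.asarray(footprint, dtype=bool)
    rows, cols = image.shape
    pad_r, pad_c = footprint.shape[0] // 2, footprint.shape[1] // 2
    padded = np.pad(image, ((pad_r, pad_r), (pad_c, pad_c)), constant_values=False)
    result = np.zeros_like(image)
    for dr, dc in zip(*np.nonzero(footprint)):
        result |= padded[dr:dr + rows, dc:dc + cols]
    return result


def binary_erosion_sketch(image, footprint):
    """A pixel stays 1 only if ALL pixels under the footprint are 1."""
    image = np.asarray(image, dtype=bool)
    footprint = np.asarray(footprint, dtype=bool)
    rows, cols = image.shape
    pad_r, pad_c = footprint.shape[0] // 2, footprint.shape[1] // 2
    padded = np.pad(image, ((pad_r, pad_r), (pad_c, pad_c)), constant_values=True)
    result = np.ones_like(image)
    for dr, dc in zip(*np.nonzero(footprint)):
        result &= padded[dr:dr + rows, dc:dc + cols]
    return result


cross = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=bool)

dot = np.zeros((5, 5), dtype=bool)
dot[2, 2] = True                       # a single white pixel
block = np.zeros((5, 5), dtype=bool)
block[1:4, 1:4] = True                 # a 3x3 white square

grown = binary_dilation_sketch(dot, cross)
shrunk = binary_erosion_sketch(block, cross)
print(grown.astype(int))
print(shrunk.astype(int))
```

The single pixel grows into the shape of the cross, and the 3x3 square shrinks to its central pixel.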
  
We will start with the first coin of the image and do a simple threshold using the mean:

In [None]:
im = img_as_ubyte(coins_original[25:80, 10:80])
threshold = np.mean(im)

plt.subplot(121)
plt.imshow(im, cmap="gray")

plt.subplot(122)
thresholded_im = im > threshold
plt.imshow(thresholded_im, cmap="gray")
pass

We can observe that some pixels are not the same as their neighbors. This is usually called
`Salt & Pepper noise`, and we can reduce it using `dilation` and `erosion`.

Let's apply `dilation` first:

In [None]:
d = disk(3)
dilated_im = binary_dilation(thresholded_im, d)

plt.subplot(121)
plt.imshow(thresholded_im, cmap="gray")
plt.subplot(122)
plt.imshow(dilated_im, cmap="gray")
pass

Cool! It seems that the black pixels inside the coin were removed, but.... oh ooohhhh,
the top right corner pixel just got expanded.

Also, it seems that the original coin grows when the `dilation` is applied.

Let's try `erosion` instead of `dilation`:

In [None]:
d = disk(3)
eroded_im = binary_erosion(thresholded_im, d)

plt.subplot(121)
plt.imshow(thresholded_im, cmap="gray")
plt.subplot(122)
plt.imshow(eroded_im, cmap="gray")
pass

Ok, now we solved the top right corner pixel problem, but the others... got worse!

Also, the coin lost some of its area.

Well, it seems that each algorithm solves one of the problems... could
the other problem be solved by applying the other algorithm afterwards? It seems both are
kind of complementary, aren't they?

In [None]:
d = disk(3)
first_dilated_im = binary_dilation(thresholded_im, d)
first_eroded_im = binary_erosion(thresholded_im, d)

then_eroded_im = binary_erosion(first_dilated_im, d)
then_dilated_im = binary_dilation(first_eroded_im, d)

plt.subplot(221)
plt.imshow(first_dilated_im, cmap="gray")
plt.subplot(222)
plt.imshow(then_eroded_im, cmap="gray")
plt.subplot(223)
plt.imshow(first_eroded_im, cmap="gray")
plt.subplot(224)
plt.imshow(then_dilated_im, cmap="gray")
pass

We are moving closer to the solution, aren't we?

It seems that, if we first dilate and then erode, we remove the black pixels inside
the coin. If we first erode and then dilate, we delete the top right corner pixel.

In both cases, the coin is almost the same size as before and, probably, we can live with the
new approximate size.

Crazy idea: what if we first dilate, then erode, then erode, and then dilate?

In [None]:
d = disk(3)
first_dilated_im = binary_dilation(thresholded_im, d)
then_eroded_im = binary_erosion(first_dilated_im, d)
then_first_eroded_im = binary_erosion(then_eroded_im, d)
then_then_dilated_im = binary_dilation(then_first_eroded_im, d)

plt.subplot(121)
plt.imshow(thresholded_im, cmap="gray")
plt.subplot(122)
plt.imshow(then_then_dilated_im, cmap="gray")
pass

Now, it seems it works!

In fact, the first two operations are commonly named `closing` (the `erosion` of the `dilation`), and the second two operations are commonly named `opening` (the `dilation` of the `erosion`).

And this naming works: `closing` removes the black points inside a blob (the coin)
and `opening` removes the white points in the background.

We can simplify this using the proper functions:

In [None]:
d = disk(3)
closing_im = binary_closing(thresholded_im, d)
opening_im = binary_opening(thresholded_im, d)

# Reuse the same disk, so both steps use the same structuring element
closing_then_opening_im = binary_opening(closing_im, d)
opening_then_closing_im = binary_closing(opening_im, d)

plt.subplot(221)
plt.imshow(closing_im, cmap="gray")
plt.subplot(222)
plt.imshow(opening_im, cmap="gray")
plt.subplot(223)
plt.imshow(closing_then_opening_im, cmap="gray")
plt.subplot(224)
plt.imshow(opening_then_closing_im, cmap="gray")
pass

Why are the results different? It seems that these two operations are not commutative.
Want to discuss this?

>**Note:** did you try to subtract the eroded image from the original one? What do you think
           will happen? And with the dilated image? And between the two of them?

# More morphology operations

There are other morphology operations to remove small holes or small objects. These 
operations are more flexible than `opening` and `closing` because removing a hole or an object
depends on its area, not on the structuring element. Also, they do not change the shape of the 
remaining objects.

In [None]:
without_holes_im = remove_small_holes(thresholded_im)
without_objects_im = remove_small_objects(thresholded_im)
without_them_im = remove_small_objects(without_holes_im)

plt.subplot(221)
plt.imshow(thresholded_im, cmap="gray")
plt.subplot(222)
plt.imshow(without_holes_im, cmap="gray")
plt.subplot(223)
plt.imshow(without_objects_im, cmap="gray")
plt.subplot(224)
plt.imshow(without_them_im, cmap="gray")
pass

# Extra operations

Another operation that could be useful (and sometimes dangerous) is `flood_fill`. Called
with a `seed point`, it fills the image from that point until it reaches
the borders of the region. To use this operation, you must check that there are no holes in the perimeter; otherwise the paint leaks out and fills neighboring regions, or even the whole image!
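
To make the "leaking paint" idea concrete, here is a minimal sketch of a flood fill written with a queue. It assumes that only the four direct neighbors (up, down, left, right) are connected, which may differ from the defaults of the library function, and the name `flood_fill_sketch` is made up.

```python
from collections import deque

import numpy as np


def flood_fill_sketch(image, seed, new_value):
    """Paint every pixel reachable from `seed` through pixels of the same value."""
    filled = image.copy()
    old_value = filled[seed]
    if old_value == new_value:
        return filled

    rows, cols = filled.shape
    filled[seed] = new_value
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Visit the four direct neighbors of the current pixel
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and filled[nr, nc] == old_value:
                filled[nr, nc] = new_value
                queue.append((nr, nc))
    return filled


room = np.zeros((7, 7), dtype=np.uint8)
room[:, 3] = 1                      # a wall splitting the image in two halves
closed_fill = flood_fill_sketch(room, (0, 0), 2)

leaky_room = room.copy()
leaky_room[3, 3] = 0                # a one-pixel hole in the wall
leaky_fill = flood_fill_sketch(leaky_room, (0, 0), 2)

print("Painted pixels with a closed wall:", (closed_fill == 2).sum())
print("Painted pixels with a leaky wall: ", (leaky_fill == 2).sum())
```

With the closed wall only the left half is painted; with the hole, the paint reaches the other half as well.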

This is an example:

In [None]:
simple_im = np.zeros((100, 100), dtype=np.uint8)
simple_im[:, 50] = 1
simple_im[50, :] = 1
simple_im[50, 75] = 0

plt.subplot(131)
plt.imshow(simple_im, cmap="gray", vmin=0, vmax=2)

plt.subplot(132)
filled_im = flood_fill(simple_im, (5, 5), 2)
plt.imshow(filled_im, cmap="gray", vmin=0, vmax=2)


plt.subplot(133)
bad_filled_im = flood_fill(simple_im, (5, 55), 2)
plt.imshow(bad_filled_im, cmap="gray", vmin=0, vmax=2)
pass

Before segmenting the image into different blobs, or contours, let's talk a bit about the last operation: `skeletonization`. This operation reduces the blobs to lines that are a single pixel wide.

To do this example, we will use another image, since the coins are round and a skeleton 
makes no sense for that kind of blob. We will try with something more related to science:
C. elegans!

First, read the image:

In [None]:
celegans_frame_filename = Path.cwd() / 'Data' / 'celegans-frame.png'
celegans_frame = imread(celegans_frame_filename)
plt.imshow(celegans_frame, cmap="gray")
pass

We can remove the white part at the bottom and right of the image. In this case, we will just
crop it using the coordinates (we could use other algorithms to find the frame, want to
try it?)

In [None]:
celegans_frame_cut = celegans_frame[150:1350, 200:1825]
plt.imshow(celegans_frame_cut, cmap="gray")
pass

Also, we can invert the image, since the worms are black and the background is white. We
will also convert it to an array of bytes:

In [None]:
celegans_frame_cut_inv = 1 - celegans_frame_cut
im = img_as_ubyte(celegans_frame_cut_inv)  # Yes, just play with `im`

Now, let's binarize using Otsu's threshold. Note that the plain Otsu value does not properly detect
all the worms, so we decided to subtract 10 from the threshold. Doing this, we detect all the worms, as well as
some noise that we will need to remove:

In [None]:
otsu_value = threshold_otsu(im)
celegans_binarized = im > otsu_value - 10
plt.imshow(celegans_binarized, cmap="gray")
pass

We will remove the small objects using `remove_small_objects`. We will remove the border as well, using one of the 
simplest solutions: applying the `flood_fill` operation at the top left corner.

In [None]:
celegans_binarized = im > otsu_value - 10
celegans_binarized_no_small = remove_small_objects(celegans_binarized)
filled = flood_fill(celegans_binarized_no_small, (0, 0), 0)
plt.imshow(filled, cmap="gray")
pass

And do the `skeletonization`:

In [None]:
skel = skeletonize(filled)

# Prepare a RGB image to show the skel over the object
w, h = filled.shape
final_image = np.zeros((w, h, 3), dtype=np.uint8)
final_image[filled, :] = [64, 64, 64]
final_image[skel, :] = [255, 64, 64]

plt.figure(figsize=(12, 9))
plt.imshow(final_image[500:800, 100:500])
pass

# Segmentation

Finally, let's do the segmentation and label all the worms we find, like we did in the previous notebooks:

In [None]:
fig = plt.figure(figsize=(12, 9))
ax = fig.add_subplot(111)
ax.imshow(filled, cmap="gray", vmin=0, vmax=1)

contours = measure.find_contours(filled)
for n, contour in enumerate(contours):
    ax.plot(contour[:, 1], contour[:, 0], linewidth=1)

Test the different functions in the `skimage.measure` module to find some useful measurements: area, perimeter, 
center of mass...
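
As a possible starting point, this sketch labels a small synthetic image and prints a few properties of each object with `measure.label` and `measure.regionprops`. The image and the eccentricity cut-off of `0.9` are assumptions chosen for the illustration, not values tuned for the worms.

```python
import numpy as np
from skimage import measure

# Synthetic binary image with two objects
blobs = np.zeros((40, 40), dtype=bool)
blobs[5:8, 5:35] = True       # thin, elongated bar
blobs[20:30, 20:30] = True    # compact square

labels = measure.label(blobs)            # give each connected object its own id
regions = measure.regionprops(labels)    # one set of measurements per object

for region in regions:
    print(
        f"label={region.label} area={region.area} "
        f"centroid={region.centroid} eccentricity={region.eccentricity:.3f}"
    )

# Eccentricity is close to 1 for elongated shapes and close to 0 for round ones
elongated = [region for region in regions if region.eccentricity > 0.9]
print("Elongated objects:", [region.label for region in elongated])
```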

In the example, we have some objects that are not worms. Implement an algorithm to filter the objects and keep
only the ones that look like worms.