# Watershed Algorithm (Part 2)

In [None]:
import cv2
import numpy as np
import matplotlib.pyplot as plt

In [None]:
def display(img, cmap='gray'):
    fig = plt.figure(figsize=(12, 10))
    ax = fig.add_subplot(111)
    ax.imshow(img, cmap=cmap)  # matplotlib ignores cmap for 3-channel (color) images

In [None]:
img = cv2.imread('../data/pennies.jpg')

In [None]:
# apply a median blur; the image is large (3000x4000 px), so we need a strong blur (a large kernel)
img_blur = cv2.medianBlur(img, 35)

In [None]:
display(img_blur)

In [None]:
# convert to grayscale
img_gray = cv2.cvtColor(img_blur, cv2.COLOR_BGR2GRAY)

In [None]:
# apply a threshold
ret, img_thresh = cv2.threshold(img_gray, 127, 255, cv2.THRESH_BINARY_INV)

In [None]:
display(img_thresh)

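We can still see some features on the coins (the black areas inside the white coins).
To suppress these isolated features, we'll use [Otsu's method](https://en.wikipedia.org/wiki/Otsu%27s_method) instead of the fixed threshold of 127: it picks the threshold automatically, choosing the value that best separates the two peaks of the grayscale histogram (it maximises the between-class variance).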
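To make the idea concrete, here is a minimal NumPy sketch of what Otsu's method computes: it tries every threshold `t` and keeps the one that maximises the between-class variance of the histogram. The function name `otsu_threshold` and the synthetic two-level image are illustrative assumptions, independent of the notebook's variables; OpenCV's own implementation may differ in details such as tie-breaking, so this is a sketch, not a replacement for `cv2.THRESH_OTSU`.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold of an 8-bit grayscale image (illustrative sketch)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()                  # normalised histogram
    levels = np.arange(256)
    omega = np.cumsum(prob)                   # weight of the class "<= t"
    mu = np.cumsum(prob * levels)             # cumulative mean up to t
    mu_total = mu[-1]                         # global mean intensity
    with np.errstate(divide='ignore', invalid='ignore'):
        # between-class variance for every candidate threshold t
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0      # thresholds that leave one class empty
    return int(np.argmax(sigma_b))

# Synthetic bimodal "image": dark background (50) and bright coins (200)
toy_gray = np.full((100, 100), 50, dtype=np.uint8)
toy_gray[:, 50:] = 200
toy_t = otsu_threshold(toy_gray)  # any t in [50, 200) separates the two classes
```

In the notebook, the `ret` value returned by `cv2.threshold(..., cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)` holds the threshold OpenCV picked, so it can be compared with this sketch on `img_gray`.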

In [None]:
ret, img_thresh = cv2.threshold(img_gray, 0, 255, cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)

In [None]:
display(img_thresh)

In [None]:
# noise removal (optional in this simple use case)
kernel = np.ones((3, 3), np.uint8)

In [None]:
kernel

In [None]:
img_open = cv2.morphologyEx(img_thresh, cv2.MORPH_OPEN, kernel, iterations=2)

In [None]:
display(img_open)
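Opening is an erosion followed by a dilation with the same kernel: erosion deletes any white region too small to contain the kernel, and dilation then regrows the surviving regions to roughly their original size. The NumPy sketch below assumes a 3x3 square kernel like `kernel` above; the helper names `erode`, `dilate` and `opening` are hypothetical, and the toy mask is independent of the notebook's images. It shows an isolated speck disappearing while a larger square is restored. The same dilation, applied to the opened image, is what produces the sure background in the next cell.

```python
import numpy as np

def _shifted_copies(mask, pad_value):
    """Return the 9 shifted copies of `mask` covered by a 3x3 square kernel."""
    p = np.pad(mask, 1, constant_values=pad_value)
    h, w = mask.shape
    return [p[dr:dr + h, dc:dc + w] for dr in range(3) for dc in range(3)]

def erode(mask, iterations=1):
    out = mask.astype(bool)
    for _ in range(iterations):
        # a pixel stays white only if every pixel under the kernel is white
        # (padding with True keeps the image border from eating into shapes)
        out = np.logical_and.reduce(_shifted_copies(out, True))
    return out

def dilate(mask, iterations=1):
    out = mask.astype(bool)
    for _ in range(iterations):
        # a pixel becomes white if any pixel under the kernel is white
        out = np.logical_or.reduce(_shifted_copies(out, False))
    return out

def opening(mask, iterations=1):
    # cv2.MORPH_OPEN with iterations=n erodes n times, then dilates n times
    return dilate(erode(mask, iterations), iterations)

toy_mask = np.zeros((12, 12), dtype=bool)
toy_mask[1:6, 1:6] = True    # a 5x5 "coin"
toy_mask[9, 9] = True        # a one-pixel speck of noise
toy_opened = opening(toy_mask, iterations=2)
toy_sure_bg = dilate(toy_opened, iterations=3)  # grown region: everything outside is surely background
```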

In [None]:
# grab sure background
sure_bg = cv2.dilate(img_open, kernel, iterations=3)

In [None]:
display(sure_bg)

We still have a fundamental problem here: all coins are still connected into a single blob (a single foreground object).

We want to place "seeds" that we are sure lie in the foreground. In our example, we want 6 seeds, one at the center of each coin.

So how can we be sure that the seeds are placed inside the foreground objects? We can use a [distance transform](https://en.wikipedia.org/wiki/Distance_transform). Given a binary image (0s and 255s), the distance transform replaces each white pixel with its distance to the nearest black (zero) pixel, so pixels farther from the black background become brighter.

[Distance Transform](https://homepages.inf.ed.ac.uk/rbf/HIPR2/distance.htm)
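For intuition, a brute-force Euclidean distance transform fits in a few lines of NumPy: for every white pixel, take the distance to the nearest black pixel. This is only a sketch for tiny arrays (it compares every white pixel with every black pixel, so it would be far too slow for a 3000x4000 photo), and the function name `brute_force_distance_transform` is hypothetical. `cv2.distanceTransform` with `cv2.DIST_L2` computes the same quantity efficiently; with a 3x3 or 5x5 mask it is an approximation, so values may differ slightly.

```python
import numpy as np

def brute_force_distance_transform(binary):
    """Euclidean distance from every non-zero pixel to the nearest zero pixel.

    Cost is (#white * #black), so this is only suitable for tiny illustrative arrays.
    """
    white = np.argwhere(binary > 0)
    black = np.argwhere(binary == 0)
    dist = np.zeros(binary.shape, dtype=np.float64)
    # pairwise distances: one row per white pixel, one column per black pixel
    d = np.sqrt(((white[:, None, :] - black[None, :, :]) ** 2).sum(axis=2))
    dist[white[:, 0], white[:, 1]] = d.min(axis=1)
    return dist

# A single filled disc of radius 8 centred in a 21x21 image (one toy "coin")
yy, xx = np.mgrid[:21, :21]
toy_coin = (((yy - 10) ** 2 + (xx - 10) ** 2) <= 8 ** 2).astype(np.uint8) * 255
toy_dist = brute_force_distance_transform(toy_coin)
toy_peak = np.unravel_index(toy_dist.argmax(), toy_dist.shape)  # brightest pixel: the centre
```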

Applied to our image, we can expect that the brightest pixels will be at coin centers and the darkest around coin edges (closest to the black pixels).

If we then threshold the distance-transformed image, we'll get 6 small regions that are certainly within the coins.
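This is why thresholding the distance map separates touching coins: the narrow "neck" where two coins touch lies close to the background, so its distance values are low, while each centre lies far from it. The sketch below uses two overlapping toy discs and a brute-force distance transform (fine for tiny arrays only); the disc sizes are arbitrary assumptions and the 0.7 factor mirrors the notebook's choice. It checks that the neck drops out of the thresholded mask while both centres remain, leaving two separate seeds.

```python
import numpy as np

def brute_force_distance_transform(binary):
    # Euclidean distance from each non-zero pixel to the nearest zero pixel (tiny arrays only)
    white = np.argwhere(binary > 0)
    black = np.argwhere(binary == 0)
    dist = np.zeros(binary.shape, dtype=np.float64)
    d = np.sqrt(((white[:, None, :] - black[None, :, :]) ** 2).sum(axis=2))
    dist[white[:, 0], white[:, 1]] = d.min(axis=1)
    return dist

# Two overlapping discs of radius 8: a toy version of two touching coins
yy, xx = np.mgrid[:25, :39]
toy_coins = ((yy - 12) ** 2 + (xx - 12) ** 2 <= 64) | ((yy - 12) ** 2 + (xx - 26) ** 2 <= 64)
toy_dist2 = brute_force_distance_transform(toy_coins.astype(np.uint8))

# Same rule as cv2.threshold(..., cv2.THRESH_BINARY): keep pixels strictly above 0.7 * max
toy_seeds_mask = toy_dist2 > 0.7 * toy_dist2.max()
```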

In [None]:
# distance transform
img_dist_trans = cv2.distanceTransform(img_open, cv2.DIST_L2, 5)

In [None]:
display(img_dist_trans)

In [None]:
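# apply another threshold; the result is the sure foreground
# (pixels farther than 70% of the maximum distance from the background become 255)
ret, img_thresh_2 = cv2.threshold(img_dist_trans, 0.7 * img_dist_trans.max(), 255, cv2.THRESH_BINARY)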

In [None]:
display(img_thresh_2)

We can be confident that these 6 regions lie in the foreground.

All white (foreground) pixels that are present in `sure_bg` but not in `img_thresh_2` form the "unknown region": pixels that may belong to either the foreground or the background, which the watershed algorithm has to resolve.
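As a set operation, the unknown region is "maybe coin" (sure background mask) and not "surely coin" (sure foreground mask). The toy 1-D scanline below is an illustrative assumption, with names prefixed `toy_` so they don't overwrite the notebook's variables. It also shows why `cv2.subtract` is a good fit: OpenCV arithmetic saturates uint8 results at 0, whereas plain NumPy `a - b` on uint8 wraps around; the sketch emulates the saturation with `np.clip`.

```python
import numpy as np

# Toy 1-D "scanline" through one coin: 255 = white, 0 = black
toy_bg = np.array([0, 255, 255, 255, 255, 255, 255, 255, 0], dtype=np.uint8)  # dilated coin
toy_fg = np.array([0,   0,   0, 255, 255, 255,   0,   0, 0], dtype=np.uint8)  # core from the distance map

# Set view: pixels that might be coin, minus pixels that surely are coin
toy_unknown_mask = (toy_bg == 255) & (toy_fg != 255)

# Saturating subtraction (negative results clip to 0), emulating cv2.subtract on uint8
toy_unknown = np.clip(toy_bg.astype(np.int16) - toy_fg.astype(np.int16), 0, 255).astype(np.uint8)
```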

In [None]:
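# sure foreground: the threshold output is float32 (like the distance map), so convert it to 8-bit
sure_fg = np.uint8(img_thresh_2)
# unknown region = sure background minus sure foreground (cv2.subtract saturates at 0)
unknown = cv2.subtract(sure_bg, sure_fg)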

In [None]:
display(unknown)

These are the pixels whose membership (foreground or background) we are unsure about. Next, we create label markers at the 6 regions in `sure_fg` and use them as the seeds from which the watershed algorithm grows the foreground segments.

[Connected-component labeling](https://en.wikipedia.org/wiki/Connected-component_labeling)
* Subsets of connected components are uniquely labeled based on a given heuristic.
* It is used to detect connected regions in binary digital images.
* Connected-component labeling is not to be confused with segmentation.
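A from-scratch breadth-first labeler shows what connected-component labeling does. This is a hypothetical helper written for illustration, not OpenCV's algorithm; it mirrors the `(retval, labels)` pair of `cv2.connectedComponents`, where the count includes the background label 0, and defaults to 8-connectivity. The toy mask includes one pixel that touches a blob only diagonally, so the result depends on the connectivity.

```python
from collections import deque

import numpy as np

def connected_components(binary, connectivity=8):
    """Label connected non-zero regions 1, 2, ...; background stays 0.

    Returns (num_labels, labels); num_labels counts the background label too.
    """
    if connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    rows, cols = binary.shape
    labels = np.zeros((rows, cols), dtype=np.int32)
    current = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r, c] == 0 or labels[r, c] != 0:
                continue
            current += 1                  # found a new, not-yet-labelled component
            labels[r, c] = current
            queue = deque([(r, c)])
            while queue:                  # breadth-first flood fill of this component
                y, x = queue.popleft()
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny, nx] != 0 and labels[ny, nx] == 0):
                        labels[ny, nx] = current
                        queue.append((ny, nx))
    return current + 1, labels

toy_cc_mask = np.zeros((6, 8), dtype=np.uint8)
toy_cc_mask[0:2, 0:2] = 255
toy_cc_mask[4:6, 5:8] = 255
toy_cc_mask[2, 2] = 255   # touches the first blob only diagonally
n8, lab8 = connected_components(toy_cc_mask, connectivity=8)
n4, lab4 = connected_components(toy_cc_mask, connectivity=4)
```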

[cv::connectedComponents()](https://docs.opencv.org/3.4/d3/dc0/group__imgproc__shape.html#gaedef8c7340499ca391d459122e51bef5)

```
retval, labels = cv.connectedComponents(image[, labels[, connectivity[, ltype]]])
```

In [None]:
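# Creating the markers takes 3 steps.
# Step 1: give every sure-foreground region its own integer label (the background gets 0)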

ret, labels = cv2.connectedComponents(sure_fg)

In [None]:
labels.shape

In [None]:
img.shape

In [None]:
# let's see how many unique regions have been detected
np.unique(labels)

In [None]:
# Looking at the spatial distribution of these values, label 0 is assigned to the background: the first and last values shown belong to the first and last rows of pixels, which are background
labels

In [None]:
# Step 2: add 1 so that the sure background is labeled 1 instead of 0
labels = labels + 1

In [None]:
np.unique(labels)

In [None]:
# let's check that background indeed has label 1 now:
labels

In [None]:
# Step 3: mark the unknown region with 0, which cv2.watershed treats as "not yet labeled" (so only the unknown region is black)
labels[unknown==255] = 0

In [None]:
display(labels)

We now have clearly labeled regions: the sure background in gray (label 1), the unknown region in black (label 0), and 6 sure-foreground markers (labels 2 to 7), which will act as seeds for the watershed algorithm.
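The three marker steps can be checked on toy arrays. The 1-D scanline through two coins below is a hypothetical example, with `toy_` names so it doesn't overwrite the notebook's `labels` or `unknown`. The dtype check matters because `cv2.watershed` expects the markers as a 32-bit single-channel integer image.

```python
import numpy as np

# Toy inputs: a hypothetical 1-D scanline through two coins
toy_cc = np.array([0, 0, 1, 1, 0, 0, 0, 2, 2, 0, 0], dtype=np.int32)             # step 1: connected components
toy_unk = np.array([0, 255, 0, 0, 255, 0, 255, 0, 0, 255, 0], dtype=np.uint8)    # sure_bg minus sure_fg

toy_markers = toy_cc + 1            # step 2: background 0 -> 1, coin seeds 1, 2 -> 2, 3
toy_markers[toy_unk == 255] = 0     # step 3: 0 now means "let watershed decide"
```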

In [None]:
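# apply the watershed algorithm: img must be an 8-bit 3-channel image and labels a 32-bit
# integer image; pixels on the boundaries between regions are set to -1
markers = cv2.watershed(img, labels)
display(markers)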

In [None]:
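# markers is a 32-bit label image, which RETR_CCOMP accepts (OpenCV 4.x returns 2 values here, 3.x returns 3)
contours, hierarchy = cv2.findContours(markers.copy(), cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)

img_contours = img.copy()  # draw on a copy so the original image stays untouched
for i in range(len(contours)):
    if hierarchy[0][i][3] == -1:  # no parent: this is an external contour
        # OpenCV colours are BGR, so (0, 0, 255) is red; thickness = 10
        cv2.drawContours(img_contours, contours, i, (0, 0, 255), 10)

# convert BGR -> RGB so matplotlib shows the true colours
display(cv2.cvtColor(img_contours, cv2.COLOR_BGR2RGB))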

So the watershed algorithm works like this:
* imagine interconnected water pools, each at a different height
* you have buckets of paint in various colours, one bucket per pool
* you dip a brush in one bucket and touch the centre of the first pool with it; the colour spreads until it fills the entire pool
* repeat this for all other buckets and pools, so each pool is painted a different colour
* where two colours meet, a boundary forms (`cv2.watershed` marks these boundary pixels with -1)
* once the regions are painted in different colours, you can draw contours around each of them
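The analogy corresponds to marker-controlled flooding. Below is a minimal priority-flood sketch in the spirit of Meyer's flooding algorithm, written from scratch for illustration (the name `flood_watershed` is hypothetical): unknown pixels are visited in order of increasing elevation (for example image intensity or gradient), and each takes the label of the already-flooded neighbour that reached it first. It is simpler than `cv2.watershed`: it uses 4-connectivity and does not mark boundary pixels with -1.

```python
import heapq

import numpy as np

def flood_watershed(elevation, markers):
    """Simplified marker-controlled watershed by priority flooding (illustrative sketch).

    elevation: 2-D float array, the "terrain" (e.g. grayscale intensity or gradient)
    markers:   2-D int array; > 0 is a seed label, 0 is an unknown pixel to flood,
               negative values are left untouched
    Returns a new label array in which every reachable 0 pixel has a seed label.
    """
    labels = markers.copy()
    rows, cols = labels.shape
    heap = []
    order = 0  # insertion counter: breaks ties so equal heights flood in FIFO order

    def push_unknown_neighbours(r, c):
        nonlocal order
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # 4-connectivity
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr, nc] == 0:
                # candidate: flood (nr, nc) with the label of (r, c)
                heapq.heappush(heap, (float(elevation[nr, nc]), order, nr, nc, int(labels[r, c])))
                order += 1

    # every seed pixel starts spilling into its unknown neighbours
    for r, c in np.argwhere(labels > 0):
        push_unknown_neighbours(r, c)

    while heap:
        _, _, r, c, label = heapq.heappop(heap)
        if labels[r, c] != 0:
            continue  # already flooded from a lower (or earlier) neighbour
        labels[r, c] = label
        push_unknown_neighbours(r, c)
    return labels

# A 1x9 "valley, ridge, valley" profile with one seed at each valley floor
terrain = np.array([[0, 1, 2, 3, 9, 3, 2, 1, 0]], dtype=float)
toy_seeds = np.array([[1, 0, 0, 0, 0, 0, 0, 0, 2]], dtype=np.int32)
basins = flood_watershed(terrain, toy_seeds)
```

Each valley fills with its own label and the two floods meet at the ridge (index 4), which is where `cv2.watershed` would draw its -1 boundary.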