# Semantic segmentation with scikit-learn

> Written by Dr Daniel Buscombe, Northern Arizona University

> Part of a series of notebooks for image recognition and classification using deep convolutional neural networks

This notebook demonstrates some strategies for semantic image segmentation using common machine learning techniques.

## Naïve Bayes classification

Naive Bayes models are a group of extremely fast and simple classification algorithms that are often suitable for very high-dimensional datasets. Because they are so fast and have so few tunable parameters, they are very useful as a quick-and-dirty baseline for a classification problem. This section focuses on an intuitive explanation of how naive Bayes classifiers work, followed by a couple of examples of them in action on some datasets.

## Detection of sand using Naïve Bayes 

In this example, Naïve Bayes classification is employed to detect pixels corresponding to sand in images, based only on each pixel's color.

The training data is an M×N×3 array representing a color training image, and the mask is an M×N binary array representing the sand/non-sand classification.

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from imageio import imread
import s3fs
fs = s3fs.S3FileSystem(anon=True)

Let's read an image in and look at its distributions of red, green and blue values

In [None]:
with fs.open('cdi-workshop/semseg_data/sandbars/RC0307Rf_20131111_1347.JPG', 'rb') as f:
    training_rgb = imread(f)
M, N, _ = training_rgb.shape

In [None]:
bins = np.linspace(0,255,30)

plt.figure(figsize=(15,5))
hist = plt.hist(training_rgb[:,:,0].flatten(), bins=bins, color='r', alpha=0.5)
hist = plt.hist(training_rgb[:,:,1].flatten(), bins=bins, color='g', alpha=0.5)
hist = plt.hist(training_rgb[:,:,2].flatten(), bins=bins, color='b', alpha=0.5)
plt.title('Pixel values', fontsize=8)

An (overly) simplistic approach to finding sand in a given image would be to find some threshold intensity in a certain channel

In [None]:
threshold = 200

mask = np.zeros((M,N))
mask[training_rgb[:,:,0] > threshold] = 1

In [None]:
plt.figure(figsize=(15,15))
plt.subplot(1,2,1)
plt.imshow(training_rgb)
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(mask, cmap=plt.cm.binary_r)
plt.axis('off')

The data consists of M·N three-dimensional vectors (one per pixel) of red, green and blue values

In [None]:
print(np.shape(training_rgb))
data = training_rgb.reshape(M*N, -1)[:,:]
print(np.shape(data))
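As a small illustration of the reshaping above, the sketch below uses a made-up 2×3 image (the array `toy_rgb` is hypothetical, not one of the sandbar images) to show that flattening to one row per pixel and reshaping back loses nothing.

```python
import numpy as np

# Hypothetical 2x3 RGB image holding the values 0..17 (not real data)
toy_rgb = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
rows, cols, _ = toy_rgb.shape

# One row per pixel, one column per color channel
toy_data = toy_rgb.reshape(rows * cols, -1)
print(toy_data.shape)  # (6, 3)

# Reshaping back recovers the original image exactly
restored = toy_data.reshape(rows, cols, 3)
print(np.array_equal(restored, toy_rgb))  # True
```

Pixels are laid out row by row, so row `i * cols + j` of the flattened array is the pixel at image position `(i, j)`.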

The classification used in the learning step is represented as a binary vector of length M·N

In [None]:
target = mask.reshape(M*N)
target

### Training (fitting)

Scikit-learn provides a naive_bayes module containing a GaussianNB class that implements supervised learning with the Gaussian Naïve Bayes method.

One extremely fast way to create a simple model is to assume that the data is described by a Gaussian distribution with no covariance between dimensions. This model can be fit by simply finding the mean and standard deviation of the points within each label, which is all you need to define such a distribution.

In [None]:
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(data, target)
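To make the Gaussian assumption described above concrete, here is a minimal sketch that fits per-class means and variances by hand and compares the result with `GaussianNB`. The synthetic arrays `X_toy` and `y_toy` and the helper `manual_gnb_predict` are hypothetical names introduced only for illustration. scikit-learn also adds a small variance-smoothing term, so agreement is expected to be very close rather than guaranteed to be exact.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Synthetic stand-in for pixel data: two well-separated "color" clusters
X0 = rng.normal(loc=[60, 70, 80], scale=10, size=(500, 3))
X1 = rng.normal(loc=[200, 190, 170], scale=15, size=(500, 3))
X_toy = np.vstack([X0, X1])
y_toy = np.repeat([0, 1], 500)

def manual_gnb_predict(X, X_train, y_train):
    """Hypothetical helper: one independent Gaussian per class and per feature."""
    classes = np.unique(y_train)
    log_post = []
    for c in classes:
        Xc = X_train[y_train == c]
        mu, var = Xc.mean(axis=0), Xc.var(axis=0)   # all the model needs per class
        log_prior = np.log(len(Xc) / len(X_train))
        # sum of per-feature Gaussian log-densities (no covariance terms)
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)
        log_post.append(log_prior + log_lik)
    return classes[np.argmax(np.column_stack(log_post), axis=1)]

manual = manual_gnb_predict(X_toy, X_toy, y_toy)
sk = GaussianNB().fit(X_toy, y_toy).predict(X_toy)
agreement = (manual == sk).mean()
print(agreement)
```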

### Testing

To detect sand in a new image, we first reshape it in the same way as the training image.

The predict method of GaussianNB performs the classification. The resulting classification vector can be reshaped to the original image dimensions for visualization.

Let's test on an image from the same place but taken at a different time

In [None]:
with fs.open('cdi-workshop/semseg_data/sandbars/RC0307Rf_20161106_1157.JPG', 'rb') as f:
    test_rgb = imread(f)
M_tst, N_tst, _ = test_rgb.shape

In [None]:
data = test_rgb.reshape(M_tst * N_tst, -1)[:,:]
sand_pred = gnb.predict(data)
S = sand_pred.reshape(M_tst, N_tst)

In [None]:
plt.figure(figsize=(15,15))
plt.subplot(1,3,1)
plt.imshow(test_rgb)
plt.axis('off')
plt.subplot(1,3,2)
plt.imshow(S, cmap=plt.cm.binary_r)
plt.axis('off')
plt.subplot(1,3,3)
plt.imshow(test_rgb, alpha=0.6)
plt.imshow(S, cmap=plt.cm.binary_r, alpha=0.4)
plt.axis('off')

We can also estimate the probabilities of each class because we have a simple recipe to compute the likelihood $P({\rm features}~|~L_1)$ for any data point, and thus we can quickly compute the posterior ratio and determine which label is the most probable for a given point.

In [None]:
sand_pred = gnb.predict_proba(data)
Sprob = sand_pred.reshape(M_tst, N_tst, -1)
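The relationship between class probabilities and hard labels can be checked on synthetic data. The sketch below is illustrative only (the arrays `X_toy` and `y_toy` are made up): each row of `predict_proba` sums to one, and the predicted label is the class with the largest posterior.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
# Two synthetic classes of 3-channel "pixels"
X_toy = np.vstack([rng.normal(80, 20, size=(300, 3)),
                   rng.normal(180, 20, size=(300, 3))])
y_toy = np.repeat([0, 1], 300)

clf = GaussianNB().fit(X_toy, y_toy)
proba = clf.predict_proba(X_toy)          # shape (n_samples, n_classes)

# Each row is a posterior distribution over the classes
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)

# The hard label is the class with the largest posterior probability
hard = clf.classes_[np.argmax(proba, axis=1)]
match = (hard == clf.predict(X_toy)).mean()

# A stricter rule keeps only the samples the model is very sure about
confident = proba[:, 1] > 0.99
print(rows_sum_to_one, match, confident.sum())
```

Thresholding the probability instead of taking the most probable class trades missed detections for fewer false positives.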

And plot the probabilities:

In [None]:
plt.figure(figsize=(15,15))
plt.subplot(1,2,1)
plt.imshow(Sprob[:,:,1], cmap=plt.cm.bwr)
plt.axis('off')
plt.colorbar(shrink=0.25)
plt.title('Probability of Sand')

In [None]:
plt.figure(figsize=(15,15))
plt.subplot(1,3,1)
plt.imshow(test_rgb)
plt.axis('off')
plt.subplot(1,3,2)
plt.imshow(S, cmap=plt.cm.binary_r)
plt.axis('off')
plt.subplot(1,3,3)
plt.imshow(test_rgb, alpha=0.6)
plt.imshow(Sprob[:,:,1]>.99, cmap=plt.cm.binary_r, alpha=0.4)
plt.axis('off')

Now let's try the classifier on an image taken at a different place

In [None]:
with fs.open('cdi-workshop/semseg_data/sandbars/RC0220Ra_20150219_1126.JPG', 'rb') as f:
    test_rgb = imread(f)

M_tst, N_tst, _ = test_rgb.shape

In [None]:
data = test_rgb.reshape(M_tst * N_tst, -1)[:,:]
sand_pred = gnb.predict(data)
S = sand_pred.reshape(M_tst, N_tst)

In [None]:
plt.figure(figsize=(25,15))
plt.subplot(1,3,1)
plt.imshow(test_rgb)
plt.axis('off')
plt.subplot(1,3,2)
plt.imshow(S, cmap=plt.cm.binary_r)
plt.axis('off')
plt.subplot(1,3,3)
plt.imshow(test_rgb, alpha=0.6)
plt.imshow(S, cmap=plt.cm.binary_r, alpha=0.4)
plt.axis('off')

Not great. Because naive Bayesian classifiers make such stringent assumptions about data, they will generally not perform as well as a more complicated model. 

## Naive Bayes with principal components

We can make things more complicated by
* adding more classes
* using feature extraction

Unlike before, we can build the feature extraction straight into the model using pipelines, which sequentially apply a list of transforms and a final estimator. In our case we'll use PCA as a transform again

In [None]:
from sklearn.decomposition import PCA 
from sklearn.pipeline import make_pipeline

pca = PCA(svd_solver='randomized', n_components=3, whiten=True, random_state=42)
model = make_pipeline(pca, gnb)
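A pipeline behaves like a single estimator: calling `fit` fits each transform in turn and then the final estimator, and calling `predict` applies the fitted transforms before predicting. The following sketch shows this on synthetic data; the names `X_toy`, `y_toy` and `toy_model` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
X_toy = np.vstack([rng.normal(70, 15, size=(200, 3)),
                   rng.normal(190, 15, size=(200, 3))])
y_toy = np.repeat([0, 1], 200)

toy_model = make_pipeline(PCA(n_components=3, whiten=True), GaussianNB())
toy_model.fit(X_toy, y_toy)        # fits PCA, transforms the data, then fits GaussianNB
pred = toy_model.predict(X_toy)    # applies the same fitted PCA before predicting

# make_pipeline names each step after its lowercased class name
step_names = list(toy_model.named_steps)
print(step_names)
```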

We can define several classes by applying arbitrary thresholds to intensity alone

In [None]:
data = training_rgb.reshape(M*N, -1)[:,:]

threshold_sand = 200
threshold_shadow = 60
threshold_rock = 100

mask = np.zeros((M,N))
red = training_rgb[:,:,0]
# inclusive lower bounds, so pixels exactly equal to a threshold are not left at 0
mask[red >= threshold_sand] = 3
mask[red < threshold_shadow] = 0
mask[(red >= threshold_rock) & (red < threshold_sand)] = 1
mask[(red >= threshold_shadow) & (red < threshold_rock)] = 2
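As an alternative sketch of the same idea, `np.digitize` can assign every value to a bin in one call. The values in `red` are made up. A value exactly equal to a threshold goes to the upper bin, which is a choice made for this sketch. The lookup table `bin_to_label` reproduces the label numbers used in this notebook.

```python
import numpy as np

# Hypothetical red-channel values, including the threshold values themselves
red = np.array([10, 60, 80, 100, 150, 200, 250])

# Bin edges in increasing order: shadow, rock and sand thresholds
edges = [60, 100, 200]
bin_index = np.digitize(red, edges)   # 0..3; a value equal to an edge falls in the upper bin

# Map bin order to the label numbers used in this notebook:
# 0 = below the shadow threshold, 2 = shadow to rock, 1 = rock to sand, 3 = above sand
bin_to_label = np.array([0, 2, 1, 3])
toy_mask = bin_to_label[bin_index]
print(toy_mask)  # [0 2 2 1 1 3 3]
```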

In [None]:
plt.figure(figsize=(15,10))
plt.subplot(1,2,1)
plt.imshow(training_rgb)
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(mask, cmap=plt.cm.binary_r)
plt.axis('off')

In [None]:
target = mask.reshape(M*N)
model.fit(data, target)

Apply model to the test image and plot the result:

In [None]:
data = test_rgb.reshape(M_tst * N_tst, -1)[:,:]
sand_pred = model.predict(data)
S = sand_pred.reshape(M_tst, N_tst)

In [None]:
plt.figure(figsize=(25,15))
plt.subplot(1,3,1)
plt.imshow(test_rgb)
plt.axis('off')
plt.subplot(1,3,2)
plt.imshow(S) #, cmap=plt.cm.binary_r)
plt.axis('off')
plt.colorbar(shrink=0.1)
plt.subplot(1,3,3)
plt.imshow(test_rgb, alpha=0.6)
plt.imshow(S==3, cmap=plt.cm.binary_r, alpha=0.4)
plt.axis('off')

In this case, there wasn't much advantage to using feature extraction with the NB model. Let's look at a different model

## Gaussian Mixture Model

Specifying thresholds is a big weakness. There are other approaches that attempt to estimate the decision boundaries between different classes. One example is a Gaussian Mixture Model.

In [None]:
from sklearn.mixture import GaussianMixture

We're going to use downscaled versions of images to speed up the process

In [None]:
# scipy.misc.imresize was removed in SciPy 1.3, so this import needs an older SciPy (with Pillow installed)
from scipy.misc import imresize
with fs.open('cdi-workshop/semseg_data/sandbars/RC0307Rf_20131111_1347.JPG', 'rb') as f:
    training_rgb = imresize(imread(f), .125)

with fs.open('cdi-workshop/semseg_data/sandbars/RC0220Ra_20150219_1126.JPG', 'rb') as f:
    test_rgb = imresize(imread(f), .125)
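The `scipy.misc.imresize` function imported above is not available in SciPy 1.3 and later. One possible substitute, assuming Pillow is installed, is sketched below. The helper name `downscale` is hypothetical, and it is not a drop-in replacement: it expects `uint8` input and does not rescale values to the 0 to 255 range the way `imresize` did for other data types.

```python
import numpy as np
from PIL import Image

def downscale(arr, scale, resample=Image.BILINEAR):
    """Hypothetical helper: shrink a uint8 image array by a fractional scale using Pillow."""
    img = Image.fromarray(arr)
    # Pillow sizes are (width, height), the reverse of NumPy's (rows, cols)
    new_size = (max(1, int(img.width * scale)), max(1, int(img.height * scale)))
    return np.asarray(img.resize(new_size, resample))

# Synthetic 64x80 color image
toy_rgb = np.random.default_rng(3).integers(0, 256, size=(64, 80, 3), dtype=np.uint8)
small = downscale(toy_rgb, 0.125)
print(small.shape)  # (8, 10, 3)

# Nearest-neighbour resampling never invents new values, which suits label images
toy_labels = np.random.default_rng(4).integers(0, 4, size=(64, 80)).astype(np.uint8)
small_labels = downscale(toy_labels, 0.125, resample=Image.NEAREST)
print(np.unique(small_labels))
```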

Fit the model with 4 components to the data

In [None]:
M, N, _ = training_rgb.shape
data = training_rgb.reshape(M*N, -1)[:,:]
gmm = GaussianMixture(n_components=4, covariance_type="tied").fit(data)
labels = gmm.predict(data)

GMMs use an expectation–maximization approach which qualitatively does the following:

Choose starting guesses for the location and shape

Repeat until converged:

* E-step: for each point, find weights encoding the probability of membership in each cluster
* M-step: for each cluster, update its location, normalization, and shape based on all data points, making use of the weights
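The two steps above can be sketched in a few lines of NumPy for a one-dimensional, two-component mixture. This is a simplified illustration on synthetic data, not the scikit-learn implementation: it runs a fixed number of iterations instead of testing for convergence, and the starting guesses are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data drawn from two Gaussians (means -2 and 3)
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 300)])

# Starting guesses for location (mu), shape (sigma) and normalization (weight)
mu = np.array([x.min(), x.max()])
sigma = np.array([1.0, 1.0])
weight = np.array([0.5, 0.5])

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

for _ in range(100):
    # E-step: probability that each point belongs to each component
    dens = weight * gaussian_pdf(x[:, None], mu, sigma)   # shape (n_points, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update location, shape and normalization using those weights
    nk = resp.sum(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    weight = nk / x.size

print(mu, sigma, weight)
```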

We'll only show every 10th data point to save time

In [None]:
plt.scatter(data[::10, 0], data[::10, 1], c=labels[::10], s=5, cmap='viridis');

Apply to the test image and plot

In [None]:
# use the test image's own shape; it need not match the training image
newdata = test_rgb.reshape(-1, test_rgb.shape[-1])
cluster = gmm.predict(newdata)
cluster = cluster.reshape(test_rgb.shape[:2])

In [None]:
plt.figure(figsize=(15,15))
plt.subplot(1,3,1)
plt.imshow(test_rgb)
plt.axis('off')
plt.subplot(1,3,2)
plt.imshow(cluster) #, cmap=plt.cm.binary_r)
plt.axis('off')
plt.colorbar(shrink=0.1)
plt.subplot(1,3,3)
plt.imshow(test_rgb, alpha=0.6)
plt.imshow(cluster==3, cmap=plt.cm.binary_r, alpha=0.4)
plt.axis('off')

Because GMM contains a probabilistic model under the hood, it is also possible to find probabilistic cluster assignments. In Scikit-Learn this is done using the predict_proba method, which returns a matrix of size [n_samples, n_clusters] giving the probability that each point belongs to each cluster:

In [None]:
post_probs = gmm.predict_proba(newdata)
np.shape(post_probs)

In [None]:
plt.figure(figsize=(20,10))
plt.subplot(2,2,1)
# component 3 is assumed to be sand here; GMM component numbering is arbitrary
plt.imshow(post_probs[:,3].reshape(test_rgb.shape[:2]), cmap=plt.cm.bwr)
plt.axis('off')
plt.colorbar(shrink=0.5)
plt.title('Probability of Sand')

### How many components?

The fact that GMM is a generative model gives us a natural means of determining the optimal number of components for a given dataset. A generative model is inherently a probability distribution for the dataset, and so we can simply evaluate the likelihood of the data under the model, using cross-validation to avoid over-fitting. 
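A minimal sketch of evaluating the likelihood on held-out data follows. It uses a single train/test split of synthetic points, which is simpler than full cross-validation, and the data and variable names are invented for illustration. `GaussianMixture.score` returns the mean log-likelihood per sample.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
# Synthetic 2-D data with three well-separated clusters
centers = np.array([[0, 0], [8, 8], [0, 8]])
X_toy = np.vstack([rng.normal(c, 1.0, size=(200, 2)) for c in centers])
X_fit, X_held = train_test_split(X_toy, test_size=0.3, random_state=0)

held_out = {}
for n in range(1, 7):
    g = GaussianMixture(n_components=n, random_state=0).fit(X_fit)
    held_out[n] = g.score(X_held)   # mean log-likelihood of unseen points
    print(n, round(held_out[n], 3))
```

The held-out likelihood should stop improving noticeably once the model has enough components, whereas the likelihood on the fitting data keeps rising as components are added.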

Another means of correcting for over-fitting is to adjust the model likelihoods using some analytic criterion such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). Scikit-Learn's GMM estimator includes built-in methods that compute both of these, so this approach is very easy to apply.

In [None]:
n_components = np.arange(2, 15)
models = [GaussianMixture(n, covariance_type='tied', random_state=0).fit(data)
          for n in n_components]

In [None]:
plt.plot(n_components, [m.bic(data) for m in models], 'k--o', label='BIC')
plt.plot(n_components, [m.aic(data) for m in models], 'r-s', label='AIC')
plt.legend(loc='best')
plt.xlabel('n_components');
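Both criteria penalize the same log-likelihood and differ only in the penalty: AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, where k is the number of free parameters and n the number of samples. The sketch below checks this on synthetic data. It assumes a 'full' covariance model (not the 'tied' one used above) so that k is easy to count by hand.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
centers = np.array([[0, 0], [8, 8], [0, 8]])
X_toy = np.vstack([rng.normal(c, 1.0, size=(200, 2)) for c in centers])
n_samples, n_features = X_toy.shape

bics, aics, gap_matches = [], [], []
for K in range(1, 6):
    g = GaussianMixture(n_components=K, covariance_type='full', random_state=0).fit(X_toy)
    # free parameters: means + full covariance matrices + mixing weights
    k = K * n_features + K * n_features * (n_features + 1) // 2 + (K - 1)
    bics.append(g.bic(X_toy))
    aics.append(g.aic(X_toy))
    # BIC - AIC should equal k * (ln(n) - 2)
    gap_matches.append(np.isclose(bics[-1] - aics[-1], k * (np.log(n_samples) - 2)))

print(np.round(bics, 1))
print(np.round(aics, 1))
print(gap_matches)
```

Because the gap between the two criteria depends only on k and n, the curves can look very similar when the log-likelihood term is much larger than either penalty.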

In this case, the AIC and BIC curves are nearly identical

The optimal number of clusters is the value that minimizes the AIC or BIC. 

The curves suggest that about 9 components would have been a better choice than 4

In [None]:
gmm = GaussianMixture(n_components=9, covariance_type="tied").fit(data)

In [None]:
# use the test image's own shape; it need not match the training image
newdata = test_rgb.reshape(-1, test_rgb.shape[-1])
cluster = gmm.predict(newdata)
cluster = cluster.reshape(test_rgb.shape[:2])

We can look at the means for each cluster

In [None]:
print(gmm.means_[1])

In [None]:
print(gmm.means_[3])

In [None]:
sorted_means = np.argsort(np.mean(gmm.means_, axis=1))
sorted_means
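Because the numbering of GMM components is arbitrary, sorting by mean brightness gives a more repeatable way to refer to them. The sketch below uses made-up component means and labels (`toy_means`, `toy_labels`) to show how the `argsort` result can be turned into a relabelling in which 0 is the darkest component.

```python
import numpy as np

# Hypothetical component means (rows = components, columns = R, G, B)
toy_means = np.array([[200., 190., 170.],
                      [ 30.,  35.,  40.],
                      [120., 110., 100.]])
toy_labels = np.array([0, 1, 2, 2, 0, 1])

# Components ordered from darkest to brightest
order = np.argsort(toy_means.mean(axis=1))
print(order)  # [1 2 0]

# rank[c] is the brightness rank of component c
rank = np.empty_like(order)
rank[order] = np.arange(len(order))
relabelled = rank[toy_labels]
print(relabelled)  # [2 0 1 1 2 0]
```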

In [None]:
plt.figure(figsize=(20,10))
plt.subplot(1,3,1)
plt.imshow(test_rgb)
plt.axis('off')
plt.subplot(1,3,2)
plt.imshow(cluster) 
plt.axis('off')
plt.colorbar(shrink=0.15)
plt.subplot(1,3,3)
plt.imshow(test_rgb, alpha=0.6)
plt.imshow((cluster==sorted_means[-1])  + (cluster==sorted_means[-2]), cmap=plt.cm.binary_r, alpha=0.4)
plt.axis('off')

Looks like one cluster represents lower beach and another cluster represents upper beach

What choices did we make to arrive at this result?
* which cluster corresponds to what feature
* number of components
* type of covariance

How well does this approach generalize to all sandbar images?

## Exercises

1. Training a classifier with ground truth imagery

We'll use the seabright dataset for this task. We have images and associated ground-truth labelled imagery

First, we'll load in the images (downsizing them to aid with speed)

In [None]:
images = [f for f in fs.ls('cdi-workshop/semseg_data/seabright/train') if f.endswith('.jpg')]
len(images)

Xtrain = []
for file in images:
    with fs.open(file, 'rb') as f:
        Xtrain.append(imresize(imread(f), .125))

Next we'll load the label images, contained in .mat files

We need to rescale these data the same way as we rescaled the imagery, but this time we also need to make sure that the label images remain whole integers

In [None]:
from scipy.io import loadmat

classfiles = [f for f in fs.ls('cdi-workshop/semseg_data/seabright/train/gt') if f.endswith('.mat')]

ytrain = []
for file in classfiles:
    with fs.open(file) as f:
        dat = loadmat(f)['class']
        datr = np.round(imresize(dat, .125, interp='nearest')/255 * np.max(dat))
        ytrain.append(datr)

Always a good idea to plot some data just to check it is as you expect

In [None]:
plt.figure(figsize=(15,10))
plt.subplot(121); plt.imshow(Xtrain[0]); plt.axis('off')
plt.subplot(122); plt.imshow(ytrain[0]); plt.axis('off')

In [None]:
num_images, M, N, num_channels = np.shape(Xtrain)

Ok, go ahead and fit a Naive Bayes model to the first pair of images

Using that model, predict the labels for the next image in the sequence

Make a 4-part plot showing 
1. the original image 
2. the original label image
3. the model-predicted label image
4. the pixels associated with water

How well does it do?
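One possible way to put a number on "how well", assuming the predicted and ground-truth label images have been flattened to 1-D vectors, is to compute overall accuracy and a confusion matrix with `sklearn.metrics`. The label vectors below are made up purely to show the calls.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical flattened label vectors (3 classes)
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

acc = accuracy_score(y_true, y_pred)      # fraction of pixels labelled correctly
cm = confusion_matrix(y_true, y_pred)     # rows = true class, columns = predicted class
print(acc)
print(cm)
```

The off-diagonal entries of the confusion matrix show which classes are mistaken for one another, which overall accuracy alone hides.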

In order to fit a model to the entire dataset, we need to arrange the data a little differently

The model fitting function expects the image data to be arranged as N_samples x N_features (one row per pixel, one column for each of the 3 color channels)

and the label data to be a vector of length

N_samples

In [None]:
X = []
for item in Xtrain:
    X.append(item.reshape(M*N, -1)[:,:])

Y = []
for item in ytrain:
    Y.append(item.reshape(M*N))

Xtrain2 = np.vstack(X)
ytrain2 = np.hstack(Y)

In [None]:
np.shape(ytrain2)

In [None]:
np.shape(Xtrain2)

Create a new model and fit to this data

Apply the model to an image (you choose) and make a plot as before, exploring how well the model predicts the various land-cover classes