# Semantic segmentation with scikit-learn

> Written by Dr Daniel Buscombe, Northern Arizona University

> Part of a series of notebooks for image recognition and classification using deep convolutional neural networks

This notebook demonstrates some strategies for semantic image segmentation using common machine learning techniques.

## Naïve Bayes classification

Naive Bayes models are a group of extremely fast and simple classification algorithms that are often suitable for very high-dimensional datasets. Because they are so fast and have so few tunable parameters, they end up being very useful as a quick-and-dirty baseline for a classification problem. This section will focus on an intuitive explanation of how naive Bayes classifiers work, followed by a couple of examples of them in action on some datasets.

## Detection of sand using Naïve Bayes 

In this example, Naïve Bayes classification is used to detect pixels corresponding to sand in images, based only on pixel color.

The training data are an M×N×3 array representing a color training image, and the mask is an M×N binary array giving the sand/non-sand class of each pixel.
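
As a minimal sketch of how such arrays are laid out for a classifier, the cell below flattens a small image and mask into one row per pixel. The array sizes and random values are hypothetical stand-ins, not the notebook's data.

```python
import numpy as np

# Hypothetical 4x5 colour image and matching binary mask (random stand-ins)
rng = np.random.default_rng(0)
M, N = 4, 5
image = rng.integers(0, 256, size=(M, N, 3), dtype=np.uint8)
mask = rng.integers(0, 2, size=(M, N))

# One row per pixel, one column per colour channel
X = image.reshape(M * N, 3)
# One label per pixel, in the same (row-major) order as X
y = mask.reshape(M * N)

# Pixel (i, j) ends up at row i * N + j
i, j = 2, 3
same_pixel = np.array_equal(X[i * N + j], image[i, j])
print(X.shape, y.shape, same_pixel)
```

Because both arrays are flattened in the same order, a vector of predictions can later be reshaped back to (M, N) for display.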

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from imageio import imread

### Dataset

[Hoonhout et al 2015](https://www.sciencedirect.com/science/article/pii/S0378383915001313) "An automated method for semantic classification of regions in coastal images" Coastal Engineering 105, 1-12

Objective: develop an ML classifier for semantic segmentation of images of coasts

The full data repository is available [here](https://data.4tu.nl/repository/uuid:08400507-4731-4cb2-a7ec-9ed2937db119). It consists of several images from stationary cameras monitoring the Dutch coast

For each image there is an associated 'segments' and 'classes' file. I've included just one example image and its associated classes in the /data folder

In [None]:
import pickle ## pickle serializes Python objects into a byte stream, so they can be saved to disk and loaded back later
## pickled files usually have .p or .pkl extensions
## (cPickle was the faster Python 2 module; in Python 3, pickle uses the C implementation automatically)

#load classes in
infile = 'data/1369558802.Sun.May.26_09_00_02.GMT.2013.jvspeijk.c2.snap.classes.pkl'

We need to read in two files to create a label image: the 'classes' file and the 'segments' file.

The classes are strings naming features, so we'll recode the class strings ('sky', etc) into numeric codes
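
A small sketch of this recoding, using `np.unique` with `return_inverse=True` on a hypothetical list of class names (the names below are made up for illustration):

```python
import numpy as np

# Hypothetical per-segment class names
classes = ['sky', 'sand', 'water', 'sand', 'sky']

# names: sorted unique class names; codes: index of each entry within names
names, codes = np.unique(classes, return_inverse=True)
print(names)   # ['sand' 'sky' 'water']
print(codes)   # [1 0 2 0 1]

# The mapping is reversible: indexing names with codes recovers the strings
recovered = names[codes]
```

Codes follow the alphabetical order of the class names, which is why the colorbar labels can be set from `np.unique(classes)` later on.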

In [None]:
def read_classes(infile):
    with open(infile, 'rb') as f:
        classes = pickle.load(f)  # the process for un-pickling a file
    #recast string classes to numeric codes   
    codes = np.unique(classes, return_inverse=True)[1].tolist()        
    return classes, codes

In [None]:
classes, codes = read_classes(infile)
print(len(classes))
print(np.unique(classes))

In the 'segments' file there should be 588 image segments (superpixels) each corresponding to the class in the 'classes' vector. 

Next we'll load in the 'segments' file. This binary file is encoded a little differently

We'll need to make a label image by allocating numeric class codes to each segment
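
One way to allocate codes to segments is a lookup table: index the vector of codes with the segment map. The sketch below assumes segment IDs run from 0 to n-1 and uses a tiny hypothetical segment map. A lookup also avoids a pitfall of relabelling in place one ID at a time, where a pixel already converted to a class code can be converted again if that code equals a later segment ID.

```python
import numpy as np

# Hypothetical 3x4 superpixel map with segment IDs 0..3
segments = np.array([[0, 0, 1, 1],
                     [2, 2, 1, 1],
                     [2, 3, 3, 3]])
# Hypothetical class code for each segment ID
codes = np.array([2, 0, 1, 2])

# Fancy indexing: each pixel's segment ID is replaced by that segment's code
label_image = codes[segments]
print(label_image)
```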

In [None]:
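def make_label_image(infile, codes):
    with open(infile, 'rb') as f:
        # this file needs the 'latin1' encoding to un-pickle
        segments = pickle.load(f, encoding='latin1')

    # make label image out of segments by looking up each pixel's segment ID in codes
    # (assumes segment IDs run from 0 to len(codes)-1; relabelling in place, one ID at
    # a time, can re-map pixels whenever a class code equals a later segment ID)
    segments = np.asarray(segments).astype(int)
    return np.asarray(codes)[segments]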

In [None]:
#load in image segments   (superpixels)
infile = infile.replace('classes', 'segments')
segments = make_label_image(infile, codes)

Let's plot the segments matrix to see what we're working with

In [None]:
plt.figure(figsize=(10,10))
plt.imshow(segments)
plt.colorbar(shrink=0.5)

Let's display that matrix on top of the image

In [None]:
#load in image 
infile = infile.replace('.segments.pkl', '.jpg')
training_rgb = imread(infile)

In [None]:
print(np.shape(segments))
print(np.shape(training_rgb))
nrows_mask = np.shape(segments)[0]

It looks like there is a problem in the input data set: the label image has fewer rows than the input image, so we crop the image to match

In [None]:
training_rgb = training_rgb[:nrows_mask,:,:]
print(np.shape(training_rgb))

In [None]:
cmap = plt.get_cmap('RdYlBu', 6)    # 6 discrete colors (plt.cm.get_cmap was removed in recent matplotlib)

In [None]:
plt.figure(figsize=(10,10))
plt.imshow(training_rgb)
plt.imshow(segments, alpha=0.5, cmap=cmap)  # alpha is the opacity of an image alpha = 1 is totally opaque, alpha = 0 is transparent 
cb = plt.colorbar(shrink=0.5)
cb.set_ticks(np.arange(6))
cb.set_ticklabels(np.unique(classes))

In [None]:
## 2 is associated with sand (I can see that from the colorbar above, remember, python indexes from zero)
mask = (segments==2).astype('int')
# in this case, the 'mask' is going to be a binary matrix with zeros for non-sand and ones for sand pixels

In [None]:
plt.figure(figsize=(15,15))
plt.subplot(1,2,1)
plt.imshow(training_rgb)
plt.title('Input')
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(mask, cmap=plt.cm.binary_r)
plt.title('Sand mask')
plt.axis('off')

In [None]:
M, N, _ = np.shape(training_rgb)

It's generally good practice to scale imagery so classifiers are relatively insensitive to brightness and contrast in training imagery

We'll standardize imagery so it has zero mean and unit variance
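
A short sketch of what standardization does, on a hypothetical random band. Note that `preprocessing.scale` standardizes each column separately by default (`axis=0`), so a 2-D image band is scaled column by column. The single-mean version is shown only for comparison and is an illustrative alternative, not the notebook's method.

```python
import numpy as np
from sklearn import preprocessing

rng = np.random.default_rng(0)
band = rng.integers(0, 256, size=(6, 4)).astype(float)  # hypothetical single band

# preprocessing.scale standardizes each column (axis=0) independently by default
per_column = preprocessing.scale(band)

# Standardizing with one mean and one standard deviation for the whole band
whole_band = (band - band.mean()) / band.std()

print(per_column.mean(axis=0), per_column.std(axis=0))
print(whole_band.mean(), whole_band.std())
```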

In [None]:
from sklearn import preprocessing
training_rgb_scaled = preprocessing.scale(training_rgb[:,:,0])

plt.figure(figsize=(15,15))
plt.subplot(1,2,1)
plt.imshow(training_rgb[:,:,0])
plt.title('Input')
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(training_rgb_scaled)
plt.title('Standardized')
plt.axis('off')

In [None]:
data_train = training_rgb_scaled.reshape(M*N, -1)[:,:]
print(np.shape(data_train))

The classification used in the learning step is represented as a binary vector of length M×N

In [None]:
target = mask.reshape(M*N)
target

### Training (fitting)

Scikit-learn's naive_bayes module provides the GaussianNB class, which implements supervised learning with the Gaussian Naïve Bayes method. 

One extremely fast way to create a simple model is to assume that the data is described by a Gaussian distribution with no covariance between dimensions. This model can be fit by simply finding the mean and standard deviation of the points within each label, which is all you need to define such a distribution.
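
To make this concrete, here is a rough sketch that "fits" such a model by hand on hypothetical 1-D data (per-class mean, variance and frequency) and checks that its predictions agree with `GaussianNB`. The data and the helper `log_joint` are illustrative assumptions, not part of the notebook.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Hypothetical 1-D brightness feature for two classes
X = np.concatenate([rng.normal(-1.0, 0.5, 300), rng.normal(1.5, 1.0, 100)]).reshape(-1, 1)
y = np.concatenate([np.zeros(300, dtype=int), np.ones(100, dtype=int)])

# "Fitting" amounts to per-class means, variances and class frequencies
means = np.array([X[y == k].mean() for k in (0, 1)])
variances = np.array([X[y == k].var() for k in (0, 1)])
priors = np.bincount(y) / y.size

def log_joint(x):
    """log prior + log Gaussian likelihood for each class, shape (n, 2)."""
    return (np.log(priors)
            - 0.5 * np.log(2 * np.pi * variances)
            - 0.5 * (x - means) ** 2 / variances)

by_hand = log_joint(X).argmax(axis=1)
by_sklearn = GaussianNB().fit(X, y).predict(X)
agreement = np.mean(by_hand == by_sklearn)
print(agreement)
```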

In [None]:
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(data_train, target)

In [None]:
gnb.class_prior_

The priors are just the relative abundances of each class (no sand and sand)
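
As a quick check of that statement, the sketch below fits `GaussianNB` to hypothetical labels with a 75/25 split and compares `class_prior_` with the class frequencies:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
# Hypothetical labels: 0 = no sand (75%), 1 = sand (25%)
y = np.array([0] * 150 + [1] * 50)
X = rng.normal(loc=y, scale=1.0).reshape(-1, 1)

gnb_demo = GaussianNB().fit(X, y)
relative_abundance = np.bincount(y) / y.size

print(relative_abundance)     # [0.75 0.25]
print(gnb_demo.class_prior_)  # [0.75 0.25]
```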

### Testing

Sand detection can be performed on a new image by reshaping and slicing it in the same way as the training image. 

The predict method of GaussianNB performs the classification. The resulting classification vector can be reshaped to the original image dimensions for visualization.

Let's test on an image from the same place but at a different time

In [None]:
infile = 'data/1367407802.Wed.May.01_11_30_02.GMT.2013.jvspeijk.c2.snap.jpg' 
test_rgb = imread(infile)
# remember, nrows_mask is the number of rows in the mask, so we are cropping the image to have the same number of rows as the mask
test_rgb = test_rgb[:nrows_mask,:,:]
test_rgb = preprocessing.scale(test_rgb[:,:,0]) # the slice [:,:,0] selects the first (red) color channel, as we did for the training image
# the green channel would be test_rgb[:,:,1] and the blue channel test_rgb[:,:,2] (python indexes from zero)
M_tst, N_tst = test_rgb.shape

Next we'll read in the classes and segments and make a label image like we did before

In [None]:
infile = infile.replace('snap.jpg', 'snap.classes.pkl')
classes, codes = read_classes(infile)

In [None]:
infile = infile.replace('classes', 'segments')
print(infile)
mask = make_label_image(infile, codes)==2

Reshape the input data

In [None]:
data_test = test_rgb.reshape(M_tst * N_tst, -1)[:,:]

And run the model we trained on the first image

In [None]:
sand_pred = gnb.predict(data_test)
S = sand_pred.reshape(M_tst, N_tst)

In [None]:
plt.figure(figsize=(15,15))
plt.subplot(1,3,1)
plt.imshow(test_rgb)
plt.title('New Image')
plt.axis('off')
plt.subplot(1,3,2)
plt.imshow(S, cmap=plt.cm.binary_r)
plt.title('Model Prediction')
plt.axis('off')
plt.subplot(1,3,3)
plt.imshow(mask, cmap=plt.cm.binary_r)
plt.title('Target')
plt.axis('off')

We can also estimate the probabilities of each class because we have a simple recipe to compute the likelihood $P({\rm features}~|~L_1)$ for any data point, and thus we can quickly compute the posterior ratio and determine which label is the most probable for a given point.
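
A worked sketch of that recipe for a single feature value, using Bayes' rule with Gaussian likelihoods. The priors, means and standard deviations below are hypothetical values chosen for illustration, not parameters fitted in this notebook.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical fitted parameters for two classes (not taken from the notebook's data)
priors = np.array([0.8, 0.2])      # P(L_0), P(L_1)
means = np.array([0.0, 2.0])
stds = np.array([1.0, 1.0])

x = 1.0  # a feature value halfway between the two class means

likelihoods = norm.pdf(x, loc=means, scale=stds)   # P(x | L_k)
joint = priors * likelihoods                        # P(x | L_k) P(L_k)
posterior = joint / joint.sum()                     # P(L_k | x)
ratio = posterior[0] / posterior[1]                 # posterior ratio

print(posterior)  # approximately [0.8 0.2]
print(ratio)
```

Here the two likelihoods are equal, so the posterior simply reproduces the priors. Away from the midpoint the likelihood term shifts the balance.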

In [None]:
sand_pred = gnb.predict_proba(data_test)
Sprob = sand_pred.reshape(M_tst, N_tst, -1)

And plot the probabilities:

In [None]:
plt.figure(figsize=(15,15))
plt.subplot(1,2,1)
plt.imshow(Sprob[:,:,1], cmap=plt.cm.bwr, vmax=1)
plt.axis('off')
plt.colorbar(shrink=0.25)
plt.title('Probability of Sand')

In [None]:
plt.figure(figsize=(15,15))
plt.subplot(1,3,1)
plt.imshow(test_rgb)
plt.title('Test Image')
plt.axis('off')
plt.subplot(1,3,2)
#plt.imshow(test_rgb, alpha=0.6)
plt.imshow(Sprob[:,:,1]>.3, cmap=plt.cm.binary_r)
plt.title('Prob. (Sand) > 0.3')
plt.axis('off')
plt.subplot(1,3,3)
plt.imshow(mask, cmap=plt.cm.binary_r)
plt.title('Target')
plt.axis('off')

Because naive Bayesian classifiers make such stringent assumptions about data, they will generally not perform as well as a more complicated model. 

What if we adjust the priors? Say, that there is a 50% prior probability of sand pixels
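
The effect of the prior can be sketched on hypothetical imbalanced data: with the same likelihoods, raising the prior for a class can only raise its posterior probability.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)
# Hypothetical imbalanced training set: 10% "sand" (class 1)
y = np.array([0] * 900 + [1] * 100)
X = rng.normal(loc=2.0 * y, scale=1.0).reshape(-1, 1)

# Default: priors estimated from class frequencies (0.9 and 0.1)
p_empirical = GaussianNB().fit(X, y).predict_proba(X)[:, 1]
# Equal priors supplied by hand
p_equal = GaussianNB(priors=[0.5, 0.5]).fit(X, y).predict_proba(X)[:, 1]

print(p_empirical.mean(), p_equal.mean())
```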

In [None]:
gnb = GaussianNB(priors=[0.5,0.5])
gnb.fit(data_train, target)

In [None]:
sand_pred = gnb.predict_proba(data_test)
Sprob = sand_pred.reshape(M_tst, N_tst, -1)

In [None]:
plt.figure(figsize=(15,15))
plt.subplot(1,2,1)
plt.imshow(Sprob[:,:,1], cmap=plt.cm.bwr, vmax=1)
plt.axis('off')
plt.colorbar(shrink=0.25)
plt.title('Probability of Sand')

Much higher probability of sand, as expected

In [None]:
plt.figure(figsize=(15,15))
plt.subplot(1,3,1)
plt.imshow(test_rgb)
plt.title('Test Image')
plt.axis('off')
plt.subplot(1,3,2)
#plt.imshow(test_rgb, alpha=0.6)
plt.imshow(Sprob[:,:,1]>.6, cmap=plt.cm.binary_r)
plt.title('Prob. (Sand) > 0.6')
plt.axis('off')
plt.subplot(1,3,3)
plt.imshow(mask, cmap=plt.cm.binary_r)
plt.title('Target')
plt.axis('off')

## Naive Bayes with principal components

We can make things more complicated by
* adding more classes
* using feature extraction

Unlike before, we can build the feature extraction straight into the model using pipelines, which sequentially apply a list of transforms and a final estimator. In our case we'll use PCA as the transform
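
A minimal sketch of how such a pipeline behaves, on hypothetical RGB pixels from a dark and a bright class (the data are made up for illustration): `fit` runs the PCA and then fits the classifier on its output, and `predict` re-applies the fitted PCA before classifying.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
# Hypothetical RGB pixels: dark class 0 and bright class 1
y = np.array([0] * 200 + [1] * 200)
X = rng.normal(loc=np.where(y[:, None] == 1, 180.0, 60.0), scale=20.0, size=(400, 3))

# fit() runs PCA.fit_transform, then fits GaussianNB on the single component
demo_model = make_pipeline(PCA(n_components=1, whiten=True, random_state=42), GaussianNB())
demo_model.fit(X, y)

# predict() re-applies the fitted PCA transform before classifying
pred = demo_model.predict(X)
accuracy = (pred == y).mean()
print(accuracy)
```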

In [None]:
from sklearn.decomposition import PCA 
from sklearn.pipeline import make_pipeline

gnb = GaussianNB()

pca = PCA(svd_solver='randomized', n_components=1, whiten=True, random_state=42) 
model = make_pipeline(pca, gnb)

In [None]:
plt.figure(figsize=(15,10))
plt.subplot(1,2,1)
plt.imshow(training_rgb)  # segments labels the training image, so show that image alongside it
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(segments, cmap=plt.cm.binary_r)
plt.axis('off')

In [None]:
target = segments.reshape(M*N)
model.fit(data_train, target)  # segments labels the training image, so fit on the training pixels

In [None]:
sand_pred = model.predict(data_train)
S = sand_pred.reshape(M, N)

In [None]:
plt.figure(figsize=(25,15))
plt.subplot(1,2,1)
plt.imshow(segments==2, plt.cm.binary_r)
plt.axis('off')
plt.title('Sand')
plt.subplot(1,2,2)
plt.imshow(S==2, plt.cm.binary_r)
plt.axis('off')
plt.title('Sand prediction')

In this case, there wasn't much advantage using feature extraction with the NB model. Let's look at a different model

## Gaussian Mixture Model

There are other approaches that do not need labels at all. One example is a Gaussian Mixture Model (GMM), which models the data as a mixture of Gaussian components and assigns each point to its most probable component.

![](figs/gmm.png)

In [None]:
from sklearn.mixture import GaussianMixture

We're going to use downscaled versions of images to speed up the process

In [None]:
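from PIL import Image  # scipy.misc.imresize was removed in SciPy 1.3, so define a stand-in

def imresize(arr, scale, interp='bilinear'):
    """Resize by a fractional scale; like SciPy's version, non-uint8 input is stretched to 0-255."""
    arr = np.asarray(arr)
    if arr.dtype != np.uint8:
        lo, hi = float(arr.min()), float(arr.max())
        arr = ((arr - lo) * 255.0 / max(hi - lo, 1e-12) + 0.5).astype(np.uint8)
    size = (int(arr.shape[1] * scale), int(arr.shape[0] * scale))  # PIL expects (width, height)
    resample = {'nearest': Image.NEAREST, 'bilinear': Image.BILINEAR}[interp]
    return np.asarray(Image.fromarray(arr).resize(size, resample))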

infile = 'data/1367407802.Wed.May.01_11_30_02.GMT.2013.jvspeijk.c2.snap.jpg' 
test_rgb = imread(infile)

infile = 'data/1369558802.Sun.May.26_09_00_02.GMT.2013.jvspeijk.c2.snap.jpg'
training_rgb = imread(infile)

training_rgb = imresize(training_rgb, .125)
test_rgb = imresize(test_rgb, .125)

In [None]:
plt.figure(figsize=(15,15))
plt.subplot(1,2,1)
plt.imshow(training_rgb)
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(test_rgb)
plt.axis('off')

Fit the model with 4 components to the data

In [None]:
M, N, _ = training_rgb.shape
data = training_rgb.reshape(M*N, -1)[:,:]
gmm = GaussianMixture(n_components=4, covariance_type="tied").fit(data)
labels = gmm.predict(data)

GMMs use an expectation–maximization approach which qualitatively does the following:

Choose starting guesses for the location and shape

Repeat until converged:

* E-step: for each point, find weights encoding the probability of membership in each cluster
* M-step: for each cluster, update its location, normalization, and shape based on all data points, making use of the weights
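
The steps above can be sketched in a few lines of NumPy for a 1-D, two-component mixture. This is an illustrative toy on hypothetical data, not scikit-learn's implementation: it runs a fixed number of iterations instead of testing for convergence.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical 1-D data drawn from two well-separated Gaussians
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(6.0, 1.0, 500)])

# Starting guesses for location (mu), shape (var) and normalization (w)
mu = np.array([x.min(), x.max()])
var = np.array([x.var(), x.var()])
w = np.array([0.5, 0.5])

for _ in range(100):
    # E-step: responsibility of each component for each point, shape (n, 2)
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted updates of normalization, location and shape
    nk = resp.sum(axis=0)
    w = nk / x.size
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print(np.round(mu, 1), np.round(w, 2))
```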

We'll only show every 10th data point to save time

In [None]:
plt.scatter(data[::10, 0], data[::10, 1], c=labels[::10], s=5, cmap='viridis');

Apply to the test image and plot

In [None]:
newdata = test_rgb.reshape(M * N, -1)[:,:]
cluster = gmm.predict(newdata)
cluster = cluster.reshape(M, N)

In [None]:
plt.figure(figsize=(15,15))
plt.subplot(1,3,1)
plt.imshow(test_rgb)
plt.axis('off')
plt.subplot(1,3,2)
plt.imshow(cluster) #, cmap=plt.cm.binary_r)
plt.axis('off')
plt.colorbar(shrink=0.1)
plt.subplot(1,3,3)
plt.imshow(test_rgb, alpha=0.6)
plt.imshow(cluster==1, cmap=plt.cm.binary_r, alpha=0.4)
plt.axis('off')

Because GMM contains a probabilistic model under the hood, it is also possible to find probabilistic cluster assignments—in Scikit-Learn this is done using the predict_proba method. This returns a matrix of size [n_samples, n_clusters] which measures the probability that any point belongs to the given cluster:
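
A small sketch of these probabilistic assignments on hypothetical RGB-like pixels (the colour centres are made up): each row of the returned matrix sums to one, and the hard label from `predict` is the column with the highest probability.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
# Hypothetical RGB-like pixels from three colour groups
centres = np.array([[40.0, 60.0, 90.0], [120.0, 110.0, 100.0], [200.0, 190.0, 170.0]])
X = np.vstack([rng.normal(c, 10.0, size=(200, 3)) for c in centres])

demo_gmm = GaussianMixture(n_components=3, covariance_type='tied', random_state=0).fit(X)
probs = demo_gmm.predict_proba(X)   # shape (n_samples, n_clusters)
hard = demo_gmm.predict(X)          # most probable component per point

print(probs.shape)  # (600, 3)
```

The component numbering is arbitrary, so which column corresponds to a feature such as sand has to be worked out by inspection.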

In [None]:
post_probs = gmm.predict_proba(newdata)
np.shape(post_probs)

In [None]:
plt.figure(figsize=(20,10))
plt.subplot(2,2,1)
plt.imshow(post_probs[:,1].reshape(M, N), cmap=plt.cm.bwr)
plt.axis('off')
plt.colorbar(shrink=0.5)
plt.title('Probability of cluster 1 (assumed to be sand)')

### How many components?

The fact that GMM is a generative model gives us a natural means of determining the optimal number of components for a given dataset. A generative model is inherently a probability distribution for the dataset, and so we can simply evaluate the likelihood of the data under the model, using cross-validation to avoid over-fitting. 

Another means of correcting for over-fitting is to adjust the model likelihoods using some analytic criterion such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). Scikit-Learn's GMM estimator includes built-in methods that compute both of these, so this approach is very easy to apply.
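
For reference, both criteria penalize the total log-likelihood $\ln L$ by the number of free parameters $k$: AIC $= 2k - 2\ln L$ and BIC $= k\ln n - 2\ln L$ for $n$ samples. The sketch below computes them by hand for a hypothetical 1-D, two-component mixture, where $k = 5$ by direct counting, and compares with the built-in methods.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
# Hypothetical 1-D data from two Gaussians
X = np.concatenate([rng.normal(0.0, 1.0, 400), rng.normal(5.0, 1.0, 400)]).reshape(-1, 1)
n = X.shape[0]

demo_gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# score() is the mean log-likelihood per sample, so multiply by n for the total
log_likelihood = demo_gmm.score(X) * n
# Free parameters in 1-D with 2 components: 2 means + 2 variances + 1 independent weight
k = 5

aic = 2 * k - 2 * log_likelihood
bic = k * np.log(n) - 2 * log_likelihood
print(aic, demo_gmm.aic(X))
print(bic, demo_gmm.bic(X))
```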

In [None]:
n_components = np.arange(2, 10)
models = [GaussianMixture(n, covariance_type='tied', random_state=0).fit(data)
          for n in n_components]

In [None]:
plt.plot(n_components, [m.bic(data) for m in models], 'k--o', label='BIC')
plt.plot(n_components, [m.aic(data) for m in models], 'r-s', label='AIC')
plt.legend(loc='best')
#plt.yscale('log')
plt.xlabel('n_components');

In this case, AIC and BIC are virtually the same

The optimal number of clusters is the value that minimizes the AIC or BIC. 

Here, that suggests about 8 components would have been a better choice than 4


In [None]:
gmm = GaussianMixture(n_components=8, covariance_type="tied").fit(data)

In [None]:
newdata = test_rgb.reshape(M * N, -1)[:,:]
cluster = gmm.predict(newdata)
cluster = cluster.reshape(M, N)

We can look at the means for each cluster

In [None]:
print(gmm.means_[1])

In [None]:
print(gmm.means_[3])

In [None]:
sorted_means = np.argsort(np.mean(gmm.means_, axis=1))
sorted_means

In [None]:
plt.figure(figsize=(20,10))
plt.subplot(1,3,1)
plt.imshow(test_rgb)
plt.axis('off')
plt.subplot(1,3,2)
plt.imshow(cluster) 
plt.axis('off')
plt.colorbar(shrink=0.25)
plt.subplot(1,3,3)
plt.imshow(test_rgb, alpha=0.6)
plt.imshow((cluster==sorted_means[-1])  
           + (cluster==sorted_means[-2])
           + (cluster==sorted_means[-3]), 
           cmap=plt.cm.binary_r, alpha=0.4)
plt.axis('off')

Looks like one cluster represents lower beach and another cluster represents upper beach

What choices did we make to arrive at this result?
* which cluster corresponds to what feature
* number of components
* type of covariance

How well does this approach generalize to other images?

In [None]:
infile = 'data/1368104406.Thu.May.09_13_00_06.GMT.2013.egmond.c5.snap.jpg'
test_rgb2 = imread(infile)

test_rgb2 = imresize(test_rgb2, .125)
M, N, _ = np.shape(test_rgb2)

In [None]:
newdata = test_rgb2.reshape(M * N, -1)[:,:]
cluster = gmm.predict(newdata)
cluster = cluster.reshape(M, N)

In [None]:
plt.figure(figsize=(20,10))
plt.subplot(1,2,1)
plt.imshow(test_rgb2)
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(cluster) 
plt.axis('off')
plt.colorbar(shrink=0.25)
plt.axis('off')

Not very well.

## Exercises

1. Training a classifier with ground truth imagery

We'll use the seabright dataset for this task. We have images and associated ground-truth labelled imagery

First, we'll load in the images (downsizing them to aid with speed)

In [None]:
import s3fs
fs = s3fs.S3FileSystem(anon=True)

root = 'esipfed/cdi-workshop'

images = [f for f in fs.ls(root+'/semseg_data/seabright/train') if f.endswith('.jpg')]
len(images)

Xtrain = []
for file in images:
    with fs.open(file, 'rb') as f:
        Xtrain.append(imresize(imread(f), .125))

Next we'll load the label images, contained in .mat files

We need to rescale these data the same way as we rescaled the imagery, but this time we also need to make sure that we keep the label images in whole integers
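
A toy illustration of why the interpolation method matters for label images, using a hypothetical 8×8 label array. Taking every second pixel stands in for nearest-neighbour resampling, and block averaging stands in for a smoothing interpolator. Both are simplifications for illustration.

```python
import numpy as np

# Hypothetical 8x8 label image with integer classes 0, 1 and 3
labels = np.zeros((8, 8), dtype=int)
labels[:, 3:] = 1
labels[5:, :] = 3

# Nearest-neighbour style downsampling: keep every 2nd pixel, values unchanged
nearest = labels[::2, ::2]

# Averaging 2x2 blocks blends neighbouring classes into new values
averaged = labels.reshape(4, 2, 4, 2).mean(axis=(1, 3))

print(np.unique(nearest))    # [0 1 3]
print(np.unique(averaged))   # includes non-integer values such as 0.5
```

Blended values such as 0.5 or 2 do not correspond to any real class, which is why label images should be resized with nearest-neighbour interpolation.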

In [None]:
from scipy.io import loadmat

classfiles = [f for f in fs.ls(root+'/semseg_data/seabright/train/gt') if f.endswith('.mat')]

ytrain = []
for file in classfiles:
    with fs.open(file) as f:
        dat = loadmat(f)['class']
        datr = np.round(imresize(dat, .125, interp='nearest')/255 * np.max(dat))
        ytrain.append(datr)

Always a good idea to plot some data just to check it is as you expect

In [None]:
plt.figure(figsize=(15,10))
plt.subplot(121); plt.imshow(Xtrain[0]); plt.axis('off')
plt.subplot(122); plt.imshow(ytrain[0]); plt.axis('off')

In [None]:
num_images, M, N, num_channels = np.shape(Xtrain)

Ok, go ahead and fit a Naive Bayes model to the first pair of images

Using that model, predict the labels for the next image in the sequence

Make a 4-part plot showing 
1. the original image 
2. the original label image
3. the model-predicted label image
4. the pixels associated with water

How well does it do?

In order to fit a model to the entire dataset, we need to arrange the data a little differently

The model fitting function expects the image data to be arranged as N_pixels x N_channels (3), with the pixels of all images stacked together,

and the label data to be a vector of length

N_pixels 

In [None]:
X = []
for item in Xtrain:
    X.append(item.reshape(M*N, -1)[:,:])

Y = []
for item in ytrain:
    Y.append(item.reshape(M*N))

Xtrain2 = np.vstack(X)
ytrain2 = np.hstack(Y)

In [None]:
np.shape(ytrain2)

In [None]:
np.shape(Xtrain2)

Create a new model and fit to this data

Apply the model to an image (you choose) and make a plot as before, exploring how well the model predicts various landcover classes