# Evaluating semantic classifications

> Written by Dr Daniel Buscombe, Northern Arizona University

> Part of a series of notebooks for image recognition and classification using deep convolutional neural networks

This notebook demonstrates how to evaluate how well your retrained DCNN model performs at semantic segmentation (classifying image pixels).

![](figs/dl_tools_eval_semseg.png)

As usual, we'll load some libraries

In [None]:
import os
from scipy.io import loadmat
from glob import glob
import matplotlib.pyplot as plt
import numpy as np
import itertools

In [None]:
import s3fs
fs = s3fs.S3FileSystem(anon=True)

Here's our usual confusion matrix plotting function

In [None]:
## =========================================================
def plot_confusion_matrix2(cm, classes, normalize=False, cmap=plt.cm.Blues, dolabels=True):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        cm[np.isnan(cm)] = 0

    cm = cm[:len(classes),:len(classes)]    
        
    plt.imshow(cm, interpolation='nearest', cmap=cmap, vmax=1, vmin=0)
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    if dolabels==True:
       tick_marks = np.arange(len(classes))
       plt.xticks(tick_marks, classes, fontsize=8) # rotation=45
       plt.yticks(tick_marks, classes, fontsize=8)

       plt.ylabel('True label',fontsize=6)
       plt.xlabel('Estimated label',fontsize=6)

    else:
       plt.axis('off')

    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        if cm[i, j]>0:
           plt.text(j, i, format(cm[i, j], fmt),
                 fontsize=8,
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()

    return cm
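
The row normalization inside the function can be tried on its own. The sketch below uses an invented 3-class count matrix (not data from this notebook) in which the third class has no true pixels, so its row is 0/0 and is reset to zero, as in the function above.

```python
import numpy as np

# invented counts: rows are true classes, columns are estimated classes
cm_counts = np.array([[8., 2., 0.],
                      [1., 9., 0.],
                      [0., 0., 0.]])   # third class has no true pixels

# divide each row by its total; silence the 0/0 warning for the empty row
with np.errstate(invalid='ignore', divide='ignore'):
    cm_norm = cm_counts / cm_counts.sum(axis=1)[:, np.newaxis]
cm_norm[np.isnan(cm_norm)] = 0   # the empty row becomes all zeros

print(cm_norm)
```

Each non-empty row now sums to one, so the diagonal reads as the proportion of each true class that was labelled correctly.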

Here's the Google Drive download function we saw in the last exercise

In [None]:
import requests

def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)

    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)    

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None

def save_response_content(response, destination):
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)

We'll take some files from Google Drive

In [None]:
import os
destination = 'gdrive_downloads'
os.makedirs(destination, exist_ok=True) # no error if the folder already exists

### Load model file

In [None]:
##https://drive.google.com/open?id=1-NPqL7dPEQ3Q87lvDkeBtKYdA0ryhSaT
file_id = '1-NPqL7dPEQ3Q87lvDkeBtKYdA0ryhSaT'
classifier_file = destination+os.sep+'monterey96_gdrive.pb'

download_file_from_google_drive(file_id, classifier_file)

### Load ground truth class file

In [None]:
##https://drive.google.com/open?id=1MXaPbb-nPBg4y2LmKLJ0Q-jmQY91VVl0
file_id = '1MXaPbb-nPBg4y2LmKLJ0Q-jmQY91VVl0'

gt_matfile = destination+os.sep+'monterey_example_gt.mat'
download_file_from_google_drive(file_id, gt_matfile)

### Load image file

In [None]:
##https://drive.google.com/open?id=1G4deh3kOUNtPFsV0yWSngsM4TVth1-6C
file_id = '1G4deh3kOUNtPFsV0yWSngsM4TVth1-6C'

testimage = destination+os.sep+'monterey_example.jpg'
download_file_from_google_drive(file_id, testimage)

### Load label file

In [None]:
##https://drive.google.com/open?id=1qSDLER1IuFGG35PZJyPI05tqh2YQOjkV
file_id = '1qSDLER1IuFGG35PZJyPI05tqh2YQOjkV'

labels_path = destination+os.sep+'monterey_labels.txt'
download_file_from_google_drive(file_id, labels_path)

### Load colors file

In [None]:
##https://drive.google.com/open?id=128INeq_qJ6y9p7WtHnVk9YNrnSL5hSxS
file_id = '128INeq_qJ6y9p7WtHnVk9YNrnSL5hSxS'

colors_path = destination+os.sep+'monterey_colors.txt'
download_file_from_google_drive(file_id, colors_path)

Take a look in the ```gdrive_downloads``` folder to make sure they are all in there
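
The check can also be done in code. This is a minimal sketch: `missing_files` is a hypothetical helper (not part of the notebook or of DL-tools), and the expected names are the ones used in the download cells above. It also flags zero-byte files, on the assumption that a failed download may leave an empty file behind.

```python
import os

def missing_files(folder, expected):
    """Return the expected file names that are absent from `folder` or empty."""
    missing = []
    for name in expected:
        path = os.path.join(folder, name)
        # treat a zero-byte file as missing too
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing

expected = ['monterey96_gdrive.pb', 'monterey_example_gt.mat',
            'monterey_example.jpg', 'monterey_labels.txt',
            'monterey_colors.txt']

# only run the check if the download folder exists
if os.path.isdir('gdrive_downloads'):
    print('missing:', missing_files('gdrive_downloads', expected))
```

An empty list means all five files are present and non-empty.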

### Run pixelwise prediction

Define input parameters

In [None]:
tile = 96 ## the size of the tile (corresponds to the size used to train the model)
winprop = 1.0 # the proportion of each tile to use as input to the CRF
prob_thres = 0.5 # threshold probability. Below this, DCNN classifications are ignored
n_iter = 20 # number of iterations in CRF model
compat_col = 100 # compatibility function (color)
theta = 60 #std deviation terms (color and spatial)
scale = 1 # weight term in CRF
decim = 16 # 1/proportion of image to use in DCNN
fct =  0.125 # scale of image to use. If <1, image will be downscaled to that fraction
compat_spat = 5 # compatibility function (spatial)
prob = 0.5 # the likelihood of the CRF unary potentials

Run the pixelwise classifier

In [None]:
%run ./semseg_cnn_crf.py $testimage $classifier_file $labels_path $colors_path $tile $prob_thres $prob $decim $fct

### Evaluate classification pixel-by-pixel

Okay, let's compare pixel by pixel

First, we'll load the 'ground truth' label image

In [None]:
c = loadmat(gt_matfile)['class']
print(np.shape(c))

Next, the estimate that we just generated

In [None]:
est_matfile = 'monterey_example_ares_96.mat'
a = loadmat(est_matfile)['class']
print(np.shape(a))

In [None]:
alabs = loadmat(est_matfile)['labels']
clabs = loadmat(gt_matfile)['labels']

alabs = [label.replace(' ','') for label in alabs]
clabs = [label.replace(' ','') for label in clabs]
cind = [clabs.index(x) for x in alabs]
aind = [alabs.index(x) for x in alabs]

In [None]:
# the following code deals with the eventuality that the ground truth and estimate have different order of numeric codes
Cmaster = np.zeros((len(alabs), len(alabs)))
c2 = c.copy()
for kk in range(len(aind)):
    if cind[kk] != aind[kk]:
        c2[c==cind[kk]] = aind[kk] 
del c
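
The same remapping can be written as a lookup table. The sketch below uses invented label lists and a toy image, not the Monterey files: each ground-truth code is replaced by the code that the same class name has in the estimate's label list.

```python
import numpy as np

# hypothetical label lists: same classes, different order
est_labels = ['water', 'sand', 'veg']   # order used by the estimate
gt_labels  = ['sand', 'veg', 'water']   # order used by the ground truth

# toy ground-truth image, coded with indices into gt_labels
gt = np.array([[0, 0, 1],
               [2, 2, 1]])

# lookup table: lut[gt_code] = code of the same class in est_labels
lut = np.array([est_labels.index(name) for name in gt_labels])

# fancy indexing applies the table to every pixel at once
gt_remapped = lut[gt]

print(gt_remapped)
```

Because the table is applied in a single step, a code that has already been remapped can never be remapped a second time.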

Let's compare visually

To do this we'll load in the image and colors

In [None]:
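from matplotlib import colors # needed for ListedColormap

img = plt.imread(testimage) # read the test image (matplotlib's reader; needs Pillow for JPEG)

with open(colors_path) as f:
   cols = f.readlines()
cmap1 = [x.strip() for x in cols] 
cmap1 = colors.ListedColormap(cmap1)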

plt.figure(figsize=(20,10))
plt.subplot(121)
plt.imshow(img)
plt.imshow(c2, cmap=cmap1, alpha=0.5) # c2 is the remapped ground truth
plt.axis('off')
plt.title('Ground truth')

plt.subplot(122)
plt.imshow(img)
plt.imshow(a, cmap=cmap1, alpha=0.5) # a is the estimate we just generated
plt.axis('off')
plt.title('Estimate')

In [None]:
nx, ny = np.shape(a)
print('Accuracy is '+str(np.sum((a==c2).flatten()/(nx*ny)))[:5])

Another way to look at things is to see the difference image

In [None]:
plt.figure(figsize=(20,10))
plt.subplot(121)
plt.imshow(a.astype(int) - c2.astype(int), cmap='bwr') # cast first so unsigned codes cannot wrap around
plt.colorbar(shrink=0.5)
plt.axis('off')

plt.subplot(122)
plt.imshow(a == c2, cmap=plt.cm.binary_r)
plt.axis('off')
plt.title('Proportion correct:' + str(np.sum((a==c2).flatten()/(nx*ny)))[:5])

Let's create a confusion matrix to look at class-by-class comparisons

In [None]:
n = len(alabs)
cm = np.zeros((n,n))
# rows are true (ground truth) classes, columns are estimated classes
for tmat, pmat in zip(c2.flatten(), a.flatten()):
    cm[tmat][pmat] += 1
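
Looping over every pixel in Python is slow for large images. A vectorized alternative is sketched below on toy arrays; `confusion_counts` is a hypothetical helper, and it follows the convention of the plotting function above (true classes on the rows, estimated classes on the columns).

```python
import numpy as np

def confusion_counts(y_true, y_est, n_classes):
    """Count pixels per (true class, estimated class) pair without a Python loop."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    rows = np.ravel(y_true).astype(int)
    cols = np.ravel(y_est).astype(int)
    # np.add.at accumulates correctly when the same (row, col) pair repeats
    np.add.at(cm, (rows, cols), 1)
    return cm

# toy label images with three classes
y_true = np.array([[0, 0, 1], [1, 2, 2]])
y_est  = np.array([[0, 1, 1], [1, 2, 0]])

cm_toy = confusion_counts(y_true, y_est, 3)
print(cm_toy)
```

The diagonal holds the correctly classified pixels, and the matrix total equals the number of pixels.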

Plot the confusion matrix

In [None]:
fig = plt.figure(figsize=(15,15))
_ = plot_confusion_matrix2(cm, classes=alabs, normalize=True, cmap=plt.cm.Reds)

There are a couple of things to remember:
1. there is error in the ground truth labels
2. while the overall pattern is qualitatively very similar, large numbers of individual pixels can still be misclassified

These effects are exacerbated by 
* large numbers of classes (especially very similar classes)
* large spatial heterogeneity

The ```precision_recall_fscore_support``` function (part of scikit-learn) returns one score per class, and it doesn't do well when some classes have no pixels in the ground truth or in the estimate

Those classes get a score of zero, and the zeroes are then included in the average. Let's look at the effect:

In [None]:
from sklearn.metrics import precision_recall_fscore_support

e = precision_recall_fscore_support(c2.flatten(), a.flatten()) # argument order is (true, estimated)

p = np.mean(e[0])
r = np.mean(e[1])
f = np.mean(e[2])
print('mean precision: %f' %(p))
print('mean recall: %f' %(r))
print('mean f-score: %f' %(f))
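
To see why the zeroes matter, here is a minimal sketch with invented per-class precision values (not output from the cell above). Two of the four classes have no pixels and therefore score zero.

```python
import numpy as np

# invented per-class precision: the last two classes have no pixels
per_class = np.array([0.9, 0.8, 0.0, 0.0])
support   = np.array([500, 300, 0, 0])     # true pixels per class

mean_all = per_class.mean()                       # zeroes included
mean_supported = per_class[support > 0].mean()    # only classes that are present

print('all classes: %.3f, supported classes only: %.3f' % (mean_all, mean_supported))
```

The two absent classes halve the mean, even though the model never had a chance to get them right or wrong.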

We can write our own function that computes all three, ignoring classes with no support

Let's remind ourselves of the formulae for precision, recall, and F1-score

$P=  \frac{TP}{(TP+FP)}$

$R=  \frac{TP}{(TP+FN)}$

$F=2\times \frac{(P \times R)}{(P+R)}$
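
As a quick worked example of these formulae, take one class with invented counts of true positives, false positives and false negatives:

```python
# hypothetical counts for a single class
TP, FP, FN = 80, 20, 40

P = TP / (TP + FP)          # fraction of estimated pixels that are correct
R = TP / (TP + FN)          # fraction of true pixels that were found
F = 2 * (P * R) / (P + R)   # harmonic mean of P and R

print('P=%.3f R=%.3f F=%.3f' % (P, R, F))
```

The F-score sits between precision and recall, closer to the lower of the two.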

In [None]:
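def get_stats(cm):
    """
    Mean precision, recall and F-score from a confusion matrix
    (rows = true classes, columns = estimated classes).
    """
    m, n = np.shape(cm)
    
    # true positives: the diagonal
    TP = []
    for x in range(n):
        TP.append(cm[x, x])
    
    # false positives: the rest of each column
    FP = []
    for x in range(n):
        FP.append(np.sum(cm[:, x])-cm[x, x])

    # false negatives: the rest of each row
    FN = []
    for x in range(n):
        FN.append(np.sum(cm[x, :])-cm[x, x])
        
    tp = np.asarray(TP)
    fp = np.asarray(FP)
    fn = np.asarray(FN)    
        
    # 0/0 gives nan for absent classes; zero scores are also set to nan
    with np.errstate(divide='ignore', invalid='ignore'):
        p = tp/(tp+fp)
        r = tp/(tp+fn)
    p[p==0] = np.nan
    r[r==0] = np.nan    

    # average over the remaining classes only
    p = np.nanmean(p)
    r = np.nanmean(r)
    
    # F-score from the mean precision and mean recall
    f = 2*((p*r)/(p+r))
    
    return p, r, f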

In [None]:
p, r, f = get_stats(cm)
print('mean precision: %f' %(p))
print('mean recall: %f' %(r))
print('mean f-score: %f' %(f))

Big difference. Be mindful of how classes with no support enter the average

### Tidy up

In [None]:
!rm -rf gdrive_downloads
!rm monterey_example_ares_96.mat
!rm monterey_example_ares_96.png

## DL-tools

The equivalent script in DL-tools is the same as the code used here, and is run as follows:

```python eval_semseg\test_pixels.py```

You are asked to select a directory that contains the ground truth and estimated class files (*.mat).