<h1 align="center">Advanced Image Processing and Analysis</h1>
<h3 align="center">ECE 4438B/ECE 9022/ECE 9202B/BIOMED 9519B/BIOPHYS 9519B/CAMI 9519B</h3>
<h4 align="center"><a href="mailto:echen29@uwo.ca?subject=Day 15: Segmentation Evaluation Lecture"> Elvis Chen, PhD, LL</a></h4>
<h4 align="center">Day 15, March 04, 2019</h4>

In [None]:
import SimpleITK as sitk

import numpy as np

%run update_path_to_download_script
from downloaddata import fetch_data as fdata
import matplotlib.pyplot as plt
%matplotlib inline

from ipywidgets import interact, fixed
from myshow import myshow, myshow3d

### Similarity Metrics in Detail

Let us revisit the similarity metrics in more detail, as they will be used again when we learn about image registration (and its evaluation). Define the following sets:

* $I$, the image (the set of all pixels),
* $S$, or the *Source*, a segmented region produced by the algorithm to be evaluated, and
* $T$, or the *Target*, a reference segmentation that we use as the gold standard.

Generally speaking, $S$ and $T$ are not identical:
<img src="VennDiag.png" width="650"/>

Using $T$ as the reference, we can further define the following:
* The intersection $S \bigcap T$: the set of pixels labelled as foreground by the algorithm, with which the reference agrees. In other words, $S \bigcap T$ is the **True Positive (TP)** set.
* The union $S \bigcup T$: the set of pixels labelled as foreground **either** by the algorithm **or** by the reference segmentation.
  * Conversely, its complement $I \setminus (S \bigcup T)$ is the set of pixels labelled as background by the algorithm, with which the reference agrees. That is, $I \setminus (S \bigcup T)$ is the **True Negative (TN)** set.
* The set difference $S \setminus T$: the set of pixels labelled as foreground by the algorithm but as background by the reference segmentation. That is, $S \setminus T$ is the **False Positive (FP)** set.
* The set difference $T \setminus S$: the set of pixels labelled as background by the algorithm but as foreground by the reference segmentation. That is, $T \setminus S$ is the **False Negative (FN)** set.

<img src="ConfusionMatrix.PNG" width="650"/>

Another way to visualize these relations is the **Confusion Table**:
<img src="ConfusionMatrixTable.PNG" width="650"/>
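The four sets above can be counted directly from two binary masks with element-wise logical operations. The sketch below is a minimal NumPy illustration, assuming the masks are arrays of the same shape with 1 for foreground; the function name `confusion_counts` and the toy 8-pixel "image" are hypothetical.

```python
import numpy as np

def confusion_counts(source, target):
    """Return (TP, FP, FN, TN) pixel counts for two binary masks of equal shape.

    `source` plays the role of S (algorithm output), `target` the role of T (reference).
    """
    s = np.asarray(source, dtype=bool)
    t = np.asarray(target, dtype=bool)
    tp = int(np.count_nonzero(s & t))      # S intersect T
    fp = int(np.count_nonzero(s & ~t))     # S minus T
    fn = int(np.count_nonzero(~s & t))     # T minus S
    tn = int(np.count_nonzero(~(s | t)))   # I minus (S union T)
    return tp, fp, fn, tn

# Toy 1D "image" of 8 pixels
S = np.array([0, 1, 1, 1, 1, 0, 0, 0])
T = np.array([0, 0, 1, 1, 1, 1, 0, 0])
print(confusion_counts(S, T))  # (3, 1, 1, 3)
```

The four counts always add up to the number of pixels in $I$, since every pixel falls into exactly one cell of the confusion table.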

Using this notation, we can define the following metrics:

[Jaccard similarity coefficient](https://en.wikipedia.org/wiki/Jaccard_index), AKA **Intersection over Union**


$J(S,T) = \frac{| S \bigcap T |}{| S \bigcup T |} = \frac{| S \bigcap T |}{ | S | + | T | - | S \bigcap T |}$

and

The [Dice Similarity Coefficient](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient) is another area-based measure of similarity:

$DSC(S,T) = \frac{ 2 | S \bigcap T | }{ | S | + | T | }$

where the operator $| \cdot |$ denotes the number of pixels in a set. Both the Jaccard and Dice metrics range over $[0,1]$. In fact, they are monotonically related, $J = \frac{DSC}{2 - DSC}$, so the Dice metric is always larger than the Jaccard metric except at the extrema $\{0,1\}$, where they are equal.
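The relation between the two metrics follows by substituting $| S \bigcap T | = DSC\,(| S | + | T |)/2$ into the denominator $| S | + | T | - | S \bigcap T |$ of $J$. A minimal NumPy sketch (the function name `jaccard_dice` and the toy masks are hypothetical) computes both metrics from binary masks and checks that relation numerically:

```python
import numpy as np

def jaccard_dice(source, target):
    """Jaccard (intersection over union) and Dice coefficients of two binary masks."""
    s = np.asarray(source, dtype=bool)
    t = np.asarray(target, dtype=bool)
    intersection = np.count_nonzero(s & t)
    union = np.count_nonzero(s | t)
    size_sum = np.count_nonzero(s) + np.count_nonzero(t)
    # Convention assumed here: two empty masks count as a perfect match.
    if size_sum == 0:
        return 1.0, 1.0
    return intersection / union, 2 * intersection / size_sum

S = np.array([0, 1, 1, 1, 1, 0, 0, 0])
T = np.array([0, 0, 1, 1, 1, 1, 0, 0])
J, D = jaccard_dice(S, T)
print(J, D)  # 0.6 0.75
# J = D / (2 - D), hence D >= J for every pair of masks
print(np.isclose(J, D / (2 - D)))  # True
```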

### Sensitivity and Specificity

While the Dice metric (and, by extension, the Jaccard index) measures the amount of overlap between the segmented result and what we assume to be the correct answer (**note**: I am still not using the words *ground truth*), it tells us nothing about the *shape* or the edges of the segmented region compared to that answer:

<img src="OverUnder.PNG" width="550"/>

The image pair shown below contrasts a case of *over-segmentation* (left) with the result of a different algorithm that does not suffer from the over-segmentation problem (right).

![An example of over segmentation](https://www.researchgate.net/profile/Ashraf_Aly_H/publication/267753061/figure/fig4/AS:295575825207311@1447482176812/Example-image-from-grayscale-images-a-Over-segmentation-problem-by-using-Active-Contour.png)

Two other commonly used metrics are the **True Positive Rate (TPR)** (also known as *sensitivity* or *recall*) and the **True Negative Rate (TNR)** (also known as *specificity*), defined as:

**True Positive Rate (TPR)** (Sensitivity/Recall)

$TPR = \frac{TP}{TP+FN} = \frac{ | S \bigcap T |}{ | T |}$

and

**True Negative Rate (TNR)** (Specificity)

$TNR = \frac{TN}{TN+FP} = \frac{ | I \setminus (S \bigcup T) | }{ | I \setminus T | }$
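Both rates can be computed from the same four counts. This is a minimal sketch with hypothetical names and toy data; note that it does not guard against an empty reference foreground or background, which would make a denominator zero.

```python
import numpy as np

def sensitivity_specificity(source, target):
    """Return (TPR, TNR) of binary mask `source` evaluated against reference `target`."""
    s = np.asarray(source, dtype=bool)
    t = np.asarray(target, dtype=bool)
    tp = np.count_nonzero(s & t)
    fn = np.count_nonzero(~s & t)
    tn = np.count_nonzero(~s & ~t)
    fp = np.count_nonzero(s & ~t)
    tpr = tp / (tp + fn)   # |S intersect T| / |T|
    tnr = tn / (tn + fp)   # |I minus (S union T)| / |I minus T|
    return tpr, tnr

T = np.array([0, 0, 1, 1, 1, 1, 0, 0])   # reference
S = np.array([0, 1, 1, 1, 1, 0, 0, 0])   # segmentation shifted by one pixel
print(sensitivity_specificity(S, T))     # (0.75, 0.75)
```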

To understand what sensitivity (TPR) and specificity (TNR) mean, imagine you have a car alarm:
* True Positive (TP): the alarm sounds and there is a break-in,
* False Positive (FP): the alarm sounds but there is NO break-in,
* False Negative (FN): the alarm does NOT sound but there IS a break-in, and
* True Negative (TN): the alarm does not sound and there is no break-in.

Ideally, TP and TN are high while FP and FN are low.

Suppose the sensitivity of the alarm's proximity sensor can be adjusted. If it is VERY sensitive:
* True Positives go up: you catch all the break-ins, BUT
* False Positives go up as well (cars driving by will trigger the alarm),
* False Negatives go down, and
* True Negatives go down.

Thus, **Sensitivity**, or True Positive Rate, is the extent to which true foreground pixels are correctly identified (so false negatives are few).  **If the segmentation is sensitive, then it is good at identifying the foreground pixels.**

Specificity, on the other hand, measures how good a segmentation is at identifying background pixels. It is the proportion of reference background pixels (TN + FP) that the segmentation correctly labels as background (TN).
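The car-alarm trade-off can be reproduced with a simple intensity threshold: lowering the threshold makes the segmentation "more sensitive". The sketch below assumes a synthetic, noisy 1D signal with a known reference (all data and names are hypothetical). Because raising the threshold can only shrink the segmented set, TPR can only stay the same or drop, while TNR can only stay the same or rise.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1D "image": a bright object (pixels 40-59) on a dark background, plus noise.
reference = np.zeros(100, dtype=bool)
reference[40:60] = True
image = np.where(reference, 1.0, 0.0) + rng.normal(0.0, 0.4, size=100)

def tpr_tnr(seg, ref):
    tp = np.count_nonzero(seg & ref)
    fn = np.count_nonzero(~seg & ref)
    tn = np.count_nonzero(~seg & ~ref)
    fp = np.count_nonzero(seg & ~ref)
    return tp / (tp + fn), tn / (tn + fp)

thresholds = [-0.5, 0.0, 0.5, 1.0, 1.5]
rates = [tpr_tnr(image > thr, reference) for thr in thresholds]
for thr, (tpr, tnr) in zip(thresholds, rates):
    # Lower threshold = a "more sensitive alarm": higher TPR, lower TNR.
    print(f"threshold={thr:+.1f}  TPR={tpr:.2f}  TNR={tnr:.2f}")
```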

Question: What are the sensitivity and specificity for the over- and under-segmentation cases?

Question: What does it mean for a segmentation to be both sensitive and specific?

#### Examples

In [None]:
# load an image: CT of a head
img_ct = sitk.ReadImage(fdata("cthead1.png"))

# Visualizing the label image in RGB needs an image with a 0-255 range.
#
# We don't need to do this for THIS particular image (it is already in that range).
img_255 = sitk.Cast(sitk.RescaleIntensity(img_ct), sitk.sitkUInt8)

myshow(img_ct)

In [None]:
seed = (100, 170)
seg_connectedThreshold = sitk.Image(img_ct.GetSize(), sitk.sitkUInt8)
seg_connectedThreshold.CopyInformation(img_ct)
seg_connectedThreshold[seed] = 1
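# Dilate the single seed pixel by a radius of 3 pixels along each axis so it is visible
seg_connectedThreshold = sitk.BinaryDilate(seg_connectedThreshold, [3] * seg_connectedThreshold.GetDimension())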
myshow(sitk.LabelOverlay(img_255, seg_connectedThreshold), "Initial Seed")

In [None]:
seg_connectedThreshold = sitk.ConnectedThreshold(img_ct, seedList=[seed], lower=230, upper=255)
myshow(sitk.LabelOverlay(img_255,seg_connectedThreshold), "Connected Threshold")

In [None]:
seg_confidenceConnected = sitk.ConfidenceConnected(img_ct, 
                                                   seedList=[seed], 
                                                   numberOfIterations = 10, 
                                                   multiplier = 2.5, 
                                                   initialNeighborhoodRadius = 3, 
                                                   replaceValue = 1)
myshow(sitk.LabelOverlay(img_255, seg_confidenceConnected), "Confidence Connected Threshold")

Visually, the ConfidenceConnected result is an *under*-segmentation relative to the ConnectedThreshold result.

In [None]:
## Jaccard and Dice metrics
overlap_measures_filter = sitk.LabelOverlapMeasuresImageFilter()
overlap_measures_filter.Execute( seg_confidenceConnected, seg_connectedThreshold )
print( overlap_measures_filter.GetJaccardCoefficient() )
print( overlap_measures_filter.GetDiceCoefficient() )

In [None]:
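## Error rates and alternative overlap measures
# FalseNegativeError = |T \ S| / |T| (i.e. 1 - TPR); FalsePositiveError = |S \ T| / |S| (not 1 - TNR).
# UnionOverlap and MeanOverlap are ITK's other names for the Jaccard and Dice coefficients.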

print( overlap_measures_filter.GetFalseNegativeError() )
print( overlap_measures_filter.GetFalsePositiveError() )
print( overlap_measures_filter.GetUnionOverlap())
print( overlap_measures_filter.GetMeanOverlap())

In [None]:
## The same measures with the input order swapped
overlap_measures_filter = sitk.LabelOverlapMeasuresImageFilter()
overlap_measures_filter.Execute( seg_connectedThreshold, seg_confidenceConnected  )
print( overlap_measures_filter.GetJaccardCoefficient() )
print( overlap_measures_filter.GetDiceCoefficient() )

print( overlap_measures_filter.GetFalseNegativeError() )
print( overlap_measures_filter.GetFalsePositiveError() )

### Volume Similarity

The volume similarity coefficient, $VS$, is a measure of the similarity between the source and target volumes. Although this measure does not reflect segmentation (or registration, as we will see later in the course) accuracy (i.e. source and target regions can be disjoint and still have equal volumes), it is a conventional measure included for retrospective evaluation of prior studies. It is equal to the difference between the two volumes (or areas in 2D) divided by their mean volume (area):

$VS = 2 \frac{| S | - | T |}{| S | + | T |}$

(There are many definitions of VS; see this [example](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4533825/pdf/12880_2015_Article_68.pdf). The definition we use is the one implemented in [ITK/SimpleITK](http://www.insight-journal.org/browse/publication/707).)
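To see why VS does not reflect accuracy, the sketch below follows the prose definition (difference of the two sizes divided by their mean, i.e. a factor of 2 over the sum) and evaluates it on two disjoint squares of equal area. The function name and masks are hypothetical, and the sketch assumes at least one mask is non-empty.

```python
import numpy as np

def volume_similarity(source, target):
    """(|S| - |T|) divided by the mean size (|S| + |T|) / 2; only sizes are compared."""
    s = np.count_nonzero(source)
    t = np.count_nonzero(target)
    return 2.0 * (s - t) / (s + t)

S = np.zeros((10, 10), dtype=bool)
S[0:4, 0:4] = True      # 16 pixels, top-left corner
T = np.zeros((10, 10), dtype=bool)
T[6:10, 6:10] = True    # 16 pixels, bottom-right corner, no overlap with S

print(volume_similarity(S, T))  # 0.0, although S and T do not overlap at all
print(volume_similarity(S, S))  # 0.0, the same value as for a perfect match
```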

In [None]:
print(overlap_measures_filter.GetVolumeSimilarity())

**Question**: What does a negative VS mean?

**Question**: Does the input order to `sitk.LabelOverlapMeasuresImageFilter` matter for VS?

In [None]:
## Enter your code to answer the above questions.

### Hausdorff Distance

Hausdorff distance (HD) is a spatial-distance-based metric widely used in the evaluation of image segmentation as a *dissimilarity measure*. Used in conjunction with overlap metrics, spatial-distance-based metrics are recommended when the accuracy of the segmentation boundary (i.e. its contour) is important.

Suppose we have two curves (contours):

<img src="Hausdorff.png" width="550"/>

where $R_n$ is a point on the red curve and $G_n$ is a point on the green curve. The Hausdorff distance between the point sets $R$ and $G$ is then defined by:

$HD(G,R) = \max( h(G,R), h(R,G) )$

where $h(G,R)$ is called the *directed* Hausdorff distance and is given by:

$h(G,R) = \max_{g\in G} \min_{r \in R} \| g - r \|$

where $\| g - r \|$ is some distance norm (e.g. Euclidean distance).
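The two definitions translate almost literally into a brute-force computation over all point pairs, using the Euclidean norm. This is a minimal NumPy sketch with hypothetical point sets (they are not the points in the figure); its cost grows with the product of the two set sizes, so it is only meant for small examples.

```python
import numpy as np

def directed_hausdorff(A, B):
    """h(A, B): the largest distance from a point of A to its nearest point of B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Pairwise Euclidean distances, shape (len(A), len(B))
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return d.min(axis=1).max()   # min over B, then max over A

def hausdorff(A, B):
    """Symmetric Hausdorff distance HD(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

G = np.array([[0, 0], [1, 0], [2, 0]])            # hypothetical "green" points
R = np.array([[0, 1], [1, 1], [2, 1], [6, 1]])    # hypothetical "red" points
print(directed_hausdorff(G, R))  # 1.0
print(directed_hausdorff(R, G))  # about 4.123 (sqrt(17), driven by the outlier [6, 1])
print(hausdorff(G, R))           # about 4.123
```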

In this particular example,

$HD(G,R) = \|G_4 - R_4 \|$.  **WHY**?

In [None]:
## work it out by a simple enumeration.

**Question**: does $h(G,R)=h(R,G)$?

Hausdorff distance gives us another perspective on the **shape** of the segmentation. If two segmentations are similar in shape and close to each other, their HD will be small. As in the earlier example where one label lies entirely inside the other, the Dice coefficient will be the same wherever the inner label is placed, but the Hausdorff distance will differ depending on its location.
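A small synthetic experiment illustrates this: place the same 6 x 6 square at two positions inside a 16 x 16 reference square. The sketch below (hypothetical masks and helper names) computes the Hausdorff distance by brute force over the foreground pixel sets, which we assume is comparable to what an image-based Hausdorff filter reports.

```python
import numpy as np

def dice(a, b):
    return 2 * np.count_nonzero(a & b) / (np.count_nonzero(a) + np.count_nonzero(b))

def hausdorff_masks(a, b):
    """Symmetric Hausdorff distance between the foreground pixel sets of two masks."""
    pa = np.argwhere(a).astype(float)
    pb = np.argwhere(b).astype(float)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

outer = np.zeros((20, 20), dtype=bool)
outer[2:18, 2:18] = True                   # 16 x 16 reference square

def inner_square(row, col):
    m = np.zeros_like(outer)
    m[row:row + 6, col:col + 6] = True     # 6 x 6 square, fully inside `outer`
    return m

centered = inner_square(7, 7)
corner = inner_square(2, 2)
print(dice(centered, outer), dice(corner, outer))                        # identical
print(hausdorff_masks(centered, outer), hausdorff_masks(corner, outer))  # about 7.07 vs 14.14
```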

In [None]:
hausdorff_distance_filter = sitk.HausdorffDistanceImageFilter()
hausdorff_distance_filter.Execute(seg_confidenceConnected, seg_connectedThreshold)
print(hausdorff_distance_filter.GetHausdorffDistance())

**Question**: Does the input order to `sitk.HausdorffDistanceImageFilter` matter for the Hausdorff distance?

**Question**: What is the unit here?

In [None]:
## enter your code here.

In [None]:
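# Unsigned distance from every pixel to the boundary of the ConnectedThreshold segmentation
segmented_distance_map = sitk.Abs(sitk.SignedMaurerDistanceMap(seg_connectedThreshold, squaredDistance=False))
# Contour (boundary pixels) of the same segmentation
segmented_surface = sitk.LabelContour(seg_connectedThreshold)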
myshow(segmented_distance_map)

## Utility method for display

In [None]:
def display_with_overlay(segmentation_number, slice_number, image, segs, window_min, window_max):
    """
    Display a CT slice with segmented contours overlaid onto it. The contours are the edges of 
    the labeled regions.
    """
    img = image[:,:,slice_number]
    msk = segs[segmentation_number][:,:,slice_number]
    overlay_img = sitk.LabelMapContourOverlay(sitk.Cast(msk, sitk.sitkLabelUInt8), 
                                              sitk.Cast(sitk.IntensityWindowing(img,
                                                                                windowMinimum=window_min, 
                                                                                windowMaximum=window_max), 
                                                        sitk.sitkUInt8), 
                                             opacity = 1, 
                                             contourThickness=[2,2])
    #We assume the original slice is isotropic, otherwise the display would be distorted 
    plt.imshow(sitk.GetArrayViewFromImage(overlay_img))
    plt.axis('off')
    plt.show()

## Fetch the data

Retrieve a single CT scan and three manual delineations of a liver tumor. Visual inspection of the data highlights the variability between experts. 

In [None]:
image = sitk.ReadImage(fdata("liverTumorSegmentations/Patient01Homo.mha"))
segmentation_file_names = ["liverTumorSegmentations/Patient01Homo_Rad01.mha", 
                          "liverTumorSegmentations/Patient01Homo_Rad02.mha",
                          "liverTumorSegmentations/Patient01Homo_Rad03.mha"]
                          
segmentations = [sitk.ReadImage(fdata(file_name), sitk.sitkUInt8) for file_name in segmentation_file_names]
    
interact(display_with_overlay, segmentation_number=(0,len(segmentations)-1), 
         slice_number = (0, image.GetSize()[2]-1), image = fixed(image),
         segs = fixed(segmentations), window_min = fixed(-1024), window_max=fixed(976));

The dataset contains 3 segmentations from 3 radiologists. From roughly slice 66 to 80 you can see the tumor outlined, with some variation among the 3 radiologists.

## Derive a reference

There are a variety of ways to derive a reference segmentation from multiple expert inputs. Several options (and there are more) are described in ["A comparison of ground truth estimation methods", A. M. Biancardi, A. C. Jirapatnakul, and A. P. Reeves](https://link.springer.com/article/10.1007%2Fs11548-009-0401-3).

Two methods available in SimpleITK are **majority vote** and the **STAPLE** algorithm.
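For binary masks, pixel-wise majority voting reduces to counting foreground votes. The NumPy sketch below is a simplified two-label version (the function name, the `undecided_label` value and the rater masks are hypothetical); it marks ties with a separate label, analogous to the "undecided" label used by SimpleITK's multi-label `LabelVoting`.

```python
import numpy as np

def majority_vote(masks, undecided_label=2):
    """Pixel-wise majority vote over binary masks (0 = background, 1 = foreground).

    Pixels where foreground and background votes tie receive `undecided_label`.
    """
    stack = np.stack([np.asarray(m, dtype=np.uint8) for m in masks])
    fg_votes = stack.sum(axis=0)
    bg_votes = stack.shape[0] - fg_votes
    result = (fg_votes > bg_votes).astype(np.uint8)
    result[fg_votes == bg_votes] = undecided_label
    return result

raters = [np.array([0, 1, 1, 1, 0]),
          np.array([0, 0, 1, 1, 1]),
          np.array([0, 1, 1, 0, 0])]
print(majority_vote(raters))       # [0 1 1 1 0]  (an odd number of raters cannot tie)
print(majority_vote(raters[:2]))   # [0 2 1 1 2]  (ties marked as undecided)
```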

In [None]:
# Use majority voting to obtain the reference segmentation. Note that this filter does not resolve ties. In case of 
# ties, it will assign max_label_value+1 or a user specified label value (labelForUndecidedPixels) to the result. 
# Before using the results of this filter you will have to check whether there were ties and modify the results to
# resolve the ties in a manner that makes sense for your task. The filter implicitly accommodates multiple labels.
labelForUndecidedPixels = 10
reference_segmentation_majority_vote = sitk.LabelVoting(segmentations, labelForUndecidedPixels)    

manual_plus_majority_vote = list(segmentations)  
# Append the reference segmentation to the list of manual segmentations
manual_plus_majority_vote.append(reference_segmentation_majority_vote)

interact(display_with_overlay, segmentation_number=(0,len(manual_plus_majority_vote)-1), 
         slice_number = (0, image.GetSize()[2]-1), image = fixed(image),
         segs = fixed(manual_plus_majority_vote), window_min = fixed(-1024), window_max=fixed(976));

In the absence of the ground truth, the [STAPLE algorithm](https://ieeexplore.ieee.org/document/1309714) estimates what the ground truth is based on multiple segmentations. The mathematical foundation is based on the [Expectation-Maximization](https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm) (EM) algorithm, which is quite involved and thus not covered in this class. 

The STAPLE algorithm is iterative. At each iteration, the ground truth is estimated from the inputs. The performance of each input is then evaluated against this estimate in terms of sensitivity and specificity. An input segmentation evaluated as *good*, i.e. having high sensitivity and specificity, is weighted more heavily when generating the next estimate of the ground truth.

<img src="STAPLE.gif" width="650"/>

(image courtesy of the [STAPLE paper by Warfield et al.](https://ieeexplore.ieee.org/document/1309714))

The intermediate results are a probability image, in which the value of each pixel/voxel is the probability that it belongs to the ground truth, together with the estimated sensitivity and specificity of each input segmentation.

At the end of the algorithm, this probability map is thresholded to provide an estimate of the ground truth segmentation.
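The loop described above can be sketched for binary labels in a few lines of NumPy. This is a simplified illustration of the EM idea, not SimpleITK's implementation: it assumes a fixed global foreground prior (the mean of all decisions), fixed initial sensitivity/specificity of 0.99, clipping of the estimates away from 0 and 1 as a numerical safeguard, and at least one foreground vote. All names and data are hypothetical.

```python
import numpy as np

def staple_binary(decisions, n_iter=50, p0=0.99, q0=0.99):
    """Simplified binary STAPLE (EM) sketch.

    decisions: (n_raters, n_voxels) array of 0/1 labels.
    Returns (W, p, q): per-voxel foreground probability, per-rater sensitivity, specificity.
    """
    D = np.asarray(decisions, dtype=float)
    p = np.full(D.shape[0], p0)   # sensitivity of each rater
    q = np.full(D.shape[0], q0)   # specificity of each rater
    f = D.mean()                  # fixed global foreground prior (simplification)
    for _ in range(n_iter):
        # E-step: probability that each voxel's true label is foreground
        a = f * np.prod(np.where(D == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = (1 - f) * np.prod(np.where(D == 0, q[:, None], 1 - q[:, None]), axis=0)
        W = a / (a + b)
        # M-step: re-estimate each rater's performance against the soft estimate
        p = (D * W).sum(axis=1) / W.sum()
        q = ((1 - D) * (1 - W)).sum(axis=1) / (1 - W).sum()
        p = np.clip(p, 1e-6, 1 - 1e-6)
        q = np.clip(q, 1e-6, 1 - 1e-6)
    return W, p, q

raters = np.array([[1, 1, 1, 1, 0, 0, 0, 0, 0, 0],    # rater 0
                   [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],    # rater 1
                   [1, 0, 1, 0, 1, 1, 0, 0, 0, 0]])   # rater 2 (noisier)
W, p, q = staple_binary(raters)
reference = W > 0.95   # same threshold value as the SimpleITK cell in this notebook
print(np.round(p, 2), np.round(q, 2))
```

Rater 2, who disagrees most with the others, ends up with the lowest estimated sensitivity, so its votes carry less weight in the final estimate.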

In [None]:
# Use the STAPLE algorithm to obtain the reference segmentation. This implementation of the original algorithm
# combines a single label from multiple segmentations, the label is user specified. The result of the
# filter is the voxel's probability of belonging to the foreground. We then have to threshold the result to obtain
# a reference binary segmentation.
foregroundValue = 1
threshold = 0.95
reference_segmentation_STAPLE_probabilities = sitk.STAPLE(segmentations, foregroundValue) 
# We use the overloaded operator to perform thresholding, another option is to use the BinaryThreshold function.
reference_segmentation_STAPLE = reference_segmentation_STAPLE_probabilities > threshold

manual_plus_staple = list(segmentations)  
# Append the reference segmentation to the list of manual segmentations
manual_plus_staple.append(reference_segmentation_STAPLE)

interact(display_with_overlay, segmentation_number=(0,len(manual_plus_staple)-1), 
         slice_number = (0, image.GetSize()[2]-1), image = fixed(image),
         segs = fixed(manual_plus_staple), window_min = fixed(-1024), window_max=fixed(976));

## Evaluate segmentations using the reference

Once we derive a reference from our experts' input, we can compare segmentation results to it.

Note that in this notebook we compare the expert segmentations to the reference derived from them. This is not relevant for algorithm evaluation, but it can potentially be used to rank your experts.

In this specific implementation we take advantage of the fact that we have a binary segmentation with 1 for foreground and 0 for background.
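The cell below measures surface distances with distance maps. As a point-set cross-check, the same idea (distances from every surface pixel of one mask to the nearest surface pixel of the other, in both directions) can be sketched by brute force in NumPy. This sketch assumes 2D masks and defines surface pixels as foreground pixels with a 4-connected background neighbour; its numbers need not match SimpleITK's exactly. The masks and function names are hypothetical.

```python
import numpy as np

def surface_points(mask):
    """Coordinates of foreground pixels with at least one 4-connected background neighbour."""
    m = np.asarray(mask, dtype=bool)
    padded = np.pad(m, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &    # pixel above and below
                padded[1:-1, :-2] & padded[1:-1, 2:])     # pixel left and right
    return np.argwhere(m & ~interior).astype(float)

def symmetric_surface_distances(seg, ref):
    """Seg-to-ref and ref-to-seg nearest surface distances, concatenated."""
    ps, pr = surface_points(seg), surface_points(ref)
    d = np.linalg.norm(ps[:, None, :] - pr[None, :, :], axis=-1)
    return np.concatenate([d.min(axis=1), d.min(axis=0)])

ref = np.zeros((12, 12), dtype=bool)
ref[2:10, 2:10] = True
seg = np.zeros((12, 12), dtype=bool)
seg[3:11, 2:10] = True    # the reference square shifted down by one row
dists = symmetric_surface_distances(seg, ref)
print(dists.mean(), np.median(dists), dists.std(), dists.max())
```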

In [None]:
from enum import Enum

# Use enumerations to represent the various evaluation measures
class OverlapMeasures(Enum):
    jaccard, dice, volume_similarity, false_negative, false_positive = range(5)

class SurfaceDistanceMeasures(Enum):
    hausdorff_distance, mean_surface_distance, median_surface_distance, std_surface_distance, max_surface_distance = range(5)
    
# Select which reference we want to use (majority vote or STAPLE)    
reference_segmentation = reference_segmentation_STAPLE

# Empty numpy arrays to hold the results 
overlap_results = np.zeros((len(segmentations),len(OverlapMeasures.__members__.items())))  
surface_distance_results = np.zeros((len(segmentations),len(SurfaceDistanceMeasures.__members__.items())))  

# Compute the evaluation criteria

# Note that for the overlap measures filter, because we are dealing with a single label we 
# use the combined, all labels, evaluation measures without passing a specific label to the methods.
overlap_measures_filter = sitk.LabelOverlapMeasuresImageFilter()

hausdorff_distance_filter = sitk.HausdorffDistanceImageFilter()

# Use the absolute values of the distance map to compute the surface distances (distance map sign, outside or inside 
# relationship, is irrelevant)
label = 1
reference_distance_map = sitk.Abs(sitk.SignedMaurerDistanceMap(reference_segmentation, squaredDistance=False))
reference_surface = sitk.LabelContour(reference_segmentation)

statistics_image_filter = sitk.StatisticsImageFilter()
# Get the number of pixels in the reference surface by counting all pixels that are 1.
statistics_image_filter.Execute(reference_surface)
num_reference_surface_pixels = int(statistics_image_filter.GetSum()) 

for i, seg in enumerate(segmentations):
    # Overlap measures
    overlap_measures_filter.Execute(reference_segmentation, seg)
    overlap_results[i,OverlapMeasures.jaccard.value] = overlap_measures_filter.GetJaccardCoefficient()
    overlap_results[i,OverlapMeasures.dice.value] = overlap_measures_filter.GetDiceCoefficient()
    overlap_results[i,OverlapMeasures.volume_similarity.value] = overlap_measures_filter.GetVolumeSimilarity()
    overlap_results[i,OverlapMeasures.false_negative.value] = overlap_measures_filter.GetFalseNegativeError()
    overlap_results[i,OverlapMeasures.false_positive.value] = overlap_measures_filter.GetFalsePositiveError()
    # Hausdorff distance
    hausdorff_distance_filter.Execute(reference_segmentation, seg)
    surface_distance_results[i,SurfaceDistanceMeasures.hausdorff_distance.value] = hausdorff_distance_filter.GetHausdorffDistance()
    # Symmetric surface distance measures
    segmented_distance_map = sitk.Abs(sitk.SignedMaurerDistanceMap(seg, squaredDistance=False))
    segmented_surface = sitk.LabelContour(seg)
        
    # Multiply the binary surface segmentations with the distance maps. The resulting distance
    # maps contain non-zero values only on the surface (they can also contain zero on the surface)
    seg2ref_distance_map = reference_distance_map*sitk.Cast(segmented_surface, sitk.sitkFloat32)
    ref2seg_distance_map = segmented_distance_map*sitk.Cast(reference_surface, sitk.sitkFloat32)
        
    # Get the number of pixels in the segmented surface by counting all pixels that are 1.
    statistics_image_filter.Execute(segmented_surface)
    num_segmented_surface_pixels = int(statistics_image_filter.GetSum())
    
    # Get all non-zero distances and then add zero distances if required.
    seg2ref_distance_map_arr = sitk.GetArrayViewFromImage(seg2ref_distance_map)
    seg2ref_distances = list(seg2ref_distance_map_arr[seg2ref_distance_map_arr!=0]) 
    seg2ref_distances = seg2ref_distances + \
                        list(np.zeros(num_segmented_surface_pixels - len(seg2ref_distances)))
    ref2seg_distance_map_arr = sitk.GetArrayViewFromImage(ref2seg_distance_map)
    ref2seg_distances = list(ref2seg_distance_map_arr[ref2seg_distance_map_arr!=0]) 
    ref2seg_distances = ref2seg_distances + \
                        list(np.zeros(num_reference_surface_pixels - len(ref2seg_distances)))
        
    all_surface_distances = seg2ref_distances + ref2seg_distances
    
    surface_distance_results[i,SurfaceDistanceMeasures.mean_surface_distance.value] = np.mean(all_surface_distances)
    surface_distance_results[i,SurfaceDistanceMeasures.median_surface_distance.value] = np.median(all_surface_distances)
    surface_distance_results[i,SurfaceDistanceMeasures.std_surface_distance.value] = np.std(all_surface_distances)
    surface_distance_results[i,SurfaceDistanceMeasures.max_surface_distance.value] = np.max(all_surface_distances)
    

# Print the matrices
np.set_printoptions(precision=3)
print(overlap_results)
print(surface_distance_results)

In [None]:
import pandas as pd

from IPython.display import display, HTML 

# Graft our results matrix into pandas data frames 
overlap_results_df = pd.DataFrame(data=overlap_results, index = list(range(len(segmentations))), 
                                  columns=[name for name, _ in OverlapMeasures.__members__.items()]) 
surface_distance_results_df = pd.DataFrame(data=surface_distance_results, index = list(range(len(segmentations))), 
                                  columns=[name for name, _ in SurfaceDistanceMeasures.__members__.items()]) 

# Display the data as HTML tables and graphs
display(HTML(overlap_results_df.to_html(float_format=lambda x: '%.3f' % x)))
display(HTML(surface_distance_results_df.to_html(float_format=lambda x: '%.3f' % x)))
overlap_results_df.plot(kind='bar').legend(bbox_to_anchor=(1.6,0.9))
surface_distance_results_df.plot(kind='bar').legend(bbox_to_anchor=(1.6,0.9))

In [None]:
# The formatting of the table using the default settings is less than ideal 
print(overlap_results_df.to_latex())

# We can improve on this by specifying the table's column format (one 'c' for the index
# plus one per data column) and the float format
print(overlap_results_df.to_latex(column_format='c' * (len(overlap_results_df.columns) + 1),
                                  float_format=lambda x: '%.3f' % x))