<a href="https://colab.research.google.com/github/cagBRT/Clustering-Intro/blob/master/Evaluating_the_performance_of_a_clustering_algorithm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Evaluating the performance of a clustering algorithm

Evaluating a clustering algorithm is less straightforward than evaluating a supervised classifier, where one can simply count errors or compute precision and recall.

Any evaluation metric should **not** take the absolute values of the cluster labels into account.<br>

Instead, it should assess whether the clustering <br>
- defines separations of the data similar to some ground truth set of classes<br>
or <br>
- satisfies some assumption, such as members of the same class being more similar to each other than to members of different classes according to some similarity metric.
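To see why the absolute label values must be ignored, here is a minimal sketch that compares a label-matching metric with a pair-based one. The label lists are illustrative values chosen for this example; `accuracy_score` and `rand_score` are the scikit-learn functions.

```python
from sklearn import metrics

labels_true = [0, 0, 0, 1, 1, 1]
# Same grouping as labels_true, but the two cluster ids are swapped
labels_pred = [1, 1, 1, 0, 0, 0]

# Accuracy compares label values position by position, so it reports total failure
accuracy = metrics.accuracy_score(labels_true, labels_pred)
print(accuracy)  # 0.0

# The Rand index only asks whether pairs of samples are grouped the same way
rand = metrics.rand_score(labels_true, labels_pred)
print(rand)  # 1.0
```

The grouping is identical in both lists, so a clustering metric should report perfect agreement even though no label value matches.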



## **Rand Index**

If the ground truth class assignments are `labels_true`<br>
and<br>
the clustering algorithm's assignments of the same samples are `labels_pred`,<br>

then the (adjusted or unadjusted) **Rand index is a function that measures the similarity of the two assignments, ignoring permutations:**



In [None]:
from sklearn import metrics

In [None]:
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

# 10 of the 15 sample pairs agree: 2 pairs are grouped together in both labelings and 8 are separated in both

In [None]:
metrics.rand_score(labels_true, labels_pred)
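The value returned by `rand_score` above can be reproduced by counting sample pairs directly. The sketch below is only an illustration of the definition: `rand_index_by_pairs` is a hypothetical helper written for this example, not a scikit-learn function.

```python
from itertools import combinations


def rand_index_by_pairs(labels_a, labels_b):
    """Fraction of sample pairs on which the two labelings agree."""
    agreeing = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        # A pair agrees if it is grouped together in both labelings
        # or separated in both labelings
        if same_in_a == same_in_b:
            agreeing += 1
        total += 1
    return agreeing / total


labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]
manual_rand = rand_index_by_pairs(labels_true, labels_pred)
print(manual_rand)  # 10 of 15 pairs agree, about 0.667
```

Because only "same group or not" is compared, renaming the cluster ids cannot change the result.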

The Rand index is not guaranteed to be close to 0.0 for a random labeling. The adjusted Rand index corrects for chance and provides such a baseline.

In [None]:
metrics.adjusted_rand_score(labels_true, labels_pred)
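The chance correction can be checked empirically. The adjusted score has the form (RI - expected RI) / (max RI - expected RI), so random labelings should average close to zero. The following is a rough simulation sketch; the sample size, number of clusters, number of trials, and seed are all arbitrary assumptions made for illustration.

```python
import numpy as np
from sklearn import metrics

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Assumed setup: 60 samples in 3 equal ground-truth classes
labels_true = np.repeat([0, 1, 2], 20)

rand_scores = []
adjusted_scores = []
for _ in range(200):
    # Random labeling into 3 clusters, independent of labels_true
    labels_random = rng.integers(0, 3, size=labels_true.size)
    rand_scores.append(metrics.rand_score(labels_true, labels_random))
    adjusted_scores.append(metrics.adjusted_rand_score(labels_true, labels_random))

mean_rand = float(np.mean(rand_scores))
mean_adjusted = float(np.mean(adjusted_scores))
print(f"mean Rand index:          {mean_rand:.3f}")
print(f"mean adjusted Rand index: {mean_adjusted:.3f}")
```

Under these assumptions the unadjusted mean stays well above zero, while the adjusted mean sits near zero.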

Change the predictions<br>
As with all clustering metrics, one can permute 0 and 1 in the predicted labels, rename 2 to 3, and **get the same score**

In [None]:
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [1, 1, 0, 0, 3, 3]
metrics.rand_score(labels_true, labels_pred)

In [None]:
metrics.adjusted_rand_score(labels_true, labels_pred)

Swapping the arguments does not change the scores, so they can be used as consensus measures:

In [None]:
metrics.rand_score(labels_pred, labels_true)

In [None]:
metrics.adjusted_rand_score(labels_pred, labels_true)

**Perfect labeling is scored 1.0**

In [None]:
labels_pred = labels_true[:]
metrics.rand_score(labels_true, labels_pred)


In [None]:
metrics.adjusted_rand_score(labels_true, labels_pred)

Poorly agreeing labels (e.g. independent labelings) have lower scores: <br>

- for the **adjusted Rand index** the score will be negative or close to zero.

- for the **unadjusted Rand index** the score, while lower, will not necessarily be close to zero.

In [None]:
labels_true = [0, 0, 0, 0, 0, 0, 1, 1]
labels_pred = [0, 1, 2, 3, 4, 5, 5, 6]
metrics.rand_score(labels_true, labels_pred)

# Only 11 of the 28 sample pairs agree (all of them separated in both labelings); no pair is grouped together in both

In [None]:
metrics.adjusted_rand_score(labels_true, labels_pred)

## **Mutual Information based scores**

Mutual Information is a function that measures the agreement of the two assignments, ignoring permutations.
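As a rough illustration of the quantity underlying these scores, the sketch below computes the raw (unnormalized, unadjusted) mutual information in nats from label co-occurrence counts. `mutual_information` is a hypothetical helper written for this example; the adjusted and normalized scores in scikit-learn rescale this raw value.

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Raw mutual information (natural log) between two labelings."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) )
        mi += (n_ab / n) * math.log(n * n_ab / (count_a[a] * count_b[b]))
    return mi


labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]
mi = mutual_information(labels_true, labels_pred)

# Permuting or renaming the predicted cluster ids leaves the value unchanged
mi_permuted = mutual_information(labels_true, [1, 1, 0, 0, 3, 3])
print(mi, mi_permuted)
```

Only the co-occurrence counts enter the computation, which is why the label names themselves do not matter.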

In [None]:
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

In [None]:
metrics.adjusted_mutual_info_score(labels_true, labels_pred)

You can permute 0 and 1 in the predicted labels, rename 2 to 3 and get the same score:

In [None]:
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [1, 1, 0, 0, 3, 3]
metrics.adjusted_mutual_info_score(labels_true, labels_pred)

Swapping the arguments does not change the score, so it can be used as a consensus measure:



In [None]:
metrics.adjusted_mutual_info_score(labels_pred, labels_true)

Perfect labeling is scored 1.0 by both the adjusted (AMI) and the normalized (NMI) mutual information scores:

In [None]:
labels_pred = labels_true[:]
metrics.adjusted_mutual_info_score(labels_true, labels_pred)

In [None]:
metrics.normalized_mutual_info_score(labels_true, labels_pred)

Bad labelings (e.g. independent labelings) have non-positive scores:

In [None]:
labels_true = [0, 1, 2, 0, 3, 4, 5, 1]
labels_pred = [1, 1, 0, 0, 2, 2, 2, 2]
metrics.adjusted_mutual_info_score(labels_true, labels_pred)
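The near-zero baseline applies to the adjusted score; the normalized score is not corrected for chance. The simulation below is a rough sketch of that difference, with the sample size, number of clusters, number of trials, and seed chosen arbitrarily for illustration.

```python
import numpy as np
from sklearn import metrics

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Assumed setup: 100 samples, two independent random labelings with 10 clusters each
n_samples, n_clusters = 100, 10

nmi_scores = []
ami_scores = []
for _ in range(50):
    labels_a = rng.integers(0, n_clusters, size=n_samples)
    labels_b = rng.integers(0, n_clusters, size=n_samples)
    nmi_scores.append(metrics.normalized_mutual_info_score(labels_a, labels_b))
    ami_scores.append(metrics.adjusted_mutual_info_score(labels_a, labels_b))

mean_nmi = float(np.mean(nmi_scores))
mean_ami = float(np.mean(ami_scores))
print(f"mean NMI: {mean_nmi:.3f}")
print(f"mean AMI: {mean_ami:.3f}")
```

With many clusters and few samples, independent labelings still share some mutual information by chance, so the normalized score stays clearly positive while the adjusted score averages near zero.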