# Clustering metrics

As opposed to classification, it is difficult to assess the quality of clustering results. First, a metric cannot depend on the label values, only on the goodness of the split. Second, we usually do not have the true labels of the observations when we use clustering.

There are *internal* and *external* goodness metrics. External metrics use the information about the known true split while internal metrics do not use any external information and assess the goodness of clusters based only on the initial data. The optimal number of clusters is usually defined with respect to some internal metrics. 

### Types of metrics
**Internal**
* Silhouette coefficient

**External**
* Adjusted Rand Index
* homogeneity
* V-measure

### Rand index adjusted for chance
The Rand Index computes a similarity measure between two clusterings by considering all pairs of samples and counting pairs that are assigned in the same or different clusters in the predicted and true clusterings.

Here, we assume that the true labels of the objects are known. This metric does not depend on the label values, only on how the data is split into clusters. Let $n$ be the number of observations in the sample. Let $a$ be the number of observation pairs with the same label that are located in the same cluster, and let $b$ be the number of observation pairs with different labels that are located in different clusters. The Rand Index can be calculated using the following formula: $$\text{RI} = \frac{2(a + b)}{n(n-1)}.$$ 
In other words, it evaluates the share of observation pairs for which the two splits (the initial one and the clustering result) are consistent. The Rand Index (RI) thus evaluates the similarity of two splits of the same sample. For this index to be close to zero for random clustering outcomes, whatever the value of $n$ and the number of clusters, it has to be adjusted for chance, hence the Adjusted Rand Index: $$\text{ARI} = \frac{\text{RI} - E[\text{RI}]}{\max(\text{RI}) - E[\text{RI}]}.$$
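
To make the pair counts $a$ and $b$ concrete, here is a minimal sketch that tallies them by brute force over all pairs. The function name `rand_index` is an illustrative helper written for this example, not a library function, and it assumes both labelings are plain Python sequences of the same length.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Plain (unadjusted) Rand Index from pair counts; illustrative helper."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("both labelings must cover the same observations")
    if n < 2:
        raise ValueError("at least two observations are needed to form a pair")

    a = 0  # pairs with the same label AND in the same cluster
    b = 0  # pairs with different labels AND in different clusters
    for i, j in combinations(range(n), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            a += 1
        elif not same_true and not same_pred:
            b += 1

    # n * (n - 1) / 2 is the total number of pairs
    return 2 * (a + b) / (n * (n - 1))


# 4 observations give 6 pairs; here a = 1 and b = 4, so RI = 5/6
print(rand_index([0, 0, 1, 2], [0, 0, 1, 1]))
```

The double loop is quadratic in $n$, which is fine for illustration but too slow for large samples.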

This metric is symmetric and does not depend on label permutations. Therefore, this index is a measure of similarity between different splits of a sample. $\text{ARI}$ takes on values in the $[-1, 1]$ range. Values close to zero indicate that the splits are independent, negative values indicate agreement worse than expected by chance, and positive values indicate that the splits are consistent (they match exactly when $\text{ARI} = 1$).
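
As a rough sketch of how the adjustment can be computed, the snippet below uses the contingency-table form of the index, in which the expectation is taken under random splits with fixed cluster sizes. This is an assumption about how $E[\text{RI}]$ is evaluated, and `adjusted_rand_index` is a hypothetical helper written for this example. It counts pairs placed together in both splits rather than RI itself; since RI is a linear function of that count for fixed cluster sizes, the adjusted value is the same.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """ARI from the contingency table; illustrative helper, not a library API."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # convention assumed here: a trivial sample counts as a perfect match

    cells = Counter(zip(labels_true, labels_pred))  # n_ij: class i, cluster j
    rows = Counter(labels_true)                     # class sizes
    cols = Counter(labels_pred)                     # cluster sizes

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # chance-level agreement
    maximum = (sum_rows + sum_cols) / 2          # upper bound of the index
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both splits are a single cluster
    return (sum_cells - expected) / (maximum - expected)


print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))  # 4/7
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # relabeled clusters still match
```

Swapping the two arguments or renaming the labels leaves the result unchanged, which illustrates the symmetry and permutation invariance described above.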

In [1]:
from sklearn.metrics import adjusted_rand_score
# Cluster labels are discrete; only the grouping matters, not the label values
labels_true = [0, 0, 1, 1]
labels_pred = [1, 1, 0, 0]
adjusted_rand_score(labels_true, labels_pred)

1.0

In [2]:
adjusted_rand_score([0, 0, 1, 2], [0, 0, 1, 1])  

0.57142857142857151

In [3]:
adjusted_rand_score([0, 0, 0, 0], [0, 1, 2, 3])

0.0

### Homogeneity, V-measure

Formally, these metrics are defined through the entropy function and the conditional entropy function, interpreting the sample splits as discrete distributions: $$h = 1 - \frac{H(C\mid K)}{H(C)}, \quad c = 1 - \frac{H(K\mid C)}{H(K)},$$
where $K$ is the clustering result and $C$ is the initial split. Therefore, $h$ (homogeneity) evaluates whether each cluster is composed of objects of the same class, and $c$ (completeness) measures how well objects of the same class fit into a single cluster. Neither metric is symmetric: swapping $C$ and $K$ turns $h$ into $c$. Both lie in the $[0, 1]$ range, and values closer to 1 indicate more accurate clustering results. Unlike $\text{ARI}$ or $\text{AMI}$ (Adjusted Mutual Information), these metrics are not adjusted for chance and thus depend on the number of clusters: a random clustering result will not score close to zero when the number of clusters is large and the number of objects is small. In such a case, it is more reasonable to use $\text{ARI}$. However, with a large number of observations (more than 100) and fewer than 10 clusters, this issue is less critical and can be ignored.
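
The two ratios can be evaluated directly from label counts. The following is a minimal sketch using only the standard library. The helper names (`entropy`, `conditional_entropy`, `homogeneity`, `completeness`) are chosen for this example, and returning 1.0 when the denominator entropy is zero is an assumed convention for the degenerate single-class case.

```python
from collections import Counter
from math import log


def entropy(labels):
    """H(labels), treating label frequencies as a discrete distribution."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def conditional_entropy(labels, given):
    """H(labels | given), computed from joint and marginal counts."""
    n = len(labels)
    joint = Counter(zip(labels, given))
    given_counts = Counter(given)
    return -sum(c / n * log(c / given_counts[g]) for (_, g), c in joint.items())


def homogeneity(classes, clusters):
    """h = 1 - H(C|K) / H(C); assumed to be 1.0 when there is a single class."""
    h_c = entropy(classes)
    return 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c


def completeness(classes, clusters):
    """c = 1 - H(K|C) / H(K), which is homogeneity with the roles swapped."""
    return homogeneity(clusters, classes)


classes = [0, 0, 1, 2]
clusters = [0, 0, 1, 1]
print(round(homogeneity(classes, clusters), 6))   # 0.666667: cluster 1 mixes two classes
print(round(completeness(classes, clusters), 6))  # 1.0: each class stays in one cluster
```

Calling the functions with the arguments swapped exchanges the two values, which is the asymmetry noted above.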

$V$-measure combines $h$ and $c$ as their harmonic mean:
$$v = 2\frac{hc}{h+c}.$$
It is symmetric and measures how consistent two clustering results are.
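
As a small worked example of the harmonic mean, the sketch below plugs in a few $(h, c)$ pairs. The function `v_measure` is a hypothetical helper for illustration, and returning 0.0 when both inputs are zero is an assumed convention to avoid dividing by zero.

```python
def v_measure(h, c):
    """Harmonic mean of homogeneity h and completeness c; illustrative helper."""
    if h + c == 0:
        return 0.0  # assumed convention when both scores are zero
    return 2 * h * c / (h + c)


print(round(v_measure(2 / 3, 1.0), 6))  # 0.8
print(round(v_measure(0.5, 1.0), 6))    # 0.666667
print(v_measure(1.0, 0.0))              # 0.0: one perfect score cannot offset a zero
```

Because the formula treats $h$ and $c$ identically, exchanging them does not change the result.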

### Homogeneity
The homogeneity score evaluates a cluster labeling against a ground truth.
A clustering result satisfies homogeneity if each of its clusters contains only data points that are members of a single class.
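
The definition can also be read as a simple yes/no check. The sketch below tests it directly, with `is_homogeneous` as a hypothetical helper written for this example; it does not compute the score itself, only whether the perfect-homogeneity condition holds.

```python
from collections import defaultdict


def is_homogeneous(classes, clusters):
    """True if every cluster contains members of exactly one class."""
    classes_in_cluster = defaultdict(set)
    for cls, cluster in zip(classes, clusters):
        classes_in_cluster[cluster].add(cls)
    return all(len(seen) == 1 for seen in classes_in_cluster.values())


print(is_homogeneous([0, 0, 1, 1], [0, 0, 1, 2]))  # True: splitting a class is allowed
print(is_homogeneous([0, 0, 1, 1], [0, 1, 0, 1]))  # False: every cluster mixes classes
```

Splitting one class across several clusters does not violate homogeneity, which is why over-segmented results can still score 1.0.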

In [4]:
from sklearn.metrics.cluster import homogeneity_score
homogeneity_score([0, 0, 1, 1], [1, 1, 0, 0])

1.0

In [5]:
print("%.6f" % homogeneity_score([0, 0, 1, 1], [0, 0, 1, 2]))
print("%.6f" % homogeneity_score([0, 0, 1, 1], [0, 1, 2, 3]))

1.000000
1.000000


In [7]:
print("%.6f" % homogeneity_score([0, 0, 1, 1], [0, 1, 0, 1]))
print("%.6f" % homogeneity_score([0, 0, 1, 1], [0, 0, 0, 0]))
# Try other label assignments to see how the score changes

0.000000
0.000000


### V-measure
The V-measure score evaluates a cluster labeling against a ground truth.
This score is identical to `normalized_mutual_info_score` with the `'arithmetic'` option for averaging.

The V-measure is the harmonic mean of homogeneity and completeness:
`v = 2 * (homogeneity * completeness) / (homogeneity + completeness)`

This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won’t change the score value in any way.

This metric is furthermore symmetric: switching `labels_true` with `labels_pred` will return the same score value. This can be useful for measuring the agreement of two independent label assignment strategies on the same dataset when the real ground truth is not known.

In [8]:
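from sklearn.metrics.cluster import v_measure_score
# Print both results; a notebook cell only displays its last bare expression
print("%.6f" % v_measure_score([0, 0, 1, 1], [0, 0, 1, 1]))
print("%.6f" % v_measure_score([0, 0, 1, 1], [1, 1, 0, 0]))

1.000000
1.000000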

In [9]:
print("%.6f" % v_measure_score([0, 0, 1, 2], [0, 0, 1, 1]))
print("%.6f" % v_measure_score([0, 1, 2, 3], [0, 0, 1, 1]))

0.800000
0.666667


In [10]:
print("%.6f" % v_measure_score([0, 0, 0, 0], [0, 1, 2, 3]))

0.000000
