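Evaluating cluster similarity is less straightforward than evaluating a supervised classifier, because cluster labels are arbitrary: renaming the clusters leaves the clustering itself unchanged. See the [scikit-learn documentation](https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation) for more. One approach is to compute the *pair* confusion matrix, which counts pairs of samples according to whether each of the two labelings places them in the same cluster.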

![pair-confusion.png](attachment:pair-confusion.png)

In [1]:
from sklearn.metrics.cluster import pair_confusion_matrix
from sklearn import metrics
import numpy as np


In [2]:
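C = pair_confusion_matrix([0, 0, 1, 1], [0, 0, 1, 2])
# Rows: true labeling (different / same cluster); columns: predicted labeling.
TN = C[0, 0]  # pairs separated in both labelings
FP = C[0, 1]  # pairs joined only in the predicted labeling
FN = C[1, 0]  # pairs joined only in the true labeling
TP = C[1, 1]  # pairs joined in both labelings
C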

array([[8, 0],
       [2, 2]])
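
To see where these counts come from, here is a minimal brute-force sketch that enumerates every ordered pair of samples and tallies whether the two labelings agree on grouping them. The helper name `pair_confusion_by_hand` is hypothetical (not part of scikit-learn), and the loop is only meant for tiny inputs.

```python
from itertools import permutations


def pair_confusion_by_hand(labels_true, labels_pred):
    """Hypothetical helper: tally ordered sample pairs into a 2x2 table."""
    counts = [[0, 0], [0, 0]]
    # permutations(..., 2) yields every ordered pair (i, j) with i != j,
    # so each unordered pair is counted twice.
    for i, j in permutations(range(len(labels_true)), 2):
        same_true = int(labels_true[i] == labels_true[j])
        same_pred = int(labels_pred[i] == labels_pred[j])
        counts[same_true][same_pred] += 1
    return counts


by_hand = pair_confusion_by_hand([0, 0, 1, 1], [0, 0, 1, 2])
print(by_hand)  # [[8, 0], [2, 2]]
```

The entries sum to 4 × 3 = 12, the number of ordered pairs among four samples.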

The elements of C also give the Fowlkes–Mallows index, another measure of cluster similarity: $\text{FMI} = \frac{\text{TP}}{\sqrt{(\text{TP} + \text{FP}) (\text{TP} + \text{FN})}}$. Every entry of C counts each pair twice (once per ordering), but that factor cancels in the ratio.
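
As a quick hand check, plugging in the counts from C above ($\text{TP} = 2$, $\text{FP} = 0$, $\text{FN} = 2$):

$$\text{FMI} = \frac{2}{\sqrt{(2 + 0)(2 + 2)}} = \frac{2}{\sqrt{8}} = \frac{1}{\sqrt{2}} \approx 0.7071$$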

In [3]:
fmi = TP / np.sqrt((TP + FP) * (TP + FN))
print(fmi)

# scikit-learn computes FMI directly; the two results agree up to floating-point rounding.
print(metrics.fowlkes_mallows_score([0, 0, 1, 1], [0, 0, 1, 2]))

0.7071067811865475

0.7071067811865476