# 🧩 Rand Index (RI) and Adjusted Rand Index (ARI)

## 1. Intuition

Both RI and ARI measure how similar two partitions of the same dataset are:

- **True labels**: ground truth classes
- **Predicted labels**: clustering output

They ignore the actual label names, only caring whether pairs of points are grouped together or not.

---

## 2. Pairwise relationships

For \( n \) samples, there are \( \binom{n}{2} \) possible pairs.

For each pair \((i, j)\), there are four cases:

| Type | True | Predicted | Count |
|------|-------|------------|-------|
| SS | same | same | \( a \) |
| DD | diff | diff | \( b \) |
| SD | same | diff | \( c \) |
| DS | diff | same | \( d \) |
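
The four counts can be obtained by brute force over all pairs. The following is a minimal sketch using only the standard library; the function name `pair_counts` and the sample labels are illustrative assumptions, not part of any library.

```python
from itertools import combinations


def pair_counts(true_labels, pred_labels):
    """Count pairs by agreement type: returns (a, b, c, d) = (SS, DD, SD, DS)."""
    a = b = c = d = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            a += 1  # SS: together in both partitions
        elif not same_true and not same_pred:
            b += 1  # DD: apart in both partitions
        elif same_true:
            c += 1  # SD: together in the truth, split by the clustering
        else:
            d += 1  # DS: apart in the truth, merged by the clustering
    return a, b, c, d


# Illustrative labels: 4 samples give 6 pairs in total
print(pair_counts([0, 0, 1, 1], [0, 0, 1, 2]))  # (1, 4, 1, 0)
```

This enumeration is quadratic in the number of samples, so it is meant for checking small examples rather than for large datasets.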

---

## 3. Rand Index (Unadjusted)

The Rand Index is simply the proportion of pairwise agreements:

$$
RI = \frac{a + b}{a + b + c + d} = \frac{a + b}{\binom{n}{2}}
$$

It measures **Observed / Max Possible**, since the maximum number of agreeing pairs is \(\binom{n}{2}\).

- Range: \([0, 1]\)
- 1 → perfect agreement
- 0 → total disagreement

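However, RI is not corrected for chance: random labelings typically score well above 0, because with several clusters most pairs are DD (separated in both partitions) and still count as agreements.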
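
To see how RI can stay well above 0 even when a clustering never keeps a same-class pair together, here is a minimal sketch. The helper `rand_index` and the labels are illustrative assumptions, and the function expects at least two samples.

```python
from itertools import combinations


def rand_index(true_labels, pred_labels):
    """Fraction of pairs on which the two partitions agree (SS or DD)."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agreements = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 1, 2, 0, 1, 2]  # splits every true class across two clusters
print(rand_index(true_labels, pred_labels))  # 0.6
```

Here no pair is grouped together in both partitions (a = 0), yet 9 of the 15 pairs are DD, so RI is still 0.6.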

---

## 4. Contingency Table

Let \( n_{ij} \) be the number of samples in both true class \( i \) and predicted cluster \( j \):

| True / Pred | C₁ | C₂ | ... | Row Sum |
|--------------|----|----|------|----------|
| T₁ | \( n_{11} \) | \( n_{12} \) | ... | \( a_1 \) |
| T₂ | \( n_{21} \) | \( n_{22} \) | ... | \( a_2 \) |
| ... | ... | ... | ... | ... |
| Column Sum | \( b_1 \) | \( b_2 \) | ... | \( n \) |

Here:
- \( a_i = \sum_j n_{ij} \) (true class sizes)
- \( b_j = \sum_i n_{ij} \) (predicted cluster sizes)
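
A contingency table can be built in one pass over the labels. This is a minimal sketch with `collections.Counter`; the function name `contingency` and the example labels are assumptions made for illustration.

```python
from collections import Counter


def contingency(true_labels, pred_labels):
    """Return (n_ij, a_i, b_j) as Counters keyed by label."""
    n_ij = Counter(zip(true_labels, pred_labels))  # cell counts
    a = Counter(true_labels)                       # row sums (true class sizes)
    b = Counter(pred_labels)                       # column sums (cluster sizes)
    return n_ij, a, b


n_ij, a, b = contingency(["x", "x", "y", "y", "y"], [0, 0, 0, 1, 1])
print(dict(n_ij))  # {('x', 0): 2, ('y', 0): 1, ('y', 1): 2}
print(dict(a))     # {'x': 2, 'y': 3}
print(dict(b))     # {0: 3, 1: 2}
```

Only non-empty cells are stored, which is convenient because empty cells contribute nothing to the sums used later.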

---

## 5. Observed Index

Observed number of “same–same” pairs in both partitions:

$$
\text{Observed Index} = \sum_{ij} \binom{n_{ij}}{2}
$$

This counts how many sample pairs are in the same true class **and** the same predicted cluster.

---

## 6. Expected Index

We want to correct for **chance agreement**.
Even random clusterings will have some overlap between within-class and within-cluster pairs.

Probability that two samples are in the same true class:

$$
P_{\text{true}} = \frac{\sum_i \binom{a_i}{2}}{\binom{n}{2}}
$$

Probability that two samples are in the same predicted cluster:

$$
P_{\text{pred}} = \frac{\sum_j \binom{b_j}{2}}{\binom{n}{2}}
$$

Assuming independence, the expected probability they’re same–same in both:

$$
P_{\text{true}} \times P_{\text{pred}}
$$

Hence, expected number of such pairs:

$$
E[\text{Index}] =
\binom{n}{2} \, P_{\text{true}} \, P_{\text{pred}} =
\frac{
  \left( \sum_i \binom{a_i}{2} \right)
  \left( \sum_j \binom{b_j}{2} \right)
}{ \binom{n}{2} }
$$

---

## 7. Max Possible Index

The largest possible number of same–same pairs happens when both partitions are perfectly aligned.
Each partition defines a set of within-cluster pairs:

- True: \( |S_{\text{true}}| = \sum_i \binom{a_i}{2} \)
- Pred: \( |S_{\text{pred}}| = \sum_j \binom{b_j}{2} \)

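The intersection of these two sets can never exceed \( \min(|S_{\text{true}}|, |S_{\text{pred}}|) \). ARI uses the arithmetic mean of the two sizes as its normalizer, which is at least as large as that minimum and equals the overlap exactly when the partitions are identical: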

$$
\text{Max Index} =
\frac{1}{2}
\left[
  \sum_i \binom{a_i}{2} +
  \sum_j \binom{b_j}{2}
\right]
$$

This ensures symmetry between the two partitions and bounds the overlap.

---

## 8. Adjusted Rand Index (ARI)

We normalize by removing the expected random baseline and scaling by the maximum improvement possible:

$$
ARI =
\frac{
  \text{Observed Index} - E[\text{Index}]
}{
  \text{Max Index} - E[\text{Index}]
}
$$

Expanding all terms:

$$
ARI =
\frac{
  \sum_{ij} \binom{n_{ij}}{2}
  -
  \frac{ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} }{ \binom{n}{2} }
}{
  \frac{1}{2}
  \left[
    \sum_i \binom{a_i}{2}
    +
    \sum_j \binom{b_j}{2}
  \right]
  -
  \frac{ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} }{ \binom{n}{2} }
}
$$
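
The expanded formula can be evaluated directly from the contingency counts. The following is a minimal sketch using only the standard library; the name `adjusted_rand_index` is illustrative, at least two samples are assumed, and returning 1.0 in the degenerate case (both partitions are all singletons or both are a single cluster) is an assumption of this sketch.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """ARI from contingency counts: (Observed - E) / (Max - E)."""
    n = len(true_labels)
    n_ij = Counter(zip(true_labels, pred_labels))
    a = Counter(true_labels)
    b = Counter(pred_labels)

    observed = sum(comb(v, 2) for v in n_ij.values())
    sum_a = sum(comb(v, 2) for v in a.values())
    sum_b = sum(comb(v, 2) for v in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)

    if max_index == expected:
        # Degenerate case: the partitions are identical and trivial.
        # Assumption of this sketch: report perfect agreement.
        return 1.0
    return (observed - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

For the labels `[0, 0, 1, 1, 2, 2]` and `[0, 1, 2, 0, 1, 2]`, the observed index is 0, the expected index is 9/15 = 0.6, and the max index is 3, so the score is (0 − 0.6) / (3 − 0.6) = −0.25.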

---

## 9. Relationship between RI and ARI

| Concept | Formula | Description |
|----------|----------|-------------|
| **Unadjusted RI** | \( \frac{a + b}{\binom{n}{2}} \) | Proportion of agreeing pairs (SS and DD), with no chance correction |
| **Adjusted RI (ARI)** | \( \frac{\text{Observed Index} - E[\text{Index}]}{\text{Max Index} - E[\text{Index}]} \) | Subtracts chance agreement; values lie in [−0.5, 1] |

- \( ARI = 1 \): perfect alignment
- \( ARI = 0 \): expected similarity for random labelings
- \( ARI < 0 \): worse than random

---

## 10. Intuition Summary

- **Observed**: actual same–same pairs in both clusterings
- **Expected**: what you’d expect just by random overlap
- **Max Possible**: upper bound if they matched perfectly

ARI therefore tells you how much *better than chance* the clustering agreement is.

---

## 11. Scikit-learn usage

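```python
from sklearn.metrics import rand_score, adjusted_rand_score

# Illustrative labels; replace with your own arrays
true_labels = [0, 0, 1, 1]
predicted_labels = [0, 0, 1, 2]

rand_score(true_labels, predicted_labels)
adjusted_rand_score(true_labels, predicted_labels)
```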


# 🎯 Homogeneity, Completeness, and V-Measure

## 1. Motivation

When we have **ground truth labels** \( K \) and **predicted cluster labels** \( C \),
we can evaluate clustering quality in two complementary ways, plus a score that combines them:

1. **Homogeneity** — each cluster should contain *only members of a single class*
   (no mixing inside clusters).

2. **Completeness** — all members of a given class should fall into *the same cluster*
   (no splitting of a class across multiple clusters).

3. **V-Measure** — a balanced harmonic mean of both.

These are **information-theoretic** metrics derived from **entropy** and **conditional entropy**.

---

## 2. Entropy Setup

Let:

- \( n \): total number of samples
- \( K \): true labels (classes)
- \( C \): predicted cluster labels
- \( n_{k,c} \): number of samples with true label \( k \) and cluster label \( c \)
- \( n_k = \sum_c n_{k,c} \): total samples in true class \( k \)
- \( n_c = \sum_k n_{k,c} \): total samples in predicted cluster \( c \)

Then:

$$
H(K) = - \sum_k \frac{n_k}{n} \log \frac{n_k}{n}
$$

$$
H(C) = - \sum_c \frac{n_c}{n} \log \frac{n_c}{n}
$$

and the **conditional entropies**:

$$
H(K|C) = - \sum_c \sum_k \frac{n_{k,c}}{n} \log \frac{n_{k,c}}{n_c}
$$

$$
H(C|K) = - \sum_k \sum_c \frac{n_{k,c}}{n} \log \frac{n_{k,c}}{n_k}
$$

---

## 3. Homogeneity

If every cluster contains only a single true class,
then \( H(K|C) = 0 \).

Thus we define:

$$
\text{Homogeneity} = 1 - \frac{H(K|C)}{H(K)}
$$

- 1 → perfectly homogeneous (pure clusters)
- 0 → cluster assignment gives no info about true labels

---

## 4. Completeness

If all members of each class fall into a single cluster,
then \( H(C|K) = 0 \).

We define:

$$
\text{Completeness} = 1 - \frac{H(C|K)}{H(C)}
$$

- 1 → perfectly complete (no class split)
- 0 → cluster labels independent of true labels
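
Both scores follow directly from the entropy definitions. Below is a minimal sketch using natural logarithms and only the standard library; the function names are illustrative, and treating a zero-entropy labeling as a score of 1.0 is an assumption of this sketch.

```python
from collections import Counter
from math import log


def entropy(labels):
    """H(labels) from label frequencies."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given); empty cells are skipped, matching 0 * log 0 = 0."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_sizes = Counter(given)
    return -sum((cnt / n) * log(cnt / given_sizes[g]) for (_, g), cnt in joint.items())


def homogeneity(true_labels, pred_labels):
    h_k = entropy(true_labels)
    # Assumption: with a single true class (H(K) = 0) the score is taken as 1.0
    return 1.0 if h_k == 0 else 1 - conditional_entropy(true_labels, pred_labels) / h_k


def completeness(true_labels, pred_labels):
    # Completeness is homogeneity with the roles of the two labelings swapped
    return homogeneity(pred_labels, true_labels)


# Every sample in its own cluster: perfectly pure, but each class is split
print(homogeneity([0, 0, 1, 1], [0, 1, 2, 3]))   # 1.0
print(completeness([0, 0, 1, 1], [0, 1, 2, 3]))
```

In the usage example the clusters are trivially pure, while completeness drops to about 0.5 because \( H(C|K) = \log 2 \) and \( H(C) = \log 4 \).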

---

## 5. Example Intuition

Consider the following case:

| True | Predicted cluster | Comment |
|------|--------------------|----------|
| A | 1 | good |
| A | 1 | good |
| A | 2 | split → lowers completeness |
| B | 2 | good |
| B | 2 | good |
| C | 3 | good |
| C | 3 | good |
| C | 3 | good |

Interpretation:

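- Clusters **1** and **3** each contain a single class, but cluster **2** mixes A and B, so *homogeneity* is below 1.
- True class **A** is spread across clusters 1 and 2, so *completeness* is also below 1.
- The \( V \)-measure always lies between the two scores; in this example they happen to coincide (both ≈ 0.78), so \( V \) takes the same value.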

This example helps visualize **why** we calculate \( H(K|C) \) (for cluster purity)
and \( H(C|K) \) (for class completeness).
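
The mixing and splitting in this example can be located mechanically by tabulating which classes appear in each cluster and which clusters each class lands in. This is a minimal diagnostic sketch; the variable names are illustrative assumptions.

```python
from collections import Counter, defaultdict

true_labels = ["A", "A", "A", "B", "B", "C", "C", "C"]
pred_labels = [1, 1, 2, 2, 2, 3, 3, 3]

classes_in_cluster = defaultdict(Counter)  # cluster -> class counts
clusters_of_class = defaultdict(Counter)   # class -> cluster counts
for k, c in zip(true_labels, pred_labels):
    classes_in_cluster[c][k] += 1
    clusters_of_class[k][c] += 1

# Clusters holding more than one class hurt homogeneity
mixed_clusters = [c for c, counts in classes_in_cluster.items() if len(counts) > 1]
# Classes spread over more than one cluster hurt completeness
split_classes = [k for k, counts in clusters_of_class.items() if len(counts) > 1]

print(mixed_clusters)  # [2]
print(split_classes)   # ['A']
```

The single misplaced A sample is responsible for both effects: it makes cluster 2 impure and splits class A.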

---

## 6. V-Measure

To balance both metrics, we define:

$$
V_\beta =
(1 + \beta)
\frac{
  \text{Homogeneity} \times \text{Completeness}
}{
  (\beta \times \text{Homogeneity}) + \text{Completeness}
}
$$

When \( \beta = 1 \), it becomes the **harmonic mean**:

$$
V = 2
\frac{
  \text{Homogeneity} \times \text{Completeness}
}{
  \text{Homogeneity} + \text{Completeness}
}
$$

- \( \beta > 1 \): weight completeness more
- \( \beta < 1 \): weight homogeneity more
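
The effect of \( \beta \) is easy to check numerically. Here is a minimal sketch of the formula above; the function name `v_measure` and the choice to return 0.0 when the denominator vanishes are assumptions of this sketch.

```python
def v_measure(homogeneity, completeness, beta=1.0):
    """Weighted harmonic mean of homogeneity and completeness."""
    denominator = beta * homogeneity + completeness
    if denominator == 0:
        return 0.0  # assumption: both scores are zero, so report zero
    return (1 + beta) * homogeneity * completeness / denominator


h, c = 0.5, 1.0  # illustrative scores: impure but complete clustering
print(v_measure(h, c))            # harmonic mean, about 0.667
print(v_measure(h, c, beta=2.0))  # 0.75: completeness weighted more
print(v_measure(h, c, beta=0.5))  # 0.6: homogeneity weighted more
```

Because completeness is the higher score in this example, raising \( \beta \) pulls the result up and lowering it pulls the result down.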

---

## 7. Key Interpretations

| Metric | Measures | Perfect when | Uses |
|---------|-----------|---------------|------|
| **Homogeneity** | cluster purity | each cluster contains only one class | detect over-mixed clusters |
| **Completeness** | class compactness | all class members share one cluster | detect over-split classes |
| **V-Measure** | trade-off between both | both pure and complete | balanced evaluation |

---

## 8. Relationship to Entropy & Mutual Information

- \( H(K|C) \) quantifies *how impure clusters are*.
- \( H(C|K) \) quantifies *how fragmented classes are*.
- Both derived from conditional entropy, normalized by total entropy to give values in \([0,1]\).

With \( \beta = 1 \), the V-measure equals **normalized mutual information (NMI)** with the arithmetic-mean normalizer, since Homogeneity \( = MI / H(K) \) and Completeness \( = MI / H(C) \).

---

## 9. Scikit-learn Usage

```python
from sklearn.metrics import homogeneity_score, completeness_score, v_measure_score

# Illustrative labels; replace with your own arrays
true_labels = [0, 0, 1, 1]
predicted_labels = [0, 0, 1, 2]

homogeneity_score(true_labels, predicted_labels)
completeness_score(true_labels, predicted_labels)
v_measure_score(true_labels, predicted_labels)
```


# 🧩 Mutual Information (MI), Normalized MI (NMI), and Adjusted MI (AMI)

## 1. Intuition

Mutual Information (MI) measures how much *knowing the predicted cluster labels* \( C \)
reduces our uncertainty about the *true labels* \( K \).

- If \( C \) and \( K \) are **independent** → no information gain → \( MI = 0 \).
- If \( C \) perfectly predicts \( K \) → complete information overlap → \( MI \) is maximal.

In clustering evaluation, we measure how much the two labelings share information.

---

## 2. Theoretical Definition

From information theory:

$$
I(K, C) = \sum_i \sum_j P(i, j) \log \frac{P(i, j)}{P(i) P(j)}
$$

- \( P(i, j) \): joint probability of true class \( i \) and cluster \( j \)
- \( P(i) \), \( P(j) \): marginal probabilities

If \( P(i, j) = P(i) P(j) \) (independent), then \( I(K, C) = 0 \).

---

## 3. Empirical Form (Using Counts)

We have empirical counts:

- \( n_{ij} \): number of samples in both true class \( i \) and predicted cluster \( j \)
- \( n_i = \sum_j n_{ij} \): number in class \( i \)
- \( n_j = \sum_i n_{ij} \): number in cluster \( j \)
- \( n \): total samples

We estimate:

$$
P(i, j) = \frac{n_{ij}}{n}, \quad P(i) = \frac{n_i}{n}, \quad P(j) = \frac{n_j}{n}
$$

Substitute into the definition:

$$
MI(K, C)
= \sum_i \sum_j
\frac{n_{ij}}{n}
\log
\frac{
  \frac{n_{ij}}{n}
}{
  \frac{n_i}{n} \frac{n_j}{n}
}
$$

---

## 4. Simplifying the Expression

Simplify the argument of the logarithm:

$$
\frac{
  \frac{n_{ij}}{n}
}{
  \frac{n_i}{n} \frac{n_j}{n}
}
= \frac{n_{ij}/n}{(n_i n_j)/n^2}
= \frac{n_{ij} \, n}{n_i \, n_j}
$$

Thus:

$$
MI(K, C)
= \sum_i \sum_j
\frac{n_{ij}}{n}
\log
\frac{n_{ij} \, n}{n_i \, n_j}
$$
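
The count-based expression translates almost line for line into code. This is a minimal sketch with natural logarithms; the name `mutual_information` is illustrative, and only non-empty cells are summed (consistent with treating \( 0 \log 0 \) as 0).

```python
from collections import Counter
from math import log


def mutual_information(true_labels, pred_labels):
    """MI = sum_ij (n_ij / n) * log(n_ij * n / (n_i * n_j))."""
    n = len(true_labels)
    n_ij = Counter(zip(true_labels, pred_labels))
    n_i = Counter(true_labels)
    n_j = Counter(pred_labels)
    return sum(
        (cnt / n) * log(cnt * n / (n_i[i] * n_j[j]))
        for (i, j), cnt in n_ij.items()
    )


print(mutual_information([0, 0, 1, 1], [1, 1, 0, 0]))  # log(2): identical partitions
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0: independent labelings
```

In the second call every cell has \( n_{ij} n = n_i n_j \), so each logarithm is \( \log 1 = 0 \).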

---

## 5. Why There Is an \( n \) in the Numerator

That \( n \) appears because we converted from probabilities to counts.

Originally, we had a *ratio of probabilities*:

$$
\frac{P(i, j)}{P(i) P(j)}
$$

Substituting their empirical forms:

$$
\frac{n_{ij}/n}{(n_i/n)(n_j/n)} = \frac{n_{ij} n}{n_i n_j}
$$

The \( n \) ensures the ratio is **dimensionless** and correctly scales the joint probability
to what would be expected under independence.

---

## 6. Alternative Form

You may also see:

$$
MI(K, C) = \frac{1}{n} \sum_{i,j} n_{ij} \log \frac{n_{ij} n}{n_i n_j}
$$

Both forms are equivalent because:

$$
\frac{n_{ij}}{n} \log(\cdot) = \frac{1}{n} n_{ij} \log(\cdot)
$$

---

## 7. Relation to Entropy

MI can also be expressed as:

$$
MI(K, C) = H(K) + H(C) - H(K, C)
$$

or equivalently,

$$
MI(K, C) = H(K) - H(K|C) = H(C) - H(C|K)
$$

This means MI quantifies the **reduction in uncertainty** of one variable when the other is known.
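
The entropy form gives a second way to compute MI, which is useful as a cross-check. In this minimal sketch the joint entropy \( H(K, C) \) is obtained by treating each (class, cluster) pair as a single symbol; the function names are illustrative assumptions.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of hashable labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mi_from_entropies(k_labels, c_labels):
    """MI = H(K) + H(C) - H(K, C)."""
    joint = list(zip(k_labels, c_labels))  # each pair acts as one joint symbol
    return entropy(k_labels) + entropy(c_labels) - entropy(joint)


def conditional_entropy(k_labels, c_labels):
    """H(K | C) = H(K, C) - H(C), the chain rule for entropy."""
    return entropy(list(zip(k_labels, c_labels))) - entropy(c_labels)


k = [0, 0, 1, 1, 2, 2]
c = [0, 0, 1, 1, 1, 1]  # merges classes 1 and 2
print(mi_from_entropies(k, c))
print(entropy(k) - conditional_entropy(k, c))  # same value up to rounding
```

Both printed values agree up to floating-point rounding, which mirrors the identity \( MI = H(K) - H(K|C) \).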

---

## 8. Normalized Mutual Information (NMI)

Raw MI depends on the scale of entropies, so we normalize it:

$$
NMI(K, C) = \frac{MI(K, C)}{\sqrt{H(K) \, H(C)}}
$$

Range: \([0, 1]\)
- 1 → identical partitions
- 0 → completely independent

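Other normalizers (the arithmetic mean, minimum, or maximum of the two entropies) are also symmetric in \( K \) and \( C \); the geometric mean is one common choice. scikit-learn's `normalized_mutual_info_score` defaults to the arithmetic mean (`average_method="arithmetic"`).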
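
How much the choice of normalizer matters can be seen on a small example. The following is a minimal sketch; the function name `nmi`, its `average` parameter, and the handling of zero-entropy labelings are assumptions of this sketch rather than a library API.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(true_labels, pred_labels, average="geometric"):
    """MI divided by a chosen mean of H(K) and H(C)."""
    h_k, h_c = entropy(true_labels), entropy(pred_labels)
    mi = h_k + h_c - entropy(list(zip(true_labels, pred_labels)))
    if h_k == 0 and h_c == 0:
        return 1.0  # assumption: two single-cluster labelings match perfectly
    normalizer = {
        "geometric": sqrt(h_k * h_c),
        "arithmetic": (h_k + h_c) / 2,
        "min": min(h_k, h_c),
        "max": max(h_k, h_c),
    }[average]
    return mi / normalizer if normalizer > 0 else 0.0


k, c = [0, 0, 1, 1], [0, 1, 2, 3]  # every sample in its own cluster
print(nmi(k, c, "geometric"))   # about 0.707
print(nmi(k, c, "arithmetic"))  # about 0.667
```

Here \( MI = H(K) = \log 2 \) and \( H(C) = \log 4 \), so the geometric normalizer gives \( 1/\sqrt{2} \) and the arithmetic one gives \( 2/3 \).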

---

## 9. Adjusted Mutual Information (AMI)

Random clusterings can still yield positive MI values.
To correct for that, we adjust for the expected MI under independence:

$$
AMI =
\frac{
  MI(K, C) - E[MI(K, C)]
}{
  \max(H(K), H(C)) - E[MI(K, C)]
}
$$

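where \( E[MI(K, C)] \) is the **expected mutual information** if \( K \) and \( C \)
were random labelings with the same cluster sizes (labels permuted at random).
The \( \max \) normalizer shown here is one variant; other normalizers, such as the arithmetic mean of \( H(K) \) and \( H(C) \) that scikit-learn's `adjusted_mutual_info_score` uses by default, are also common.

Range: at most 1, and it can be negative
- 1 → perfect agreement
- 0 → random labeling baseline
- < 0 → worse than random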
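
The expected MI can be computed exactly by treating each cell count as hypergeometric under random permutation of the labels with fixed cluster sizes. The following is a minimal sketch of AMI with the \( \max \) normalizer from the formula above; the function names, the hypergeometric formulation, and the 1.0 returned when the denominator vanishes are assumptions of this sketch, and the triple loop is only practical for small inputs.

```python
from collections import Counter
from math import comb, log


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def expected_mi(a_sizes, b_sizes, n):
    """E[MI] when cell counts follow a hypergeometric law given the margins."""
    total = 0.0
    for a in a_sizes:
        for b in b_sizes:
            for k in range(max(1, a + b - n), min(a, b) + 1):
                # Probability that a class of size a and a cluster of size b share k samples
                prob = comb(b, k) * comb(n - b, a - k) / comb(n, a)
                total += prob * (k / n) * log(n * k / (a * b))
    return total


def ami(true_labels, pred_labels):
    n = len(true_labels)
    h_k, h_c = entropy(true_labels), entropy(pred_labels)
    mi = h_k + h_c - entropy(list(zip(true_labels, pred_labels)))
    e_mi = expected_mi(Counter(true_labels).values(), Counter(pred_labels).values(), n)
    denominator = max(h_k, h_c) - e_mi
    if abs(denominator) < 1e-15:
        return 1.0  # assumption: trivial labelings count as perfect agreement
    return (mi - e_mi) / denominator


print(ami([0, 0, 1, 1], [1, 1, 0, 0]))  # approximately 1.0
print(ami([0, 0, 1, 1], [0, 1, 0, 1]))  # approximately -0.5
```

For two balanced 2+2 labelings of four samples, one of the three possible pairings matches exactly (MI = \( \log 2 \)) and the other two give MI = 0, so \( E[MI] = \log 2 / 3 \); the second call therefore yields \( (0 - \log 2/3) / (\log 2 - \log 2/3) = -0.5 \).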

---

## 10. Intuitive Analogy

| Concept | Meaning |
|----------|----------|
| \( H(K) \) | Uncertainty in true labels |
| \( H(K|C) \) | Remaining uncertainty after knowing clusters |
| \( MI = H(K) - H(K|C) \) | Information gained about true labels from clustering |
| \( NMI \) | Rescales MI to 0–1 |
| \( AMI \) | Corrects MI for chance overlap |

---

## 11. Summary Table

| Metric | Formula | Range | Meaning |
|---------|----------|--------|----------|
| **MI** | \( \sum_{ij} \frac{n_{ij}}{n} \log \frac{n_{ij} n}{n_i n_j} \) | ≥ 0 | Shared information between partitions |
| **NMI** | \( \frac{MI}{\sqrt{H(K)H(C)}} \) | [0, 1] | Scale-free version of MI |
| **AMI** | \( \frac{MI - E[MI]}{\max(H(K), H(C)) - E[MI]} \) | ≤ 1 (can be negative) | Chance-adjusted MI |

---

## 12. When to Use

| Scenario | Best Metric | Reason |
|-----------|--------------|---------|
| Compare clusterings on same dataset | **NMI** | Scale-free, symmetric |
| Compare across datasets / cluster counts | **AMI** | Removes random-labeling bias |
| Want raw info-theoretic score | **MI** | Shows total shared information |

---

✅ **Summary Insight**

That “\( n \)” inside the logarithm is not arbitrary — it’s the artifact of moving
from the probabilistic definition
\( P(i,j) / [P(i)P(j)] \)
to its empirical count-based version.

It ensures that mutual information reflects how much more (or less) often a true–predicted pair co-occurs than expected by pure chance.


# 🧩 Fowlkes–Mallows Index (FMI)

## 1. Intuition

The **Fowlkes–Mallows Index (FMI)** measures the similarity between two clusterings
by comparing them as a **pairwise binary classification task**.

Each pair of samples \((x_i, x_j)\) is treated as a separate data point:

| Pair type | “Ground truth” label | “Prediction” label |
|------------|----------------------|--------------------|
| Same true class | 1 (positive) | ? |
| Different true class | 0 (negative) | ? |

The clustering algorithm "predicts" whether a pair belongs to the same cluster (1) or not (0).

Thus, FMI behaves like an **F1-style score for pairwise clustering consistency**, using the geometric rather than the harmonic mean of precision and recall.

---

## 2. Pairwise Confusion Matrix

For all pairs of samples, we define:

| Term | Meaning | Analogy |
|------|----------|---------|
| **TP (True Positive)** | Same true class *and* same predicted cluster | Correctly grouped pairs |
| **FP (False Positive)** | Different true class but same predicted cluster | Wrongly grouped pairs |
| **FN (False Negative)** | Same true class but different predicted clusters | Wrongly separated pairs |
| **TN (True Negative)** | Different true class and different predicted clusters | Correctly separated pairs |

Unlike the Rand Index, FMI ignores **TN** pairs because they dominate in large datasets
and carry less useful information about cluster quality.

---

## 3. Pairwise Precision and Recall

We define **pairwise precision** and **pairwise recall** as:

$$
P = \frac{TP}{TP + FP} = \frac{a}{a + b}
$$

$$
R = \frac{TP}{TP + FN} = \frac{a}{a + c}
$$

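where (note that \( b \) and \( c \) are redefined here and differ from the pair-count table in the Rand Index section, where \( b \) counted DD pairs):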

- \( a = TP \): number of pairs that are both same-class and same-cluster
- \( b = FP \): pairs in the same cluster but different classes
- \( c = FN \): pairs in the same class but different clusters

Interpretation:

- **Precision (P)**: of all pairs clustered together, how many truly belong together?
- **Recall (R)**: of all true same-class pairs, how many are clustered together?

---

## 4. Fowlkes–Mallows Index Definition

The FMI is the **geometric mean** of precision and recall:

$$
FMI = \sqrt{P \times R}
$$

or equivalently in terms of pair counts:

$$
FMI = \frac{a}{\sqrt{(a + b)(a + c)}}
$$
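
FMI can be computed without enumerating pairs, because \( TP \), \( TP + FP \), and \( TP + FN \) are sums of binomial coefficients over the contingency cells, the cluster sizes, and the class sizes respectively. This is a minimal sketch; the name `fowlkes_mallows` and the 0.0 returned when either partition has no within-group pairs are assumptions of this sketch.

```python
from collections import Counter
from math import comb, sqrt


def fowlkes_mallows(true_labels, pred_labels):
    """FMI = TP / sqrt((TP + FP) * (TP + FN)) from contingency counts."""
    tp = sum(comb(v, 2) for v in Counter(zip(true_labels, pred_labels)).values())
    same_cluster = sum(comb(v, 2) for v in Counter(pred_labels).values())  # TP + FP
    same_class = sum(comb(v, 2) for v in Counter(true_labels).values())    # TP + FN
    if same_cluster == 0 or same_class == 0:
        return 0.0  # assumption: no positive pairs to score
    return tp / sqrt(same_cluster * same_class)


print(fowlkes_mallows([0, 0, 1, 1], [0, 0, 1, 2]))  # 1 / sqrt(2), about 0.707
```

In this usage example precision is 1 (the only same-cluster pair is correct) and recall is 1/2 (one of the two same-class pairs is recovered), so the geometric mean is \( \sqrt{0.5} \).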

---

## 5. Why Geometric Mean?

The geometric mean ensures that both precision and recall contribute symmetrically.
It penalizes strong imbalance between the two while remaining scale-free and symmetric.

If either precision or recall is zero, the whole FMI becomes zero.

---

## 6. Range and Interpretation

- Range: \( [0, 1] \)
  - \( 1 \): perfect clustering (every true pair grouped correctly)
  - \( 0 \): no pairwise match
- Symmetric: swapping true ↔ predicted labels does not change FMI
- Not inflated by TN pairs when there are many clusters (unlike plain RI), although it is not corrected for chance

---

## 7. Example

Suppose we have the following counts:

| Symbol | Meaning | Count |
|---------|----------|-------|
| \( a \) | same-class & same-cluster | 30 |
| \( b \) | different-class but same-cluster | 10 |
| \( c \) | same-class but different-cluster | 20 |

Then:

$$
P = \frac{30}{30 + 10} = 0.75, \quad
R = \frac{30}{30 + 20} = 0.6
$$

and

$$
FMI = \sqrt{0.75 \times 0.6} = \sqrt{0.45} \approx 0.67
$$

---

## 8. Relation to Rand and Adjusted Rand Indices

| Metric | Includes TN? | Analogy | Focus |
|---------|--------------|---------|--------|
| **Rand Index (RI)** | ✅ yes | Accuracy over all pairs | Overall agreement |
| **Adjusted Rand Index (ARI)** | ✅ yes (chance corrected) | Adjusted accuracy | Chance-adjusted consistency |
| **Fowlkes–Mallows (FMI)** | ❌ no | F1-like score on “same-cluster” pairs (geometric mean) | Precision–recall balance |

---

## 9. Conceptual Summary

| Concept | Analogy | Formula |
|----------|----------|----------|
| Pairwise Precision | Classification precision | \( P = \frac{a}{a + b} \) |
| Pairwise Recall | Classification recall | \( R = \frac{a}{a + c} \) |
| FMI | F1-like score for clustering | \( FMI = \sqrt{P \times R} \) |

---

✅ **Summary Insight**

FMI views clustering as a **binary classification** over pairs of samples:

- “Same cluster?” is the prediction.
- “Same true class?” is the ground truth.

It balances **pairwise precision and recall**,
giving a symmetric, interpretable measure of how well the clusters match the real classes.


# 🧭 Choosing the Right Supervised Clustering Metric

Supervised clustering metrics require **ground truth labels** to compare
against predicted cluster assignments.

They can be grouped into three main families:

1. **Pair-based metrics** — compare pairs of samples.
2. **Information-theoretic metrics** — measure shared information.
3. **Entropy-based purity metrics** — measure cluster purity and completeness.

---

## 1. Pair-based Metrics

### 🧩 Adjusted Rand Index (ARI)

- **Measures:** overall pairwise agreement, adjusted for random chance.
- **Formula:**

  $$
  ARI =
  \frac{
    \text{Index} - E[\text{Index}]
  }{
    \text{Max Index} - E[\text{Index}]
  }
  $$

- **Intuition:** evaluates how consistent pair relationships are between true and predicted partitions.
- **Range:** \( [-0.5, 1] \); about 0 for random labelings, negative when agreement is below chance
- **Best for:** comparing clustering results when chance similarity must be removed.
- **Strengths:** chance correction, symmetric, robust.
- **Weaknesses:** naive pair enumeration is quadratic in \( n \), although computing from the contingency table avoids this.

✅ *Use ARI when comparing algorithms or assessing true agreement structurally.*

---

### 🧩 Fowlkes–Mallows Index (FMI)

- **Measures:** geometric mean of pairwise precision and recall.
- **Formula:**

  $$
  FMI = \sqrt{P \times R} = \frac{a}{\sqrt{(a + b)(a + c)}}
  $$

  where \( a = TP, b = FP, c = FN \).

- **Intuition:** treats clustering as a binary classification over pairs:
  - "Same cluster?" = prediction
  - "Same true class?" = ground truth
- **Range:** \( [0, 1] \)
- **Best for:** quick, interpretable balance between clustering precision and recall.
- **Strengths:** symmetric, intuitive, scale-free.
- **Weaknesses:** no chance correction.

✅ *Use FMI when you want a simple, interpretable “pairwise F1-score”.*

---

## 2. Information-Theoretic Metrics

### 🧩 Mutual Information (MI)

- **Measures:** shared information between true labels and predicted clusters.
- **Formula:**

  $$
  MI(K, C) = \sum_{i,j} \frac{n_{ij}}{n} \log \frac{n_{ij} n}{n_i n_j}
  $$

- **Intuition:** how much knowing one labeling reduces uncertainty about the other.
- **Range:** \( [0, \min(H(K), H(C))] \), so the upper bound varies with the labelings
- **Best for:** theoretical analysis or conceptual understanding.
- **Weakness:** depends on entropy scale (not normalized).

⚠️ *Use MI only for conceptual exploration; not suitable for direct comparison.*

---

### 🧩 Normalized Mutual Information (NMI)

- **Measures:** MI scaled to \([0, 1]\).
- **Formula:**

  $$
  NMI(K, C) = \frac{MI(K, C)}{\sqrt{H(K) \, H(C)}}
  $$

- **Intuition:** proportion of shared information between true and predicted labels.
- **Range:** \( [0, 1] \)
- **Best for:** comparing clustering quality on the same dataset.
- **Strengths:** normalized, symmetric, interpretable.
- **Weaknesses:** not chance-corrected, so it tends to favor labelings with more clusters.

✅ *Use NMI for normalized, interpretable comparisons across multiple runs.*

---

### 🧩 Adjusted Mutual Information (AMI)

- **Measures:** MI corrected for chance.
- **Formula:**

  $$
  AMI =
  \frac{
    MI(K, C) - E[MI(K, C)]
  }{
    \max(H(K), H(C)) - E[MI(K, C)]
  }
  $$

- **Intuition:** how much more information is shared than expected by random labelings.
- **Range:** at most 1; about 0 for random labelings, and it can be negative
- **Best for:** comparing clusterings across datasets or when cluster counts vary.
- **Strengths:** chance-corrected, scale-invariant.
- **Weaknesses:** slightly less intuitive to interpret.

✅ *Use AMI when comparing across datasets or cluster counts with fair normalization.*

---

## 3. Entropy-Based Purity Metrics

### 🧩 Homogeneity, Completeness, and V-Measure

- **Formulas:**

  $$
  \text{Homogeneity} = 1 - \frac{H(K|C)}{H(K)}
  $$

  $$
  \text{Completeness} = 1 - \frac{H(C|K)}{H(C)}
  $$

  $$
  V = 2 \frac{\text{Homogeneity} \times \text{Completeness}}{\text{Homogeneity} + \text{Completeness}}
  $$

- **Intuition:**
  - **Homogeneity:** each cluster should contain one class (purity).
  - **Completeness:** each class should appear in one cluster (coverage).
  - **V-Measure:** harmonic mean balancing both.
- **Range:** \( [0, 1] \)
- **Best for:** diagnosing *how* a clustering fails (mixing vs splitting).
- **Strengths:** interpretable and scale-free; V-Measure is symmetric, while swapping the two labelings exchanges Homogeneity and Completeness.
- **Weaknesses:** no chance correction.

✅ *Use V-Measure for interpretability; use Homogeneity/Completeness to diagnose over-splitting or merging.*

---

## 4. Quick Selection Guide

| Goal / Situation | Recommended Metric | Reason |
|------------------|--------------------|--------|
| Need robust, chance-corrected similarity | **Adjusted Rand Index (ARI)** | Gold standard for supervised clustering |
| Want interpretable, F1-like pairwise score | **Fowlkes–Mallows (FMI)** | Simple, symmetric, intuitive |
| Want normalized info overlap | **Normalized Mutual Information (NMI)** | Easy 0–1 comparison |
| Want chance-corrected info overlap | **Adjusted Mutual Information (AMI)** | Fair comparison across datasets |
| Want to inspect purity vs fragmentation | **Homogeneity / Completeness / V-Measure** | Diagnostic interpretability |
| Want conceptual measure of shared info | **Mutual Information (MI)** | Raw theoretical understanding |

---

## 5. Practical Tips

- ✅ **Use ARI** for robust quantitative comparison.
- ✅ **Use AMI** when number of clusters differs across algorithms.
- ✅ **Use V-Measure** when interpretability or diagnostic value is important.
- ✅ **Use FMI** for quick intuitive checks (pairwise F1-like metric).
- ⚠️ Avoid relying on plain RI or MI alone: RI is not chance-corrected and MI is not normalized, so both can mislead.
- 💡 Compare multiple metrics to get both numerical and interpretive insight.

---

✅ **Summary Insight**

| Family | Metric | What It Captures |
|---------|---------|------------------|
| **Pair-based** | ARI / FMI | Pairwise agreement (like accuracy or F1) |
| **Information-theoretic** | MI / NMI / AMI | Information overlap and chance-corrected entropy |
| **Entropy-based purity** | Homogeneity / Completeness / V | Purity vs coverage diagnostic |

Together, they provide a comprehensive toolkit:
- **ARI / AMI:** quantitative robustness
- **V-Measure:** interpretability
- **FMI:** intuition
- **NMI:** normalized comparability
