# Day 33: Clustering Evaluation Metrics

Welcome to Day 33! Evaluating unsupervised clustering models is more nuanced than evaluating supervised ones, where you have a clear target variable to measure your model against. Since there's usually no "correct" answer, we use metrics that assess the compactness and separation of the clusters themselves.

## Topics Covered:

- The Challenge of Evaluating Unsupervised Models

- Common Clustering Metrics
    - Inertia

    - Silhouette Score

    - Davies-Bouldin Index

    - Calinski-Harabasz Index

- How to Interpret These Metrics

## The Challenge of Unsupervised Evaluation

In supervised learning, we have an answer key (the labels). We can easily calculate how many predictions were correct with metrics like accuracy or F1 score.

In unsupervised learning, however, we don't have labels. The model is creating its own structure, and we need to find a way to measure how "good" that structure is without an answer key.

So, we use **internal evaluation metrics** to assess the compactness and separation of clusters:
- **Compactness**: Points within a cluster should be close to each other.
- **Separation**: Clusters should be well-separated from one another.

## Internal Metrics

### Inertia

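Inertia is the sum of squared distances from each sample to its closest cluster center. It measures how compact the clusters are: lower values mean tighter clusters. Inertia keeps shrinking as more clusters are added, so it is mainly useful for comparing different values of k rather than as a standalone score.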

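To make the definition concrete, here is a minimal from-scratch sketch in plain Python. The toy points and centers are made up for illustration, and the helper names `squared_distance` and `inertia` are hypothetical. In practice, scikit-learn's `KMeans` exposes this value as the `inertia_` attribute after fitting.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def inertia(points, centers):
    """Sum over all points of the squared distance to the closest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Hypothetical toy data: two tight pairs of points, far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

two_clusters = inertia(points, centers=[(0.0, 1.0), (10.0, 1.0)])
one_cluster = inertia(points, centers=[(5.0, 1.0)])

print(two_clusters)  # 4.0   (each point is 1 unit from its center)
print(one_cluster)   # 104.0 (each point is sqrt(26) units from the single center)
```

Going from one center to two drops the inertia sharply here, which is the kind of change the Elbow Method looks for.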
### Silhouette Score

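The Silhouette Score measures **how similar a point is to its own cluster compared with other clusters**. The score for a whole clustering is the mean of the per-point values.

- Ranges from **-1 to 1**:
  - +1 → well-clustered (close to its own cluster, far from the others)
  - 0 → on the boundary between two clusters
  - -1 → likely assigned to the wrong cluster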

#### Formula:

For a point $i$:

- $a(i)$: average distance from $i$ to the other points in the **same** cluster
- $b(i)$: lowest average distance from $i$ to the points of any **other** cluster (its nearest neighboring cluster)

$$ \text{Silhouette}(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))} $$

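The formula can be traced by hand on a tiny dataset. The following is a minimal sketch in plain Python, assuming Euclidean distance and at least two clusters; the function name `silhouette_values` and the toy points are hypothetical. It assigns 0 to points in single-member clusters, which is a common convention. scikit-learn offers `silhouette_samples` (per point) and `silhouette_score` (mean) for real use.

```python
from math import dist


def silhouette_values(points, labels):
    """Per-point silhouette values; assumes at least two distinct clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    values = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            values.append(0.0)  # convention for single-member clusters
            continue
        # a(i): mean distance to the *other* points in the same cluster.
        # dist(point, point) is 0, so summing over the whole cluster is safe.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b(i): smallest mean distance to the points of any other cluster.
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        values.append((b - a) / max(a, b))
    return values


# Hypothetical toy data: two obvious groups on a line.
points = [(0, 0), (1, 0), (10, 0), (11, 0)]

good = silhouette_values(points, [0, 0, 1, 1])  # groups match the geometry
bad = silhouette_values(points, [0, 1, 0, 1])   # groups cut across the geometry

print(sum(good) / len(good))  # close to +1
print(sum(bad) / len(bad))    # negative
```

For the first point in the good labeling, a = 1 and b = 10.5, giving (10.5 - 1) / 10.5 ≈ 0.905. In the bad labeling the same point has a = 10 and b = 6, giving -0.4.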

### Davies-Bouldin Index

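The Davies-Bouldin Index measures the average similarity between each cluster and its most similar cluster. Here, similarity is the ratio of within-cluster scatter to the distance between cluster centroids, so compact, far-apart clusters have low similarity. A lower Davies-Bouldin score indicates a better clustering. The minimum possible score is 0.

$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \left(\frac{\sigma_i + \sigma_j}{d(c_i, c_j)}\right)$$

where $k$ is the number of clusters, $c_i$ is the centroid of cluster $i$, $\sigma_i$ is the average distance from the points in cluster $i$ to $c_i$, and $d(c_i, c_j)$ is the distance between centroids $c_i$ and $c_j$.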

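As a rough illustration of the Davies-Bouldin formula, here is a from-scratch sketch in plain Python. It assumes Euclidean distance, takes the clusters as lists of points, and uses the hypothetical helper names `centroid` and `davies_bouldin`; the toy data is made up. For real datasets, scikit-learn provides `davies_bouldin_score`.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points)."""
    centers = [centroid(c) for c in clusters]
    # sigma_i: average distance from the points of cluster i to its centroid.
    sigmas = [
        sum(dist(p, center) for p in members) / len(members)
        for members, center in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Similarity of cluster i to its most similar (worst-case) neighbor.
        total += max(
            (sigmas[i] + sigmas[j]) / dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical toy data: the same two clusters, placed far apart and close together.
far_apart = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
close_together = [[(0, 0), (0, 2)], [(3, 0), (3, 2)]]

print(davies_bouldin(far_apart))       # about 0.2   -> (1 + 1) / 10
print(davies_bouldin(close_together))  # about 0.667 -> (1 + 1) / 3
```

The clusters have identical scatter in both cases; only the centroid distance changes, and the index rises as the clusters move closer.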
### Calinski-Harabasz Index

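Also known as the Variance Ratio Criterion, this index is the ratio of between-cluster variance to within-cluster variance. A higher score means the clusters are denser and better separated.

$$CH = \frac{SS_B / (k-1)}{SS_W / (N-k)}$$

where $N$ is the number of points, $k$ is the number of clusters, $SS_B$ is the between-cluster sum of squares (each centroid's squared distance to the overall mean, weighted by cluster size), and $SS_W$ is the within-cluster sum of squares (each point's squared distance to its own centroid).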

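The Calinski-Harabasz ratio can be checked by hand on a few points. The sketch below is plain Python under the usual definitions of the two sums of squares; the helper names (`squared_distance`, `centroid`, `calinski_harabasz`) and the toy data are hypothetical. scikit-learn ships this metric as `calinski_harabasz_score`.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def calinski_harabasz(clusters):
    """CH index for a list of clusters; needs k >= 2 and more points than clusters."""
    all_points = [p for members in clusters for p in members]
    n, k = len(all_points), len(clusters)
    overall = centroid(all_points)
    centers = [centroid(members) for members in clusters]
    # SS_B: size-weighted squared distance of each centroid to the overall mean.
    ss_b = sum(
        len(members) * squared_distance(center, overall)
        for members, center in zip(clusters, centers)
    )
    # SS_W: squared distance of every point to its own centroid.
    ss_w = sum(
        squared_distance(p, center)
        for members, center in zip(clusters, centers)
        for p in members
    )
    return (ss_b / (k - 1)) / (ss_w / (n - k))


# Hypothetical toy data: the same two clusters, placed far apart and close together.
far_apart = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
close_together = [[(0, 0), (0, 2)], [(3, 0), (3, 2)]]

print(calinski_harabasz(far_apart))       # 50.0 -> (100 / 1) / (4 / 2)
print(calinski_harabasz(close_together))  # 4.5  -> (9 / 1) / (4 / 2)
```

The within-cluster term is the same in both cases, so the score drops purely because the centroids sit closer to the overall mean.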
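## External Metrics

External metrics apply when ground-truth labels are available; they compare the cluster assignments against those labels.

### Adjusted Rand Index (ARI)

The Adjusted Rand Index measures how well the clustering agrees with the true labels by counting pairs of points that are grouped together or kept apart in both, then correcting that agreement for chance. A score of 1 is a perfect match, and scores near 0 indicate agreement no better than random assignment.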

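ARI needs ground-truth labels. Here is a minimal sketch of the standard pair-counting formula in plain Python, assuming at least two samples; the function name `adjusted_rand_index` and the example labelings are hypothetical. scikit-learn provides the same metric as `adjusted_rand_score`.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """ARI via the contingency-table formula; assumes at least two samples."""
    n = len(true_labels)
    # Pairs of points placed together in BOTH labelings.
    together_in_both = sum(
        comb(count, 2) for count in Counter(zip(true_labels, pred_labels)).values()
    )
    # Pairs placed together within each labeling on its own.
    together_true = sum(comb(count, 2) for count in Counter(true_labels).values())
    together_pred = sum(comb(count, 2) for count in Counter(pred_labels).values())

    expected = together_true * together_pred / comb(n, 2)  # chance-level agreement
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both labelings are trivially identical
    return (together_in_both - expected) / (maximum - expected)


truth = [0, 0, 0, 1, 1, 1]

renamed = adjusted_rand_index(truth, [1, 1, 1, 0, 0, 0])    # same groups, swapped names
partial = adjusted_rand_index(truth, [0, 0, 1, 1, 2, 2])    # partly right
scrambled = adjusted_rand_index(truth, [0, 1, 0, 1, 0, 1])  # cuts across the truth

print(renamed)    # 1.0
print(partial)    # between 0 and 1
print(scrambled)  # slightly negative
```

The first case shows that ARI ignores the actual label values: only the grouping matters. The last case shows that the score can fall below 0 when the grouping agrees with the truth less than chance would.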
## When to Use Which Clustering Evaluation Metric

#### When you DO NOT have Ground Truth Labels

| Metric | When to Use It | Analogy to Explain |
| :--- | :--- | :--- |
| **Inertia** | To determine the **optimal number of clusters (k)**. Use it with the Elbow Method. The goal is to find the "k" where adding more clusters no longer significantly decreases the inertia. | **The Team Huddle:** Imagine a sports team huddling. Inertia measures how close each player is to the center of their team's huddle. You want them close, but a huddle of one player isn't a team. The elbow point is the sweet spot where the teams are as compact as possible without being too fragmented. |
| **Silhouette Score** | To compare different clustering algorithms or different numbers of clusters. A **higher score is better**. It balances **compactness** and **separation**. | **The Socialite's Scorecard:** A socialite wants to know if they're in the right friend group. Their score is high if they are very close to their own friends and very far from other groups. A score of 1 is ideal, 0 means they're on the border, and -1 means they're in the wrong group. |
| **Davies-Bouldin Index** | To compare different clustering results. A **lower score is better**, with 0 being the lowest possible score. It measures the ratio of within-cluster distance to between-cluster distance. | **The Bubble Bath:** For each bubble, the index looks at its most similar neighbor and compares how big the two bubbles are with how far apart their centers sit. A low score means the bubbles are small and tight (compact) and far apart from each other (well-separated). |
| **Calinski-Harabasz Index** | To compare the quality of different clustering results. A **higher score is better**. It measures the ratio of between-cluster variance to within-cluster variance. | **The Bandstand:** This index measures how much better organized the bands are when they are on their own stage versus when they are all jumbled together. A high score means the groups are distinct and well-separated. |

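One way to apply the internal metrics from the table above is to fit a model for several values of k and compare the scores side by side. The sketch below assumes scikit-learn is installed and uses synthetic data from `make_blobs`; the dataset, the range of k, and the variable names are illustrative choices, not a recommendation.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic data with 4 blobs (an assumption made purely for illustration).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)

results = {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    results[k] = {
        "inertia": model.inertia_,                                       # lower = tighter
        "silhouette": silhouette_score(X, model.labels_),                # higher is better
        "davies_bouldin": davies_bouldin_score(X, model.labels_),        # lower is better
        "calinski_harabasz": calinski_harabasz_score(X, model.labels_),  # higher is better
    }

for k, scores in results.items():
    print(k, {name: round(value, 3) for name, value in scores.items()})

# Pick the k with the highest silhouette score as one possible selection rule.
best_k_by_silhouette = max(results, key=lambda k: results[k]["silhouette"])
print("Best k by silhouette:", best_k_by_silhouette)
```

Inertia is best read as a curve (look for the elbow), while the other three can be compared directly across values of k. The metrics do not always agree, so it helps to look at more than one.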

#### When you DO have Ground Truth Labels

| Metric | When to Use It | Analogy to Explain |
| :--- | :--- | :--- |
| **Adjusted Rand Index (ARI)** | To see how well your clustering algorithm **"rediscovered"** the true labels in your data. It corrects for chance agreement: a perfect match scores 1, random assignments score close to 0, and negative scores mean agreement worse than chance. | **The Answer Key:** You've just sorted a mixed box of sports balls. The ARI is like having a teacher's answer key that tells you which balls are which. The score tells you how closely your sorting matches the key. |