# Clustering Evaluation Metrics

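Because clustering is unsupervised, there are usually no ground-truth labels against which to measure accuracy. Instead, we use **clustering evaluation metrics** to assess cluster quality: internal metrics score a clustering from the data alone, while external metrics compare it with known labels when they exist.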

## Common Metrics:
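- **Silhouette Score** → Compares each point's mean distance to its own cluster with its mean distance to the nearest other cluster; ranges from -1 to 1 (higher is better).
- **Calinski-Harabasz Index** → Ratio of between-cluster dispersion to within-cluster dispersion (higher is better).
- **Davies-Bouldin Index** → Average similarity of each cluster to its most similar cluster, where similarity weighs within-cluster scatter against the distance between centroids (lower is better).
- **Adjusted Rand Index (ARI)** → Chance-corrected agreement between the clustering and ground-truth labels (if available); 1 is a perfect match and values near 0 indicate random labeling.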
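To make the silhouette definition concrete, here is a minimal sketch that computes it by hand and compares the result with scikit-learn. For each point, `a` is the mean distance to the other points in its own cluster, `b` is the smallest mean distance to any other cluster, and the point's score is `(b - a) / max(a, b)`. The variable names (`D`, `s`, `manual_silhouette`) and the choice of Iris with 3 clusters are illustrative assumptions, not part of the notebook's main flow.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances, silhouette_score

X = load_iris().data
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Full Euclidean distance matrix between all pairs of points
D = pairwise_distances(X)

s = np.zeros(len(X))
for i in range(len(X)):
    own = labels == labels[i]
    n_own = own.sum()
    if n_own == 1:
        continue  # a singleton cluster gets a score of 0 by convention
    # a: mean distance to the other members of the same cluster
    # (D[i, i] is 0, so divide by n_own - 1 to exclude the point itself)
    a = D[i, own].sum() / (n_own - 1)
    # b: mean distance to the nearest other cluster
    b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
    s[i] = (b - a) / max(a, b)

manual_silhouette = s.mean()
print(f"Manual silhouette:  {manual_silhouette:.6f}")
print(f"sklearn silhouette: {silhouette_score(X, labels):.6f}")
```

The two printed values should agree up to floating-point error, since `silhouette_score` averages the same per-point quantity.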

## Import Libraries and Dataset

In [None]:
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import (
    silhouette_score,
    calinski_harabasz_score,
    davies_bouldin_score,
    adjusted_rand_score,
)

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

## Apply KMeans Clustering

In [None]:
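# n_init is set explicitly because its default differs across scikit-learn versions
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
# Fit the model and get a cluster index (0, 1, or 2) for every sample
labels = kmeans.fit_predict(X)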

## Calculate Evaluation Metrics

In [None]:
# Silhouette Score (ranges from -1 to 1, higher is better)
sil_score = silhouette_score(X, labels)

# Calinski-Harabasz Index (higher is better)
ch_score = calinski_harabasz_score(X, labels)

# Davies-Bouldin Index (lower is better)
db_score = davies_bouldin_score(X, labels)

# Adjusted Rand Index (compares with true labels; 1 is a perfect match,
# and the arbitrary numbering of clusters does not affect the score)
ari_score = adjusted_rand_score(y, labels)

print(f"Silhouette Score: {sil_score:.3f}")
print(f"Calinski-Harabasz Index: {ch_score:.3f}")
print(f"Davies-Bouldin Index: {db_score:.3f}")
print(f"Adjusted Rand Index: {ari_score:.3f}")
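A common use of the internal metrics is comparing several candidate values of `n_clusters` on the same data. The sketch below assumes a sweep over k = 2 to 6 on Iris; the range and the `results` dictionary are illustrative choices, not something the notebook prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import (
    silhouette_score,
    calinski_harabasz_score,
    davies_bouldin_score,
)

X = load_iris().data

# Map each candidate k to (silhouette, Calinski-Harabasz, Davies-Bouldin)
results = {}
for k in range(2, 7):
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    results[k] = (
        silhouette_score(X, labels_k),
        calinski_harabasz_score(X, labels_k),
        davies_bouldin_score(X, labels_k),
    )

print(f"{'k':>2} {'silhouette':>11} {'CH':>9} {'DB':>7}")
for k, (sil, ch, db) in results.items():
    print(f"{k:>2} {sil:>11.3f} {ch:>9.1f} {db:>7.3f}")
```

The metrics need not agree with each other or with the number of true classes. On Iris, for example, the silhouette score is higher for k = 2 than for k = 3, because two of the three species overlap substantially.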

## Key Notes:
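- **Silhouette Score** close to 1 → clusters are compact and well-separated; near 0 → clusters overlap; negative → many points may sit in the wrong cluster.
- **Calinski-Harabasz** higher → better-defined clusters; the score has no fixed upper bound, so compare it only across clusterings of the same data.
- **Davies-Bouldin** lower → better clusters; 0 is the lowest possible value.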
- **ARI** compares clustering to known labels if available; useful for benchmarking.
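- Choosing the right metric depends on whether ground-truth labels are available and on the shape of the clusters: Silhouette, Calinski-Harabasz, and Davies-Bouldin all tend to favor compact, convex clusters and can undervalue a correct clustering of elongated or non-convex groups.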
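To illustrate how cluster shape can affect an internal metric, the following sketch clusters the two-moons toy dataset with KMeans and with DBSCAN, then reports both the silhouette score and ARI. The dataset, the DBSCAN parameters (`eps=0.2`, `min_samples=5`), and the `scores` dictionary are assumptions chosen for illustration; other settings may behave differently.

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import silhouette_score, adjusted_rand_score

# Two interleaving half-circles: non-convex clusters with known labels
X_moons, y_moons = make_moons(n_samples=300, noise=0.05, random_state=0)

candidates = {
    "KMeans": KMeans(n_clusters=2, n_init=10, random_state=42),
    "DBSCAN": DBSCAN(eps=0.2, min_samples=5),  # assumed parameters
}

# Map each algorithm name to (silhouette, ARI)
scores = {}
for name, model in candidates.items():
    pred = model.fit_predict(X_moons)
    # Note: DBSCAN labels noise points as -1, and silhouette_score
    # would treat them as one more cluster if any occur
    scores[name] = (
        silhouette_score(X_moons, pred),
        adjusted_rand_score(y_moons, pred),
    )
    print(f"{name}: silhouette={scores[name][0]:.3f}, ARI={scores[name][1]:.3f}")
```

In this setup KMeans splits the moons with a straight boundary, which yields compact groups and a higher silhouette score, while DBSCAN follows the curved shapes and agrees better with the true labels. An internal metric alone would rank the two clusterings in the opposite order from ARI.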