# Hierarchical Clustering

Hierarchical clustering is an **unsupervised learning method** that builds a hierarchy (tree structure) of clusters.

### Key Concepts:
- Two types:
  1. **Agglomerative** (bottom-up) – each point starts as its own cluster, then merges step by step.
  2. **Divisive** (top-down) – start with one big cluster, then split recursively.
- Results can be visualized using a **dendrogram**.
- Unlike K-Means, you don’t need to fix the number of clusters in advance; you can choose it afterwards by cutting the dendrogram at the desired level.
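
To make the bottom-up idea concrete, here is a minimal sketch of agglomerative merging written by hand. It is an illustration only, not how SciPy implements it: the function name `naive_agglomerative` and the toy data are hypothetical, and it assumes single linkage (distance between the closest pair of members) because that is the simplest rule to code.

```python
import numpy as np

def naive_agglomerative(points, n_clusters):
    """Merge the two closest clusters (single linkage) until n_clusters remain."""
    # Bottom-up start: every point is its own cluster.
    clusters = [[i] for i in range(len(points))]
    # Pairwise Euclidean distances between all points.
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))

    while len(clusters) > n_clusters:
        best = (np.inf, None, None)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge cluster b into cluster a
        del clusters[b]
    return clusters

# Toy data (hypothetical values): two tight groups on a line.
toy = np.array([[0.0], [0.2], [0.4], [5.0], [5.1]])
result = naive_agglomerative(toy, n_clusters=2)
print(result)
```

Each pass through the `while` loop is one merge step, which is what one horizontal link in a dendrogram represents.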

### Pros:
- Produces a hierarchy of clusters.
- No need to pre-specify K (but you may choose it later).

### Cons:
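- Computationally expensive for large datasets: standard agglomerative algorithms work from all pairwise distances, so memory grows as O(n²) and runtime at least as fast.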
- Sensitive to noise and outliers: merges are greedy and never undone, so an early bad merge carries through the rest of the tree.
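
A rough way to see the cost is to count the pairwise distances involved. The sketch below assumes distances are stored as 64-bit floats in condensed form (one value per unordered pair), which is what `scipy.spatial.distance.pdist` returns; the helper name `condensed_distance_count` is made up for this example.

```python
import numpy as np
from scipy.spatial.distance import pdist

def condensed_distance_count(n_samples):
    """Number of unordered point pairs, n * (n - 1) / 2."""
    return n_samples * (n_samples - 1) // 2

# Check the formula against pdist on random data of the notebook's size.
rng = np.random.default_rng(0)
sample = rng.normal(size=(150, 2))
n_pairs = pdist(sample).shape[0]
print(n_pairs, condensed_distance_count(150))

# Estimated memory for the distances alone, at 8 bytes per float64.
for n in (1_000, 10_000, 100_000):
    gib = condensed_distance_count(n) * 8 / 2**30
    print(f"n={n:>7,}: ~{gib:,.2f} GiB of float64 distances")
```

Multiplying the number of points by ten multiplies the number of pairs by roughly a hundred, which is why the method is usually reserved for modest sample sizes.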


In [1]:
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster

In [2]:
# Generate synthetic dataset
X, y_true = make_blobs(n_samples=150, centers=3, cluster_std=0.7, random_state=42)

plt.scatter(X[:, 0], X[:, 1], s=40)
plt.title("Synthetic Data for Hierarchical Clustering")
plt.show()

In [3]:
# Perform hierarchical clustering (Agglomerative)
Z = linkage(X, method='ward')  # Ward merges the pair of clusters that least increases within-cluster variance

# Plot dendrogram
plt.figure(figsize=(10, 6))
dendrogram(Z, no_labels=True)  # hide the 150 leaf labels, which would otherwise overlap
plt.title("Dendrogram (Hierarchical Clustering)")
plt.xlabel("Data Points")
plt.ylabel("Distance")
plt.show()
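
The dendrogram is drawn from the linkage matrix `Z`, so it can help to look at that array directly. The sketch below rebuilds the same data and linkage as the cells above so that it runs on its own. The column meanings follow SciPy's documented layout: indices of the two merged clusters, the merge distance, and the size of the new cluster.

```python
import numpy as np
from sklearn.datasets import make_blobs
from scipy.cluster.hierarchy import linkage

# Same data and linkage as in the cells above.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.7, random_state=42)
Z = linkage(X, method='ward')

n = X.shape[0]
print(Z.shape)  # one row per merge, so n - 1 rows and 4 columns

# Each row: [cluster index A, cluster index B, merge distance, size of new cluster].
# Indices below n are original points; index n + i is the cluster created by row i.
print("first merge:", Z[0])
print("last merge: ", Z[-1])

# Merge distances are non-decreasing from first row to last.
heights = Z[:, 2]
is_monotonic = bool(np.all(np.diff(heights) >= 0))
print("heights non-decreasing:", is_monotonic)
```

The last row always describes the final merge into a single cluster containing every point, which is the root of the dendrogram.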

In [4]:
# Cut the tree into at most 3 flat clusters (criterion='maxclust')
clusters = fcluster(Z, t=3, criterion='maxclust')

plt.scatter(X[:, 0], X[:, 1], c=clusters, cmap='viridis', s=40)
plt.title("Clusters from Hierarchical Clustering")
plt.show()
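
The cell above cuts by cluster count. As a sketch of the alternative, the same tree can be cut at a distance threshold with `criterion='distance'`. The threshold used here is an assumption chosen for illustration: the midpoint between the third-from-last and second-from-last merge heights, which leaves exactly the top two merges undone.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from scipy.cluster.hierarchy import linkage, fcluster

# Same data and linkage as in the cells above.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.7, random_state=42)
Z = linkage(X, method='ward')

# Cut by count, as in the notebook.
by_count = fcluster(Z, t=3, criterion='maxclust')

# Cut by distance: a threshold between the heights of the 3rd-last and
# 2nd-last merges keeps everything below it merged and leaves 3 clusters.
threshold = (Z[-3, 2] + Z[-2, 2]) / 2
by_distance = fcluster(Z, t=threshold, criterion='distance')

n_found = len(np.unique(by_distance))
agreement = adjusted_rand_score(by_count, by_distance)
print(f"threshold={threshold:.2f}, clusters={n_found}, ARI vs count cut={agreement:.2f}")
```

On a dendrogram plot this threshold corresponds to a horizontal line; the number of vertical branches it crosses is the number of clusters.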

### Key Takeaways:
- Hierarchical clustering records every merge (or split) in a tree; the dendrogram shows which clusters were joined and at what distance.
- You can decide the number of clusters by **cutting the dendrogram** at a chosen distance, or by requesting a cluster count directly (`criterion='maxclust'`, as in the last code cell).
- More flexible than K-Means in one respect: K does not have to be fixed before fitting, and the same tree can be cut at several levels.
- Works well on small to medium datasets.
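
As a small illustration of that flexibility, the sketch below computes the linkage once and then extracts flat clusterings for several values of K without refitting. The range of K values is arbitrary, chosen only for the example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from scipy.cluster.hierarchy import linkage, fcluster

# Same data as in the cells above.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.7, random_state=42)
Z = linkage(X, method='ward')  # the expensive step, done once

sizes_by_k = {}
for k in (2, 3, 4, 5):
    # Re-cutting the tree is cheap compared with building it.
    labels = fcluster(Z, t=k, criterion='maxclust')
    # fcluster labels start at 1, so drop the empty 0 bin.
    sizes_by_k[k] = np.bincount(labels)[1:].tolist()
    print(f"k={k}: cluster sizes {sizes_by_k[k]}")
```

With K-Means, each of these values of K would need its own fit; here every clustering comes from the same `Z`.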
