# Hierarchical Clustering

Hierarchical clustering is an **unsupervised learning method** that groups data into clusters by building a hierarchy (a tree-like structure) of nested clusters.

There are two main types:
- **Agglomerative (Bottom-Up)** → Start with each point as its own cluster and merge step by step.
- **Divisive (Top-Down)** → Start with one big cluster and split step by step.

The result is usually represented with a **Dendrogram**, which shows how clusters are merged or split.
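
To make the bottom-up merging concrete, here is a minimal sketch on four made-up 1-D points (toy data chosen for illustration, not part of the Iris example). It uses SciPy's `linkage()` with single linkage, where the distance between two clusters is the distance between their closest members, and prints each merge in order.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Toy data (an assumption for illustration): two tight pairs that are far apart
points = np.array([[0.0], [1.0], [5.0], [6.0]])

# Single linkage: cluster distance = distance between the closest pair of members
Z_toy = linkage(points, method='single')

# Each row of the linkage matrix records one merge:
# [index of cluster a, index of cluster b, merge distance, size of the new cluster]
# Indices 0..3 are the original points; index 4 + i is the cluster created in row i.
for step, (a, b, dist, size) in enumerate(Z_toy, start=1):
    print(f"step {step}: merge {int(a)} and {int(b)} "
          f"at distance {dist:.1f} -> cluster of size {int(size)}")
```

The two close pairs are merged first, each at distance 1, and the final row joins the two pairs into a single cluster containing all four points. A dendrogram is a drawing of exactly these rows, with the merge distance on the vertical axis.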

## Import Libraries and Dataset

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster

# Load Iris dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df.head()

## Creating a Dendrogram
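We use **linkage()** from SciPy to compute the hierarchical clustering, here with Ward's method, which at each step merges the pair of clusters that gives the smallest increase in within-cluster variance. We then visualize the result as a dendrogram.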

In [None]:
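# Perform hierarchical clustering on the four feature columns only,
# so re-running this cell after a 'cluster' column is added still works
Z = linkage(df[iris.feature_names], method='ward')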

# Plot dendrogram
plt.figure(figsize=(10, 5))
# Show only the last 30 merged clusters instead of all 150 leaves
dendrogram(Z, truncate_mode='lastp', p=30, leaf_rotation=45.,
           leaf_font_size=10., show_contracted=True)
plt.title('Hierarchical Clustering Dendrogram (Truncated)')
plt.xlabel('Cluster size (in parentheses) or sample index')
plt.ylabel('Distance')
plt.show()

## Forming Clusters
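We can cut the dendrogram to obtain flat clusters. With `criterion='maxclust'`, `fcluster()` takes the maximum number of clusters `t` and finds a suitable cut height for us, rather than taking a distance threshold.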

In [None]:
# Create 3 clusters from the dendrogram
clusters = fcluster(Z, t=3, criterion='maxclust')
df['cluster'] = clusters
df.head()
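
As a rough sketch of the alternative, the same tree can also be cut at an explicit height using `criterion='distance'`. The threshold below is an assumption chosen for illustration: the midpoint between the heights of the third-last and second-last merges, which should leave three clusters. The block recomputes the linkage so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from scipy.cluster.hierarchy import linkage, fcluster

X = load_iris().data
Z = linkage(X, method='ward')

# Option 1: ask for at most 3 flat clusters
labels_k = fcluster(Z, t=3, criterion='maxclust')

# Option 2: cut at a height between the 3rd-last and 2nd-last merges
# (illustrative threshold; column 2 of Z holds the merge distances)
threshold = (Z[-3, 2] + Z[-2, 2]) / 2
labels_d = fcluster(Z, t=threshold, criterion='distance')

print("clusters via maxclust:", len(np.unique(labels_k)))
print("clusters via distance:", len(np.unique(labels_d)))

# The two cuts describe the same grouping if every maxclust label
# pairs up with exactly one distance label
same_partition = len(set(zip(labels_k, labels_d))) == len(np.unique(labels_k))
print("same grouping:", same_partition)
```

Cutting by distance is handy when the threshold has a meaning in the data's units; `maxclust` is simpler when the number of groups is what matters.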

## Visualizing the Clusters

In [None]:
plt.scatter(df['sepal length (cm)'], df['sepal width (cm)'], 
            c=df['cluster'], cmap='rainbow', alpha=0.7)
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.title('Hierarchical Clustering on Iris Dataset')
plt.show()
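
The scatter plot above uses only two of the four features, so clusters may appear to overlap. Because Iris ships with species labels, one hedged way to sanity-check the result is to cross-tabulate the cluster assignments against the known species. This is a sketch for inspection only; the labels play no part in the clustering itself, and the names `labels`, `species` and `table` are introduced here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from scipy.cluster.hierarchy import linkage, fcluster

iris = load_iris()
Z = linkage(iris.data, method='ward')
labels = fcluster(Z, t=3, criterion='maxclust')

# Map the integer targets to species names for a readable table
species = pd.Series(iris.target_names[iris.target], name='species')

# Rows: cluster ids from fcluster; columns: true species
table = pd.crosstab(pd.Series(labels, name='cluster'), species)
print(table)
```

A row dominated by a single species suggests that cluster lines up well with it; rows that mix species show where the hierarchy and the true grouping disagree.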

## Key Notes
- Hierarchical clustering builds a **tree of clusters** without requiring us to specify `k` upfront; the number of clusters is chosen afterwards, when the tree is cut.
- Dendrograms help us decide the number of clusters by cutting at a particular distance.
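- It can be computationally expensive for very large datasets: agglomerative methods generally need the pairwise distances between all points, so memory and time grow at least quadratically with the number of samples.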
- Useful when the cluster hierarchy (relationships between clusters) is important.
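
One common heuristic for picking the cut from a dendrogram is to look for the largest jump between consecutive merge distances. The sketch below applies it to the last few merges of the Ward linkage on Iris. The window size `m` and the variable names are assumptions made for illustration, and the suggestion it prints is a starting point rather than a definitive answer.

```python
import numpy as np
from sklearn.datasets import load_iris
from scipy.cluster.hierarchy import linkage

Z = linkage(load_iris().data, method='ward')

m = 10                          # number of final merges to inspect (assumed window)
last_heights = Z[-m:, 2]        # merge distances, in increasing order
gaps = np.diff(last_heights)    # jump between consecutive merges

# Cutting inside the largest gap keeps the merges below it and
# skips the ones above it, leaving m - i clusters
i = int(np.argmax(gaps))
suggested_k = m - i

print("merge distances:", np.round(last_heights, 2))
print("suggested number of clusters:", suggested_k)
```

A large gap means the next merge would join clusters that are much further apart than anything merged so far, which corresponds to a long vertical branch in the dendrogram.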