### ***Hierarchical Clustering***

Hierarchical clustering is a clustering algorithm that builds a hierarchy of clusters using either an agglomerative (bottom-up) or a divisive (top-down) approach. It does not require the number of clusters to be specified in advance, and it produces a dendrogram that visualizes the cluster hierarchy.


***Dendrogram***

A dendrogram is a tree-like diagram that illustrates the arrangement of clusters produced by hierarchical clustering. It shows how data points are grouped together at successive levels of similarity. The vertical axis represents the distance (dissimilarity) at which clusters are merged, while the horizontal axis lists the individual data points or clusters. Cutting the dendrogram with a horizontal line at a chosen height yields one cluster for each branch the line crosses, which determines the number of clusters.

![image.png](attachment:image.png)
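
Cutting at a height can be sketched with SciPy's `fcluster` and `criterion="distance"`. The toy array `points` and the thresholds below are assumptions chosen only for illustration: two tight groups on a line whose final merge happens at distance 9.0.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: two tight groups far apart on a line.
points = np.array([[0.0], [0.5], [1.0], [10.0], [10.5], [11.0]])

# Single linkage: neighbours inside a group merge at 0.5,
# the two groups merge last at 9.0 (the gap between 1.0 and 10.0).
Z_demo = linkage(points, method="single")

# Cut below the final merge: the two groups stay separate.
labels_at_2 = fcluster(Z_demo, t=2.0, criterion="distance")
# Cut above the final merge: everything falls into one cluster.
labels_at_10 = fcluster(Z_demo, t=10.0, criterion="distance")

print(len(np.unique(labels_at_2)))   # 2
print(len(np.unique(labels_at_10)))  # 1
```

A lower cut gives more, smaller clusters; a higher cut gives fewer, larger ones.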

***Types of Hierarchical Clustering***

| Type                          | Description                                                                                                                         |
| ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| **Agglomerative (Bottom-Up)** | Start with each point as its own cluster, then **merge** closest clusters step-by-step until one big cluster remains. (Most common) |
| **Divisive (Top-Down)**       | Start with one big cluster and **split** it repeatedly into smaller clusters. (Less common)                                         |


***How It Works (Agglomerative Hierarchical Clustering)***

1. **Start:** Each data point is its own cluster.
2. **Compute the distance matrix:** Find the distances between all pairs of clusters (usually Euclidean).
3. Merge the two closest clusters.
4. Recompute the distances between the new cluster and all others.
5. Repeat steps 3 and 4 until all points belong to a single cluster.
6. Cut the dendrogram at a chosen level to get the desired number of clusters.
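
The merge loop above can be sketched from scratch in NumPy. This is a deliberately naive illustration, not how SciPy implements it: the function name `naive_agglomerative` and the `toy` array are hypothetical, single linkage is assumed, and the loop stops once `n_clusters` remain instead of building the full tree and cutting it afterwards.

```python
import numpy as np

def naive_agglomerative(points, n_clusters):
    """Single-linkage agglomerative clustering; slow, for illustration only."""
    # Start: every point is its own cluster (stored as a list of row indices).
    clusters = [[i] for i in range(len(points))]
    # Pairwise Euclidean distances between all points, computed once.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    while len(clusters) > n_clusters:
        best = (np.inf, -1, -1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cluster-to-cluster distance is re-derived on every pass:
                # single linkage takes the smallest point-to-point distance.
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the two closest clusters.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    # Convert the index lists into one label per point.
    labels = np.empty(len(points), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

toy = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]])
print(naive_agglomerative(toy, n_clusters=3))  # [0 0 1 1 2]
```

In practice `scipy.cluster.hierarchy.linkage` does the same job far more efficiently and also records the merge heights needed for the dendrogram.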

**📏 Linkage Methods (Distance Between Clusters)**

| Method               | How Distance Is Calculated                                                | Characteristics                                        |
| -------------------- | ------------------------------------------------------------------------- | ------------------------------------------------------ |
| **Single Linkage**   | Minimum distance between any two points, one from each cluster            | Can form long "chain" clusters                         |
| **Complete Linkage** | Maximum distance between any two points, one from each cluster            | Produces compact clusters                              |
| **Average Linkage**  | Mean distance over all cross-cluster pairs of points                      | Balanced result                                        |
| **Ward's Method**    | Increase in total within-cluster variance caused by the merge (most used) | Compact clusters, similar in effect to K-Means         |
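
To make the table concrete, here is a small sketch that computes each cluster-to-cluster distance by hand for two hypothetical clusters, `cluster_a` and `cluster_b`. The Ward line computes the increase in the within-cluster sum of squares caused by the merge; libraries may report a rescaled version of that quantity, so treat the number as an illustration of the criterion rather than as SciPy's exact output.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two hypothetical clusters on the x-axis.
cluster_a = np.array([[0.0, 0.0], [1.0, 0.0]])
cluster_b = np.array([[4.0, 0.0], [6.0, 0.0]])

# All cross-cluster point-to-point Euclidean distances: [[4, 6], [3, 5]].
pairwise = cdist(cluster_a, cluster_b)

single = pairwise.min()     # closest pair
complete = pairwise.max()   # farthest pair
average = pairwise.mean()   # mean over all pairs

# Ward criterion: growth in within-cluster sum of squares if A and B merge.
n_a, n_b = len(cluster_a), len(cluster_b)
centroid_gap = np.linalg.norm(cluster_a.mean(axis=0) - cluster_b.mean(axis=0))
ward_increase = (n_a * n_b) / (n_a + n_b) * centroid_gap ** 2

print(single, complete, average, ward_increase)  # 3.0 6.0 4.5 20.25
```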


**🧩 Order of Implementation**

### 1. Import Libraries
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from sklearn.preprocessing import StandardScaler
```

---

### 2. Load Dataset

*(Load your data from CSV or any source)*
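
A minimal loading sketch, assuming a CSV with numeric feature columns. The column names and the in-memory `csv_text` are hypothetical stand-ins so the example runs without a file; with a real dataset you would pass the file path to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file on disk.
csv_text = """feature_1,feature_2
1.0,2.0
1.2,1.9
8.0,9.0
8.1,9.2
"""
df = pd.read_csv(io.StringIO(csv_text))

# Keep only numeric columns and convert to a NumPy array for SciPy.
X = df.select_dtypes(include="number").to_numpy()
print(X.shape)  # (4, 2)
```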

---

### 3. Preprocess Data

* Handle missing values
* Scale features (optional but recommended: linkage is distance-based, so features with large ranges would otherwise dominate)
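
One possible way to do both steps is sketched below. The small DataFrame and its column names are made up for illustration, and dropping incomplete rows is only one option; imputing the missing values is a common alternative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data with one missing value and very different feature scales.
df = pd.DataFrame({
    "income": [30_000.0, 32_000.0, np.nan, 90_000.0, 95_000.0],
    "age": [25.0, 27.0, 40.0, 52.0, 55.0],
})

# Handle missing values: here, simply drop incomplete rows.
df_clean = df.dropna()

# Scale features to zero mean and unit variance so both contribute equally.
X = StandardScaler().fit_transform(df_clean)

print(X.shape)  # (4, 2)
print(X.mean(axis=0).round(6), X.std(axis=0).round(6))
```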

---

### 4. Compute Linkage Matrix

```python
# X: numeric array of shape (n_samples, n_features); 'ward' requires Euclidean distances
Z = linkage(X, method='ward')  # Options: 'ward', 'complete', 'average', 'single'
```
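
It can help to look inside the linkage matrix before plotting it. For `n` observations SciPy returns `n - 1` rows, one per merge, with four columns: the two cluster indices that were merged, the merge distance, and the number of original points in the new cluster. Indices `>= n` refer to clusters created by earlier rows. The tiny `X_demo` array below is an assumption used only to keep the output short.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical data: two pairs of nearby points.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 2.0]])
Z_demo = linkage(X_demo, method="ward")

print(Z_demo.shape)  # (3, 4): n - 1 merges, 4 columns
for left, right, height, size in Z_demo:
    # left/right: merged cluster ids, height: merge distance, size: points in result
    print(int(left), int(right), round(float(height), 3), int(size))
```

The merge heights in the third column are the values drawn on the vertical axis of the dendrogram.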

---

### 5. Plot Dendrogram

```python
plt.figure(figsize=(10, 6))
dendrogram(Z)
plt.title("Dendrogram")
plt.xlabel("Data Points")
plt.ylabel("Merge Distance (Ward linkage)")
plt.show()
```

---

### 6. Choose Number of Clusters and Cut the Tree

```python
# fcluster was already imported in step 1; the returned labels are integers starting at 1
clusters = fcluster(Z, t=3, criterion='maxclust')  # Example: at most 3 clusters
```

---

### 7. Analyze and Visualize Clusters

```python
# Assumes X is a NumPy array with at least two columns; only the first two are plotted
plt.scatter(X[:, 0], X[:, 1], c=clusters, cmap='rainbow')
plt.title("Hierarchical Clustering Results")
plt.show()
```
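
Beyond the scatter plot, a quick numeric summary often helps when analyzing the result. The sketch below reports the size and centroid of each cluster; the synthetic `X_demo` blobs and the fixed random seed are assumptions standing in for your own data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Hypothetical data: two well-separated blobs of 20 and 30 points.
X_demo = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(30, 2)),
])
labels = fcluster(linkage(X_demo, method="ward"), t=2, criterion="maxclust")

# Cluster sizes and centroids (fcluster labels start at 1).
ids, sizes = np.unique(labels, return_counts=True)
for cluster_id, size in zip(ids, sizes):
    center = X_demo[labels == cluster_id].mean(axis=0)
    print(cluster_id, size, center.round(2))
```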

---