### What is Hierarchical Clustering?

Hierarchical clustering groups similar things (data points) step by step, building a tree of nested groups (clusters). You don’t need to choose the number of groups beforehand.

---

### How It Works (Agglomerative Way):

1. Start with every item in its own group.
2. Find the two closest groups and combine them.
3. Repeat step 2 until all items are in one big group, or stop earlier once you reach the number of groups (or the merge distance) you want.
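
The three steps above can be sketched as a short loop. This is only a minimal illustration, assuming 1D values, single linkage (smallest pairwise distance), and a hypothetical `n_clusters` stopping parameter; the function names and the sample values are made up for the example.

```python
def single_linkage(c1, c2):
    """Smallest distance between any point in c1 and any point in c2."""
    return min(abs(p - q) for p in c1 for q in c2)


def agglomerate(values, n_clusters=1):
    """Merge the two closest clusters until only n_clusters remain."""
    clusters = [[v] for v in values]  # step 1: every item is its own group
    while len(clusters) > n_clusters:
        # step 2: find the closest pair (the first pair found wins ties)
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs,
                   key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        # step 3: replace the pair with the merged group, then repeat
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical sample values, stopping at two groups
print(agglomerate([1, 2, 6, 7, 15], n_clusters=2))  # [[15], [1, 2, 6, 7]]
```

Leaving `n_clusters` at 1 runs the loop all the way to one big group.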

---

### What is a Dendrogram?

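* A **dendrogram** is a tree diagram of the merges.
* It shows which groups got combined at each step, with each join drawn at the distance at which that merge happened.
* You can "cut" this tree at any height to decide how many groups you want: only the joins at or below the cut are kept.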
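
To make "cutting" concrete, here is a small sketch that stores a dendrogram as a list of merges and rebuilds the groups for a given cut height. The labels P to S, the heights, and the `cut` helper are all hypothetical, chosen only for illustration.

```python
# Merge history: (left group, right group, height at which they were joined).
# Labels and heights are hypothetical.
merges = [
    ({"P"}, {"Q"}, 1.0),
    ({"R"}, {"S"}, 2.0),
    ({"P", "Q"}, {"R", "S"}, 5.0),
]


def cut(merges, height):
    """Return the groups that exist if the tree is cut at `height`."""
    # Start from singletons, then apply every merge at or below the cut.
    groups = {frozenset([label])
              for left, right, _ in merges
              for label in left | right}
    for left, right, h in sorted(merges, key=lambda m: m[2]):
        if h <= height:
            groups -= {frozenset(left), frozenset(right)}
            groups.add(frozenset(left | right))
    return sorted(sorted(g) for g in groups)


print(cut(merges, 0.5))  # [['P'], ['Q'], ['R'], ['S']]
print(cut(merges, 3.0))  # [['P', 'Q'], ['R', 'S']]
print(cut(merges, 5.0))  # [['P', 'Q', 'R', 'S']]
```

A lower cut gives more, smaller groups; a higher cut gives fewer, larger ones.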

---

### How to Measure Distance Between Groups? (Linkage)

When merging groups, you need to know how close they are. There are different ways:

* **Single linkage:** Distance between two groups = the smallest distance between any two points in those groups.
  *Think: just one close pair is enough to join them.*

* **Complete linkage:** Distance = the largest distance between any two points in the groups.
  *Think: groups only join if all points are close.*

* **Average linkage:** Distance = the average distance between all pairs of points in the groups.
  *Think: average closeness matters.*

* **Ward’s linkage:** Merge the two groups whose union increases the total within-group variance the least.
  *Think: keeps groups tight and balanced.*
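
The four rules can be compared on two small 1D groups. The sketch below is illustrative only: the variable names are made up, and the Ward line uses one common formulation (the increase in within-group sum of squares caused by the merge), which may be scaled differently in a given library.

```python
from itertools import product

# Two small 1D groups, chosen only for illustration
g1 = [1, 2]
g2 = [5, 8]

# Distances for every cross-group pair: (1,5), (1,8), (2,5), (2,8)
pair_distances = [abs(p - q) for p, q in product(g1, g2)]  # [4, 7, 3, 6]

single = min(pair_distances)                         # closest pair
complete = max(pair_distances)                       # farthest pair
average = sum(pair_distances) / len(pair_distances)  # mean over all pairs

# Ward (one common formulation): increase in total within-group sum of
# squares, which equals n1*n2/(n1+n2) * (difference of group means)^2.
n1, n2 = len(g1), len(g2)
mean1, mean2 = sum(g1) / n1, sum(g2) / n2
ward_cost = n1 * n2 / (n1 + n2) * (mean1 - mean2) ** 2

print(single, complete, average, ward_cost)  # 3 7 5.0 25.0
```

The same two groups are 3, 7, or 5 apart depending on the rule, which is why the choice of linkage can change the order of merges.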

---

### Quick Example

Imagine you have 4 points: A, B, C, D.

* A and B are close, so you join them first.
* Then C and D are close, so you join them.
* Finally, you join the two groups {A,B} and {C,D}.


## Dataset (1D points)

| Point | Value |
| ----- | ----- |
| A     | 1     |
| B     | 2     |
| C     | 5     |
| D     | 8     |

---

### Step 1: Calculate distances between all points

Distance = absolute difference

|      | A(1) | B(2) | C(5) | D(8) |
| ---- | ---- | ---- | ---- | ---- |
| A(1) | 0    | 1    | 4    | 7    |
| B(2) | 1    | 0    | 3    | 6    |
| C(5) | 4    | 3    | 0    | 3    |
| D(8) | 7    | 6    | 3    | 0    |
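
The table above can be reproduced in a few lines. This is a minimal sketch; the names `labels` and `matrix` are just illustrative choices.

```python
points = {"A": 1, "B": 2, "C": 5, "D": 8}
labels = list(points)

# matrix[i][j] is the absolute difference between point i and point j
matrix = [[abs(points[r] - points[c]) for c in labels] for r in labels]

for label, row in zip(labels, matrix):
    print(label, row)
# A [0, 1, 4, 7]
# B [1, 0, 3, 6]
# C [4, 3, 0, 3]
# D [7, 6, 3, 0]
```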

---

### Step 2: Merge closest points/clusters

* Closest points: A & B (distance 1)
* Merge → cluster {A,B}

Clusters now: {A,B}, C, D

---

### Step 3: Update distances from new cluster {A,B} to others using **single linkage** (minimum distance between any points):

* Distance({A,B}, C) = min(distance A-C, distance B-C) = min(4,3) = 3
* Distance({A,B}, D) = min(7,6) = 6
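
With single linkage, the updated distances can be read straight from the Step 1 table without going back to the raw points. A small sketch, assuming the distances are stored in a dictionary keyed by unordered label pairs (the `dist` layout and the `merged_distance` helper are hypothetical):

```python
# Pairwise distances from the Step 1 table (each unordered pair once)
dist = {
    frozenset("AB"): 1, frozenset("AC"): 4, frozenset("AD"): 7,
    frozenset("BC"): 3, frozenset("BD"): 6, frozenset("CD"): 3,
}


def merged_distance(dist, a, b, other):
    """Single-linkage distance from the merged cluster {a, b} to `other`."""
    return min(dist[frozenset((a, other))], dist[frozenset((b, other))])


print(merged_distance(dist, "A", "B", "C"))  # 3
print(merged_distance(dist, "A", "B", "D"))  # 6
```

Swapping `min` for `max` would give the complete-linkage update instead.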

---

### Step 4: Merge closest clusters again

* Closest clusters: C & D (distance 3). {A,B} and C are also 3 apart, so this is a tie; here it is broken in favor of C & D.
* Merge → cluster {C,D}

Clusters now: {A,B}, {C,D}

---

### Step 5: Update distance between clusters {A,B} and {C,D}

* Distance({A,B}, {C,D}) = min over all cross-cluster pairs (A-C, A-D, B-C, B-D) = min(4, 7, 3, 6) = 3

---

### Step 6: Merge clusters {A,B} and {C,D} at distance 3

Clusters now: {A,B,C,D} (all points in one cluster)

---

## Prediction: Assign a new point E = 4 to a cluster

---

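### Step 1: Clusters just before the final merge

Both remaining merges happen at distance 3, so no single cut height gives exactly these two clusters; we take the state after Step 4 instead: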

* Cluster 1: {A (1), B (2)}
* Cluster 2: {C (5), D (8)}

---

### Step 2: Calculate distance from E=4 to each cluster (using single linkage)

* Distance(E, Cluster 1) = min(|4-1|, |4-2|) = min(3, 2) = 2
* Distance(E, Cluster 2) = min(|4-5|, |4-8|) = min(1, 4) = 1

---

### Step 3: Assign E to closest cluster

* Since 1 < 2, assign E to **Cluster 2 ({C, D})**
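
The result depends on how the distance from E to a cluster is measured. The sketch below repeats the assignment with three of the linkage rules from earlier; it is only an illustration, and the `rules` and `results` names are made up.

```python
cluster_1 = [1, 2]  # {A, B}
cluster_2 = [5, 8]  # {C, D}
e = 4

rules = {
    "single": min,                            # nearest member
    "complete": max,                          # farthest member
    "average": lambda ds: sum(ds) / len(ds),  # mean over members
}

results = {}
for name, rule in rules.items():
    d1 = rule([abs(e - v) for v in cluster_1])
    d2 = rule([abs(e - v) for v in cluster_2])
    results[name] = (d1, d2)
    print(name, d1, d2)
# single 2 1
# complete 3 4
# average 2.5 2.5
```

Single linkage puts E in Cluster 2, complete linkage puts it in Cluster 1, and average linkage gives a tie, so the rule should be chosen deliberately.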

---

## Summary Table

| Step       | Action                                     | Clusters after step | Distance or note                    |
| ---------- | ------------------------------------------ | ------------------- | ----------------------------------- |
| 1          | Calculate distances                        | A, B, C, D          | See Step 1 table                    |
| 2          | Merge closest points A & B                 | {A,B}, C, D         | Distance 1                          |
| 3          | Calculate distances from {A,B}             | {A,B}, C, D         | Distances 3 (to C) and 6 (to D)     |
| 4          | Merge closest clusters C & D               | {A,B}, {C,D}        | Distance 3 (tied with {A,B} and C)  |
| 5          | Calculate distance between {A,B} and {C,D} | {A,B}, {C,D}        | Distance 3                          |
| 6          | Merge {A,B} and {C,D}                      | {A,B,C,D}           | Distance 3                          |
| Prediction | Assign new point E=4                       | E assigned to {C,D} | Distances 2 (to {A,B}), 1 (to {C,D}) |



In [4]:
# Dataset points
points = {
    'A': 1,
    'B': 2,
    'C': 5,
    'D': 8
}

# Function to calculate distance between two points
def distance(p1, p2):
    return abs(p1 - p2)



In [5]:
# Initialize clusters: each point is its own cluster
clusters = [{k: v} for k, v in points.items()]

def single_linkage_distance(cluster1, cluster2):
    # Minimum distance between any point in cluster1 and any point in cluster2
    return min(
        distance(p1, p2)
        for p1 in cluster1.values()
        for p2 in cluster2.values()
    )

def merge_clusters(c1, c2):
    merged = c1.copy()
    merged.update(c2)
    return merged

# Hierarchical clustering steps
while len(clusters) > 1:
    min_dist = float('inf')
    to_merge = (None, None)
    
    # Find the two clusters with the minimum single-linkage distance.
    # The strict "<" below keeps the first pair found when distances tie.
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            dist = single_linkage_distance(clusters[i], clusters[j])
            if dist < min_dist:
                min_dist = dist
                to_merge = (i, j)
    
    # Merge the two closest clusters
    i, j = to_merge
    new_cluster = merge_clusters(clusters[i], clusters[j])
    
    # Remove old clusters and add the new merged cluster
    clusters = [clusters[k] for k in range(len(clusters)) if k not in (i, j)]
    clusters.append(new_cluster)
    
    # Print merge info
    print(f"Merged clusters with distance {min_dist}: {list(new_cluster.keys())}")

# Final cluster contains all points
final_cluster = clusters[0]

print("\nFinal cluster:", list(final_cluster.keys()))

Merged clusters with distance 1: ['A', 'B']
Merged clusters with distance 3: ['C', 'D']
Merged clusters with distance 3: ['A', 'B', 'C', 'D']

Final cluster: ['A', 'B', 'C', 'D']


In [6]:
# Prediction: assign new point E=4 to the nearest of the two clusters
# that existed just before the final merge (hard-coded here for simplicity):
cluster_1 = {'A': 1, 'B': 2}
cluster_2 = {'C': 5, 'D': 8}

new_point = 4

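def dist_to_cluster(point, cluster):
    # Single-linkage style: distance to the nearest member of the cluster
    return min(abs(point - val) for val in cluster.values())

dist1 = dist_to_cluster(new_point, cluster_1)
dist2 = dist_to_cluster(new_point, cluster_2)

# On a tie (dist1 == dist2) this falls through to Cluster 2
assigned = 'Cluster 1 (A,B)' if dist1 < dist2 else 'Cluster 2 (C,D)'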

print(f"\nNew point E={new_point} is assigned to {assigned} with distance {min(dist1, dist2)}")



New point E=4 is assigned to Cluster 2 (C,D) with distance 1
