# Lab 11: Clustering Algorithms

In this lab, you'll explore **unsupervised learning** through clustering — the task of grouping data points based on similarity without using labels. You'll compare K-Means, DBSCAN, and Hierarchical clustering, understand their parameters, and learn when to use each algorithm.

### Learning Objectives

- Understand K-Means clustering and choose K using the elbow method
- Understand DBSCAN and its sensitivity to ε (eps) and MinPts
- Interpret dendrograms from hierarchical clustering
- Compare clustering algorithms on datasets that highlight their strengths and weaknesses

### Overview

| Part | Topic                                 |
| ---- | ------------------------------------- |
| 1    | K-Means Clustering                    |
| 2    | DBSCAN Clustering                     |
| 3    | Hierarchical Clustering & Dendrograms |
| 4    | Algorithm Comparison                  |


---

## Setup


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.datasets import make_blobs, make_moons, make_circles
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

from scipy.cluster.hierarchy import dendrogram, linkage

import warnings

warnings.filterwarnings("ignore")

# Set random seed for reproducibility
np.random.seed(42)

### Helper Functions


In [None]:
def plot_clusters(
    X, labels, ax=None, title="Clusters", cmap="viridis", show_noise=True
):
    """
    Plot clustered data points with different colors for each cluster.

    Parameters:
    -----------
    X : array-like, shape (n_samples, 2)
        Data points
    labels : array-like
        Cluster labels (-1 indicates noise for DBSCAN)
    ax : matplotlib axis (optional)
    title : plot title
    cmap : colormap name (accepted for convenience; not used by this function)
    show_noise : whether to highlight noise points
    """
    if ax is None:
        fig, ax = plt.subplots(figsize=(8, 6))

    unique_labels = set(labels)

    # Plot each cluster
    for label in unique_labels:
        mask = labels == label
        if label == -1 and show_noise:
            # Noise points (DBSCAN)
            ax.scatter(
                X[mask, 0],
                X[mask, 1],
                c="gray",
                marker="x",
                s=50,
                alpha=0.6,
                label="Noise",
            )
        else:
            ax.scatter(
                X[mask, 0],
                X[mask, 1],
                s=50,
                edgecolors="k",
                linewidths=0.5,
                label=f"Cluster {label}",
            )

    ax.set_xlabel("Feature 1")
    ax.set_ylabel("Feature 2")
    ax.set_title(title)

    return ax


def plot_kmeans_centroids(X, kmeans, ax=None, title="K-Means Clusters"):
    """
    Plot K-Means clusters with centroids marked.
    """
    if ax is None:
        fig, ax = plt.subplots(figsize=(8, 6))

    # Plot data points colored by cluster
    scatter = ax.scatter(
        X[:, 0],
        X[:, 1],
        c=kmeans.labels_,
        cmap="viridis",
        s=50,
        edgecolors="k",
        linewidths=0.5,
    )

    # Plot centroids
    ax.scatter(
        kmeans.cluster_centers_[:, 0],
        kmeans.cluster_centers_[:, 1],
        c="red",
        marker="X",
        s=200,
        edgecolors="k",
        linewidths=2,
        label="Centroids",
    )

    ax.set_xlabel("Feature 1")
    ax.set_ylabel("Feature 2")
    ax.set_title(title)
    ax.legend()

    return ax

### Generate Datasets

We'll create several synthetic datasets that highlight different clustering scenarios.


In [None]:
# Dataset 1: Well-separated spherical blobs (ideal for K-Means)
X_blobs, y_blobs = make_blobs(
    n_samples=300, centers=3, cluster_std=0.6, random_state=42
)

# Dataset 2: Two moons (non-convex shapes, ideal for DBSCAN)
X_moons, y_moons = make_moons(n_samples=300, noise=0.08, random_state=42)

# Dataset 3: Concentric circles (non-convex, ideal for DBSCAN)
X_circles, y_circles = make_circles(
    n_samples=300, noise=0.05, factor=0.5, random_state=42
)

# Dataset 4: Blobs with varying density (challenging for DBSCAN)
X_varied, y_varied = make_blobs(
    n_samples=[100, 300, 50],
    centers=[[0, 0], [3, 3], [6, 0]],
    cluster_std=[0.3, 1.0, 0.5],
    random_state=42,
)

# Dataset 5: Data with outliers
X_outliers, y_outliers = make_blobs(
    n_samples=250, centers=2, cluster_std=0.5, random_state=42
)
# Add outliers
outliers = np.array([[-4, 4], [5, -3], [-3, -4], [6, 5], [0, 5]])
X_outliers = np.vstack([X_outliers, outliers])

# Visualize all datasets
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

datasets = [
    (X_blobs, "Spherical Blobs\n(Ideal for K-Means)"),
    (X_moons, "Two Moons\n(Non-convex shapes)"),
    (X_circles, "Concentric Circles\n(Non-convex shapes)"),
    (X_varied, "Varying Density\n(Different cluster sizes)"),
    (X_outliers, "Data with Outliers\n(5 outlier points added)"),
]

for ax, (X, title) in zip(axes, datasets):
    ax.scatter(X[:, 0], X[:, 1], s=30, edgecolors="k", linewidths=0.5)
    ax.set_title(title, fontsize=11)
    ax.set_xlabel("Feature 1")
    ax.set_ylabel("Feature 2")

# Hide the empty subplot
axes[5].axis("off")

plt.suptitle("Synthetic Datasets for Clustering", fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print(
    "These datasets will help us understand when each clustering algorithm excels or struggles."
)

---

## Part 1: K-Means Clustering

### How K-Means Works

K-Means is an iterative algorithm that partitions data into **K clusters** by:

1. **Initialize:** Randomly place K centroids
2. **Assign:** Assign each point to the nearest centroid
3. **Update:** Move each centroid to the mean of its assigned points
4. **Repeat:** Steps 2-3 until centroids stop moving (convergence)

**Objective:** Minimize the **within-cluster sum of squares (inertia)**:

$$\text{Inertia} = \sum_{i=1}^{K} \sum_{x \in C_i} ||x - \mu_i||^2$$

where $C_i$ is cluster $i$ and $\mu_i$ is its centroid.
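
To make the formula concrete, here is a minimal sketch that recomputes inertia by hand and compares it with the `inertia_` attribute scikit-learn reports. The dataset, its parameters, and the names `X_demo`, `km`, and `manual_inertia` are made up for this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Small illustrative dataset (parameters chosen only for this sketch)
X_demo, _ = make_blobs(n_samples=60, centers=3, cluster_std=0.6, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_demo)

# Inner sum: squared distances from the points of cluster i to its centroid mu_i.
# Outer sum: add those up over all K clusters.
manual_inertia = sum(
    ((X_demo[km.labels_ == i] - km.cluster_centers_[i]) ** 2).sum()
    for i in range(km.n_clusters)
)

print(f"Manual inertia:  {manual_inertia:.4f}")
print(f"sklearn inertia: {km.inertia_:.4f}")
```

The two numbers should agree up to floating-point rounding, which confirms that `inertia_` is exactly the double sum above.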


### 1.1 K-Means Step by Step

Let's visualize how K-Means converges over iterations. We'll manually implement the algorithm to see exactly what happens at each step.


In [None]:
# Manual K-Means implementation to visualize each step clearly
def kmeans_step_by_step(X, initial_centroids, n_steps=6):
    """
    Run K-Means manually, returning centroids and labels at each step.
    """
    centroids = initial_centroids.copy()
    history = []

    for step in range(n_steps):
        # ASSIGN: Each point to nearest centroid
        distances = np.sqrt(((X[:, np.newaxis] - centroids) ** 2).sum(axis=2))
        labels = distances.argmin(axis=1)

        # Calculate inertia
        inertia = sum(
            ((X[labels == k] - centroids[k]) ** 2).sum() for k in range(len(centroids))
        )

        history.append(
            {"centroids": centroids.copy(), "labels": labels.copy(), "inertia": inertia}
        )

        # UPDATE: Move centroids to mean of assigned points
        # Handle empty clusters by keeping centroid in place
        new_centroids = []
        for k in range(len(centroids)):
            if (labels == k).sum() > 0:
                new_centroids.append(X[labels == k].mean(axis=0))
            else:
                new_centroids.append(centroids[k])  # Keep old position
        new_centroids = np.array(new_centroids)

        # Check for convergence
        if np.allclose(centroids, new_centroids):
            # Fill remaining history with converged state
            for _ in range(n_steps - step - 1):
                history.append(history[-1].copy())
            break

        centroids = new_centroids

    return history


# Use blobs dataset
X = X_blobs.copy()

# Start with initial centroids positioned near the center
# This causes each centroid to initially "steal" points from multiple clusters,
# resulting in more gradual convergence over several steps
initial_centroids = np.array(
    [
        [-3, 3],  # Between top and bottom-left clusters
        [0, 0],  # Central position
        [1, 5],  # Between top and right clusters
    ]
)

# Run K-Means step by step
history = kmeans_step_by_step(X, initial_centroids, n_steps=4)

# Visualize
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
axes = axes.flatten()

for idx, (ax, state) in enumerate(zip(axes, history)):
    # Plot clusters
    ax.scatter(
        X[:, 0],
        X[:, 1],
        c=state["labels"],
        cmap="viridis",
        s=50,
        edgecolors="k",
        linewidths=0.5,
    )

    # Plot centroids
    ax.scatter(
        state["centroids"][:, 0],
        state["centroids"][:, 1],
        c="red",
        marker="X",
        s=200,
        edgecolors="k",
        linewidths=2,
    )

    # Draw arrows showing centroid movement (except first frame)
    if idx > 0:
        prev_centroids = history[idx - 1]["centroids"]
        for old, new in zip(prev_centroids, state["centroids"]):
            # Only draw arrow if centroid moved a meaningful distance
            dist = np.sqrt(((old - new) ** 2).sum())
            if dist > 0.1:  # Minimum distance threshold
                ax.annotate(
                    "",
                    xy=new,
                    xytext=old,
                    arrowprops=dict(arrowstyle="->", color="red", lw=2),
                )

    ax.set_title(f"Step {idx + 1}\nInertia: {state['inertia']:.1f}", fontsize=11)
    ax.set_xlabel("Feature 1")
    ax.set_ylabel("Feature 2")

plt.suptitle(
    "K-Means Convergence: Centroids Move Toward Cluster Centers", fontsize=14, y=1.02
)
plt.tight_layout()
plt.show()

print("Observe how:")
print(
    "• Centroids (red X) start in poor positions and MOVE toward the true cluster centers"
)
print("• Inertia DECREASES as the algorithm converges")
print("• Movement continues until centroids stabilize")

### 1.2 K-Means on Well-Separated Blobs

K-Means works best on **spherical, well-separated clusters** of similar size.


In [None]:
# Train K-Means with K=3 (the true number of clusters)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(X_blobs)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Original data
axes[0].scatter(
    X_blobs[:, 0],
    X_blobs[:, 1],
    c=y_blobs,
    cmap="viridis",
    s=50,
    edgecolors="k",
    linewidths=0.5,
)
axes[0].set_title("True Labels (Ground Truth)", fontsize=12)
axes[0].set_xlabel("Feature 1")
axes[0].set_ylabel("Feature 2")

# K-Means result
plot_kmeans_centroids(X_blobs, kmeans, ax=axes[1], title="K-Means Clustering (K=3)")

plt.tight_layout()
plt.show()

print(f"Inertia (within-cluster sum of squares): {kmeans.inertia_:.2f}")
print(f"Silhouette Score: {silhouette_score(X_blobs, kmeans.labels_):.3f}")
print("\nK-Means successfully identifies the three clusters!")

### 1.3 Choosing K: The Elbow Method

In practice, we often don't know the true number of clusters. The **elbow method** helps us choose K:

1. Run K-Means for K = 1, 2, 3, ...
2. Plot inertia vs K
3. Look for the "elbow" — the point where adding more clusters gives diminishing returns


In [None]:
# Elbow method
K_range = range(1, 11)
inertias = []
silhouettes = []

for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(X_blobs)
    inertias.append(kmeans.inertia_)
    if k > 1:  # Silhouette score requires at least 2 clusters
        silhouettes.append(silhouette_score(X_blobs, kmeans.labels_))

# Plot
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Elbow plot
axes[0].plot(K_range, inertias, "bo-", linewidth=2, markersize=8)
axes[0].axvline(x=3, color="red", linestyle="--", alpha=0.7, label="Elbow at K=3")
axes[0].set_xlabel("Number of Clusters (K)", fontsize=12)
axes[0].set_ylabel("Inertia", fontsize=12)
axes[0].set_title("Elbow Method", fontsize=14)
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Silhouette plot
axes[1].plot(range(2, 11), silhouettes, "go-", linewidth=2, markersize=8)
axes[1].axvline(x=3, color="red", linestyle="--", alpha=0.7, label="Best at K=3")
axes[1].set_xlabel("Number of Clusters (K)", fontsize=12)
axes[1].set_ylabel("Silhouette Score", fontsize=12)
axes[1].set_title("Silhouette Score (Higher is Better)", fontsize=14)
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("The 'elbow' in the inertia plot suggests K=3 is optimal.")
print("The silhouette score also peaks at K=3, confirming this choice.")

### 1.4 Sensitivity to K

Let's see what happens when we choose the wrong number of clusters.


In [None]:
# K-Means with different K values
K_values = [2, 3, 4, 5]

fig, axes = plt.subplots(1, 4, figsize=(18, 4))

for ax, k in zip(axes, K_values):
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(X_blobs)

    plot_kmeans_centroids(X_blobs, kmeans, ax=ax, title=f"K = {k}")
    ax.set_title(f"K = {k}\nInertia: {kmeans.inertia_:.1f}", fontsize=11)

plt.suptitle("Effect of K on Clustering Results (True K = 3)", fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print("K=2: Under-clustering — two natural clusters are merged")
print("K=3: Correct — matches the true structure")
print("K=4,5: Over-clustering — natural clusters are unnecessarily split")

### 1.5 Sensitivity to Initialization

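K-Means only converges to a local minimum of inertia, so the solution it finds depends on the initial centroid positions. This is why `n_init` matters: it sets how many independent initializations are run, and the run with the lowest inertia is kept.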


In [None]:
# K-Means with different random initializations
fig, axes = plt.subplots(2, 4, figsize=(18, 9))

seeds = [0, 1, 2, 3, 10, 20, 30, 42]

for ax, seed in zip(axes.flatten(), seeds):
    # Use n_init=1 to see effect of different initializations
    kmeans = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed)
    kmeans.fit(X_blobs)

    plot_kmeans_centroids(X_blobs, kmeans, ax=ax, title=f"Seed {seed}")
    ax.set_title(f"Random Seed: {seed}\nInertia: {kmeans.inertia_:.1f}", fontsize=10)
    ax.legend().remove()  # Remove legend for cleaner display

plt.suptitle(
    "K-Means with Different Random Initializations (n_init=1)", fontsize=14, y=1.02
)
plt.tight_layout()
plt.show()

print("Notice: Different initializations can lead to different results!")
print("Some have higher inertia (worse) than others.")
print("\nSolution: Use n_init > 1 to run multiple initializations and keep the best")
print("result (sklearn's default was 10 before version 1.4 and is 'auto' since then).")

### 1.6 K-Means Limitations: Non-Convex Shapes

K-Means assumes clusters are **spherical** and **similar in size**. Let's see what happens on non-convex data.
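
Part of the reason is visible in how a fitted model assigns points: each point simply goes to its nearest centroid, so the boundary between two clusters is a straight line (the perpendicular bisector between their centroids). The sketch below checks that rule against `predict()`. The names `X_demo`, `km`, `dists`, and `nearest` are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

X_demo, _ = make_moons(n_samples=200, noise=0.08, random_state=0)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_demo)

# Euclidean distance from every point to every centroid: shape (n_samples, n_clusters)
dists = np.linalg.norm(
    X_demo[:, np.newaxis, :] - km.cluster_centers_[np.newaxis, :, :], axis=2
)

# Nearest-centroid rule: pick the column with the smallest distance
nearest = dists.argmin(axis=1)

agreement = (nearest == km.predict(X_demo)).mean()
print(f"Fraction of points where predict() equals the nearest-centroid rule: {agreement:.3f}")
```

Because the assignment depends only on distance to two centroids, no choice of centroids can trace a curved moon or a ring.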


In [None]:
# K-Means on moons and circles datasets
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Moons - True labels
axes[0, 0].scatter(
    X_moons[:, 0],
    X_moons[:, 1],
    c=y_moons,
    cmap="viridis",
    s=50,
    edgecolors="k",
    linewidths=0.5,
)
axes[0, 0].set_title("Moons: True Labels", fontsize=12)

# Moons - K-Means
kmeans_moons = KMeans(n_clusters=2, random_state=42, n_init=10)
kmeans_moons.fit(X_moons)
plot_kmeans_centroids(
    X_moons, kmeans_moons, ax=axes[0, 1], title="Moons: K-Means (K=2)"
)

# Circles - True labels
axes[1, 0].scatter(
    X_circles[:, 0],
    X_circles[:, 1],
    c=y_circles,
    cmap="viridis",
    s=50,
    edgecolors="k",
    linewidths=0.5,
)
axes[1, 0].set_title("Circles: True Labels", fontsize=12)

# Circles - K-Means
kmeans_circles = KMeans(n_clusters=2, random_state=42, n_init=10)
kmeans_circles.fit(X_circles)
plot_kmeans_centroids(
    X_circles, kmeans_circles, ax=axes[1, 1], title="Circles: K-Means (K=2)"
)

for ax in axes.flatten():
    ax.set_xlabel("Feature 1")
    ax.set_ylabel("Feature 2")

plt.suptitle("K-Means Fails on Non-Convex Shapes", fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print("K-Means fails because it draws LINEAR boundaries between centroids.")
print(
    "It cannot capture the curved structure of moons or the nested structure of circles."
)
print("\nThis is where DBSCAN shines!")

### 1.7 Your Turn: Find Optimal K

**Task:** Use the elbow method to find the optimal K for a mystery dataset.


In [None]:
# Mystery dataset
X_mystery, _ = make_blobs(n_samples=400, centers=5, cluster_std=0.8, random_state=123)

# Visualize
plt.figure(figsize=(8, 6))
plt.scatter(X_mystery[:, 0], X_mystery[:, 1], s=30, edgecolors="k", linewidths=0.5)
plt.title("Mystery Dataset - How many clusters?", fontsize=12)
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

In [None]:
# TODO: Use the elbow method to find the optimal K
# 1. Run K-Means for K = 1 to 10
# 2. Collect inertias
# 3. Plot the elbow curve
# 4. Identify the optimal K

# K_range = range(1, 11)
# inertias = []

# for k in K_range:
#     TODO: Fit K-Means and record inertia

# TODO: Plot the elbow curve

---

## Part 2: DBSCAN Clustering

### How DBSCAN Works

**DBSCAN** (Density-Based Spatial Clustering of Applications with Noise) groups together points that are closely packed and marks points in low-density regions as outliers.

**Key parameters:**

- **eps (ε):** The maximum distance between two points to be considered neighbors
- **MinPts (min_samples):** Minimum number of points required to form a dense region

**Point types:**

- **Core point:** Has at least MinPts points within eps distance (scikit-learn's `min_samples` counts the point itself)
- **Border point:** Within eps of a core point but has fewer than MinPts neighbors
- **Noise point:** Not within eps of any core point (labeled as -1)
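
The three definitions above can be checked by counting neighbors directly. This is a minimal sketch, assuming scikit-learn's convention that a point counts as its own neighbor. The dataset and the names `is_core`, `is_border`, `is_noise`, and `sk_core` are made up for this illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

eps, min_samples = 0.5, 5  # illustrative values
X_demo, _ = make_blobs(n_samples=150, centers=2, cluster_std=0.5, random_state=0)
X_demo = np.vstack([X_demo, [[10.0, 10.0], [-10.0, -10.0]]])  # two obvious outliers

# For each point, the indices of all points within eps (the point itself included)
nn = NearestNeighbors(radius=eps).fit(X_demo)
neighborhoods = nn.radius_neighbors(X_demo, return_distance=False)

# Core: at least min_samples points in the eps-neighborhood
is_core = np.array([len(nbrs) >= min_samples for nbrs in neighborhoods])

# Border: not core, but at least one core point inside its eps-neighborhood
is_border = np.array(
    [not is_core[i] and is_core[nbrs].any() for i, nbrs in enumerate(neighborhoods)]
)

# Noise: everything else
is_noise = ~is_core & ~is_border

# Compare with what DBSCAN itself reports
db = DBSCAN(eps=eps, min_samples=min_samples).fit(X_demo)
sk_core = np.zeros(len(X_demo), dtype=bool)
sk_core[db.core_sample_indices_] = True

print("Core points match DBSCAN: ", np.array_equal(is_core, sk_core))
print("Noise points match DBSCAN:", np.array_equal(is_noise, db.labels_ == -1))
print(f"core={is_core.sum()}, border={is_border.sum()}, noise={is_noise.sum()}")
```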

**Advantages:**

- No need to specify K (number of clusters)
- Can find arbitrarily shaped clusters
- Automatically identifies outliers/noise


### 2.1 Core, Border, and Noise Points

Let's visualize these different point types.


In [None]:
# Demonstrate core, border, and noise points
from sklearn.neighbors import NearestNeighbors

# Use blobs with some outliers
X = X_outliers.copy()

# Fit DBSCAN
dbscan = DBSCAN(eps=0.5, min_samples=5)
labels = dbscan.fit_predict(X)

# Identify core samples
core_mask = np.zeros(len(X), dtype=bool)
core_mask[dbscan.core_sample_indices_] = True

# Identify noise, border points
noise_mask = labels == -1
border_mask = ~core_mask & ~noise_mask

# Plot
fig, ax = plt.subplots(figsize=(10, 8))

ax.scatter(
    X[core_mask, 0],
    X[core_mask, 1],
    c="blue",
    s=100,
    edgecolors="k",
    linewidths=1,
    label=f"Core Points ({core_mask.sum()})",
)
ax.scatter(
    X[border_mask, 0],
    X[border_mask, 1],
    c="yellow",
    s=80,
    edgecolors="k",
    linewidths=1,
    label=f"Border Points ({border_mask.sum()})",
)
ax.scatter(
    X[noise_mask, 0],
    X[noise_mask, 1],
    c="red",
    marker="x",
    s=100,
    linewidths=2,
    label=f"Noise Points ({noise_mask.sum()})",
)

ax.set_xlabel("Feature 1", fontsize=12)
ax.set_ylabel("Feature 2", fontsize=12)
ax.set_title(
    f"DBSCAN: Core, Border, and Noise Points\n(eps={0.5}, min_samples={5})", fontsize=14
)
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Core points (blue): Have ≥ min_samples neighbors within eps distance")
print(
    "Border points (yellow): Within eps of a core point, but have < min_samples neighbors"
)
print("Noise points (red X): Not within eps of any core point — these are outliers!")

### 2.2 DBSCAN on Non-Convex Data

DBSCAN can find **arbitrarily shaped clusters** — exactly where K-Means failed!


In [None]:
# DBSCAN on moons and circles
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Moons - True labels
axes[0, 0].scatter(
    X_moons[:, 0],
    X_moons[:, 1],
    c=y_moons,
    cmap="viridis",
    s=50,
    edgecolors="k",
    linewidths=0.5,
)
axes[0, 0].set_title("Moons: True Labels", fontsize=12)

# Moons - DBSCAN
dbscan_moons = DBSCAN(eps=0.2, min_samples=5)
labels_moons = dbscan_moons.fit_predict(X_moons)
plot_clusters(X_moons, labels_moons, ax=axes[0, 1], title="Moons: DBSCAN")

# Circles - True labels
axes[1, 0].scatter(
    X_circles[:, 0],
    X_circles[:, 1],
    c=y_circles,
    cmap="viridis",
    s=50,
    edgecolors="k",
    linewidths=0.5,
)
axes[1, 0].set_title("Circles: True Labels", fontsize=12)

# Circles - DBSCAN
dbscan_circles = DBSCAN(eps=0.2, min_samples=5)
labels_circles = dbscan_circles.fit_predict(X_circles)
plot_clusters(X_circles, labels_circles, ax=axes[1, 1], title="Circles: DBSCAN")

for ax in axes.flatten():
    ax.set_xlabel("Feature 1")
    ax.set_ylabel("Feature 2")

plt.suptitle("DBSCAN Successfully Clusters Non-Convex Shapes!", fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print("DBSCAN correctly identifies both moons and both circles!")
print(
    "Unlike K-Means, DBSCAN follows the DENSITY of points, not distance to centroids."
)

### 2.3 Sensitivity to eps

The **eps** parameter defines the neighborhood size. Too small = everything is noise. Too large = everything is one cluster.
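
A common heuristic for narrowing down eps before trying values by hand is the k-distance curve: compute each point's distance to its k-th nearest neighbor (with k tied to `min_samples`), sort those distances, and look for the knee where they start rising sharply. The sketch below is an optional aid rather than part of the lab's required steps. It prints a few percentiles instead of plotting, and the names `X_demo` and `k_distances` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

min_samples = 5
X_demo, _ = make_moons(n_samples=300, noise=0.08, random_state=42)

# Querying with the training data returns each point as its own first neighbor
# (distance 0), so the last of min_samples columns follows scikit-learn's
# convention that min_samples counts the point itself.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X_demo)
distances, _ = nn.kneighbors(X_demo)

# Sorted distance to the min_samples-th neighbor, one value per point
k_distances = np.sort(distances[:, -1])

for q in (50, 90, 99):
    print(f"{q}th percentile of k-distance: {np.percentile(k_distances, q):.3f}")
```

Passing `k_distances` to `plt.plot` gives the usual visual version of the curve. Values of eps near the knee are reasonable starting points, not guaranteed optima.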


In [None]:
# DBSCAN with different eps values
eps_values = [0.05, 0.1, 0.2, 0.3, 0.5, 1.0]

fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

for ax, eps in zip(axes, eps_values):
    dbscan = DBSCAN(eps=eps, min_samples=5)
    labels = dbscan.fit_predict(X_moons)

    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = (labels == -1).sum()

    plot_clusters(X_moons, labels, ax=ax, title=f"eps = {eps}")
    ax.set_title(f"eps = {eps}\nClusters: {n_clusters}, Noise: {n_noise}", fontsize=11)

plt.suptitle("DBSCAN Sensitivity to eps (min_samples=5)", fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print("eps too small (0.05): Almost everything is noise — neighborhoods are too tight")
print("eps too large (1.0): Everything merges into one cluster")
print("eps just right (0.2): Correctly identifies the two moons")

### 2.4 Sensitivity to min_samples (MinPts)

The **min_samples** parameter controls how many neighbors a point needs to be considered a core point.


In [None]:
# DBSCAN with different min_samples values
min_samples_values = [2, 3, 5, 10, 20, 50]

fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

for ax, min_samples in zip(axes, min_samples_values):
    dbscan = DBSCAN(eps=0.2, min_samples=min_samples)
    labels = dbscan.fit_predict(X_moons)

    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = (labels == -1).sum()

    plot_clusters(X_moons, labels, ax=ax, title=f"min_samples = {min_samples}")
    ax.set_title(
        f"min_samples = {min_samples}\nClusters: {n_clusters}, Noise: {n_noise}",
        fontsize=11,
    )

plt.suptitle("DBSCAN Sensitivity to min_samples (eps=0.2)", fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print("min_samples too low (2): May create spurious clusters from noise")
print("min_samples too high (50): Too strict — sparse regions become noise")
print("min_samples just right (5-10): Good balance")

### 2.5 DBSCAN Limitation: Varying Density

DBSCAN uses a **global eps** parameter. When clusters have very different densities, one eps value can't fit all.


In [None]:
# DBSCAN on varying density clusters
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# True labels
axes[0].scatter(
    X_varied[:, 0],
    X_varied[:, 1],
    c=y_varied,
    cmap="viridis",
    s=50,
    edgecolors="k",
    linewidths=0.5,
)
axes[0].set_title("True Labels\n(3 clusters of different densities)", fontsize=11)

# DBSCAN with small eps (good for dense cluster)
dbscan_small = DBSCAN(eps=0.3, min_samples=5)
labels_small = dbscan_small.fit_predict(X_varied)
n_clusters = len(set(labels_small)) - (1 if -1 in labels_small else 0)
plot_clusters(X_varied, labels_small, ax=axes[1])
axes[1].set_title(
    f"DBSCAN (eps=0.3)\nFinds {n_clusters} clusters, misses sparse one", fontsize=11
)

# DBSCAN with large eps (good for sparse cluster)
dbscan_large = DBSCAN(eps=0.8, min_samples=5)
labels_large = dbscan_large.fit_predict(X_varied)
n_clusters = len(set(labels_large)) - (1 if -1 in labels_large else 0)
plot_clusters(X_varied, labels_large, ax=axes[2])
axes[2].set_title(
    f"DBSCAN (eps=0.8)\nFinds {n_clusters} clusters, merges dense ones", fontsize=11
)

for ax in axes:
    ax.set_xlabel("Feature 1")
    ax.set_ylabel("Feature 2")

plt.suptitle("DBSCAN Struggles with Varying Density", fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print("The dilemma: No single eps works for all clusters!")
print("• Small eps: Misses the sparse (spread out) cluster")
print("• Large eps: Merges the dense (tight) clusters")

### 2.6 Your Turn: Tune DBSCAN Parameters

**Task:** Find good eps and min_samples values to correctly cluster this noisy dataset.


In [None]:
# Dataset with noise
X_noisy, y_noisy = make_blobs(
    n_samples=200, centers=3, cluster_std=0.4, random_state=42
)
# Add uniform noise
noise = np.random.uniform(low=-6, high=6, size=(30, 2))
X_noisy = np.vstack([X_noisy, noise])

# Visualize
plt.figure(figsize=(8, 6))
plt.scatter(X_noisy[:, 0], X_noisy[:, 1], s=30, edgecolors="k", linewidths=0.5)
plt.title("Noisy Dataset - Find the 3 clusters and identify noise", fontsize=12)
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

print("Goal: Identify 3 clusters and label the scattered points as noise (-1)")

In [None]:
# TODO: Experiment with different eps and min_samples values
# Try to find parameters that:
# 1. Correctly identify 3 clusters
# 2. Label scattered points as noise (gray X markers)

# eps = ???
# min_samples = ???

# dbscan = DBSCAN(eps=eps, min_samples=min_samples)
# labels = dbscan.fit_predict(X_noisy)

# plot_clusters(X_noisy, labels, title=f'DBSCAN (eps={eps}, min_samples={min_samples})')
# plt.show()

---

## Part 3: Hierarchical Clustering & Dendrograms

### How Hierarchical Clustering Works

**Agglomerative (bottom-up) clustering:**

1. Start with each point as its own cluster
2. Repeatedly merge the two closest clusters
3. Stop when all points are in one cluster (or desired number reached)

**Linkage methods** (how to measure distance between clusters):

- **Single linkage:** Distance between closest points (tends to create elongated clusters)
- **Complete linkage:** Distance between farthest points (tends to create compact clusters)
- **Average linkage:** Average distance between all pairs
- **Ward linkage:** Minimizes increase in total within-cluster variance (often best for spherical)
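
To see how the four linkage rules differ numerically, the sketch below computes each cluster-to-cluster distance by hand for two tiny hand-made clusters and compares it with the height of the final merge that SciPy's `linkage` reports. The points and variable names are made up for this illustration. The Ward line uses the centroid-based form of the distance that SciPy reports for Euclidean data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import cdist

# Two tiny clusters on a line (illustrative values)
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[3.0, 0.0], [4.5, 0.0]])

pairwise = cdist(A, B)  # all point-to-point distances between A and B
n_a, n_b = len(A), len(B)
centroid_gap = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))

by_hand = {
    "single": pairwise.min(),     # closest pair
    "complete": pairwise.max(),   # farthest pair
    "average": pairwise.mean(),   # mean over all pairs
    "ward": np.sqrt(2 * n_a * n_b / (n_a + n_b)) * centroid_gap,
}

# The last row of the linkage matrix is the merge of A with B; column 2 is its height
points = np.vstack([A, B])
from_scipy = {name: linkage(points, method=name)[-1, 2] for name in by_hand}

for name in by_hand:
    print(f"{name:>8}: by hand = {by_hand[name]:.4f}, scipy = {from_scipy[name]:.4f}")
```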

**Advantages:**

- No need to specify K upfront
- Produces a **dendrogram** — a tree showing the merge hierarchy
- Can cut the tree at different heights to get different numbers of clusters


### 3.1 Understanding Dendrograms

A **dendrogram** shows the hierarchical relationships between clusters. The **y-axis** shows the distance (or dissimilarity) at which clusters are merged.


In [None]:
# Use a smaller dataset for clearer visualization
X_small, y_small = make_blobs(n_samples=50, centers=3, cluster_std=0.6, random_state=42)

# Compute linkage matrix
Z = linkage(X_small, method="ward")

# Plot dendrogram
fig, ax = plt.subplots(figsize=(14, 7))

dendrogram(Z, ax=ax, leaf_rotation=90, leaf_font_size=8)

ax.set_xlabel("Sample Index", fontsize=12)
ax.set_ylabel("Distance (Ward)", fontsize=12)
ax.set_title("Dendrogram: Hierarchical Clustering with Ward Linkage", fontsize=14)

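# Add horizontal lines to show different cuts.
# Z[-1] is the final merge (2 -> 1 clusters), Z[-2] the merge 3 -> 2, Z[-3] the merge 4 -> 3,
# so a cut halfway between two consecutive merge heights gives the matching cluster count.
cut_3 = (Z[-3, 2] + Z[-2, 2]) / 2
cut_2 = (Z[-2, 2] + Z[-1, 2]) / 2
ax.axhline(y=cut_3, color="red", linestyle="--", alpha=0.7, label="Cut for 3 clusters")
ax.axhline(y=cut_2, color="blue", linestyle="--", alpha=0.7, label="Cut for 2 clusters")
ax.legend()

plt.tight_layout()
plt.show()

print("How to read a dendrogram:")
print("• Each leaf at the bottom is a data point")
print("• Each horizontal bar joins two clusters; its height = distance at merge")
print("• Cutting horizontally gives different numbers of clusters")
print(f"• Red line (height {cut_3:.1f}) → 3 clusters | Blue line (height {cut_2:.1f}) → 2 clusters")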

### 3.2 Cutting the Dendrogram

We can cut the dendrogram at different heights to obtain different numbers of clusters.
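
With SciPy, the cut itself can be made with `fcluster`, either by asking for a number of clusters or by giving a height. The sketch below is a minimal illustration on a small blob dataset. The names `X_demo`, `Z_demo`, `labels_k`, and `labels_h` are made up here, and the cut height is taken halfway between two consecutive merge heights rather than read off a plot.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

X_demo, _ = make_blobs(n_samples=50, centers=3, cluster_std=0.6, random_state=42)
Z_demo = linkage(X_demo, method="ward")

# Option 1: ask for at most 3 flat clusters
labels_k = fcluster(Z_demo, t=3, criterion="maxclust")

# Option 2: cut at a height. Z_demo[-3] is the merge 4 -> 3 and Z_demo[-2] the
# merge 3 -> 2, so any height between them leaves 3 clusters.
cut_height = (Z_demo[-3, 2] + Z_demo[-2, 2]) / 2
labels_h = fcluster(Z_demo, t=cut_height, criterion="distance")

print("Clusters from maxclust:  ", len(np.unique(labels_k)))
print("Clusters from height cut:", len(np.unique(labels_h)))
```

`AgglomerativeClustering(n_clusters=...)` makes the same kind of cut without exposing the linkage matrix.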


In [None]:
# Cut dendrogram at different heights / numbers of clusters
n_clusters_list = [2, 3, 4, 5]

fig, axes = plt.subplots(1, 4, figsize=(18, 4))

for ax, n_clusters in zip(axes, n_clusters_list):
    # Agglomerative clustering
    agg = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    labels = agg.fit_predict(X_small)

    # Plot
    scatter = ax.scatter(
        X_small[:, 0],
        X_small[:, 1],
        c=labels,
        cmap="viridis",
        s=60,
        edgecolors="k",
        linewidths=0.5,
    )
    ax.set_title(f"{n_clusters} Clusters", fontsize=12)
    ax.set_xlabel("Feature 1")
    ax.set_ylabel("Feature 2")

plt.suptitle("Cutting the Dendrogram at Different Levels", fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print("The dendrogram lets us explore clustering at multiple levels of granularity.")

### 3.3 Linkage Methods Comparison

Different linkage methods can produce very different cluster structures.


In [None]:
# Compare linkage methods
linkage_methods = ["single", "complete", "average", "ward"]

# Create a dataset that shows the differences
X_link = X_blobs.copy()

fig, axes = plt.subplots(2, 4, figsize=(18, 10))

for i, method in enumerate(linkage_methods):
    # Compute linkage
    Z = linkage(X_link, method=method)

    # Dendrogram
    dendrogram(
        Z,
        ax=axes[0, i],
        leaf_rotation=90,
        leaf_font_size=6,
        truncate_mode="lastp",
        p=30,
    )  # Condense the tree so only the last 30 merged clusters appear as leaves
    axes[0, i].set_title(f"{method.capitalize()} Linkage", fontsize=12)
    axes[0, i].set_xlabel("Cluster")
    if i == 0:
        axes[0, i].set_ylabel("Distance")

    # Clustering result with 3 clusters
    agg = AgglomerativeClustering(n_clusters=3, linkage=method)
    labels = agg.fit_predict(X_link)

    axes[1, i].scatter(
        X_link[:, 0],
        X_link[:, 1],
        c=labels,
        cmap="viridis",
        s=40,
        edgecolors="k",
        linewidths=0.5,
    )
    axes[1, i].set_xlabel("Feature 1")
    if i == 0:
        axes[1, i].set_ylabel("Feature 2")

plt.suptitle("Comparison of Linkage Methods (3 Clusters)", fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print("Linkage method effects:")
print("• Single: Tends to create 'chaining' — elongated clusters")
print("• Complete: Creates compact, similar-diameter clusters")
print("• Average: Compromise between single and complete")
print("• Ward: Minimizes variance — often best for spherical clusters")

### 3.4 The Chaining Effect (Single Linkage)

**Single linkage** can create long, chain-like clusters because it only considers the closest pair of points between clusters.


In [None]:
# Create elongated clusters to demonstrate chaining
np.random.seed(42)
n_points = 100

# Two elongated clusters with a potential bridge
cluster1_x = np.linspace(0, 5, n_points) + np.random.normal(0, 0.2, n_points)
cluster1_y = np.random.normal(0, 0.3, n_points)

cluster2_x = np.linspace(0, 5, n_points) + np.random.normal(0, 0.2, n_points)
cluster2_y = np.random.normal(2, 0.3, n_points)

# Add a few points that bridge the clusters
bridge_x = np.array([2.5, 2.6])
bridge_y = np.array([0.7, 1.3])

X_chain = np.vstack(
    [
        np.column_stack([cluster1_x, cluster1_y]),
        np.column_stack([cluster2_x, cluster2_y]),
        np.column_stack([bridge_x, bridge_y]),
    ]
)

# Compare single vs ward linkage
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Original data
axes[0].scatter(X_chain[:, 0], X_chain[:, 1], s=30, edgecolors="k", linewidths=0.5)
axes[0].scatter(bridge_x, bridge_y, c="red", s=100, marker="*", label="Bridge points")
axes[0].set_title("Data with Bridge Points", fontsize=12)
axes[0].legend()

# Single linkage
agg_single = AgglomerativeClustering(n_clusters=2, linkage="single")
labels_single = agg_single.fit_predict(X_chain)
axes[1].scatter(
    X_chain[:, 0],
    X_chain[:, 1],
    c=labels_single,
    cmap="viridis",
    s=30,
    edgecolors="k",
    linewidths=0.5,
)
axes[1].set_title(
    "Single Linkage (2 clusters)\nBridge points can chain the two bands together", fontsize=11
)

# Ward linkage
agg_ward = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels_ward = agg_ward.fit_predict(X_chain)
axes[2].scatter(
    X_chain[:, 0],
    X_chain[:, 1],
    c=labels_ward,
    cmap="viridis",
    s=30,
    edgecolors="k",
    linewidths=0.5,
)
axes[2].set_title(
    "Ward Linkage (2 clusters)\nCorrectly separates clusters", fontsize=11
)

for ax in axes:
    ax.set_xlabel("Feature 1")
    ax.set_ylabel("Feature 2")

plt.suptitle("The Chaining Effect in Single Linkage", fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print(
    "Single linkage chains through the bridge points, merging both elongated clusters."
)
print("Ward linkage is more robust to such 'bridges'.")

### 3.5 Using Dendrograms to Choose K

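Look for **large jumps** in merge height in the dendrogram: a long vertical stretch with no merges means the clusters below it are well separated, so cutting the tree there gives a natural number of clusters.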
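
The same idea can be sketched programmatically: find the largest gap between consecutive merge heights and cut the tree there with `scipy.cluster.hierarchy.fcluster`. This is only a heuristic, and the toy data below (`X_demo`, with hand-picked centers) is an assumption for illustration rather than the lab's dataset.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

# Toy data: three well-separated blobs (centers chosen for illustration)
X_demo, _ = make_blobs(
    n_samples=150, centers=[[0, 0], [8, 0], [4, 7]], cluster_std=0.6, random_state=0
)

Z_demo = linkage(X_demo, method="ward")
heights = Z_demo[:, 2]  # merge heights, non-decreasing for Ward linkage

# Find the largest gap between consecutive merge heights
gaps = np.diff(heights)
i = int(np.argmax(gaps))
cut_height = (heights[i] + heights[i + 1]) / 2

# Cut the tree at that height and count the resulting clusters
labels_demo = fcluster(Z_demo, t=cut_height, criterion="distance")
k_demo = len(np.unique(labels_demo))
print(f"Cut at height {cut_height:.2f} -> {k_demo} clusters")
```

On real data the largest gap is often less clear-cut, so treat the result as a starting point and check it against the dendrogram by eye.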


In [None]:
# Use blobs dataset
Z = linkage(X_blobs, method="ward")

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Truncated dendrogram (shows only the last 50 merged clusters)
dendrogram(Z, ax=axes[0], truncate_mode="lastp", p=50, leaf_rotation=90)
axes[0].axhline(y=15, color="red", linestyle="--", alpha=0.7)
axes[0].set_title("Dendrogram (Ward Linkage)\nLook for large gaps!", fontsize=12)
axes[0].set_xlabel("Cluster")
axes[0].set_ylabel("Distance")

# Distance vs number of clusters
# Get the distances at which merges occur
distances = Z[:, 2]
n_clusters_range = range(1, 11)

# The k-th largest merge distance is the height of the merge that leaves k clusters
# (it reduces k+1 clusters to k), so it is plotted at x = k
sorted_distances = np.sort(distances)[::-1]
cluster_distances = sorted_distances[:10]

axes[1].plot(n_clusters_range, cluster_distances, "bo-", linewidth=2, markersize=8)
axes[1].set_xlabel("Number of Clusters", fontsize=12)
axes[1].set_ylabel("Distance at Merge", fontsize=12)
axes[1].set_title(
    "Merge Distance vs Number of Clusters\n(Large drop suggests natural boundary)",
    fontsize=12,
)
axes[1].grid(True, alpha=0.3)
axes[1].axvline(x=3, color="red", linestyle="--", alpha=0.7, label="Suggested K=3")
axes[1].legend()

plt.tight_layout()
plt.show()

print("The large gap in the dendrogram around height 15 suggests K=3 clusters.")
print("In the right plot, merge distance falls sharply on reaching 3 clusters:")
print("merging further (down to 2 or 1 clusters) is far more costly.")

### 3.6 Hierarchical Clustering on the Iris Dataset

Let's apply hierarchical clustering to a real dataset.


In [None]:
from sklearn.datasets import load_iris

# Load Iris dataset
iris = load_iris()
X_iris = iris.data
y_iris = iris.target
feature_names = iris.feature_names
target_names = iris.target_names

# Scale features
scaler = StandardScaler()
X_iris_scaled = scaler.fit_transform(X_iris)

# Compute linkage
Z_iris = linkage(X_iris_scaled, method="ward")

# Plot dendrogram
fig, ax = plt.subplots(figsize=(14, 7))

dendrogram(Z_iris, ax=ax, truncate_mode="lastp", p=30, leaf_rotation=90)
ax.axhline(y=7, color="red", linestyle="--", alpha=0.7, label="Cut for 3 clusters")
ax.set_xlabel("Cluster", fontsize=12)
ax.set_ylabel("Distance (Ward)", fontsize=12)
ax.set_title("Dendrogram of Iris Dataset", fontsize=14)
ax.legend()

plt.tight_layout()
plt.show()

print(f"Iris dataset has 3 species: {list(target_names)}")
print("The dendrogram shows a natural split into 2 or 3 clusters.")

In [None]:
# Compare clustering result with true labels
agg_iris = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels_iris = agg_iris.fit_predict(X_iris_scaled)

# Visualize using first two principal components for 2D projection
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_iris_2d = pca.fit_transform(X_iris_scaled)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# True labels
for i, name in enumerate(target_names):
    mask = y_iris == i
    axes[0].scatter(
        X_iris_2d[mask, 0],
        X_iris_2d[mask, 1],
        s=50,
        edgecolors="k",
        linewidths=0.5,
        label=name,
    )
axes[0].set_title("True Species Labels", fontsize=12)
axes[0].legend()

# Hierarchical clustering result
for i in range(3):
    mask = labels_iris == i
    axes[1].scatter(
        X_iris_2d[mask, 0],
        X_iris_2d[mask, 1],
        s=50,
        edgecolors="k",
        linewidths=0.5,
        label=f"Cluster {i}",
    )
axes[1].set_title("Hierarchical Clustering (Ward, K=3)", fontsize=12)
axes[1].legend()

for ax in axes:
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")

plt.suptitle(
    "Iris Dataset: True Labels vs Hierarchical Clustering", fontsize=14, y=1.02
)
plt.tight_layout()
plt.show()

# Measure agreement with the true species using the Adjusted Rand Index (ARI).
# ARI compares groupings, so it does not matter that cluster IDs differ from species IDs.
from sklearn.metrics import adjusted_rand_score

ari = adjusted_rand_score(y_iris, labels_iris)
print(f"Adjusted Rand Index: {ari:.3f}")
print("(1.0 = perfect match, about 0.0 = chance-level agreement; can be negative)")
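
The sketch below illustrates why ARI is used here instead of plain accuracy. The labels `y_true` and `y_pred` are hypothetical toy values: they describe exactly the same grouping, but with different cluster IDs. A contingency table from `pd.crosstab` is one simple way to see which cluster corresponds to which class.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Toy labels (hypothetical): same grouping, different label names
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([2, 2, 2, 0, 0, 0, 1, 1, 1])

# Naive accuracy is misleading because cluster IDs are arbitrary
naive_accuracy = (y_true == y_pred).mean()

# ARI only looks at which points are grouped together
ari_demo = adjusted_rand_score(y_true, y_pred)

# Rows are true classes, columns are cluster IDs
table = pd.crosstab(y_true, y_pred, rownames=["true"], colnames=["cluster"])

print(f"Naive accuracy: {naive_accuracy:.2f}")
print(f"ARI: {ari_demo:.2f}")
print(table)
```

The same `pd.crosstab` call could be applied to `y_iris` and `labels_iris` to see which species each cluster mostly contains.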

### 3.7 Your Turn: Interpret a Dendrogram

**Task:** Use the dendrogram to determine a good number of clusters for this mystery dataset.


In [None]:
# Mystery dataset
X_hier_mystery, _ = make_blobs(
    n_samples=200, centers=4, cluster_std=[0.5, 0.8, 0.6, 0.7], random_state=77
)

# TODO: Compute linkage (try 'ward')
# Z = linkage(X_hier_mystery, method='ward')

# TODO: Plot dendrogram
# dendrogram(Z, ...)

# TODO: Based on the dendrogram, what's a good number of clusters?
# Look for large gaps in the merge distances

---

## Part 4: Algorithm Comparison

Now let's compare all three clustering algorithms head-to-head on different types of data.


### 4.1 Visual Comparison Grid

Let's see how each algorithm performs on each dataset type.


In [None]:
# Datasets and their characteristics
datasets = [
    (X_blobs, "Spherical Blobs", 3),
    (X_moons, "Two Moons", 2),
    (X_circles, "Concentric Circles", 2),
    (X_varied, "Varying Density", 3),
    (X_outliers, "With Outliers", 2),
]

fig, axes = plt.subplots(5, 4, figsize=(18, 22))

for row, (X, name, true_k) in enumerate(datasets):
    # Standardize for fair comparison
    X_std = StandardScaler().fit_transform(X)

    # Original data
    axes[row, 0].scatter(X_std[:, 0], X_std[:, 1], s=30, edgecolors="k", linewidths=0.5)
    axes[row, 0].set_title(f"{name}\n(Original Data)", fontsize=10)

    # K-Means
    kmeans = KMeans(n_clusters=true_k, random_state=42, n_init=10)
    labels_km = kmeans.fit_predict(X_std)
    axes[row, 1].scatter(
        X_std[:, 0],
        X_std[:, 1],
        c=labels_km,
        cmap="viridis",
        s=30,
        edgecolors="k",
        linewidths=0.5,
    )
    axes[row, 1].scatter(
        kmeans.cluster_centers_[:, 0],
        kmeans.cluster_centers_[:, 1],
        c="red",
        marker="X",
        s=150,
        edgecolors="k",
    )
    axes[row, 1].set_title(f"K-Means (K={true_k})", fontsize=10)

    # DBSCAN (eps hand-tuned per dataset, on standardized features)
    if name in ("Spherical Blobs", "Varying Density"):
        eps = 0.5
    else:
        eps = 0.3

    dbscan = DBSCAN(eps=eps, min_samples=5)
    labels_db = dbscan.fit_predict(X_std)
    n_clusters_db = len(set(labels_db)) - (1 if -1 in labels_db else 0)
    n_noise = (labels_db == -1).sum()

    # Color points by DBSCAN label (noise points already have label -1)
    axes[row, 2].scatter(
        X_std[:, 0],
        X_std[:, 1],
        c=labels_db,
        cmap="viridis",
        s=30,
        edgecolors="k",
        linewidths=0.5,
    )
    # Mark noise points
    noise_mask = labels_db == -1
    if noise_mask.any():
        axes[row, 2].scatter(
            X_std[noise_mask, 0],
            X_std[noise_mask, 1],
            c="gray",
            marker="x",
            s=50,
            alpha=0.7,
        )
    axes[row, 2].set_title(
        f"DBSCAN (eps={eps})\n{n_clusters_db} clusters, {n_noise} noise", fontsize=10
    )

    # Hierarchical (Ward)
    agg = AgglomerativeClustering(n_clusters=true_k, linkage="ward")
    labels_agg = agg.fit_predict(X_std)
    axes[row, 3].scatter(
        X_std[:, 0],
        X_std[:, 1],
        c=labels_agg,
        cmap="viridis",
        s=30,
        edgecolors="k",
        linewidths=0.5,
    )
    axes[row, 3].set_title(f"Hierarchical (Ward, K={true_k})", fontsize=10)

# Add column headers
col_titles = ["Data", "K-Means", "DBSCAN", "Hierarchical"]
for ax, title in zip(axes[0], col_titles):
    ax.annotate(
        title,
        xy=(0.5, 1.15),
        xycoords="axes fraction",
        fontsize=12,
        fontweight="bold",
        ha="center",
    )

plt.suptitle("Clustering Algorithm Comparison", fontsize=16, y=1.01)
plt.tight_layout()
plt.show()

### 4.2 Performance Summary

The table below is a qualitative summary of the comparison grid above, using the parameter settings from that cell.


| Dataset            | K-Means     | DBSCAN      | Hierarchical | Best Choice            |
| ------------------ | ----------- | ----------- | ------------ | ---------------------- |
| Spherical Blobs    | ✓ Excellent | ✓ Good      | ✓ Excellent  | K-Means / Hierarchical |
| Two Moons          | ✗ Fails     | ✓ Excellent | ⚠ Okay       | DBSCAN                 |
| Concentric Circles | ✗ Fails     | ✓ Excellent | ⚠ Okay       | DBSCAN                 |
| Varying Density    | ✓ Good      | ⚠ Struggles | ✓ Good       | K-Means                |
| With Outliers      | ⚠ Affected  | ✓ Robust    | ⚠ Affected   | DBSCAN                 |
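
Because the synthetic datasets come with ground-truth labels, a qualitative rating like the ones above can be backed by a number. The following is a minimal sketch for the two-moons case only. It regenerates its own data (`X_demo`, `y_demo`), and the `make_moons` parameters are assumptions that may differ from the `X_moons` used earlier in the lab.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Regenerate a moons dataset locally (parameters are assumptions for this sketch)
X_demo, y_demo = make_moons(n_samples=300, noise=0.05, random_state=42)
X_demo = StandardScaler().fit_transform(X_demo)

models = {
    "K-Means": KMeans(n_clusters=2, random_state=42, n_init=10),
    "DBSCAN": DBSCAN(eps=0.3, min_samples=5),
    "Hierarchical (Ward)": AgglomerativeClustering(n_clusters=2, linkage="ward"),
}

# Score each algorithm against the known moon membership
scores = {}
for model_name, model in models.items():
    labels = model.fit_predict(X_demo)
    scores[model_name] = adjusted_rand_score(y_demo, labels)
    print(f"{model_name:<22} ARI = {scores[model_name]:.3f}")
```

Note that DBSCAN's noise label (-1) is treated by ARI as just another group, and that this kind of check is only possible when true labels exist.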


### 4.3 When to Use Each Algorithm

Here's a decision guide based on your data characteristics:


| Algorithm        | Use When                               | Avoid When                             |
| ---------------- | -------------------------------------- | -------------------------------------- |
| **K-Means**      | Spherical clusters of similar size     | Non-convex shapes (moons, circles)     |
|                  | You know or can estimate K             | Data with many outliers                |
|                  | Large datasets (scales well)           |                                        |
| **DBSCAN**       | Arbitrarily shaped clusters            | Clusters of very different densities   |
|                  | Data has outliers/noise                | High-dimensional data                  |
|                  | Unknown number of clusters             |                                        |
| **Hierarchical** | Want to explore multiple granularities | Very large datasets (O(n²) or worse)   |
|                  | Need to understand cluster hierarchy   | Non-convex shapes (with Ward)          |
|                  | Don't know K upfront                   |                                        |
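
To get a feel for why hierarchical clustering is listed as a poor fit for very large datasets, here is a rough back-of-the-envelope sketch. It assumes the algorithm stores every pairwise distance as a 64-bit float, as `scipy.spatial.distance.pdist` does. The helper `n_pairwise_distances` is hypothetical and written just for this example.

```python
import numpy as np
from scipy.spatial.distance import pdist


def n_pairwise_distances(n):
    """Number of entries in a condensed pairwise-distance matrix for n points."""
    return n * (n - 1) // 2


# Check the formula against pdist on a small random dataset
rng = np.random.default_rng(0)
X_small = rng.normal(size=(100, 2))
assert len(pdist(X_small)) == n_pairwise_distances(100)

# Approximate memory for float64 distances at different dataset sizes
for n in [1_000, 10_000, 100_000]:
    gib = n_pairwise_distances(n) * 8 / 1024**3
    print(f"n={n:>7,}: {n_pairwise_distances(n):>14,} distances, ~{gib:.2f} GiB")
```

Multiplying the number of points by 10 multiplies the number of stored distances by roughly 100, which is what quadratic growth means in practice.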


### 4.4 Your Turn: Choose the Right Algorithm

**Task:** For each dataset below, choose the most appropriate clustering algorithm and justify your choice.


In [None]:
# Mystery dataset 1
np.random.seed(55)
X_m1 = np.vstack(
    [
        np.random.randn(100, 2) * 0.5 + [0, 0],
        np.random.randn(100, 2) * 0.5 + [4, 0],
        np.random.randn(100, 2) * 0.5 + [2, 3],
    ]
)

# Mystery dataset 2
t = np.linspace(0, 2 * np.pi, 200)
X_m2 = np.vstack(
    [
        np.column_stack(
            [
                np.cos(t) + np.random.randn(200) * 0.1,
                np.sin(t) + np.random.randn(200) * 0.1,
            ]
        ),
        np.column_stack(
            [
                3 * np.cos(t) + np.random.randn(200) * 0.15,
                3 * np.sin(t) + np.random.randn(200) * 0.15,
            ]
        ),
    ]
)

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].scatter(X_m1[:, 0], X_m1[:, 1], s=30, edgecolors="k", linewidths=0.5)
axes[0].set_title("Mystery Dataset 1\nWhich algorithm?", fontsize=12)

axes[1].scatter(X_m2[:, 0], X_m2[:, 1], s=30, edgecolors="k", linewidths=0.5)
axes[1].set_title("Mystery Dataset 2\nWhich algorithm?", fontsize=12)

for ax in axes:
    ax.set_xlabel("Feature 1")
    ax.set_ylabel("Feature 2")

plt.tight_layout()
plt.show()

print("Think about:")
print("• What shape are the clusters?")
print("• How many clusters do you see?")
print("• Are there outliers?")
print("• Are the clusters similar in density/size?")

In [None]:
# TODO: For Mystery Dataset 1
# What algorithm would you choose? Why?
# Hint: What shape are the clusters?

# TODO: For Mystery Dataset 2
# What algorithm would you choose? Why?
# Hint: Can a straight line separate these clusters?

---

## Reflection Questions

1. **Why does K-Means fail on the moons dataset?** What assumption does K-Means make about cluster shapes?

2. **What happens to DBSCAN when clusters have very different densities?** Why is this a fundamental limitation?

3. **How would you choose between single linkage and Ward linkage** for hierarchical clustering? When might single linkage be appropriate despite the chaining effect?

4. **If you don't know anything about your data**, which algorithm would you try first, and why?

5. **How could you validate clustering results** when you don't have true labels?


---

## Summary

In this lab, you learned:

1. **K-Means** partitions data by minimizing within-cluster variance. It works best on spherical clusters of similar size, but requires specifying K and is sensitive to initialization.

2. **DBSCAN** finds density-based clusters of arbitrary shape and automatically identifies outliers. It doesn't require K but needs eps and min_samples parameters, and struggles with varying-density clusters.

3. **Hierarchical clustering** builds a tree of clusters (dendrogram) that shows relationships at multiple levels. It's useful when you want to explore different granularities but doesn't scale well to large datasets.

4. **No single algorithm is best for all data** — the choice depends on cluster shapes, density, outliers, and whether you know K.

### Key Takeaways

- Use the **elbow method** or **silhouette score** to choose K for K-Means
- DBSCAN excels at **non-convex shapes** and **outlier detection**
- **Dendrograms** help visualize hierarchical relationships and choose the number of clusters
- Always **visualize your clustering results** to validate they make sense
- Consider **scaling your features** before clustering: K-Means, DBSCAN, and hierarchical clustering all rely on distances, so a feature on a larger scale can dominate the result
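
As a quick reference for the first takeaway, here is a minimal sketch of choosing K with the silhouette score. The toy blobs (`X_demo`, with hand-picked centers) are an assumption for illustration. On real data the best-scoring K is a suggestion to inspect, not a guarantee.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data: three well-separated blobs (centers are assumptions for this sketch)
X_demo, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [6, 0], [3, 5]], cluster_std=0.6, random_state=42
)

# Silhouette score is undefined for K=1, so start at K=2
sil_scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X_demo)
    sil_scores[k] = silhouette_score(X_demo, labels)
    print(f"K={k}: silhouette = {sil_scores[k]:.3f}")

# Pick the K with the highest average silhouette
best_k = max(sil_scores, key=sil_scores.get)
print(f"Best K by silhouette: {best_k}")
```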
