
# Visualizing Unsupervised Learning

Unsupervised learning results can be hard to explain without visual tools. **Visualization helps both technical teams and decision-makers** make sense of discovered patterns and relationships in data.

This notebook demonstrates **common visualization techniques** for unsupervised learning methods such as clustering and dimensionality reduction:

- **Scatter Plot** (PCA or clustering)
- **Heatmap** (correlation or group structure)
- **Silhouette Plot** (cluster quality)
- **t-SNE / UMAP** (nonlinear 2-D views of high-dimensional groupings)


In [None]:
# Install required packages (uncomment if needed)
# !pip install numpy pandas matplotlib seaborn scikit-learn umap-learn

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE
import umap.umap_ as umap

sns.set_theme(style="whitegrid")

In [None]:
# Create synthetic 2D data with clusters
X, y = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=42)
X = StandardScaler().fit_transform(X)

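# Fit K-Means; n_init=10 pins the number of restarts, since the default
# changed across scikit-learn versions (10 before 1.4, "auto" from 1.4)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)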
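To make what `fit_predict` does a little more concrete, here is a minimal NumPy sketch of Lloyd's algorithm, the classic assign-then-update iteration behind k-means. It is a simplification under stated assumptions: `lloyd_kmeans` and `assign_to_nearest` are hypothetical helpers, initialization is a plain random pick of data points rather than scikit-learn's k-means++, and there is a single run instead of several restarts.

```python
import numpy as np


def assign_to_nearest(X, centers):
    """Return the index of the closest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical minimal k-means (Lloyd's algorithm), single random start."""
    rng = np.random.default_rng(seed)
    # Initialization: k distinct data points chosen at random
    # (scikit-learn's KMeans uses k-means++ instead).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = assign_to_nearest(X, centers)
        # Update step: each centroid moves to the mean of its points;
        # an empty cluster keeps its previous centroid.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment so the labels match the returned centroids.
    return assign_to_nearest(X, centers), centers


# Usage on three synthetic Gaussian clouds (illustrative data only).
rng = np.random.default_rng(42)
X_demo = np.vstack([
    rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in [(-3, -3), (3, 3), (-3, 3)]
])
demo_labels, demo_centers = lloyd_kmeans(X_demo, k=3)
print(np.bincount(demo_labels, minlength=3))  # points per cluster
```

Because a single random start can land in a poor local optimum, scikit-learn repeats the procedure `n_init` times and keeps the run with the lowest inertia.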

In [None]:
# Scatter plot of clustering result
plt.figure(figsize=(8, 6))
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=labels, palette='tab10', s=40)
plt.title("Scatter Plot – Cluster Visualization (K-Means)")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend(title="Cluster")
plt.tight_layout()
plt.show()
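The data above is already 2-D, so the scatter plot can show it directly. The list at the top also mentions PCA scatter plots, which are the usual route when there are more features than axes. The sketch below assumes a hypothetical 6-feature blob dataset (`X_hd`), clusters it in the full space, and uses `PCA` only to draw the result; the axis labels report how much variance each component keeps, which hints at how faithful the 2-D picture is.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical 6-feature dataset with 4 blobs, standardized like X above.
X_hd, _ = make_blobs(n_samples=300, n_features=6, centers=4, random_state=0)
X_hd = StandardScaler().fit_transform(X_hd)

# Cluster in the full 6-D space; PCA is used only to draw the result in 2-D.
hd_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_hd)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_hd)
ratio = pca.explained_variance_ratio_

fig, ax = plt.subplots(figsize=(8, 6))
# vmin/vmax pin label j to the j-th tab10 color
points = ax.scatter(X_pca[:, 0], X_pca[:, 1], c=hd_labels, cmap="tab10",
                    vmin=0, vmax=9, s=40)
ax.legend(*points.legend_elements(), title="Cluster")
ax.set_xlabel(f"PC1 ({ratio[0]:.0%} of variance)")
ax.set_ylabel(f"PC2 ({ratio[1]:.0%} of variance)")
ax.set_title(f"Scatter Plot – PCA Projection ({ratio.sum():.0%} of variance retained)")
plt.tight_layout()
plt.show()
```

Clustering before projecting, rather than clustering the 2-D PCA output, keeps the cluster assignments independent of whatever information the projection discards.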

In [None]:
# Heatmap of cluster centroids (values are in standardized units)
centroids_df = pd.DataFrame(kmeans.cluster_centers_, columns=["Feature 1", "Feature 2"])
# Diverging colormap centered on 0: blue below the feature mean, red above it
cmap = sns.diverging_palette(240, 10, as_cmap=True)
plt.figure(figsize=(6, 4))
sns.heatmap(centroids_df.T, annot=True, fmt=".2f", cmap=cmap, center=0,
            cbar_kws={'label': 'Centroid value (standardized)'})
plt.title("Heatmap – Cluster Centroids by Feature")
plt.xlabel("Cluster")
plt.ylabel("Feature")
plt.tight_layout()
plt.show()
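The list at the top also names correlation as a heatmap use case. A minimal sketch, assuming a hypothetical 5-feature dataset wrapped in a DataFrame (`features_df`); any numeric DataFrame works the same way. It plots pandas' default Pearson correlation matrix and masks the redundant upper triangle, following the pattern in seaborn's own gallery.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.datasets import make_blobs

# Hypothetical 5-feature dataset; any numeric DataFrame works the same way.
X_hd, _ = make_blobs(n_samples=300, n_features=5, centers=4, random_state=0)
features_df = pd.DataFrame(X_hd, columns=[f"Feature {i + 1}" for i in range(X_hd.shape[1])])

# Pairwise Pearson correlations between features (pandas' default method).
corr = features_df.corr()

# Hide the upper triangle (excluding the diagonal); the matrix is symmetric.
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

plt.figure(figsize=(6, 5))
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="coolwarm",
            vmin=-1, vmax=1, square=True, cbar_kws={"label": "Pearson r"})
plt.title("Heatmap – Feature Correlations")
plt.tight_layout()
plt.show()
```

Fixing `vmin=-1, vmax=1` keeps the color scale comparable across datasets instead of stretching it to whatever range the current matrix happens to span.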

In [None]:
# Silhouette Plot: one horizontal band per cluster, samples sorted by coefficient
silhouette_vals = silhouette_samples(X, labels)
silhouette_avg = silhouette_score(X, labels)

plt.figure(figsize=(8, 6))
y_lower = 10  # vertical gap before the first band
for i in range(kmeans.n_clusters):
    ith_vals = np.sort(silhouette_vals[labels == i])
    size = ith_vals.shape[0]
    y_upper = y_lower + size
    # Same tab10 color and label as cluster i in the scatter plot
    plt.fill_betweenx(np.arange(y_lower, y_upper), 0, ith_vals,
                      color=plt.cm.tab10(i), alpha=0.8, label=f'Cluster {i}')
    y_lower = y_upper + 10  # gap between bands

plt.axvline(x=silhouette_avg, color="red", linestyle="--",
            label=f"Average silhouette ({silhouette_avg:.2f})")
plt.title("Silhouette Plot – Cluster Cohesion and Separation")
plt.xlabel("Silhouette Coefficient")
plt.ylabel("Clustered Samples")
plt.legend()
plt.tight_layout()
plt.show()
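The silhouette plot above judges one choice of `k`. A common companion step is to compare the mean silhouette score over several candidate values of `k`. The sketch below assumes a hypothetical helper, `silhouette_by_k`, and four deliberately well-separated blobs at hand-picked centers, so the score should peak at `k = 4`. On real data the curve is often flatter, and the per-cluster plot is still worth checking for the winning `k`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, seed=42):
    """Hypothetical helper: mean silhouette of a KMeans fit for each candidate k."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        # silhouette_score needs at least 2 clusters, so k starts at 2
        scores[k] = silhouette_score(X, labels)
    return scores


# Four well-separated blobs at hand-picked (hypothetical) centers.
X_demo, _ = make_blobs(n_samples=300, centers=[[-5, -5], [-5, 5], [5, -5], [5, 5]],
                       cluster_std=0.5, random_state=42)
scores = silhouette_by_k(X_demo, range(2, 7))
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: mean silhouette = {s:.3f}")
print("best k:", best_k)
```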

In [None]:
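# t-SNE and UMAP projections
# Note: X is already 2-D here, so these embeddings only illustrate the API;
# the methods pay off on data with many more features.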
X_tsne = TSNE(n_components=2, random_state=42).fit_transform(X)
X_umap = umap.UMAP(n_components=2, random_state=42).fit_transform(X)

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

sns.scatterplot(x=X_tsne[:, 0], y=X_tsne[:, 1], hue=labels, palette='tab10', s=40, ax=axes[0])
axes[0].set_title("t-SNE Projection")

sns.scatterplot(x=X_umap[:, 0], y=X_umap[:, 1], hue=labels, palette='tab10', s=40, ax=axes[1])
axes[1].set_title("UMAP Projection")

for ax in axes:
    ax.set_xlabel("Component 1")
    ax.set_ylabel("Component 2")

plt.suptitle("t-SNE & UMAP – Nonlinear Dimensionality Reduction", fontsize=14)
plt.tight_layout()
plt.show()
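Because the demo data is only 2-D, the cell above cannot show what t-SNE adds over a linear projection. The sketch below assumes a hypothetical 20-feature blob dataset and compares PCA with t-SNE using scikit-learn's `trustworthiness` score. The score, between 0 and 1, measures how well each point's nearest neighbors in the 2-D embedding were also its neighbors in the original space. UMAP is left out only to avoid the extra `umap-learn` dependency; `umap.UMAP(n_components=2).fit_transform(X_hd)` could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness
from sklearn.preprocessing import StandardScaler

# Hypothetical 20-feature dataset with 4 blobs.
X_hd, _ = make_blobs(n_samples=300, n_features=20, centers=4, random_state=0)
X_hd = StandardScaler().fit_transform(X_hd)

# One linear and one nonlinear 2-D embedding of the same data.
embeddings = {
    "PCA": PCA(n_components=2).fit_transform(X_hd),
    "t-SNE": TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_hd),
}

# Higher trustworthiness = local neighborhoods are better preserved.
for name, emb in embeddings.items():
    tw = trustworthiness(X_hd, emb, n_neighbors=5)
    print(f"{name}: trustworthiness = {tw:.3f}")
```

Trustworthiness only checks local neighborhoods. Distances between well-separated groups in a t-SNE plot are not reliable, so it is worth reading the between-cluster layout with caution.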


## Summary: Why Visualization Matters

Unsupervised learning has no ground-truth labels to score against, so there is no accuracy metric to fall back on. That makes **visualizing the outcome essential** for evaluating results and communicating insights.

| Technique         | Use Case |
|------------------|----------|
| **Scatter Plot** | Quick overview of clusters or PCA projections |
| **Heatmap**       | Compare features across clusters or groups |
| **Silhouette Plot** | Evaluate how tightly and distinctly clusters are formed |
| **t-SNE / UMAP**  | Explore complex patterns in high-dimensional data |

> Use multiple visualizations together to understand structure, validate assumptions, and communicate clearly to stakeholders.
