# Notebook 10 — Unsupervised Learning (Exercises)

In this notebook, you'll practice **unsupervised learning** techniques: clustering and dimensionality reduction.

## Exercise 1 — k-Means Clustering

- Generate a synthetic dataset with 2 clusters.
- Apply `k-means` with `k=2`.
- Plot the data points colored by cluster assignment.
- Print the cluster centers.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

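# Generate data with 2 clusters (provided)
X, _ = make_blobs(n_samples=200, centers=2, random_state=42)

# TODO: Fit k-means (n_init set explicitly to avoid a FutureWarning in scikit-learn 1.2 and 1.3)
# kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
# y_pred = kmeans.fit_predict(X)

# TODO: Print the cluster centers
# print(kmeans.cluster_centers_)

# TODO: Plot data points colored by cluster, with the centers marked as red crosses
# plt.scatter(X[:,0], X[:,1], c=y_pred, cmap='viridis')
# plt.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1], c='red', marker='x', s=100)
# plt.show()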
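To see roughly what `KMeans` does internally, here is a minimal NumPy sketch of Lloyd's algorithm: alternate between assigning each point to its nearest center and moving each center to the mean of its points, until the centers stop moving. The function `lloyd_kmeans` and its parameters are hypothetical, for illustration only. scikit-learn additionally uses k-means++ initialization and several restarts (`n_init`), so this is a sketch for intuition, not a replacement. The last lines check the result against `KMeans` on the same data.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical minimal k-means (Lloyd's algorithm), for intuition only."""
    rng = np.random.default_rng(seed)
    # Initialize the centers as k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        # (an empty cluster keeps its previous center)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break  # centers stopped moving
    # Final assignment against the final centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)


X, _ = make_blobs(n_samples=200, centers=2, random_state=42)
centers, labels = lloyd_kmeans(X, k=2)
print(centers)

# Compare with scikit-learn; cluster numbering is arbitrary, so sort centers first
ref = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X).cluster_centers_
mine_sorted = centers[np.argsort(centers[:, 0])]
ref_sorted = ref[np.argsort(ref[:, 0])]
print("matches KMeans:", np.allclose(mine_sorted, ref_sorted, atol=1e-4))
```

On well-separated blobs like these, both versions should land on the same two cluster means; on harder data a single random initialization can get stuck in a worse local optimum, which is why scikit-learn restarts several times.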

## Exercise 2 — Hierarchical Clustering

- Use the same dataset from Exercise 1.
- Apply **Agglomerative Clustering** with `n_clusters=2`.
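- Compare the cluster assignments with the k-means labels. Cluster IDs are arbitrary, so compare which points are grouped together rather than the raw label values.
- Make a scatter plot colored by cluster assignment.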

In [2]:
from sklearn.cluster import AgglomerativeClustering

# TODO: Fit Agglomerative Clustering
# hc = AgglomerativeClustering(n_clusters=2)
# y_hc = hc.fit_predict(X)

# TODO: Plot
# plt.scatter(X[:,0], X[:,1], c=y_hc, cmap='coolwarm')
# plt.show()
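Because k-means might call a group `0` while agglomerative clustering calls the same group `1`, comparing label arrays element by element is misleading. A minimal sketch, assuming the adjusted Rand index (`sklearn.metrics.adjusted_rand_score`) as the comparison measure: it equals 1.0 for identical partitions regardless of how the clusters are numbered, and is near 0 for chance-level agreement. The data and both models are rebuilt here so the cell runs on its own.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

X, _ = make_blobs(n_samples=200, centers=2, random_state=42)

y_pred = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
y_hc = AgglomerativeClustering(n_clusters=2).fit_predict(X)  # default linkage: 'ward'

# ARI ignores label permutations: 1.0 = same partition, about 0 = chance level
ari = adjusted_rand_score(y_pred, y_hc)
print(f"Adjusted Rand index (k-means vs. agglomerative): {ari:.3f}")

# Cross-tabulate the two labelings: counts only on the diagonal or only on
# the anti-diagonal mean the two methods found the same grouping
table = np.zeros((2, 2), dtype=int)
for a, b in zip(y_pred, y_hc):
    table[a, b] += 1
print(table)
```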

## Exercise 3 — Principal Component Analysis (PCA)

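- Generate a 3D dataset (3 features, 3 blobs) with `make_blobs`.
- Apply PCA to reduce it to 2D and check how much of the variance the two components retain.
- Plot the transformed dataset, colored by blob label.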

In [3]:
from sklearn.decomposition import PCA

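# Generate a 3D dataset with 3 blobs (provided); keep the labels for coloring
X3, y3 = make_blobs(n_samples=200, centers=3, n_features=3, random_state=42)

# TODO: Apply PCA
# pca = PCA(n_components=2)
# X2 = pca.fit_transform(X3)
# print(pca.explained_variance_ratio_)  # fraction of variance kept by each component

# TODO: Plot, colored by blob label
# plt.scatter(X2[:,0], X2[:,1], c=y3, cmap='viridis')
# plt.xlabel('PC 1'); plt.ylabel('PC 2')
# plt.show()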
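PCA can be computed directly from the singular value decomposition (SVD) of the mean-centered data: the rows of `Vt` are the principal axes, ordered by variance, and projecting onto the first two gives the 2D coordinates. The sketch below does this in NumPy and checks it against `sklearn.decomposition.PCA`. Each component's sign is arbitrary, so the projections are compared by absolute value.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X3, y3 = make_blobs(n_samples=200, centers=3, n_features=3, random_state=42)

# Center the data: principal axes are defined relative to the mean
Xc = X3 - X3.mean(axis=0)

# SVD of the centered data; rows of Vt are the principal axes
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X2_manual = Xc @ Vt[:2].T  # project onto the first two axes

# Variance along each axis is S**2 / (n - 1); normalize by the total
explained = S**2 / (len(X3) - 1)
ratio_manual = explained[:2] / explained.sum()

pca = PCA(n_components=2)
X2 = pca.fit_transform(X3)

print("manual ratio :", ratio_manual)
print("sklearn ratio:", pca.explained_variance_ratio_)
# Component signs are arbitrary, so compare magnitudes
print("same projection (up to sign):", np.allclose(np.abs(X2_manual), np.abs(X2), atol=1e-6))
```

The sum of the two ratios tells you how much of the original 3D spread survives in the 2D plot.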

## Exercise 4 — t-SNE for Dimensionality Reduction

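- Use scikit-learn's handwritten digits dataset (`load_digits`: 1,797 grayscale 8×8 images, flattened to 64 features). Note that this is a small digits dataset, not MNIST (whose images are 28×28).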
- Reduce the data to 2D using t-SNE.
- Plot the 2D representation colored by digit label.

In [4]:
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# TODO: Load digits dataset
# digits = load_digits()
# X_digits, y_digits = digits.data, digits.target

# TODO: Apply t-SNE (this may take a minute)
# tsne = TSNE(n_components=2, random_state=42)
# X_tsne = tsne.fit_transform(X_digits)

# TODO: Plot
# plt.scatter(X_tsne[:,0], X_tsne[:,1], c=y_digits, cmap='tab10', s=10)
# plt.colorbar()
# plt.show()
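A t-SNE plot can look convincing even when it distorts the data, so it helps to check numerically how well local neighborhoods survive the reduction. A minimal sketch, assuming `sklearn.manifold.trustworthiness` as that check: it is at most 1, and 1 means every point's nearest neighbors in 2D were also near neighbors in the original 64-dimensional space. To keep it quick, it embeds a random subset of 500 digits; the subset size and `perplexity=30` are illustrative choices, not tuned values.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE, trustworthiness

digits = load_digits()
rng = np.random.default_rng(0)
idx = rng.choice(len(digits.data), size=500, replace=False)  # subsample for speed
X_sub, y_sub = digits.data[idx], digits.target[idx]

# Embed the 64-dimensional digit vectors into 2D
X_tsne = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X_sub)

# How well the 5-nearest-neighbor structure is preserved (higher is better, max 1.0)
score = trustworthiness(X_sub, X_tsne, n_neighbors=5)
print(f"t-SNE trustworthiness (k=5): {score:.3f}")
```

Trustworthiness only measures local structure; distances between well-separated clusters in a t-SNE plot are not meaningful and are not captured by this score.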