# CIFAR-10 Image Clustering (Unsupervised)
_Last updated: 2025-08-14 08:34 UTC_

**Goals**
- Extract features (simple CNN or pretrained)
- Reduce dimensionality (PCA/t-SNE/UMAP)
- Cluster (KMeans) and evaluate with NMI/ARI

In [None]:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
%matplotlib inline


## 1. Load CIFAR-10

In [None]:

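# Use only the 10,000-image test split to keep feature extraction and t-SNE fast
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x = x_test.astype('float32')  # pixel values stay in [0, 255]; preprocess_input rescales them later
y = y_test.flatten()          # (10000, 1) -> (10000,)
x.shape, y.shape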


## 2. Feature Extraction (pretrained MobileNetV2)

In [None]:

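# Upsample 32x32 images to 96x96, the smallest input size with ImageNet weights for MobileNetV2
x_resized = tf.image.resize(x, (96, 96)).numpy()
x_pp = preprocess_input(x_resized)  # scales [0, 255] to [-1, 1]
# include_top=False with pooling='avg' returns one 1280-dimensional vector per image
base = MobileNetV2(include_top=False, pooling='avg', input_shape=(96, 96, 3), weights='imagenet')
feat = base.predict(x_pp, batch_size=128, verbose=0)
feat.shape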


## 3. Dimensionality Reduction + Visualization

In [None]:

# PCA to 50 dimensions first, then t-SNE to 2-D for plotting
feat_pca = PCA(n_components=50, random_state=0).fit_transform(feat)
emb = TSNE(n_components=2, init='pca', random_state=0, learning_rate='auto').fit_transform(feat_pca)
plt.figure(figsize=(6, 6))
plt.scatter(emb[:, 0], emb[:, 1], s=5, c=y, cmap='tab10')
plt.title("t-SNE of features (color = true label)")
plt.show()


## 4. Clustering & Metrics

In [None]:

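# Cluster the raw 1280-dimensional features; k=10 matches the number of CIFAR-10 classes
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(feat)
labels = kmeans.labels_
# True labels are used only for evaluation; NMI and ARI ignore how cluster IDs are numbered
nmi = normalized_mutual_info_score(y, labels)
ari = adjusted_rand_score(y, labels)
nmi, ari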
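
KMeans assigns cluster IDs arbitrarily, so cluster 3 has no reason to correspond to class 3. The sketch below, on a small made-up label set (not CIFAR-10 results), illustrates why NMI and ARI suit this setting: renaming the cluster IDs leaves both scores unchanged. The arrays `y_true`, `pred`, and `perm` are hypothetical toy data.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Toy ground truth and a clustering that misplaces one point (illustrative only)
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
pred = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2])

# Rename cluster IDs: 0 -> 2, 1 -> 0, 2 -> 1 (same partition, different names)
perm = np.array([2, 0, 1])
pred_renamed = perm[pred]

nmi_a = normalized_mutual_info_score(y_true, pred)
nmi_b = normalized_mutual_info_score(y_true, pred_renamed)
ari_a = adjusted_rand_score(y_true, pred)
ari_b = adjusted_rand_score(y_true, pred_renamed)

print(f"NMI: {nmi_a:.3f} vs {nmi_b:.3f}")
print(f"ARI: {ari_a:.3f} vs {ari_b:.3f}")
```

Plain accuracy would change under the same renaming, which is why it is not used directly here.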


## 5. ✅ Exercises
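- Run KMeans on the PCA-50 features and compare NMI/ARI against the raw 1280-dimensional features
- Swap MobileNetV2 for EfficientNetB0 (remember to switch to the matching `preprocess_input`)
- Evaluate cluster purity by mapping each cluster to its majority true label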
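
As a starting point for the purity exercise, here is a minimal sketch of majority-vote mapping: each cluster is assigned the true label that occurs most often inside it, and purity is the fraction of points whose true label matches their cluster's assigned label. The function name `cluster_purity` is hypothetical (it is not part of scikit-learn), and the example labels are made up for illustration.

```python
from collections import Counter, defaultdict


def cluster_purity(true_labels, cluster_labels):
    """Return (purity, mapping) using a majority-vote cluster-to-class mapping."""
    true_labels = [int(t) for t in true_labels]
    cluster_labels = [int(c) for c in cluster_labels]
    if len(true_labels) != len(cluster_labels):
        raise ValueError("true_labels and cluster_labels must have the same length")
    if not true_labels:
        raise ValueError("labels must be non-empty")

    # Count how often each true label appears inside each cluster
    counts = defaultdict(Counter)
    for t, c in zip(true_labels, cluster_labels):
        counts[c][t] += 1

    # Each cluster votes for its most frequent true label
    mapping = {c: ctr.most_common(1)[0][0] for c, ctr in counts.items()}

    # Points agreeing with their cluster's majority label count as correct
    correct = sum(ctr[mapping[c]] for c, ctr in counts.items())
    return correct / len(true_labels), mapping


# Toy example: 8 points, 3 clusters, one point in the "wrong" cluster
toy_true = [0, 0, 0, 1, 1, 1, 2, 2]
toy_clusters = [1, 1, 0, 0, 0, 0, 2, 2]
purity, mapping = cluster_purity(toy_true, toy_clusters)
print(purity, mapping)
```

In the notebook this could be called as `cluster_purity(y, labels)`. Purity can only stay the same or rise as the number of clusters grows (it reaches 1.0 when every point is its own cluster), so compare it only between runs with the same `n_clusters`.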