# Real-World Use Case: Fashion MNIST Visualization

## 1. The Problem
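We have 70,000 grayscale images of clothing, each 28×28 pixels, so every image is a point in 784-dimensional space. We cannot draw 784 axes, so we need a 2-D map before we can look at the data at all. The question is whether the dataset is "clean", meaning whether the 10 classes form separate groups: do Sneakers separate from Ankle boots? Do Shirts mix with T-shirts?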

## 2. Why t-SNE?
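PCA is a linear projection that keeps the directions of largest global variance. Squeezed into 2 dimensions, it tends to leave visually similar classes (Shirt, T-shirt/top, Pullover, Coat) overlapping in one smudge. t-SNE is nonlinear: it builds a 2-D map in which points that are close in the original 784-dimensional space stay neighbors, at the cost of distorting the distances between clusters.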
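To put a number on that difference, here is a minimal sketch that compares a 2-D PCA projection with a 2-D t-SNE map using scikit-learn's `trustworthiness` score. The score runs from 0 to 1 and drops when points that are neighbors in the map were not neighbors in the original space. As an assumption for a quick, download-free run, it uses 500 of scikit-learn's bundled 8×8 `load_digits` images instead of Fashion-MNIST; the same calls apply to the Fashion-MNIST arrays loaded below.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness

# Stand-in data (assumption): scikit-learn's bundled digits, so nothing is downloaded.
X_digits, y_digits = load_digits(return_X_y=True)
X_digits = X_digits[:500]

# Two different 2-D views of the same 64-dimensional points.
X_pca_2d = PCA(n_components=2, random_state=0).fit_transform(X_digits)
X_tsne_2d = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(X_digits)

# Trustworthiness: how well each map avoids inventing false nearest neighbours.
k = 10
trust_pca = trustworthiness(X_digits, X_pca_2d, n_neighbors=k)
trust_tsne = trustworthiness(X_digits, X_tsne_2d, n_neighbors=k)
print(f"PCA   2-D trustworthiness: {trust_pca:.3f}")
print(f"t-SNE 2-D trustworthiness: {trust_tsne:.3f}")
```

A higher t-SNE score says its local neighborhoods are more faithful; it says nothing about whether distances between clusters are meaningful, which t-SNE does not try to preserve.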

## 3. Data
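Fashion-MNIST: 70,000 grayscale 28×28 images in 10 classes (60,000 training + 10,000 test, 7,000 per class). The code below downloads it from OpenML with scikit-learn's `fetch_openml`; Keras also ships it as `keras.datasets.fashion_mnist.load_data()`, which returns the train/test split separately.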
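The notebook draws its 1,000 images with a uniform random choice, which keeps the classes balanced only on average. If exactly equal class shares matter (100 images per class for a 1,000-image draw), a minimal sketch can use scikit-learn's `train_test_split` with `stratify=` instead. The helper name `stratified_subsample` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_subsample(X, y, n_samples, seed=0):
    """Hypothetical helper: draw n_samples rows, keeping each class's share of the full data."""
    # train_test_split with stratify=y allocates the sampled rows class by class;
    # the "test" part (the rows we do not keep) is discarded.
    X_sub, _, y_sub, _ = train_test_split(
        X, y, train_size=n_samples, stratify=y, random_state=seed
    )
    return X_sub, y_sub
```

In the notebook it would replace the `indices` lines, e.g. `X, y = stratified_subsample(fashion.data, fashion.target.astype(int), 1000, seed=42)`.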

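```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# 1. Load data. The first call downloads all 70,000 images from OpenML (cached afterwards);
#    we then keep a random subset of 1,000 so t-SNE runs quickly.
fashion = fetch_openml('Fashion-MNIST', version=1, cache=True, as_frame=False)

rng = np.random.default_rng(42)  # fixed seed so the subset (and the plot) is reproducible
indices = rng.choice(len(fashion.data), size=1000, replace=False)
X = fashion.data[indices]
y = fashion.target[indices].astype(int)  # OpenML stores the labels as strings '0'..'9'
target_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# 2. Pipeline: PCA (784 -> 50) then t-SNE (50 -> 2).
#    Compressing to ~50 dimensions first is the usual recommendation: it suppresses
#    pixel-level noise and makes t-SNE's pairwise-distance computations much cheaper.
pca = PCA(n_components=50, random_state=42)
X_pca = pca.fit_transform(X)

# max_iter was called n_iter before scikit-learn 1.5 (n_iter is gone in recent releases).
# The first 250 iterations are early exaggeration, so 300 leaves too few to converge;
# use the default of 1000 instead.
tsne = TSNE(n_components=2, perplexity=40, max_iter=1000, init='pca',
            random_state=42, verbose=1)
X_embedded = tsne.fit_transform(X_pca)

# 3. Plot, coloured by class. A categorical legend is enough: a colorbar would
#    suggest an ordering between class IDs that does not exist.
plt.figure(figsize=(12, 10))
scatter = plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=y, cmap='tab10', s=20)
plt.title('Fashion MNIST t-SNE Visualization (1,000 samples)')
plt.legend(handles=scatter.legend_elements()[0], labels=target_names, title='Class')
plt.show()
```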
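The plot answers the Shirt-vs-T-shirt question only by eye. A minimal sketch of a numeric check, assuming the `X_embedded`, `y` and `target_names` variables from the cell above: for every point, look up the class labels of its `k` nearest neighbors in the 2-D map and tally them into a class-by-class table. The helper names `neighbor_label_confusion` and `most_confused_pairs` and the default `k=10` are hypothetical illustrations, not scikit-learn functions. The result describes mixing in this particular t-SNE map, which shifts with perplexity and the random seed, not in the original pixel space.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def neighbor_label_confusion(embedding, labels, n_classes=10, k=10):
    """Entry [i, j]: share of the k nearest map neighbours of class-i points that are class j."""
    labels = np.asarray(labels, dtype=int)
    nn = NearestNeighbors(n_neighbors=k).fit(embedding)
    # kneighbors() with no argument leaves each point out of its own neighbour list.
    _, idx = nn.kneighbors()
    neighbour_labels = labels[idx]  # shape (n_points, k)
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (np.repeat(labels, k), neighbour_labels.ravel()), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Classes absent from the sample keep an all-zero row instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def most_confused_pairs(confusion, names, top=3):
    """Return the `top` largest off-diagonal (class, neighbour class, share) triples."""
    off_diag = confusion.copy()
    np.fill_diagonal(off_diag, 0.0)
    flat = np.argsort(off_diag, axis=None)[::-1][:top]
    rows, cols = np.unravel_index(flat, off_diag.shape)
    return [(names[i], names[j], float(off_diag[i, j])) for i, j in zip(rows, cols)]
```

In the notebook, `most_confused_pairs(neighbor_label_confusion(X_embedded, y), target_names)` lists the class pairs that share the most map neighborhoods; a large Shirt to T-shirt/top share would back up the mixing seen in the plot.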