# Dimensionality Reduction

Many Machine Learning problems involve thousands or even millions of features for
each training instance. Not only does this make training extremely slow, it can also
make it much harder to find a good solution, as we will see. This problem is often
referred to as the curse of dimensionality.

### Main Approaches for Dimensionality Reduction

#### Projection 

In most real-world problems, training instances are not spread out uniformly across
all dimensions. Many features are almost constant, while others are highly correlated
(as discussed earlier for MNIST). As a result, all training instances actually lie within
(or close to) a much lower-dimensional subspace of the high-dimensional space.
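
A minimal sketch of this idea on synthetic data (the toy dataset, the `basis` matrix and the noise level are assumptions made for illustration, not taken from the text): the 3D points below are generated from two latent coordinates plus a little noise, so they lie close to a 2D plane, and the singular values of the centered data show that almost all of the spread falls along two directions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data: 2 latent coordinates mapped linearly into 3D,
# plus a little noise, so the points lie close to a 2D plane.
latent = rng.normal(size=(500, 2))
basis = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, -0.5]])
X_toy = latent @ basis + 0.01 * rng.normal(size=(500, 3))

# Singular values of the centered data measure the spread along each
# principal direction; squaring and normalizing gives a share of variance.
X_toy_centered = X_toy - X_toy.mean(axis=0)
singular_values = np.linalg.svd(X_toy_centered, compute_uv=False)
variance_share = singular_values**2 / np.sum(singular_values**2)

print(variance_share)  # the third entry should be tiny compared with the first two
```

Dropping the third direction here would lose almost no information, which is the situation projection methods rely on.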

#### Manifold Learning

Many dimensionality reduction algorithms work by modeling the manifold on which
the training instances lie; this is called Manifold Learning. It relies on the manifold
assumption, also called the manifold hypothesis, which holds that most real-world
high-dimensional datasets lie close to a much lower-dimensional manifold. This
assumption is very often borne out empirically.
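
As an illustration of a low-dimensional manifold embedded in a higher-dimensional space, the sketch below uses scikit-learn's `make_swiss_roll` (the choice of this toy dataset is an assumption, not something the text above specifies). Each point has three features, yet with no noise its position is fully determined by two numbers: the position `t` along the roll and the height.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll

# X_roll has 3 features per point; t is the position along the roll.
X_roll, t = make_swiss_roll(n_samples=1000, noise=0.0, random_state=42)

# Without noise, the first and third coordinates trace a spiral whose
# radius equals t, and the second coordinate is the other free parameter.
# Two numbers are therefore enough to locate each 3D point.
radius = np.sqrt(X_roll[:, 0] ** 2 + X_roll[:, 2] ** 2)

print(X_roll.shape)
print(np.allclose(radius, t))
```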

### PCA

Principal Component Analysis (PCA) first identifies the hyperplane that lies closest
to the data, and then it projects the data onto it.
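
The two steps can be sketched by hand with NumPy. This is one common way to do it (a singular value decomposition of the centered data), shown on hypothetical toy data; it is not necessarily how any particular library implements PCA internally. The result is compared with scikit-learn's `PCA`, which should agree up to the sign of each component.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 points in 3D with unequal spread per axis.
X_toy = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.1])

# Step 1: identify the hyperplane. After centering, the rows of Vt are the
# principal directions, ordered by decreasing variance; the top two span
# the 2D plane closest to the data (in the least-squares sense).
X_toy_centered = X_toy - X_toy.mean(axis=0)
U, s, Vt = np.linalg.svd(X_toy_centered, full_matrices=False)
W2 = Vt[:2].T  # shape (3, 2)

# Step 2: project the data onto that hyperplane.
X2D_manual = X_toy_centered @ W2

# Compare with scikit-learn; the sign of each component is arbitrary,
# so absolute values are compared.
X2D_sklearn = PCA(n_components=2).fit_transform(X_toy)
print(np.allclose(np.abs(X2D_manual), np.abs(X2D_sklearn)))
```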

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
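from sklearn.datasets import fetch_openml

# fetch_mldata was removed from scikit-learn; OpenML hosts the same 70,000 x 784 dataset.
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target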

In [3]:
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X2D = pca.fit_transform(X)

In [4]:
pca.explained_variance_ratio_

array([0.09746116, 0.07155445])

In [5]:
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

In [8]:
len(pca.explained_variance_ratio_)

154
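
Passing a float between 0 and 1 as `n_components` asks scikit-learn to keep just enough components for the cumulative explained variance ratio to reach that fraction. A sketch of the equivalent manual computation follows. To avoid a download it uses the small `load_digits` dataset bundled with scikit-learn as a stand-in (an assumption), so the resulting count will differ from the 154 obtained on MNIST.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Stand-in dataset: 1,797 images of 8x8 pixels, i.e. 64 features.
X_digits, _ = load_digits(return_X_y=True)

# Fit PCA with all components and accumulate the explained variance ratios.
pca_full = PCA().fit(X_digits)
cumsum = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches 95%.
d = int(np.argmax(cumsum >= 0.95)) + 1

# Letting PCA choose the number of components should give the same count.
pca_95 = PCA(n_components=0.95).fit(X_digits)
print(d, pca_95.n_components_)
```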

In [9]:
X_compressed = pca.inverse_transform(X_reduced)

In [11]:
X_compressed.shape

(70000, 784)
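
The inverse transformation returns data with the original number of features, but the reconstruction is not exact, because the variance carried by the dropped components is lost. The sketch below measures this as the mean squared distance between each instance and its reconstruction. It again uses `load_digits` as a stand-in for MNIST (an assumption made to keep the example self-contained), and the variable names are chosen for illustration only.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X_digits, _ = load_digits(return_X_y=True)

# Compress while keeping 95% of the variance, then map back to 64 features.
pca_digits = PCA(n_components=0.95)
X_digits_reduced = pca_digits.fit_transform(X_digits)
X_digits_recovered = pca_digits.inverse_transform(X_digits_reduced)

# Mean squared distance between each original instance and its reconstruction.
reconstruction_error = np.mean(
    np.sum((X_digits - X_digits_recovered) ** 2, axis=1)
)

# Total mean squared distance to the mean, for comparison.
total_spread = np.mean(np.sum((X_digits - X_digits.mean(axis=0)) ** 2, axis=1))

print(X_digits_reduced.shape, X_digits_recovered.shape)
print(reconstruction_error / total_spread)  # should be at most about 0.05
```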