In [2]:
import numpy as np

from sklearn.decomposition import PCA, IncrementalPCA
from sklearn.random_projection import johnson_lindenstrauss_min_dim, GaussianRandomProjection

# The Curse of Dimensionality

High-dimensional training sets are expensive to train on, in both time and memory, but reducing the number of dimensions discards some information and can hurt model performance.

These training sets are also very sparse, because points in a high-dimensional space tend to lie far from one another. The number of training instances required to reach a given density grows exponentially with the number of dimensions. With just 100 features, all ranging from $0$ to $1$, you would need more training instances than there are atoms in the observable universe for training instances to be within $0.1$ of each other on average, assuming they were spread out uniformly across all dimensions.
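A rough sketch of this sparsity, under some simplifying assumptions: points are drawn uniformly from the unit hypercube, the helper `mean_pairwise_distance` is a hypothetical name introduced here, and the "atoms in the observable universe" figure of about $10^{80}$ is only a commonly quoted order-of-magnitude estimate. The grid count at the end is a back-of-the-envelope argument, not an exact calculation.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_pairwise_distance(n_dims, n_pairs=2_000):
    """Estimate the mean Euclidean distance between two uniform random points in [0, 1]^n_dims."""
    a = rng.random((n_pairs, n_dims))
    b = rng.random((n_pairs, n_dims))
    return np.linalg.norm(a - b, axis=1).mean()

# The typical distance between random points grows with the number of dimensions.
distances = {d: mean_pairwise_distance(d) for d in (2, 3, 100, 1_000)}
for d, dist in distances.items():
    print(f"{d:>5} dims: mean distance ~ {dist:.2f}")

# Back-of-the-envelope density argument: a grid with spacing 0.1 along each of
# 100 axes has 10 cells per axis, so 10**100 cells in total.
grid_cells = 10 ** 100
atoms_estimate = 10 ** 80  # assumed order of magnitude, not a precise figure
print(grid_cells > atoms_estimate)
```

The first loop shows the distance between random points increasing as dimensions are added; the last lines show why filling the space at a fixed spacing quickly becomes infeasible.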

In most data sets, training instances are not spread out uniformly across all dimensions. Many features are highly correlated or almost constant. As a result, the training instances lie within (or close to) a much lower-dimensional subspace of the high-dimensional space.

# Principal Component Analysis (PCA)

PCA first identifies the hyperplane that lies closest to the data and then projects the data onto it. To find that hyperplane, it identifies the axis (a $1$-dimensional subspace) that accounts for the largest amount of variance in the training set. It then finds a second axis, orthogonal to the first, that accounts for the largest amount of the remaining variance. It continues in this way, with each new axis orthogonal to all the previous ones, until it has enough axes to span the hyperplane. The $i^{th}$ axis is called the _$i^{th}$ principal component_ of the data.
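As a minimal sketch of this procedure, the cell below fits scikit-learn's `PCA` to synthetic data. The data set is an assumption made up for illustration: 200 points in 3 dimensions that lie close to a 2-dimensional plane, plus a little noise. The variable names (`latent`, `basis`, `X2d`) are likewise hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Synthetic 3-D data that lies close to a 2-D plane (illustrative assumption).
latent = rng.normal(size=(200, 2)) * [3.0, 1.0]      # more spread along the first latent axis
basis = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, 0.5]])                  # embeds the plane in 3-D
X = latent @ basis + rng.normal(scale=0.1, size=(200, 3))

# Keep the two axes that account for the most variance and project onto them.
pca = PCA(n_components=2)
X2d = pca.fit_transform(X)

print(pca.components_)                # one unit vector per row, mutually orthogonal
print(pca.explained_variance_ratio_)  # share of variance captured by each axis
```

Each row of `components_` is one principal component, ordered by the share of variance it explains, so `explained_variance_ratio_` is non-increasing.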

_Singular Value Decomposition (SVD)_ \
A factorization of the training matrix $X$ into $U\Sigma V^{T}$, where the columns of $V$ are the unit vectors that define all the principal components.
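The decomposition can be sketched with NumPy's `np.linalg.svd`. Two assumptions are worth flagging: the data here is synthetic, and it is centered by hand before the decomposition, since PCA assumes the data is centered around the origin. Note that `np.linalg.svd` returns $V^{T}$ rather than $V$, so the principal components are its rows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with very different spread along each direction (illustrative assumption).
mixing = np.array([[2.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.0, 0.0, 0.1]])
X = rng.normal(size=(100, 3)) @ mixing

# PCA assumes centered data, so subtract the per-feature mean first.
X_centered = X - X.mean(axis=0)

# X_centered = U @ diag(s) @ Vt, with singular values in s sorted in descending order.
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
V = Vt.T

c1, c2 = V[:, 0], V[:, 1]  # unit vectors defining the first two principal components
W2 = V[:, :2]              # first two columns of V
X2d = X_centered @ W2      # projection onto the plane they span
```

Multiplying the centered data by the first $d$ columns of $V$ gives the projection onto the $d$-dimensional hyperplane spanned by the first $d$ principal components.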