## Dimensionality Reduction
Reducing dimensionality does lose some information (just like compressing an image to JPEG can degrade its quality), so even though it will speed up training, it may also make your system perform slightly worse. It also makes your pipelines a bit more complex and thus harder to maintain. So you should first try to train your system with the original data, and consider dimensionality reduction only if training is too slow. In some cases, however, reducing the dimensionality of the training data may filter out some noise and unnecessary details and thus result in higher performance (but in general it won’t; it will just speed up training).

Apart from speeding up training, dimensionality reduction is also extremely useful for data visualization (or DataViz). Reducing the number of dimensions down to two (or three) makes it possible to plot a high-dimensional training set on a graph and often gain some important insights by visually detecting patterns, such as clusters.

We will present the two main approaches to dimensionality reduction (projection and Manifold Learning), and we will go through three of the most popular dimensionality reduction techniques: PCA, Kernel PCA, and LLE.

### Principal Component Analysis (PCA)
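Principal Component Analysis (PCA) is by far the most popular dimensionality reduction algorithm. First it identifies the hyperplane that lies closest to the data, and then it projects the data onto it.

PCA identifies the axis that accounts for the largest amount of variance in the training set. It then finds a second axis, orthogonal to the first, that accounts for the largest amount of remaining variance, and so on for as many axes as the dataset has dimensions. The unit vector that defines the i-th axis is called the i-th principal component (PC): c1 for the first axis, c2 for the second, and so on.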

The direction of the principal components is not stable: if you perturb the training set slightly and run PCA again, some of the new PCs may point in the opposite direction of the original PCs. However, they will generally still lie on the same axes. In some cases, a pair of PCs may even rotate or swap, but the plane they define will generally remain the same.

PCA assumes that the dataset is centered around the origin. If you implement PCA yourself, or if you use other libraries, don’t forget to center the data first.
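
As a minimal sketch of the centering step, and of one way to obtain the principal components by hand, the code below applies NumPy's svd() function to a small synthetic dataset. The dataset, the random seed, and the variable names (X_toy, X_centered, c1, c2) are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical toy dataset: 200 instances, 3 features, lying close to a 2D plane
rng = np.random.default_rng(42)
plane = np.array([[2.0, 0.5, 0.1],
                  [0.3, 1.0, 0.2]])
X_toy = rng.normal(size=(200, 2)) @ plane
X_toy += 0.05 * rng.normal(size=X_toy.shape)  # small noise off the plane
X_toy += np.array([5.0, -3.0, 1.0])           # shift away from the origin

# Center the data first: subtract the mean of each feature
X_centered = X_toy - X_toy.mean(axis=0)

# SVD of the centered data; each row of Vt is one principal component
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
c1 = Vt[0]  # unit vector of the first principal component
c2 = Vt[1]  # unit vector of the second principal component
```

Note that -c1 would be an equally valid first principal component, which is why the sign of a PC can flip between runs.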

### Projecting Down to d Dimensions
Once you have identified all the principal components, you can reduce the dimensionality of the dataset down to d dimensions by projecting it onto the hyperplane defined by the first d principal components. You now know how to reduce the dimensionality of any dataset down to any number of dimensions, while preserving as much variance as possible.
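
The projection itself can be sketched as a single matrix multiplication: the centered training set multiplied by the matrix whose columns are the first d principal components. The example below is a hedged illustration on random data; the names X_toy, W_d, and X_d_proj are hypothetical, and the principal components are assumed to come from NumPy's svd() function.

```python
import numpy as np

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 5))  # hypothetical dataset: 100 instances, 5 features

d = 2  # target number of dimensions

# Center the data, then get the principal components (rows of Vt)
X_centered = X_toy - X_toy.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)

# W_d has one column per principal component kept
W_d = Vt[:d].T                 # shape (5, d)
X_d_proj = X_centered @ W_d    # shape (100, d): the reduced dataset
```

Up to the sign of each column, this should match what a library PCA implementation returns for the same data and the same d.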

Scikit-Learn’s PCA class implements PCA using SVD (note that it automatically takes care of centering the data):

In [None]:
from sklearn.decomposition import PCA

# Reduce the dataset X down to two dimensions
pca = PCA(n_components=2)
X2D = pca.fit_transform(X)

In [None]:
# First principal component (components_ stores one PC per row)
pca.components_.T[:, 0]

In [None]:
# the explained variance ratio of each principal component
print(pca.explained_variance_ratio_)

The output is array([0.84248607, 0.14631839]). This tells you that 84.2% of the dataset’s variance lies along the first axis, and 14.6% lies along the second axis. This leaves less than 1.2% for the third axis, so it is reasonable to assume that it probably carries little information.

### Choosing the Right Number of Dimensions
Instead of arbitrarily choosing the number of dimensions to reduce down to, it is generally preferable to choose the number of dimensions that add up to a sufficiently large portion of the variance (e.g., 95%). The exception is data visualization, where you will generally want to reduce the dimensionality down to 2 or 3.

In [None]:
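import numpy as np

# Fit PCA without reducing dimensionality, then find the smallest d
# that preserves at least 95% of the training set's variance
pca = PCA()
pca.fit(X)
cumsum = np.cumsum(pca.explained_variance_ratio_)
d = np.argmax(cumsum >= 0.95) + 1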

In [None]:
# Alternatively, set n_components to the ratio of variance to preserve (between 0.0 and 1.0)
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
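
To check that the two approaches agree, here is a small self-contained sketch on synthetic data. The dataset (X_demo, with per-feature scales chosen arbitrarily) and the names pca_full and pca_95 are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 30 features whose scale decays from 5.0 down to 0.1
X_demo = rng.normal(size=(500, 30)) * np.linspace(5.0, 0.1, 30)

# Approach 1: fit a full PCA and read d off the cumulative explained variance
pca_full = PCA().fit(X_demo)
cumsum = np.cumsum(pca_full.explained_variance_ratio_)
d = int(np.argmax(cumsum >= 0.95)) + 1

# Approach 2: let PCA pick the number of components for a 95% variance ratio
pca_95 = PCA(n_components=0.95).fit(X_demo)

print(d, pca_95.n_components_)  # the two values are expected to match
```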

### PCA for Compression
The mean squared distance between the original data and the reconstructed data (compressed and then decompressed) is called the reconstruction error. For example, the following code compresses the MNIST dataset down to 154 dimensions, then uses the inverse_transform() method to decompress it back to 784 dimensions:

In [None]:
pca = PCA(n_components=154)
X_mnist_reduced = pca.fit_transform(X_mnist)  # compress: 784 -> 154 dimensions
X_mnist_recovered = pca.inverse_transform(X_mnist_reduced)  # decompress: 154 -> 784
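
The reconstruction error can then be computed directly from the original and recovered arrays. The following is a minimal sketch that uses random data as a stand-in for a real dataset; X_demo and the other names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 20))  # hypothetical stand-in for a real dataset

# Compress down to 5 dimensions, then decompress back to 20
pca_demo = PCA(n_components=5)
X_demo_reduced = pca_demo.fit_transform(X_demo)
X_demo_recovered = pca_demo.inverse_transform(X_demo_reduced)

# Mean squared distance between each instance and its reconstruction
reconstruction_error = np.mean(np.sum((X_demo - X_demo_recovered) ** 2, axis=1))
print(reconstruction_error)
```

Keeping more components should lower this value, and keeping all of them should bring it close to zero.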

### Incremental PCA
One problem with the preceding implementation of PCA is that it requires the whole training set to fit in memory in order for the SVD algorithm to run. Fortunately, Incremental PCA (IPCA) algorithms have been developed: you can split the training set into mini-batches and feed an IPCA algorithm one mini-batch at a time. This is useful for large training sets, and also to apply PCA online (i.e., on the fly, as new instances arrive).

The following code splits the MNIST dataset into 100 mini-batches (using NumPy’s array_split() function) and feeds them to Scikit-Learn’s IncrementalPCA class to reduce the dimensionality of the MNIST dataset down to 154 dimensions (just like before). Note that you must call the partial_fit() method with each mini-batch rather than the fit() method with the whole training set:

In [None]:
import numpy as np
from sklearn.decomposition import IncrementalPCA

n_batches = 100
inc_pca = IncrementalPCA(n_components=154)

# Feed the training set to IPCA one mini-batch at a time
for X_batch in np.array_split(X_mnist, n_batches):
    inc_pca.partial_fit(X_batch)

X_mnist_reduced = inc_pca.transform(X_mnist)

In [None]:
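# Alternatively, use NumPy's memmap class to manipulate a large array stored in a
# binary file on disk as if it were in memory; only the data needed is loaded.
# filename, m, and n are assumed to be defined earlier (file path and array shape).
X_mm = np.memmap(filename, dtype="float32", mode="readonly", shape=(m, n))
batch_size = m // n_batches
inc_pca = IncrementalPCA(n_components=154, batch_size=batch_size)
inc_pca.fit(X_mm)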

### Randomized PCA
Scikit-Learn offers yet another option to perform PCA, called Randomized PCA. This is a stochastic algorithm that quickly finds an approximation of the first d principal components. It is dramatically faster than the previous algorithms when d is much smaller than n (the number of features).

In [None]:
rnd_pca = PCA(n_components=154, svd_solver="randomized")
X_reduced = rnd_pca.fit_transform(X_mnist)

### Kernel PCA
The kernel trick can be applied to PCA, making it possible to perform complex nonlinear projections for dimensionality reduction. The resulting technique is called Kernel PCA (kPCA).

In [None]:
from sklearn.decomposition import KernelPCA

rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04)
X_reduced = rbf_pca.fit_transform(X)
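
Since X is not defined in this section, here is a self-contained sketch that applies kPCA with several kernels to a Swiss roll generated by Scikit-Learn's make_swiss_roll() function. The dataset, the gamma and coef0 values, and the names X_roll, kernels, and reduced are assumptions chosen for illustration, not tuned settings.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA

# Hypothetical 3D dataset: a noisy Swiss roll
X_roll, t = make_swiss_roll(n_samples=500, noise=0.2, random_state=42)

# One kPCA model per kernel; hyperparameter values are illustrative guesses
kernels = {
    "linear": KernelPCA(n_components=2, kernel="linear", random_state=42),
    "rbf": KernelPCA(n_components=2, kernel="rbf", gamma=0.04, random_state=42),
    "sigmoid": KernelPCA(n_components=2, kernel="sigmoid", gamma=0.001,
                         coef0=1, random_state=42),
}

# Reduce the Swiss roll to two dimensions with each kernel
reduced = {name: kpca.fit_transform(X_roll) for name, kpca in kernels.items()}

for name, X_2d in reduced.items():
    print(name, X_2d.shape)
```

Plotting each 2D result colored by t is one way to compare how the kernels unfold the roll.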

### Selecting a Kernel and Tuning Hyperparameters
Dimensionality reduction is often a preparation step for a supervised learning task (e.g., classification), so you can simply use grid search to select the kernel and hyperparameters that lead to the best performance on that task.

The following code creates a two-step pipeline, first reducing dimensionality to two dimensions using kPCA, then applying Logistic Regression for classification. Then it uses GridSearchCV to find the best kernel and gamma value for kPCA in order to get the best classification accuracy at the end of the pipeline:

In [None]:
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

In [None]:
clf = Pipeline([
    ("kpca", KernelPCA(n_components=2)),
    ("log_reg", LogisticRegression()),
])

param_grid = [{
    "kpca__gamma": np.linspace(0.03, 0.05, 10),
    "kpca__kernel": ["rbf", "sigmoid"],
}]

grid_search = GridSearchCV(clf, param_grid, cv=3)
grid_search.fit(X, y)

In [None]:
print(grid_search.best_params_)

In [None]:
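# Another approach, this time entirely unsupervised, is to select the kernel and
# hyperparameters that yield the lowest reconstruction error. Setting
# fit_inverse_transform=True lets Scikit-Learn compute the reconstruction pre-image.
rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.0433, fit_inverse_transform=True)
X_reduced = rbf_pca.fit_transform(X)
X_preimage = rbf_pca.inverse_transform(X_reduced)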

In [None]:
# You can then compute the reconstruction pre-image error
from sklearn.metrics import mean_squared_error
mean_squared_error(X, X_preimage)

Now you can use grid search with cross-validation to find the kernel and hyperparameters that minimize this pre-image reconstruction error.

### Locally Linear Embedding (LLE)
Locally Linear Embedding (LLE) is another very powerful nonlinear dimensionality reduction (NLDR) technique. It is a Manifold Learning technique that does not rely on projections like the previous algorithms.

In [None]:
from sklearn.manifold import LocallyLinearEmbedding

lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
X_reduced = lle.fit_transform(X)
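
For a self-contained version of the cell above, the sketch below runs LLE on a Swiss roll generated with make_swiss_roll(). The dataset, the seeds, and the names X_roll, lle_demo, and X_unrolled are assumptions for illustration.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Hypothetical 3D dataset: a noisy Swiss roll
X_roll, t = make_swiss_roll(n_samples=500, noise=0.2, random_state=41)

# Unroll it into two dimensions, using 10 neighbors per instance
lle_demo = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)
X_unrolled = lle_demo.fit_transform(X_roll)

print(X_unrolled.shape)
```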

### Other Dimensionality Reduction Techniques
* Multidimensional Scaling (MDS) 
* Isomap
* t-Distributed Stochastic Neighbor Embedding (t-SNE)
* Linear Discriminant Analysis (LDA)
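
All four techniques listed above are available in Scikit-Learn. In short, MDS tries to preserve the distances between instances, Isomap tries to preserve geodesic distances along a nearest-neighbor graph, t-SNE tries to keep similar instances close and dissimilar instances apart (it is mostly used for visualization), and LDA is a classifier that learns the most discriminative axes between classes. The sketch below applies each of them to the Iris dataset, which is an assumption chosen only because it is small and labeled; the hyperparameters are library defaults, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import MDS, TSNE, Isomap

# Small labeled dataset: 150 instances, 4 features, 3 classes
X_iris, y_iris = load_iris(return_X_y=True)

# Unsupervised techniques: only the features are used
X_mds = MDS(n_components=2, random_state=42).fit_transform(X_iris)
X_isomap = Isomap(n_components=2).fit_transform(X_iris)
X_tsne = TSNE(n_components=2, random_state=42).fit_transform(X_iris)

# LDA is supervised: it needs the class labels as well
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_iris, y_iris)

for name, X_2d in [("MDS", X_mds), ("Isomap", X_isomap),
                   ("t-SNE", X_tsne), ("LDA", X_lda)]:
    print(name, X_2d.shape)
```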