# Chapter 8: Dimensionality Reduction

*Curse of dimensionality* - Many ML problems involve thousands or even millions of features per training instance. This makes training extremely slow and makes it much harder to find a good solution.

> Note: Reducing dimensionality does cause some information loss (similar to how image compression degrades quality), so it can make your system perform slightly worse. Always try training your system on the original data first, and consider dimensionality reduction only if training turns out to be too slow.

Dimensionality reduction is also useful for data visualization (DataViz). Reducing data to two or three dimensions lets you plot it and visually spot patterns such as clusters, and it is a great way to present results to people who are not data scientists.

## 8.1 The Curse of Dimensionality

Average distance between two random points in a unit hypercube, by number of dimensions:

- 2-D (unit square): ≈ 0.52
- 3-D (unit cube): ≈ 0.66
- 1,000,000-D (unit hypercube): ≈ 408.25
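These numbers can be checked with a quick Monte Carlo sketch (assuming NumPy; `mean_random_distance` is a hypothetical helper, not from the book). Sampling a million-dimensional cube is expensive, so the sketch checks the large-dimension behavior at 10,000-D instead: each coordinate difference of two uniform points has a mean square of 1/6, so the squared distance averages d/6 and the average distance approaches √(d/6). For d = 1,000,000 that gives √(1,000,000/6) ≈ 408.25.

```python
import numpy as np

def mean_random_distance(n_dims, n_pairs=10_000, seed=42):
    """Estimate the mean distance between two uniform random points in a unit hypercube."""
    rng = np.random.default_rng(seed)
    a = rng.random((n_pairs, n_dims))
    b = rng.random((n_pairs, n_dims))
    return np.linalg.norm(a - b, axis=1).mean()

print(round(mean_random_distance(2), 2))   # should be near 0.52
print(round(mean_random_distance(3), 2))   # should be near 0.66

# For large d, the mean distance approaches sqrt(d / 6)
d = 10_000
print(mean_random_distance(d, n_pairs=200), np.sqrt(d / 6))
```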

Conclusions:

- There is a lot of space in high dimensions.
- High-dimensional datasets are at risk of being very sparse (i.e., most training instances are far away from each other, so a new instance is likely to be far from any training instance).
- The more dimensions the training set has, the greater the risk of overfitting it.
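To see the sparsity point concretely, the sketch below (NumPy only; `mean_nn_distance` is a hypothetical helper) keeps the number of instances fixed at 1,000 and measures how far each point is from its nearest neighbor as the dimension grows. With the same amount of data, neighbors drift further and further apart, so predictions in high dimensions rest on much larger extrapolations.

```python
import numpy as np

def mean_nn_distance(n_dims, n_points=1000, seed=42):
    """Mean distance from each random point to its nearest neighbor in a unit hypercube."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_points, n_dims))
    sq_norms = (X ** 2).sum(axis=1)
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b (avoids a 3-D broadcast)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # ignore each point's distance to itself
    # Clip tiny negative values caused by floating-point round-off
    return np.sqrt(np.maximum(d2.min(axis=1), 0.0)).mean()

for d in (2, 10, 100, 1000):
    print(d, round(mean_nn_distance(d), 3))
```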

> Recall: To fix high bias (underfitting), add more features and/or more complex features.

## 8.2 Main Approaches for Dimensionality Reduction

### 8.2.1 Projection

Training instances are usually not spread out uniformly across all dimensions: many features are almost constant, while others are highly correlated with each other.  
=> All training instances lie within (or close to) a much lower-dimensional subspace of the high-dimensional space.

Suppose all training instances of a 3-D dataset lie close to a plane. If we project every instance perpendicularly onto that plane, we reduce the dataset from 3-D to 2-D.
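Here is a minimal NumPy sketch of that projection, assuming the plane is already known (a made-up plane x₃ = 0.5·x₁ + 0.2·x₂ with a little noise added). In practice the plane is not given and has to be estimated from the data, which is what PCA does; here two vectors spanning the plane are orthonormalized with `np.linalg.qr`, so the projection is just a matrix product.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up 3-D dataset lying close to the plane x3 = 0.5*x1 + 0.2*x2
n = 200
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
x3 = 0.5 * x1 + 0.2 * x2 + rng.normal(0, 0.02, n)  # small noise off the plane
X = np.column_stack([x1, x2, x3])

# Two vectors that span the plane, orthonormalized with a QR decomposition
spanning = np.array([[1.0, 0.0, 0.5],
                     [0.0, 1.0, 0.2]]).T   # shape (3, 2)
W, _ = np.linalg.qr(spanning)              # orthonormal columns, shape (3, 2)

X2d = X @ W          # 2-D coordinates of each instance within the plane
X_back = X2d @ W.T   # the projected points, expressed back in 3-D

# Average distance from each instance to its projection (the information lost)
error = np.linalg.norm(X - X_back, axis=1).mean()
print(X2d.shape, round(error, 3))
```

Because the instances only stray slightly from the plane, the average projection error stays small, which is exactly the situation where projection works well.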

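However, projection is not always the best approach. If the subspace twists and turns (as in the Swiss roll dataset), projecting onto a plane squashes different layers of the roll together, so instances that are far apart on the roll end up overlapping in 2-D instead of the roll being "unrolled".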
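The overlap can be measured directly. This sketch assumes scikit-learn's `make_swiss_roll`, which returns the points and their position `t` along the roll; `count_collisions` and its thresholds are hypothetical. It projects the roll onto a plane by simply dropping one coordinate, then counts pairs of instances that end up close together even though they sit on different layers of the roll.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll

# t is each point's position along the roll (its "unrolled" coordinate)
X, t = make_swiss_roll(n_samples=1000, noise=0.0, random_state=42)

def count_collisions(points, t, radius=1.0, min_gap=3.0):
    """Count pairs closer than `radius` that are more than `min_gap` apart along the roll."""
    diff = points[:, None, :] - points[None, :, :]
    close = np.linalg.norm(diff, axis=-1) < radius
    far_on_roll = np.abs(t[:, None] - t[None, :]) > min_gap
    return int(np.sum(close & far_on_roll) // 2)  # each pair is counted twice

# Naive projection onto a plane: drop the third coordinate (t * sin(t))
print(count_collisions(X[:, :2], t))  # layers of the roll land on top of each other
print(count_collisions(X, t))         # 0: in 3-D those same pairs are well separated
```

The second count is exactly zero here: on a noise-free roll, two points whose `t` values differ by more than 3 are always more than 3 units apart in 3-D.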

> Note: See Figures 8-4 and 8-5 in the book for pictorial reference.

### 8.2.2 Manifold Learning

A d-dimensional manifold is a part of an n-dimensional space (where d < n) that locally resembles a d-dimensional hyperplane.  
In the Swiss roll example, d = 2 and n = 3: it locally resembles a 2-D plane, but it is rolled up in the third dimension.
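One way to check the "locally resembles a hyperplane" part of the definition is sketched below (assuming scikit-learn; `third_axis_share` is a hypothetical helper). It measures how much variance lies along the least important of the three principal axes, first for the whole Swiss roll and then inside small neighborhoods. Globally that share is large, because the roll occupies all three dimensions; in each 20-point neighborhood it is tiny, because the surface is locally close to flat.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import NearestNeighbors

X, _ = make_swiss_roll(n_samples=2000, noise=0.0, random_state=42)

def third_axis_share(points):
    """Fraction of variance along the least important of the 3 principal axes."""
    centered = points - points.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)  # singular values, descending
    return s[-1] ** 2 / np.sum(s ** 2)

# Globally, the roll fills all three dimensions
global_share = third_axis_share(X)

# Locally, each neighborhood of 20 points is nearly flat (close to a 2-D plane)
nn = NearestNeighbors(n_neighbors=20).fit(X)
_, neighbor_idx = nn.kneighbors(X[::50])  # 40 sample neighborhoods
local_share = np.mean([third_axis_share(X[idx]) for idx in neighbor_idx])

print(round(global_share, 3), round(local_share, 3))
```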

Key assumptions for **Manifold Learning**:

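1. *Manifold assumption (manifold hypothesis)* - Most real-world high-dimensional datasets lie close to a much lower-dimensional manifold. The constraints that real data obey squeeze it into such a manifold: for example, images of handwritten digits are far from random pixel grids, since the strokes are connected, the borders are mostly white, and the digits are roughly centered.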

2. The task at hand (e.g., classification or regression) will be simpler if expressed in the lower-dimensional space of the manifold.

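> Note: Assumption #2 doesn't always hold. Unrolling the manifold sometimes turns a complex decision boundary in 3-D into a simple one in 2-D, but other times a simple 3-D boundary (such as a plane) becomes complex once unrolled. So reducing dimensionality before training usually speeds up training, but it does not always lead to a better or simpler solution; it depends on the dataset. See Figure 8-6 in the book for pictorial reference.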
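Both cases can be reproduced with a small experiment. It is a sketch under stated assumptions: scikit-learn's `make_swiss_roll`, its `t` output used as the unrolled coordinate, and a depth-1 decision tree as a stand-in for "a simple decision boundary"; `stump_accuracy` is a hypothetical helper. A training accuracy of 1.0 means a single axis-aligned cut separates the classes.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.tree import DecisionTreeClassifier

X, t = make_swiss_roll(n_samples=1000, noise=0.0, random_state=42)
X_unrolled = np.column_stack([t, X[:, 1]])  # (position along the roll, height)

def stump_accuracy(features, labels):
    """Training accuracy of a single-split tree: 1.0 means one straight cut suffices."""
    stump = DecisionTreeClassifier(max_depth=1, random_state=42)
    return stump.fit(features, labels).score(features, labels)

# Case 1: complex in 3-D, simple after unrolling (a cut along the roll at t = 10)
y1 = t < 10
print(stump_accuracy(X, y1), stump_accuracy(X_unrolled, y1))

# Case 2: simple in 3-D (the plane "first coordinate = 5"), complex after unrolling
y2 = X[:, 0] > 5
print(stump_accuracy(X, y2), stump_accuracy(X_unrolled, y2))
```

In case 2 the positive instances form two separate bands along the roll once it is unrolled, so no single cut can isolate them, even though one plane does it perfectly in 3-D.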

## 8.3 PCA

### 8.3.1 Preserving the Variance

### 8.3.2 Principal Components

### 8.3.3 Projecting Down to d Dimensions

### 8.3.4 Using Scikit-Learn

### 8.3.5 Explained Variance Ratio

### 8.3.6 Choosing the Right Number of Dimensions

### 8.3.7 PCA for Compression

### 8.3.8 Randomized PCA

### 8.3.9 Incremental PCA

## 8.4 Kernel PCA

### 8.4.1 Selecting a Kernel and Tuning Hyperparameters

## 8.5 LLE

## 8.6 Other Dimensionality Reduction Techniques