# Dimensionality Reduction

## The Curse of Dimensionality

## Main approaches for Dimensionality Reduction

### Projection

The figure below shows a 3D dataset represented by circles.

![image-3.png](attachment:image-3.png)

Notice that all training instances lie close to a plane: this is a
lower-dimensional (2D) subspace of the high-dimensional (3D) space. If we
project every training instance perpendicularly onto this subspace (as
represented by the short lines connecting the instances to the plane), we
get the new 2D dataset shown in the figure below. Ta-da! We have just reduced
the dataset’s dimensionality from 3D to 2D. Note that the axes correspond
to new features $z_1$ and $z_2$ (the coordinates of the projections on the plane).

![image-2.png](attachment:image-2.png)
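
As a minimal sketch of this perpendicular projection, the NumPy snippet below builds a made-up 3D dataset lying close to a plane and projects it onto that plane. The plane (spanned by the hypothetical orthonormal vectors `w1` and `w2`), the noise level, and the sample size are all assumptions chosen for illustration; they are not the data behind the figures.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two orthonormal vectors spanning a hypothetical plane through the origin.
w1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
w2 = np.array([-1.0, 1.0, 1.0]) / np.sqrt(3)
normal = np.cross(w1, w2)  # unit vector perpendicular to the plane

# Build a 3D dataset: positions inside the plane plus a small offset along the normal.
m = 60
coords = rng.normal(size=(m, 2))
noise = 0.05 * rng.normal(size=(m, 1))
X = coords @ np.vstack([w1, w2]) + noise * normal  # shape (m, 3)

# Perpendicular projection: with an orthonormal basis, the 2D coordinates
# (z1, z2) are just dot products with the basis vectors.
W = np.column_stack([w1, w2])  # shape (3, 2)
Z = X @ W                      # shape (m, 2), columns are z1 and z2

# Map the projections back into 3D to measure what the projection discarded.
X_projected = Z @ W.T
residual = X - X_projected     # the "short lines" from each instance to the plane

print("2D dataset shape:", Z.shape)
print("largest distance to the plane:", np.abs(residual @ normal).max())
```

Because the basis is orthonormal, the residual vectors are perpendicular to the plane, which is exactly what "projecting perpendicularly" means here.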

However, projection is not always the best approach to dimensionality
reduction. In many cases the subspace may twist and turn, such as in the
famous *Swiss roll* toy dataset shown in the figure below.

![image.png](attachment:image.png)

Simply projecting onto a plane (e.g., by dropping $x_3$) would squash
different layers of the Swiss roll together, as shown on the left side of
the figure below. What we really want is to unroll the Swiss roll to obtain
the 2D dataset on the right side of that figure.

![image-4.png](attachment:image-4.png)
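
The difference between squashing and unrolling can be sketched numerically. The snippet below generates a Swiss roll with a common parametrization ($x_1 = t\cos t$, $x_3 = t\sin t$, with $x_2$ as the height); this parametrization, the sample size, and the helper `nearest_neighbor_gap` are assumptions for illustration, and the dataset in the figures may have been generated differently. For each point it finds the nearest neighbor in the reduced 2D space and measures how far apart the two points are along the roll (the difference in $t$).

```python
import numpy as np

rng = np.random.default_rng(0)
m = 1000

# Assumed Swiss roll parametrization: t is the position along the roll.
t = 1.5 * np.pi * (1 + 2 * rng.random(m))
x2 = 21 * rng.random(m)  # height of the roll
X = np.column_stack([t * np.cos(t), x2, t * np.sin(t)])

# Option 1: project onto a plane by dropping x3.
X_dropped = X[:, :2]

# Option 2: unroll the manifold, using t and x2 as the two coordinates.
X_unrolled = np.column_stack([t, x2])

def nearest_neighbor_gap(P, t):
    """For each point in P, return |t - t_nn| where nn is its nearest neighbor in P."""
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    nn = d.argmin(axis=1)
    return np.abs(t - t[nn])

gap_dropped = nearest_neighbor_gap(X_dropped, t)
gap_unrolled = nearest_neighbor_gap(X_unrolled, t)

print("largest neighbor gap along the roll, dropping x3:", gap_dropped.max())
print("largest neighbor gap along the roll, unrolled:   ", gap_unrolled.max())
```

A large gap after dropping $x_3$ means that points from different layers of the roll ended up next to each other, whereas neighbors in the unrolled space stay close along the roll.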

### Manifold Learning

The manifold assumption holds that most real-world high-dimensional
datasets lie close to a much lower-dimensional manifold. It is often
accompanied by another implicit assumption: that the task at hand (e.g.,
classification or regression) will be simpler if expressed in the
lower-dimensional space of the manifold. For example, in the top row of the
figure below, the Swiss roll is split into two classes: in the 3D space (on
the left), the decision boundary would be fairly complex, but in the 2D
unrolled manifold space (on the right), the decision boundary is a straight
line.

However, this implicit assumption does not always hold. For example, in
the bottom row of the figure below, the decision boundary is located at
$x_1 = 5$. This decision boundary looks very simple in the original 3D space
(a vertical plane), but it looks more complex in the unrolled manifold (a
collection of four independent line segments).

![image.png](attachment:image.png)
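
The four line segments can be checked with a short calculation, assuming the Swiss roll follows the common parametrization $x_1 = t\cos t$ with $t \in [1.5\pi, 4.5\pi]$ (an assumption; the figure's dataset may differ). In the unrolled space the horizontal coordinate is $t$, so every value of $t$ where $x_1$ crosses 5 becomes one line segment of the decision boundary.

```python
import numpy as np

# Position along the roll, sampled densely over the assumed range.
t = np.linspace(1.5 * np.pi, 4.5 * np.pi, 10_001)
x1 = t * np.cos(t)

# Which side of the decision boundary x1 = 5 each position falls on.
side = x1 > 5

# A crossing happens wherever two consecutive positions are on different sides.
crossings = t[1:][side[1:] != side[:-1]]

print("number of boundary segments in the unrolled space:", len(crossings))
print("approximate positions along the roll:", np.round(crossings, 2))
```

Under this parametrization, $x_1$ exceeds 5 on two separate stretches of the roll, so the single plane $x_1 = 5$ in 3D turns into four separate boundaries once the roll is flattened.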

In short, reducing the dimensionality of your training set before training a
model will usually speed up training, but it may not always lead to a better
or simpler solution; it all depends on the dataset.

## PCA

### Preserving Variance

For example, a simple 2D dataset is shown on the left in the figure below,
along with three different axes (i.e., 1D hyperplanes). On the right is the
result of projecting the dataset onto each of these axes. As you can see,
the projection onto the solid line preserves the maximum variance, while
the projection onto the dotted line preserves very little variance and the
projection onto the dashed line preserves an intermediate amount of
variance.

![image.png](attachment:image.png)
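
To make the comparison concrete, here is a small sketch that measures the variance preserved when a 2D dataset is projected onto axes at different angles. The dataset (stretched along a direction 30° above the $x_1$ axis), the three angles, and the helper `projected_variance` are hypothetical choices that only loosely mimic the solid, dashed, and dotted lines in the figure.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 500

# Hypothetical 2D dataset: wide along one direction, narrow along the other,
# then rotated so the wide direction sits 30 degrees above the x1 axis.
theta = np.deg2rad(30)
stretched = rng.normal(size=(m, 2)) * [3.0, 0.5]
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = stretched @ R.T
X = X - X.mean(axis=0)  # center the data

def projected_variance(X, angle):
    """Variance of the dataset after projecting it onto the axis at `angle` (radians)."""
    u = np.array([np.cos(angle), np.sin(angle)])  # unit vector along the axis
    return np.var(X @ u)

# Three candidate axes: along the stretch, in between, and orthogonal to the stretch.
angles_deg = [30, 75, 120]
variances = {a: projected_variance(X, np.deg2rad(a)) for a in angles_deg}
total = np.var(X, axis=0).sum()

for a, v in variances.items():
    print(f"axis at {a:3d} degrees preserves {v / total:.1%} of the variance")
```

The variances along any two orthogonal axes add up to the total variance, so choosing the axis that preserves the most variance also leaves the least variance behind.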


### Principal Components

PCA identifies the axis that accounts for the largest amount of variance in
the training set. In the figure above, it is the solid line. It also finds a
second axis, orthogonal to the first one, that accounts for the largest
amount of remaining variance. In this 2D example there is no choice: it is
the dotted line. If it were a higher-dimensional dataset, PCA would also
find a third axis, orthogonal to both previous axes, and a fourth, a fifth,
and so on, up to as many axes as the number of dimensions in the dataset.

In the figure above, the first PC is the axis on which vector **`c1`** lies,
and the second PC is the axis on which vector **`c2`** lies.
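
One standard way to compute these axes is the singular value decomposition (SVD) of the centered training set. The sketch below uses NumPy's `np.linalg.svd` on a made-up 2D dataset; the data and the variable names other than `c1` and `c2` are assumptions for illustration, and the notes above do not say which method produced the figure.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical correlated 2D dataset.
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 1.0],
                                          [0.0, 0.5]])

# PCA assumes the data is centered around the origin.
X_centered = X - X.mean(axis=0)

# The rows of Vt are unit vectors along the principal components,
# ordered from most to least variance.
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
c1 = Vt[0]  # first principal component
c2 = Vt[1]  # second principal component, orthogonal to c1

# Share of the total variance carried by each component.
explained = s**2 / np.sum(s**2)

print("c1 =", c1)
print("c2 =", c2)
print("share of variance per component:", explained)
```

The sign of each vector is arbitrary: `c1` and `-c1` lie on the same axis, so a different run or library may return the opposite direction for the same principal component.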