*Dimensionality* refers to the number of attributes a dataset has. In the context of machine learning, it's the number of features an instance is composed of.

The *curse of dimensionality* refers to the problems caused by the large number of dimensions typical of machine learning problems, including slow training and difficulty finding a good solution.

*Dimensionality Reduction* is the practice of reducing the number of features in a dataset through techniques such as transformation, compression, and projection. Besides speeding up training, reducing the number of dimensions can also make data easier to visualize, for example by bringing it down to two or three dimensions so it can be plotted.

Topics:
- Curse of Dimensionality
- Typical Approaches
- Principal Component Analysis (PCA)
- Kernel PCA
- Locally Linear Embedding (LLE)
- Other Techniques

# Curse of Dimensionality

High-dimensional spaces aren't easy to grasp since there's no reasonable way to visualize them, but the curse of dimensionality can be demonstrated with an example:

- Given a random point in a unit square, there is a 0.4% probability of the point being within 0.001u of a border
- Given a random point in a 10,000-dimensional unit hypercube, the probability is $\gt$ 99.999999%
- Given two random points in a unit square, the average distance between them is ~0.52u
- Given two random points in a 10,000-dimensional unit hypercube, the average distance is ~40.8u

This illustrates two things:

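- Most points in a high-dimensional space are likely to be 'extreme' (close to a border) in at least one dimension
- Random points tend to be very far apart, so high-dimensional datasets are sparse: a new instance is likely to be far from every training instance, and a *huge* amount of data is required to reach a reasonable density of training instances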
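
The figures above can be checked with a short NumPy sketch. The border probability uses the closed form $1 - (1 - 2 \cdot 0.001)^{d}$, since each coordinate independently stays clear of both faces with probability $1 - 2 \cdot 0.001$; the average distance is a Monte Carlo estimate. The function names and sample sizes are illustrative choices, not taken from any library.

```python
import numpy as np

rng = np.random.default_rng(42)

def border_probability(dim, margin=0.001):
    # Each coordinate independently avoids both faces with probability (1 - 2*margin)
    return 1.0 - (1.0 - 2.0 * margin) ** dim

def mean_pair_distance(dim, n_pairs):
    # Monte Carlo estimate: average Euclidean distance between random point pairs in [0, 1]^dim
    a = rng.random((n_pairs, dim))
    b = rng.random((n_pairs, dim))
    return np.linalg.norm(a - b, axis=1).mean()

# Fewer pairs in 10,000 dimensions to keep memory use modest
for dim, n_pairs in [(2, 100_000), (10_000, 200)]:
    print(f"dim={dim:>6}: P(near border)={border_probability(dim):.10f}, "
          f"mean distance~{mean_pair_distance(dim, n_pairs):.3f}, "
          f"sqrt(dim/6)={np.sqrt(dim / 6):.3f}")
```

The $\sqrt{d/6}$ column is the root-mean-square distance (each coordinate contributes an expected squared gap of $1/6$). In high dimensions distances concentrate, so the average distance approaches it; in two dimensions the two still differ noticeably.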

# Typical Approaches

Data is rarely distributed evenly across all dimensions, and certain patterns enable techniques that can reduce dimensionality with minimal information loss.

**Projection** is a transformation technique that identifies a lower-dimensional subspace that aligns well with the data, projects each instance perpendicularly onto that subspace, and represents the data by the coordinates of those projections within the subspace.
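
A minimal NumPy sketch of projection, assuming a synthetic 3D dataset that lies close to a tilted plane. Here the subspace is chosen from the singular value decomposition of the centred data (the approach PCA takes); the dataset and the helper name `project_onto_subspace` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3D dataset lying close to a tilted 2D plane (plus a little noise)
n = 500
a, b = rng.normal(size=(2, n))
X = np.column_stack([a, b, 0.5 * a - 0.3 * b]) + rng.normal(scale=0.05, size=(n, 3))

def project_onto_subspace(X, d):
    """Project X perpendicularly onto a d-dimensional affine subspace fitted via SVD.

    Returns the reduced coordinates Z (n_samples, d) and the projected points
    expressed back in the original space (n_samples, n_features).
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are orthonormal directions, ordered by the variance they capture
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:d].T                 # (n_features, d) orthonormal basis of the subspace
    Z = Xc @ W                   # coordinates of each projection within the subspace
    X_proj = Z @ W.T + mean      # the projected points, back in the original 3D space
    return Z, X_proj

Z, X_proj = project_onto_subspace(X, d=2)
print(Z.shape)  # (500, 2)
print(np.linalg.norm(X - X_proj, axis=1).mean())  # small: the plane fits the data well
```

`Z` is the reduced two-feature dataset; `X_proj` only exists to show how little each instance moved when it was projected.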

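When the data follows a more complex shape, however, projection can make things worse: if the subspace twists and turns (as in a Swiss roll), projecting it onto a plane squashes different layers of the data together.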
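
The squashing effect can be sketched with two points on a Swiss roll. The parametrisation $(t\cos t,\ h,\ t\sin t)$ is an illustrative assumption, and "projection" here simply drops the $z$ coordinate. The two points sit one full turn apart, so they are far apart along the roll, yet they land on the same spot after projection.

```python
import numpy as np

def swiss_roll(t, h):
    """Map intrinsic coordinates (t, h) to 3D points on a Swiss-roll-shaped surface."""
    t = np.asarray(t, dtype=float)
    h = np.asarray(h, dtype=float)
    return np.column_stack([t * np.cos(t), h, t * np.sin(t)])

def distance_along_roll(t0, t1, h=0.0, steps=10_000):
    """Approximate the distance travelled along the surface (at fixed height h)
    by summing the lengths of many short straight segments."""
    ts = np.linspace(t0, t1, steps + 1)
    pts = swiss_roll(ts, np.full_like(ts, h))
    return np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()

# Two points at the same height, one full turn apart on the roll
t_a, t_b = 1.5 * np.pi, 3.5 * np.pi
X = swiss_roll([t_a, t_b], [5.0, 5.0])

# Naive projection onto the x-y plane: simply drop the z coordinate
P = X[:, :2]

projected_gap = np.linalg.norm(P[0] - P[1])
direct_gap = np.linalg.norm(X[0] - X[1])
roll_gap = distance_along_roll(t_a, t_b)

print(f"after projection: {projected_gap:.2e}")   # essentially zero
print(f"straight line in 3D: {direct_gap:.3f}")    # about 2*pi
print(f"along the roll: {roll_gap:.3f}")          # much larger
```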

**Manifold Learning** identifies and models the *manifold* that the data lies on. A $d$-dimensional manifold is a part of an $n$-dimensional space (where $d \lt n$) that locally resembles a $d$-dimensional hyperplane. Manifold learning relies on the *manifold assumption*, which holds that most real-world high-dimensional datasets lie close to a much lower-dimensional manifold.
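
The "locally resembles a hyperplane" part of the definition can be checked numerically. This sketch is not a manifold-learning algorithm; it only samples a Swiss roll from two intrinsic coordinates (an assumed parametrisation and sample size) and compares how flat small neighbourhoods are with how flat the whole dataset is. The helpers `flatness` and `local_flatness` are hypothetical names: each returns the ratio of the smallest to the largest singular value of centred points, which is near 0 when the points lie close to a plane.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sample: points on a Swiss roll generated from 2 intrinsic
# coordinates (t, h), i.e. a 2D manifold embedded in 3D space
n = 20_000
t = rng.uniform(1.5 * np.pi, 4.5 * np.pi, n)
h = rng.uniform(0.0, 20.0, n)
X = np.column_stack([t * np.cos(t), h, t * np.sin(t)])

def flatness(points):
    """Ratio of smallest to largest singular value of the centred points.
    Close to 0 means the points lie near a plane."""
    s = np.linalg.svd(points - points.mean(axis=0), compute_uv=False)
    return s[-1] / s[0]

def local_flatness(X, i, k=20):
    """Flatness of the k nearest neighbours of point i (including the point itself)."""
    dists = np.linalg.norm(X - X[i], axis=1)
    return flatness(X[np.argsort(dists)[:k]])

local = max(local_flatness(X, i) for i in range(50))
print(f"worst local flatness over 50 neighbourhoods: {local:.3f}")  # small: locally planar
print(f"global flatness: {flatness(X):.3f}")                          # large: not a plane globally
```

Small neighbourhoods are nearly planar even though the dataset as a whole is not, which is the structure manifold learning exploits.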