# Dimensionality Reduction
Dimensionality reduction is the process of reducing the number of features in a dataset while retaining as much information as possible. It is a key step in the data preprocessing pipeline.

## Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is an orthogonal linear transformation that re-expresses a dataset in a new coordinate system whose axes are orthogonal and along which the data are uncorrelated. Keeping only the first few axes reduces the number of dimensions while retaining as much of the original variance as possible. This technique is commonly used to simplify complex datasets and make them easier to visualize or work with.
PCA does not select a subset of the original variables. Instead, it constructs new variables, each a linear combination of the original ones, that capture as much of the variance in the data as possible. These new variables, known as principal components, are ordered by importance: the first principal component explains the most variance, the second explains the most of the remaining variance, and so on.
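
The following is a minimal sketch of the idea using NumPy's singular value decomposition, not a production implementation. The function name `pca` and the toy dataset are assumptions made for illustration only.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)            # PCA works on zero-mean features
    # SVD of the centered data: the rows of Vt are the principal directions,
    # already sorted by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S ** 2 / (X.shape[0] - 1)
    explained_ratio = explained_variance / explained_variance.sum()
    components = Vt[:n_components]
    scores = X_centered @ components.T         # coordinates in the new basis
    return scores, components, explained_ratio

# Toy data (hypothetical): three features, two of which are strongly correlated.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200), rng.normal(size=200)])

scores, components, ratio = pca(X, n_components=2)
print(ratio)  # share of variance per component, in decreasing order
```

The returned `components` are orthonormal directions, and the columns of `scores` are uncorrelated with each other, which is the property described above.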

### Assumptions
- The relationships between the variables/features are linear
- The directions with the largest variance are the most informative
- The principal components are orthogonal (and therefore linearly independent and uncorrelated)

### Advantages
- Fast
- Easy to interpret
- Works well with linear models (linear correlation)

### Disadvantages
- Assumes linear relationships between variables (a strict assumption)
- Cannot capture non-linear relationships
- Cannot capture complex relationships

Intrinsic dimension is the minimum number of variables needed to represent the key features of a dataset. It is not necessarily the same as the number of variables in the original dataset. Knowing the intrinsic dimension can help choose the right dimensionality reduction technique for a dataset.
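
One rough way to get a feel for the intrinsic dimension is to count how many principal components are needed to reach a chosen share of the variance. This is only a linear proxy (it will overestimate the dimension of curved data), and the function name, the 99% threshold, and the synthetic data below are all assumptions made for this sketch.

```python
import numpy as np

def linear_intrinsic_dim(X, threshold=0.99):
    """Smallest number of principal components whose cumulative
    variance share reaches `threshold`."""
    X_centered = X - X.mean(axis=0)
    S = np.linalg.svd(X_centered, compute_uv=False)
    ratio = S ** 2 / np.sum(S ** 2)
    return int(np.searchsorted(np.cumsum(ratio), threshold) + 1)

# Hypothetical data: 2 hidden variables mapped linearly into 5 observed features,
# plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = np.array([[1.0, 0.0, 1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0, 1.0, -1.0]])
X = latent @ mixing + 0.01 * rng.normal(size=(500, 5))

print(X.shape[1], linear_intrinsic_dim(X))  # observed features vs. estimated dimension
```

Here the dataset has five columns, but almost all of its variance lives in a two-dimensional subspace.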

## Manifold Learning
Manifold Learning is another technique used in dimensionality reduction. Unlike PCA, which is a linear method, manifold learning is a nonlinear method that can capture complex relationships between variables in a dataset.

Manifold learning works by mapping the dataset to a lower-dimensional space while preserving the relationships (such as neighborhoods or distances) between the data points. The underlying assumption is that the data lie on or near a manifold, a mathematical object that can be thought of as a curved surface embedded in the higher-dimensional space. Mapping the dataset to coordinates on this lower-dimensional manifold makes it easier to visualize and work with.

Manifold learning “estimates” this low-dimensional manifold, typically based only on the distances between the high-dimensional data.
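
As an illustration of estimating a manifold from distances alone, here is a small sketch of the Isomap recipe, one member of the manifold learning family chosen here purely as an example: build a nearest-neighbor graph, approximate distances along the manifold by shortest paths in that graph, then embed those distances with classical multidimensional scaling. The function name and the toy curve are assumptions, and the O(n^3) shortest-path loop is only suitable for small inputs.

```python
import numpy as np

def isomap_sketch(X, n_neighbors=5, n_components=1):
    n = X.shape[0]
    # Pairwise Euclidean distances in the original space.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Keep only edges from each point to its nearest neighbors (column 0 is the point itself).
    neighbors = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = neighbors.ravel()
    G = np.full((n, n), np.inf)
    G[rows, cols] = D[rows, cols]
    G = np.minimum(G, G.T)                     # make the graph undirected
    np.fill_diagonal(G, 0.0)
    # Floyd-Warshall: shortest-path lengths approximate distances along the manifold.
    for k in range(n):
        G = np.minimum(G, G[:, [k]] + G[[k], :])
    # Classical MDS on the graph distances (assumes the graph is connected).
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(eigvals[top])

# Hypothetical data: points on three quarters of a circle, a 1D manifold in 2D.
theta = np.linspace(0.0, 1.5 * np.pi, 80)
X = np.column_stack([np.cos(theta), np.sin(theta)])

embedding = isomap_sketch(X, n_neighbors=5, n_components=1)
print(abs(np.corrcoef(embedding[:, 0], theta)[0, 1]))  # close to 1 if the curve was "unrolled"
```

The single output coordinate tracks the position along the arc, something a straight-line projection of the same points cannot do.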

## t-distributed Stochastic Neighbor Embedding (t-SNE)
t-distributed Stochastic Neighbor Embedding (t-SNE) is a nonlinear dimensionality reduction technique and is itself a manifold learning method. t-SNE is particularly useful for visualizing high-dimensional data in a low-dimensional space, such as a 2D or 3D plot.

t-SNE works by modeling the similarities between points in the high-dimensional data as a probability distribution, and the similarities between their low-dimensional counterparts as a second probability distribution. The goal is to minimize the divergence (specifically, the Kullback-Leibler divergence) between the two distributions, which is achieved by adjusting the positions of the data points in the low-dimensional space.
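
The sketch below shows this objective in NumPy under simplifying assumptions. It uses one fixed Gaussian bandwidth `sigma` for every point, whereas t-SNE normally tunes a bandwidth per point to match a user-chosen perplexity. It also uses plain gradient descent without the momentum and early-exaggeration tricks of practical implementations. All function names and the toy data are hypothetical.

```python
import numpy as np

def squared_distances(Z):
    """Matrix of squared Euclidean distances between all rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def high_dim_probabilities(X, sigma=2.0):
    """Joint probabilities P from Gaussian similarities (fixed sigma is a simplification)."""
    logits = -squared_distances(X) / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)              # a point is not its own neighbor
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    cond = np.exp(logits)
    cond /= cond.sum(axis=1, keepdims=True)        # conditional p(j | i)
    return (cond + cond.T) / (2.0 * X.shape[0])    # symmetrized; sums to 1

def low_dim_probabilities(Y):
    """Joint probabilities Q and the Student-t kernel W they are built from."""
    W = 1.0 / (1.0 + squared_distances(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(), W

def kl_divergence(P, Q):
    mask = P > 0                                   # terms with P = 0 contribute nothing
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

def tsne_sketch(X, n_iter=500, learning_rate=2.0, seed=0):
    rng = np.random.default_rng(seed)
    P = high_dim_probabilities(X)
    Y = 1e-2 * rng.normal(size=(X.shape[0], 2))    # small random starting layout
    history = []
    for _ in range(n_iter):
        Q, W = low_dim_probabilities(Y)
        history.append(kl_divergence(P, Q))
        # Gradient of KL(P || Q) with respect to each low-dimensional point.
        F = (P - Q) * W
        grad = 4.0 * (F.sum(axis=1, keepdims=True) * Y - F @ Y)
        Y -= learning_rate * grad                  # move the points downhill
    Q, _ = low_dim_probabilities(Y)
    history.append(kl_divergence(P, Q))
    return Y, history

# Hypothetical data: two Gaussian clusters in 10 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 10)),
               rng.normal(5.0, 1.0, size=(20, 10))])

Y, history = tsne_sketch(X)
print(history[0], history[-1])  # divergence before and after moving the points
```

The only quantities being optimized are the 2D positions `Y`; the high-dimensional distribution `P` is computed once and stays fixed.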

### Crowding problem
When attempting to preserve medium-range distances between data points, we also attempt to preserve the volume they span. Lower-dimensional spaces do not have enough area/volume to faithfully represent very large volumes of high-dimensional space. As a result, when more distant points are "pulled together", densely populated areas of the low-dimensional space become overcrowded and distinct clusters cannot be separated well.
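
The volume argument can be made concrete with a little arithmetic. Because the volume of a ball grows as the radius raised to the power of the dimension, the share of a ball's volume that lies at medium-to-large distance from its center grows rapidly with dimension. The helper below is a hypothetical illustration of that scaling, not part of t-SNE itself.

```python
def outer_shell_fraction(dim, inner_radius=0.5):
    """Fraction of a unit ball's volume lying farther than `inner_radius` from the center."""
    # Ball volume scales as r**dim, so the inner ball holds inner_radius**dim of the total.
    return 1.0 - inner_radius ** dim

for dim in (2, 3, 10, 50):
    print(dim, outer_shell_fraction(dim))
```

In 2D, a quarter of the area is still within half the radius. In 10 dimensions less than 0.1% of the volume is, so the many points that sit at moderate distances in high dimensions have nowhere comparable to go in a 2D map.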

### Advantages
- Can capture complex relationships (non-linear)
- Effective for visualizing high-dimensional data

### Disadvantages
- Slow for large datasets
- Parameters are difficult to tune

## UMAP
UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction technique that is similar to t-SNE, but is faster and more scalable. UMAP can be used to visualize high-dimensional data in a low-dimensional space, as well as for clustering and classification tasks.

UMAP works by constructing a weighted nearest-neighbor graph from the high-dimensional data, and then optimizing a low-dimensional representation of the data that preserves the graph structure. UMAP tends to preserve more of the global structure of the data than t-SNE, while still capturing local structure and details.
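
The graph-construction step can be sketched as follows. This is a simplified illustration, not the reference implementation: it uses a single fixed scale `sigma`, whereas UMAP calibrates a scale per point, and it stops before the layout optimization. The function name and toy data are assumptions.

```python
import numpy as np

def fuzzy_knn_graph(X, n_neighbors=5, sigma=1.0):
    """Symmetric weighted k-nearest-neighbor graph in the spirit of UMAP's first stage."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    order = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]    # skip the point itself
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = order.ravel()
    knn_dists = D[rows, cols].reshape(n, n_neighbors)
    rho = knn_dists[:, [0]]                                # distance to the nearest neighbor
    weights = np.exp(-(knn_dists - rho) / sigma)           # nearest neighbor gets weight 1
    A = np.zeros((n, n))
    A[rows, cols] = weights.ravel()                        # directed edge weights
    # Fuzzy union: an edge is kept if either endpoint considers the other a neighbor.
    return A + A.T - A * A.T

# Hypothetical data: 50 random points in 8 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))

graph = fuzzy_knn_graph(X, n_neighbors=5)
print(graph.shape, graph.min(), graph.max())
```

The number of neighbors is one of the free parameters that has to be chosen by hand: small values emphasize local detail, larger values emphasize broader structure. In practice one would typically rely on an existing implementation such as the `umap-learn` package rather than a hand-written graph like this.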

### Advantages
- Fast and scalable
- Preserves global structure of the data
- Good on large datasets

### Disadvantages
- Need to handpick the free parameters
- Cluster sizes and the distances between clusters are not informative
- Sensitive to optimization parameters