# Chapter 8. Dimensionality Reduction

In many Machine Learning problems, each training instance has a very large number of features. This makes training slow and can make it much harder to find a good solution, a problem often called the curse of dimensionality. Fortunately, in real-world problems it is often possible to reduce the number of features considerably, turning an intractable problem into a tractable one. For example, in MNIST images the pixels along the image borders can usually be dropped without losing much information.

Similarly, neighboring pixels are often highly correlated, so merging them into a single pixel (for example, by averaging their intensities) loses little information. Note that reducing dimensionality does discard some information: it speeds up training, but it may make the system perform slightly worse, and it adds complexity to the pipeline. Beyond speeding up training, dimensionality reduction is also useful for data visualization. Reducing a dataset to two or three dimensions makes it possible to plot it, which helps in spotting patterns and gaining insights.
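The two MNIST simplifications mentioned above (dropping border pixels and merging neighboring pixels) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the images are random 28×28 arrays standing in for MNIST, and the helper names `crop_border` and `merge_neighbors`, the 2-pixel margin, and the 2×2 averaging are hypothetical choices, not something prescribed by the text.

```python
import numpy as np

# Stand-in for a batch of MNIST images: 5 random 28x28 grayscale images.
# (Random data is an assumption for illustration; real MNIST is not loaded.)
rng = np.random.default_rng(42)
images = rng.random((5, 28, 28))

def crop_border(batch, margin):
    """Drop `margin` pixels from every side of each image (margin must be >= 1)."""
    return batch[:, margin:-margin, margin:-margin]

def merge_neighbors(batch, block=2):
    """Average non-overlapping block x block groups of pixels.

    Height and width must be divisible by `block`.
    """
    n, h, w = batch.shape
    grouped = batch.reshape(n, h // block, block, w // block, block)
    return grouped.mean(axis=(2, 4))

cropped = crop_border(images, margin=2)      # 28x28 -> 24x24
pooled = merge_neighbors(cropped, block=2)   # 24x24 -> 12x12

print("features before:", images.reshape(len(images), -1).shape[1])
print("features after: ", pooled.reshape(len(pooled), -1).shape[1])
```

With these settings each image shrinks from 784 features to 144. Whether that loses too much information depends on the task, so in practice the margin and block size would need to be validated against model performance.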

This kind of visualization, often called DataViz, is also essential for communicating conclusions to people who are not data scientists, in particular the decision makers who will use the results. This chapter first discusses the curse of dimensionality and what happens in high-dimensional space. It then presents the two main approaches to dimensionality reduction, projection and Manifold Learning, and examines three popular dimensionality reduction techniques: PCA, Kernel PCA, and LLE.

### The Curse of Dimensionality

Our intuition fails us in high-dimensional spaces. Even a basic 4D hypercube is hard to picture, let alone a shape in a space with many more dimensions. Many things behave differently there, and the distance between points is a good example. Two points picked at random in a unit square are about 0.52 apart on average, and in a unit cube about 0.66. In a 1,000,000-dimensional unit hypercube, the average distance grows to roughly 408 (approximately √(1,000,000/6)), which shows how much room there is in high dimensions. As a result, high-dimensional datasets tend to be very sparse: training instances are likely to lie far from one another, and a new instance is likely to lie far from every training instance. Predictions then rest on much larger extrapolations and are less reliable than in lower dimensions.
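The growth in distance described above can be checked empirically. The following is a minimal Monte Carlo sketch, assuming points drawn uniformly from the unit hypercube; the function name `mean_pair_distance`, the sample size, and the list of dimensions are illustrative choices. It also prints √(d/6), which is the root-mean-square distance (each coordinate contributes an expected squared difference of 1/6) and which the mean distance approaches as d grows.

```python
import numpy as np

def mean_pair_distance(dim, n_pairs=1_000, seed=0):
    """Estimate the mean Euclidean distance between two uniform random
    points in the unit hypercube [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    a = rng.random((n_pairs, dim))
    b = rng.random((n_pairs, dim))
    return np.linalg.norm(a - b, axis=1).mean()

for dim in (2, 3, 100, 1_000):
    estimate = mean_pair_distance(dim)
    rms = np.sqrt(dim / 6)  # root-mean-square distance, exact for any dim
    print(f"dim={dim:>5}  mean distance ~ {estimate:.3f}  sqrt(dim/6) = {rms:.3f}")
```

The dimensions are kept at 1,000 or below so the sampled arrays stay small; the same trend continues for larger values.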

In theory, one solution to the curse of dimensionality is to increase the size of the training set until the training instances are dense enough. In practice, however, the number of instances required grows exponentially with the number of dimensions. With just 100 features, far fewer than the 784 of the MNIST problem, you would need more training instances than there are atoms in the observable universe for the instances to be within 0.1 of each other on average, assuming they are spread uniformly across all dimensions. This exponential demand for training data is the heart of the curse of dimensionality.
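A rough counting argument makes the exponential growth concrete. The sketch below simplifies "within 0.1 of each other" to a regular grid with 10 points along each axis of the unit hypercube, which is an assumption made for illustration rather than an exact equivalent of the average-distance criterion. The constant 10^80 is a commonly cited order-of-magnitude estimate for the number of atoms in the observable universe, and `grid_instances_needed` is a hypothetical helper name.

```python
# Commonly cited order-of-magnitude estimate (an assumption, not an exact count).
ATOMS_IN_OBSERVABLE_UNIVERSE = 10**80

def grid_instances_needed(n_features, points_per_axis=10):
    """Number of points in a regular grid over the unit hypercube.

    With 10 points per axis, neighboring grid points are roughly 0.1 apart.
    """
    return points_per_axis ** n_features

for n_features in (1, 2, 3, 10, 100):
    needed = grid_instances_needed(n_features)
    exceeds = needed > ATOMS_IN_OBSERVABLE_UNIVERSE
    print(f"{n_features:>3} features: {needed:.1e} instances, "
          f"exceeds atom estimate: {exceeds}")
```

Python integers have arbitrary precision, so the 100-feature case (10^100 grid points) is computed exactly rather than overflowing.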

### Main Approaches for Dimensionality Reduction