# **CHAPTER 8**
# **Dimensionality Reduction**

Many Machine Learning datasets contain a very large number of features, sometimes thousands or even millions per instance. More features may seem beneficial, but in practice they often cause serious problems: high dimensionality slows down training, increases memory usage, and makes it harder for algorithms to generalize, because training instances become sparse and far apart as the number of dimensions grows. This problem is often referred to as the curse of dimensionality.
Dimensionality reduction aims to reduce the number of features while preserving as much useful information as possible. Although some information loss is inevitable, reducing dimensionality often makes problems tractable and training much faster. In some cases, it can even improve performance by removing noise and redundant information. Dimensionality reduction is also extremely valuable for data visualization, since reducing data to two or three dimensions allows humans to visually detect patterns such as clusters and anomalies.

**Main Approaches for Dimensionality Reduction**

There are two main approaches to dimensionality reduction: projection and manifold learning.

**Projection**

In many real-world datasets, data points do not fill the entire high-dimensional space uniformly. Instead, they often lie close to a lower-dimensional subspace due to correlations between features. Projection methods exploit this property by projecting the data onto a lower-dimensional subspace that preserves most of the information.
A simple example is projecting a 3D dataset onto a 2D plane, which works well when the data lies close to a flat subspace. Projection fails when the subspace twists and turns, as in the Swiss roll dataset: projecting straight onto a plane squashes different layers of the roll on top of each other instead of unrolling it.
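
As a rough illustration of data lying close to a flat subspace, the sketch below builds a synthetic 3D dataset from two hidden factors plus a little noise, then checks how the variance is spread across three orthogonal directions. The dataset, the mixing matrix, and the noise level are made-up assumptions, not values from this chapter.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two hidden factors drive all three observed features (hypothetical mixing matrix).
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.3, 1.0, -0.4]])
X3d = latent @ mixing + 0.05 * rng.normal(size=(200, 3))  # small off-plane noise

# Share of the total variance along each of three orthogonal directions,
# obtained from the singular values of the centered data.
X3d_centered = X3d - X3d.mean(axis=0)
singular_values = np.linalg.svd(X3d_centered, compute_uv=False)
variance_share = singular_values**2 / np.sum(singular_values**2)

print(variance_share)
```

If the third share is tiny, almost nothing is lost by projecting onto the plane spanned by the first two directions, which is the situation where projection works well.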

**Manifold Learning**

Manifold learning assumes that high-dimensional data lies on or near a lower-dimensional manifold embedded in a higher-dimensional space. A manifold can be curved and twisted, unlike a flat subspace.
This assumption, called the manifold hypothesis, often holds for real-world data such as images, speech, and text. By learning the underlying manifold, dimensionality reduction algorithms can unfold complex structures and represent the data more meaningfully in lower dimensions. However, reducing dimensionality does not always simplify the learning task; in some cases, decision boundaries may actually become more complex.

**Principal Component Analysis (PCA)**

PCA is the most widely used dimensionality reduction technique. It identifies the directions (principal components) along which the data varies the most and projects the data onto these directions.

**Preserving the Variance**

PCA selects the axes that preserve the maximum amount of variance in the data, since an axis that preserves more variance will most likely lose less information than other projections. Equivalently, it selects the axes that minimize the mean squared distance between the original data and its projection onto them.
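
The equivalence between maximizing variance and minimizing the mean squared distance can be checked numerically. The sketch below uses a hypothetical correlated 2D dataset, scans candidate axes at 1-degree steps, and compares the axis with the largest projected variance to the axis with the smallest mean squared distance. It is a brute-force illustration, not how PCA is computed in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated 2D toy data (hypothetical), centered on the origin.
X2 = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])
X2 = X2 - X2.mean(axis=0)

# Candidate axes: unit vectors at 1-degree steps from 0 to 179 degrees.
angles = np.deg2rad(np.arange(180))
axes = np.column_stack([np.cos(angles), np.sin(angles)])  # shape (180, 2)

coords = X2 @ axes.T            # coordinate of every point along every axis
variance = coords.var(axis=0)   # projected variance per candidate axis

# Squared distance from a point x to its projection on a unit axis u:
# ||x||^2 - (x . u)^2, by the Pythagorean theorem.
sq_norms = np.sum(X2**2, axis=1, keepdims=True)
mean_sq_dist = (sq_norms - coords**2).mean(axis=0)

best_by_variance = int(np.argmax(variance))
best_by_distance = int(np.argmin(mean_sq_dist))
print(best_by_variance, best_by_distance)
```

For centered data the two quantities add up to the same constant for every axis, so increasing one necessarily decreases the other.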

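**Principal Components and SVD**

PCA finds orthogonal axes called principal components, ordered by the amount of variance they explain. They can be computed with Singular Value Decomposition (SVD), which factors the centered data matrix into U Σ Vᵀ; the columns of V (the rows of Vt in NumPy) are the unit vectors that define the principal components. PCA assumes the data is centered on the origin, so the mean is subtracted first.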

In [2]:
import numpy as np
from sklearn.datasets import load_iris

# Load dataset
iris = load_iris()
X = iris["data"]  # shape (150, 4)

# Centering
X_centered = X - X.mean(axis=0)

# SVD
U, s, Vt = np.linalg.svd(X_centered)

# Take the first 2 principal components
c1 = Vt.T[:, 0]
c2 = Vt.T[:, 1]

print("c1:", c1)
print("c2:", c2)


c1: [ 0.36138659 -0.08452251  0.85667061  0.3582892 ]
c2: [-0.65658877 -0.73016143  0.17337266  0.07548102]


**Projecting Down to d Dimensions**

Once the principal components are identified, the dataset can be projected onto the subspace spanned by the first d components.

In [3]:
W2 = Vt.T[:, :2]
X2D = X_centered.dot(W2)

**PCA Using Scikit-Learn**

Scikit-Learn provides a PCA implementation that automatically centers the data and performs SVD internally.

In [4]:
from sklearn.decomposition import PCA

pca = PCA(n_components = 2)
X2D = pca.fit_transform(X)
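
As a sanity check, the components found by Scikit-Learn can be compared with the ones from the manual SVD. The sketch below assumes the Iris data used in this chapter; variable names such as X_iris and pca_check are only illustrative. Because the sign of each principal component is arbitrary, the comparison is done on absolute values.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X_iris = load_iris()["data"]

# Manual route: center the data, then take the first two right singular vectors.
X_iris_centered = X_iris - X_iris.mean(axis=0)
_, _, Vt_manual = np.linalg.svd(X_iris_centered, full_matrices=False)
X2D_manual = X_iris_centered @ Vt_manual[:2].T

# Scikit-Learn route: centering happens inside fit().
pca_check = PCA(n_components=2).fit(X_iris)
X2D_sklearn = pca_check.transform(X_iris)

# Each axis may be flipped, so compare absolute values.
same_axes = np.allclose(np.abs(pca_check.components_), np.abs(Vt_manual[:2]), atol=1e-6)
same_projection = np.allclose(np.abs(X2D_sklearn), np.abs(X2D_manual), atol=1e-6)
print(same_axes, same_projection)
```

The fitted mean is stored in the mean_ attribute and the component vectors in components_, one per row.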

**Explained Variance Ratio**

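The explained variance ratio indicates the proportion of the dataset's variance that lies along each principal component, which helps decide how many dimensions to keep. For the Iris data, the first component carries about 92.5% of the variance and the second about 5.3%, so two dimensions retain roughly 97.8%.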

In [5]:
pca.explained_variance_ratio_

array([0.92461872, 0.05306648])
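
The ratio can also be derived by hand from the singular values, which may help make clear what it measures. This is a minimal sketch assuming the full Iris dataset; the variance along component i is proportional to the square of its singular value, so the common factor cancels in the ratio.

```python
import numpy as np
from sklearn.datasets import load_iris

X_iris = load_iris()["data"]
X_iris_centered = X_iris - X_iris.mean(axis=0)

# Singular values of the centered data, largest first.
singular_values = np.linalg.svd(X_iris_centered, compute_uv=False)

# Variance along component i is s_i**2 / (n - 1); dividing by the total
# variance cancels the (n - 1) factor.
ratio = singular_values**2 / np.sum(singular_values**2)

print(ratio[:2])        # per-component share for the first two components
print(ratio[:2].sum())  # share retained by a 2D projection
```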

**Choosing the Right Number of Dimensions**

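Instead of choosing the number of components arbitrarily, you can pick the smallest number of dimensions whose cumulative explained variance reaches a target proportion (e.g., 95%). In Scikit-Learn, setting n_components to a float between 0.0 and 1.0 requests that proportion directly.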

In [7]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA

# Load dataset
iris = load_iris()
X = iris["data"]  # features
y = iris["target"]

# Split train/test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit PCA on the training set
pca = PCA()
pca.fit(X_train)

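# Compute the cumulative explained variance
cumsum = np.cumsum(pca.explained_variance_ratio_)

# Determine the number of components needed to preserve 95% of the variance
# (threshold assumed from the text above; the original comment was cut off here)
d = np.argmax(cumsum >= 0.95) + 1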


In [8]:
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_train)
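
After fitting with a variance target, it can be useful to check how many components were actually kept and to reuse the same fitted projection on held-out data. The sketch below assumes the same Iris split as in the cells above; the names X_tr, X_te, and pca_95 are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te = train_test_split(X_all, test_size=0.2, random_state=42)

# Keep the smallest number of components that explains at least 95% of the variance.
pca_95 = PCA(n_components=0.95).fit(X_tr)
retained = pca_95.explained_variance_ratio_.sum()
print(pca_95.n_components_, retained)

# Project the test set with the axes learned on the training set only.
X_te_reduced = pca_95.transform(X_te)
print(X_te_reduced.shape)
```

Fitting on the training set and only calling transform on the test set keeps information from the test data out of the learned axes.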

**PCA for Compression**

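PCA can significantly compress datasets while retaining most of the variance, and the compressed data can be approximately reconstructed using the inverse transformation. Note that the cell below keeps all four components, so the shapes are unchanged and nothing is discarded; actual compression requires a smaller n_components.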

In [10]:
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Load dataset
iris = load_iris()
X = iris["data"]  # 4 features
y = iris["target"]

# Split train/test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# PCA with the maximum number of components (all 4, so nothing is discarded here)
pca = PCA(n_components=4)
X_reduced = pca.fit_transform(X_train)
X_recovered = pca.inverse_transform(X_reduced)

print("Original shape:", X_train.shape)
print("Reduced shape:", X_reduced.shape)
print("Recovered shape:", X_recovered.shape)


Original shape: (120, 4)
Reduced shape: (120, 4)
Recovered shape: (120, 4)
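
To see compression and its cost, the sketch below keeps only two of the four components and measures the reconstruction error. It assumes the same Iris split as above; the variable names are illustrative, and the choice of two components is arbitrary.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te = train_test_split(X_all, test_size=0.2, random_state=42)

# Compress from 4 features to 2, then map back to the original 4D space.
pca_2 = PCA(n_components=2)
X_small = pca_2.fit_transform(X_tr)
X_rec = pca_2.inverse_transform(X_small)

# Mean squared reconstruction error over all entries.
mse = np.mean((X_tr - X_rec) ** 2)

print("Original shape:", X_tr.shape)
print("Reduced shape:", X_small.shape)
print("Recovered shape:", X_rec.shape)
print("Reconstruction MSE:", mse)
```

The recovered array has the original shape, but it is not identical to the original data: the variance along the dropped components is lost.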


**Randomized PCA**

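Randomized PCA uses a stochastic algorithm to quickly find an approximation of the first d principal components. It is especially useful for large datasets when the target dimensionality d is much smaller than the original number of features. In the cell below, n_components equals the number of features, so the call only demonstrates the API rather than a speedup.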


In [12]:
from sklearn.decomposition import PCA

rnd_pca = PCA(n_components=4, svd_solver="randomized")
X_reduced = rnd_pca.fit_transform(X_train)


**Incremental PCA**

Incremental PCA processes the data in mini-batches, so the whole training set never has to fit in memory at once. This allows PCA to scale to very large datasets and to be applied online, as new instances arrive.


In [14]:
from sklearn.decomposition import IncrementalPCA
import numpy as np

n_batches = 10  # adjust as needed
inc_pca = IncrementalPCA(n_components=4)

for X_batch in np.array_split(X_train, n_batches):
    inc_pca.partial_fit(X_batch)

X_reduced = inc_pca.transform(X_train)


**Kernel PCA**

Kernel PCA extends PCA using the kernel trick to perform nonlinear dimensionality reduction. It is particularly effective for unrolling complex manifolds.


In [20]:
from sklearn.decomposition import KernelPCA

rbf_pca = KernelPCA(n_components = 2, kernel="rbf", gamma=0.04)
X_reduced = rbf_pca.fit_transform(X)
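
Since Iris has only four features, a Swiss roll may be a more telling test case for nonlinear reduction. The sketch below is an assumption-laden illustration: the dataset comes from Scikit-Learn's make_swiss_roll, and the gamma value is simply reused from the cell above rather than tuned. It compares a linear kernel, which behaves like ordinary PCA, with an RBF kernel.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA

# t is the position of each point along the roll (useful for coloring plots).
X_roll, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

kernels = {
    "linear": {"kernel": "linear"},
    "rbf": {"kernel": "rbf", "gamma": 0.04},  # gamma is an untuned, illustrative value
}

embeddings = {}
for name, params in kernels.items():
    kpca = KernelPCA(n_components=2, **params)
    embeddings[name] = kpca.fit_transform(X_roll)
    print(name, embeddings[name].shape)
```

Whether a given kernel actually unrolls the manifold depends heavily on its hyperparameters, so plotting each embedding colored by t is a reasonable way to inspect the result.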

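**Kernel Selection and Hyperparameter Tuning**

Because kernel PCA is unsupervised, there is no obvious performance measure for choosing the kernel and its hyperparameters. When dimensionality reduction is a preprocessing step for a supervised task, a pipeline combined with grid search can select the values that give the best downstream performance. Another option, shown in the last two cells of this section, is to measure the reconstruction pre-image error and prefer the kernel and hyperparameters that keep it low.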

In [21]:
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

clf = Pipeline([
        ("kpca", KernelPCA(n_components=2)),
        ("log_reg", LogisticRegression())
    ])
param_grid = [{
        "kpca__gamma": np.linspace(0.03, 0.05, 10),
        "kpca__kernel": ["rbf", "sigmoid"]
    }]

grid_search = GridSearchCV(clf, param_grid, cv=3)
grid_search.fit(X, y)

In [22]:
print(grid_search.best_params_)

{'kpca__gamma': np.float64(0.03), 'kpca__kernel': 'rbf'}


In [23]:
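# fit_inverse_transform=True also trains a mapping from the reduced space back
# to the input space, so approximate pre-images can be reconstructed.
# Note: gamma is set by hand here and differs from the grid search result above.
rbf_pca = KernelPCA(n_components = 2, kernel="rbf", gamma=0.0433,
                    fit_inverse_transform=True)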
X_reduced = rbf_pca.fit_transform(X)
X_preimage = rbf_pca.inverse_transform(X_reduced)

In [24]:
from sklearn.metrics import mean_squared_error
mean_squared_error(X, X_preimage)

0.15883006301704042

**Locally Linear Embedding (LLE)**

LLE is a nonlinear manifold learning technique that preserves local linear relationships between neighboring points. It is especially effective for unrolling twisted manifolds such as the Swiss roll.


In [25]:
from sklearn.manifold import LocallyLinearEmbedding

lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
X_reduced = lle.fit_transform(X)
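
The Swiss roll mentioned above is easy to try directly. This is a minimal sketch assuming Scikit-Learn's make_swiss_roll generator; the sample size, noise level, and n_neighbors are illustrative choices rather than tuned values.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# t is the position of each point along the roll.
X_roll, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=41)

# Each point is described by its 10 nearest neighbors; LLE then looks for a
# 2D layout in which those local relationships are preserved as well as possible.
lle_roll = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)
X_unrolled = lle_roll.fit_transform(X_roll)

print(X_unrolled.shape)
print(lle_roll.reconstruction_error_)
```

Plotting X_unrolled colored by t is a simple way to judge whether the roll was unrolled; results can be sensitive to n_neighbors.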

**Other Dimensionality Reduction Techniques**


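Several other dimensionality reduction techniques are available, including:
•	Random Projections project the data onto a lower-dimensional space using a random linear projection; distances are approximately preserved with high probability.
•	Multidimensional Scaling (MDS) reduces dimensionality while trying to preserve the distances between instances.
•	Isomap builds a graph connecting each instance to its nearest neighbors, then reduces dimensionality while trying to preserve the geodesic distances between instances.
•	t-SNE tries to keep similar instances close and dissimilar instances apart; it is mostly used for visualization, in particular to show clusters of instances.
•	Linear Discriminant Analysis (LDA) is a supervised technique that learns the axes that best discriminate between classes.
Each method has its own strengths and use cases.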
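
All of the techniques listed above have Scikit-Learn implementations with a common fit_transform interface. The sketch below runs each one on the Iris data with mostly default settings, assuming a reasonably recent Scikit-Learn version; the hyperparameters are illustrative, not recommendations, and some estimators may emit warnings.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import MDS, TSNE, Isomap
from sklearn.random_projection import GaussianRandomProjection

X_iris, y_iris = load_iris(return_X_y=True)

reducers = {
    "Random projection": GaussianRandomProjection(n_components=2, random_state=42),
    "MDS": MDS(n_components=2, random_state=42),
    "Isomap": Isomap(n_components=2, n_neighbors=10),
    "t-SNE": TSNE(n_components=2, random_state=42),
    "LDA": LinearDiscriminantAnalysis(n_components=2),
}

results = {}
for name, reducer in reducers.items():
    if name == "LDA":
        # LDA is supervised, so it needs the class labels.
        results[name] = reducer.fit_transform(X_iris, y_iris)
    else:
        results[name] = reducer.fit_transform(X_iris)
    print(name, results[name].shape)
```

Note that MDS and t-SNE, as implemented here, have no transform method for new instances, so they are mainly suited to exploring a fixed dataset.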
