# 1. Dimensionality Reduction Basics

## Purpose
Reducing the dimensionality of a dataset can simplify models, lower computation costs, and make the data easier to visualize. The trade-off is that some information is usually lost.

## Projection and Manifold Learning

Projection: Maps data from a high-dimensional space onto a lower-dimensional subspace. It is most effective when the data lie close to such a subspace, that is, when the structure is roughly linear.

Manifold Learning: Suited to complex, nonlinear structures. Instead of projecting onto a flat subspace, it models the lower-dimensional manifold (a curved surface) on which the data lie within the high-dimensional space.


# 2. Principal Component Analysis (PCA)

## Purpose
PCA is a linear technique that transforms the data into a new coordinate system whose axes, the “principal components”, are chosen to capture as much of the variance in the data as possible.

## Principal Components
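Each principal component is a direction in feature space. The first component is the direction along which the data vary the most. Each subsequent component is orthogonal to the previous ones and captures the largest share of the remaining variance, so the captured variance decreases from one component to the next.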
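
To make the idea of "directions of maximum variance" concrete, here is a minimal sketch that recovers the principal components of the Iris data from a singular value decomposition in NumPy and compares the result with scikit-learn. The variable names (`X_centered`, `variances`, `gram`) are illustrative choices, and this is one common way to compute PCA, not necessarily the exact procedure any particular solver follows.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Center the data; PCA is defined on mean-centered features.
X_centered = X - X.mean(axis=0)

# Rows of Vt are the principal directions, ordered by singular value.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Variance captured along each direction (n - 1 in the denominator).
variances = S**2 / (X.shape[0] - 1)

# The directions are orthonormal, so Vt @ Vt.T should be the identity.
gram = Vt @ Vt.T

# Compare with scikit-learn; component signs may differ, variances should not.
pca = PCA(n_components=4).fit(X)
print("Variance per component:", variances)
print("Matches scikit-learn:", np.allclose(variances, pca.explained_variance_))
```

The variances come out in decreasing order, which is the "each component captures less than the previous one" property described above.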

## Explained Variance Ratio
The explained variance ratio gives the fraction of the total variance captured by each principal component, which helps in choosing how many components to keep.
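
One common way to use the ratio is to keep the smallest number of components whose cumulative explained variance reaches a chosen threshold. The sketch below assumes a threshold of 0.95, which is an illustrative value and not a recommendation. It also shows that scikit-learn's `PCA` accepts a float between 0 and 1 for `n_components` and makes the same selection.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Fit with all components to see the full variance spectrum.
pca_full = PCA().fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches the threshold.
threshold = 0.95  # illustrative choice
n_keep = int(np.argmax(cumulative >= threshold) + 1)
print("Cumulative explained variance:", cumulative)
print("Components needed:", n_keep)

# A float n_components in (0, 1) asks PCA to pick the count itself.
pca_auto = PCA(n_components=threshold).fit(X)
print("Chosen by PCA:", pca_auto.n_components_)
```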

## PCA for Compression
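Keeping only the first few principal components gives a compact representation of the data. The original features can be approximately reconstructed from it, with an error that depends on how much variance was discarded.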

In [1]:
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load example dataset
data = load_iris()
X = data.data

# Set the number of principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Explained variance ratio
print("Explained variance ratio:", pca.explained_variance_ratio_)


Explained variance ratio: [0.92461872 0.05306648]
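
The compression idea can be checked by mapping the reduced data back to the original feature space and measuring what was lost. This is a minimal sketch using `inverse_transform`. The mean squared error used here is one possible measure of reconstruction quality, chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

pca = PCA(n_components=2)
X_compressed = pca.fit_transform(X)                # 4 features -> 2 values per sample
X_restored = pca.inverse_transform(X_compressed)   # back to 4 features, approximately

# Mean squared reconstruction error over all entries.
mse = np.mean((X - X_restored) ** 2)
print("Compressed shape:", X_compressed.shape)
print("Restored shape:", X_restored.shape)
print("Reconstruction MSE:", mse)
```

Because the two retained components explain most of the variance, the reconstruction error should be small relative to the spread of the original features.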


## Randomized PCA

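Randomized PCA is an approximate version of PCA that uses a randomized SVD solver. It is faster on large datasets, especially when only a few components are needed.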

In [2]:
# The randomized solver is stochastic; fixing random_state makes runs repeatable
pca_random = PCA(n_components=2, svd_solver='randomized', random_state=42)
X_reduced_random = pca_random.fit_transform(X)
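
Iris is too small for the solver choice to matter, so the following sketch compares the full and randomized solvers on a synthetic dataset. The data (2000 samples, 200 features, built from 5 latent factors plus noise) is an assumption made up for this illustration. The point is only that the leading explained-variance ratios should agree closely, not exactly.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic low-rank data plus noise (hypothetical, for illustration only).
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 5))
mixing = rng.normal(size=(5, 200))
X_big = latent @ mixing + 0.1 * rng.normal(size=(2000, 200))

pca_full = PCA(n_components=5, svd_solver="full").fit(X_big)
pca_rand = PCA(n_components=5, svd_solver="randomized", random_state=0).fit(X_big)

# Compare the leading explained-variance ratios of the two solvers.
diff = np.abs(pca_full.explained_variance_ratio_ - pca_rand.explained_variance_ratio_)
print("Full      :", pca_full.explained_variance_ratio_)
print("Randomized:", pca_rand.explained_variance_ratio_)
print("Largest absolute difference:", diff.max())
```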


# 3. Random Projection

## Purpose
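Reduces dimensionality by projecting data onto a lower-dimensional subspace through a random linear mapping. It is cheap to compute and useful for very high-dimensional data, where pairwise distances are approximately preserved. The distortion grows as the target dimension shrinks, so a projection to only a few dimensions can be inaccurate.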

In [3]:
from sklearn.random_projection import GaussianRandomProjection

rp = GaussianRandomProjection(n_components=2)
X_projected = rp.fit_transform(X)
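
How many components are "enough" can be estimated with the Johnson-Lindenstrauss bound, which scikit-learn exposes as `johnson_lindenstrauss_min_dim`. The sketch below prints the bound for a few sample counts and distortion levels. The specific values of `n_samples` and `eps` are illustrative assumptions.

```python
from sklearn.random_projection import johnson_lindenstrauss_min_dim

# Minimum target dimension that keeps pairwise distances within a factor
# of (1 +/- eps), according to the Johnson-Lindenstrauss bound.
min_dims = {}
for n_samples in (150, 10_000, 1_000_000):
    for eps in (0.1, 0.5):
        k = int(johnson_lindenstrauss_min_dim(n_samples, eps=eps))
        min_dims[(n_samples, eps)] = k
        print(f"n_samples={n_samples:>9,}  eps={eps}  min components={k}")
```

The bound depends on the number of samples and the tolerated distortion, not on the original number of features. For 150 samples it comes out far larger than the 4 features Iris has, so the two-component projection above is best read as a demonstration of the API and not as a distance-preserving embedding.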


# 4. Locally Linear Embedding (LLE)

## Purpose
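LLE is a nonlinear dimensionality reduction technique that relies on local linear relationships between neighboring points. It is well suited to data lying on a nonlinear manifold.

## Implementation
Unlike PCA, LLE does not project the data onto a global linear subspace. It first expresses each point as a weighted linear combination of its nearest neighbors, then looks for a low-dimensional embedding in which those same weights still reconstruct each point well.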

In [4]:
from sklearn.manifold import LocallyLinearEmbedding

lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
X_reduced_lle = lle.fit_transform(X)
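
LLE is easier to appreciate on data that actually lies on a curved surface. The sketch below applies it to scikit-learn's synthetic Swiss roll, a 2-D sheet rolled up in 3-D. The sample count, noise level, and `n_neighbors=12` are assumptions picked for illustration, and the quality of the unrolling is sensitive to them.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# X_roll has 3 features; t is the position of each point along the roll.
X_roll, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=42)

lle = LocallyLinearEmbedding(n_components=2, n_neighbors=12, random_state=42)
X_unrolled = lle.fit_transform(X_roll)

print("Input shape:", X_roll.shape)
print("Embedded shape:", X_unrolled.shape)
print("Reconstruction error:", lle.reconstruction_error_)
```

Plotting `X_unrolled` colored by `t` is a quick way to judge whether the sheet was unrolled without tearing or folding.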


# 5. Other Dimensionality Reduction Techniques

## Multidimensional Scaling (MDS)

Purpose: MDS reduces dimensionality while trying to preserve the pairwise distances between points. It is often used for visualization.

In [5]:
from sklearn.manifold import MDS

mds = MDS(n_components=2)
X_reduced_mds = mds.fit_transform(X)
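
Since MDS aims to preserve pairwise distances, one way to check the result is to compare the distances before and after the embedding. This is a minimal sketch. The correlation between the two sets of distances is an informal diagnostic chosen for illustration, alongside the `stress_` value the estimator reports.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

X = load_iris().data

mds = MDS(n_components=2, random_state=42)
X_mds = mds.fit_transform(X)

# Compare pairwise distances before and after the embedding.
d_original = pairwise_distances(X)
d_embedded = pairwise_distances(X_mds)
upper = np.triu_indices_from(d_original, k=1)   # count each pair once
corr = np.corrcoef(d_original[upper], d_embedded[upper])[0, 1]

print("Stress:", mds.stress_)
print("Correlation of pairwise distances:", corr)
```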


## Isomap
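Purpose: Like LLE, Isomap starts from a nearest-neighbor graph, but instead of preserving local reconstruction weights it tries to preserve geodesic distances (distances measured along the manifold, approximated by shortest paths through the graph). Useful for nonlinear manifolds.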

In [6]:
from sklearn.manifold import Isomap

isomap = Isomap(n_components=2, n_neighbors=5)
X_reduced_isomap = isomap.fit_transform(X)
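
Because Isomap measures distances along paths in the neighbor graph, it helps if that graph is connected. A small `n_neighbors` can split the data into separate pieces, in which case there is no path between points in different pieces. The sketch below counts the connected components of the k-nearest-neighbor graph for a few values of k. The values tried are arbitrary, and how Isomap itself handles a disconnected graph may vary between scikit-learn versions.

```python
from scipy.sparse.csgraph import connected_components
from sklearn.datasets import load_iris
from sklearn.neighbors import kneighbors_graph

X = load_iris().data

# Count connected components of the k-nearest-neighbor graph for several k.
components = {}
for k in (5, 10, 20, 50):
    graph = kneighbors_graph(X, n_neighbors=k, mode="distance")
    n_parts, _ = connected_components(graph, directed=False)
    components[k] = int(n_parts)
    print(f"n_neighbors={k:>2}: {n_parts} connected component(s)")
```

If more than one component is reported for the chosen `n_neighbors`, increasing it until the graph is connected is a reasonable first step.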



## t-Distributed Stochastic Neighbor Embedding (t-SNE)

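Purpose: Primarily a visualization tool, t-SNE keeps similar points close together and pushes dissimilar points apart in the embedding. It works well for high-dimensional data with complex cluster structure, but the distances between well-separated clusters in the resulting plot are not reliably meaningful.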

In [7]:
from sklearn.manifold import TSNE

tsne = TSNE(n_components=2)
X_reduced_tsne = tsne.fit_transform(X)
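
t-SNE results depend noticeably on the `perplexity` parameter and on the random seed, so it is worth trying more than one setting. The sketch below runs two perplexity values with a fixed `random_state`. The values 5 and 30 and the PCA initialization are illustrative assumptions, not tuned choices.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

X = load_iris().data

# Perplexity roughly sets how many neighbors each point attends to.
# It must be smaller than the number of samples (150 here).
results = {}
for perplexity in (5, 30):
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=42)
    embedding = tsne.fit_transform(X)
    results[perplexity] = (embedding.shape, float(tsne.kl_divergence_))
    print(f"perplexity={perplexity}: shape={embedding.shape}, "
          f"KL divergence={tsne.kl_divergence_:.3f}")
```

The KL divergence values are not directly comparable across perplexities, so the embeddings are best judged by plotting them.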


## Linear Discriminant Analysis (LDA)

Purpose: Though primarily a classification tool, LDA can also reduce dimensionality by projecting data onto the directions that maximize class separability. The number of components can be at most the smaller of the number of features and the number of classes minus one.

Note: LDA is supervised, so it requires class labels.

In [8]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Assuming the dataset has target labels
y = data.target
lda = LDA(n_components=2)
X_reduced_lda = lda.fit_transform(X, y)
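
The limit on the number of LDA components can be computed from the data before fitting. This sketch derives it for Iris and then tries one component too many. The helper names `max_components` and `too_many_rejected` are hypothetical, and the rejection behavior shown is what recent scikit-learn versions do (older versions may only warn).

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
n_classes = len(set(y))
max_components = min(X.shape[1], n_classes - 1)   # min(4, 3 - 1) for Iris

lda = LinearDiscriminantAnalysis(n_components=max_components)
X_lda = lda.fit_transform(X, y)
print("Max components:", max_components)
print("Embedded shape:", X_lda.shape)
print("Explained variance ratio:", lda.explained_variance_ratio_)

# Asking for more components than the limit should fail at fit time.
try:
    LinearDiscriminantAnalysis(n_components=max_components + 1).fit(X, y)
    too_many_rejected = False
except ValueError as err:
    too_many_rejected = True
    print("Rejected:", err)
```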
