# Dimensionality Reduction Techniques Notebook


# Section 1: Introduction to Dimensionality Reduction
Dimensionality reduction is the process of reducing the number of input variables (features) in a dataset while preserving as much of its essential structure as possible.

# Why is Dimensionality Reduction Important?
1. Simplifies data visualization.
2. Reduces computational costs.
3. Mitigates the curse of dimensionality.
4. Can improve model performance by removing irrelevant or redundant features.
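
The curse of dimensionality in item 3 can be made concrete with a small experiment. The sketch below assumes uniformly random points and uses illustrative sample sizes; the helper `relative_contrast` is a hypothetical name introduced here. It measures how the gap between the farthest and nearest neighbor of a query point shrinks, relative to the nearest distance, as the number of dimensions grows.

```python
import numpy as np

def relative_contrast(n_dims, n_points=500, seed=0):
    """Return (d_max - d_min) / d_min for distances from one random query point."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))   # uniform points in the unit hypercube
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

contrasts = {d: relative_contrast(d) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"{d:>5} dimensions: relative contrast = {c:.2f}")
```

When the contrast is small, "nearest" and "farthest" neighbors are almost equally far away, which is one reason distance-based methods tend to degrade in high dimensions.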

# Key Techniques
1. Principal Component Analysis (PCA).
2. Multidimensional Scaling (MDS).
3. Locally Linear Embedding (LLE).
4. Isomap.
5. t-SNE.
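
MDS (item 2) is imported in this notebook but has no worked example of its own. As a rough, self-contained sketch, the snippet below applies scikit-learn's `MDS` with default settings to the standardized Iris data. The choice of dataset and of `random_state=42` is an assumption made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import MDS
from sklearn.preprocessing import StandardScaler

X_iris = StandardScaler().fit_transform(load_iris().data)

# MDS looks for 2D coordinates whose pairwise distances match the original ones
mds = MDS(n_components=2, random_state=42)
X_mds = mds.fit_transform(X_iris)

print("Embedding shape:", X_mds.shape)
print("Stress:", mds.stress_)
```

The `stress_` attribute summarizes the mismatch between original and embedded distances; lower values indicate a closer match.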


In [None]:
# Load Required Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, MDS, LocallyLinearEmbedding, Isomap
from sklearn.datasets import load_iris, fetch_openml, make_swiss_roll

## Principal Component Analysis (PCA)
PCA reduces the dimensionality of data by projecting it onto a small set of orthogonal directions (principal components), ordered so that each one captures as much of the remaining variance as possible.
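
To make the projection concrete, here is a minimal NumPy sketch of PCA via singular value decomposition. The synthetic data and variable names are illustrative assumptions, not part of this notebook's pipeline. It checks that the component directions are orthonormal and prints the variance captured along each one.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 200 samples, 3 features with very different scales
X_demo = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.2])

# Center the data, then take the SVD; the rows of Vt are the principal directions
X_centered = X_demo - X_demo.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

components = Vt[:2]                        # top-2 directions
X_projected = X_centered @ components.T    # coordinates in the new basis
explained_var = S**2 / (len(X_demo) - 1)   # variance along each direction

print("Orthonormal:", np.allclose(components @ components.T, np.eye(2)))
print("Variance per component:", np.round(explained_var, 3))
```

The singular values come out sorted in decreasing order, so the first rows of `Vt` are the directions of greatest variance.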


In [None]:
# Step-by-Step Example with the Iris Dataset
from sklearn.preprocessing import StandardScaler

data = load_iris()
X = data.data
y = data.target

In [None]:
# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

In [None]:
# Apply PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

In [None]:
# Visualize the results
plt.figure(figsize=(8, 6))
for target, color in zip(np.unique(y), ['r', 'g', 'b']):
    plt.scatter(X_pca[y == target, 0], X_pca[y == target, 1], label=data.target_names[target], color=color)
plt.title("PCA on Iris Dataset")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.legend()
plt.show()
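
A natural follow-up question is how much of the variance the 2D projection retains. One way to check, sketched below as a self-contained snippet that repeats the loading and scaling steps, is the fitted model's `explained_variance_ratio_` attribute. The names `X_iris` and `pca_full` are introduced here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_iris = StandardScaler().fit_transform(load_iris().data)
pca_full = PCA().fit(X_iris)  # keep all 4 components to see the full spectrum

ratios = pca_full.explained_variance_ratio_
for i, (r, c) in enumerate(zip(ratios, ratios.cumsum()), start=1):
    print(f"PC{i}: {r:.1%} of variance (cumulative {c:.1%})")
```

If the cumulative share for the first two components is high, the scatter plot is a faithful summary of the data; if it is low, important structure may be hidden in the discarded components.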

In [None]:
# Exercise 1: PCA Implementation
"""
Task:
1. Choose a different dataset (e.g., Wine dataset from sklearn).
2. Perform PCA to reduce the dataset to 2 dimensions.
3. Visualize the results with a scatter plot.

Hint:
Use `load_wine()` from sklearn.datasets.
"""

## t-SNE
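t-SNE is a non-linear dimensionality reduction method used primarily for visualizing high-dimensional data in two or three dimensions. It preserves local neighborhoods, so distances between well-separated clusters in the plot should not be over-interpreted.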
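
One practical consequence of t-SNE being a visualization tool: scikit-learn's `TSNE` provides `fit_transform` but no separate `transform`, so new points cannot be added to an existing embedding. The small sketch below illustrates this on 300 samples from the bundled `load_digits` dataset, which is an assumption chosen for speed and is not the MNIST data used in this notebook.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X_digits = load_digits().data[:300]   # 300 samples with 64 features each

tsne_demo = TSNE(n_components=2, perplexity=30, init='pca', random_state=0)
embedding = tsne_demo.fit_transform(X_digits)

print("Embedding shape:", embedding.shape)
print("Final KL divergence:", tsne_demo.kl_divergence_)
print("Has transform():", hasattr(tsne_demo, "transform"))
```

The `kl_divergence_` attribute reports the value of the objective after optimization and can be used to compare runs with the same perplexity.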


In [None]:
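# Example with MNIST Dataset
# Load MNIST (70,000 images, 784 features each); this downloads the data on first use
mnist = fetch_openml('mnist_784', version=1, as_frame=False)

# t-SNE is slow on the full dataset, so keep a random subset (5,000 is an arbitrary choice)
rng = np.random.RandomState(42)
idx = rng.choice(mnist.data.shape[0], size=5000, replace=False)
X = mnist.data[idx]
y = mnist.target[idx]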

In [None]:
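# Apply t-SNE
# max_iter=250 is the minimum scikit-learn accepts and keeps the run short;
# increase it (the default is 1000) for a better-converged embedding.
# In scikit-learn versions before 1.5 this argument is named n_iter.
tsne = TSNE(n_components=2, random_state=42, perplexity=30, init='pca', max_iter=250)
X_tsne = tsne.fit_transform(X)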

In [None]:
# Visualize the results
plt.figure(figsize=(8, 6))
for digit in np.unique(y):
    plt.scatter(X_tsne[y == digit, 0], X_tsne[y == digit, 1], label=digit)
plt.title("t-SNE Visualization of MNIST")
plt.legend()
plt.show()

In [None]:
# Exercise 2: t-SNE Visualization
"""
Task:
1. Use t-SNE on a different dataset (e.g., Iris dataset).
2. Visualize the clusters and interpret the results.

Hint:
Use the Iris dataset and try different values for `perplexity` (e.g., 5, 30, 50).
"""

## Isomap
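Isomap is a non-linear dimensionality reduction technique that approximates geodesic distances (distances measured along the manifold) by shortest paths through a nearest-neighbor graph, then finds an embedding that preserves those distances. This lets it capture the global structure of the data.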
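
The difference between geodesic and straight-line distance can be seen directly. The following sketch is a simplified illustration of the idea, not scikit-learn's internal implementation. It builds a 10-nearest-neighbor graph on a Swiss roll, estimates geodesic distances with SciPy's `shortest_path`, and reports the pair of points where the two distances differ most. Sample size and variable names are assumptions.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.datasets import make_swiss_roll
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import kneighbors_graph

X_roll, _ = make_swiss_roll(n_samples=500, noise=0.05, random_state=0)

# Connect each point to its 10 nearest neighbors, weighted by Euclidean distance
graph = kneighbors_graph(X_roll, n_neighbors=10, mode='distance')

# Geodesic estimate: shortest path through the graph (treated as undirected)
geodesic = shortest_path(graph, method='D', directed=False)
euclidean = pairwise_distances(X_roll)

# Find the reachable pair with the largest gap between the two distances
finite = np.isfinite(geodesic)
gap = np.where(finite, geodesic - euclidean, -np.inf)
i, j = np.unravel_index(np.argmax(gap), gap.shape)
print(f"Points {i} and {j}: Euclidean {euclidean[i, j]:.1f}, geodesic {geodesic[i, j]:.1f}")
```

Points on adjacent layers of the roll are close in 3D space but far apart along the sheet, which is the structure Isomap is designed to respect.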


In [None]:
# Example with Swiss Roll Dataset
# Fix the seed so the generated roll is reproducible
X_swiss, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=42)

In [None]:
# Apply Isomap
isomap = Isomap(n_components=2, n_neighbors=10)
X_isomap = isomap.fit_transform(X_swiss)

In [None]:
# Visualize the results
# Color by distance from the roll's axis (in the x-z plane), which increases along the sheet
roll_pos = np.hypot(X_swiss[:, 0], X_swiss[:, 2])
fig, ax = plt.subplots(1, 2, figsize=(12, 6))
ax[0].scatter(X_swiss[:, 0], X_swiss[:, 2], c=roll_pos, cmap='Spectral')
ax[0].set_title("Original Swiss Roll")
ax[1].scatter(X_isomap[:, 0], X_isomap[:, 1], c=roll_pos, cmap='Spectral')
ax[1].set_title("Isomap Flattened Representation")
plt.show()

## Locally Linear Embedding (LLE)
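LLE is a non-linear dimensionality reduction technique that preserves local relationships among data points: it expresses each point as a weighted combination of its nearest neighbors, then finds a low-dimensional embedding in which the same weights still reconstruct each point.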
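
As a quick illustration of how the neighborhood size enters, the sketch below fits scikit-learn's `LocallyLinearEmbedding` with several values of `n_neighbors` and prints the fitted `reconstruction_error_` attribute. The sample size, the values of `n_neighbors`, and the use of `eigen_solver='dense'` (chosen here for deterministic results on a small dataset) are assumptions.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X_roll, _ = make_swiss_roll(n_samples=500, noise=0.05, random_state=0)

errors = {}
for k in (5, 10, 20):
    # The dense solver avoids the random initialization used by the iterative solver
    lle_demo = LocallyLinearEmbedding(n_components=2, n_neighbors=k, eigen_solver='dense')
    lle_demo.fit(X_roll)
    errors[k] = lle_demo.reconstruction_error_
    print(f"n_neighbors={k:>2}: reconstruction error = {errors[k]:.2e}")
```

The error values are not directly comparable as a quality score across different `n_neighbors`, so they are best read alongside a plot of the embedding.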

In [None]:
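# Example with Swiss Roll Dataset (reuses X_swiss from the Isomap section)
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)
X_lle = lle.fit_transform(X_swiss)

# Visualize the results
# Color by distance from the roll's axis (in the x-z plane), which increases along the sheet
plt.figure(figsize=(8, 6))
plt.scatter(X_lle[:, 0], X_lle[:, 1], c=np.hypot(X_swiss[:, 0], X_swiss[:, 2]), cmap='Spectral')
plt.title("LLE Flattened Representation")
plt.show()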

In [None]:
# Exercise 3: LLE on Custom Data
"""
Task:
1. Use LLE on a different dataset (e.g., MNIST or Iris).
2. Compare the results with other techniques (e.g., Isomap).

Hint:
Adjust `n_neighbors` to observe its effect on the output.
"""
