# t-SNE and UMAP: Nonlinear Dimensionality Reduction

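While PCA is a linear dimensionality reduction technique, **t-SNE** and **UMAP** are nonlinear methods that can capture curved (manifold) structure a linear projection misses. t-SNE concentrates on preserving local neighborhoods, while UMAP preserves local neighborhoods and usually retains more of the global layout.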

They are commonly used for **visualization** of complex datasets.

## Import Libraries and Dataset

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE
import umap  # provided by the umap-learn package (pip install umap-learn)

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
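
Both t-SNE and UMAP work from pairwise distances, so features measured on larger scales can dominate the embedding. The cells in this notebook use the raw Iris features; the sketch below shows one optional way to standardize them first. The name `X_scaled` is an assumption introduced here for illustration and is not used elsewhere in the notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# Standardize each feature to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)

# Compare per-feature spread before and after scaling
print("Raw feature std:   ", X.std(axis=0).round(2))
print("Scaled feature std:", X_scaled.std(axis=0).round(2))
```

To try it, pass `X_scaled` instead of `X` to `fit_transform` in the cells that follow. Whether scaling helps depends on the data; for Iris all four features are in centimeters, so the effect is modest.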

## Applying t-SNE

In [None]:
# t-SNE reduces data to 2D
tsne = TSNE(n_components=2, random_state=42)
X_tsne = tsne.fit_transform(X)

# Create DataFrame for visualization
df_tsne = pd.DataFrame(X_tsne, columns=['tSNE1','tSNE2'])
df_tsne['target'] = y

# Plot t-SNE
plt.figure(figsize=(8,6))
plt.scatter(df_tsne['tSNE1'], df_tsne['tSNE2'], c=df_tsne['target'], cmap='viridis', alpha=0.7)
plt.xlabel('t-SNE 1')
plt.ylabel('t-SNE 2')
plt.title('t-SNE Visualization of Iris Dataset')
plt.colorbar(ticks=[0, 1, 2], label='Target Class')  # classes are discrete: 0, 1, 2
plt.show()
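
The t-SNE result depends strongly on `perplexity`, which roughly sets how many neighbors each point attends to (the scikit-learn default is 30). A minimal sketch of a perplexity sweep follows; the values 5, 30, and 50 are arbitrary choices for illustration, and the `embeddings` dictionary is a hypothetical helper, not part of the notebook above.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

X = load_iris().data

# Illustrative perplexity values; each must be smaller than the number of samples (150)
perplexities = [5, 30, 50]
embeddings = {}

for perplexity in perplexities:
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=42)
    embeddings[perplexity] = tsne.fit_transform(X)
    # kl_divergence_ holds the final value of the t-SNE objective after optimization
    print(f"perplexity={perplexity}: KL divergence = {tsne.kl_divergence_:.3f}")
```

Each entry in `embeddings` can be plotted with the same scatter code as above. The KL divergence values are shown only as a diagnostic; they are not directly comparable across perplexities, so the plots themselves are the better guide.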

## Applying UMAP

In [None]:
# UMAP reduces data to 2D
umap_model = umap.UMAP(n_components=2, random_state=42)
X_umap = umap_model.fit_transform(X)

# Create DataFrame for visualization
df_umap = pd.DataFrame(X_umap, columns=['UMAP1','UMAP2'])
df_umap['target'] = y

# Plot UMAP
plt.figure(figsize=(8,6))
plt.scatter(df_umap['UMAP1'], df_umap['UMAP2'], c=df_umap['target'], cmap='plasma', alpha=0.7)
plt.xlabel('UMAP 1')
plt.ylabel('UMAP 2')
plt.title('UMAP Visualization of Iris Dataset')
plt.colorbar(ticks=[0, 1, 2], label='Target Class')  # classes are discrete: 0, 1, 2
plt.show()

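## Key Notes
- **t-SNE** focuses on preserving local structure (which points are neighbors). Distances between well-separated clusters and apparent cluster sizes in the plot are not reliable, and it is best suited to small and medium-sized datasets.
- **UMAP** preserves local structure and usually retains more global structure than t-SNE; it also scales better to large datasets.
- Both are primarily used for visualization rather than as preprocessing for predictive models. scikit-learn's `TSNE` has no `transform` method for unseen samples, whereas UMAP provides one.
- Both are sensitive to hyperparameters (`perplexity` for t-SNE; `n_neighbors` and `min_dist` for UMAP) and to the random seed, so compare several settings before drawing conclusions.
- Useful for understanding high-dimensional data and clustering patterns visually.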
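
Claims about preserved local structure can be checked numerically as well as visually. One option, sketched below, is scikit-learn's `trustworthiness` score, which ranges from 0 to 1 and measures how well each point's nearest neighbors in the embedding match its neighbors in the original space. The comparison against PCA and the choice of `n_neighbors=5` are assumptions made for this example; UMAP is left out only to keep the sketch dependent on scikit-learn alone.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness

X = load_iris().data

# Two 2D embeddings of the same data: a linear one and a nonlinear one
X_pca = PCA(n_components=2, random_state=42).fit_transform(X)
X_tsne = TSNE(n_components=2, random_state=42).fit_transform(X)

# Higher trustworthiness means local neighborhoods are better preserved
scores = {
    "PCA": trustworthiness(X, X_pca, n_neighbors=5),
    "t-SNE": trustworthiness(X, X_tsne, n_neighbors=5),
}

for name, score in scores.items():
    print(f"{name}: trustworthiness = {score:.3f}")
```

The `X_umap` array from the UMAP cell can be scored the same way with `trustworthiness(X, X_umap, n_neighbors=5)`. Note that this score only reflects local structure; it says nothing about how well global distances are preserved.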