# Dimensionality Reduction Basics

Dimensionality reduction is the process of reducing the number of input features (dimensions) while retaining as much of the dataset's information as possible.

## Why do we need it?
- High-dimensional data is harder to visualize and analyze.
- Reducing dimensions can remove noise and redundant (highly correlated) features.
- Fewer features can make machine learning models train faster and, in some cases, generalize better.

## Common Techniques:
- **PCA (Principal Component Analysis)** → Linearly projects data onto the directions of greatest variance.
- **t-SNE (t-distributed Stochastic Neighbor Embedding)** → Nonlinear method that preserves local neighborhoods; used mainly for visualization.
- **Autoencoders** → Neural networks that learn a compressed representation of the input.

In this notebook, we’ll demonstrate PCA.
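
To make "projecting data into lower dimensions" concrete, here is a minimal sketch of the idea using only NumPy's SVD. It is an illustration under simple assumptions (mean-centering only, no feature scaling), not the exact routine scikit-learn runs internally; the variable names `X_centered`, `components`, and `X_2d` are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data                 # shape (150, 4)
X_centered = X - X.mean(axis=0)      # PCA operates on mean-centered data

# SVD of the centered data: the rows of Vt are the principal directions,
# ordered from largest to smallest variance
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
components = Vt[:2]                  # keep the top 2 directions, shape (2, 4)

# Project every sample onto those 2 directions
X_2d = X_centered @ components.T     # shape (150, 2)
print(X_2d.shape)
```

Up to a possible sign flip per component, `X_2d` should match what `PCA(n_components=2).fit_transform(X)` returns.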

## Import Libraries and Dataset

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load Iris dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df.head()

## Applying PCA for 2D Visualization

In [None]:
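# Apply PCA to reduce to 2 dimensions.
# All four Iris features are measured in cm, so no scaling is applied here;
# standardize first when features use different units.
pca = PCA(n_components=2)
reduced = pca.fit_transform(df)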

df_pca = pd.DataFrame(reduced, columns=['PC1', 'PC2'])
df_pca['target'] = iris.target
df_pca.head()
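
A quick way to check how much information the two components keep is the fitted model's `explained_variance_ratio_` attribute. The sketch below repeats the loading and fitting steps so that it runs on its own; in the notebook, the existing `pca` object could be inspected directly instead.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2).fit(X)

# Fraction of the total variance captured by each principal component
ratio = pca.explained_variance_ratio_
print(ratio)
print("Total variance retained:", ratio.sum())
```

If the total is close to 1, the 2D plot is a faithful summary of the original four features; a low total would suggest keeping more components.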

## Visualizing Reduced Data

In [None]:
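plt.figure(figsize=(8, 6))
scatter = plt.scatter(df_pca['PC1'], df_pca['PC2'], c=df_pca['target'], cmap='viridis', alpha=0.7)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA - Dimensionality Reduction on Iris Dataset')
# The targets are discrete classes, so label them in a legend
# rather than on a continuous colorbar
handles, _ = scatter.legend_elements()
plt.legend(handles, iris.target_names, title='Species')
plt.show()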

## Key Notes:
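- PCA reduces the **4D Iris data → 2D** so it can be plotted.
- For Iris, the first two components retain most of the variance; `pca.explained_variance_ratio_` reports the exact share.
- A lower-dimensional representation can make clustering and classification easier, although this is not guaranteed.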
- PCA is unsupervised and does not use labels.

In practice, dimensionality reduction is often applied **before clustering or visualization**.
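
As a rough illustration of reducing dimensions before clustering, the sketch below runs k-means on the 2D PCA output. The choice of `KMeans` with three clusters and the adjusted Rand index as a sanity check are assumptions made for this example, not part of the notebook above; the true labels are used only to evaluate the result, never to fit it.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

iris = load_iris()

# Step 1: reduce the 4 features to 2 principal components
X_2d = PCA(n_components=2).fit_transform(iris.data)

# Step 2: cluster in the reduced space (3 clusters, one per species, is assumed)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_2d)

# Compare the clusters with the true species (1.0 means perfect agreement)
ari = adjusted_rand_score(iris.target, cluster_labels)
print(f"Adjusted Rand index: {ari:.2f}")
```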