# PCA: Principal Component Analysis

Principal Component Analysis (PCA) is one of the most popular **dimensionality reduction** techniques.

It works by:
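- Finding new orthogonal axes (principal components), ordered by how much of the data's variance each one captures.
- Projecting the data onto the first few of those axes, which gives a lower-dimensional representation that retains most of the variance.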
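
The two steps above can be sketched directly in NumPy. This is a minimal illustration using an eigendecomposition of the covariance matrix, not how scikit-learn implements `PCA` internally. The function name `pca_sketch` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pca_sketch(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    # Center each feature so variance is measured around the mean
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (n_features x n_features)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Reorder so the directions with the largest variance come first
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    components = eigenvectors[:, order[:n_components]]
    # Project the centered data onto the selected axes
    projected = X_centered @ components
    explained_ratio = eigenvalues[:n_components] / eigenvalues.sum()
    return projected, explained_ratio

rng = np.random.default_rng(0)
# Hypothetical 3-feature data in which the second feature closely tracks the first
x = rng.normal(size=200)
X_demo = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])
projected, ratio = pca_sketch(X_demo, n_components=2)
print(projected.shape, ratio)
```

Because two of the three synthetic features are almost perfectly correlated, two components should capture nearly all of the variance here.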

## Why PCA?
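- Simplifies data without losing much information.
- Reduces redundancy among correlated features and can filter out noise carried by low-variance directions.
- Useful for visualization of high-dimensional datasets.
- Can speed up downstream machine learning algorithms, since they train on fewer features.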

## Import Libraries and Dataset

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load Iris dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df.head()

## Applying PCA (2 Components)

In [None]:
# Apply PCA with 2 principal components
# (PCA centers the data but does not rescale it; all four Iris features are in cm)
pca = PCA(n_components=2)
pca_result = pca.fit_transform(df)

df_pca = pd.DataFrame(pca_result, columns=['PC1', 'PC2'])
df_pca['target'] = iris.target
df_pca.head()

## Explained Variance Ratio
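The explained variance ratio is the fraction of the dataset's total variance captured by each principal component. The ratios of the retained components add up to the share of variance preserved by the projection.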

In [None]:
print("Explained Variance Ratio:", pca.explained_variance_ratio_)
print("Total Variance Explained:", pca.explained_variance_ratio_.sum())
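
The same ratios can help decide how many components to keep. The sketch below fits PCA with all components and finds the smallest number whose cumulative ratio reaches a threshold. The 0.95 threshold and the names `pca_full`, `cumulative`, and `n_needed` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Fit PCA with all components kept so every ratio is available
pca_full = PCA().fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches the threshold
threshold = 0.95  # hypothetical target, chosen for illustration
n_needed = int(np.searchsorted(cumulative, threshold) + 1)

print("Cumulative explained variance:", cumulative)
print("Components needed to reach", threshold, "->", n_needed)
```

scikit-learn's `PCA` also accepts a float between 0 and 1 for `n_components`, which selects the number of components by the same kind of variance threshold.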

## Visualizing PCA Result

In [None]:
plt.figure(figsize=(8,6))
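scatter = plt.scatter(df_pca['PC1'], df_pca['PC2'], c=df_pca['target'], cmap='viridis', alpha=0.7)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA on Iris Dataset')
# The targets are discrete classes, so label them with a legend rather than a continuous colorbar
handles, _ = scatter.legend_elements()
plt.legend(handles, iris.target_names, title='Target Classes')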
plt.show()

## Key Notes
- PCA reduced the **4D Iris dataset → 2D** while preserving most variance.
- `explained_variance_ratio_` tells us how much variance each PC captures.
- PCA is useful as a preprocessing step before clustering, classification, or visualization.
- PCA is a **linear technique**, so it may miss nonlinear structure in the data (t-SNE or UMAP can help there).
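
As an illustration of using PCA before classification, the sketch below places it in a scikit-learn pipeline ahead of a classifier. The choice of `LogisticRegression`, `max_iter=1000`, and 5-fold cross-validation are assumptions for the example, not recommendations from the notebook above.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# PCA is fitted inside the pipeline, so each CV fold learns its projection
# from that fold's training data only
model = make_pipeline(PCA(n_components=2), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=5)

print("Mean cross-validated accuracy:", scores.mean())
```

Keeping PCA inside the pipeline, rather than transforming the whole dataset first, avoids leaking information from the held-out fold into the projection.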