# Principal Component Analysis (PCA): Advanced Tutorial

**PCA** is a linear dimensionality reduction technique that projects high-dimensional data onto a lower-dimensional subspace.
It finds orthogonal directions (principal components), ordered so that each one captures the largest remaining variance in the data.
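
To make the "directions of maximum variance" idea concrete, here is a minimal sketch that computes principal components by hand from the eigenvectors of the covariance matrix and compares the result with scikit-learn. The toy data and the names `X_toy`, `manual_ratio`, and `manual_scores` are assumptions made for illustration; they are not used elsewhere in this tutorial.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 samples, 5 correlated features
X_toy = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# Center the data, then eigen-decompose the sample covariance matrix
X_centered = X_toy - X_toy.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order

# Sort directions by descending variance
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Share of total variance captured by each direction
manual_ratio = eigvals / eigvals.sum()

# Project onto the first two directions
manual_scores = X_centered @ eigvecs[:, :2]

# Compare with scikit-learn (component signs are arbitrary, so compare magnitudes)
sk_pca = PCA(n_components=2).fit(X_toy)
sk_scores = sk_pca.transform(X_toy)
print("Ratios match:", np.allclose(manual_ratio[:2], sk_pca.explained_variance_ratio_))
print("Scores match up to sign:", np.allclose(np.abs(manual_scores), np.abs(sk_scores), atol=1e-6))
```

The sign of each component is not determined by the method, which is why the comparison uses absolute values.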

## 1. Import Required Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.decomposition import PCA
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# set_theme is the current name; sns.set is a legacy alias for it
sns.set_theme(style='whitegrid')


## 2. Load and Preprocess the Data

In [None]:
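digits = load_digits()
X = digits.data      # 8x8 images flattened to 64 pixel features
y = digits.target    # digit labels 0-9

# Full standardized dataset, used only for the unsupervised plots in Sections 3 and 4
X_scaled = StandardScaler().fit_transform(X)

# Split the raw data first, then fit the scaler on the training set only,
# so no test-set statistics leak into the classification experiment
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_train_raw)
X_train = scaler.transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

print("Original shape:", X.shape)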


## 3. Apply PCA

In [None]:
# Keep only the first two principal components so the data can be plotted
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

plt.figure(figsize=(10, 6))
scatter = plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='tab10', s=30)
plt.legend(*scatter.legend_elements(), title="Digits")
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title("Digits Dataset Projected onto 2D via PCA")
plt.show()
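
A 2D projection usually discards a large share of the variance in a 64-feature dataset, so overlapping clusters in the plot do not necessarily mean the classes overlap in the full space. The sketch below checks how much variance two components retain; `X_demo`, `pca_2d`, and `ratio_2d` are illustrative names, not part of the cells above.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Rebuild the standardized digits data so this cell runs on its own
X_demo = StandardScaler().fit_transform(load_digits().data)

pca_2d = PCA(n_components=2).fit(X_demo)
ratio_2d = pca_2d.explained_variance_ratio_

print("Variance explained per component:", ratio_2d)
print("Total variance retained in 2D:", ratio_2d.sum())
```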


## 4. Explained Variance

In [None]:
# Fit PCA with all components and accumulate the per-component variance ratios
pca_full = PCA().fit(X_scaled)
cumulative_variance = np.cumsum(pca_full.explained_variance_ratio_)

plt.plot(np.arange(1, len(cumulative_variance) + 1), cumulative_variance, marker='o')
plt.xlabel("Number of Components")
plt.ylabel("Cumulative Explained Variance")
plt.title("PCA Explained Variance")
plt.grid(True)
plt.show()
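
The curve can also be read off programmatically. The sketch below finds the smallest number of components whose cumulative explained variance reaches a target, and compares it with scikit-learn's shortcut of passing a float to `n_components`. The 0.95 target and the names `threshold`, `n_needed`, and `pca_auto` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Rebuild the standardized digits data so this cell runs on its own
X_demo = StandardScaler().fit_transform(load_digits().data)

threshold = 0.95  # hypothetical target; pick a value that suits your task

# Manual approach: first index where the cumulative ratio reaches the target
cumulative = np.cumsum(PCA().fit(X_demo).explained_variance_ratio_)
n_needed = int(np.searchsorted(cumulative, threshold) + 1)

# Shortcut: a float in (0, 1) asks PCA to keep enough components for that share of variance
pca_auto = PCA(n_components=threshold).fit(X_demo)

print("Components needed (manual):", n_needed)
print("Components kept by PCA(n_components=0.95):", pca_auto.n_components_)
```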


## 5. PCA + Classification

In [None]:
# Fit PCA on the training set only, then apply the same projection to the test set
pca = PCA(n_components=30)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_pca, y_train)
y_pred = clf.predict(X_test_pca)

print("Classification Accuracy (with PCA):", accuracy_score(y_test, y_pred))
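
An accuracy figure for the PCA model is easier to interpret next to a baseline without PCA. The sketch below wraps scaling, PCA, and the classifier in a scikit-learn pipeline, so each step is fit on the training data only, and trains a second pipeline on all 64 features for comparison. The names `with_pca`, `baseline`, `acc_pca`, and `acc_base` are illustrative, and this is one possible setup rather than the only way to run the comparison.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Split the raw (unscaled) data; the pipelines handle scaling internally
X_raw, y_raw = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_raw, y_raw, test_size=0.3, random_state=42)

# Same classifier, with and without the PCA step
with_pca = make_pipeline(StandardScaler(), PCA(n_components=30), LogisticRegression(max_iter=1000))
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# fit() learns scaler, PCA, and classifier parameters from the training split only
acc_pca = with_pca.fit(X_tr, y_tr).score(X_te, y_te)
acc_base = baseline.fit(X_tr, y_tr).score(X_te, y_te)

print("Accuracy with PCA (30 components):", acc_pca)
print("Accuracy without PCA (64 features):", acc_base)
```

If the two scores are close, PCA has roughly halved the feature count at little cost; a large gap would suggest that 30 components discard information the classifier needs.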


## 6. Summary

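- PCA reduces dimensionality while retaining as much of the data's variance as possible
- Useful for visualization and for speeding up downstream models by shrinking the feature count
- PCA components are mutually orthogonal and ordered by decreasing explained variance
- Use the cumulative sum of `explained_variance_ratio_` to guide the choice of `n_components`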
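
The orthogonality and ordering properties listed above can be checked numerically. This is a small sketch under the assumption of 10 components and the full SVD solver; `pca_check`, `W`, `is_orthonormal`, and `is_ranked` are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Rebuild the standardized digits data so this cell runs on its own
X_demo = StandardScaler().fit_transform(load_digits().data)
pca_check = PCA(n_components=10, svd_solver="full").fit(X_demo)

W = pca_check.components_  # shape (10, 64): one principal direction per row

# Orthonormal rows give an identity Gram matrix
is_orthonormal = np.allclose(W @ W.T, np.eye(W.shape[0]))

# Explained variance should never increase from one component to the next
is_ranked = bool(np.all(np.diff(pca_check.explained_variance_) <= 0))

print("Components orthonormal:", is_orthonormal)
print("Components ranked by variance:", is_ranked)
```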