# Principal Component Analysis (PCA) Examples


Principal Component Analysis (PCA) is a dimensionality-reduction method that transforms a large set of variables into a smaller one that still contains most of the information in the original set.

Reducing the number of variables in a data set naturally comes at the expense of accuracy, but the trick in dimensionality reduction is to trade a little accuracy for simplicity: smaller data sets are easier to explore and visualize, and machine learning algorithms can analyze them faster when there are no extraneous variables to process.

In this notebook, we will walk through a couple of examples demonstrating PCA using the Python library `scikit-learn`.
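
Before turning to `scikit-learn`, the idea can be sketched in plain NumPy. The snippet below is a minimal illustration, not a reference implementation: it assumes a hypothetical toy data set `X_demo` with three features, one of which is nearly a copy of another, and computes PCA through an eigendecomposition of the covariance matrix. All variable names here are made up for the illustration.

```python
import numpy as np

# Hypothetical toy data: 200 samples, 3 features.
# The third feature is almost a copy of the first, so it adds little information.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X_demo = np.column_stack([a, b, a + 0.1 * rng.normal(size=200)])

# Step 1: center each feature at zero
X_centered = X_demo - X_demo.mean(axis=0)

# Step 2: eigendecomposition of the (symmetric) covariance matrix
cov_matrix = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# eigh returns eigenvalues in ascending order; sort them in descending order
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Step 3: project the data onto the two directions of largest variance
X_reduced = X_centered @ eigenvectors[:, :2]

# Fraction of the total variance kept after dropping one dimension
retained = eigenvalues[:2].sum() / eigenvalues.sum()
print(f"Shape after reduction: {X_reduced.shape}")
print(f"Variance retained by 2 of 3 components: {retained:.3f}")
```

Because the third feature is almost redundant, dropping one dimension should cost very little variance here, which is the "little accuracy for simplicity" trade in practice. The projections should agree with those from `sklearn.decomposition.PCA` up to the sign of each component.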
    

## Example 1: Basic PCA on Synthetic Data

In [None]:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Generating synthetic data
np.random.seed(0)
mean = [0, 0]
cov = [[1, 0.8], [0.8, 1]]  # unit variances, 0.8 correlation between the two features
X = np.random.multivariate_normal(mean, cov, 100)

# Standardizing the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Applying PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Plotting the original and PCA-transformed data
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.scatter(X[:, 0], X[:, 1], alpha=0.7)
plt.title('Original Data')
plt.xlabel('X1')
plt.ylabel('X2')

plt.subplot(1, 2, 2)
plt.scatter(X_pca[:, 0], X_pca[:, 1], alpha=0.7)
plt.title('PCA-transformed Data')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')

plt.tight_layout()
plt.show()
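
The scatter plots show the rotation, but not how much of the variance each component carries. One way to check this is the fitted model's `explained_variance_ratio_` and `components_` attributes. The sketch below regenerates the same synthetic data so that it runs on its own; `X_1d` is a name introduced here for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same synthetic data as in the cell above
np.random.seed(0)
X = np.random.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], 100)
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA and inspect what each component captures
pca = PCA(n_components=2).fit(X_scaled)
ratios = pca.explained_variance_ratio_
print("Explained variance ratio:", ratios)
print("Principal axes (one per row):")
print(pca.components_)

# Keeping only the first component gives a 1-D summary of the data
X_1d = PCA(n_components=1).fit_transform(X_scaled)
print("Shape after keeping one component:", X_1d.shape)
```

For two standardized features, the first ratio equals (1 + |r|) / 2, where r is the sample correlation, so with a population correlation of 0.8 it should land near 0.9. The principal axes should point along the diagonals, with entries close to ±0.707.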
    

## Example 2: PCA on the Iris Dataset

In [None]:

import seaborn as sns
from sklearn.datasets import load_iris

# Loading the Iris dataset
iris = load_iris()
X_iris = iris.data
y_iris = iris.target

# Standardizing the data
X_iris_scaled = StandardScaler().fit_transform(X_iris)

# Applying PCA
pca_iris = PCA(n_components=2)
X_iris_pca = pca_iris.fit_transform(X_iris_scaled)

# Plotting the PCA-transformed Iris dataset
plt.figure(figsize=(8, 6))
# Recent seaborn releases require x and y to be passed as keyword arguments
sns.scatterplot(x=X_iris_pca[:, 0], y=X_iris_pca[:, 1], hue=iris.target_names[y_iris], palette='Set1', s=60, alpha=0.8)
plt.title('PCA of Iris Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.show()
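
Two components were chosen above for plotting convenience. As a rough check on how much information that choice keeps, the sketch below fits PCA with all four components and looks at the cumulative explained variance. It also shows the option of passing a float to `n_components` so that `scikit-learn` picks the number of components needed to reach a variance threshold. The 0.95 threshold is an arbitrary value chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the four Iris features, as in the cell above
X_iris_scaled = StandardScaler().fit_transform(load_iris().data)

# Fit with all components to see the full variance breakdown
pca_full = PCA().fit(X_iris_scaled)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
for i, (ratio, total) in enumerate(zip(pca_full.explained_variance_ratio_, cumulative), start=1):
    print(f"PC{i}: {ratio:.3f} of the variance (cumulative {total:.3f})")

# Let scikit-learn choose the number of components that reaches 95% of the variance
# (a float n_components is used here with the 'full' SVD solver)
pca_95 = PCA(n_components=0.95, svd_solver='full').fit(X_iris_scaled)
print("Components needed for 95% of the variance:", pca_95.n_components_)
```

On the standardized Iris data, the first two components should account for roughly 73% and 23% of the variance, which suggests the 2-D plot above loses little of the overall structure.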
    