# PCA

Principal Component Analysis (PCA) is a dimensionality reduction technique that captures as much of the variance in the data as possible using fewer variables. It transforms the original variables into a new set of variables, the principal components, which are mutually orthogonal (and therefore uncorrelated) and ordered by the amount of variance they explain.

### Mathematical Background

Given a data matrix $X$ where each row represents a sample and each column represents a feature, the steps for PCA are:

1. **Standardize the Data**: Center each feature to a mean of 0, which the covariance formula in step 2 assumes. If the input features have different scales, also divide each feature by its standard deviation so that every feature has a standard deviation of 1.

2. **Compute the Covariance Matrix**: The covariance matrix, $S$, is computed as:

   $$
   S = \frac{1}{n-1} X^T X
   $$

   where $n$ is the number of samples and $X$ is the standardized (in particular, mean-centered) data matrix from step 1.

3. **Eigen Decomposition**: Calculate the eigenvectors and eigenvalues of the covariance matrix. The eigenvectors represent the directions of maximum variance (the principal components), and the eigenvalues represent the magnitude of the variance in each direction.

4. **Sort Eigenvalues and Choose Principal Components**: Rank the eigenvectors based on the magnitude of their corresponding eigenvalues. The eigenvector with the highest eigenvalue is the first principal component, and so on.

5. **Form the Projection Matrix**: Take the top $k$ eigenvectors (where $k$ is the number of dimensions you want to keep) and form a matrix $W$ with these eigenvectors as its columns, in order of decreasing eigenvalue.

6. **Transform the Original Data**: Multiply the standardized data matrix, $X$, by the projection matrix, $W$:

   $$
   Y = XW
   $$

   This results in the data represented in terms of the principal components.
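
The six steps above can be sketched directly in NumPy. This is a minimal illustration rather than a reference implementation: the function name `pca_eig`, the synthetic data `X_demo`, and the choice of `np.linalg.eigh` (suited to symmetric matrices) are assumptions made for the example, and it assumes no feature has zero variance.

```python
import numpy as np

def pca_eig(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components.

    Hypothetical helper for illustration; assumes every feature has non-zero variance.
    """
    # Step 1: standardize each feature (mean 0, sample standard deviation 1)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    n = X_std.shape[0]

    # Step 2: covariance matrix of the standardized data
    S = X_std.T @ X_std / (n - 1)

    # Step 3: eigen decomposition (eigh returns eigenvalues in ascending order)
    eigvals, eigvecs = np.linalg.eigh(S)

    # Step 4: sort eigenvalues and eigenvectors in descending order of eigenvalue
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Step 5: projection matrix with the top-k eigenvectors as columns
    W = eigvecs[:, :k]

    # Step 6: project the standardized data onto the principal components
    Y = X_std @ W
    return Y, W, eigvals

# Synthetic correlated data, generated only to exercise the function
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X_demo = rng.normal(size=(200, 3)) @ mixing

Y, W, eigvals = pca_eig(X_demo, k=2)
print(Y.shape)   # (200, 2)
print(eigvals)   # variances along each principal direction, largest first
```

Because the data is standardized, the eigenvalues sum to the number of features, which gives a quick sanity check on the result.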

### Interpretation

PCA is essentially about finding the axes in the data that maximize variance. When data is projected onto these axes, it's transformed into a new coordinate system where the first axis corresponds to the first principal component that explains the most variance, the second axis corresponds to the second principal component that explains the second most, and so on.
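
A short derivation may help show why the new axes are uncorrelated. It assumes, as in the steps above, that $X$ is mean-centered and that the columns of $W$ are orthonormal eigenvectors of $S$ with eigenvalues $\lambda_1 \ge \dots \ge \lambda_k$ (the $\lambda_i$ notation is introduced here for illustration):

$$
\operatorname{Cov}(Y) = \frac{1}{n-1} Y^T Y = \frac{1}{n-1} W^T X^T X W = W^T S W = \operatorname{diag}(\lambda_1, \dots, \lambda_k)
$$

The off-diagonal entries are zero, so the projected coordinates are uncorrelated, and the variance along the $i$-th axis is $\lambda_i$.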

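It's worth noting that while PCA reduces dimensionality, some information (or variance) is lost in the process. The variance captured by each principal component equals its eigenvalue, so the fraction of the total variance retained by the top $k$ components is the sum of their eigenvalues divided by the sum of all eigenvalues.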
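
As a rough illustration of how eigenvalues translate into retained and lost variance, the sketch below uses a made-up set of eigenvalues and an assumed target of 85% retained variance. Both the numbers and the variable names are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical eigenvalues, already sorted in descending order (illustrative only)
eigvals = np.array([4.0, 2.0, 1.0, 1.0])

# Fraction of the total variance explained by each component
ratio = eigvals / eigvals.sum()

# Fraction retained when keeping the first 1, 2, 3, ... components
cumulative = np.cumsum(ratio)

# Smallest k whose cumulative ratio reaches the assumed target
threshold = 0.85
k = int(np.searchsorted(cumulative, threshold) + 1)

print(ratio)               # [0.5   0.25  0.125 0.125]
print(cumulative)          # [0.5   0.75  0.875 1.   ]
print(k)                   # 3
print(1 - cumulative[k - 1])  # 0.125 of the variance is discarded
```

With these numbers, keeping three of the four components retains 87.5% of the variance and discards the remaining 12.5%.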

PCA is widely used in various domains, including image processing, genomics, finance, and many others, due to its efficiency and simplicity.


## Libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

## Example

In [None]:
# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA
pca = PCA(n_components=2)  # reducing to 2 principal components
X_pca = pca.fit_transform(X_scaled)

# Plot the data in the new PCA space
plt.figure(figsize=(10, 7))
plt.scatter(X_pca[y == 0, 0], X_pca[y == 0, 1], color='red', label=data.target_names[0])
plt.scatter(X_pca[y == 1, 0], X_pca[y == 1, 1], color='blue', label=data.target_names[1])
plt.scatter(X_pca[y == 2, 0], X_pca[y == 2, 1], color='green', label=data.target_names[2])
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.legend()
plt.title('PCA of Iris Dataset')
plt.show()

# Display variance explained by each component
print("Explained variance ratio:", pca.explained_variance_ratio_)

In this example:

1. We first standardize the Iris dataset to have zero mean and unit variance for each feature.
2. Then, we apply PCA to reduce the dataset's dimensionality from 4 features to 2 principal components.
3. After applying PCA, we plot the data in the new PCA space.
4. Finally, we display the variance explained by each of the principal components.

You should see the Iris dataset plotted in terms of the first two principal components, with the data points color-coded based on their true class labels. The explained variance ratio will give you an idea of how much of the dataset's original variance is captured by each principal component.
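
To connect the explained variance ratio to the information that is lost, one option is to map the 2-D projection back to the original 4-D space with scikit-learn's `inverse_transform` and measure the reconstruction error. The sketch below repeats the setup so that it runs on its own; the names `X_restored` and `lost_fraction` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same preprocessing as in the example: standardize, then keep 2 components
X_scaled = StandardScaler().fit_transform(load_iris().data)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Map the 2-D scores back into the 4-D standardized feature space
X_restored = pca.inverse_transform(X_pca)

# Share of the total squared variation that the reconstruction fails to recover
residual = np.sum((X_scaled - X_restored) ** 2)
total = np.sum(X_scaled ** 2)  # X_scaled is already mean-centered
lost_fraction = residual / total

# The two printed values should agree up to floating-point error
print(lost_fraction)
print(1 - pca.explained_variance_ratio_.sum())
```

The fraction of variation missing from the reconstruction matches the variance not covered by the two retained components, which is one way to read the explained variance ratio.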