# Principal Component Analysis (PCA) using Singular Value Decomposition (SVD)

In this notebook, we walk through Principal Component Analysis (PCA) using Singular Value Decomposition (SVD). PCA is a statistical technique that finds the directions along which the data vary the most and re-expresses the data along those directions, which highlights the similarities and differences between samples. SVD is a matrix factorization that breaks a matrix into a product of three simpler matrices; applied to centered data, it yields the principal components directly, without forming the covariance matrix.
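
As a compact reference for the relationship described above, the following sketch states how SVD connects to PCA. The symbols are assumptions introduced here for illustration: $X$ is the $n \times p$ data matrix with each column centered (rows are samples), $\sigma_i$ are its singular values, and $\lambda_i$ are the eigenvalues of the sample covariance matrix.

$$
X = U \Sigma V^{\top}
$$

$$
\frac{1}{n-1} X^{\top} X = V \, \frac{\Sigma^{2}}{n-1} \, V^{\top}
\quad\Longrightarrow\quad
\lambda_i = \frac{\sigma_i^{2}}{n-1}
$$

$$
\text{scores} = X V = U \Sigma
$$

Under these assumptions, the columns of $V$ are the principal directions (the eigenvectors of the covariance matrix), and projecting the data onto them gives the coordinates of each sample along the principal components.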

The goal is to build intuition for how PCA works when it is computed through SVD. This notebook is based on the StatQuest video titled '29_StatQuest: Principal Component Analysis (PCA), Step-by-Step' by Josh Starmer.

## Importing the necessary libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import pandas as pd

## Creating a Simple Dataset

In [None]:
data = {'Gene 1': [10, 11, 8, 3, 2, 1],
        'Gene 2': [6, 4, 5, 3, 2.8, 1]}

df = pd.DataFrame(data, columns=['Gene 1', 'Gene 2'], index=['Mouse 1', 'Mouse 2', 'Mouse 3', 'Mouse 4', 'Mouse 5', 'Mouse 6'])
df

## Visualizing the Dataset

In [None]:
plt.scatter(df['Gene 1'], df['Gene 2'])
plt.xlabel('Gene 1')
plt.ylabel('Gene 2')
plt.title('Scatter plot of Gene 1 vs Gene 2')
plt.show()

## PCA using SVD

In [None]:
# Standardize the data so that each feature has zero mean and unit variance.
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)

# Apply PCA (scikit-learn's PCA computes the decomposition via SVD internally).
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(df_scaled)

principalDf = pd.DataFrame(principalComponents,
                           columns=['principal component 1', 'principal component 2'],
                           index=df.index)
principalDf
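
The cell above delegates the decomposition to scikit-learn. To make the SVD step visible, here is a minimal NumPy-only sketch of the same computation on the same toy data. The variable names (`X`, `Z`, `U`, `s`, `Vt`, `scores`) are illustrative choices, not part of the original notebook, and the signs of individual components may differ from scikit-learn's output, since the sign of a principal direction is arbitrary.

```python
import numpy as np

# Same toy data as above: rows are mice, columns are Gene 1 and Gene 2.
X = np.array([[10, 6], [11, 4], [8, 5], [3, 3], [2, 2.8], [1, 1]], dtype=float)

# Standardize each column (population std, ddof=0, as StandardScaler does).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Thin SVD: Z = U @ diag(s) @ Vt. The rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

# Projecting the data onto the principal directions gives the PCA scores.
# This is the same as scaling the columns of U by the singular values.
scores = Z @ Vt.T

print(scores)
```

Up to a possible sign flip per column, `scores` should match the values in `principalDf`.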

## Visualizing the PCA plot

In [None]:
plt.scatter(principalDf['principal component 1'], principalDf['principal component 2'])
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('2 Component PCA')
plt.show()

## Eigenvalues and Explained Variance

In [None]:
print('Eigenvalues or explained variance: ', pca.explained_variance_)
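
The eigenvalues printed above can also be recovered from the singular values. The sketch below, using illustrative names that are not part of the original notebook, checks that squaring the singular values and dividing by `n_samples - 1` gives the same numbers as the eigenvalues of the sample covariance matrix of the standardized data.

```python
import numpy as np

# Same toy data as above: rows are mice, columns are Gene 1 and Gene 2.
X = np.array([[10, 6], [11, 4], [8, 5], [3, 3], [2, 2.8], [1, 1]], dtype=float)
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized, ddof=0
n_samples = Z.shape[0]

# Route 1: eigenvalues from the singular values of the standardized data.
_, s, _ = np.linalg.svd(Z, full_matrices=False)
eigvals_from_svd = s**2 / (n_samples - 1)

# Route 2: eigenvalues of the sample covariance matrix (np.cov uses ddof=1).
cov = np.cov(Z, rowvar=False)
eigvals_from_cov = np.linalg.eigvalsh(cov)[::-1]  # sort largest first

print(eigvals_from_svd)
print(eigvals_from_cov)
```

Because the data were standardized with the population standard deviation while the covariance uses `n_samples - 1`, the eigenvalues sum to `2 * 6 / 5 = 2.4` rather than exactly 2.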

## Eigenvectors or Principal Components

In [None]:
print('Eigenvectors or principal components: ', pca.components_)
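
Each row of the matrix printed above is a unit-length direction, and the rows are perpendicular to one another. The following sketch checks both properties on the directions obtained from NumPy's SVD; the names are illustrative and not part of the original notebook. It also shows a property specific to two standardized, correlated features: both directions lie at 45 degrees to the gene axes, so every entry has magnitude `1 / sqrt(2)`.

```python
import numpy as np

# Same toy data as above: rows are mice, columns are Gene 1 and Gene 2.
X = np.array([[10, 6], [11, 4], [8, 5], [3, 3], [2, 2.8], [1, 1]], dtype=float)
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized, ddof=0

# The rows of Vt are the principal directions (eigenvectors).
_, _, Vt = np.linalg.svd(Z, full_matrices=False)

# Orthonormality: Vt @ Vt.T should be the identity matrix.
identity_check = Vt @ Vt.T

# The entries of each direction act as weights (loadings) on Gene 1 and Gene 2.
loadings_magnitude = np.abs(Vt)

print(identity_check)
print(loadings_magnitude)
```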

## Scree plot of the Principal Components

In [None]:
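# A scree plot shows the share of variance explained by each component.
explained_variance = pca.explained_variance_ratio_ * 100
labels = ['PC' + str(i) for i in range(1, len(explained_variance) + 1)]
plt.bar(labels, explained_variance)
plt.title('Scree Plot')
plt.xlabel('Principal Component')
plt.ylabel('Percentage of Explained Variance')
plt.show()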

## Conclusion

In this notebook, we walked through the steps of Principal Component Analysis (PCA) using Singular Value Decomposition (SVD). We created a small dataset and plotted it, standardized it, applied PCA, and plotted the samples along the principal components. We then inspected the eigenvalues and eigenvectors and drew a scree plot of the principal components.