# **Principal Component Analysis**

Principal component analysis (PCA) is a dimensionality reduction method. It transforms a large set of correlated variables into a smaller set of uncorrelated variables, the principal components, while retaining most of the variance in the original data.
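The steps carried out in the sections that follow can be summarized in symbols. The notation here ($Z$, $C$, $v_j$, $\lambda_j$, $W_k$, $T$) is introduced only for illustration and does not appear in the notebook code; $n$ is the number of rows and $p$ the number of columns.

$$
\begin{aligned}
Z &= \text{standardized data matrix } (n \times p) \\
C &= \frac{1}{n-1} Z^\top Z \\
C v_j &= \lambda_j v_j, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0 \\
T &= Z W_k, \qquad W_k = [\, v_1 \; \cdots \; v_k \,]
\end{aligned}
$$

Each eigenvalue $\lambda_j$ is the variance along the direction $v_j$, so keeping the $k$ eigenvectors with the largest eigenvalues keeps the directions that carry the most variance.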

## Importing Libraries

In [21]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import numpy as np


## Importing a Dataset

In [22]:
df = pd.read_csv("/workspaces/Supervised-Machine-Learning/Datasets/Iris.csv")

In [23]:
df.head()

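|   | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|----|---------------|--------------|---------------|--------------|---------|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |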


In [24]:
print(df.info())
print(df.describe())
print(df.shape)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             150 non-null    int64  
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object 
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
None
               Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count  150.000000     150.000000    150.000000     150.000000    150.000000
mean    75.500000       5.843333      3.054000       3.758667      1.198667
std     43.445368       0.828066      0.433594       1.764420      0.763161
min      1.000000       4.300000      2.000000       1.000000      0.100000
25%     38.250000       5.100000      2.800000       1.600000      0.300000
50%     75.500000    

## Defining Target and Independent Variables

In [25]:
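# Select every numeric column as a feature.
# Note: this also picks up Id, which is a row identifier rather than a measurement,
# so the matrices computed later are 5 x 5 rather than 4 x 4.
X = df.select_dtypes(include=['float64', 'int64'])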
X

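|   | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm |
|---|----|---------------|--------------|---------------|--------------|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 |
| ... | ... | ... | ... | ... | ... |
| 145 | 146 | 6.7 | 3.0 | 5.2 | 2.3 |
| 146 | 147 | 6.3 | 2.5 | 5.0 | 1.9 |
| 147 | 148 | 6.5 | 3.0 | 5.2 | 2.0 |
| 148 | 149 | 6.2 | 3.4 | 5.4 | 2.3 |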
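The selection above keeps `Id` alongside the four measurements. If the identifier is not meant to act as a feature, one possible way to exclude it is sketched below. This is an illustration, not the notebook's method: `df_head` and `X_features` are hypothetical names, and the frame is retyped from the five rows shown by `df.head()` so that the snippet runs without the CSV file.

```python
import pandas as pd

# The five rows displayed by df.head(), retyped so the sketch is self-contained.
df_head = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6],
    "PetalLengthCm": [1.4, 1.4, 1.3, 1.5, 1.4],
    "PetalWidthCm": [0.2, 0.2, 0.2, 0.2, 0.2],
    "Species": ["Iris-setosa"] * 5,
})

# Keep the numeric columns, then drop the identifier so only measurements remain.
X_features = df_head.select_dtypes(include="number").drop(columns=["Id"])
print(X_features.columns.tolist())
```

With this variant the covariance matrix would be 4 x 4, so the eigenvalues and eigenvectors would differ from the ones printed later in this notebook.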


In [26]:
Y = df.select_dtypes(include=['object'])
Y

|   | Species |
|---|---------|
| 0 | Iris-setosa |
| 1 | Iris-setosa |
| 2 | Iris-setosa |
| 3 | Iris-setosa |
| 4 | Iris-setosa |
| ... | ... |
| 145 | Iris-virginica |
| 146 | Iris-virginica |
| 147 | Iris-virginica |
| 148 | Iris-virginica |


## Applying StandardScaler and Finding the Covariance Matrix

In [27]:
# Standardize each column to zero mean and unit variance
sc = StandardScaler()
X_scaled = sc.fit_transform(X)
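To make the scaling step concrete, here is a minimal NumPy sketch of the same transformation: subtract each column's mean and divide by its population standard deviation (`ddof=0`), which is the convention `StandardScaler` follows. The array `X_toy` is hypothetical toy data, not taken from the Iris file.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 2 features on very different scales.
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0],
                  [4.0, 40.0]])

# Column-wise mean and population standard deviation (ddof=0).
mean = X_toy.mean(axis=0)
std = X_toy.std(axis=0, ddof=0)

# Standardize: every column ends up with mean 0 and standard deviation 1.
X_toy_scaled = (X_toy - mean) / std

print(X_toy_scaled.mean(axis=0))
print(X_toy_scaled.std(axis=0))
```

After scaling, the two columns are identical even though the raw values differ by a factor of ten, which is why standardization stops large-valued columns from dominating the covariance matrix.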

In [31]:
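# Covariance matrix of the standardized features (np.cov expects variables in rows, hence .T)
cov_mat = np.cov(X_scaled.T)
# Eigendecomposition: column i of eigen_vectors is the eigenvector for eigen_values[i]
eigen_values, eigen_vectors = np.linalg.eig(cov_mat)
print("Eigenvalues:\n {} \n".format(eigen_values))
print("Eigenvectors:\n {} \n".format(eigen_vectors))
print("Covariance Matrix:\n {} \n".format(cov_mat))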

Eigenvalues:
 [3.7603354  0.92794917 0.23570257 0.08883057 0.02073933] 

Eigenvectors:
 [[-0.48136016 -0.02275157 -0.67406853  0.55978662 -0.0067323 ]
 [-0.44844975  0.38285827  0.64520569  0.40999945  0.26061932]
 [ 0.23195044  0.92007839 -0.27427786 -0.09491665 -0.12416613]
 [-0.51079205  0.03074857  0.13238322 -0.28817343 -0.79848404]
 [-0.5024696   0.07356757 -0.19127876 -0.65305918  0.52824072]] 

Covariance Matrix:
 [[ 1.00671141  0.72148618 -0.40039813  0.8886718   0.90579723]
 [ 0.72148618  1.00671141 -0.11010327  0.87760486  0.82344326]
 [-0.40039813 -0.11010327  1.00671141 -0.42333835 -0.358937  ]
 [ 0.8886718   0.87760486 -0.42333835  1.00671141  0.96921855]
 [ 0.90579723  0.82344326 -0.358937    0.96921855  1.00671141]] 
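The diagonal of the covariance matrix is 1.00671141 rather than exactly 1. This follows from two different conventions: `StandardScaler` divides by the population standard deviation (denominator n), while `np.cov` divides by n - 1 by default, so each diagonal entry becomes n / (n - 1) = 150 / 149. The sketch below reproduces the effect on hypothetical random data (`X_demo` is not the Iris data). It also shows `np.linalg.eigh` as an alternative to `np.linalg.eig` for symmetric matrices; that is a swapped-in suggestion, not what the notebook uses.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical random data standing in for the feature matrix: 150 rows, 5 columns.
X_demo = rng.normal(size=(150, 5))

# Standardize with the population standard deviation (ddof=0), as StandardScaler does.
Z = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

# np.cov divides by n - 1 by default, so each diagonal entry equals n / (n - 1).
n = Z.shape[0]
cov_demo = np.cov(Z.T)
print(np.diag(cov_demo))
print(n / (n - 1))

# eigh is intended for symmetric matrices: it returns real eigenvalues in ascending order.
vals, vecs = np.linalg.eigh(cov_demo)
order = np.argsort(vals)[::-1]            # indices that put the largest eigenvalue first
vals, vecs = vals[order], vecs[:, order]
```

Because a covariance matrix is always symmetric, `eigh` avoids any chance of tiny complex parts appearing in the result, at the cost of having to reverse the order when the largest eigenvalue should come first.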



## Sorting Eigenvalues in Descending Order

In [33]:
# Pair each eigenvalue with its eigenvector (column i of eigen_vectors)
eigen_pairs = [(np.abs(eigen_values[i]), eigen_vectors[:,i]) for i in range(len(eigen_values))]
print("Eigen Values in descending order:")
# Sort the pairs by eigenvalue, largest first
eigen_pairs.sort(key=lambda x: x[0], reverse=True)
for eigen_value, eigen_vector in eigen_pairs:
    print(eigen_value)

Eigen Values in descending order:
3.760335402824124
0.9279491722113417
0.2357025715021216
0.08883057252968925
0.02073932791258833
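The sorted eigenvalues are usually read as shares of the total variance. The sketch below computes those shares from the five values printed above; the names `eigen_values_sorted`, `explained_ratio`, and `cumulative_ratio` are introduced here for illustration only.

```python
import numpy as np

# Eigenvalues copied from the sorted output above.
eigen_values_sorted = np.array([
    3.760335402824124,
    0.9279491722113417,
    0.2357025715021216,
    0.08883057252968925,
    0.02073932791258833,
])

# Share of the total variance carried by each principal component.
explained_ratio = eigen_values_sorted / eigen_values_sorted.sum()
# Running total, useful for deciding how many components to keep.
cumulative_ratio = np.cumsum(explained_ratio)

for i, (r, c) in enumerate(zip(explained_ratio, cumulative_ratio), start=1):
    print(f"PC{i}: {r:.4f} (cumulative {c:.4f})")
```

For these eigenvalues, the first component carries about 75% of the variance and the second about 18%, so the first two together account for roughly 93%.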
