# Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a statistical technique for dimensionality reduction. It re-expresses the data in terms of new, uncorrelated variables called principal components, each of which is a linear combination of the original features. The components are ordered by the amount of variance they capture, so keeping only the first few simplifies the data while losing as little information as possible. This makes PCA useful for visualization and for speeding up downstream data analysis and machine learning.
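To make the idea concrete, here is a minimal from-scratch sketch of PCA using only NumPy, assuming the classic covariance-eigendecomposition formulation: center the data, eigendecompose the covariance matrix, and keep the eigenvectors with the largest eigenvalues. The helper name `pca_eig` is hypothetical, and this sketch runs on the raw matrix `A`; the notebook below standardizes the features first, which amounts to eigendecomposing the correlation matrix instead.

```python
import numpy as np

def pca_eig(X, k):
    """Hypothetical helper: project X (n_samples x n_features) onto its top-k principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each feature at zero
    cov = np.cov(Xc, rowvar=False)          # feature-by-feature sample covariance (ddof=1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric input, eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # reorder so the largest variance comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :k]             # each column is one principal direction
    scores = Xc @ components                # coordinates of each sample in the new basis
    return scores, eigvals[:k], components

A = np.array([[1, 2, 3, 4],
              [5, 5, 6, 7],
              [1, 4, 2, 3],
              [5, 2, 3, 1],
              [8, 1, 2, 2]])
scores, variances, components = pca_eig(A, k=2)
print(variances)  # variance captured by each kept component, largest first
```

The variance of each column of `scores` equals the corresponding eigenvalue, and the columns are uncorrelated, which is exactly the "new, uncorrelated variables ordered by variance" property described above.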

In [1]:
# Example of Principal Component Analysis
import numpy as np
import pandas as pd

In [2]:
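# np.array is preferred over the deprecated np.matrix class; the printed output is the same.
A = np.array([[1, 2, 3, 4],
              [5, 5, 6, 7],
              [1, 4, 2, 3],
              [5, 2, 3, 1],
              [8, 1, 2, 2]])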
print('Matrix A is \n',A)

Matrix A is 
 [[1 2 3 4]
 [5 5 6 7]
 [1 4 2 3]
 [5 2 3 1]
 [8 1 2 2]]


In [3]:
df = pd.DataFrame(A, columns=['f1', 'f2', 'f3', 'f4'])
print('data frame \n', df)

data frame 
    f1  f2  f3  f4
0   1   2   3   4
1   5   5   6   7
2   1   4   2   3
3   5   2   3   1
4   8   1   2   2


In [4]:
# Standardize each feature to zero mean and unit (sample) variance, i.e. z-scores.
df_std = (df - df.mean()) / df.std()
print('standardized data \n', df_std)

standardized data 
          f1        f2        f3        f4
0 -1.000000 -0.486864 -0.121716  0.260623
1  0.333333  1.338877  1.704026  1.563740
2 -1.000000  0.730297 -0.730297 -0.173749
3  0.333333 -0.486864 -0.121716 -1.042493
4  1.333333 -1.095445 -0.730297 -0.608121
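One detail worth checking: `DataFrame.std()` defaults to the sample standard deviation (`ddof=1`), whereas scikit-learn's `StandardScaler` divides by the population standard deviation (`ddof=0`). The sketch below, which rebuilds the same five-row frame so it runs on its own, compares the two conventions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame([[1, 2, 3, 4],
                   [5, 5, 6, 7],
                   [1, 4, 2, 3],
                   [5, 2, 3, 1],
                   [8, 1, 2, 2]], columns=['f1', 'f2', 'f3', 'f4'])

# pandas: sample standard deviation (ddof=1), as used in the cell above
z_sample = (df - df.mean()) / df.std()
# scikit-learn's StandardScaler: population standard deviation (ddof=0)
z_scaler = StandardScaler().fit_transform(df)
z_population = (df - df.mean()) / df.std(ddof=0)

print(np.allclose(z_scaler, z_population))  # True
```

The two versions differ only by the constant factor sqrt((n - 1) / n), here sqrt(4/5), so PCA finds the same principal directions either way; only the scale of the projected values changes.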


In [5]:
n_components = 2
from sklearn.decomposition import PCA

In [7]:
pca = PCA(n_components=n_components)
print('PCA estimator \n', pca)

PCA estimator 
 PCA(n_components=2)
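Here the number of components is fixed at 2. scikit-learn's `PCA` also accepts a float between 0 and 1 for `n_components` (with `svd_solver='full'`), in which case it keeps the smallest number of components whose cumulative explained variance exceeds that fraction. The sketch below uses hypothetical synthetic data (six features driven mostly by two latent factors) purely for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 100 samples, 6 features generated from 2 latent factors plus small noise
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(100, 6))

# Keep as many components as needed to explain more than 95% of the variance
pca = PCA(n_components=0.95, svd_solver='full')
X_reduced = pca.fit_transform(X)

print(pca.n_components_)                      # number of components actually kept
print(pca.explained_variance_ratio_.cumsum()) # cumulative share of variance explained
```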


In [8]:
# Fit PCA on the standardized data and project each sample onto the first two components.
principalComponents = pca.fit_transform(df_std)
print('projected data \n', principalComponents)

projected data 
 [[-0.02486886  0.84479255]
 [ 2.56745374 -0.81165529]
 [ 0.06901024  1.32372239]
 [-1.01326092 -0.26521317]
 [-1.59833419 -1.09164648]]
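Each projected value is a dot product of a centered sample with one principal axis. The sketch below recomputes the projection by hand from the fitted attributes `mean_` and `components_` and also prints `explained_variance_ratio_`; note that the signs of the components are not unique, so another library or version may report some columns with flipped signs.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

df = pd.DataFrame([[1, 2, 3, 4],
                   [5, 5, 6, 7],
                   [1, 4, 2, 3],
                   [5, 2, 3, 1],
                   [8, 1, 2, 2]], columns=['f1', 'f2', 'f3', 'f4'])
df_std = (df - df.mean()) / df.std()

pca = PCA(n_components=2)
scores = pca.fit_transform(df_std)

# Projection by hand: center with the fitted mean, then multiply by the principal axes
manual = (df_std.to_numpy() - pca.mean_) @ pca.components_.T
print(np.allclose(scores, manual))  # True

print(pca.components_)                # loadings: one row per component, one column per feature
print(pca.explained_variance_ratio_)  # share of total variance captured by each component
```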


In [9]:
principalDf = pd.DataFrame(data=principalComponents,
                           columns=['nf' + str(i + 1) for i in range(n_components)])
# Equivalent for n_components = 2: columns=['nf1', 'nf2']
print('final matrix is \n', principalDf)

final matrix is 
         nf1       nf2
0 -0.024869  0.844793
1  2.567454 -0.811655
2  0.069010  1.323722
3 -1.013261 -0.265213
4 -1.598334 -1.091646
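Keeping two of the four components discards some information. One way to see how much is to map the reduced data back to the original feature space with `inverse_transform` and measure the residual. In this sketch the fraction of variance lost by the reconstruction equals one minus the summed `explained_variance_ratio_`.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

df = pd.DataFrame([[1, 2, 3, 4],
                   [5, 5, 6, 7],
                   [1, 4, 2, 3],
                   [5, 2, 3, 1],
                   [8, 1, 2, 2]], columns=['f1', 'f2', 'f3', 'f4'])
X = ((df - df.mean()) / df.std()).to_numpy()

pca = PCA(n_components=2)
scores = pca.fit_transform(X)
X_hat = pca.inverse_transform(scores)  # back to the 4 standardized features

# Fraction of the total (centered) variance that the 2-component reconstruction misses
lost = ((X - X_hat) ** 2).sum() / ((X - X.mean(axis=0)) ** 2).sum()
print(lost, 1 - pca.explained_variance_ratio_.sum())  # the two numbers agree
```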
