## Understand the PCA concept with a simple example

Let's say you have a dataset of students' grades on a test. The dataset has five features: math, science, English, history, and social studies. Each student has a grade for each subject.

| Student | Math | Science | English | History | Social Studies |
|---|---|---|---|---|---|
| A | 90 | 95 | 80 | 75 | 85 |
| B | 80 | 85 | 70 | 65 | 75 |
| C | 70 | 75 | 60 | 55 | 65 |
| D | 60 | 65 | 50 | 45 | 55 |
| E | 50 | 55 | 40 | 35 | 45 |
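The illustrative table is a special case. Each student scores exactly 10 points less than the previous one in every subject, so all of the variation in the data lies along a single direction. This minimal NumPy sketch copies that table, with the columns in the order shown, and confirms that the first principal component then accounts for essentially all of the variance:

```python
import numpy as np

# Grades from the illustrative table: rows are students A-E,
# columns are Math, Science, English, History, Social Studies.
grades = np.array([
    [90, 95, 80, 75, 85],
    [80, 85, 70, 65, 75],
    [70, 75, 60, 55, 65],
    [60, 65, 50, 45, 55],
    [50, 55, 40, 35, 45],
], dtype=float)

# Sample covariance matrix (features are columns, hence rowvar=False).
cov = np.cov(grades, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)

# Share of the total variance captured by each component, largest first.
explained = eigvals[::-1] / eigvals.sum()
print(np.round(explained, 6))

# Direction of the first principal component (its sign is arbitrary).
print(np.round(eigvecs[:, -1], 4))
```

The top eigenvector puts equal weight (up to sign) on all five subjects, so it simply measures a student's overall level. The sample values used in the rest of this example are less regular, so their variance is spread across several components.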

Next, create the student table in pandas. The sample values used below differ from the illustrative table above.

```python
# Import the required packages
import pandas as pd
import numpy as np
```

```python
# Sample grades for five students (SS = Social Studies)
df = pd.DataFrame({
    'Student': ['A', 'B', 'C', 'D', 'E'],
    'Math': [90, 83, 70, 60, 77],
    'Science': [73, 85, 75, 65, 65],
    'English': [70, 90, 68, 50, 80],
    'History': [85, 65, 92, 45, 75],
    'SS': [86, 75, 65, 59, 90]})

# Use the student names as the row index
df.set_index('Student', inplace=True)
```

```python
df
```

| Student | Math | Science | English | History | SS |
|---|---|---|---|---|---|
| A | 90 | 73 | 70 | 85 | 86 |
| B | 83 | 85 | 90 | 65 | 75 |
| C | 70 | 75 | 68 | 92 | 65 |
| D | 60 | 65 | 50 | 45 | 59 |
| E | 77 | 65 | 80 | 75 | 90 |


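The first step is to calculate the covariance matrix of the data. The covariance matrix is a square matrix whose entry in row *i*, column *j* is the covariance between features *i* and *j*, a measure of how the two features vary together; the diagonal holds each feature's variance. `df.cov()` computes the sample covariance, dividing by *n* - 1.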
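`df.cov()` hides the arithmetic. The sketch below recomputes the matrix by hand: center each column by subtracting its mean, then take X_cᵀX_c / (n - 1). It rebuilds the same sample grades under a separate, hypothetical name (`grades_df`) so that it runs on its own without overwriting `df`:

```python
import numpy as np
import pandas as pd

# The same sample grades as above, under a separate name.
grades_df = pd.DataFrame(
    {'Math': [90, 83, 70, 60, 77],
     'Science': [73, 85, 75, 65, 65],
     'English': [70, 90, 68, 50, 80],
     'History': [85, 65, 92, 45, 75],
     'SS': [86, 75, 65, 59, 90]},
    index=pd.Index(['A', 'B', 'C', 'D', 'E'], name='Student'))

X = grades_df.to_numpy(dtype=float)
n = X.shape[0]

# Center each feature (column) by subtracting its mean.
X_centered = X - X.mean(axis=0)

# Sample covariance: C = X_c^T X_c / (n - 1); DataFrame.cov() also divides by n - 1.
manual_cov = X_centered.T @ X_centered / (n - 1)

print(np.allclose(manual_cov, grades_df.cov().to_numpy()))  # → True
print(manual_cov[0, 0])  # variance of Math → 134.5
```

Because covariance is not normalized, its entries depend on each feature's spread. `df.corr()` would give the correlation matrix instead.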

```python
covariance_matrix = df.cov()
covariance_matrix
```

| | Math | Science | English | History | SS |
|---|---|---|---|---|---|
| Math | 134.5 | 48.0 | 120.5 | 112.0 | 121.25 |
| Science | 48.0 | 68.8 | 79.8 | 37.2 | -3.0 |
| English | 120.5 | 79.8 | 222.8 | 96.7 | 122.5 |
| History | 112.0 | 37.2 | 96.7 | 338.8 | 105.0 |
| SS | 121.25 | -3.0 | 122.5 | 105.0 | 175.5 |


The next step is to find the eigenvalues and eigenvectors of the covariance matrix. The eigenvalues are the variances of the principal components, and the eigenvectors are their directions. `np.linalg.eig` returns the eigenvectors as the columns of the `eigenvectors` array, so column `i` pairs with `eigenvalues[i]`.
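`np.linalg.eig` works here, but its eigenvalues come back in no guaranteed order. For a symmetric matrix such as a covariance matrix, `np.linalg.eigh` is the more usual choice. The sketch below, assuming the same sample grades, sorts the components by variance and checks the claim that each eigenvalue equals the variance of the data projected onto its eigenvector. The names `eigvals`, `eigvecs`, and `scores` are illustrative and chosen so that the notebook's own `eigenvalues` and `eigenvectors` are not overwritten:

```python
import numpy as np

# Sample grades (rows: students A-E; columns: Math, Science, English, History, SS).
X = np.array([
    [90, 73, 70, 85, 86],
    [83, 85, 90, 65, 75],
    [70, 75, 68, 92, 65],
    [60, 65, 50, 45, 59],
    [77, 65, 80, 75, 90],
], dtype=float)

# Center each column, then build the sample covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X, rowvar=False)

# eigh is intended for symmetric matrices: it returns real eigenvalues in
# ascending order and orthonormal eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(cov)

# Reorder so the component with the largest variance comes first.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Project each student onto the principal directions (the component scores).
scores = X_centered @ eigvecs

# The sample variance of each score column matches its eigenvalue.
print(np.allclose(scores.var(axis=0, ddof=1), eigvals))
print(np.round(eigvals, 3))
```

Eigenvector signs are arbitrary, so individual columns may come out flipped relative to the `np.linalg.eig` output.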

```python
# Find the eigenvalues and eigenvectors of the covariance matrix
eigenvalues, eigenvectors = np.linalg.eig(covariance_matrix)
```

```python
print("Eigenvalues:")
print(eigenvalues)

print()

print("Eigenvectors:")
print(eigenvectors)
```

```text
Eigenvalues:
[ 5.79134426e+02  2.04643774e+02 -3.21331331e-14  4.00099013e+01
  1.16611899e+02]

Eigenvectors:
[[ 0.42121228  0.19411769  0.45667606  0.75070764  0.11306439]
 [ 0.15819477  0.16339966 -0.66465266  0.36530491 -0.61078976]
 [ 0.49140467  0.53090696  0.27626881 -0.52871994 -0.34754858]
 [ 0.60745969 -0.77373914  0.00518723 -0.12438619 -0.12969824]
 [ 0.43251451  0.23470068 -0.52281207 -0.08929541  0.69031924]]
```
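One of the eigenvalues, `-3.21331331e-14`, is zero up to floating-point round-off rather than genuinely negative. With only five students, each centered column sums to zero, so the five centered rows are linearly dependent and the 5 × 5 covariance matrix has rank at most 4. A quick check, assuming the same sample values:

```python
import numpy as np

# Sample grades (rows: students A-E; columns: Math, Science, English, History, SS).
X = np.array([
    [90, 73, 70, 85, 86],
    [83, 85, 90, 65, 75],
    [70, 75, 68, 92, 65],
    [60, 65, 50, 45, 59],
    [77, 65, 80, 75, 90],
], dtype=float)

X_centered = X - X.mean(axis=0)

# Each centered column sums to zero, so the five rows are linearly dependent.
print(np.allclose(X_centered.sum(axis=0), 0))

# Rank of the covariance matrix: at most n - 1 = 4 for five samples.
print(np.linalg.matrix_rank(np.cov(X, rowvar=False)))
```

The eigenvector paired with that near-zero eigenvalue (the third column above) is therefore a direction in which these five students show no variation at all.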


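The eigenvalues are printed in scientific notation. For example, `5.79134426e+02` means 5.79134426 × 10², which evaluates to 579.134426:

```python
5.79134426e+02
```

```text
579.134426
```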
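Instead of converting the eigenvalues one at a time, you can ask NumPy to print them in fixed-point notation. The sketch below hard-codes the rounded values printed above (an assumption made so it runs on its own, under the illustrative name `printed_eigenvalues`) and uses `np.printoptions` as a temporary context manager. The share-of-variance lines are an extra step, not part of the original notebook:

```python
import numpy as np

# Eigenvalues as printed above (rounded copies of the np.linalg.eig output).
printed_eigenvalues = np.array([5.79134426e+02, 2.04643774e+02, -3.21331331e-14,
                                4.00099013e+01, 1.16611899e+02])

# Temporarily print fixed-point numbers instead of scientific notation.
with np.printoptions(suppress=True, precision=4):
    print(printed_eigenvalues)

# Fraction of the total variance carried by each component, largest first.
ordered = np.sort(printed_eigenvalues)[::-1]
share = ordered / ordered.sum()
print(np.round(share, 3))
```

Sorted this way, the first component carries roughly 62% of the total variance and the first two together roughly 83%.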