## Fun with PCA
Author: Anish Mahapatra (https://www.linkedin.com/in/anishmahapatra/)

# Covariance

```python
import pandas as pd
import numpy as np
```

```python
# A small 4 x 2 dataset: 4 observations (rows) of 2 variables (columns X and Y)
a = [[2, 1], [3, 2], [2, 1], [5, 1]]
b = ['X', 'Y']

data = pd.DataFrame(a, columns=b)
data
```

|   | X | Y |
|---|---|---|
| 0 | 2 | 1 |
| 1 | 3 | 2 |
| 2 | 2 | 1 |
| 3 | 5 | 1 |


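```python
# np.cov treats each row as a variable, so transpose the DataFrame
# to get the 2 x 2 covariance matrix of X and Y
np.cov(data.T)
```

```
array([[2.  , 0.  ],
       [0.  , 0.25]])
```

The off-diagonal entries are 0, so X and Y are uncorrelated in this sample. The diagonal entries are their sample variances (2 and 0.25).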
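To show what `np.cov` computes, here is a minimal sketch that builds the sample covariance matrix by hand. It centers each column and then takes `Xc.T @ Xc / (m - 1)`, where `m` is the number of observations. The helper name `manual_cov` is hypothetical and used only for illustration.

```python
import numpy as np

def manual_cov(X):
    """Hypothetical helper: sample covariance of an (m observations x n variables) array."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    Xc = X - X.mean(axis=0)        # center each column (variable)
    return Xc.T @ Xc / (m - 1)     # n x n matrix, same (m - 1) denominator as np.cov

a = [[2, 1], [3, 2], [2, 1], [5, 1]]
print(manual_cov(a))
print(np.allclose(manual_cov(a), np.cov(np.array(a).T)))  # True
```

The `m - 1` denominator matches `np.cov`'s default (`ddof` unset, `bias=False`).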

# Demonstration of PCA

![The Algorithm of PCA](AlgorithmOfPCA.jpg)


## The steps for PCA (code-wise)
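1. Start with your original data array, or build one with `np.array()`. Use one row per observation and one column per variable.
2. Find the covariance matrix using `np.cov()`. An m x n data array (m observations, n variables) should yield an n x n covariance matrix. Because `np.cov` treats each row as a variable by default, pass the transpose (`np.cov(X.T)`) or set `rowvar=False`.
3. Eigen-decompose the covariance matrix using `np.linalg.eig()`.
4. Sort the eigenvalues in descending order and reorder the eigenvector columns to match.
5. Transform the data by multiplying it by the inverse of the eigenvector matrix.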
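Put together, the steps above can be sketched as one function. This is a minimal sketch, not the author's code. The name `pca_steps` is hypothetical, and like the walkthrough below it does not center the data before transforming it.

```python
import numpy as np

def pca_steps(X):
    """Hypothetical helper following the listed steps.

    X is an (m observations x n variables) array. Like the walkthrough,
    this sketch does NOT center X before transforming it.
    """
    X = np.asarray(X, dtype=float)
    cov = np.cov(X.T)                                # step 2: n x n covariance
    eigenvalues, eigenvectors = np.linalg.eig(cov)   # step 3: eigen decomposition
    idx = eigenvalues.argsort()[::-1]                # step 4: largest eigenvalue first
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]
    M = np.linalg.inv(eigenvectors)                  # step 5: inverse of eigenvector matrix
    transformed = (M @ X.T).T                        # back to one row per observation
    return eigenvalues, eigenvectors, transformed

A = np.array([[2, 1], [3, 1.5], [4, 2], [6, 3], [7, 3.5], [8, 4]])
eigenvalues, eigenvectors, transformed = pca_steps(A)
print(eigenvalues)                       # largest first, approximately [7, 0]
print(np.cov(transformed.T).round(4))    # approximately diagonal
```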

```python
# Making an array to represent the points in a graph
# (every point lies on the line y = x / 2)
A = np.array([[2, 1], [3, 1.5], [4, 2], [6, 3], [7, 3.5], [8, 4]])
A
```

```
array([[2. , 1. ],
       [3. , 1.5],
       [4. , 2. ],
       [6. , 3. ],
       [7. , 3.5],
       [8. , 4. ]])
```

```python
# Making the 2 x 2 covariance matrix (transposed, since np.cov treats rows as variables)
covA = np.cov(A.T)
covA
```

```
array([[5.6, 2.8],
       [2.8, 1.4]])
```

```python
# Getting the eigenvalues and eigenvectors of the covariance matrix above
# (each column of eigenvectorA is one eigenvector)
eigenvaluesA, eigenvectorA = np.linalg.eig(covA)
eigenvectorA
```

```
array([[ 0.89442719, -0.4472136 ],
       [ 0.4472136 ,  0.89442719]])
```

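```python
eigenvaluesA
```

```
array([ 7.00000000e+00, -2.22044605e-16])
```

The second eigenvalue, -2.22e-16, is floating-point round-off for 0. All the points lie on the line y = x / 2, so there is no variance in the direction perpendicular to it.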
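A quick sanity check is that each eigenvector column `v` satisfies `covA @ v == lambda * v`. A covariance matrix is symmetric, so NumPy's `np.linalg.eigh` is a common alternative to `eig`. It is meant for symmetric (Hermitian) matrices and returns real eigenvalues in ascending order, which is one case where re-sorting matters. This is an illustrative sketch, not part of the original notebook. Names such as `vals_h` are made up for the example.

```python
import numpy as np

covA = np.array([[5.6, 2.8],
                 [2.8, 1.4]])

# General solver, as used in the notebook
vals, vecs = np.linalg.eig(covA)
for i in range(len(vals)):
    v = vecs[:, i]                               # eigenvectors are the columns
    print(np.allclose(covA @ v, vals[i] * v))    # C v = lambda v holds: True

# Symmetric solver: eigenvalues come back in ascending order
vals_h, vecs_h = np.linalg.eigh(covA)
print(vals_h)                                     # approximately [0, 7]

# Reverse so that the largest eigenvalue (and its eigenvector) comes first
idx = vals_h.argsort()[::-1]
vals_h, vecs_h = vals_h[idx], vecs_h[:, idx]
```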

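```python
# Here the eigenvalues are already in descending order. np.linalg.eig does not
# guarantee any particular order, so in general we sort them (largest first)
# and reorder the eigenvector columns to match:
idx = eigenvaluesA.argsort()[::-1]
eigenvaluesA = eigenvaluesA[idx]
eigenvectorA = eigenvectorA[:, idx]
```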

### Transformation of data

```python
# Finding the inverse of the eigenvector matrix
M = np.linalg.inv(eigenvectorA)
M
```

```
array([[ 0.89442719,  0.4472136 ],
       [-0.4472136 ,  0.89442719]])
```
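Note that `M` is just the transpose of `eigenvectorA`. That is expected here. Eigenvectors of a symmetric matrix with distinct eigenvalues are orthogonal, and `np.linalg.eig` returns unit-length columns, so the eigenvector matrix is orthogonal and its inverse equals its transpose. A minimal check, recomputing the eigenvectors so the snippet runs on its own:

```python
import numpy as np

A = np.array([[2, 1], [3, 1.5], [4, 2], [6, 3], [7, 3.5], [8, 4]])
eigenvalues, V = np.linalg.eig(np.cov(A.T))

# Orthogonal matrix: V^T V = I, so inv(V) == V^T
print(np.allclose(V.T @ V, np.eye(2)))           # True
print(np.allclose(np.linalg.inv(V), V.T))        # True

# Projecting with the transpose avoids computing a general inverse
projected = (V.T @ A.T).T
```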

```python
# Express each point in the eigenvector basis
# (A.T has one column per point)
newData = M @ A.T
```
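The projection in this notebook uses the raw points. Textbook PCA usually centers the data first by subtracting the column means. Shifting data does not change its covariance, so both versions diagonalize the covariance matrix in the same way. Centering only shifts the projected values so that each component has mean 0. A minimal sketch comparing the two, where the names `raw` and `centered` are illustrative:

```python
import numpy as np

A = np.array([[2, 1], [3, 1.5], [4, 2], [6, 3], [7, 3.5], [8, 4]])
eigenvalues, V = np.linalg.eig(np.cov(A.T))
M = V.T                                    # inverse of the orthonormal eigenvector matrix

raw = (M @ A.T).T                          # as in this notebook: no centering
centered = (M @ (A - A.mean(axis=0)).T).T  # textbook PCA: center first

print(np.allclose(np.cov(raw.T), np.cov(centered.T)))  # True: same covariance
print(centered.mean(axis=0))                            # approximately [0, 0]
```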

```python
# Transpose back to one row per point and round to 2 decimals
finalData = newData.T.round(2)
finalData
```

```
array([[ 2.24, -0.  ],
       [ 3.35, -0.  ],
       [ 4.47, -0.  ],
       [ 6.71, -0.  ],
       [ 7.83, -0.  ],
       [ 8.94, -0.  ]])
```

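The second column is now all zeros, so it carries no information and is completely redundant. (The `-0.` entries are tiny negative round-off values rounded to two decimals.) Let's have a look at the covariance matrix.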

```python
np.cov(finalData.T)
```

```
array([[6.9978, 0.    ],
       [0.    , 0.    ]])
```

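#### Hoot Hoot! We have diagonalized the covariance matrix.

The off-diagonal entries are now 0. The remaining variance, 6.9978, matches the largest eigenvalue (7) up to the rounding to two decimals applied to `finalData`.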
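The second component has zero variance, so we can keep only the first one (a 6 x 1 array) and still recover the original points by mapping back with the first eigenvector. This is a minimal sketch of that round trip. The names `k`, `W`, `reduced` and `reconstructed` are illustrative. Exact recovery here relies on the points lying on the line y = x / 2, which passes through the origin. For general data, dropping a component discards the variance along it.

```python
import numpy as np

A = np.array([[2, 1], [3, 1.5], [4, 2], [6, 3], [7, 3.5], [8, 4]])
eigenvalues, V = np.linalg.eig(np.cov(A.T))
idx = eigenvalues.argsort()[::-1]          # largest eigenvalue first
eigenvalues, V = eigenvalues[idx], V[:, idx]

k = 1                                      # number of components to keep
W = V[:, :k]                               # 2 x 1: first eigenvector only
reduced = A @ W                            # 6 x 1: one number per point
reconstructed = reduced @ W.T              # back to 6 x 2

print(np.allclose(reconstructed, A))       # True: nothing is lost for these points

# Share of the total variance captured by the kept component(s)
print(eigenvalues[:k].sum() / eigenvalues.sum())   # approximately 1.0
```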