# How does PCA help in data analysis?

Machine learning techniques usually try to find patterns in large datasets. When the data has many dimensions (features), working with it becomes computationally expensive, and finding the patterns needed to solve a problem becomes harder.
High-dimensional datasets often contain inconsistent features, or features that represent the same pattern as other features. This redundancy increases computation time and makes data processing slow.

That is where PCA (Principal Component Analysis) helps.

It is an unsupervised statistical technique (no labels are required for the transformation or for learning) used primarily for dimensionality reduction in machine learning. It works by finding the correlation among the features.
Once the correlation among the features is known, the dimension of the data is reduced so that little or no information is lost, or so that the loss stays within what is acceptable, before any further computation is performed on the data.
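
One common way to decide how much loss is acceptable is to measure information as variance and keep just enough components to reach a chosen fraction of the total. The sketch below illustrates that idea only; the function name `components_needed`, the eigenvalues, and the 0.85 threshold are hypothetical and chosen for illustration.

```python
import numpy as np

def components_needed(eigenvalues, threshold):
    """Return the smallest k whose top-k eigenvalues reach `threshold`
    (a fraction between 0 and 1) of the total variance."""
    # Sort the eigenvalues from largest to smallest.
    ordered = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    # Fraction of the total variance kept by the first 1, 2, ... components.
    cumulative = np.cumsum(ordered) / ordered.sum()
    # First position where the cumulative fraction reaches the threshold.
    return int(np.searchsorted(cumulative, threshold) + 1)

# Hypothetical eigenvalues of a covariance matrix with four features.
example_eigenvalues = [1.0, 4.0, 2.0, 3.0]
k = components_needed(example_eigenvalues, threshold=0.85)
print(k)
```

With these made-up eigenvalues the cumulative fractions are 0.4, 0.7, 0.9 and 1.0, so three components are enough to keep at least 85% of the variance.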


* PCA can also be used to reduce noise in datasets, for example in image datasets, where each image is represented as a vector and the vectors form a matrix that is fed to a model.
* It can also be used to plot data in two dimensions by reducing it to a two-dimensional space.
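
As a rough sketch of the second point, the helper below reduces a dataset to two columns that could then be passed to any plotting library. The function name `pca_project` and the random four-feature data are assumptions made for illustration, not part of the worked example.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the k directions of largest variance."""
    # Centre every feature on zero.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (columns).
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Column indices of the k largest eigenvalues.
    top = np.argsort(eigenvalues)[::-1][:k]
    return X_centered @ eigenvectors[:, top]

# Hypothetical data: 50 samples with 4 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

points_2d = pca_project(X, 2)
print(points_2d.shape)
```

Each row of `points_2d` is one sample described by two coordinates, which is the form a scatter plot expects.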



#### How the projection of the data changes after applying PCA
![Screenshot from 2019-11-11 16-30-41](https://user-images.githubusercontent.com/10325504/68582463-c6836600-04a0-11ea-9dc4-468f8355a156.png)
[Figure source](http://setosa.io/ev/principal-component-analysis/)

In the figure above, the right-hand image shows the data after it has been projected onto one dimension. The data points can still be easily distinguished, so the information is retained.

## Working Example of PCA
 **Step 1**
Data matrix D (10 samples as rows, 2 features as columns) = ![](https://latex.codecogs.com/png.latex?%5Cbegin%7Bbmatrix%7D%202.5%20%26%202.4%5C%5C%200.5%20%26%200.7%5C%5C%202.2%20%26%202.9%5C%5C%201.9%20%26%202.2%5C%5C%203.1%20%26%203.0%5C%5C%202.3%20%26%202.7%5C%5C%202%20%26%201.6%5C%5C%201%20%26%201.1%5C%5C%201.5%20%26%201.6%5C%5C%201.1%20%26%200.9%20%5Cend%7Bbmatrix%7D)

 **Step 2**
Calculate the mean-subtracted data **Data<sub>meansub</sub>** by subtracting the mean of each column of D from every entry in that column.

Data<sub>meansub</sub> = ![](https://latex.codecogs.com/png.latex?%5Cbegin%7Bbmatrix%7D%20.69%20%26%20.49%5C%5C%20-1.31%20%26%20-1.21%5C%5C%20.39%20%26%20.99%5C%5C%20.09%20%26%20.29%5C%5C%201.29%20%26%201.09%5C%5C%20.49%20%26%20.79%5C%5C%20.19%20%26%20-.31%5C%5C%20-.81%20%26%20-.81%5C%5C%20-.31%20%26%20-.31%5C%5C%20-.71%20%26%20-1.01%20%5Cend%7Bbmatrix%7D)
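
A minimal NumPy sketch of this step follows. The variable names are illustrative, and row 8 of the data is taken as (1.0, 1.1), the value implied by the mean-subtracted matrix above.

```python
import numpy as np

# Data matrix: 10 samples (rows) x 2 features (columns).
# Row 8 is taken as (1.0, 1.1), the value implied by Data_meansub.
D = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])

# Mean of each column (feature).
col_means = D.mean(axis=0)

# Subtract the column means so that every feature is centred on zero.
data_meansub = D - col_means

print(col_means)
print(data_meansub)
```

The column means come out as 1.81 and 1.91, and every column of `data_meansub` sums to zero up to floating-point rounding.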

**Step 3**
Calculate the covariance matrix **D<sub>cov</sub>** from the mean-subtracted data **Data<sub>meansub</sub>**, dividing by n − 1 = 9, where n = 10 is the number of samples.

D<sub>cov</sub> = ![](https://latex.codecogs.com/png.latex?%5Cbegin%7Bbmatrix%7D%200.616555556%20%26%200.615444444%5C%5C%200.615444444%20%26%200.716555556%20%5Cend%7Bbmatrix%7D)
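
The covariance matrix can be reproduced from the mean-subtracted values listed in Step 2, as in the sketch below. The variable names are illustrative; `np.cov` is included only as a cross-check.

```python
import numpy as np

# Mean-subtracted data from Step 2 (10 samples x 2 features).
data_meansub = np.array([
    [0.69, 0.49], [-1.31, -1.21], [0.39, 0.99], [0.09, 0.29], [1.29, 1.09],
    [0.49, 0.79], [0.19, -0.31], [-0.81, -0.81], [-0.31, -0.31], [-0.71, -1.01],
])

n_samples = data_meansub.shape[0]

# Sample covariance: (X^T X) / (n - 1) for data that is already centred.
d_cov = data_meansub.T @ data_meansub / (n_samples - 1)

# Cross-check with NumPy's built-in; rowvar=False treats columns as features.
d_cov_np = np.cov(data_meansub, rowvar=False)

print(d_cov)
```

The diagonal entries are the variances of the two features, and the off-diagonal entry is their covariance.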


 **Step 4**
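Calculate the eigenvalues and eigenvectors of the covariance matrix **D<sub>cov</sub>**. Each column of D<sub>eigenvectors|cov</sub> is the eigenvector for the eigenvalue in the same position of D<sub>eigenvalues|cov</sub>.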

D<sub>eigenvalues|cov</sub> = ![](https://latex.codecogs.com/png.latex?%5Cbegin%7Bbmatrix%7D%200.0490833989%5C%5C%201.28402771%20%5Cend%7Bbmatrix%7D)

D<sub>eigenvectors|cov</sub> = ![](https://latex.codecogs.com/png.latex?%5Cbegin%7Bbmatrix%7D%20-0.735178656%20%26%20-0.677873399%5C%5C%200.677873399%20%26%20-0.735178656%20%5Cend%7Bbmatrix%7D) 
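
One way to obtain these values is `np.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix. This is a sketch with illustrative variable names. An eigenvector is only defined up to its sign, so the signs returned by the library may differ from the ones shown above.

```python
import numpy as np

# Covariance matrix from Step 3.
d_cov = np.array([
    [0.616555556, 0.615444444],
    [0.615444444, 0.716555556],
])

# eigh returns the eigenvalues in ascending order; column i of
# `eigenvectors` is the unit eigenvector for eigenvalues[i].
eigenvalues, eigenvectors = np.linalg.eigh(d_cov)

print(eigenvalues)
print(eigenvectors)

# Sanity check: D_cov @ v equals lambda * v for every eigenpair.
print(np.allclose(d_cov @ eigenvectors, eigenvectors * eigenvalues))
```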

**Step 5**
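Compute the principal components **D<sub>principal</sub>**

D<sub>principal</sub> is formed from the eigenvectors of D<sub>cov</sub> that have the largest eigenvalues. How many eigenvectors are kept depends on how many dimensions the data should be projected into. Here the data is projected into one dimension, so D<sub>principal</sub> is the eigenvector that belongs to the largest eigenvalue, 1.28402771, which is the second column of D<sub>eigenvectors|cov</sub>:

D<sub>principal</sub> = ![](https://latex.codecogs.com/png.latex?%5Cbegin%7Bbmatrix%7D%20-0.677873399%5C%5C%20-0.735178656%20%5Cend%7Bbmatrix%7D)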
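
The selection can be sketched as follows: sort the eigenvalues from largest to smallest and keep the first `k` matching eigenvector columns. The variable names and the choice `k = 1` are illustrative. With these numbers the selected vector is the second column of D<sub>eigenvectors|cov</sub>, which is the one that reproduces the projected values in Step 6.

```python
import numpy as np

# Eigenvalues and eigenvectors from Step 4 (column i pairs with eigenvalue i).
eigenvalues = np.array([0.0490833989, 1.28402771])
eigenvectors = np.array([
    [-0.735178656, -0.677873399],
    [0.677873399, -0.735178656],
])

# Column indices ordered from the largest eigenvalue to the smallest.
order = np.argsort(eigenvalues)[::-1]

# Number of dimensions to project into.
k = 1

# Keep the k eigenvectors with the largest eigenvalues as columns.
d_principal = eigenvectors[:, order[:k]]

# Fraction of the total variance carried by the kept components.
explained = eigenvalues[order[:k]].sum() / eigenvalues.sum()

print(d_principal)
print(explained)
```

Keeping this single component retains roughly 96% of the total variance of the two features.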


**Step 6**
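Calculate the resulting projected data **D<sub>resultant</sub>**. Multiplying the 10 × 2 matrix D<sub>meansub</sub> by the 2 × 1 matrix D<sub>principal</sub> gives a 10 × 1 matrix.

##### D<sub>resultant</sub> = D<sub>meansub</sub> × D<sub>principal</sub>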

D<sub>resultant</sub> = ![](https://latex.codecogs.com/png.latex?%5Cbegin%7Bbmatrix%7D%20-0.827970186%5C%5C%201.77758033%5C%5C%20-0.992197494%5C%5C%20-0.274210416%5C%5C%20-1.67580142%5C%5C%20-0.912949103%5C%5C%200.0991094375%5C%5C%201.14457216%5C%5C%200.438046137%5C%5C%201.22382056%20%5Cend%7Bbmatrix%7D)

The resultant matrix D<sub>resultant</sub> is a one-dimensional representation of the data matrix D that preserves most of the variance of the data.
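
The projection can be checked with the short sketch below. It takes D<sub>principal</sub> as the eigenvector of the largest eigenvalue, (-0.677873399, -0.735178656), and the variable names are illustrative.

```python
import numpy as np

# Mean-subtracted data from Step 2 (10 samples x 2 features).
data_meansub = np.array([
    [0.69, 0.49], [-1.31, -1.21], [0.39, 0.99], [0.09, 0.29], [1.29, 1.09],
    [0.49, 0.79], [0.19, -0.31], [-0.81, -0.81], [-0.31, -0.31], [-0.71, -1.01],
])

# Eigenvector of the largest eigenvalue, stored as a 2 x 1 column.
d_principal = np.array([[-0.677873399], [-0.735178656]])

# (10 x 2) @ (2 x 1) -> (10 x 1): one coordinate per sample.
d_resultant = data_meansub @ d_principal

# Sample variance of the projected data.
projected_variance = d_resultant.var(ddof=1)

print(d_resultant.ravel())
print(projected_variance)
```

The variance of the projected values matches the largest eigenvalue, 1.28402771, which is the sense in which the projection keeps the direction of highest variance.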

https://stackoverflow.com/questions/25123845/pca-projection-plot-with-ggplot2

In [29]:
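import numpy as np

def strtomat(stringip, x, y):
    """Parse a comma-separated string into an x-by-y matrix and return its image URL."""
    return matstring(np.array(stringip.split(',')).reshape(x, y))

def matstring(matrix):
    """Build a latex.codecogs.com URL that renders `matrix` as a LaTeX bmatrix."""
    # URL-encoded pieces: %20 is a space, %26 is '&' (column separator),
    # %5C%5C is '\\' (row separator).
    rows = ["%20%26%20".join(str(value) for value in row) for row in matrix]
    ms = "https://latex.codecogs.com/png.latex?%5Cbegin%7Bbmatrix%7D%20"
    ms += "%5C%5C%20".join(rows)
    ms += "%20%5Cend%7Bbmatrix%7D"
    return ms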
        

In [12]:
import numpy as np

In [13]:
A = np.array("1,2,3,4,5,-6".split(',')).reshape(2,3)

In [16]:
matstring(A)

'https://latex.codecogs.com/png.latex?%5Cbegin%7Bbmatrix%7D%201%20%26%202%20%26%203%5C%5C%204%20%26%205%20%26%20-6%20%5Cend%7Bbmatrix%7D'

In [20]:
strtomat("1,2,3,4,5,-6",3,2)

'https://latex.codecogs.com/png.latex?%5Cbegin%7Bbmatrix%7D%201%20%26%202%5C%5C%203%20%26%204%5C%5C%205%20%26%20-6%20%5Cend%7Bbmatrix%7D'

In [21]:
strtomat("2.5,2.4,0.5,0.7,2.2,2.9,1.9,2.2,3.1,3.0,2.3,2.7,2,1.6,1,1.1,1.5,1.6,1.1,0.9",10,2)

'https://latex.codecogs.com/png.latex?%5Cbegin%7Bbmatrix%7D%202.5%20%26%202.4%5C%5C%200.5%20%26%200.7%5C%5C%202.2%20%26%202.9%5C%5C%201.9%20%26%202.2%5C%5C%203.1%20%26%203.0%5C%5C%202.3%20%26%202.7%5C%5C%202%20%26%201.6%5C%5C%201%20%26%201.1%5C%5C%201.5%20%26%201.6%5C%5C%201.1%20%26%200.9%20%5Cend%7Bbmatrix%7D'

In [30]:
strtomat(".69,.49,-1.31,-1.21,.39,.99,.09,.29,1.29,1.09,.49,.79,.19,-.31,-.81,-.81,-.31,-.31,-.71,-1.01",10,2)

'https://latex.codecogs.com/png.latex?%5Cbegin%7Bbmatrix%7D%20.69%20%26%20.49%5C%5C%20-1.31%20%26%20-1.21%5C%5C%20.39%20%26%20.99%5C%5C%20.09%20%26%20.29%5C%5C%201.29%20%26%201.09%5C%5C%20.49%20%26%20.79%5C%5C%20.19%20%26%20-.31%5C%5C%20-.81%20%26%20-.81%5C%5C%20-.31%20%26%20-.31%5C%5C%20-.71%20%26%20-1.01%20%5Cend%7Bbmatrix%7D'

In [31]:
strtomat("0.616555556,0.615444444,0.615444444,0.716555556",2,2)

'https://latex.codecogs.com/png.latex?%5Cbegin%7Bbmatrix%7D%200.616555556%20%26%200.615444444%5C%5C%200.615444444%20%26%200.716555556%20%5Cend%7Bbmatrix%7D'

In [34]:
strtomat("0.0490833989,1.28402771",2,1)

'https://latex.codecogs.com/png.latex?%5Cbegin%7Bbmatrix%7D%200.0490833989%5C%5C%201.28402771%20%5Cend%7Bbmatrix%7D'

In [33]:
strtomat("-0.735178656,-0.677873399,0.677873399,-0.735178656",2,2)

'https://latex.codecogs.com/png.latex?%5Cbegin%7Bbmatrix%7D%20-0.735178656%20%26%20-0.677873399%5C%5C%200.677873399%20%26%20-0.735178656%20%5Cend%7Bbmatrix%7D'

In [38]:
strtomat("-0.677873399,-0.735178656",2,1)

'https://latex.codecogs.com/png.latex?%5Cbegin%7Bbmatrix%7D%20-0.677873399%5C%5C%20-0.735178656%20%5Cend%7Bbmatrix%7D'

In [37]:
strtomat("-0.827970186,1.77758033,-0.992197494,-0.274210416,-1.67580142,-0.912949103,0.0991094375,1.14457216,0.438046137,1.22382056",10,1)

'https://latex.codecogs.com/png.latex?%5Cbegin%7Bbmatrix%7D%20-0.827970186%5C%5C%201.77758033%5C%5C%20-0.992197494%5C%5C%20-0.274210416%5C%5C%20-1.67580142%5C%5C%20-0.912949103%5C%5C%200.0991094375%5C%5C%201.14457216%5C%5C%200.438046137%5C%5C%201.22382056%20%5Cend%7Bbmatrix%7D'