### PCA 
PCA finds a new set of dimensions (a new basis) such that all the dimensions are orthogonal (and hence linearly independent) and ranked according to the variance of the data along them. The more important principal axes come first, where "more important" means more variance, i.e. more spread-out data.

### How does PCA work -

- Calculate the covariance matrix of the (mean-centered) data points.
- Calculate the eigenvectors and corresponding eigenvalues of that covariance matrix.
- Sort the eigenvectors by their eigenvalues in decreasing order.
- Choose the first k eigenvectors; these are the new k dimensions.
- Transform the original n-dimensional data points into k dimensions by projecting them onto the chosen eigenvectors.
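
The steps above can be sketched directly in NumPy. This is a minimal illustration on synthetic data, not the implementation scikit-learn uses (which is based on SVD); the function name `pca_eig` and the `demo` array are assumptions made up for this example.

```python
import numpy as np

def pca_eig(data, k):
    """Project data (n_samples x n_features) onto its top-k principal axes."""
    # Step 1: center the data and compute the covariance matrix.
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # Step 2: eigen-decomposition (eigh suits symmetric matrices, ascending order).
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 3: sort eigenvectors by eigenvalue, largest first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: keep the first k eigenvectors as the new axes.
    components = eigvecs[:, :k]
    # Step 5: project the centered data onto those axes.
    return centered @ components, eigvals, components

# Hypothetical synthetic data: 200 samples, 3 correlated features.
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 0.2]])
demo = rng.normal(size=(200, 3)) @ mixing

projected, eigvals, components = pca_eig(demo, k=2)
print(projected.shape)  # two columns, one per kept component
print(eigvals)          # variances along each principal axis, largest first
```

The signs of the eigenvectors are arbitrary, so the projected values may be flipped relative to another PCA implementation.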

<img src = 'https://cdn-images-1.medium.com/max/800/1*emZ_Pdjro4lBs5IdiZtQAg.png' />

We want the data to be spread out, i.e. it should have high variance along each new dimension. We also want to remove correlated dimensions, i.e. the covariance between different dimensions should be zero (they should be uncorrelated). Therefore, the covariance matrix of the transformed data should have:
* large values on the main diagonal.
* zeros everywhere off the diagonal.

Such a matrix is called a diagonal matrix.

### Goal of PCA -

- Find linearly independent dimensions (a new basis) that represent the data points with as little loss of information as possible. The representation is lossless only if all n dimensions are kept.
- The newly found dimensions should allow us to predict/reconstruct the original dimensions, with the reconstruction/projection error minimized.
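
To make the reconstruction goal concrete, the sketch below projects synthetic data onto the first k principal axes, maps it back to the original space, and measures the mean squared reconstruction error. The data and all variable names are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical synthetic data: 100 samples, 4 correlated features.
rng = np.random.default_rng(1)
data = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
mean = data.mean(axis=0)
centered = data - mean

# Principal axes: eigenvectors of the covariance matrix, largest eigenvalue first.
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]

errors = []
for k in range(1, 5):
    basis = eigvecs[:, :k]
    # Project onto k axes, then map back into the original 4-D space.
    reconstructed = centered @ basis @ basis.T + mean
    errors.append(np.mean((data - reconstructed) ** 2))
    print(k, errors[-1])
```

The error should shrink as k grows and vanish (up to floating-point noise) when all four components are kept.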


The eigen-decomposition of the covariance matrix gives us both:

- The directions along which the data are dispersed (the eigenvectors).
- The relative importance of these directions (the eigenvalues).
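
In scikit-learn, these two pieces are exposed on a fitted `PCA` object as `components_` (the directions) and `explained_variance_ratio_` (each eigenvalue as a share of the total variance). A short sketch on the iris data, with variable names chosen only for this example:

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the four iris features, then fit a 2-component PCA.
features = StandardScaler().fit_transform(datasets.load_iris().data)
pca = PCA(n_components=2).fit(features)

directions = pca.components_                # one row per principal axis
importance = pca.explained_variance_ratio_  # share of total variance per axis

for axis, share in zip(directions, importance):
    print(axis.round(3), f"{share:.1%} of total variance")
```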

### Always standardize your data before doing PCA: if the features are on different scales, the components are misleading. Alternatively, use the correlation matrix instead of the covariance matrix when the features are on different scales.
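
A small sketch of why this matters, using made-up synthetic features on very different scales (all names here are illustrative). Without scaling, the first principal axis simply points at the feature with the largest numeric range; after standardizing, the covariance matrix is the same as the correlation matrix of the raw data.

```python
import numpy as np

# Three independent features on wildly different scales (hypothetical data).
rng = np.random.default_rng(2)
features = rng.normal(size=(150, 3)) * np.array([1.0, 100.0, 0.01])

# Without scaling, the top eigenvector is dominated by the large-scale feature.
raw_vals, raw_vecs = np.linalg.eigh(np.cov(features, rowvar=False))
top_axis = raw_vecs[:, np.argmax(raw_vals)]
print(np.abs(top_axis).round(3))

# Standardize with the sample standard deviation (ddof=1) to match np.cov.
standardized = (features - features.mean(axis=0)) / features.std(axis=0, ddof=1)
cov_of_standardized = np.cov(standardized, rowvar=False)
corr_of_raw = np.corrcoef(features, rowvar=False)
print(np.allclose(cov_of_standardized, corr_of_raw))
```

Note that `StandardScaler` divides by the population standard deviation (ddof=0), so the covariance of its output equals the correlation matrix multiplied by n/(n-1) rather than matching it exactly.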

In [1]:
from sklearn import datasets
from sklearn import metrics

In [2]:
data = datasets.load_iris()
x = data.data
y = data.target
x

array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       [ 4.6,  3.1,  1.5,  0.2],
       [ 5. ,  3.6,  1.4,  0.2],
       [ 5.4,  3.9,  1.7,  0.4],
       [ 4.6,  3.4,  1.4,  0.3],
       [ 5. ,  3.4,  1.5,  0.2],
       [ 4.4,  2.9,  1.4,  0.2],
       [ 4.9,  3.1,  1.5,  0.1],
       [ 5.4,  3.7,  1.5,  0.2],
       [ 4.8,  3.4,  1.6,  0.2],
       [ 4.8,  3. ,  1.4,  0.1],
       [ 4.3,  3. ,  1.1,  0.1],
       [ 5.8,  4. ,  1.2,  0.2],
       [ 5.7,  4.4,  1.5,  0.4],
       [ 5.4,  3.9,  1.3,  0.4],
       [ 5.1,  3.5,  1.4,  0.3],
       [ 5.7,  3.8,  1.7,  0.3],
       [ 5.1,  3.8,  1.5,  0.3],
       [ 5.4,  3.4,  1.7,  0.2],
       [ 5.1,  3.7,  1.5,  0.4],
       [ 4.6,  3.6,  1. ,  0.2],
       [ 5.1,  3.3,  1.7,  0.5],
       [ 4.8,  3.4,  1.9,  0.2],
       [ 5. ,  3. ,  1.6,  0.2],
       [ 5. ,  3.4,  1.6,  0.4],
       [ 5.2,  3.5,  1.5,  0.2],
       [ 5.2,  3.4,  1.4,  0.2],
       [ 4.7,  3.2,  1.6,  0.2],
       [ 4

In [3]:
from sklearn.preprocessing import StandardScaler
x_std = StandardScaler().fit_transform(x)

In [4]:
from sklearn.decomposition import PCA
model = PCA(n_components=2)
model

PCA(copy=True, iterated_power='auto', n_components=2, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)

In [5]:
new_x = model.fit_transform(x_std)

In [6]:
import pandas as pd
new_df = pd.DataFrame(new_x,columns=['pc1','pc2'])

In [7]:
new_df['target']=y

In [8]:
new_df.head()

| | pc1 | pc2 | target |
|---|---|---|---|
| 0 | -2.264542 | 0.505704 | 0 |
| 1 | -2.086426 | -0.655405 | 0 |
| 2 | -2.36795 | -0.318477 | 0 |
| 3 | -2.304197 | -0.575368 | 0 |
| 4 | -2.388777 | 0.674767 | 0 |


In [19]:
import numpy as np
np.cov(new_x.T)

array([[  2.93035378e+00,   3.93421313e-16],
       [  3.93421313e-16,   9.27403622e-01]])
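
The off-diagonal entries above are zero up to floating-point noise, so the two principal components are uncorrelated, and the diagonal entries are the variances along each component. As a hedged cross-check, the sketch below repeats the pipeline and compares that diagonal with the `explained_variance_` attribute of the fitted model; the variable names are chosen for this example only.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same pipeline as above: standardize iris, then keep two components.
scaled = StandardScaler().fit_transform(datasets.load_iris().data)
pca = PCA(n_components=2)
scores = pca.fit_transform(scaled)

# Covariance of the projected data: diagonal holds the per-component variance.
score_cov = np.cov(scores.T)
print(np.diag(score_cov))
print(pca.explained_variance_)
print(np.allclose(np.diag(score_cov), pca.explained_variance_))
```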