### Principal Component Analysis

#### Dimensionality Reduction Method
PCA converts high-dimensional data into lower-dimensional data. It is used for:
- Feature Extraction (keeping only the most informative directions in the data)
- Visualization (data with more than 2-3 dimensions cannot be plotted directly)


### Statistical Components
- Mean
- Standard Deviation
- Covariance (how two columns vary together)
- Covariance Matrix

For example, with two features x1 and x2 we get a 2x2 matrix: [[cov(x1,x1), cov(x1,x2)], [cov(x2,x1), cov(x2,x2)]]. The diagonal entries are the variances of x1 and x2. A large positive off-diagonal value means that when one feature increases, the other tends to increase as well.
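
As a small illustration of the covariance matrix, the sketch below computes it by hand and compares it with `np.cov`. The array `X` is made-up toy data, and the divide-by-(n - 1) sample formula is assumed because that is NumPy's default.

```python
import numpy as np

# Hypothetical toy data: rows are observations, columns are the features x1 and x2.
X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 8.0]])

# Center each column, then apply the sample covariance formula (divide by n - 1).
centered = X - X.mean(axis=0)
cov_manual = centered.T @ centered / (X.shape[0] - 1)

# np.cov treats rows as variables by default, so rowvar=False is needed
# when the features are stored in columns.
cov_numpy = np.cov(X, rowvar=False)

print(cov_manual)
print(np.allclose(cov_manual, cov_numpy))
```

The matrix is symmetric because cov(x1,x2) and cov(x2,x1) are the same quantity.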

### Eigenvalues / Eigenvectors
- An eigenvector does not change its direction when the matrix transformation is applied to it; it is only scaled.
- They are used in PCA for the linear transformation of the data.

Let A be a square matrix. Then

A v = lambda v

where v is an eigenvector of A
and lambda is the corresponding eigenvalue.
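
The relation A v = lambda v can be checked numerically. The following is a minimal sketch; the matrix `A` is an arbitrary symmetric example chosen for illustration, not something taken from the data used later.

```python
import numpy as np

# Hypothetical symmetric 2x2 matrix used only to illustrate the definition.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose COLUMNS are the eigenvectors.
values, vectors = np.linalg.eig(A)

# Check A v = lambda v for every (eigenvalue, eigenvector) pair.
checks = []
for i in range(len(values)):
    v = vectors[:, i]
    lam = values[i]
    checks.append(np.allclose(A @ v, lam * v))

print(checks)
```

Note that the eigenvectors are read column by column (`vectors[:, i]`), which is easy to get wrong.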

#### Steps

data (with n features) -> [x1, x2, x3, x4, ..., xn]

- 1. Calculate the mean of each column.
Let's say M = mean(data)
- 2. Subtract each column's mean from that column (center the data).
Now, C = data - M
- 3. Calculate the covariance matrix of the centered data.
V = cov(C)
- 4. Calculate the eigenvalues and eigenvectors of V.
values, vectors = eig(V)
- 5. Sort the eigenvalues in descending order and pick the k largest, together with their eigenvectors.
    - -> Select (values, vectors)
    - -> k is the number of dimensions we want in the transformed data.
    - Suppose we have 30 columns and we want data with 15 features/columns; then k = 15.
    - -> ***B = the k selected eigenvectors. These are the principal components***
- 6. Multiply the transpose of the principal components by the centered data.

    B.T * C.T -> ***Final Transformed Data*** (one column per sample; transpose to get one row per sample)
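
The six steps can be collected into one function. This is a sketch under stated assumptions rather than a reference implementation: the name `pca_sketch` is hypothetical, and `np.linalg.eigh` is used here (instead of `eig`) because a covariance matrix is symmetric.

```python
import numpy as np

def pca_sketch(data, k):
    """Hypothetical helper: project `data` (rows = samples) onto its top-k principal components."""
    data = np.asarray(data, dtype=float)

    # Steps 1-2: column means, then center the data.
    mean = data.mean(axis=0)
    centered = data - mean

    # Step 3: covariance matrix of the features (columns).
    cov = np.cov(centered, rowvar=False)

    # Step 4: eigen-decomposition; eigh is intended for symmetric matrices.
    values, vectors = np.linalg.eigh(cov)

    # Step 5: indices of the k largest eigenvalues, in descending order.
    order = np.argsort(values)[::-1][:k]
    components = vectors[:, order]  # shape: (n_features, k)

    # Step 6: project; equivalent to (components.T @ centered.T).T
    projected = centered @ components
    return projected, values[order], components

# Usage on made-up random data with 5 features reduced to 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
projected, top_values, components = pca_sketch(X, k=2)
print(projected.shape)
```

The sign of each component is arbitrary, so results from different libraries may differ by a factor of -1 per column.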

## PCA using NumPy

In [1]:
import numpy as np

###### Data Matrix for PCA

In [2]:
data = np.array([[1,2],[3,4],[5,6]])
data

array([[1, 2],
       [3, 4],
       [5, 6]])

##### Calculate Mean of Each Column

In [3]:
M = np.mean(data, axis=0)  # column-wise mean
M

array([3., 4.])

##### Subtract the mean value from data

In [4]:
# center the data before PCA (this subtracts the mean; it does not rescale)
scaled_data = data - M
scaled_data

array([[-2., -2.],
       [ 0.,  0.],
       [ 2.,  2.]])

##### Calculate Covariance Matrix on Centered Data

In [5]:
V = np.cov(scaled_data.T)
V

array([[4., 4.],
       [4., 4.]])

##### Eigenvalues and Eigenvectors


In [6]:
values,vectors = np.linalg.eig(V)

In [7]:
values.shape

(2,)

In [8]:
values

array([8., 0.])

In [9]:
vectors.shape

(2, 2)

In [10]:
vectors

array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])

##### Project the Data / Transform the Data

In [11]:
p = vectors.T.dot(scaled_data.T)

In [12]:
p.T

array([[-2.82842712,  0.        ],
       [ 0.        ,  0.        ],
       [ 2.82842712,  0.        ]])
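
The projection above uses both eigenvectors, so the result still has two columns. To actually reduce the dimension, the eigenvalues need to be sorted and only the top k vectors kept; `np.linalg.eig` does not guarantee any ordering, even though it happened to return the largest eigenvalue first here. The sketch below repeats the example with k = 1; the names `centered`, `order`, `top_vectors` and `reduced` are introduced only for this illustration.

```python
import numpy as np

# Same toy data as in the walk-through.
data = np.array([[1, 2], [3, 4], [5, 6]])
centered = data - data.mean(axis=0)

values, vectors = np.linalg.eig(np.cov(centered.T))

# Sort eigenvalue indices from largest to smallest.
order = np.argsort(values)[::-1]

# Keep only the k eigenvectors (columns) with the largest eigenvalues.
k = 1
top_vectors = vectors[:, order[:k]]

# Project the centered data: one row per sample, k columns.
reduced = centered.dot(top_vectors)
print(reduced.shape)
```

Because the second eigenvalue is 0, dropping that component loses no variance for this data.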

## PCA with SKLearn

In [13]:
from sklearn.decomposition import PCA

In [14]:
pca = PCA()
pca.fit(data)

PCA()

In [15]:
pca.components_

array([[ 0.70710678,  0.70710678],
       [ 0.70710678, -0.70710678]])

In [16]:
pca.explained_variance_

array([8.00000000e+00, 2.25080839e-33])

In [17]:
p = pca.transform(data)
p

array([[-2.82842712e+00,  2.22044605e-16],
       [ 0.00000000e+00,  0.00000000e+00],
       [ 2.82842712e+00, -2.22044605e-16]])
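
The two approaches agree, with two differences worth noting: `pca.components_` stores each component as a row, while `np.linalg.eig` returns eigenvectors as columns, and the second component has the opposite sign. The sign of an eigenvector is arbitrary, so both are valid. The check below is a small sketch that uses only the values printed above; `same_up_to_sign` is a hypothetical helper.

```python
import numpy as np

# Values copied from the NumPy and scikit-learn outputs shown above.
numpy_vectors = np.array([[0.70710678, -0.70710678],
                          [0.70710678,  0.70710678]])      # eigenvectors as columns
sklearn_components = np.array([[0.70710678,  0.70710678],
                               [0.70710678, -0.70710678]])  # components as rows

def same_up_to_sign(u, v):
    """Hypothetical helper: True if u equals v or -v within floating-point tolerance."""
    return bool(np.allclose(u, v) or np.allclose(u, -v))

# Compare column i of the NumPy result with row i of the scikit-learn result.
matches = [same_up_to_sign(numpy_vectors[:, i], sklearn_components[i])
           for i in range(2)]
print(matches)
```

The tiny nonzero entries in the scikit-learn output (on the order of 1e-16) are floating-point round-off and correspond to the exact zeros in the NumPy result.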