# Principal Component Analysis using sklearn
**[Click here for Theory](./1_scratch_fake.ipynb)**

In [1]:
import numpy as np

## For the fake data
**[Scratch Implementation](./1_scratch_fake.ipynb)** for this fake data

In [2]:
A = np.array([
    [1,2],
    [3,4],
    [5,6]
])

In [3]:
from sklearn.decomposition import PCA

In [4]:
# Keep both components so the result can be compared with the scratch version
pca = PCA(n_components=2)

In [5]:
pca.fit(A)

PCA(copy=True, iterated_power='auto', n_components=2, random_state=None,
    svd_solver='auto', tol=0.0, whiten=False)

In [6]:
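# explained_variance_ holds the eigenvalues of the covariance matrix;
# components_ holds the matching eigenvectors, one per row.
print("Eigenvalues:")
print(pca.explained_variance_)
print("\nEigenvectors:")
print(pca.components_)

Eigenvalues:
[8. 0.]

Eigenvectors:
[[ 0.70710678  0.70710678]
 [-0.70710678  0.70710678]]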


**These eigenvalues and eigenvectors match the ones from the [scratch implementation](./1_scratch_fake.ipynb). Success!**
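As a quick cross-check, the same numbers can be recovered directly from the covariance matrix with NumPy. This is only a minimal sketch of the idea, not necessarily how the linked scratch notebook does it; the variable names `cov`, `eigvals`, and `eigvecs` are chosen here for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

A = np.array([[1, 2], [3, 4], [5, 6]])

# Sample covariance matrix: columns are features, divisor is n - 1
cov = np.cov(A, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]   # reorder to descending, like sklearn
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]         # eigenvectors are the columns

pca = PCA(n_components=2).fit(A)

# Eigenvalues should agree; eigenvectors should agree up to sign
print(np.allclose(eigvals, pca.explained_variance_))
print(np.allclose(np.abs(eigvecs.T), np.abs(pca.components_)))
```

The absolute values are compared because an eigenvector is only defined up to its sign.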

In [7]:
B = pca.transform(A)
print(B)

[[-2.82842712e+00 -2.22044605e-16]
 [ 0.00000000e+00  0.00000000e+00]
 [ 2.82842712e+00  2.22044605e-16]]
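The second column is zero up to floating-point noise (values around 1e-16), which is consistent with the second eigenvalue being 0. With the default `whiten=False`, the projection should amount to centering the data and multiplying by the transposed components. The sketch below checks that by hand; `B_manual` is a name introduced here for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

A = np.array([[1, 2], [3, 4], [5, 6]])
pca = PCA(n_components=2).fit(A)

# Center with the per-feature mean learned during fit, then project
B_manual = (A - pca.mean_) @ pca.components_.T

B = pca.transform(A)
print(np.allclose(B, B_manual))

# The second coordinate carries no variance, so it is numerically zero
print(np.allclose(B[:, 1], 0.0))
```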


## For the cancer data
**[Scratch Implementation](./2_scratch_cancer.ipynb)** for Cancer data

In [8]:
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
A = cancer.data
A.shape

(569, 30)

In [9]:
# Keep the first 5 of the 30 possible components
pca = PCA(n_components=5)

In [10]:
pca.fit(A)

PCA(copy=True, iterated_power='auto', n_components=5, random_state=None,
    svd_solver='auto', tol=0.0, whiten=False)

In [11]:
# One eigenvalue per kept component; components_ has one eigenvector per row
print("Eigenvalues:")
print(pca.explained_variance_)
print("\nEigenvectors shape:", pca.components_.shape)

Eigenvalues:
[4.43782605e+05 7.31010006e+03 7.03833742e+02 5.46487379e+01
 3.98900178e+01]

Eigenvectors shape: (5, 30)
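The eigenvalues drop off quickly, so the first component carries most of the variance. One way to make that concrete is sklearn's `explained_variance_ratio_` attribute, which gives each eigenvalue as a fraction of the total variance. The following is a small sketch; `ratios` is just an illustrative name, and the data is used unscaled, as in the cells above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA

A = load_breast_cancer().data
pca = PCA(n_components=5).fit(A)

# Fraction of the total variance captured by each kept component
ratios = pca.explained_variance_ratio_
print(ratios)

# Running total over the first 1, 2, ..., 5 components
print(np.cumsum(ratios))
```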


In [12]:
print(pca.components_)

[[ 5.08623202e-03  2.19657026e-03  3.50763298e-02  5.16826469e-01
   4.23694535e-06  4.05260047e-05  8.19399539e-05  4.77807775e-05
   7.07804332e-06 -2.62155251e-06  3.13742507e-04 -6.50984008e-05
   2.23634150e-03  5.57271669e-02 -8.05646029e-07  5.51918197e-06
   8.87094462e-06  3.27915009e-06 -1.24101836e-06 -8.54530832e-08
   7.15473257e-03  3.06736622e-03  4.94576447e-02  8.52063392e-01
   6.42005481e-06  1.01275937e-04  1.68928625e-04  7.36658178e-05
   1.78986262e-05  1.61356159e-06]
 [ 9.28705650e-03 -2.88160658e-03  6.27480827e-02  8.51823720e-01
  -1.48194356e-05 -2.68862249e-06  7.51419574e-05  4.63501038e-05
  -2.52430431e-05 -1.61197148e-05 -5.38692831e-05  3.48370414e-04
   8.19640791e-04  7.51112451e-03  1.49438131e-06  1.27357957e-05
   2.86921009e-05  9.36007477e-06  1.22647432e-05  2.89683790e-07
  -5.68673345e-04 -1.32152605e-02 -1.85961117e-04 -5.19742358e-01
  -7.68565692e-05 -2.56104144e-04 -1.75471479e-04 -3.05051743e-05
  -1.57042845e-04 -5.53071662e-05]
 [-1.2

<font color="red" size=5><b>IMPORTANT</b></font>

**As you can see, the eigenvectors are the same as the ones we [calculated](./2_scratch_cancer.ipynb), except for the sign of the last one. Because the sign flips for every entry of that vector, it still points along the same axis, so this does not cause any problem.**
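To see why a flipped sign is harmless, the sketch below flips the last eigenvector by hand and compares the results. It assumes the projection is computed as centered data times the transposed components (the default, non-whitened case); `W_flipped`, `B_flipped`, and the other names are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA

A = load_breast_cancer().data
pca = PCA(n_components=5).fit(A)

W = pca.components_          # shape (5, 30), one eigenvector per row
W_flipped = W.copy()
W_flipped[-1] *= -1          # flip the sign of the last eigenvector only

centered = A - pca.mean_
B = centered @ W.T
B_flipped = centered @ W_flipped.T

# Only the last projected column changes, and it just changes sign
print(np.allclose(B[:, :-1], B_flipped[:, :-1]))
print(np.allclose(B[:, -1], -B_flipped[:, -1]))

# Mapping back to the original 30 features gives the same reconstruction
recon = B @ W + pca.mean_
recon_flipped = B_flipped @ W_flipped + pca.mean_
print(np.allclose(recon, recon_flipped))
```

The flipped vector spans the same direction, so variances, distances between projected points, and reconstructions are unaffected; only the sign of that one coordinate differs.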

In [13]:
B = pca.transform(A)
print(B[:5])

[[ 1.16014257e+03 -2.93917544e+02  4.85783976e+01 -8.71197531e+00
   3.20004861e+01]
 [ 1.26912244e+03  1.56301818e+01 -3.53945342e+01  1.78612832e+01
  -4.33487404e+00]
 [ 9.95793889e+02  3.91567432e+01 -1.70975298e+00  4.19934010e+00
  -4.66529118e-01]
 [-4.07180803e+02 -6.73803198e+01  8.67284783e+00 -1.17598673e+01
   7.11546109e+00]
 [ 9.30341180e+02  1.89340742e+02  1.37480074e+00  8.49918256e+00
   7.61328922e+00]]
