## Principal Component Analysis
Principal Component Analysis (PCA) is a method for reducing the dimensionality of data. It can be thought of as a projection method: data with m columns (features) is projected into a subspace with m or fewer columns while retaining as much of the variance of the original data as possible.
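
The projection can be written out as a short derivation. This is a sketch that follows the variable names used in the code cell (A, M, C, V, P); the symbols n (number of rows), Q, Λ and k are introduced here only for illustration.

$$
\begin{aligned}
M &= \frac{1}{n}\sum_{i=1}^{n} A_{i,:} && \text{(column means)} \\
C &= A - \mathbf{1}\,M^{\top} && \text{(centered data)} \\
V &= \frac{1}{n-1}\,C^{\top} C && \text{(covariance matrix)} \\
V &= Q\,\Lambda\,Q^{\top} && \text{(eigendecomposition, eigenvectors in the columns of } Q\text{)} \\
P &= C\,Q_{k} && \text{(projection onto the } k \le m \text{ leading eigenvectors)}
\end{aligned}
$$

For the 3 × 2 example matrix, n = 3, so V = CᵀC / 2, which gives the matrix of 4s printed by the code cell.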

In [1]:
# principal component analysis
from numpy import array
from numpy import mean
from numpy import cov
from numpy.linalg import eig

A = array([
    [1, 2],
    [3, 4],
    [5, 6]
])

# column means
M = mean(A.T, axis=1)
print(M)
# center columns by subtracting column means
C = A - M
print(C)
# calculate covariance matrix of centered matrix
V = cov(C.T)
print(V)
# eigendecomposition of the covariance matrix (eigenvectors are the columns of `vectors`)
values, vectors = eig(V)
print(values)
print(vectors)
# project the centered data onto the eigenvectors
P = vectors.T.dot(C.T)
print(P.T)

[3. 4.]
[[-2. -2.]
 [ 0.  0.]
 [ 2.  2.]]
[[4. 4.]
 [4. 4.]]
[8. 0.]
[[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
[[-2.82842712  0.        ]
 [ 0.          0.        ]
 [ 2.82842712  0.        ]]
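
The cell above keeps both eigenvectors, so the projected data still has two columns. A minimal sketch of the actual reduction step follows, assuming we keep only the `k` eigenvectors with the largest eigenvalues. The names `k`, `order`, `W` and `A_hat` are illustrative and not part of the original cell, and `eigh` is swapped in for `eig` because the covariance matrix is symmetric.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

# center the columns and compute the covariance matrix
M = A.mean(axis=0)
C = A - M
V = np.cov(C, rowvar=False)

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order
values, vectors = np.linalg.eigh(V)

# sort eigenpairs by descending eigenvalue (illustrative names)
order = np.argsort(values)[::-1]
values = values[order]
vectors = vectors[:, order]

# keep only the k leading eigenvectors (k is an assumption for this example)
k = 1
W = vectors[:, :k]

# project the centered data into the k-dimensional subspace
P = C @ W
print(P.shape)

# map back to the original space to see what was retained
A_hat = P @ W.T + M
print(np.allclose(A_hat, A))
```

Because the second eigenvalue of this toy dataset is zero, one component is enough to reconstruct the data, so the second print shows `True`. The sign of the projected column depends on the sign of the eigenvector the solver returns.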


In [2]:
# principal component analysis with scikit-learn
from numpy import array
from sklearn.decomposition import PCA

A = array([
    [1, 2],
    [3, 4],
    [5, 6]
])

# create the transform
pca = PCA(2)
# fit the transform
pca.fit(A)
# access the principal axes (one per row) and their explained variances
print(pca.components_)
print(pca.explained_variance_)
# transform data
B = pca.transform(A)
print(B)

[[ 0.70710678  0.70710678]
 [-0.70710678  0.70710678]]
[8. 0.]
[[-2.82842712e+00 -2.22044605e-16]
 [ 0.00000000e+00  0.00000000e+00]
 [ 2.82842712e+00  2.22044605e-16]]
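
The scikit-learn cell above also keeps both components. As a hedged sketch of the reduction itself, the example below assumes we ask for a single component and then inspect how much variance it explains; `n_components=1` and the names `B` and `A_back` are choices made for this illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

A = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

# keep a single principal component (assumption for this example)
pca = PCA(n_components=1)

# fit and project in one step
B = pca.fit_transform(A)
print(B.shape)

# fraction of the total variance captured by the kept component
print(pca.explained_variance_ratio_)

# map the reduced data back to the original feature space
A_back = pca.inverse_transform(B)
print(np.allclose(A_back, A))
```

For this dataset the single component explains all of the variance, so the round trip through `inverse_transform` recovers the original matrix up to floating-point error.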
