### Principal Component Analysis
PCA finds a new set of dimensions (a new basis) such that all the dimensions are orthogonal (and hence linearly independent) and ranked by the variance of the data along them. Keeping only the top-ranked dimensions reduces the dimensionality of the data.

### Variance
Variance measures how much the data spread around their mean:
$$
Var(X) = \frac {1}{n} \sum (X_i - X_{mean})^2
$$
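
As a quick check of the formula, the variance can be computed by hand and compared with NumPy. This is a minimal sketch; the array `x` is made-up sample data, not something from the notebook.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Variance with the 1/n normalization used in the formula.
var_manual = np.sum((x - x.mean()) ** 2) / len(x)

# np.var uses the same 1/n normalization by default (ddof=0).
var_numpy = np.var(x)

# With ddof=1 the divisor becomes n - 1, which is the default in np.cov.
var_sample = np.var(x, ddof=1)

print(var_manual, var_numpy, var_sample)
```

The two normalizations differ only by a constant factor, so they do not change which directions PCA finds.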

### Covariance Matrix
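Covariance indicates how strongly two variables vary together. When the $X_i$ are sample vectors, the outer products below give the covariance matrix, whose diagonal entries are the variances of the individual features.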
$$
Cov(X, Y) = \frac {1}{n}\sum (X_i - X_{mean})(Y_i - Y_{mean})^T
$$
$$
Cov(X, X) = \frac {1}{n} \sum (X_i - X_{mean})(X_i - X_{mean})^T
$$
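
The covariance matrix can be built from the centered samples and compared with `np.cov`. The sketch below assumes the usual layout of one sample per row. The array `X` is invented for illustration.

```python
import numpy as np

# Hypothetical data: 5 samples (rows), 2 features (columns).
X = np.array([
    [1.0, 2.0],
    [2.0, 4.1],
    [3.0, 5.9],
    [4.0, 8.2],
    [5.0, 9.8],
])
n = X.shape[0]

# Center each feature, then sum the outer products of the centered
# samples and divide by n. X_centered.T @ X_centered is that sum.
X_centered = X - X.mean(axis=0)
cov_manual = X_centered.T @ X_centered / n

# np.cov treats rows as variables, hence the transpose.
# bias=True selects the 1/n normalization (the default is 1/(n-1)).
cov_numpy = np.cov(X.T, bias=True)

print(cov_manual)
print(np.allclose(cov_manual, cov_numpy))
```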

### Eigenvector, Eigenvalues
The eigenvectors of the covariance matrix point in the directions of maximum variance. Each eigenvalue equals the variance of the data along its eigenvector, so it indicates the importance of that eigenvector.
$$
A\vec{v} = \lambda \vec{v}
$$
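
The defining equation can be verified numerically. This sketch uses a small symmetric matrix `A` as a stand-in for a covariance matrix (an assumption made for illustration) and `np.linalg.eigh`, NumPy's routine for symmetric matrices.

```python
import numpy as np

# Hypothetical symmetric matrix standing in for a covariance matrix.
A = np.array([
    [2.0, 1.0],
    [1.0, 2.0],
])

# eigh returns real eigenvalues in ascending order and the
# corresponding eigenvectors as the COLUMNS of the second result.
eigenvalues, eigenvectors = np.linalg.eigh(A)

# Check A v = lambda v for every pair at once: broadcasting scales
# column j of eigenvectors by eigenvalues[j].
residual = A @ eigenvectors - eigenvectors * eigenvalues

# Eigenvectors of a symmetric matrix are mutually orthogonal.
orthonormal = np.allclose(eigenvectors.T @ eigenvectors, np.eye(2))

print(eigenvalues)
print(np.allclose(residual, 0.0), orthonormal)
```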

In [1]:
import numpy as np

In [None]:
class PCA:
    def __init__(self, n_components):
        self.n_components = n_components
        self.components = None
        self.mean = None

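    def fit(self, X, y=None):
        # center the data: subtract the per-feature mean (y is unused)
        self.mean = np.mean(X, axis=0)
        X = X - self.mean

        # covariance matrix; np.cov expects features in rows, hence the transpose
        cov = np.cov(X.T)

        # eigenvalues, eigenvectors; eigh is meant for symmetric matrices
        # and returns real values, with the eigenvectors as columns
        eigenvalues, eigenvectors = np.linalg.eigh(cov)
        eigenvectors = eigenvectors.T  # one eigenvector per row

        # sort eigenvectors by decreasing eigenvalue (variance)
        idxs = np.argsort(eigenvalues)[::-1]
        eigenvalues = eigenvalues[idxs]
        eigenvectors = eigenvectors[idxs]

        # store the first n_components eigenvectors
        self.components = eigenvectors[0:self.n_components]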

    def transform(self, X):
        X = X - self.mean
        return np.dot(X, self.components.T)

In [None]:
class PCA:
    def __init__(self, n_components):
        self.n_components = n_components
        self.components = None
        self.mean = None

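    def fit(self, X, y=None):
        # calculate mean and center the data (y is unused)
        self.mean = np.mean(X, axis=0)
        X = X - self.mean

        # covariance; np.cov expects features in rows, hence the transpose
        cov = np.cov(X.T)

        # eigenvalues, eigenvectors; eigh is meant for symmetric matrices
        # and returns real values, with the eigenvectors as columns
        eigenvalues, eigenvectors = np.linalg.eigh(cov)
        eigenvectors = eigenvectors.T  # one eigenvector per row

        # sort by decreasing eigenvalue; a plain argsort is ascending and
        # would keep the directions of LEAST variance
        idxs = np.argsort(eigenvalues)[::-1]
        eigenvalues = eigenvalues[idxs]
        eigenvectors = eigenvectors[idxs]

        # store the first n_components eigenvectors
        self.components = eigenvectors[0:self.n_components]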

    def transform(self, X):
        X = X - self.mean
        return np.dot(X, self.components.T)
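
One way to sanity-check the procedure is to project synthetic data and confirm that the variance along each projected axis equals the corresponding eigenvalue. The sketch below is standalone: `pca_project` is a hypothetical helper, not part of the notebook. It follows the same steps as the class (center, covariance, eigendecomposition, sort by decreasing eigenvalue, project), and it assumes `np.linalg.eigh` for the eigendecomposition. The data are randomly generated for illustration.

```python
import numpy as np


def pca_project(X, n_components):
    """Hypothetical helper: returns the projected data and all
    eigenvalues, sorted in decreasing order."""
    # Center the data.
    X_centered = X - np.mean(X, axis=0)

    # Covariance matrix (np.cov expects features in rows).
    cov = np.cov(X_centered.T)

    # Eigendecomposition of the symmetric covariance matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Sort by decreasing eigenvalue; eigenvectors are columns.
    idxs = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idxs]
    components = eigenvectors[:, idxs].T[:n_components]

    # Project the centered data onto the kept components.
    return X_centered @ components.T, eigenvalues


# Synthetic data: 200 samples, 3 features with very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.1])

projected, eigenvalues = pca_project(X, n_components=2)

# ddof=1 matches the 1/(n-1) normalization that np.cov uses by default.
projected_var = np.var(projected, axis=0, ddof=1)

print(projected.shape)
print(np.allclose(projected_var, eigenvalues[:2]))
```

If the sort order were ascending instead, the projection would keep the low-variance directions and this check would compare against the wrong eigenvalues.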