# Hands On - Principal Component Analysis

In this Hands On, we are going to compress face images with PCA. We are going to use `sklearn`'s [olivetti faces dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_olivetti_faces.html), which consists of 400 grayscale face images, each 64 x 64 pixels. Obviously, we're going to implement PCA from scratch 😋

This Hands On won't have any questions, just pure implementation.

# Setup

As usual, do not tamper with this.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_olivetti_faces

# Task

Given the [olivetti faces dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_olivetti_faces.html), compress each image into a small number of principal components with PCA, then reconstruct the images from those components.
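
To get a feel for what "compress" buys us, here is a rough count of how many numbers are stored before and after PCA. It's a back-of-the-envelope sketch that assumes every value (pixel, component entry, coefficient) costs the same to store; `stored` is a hypothetical helper, not part of the task.

```python
# Olivetti faces: 400 images of 64 x 64 pixels
n_samples, n_pixels = 400, 64 * 64
original = n_samples * n_pixels  # every pixel of every image

def stored(k):
    # Hypothetical cost model: k components of n_pixels values each,
    # plus one mean image, plus k coefficients per image
    return k * n_pixels + n_pixels + n_samples * k

for k in (1, 10, 100):
    print(f"k={k}: {stored(k)} values ({stored(k) / original:.1%} of {original})")
```

Under this cost model the storage grows linearly in k, so the savings disappear once k reaches a few hundred.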

In [None]:
faces = fetch_olivetti_faces(shuffle=True, random_state=12345678)

X = faces.images # Shape: (400, 64, 64)
y = faces.target # Shape: (400,)

num_samples, h, w = X.shape

In [None]:
def center_data(data):
    '''
    This function returns the centered data, i.e. each feature minus its mean over all samples.
    '''
    # vvvvv YOUR CODE HERE vvvvv
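
A minimal sketch of mean-centering on a tiny made-up matrix (`toy` is hypothetical): compute each column's (feature's) mean over the samples, i.e. along axis 0, and subtract it from every row via broadcasting.

```python
import numpy as np

# Hypothetical toy data: 3 samples (rows) x 2 features (columns)
toy = np.array([[1.0, 2.0],
                [3.0, 6.0],
                [5.0, 10.0]])

# Mean of each feature, taken across samples (axis 0)
mean = toy.mean(axis=0)       # [3., 6.]
# Broadcasting subtracts the feature means from every row
centered = toy - mean

print(centered)
print(centered.mean(axis=0))  # every column now averages to 0
```

For the faces, the same idea applies to `X_flat` with shape (400, 4096): the mean is one "average face" of 4096 pixels.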

In [None]:
def covariance_matrix(centered_data):
    '''
    This function will return the covariance matrix of the centered data.
    Note: the covariance matrix will be explained in the Probability and Statistics chapter.
    '''
    # vvvvv YOUR CODE HERE vvvvv
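
One common way to write the sample covariance of centered data `X_c` with `n` rows is `X_c.T @ X_c / (n - 1)`. For the faces, `X_flat` is 400 x 4096, so this gives a 4096 x 4096 matrix. A sketch on made-up random data, cross-checked against `np.cov`:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 50 samples x 4 features, then centered
data = rng.normal(size=(50, 4))
centered = data - data.mean(axis=0)

n_samples = centered.shape[0]
# Sample covariance, shape (n_features, n_features), normalized by n - 1
cov = centered.T @ centered / (n_samples - 1)

print(cov.shape)                                         # (4, 4)
# Cross-check against NumPy's built-in (rowvar=False: columns are variables)
print(np.allclose(cov, np.cov(centered, rowvar=False)))  # True
```

Dividing by `n` instead of `n - 1` only rescales the eigenvalues, so the principal directions come out the same either way.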

In [None]:
def get_eigen(cov):
    '''
    This function obtains eigenvalues and eigenvectors from the covariance matrix.
    '''
    # vvvvv YOUR CODE HERE vvvvv
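
For a symmetric matrix like a covariance matrix, `np.linalg.eigh` is a natural fit: it returns real eigenvalues in ascending order, with the eigenvectors as the columns of its second output. (`np.linalg.eig` also works, but it makes no promise about ordering.) A tiny sketch on a made-up 2 x 2 matrix:

```python
import numpy as np

# Hypothetical symmetric "covariance" matrix
cov = np.array([[2.0, 1.0],
                [1.0, 2.0]])

# eigh: real eigenvalues in ascending order, eigenvectors as COLUMNS
eigenvalues, eigenvectors = np.linalg.eigh(cov)

print(eigenvalues)  # [1. 3.]
# Each column v satisfies cov @ v == lambda * v
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(cov @ v, lam * v))  # True
```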

In [None]:
def sort_eigen(eigenvalues, eigenvectors):
    '''
    This function will sort the eigenvectors based on their eigenvalues, from largest to smallest.
    '''
    # vvvvv YOUR CODE HERE vvvvv
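
Sorting boils down to one `np.argsort`, reversed for descending order, and then reusing the same index array on the eigenvector columns so every eigenvector stays paired with its eigenvalue. The values below are made up for illustration:

```python
import numpy as np

# Hypothetical eigenpairs in arbitrary order;
# column i of `eigenvectors` belongs to eigenvalues[i]
eigenvalues = np.array([0.5, 2.0, 1.0])
eigenvectors = np.eye(3)

# Indices that order the eigenvalues from largest to smallest
order = np.argsort(eigenvalues)[::-1]
print(order)  # [1 2 0]

sorted_values = eigenvalues[order]
# Reorder the COLUMNS with the same indices to keep each pair together
sorted_vectors = eigenvectors[:, order]
```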

In [None]:
def select_components(eigenvectors, num_components):
    '''
    This function will select the first `num_components` of the sorted eigenvectors, i.e. those with the largest eigenvalues.
    '''
    # vvvvv YOUR CODE HERE vvvvv
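
Once the eigenvectors are sorted, picking the top `num_components` is a column slice. The sketch below keeps components as columns, shape (n_features, k). Storing them as rows instead (scikit-learn's `PCA.components_` uses shape (n_components, n_features)) works just as well, as long as the projection and reconstruction steps follow the same convention.

```python
import numpy as np

# Hypothetical sorted eigenvectors as columns, largest eigenvalue first
sorted_vectors = np.arange(12, dtype=float).reshape(4, 3)
num_components = 2

# Keep the first k columns -> shape (n_features, k)
components = sorted_vectors[:, :num_components]
print(components.shape)  # (4, 2)
```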

In [None]:
def project_lower_dim(centered_data, components):
    '''
    This function will project the data into a lower dimension.
    '''
    # vvvvv YOUR CODE HERE vvvvv
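
Projection is a single matrix product of the centered data with the components. A sketch on made-up data, using `np.linalg.qr` only to generate an orthonormal stand-in for the real eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical centered data: 6 samples x 3 features
centered = rng.normal(size=(6, 3))
centered -= centered.mean(axis=0)

# Hypothetical orthonormal components as columns, shape (3, 2)
components, _ = np.linalg.qr(rng.normal(size=(3, 2)))

# (n_samples, n_features) @ (n_features, k) -> (n_samples, k)
projected = centered @ components
print(projected.shape)  # (6, 2)
```

For the faces with k components, this turns each 4096-pixel image into just k numbers.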

In [None]:
def pca(data, num_components):
    '''
    This function will conduct PCA, which includes:
    1. Centering the data.
    2. Obtaining the covariance matrix of the centered data.
    3. Obtaining the eigenvalues and eigenvectors of the covariance matrix.
    4. Sorting the eigenvectors by their eigenvalues, from largest to smallest.
    5. Selecting the top `num_components` sorted eigenvectors as the components.
    6. Projecting the centered data into a lower dimension.
    7. Returning the projected data, the components, and the data mean.
    '''
    # vvvvv YOUR CODE HERE vvvvv
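
Putting steps 1 to 7 together, one possible end-to-end version looks like the sketch below. It stores components as columns (n_features x k); `pca_sketch` is a hypothetical name chosen so it doesn't clash with the `pca` you write, and the toy data is random.

```python
import numpy as np

def pca_sketch(data, num_components):
    """Hypothetical end-to-end PCA following steps 1-7."""
    mean = data.mean(axis=0)                           # 1. center
    centered = data - mean
    cov = centered.T @ centered / (data.shape[0] - 1)  # 2. covariance
    eigenvalues, eigenvectors = np.linalg.eigh(cov)    # 3. eigen-decomposition
    order = np.argsort(eigenvalues)[::-1]              # 4. sort, largest first
    eigenvectors = eigenvectors[:, order]
    components = eigenvectors[:, :num_components]      # 5. top-k columns
    projected = centered @ components                  # 6. project
    return projected, components, mean                 # 7. return

rng = np.random.default_rng(2)
toy = rng.normal(size=(20, 5))  # hypothetical: 20 samples x 5 features
projected, components, mean = pca_sketch(toy, num_components=2)
print(projected.shape, components.shape, mean.shape)  # (20, 2) (5, 2) (5,)
```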

In [None]:
def inverse_pca(data, components, mean):
    '''
    This function will reconstruct the data given the projected data, components and data mean.
    '''
    # vvvvv YOUR CODE HERE vvvvv
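
Reconstruction undoes the projection: multiply the coefficients by the transposed components to get back to pixel space, then add the mean back. With orthonormal components stored as columns, that's `projected @ components.T + mean`. The sketch below (made-up data; `reconstruct` is a hypothetical helper) prints the mean squared error for growing k, which should shrink and hit zero, up to rounding, once all components are used.

```python
import numpy as np

def reconstruct(projected, components, mean):
    # (n_samples, k) @ (k, n_features) maps back to feature space; then undo centering
    return projected @ components.T + mean

rng = np.random.default_rng(3)
data = rng.normal(size=(10, 4))  # hypothetical: 10 samples x 4 features
mean = data.mean(axis=0)
centered = data - mean

# Orthonormal covariance eigenvectors; eigh sorts ascending, so flip to largest first
_, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
eigenvectors = eigenvectors[:, ::-1]

for k in range(1, 5):
    comps = eigenvectors[:, :k]
    approx = reconstruct(centered @ comps, comps, mean)
    print(k, np.mean((data - approx) ** 2))  # error shrinks as k grows
```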

In [None]:
X_flat = X.reshape(num_samples, h*w)

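# With 400 samples, the centered data has rank at most 399, so any k above 399
# reconstructs these images no better than k = 399 (the extra eigenvalues are ~0).
num_components_list = [1, 10, 100, 1000] # <<< feel free to try any number in [1, 4096]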
plt.figure(figsize=(16, 4))

plt.subplot(1, len(num_components_list)+1, 1)
plt.imshow(X[0], cmap='gray')
plt.title(f'Original k = {h * w}')
plt.axis('off')

for i, k in enumerate(num_components_list):
    X_pca, components, mean = pca(X_flat, num_components=k)
    X_reconstructed = inverse_pca(X_pca, components, mean)

    plt.subplot(1, len(num_components_list)+1, i+2)
    plt.imshow(X_reconstructed[0].reshape(h, w), cmap='gray')
    plt.title(f'k={k}')
    plt.axis('off')

plt.tight_layout()
plt.show()
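
As an optional cross-check (not required by the task), PCA can also be computed from the singular value decomposition (SVD) of the centered data, which skips building the 4096 x 4096 covariance matrix. Assuming distinct eigenvalues, the right singular vectors match the covariance eigenvectors up to a sign per component, and the eigenvalues equal `S**2 / (n - 1)`. A sketch on made-up data:

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(size=(30, 6))  # hypothetical: 30 samples x 6 features
centered = data - data.mean(axis=0)
n = centered.shape[0]

# Route 1: eigen-decomposition of the covariance matrix, sorted descending
eigenvalues, eigenvectors = np.linalg.eigh(centered.T @ centered / (n - 1))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Route 2: thin SVD of the centered data; singular values come back descending
_, S, Vt = np.linalg.svd(centered, full_matrices=False)

print(np.allclose(eigenvalues, S**2 / (n - 1)))         # True
# Principal directions agree up to a sign flip per component
print(np.allclose(np.abs(eigenvectors), np.abs(Vt.T)))  # True
```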