Part B: Principal Component Analysis
You will perform dimensionality reduction on a grayscale image (posted on
Canvas) using PCA. The PCA will be implemented using the eigendecomposition
technique. More specifically, you will find the top k eigenvectors (i.e., principal
components) of the covariance matrix of a pixel matrix (a grayscale image). Then,
using the top k eigenvector matrix, you will project the pixel matrix onto its
principal components. This reduces the dimension of the pixel matrix while
retaining most of its variance.
Note: See the following notebook for a manual implementation of
eigendecomposition-based PCA in Python.
https://github.com/rhasanbd/Dimensionality-Reduction-Get-More-From-Less-And-See-theUnseen/blob/master/Dimensionality%20Reduction-PCA-Eigendecomposition-Introduction.ipynb
9. Using the matplotlib.pyplot “imread” function, read the image as a 2D matrix.
Denote it by “X”. Show the image using the matplotlib.pyplot “imshow” function.
If the image is RGB, convert it to a grayscale image as follows: keep a single
channel (index 0, the red channel, as an approximation of grayscale) and call the
matplotlib.pyplot “gray” function to set the gray colormap.
X = imread("image_path")[:,:,0]
gray()
[3 pts]
10. Implement the steps of eigendecomposition-based PCA on X: (a) mean-center the
data matrix X, (b) compute the covariance matrix from it, (c) find the eigenvalues
and eigenvectors of the covariance matrix (you may use the numpy.linalg.eig
function). [7 pts]
11. Then, find the top k eigenvectors (sort eigenvalue-eigenvector pairs from high to
low, and get the top k eigenvectors), and create an eigenvector matrix using top k
eigenvectors (each eigenvector should be a column vector in the matrix, so there
should be k columns). [10 pts]
12. Finally, project the mean-centered data onto the top k eigenvectors (this is a
dot product between the mean-centered X and the top k eigenvector matrix).
[5 pts]
13. Reconstruct the data matrix by taking the dot product between the projected data
(from the last step) and the transpose of the top k eigenvector matrix.
[5 pts]
14. Compute the reconstruction error between the mean-centered data matrix X and
the reconstructed data matrix (you may use the sklearn.metrics.mean_squared_error
function).
[5 pts]
15. Perform steps 11 to 14 for the following values of k: 10, 30, 50, 100, 500. For
each k, show the reconstructed image (use the matplotlib.pyplot imshow function
with the reconstructed data matrix for each k). With each reconstructed image,
print the value of k and the reconstruction error.
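
As an optional sanity check on steps 10 to 13, the manual pipeline can be compared against sklearn.decomposition.PCA. The sketch below is only an illustration under stated assumptions: it uses a small synthetic matrix A (not the assignment image), an arbitrary k = 3, and numpy.linalg.eigh in place of eig because the covariance matrix is symmetric. All variable names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for the pixel matrix: 60 rows (samples), 8 columns (features).
A = rng.normal(size=(60, 8)) @ rng.normal(size=(8, 8))
k = 3  # arbitrary choice for the illustration

# Manual eigendecomposition-based PCA.
A_centered = A - A.mean(axis=0)
cov = A_centered.T @ A_centered / (A.shape[0] - 1)
vals, vecs = np.linalg.eigh(cov)      # eigh assumes a symmetric matrix
order = np.argsort(vals)[::-1]        # eigenvalue indices, high to low
W = vecs[:, order[:k]]                # top k eigenvectors as columns
manual_recon = (A_centered @ W) @ W.T

# Same reduction with scikit-learn.
pca = PCA(n_components=k)
Z = pca.fit_transform(A)
# inverse_transform adds the mean back, so subtract it to compare
# against the mean-centered manual reconstruction.
sk_recon = pca.inverse_transform(Z) - pca.mean_

match = np.allclose(manual_recon, sk_recon)
print("reconstructions agree:", match)
```

Individual eigenvectors may differ from the rows of pca.components_ by a sign, which is why the comparison is made on the reconstructions rather than on the components themselves.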


In [None]:
import numpy as np
from numpy.linalg import eig
import matplotlib.pyplot as plt

from sklearn.metrics import mean_squared_error
from sklearn.decomposition import PCA  # not used in the cells below


In [None]:
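# Read the image and keep channel 0 so that X is a 2D matrix (assumes the file is RGB).
X = plt.imread("Hinton.jpg")[:, :, 0]
plt.gray()  # use the gray colormap for subsequent imshow calls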

In [None]:
X.shape

In [None]:
plt.imshow(X);

In [None]:
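# Per-column mean: each row of X is treated as a sample and each column as a feature.
X_mean = np.mean(X, axis=0)

In [None]:
# Broadcasting subtracts the column means from every row.
X_centered = X - X_mean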

In [None]:
# Sample covariance (unbiased estimator): divide by n - 1, where n is the number of rows.
cov_X = X_centered.T.dot(X_centered) / (X.shape[0] - 1)

In [None]:
cov_X.shape

In [None]:
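# cov_X is real and symmetric, so its eigenvalues are real in exact arithmetic;
# np.real drops any imaginary parts that eig may introduce through round-off.
eigenvalues, eigenvectors = eig(cov_X)
eigenvalues = np.real(eigenvalues)
eigenvectors = np.real(eigenvectors)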
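
Because a covariance matrix is symmetric, numpy.linalg.eigh is a possible alternative to eig: it returns real eigenvalues (in ascending order) and orthonormal eigenvectors, so no np.real call is needed. The following is a minimal sketch on a synthetic matrix S, which is an assumption standing in for the image; it is not the method the assignment prescribes.

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.normal(size=(50, 6))                      # synthetic data, 50 samples x 6 features
S_centered = S - S.mean(axis=0)
cov_S = S_centered.T @ S_centered / (S.shape[0] - 1)

# eigh returns eigenvalues in ascending order; reverse for high-to-low.
vals_h, vecs_h = np.linalg.eigh(cov_S)
vals_h = vals_h[::-1]
vecs_h = vecs_h[:, ::-1]

# Compare against eig followed by np.real, as in the cell above.
vals_g = np.real(np.linalg.eig(cov_S)[0])
same_spectrum = np.allclose(np.sort(vals_g)[::-1], vals_h)
orthonormal = np.allclose(vecs_h.T @ vecs_h, np.eye(cov_S.shape[0]))
print("same eigenvalues:", same_spectrum, "| orthonormal eigenvectors:", orthonormal)
```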

In [None]:
eigenvalues.shape

In [None]:
eigenvectors.shape

In [None]:
# Indices that sort the eigenvalues from high to low (argsort does not modify its input).
sortIndex = np.argsort(-eigenvalues)

In [None]:
def getKeigVec(npArr, sortIndex):
    """Return the columns of npArr selected by sortIndex, in that order."""
    return npArr[:, sortIndex]

In [None]:
sortIndex.shape

In [None]:
for k in [10, 30, 50, 100, 500]:
    # Top k eigenvectors as columns; if k exceeds the number of columns of X, all are used.
    kComp = getKeigVec(eigenvectors, sortIndex[:k])
    # Project onto the top k components, then map back to pixel space (still mean-centered).
    X_projected_k = X_centered.dot(kComp)
    X_reconstructed_k = X_projected_k.dot(kComp.T)
    plt.figure()
    plt.imshow(X_reconstructed_k)
    plt.show()
    reconstruction_error = mean_squared_error(X_centered, X_reconstructed_k)
    print(f"\nOverall Reconstruction Error (k = {k}): {reconstruction_error}")
    print('=' * 50)
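
The reconstruction error printed in the loop is tied to the eigenvalues that were discarded. With orthonormal eigenvectors, an n x d centered matrix, and a covariance scaled by n - 1, the mean squared error equals (n - 1) times the sum of the discarded eigenvalues, divided by n * d. The sketch below checks this on a synthetic matrix B (an assumption, not the assignment image) and also reports the fraction of variance retained; the values of k are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 80, 10
B = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))   # synthetic data
B_centered = B - B.mean(axis=0)
cov_B = B_centered.T @ B_centered / (n - 1)

# Eigenpairs sorted from high to low (eigh returns them in ascending order).
vals, vecs = np.linalg.eigh(cov_B)
vals, vecs = vals[::-1], vecs[:, ::-1]

results = []
for k in (2, 5, 10):
    W = vecs[:, :k]                                  # top k eigenvectors
    recon = B_centered @ W @ W.T                     # project, then reconstruct
    mse = np.mean((B_centered - recon) ** 2)         # same quantity as mean_squared_error
    predicted = (n - 1) * vals[k:].sum() / (n * d)   # from the discarded eigenvalues
    retained = vals[:k].sum() / vals.sum()           # fraction of variance kept
    results.append((k, mse, predicted, retained))
    print(f"k = {k}: MSE = {mse:.6f}, predicted = {predicted:.6f}, variance retained = {retained:.3f}")
```

When k equals the number of columns, no eigenvalues are discarded, so the error drops to zero up to round-off and the retained variance reaches 1.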
    
    