# Exercise: Eigenfaces

Here, we look at how well PCA performs dimensionality reduction on the Labeled Faces in the Wild (LFW) dataset, which is available through scikit-learn. Each image has shape (62, 47). This problem is famously known as the eigenface problem. Mathematically, we want the principal components (the eigenvectors) of the covariance matrix of the set of face images. These eigenvectors form an orthonormal set of directions in pixel space, ordered by how much of the variation between face images each one captures (its eigenvalue). When reshaped to the image size and plotted, these eigenvectors are called eigenfaces.
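A minimal sketch of that computation with NumPy follows. It uses a small random matrix in place of the real n × 2914 face matrix (an assumption made only so the sketch runs without downloading LFW), and checks that the eigenvectors of the covariance matrix are orthonormal.

```python
import numpy as np

# Toy stand-in for the face matrix: 200 "images" of 12 pixels each
# (the real data is n_images x 2914). Random data is an assumption.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))

# Center each pixel (column) and form the sample covariance matrix
X_centered = X - X.mean(axis=0)
cov = X_centered.T @ X_centered / (X.shape[0] - 1)

# The covariance matrix is symmetric, so eigh returns real eigenvalues
# (ascending) and orthonormal eigenvectors, one per column
eigvals, eigvecs = np.linalg.eigh(cov)

# Reorder so the first column is the direction of largest variance
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Orthonormality: V^T V is the identity
print(np.allclose(eigvecs.T @ eigvecs, np.eye(12)))  # True
```

With the real data, each column of `eigvecs` has 2914 entries and can be reshaped to (62, 47) to view it as an eigenface.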

#### Imports

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from numpy import pi
from sklearn.datasets import fetch_lfw_people

import seaborn as sns; sns.set()

In [2]:
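def plot_faces(faces, n):
    """Plot the first n faces (rows of 2914 pixels) side by side in one row."""
    fig, axes = plt.subplots(1, n, figsize=(10, 2.5),
                             subplot_kw={'xticks': [], 'yticks': []},
                             gridspec_kw=dict(hspace=0.1, wspace=0.1))
    for i, ax in enumerate(axes.flat):
        # Each row is an unrolled 62x47 image; reshape it back for display
        ax.imshow(faces[i].reshape(62, 47), cmap='binary_r')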
    
    
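def thresholdfre(eigvals, percentage):
    """Return the number of leading components whose eigenvalues account
    for at least `percentage` percent of the total variance.
    `eigvals` must already be sorted in decreasing order."""
    fraction = percentage / 100
    # Cumulative explained-variance ratio
    cumulative = np.cumsum(eigvals) / np.sum(eigvals)
    plt.plot(cumulative)
    plt.xlabel('number of components')
    plt.ylabel('cumulative explained variance')
    plt.figure()
    for i in range(len(cumulative)):
        if cumulative[i] >= fraction:
            # Return as soon as the threshold is crossed; +1 because i is a
            # zero-based index but we want a count of components
            return i + 1
    return len(cumulative)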

#### Setup data

In [3]:
faces = fetch_lfw_people(min_faces_per_person=8)
X = faces.data
y = faces.target

print(faces.target_names)
print(faces.images.shape)

['Abdullah Gul' 'Adrien Brody' 'Al Gore' 'Alejandro Toledo' 'Ali Naimi'
 'Alvaro Uribe' 'Amelie Mauresmo' 'Ana Palacio' 'Andre Agassi'
 'Andy Roddick' 'Angelina Jolie' 'Ann Veneman' 'Anna Kournikova'
 'Antonio Palocci' 'Ari Fleischer' 'Ariel Sharon' 'Arnold Schwarzenegger'
 'Atal Bihari Vajpayee' 'Bill Clinton' 'Bill Frist' 'Bill Gates'
 'Bill Graham' 'Bill McBride' 'Bill Simon' 'Bob Hope' 'Britney Spears'
 'Carlos Menem' 'Carlos Moya' 'Catherine Zeta-Jones' 'Celine Dion'
 'Cesar Gaviria' 'Charles Moose' 'Charles Taylor' 'Colin Farrell'
 'Colin Powell' 'Condoleezza Rice' 'David Beckham' 'David Nalbandian'
 'Dick Cheney' 'Dominique de Villepin' 'Donald Rumsfeld' 'Edmund Stoiber'
 'Eduardo Duhalde' 'Fernando Gonzalez' 'Fernando Henrique Cardoso'
 'Fidel Castro' 'George Clooney' 'George HW Bush' 'George Robertson'
 'George W Bush' 'Gerhard Schroeder' 'Gerry Adams'
 'Gloria Macapagal Arroyo' 'Gonzalo Sanchez de Lozada' 'Gordon Brown'
 'Gray Davis' 'Guillermo Coria' 'Halle Berry' 'Hamid Kar

In [4]:
def pca_transform(X_input, percent, num_components):

    """ PCA via eigendecomposition of the covariance matrix.

    Parameters:
    --------------

    X_input: ndarray (num_examples (rows) x num_features (columns))
    Our input data on which we would like to perform PCA. It is not
    modified; a centered copy is used internally.

    percent: float
    Percentage of the total variance to preserve. The number of components
    needed to reach it is printed.

    num_components: int
    Defines the kth number of principal components (or eigenvectors) to keep
    while performing PCA. These components will be chosen in decreasing
    order of variances (or eigenvalues).

    Returns:
    --------------

    eigvecs: ndarray (num_features x num_components), one component per column
    eigvals: ndarray (num_components,), in decreasing order

    """

    # Centering our data (Step 1). Work on a float64 copy so the caller's
    # array is not modified in place.
    X_centered = np.asarray(X_input, dtype=np.float64)
    X_mean = X_centered.mean(axis=0, keepdims=True)
    X_centered = X_centered - X_mean

    num_examples = X_centered.shape[0]
    constant = 1 / (num_examples - 1)

    # Calculating covariance matrix (Step 2)
    cov_matrix = constant * np.dot(X_centered.T, X_centered)

    # Calculating eigenvalues and eigenvectors (Step 3).
    # The covariance matrix is symmetric, so eigh returns real eigenvalues
    # and orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(cov_matrix)

    # Sort in decreasing order of eigenvalue (Step 4)
    idx = eigvals.argsort()[::-1]
    eigvals = eigvals[idx]
    eigvecs = eigvecs[:, idx]

    # Number of components needed to preserve `percent`% of the variance
    n_required = thresholdfre(eigvals, percent)
    print("Number of components required for", percent, "% of the variance =", n_required)

    # Keep only the top num_components eigenpairs
    eigvals = eigvals[:num_components]
    eigvecs = eigvecs[:, :num_components]

    return eigvecs, eigvals

In [None]:
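eig_V, eig_v = pca_transform(X, 95, 100)

# Project the centered faces onto the top 100 components, map them back
# to pixel space, and add the mean face back
X_mean = X.mean(axis=0)
X_re = (X - X_mean) @ eig_V @ eig_V.T + X_mean

print("Before: first 10 original face images")
plot_faces(X, 10)
print("After: reconstruction of the first 10 face images using only 100 principal components")
plot_faces(X_re, 10)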
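Written out, the standard PCA reconstruction behind the cell above is shown below. The symbols are introduced here for illustration: $X$ is the $n \times 2914$ data matrix, $\mu$ the $1 \times 2914$ row of per-pixel means, $\mathbf{1}$ an $n \times 1$ column of ones, and $V_k$ the $2914 \times k$ matrix whose columns are the top $k = 100$ eigenvectors returned by `pca_transform`.

$$
Z = (X - \mathbf{1}\mu)\,V_k, \qquad
\hat{X} = Z\,V_k^{\top} + \mathbf{1}\mu = (X - \mathbf{1}\mu)\,V_k V_k^{\top} + \mathbf{1}\mu
$$

Because the columns of $V_k$ are orthonormal, $V_k V_k^{\top}$ is the orthogonal projection onto the span of the top $k$ eigenfaces, so each face is stored as $k$ coefficients (one row of $Z$) instead of 2914 pixel values.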

Since each image has shape (62, 47), we unroll it into a single row vector of shape (1, 2914), so every image is described by 62 × 47 = 2914 features (pixel intensities). PCA on 2914 features yields 2914 principal components, each of which is itself a vector of 2914 pixel weights. Every pixel location therefore contributes, with a larger or smaller weight, to every principal component.
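The unrolling step can be sketched as a plain NumPy reshape. The random image stack below is a hypothetical stand-in for `faces.images`; the row-major reshape should mirror how `faces.data` relates to `faces.images`, and it is exactly reversible.

```python
import numpy as np

# Hypothetical stack of 5 grayscale images of shape (62, 47); random values
# stand in for LFW faces so the sketch runs offline.
rng = np.random.default_rng(1)
images = rng.random((5, 62, 47))

# Unroll: each image becomes one row of 62 * 47 = 2914 pixel features
X = images.reshape(len(images), -1)
print(X.shape)  # (5, 2914)

# Any 2914-long vector (an image row or a principal component) can be
# reshaped back to (62, 47) for display
restored = X.reshape(-1, 62, 47)
print(np.array_equal(restored, images))  # True
```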

#### Implement Eigenfaces
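One way to obtain eigenfaces without forming the 2914 × 2914 covariance matrix is the thin SVD of the centered data matrix. This is a swapped-in technique, not the one used in `pca_transform` above: the right singular vectors are the covariance eigenvectors (up to sign), and the squared singular values divided by n − 1 are the eigenvalues. The helper `eigenfaces_svd` and the toy shapes below are hypothetical.

```python
import numpy as np

def eigenfaces_svd(X, k, shape):
    """Top-k principal components of X (rows = unrolled images), computed
    from the thin SVD of the centered data and reshaped into images."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the right singular vectors, sorted by decreasing
    # singular value; they coincide with the covariance eigenvectors
    _, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)  # variance along each component
    return Vt[:k].reshape(k, *shape), eigvals[:k]

# Toy data: 40 random "images" of shape (6, 5) standing in for (62, 47)
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 30))
faces_k, vals_k = eigenfaces_svd(X, 4, (6, 5))

# Cross-check against the covariance eigendecomposition
w, V = np.linalg.eigh(np.cov(X, rowvar=False))
w, V = w[::-1], V[:, ::-1]  # eigh is ascending; flip to descending
print(np.allclose(vals_k, w[:4]))  # True
```

On the real data one would call something like `eigenfaces_svd(faces.data, 10, (62, 47))` and show each returned image with `ax.imshow`, as `plot_faces` does.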

#### Adding noise to images

We now add Gaussian noise to the images. Will PCA still be able to perform dimensionality reduction effectively?
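A minimal sketch of why projecting onto the leading components can suppress noise, under strong assumptions: the "clean" data below is synthetic and exactly rank 5, and the sizes and noise level are arbitrary. Noise spread evenly over all 100 dimensions is mostly discarded when only the 5 leading directions are kept. Real face images are not exactly low-rank, so the effect on LFW is likely to be less clear-cut.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, r = 300, 100, 5  # samples, "pixels", true rank (all hypothetical)

# Hypothetical low-rank clean data: every row lies in an r-dim subspace
basis, _ = np.linalg.qr(rng.normal(size=(d, r)))      # d x r, orthonormal columns
clean = rng.normal(scale=10.0, size=(n, r)) @ basis.T  # n x d
noisy = clean + rng.normal(scale=1.0, size=(n, d))     # add Gaussian noise

# PCA on the noisy data, as in pca_transform: eigendecompose the covariance
mean = noisy.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(noisy, rowvar=False))
V_r = eigvecs[:, np.argsort(eigvals)[::-1][:r]]        # top-r components

# Project onto the top r components and map back to pixel space
denoised = (noisy - mean) @ V_r @ V_r.T + mean

noisy_err = np.mean((noisy - clean) ** 2)
denoised_err = np.mean((denoised - clean) ** 2)
print(f"MSE noisy vs clean:    {noisy_err:.3f}")
print(f"MSE denoised vs clean: {denoised_err:.3f}")
```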

In [None]:
def plot_noisy_faces(noisy_faces):
    """Plot the first 20 faces in a 2 x 10 grid."""
    fig, axes = plt.subplots(2, 10, figsize=(10, 2.5),
                             subplot_kw={'xticks': [], 'yticks': []},
                             gridspec_kw=dict(hspace=0.1, wspace=0.1))
    for i, ax in enumerate(axes.flat):
        ax.imshow(noisy_faces[i].reshape(62, 47), cmap='binary_r')

Below we plot the first twenty noisy face images.

In [None]:
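np.random.seed(42)
# Add Gaussian noise with standard deviation 15 to every pixel
noisy_faces = np.random.normal(X, 15)
plot_noisy_faces(noisy_faces)

eig_nV, eig_nv = pca_transform(noisy_faces, 95, 100)
# Reconstruct the noisy faces from their top 100 principal components,
# adding the mean face back after the projection
noisy_mean = noisy_faces.mean(axis=0)
X_re_noise = (noisy_faces - noisy_mean) @ eig_nV @ eig_nV.T + noisy_mean
print("Noisy input:")
plot_faces(noisy_faces, 10)
print("Reconstruction from 100 principal components:")
plot_faces(X_re_noise, 10)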

Question 1: principal components required to preserve 95% of the variance in the data before adding noise = 178

Question 3: principal components required to preserve 95% of the variance in the data after adding noise = 1014
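The counts above come from the cumulative explained-variance ratio computed in `thresholdfre`. A minimal vectorized sketch of the same count is shown below; the helper name `n_components_for_variance` and the eigenvalues in the worked example are hypothetical. For eigenvalues 5, 3, 1, 0.5, 0.5 (total 10) the cumulative ratios are 0.5, 0.8, 0.9, 0.95, 1.0, so 85% needs 3 components and 95% needs 4.

```python
import numpy as np

def n_components_for_variance(eigvals, percent):
    """Smallest k such that the k largest eigenvalues explain at least
    `percent` percent of the total variance."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    # searchsorted finds the first index whose cumulative ratio reaches the
    # target; +1 turns that index into a count (clamped against rounding)
    k = int(np.searchsorted(ratio, percent / 100)) + 1
    return min(k, len(eigvals))

eigvals = np.array([5.0, 3.0, 1.0, 0.5, 0.5])
print(n_components_for_variance(eigvals, 85))  # 3
print(n_components_for_variance(eigvals, 95))  # 4
```

Adding independent Gaussian noise of variance σ² raises every covariance eigenvalue by σ² in expectation, spreading variance across all directions, which is consistent with the much larger count recorded for the noisy data.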
