# Face recognition through Eigenfaces

The goal of this exercise is to build a well-performing face recognition model using Eigenfaces. This technique was widely used for face recognition in the early days of computer vision.
Eigenfaces have since largely been replaced by techniques based on convolutional neural networks.

The dataset is already split into a training set and a test set. The faces come from 50 different people, with 15 pictures per person. Original data: http://www.anefian.com/research/face_reco.htm

Some preprocessing, such as cropping and rotation, has already been performed.

The person's ID can be determined from the filename of each picture. E.g. person22_15.jpg is the 15th picture of the 22nd person.

Goals of this exercise:
* Read in the images and extract the ID of the person from the filenames
* Convert the images to grayscale
* Scale the images to the same size (150x110)
* Flatten each image into a single one-dimensional array of pixel values
* Convert the faces to eigenfaces using PCA
* Train different types of classifiers on the PCA data, also perform some hyperparameter tuning and/or ensemble learning
* Test and compare the classifiers

In [1]:
%matplotlib inline
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegressionCV
from sklearn import svm
import seaborn as sns
import matplotlib.image as mpimg
from skimage import transform
from scipy import ndimage
from skimage.io import imread, imshow
from sklearn import linear_model, datasets
from sklearn import model_selection
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.decomposition import PCA

In [19]:
# Read in the training images
training_images = [] # empty list
test_images = [] # empty list
y_train = [] # empty list
y_test = [] # empty list

# To list files in a directory you can use os.listdir("path here")
# y_train and y_test will have to be parsed from the filename (e.g. person01_02_.jpg -> 1 (or 01), person13_05_.jpg -> 13)
# Reading in the images can be done using the imread() function from skimage.io: https://scikit-image.org/docs/dev/api/skimage.io.html#skimage.io.imread
# Note the parameter as_gray in this function. Use it to convert the image to grayscale.
# The result will be a 2D array that contains all the pixel values, scaled between 0 and 1

array([[0.11704392, 0.11312235, 0.12488706, ..., 0.21675686, 0.2246    ,
        0.22067843],
       [0.13665176, 0.12096549, 0.12096549, ..., 0.23244314, 0.23636471,
        0.23244314],
       [0.14841647, 0.12880863, 0.11704392, ..., 0.23579922, 0.24028627,
        0.23636471],
       ...,
       [0.5374702 , 0.5492349 , 0.56099961, ..., 0.17348549, 0.16956392,
        0.16956392],
       [0.57449843, 0.59410627, 0.61538078, ..., 0.16956392, 0.16172078,
        0.15779922],
       [0.56026353, 0.57987137, 0.60450196, ..., 0.16956392, 0.16564235,
        0.15779922]])
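
The filename parsing described in the comments above could look roughly like this. It is a minimal sketch, assuming the filenames follow the `person<ID>_<picture>` pattern from the examples; `parse_person_id` and `list_labelled_files` are hypothetical helper names, not part of any library.

```python
import os
import re

# Assumed pattern: "person", the person ID, an underscore, the picture number.
FILENAME_PATTERN = re.compile(r"^person(\d+)_(\d+)")


def parse_person_id(filename):
    """Return the person ID encoded in a filename, or None if it does not match."""
    match = FILENAME_PATTERN.match(os.path.basename(filename))
    return int(match.group(1)) if match else None


def list_labelled_files(folder):
    """Return sorted (path, person_id) pairs for every matching file in folder."""
    pairs = []
    for name in sorted(os.listdir(folder)):
        person_id = parse_person_id(name)
        if person_id is not None:
            pairs.append((os.path.join(folder, name), person_id))
    return pairs


print(parse_person_id("person22_15.jpg"))   # 22
print(parse_person_id("person01_02_.jpg"))  # 1
```

Each returned path can then be passed to `imread(path, as_gray=True)`, with the matching ID appended to `y_train` or `y_test`.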

In [None]:
# Show a single image using plt.imshow(image,cmap='gray')
# Image is the result of the imread you used above


In [None]:
# Resize the images to 150 rows * 110 columns
# Use the transform.resize function from skimage with mode='constant'
# https://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.resize
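
As an illustration of the resize step, here is a small sketch on a synthetic image. The random array only stands in for a real grayscale face, and its 200x180 size is an arbitrary assumption.

```python
import numpy as np
from skimage import transform

# Synthetic stand-in for one grayscale image returned by imread(..., as_gray=True).
rng = np.random.default_rng(0)
image = rng.random((200, 180))

# The output shape is given as (rows, columns).
# mode controls how points outside the image borders are filled.
resized = transform.resize(image, (150, 110), mode="constant")

print(resized.shape)  # (150, 110)
```

Applying the same call to every training and test image gives all faces the same number of pixels, which PCA requires.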


In [None]:
# Reshape the images to a single dimension using the reshape function
# Note: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.reshape.html
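
The flattening step can be sketched as follows, assuming the resized images are kept in a Python list. The random arrays are placeholders for real faces.

```python
import numpy as np

# Synthetic stand-in for a list of resized 150x110 grayscale images.
rng = np.random.default_rng(0)
images = [rng.random((150, 110)) for _ in range(4)]

# Stack into an array of shape (n_images, 150, 110),
# then flatten every image into a single row of 150 * 110 pixel values.
stacked = np.stack(images)
X = stacked.reshape(len(images), -1)

print(X.shape)  # (4, 16500)
```

Passing `-1` lets NumPy infer the row length, so the same line keeps working if the image size changes.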


In [None]:
# Dimensionality reduction: Perform Principal Component Analysis with 40 components
# Train the PCA algorithm on the training data
# Transform both the training and the test set to 40 components; Hint: save into a separate variable
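
One possible shape for this step, sketched on random data in place of the flattened faces. The sample counts (80 and 20) and the variable names `X_train_pca` and `X_test_pca` are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for flattened 150x110 images.
rng = np.random.default_rng(0)
X_train = rng.random((80, 16500))
X_test = rng.random((20, 16500))

# Fit on the training data only, so no information from the test set leaks in.
pca = PCA(n_components=40, random_state=0)
pca.fit(X_train)

# Project both sets onto the same 40 components, keeping the originals intact.
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

print(X_train_pca.shape, X_test_pca.shape)  # (80, 40) (20, 40)
```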


In [None]:
# Visualize 10 principal components (the first 10 Eigenfaces)
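
Each row of `pca.components_` has as many values as an image has pixels, so it can be reshaped back into an image and shown. The sketch below assumes a hypothetical helper `plot_eigenfaces` and uses small random 30x22 "images" to stay fast; with the real data the shape would be (150, 110).

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA


def plot_eigenfaces(components, image_shape, n_faces=10, n_cols=5):
    """Plot the first n_faces principal components as grayscale images."""
    n_rows = int(np.ceil(n_faces / n_cols))
    fig, axes = plt.subplots(
        n_rows, n_cols, figsize=(2 * n_cols, 2.7 * n_rows), squeeze=False
    )
    for index, ax in enumerate(axes.ravel()):
        ax.axis("off")
        if index < n_faces:
            ax.imshow(components[index].reshape(image_shape), cmap="gray")
            ax.set_title(f"Eigenface {index + 1}")
    return fig


# Synthetic stand-in for flattened training images.
rng = np.random.default_rng(0)
image_shape = (30, 22)
X_train = rng.random((60, image_shape[0] * image_shape[1]))

pca = PCA(n_components=10).fit(X_train)
fig = plot_eigenfaces(pca.components_, image_shape)
```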


In [None]:
# Show the PCA values for a single face
# What do these values represent?
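
Each PCA value is the coordinate of the mean-centred face along one eigenface. The sketch below checks this on random stand-in data by recomputing the projection by hand; it assumes the default `whiten=False`.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for flattened training images.
rng = np.random.default_rng(0)
X_train = rng.random((60, 660))
pca = PCA(n_components=40, random_state=0).fit(X_train)

face = X_train[0]

# PCA values as returned by scikit-learn (transform expects a 2D array).
coefficients = pca.transform(face.reshape(1, -1))[0]

# The same values by hand: subtract the mean face, then take the dot
# product with every eigenface.
manual = (face - pca.mean_) @ pca.components_.T

print(coefficients.shape)                 # (40,)
print(np.allclose(coefficients, manual))  # True
```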


In [None]:
# Show how much variance is explained in total by the principal components


In [None]:
# Generate a combined graph of the explained variance as a function of the principal component.
# Barplot for the variance of the single component as well as the cumulative explained variance
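
A possible way to draw the combined graph, using `explained_variance_ratio_` from a fitted PCA. `plot_explained_variance` is a hypothetical helper and the data is random, so the percentages mean nothing here.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA


def plot_explained_variance(ratios):
    """Bar plot per component plus a step line for the cumulative ratio."""
    cumulative = np.cumsum(ratios)
    component_numbers = np.arange(1, len(ratios) + 1)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(component_numbers, ratios, label="Individual")
    ax.step(component_numbers, cumulative, where="mid",
            color="tab:orange", label="Cumulative")
    ax.set_xlabel("Principal component")
    ax.set_ylabel("Explained variance ratio")
    ax.legend()
    return fig, cumulative


# Synthetic stand-in for flattened training images.
rng = np.random.default_rng(0)
X_train = rng.random((60, 660))
pca = PCA(n_components=40, random_state=0).fit(X_train)

fig, cumulative = plot_explained_variance(pca.explained_variance_ratio_)

# The last cumulative value is the total variance explained by all components.
print(f"Total explained variance: {cumulative[-1]:.1%}")
```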


In [None]:
# Try and reconstruct a face using the eigenfaces
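
A face can be approximated as the mean face plus a weighted sum of eigenfaces, with the PCA values as weights. The sketch below uses random stand-in data and a hypothetical `reconstruct` helper to compare a reconstruction from 5 components with one from all 40.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for flattened training images.
rng = np.random.default_rng(0)
X_train = rng.random((60, 660))
pca = PCA(n_components=40, random_state=0).fit(X_train)

face = X_train[0]
coefficients = pca.transform(face.reshape(1, -1))[0]


def reconstruct(coefficients, pca, n_used):
    """Rebuild a flattened face from its first n_used PCA coefficients."""
    return pca.mean_ + coefficients[:n_used] @ pca.components_[:n_used]


full = reconstruct(coefficients, pca, 40)
rough = reconstruct(coefficients, pca, 5)

# Using more eigenfaces brings the reconstruction closer to the original.
error_full = np.linalg.norm(face - full)
error_rough = np.linalg.norm(face - rough)
print(error_full < error_rough)  # True
```

With all components, the result matches `pca.inverse_transform`. Reshaping `full` to the image shape and passing it to `plt.imshow(..., cmap='gray')` shows the reconstructed face.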


## Classification

Train multiple classifiers of your choice (logistic regression, naive Bayes, random forests, ensembles) to recognize the faces. Use the PCA values as features. Use cross-validation to find the optimal hyperparameters.

Also try the following:
* Vary the number of components, what happens when you decrease or increase the number?
* Does the number of components affect the time it takes to train a model?
* Visualize a few wrongly classified faces
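
As one example of cross-validated hyperparameter tuning, the sketch below runs a grid search over a support vector classifier. The features are synthetic blobs standing in for PCA values, and the classifier and parameter grid are assumptions chosen for illustration, not recommended settings.

```python
from sklearn import svm
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for PCA features: 5 "people", 40 values per sample.
X, y = make_blobs(n_samples=300, centers=5, n_features=40, random_state=0)
X_train_pca, X_test_pca, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Try every combination in the grid with 3-fold cross-validation.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(svm.SVC(), param_grid, cv=3)
search.fit(X_train_pca, y_train)

# GridSearchCV refits the best combination on the full training set by default,
# so the fitted search object can predict directly.
test_accuracy = accuracy_score(y_test, search.predict(X_test_pca))
print(search.best_params_)
print(f"Test accuracy: {test_accuracy:.2f}")
```

The same pattern works for the other classifiers by swapping the estimator and the parameter grid.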


### Change the number of components
Also look at how the training time and the accuracy change when you use fewer or more components.

In [14]:
%%time

print("Using %%time at the beginning of a cell you can time how long it takes to execute that cell")
# Fill some time
from time import time,sleep
sleep(1)

print("Or you can time pieces of code yourself using time")
start=time()
sleep(1)
print(f"Time taken: {round(time()-start,2)}s")

Using %%time at the beginning of a cell you can time how long it takes to execute that cell
Or you can time pieces of code yourself using time
Time taken: 1.01s
Wall time: 2.02 s
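
The timing approach above can be combined with a loop over component counts. This is a sketch on synthetic data; the component counts, the classifier, and the 500-feature blobs are assumptions, so the numbers it prints say nothing about the face dataset.

```python
from time import perf_counter

from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for flattened images: 5 "people", 500 "pixels" each.
X, y = make_blobs(n_samples=300, centers=5, n_features=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

results = []
for n_components in (5, 20, 40):
    # Time the PCA fit and the classifier training together.
    start = perf_counter()
    pca = PCA(n_components=n_components, random_state=0).fit(X_train)
    model = LogisticRegression(max_iter=1000)
    model.fit(pca.transform(X_train), y_train)
    elapsed = perf_counter() - start

    accuracy = model.score(pca.transform(X_test), y_test)
    results.append((n_components, elapsed, accuracy))
    print(f"{n_components:>3} components: {elapsed:.3f}s, accuracy {accuracy:.2f}")
```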


### Some wrongly classified faces
Use the best model from your experiments above and look at the faces it classifies wrongly.
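
Finding the wrongly classified faces comes down to comparing predictions with the true labels. The sketch below uses made-up labels and random "images"; `misclassified_indices` and `plot_misclassified` are hypothetical helpers.

```python
import numpy as np
import matplotlib.pyplot as plt


def misclassified_indices(y_true, y_pred):
    """Return the positions where the prediction differs from the true label."""
    return np.flatnonzero(np.asarray(y_true) != np.asarray(y_pred))


def plot_misclassified(images, y_true, y_pred, image_shape, max_faces=5):
    """Show up to max_faces wrongly classified faces with both labels."""
    wrong = misclassified_indices(y_true, y_pred)[:max_faces]
    if len(wrong) == 0:
        return None  # nothing to plot
    fig, axes = plt.subplots(
        1, len(wrong), figsize=(2.5 * len(wrong), 3), squeeze=False
    )
    for ax, index in zip(axes[0], wrong):
        ax.imshow(np.asarray(images[index]).reshape(image_shape), cmap="gray")
        ax.set_title(f"true {y_true[index]}, predicted {y_pred[index]}")
        ax.axis("off")
    return fig


# Made-up example: 4 flattened 30x22 "faces" with two wrong predictions.
rng = np.random.default_rng(0)
images = rng.random((4, 30 * 22))
y_true = np.array([1, 2, 3, 4])
y_pred = np.array([1, 5, 3, 2])

wrong = misclassified_indices(y_true, y_pred)
print(wrong)  # [1 3]
fig = plot_misclassified(images, y_true, y_pred, (30, 22))
```

With the real data, `images` would be the flattened test images (before PCA) and `y_pred` the predictions of the chosen model.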