# Face recognition using k-Nearest Neighbor

---

### Contents
<ol>
    <li><a href="#data-preprocessing" style="color: currentColor">Data preprocessing</a></li>
    <li><a href="#pca" style="color: currentColor">Principal Component Analysis</a></li>
    <li><a href="#knn" style="color: currentColor">k-Nearest-Neighbor</a></li>
    <li><a href="#testing" style="color: currentColor">Model testing</a></li>
    <li><a href="#accuracy" style="color: currentColor">Accuracy evaluation</a></li>
    <li><a href="#further-analysis" style="color: currentColor">Further Analysis</a></li>
</ol>
<br>

<i>Note: To keep the notebook readable, it focuses on executing the code and showing the results. The full code can be found in the .py files of the functions folder.</i>

---

### Libraries

In [1]:
import os
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image


# os.chdir(r"C:\Users\fedbe\OneDrive\Dokumente\GitHub\topic01_team01")
# checks if the current working directory is correct:
# print(os.getcwd())

---

## <a id="data-preprocessing"></a> 1. Data preprocessing 


In the first part, we perform several preprocessing steps before moving on to Principal Component Analysis (PCA).\
First, we transform each image into a 1D vector. This yields a 2D data matrix where each row is a single sample (image) and each column corresponds to one feature (pixel). This procedure is called <b>flattening</b>.\
Secondly, we <b>convert the pixel values from integer to floating point</b> so that arithmetic operations behave correctly. We then <b>normalize</b> the data to the range [0,1] so that all pixels share the same scale.\
Before performing further preprocessing steps, we split the dataset into training and test data. The dataset contains 15 subjects with 11 images each, showing different facial expressions and lighting conditions. For each subject, 8 images are chosen randomly for training (15 × 8 = 120 images). The remaining 3 images per subject (45 in total) serve as test data.\
Next, we <b>center the data</b> by subtracting the mean value of each pixel position across the dataset. This gives every feature a zero mean, which is essential because PCA then captures the directions of maximum variance around this mean.\
Lastly, we <b>standardize</b> the data by dividing each centered feature by its standard deviation. Together with the centering step, this is called the <b>z-transformation</b>.
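The preprocessing steps above can be sketched in NumPy as shown below. This is a minimal illustration and not the code in `functions/preprocessing.py`. It makes three assumptions. First, it uses hypothetical random 8-bit images of an assumed size of 32×40 pixels instead of the real photos. Second, it normalizes by dividing by 255. Third, it computes the mean $\mu_j$ and standard deviation $\sigma_j$ of each pixel $j$ on the training set only and reuses them for the test set. This is a common way to avoid leaking test information. The standardized value is then $z_{ij} = (x_{ij} - \mu_j)/\sigma_j$.

```python
import numpy as np

# Hypothetical stand-in for the face dataset: 15 subjects x 11 grayscale images,
# with an assumed small size of 32x40 pixels and random 8-bit pixel values.
rng = np.random.default_rng(seed=0)
n_subjects, n_images, height, width = 15, 11, 32, 40
images = rng.integers(0, 256, size=(n_subjects, n_images, height, width), dtype=np.uint8)

# Flattening: every image becomes one row of a (samples x pixels) matrix.
X = images.reshape(n_subjects * n_images, height * width)
labels = np.repeat(np.arange(n_subjects), n_images)

# Integer -> float conversion and normalization from [0, 255] to [0, 1].
X = X.astype(np.float32) / 255.0

# Random split per subject: 8 training images, the remaining 3 for testing.
train_idx, test_idx = [], []
for s in range(n_subjects):
    idx = rng.permutation(np.flatnonzero(labels == s))
    train_idx.extend(idx[:8])
    test_idx.extend(idx[8:])
X_train, y_train = X[train_idx], labels[train_idx]
X_test, y_test = X[test_idx], labels[test_idx]

# Centering and standardization (z-transformation) per pixel.
# Assumption: statistics come from the training set and are reused for the test set.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
std[std == 0] = 1.0  # avoid division by zero for constant pixels
train_std = (X_train - mean) / std
test_std = (X_test - mean) / std

print(train_std.shape, test_std.shape)  # (120, 1280) (45, 1280)
```

The split produces the same 120/45 proportions as the notebook output. The pixel count differs only because of the assumed image size.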

In [2]:
# for MacOS users, the path separator is a forward slash, please uncomment the following line and comment the other one:
# %run functions/preprocessing.py

%run functions\preprocessing.py


Total training images: 120
Total testing images: 45

After preprocessing:
Training data shape: (120, 77760)
Testing data shape: (45, 77760)
First training image: Mean ≈ -0.1328


---

## <a id="pca"></a> 2. Principal component analysis

In [None]:
from functions.pca import pca
from functions.pca import pca_transform

# Note: an assignment produces no cell output, so the trailing semicolon below is optional;
# a semicolon only suppresses the displayed return value of a bare expression on the last line.
U_reduced, S_reduced, V_reduced, train_reduced, eigenvalues, variance_explained = pca(final_train, 100)
test_reduced = pca_transform(final_test, V_reduced);
# TODO: plot the explained variance here to show which PCs are the most important ones
# and which n_components works best for our model, instead of doing everything in section 5?


Succesfully reduced Matrix from (120, 77760) to (120, 100)

Explained variance ratio by first 5 components:
 [0.31275734 0.14035824 0.09093067 0.07442182 0.04790089]

Succesfully transformed Matrix from (45, 77760) to (45, 100)

float32
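The functions `pca` and `pca_transform` are defined in `functions/pca.py` and are not shown here. For orientation, below is a minimal sketch of PCA computed via the singular value decomposition (SVD) of the centered training matrix. It is an assumption about the general approach, not the project's implementation. The names `pca_svd` and `pca_project` are hypothetical, and random data stands in for the face images. After centering, 120 samples give a matrix of rank at most 119, so at most 119 components carry variance. Keeping 100 components stays within that limit.

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA of an already centered data matrix X (n_samples x n_features) via thin SVD."""
    # X = U @ diag(S) @ Vt; the rows of Vt are the principal directions, sorted by S.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:n_components].T               # (n_features, n_components)
    X_reduced = X @ V_k                     # scores, equal to U[:, :k] * S[:k]
    eigenvalues = S**2 / (X.shape[0] - 1)   # eigenvalues of the sample covariance matrix
    explained_ratio = eigenvalues / eigenvalues.sum()
    return V_k, X_reduced, eigenvalues[:n_components], explained_ratio[:n_components]

def pca_project(X_new, V_k):
    """Project new samples (preprocessed with the training statistics) onto the components."""
    return X_new @ V_k

# Usage with random stand-in data (120 training / 45 test samples, 500 features).
rng = np.random.default_rng(seed=1)
train = rng.normal(size=(120, 500))
test = rng.normal(size=(45, 500))
mean = train.mean(axis=0)
V_k, train_reduced, eigvals, ratio = pca_svd(train - mean, n_components=100)
test_reduced = pca_project(test - mean, V_k)
print(train_reduced.shape, test_reduced.shape)  # (120, 100) (45, 100)
print(ratio[:5])
```

The test data is projected with the components learned from the training data, mirroring `pca_transform(final_test, V_reduced)` above. Projecting with components fitted on the test set would make the two feature spaces incompatible.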


---

## <a id="knn"></a> 3. k-Nearest-Neighbor

In [4]:
# Alternatively, run the whole script instead of importing the function
# (MacOS: forward slash; Windows: backslash):
# %run functions/knn.py
# %run functions\knn.py
from functions.knn import knn_classifier

knn_classifier(train_reduced, test_reduced, train_labels, 3)

UFuncTypeError: ufunc 'subtract' did not contain a loop with signature matching types (dtype('float32'), dtype('<U9')) -> None
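The error above is raised when NumPy tries to subtract an array of strings (`dtype('<U9')`, i.e. strings of up to 9 characters) from the `float32` feature matrix. This suggests that `train_labels` ends up in the distance computation inside `knn_classifier`. One possible cause is that the arguments are passed in a different order than the function expects, but this cannot be confirmed without the source of `functions/knn.py`. As a reference, the following minimal sketch shows a k-NN classifier in which the labels are used only for voting. The function name `knn_predict`, its argument order (training features, training labels, test features, k) and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, test_X, k=3):
    """Hypothetical k-NN classifier: majority vote among the k nearest training samples."""
    # Squared Euclidean distances between every test and training sample, shape (n_test, n_train).
    # Only the numeric feature matrices enter this computation, never the labels.
    d2 = (
        np.sum(test_X**2, axis=1)[:, None]
        - 2.0 * test_X @ train_X.T
        + np.sum(train_X**2, axis=1)[None, :]
    )
    # The square root is skipped because it does not change the neighbor ranking.
    nearest = np.argsort(d2, axis=1)[:, :k]
    predictions = []
    for row in nearest:
        # Labels may be strings; on equal counts, Counter returns the label seen first (the closer one).
        votes = Counter(train_y[row])
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)

# Toy example with made-up 2D features and string labels.
train_X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
train_y = np.array(["s1", "s1", "s2", "s2", "s2"])
test_X = np.array([[0.2, 0.3], [5.5, 5.5]])
test_y = np.array(["s1", "s2"])
pred = knn_predict(train_X, train_y, test_X, k=3)
print(pred)                     # ['s1' 's2']
print(np.mean(pred == test_y))  # 1.0
```

The last line shows how accuracy can be computed as the fraction of correct predictions. The notebook's own accuracy evaluation may differ.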

---

## <a id="testing"></a> 4. Model testing

---

## <a id="accuracy"></a> 5. Accuracy evaluation

---

## <a id="further-analysis"></a> 6. Further analysis

In [None]:
# for MacOS users, the path separator is a forward slash, please uncomment the following line and comment the other one:
# %run functions/furtheranalysis.py

%run functions\furtheranalysis.py