# Face recognition using k-Nearest Neighbor

---

### Contents
<ol>
    <li><a href="#image-loading" style="color: currentColor">Image loading</a></li>
    <li><a href="#data-preprocessing" style="color: currentColor">Data preprocessing</a></li>
    <li><a href="#pca" style="color: currentColor">Principal Component Analysis</a></li>
    <li><a href="#knn" style="color: currentColor">kNN-algorithm</a></li>
    <li><a href="#testing" style="color: currentColor">Model testing</a></li>
    <li><a href="#accuracy" style="color: currentColor">Accuracy evaluation</a></li>
    <li><a href="#further-analysis" style="color: currentColor">Further Analysis</a></li>
</ol>
<br>

---

### Libraries

In [1]:
import os
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

---

## <a id="image-loading"></a> 1. Image loading



In [2]:
# Path to the dataset directory, stored as a string.
image_path = "/Users/matspanke/Documents/GitHub/topic01_team01/datasets"

# Collect only the .gif files, i.e. every image in the dataset.
image_files = [f for f in os.listdir(image_path) if f.endswith(".gif")]

# Store the unprocessed images.
all_raw_images = []

# Sort the file names: os.listdir returns entries in arbitrary order,
# so sorting keeps the image order consistent across operating systems.
image_files.sort()

# Build the full path for each .gif file.
joined_img_path = [os.path.join(image_path, f) for f in image_files]

for img_path in joined_img_path:
    # Get the image data.
    with Image.open(img_path) as img:
        # Ensure grayscale.
        img = img.convert("L")
        
        # Convert the image object to a matrix representation
        # of the pixel values of the image.
        image_data = np.array(img) 

        # Add the data to all loaded images.
        all_raw_images.append(image_data)
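
The conversion step above can be checked on a small in-memory image. This is a minimal sketch, assuming only Pillow and NumPy; the 4 x 3 image and the names `demo_img` and `demo_array` are made up for illustration and are not part of the dataset. It shows that Pillow reports `size` as (width, height), while the resulting NumPy matrix has shape (height, width).

```python
import numpy as np
from PIL import Image

# Hypothetical stand-in for one dataset image: 4 pixels wide, 3 pixels high,
# created directly in 8-bit grayscale mode ("L").
demo_img = Image.new("L", (4, 3), color=128)

# Same conversion as in the loading loop.
demo_array = np.array(demo_img)

print(demo_img.size)     # (4, 3)  -> (width, height)
print(demo_array.shape)  # (3, 4)  -> (rows, columns) = (height, width)
print(demo_array.dtype)  # uint8
```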

---

## <a id="data-preprocessing"></a> 2. Data preprocessing 



### Test / Training Split

In [None]:
# Total number of images.
N = 165

# Group the image indices by individual:
# 165 images = 15 people x 11 images each.
test_img_indices = [[i for i in range(N) if i // 11 == j] for j in range(15)]

# For each individual, draw 3 random image indices for the test set.
# The test list then holds 45 indices, 3 per individual.
test_raw_images = [int(i) for individual in test_img_indices for i in np.random.choice(individual, 3, replace=False)]

# The training set contains every index that is not in the test set.
train_raw_images = [i for i in range(N) if i not in test_raw_images]
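
The split above changes on every run because it draws from NumPy's global random state. The following is a sketch of a repeatable variant, assuming the same layout of 15 people with 11 consecutive images each. The seed `0` and the names `rng`, `test_idx`, and `train_idx` are illustrative choices, not part of the notebook.

```python
import numpy as np

# Assumed dataset layout, taken from the split cell: 165 images,
# stored as 15 consecutive blocks of 11 images per person.
N, PER_PERSON, PEOPLE, TEST_PER_PERSON = 165, 11, 15, 3

# A seeded generator makes the random draw repeatable (the seed is arbitrary).
rng = np.random.default_rng(0)

test_idx = []
for person in range(PEOPLE):
    start = person * PER_PERSON
    person_idx = np.arange(start, start + PER_PERSON)
    # Draw 3 distinct image indices for this person.
    chosen = rng.choice(person_idx, TEST_PER_PERSON, replace=False)
    test_idx.extend(int(i) for i in chosen)

# A set gives fast membership tests when building the training indices.
test_set = set(test_idx)
train_idx = [i for i in range(N) if i not in test_set]

print(len(test_idx), len(train_idx))  # 45 120
```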

### Image Preprocessing

In [None]:
# Flatten each image into a row vector so all images fit into one matrix.
X = np.array([img.flatten() for img in all_raw_images], dtype=np.float64)

# Calculation of the mean face.
# Each row represents one image, so X has shape (165, height * width).
# The mean face is therefore the mean of each column.
mean_face = np.mean(X, axis=0)

# Center the data.
X_centered = X - mean_face
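
The centering step can be illustrated on a tiny matrix. This is a minimal sketch with made-up numbers, where `X_toy` stands in for the image matrix (3 "images" with 4 "pixels" each). Subtracting the column means relies on NumPy broadcasting the mean vector across all rows, and every column of the centered matrix then has mean zero.

```python
import numpy as np

# Toy stand-in for the image matrix: one row per image, one column per pixel.
X_toy = np.array([[0, 2, 4, 6],
                  [2, 4, 6, 8],
                  [4, 6, 8, 10]], dtype=np.float64)

# Column-wise mean: the "mean face" of the toy data, shape (4,).
mean_toy = X_toy.mean(axis=0)

# Broadcasting subtracts the mean vector from every row.
X_toy_centered = X_toy - mean_toy

print(mean_toy)                     # [2. 4. 6. 8.]
print(X_toy_centered.mean(axis=0))  # [0. 0. 0. 0.]
```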

---

## <a id="pca"></a> 3. Principal component analysis

---

## <a id="knn"></a> 4. kNN-algorithm

---

## <a id="testing"></a> 5. Model testing

---

## <a id="accuracy"></a> 6. Accuracy evaluation

---

## <a id="further-analysis"></a> 7. Further analysis