# How easy is it to test the accuracy of a face recognition algorithm?

## Part 1 - taking a sample set and finding the faces

### Background

Face recognition (FR) algorithms are becoming more and more common in the world today. From fraud detection at passport offices to real-time face recognition at sports stadiums, you have probably been matched before! Given their increased use, we should also ask how accurate they are. Face recognition is a subset of the broader field of [biometrics](https://en.wikipedia.org/wiki/Biometrics), which aims to recognise a person based on analysis of their unique biological characteristics.

[Open source algorithms](https://github.com/ageitgey/face_recognition) are readily available, such as this example built on dlib. dlib is an open source C++ toolkit that includes machine learning and image processing tools. The 'face_recognition' module is a wrapper that gives access to this functionality from Python.

[Labelled data sets](https://sites.google.com/view/sof-dataset) are also available and provide a useful way to test the accuracy of algorithms before putting them to use. Because the data is labelled, the "ground truth" is known: for any pair of images, testers know whether both show the same person or two different people. The accuracy of the score produced by an algorithm can therefore be measured easily, since a low score should result if the faces are different, and a high score should result if the images are of the same person. Doing this over a large sample set gives a statistical picture of how accurately the algorithm performs. The labelling of the data set referenced here is explained [in this report](http://www.cse.yorku.ca/~mafifi/TheSpecsonFace.pdf).
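
As a rough illustration of how labels provide ground truth, the sketch below splits every pair of images into "mated" pairs (same subject) and "non-mated" pairs (different subjects). The function name `split_pairs` and the shortened file names are made up for this example and are not part of the data set or of any library.

```python
from itertools import combinations

# Hypothetical labelled sample: (subject_id, image_name).
# The image names are shortened placeholders, not real file names.
labelled_images = [
    ("MotM", "MotM_a.jpg"),
    ("MotM", "MotM_b.jpg"),
    ("PetK", "PetK_a.jpg"),
    ("MosK", "MosK_a.jpg"),
]


def split_pairs(images):
    """Split every unordered image pair into mated and non-mated lists."""
    mated, non_mated = [], []
    for (id_a, img_a), (id_b, img_b) in combinations(images, 2):
        # The label (subject ID) is the ground truth for the pair.
        target = mated if id_a == id_b else non_mated
        target.append((img_a, img_b))
    return mated, non_mated


mated_pairs, non_mated_pairs = split_pairs(labelled_images)
print(len(mated_pairs), "mated,", len(non_mated_pairs), "non-mated")
```

Note that the number of pairs grows quadratically with the number of images, so non-mated pairs quickly outnumber mated ones.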

Ultimately, the [receiver operating characteristic](https://en.wikipedia.org/wiki/Receiver_operating_characteristic) determined for an algorithm gives an overview of its performance. False acceptance rate (FAR) and false rejection rate (FRR) are commonly used to check the performance of an algorithm. In layman's terms:

* False acceptance -> A high score from the algorithm despite the images being of different persons.
* False rejection -> A low score from the algorithm despite the images being of the same person.
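
The two definitions above can be sketched in a few lines. This is a minimal example under stated assumptions: the scores are made up, a higher score means "more similar", and a pair counts as a match when its score reaches a chosen threshold. The function name `far_frr` is hypothetical.

```python
def far_frr(mated_scores, non_mated_scores, threshold):
    """Return (FAR, FRR) when scores >= threshold count as a match."""
    # False accept: different people, but the score says "match".
    false_accepts = sum(score >= threshold for score in non_mated_scores)
    # False reject: same person, but the score says "no match".
    false_rejects = sum(score < threshold for score in mated_scores)
    return (false_accepts / len(non_mated_scores),
            false_rejects / len(mated_scores))


# Made-up similarity scores in [0, 1], for illustration only.
mated_scores = [0.91, 0.85, 0.40, 0.77]
non_mated_scores = [0.10, 0.35, 0.62, 0.20, 0.05]

far, frr = far_frr(mated_scores, non_mated_scores, threshold=0.6)
print(f"FAR={far:.2f} FRR={frr:.2f}")  # FAR=0.20 FRR=0.25
```

Raising the threshold lowers FAR and raises FRR, and vice versa. If an algorithm reports a distance (lower means more similar) rather than a similarity score, the comparisons flip.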


### Aim 

We want to understand how we can quantify the accuracy of an open source face recognition algorithm. Using the provided metadata as a guideline, we should also be able to understand what impact the quality of the image data has on the accuracy of matching.


In [1]:
# Import the data extractor module
import fr_data_extractor as dext

In [2]:
# Define the metadata columns as per the spec provided
metadata_columns = ['subject_id', 'image_sequence_number', 'gender', 'age', 'lighting', 'view', 'cropped', 'facial_emotion', 'year', 'part', 'occlusion', 'image_filters', 'level_of_difficulty']

In [3]:
# Read the folder of images (default kwarg for image_type is jpg, which is suitable for this dataset)
dext_read = dext.FrDataExtractor(image_path='./images', columns=metadata_columns) 

In [4]:
# Generate a CSV file from the image file names
dext_read.generate_csv("image_metadata.csv", ",") # should write a fresh CSV to local directory
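
The internals of `fr_data_extractor` are not shown here. As a rough idea of how metadata might be recovered from a file name, the sketch below splits the name on underscores and pairs each field with a column name. `parse_image_name` is a hypothetical helper for illustration, not the module's actual implementation.

```python
from pathlib import Path

metadata_columns = ['subject_id', 'image_sequence_number', 'gender', 'age',
                    'lighting', 'view', 'cropped', 'facial_emotion', 'year',
                    'part', 'occlusion', 'image_filters', 'level_of_difficulty']


def parse_image_name(image_path, columns):
    """Map the underscore-separated fields of a file name onto column names."""
    fields = Path(image_path).stem.split("_")
    if len(fields) != len(columns):
        raise ValueError(
            f"expected {len(columns)} fields, got {len(fields)}: {image_path}")
    record = dict(zip(columns, fields))
    record["image_path"] = str(image_path)
    return record


record = parse_image_name(
    "./images/MotM_01984_m_21_i_fr_nc_sd_2015_1_e0_Gn_e.jpg", metadata_columns)
print(record["subject_id"], record["year"], record["level_of_difficulty"])
# MotM 2015 e
```

Every value is still a string at this point; converting fields such as `age` and `year` to integers is left to the CSV reader.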

In [5]:
# Import pandas to read the CSV into a dataframe
import pandas as pd

In [6]:
fr_df = pd.read_csv('image_metadata.csv')

In [7]:
fr_df.columns # Check the columns of the data frame are as expected

Index(['subject_id', 'image_sequence_number', 'gender', 'age', 'lighting',
       'view', 'cropped', 'facial_emotion', 'year', 'part', 'occlusion',
       'image_filters', 'level_of_difficulty', 'image_path'],
      dtype='object')

In [8]:
# Check the shape 
fr_df.shape

(42585, 14)

In [9]:
# Check the first 10 rows
fr_df.head(10)

Unnamed: 0,subject_id,image_sequence_number,gender,age,lighting,view,cropped,facial_emotion,year,part,occlusion,image_filters,level_of_difficulty,image_path
0,AboA,152,m,33,i,nf,nc,hp,2016,2,en,nl,m,./images/AboA_00152_m_33_i_nf_nc_hp_2016_2_en_...
1,MonD,1961,f,30,o,nf,nc,hp,2016,1,en,nl,h,./images/MonD_01961_f_30_o_nf_nc_hp_2016_1_en_...
2,BahA,594,m,36,i,fr,nc,no,2016,2,e0,Gn,e,./images/BahA_00594_m_36_i_fr_nc_no_2016_2_e0_...
3,YosB,2527,f,20,i,nf,nc,hp,2016,2,e0,Gs,m,./images/YosB_02527_f_20_i_nf_nc_hp_2016_2_e0_...
4,SarE,2381,f,22,i,nf,nc,hp,2016,2,e0,Ps,e,./images/SarE_02381_f_22_i_nf_nc_hp_2016_2_e0_...
5,BahA,717,m,36,i,fr,nc,no,2016,2,en,nl,e,./images/BahA_00717_m_36_i_fr_nc_no_2016_2_en_...
6,AhmI,328,m,20,o,nf,nc,hp,2009,1,em,nl,e,./images/AhmI_00328_m_20_o_nf_nc_hp_2009_1_em_...
7,MotM,2007,m,21,i,fr,nc,hp,2015,1,e0,Gs,h,./images/MotM_02007_m_21_i_fr_nc_hp_2015_1_e0_...
8,MotM,1984,m,21,i,fr,nc,sd,2015,1,e0,Gn,e,./images/MotM_01984_m_21_i_fr_nc_sd_2015_1_e0_...
9,AhmS,356,m,20,o,nf,nc,hp,2015,1,e0,Gn,m,./images/AhmS_00356_m_20_o_nf_nc_hp_2015_1_e0_...


In [10]:
# Check the datatypes
fr_df.dtypes

subject_id               object
image_sequence_number     int64
gender                   object
age                       int64
lighting                 object
view                     object
cropped                  object
facial_emotion           object
year                      int64
part                      int64
occlusion                object
image_filters            object
level_of_difficulty      object
image_path               object
dtype: object

In [11]:
# Import the face matcher class
import face_matcher as fm

In [12]:
matcher = fm.FaceMatcher()
# Test finding faces with the first 5 image paths in df
first_five_face_locs = fr_df['image_path'][0:5].tolist() # convert to list to loop over and input to find_faces function
for face_loc in first_five_face_locs:
    print(matcher.find_faces(face_loc))

[(41, 381, 183, 240)]
[(187, 225, 391, 21)]
None
None
[(136, 365, 254, 247)]
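
Each tuple above looks like a bounding box. The sketch below assumes that `find_faces` passes through the `(top, right, bottom, left)` order used by the `face_recognition` library, which is consistent with the numbers shown. `FaceBox` is a hypothetical helper used only to make the box size readable.

```python
from typing import NamedTuple


class FaceBox(NamedTuple):
    """Bounding box in the assumed (top, right, bottom, left) pixel order."""
    top: int
    right: int
    bottom: int
    left: int

    @property
    def width(self):
        return self.right - self.left

    @property
    def height(self):
        return self.bottom - self.top


# First result from the output above.
box = FaceBox(*(41, 381, 183, 240))
print(box.width, box.height)  # 141 142
```

A result of `None` means no face was located in that image, so any code consuming these results needs to handle that case.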


In [13]:
# Face finding takes about 1 second to compute for each image, which is quite time consuming
# Let's reduce the size of the sample set by taking only the images marked as easy and in year 2015
fr_easy_df = fr_df[fr_df['level_of_difficulty'] == "e"]
fr_easy_df = fr_easy_df[fr_easy_df['year'] == 2015]
fr_easy_df.shape

(2085, 14)

In [14]:
fr_easy_df.head(3)

Unnamed: 0,subject_id,image_sequence_number,gender,age,lighting,view,cropped,facial_emotion,year,part,occlusion,image_filters,level_of_difficulty,image_path
8,MotM,1984,m,21,i,fr,nc,sd,2015,1,e0,Gn,e,./images/MotM_01984_m_21_i_fr_nc_sd_2015_1_e0_...
14,PetK,2257,m,20,i,nf,nc,no,2015,1,e0,Ps,e,./images/PetK_02257_m_20_i_nf_nc_no_2015_1_e0_...
60,MosK,1975,m,20,o,nf,nc,sd,2015,1,e0,Gs,e,./images/MosK_01975_m_20_o_nf_nc_sd_2015_1_e0_...


In [15]:
# Better to reindex the dataframe
fr_easy_df = fr_easy_df.reset_index(drop=True)
fr_easy_df.head(3)

Unnamed: 0,subject_id,image_sequence_number,gender,age,lighting,view,cropped,facial_emotion,year,part,occlusion,image_filters,level_of_difficulty,image_path
0,MotM,1984,m,21,i,fr,nc,sd,2015,1,e0,Gn,e,./images/MotM_01984_m_21_i_fr_nc_sd_2015_1_e0_...
1,PetK,2257,m,20,i,nf,nc,no,2015,1,e0,Ps,e,./images/PetK_02257_m_20_i_nf_nc_no_2015_1_e0_...
2,MosK,1975,m,20,o,nf,nc,sd,2015,1,e0,Gs,e,./images/MosK_01975_m_20_o_nf_nc_sd_2015_1_e0_...


In [16]:
# group by the subject ID
grouped_easy_df = fr_easy_df.groupby('subject_id')

In [17]:
# Count the number of identities in the mix
subjects = grouped_easy_df.groups.keys()
len(subjects)

66

In [18]:
# This should work quite well as a small sample set for both mates matching and cross matching assessments
# Here we want to add the found faces to a new column. This is effectively an enrollment of the face locations.
# A function will be defined so it can be tested on a small set first before proceeding to the full set since
# this operation will take quite some time...
# If the faces are found correctly, we can now process over the full subset and append the column to our dataframe
faces_found_col = matcher.find_faces_batch(fr_easy_df['image_path'].tolist()[0:5], do_log=True, track_progress=2)
print(faces_found_col)  # Test for 5

Processing... ./images/MotM_01984_m_21_i_fr_nc_sd_2015_1_e0_Gn_e.jpg
Faces found [(159, 366, 404, 122)]
Processing... ./images/PetK_02257_m_20_i_nf_nc_no_2015_1_e0_Ps_e.jpg
Faces found [(117, 432, 287, 262)]
2 processed in 5 seconds
Processing... ./images/MosK_01975_m_20_o_nf_nc_sd_2015_1_e0_Gs_e.jpg
Faces found [(125, 452, 329, 248)]
Processing... ./images/MeiH_01356_f_20_i_nf_nc_sd_2015_1_en_nl_e.jpg
Faces found [(136, 459, 429, 165)]
4 processed in 11 seconds
Processing... ./images/GeoS_00791_m_22_i_fr_nc_no_2015_1_e0_Gs_e.jpg
Faces found [(135, 416, 379, 171)]
All completed. Average processing time of 2.6 seconds per image
[[(159, 366, 404, 122)], [(117, 432, 287, 262)], [(125, 452, 329, 248)], [(136, 459, 429, 165)], [(135, 416, 379, 171)]]


In [19]:
# Seems quite slow, let's try the hog model, which is less accurate but faster
faces_found_col = matcher.find_faces_batch(fr_easy_df['image_path'].tolist()[0:5], do_log=True, track_progress=2, match_model="hog")
print(faces_found_col)  # Test for 5

Processing... ./images/MotM_01984_m_21_i_fr_nc_sd_2015_1_e0_Gn_e.jpg
Faces found [(171, 379, 439, 111)]
Processing... ./images/PetK_02257_m_20_i_nf_nc_no_2015_1_e0_Ps_e.jpg
Faces found [(118, 464, 341, 241)]
2 processed in 0 seconds
Processing... ./images/MosK_01975_m_20_o_nf_nc_sd_2015_1_e0_Gs_e.jpg
Faces found [(142, 489, 365, 266)]
Processing... ./images/MeiH_01356_f_20_i_nf_nc_sd_2015_1_en_nl_e.jpg
Faces found [(170, 491, 480, 170)]
4 processed in 0 seconds
Processing... ./images/GeoS_00791_m_22_i_fr_nc_no_2015_1_e0_Gs_e.jpg
Faces found [(142, 409, 409, 141)]
All completed. Average processing time of 0.0 seconds per image
[[(171, 379, 439, 111)], [(118, 464, 341, 241)], [(142, 489, 365, 266)], [(170, 491, 480, 170)], [(142, 409, 409, 141)]]


In [20]:
faces_found_col = matcher.find_faces_batch(fr_easy_df['image_path'].tolist(), match_model="hog") # suppress logging in this case
len(faces_found_col)

200 processed in 31 seconds
400 processed in 62 seconds
600 processed in 93 seconds
800 processed in 125 seconds
1000 processed in 155 seconds
1200 processed in 186 seconds
1400 processed in 217 seconds
1600 processed in 247 seconds
1800 processed in 277 seconds
2000 processed in 306 seconds
All completed. Average processing time of 0.15299760191846523 seconds per image


2085
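
To put the two timings in perspective, the arithmetic below projects the total run time from the per-image averages reported above (2.6 seconds with the default model, about 0.153 seconds with hog). Treat these as rough estimates only: the 2.6 second figure comes from a five-image trial, and `projected_minutes` is a hypothetical helper.

```python
def projected_minutes(n_images, seconds_per_image):
    """Estimate total run time in minutes from a per-image average."""
    return n_images * seconds_per_image / 60


subset_size = 2085    # easy images from 2015
full_size = 42585     # all images in the data set
default_rate = 2.6    # seconds per image, from the 5-image trial
hog_rate = 0.153      # seconds per image, from the full subset run

for label, n in (("2015 easy subset", subset_size),
                 ("full data set", full_size)):
    print(f"{label}: default ~{projected_minutes(n, default_rate):.0f} min, "
          f"hog ~{projected_minutes(n, hog_rate):.0f} min")
```

On these figures the default model would need roughly an hour and a half for the subset alone, which is why the faster model was used here.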

In [34]:
# Add the results as a new column
fr_easy_df['face_locations'] = faces_found_col
fr_easy_df.head(3)

Unnamed: 0,subject_id,image_sequence_number,gender,age,lighting,view,cropped,facial_emotion,year,part,occlusion,image_filters,level_of_difficulty,image_path,face_locations,faces_found
0,MotM,1984,m,21,i,fr,nc,sd,2015,1,e0,Gn,e,./images/MotM_01984_m_21_i_fr_nc_sd_2015_1_e0_...,"[(171, 379, 439, 111)]",1
1,PetK,2257,m,20,i,nf,nc,no,2015,1,e0,Ps,e,./images/PetK_02257_m_20_i_nf_nc_no_2015_1_e0_...,"[(118, 464, 341, 241)]",1
2,MosK,1975,m,20,o,nf,nc,sd,2015,1,e0,Gs,e,./images/MosK_01975_m_20_o_nf_nc_sd_2015_1_e0_...,"[(142, 489, 365, 266)]",1


In [22]:
# Also add the number of faces found as an integer
# Ternary expression handles any values of None, setting those to 0 faces found
faces_found_num_col = [len(x) if x else 0 for x in faces_found_col]
fr_easy_df['faces_found'] = faces_found_num_col
fr_easy_df.head(3)

Unnamed: 0,subject_id,image_sequence_number,gender,age,lighting,view,cropped,facial_emotion,year,part,occlusion,image_filters,level_of_difficulty,image_path,face_locations,faces_found
0,MotM,1984,m,21,i,fr,nc,sd,2015,1,e0,Gn,e,./images/MotM_01984_m_21_i_fr_nc_sd_2015_1_e0_...,"[(171, 379, 439, 111)]",1
1,PetK,2257,m,20,i,nf,nc,no,2015,1,e0,Ps,e,./images/PetK_02257_m_20_i_nf_nc_no_2015_1_e0_...,"[(118, 464, 341, 241)]",1
2,MosK,1975,m,20,o,nf,nc,sd,2015,1,e0,Gs,e,./images/MosK_01975_m_20_o_nf_nc_sd_2015_1_e0_...,"[(142, 489, 365, 266)]",1


In [23]:
# Have a peek at the images where no faces were found
no_faces = fr_easy_df[fr_easy_df['faces_found'] == 0].reset_index(drop=True)
print(len(no_faces), "\n", no_faces['image_path'][0], "\n", no_faces['image_path'][1], "\n", no_faces['image_path'][2])

113 
 ./images/DoaB_00744_f_19_o_nf_nc_no_2015_1_e0_Ps_e.jpg 
 ./images/DoaB_00746_f_19_o_nf_nc_no_2015_1_e0_Gn_e.jpg 
 ./images/YosA_02459_f_19_o_nf_nc_hp_2015_1_e0_Ps_e.jpg


<p float="left">
  <img style="display: inline;" src="./images/DoaB_00744_f_19_o_nf_nc_no_2015_1_e0_Ps_e.jpg" width="200" />
  <img style="display: inline;" src="./images/DoaB_00746_f_19_o_nf_nc_no_2015_1_e0_Gn_e.jpg" width="200" /> 
  <img style="display: inline;" src="./images/YosA_02459_f_19_o_nf_nc_hp_2015_1_e0_Ps_e.jpg" width="200" />
</p>

In [24]:
# Have a peek at the images where multiple faces were found
many_faces = fr_easy_df[fr_easy_df['faces_found'] > 1].reset_index(drop=True)
print(len(many_faces), "\n", many_faces['image_path'][0], "\n", many_faces['image_path'][1], "\n", many_faces['image_path'][2])

11 
 ./images/GeoS_00798_m_22_o_fr_nc_hp_2015_1_e0_Gs_e.jpg 
 ./images/MazR_01333_m_24_o_nf_nc_hp_2015_1_en_nl_e.jpg 
 ./images/PetJ_02247_m_20_o_fr_cr_hp_2015_1_e0_Gs_e.jpg


<p float="left">
  <img style="display: inline;" src="./images/GeoS_00798_m_22_o_fr_nc_hp_2015_1_e0_Gs_e.jpg" width="200" />
  <img style="display: inline;" src="./images/MazR_01333_m_24_o_nf_nc_hp_2015_1_en_nl_e.jpg" width="200" /> 
  <img style="display: inline;" src="./images/PetJ_02247_m_20_o_fr_cr_hp_2015_1_e0_Gs_e.jpg" width="200" />
</p>

In [25]:
# We should remove any instances where the number of faces found is not 1
fr_easy_df_one_face_only = fr_easy_df[fr_easy_df['faces_found'] == 1].reset_index(drop=True)
fr_easy_df_one_face_only.head(3)

Unnamed: 0,subject_id,image_sequence_number,gender,age,lighting,view,cropped,facial_emotion,year,part,occlusion,image_filters,level_of_difficulty,image_path,face_locations,faces_found
0,MotM,1984,m,21,i,fr,nc,sd,2015,1,e0,Gn,e,./images/MotM_01984_m_21_i_fr_nc_sd_2015_1_e0_...,"[(171, 379, 439, 111)]",1
1,PetK,2257,m,20,i,nf,nc,no,2015,1,e0,Ps,e,./images/PetK_02257_m_20_i_nf_nc_no_2015_1_e0_...,"[(118, 464, 341, 241)]",1
2,MosK,1975,m,20,o,nf,nc,sd,2015,1,e0,Gs,e,./images/MosK_01975_m_20_o_nf_nc_sd_2015_1_e0_...,"[(142, 489, 365, 266)]",1


In [26]:
fr_easy_df_one_face_only.shape

(1961, 16)

In [27]:
# Store the results up to now in a CSV for safe keeping
fr_easy_df_one_face_only.to_csv(index=False, sep=",", path_or_buf="./easy_faces_2015_one_face_only.csv")
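
One thing to watch when reloading this file: a CSV stores the `face_locations` lists as text, so they come back as strings rather than lists of tuples. The sketch below uses a one-row stand-in for the saved file (columns trimmed for brevity) and parses the value back with `ast.literal_eval` from the standard library.

```python
import ast
import csv
import io

# Stand-in for one row of the saved CSV, with most columns omitted.
csv_text = (
    'subject_id,face_locations,faces_found\n'
    'MotM,"[(171, 379, 439, 111)]",1\n'
)

rows = list(csv.DictReader(io.StringIO(csv_text)))
raw = rows[0]["face_locations"]
print(type(raw).__name__)  # str

# literal_eval only evaluates Python literals, so it is safer than eval().
face_locations = ast.literal_eval(raw)
print(face_locations[0])  # (171, 379, 439, 111)
```

With pandas, passing `converters={'face_locations': ast.literal_eval}` to `pd.read_csv` should give the same result at load time.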

Please refer to part 2 which will focus on matching...