## Testing for bias in open-source face recognition models

In [1]:
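import numpy as np
import os
import sys

# sys.path entries must be directories, not file paths. FNR.py and FPR.py
# are assumed to sit in the same directory as this notebook.
sys.path.append('.')

# Import the two evaluation functions from their modules.
from FNR import FNR
from FPR import FPR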




The dataset used is a small subset of the SynthPar2 dataset: https://huggingface.co/datasets/pravsels/synthpar

The dataset is in the form:

```
synthpar/ST1/1.png
...
synthpar/ST1/n.png
...
synthpar/ST2/1.png
...
synthpar/ST2/n.png
.
.
.
synthpar/ST8
```

where each ST (skin tone) subfolder contains 13 identities with 8 variations of each ID, i.e. 104 images per group. There are 8 skin tone groups, ST1 to ST8.
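
As a quick sanity check on this layout, the sketch below counts the PNG files in each ST subfolder. The helper name `count_images_per_group` is hypothetical and not part of the original code; it assumes only the `synthpar/ST<k>/*.png` layout shown above.

```python
from pathlib import Path


def count_images_per_group(root="synthpar"):
    """Return {group_name: number_of_png_files} for each ST subfolder of root.

    Hypothetical helper: assumes the layout root/ST<k>/<image>.png.
    """
    counts = {}
    for group_dir in sorted(Path(root).glob("ST*")):
        if group_dir.is_dir():
            # Count only PNG files directly inside the group folder.
            counts[group_dir.name] = sum(1 for _ in group_dir.glob("*.png"))
    return counts
```

With 13 identities and 8 variations each, every group in this subset should report 104 images.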

In this experiment we use the VGG-Face model, which is available through DeepFace: https://github.com/serengil/deepface/tree/master/deepface/models/facial_recognition
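
For orientation, a single comparison with this model might look like the sketch below. The wrapper `is_same_person` and its `verify_fn` parameter are hypothetical names introduced here for illustration; the actual `FNR` and `FPR` functions may call DeepFace differently. By default the wrapper uses `DeepFace.verify` with `model_name="VGG-Face"` and reads the `verified` field of the returned dictionary.

```python
def is_same_person(img1_path, img2_path, verify_fn=None):
    """Return True if the model judges both images to show the same person.

    Hypothetical wrapper: verify_fn defaults to DeepFace.verify, and can be
    swapped for a stub when DeepFace is not installed.
    """
    if verify_fn is None:
        # Imported lazily so the function can be defined without DeepFace.
        from deepface import DeepFace
        verify_fn = DeepFace.verify

    result = verify_fn(
        img1_path=img1_path,
        img2_path=img2_path,
        model_name="VGG-Face",
    )
    # DeepFace.verify returns a dict; "verified" holds the boolean decision.
    return bool(result["verified"])
```

Passing a stub as `verify_fn` keeps the pair-generation logic testable without loading the model weights.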

In [2]:
FNR(dataset_dir = 'synthpar/ST1', use_multiprocessing=True, num_cores=5)
FPR(dataset_dir = 'synthpar/ST1', use_multiprocessing=True, num_cores=5, percentage=10)

364 total input pairs from 13 IDs.


Processing input pairs: 100%|██████████| 364/364 [02:00<00:00,  3.03pair/s]


False Negatives: 155
Mean FNR across all IDs in group synthpar/ST1: 42.5824%
Total number of input pairs: 4992
Selected 499 pairs for evaluation (10% of total)


Processing input pairs: 100%|██████████| 499/499 [02:28<00:00,  3.37pair/s]

False Positives: 33
True Negatives: 466
Mean FPR across all IDs in group synthpar/ST1: 6.6132%





The FNR is calculated by processing every possible same-identity input pair in each ST group. For example, ST1 contains 13 unique IDs with 8 variations of each (as is the case for all ST groups). The FNR function generates every pairwise comparison among the 8 variations of an ID, i.e. 28 input pairs per ID and 13 × 28 = 364 pairs per group, and uses DeepFace `verify` to check whether each pair shows the same individual. Because every pair in this set does show the same person, any prediction that a pair is not the same person is a false negative.
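
The pair counts and the reported rate can be reproduced with a short sketch. This is not the code in `FNR.py`; the function names and the placeholder file names are assumptions made for illustration, and only the counts (13 IDs, 8 variations, 155 false negatives) come from the run above.

```python
from itertools import combinations


def same_identity_pairs(images_by_id):
    """Return every unordered pair of images that share an identity."""
    pairs = []
    for images in images_by_id.values():
        # 8 variations give C(8, 2) = 28 pairs per identity.
        pairs.extend(combinations(images, 2))
    return pairs


def false_negative_rate(false_negatives, total_pairs):
    """FNR as a percentage: rejected same-identity pairs over all such pairs."""
    return 100.0 * false_negatives / total_pairs


# Placeholder file names standing in for 13 IDs with 8 variations each.
images_by_id = {
    f"id{i}": [f"id{i}_var{j}.png" for j in range(8)] for i in range(13)
}
pairs = same_identity_pairs(images_by_id)
print(len(pairs))                                       # 364
print(round(false_negative_rate(155, len(pairs)), 4))   # 42.5824
```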

The FNR across the first skin tone group is 42.6%, meaning the model fails to match an individual with themselves in more than two out of five comparisons. This is perhaps due to the challenging nature of the SynthPar dataset, as the variations of each individual are deliberately broad. The images also do not follow an ISO/IEC or NIST standard, which would be the case in most if not all biometric authentication applications.

The FPR is calculated in a similar fashion, but instead of comparing individuals to themselves, it compares individuals to other individuals. The function first computes all possible pairs of images from different individuals (excluding pairs of an individual with themselves) and then passes these pairs to `verify`; a false positive is a prediction that such a pair is the same person. To save on computation, the user can specify a percentage of the total possible pairs, and the function then evaluates a random sample of that size. Here, 10% of the 4992 possible pairs gives 499 pairs per group.
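
The figures of 4992 and 499 pairs can be checked with the following sketch. It is an illustration rather than the code in `FPR.py`: the function names, the `seed` parameter, and the placeholder file names are assumptions, and rounding the sample size down is inferred from the reported output (10% of 4992 gives 499).

```python
import random
from itertools import combinations


def different_identity_pairs(images_by_id):
    """Return every unordered pair of images belonging to different identities."""
    labelled = [
        (identity, image)
        for identity, images in images_by_id.items()
        for image in images
    ]
    return [
        (img_a, img_b)
        for (id_a, img_a), (id_b, img_b) in combinations(labelled, 2)
        if id_a != id_b  # drop same-identity pairs
    ]


def sample_pairs(pairs, percentage, seed=None):
    """Randomly pick `percentage` percent of the pairs, rounding down."""
    k = int(len(pairs) * percentage / 100)
    return random.Random(seed).sample(pairs, k)


# Placeholder file names standing in for 13 IDs with 8 variations each.
images_by_id = {
    f"id{i}": [f"id{i}_var{j}.png" for j in range(8)] for i in range(13)
}
all_pairs = different_identity_pairs(images_by_id)
selected = sample_pairs(all_pairs, percentage=10, seed=0)
print(len(all_pairs), len(selected))           # 4992 499
print(round(100.0 * 33 / len(selected), 4))    # 6.6132
```

Because only a 10% random sample is evaluated, the FPR values are estimates and would vary somewhat between runs unless the sampling is seeded.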

This process is then repeated for all ST groups.
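
The repetition could also be written as a loop, as in the sketch below. The helper `run_all_groups` is hypothetical; it takes the two evaluation functions as arguments and assumes they accept the keyword arguments used in the cells of this notebook (`dataset_dir`, `use_multiprocessing`, `num_cores`, `percentage`).

```python
def run_all_groups(fnr_fn, fpr_fn, root="synthpar", n_groups=8,
                   num_cores=5, percentage=10):
    """Run the FNR and FPR evaluations on ST1..ST<n_groups> in order."""
    for k in range(1, n_groups + 1):
        dataset_dir = f"{root}/ST{k}"
        fnr_fn(dataset_dir=dataset_dir, use_multiprocessing=True,
               num_cores=num_cores)
        fpr_fn(dataset_dir=dataset_dir, use_multiprocessing=True,
               num_cores=num_cores, percentage=percentage)
```

In this notebook it would be called as `run_all_groups(FNR, FPR)`; the cells below run each group separately so that each group's output stays with its own cell.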


In [3]:
FNR(dataset_dir = 'synthpar/ST2', use_multiprocessing=True, num_cores=5)
FPR(dataset_dir = 'synthpar/ST2', use_multiprocessing=True, num_cores=5, percentage=10)

364 total input pairs from 13 IDs.


Processing input pairs: 100%|██████████| 364/364 [01:54<00:00,  3.19pair/s]


False Negatives: 197
Mean FNR across all IDs in group synthpar/ST2: 54.1209%
Total number of input pairs: 4992
Selected 499 pairs for evaluation (10% of total)


Processing input pairs: 100%|██████████| 499/499 [02:23<00:00,  3.47pair/s]

False Positives: 32
True Negatives: 467
Mean FPR across all IDs in group synthpar/ST2: 6.4128%





In [4]:
FNR(dataset_dir = 'synthpar/ST3', use_multiprocessing=True, num_cores=5)
FPR(dataset_dir = 'synthpar/ST3', use_multiprocessing=True, num_cores=5, percentage=10)

364 total input pairs from 13 IDs.


Processing input pairs: 100%|██████████| 364/364 [01:51<00:00,  3.28pair/s]


False Negatives: 202
Mean FNR across all IDs in group synthpar/ST3: 55.4945%
Total number of input pairs: 4992
Selected 499 pairs for evaluation (10% of total)


Processing input pairs: 100%|██████████| 499/499 [02:24<00:00,  3.45pair/s]

False Positives: 35
True Negatives: 464
Mean FPR across all IDs in group synthpar/ST3: 7.0140%





In [5]:
FNR(dataset_dir = 'synthpar/ST4', use_multiprocessing=True, num_cores=5)
FPR(dataset_dir = 'synthpar/ST4', use_multiprocessing=True, num_cores=5, percentage=10)

364 total input pairs from 13 IDs.


Processing input pairs: 100%|██████████| 364/364 [01:52<00:00,  3.24pair/s]


False Negatives: 205
Mean FNR across all IDs in group synthpar/ST4: 56.3187%
Total number of input pairs: 4992
Selected 499 pairs for evaluation (10% of total)


Processing input pairs: 100%|██████████| 499/499 [02:23<00:00,  3.48pair/s]

False Positives: 45
True Negatives: 454
Mean FPR across all IDs in group synthpar/ST4: 9.0180%





In [6]:
FNR(dataset_dir = 'synthpar/ST5', use_multiprocessing=True, num_cores=5)
FPR(dataset_dir = 'synthpar/ST5', use_multiprocessing=True, num_cores=5, percentage=10)

364 total input pairs from 13 IDs.


Processing input pairs: 100%|██████████| 364/364 [01:50<00:00,  3.28pair/s]


False Negatives: 198
Mean FNR across all IDs in group synthpar/ST5: 54.3956%
Total number of input pairs: 4992
Selected 499 pairs for evaluation (10% of total)


Processing input pairs: 100%|██████████| 499/499 [02:23<00:00,  3.48pair/s]

False Positives: 29
True Negatives: 470
Mean FPR across all IDs in group synthpar/ST5: 5.8116%





In [7]:
FNR(dataset_dir = 'synthpar/ST6', use_multiprocessing=True, num_cores=5)
FPR(dataset_dir = 'synthpar/ST6', use_multiprocessing=True, num_cores=5, percentage=10)

364 total input pairs from 13 IDs.


Processing input pairs: 100%|██████████| 364/364 [01:50<00:00,  3.28pair/s]


False Negatives: 171
Mean FNR across all IDs in group synthpar/ST6: 46.9780%
Total number of input pairs: 4992
Selected 499 pairs for evaluation (10% of total)


Processing input pairs: 100%|██████████| 499/499 [02:20<00:00,  3.56pair/s]

False Positives: 38
True Negatives: 461
Mean FPR across all IDs in group synthpar/ST6: 7.6152%





In [8]:
FNR(dataset_dir = 'synthpar/ST7', use_multiprocessing=True, num_cores=5)
FPR(dataset_dir = 'synthpar/ST7', use_multiprocessing=True, num_cores=5, percentage=10)

364 total input pairs from 13 IDs.


Processing input pairs: 100%|██████████| 364/364 [01:52<00:00,  3.25pair/s]


False Negatives: 211
Mean FNR across all IDs in group synthpar/ST7: 57.9670%
Total number of input pairs: 4992
Selected 499 pairs for evaluation (10% of total)


Processing input pairs: 100%|██████████| 499/499 [02:22<00:00,  3.50pair/s]

False Positives: 37
True Negatives: 462
Mean FPR across all IDs in group synthpar/ST7: 7.4148%





In [9]:
FNR(dataset_dir = 'synthpar/ST8', use_multiprocessing=True, num_cores=5)
FPR(dataset_dir = 'synthpar/ST8', use_multiprocessing=True, num_cores=5, percentage=10)

364 total input pairs from 13 IDs.


Processing input pairs: 100%|██████████| 364/364 [01:48<00:00,  3.36pair/s]


False Negatives: 158
Mean FNR across all IDs in group synthpar/ST8: 43.4066%
Total number of input pairs: 4992
Selected 499 pairs for evaluation (10% of total)


Processing input pairs: 100%|██████████| 499/499 [02:19<00:00,  3.57pair/s]

False Positives: 21
True Negatives: 478
Mean FPR across all IDs in group synthpar/ST8: 4.2084%
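
To compare the groups side by side, the counts printed in the cells above can be gathered into one table. The sketch below only re-derives the rates from those reported counts (364 same-identity pairs and 499 sampled different-identity pairs per group); the variable and function names are illustrative.

```python
# (false negatives, false positives) per group, copied from the outputs above.
counts = {
    "ST1": (155, 33),
    "ST2": (197, 32),
    "ST3": (202, 35),
    "ST4": (205, 45),
    "ST5": (198, 29),
    "ST6": (171, 38),
    "ST7": (211, 37),
    "ST8": (158, 21),
}
SAME_ID_PAIRS = 364   # pairs evaluated for the FNR in each group
DIFF_ID_PAIRS = 499   # pairs sampled for the FPR in each group


def rates(counts):
    """Return {group: (FNR %, FPR %)} computed from the raw counts."""
    return {
        group: (100.0 * fn / SAME_ID_PAIRS, 100.0 * fp / DIFF_ID_PAIRS)
        for group, (fn, fp) in counts.items()
    }


summary = rates(counts)
print(f"{'Group':<6}{'FNR %':>10}{'FPR %':>10}")
for group, (fnr, fpr) in summary.items():
    print(f"{group:<6}{fnr:>10.2f}{fpr:>10.2f}")
```

On these runs the FNR ranges from 42.58% (ST1) to 57.97% (ST7), and the FPR from 4.21% (ST8) to 9.02% (ST4).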



