## Testing the bias in open-source facial detection models

In [None]:
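import numpy as np
import os
import sys

# FNR.py and FPR.py are expected to sit next to this notebook.
# sys.path entries must be directories, not file paths.
sys.path.append('.')

# Import the two evaluation functions from their modules.
from FNR import FNR
from FPR import FPR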




The dataset used is a small subset of the SynthPar2 dataset: https://huggingface.co/datasets/pravsels/synthpar

The dataset is in the form:

```
synthpar/ST1/1.png
...
synthpar/ST1/n.png
...
synthpar/ST2/1.png
...
synthpar/ST2/n.png
.
.
.
synthpar/ST8
```

Each ST subfolder contains 13 identities with 8 variations of each ID, i.e. 104 images per group. There are 8 different skin tone groups, ST1 to ST8.
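Before running any comparisons it can help to confirm that every group folder holds the expected number of images. The sketch below assumes the layout shown above (`.png` files directly inside each ST subfolder); the helper names `count_images` and `groups_with_unexpected_count` are hypothetical and not part of the notebook's code.

```python
from pathlib import Path

# 13 identities x 8 variations, as described above.
EXPECTED_PER_GROUP = 13 * 8


def count_images(root):
    """Return {group_name: number of .png files} for each subfolder of root."""
    root = Path(root)
    return {
        group.name: sum(1 for _ in group.glob("*.png"))
        for group in sorted(root.iterdir())
        if group.is_dir()
    }


def groups_with_unexpected_count(root, expected=EXPECTED_PER_GROUP):
    """Return only the groups whose image count differs from `expected`."""
    return {g: n for g, n in count_images(root).items() if n != expected}
```

Calling `groups_with_unexpected_count('synthpar')` should return an empty dict if every ST folder is complete.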

This experiment uses the VGG Face model, which is available through DeepFace: https://github.com/serengil/deepface/tree/master/deepface/models/facial_recognition
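For orientation, a single pairwise check with DeepFace might look like the sketch below. The wrapper name `same_person` is hypothetical, and how `FNR.py` and `FPR.py` actually call DeepFace is not shown in this notebook; the sketch only assumes the public `DeepFace.verify` call with the `"VGG-Face"` model name.

```python
def same_person(img1_path, img2_path, model_name="VGG-Face"):
    """Return True if DeepFace judges the two images to show the same person."""
    # Imported lazily so this file still loads where deepface is not installed.
    from deepface import DeepFace

    result = DeepFace.verify(
        img1_path=img1_path,
        img2_path=img2_path,
        model_name=model_name,
    )
    # verify returns a dict; the boolean decision is stored under "verified".
    return bool(result["verified"])
```

For example, `same_person('synthpar/ST1/1.png', 'synthpar/ST1/2.png')` would compare two images from the first group.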

In [2]:
dataset_dir = 'synthpar/ST1'

The FNR (false negative rate) is calculated by processing every possible input pair for each individual in an ST group. In ST1, for example, there are 13 unique IDs and 8 variations of each one (as is the case for all ST groups). The FNR function builds all pairwise combinations of the 8 images of each ID, i.e. 28 input pairs per ID and 364 pairs across the 13 IDs, and uses DeepFace `verify` to check whether each pair shows the same individual. Because every input pair here shows the same person, a false negative is any pair that the model predicts to be different people.
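The pair counts above can be reproduced with a short sketch. It is not the code in `FNR.py`: the file names are made up for illustration, and `verify_fn` is a hypothetical stand-in for whatever function decides whether two images match.

```python
from itertools import combinations
from math import comb


def genuine_pairs(images_by_id):
    """Yield every unordered pair of images that share an identity."""
    for images in images_by_id.values():
        yield from combinations(images, 2)


def false_negative_rate(images_by_id, verify_fn):
    """Fraction of same-identity pairs that verify_fn rejects."""
    pairs = list(genuine_pairs(images_by_id))
    false_negatives = sum(1 for a, b in pairs if not verify_fn(a, b))
    return false_negatives / len(pairs)


# Toy data with the same shape as one ST group: 13 identities x 8 variations.
# The file names are hypothetical.
images_by_id = {i: [f"id{i}_v{v}.png" for v in range(8)] for i in range(13)}
n_pairs = sum(1 for _ in genuine_pairs(images_by_id))
print(comb(8, 2), n_pairs)  # 28 364
```

Since every identity contributes the same number of pairs (28), the pooled rate computed here equals the mean of the per-ID rates.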

In [3]:
FNR(dataset_dir)

364 total input pairs from 13 IDs.


Processing input pairs: 100%|██████████| 364/364 [01:51<00:00,  3.25pair/s]


Mean FNR across all IDs in group synthpar/ST1: 62.09%


{'mean_FNR': 0.6208791208791209}

Note: the dataset was made first; everything is then run in one go with precompute steps.

The FPR (false positive rate) is calculated in a similar fashion, but instead of comparing images of the same individual, it compares images of different individuals. The function first computes all possible combinations (excluding combinations of individuals with themselves) and then passes these pairs to `verify`. To save on computation, the user can select a percentage of the total possible combinations; that share of the pairs is then drawn at random for testing.
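The pair construction and sampling described above can be sketched as follows. This is an illustration rather than the code in `FPR.py`: the file names, the fixed `seed`, and rounding down with `int` are assumptions (rounding down is at least consistent with 10% of 4992 pairs giving 499).

```python
import random
from itertools import combinations


def impostor_pairs(images_by_id):
    """Return every unordered pair of images taken from two different identities."""
    labelled = [(id_, img) for id_, imgs in images_by_id.items() for img in imgs]
    return [
        (img_a, img_b)
        for (id_a, img_a), (id_b, img_b) in combinations(labelled, 2)
        if id_a != id_b
    ]


def sample_pairs(pairs, percentage, seed=0):
    """Draw int(len(pairs) * percentage / 100) pairs at random, without replacement."""
    k = int(len(pairs) * percentage / 100)
    return random.Random(seed).sample(pairs, k)


# Toy data with the same shape as one ST group: 13 identities x 8 variations.
images_by_id = {i: [f"id{i}_v{v}.png" for v in range(8)] for i in range(13)}
all_pairs = impostor_pairs(images_by_id)
selected = sample_pairs(all_pairs, percentage=10)
print(len(all_pairs), len(selected))  # 4992 499
```

The total follows from counting: 104 images give 5356 unordered pairs, of which 364 are same-identity pairs, leaving 4992.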

In [4]:
FPR(dataset_dir, percentage=10)

Total number of input pairs: 4992
Selected 499 pairs for evaluation (10% of total)


Processing input pairs: 100%|██████████| 499/499 [02:23<00:00,  3.48pair/s]

False Positives: 28
True Negatives: 471
False Positive Rate (FPR): 0.06
True Negative Rate (TNR): 0.94





This process is then repeated for all ST groups.
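The repetition could also be written as a loop. The sketch below takes the two evaluation functions as arguments so that it does not depend on how `FNR` and `FPR` are implemented; the name `evaluate_groups` is hypothetical, and what `FPR` returns is not shown in this notebook, so its return value is stored as-is.

```python
def evaluate_groups(fnr_fn, fpr_fn, root="synthpar", n_groups=8, percentage=10):
    """Run fnr_fn and fpr_fn on each ST folder and collect whatever they return."""
    results = {}
    for k in range(1, n_groups + 1):
        dataset_dir = f"{root}/ST{k}"
        results[dataset_dir] = {
            "fnr": fnr_fn(dataset_dir),
            "fpr": fpr_fn(dataset_dir, percentage=percentage),
        }
    return results
```

With the functions imported at the top of this notebook, the call would be `evaluate_groups(FNR, FPR)`.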


In [5]:
dataset_dir = 'synthpar/ST2'

In [6]:
FNR(dataset_dir)

364 total input pairs from 13 IDs.


Processing input pairs: 100%|██████████| 364/364 [01:50<00:00,  3.31pair/s]


Mean FNR across all IDs in group synthpar/ST2: 53.85%


{'mean_FNR': 0.5384615384615384}

In [7]:
FPR(dataset_dir, percentage=10)

Total number of input pairs: 4992
Selected 499 pairs for evaluation (10% of total)


Processing input pairs: 100%|██████████| 499/499 [02:24<00:00,  3.46pair/s]

False Positives: 33
True Negatives: 466
False Positive Rate (FPR): 0.07
True Negative Rate (TNR): 0.93





Exclude pairs from the same folder in the FPR calculation.
Make the number of test pairs selectable.

In [8]:
dataset_dir = 'synthpar/ST3'

In [9]:
FNR(dataset_dir)

364 total input pairs from 13 IDs.


Processing input pairs: 100%|██████████| 364/364 [01:51<00:00,  3.26pair/s]


Mean FNR across all IDs in group synthpar/ST3: 55.49%


{'mean_FNR': 0.554945054945055}

In [10]:
FPR(dataset_dir, percentage=10)

Total number of input pairs: 4992
Selected 499 pairs for evaluation (10% of total)


Processing input pairs: 100%|██████████| 499/499 [02:24<00:00,  3.45pair/s]

False Positives: 44
True Negatives: 455
False Positive Rate (FPR): 0.09
True Negative Rate (TNR): 0.91





In [11]:
dataset_dir = 'synthpar/ST4'

In [12]:
FNR(dataset_dir)

364 total input pairs from 13 IDs.


Processing input pairs: 100%|██████████| 364/364 [01:51<00:00,  3.26pair/s]


Mean FNR across all IDs in group synthpar/ST4: 56.32%


{'mean_FNR': 0.5631868131868132}

In [13]:
FPR(dataset_dir, percentage=10)

Total number of input pairs: 4992
Selected 499 pairs for evaluation (10% of total)


Processing input pairs: 100%|██████████| 499/499 [02:23<00:00,  3.48pair/s]

False Positives: 44
True Negatives: 455
False Positive Rate (FPR): 0.09
True Negative Rate (TNR): 0.91





In [14]:
dataset_dir = 'synthpar/ST5'

In [15]:
FNR(dataset_dir)

364 total input pairs from 13 IDs.


Processing input pairs: 100%|██████████| 364/364 [01:50<00:00,  3.30pair/s]


Mean FNR across all IDs in group synthpar/ST5: 54.40%


{'mean_FNR': 0.543956043956044}

In [16]:
FPR(dataset_dir, percentage=10)

Total number of input pairs: 4992
Selected 499 pairs for evaluation (10% of total)


Processing input pairs: 100%|██████████| 499/499 [02:22<00:00,  3.50pair/s]

False Positives: 29
True Negatives: 470
False Positive Rate (FPR): 0.06
True Negative Rate (TNR): 0.94





In [17]:
dataset_dir = 'synthpar/ST6'

In [18]:
FNR(dataset_dir)

364 total input pairs from 13 IDs.


Processing input pairs: 100%|██████████| 364/364 [01:53<00:00,  3.20pair/s]


Mean FNR across all IDs in group synthpar/ST6: 46.98%


{'mean_FNR': 0.4697802197802198}

In [19]:
FPR(dataset_dir, percentage=10)

Total number of input pairs: 4992
Selected 499 pairs for evaluation (10% of total)


Processing input pairs: 100%|██████████| 499/499 [02:23<00:00,  3.49pair/s]

False Positives: 39
True Negatives: 460
False Positive Rate (FPR): 0.08
True Negative Rate (TNR): 0.92





In [20]:
dataset_dir = 'synthpar/ST7'

In [21]:
FNR(dataset_dir)

364 total input pairs from 13 IDs.


Processing input pairs: 100%|██████████| 364/364 [01:53<00:00,  3.22pair/s]


Mean FNR across all IDs in group synthpar/ST7: 57.97%


{'mean_FNR': 0.5796703296703296}

In [22]:
FPR(dataset_dir, percentage=10)

Total number of input pairs: 4992
Selected 499 pairs for evaluation (10% of total)


Processing input pairs: 100%|██████████| 499/499 [02:20<00:00,  3.54pair/s]

False Positives: 44
True Negatives: 455
False Positive Rate (FPR): 0.09
True Negative Rate (TNR): 0.91





In [23]:
dataset_dir = 'synthpar/ST8'

In [24]:
FNR(dataset_dir)

364 total input pairs from 13 IDs.


Processing input pairs: 100%|██████████| 364/364 [01:48<00:00,  3.36pair/s]


Mean FNR across all IDs in group synthpar/ST8: 43.68%


{'mean_FNR': 0.4368131868131868}

In [25]:
FPR(dataset_dir, percentage=10)

Total number of input pairs: 4992
Selected 499 pairs for evaluation (10% of total)


Processing input pairs: 100%|██████████| 499/499 [02:19<00:00,  3.58pair/s]

False Positives: 26
True Negatives: 473
False Positive Rate (FPR): 0.05
True Negative Rate (TNR): 0.95
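
To compare the groups side by side, the figures printed above can be gathered into one table. The sketch below only re-uses the reported counts and mean FNR values (rounded to four decimals) and recomputes FPR as false positives divided by the number of evaluated pairs; the variable and function names are hypothetical.

```python
# (false positives, true negatives, mean FNR) as printed above, per group.
reported = {
    "ST1": (28, 471, 0.6209),
    "ST2": (33, 466, 0.5385),
    "ST3": (44, 455, 0.5549),
    "ST4": (44, 455, 0.5632),
    "ST5": (29, 470, 0.5440),
    "ST6": (39, 460, 0.4698),
    "ST7": (44, 455, 0.5797),
    "ST8": (26, 473, 0.4368),
}


def false_positive_rate(false_positives, true_negatives):
    """FPR = FP / (FP + TN)."""
    return false_positives / (false_positives + true_negatives)


# Map each group to (mean FNR, recomputed FPR).
summary = {
    group: (fnr, false_positive_rate(fp, tn))
    for group, (fp, tn, fnr) in reported.items()
}

for group, (fnr, fpr) in summary.items():
    print(f"{group}  FNR={fnr:.2%}  FPR={fpr:.2%}")
```

Note that each FPR here rests on 499 sampled pairs, whereas each FNR covers all 364 same-identity pairs in its group.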



