## Results metrics

A threshold "T" is used to classify a test case to be either a correct (true or positive) case or false (negative) case. If the case is below a threshold "T" then it is classified false (negative) and if it is above threshold "T" it is classified true (positive).

**FPIR**: is the false positive identification rate. It is the ratio of the test cases that are classified as true cases although they are false cases. It is type I error. 
FPIR= Number of test cases classified above threshold "T" (true) / Number of all test cases

**FNIR**: is the false negative identification rate. It is  the ratio of the test cases that are classified as false cases although they are true cases. It is type II error.
FNIR= Number of test cases classified below threshold "T" (false) / Number of all test cases

In [1]:
# Load libraries
import numpy as np
import cv2
from matplotlib import pyplot as plt
# import torch
import seaborn as sns
import pandas as pd
sns.set(style="white")
%matplotlib inline

In [2]:
# load utils
import sys
sys.path.insert(0, '../utils')
from MagFace_utils.MagFace_funcs import *


## MagFace Results

**False Negatives**
FNIR represent the proportion of feature vectors that belong to a certain class but are incorrectly classified as not belonging to that class.

##### Load data - children

In [3]:
# Children feature vectors
with open('../data/feat_children.list', 'r') as f:
    lines = f.readlines()


In [47]:
img_2_feats = {}
img_2_mag = {}
# Convert to dictionary as adaface - is done in img_2_feats
for line in lines:
    parts = line.strip().split(' ')
    imgname = parts[0]
    imgname = "/"+"/".join(imgname.split("/")[4:])
    feats = [float(e) for e in parts[1:]]
    mag = np.linalg.norm(feats)
    img_2_feats[imgname] = feats/mag
    img_2_mag[imgname] = mag #magnitude of the feature vector

In [48]:
imgnames = list(img_2_mag.keys())
mags = [img_2_mag[imgname] for imgname in imgnames]
sort_idx = np.argsort(mags) #sorts the magnitude/quality of the images

In [49]:
feats = np.array([img_2_feats[imgnames[ele]] for ele in range(len(lines))]) #unsorted image quality
ids = np.array([imgnames[ele] for ele in range(len(lines))])

sim_mat = np.dot(feats, feats.T)

In [50]:
# # Example usage:
# ids = convert_unique_ids(ids)
# factors_c, unique_ids = factorize_ids(ids)
# print("Factorized list:", factors_c[:10])
# print("Unique IDs mapping:", unique_ids)

In [51]:
# im_ids = np.array(factors_c)
# sim_scores = sim_mat


In [52]:
# sim_scores_nan = np.where(sim_scores == 1, np.nan, sim_scores)

# max_values_less_than_1 = np.nanmax(sim_scores_nan, axis=1)
# max_values_less_than_1

In [53]:
## Data check

sim_scores_c = sim_mat.copy()
# E.g. classify all identities as positive if threshold is higher than 90% of all similarity scores
sims_excluding_probe = sim_scores_c[sim_scores_c < 0.9999] # OBs check if more scientific way
# Check that length of similarity scores are equal to len(sim_scores_c.flatten)- len(sim_scores)
print("Length of similarity scores without probe: ", len(sims_excluding_probe), "Equal to: ", len(sim_scores_c.flatten()) - len(sim_scores_c),
      "is",len(sims_excluding_probe) == (len(sim_scores_c.flatten()) - len(sim_scores_c)))


Length of similarity scores without probe:  10722350 Equal to:  10722350 is True


In [54]:
# sim_scores
# sims_excluding_probe = sim_scores[sim_scores < 0.9999] # OBs check if more scientific way
# sims_excluding_probe


In [55]:
# save similarity scores as a txt file

np.save('sims_excluding_probe.npy', sims_excluding_probe)

In [56]:
# # Convert tensor to NumPy array
plot_sims = sim_scores_c[sim_scores_c < 0.9999].reshape(-1) # filter out values equal to 1 #OBS check more scietific way
# # Plot histogram
# plt.hist(plot_sims, bins=len(sim_scores), color='blue',alpha=0.7)
# plt.title('Histogram of similarity scores - MagFace Children')
# plt.xlabel('Value')
# plt.ylabel('Frequency')
# plt.grid(True)
# plt.show()

# print("Average similarity score: ", np.mean(plot_sims))
# print("\nTop 10% similarity score: ", np.percentile(plot_sims, 97.5))
# print("\nTop 10 highest similarity score:", np.sort(plot_sims)[-10:])
# print("\nMax sim scores: ", np.max(plot_sims))

# Load children data canonical

In [14]:
df_c_can = pd.read_csv("../data/OFIQ_results/canonical_children.csv", sep=";")

In [15]:
# Get canonical ids and respective feature vectors
imgnames_can = [imgnames[ele] for ele in range(len(lines)) if imgnames[ele].split("/")[-1] in np.array(df_c_can.Filename)]
feats_can = np.array([img_2_feats[imgnames_can[ele]] for ele in range(len(imgnames_can))]) #unsorted image quality
sim_mat_can = np.dot(feats_can, feats_can.T)

In [16]:
ids_can = convert_unique_ids(imgnames_can)
factors_can, unique_ids_can = factorize_ids(imgnames_can)
im_ids_can = np.array(factors_can)
print("Factorized list:", factors_can[:10])
print("Unique IDs mapping:", unique_ids_can)
print("Image IDs mapping:", im_ids_can)

Factorized list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Unique IDs mapping: {'/data/raw_full/children/children/Indian_682/Indian_682_32.png': 0, '/data/raw_full/children/children/Asian_504/Asian_504_86.png': 1, '/data/raw_full/children/children/Asian_504/Asian_504_89.png': 2, '/data/raw_full/children/children/Asian_m.01w1069/m.01w1069_0002.jpg': 3, '/data/raw_full/children/children/Asian_185/Asian_185_18.png': 4, '/data/raw_full/children/children/African_627/African_627_0.png': 5, '/data/raw_full/children/children/African_433/African_433_3.png': 6, '/data/raw_full/children/children/Caucasian_m.03vq05/m.03vq05_0003.jpg': 7, '/data/raw_full/children/children/African_265/African_265_15.png': 8, '/data/raw_full/children/children/Asian_59/Asian_59_0.png': 9, '/data/raw_full/children/children/African_415/African_415_0.png': 10, '/data/raw_full/children/children/African_274/African_274_4.png': 11, '/data/raw_full/children/children/Indian_458/Indian_458_0.png': 12, '/data/raw_full/children/children/Af

In [17]:
## Data check
sim_score_can = sim_mat_can.copy()
# E.g. classify all identities as positive if threshold is higher than 90% of all similarity scores
sims_excluding_probe_can = sim_score_can[sim_score_can < 0.9999] # OBs check if more scientific way
# Check that length of similarity scores are equal to len(sim_scores.flatten)- len(sim_scores)
print("Length of similarity scores without probe: ", len(sims_excluding_probe_can), "Equal to: ", len(sim_score_can.flatten()) - len(sim_score_can),
      "is",len(sims_excluding_probe_can) == (len(sim_score_can.flatten()) - len(sim_score_can)))


Length of similarity scores without probe:  2561600 Equal to:  2561600 is True


In [18]:
# # Convert tensor to NumPy array
# plot_sims_can = sim_score_can[sim_score_can < 0.9999].reshape(-1) # filter out values equal to 1 #OBS check more scietific way
# # Plot histogram
# plt.hist(plot_sims, bins=len(sim_score_can), color='blue',alpha=0.7)
# plt.title('Histogram of similarity scores - MagFace Canonical Children')
# plt.xlabel('Value')
# plt.ylabel('Frequency')
# plt.grid(True)
# plt.show()

# print("Average similarity score: ", np.mean(plot_sims_can))
# print("\nTop 10% similarity score: ", np.percentile(plot_sims_can, 97.5))
# print("\nTop 10 highest similarity score:", np.sort(plot_sims_can)[-10:])
# print("\nMax sim scores: ", np.max(plot_sims_can))

# Load data - adults


In [19]:
# Children feature vectors
# with open('../data/feat_adults.list', 'r') as f:
#     lines = f.readlines()

with open('../data/feat_adults_new.list', 'r') as f:
    lines = f.readlines()


In [20]:
img_2_feats = {}
img_2_mag = {}
# Convert to dictionary as adaface - is done in img_2_feats
for line in lines:
    parts = line.strip().split(' ')
    imgname = parts[0]
    imgname = "/"+"/".join(imgname.split("/")[4:])
    feats = [float(e) for e in parts[1:]]
    mag = np.linalg.norm(feats)
    img_2_feats[imgname] = feats/mag
    img_2_mag[imgname] = mag #magnitude of the feature vector

In [21]:
imgnames = list(img_2_mag.keys())
mags = [img_2_mag[imgname] for imgname in imgnames]
sort_idx = np.argsort(mags) #sorts the magnitude/quality of the images

In [22]:
feats_a = np.array([img_2_feats[imgnames[ele]] for ele in range(len(lines))]) #unsorted image quality
ids_a = np.array([imgnames[ele] for ele in range(len(lines))])

sim_mat_a = np.dot(feats_a, feats_a.T)

In [23]:
# convert ids
ids_a = convert_unique_ids(ids_a)
factors_a, unique_ids = factorize_ids(ids)
print("Factorized list:", factors_a[:10])
print("Unique IDs mapping:", unique_ids)

Factorized list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Unique IDs mapping: {'/data/raw_full/children/children/Indian_682/Indian_682_8.png': 0, '/data/raw_full/children/children/Indian_682/Indian_682_34.png': 1, '/data/raw_full/children/children/Indian_682/Indian_682_32.png': 2, '/data/raw_full/children/children/Asian_119/Asian_119_39.png': 3, '/data/raw_full/children/children/Asian_504/Asian_504_86.png': 4, '/data/raw_full/children/children/Asian_504/Asian_504_89.png': 5, '/data/raw_full/children/children/Caucasian_73/Caucasian_73_17.png': 6, '/data/raw_full/children/children/Caucasian_249/Caucasian_249_27.png': 7, '/data/raw_full/children/children/Caucasian_249/Caucasian_249_0.png': 8, '/data/raw_full/children/children/Asian_m.01w1069/m.01w1069_0002.jpg': 9, '/data/raw_full/children/children/Caucasian_363/Caucasian_363_20.png': 10, '/data/raw_full/children/children/Caucasian_363/Caucasian_363_15.png': 11, '/data/raw_full/children/children/African_m.05n_fkv/m.05n_fkv_0001.jpg': 12, '/data/raw

In [24]:
## Data check
sim_scores_a = sim_mat_a.copy()
# E.g. classify all identities as positive if threshold is higher than 90% of all similarity scores
sims_excluding_probe = sim_scores_a[sim_scores_a < 0.99] # OBs check if more scientific way
# Check that length of similarity scores are equal to len(sim_scores.flatten)- len(sim_scores)
print("Length of similarity scores without probe: ", len(sims_excluding_probe), "Equal to: ", len(sim_scores_a.flatten()) - len(sim_scores_a),
      "is",len(sims_excluding_probe) == (len(sim_scores_a.flatten()) - len(sim_scores_a)))


Length of similarity scores without probe:  10722350 Equal to:  10722350 is True


In [25]:
# # Analyze similarity scores in terms of threshold...
# # For the first 10 similarity scores - what is the distribution of the scores?
plot_sims_a = sim_scores_a[sim_scores_a < 0.999].reshape(-1) # filter out values equal to 1 #OBS check more scietific way
# # Plot histogram
# plt.hist(plot_sims_a, bins=len(sim_scores_a), color='red', alpha=0.5)
# plt.title('Histogram of similarity scores - MagFace Adults')
# plt.xlabel('Value')
# plt.ylabel('Frequency')
# plt.grid(True)
# plt.show()

# print("Average similarity score: ", np.mean(plot_sims_a))
# print("\nTop 10% similarity score: ", np.percentile(plot_sims_a, 97.5))
# print("\nTop 10 highest similarity score:", np.sort(plot_sims_a)[-10:])
# print("\nMax sim scores: ", np.max(plot_sims_a))

# Load data children - canonical

### Convert stats
+ find threshold

In [26]:
# import numpy as np
# import matplotlib.pyplot as plt

# # Define quantiles
# quantiles = [90, 95, 97.5, 99]

# # Calculate threshold values for each group and each quantile
# thresholds_1 = [np.percentile(plot_sims, q) for q in quantiles]
# thresholds_2 = [np.percentile(plot_sims_a, q) for q in quantiles]

# # Plot histograms for both groups
# plt.hist(plot_sims, bins=len(sim_scores), color='blue', alpha=0.7, label='Group 1')
# plt.hist(plot_sims_a, bins=len(sim_scores_a), color='orange', alpha=0.7, label='Group 2')
# plt.title('Histogram of similarity scores')
# plt.xlabel('Similarity score max value')
# plt.ylabel('Frequency')

# # Plot threshold values for group 1
# for threshold in thresholds_1:
#     plt.axvline(x=threshold, color='blue', linestyle='--', label=f'Group 1 Threshold: {threshold:.2f}')

# # Plot threshold values for group 2
# for threshold in thresholds_2:
#     plt.axvline(x=threshold, color='orange', linestyle='--', label=f'Group 2 Threshold: {threshold:.2f}')

# plt.legend()
# plt.grid(True)
# plt.show()


Observation: a threshold at 0.48 can be set. That's approximately the 99th quantile for each. 

In [27]:
# How many ids above threshold? only approximatly 1% of the data? Does that approximately corresponds to how many mated samples there are?

# ((np.sum(sim_scores > 0.48)-3306) / len(sim_scores.flatten()))*100

In [28]:
# (np.sum(sim_scores_a > 0.48) / len(sim_scores_a.flatten()))*100

### FNIR and FPIR - functions

$$
\operatorname{FNIR}(N, R, T)=\frac{\left|\left\{i \in M_D \mid\left(\operatorname{rank}_i\left(m_i\right)>R\right) \operatorname{or}\left(\operatorname{score}_i\left(m_i\right) \leq T\right)\right\}\right|}{\left|M_D\right|}
$$
where:
$M_D \quad$ is the set of mated identification transactions with reference database $R$;
$m_i \quad$ is the mated reference for transaction $i$;
$\operatorname{rank}_i()$ gives the candidate rank of a reference in identification transaction $i$; and
$\operatorname{score}_i()$ gives the candidate score of a reference in identification transaction $i$.




$$
\operatorname{FPIR}(N, T)=\frac{\mid\left\{i \in U_D \mid \text { score }_i\left(t_i\right)>T\right\} \mid}{\left|U_D\right|}
$$
where:
$U_D \quad$ is the set of non-mated identification transactions with reference database $R$;
$t_i \quad$ is the top-ranked reference identifier in identification transaction $i$; and
score $_i()$ gives the candidate score of a reference in identification transaction $i$.

###

Der er noget her. vi skal retunere alle de nonmated id's sammenlignet med alle 

ikke alle similarity scores 

## Final results - FNIR and FPIR

# TODO der er noget her med dataframes

In [29]:
# Mated and non-mated ids
# a_df = pd.read_csv('../data/adults_balanced.csv')
# a_mates = a_df.groupby("img_name").agg({'img_org_name': ['count']})
# a_mated_ids = a_mates[a_mates[('img_org_name', 'count')] > 1].index
# a_nonmated_ids = a_mates[a_mates[('img_org_name', 'count')] == 1].index

a_df = pd.read_csv('../data/adults_balanced.csv')
a_mates = a_df.groupby("im_id").agg({'im_id': ['count']})
a_mated_ids = a_mates[a_mates[('im_id', 'count')] > 1].index
a_nonmated_ids = a_mates[a_mates[('im_id', 'count')] == 1].index



# # Mated and non-mated ids
# c_df = pd.read_csv('../data/child_balanced.csv')
# c_mates = c_df.groupby("im_id").agg({'im_id': ['count']})
# c_mated_ids = c_mates[c_mates[('im_id', 'count')] > 1].index
# c_nonmated_ids = c_mates[c_mates[('im_id', 'count')] == 1].index


# # Mated and non-mated ids
# a_df = pd.read_csv('../data/adults_balanced.csv')
# a_mates = a_df.groupby("img_name").agg({'img_org_name': ['count']})
# a_mated_ids = a_mates[a_mates[('img_org_name', 'count')] > 1].index
# a_nonmated_ids = a_mates[a_mates[('img_org_name', 'count')] == 1].index

# Mated and non-mated ids
c_df = pd.read_csv('../data/child_balanced.csv')
c_mates = c_df.groupby("im_id").agg({'im_id': ['count']})
c_mated_ids = c_mates[c_mates[('im_id', 'count')] > 1].index
c_nonmated_ids = c_mates[c_mates[('im_id', 'count')] == 1].index


# can_df is the subset of the c_df with corresponds to the canonical ids TODO
# Mated and non-mated ids
# can_df = pd.read_csv('../data/can_balanced.csv')
can_mates = can_df.groupby("im_id").agg({'im_id': ['count']})
can_mated_ids = can_mates[can_mates[('im_id', 'count')] > 1].index
can_nonmated_ids = can_mates[can_mates[('im_id', 'count')] == 1].index

#### children

In [30]:
# children

ids = convert_unique_ids(ids)
factors_c, unique_ids = factorize_ids(ids)


## Threshold set based on studying the similarity scores
thold_c = np.percentile(plot_sims, 99)
im_ids_c = np.array(factors_c)

mated_df = c_mated_ids
non_mated_df = c_nonmated_ids
sim_scores = sim_scores_c
im_ids = im_ids_c
ids = ids

fnir_c = compute_fnir(c_mated_ids, sim_scores, im_ids_c, ids, thold=thold_c)
fpir_c = compute_fpir(c_nonmated_ids, sim_scores, im_ids_c, ids, thold=thold_c)

print("FNIR for children: ", fnir_c)
print("FPIR for children: ", fpir_c)

FNIR for children:  (0.607940446650124, array([0.72422793, 0.67901897, 0.72422793, ..., 0.45288209, 0.66417667,
       0.66417667]))
FPIR for children:  (0.8572600492206727, array([0.29993762, 0.27719313, 0.22126157, ..., 0.05818179, 0.21945107,
       0.37661984]), array([[1.        , 0.72422793, 0.67901897, ..., 0.10887856, 0.16981499,
        0.1959246 ],
       [0.72422793, 1.        , 0.7677642 , ..., 0.16553825, 0.17820952,
        0.29758607],
       [0.67901897, 0.7677642 , 1.        , ..., 0.19140869, 0.08421105,
        0.19525943],
       ...,
       [0.10887856, 0.16553825, 0.19140869, ..., 1.        , 0.28783875,
        0.28300633],
       [0.16981499, 0.17820952, 0.08421105, ..., 0.28783875, 1.        ,
        0.66417667],
       [0.1959246 , 0.29758607, 0.19525943, ..., 0.28300633, 0.66417667,
        1.        ]]))


In [31]:
mated_sim_scores_child_final = compute_fnir(c_mated_ids, sim_scores, im_ids_c, ids, thold=thold_c)[1]
nonmated_sim_scores_child_final = compute_fpir(c_nonmated_ids, sim_scores, im_ids_c, ids, thold=thold_c)[1]

In [32]:
np.save('mated_sim_scores_child_final.npy', mated_sim_scores_child_final)
np.save('nonmated_sim_scores_child_final.npy', nonmated_sim_scores_child_final)


### adults

In [33]:
# adults
## Threshold set based on studying the similarity scores
thold_a = np.percentile(plot_sims_a, 99)
im_ids_a = np.array(factors_a)

mated_df = a_mated_ids
non_mated_df = a_nonmated_ids
sim_scores = sim_scores_a
im_ids = im_ids_a
ids = ids_a

fnir_a = compute_fnir(mated_df, sim_scores_a, im_ids, ids, thold=thold_a)
fpir_a = compute_fpir(non_mated_df, sim_scores_a, im_ids, ids, thold=thold_a)


In [34]:
print("FNIR for adults: ", fnir_a[0])
print("FPIR for adults: ", fpir_a[0])


FNIR for adults:  0.0
FPIR for adults:  0.7558756633813495


In [35]:
mated_sim_scores_adult_final = fnir_a[1]
nonmated_sim_scores_adult_final = fpir_a[1]

In [36]:
np.save('mated_sim_scores_adult_final.npy', mated_sim_scores_adult_final)
np.save('nonmated_sim_scores_adult_final.npy', nonmated_sim_scores_adult_final)

# TODO herunder 

##### FNIR

In [37]:
# # FNIR = FN / (TP + FN)
# fnir_children = np.sum(fns_c) / (np.sum(tps_c) + np.sum(fns_c))
# print("FNIR children result: ", fnir_children)

# fnir_adults = np.sum(fns_a) / (np.sum(tps_a) + np.sum(fns_a))
# print("FNIR children result: ", fnir_adults)

##### FPIR

In [38]:
# # FPIR = FP / (FP + TN)
# fpir_children = np.sum(fps_c) / (np.sum(fps_c) + np.sum(tns_c))
# print("FPIR children result: ", fpir_children)

# fpir_adults = np.sum(fps_a) / (np.sum(fps_a) + np.sum(tns_a))
# print("FPIR adults result: ", fpir_adults)

#### GARBE

Low value of alpha to put more weight on FND - rather capture more children than less

$$
\begin{aligned}
& F P D(\tau)=\left(\frac{n}{n-1}\right) \frac{\sum_i \sum_j\left|F P I R_{d_i}-F P I R_{d_j}\right|}{2 n^2 \overline{F P I R}} \forall d_i, d_j \in D \\
& F N D(\tau)=\left(\frac{n}{n-1}\right) \frac{\sum_i \sum_j\left|F N I R_{d_i}-F N I R_{d_j}\right|}{2 n^2 \overline{F N I R}} \forall d_i, d_j \in D
\end{aligned}
$$

$$
\operatorname{GARBE}(\tau)=\alpha F P D(\tau)+(1-\alpha) F N D(\tau)
$$


##### REMEMBER:
FND and FPD
    Clarification: n is the number of identification attempts not number of comparisons. 

In [39]:
thold_a

0.4825735789771279

In [40]:
thold_c

0.4752905018186878

In [41]:
# Total number of transactions = n child ids + n adult ids
n_child_ids = len(im_ids_c)
n_adult_ids = len(im_ids_a)
n = n_child_ids + n_adult_ids

In [42]:
# FPD = fpir_c/fpir_a
# print("FPD result: ", FPD)


# FND = fnir_c/fnir_a
# print("FND result: ", FND)



FPD = fpir_c[0]/fpir_a[0]
print("FPD result: ", FPD)


FND = fnir_c[0]/(fnir_a[0]+0.00000001)
print("FND result: ", FND)


FPD result:  1.1341283900923442
FND result:  60794044.665012404


# GARBE

In [43]:
alpha_val = 0.2
GARBE = alpha_val * FPD + (1 - alpha_val) * FND
print("GARBE result MagFace: ", GARBE)

GARBE result MagFace:  48635235.9588356
