# Face detection and recognition inference pipeline

The following example illustrates how to use the `facenet_pytorch` python package to perform face detection and recogition on an image dataset using an Inception Resnet V1 pretrained on the VGGFace2 dataset.

The following Pytorch methods are included:
* Datasets
* Dataloaders
* GPU/CPU processing

In [None]:
from models.mtcnn import MTCNN
from models.inception_resnet_v1 import InceptionResnetV1

import torch
from torch.utils.data import DataLoader
from torchvision import datasets
import numpy as np
import pandas as pd
import os
from tqdm import tqdm

workers = 0 if os.name == 'nt' else 4

#### Determine if an nvidia GPU is available

In [None]:
device = torch.device('cuda:4' if torch.cuda.is_available() else 'cpu')
print('Running on device: {}'.format(device))

#### Define MTCNN module

Default params shown for illustration, but not needed. Note that, since MTCNN is a collection of neural nets and other code, the device must be passed in the following way to enable copying of objects when needed internally.

See `help(MTCNN)` for more details.

In [None]:
mtcnn = MTCNN(
    image_size=160, margin=0, min_face_size=20,
    thresholds=[0.6, 0.7, 0.7], factor=0.709, post_process=True,
    device=device
)

#### Define Inception Resnet V1 module

Set classify=True for pretrained classifier. For this example, we will use the model to output embeddings/CNN features. Note that for inference, it is important to set the model to `eval` mode.

See `help(InceptionResnetV1)` for more details.

In [None]:
resnet = InceptionResnetV1(pretrained='vggface2').eval().to(device)

#### Define a dataset and data loader

We add the `idx_to_class` attribute to the dataset to enable easy recoding of label indices to identity names later one.

In [15]:
def collate_fn(x):
    return x[0]
dataset_dir = 'data/lfw_balanced_sample'
dataset = datasets.ImageFolder(dataset_dir)
dataset.idx_to_class = {i:c for c, i in dataset.class_to_idx.items()}
loader = DataLoader(dataset, collate_fn=collate_fn, num_workers=workers)

#### Perfom MTCNN facial detection

Iterate through the DataLoader object and detect faces and associated detection probabilities for each. The `MTCNN` forward method returns images cropped to the detected face, if a face was detected. By default only a single detected face is returned - to have `MTCNN` return all detected faces, set `keep_all=True` when creating the MTCNN object above.

To obtain bounding boxes rather than cropped face images, you can instead call the lower-level `mtcnn.detect()` function. See `help(mtcnn.detect)` for details.

In [None]:
from PIL import Image
import torchvision.transforms as T
import matplotlib.pyplot as plt

In [None]:
aligned = []
names = []
for x, y in tqdm(loader, desc="Detecting faces"):
    x_aligned, prob = mtcnn(x, return_prob=True)
    if x_aligned is not None:
        # print('Face detected with probability: {:8f}'.format(prob))
        aligned.append(x_aligned)
        names.append(dataset.idx_to_class[y])

# for img in aligned:
#     img = (img + 1) / 2
#     img = np.transpose(img, (1,2,0))
#     plt.axis("off")
#     plt.imshow(img)
#     plt.show()


#### Calculate image embeddings

MTCNN will return images of faces all the same size, enabling easy batch processing with the Resnet recognition module. Here, since we only have a few images, we build a single batch and perform inference on it. 

For real datasets, code should be modified to control batch sizes being passed to the Resnet, particularly if being processed on a GPU. For repeated testing, it is best to separate face detection (using MTCNN) from embedding or classification (using InceptionResnetV1), as calculation of cropped faces or bounding boxes can then be performed a single time and detected faces saved for future use.

In [23]:
aligned = torch.stack(aligned).to(device)
embeddings = resnet(aligned).detach().cpu()
print(embeddings.shape)

torch.Size([1268, 512])


#### Print distance matrix for classes

In [None]:
# dists = [[(e1 - e2).norm().item() for e2 in embeddings] for e1 in embeddings]
# print(pd.DataFrame(dists, columns=names, index=names))

#### Group images by identity

In [21]:
print(names[:20])

['Abdoulaye_Wade', 'Abdoulaye_Wade', 'Abdoulaye_Wade', 'Abdoulaye_Wade', 'Adam_Scott', 'Adam_Scott', 'Adrian_McPherson', 'Adrian_McPherson', 'Ai_Sugiyama', 'Ai_Sugiyama', 'Ai_Sugiyama', 'Ai_Sugiyama', 'Ai_Sugiyama', 'Al_Davis', 'Al_Davis', 'Al_Gore', 'Al_Gore', 'Al_Gore', 'Al_Gore', 'Al_Gore']


In [None]:
# index = 0
# for _, (name, emb) in enumerate(zip(names, embeddings)):
#     count = 0
#     while names[index] == name:
#         count += 1
#         index += 1
    
#     print(count)


In [20]:
subdir_file_counts = []

for subdir_name in sorted(os.listdir(dataset_dir)):
    subdir_path = os.path.join(dataset_dir, subdir_name)
    if os.path.isdir(subdir_path):
        file_count = len(os.listdir(subdir_path))
        # print(subdir_path, file_count)
        subdir_file_counts.append(file_count)
# print(subdir_file_counts)

[4, 2, 2, 5, 2, 8, 7, 5, 2, 2, 4, 3, 2, 2, 2, 5, 6, 3, 4, 2, 5, 2, 3, 4, 2, 5, 2, 2, 6, 3, 2, 5, 2, 5, 2, 13, 9, 2, 5, 2, 2, 3, 4, 5, 2, 3, 3, 2, 2, 4, 2, 4, 2, 2, 7, 5, 2, 2, 121, 2, 2, 3, 3, 2, 2, 2, 2, 6, 3, 2, 5, 2, 2, 2, 2, 44, 2, 2, 2, 2, 2, 3, 2, 3, 9, 2, 2, 9, 12, 15, 9, 2, 3, 2, 2, 2, 5, 2, 4, 2, 55, 2, 2, 9, 2, 2, 2, 2, 11, 2, 17, 2, 7, 2, 2, 2, 4, 2, 2, 60, 2, 2, 2, 2, 2, 2, 3, 3, 32, 5, 3, 4, 2, 2, 5, 2, 2, 3, 2, 8, 2, 3, 2, 2, 13, 2, 3, 3, 2, 2, 3, 8, 4, 2, 2, 8, 5, 4, 3, 4, 2, 2, 2, 2, 4, 2, 4, 3, 4, 2, 11, 19, 3, 4, 2, 5, 10, 2, 2, 3, 2, 8, 2, 3, 3, 2, 3, 2, 2, 2, 4, 3, 11, 4, 2, 8, 2, 2, 7, 2, 2, 2, 5, 2, 2, 4, 23, 2, 2, 5, 52, 11, 2, 2, 2, 4, 2, 2, 11, 2, 5, 3, 3, 6, 2, 2, 2, 9, 2, 2, 17, 2, 4, 7, 3, 2, 2, 13, 7, 4]


In [55]:
mean_embeddings = torch.tensor([]).reshape(0, embeddings.shape[1])

emb_copy = embeddings.clone()
# print(emb_copy.shape)
img_counts = subdir_file_counts.copy()
while emb_copy.shape[0] > 0:
    img_count = img_counts[0]
    mean_emb = torch.mean(emb_copy[:img_count], axis=0)
    mean_emb = torch.unsqueeze(mean_emb, dim=0)
    mean_embeddings = torch.concatenate((mean_embeddings, mean_emb), axis=0)
    emb_copy = emb_copy[img_count:]
    img_counts = img_counts[1:]
print(mean_embeddings.shape)

torch.Size([240, 512])


In [57]:
names_iden = list(set(names))

In [58]:
dists_iden = [[(e1 - e2).norm().item() for e2 in mean_embeddings] for e1 in mean_embeddings]
print(pd.DataFrame(dists_iden, columns=names_iden, index=names_iden))

                   Heizo_Takenaka  Chris_Tucker  Junichiro_Koizumi  \
Heizo_Takenaka           0.000000      1.293396           1.146032   
Chris_Tucker             1.293396      0.000000           1.290224   
Junichiro_Koizumi        1.146032      1.290224           0.000000   
Evander_Holyfield        1.354377      0.949838           1.269642   
Daryl_Hannah             1.207926      1.291774           1.152207   
...                           ...           ...                ...   
Boris_Yeltsin            1.167321      1.331528           1.228630   
Saburo_Kawabuchi         1.354946      1.152677           1.288538   
Michelle_Kwan            1.321478      1.232850           1.228452   
Antony_Leung             1.203000      1.407689           1.278030   
Michael_Jordan           1.106452      1.259932           1.101412   

                   Evander_Holyfield  Daryl_Hannah  James_Wolfensohn  \
Heizo_Takenaka              1.354377      1.207926          1.279640   
Chris_Tucker   