# Face detection and recognition inference pipeline

The following example illustrates how to use the `facenet_pytorch` Python package to perform face detection and recognition on an image dataset using an Inception Resnet V1 pretrained on the VGGFace2 dataset.

The following PyTorch features are included:
* Datasets
* Dataloaders
* GPU/CPU processing

In [1]:
from models.mtcnn import MTCNN, fixed_image_standardization
#from models.inception_resnet_v1 import  InceptionResnetV1
from models.utils import training
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
import numpy as np
import pandas as pd
import os

workers = 0 if os.name == 'nt' else 4

#### Determine if an NVIDIA GPU is available

In [2]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('Running on device: {}'.format(device))

Running on device: cuda:0


#### Define MTCNN module

The default parameters are shown for illustration; they do not need to be specified. Note that, since MTCNN is a collection of neural nets and other code, the device must be passed in the following way to enable copying of objects when needed internally.

See `help(MTCNN)` for more details.

In [3]:
mtcnn = MTCNN(
    image_size=160, margin=0, min_face_size=20,
    thresholds=[0.6, 0.7, 0.7], factor=0.709, post_process=True,
    device=device
)

#### Define Inception Resnet V1 module

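Set `classify=True` to use the pretrained classifier. For this example, we will use the model to output embeddings/CNN features. Note that for inference, it is important to set the model to `eval` mode. Here the pretrained constructor is commented out and a saved model is loaded from `model_path` instead.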

See `help(InceptionResnetV1)` for more details.

In [4]:
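#resnet = InceptionResnetV1(pretrained='vggface2').eval().to(device)
model_path = './data/model_with_mask.pt'
# Load the saved model onto the selected device and switch it to eval mode for inference
resnet = torch.load(model_path, map_location=device).eval()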

#### Define a dataset and data loader

We add the `idx_to_class` attribute to the dataset to enable easy recoding of label indices to identity names later on.
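
As a small illustration of that inversion, the sketch below uses a hypothetical `class_to_idx` mapping in the shape that `ImageFolder` produces (folder name to integer label). The folder names and labels are made up for the example; only the dictionary inversion mirrors the notebook.

```python
# Hypothetical mapping in the shape produced by torchvision's ImageFolder:
# class (folder) name -> integer label.
class_to_idx = {"n000001": 0, "n000009": 1, "n000029": 2}

# Invert it so an integer label can be looked up as an identity name.
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

# Labels as a loader might yield them, recoded to identity names.
labels = [2, 0, 0, 1]
names = [idx_to_class[label] for label in labels]
print(names)
```

The inversion is safe because `ImageFolder` assigns one distinct index per class folder, so no two names share a label.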

In [5]:
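def collate_fn(x):
    # The loader uses the default batch size of 1, so each batch is a
    # one-element list; unwrap it to get the (image, label) pair directly.
    return x[0]


dataset = datasets.ImageFolder('./data/test_no_mask')
# Invert class_to_idx so label indices can be mapped back to identity names
dataset.idx_to_class = {i: c for c, i in dataset.class_to_idx.items()}
loader = DataLoader(dataset, collate_fn=collate_fn, num_workers=workers)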

#### Perform MTCNN facial detection

Iterate through the DataLoader object and detect faces and associated detection probabilities for each. The `MTCNN` forward method returns images cropped to the detected face, if a face was detected. By default, only a single detected face is returned; to have `MTCNN` return all detected faces, set `keep_all=True` when creating the MTCNN object above.

To obtain bounding boxes rather than cropped face images, you can instead call the lower-level `mtcnn.detect()` function. See `help(mtcnn.detect)` for details.

In [6]:
aligned = []
names = []
for i, (x, y) in enumerate(loader):
    x_aligned, prob = mtcnn(x, return_prob=True)
    if x_aligned is not None:
        #print('Face detected with probability: {:8f}'.format(prob))
        aligned.append(x_aligned)
        names.append(dataset.idx_to_class[y])
    # Stop after the first 64 images, whether or not a face was found in the last one
    if i == 63:
        break
aligned = torch.stack(aligned).to(device)


In [7]:
dataset = datasets.ImageFolder('./data/test_anchors')
dataset.idx_to_class = {i:c for c, i in dataset.class_to_idx.items()}
loader = DataLoader(dataset, collate_fn=collate_fn, num_workers=workers)
anchor_aligned = []
anchors = []
for x, y in loader:
    x_aligned, prob = mtcnn(x, return_prob=True)
    if x_aligned is not None:
        #print('Face detected with probability: {:8f}'.format(prob))
        anchor_aligned.append(x_aligned)
        anchors.append(dataset.idx_to_class[y])
anchor_aligned = torch.stack(anchor_aligned).to(device)

#### Calculate image embeddings

MTCNN returns face images that are all the same size, enabling easy batch processing with the Resnet recognition module. Here, since we only have a few images, we build a single batch and perform inference on it.

For real datasets, code should be modified to control batch sizes being passed to the Resnet, particularly if being processed on a GPU. For repeated testing, it is best to separate face detection (using MTCNN) from embedding or classification (using InceptionResnetV1), as calculation of cropped faces or bounding boxes can then be performed a single time and detected faces saved for future use.
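
One way to control batch sizes is to slice the stacked inputs into fixed-size chunks and run the model on one chunk at a time. The sketch below is a minimal, library-agnostic version: `iter_batches` and `run_in_batches` are hypothetical helper names, not part of `facenet_pytorch`, and the built-in `len` stands in for a model so the example runs without a GPU.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` holding at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_in_batches(fn, items, batch_size):
    """Apply `fn` to each batch and collect the per-batch results in a list."""
    return [fn(batch) for batch in iter_batches(items, batch_size)]


# Stand-in for a model: it only reports how many items it received.
batch_sizes = run_in_batches(len, list(range(10)), batch_size=4)
print(batch_sizes)  # [4, 4, 2]
```

Because the helper relies only on `len` and slicing, it should also work on a tensor, where both operate along the first dimension. Under that assumption, the per-batch outputs of the Resnet could be moved to the CPU and concatenated with `torch.cat`, so that only one batch of faces needs to sit on the GPU at a time.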

In [8]:
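# The detector is no longer needed; drop the reference so its memory can be reclaimed
del mtcnn
model_path = './data/model_with_mask.pt'
# Reload the saved model onto the selected device in eval mode for inference
resnet = torch.load(model_path, map_location=device).eval()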

In [9]:

# Inference only, so skip gradient tracking
with torch.no_grad():
    embeddings = resnet(aligned).cpu()
    anchor_emb = resnet(anchor_aligned).cpu()

#### Print distance matrix for classes
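
Each entry of the distance matrix is the Euclidean norm of the difference between one test embedding and one anchor embedding, so the smallest value in a row points to the closest anchor. The sketch below shows that lookup on toy 2-D vectors; `euclidean`, `nearest_anchor`, and the `toy_*` values are hypothetical and only illustrate the idea, they are not taken from the notebook's data.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_anchor(embedding, anchor_embeddings, anchor_names):
    """Return (name, distance) of the anchor closest to `embedding`."""
    distances = [euclidean(embedding, anchor) for anchor in anchor_embeddings]
    best = min(range(len(distances)), key=distances.__getitem__)
    return anchor_names[best], distances[best]


# Toy 2-D vectors standing in for real embeddings (illustrative values only).
toy_anchor_emb = [[0.0, 0.0], [3.0, 4.0]]
toy_anchors = ["person_a", "person_b"]

name, dist = nearest_anchor([3.0, 3.0], toy_anchor_emb, toy_anchors)
print(name, dist)  # person_b 1.0
```

With real embeddings, a distance threshold would usually be needed as well, since the nearest anchor is returned even when the face belongs to none of the anchor identities.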

In [11]:
dists = [[(e1 - e2).norm().item() for e2 in anchor_emb] for e1 in embeddings]
print(pd.DataFrame(dists, columns=anchors, index=names))

           n000001    n000009    n000029    n000040    n000078    n000082  \
n000001  24.010246  53.063042  58.832172  63.359829  71.862518  64.357109   
n000001  11.455333  48.455715  63.236080  64.244514  71.790527  63.548492   
n000001  13.006725  50.070484  61.098518  63.326969  70.779251  62.579308   
n000001  13.848736  55.900082  65.936813  67.159195  76.549568  67.890610   
n000001  16.399628  49.769009  67.959435  70.051048  69.023140  63.268959   
...            ...        ...        ...        ...        ...        ...   
n000001  18.057278  53.439285  67.853035  66.568947  69.214592  62.930725   
n000001  18.404099  51.451546  62.137474  63.991745  70.036270  64.997459   
n000001  20.582541  51.010044  57.569706  60.114887  63.813461  58.306004   
n000001  21.881485  57.605656  64.904343  64.309189  68.198601  64.356544   
n000001  17.307190  53.731899  69.430634  70.710274  71.534424  67.446396   

           n000106    n000129    n000129    n000148  ...    n000785  \
n000