# Face detection and recognition inference pipeline

The following example illustrates how to use the `facenet_pytorch` Python package to perform face detection and recognition on an image dataset using an Inception Resnet V1 pretrained on the VGGFace2 dataset.

The following PyTorch features are used:
* Datasets
* Dataloaders
* GPU/CPU processing

In [1]:
from facenet_pytorch import MTCNN, InceptionResnetV1
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
import numpy as np
import pandas as pd
import os

workers = 0 if os.name == 'nt' else 4

#### Determine if an NVIDIA GPU is available

In [2]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('Running on device: {}'.format(device))

Running on device: cuda:0


#### Define MTCNN module

The default parameters are shown for illustration and can be omitted. Note that, since MTCNN is a collection of neural nets and other code, the device must be passed in the following way to enable copying of objects when needed internally.

See `help(MTCNN)` for more details.

In [3]:
mtcnn = MTCNN(
    image_size=160, margin=0, min_face_size=20,
    thresholds=[0.6, 0.7, 0.7], factor=0.709, post_process=True,
    device=device
)

#### Define Inception Resnet V1 module

Set `classify=True` to use the pretrained classifier. For this example, we will use the model to output embeddings/CNN features. Note that for inference, it is important to set the model to `eval` mode.

See `help(InceptionResnetV1)` for more details.

In [4]:
resnet = InceptionResnetV1(pretrained='vggface2').eval().to(device)

#### Define a dataset and data loader

We add the `idx_to_class` attribute to the dataset to enable easy recoding of label indices to identity names later on.

In [6]:
def collate_fn(x):
    return x[0]

dataset = datasets.ImageFolder('../drama/')
dataset.idx_to_class = {i:c for c, i in dataset.class_to_idx.items()}
loader = DataLoader(dataset, collate_fn=collate_fn, num_workers=workers)
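
To make the two helpers above concrete, here is a minimal sketch that uses plain Python objects in place of the real dataset. The `class_to_idx` mapping and the sample batch are made up for illustration; `ImageFolder` builds the real mapping from the sub-directory names. With the default `batch_size=1`, the `DataLoader` hands `collate_fn` a one-element list, so returning the first element yields a single `(image, label)` pair and lets PIL images pass through without being stacked into a tensor.

```python
# Hypothetical mapping; ImageFolder derives the real one from sub-directory names.
class_to_idx = {"angelina_jolie": 0, "bradley_cooper": 1}

# Invert the mapping so label indices can be turned back into identity names.
idx_to_class = {i: c for c, i in class_to_idx.items()}


def collate_fn(batch):
    # `batch` is a list of samples; with batch_size=1 it holds exactly one.
    return batch[0]


# Stand-in for what the DataLoader passes to collate_fn: [(image, label)].
fake_batch = [("<PIL image placeholder>", 1)]
image, label = collate_fn(fake_batch)
print(idx_to_class[label])  # bradley_cooper
```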

#### Perform MTCNN facial detection

Iterate through the DataLoader object and detect faces and associated detection probabilities for each. The `MTCNN` forward method returns images cropped to the detected face, if a face was detected. By default only a single detected face is returned - to have `MTCNN` return all detected faces, set `keep_all=True` when creating the MTCNN object above.

To obtain bounding boxes rather than cropped face images, you can instead call the lower-level `mtcnn.detect()` function. See `help(mtcnn.detect)` for details.
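
As a rough sketch of the bounding-box route, the helper below wraps a `detect()` call and keeps only confident detections. It assumes that `detect(image)` returns a `(boxes, probs)` pair and that `boxes` is `None` when no face is found, which is how `MTCNN.detect` behaves for a single image as far as I know. The helper name `detect_boxes`, the `min_prob` parameter, and the `FakeDetector` stand-in are hypothetical and exist only so the example runs without model weights.

```python
def detect_boxes(detector, image, min_prob=0.9):
    """Return [(box, prob), ...] for detections with prob >= min_prob.

    Assumes detector.detect(image) returns (boxes, probs), with boxes set
    to None when no face is found.
    """
    boxes, probs = detector.detect(image)
    if boxes is None:
        return []
    return [
        (tuple(float(v) for v in box), float(p))
        for box, p in zip(boxes, probs)
        if p is not None and p >= min_prob
    ]


class FakeDetector:
    """Hypothetical stand-in for an MTCNN instance."""

    def detect(self, image):
        # Two boxes as [x1, y1, x2, y2] with their detection probabilities.
        return [[10, 20, 110, 140], [5, 5, 30, 30]], [0.99, 0.42]


print(detect_boxes(FakeDetector(), image=None))
# [((10.0, 20.0, 110.0, 140.0), 0.99)]
```

In the notebook, the call would be `detect_boxes(mtcnn, x)` inside the loop over `loader`.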

In [16]:
aligned = []
names = []
drama_test_dir = "./drama-test/"
for i, (x, y) in enumerate(loader):
    print(x, y)
    # Save each cropped face as <index>.jpg and also return the detection probability.
    save_path = os.path.join(drama_test_dir, "{}.jpg".format(i))
    x_aligned, prob = mtcnn(x, save_path=save_path, return_prob=True)
    if x_aligned is not None:
        print('Face detected with probability: {:8f}'.format(prob))
        aligned.append(x_aligned)
        names.append(dataset.idx_to_class[y])

<PIL.Image.Image image mode=RGB size=1600x1090 at 0x7FB784FC2710> 0
Face detected with probability: 1.000000
<PIL.Image.Image image mode=RGB size=1600x1090 at 0x7FB784FC24E0> 0
Face detected with probability: 0.999999
<PIL.Image.Image image mode=RGB size=1600x1090 at 0x7FB784FC2400> 0
Face detected with probability: 0.999588
<PIL.Image.Image image mode=RGB size=1600x1090 at 0x7FB784FC23C8> 0
Face detected with probability: 0.999999
<PIL.Image.Image image mode=RGB size=1600x1090 at 0x7FB784FC26A0> 0
Face detected with probability: 0.999739
<PIL.Image.Image image mode=RGB size=1600x1090 at 0x7FB784FC24E0> 0
Face detected with probability: 0.982827
<PIL.Image.Image image mode=RGB size=1600x1090 at 0x7FB784FC25F8> 0
<PIL.Image.Image image mode=RGB size=1600x1090 at 0x7FB784FC2550> 0
Face detected with probability: 0.999991
<PIL.Image.Image image mode=RGB size=1600x1090 at 0x7FB784FC2748> 0
Face detected with probability: 0.999915
<PIL.Image.Image image mode=RGB size=1600x1090 at 0x7FB784FC

In [33]:
from torchvision import datasets, transforms
from facenet_pytorch import MTCNN, InceptionResnetV1, fixed_image_standardization
trans = transforms.Compose([
    np.float32,
    transforms.ToTensor(),
    fixed_image_standardization
])
dataset = datasets.ImageFolder(drama_test_dir, transform=trans)
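
The transform above casts each image to `float32` before `ToTensor`, so pixel values stay in the 0 to 255 range (`ToTensor` rescales only `uint8` input), and then applies `fixed_image_standardization`. To my understanding that function computes `(x - 127.5) / 128.0`, which appears to be the same normalization `MTCNN` applies when `post_process=True`, so crops reloaded from disk end up on the same scale as the tensors returned directly by `mtcnn`. The sketch below applies that formula to single pixel values; `fixed_standardize` is a hypothetical stand-in, not the library function.

```python
def fixed_standardize(pixel):
    # Assumed to mirror facenet_pytorch.fixed_image_standardization,
    # applied to one pixel value instead of a whole tensor.
    return (pixel - 127.5) / 128.0


for value in (0, 127.5, 255):
    print(value, fixed_standardize(value))
# 0 -0.99609375
# 127.5 0.0
# 255 0.99609375
```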

In [34]:
data_loader = DataLoader(
    dataset,
    num_workers=workers,
    batch_size=4
)

tensor([[ 0.0280, -0.0488, -0.0042,  ..., -0.0166,  0.0195,  0.0817],
        [-0.0047, -0.0271, -0.0274,  ..., -0.0086, -0.0054,  0.0803],
        [ 0.0395, -0.0384, -0.0447,  ...,  0.0250,  0.0128,  0.0697],
        [ 0.0618, -0.0162, -0.0246,  ...,  0.0262,  0.0416,  0.0806]])
tensor([[ 0.0010, -0.0307, -0.0903,  ..., -0.0139,  0.0617,  0.0996],
        [ 0.0175, -0.0098, -0.0517,  ..., -0.0136,  0.0242,  0.1045],
        [ 0.0481, -0.0392, -0.0046,  ...,  0.0324,  0.0101,  0.0854],
        [ 0.0136, -0.0111, -0.1203,  ..., -0.0230,  0.0191,  0.0322]])
tensor([[ 3.9574e-02,  6.9587e-03, -4.8695e-02,  ..., -4.4574e-02,
          2.2265e-02,  4.1564e-03],
        [ 8.7712e-03,  4.3876e-05, -8.3381e-02,  ..., -1.0108e-02,
          2.4646e-02,  4.5991e-02],
        [ 1.6901e-02,  2.0650e-03, -7.7078e-02,  ..., -4.9287e-02,
          2.3515e-02,  2.6818e-02],
        [ 4.8946e-02,  1.0085e-02,  4.2821e-02,  ...,  4.1461e-03,
          4.1357e-02, -3.5165e-02]])
tensor([[ 0.0501, -0.0095

#### Calculate image embeddings

MTCNN returns face images that are all the same size, which makes batch processing with the Resnet recognition module straightforward. Here, since we only have a few images, we build a single batch and perform inference on it.

For real datasets, code should be modified to control batch sizes being passed to the Resnet, particularly if being processed on a GPU. For repeated testing, it is best to separate face detection (using MTCNN) from embedding or classification (using InceptionResnetV1), as calculation of cropped faces or bounding boxes can then be performed a single time and detected faces saved for future use.
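
One simple way to control the batch size is to slice the list of cropped faces into fixed-size chunks and run the model on each chunk in turn. The sketch below shows only the chunking, with strings standing in for face tensors; the helper name `batches` and the batch size of 2 are illustrative choices, not part of the original notebook.

```python
def batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


faces = ["face0", "face1", "face2", "face3", "face4"]
for chunk in batches(faces, batch_size=2):
    # In the notebook, each chunk would be processed roughly as:
    #   resnet(torch.stack(chunk).to(device)).detach().cpu()
    print(chunk)
# ['face0', 'face1']
# ['face2', 'face3']
# ['face4']
```

The per-chunk embeddings can then be joined with `torch.cat` to recover a single embedding matrix.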

In [8]:
aligned = torch.stack(tuple(aligned)).to(device)

embeddings = resnet(aligned).detach().cpu()
embeddings

tensor([[-0.0064, -0.0262, -0.0182,  ..., -0.0567,  0.0154, -0.0006],
        [-0.0213, -0.0222, -0.0141,  ...,  0.0571,  0.0063, -0.0493],
        [ 0.0003,  0.0928, -0.0199,  ..., -0.0415,  0.0327,  0.0102],
        [-0.0151, -0.0361, -0.0691,  ..., -0.0010,  0.0526,  0.0031],
        [ 0.0160, -0.0785,  0.0068,  ..., -0.0013, -0.0796, -0.0432]])

In [9]:
embeddings.shape

torch.Size([5, 512])

#### Print distance matrix for classes

In [10]:
dists = [[(e1 - e2).norm().item() for e2 in embeddings] for e1 in embeddings]
print(pd.DataFrame(dists, columns=names, index=names))

                angelina_jolie  bradley_cooper  kate_siegel  paul_rudd  \
angelina_jolie        0.000000        1.447480     0.887728   1.429847   
bradley_cooper        1.447480        0.000000     1.313749   1.013447   
kate_siegel           0.887728        1.313749     0.000000   1.388377   
paul_rudd             1.429847        1.013447     1.388377   0.000000   
shea_whigham          1.399074        1.038684     1.379655   1.100503   

                shea_whigham  
angelina_jolie      1.399074  
bradley_cooper      1.038684  
kate_siegel         1.379655  
paul_rudd           1.100503  
shea_whigham        0.000000  
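
Each entry in the matrix is the Euclidean (L2) norm of the difference between two embeddings, so smaller values mean more similar faces. A common way to turn such distances into a same-person decision is to compare them against a threshold. The sketch below does this with toy three-dimensional vectors; the `threshold` of 1.0 is an illustrative value that is not taken from the notebook and would need tuning on labeled pairs.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_identity(a, b, threshold=1.0):
    # `threshold` is a hypothetical cutoff chosen only for this example.
    return euclidean(a, b) < threshold


a = [0.6, 0.8, 0.0]
b = [0.8, 0.6, 0.0]    # points in nearly the same direction as a
c = [-0.6, -0.8, 0.0]  # points in the opposite direction

print(same_identity(a, b))  # True
print(same_identity(a, c))  # False
```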


In [12]:
!pip install requests 
!pip install requests-aws4auth
!pip install Elasticsearch==7.12.1
!pip install urllib3

Collecting requests-aws4auth
  Using cached requests_aws4auth-1.0.1-py2.py3-none-any.whl (29 kB)
Installing collected packages: requests-aws4auth
Successfully installed requests-aws4auth-1.0.1
Collecting Elasticsearch==7.12.1
  Using cached elasticsearch-7.12.1-py2.py3-none-any.whl (339 kB)
Installing collected packages: Elasticsearch
Successfully installed Elasticsearch-7.12.1


In [35]:
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import boto3

host = 'search-image-retrieval-bnppfgqwzoeu5dugmryflqaboa.us-west-2.es.amazonaws.com' # For example, my-test-domain.us-east-1.es.amazonaws.com
region = 'us-west-2' # e.g. us-west-1

service = 'es'
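credentials = boto3.Session().get_credentials()

# Read the basic-auth credentials from the environment instead of hardcoding them.
# ES_USER and ES_PASSWORD are assumed variable names; set them before running this cell.
es = Elasticsearch(
    hosts = [{'host': host, 'port': 443}],
    http_auth = (os.environ['ES_USER'], os.environ['ES_PASSWORD']),
    use_ssl = True,
    verify_certs = True,
    connection_class = RequestsHttpConnection
)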

In [48]:
# es.indices.delete(index='faces', ignore=[400, 404])


{'acknowledged': True}

In [49]:
knn_index = {
    "settings": {
        "index.knn": True
    },
    "mappings": {
        "properties": {
            "face_vector": {
                "type": "knn_vector",
                "dimension": 512
            }
        }
    }
}

es.indices.create(index="faces",body=knn_index,ignore=400)

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'faces'}

In [50]:
def es_import(vector, celebid, doc_id):
    """Index one face embedding in the 'faces' index under the given document id."""
    es.index(index='faces',
             id=doc_id,
             body={"face_vector": vector,
                   "celebid": celebid})

In [51]:
for idx, (name, vector) in enumerate(zip(names, embeddings)): 
    es_import(vector.tolist(), name, idx)
    

In [52]:
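def post(vector):
    """Return up to 5 nearest neighbours of `vector` from the 'faces' index."""
    res = es.search(index="faces",
                    body={
                        "size": 5,
                        # Leave the stored vectors out of the response.
                        "_source": {
                            "excludes": ["face_vector"]
                        },
                        "min_score": 0.3,
                        "query": {
                            "knn": {
                                "face_vector": {
                                    "vector": vector,
                                    "k": 5
                                }
                            }
                        }
                    })
    return res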
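
The `min_score` cutoff is easier to reason about as a distance. The sketch below assumes the index uses the k-NN plugin's default `l2` space and that the reported score is `1 / (1 + d²)`, where `d` is the Euclidean distance between the query and the stored vector. Both points are assumptions and should be checked against the plugin documentation for the deployed version. The helper names are hypothetical.

```python
import math


def score_to_distance(score):
    """Invert score = 1 / (1 + d**2); assumes the l2 scoring described above."""
    if not 0 < score <= 1:
        raise ValueError("score must be in (0, 1]")
    return math.sqrt(1.0 / score - 1.0)


def distance_to_score(distance):
    """Assumed l2 score for a given Euclidean distance."""
    return 1.0 / (1.0 + distance ** 2)


print(round(score_to_distance(0.3), 3))  # 1.528
```

Under these assumptions, `min_score: 0.3` keeps neighbours whose embeddings lie within a Euclidean distance of roughly 1.53 of the query.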



In [53]:
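for i_batch, (x, y) in enumerate(data_loader):
    x = x.to(device)
    embeddings = resnet(x).detach().cpu()
    print(embeddings.tolist())
    print(y.tolist())

    # The class label doubles as the document id here, so a later image
    # with the same label overwrites the earlier document.
    for em, label in zip(embeddings.tolist(), y.tolist()):
        es_import(em, label, label)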


[[0.0280041191726923, -0.048764917999506, -0.004158433992415667, 0.0021526284981518984, 0.02443731389939785, -0.07413841038942337, -0.02228892408311367, 0.044300638139247894, 0.011122863739728928, 0.03831297159194946, 0.0923815593123436, -0.0031082327477633953, 0.03307746723294258, 0.0016379038570448756, 0.024752190336585045, 0.02103581465780735, -0.031371671706438065, -0.014642582274973392, -0.13071776926517487, 0.04885857179760933, -0.016476482152938843, -0.02115459181368351, -0.03669428080320358, -0.015700893476605415, -0.09376261383295059, 0.008723549544811249, -0.026893222704529762, -0.03632628172636032, 0.002141036791726947, -0.0432710237801075, -0.00044235659879632294, 0.05612074211239815, 0.04047264903783798, 0.006305241491645575, -0.0039513735100626945, 0.022132467478513718, 0.022888896986842155, -0.04583035036921501, -0.009405549615621567, 0.0732303038239479, -0.09495580196380615, 0.06768511980772018, 0.01104997843503952, -0.030067134648561478, 0.08747828751802444, 0.05592999

[[0.03957390412688255, 0.0069587137550115585, -0.0486949123442173, 0.022314894944429398, 0.009981784969568253, 0.0027303225360810757, -0.07314132899045944, 0.07336752861738205, -0.06240972876548767, 0.032032787799835205, 0.039950642734766006, 0.04791399836540222, 0.014389993622899055, 0.028986318036913872, 0.008118562400341034, -0.06734280288219452, 0.039277657866477966, 0.11033925414085388, -0.039413563907146454, 0.013098128139972687, 0.03313814848661423, 0.08209269493818283, 0.05217673256993294, -0.0017572380602359772, -0.000983032863587141, -0.017049258574843407, 0.028119107708334923, 0.05634982883930206, 0.10886059701442719, 0.0006979150348342955, 0.04613000154495239, -0.04480203613638878, 0.03255292773246765, -0.0007690279744565487, 0.03200468420982361, -0.01978953927755356, -0.0784585103392601, -0.0015527585055679083, 0.06537538766860962, 0.05788340047001839, -0.037611402571201324, -0.034913524985313416, 0.023197585716843605, 0.01033804565668106, 0.038760099560022354, 0.054005507

In [54]:
for i_batch, (x, y) in enumerate(data_loader):
    x = x.to(device)
    embeddings = resnet(x).detach().cpu()

    # Query the index with each embedding and print the response next to the true label.
    for em, label in zip(embeddings.tolist(), y.tolist()):
        result = post(em)
        print(result)
        print(label)

{'took': 9, 'timed_out': False, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 4, 'relation': 'eq'}, 'max_score': 0.6279724, 'hits': [{'_index': 'faces', '_type': '_doc', '_id': '1', '_score': 0.6279724, '_source': {'celebid': 1}}, {'_index': 'faces', '_type': '_doc', '_id': '2', '_score': 0.39500058, '_source': {'celebid': 2}}, {'_index': 'faces', '_type': '_doc', '_id': '0', '_score': 0.3775033, '_source': {'celebid': 'dummy'}}, {'_index': 'faces', '_type': '_doc', '_id': '3', '_score': 0.36605343, '_source': {'celebid': 3}}]}}
1
{'took': 7, 'timed_out': False, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 4, 'relation': 'eq'}, 'max_score': 0.6059149, 'hits': [{'_index': 'faces', '_type': '_doc', '_id': '1', '_score': 0.6059149, '_source': {'celebid': 1}}, {'_index': 'faces', '_type': '_doc', '_id': '2', '_score': 0.40008172, '_source': {'celebid': 2}}, {'_index': 'faces', '_type': '_
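
The raw responses are verbose, so it can help to reduce each one to its best match. The sketch below pulls the highest-scoring hit out of a response shaped like the ones printed above; `top_hit` is a hypothetical helper and `sample` is an abbreviated copy of the first response.

```python
def top_hit(response):
    """Return (celebid, score) of the best-scoring hit, or None if there are no hits."""
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None
    best = max(hits, key=lambda hit: hit["_score"])
    return best["_source"].get("celebid"), best["_score"]


# Abbreviated copy of the first response printed above.
sample = {
    "hits": {
        "hits": [
            {"_id": "1", "_score": 0.6279724, "_source": {"celebid": 1}},
            {"_id": "2", "_score": 0.39500058, "_source": {"celebid": 2}},
        ]
    }
}

print(top_hit(sample))  # (1, 0.6279724)
```

Comparing the returned `celebid` with the true label for every query gives a quick top-1 accuracy check.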