# Deploy a Face Detection and Recognition Model - Local

## Intro

**Face detection**  
To feed only face pixels into the network, every input image is first passed through a pretrained face detection and alignment model, the [MTCNN detector](https://kpzhang93.github.io/MTCNN_face_detection_alignment/index.html). This model outputs facial landmark points and a bounding box for the face in the image. Using this output, the image is warped with an affine transform to produce the aligned face image that is fed to the network.

**Face feature generation**  
For each face image, the model produces a fixed-length embedding vector that represents the face. Vectors from face images of the same person are more similar to each other than vectors from different people, so the model is primarily used for face recognition and verification. It can also be used in other applications, such as clustering based on facial features.

**Model artifacts**     
In this implementation, we use LResNet100E-IR, a ResNet100 backbone trained with the [ArcFace](https://arxiv.org/abs/1801.07698) loss.  
For both the MTCNN detector and the ResNet, we use the pre-trained models from the [ONNX Model Zoo](https://github.com/onnx/models), and then import the [ONNX](http://onnx.ai/) files into MXNet.

### Imports and environment setting

We start by installing the necessary libraries listed in the `requirements.txt` file.

In [None]:
import sys

!{sys.executable} -m pip install -q -r model/code/requirements.txt

We can now import the libraries needed to run this notebook. We also include a custom Python module, `mtcnn_detector.py`, which provides some useful abstractions for detecting and processing faces in images.

In [None]:
from pathlib import Path

import cv2
import matplotlib.pyplot as plt
import mxnet as mx
import numpy as np
from mxnet.contrib.onnx.onnx2mx.import_model import import_model
from skimage import transform as trans

from model.code.mtcnn_detector import MtcnnDetector

We set the context so that a GPU is used if one is available (and properly configured); otherwise we fall back to the CPU.

In [None]:
# Determine and set context
if len(mx.test_utils.list_gpus()) == 0:
    ctx = mx.cpu()
else:
    ctx = mx.gpu(0)

We also define some file paths for this project.

In [None]:
model_local_path = Path("model")
code_local_path = model_local_path / "code"
images_local_path = Path("images")

## Step 1 - Face detection
We start by downloading the pre-trained MTCNN model.

In [None]:
def download_mtcnn_model(i, dirname):
    """Download the four artifact files of the MTCNN network `det{i+1}`."""
    base_url = f"https://s3.amazonaws.com/onnx-model-zoo/arcface/mtcnn-model/det{i+1}"
    mx.test_utils.download(url=f"{base_url}-0001.params", dirname=dirname)
    mx.test_utils.download(url=f"{base_url}-symbol.json", dirname=dirname)
    mx.test_utils.download(url=f"{base_url}.caffemodel", dirname=dirname)
    mx.test_utils.download(url=f"{base_url}.prototxt", dirname=dirname)


mtcnn_local_path = model_local_path / "mtcnn-model"

# Download the artifacts for det1 through det4
for i in range(4):
    download_mtcnn_model(i, dirname=mtcnn_local_path)

print(f"MTCNN artifacts downloaded to `{mtcnn_local_path}`")
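
Before initializing the detector, it can help to confirm that every artifact actually landed on disk. The following is a minimal sketch and not part of the original workflow: the helper name `missing_mtcnn_files` is hypothetical, and the expected file names are simply inferred from the download URLs used above.

```python
from pathlib import Path


def missing_mtcnn_files(dirname, n_models=4):
    """Return the expected MTCNN artifact files that are absent from `dirname`.

    Hypothetical helper: the file names are inferred from the download URLs.
    """
    suffixes = ["-0001.params", "-symbol.json", ".caffemodel", ".prototxt"]
    expected = [
        Path(dirname) / f"det{i + 1}{suffix}"
        for i in range(n_models)
        for suffix in suffixes
    ]
    # Keep only the files that do not exist on disk
    return [path for path in expected if not path.is_file()]
```

Calling `missing_mtcnn_files(mtcnn_local_path)` should return an empty list once all downloads have completed.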

We can now configure and initialize the face detector.

In [None]:
# One detection score threshold per MTCNN stage
det_threshold = [0.6, 0.7, 0.8]
detector = MtcnnDetector(
    model_folder=mtcnn_local_path.as_posix(),
    ctx=ctx,
    num_worker=1,
    accurate_landmark=True,
    threshold=det_threshold,
)

In [None]:
def preprocess(img, bbox=None, landmark=None, **kwargs):
    """
    Preprocess images to detect and extract faces.

    Returns the aligned (or cropped) face as a height x width x channels
    numpy array, e.g. 112 x 112 x 3 for image_size="112,112".
    """
    M = None
    image_size = []
    str_image_size = kwargs.get("image_size", "")
    # Parse and validate the requested output size
    if len(str_image_size) > 0:
        image_size = [int(x) for x in str_image_size.split(",")]
        if len(image_size) == 1:
            image_size = [image_size[0], image_size[0]]
        assert len(image_size) == 2
        assert image_size[0] == 112
        assert image_size[1] == 112 or image_size[1] == 96

    # Do alignment using landmark points
    if landmark is not None:
        assert len(image_size) == 2
        src = np.array(
            [
                [30.2946, 51.6963],
                [65.5318, 51.5014],
                [48.0252, 71.7366],
                [33.5493, 92.3655],
                [62.7299, 92.2041],
            ],
            dtype=np.float32,
        )
        if image_size[1] == 112:
            src[:, 0] += 8.0
        dst = landmark.astype(np.float32)
        tform = trans.SimilarityTransform()
        tform.estimate(dst, src)
        M = tform.params[0:2, :]
        assert len(image_size) == 2
        warped = cv2.warpAffine(img, M, (image_size[1], image_size[0]), borderValue=0.0)
        return warped

    # If no landmark points are available, crop using the bounding box.
    # If no bounding box is available either, use a center crop.
    if M is None:
        if bbox is None:
            det = np.zeros(4, dtype=np.int32)
            det[0] = int(img.shape[1] * 0.0625)
            det[1] = int(img.shape[0] * 0.0625)
            det[2] = img.shape[1] - det[0]
            det[3] = img.shape[0] - det[1]
        else:
            det = bbox
        margin = kwargs.get("margin", 44)
        bb = np.zeros(4, dtype=np.int32)
        bb[0] = np.maximum(det[0] - margin / 2, 0)
        bb[1] = np.maximum(det[1] - margin / 2, 0)
        bb[2] = np.minimum(det[2] + margin / 2, img.shape[1])
        bb[3] = np.minimum(det[3] + margin / 2, img.shape[0])
        ret = img[bb[1] : bb[3], bb[0] : bb[2], :]
        if len(image_size) > 0:
            ret = cv2.resize(ret, (image_size[1], image_size[0]))
        return ret


def get_input(detector, face_img):
    """
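    Run the face detector on an image and return the first detected face,
    aligned and in channels-first (3 x 112 x 112) RGB layout.
    Returns None if no face is found.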
    """
    ret = detector.detect_face(face_img, det_type=0)
    if ret is None:
        return None
    bbox, points = ret
    if bbox.shape[0] == 0:
        return None
    bbox = bbox[0, 0:4]
    points = points[0, :].reshape((2, 5)).T
    # Call preprocess() to generate aligned images
    nimg = preprocess(face_img, bbox, points, image_size="112,112")
    nimg = cv2.cvtColor(nimg, cv2.COLOR_BGR2RGB)
    aligned = np.transpose(nimg, (2, 0, 1))
    return aligned
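
To make the alignment step more concrete, the sketch below reshapes a flat landmark vector the same way `get_input` does and then fits a similarity transform (rotation, uniform scale, translation) that maps the points onto the template used in `preprocess`. It is an illustration only: `fit_similarity` is a hypothetical helper that uses a plain least-squares fit with NumPy rather than `skimage`'s `SimilarityTransform`, and the landmark values are made up. It also assumes the detector returns the five x coordinates followed by the five y coordinates, which is what the `reshape((2, 5)).T` call implies.

```python
import numpy as np


def fit_similarity(points, template):
    """Least-squares 2x3 similarity matrix mapping `points` onto `template`.

    Hypothetical helper (not skimage's algorithm). Model:
        x' = a*x - b*y + tx
        y' = b*x + a*y + ty
    """
    points = np.asarray(points, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    n = points.shape[0]
    ones, zeros = np.ones(n), np.zeros(n)
    # Linear system A @ [a, b, tx, ty] = rhs, with x and y rows interleaved
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([points[:, 0], -points[:, 1], ones, zeros])
    A[1::2] = np.column_stack([points[:, 1], points[:, 0], zeros, ones])
    rhs = template.reshape(-1)  # x'0, y'0, x'1, y'1, ...
    a, b, tx, ty = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return np.array([[a, -b, tx], [b, a, ty]])


# Made-up detector output: five x coordinates followed by five y coordinates
flat = np.array([110.0, 180.0, 145.0, 118.0, 172.0, 120.0, 118.0, 160.0, 200.0, 198.0])
landmarks = flat.reshape((2, 5)).T  # shape (5, 2): one (x, y) row per landmark

# Template from `preprocess`, with x shifted by 8 for a 112 x 112 output
template = np.array(
    [
        [38.2946, 51.6963],
        [73.5318, 51.5014],
        [56.0252, 71.7366],
        [41.5493, 92.3655],
        [70.7299, 92.2041],
    ]
)

M = fit_similarity(landmarks, template)
# Apply the 2x3 matrix to the landmarks in homogeneous coordinates
mapped = np.hstack([landmarks, np.ones((5, 1))]) @ M.T
print("Largest residual after alignment (pixels):", np.abs(mapped - template).max())
```

A matrix of this 2x3 form is what `cv2.warpAffine` expects as its second argument, which is how `preprocess` produces the warped face.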

### Testing Face Detection

We can now check visually that the face detector and the preprocessing function detect the faces and apply the correct transformations to the test images.

Let's download the test images.

In [None]:
# Download first image
image1_path = mx.test_utils.download(
    "https://s3.amazonaws.com/onnx-model-zoo/arcface/player1.jpg",
    dirname=images_local_path,
)
# Download second image
image2_path = mx.test_utils.download(
    "https://s3.amazonaws.com/onnx-model-zoo/arcface/player2.jpg",
    dirname=images_local_path,
)

img1 = cv2.imread(image1_path)
img2 = cv2.imread(image2_path)

f, ax = plt.subplots(1, 2)
ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
ax[0].set_title("Image1")
ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
ax[1].set_title("Image2");

We can now test the detection + preprocessing...

In [None]:
img1_preprocessed = get_input(detector, img1)
img2_preprocessed = get_input(detector, img2)

... and visualize the results.

In [None]:
f, ax = plt.subplots(1, 2)
ax[0].imshow(np.transpose(img1_preprocessed, (1, 2, 0)))
ax[0].set_title("Image1_preprocessed")
ax[1].imshow(np.transpose(img2_preprocessed, (1, 2, 0)))
ax[1].set_title("Image2_preprocessed");
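
Because `get_input` returns `None` when no face is found, the plotting cell above would fail with a fairly opaque error on such an image. A small guard can make this easier to diagnose. The helper below is a hypothetical sketch, and the random array only stands in for a real aligned face.

```python
import numpy as np


def check_aligned_face(face, size=112):
    """Validate the output of `get_input` before plotting or inference.

    Hypothetical helper: raises ValueError with a readable message.
    """
    if face is None:
        raise ValueError("No face detected in the image")
    expected = (3, size, size)
    if face.shape != expected:
        raise ValueError(f"Expected shape {expected}, got {face.shape}")
    return face


# Stand-in for a real aligned face (random pixels, for illustration only)
fake_face = np.random.default_rng(0).integers(0, 256, size=(3, 112, 112), dtype=np.uint8)
print(check_aligned_face(fake_face).shape)  # (3, 112, 112)
```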

## Step 2 - Generate Feature Vector

For the LResNet100E-IR model, we proceed in a similar way to the previous section:

1. Download pre-trained model
2. Convert weights from ONNX to MXNet
3. Initialize model
4. Test

We download the pre-trained ONNX model.

In [None]:
model_path = mx.test_utils.download(
    "https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100.onnx",
    dirname=model_local_path,
)

In [None]:
def get_model(ctx, model):
    """
    Import the ONNX artifact and initialize the model.
    """
    image_size = (112, 112)
    sym, arg_params, aux_params = import_model(model)

    # Define the network and bind the parameters to it
    model = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
    model.bind(data_shapes=[("data", (1, 3, image_size[0], image_size[1]))])
    model.set_params(arg_params, aux_params)
    return model


def get_feature(model, aligned):
    """
    Create a feature vector (embedding) from an aligned face.

    Adds a batch dimension to match the input shape expected by the model.
    Only processes one image at a time.
    """
    # Add a batch dimension: (3, 112, 112) -> (1, 3, 112, 112)
    input_blob = np.expand_dims(aligned, axis=0)
    data = mx.nd.array(input_blob)
    db = mx.io.DataBatch(data=(data,))
    model.forward(db, is_train=False)
    embedding = model.get_outputs()[0].asnumpy()
    # L2-normalize so that the embedding has unit length
    embedding /= (embedding ** 2).sum() ** 0.5
    return embedding.flatten()


model = get_model(ctx, model_path)
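
The two array manipulations inside `get_feature` can be tried in isolation, without loading the model. The sketch below uses placeholder data; `l2_normalize` is a hypothetical helper, and its `eps` guard against division by zero is an addition that the original function does not have.

```python
import numpy as np


def l2_normalize(vec, eps=1e-12):
    """Scale `vec` to unit Euclidean length; `eps` avoids division by zero."""
    vec = np.asarray(vec, dtype=np.float64)
    return vec / max(np.linalg.norm(vec), eps)


# Placeholder for an aligned face in channels-first layout
aligned = np.zeros((3, 112, 112), dtype=np.uint8)
# Add the batch dimension expected by the bound data shape
batch = np.expand_dims(aligned, axis=0)
print(batch.shape)  # (1, 3, 112, 112)

# Toy 2-element "embedding" to show the effect of normalization
raw = np.array([3.0, 4.0])
unit = l2_normalize(raw)
print(unit, np.linalg.norm(unit))
```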

### Testing
We can finally test the model. We will generate features for both test images and then compute a distance and a similarity measure between them.

In [None]:
out1 = get_feature(model, img1_preprocessed)
out2 = get_feature(model, img2_preprocessed)

The result of each inference is a 512-element vector.

In [None]:
out1.shape, out2.shape

Compute the distance and the similarity between the feature vectors.

In [None]:
# Compute squared Euclidean distance between embeddings
dist = np.sum(np.square(out1 - out2))
# Compute cosine similarity between embeddings (both are already L2-normalized)
sim = np.dot(out1, out2.T)
# Print predictions
print("Distance = %f" % (dist))
print("Similarity = %f" % (sim))
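
Because `get_feature` L2-normalizes each embedding, the two printed numbers carry the same information: for unit vectors, the squared Euclidean distance equals `2 - 2 * similarity`. The sketch below checks this identity on random stand-in vectors and shows how a verification decision might be made from the similarity. The function name `is_same_person` and the threshold value are hypothetical; a real threshold would have to be tuned on labeled validation pairs.

```python
import numpy as np

rng = np.random.default_rng(42)
# Random stand-ins for two 512-element embeddings, normalized like `get_feature` output
emb1 = rng.normal(size=512)
emb1 /= np.linalg.norm(emb1)
emb2 = rng.normal(size=512)
emb2 /= np.linalg.norm(emb2)

dist = np.sum(np.square(emb1 - emb2))
sim = np.dot(emb1, emb2)
# For unit vectors: ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * (a . b) = 2 - 2 * (a . b)
print(np.isclose(dist, 2.0 - 2.0 * sim))  # True


def is_same_person(emb_a, emb_b, threshold):
    """Hypothetical decision rule: cosine similarity at or above `threshold`."""
    return float(np.dot(emb_a, emb_b)) >= threshold


# Identical unit vectors have similarity 1, so any threshold below 1 accepts them
print(is_same_person(emb1, emb1, threshold=0.5))  # True
```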

We can also check the average processing time. This is not a rigorous benchmark, but it gives us a rough idea.

In [None]:
%%timeit
get_feature(model, img1_preprocessed)
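
The `%%timeit` magic only works inside IPython or Jupyter. When running the same check from a plain Python script, a rough equivalent can be written with `time.perf_counter`. This is a minimal sketch: `average_latency` is a hypothetical helper, and the warm-up and repeat counts are arbitrary choices.

```python
import time


def average_latency(fn, n_runs=20, n_warmup=2):
    """Return the mean wall-clock time in seconds of calling `fn()`."""
    for _ in range(n_warmup):
        fn()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs


# Example with a cheap stand-in workload
latency = average_latency(lambda: sum(range(10_000)))
print(f"{latency * 1e3:.3f} ms per call")
```

With the model loaded, the same helper could be called as `average_latency(lambda: get_feature(model, img1_preprocessed))`.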