# Deploy a Face Detection and Recognition Model - SageMaker Inference Endpoint

## Intro

**Face detection**  
To feed only face pixels into the network, every input image is first passed through a pretrained face detection and alignment model, the [MTCNN detector](https://kpzhang93.github.io/MTCNN_face_detection_alignment/index.html). This model outputs landmark points and a bounding box for the face in the image. Using this output, the image is processed with affine transforms to generate the aligned face images that are input to the network.
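
As a rough illustration of the alignment step, the sketch below estimates a similarity transform (rotation, uniform scale, translation) that maps detected landmarks onto a fixed template. It is not the code in `mtcnn_detector.py` or `helper.py`; the function name and the template coordinates are assumptions made for this example.

```python
import numpy as np

# Illustrative 5-point template (eyes, nose tip, mouth corners) for a 112x112 crop.
# These coordinates are an assumption for this sketch, not values taken from the notebook's code.
TEMPLATE = np.array(
    [
        [38.3, 51.7],
        [73.5, 51.5],
        [56.0, 71.7],
        [41.5, 92.4],
        [70.7, 92.2],
    ]
)


def estimate_similarity_transform(src, dst):
    """Least-squares 2x3 matrix M such that dst ~= M @ [x, y, 1] for each src point."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    # Unknowns (a, b, tx, ty) with  x' = a*x - b*y + tx  and  y' = b*x + a*y + ty
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    a, b, tx, ty = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)[0]
    return np.array([[a, -b, tx], [b, a, ty]])
```

The resulting matrix has the 2x3 shape that `cv2.warpAffine(img, M, (112, 112))` expects, so the detected face could be warped into an aligned crop in one call.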

**Face feature generation**  
For each face image, the model produces a fixed-length embedding vector corresponding to the face in the image. Vectors from face images of the same person are more similar to each other than vectors from images of different people. The model is therefore primarily used for face recognition and verification. It can also be used in other applications, such as clustering based on facial features.  
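
A minimal sketch of how such embeddings could be used for verification is shown here. The function names and the `0.5` threshold are hypothetical; a real threshold would need to be tuned on labeled pairs.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    u = np.asarray(u, dtype=float).ravel()
    v = np.asarray(v, dtype=float).ravel()
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def same_person(emb_a, emb_b, threshold=0.5):
    """Declare a match when the similarity reaches the threshold.

    The default threshold is a placeholder chosen for illustration only.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold
```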

**Model artifacts**     
In this implementation, we use LResNet100E-IR, a ResNet100 backend with [ArcFace](https://arxiv.org/abs/1801.07698) loss.   
For both the MTCNN detector and the ResNet, we use the pre-trained models from the [ONNX Model Zoo](https://github.com/onnx/models), then import the [ONNX](http://onnx.ai/) files into an MXNet model.

**Deployment**   
We deploy the pre-built models (detection + recognition) on a SageMaker managed real-time endpoint.
The steps are:

1. Create an inference Python script with functions:
    - `model_fn` to load the model
    - `transform_fn` to handle inference requests
2. Package inference script, model artifacts, and additional files into a tarfile
3. Upload the tarfile to an S3 bucket
4. Create a `MXNetModel` ([documentation](https://sagemaker.readthedocs.io/en/stable/frameworks/mxnet/sagemaker.mxnet.html#sagemaker.mxnet.model.MXNetModel)), indicating framework version
5. Deploy a predictor, indicating the number of instances and the instance type

![Deploy Diagram](./images/sm_deploy_MXNet.png)
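
To make step 1 concrete, here is a hedged skeleton of the two functions. It is not the `inference.py` shipped with this notebook: the stand-in model and the JSON handling are assumptions, chosen so that the sketch runs without MXNet. Only the `feature_vector` response key mirrors what the notebook reads from the endpoint later.

```python
import json

import numpy as np


def model_fn(model_dir):
    """Called once when the container starts; `model_dir` holds the extracted tarball."""
    # Placeholder: a real script would load the MTCNN and ArcFace weights from model_dir.
    # This stand-in returns the per-channel mean so the sketch can run anywhere.
    def dummy_model(image):
        return image.reshape(-1, image.shape[-1]).mean(axis=0)

    return dummy_model


def transform_fn(model, request_body, content_type, accept_type):
    """Called once per request: deserialize, predict, serialize."""
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    image = np.array(json.loads(request_body), dtype=np.uint8)
    features = model(image)
    response = json.dumps({"feature_vector": features.tolist()})
    return response, "application/json"
```

Keeping deserialization, prediction, and serialization in one `transform_fn` is convenient when the response needs values produced during preprocessing.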

Here's the structure we will compress and upload to S3, and that will be replicated in the Endpoint instance.
```
Model
|-- code
|   |-- helper.py
|   |-- inference.py
|   |-- mtcnn_detector.py
|   `-- requirements.txt
|-- mtcnn-model
|   |-- det1-0001.params
|   |-- det1-symbol.json
|   |-- det1.caffemodel
|   |-- det1.prototxt
|   |-- det2-0001.params
|   |-- det2-symbol.json
|   |-- det2.caffemodel
|   |-- det2.prototxt
|   |-- det3-0001.params
|   |-- det3-symbol.json
|   |-- det3.caffemodel
|   |-- det3.prototxt
|   |-- det4-0001.params
|   |-- det4-symbol.json
|   |-- det4.caffemodel
|   `-- det4.prototxt
`-- resnet100.onnx
```

Additional references:

- https://github.com/aws/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/mxnet_onnx_superresolution
- https://sagemaker.readthedocs.io/en/latest/using_mxnet.html#serve-an-mxnet-model
- https://github.com/onnx/models/tree/master/vision/body_analysis/arcface

### Imports and environment setting

We start by importing the libraries needed to run this notebook.  
Unlike the `local` version of this notebook, we don't need to install additional libraries in the notebook kernel. The `requirements.txt` file is included in the tarball, and the additional libraries are installed in the framework image at deployment time.

In [None]:
import shutil
from pathlib import Path

import cv2
import matplotlib.pyplot as plt
import mxnet as mx
import numpy as np
import sagemaker as sm
from sagemaker.mxnet import MXNetModel

Define AWS environment and SageMaker objects.

In [None]:
sm_session = sm.Session()
sm_client = sm_session.sagemaker_client
region = sm_session.boto_region_name
role = sm.get_execution_role()
bucket = sm_session.default_bucket()

Define a prefix for all the files and artifacts of this demo, to easily identify the relevant objects after uploading to S3.  
We also define a few variables to be used later.

In [None]:
prefix = "facedetection-mxnet"

framework = "mxnet"
framework_version = "1.8.0"
cpu_instance_type = "ml.m5.xlarge"
gpu_instance_type = "ml.g4dn.xlarge"

## Download Pre-built Models

Download the pre-trained weights

In [None]:
model_local_path = Path("model")
code_local_path = model_local_path / "code"
images_local_path = Path("images")

In [None]:
def download_mtcnn_model(i: int, dirname: str) -> None:
    """Download the four artifact files of MTCNN stage `i` (0-based) into `dirname`."""
    base_url = f"https://s3.amazonaws.com/onnx-model-zoo/arcface/mtcnn-model/det{i+1}"
    for suffix in ("-0001.params", "-symbol.json", ".caffemodel", ".prototxt"):
        mx.test_utils.download(url=f"{base_url}{suffix}", dirname=dirname)


mtcnn_local_path = model_local_path / "mtcnn-model"

# Fetch all four MTCNN stages (det1 to det4)
for i in range(4):
    download_mtcnn_model(i, dirname=str(mtcnn_local_path))

print(f"MTCNN artifacts downloaded to `{mtcnn_local_path}`")

Download the ONNX model.

In [None]:
arcface_local_path = mx.test_utils.download(
    "https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100.onnx",
    dirname=model_local_path,
)
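
For context, importing the downloaded ONNX file into MXNet could look roughly like the sketch below. It is not taken from `inference.py`; the helper name, the input name `data`, and the `(1, 3, 112, 112)` input shape are assumptions for this example.

```python
def load_onnx_as_mxnet_module(onnx_path, input_shape=(1, 3, 112, 112), use_gpu=False):
    """Import an ONNX file and wrap it in an MXNet Module ready for inference."""
    # Imported lazily so the sketch can be read and imported without MXNet installed.
    import mxnet as mx
    from mxnet.contrib import onnx as onnx_mxnet

    # Convert the ONNX graph into an MXNet symbol plus parameter dictionaries
    sym, arg_params, aux_params = onnx_mxnet.import_model(str(onnx_path))

    ctx = mx.gpu() if use_gpu else mx.cpu()
    # "data" as the input name is an assumption about the exported graph
    model = mx.mod.Module(symbol=sym, context=ctx, data_names=["data"], label_names=None)
    model.bind(for_training=False, data_shapes=[("data", input_shape)])
    model.set_params(arg_params, aux_params)
    return model
```

With the variables defined above, this would be called as `load_onnx_as_mxnet_module(arcface_local_path)`.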

## Inference script
The inference script is in `model/code`. The same folder also contains the support libraries and the `requirements.txt` file.

In [None]:
!pygmentize {code_local_path}/inference.py

Let's check that the inference script works as expected. To test it, we first need to download the test images.

In [None]:
# Download first image
image1_path = mx.test_utils.download(
    "https://s3.amazonaws.com/onnx-model-zoo/arcface/player1.jpg",
    dirname=images_local_path,
)
# Download second image
image2_path = mx.test_utils.download(
    "https://s3.amazonaws.com/onnx-model-zoo/arcface/player2.jpg",
    dirname=images_local_path,
)

img1 = cv2.imread(image1_path)
img2 = cv2.imread(image2_path)

f, ax = plt.subplots(1, 2)
ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
ax[0].set_title("Image1")
ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
ax[1].set_title("Image2");

We can now run the inference script.

In [None]:
%run {code_local_path}/inference.py

## Create compressed archive
We can now compress the folder containing the scripts and the pre-built model and upload it to S3.

In [None]:
model_compressed = shutil.make_archive(
    "model", format="gztar", root_dir=model_local_path
)
model_uri = sm_session.upload_data(
    path=model_compressed, key_prefix=f"{prefix}/{model_local_path.name}"
)
print(f"Compressed Model (scripts and model weights) uploaded to:\n{model_uri}")
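
Before uploading, it can be worth confirming that the archive has the expected layout, with `code/inference.py` at the top level. The sketch below demonstrates such a check on a throwaway directory; the helper name and the placeholder files are assumptions for this example.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path, PurePosixPath


def list_archive_members(archive_path):
    """Return the sorted, normalized file paths stored in a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # PurePosixPath drops any leading "./" that the archiver may add
        return sorted(PurePosixPath(m.name).as_posix() for m in tar.getmembers() if m.isfile())


# Demonstration on a temporary directory that mimics the model folder
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "model"
    (root / "code").mkdir(parents=True)
    (root / "code" / "inference.py").write_text("# placeholder\n")
    (root / "resnet100.onnx").write_bytes(b"")
    archive = shutil.make_archive(str(Path(tmp) / "model"), format="gztar", root_dir=root)
    members = list_archive_members(archive)
    print(members)
```

The same helper could be pointed at `model_compressed` to inspect the real archive.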

## Create Model

In [None]:
model_sm = MXNetModel(
    model_data=model_uri,
    entry_point="inference.py",
    role=role,
    py_version="py37",
    framework_version=framework_version,
)

## Deploy Endpoint

In [None]:
predictor = model_sm.deploy(initial_instance_count=1, instance_type=gpu_instance_type)

### Testing 
We can finally test the model. We will generate features for two test images and then compute a distance and a similarity measure between them.

Get predictions

In [None]:
out = predictor.predict(img1)
img1_preprocessed, out1 = out["preprocessed_image"], np.array(
    out["feature_vector"], dtype=float
)
out = predictor.predict(img2)
img2_preprocessed, out2 = out["preprocessed_image"], np.array(
    out["feature_vector"], dtype=float
)
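
When an image is sent as a JSON-serialized array, the request grows much larger than the original JPEG. The sketch below estimates that size. It assumes the predictor serializes NumPy arrays as nested JSON lists, and it assumes a 6 MB request limit for real-time endpoints; both should be checked against the current SageMaker documentation.

```python
import json

import numpy as np

# Assumed request size limit; verify against the current SageMaker quotas
MAX_PAYLOAD_BYTES = 6 * 1024 * 1024


def json_payload_size(image):
    """Size in bytes of the image when serialized as a nested JSON list."""
    return len(json.dumps(np.asarray(image).tolist()).encode("utf-8"))


def fits_in_payload(image, limit=MAX_PAYLOAD_BYTES):
    """True when the JSON-serialized image stays within the given limit."""
    return json_payload_size(image) <= limit
```

For example, `fits_in_payload(img1)` could be checked before calling the endpoint, and larger images resized first.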

In [None]:
f, ax = plt.subplots(1, 2)
ax[0].imshow(np.transpose(img1_preprocessed, (1, 2, 0)))
ax[0].set_title("Image1_preprocessed")
ax[1].imshow(np.transpose(img2_preprocessed, (1, 2, 0)))
ax[1].set_title("Image2_preprocessed");

Compute distance between the feature vectors

In [None]:
# Compute squared Euclidean distance between embeddings
dist = np.sum(np.square(out1 - out2))

# Compute similarity as the dot product of the embeddings
# (equals cosine similarity when the embeddings are L2-normalized)
sim = np.dot(out1, out2.T)

# Print predictions
print(f"Distance = {dist:.4f}")
print(f"Similarity = {sim:.4f}")
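
The two measures are closely related: for unit-length vectors, the squared distance equals `2 - 2 * similarity`. The sketch below checks this on random vectors. The 512-dimensional size is an assumption, and the identity holds for the endpoint's output only if the inference script L2-normalizes the embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random vectors standing in for embeddings (dimension chosen for illustration)
a = rng.normal(size=512)
b = rng.normal(size=512)

# L2-normalize so that each vector has unit length
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

dist = np.sum(np.square(a - b))
sim = np.dot(a, b)

# ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b = 2 - 2 * sim for unit vectors
print(np.isclose(dist, 2.0 - 2.0 * sim))
```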

We can also check the average inference time. This is not a rigorous benchmark, but it gives us a rough idea of the latency.

In [None]:
%%timeit
predictor.predict(img1)
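
If percentiles are more useful than an average, a small helper along these lines could be used instead. The function name and its defaults are assumptions for this sketch, and the measured time includes the network round trip from the notebook to the endpoint.

```python
import statistics
import time


def measure_latency(fn, n_calls=20, warmup=2):
    """Call `fn` repeatedly and summarize wall-clock latency in milliseconds."""
    # Warm-up calls are excluded from the statistics
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "mean_ms": statistics.mean(samples),
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }
```

Against the endpoint, this would be called as `measure_latency(lambda: predictor.predict(img1))`.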

## Cleanup of resources

In [None]:
predictor.delete_endpoint()