# Inference Demo for ArcFace models

## Overview
This notebook can be used for inference on ArcFace ONNX models. The demo shows how to use the trained models to do inference in MXNet.

## Models supported
* LResNet100E-IR (ResNet100 backend with ArcFace loss)

## Prerequisites
The following packages need to be installed before proceeding:
* Protobuf compiler - `sudo apt-get install protobuf-compiler libprotoc-dev` (required for ONNX. This command works on Linux systems that use `apt`, such as Debian and Ubuntu. For detailed installation guidelines, see the [ONNX documentation](https://github.com/onnx/onnx#installation))
* ONNX - `pip install onnx`
* MXNet - `pip install mxnet-cu90mkl --pre -U` (tested with this GPU build, which targets CUDA 9.0 with MKL; other builds can be used. `--pre` lets pip install a pre-release build of MXNet, which is required here for ONNX version compatibility. `-U` upgrades any already installed copy of the package to the newest available version)
* numpy - `pip install numpy`
* matplotlib - `pip install matplotlib`
* OpenCV - `pip install opencv-python`
* Scikit-learn - `pip install scikit-learn`
* EasyDict - `pip install easydict`
* Scikit-image - `pip install scikit-image`
* SciPy - `pip install scipy`
* Dlib - `pip install dlib`
* TensorFlow - `pip install tensorflow-gpu==1.11`
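
Before opening the notebook, it can help to confirm that the packages above are importable. The following is a minimal sketch that uses only the standard library; `missing_modules` and `REQUIRED` are hypothetical names, not part of the repo, and the list holds the import names used by this notebook (for example `cv2` for opencv-python and `skimage` for scikit-image).

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names used by this notebook (some differ from the pip package names).
REQUIRED = ["onnx", "mxnet", "numpy", "matplotlib", "cv2",
            "sklearn", "easydict", "skimage", "scipy", "dlib"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Missing modules:", missing if missing else "none")
```

This only checks that each module can be found, not that its version is compatible, so the import cell further down remains the real test.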

In addition, the following scripts (included in the repo) must be present in the same folder as this notebook:
* `mtcnn_detector.py` (performs face detection as part of preprocessing; this version of the notebook uses dlib instead of MTCNN for detection, so no MTCNN setup is needed)
* `helper.py` (helper script for face detection)

`get_all_input()` also loads `./shape_predictor_68_face_landmarks.dat`, so that dlib landmark file must be available in the working directory.

In order to do inference with a python script:
* Generate the script : In Jupyter Notebook browser, go to File -> Download as -> Python (.py)
* Run the script: `python arcface_inference.py`

### Import dependencies
Run the cell below to verify that all dependencies are installed. Continue if no errors are raised; warnings can be ignored.

In [1]:
# __future__ imports must come before all other statements
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import sys
import random
from time import sleep

import cv2
import dlib
import numpy as np
import mxnet as mx
import sklearn
import sklearn.preprocessing  # provides normalize(), used in get_feature()
from sklearn.decomposition import PCA
from scipy import misc
from easydict import EasyDict as edict
from skimage import transform as trans
import matplotlib.pyplot as plt
from mxnet.contrib.onnx.onnx2mx.import_model import import_model

### Load pretrained model
`get_model()` : Imports the ONNX model as an MXNet symbol plus parameter dictionaries, defines a module from the symbol, and binds the parameters to it.

In [2]:
def get_model(ctx, model):
    image_size = (112,112)
    # Import ONNX model
    sym, arg_params, aux_params = import_model(model)
    # Define the network and bind the parameters to it
    model = mx.mod.Module(symbol=sym, context=ctx, label_names = None)
    model.bind(data_shapes=[('data', (1, 3, image_size[0], image_size[1]))])
    model.set_params(arg_params, aux_params)
    return model

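### Configure face detection model for preprocessing
Face detection itself is handled by dlib inside `get_all_input()`; the cell below only selects the MXNet context (GPU if one is available, otherwise CPU).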

In [3]:
# Determine and set context
if len(mx.test_utils.list_gpus())==0:
    ctx = mx.cpu()
else:
    ctx = mx.gpu(0)

### Preprocess images
In order to input only face pixels into the network, all input images are first passed through a pretrained face detection and alignment model (dlib in this version). The outputs of this model are landmark points and a bounding box for each face in the image. Using these outputs, the image is warped with an affine transform to generate the aligned face images that are fed to the network. The functions that perform these steps are defined below.

`preprocess()` : Takes output of face detector (bounding box and landmark points for face in the image) as input and generates aligned face images

`get_all_input()` : Passes input images through the face detector, and returns aligned face images generated by `preprocess()`
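
To make the alignment step more concrete, here is a minimal sketch of estimating a similarity transform (rotation, uniform scale, translation) that maps detected landmarks onto the fixed 5-point template. It uses plain NumPy least squares rather than `skimage.transform.SimilarityTransform`, which is what the notebook itself calls, so the two may differ in edge cases. `estimate_similarity` is a hypothetical helper and the landmark values are made up.

```python
import numpy as np

# 5-point template used by preprocess() for 112x112 crops
# (x coordinates already include the +8 shift applied when the width is 112).
TEMPLATE = np.array([
    [38.2946, 51.6963],
    [73.5318, 51.5014],
    [56.0252, 71.7366],
    [41.5493, 92.3655],
    [70.7299, 92.2041]], dtype=np.float64)

def estimate_similarity(points, target):
    """Least-squares 2x3 similarity matrix that maps `points` onto `target`."""
    points = np.asarray(points, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    n = points.shape[0]
    x, y = points[:, 0], points[:, 1]
    # Unknowns (a, b, tx, ty):  x' = a*x - b*y + tx,  y' = b*x + a*y + ty
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([x, -y, np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([y, x, np.zeros(n), np.ones(n)])
    rhs = target.reshape(-1)  # interleaved x0', y0', x1', y1', ...
    a, b, tx, ty = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return np.array([[a, -b, tx], [b, a, ty]])

# Made-up landmarks: the template scaled by 2 and shifted by (10, 20).
landmarks = TEMPLATE * 2.0 + np.array([10.0, 20.0])
M = estimate_similarity(landmarks, TEMPLATE)

# Applying M to the landmarks should land them back on the template.
aligned_points = landmarks @ M[:, :2].T + M[:, 2]
print(np.round(M, 4))
```

The resulting `M` has the 2x3 layout that `cv2.warpAffine` accepts, which is how `preprocess()` applies the transform to the whole image.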

In [4]:
def preprocess(img, bbox=None, landmark=None, **kwargs):
    M = None
    image_size = []
    str_image_size = kwargs.get('image_size', '')
    # Assert input shape
    if len(str_image_size)>0:
        image_size = [int(x) for x in str_image_size.split(',')]
        if len(image_size)==1:
            image_size = [image_size[0], image_size[0]]
        assert len(image_size)==2
        assert image_size[0]==112
        assert image_size[1]==112 or image_size[1]==96
    
    # Do alignment using landmark points
    if landmark is not None:
        assert len(image_size)==2
        src = np.array([
          [30.2946, 51.6963],
          [65.5318, 51.5014],
          [48.0252, 71.7366],
          [33.5493, 92.3655],
          [62.7299, 92.2041] ], dtype=np.float32 )
        if image_size[1]==112:
            src[:,0] += 8.0
        dst = landmark.astype(np.float32)
        tform = trans.SimilarityTransform()
        tform.estimate(dst, src)
        M = tform.params[0:2,:]
        assert len(image_size)==2
        warped = cv2.warpAffine(img,M,(image_size[1],image_size[0]), borderValue = 0.0)
        return warped
    
    # If no landmark points available, do alignment using bounding box. If no bounding box available use center crop
    if M is None:
        if bbox is None:
            det = np.zeros(4, dtype=np.int32)
            det[0] = int(img.shape[1]*0.0625)
            det[1] = int(img.shape[0]*0.0625)
            det[2] = img.shape[1] - det[0]
            det[3] = img.shape[0] - det[1]
        else:
            det = bbox
        margin = kwargs.get('margin', 44)
        bb = np.zeros(4, dtype=np.int32)
        bb[0] = np.maximum(det[0]-margin/2, 0)
        bb[1] = np.maximum(det[1]-margin/2, 0)
        bb[2] = np.minimum(det[2]+margin/2, img.shape[1])
        bb[3] = np.minimum(det[3]+margin/2, img.shape[0])
        ret = img[bb[1]:bb[3],bb[0]:bb[2],:]
        if len(image_size)>0:
            ret = cv2.resize(ret, (image_size[1], image_size[0]))
        return ret
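
The bounding-box fallback in `preprocess()` is simple arithmetic, and the sketch below isolates it in plain Python. `expand_and_clip` and `default_box` are hypothetical helper names, and the image size and box are made up; the 6.25% border and the default margin of 44 pixels are taken from the function above.

```python
def default_box(img_width, img_height):
    """Box used when no detection exists: trims 6.25% from every border."""
    left = int(img_width * 0.0625)
    top = int(img_height * 0.0625)
    return (left, top, img_width - left, img_height - top)

def expand_and_clip(bbox, img_width, img_height, margin=44):
    """Grow (left, top, right, bottom) by margin/2 per side, clipped to the image."""
    left, top, right, bottom = bbox
    half = margin / 2
    return (
        int(max(left - half, 0)),
        int(max(top - half, 0)),
        int(min(right + half, img_width)),
        int(min(bottom + half, img_height)),
    )

# Hypothetical 200x160 image with a face box near the left edge.
print(expand_and_clip((10, 30, 90, 120), img_width=200, img_height=160))
# The fallback box for the same image, before the margin is applied.
print(default_box(200, 160))
```

The clipped box is then used to slice the image, and the crop is resized to the requested `image_size`.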

### Get data from input

In [5]:
def get_all_input(face_img,name_face, save_img=False): 
    #Use dlib to get bbox and 5 points
    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("./shape_predictor_68_face_landmarks.dat")

    # Load the image using Dlib
    img = dlib.load_rgb_image(name_face)
    
    #Get info image
    dets, scores, idx = detector.run(img, 1,-1)
    points = list()
    bbox = list()
    for i, d in enumerate(dets):
        point = list()
        bb = list()
        sp = predictor(img,d)
        #Get values points
        point.extend((sp.part(36).x,sp.part(45).x,sp.part(33).x,sp.part(48).x,sp.part(54).x,
                      sp.part(36).y,sp.part(45).y,sp.part(33).y,sp.part(48).y,sp.part(54).y))
        #set the threshold to identify faces 
        if scores[i] >0:
            bb.extend((d.left(), d.top(), d.right(), d.bottom(),scores[i]))
        if len(bb)>0:
            bbox.append(bb)
            points.append(point)
    bbox = np.array((bbox))
    points = np.array((points))
    
    if bbox.shape[0] == 0:
        return None
    aligned = []
    for index in range(0, len(bbox)):
        item_bbox = bbox[index, 0:4]
        item_points = points[index, :].reshape((2, 5)).T
        # print(bbox)
        # print(points)
        nimg = preprocess(face_img, item_bbox, item_points, image_size='112,112')
        #if save_img:
        #    cv2.imwrite('./Temp/{}-{}.jpg'.format(time.time(),
        #                                          face_counter), nimg)
            # print(self.face_counter)
        #    face_counter += 1

        nimg = cv2.cvtColor(nimg, cv2.COLOR_BGR2RGB)
        aligned.append(np.transpose(nimg, (2, 0, 1)))

    
    return aligned
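
Two array manipulations in `get_all_input()` are easy to misread: the `reshape((2, 5)).T` on the landmark list and the final channel transpose. The sketch below reproduces both on made-up data, assuming only NumPy.

```python
import numpy as np

# Made-up landmark coordinates in the order built by get_all_input():
# five x values followed by the five matching y values.
point = [30, 66, 48, 34, 63,    # x: eye, eye, nose, mouth corner, mouth corner
         52, 51, 72, 92, 92]    # y values in the same order

# reshape((2, 5)) puts the x values in row 0 and the y values in row 1;
# .T then yields one (x, y) pair per row, the layout preprocess() expects.
landmarks = np.array(point).reshape((2, 5)).T

# An aligned crop is height x width x channel (HWC); the network input is
# channel-first (CHW), hence the transpose at the end of get_all_input().
crop_hwc = np.zeros((112, 112, 3), dtype=np.uint8)
crop_chw = np.transpose(crop_hwc, (2, 0, 1))

print(landmarks.shape, crop_chw.shape)
```

When displaying an aligned face with matplotlib, the transpose has to be undone with `np.transpose(face, (1, 2, 0))`, as the plotting cells later in the notebook do.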

### Predict
`get_feature()` : Performs forward pass on the data `aligned` using `model` and returns the embedding

In [6]:
def get_feature(model,aligned):
    input_blob = np.expand_dims(aligned, axis=0)
    data = mx.nd.array(input_blob)
    db = mx.io.DataBatch(data=(data,))
    model.forward(db, is_train=False)
    embedding = model.get_outputs()[0].asnumpy()
    embedding = sklearn.preprocessing.normalize(embedding).flatten()
    return embedding
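
The call to `sklearn.preprocessing.normalize` in `get_feature()` scales each row of the output to unit Euclidean (L2) length by default. As a rough illustration of what that means, here is a NumPy-only sketch; `l2_normalize_rows` is a hypothetical helper and the three-element "embedding" is made up.

```python
import numpy as np

def l2_normalize_rows(x, eps=1e-12):
    """Scale each row of a 2-D array to unit Euclidean length."""
    x = np.asarray(x, dtype=np.float64)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # eps guards against division by zero for an all-zero row
    return x / np.maximum(norms, eps)

raw = np.array([[3.0, 4.0, 0.0]])        # made-up output, batch size 1
unit = l2_normalize_rows(raw).flatten()  # same flatten() as in get_feature()
print(unit, np.linalg.norm(unit))
```

Normalizing here is what lets the later cells compare embeddings with a plain dot product.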

### Download input images and prepare ONNX model

In [7]:
# Download first image
mx.test_utils.download('https://s3.amazonaws.com/onnx-model-zoo/arcface/player1.jpg')
# Download second image
mx.test_utils.download('https://s3.amazonaws.com/onnx-model-zoo/arcface/player2.jpg')
# Download onnx model
mx.test_utils.download('https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100.onnx')
# Path to ONNX model (the download above saves it into the current working directory)
model_name = 'resnet100.onnx'

In [None]:
# Load ONNX model
model = get_model(ctx , model_name)

### Generate predictions
Two face images are passed through the network sequentially to generate an embedding vector for each. The squared Euclidean distance and the cosine similarity between the two embeddings are then computed and displayed. Two images of the same person should give a low distance and a high similarity, and images of different people the opposite. Because the embeddings are L2-normalized, the distance lies in [0,4] and the similarity in [-1,1].

In [None]:
# Load first image
img1 = cv2.imread('player1.jpg')
name_img1 = "player1.jpg"
# Display first image
plt.imshow(cv2.cvtColor(img1,cv2.COLOR_BGR2RGB))
plt.show()

In [None]:
pre1 = get_all_input(img1,name_img1)
#print(pre1.shape)
#print((np.transpose(pre1[0],(1,2,0))).shape)
# Display every aligned face found in the first image
for i in range(len(pre1)):
    plt.imshow(np.transpose(pre1[i],(1,2,0)))
    plt.show()
# Get embedding of first image
out1 = get_feature(model,pre1[0])

In [None]:
# Load second image
img2 = cv2.imread('player2.jpg')
name_img2 = "player2.jpg"
# Display second image
plt.imshow(cv2.cvtColor(img2,cv2.COLOR_BGR2RGB))
plt.show()

In [None]:
pre2 = get_all_input(img2, name_img2)
#print(pre2.shape)
#print((np.transpose(pre2[0],(1,2,0))).shape)
# Display every aligned face found in the second image
for i in range(len(pre2)):
    plt.imshow(np.transpose(pre2[i],(1,2,0)))
    plt.show()
# Get embedding of second image
out2 = get_feature(model,pre2[0])

In [None]:
# Compute squared distance between embeddings
dist = np.sum(np.square(out1-out2))
# Compute cosine similarity between embeddings
sim = np.dot(out1, out2.T)
# Print predictions
print('Distance = %f' %(dist))
print('Similarity = %f' %(sim))
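
Since both embeddings have unit length, the two printed numbers carry the same information: the squared distance equals `2 - 2 * similarity`. The following sketch checks this on two made-up unit vectors, assuming only NumPy; `unit` is a hypothetical helper.

```python
import numpy as np

def unit(v):
    """Return v scaled to unit Euclidean length."""
    v = np.asarray(v, dtype=np.float64)
    return v / np.linalg.norm(v)

# Made-up stand-ins for two normalized embeddings.
a = unit([1.0, 2.0, 2.0])
b = unit([2.0, -1.0, 2.0])

dist = np.sum(np.square(a - b))  # same formula as the cell above
sim = np.dot(a, b)

# For unit vectors: ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b = 2 - 2 * sim
print(np.isclose(dist, 2.0 - 2.0 * sim))
```

In practice this means a decision threshold can be set on either value; the notebook does not prescribe one.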