# Face Recognition

- Implement the triplet loss function
- Use a pretrained model to map face images into 128-dimensional encodings
- Use these encodings to perform face verification and face recognition

<img src="images/distance_kiank.png" style="width:500px;height:200px;">

The network takes 96x96 RGB images as input. Specifically, it takes a face image (or a batch of $m$ face images) as a tensor of shape $(m, n_C, n_H, n_W) = (m, 3, 96, 96)$. Note that the input is not $(m, n_H, n_W, n_C)$: the channels-first convention is used here rather than the more common channels-last one. The network outputs a matrix of shape $(m, 128)$ that encodes each input face image as a 128-dimensional vector.
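
As a minimal sketch of the layout difference, the snippet below converts a batch from channels-last to channels-first with NumPy. The array name and the batch size of 4 are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

# Hypothetical batch of 4 RGB images in channels-last layout (m, n_H, n_W, n_C).
batch_channels_last = np.zeros((4, 96, 96, 3), dtype=np.float32)

# Move the channel axis to position 1 to obtain (m, n_C, n_H, n_W).
batch_channels_first = np.transpose(batch_channels_last, (0, 3, 1, 2))

print(batch_channels_first.shape)  # (4, 3, 96, 96)
```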

For an image $x$, its encoding is denoted as $f(x)$, where $f$ is the function computed by the neural network.

Training uses triplets of images $(A, P, N)$, where $A$ is an "Anchor" image (a picture of a person), $P$ is a "Positive" image (another picture of the same person), and $N$ is a "Negative" image (a picture of a different person). $(A^{(i)}, P^{(i)}, N^{(i)})$ denotes the $i$-th training example.

We want an image $A^{(i)}$ of an individual to be closer to the Positive $P^{(i)}$ than to the Negative $N^{(i)}$ by at least a margin $\alpha$:

$$\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2 + \alpha < \lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2$$

The objective is to minimize the following "triplet cost":

$$\mathcal{J} = \sum^{m}_{i=1} \left[ \underbrace{\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2}_\text{(1)} - \underbrace{\lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2}_\text{(2)} + \alpha \right]_+ \tag{3}$$

Here the notation "$[z]_+$" denotes $\max(z, 0)$.
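
To make formula (3) concrete, here is a small NumPy sketch that evaluates the triplet cost on two hand-made triplets. The function name `triplet_cost_np` and the 2-dimensional toy encodings are assumptions for illustration; the notebook's own implementation uses TensorFlow ops.

```python
import numpy as np

def triplet_cost_np(anchor, positive, negative, alpha=0.2):
    """Toy NumPy version of formula (3); inputs have shape (m, d)."""
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)  # term (1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)  # term (2)
    # Apply [z]_+ = max(z, 0) per triplet, then sum over the m triplets.
    return np.sum(np.maximum(pos_dist - neg_dist + alpha, 0.0))

# Two toy triplets with 2-dimensional "encodings".
anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[1.0, 0.0], [1.0, 0.0]])
negative = np.array([[2.0, 0.0], [1.0, 0.0]])

cost = triplet_cost_np(anchor, positive, negative)
print(cost)
```

The first triplet already satisfies the margin (1 + 0.2 < 4), so it contributes 0. The second has equal positive and negative distances, so it contributes exactly the margin, 0.2.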

Version notes: tensorflow v1.4.0, numpy v1.16.4 (downgraded for incompatibility with tf), Keras v2.1.4

In [1]:
from keras.models import Sequential
from keras.layers import Conv2D, ZeroPadding2D, Activation, Input, concatenate
from keras.models import Model
from keras.layers.normalization import BatchNormalization
from keras.layers.pooling import MaxPooling2D, AveragePooling2D
from keras.layers.merge import Concatenate
from keras.layers.core import Lambda, Flatten, Dense
from keras.initializers import glorot_uniform
from keras.engine.topology import Layer

from keras import backend as K
K.set_image_data_format('channels_first')

import cv2
import os
import numpy as np
from numpy import genfromtxt
import pandas as pd
import tensorflow as tf

from frutils import *

%matplotlib inline
%load_ext autoreload
%autoreload 2

import warnings
warnings.filterwarnings(action="ignore", category=FutureWarning)

np.set_printoptions(threshold=2**31-1)

Using TensorFlow backend.


In [2]:
FRmodel = faceRecoModel(input_shape=(3, 96, 96))

In [3]:
print("Total Params:", FRmodel.count_params())

Total Params: 3743280


In [4]:
def triplet_loss(y_true, y_pred, alpha = 0.2):
    """
    Implementation of the triplet loss as defined by formula (3)
    
    Arguments:
    y_true -- true labels, required when you define a loss in Keras, you don't need it in this function.
    y_pred -- python list containing three objects:
            anchor -- the encodings for the anchor images, of shape (None, 128)
            positive -- the encodings for the positive images, of shape (None, 128)
            negative -- the encodings for the negative images, of shape (None, 128)
    
    Returns:
    loss -- real number, value of the loss
    """
    
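    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
    
    # Squared L2 distance between anchor and positive, summed over the encoding axis
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), axis=-1)
    # Squared L2 distance between anchor and negative
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), axis=-1)
    # Term inside the brackets of formula (3)
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
    # Clip at zero and sum over the training examples
    loss = tf.reduce_sum(tf.maximum(basic_loss,0.0))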
    
    return loss

In [5]:
FRmodel.compile(optimizer = 'adam', loss = triplet_loss, metrics = ['accuracy'])
load_weights_from_FaceNet(FRmodel)

In [6]:
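def img_to_encoding(image_path, model):
    # Read as a 3-channel color image; OpenCV returns BGR in (H, W, C) layout
    img1 = cv2.imread(image_path, 1)
    if img1 is None:
        raise IOError("Could not read image: " + image_path)
    # Reverse the channel axis to convert BGR to RGB
    img = img1[...,::-1]
    # Move channels first and scale pixel values to [0, 1]
    img = np.around(np.transpose(img, (2,0,1))/255.0, decimals=12)
    # Add a batch dimension
    x_train = np.array([img])
    embedding = model.predict_on_batch(x_train)
    return embedding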
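
The preprocessing inside `img_to_encoding` can be checked without a model or an image file. The sketch below applies the same steps to a synthetic array standing in for the output of `cv2.imread`; the pure-blue test image is an assumption chosen so the channel reversal is easy to see.

```python
import numpy as np

# Synthetic 96x96 image in OpenCV layout: (height, width, channels), BGR order.
bgr = np.zeros((96, 96, 3), dtype=np.uint8)
bgr[..., 0] = 255  # pure blue: blue is channel 0 in BGR

rgb = bgr[..., ::-1]                        # reverse the last axis: BGR -> RGB
chw = np.transpose(rgb, (2, 0, 1)) / 255.0  # channels first, scaled to [0, 1]
batch = np.array([chw])                     # add a batch dimension

print(batch.shape)  # (1, 3, 96, 96)
```

After the reversal, blue sits in the last channel, so `batch[0, 2]` is all ones and `batch[0, 0]` is all zeros.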

In [7]:
database = {}
database["danielle"] = img_to_encoding("data/database/danielle.png", FRmodel)
database["younes"] = img_to_encoding("data/database/younes.jpg", FRmodel)
database["tian"] = img_to_encoding("data/database/tian.jpg", FRmodel)
database["andrew"] = img_to_encoding("data/database/andrew.jpg", FRmodel)
database["kian"] = img_to_encoding("data/database/kian.jpg", FRmodel)
database["dan"] = img_to_encoding("data/database/dan.jpg", FRmodel)
database["sebastiano"] = img_to_encoding("data/database/sebastiano.jpg", FRmodel)
database["bertrand"] = img_to_encoding("data/database/bertrand.jpg", FRmodel)
database["kevin"] = img_to_encoding("data/database/kevin.jpg", FRmodel)
database["felix"] = img_to_encoding("data/database/felix.jpg", FRmodel)
database["benoit"] = img_to_encoding("data/database/benoit.jpg", FRmodel)
database["arnaud"] = img_to_encoding("data/database/arnaud.jpg", FRmodel)

In [8]:
def who_is_it(image_path, database, model):
    """
    Implements face recognition for the office by identifying the person in the image at image_path.
    
    Arguments:
    image_path -- path to an image
    database -- database containing image encodings along with the name of the person on the image
    model -- your Inception model instance in Keras
    
    Returns:
    min_dist -- the minimum distance between image_path encoding and the encodings from the database
    identity -- string, the name prediction for the person on image_path
    """
    
    ## Step 1: Compute the target "encoding" for the image. Use img_to_encoding()
    encoding = img_to_encoding(image_path, model)
    
    ## Step 2: Find the closest encoding ##
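    min_dist = float("inf")
    identity = None  # remains None if the database is empty
    
    for (name, db_enc) in database.items():
        # L2 distance between the target encoding and this database entry
        dist = np.linalg.norm(db_enc - encoding)
        if dist < min_dist:
            min_dist = dist
            identity = name

    # Distances above 0.7 are treated as "no match"
    if min_dist > 0.7:
        print("Not in the database.")
    else:
        print("it's " + str(identity) + ", the distance is " + str(min_dist))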
        
    return min_dist, identity

In [9]:
who_is_it("images/camera_5.jpg", database, FRmodel)

it's dan, the distance is 0.49662504


(0.49662504, 'dan')
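
Face verification, the other task listed at the top, checks a single claimed identity instead of searching the whole database. The following is a minimal sketch under stated assumptions: `verify_encoding` is a hypothetical helper, it works on precomputed encodings rather than image paths, and the toy database uses synthetic vectors in place of model outputs. The 0.7 threshold mirrors the one used in `who_is_it`.

```python
import numpy as np

def verify_encoding(encoding, identity, database, threshold=0.7):
    """Return (dist, is_match) for a claimed identity.

    Hypothetical helper: `encoding` is a precomputed face encoding and
    `database` maps names to stored encodings of the same shape.
    """
    dist = float(np.linalg.norm(encoding - database[identity]))
    return dist, dist <= threshold

# Synthetic 128-dimensional encodings standing in for model outputs.
toy_database = {
    "alice": np.zeros((1, 128)),
    "bob": np.ones((1, 128)),
}
probe = np.full((1, 128), 0.01)  # very close to the stored "alice" encoding

dist_alice, match_alice = verify_encoding(probe, "alice", toy_database)
dist_bob, match_bob = verify_encoding(probe, "bob", toy_database)
print(match_alice, match_bob)  # True False
```

With real data, the probe encoding would come from `img_to_encoding` and the database from the cell that builds `database`.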