# Face Recognition for the Happy House

Many of the ideas presented here are from [FaceNet](https://arxiv.org/pdf/1503.03832.pdf). 

Face recognition problems commonly fall into two categories: 

- **Face Verification** - "is this the claimed person?". For example, at some airports, you can pass through customs by letting a system scan your passport and then verifying that you (the person carrying the passport) are the correct person. A mobile phone that unlocks using your face is also using face verification. This is a 1:1 matching problem. 
- **Face Recognition** - "who is this person?". This is a 1:K matching problem. 

FaceNet learns a neural network that encodes a face image into a vector of 128 numbers. By comparing two such vectors, you can then determine if two pictures are of the same person.
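Comparing two encodings comes down to a distance and a threshold. The sketch below uses random 128-dimensional vectors as stand-ins for real FaceNet outputs and assumes the 0.7 threshold that this notebook uses later in `verify()`. `encoding_distance` and `same_person` are illustrative helper names, not part of any library.

```python
import numpy as np

def encoding_distance(enc_a: np.ndarray, enc_b: np.ndarray) -> float:
    """Euclidean (L2) distance between two face encodings."""
    return float(np.linalg.norm(enc_a - enc_b))

def same_person(enc_a: np.ndarray, enc_b: np.ndarray, threshold: float = 0.7) -> bool:
    """Face verification (1:1): accept if the encodings are closer than the threshold."""
    return encoding_distance(enc_a, enc_b) < threshold

# Toy 128-dimensional encodings standing in for real FaceNet outputs.
rng = np.random.default_rng(0)
enc_1 = rng.normal(size=128)
enc_1 /= np.linalg.norm(enc_1)                # unit norm
enc_2 = enc_1 + 0.01 * rng.normal(size=128)   # small perturbation: plays the "same person"
enc_3 = -enc_1                                # opposite direction: plays a "different person"

print(same_person(enc_1, enc_2))  # True
print(same_person(enc_1, enc_3))  # False
```

Face recognition (1:K) repeats the same distance computation against every stored encoding and keeps the closest one.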
    


# 1 - Load, Save and Change Directory

In [55]:
# Load Libraries
from keras import backend as K
K.set_image_data_format('channels_first')
import cv2
import os
import numpy as np
from numpy import genfromtxt
import pandas as pd
import tensorflow as tf
import dlib
from PIL import Image
from skimage import io

%matplotlib inline
%load_ext autoreload
%autoreload 2



In [2]:
# Mount Drive
from os.path import join
from google.colab import drive

ROOT = "/content/drive"
drive.mount(ROOT)



In [3]:
# Change working directory
cd '/content/drive/My Drive/Colab/Workspace/DeepLearn/CNN_Data/Face Recognition OpenFace'

/content/drive/My Drive/Colab/Workspace/DeepLearn/CNN_Data/Face Recognition OpenFace


In [0]:
# Create folder to save Git contents
# Check if such a folder exists from previous runtime. If yes, delete it.
path = '/content/drive/My Drive/Colab/Workspace/DeepLearn/CNN_Data/Face Recognition OpenFace'

files = os.listdir(path)

if 'Github' in files:
  !rm -r Github
  

In [7]:
# Make a new Github directory
!mkdir Github

# Define path for the Github folder
path = join('/content/drive', 'My Drive/Colab/Workspace/DeepLearn/CNN_Data/Face Recognition OpenFace/Github')

# Clone Git contents to the folder created 
!git clone https://github.com/SindhuSobhan/DeepLearn.git "{path}"
  

Cloning into '/content/drive/My Drive/Colab/Workspace/DeepLearn/CNN_Data/Face Recognition OpenFace/Github'...

In [8]:
# Change working directory
cd '/content/drive/My Drive/Colab/Workspace/DeepLearn/CNN_Data/Face Recognition OpenFace/Github'

/content/drive/My Drive/Colab/Workspace/DeepLearn/CNN_Data/Face Recognition OpenFace/Github


In [9]:
# Switch branch if needed
!git checkout CNN

Branch 'CNN' set up to track remote branch 'CNN' from 'origin'.
Switched to a new branch 'CNN'


In [10]:
# Switch current working folder to Github folder containing files
cd '/content/drive/My Drive/Colab/Workspace/DeepLearn/CNN_Data/Face Recognition OpenFace/Github/CNN/Face Recognition'

/content/drive/My Drive/Colab/Workspace/DeepLearn/CNN_Data/Face Recognition OpenFace/Github/CNN/Face Recognition


In [0]:
# import custom libraries
from fr_utils import *
from inception_blocks_v2 import *


# 2 - Face Detection

Code for face detection. The code is inspired by Katrina Malakhova's answer on [this](https://stackoverflow.com/questions/13211745/detect-face-then-autocrop-pictures) StackOverflow page.

The code uses dlib's built-in face detectors. Of the two, the HOG-based `get_frontal_face_detector()` and the CNN-based `cnn_face_detection_model_v1` (loaded from `mmod_human_face_detector.dat`), the latter was found to give better results.

The detector returns the bounding-box coordinates of each face, which are used to crop the image so that only the face is used to produce the encoding later. The cropped face is then resized to 96 by 96 pixels so that it can be fed into the FaceNet model. The cropped and resized image is either saved under its original filename in the folder given by `save_direc` (when `save=True`) or returned as an array (when `save=False`, the default).
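The crop-and-resize step can be illustrated without dlib. The sketch below is a hypothetical helper, `crop_and_resize`, that takes a `(left, top, right, bottom)` box (the same order as the `face_frames` returned by `detect_faces`), clamps it to the image, since detector boxes can extend past the border, and resizes with PIL instead of the `cv2.resize` call used in the notebook.

```python
import numpy as np
from PIL import Image

def crop_and_resize(image: np.ndarray, box, size: int = 96) -> np.ndarray:
    """Crop box = (left, top, right, bottom) out of an H x W x 3 image, then resize to size x size."""
    height, width = image.shape[:2]
    left, top, right, bottom = box
    # Clamp the box to valid pixel coordinates.
    left, top = max(0, left), max(0, top)
    right, bottom = min(width, right), min(height, bottom)
    face = Image.fromarray(image).crop((left, top, right, bottom))
    # Image.BOX averages the source pixels under each output pixel,
    # similar in spirit to cv2.INTER_AREA when downscaling.
    face = face.resize((size, size), Image.BOX)
    return np.asarray(face)

# A blank 200 x 300 RGB image and a box that pokes past the left edge.
image = np.zeros((200, 300, 3), dtype=np.uint8)
face = crop_and_resize(image, (-10, 20, 150, 180))
print(face.shape)  # (96, 96, 3)
```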

In [0]:
# Load the CNN (MMOD) face detector once; reloading the model file on every call is slow.
face_detector = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')

def detect_faces(image):
  # Run the detector (upsampling the image once) and get bounding boxes of the faces on image.
  detected_faces = face_detector(image, 1)
  face_frames = [(x.rect.left(), x.rect.top(),
                  x.rect.right(), x.rect.bottom()) for x in detected_faces]

  return face_frames
  
  
  
def crop_with_face_frame(image):
  # Detect faces; return None if no face is found
  detected_faces = detect_faces(image)
  if not detected_faces:
    return None
  
  # If several faces are found, the last one is kept.
  # PIL's crop box is (left, upper, right, lower), the same order as face_frames.
  return np.array(Image.fromarray(image).crop(detected_faces[-1]))



def process_and_save_image(image_path, save_direc, save = False):
  # Load image
  image = io.imread(image_path)
  
  # Get filename
  im_name = os.path.split(image_path)[-1]
  
  # Crop face out of image
  image = crop_with_face_frame(image)
  if image is None:
    raise ValueError("No face detected in " + image_path)
  
  # Resize image to the 96 x 96 input expected by the CNN
  image = cv2.resize(image, (96, 96), interpolation = cv2.INTER_AREA)
  
  # Save the image as a jpg file or return the image as an array
  if save:
    Image.fromarray(image).save(os.path.join(save_direc, im_name))
  else:
    return image
  

In [0]:
# Specify path to the directory where the folder for processed images is to be created
path = '/content/drive/My Drive/Colab/Workspace/DeepLearn/CNN_Data/Face Recognition OpenFace/Github/CNN/Face Recognition'

# Get all existing files/folder names from that directory
files = os.listdir(path)

# If such a folder exists from a previous runtime, remove it. Then create a fresh, empty folder.
if 'processed_images' in files:
  !rm -r processed_images
!mkdir processed_images

Let us save the images, after cropping and resizing them, in a folder called `processed_images`. **This folder is in the same directory as this file and the other custom library files**. Any .png files are converted to .jpg first so that they can be fed into the model without channel-count issues: a PNG can carry a fourth (alpha) channel, whereas the model expects 3-channel RGB input.
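To see why the conversion matters, the sketch below builds a small hypothetical RGBA image in memory (no files involved), shows its four channels, and round-trips it through JPEG after `convert('RGB')`. Pillow will not write an RGBA image as JPEG directly, which is one reason the conversion happens before saving.

```python
import io
import numpy as np
from PIL import Image

# A hypothetical 4-channel (RGBA) image, like a PNG with transparency.
rgba = Image.fromarray(np.full((8, 8, 4), 200, dtype=np.uint8))
print(np.asarray(rgba).shape)   # (8, 8, 4): one channel too many for a 3-channel model

# convert('RGB') drops the alpha channel; JPEG has no alpha channel anyway.
rgb = rgba.convert("RGB")
buffer = io.BytesIO()
rgb.save(buffer, "JPEG")
buffer.seek(0)
reloaded = Image.open(buffer)
print(reloaded.mode, np.asarray(reloaded).shape)   # RGB (8, 8, 3)
```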

In [0]:
# The base path for all file handling
base_path = '/content/drive/My Drive/Colab/Workspace/DeepLearn/CNN_Data/Face Recognition OpenFace/Github/CNN/Face Recognition'
# Original image path
path = os.path.join(base_path, 'images')
# Processed image directory
save_direc = os.path.join(base_path, 'processed_images')

# Save files as .jpg in save_direc. The name of the file remains the same.
# Iterate over the original images folder.
for file in os.listdir(path):
  direc = os.path.join(path, file)
  if file.lower().endswith('.png'):
    # Convert the PNG (possibly RGBA) to a 3-channel JPEG first
    new_path = direc[:-4] + ".jpg"
    Image.open(direc).convert('RGB').save(new_path, "JPEG")
    process_and_save_image(new_path, save_direc, save = True)
  else:
    process_and_save_image(direc, save_direc, save = True)

# 3 - Face Recognition

## 1 - Encoding face images into a 128-dimensional vector 

### 1.1 - Using a ConvNet to compute encodings

The FaceNet model takes a lot of data and a long time to train. So, following common practice in applied deep learning, let's just load weights that someone else has already trained. The network architecture follows the Inception model from [Szegedy *et al.*](https://arxiv.org/abs/1409.4842). 


The key things you need to know are:

- This network uses 96x96 RGB images as its input. Specifically, it takes a face image (or a batch of $m$ face images) as a tensor of shape $(m, n_C, n_H, n_W) = (m, 3, 96, 96)$
- It outputs a matrix of shape $(m, 128)$ that encodes each input face image into a 128-dimensional vector
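Because the notebook sets Keras to `channels_first`, images loaded as height x width x channels must be transposed before they match the $(m, 3, 96, 96)$ input shape. The sketch below is a hypothetical helper, `to_model_input`; scaling pixel values to $[0, 1]$ is an assumption about the preprocessing that `img_to_encoding` in `fr_utils` applies, so check that file before relying on it.

```python
import numpy as np

def to_model_input(images):
    """Stack H x W x C uint8 images into a float batch of shape (m, C, H, W) scaled to [0, 1]."""
    batch = np.stack(images).astype(np.float32) / 255.0   # (m, 96, 96, 3)
    return np.transpose(batch, (0, 3, 1, 2))              # (m, 3, 96, 96): channels_first

faces = [np.zeros((96, 96, 3), dtype=np.uint8) for _ in range(4)]
x = to_model_input(faces)
print(x.shape)  # (4, 3, 96, 96)
```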

Run the cell below to create the model for face images.

In [12]:
FRmodel = faceRecoModel(input_shape=(3, 96, 96))
In [0]:
print("Total Params:", FRmodel.count_params())

Total Params: 3743280


By using a 128-neuron fully connected layer as its last layer, the model ensures that the output is an encoding vector of size 128. You then compare two face images by computing the distance between their encodings: a small distance suggests the same person.

An encoding is therefore a good one if: 
- The encodings of two images of the same person are quite similar to each other 
- The encodings of two images of different persons are very different

The triplet loss function formalizes this, and tries to "push" the encodings of two images of the same person (Anchor and Positive) closer together, while "pulling" the encodings of two images of different persons (Anchor, Negative) further apart. 





### 1.2 - The Triplet Loss (Function not needed for this project)

For an image $x$, we denote its encoding $f(x)$, where $f$ is the function computed by the neural network.

<!--
We will also add a normalization step at the end of our model so that $\mid \mid f(x) \mid \mid_2 = 1$ (means the vector of encoding should be of norm 1).
!-->

Training will use triplets of images $(A, P, N)$:  

- A is an "Anchor" image--a picture of a person. 
- P is a "Positive" image--a picture of the same person as the Anchor image.
- N is a "Negative" image--a picture of a different person than the Anchor image.

These triplets are picked from our training dataset. We will write $(A^{(i)}, P^{(i)}, N^{(i)})$ to denote the $i$-th training example. 

You'd like to make sure that the encoding of an Anchor image $A^{(i)}$ is closer to the encoding of the Positive $P^{(i)}$ than to the encoding of the Negative $N^{(i)}$ by at least a margin $\alpha$:

$$\mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2 + \alpha < \mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2$$

You would thus like to minimize the following "triplet cost":

$$\mathcal{J} = \sum^{m}_{i=1} \large[ \small \underbrace{\mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2}_\text{(1)} - \underbrace{\mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2}_\text{(2)} + \alpha \large ] \small_+ \tag{3}$$

Here, we are using the notation "$[z]_+$" to denote $\max(z,0)$.  

Notes:
- The term (1) is the squared distance between the anchor "A" and the positive "P" for a given triplet; you want this to be small. 
- The term (2) is the squared distance between the anchor "A" and the negative "N" for a given triplet; you want this to be relatively large, so it makes sense to have a minus sign preceding it. 
- $\alpha$ is called the margin. It is a hyperparameter that you should pick manually. We will use $\alpha = 0.2$. 

Most implementations also normalize the encoding vectors to have norm equal to one (i.e., $\mid \mid f(img)\mid \mid_2=1$); you won't have to worry about that here.
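Normalizing encodings to unit length also bounds the distances: for unit vectors, $\mid \mid a - b \mid \mid_2^2 = 2 - 2\cos(a, b)$, so squared distances lie in $[0, 4]$. The sketch below checks that identity on two random vectors; `l2_normalize` is an illustrative helper, not the normalization layer of the model used here.

```python
import numpy as np

def l2_normalize(encodings, eps=1e-10):
    """Scale each row (one encoding per row) to unit L2 norm."""
    norms = np.linalg.norm(encodings, axis=-1, keepdims=True)
    return encodings / np.maximum(norms, eps)

rng = np.random.default_rng(1)
a, b = l2_normalize(rng.normal(size=(2, 128)))
cosine = float(np.dot(a, b))
squared_dist = float(np.sum((a - b) ** 2))
# For unit vectors, ||a - b||^2 = 2 - 2 * cos(a, b).
print(np.isclose(squared_dist, 2 - 2 * cosine))  # True
```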

**Exercise**: Implement the triplet loss as defined by formula (3). Here are the 4 steps:
1. Compute the distance between the encodings of "anchor" and "positive": $\mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2$
2. Compute the distance between the encodings of "anchor" and "negative": $\mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2$
3. Compute the formula per training example: $ \mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2 - \mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2 + \alpha$
4. Compute the full formula by taking the max with zero and summing over the training examples:
$$\mathcal{J} = \sum^{m}_{i=1} \large[ \small \mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2 - \mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2+ \alpha \large ] \small_+ \tag{3}$$

Useful functions: `tf.reduce_sum()`, `tf.square()`, `tf.subtract()`, `tf.add()`, `tf.maximum()`.
For steps 1 and 2, you will need to sum over the entries of $\mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2$ and $\mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2$ while for step 4 you will need to sum over the training examples.
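The four steps can also be written in plain NumPy, which is handy for checking the TensorFlow version on small inputs. The sketch below is a hypothetical `triplet_cost_np` with two hand-checkable 2-D triplets; the per-triplet arithmetic is worked out in the comments.

```python
import numpy as np

def triplet_cost_np(anchor, positive, negative, alpha=0.2):
    """NumPy version of formula (3); each argument has shape (m, d), e.g. (m, 128)."""
    pos_dist = np.sum(np.square(anchor - positive), axis=-1)   # step 1: shape (m,)
    neg_dist = np.sum(np.square(anchor - negative), axis=-1)   # step 2: shape (m,)
    basic_loss = pos_dist - neg_dist + alpha                   # step 3: shape (m,)
    return float(np.sum(np.maximum(basic_loss, 0.0)))          # step 4: hinge, then sum

# Triplet 1: pos_dist = 1, neg_dist = 4, so 1 - 4 + 0.2 = -2.8 and it contributes 0.
# Triplet 2: pos_dist = 1, neg_dist = 1, so 1 - 1 + 0.2 = 0.2 and it contributes 0.2.
anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[1.0, 0.0], [0.0, 1.0]])
negative = np.array([[0.0, 2.0], [1.0, 0.0]])
print(triplet_cost_np(anchor, positive, negative))  # 0.2
```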

In [0]:
def triplet_loss(y_true, y_pred, alpha = 0.2):
    """
    Implementation of the triplet loss as defined by formula (3)
    
    Arguments:
    y_true -- true labels, required when you define a loss in Keras, you don't need it in this function.
    y_pred -- python list containing three objects:
            anchor -- the encodings for the anchor images, of shape (None, 128)
            positive -- the encodings for the positive images, of shape (None, 128)
            negative -- the encodings for the negative images, of shape (None, 128)
    
    Returns:
    loss -- real number, value of the loss
    """
    
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
    
    # Compute the (encoding) distance between the anchor and the positive, you will need to sum over axis=-1
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), axis = -1)
    # Compute the (encoding) distance between the anchor and the negative, you will need to sum over axis=-1
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), axis = -1)
    # Subtract the two previous distances and add alpha.
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
    # Take the maximum of basic_loss and 0.0. Sum over the training examples.
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0))
    
    return loss

## 2 - Loading the trained model

FaceNet is trained by minimizing the triplet loss. But since training requires a lot of data and a lot of computation, we won't train it from scratch here. Instead, we load a previously trained model. Load a model using the following cell; this might take a couple of minutes to run. 

In [14]:
#FRmodel.compile(optimizer = 'adam', loss = triplet_loss, metrics = ['accuracy'])
weights_directory = '/content/drive/My Drive/Colab/Workspace/DeepLearn/CNN_Data/Face Recognition OpenFace/weights'
load_weights_from_FaceNet(FRmodel, weights_directory)




## 3 - Applying the model

### 3.1 - Face Verification

Let's build a database that maps each authorised person's name to an encoding of their face. To generate the encoding we use `img_to_encoding(image_path, model)`, which runs forward propagation of the model on the specified image. 
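Registering people one line at a time works, but a dictionary comprehension over a name-to-path mapping does the same and keeps names and filenames (such as `Neeraj.jpg`) side by side. In the sketch below, `build_database` and `fake_encode` are hypothetical; `fake_encode` only stands in for the model so the example runs, and its `(1, 128)` output shape is an assumption about what `img_to_encoding` returns. With the real model you would pass `lambda p: img_to_encoding(p, FRmodel)`.

```python
import zlib
import numpy as np

def build_database(image_paths, encode):
    """Map each name to the encoding of its image.

    image_paths -- dict of name -> image path
    encode      -- any callable taking a path and returning an encoding
    """
    return {name: encode(path) for name, path in image_paths.items()}

# Hypothetical stand-in encoder: a deterministic random vector per path.
def fake_encode(path):
    rng = np.random.default_rng(zlib.crc32(path.encode()))
    return rng.normal(size=(1, 128))

paths = {name: f"processed_images/{name}.jpg" for name in ["danielle", "younes", "tian"]}
db = build_database(paths, fake_encode)
print(sorted(db), db["tian"].shape)  # ['danielle', 'tian', 'younes'] (1, 128)
```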


In [0]:
database = {}
database["danielle"] = img_to_encoding("processed_images/danielle.jpg", FRmodel)
database["younes"] = img_to_encoding("processed_images/younes.jpg", FRmodel)
database["tian"] = img_to_encoding("processed_images/tian.jpg", FRmodel)
database["andrew"] = img_to_encoding("processed_images/andrew.jpg", FRmodel)
database["kian"] = img_to_encoding("processed_images/kian.jpg", FRmodel)
database["dan"] = img_to_encoding("processed_images/dan.jpg", FRmodel)
database["sebastiano"] = img_to_encoding("processed_images/sebastiano.jpg", FRmodel)
database["bertrand"] = img_to_encoding("processed_images/bertrand.jpg", FRmodel)
database["kevin"] = img_to_encoding("processed_images/kevin.jpg", FRmodel)
database["felix"] = img_to_encoding("processed_images/felix.jpg", FRmodel)
database["benoit"] = img_to_encoding("processed_images/benoit.jpg", FRmodel)
database["arnaud"] = img_to_encoding("processed_images/arnaud.jpg", FRmodel)
database["sobhan"] = img_to_encoding("processed_images/sobhan.jpg", FRmodel)
database["sobhan1"] = img_to_encoding("processed_images/sobhan1.jpg", FRmodel)
database["neeraj"] = img_to_encoding("processed_images/Neeraj.jpg", FRmodel)

Implement the verify() function which checks if the front-door camera picture (`image_path`) is actually the person called "identity". 
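The decision inside `verify()` is independent of image loading: given an encoding, it is a single distance compared against the 0.7 threshold. The sketch below isolates that rule in a hypothetical `verify_encoding` helper, with toy names and vectors, so it can be tested without the model.

```python
import numpy as np

def verify_encoding(encoding, identity, database, threshold=0.7):
    """Return (dist, door_open) for a precomputed encoding, using verify()'s threshold rule."""
    dist = float(np.linalg.norm(database[identity] - encoding))
    door_open = dist < threshold
    return dist, door_open

# Toy database with two hypothetical residents.
database_demo = {"alice": np.zeros(128), "bob": np.ones(128)}
probe = np.full(128, 0.01)                               # very close to "alice"
print(verify_encoding(probe, "alice", database_demo))    # (≈0.113, True)
print(verify_encoding(probe, "bob", database_demo))      # (≈11.2, False)
```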

In [0]:
def verify(image_path, identity, database, model):
    """
    Function that verifies if the person on the "image_path" image is "identity".
    
    Arguments:
    image_path -- path to an image
    identity -- string, name of the person you'd like to verify the identity. Has to be a resident of the Happy house.
    database -- python dictionary mapping names of allowed people's names (strings) to their encodings (vectors).
    model -- your Inception model instance in Keras
    
    Returns:
    dist -- distance between the image_path and the image of "identity" in the database.
    door_open -- True, if the door should open. False otherwise.
    """
    
    # Process image
    image_name = os.path.split(image_path)[-1]
    save_direc = '/tmp'
    process_and_save_image(image_path, save_direc , save = True)
    
    # Compute the encoding for the processed image. Use img_to_encoding() see example above. (≈ 1 line)
    image_path = os.path.join(save_direc, image_name)
    io.imshow(image_path)
    encoding = img_to_encoding(image_path, model)
    
    # Compute distance with identity's image (≈ 1 line)
    dist = np.linalg.norm(database[identity] - encoding)
    
    # Open the door if dist < 0.7, else don't open (≈ 3 lines)
    if dist < 0.7:
        print("It's " + str(identity) + ", welcome home!")
        door_open = True
    else:
        print("It's not " + str(identity) + ", please go away")
        door_open = False
        
    return dist, door_open

Let us verify with a new image.

In [0]:
verify("/content/sobhan1.jpg", "sobhan", database, FRmodel)

### 3.2 - Face Recognition


Face verification still requires each visitor to state who they claim to be (for example, by presenting an ID card). To remove that step, you'd like to change your face verification system to a face recognition system. This way, no one has to carry an ID card anymore: an authorized person can just walk up to the house, and the front door will unlock for them! 

Implement `who_is_it()`. We will have to go through the following steps:
1. Compute the target encoding of the image from image_path
2. Find the encoding from the database that has the smallest distance to the target encoding. To do so:
    - Initialize the `min_dist` variable to a large enough number (100). It will help you keep track of what is the closest encoding to the input's encoding.
    - Loop over the database dictionary's names and encodings. To loop use `for (name, db_enc) in database.items()`.
        - Compute L2 distance between the target "encoding" and the current "encoding" from the database.
        - If this distance is less than the min_dist, then set min_dist to dist, and identity to name.
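The loop above can also be vectorized: stack the stored encodings into a $(K, 128)$ matrix, compute all $K$ distances at once, and take the argmin. The sketch below is a hypothetical `nearest_identity` helper; unlike `who_is_it()`, which still returns the closest name when the match is too far, it returns `None` as the identity when the best distance exceeds the 0.7 threshold, which is a design choice rather than the notebook's behavior.

```python
import numpy as np

def nearest_identity(encoding, database, threshold=0.7):
    """Closest name and its distance; identity is None if no encoding is within the threshold."""
    names = list(database)
    gallery = np.stack([np.ravel(database[n]) for n in names])     # (K, 128)
    dists = np.linalg.norm(gallery - np.ravel(encoding), axis=1)   # (K,)
    best = int(np.argmin(dists))
    identity = names[best] if dists[best] <= threshold else None
    return float(dists[best]), identity

# Toy database with two hypothetical residents.
db = {"alice": np.zeros(128), "bob": np.ones(128)}
print(nearest_identity(np.full(128, 0.02), db))   # (≈0.226, 'alice')
print(nearest_identity(np.full(128, 0.5), db))    # (≈5.657, None)
```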

In [0]:
def who_is_it(image_path, database, model):
    """
    Implements face recognition for the happy house by finding who is the person on the image_path image.
    
    Arguments:
    image_path -- path to an image
    database -- database containing image encodings along with the name of the person on the image
    model -- your Inception model instance in Keras
    
    Returns:
    min_dist -- the minimum distance between image_path encoding and the encodings from the database
    identity -- string, the name prediction for the person on image_path
    """
    
    
    ## Compute the target "encoding" for the image. Use img_to_encoding() see example above. ## (≈ 1 line)
    encoding = img_to_encoding(image_path, model)
    
    ## Find the closest encoding ##
    
    # Initialize "min_dist" to a large value, say 100 (≈1 line)
    min_dist = 100
    identity = None
    
    # Loop over the database dictionary's names and encodings.
    for (name, db_enc) in database.items():
        
        # Compute L2 distance between the target "encoding" and the current "db_enc" from the database. (≈ 1 line)
        dist = np.linalg.norm(db_enc - encoding)

        # If this distance is less than the min_dist, then set min_dist to dist, and identity to name. (≈ 3 lines)
        if dist < min_dist:
            min_dist = dist
            identity = name

    
    if min_dist > 0.7:
        print("Not in the database.")
    else:
        print ("it's " + str(identity) + ", the distance is " + str(min_dist))
        
    return min_dist, identity

Let us check a new image.

In [54]:
who_is_it("/content/photo3.jpg", database, FRmodel)

it's sobhan, the distance is 0.5752792


(0.5752792, 'sobhan')


Although we won't implement them here, here are some ways to further improve the algorithm:
- Put more images of each person (under different lighting conditions, taken on different days, etc.) into the database. Then, given a new image, compare the new face to multiple pictures of the person. This would increase accuracy.
- An SVM classifier can be added on top of the encodings to classify/recognise images of persons. This approach requires a much larger training dataset than the current approach, which does not require training and can work with only one image of each person. However, the current approach performs badly when the lighting conditions differ between the image in the database and the image to be recognised.
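For the first suggestion, one simple way to compare a new face against several stored pictures is to score each person by the distance to their closest stored encoding. The sketch below assumes that min-distance rule (averaging each person's encodings is another option) and uses a hypothetical `identify_with_gallery` helper with toy 4-dimensional vectors.

```python
import numpy as np

def identify_with_gallery(encoding, gallery, threshold=0.7):
    """gallery: dict of name -> list of encodings (several photos per person).
    A person's score is the distance to their closest stored photo."""
    best_name, best_dist = None, np.inf
    for name, encodings in gallery.items():
        dist = min(float(np.linalg.norm(e - encoding)) for e in encodings)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return best_dist, None
    return best_dist, best_name

# Toy gallery: "alice" has two stored photos, "bob" has one.
gallery = {
    "alice": [np.zeros(4), np.array([1.0, 0.0, 0.0, 0.0])],
    "bob": [np.array([0.0, 3.0, 0.0, 0.0])],
}
probe = np.array([0.9, 0.0, 0.0, 0.0])
print(identify_with_gallery(probe, gallery))  # (≈0.1, 'alice'): closest to alice's second photo
```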

### References:

- Florian Schroff, Dmitry Kalenichenko, James Philbin (2015). [FaceNet: A Unified Embedding for Face Recognition and Clustering](https://arxiv.org/pdf/1503.03832.pdf)
- Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf (2014). [DeepFace: Closing the gap to human-level performance in face verification](https://research.fb.com/wp-content/uploads/2016/11/deepface-closing-the-gap-to-human-level-performance-in-face-verification.pdf) 
- The pretrained model we use is inspired by Victor Sy Wang's implementation and was loaded using his code: https://github.com/iwantooxxoox/Keras-OpenFace.
- Our implementation also took a lot of inspiration from the official FaceNet github repository: https://github.com/davidsandberg/facenet 
