# Real-time face recognition application using a deep neural network


Below is an implementation of a face detector and recognizer that identifies the person shown on a webcam. We'll implement it with the Keras framework.

The deep neural network we'll use is based on [FaceNet](https://arxiv.org/pdf/1503.03832.pdf), which was published by Google in 2015 and achieved 99.63% accuracy on a popular face recognition dataset named "Labeled Faces in the Wild" (LFW). You can find its open-source Keras version [here](https://github.com/iwantooxxoox/Keras-OpenFace) and TensorFlow version [here](https://github.com/davidsandberg/facenet), and play around with them to build your own models.

# Import libraries

In [1]:
import numpy as np
from numpy import genfromtxt
import pandas as pd
import os
import glob
import cv2
from mtcnn.mtcnn import MTCNN

import utils
#keras imported in utils.py file
%load_ext autoreload
%autoreload 2

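# threshold=np.nan is rejected by recent NumPy versions; sys.maxsize prints arrays in full
import sys
np.set_printoptions(threshold=sys.maxsize)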


# How can a computer tell whether two pictures show the same person?

Looking at the two photos below, we can easily tell that they show the same person, even though the hairstyle, clothing, and distance from the camera differ. But how can we get a computer to tell whether it is the same person or not?

![1](pictures/yifei.jpg) 

When a computer 'sees' an RGB picture, it sees three values (red, green, and blue) at each pixel. A pixel_size*pixel_size RGB picture is therefore a (pixel_size, pixel_size, 3) array.
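
As a small illustration of this layout, the sketch below builds an empty 96*96 RGB picture with NumPy and colors one pixel red. The size and the pixel position are arbitrary choices made for the example.

```python
import numpy as np

pixel_size = 96  # arbitrary size chosen for this illustration

# An all-black RGB picture: one (R, G, B) triple per pixel.
picture = np.zeros((pixel_size, pixel_size, 3), dtype=np.uint8)

# Color the pixel at row 10, column 20 pure red.
picture[10, 20] = (255, 0, 0)

print(picture.shape)    # (96, 96, 3)
print(picture[10, 20])  # the three channel values stored at that pixel
```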

How, then, can a computer tell whether two such arrays represent the same person? A first idea is to reshape each (pixel_size, pixel_size, 3) array into a 1-dimensional vector and decide based on the distance between the two vectors. However, across pictures the same person may wear different clothes or accessories, stand at a different distance from the camera, and so on. All of these variations can significantly mislead the comparison.
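
To see how fragile raw pixel distances are, here is a toy sketch that uses a bright square as a stand-in for a face. The images are synthetic and purely illustrative: moving the square a few pixels sideways makes the picture look "further away" than a completely blank image.

```python
import numpy as np

# A bright square on a black background stands in for a "face".
face = np.zeros((96, 96, 3))
face[30:50, 30:50] = 1.0

# The same square moved 20 pixels to the right (same subject, new position).
shifted = np.roll(face, 20, axis=1)

# A blank picture that contains no subject at all.
blank = np.zeros_like(face)

def pixel_distance(a, b):
    """Euclidean distance between two pictures flattened to 1-d vectors."""
    return float(np.linalg.norm(a.reshape(-1) - b.reshape(-1)))

same_subject = pixel_distance(face, shifted)
different = pixel_distance(face, blank)
print(same_subject > different)  # the shifted copy is farther than the blank image
```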

For this reason, directly comparing the 1-d pixel vectors of two pictures is not a good strategy. Instead, we'll pass each picture through a deep neural network that encodes it into a 128-dimensional embedding, and use that embedding as the representation of the picture. The model architecture is shown below. If the distance between two 128-d embeddings is larger than a chosen threshold, the two pictures are treated as different people; otherwise they are treated as the same person. We'll talk about the triplet loss function in a later section; first, let's implement the deep neural network.
![2](pictures/model.png) 
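
The threshold rule can be sketched in a few lines of NumPy. The helper name `is_same_person` and the toy embeddings are made up for illustration; the default threshold of 0.8 mirrors the cutoff used in the recognition code later in this notebook and should be tuned for your own data.

```python
import numpy as np

def is_same_person(embedding_a, embedding_b, threshold=0.8):
    """Compare two embeddings by Euclidean distance against a threshold."""
    distance = float(np.linalg.norm(embedding_a - embedding_b))
    return distance < threshold, distance

# Toy 128-d embeddings (hand-made, not produced by a real model).
person = np.zeros(128)
person[0] = 1.0
same_person_again = person.copy()
same_person_again[1] = 0.1   # a small change in the embedding
someone_else = np.zeros(128)
someone_else[1] = 1.0        # a very different embedding

print(is_same_person(person, same_person_again))
print(is_same_person(person, someone_else))
```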

# Deep neural network: FaceNet

Notice that the input shape is (96, 96, 3), i.e. a 96*96 pixel RGB (3-channel) picture. After the picture passes through this Inception-block model, the last layer (the output) is a fully connected layer with 128 neurons. The resulting 128-dimensional vector captures the important features of the input face and serves as the representation of the input picture.

In [None]:
#import facenet model
#see inception_blocks.py for model implementation
from utils import LRN2D
import utils
from inception_blocks import *

#show the architecture of the network
model = faceRecoModel((96, 96, 3))
model.summary()

# Triplet loss function

The FaceNet model converts each input image into a 128-d embedding that represents it. The model parameters are trained by minimizing the triplet loss, which minimizes the distance between an anchor and a positive (both of the same identity) and maximizes the distance between the anchor and a negative of a different identity, as shown below:

![3](pictures/triplet.png)
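
For intuition, here is a minimal NumPy sketch of the triplet loss as formulated in the FaceNet paper: the squared anchor-positive distance, minus the squared anchor-negative distance, plus a margin, clipped at zero. The function name, the margin default of 0.2, and the 2-d toy embeddings are assumptions for illustration; this is not the training code behind the weights used here.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss summed over a batch of embeddings of shape (batch, dim).

    alpha is the margin by which the negative must be farther than the positive.
    """
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)
    return float(np.sum(np.maximum(pos_dist - neg_dist + alpha, 0.0)))

# Toy 2-d embeddings, hand-made for illustration only.
anchor = np.array([[0.0, 1.0]])
positive = np.array([[0.0, 0.9]])   # close to the anchor
negative = np.array([[1.0, 0.0]])   # far from the anchor

# Well-separated triplet: the margin is already satisfied, so the loss is zero.
satisfied = triplet_loss(anchor, positive, negative)
# Swapping positive and negative violates the margin and gives a positive loss.
violated = triplet_loss(anchor, negative, positive)
print(satisfied, violated)
```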

Training requires a GPU and a large amount of training data; alternatively, you can use transfer learning and fine-tune existing weights. Here we'll load previously trained weights, which are available [here](https://github.com/iwantooxxoox/Keras-OpenFace) in the "weights" folder and are also provided with this source.

In [None]:
# load weights(this process will take a few minutes)
import utils
weights = utils.weights
weights_dict = utils.load_weights()

for name in weights:
    # get_layer raises ValueError if the model has no layer with this name
    model.get_layer(name).set_weights(weights_dict[name])

## Capture, crop, align and resize identity face image in real time using OpenCV
In this section, we'll use OpenCV (make sure you've installed it) to open a web camera, detect and outline the face area with a blue rectangle, and then capture 15 face images of the person in front of the camera.

These cropped face snapshots are stored in the **"images"** folder with the names NameHere_1 to NameHere_15. Select only one well-captured face image from these 15 for each person, rename it to that person's name, and delete the rest. Repeat this process for each person, keeping only one picture per person in this folder. Later in this program, when a person shows up in front of the camera, the program computes the distance between that face and each stored picture and returns the name of the closest match.
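
Because the file name doubles as the person's name, it can help to check the folder before building the database. The helper below is a hypothetical sketch (the name `enrolled_people` is not part of this project), and it assumes the snapshots are saved as `.jpg` files. It maps each person's name to their picture and skips any leftover `NameHere_` snapshots that were never renamed or deleted.

```python
from pathlib import Path

def enrolled_people(folder):
    """Return {person_name: image_path} for the pictures kept in a folder."""
    people = {}
    for path in sorted(Path(folder).glob("*.jpg")):
        if path.stem.startswith("NameHere_"):
            continue  # leftover capture snapshot, not a named person
        people[path.stem] = path  # the file name (without extension) is the name
    return people
```

Calling `enrolled_people("images")` after the clean-up step should list exactly one entry per person.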

In [None]:
cap = cv2.VideoCapture(0)
detector = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

count = 0
while True:
    # capture frame by frame; stop if the camera returns no frame
    ret, img = cap.read()
    if not ret:
        break

    # detect the face, you can change the scaleFactor according to your case
    faces = detector.detectMultiScale(img, scaleFactor=1.5, minNeighbors=5)
    for (x, y, w, h) in faces:
        count += 1
        # save the cropped face into the images folder before drawing on the frame,
        # so the blue outline does not end up in the saved picture
        cv2.imwrite("images/NameHere_" + str(count) + ".jpg", img[y:y+h, x:x+w])
        # outline the face area with a blue rectangle (OpenCV colors are BGR)
        cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)
    # show every frame, whether or not a face was found
    cv2.imshow('image', img)

    # press 'ESC' to exit; otherwise stop once 15 face images are captured
    k = cv2.waitKey(200) & 0xff
    if k == 27 or count >= 15:
        break
cap.release()
cv2.destroyAllWindows()

# Or detect faces with a Multi-task CNN
Besides OpenCV, we can also detect faces with Dlib or with a deep-learning Multi-task CNN (MTCNN). Here we show how to use MTCNN to detect the faces in an image. After running this section, you may go to the "pictures" folder to check t

In [2]:
from mtcnn.mtcnn import MTCNN

image = cv2.imread('pictures/yifei.jpg')
detector1 = MTCNN()
# MTCNN expects RGB input, while cv2.imread returns BGR
result = detector1.detect_faces(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
print(result)

annotated = image.copy()  # draw on a copy so the saved crops stay clean
count = 0
for person in result:
    x, y, w, h = person['box']
    keypoints = person['keypoints']

    # outline the face and mark the five facial landmarks
    cv2.rectangle(annotated, (x, y), (x+w, y+h), (255, 0, 255), 2)
    for point in ('left_eye', 'right_eye', 'nose', 'mouth_left', 'mouth_right'):
        cv2.circle(annotated, tuple(keypoints[point]), 2, (0, 155, 255), 2)
    cv2.imwrite("pictures/" + str(count) + "_detected.jpg", annotated)
    # box corners can fall slightly outside the image, so clamp them at 0
    cv2.imwrite("pictures/" + str(count) + ".jpg", image[max(0, y):y+h, max(0, x):x+w])
    count += 1


# Steps to recognize faces

1. Encode a single image into an embedding.
2. Build a database containing embeddings for all images by passing each image through the FaceNet model with the pretrained weights loaded.
3. Identify a face from its embedding by finding the stored embedding with the minimum L2 (Euclidean) distance.
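
Before wiring these steps to the real model, the identification step can be tried on its own with made-up embeddings. The sketch below is only an illustration: the helper `identify`, the names in `toy_database`, and the one-hot vectors are all hypothetical, and the 0.8 cutoff is the same value used in the code that follows.

```python
import numpy as np

def identify(query, database, threshold=0.8):
    """Return (name, distance) of the closest stored embedding, or 'Unknown'."""
    best_name, best_distance = None, float("inf")
    for name, embedding in database.items():
        distance = float(np.linalg.norm(query - embedding))
        if distance < best_distance:
            best_name, best_distance = name, distance
    if best_distance < threshold:
        return best_name, best_distance
    return "Unknown", best_distance

# Hand-made 128-d embeddings standing in for real model outputs.
toy_database = {"alice": np.eye(128)[0], "bob": np.eye(128)[1]}

near_alice = np.eye(128)[0] + 0.05 * np.eye(128)[2]  # small offset from alice
stranger = np.eye(128)[5]                            # far from everyone stored

print(identify(near_alice, toy_database))
print(identify(stranger, toy_database))
```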

In [3]:
# First, encode a single image into an embedding
def image_to_embedding(image, model):
    image = cv2.resize(image, (96, 96))
    img = image[..., ::-1]  # OpenCV loads images as BGR; reverse the channels to RGB
    img = np.around(img / 255.0, decimals=12)  # scale pixel values to [0, 1]
    x_train = np.array([img])  # add a batch dimension: (1, 96, 96, 3)
    embedding = model.predict_on_batch(x_train)
    return embedding



# Second, build a database containing embeddings for all images
def build_database_dict():
    database = {}
    # each file in the images folder is named after the person it shows
    for file in glob.glob("images/*"):
        database_name = os.path.splitext(os.path.basename(file))[0]
        image_file = cv2.imread(file, 1)
        if image_file is None:  # skip files OpenCV cannot read as images
            continue
        database[database_name] = image_to_embedding(image_file, model)
    return database


# Third, identify images by using the embeddings (find the minimum L2 Euclidean distance between embeddings)
def recognize_face(face_image, database, model):

    embedding = image_to_embedding(face_image, model)
    minimum_distance = float('inf')
    name = None
    # loop over the stored names and embeddings
    for (database_name, database_embedding) in database.items():

        euclidean_distance = np.linalg.norm(embedding - database_embedding)
        print('Euclidean distance from %s is %s' % (database_name, euclidean_distance))
        if euclidean_distance < minimum_distance:
            minimum_distance = euclidean_distance
            name = database_name

    # accept the closest match only if it is within the distance threshold
    if minimum_distance < 0.8:
        return str(name) + '  ' + str(round(minimum_distance, 14))
    else:
        return 'Unknown'
    

# Try an image

In [4]:
database= build_database_dict()
image= cv2.imread('images/Obama.jpg')
recognize_face(image, database, model)


# Recognize faces in real time using webcam

In [5]:
cv2.namedWindow("Face Recognizer")
vc = cv2.VideoCapture(0)
font = cv2.FONT_HERSHEY_SIMPLEX
detector = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

while True:
    ret, frame = vc.read()
    if not ret:  # stop if the camera returns no frame
        break
    height, width, channels = frame.shape

    # run the Haar cascade on a grayscale copy of the frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, 1.3, 5)

    # loop through all the faces detected
    for (x, y, w, h) in faces:
        face_image = frame[max(0, y):min(height, y+h), max(0, x):min(width, x+w)]
        # recognize_face returns the matched name and distance, or 'Unknown'
        identity = recognize_face(face_image, database, model)
        cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)
        cv2.putText(frame, str(identity), (x+5, y-5), font, 1, (255, 0, 255), 2)

    cv2.imshow("Face Recognizer", frame)
    key = cv2.waitKey(100) & 0xff

    if key == 27:  # exit on ESC
        break
vc.release()
cv2.destroyAllWindows()
