<a href="https://colab.research.google.com/github/TanmayKhot/AI_project/blob/main/2_Facenet_Embeddings_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### **DeepFace**

DeepFace is an open-source face recognition and facial attribute analysis library. It wraps several face recognition models and handles all the steps of facial recognition in the background.

A modern face recognition pipeline consists of 5 common stages: detect, align, normalize, represent and verify. DeepFace handles all of these stages in the background.

Using DeepFace gives us access to the following set of features:

1. Face Verification: Comparing one face with another to verify whether they match.
2. Face Recognition: Finding a face in a database of known faces.
3. Facial Attribute Analysis: Describing the visual properties of face images, for example emotion analysis.
4. Real-Time Face Analysis: Running the above tasks on a real-time video feed.
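
As a rough illustration of the first three features, the sketch below calls `DeepFace.verify`, `DeepFace.find` and `DeepFace.analyze`. The function name `run_deepface_tasks` and its arguments are hypothetical, and return types may differ between DeepFace versions, so treat this as a sketch rather than a reference.

```python
def run_deepface_tasks(img_a, img_b, db_dir):
    """Run verification, recognition and attribute analysis on caller-supplied paths."""
    # Imported lazily so that defining this function does not require deepface.
    from deepface import DeepFace

    # 1. Face verification: do the two images show the same person?
    verification = DeepFace.verify(img1_path=img_a, img2_path=img_b, model_name="Facenet")

    # 2. Face recognition: search a folder of known faces for img_a.
    matches = DeepFace.find(img_path=img_a, db_path=db_dir, model_name="Facenet")

    # 3. Facial attribute analysis, restricted here to emotion.
    attributes = DeepFace.analyze(img_path=img_a, actions=["emotion"])

    return verification, matches, attributes
```

Real-time analysis is available through `DeepFace.stream(db_path=...)`; it opens the webcam, so it is left out of the sketch.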

There are various deep learning algorithms that can be used with the DeepFace library. These are based on Convolutional Neural Networks (CNN). Some of the models are as follows:

1. VGG-Face: The DeepFace library uses VGG-Face as its default model. It is based on deep convolutional neural networks and has the same structure as the regular VGG model, except that it is trained on face images.

2. OpenFace: OpenFace is an open source tool heavily inspired by FaceNet. 

3. DeepID: A face verification algorithm based on deep learning. It is an external face recognition model wrapped in the DeepFace library.

4. Dlib: This face recognition model is used to recognize and manipulate faces. Dlib’s face recognition tool maps an image of a human face to a 128-dimensional vector space and uses Euclidean distance to compute similarity.

5. ArcFace: Face recognition models are traditionally trained with a softmax loss, which can leave a performance gap under large intra-class appearance variations. ArcFace, or Additive Angular Margin Loss, addresses this by adding an angular margin to the loss.
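
To make the additive angular margin of ArcFace concrete, here is a minimal sketch of how the target-class logit changes. The function names are hypothetical; the defaults `margin=0.5` and `scale=64.0` follow the values commonly quoted for ArcFace and are assumptions here, not settings used in this project.

```python
import math

def plain_logit(cos_theta, scale=64.0):
    """Scaled cosine logit without any margin."""
    return scale * cos_theta

def arcface_logit(cos_theta, margin=0.5, scale=64.0):
    """Target-class logit scale * cos(theta + margin) for a given cosine similarity."""
    # Clamp to the valid domain of acos to guard against rounding error.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    return scale * math.cos(theta + margin)

# The margin lowers the logit of the correct class, so the network has to
# pull same-identity embeddings closer together to compensate.
with_margin = arcface_logit(0.8)
without_margin = plain_logit(0.8)
```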


Many other models are available. For our project, however, we use the *FaceNet model* to improve on our baseline model and get better results.

#### **FaceNet**

FaceNet was developed at Google. It can be used for face recognition, verification and clustering. The main reason we chose this model is its high efficiency and performance: it is reported to achieve over 99% accuracy on the LFW dataset. It is a 22-layer deep neural network that directly trains its output to be a 128-dimensional embedding. The loss function used at the last layer is called triplet loss.

The FaceNet model structure is as follows:

<img src = 'https://drive.google.com/uc?id=1j9-ezNOykuwZxinCp17hb4TQN9geushY' height = 230>

FaceNet is based on the idea of a Siamese Network.

**Siamese Network:** Also called a twin neural network. The idea is that each input of a pair is fed through the same network, which outputs its features. The distance between the two outputs is then calculated to compare them. The distance indicates how similar the inputs are: the smaller it is, the higher the probability that they belong to the same class.
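
The Siamese idea can be sketched in a few lines. The `embed` function below is a hypothetical stand-in for the shared network (it only computes two simple statistics), and the threshold of 0.5 is an arbitrary assumption chosen for the toy data.

```python
import math

def embed(pixels):
    """Hypothetical stand-in for the shared network: mean and spread of the pixels."""
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return [mean, spread]

def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def same_class(x1, x2, threshold=0.5):
    # Both inputs pass through the *same* function, mirroring shared weights.
    return euclidean(embed(x1), embed(x2)) < threshold

similar_pair = same_class([0.1, 0.2, 0.3], [0.1, 0.2, 0.31])
different_pair = same_class([0.1, 0.2, 0.3], [5.0, 6.0, 9.0])
```

A small distance leads to a "same class" decision, and a large distance to a "different class" decision.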

There are direct and indirect methods to validate the model.

The direct method comes into play for FaceNet: here you find the correct class of a picture. Take any picture, call it A, and calculate the similarity_score between A and a group of randomly picked pictures, chosen so that exactly one of them belongs to the same class as A and all the others belong to different classes. The predicted class is that of the picture with the highest similarity_score with A.
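
A minimal sketch of this direct validation step, using toy 2-dimensional embeddings and made-up labels. Defining `similarity_score` as the negative Euclidean distance is an assumption made for illustration; any score that grows with similarity would work.

```python
import math

def similarity_score(u, v):
    """Higher is more similar: negative Euclidean distance between embeddings."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def predict_class(query, gallery):
    """gallery is a list of (label, embedding); return the label of the best match."""
    best_label, _ = max(gallery, key=lambda item: similarity_score(query, item[1]))
    return best_label

# Picture A and a group in which exactly one entry shares its class.
picture_a = ("alice", [0.9, 0.1])
gallery = [("bob", [0.1, 0.9]), ("alice", [0.8, 0.2]), ("carol", [-0.7, 0.3])]

prediction = predict_class(picture_a[1], gallery)
is_correct = prediction == picture_a[0]
```

Repeating this over many pictures and averaging `is_correct` would give an accuracy estimate.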

A well-known distance measure such as Euclidean distance is used to measure the distance between the extracted embeddings.

Another important aspect of FaceNet is its loss function:

**Triplet Loss:** At each step, the CNN is trained on a triplet of images:
1. Anchor
2. Positive
3. Negative

<img src = 'https://drive.google.com/uc?id=1g0TsBCTe7LEjbFAeAz7Ti3y-Yc046RZ2' height = 170>

The anchor and positive images belong to the same class, and the negative image belongs to a different class. The intuition behind triplet loss is that we want the anchor image to be closer to the positive image than to the negative image.

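The loss function is formally defined as:

$$ L = \sum_{i}^{N} \left[ ||f(x_i^a) - f(x_i^p)||_2^2 - ||f(x_i^a) - f(x_i^n)||_2^2 + \alpha \right]_+ $$

where

$x_i^a$, $x_i^p$, $x_i^n$ -> represent the anchor, positive and negative images of the $i$-th triplet

$f(x)$ -> represents the embedding of an image $x$

$\alpha$ -> represents the margin enforced between positive and negative pairs

$[\cdot]_+$ -> represents $\max(\cdot, 0)$, so triplets that already satisfy the margin contribute nothing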
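
As a plain-Python sketch, the triplet loss can be written as below, including the clipping at zero used in the FaceNet paper so that already well-separated triplets contribute no loss. The default margin `alpha=0.2` is the value reported in that paper; the toy embeddings are made up.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchors, positives, negatives, alpha=0.2):
    """Sum over triplets of max(d(a, p) - d(a, n) + alpha, 0)."""
    total = 0.0
    for a, p, n in zip(anchors, positives, negatives):
        total += max(squared_l2(a, p) - squared_l2(a, n) + alpha, 0.0)
    return total

# First triplet is "easy" (positive close, negative far), second is "hard".
anchors = [[0.0, 0.0], [0.0, 0.0]]
positives = [[0.1, 0.0], [1.0, 0.0]]
negatives = [[1.0, 0.0], [0.1, 0.0]]

loss = triplet_loss(anchors, positives, negatives)
```

The easy triplet contributes 0 and the hard one contributes 1 - 0.01 + 0.2 = 1.19, so only the hard triplet drives learning.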

Let's extract embeddings using FaceNet!

In [None]:
# Importing required libraries

from deepface import DeepFace
import os
import pickle
from tqdm.notebook import tqdm_notebook

Let's load the file locations of our pictures, which we saved while creating the previous embeddings.

In [None]:
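#function to load the previously saved files
def load_pickle(file):
    # a pickle file can hold several objects written one after another,
    # so keep reading until the end of the file is reached
    objects = []
    with open(file, "rb") as openfile:
        while True:
            try:
                objects.append(pickle.load(openfile))
            except EOFError:
                break
    return objects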

In [None]:
filenames = load_pickle('saved/filenames-lfw-deepfunneled.pickle')[0]
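
The `[0]` above is needed because `load_pickle` returns a list containing every object stored in the file. The following self-contained sketch shows this with a temporary file; the paths and the second object are made up for illustration.

```python
import os
import pickle
import tempfile

def load_pickle(file):
    """Read every object stored back-to-back in a pickle file."""
    objects = []
    with open(file, "rb") as openfile:
        while True:
            try:
                objects.append(pickle.load(openfile))
            except EOFError:
                break
    return objects

paths = ["lfw/A/A_0001.jpg", "lfw/B/B_0001.jpg"]  # hypothetical file names

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "filenames.pickle")
    with open(target, "wb") as handle:
        pickle.dump(paths, handle)                      # first object
        pickle.dump({"note": "second object"}, handle)  # second object
    loaded = load_pickle(target)

first_object = loaded[0]  # the list of paths, as in the notebook
```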

In [None]:
len(filenames)

13233

Now that we have a list of file locations, let's create embeddings with FaceNet and DeepFace.

We are going to use `DeepFace.represent()`, which represents facial images as vectors (embeddings).

Parameters:

img_path: an exact image path, a numpy array or a base64-encoded image can be passed.

model_name (string): VGG-Face, Facenet, OpenFace, DeepFace, DeepID, Dlib, ArcFace.

model: a prebuilt DeepFace model. A face recognition model is built on every call of the represent function unless a model is passed. Since we don't need to rebuild the model on every call, we prebuild it once by calling `model = DeepFace.build_model('Facenet')` and pass it to the function.

enforce_detection (boolean): If no face can be detected in an image, the represent function raises an exception. Set this to False to suppress the exception. This can be convenient for low-resolution images.

In [None]:
#let's build the model we want to use:
model = DeepFace.build_model('Facenet')

#function to get embeddings given an image
def extract_features(image_path, model):
    vector = DeepFace.represent(img_path = image_path, model_name = "Facenet", model = model, enforce_detection = False)

    return vector
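
The code in this notebook relies on `DeepFace.represent` returning a flat list of floats. To my knowledge, newer DeepFace releases return a list of dictionaries instead (one per detected face, with an `"embedding"` key) and no longer accept the `model` argument. The hypothetical helper `to_vector` below is a sketch, under that assumption, of how either output shape could be reduced to a plain vector.

```python
def to_vector(result):
    """Return a flat list of floats from either assumed output shape of represent()."""
    # Assumed newer shape: [{"embedding": [...], ...}, ...], one dict per face.
    if result and isinstance(result[0], dict):
        return list(result[0]["embedding"])
    # Older shape used in this notebook: a flat list of floats.
    return list(result)

old_style = [0.1, 0.2, 0.3]
new_style = [{"embedding": [0.1, 0.2, 0.3], "facial_area": {}}]

old_vector = to_vector(old_style)
new_vector = to_vector(new_style)
```

Only the first detected face is kept here, which matches the one-face-per-image layout of LFW.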

#### Dimensions of the Embedding:

The embedding size for FaceNet is 128. Larger embeddings may require more training to reach the same level of accuracy. According to the FaceNet paper, the 128-dimensional float vector used during training can be quantized to 128 bytes without loss of accuracy, which makes large-scale clustering and recognition easier.
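
To give a feel for what quantizing to 128 bytes means, here is a minimal sketch of a simple linear scheme that stores each component in one byte. The scheme itself and the helper names are assumptions for illustration, and the toy vector is made up; the sketch only shows the storage size and the rounding error, not the effect on recognition accuracy.

```python
def quantize(vector):
    """Map each float in [-1, 1] to one unsigned byte (0..255)."""
    return bytes(round((max(-1.0, min(1.0, x)) + 1.0) * 127.5) for x in vector)

def dequantize(data):
    """Map bytes back to approximate floats in [-1, 1]."""
    return [b / 127.5 - 1.0 for b in data]

# A made-up 128-dimensional vector with components in [-1, 1].
vector = [((-1) ** i) * i / 128 for i in range(128)]

packed = quantize(vector)
restored = dequantize(packed)
max_error = max(abs(a - b) for a, b in zip(vector, restored))
```

Here `len(packed)` is 128 bytes, compared with 512 bytes for 128 32-bit floats.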

In [None]:
feature_list = []
for filename in tqdm_notebook(filenames):
    feature_list.append(extract_features(filename, model))


In [None]:
#let's save the embeddings in a file, to import in EC2
with open('saved/features-facenet.pickle', 'wb') as f:
    pickle.dump(feature_list, f)


In [None]:
#let's normalize and pickle again
import numpy as np
from numpy.linalg import norm

#L2-normalize each embedding so that it has unit length
feature_list = [np.asarray(f) / norm(f) for f in feature_list]

In [None]:
with open('saved/features-facenet-normalized.pickle', 'wb') as f:
    pickle.dump(feature_list, f)
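
One reason normalization is useful: for unit-length vectors, squared Euclidean distance and cosine similarity carry the same information, since ||a - b||² = 2 - 2·cos(a, b). The pure-Python sketch below checks this identity on two made-up vectors; the helper names are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def squared_distance(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

a = l2_normalize([3.0, 4.0])
b = l2_normalize([1.0, 0.0])

lhs = squared_distance(a, b)
rhs = 2.0 - 2.0 * cosine_similarity(a, b)
```

Both sides come out equal up to floating-point error, so ranking neighbours by Euclidean distance or by cosine similarity gives the same order on the normalized embeddings.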


Now that we have the embeddings generated by the FaceNet model, we download the pickle file, upload it to EC2 and continue there.

