# Extracting Embeddings using OpenCV

This notebook is based on the tutorial [OpenCV Face Recognition by PyImageSearch](https://www.pyimagesearch.com/2018/09/24/opencv-face-recognition/).

OpenCV provides a practical module, `dnn` (Deep Neural Network). Among its functions is `readNetFromTorch`, which reads pre-trained models stored in the Torch7 framework's format. This example uses the FaceNet model by [Schroff et al.](https://www.cv-foundation.org/openaccess/content_cvpr_2015/app/1A_089.pdf), pre-trained in Torch.

The FaceNet model uses a CNN (read the paper for more details) to recognize and cluster face images. The output of the model is a 128-d face embedding vector that encodes the image features. The model is trained with the triplet loss function, which takes three images into account: the anchor, the positive and the negative. The loss is

$L = \sum_i^N\left[ \| f\left( x_i^a\right) -  f\left( x_i^p\right)\|_2^{2}  -\| f\left( x_i^a\right) -  f\left( x_i^n\right)\|_2^{2}  + \alpha \right]_+$,

where $f\left( x\right)$ is the embedding of image $x$, the superscripts $a$, $p$ and $n$ mark the anchor, positive and negative images, $\left[ \cdot \right]_+$ denotes $\max\left( \cdot, 0\right)$, and $\alpha$ is a margin enforced between positive and negative pairs. With the triplet loss, training adjusts the weights toward the condition $\| f\left( x_i^a\right) -  f\left( x_i^p\right)\|_2^{2} + \alpha < \| f\left( x_i^a\right) -  f\left( x_i^n\right)\|_2^{2}$, which means that the distance between the anchor and the positive image is smaller, by at least the margin, than the distance between the anchor and the negative image.
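To make the triplet loss concrete, here is a minimal NumPy sketch. It is not the training code used for the pre-trained model; the function name `triplet_loss`, the default margin `alpha=0.2` and the toy 2-d embeddings are assumptions chosen for illustration. Each term is clamped at zero, as in the FaceNet paper, so triplets that already satisfy the margin contribute nothing.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Sum of hinge-clamped triplet terms over a batch.

    anchor, positive, negative: arrays of shape (N, d) holding embeddings.
    alpha: margin between positive and negative pairs (illustrative default).
    """
    # Squared Euclidean distance between each anchor and its positive / negative
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)
    # Triplets that already satisfy the margin contribute zero
    return np.sum(np.maximum(pos_dist - neg_dist + alpha, 0.0))

# Toy 2-d embeddings (hypothetical values, not real face embeddings)
anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[0.1, 0.0], [1.0, 0.0]])
negative = np.array([[1.0, 0.0], [0.1, 0.0]])

loss = triplet_loss(anchor, positive, negative)
print(loss)
```

The first triplet satisfies the condition (the positive is much closer than the negative) and adds nothing; the second violates it and accounts for the whole loss.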

The dataset is composed of four classes with twenty images each. The images show three Brazilian singers (Caetano Veloso, Chico Buarque and Gilberto Gil) plus a set of other people labeled as the unknown class.

In this example, we do not apply elaborate image preprocessing; the idea is to deploy a simple and direct model as a how-to exercise. Another example uses the dlib library and facenet, which offer better image preprocessing. Thanks to PyImageSearch for the tutorial. Let's go ahead.

## Importing Libraries

In [1]:
from imutils import paths
import numpy as np
import argparse
import imutils
import pickle
import cv2
import os

## Loading the pre-trained models

In [2]:
# Setting the face detector
protoPath = os.path.join("input", "deploy.prototxt.txt")
modelPath = os.path.join("input", "res10_300x300_ssd_iter_140000.caffemodel")
detector = cv2.dnn.readNetFromCaffe(protoPath, modelPath)

In [3]:
# Setting the pre-trained FaceNet model
embedder = cv2.dnn.readNetFromTorch("input/nn4.small2.v1.t7")

## Setting the image paths


In [4]:
imagePaths = list(paths.list_images("Dataset"))
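The main loop later takes each label from the name of the folder that contains the image, so the dataset is expected to follow a `Dataset/<class name>/<image file>` layout. The sketch below shows that convention with the standard library only; the helper `label_from_path`, the folder name and the file name are hypothetical.

```python
import os

def label_from_path(image_path):
    """Return the name of the folder that directly contains the image."""
    return os.path.basename(os.path.dirname(image_path))

# Hypothetical path following the Dataset/<class name>/<image file> layout
example_path = os.path.join("Dataset", "caetano_veloso", "img_001.jpg")
label = label_from_path(example_path)
print(label)
```

This gives the same result as `imagePath.split(os.path.sep)[-2]` for paths of this shape.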

In [5]:
# List of feature vector and labels
Embeddings = []
Names = []

## Embedding the faces

Each image in the dataset contains just one face, belonging to one of the classes. To obtain the embedding vectors, the main loop follows these steps:
- Start the loop and detect the face in the image
- Once we have the detection, we extract the ROI and crop the image
- Finally, we extract the features with the embedder.
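One detail of the ROI step is worth a sketch: the detector can return box coordinates that fall slightly outside the image, and a negative start index silently changes what a NumPy slice returns. The helper `clamp_box` below is a hypothetical addition, not part of the original tutorial, that clips a box to the image bounds before cropping.

```python
import numpy as np

def clamp_box(box, width, height):
    """Clip a (startX, startY, endX, endY) box to the image bounds."""
    start_x, start_y, end_x, end_y = (int(v) for v in box)
    # Horizontal coordinates must stay within [0, width]
    start_x, end_x = np.clip([start_x, end_x], 0, width)
    # Vertical coordinates must stay within [0, height]
    start_y, end_y = np.clip([start_y, end_y], 0, height)
    return int(start_x), int(start_y), int(end_x), int(end_y)

# Hypothetical detection that spills over the left, right and bottom edges
# of a 600x400 image
raw_box = np.array([-12.4, 30.0, 650.9, 410.2])
clamped = clamp_box(raw_box, width=600, height=400)
print(clamped)
```

With the clamped values, a crop such as `image[startY:endY, startX:endX]` stays inside the image.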

In [6]:
total = 0

In [7]:
for imagePath in imagePaths:
    name = imagePath.split(os.path.sep)[-2] # grabbing the label
    image = cv2.imread(imagePath)
    if image is None:
        # skip files that OpenCV cannot read
        continue
    image = imutils.resize(image, width=600) # resize
    (h, w) = image.shape[:2]
    
    # construct a blob from the image
    imageBlob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0, (300, 300),
                                      (104.0, 177.0, 123.0), swapRB=False, crop=False)
    # apply OpenCV's deep learning-based face detector to localize
    # faces in the input image
    detector.setInput(imageBlob)
    detections = detector.forward()
    if detections.shape[2] > 0:
        # we're making the assumption that each image has only ONE
        # face, so find the bounding box with the largest probability
        i = np.argmax(detections[0, 0, :, 2])
        confidence = detections[0, 0, i, 2]
        # ensure that the detection with the largest probability also
        # meets our minimum probability test (thus helping filter out
        # weak detections)
        if confidence > 0.5:
            # compute the (x, y)-coordinates of the bounding box for
            # the face
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            # extract the face ROI and grab the ROI dimensions
            face = image[startY:endY, startX:endX]
            (fH, fW) = face.shape[:2]
            # ensure the face width and height are sufficiently large
            if fW < 20 or fH < 20:
                continue
            faceBlob = cv2.dnn.blobFromImage(face, 1.0 / 255,
                                             (96, 96), (0, 0, 0), swapRB=True, crop=False)
            embedder.setInput(faceBlob)
            vec = embedder.forward()
            # add the name of the person + corresponding face
            # embedding to their respective lists
            Names.append(name)
            Embeddings.append(vec.flatten())
            total += 1
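To show what the face blob contains, the following NumPy sketch approximates the preprocessing requested from `cv2.dnn.blobFromImage` for the embedder: scaling by 1/255, swapping BGR to RGB, and reordering to a batch of shape `(1, 3, 96, 96)`. It is a simplification under stated assumptions: the helper `face_to_blob` is hypothetical, the input is assumed to be already 96x96, and the resize that OpenCV performs is omitted.

```python
import numpy as np

def face_to_blob(face_bgr):
    """Approximate the embedder preprocessing for a 96x96 BGR uint8 image.

    Returns a float32 array of shape (1, 3, 96, 96) in RGB order,
    with values scaled to [0, 1].
    """
    rgb = face_bgr[:, :, ::-1].astype(np.float32)  # BGR -> RGB (swapRB=True)
    scaled = rgb / 255.0                           # scale factor 1.0 / 255
    chw = np.transpose(scaled, (2, 0, 1))          # HWC -> CHW
    return chw[np.newaxis, ...]                    # add the batch dimension

# Synthetic stand-in for a cropped face: pure blue in BGR order
face = np.zeros((96, 96, 3), dtype=np.uint8)
face[:, :, 0] = 255

blob = face_to_blob(face)
print(blob.shape, blob.dtype)
```

After the channel swap, the blue values end up in the last channel of the blob.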

## Saving the features into a pickle data format

In [8]:
print("[INFO] serializing {} encodings...".format(total))
data = {"embeddings": Embeddings, "names": Names}
with open("output/embeddings1.pickle", "wb") as f:
    pickle.dump(data, f)


[INFO] serializing 78 encodings...
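Before moving on, it can be useful to check that a file with this structure loads back into arrays that a classifier can use. The sketch below is a self-contained round trip with stand-in data written to a temporary directory; the file name `embeddings_demo.pickle`, the labels and the embedding values are hypothetical, and only the dictionary keys `"embeddings"` and `"names"` come from the notebook.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in data with the same structure as the notebook's dictionary
data = {
    "embeddings": [np.zeros(128, dtype=np.float32), np.ones(128, dtype=np.float32)],
    "names": ["person_a", "person_b"],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "embeddings_demo.pickle")
    with open(path, "wb") as f:
        pickle.dump(data, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# Stack the per-face vectors into a (num_faces, 128) feature matrix
X = np.vstack(loaded["embeddings"])
y = loaded["names"]
print(X.shape, len(y))
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.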


**Next step: train the classification model.**