## End-to-End Procedure

### Procedure Outline
1. Normalize the dataset
    - Detect faces in all the images. Reject images that do not contain exactly one face.
    - Crop the face from the image.
    - Align it.
2. Generate Train-Test Splits
    - Create folds.
3. Evaluate 
    - Generate embeddings from the splits
    - Train classifier on the embeddings
    - Test classifier on the embeddings
4. Tune classifier
    - Tune the classifier 
5. Save the model
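
The train/test cycle in step 3 can be illustrated without the real model. The sketch below is a stand-in, not the classifier used by `face_trigger`: it assumes toy 2-D vectors in place of real face embeddings and uses a simple nearest-centroid rule. The helper names (`fit_centroids`, `predict`, `accuracy`) are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_centroids(embeddings, labels):
    """'Train' by averaging the embeddings of each label."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to `vec`."""
    return min(centroids, key=lambda label: euclidean(centroids[label], vec))


def accuracy(centroids, embeddings, labels):
    """Fraction of held-out embeddings that are classified correctly."""
    hits = sum(1 for v, l in zip(embeddings, labels)
               if predict(centroids, v) == l)
    return hits / float(len(labels))


# Toy "embeddings": two subjects, two training vectors each (made-up values).
train_x = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
train_y = ["s1", "s1", "s2", "s2"]
test_x = [[0.05, 0.05], [0.95, 0.95]]
test_y = ["s1", "s2"]

centroids = fit_centroids(train_x, train_y)
print(accuracy(centroids, test_x, test_y))
```

In the real pipeline, the vectors would come from the embedding model and the score would be computed per fold.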

### Imports

In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

In [2]:
import os
import cv2
import pprint
import logging
import tqdm

In [3]:
import face_trigger

from face_trigger.model.deep.FaceRecognizer import FaceRecognizer
from face_trigger.process.post_process import FaceDetector, LandmarkDetector, FaceAlign
from face_trigger.utils.common import RepeatedTimer, clamp_rectangle
from face_trigger.utils.Dataset import Dataset

In [4]:
unnormalized_dataset_path = "/media/ankurrc/new_volume/softura/facerec/datasets/standard_att"
dataset_path = "/media/ankurrc/new_volume/softura/facerec/att_norm"
split_path = "/media/ankurrc/new_volume/softura/facerec/att_split_path"

In [5]:
logging.basicConfig(level=logging.DEBUG)

### Normalize dataset
While normalizing the dataset, we assume that the original dataset has the following structure:
1. At the root level, there is one directory per person. Directory names may or may not be numeric.
2. Each directory contains the images of that person's face. File names may or may not be numeric.
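
Before normalizing, it can help to confirm that a dataset really follows this layout. The following is a minimal sketch using only the standard library; `summarize_dataset` is a hypothetical helper, not part of `face_trigger`, and the demo builds a throwaway directory tree rather than touching the real dataset.

```python
import os
import shutil
import tempfile


def summarize_dataset(root):
    """Map each person directory under `root` to its number of files."""
    summary = {}
    for name in sorted(os.listdir(root)):
        person_dir = os.path.join(root, name)
        if not os.path.isdir(person_dir):
            continue  # stray files at the root level are ignored
        summary[name] = len([
            f for f in os.listdir(person_dir)
            if os.path.isfile(os.path.join(person_dir, f))
        ])
    return summary


# Build a throwaway dataset with two people to demonstrate the check.
demo_root = tempfile.mkdtemp()
for person, count in [("1", 3), ("alice", 2)]:
    os.mkdir(os.path.join(demo_root, person))
    for i in range(count):
        open(os.path.join(demo_root, person, "{}.png".format(i)), "w").close()

summary = summarize_dataset(demo_root)
print(summary)
shutil.rmtree(demo_root)
```

A person directory that reports zero files, or a root with no directories at all, would indicate that the layout assumption does not hold.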
 
The final dimensions are assumed to be 256x256, since that is what the DNN ingests.
Also, faces are aligned so that the eyes sit about 0.35 of the image width in from the edges (`left_eye_offset=(0.35, 0.35)`).
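
To make the numbers concrete, the sketch below works out where the eyes would land in the output image. It assumes a common convention (not confirmed by the `FaceAlign` source here): the offset is a fraction of the output size, and the right eye is mirrored about the vertical center line.

```python
final_width = 256
left_eye_offset = (0.35, 0.35)  # values passed to FaceAlign in this notebook

# Assumed interpretation: offsets are fractions of the output dimensions.
left_eye_x = left_eye_offset[0] * final_width            # about 89.6 px
right_eye_x = (1.0 - left_eye_offset[0]) * final_width   # about 166.4 px
eye_y = left_eye_offset[1] * final_width                 # about 89.6 px
eye_distance = right_eye_x - left_eye_x                  # about 76.8 px

print(left_eye_x, right_eye_x, eye_y, eye_distance)
```

Under that assumption, a larger offset pushes the eyes toward the center and makes the face appear smaller in the crop.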

In [4]:
import uuid

In [53]:
def normalize_dataset(dataset_path=None, output_path=None):

    logger = logging.getLogger(__name__)

    face_detector = FaceDetector()
    face_align = FaceAlign(left_eye_offset=(0.35, 0.35), final_width=256)
    landmark_predictor = LandmarkDetector()

    rejected_faces = {}

    bar = tqdm.tqdm(total=None)

    if not os.path.exists(dataset_path):
        raise Exception("Invalid dataset path!")

    # setup output directory
    if os.path.isdir(output_path):
        os.rename(output_path, os.path.join(os.path.split(
            output_path)[0], os.path.split(
            output_path)[1] + "_" + uuid.uuid4().hex))
    os.makedirs(output_path)

    for root, dirs, files in os.walk(dataset_path):

        if root == dataset_path:
            bar.total = len(dirs)

        for direc in dirs:
            # create output directory for this person
            output_direc_path = os.path.join(output_path, direc)
            os.mkdir(output_direc_path)

        for img in files:

            # root already includes dataset_path, so join only root and file name
            img_path = os.path.join(root, img)

            # read the image (cv2.imread returns None if the file cannot be decoded)
            rgbImg = cv2.imread(img_path)

            grayImg = None
            if rgbImg is None:
                # skip unreadable files instead of abandoning the directory
                continue
            elif rgbImg.ndim == 3 and rgbImg.shape[2] == 3:
                grayImg = cv2.cvtColor(rgbImg, cv2.COLOR_BGR2GRAY)
            else:
                grayImg = rgbImg

            # detect faces
            faces = face_detector.detect_unbounded(grayImg)

            if len(faces) == 1:

                # get face (only interested if there's one and only one)
                face_bb = faces[0]

                # get the landmarks
                landmarks = landmark_predictor.predict(face_bb, grayImg)

                # align the face
                aligned_face = face_align.align(grayImg, landmarks)

                # write to output directory
                save_path = os.path.join(
                    output_path, os.path.basename(root), img)

                cv2.imwrite(save_path, aligned_face)

            else:
                # record the rejected image under its person directory;
                # use a separate name so the loop variable `root` is not overwritten
                person = os.path.basename(root)
                rejected_faces.setdefault(person, []).append(img)

        if root != dataset_path:
            bar.update()

    bar.close()
    logger.info("Normalized dataset created at {}".format(output_path))

    print("Rejected directories:")
    pprint.pprint(rejected_faces)

    return rejected_faces
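
The output-directory setup in `normalize_dataset` keeps an earlier run by renaming it with a unique suffix before creating a fresh directory. The sketch below isolates that idea; `prepare_output_dir` is a hypothetical helper, and it uses `uuid.uuid4().hex`, which is available on both Python 2 and Python 3.

```python
import os
import shutil
import tempfile
import uuid


def prepare_output_dir(output_path):
    """Create a fresh `output_path`, moving any existing directory aside.

    Returns the backup path, or None if nothing had to be moved.
    """
    output_path = os.path.normpath(output_path)  # drop any trailing slash
    backup_path = None
    if os.path.isdir(output_path):
        backup_path = output_path + "_" + uuid.uuid4().hex
        os.rename(output_path, backup_path)
    os.makedirs(output_path)
    return backup_path


# Demonstrate on a temporary directory so no real data is touched.
demo_parent = tempfile.mkdtemp()
target = os.path.join(demo_parent, "att_norm")

first_backup = prepare_output_dir(target)   # nothing to move yet
open(os.path.join(target, "old.png"), "w").close()
second_backup = prepare_output_dir(target)  # previous run is moved aside

old_files = os.listdir(second_backup)
new_files = os.listdir(target)
shutil.rmtree(demo_parent)
```

Renaming rather than deleting means a mistaken re-run never destroys earlier results, at the cost of leaving old directories to clean up by hand.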

In [54]:
rejected_dirs = normalize_dataset(
    dataset_path=unnormalized_dataset_path, output_path=dataset_path)

100%|██████████| 40/40 [00:03<00:00, 12.90it/s]
INFO:__main__:Normalized dataset created at /media/ankurrc/new_volume/softura/facerec/att_norm


Rejected directories:
{'33': ['4.png'], '35': ['2.png'], '37': ['4.png', '5.png']}
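
The returned dictionary can be summarized to see how many images were lost per subject. The snippet below uses the values printed above. The figure of 10 images per subject is an assumption based on the standard AT&T face set and should be checked against the actual dataset.

```python
# Values copied from the output above.
rejected_faces = {'33': ['4.png'], '35': ['2.png'], '37': ['4.png', '5.png']}

IMAGES_PER_SUBJECT = 10  # assumption: standard AT&T layout, verify locally

total_rejected = sum(len(imgs) for imgs in rejected_faces.values())
remaining = {subject: IMAGES_PER_SUBJECT - len(imgs)
             for subject, imgs in rejected_faces.items()}

print("rejected images:", total_rejected)
print("remaining per affected subject:", remaining)
```

Subjects with fewer remaining images constrain how many samples can be drawn for training and testing later on.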


### Generate Splits

In [8]:
def generate_splits(dataset_path=None, split_path=None):
    dataset = Dataset(dataset_path=dataset_path,
                      split_path=split_path)
    folds = 3
    training_samples = [2, 5, 8]
    
    dataset.split(num_train_list=training_samples, folds=folds)
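
The internals of `Dataset.split` are not shown here. As a rough illustration of what "N training samples per subject, repeated over several folds" can mean, the sketch below draws a fresh random train/test partition of one subject's images for every fold. The function name `split_subject`, the fixed seed, and the file names are all hypothetical.

```python
import random


def split_subject(images, num_train, folds, seed=0):
    """Return `folds` random (train, test) partitions of one subject's images."""
    rng = random.Random(seed)  # fixed seed keeps the demo reproducible
    splits = []
    for _ in range(folds):
        shuffled = list(images)
        rng.shuffle(shuffled)
        splits.append((sorted(shuffled[:num_train]),
                       sorted(shuffled[num_train:])))
    return splits


# Ten hypothetical image names for a single subject.
images = ["{}.png".format(i) for i in range(1, 11)]

for num_train in [2, 5, 8]:
    for fold, (train, test) in enumerate(
            split_subject(images, num_train, folds=3), 1):
        print(num_train, fold, len(train), len(test))
```

Each fold here is an independent random draw, so test sets may overlap between folds. A library implementation may partition differently.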

In [9]:
generate_splits(dataset_path=dataset_path, split_path=split_path)

INFO:face_trigger.utils.Dataset:Generating for 2 training samples per subject.

Generating: Fold 1
Creating directory: /media/ankurrc/new_volume/softura/facerec/att_split_path/2/1
done.
/media/ankurrc/new_volume/softura/facerec/att_split_path/2/1/train.csv

Generating: Fold 2
Creating directory: /media/ankurrc/new_volume/softura/facerec/att_split_path/2/2
done.
/media/ankurrc/new_volume/softura/facerec/att_split_path/2/2/train.csv

Generating: Fold 3
Creating directory: /media/ankurrc/new_volume/softura/facerec/att_split_path/2/3
done.
/media/ankurrc/new_volume/softura/facerec/att_split_path/2/3/train.csv

Generating: Fold 1
Creating directory: /media/ankurrc/new_volume/softura/facerec/att_split_path/5/1
done.
/media/ankurrc/new_volume/softura/facerec/att_split_path/5/1/train.csv

Generating: Fold 2
Creating directory: /media/ankurrc/new_volume/softura/facerec/att_split_path/5/2
done.
/media/ankurrc/new_volume/softura/facerec/att_split_path/5/2/train.csv

Generating: Fold 3
Creating directory: /media/ankurrc/new_volume/softura/facerec/att_split_path/5/3
done.
/media/ankurrc/new_volume/softura/facerec/att_split_path/5/3/train.csv

Generating: Fold 1
Creating directory: /media/ankurrc/new_volume/softura/facerec/att_split_path/8/1
done.
/media/ankurrc/new_volume/softura/facerec/att_split_path/8/1/train.csv

Generating: Fold 2
Creating directory: /media/ankurrc/new_volume/softura/facerec/att_split_path/8/2
done.
/media/ankurrc/new_volume/softura/facerec/att_split_path/8/2/train.csv

Generating: Fold 3
Creating directory: /media/ankurrc/new_volume/softura/facerec/att_split_path/8/3
done.
/media/ankurrc/new_volume/softura/facerec/att_split_path/8/3/train.csv
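
Judging from the log lines above, each split is written to `<split_path>/<training samples>/<fold>/train.csv`. The sketch below enumerates those paths for the settings used in `generate_splits`. `expected_train_csvs` is a hypothetical helper, and only `train.csv` is listed because that is the only file name the log shows.

```python
import os


def expected_train_csvs(split_path, num_train_list, folds):
    """List the train.csv paths implied by the logged directory layout."""
    return [os.path.join(split_path, str(num_train), str(fold), "train.csv")
            for num_train in num_train_list
            for fold in range(1, folds + 1)]


# A relative placeholder is used instead of the real split_path.
paths = expected_train_csvs("att_split_path", [2, 5, 8], folds=3)
for path in paths:
    print(path)
```

With three sample sizes and three folds, this yields nine files, matching the nine "Generating: Fold" blocks in the output.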