## Face bbox detection
As the AffectNet dataset is based on face detection, I compare two face detectors on two metrics: the average execution time per image and, for the annotated bodies, how many of them have a face detected inside their bounding box. I use the following face detectors:

1. [OpenCV (HAAR Cascade) face detector](https://docs.opencv.org/3.4.1/d7/d8b/tutorial_py_face_detection.html)
2. [YOLOv8 face detector](https://github.com/akanametov/yolov8-face)

*Note: The YOLOv8 face detector is a fork of the original YOLOv8 model, trained by the GitHub user "akanametov". The model architecture is the same as the original one, but the weights come from that user's repository.*
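The two metrics can be written down as a small helper. This is a minimal sketch, not part of the project's `src` package: `summarize_detector` is a hypothetical name, and it assumes a list of per-image execution times in seconds, the number of bodies that got a face inside their bounding box, and the total number of annotated bodies.

```python
from statistics import mean


def summarize_detector(execution_times, matched_bodies, total_bodies):
    """Return (average seconds per image, fraction of bodies with a detected face).

    Hypothetical helper: `execution_times` holds per-image timings in seconds,
    `matched_bodies` counts annotated bodies with a face inside their bbox,
    and `total_bodies` is the number of annotated bodies that were checked.
    """
    if not execution_times:
        raise ValueError("execution_times must not be empty")
    avg_time = mean(execution_times)
    # Guard against a split with no annotated bodies at all
    detection_rate = matched_bodies / total_bodies if total_bodies else 0.0
    return avg_time, detection_rate


# Example with made-up numbers, only to illustrate the call
avg_time, rate = summarize_detector([0.02, 0.03, 0.04], matched_bodies=3, total_bodies=4)
print(f"{avg_time * 1000:.1f} ms/image, {rate:.0%} of bodies with a face")
```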


In [None]:
import os
import time
from pathlib import Path

import cv2
import pandas as pd
import torch

from src.models.load_pretrained_models import load_YOLO_model_face_recognition, load_HAAR_cascade_face_detection
from src.models.inference_face_detection_model import detect_faces_HAAR_cascade, detect_faces_YOLO
from src import INTERIM_DATA_DIR

# Use the first GPU if available, otherwise fall back to the CPU
print("Cuda is available:", torch.cuda.is_available())
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# get_device_name only works for CUDA devices, so guard it
print("The device is:", torch.cuda.get_device_name(device) if device.type == "cuda" else "CPU")

Cuda is available: True
The device is: NVIDIA GeForce GTX 1080 Ti


Load the interim annotations:

In [7]:
annotations_path = Path(INTERIM_DATA_DIR) / 'annotations'

# Load every pickled split (e.g. train.pkl, val.pkl) into a dict keyed by the split name
annotations = {}
for file_name in os.listdir(annotations_path):
    if file_name.endswith('.pkl'):
        annotations[Path(file_name).stem] = pd.read_pickle(annotations_path / file_name)

FileNotFoundError: [Errno 2] No such file or directory: '/home/usuaris/imatge/armand.de.asis/emotion_recognition/data/interim/annotations'

We load the face detectors:

In [2]:
haar_detector = load_HAAR_cascade_face_detection()
yolo_detector = load_YOLO_model_face_recognition(size="medium", device=device)
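When measuring the average execution time, the first call to a freshly loaded model (especially a GPU model) often includes one-off costs such as CUDA context creation or memory allocation. A minimal sketch of a fairer measurement, assuming a hypothetical `benchmark` helper that is not part of the project: it runs a few untimed warm-up calls, then times every image with `time.perf_counter`. If a detector returned before its GPU work had finished, calling `torch.cuda.synchronize()` before reading the clock would also be needed.

```python
import time
from statistics import mean


def benchmark(detect_fn, images, warmup=1):
    """Return the mean seconds per call of `detect_fn` over `images`.

    Hypothetical helper: the first image is processed `warmup` times without
    being timed, so one-off initialization costs do not skew the average.
    """
    if not images:
        raise ValueError("images must not be empty")
    # Untimed warm-up calls on the first image
    for _ in range(warmup):
        detect_fn(images[0])
    timings = []
    for img in images:
        start = time.perf_counter()
        detect_fn(img)
        timings.append(time.perf_counter() - start)
    return mean(timings)


# Usage with a dummy "detector"; in the notebook this could be e.g.
# benchmark(lambda img: detect_faces_YOLO(img, yolo_detector), images)
print(benchmark(lambda img: sorted(img), [[3, 1, 2]] * 5))
```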


In [None]:
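def check_face_predictions(body_bbox, faces_bbox, well_detected):
    """Match at most one detected face to `body_bbox`.

    Assumes the body bbox is [x1, y1, x2, y2] and each face bbox is [x, y, w, h].
    If a face lies fully inside the body, it is removed from `faces_bbox` (so it
    cannot be matched to another body) and the counter is incremented.
    """
    for face_id, f_bbox in faces_bbox.items():
        upper_left_inside = f_bbox[0] >= body_bbox[0] and f_bbox[1] >= body_bbox[1]
        lower_right_inside = f_bbox[0] + f_bbox[2] <= body_bbox[2] and f_bbox[1] + f_bbox[3] <= body_bbox[3]
        if upper_left_inside and lower_right_inside:
            # Deleting while iterating is safe here only because we return immediately
            del faces_bbox[face_id]
            return well_detected + 1, faces_bbox

    return well_detected, faces_bbox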
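To make the box conventions explicit and testable, here is a minimal standalone sketch of the same matching logic. It is not the project's code: `face_inside_body` and `count_matched_bodies` are hypothetical names, and, like `check_face_predictions`, it assumes bodies are given as `[x1, y1, x2, y2]` and faces as `[x, y, w, h]`. Unlike the cell above, it does not mutate its inputs.

```python
def face_inside_body(face_xywh, body_xyxy):
    """True if the face box [x, y, w, h] lies fully inside the body box [x1, y1, x2, y2]."""
    x, y, w, h = face_xywh
    x1, y1, x2, y2 = body_xyxy
    return x >= x1 and y >= y1 and x + w <= x2 and y + h <= y2


def count_matched_bodies(bodies_xyxy, faces_xywh):
    """Greedily give each body the first unused face inside it; return how many bodies matched."""
    unused = list(faces_xywh)  # copy so the caller's list is left untouched
    matched = 0
    for body in bodies_xyxy:
        for i, face in enumerate(unused):
            if face_inside_body(face, body):
                del unused[i]  # a face can be matched to only one body
                matched += 1
                break
    return matched


bodies = [[0, 0, 100, 200], [150, 0, 250, 200]]
faces = [[20, 10, 40, 40], [300, 10, 40, 40]]
print(count_matched_bodies(bodies, faces))  # → 1
```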

In [None]:
# Per-image execution times (seconds) and match counters for each detector
execution_time = {'HAAR_cascade': [], 'YOLO': []}
well_detected = {'HAAR_cascade': 0, 'YOLO': 0}
total_bodies = 0
photo_directory = os.path.join(INTERIM_DATA_DIR, "images")

val_annotations = annotations['val']
for photo_idx in range(len(val_annotations)):
    sample = val_annotations.iloc[photo_idx]
    # Read the image (cv2.imread returns None instead of raising on failure)
    img_path = os.path.join(photo_directory, sample['path'])
    img = cv2.imread(img_path)
    if img is None:
        print(f"Could not read {img_path}, skipping")
        continue

    # HAAR cascade detector
    start = time.perf_counter()
    faces_bbox_HAAR_cascade = detect_faces_HAAR_cascade(img, haar_detector)
    execution_time['HAAR_cascade'].append(time.perf_counter() - start)
    faces_bbox_HAAR_cascade = dict(enumerate(faces_bbox_HAAR_cascade))

    # YOLO detector
    start = time.perf_counter()
    faces_bbox_YOLO = detect_faces_YOLO(img, yolo_detector)
    execution_time['YOLO'].append(time.perf_counter() - start)
    faces_bbox_YOLO = dict(enumerate(faces_bbox_YOLO))

    # For every annotated body, check whether each detector found a face inside it
    for person_idx in range(sample['people']):
        body_bbox = sample['bbox'][person_idx]
        total_bodies += 1
        well_detected['HAAR_cascade'], faces_bbox_HAAR_cascade = check_face_predictions(body_bbox, faces_bbox_HAAR_cascade, well_detected['HAAR_cascade'])
        well_detected['YOLO'], faces_bbox_YOLO = check_face_predictions(body_bbox, faces_bbox_YOLO, well_detected['YOLO'])

# Report both metrics per detector
for name, times in execution_time.items():
    avg_time = sum(times) / len(times) if times else float('nan')
    print(f"{name}: {avg_time * 1000:.1f} ms/image, {well_detected[name]}/{total_bodies} bodies with a detected face")
                


### Pros and Cons of Dlib’s CNN Face Detector
#### Accurate Detection in Challenging Conditions
Dlib’s CNN face detector (the MMOD-based model shipped with the dlib library) is known for its high detection accuracy. It detects faces reliably even under challenging conditions such as low lighting, varying head poses, or different facial expressions, and it is generally more robust than classical detectors such as HAAR cascades or dlib’s own HOG-based detector.

#### Superior Performance with Occluded or Partially Visible Faces
Dlib’s CNN face detector also copes well with occluded or partially visible faces: it can still detect faces that are partly covered by objects or by accessories such as glasses or hats. This makes it a useful first step wherever reliable face detection is required for downstream tasks such as face recognition or facial attribute analysis.

#### Computational Complexity as a Drawback
The accuracy of dlib’s CNN face detector comes at the cost of computational complexity. The network needs far more computation per image than lightweight detectors such as HAAR cascades, and it typically runs slowly on a CPU unless dlib is built with CUDA support. As a result, it can be expensive to run on large datasets or in real-time applications.
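A minimal sketch of how dlib’s CNN detector is typically called, showing where the cost comes from. It assumes `dlib` and OpenCV are installed and that the pretrained `mmod_human_face_detector.dat` file has been downloaded; `detect_faces_dlib_cnn` and `mmod_to_xywh` are hypothetical helpers that return boxes in the `[x, y, w, h]` layout used earlier in this notebook. The `upsample` argument makes dlib enlarge the image before detecting, which helps with small faces but makes each call noticeably slower, since more pixels have to be processed.

```python
def mmod_to_xywh(detections, min_confidence=0.0):
    """Convert dlib mmod_rectangles to [x, y, w, h] boxes, dropping low-confidence ones."""
    boxes = []
    for det in detections:
        if det.confidence < min_confidence:
            continue
        rect = det.rect
        boxes.append([rect.left(), rect.top(), rect.width(), rect.height()])
    return boxes


def detect_faces_dlib_cnn(img_bgr, detector, upsample=1, min_confidence=0.0):
    """Run a dlib CNN face detector on an OpenCV (BGR) image (hypothetical helper)."""
    import cv2  # imported here so mmod_to_xywh can be used without OpenCV installed

    # dlib expects RGB, while OpenCV loads images as BGR
    img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    # Each upsampling step enlarges the image: better for small faces, but slower
    detections = detector(img_rgb, upsample)
    return mmod_to_xywh(detections, min_confidence)


# Usage (requires dlib and the pretrained weights):
# import dlib
# detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
# boxes = detect_faces_dlib_cnn(cv2.imread("photo.jpg"), detector, upsample=1)
```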

#### Training Requires Ample Data for Optimal Results
Training the detector for optimal results requires a substantial dataset: the model has to be exposed to a wide range of facial variations and conditions so that it learns to detect faces accurately across different scenarios. Acquiring and curating such a dataset can be challenging for users with limited access to diverse facial images. This mainly matters when training a detector for a new domain, since dlib distributes a pretrained model (`mmod_human_face_detector.dat`).

### Pros and Cons of OpenCV’s Deep Learning Face Detector
OpenCV’s deep learning face detector (usually the SSD model with a ResNet-10 backbone that OpenCV distributes for its `dnn` module) is a popular choice for the face detection step that precedes facial attribute analysis or recognition. Its main pros and cons are listed below.

#### Pros
Efficiency: One of the major benefits of OpenCV’s Deep Learning face detector is its efficiency in terms of speed and resource usage. The model is optimized to provide real-time face detection, making it suitable for applications that require quick processing, such as video surveillance or live streaming.

Accuracy: OpenCV’s Deep Learning face detector delivers good accuracy. It reliably detects faces that look directly at the camera, which makes it a useful first step for facial analysis tasks such as identity verification or emotion analysis.

Ease of Use: OpenCV provides a straightforward interface, the `dnn` module, for running its deep learning models, including the face detector. This makes it accessible even to developers who are new to computer vision or deep learning, and the detector can be integrated into an application with only a few lines of code.
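As an illustration of that interface, here is a minimal sketch using the ResNet-10 SSD face model that OpenCV distributes with its `dnn` samples. The file names `deploy.prototxt` and `res10_300x300_ssd_iter_140000.caffemodel` are assumptions about where the downloaded model lives, and `decode_ssd_detections` / `detect_faces_opencv_dnn` are hypothetical helpers. The network returns an array of shape `(1, 1, N, 7)` in which, for each detection, index 2 holds the confidence and indices 3 to 6 the box corners normalized to `[0, 1]`.

```python
def decode_ssd_detections(detections, width, height, conf_threshold=0.5):
    """Turn an SSD output of shape (1, 1, N, 7) into [x, y, w, h] pixel boxes."""
    boxes = []
    for det in detections[0][0]:
        confidence = float(det[2])
        if confidence < conf_threshold:
            continue
        # Corners are normalized to [0, 1]; scale them to pixels and clip to the image
        x1 = max(0, int(det[3] * width))
        y1 = max(0, int(det[4] * height))
        x2 = min(width, int(det[5] * width))
        y2 = min(height, int(det[6] * height))
        if x2 > x1 and y2 > y1:
            boxes.append([x1, y1, x2 - x1, y2 - y1])
    return boxes


def detect_faces_opencv_dnn(img_bgr, net, conf_threshold=0.5):
    """Run OpenCV's SSD face detector on a BGR image (hypothetical helper)."""
    import cv2  # imported here so the decoder above works without OpenCV installed

    height, width = img_bgr.shape[:2]
    # The model expects a 300x300 input with these per-channel means subtracted
    blob = cv2.dnn.blobFromImage(cv2.resize(img_bgr, (300, 300)), 1.0, (300, 300), (104.0, 177.0, 123.0))
    net.setInput(blob)
    return decode_ssd_detections(net.forward(), width, height, conf_threshold)


# Usage (requires OpenCV and the downloaded model files):
# net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "res10_300x300_ssd_iter_140000.caffemodel")
# boxes = detect_faces_opencv_dnn(cv2.imread("photo.jpg"), net)
```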

Compatibility: OpenCV supports multiple programming languages like Python, C++, Java, and more. This cross-platform compatibility allows developers from different backgrounds to utilize the deep learning face detector in their preferred language.

#### Cons
Challenges with Extreme Angles: While OpenCV’s Deep Learning face detector excels at detecting frontal faces, it may struggle when faced with extreme angles or non-frontal views. In such cases, the accuracy drops significantly as the model finds it difficult to recognize faces that deviate from a straight-on perspective.

Limited Performance under Poor Lighting Conditions: Another limitation is its performance under poor lighting conditions. The model heavily relies on well-illuminated environments for accurate face detection results. In low-light situations or uneven lighting setups, the performance may degrade due to the lack of clear facial features.

Fine-Tuning for Specific Use Cases: Although OpenCV’s Deep Learning face detector provides a solid foundation, fine-tuning the model might be necessary to achieve optimal performance for specific use cases. This involves training the model on custom datasets or adjusting hyperparameters to improve accuracy and address any shortcomings encountered in your application domain.