## Case Study - Comparing Accuracy for Real-Time Face Detection Models

The goal of this case study is to determine which model is best suited for real-time face detection, and more specifically, which model detects faces most accurately. This study uses a set of images instead of videos. A model's robustness can only be judged if it is tested on a set of images that are diverse in lighting conditions, face positions, number of faces, skin color, etc. A video is essentially a finite sequence of frames (images). The argument is that a diverse set of images tests accuracy better than a video would. <br>

The face detection models compared will be: 
- Haar Cascade
- HOG
- DNN
- MMOD

All of the test images are within the folder `data/test_data/` in the same repository. Run the following code block to import all libraries: 

In [None]:
# Importing all packages
import cv2
import dlib
import matplotlib.pyplot as pyplot

# For benchmarking the process of generating the graph
import time

# For creating the progress bar when looping over the dataset
from tqdm.notebook import tqdm

### 1. Code snippets for each model

The following code contains the function for each face detection model. The code for the individual models can be found in `src/models/code/` within the `OpenCV_Server` repository. Each file has a function that is called on a single frame. The output is either a list of bounding boxes (`Rect`) or `None`. The functions are copied into this notebook for the sake of simplicity.
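The four functions do not return their boxes in exactly the same shape: the Haar function returns OpenCV rows, the HOG function returns dlib rectangles, and the DNN and MMOD functions return plain tuples. The sketch below shows one possible way to bring these results into a common list of `(x, y, width, height)` tuples. The helper names `to_xywh`, `is_single_box` and `normalize_detections` are hypothetical and not part of the repository; the sketch only assumes that dlib rectangles expose `left()`, `top()`, `right()` and `bottom()`.

```python
def to_xywh(box):
    """Convert one detection to an (x, y, width, height) tuple of ints."""
    # dlib rectangles expose their corners through methods
    if hasattr(box, "left"):
        x1, y1 = int(box.left()), int(box.top())
        x2, y2 = int(box.right()), int(box.bottom())
        return (x1, y1, x2 - x1, y2 - y1)
    # OpenCV rows and plain tuples are already ordered as x, y, width, height
    x, y, w, h = box
    return (int(x), int(y), int(w), int(h))


def is_single_box(result):
    """Return True if the result is one box rather than a sequence of boxes."""
    if hasattr(result, "left"):
        return True
    # A single box holds exactly four scalar values
    return len(result) == 4 and not any(
        hasattr(value, "__len__") or hasattr(value, "left") for value in result
    )


def normalize_detections(result):
    """Turn None, a single box, or a sequence of boxes into a list of tuples."""
    if result is None:
        return []
    if is_single_box(result):
        return [to_xywh(result)]
    return [to_xywh(box) for box in result]
```

With such a helper, `len(normalize_detections(result))` would always count faces, regardless of which model produced `result` and whether one or several faces were requested.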

Run them to save the functions:

In [None]:
# Haar Cascade
def detect_face_haar(img,detectMultipleFaces=False, scale=1.1, neighbors=10, size=50):
    """Detect a face in an image using a pre-trained Haar Cascade model. 

    The model has been trained by OpenCV.
    See: https://opencv.org/

    Args:
        img (numpy.ndarray): 
            Image read from the cv2.imread function. It is a numpy array in BGR channel order.
        detectMultipleFaces (boolean): 
            Toggle for returning more than one face detected. Default is false. 
        scale (float, optional): 
            For scaling down the input image, before trying to detect a face. Makes it easier to detect a face with smaller scale. Defaults to 1.1.
        neighbors (int, optional): 
            Amount of neighbor rectangles needed for a face to be set as detected. Defaults to 10.
        size (int, optional): 
            Size of the sliding window that checks for any facial features. Should match the face size in the image, that should be detected. Defaults to 50.

    Returns:
        Rect: Rectangle that overlays the position of the detected face. It has four attributes of interest: x-position, y-position, width and height.
    """

    # Turning the image into a grayscale image
    gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Printing the gray scale image
    # print(f"Gray-Scale Image dimension: ({gray_image.shape})")

    # Loading the classifier from a pretrained dataset
    face_classifier = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )

    # Performing the face detection
    faces = face_classifier.detectMultiScale(
        gray_image, scaleFactor=scale, minNeighbors=neighbors, minSize=(size,size)
    )

    # Return all detected faces, or only the first one (None if no face was found)
    if detectMultipleFaces:
        return faces
    return faces[0] if len(faces) > 0 else None


# HOG
def detect_face_hog(img,detectMultipleFaces=False):
    # Turning the image into a grayscale image
    gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Define the HOG detector from dlib
    hog_face_detector = dlib.get_frontal_face_detector()

    # Detect faces from the grayscale image
    faces = hog_face_detector(gray_image, 1)

    # Check if only one face is needed (None if no face was found)
    if not detectMultipleFaces:
        return faces[0] if len(faces) > 0 else None

    return faces


# DNN
def detect_face_dnn(img, net, framework="caffe", conf_threshold=0.7, detectMultipleFaces=False):
    """
    Detect faces in an image using a deep neural network (DNN).

    Parameters:
    - img: The input image.
    - net: The pre-trained DNN model for face detection.
    - framework: The framework used for the DNN model ("caffe" or "tensorflow").
    - conf_threshold: The confidence threshold for detecting faces.
    - detectMultipleFaces: Boolean flag to detect multiple faces or just the first one.

    Returns:
    - A list of bounding boxes for detected faces or a single bounding box if detectMultipleFaces is False.
    """
    frameHeight = img.shape[0]
    frameWidth = img.shape[1]
    if framework == "caffe":
        blob = cv2.dnn.blobFromImage(img, 1.0, (300, 300), [104, 117, 123], False, False)
    else:
        blob = cv2.dnn.blobFromImage(img, 1.0, (300, 300), [104, 117, 123], True, False)

    net.setInput(blob)
    detections = net.forward()
    bboxes = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > conf_threshold:
            x1 = int(detections[0, 0, i, 3] * frameWidth)
            y1 = int(detections[0, 0, i, 4] * frameHeight)
            x2 = int(detections[0, 0, i, 5] * frameWidth)
            y2 = int(detections[0, 0, i, 6] * frameHeight)
            width = x2 - x1
            height = y2 - y1
            bboxes.append((x1, y1, width, height))

    if detectMultipleFaces == True:
        return bboxes  # Return all detected faces
    else:
        return bboxes[0] if bboxes else None # Return the first face or None if no faces are detected
    

# MMOD 
def detect_face_mmod(img, detector, inHeight=300, inWidth=0, detectMultipleFaces=False):
    """
    Detect faces in an image using the dlib MMOD detector.

    Parameters:
    - img: The input image.
    - detector: The dlib MMOD face detector.
    - inHeight: The height of the image for detection.
    - inWidth: The width of the image for detection. If 0, it will be calculated based on the aspect ratio of the input image.
    - detectMultipleFaces: Boolean flag to indicate whether to detect multiple faces or just the first one.

    Returns:
    - A list of bounding boxes for each detected face or a single bounding box if detectMultipleFaces is False.
    """
    frameHeight = img.shape[0]
    frameWidth = img.shape[1]
    if not inWidth:
        inWidth = int((frameWidth / frameHeight) * inHeight)

    scaleHeight = frameHeight / inHeight
    scaleWidth = frameWidth / inWidth

    resized_img = cv2.resize(img, (inWidth, inHeight))
    resized_img = cv2.cvtColor(resized_img, cv2.COLOR_BGR2RGB)
    faceRects = detector(resized_img, 0)

    bboxes = []
    for faceRect in faceRects:
        x1 = int(faceRect.rect.left() * scaleWidth)
        y1 = int(faceRect.rect.top() * scaleHeight)
        x2 = int(faceRect.rect.right() * scaleWidth)
        y2 = int(faceRect.rect.bottom() * scaleHeight)
        width = x2 - x1
        height = y2 - y1
        bboxes.append((x1, y1, width, height))

    if detectMultipleFaces == True:
        return bboxes  # Return all detected faces
    else:
        return bboxes[0] if bboxes else None  # Return the first face or None if no faces are detected

### 2. Metrics for quantifying face detection accuracy  

The metrics for face detection accuracy are the same as for any object detection problem: 

**IoU:** Intersection over Union is the ratio between the area of overlap and the area of union of the predicted bounding box and the actual bounding box. <br>

**Precision:** The proportion of predicted positives that are correct. <br>

**Recall:** The proportion of actual positives that are predicted. <br>

This case study uses precision as its metric. This requires a dataset in which the faces in every image are labeled.
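As a minimal sketch of how these three metrics relate, the block below computes IoU for two boxes and then derives precision and recall by greedily matching predictions to labeled boxes. It assumes that both predictions and labels are available as `(x, y, width, height)` tuples, and the threshold of 0.5 is a commonly used value chosen here for illustration. The function names `iou` and `precision_recall` are hypothetical and do not exist in the repository.

```python
def iou(box_a, box_b):
    """Intersection over Union for two (x, y, width, height) boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh

    # Width and height of the overlapping region (0 if the boxes do not overlap)
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def precision_recall(predicted, actual, iou_threshold=0.5):
    """Greedy one-to-one matching of predicted boxes to labeled boxes."""
    unmatched = list(actual)
    true_positives = 0

    for pred in predicted:
        # Pick the labeled box that overlaps most with this prediction
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            true_positives += 1
            unmatched.remove(best)  # each labeled face can be matched only once

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall
```

For example, with predictions `[(0, 0, 10, 10), (50, 50, 10, 10)]` and labels `[(0, 0, 10, 10), (100, 100, 10, 10)]`, one prediction matches a label, so both precision and recall are 0.5.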

### 3. The Dataset

For the accuracy test we are going to use a dataset with 5171 faces in a set of 2845 images. The dataset is called FDDB: Face Detection Data Set and Benchmark, and was created by the University of Massachusetts. A link to the dataset can be found in the [resources](#resources). 


The images have to be extracted to the path `data/test_data/FDDB/`. This folder is git-ignored, meaning that setup is required before running this Jupyter notebook. Create a new directory called `FDDB` within `data/test_data/` and extract the images there. The full image collection is huge, with over 28 000 images, and not all of them were used in the FDDB study. The complete list of the 2845 images that were used in the study is located in the file `data/test_data/fddb_paths.txt`.
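Before running the comparison, it can be useful to check that every image listed in the paths file was actually extracted. The following is a small sketch of such a check. It assumes that each line of the paths file is a path relative to the dataset folder without a file extension, which is how the comparison loop in this notebook builds its image paths. The function name `find_missing_images` is hypothetical.

```python
from pathlib import Path


def find_missing_images(paths_file, dataset_root, extension=".jpg"):
    """Return the image paths listed in paths_file that do not exist on disk."""
    missing = []
    with open(paths_file, "r") as file:
        for line in file:
            relative = line.strip()
            if not relative:
                continue  # skip empty lines
            image_path = Path(dataset_root) / (relative + extension)
            if not image_path.is_file():
                missing.append(str(image_path))
    return missing
```

Calling `find_missing_images("../data/test_data/fddb_paths.txt", "../data/test_data/FDDB")` from this notebook should return an empty list when the dataset has been set up as described.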


To test that the dataset has been correctly imported, run the following code blocks (the image shows a golfer with a golf club): 

In [None]:
# Expect 2845 images
total_amount_of_images = 2845
total_amount_of_faces = 5171
actual = 0

# Count how many lines there are in the list of all images used in the study
with open("../data/test_data/fddb_paths.txt", "r") as file:
    actual = sum(1 for line in file)

# Assert that this is the case
assert actual == total_amount_of_images

In [None]:
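# Opening the first image in the dataset 
path = "../data/test_data/FDDB/2002/07/19/big/img_65.jpg"

# Reading the image; cv2.imread returns None if the file cannot be read
img = cv2.imread(path)
assert img is not None, f"Could not read image: {path}"

# Converting from BGR (OpenCV) to RGB (matplotlib) so the colors display correctly
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Showing the image (should be no error)
pyplot.imshow(img)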

### 4. Comparing the models 

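We run the Haar Cascade, HOG and DNN detectors on every image listed in `fddb_paths.txt` and count the total number of faces that each detector reports.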

In [None]:
# Root folder of the dataset
dataset_folder = "../data/test_data/FDDB"

# Variable for counting how many images were processed 
image_count = 0

# Start benchmarking 
start_time = time.time()

# Variables for the data
haar_positives = 0
hog_positives = 0
dnn_positives = 0 

# Setup NET for DNN
modelFile = "../src/models/trained_models/res10_300x300_ssd_iter_140000_fp16.caffemodel"
configFile = "../src/models/trained_models/deploy.prototxt"
net = cv2.dnn.readNetFromCaffe(configFile, modelFile)

# Total images for this dataset 
total_amount_of_images = 2845

# Initialize the progress bar
progress_bar = tqdm(total=total_amount_of_images, desc="Processing Images")


# Loop over all image paths listed in the paths file
with open("../data/test_data/fddb_paths.txt", "r") as file:
    for path in file: 
        # Increment the image count
        image_count += 1

        # Full path 
        image_path = "../data/test_data/FDDB/" + path.strip() + ".jpg"

        # Read image; cv2.imread returns BGR, which is what the detect functions expect
        img = cv2.imread(image_path) 
        if img is None:
            raise FileNotFoundError(f"Could not read image: {image_path}")

        # Test Haar Method 
        faces_haar = detect_face_haar(img, True)
        if faces_haar is not None and len(faces_haar) > 0:
            haar_positives += len(faces_haar) 

        # Test HOG Method 
        faces_hog = detect_face_hog(img, True)
        if faces_hog is not None and len(faces_hog) > 0:
            hog_positives += len(faces_hog)

        # Test DNN Method (request all faces so that len() counts faces, not box coordinates)
        faces_dnn = detect_face_dnn(img, net, detectMultipleFaces=True)
        if faces_dnn is not None and len(faces_dnn) > 0:
            dnn_positives += len(faces_dnn)
        
        # Update the progress bar
        progress_bar.update(1)

# Stop the progress bar
progress_bar.close()

# End benchmarking time
end_time = time.time()
seconds_passed = int(round(end_time - start_time))

# Avoid shadowing the built-in min() function
minutes, seconds = divmod(seconds_passed, 60)


# Print metrics of process 
print(f"Images processed: {image_count}")
print(f"Time elapsed:   {minutes} min, {seconds}s")

Calculating the result **R** as a percentage (rounded to two decimals): 
$$
  R = \frac{\text{Amount of detected faces}}{\text{Total amount of faces}} \times 100
$$

In [None]:
# Calculating the percentage of each individual method 
haar_res = round((haar_positives / total_amount_of_faces) * 100, 2)
hog_res = round((hog_positives / total_amount_of_faces) * 100, 2)
dnn_res = round((dnn_positives / total_amount_of_faces) * 100, 2)

# Printing the result
print(f"HAAR: {haar_res}% of faces detected ")
print(f"HOG: {hog_res}% of faces detected ")
print(f"DNN: {dnn_res}% of faces detected ")

Using `matplotlib` to plot the data in a horizontal bar graph: 

In [None]:
# Setting up labels and values to plot (x, y axis labels)
methods = ["HAAR", "HOG", "DNN"]
percentages = [haar_res, hog_res, dnn_res]

# Creating the horizontal bar chart
pyplot.figure(figsize=(10, 6))
bars = pyplot.barh(methods, percentages, color=["blue", "orange", "green"])

# Adding the text labels on the bars
for bar in bars:
    # Place the label just to the right of the bar
    width = bar.get_width()
    label_x_pos = width + 1 

    # Writing the percentage next to the bar
    pyplot.text(label_x_pos, bar.get_y() + bar.get_height()/2, f"{width}%", va="center")

# Setting the labels and title
pyplot.xlabel(f"Percentage of Faces Detected\n\nFaces labeled = {total_amount_of_faces}, Images = {total_amount_of_images}")
pyplot.title("Comparing Face Detection Methods (Accuracy)")

# Adjust the x-axis limits if you want more space for text
pyplot.xlim(0, max(percentages) + 10)  # Adding 10 for a bit of extra space on the right

# Saving the figure to a set path 
pyplot.savefig("../data/results/compare_face_detection_model_accuracy.png")

# Showing the figure
pyplot.show()

### Resources and Credits: 

Metrics for face detection model comparison: <br>
https://learnopencv.com/what-is-face-detection-the-ultimate-guide/#Metrics-used-for-Face-Detection <br>

The FDDB dataset used for benchmarking: <br>
http://vis-www.cs.umass.edu/fddb/ 


Check out the GitHub repository here: [GitHub Repository](https://github.com/RIT-NTNU-Bachelor/OpenCV_Server)

**Created by:** Kjetil Indrehus, Sander Hauge and Martin Johannessen
