# Title: AIDI 1002 Final Term Project Report

#### Members' Names or Individual's Name:
<ol><li>Aditya Dube</li>
<li>Shimoni Mistry</li></ol>
                                 

#### Emails:
<ol><li>200530940@student.georgianc.on.ca</li>
<li>200523189@student.georgianc.on.ca</li></ol>
    

# Introduction: 
BlazeFace is a real-time face detection model developed by Google. It uses a lightweight architecture to achieve fast inference on mobile devices and low-power computers.

#### Problem Description:

The problem addressed by BlazeFace is real-time face detection with high accuracy while using minimal computational resources.


#### Context of the Problem:

Real-time face detection is crucial in many applications such as video conferencing, security systems, and augmented reality. However, existing face detection models may not be optimized for mobile devices or low-power computers, which makes real-time face detection on such hardware challenging.

#### Limitations of Other Approaches:

Prior approaches to real-time face detection have been limited by either their computational complexity or their accuracy. Traditional methods based on sliding windows require significant computational resources and may not work well under varying lighting and pose conditions.

#### Solution:

BlazeFace addresses these limitations by using a lightweight architecture based on the Single Shot Detector (SSD) architecture and a modified version of the MobileNetV1 architecture for feature extraction. This enables fast and accurate face detection on mobile devices and low-power computers with minimal computational resources.

# Background


| Reference | Explanation | Dataset/Input | Weakness |
| --- | --- | --- | --- |
| Tom et al. [1] | They trained a BERT-based transformer to predict answers to a question from a passage | SQuAD dataset for QA | Only 80% accuracy |
| George et al. [2] | They trained an attention-based sequence-to-sequence model using LSTM to predict answers to a question from a passage | SQuAD v2 dataset for QA | High accuracy but poor on unknown answers |
| Weiss et al. [3] | They proposed a real-time object detection model using YOLOv2 and achieved high accuracy | PASCAL VOC image dataset for object detection | Not optimized for mobile devices |
| BlazeFace (discussed in this paper) | A real-time face detection model using a lightweight architecture based on SSD and a modified version of MobileNetV1 for feature extraction | WIDER FACE and AFW datasets for face detection | Requires large training data for high accuracy. Future work can involve further optimization for real-time performance on low-power devices. |




# Methodology

The BlazeFace model is a lightweight real-time face detection model that uses a modified version of the MobileNetV1 architecture for feature extraction and the Single Shot Detector (SSD) architecture for object detection.

The model consists of a feature extraction network followed by two parallel branches that predict the location and class of face bounding boxes. The feature extraction network is composed of a series of depthwise separable convolutions, which significantly reduces the computational cost of the model.
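To make the cost saving concrete, the following minimal sketch compares the number of weights in a standard convolution with a depthwise separable one. The layer sizes (3x3 kernel, 64 input channels, 128 output channels) are illustrative assumptions and are not taken from the BlazeFace architecture; the function names are ours.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Number of weights in a standard convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one filter per input channel
    pointwise = in_channels * out_channels               # 1x1 convolution mixes channels
    return depthwise + pointwise


# Illustrative layer sizes (an assumption, not BlazeFace's actual layers)
standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
print(standard, separable, round(standard / separable, 1))  # 73728 8768 8.4
```

For this example layer, the separable version needs roughly 8 times fewer weights, and the number of multiply-accumulate operations shrinks by the same factor.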

The first branch predicts the location of the face bounding box by regressing the coordinates of the top-left and bottom-right corners of the box. The second branch predicts the probability that the detected object is a face.

During training, the model is optimized using a combination of smooth L1 loss for bounding box regression and focal loss for classification. The model is trained on large-scale face detection datasets, such as the WIDER FACE and AFW datasets.
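The two loss terms can be sketched for a single prediction as follows. This is a minimal illustration of the standard definitions, not the training code of BlazeFace; the defaults `beta=1.0`, `alpha=0.25`, and `gamma=2.0` are common choices that we assume here, and the function names are ours.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss for one box coordinate: quadratic near zero, linear elsewhere."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def focal_loss(p, is_face, alpha=0.25, gamma=2.0):
    """Focal loss for one anchor; p is the predicted face probability in (0, 1)."""
    p_t = p if is_face else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if is_face else 1.0 - alpha  # class-balancing weight
    # The (1 - p_t) ** gamma factor down-weights examples that are already easy
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


print(smooth_l1(0.5, 0.0))  # 0.125
print(smooth_l1(3.0, 1.0))  # 1.5
print(focal_loss(0.9, True) < focal_loss(0.6, True))  # True
```

A confident correct prediction (0.9) contributes much less to the classification loss than an uncertain one (0.6), which is what lets training focus on the hard anchors.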

BlazeFace achieves state-of-the-art accuracy on the WIDER FACE and AFW benchmarks while using minimal computational resources. The model is designed to run efficiently on mobile devices, with a small model size and low computational requirements.

In the next section, we implement the BlazeFace model for real-time face detection in images and videos. We also explore ways to optimize the model for even faster inference on low-power devices.

# Implementation


In [1]:
import numpy as np

In [2]:
anchor_options = {
    "num_layers": 4,
    "min_scale": 0.1484375,
    "max_scale": 0.75,
    "input_size_height": 128,
    "input_size_width": 128,
    "anchor_offset_x": 0.5,
    "anchor_offset_y": 0.5,
    "strides": [8, 16, 16, 16],
    "aspect_ratios": [1.0],
    "reduce_boxes_in_lowest_layer": False,
    "interpolated_scale_aspect_ratio": 1.0,
    "fixed_anchor_size": True,
}

In [3]:
anchor_back_options = {
    "num_layers": 4,
    "min_scale": 0.15625,
    "max_scale": 0.75,
    "input_size_height": 256,
    "input_size_width": 256,
    "anchor_offset_x": 0.5,
    "anchor_offset_y": 0.5,
    "strides": [16, 32, 32, 32],
    "aspect_ratios": [1.0],
    "reduce_boxes_in_lowest_layer": False,
    "interpolated_scale_aspect_ratio": 1.0,
    "fixed_anchor_size": True,
}


In [8]:
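def calculate_scale(min_scale, max_scale, stride_index, num_strides):
    # With a single layer there is nothing to interpolate, so use the midpoint.
    # This also avoids a division by zero in the formula below.
    if num_strides == 1:
        return (min_scale + max_scale) * 0.5
    # Linearly interpolate between min_scale and max_scale across the layers.
    return min_scale + (max_scale - min_scale) * stride_index / (num_strides - 1.0)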


def generate_anchors(options):
    strides_size = len(options["strides"])
    assert options["num_layers"] == strides_size

    anchors = []
    layer_id = 0
    while layer_id < strides_size:
        anchor_height = []
        anchor_width = []
        aspect_ratios = []
        scales = []

        # For same strides, we merge the anchors in the same order.
        last_same_stride_layer = layer_id
        while (last_same_stride_layer < strides_size) and \
              (options["strides"][last_same_stride_layer] == options["strides"][layer_id]):
            scale = calculate_scale(options["min_scale"],
                                    options["max_scale"],
                                    last_same_stride_layer,
                                    strides_size)

            if last_same_stride_layer == 0 and options["reduce_boxes_in_lowest_layer"]:
                # For first layer, it can be specified to use predefined anchors.
                aspect_ratios.append(1.0)
                aspect_ratios.append(2.0)
                aspect_ratios.append(0.5)
                scales.append(0.1)
                scales.append(scale)
                scales.append(scale)                
            else:
                for aspect_ratio in options["aspect_ratios"]:
                    aspect_ratios.append(aspect_ratio)
                    scales.append(scale)

                if options["interpolated_scale_aspect_ratio"] > 0.0:
                    scale_next = 1.0 if last_same_stride_layer == strides_size - 1 \
                                     else calculate_scale(options["min_scale"],
                                                          options["max_scale"],
                                                          last_same_stride_layer + 1,
                                                          strides_size)
                    scales.append(np.sqrt(scale * scale_next))
                    aspect_ratios.append(options["interpolated_scale_aspect_ratio"])

            last_same_stride_layer += 1

        for i in range(len(aspect_ratios)):
            ratio_sqrts = np.sqrt(aspect_ratios[i])
            anchor_height.append(scales[i] / ratio_sqrts)
            anchor_width.append(scales[i] * ratio_sqrts)            
            
        stride = options["strides"][layer_id]
        feature_map_height = int(np.ceil(options["input_size_height"] / stride))
        feature_map_width = int(np.ceil(options["input_size_width"] / stride))

        for y in range(feature_map_height):
            for x in range(feature_map_width):
                for anchor_id in range(len(anchor_height)):
                    x_center = (x + options["anchor_offset_x"]) / feature_map_width
                    y_center = (y + options["anchor_offset_y"]) / feature_map_height

                    new_anchor = [x_center, y_center, 0, 0]
                    if options["fixed_anchor_size"]:
                        new_anchor[2] = 1.0
                        new_anchor[3] = 1.0
                    else:
                        new_anchor[2] = anchor_width[anchor_id]
                        new_anchor[3] = anchor_height[anchor_id]
                    anchors.append(new_anchor)

        layer_id = last_same_stride_layer

    return anchors

In [10]:
anchors = generate_anchors(anchor_options)

assert len(anchors) == 896

anchors_back = generate_anchors(anchor_back_options)

assert len(anchors_back) == 896
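The value 896 in the assertions can be checked by hand. With one aspect ratio plus one interpolated scale, each layer contributes 2 anchors per feature-map cell, and the feature map has `ceil(input_size / stride)` cells per side. The helper below is a small sketch of that arithmetic (the name `count_anchors` is ours and it only covers square inputs with these settings):

```python
import math


def count_anchors(input_size, strides, anchors_per_layer=2):
    """Count anchors for a square input when every layer adds the same number per cell."""
    total = 0
    for stride in strides:
        cells_per_side = math.ceil(input_size / stride)
        total += cells_per_side * cells_per_side * anchors_per_layer
    return total


# Front model: 16*16*2 + 3 * (8*8*2) = 512 + 384
print(count_anchors(128, [8, 16, 16, 16]))   # 896
# Back model: 16*16*2 + 3 * (8*8*2) = 512 + 384
print(count_anchors(256, [16, 32, 32, 32]))  # 896
```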


In [11]:
anchors[:10]

[[0.03125, 0.03125, 1.0, 1.0],
 [0.03125, 0.03125, 1.0, 1.0],
 [0.09375, 0.03125, 1.0, 1.0],
 [0.09375, 0.03125, 1.0, 1.0],
 [0.15625, 0.03125, 1.0, 1.0],
 [0.15625, 0.03125, 1.0, 1.0],
 [0.21875, 0.03125, 1.0, 1.0],
 [0.21875, 0.03125, 1.0, 1.0],
 [0.28125, 0.03125, 1.0, 1.0],
 [0.28125, 0.03125, 1.0, 1.0]]

In [12]:
anchors_back[:10]

[[0.03125, 0.03125, 1.0, 1.0],
 [0.03125, 0.03125, 1.0, 1.0],
 [0.09375, 0.03125, 1.0, 1.0],
 [0.09375, 0.03125, 1.0, 1.0],
 [0.15625, 0.03125, 1.0, 1.0],
 [0.15625, 0.03125, 1.0, 1.0],
 [0.21875, 0.03125, 1.0, 1.0],
 [0.21875, 0.03125, 1.0, 1.0],
 [0.28125, 0.03125, 1.0, 1.0],
 [0.28125, 0.03125, 1.0, 1.0]]

In [14]:
anchor_options_test = {
    "num_layers": 5,
    "min_scale": 0.1171875,
    "max_scale": 0.75,
    "input_size_height": 256,
    "input_size_width": 256,
    "anchor_offset_x": 0.5,
    "anchor_offset_y": 0.5,
    "strides": [8, 16, 32, 32, 32],
    "aspect_ratios": [1.0],
    "reduce_boxes_in_lowest_layer": False,
    "interpolated_scale_aspect_ratio": 1.0,
    "fixed_anchor_size": True,
}

anchors_test = generate_anchors(anchor_options_test)
anchors_golden = np.loadtxt("anchor_golden_file_0.txt")

assert len(anchors_test) == len(anchors_golden)
print("Number of errors:", (np.abs(anchors_test - anchors_golden) > 1e-5).sum())

Number of errors: 0


In [15]:
anchor_options_test = {
    "num_layers": 6,
    "min_scale": 0.2,
    "max_scale": 0.95,
    "input_size_height": 300,
    "input_size_width": 300,
    "anchor_offset_x": 0.5,
    "anchor_offset_y": 0.5,
    "strides": [16, 32, 64, 128, 256, 512],
    "aspect_ratios": [1.0, 2.0, 0.5, 3.0, 0.3333],
    "reduce_boxes_in_lowest_layer": True,
    "interpolated_scale_aspect_ratio": 1.0,
    "fixed_anchor_size": False,
}

anchors_test = generate_anchors(anchor_options_test)
anchors_golden = np.loadtxt("anchor_golden_file_1.txt")

assert len(anchors_test) == len(anchors_golden)
print("Number of errors:", (np.abs(anchors_test - anchors_golden) > 1e-5).sum())

Number of errors: 0


In [16]:
np.save("anchors.npy", anchors)
np.save("anchorsback.npy", anchors_back)

In [17]:
# The repository has no setup.py or pyproject.toml, so it cannot be installed with pip.
# Clone it instead and add the checkout to the import path so that blazeface.py can be imported.
!git clone https://github.com/hollance/BlazeFace-PyTorch.git

import sys
sys.path.append("BlazeFace-PyTorch")


In [None]:
import numpy as np
import torch
import cv2

In [None]:
print("PyTorch version:", torch.__version__)
print("CUDA version:", torch.version.cuda)
print("cuDNN version:", torch.backends.cudnn.version())

In [None]:
gpu = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
gpu

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def plot_detections(img, detections, with_keypoints=True):
    fig, ax = plt.subplots(1, figsize=(10, 10))
    ax.grid(False)
    ax.imshow(img)
    
    if isinstance(detections, torch.Tensor):
        detections = detections.cpu().numpy()

    if detections.ndim == 1:
        detections = np.expand_dims(detections, axis=0)

    print("Found %d faces" % detections.shape[0])
        
    for i in range(detections.shape[0]):
        ymin = detections[i, 0] * img.shape[0]
        xmin = detections[i, 1] * img.shape[1]
        ymax = detections[i, 2] * img.shape[0]
        xmax = detections[i, 3] * img.shape[1]

        rect = patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
                                 linewidth=1, edgecolor="r", facecolor="none", 
                                 alpha=detections[i, 16])
        ax.add_patch(rect)

        if with_keypoints:
            for k in range(6):
                kp_x = detections[i, 4 + k*2    ] * img.shape[1]
                kp_y = detections[i, 4 + k*2 + 1] * img.shape[0]
                circle = patches.Circle((kp_x, kp_y), radius=0.5, linewidth=1, 
                                        edgecolor="lightskyblue", facecolor="none", 
                                        alpha=detections[i, 16])
                ax.add_patch(circle)
        
    plt.show()

In [None]:
import cv2
import numpy as np
from blazeface import BlazeFace


In [None]:
# Load the BlazeFace model
face_detector = BlazeFace()
face_detector.load_weights("blazeface.pth")
face_detector.load_anchors("anchors.npy")

# Set the video source
video_path = "path/to/your/video.mp4"
cap = cv2.VideoCapture(video_path)


In [None]:
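while True:
    # Read the next frame from the video
    ret, frame = cap.read()
    if not ret:
        break

    # The front model expects a 128x128 RGB image, so convert and resize a copy
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    small = cv2.resize(rgb, (128, 128))

    # Detect faces. Each row starts with [ymin, xmin, ymax, xmax], normalized
    # to [0, 1] (the same layout that plot_detections reads)
    faces = face_detector.predict_on_image(small)

    # Draw bounding boxes on the original frame, scaled to its size
    height, width = frame.shape[:2]
    for face in faces:
        ymin, xmin, ymax, xmax = (float(v) for v in face[:4])
        cv2.rectangle(frame,
                      (int(xmin * width), int(ymin * height)),
                      (int(xmax * width), int(ymax * height)),
                      (0, 255, 0), 2)

    # Show the frame (cv2.imshow expects BGR, so display the original frame)
    cv2.imshow("Frame", frame)

    # Exit the loop if 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break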


In [None]:
from blazeface import BlazeFace

front_net = BlazeFace().to(gpu)
front_net.load_weights("blazeface.pth")
front_net.load_anchors("anchors.npy")
back_net = BlazeFace(back_model=True).to(gpu)
back_net.load_weights("blazefaceback.pth")
back_net.load_anchors("anchorsback.npy")

# Optionally change the thresholds:
front_net.min_score_thresh = 0.75
front_net.min_suppression_threshold = 0.3
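To illustrate what the two thresholds control, here is a simplified sketch of score filtering followed by greedy non-maximum suppression. It is an assumption-laden illustration, not the library's implementation: the detector may combine overlapping boxes differently (for example by blending them), and the names `iou` and `filter_detections` are ours.

```python
def iou(box_a, box_b):
    """Intersection over union of two [ymin, xmin, ymax, xmax] boxes."""
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, scores, min_score=0.75, max_overlap=0.3):
    """Return indices of boxes kept after score filtering and greedy suppression."""
    # Keep only confident candidates, highest score first
    candidates = sorted(
        (i for i, score in enumerate(scores) if score >= min_score),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept = []
    for i in candidates:
        # Drop a box if it overlaps too much with a higher-scoring kept box
        if all(iou(boxes[i], boxes[j]) <= max_overlap for j in kept):
            kept.append(i)
    return kept


boxes = [[0.10, 0.10, 0.50, 0.50],   # face A
         [0.12, 0.12, 0.52, 0.52],   # near-duplicate of face A
         [0.60, 0.60, 0.90, 0.90],   # face B
         [0.00, 0.00, 0.20, 0.20]]   # low-confidence candidate
scores = [0.90, 0.80, 0.95, 0.50]
print(filter_detections(boxes, scores))  # [2, 0]
```

Raising the score threshold removes weak candidates earlier, while lowering the overlap threshold merges nearby boxes more aggressively.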

In [None]:
img = cv2.imread("1face.jpg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

In [None]:
front_detections = front_net.predict_on_image(img)
front_detections.shape

In [None]:
front_detections

In [None]:
plot_detections(img, front_detections)

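In [None]:
# The back model expects a 256x256 input, so resize the image before predicting
img2 = cv2.resize(img, (256, 256))
back_detections = back_net.predict_on_image(img2)
back_detections.shape

In [None]:
back_detections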

In [None]:
plot_detections(img2, back_detections)

In [None]:
filenames = [ "1face.jpg", "3faces.png", "4faces.png" ]

xfront = np.zeros((len(filenames), 128, 128, 3), dtype=np.uint8)
xback = np.zeros((len(filenames), 256, 256, 3), dtype=np.uint8)

for i, filename in enumerate(filenames):
    img = cv2.imread(filename)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    xfront[i] = img
    xback[i] = cv2.resize(img, (256, 256))

In [None]:
front_detections = front_net.predict_on_batch(xfront)
[d.shape for d in front_detections]

In [None]:
front_detections

In [None]:
plot_detections(xfront[0], front_detections[0])

In [None]:
plot_detections(xfront[1], front_detections[1])

In [None]:
plot_detections(xfront[2], front_detections[2])

In [None]:
back_detections = back_net.predict_on_batch(xback)
[d.shape for d in back_detections]

In [None]:
back_detections

In [None]:
plot_detections(xback[0], back_detections[0])

In [None]:
plot_detections(xback[1], back_detections[1])

In [None]:
plot_detections(xback[2], back_detections[2])


## Added face detection in video

The video is read frame by frame. Each frame is converted to RGB and passed to the predict_on_image() method of the BlazeFace object. If a face is detected, a rectangle is drawn around it. The loop exits when the 'q' key is pressed.
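Because the detections store box corners as normalized `[ymin, xmin, ymax, xmax]` values (the layout read by `plot_detections` above), they have to be scaled to the frame size before drawing. A minimal sketch of that step is shown below; `to_pixel_box` is a hypothetical helper name, and clamping to the frame borders is our own choice.

```python
def to_pixel_box(detection, frame_width, frame_height):
    """Convert normalized [ymin, xmin, ymax, xmax, ...] to pixel (left, top, right, bottom)."""
    ymin, xmin, ymax, xmax = (float(v) for v in detection[:4])

    def clamp(value, upper):
        # Keep the coordinate inside the frame
        return max(0, min(upper - 1, int(round(value))))

    left = clamp(xmin * frame_width, frame_width)
    top = clamp(ymin * frame_height, frame_height)
    right = clamp(xmax * frame_width, frame_width)
    bottom = clamp(ymax * frame_height, frame_height)
    return left, top, right, bottom


# A detection row has 17 values; only the first four are needed for the box
example = [0.25, 0.5, 0.75, 1.0] + [0.0] * 13
print(to_pixel_box(example, 1280, 720))  # (640, 180, 1279, 540)
```

The resulting tuple can be passed to `cv2.rectangle` as the top-left and bottom-right corners.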

In [None]:
cap.release()
cv2.destroyAllWindows()


In [None]:
import cv2
from blazeface import BlazeFace

# Load the BlazeFace detector together with its weights and anchors
detector = BlazeFace()
detector.load_weights("blazeface.pth")
detector.load_anchors("anchors.npy")

# Open the video file
cap = cv2.VideoCapture("video1.avi")

# Check if the video was opened successfully
if not cap.isOpened():
    raise RuntimeError("Error opening video stream")

# Loop over the frames of the video
while True:
    # Read the next frame
    ret, frame = cap.read()

    # Stop when no more frames can be read
    if not ret:
        break

    # The front model expects a 128x128 RGB image
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    small = cv2.resize(rgb, (128, 128))

    # Each detection row starts with [ymin, xmin, ymax, xmax], normalized to [0, 1]
    faces = detector.predict_on_image(small)

    # Draw a rectangle around each detected face on the original frame
    height, width = frame.shape[:2]
    for face in faces:
        ymin, xmin, ymax, xmax = (float(v) for v in face[:4])
        cv2.rectangle(frame,
                      (int(xmin * width), int(ymin * height)),
                      (int(xmax * width), int(ymax * height)),
                      (0, 255, 0), 2)

    # Display the resulting frame
    cv2.imshow("Frame", frame)

    # Exit the loop if the 'q' key is pressed
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

# Release the video capture object and destroy all windows
cap.release()
cv2.destroyAllWindows()


# Conclusion and Future Direction

We added face detection in video and obtained successful results for some video formats and sizes (MP4 and AVI worked most reliably).
We also changed the input image size to 256x256 px in the blazeface.py file. This gave better results, but processing time and resource usage increased, and the model could no longer detect faces within milliseconds as before.
<br><br>
We also tried implementing the MediaPipe Holistic model to detect not only the face but also the pose, the left and right hands, and their landmarks. This was not successful: its accuracy was lower than expected and it crashed the notebook a couple of times, so we removed it.

During this project, we learned how to use BlazeFace for face detection in images and videos using Python and OpenCV. We also learned how to fine-tune the detection results by adjusting the confidence threshold and the non-maximum suppression overlap threshold.
 <br>The results showed that BlazeFace is a robust and efficient face detection model, capable of detecting multiple faces in real-time videos with high accuracy. However, the model is limited to detecting only frontal faces and may not perform well under challenging conditions, such as occlusion, extreme lighting, and blurry images.

In the future, the limitations of BlazeFace can be addressed by incorporating more complex and powerful deep learning models or by combining multiple models for better performance. Additionally, further research can be done on adapting face detection models for non-frontal views and other facial features such as facial expressions and emotions.

# References:

[1]: Valentin Bazarevsky, Yury Kartynnik, Andrey Vakunov, Karthik Raveendran, and Matthias Grundmann. "BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs." arXiv:1907.05047, 2019.
