# Case Study: Compare Tracking

This case study compares how well different face detection methods track a user in a standard video. 
Each method draws its bounding boxes or landmarks on the video frames, and the annotated videos are saved so that the tracking can be evaluated manually.

The face detection models compared will be: 
- Haar Cascade
- HOG
- DNN
- MMOD

These detection models will also be compared against the following library: 
- CVZone

(**Note:** MMOD is also tested here to evaluate whether it has potential, even though it is too computationally expensive.)

## 1. Import dependencies 

Run the next code block to import the detection functions from the `models` module, along with the other dependencies.

In [1]:
# Import packages needed 
import cv2
import dlib
from cvzone.FaceMeshModule import FaceMeshDetector
import numpy as np
import glob
import os

# Import package for changing the path
import sys 
sys.path.append("../")

# Importing the functions for the face detection models 
from models.code.dnn import detect_face_dnn
from models.code.haar import detect_face_haar
from models.code.hog import detect_face_hog
from models.code.cvzone import detect_face_cvzone
from models.code.mmod import detect_face_mmod

## 2. The testing video 

For creating the tracking videos, any simple video of a user moving will do. This case study uses a video from the Intel IoT Developer Kit (see [resources](#resources)). The video is about 2 minutes long and shows a man moving his head around. 

Run the code block below to set the path variables and the color of the bounding boxes and landmarks.

In [59]:
# Path to the input video and the directory for the output videos
path_to_video = "./datasets/tracking/tracking_video.mp4"
output_dir = "./results/tracking_videos/"

# Also set the desired color to be used
COLOR = (0, 255, 0)
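
The cell above only sets the paths. A minimal sketch of an existence check is shown here, assuming a hypothetical helper named `require_file` that is not part of the `models` module. It fails early with a clear message instead of letting the video capture return no frames later on.

```python
import os


def require_file(path):
    """Return `path` unchanged if it points to an existing file, else raise.

    Hypothetical helper for illustration only.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Expected a file at: {path}")
    return path
```

It could be used as `path_to_video = require_file("./datasets/tracking/tracking_video.mp4")`, and the same check would also fit the model files loaded in the next section.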

## 3. Tracking the face 

Each face detection method needs its own detector object or model. The same detectors are also used in the [OpenCV Server](https://github.com/RIT-NTNU-Bachelor/OpenCV_Server/blob/main/tests/test_utils.py). Run this code block to load them: 

In [60]:
# Load models 
HAAR_CLASSIFIER = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
HOG_DETECTOR = dlib.get_frontal_face_detector()
MMOD_DETECTOR = dlib.cnn_face_detection_model_v1("./models/trained_models/mmod_human_face_detector.dat")
CVZONE_DETECTOR = FaceMeshDetector()
DNN_CAFFE_MODEL_PATH = "./models/trained_models/res10_300x300_ssd_iter_140000_fp16.caffemodel"
DNN_CONFIG_PATH = "./models/trained_models/deploy.prototxt"
DNN_NET = cv2.dnn.readNetFromCaffe(DNN_CONFIG_PATH, DNN_CAFFE_MODEL_PATH)


The following functions draw the detection results onto the video frames. They are also used in the [unit tests for the OpenCV Server](https://github.com/RIT-NTNU-Bachelor/OpenCV_Server/blob/main/tests/test_utils.py):

In [61]:
def draw_rectangle(face_coords, image):
    """Function to draw a single rectangle.

    Args:
        face_coords (tuple): Tuple with x, y, width and height of the bounding box
        image (MatLike): Image (video frame) to draw on
    """
    x, y, width, height = face_coords
    cv2.rectangle(image, (x, y), (x + width, y + height), COLOR, 2)


def draw_landmark(landmarks, image):
    """Function to draw a landmark.
    Draws a circle in the image for each landmark in the given list of landmarks

    Args:
        landmarks (list): List of landmarks where each landmark has an x and y position
        image (MatLike): Image (video frame) to draw on
    """
    for point in landmarks:
        cv2.circle(image, (point[0], point[1]), 1, COLOR, -1)

def draw_rectangle_from_dlib(face_rectangle, img_with_box):
    """Function that draws the given dlib rectangle to the given image

    Args:
        face_rectangle (dlib.rectangle): A dlib rectangle object
        img_with_box (MatLike): Image (video frame) to draw on
    """
    cv2.rectangle(img_with_box, (face_rectangle.left(), face_rectangle.top()), (face_rectangle.right(), face_rectangle.bottom()), COLOR, 2)
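
The drawing helpers above accept two box formats: `(x, y, width, height)` tuples and dlib rectangles described by their corners. As a minimal sketch, the conversion between the two can be written with plain integers. The function names `corners_to_xywh` and `xywh_to_corners` are hypothetical, and the width is taken as `right - left` to match how `draw_rectangle` computes the opposite corner.

```python
def corners_to_xywh(left, top, right, bottom):
    """Convert corner coordinates to an (x, y, width, height) tuple."""
    return (left, top, right - left, bottom - top)


def xywh_to_corners(x, y, width, height):
    """Convert an (x, y, width, height) tuple to (left, top, right, bottom)."""
    return (x, y, x + width, y + height)


box = corners_to_xywh(40, 60, 140, 180)
print(box)                    # (40, 60, 100, 120)
print(xywh_to_corners(*box))  # (40, 60, 140, 180)
```

For a dlib rectangle `r`, the call would be `corners_to_xywh(r.left(), r.top(), r.right(), r.bottom())`.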

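The following function processes a single frame with every method. For each method it runs the detector on the frame, inspects the type of the returned result to decide which drawing function to use, and writes the annotated copy to that method's video writer. It takes a dictionary of detectors and a dictionary of writers, both keyed by method name.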

In [62]:
def process_and_write_frame(frame, detectors, writers):
    """Process frame for each detection method and write to appropriate outputs using specific drawing logic for each."""
    for method, detector in detectors.items():
        face = detector(frame)
        frame_copy = frame.copy()
        
        if face is not None:
            # Check what model it is based on the datatype and then draw the landmarks/rectangle accordingly 
            if isinstance(face, tuple) or (isinstance(face, np.ndarray) and face.ndim == 1): # Single face as a tuple or 1D numpy array 
                draw_rectangle(face, frame_copy)
            elif isinstance(face, np.ndarray) and face.ndim == 2: # Multiple faces in a 2D numpy array
                for single_face in face:
                    draw_rectangle(single_face, frame_copy)
            elif isinstance(face, dlib.rectangle):  # Single dlib rectangle => 1 face
                draw_rectangle_from_dlib(face, frame_copy)
            elif isinstance(face, dlib.rectangles):  # Multiple dlib rectangles => multiple faces
                for face_rectangle in face:
                    draw_rectangle_from_dlib(face_rectangle, frame_copy)
            elif isinstance(face, list): # DNN or CVZone
                if all(isinstance(f, tuple) and len(f) == 4 for f in face):  # List of tuples for DNN => multiple faces
                    for f in face:
                        draw_rectangle(f, frame_copy)
                elif len(face) == 468:  # CVZone => 1 face (468 landmarks)
                    draw_landmark(face, frame_copy)
                elif all(isinstance(f, list) for f in face) and all(len(f) == 468 for f in face):  # CVZone => multiple faces
                    for face_landmarks in face:
                        draw_landmark(face_landmarks, frame_copy)
                    for face_landmarks in face:
                        draw_landmark(face_landmarks, frame_copy)

        writers[method].write(frame_copy)
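
The type-based dispatch is easier to follow in isolation. The sketch below mirrors only the tuple and list branches and returns a label instead of drawing. The name `classify_detection` and the labels are hypothetical, the NumPy and dlib branches are left out so that the example runs without those packages, and each landmark is assumed to be a two-element list.

```python
def classify_detection(result):
    """Label a detector result by its shape (tuple and list branches only)."""
    if result is None:
        return "none"
    if isinstance(result, tuple):
        return "single_box"
    if isinstance(result, list):
        # Same order of checks as in process_and_write_frame
        if all(isinstance(f, tuple) and len(f) == 4 for f in result):
            return "multiple_boxes"
        if len(result) == 468:
            return "single_mesh"
        if all(isinstance(f, list) and len(f) == 468 for f in result):
            return "multiple_meshes"
    return "unknown"


mesh = [[i, i] for i in range(468)]  # made-up landmark positions
print(classify_detection((10, 20, 30, 40)))    # single_box
print(classify_detection([(10, 20, 30, 40)]))  # multiple_boxes
print(classify_detection(mesh))                # single_mesh
print(classify_detection([mesh, mesh]))        # multiple_meshes
```

An empty list falls into the `multiple_boxes` branch because `all()` over an empty sequence is `True`. In the drawing function this is harmless, since the loop then draws nothing.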

The next codeblock processes the video and creates new videos for each method: 

In [63]:
# Main processing setup
cap = cv2.VideoCapture(path_to_video)
width, height = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS)

# Make sure the output directory exists, since cv2.VideoWriter does not create it
os.makedirs(output_dir, exist_ok=True)

# Prepare the output video writers for each detection method
methods = ["haar", "dnn", "hog", "cvzone", "mmod"]
writers = {method: cv2.VideoWriter(f"{output_dir}{method}.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height)) for method in methods}

# Detection function for each method
detectors = {
    "haar": lambda img: detect_face_haar(img, HAAR_CLASSIFIER, detect_multiple_faces=False),
    "dnn": lambda img: detect_face_dnn(img, DNN_NET, detect_multiple_faces=False),
    "hog": lambda img: detect_face_hog(img, HOG_DETECTOR, detect_multiple_faces=False),
    "cvzone": lambda img: detect_face_cvzone(img, CVZONE_DETECTOR, detect_multiple_faces=False),
    "mmod": lambda img: detect_face_mmod(img, MMOD_DETECTOR, detect_multiple_faces=False)
}

# Processing loop
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    
    # Process frame and write to multiple outputs
    process_and_write_frame(frame, detectors, writers)

# Clean up
cap.release()
for writer in writers.values():
    writer.release()
cv2.destroyAllWindows()
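
If a detector raises an exception partway through the video, the clean-up lines above are never reached and the output files may be left unfinished. One possible way to guard against this is a small context manager that calls `release()` on every resource on exit. This is a sketch only: `released` and `FakeResource` are hypothetical names, and `FakeResource` stands in for `cv2.VideoCapture` and `cv2.VideoWriter`, which both provide a `release()` method.

```python
from contextlib import contextmanager


@contextmanager
def released(*resources):
    """Yield the resources and call .release() on each one on exit, even on error."""
    try:
        yield resources
    finally:
        for resource in resources:
            resource.release()


class FakeResource:
    """Hypothetical stand-in for a capture or writer object."""

    def __init__(self):
        self.is_released = False

    def release(self):
        self.is_released = True


cap, writer = FakeResource(), FakeResource()
try:
    with released(cap, writer):
        raise RuntimeError("detector failed mid-video")
except RuntimeError:
    pass

print(cap.is_released, writer.is_released)  # True True
```

In the notebook, the processing loop would sit inside `with released(cap, *writers.values()):`.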


Verify that the videos have been created. There should be a total of 5: 

In [66]:
output_count = 0

for filename in glob.iglob(f'{output_dir}/**', recursive=True):
    if os.path.isfile(filename) and filename.endswith(".mp4"):
        print(f"[INFO] Video file found: {filename}")
        output_count += 1


print(f"Total video files created: {output_count}")
assert(output_count == 5)

[INFO] Video file found: ./results/tracking_videos/hog.mp4
[INFO] Video file found: ./results/tracking_videos/haar.mp4
[INFO] Video file found: ./results/tracking_videos/dnn.mp4
[INFO] Video file found: ./results/tracking_videos/cvzone.mp4
[INFO] Video file found: ./results/tracking_videos/mmod.mp4
Total video files created: 5
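
Counting files does not show whether each file actually contains data, and `cv2.VideoWriter` can fail to open its output without raising an exception. As a complementary sketch, the check below reports methods whose output file is missing or empty. The helper name `find_empty_videos` is hypothetical, and the file naming `<method>.mp4` is assumed from the writer setup above.

```python
import os


def find_empty_videos(directory, methods, extension=".mp4"):
    """Return the methods whose output file is missing or has zero bytes."""
    problems = []
    for method in methods:
        path = os.path.join(directory, f"{method}{extension}")
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            problems.append(method)
    return problems
```

Calling `find_empty_videos(output_dir, methods)` after processing should return an empty list when every method produced a non-empty file. A non-empty file can still be unplayable, so the manual review remains the actual evaluation.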


After all videos have been created, they can be reviewed manually to check each algorithm's ability to track the movement of the user's head.

## Resources

Intel IoT Developer Kit Sample Videos: <br>
https://github.com/intel-iot-devkit/sample-videos?tab=readme-ov-file#samples-videos 

Check out the GitHub repository here: [GitHub Repository](https://github.com/RIT-NTNU-Bachelor/case-study)

**Created by:** Kjetil Indrehus, Sander Hauge and Martin Johannessen