# Merging the two previously built models
This notebook brings together the two models from the previous notebooks. Both are YOLO models: the first is an object-detection model trained on the coco8 dataset, and the second is a face-detection model pre-trained and published on Hugging Face.

Source of the face-detection model: https://huggingface.co/arnabdhar/YOLOv8-Face-Detection


**Note**: Merging here does **NOT** refer to combining the two models into one; it simply means putting both of them in the same notebook to see what we can do with them while we are still testing. (**Why?** Because we have not finalized our models yet.)

Benchmarks, i.e. the time each model takes per image, are given below together with the detection results.

In [None]:
!pip install ultralytics

In [None]:
!pip install supervision

In [1]:
from google.colab import drive
drive.mount('/content/drive')

In [7]:
# imports
import cv2
from google.colab.patches import cv2_imshow
from ultralytics import YOLO
from matplotlib import pyplot as plt
import numpy as np
import os
from tabulate import tabulate
from huggingface_hub import hf_hub_download
from supervision import Detections
from PIL import Image

In [3]:
model = YOLO('yolov8n.pt') # load a pre-trained model
model.info()

Downloading https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8n.pt to 'yolov8n.pt'...


100%|██████████| 6.23M/6.23M [00:00<00:00, 57.3MB/s]


YOLOv8n summary: 225 layers, 3157200 parameters, 0 gradients, 8.9 GFLOPs


(225, 3157200, 0, 8.8575488)

In [None]:
results = model.train(data='coco8.yaml', epochs=100, imgsz=640)

In [8]:
# download model
model_path = hf_hub_download(repo_id="arnabdhar/YOLOv8-Face-Detection", filename="model.pt")

# load model
face_model = YOLO(model_path)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [9]:
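def display_side_by_side(image1, image2):
    """
    Display two images side by side in Colab.

    Parameters:
        image1 (numpy.ndarray): The first image.
        image2 (numpy.ndarray): The second image.

    Note: cv2.hconcat requires both images to have the same height and type.
    """
    concatenated_image = cv2.hconcat([image1, image2])

    # cv2_imshow renders inline in Colab, so cv2.waitKey/cv2.destroyAllWindows
    # (meant for desktop GUI windows) are not needed here.
    cv2_imshow(concatenated_image)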
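`cv2.hconcat` only works when both images have the same height and type. The saved result images should have the same size as their inputs, so this holds in the commented-out display loop, but comparing two arbitrary images would fail. Below is a minimal NumPy-only sketch, assuming both images are `H x W x C` arrays with the same dtype and channel count (as `cv2.imread` returns for colour images); `pad_to_same_height` and `side_by_side` are hypothetical helpers, not part of OpenCV:

```python
import numpy as np


def pad_to_same_height(image1: np.ndarray, image2: np.ndarray, value: int = 0):
    """Pad the shorter image at the bottom so both share the same height.

    Assumes both images have the same dtype and the same number of channels.
    """
    target = max(image1.shape[0], image2.shape[0])

    def pad(img: np.ndarray) -> np.ndarray:
        missing = target - img.shape[0]
        # Pad rows only (at the bottom); width and channels are untouched.
        pad_width = [(0, missing)] + [(0, 0)] * (img.ndim - 1)
        return np.pad(img, pad_width, mode="constant", constant_values=value)

    return pad(image1), pad(image2)


def side_by_side(image1: np.ndarray, image2: np.ndarray) -> np.ndarray:
    """Return one array with image1 on the left and image2 on the right."""
    left, right = pad_to_same_height(image1, image2)
    # np.hstack concatenates along axis 1 (width), like cv2.hconcat.
    return np.hstack([left, right])
```

In Colab the result could be shown with `cv2_imshow(side_by_side(cv2.imread(a), cv2.imread(b)))`. Padding rather than resizing keeps the pixels of both images unchanged, so the bounding boxes stay exactly where the model drew them.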



In [10]:
%cd /content/drive/MyDrive/images
!ls

/content/drive/MyDrive/images
001.png  002.jpg  003.jpeg  004.png


In [20]:
paths = ["001.png",  "002.jpg",  "003.jpeg", "004.png"]
image_results = []
saved_paths = []

for path in paths:
    filename, extension = os.path.splitext(path)
    save_path = f"{filename}_result{extension}"

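    # model(path) returns a list of Results (one per image); take the only one.
    curr_result = model(path)[0]
    # Save the annotated image with all boxes. Indexing curr_result[0] would
    # keep only the first box and raises IndexError when nothing is detected.
    curr_result.save(save_path)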

    image_results.append(curr_result)
    saved_paths.append(save_path)



image 1/1 /content/drive/MyDrive/images/001.png: 384x640 2 persons, 1 car, 172.4ms
Speed: 2.7ms preprocess, 172.4ms inference, 1.4ms postprocess per image at shape (1, 3, 384, 640)

image 1/1 /content/drive/MyDrive/images/002.jpg: 416x640 24 persons, 1 tie, 162.4ms
Speed: 3.4ms preprocess, 162.4ms inference, 2.0ms postprocess per image at shape (1, 3, 416, 640)

image 1/1 /content/drive/MyDrive/images/003.jpeg: 640x640 1 bear, 252.0ms
Speed: 3.9ms preprocess, 252.0ms inference, 1.7ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/drive/MyDrive/images/004.png: 384x640 1 bottle, 2 wine glasss, 1 cup, 2 forks, 3 knifes, 2 spoons, 1 bowl, 2 dining tables, 162.8ms
Speed: 3.0ms preprocess, 162.8ms inference, 1.8ms postprocess per image at shape (1, 3, 384, 640)
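The log above summarizes each image as class counts (e.g. `2 persons, 1 car`). To get those counts programmatically, for example to check the 24 persons in `002.jpg`, a small helper can count class names from the class ids of the detected boxes. This is a sketch: `count_classes` is a hypothetical helper, and with Ultralytics its inputs would be `result.boxes.cls.tolist()` and `result.names` (the id-to-name mapping).

```python
from collections import Counter
from typing import Dict, Iterable


def count_classes(class_ids: Iterable[float], names: Dict[int, str]) -> Counter:
    """Count detections per class name.

    class_ids: class indices of the detected boxes. Floats are accepted
    because Ultralytics stores class ids in a float tensor.
    names: mapping from class index to class name.
    """
    return Counter(names[int(c)] for c in class_ids)
```

Unlike indexing `result[0]`, counting also works when nothing is detected: it simply returns an empty `Counter`.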


In [25]:
# Commented out to reduce the .ipynb size; the annotated outputs are saved in the images folder.

# for input_path, output_path in zip(paths, saved_paths):
#     image1 = cv2.imread(input_path)
#     image2 = cv2.imread(output_path)
#     display_side_by_side(image1, image2)


In [22]:
saved_paths

['001_result.png', '002_result.jpg', '003_result.jpeg', '004_result.png']

In [29]:
image_results_2 = []
face_paths = []
for path in paths:
    filename, extension = os.path.splitext(path)
    save_path = f"{filename}_face_result{extension}"

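    # face_model(path) returns a list of Results (one per image); take the only one.
    curr_result = face_model(path)[0]
    # Save the annotated image with all boxes. Indexing curr_result[0] would
    # keep only the first box and raises IndexError when nothing is detected.
    curr_result.save(save_path)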

    image_results_2.append(curr_result)
    face_paths.append(save_path)



image 1/1 /content/drive/MyDrive/images/001.png: 384x640 3 FACEs, 608.7ms
Speed: 6.6ms preprocess, 608.7ms inference, 1.6ms postprocess per image at shape (1, 3, 384, 640)

image 1/1 /content/drive/MyDrive/images/002.jpg: 416x640 24 FACEs, 1172.9ms
Speed: 92.1ms preprocess, 1172.9ms inference, 2.3ms postprocess per image at shape (1, 3, 416, 640)

image 1/1 /content/drive/MyDrive/images/003.jpeg: 640x640 1 FACE, 341.5ms
Speed: 8.6ms preprocess, 341.5ms inference, 1.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/drive/MyDrive/images/004.png: 384x640 (no detections), 167.1ms
Speed: 3.4ms preprocess, 167.1ms inference, 0.9ms postprocess per image at shape (1, 3, 384, 640)


IndexError: index 0 is out of bounds for dimension 0 with size 0

In [24]:
face_paths

['001_result_face_result.png',
 '002_result_face_result.jpg',
 '003_result_face_result.jpeg']

In [26]:
image_data = []

for result in image_results:
    speed = result.speed
    orig_shape = result.orig_shape
    image_data.append((orig_shape, speed['preprocess'], speed['inference'], speed['postprocess']))

print(tabulate(image_data, headers=[ "Original Shape","Preprocess", "Inference", "Postprocess",]))


Original Shape      Preprocess    Inference    Postprocess
----------------  ------------  -----------  -------------
(168, 300)             2.71463      172.394        1.44792
(480, 768)             3.41344      162.386        1.98889
(225, 225)             3.90077      251.972        1.68943
(514, 875)             2.9695       162.759        1.84631


In [27]:
image_data_2 = []

for result in image_results_2:
    speed = result.speed
    orig_shape = result.orig_shape
    image_data_2.append((orig_shape, speed['preprocess'], speed['inference'], speed['postprocess']))

print(tabulate(image_data_2, headers=[ "Original Shape","Preprocess", "Inference", "Postprocess",]))


Original Shape      Preprocess    Inference    Postprocess
----------------  ------------  -----------  -------------
(168, 300)             3.63016      199.303        2.563
(480, 768)             3.582        160.225        1.28984
(225, 225)             4.67849      254.233        1.26791


# Results

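- The first model correctly detected all 24 people in the second image (fun fact: the image actually consists of 24 AI-generated faces!) and categorized all of them as `person`.
- In the first image, the second model detected three faces, including a person who is barely visible behind one of the other people in the scene (the first model found only 2 persons there).
- The second model also detected all 24 faces in the second image.
- On the last image, which contains no faces, the face model correctly returned no detections (`(no detections)` in the log), which is exactly what we need. The `IndexError` shown above does not come from the model itself: it is raised by `curr_result[0]`, which tries to index the first box of an empty result.

- In the benchmark tables, the per-image inference time is roughly the same for both models. Note, however, that the face-model log from cell [29] reports slower inference on the first two images (608.7 ms and 1172.9 ms), so these timings vary between runs and should be treated as rough.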
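To compare the two models with a single number rather than row by row, the per-image `speed` dictionaries (the same `preprocess`/`inference`/`postprocess` keys used for the tables) can be averaged. A minimal sketch; `summarize_speeds` is a hypothetical helper and all values are in milliseconds:

```python
from statistics import mean
from typing import Dict, List

# Stage names as they appear in Ultralytics' result.speed dictionaries.
STAGES = ("preprocess", "inference", "postprocess")


def summarize_speeds(speeds: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each stage and the total per-image time (milliseconds)."""
    summary = {stage: mean(s[stage] for s in speeds) for stage in STAGES}
    # Mean of the per-image totals (pre + inference + post).
    summary["total"] = mean(sum(s[stage] for stage in STAGES) for s in speeds)
    return summary
```

It would be called as `summarize_speeds([r.speed for r in image_results])` and likewise for `image_results_2`. Only average over the same set of images for both models: the face-model table above has 3 rows while the object-detection table has 4, so their means are not directly comparable.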

# Future Work
We now have models which can
- Detect faces
- Detect objects in the scene

Keep in mind that the object-detection model starts from the COCO-pretrained `yolov8n.pt` weights and has only been fine-tuned on the tiny coco8 dataset, which contains just 8 images (4 for training and 4 for validation). Future work is to train it on a much larger dataset, as discussed in the first notebook, YOLOv8_coco8.ipynb.

**Note**: Another approach, which passes the input picture first through the object-detection model and then through the face-detection model, was also tested, but those results were not saved because the bounding boxes got cluttered.

1) Next, we want to use the attributes of the detected objects to decide which photos are passed to the second model. For example, we can first check for the presence of a `person` with the first model and, only if a person is present, pass the photo to the second model to detect a `face`.

2) Training the first model (object detection) on a larger dataset is still pending and will be done in due time.

3) Unfortunately, the code used to train the Hugging Face model we are using is not open source (I could not find it). Fortunately, the datasets it was trained on are listed, so we could reverse-engineer it and build a model that combines both tasks via transfer learning.
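Item 1 above could be sketched as a gating function: run the object detector, and call the face detector only if a `person` was found. This is a minimal sketch, not the notebook's implementation; `contains_class`, `gated_face_detection` and `yolo_objects` are hypothetical names, and the detectors are passed in as plain callables so the logic can be tested without loading any model.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple


def contains_class(class_ids: Iterable[float], names: Dict[int, str],
                   target: str = "person") -> bool:
    """True if any detected class id maps to the target class name."""
    return any(names[int(c)] == target for c in class_ids)


def gated_face_detection(
    image: Any,
    detect_objects: Callable[[Any], Tuple[Iterable[float], Dict[int, str]]],
    detect_faces: Callable[[Any], Any],
    target: str = "person",
) -> Optional[Any]:
    """Run the face detector only if the object detector found `target`.

    detect_objects must return (class_ids, names); detect_faces may return
    anything. Returns None when the gate is not passed.
    """
    class_ids, names = detect_objects(image)
    if not contains_class(class_ids, names, target):
        return None  # no person found: skip the face model entirely
    return detect_faces(image)


# Possible wiring with the notebook's models (assumes `model` and
# `face_model` are the YOLO objects loaded above):
# def yolo_objects(path):
#     result = model(path)[0]
#     return result.boxes.cls.tolist(), result.names
# face_result = gated_face_detection("002.jpg", yolo_objects,
#                                    lambda path: face_model(path)[0])
```

Gating at the image level (rather than cropping each `person` box) is the safer choice here: in `001.png` the object detector found 2 persons while the face model found 3 faces, so a face outside the detected person boxes could otherwise be missed.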