<a href="https://colab.research.google.com/github/DanielKyr/NeonateLabeler/blob/main/SYSC4415_Assignment2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Adapted from [this tutorial](https://machinelearningmastery.com/how-to-perform-face-detection-with-classical-and-deep-learning-methods-in-python-with-keras/).

In [None]:
#Run this cell to be able to access required data
#Follow the instructions displayed in the output

from google.colab import auth
from googleapiclient.discovery import build
auth.authenticate_user()
drive_service = build('drive', 'v3')

# Introduction
Face detection is a computer vision problem that involves finding faces in photos. It is a trivial problem for humans to solve and has been solved reasonably well by classical feature-based techniques, such as the cascade classifier (see below). More recently, deep learning methods have achieved state-of-the-art results on standard benchmark face detection datasets. One example is the Multi-task Cascaded Convolutional Network, or MTCNN for short.

<img src="http://kpzhang93.github.io/SPL/stylesheets/image007.png" width="500">

In this assignment, you will discover how to perform face detection in Python using classical and deep learning models.

The two main approaches to face detection:


1.   Feature-based: use hand-crafted filters to search for and detect faces
2.   Deep learning: learn holistically how to extract faces from the entire image





# Part 1) Feature-based Cascaded Classifier
Feature-based face detection algorithms are fast and effective and have been used successfully for decades. Perhaps the most successful example is a technique called cascade classifiers, first described by Paul Viola and Michael Jones in their 2001 paper titled “Rapid Object Detection using a Boosted Cascade of Simple Features.”

In the paper, AdaBoost is used to learn a range of very simple, weak features in each face that together provide a robust classifier. The models are then organized into a hierarchy of increasing complexity, called a “cascade”. Simpler classifiers operate on candidate face regions directly, acting like a coarse filter, whereas complex classifiers operate only on those candidate regions that show the most promise as faces.
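The coarse-to-fine filtering described above can be illustrated with a toy sketch. The three stage functions below are hypothetical stand-ins invented for illustration; in the actual method each stage is a boosted ensemble of Haar-like features, not a hand-written rule. The point is only the control flow: a window is rejected as soon as any stage says no, so most non-face windows cost very little to discard.

```python
from typing import Callable, List, Sequence, Tuple

# A stage returns True if the window may still be a face.
Stage = Callable[[Sequence[float]], bool]


def run_cascade(window: Sequence[float], stages: List[Stage]) -> Tuple[bool, int]:
    """Return (accepted, number_of_stages_evaluated) for one candidate window."""
    for i, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, i  # early rejection: later stages never run
    return True, len(stages)


# Hypothetical stages operating on a flat list of intensities in [0, 1].
def bright_enough(w: Sequence[float]) -> bool:
    return sum(w) / len(w) > 0.2


def has_contrast(w: Sequence[float]) -> bool:
    return max(w) - min(w) > 0.3


def top_darker_than_bottom(w: Sequence[float]) -> bool:
    half = len(w) // 2
    return sum(w[:half]) < sum(w[half:])


# Ordered from cheapest and coarsest to most specific.
STAGES = [bright_enough, has_contrast, top_darker_than_bottom]

flat_dark = [0.05] * 8  # made-up window: uniformly dark
face_like = [0.2, 0.3, 0.2, 0.3, 0.8, 0.9, 0.8, 0.9]  # made-up window

print(run_cascade(flat_dark, STAGES))
print(run_cascade(face_like, STAGES))
```

The second element of the returned tuple shows how much work was spent on each window, which is where the speed of a cascade comes from.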

We are going to use a modern implementation of the Classifier Cascade face detection algorithm as provided in the OpenCV library. The benefit of this implementation is that it provides pre-trained face detection models, and provides an interface to train a model on your own dataset.

## Load test images and pre-trained models
Run the cell below to download some test images and the pre-trained models. Add your own test image by pasting a link into `image3_url`. Find an image with frontal faces of a group of people that you think will be difficult for the classifier.

In [None]:
import urllib.request
model1_url = "https://raw.githubusercontent.com/opencv/opencv/master/data/haarcascades/haarcascade_frontalface_default.xml"
urllib.request.urlretrieve(model1_url, "haarcascade_frontalface_default.xml")
model2_url = "https://raw.githubusercontent.com/opencv/opencv/master/data/haarcascades/haarcascade_eye.xml"
urllib.request.urlretrieve(model2_url, "haarcascade_eye.xml")
image1_url = "https://thumbor.forbes.com/thumbor/fit-in/1200x0/filters%3Aformat%28jpg%29/https%3A%2F%2Fspecials-images.forbesimg.com%2Fimageserve%2F6052170a9df9bdf69d63f201%2F0x0.jpg"
urllib.request.urlretrieve(image1_url, "test_1.jpg")
image2_url = "https://pbs.twimg.com/media/BhxWutnCEAAtEQ6.jpg"
urllib.request.urlretrieve(image2_url, "test_2.jpg")
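image3_url = ""  # paste the link to your own test image here
if image3_url:  # urlretrieve raises ValueError on an empty URL
    urllib.request.urlretrieve(image3_url, "test_3.jpg")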


## Q1) Face detection (3)
Load the face detection model ("haarcascade_frontalface_default.xml") and run it on the three test images. 
**Hint: see [documentation](https://docs.opencv.org/4.5.3/d9/d80/classcv_1_1cuda_1_1CascadeClassifier.html) or [tutorial](https://opencv24-python-tutorials.readthedocs.io/en/latest/py_tutorials/py_objdetect/py_face_detection/py_face_detection.html)**



1.   Plot the overlaid bounding boxes of the detections for each image
2.   For "test_2.jpg" adjust the parameters `minNeighbors` and `scaleFactor`. Describe what the parameters control and how they affect the face detection. Give a set of parameters that improve the detection over the default values.
3. Comment on the performance of the detector on the image you selected for "test_3.jpg". Describe the failure points of the detector. Apply the same parameters you selected in *Q1.2)*.  Does it improve the performance?




In [None]:
#Function to load classifier
from cv2 import CascadeClassifier
#Solution here:

*Discussion here:*

*Q1.2)*

*Q1.3)*

## Q2) Face Landmark Detection (3)
Load the eye detection model ("haarcascade_eye.xml") and run it on "test_1.jpg".


1.   Plot the overlaid bounding boxes of the detections
2.   Comment on the performance of this task on the test image and describe the failure points of the detector. Based on this performance, do you think this is a more difficult task than face detection? Why or why not?
3. Give a set of parameters that improve the detection over the default values. What improvement is seen from changing the parameters?


In [None]:
#Solution here:

*Discussion here:*

*Q2.2)*

*Q2.3)*

# Part 2) Deep Learning - MTCNN
Multi-task Cascaded Convolutional Networks (MTCNN) is a framework developed as a solution for both face detection and face alignment. The process consists of three stages of convolutional networks that are able to recognize faces and landmark locations such as the eyes, nose, and mouth.
The [paper](https://arxiv.org/ftp/arxiv/papers/1604/1604.02878.pdf) proposes MTCNN as a way to integrate both tasks (detection and alignment) using multi-task learning.


<img src="https://machinelearningmastery.com/wp-content/uploads/2019/03/Collage-Students-Photograph-with-Bounding-Boxes-and-Facial-Keypoints-Drawn-For-Each-Detected-Face-using-MTCNN.png" width="500">

[Source](https://machinelearningmastery.com/how-to-perform-face-detection-with-classical-and-deep-learning-methods-in-python-with-keras/)

## Q3) MTCNN Architecture (1)

Skim the MTCNN paper. Name and describe the three stages of MTCNN, provide the equation (in LaTeX format) for the loss function of each task, and state the type of problem (i.e., regression or classification) for each loss function. Provide your answer in the cell below.

*Discussion here:*

Stages:
1.   
2.   
3.  

Loss functions:
1. 
2. 
3. 


## Q4) MTCNN on Test Images (3)

Apply MTCNN on the test images from Part 1. Overlay the bounding boxes and the landmarks on each image and plot the results. That is, each image should be displayed with each face bounding box and corresponding landmarks added to the image (similar to the one shown in the Part 2 introduction). Compare these results to those from *Q1)* and discuss.

In [None]:
#Load MTCNN network
!pip install mtcnn
from mtcnn.mtcnn import MTCNN
#Solution here:

*Discussion here:*

## CelebA Dataset
To quantitatively test the performance of the face detections, we need an annotated dataset. We will use the CelebA dataset. 

CelebFaces Attributes Dataset ([CelebA](https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)) is a large-scale face attributes dataset with more than 200K celebrity images. The images in this dataset cover large pose variations and background clutter. It contains annotations for face bounding boxes and 5 landmarks:
1. left_eye
2. right_eye
3. nose
4. mouth_left
5. mouth_right

A subset of these images and their annotations is provided in a zip file on Google Drive. Run the cell below to download and load the data into the workspace.

You can access the image data by: `imgs[#]`

You can access the image filename by: `imgs.files[#].split('/')[-1]`

You can access elements of the annotation dataframe in 3 ways:
 1. `anno.loc[image_id,column_name]`
 2. `anno.iloc[row_#,column_#]`
 3. `anno[column_name][row_#/image_id]`
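As a quick illustration of the three access patterns listed above, here is a minimal sketch on a hypothetical two-row table. The name `anno_demo` and all values in it are made up for this example; only the column names and the `image_id` index mirror the real annotations.

```python
import pandas as pd

# Hypothetical miniature of the annotation table (values are invented).
anno_demo = pd.DataFrame(
    {"x_1": [95, 72], "y_1": [71, 94], "lefteye_x": [165, 140]},
    index=pd.Index(["000001.jpg", "000002.jpg"], name="image_id"),
)

# 1. Label-based: row label (image_id) and column name.
by_label = anno_demo.loc["000002.jpg", "y_1"]

# 2. Position-based: integer row and column positions.
by_position = anno_demo.iloc[1, 1]

# 3. Column first, then the row label within that column.
by_column = anno_demo["y_1"]["000002.jpg"]

print(by_label, by_position, by_column)
```

When the index holds filenames, as here, it is safer to reserve plain integers for `.iloc` and use the `image_id` string everywhere else, since recent pandas versions warn when an integer key is used positionally on a label-indexed column.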



In [None]:
import io
from googleapiclient.http import MediaIoBaseDownload
from zipfile import ZipFile
import skimage.io 
import matplotlib.pyplot as plt
import pandas as pd

#Download  file
file_id = "1Nt9uUZgssHUlwHbe9LFVZ0hhgaIQmcj-"
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
  _, done = downloader.next_chunk()
#Extract zip file
with ZipFile(downloaded) as zipf:
    zipf.extractall()

#Load images
imgs = skimage.io.collection.ImageCollection('celeb_a_mini/imgs/*.jpg')
#Load bounding box
df = pd.read_csv('celeb_a_mini/list_bbox_celeba.txt',sep=r'\s+',header=1).set_index('image_id')
bbox = df.loc[[f.split('/')[-1] for f in imgs.files]].copy()
#Load landmarks
df = pd.read_csv('celeb_a_mini/list_landmarks_celeba.txt',sep=r'\s+',header=1)
landmarks = df.loc[[f.split('/')[-1] for f in imgs.files]].copy()
#Combine annotations
anno = pd.concat([bbox,landmarks],axis=1)

#Plot example image
plt.imshow(imgs[10]) #access image data by accessing the array (i.e. imgs[#])
plt.show()
#Print locations of landmarks
print(imgs.files[10].split('/')[-1])
print('left-eye coordinates: ({},{})'.format(anno['lefteye_x'].iloc[10],anno['lefteye_y'].iloc[10]))
print('upper left point of bbox: ({},{})'.format(anno['x_1'].iloc[10],anno['y_1'].iloc[10]))

In [None]:
#Display dataframe which contains ground truth annotations
anno.head()

## Intersection over Union
One metric for evaluating the accuracy of our face detection is Intersection over Union (IoU), which measures how well an object detector's predictions match the annotations of a particular dataset. To apply IoU to evaluate an object detector, we need:
* The ground-truth bounding boxes
* The predicted bounding boxes from our model.

To calculate IoU we use: 

<img src="https://www.pyimagesearch.com/wp-content/uploads/2016/09/iou_equation.png" width="500">

[Source](https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/)
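In words, IoU is the area of overlap between the two boxes divided by the area of their union. A minimal sketch of that computation follows; it assumes each box is given as `(x, y, width, height)` with `(x, y)` the upper-left corner, and the function name `iou_xywh` is chosen here for illustration only. Boxes in another layout would need converting first.

```python
def iou_xywh(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)

    # Clamp at zero so disjoint boxes get an empty intersection.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes, the second shifted right by 5 pixels:
# intersection = 5 * 10 = 50, union = 100 + 100 - 50 = 150.
print(iou_xywh((0, 0, 10, 10), (5, 0, 10, 10)))
```

Identical boxes give 1.0 and non-overlapping boxes give 0.0, so the value is directly comparable against a threshold such as 0.5.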


## Q5) Evaluate Performance using IoU (5)

Calculate the average IoU across all detections over the CelebA dataset using MTCNN and the Cascaded Classifier ("haarcascade_frontalface_default.xml"). Compare the performance of the methods. Show an example of an image with misclassification (IoU < 0.5) for both methods. 

In [None]:
#Solution here:

*Discussion here:*

## Bonus: Mean % Error Landmarks (1)
For the correct detections from MTCNN, calculate the average Euclidean distance between each predicted landmark and its ground-truth landmark; normalize these values by the inter-ocular distance (the ground-truth distance between the left and right eyes). Compare the accuracy across the five landmarks.
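The normalization step can be sketched for a single face as follows. This is only an illustration under assumptions: landmarks are stored as dictionaries mapping the five landmark names to `(x, y)` tuples, the function name `normalized_landmark_errors` is hypothetical, and the coordinates in the example are made up.

```python
import math


def normalized_landmark_errors(pred, truth):
    """Per-landmark Euclidean error divided by the ground-truth inter-ocular distance."""
    iod = math.dist(truth["left_eye"], truth["right_eye"])
    if iod == 0:
        raise ValueError("ground-truth eyes coincide; cannot normalize")
    return {name: math.dist(pred[name], truth[name]) / iod for name in truth}


# Made-up coordinates for one face (not taken from the dataset).
truth_demo = {
    "left_eye": (30, 40), "right_eye": (70, 40), "nose": (50, 60),
    "mouth_left": (35, 80), "mouth_right": (65, 80),
}
pred_demo = {
    "left_eye": (30, 40), "right_eye": (70, 44), "nose": (53, 64),
    "mouth_left": (35, 80), "mouth_right": (65, 88),
}

errors_demo = normalized_landmark_errors(pred_demo, truth_demo)
print(errors_demo)
```

Averaging each landmark's value over all correctly detected faces then gives one normalized error per landmark, which can be reported as a percentage by multiplying by 100.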

In [None]:
#Solution here: