# Face Detection

## Introduction

[Face detection](https://en.wikipedia.org/wiki/Face_detection) is a computer technology, used in a variety of applications, that identifies human faces in digital images. The purpose of this project is to research face detection methods, select one, and apply it.

My professor has provided the class with a set of images. I will analyze these images and apply the Histogram of Oriented Gradients (HOG) technique to detect faces, then draw a rectangle around each face in every image. Finally, I will stitch the individual images together to produce a video.

Special thanks to Adam Geitgey for his [Machine Learning is Fun](https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471) series on Medium.

## Literature Review

Face detection is the process of identifying any human face in a given image or video. Several families of techniques allow a system to detect a face in an image: knowledge-based methods, feature-invariant approaches, template matching, and appearance-based methods.

The knowledge-based method is a rule-based method that encodes knowledge about what constitutes a human face. The researcher is responsible for deriving these rules. For example, a researcher can state that a face has two eyes, a nose, and a mouth. The difficulty lies in translating this knowledge into rules the system can apply.

In feature-invariant approaches, the aim is to find features that remain present in an image even when the pose, viewpoint, or lighting conditions vary. The algorithm then uses these features to locate faces in an image.

The template matching method takes multiple standard patterns of a face and computes the similarity between an input image and the stored patterns. The system is then able to both detect and locate a face in an image. Template matching is very simple to implement, but one of its drawbacks is that it cannot deal with variations in pose, scale, and shape.
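
To make the similarity computation concrete, here is a minimal sketch of template matching on a toy grayscale image. The function name `match_template`, the use of the sum of squared differences (SSD) as the similarity measure, and the toy data are all assumptions chosen for illustration; they are not taken from any particular library.

```python
def match_template(image, template):
    """Slide `template` over `image` and return (best_score, (row, col)).

    Both arguments are 2-D lists of grayscale intensities. The score is the
    sum of squared differences (SSD), so a lower score means a closer match.
    """
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best_score, best_pos = None, None
    for r in range(img_h - tpl_h + 1):
        for c in range(img_w - tpl_w + 1):
            # Compare the template with the image window whose top-left
            # corner is at (r, c).
            ssd = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(tpl_h)
                for j in range(tpl_w)
            )
            if best_score is None or ssd < best_score:
                best_score, best_pos = ssd, (r, c)
    return best_score, best_pos


# Toy 4x5 "image" containing the stored pattern at row 1, column 2.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [9, 8],
    [7, 9],
]
score, position = match_template(image, template)
```

Because the window is compared pixel by pixel, a pattern that appears rotated or at a different size no longer produces a low score, which is the drawback described above.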

Finally, appearance-based methods differ from template matching in that the templates are learned from a set of training images. These techniques rely on principles of machine learning and statistics to generate the “templates”. Support vector machines, hidden Markov models, and deep convolutional neural networks are all examples of appearance-based methods.

## Histogram of Oriented Gradients (HOG)

For face detection I opted to go with the [Histogram of Oriented Gradients](http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf) (HOG) technique.

The HOG technique is a feature descriptor used in computer vision for object detection in digital images. Its underlying principle is that objects in an image can be described by the distribution of intensity gradients. HOG looks at each pixel, determines in which direction the surrounding pixels are getting darker, and replaces the pixel with a vector pointing towards the darkest surrounding pixel.
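
The gradient idea can be sketched in a few lines of plain Python. This is a simplified illustration, not dlib's implementation: the function name `cell_histogram`, the choice of 9 unsigned orientation bins, and the toy cell are assumptions, and steps such as block normalization and interpolation between bins are left out. Note also that the gradient computed here points towards increasing brightness; pointing it towards the darker side is only a change of sign.

```python
import math


def cell_histogram(image, bins=9):
    """Return an unsigned-orientation histogram (0 to 180 degrees) for one cell.

    `image` is a 2-D list of grayscale intensities. Border pixels are skipped
    so that the central differences below are always defined.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            gx = image[r][c + 1] - image[r][c - 1]  # horizontal change
            gy = image[r + 1][c] - image[r - 1][c]  # vertical change
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region: no gradient to record
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# A vertical edge: dark on the left, bright on the right.
cell = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
histogram = cell_histogram(cell)
```

For this vertical edge, every interior pixel has a purely horizontal gradient, so all of the votes land in the first orientation bin.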

![HOG applied to a face](example.png)

In deciding to use this approach, I compared HOG to another classic face detection technique, Haar Cascades. I concluded that while Haar Cascades may be faster and better suited to real-time applications, they produce too many false positives, and HOG detects faces more accurately.

The goal is to detect as many faces as possible. It is important to note beforehand that not every face will be detected: the orientation of a face, or shading across it, can prevent the HOG technique from finding it.

![Before Face Detection](Pre_Face_Detection.gif)

## Method

The main step is to run a face detection algorithm on each image. For this step I use the previously mentioned Histogram of Oriented Gradients technique, implemented with the [dlib](http://dlib.net) library. Students at another university tested HOG and Haar Cascades with both OpenCV and dlib, and found that dlib's HOG implementation was far more accurate than the others.

To get dlib's Python API installed I followed PyImageSearch's instructions:
<http://www.pyimagesearch.com/2017/03/27/how-to-install-dlib/>



Let's begin by importing the numpy library for fast array operations. We will also import the dlib library, which provides the HOG face detector. scikit-image (`skimage`) will be used to read in the frames.

In [1]:
import dlib
import numpy as np
from skimage import io

Create the HOG face detector using dlib.

In [2]:
face_detector = dlib.get_frontal_face_detector()

Now we will begin the body of the code. 

In [3]:
# Create a window object to display the images
win = dlib.image_window()

# all_file.txt lists one image file name per line
with open("images/all_file.txt") as file_list:
    for line in file_list:
        # Strip the trailing newline and skip blank lines
        image_name = line.strip()
        if not image_name:
            continue

        # Load the image into an array.
        image = io.imread("images/" + image_name)

        # Run the HOG face detector on the image, upsampling it once
        # so that smaller faces can be found.
        # This will result in a set of bounding boxes on the faces.
        detected_faces = face_detector(image, 1)

        # Clear previous overlays & add the image to the window
        win.clear_overlay()
        win.set_image(image)

        # For each face detected, add its bounding box as an overlay
        for face_rect in detected_faces:
            win.add_overlay(face_rect)

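
To draw rectangles directly onto the pixel data (for example, before writing the frames to a video), the box coordinates are needed as plain integers. The sketch below assumes that each detected rectangle exposes `left()`, `top()`, `right()`, and `bottom()` accessors, as dlib rectangles do. The helper `rect_to_box` and the stand-in class `FakeRect` are hypothetical names introduced only so the example runs without dlib installed.

```python
def rect_to_box(rect, image_height, image_width):
    """Convert a rectangle with left()/top()/right()/bottom() accessors
    into a (top, left, bottom, right) tuple clamped to the image bounds."""
    top = max(0, rect.top())
    left = max(0, rect.left())
    bottom = min(image_height - 1, rect.bottom())
    right = min(image_width - 1, rect.right())
    return top, left, bottom, right


class FakeRect:
    """Hypothetical stand-in for a detected rectangle (not part of dlib)."""

    def __init__(self, left, top, right, bottom):
        self._left, self._top = left, top
        self._right, self._bottom = right, bottom

    def left(self):
        return self._left

    def top(self):
        return self._top

    def right(self):
        return self._right

    def bottom(self):
        return self._bottom


# A box that spills over the left, right, and bottom edges of a 100x80 image.
box = rect_to_box(FakeRect(-5, 10, 120, 90), image_height=80, image_width=100)
```

With the real detector, the same helper could be called as `rect_to_box(face_rect, image.shape[0], image.shape[1])` inside the loop above. Clamping matters because a detected box can extend slightly past the image border.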

## Results

## Future Work

A key addition to this program would be detecting faces even when they are turned or facing a different direction.

To accomplish this, one can use **face landmark estimation** to create a "map" of the human face: a set of points (landmarks) that represent features common to every face, such as the lips and eyes. One technique that can be used is the method of [Kazemi & Sullivan](http://www.csc.kth.se/~vahidk/papers/KazemiCVPR14.pdf).

![feature landmark extraction](fle.png)

The 68 landmarks to locate on every face. This image was created by Brandon Amos of CMU, who works on OpenFace.

Once the facial features have been located, one can [shear](https://en.wikipedia.org/wiki/Shear_mapping), rotate, and scale the image to center the eyes and mouth as much as possible.

A basic [affine transformation](https://en.wikipedia.org/wiki/Affine_transformation) can be used to perform these operations.
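
As a rough illustration of the idea, the sketch below builds a 2x3 affine matrix that rotates points about the midpoint between the eyes so that the line joining the eyes becomes horizontal. It covers only rotation, which is one special case of an affine map; shear and scale would change the matrix entries. The function names and the eye coordinates are assumptions made up for this example.

```python
import math


def eye_alignment_matrix(left_eye, right_eye):
    """Return a 2x3 affine matrix that rotates about the eye midpoint so
    that the line between the eyes becomes horizontal. Points are (x, y)."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    angle = math.atan2(ry - ly, rx - lx)  # current tilt of the eye line
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)  # rotate by -angle
    cx, cy = (lx + rx) / 2.0, (ly + ry) / 2.0  # center of rotation
    # p' = R (p - c) + c, written out as a single 2x3 matrix.
    return [
        [cos_a, -sin_a, cx - cos_a * cx + sin_a * cy],
        [sin_a, cos_a, cy - sin_a * cx - cos_a * cy],
    ]


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to an (x, y) point."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )


# Hypothetical landmark positions for a tilted face.
left_eye, right_eye = (30.0, 40.0), (70.0, 60.0)
matrix = eye_alignment_matrix(left_eye, right_eye)
aligned_left = apply_affine(matrix, left_eye)
aligned_right = apply_affine(matrix, right_eye)
```

After the transformation both eyes share the same vertical coordinate, while the distance between them is unchanged because a pure rotation preserves lengths.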