## Face Recognition
**Methods**: Haar cascades (face detection), Local Binary Patterns Histograms (face recognition).

**Instruments**: Python, OpenCV (3.4.3), PIL/Pillow (4.3.0), NumPy (1.16.4)

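**Interesting notes**:
1. OpenCV (`cv2.imread`) doesn't support the GIF format, so we use the `Image` module from PIL to read the images, convert them to grayscale, and turn them into NumPy arrays.
If your images are in a format OpenCV supports (e.g. PNG or JPEG), you don't need this module: `cv2.imread(path, cv2.IMREAD_GRAYSCALE)` returns a grayscale NumPy array directly.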

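```python
import os
import cv2
import numpy as np
import PIL
from PIL import Image

print("cv2 version: " + cv2.__version__)
print("numpy version: " + np.__version__)
print("PIL version: " + PIL.__version__)

# Haar cascade: path to a pre-trained face cascade XML file
cascadePath = "/mnt/haarcascade.xml"
faceCascade = cv2.CascadeClassifier(cascadePath)
# A wrong path does not raise here; it silently produces an empty classifier
if faceCascade.empty():
    raise IOError("Could not load Haar cascade from " + cascadePath)
```

```
cv2 version: 3.4.3
numpy version: 1.16.4
PIL version: 4.3.0
```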


### Theory
Object detection using Haar feature-based cascade classifiers is an effective method proposed by Paul Viola and Michael Jones in their 2001 paper "Rapid Object Detection using a Boosted Cascade of Simple Features". It is a machine-learning approach in which a cascade function is trained on a lot of positive and negative images and is then used to detect objects in other images.

Here we will work with face detection. Initially, the algorithm needs a lot of positive images (images of faces) and negative images (images without faces) to train the classifier. Then we need to extract features from them. For this, the Haar features shown in the image below are used. They work much like convolutional kernels: each feature is a single value obtained by subtracting the sum of pixels under the white rectangle from the sum of pixels under the black rectangle.


![Haar features](https://docs.opencv.org/3.4/haar.png)
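To make the definition concrete, here is a minimal NumPy sketch that evaluates one such feature by brute-force summation. It assumes a horizontal two-rectangle (edge) feature whose left half is the white rectangle and right half the black one, scored as the black sum minus the white sum as described above. The helper `two_rect_feature` and the toy window are hypothetical, for illustration only.

```python
import numpy as np


def two_rect_feature(window, x, y, w, h):
    """Hypothetical helper: horizontal two-rectangle Haar-like feature.

    The (2*w) x h region starting at column x, row y is split into a white
    left half and a black right half; the value is sum(black) - sum(white).
    """
    white = window[y:y + h, x:x + w].sum()
    black = window[y:y + h, x + w:x + 2 * w].sum()
    return int(black) - int(white)  # cast to int so unsigned sums can't wrap


# Toy 24x24 window: dark (0) on the left, bright (255) on the right
window = np.zeros((24, 24), dtype=np.uint8)
window[:, 12:] = 255

# Feature straddling the vertical edge: white half is all 0, black half all 255
print(two_rect_feature(window, x=8, y=0, w=4, h=6))  # 6120 (= 255 * 4 * 6)
# Same feature on a flat region: both halves are equal
print(two_rect_feature(window, x=0, y=0, w=4, h=6))  # 0
```

A large magnitude means the window contains the light/dark pattern the feature encodes; a flat region scores zero.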


Now, all possible sizes and locations of each kernel are used to calculate lots of features. (Just imagine how much computation that needs: even a 24x24 window results in over 160,000 features.) Each feature requires the sums of the pixels under its white and black rectangles. To make this cheap, the authors introduced the integral image: however large the image, it reduces the sum over any rectangle to an operation involving just four pixels. Nice, isn't it? It makes things super-fast.
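Both claims in this paragraph can be checked with a short NumPy sketch. `count_haar_features` enumerates every size (integer multiples of the base shape) and every position of a set of feature shapes in a 24x24 window; restricting the set to five basic shapes (horizontal and vertical two-rectangle, horizontal and vertical three-rectangle, and four-rectangle) is an assumption, and it yields 162,336, in line with "over 160,000". `integral_image` and `rect_sum` show the integral-image trick: after one pass of cumulative sums, any rectangle sum is four lookups. All three are hypothetical helpers, not OpenCV functions (OpenCV provides its own `cv2.integral`).

```python
import numpy as np


def count_haar_features(size=24):
    """Count all positions and scales of five basic Haar-like shapes in a size x size window."""
    # (width, height) of each base shape: horizontal/vertical two-rectangle,
    # horizontal/vertical three-rectangle, and the 2x2 four-rectangle shape
    base_shapes = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]
    total = 0
    for bw, bh in base_shapes:
        for w in range(bw, size + 1, bw):        # widths: multiples of the base width
            for h in range(bh, size + 1, bh):    # heights: multiples of the base height
                total += (size - w + 1) * (size - h + 1)  # positions for a w x h feature
    return total


def integral_image(img):
    """ii[y, x] = sum of img[:y, :x]; the extra zero row/column avoids edge cases."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y): four lookups, any size."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]


print(count_haar_features(24))  # 162336

img = np.arange(24 * 24).reshape(24, 24)  # any 24x24 grayscale window
ii = integral_image(img)
print(rect_sum(ii, 3, 5, 10, 7) == img[5:12, 3:13].sum())  # True
```

A two-rectangle feature is then just `rect_sum` over the black half minus `rect_sum` over the white half, so its cost no longer depends on how large the rectangles are.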

But most of the features we calculated are irrelevant. For example, consider two good features for faces. The first focuses on the property that the region of the eyes is often darker than the region of the nose and cheeks. The second relies on the property that the eyes are darker than the bridge of the nose. But the same windows applied to the cheeks or anywhere else are irrelevant. So how do we select the best features out of 160,000+ features? This is done with AdaBoost.

For this, we apply each and every feature to all the training images. For each feature, we find the best threshold for classifying the images as faces (positive) or non-faces (negative). Obviously, there will be errors or misclassifications. We select the features with the minimum error rate, which means they are the features that most accurately separate the face and non-face images. (The process is not quite as simple as this. Each image is given an equal weight in the beginning. After each round, the weights of misclassified images are increased, the same process is repeated, and new error rates and new weights are calculated. This continues until the required accuracy or error rate is achieved or the required number of features is found.)

The final classifier is a weighted sum of these weak classifiers. They are called weak because each one alone can't classify the image, but together they form a strong classifier. The paper says even 200 features provide detection with 95% accuracy. Their final setup had around 6000 features. (Imagine a reduction from 160,000+ features to 6000 features. That is a big gain.)
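The selection loop in the last two paragraphs is discrete AdaBoost with one-feature threshold classifiers ("decision stumps"). Below is a minimal NumPy sketch on a made-up feature matrix, where each column stands in for one Haar feature's value on each training window. It uses the textbook formulation with labels +1/-1 and vote weight alpha = 0.5 * ln((1 - err) / err); the Viola-Jones paper writes the reweighting slightly differently (0/1 labels, decreasing the weights of correct examples, then normalising). All function names and data are hypothetical.

```python
import numpy as np


def stump_predict(X, feature, thr, polarity):
    """Weak classifier on one feature: +1 (face) or -1 (non-face)."""
    if polarity == 1:
        return np.where(X[:, feature] >= thr, 1, -1)  # face if value >= thr
    return np.where(X[:, feature] < thr, 1, -1)       # face if value < thr


def best_stump(X, y, w):
    """Search every feature, threshold and polarity for the lowest weighted error."""
    best, best_err = None, np.inf
    for feature in range(X.shape[1]):
        for thr in np.unique(X[:, feature]):
            for polarity in (1, -1):
                pred = stump_predict(X, feature, thr, polarity)
                err = w[pred != y].sum()
                if err < best_err:
                    best, best_err = (feature, thr, polarity), err
    return best, best_err


def adaboost(X, y, n_rounds):
    """Discrete AdaBoost; returns a list of (alpha, feature, threshold, polarity)."""
    w = np.full(len(y), 1.0 / len(y))          # equal weights to start with
    stumps = []
    for _ in range(n_rounds):
        (feature, thr, polarity), err = best_stump(X, y, w)
        err = max(err, 1e-10)                  # guard against a perfect stump (err = 0)
        alpha = 0.5 * np.log((1 - err) / err)  # lower error -> larger vote
        pred = stump_predict(X, feature, thr, polarity)
        w = w * np.exp(-alpha * y * pred)      # misclassified (y != pred) weights grow
        w = w / w.sum()                        # renormalise
        stumps.append((alpha, feature, thr, polarity))
    return stumps


def strong_classify(stumps, X):
    """Strong classifier: sign of the alpha-weighted sum of the weak classifiers."""
    score = sum(alpha * stump_predict(X, f, t, p) for alpha, f, t, p in stumps)
    return np.where(score >= 0, 1, -1)


# Toy data: rows are training windows, columns are two made-up feature values.
# Neither feature separates faces (+1) from non-faces (-1) on its own.
X = np.array([
    [0.9, 0.9],
    [0.8, 0.3],
    [0.2, 0.8],
    [0.4, 0.7],
    [0.1, 0.1],
    [0.3, 0.2],
])
y = np.array([1, 1, 1, -1, -1, -1])

stumps = adaboost(X, y, n_rounds=3)
print([f for _, f, _, _ in stumps])             # [0, 1, 1]
print((strong_classify(stumps, X) == y).all())  # True
```

After round 1 the only misclassified example (the third row) carries half of the total weight, which steers round 2 toward a stump that gets it right; three weak stumps together classify every training row correctly.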

So now you take an image, take each 24x24 window, apply 6000 features to it, and check whether it is a face or not. Wow... Isn't that a little inefficient and time-consuming? Yes, it is. The authors have a good solution for that.

Most of an image is non-face region, so it is a better idea to have a simple method to check whether a window is not a face region. If it is not, discard it in a single shot and don't process it again. Instead, focus on regions where there can be a face. This way, we spend more time checking possible face regions.

For this they introduced the concept of a Cascade of Classifiers. Instead of applying all 6000 features to a window, the features are grouped into different stages of classifiers and applied one by one. (Normally the first few stages contain very few features.) If a window fails the first stage, discard it; we don't consider the remaining features on it. If it passes, apply the second stage of features and continue the process. A window that passes all stages is a face region. How's that for a plan!

The authors' detector had 6000+ features in 38 stages, with 1, 10, 25, 25 and 50 features in the first five stages. (The two eye-region features described above are actually the best two features found by AdaBoost.) According to the authors, on average 10 features out of 6000+ are evaluated per sub-window.
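A minimal sketch of the cascade idea: each stage is a small boosted classifier (weighted weak classifiers plus a stage threshold), and a window is rejected as soon as one stage's score falls below its threshold. The stumps, weights, thresholds and the hypothetical helpers `make_stump` and `run_cascade` are made up for illustration; a real cascade, such as the XML file loaded into `faceCascade`, stores thousands of Haar features. `run_cascade` also counts the features it evaluated, which is the quantity behind the "about 10 features per sub-window" figure.

```python
import numpy as np


def make_stump(index, thr):
    """Hypothetical weak classifier: +1 if window[index] >= thr, else -1."""
    return lambda window: 1 if window[index] >= thr else -1


# Toy 3-stage cascade: each stage is (list of (alpha, weak classifier), stage threshold).
# Stages grow in size, loosely mirroring the small-first-stages pattern described above.
cascade = [
    ([(1.0, make_stump(0, 0.5))], 0.0),
    ([(0.6, make_stump(1, 0.5)), (0.4, make_stump(2, 0.5))], 0.0),
    ([(0.5, make_stump(3, 0.5)), (0.3, make_stump(4, 0.5)), (0.2, make_stump(5, 0.5))], 0.0),
]


def run_cascade(cascade, window):
    """Return (is_face, number_of_features_evaluated)."""
    evaluated = 0
    for weak_classifiers, stage_threshold in cascade:
        score = 0.0
        for alpha, h in weak_classifiers:
            score += alpha * h(window)
            evaluated += 1
        if score < stage_threshold:  # fails this stage: reject immediately
            return False, evaluated
    return True, evaluated           # passed every stage: report a face


background = np.array([0.1, 0.9, 0.9, 0.9, 0.9, 0.9])  # fails the first stage
face_like = np.full(6, 0.9)                            # passes every stage
print(run_cascade(cascade, background))  # (False, 1)
print(run_cascade(cascade, face_like))   # (True, 6)
```

Since most windows in an image look like `background`, they cost a single feature evaluation, and only the rare face-like windows pay for the full cascade.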

So this is a simple, intuitive explanation of how Viola-Jones face detection works. Read the paper for more details.

```python
# LBPH face recognizer; cv2.face is part of opencv_contrib (pip: opencv-contrib-python)
recognizer = cv2.face.LBPHFaceRecognizer_create()


def get_images(path):
    # Training set: every image except the held-out '.sad' ones, which are used
    # to test the accuracy of the recognizer. Assumes names like 'subject01.happy'
    # and skips any other files in the folder.
    image_paths = [os.path.join(path, f) for f in os.listdir(path)
                   if f.startswith('subject') and not f.endswith('.sad')]
    images = []
    labels = []

    for image_path in image_paths:
        # Read the image with PIL and convert it to 8-bit grayscale ('L' mode)
        image_gray = Image.open(image_path).convert('L')
        image = np.array(image_gray, 'uint8')  # PIL image -> NumPy array

        # The label is the subject number in the file name, e.g. 'subject01.happy' -> 1
        label = int(os.path.split(image_path)[1].split(".")[0].replace("subject", ""))
        faces = faceCascade.detectMultiScale(image)

        # For every detected face, append the face crop to images and its label to labels
        for (x, y, w, h) in faces:
            images.append(image[y: y + h, x: x + w])
            labels.append(label)
            cv2.imshow("Add faces to training set", image[y: y + h, x: x + w])
            cv2.waitKey(50)  # wait 50 ms so the window can refresh
    return images, labels
```

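`get_images` loops over the training images, detects the faces in each one, and appends every face crop and its label to our two lists. `cv2.imshow` shows each crop as it is added, and **cv2.waitKey**(50) pauses for 50 ms so the window can refresh.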
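`LBPHFaceRecognizer` describes each face with Local Binary Patterns Histograms: every pixel is compared with its 8 neighbours, the comparisons form an 8-bit code, the codes are histogrammed over a grid of cells, and the histograms are concatenated; prediction is a nearest-neighbour search over the training histograms. Below is a minimal NumPy sketch of that pipeline. It assumes the basic 3x3 operator and a symmetric chi-square distance; OpenCV's implementation samples neighbours on a circle controlled by `radius` and `neighbors` (defaults 1 and 8) over a `grid_x` x `grid_y` grid (defaults 8 x 8), and its exact histogram comparison may differ, so its numbers will not match. The helper names are hypothetical. In OpenCV, too, the `conf` value returned by `predict` is a distance to the nearest training histogram, so lower values mean closer matches.

```python
import numpy as np


def lbp_image(img):
    """Basic 3x3 LBP: each interior pixel gets an 8-bit code, one bit per neighbour
    (1 if the neighbour is >= the centre), read clockwise from the top-left neighbour."""
    img = img.astype(np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]  # shifted view, same shape as center
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes


def lbph_descriptor(img, grid_x=8, grid_y=8):
    """Concatenated 256-bin histograms of LBP codes over a grid_y x grid_x grid of cells."""
    codes = lbp_image(img)
    hists = []
    for band in np.array_split(codes, grid_y, axis=0):
        for cell in np.array_split(band, grid_x, axis=1):
            hists.append(np.bincount(cell.ravel(), minlength=256))
    return np.concatenate(hists).astype(np.float64)


def chi_square(h1, h2, eps=1e-10):
    """Symmetric chi-square distance between two histograms (0 when identical)."""
    return np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))


def predict(train_descriptors, train_labels, img):
    """Nearest-neighbour prediction: (label, distance); lower distance = closer match."""
    d = lbph_descriptor(img)
    dists = [chi_square(d, t) for t in train_descriptors]
    best = int(np.argmin(dists))
    return train_labels[best], float(dists[best])


# Two synthetic "subjects" (random textures stand in for face crops)
rng = np.random.RandomState(0)
face_a = rng.randint(0, 256, size=(64, 64))
face_b = rng.randint(0, 256, size=(64, 64))
train = [lbph_descriptor(face_a), lbph_descriptor(face_b)]
labels = [1, 2]

# face_a * 2 + 5 is a strictly monotonic gray-level change, so every LBP code is unchanged
print(predict(train, labels, face_a * 2 + 5))  # (1, 0.0)
```

The last line illustrates why LBP suits varying lighting: codes depend only on the ordering of neighbouring intensities, not on their absolute values.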


```python
path = '/tmp'

# Get the training faces and their labels
images, labels = get_images(path)
cv2.destroyAllWindows()

# Train the LBPH model
recognizer.train(images, np.array(labels))

# Test set: the held-out images with the .sad extension
image_paths = [os.path.join(path, f) for f in os.listdir(path)
               if f.startswith('subject') and f.endswith('.sad')]

for image_path in image_paths:
    predict_image_pil = Image.open(image_path).convert('L')
    predict_image = np.array(predict_image_pil, 'uint8')
    faces = faceCascade.detectMultiScale(predict_image)
    label_actual = int(os.path.split(image_path)[1].split(".")[0].replace("subject", ""))

    for (x, y, w, h) in faces:
        # conf is a distance: the lower it is, the closer the match
        label_predicted, conf = recognizer.predict(predict_image[y: y + h, x: x + w])

        if label_actual == label_predicted:
            print("{} is Correctly Recognized with confidence {}".format(label_actual, conf))
        else:
            print("{} is Incorrectly Recognized as {}".format(label_actual, label_predicted))
        cv2.imshow("Recognizing Face", predict_image[y: y + h, x: x + w])
        cv2.waitKey(1000)
```

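**Note**: running these cells in Google Colab fails with `DisabledFunctionError`, because `cv2.imshow` is disabled there. Either remove the `cv2.imshow`/`cv2.waitKey` calls or use `cv2_imshow` from `google.colab.patches` instead.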