# Deep Learning: Face Recognition
#### LinkedIn Learning with: Adam Geitgey

#### Face Recognition now widely available
* Researchers openly shared their solutions for building face recognition systems
* But even if you know how to build a face recognition system, you still need a large dataset of images of people to train it.
* Companies like Facebook and Google already have access to large data sets like this, since they have millions of users uploading photos of themselves every day
* Researchers outside of these big companies have to work a little bit harder to build their own training data sets from images posted publicly online 
    * Anyone with access to a large training data set of images can build a face recognition system.
* The end result is that face recognition is now available to almost anyone in all kinds of products

#### What can you do with face recognition?
* The simplest use of face recognition is to check whether a specific person you already know is present in an image
    * This is called **identity verification**
* To do this, you capture photographs of the person you want to recognize and you use those photographs to train the face recognition system
    * The system can recognize that person when it sees them again
    * **This can be used as an alternative to normal user login systems or key card entrance systems.**
    
***
* Face recognition can also be used to quickly sort through large collections of images.
* Face recognition has many applications in security and surveillance
    * Imagine that you work for the police: you could feed in a picture of a person of interest and have the system search raw surveillance footage and pull out any video clips where that suspect appears. You could even do this across hundreds of cameras, which lets you automatically track a person as they move between cameras
* Besides recognizing specific people that you know, face recognition can also be used to tell you when a new person appears
    * Count the number of unique people who appear in a video feed
    * **Some electronic billboards now contain cameras and use this technique to count the number of unique people that stop and look at the sign.** This info is used to measure the effectiveness of the advertisement
* Because face recognition systems are based on the model of how faces appear, they can be used to measure how similar (or not) two different people look
    * "Celebrity doppelganger"
  
#### 5 uses of face recognition
* Identity verification
* Automatically organizing raw photo libraries by person
* Tracking a specific person
* Counting unique people
* Finding people with similar appearances

### Tools for Face Recognition
   
#### Commercial Face Recognition Services
* Several large vendors provide face recognition APIs that you can use over the internet for a small fee
* They all work in similar ways and provide similar features but they are **trained with different data sets, meaning that their accuracy may be better or worse depending on your specific application.**
* **Some commonly used services are:**
    * **Amazon Rekognition API**
        * Face recognition, emotion detection, and motion tracking
    * **Microsoft Azure Face API**
        * Face recognition, age and gender detection, and face similarity matching 
* Both require an internet connection to use and you have to pay a small fee each time you use them
* Also, using those services requires uploading all your face data to a third party, since the face recognition happens on their computers in the cloud
* For some applications, it's better to be able to do face recognition without a data connection and without having to share your data with anyone
    * In these cases, open-source face recognition systems might be a better choice
    
#### Open-Source Face Recognition Tools
* Can run locally on your computer without any external connections and without sharing any data
* **Two of the most popular are:**
    * **OpenFace:**
        * Created by Brandon Amos and Carnegie Mellon University
    * **dlib:**
        * Created by Davis King
        * A general purpose ML and computer vision library that has lots of features 
        * The instructor of this course, **Adam Geitgey**, has written a Python library called **`face_recognition`** that makes it easier to use **dlib**
* Both are free and open-source  

## Face Recognition as a multi-step pipeline
* Recognizing a face is a complicated problem with several steps 
    * We have to chain together several different ML algorithms into a pipeline to complete the entire task of face recognition
    
### Step 1: Locate and Extract Faces
* Locate faces within the larger image
* Once we know where the face is located in the image, we'll extract that area as a new image
* This is the only part of the image that will pass to the next step of our face recognition pipeline (face detected within square).

### Step 2: Identify Facial Features
* Now we have a face image to work with, but we can't compare it directly to other face images 
* To be able to compare this face image with other faces, we need to understand **how the person's head was turned or posed when the photo was taken.** A person's face looks different from different angles. If we don't take the head position into account, our face recognition system will think that the same person is two different people just because their head was turned a different way


<img src='data/identify_facial_features.png' width="200" height="100" align="center"/>

* To do this, we'll **use a machine learning algorithm that can look at a single face and identify the location of each facial feature within the face.**
    * We'll look for the position of the:
        * Eyes
        * Nose
        * Mouth
* We pass the face image and the location of each facial feature to the next step of our face recognition pipeline

### Step 3: Align Faces Using Pose
* The next step is to try to correct for the position or pose of the person's head
* We know the position of each facial feature; **each person's face is unique, but we can assume that all faces follow roughly the same structure.**


<img src='data/align_with_pose.png' width="600" height="300" align="center"/>

* On the right is a face position template. This template shows the average position of facial features across lots of people, assuming that the person is looking directly forward
* By comparing the position of each point on the left with the position of each point on the template, we can guess how far the head is turned and in what direction it was turned
* Then, we can warp our face image to roughly match the template
* This is called **aligning the face** because we are making sure the key facial features in the image line up with the face template before we move on to the next step in our pipeline

### Step 4: Represent Face as Measurements
* Now that we have aligned the face image, we're ready to turn the face into a set of numbers or measurements that represent this unique face.
* Other pictures of the same person should generate measurements that are very close to these numbers
* We'll use a neural network that was trained on millions of faces to come up with its own way to measure faces

### Step 5: Compare to Other Faces
* Now we can compare to other images by processing them the same way.
* To compare two faces, we'll calculate how different the measurements are using a formula
* **Euclidean distance between faces:**
    * $d(face_1, face_2) = 0.2304$
    * This formula basically measures how far apart the two different sets of measurements are 
        * If the measurements are close, we'll call it a match
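
As a rough illustration of the comparison step, the Euclidean distance between two sets of measurements $a$ and $b$ is $d(a, b) = \sqrt{\sum_i (a_i - b_i)^2}$. The sketch below computes it in plain Python. The four-number encodings are made-up values for illustration only; real encodings are much longer and come from the trained neural network.

```python
import math

def euclidean_distance(a, b):
    """Return the straight-line distance between two equal-length measurement lists."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical 4-number encodings (illustration only).
face_1 = [0.10, -0.20, 0.05, 0.30]
face_2 = [0.12, -0.18, 0.02, 0.33]   # same person, slightly different photo
face_3 = [0.60, 0.40, -0.50, -0.10]  # someone else

print(euclidean_distance(face_1, face_2))  # small distance: likely a match
print(euclidean_distance(face_1, face_3))  # larger distance: likely different people
```

The smaller the number, the more alike the two sets of measurements are.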
        
        
#### Face Recognition Pipeline Steps
* Step 1: Locate and extract faces from each image
* Step 2: Identify facial features in each image
* Step 3: Align faces to match pose template
* Step 4: Encode faces using a trained neural network (aka **Face Encoding**)
* Step 5: Check Euclidean distance between face encodings

# Unit 3: Face Detection

#### What is face detection?
* **Face detection** is the ability to detect and locate human faces in a photograph
* We use face detection here to extract each face from a photograph and pass it to the next step in our face recognition pipeline

### Step 1: Sliding Window Classifier
* The easiest way to locate objects in an image is to build a **sliding window classifier.**
* Two steps:
    * 1) Train a simple face detector: an ML model that decides whether a small image patch contains a face
    * 2) Slide the simple face detector across a larger image 
* When a face is detected, we record the location of the face
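
The two steps above can be sketched in a few lines of plain Python. This is a toy sketch, not the course's code: the image is a small grid of numbers, and `toy_classifier` is a hypothetical stand-in for a trained face detector (it just checks whether every pixel in the patch is 1).

```python
def sliding_windows(image_width, image_height, window_size, step):
    """Yield (left, top, right, bottom) boxes that cover the image."""
    for top in range(0, image_height - window_size + 1, step):
        for left in range(0, image_width - window_size + 1, step):
            yield (left, top, left + window_size, top + window_size)

def detect(image, window_size, step, looks_like_face):
    """Run the classifier on every window and record where it fires."""
    height, width = len(image), len(image[0])
    hits = []
    for left, top, right, bottom in sliding_windows(width, height, window_size, step):
        patch = [row[left:right] for row in image[top:bottom]]
        if looks_like_face(patch):
            hits.append((left, top, right, bottom))
    return hits

def toy_classifier(patch):
    """Hypothetical stand-in for a real face detector: fires on all-1 patches."""
    return all(value == 1 for row in patch for value in row)

toy_image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(detect(toy_image, window_size=2, step=1, looks_like_face=toy_classifier))
# → [(1, 1, 3, 3)]
```

A real detector would usually also repeat this at several image scales so that faces of different sizes fit the window.
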
#### Face detection algorithms we can use for sliding window classifier:
* Three of the most common are:
    * **Viola-Jones**
        * Uses decision trees to detect faces based on light and dark areas
        * Developed early 2000s
        * Pro: Very fast and great for low-powered devices
        * Con: Not very accurate; tends to have a lot of FPs (false positives)
        * No real reason to use this anymore unless you're working with very low-powered devices
    * **Histogram of oriented gradients (HOG)**
        * Invented 2005
        * Looks for shifts from light to dark areas in an image
        * Slower than Viola-Jones, but more accurate
        * Runs well on normal computers without special hardware
        * **This is the algorithm we'll use to build our face recognition system in this course.**
    * **CNNs**
        * Uses a deep neural network to detect faces
        * Very accurate (**MOST accurate**), but **requires a lot of training data.**
        * Runs best on computers with dedicated GPUs
        * It will run very slowly otherwise: need to have right hardware
* Important to remember that face detection is its own separate step in our face recognition pipeline

### Analyzing an Image as a Histogram of Oriented Gradients (HOG)
#### Step 1: Convert to black and white
* HOG algorithm only looks at differences between light and dark areas in an image; it doesn't need color information
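
A minimal sketch of this conversion, assuming each pixel is an `(r, g, b)` tuple with values from 0 to 255. The weights used here are the common BT.601 luma weights; the notes don't say which conversion the course uses, so treat them as an assumption.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels (0-255) to rows of brightness values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]

# Tiny 2x2 image: white, black, pure red, pure blue.
tiny = [[(255, 255, 255), (0, 0, 0)],
        [(255, 0, 0), (0, 0, 255)]]
print(to_grayscale(tiny))
# → [[255, 0], [76, 29]]
```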

<img src='data/hog1.png' width="400" height="200" align="center"/>

* Our goal is to measure how dark a given pixel is in comparison to the pixels surrounding it and **find the direction where the biggest change happens**
* In this case, we can see that the pixel to the left of the center pixel is much lighter than the center pixel, and the pixel to the right is darker than this pixel
* In other words, at this exact point, the image is transitioning from a light area to a darker area
* Based on that, we draw an arrow on top of this pixel that points from left to right
* **This shows the movement of lighting at this exact point.**

<img src='data/hog2.png' width="350" height="125" align="center"/>

* If we repeat this process for every single pixel in the image, **the image turns into a map of transitions from light to dark areas.**
* These lines are called **gradients.**
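
To make the arrow-drawing idea concrete, here is a small sketch that compares a pixel with its four neighbors and reports the direction of the strongest light-to-dark change. It assumes a grayscale image stored as rows of brightness values (higher = lighter) and follows the light-to-dark convention used in these notes. Angles are measured in image coordinates, so 0° points right and 90° points down.

```python
import math

def gradient_at(gray, row, col):
    """Return (magnitude, angle_degrees) of the light-to-dark change at one pixel."""
    # Positive dx: the left neighbor is lighter than the right one (arrow points right).
    dx = gray[row][col - 1] - gray[row][col + 1]
    # Positive dy: the neighbor above is lighter than the one below (arrow points down).
    dy = gray[row - 1][col] - gray[row + 1][col]
    magnitude = math.hypot(dx, dy)
    angle = math.degrees(math.atan2(dy, dx)) % 360
    return magnitude, angle

# Light on the left, dark on the right: the arrow should point left to right.
patch = [
    [200, 150, 100],
    [200, 150, 100],
    [200, 150, 100],
]
print(gradient_at(patch, 1, 1))
# → (100.0, 0.0)
```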

<img src='data/hog3.png' width="350" height="125" align="center"/>

* Each gradient shows how the image flows from a light area to a dark area at that point.
* Now let's zoom back out and see what the gradient map looks like:

<img src='data/hog4.png' width="400" height="200" align="center"/>
<img src='data/hog5.png' width="400" height="200" align="center"/>

* The **gradient map** is a simplified version of the original image, but it's still pretty complex
* Capturing the gradient for every single pixel is more detail than we need
* To detect faces, all we really need is to detect the overall structure of the image
* In other words, we can simplify this representation further
* Instead of keeping track of each separate gradient within the image, we'll just store a count of how many gradients point in each direction
* The original image is now a simple representation that captures the basic structure:

<img src='data/hog6.png' width="400" height="200" align="center"/>

* We can use this simplified representation to easily train a face detection model.
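
The counting step can be sketched as follows. This is a simplified illustration, not a full HOG implementation: it builds one histogram for a whole patch, uses 8 direction bins (an arbitrary choice), and counts each gradient once. Real HOG implementations typically split the image into small cells, weight by gradient strength, and normalize.

```python
import math

def orientation_histogram(gray, bins=8):
    """Count how many interior pixels have a light-to-dark gradient in each direction bin."""
    counts = [0] * bins
    bin_width = 360 / bins
    for row in range(1, len(gray) - 1):
        for col in range(1, len(gray[0]) - 1):
            dx = gray[row][col - 1] - gray[row][col + 1]
            dy = gray[row - 1][col] - gray[row + 1][col]
            if dx == 0 and dy == 0:
                continue  # flat region: no direction to record
            angle = math.degrees(math.atan2(dy, dx)) % 360
            counts[int(angle // bin_width) % bins] += 1
    return counts

# 4x4 patch that fades from light (left) to dark (right).
patch = [
    [250, 200, 150, 100],
    [250, 200, 150, 100],
    [250, 200, 150, 100],
    [250, 200, 150, 100],
]
print(orientation_histogram(patch))
# → [4, 0, 0, 0, 0, 0, 0, 0]

# Making every pixel a little brighter leaves the histogram unchanged.
brighter = [[value + 5 for value in row] for row in patch]
print(orientation_histogram(brighter) == orientation_histogram(patch))
# → True
```

Because only differences between neighboring pixels matter, a uniform change in brightness does not alter the result.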

### Finding faces in images with HOG features 
#### Step 1: Collecting training data
* Convert data to HOG representations
#### Step 2: Train face classifier on HOG faces
* HOG face detectors can perform well with a fairly small amount of training data
#### Step 3: Sliding window classifier on HOG
* Any part of the image that returns true is a part of the image that contains a face
* HOG is a simplified representation of an image that still captures enough detail to detect faces 
* A HOG representation is not affected by small changes in lighting
* A HOG representation is not affected by small changes in an object's shape

### Coding face detection
* We'll be using a pre-trained HOG face detector to detect all the faces that appear in an image
* Because most human faces have roughly the same structure, the pretrained face detection model will work well for almost any image
* **There's no need to train a new one from scratch**
* **PIL** is the **Python Imaging Library**: it lets us easily display an image on the screen, and draw lines on top of the image
* `face_recognition` is the library that gives us access to the face detection model in `dlib`

* **Note:** the `PIL` library works with images in its own internal format
    * So, we need to convert the image array into a `PIL` formatted image
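
A minimal sketch of how these pieces might fit together, assuming the `face_recognition` and `Pillow` packages are installed. The function names `to_pil_box` and `draw_face_boxes` and the file name `people.jpg` are hypothetical and not from the course. Note that `face_recognition` reports each face as `(top, right, bottom, left)`, while `PIL` drawing calls expect `(left, top, right, bottom)`.

```python
def to_pil_box(face_location):
    """Reorder face_recognition's (top, right, bottom, left) into PIL's (left, top, right, bottom)."""
    top, right, bottom, left = face_location
    return (left, top, right, bottom)

def draw_face_boxes(image_path):
    """Detect faces in an image file and return a PIL image with a red box around each one."""
    # Imported here so the helper above works even without these packages installed.
    import face_recognition
    from PIL import Image, ImageDraw

    image = face_recognition.load_image_file(image_path)     # image as a NumPy array
    face_locations = face_recognition.face_locations(image)  # pre-trained HOG detector by default

    pil_image = Image.fromarray(image)                       # convert the array into a PIL image
    draw = ImageDraw.Draw(pil_image)
    for location in face_locations:
        draw.rectangle(to_pil_box(location), outline="red")
    return pil_image

# Usage (assumes a file named people.jpg exists next to the script):
# draw_face_boxes("people.jpg").show()
```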
    
# Chapter 4: Facial Feature Detection
### What is face landmark estimation?
* **Face Landmark Estimation:** Finding the location of key points on a face, such as the tip of the nose and center of each eye.
    * Works by starting with a known set of points that should appear on any face
    * It then moves those points around until they match the face image
    * The (orange) landmark model shown above is called a **68 point face landmark model**
    * To make face recognition systems run a little more quickly, we can also use a face landmark model with fewer points (like a **5 point model**, which only detects the edges of each eye and the bottom of the nose)
    
* Social media lenses that add snap filters or make-up to your face use face landmark models
* These applications work by first detecting the face landmarks and then using those points to overlay clothes or makeup in the right place
* The main use for face landmark estimation is: **Face Alignment**, where we correct for head rotation when doing face recognition

### Identifying Face Landmarks with an ML Model
* **Trick 1:** Assume all human faces are similar and roughly the same shape
    * Overlay the entire face template on the face, and then we'll only ask the computer to move and adjust the template so that each point is closer to the right point 
    
* **Trick 2:** Limit Movement of Each Point
    * Add the constraint of how much the computer can move each point
    * The rule is that no single point can be moved too far from its neighboring points
    * Notice that the below landmark points move a little from the original template, but no point moves too far from its neighboring points
    
<img src='data/landmarks1.png' width="300" height="150" align="center"/>

* **Trick 3:** Fine tune with multiple models:
    * Split the job of completely fitting the face template onto the face into smaller problems
    * In other words, train several different ML models that each do part of the job
    * Subsequent models only need to learn to fix the mistakes of past models, making each of their jobs easier (because no one model needs to learn the entire process of fitting the template to the face, they just need to learn to make improvements)
    * This process continues with as many as 10 models
    
#### Automatic Face Landmark Estimation
* Once this cascade of face landmark models is trained, it should work for pretty much any face
* So, in our code, we won't have to train our own model from scratch
* We can just use a standard pre-trained model that should work for all of our images
* **Understanding how the model works allows you to understand its limitations**

### Posing faces based on face landmarks
* **Face alignment:** is where we adjust each face image so that key facial features (like the eyes, nose, and mouth) line up with a predefined template
* Correcting for head angle and rotation will make our face recognition system more accurate

#### Steps of face alignment
* Detect face landmarks
* Calculate affine transformation
    * **Affine transformation:** A linear mapping (plus a shift) between sets of points where parallel lines will remain parallel
    * Basically, we can move, rotate, and stretch our image, but we can't do more complex things like twisting or warping
    * We don't have to write any code to do this alignment; the `face_recognition` library will do this for us
* Each face in the list generated by `.face_landmarks` will be a Python dictionary object where the keys are the names of the facial features (`"left_eye"`, `"right_eye"`, `"chin"`, etc.) and the values are lists of `(x, y)` coordinates of the points that correspond to that facial feature
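
To show what "move, rotate, and stretch" means in practice, here is a small sketch that takes a landmarks dictionary in the shape described above (using the library's `left_eye` and `right_eye` key names) and builds a transform that lines the eyes up with a template. The landmark values, the template positions, and the function names are all made up for illustration. It computes a similarity transform (rotation, uniform scaling, and shift), which is one simple kind of affine transformation, and is not necessarily how the library does its own alignment.

```python
def center(points):
    """Average a list of (x, y) points into a single (x, y) center."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def eye_alignment_transform(landmarks, template_left, template_right):
    """Return a function that maps image points so the eye centers land on the template.

    Points are treated as complex numbers, so z -> a * z + b rotates,
    scales uniformly, and shifts the point.
    """
    src_left = complex(*center(landmarks["left_eye"]))
    src_right = complex(*center(landmarks["right_eye"]))
    dst_left = complex(*template_left)
    dst_right = complex(*template_right)

    a = (dst_right - dst_left) / (src_right - src_left)  # rotation + scale
    b = dst_left - a * src_left                          # shift

    def apply(point):
        z = a * complex(*point) + b
        return (z.real, z.imag)
    return apply

# Hypothetical landmarks for a tilted head (right eye lower than left eye).
sample = {
    "left_eye": [(30, 40), (34, 40), (32, 42), (32, 38)],
    "right_eye": [(60, 60), (64, 60), (62, 62), (62, 58)],
}
align = eye_alignment_transform(sample, template_left=(30, 30), template_right=(70, 30))
print(align(center(sample["left_eye"])))   # lands on (or extremely close to) (30, 30)
print(align(center(sample["right_eye"])))  # lands on (or extremely close to) (70, 30)
```

Applying `align` to every landmark (or every pixel position) levels the eyes without bending any straight lines.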

### Representing a face as a set of measurements
* The most important step in our face recognition pipeline: telling faces apart from each other
* The problem: We have a set of known faces in a database and an unknown face we'd like to identify
* The simplest approach is to take the unknown face and compare it to the known faces one by one

<img src='data/face_comp.png' width="400" height="200" align="center"/>

#### Comparing images doesn't work 
* Too slow (the bigger the database the slower)
* Doesn't capture the structure of each face
    * Different positions will throw off the comparison
    * Different backgrounds and clothing and hairstyles will throw off the model
    
* Solution: Representing faces as measurements
<img src='data/face_meas.png' width="400" height="200" align="center"/>

<img src='data/face_meas2.png' width="400" height="200" align="center"/>
<img src='data/face_meas3.png' width="400" height="200" align="center"/>

* **Face encoding:** The process of taking an image of a face and turning it into a set of measurements
* A real face-encoding system will capture a large number of face measurements (typically 128 or more)
* Instead of trying to decide on 128 ways to measure a face, we'll use ML to create those measurements
* **Deep Metric Learning:** Using deep learning to have a computer come up with a way to measure something that you don't know how to measure yourself
* **Training Triplets**: Two different pictures of the same person, and one picture of a different person. The goal is for the computer to find a set of measurements that keeps the two pictures of the same person closer to each other than either is to the picture of someone else
* **Because a trained model should work for any picture of any person, we only have to train the face encoding model once.**
* When we write the code for our face recognition system, we'll be using a pre-trained model instead of training one from scratch. In most cases, this will work fine and you won't ever need to retrain your own face encoding model 
* **Model interpretability is a common problem in machine learning**
* **When models are hard to interpret, they can often have hidden biases. Watch out for this!**
* **In the face recognition model, there's often a hidden bias for the model to be more accurate for people from one region of the world than another.**
* **Face distance threshold:** set a maximum face distance that is still considered the same face. For our exercises, we use 0.6
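
The training-triplet idea above can be written as a simple score that training tries to push to zero. The sketch below uses one common form of a triplet loss. The `margin` value of 0.2 and the two-number encodings are assumptions for illustration, and the notes don't say which exact formula the pre-trained model was trained with.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length measurement lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero when the same-person pair is closer than the different-person pair by at least `margin`."""
    return max(0.0, distance(anchor, positive) - distance(anchor, negative) + margin)

# Hypothetical 2-number encodings.
anchor = [0.0, 0.0]            # picture 1 of person A
good_positive = [0.1, 0.0]     # picture 2 of person A, already close
bad_positive = [0.9, 0.0]      # picture 2 of person A, still too far away
other_person = [0.5, 0.0]      # picture of person B

print(triplet_loss(anchor, good_positive, other_person))  # 0.0: nothing left to fix
print(triplet_loss(anchor, bad_positive, other_person))   # positive: measurements need adjusting
```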

#### Advantages of Using Euclidean Distance
* Fast to calculate and easy to parallelize
* Works nicely with other common ML algorithms like KNN
* Makes it easy to store and query measurements using a standard database

* For faces of known people, we want to make sure there is only one person in the picture, that they are facing the camera, and that they are clearly visible in reasonable lighting
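
Putting the known-face database and the 0.6 distance threshold together, identification can be sketched as a nearest-neighbor lookup. The names and three-number encodings below are hypothetical, and real encodings would come from the pre-trained face encoding model.

```python
import math

def face_distance(a, b):
    """Euclidean distance between two equal-length face encodings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(unknown, known_faces, threshold=0.6):
    """Return the name of the closest known face, or None if nothing is within the threshold."""
    best_name, best_distance = None, float("inf")
    for name, encoding in known_faces.items():
        d = face_distance(unknown, encoding)
        if d < best_distance:
            best_name, best_distance = name, d
    return best_name if best_distance <= threshold else None

# Hypothetical database of known faces.
known_faces = {
    "alice": [0.1, 0.2, 0.3],
    "bob": [0.9, 0.8, 0.7],
}
print(identify([0.15, 0.25, 0.30], known_faces))  # → alice
print(identify([5.0, 5.0, 5.0], known_faces))     # → None
```

Lowering the threshold makes matching stricter (fewer false matches, more missed matches), and raising it does the opposite.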

<img src='data/hog6.png' width="400" height="200" align="center"/>