# Faster Facial Landmark Detection

### How to speed up the facial landmark detector

Dlib has a very good implementation of a very fast facial landmark detector. However, you sometimes hear people complain that Dlib’s facial landmark detector is slow. Out of the box it can appear slow, but that is not because of a bad implementation of the landmark detector.

Dlib’s facial landmark detector implements a [paper](http://www.csc.kth.se/~vahidk/papers/KazemiCVPR14.pdf) that can detect landmarks in just 1 millisecond! That is 1000 frames a second. In practice you will never get 1000 fps, because the landmark detector is not the bottleneck.

Optimizing code for speed starts with finding the bottlenecks. I sometimes get emails asking which parameters to choose while training Dlib’s landmark detector to make it faster. That is the wrong place to optimize: you can indeed make the landmark detector faster by tuning training parameters, but it will make almost no difference to the final product. Even if you make it twice as fast, code that runs in 1 millisecond will run in 0.5 milliseconds, a saving of half a millisecond per frame.
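To make the argument concrete, here is a minimal sketch of the arithmetic. The 30 ms face detection time is an assumed figure chosen for illustration (it falls inside the 15 to 60 ms range quoted later), and `frames_per_second` is a hypothetical helper, not part of Dlib or OpenCV.

```python
def frames_per_second(stage_ms):
    """Throughput of a pipeline whose stages run one after another."""
    total_ms = sum(stage_ms.values())
    return 1000.0 / total_ms


# Assumed per-frame timings in milliseconds (illustrative only).
baseline = {"face_detection": 30.0, "landmark_detection": 1.0}
faster_landmarks = {"face_detection": 30.0, "landmark_detection": 0.5}
faster_detection = {"face_detection": 15.0, "landmark_detection": 1.0}

fps_baseline = frames_per_second(baseline)
fps_landmarks = frames_per_second(faster_landmarks)
fps_detection = frames_per_second(faster_detection)

print(f"baseline:                 {fps_baseline:.1f} fps")
print(f"landmarks twice as fast:  {fps_landmarks:.1f} fps")
print(f"detection twice as fast:  {fps_detection:.1f} fps")
```

Under these assumptions, halving the landmark time moves the frame rate by well under one frame per second, while halving the face detection time nearly doubles it.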

Let’s find out the bottlenecks and how to improve the speed.

### Compile Dlib in Release Mode with Optimizations turned on

As mentioned in Dlib’s [documentation](http://dlib.net/faq.html#Whyisdlibslow), it is critical to compile Dlib in release mode with the appropriate compiler optimizations turned on. The instructions in section 1.1 have been adapted from Dlib’s website and included for your convenience.

### Speed Up Face Detection

As you have seen in the previous section, landmark detection is a two-step process. First, the faces are detected in an image, and then the landmark detector is run inside each face bounding box.

The landmark detector runs in 1 millisecond. The face detector, depending on the size of the image, can take anywhere from 15 to 60 milliseconds or even more. Face detection is the biggest bottleneck that needs to be addressed.

The following steps will help speed up face detection with a small (probably negligible) loss in accuracy.
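One way to confirm where the time goes on your own machine is to time each stage separately. The sketch below uses only the standard library; `time_stage`, `fake_detect`, and `fake_landmarks` are hypothetical stand-ins, with `time.sleep` simulating work. In a real measurement you would pass your actual detector and predictor calls instead.

```python
import time


def time_stage(fn, *args, repeats=5):
    """Average wall-clock time of fn(*args) in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) * 1000.0 / repeats


# Stand-ins for the real calls (assumptions for illustration only).
def fake_detect(frame):
    time.sleep(0.02)   # pretend face detection takes about 20 ms
    return [(0, 0, 10, 10)]


def fake_landmarks(frame, box):
    time.sleep(0.001)  # pretend landmark detection takes about 1 ms
    return []


detect_ms = time_stage(fake_detect, None)
landmark_ms = time_stage(fake_landmarks, None, (0, 0, 10, 10))

print(f"face detection:     {detect_ms:.1f} ms per call")
print(f"landmark detection: {landmark_ms:.1f} ms per call")
```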

### Resize Frame

Facial landmark detection algorithms require the user to provide a bounding box containing a face. The algorithm takes the image and this box as input and returns the landmarks. The speed of face detection depends on the resolution of the image, because with lower-resolution images you look for a smaller range of face sizes. The downside is that you will miss smaller faces, but in many applications there is just one person looking at the camera / webcam.

An easy way to speed up face detection is to resize the frame. My webcam records video at 720p (i.e. 1280×720), and I resize each frame to a fixed height, scaling the width by the same factor. The bounding box found on the small frame must then be mapped back to the original frame: if the frame was shrunk by a factor of s, the box coordinates are multiplied by s. This allows us to do landmark detection at full resolution.
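The mapping between the small frame and the full frame is simple arithmetic, sketched below. Plain tuples stand in for `dlib.rectangle`, and `resize_scale` and `scale_rect` are hypothetical helper names that mirror the `RESIZE_SCALE` computation used later in the code.

```python
def resize_scale(frame_height, target_height):
    """Factor by which the full frame is shrunk for face detection."""
    return float(frame_height) / target_height


def scale_rect(rect, scale):
    """Map a (left, top, right, bottom) box from the small frame
    back to full-resolution coordinates."""
    left, top, right, bottom = rect
    return (int(left * scale), int(top * scale),
            int(right * scale), int(bottom * scale))


scale = resize_scale(720, 480)                        # 720p frame, 480 px target
small_size = (int(1280 / scale), int(720 / scale))    # (width, height) of small frame
full_rect = scale_rect((100, 80, 300, 280), scale)    # box found on the small frame

print(scale, small_size, full_rect)
```

With a 720p frame and a target height of 480, the scale is 1.5, so a box detected at (100, 80, 300, 280) on the small frame corresponds to (150, 120, 450, 420) on the original.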

### Skip frame

Webcams typically record video at 30 fps. In a typical application you are sitting right in front of the webcam and not moving much, so there is no need to detect the face in every frame. We can simply run facial landmark detection using the face bounding box obtained a few frames earlier. If you run face detection only every 3 frames, you have sped up the overall landmark detection by almost three times.

Is it possible to do better than reusing the previous location of the face? Yes, we can use object tracking methods to track the location of the face in frames where detection is not done, but in a webcam / selfie application that is overkill.
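The skipping logic itself is small. In this sketch, `run_with_skipping` and `fake_detect` are hypothetical names, and the detector is a stand-in that just records which frame it ran on, so the example needs no camera or model.

```python
def run_with_skipping(num_frames, skip_frames, detect):
    """Call detect only on every skip_frames-th frame and reuse
    the most recent result on the frames in between."""
    faces = []
    detections = 0
    used = []
    for index in range(num_frames):
        if index % skip_frames == 0:
            faces = detect(index)
            detections += 1
        # Landmark detection would run here on every frame, using `faces`.
        used.append(faces)
    return detections, used


def fake_detect(index):
    # Stand-in for a real face detector (assumption for illustration).
    return [("face_from_frame", index)]


detections, used = run_with_skipping(9, 3, fake_detect)
print(detections)   # number of times the detector actually ran
print(used[4])      # frame 4 reuses the boxes found on frame 3
```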

### Optimizing Display

A third of the time was spent drawing the landmarks and displaying the frame. In a real-world application, you should never use HighGUI. The platform you work with usually has its own methods for capturing and rendering frames, and you can use different threads for processing frames and displaying them.

By default, imshow with waitKey slows down execution because rendering the output to the screen happens in the same thread as the processing.
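The idea of separating processing from display can be sketched as a producer and a consumer connected by a queue. This is a minimal illustration using only the standard library: `process_frames` and `display_frames` are hypothetical stand-ins, integers play the role of frames, and no actual rendering API is called.

```python
import queue
import threading


def process_frames(frames, out_queue):
    """Producer: stand-in for face detection plus landmark detection."""
    for frame in frames:
        out_queue.put(frame * 2)   # pretend this is the annotated frame
    out_queue.put(None)            # sentinel: no more frames


def display_frames(in_queue, shown):
    """Consumer: stand-in for rendering frames to the screen."""
    while True:
        item = in_queue.get()
        if item is None:
            break
        shown.append(item)


# A small bounded queue keeps the producer from running far ahead.
frame_queue = queue.Queue(maxsize=4)
shown = []

worker = threading.Thread(target=process_frames, args=(range(5), frame_queue))
viewer = threading.Thread(target=display_frames, args=(frame_queue, shown))
worker.start()
viewer.start()
worker.join()
viewer.join()

print(shown)
```

The bounded queue is a design choice: if display falls behind, the processing thread blocks instead of piling up stale frames.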

### Resize Frame for Display

We resize the image to half resolution for display. This makes a huge difference because when the resolution is changed from 720p to 360p, the actual number of pixels that need to be displayed goes down by a factor of 4.
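The factor of 4 follows directly from halving both dimensions, as this short check shows (the variable names are illustrative):

```python
full = (1280, 720)                      # 720p: width, height
half = (full[0] // 2, full[1] // 2)     # 360p: (640, 360)

full_pixels = full[0] * full[1]
half_pixels = half[0] * half[1]
ratio = full_pixels / half_pixels

print(full_pixels, half_pixels, ratio)
```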

### Speed Up Code

```python
import sys

import cv2
import dlib
from renderFace import renderFace
```

```python
PREDICTOR_PATH = "shape_predictor_68_face_landmarks.dat"
```

#### Set parameters for resizing and skipping frames

```python
RESIZE_HEIGHT = 480
SKIP_FRAMES = 2
```

#### Initialize the video capture device

```python
# Name of the imshow window
winName = "Fast Facial Landmark Detector"

# Create a VideoCapture object
cap = cv2.VideoCapture(0)

# Check if OpenCV is able to read the feed from the camera
if not cap.isOpened():
    print("Unable to connect to camera")
    sys.exit()

# Just a placeholder. The actual value is calculated after 100 frames.
fps = 30.0

# Get the first frame
ret, im = cap.read()
```

#### Resize the input frame

```python
# We will use a fixed-height image as input to the face detector
if not ret:
    print("Unable to read frame")
    sys.exit()

height = im.shape[0]
# Factor by which the frame is shrunk before face detection
RESIZE_SCALE = float(height) / RESIZE_HEIGHT
# (height, width) of the full-resolution frame
size = im.shape[0:2]
```

#### Set up face detector and landmark detector 

```python
# Load the face detector and the landmark (shape) predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(PREDICTOR_PATH)
# Initialize the tick counter and the frame counter used for the fps estimate
t = cv2.getTickCount()
count = 0
```

#### Loop over the video and display the result 

The main thing to note in this loop is that we go over every frame but run face detection only once every SKIP_FRAMES frames. On the frames in between, landmark detection reuses the most recent face rectangles.

We also resize the output frame so that rendering the video takes less time.

```python
# Grab and process frames until the user presses ESC or the camera stops.
while True:
    if count == 0:
        t = cv2.getTickCount()

    # Grab a frame; stop if the camera returns nothing
    ret, im = cap.read()
    if not ret:
        break
    # Convert from OpenCV's BGR order to RGB for dlib
    imDlib = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)

    # Create imSmall by shrinking the frame by RESIZE_SCALE
    imSmall = cv2.resize(im, None, fx=1.0/RESIZE_SCALE, fy=1.0/RESIZE_SCALE,
                         interpolation=cv2.INTER_LINEAR)
    imSmallDlib = cv2.cvtColor(imSmall, cv2.COLOR_BGR2RGB)

    # Run face detection only once every SKIP_FRAMES frames.
    # This value should be set depending on your system hardware
    # and camera fps. To reduce computation, increase it.
    if count % SKIP_FRAMES == 0:
        faces = detector(imSmallDlib, 0)

    # Iterate over the most recently detected faces
    for face in faces:
        # Face detection ran on the resized image, so scale the
        # rectangle back up to full-resolution coordinates
        newRect = dlib.rectangle(int(face.left() * RESIZE_SCALE),
                                 int(face.top() * RESIZE_SCALE),
                                 int(face.right() * RESIZE_SCALE),
                                 int(face.bottom() * RESIZE_SCALE))

        # Find face landmarks inside the rectangle of each face
        shape = predictor(imDlib, newRect)
        # Draw facial landmarks
        renderFace(im, shape)

    # Overlay the fps at which we are processing the camera feed
    cv2.putText(im, "{0:.2f}-fps".format(fps), (50, size[0]-50),
                cv2.FONT_HERSHEY_COMPLEX, 1.5, (0, 0, 255), 3)
    # Display at half resolution to reduce rendering time
    imDisplay = cv2.resize(im, None, fx=0.5, fy=0.5)
    cv2.imshow(winName, imDisplay)
    # Wait briefly for a keypress
    key = cv2.waitKey(1) & 0xFF

    # Leave the loop on ESC so the cleanup below still runs
    if key == 27:
        break

    # Increment the frame counter
    count = count + 1
    # Recalculate fps at an interval of 100 frames
    if count == 100:
        t = (cv2.getTickCount() - t) / cv2.getTickFrequency()
        fps = 100.0 / t
        count = 0

cv2.destroyAllWindows()
cap.release()
```
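The fps bookkeeping in the loop (reset a timer, count 100 frames, divide) can also be pulled out into a small helper. The sketch below is one possible way to do that. `FpsCounter` is a hypothetical class, not part of OpenCV or Dlib, and it uses `time.perf_counter` instead of `cv2.getTickCount` so that it runs without OpenCV. The clock is injectable so the demo can use fake timestamps.

```python
import time


class FpsCounter:
    """Recompute frames per second once every `window` frames."""

    def __init__(self, window=100, clock=time.perf_counter):
        self.window = window
        self.clock = clock
        self.count = 0
        self.start = clock()
        self.fps = 0.0   # stays 0.0 until the first window completes

    def tick(self):
        """Call once per processed frame; returns the latest estimate."""
        self.count += 1
        if self.count == self.window:
            elapsed = self.clock() - self.start
            if elapsed > 0:
                self.fps = self.window / elapsed
            self.count = 0
            self.start = self.clock()
        return self.fps


# Demo with fake timestamps: 4 frames take 2 seconds in total.
fake_times = iter([0.0, 2.0, 2.0])
counter = FpsCounter(window=4, clock=lambda: next(fake_times))
for _ in range(4):
    fps = counter.tick()

print(fps)
```

In the main loop, a single `counter.tick()` call per frame would replace the manual `count` and `t` updates.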