## Intro ##

While the focus of this notebook is to detect a face in a picture, I'll also try to cover some basics, to get a better understanding of Python.

### Running Terminal Commands ###

Jupyter Notebook is so cool that it lets you run terminal commands directly in its cells. To run a command, just open a new cell and put a `!` (exclamation mark) at the start of the line, like this:

In [None]:
!mkdir bogdan
!touch bogdan/dobrica.py

- The first command creates a folder named `bogdan` in the same folder as this notebook; `mkdir` comes from `make directory` (directory = folder).
- The second command creates an empty Python script file named `dobrica.py` inside the `bogdan` folder. Actually, the `touch` command means *change the date & time of the file passed as argument, but if the file doesn't exist, create a new one*.
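
If you'd rather not depend on shell commands (they behave differently on Windows, for example), the same two steps can be sketched in plain Python with the standard `pathlib` module. This is only an illustration: it works in a temporary scratch folder, an assumption made here so the sketch doesn't touch the notebook's own folder.

```python
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())      # hypothetical scratch folder, not the notebook folder
package_dir = base / 'bogdan'
package_dir.mkdir(exist_ok=True)     # same idea as: mkdir bogdan
script = package_dir / 'dobrica.py'
script.touch()                       # same idea as: touch bogdan/dobrica.py

print(script.exists(), script.stat().st_size)  # the file exists and is empty
```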

### Modules ###

Modules are pieces of software written by others that we can reuse and do things the lazy way. Strictly speaking, a module is a single Python script, and a folder that groups several of them is called a package. Let me show you. Remember the folder and file we've just created? You can use them as a (useless) package and module in this notebook:

In [None]:
from bogdan import dobrica

You can check if the module is loaded correctly using the `dir` function applied directly on the name of the module. It will show whatever the module contains.

In [None]:
dir(dobrica)

Whoa! Even useless modules have things inside. Those are put there by default and you can access them with dots, like this:

In [None]:
dobrica.__file__

Last, let's make the `dobrica` module do something. Let's edit the file and add a variable inside. You can use the Jupyter Notebook editor to add this line to the `dobrica.py` file:
```python
is_doing = 'good'
```
Unfortunately, modules are cached, so the change is not visible until the module is loaded again. To see that the change works, restart the `Kernel` in Jupyter Notebook and run the import again:

In [None]:
from bogdan import dobrica
dir(dobrica)

Now `is_doing` appears in the list. You can access it using the dot:

In [None]:
dobrica.is_doing
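
Restarting the kernel always works, but it also throws away every other variable. A lighter alternative is the standard `importlib.reload` function. Here's a minimal sketch of it, assuming a throwaway module called `dobrica_demo` created in a temporary folder (both are hypothetical, used only to keep the example self-contained). Bytecode caching is switched off just so the sketch never picks up a stale compiled file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True             # assumption for this sketch: no .pyc files
folder = Path(tempfile.mkdtemp())          # hypothetical scratch folder
module_file = folder / 'dobrica_demo.py'   # hypothetical module, not the real dobrica.py
module_file.write_text("is_doing = 'good'\n")

sys.path.insert(0, str(folder))            # let Python find the new module
importlib.invalidate_caches()
dobrica_demo = importlib.import_module('dobrica_demo')
first = dobrica_demo.is_doing              # value at first import

module_file.write_text("is_doing = 'great'\n")  # edit the file on disk
importlib.reload(dobrica_demo)                  # re-run the module without restarting
second = dobrica_demo.is_doing

print(first, second)
```

Keep in mind that `reload` only refreshes the module object itself; names imported earlier with `from module import name` keep pointing to the old values.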

### Some Programming Basics ###

Any computer program can be written knowing just a few simple things.

**Firstly**, how to get things out of a program. In Python this is done with `print`:

In [None]:
a = 1 # these two variables are explained in the next step
name = 'bogdan'
print(a) # will output 1 on a line
print(a, name) # will output 1 followed by a space (that's what the comma does) and then bogdan

**Secondly**, how to assign a variable. Variables are names for places in memory where data is stored. Knowing the name is more convenient than knowing the exact address.

In [None]:
a = 1           # here "a" will point to a memory zone
                # that contains the number 1
name = 'bogdan' # here "name" will point to a memory zone
                # that contains the string 'bogdan'

**Thirdly**, how to get things in from outside a program. Good programmers always load things from files. Here's how to do that:

In [None]:
fp = open('bogdan/dobrica.py', 'r') # open the specified file for reading
print(fp.read()) # read the whole content of the file
fp.close() # don't forget to close the file
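
Since forgetting to close a file is easy, Python also offers the `with` statement, which closes the file automatically when the block ends. A minimal sketch, assuming a temporary file that stands in for `bogdan/dobrica.py`:

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / 'dobrica.py'  # hypothetical stand-in file
path.write_text("is_doing = 'good'\n")

with open(path, 'r') as fp:   # open the file for reading
    content = fp.read()       # read the whole content of the file
# no fp.close() needed: leaving the "with" block closed the file

print(content)
print(fp.closed)
```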

**Fourthly**, how to check conditions. This will allow you to make decisions inside a computer program.

In [None]:
if name == 'bogdan':
    print('Oh! Bogdan! I know you!')
else:
    print('I thought your name was Bogdan!')

And last, **fifthly**, how to loop through things. While usually you would use a `for` for this, `while` is way more flexible. Here's how to use it:

In [None]:
a = 0
while a < 10:
    print(a, 'x', a, '=', a*a)
    a = a + 1
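
For comparison, here is a sketch of the same loop written with `for` and `range`, which takes care of both the starting value and the increment. The `lines` list is an addition made here only to collect the output.

```python
lines = []
for a in range(10):  # a takes the values 0, 1, ..., 9
    lines.append(f'{a} x {a} = {a * a}')

print('\n'.join(lines))
```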

### A few things about lists ###

Lists are cool because you can store lots of things, neatly packed, and point to all of them with a single name. In Python, you can use `[]` to define a list.

In [None]:
a_list = [] # this is an empty list
b_list = [1,2,3] # this is a list of numbers
c_list = ['andrei', 'bogdan', 'cristi'] # this is a list of strings
d_list = [1, 2, 'bogdan'] # this is a mixed list
e_list = [a_list, b_list, c_list] # this is a list of lists

In [None]:
c_list

One of the cool things with lists is that you can `slice` them. This means you can access just part of the list very easily. Here are some nifty examples:

In [None]:
c_list[0:2] # get the elements from the first (0)
            # up to, but not including, the third (2)

In [None]:
c_list[1:] # get the elements between the second (1)
           # to the last one (empty)

In [None]:
c_list[:2] # get the elements from the first one (empty)
           # up to, but not including, the third one (2)

In [None]:
c_list[::2] # get the elements between the first one (empty)
            # and the last one (empty), but taking every second element

In [None]:
c_list[:-1] # get the elements from the first one (empty)
            # up to, but not including, the last one (-1)

In [None]:
c_list[::-1] # get the elements between the first one (empty)
             # to the last one (empty), but with counting backwards
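
To make the results concrete, here are the same slices with the values they produce. The sketch uses a separate list called `names` (a name chosen here) so it doesn't interfere with `c_list`.

```python
names = ['andrei', 'bogdan', 'cristi']

first_two = names[0:2]       # ['andrei', 'bogdan']
from_second = names[1:]      # ['bogdan', 'cristi']
every_second = names[::2]    # ['andrei', 'cristi']
without_last = names[:-1]    # ['andrei', 'bogdan']
backwards = names[::-1]      # ['cristi', 'bogdan', 'andrei']

print(first_two, from_second, every_second, without_last, backwards)
```

Note that slicing always builds a new list, so `names` itself is left unchanged.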

You can change an element of a list like this:

In [None]:
c_list[0] = 'dan' # change the first element to 'dan'
c_list

In [None]:
c_list[0:2] = ['a', 'b'] # or do it for first two elements at a time
c_list

You can append new elements to a list with:

In [None]:
c_list.append('dan')
c_list

Or remove an element from a list with:

In [None]:
del(c_list[0])
c_list

You can merge two lists, using `+`:

In [None]:
f_list = b_list + c_list
f_list

And multiply lists with numbers like this:

In [None]:
g_list = c_list * 3
g_list

Also, you can unpack lists like this:

In [None]:
a, b, c = b_list
print(a, b, c)

### OpenCV Basics ###

Install the required libraries for the current kernel:
- **opencv-contrib-python==4.1.0.25**, a library that allows capturing and processing of images; this is the only version that installs correctly on Raspbian, the OS running on the Raspberry Pi;
- **matplotlib**, a library that allows displaying charts or images;
- **dlib**, a library used later in this notebook for detecting faces and facial landmarks.

After the first run of the cell below, you have to restart the Kernel (press `0` twice) and run the cell again.

In [None]:
import sys
!{sys.executable} -m pip install opencv-contrib-python==4.1.0.25
!{sys.executable} -m pip install matplotlib
!{sys.executable} -m pip install dlib

To check that everything worked, let's first try to get an image from the camera. For this, we need to load the `cv2` module (the short, lazy name for `opencv-contrib-python`), which allows us to capture and process images from the camera, and the `matplotlib.pyplot` module, which allows displaying images.

Remember from last time that we used `cv2.VideoCapture(0)` to initialize the camera and everything else needed to capture an image from the webcam. We then used the `read()` function on the obtained object to read a status and a frame in BGR format (meaning the pixels are stored in blue-green-red order).

Because images are usually processed as red-green-blue, we use the OpenCV `cvtColor` function which converts an image from a color format to another. We'll use `cv2.COLOR_BGR2RGB` to convert from blue-green-red to red-green-blue.

Here's how:

In [None]:
import cv2 # loading the opencv module to allow video capturing and image processing
import matplotlib.pyplot as plt # loading the module that displays charts or images

video = cv2.VideoCapture(0) # initialize the Raspberry Pi Camera
success, frame = video.read() # read a frame from the camera
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) # convert the frame to RGB
plt.imshow(frame_rgb) # display the image
video.release() # release the Raspberry Pi Camera

Run the above cell until you get a nice picture of your face for the next steps. I'd like to show how you can play with that picture, so I'll first make a copy to keep the original intact:

In [None]:
frame_copy = frame_rgb.copy()

Here's how to get the image size:

In [None]:
frame_copy.shape

Now, you can think of this image as a 3D list. You have height (the first index, the rows), you have width (the second index, the columns), but you also have depth - that is, color depth, red-green-blue. So all the things from lists apply. For example, you can fill a square in the top-left corner with white like this:

In [None]:
frame_copy[0:250,0:250,:] = [ 255, 255, 255 ] # red = 255, green = 255, blue = 255
plt.imshow(frame_copy) # let's display the image

If you need to see a larger image, you can change the default size of the displayed images using the following command:

In [None]:
plt.rcParams['figure.figsize'] = (20, 10) # here, the width is 20 and the height is 10

After running the above command, the displayed image size will change every time `plt.imshow` is called.

In [None]:
plt.imshow(frame_copy)

Putting a square in the bottom-left corner is a little harder, which is why I chose it. It goes like this:

In [None]:
frame_copy[-250:,0:250,:] = [ 255, 0, 255 ] # red = 255, green = 0, blue = 255
plt.imshow(frame_copy) # let's display the image
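
If you don't have a camera at hand, the same slicing can be tried on a synthetic image. The sketch below assumes a black 640x480 picture built with `numpy` (the size is hypothetical) and reads a few pixels back to check which corners were painted.

```python
import numpy as np

image = np.zeros((480, 640, 3), dtype=np.uint8)  # hypothetical black image: 480 rows, 640 columns

image[0:250, 0:250, :] = [255, 255, 255]  # white square in the top-left corner
image[-250:, 0:250, :] = [255, 0, 255]    # magenta square in the bottom-left corner

top_left = image[0, 0].tolist()       # first row, first column
bottom_left = image[-1, 0].tolist()   # last row, first column
top_right = image[0, -1].tolist()     # first row, last column (never touched)

print(top_left, bottom_left, top_right)
```

Negative indices count from the end, so `-250:` selects the last 250 rows regardless of how tall the image is.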

### Viewing the live feed ###

This is a cool thing I thought I'd show, and it allows me to demonstrate some other Python modules.

But first, let's talk a little about **exceptions**. When something bad happens in code, an exception is said to be thrown (or raised, as Python calls it). The simplest example is dividing something by zero. Let's see the exception:

In [None]:
a = 2
b = 0
print(a/b)

Sometimes it's not cool to have this kind of error. So we can **catch** it and tackle it in code. Like this:

In [None]:
try:
    c = a / b
except ZeroDivisionError: # catch only the error we expect
    print('Are you nuts?! You can\'t divide', a, 'by', b)

So why did I talk about exceptions? Well, remember **CTRL+C**, which stops a running program? The same thing happens when you press **stop** while a cell is running. This generates an exception that we can catch to exit the program gracefully.

We need this because, to display a live feed from the camera, we need an infinite loop that we can interrupt by pressing **CTRL+C**.
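
Here's a small sketch of that pattern without any camera involved. Since nobody can press **stop** inside an example, the interruption is simulated by raising `KeyboardInterrupt` by hand after three iterations; `camera_released` is a hypothetical flag standing in for releasing the device.

```python
camera_released = False  # hypothetical stand-in for releasing the camera
frames_shown = 0

try:
    while True:                      # "infinite" loop, like a live feed
        frames_shown += 1            # pretend a frame was displayed
        if frames_shown == 3:
            raise KeyboardInterrupt  # simulate pressing stop (or CTRL+C)
except KeyboardInterrupt:
    pass                             # the interruption is expected, so do nothing
finally:
    camera_released = True           # runs whether or not the loop was interrupted

print(frames_shown, camera_released)  # prints: 3 True
```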

In [None]:
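def functie():
    return ('numar',True,3) # a function can return several values at once, packed in a tuple

_, b, _ = functie() # unpack the tuple; "_" is a throwaway name for the values I don't need
print('b', b)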

In [None]:
import time # load the time module; i'll use it here only for the sleep function
import IPython # load the IPython modules; this gives access to low level Jupyter Notebook functions

video = cv2.VideoCapture(0) # initialize the video capturing device
try:
    success = True # use this variable to check if the frame was captured successfully
    while success: # loop until the device cannot capture a frame
        success, frame = video.read() # read a frame
        if not success: # stop right away if no frame was captured
            break
        _, jpeg_image = cv2.imencode('.jpeg', frame) # encodes an image into a memory buffer
        raw_image = IPython.display.Image(data = jpeg_image) # creates an IPython image given the raw data
        IPython.display.clear_output(True) # clears the output of this cell, waiting until other data is available
        IPython.display.display(raw_image) # display the IPython image into the notebook
        time.sleep(0.2) # wait 0.2 seconds
except KeyboardInterrupt: # if stop was pressed (or CTRL+C in the console)
    pass # do nothing
finally: # but anyway, if it was pressed or not
    video.release() # release the video capturing device

### Let's do some face detection ###

Doing face detection requires a machine learning model. The one that I'll use is called [Haar Cascade Classifier](https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_objdetect/py_face_detection/py_face_detection.html) and is almost embedded in OpenCV. I say almost embedded because you need to download the model from their GitHub repository. Luckily, we can run commands in the cells.

Download the machine learning model:

In [None]:
!wget https://raw.githubusercontent.com/opencv/opencv/master/data/haarcascades/haarcascade_frontalface_default.xml

This uses the **wget** command, whose name comes from **World Wide Web** and **get**, and which downloads things from the internet.

Now, I need to load the downloaded model:

In [None]:
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml') # load the ML model

The classifier can be invoked on a gray-scale image using the `detectMultiScale` method, which for convenience I'll put in a wrapper.

In [None]:
def detect_faces(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) # convert the BGR frame to gray-scale
    faces = face_cascade.detectMultiScale( # the face detector
        gray, # gray-scale image
        scaleFactor = 1.1, # how much the image size is reduced at each scale
        minSize = (30, 30) # the minimum size of a detected object
    )
    return faces

And now I can simply use it like below; it returns a list of rectangles bounding all the faces in a picture. The wrapper expects a BGR image, so I pass the original `frame` rather than `frame_rgb`.

In [None]:
faces = detect_faces(frame) # get the list of rectangles bounding faces
faces

In [None]:
for face in faces:
    print(face)

Let's put a rectangle over the detected faces:

In [None]:
x, y, w, h = faces[0] # unpack the first rectangle (this fails if no face was detected)
frame_rgb_copy = frame_rgb.copy() # make a copy of the image
cv2.rectangle(frame_rgb_copy, (x, y), (x+w, y+h), (0, 255, 0), 2) # draw the rectangle
plt.imshow(frame_rgb_copy) # display the image

Now, the trick is to make it detect faces continuously and for that I'll just merge the two pieces of code:

In [None]:
import time # load the time module; i'll use it here only for the sleep function
import IPython # load the IPython modules; this gives access to low level Jupyter Notebook functions

video = cv2.VideoCapture(0) # initialize the video capturing device
try:
    success = True # use this variable to check if the frame was captured successfully
    while success: # loop until the device cannot capture a frame
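        success, frame = video.read() # read a frame
        if not success: # stop right away if no frame was captured
            break
        faces = detect_faces(frame) # get the rectangles bounding the faces
        for x, y, w, h in faces: # the loop simply does nothing when no face is found
            cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2) # draw each rectangle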
        _, jpeg_image = cv2.imencode('.jpeg', frame) # encodes an image into a memory buffer
        raw_image = IPython.display.Image(data = jpeg_image) # creates an IPython image given the raw data
        IPython.display.clear_output(True) # clears the output of this cell, waiting until other data is available
        IPython.display.display(raw_image) # display the IPython image into the notebook
        time.sleep(0.2) # wait 0.2 seconds
except KeyboardInterrupt: # if stop was pressed (or CTRL+C in the console)
    pass # do nothing
finally: # but anyway, if it was pressed or not
    video.release() # release the video capturing device

As you saw in the above example, the Haar Cascade algorithm used by default in OpenCV is not the most accurate one, although it is by far the fastest and is easy to implement on most mobile devices. An improvement is to use the [Histogram of Oriented Gradients (HOG)](http://dlib.net/fhog_object_detector_ex.cpp.html) algorithm implemented by the `dlib` module. HOG comes with a performance penalty of about 80% compared to Haar Cascade, but the results are way better and it comes with a useful trick that'll be put to use a few cells down.

First, let's replace the `detect_faces` function to use HOG.

In [None]:
import dlib
detector = dlib.get_frontal_face_detector()

def detect_faces(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) # convert the BGR frame to gray-scale
    faces = detector(gray)
    return faces

To run the face detection, just apply the function to a frame. This time, the detection will return a list of `dlib` rectangles, which are objects that have `.top()`, `.bottom()`, `.left()` and `.right()` methods to extract the top-left and bottom-right coordinates:

In [None]:
result = detect_faces(frame)
if result: # remember, result is a list of rectangles and this is how to test if it is not empty
    print(
        'left:', result[0].left(),
        'top:', result[0].top(),
        'right:', result[0].right(),
        'bottom:', result[0].bottom()
    )
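
Note that the two detectors describe a rectangle differently: the Haar Cascade wrapper returns `(x, y, w, h)`, while `dlib` gives the left, top, right and bottom edges. Here's a minimal sketch of converting between the two, using plain numbers so it runs without a camera or a detector. The function names and the sample rectangle are assumptions made for the illustration, and the conversion ignores any one-pixel difference in how a library may count the right and bottom edges.

```python
def xywh_to_corners(x, y, w, h):
    """Convert (x, y, width, height) to (left, top, right, bottom)."""
    return x, y, x + w, y + h

def corners_to_xywh(left, top, right, bottom):
    """Convert (left, top, right, bottom) to (x, y, width, height)."""
    return left, top, right - left, bottom - top

haar_style = (100, 50, 80, 120)             # hypothetical face rectangle
corners = xywh_to_corners(*haar_style)      # (100, 50, 180, 170)
round_trip = corners_to_xywh(*corners)      # back to (100, 50, 80, 120)

print(corners, round_trip)
```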

As I said, `dlib` is way cooler than a simple face detector. It has a model for detecting facial landmarks that we can download directly from this repository. It will detect [68 facial landmarks](https://towardsdatascience.com/facial-mapping-landmarks-with-dlib-python-160abcf7d672).

In [None]:
!wget https://github.com/italojs/facial-landmarks-recognition/raw/master/shape_predictor_68_face_landmarks.dat

Here's how to load the downloaded model:

In [None]:
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

And as usual, let's wrap the thing into a function, for ease of use. The function will return a list of tuples, with a face rectangle and a list of 68 point objects - that have `.x` and `.y` coordinates.

In [None]:
def detect_landmarks(frame):
    detected = []
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for face in detect_faces(frame):
        landmarks = predictor(image=gray, box=face)
        detected.append((face, landmarks))
    return detected

Here's what the result looks like:

In [None]:
detected_landmarks = detect_landmarks(frame)
detected_landmarks

And, to better see the results, here's how to draw them on the original image:

In [None]:
output_image = cv2.cvtColor(frame.copy(), cv2.COLOR_BGR2RGB)
for face, landmarks in detected_landmarks:
    cv2.rectangle(output_image, (face.left(), face.top()), (face.right(), face.bottom()), (255, 0, 0), 2)
    for landmark in landmarks.parts():
        cv2.circle(output_image, (landmark.x, landmark.y), 2, (0, 255, 0), 2)
plt.imshow(output_image)

Next, I'll download a glasses image:

In [None]:
!wget https://ublo.ro/wp-content/uploads/2021/03/glasses.png

Now, I'll load it as an OpenCV frame. The function for this is `cv2.imread`, which takes the path to the file as its first parameter. Luckily, the downloaded file is in the current directory. You'll notice a second parameter, `cv2.IMREAD_UNCHANGED`. It is there because the glasses image has transparency, so it has 4 channels instead of 3. Take a look:

In [None]:
glasses = cv2.imread('glasses.png', cv2.IMREAD_UNCHANGED)
glasses.shape

The fourth channel is actually a transparency mask: if the value for the 4th channel is 255, then that pixel is opaque. If the value for the 4th channel is 0, then the pixel is completely transparent. Any value between those represents the degree of transparency. Here's what the mask looks like:

In [None]:
plt.imshow(glasses[:,:,3])

Remember that the glasses frame is black? That's `(0, 0, 0)` in BGR components. But the other pixels are `(0, 0, 0)` as well; their color doesn't matter, because they are transparent. So here's what the image looks like without the transparency information:

In [None]:
plt.imshow(glasses[:,:,:3])

I'll read the width and height of the image, to make it easy for me next:

In [None]:
glasses_h, glasses_w, _ = glasses.shape

Also, I'll extract the alpha channel information and will create a 3-channels image from it, like this:

In [None]:
import numpy as np # I need the matrices modules
alpha = glasses[:, :, 3] # take only the 4th channel
alpha = np.expand_dims(alpha, axis = 2) # alpha is a 2-D matrix; but I want a 3-D one so I'll add a dummy dimension
alpha = alpha / 255.0 # I'd like for every pixel to be between 0 and 1 instead of 0 and 255
alpha = np.concatenate([alpha, alpha, alpha], axis = 2) # and will create 3 copies of alpha for each channel
inv_alpha = 1 - alpha # also, it's a good idea to have an inverted alpha: where 1 is 0 and 0 is 1

Now, I'll use this alpha to superimpose the two images. I'll do the trick that I've used above, to modify just part of the original image, to be more specific the top-left rectangle that has the same size as the glasses image, like this:
- I'll multiply each pixel from the original image by `inv_alpha`. Where the glasses are transparent (`alpha` is 0), `inv_alpha` is 1, so the original pixel color will not change;
- I'll multiply each glasses pixel by `alpha`. Where the glasses are opaque, `alpha` is 1, so multiplying by it will not change that pixel's color, while multiplying the other pixels by 0 makes them irrelevant;
- I'll simply add the two images.

In [None]:
frame_with_glasses = frame.copy() # make first a copy of the frame
# change only the top-left rectangle of (glasses_w, glasses_h) size
# multiply the original frame with inv_alpha and add
# the glasses image multiplied with alpha
frame_with_glasses[:glasses_h,:glasses_w,:] = \
    frame[:glasses_h,:glasses_w,:] * inv_alpha + \
    glasses[:,:,:3] * alpha
plt.imshow(frame_with_glasses) # display the superimposed images
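
To see the blending rule at work on numbers you can check by hand, here's a sketch on a tiny 2x2 image. The gray background and black overlay are hypothetical values, and the slice `3:4` is a shortcut used here to keep the third dimension, so `numpy` broadcasting can replace the `expand_dims` and `concatenate` steps.

```python
import numpy as np

background = np.full((2, 2, 3), 200, dtype=np.uint8)  # hypothetical light-gray frame
overlay = np.zeros((2, 2, 4), dtype=np.uint8)         # hypothetical black overlay, fully transparent
overlay[0, 0, 3] = 255                                # top-left pixel: opaque
overlay[1, 0, 3] = 51                                 # bottom-left pixel: 20% opaque (51 / 255)

alpha = overlay[:, :, 3:4] / 255.0    # shape (2, 2, 1), values between 0 and 1
inv_alpha = 1.0 - alpha

blended = background * inv_alpha + overlay[:, :, :3] * alpha
blended = np.rint(blended).astype(np.uint8)  # round, then go back to integer pixels

opaque_pixel = blended[0, 0].tolist()        # overlay wins
transparent_pixel = blended[0, 1].tolist()   # background wins
partial_pixel = blended[1, 0].tolist()       # a mix of the two
print(opaque_pixel, transparent_pixel, partial_pixel)
```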

I'll now mark the center of the glasses lenses as I'll have to match the center of the eyes there. These values are obtained by trial and error or you can use a graphics software, like [GIMP](https://www.gimp.org) to determine the coordinates using a mouse.

In [None]:
glasses_left_eye = dlib.point(130, 120) # this is the left eye center; I could've used tuples, but I like the points =)
glasses_right_eye = dlib.point(380, 120) # this is the right eye center
glasses_copy = glasses[:,:,3].copy() # make a copy of the 4th channel of the image
cv2.circle(glasses_copy, (glasses_left_eye.x, glasses_left_eye.y), 10, (255,), 2) # draw a circle for the left eye
cv2.circle(glasses_copy, (glasses_right_eye.x, glasses_right_eye.y), 10, (255,), 2) # draw a circle for the right eye
plt.imshow(glasses_copy) # show the image

For the eyes, I'll take all the landmarks that represent an eye and take the average of their coordinates. Geometry tells me that this gives the centroid of those points, which is a good approximation of the center of the eye. I'm using `numpy`'s ability to do mathematical operations on any kind of matrices, just because I'm lazy:

In [None]:
face_left_eye = np.mean(np.array(detected_landmarks[0][1].parts()[36:42]))
face_right_eye = np.mean(np.array(detected_landmarks[0][1].parts()[42:48]))

I'm defining a distance function. This is the Euclidean distance between two `dlib.point` objects. It's here to allow me to write less:

In [None]:
def distance(a, b):
    """
    Function that computes the euclidean distance between two dlib.dpoint
    @param a (dlib.dpoint):
    @param b (dlib.dpoint):
    @return (np.float32): the euclidean distance between a and b
    """
    return np.sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y))
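
The same formula is also available in the standard library as `math.hypot`. Here's a sketch of an equivalent function; `Point` is a hypothetical stand-in for the `dlib` point types, used only so the example runs without `dlib`.

```python
import math
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])  # hypothetical stand-in for a dlib point

def distance_hypot(a, b):
    """Euclidean distance between two objects that have .x and .y attributes."""
    return math.hypot(a.x - b.x, a.y - b.y)

d = distance_hypot(Point(0, 0), Point(3, 4))
print(d)  # prints: 5.0
```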

The first trick to make the glasses really blend with the image is to give them the correct size. For that, the distance between the centers of the eyes on the glasses should be the same as the distance between the centers of the eyes on the face.

In [None]:
ratio = distance(face_left_eye, face_right_eye) / distance(glasses_left_eye, glasses_right_eye) # compute the scaling ratio
glasses_shrinked_w = int(ratio * glasses_w) # so the new glasses width is the old width times the ratio
glasses_shrinked_h = int(ratio * glasses_h) # same for height
glasses_shrinked = cv2.resize(glasses, (glasses_shrinked_w, glasses_shrinked_h)) # and let's resize
# remember that this is done on the 4-channel image. see?
glasses_shrinked.shape

Let's see that the alpha channel is still there and that I can superimpose the two images, like I did before:

In [None]:
alpha = np.expand_dims(glasses_shrinked[:, :, 3], axis = 2) / 255.0
alpha = np.concatenate([alpha, alpha, alpha], axis = 2)
inv_alpha = 1.0 - alpha
frame_with_glasses = frame.copy()
frame_with_glasses[:glasses_shrinked_h,:glasses_shrinked_w,:] = frame[:glasses_shrinked_h,:glasses_shrinked_w,:] * inv_alpha + glasses_shrinked[:,:,:3] * alpha
plt.imshow(frame_with_glasses)

Uhuu! The next trick for making the glasses match is to move them to the correct position. To do this, I'll move the glasses' left eye center to match the face's left eye center:
- because the distance between the eyes is now the same for both images, I just need to see how far the center of the eye on the face is from the top-left corner of the frame;
- then, I'll subtract, on each axis, how far the center of the left lens is from the top-left corner of the resized glasses;
- and use the resulting offset values as the new top-left coordinates of the rectangle to alter.

In [None]:
# so, subtract the (scaled) center of the glasses-eye from the center of the face-eye
offset_x = int(face_left_eye.x - ratio * glasses_left_eye.x)
offset_y = int(face_left_eye.y - ratio * glasses_left_eye.y)
# add the offset to the width and height, just to write less
offset_w = glasses_shrinked_w + offset_x
offset_h = glasses_shrinked_h + offset_y
# make a copy of the frame
frame_with_glasses = frame.copy()
# here I'm doing the alpha merging
frame_with_glasses[offset_y:offset_h,offset_x:offset_w,:] = \
    frame[offset_y:offset_h,offset_x:offset_w,:] * inv_alpha + \
    glasses_shrinked[:,:,:3] * alpha
plt.imshow(frame_with_glasses)
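
To make the arithmetic of the scaling and the offset easier to follow, here's a worked sketch with plain numbers. The lens centers are the ones marked earlier; the glasses image size and the eye positions on the face are hypothetical values picked so the results are round.

```python
import math

glasses_w, glasses_h = 510, 200        # hypothetical size of the glasses image
glasses_left_eye = (130, 120)          # lens centers, as marked on the glasses image
glasses_right_eye = (380, 120)
face_left_eye = (300.0, 200.0)         # hypothetical eye centers detected on the face
face_right_eye = (425.0, 200.0)

# scaling ratio: distance between the eyes on the face / on the glasses
ratio = math.dist(face_left_eye, face_right_eye) / math.dist(glasses_left_eye, glasses_right_eye)

new_w = int(ratio * glasses_w)         # size of the resized glasses
new_h = int(ratio * glasses_h)

# top-left corner where the resized glasses must be pasted,
# so that the left lens center lands on the left eye center
offset_x = int(face_left_eye[0] - ratio * glasses_left_eye[0])
offset_y = int(face_left_eye[1] - ratio * glasses_left_eye[1])

print(ratio, (new_w, new_h), (offset_x, offset_y))  # prints: 0.5 (255, 100) (235, 140)
```

One thing the sketch doesn't handle: if an offset comes out negative, or the glasses would stick out of the frame, the slices in the cell above no longer match in size, so a real implementation would need to clip them.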

The third trick is to compensate for my head movement. That's no easy feat, but OpenCV comes to the rescue.
- OpenCV has a function that maps one [quadrilateral](https://en.wikipedia.org/wiki/Quadrilateral) onto another through a perspective transformation, which generalizes an [affine](https://en.wikipedia.org/wiki/Affine_geometry) one - [`warpPerspective`](https://docs.opencv.org/3.4/da/d54/group__imgproc__transform.html#gaf73673a7e8e18ec6963e3774e6a94b87);
- to use this function, I need to construct a quadrilateral from the landmarks given to me by the model and match it to the quadrilateral of a **normal** image - one where the face is oriented perfectly.

Here's a schematic of the solution:
![warpAffine](https://ublo.ro/wp-content/uploads/2021/03/glasses-explained.png)
- I'll get the center of the top landmarks (L1, R1) and bottom landmarks (L2, R2);
- I'll compute the center between the centers of the eyes; this will be my normal reference center and will match the center on the moved head;
- from the computed center, I'll move left and right from it with half the distance between the center of the eyes and get the normal left and right eye centers;
- from the normal centers I'll move up and down with the average distance between (L1, L2) and (R1, R2) and will obtain the normal_L1, normal_L2, normal_R1, normal_R2;
- given the real and the normal points, I'll use the [getPerspectiveTransform](https://docs.opencv.org/3.4/da/d54/group__imgproc__transform.html#ga8c1ae0e3589a9d77fffc962c49b22043) to compute the transformation matrix;
- I'll put the glasses on an image the same size as the frame (warping works on the whole image);
- and will warp the glasses image and merge it with the original frame.

Let me define a function that computes the center of a segment given two points.

**!Warning!** Be very careful not to mix `dlib.point` with `dlib.dpoint`. The latter has floating point coordinates, while the former has only integer coordinates. Integer coordinates are sometimes desired - especially when drawing on images - but not when doing matrix computations.

In [None]:
def center(a, b):
    """
    This function takes two dlib.dpoint and computes the average (center)
    @param a (dlib.dpoint):
    @param b (dlib.dpoint):
    @return (dlib.dpoint): the middle of the segment formed by a and b.
    """
    return dlib.dpoint(np.array([
        0.5 * (a.x + b.x),
        0.5 * (a.y + b.y)
    ], dtype = np.float32))

I'll now compute the L1, L2, R1, R2 and center points:

In [None]:
L1 = np.mean(np.array(detected_landmarks[0][1].parts()[37:39]))
L2 = np.mean(np.array(detected_landmarks[0][1].parts()[40:42]))
R1 = np.mean(np.array(detected_landmarks[0][1].parts()[43:45]))
R2 = np.mean(np.array(detected_landmarks[0][1].parts()[46:48]))
C = center(face_left_eye, face_right_eye)

Next, I'll compute the average eye height, but make it relative to the distance between the centers of the eyes.

In [None]:
distance_between_eyes = distance(face_left_eye, face_right_eye)
eye_height_to_distance = (
    distance(L1, L2) +
    distance(R1, R2)
) / (2.0 * distance_between_eyes)
eye_height_to_distance

I'll add a new helper function, to translate a point either on the x or y axis, or both. This will help me construct my normal points.

In [None]:
def translate(point, x = None, y = None):
    """
    This function translates a dlib.dpoint with either x or y and returns the result.
    @param point (dlib.dpoint):
    @param x (float): the amount to translate on the x-axis (optional), no x translation if missing;
    @param y (float): the amount to translate on the y-axis (optional), no y translation if missing;
    @return (dlib.dpoint): the translated point
    """
    x = x or 0
    y = y or 0
    return dlib.dpoint(np.array([point.x + x, point.y + y], dtype = np.float32))

As I said, the normal left eye center is the translation on the x axis (horizontally) for the center. Same for the normal right eye center.

In [None]:
normal_left_eye = translate(C, x = -distance_between_eyes / 2)
normal_right_eye = translate(C, x = distance_between_eyes / 2)

Now, to compute the normal L and normal R points, I start from the center of the eyes and move up and down on the y axis:

In [None]:
translate_y = 0.5 * eye_height_to_distance * distance_between_eyes # compute how much to move up and down
normal_L1 = translate(normal_left_eye, y = -translate_y) # move up, from the left eye
normal_L2 = translate(normal_left_eye, y = translate_y) # move down, from the left eye
normal_R1 = translate(normal_right_eye, y = -translate_y) # move up, from the right eye
normal_R2 = translate(normal_right_eye, y = translate_y) # move down, from the right eye
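
Here's the same construction worked out on plain tuples, so the numbers can be checked by hand. The center, the distance between the eyes and the relative eye height are hypothetical values. Remember that in image coordinates the y axis points down, so moving up means subtracting.

```python
center = (100.0, 100.0)          # hypothetical center between the eyes
distance_between_eyes = 60.0     # hypothetical distance between the eye centers
eye_height_to_distance = 0.25    # hypothetical relative eye height

half_width = distance_between_eyes / 2                              # 30.0
half_height = 0.5 * eye_height_to_distance * distance_between_eyes  # 7.5

normal_left_eye = (center[0] - half_width, center[1])    # (70.0, 100.0)
normal_right_eye = (center[0] + half_width, center[1])   # (130.0, 100.0)

normal_points = {
    'L1': (normal_left_eye[0], normal_left_eye[1] - half_height),    # up from the left eye
    'L2': (normal_left_eye[0], normal_left_eye[1] + half_height),    # down from the left eye
    'R1': (normal_right_eye[0], normal_right_eye[1] - half_height),  # up from the right eye
    'R2': (normal_right_eye[0], normal_right_eye[1] + half_height),  # down from the right eye
}
print(normal_points)
```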

As drawing things always needs tuples of integers and my points usually have floating point coordinates, I define a function to easily convert them to tuples:

In [None]:
def to_int_tuple(point):
    """
    This function converts a dlib.point into a tuple of integers.
    @param point (dlib.point):
    @return (tuple(int, int)): A tuple with x and y coordinates converted to integers.
    """
    return (int(point.x), int(point.y))

So, now that I have everything I need, I'll compute the transformation matrix that goes from normal points to the actual points. This will show me how to convert an ideal position to a real one.

In [None]:
M = cv2.getPerspectiveTransform(
    np.array([
        [normal_L1.x, normal_L1.y],
        [normal_L2.x, normal_L2.y],
        [normal_R2.x, normal_R2.y],
        [normal_R1.x, normal_R1.y]
    ], dtype = np.float32),
    np.array([
        [L1.x, L1.y],
        [L2.x, L2.y],
        [R2.x, R2.y],
        [R1.x, R1.y]
    ], dtype = np.float32)
)
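
To get a feel for what such a 3x3 matrix does to a single point, here's a pure Python sketch: the point is written as `(x, y, 1)`, multiplied by the matrix, and the result is divided by its third component. The two matrices below are hypothetical examples, not the `M` computed above, and `apply_homography` is a helper name invented for this illustration.

```python
def apply_homography(matrix, x, y):
    """Map the point (x, y) through a 3x3 matrix given as nested lists."""
    xh = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    yh = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
    return xh / w, yh / w  # the division by w is what makes it a perspective map

shift = [[1, 0, 10], [0, 1, 20], [0, 0, 1]]   # hypothetical: move 10 right and 20 down
tilt = [[1, 0, 0], [0, 1, 0], [0.001, 0, 1]]  # hypothetical: shrink points that are further right

moved = apply_homography(shift, 1, 2)     # (11.0, 22.0)
tilted = apply_homography(tilt, 100, 50)  # roughly (90.9, 45.5)
print(moved, tilted)
```

When the last row is `[0, 0, 1]`, `w` is always 1 and the transformation is a plain affine one; the perspective effect comes only from the first two values in that row.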

I'll prepare the image for warping - this means creating a new 4-channel image the same size as the frame and adding the resized glasses image to it. I'll place it with the same trick as before, though not relative to the left eye center but to the center between the eyes.

In [None]:
glasses_warped = np.zeros((frame.shape[0], frame.shape[1], 4), dtype = np.uint8)
glasses_C = center(glasses_left_eye, glasses_right_eye)
offset_x = int(C.x - int(ratio * glasses_C.x))
offset_y = int(C.y - int(ratio * glasses_C.y))
offset_w = offset_x + glasses_shrinked_w
offset_h = offset_y + glasses_shrinked_h
glasses_warped[offset_y:offset_h, offset_x:offset_w, :] = glasses_shrinked

plt.imshow(glasses_warped)

And warp that image:

In [None]:
glasses_warped = cv2.warpPerspective(glasses_warped, M, (frame.shape[1], frame.shape[0]))
plt.imshow(glasses_warped)

And now, there's an easy blend between the two - as they are the same size:

In [None]:
frame_with_glasses = frame.copy()
alpha = np.expand_dims(glasses_warped[:,:,3], axis = 2) / 255.0
alpha = np.concatenate([alpha, alpha, alpha], axis = 2)
inv_alpha = 1.0 - alpha
frame_with_glasses = frame_with_glasses * inv_alpha + glasses_warped[:,:,:3] * alpha
frame_with_glasses = frame_with_glasses.astype(np.uint8) # this is the only thing that's needed, to make the image integer
plt.imshow(frame_with_glasses)

I'll create a few functions now from the code above, so we can run things live:

In [None]:
def get_real_points(landmarks):
    """
    This function takes a `landmark` object from dlib,
    which contains .parts() method and returns the L1, L2,
    R1, R2 points in the correct order, as a list.
    @param landmarks :
    @return list(list, list, list, list): A list of landmark points used for glasses.
    """
    L1 = np.mean(np.array(landmarks.parts()[37:39]))
    L2 = np.mean(np.array(landmarks.parts()[40:42]))
    R1 = np.mean(np.array(landmarks.parts()[43:45]))
    R2 = np.mean(np.array(landmarks.parts()[46:48]))
    
    return [
        [L1.x, L1.y],
        [L2.x, L2.y],
        [R2.x, R2.y],
        [R1.x, R1.y]
    ]

def get_center(landmarks):
    """
    This function takes a `landmarks` object from dlib,
    which provides the .parts() method, and returns the center of the face.
    @param landmarks (dlib.landmarks) : the list of landmarks produced by the model
    @return (dlib.dpoint) : the center of the face (the midpoint between the eyes).
    """
    face_left_eye = np.mean(np.array(landmarks.parts()[36:42]))
    face_right_eye = np.mean(np.array(landmarks.parts()[42:48]))
    face_center = center(face_left_eye, face_right_eye)
    return face_center

def get_distances(real_points):
    """
    This function takes the real points as returned by get_real_points,
    and computes the distance between the eyes and the eye height relative
    to the distance between the eyes.
    @param real_points (list(list, list, list, list)) : the list of real points
    @return (tuple(float, float)) : the distance between the eyes and the relative height
    """
    L1 = dlib.dpoint(np.array(real_points[0]))
    L2 = dlib.dpoint(np.array(real_points[1]))
    R2 = dlib.dpoint(np.array(real_points[2]))
    R1 = dlib.dpoint(np.array(real_points[3]))
    face_left_eye = center(L1, L2)
    face_right_eye = center(R1, R2)

    distance_between_the_eyes = distance(face_left_eye, face_right_eye)
    eye_height_to_distance = (
        distance(L1, L2) + distance(R1, R2)
    ) / (2 * distance_between_the_eyes)
    
    return distance_between_the_eyes, eye_height_to_distance

def get_normal_points(face_center, distance_between_the_eyes, eye_height_to_distance):
    """
    This function takes the center of the face, the distance between the eyes and
    the relative eye height and computes an ideal placement of the L and R points.
    @param face_center (dlib.dpoint) : the center of the face
    @param distance_between_the_eyes (float) : the distance between the eyes
    @param eye_height_to_distance (float) : the relative eye height to the distance between the eyes
    @return (list(list, list, list, list)) : a list of points in correct order, same as get_real_points
    """
    normal_left_eye = translate(face_center, x = -0.5 * distance_between_the_eyes)
    normal_right_eye = translate(face_center, x = 0.5 * distance_between_the_eyes)
    translate_y = 0.5 * eye_height_to_distance * distance_between_the_eyes
    normal_L1 = translate(normal_left_eye, y = -translate_y)
    normal_L2 = translate(normal_left_eye, y = translate_y)
    normal_R1 = translate(normal_right_eye, y = -translate_y)
    normal_R2 = translate(normal_right_eye, y = translate_y)
    return [
        [normal_L1.x, normal_L1.y],
        [normal_L2.x, normal_L2.y],
        [normal_R2.x, normal_R2.y],
        [normal_R1.x, normal_R1.y]
    ]

def prepare_glasses(distance_between_the_eyes):
    """
    This function takes the distance between the eyes and prepares (resizes) the
    glasses image. Also, computes the resize ratio and the coordinates of the glasses center.
    @param distance_between_the_eyes (float) : the distance between the eyes
    @return (tuple(image, float, dlib.dpoint)) : the image, the scale ratio and the center
    """
    glasses_left_eye = dlib.dpoint(130, 120)
    glasses_right_eye = dlib.dpoint(380, 120)
    glasses_h, glasses_w, _ = glasses.shape
    
    ratio = distance_between_the_eyes / distance(glasses_left_eye, glasses_right_eye)
    glasses_shrinked_w = int(ratio * glasses_w)
    glasses_shrinked_h = int(ratio * glasses_h)
    glasses_shrinked = cv2.resize(glasses, (glasses_shrinked_w, glasses_shrinked_h))
    
    glasses_center = center(glasses_left_eye, glasses_right_eye)
    
    return glasses_shrinked, ratio, glasses_center

def warp_glasses(glasses_shrinked, ratio, glasses_center, face_center, M, frame_size):
    """
    This function pastes the shrunken glasses onto an empty, frame-sized image and warps it.
    @param glasses_shrinked (image) : the glasses shrinked image, as prepared by prepare_glasses;
    @param ratio (float) : the resize ratio
    @param glasses_center (dlib.dpoint) : the center of the glasses
    @param face_center (dlib.dpoint) : the center of the face
    @param M (matrix) : the transformation matrix
    @param frame_size (tuple(int, int)) : the width and height of the frame
    @return (image) : the prepared and warped image of the glasses
    """
    width, height = frame_size
    glasses_warped = np.zeros((height, width, 4), dtype = np.uint8)
    
    offset_x = int(face_center.x - int(ratio * glasses_center.x))
    offset_y = int(face_center.y - int(ratio * glasses_center.y))
    offset_w = offset_x + glasses_shrinked.shape[1]
    offset_h = offset_y + glasses_shrinked.shape[0]
    glasses_warped[offset_y:offset_h, offset_x:offset_w, :] = glasses_shrinked

    glasses_warped = cv2.warpPerspective(glasses_warped, M, (width, height))

    return glasses_warped

def merge_images(frame, glasses):
    """
    This function merges the current frame with the glasses.
    @param frame (image) : current frame
    @param glasses (image) : the warped glasses image
    @return (image) : the composed image
    """
    alpha = np.expand_dims(glasses[:,:,3], axis = 2) / 255.0
    alpha = np.concatenate([alpha, alpha, alpha], axis = 2)
    inv_alpha = 1.0 - alpha
    frame_with_glasses = frame * inv_alpha + glasses[:,:,:3] * alpha
    return frame_with_glasses.astype(np.uint8)
        
def add_glasses(frame, landmarks):
    """
    This function is the orchestration function for the whole process.
    It takes the frame and landmarks as produced by detect_landmarks and
    outputs the frame with glasses.
    @param frame (image) : the captured frame
    @param landmarks (dlib.landmarks) : the list of landmarks produced by the model
    @return (image) : the composed frame with glasses
    """
    height, width, _ = frame.shape
    face_center = get_center(landmarks)
    real_points = get_real_points(landmarks)
    distance_between_the_eyes, eye_height_to_distance = get_distances(real_points)
    normal_points = get_normal_points(face_center, distance_between_the_eyes, eye_height_to_distance)
    M = cv2.getPerspectiveTransform(
        np.array(normal_points, dtype = np.float32),
        np.array(real_points, dtype = np.float32))
    
    # This part is for debug purposes. You can change False to True to see the L, R and center points.
    if False:
        for point in normal_points:
            cv2.circle(frame, (int(point[0]), int(point[1])), 2, (255, 0, 0), 2)
        for point in real_points:
            cv2.circle(frame, (int(point[0]), int(point[1])), 2, (0, 255, 0), 2)

        face_left_eye = np.mean(np.array(landmarks.parts()[36:42]))
        cv2.circle(frame, (int(face_left_eye.x), int(face_left_eye.y)), 2, (0, 0, 255), 2)
        face_right_eye = np.mean(np.array(landmarks.parts()[42:48]))
        cv2.circle(frame, (int(face_right_eye.x), int(face_right_eye.y)), 2, (0, 0, 255), 2)
        cv2.circle(frame, (int(face_center.x), int(face_center.y)), 2, (0, 0, 255), 2)

    
    glasses_shrinked, ratio, glasses_center = prepare_glasses(distance_between_the_eyes)
    warped_glasses = warp_glasses(glasses_shrinked, ratio, glasses_center, face_center, M, (width, height))
    frame_with_glasses = merge_images(frame, warped_glasses)
    return frame_with_glasses
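
To make the geometry of `get_normal_points` concrete, here is a worked example with round numbers. It is only a sketch: `Point` and `translate_example` are stand-ins written for this illustration, assuming the notebook's `translate` helper simply shifts a point by the given `x` and `y`.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])  # stand-in for dlib.dpoint

def translate_example(point, x=0.0, y=0.0):
    """Stand-in for the notebook's translate helper: shift a point by (x, y)."""
    return Point(point.x + x, point.y + y)

face_center = Point(100.0, 50.0)
distance_between_the_eyes = 40.0
eye_height_to_distance = 0.25

left_eye = translate_example(face_center, x=-0.5 * distance_between_the_eyes)
right_eye = translate_example(face_center, x=0.5 * distance_between_the_eyes)
half_height = 0.5 * eye_height_to_distance * distance_between_the_eyes

normal_points_example = [
    tuple(translate_example(left_eye, y=-half_height)),   # L1: above the left eye
    tuple(translate_example(left_eye, y=half_height)),    # L2: below the left eye
    tuple(translate_example(right_eye, y=half_height)),   # R2: below the right eye
    tuple(translate_example(right_eye, y=-half_height)),  # R1: above the right eye
]
print(normal_points_example)
```

The eyes end up 40 pixels apart on a horizontal line through the face center, and each eye gets one point 5 pixels above and one 5 pixels below it. The order L1, L2, R2, R1 matches the order returned by `get_real_points`, which matters because the perspective transform pairs the points up one by one.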

Now let's make it run in real time, by updating the earlier capture cell to use the new functions:

In [None]:
import time # load the time module; I'll use it here only for the sleep function
import IPython # load the IPython modules; this gives access to low level Jupyter Notebook functions

video = cv2.VideoCapture(0) # initialize the video capturing device
try:
    success = True # use this variable to check if the frame was captured successfully
    while success: # loop until the device cannot capture a frame
        success, frame = video.read() # read a frame
        if not success: # the device could not deliver a frame, so there is nothing to process
            break
        faces = detect_landmarks(frame)
        if faces:
            for face, landmarks in faces:
                frame = add_glasses(frame, landmarks)
        _, jpeg_image = cv2.imencode('.jpeg', frame) # encodes an image into a memory buffer
        raw_image = IPython.display.Image(data = jpeg_image) # creates an IPython image given the raw data
        IPython.display.clear_output(True) # clears the output of this cell, waiting until other data is available
        IPython.display.display(raw_image) # display the IPython image into the notebook
        time.sleep(0.2) # wait 0.2 seconds
except KeyboardInterrupt: # if stop was pressed (or CTRL+C in the console)
    pass # do nothing
finally: # but anyway, if it was pressed or not
    video.release() # release the video capturing device