Chrome’s Dino game is one of the simplest games you might ever have played. The only two controls you need to worry about are Jump and Crouch, which let the Dino avoid obstacles. Normally, you would press the space bar and the down arrow key on the keyboard to trigger them.

So all we need to do is press those keys programmatically. We can easily do that with the pyautogui library, which lets us control the keyboard from Python.
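As a rough sketch of what programmatic key presses might look like: `pyautogui.press`, `pyautogui.keyDown` and `pyautogui.keyUp` are real PyAutoGUI calls, while the `DinoKeys` wrapper, its `backend` parameter and the `RecordingBackend` stand-in are hypothetical names introduced here so the logic can be tried without a desktop session.

```python
class DinoKeys:
    """Hypothetical wrapper around a keyboard backend such as pyautogui."""

    def __init__(self, backend=None):
        if backend is None:
            # Imported lazily because pyautogui needs a desktop session.
            import pyautogui
            backend = pyautogui
        self.backend = backend
        self.crouching = False

    def jump(self):
        # A single tap of the space bar makes the Dino jump.
        self.backend.press('space')

    def set_crouch(self, crouch):
        # Hold the down arrow while crouching and release it afterwards,
        # sending a key event only when the state actually changes.
        if crouch and not self.crouching:
            self.backend.keyDown('down')
        elif not crouch and self.crouching:
            self.backend.keyUp('down')
        self.crouching = crouch


class RecordingBackend:
    """Stand-in backend that records calls instead of pressing keys."""

    def __init__(self):
        self.events = []

    def press(self, key):
        self.events.append(('press', key))

    def keyDown(self, key):
        self.events.append(('keyDown', key))

    def keyUp(self, key):
        self.events.append(('keyUp', key))


recorder = RecordingBackend()
keys = DinoKeys(backend=recorder)
keys.jump()
keys.set_crouch(True)
keys.set_crouch(True)   # no second keyDown: already crouching
keys.set_crouch(False)
print(recorder.events)
```

Calling `DinoKeys()` with no argument would use pyautogui itself, so the same object could drive the real game.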

Now we just need to trigger those controls based on certain actions that I make.

I want to make the Dino jump when I open my mouth. I can do this by detecting the facial landmarks around my lips to determine whether my mouth is open or closed, which the dlib library makes easy. I’ll go into more detail on this when I implement it in code.

For crouching, I’ll use the size of the detected face: if the face is closer to the camera, the detected size is bigger and the Dino should crouch. This way we’re controlling the Dino by measuring the proximity between the detected face and the camera, so you can make the Dino crouch and stand up again by moving your face closer to and farther from the camera. I’ll go into more detail on this later, but for now, here are the steps we will perform in this tutorial.

Outline:

    Step 1: Real-time Face Detection
    
    Step 2: Find the landmarks for the detected face
    
    Step 3: Build the Jump Control mechanism for the Dino
    
    Step 4: Build the Crouch Control mechanism
    
    Step 5: Perform Calibration
    
    Step 6: Keyboard Automation with PyAutoGUI
    
    Step 7: Build the Final Application

In [3]:
# import the necessary libraries
import cv2
import numpy as np
import matplotlib.pyplot as plt
from math import hypot
import pyautogui
import dlib

###### Step 1: Face Detection
We could use OpenCV’s Haar cascades or Dlib’s HOG-based face detector, but instead I’m going to use a more robust deep learning-based SSD face detector with OpenCV’s DNN module.

Initialize The DNN Module:

The SSD face detector provided by OpenCV is a Caffe model, and you need two files to run inference with it: a .prototxt file that defines the model architecture and a .caffemodel file that contains the pre-trained weights for the layers in that architecture. Both files are available in the download folder.

In [4]:
#path to the weights file
model_weights = './res10_300x300_ssd_iter_140000.caffemodel'

# path to architecture
model_arch = './deploy.prototxt.txt'

#load the caffe model
net = cv2.dnn.readNetFromCaffe(model_arch,model_weights)

###### Create A Face Detection Function:

Create a function called face_detector() that takes an image as input and detects the faces in it. Since the Dino has to be controlled by a single input at a time, only one of the detected faces can be used, so the bounding rectangle is returned only for the detected face with the highest confidence.

We also need some preprocessing before we can pass our image to the model:

    Resizing the image to 300×300

    Applying mean subtraction with the values (104, 177, 123)

    Formatting the array as a 4D tensor

Fortunately, all of this is taken care of by the DNN module’s cv2.dnn.blobFromImage() function.
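To make the preprocessing concrete, here is a NumPy-only approximation of the mean subtraction and 4D formatting. It is a sketch, not OpenCV's implementation: the `make_blob` helper is a hypothetical name, and it assumes the image has already been resized to 300×300 (the real function also does the resizing).

```python
import numpy as np

def make_blob(image_bgr, mean=(104.0, 177.0, 123.0), scale=1.0):
    """Approximate the mean subtraction and reshaping for an image
    that has ALREADY been resized to the network input size."""
    # Subtract the per-channel mean (B, G, R order), then apply the scale factor.
    blob = (image_bgr.astype(np.float32) - np.array(mean, dtype=np.float32)) * scale
    # Reorder height x width x channels -> channels x height x width,
    # then add a leading batch axis to get a 4D tensor (N, C, H, W).
    return blob.transpose(2, 0, 1)[np.newaxis, ...]

# A dummy 300x300 BGR frame filled with the mean colour.
frame = np.empty((300, 300, 3), dtype=np.uint8)
frame[:] = (104, 177, 123)

blob = make_blob(frame)
print(blob.shape)   # batch, channels, height, width
print(blob.dtype)
```

Because the dummy frame equals the mean colour everywhere, every value in the resulting blob is zero, which is an easy way to check the subtraction.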

After preprocessing, we can feed the blob to the network and then post-process the results. The model returns a 4-dimensional array, whose shape in our case is (1, 1, 200, 7): 200 detections per image, each described by 7 values. These include the confidence score of the detection (index 2) and the 4 coordinates of its bounding box (indexes 3-6), scaled to the 0-1 range. Since we are only interested in a single face, we extract the detection with the highest confidence.

Finally, the true coordinates of the bounding box can be retrieved by multiplying the scaled coordinates by the width and height of the original image (before resizing).
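The post-processing can be tried on a small synthetic array before touching the real network. The detections and the frame size below are made-up values; only the layout (confidence at index 2, box at indexes 3-6) follows the description above.

```python
import numpy as np

# Synthetic network output with shape (1, 1, N, 7); each row is
# [image_id, class_id, confidence, x_min, y_min, x_max, y_max].
detections = np.zeros((1, 1, 3, 7), dtype=np.float32)
detections[0, 0, 0] = [0, 1, 0.30, 0.10, 0.10, 0.20, 0.20]
detections[0, 0, 1] = [0, 1, 0.95, 0.25, 0.20, 0.75, 0.80]
detections[0, 0, 2] = [0, 1, 0.60, 0.50, 0.50, 0.90, 0.90]

h, w = 480, 640  # size of the original frame (made-up values)

# Pick the detection with the highest confidence score.
best = detections[0, 0, np.argmax(detections[0, 0, :, 2])]

# Scale the 0-1 coordinates back to pixels of the original frame.
box = (best[3:7] * np.array([w, h, w, h])).astype(int)
print(box.tolist())
```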

In [8]:
def face_detector(image, threshold=0.7):
    # get the height and width of the image
    h, w = image.shape[:2]
    # apply mean subtraction and create a 4D blob from the image
    blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300), (104.0, 177.0, 123.0))
    # set the blob as the new input for the network
    net.setInput(blob)
    # run a forward pass on the input to get the output
    faces = net.forward()
    # get the confidence values of all detections
    prediction_scores = faces[0, 0, :, 2]
    # get the index of the detection with the highest confidence
    i = np.argmax(prediction_scores)
    # get the face with the highest confidence
    face = faces[0, 0, i]
    # extract the confidence
    confidence = face[2]
    # if the confidence value is greater than the threshold
    if confidence > threshold:
        # The 4 values at indexes 3-6 are the top-left and bottom-right
        # coordinates scaled to the range 0-1. The original coordinates are
        # found by multiplying the x, y values by the image width and height
        box = face[3:7] * np.array([w, h, w, h])

        # the coordinates are pixel positions relative to the top-left
        # corner of the image, so they are converted to plain ints
        x, y, x1, y1 = box.astype("int").tolist()
        # draw the bounding box around the face on a copy of the image
        annotated_frame = cv2.rectangle(image.copy(), (x, y), (x1, y1), (0, 255, 255), 2)
        output = (annotated_frame, (x, y, x1, y1), True, confidence)
    else:
        output = (image, (), False, 0)
    return output

In [11]:
# test the face detector
# get the video feed from the webcam
cap = cv2.VideoCapture(0)

# set the window to a normal one so we can adjust it
cv2.namedWindow('face detection', cv2.WINDOW_NORMAL)

while(True):
    # read the frame
    ret, frame = cap.read()
    
    # break if frame is not returned
    if not ret:
        break
    
    # flip the frame horizontally so the preview behaves like a mirror
    frame = cv2.flip(frame, 1)
    
    # detect face in the frame
    annotated_frame , coords, status, conf = face_detector(frame)
    
    # display the frame
    cv2.imshow('face detection', annotated_frame)
    # break the loop if 'q' key is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
# when everything is done, release the capture and destroy the window
cap.release()
cv2.destroyAllWindows()

###### Step 2: Landmarks Detection
To implement the jump mechanism, we need information about whether the mouth is open or closed. For this, we need to detect facial landmarks so we can determine the position of upper and lower lips.

Using Dlib’s 68-point landmark detector, we can detect 68 landmarks on the face.

The landmark detection model is an implementation of the paper One Millisecond Face Alignment with an Ensemble of Regression Trees by Vahid Kazemi and Josephine Sullivan (2014).

To initialize the landmark detector, use the dlib.shape_predictor() function, which loads the pre-trained landmark detector from disk.

In [12]:
predictor = dlib.shape_predictor('./shape_predictor_68_face_landmarks.dat')

###### Create the detect_landmarks() function

Create a function called detect_landmarks() that takes the coordinates of the detected face returned by face_detector(), detects the landmarks, and returns them, while also annotating the image with circles at the landmark positions.

In [13]:
def detect_landmarks(box, image):
    # for faster results convert the image to gray-scale
    gray_scale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # get the coordinates as plain Python ints for dlib.rectangle
    (x, y, x1, y1) = [int(v) for v in box]

    # perform the detection
    shape = predictor(gray_scale, dlib.rectangle(x, y, x1, y1))

    # get the numpy array containing the coordinates of the landmarks
    landmarks = shape_to_np(shape)

    # draw the landmarks as filled circles (the image is modified in place)
    for (lx, ly) in landmarks:
        cv2.circle(image, (int(lx), int(ly)), 2, (0, 127, 255), -1)

    return image, landmarks

The helper function below converts the shape object returned by the predictor into a more convenient NumPy array. It is used by the detect_landmarks() function we created above.

In [14]:
def shape_to_np(shape):
    # create an array of shape(68, 2) for storing the landmark coordinates
    landmarks = np.zeros((68, 2), dtype='int')
    
    # write the x,y coordinates of all 68 landmarks into the array
    for i in range(0, 68):
        landmarks[i] = (shape.part(i).x, shape.part(i).y)
    
    return landmarks

In [26]:
# test the detect_landmark function with realtime feed
# get the video feed from webcam
cap = cv2.VideoCapture(0)

# set the window to a normal one so we can adjust it
cv2.namedWindow('landmark', cv2.WINDOW_NORMAL)

while(True):
    # read the frames
    ret,frame = cap.read()
    
    # break if frame is not returned
    if not ret:
        break
    
    #flip the frame horizontally
    frame = cv2.flip( frame, 1)
    
    # detect the face
    face_image, box_cord, status, conf = face_detector(frame)
    
    # fall back to the plain frame when no face is detected
    lm_image = frame
    if status:
        # get the landmarks for the face region in the frame
        lm_image,landmarks = detect_landmarks(box_cord,frame)
    
    # display the frame
    cv2.imshow('landmark', lm_image)
    
    # break the loop if 'q' key pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

#when done, release the capture and destroy the window
cap.release()
cv2.destroyAllWindows()

###### Step 3: Jump Control mechanism
In this step, we implement the jump control mechanism.

The mechanism uses the Euclidean distances between three pairs of lip landmarks to calculate the ratio of the mouth's height to its width: two vertical distances (points 50-58 and 52-56) and one horizontal distance (points 48-54). Comparing this ratio against a threshold tells us whether the mouth is closed or open.

In [23]:
def is_mouth_open(landmarks, ar_threshold):
    # calculate the euclidean distance labelled as A,B,C
    A = hypot(landmarks[50][0] - landmarks[58][0],landmarks[50][1] - landmarks[58][1])
    B = hypot(landmarks[52][0] - landmarks[56][0],landmarks[52][1] - landmarks[56][1])
    C = hypot(landmarks[48][0] - landmarks[54][0],landmarks[48][1] - landmarks[54][1])
    
    # calculate the mouth aspect ratio; the vertical distances A and B are averaged
    MAR = (A+B) / (2.0 * C)
    
    # return true if the value is greater than the threshold
    if MAR > ar_threshold:
        return True, MAR
    else:
        return False, MAR
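To get a feel for the numbers, the ratio can be worked through with made-up coordinates. The sketch below repeats the same point pairs in a standalone helper (`mouth_aspect_ratio` is a hypothetical name) and uses invented pixel positions for a closed and an open mouth.

```python
from math import hypot

def mouth_aspect_ratio(landmarks):
    # Same point pairs as is_mouth_open(): two vertical lip distances
    # (50-58 and 52-56) averaged, divided by the mouth width (48-54).
    def dist(i, j):
        return hypot(landmarks[i][0] - landmarks[j][0],
                     landmarks[i][1] - landmarks[j][1])
    return (dist(50, 58) + dist(52, 56)) / (2.0 * dist(48, 54))

# Made-up pixel coordinates for the six landmarks that are used.
closed_mouth = {48: (100, 200), 54: (160, 200),
                50: (120, 192), 58: (120, 208),
                52: (140, 192), 56: (140, 208)}

# Same mouth with the lower lip moved 28 pixels down.
open_mouth = dict(closed_mouth)
open_mouth.update({58: (120, 236), 56: (140, 236)})

closed_ratio = mouth_aspect_ratio(closed_mouth)  # (16 + 16) / (2 * 60)
open_ratio = mouth_aspect_ratio(open_mouth)      # (44 + 44) / (2 * 60)
print(round(closed_ratio, 3), round(open_ratio, 3))
```

With these invented values the closed mouth falls below a threshold of 0.55 and the open mouth above it; real thresholds depend on the face and camera, so they are worth tuning.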

In [28]:
# test the mouth function
cap = cv2.VideoCapture(0)
# cv2.namedWindow('mouth', cv2.WINDOW_NORMAL)
while(True):
    ret, frame = cap.read()
    if not ret:
        break
    frame = cv2.flip(frame, 1)
    face_image, box_coords, status, conf = face_detector(frame)
    if status:
        l_image,landmarks = detect_landmarks(box_coords,frame)
        mouth_status,_ = is_mouth_open(landmarks,0.55)
        cv2.putText(frame,'Is mouth open: {}'.format(mouth_status),(20,20),cv2.FONT_HERSHEY_COMPLEX,0.65,(0, 127, 255), 2)
    cv2.imshow('mouth',frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()

###### Step 4: Crouch Control Mechanism
The crouch control mechanism uses the Euclidean distance between the top-left and bottom-right corners of the face bounding box. The nearer the face is to the camera, the greater this distance, and when the face is close enough a key-down event is triggered, causing the Dino to crouch.

Crouch control mechanism using distance from the camera
The face_proximity() function below calculates the diagonal distance and compares it with the prox_thresh threshold to return either True or False. It also calculates the coordinates of a rectangle relative to the face. This rectangle shows the user how close they need to get to the camera to trigger the crouch.

In [29]:
def face_proximity(box,image,prox_thresh=250):
    # get height and width from the bounding box
    face_width = box[2] - box[0]
    face_height = box[3]  - box[1]
    
    # draw rectangle to guide the user 
    #calculate the angle of diagonal using face width and height
    theta = np.arctan(face_height/face_width)
    
    #
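A minimal sketch of how the proximity test described above could be written in full, without OpenCV. The diagonal comparison follows the description; the guide rectangle is an assumption made here for illustration (same centre and aspect ratio as the face, with a diagonal equal to the threshold), and `face_proximity_sketch` is a hypothetical name rather than the tutorial's function.

```python
from math import atan2, cos, hypot, sin

def face_proximity_sketch(box, prox_thresh=250):
    x, y, x1, y1 = box
    face_width = x1 - x
    face_height = y1 - y

    # Diagonal of the face box: grows as the face approaches the camera.
    diagonal = hypot(face_width, face_height)

    # Angle of the diagonal; atan2 avoids a division by zero for a
    # degenerate box with zero width.
    theta = atan2(face_height, face_width)

    # ASSUMPTION: the guide rectangle shares the face's centre and aspect
    # ratio, and its diagonal equals the threshold.
    cx, cy = (x + x1) / 2, (y + y1) / 2
    half_w = prox_thresh * cos(theta) / 2
    half_h = prox_thresh * sin(theta) / 2
    guide = (round(cx - half_w), round(cy - half_h),
             round(cx + half_w), round(cy + half_h))

    return diagonal > prox_thresh, guide

# A 120x160 face box has a diagonal of 200, below the default threshold.
is_close, guide = face_proximity_sketch((100, 100, 220, 260))
print(is_close, guide)

# A 240x320 face box has a diagonal of 400, above the threshold.
print(face_proximity_sketch((60, 40, 300, 360))[0])
```

Under this assumption, the face triggers the crouch once its bounding box grows to fill the guide rectangle.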
