# Social Distancing Detection with Python, OpenCV, and TensorFlow

The purpose of this project is to build an algorithm that detects whether people are following the social distancing rule, which helps slow the further spread of the coronavirus. The project is just for fun. It shows how the features of OpenCV and the TensorFlow framework can be combined into a social distancing detection algorithm.

In this project, a pre-trained TensorFlow model, Faster R-CNN with a ResNet-101 backbone trained on the MS COCO dataset, is used to predict the coordinates of the bounding boxes as well as the object class that each bounding box represents.

Before we start building the algorithm, let's import the libraries needed for this project.

In [30]:
import numpy as np
import tensorflow as tf
import cv2
import time
import yaml
import imutils
from os import listdir
from os.path import isfile, join
import itertools
import math
import glob
import os
import matplotlib.pyplot as plt

## Load the Model, Video, and Additional Files

The first thing we should do is assign the colors of our bounding boxes. Two colors are used in this project: green and red. If a person is farther than the specified minimum distance from everyone else, their bounding box is drawn in green; otherwise it is drawn in red.

In [4]:
# OpenCV uses BGR channel order, so (0, 0, 255) is red and (0, 255, 0) is green
RED = (0, 0, 255)
GREEN = (0, 255, 0)

Next, we load the pre-trained model so that it is ready to make predictions. From the Faster R-CNN model trained on the MS COCO dataset, we take the frozen inference graph. A frozen graph cannot be trained any further: it contains both the graph definition of the model and its trained parameters.

To load the model, we define a class that instantiates the model, with a method that runs a prediction on an image.

In [5]:
class Model:

    def __init__(self, model_path):

        self.detection_graph = tf.Graph()

        # Load the frozen model into the TensorFlow graph
        with self.detection_graph.as_default():

            od_graph_def = tf.compat.v1.GraphDef()

            with tf.io.gfile.GFile(model_path, 'rb') as file:
                serialized_graph = file.read()
                od_graph_def.ParseFromString(serialized_graph)
                tf.compat.v1.import_graph_def(od_graph_def, name='')

        # Create a session from the detection graph
        self.sess = tf.compat.v1.Session(graph=self.detection_graph)

    def predict(self, img):

        # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
        img_exp = np.expand_dims(img, axis=0)

        # Look up the input and output tensors of the frozen graph by name
        image_tensor = self.detection_graph.get_tensor_by_name('image_tensor:0')
        boxes_tensor = self.detection_graph.get_tensor_by_name('detection_boxes:0')
        scores_tensor = self.detection_graph.get_tensor_by_name('detection_scores:0')
        classes_tensor = self.detection_graph.get_tensor_by_name('detection_classes:0')

        # Run the session to get the results
        (boxes, scores, classes) = self.sess.run(
            [boxes_tensor, scores_tensor, classes_tensor],
            feed_dict={image_tensor: img_exp})

        return (boxes, scores, classes)

Next, we need a YAML configuration file. It stores the corner points of the rectangle that defines the area of the image where predictions take place. People outside this rectangle in the video are ignored: no prediction or bounding box is drawn for them.

The configuration file also points to an example image of the scene, together with the width and height of the top view, so that we can compute a transformation matrix that changes the perspective of the image from the camera's warped view to a top view.

In [6]:
print("[ Loading config file for the bird view transformation ] ")

with open("config.yml", "r") as ymlfile:
    # yaml.load() without a Loader is deprecated (and an error in PyYAML 6+); safe_load is the usual replacement
    cfg = yaml.safe_load(ymlfile)

# All the values we need live in the image_parameters section
image_params = cfg["image_parameters"]

# Corner points of the detection area: p1 top-left, p2 top-right, p3 bottom-left, p4 bottom-right
corner_points = [image_params["p1"], image_params["p2"], image_params["p3"], image_params["p4"]]

width_og = int(image_params["width_og"])
height_og = int(image_params["height_og"])

img_path = image_params["img_path"]

print(" Done : [ Config file loaded ] ...")

[ Loading config file for the bird view transformation ] 
 Done : [ Config file loaded ] ...
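For reference, the loader only reads the `image_parameters` section. The sketch below shows, under assumptions, the shape of the dictionary that `yaml.safe_load` would return for such a file: the key names come from the cell above, while the values, the image path, and the helper name `parse_image_parameters` are hypothetical and only for illustration. The `example_` variable names avoid overwriting the real `corner_points`, `width_og`, `height_og`, and `img_path` loaded above.

```python
# Hypothetical parsed config: keys mirror the loader cell, values are invented
example_cfg = {
    "image_parameters": {
        "p1": [200, 100],   # top-left corner of the detection area
        "p2": [600, 100],   # top-right
        "p3": [50, 500],    # bottom-left
        "p4": [750, 500],   # bottom-right
        "width_og": 400,    # width of the bird's-eye view image
        "height_og": 600,   # height of the bird's-eye view image
        "img_path": "image/example_frame.png",
    }
}


def parse_image_parameters(cfg):
    """Return (corner_points, width, height, img_path) from a parsed config dict."""
    params = cfg["image_parameters"]
    corner_points = [list(map(int, params[key])) for key in ("p1", "p2", "p3", "p4")]
    # Every corner must be an [x, y] pair
    if any(len(point) != 2 for point in corner_points):
        raise ValueError("each corner point must be an [x, y] pair")
    return corner_points, int(params["width_og"]), int(params["height_og"]), params["img_path"]


example_points, example_width, example_height, example_img_path = parse_image_parameters(example_cfg)
print(example_points)  # [[200, 100], [600, 100], [50, 500], [750, 500]]
```

The order p1, p2, p3, p4 matters: it has to match the destination corners used when the perspective matrix is computed (top-left, top-right, bottom-left, bottom-right).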



Next, we instantiate our model: we specify the path to the frozen graph of the trained Faster R-CNN model in our directory and pass it as the argument when creating an instance of the class defined above.

In [7]:
model_names_list = [name for name in os.listdir("C:/Users/ASUS/models/.") if name.find(".") == -1]
for index,model_name in enumerate(model_names_list):
    print(" - {} [{}]".format(model_name,index))

model_path="C:/Users/ASUS/models/faster_rcnn_resnet101_coco_11_06_2017/frozen_inference_graph.pb" 

print( " [ Loading TensorFlow Model ... ]")
model = Model(model_path)
print("Done : [ Model loaded and initialized ] ...")

 - faster_rcnn_resnet101_coco_11_06_2017 [0]
 - image [1]
 - video [2]
 [ Loading TensorFlow Model ... ]


Done : [ Model loaded and initialized ] ...


Now that the model is loaded, the next thing is to specify the video on which the model should make predictions. To test the model, we use the [PETS 2009 dataset S2 for people tracking](http://www.cvg.reading.ac.uk/PETS2009/a.html#s0). If you have a similar video, feel free to use it instead.

In [8]:
video_names_list = [name for name in os.listdir("C:/Users/ASUS/models/video/.") if name.endswith(".mp4") or name.endswith(".avi")]
for index,video_name in enumerate(video_names_list):
    print(" - {} [{}]".format(video_name,index))

video_path="C:/Users/ASUS/models/video/PETS2009.avi"  

 - bird_view.avi [0]
 - bird_view_video.avi [1]
 - output.avi [2]
 - output_video.avi [3]
 - PETS2009.avi [4]


Next, we specify the minimum distance people should keep from each other. If two people are closer than this, they are considered too close and their bounding boxes turn from green to red. The distance is measured in pixels of the bird's-eye view image, so a suitable value depends on your scene; feel free to change it.

In [9]:
distance_minimum = 110

## Compute Transformation Matrix for Perspective Change

Now that we have loaded the model and specified the video, the next thing is to define a transformation matrix with the OpenCV library. But why do we need a transformation matrix?

The transformation matrix changes the warped perspective of the camera into a bird's-eye (top) view. We map the scene to a top view because it makes computing the distance between people much easier. In the camera's perspective, the same pixel distance corresponds to different real-world distances depending on how far people are from the camera, whereas in the top view distances on the ground plane are roughly uniform across the image.

In [10]:
def compute_perspective_transform(corner_points,width,height,image):

    corner_points_array = np.float32(corner_points)
    # Create an array with the parameters (the dimensions) required to build the matrix
    img_params = np.float32([[0,0],[width,0],[0,height],[width,height]])
    
    matrix = cv2.getPerspectiveTransform(corner_points_array,img_params) 
    img_transformed = cv2.warpPerspective(image,matrix,(width,height))
    
    return matrix,img_transformed
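To make `cv2.getPerspectiveTransform` less of a black box: it finds the 3x3 matrix H (normalized so that its bottom-right entry is 1) that maps four source corners onto four destination corners. Each correspondence (x, y) to (u, v) satisfies u = (h11·x + h12·y + h13) / (h31·x + h32·y + 1) and v = (h21·x + h22·y + h23) / (h31·x + h32·y + 1), which rearranges into two linear equations, so four points give an 8x8 linear system. The following pure-Python sketch solves that system for illustration only; the function names are hypothetical, and in the notebook the OpenCV call is what you should use.

```python
def solve_linear_system(a, b):
    """Solve a x = b for a square system using Gaussian elimination with partial pivoting."""
    n = len(a)
    # Build the augmented matrix [a | b] as floats
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Use the row with the largest absolute value in this column as the pivot
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration (three collinear points?)")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def perspective_matrix(src, dst):
    """Return the 3x3 matrix H (with H[2][2] = 1) mapping 4 src points onto 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Row from the u equation: h11 x + h12 y + h13 - h31 x u - h32 y u = u
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        # Row from the v equation: h21 x + h22 y + h23 - h31 x v - h32 y v = v
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    h = solve_linear_system(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


# Hypothetical corners (p1..p4) mapped to a 400x600 top view, in the same order as compute_perspective_transform
demo_matrix = perspective_matrix(
    src=[[200, 100], [600, 100], [50, 500], [750, 500]],
    dst=[[0, 0], [400, 0], [0, 600], [400, 600]],
)
```

For the same points, the result should agree with `cv2.getPerspectiveTransform(np.float32(src), np.float32(dst))` up to floating-point error.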

With the function above we can compute the transformation matrix. Next, we use that matrix to transform the centroid of each bounding box from the warped perspective into the top-view perspective. The function below returns the coordinates of each bounding-box centroid in the bird's-eye view.

In [11]:
def compute_point_perspective_transformation(matrix,list_centroids):

 
    list_points_to_detect = np.float32(list_centroids).reshape(-1, 1, 2)
    transformed_points = cv2.perspectiveTransform(list_points_to_detect, matrix)
    
    # Loop over the points and add them to the list that will be returned
    transformed_points_list = list()
    
    for i in range(0,transformed_points.shape[0]):
        transformed_points_list.append([transformed_points[i][0][0],transformed_points[i][0][1]])
        
    return transformed_points_list
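`cv2.perspectiveTransform` expects a multi-channel array of 2D points, which is why the centroids are reshaped to `(-1, 1, 2)` above. For each point it multiplies [x, y, 1] by the matrix and divides by the third (homogeneous) component. A minimal pure-Python sketch of that per-point computation, with a hypothetical `transform_points` helper and toy matrices chosen only for illustration:

```python
def transform_points(matrix, points):
    """Apply a 3x3 perspective matrix to a list of (x, y) points, including the homogeneous divide."""
    transformed = []
    for x, y in points:
        # Multiply [x, y, 1] by the matrix
        u = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
        v = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
        w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
        # Divide by the homogeneous coordinate to get back to image coordinates
        transformed.append([u / w, v / w])
    return transformed


# Toy affine matrix: scale x by 2, shift y by 10 (bottom row [0, 0, 1], so w is always 1)
toy_matrix = [[2.0, 0.0, 0.0],
              [0.0, 1.0, 10.0],
              [0.0, 0.0, 1.0]]
print(transform_points(toy_matrix, [(5, 5), (100, 20)]))  # [[10.0, 15.0], [200.0, 30.0]]

# Toy projective matrix: a non-zero bottom row makes w depend on the point
projective_matrix = [[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.001, 0.0, 1.0]]
print(transform_points(projective_matrix, [(1000, 500)]))  # [[500.0, 250.0]]
```

The second example shows why a perspective transform is not just a scale and shift: the divide by w shrinks points farther along x, which is what undoes the camera's foreshortening.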

With these functions defined, we can compute the transformation matrix from the corner points of the example image and warp that image into a bird's-eye view. We keep the dimensions of the warped image for later, to check whether transformed points fall inside the top view.

In [12]:
matrix,imgOutput = compute_perspective_transform(corner_points,width_og,height_og,cv2.imread(img_path))
# Size of the bird's-eye view image, used later for bounds checks and resizing
height, width = imgOutput.shape[:2]
dim = (width, height)

## Predict Bounding Boxes and Compute the Centroid of Each Bounding Box

Now that the transformation matrix is defined, we need a function that returns only the bounding boxes that represent humans. The pre-trained model returns many bounding boxes, and not all of them represent people. The following function therefore keeps only boxes classified as a person (class 1) with a confidence score above 0.75.

In [13]:
def get_human_box_detection(boxes,scores,classes,height,width):

    array_boxes = list() 
    
    for i in range(boxes.shape[1]):
        # If the class of the detected object is 1 and the confidence of the prediction is > 0.75
        if int(classes[i]) == 1 and scores[i] > 0.75:
            
            box = [boxes[0,i,0],boxes[0,i,1],boxes[0,i,2],boxes[0,i,3]] * np.array([height, width, height, width])
            
            array_boxes.append((int(box[0]),int(box[1]),int(box[2]),int(box[3])))
            
    return array_boxes
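As a self-contained sketch of what this filter does, assuming the usual output format of a TensorFlow Object Detection API frozen graph (boxes as normalized `[ymin, xmin, ymax, xmax]` values in [0, 1], and class id 1 meaning "person" in the COCO label map), here is the same logic on plain Python lists. The helper name and the detections are made up for illustration.

```python
PERSON_CLASS_ID = 1  # "person" in the COCO label map


def filter_person_boxes(boxes, scores, classes, height, width, min_score=0.75):
    """Keep confident person detections and convert them to integer pixel boxes.

    Assumes each box is a normalized [ymin, xmin, ymax, xmax] list in [0, 1].
    """
    person_boxes = []
    for box, score, cls in zip(boxes, scores, classes):
        if int(cls) == PERSON_CLASS_ID and score > min_score:
            ymin, xmin, ymax, xmax = box
            # Scale y by the frame height and x by the frame width, like the notebook's version
            person_boxes.append((int(ymin * height), int(xmin * width),
                                 int(ymax * height), int(xmax * width)))
    return person_boxes


# Made-up detections for a 576x768 frame: a confident person, a weak person, and a car (class 3)
demo_boxes = [[0.25, 0.5, 0.75, 0.625], [0.1, 0.1, 0.3, 0.2], [0.5, 0.5, 0.9, 0.9]]
demo_scores = [0.92, 0.40, 0.88]
demo_classes = [1.0, 1.0, 3.0]
print(filter_person_boxes(demo_boxes, demo_scores, demo_classes, height=576, width=768))
# [(144, 384, 432, 480)]
```

Only the first detection survives: the second is a person but below the 0.75 threshold, and the third is confident but not a person. Lowering `min_score` trades fewer missed people for more false positives.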

With only the human bounding boxes left, we compute the centroid of each one. The centroid coordinates are what we use to compute the distance between one person and another.

In [20]:
def get_centroids(array_boxes_detected):

    array_centroids = list() # Initialize an empty list of centroids
    for index,box in enumerate(array_boxes_detected):
    
        center_x = int(((box[1]+box[3])/2))
        center_y = int(((box[0]+box[2])/2))
        
        array_centroids.append((center_x, center_y))
       
    return array_centroids
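Because the boxes are stored as `(ymin, xmin, ymax, xmax)`, the x coordinate of the centroid comes from indices 1 and 3 and the y coordinate from indices 0 and 2 (the same reason `cv2.rectangle` is later called with `(box[1], box[0])`). The worked example below illustrates this with a hypothetical box. It also shows a variant that is not used in this notebook: the bottom-centre of the box, roughly where the person's feet touch the ground. A perspective transform computed from ground-plane corners is only exact for points on that plane, so this variant is sometimes preferred; both helper names are hypothetical.

```python
def box_centroid(box):
    """Centre (x, y) of a box stored as (ymin, xmin, ymax, xmax), as in get_centroids."""
    ymin, xmin, ymax, xmax = box
    return (int((xmin + xmax) / 2), int((ymin + ymax) / 2))


def box_bottom_center(box):
    """Hypothetical variant: midpoint of the box's bottom edge (approximate ground contact point)."""
    ymin, xmin, ymax, xmax = box
    return (int((xmin + xmax) / 2), int(ymax))


example_box = (144, 384, 432, 480)  # ymin, xmin, ymax, xmax in pixels
print(box_centroid(example_box))       # (432, 288)
print(box_bottom_center(example_box))  # (432, 432)
```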


Finally, let's create a function that draws the rectangle defined by the corner points in the YAML configuration file.

In [28]:
def draw_rectangle(corner_points, frame):

    cv2.line(frame, (corner_points[0][0], corner_points[0][1]), (corner_points[1][0], corner_points[1][1]), GREEN, thickness=1)
    cv2.line(frame, (corner_points[1][0], corner_points[1][1]), (corner_points[3][0], corner_points[3][1]), GREEN, thickness=1)
    cv2.line(frame, (corner_points[0][0], corner_points[0][1]), (corner_points[2][0], corner_points[2][1]), GREEN, thickness=1)
    cv2.line(frame, (corner_points[3][0], corner_points[3][1]), (corner_points[2][0], corner_points[2][1]), GREEN, thickness=1)

## Predict the Bounding Boxes and Social Distancing Result of the Video

The workflow for predicting the bounding boxes and the social distancing result of the video is as follows:

- Read the video frame by frame.
- For each frame:
    - Predict the boxes, classes, and scores using the pre-trained model defined above. The prediction returns:
        - boxes: the coordinates of every box predicted in the frame.
        - classes: the class each box represents. Class 1 means a person.
        - scores: the model's confidence that each box contains the predicted class.
    - Keep only the boxes classified as a person with sufficient confidence.
    - Compute the centroid of every person box.
    - Transform the centroids into the bird's-eye view with the transformation matrix.
    - If 2 or more people are detected, continue to the next step; otherwise there is nothing to check, since a single person cannot break the distancing rule.
    - Using `itertools.combinations`, go through every pair of transformed centroids. If the Euclidean (L2) distance of a pair is below the defined minimum distance, draw the bounding boxes of both people in red.
    - Write the frame to the output video.
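The pair check in this workflow can be isolated from OpenCV and TensorFlow. The sketch below, with hypothetical helper names and made-up bird's-eye coordinates, uses `itertools.combinations`, which yields each unordered pair once (n people give n(n-1)/2 checks, unlike permutations, which would check every pair twice), and `math.dist` (Python 3.8+) for the Euclidean distance.

```python
import itertools
import math


def find_close_pairs(points, min_distance):
    """Return index pairs (i, j), i < j, whose Euclidean (L2) distance is below min_distance."""
    close_pairs = []
    # combinations() yields each unordered pair of (index, point) exactly once
    for (i, p), (j, q) in itertools.combinations(enumerate(points), 2):
        if math.dist(p, q) < min_distance:
            close_pairs.append((i, j))
    return close_pairs


def box_colors(num_people, close_pairs, safe=(0, 255, 0), unsafe=(0, 0, 255)):
    """Green (BGR) for everyone, then red for anyone involved in at least one close pair."""
    colors = [safe] * num_people
    for i, j in close_pairs:
        colors[i] = unsafe
        colors[j] = unsafe
    return colors


# Made-up bird's-eye coordinates (pixels) for three people
demo_points = [(100.0, 100.0), (160.0, 180.0), (400.0, 50.0)]
demo_pairs = find_close_pairs(demo_points, min_distance=110)
print(demo_pairs)                               # [(0, 1)]
print(box_colors(len(demo_points), demo_pairs))  # [(0, 0, 255), (0, 0, 255), (0, 255, 0)]
```

People 0 and 1 are exactly 100 px apart (a 60-80-100 right triangle), which is below the 110 px threshold, so both are marked red while person 2 stays green.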
    

In [31]:
vs = cv2.VideoCapture(video_path)
output_video_1 = None

# Load the image of the ground once (not on every frame) and resize it to the bird's-eye view size
img = cv2.imread("C:/Users/ASUS/models/image/chemin_1.png")
bird_view_img = cv2.resize(img, dim, interpolation = cv2.INTER_AREA)

# Loop until the end of the video stream
while True:

    # Load the next frame
    (frame_exists, frame) = vs.read()

    if not frame_exists:
        break

    # Make the predictions for the frame
    (boxes, scores, classes) = model.predict(frame)

    # Return only bounding boxes that represent humans
    array_boxes_detected = get_human_box_detection(boxes,scores[0].tolist(),classes[0].tolist(),frame.shape[0],frame.shape[1])

    # Compute the centroid of each bounding box
    array_centroids = get_centroids(array_boxes_detected)

    # Check if 2 or more people have been detected (otherwise there is nothing to check)
    if len(array_centroids) >= 2:

        # Use the transformation matrix to get the bird's-eye view coordinates
        transformed_centroids = compute_point_perspective_transformation(matrix, array_centroids)

        for index, centroid in enumerate(transformed_centroids):
            # Only draw people whose transformed centroid lies inside the top view (allowing 200 px below it)
            if not (centroid[0] > width or centroid[0] < 0 or centroid[1] > height+200 or centroid[1] < 0):
                cv2.rectangle(frame,(array_boxes_detected[index][1],array_boxes_detected[index][0]),(array_boxes_detected[index][3],array_boxes_detected[index][2]),GREEN,2)

        # Iterate over every pair (combination) of transformed centroids
        list_indexes = list(itertools.combinations(range(len(transformed_centroids)), 2))

        for i,pair in enumerate(itertools.combinations(transformed_centroids, r=2)):

            # Check if the Euclidean distance between the two points is less than the minimum distance
            if math.sqrt((pair[0][0] - pair[1][0])**2 + (pair[0][1] - pair[1][1])**2) < int(distance_minimum):

                # Change the colors of the boxes that are too close to each other to red
                if not (pair[0][0] > width or pair[0][0] < 0 or pair[0][1] > height+200 or pair[0][1] < 0 or pair[1][0] > width or pair[1][0] < 0 or pair[1][1] > height+200 or pair[1][1] < 0):

                    index_pt1 = list_indexes[i][0]
                    index_pt2 = list_indexes[i][1]

                    cv2.rectangle(frame,(array_boxes_detected[index_pt1][1],array_boxes_detected[index_pt1][0]),(array_boxes_detected[index_pt1][3],array_boxes_detected[index_pt1][2]),RED,2)
                    cv2.rectangle(frame,(array_boxes_detected[index_pt2][1],array_boxes_detected[index_pt2][0]),(array_boxes_detected[index_pt2][3],array_boxes_detected[index_pt2][2]),RED,2)

    # Draw the rectangle of the area in the image where the detection is considered
    draw_rectangle(corner_points, frame)

    key = cv2.waitKey(1) & 0xFF

    # Create the video writer on the first frame, then write every frame (including the first)
    if output_video_1 is None:
        fourcc1 = cv2.VideoWriter_fourcc(*"MJPG")
        output_video_1 = cv2.VideoWriter("C:/Users/ASUS/models/video/output_video_i.avi", fourcc1, 25,(frame.shape[1], frame.shape[0]), True)
    output_video_1.write(frame)

    # Break the loop if "q" is pressed
    if key == ord("q"):
        break

# Release the input video and finalize the output file
vs.release()
if output_video_1 is not None:
    output_video_1.release()

