# Detect Person in Region of Interest

This notebook does not train a model with DreamAI. Instead, it shows what we can do with the general-purpose object detection model that we trained here: https://github.com/HamzaFarhan/COCO-Object-Detection-using-DreamAI

## Imports

In [1]:
import sys

# Make sure to change this path to your folder with DreamAI

sys.path.insert(0, '/home/farhan/hamza/dreamai/') # Folder with DreamAI

# Things below are all included in the dreamai folder

import utils
import obj_utils
from dai_imports import *

%load_ext autoreload
%autoreload 2

In [2]:
# Load the object of the model that we trained

net = data_processing.load_obj('best_obj_net.pkl')

In [3]:
# Function used to calculate the intersection over union (IoU) of two rectangles
# Source: ronny rest (can't seem to find the link to their website)

def get_iou(a, b, epsilon=1e-5):
    """ Given two boxes `a` and `b` defined as a list of four numbers:
            [x1,y1,x2,y2]
        where:
            x1,y1 represent the upper left corner
            x2,y2 represent the lower right corner
        It returns the Intersection over Union (IoU) score for these two boxes.

    Args:
        a:          (list of 4 numbers) [x1,y1,x2,y2]
        b:          (list of 4 numbers) [x1,y1,x2,y2]
        epsilon:    (float) Small value to prevent division by zero

    Returns:
        (float) The Intersection over Union score.
    """
    # COORDINATES OF THE INTERSECTION BOX
    x1 = max(a[0], b[0])
    y1 = max(a[1], b[1])
    x2 = min(a[2], b[2])
    y2 = min(a[3], b[3])

    # AREA OF OVERLAP - Area where the boxes intersect
    width = (x2 - x1)
    height = (y2 - y1)
    # handle case where there is NO overlap
    if (width<0) or (height <0):
        return 0.0
    area_overlap = width * height

    # COMBINED AREA
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    area_combined = area_a + area_b - area_overlap

    # RATIO OF AREA OF OVERLAP OVER COMBINED AREA
    iou = area_overlap / (area_combined+epsilon)
    return iou
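
As a quick sanity check of the IoU computation, the sketch below evaluates three easy cases by hand: identical boxes, disjoint boxes, and two boxes that share half their area. The helper `iou` is a compact restatement of `get_iou` written only so that the snippet runs on its own; it is an illustration, not a replacement for the function above.

```python
def iou(a, b, epsilon=1e-5):
    # Compact restatement of get_iou (same box format: [x1, y1, x2, y2])
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width < 0 or height < 0:
        return 0.0  # no overlap
    overlap = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return overlap / (area_a + area_b - overlap + epsilon)

# Identical boxes: overlap equals union, so the score is just under 1
same = iou([0, 0, 10, 10], [0, 0, 10, 10])

# Disjoint boxes: the early return gives exactly 0.0
apart = iou([0, 0, 10, 10], [20, 20, 30, 30])

# Half-overlapping boxes: overlap 50, union 100 + 100 - 50 = 150, so about 1/3
half = iou([0, 0, 10, 10], [5, 0, 15, 10])

print(same, apart, half)
```

Because of `epsilon`, the score never reaches exactly 1.0, even for identical boxes.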

In [7]:
def dai_roi(video_path, person_net, detection_conf = 0.3, nms_overlap = 0.4, iou_thresh = 0.3,
            frame_size = (256,256), output = 'roi_out.mp4'):

    writer = None
    vs = cv2.VideoCapture(video_path)
    time.sleep(2.0)
    frame_number = 0
    selected = False
    
    while True:
        
        # Grab a frame from the video stream
        (grabbed, frame) = vs.read()
        if grabbed:
            frame_number+=1
            old_H,old_W = frame.shape[:2]
        # If the frame was not grabbed, then we have reached the end of the video
        if not grabbed:
            break
            
        # Our model was trained on images of size 'frame_size', so we resize a copy of the
        # frame for prediction only. The original frame is kept for display and output.
        new_frame = cv2.resize(frame,frame_size)
        (H, W) = new_frame.shape[:2]
        height_scale = old_H/H
        width_scale = old_W/W
        
        # If RoI has been selected, draw it on the frame and then predict objects in the frame
        
        if selected:           
            
            # Draw RoI
            
            (x1, y1, x2, y2) = roi
            cv2.rectangle(frame, (x1, y1), (x2, y2),(0, 0, 255), 2)
            cv2.rectangle(frame, (x1, y2 + 25), (x2, y2), (0, 0, 255), cv2.FILLED)
            font = cv2.FONT_HERSHEY_DUPLEX
            cv2.putText(frame, 'ROI', (x1 + 6, y2 + 19), font, 0.5, (255, 255, 255), 1)
            
            # Take the resized 'new_frame' and turn it into a PyTorch batch to be fed to the model
            
            img_batch = utils.get_test_input(imgs=[new_frame],size=frame_size)
            
            # Predict. This will output the bounding boxes and the labels of all objects detected
            
            object_net_pred = person_net.predict_objects(img_batch,score_thresh=detection_conf,
                                                 nms_overlap=nms_overlap)
            
            # If any object has been found, further filter them to objects with the label 'person'
            
            objects_found = len(object_net_pred[0]) > 0
            if objects_found:
                person_locations = np.array(object_net_pred[0])[np.array(object_net_pred[1]) == 'person']
                if len(person_locations) > 0:
                    
                    # Since our 'new_frame' was the resized frame,
                    # we must scale the predicted bounding boxes back to the original frame size
                    
                    person_locations = [[int(np.ceil(f[0]*width_scale)),int(np.ceil(f[1]*height_scale)),
                                       int(np.ceil(f[2]*width_scale)),int(np.ceil(f[3]*height_scale))] 
                                       for f in person_locations]
                    
                    # Draw a bounding box around a person if their IoU with the RoI is greater than 'iou_thresh'
                    
                    for (x1,y1,x2,y2) in person_locations:

                        iou = get_iou(roi,(x1,y1,x2,y2))
                        if (iou > iou_thresh):
                            cv2.rectangle(frame, (x1, y1), (x2, y2),(255, 0, 0), 2)
                            cv2.rectangle(frame, (x1, y2 + 25), (x2, y2), (255, 0, 0), cv2.FILLED)
                            font = cv2.FONT_HERSHEY_DUPLEX
                            cv2.putText(frame, 'person', (x1 + 6, y2 + 19), font, 0.5, (255, 255, 255), 1)
                       
        # Show the output frame
        
        cv2.imshow("ROI Video", frame)
        
        # Write the frame to the output file
        
        if writer is None:
            fourcc = cv2.VideoWriter_fourcc(*"DIVX")
            writer = cv2.VideoWriter(output, fourcc, 24,
                (frame.shape[1], frame.shape[0]), True)
        
        writer.write(frame)
        
        # Slight delay if no RoI selected just to make it easier to select
        
        if not selected:
            time.sleep(0.09)
        key = cv2.waitKey(1) & 0xFF
        
        # If 's' is pressed, video pauses and RoI can be selected. Press 'enter' or 'space' after selecting
        
        if key == ord("s"):            
            selected = True
            roi = cv2.selectROI("ROI Video", frame, fromCenter=False,
                showCrosshair=True)
            roi = roi[0],roi[1],roi[0]+roi[2],roi[1]+roi[3]
            
        # If 'w' is pressed, remove the current RoI    
            
        elif key == ord("w"):
            selected = False
        
        # If 'q' is pressed, quit
            
        elif key == ord("q"):
            break    
            
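    # The writer is only created after the first frame is read, so guard against an empty video
    if writer is not None:
        writer.release()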
    vs.release()
    cv2.destroyAllWindows()
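
The rescaling step inside 'dai_roi' can be looked at in isolation. The sketch below maps a box predicted on the resized frame back to the original frame, using the same scale factors and the same round-up rule as the function. The helper name `scale_box` and the 1280x720 frame size are assumptions made for illustration only.

```python
import math

def scale_box(box, width_scale, height_scale):
    """Map [x1, y1, x2, y2] from the resized frame to the original frame."""
    x1, y1, x2, y2 = box
    return [int(math.ceil(x1 * width_scale)), int(math.ceil(y1 * height_scale)),
            int(math.ceil(x2 * width_scale)), int(math.ceil(y2 * height_scale))]

# Hypothetical sizes: a 1280x720 video frame resized to 256x256 for prediction
old_H, old_W = 720, 1280
H, W = 256, 256
height_scale = old_H / H   # 2.8125
width_scale = old_W / W    # 5.0

# A box predicted on the 256x256 frame
scaled = scale_box([64, 32, 128, 160], width_scale, height_scale)
print(scaled)  # [320, 90, 640, 450]
```

The x coordinates are multiplied by the width scale and the y coordinates by the height scale, so a non-square video is handled correctly even though the prediction frame is square.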

## Usage

The function 'dai_roi' takes a video path, an object detection model, and some additional parameters.
When run, it displays the video in a window. While no region of interest (RoI) is selected, there is a slight delay between frames so that it is easier to mark one.

When the user presses 's', the video pauses and they can draw a box around the RoI. Pressing 'space' or 'enter' confirms the selection, and the video resumes with the RoI drawn on every frame.
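
The selection comes back from `cv2.selectROI` as `(x, y, width, height)`, while 'get_iou' expects corner coordinates, which is why 'dai_roi' converts it right after the selection. A minimal sketch of that conversion follows; the helper name `xywh_to_corners` is hypothetical and does not appear in the notebook.

```python
def xywh_to_corners(rect):
    """Convert (x, y, width, height) to (x1, y1, x2, y2)."""
    x, y, w, h = rect
    return (x, y, x + w, y + h)

# A selection whose top-left corner is at (50, 40), 200 pixels wide and 100 pixels tall
corners = xywh_to_corners((50, 40, 200, 100))
print(corners)  # (50, 40, 250, 140)
```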

From then on, whenever a detected person overlaps the RoI with an IoU greater than 'iou_thresh', a bounding box is drawn around them for as long as they stay in the RoI.
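
One consequence of using IoU for this test is that a person standing fully inside a large RoI can still score low, because the union is dominated by the RoI itself. The sketch below works through one such case. The box coordinates are made up for illustration, and the helper ignores the small `epsilon` term used in 'get_iou'.

```python
def box_area(box):
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)

def iou_contained(inner, outer):
    """IoU when `inner` lies entirely inside `outer`.

    The intersection is `inner` and the union is `outer`,
    so the score is simply the ratio of the two areas.
    """
    return box_area(inner) / box_area(outer)

roi = (100, 100, 500, 400)      # 400 x 300 = 120000 pixels
person = (200, 150, 260, 330)   # 60 x 180 = 10800 pixels, fully inside the RoI

score = iou_contained(person, roi)   # 10800 / 120000 = 0.09
print(score > 0.3)    # False: missed with the default iou_thresh of 0.3
print(score > 0.08)   # True: caught with a lower threshold such as 0.08
```

With a large RoI, a low 'iou_thresh' is therefore a reasonable choice; with a RoI about the size of a person, a higher value avoids flagging people who only brush its edge.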

Pressing 'w' clears the RoI; no RoI is active until the user presses 's' again.

Pressing 'q' stops playback. The annotated video, including any bounding boxes that were displayed, is saved to the 'output' file.

You can see the generated output video, 'rf_roi_github.mp4', as well as a screen recording of the demo, 'roi_usage_demo.mp4'.

If you want to try it yourself, just run this notebook.

In [8]:
video_path = 'videos/rf.mp4'

In [9]:
dai_roi(video_path,net,detection_conf=0.25,nms_overlap=0.2,iou_thresh=0.08,output='rf_roi.mp4')