# Track Person of Interest

This notebook does not train a model with DreamAI. Instead, it shows what can be done by combining two general-purpose models trained with DreamAI: an object detection model and a face detection model, used together to detect and blur everyone who is not the person of interest (PoI).

## Imports

In [1]:
import sys

# Make sure to change this path to your folder with DreamAI

sys.path.insert(0, '/home/farhan/hamza/dreamai/') # Folder with DreamAI

# Things below are all included in the dreamai folder

import utils
import obj_utils
from dai_imports import *

%load_ext autoreload
%autoreload 2

In [2]:
# Load the face detection and the object detection models

face_net = data_processing.load_obj('best_face_net.pkl')
person_net = data_processing.load_obj('best_obj_net.pkl')

In [3]:
# Function used to calculate the intersection over union (IoU) of two rectangles
# Source: ronny rest (can't seem to find the link to their website)

def get_iou(a, b, epsilon=1e-5):
    """ Given two boxes `a` and `b` defined as a list of four numbers:
            [x1,y1,x2,y2]
        where:
            x1,y1 represent the upper left corner
            x2,y2 represent the lower right corner
        It returns the Intersection over Union score for these two boxes.

    Args:
        a:          (list of 4 numbers) [x1,y1,x2,y2]
        b:          (list of 4 numbers) [x1,y1,x2,y2]
        epsilon:    (float) Small value to prevent division by zero

    Returns:
        (float) The Intersection over Union score.
    """
    # COORDINATES OF THE INTERSECTION BOX
    x1 = max(a[0], b[0])
    y1 = max(a[1], b[1])
    x2 = min(a[2], b[2])
    y2 = min(a[3], b[3])

    # AREA OF OVERLAP - Area where the boxes intersect
    width = (x2 - x1)
    height = (y2 - y1)
    # handle case where there is NO overlap
    if (width<0) or (height <0):
        return 0.0
    area_overlap = width * height

    # COMBINED AREA
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    area_combined = area_a + area_b - area_overlap

    # RATIO OF AREA OF OVERLAP OVER COMBINED AREA
    iou = area_overlap / (area_combined+epsilon)
    return iou
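
As a quick sanity check, here is a minimal, self-contained sketch of the same computation with three worked cases. The helper name `iou_sketch` is hypothetical; it only restates the logic of `get_iou` so that the example runs on its own.

```python
def iou_sketch(a, b, epsilon=1e-5):
    """Compact restatement of get_iou for boxes given as [x1, y1, x2, y2]."""
    # Width and height of the intersection rectangle
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width < 0 or height < 0:
        return 0.0  # the boxes do not overlap
    overlap = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return overlap / (area_a + area_b - overlap + epsilon)

# Two 10x10 boxes offset by 5 pixels: the overlap is 25 and the union is 175
partial = iou_sketch([0, 0, 10, 10], [5, 5, 15, 15])
# Boxes that do not touch at all
disjoint = iou_sketch([0, 0, 10, 10], [20, 20, 30, 30])
# A box compared with itself
identical = iou_sketch([0, 0, 10, 10], [0, 0, 10, 10])

print(round(partial, 3), disjoint, round(identical, 3))
```

The partial overlap works out to about 25/175, the disjoint pair to exactly 0.0, and the identical pair to just under 1.0 because of the `epsilon` term in the denominator.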

In [4]:
# Function used to expand a bounding box by some margin

def expand_rect(left,top,right,bottom,H,W, margin = 15):
    if top >= margin:
        top -= margin
    if left >= margin:
        left -= margin
    if bottom <= H-margin:
        bottom += margin
    if right <= W-margin:
        right += margin
    return left,top,right,bottom

def dai_poi(video_path, face_net, person_net, detection_conf = 0.3, nms_overlap = 0.4,
                   rec_tolerance = 0.5, iou_thresh = 0.3, frame_size = (256,256), output = 'poi_out.mp4'):
    writer = None
    
    # Open the video file for reading (no warm-up pause is needed for a file, unlike a camera)
    vs = cv2.VideoCapture(video_path)
    frame_number = 0
    poi = None
    selected = False
    gone = False
    all_blur = True
    faces_found = False
    person_found = False
    tracker = None
    
    while True:
        
        # Grab a frame from the video stream
        (grabbed, frame) = vs.read()
        # If the frame was not grabbed, we have reached the end of the video
        if not grabbed:
            break
        frame_number += 1
        old_H, old_W = frame.shape[:2]

        # Our model was trained on images of size 'frame_size', and the object detector can't be run on
        # images of a different size, so we resize the frame just for predictions
        new_frame = cv2.resize(frame,frame_size)
        (H, W) = new_frame.shape[:2]
        height_scale = old_H/H
        width_scale = old_W/W
        
        # Display the current info on the frame
        
        info = [
                ("Selected", selected),
                ("All Blur", all_blur)
            ]
        for (i, (k, v)) in enumerate(info):
            text = "{}: {}".format(k, v)
            cv2.putText(frame, text, (10, old_H - ((i * 20) + 20)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
        
        # If PoI has been selected, update the tracker and draw the bounding box on the frame
        
        if selected:           
            (success, box) = tracker.update(frame)
            box = [int(box[0]),int(box[1]),int(box[2]+box[0]),int(box[1]+box[3])]

            # Check to see if the tracking was a success
            
            if success:
                (x1, y1, x2, y2) = box
                cv2.rectangle(frame, (x1, y1), (x2, y2),(0, 0, 255), 2)
                cv2.rectangle(frame, (x1, y2 + 25), (x2, y2), (0, 0, 255), cv2.FILLED)
                font = cv2.FONT_HERSHEY_DUPLEX
                cv2.putText(frame, 'POI', (x1 + 6, y2 + 19), font, 0.5, (255, 255, 255), 1)
        
        # 'all_blur' means that we should blur out every face whether it's the PoI or not
        
        if selected or all_blur:

            # Take the resized 'new_frame' and turn it into a PyTorch batch to be fed to the model
            
            img_batch = utils.get_test_input(imgs=[new_frame],size=frame_size)
            
            # Predict. This will output the bounding boxes and the labels of all faces detected
            
            face_locations = face_net.predict_objects(img_batch,score_thresh=detection_conf,
                                                 nms_overlap=nms_overlap)[0]
            
            faces_found = len(face_locations) > 0
            if faces_found:
                
                # Since our 'new_frame' was the resized frame,
                # we must scale the predicted bounding boxes back to the original frame size
                
                face_locations = [[int(np.ceil(f[0]*width_scale)),int(np.ceil(f[1]*height_scale)),
                               int(np.ceil(f[2]*width_scale)),int(np.ceil(f[3]*height_scale))] 
                               for f in face_locations]
                for (left, top, right, bottom) in face_locations:
                    
                    # If all_blur, blur the face, elif there's a PoI, blur every face but the PoI
                    
                    if all_blur:
                        left,top,right,bottom = expand_rect(left,top,right,bottom,old_H,old_W,margin=40)
                        try:
                            # Blur inside the try block so an empty region cannot crash the loop
                            sub_face = frame[top:bottom, left:right]
                            sub_face = cv2.GaussianBlur(sub_face,(35,35), 100)
                            frame[top:top+sub_face.shape[0], left:left+sub_face.shape[1]] = sub_face
                        except Exception:
                            pass
                    elif success:
                        iou = get_iou(box,(left, top, right, bottom))    
                        if (iou < iou_thresh):
                            left,top,right,bottom = expand_rect(left,top,right,bottom,old_H,old_W,margin=40)
                            try:
                                sub_face = frame[top:bottom, left:right]
                                sub_face = cv2.GaussianBlur(sub_face,(35,35), 100)
                                frame[top:top+sub_face.shape[0], left:left+sub_face.shape[1]] = sub_face
                            except Exception:
                                pass
                            
            # Our face detector might miss some faces because of obstruction or a weird angle, so we also use
            # our object detector to detect people, and then blur out the top half of each person as well.
            # Combining the two makes it much less likely that a face is missed

            object_net_pred = person_net.predict_objects(img_batch,score_thresh=detection_conf,
                                                 nms_overlap=nms_overlap)
            object_found = len(object_net_pred[0]) > 0
            if object_found:
                person_locations = np.array(object_net_pred[0])[np.array(object_net_pred[1]) == 'person']
                
                if len(person_locations) > 0:
                
                    person_locations = [[int(np.ceil(f[0]*width_scale)),int(np.ceil(f[1]*height_scale)),
                                       int(np.ceil(f[2]*width_scale)),int(np.ceil(f[3]*height_scale))] 
                                       for f in person_locations]
                    for (left, top, right, bottom) in person_locations:

                        half_coords = (left,top,right,((bottom-top)//2)+top)
                        if all_blur:
                            left,top,right,bottom = half_coords
                            try:
                                sub_face = frame[top:bottom, left:right]
                                sub_face = cv2.GaussianBlur(sub_face,(35,35), 100)
                                frame[top:top+sub_face.shape[0], left:left+sub_face.shape[1]] = sub_face
                            except Exception:
                                pass
                        elif success:
                            iou = get_iou(box,(left, top, right, bottom))
                            left,top,right,bottom = half_coords
                            if (iou < iou_thresh):
                                try:
                                    # Blur inside the try block so an empty region cannot crash the loop
                                    sub_face = frame[top:bottom, left:right]
                                    sub_face = cv2.GaussianBlur(sub_face,(35,35), 100)
                                    frame[top:top+sub_face.shape[0], left:left+sub_face.shape[1]] = sub_face
                                except Exception:
                                    pass
                       
        # Show the output frame

        cv2.imshow("POI Video", frame)
        
        # Write the frame to the output file
        
        if writer is None:
            fourcc = cv2.VideoWriter_fourcc(*"DIVX")
            writer = cv2.VideoWriter(output, fourcc, 24,
                (frame.shape[1], frame.shape[0]), True)
        
        writer.write(frame)
        
        # Slight delay if no PoI selected just to make it easier to select
        
        if not selected:
            time.sleep(0.09)
        key = cv2.waitKey(1) & 0xFF
        
        # If 's' is pressed, video pauses and PoI can be selected. Press 'enter' or 'space' after selecting
        
        if key == ord("s"):
            
            selected = True
            all_blur = False
            poi = cv2.selectROI("POI Video", frame, fromCenter=False,
                showCrosshair=True)

            # Start OpenCV object tracker using the selected PoI
            tracker = cv2.TrackerCSRT_create()
            tracker.init(frame, poi)
            
        # If 'd' is pressed, toggle all_blur
        
        elif key == ord("d"):
            all_blur = not all_blur
            
        # If 'w' is pressed, remove the current PoI and set all_blur to True    
            
        elif key == ord("w"):
            selected = False
            all_blur = True
        
        # If 'q' is pressed, quit
            
        elif key == ord("q"):
            break    
            
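    # Release resources (the writer only exists once at least one frame has been read)
    if writer is not None:
        writer.release()
    vs.release()
    cv2.destroyAllWindows()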
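
Two pieces of box geometry in the function above are easy to check in isolation. Predictions are made on the resized frame, so every box has to be multiplied by the width and height scale factors before it is drawn on the original frame. Also, `expand_rect` as written leaves a side unexpanded when the full margin does not fit inside the frame. The sketch below restates the scaling step and shows one possible alternative that clamps to the frame edges instead. Both helper names (`scale_box`, `expand_rect_clamped`) and the frame sizes are assumptions made for illustration and are not part of DreamAI.

```python
import math

def scale_box(box, width_scale, height_scale):
    """Map [x1, y1, x2, y2] from the resized frame back to the original frame."""
    x1, y1, x2, y2 = box
    return [int(math.ceil(x1 * width_scale)), int(math.ceil(y1 * height_scale)),
            int(math.ceil(x2 * width_scale)), int(math.ceil(y2 * height_scale))]

def expand_rect_clamped(left, top, right, bottom, H, W, margin=15):
    """Hypothetical variant of expand_rect that clamps each side to the frame."""
    return (max(0, left - margin), max(0, top - margin),
            min(W, right + margin), min(H, bottom + margin))

# Assumed example: a 1280x720 frame resized to 256x256 for prediction
old_H, old_W = 720, 1280
H, W = 256, 256
height_scale = old_H / H   # 2.8125
width_scale = old_W / W    # 5.0

# A face predicted near the left edge of the resized frame
face = scale_box([2, 20, 50, 60], width_scale, height_scale)
expanded = expand_rect_clamped(*face, old_H, old_W, margin=40)
print(face, expanded)
```

Here the scaled face starts at x = 10, so a 40-pixel margin does not fit on the left. The original `expand_rect` would keep `left` at 10, while the clamped variant moves it to 0; which behaviour is preferable depends on how generous the blur should be near the frame border.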

## Usage

The function 'dai_poi' takes a video path and two models, a face detector and an object detector, plus some additional parameters.
When run, it displays the video in a window. While no PoI is selected, there is a slight delay between frames so that it is easier to mark the person of interest (PoI).

By default, every face is blurred out. Pressing 'd' toggles 'all_blur'; with it off and no PoI selected, no one is blurred.

When the user presses 's', the video pauses and they can draw a box around the PoI. When they then press 'space' or 'enter', the video resumes with the PoI highlighted and tracked and all other faces blurred.
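
The decision of which faces to blur once a PoI is tracked can be sketched without OpenCV. The tracker reports its box as (x, y, width, height), which 'dai_poi' converts to corner coordinates before comparing it with each detected face by IoU. The helper names below (`xywh_to_xyxy`, `iou`, `should_blur`) and the example boxes are hypothetical; the logic follows the branches in 'dai_poi' but is a simplified restatement, not the function itself.

```python
def xywh_to_xyxy(box):
    """Convert an (x, y, w, h) tracker box to integer [x1, y1, x2, y2] corners."""
    x, y, w, h = box
    return [int(x), int(y), int(x + w), int(y + h)]

def iou(a, b, epsilon=1e-5):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width < 0 or height < 0:
        return 0.0
    overlap = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return overlap / (area_a + area_b - overlap + epsilon)

def should_blur(face_box, poi_box, all_blur, tracking_ok, iou_thresh=0.3):
    """Mirror the branches in dai_poi: blur all, or blur faces away from the PoI."""
    if all_blur:
        return True
    if not tracking_ok:
        return False  # as in the source, nothing is blurred when tracking fails
    return iou(poi_box, face_box) < iou_thresh

# Assumed example: a full-body PoI box from the tracker and two detected faces
poi_box = xywh_to_xyxy((100, 50, 80, 200))   # -> [100, 50, 180, 250]
poi_face = [120, 60, 160, 110]               # lies inside the PoI box
other_face = [400, 60, 440, 110]             # far away from the PoI box

keep = should_blur(poi_face, poi_box, all_blur=False, tracking_ok=True, iou_thresh=0.01)
hide = should_blur(other_face, poi_box, all_blur=False, tracking_ok=True, iou_thresh=0.01)
print(keep, hide)
```

In this example the PoI's face has an IoU of about 0.125 with the full-body box, so it would still be blurred at the default threshold of 0.3. That may be why the call at the end of this notebook passes a much lower `iou_thresh`.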

Pressing 'w' will reset the PoI and turn on 'all_blur'. Until the user presses 's' again, there will be no PoI.

Pressing 'q' quits the video. The output video, with the PoI bounding box drawn on it, is saved to the 'output' file.
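
The key bindings described above amount to a small state machine over two flags, `selected` and `all_blur`. The following is a minimal sketch of just those transitions. The function name `handle_key` is hypothetical, and the sketch deliberately leaves out the ROI selection and tracker creation that 'dai_poi' performs when 's' is pressed.

```python
def handle_key(key, selected, all_blur):
    """Return (selected, all_blur, quit) after one key press."""
    if key == ord("s"):
        return True, False, False            # a PoI is selected, stop blurring everyone
    if key == ord("d"):
        return selected, not all_blur, False  # toggle blurring of every face
    if key == ord("w"):
        return False, True, False            # drop the PoI and blur everyone again
    if key == ord("q"):
        return selected, all_blur, True       # leave the loop
    return selected, all_blur, False          # any other key changes nothing

state = (False, True)  # initial state: no PoI, blur everyone
history = []
for key in "dsw":
    selected, all_blur, _ = handle_key(ord(key), *state)
    state = (selected, all_blur)
    history.append(state)
    print(key, state)
```

Keeping the transitions in a pure function like this makes them easy to test separately from the video loop.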

You can see the generated output video, 'mall_poi_github.mp4', as well as a screen recording of the demo, 'poi_usage_demo.mp4'.

If you want to try it yourself, just run this notebook.

In [5]:
video_path = 'videos/mall.mp4'

In [7]:
dai_poi(video_path,face_net,person_net,detection_conf=0.1,nms_overlap=0.3,
                        rec_tolerance=0.56,iou_thresh=0.01,
                        output='mall_poi.mp4')