### Small summary of the algorithm

1. Taking the **difference** of two consecutive **frames** to isolate the parts of the image that change over time
2. Applying a **Gaussian blur** to this difference to smooth (and slightly extend) the borders of potential cars
3. Applying simple **binary thresholding** to binarize the image
4. Applying **dilation** several times to make the borders more distinguishable
5. Finding the contours on the dilated image and a **bounding box** for each of them
6. Since a single moving object may be split into several boxes instead of one, **merging the boxes** that likely represent the same object, based on the horizontal and vertical distance between the box centers
7. Once we have the final bounding boxes for each frame, computing their trails by **pairwise comparison** of bounding boxes in consecutive frames using the *Intersection over Union* (IoU) metric. Boxes with a high IoU are considered the same object, so the trail continues with the trail color from the previous frame
8. **Drawing** the bounding boxes and trails on each frame
9. **Concatenating** the frames **into** a single 60 FPS **video**

1. Splitting the video into frames is adapted from:

    https://learnopencv.com/reading-and-writing-videos-using-opencv/
    
2. The idea of taking the difference between frames is taken from:

    https://www.analyticsvidhya.com/blog/2020/04/vehicle-detection-opencv-python/
    
3. Information about opening and closing is taken from:

    https://docs.opencv.org/master/d9/d61/tutorial_py_morphological_ops.html
    
4. Finding the car contours is adapted from:

    https://docs.opencv.org/3.4/d4/d73/tutorial_py_contours_begin.html
    
5. Merging the frames into video is adapted from:

    https://stackoverflow.com/a/44948030
    
6. Code for finding IoU (Intersection over Union) is adapted from:

    https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
    
7. The line for drawing a point in OpenCV is taken from:

    https://stackoverflow.com/a/60546030

In [1]:
import matplotlib.pyplot as plt
from random import randint
import cv2

In [2]:
def show(images):
    '''
    Showing the images using pyplot  (a helper function)
    '''
    
    # going through each image and plotting it
    for image in images:
        
        # detecting the cmap
        cmap = 'gray' if len(image.shape) == 2 else None

        # setting the size of the figure
        plt.figure(figsize = (20, 15))

        # showing the image
        plt.imshow(image, cmap = cmap)
        plt.show()

In [3]:
class MovingObject:
    '''
    Class for representing a moving object with its last bounding box,
    a random color and limit for a trail (along with trail points)
    '''
    
    def __init__(self, coords):
        '''
        Setting the coordinates of the object and assigning a random color and a trail limit
        '''
        
        # setting the coordinates
        self.coords = coords
        
        # setting the random RGB color 
        self.color = tuple(randint(0, 255) for _ in range(3))
        
        # setting the limit for a trail (in frames)
        self.limit = 100
        
        # initializing a list to hold all the trail points
        self.trail = []
        
        # append a first point
        self.draw_point()
        
    def draw_point(self):
        '''
        Generate a new trail point and add it to the existing trail
        '''
        
        # generating a center of bounding box
        point = ((self.coords[0] + self.coords[2]) // 2, (self.coords[1] + self.coords[3]) // 2)
        
        # adding it to the trail
        self.trail.append(point)
        
    def iou(self, another_obj):
        '''
        Calculating Intersection over Union metric of areas between two given moving objects
        '''
        
        # determining coordinates of intersection 
        xA = max(self.coords[0], another_obj.coords[0])
        yA = max(self.coords[1], another_obj.coords[1])
        xB = min(self.coords[2], another_obj.coords[2])
        yB = min(self.coords[3], another_obj.coords[3])
        
        # calculating the area of the intersection rectangle
        int_area   = max(0, xB - xA) * max(0, yB - yA)
        
        # calculating the area of two initial boxes and its union part
        box1_area  = (self.coords[2] - self.coords[0]) * (self.coords[3] - self.coords[1])
        box2_area  = (another_obj.coords[2] - another_obj.coords[0]) * (another_obj.coords[3] - another_obj.coords[1])
        union_area = box1_area + box2_area - int_area
        
        # returning the ratio of the intersection area to the union area (0 if the union is empty)
        return int_area / union_area if union_area else 0
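
As a quick sanity check of the IoU formula used in `MovingObject.iou`, here is a small worked example on plain `(x1, y1, x2, y2)` tuples. The standalone function `iou_of_boxes` is a hypothetical helper written for this illustration; it is not part of the notebook.

```python
def iou_of_boxes(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    # corners of the intersection rectangle
    x_left = max(box_a[0], box_b[0])
    y_top = max(box_a[1], box_b[1])
    x_right = min(box_a[2], box_b[2])
    y_bottom = min(box_a[3], box_b[3])

    # the intersection area is zero when the boxes do not overlap
    inter = max(0, x_right - x_left) * max(0, y_bottom - y_top)

    # union = sum of both areas minus the part counted twice
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union else 0.0


# two 10x10 boxes shifted by 5 px: intersection 50, union 150
print(iou_of_boxes((0, 0, 10, 10), (5, 0, 15, 10)))    # 50 / 150, about 0.33

# disjoint boxes have no intersection
print(iou_of_boxes((0, 0, 10, 10), (20, 20, 30, 30)))  # 0.0
```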

In [4]:
class MyObjectDetector:
    '''
    Engine for detecting objects (hopefully cars) from a video in a specified directory
    '''
    
    def __init__(self, video_path):
        '''
        Opening the video and sampling it into frames
        '''
        
        # open video
        video = cv2.VideoCapture(video_path)
        
        # obtaining colored and grayscale frames from video
        self.frames_gray = []
        self.frames = []
        while video.isOpened():
            
            # getting the next frame along with end of video flag
            flag, frame_bgr = video.read()
            
            # if we reached the end of stream, stopping
            if not flag: break
                
            # obtaining grayscale version of a particular frame
            frame_gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
            
            # adding both frames to the list
            self.frames.append(frame_bgr)
            self.frames_gray.append(frame_gray)
        
        # releasing the video stream object
        video.release()
        
        # structure holding all the boxes
        self.boxes = [] 
        
    def detect_bounding_boxes(self, debug = False):
        '''
        Function which detects bounding boxes for (hopefully) cars
        '''
        
        # going through each frame and populating the boxes structure
        for i in range(1, len(self.frames_gray)):
            
            # defining the two consecutive frames
            previous_frame = self.frames_gray[i - 1].copy()
            current_frame  = self.frames_gray[i].copy()
            
            # taking their difference to detect movement
            subtracted = cv2.absdiff(current_frame, previous_frame)
            
            # applying gaussian blur to smoothen the borders
            k = 11
            gauss = cv2.GaussianBlur(subtracted, (k, k), 0)
            
            # cutting out the regions that we're not interested in (upper and lower parts of picture)
            gauss[:120, :] = 0
            gauss[250:, :] = 0
            
            # binarizing the blurred difference using a binary threshold
            _, thresh = cv2.threshold(gauss, 25, 255, cv2.THRESH_BINARY)
            
            # applying dilation to the image to extend the car contours (so it's easier to detect them)
            k = 5
            dilated = cv2.dilate(thresh, (k, k), iterations = 7)
            
            # finding the contours of moving objects
            contours, _ = cv2.findContours(dilated, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
            
            # filtering out very small contours (probably noise from sun or objects that only start appearing)
            # contours = list(filter(lambda cont: 25 <= cv2.contourArea(cont), contours))
            
            # obtaining bounding boxes for the object contours
            OFFSET = 3  # offset obtained after dilation
            SHADOW = 8  # average thickness of a shadow, removed from the box height
            boxes  = [] # boxes for this particular frame
            for cont in contours:
                
                # obtaining box coordinates
                x, y, w, h = cv2.boundingRect(cont)
                
                # normalizing them
                x += OFFSET // 2
                w -= OFFSET
                h -= SHADOW
                y -= OFFSET
                
                # appending them to the boxes holding structure
                boxes.append((x, y, x + w, y + h))
                
            # adding boxes to the overall boxes quantity
            self.boxes.append(boxes)
            
            # plotting the process of obtaining the car contours for debugging
            if debug: show([gauss, thresh, dilated, current_frame]); break
                
        # increasing the precision of detection by correcting the bounding boxes
        self.merge_bounding_boxes()
                
    def merge_bounding_boxes(self):
        '''
        Correct the output of contour detection and merge bounding boxes of the same object
        into one bounding box
        '''
        
        def boxes_are_merged(boxes, h_thr, w_thr):
            '''
            Given a list of boxes, returns True if no two of them still need merging
            (i.e. False if some pair likely belongs to the same object)
            '''
            
            # going through each pair of boxes
            for i in range(len(boxes) - 1):
                for j in range(i + 1, len(boxes)):
                    
                    # defining the centers of two bounding boxes
                    center1 = ((boxes[i][0] + boxes[i][2]) // 2, (boxes[i][1] + boxes[i][3]) // 2)
                    center2 = ((boxes[j][0] + boxes[j][2]) // 2, (boxes[j][1] + boxes[j][3]) // 2)
                    
                    # checking their similarity w.r.t. their centers
                    if abs(center1[0] - center2[0]) <= w_thr and abs(center1[1] - center2[1]) <= h_thr:
                        return False

            # returning True if there are no pairs to merge
            return True
        
        def merged_boxes(boxes, h_thr, w_thr):
            '''
            Merging, in a single pass, every pair of boxes whose centers are close enough
            '''
            
            # set of indices which are merged
            victims = set()
            
            # going through each pair of boxes
            for i in range(len(boxes) - 1):
                for j in range(i + 1, len(boxes)):
                    
                    # defining the centers of two bounding boxes
                    center1 = ((boxes[i][0] + boxes[i][2]) // 2, (boxes[i][1] + boxes[i][3]) // 2)
                    center2 = ((boxes[j][0] + boxes[j][2]) // 2, (boxes[j][1] + boxes[j][3]) // 2)
                    
                    # if similar then we merge
                    if abs(center1[0] - center2[0]) <= w_thr and abs(center1[1] - center2[1]) <= h_thr:
                        
                        # defining a box which will go away
                        victims.add(i)
                        
                        # populating a master box 
                        boxes[j] = (
                            min(boxes[i][0], boxes[j][0]),
                            min(boxes[i][1], boxes[j][1]),
                            max(boxes[i][2], boxes[j][2]),
                            max(boxes[i][3], boxes[j][3])
                        )
                        
            # filtering out the victim boxes because they got merged      
            return [box for i, box in enumerate(boxes) if i not in victims]
        
        # defining thresholds for searching the similarity between bounding boxes
        HEIGHT_THRESH = 10
        WIDTH_THRESH  = 50
        
        # going through each set of boxes
        for i in range(len(self.boxes)):
            
            # merging boxes until we merge every pair with each other
            while not boxes_are_merged(self.boxes[i], HEIGHT_THRESH, WIDTH_THRESH):
                self.boxes[i] = merged_boxes(self.boxes[i], HEIGHT_THRESH, WIDTH_THRESH) 
                
    def compute_trails(self):
        '''
        Transforming simple bounding boxes into moving objects
        with a trail color and a list of trail points
        '''
        
        # structure which will hold the moving objects
        self.objects = []
        
        # initializing the 2nd frame objects using the first pack of boxes
        self.objects.append([MovingObject(box) for box in self.boxes[0]])
        
        # going through each set of boxes
        for i in range(1, len(self.boxes)):
            
            # initializing objects of current and previous frame
            prev_obj = self.objects[-1]
            cur_obj  = [MovingObject(box) for box in self.boxes[i]]
            
            # finding the similarities between objects on current and previous frame
            similarities = [[None for j in range(len(prev_obj))] for k in range(len(cur_obj))]
            for j in range(len(cur_obj)):
                for k in range(len(prev_obj)):
                    
                    # calculating the Intersection over Union metric for each pair of objects
                    similarities[j][k] = cur_obj[j].iou(prev_obj[k])
                    
            # finding object which maximizes IoU similarity
            for j, row in enumerate(similarities):
                
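                # getting the most fitting result (0 if the previous frame had no objects,
                # since max() of an empty row would raise a ValueError)
                max_sim = max(row, default = 0)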
                
                # if there exists the same object on the next frame
                # we continue the trail and reassign the coordinates of bounding box
                if max_sim > 0.1:
                    
                    # getting the position of object on previous frame
                    prev_obj_ind = row.index(max_sim)
                    
                    # updating the information about trails and color
                    cur_obj[j].color = prev_obj[prev_obj_ind].color
                    cur_obj[j].trail = prev_obj[prev_obj_ind].trail[:]
                    
                    # appending another point to the trail
                    cur_obj[j].draw_point()
            
            # appending the objects with updated info
            self.objects.append(cur_obj)

    def detect_and_record(self, path):
        '''
        Detect moving objects in the processed video and produce an output video
        with bounding boxes and trails, written to the given path
        '''
        
        # populating bounding boxes 
        self.detect_bounding_boxes()
        
        # computing the trails for each moving object
        self.compute_trails()
        
        # drawing them on every available image frame
        for frame, objects in zip(self.frames[1:], self.objects):
            
            # going through each object in defined boxes and drawing info about it
            for obj in objects:
                
                # drawing the last 'limit' points of the trail
                for x, y in obj.trail[-min(obj.limit, len(obj.trail)):]:
                    cv2.circle(frame, (x, y), radius = 2, color = obj.color, thickness = -1)
                
                # drawing the bounding box
                cv2.rectangle(frame,
                              (obj.coords[0], obj.coords[1]),
                              (obj.coords[2], obj.coords[3]),
                              (0, 0, 255), 2)
        
        # getting the shape of the frame 
        h, w, _ = self.frames[0].shape
        
        # initializing the video writer (60 FPS) for the output path given by the caller
        video = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*'DIVX'), 60, (w, h))
        
        # going through each obtained image and adding it to the video
        for frame in self.frames:
            video.write(frame)
        
        # releasing the video and finishing writing
        video.release()
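
The box merging idea can be illustrated on a few hand-made boxes. The sketch below is a simplified variant, not the notebook's `merge_bounding_boxes`: the function `merge_close_boxes` and the helper `center` are hypothetical names, and the scan restarts after every single merge instead of merging all qualifying pairs in one pass. The default thresholds are the notebook's `HEIGHT_THRESH` and `WIDTH_THRESH` values.

```python
def center(box):
    """Center of an (x1, y1, x2, y2) box, using integer division."""
    return ((box[0] + box[2]) // 2, (box[1] + box[3]) // 2)


def merge_close_boxes(boxes, h_thr=10, w_thr=50):
    """Repeatedly merge pairs of boxes whose centers are close to each other."""
    boxes = list(boxes)  # work on a copy, leave the input untouched
    merged_something = True
    while merged_something:
        merged_something = False
        for i in range(len(boxes) - 1):
            for j in range(i + 1, len(boxes)):
                (cx1, cy1), (cx2, cy2) = center(boxes[i]), center(boxes[j])
                if abs(cx1 - cx2) <= w_thr and abs(cy1 - cy2) <= h_thr:
                    a, b = boxes[i], boxes[j]
                    # the merged box is the smallest box covering both
                    boxes[j] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[i]
                    merged_something = True
                    break
            if merged_something:
                break  # restart the scan on the shortened list
    return boxes


# the first two boxes are fragments of one car, the third is a different car
boxes = [(100, 100, 130, 120), (135, 102, 170, 122), (400, 100, 440, 120)]
print(merge_close_boxes(boxes))  # [(100, 100, 170, 122), (400, 100, 440, 120)]
```

The width threshold is larger than the height threshold, so fragments that sit side by side at a similar height are merged, while boxes at clearly different heights are kept apart.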

In [5]:
detect0r = MyObjectDetector('cars.mp4')
detect0r.detect_and_record('detected.avi')
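
The matching step inside `compute_trails` can also be looked at on its own. The sketch below is a minimal illustration on plain tuples, assuming the same 0.1 IoU threshold as in the notebook; `box_iou` and `match_to_previous` are hypothetical helper names. It returns, for each current box, the index of the best matching previous box, or `None` when nothing overlaps enough (in the notebook, such an object starts a new trail with a new color).

```python
def box_iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    inter = (max(0, min(a[2], b[2]) - max(a[0], b[0])) *
             max(0, min(a[3], b[3]) - max(a[1], b[1])))
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def match_to_previous(cur_boxes, prev_boxes, threshold=0.1):
    """For each current box, find the previous box with the highest IoU."""
    matches = []
    for cur in cur_boxes:
        scores = [box_iou(cur, prev) for prev in prev_boxes]
        # default=0.0 covers the case of a previous frame without any boxes
        best = max(scores, default=0.0)
        matches.append(scores.index(best) if best > threshold else None)
    return matches


prev_boxes = [(100, 100, 170, 122), (400, 100, 440, 120)]
cur_boxes = [(405, 100, 445, 120), (10, 200, 40, 220), (104, 100, 174, 122)]

print(match_to_previous(cur_boxes, prev_boxes))  # [1, None, 0]
print(match_to_previous(cur_boxes, []))          # [None, None, None]
```

Note that this greedy scheme lets two current boxes match the same previous box; that is also true of the loop in `compute_trails`.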