## Implementation Process of Each Functionality and Introduction of the Parameters

The following code and markdowns introduce the implementation process of the 3 main functionalities of this project, including 360 object detection, 360 object tracking and 360 overtaking behaviour detection. 

All the functions in this notebook can be found in ./panoramic_detection/improved_OD.py, ./Object_Detection.py, ./Object_Tracking.py and ./Overtaking_Detection.py. Although some other code has been written in ./panoramic_detection/draw_output.py and ./deep_sort/, since it is low in importance and easy to understand with the comments, it is not included here.

### 1. Define a Function for Loading the Models (Faster RCNN and YOLO v5)

<b>Parameters of load_model():</b>

- <b>model_type:</b> The name of the model to use which should be either 'YOLO' or 'Faster RCNN';

- <b>input_size:</b> The maximum input size of the model, 640 by default;

- <b>score_threshold:</b> The threshold of the confidence score, 0.4 by default;

- <b>nms_threshold:</b> The threshold of the Non Maximum Suppression, 0.45 by default.

In [2]:
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()
import numpy as np
import os, json, cv2, random
from matplotlib import pyplot as plt
import torch
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog

# function used to load a YOLO or Faster RCNN model according to the users' demands
def load_model(model_type,input_size=640,score_threshold=0.4,nms_threshold=0.45):

    # first get the default config
    cfg = get_cfg()

    # choose a model from detectron2's model zoo
    cfg.merge_from_file(
        model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
    )
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
        "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"
    )

    cfg.INPUT.MAX_SIZE_TEST = input_size  # set the size of the input images
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = score_threshold  # set the threshold of the confidence score
    cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST = nms_threshold  # set the NMS threshold

    # set the device to use (GPU or CPU)
    if torch.cuda.is_available():
        cfg.MODEL.DEVICE = "cuda"  
    else:
        cfg.MODEL.DEVICE = "cpu"

    # only work on apple m1 mac
    # cfg.MODEL.DEVICE = 'mps'

    # create a predictor instance with the config above
    predictor_faster_RCNN = DefaultPredictor(cfg)

    # choose a model from YOLO v5 family
    predictor_YOLO = torch.hub.load("ultralytics/yolov5", "yolov5m6")
    predictor_YOLO.conf = score_threshold  # set the threshold of the confidence score
    predictor_YOLO.iou = nms_threshold  # set the NMS threshold
    predictor_YOLO.agnostic = True  # NMS class-agnostic (i.e., only the bboxes with the same category can be eliminated after NMS)

    if model_type=='Faster RCNN':
        return predictor_faster_RCNN,cfg
    else:
        return predictor_YOLO,cfg

### 2. Prepare Some Functions Used for the Improved Object Detection


#### 2.1. Projection Transformation from Equirectangular to Perspective</b>

<b>Parameters of equir2pers():</b>

- <b>input_img:</b> The input image which is represented with multidimensional matrix;

- <b>FOV:</b> Field of view of the sub images;

- <b>THETAs:</b> A list which contains the theta of each sub image (The length should be the same as the number of sub images);

- <b>PHIs:</b> A list which contains the Phi of each sub image (The length should be the same as the number of sub images);

- <b>output_height, output_width:</b> Height and width of the output images (which should be the same).


In [3]:
# import the Perspective_and_Equirectangular library
import lib.Equirec2Perspec as E2P
import lib.Perspec2Equirec as P2E
import lib.multi_Perspec2Equirec as m_P2E

# function used to split the equirectangular image into several sub images which are in perspective projection
def equir2pers(input_img, FOV, THETAs, PHIs, output_height, output_width):
    equ = E2P.Equirectangular(input_img)  # Load the equirectangular image

    # set where to save the outputs
    output_dir = "./output_sub/"
    if not os.path.exists(output_dir):
        os.mkdir(output_dir)

    # maps which define the projection from equirectangular to perspective
    lon_maps = []
    lat_maps = []
    imgs = []  # output images

    # for each sub image
    for i in range(len(PHIs)):
        img1, lon_map1, lat_map1 = equ.GetPerspective(
            FOV, THETAs[i], PHIs[i], output_height, output_width
        )
        # save the outputs
        output1 = output_dir + str(i) + ".png"
        cv2.imwrite(output1, img1)
        lon_maps.append(lon_map1)
        lat_maps.append(lat_map1)
        imgs.append(img1)

    return lon_maps, lat_maps, imgs


#### 2.2. Project the Bounding Boxes on the Sub Images Back to the Original Image and Return the Bounding Boxes whose Left/Right Borders are Tangent to a Border of the Sub Image (which are Required to be Merged)

<b>Parameters of reproject_bboxes():</b>

- <b>bboxes:</b> A list of bounding boxes in [x y x y] format;

- <b>lon_map_original, lat_mat_original:</b> Map matrix got in the projection transformation from equirectangular to perspective (i.e., lon_maps, lat_maps returned by equir2pers());

- <b>classes, scores:</b> Lists of classes and scores predicted by the object detection model;

- <b>interval:</b> A value which determines how many pixels apart to calculate the corresponding coordinate point of the bounding boxes on the sub images. The smaller the interval is, the higher accuracy will be achieved;

- <b>num_of_subimage:</b> Serial number of the current sub image (0 or 1 or 2 or 3), as shown in the following image;

<div align=center><img src ="./images_in_markdown/markdown1.jpg"/></div>

- <b>input_video_width, input_video_height:</b> Height and width of the input video;

- <b>num_of_subimages:</b> Total number of the sub images (which should be 4 by default);

- <b>threshold_of_boundary:</b> A threshold used to determine whether a left/right border of a bounding box is tangent to a border of the sub image (i.e., distance < threshold_of_boundary);

- <b>is_split_image2:</b> Boolean value which used to determine whether to split the bboxes across the center line of sub image 2 into two, as shown below. 

<div align=center><img style="width:500px;"src ="./images_in_markdown/markdown2.png"/></div>


In [4]:
# function used to reproject the bboxes on the sub images (perspective) to the original image (equirectangular)
# and find the bboxes whose left/right border is tangent to a border of the sub image (i.e., distance < threshold_of_boundary)
def reproject_bboxes(
    bboxes,
    lon_map_original,
    lat_map_original,
    classes,
    scores,
    interval,
    num_of_subimage,
    input_video_width,
    input_video_height,
    num_of_subimages,
    threshold_of_boundary,
    is_split_image2=True,
):

    # lists for storing the new bboxes,classes and scores after reprojection
    new_bboxes = []
    new_classes = []
    new_scores = []

    # variables which store the index of the bboxes (in the list new_bboxes) which coincide with the left/right boundaries of the sub image
    left_boundary_box = None
    right_boundary_box = None

    # calculate the overlapped degree between each pair of the adjacent sub images (if the number of sub images is 4, then the result is 30)
    overlaped_degree = (num_of_subimages * 120 - 360) / num_of_subimages
    # calculate which subimage will be splited into two parts (if the number of sub images is 4, then the results will be image 2)
    num_of_splited_subimage = num_of_subimages / 2

    index = 0
    # number of pixels occupied by (overlaped_degree/2) degrees on a sub image
    margin = int(lon_map_original.shape[0] / 120 * (overlaped_degree / 2))

    # for each bbox, class and score
    for bbox, class1, score in zip(bboxes, classes, scores):

        # get the coordinates of the top left point and the right bottom point
        left_top_x = int(bbox[0])
        left_top_y = int(bbox[1])
        right_bottom_x = int(bbox[2])
        right_bottom_y = int(bbox[3])

        # only reproject the bboxes when they are not totally inside the overlapped area and their y-values are less than 70 degrees (or sometimes the backpack of the cyclist will be incorrectly detected as a car)
        if (
            margin
            <= ((left_top_x + right_bottom_x) / 2)
            <= (lon_map_original.shape[0] - margin)
            and left_top_y <= lon_map_original.shape[0] / 120 * 70
        ):

            # since for an a*b sub image, the size of lon_map and lat_map is (a-1)*(b-1), when right_bottom_x or right_bottom_y equals a or b,
            # to get the corresponding value in lon_map and lat_map (which represent the corresponding position on the original image), we have to subtract them by 1.
            if right_bottom_x == lon_map_original.shape[0]:
                right_bottom_x -= 1
            if right_bottom_y == lon_map_original.shape[1]:
                right_bottom_y -= 1

            # check if a bbox coincides with the left/right boundaries of the sub image, if yes, assign its index to left_boundary_box/right_boundary_box
            # if the bbox is large (>subimage size/5), just use the threshold to do the judgement
            if (right_bottom_x - left_top_x) * (
                right_bottom_y - left_top_y
            ) < lon_map_original.shape[0] * lon_map_original.shape[0] / 5:
                if left_top_x <= threshold_of_boundary:
                    left_boundary_box = index
                if right_bottom_x >= lon_map_original.shape[
                    0
                ] - threshold_of_boundary:
                    right_boundary_box = index

            # if the bbox is small (<=subimage size/5), set the threshold a little bit larger
            # (based on my experience, it has better performance ^_^)
            else:
                if left_top_x <= (
                    threshold_of_boundary + 15 * int(lon_map_original.shape[0] / 640)
                ):
                    left_boundary_box = index
                if right_bottom_x >= lon_map_original.shape[0] - (
                    threshold_of_boundary + 15 * int(lon_map_original.shape[0] / 640)
                ):
                    right_boundary_box = index

            # lists used to store the corresponding x and y coordinates on the original image of each point on the bbox
            xs = []
            ys = []

            # if the current sub image is the one which crosses the boundary (e.g., image 2 when the number of sub image is 4)
            # and the current bbox is across the center line
            if (
                num_of_subimage == num_of_splited_subimage
                and left_top_x <= int(lon_map_original.shape[0] / 2) - 1
                and right_bottom_x >= int(lon_map_original.shape[0] / 2)
            ):  
                # lists used to store the x coordinates on the original image of each point on the left/right part of the bbox
                xs_left = []
                xs_right = []

                # calculation for the left and right borders
                for i in range(left_top_y, right_bottom_y, interval):
                    # left border
                    x = int(round(lon_map_original[i, left_top_x]))
                    y = int(round(lat_map_original[i, left_top_x]))
                    xs.append(x)
                    ys.append(y)
                    xs_left.append(x)
                    # right border
                    x = int(round(lon_map_original[i, right_bottom_x]))
                    y = int(round(lat_map_original[i, right_bottom_x]))
                    xs.append(x)
                    ys.append(y)
                    xs_right.append(x)

                    
                # calculation for the left part of the top and bottom borders
                for i in range(
                    left_top_x, int(lon_map_original.shape[0] / 2) - 1, interval
                ):
                    x = int(round(lon_map_original[left_top_y, i]))
                    y = int(round(lat_map_original[left_top_y, i]))
                    xs.append(x)
                    ys.append(y)
                    xs_left.append(x)
                    x = int(round(lon_map_original[right_bottom_y, i]))
                    y = int(round(lat_map_original[right_bottom_y, i]))
                    xs.append(x)
                    ys.append(y)
                    xs_left.append(x)

                # calculation for the right part of the top and bottom borders
                for i in range(
                    int(lon_map_original.shape[0] / 2), right_bottom_x, interval
                ):
                    x = int(round(lon_map_original[left_top_y, i]))
                    y = int(round(lat_map_original[left_top_y, i]))
                    xs.append(x)
                    ys.append(y)
                    xs_right.append(x)
                    x = int(round(lon_map_original[right_bottom_y, i]))
                    y = int(round(lat_map_original[right_bottom_y, i]))
                    xs.append(x)
                    ys.append(y)
                    xs_right.append(x)


                ymax = max(ys)
                ymin = min(ys)
                xmin_left = min(xs_left)
                xmax_right = max(xs_right)

                # if it is needed to split the bbox into two parts, create two bboxes with the MBRs of the left and right part seperately
                if is_split_image2 == True:
                    new_bboxes.append([xmin_left, ymin, input_video_width, ymax])
                    new_bboxes.append([0, ymin, xmax_right, ymax])
                    new_classes.append(int(class1))
                    new_classes.append(int(class1))
                    new_scores.append(score)
                    new_scores.append(score)
                    index += 2
                
                # if not, create one bbox which extends outside the right boundary
                else:
                    new_bboxes.append(
                        [xmin_left, ymin, input_video_width + xmax_right, ymax]
                    )
                    new_classes.append(int(class1))
                    new_scores.append(score)
                    index += 1
            
            # if the current sub image is not the one which crosses the boundary
            else:
                # in case the interval is set larger than the length of the border, if so, set it as the length of the short side of the bbox
                if (
                    right_bottom_x - left_top_x < interval
                    or right_bottom_y - left_top_y < interval
                ):
                    interval = min(
                        right_bottom_x - left_top_x, right_bottom_y - left_top_y
                    )
                
                # get the corresponding coordinates on the original image of each point on the boundary
                for i in range(left_top_y, right_bottom_y, interval):
                    x = int(round(lon_map_original[i, left_top_x]))
                    y = int(round(lat_map_original[i, left_top_x]))
                    xs.append(x)
                    ys.append(y)
                    x = int(round(lon_map_original[i, right_bottom_x]))
                    y = int(round(lat_map_original[i, right_bottom_x]))
                    xs.append(x)
                    ys.append(y)
                for i in range(left_top_x, right_bottom_x, interval):
                    x = int(round(lon_map_original[left_top_y, i]))
                    y = int(round(lat_map_original[left_top_y, i]))
                    xs.append(x)
                    ys.append(y)
                    x = int(round(lon_map_original[right_bottom_y, i]))
                    y = int(round(lat_map_original[right_bottom_y, i]))
                    xs.append(x)
                    ys.append(y)
                
                # create one bbox with the MBR
                xmax = max(xs)
                xmin = min(xs)
                ymax = max(ys)
                ymin = min(ys)
                new_bboxes.append([xmin, ymin, xmax, ymax])
                new_classes.append(int(class1))
                new_scores.append(score)
                index += 1

    return new_bboxes, new_classes, new_scores, left_boundary_box, right_boundary_box

#### 2.3. Match the Serial Number of the Sub Images with the Serial Number of the Boundaries

As shown in the image below, according to the order of the positions in the original image, the boundaries of the sub images are labelled as boundary 1 to 8.

For example, the left and right borders of image 3 are boundary 1 and boundary 4.

<div align=center><img style="width:600px;" src ="./images_in_markdown/markdown3.png"/></div>

Thus, a function called number_of_left_and_right_boundary() is defined, which is used to match the serial numbers of the sub images with the serial numbers of boundaries.

</br>

<b>Parameters of number_of_left_and_right_boundary():</b>

- <b>number_of_subimage:</b> The serial number (0,1,2,3) of a sub image.

</br>

<b>⚠️Attention:</b> The function is designed specifically for the case of 4 sub images, if the number of sub images is set larger, please do some corresponding modifications.


In [5]:
# function used to match the serial number of the sub image with the serial number of boundary
def number_of_left_and_right_boundary(number_of_subimage):
    if number_of_subimage == 0:
        return [2, 5]
    elif number_of_subimage == 1:
        return [4, 7]
    elif number_of_subimage == 2:
        return [6, 1]
    else:
        return [0, 3]

#### 2.4. Merge the Bounding Boxes of the Objects which are Shown in Several Sub Images

After getting the bounding boxes which are tangent to a boundary, a function called merge_bbox_across_boundary() is defined to merge them into their MBR.

Bounding boxes that need to be merged can be classified into the following 2 categories:

1. <b>Objects crossing 2 sub images:</b> Two bounding boxes are tangent to consecutive two boundaries (boundaries 1&2 or 3&4 or 5&6 or 7&8), as shown below:

<div align=center><img style="width:600px;" src ="./images_in_markdown/markdown5.png"/></div>

2. <b>Objects crossing at least 3 sub images:</b> 4/6/8 bounding boxes are tangent to consecutive 4/6/8 boundaries. For example, for an object crossing image 0,1 and 2, as the following image shows, each of the boundary 5, 6, 7 and 8 should be tangent to a bounding box, and the bounding boxes tangent to boundary 5 and 8 should be the same.

<div align=center><img style="width:600px;" src ="./images_in_markdown/markdown6.png"/></div>

<b>⚠️Attention:</b> Sometimes, the number of consecutive boundaries with a tangent bounding box can be odd (3/5/7). In such cases, just delete the first/last box and merge the remaining ones, for the one to be deleted must be included in another bounding box. For example, in the following image, each of the boundary 5, 6 and 7 is tagent to a bounding box, however, since the blue one is in the overlapped area, the contents of it are also included in the green one. So, here, we can first delete it and merge the yellow and green ones.

<div align=center><img style="width:600px;" src ="./images_in_markdown/markdown4.png"/></div>

</br>
<b>Parameters of merge_bbox_across_boundary():</b>

- <b>bboxes_all:</b> List of bounding boxes after projection to the original image;

- <b>classes_all, scores_all:</b> List of categories and scores of the bounding boxes;

- <b>width, height:</b> Width and height of the original images;

- <b>bboxes_boundary:</b> A list whose length is 8. The Nth value represents the index of the bounding box which is tangent to the Nth boundary.

<b>⚠️Attention:</b> In the implementation of this function, all the possible situations are enumerated. Thus, the code can be a little bit lengthy and complex, sorry about that. In addition, the function is designed specifically for the case of 4 sub images, so if the number of sub images is set larger, please do some corresponding modifications.

</br>  
The following 3 functions are also defined in this part which are used in merge_bbox_across_boundary():

- <b>weighted_average_score():</b> A function used to calculate the weighted average score of several bounding boxes;

- <b>class_with_largest_score():</b> When the bboxes to merge are of different categories, use this function to choose the class with the largest weighted score as the class of the new bbox;

- <b>MBR_bboxes():</b> Calculate the MBR of several bboxes.



In [6]:
# function used to merge the bounding boxes of the objects which are shown in several sub images
def merge_bbox_across_boundary(bboxes_all,classes_all,scores_all,width,height,bboxes_boundary):
    
    # a list to store the index of the bbox to be deleted after we merge them
    bboxes_to_delete=[]

    # first delete the bboxes which are on the boundary and are totally in the overlapped areas
    names = locals()
    for i in range(0,8,1):
        if bboxes_boundary[i] !=None:
            #  although the overlapped area is 30 degree in width, here we set the threshold as 40, for after some tests, it seems 40 can get better performance.
            if (bboxes_all[bboxes_boundary[i]][2]-bboxes_all[bboxes_boundary[i]][0]) <= int(width/360*40):
                bboxes_to_delete.append(bboxes_boundary[i])
                bboxes_boundary[i] = None

    # Assign each value in the array to 8 variables, just for better understanding *_*
    bboxes_boundary1=bboxes_boundary[0]
    bboxes_boundary2=bboxes_boundary[1]
    bboxes_boundary3=bboxes_boundary[2]
    bboxes_boundary4=bboxes_boundary[3]
    bboxes_boundary5=bboxes_boundary[4]
    bboxes_boundary6=bboxes_boundary[5]
    bboxes_boundary7=bboxes_boundary[6]
    bboxes_boundary8=bboxes_boundary[7] 

    # if the object crosses all the 4 overlapped areas (12 34 56 78)
    if bboxes_boundary1!=None and bboxes_boundary2!=None and bboxes_boundary3!=None and bboxes_boundary4!=None and bboxes_boundary5!=None and bboxes_boundary6!=None and bboxes_boundary7!=None and bboxes_boundary8!=None and (bboxes_boundary1==bboxes_boundary4) and (bboxes_boundary3==bboxes_boundary6) and (bboxes_boundary5==bboxes_boundary8):
            bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1],bboxes_all[bboxes_boundary3],bboxes_all[bboxes_boundary5],bboxes_all[bboxes_boundary7]]))
            classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1],bboxes_all[bboxes_boundary3],bboxes_all[bboxes_boundary5],bboxes_all[bboxes_boundary7]],[scores_all[bboxes_boundary2],scores_all[bboxes_boundary1],scores_all[bboxes_boundary3],scores_all[bboxes_boundary5],scores_all[bboxes_boundary7]],[classes_all[bboxes_boundary2],classes_all[bboxes_boundary1],classes_all[bboxes_boundary3],classes_all[bboxes_boundary5],classes_all[bboxes_boundary7]]))
            scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1],bboxes_all[bboxes_boundary3],bboxes_all[bboxes_boundary5],bboxes_all[bboxes_boundary7]],[scores_all[bboxes_boundary2],scores_all[bboxes_boundary1],scores_all[bboxes_boundary3],scores_all[bboxes_boundary5],scores_all[bboxes_boundary7]])])
            bboxes_to_delete.extend([bboxes_boundary1,bboxes_boundary2,bboxes_boundary3,bboxes_boundary4,bboxes_boundary5,bboxes_boundary6,bboxes_boundary7,bboxes_boundary8])
    else:
        # if the object crosses 3 overlapped areas (12 34 56)
        if bboxes_boundary1!=None and bboxes_boundary2!=None and bboxes_boundary3!=None and bboxes_boundary4!=None and bboxes_boundary5!=None and bboxes_boundary6!=None and (bboxes_boundary1==bboxes_boundary4) and (bboxes_boundary3==bboxes_boundary6):
                bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1],bboxes_all[bboxes_boundary3],bboxes_all[bboxes_boundary5]]))
                classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1],bboxes_all[bboxes_boundary3],bboxes_all[bboxes_boundary5]],[scores_all[bboxes_boundary2],scores_all[bboxes_boundary1],scores_all[bboxes_boundary3],scores_all[bboxes_boundary5]],[classes_all[bboxes_boundary2],classes_all[bboxes_boundary1],classes_all[bboxes_boundary3],classes_all[bboxes_boundary5]]))
                scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1],bboxes_all[bboxes_boundary3],bboxes_all[bboxes_boundary5]],[scores_all[bboxes_boundary2],scores_all[bboxes_boundary1],scores_all[bboxes_boundary3],scores_all[bboxes_boundary5]])])
                bboxes_to_delete.extend([bboxes_boundary1,bboxes_boundary2,bboxes_boundary3,bboxes_boundary4,bboxes_boundary5,bboxes_boundary6])

                # if another object crosses the remaining overlapped area (78)
                if bboxes_boundary7!=None and bboxes_boundary8!=None:
                        bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary7],bboxes_all[bboxes_boundary8]]))
                        classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary8],bboxes_all[bboxes_boundary7]],[scores_all[bboxes_boundary8],scores_all[bboxes_boundary7]],[classes_all[bboxes_boundary8],classes_all[bboxes_boundary7]]))
                        scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary8],bboxes_all[bboxes_boundary7]],[scores_all[bboxes_boundary8],scores_all[bboxes_boundary7]])])
                        bboxes_to_delete.extend([bboxes_boundary7,bboxes_boundary8])

        
        # if the object crosses 3 overlapped areas (34 56 78)
        if bboxes_boundary3!=None and bboxes_boundary4!=None and bboxes_boundary5!=None and bboxes_boundary6!=None and bboxes_boundary7!=None and bboxes_boundary8!=None and (bboxes_boundary3==bboxes_boundary6) and (bboxes_boundary5==bboxes_boundary8):
                bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary4],bboxes_all[bboxes_boundary3],bboxes_all[bboxes_boundary5],bboxes_all[bboxes_boundary7]]))
                classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary4],bboxes_all[bboxes_boundary3],bboxes_all[bboxes_boundary5],bboxes_all[bboxes_boundary7]],[scores_all[bboxes_boundary4],scores_all[bboxes_boundary3],scores_all[bboxes_boundary5],scores_all[bboxes_boundary7]],[classes_all[bboxes_boundary4],classes_all[bboxes_boundary3],classes_all[bboxes_boundary5],classes_all[bboxes_boundary7]]))
                scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary4],bboxes_all[bboxes_boundary3],bboxes_all[bboxes_boundary5],bboxes_all[bboxes_boundary7]],[scores_all[bboxes_boundary4],scores_all[bboxes_boundary3],scores_all[bboxes_boundary5],scores_all[bboxes_boundary7]])])
                bboxes_to_delete.extend([bboxes_boundary3,bboxes_boundary4,bboxes_boundary5,bboxes_boundary6,bboxes_boundary7,bboxes_boundary8])

                # if another object crosses the remaining overlapped area (12)
                if bboxes_boundary1!=None and bboxes_boundary2!=None:
                        bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1]]))
                        classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1]],[scores_all[bboxes_boundary2],scores_all[bboxes_boundary1]],[classes_all[bboxes_boundary2],classes_all[bboxes_boundary1]]))
                        scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1]],[scores_all[bboxes_boundary2],scores_all[bboxes_boundary1]])])
                        bboxes_to_delete.extend([bboxes_boundary1,bboxes_boundary2])

        else:
            # if the object crosses 2 overlapped areas (12 34)
            if bboxes_boundary1!=None and bboxes_boundary2!=None and bboxes_boundary3!=None and bboxes_boundary4!=None and (bboxes_boundary1==bboxes_boundary4):
                    bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1],bboxes_all[bboxes_boundary3]]))
                    classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1],bboxes_all[bboxes_boundary3]],[scores_all[bboxes_boundary2],scores_all[bboxes_boundary1],scores_all[bboxes_boundary3]],[classes_all[bboxes_boundary2],classes_all[bboxes_boundary1],classes_all[bboxes_boundary3]]))
                    scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1],bboxes_all[bboxes_boundary3]],[scores_all[bboxes_boundary2],scores_all[bboxes_boundary1],scores_all[bboxes_boundary3]])])
                    bboxes_to_delete.extend([bboxes_boundary1,bboxes_boundary2,bboxes_boundary3,bboxes_boundary4])

                    # if another object crosses the remaining overlapped area (56)
                    if bboxes_boundary5!=None and bboxes_boundary6!=None:
                            bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary5],bboxes_all[bboxes_boundary6]]))
                            classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary6],bboxes_all[bboxes_boundary5]],[scores_all[bboxes_boundary6],scores_all[bboxes_boundary5]],[classes_all[bboxes_boundary6],classes_all[bboxes_boundary5]]))
                            scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary6],bboxes_all[bboxes_boundary5]],[scores_all[bboxes_boundary6],scores_all[bboxes_boundary5]])])
                            bboxes_to_delete.extend([bboxes_boundary5,bboxes_boundary6])

                    # if another object crosses the remaining overlapped area (78)
                    if bboxes_boundary7!=None and bboxes_boundary8!=None:
                            bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary7],bboxes_all[bboxes_boundary8]]))
                            classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary8],bboxes_all[bboxes_boundary7]],[scores_all[bboxes_boundary8],scores_all[bboxes_boundary7]],[classes_all[bboxes_boundary8],classes_all[bboxes_boundary7]]))
                            scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary8],bboxes_all[bboxes_boundary7]],[scores_all[bboxes_boundary8],scores_all[bboxes_boundary7]])])
                            bboxes_to_delete.extend([bboxes_boundary7,bboxes_boundary8])

            # if the object crosses 2 overlapped areas (34 56)
            if bboxes_boundary3!=None and bboxes_boundary4!=None and bboxes_boundary5!=None and bboxes_boundary6!=None and (bboxes_boundary3==bboxes_boundary6):
                    bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary4],bboxes_all[bboxes_boundary3],bboxes_all[bboxes_boundary5]]))
                    classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary4],bboxes_all[bboxes_boundary3],bboxes_all[bboxes_boundary5]],[scores_all[bboxes_boundary4],scores_all[bboxes_boundary3],scores_all[bboxes_boundary5]],[classes_all[bboxes_boundary4],classes_all[bboxes_boundary3],classes_all[bboxes_boundary5]]))
                    scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary4],bboxes_all[bboxes_boundary3],bboxes_all[bboxes_boundary5]],[scores_all[bboxes_boundary4],scores_all[bboxes_boundary3],scores_all[bboxes_boundary5]])])
                    bboxes_to_delete.extend([bboxes_boundary3,bboxes_boundary4,bboxes_boundary5,bboxes_boundary6])

                    # if another object crosses the remaining overlapped area (12)
                    if bboxes_boundary1!=None and bboxes_boundary2!=None:
                            bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1]]))
                            classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1]],[scores_all[bboxes_boundary2],scores_all[bboxes_boundary1]],[classes_all[bboxes_boundary2],classes_all[bboxes_boundary1]]))
                            scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1]],[scores_all[bboxes_boundary2],scores_all[bboxes_boundary1]])])
                            bboxes_to_delete.extend([bboxes_boundary1,bboxes_boundary2])

                    # if another object crosses the remaining overlapped area (78)
                    if bboxes_boundary7!=None and bboxes_boundary8!=None:
                            bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary7],bboxes_all[bboxes_boundary8]]))
                            classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary8],bboxes_all[bboxes_boundary7]],[scores_all[bboxes_boundary8],scores_all[bboxes_boundary7]],[classes_all[bboxes_boundary8],classes_all[bboxes_boundary7]]))
                            scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary8],bboxes_all[bboxes_boundary7]],[scores_all[bboxes_boundary8],scores_all[bboxes_boundary7]])])
                            bboxes_to_delete.extend([bboxes_boundary7,bboxes_boundary8])


            # if the object crosses 2 overlapped areas (56 78)
            if bboxes_boundary5!=None and bboxes_boundary6!=None and bboxes_boundary7!=None and bboxes_boundary8!=None and (bboxes_boundary5==bboxes_boundary8):
                    bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary6],bboxes_all[bboxes_boundary5],bboxes_all[bboxes_boundary7]]))
                    classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary6],bboxes_all[bboxes_boundary5],bboxes_all[bboxes_boundary7]],[scores_all[bboxes_boundary6],scores_all[bboxes_boundary5],scores_all[bboxes_boundary7]],[classes_all[bboxes_boundary6],classes_all[bboxes_boundary5],classes_all[bboxes_boundary7]]))
                    scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary6],bboxes_all[bboxes_boundary5],bboxes_all[bboxes_boundary7]],[scores_all[bboxes_boundary6],scores_all[bboxes_boundary5],scores_all[bboxes_boundary7]])])
                    bboxes_to_delete.extend([bboxes_boundary5,bboxes_boundary6,bboxes_boundary7,bboxes_boundary8])

                    # if another object crosses the remaining overlapped area (12)
                    if bboxes_boundary1!=None and bboxes_boundary2!=None:
                            bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1]]))
                            classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1]],[scores_all[bboxes_boundary2],scores_all[bboxes_boundary1]],[classes_all[bboxes_boundary2],classes_all[bboxes_boundary1]]))
                            scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1]],[scores_all[bboxes_boundary2],scores_all[bboxes_boundary1]])])
                            bboxes_to_delete.extend([bboxes_boundary1,bboxes_boundary2])

                    # if another object crosses the remaining overlapped area (34)
                    if bboxes_boundary3!=None and bboxes_boundary4!=None:
                            bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary3],bboxes_all[bboxes_boundary4]]))
                            classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary4],bboxes_all[bboxes_boundary3]],[scores_all[bboxes_boundary4],scores_all[bboxes_boundary3]],[classes_all[bboxes_boundary4],classes_all[bboxes_boundary3]]))
                            scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary4],bboxes_all[bboxes_boundary3]],[scores_all[bboxes_boundary4],scores_all[bboxes_boundary3]])])
                            bboxes_to_delete.extend([bboxes_boundary3,bboxes_boundary4])
                            
            else:
                # if the object crosses 1 overlapped area (12)
                if bboxes_boundary1!=None and bboxes_boundary2!=None:
                        bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1]]))
                        classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1]],[scores_all[bboxes_boundary2],scores_all[bboxes_boundary1]],[classes_all[bboxes_boundary2],classes_all[bboxes_boundary1]]))
                        scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary2],bboxes_all[bboxes_boundary1]],[scores_all[bboxes_boundary2],scores_all[bboxes_boundary1]])])
                        bboxes_to_delete.extend([bboxes_boundary1,bboxes_boundary2])

                # if the object crosses 1 overlapped area (34)
                if bboxes_boundary3!=None and bboxes_boundary4!=None:
                        bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary3],bboxes_all[bboxes_boundary4]]))
                        classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary4],bboxes_all[bboxes_boundary3]],[scores_all[bboxes_boundary4],scores_all[bboxes_boundary3]],[classes_all[bboxes_boundary4],classes_all[bboxes_boundary3]]))
                        scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary4],bboxes_all[bboxes_boundary3]],[scores_all[bboxes_boundary4],scores_all[bboxes_boundary3]])])
                        bboxes_to_delete.extend([bboxes_boundary3,bboxes_boundary4])

                # if the object crosses 1 overlapped area (56)
                if bboxes_boundary5!=None and bboxes_boundary6!=None:
                        bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary5],bboxes_all[bboxes_boundary6]]))
                        classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary6],bboxes_all[bboxes_boundary5]],[scores_all[bboxes_boundary6],scores_all[bboxes_boundary5]],[classes_all[bboxes_boundary6],classes_all[bboxes_boundary5]]))
                        scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary6],bboxes_all[bboxes_boundary5]],[scores_all[bboxes_boundary6],scores_all[bboxes_boundary5]])])
                        bboxes_to_delete.extend([bboxes_boundary5,bboxes_boundary6])

                # if the object crosses 1 overlapped area (78)
                if bboxes_boundary7!=None and bboxes_boundary8!=None:
                        bboxes_all.extend(MBR_bboxes([bboxes_all[bboxes_boundary7],bboxes_all[bboxes_boundary8]]))
                        classes_all.append(class_with_largest_score([bboxes_all[bboxes_boundary8],bboxes_all[bboxes_boundary7]],[scores_all[bboxes_boundary8],scores_all[bboxes_boundary7]],[classes_all[bboxes_boundary8],classes_all[bboxes_boundary7]]))
                        scores_all.extend([weighted_average_score([bboxes_all[bboxes_boundary8],bboxes_all[bboxes_boundary7]],[scores_all[bboxes_boundary8],scores_all[bboxes_boundary7]])])
                        bboxes_to_delete.extend([bboxes_boundary7,bboxes_boundary8])

    # delete the boxes that have been merged from the lists
    bboxes_to_delete=list(set(bboxes_to_delete))
    bboxes_to_delete.sort(reverse=True)
    for i in bboxes_to_delete:
        bboxes_all.pop(i)
        classes_all.pop(i)
        scores_all.pop(i)

    return bboxes_all, classes_all, scores_all
    

# function used to calculate the weighted average score of several bboxes
def weighted_average_score(bboxes,scores):
    sum=0
    sum_area=0
    for bbox,score in zip(bboxes,scores):
        area=(bbox[3]-bbox[1])*(bbox[2]-bbox[0])
        sum+=score*area
        sum_area+=area
    return np.float32(sum/sum_area)

# function used to choose the class with the largest weighted score as the class of the new merged bbox
def class_with_largest_score(bboxes,scores,classes):
    sum_area=0
    score_multi_area=[]
    for bbox,score in zip(bboxes,scores):
        area=(bbox[3]-bbox[1])*(bbox[2]-bbox[0])
        score_multi_area.append(area*score)
        sum_area+=area
    weighted_score = [i / sum_area for i in score_multi_area]
    return classes[weighted_score.index(max(weighted_score))]

# function used to calculate the MBR of several connected bboxes
def MBR_bboxes(bboxes):
    xs=[]
    ys=[]
    for bbox in bboxes:
        xs.append(bbox[0])
        xs.append(bbox[2])
        ys.append(bbox[1])
        ys.append(bbox[3])
    return [[min(xs),min(ys),max(xs),max(ys)]]

#### 2.5. Other Functions Used for Improving Object Detection </b>

- <b>filter_classes():</b> The pre-trained YOLO and Faster RCNN output the detection results of all the categories in COCO, but we only need part of them. This function is used to filter out the bounding boxes of the categories we need;

- <b>project_class():</b> The output classes are in the form like [0,1,2,3,5,7,9], but in the annotated dataset, the objects' classes are labeled as [0,1,2,3,4,5,6], so this function is used to project the class_id when doing evaluations;

  <b>⚠️Attention:</b> If the categories to detect are changed, values in this function should also be changed.

- <b>xyxy2xcycwh():</b> A function used to transform the output from [x1,y1,x2,y2] format to [x_centre, y_centre, width, height].

In [7]:
# function used to filter the bboxes according to the classes we need
def filter_classes(bboxes_all, classes_all, scores_all, class_needed):
    bboxes_all = bboxes_all.tolist()
    classes_all = classes_all.tolist()
    scores_all = scores_all.tolist()
    # remove the bboxes which are not belong to the needed classes from the lists
    for i in range(len(classes_all), 0, -1):
        if classes_all[i - 1] not in class_needed:
            bboxes_all.pop(i - 1)
            classes_all.pop(i - 1)
            scores_all.pop(i - 1)
    return bboxes_all, classes_all, scores_all

# function used to project the class id from [0,1,2,3,5,7,9] to [0,6] to match our annotations
def project_class(classes):
    for index,class1 in enumerate(classes):
        if class1==5:
            classes[index]=4
        elif class1==7:
            classes[index]=5
        elif class1==9:
            classes[index]=6
    return classes

# A function used to transform the output from [x1,y1,x2,y2] format to [x_centre, y_centre, width, height].
def xyxy2xcycwh(bboxes):
    bboxes_new = []
    for bbox in bboxes:
        bboxes_new.append(
            [
                (bbox[0] + bbox[2]) / 2,
                (bbox[1] + bbox[3]) / 2,
                (bbox[2] - bbox[0]),
                (bbox[3] - bbox[1]),
            ]
        )
    return bboxes_new


### 3. Define the Process of the Improved Object Detection on One Frame



Using the functions above, the process of the improved object detection on one image frame is defined as a function called predict_one_frame().

<b>Parameters of predict_one_frame():</b>

- <b>FOV:</b> Field of view of the sub images;

- <b>THETAs:</b> A list which contains the theta of each sub image (The length should be the same as the number of sub images);

- <b>PHIs:</b> A list which contains the Phi of each sub image (The length should be the same as the number of sub images);

- <b>im:</b> The image on which to do the object detection;

- <b>predictor:</b> A YOLO v5 or Faster RCNN object detection model;

- <b>video_width, video_height:</b> Height and width of the input image frame;

- <b>sub_image_width:</b> Width (or height) of the sub images;

- <b>classes_to_detect:</b> Index numbers in COCO of the classes we need to detect, [0, 1, 2, 3, 5, 7, 9] by default;

- <b>is_project_class:</b> A boolean value which determines whether to project the original class_ids of the outputs according to our annotations using project_class();

- <b>use_mymodel:</b> A boolean value which determines whether to use the improved object detection model, if False, instead of being split into 4 parts, the image will be detected as a whole;

- <b>model:</b> Name of the model to use, which should be either "Faster RCNN" or "YOLO";

- <b>is_split_image2:</b> A boolean value which determines whether to split the bboxes across the center line of sub image 2 into two when reprojecting the bboxes back to the original image using reproject_bboxes().


In [8]:
import time
from detectron2.structures.instances import Instances
from detectron2.structures.boxes import Boxes
from detectron2.layers import batched_nms
import torch
import torchvision


# function used to do object detection on one image frame
def predict_one_frame(
    FOV,
    THETAs,
    PHIs,
    im,
    predictor,
    video_width,
    video_height,
    sub_image_width,
    classes_to_detect=[0, 1, 2, 3, 5, 7, 9],
    is_project_class=False,
    use_mymodel=True,
    model="Faster RCNN",
    split_image2=True,
):

    # for checking the processing speed, record the current time first
    time1 = time.time()
    
    # if the user chooses to use the improved object detection model
    if use_mymodel:
        # split the frame into 4 sub images (of perspective projection) and get the maps and the output images
        lon_maps, lat_maps, subimgs = equir2pers(
            im, FOV, THETAs, PHIs, sub_image_width, sub_image_width
        )

        # lists for storing the detection results from all the sub images
        bboxes_all = []
        classes_all = []
        scores_all = []

        # list for storing the index of the bounding boxes which intersect with the boundaries of the sub images
        bboxes_boundary = [None] * 8

        # if a Faster RCNN model is being used
        if model == "Faster RCNN":
            # for each sub image
            for i in range(len(subimgs)):
                # get the detection results with the predictor
                outputs1 = predictor(subimgs[i])

                # --------  if you want to save and check the detail of the results on each sub image, run the code below  ----------
                # v1 = Visualizer(
                #     subimgs[i][:, :, ::-1],
                #     MetadataCatalog.get(cfg.DATASETS.TRAIN[0]),
                #     scale=1.0,
                # )
                # im1 = v1.draw_instance_predictions(outputs1["instances"].to("cpu"))
                # cv2.imwrite(
                #     "./outtest/subdetect" + str(i) + ".png", im1.get_image()[:, :, ::-1]
                # )
                # --------  end of this part  ----------

                # get the bboxes, classes and scores of the instances detected
                bboxes = outputs1["instances"].pred_boxes.tensor.cpu().numpy()
                classes = outputs1["instances"].pred_classes.cpu().numpy()
                scores = outputs1["instances"].scores.cpu().numpy()

                # do NMS on the bboxes despite the category
                # keep_boxes is a list which stores the index of the bboxes to keep after NMS
                keep_boxes = torchvision.ops.nms(
                    torch.tensor(bboxes), torch.tensor(scores), 0.45
                )

                # for each bbox in the current sub image, reproject it to the original image
                (
                    reprojected_bboxes,
                    classes,
                    scores,
                    left_boundary_box,
                    right_boundary_box,
                ) = reproject_bboxes(
                    torch.tensor(bboxes)[keep_boxes],
                    lon_maps[i],
                    lat_maps[i],
                    torch.tensor(classes)[keep_boxes],
                    torch.tensor(scores)[keep_boxes],
                    10,
                    i,
                    video_width,
                    video_height,
                    len(subimgs),
                    sub_image_width / 640 * 20,
                    split_image2,
                )

                # get the index of the bboxes which intersect the boundaries of the sub images
                if left_boundary_box != None:
                    bboxes_boundary[
                        number_of_left_and_right_boundary(i)[0]
                    ] = left_boundary_box + len(bboxes_all)
                if right_boundary_box != None:
                    bboxes_boundary[
                        number_of_left_and_right_boundary(i)[1]
                    ] = right_boundary_box + len(bboxes_all)

                # add the bboxes after reprojection to the lists which contain bboxes from all the sub images
                bboxes_all = bboxes_all + reprojected_bboxes
                classes_all = classes_all + classes
                scores_all = scores_all + scores

        # if a YOLO model is being used
        elif model == "YOLO":

            # for each sub image, first change the color from BGR to RGB
            for i in range(len(subimgs)):
                subimgs[i] = cv2.cvtColor(subimgs[i], cv2.COLOR_BGR2RGB)

            # YOLO supports detecting several images at the same time, so input all the sub images at once to the predictor
            results = predictor(subimgs, size=sub_image_width)  # includes NMS
            
            # --------  if you want to save and check the detail of the results on each sub image, run the code below  ----------
            # results.save()
            # --------  end of this part  ----------
            
            # for each sub image
            for i in range(len(subimgs)):
                # Originally, YOLO outputs the positions using the relative coordinates [0-1], so transform the output format by multiplying by the width/height of the sub image
                bboxes = (
                    results.xyxyn[i].cpu().numpy()[:, 0:4]
                    * [
                        sub_image_width,
                        sub_image_width,
                        sub_image_width,
                        sub_image_width,
                    ]
                ).tolist()
                classes = list(map(int, results.xyxyn[i].cpu().numpy()[:, 5].tolist()))
                scores = results.xyxyn[i].cpu().numpy()[:, 4].tolist()

                # for each bbox in the current sub image, reproject it to the original image
                (
                    reprojected_bboxes,
                    classes,
                    scores,
                    left_boundary_box,
                    right_boundary_box,
                ) = reproject_bboxes(
                    bboxes,
                    lon_maps[i],
                    lat_maps[i],
                    classes,
                    scores,
                    10,
                    i,
                    video_width,
                    video_height,
                    len(subimgs),
                    sub_image_width / 640 * 20,
                    split_image2,
                )

                # get the index of the bboxes which intersect the boundaries of the sub images
                if left_boundary_box != None:
                    bboxes_boundary[
                        number_of_left_and_right_boundary(i)[0]
                    ] = left_boundary_box + len(bboxes_all)
                if right_boundary_box != None:
                    bboxes_boundary[
                        number_of_left_and_right_boundary(i)[1]
                    ] = right_boundary_box + len(bboxes_all)

                # add the bboxes after reprojection to the lists which contain bboxes from all the sub images
                bboxes_all = bboxes_all + reprojected_bboxes
                classes_all = classes_all + classes
                scores_all = scores_all + scores

        # merge the boxes which goes across the boundaries with merge_bbox_across_boundary()
        bboxes_all, classes_all, scores_all = merge_bbox_across_boundary(
            bboxes_all,
            classes_all,
            scores_all,
            video_width,
            video_height,
            bboxes_boundary,
        )

        # do NMS on the output bboxes again to get the index of the boxes which should be kept
        keep = batched_nms(
            torch.tensor(bboxes_all),
            torch.tensor(scores_all),
            torch.tensor(classes_all),
            0.3,
        )

        # only keep the instances of the classes we need (person, bike, car, motorbike, bus, truck, traffic light by default)
        bboxes_all, classes_all, scores_all = filter_classes(
            torch.tensor(bboxes_all)[keep],
            torch.tensor(classes_all)[keep],
            torch.tensor(scores_all)[keep],
            classes_to_detect,
        )

        # if needed, project the class into [0,6] (to match with the annotations in our dataset)
        if is_project_class == True:
            classes_all = project_class(classes_all)

    # if the user chooses to use the original object detection model
    else:
        # if a Faster RCNN model is being used
        if model == "Faster RCNN":
            # get the outputs and do NMS on them
            outputs1 = predictor(im)
            bboxes_all = outputs1["instances"].pred_boxes.tensor.cpu().numpy()
            classes_all = outputs1["instances"].pred_classes.cpu().numpy()
            scores_all = outputs1["instances"].scores.cpu().numpy()
            keep_boxes = torchvision.ops.nms(
                torch.tensor(bboxes_all), torch.tensor(scores_all), 0.45
            )
            bboxes_all = (
                outputs1["instances"].pred_boxes.tensor.cpu().numpy()[keep_boxes]
            )
            classes_all = outputs1["instances"].pred_classes.cpu().numpy()[keep_boxes]
            scores_all = outputs1["instances"].scores.cpu().numpy()[keep_boxes]
        
        # if a YOLO model is being used
        elif model == "YOLO":
            # change the color from BGR to RGB
            im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
            # get the outputs
            results = predictor(im, size=sub_image_width)  # NMS included
            bboxes_all = (
                results.xyxyn[0].cpu().numpy()[:, 0:4]
                * [video_width, video_height, video_width, video_height]
            ).tolist()
            classes_all = list(map(int, results.xyxyn[0].cpu().numpy()[:, 5].tolist()))
            scores_all = results.xyxyn[0].cpu().numpy()[:, 4].tolist()

        # only keep the instances of the classes we need (person, bike, car, motorbike, bus, truck, traffic light)
        bboxes_all, classes_all, scores_all = filter_classes(
            torch.tensor(bboxes_all),
            torch.tensor(classes_all),
            torch.tensor(scores_all),
            classes_to_detect,
        )

        # if needed, project the class into [0,6] (to match with the annotations in our dataset)
        if is_project_class == True:
            classes_all = project_class(classes_all)

    # record the current time again and calculate the running time
    time2 = time.time()
    # print(time2 - time1)
    
    return bboxes_all, classes_all, scores_all


### 4. Improved Object Detection on a Panoramic Video



To realize the improved object detection on each frame of a video with the functions above, a function called Object_Detection() is defined as below.

<b>Parameters of Object_Detection():</b>

- <b>input_video_path:</b> Path of the input video;

- <b>output_video_path:</b> Path of the output video;

- <b>classes_to_detect:</b> Index numbers of the categories to detect in the COCO dataset, [0, 1, 2, 3, 5, 7, 9] by default;

- <b>FOV:</b> Field of view of the sub images, 120 by default;

- <b>THETAs:</b> A list which contains the theta of each sub image (The length should be the same as the number of sub images),[0, 90, 180, 270] by default;

- <b>PHIs:</b> A list which contains the Phi of each sub image (The length should be the same as the number of sub images), [-10, -10, -10, -10] by default;

- <b>sub_image_width:</b> Width (or height) of the sub images, 640 by default;

- <b>model_type:</b> A string that determines which detector to use ("YOLO" or "Faster RCNN"), "YOLO" by default;

- <b>score_threshold:</b> The threshold of the confidence score, 0.4 by default;

- <b>nms_threshold:</b> The threshold of the Non Maximum Suppression, 0.45 by default;

- <b>use_mymodel:</b> A boolean value which determines whether to use the improved object detection model, if False, instead of being split into 4 parts, the image will be detected as a whole, True by default.

In [10]:
import sys
import time
import torch

# function used to realize object detection on a panoramic video
def Object_Detection(input_video_path,output_video_path,classes_to_detect=[0, 1, 2, 3, 5, 7, 9],FOV=120,THETAs=[0, 90, 180, 270],PHIs=[-10, -10, -10, -10],sub_image_width=640,model_type="YOLO",score_threshold=0.4,nms_threshold=0.45,use_mymodel=True):
        
    # load the pretrained detection model
    model,cfg=load_model(model_type,sub_image_width,score_threshold,nms_threshold)

    # read the input panoramic video (of equirectangular projection)
    video_capture = cv2.VideoCapture(input_video_path)

    # if the input path is not right, warn the user
    if (video_capture.isOpened()==False):
        print('Can not open the video file.')
    # if right, get some info about the video (width, height, frame count and fps)
    else:
        video_width = int(video_capture.get(cv2.CAP_PROP_FRAME_WIDTH))
        video_height = int(video_capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
        video_frame_count = int(video_capture.get(cv2.CAP_PROP_FRAME_COUNT))
        video_fps = int(round(video_capture.get(cv2.CAP_PROP_FPS)))
        # fourcc = cv2.VideoWriter_fourcc(*'MJPG')
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        outputfile = cv2.VideoWriter(output_video_path, fourcc, video_fps, (video_width, video_height))
    
    # output the video info
    print("The input video is "+str(video_width)+' in width and '+str(video_height)+" in height.")

    # the number of current frame
    num_of_frame=1

    # for each image frame in the video
    while video_capture.grab():

        time1=time.time()

        # get the next image frame
        _,im= video_capture.retrieve ()
        
        
        # get the predictions on the current frame
        bboxes_all, classes_all, scores_all = predict_one_frame(
            FOV,
            THETAs,
            PHIs,
            im,
            model,
            video_width,
            video_height,
            sub_image_width,
            classes_to_detect,
            False,
            use_mymodel,
            model_type,
            True
        )
        
        # create an instance of detectron2 so that the output can be visualized
        output_new = Instances(
            image_size=[video_width, video_height],
            pred_boxes=Boxes(torch.tensor(bboxes_all)),
            scores=torch.tensor(scores_all),
            pred_classes=torch.tensor(classes_all),
        )
        
        # show the current FPS every 5 frames
        time2=time.time()
        if num_of_frame%5==0:
            print(num_of_frame,'/',video_frame_count)
            print(str(1/(time2-time1))+' fps')
            
        num_of_frame+=1

        # use `Visualizer` to draw the predictions on the image
        v = Visualizer(
            im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.0
        )
        im = v.draw_instance_predictions(output_new.to("cpu"))
        outputfile.write(im.get_image()[:, :, ::-1])

    # release the input and output videos
    video_capture.release()
    outputfile.release()    


An example of how to use the function:

In [12]:
Object_Detection('test.mp4','test_object_detection.mp4')

[Checkpointer] Loading from https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl ...
Reading a file from 'Detectron2 Model Zoo'
Using cache found in /Users/guojingwei/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2022-9-20 Python-3.9.13 torch-1.12.1 CPU

Fusing layers... 
YOLOv5m6 summary: 378 layers, 35704908 parameters, 0 gradients
Adding AutoShape... 


The input video is 5376 in width and 2688 in height.
5 / 55
1.0971085151055975 fps
10 / 55
1.1391894035360657 fps
15 / 55
1.1618261675035788 fps
20 / 55
1.1436455381152102 fps
25 / 55
1.125331786494368 fps


### 5. Object Tracking on a Panoramic Video using the Improved Object Detection Models



To realize object tracking on a panoramic video using the improved object detection model as its detector, a function called Object_Tracking() is defined as below.

<b>Parameters of Object_Tracking():</b>

- <b>input_video_path:</b> Path of the input video;

- <b>output_video_path:</b> Path of the output video;

- <b>MOT_text_path:</b> Path of the output txt file which stores all the MOT tracking results;

- <b>prevent_different_classes_match:</b> A boolean value which determines whether to use the support for multiple categories in DeepSORT, True by default;

- <b>match_across_boundary:</b> A boolean value which determines whether to use the support for boundary continuity in DeepSORT, True by default;

- <b>classes_to_detect:</b> Index numbers of the categories to detect in the COCO dataset, [0, 1, 2, 3, 5, 7, 9] by default;

- <b>FOV:</b> Field of view of the sub images, 120 by default;

- <b>THETAs:</b> A list which contains the theta of each sub image (The length should be the same as the number of sub images),[0, 90, 180, 270] by default;

- <b>PHIs:</b> A list which contains the Phi of each sub image (The length should be the same as the number of sub images), [-10, -10, -10, -10] by default;

- <b>sub_image_width:</b> Width (or height) of the sub images, 640 by default;

- <b>model_type:</b> A string that determines which detector to use ("YOLO" or "Faster RCNN"), "YOLO" by default;

- <b>score_threshold:</b> The threshold of the confidence score, 0.4 by default;

- <b>nms_threshold:</b> The threshold of the Non Maximum Suppression, 0.45 by default;

- <b>use_mymodel:</b> A boolean value which determines whether to use the improved object detection model, if False, instead of being split into 4 parts, the image will be detected as a whole, True by default.

In [13]:
from deep_sort.deep_sort import DeepSort
from panoramic_detection.draw_output import draw_boxes

# function used to realize object tracking on a panoramic video
def Object_Tracking(input_video_path,output_video_path,MOT_text_path, prevent_different_classes_match=True,
        match_across_boundary=True,classes_to_detect=[0, 1, 2, 3, 5, 7, 9],FOV=120,THETAs=[0, 90, 180, 270],PHIs=[-10, -10, -10, -10],sub_image_width=640,model_type="YOLO",score_threshold=0.4,nms_threshold=0.45,use_mymodel=True):
        
    # load the pretrained detection model
    model,cfg=load_model(model_type,sub_image_width,score_threshold,nms_threshold)

    # read the input panoramic video (of equirectangular projection)
    video_capture = cv2.VideoCapture(input_video_path)

    # if the input path is not right, warn the user
    if (video_capture.isOpened()==False):
        print('Can not open the video file.')
    # if right, get some info about the video (width, height, frame count and fps)
    else:
        video_width = int(video_capture.get(cv2.CAP_PROP_FRAME_WIDTH))
        video_height = int(video_capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
        video_frame_count = int(video_capture.get(cv2.CAP_PROP_FRAME_COUNT))
        video_fps = int(round(video_capture.get(cv2.CAP_PROP_FPS)))
        # fourcc = cv2.VideoWriter_fourcc(*'MJPG')
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        outputfile = cv2.VideoWriter(output_video_path, fourcc, video_fps, (video_width, video_height))
    
    # output the video info
    print("The input video is "+str(video_width)+' in width and '+str(video_height)+" in height.")

    # create a deepsort instance with the pre-trained feature extraction model
    deepsort = DeepSort('./deep_sort/deep/checkpoint/ckpt.t7', use_cuda=torch.cuda.is_available())

    # the number of current frame
    num_of_frame=1

    with open(MOT_text_path,"w") as f: 
        # for each image frame in the video
        while video_capture.grab():

            time1=time.time()

            # get the next image frame
            _,im= video_capture.retrieve ()
            
            # get the predictions on the current frame
            bboxes_all, classes_all, scores_all = predict_one_frame(
                FOV,
                THETAs,
                PHIs,
                im,
                model,
                video_width,
                video_height,
                sub_image_width,
                classes_to_detect,
                False,
                use_mymodel,
                model_type,
                not match_across_boundary,
            )
            
            # convert the bboxes from [x,y,x,y] to [xc,yc,w,h]
            bboxes_all_xcycwh=xyxy2xcycwh(bboxes_all)
            
            # update deepsort and get the tracking results
            track_outputs = deepsort.update(np.array(bboxes_all_xcycwh),np.array(classes_all),  np.array(scores_all),im, prevent_different_classes_match, match_across_boundary)
            
            # plot the results on the video and save them as MOT texts
            if len(track_outputs) > 0:
                bbox_xyxy = track_outputs[:, :4]
                track_classes= track_outputs[:, 4]
                track_scores= track_outputs[:, 5]
                identities = track_outputs[:, -1]
                im = draw_boxes(im, bbox_xyxy, track_classes, track_scores, video_width,identities)
                for bb_xyxy,track_class,identity in zip(bbox_xyxy,track_classes,identities):
                        f.write(str(num_of_frame)+','+str(int(identity))+','+str(deepsort._xyxy_to_tlwh(bb_xyxy)).strip('(').strip(')').replace(' ','')+','+'-1,-1,-1,-1\n')
            outputfile.write(im)

            # show the current FPS
            time2=time.time()
            if num_of_frame%5==0:
                print(num_of_frame,'/',video_frame_count)
                print(str(1/(time2-time1))+' fps')
                
            num_of_frame+=1

    # release the input and output videos
    video_capture.release()
    outputfile.release()    

An example of how to use the function:

In [14]:
Object_Tracking('test.mp4','test_tracking.mp4','test_tracking.txt')

[Checkpointer] Loading from https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl ...
Reading a file from 'Detectron2 Model Zoo'
Using cache found in /Users/guojingwei/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2022-9-20 Python-3.9.13 torch-1.12.1 CPU

Fusing layers... 
YOLOv5m6 summary: 378 layers, 35704908 parameters, 0 gradients
Adding AutoShape... 
Loading weights from ./deep_sort/deep/checkpoint/ckpt.t7... Done!


The input video is 5376 in width and 2688 in height.
10 / 55
0.7008337044163192 fps
20 / 55
0.8350278551393394 fps


### 6. Overtaking Behaviour Detection on a Panoramic Video

To realize overtaking behaviour detection on a panoramic video using DeepSORT, a function called Overtaking_Detection() is defined as below.

<b>Parameters of Overtaking_Detection():</b>

- <b>input_video_path:</b> Path of the input video;

- <b>output_video_path:</b> Path of the output video;

- <b>mode:</b> A string that determines which kind of overtaking behaviour to detect, "Confirmed" or "Unconfirmed", "Confirmed" by default;

- <b>prevent_different_classes_match:</b> A boolean value which determines whether to use the support for multiple categories in DeepSORT, True by default;

- <b>match_across_boundary:</b> A boolean value which determines whether to use the support for boundary continuity in DeepSORT, True by default;

- <b>classes_to_detect:</b> Index numbers of the categories to detect in the COCO dataset, [0, 1, 2, 3, 5, 7, 9] by default;

- <b>classes_to_detect_movement:</b> Index numbers of the categories for movement detection in the COCO dataset, which should be a subset of classes_to_detect, [2,5,7] (i.e., car,bus and truck) by default;

- <b>size_thresholds:</b> A set of size thresholds which should share the same length with classes_to_detect_movement, if the size of a track of a certain class is larger than the corresponding threshold, then it is considered as close to the user, [500 * 500, 900 * 900, 600 * 600] by default;

- <b>FOV:</b> Field of view of the sub images, 120 by default;

- <b>THETAs:</b> A list which contains the theta of each sub image (The length should be the same as the number of sub images),[0, 90, 180, 270] by default;

- <b>PHIs:</b> A list which contains the Phi of each sub image (The length should be the same as the number of sub images), [-10, -10, -10, -10] by default;

- <b>sub_image_width:</b> Width (or height) of the sub images, 640 by default;

- <b>model_type:</b> A string that determines which detector to use ("YOLO" or "Faster RCNN"), "YOLO" by default;

- <b>score_threshold:</b> The threshold of the confidence score, 0.4 by default;

- <b>nms_threshold:</b> The threshold of the Non Maximum Suppression, 0.45 by default;

- <b>use_mymodel:</b> A boolean value which determines whether to use the improved object detection model, if False, instead of being split into 4 parts, the image will be detected as a whole, True by default.

In [28]:
from deep_sort.deep_sort import DeepSort
from panoramic_detection.draw_output import draw_boxes

# a function used to realize overtaking behaviour detection on a panoramic video
def Overtaking_Detection(input_video_path,output_video_path, mode='Confirmed',prevent_different_classes_match=True,
        match_across_boundary=True,classes_to_detect=[0, 1, 2, 3, 5, 7, 9],classes_to_detect_movement=[2,5,7],size_thresholds=[500 * 500, 900 * 900, 600 * 600],FOV=120,THETAs=[0, 90, 180, 270],PHIs=[-10, -10, -10, -10],sub_image_width=640,model_type="YOLO",score_threshold=0.4,nms_threshold=0.45,use_mymodel=True):
        
    # load the pretrained detection model
    model,cfg=load_model(model_type,sub_image_width,score_threshold,nms_threshold)

    # read the input panoramic video (of equirectangular projection)
    video_capture = cv2.VideoCapture(input_video_path)

    # if the input path is not right, warn the user
    if (video_capture.isOpened()==False):
        print('Can not open the video file.')
    # if right, get some info about the video (width, height, frame count and fps)
    else:
        video_width = int(video_capture.get(cv2.CAP_PROP_FRAME_WIDTH))
        video_height = int(video_capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
        video_frame_count = int(video_capture.get(cv2.CAP_PROP_FRAME_COUNT))
        video_fps = int(round(video_capture.get(cv2.CAP_PROP_FPS)))
        # fourcc = cv2.VideoWriter_fourcc(*'MJPG')
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        outputfile = cv2.VideoWriter(output_video_path, fourcc, video_fps, (video_width, video_height))
    
    # output the video info
    print("The input video is "+str(video_width)+' in width and '+str(video_height)+" in height.")

    # create a deepsort instance with the pre-trained feature extraction model
    deepsort = DeepSort('./deep_sort/deep/checkpoint/ckpt.t7', use_cuda=torch.cuda.is_available())

    # the number of current frame
    num_of_frame=1

    # a dictionary which stores the history positions of each track
    history_track_positions={}

    # dics/lists to store the unconfirmed left/right overtaking, confirmed overtaking and their periods
    unconfirmed_left_overtaking={}
    unconfirmed_right_overtaking={}
    confirmed_overtaking=[]
    confirmed_overtaking_period=[]

    with open('results.txt',"w") as f: 
        
        # for each image frame in the video
        while video_capture.grab():

            time1=time.time()

            # get the next image frame
            _,im= video_capture.retrieve ()
            
            # get the predictions on the current frame
            bboxes_all, classes_all, scores_all = predict_one_frame(
                FOV,
                THETAs,
                PHIs,
                im,
                model,
                video_width,
                video_height,
                sub_image_width,
                classes_to_detect,
                False,
                use_mymodel,
                model_type,
                not match_across_boundary,
            )
            
            # convert the bboxes from [x,y,x,y] to [xc,yc,w,h]
            bboxes_all_xcycwh=xyxy2xcycwh(bboxes_all)
            
            # update deepsort and get the tracking results
            track_outputs = deepsort.update(np.array(bboxes_all_xcycwh),np.array(classes_all),  np.array(scores_all),im, prevent_different_classes_match, match_across_boundary)
            
            # two lists to store the objects that are moving forwards and backwards
            objects_moving_forwards=[]
            objects_moving_backwards=[]

            # if there are tracked objects in the current frame
            if len(track_outputs) > 0:
                bbox_xyxy = track_outputs[:, :4]
                track_classes= track_outputs[:, 4]
                track_scores= track_outputs[:, 5]
                identities = track_outputs[:, -1]
                
                # for each track
                for bb_xyxy,track_class,identity in zip(bbox_xyxy,track_classes,identities):
                    
                    # save the tracking results to the txt file
                    f.write(str(num_of_frame)+','+str(int(identity))+','+str(deepsort._xyxy_to_tlwh(bb_xyxy)).strip('(').strip(')').replace(' ','')+','+'-1,-1,-1,-1\n')
                    
                    # check whether the track is moving forwards or backwards
                    bb_xyxy_list=bb_xyxy.tolist()
                    
                    # if the track is doing an unconfirmed overtake from the left of the image
                    if int(identity) in unconfirmed_left_overtaking:
                        # if the rear of the track has passed the 90 degree, update the overtake to confirmed
                        if bb_xyxy_list[0]>=video_width/360*90:
                            confirmed_overtaking.append(int(identity))
                            confirmed_overtaking_period.append([unconfirmed_left_overtaking[int(identity)],num_of_frame])
                            unconfirmed_left_overtaking.pop(int(identity))
                        # if the front of the track has come back, delete the unconfirmed overtake
                        elif bb_xyxy_list[2]<video_width/360*90:
                            unconfirmed_left_overtaking.pop(int(identity))
                    
                    # if the track is doing an unconfirmed overtake from the right of the image
                    elif int(identity) in unconfirmed_right_overtaking:
                        # if the rear of the track has passed the 270 degree, update the overtake to confirmed
                        if bb_xyxy_list[2]<=video_width/360*270:
                            confirmed_overtaking.append(int(identity))
                            confirmed_overtaking_period.append([unconfirmed_right_overtaking[int(identity)],num_of_frame])
                            unconfirmed_right_overtaking.pop(int(identity))
                        # if the front of the track has come back, delete the unconfirmed overtake
                        elif bb_xyxy_list[0]>video_width/360*270:
                            unconfirmed_right_overtaking.pop(int(identity))
                    
                    # if the track is not doing an overtake and its class is on which we need to detect overtakes
                    if track_class in classes_to_detect_movement:
                        # add the current position of the track to a dictionary called history_track_positions
                        if int(identity) not in history_track_positions.keys():
                            history_track_positions[int(identity)]= [bb_xyxy_list]
                        else:
                            history_track_positions[int(identity)]+= [bb_xyxy_list]
                            # count how many times a track moves forwards and backwards in the last five frames
                            if len(history_track_positions[int(identity)])>=6:
                                forwards_num=0
                                backwards_num=0
                                for ii in range (-6,-1):
                                    if abs(video_width/2-xyxy2xcycwh(history_track_positions[int(identity)])[ii][0])>abs(video_width/2-xyxy2xcycwh(history_track_positions[int(identity)])[ii+1][0]):
                                        forwards_num+=1
                                    else: backwards_num+=1
                                # if in the last 5 frames, at least 3 frames moves towards the middle line of the image
                                if forwards_num>=3:
                                    # treat the object as it is moving forwards
                                    objects_moving_forwards.append(int(identity))
                                    # if in the last frame, the front of the track had not passed the 90/270 degree line, but now it has
                                    # give the track an unconfirmed overtaking behaviour
                                    if bb_xyxy_list[2]>=video_width/360*90 and history_track_positions[int(identity)][-2][2]<video_width/360*90:
                                        unconfirmed_left_overtaking[int(identity)]=num_of_frame
                                    elif bb_xyxy_list[0]<=video_width/360*270 and history_track_positions[int(identity)][-2][0]>video_width/360*270:
                                        unconfirmed_right_overtaking[int(identity)]=num_of_frame
                                # if in the last 5 frames, at least 3 frames moves away from the middle line of the image
                                elif backwards_num>=3:
                                    # treat the object as it is moving backwards
                                    objects_moving_backwards.append(int(identity))
            
                # if the function is used for unconfirmed overtaking behaviour detection, draw the tracks with the overtaking boxes
                if mode=='Unconfirmed':
                    im = draw_boxes(im, bbox_xyxy, track_classes, track_scores,video_width, identities,objects_moving_backwards,objects_moving_forwards,unconfirmed_left_overtaking,unconfirmed_right_overtaking,size_thresholds,True,classes_to_detect_movement)
                
                # if the function is used for confirmed overtaking behaviour detection, only draw the tracks
                elif mode=='Confirmed':
                    im = draw_boxes(im, bbox_xyxy, track_classes, track_scores,video_width, identities,objects_moving_backwards,objects_moving_forwards)

            # save the frame to the output file
            outputfile.write(im)

            # show the current FPS
            time2=time.time()
            if num_of_frame%5==0:
                print(num_of_frame,'/',video_frame_count)
                print(str(1/(time2-time1))+' fps')
                
            num_of_frame+=1

    # release the input and output videos
    video_capture.release()
    outputfile.release()
    
    # since the confirmed overtakes can only be detected after the whole behaviour has been finished
    # in the 'Confirmed' mode, draw the boxes for comfirmed overtakes after the process of detection
    if  mode=='Confirmed':

        print('Confirmed overtaking tracks:',confirmed_overtaking)
        print('Confirmed overtaking periods:', confirmed_overtaking_period)

        # copy and paste the output video with tracking results
        v_src = open(output_video_path,'rb')
        content = v_src.read()
        v_copy = open('copy.mp4','wb')
        v_copy.write(content)
        v_src.close()
        v_copy.close()

        video_capture = cv2.VideoCapture('copy.mp4')
        video_frame_count = int(video_capture.get(cv2.CAP_PROP_FRAME_COUNT))
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        outputfile = cv2.VideoWriter(output_video_path, fourcc, video_fps, (video_width, video_height))
        num_of_frame = 1 
        
        tracking_results=[]

        # read the tracking results
        with open('results.txt',"r") as f:   
            data=f.readlines()
            for line in data:
                tracking_results.append(line)
        
        # for each image frame in the video
        while video_capture.grab():
            # get the next image frame
            _,im= video_capture.retrieve ()
            
            
            # when an track is between the start and end frames of an confirmed overtaking behaviour
            for i,q in zip(confirmed_overtaking_period,confirmed_overtaking):
                # color the bbox of the track with red
                if num_of_frame in range(i[0],i[1]):
                    for line in tracking_results:
                        contents=line.split(',')
                        if contents[0]==str(num_of_frame) and contents[1]==str(q):
                            red_area = np.zeros(im.shape, np.uint8)
                            cv2.rectangle(red_area, (int(float(contents[2])),int(float(contents[3]))), (int(float(contents[2]))+int(float(contents[4])),int(float(contents[3]))+int(float(contents[5]))), (0, 0, 255), -1)
                            im = cv2.addWeighted(im, 1.0, red_area, 0.5, 1)
            outputfile.write(im)
            if num_of_frame%5==0:
                print(num_of_frame,'/',video_frame_count)
            num_of_frame+=1
        
        # release the videos again
        video_capture.release()
        outputfile.release()   

An example of how to use the function:

In [32]:
Overtaking_Detection('test.mp4','test_overtake.mp4',mode='Unconfirmed',classes_to_detect_movement=[2,5,7],size_thresholds=[500 * 500, 900 * 900, 600 * 600])

[Checkpointer] Loading from https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl ...
Reading a file from 'Detectron2 Model Zoo'
Using cache found in /Users/guojingwei/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2022-9-20 Python-3.9.13 torch-1.12.1 CPU

Fusing layers... 
YOLOv5m6 summary: 378 layers, 35704908 parameters, 0 gradients
Adding AutoShape... 
Loading weights from ./deep_sort/deep/checkpoint/ckpt.t7... Done!


The input video is 5376 in width and 2688 in height.
5 / 55
0.5649516674025502 fps
10 / 55
0.5448937709541225 fps
15 / 55
0.49607409584511875 fps
20 / 55
0.5224900464389478 fps
25 / 55
0.4886474591882539 fps
