# SAHI: Slicing Aided Hyper Inference for Yolov5 and YoloX

SAHI is a lightweight vision library for large-scale object detection and instance segmentation; this notebook shows how to use it on Kaggle. The full source code and tutorials are available in Fatih Cagatay Akyon's GitHub repository (authors: Akyon, Fatih Cagatay; Cengiz, Cemil; Altinuc, Sinan Onur; Cavusoglu, Devrim; Sahin, Kadir; Eryuksel, Ogulcan): [SAHI: A vision library for large-scale object detection & instance segmentation](https://github.com/obss/sahi)

In this notebook (tutorial) you can find:

* Installation of SAHI on Kaggle
* Sliced inference with SAHI for Yolov5
* Sliced inference with SAHI for YoloX (soon)


<div class="alert alert-success" role="alert">
My other work in this competition:
    <ul>
        <li> <a href="https://www.kaggle.com/remekkinas/yolox-full-training-pipeline-for-cots-dataset">YoloX full training pipeline for COTS dataset</a></li>
        <li> <a href="https://www.kaggle.com/remekkinas/yolox-inference-on-kaggle-for-cots-lb-0-507">YoloX detections submission made on COTS dataset</a></li>
        <li> <a href="https://www.kaggle.com/remekkinas/yolor-p6-w6-one-more-yolo-on-kaggle-infer">YoloR [P6/W6] ... one more yolo on Kaggle [INFER]</a></li>
        <li> <a href="https://www.kaggle.com/remekkinas/yolor-p6-w6-one-more-yolo-on-kaggle-train">YoloR [P6/W6]... one more yolo on Kaggle [TRAIN]</a></li>
    </ul>
    
</div>


<div class="alert alert-warning">Note: My goal was to implement and share a tool for experimentation. I was not looking for the best parameters to score over 0.6 or even 0.7; that is your part of this journey. Enjoy experimenting and progressing!</div>

The idea behind sliced inference is to run inference over smaller slices of the original image and then merge the sliced predictions back onto the original image. It can be illustrated as below:

<div align="center"><img src="https://raw.githubusercontent.com/obss/sahi/main/resources/sliced_inference.gif"/></div>
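
To make the slicing step concrete, here is a minimal sketch of how overlapping slice windows could be laid out over one frame. It is not SAHI's own implementation; the function name `slice_boxes` and the rule of shifting the last window back inside the image are assumptions made for illustration.

```python
def slice_boxes(image_w, image_h, slice_w=512, slice_h=512,
                overlap_w_ratio=0.2, overlap_h_ratio=0.2):
    """Return [x_min, y_min, x_max, y_max] windows that cover the image.

    Hypothetical helper for illustration only (not part of SAHI).
    """
    # distance between the starts of two neighbouring windows
    step_x = slice_w - int(slice_w * overlap_w_ratio)
    step_y = slice_h - int(slice_h * overlap_h_ratio)

    boxes = []
    y = 0
    while True:
        # clamp the window to the image and keep its full height if possible
        y_max = min(y + slice_h, image_h)
        y_min = max(0, y_max - slice_h)
        x = 0
        while True:
            x_max = min(x + slice_w, image_w)
            x_min = max(0, x_max - slice_w)
            boxes.append([x_min, y_min, x_max, y_max])
            if x_max >= image_w:
                break
            x += step_x
        if y_max >= image_h:
            break
        y += step_y
    return boxes


# 1280x720 frame with the 768x432 slices used later in this notebook
windows = slice_boxes(1280, 720, slice_w=768, slice_h=432)
for window in windows:
    print(window)
```

Each window would be passed to the detector separately, and the resulting boxes are then shifted back into full-image coordinates and merged.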

# Install libraries

In [None]:
# norfair dependencies
%cd /kaggle/input/norfair031py3/
!pip install commonmark-0.9.1-py2.py3-none-any.whl -f ./ --no-index
!pip install rich-9.13.0-py3-none-any.whl

!mkdir /kaggle/working/tmp
!cp -r /kaggle/input/norfair031py3/filterpy-1.4.5/filterpy-1.4.5/ /kaggle/working/tmp/
%cd /kaggle/working/tmp/filterpy-1.4.5/
!pip install .
!rm -rf /kaggle/working/tmp

# norfair
%cd /kaggle/input/norfair031py3/
!pip install norfair-0.3.1-py3-none-any.whl -f ./ --no-index

# yolov5 + sahi
%cd /kaggle/input/sahihub/s-lib
!pip install ./fire-0.4.0/fire-0.4.0.tar -f ./ --no-index
!pip install terminaltables-3.1.10-py2.py3-none-any.whl -f ./ --no-index
!pip install sahi-0.8.22-py3-none-any.whl -f ./ --no-index
!pip install thop-0.0.31.post2005241907-py3-none-any.whl -f ./ --no-index
!pip install yolov5-6.0.6-py36.py37.py38-none-any.whl -f ./ --no-index
!pip install yolo5-0.0.1-py36.py37.py38-none-any.whl -f ./ --no-index

!mkdir -p /root/.config/Ultralytics
!cp /kaggle/input/sahihub/Arial.ttf /root/.config/Ultralytics/

%cd /kaggle/working

## 1. IMPORT SAHI MODULES

In [None]:
FOLD = 1
ROOT_DIR  = '/kaggle/input/tensorflow-great-barrier-reef'
IMAGE_DIR = f'/kaggle/working/yolo_data/fold{FOLD}/images' # directory to save images
LABEL_DIR = f'/kaggle/working/yolo_data/fold{FOLD}/labels' # directory to save labels.
DATASET_PATH = '/kaggle/input/tensorflow-great-barrier-reef/train_images/'
CKPT_PATH = '/kaggle/input/reef-baseline-fold12/l6_3600_uflip_vm5_f12_up/f1/best.pt'
# CKPT_PATH = '/kaggle/input/yolov5-models/yolov5l_1920.pt'

import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
from tqdm.notebook import tqdm
tqdm.pandas()
import sys
import cv2
import ast
import torch
from PIL import Image as Img
from IPython.display import display
from norfair import Detection, Tracker

from sahi.model import Yolov5DetectionModel
from sahi.utils.cv import read_image
from sahi.predict import get_prediction, get_sliced_prediction, predict
from IPython.display import Image
from sahi.utils.yolov5 import (
    download_yolov5s6_model,
)
sys.path.append('../input/weightedboxesfusion/')
from ensemble_boxes.ensemble_boxes_wbf import weighted_boxes_fusion

### A. YOLOv5 - get_sliced_prediction

* **image**: str or np.ndarray - Location of image or numpy image matrix to slice
* **detection_model**: model.DetectionModel
* **image_size**: int: Input image size for each inference (the image is scaled while preserving its aspect ratio).
* **slice_height**: int: Height of each slice.  Defaults to ``512``.
* **slice_width**: int: Width of each slice.  Defaults to ``512``.
* **overlap_height_ratio**: float: Fractional overlap in height of each window (e.g. an overlap of 0.2 for a window of size 512 yields an overlap of 102 pixels). Defaults to ``0.2``.
* **overlap_width_ratio**: float: Fractional overlap in width of each window (e.g. an overlap of 0.2 for a window of size 512 yields an overlap of 102 pixels). Defaults to ``0.2``.
* **perform_standard_pred**: bool: Perform a standard prediction on top of sliced predictions to increase large object detection accuracy. Default: True.
* **postprocess_type**: str: Type of the postprocess to be used after sliced inference while merging/eliminating predictions. Options are 'NMM', 'GREEDYNMM' or 'NMS'. Default is 'GREEDYNMM'.
* **postprocess_match_metric**: str: Metric to be used during object prediction matching after sliced prediction. 'IOU' for intersection over union, 'IOS' for intersection over smaller area.
* **postprocess_match_threshold**: float: Sliced predictions having higher iou than postprocess_match_threshold will be postprocessed after sliced prediction.
* **postprocess_class_agnostic**: bool: If True, postprocess will ignore category ids.
* **verbose**: int: 0: no print, 1: print number of slices (default), 2: print number of slices and slice/prediction durations
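
The difference between the two `postprocess_match_metric` options is easiest to see on a small box that lies inside a large one. The sketch below is a plain-Python illustration with hypothetical helper names (`iou`, `ios`, `intersection_area`); it is not SAHI's internal code.

```python
def box_area(box):
    """Area of an [x_min, y_min, x_max, y_max] box."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])


def intersection_area(a, b):
    """Area of the overlap between two xyxy boxes (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)


def iou(a, b):
    """Intersection over union."""
    inter = intersection_area(a, b)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def ios(a, b):
    """Intersection over the smaller of the two areas."""
    inter = intersection_area(a, b)
    smaller = min(box_area(a), box_area(b))
    return inter / smaller if smaller > 0 else 0.0


large = [0, 0, 100, 100]   # e.g. a box from the full-image prediction
small = [10, 10, 30, 30]   # e.g. a partial box from one slice, fully inside

print("IOU:", iou(large, small))
print("IOS:", ios(large, small))
```

With a match threshold of 0.45, these two boxes would be treated as a match under 'IOS' but not under 'IOU', which is why 'IOS' can be useful for merging partial detections that come from slice borders.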

### A1. CUSTOM Yolo5 PREDICTION CLASS
This class is optional; I wrote it to have more control over prediction (for example, it runs the model with `augment=True`).
The idea comes from Dewei Chen (@dwchen) in this discussion: https://www.kaggle.com/c/tensorflow-great-barrier-reef/discussion/302761

In [None]:
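import logging

from sahi.prediction import ObjectPrediction
from sahi.model import DetectionModel
from typing import Dict, List, Optional, Union
from sahi.utils.compatibility import fix_full_shape_list, fix_shift_amount_list

# logger used below when invalid predictions are skipped
logger = logging.getLogger(__name__)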

class COTSYolov5DetectionModel(DetectionModel):

    
    def load_model(self):
        model = torch.hub.load('/kaggle/input/yolov5-lib-ds', 
                               'custom', 
                               path=self.model_path,
                               source='local',
                               force_reload=True)
        
        model.conf = self.confidence_threshold
        self.model = model
        
        if not self.category_mapping:
            category_mapping = {str(ind): category_name for ind, category_name in enumerate(self.category_names)}
            self.category_mapping = category_mapping

    def perform_inference(self, image: np.ndarray, image_size: int = None):
        if image_size is not None:
            warnings.warn("Set 'image_size' at DetectionModel init.", DeprecationWarning)
            prediction_result = self.model(image, size=image_size, augment=True)
#             if debug_mode:
#                 display(Img.fromarray(image).resize((320, 200)))
        elif self.image_size is not None:
            prediction_result = self.model(image, size=self.image_size, augment=True)
        else:
            prediction_result = self.model(image)

        self._original_predictions = prediction_result

    @property
    def num_categories(self):
        """
        Returns number of categories
        """
        return len(self.model.names)

    @property
    def has_mask(self):
        """
        Returns if model output contains segmentation mask
        """
        has_mask = self.model.with_mask
        return has_mask

    @property
    def category_names(self):
        return self.model.names

    def _create_object_prediction_list_from_original_predictions(
        self,
        shift_amount_list: Optional[List[List[int]]] = [[0, 0]],
        full_shape_list: Optional[List[List[int]]] = None,):

        original_predictions = self._original_predictions
        shift_amount_list = fix_shift_amount_list(shift_amount_list)
        full_shape_list = fix_full_shape_list(full_shape_list)

        # handle all predictions
        object_prediction_list_per_image = []
        for image_ind, image_predictions_in_xyxy_format in enumerate(original_predictions.xyxy):
            shift_amount = shift_amount_list[image_ind]
            full_shape = None if full_shape_list is None else full_shape_list[image_ind]
            object_prediction_list = []

            # process predictions
            for prediction in image_predictions_in_xyxy_format.cpu().detach().numpy():
                x1 = int(prediction[0])
                y1 = int(prediction[1])
                x2 = int(prediction[2])
                y2 = int(prediction[3])
                bbox = [x1, y1, x2, y2]
                score = prediction[4]
                category_id = int(prediction[5])
                category_name = self.category_mapping[str(category_id)]

                # ignore invalid predictions
                if bbox[0] > bbox[2] or bbox[1] > bbox[3] or bbox[0] < 0 or bbox[1] < 0 or bbox[2] < 0 or bbox[3] < 0:
                    logger.warning(f"ignoring invalid prediction with bbox: {bbox}")
                    continue
                if full_shape is not None and (
                    bbox[1] > full_shape[0]
                    or bbox[3] > full_shape[0]
                    or bbox[0] > full_shape[1]
                    or bbox[2] > full_shape[1]
                ):
                    logger.warning(f"ignoring invalid prediction with bbox: {bbox}")
                    continue

                object_prediction = ObjectPrediction(
                    bbox=bbox,
                    category_id=category_id,
                    score=score,
                    bool_mask=None,
                    category_name=category_name,
                    shift_amount=shift_amount,
                    full_shape=full_shape,
                )
                object_prediction_list.append(object_prediction)
            object_prediction_list_per_image.append(object_prediction_list)

        self._object_prediction_list_per_image = object_prediction_list_per_image 
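
The class above stores each box together with the `shift_amount` of its slice, so that slice-local coordinates can later be placed in the full image. The following plain-Python sketch shows that idea and the first validity check from the loop above; `shift_to_full_image` and `is_valid_bbox` are hypothetical names, and I am assuming `shift_amount` is ordered as `[shift_x, shift_y]`.

```python
def shift_to_full_image(bbox, shift_amount):
    """Move a slice-local [x1, y1, x2, y2] box into full-image coordinates.

    Assumes shift_amount is [shift_x, shift_y], i.e. the top-left corner
    of the slice inside the full image.
    """
    shift_x, shift_y = shift_amount
    x1, y1, x2, y2 = bbox
    return [x1 + shift_x, y1 + shift_y, x2 + shift_x, y2 + shift_y]


def is_valid_bbox(bbox):
    """Same first check as in the class above: ordered, non-negative corners."""
    x1, y1, x2, y2 = bbox
    return not (x1 > x2 or y1 > y2 or min(x1, y1, x2, y2) < 0)


# a detection found in the slice whose top-left corner is at (512, 288)
local_box = [10, 20, 50, 60]
full_box = shift_to_full_image(local_box, [512, 288])
print(full_box)
```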

## 2. HELPER FUNCTIONS

### Visualization

In [None]:
def show_prediction(img, bboxes, scores, gts, show=True):
    colors = [(0, 0, 255)]

    obj_names = ["s"]

    for box, score in zip(bboxes, scores):
#         cv2.rectangle(img, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), (255,0,0), 2)
        cv2.rectangle(img, (int(box[0]), int(box[1])), (int(box[0] + box[2]), int(box[1] + box[3])), (255,0,0), 2)
        cv2.putText(img, f'{score}', (int(box[0]), int(box[1])-3), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255,0,0), 1, cv2.LINE_AA)
        
    for gt in gts:
        cv2.rectangle(img, (int(gt[0]), int(gt[1])), (int(gt[0]+gt[2]), int(gt[1]+gt[3])), (0,255,0), 2)
    
    if show:
        img = Img.fromarray(img).resize((1280, 720))
    return img

### Prediction, WBF and tracking helpers

In [None]:
def predict(img, model, sw, sh, ohr, owr, pmt, img_size, verb):
    result = get_sliced_prediction(img,
                                   model,
                                   slice_width = sw,
                                   slice_height = sh,
                                   overlap_height_ratio = ohr,
                                   overlap_width_ratio = owr,
                                   postprocess_match_threshold = pmt,
                                   image_size = img_size,
                                   verbose = verb,
                                   perform_standard_pred = True)
    
    
    bboxes = []
    scores = []
    result_len = result.to_coco_annotations()
    for pred in result_len:
        bboxes.append(pred['bbox'])
        scores.append(pred['score'])
    
    return bboxes, scores 

In [None]:
ORIG_IMG_SIZE = 1280

# Prediction
def predict(img, model, sh, sw, ohr, owr, pmt, img_size, verb):
    result = get_sliced_prediction(img,
                                   model,
                                   slice_height = sh,
                                   slice_width = sw,
                                   overlap_height_ratio = ohr,
                                   overlap_width_ratio = owr,
                                   postprocess_match_threshold = pmt,
                                   image_size = img_size,
                                   verbose = verb,
                                   perform_standard_pred = True)
    
    bboxes = []
    scores = []
    result_len = result.to_coco_annotations()
    for pred in result_len:
        bboxes.append(pred['bbox'])
        scores.append(pred['score'])
    
    return bboxes, scores 


def add_bboxes(img, model, sw=768, sh=432, ohr=0.2, owr=0.2, pmt=0.45, img_size=3200):
    global bboxes, scores
    
    # get predictions from SAHI (COCO-style [x, y, w, h] boxes) using the arguments passed in
    bbox, score = predict(img, model, sw=sw, sh=sh, ohr=ohr, owr=owr, pmt=pmt, img_size=img_size, verb=0)
    
    # convert [x, y, w, h] to [x_min, y_min, x_max, y_max]
    for i in range(len(bbox)):
        bbox[i][2] += bbox[i][0]
        bbox[i][3] += bbox[i][1]  
    
    if bbox:
        bboxes.append(bbox)
        scores.append(score)


# WBF
def run_wbf(bboxes, scores, orig_image_size=ORIG_IMG_SIZE, iou_thr=0.4, skip_box_thr=0.0, weights=None):
    # normalize bboxes to [0, 1] (note: this modifies the input lists in place)
    for i in range(len(bboxes)):
        for j in range(len(bboxes[i])):
            bboxes[i][j] = [bb/(orig_image_size-1) for bb in bboxes[i][j]]
    labels = [np.ones(len(score)) for score in scores]
    # fuse bboxes with WBF
    bboxes, scores, labels = weighted_boxes_fusion(bboxes, scores, labels, weights=weights, iou_thr=iou_thr, skip_box_thr=skip_box_thr)
    # return bboxes to the original size
    bboxes = [bbox*(orig_image_size-1) for bbox in bboxes]
    return bboxes, scores
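
The helpers above rely on two box conversions: COCO-style `[x, y, w, h]` to `[x_min, y_min, x_max, y_max]`, and pixel coordinates to the `[0, 1]` range expected by WBF. Here is a small standalone sketch of both. Unlike `run_wbf`, which divides every coordinate by the same value, this variant normalizes x by the width and y by the height; that is a deliberate variation for illustration, and the function names are hypothetical.

```python
def xywh_to_xyxy(box):
    """[x, y, w, h] -> [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def normalize_xyxy(box, width, height):
    """Scale pixel coordinates to [0, 1] using the image width and height."""
    x1, y1, x2, y2 = box
    return [x1 / width, y1 / height, x2 / width, y2 / height]


def denormalize_xyxy(box, width, height):
    """Inverse of normalize_xyxy."""
    x1, y1, x2, y2 = box
    return [x1 * width, y1 * height, x2 * width, y2 * height]


coco_box = [100, 50, 40, 20]                  # x, y, w, h in pixels
xyxy_box = xywh_to_xyxy(coco_box)             # corners in pixels
norm_box = normalize_xyxy(xyxy_box, 1280, 720)
restored = denormalize_xyxy(norm_box, 1280, 720)
print(xyxy_box, norm_box)
```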

In [None]:
# Helper to convert bbox in format [x_min, y_min, x_max, y_max, score] to norfair.Detection class
def to_norfair(detects, frame_id):
    result = []
    for x_min, y_min, x_max, y_max, score in detects:
        xc, yc = (x_min + x_max) / 2, (y_min + y_max) / 2
        w, h = x_max - x_min, y_max - y_min
        result.append(Detection(points=np.array([xc, yc]), scores=np.array([score]), data=np.array([w, h, frame_id])))
        
    return result

# Euclidean distance function to match detections on this frame with tracked_objects from previous frames
def euclidean_distance(detection, tracked_object):
    return np.linalg.norm(detection.points - tracked_object.estimate)
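
The tracker works on box centers, and the box size travels along in `data` so that a box can be rebuilt when a tracked object has no detection on the current frame. The sketch below mirrors that round trip in plain Python without norfair; the function names are hypothetical, and the threshold of 30 pixels is the `distance_threshold` used later in this notebook.

```python
import math


def to_center(x_min, y_min, x_max, y_max):
    """Corner box -> (xc, yc, w, h), as done in to_norfair."""
    xc, yc = (x_min + x_max) / 2, (y_min + y_max) / 2
    return xc, yc, x_max - x_min, y_max - y_min


def from_center(xc, yc, w, h):
    """(xc, yc, w, h) -> [x_min, y_min, w, h], the submission layout."""
    x_min = int(round(xc - w / 2))
    y_min = int(round(yc - h / 2))
    return [x_min, y_min, w, h]


def center_distance(a, b):
    """Euclidean distance between two (x, y) centers."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


xc, yc, w, h = to_center(100, 200, 140, 260)
rebuilt = from_center(xc, yc, w, h)

# a detection 5 px away from the tracked estimate is within a 30 px threshold
dist = center_distance((xc, yc), (xc + 3, yc + 4))
print(rebuilt, dist, dist < 30)
```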

# 3. Evaluation

## Evaluation utils

In [None]:
def calc_iou(bboxes1, bboxes2, bbox_mode='xywh'):
    assert len(bboxes1.shape) == 2 and bboxes1.shape[1] == 4
    assert len(bboxes2.shape) == 2 and bboxes2.shape[1] == 4
    
    bboxes1 = bboxes1.copy()
    bboxes2 = bboxes2.copy()
    
    if bbox_mode == 'xywh':
        bboxes1[:, 2:] += bboxes1[:, :2]
        bboxes2[:, 2:] += bboxes2[:, :2]

    x11, y11, x12, y12 = np.split(bboxes1, 4, axis=1)
    x21, y21, x22, y22 = np.split(bboxes2, 4, axis=1)
    xA = np.maximum(x11, np.transpose(x21))
    yA = np.maximum(y11, np.transpose(y21))
    xB = np.minimum(x12, np.transpose(x22))
    yB = np.minimum(y12, np.transpose(y22))
    interArea = np.maximum((xB - xA + 1), 0) * np.maximum((yB - yA + 1), 0)
    boxAArea = (x12 - x11 + 1) * (y12 - y11 + 1)
    boxBArea = (x22 - x21 + 1) * (y22 - y21 + 1)
    iou = interArea / (boxAArea + np.transpose(boxBArea) - interArea)
    return iou

def f_beta(tp, fp, fn, beta=2):
    return (1+beta**2)*tp / ((1+beta**2)*tp+beta**2*fn+fp)

def calc_is_correct_at_iou_th(gt_bboxes, pred_bboxes, iou_th, verbose=False):
    gt_bboxes = gt_bboxes.copy()
    pred_bboxes = pred_bboxes.copy()
    
    tp = 0
    fp = 0
    for k, pred_bbox in enumerate(pred_bboxes): # fixed in ver.7
        ious = calc_iou(gt_bboxes, pred_bbox[None, 1:])
        max_iou = ious.max()
        if max_iou > iou_th:
            tp += 1
            gt_bboxes = np.delete(gt_bboxes, ious.argmax(), axis=0)
        else:
            fp += 1
        if len(gt_bboxes) == 0:
            fp += len(pred_bboxes) - (k + 1) # fix in ver.7
            break

    fn = len(gt_bboxes)
    return tp, fp, fn

def calc_is_correct(gt_bboxes, pred_bboxes):
    """
    gt_bboxes: (N, 4) np.array in xywh format
    pred_bboxes: (N, 5) np.array in conf+xywh format
    """
    if len(gt_bboxes) == 0 and len(pred_bboxes) == 0:
        tps, fps, fns = 0, 0, 0
        return tps, fps, fns
    
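    elif len(gt_bboxes) == 0:
        # every prediction is a false positive at each of the 11 IoU thresholds used below
        tps, fps, fns = 0, len(pred_bboxes)*11, 0
        return tps, fps, fns
    
    elif len(pred_bboxes) == 0:
        # every ground-truth box is missed at each of the 11 IoU thresholds used below
        tps, fps, fns = 0, 0, len(gt_bboxes)*11
        return tps, fps, fns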
    
    pred_bboxes = pred_bboxes[pred_bboxes[:,0].argsort()[::-1]] # sort by conf
    
    tps, fps, fns = 0, 0, 0
    for iou_th in np.arange(0.3, 0.85, 0.05):
        tp, fp, fn = calc_is_correct_at_iou_th(gt_bboxes, pred_bboxes, iou_th)
        tps += tp
        fps += fp
        fns += fn
    return tps, fps, fns

def calc_f2_score(gt_bboxes_list, pred_bboxes_list, verbose=False):
    """
    gt_bboxes_list: list of (N, 4) np.array in xywh format
    pred_bboxes_list: list of (N, 5) np.array in conf+xywh format
    """
    tps, fps, fns = 0, 0, 0
    for gt_bboxes, pred_bboxes in zip(gt_bboxes_list, pred_bboxes_list):
        tp, fp, fn = calc_is_correct(gt_bboxes, pred_bboxes)
        tps += tp
        fps += fp
        fns += fn
        if verbose:
            num_gt = len(gt_bboxes)
            num_pred = len(pred_bboxes)
            print(f'num_gt:{num_gt:<3} num_pred:{num_pred:<3} tp:{tp:<3} fp:{fp:<3} fn:{fn:<3}')
    return f_beta(tps, fps, fns, beta=2)
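
As a worked example of the metric above, the sketch below evaluates the same `f_beta` formula on made-up counts (the numbers are illustrative, not results from this notebook) and lists the IoU thresholds that explain the factor of 11 in `calc_is_correct`.

```python
def f_beta(tp, fp, fn, beta=2):
    # same formula as in the evaluation utils
    return (1 + beta**2) * tp / ((1 + beta**2) * tp + beta**2 * fn + fp)


# IoU thresholds 0.30, 0.35, ..., 0.80 -> 11 values
thresholds = [round(0.3 + 0.05 * i, 2) for i in range(11)]

# illustrative counts only
score_more_misses = f_beta(tp=8, fp=2, fn=4)   # 40 / (40 + 16 + 2)
score_more_false_alarms = f_beta(tp=8, fp=4, fn=2)   # 40 / (40 + 8 + 4)

print(thresholds)
print(round(score_more_misses, 4), round(score_more_false_alarms, 4))
```

With beta = 2 a missed starfish (false negative) costs four times as much as a false alarm, so the second case scores higher even though both have the same number of errors.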

In [None]:
detection_model = COTSYolov5DetectionModel(
   model_path = CKPT_PATH,
   confidence_threshold = 0.28,
   device="cuda",
   image_size=10250
)

# detection_model = Yolov5DetectionModel(
#    model_path = CKPT_PATH,
#    confidence_threshold = 0.25,
#    device="cuda",
# )

detection_model.model.iou = 0.4
area_thr = 300

## Get GT bboxes from dataset

In [None]:
dir = f'{DATASET_PATH}'
imgs = [dir + f for f in ('video_2/5748.jpg',
                          'video_2/5772.jpg',
                          'video_2/5820.jpg',
                          'video_1/4159.jpg', 
                          'video_1/4183.jpg', 
                          'video_1/4501.jpg', 
                          'video_1/5375.jpg', 
                          'video_1/5414.jpg',
                          'video_1/5495.jpg',
                          'video_1/4775.jpg', 
                          'video_0/9794.jpg', 
                          'video_0/4502.jpg', 
                          'video_0/9651.jpg', 
                          'video_0/9700.jpg',  
                          'video_0/9674.jpg',
                          'video_0/20.jpg', 
                          'video_0/17.jpg', 
                          'video_1/5474.jpg', 
                          'video_0/0.jpg')]

def get_path(row):
    row['image_path'] = f'{ROOT_DIR}/train_images/video_{row.video_id}/{row.video_frame}.jpg'
    return row

def get_bbox(annots):
    bboxes = [list(annot.values()) for annot in annots]
    return bboxes

# Train Data
df = pd.read_csv(f'{ROOT_DIR}/train.csv')
# df = df[df.video_id == FOLD]
df = df.progress_apply(get_path, axis=1)
df['annotations'] = df['annotations'].progress_apply(lambda x: ast.literal_eval(x))
df.head(2)

df['num_bbox'] = df['annotations'].progress_apply(lambda x: len(x))
data = (df.num_bbox>0).value_counts(normalize=True)*100
print(f"No BBox: {data[0]:0.2f}% | With BBox: {data[1]:0.2f}%")

df['bboxes'] = df.annotations.progress_apply(get_bbox)
df.head()

## Evaluate model

In [None]:
# #######################################################
# #                      Tracking                       #
# #######################################################

# # Tracker will update tracks based on detections from current frame
# # Matching based on euclidean distance between bbox centers of detections 
# # from current frame and tracked_objects based on previous frames
# # You can check its parameters in norfair docs
# # https://github.com/tryolabs/norfair/blob/master/docs/README.md
# tracker = Tracker(
#     distance_function=euclidean_distance, 
#     distance_threshold=30,
#     hit_inertia_min=3,
#     hit_inertia_max=6,
#     initialization_delay=1,
# )
    
# # Save frame_id into detection to know which tracks have no detections on current frame
# frame_id = 0
# #######################################################

# gt_bboxes_list, prd_bboxes_list = [], []

# test_df = df[df.sequence == 8503]

# # for idx, row in tqdm(df.iloc[7000:7200].iterrows()):
# for idx, row in tqdm(test_df.iterrows()):
# #     global bboxes, scores
# #     bboxes, scores = [], []
#     gt_bboxes, pred_bboxes = [], []
    
#     img_path = row.image_path
#     img = cv2.imread(img_path)
#     img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

#     # get GT bboxes for evaluation
#     for gt in row.bboxes:
#         gt_bbox = np.array(list(map(float, gt)))
#         gt_bboxes.append(gt_bbox)
    
#     gt_bboxes_list.append(np.array(gt_bboxes))

# #     add_bboxes(img, detection_model, sw=768, sh=432, ohr=0.2, owr=0.2, pmt=0.45, img_size=3200)
#     bboxes, scores = predict(img, detection_model, sw=768, sh=432, ohr=0.2, owr=0.2, pmt=0.45, img_size=3200, verb=0)
    
#     predictions = []
#     detects = []
    
#     if len(bboxes) > 0:
#         for bbox, score in zip(bboxes, scores):
#             width, height = int(bbox[2]), int(bbox[3])
#             area = width * height
#             if area >= area_thr:
#                 detects.append([int(bbox[0]), int(bbox[1]), int(bbox[0])+width, int(bbox[1])+height, score])
#                 predictions.append('{:.2f} {} {} {} {}'.format(score, int(bbox[0]), int(bbox[1]), width, height))
#                 pred_bboxes.append(np.array([score, int(bbox[0]), int(bbox[1]), width, height]))
                
# #     display(show_prediction(img, bboxes, scores, gt_bboxes))
#     #######################################################
#     #                      Tracking                       #
#     #######################################################
    
#     # Update tracks using detects from current frame
#     tracked_objects = tracker.update(detections=to_norfair(detects, frame_id))
#     for tobj in tracked_objects:
#         bbox_width, bbox_height, last_detected_frame_id = tobj.last_detection.data
#         if last_detected_frame_id == frame_id:  # Skip objects that were detected on current frame
#             continue

#         # Add objects that have no detections on current frame to predictions
#         xc, yc = tobj.estimate[0]
#         x_min, y_min = int(round(xc - bbox_width / 2)), int(round(yc - bbox_height / 2))
#         score = tobj.last_detection.scores[0]
#         area = bbox_width * bbox_height
#         if area >= area_thr:
#             predictions.append('{:.2f} {} {} {} {}'.format(score, x_min, y_min, bbox_width, bbox_height))
#             pred_bboxes.append(np.array([score, x_min, y_min, bbox_width, bbox_height]))
#     #######################################################
    
#     # get pred bboxes for evaluation
#     prd_bboxes_list.append(np.array(pred_bboxes))
    
    
#     prediction_str = ' '.join(predictions)
    
#     frame_id += 1

In [None]:
# calc_f2_score(gt_bboxes_list, prd_bboxes_list, verbose=False)

s6, IOU 0.25 - 0.5718199755816712


s6, IOU 0.45 - 0.5703915447399105


l6, IOU 0.25 - 

## 4. MAKE VIDEO FROM PREDS

In [None]:
import ast
import os
import pandas as pd
import subprocess

from ast import literal_eval
from tqdm.auto import tqdm

from IPython.display import HTML
from base64 import b64encode

In [None]:
def load_image(video_id, video_frame, image_dir):
    assert os.path.exists(image_dir), f'{image_dir} does not exist.'
    img = cv2.imread(image_dir)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return img


def decode_annotations(annotations_str):
    return literal_eval(annotations_str)

def load_image_with_annotations(img, annotations):
#     annotations = decode_annotations(annotations_str)
    if len(annotations) > 0:
        for ann in annotations:
            cv2.rectangle(img, (ann['x'], ann['y']),
                (ann['x'] + ann['width'], ann['y'] + ann['height']),
                (0, 255, 255), thickness=2,)
    return img

In [None]:
# ## This code I found in: https://www.kaggle.com/bamps53/create-annotated-video Thank you for sharing.

# def make_sahi_video(df, video_id, sequence_id, out_dir):
#     fps = 15 
#     width = 1280
#     height = 720

#     save_path = f'{out_dir}/video-{video_id}.mp4'
#     tmp_path =  f'{out_dir}/tmp-video-{video_id}.mp4'
#     output_video = cv2.VideoWriter(tmp_path, cv2.VideoWriter_fourcc(*"MP4V"), fps, (width, height))
    
#     # I just generate any part of video
#     video_df = df.query('video_id == @video_id and sequence == @sequence_id and video_frame > 5700 and video_frame < 6000')
#     for _, row in tqdm(video_df.iterrows(), total=len(video_df)):
#         video_id = row.video_id
#         video_frame = row.video_frame
#         annotations = row.annotations
#         img_file = row.image_path
#         img = load_image(video_id, video_frame, img_file)
#         bboxes, scores = predict(img, detection_model, 768, 432, 0.2, 0.2, 0.45, 3200, 0)
        
#         gt_bboxes, pred_bboxes = [], []
#         # get pred bboxes for evaluation
#         for bbox in bboxes:
#             width, height = int(bbox[2]), int(bbox[3])
#             area = width * height
#             if area >= area_thr:
#                 pred_bboxes.append(bbox)
#         # get GT bboxes for evaluation
#         for gt in row.bboxes:
#             gt_bbox = np.array(list(map(float, gt)))
#             gt_bboxes.append(gt_bbox)

#         img = show_prediction(img, pred_bboxes, scores, gt_bboxes, False)
#         img = load_image_with_annotations(img, annotations)
#         cv2.putText(img, f'{video_id}-{video_frame}', (10,70), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,0,0), 1, cv2.LINE_AA)
#         img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
#         output_video.write(img)

    
#     output_video.release()

#     if os.path.exists(save_path):
#         os.remove(save_path)
#     subprocess.run(
#         ["ffmpeg", "-i", tmp_path, "-crf", "18", "-preset", "veryfast", "-vcodec", "libx264", save_path]
#     )
#     os.remove(tmp_path)

In [None]:
# # To speed up I just generate any part of video
# # This prediction is for sure overfitted but it is for demo only (I can see it on prediction)

# debug_mode = False

# make_sahi_video(df, 2, 22643, '/kaggle/working/')

In [None]:
# def play(filename):
#     html = ''
#     video = open(filename,'rb').read()
#     src = 'data:video/mp4;base64,' + b64encode(video).decode()
#     html += '<video width=800 controls autoplay loop><source src="%s" type="video/mp4"></video>' % src 
#     return HTML(html)

# play('/kaggle/working/video-2.mp4')

# 5. Inference

In [None]:
import greatbarrierreef
env = greatbarrierreef.make_env() # initialize the environment
iter_test = env.iter_test()       # an iterator which loops over the test set and sample submission

#######################################################
#                      Tracking                       #
#######################################################

# Tracker will update tracks based on detections from current frame
# Matching based on euclidean distance between bbox centers of detections 
# from current frame and tracked_objects based on previous frames
# You can check its parameters in norfair docs
# https://github.com/tryolabs/norfair/blob/master/docs/README.md
tracker = Tracker(
    distance_function=euclidean_distance, 
    distance_threshold=30,
    hit_inertia_min=3,
    hit_inertia_max=6,
    initialization_delay=1,
)
    
# Save frame_id into detection to know which tracks have no detections on current frame
frame_id = 0
#######################################################

for (img, sample_prediction_df) in tqdm(iter_test):
    bboxes, scores = predict(img, detection_model, sw=768, sh=432, ohr=0.2, owr=0.2, pmt=0.45, img_size=3200, verb=0)
    
    predictions = []
    detects = []
    
    if len(bboxes) > 0:
        for bbox, score in zip(bboxes, scores):
            width, height = int(bbox[2]), int(bbox[3])
            area = width * height
            if area >= area_thr:
                detects.append([int(bbox[0]), int(bbox[1]), int(bbox[0])+width, int(bbox[1])+height, score])
                predictions.append('{:.2f} {} {} {} {}'.format(score, int(bbox[0]), int(bbox[1]), width, height))
    
    #######################################################
    #                      Tracking                       #
    #######################################################
    
    # Update tracks using detects from current frame
    tracked_objects = tracker.update(detections=to_norfair(detects, frame_id))
    for tobj in tracked_objects:
        bbox_width, bbox_height, last_detected_frame_id = tobj.last_detection.data
        if last_detected_frame_id == frame_id:  # Skip objects that were detected on current frame
            continue
            
        # Add objects that have no detections on current frame to predictions
        xc, yc = tobj.estimate[0]
        x_min, y_min = int(round(xc - bbox_width / 2)), int(round(yc - bbox_height / 2))
        score = tobj.last_detection.scores[0]
        # Calculate area and add only those boxes which have area bigger than area threshold
        area = bbox_width * bbox_height
        if area >= area_thr:
            predictions.append('{:.2f} {} {} {} {}'.format(score, x_min, y_min, bbox_width, bbox_height))
    #######################################################
    
    
    prediction_str = ' '.join(predictions)
    sample_prediction_df['annotations'] = prediction_str
    env.predict(sample_prediction_df)
    
    frame_id += 1
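
The loop above builds one annotation string per frame from `[x, y, w, h]` boxes, dropping boxes smaller than `area_thr`. Here is that formatting step as a standalone sketch; `format_predictions` is a hypothetical helper name, and the boxes and scores are made-up values.

```python
def format_predictions(boxes_xywh, scores, area_thr=300):
    """Build a 'score x y w h score x y w h ...' string, skipping small boxes."""
    parts = []
    for (x, y, w, h), score in zip(boxes_xywh, scores):
        w, h = int(w), int(h)
        # keep only boxes whose area reaches the threshold
        if w * h >= area_thr:
            parts.append('{:.2f} {} {} {} {}'.format(score, int(x), int(y), w, h))
    return ' '.join(parts)


boxes = [[10.6, 20.2, 30.9, 40.1],   # area 30 * 40 = 1200, kept
         [5, 5, 10, 10]]             # area 100, dropped
scores = [0.873, 0.5]

prediction_str = format_predictions(boxes, scores)
print(repr(prediction_str))
```

A frame with no surviving boxes yields an empty string.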

In [None]:
sub_df = pd.read_csv('submission.csv')
sub_df.head()