# Competition introduction
### TensorFlow - Help Protect the Great Barrier Reef 

The Great Barrier Reef is the world's largest coral reef system. It is larger than the Great Wall of China and the only living thing on earth visible from space. It is home to 1,500 species of fish, 400 species of corals, 130 species of sharks, rays, and a massive variety of other sea life. The marine park protects a large part of the reef from the negative impact of human use such as fishing and tourism. Unfortunately, the reef is facing one of the primary threats to its health: the overpopulation of the crown-of-thorns starfish (COTS). COTS is one of the largest starfish in the world and preys upon hard, or stony, coral polyps. An outbreak of these starfish can devastate reefs because it contributes to a loss of live coral cover. Outbreaks are believed to occur in natural cycles, worsened by poor water quality and overfishing of the starfish's predators.

<iframe width="682" height="315" src="https://www.youtube.com/embed/UT2noVDFoaA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

To keep COTS outbreaks at ecologically sustainable levels, the Great Barrier Reef Foundation established an innovation program in which AI technology is used to improve underwater image detection. That is the main goal of this competition: to accurately identify starfish in real-time by building an object detection model trained on underwater videos of coral reefs.


# **Training YOLO Object Detection on a Kaggle Dataset**

### **Overview**

This notebook walks through how to train a YOLO (YOLOv5) object detection model, loaded through PyTorch, for the TensorFlow - Help Protect the Great Barrier Reef competition.
In this specific example, we'll be training an object detection model to accurately identify starfish in real-time, trained on underwater videos of coral reefs.
Everything in this notebook is also hosted on this [GitHub repo](https://github.com/arnold402/in2deep). Our final submission is located on the main branch, while additional work is available in the [development branch](https://github.com/arnold402/in2deep/tree/development). The notebook is also available on Kaggle [here](https://www.kaggle.com/amyrouillard/ak-final-tests-v1/edit/run/87382593).

![Great Barrier reef](https://blogger.googleusercontent.com/img/a/AVvXsEj6-rQw5r22Bt47BUTtW5bn_dcWT7zMeADwtvsAHS3kBt6w8eWTmCM649ZcJcvosIMup6flKFIaI8p4M9ZzH1yXpEaMRjvwwfVZ_hMqgXCxtwNzEK25vTa-J2ly20by3M1zx7rTymo-tBI6Fq-mj1SJfCOXsOz0Ou1Esi4h2omvQSW98AjsONsVS-EA)

 
### **Our Data**

We will be using data provided by Kaggle as part of the competition. Our dataset consists of the following:

#### Files

**train/** - Folder containing training set photos of the form **video_{video_id}/{video_frame_number}.jpg**.

**[train/test].csv** - Metadata for the images. As with other test files, most of the test metadata is only available to your notebook upon submission. Only the first few rows are available for download.

* `video_id` - ID number of the video the image was part of. The video ids are not meaningfully ordered.

* `video_frame` - The frame number of the image within the video. Expect to see occasional gaps in the frame number from when the diver surfaced.

* `sequence` - ID of a gap-free subset of a given video. The sequence ids are not meaningfully ordered.

* `sequence_frame` - The frame number within a given sequence.

* `image_id` - ID code for the image, in the format '__{video_id}-{video_frame}__'

* `annotations` - The bounding boxes of any starfish detections in a string format that can be evaluated directly with Python. Does not use the same format as the predictions you will submit. Not available in `test.csv`. A bounding box is described by the pixel coordinate (x_min, y_min) of its upper left corner within the image together with its width and height in pixels.
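As a minimal sketch of how the `annotations` string can be turned into usable boxes, the example below parses one made-up value with `ast.literal_eval` and derives the bottom-right corner from the width and height. The helper name `to_corners` and the numbers are illustrative assumptions, not part of the competition data.

```python
import ast

# Example value in the style of the `annotations` column (made-up numbers).
raw = "[{'x': 559, 'y': 213, 'width': 50, 'height': 32}]"

# literal_eval only accepts Python literals, which makes it safer than eval().
boxes = ast.literal_eval(raw)


def to_corners(box):
    """Return (x_min, y_min, x_max, y_max) for one annotation dict."""
    return (box['x'], box['y'],
            box['x'] + box['width'], box['y'] + box['height'])


corners = [to_corners(b) for b in boxes]
print(corners)  # [(559, 213, 609, 245)]
```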

**example_sample_submission.csv** - A sample submission file in the correct format. The actual sample submission will be provided by the API; this is only provided to illustrate how to properly format predictions. The submission format is further described on the Evaluation page.

**example_test.npy** - Sample data that will be served by the example API.

**greatbarrierreef** - The image delivery API that will serve the test set pixel arrays. You may need Python 3.7 and a Linux environment to run the example offline without errors.

#### Time-series API Details

The API serves the images one by one, in order by video and frame number, as pixel arrays.
Expect to see roughly 13,000 images in the test set.
The API will require roughly two GB of memory after initialization. The initialization step (env.iter_test()) will require meaningfully more memory than that; we recommend you do not load your model until after making that call. The API will also consume less than ten minutes of runtime for loading and serving the data. 
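The loop below is a rough sketch of how the image delivery API is typically driven. It assumes the `greatbarrierreef` module exposes `make_env()`, `iter_test()` and `predict()` as in the competition starter material, and that each detection is written as `confidence x_min y_min width height` as described on the Evaluation page. `make_detector`, `run_inference` and `format_prediction` are hypothetical names introduced here for illustration.

```python
def format_prediction(detections):
    """Join detections into one prediction string.

    Assumed layout per detection: confidence, x_min, y_min, width, height.
    """
    return " ".join(
        f"{conf:.2f} {int(x)} {int(y)} {int(w)} {int(h)}"
        for conf, x, y, w, h in detections
    )


def run_inference(make_detector):
    """Serve the test images one by one and submit a prediction for each.

    `make_detector` is a hypothetical factory that loads the model and returns
    a callable: pixel array -> list of (confidence, x_min, y_min, w, h).
    """
    import greatbarrierreef  # only importable inside the competition environment

    env = greatbarrierreef.make_env()
    iter_test = env.iter_test()   # memory-heavy initialization step
    detect = make_detector()      # load the model only after iter_test()
    for pixel_array, sample_prediction_df in iter_test:
        sample_prediction_df['annotations'] = format_prediction(detect(pixel_array))
        env.predict(sample_prediction_df)


print(format_prediction([(0.5, 10, 20, 100, 80)]))  # 0.50 10 20 100 80
```

Passing a factory rather than a ready model keeps the model out of memory until the initialization step has finished, in line with the recommendation above.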

The dataset can be found at Kaggle [here](https://www.kaggle.com/c/tensorflow-great-barrier-reef/data).


### **Our Model**

We'll be training a YOLO neural network. YOLO treats detection as a single regression problem: it takes an input image and learns the class probabilities and bounding box coordinates. YOLO divides each image into an S x S grid, and each grid cell predicts N bounding boxes, each with a confidence score. The confidence reflects how accurate the bounding box is and whether it actually contains an object (regardless of class). YOLO also predicts a classification score for each box for every class in training. Combining the box confidence with the class scores gives the probability of each class being present in a predicted box. (Consider [this](https://neptune.ai/blog/object-detection-with-yolo-hands-on-tutorial) deep dive for more!)
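To make the scoring concrete, here is a small sketch with made-up numbers for a single grid cell that predicts two boxes over three classes. It assumes the usual combination rule, in which the class-specific score is the box confidence multiplied by the class score; the array names are illustrative only.

```python
import numpy as np

# Hypothetical outputs for one grid cell predicting N = 2 boxes over 3 classes.
box_confidence = np.array([0.9, 0.2])        # one confidence value per box
class_probs = np.array([[0.7, 0.2, 0.1],     # class scores for box 0
                        [0.1, 0.8, 0.1]])    # class scores for box 1

# Class-specific score = box confidence * class score (broadcast per box).
class_scores = box_confidence[:, None] * class_probs

best_class = class_scores.argmax(axis=1)     # most likely class per box
best_score = class_scores.max(axis=1)        # its combined score
print(best_class.tolist(), best_score.round(2).tolist())  # [0, 1] [0.63, 0.16]
```

Box 1 has a confident class score but a low box confidence, so its combined score stays low and it would be dropped by a typical confidence threshold.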

#### YOLO compared to other detectors

Although a convolutional neural net (CNN) is used under the hood of YOLO, it’s still able to detect objects with real-time performance. It’s possible thanks to YOLO’s ability to do the predictions simultaneously in a single-stage approach. Other, slower algorithms for object detection (like Faster R-CNN) typically use a two-stage approach:

* in the first stage, interesting image regions are selected. These are the parts of an image that might contain any objects

* in the second stage, each of these regions is classified using a convolutional neural net. 

Usually, there are many regions in an image that may contain objects. All of these regions are sent to classification. Classification is a time-consuming operation, which is why the two-stage object detection approach is slower than one-stage detection. YOLO doesn't select the interesting parts of an image; there's no need for that. Instead, it predicts bounding boxes and classes for the whole image in a single forward pass of the network.

As a result, YOLO avoids one of the main limitations of two-stage detectors such as Fast R-CNN: their high computation time.

### **Training**

Google Colab provides free GPU resources. Click "Runtime" → "Change runtime type" → Hardware Accelerator dropdown to "GPU."
Colab does have memory limitations, and notebooks must be open in your browser to run. Sessions automatically clear themselves after 12 hours.

### **Inference**

We'll run inference directly in this notebook, on test images contained in the "test" folder from Kaggle.









### Install required packages

In [None]:
# norfair dependencies
%cd /kaggle/input/norfair031py3/
!pip install commonmark-0.9.1-py2.py3-none-any.whl -f ./ --no-index
!pip install rich-9.13.0-py3-none-any.whl
!mkdir /kaggle/working/tmp
!cp -r /kaggle/input/norfair031py3/filterpy-1.4.5/filterpy-1.4.5/ /kaggle/working/tmp/
%cd /kaggle/working/tmp/filterpy-1.4.5/
!pip install .
!rm -rf /kaggle/working/tmp

# norfair
%cd /kaggle/input/norfair031py3/
!pip install norfair-0.3.1-py3-none-any.whl -f ./ --no-index
%cd ..

### Import Libraries

In [None]:
import numpy as np
from tqdm.notebook import tqdm
tqdm.pandas()
import pandas as pd
import os
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
import glob
import shutil
import sys
import time
sys.path.append('../input/tensorflow-great-barrier-reef')
import torch
from PIL import Image, ImageDraw
import ast
import albumentations as albu
import random

In [None]:
# Path to root directory
ROOT_DIR  = '/kaggle/input/tensorflow-great-barrier-reef/'

# Increase Annotations

**Hypothesis:** we can enlarge our training data set by adding bounding boxes to the frames just before the first annotated box of each COTS.

More annotations mean more training data, which may in turn improve accuracy.

**Approach:**

1. Add detections to earlier frames that have a detection in a subsequent frame, as follows:
   * identify frames that have fewer detections than the next one
   * exclude any candidate box that overlaps another box in the previous frame
   * exclude the image "margins", because boxes there may not have been visible in the previous frame
   * shift the bounding box by the average translation of any other matched boxes
2. Compare some example frames right before and right after a detection
3. Save the results into a new training set

In [None]:
# Parameters
IMAGE_DIM = (1280,720)
# We will not be adding boxes to the perimeter of the image, i.e. to 5% of the image width or height
EXCLUDE_MARGIN = 0.05

In [None]:
def get_bbox(annots):
    """Get a list of bounding box coordinates and dimensions by
    dropping the keys of each annotation dict and keeping only the values.

    Parameters
    ----------
    annots: list of dicts, e.g. [{'x': 559, 'y': 213, 'width': 50, 'height': 32}]
        Annotations (coordinates and dimensions)

    Returns
    ----------
    bboxes: list, shape = [n,4]
        Bounding boxes in format [xmin, ymin, w, h] (pixels, top-left corner)
    """

    # extract bounding box coordinates and dimensions
    bboxes = [list(annot.values()) for annot in annots]
    return bboxes

def read_data():
    """Constructs the training dataset (dataframe) from train.csv.

    Parameters
    ----------
    None

    Returns
    ----------
    dataset: Pandas Dataframe
        Dataset in the form of a dataframe
    """

    # read the train csv
    df_train = pd.read_csv('../input/tensorflow-great-barrier-reef/train.csv')
    # build the image path of every frame
    df_train['img_path'] = ('../input/tensorflow-great-barrier-reef/train_images/video_'
                            + df_train.video_id.astype(str) + "/"
                            + df_train.video_frame.astype(str) + ".jpg")
    # safely evaluate the annotation string (a Python literal) into a list of dicts
    df_train['annotations'] = df_train['annotations'].apply(ast.literal_eval)
    df_train['bboxes'] = df_train['annotations'].apply(get_bbox)
    # get number of bounding boxes per frame
    df_train['Number_bbox'] = df_train['annotations'].apply(len)
    return df_train

# Call read_data() to load competition data
df_train = read_data()

## Add New Bounding Boxes to Previous Frames
We shift each frame's annotations to the previous frame and keep the frames that have fewer
annotations than the next one, as they are the ones most likely to show a starfish
before it is actually annotated.
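The toy example below illustrates the selection idea on made-up box counts: `shift(-1)` lines each frame up with the count of the frame that follows it, and the comparison keeps the frames with fewer boxes than their successor. The column names `frame`, `n_boxes` and `n_boxes_next` are assumptions for this sketch only.

```python
import pandas as pd

# Toy data (made-up): number of annotated boxes in five consecutive frames.
toy = pd.DataFrame({'frame': [0, 1, 2, 3, 4],
                    'n_boxes': [0, 0, 1, 1, 2]})

# shift(-1) moves each value up one row, so row i sees the count of frame i + 1.
toy['n_boxes_next'] = toy['n_boxes'].shift(-1)

# Candidate frames: fewer boxes than the frame that follows.
# The last row compares against NaN, which is never "greater", so it drops out.
candidates = toy[toy['n_boxes'] < toy['n_boxes_next']]
print(candidates['frame'].tolist())  # [1, 3]
```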

In [None]:
#shift next annotations to previous frame and rename columns to reflect that.
df_shift = df_train.shift(-1).rename(columns={'annotations':'annotations_n1',
                                             'Number_bbox':'Number_bbox_n1',
                                             'img_path':'img_path_n1'})
df_lagged = pd.concat([df_train, df_shift], axis=1)

#identify frames that have fewer annotations than the next one
#these are the candidates for adding earlier annotations
#(.copy() so that new columns can be added later without a SettingWithCopyWarning)
df_first_frames = df_lagged[df_lagged.Number_bbox < df_lagged.Number_bbox_n1].copy()

In [None]:
df_first_frames.head()

In [None]:
def intersects(rectangle_a, rectangle_b):
    '''Checks for intersection of two rectangles specified as [(x1,y1),(x2,y2)]

    Parameters
    ----------    
    rectangle_a: List eg [(x1,y1),(x2,y2)]
            List of tuples containing coordinates. 

    rectangle_b: List eg [(x1,y1),(x2,y2)]
            List of tuples containing coordinates. 

    Returns
    ----------
    Boolean: True or False
        Returns True if there is intersection and False if there isn't
    '''
    if(rectangle_a[1][0]<rectangle_b[0][0] or rectangle_a[1][1]<rectangle_b[0][1]):
        return False
    elif(rectangle_a[0][0]>rectangle_b[1][0] or rectangle_a[0][1]>rectangle_b[1][1]):
        return False
    else:
        return True
        
def new_bboxes(prev_bboxes, next_bboxes):
    '''Returns the bounding boxes that are deemed new in the next frame, i.e. the
    boxes in the next frame that do not intersect any bounding box of the
    previous frame, shifted by the average drift of the matched boxes.
    
    Parameters
    ----------

    prev_bboxes: List of dicts eg.([{'x': 559, 'y': 213, 'width': 50, 'height': 32}])
            Annotations from the previous frame
    next_bboxes: List of dicts eg.([{'x': 559, 'y': 213, 'width': 50, 'height': 32}])
            Annotations from the next frame
    
    Returns
    ----------
    new_annots: List of dicts eg.([{'x': 559, 'y': 213, 'width': 50, 'height': 32}])
            New annotations generated  
    
    '''
    new_bbs =[]
    delta_xs = [0]
    delta_ys = [0]
    delta_ws = [0]
    delta_hs = [0]
    for bb in next_bboxes:
        found = False
        for prev_bb in prev_bboxes:
            if intersects([(bb['x'],bb['y']),(bb['x'] + bb['width'],bb['y'] + bb['height'])],
                         [(prev_bb['x'], prev_bb['y']), (prev_bb['x'] + prev_bb['width'], 
                                                         prev_bb['y'] + prev_bb['height'])]
                         ):
                delta_xs.append(bb['x']-prev_bb['x'])
                delta_ys.append(bb['y']-prev_bb['y'])
                delta_ws.append(bb['width']-prev_bb['width'])
                delta_hs.append(bb['height']-prev_bb['height'])
                found = True
                break
        if not found:
            # exclude boxes whose top-left corner lies in the image margins
            if (bb['x'] > IMAGE_DIM[0]*EXCLUDE_MARGIN) and \
            (bb['x'] < (IMAGE_DIM[0]-IMAGE_DIM[0]*EXCLUDE_MARGIN)) and \
            (bb['y'] > IMAGE_DIM[1]*EXCLUDE_MARGIN) and \
            (bb['y'] < (IMAGE_DIM[1]-IMAGE_DIM[1]*EXCLUDE_MARGIN)):
                new_bb = {'x': bb['x'], 'y': bb['y'], 'width':bb['width'], 'height':bb['height']}
                new_bbs.append(new_bb)

    # Average drift of the matched boxes. The delta lists start with a single 0
    # (to avoid dividing by zero), so it is left out of the count when matches exist.
    n_matches = max(len(delta_xs) - 1, 1)
    delta_x_avg = sum(delta_xs)/n_matches
    delta_y_avg = sum(delta_ys)/n_matches

    # The deltas are measured as (next - previous), so projecting a box from the
    # next frame back onto the previous frame means subtracting the drift.
    for b in new_bbs:
        b['x'] = b['x'] - delta_x_avg
        b['y'] = b['y'] - delta_y_avg

    return new_bbs
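The drift adjustment can be checked on a small made-up case. The sketch below assumes the matched boxes are already paired up, measures the translation as next minus previous, and therefore subtracts it to place a box from the next frame onto the previous one. `mean_shift` and `project_back` are hypothetical helper names, not functions used elsewhere in this notebook.

```python
def mean_shift(prev_boxes, next_boxes):
    """Average (dx, dy) between already-paired boxes, measured as next - previous."""
    if not prev_boxes:
        return 0.0, 0.0
    n = len(prev_boxes)
    dx = sum(nb['x'] - pb['x'] for pb, nb in zip(prev_boxes, next_boxes)) / n
    dy = sum(nb['y'] - pb['y'] for pb, nb in zip(prev_boxes, next_boxes)) / n
    return dx, dy


def project_back(box, dx, dy):
    """Move a box seen in the next frame to its estimated spot in the previous frame."""
    return {**box, 'x': box['x'] - dx, 'y': box['y'] - dy}


# Two starfish visible in both frames (made-up coordinates).
prev_boxes = [{'x': 100, 'y': 200}, {'x': 300, 'y': 400}]
next_boxes = [{'x': 110, 'y': 204}, {'x': 314, 'y': 408}]
dx, dy = mean_shift(prev_boxes, next_boxes)

# A starfish annotated only in the next frame.
new_box = {'x': 500, 'y': 300, 'width': 40, 'height': 30}
print((dx, dy), project_back(new_box, dx, dy))
# (12.0, 6.0) {'x': 488.0, 'y': 294.0, 'width': 40, 'height': 30}
```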

In [None]:
# create new column to store only bounding boxes that are new on the next frame.
df_first_frames['new_annotations'] = df_first_frames.apply(lambda x: 
                                                            new_bboxes(x['annotations'],
                                                                      x['annotations_n1']),
                                                          axis=1)

In [None]:
#Visualize new bounding boxes
def viz_new_boxes(prev_path, next_path, prev_annots, next_annots, new_annots):  
    """Draws actual/original bboxes(red) and potential new annotations(yellow)

    Parameters
    ----------
    prev_path: String [image]
            Path of the previous frame
    next_path: String [image]
            Path of the next frame
    prev_annots: List of dicts eg.([{'x': 559, 'y': 213, 'width': 50, 'height': 32}])
            Annotations from the previous frame
    next_annots: List of dicts eg.([{'x': 559, 'y': 213, 'width': 50, 'height': 32}])
            Annotations from the next frame
    new_annots: List of dicts eg.([{'x': 559, 'y': 213, 'width': 50, 'height': 32}])
            New annotations generated
    
    """
    #previuos frame
    print(prev_path)
    img = Image.open(prev_path)
    
    #draw red box for the previous annotations
    for box in prev_annots:
        shape = [box['x'], box['y'], box['x']+box['width'], box['y']+box['height']]
        ImageDraw.Draw(img).rectangle(shape, outline ="red", width=3)

    #draw yellow box for the new annotations
    for box in new_annots:
        shape = [box['x'], box['y'], box['x']+box['width'], box['y']+box['height']]
        ImageDraw.Draw(img).rectangle(shape, outline ="yellow", width=3)
    # Display the image
    display(img)    
    
    #next frame
    print(next_path)
    img = Image.open(next_path)
    #On the next frame draw next annotations as red boxes
    for box in next_annots:
        shape = [box['x'], box['y'], box['x']+box['width'], box['y']+box['height']]
        ImageDraw.Draw(img).rectangle(shape, outline ="red", width=3)
    # Display the image
    display(img)

In [None]:
#Randomly sample 10 candidate frames and visualize each one together with its next frame
for index, row in df_first_frames.sample(10, random_state=12).iterrows():
    viz_new_boxes(row.img_path,
                  row.img_path_n1,
                  row.annotations,
                  row.annotations_n1,
                  row.new_annotations)

Every box marked in yellow (new) on a previous frame shows up in red (original) on the next frame, which means the starfish is already annotated there. This indicates that we have successfully shifted the new bounding boxes back to the previous frame.

In [None]:
#Get new annotation column to merge with original dataset
df_first_frames_strip = df_first_frames[['new_annotations']]
df_first_frames_strip.head(2)

In [None]:
#Join new annotations with original dataset
df_train_new = df_train.join(df_first_frames_strip)
df_train_new.head(2)

In [None]:
#Replace NaN with empty list [] as in the original dataset
#(assigning the whole column avoids chained indexing, which may silently fail to write)
df_train_new['new_annotations'] = df_train_new['new_annotations'].apply(lambda x: x if isinstance(x, list) else [])

In [None]:
#Merge original annotations with new annotations.
df_train_new['merge_annotations'] = df_train_new.apply(lambda x: (x['annotations'] + x['new_annotations']),axis=1)

In [None]:
df_train_new.head(2)

In [None]:
#Get total number of original annotations
(df_train_new['annotations'].apply(lambda x:len(x))).sum()

In [None]:
#Get total number of new annotations
(df_train_new['new_annotations'].apply(lambda x:len(x))).sum()

In [None]:
#Get total number of annotations combined
(df_train_new['merge_annotations'].apply(lambda x:len(x))).sum()

In [None]:
#Number of annotations per row after merging (original + new)
df_train_new['New_number_bbox'] = df_train_new['merge_annotations'].apply(lambda x:len(x))

In [None]:
# Short report of the entire process
prev_box_count = df_train_new['Number_bbox'].sum()
curr_box_count = df_train_new['New_number_bbox'].sum()
prev_frames_with_box_count = df_train_new[df_train_new.Number_bbox >0]['video_id'].count()
curr_frames_with_box_count = df_train_new[df_train_new.New_number_bbox >0]['video_id'].count()
# Percentage of frames with at least one original annotation
# (computed directly instead of indexing value_counts() by position)
pct_with_bbox = (df_train.Number_bbox>0).mean()*100
print("Previous number of bounding boxes: ", prev_box_count)
print("New number of boxes: ", curr_box_count)
print("Number of boxes increase: ", curr_box_count-prev_box_count)
print("Previous number of frames with boxes: ", prev_frames_with_box_count)
print("New number of frames with boxes: ", curr_frames_with_box_count)
print("Number of frames with boxes increase: ", curr_frames_with_box_count-prev_frames_with_box_count)
print("Number of images with and without starfish:\n", f"With BBox: {pct_with_bbox:0.2f}% | No BBox: {100-pct_with_bbox:0.2f}%")

The number of annotations has increased, and there is an opportunity to gain even more
annotations by shifting boxes back over more than one frame. [MACITATA](https://www.kaggle.com/bartmaciszewski) (on Kaggle) also suggests "Using 'optical flow' techniques to determine where the boxes may have come from better, fit tighter boxes, and project boxes into margins if possible". For now, we stop here.

In [None]:
#Replace original annotations column with merge_annotations
df_train['annotations'] = df_train_new['merge_annotations']

# 🔨 Helper

## Bounding box formatting helpers

The following image illustrates the positions of the different points on the bounding box that relate to the different bounding box formats.
<div style="text-align: center;">
    <img src="https://i.ibb.co/LQL8C7j/bounding-boxes.png" style="display: block; 
           margin-left: auto;
           margin-right: auto;
           width: 40%;"/>
</div>

In [None]:
def voc2yolo(bboxes, image_height=720, image_width=1280):
    """Converts bounding boxes from 'voc' (Visual Object Classes) formatting to 'yolo' (You Only Look Once) formatting.
    The voc formatting for bounding boxes is [xmin, ymin, xmax, ymax], where (xmin,ymin) and (xmax,ymax) are 
    the pixel coordinates of the top-left and bottom-right corners of a bounding box, respectively. 
    The yolo formatting is [xmid, ymid, w, h] where (xmid, ymid) is the position of the center of the bounding box,
    in coordinates normalised over the image size. w and h represent the normalised width and height of the bounding box. 
    The default image size of the images in this training data is 1280x720 pixels.
    
    Parameters
    ----------
    bboxes: array_like, shape = [n,4]
        Bounding boxes in voc format [xmin, ymin, xmax, ymax]
    image_height: int, default 720
        Height of images (Pixel height)
    image_width: int, default 1280
        Width of images (Pixel width)
        
    Returns
    ----------
    bboxes: array, shape = [n,4]
        Bounding boxes in yolo format [xmid, ymid, w, h]
    """
    bboxes = bboxes.copy().astype(float) 
    
    bboxes[..., [0, 2]] = bboxes[..., [0, 2]]/ image_width
    bboxes[..., [1, 3]] = bboxes[..., [1, 3]]/ image_height
    
    w = bboxes[..., 2] - bboxes[..., 0] 
    h = bboxes[..., 3] - bboxes[..., 1]
    
    bboxes[..., 0] = bboxes[..., 0] + w/2
    bboxes[..., 1] = bboxes[..., 1] + h/2
    bboxes[..., 2] = w
    bboxes[..., 3] = h
    
    return bboxes

def yolo2coco(bboxes, image_height=720, image_width=1280):
    """Converts bounding boxes from 'yolo' (You Only Look Once) formatting to 'coco' (Common Objects in Context) formatting.
    The yolo formatting is [xmid, ymid, w, h] where (xmid, ymid) is the position of the center of the bounding box,
    in coordinates normalised over the image size. w and h represent the normalised width and height of the bounding box. 
    The coco formatting for bounding boxes is [xmin, ymin, w, h], where (xmin,ymin) is the pixel coordinate of 
    the top-left corner of a bounding box. w and h represent the width and height of the bounding box in number of pixels.
    The default image size of the images in this training data is 1280x720 pixels.
    
    Parameters
    ----------
    bboxes: array, shape = [n,4]
        Bounding boxes in yolo format [xmid, ymid, w, h]
    image_height: int, default=720
        Height of images (Pixel height)
    image_width: int, default=1280
        Width of images (Pixel width)
        
    Returns
    ----------
    bboxes: array, shape = [n,4]
        Bounding boxes in coco format [xmin, ymin, w, h] 
    """
    bboxes = bboxes.copy().astype(float)
    
    bboxes[..., [0, 2]]= bboxes[..., [0, 2]]* image_width
    bboxes[..., [1, 3]]= bboxes[..., [1, 3]]* image_height
    
    # conversion (xmid, ymid) => (xmin, ymin) 
    bboxes[..., [0, 1]] = bboxes[..., [0, 1]] - bboxes[..., [2, 3]]/2
    
    return bboxes


def voc2coco(bboxes, image_height=720, image_width=1280):
    """Converts bounding boxes from 'voc' (Visual Object Classes) formatting to 'coco' (Common Objects in Context) formatting.
    The voc formatting for bounding boxes is [xmin, ymin, xmax, ymax], where (xmin,ymin) and (xmax,ymax) are 
    the pixel coordinates of the top-left and bottom-right corners of a bounding box, respectively. 
    The coco formatting for bounding boxes is [xmin, ymin, w, h], where (xmin,ymin) is the pixel coordinate of 
    the top-left corner of a bounding box. w and h represent the width and height of the bounding box in number of pixels. 
    The default image size of the images in this training data is 1280x720 pixels.
    
    Parameters
    ----------
    bboxes: array, shape = [n,4]
        Bounding boxes in voc format [xmin, ymin, xmax, ymax]
    image_height: int, default 720
        Height of images (Pixel height)
    image_width: int, default 1280
        Width of images (Pixel width)
        
    Returns
    ----------
    bboxes: array, shape = [n,4]
        Bounding boxes in coco format [xmin, ymin, w, h]
    """
    bboxes  = voc2yolo(bboxes, image_height, image_width)
    bboxes  = yolo2coco(bboxes, image_height, image_width)
    return bboxes
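As a quick sanity check of the format arithmetic, the sketch below converts one made-up box from voc to yolo and then to coco by hand, using the 1280x720 image size of this dataset. It redoes the formulas inline rather than calling the helpers, so that it runs on its own; the variable names are illustrative.

```python
import numpy as np

W, H = 1280, 720  # image width and height used in this dataset

# One box in voc format [xmin, ymin, xmax, ymax] (made-up numbers).
voc = np.array([[320.0, 180.0, 640.0, 360.0]])

# voc -> yolo: normalised centre and normalised box size.
w = (voc[:, 2] - voc[:, 0]) / W
h = (voc[:, 3] - voc[:, 1]) / H
xmid = voc[:, 0] / W + w / 2
ymid = voc[:, 1] / H + h / 2
yolo = np.stack([xmid, ymid, w, h], axis=1)

# yolo -> coco: back to pixels, top-left corner plus box size.
coco = np.stack([(xmid - w / 2) * W, (ymid - h / 2) * H, w * W, h * H], axis=1)

print(yolo.tolist())  # [[0.375, 0.375, 0.25, 0.25]]
print(coco.tolist())  # [[320.0, 180.0, 320.0, 180.0]]
```

The coco box keeps the voc top-left corner (320, 180) and replaces the bottom-right corner with the box size, which is what one would expect from `voc2coco`.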



## Plotting helpers
The following functions can be used to plot the bounding boxes overlaying them on the original image.

In [None]:
#Randomly select a color for bounding box plots
np.random.seed(32)
colors = [(np.random.randint(255), np.random.randint(255), np.random.randint(255)) for idx in range(1)]


def plot_one_box(x, img, color=None, label=None, line_thickness=None):
    """ Plots one bounding box on image img using cv2.rectangle. The image is modified in place.
    The color, label and line thickness of the rectangular box can also be customised.
    
    Parameters
    ----------
    x: list, shape = [4]
        Bounding box in voc format [xmin, ymin, xmax, ymax]
    img: ndarray
        Image array, for example, cv2.imread(image_path)
    color: list, shape = [3], default=None
        Color of rectangular bounding box (a random color is used if None)
    label: string, default=None
        Class label of bounding box
    line_thickness: int, default=None
        Line thickness of bounding box (derived from the image size if None)
    """
    
    tl = line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1  # line/font thickness
    color = color or [random.randint(0, 255) for _ in range(3)]
    c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
    cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
    if label:
        tf = max(tl - 1, 1)  
        t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
        c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
        cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA) 
        cv2.putText(img, label, (c1[0], c1[1] - 2), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA)
        

def draw_bboxes(img, bboxes, classes, class_ids, colors = None, show_classes = None, bbox_format = 'yolo', class_name = False, line_thickness = 2):  
    """ Plots bounding boxes on image img.
    Bounding box input format should be correctly specified. The color, label and line thickness of boxes can also be customised.
    
    Parameters
    ----------
    img: ndarray
        Image array, for example, cv2.imread(image_path)
    bboxes: array, shape=[n,4]
        Bounding boxes
    classes: list of strings, shape=[n]
        Name of the class associated with each bounding box
    class_ids: list of ints, shape=[n]
        Id of class associated with each bounding box
    colors: tuple or list of tuples, default=None
        Color of the bounding boxes, either one color or a list indexed by class id
    show_classes: list of strings, default=None
        Names of the classes to draw (all classes if None)
    bbox_format: string, default='yolo'
        Format of bounding boxes, options are 'yolo', 'coco' and 'voc_pascal'
    class_name: boolean, default=False
        Choose whether to show the class name (True) or the class id (False).
    line_thickness: int, default=2
        Line thickness of bounding box
        
    Return
    ----------
    image: ndarray
        Copy of the image overlaid with labeled bounding boxes.
    """
    
    image = img.copy()
    show_classes = classes if show_classes is None else show_classes
    colors = (0, 255 ,0) if colors is None else colors
    
    if bbox_format == 'yolo':
        
        for idx in range(len(bboxes)):  
            
            bbox  = bboxes[idx]
            cls   = classes[idx]
            cls_id = class_ids[idx]
            color = colors[cls_id] if type(colors) is list else colors
            
            if cls in show_classes:
            
                x1 = round(float(bbox[0])*image.shape[1])
                y1 = round(float(bbox[1])*image.shape[0])
                w  = round(float(bbox[2])*image.shape[1]/2) #w/2 
                h  = round(float(bbox[3])*image.shape[0]/2)

                voc_bbox = (x1-w, y1-h, x1+w, y1+h)
                plot_one_box(voc_bbox, 
                             image,
                             color = color,
                             label = cls if class_name else str(cls_id),
                             line_thickness = line_thickness)
            
    elif bbox_format == 'coco':
        
        for idx in range(len(bboxes)):  
            
            bbox  = bboxes[idx]
            cls   = classes[idx]
            cls_id = class_ids[idx]
            color = colors[cls_id] if type(colors) is list else colors
            
            if cls in show_classes:            
                x1 = int(round(bbox[0]))
                y1 = int(round(bbox[1]))
                w  = int(round(bbox[2]))
                h  = int(round(bbox[3]))

                voc_bbox = (x1, y1, x1+w, y1+h)
                plot_one_box(voc_bbox, 
                             image,
                             color = color,
                             label = cls if class_name else str(cls_id),
                             line_thickness = line_thickness)

    elif bbox_format == 'voc_pascal':
        
        for idx in range(len(bboxes)):  
            
            bbox  = bboxes[idx]
            cls   = classes[idx]
            cls_id = class_ids[idx]
            color = colors[cls_id] if type(colors) is list else colors
            
            if cls in show_classes: 
                x1 = int(round(bbox[0]))
                y1 = int(round(bbox[1]))
                x2 = int(round(bbox[2]))
                y2 = int(round(bbox[3]))
                voc_bbox = (x1, y1, x2, y2)
                plot_one_box(voc_bbox, 
                             image,
                             color = color,
                             label = cls if class_name else str(cls_id),
                             line_thickness = line_thickness)
    else:
        raise ValueError('wrong bbox format')

    return image


def show_img(img, bboxes, bbox_format='yolo'):
    """Show image with overlayed plots of bounding boxes of all starfish detected.
    
    Parameters
    ----------
    img: ndarray
        Original image of reef as an array.
    bboxes: array, shape=[n,4]
        Bounding boxes of detected starfish.
    bbox_format: string, default='yolo'
        Format of bounding boxes, options are 'yolo', 'coco' and 'voc_pascal'
        
    Returns
    ----------
    Image: PIL
        Image overlayed with bounding boxes
    """
    names  = ['starfish']*len(bboxes)
    labels = [0]*len(bboxes)
    img    = draw_bboxes(img = img,
                           bboxes = bboxes, 
                           classes = names,
                           class_ids = labels,
                           class_name = True, 
                           colors = colors, 
                           bbox_format = bbox_format,
                           line_thickness = 2)
    return Image.fromarray(img).resize((800, 400))

## Tracker helpers
The following function is a tracking helper that converts the objects detected by the yolov5 model into detections compatible with the norfair library. The norfair library assists with identifying starfish based on detections from previous image frames.

In [None]:
# Import norfair library
from norfair import Detection, Tracker

def detection2norfair(detects, frame_id):
    """Convert prediction in the form of [xmin, ymin, xmax, ymax, score] to norfair.Detection class.
    
    Parameters
    ----------
    detects: list, shape=[n,5]
        Detections made in the format [xmin, ymin, xmax, ymax, score]
    frame_id: int
        Image frame id
        
    Returns
    ----------
    result: list, length=n
        List of norfair.Detection objects, one per detection.
    """
    result = []
    for xmin, ymin, xmax, ymax, score in detects:
        xc, yc = (xmin + xmax) / 2, (ymin + ymax) / 2
        w, h = xmax - xmin, ymax - ymin
        result.append(Detection(points=np.array([xc, yc]), scores=np.array([score]), data=np.array([w, h, frame_id])))
        
    return result
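
As a rough illustration of the conversion above, the following standalone sketch (plain Python, no norfair dependency) turns a corner-format box into the centre point and size that `detection2norfair` stores, and then back into the `[xmin, ymin, width, height]` layout used later for the submission. The names `xyxy_to_center_size` and `center_size_to_coco` are hypothetical helpers introduced only for this example.

```python
def xyxy_to_center_size(xmin, ymin, xmax, ymax):
    """Return (xc, yc, w, h) for a box given by its corners."""
    xc, yc = (xmin + xmax) / 2, (ymin + ymax) / 2
    w, h = xmax - xmin, ymax - ymin
    return xc, yc, w, h


def center_size_to_coco(xc, yc, w, h):
    """Return [xmin, ymin, w, h] with the corner rounded to the nearest pixel."""
    return [int(round(xc - w / 2)), int(round(yc - h / 2)), w, h]


xc, yc, w, h = xyxy_to_center_size(100, 50, 160, 90)
print(xc, yc, w, h)                       # 130.0 70.0 60 40
print(center_size_to_coco(xc, yc, w, h))  # [100, 50, 60, 40]
```

The round trip recovers the original corner, which is what allows the tracker's centre estimate to be written back out as a box.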



The helper function `euclidean_distance` is passed to the Tracker and computes the distance between detected objects in order to match detections in subsequent frames.

In [None]:
def euclidean_distance(detection, tracked_object):
    """Compute the Euclidean distance between a detection and a tracked object.
    Function used to match detections in this frame with tracked objects from previous frames.
    
    Parameters
    ----------
    detection: norfair.Detection
        Object detected in the current frame.
    tracked_object: norfair.TrackedObject
        Object being tracked.
        
    Returns
    ----------
    n: float
        Euclidean distance between the detected object and the estimated position of the tracked object.
    """
    n = np.linalg.norm(detection.points - tracked_object.estimate)
    return n
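
norfair performs the actual matching internally, so the sketch below is only a simplified, greedy illustration of how a distance function and a threshold (30 pixels in this notebook's tracker configuration) can pair detections with tracked positions. `match_by_distance` is a hypothetical helper written for this example and is not part of norfair.

```python
import math


def match_by_distance(detections, tracked, threshold=30.0):
    """Greedily pair detection centres with tracked centres.

    detections, tracked: lists of (x, y) centre points.
    Returns a list of (detection_index, tracked_index) pairs whose
    Euclidean distance is below `threshold`.
    """
    # All candidate pairs, closest first
    candidates = sorted(
        (math.dist(d, t), i, j)
        for i, d in enumerate(detections)
        for j, t in enumerate(tracked)
    )
    pairs, used_d, used_t = [], set(), set()
    for dist, i, j in candidates:
        if dist >= threshold:
            break  # every remaining candidate is at least this far apart
        if i in used_d or j in used_t:
            continue  # each detection and each track is matched at most once
        pairs.append((i, j))
        used_d.add(i)
        used_t.add(j)
    return pairs


detections = [(100.0, 100.0), (400.0, 300.0)]                # made-up centres
tracked = [(405.0, 298.0), (110.0, 95.0), (900.0, 900.0)]
print(match_by_distance(detections, tracked))
```

The third tracked position has no detection within the threshold and stays unmatched. Tracked objects like this are the ones whose position is later estimated rather than detected.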

## Yolov5 pipeline
Load the yolov5 weights and configuration and create a prediction pipeline, including tracking of starfish over consecutive image frames. 

In [None]:
# Make configuration directory
!mkdir -p /root/.config/Ultralytics
# Copy yolo font to configuration directory
!cp /kaggle/input/yolov5-font/Arial.ttf /root/.config/Ultralytics/

def load_model(checkpoint_path, conf=0.28, iou=0.40):
    """Load yolov5 model
    
    Parameters
    ----------
    checkpoint_path: string
        Path to weights stored in pt file
    conf: float, default=0.28
        Confidence threshold
    iou: float, default=0.40
        IoU (intersection over union) threshold
        
    Returns
    ----------
    model: 
        The output of the callable model.
    """
    
    model = torch.hub.load('/kaggle/input/yolov5-lib-ds',
                           'custom',
                           path=checkpoint_path,
                           source='local',
                           force_reload=True)  
    model.conf = conf  
    model.iou  = iou 
    model.classes = None   
    model.multi_label = False 
    model.max_det = 20
    
    return model

def predict(model, img, size=9000, augment=False):
    """Perform object detection on image img using model.
    Predict the position of objects in an image and return the bounding boxes
    and corresponding confidences.
    
    Parameters
    ----------
    model: object
        Callable yolov5 model. 
    img: array, shape=[height, width, 3]
        Image on which object detection will be performed
    size: int, default=9000
        Image size used for inference
    augment: bool, default=False
        Whether to apply test-time augmentation
    
    Returns
    ----------
    bboxes: list, shape=[n,4]
        Bounding boxes of predicted objects in 'coco' format
    confs: list, shape=[n]
        Confidence in prediction
    """
    
    height, width = img.shape[:2]
    results = model(img, size=size, augment=augment)
    preds   = results.pandas().xyxy[0]
    bboxes  = preds[['xmin','ymin','xmax','ymax']].values
    if len(bboxes):
        bboxes  = voc2coco(bboxes,height,width).astype(int)
        confs   = preds.confidence.values
        return bboxes, confs
    else:
        return [],[]
    
def format_prediction(bboxes, confs):
    """Format predictions of bounding boxes, in 'coco' format, and confidences/scores according to
    the Kaggle competition submission format.
    
    Parameters
    ----------
    bboxes: list, shape=[n,4]
        Bounding boxes of predicted objects in 'coco' format.
    confs: list, shape=[n]
        Confidence in prediction
    
    Returns
    ----------
    annot: string
        Annotations in Kaggle competition submission format, 'confidence'+' '+'xmin'+' '+'ymin'+' '+'width'+' '+'height'.
    """
    annot = ''
    if len(bboxes)>0:
        for idx in range(len(bboxes)):
            xmin, ymin, w, h = bboxes[idx]
            conf             = confs[idx]
            annot += f'{conf} {xmin} {ymin} {w} {h}'
            annot +=' '
        annot = annot.strip(' ')
    return annot
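
To make the submission layout concrete, this small standalone sketch builds the annotation string for two made-up boxes. The values are illustrative only, and `to_annotation` is a hypothetical stand-in that mirrors the logic of `format_prediction`.

```python
def to_annotation(bboxes, confs):
    """Join 'conf xmin ymin width height' groups with single spaces."""
    parts = [
        f"{conf} {xmin} {ymin} {w} {h}"
        for (xmin, ymin, w, h), conf in zip(bboxes, confs)
    ]
    return " ".join(parts)


boxes = [(559, 213, 50, 32), (10, 20, 30, 40)]  # made-up 'coco' boxes
scores = [0.87, 0.5]
print(to_annotation(boxes, scores))  # 0.87 559 213 50 32 0.5 10 20 30 40
print(repr(to_annotation([], [])))   # ''
```

A frame without detections yields an empty string.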


In [None]:
def tracking_function(tracker, frame_id, bboxes, scores):
    """Use objects located in previous frames to identify objects in the current frame.
    Extends the list of model detections with predictions from the norfair tracker based on
    detections made in previous frames. Score and bounding box information is provided by the tracker. 
    
    Parameters
    ----------
    tracker: object
        norfair tracker
    frame_id: int
        Image frame id
    bboxes: list, shape=[n,4]
        Bounding boxes of predicted objects in 'coco' format.
    scores: list, shape=[n]
        Confidence in prediction.
    
    Returns
    ----------
    predictions: list of string
        Predictions made by the model, followed by those made by the norfair tracker,
        each with format 'score'+' '+'xmin'+' '+'ymin'+' '+'width'+' '+'height' (coco format).
    """
    detects = []
    predictions = []
    
    if len(scores)>0:
        for i in range(len(bboxes)):
            box = bboxes[i]
            score = scores[i]
            x_min = int(box[0])
            y_min = int(box[1])
            bbox_width = int(box[2])
            bbox_height = int(box[3])
            detects.append([x_min, y_min, x_min+bbox_width, y_min+bbox_height, score])
            predictions.append('{:.2f} {} {} {} {}'.format(score, x_min, y_min, bbox_width, bbox_height))

    tracked_objects = tracker.update(detections=detection2norfair(detects, frame_id))
    for tobj in tracked_objects:
        bbox_width, bbox_height, last_detected_frame_id = tobj.last_detection.data
        if last_detected_frame_id == frame_id: 
            continue
        
        xc, yc = tobj.estimate[0]
        x_min, y_min = int(round(xc - bbox_width / 2)), int(round(yc - bbox_height / 2))
        score = tobj.last_detection.scores[0]

        predictions.append('{:.2f} {} {} {} {}'.format(score, x_min, y_min, bbox_width, bbox_height))
        
    return predictions
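
The second half of `tracking_function` can be pictured with plain dictionaries in place of norfair objects: any tracked object that was not detected in the current frame still contributes a prediction, built from its estimated centre and the size and score of its last detection. The dictionary keys and the name `coasting_predictions` are assumptions made for this sketch only.

```python
def coasting_predictions(tracked_objects, frame_id):
    """Emit predictions for tracked objects that were not detected in `frame_id`.

    Each tracked object is a dict with hypothetical keys:
    'estimate' (xc, yc), 'size' (w, h), 'last_frame', 'score'.
    """
    predictions = []
    for obj in tracked_objects:
        if obj["last_frame"] == frame_id:
            continue  # already reported by the detector in this frame
        xc, yc = obj["estimate"]
        w, h = obj["size"]
        x_min = int(round(xc - w / 2))
        y_min = int(round(yc - h / 2))
        predictions.append(
            "{:.2f} {} {} {} {}".format(obj["score"], x_min, y_min, w, h)
        )
    return predictions


tracked = [  # made-up tracker state
    {"estimate": (130.0, 70.0), "size": (60, 40), "last_frame": 7, "score": 0.81},
    {"estimate": (301.0, 200.0), "size": (20, 10), "last_frame": 6, "score": 0.4},
]
print(coasting_predictions(tracked, frame_id=7))  # ['0.40 291 195 20 10']
```

Only the second object, last seen in frame 6, is carried forward into frame 7.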

# 🔭 Inference on **Train**

In the following section the model is tested on a sample of five consecutive frames, starting at a randomly chosen position in the competition training data. The predictions are plotted next to the ground-truth annotations.
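
One way to picture the sampling step, shown here on a plain list of per-frame box counts rather than the competition dataframe: draw random start indices until every frame in the five-frame window has at least one box, with a cap on attempts so the loop cannot run forever. `pick_window`, `counts` and `max_tries` are hypothetical names for this sketch. Its acceptance rule is also stricter than the notebook's, which only compares the window's total box count with the window length.

```python
import random


def pick_window(counts, size=5, max_tries=1000, seed=None):
    """Return a start index such that counts[start:start+size] are all > 0.

    Returns None if no such window is found within `max_tries` draws.
    """
    rng = random.Random(seed)
    last_start = len(counts) - size
    if last_start < 0:
        return None  # fewer frames than the window size
    for _ in range(max_tries):
        start = rng.randint(0, last_start)  # randint is inclusive on both ends
        if all(c > 0 for c in counts[start:start + size]):
            return start
    return None


# Made-up boxes-per-frame; the only all-positive window starts at index 2
counts = [0, 0, 1, 2, 1, 1, 3, 0, 0, 1]
start = pick_window(counts, seed=0)
print(start, counts[start:start + 5])
```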

## Model parameters

In [None]:
# Path to weights
CKPT_PATH = '../input/yolov5s6/f2_sub2.pt'
# Image size
IMG_SIZE  = 6400
# Confidence threshold
CONF      = 0.30
# IoU threshold
IOU       = 0.50
# Whether to apply test-time augmentation during inference
AUGMENT   = False
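
For readers unfamiliar with the IoU threshold: in yolov5 it is used during non-maximum suppression, where a lower-scoring box is discarded if its overlap with a higher-scoring box exceeds the threshold. The overlap measure itself is easy to sketch in plain Python. `iou_xyxy` is a hypothetical helper for illustration, not the library's implementation.

```python
def iou_xyxy(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the overlapping rectangle (if any)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


box_a = (0, 0, 10, 10)
box_b = (5, 0, 15, 10)
print(iou_xyxy(box_a, box_b))             # 50 / 150 = 0.333...
print(iou_xyxy(box_a, (20, 20, 30, 30)))  # 0.0
```

With `IOU = 0.50`, the first pair of boxes (IoU of about 0.33) would both survive suppression.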

In [None]:
%matplotlib inline

# Initialize the norfair tracker
tracker = Tracker(
    distance_function=euclidean_distance, 
    distance_threshold=30,
    hit_inertia_min=3,
    hit_inertia_max=6,
    initialization_delay=1,
)

# Load yolov5 model
model = load_model(CKPT_PATH, conf=CONF, iou=IOU)

# Fetch copy of dataframe
df = df_train.copy()
df['image_path'] = '/kaggle/input/tensorflow-great-barrier-reef/train_images/video_'+df.video_id.astype(str)+'/'+df.video_frame.astype(str)+'.jpg'

# Select a random sample of 5 consecutive images containing starfish
df_sample = df.iloc[1:6, :]
while df_sample.Number_bbox.sum()<len(df_sample):
    nrows = range(df.shape[0])
    ix = random.randint(nrows.start, nrows.stop-5)
    df_sample = df.iloc[ix:ix+5, :]

# Create a list of the paths to the selected images
image_paths = df_sample.image_path.tolist()

# Predict starfish locations and plot side-by-side with actual starfish locations
frame_id = df_sample.iloc[0].video_frame
for idx, path in enumerate(image_paths):
    img = cv2.imread(path)[...,::-1]

    # Predict bounding boxes and confidences
    bboxes, confis = predict(model, img, size=IMG_SIZE, augment=AUGMENT)
    # Use norfair tracker to add predictions based on previous image frames
    predict_box = tracking_function(tracker, frame_id, bboxes, confis)
    
    # Split predict_box strings and convert string to int
    if len(predict_box)>0:
        box = [list(map(int,box.split(' ')[1:])) for box in predict_box]
    else:
        box = []
    
    # Plot results
    fig, axs = plt.subplots(1, 2, figsize=(15, 10))
    fig.tight_layout()
    bboxes_true = df_sample[(df_sample['image_path']==path)]['bboxes'].to_numpy()
    axs[0].imshow(show_img(img, bboxes_true[0], bbox_format='coco'))
    axs[0].set_title('True location of COTS')
    axs[1].imshow(show_img(img, box, bbox_format='coco'))
    axs[1].set_title('Predicted location of COTS')
    
    # Break if more than 5 images are sampled
    if idx>5:
        break
    frame_id += 1
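
The plotting loop above recovers integer boxes from the prediction strings by dropping the leading score. A slightly more defensive variant is sketched below. It also tolerates values printed as floats (for example `'12.0'`), which a bare `int(...)` would reject. `parse_prediction` is a hypothetical helper and the input strings are made up.

```python
def parse_prediction(pred):
    """Split 'score xmin ymin width height' into (score, [xmin, ymin, w, h])."""
    fields = pred.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {pred!r}")
    score = float(fields[0])
    # Go through float first so that '291.0' is accepted as well as '291'
    box = [int(round(float(v))) for v in fields[1:]]
    return score, box


print(parse_prediction("0.87 559 213 50 32"))      # (0.87, [559, 213, 50, 32])
print(parse_prediction("0.40 291.0 195.0 20 10"))  # (0.4, [291, 195, 20, 10])
```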

# 🏆 Submission

## Initialize `Env`

In [None]:
# Import competition library
import greatbarrierreef
# Initialize the environment
env = greatbarrierreef.make_env()
# Iterator which loops over the test set and sample submission
iter_test = env.iter_test()      

# Change to working directory on Kaggle
%cd ../working

## 🔭 Run Inference on **Test**

In [None]:
# Assume all frames are consecutive and add one to frame_id after each iteration
frame_id = 0
# Loop over iter_test iterator
for idx, (img, pred_df) in enumerate(tqdm(iter_test)):
    # Predict bounding boxes and confidences
    bboxes, confs  = predict(model, img, size=IMG_SIZE, augment=AUGMENT)
    # Use norfair tracker to add predictions based on previous image frames
    predictions = tracking_function(tracker, frame_id, bboxes, confs)
    
    # Join predictions strings in a single string
    prediction_str = ' '.join(predictions)
    # Add predictions to pred_df dataframe
    pred_df['annotations'] = prediction_str
    env.predict(pred_df)
    # Show the first three images
    if frame_id < 3:
        if len(predictions)>0:
            box = [list(map(int,box.split(' ')[1:])) for box in predictions]
        else:
            box = []
        display(show_img(img, box, bbox_format='coco'))
        
    frame_id += 1

# 🐠 Check Submission

In [None]:
sub_df = pd.read_csv('submission.csv')
sub_df.head()

# 👀 References

* [More Annotations!](https://www.kaggle.com/bartmaciszewski/more-annotations)
* [Only yolov5 + tracking](https://www.kaggle.com/kocha1/only-yolov5-tracking-lb-642)
* [Yolov5 infer](https://www.kaggle.com/freshair1996/leon-v5-infer-2-0)
* Zhang, Minghua, et al. "Lightweight Underwater Object Detection Based on YOLO v4 and Multi-Scale Attentional Feature Fusion." Remote Sensing 13.22 (2021): 4706.
