
# <center>[Tensorflow - Help Protect the Great Barrier Reef](https://www.kaggle.com/c/tensorflow-great-barrier-reef)</center>
> <center>Detecting crown-of-thorns starfish in underwater image data</center>

<center><img src="https://storage.googleapis.com/kaggle-competitions/kaggle/31703/logos/header.png?t=2021-10-29-00-30-04" ></center>

<center><h1>Report: Team ReefSave</h1></center>

<h1>Introduction</h1>
    
🐠 This notebook was submitted to the Kaggle competition "TensorFlow – Help Protect the Great Barrier Reef", which ran from November 22, 2021 to February 14, 2022.

🐠 The goal of the competition is to accurately identify starfish in real-time by building an object detection model trained on underwater videos of coral reefs.

🐠 This will help researchers identify species that are threatening Australia's Great Barrier Reef and take well-informed action to protect the reef for future generations.

🐠 Additional information covering Description, Evaluation, Timeline, Prizes and Code Requirements can be found at https://www.kaggle.com/c/tensorflow-great-barrier-reef/overview

🐠 This notebook is also submitted as a report for the first module project of the [Africa DSI program 2022](http://dsi-program.com/).

<h1>Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL)</h1>

The approach used for this project is YOLO, an algorithm that employs convolutional neural networks (CNN) to detect objects in real-time.

CNN is a type of Deep Learning (DL) algorithm most commonly used to analyze visual imagery.

*Source: [Introduction to Convolutional Neural Networks (CNN)](https://www.analyticsvidhya.com/blog/2021/05/convolutional-neural-networks-cnn/#:~:text=In%20deep%20learning%2C%20a%20convolutional,applied%20to%20analyze%20visual%20imagery.&text=It%20uses%20a%20special%20technique%20called%20Convolution/)*.

<div align="center"><img src="https://flatironschool.com/legacy-assets/images.ctfassets.net/hkpf2qd2vxgx/235ViW0mhGaFw3bjXUrUyG/35d7a4312bb78fc47a644877ac01c6ea/BlogGraphics-machinnelearning-dark-09__1_.png" width=700>

<center><h5>Relationship between Artificial Intelligence, Machine Learning and Deep Learning</h5></center>

## What is CNN?
​
🐠 A Convolutional Neural Network (ConvNet/CNN) works by assigning importance (learnable weights and biases) to various aspects/objects of an image with the ability to differentiate one from the other.
The pre-processing required in a ConvNet is much lower as compared to other classification algorithms. 

🐠 CNN is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organization of the Visual Cortex. Individual neurons respond to stimuli only in a restricted region of the visual field known as the Receptive Field. A collection of such fields overlap to cover the entire visual area.

🐠 A CNN stacks multiple layers of artificial neurons, which are mathematical functions that compute the weighted sum of their inputs and output an activation value. Each layer produces several activation maps that are passed on to the next layer.

🐠 The first layer usually extracts basic features such as horizontal or diagonal edges.
This output is passed on to the next layer which detects more complex features such as corners or combinational edges. As we move deeper into the network it can identify even more complex features such as objects, faces, etc. Based on the activation map of the final convolution layer, the classification layer outputs a set of confidence scores (values between 0 and 1) that specify how likely the image is to belong to a “class”.
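
To make the "first layer extracts edges" idea concrete, here is a minimal NumPy sketch of a single convolution filter responding to a vertical edge. The toy image and the hand-written kernel are assumptions for illustration only; in a real CNN the kernel weights are learned during training, and `conv2d_valid` is a hypothetical helper, not part of any library.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map.

    Like deep learning frameworks, this computes a cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the pixels under the kernel window
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 image: dark on the left, bright on the right (a vertical edge)
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hand-written vertical-edge kernel; a CNN would learn weights like these
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

response = conv2d_valid(image, kernel)
print(response)
```

The response is large where the window straddles the dark-to-bright transition and zero where the window covers a uniform region, which is the kind of activation map a first convolution layer passes on.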

<div align="center"><img src="https://miro.medium.com/max/1400/1*Xn14QMJ7pzusY68MW9m8pQ.png" width=400>
​
#  
<div align="center"><img src="https://miro.medium.com/max/700/1*qtinjiZct2w7Dr4XoFixnA.gif" width=400>
    
<center><h5>How CNN works</h5></center>

References: 

*[A Comprehensive Guide to Convolutional Neural Networks — the ELI5 way](https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53#:~:text=A%20Convolutional%20Neural%20Network%20(ConvNet,differentiate%20one%20from%20the%20other.)*

*[Introduction to Convolutional Neural Networks (CNN)](https://www.analyticsvidhya.com/blog/2021/05/convolutional-neural-networks-cnn/#:~:text=Convolutional%20neural%20networks%20are%20composed,and%20outputs%20an%20activation%20value.)*

*[Convolutional Neural Networks](https://www.sciencedirect.com/topics/engineering/convolutional-neural-networks)*

## YOLO

🐠 **YOLO**: The acronym stands for ‘You Only Look Once’, a reference to the fact that the algorithm requires only a single forward pass through a neural network to identify objects.

🐠 **What is it?** YOLO is a simple, fast algorithm that recognizes objects within an image in real time. It is made up of a single CNN, so one forward pass through the network is enough to locate and classify the objects.

🐠 **How does it work?**

1. The image is split into a grid of equally sized cells ("tiles").
2. Bounding boxes that identify each object are predicted. Each bbox has the format `[width, height, class, bx, by]`, where `[bx, by]` is the center of the object.
3. Intersection over Union (IoU): this measure is used to keep the box that fits the object best, one that neither leaves part of the object uncovered nor is too large for it. `IoU = 1` when the predicted and actual boxes are identical.
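
The IoU measure from step 3 can be sketched in a few lines of plain Python. This is a minimal illustration, assuming boxes are given as corner coordinates `(x_min, y_min, x_max, y_max)`; the function name `iou_xyxy` is hypothetical and not taken from YOLOv5.

```python
def iou_xyxy(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if the boxes overlap at all)
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes get an empty intersection
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

identical = iou_xyxy((0, 0, 10, 10), (0, 0, 10, 10))    # same box
shifted   = iou_xyxy((0, 0, 10, 10), (5, 0, 15, 10))    # half overlap
disjoint  = iou_xyxy((0, 0, 10, 10), (20, 20, 30, 30))  # no overlap
print(identical, shifted, disjoint)
```

Identical boxes give an IoU of 1, disjoint boxes give 0, and partial overlaps fall in between.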

<center><img src="https://www.section.io/engineering-education/introduction-to-yolo-algorithm-for-object-detection/bounding-box.png" width=600 height=300></center>


<center><img src="https://i.imgur.com/Ce1sfqj.png" width=600></center>

    
🐠 **Why YOLO?** 
    
YOLO has gained popularity in computer vision for the following reasons:
1. Speed: YOLO is typically faster than alternative detection algorithms because it needs only one forward pass over the image.

2. High accuracy: YOLO provides accurate results with few background errors.

3. Learning capabilities: The algorithm has excellent learning capabilities that enable it to learn the representations of objects and apply them in object detection.

A comparison of YOLOv5, ResNet and Faster R-CNN on the competition data with base parameters showed that YOLOv5 performs better in both speed and mean average precision (mAP). The results can be found [here](https://github.com/denniesbor/KAGGLE-PROTECT-THE-GREAT-BARRIER-REEF).

🐠 **Implementing YOLO?** 
    
The version used for this project is YOLOv5. This requires cloning the official repository and setting up the dependencies required to run YOLO v5. 

YOLOv5 repository: https://github.com/ultralytics/yolov5

The steps for implementing YOLOv5 are:

1. Set up the Code
2. Download the Data
3. Convert the Annotations into the YOLO v5 Format
 >YOLO v5 Annotation Format
 
 >Testing the annotations
 
 >Partition the Dataset
4. Training Options
 >Data Config File
 
 >Hyper-parameter Config File
 
 >Custom Network Architecture
 
 >Train the Model
5. Inference
 >Computing the mAP on test dataset
6. Conclusion... and a bit about the naming saga


🐠 **Data Format**
> Three main inputs are necessary for YOLOv5:
1. A set of training images.

2. Annotation files in `.txt` format. Each `.txt` file contains the annotations for the corresponding image file: the object class, the object center coordinates, and the box width and height.

3. YAML file containing model configuration and class values.


*References*

*[How to Train YOLO v5 on a Custom Dataset](https://blog.paperspace.com/train-yolov5-custom-data/)*

*[Deep Learning vs. Machine Learning — What’s the Difference?](https://flatironschool.com/blog/deep-learning-vs-machine-learning/)*

*[Introduction to YOLO Algorithm for Object Detection](https://www.section.io/engineering-education/introduction-to-yolo-algorithm-for-object-detection/)*

# Literature

The choice of a neural network depends on the available software and hardware resources, the required speed, and the expected accuracy. Object detection networks are classified as single-stage or multi-stage.

Examples of single-stage networks are SSD and YOLO. Multi-stage approaches add a region proposal step (for example a region proposal network) that proposes candidate object regions from the backbone's feature maps before classifying them. Examples of multi-stage networks are R-CNN and R-FCN.

<h3> <strong>Architecture of a neural network</strong></h3>

Object detection networks consist of an input, a backbone, a neck and a head. The input takes in an image and passes it to the backbone, a feature extractor made of stacked convolution and max-pooling layers. Residual Network (ResNet), ResNeXt, DenseNet and VGG16 are commonly used backbones. They are usually pre-trained on standard datasets such as [COCO](https://cocodataset.org/#home) or [ImageNet](https://image-net.org).
<br/>
The neck collects and combines feature maps from different stages of the backbone; the Feature Pyramid Network (FPN) is one example. The head makes the predictions: a dense prediction layer in a single-stage network, and a sparse prediction layer in a two-stage detector (e.g. R-CNN and R-FCN).
<br />
![](https://github.com/denniesbor/KAGGLE-PROTECT-THE-GREAT-BARRIER-REEF/blob/e0c3e4253d8fd6a867252cab1feb3bab3d80f377/object_detection_arch.png?raw=true)
<br />
[**Figure 1.** Schematic representation of a single and multi stage neural network. Source: [Ultralytics](https://arxiv.org/pdf/1611.10012.pdf) ]

<h3> <strong>The choice of a neural network.</strong></h3>

Computational resources determine the time spent on training and inference. GPU and TPU runtimes accelerate both. The demand for computational resources differs from one model to another.

Speed is key in real-time object detection systems and video search engines. Speed and resource requirements must be balanced to achieve optimal performance.

The minimum viable product (MVP) for module one was chosen by comparing the performance of Faster R-CNN (ResNet Inception backbone), YOLOv4 and YOLOv5 on the pre-processed "TensorFlow - Help Protect the Great Barrier Reef" dataset.

<h3><strong>YOLO (Single stage)</strong></h3>

YOLO is a single-stage, state-of-the-art object detection algorithm. There are four documented versions of YOLO; a fifth, [YOLOv5](https://github.com/ultralytics/yolov5), was developed by the Ultralytics team and is described as a YOLOv4 implementation in PyTorch.
Compared with other algorithms, YOLOv5 performs exceptionally well while using less GPU time.

According to [Bochkovskiy et al.](https://arxiv.org/pdf/2004.10934.pdf), YOLOv4 attains an average precision (AP) of 43.5% on the Common Objects in Context (COCO) dataset while running in real time on a Tesla V100 GPU. The neck of YOLOv4 uses SPP and PAN.
<br />
![yolo](https://github.com/denniesbor/KAGGLE-PROTECT-THE-GREAT-BARRIER-REEF/blob/assets/Yolov5_performance.png?raw=true)
<br/>
[**Figure 2.** Average precision vs GPU speed of *YOLOv5* models against *EfficientDet* on the [COCO](https://cocodataset.org/#home) dataset. Source: [Ultralytics](https://github.com/ultralytics/yolov5) ]

### What are Bag of Freebies and Bag of Specials?

They describe the trade-off between training cost and inference cost. The bag of freebies is the set of methods that change only the training strategy or increase only the training cost, so they do not slow down inference. They include data augmentation and regularization techniques such as dropout, drop-connect and drop-block.

The bag of specials is the set of methods that improve the accuracy of the model at the expense of a small increase in inference cost. They include attention mechanisms and modules that enlarge the receptive field; SPP is an example of the latter and is applied in YOLOv4.
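
Data augmentation, one of the training-only methods mentioned above, can be sketched as follows. This is a minimal NumPy illustration of a horizontal flip that also mirrors the boxes, assuming boxes are in normalized YOLO format `[x_center, y_center, width, height]`. The helper `hflip_with_boxes` is hypothetical and is not how YOLOv5 implements its augmentations.

```python
import numpy as np

def hflip_with_boxes(image, boxes_yolo):
    """Flip an image left-right and mirror its normalized YOLO boxes.

    image:      array of shape (H, W) or (H, W, C)
    boxes_yolo: iterable of [x_center, y_center, width, height] in 0..1
    """
    flipped = image[:, ::-1].copy()  # reverse the column (x) axis
    boxes = np.asarray(boxes_yolo, dtype=float).reshape(-1, 4).copy()
    # Only the x coordinate of the center changes; sizes and y stay the same
    boxes[:, 0] = 1.0 - boxes[:, 0]
    return flipped, boxes

image = np.arange(8).reshape(2, 4)           # tiny stand-in for a frame
boxes = [[0.25, 0.5, 0.1, 0.2]]              # one box left of center
flipped_image, flipped_boxes = hflip_with_boxes(image, boxes)
print(flipped_boxes)
```

Because the flip is applied only while training, it costs nothing at inference time, which is what makes it a "freebie".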

<h3> <strong>Faster R-CNN (Multi stage)</strong></h3>

An R-CNN model is a multi-layer convolutional neural network consisting of a feature extractor, a region proposal algorithm that generates candidate bounding boxes, and regression and classification layers. R-CNNs trade speed for accuracy.

In Faster R-CNN, proposals are generated by a Region Proposal Network that shares the convolutional features, so proposal generation is no longer CPU-bound as in the previous flavours of region-based convolutional neural networks.
<br />
![](https://github.com/denniesbor/KAGGLE-PROTECT-THE-GREAT-BARRIER-REEF/blob/assets/feature_extractor_acc.png?raw=True)
<br />
[**Figure 3.** Mean average precision against backbone accuracy of Faster R-CNN, R-FCN and SSD]
<h2> <strong>MVP - Performance comparison of YOLOv4, YOLOv5 and R-CNN</strong></h2>
The "TensorFlow - Help Protect the Great Barrier Reef" MVP is implemented using Faster R-CNN, YOLOv4 and YOLOv5 with their default tuning parameters. The three models are compared by their mean average precision. Faster R-CNN runs on a ResNet Inception backbone, whereas YOLOv4 is built on Darknet.





# Exploratory Data Analysis (EDA)

Import Libraries

In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import cv2
import os
import seaborn as sns
import ast
import time

Check Directory

In [6]:
!ls ../input/tensorflow-great-barrier-reef

Load Data to dataframe from csv

In [7]:

df = pd.read_csv('../input/tensorflow-great-barrier-reef/train.csv')
df.head()

**Explore data**
EDA from https://www.kaggle.com/kartik2khandelwal/data-analysis-and-prediction/notebook

Check shapes for column and rows, and data types of columns

In [8]:
df.describe()

In [9]:
df.info()

In [10]:
print(df.shape)

In [11]:
print(df.dtypes)

Check for null values
 - No null value found

In [12]:
null_count = df.isnull().sum()
print(null_count)

Count number of unique images
 - 23501 unique images

In [13]:
df.image_id.nunique()  # count distinct image ids, not just non-null rows

Number of distinct frame numbers across all videos
- 10688 distinct `video_frame` values

In [14]:
# nunique() counts distinct frame numbers over the whole dataset, not per video
print(df['video_frame'].nunique())

Maximum frame value in any of the videos

In [15]:
df.video_frame.max()

In [16]:
# Images live under train_images/video_{video_id}/{video_frame}.jpg (not under train.csv)
df['image.path'] = '../input/tensorflow-great-barrier-reef/train_images/video_' + df.video_id.astype(str) + '/' + df.video_frame.astype(str) + '.jpg'

In [17]:
df.head()

Check number of images in each video

In [18]:
list(df['video_id'].value_counts())

Plot of number of images in each video

In [19]:
plt.figure(figsize=(8,5))
sns.countplot(x=df['video_id'], color='#2196F3')  # pass the column as x explicitly

Check how many images have labels (bounding boxes) and whether all contain "[]"

In [20]:
with_annotation = len(df[df['annotations'] != '[]'])
without_annotation = len(df[df['annotations'] == '[]'])

Check images with annotations
Verify that all annotations are bounded by []

In [21]:
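# Keep only the rows that have at least one annotation
annotated = df[df['annotations'] != '[]']
# See the format of the annotations
print(annotated['annotations'])
S = annotated['annotations']

# Flag annotation strings that are wrapped in [...]
is_bracketed = S.str.match(r'^\[.*\]$')
print((~is_bracketed).sum())  # strings NOT wrapped in brackets (expected: 0)
print(is_bracketed.sum())     # strings wrapped in brackets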


Check if data is balanced
- Data is imbalanced

In [22]:

labels = ('Without Bounding Box', 'With Bounding Box')
y_pos = np.arange(len(labels))
count_annotations = [without_annotation, with_annotation]

plt.bar(y_pos, count_annotations, align='center', color='#2196F3', alpha=1.0)
plt.xticks(y_pos, labels)
plt.ylabel('Count')
plt.title('labels')

plt.show()

There can be more than one annotation in one frame

First test the format of annotations

In [23]:
print(len(df['annotations']))
print(len(df[df['annotations'].apply(lambda x:len(str(x))) > 2]))
print(len(df[df['annotations'].apply(lambda x:len(str(x))) > 50]))
print(df[df['annotations'].apply(lambda x:len(str(x))) > 500]['annotations'].iloc[0])


Count number of annotations per image

In [24]:
df['sum_annotations'] = df['annotations'].apply(lambda x: x.count('{'))
#print(df['annotations'][9292])
print(df[df['sum_annotations']>11])


Plot of number of annotations per image
- Range between 1 and 17
- Gives indication of outliers
- Over 50% of images with annotations have only 1 annotation

In [25]:
fig = px.bar(df['sum_annotations'].value_counts().drop(0), title='Count of Bounding Boxes', width=700)

fig.show()

# Model Training


🐠 Model training started on kaggle platform 

🐠 Optimized on AWS

🐠 The notebook for the model training is located at

🐠 Note that directories to data are different

🐠 The code for the model training is below

# 🛠 Install Libraries

In [26]:
!pip install -qU wandb
!pip install -qU bbox-utility # check https://github.com/awsaf49/bbox for source code

# 📚 Import Libraries

In [27]:
import numpy as np
from tqdm.notebook import tqdm
tqdm.pandas()
import pandas as pd
import os
import cv2
import matplotlib.pyplot as plt
import glob

import shutil
import sys
sys.path.append('../input/tensorflow-great-barrier-reef')

from joblib import Parallel, delayed

from IPython.display import display

# 📌 Key-Points
* Predictions have to be submitted through the provided **Python time-series API**, which makes this competition different from previous object detection competitions.
* Each prediction row needs to include all bounding boxes for the image. The submission format also appears to be **COCO**, that is `[x_min, y_min, width, height]`.
* The competition metric `F2` tolerates some false positives (FP) in order to ensure very few starfish are missed. This means that reducing **false negatives (FN)** is more important than reducing false positives (FP).
$$F2 = 5 \cdot \frac{precision \cdot recall}{4\cdot precision + recall}$$
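
The formula can be checked numerically with a short sketch. This is a minimal illustration, assuming we already have counts of true positives, false positives and false negatives; the function name `f_beta` is hypothetical, and how the competition matches predicted boxes to ground-truth boxes is not covered here.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from raw counts; beta=2 weights recall higher than precision.

    Algebraically equal to (1 + beta^2) * P * R / (beta^2 * P + R)
    with P = tp / (tp + fp) and R = tp / (tp + fn).
    """
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom > 0 else 0.0

# Same number of errors, distributed differently
more_false_positives = f_beta(tp=8, fp=4, fn=2)
more_false_negatives = f_beta(tp=8, fp=2, fn=4)
print(more_false_positives, more_false_negatives)
```

With the same total number of mistakes, the case with more false negatives scores lower, which is why missed starfish hurt more than spurious detections under this metric.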

# ⭐ WandB
<img src="https://camo.githubusercontent.com/dd842f7b0be57140e68b2ab9cb007992acd131c48284eaf6b1aca758bfea358b/68747470733a2f2f692e696d6775722e636f6d2f52557469567a482e706e67" width=600>

Weights & Biases (W&B) is an MLOps platform for tracking our experiments. We can use it to build better models faster with experiment tracking, dataset versioning, and model management. Some of the useful features of W&B:

* Track, compare, and visualize ML experiments
* Get live metrics, terminal logs, and system stats streamed to the centralized dashboard.
* Explain how your model works, show graphs of how model versions improved, discuss bugs, and demonstrate progress towards milestones.


In [28]:
import wandb
#%env WANDB_NOTEBOOK_NAME 'DSI_reef/Nmeso/Nmeso_great-barrier-reef-yolov5-train.ipynb'
# Do not hard-code the W&B API key here; it is read from Kaggle Secrets below.
try:
    from kaggle_secrets import UserSecretsClient
    user_secrets = UserSecretsClient()
    api_key = user_secrets.get_secret("WANDB")
    wandb.login(key=api_key)
    anonymous = None
except Exception:  # kaggle_secrets unavailable or no WANDB secret configured
    wandb.login(anonymous='must')
    print('To use your W&B account,\nGo to Add-ons -> Secrets and provide your W&B access token. Use the Label name as WANDB. \nGet your W&B access token from here: https://wandb.ai/authorize')

# 📖 Meta Data
* `train_images/` - Folder containing training set photos of the form `video_{video_id}/{video_frame}.jpg`.

* `[train/test].csv` - Metadata for the images. As with other test files, most of the test metadata is only available to your notebook upon submission. Only the first few rows are available for download.

* `video_id` - ID number of the video the image was part of. The video ids are not meaningfully ordered.
* `video_frame` - The frame number of the image within the video. Expect to see occasional gaps in the frame number from when the diver surfaced.
* `sequence` - ID of a gap-free subset of a given video. The sequence ids are not meaningfully ordered.
* `sequence_frame` - The frame number within a given sequence.
* `image_id` - ID code for the image, in the format `{video_id}-{video_frame}`
* `annotations` - The bounding boxes of any starfish detections in a string format that can be evaluated directly with Python. Does not use the same format as the predictions you will submit. Not available in test.csv. A bounding box is described by the pixel coordinate `(x_min, y_min)` of its upper-left corner within the image (the image origin is the top-left pixel) together with its `width` and `height` in pixels --> (COCO format).
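
As a small illustration of the `annotations` column, the sketch below parses one annotation string into COCO-style boxes. The example string and the key names `x`, `y`, `width`, `height` are assumptions about the file contents, and `parse_annotations` is a hypothetical helper rather than part of the competition API.

```python
import ast

def parse_annotations(raw):
    """Turn an annotation string into a list of [x_min, y_min, width, height] boxes."""
    # literal_eval only accepts Python literals, so it is safer than eval
    annots = ast.literal_eval(raw)
    return [[a['x'], a['y'], a['width'], a['height']] for a in annots]

# Hypothetical row with one starfish, and a row without annotations
one_box = parse_annotations("[{'x': 559, 'y': 213, 'width': 50, 'height': 32}]")
no_box = parse_annotations("[]")
print(one_box, no_box)
```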

In [29]:
FOLD      = 1 # which fold to train
DIM       = 700 
MODEL     = 'yolov5s6'
BATCH     = 4
EPOCHS    = 30
OPTMIZER  = 'Adam'

PROJECT   = 'great-barrier-reef-public' # w&b in yolov5
NAME      = f'{MODEL}-dim{DIM}-fold{FOLD}' # w&b for yolov5

REMOVE_NOBBOX = True # remove images with no bbox
WORKING = '/kaggle/working'
ROOT_DIR  = '../input/tensorflow-great-barrier-reef'
IMAGE_DIR = '/kaggle/working/images' # directory to save images
LABEL_DIR = '/kaggle/working/labels' # directory to save labels

## Create Directories

In [30]:
!mkdir -p {IMAGE_DIR}
!mkdir -p {LABEL_DIR}
!mkdir -p {WORKING}

In [31]:
!ls ../input/tensorflow-great-barrier-reef
#!ls {ROOT_DIR}

In [32]:
#!cp -pr ../input/tensorflow-great-barrier-reef /kaggle/working/
#!ls /kaggle/working/tensorflow-great-barrier-reef

## Get Paths

In [33]:
import ast

# Train Data
df = pd.read_csv(f'{ROOT_DIR}/train.csv')
df['old_image_path'] = f'{ROOT_DIR}/train_images/video_'+df.video_id.astype(str)+'/'+df.video_frame.astype(str)+'.jpg'
df['image_path']  = f'{IMAGE_DIR}/'+df.image_id+'.jpg'
df['label_path']  = f'{LABEL_DIR}/'+df.image_id+'.txt'
# Parse the annotation strings safely (literal_eval only accepts Python literals, unlike eval)
df['annotations'] = df['annotations'].progress_apply(ast.literal_eval)
display(df.head(2))

## Number of BBoxes
> Nearly 80% of the images have no bbox.

In [34]:
df['num_bbox'] = df['annotations'].progress_apply(lambda x: len(x))
data = (df.num_bbox>0).value_counts(normalize=True)*100
# The index of `data` is boolean, so look the shares up by True/False rather than 0/1
print(f"No BBox: {data[False]:0.2f}% | With BBox: {data[True]:0.2f}%")

# 🧹 Clean Data
* In this notebook, we use only **bboxed-images** (`~5k`). We could use all `~23K` images for training, but most of them don't have any labels. So it is easier to carry out experiments using only **bboxed images**.

In [35]:
if REMOVE_NOBBOX:
    df = df.query("num_bbox>0")

# ✏️ Write Images
* We need to copy the images to the current directory (`/kaggle/working`) because `/kaggle/input` has no **write access**, which **YOLOv5** needs.
* We can make this process faster using **Joblib** which uses **Parallel** computing.

In [36]:
def make_copy(row):
    shutil.copyfile(row.old_image_path, row.image_path)
    return

In [37]:
image_paths = df.old_image_path.tolist()
_ = Parallel(n_jobs=-1, backend='threading')(delayed(make_copy)(row) for _, row in tqdm(df.iterrows(), total=len(df)))

In [38]:
# check https://github.com/awsaf49/bbox for source code of following utility functions
from bbox.utils import coco2yolo, coco2voc, voc2yolo
from bbox.utils import draw_bboxes, load_image
from bbox.utils import clip_bbox, str2annot, annot2str

def get_bbox(annots):
    bboxes = [list(annot.values()) for annot in annots]
    return bboxes

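def get_imgsize(row):
    # `imagesize` is a third-party package (assumed to be installed); it is imported
    # here because it is missing from the imports above and this helper is not called below
    import imagesize
    row['width'], row['height'] = imagesize.get(row['image_path'])
    return row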

np.random.seed(32)
colors = [(np.random.randint(255), np.random.randint(255), np.random.randint(255))\
          for idx in range(1)]

## Create BBox

In [39]:
df['bboxes'] = df.annotations.progress_apply(get_bbox)
df.head(2)

## Get Image-Size
> All Images have same dimension, [Width, Height] =  `[1280, 720]`

In [40]:
df['width']  = 1280
df['height'] = 720
display(df.head(2))

# 🏷️ Create Labels
We need to export our labels to **YOLO** format, with one `*.txt` file per image (if an image contains no objects, no `*.txt` file is required). The `*.txt` file specifications are:

* One row per object.
* Each row is in `class x_center y_center width height` format.
* Box coordinates must be in **normalized** `xywh` format (from `0 - 1`). If your boxes are in pixels, divide `x_center` and `width` by `image width`, and `y_center` and `height` by `image height`.
* Class numbers are **zero-indexed** (start from `0`).

> The competition bbox format is **COCO**, hence `[x_min, y_min, width, height]`. So we need to convert from **COCO** to **YOLO** format.
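
The conversion itself is simple arithmetic, sketched below in plain Python. This is only an illustration of the formulas: the notebook relies on the `bbox-utility` helpers and also clips boxes to the image, which this sketch leaves out. The function names `coco_to_yolo` and `to_label_lines` and the example box are hypothetical.

```python
def coco_to_yolo(boxes_coco, image_width, image_height):
    """Convert [x_min, y_min, width, height] pixel boxes to normalized YOLO boxes."""
    boxes_yolo = []
    for x_min, y_min, width, height in boxes_coco:
        # YOLO stores the box center, not its top-left corner
        x_center = (x_min + width / 2) / image_width
        y_center = (y_min + height / 2) / image_height
        boxes_yolo.append([x_center, y_center,
                           width / image_width, height / image_height])
    return boxes_yolo

def to_label_lines(boxes_yolo, class_id=0):
    """Format boxes as the rows of a YOLO label file: class x_center y_center w h."""
    return [f"{class_id} " + " ".join(f"{v:.6f}" for v in box) for box in boxes_yolo]

# One hypothetical starfish box in a 1280x720 frame
boxes = coco_to_yolo([[559, 213, 50, 32]], image_width=1280, image_height=720)
lines = to_label_lines(boxes)
print("\n".join(lines))
```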


In [41]:
cnt = 0
all_bboxes = []
bboxes_info = []
for row_idx in tqdm(range(df.shape[0])):
    row = df.iloc[row_idx]
    image_height = row.height
    image_width  = row.width
    bboxes_coco  = np.array(row.bboxes).astype(np.float32).copy()
    num_bbox     = len(bboxes_coco)
    names        = ['cots']*num_bbox
    labels       = np.array([0]*num_bbox)[..., None].astype(str)
    ## Create Annotation(YOLO)
    with open(row.label_path, 'w') as f:
        if num_bbox<1:
            annot = ''
            f.write(annot)
            cnt+=1
            continue
        bboxes_voc  = coco2voc(bboxes_coco, image_height, image_width)
        bboxes_voc  = clip_bbox(bboxes_voc, image_height, image_width)
        bboxes_yolo = voc2yolo(bboxes_voc, image_height, image_width).astype(str)
        all_bboxes.extend(bboxes_yolo.astype(float))
        bboxes_info.extend([[row.image_id, row.video_id, row.sequence]]*len(bboxes_yolo))
        annots = np.concatenate([labels, bboxes_yolo], axis=1)
        string = annot2str(annots)
        f.write(string)
print('Missing:',cnt)

# 📁 Create Folds
> The number of samples is not the same in each fold, which can create large variance in **Cross-Validation**.

In [42]:
from sklearn.model_selection import GroupKFold
kf = GroupKFold(n_splits = 3)
df = df.reset_index(drop=True)
df['fold'] = -1
for fold, (train_idx, val_idx) in enumerate(kf.split(df, groups=df.video_id.tolist())):
    df.loc[val_idx, 'fold'] = fold
display(df.fold.value_counts())

# ⭕ BBox Distribution

In [43]:
bbox_df = pd.DataFrame(np.concatenate([bboxes_info, all_bboxes], axis=1),
             columns=['image_id','video_id','sequence',
                     'xmid','ymid','w','h'])
bbox_df[['xmid','ymid','w','h']] = bbox_df[['xmid','ymid','w','h']].astype(float)
bbox_df['area'] = bbox_df.w * bbox_df.h * 1280 * 720
bbox_df = bbox_df.merge(df[['image_id','fold']], on='image_id', how='left')
bbox_df.head(2)

## `x_center` Vs `y_center`

In [44]:
from scipy.stats import gaussian_kde

all_bboxes = np.array(all_bboxes)

x_val = all_bboxes[...,0]
y_val = all_bboxes[...,1]

# Calculate the point density
xy = np.vstack([x_val,y_val])
z = gaussian_kde(xy)(xy)

fig, ax = plt.subplots(figsize = (10, 10))
# ax.axis('off')
ax.scatter(x_val, y_val, c=z, s=100, cmap='viridis')
# ax.set_xlabel('x_mid')
# ax.set_ylabel('y_mid')
plt.show()

## `width` Vs `height`

In [45]:
x_val = all_bboxes[...,2]
y_val = all_bboxes[...,3]

# Calculate the point density
xy = np.vstack([x_val,y_val])
z = gaussian_kde(xy)(xy)

fig, ax = plt.subplots(figsize = (10, 10))
# ax.axis('off')
ax.scatter(x_val, y_val, c=z, s=100, cmap='viridis')
# ax.set_xlabel('bbox_width')
# ax.set_ylabel('bbox_height')
plt.show()

## Area

In [46]:
import matplotlib as mpl
import seaborn as sns

f, ax = plt.subplots(figsize=(12, 6))
sns.despine(f)

sns.histplot(
    bbox_df,
    x="area", hue="fold",
    multiple="stack",
    palette="viridis",
    edgecolor=".3",
    linewidth=.5,
    log_scale=True,
)
ax.xaxis.set_major_formatter(mpl.ticker.ScalarFormatter())
ax.set_xticks([500, 1000, 2000, 5000, 10000]);

# 🌈 Visualization

In [47]:
df2 = df[(df.num_bbox>0)].sample(100) # takes samples with bbox
y = 3; x = 2
plt.figure(figsize=(12.8*x, 7.2*y))
for idx in range(x*y):
    row = df2.iloc[idx]
    img           = load_image(row.image_path)
    image_height  = row.height
    image_width   = row.width
    with open(row.label_path) as f:
        annot = str2annot(f.read())
    bboxes_yolo = annot[...,1:]
    labels      = annot[..., 0].astype(int).tolist()
    names         = ['cots']*len(bboxes_yolo)
    plt.subplot(y, x, idx+1)
    plt.imshow(draw_bboxes(img = img,
                           bboxes = bboxes_yolo, 
                           classes = names,
                           class_ids = labels,
                           class_name = True, 
                           colors = colors, 
                           bbox_format = 'yolo',
                           line_thickness = 2))
    plt.axis('OFF')
plt.tight_layout()
plt.show()

# 🍚 Dataset

In [48]:
train_files = []
val_files   = []
train_df = df.query("fold!=@FOLD")
valid_df = df.query("fold==@FOLD")
train_files += list(train_df.image_path.unique())
val_files += list(valid_df.image_path.unique())
len(train_files), len(val_files)

# ⚙️ Configuration
The dataset config file requires
1. The dataset root directory path and relative paths to `train / val / test` image directories (or *.txt files with image paths)
2. The number of classes `nc` and 
3. A list of class `names`:`['cots']`

In [49]:
import yaml

cwd = '/kaggle/working/'

with open(os.path.join( cwd , 'train.txt'), 'w') as f:
    for path in train_df.image_path.tolist():
        f.write(path+'\n')
            
with open(os.path.join(cwd , 'val.txt'), 'w') as f:
    for path in valid_df.image_path.tolist():
        f.write(path+'\n')

data = dict(
    path  = '/kaggle/working/',
    train =  os.path.join( cwd , 'train.txt') ,
    val   =  os.path.join( cwd , 'val.txt' ),
    nc    = 1,
    names = ['cots'],
    )

with open(os.path.join( cwd , 'gbr.yaml'), 'w') as outfile:
    yaml.dump(data, outfile, default_flow_style=False)

# Read the file back to verify its contents; `with` closes the handle afterwards
with open(os.path.join( cwd , 'gbr.yaml'), 'r') as f:
    print('\nyaml:')
    print(f.read())

In [50]:
%%writefile /kaggle/working/hyp.yaml
lr0: 0.01  # initial learning rate (SGD=1E-2, Adam=1E-3)
lrf: 0.1  # final OneCycleLR learning rate (lr0 * lrf)
momentum: 0.937  # SGD momentum/Adam beta1
weight_decay: 0.0005  # optimizer weight decay 5e-4
warmup_epochs: 2.0  # warmup epochs (fractions ok)
warmup_momentum: 0.9  # warmup initial momentum
warmup_bias_lr: 0.05  # warmup initial bias lr
box: 0.05  # box loss gain
cls: 0.5  # cls loss gain
cls_pw: 1.0  # cls BCELoss positive_weight
obj: 1.0  # obj loss gain (scale with pixels)
obj_pw: 1.0  # obj BCELoss positive_weight
iou_t: 0.20  # IoU training threshold
anchor_t: 4.0  # anchor-multiple threshold
anchors: 3  # anchors per output layer (0 to ignore)
fl_gamma: 0.0  # focal loss gamma (efficientDet default gamma=1.5)
hsv_h: 0.02  # image HSV-Hue augmentation (fraction)
hsv_s: 0.8  # image HSV-Saturation augmentation (fraction)
hsv_v: 0.3  # image HSV-Value augmentation (fraction)
degrees: 0.0  # image rotation (+/- deg)
translate: 0.10  # image translation (+/- fraction)
scale: 0.5  # image scale (+/- gain)
shear: 0.0  # image shear (+/- deg)
perspective: 0.0  # image perspective (+/- fraction), range 0-0.001
flipud: 0.5  # image flip up-down (probability)
fliplr: 0.5  # image flip left-right (probability)
mosaic: 0.5  # image mosaic (probability)
mixup: 0.5 # image mixup (probability)
copy_paste: 0.0  # segment copy-paste (probability)

In [51]:
!pwd

In [52]:
sys.path.append('/kaggle/working/yolov5')  # absolute path, independent of the current directory

In [53]:
%cd /kaggle/working/
# !git clone https://github.com/ultralytics/yolov5 # clone
!cp -r /kaggle/input/yolov5-lib-ds /kaggle/working/yolov5
%cd yolov5
%pip install -qr requirements.txt  # install




In [54]:
# import yolov5
# display = yolov5.utils.notebook_init()  # check

In [55]:
# !pwd
# !ls
# !cat requirements.txt

# 🚅 Training

In [None]:
!python train.py --img {DIM}\
--batch {BATCH}\
--epochs {EPOCHS}\
--optimizer {OPTMIZER}\
--data /kaggle/working/gbr.yaml\
--hyp /kaggle/working/hyp.yaml\
--weights {MODEL}.pt\
--project {PROJECT} --name {NAME}\
--exist-ok

# ✨ Overview
<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;"><a href="https://wandb.ai/obengdouglas/great-barrier-reef-public">View the Complete Dashboard Here ⮕</a></span>
![image.png](https://github.com/denniesbor/KAGGLE-PROTECT-THE-GREAT-BARRIER-REEF/blob/assets/Screenshot%202022-02-09%20124249.png?raw=true)

## Output Files

In [None]:
OUTPUT_DIR = '{}/{}'.format(PROJECT, NAME)
!ls {OUTPUT_DIR}

# 📈 Class Distribution

In [None]:
plt.figure(figsize = (10,10))
plt.axis('off')
plt.imshow(plt.imread(f'{OUTPUT_DIR}/labels_correlogram.jpg'));

In [None]:
plt.figure(figsize = (10,10))
plt.axis('off')
plt.imshow(plt.imread(f'{OUTPUT_DIR}/labels.jpg'));

# 🔭 Batch Image

In [None]:
import matplotlib.pyplot as plt
plt.figure(figsize = (10, 10))
plt.imshow(plt.imread(f'{OUTPUT_DIR}/train_batch0.jpg'))

plt.figure(figsize = (10, 10))
plt.imshow(plt.imread(f'{OUTPUT_DIR}/train_batch1.jpg'))

plt.figure(figsize = (10, 10))
plt.imshow(plt.imread(f'{OUTPUT_DIR}/train_batch2.jpg'))

## GT Vs Pred

In [None]:
fig, ax = plt.subplots(3, 2, figsize = (2*9,3*5), constrained_layout = True)
for row in range(3):
    ax[row][0].imshow(plt.imread(f'{OUTPUT_DIR}/val_batch{row}_labels.jpg'))
    ax[row][0].set_xticks([])
    ax[row][0].set_yticks([])
    ax[row][0].set_title(f'{OUTPUT_DIR}/val_batch{row}_labels.jpg', fontsize = 12)
    
    ax[row][1].imshow(plt.imread(f'{OUTPUT_DIR}/val_batch{row}_pred.jpg'))
    ax[row][1].set_xticks([])
    ax[row][1].set_yticks([])
    ax[row][1].set_title(f'{OUTPUT_DIR}/val_batch{row}_pred.jpg', fontsize = 12)
plt.show()

# 🔍 Result

## Score Vs Epoch

In [None]:
plt.figure(figsize=(30,15))
plt.axis('off')
plt.imshow(plt.imread(f'{OUTPUT_DIR}/results.png'));

## Confusion Matrix

In [None]:
plt.figure(figsize=(12,10))
plt.axis('off')
plt.imshow(plt.imread(f'{OUTPUT_DIR}/confusion_matrix.png'));

## Metrics

In [None]:
for metric in ['F1', 'PR', 'P', 'R']:
    print(f'Metric: {metric}')
    plt.figure(figsize=(12,10))
    plt.axis('off')
    plt.imshow(plt.imread(f'{OUTPUT_DIR}/{metric}_curve.png'));
    plt.show()

# ✂️ Remove Files

In [None]:
!rm -r {IMAGE_DIR}
!rm -r {LABEL_DIR}

# Model Inference

# 🛠 Install Libraries

In [None]:
!pip install -q /kaggle/input/loguru-lib-ds/loguru-0.5.3-py3-none-any.whl
# bbox-utility, check https://github.com/awsaf49/bbox for source code
!pip install -q /kaggle/input/bbox-lib-ds

# 📚 Import Libraries

In [None]:
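import os
import sys
import glob
import shutil

import numpy as np
import pandas as pd
import cv2
import torch
import matplotlib.pyplot as plt
from PIL import Image
from tqdm.notebook import tqdm
tqdm.pandas()  # adds DataFrame.progress_apply

# helpers used by predict() and show_img() below (from the bbox-utility installed above)
from bbox.utils import voc2coco, draw_bboxes

# make the competition's `greatbarrierreef` API module importable
sys.path.append('../input/tensorflow-great-barrier-reef')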

# 📌 Key-Points
* Predictions have to be submitted through the provided **Python time-series API**, which sets this competition apart from previous object detection competitions.
* Each prediction row needs to include all bounding boxes for the image. The submission format appears to be **COCO**-style, i.e. `[x_min, y_min, width, height]`, with each box preceded by its confidence score.
* The competition metric `F2` tolerates some false positives (FP) in order to ensure very few starfish are missed, so tackling **false negatives (FN)** is more important than tackling false positives (FP).
$$F2 = 5 \cdot \frac{precision \cdot recall}{4\cdot precision + recall}$$
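
To make the trade-off concrete, here is a minimal sketch of the formula above computed from raw detection counts. The function name `f_beta` and the counts are made up for illustration, and the sketch covers only the single-threshold formula shown here, not the full competition scoring.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from true positives, false positives and false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Same number of mistakes (20), but of a different kind
many_fp = f_beta(tp=80, fp=20, fn=0)   # 20 spurious boxes, nothing missed
many_fn = f_beta(tp=80, fp=0, fn=20)   # no spurious boxes, 20 starfish missed

print(round(many_fp, 3))  # 0.952
print(round(many_fn, 3))  # 0.833
```

With the same number of errors, missing starfish lowers the score more than raising false alarms does, which is why a fairly low confidence threshold can pay off under this metric.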

In [None]:
ROOT_DIR  = '/kaggle/input/tensorflow-great-barrier-reef/'
# CKPT_DIR  = '/kaggle/input/greatbarrierreef-yolov5-train-ds'
CKPT_PATH = '/kaggle/input/reef-baseline-fold12/l6_3600_uflip_vm5_f12_up/f1/best.pt' # by @steamedsheep
IMG_SIZE  = 9000
CONF      = 0.25
IOU       = 0.40
AUGMENT   = True

In [None]:
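def predict(model, img, size=768, augment=False):
    """Run YOLOv5 on one RGB image; return COCO boxes [xmin, ymin, w, h] and confidences."""
    height, width = img.shape[:2]
    results = model(img, size=size, augment=augment)  # custom inference size, optional test-time augmentation
    preds   = results.pandas().xyxy[0]                # detections as a DataFrame, one row per box
    bboxes  = preds[['xmin','ymin','xmax','ymax']].values
    if len(bboxes):
        bboxes  = voc2coco(bboxes,height,width).astype(int)  # corner format -> [xmin, ymin, w, h]
        confs   = preds.confidence.values
        return bboxes, confs
    else:
        return [],[]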
    
def format_prediction(bboxes, confs):
    annot = ''
    if len(bboxes)>0:
        for idx in range(len(bboxes)):
            xmin, ymin, w, h = bboxes[idx]
            conf             = confs[idx]
            annot += f'{conf} {xmin} {ymin} {w} {h}'
            annot +=' '
        annot = annot.strip(' ')
    return annot

def show_img(img, bboxes, bbox_format='yolo'):
    names  = ['starfish']*len(bboxes)
    labels = [0]*len(bboxes)
    img    = draw_bboxes(img = img,
                           bboxes = bboxes, 
                           classes = names,
                           class_ids = labels,
                           class_name = True, 
                           colors = colors, 
                           bbox_format = bbox_format,
                           line_thickness = 2)
    return Image.fromarray(img).resize((800, 400))
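
The helpers above rely on `voc2coco` to turn corner-format boxes into `[x_min, y_min, width, height]` and on `format_prediction` to build the submission string. The following pure-Python sketch shows the same two steps on one hand-made box. The names `voc_to_coco` and `to_annotation` are stand-ins written for this illustration and are not the `bbox` library implementation.

```python
def voc_to_coco(box):
    """[xmin, ymin, xmax, ymax] -> [xmin, ymin, width, height], all in pixels."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]

def to_annotation(boxes_voc, confs):
    """Build 'conf xmin ymin w h ...' for all boxes of one image."""
    parts = []
    for box, conf in zip(boxes_voc, confs):
        x, y, w, h = (int(v) for v in voc_to_coco(box))
        parts.append(f'{conf} {x} {y} {w} {h}')
    return ' '.join(parts)

# One made-up detection with corners (100, 50) and (180, 110)
example = to_annotation([[100.0, 50.0, 180.0, 110.0]], [0.9])
print(example)  # 0.9 100 50 80 60
```

An image without detections yields an empty string, which matches what `format_prediction` returns for an empty box list.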

## Run Inference on **Train**

In [None]:
model = load_model(CKPT_PATH, conf=CONF, iou=IOU)
image_paths = df[df.num_bbox>1].sample(100).image_path.tolist()  # frames with more than one annotated box
for idx, path in enumerate(image_paths):
    img = cv2.imread(path)[...,::-1]  # OpenCV loads BGR; reverse channels to get RGB
    bboxes, confs = predict(model, img, size=IMG_SIZE, augment=AUGMENT)
    display(show_img(img, bboxes, bbox_format='coco'))
    if idx>5:  # stop after the first 7 images
        break
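
The cell above calls `load_model`, which is not defined in this section. A minimal sketch of such a helper follows, assuming the checkpoint is loaded through `torch.hub.load` from the local YOLOv5 copy at `/kaggle/input/yolov5-lib-ds` used earlier in this notebook. The default thresholds and the optional `hub_load` argument are assumptions of this sketch. The latter exists only so the function can be exercised without weights.

```python
def load_model(ckpt_path, conf=0.25, iou=0.50,
               repo_dir='/kaggle/input/yolov5-lib-ds', hub_load=None):
    """Load a custom YOLOv5 checkpoint and set its NMS thresholds."""
    if hub_load is None:
        import torch  # imported lazily; only needed for a real load
        hub_load = torch.hub.load
    # 'custom' + source='local' loads the given weights from a local repo copy
    model = hub_load(repo_dir, 'custom', path=ckpt_path, source='local')
    model.conf = conf  # confidence threshold used during NMS
    model.iou = iou    # IoU threshold used during NMS
    return model
```

Called as `load_model(CKPT_PATH, conf=CONF, iou=IOU)`, this returns a model whose detections are already filtered by the two thresholds set in the configuration cell.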

## Init `Env`

In [None]:
import greatbarrierreef
env = greatbarrierreef.make_env()  # initialize the environment
iter_test = env.iter_test()        # an iterator which loops over the test set and sample submission

## Run Inference on **Test**

In [None]:
model = load_model(CKPT_PATH, conf=CONF, iou=IOU)
for idx, (img, pred_df) in enumerate(tqdm(iter_test)):
    bboxes, confs  = predict(model, img, size=IMG_SIZE, augment=AUGMENT)
    annot          = format_prediction(bboxes, confs)
    pred_df['annotations'] = annot
    env.predict(pred_df)
    if idx<3:
        display(show_img(img, bboxes, bbox_format='coco'))

# 👀 Check Submission

In [None]:
sub_df = pd.read_csv('submission.csv')
sub_df.head()
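
Beyond eyeballing `sub_df.head()`, the annotation strings can be checked programmatically. The sketch below parses one string of the form produced by `format_prediction` back into tuples. The function name `parse_annotations` is hypothetical, and non-string input is treated as "no detections" because pandas reads empty CSV cells as NaN.

```python
def parse_annotations(annot):
    """Split 'conf xmin ymin w h ...' into a list of (conf, xmin, ymin, w, h)."""
    if not isinstance(annot, str):
        return []  # NaN or None: image without detections
    tokens = annot.split()
    if len(tokens) % 5 != 0:
        raise ValueError(f'expected groups of 5 values, got {len(tokens)} tokens')
    rows = []
    for i in range(0, len(tokens), 5):
        conf = float(tokens[i])
        x, y, w, h = (int(t) for t in tokens[i + 1:i + 5])
        rows.append((conf, x, y, w, h))
    return rows

print(parse_annotations('0.9 100 50 80 60 0.5 1 2 3 4'))
# [(0.9, 100, 50, 80, 60), (0.5, 1, 2, 3, 4)]
```

Applied to the submission, `sub_df['annotations'].apply(parse_annotations)` would raise on any malformed row and otherwise give the number of boxes per image via `len`.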