# 🐠 Reef - DETR - Detection Transformer - Infer

## DETR Baseline model for the [Great Barrier Reef Competition](https://www.kaggle.com/c/tensorflow-great-barrier-reef) with `LB=0.189`

![](https://storage.googleapis.com/kaggle-competitions/kaggle/31703/logos/header.png)

## An adaptation of [End to End Object Detection with Transformers:DETR](https://www.kaggle.com/tanulsingh077/end-to-end-object-detection-with-transformers-detr) to the [Great Barrier Reef Competition](https://www.kaggle.com/c/tensorflow-great-barrier-reef)

I made several adaptations to get it working on this competition's data, based on the following code and documentation:
* This awesome fork, [End to End Object Detection with Transformers:DETR](https://www.kaggle.com/prokaj/end-to-end-object-detection-with-transformers-detr) by [prvi](https://www.kaggle.com/prokaj), which correctly formats the input. The format the model expects is neither `coco` nor `pascal_voc`, but something else.
* Albumentation code for bbox normalize and denormalize functions: [here](https://github.com/albumentations-team/albumentations/blob/master/albumentations/augmentations/bbox_utils.py#L88)
* [DETR's hands on Colab Notebook](https://colab.research.google.com/github/facebookresearch/detr/blob/colab/notebooks/detr_attention.ipynb): Shows how to load a model from hub, generate predictions, then visualize the attention of the model (similar to the figures of the paper)
* [Standalone Colab Notebook](https://colab.research.google.com/github/facebookresearch/detr/blob/colab/notebooks/detr_demo.ipynb): Demonstrates how to implement a simplified version of DETR from the ground up in 50 lines of Python, then visualize the predictions. It is a good starting point if you want to gain a better understanding of the architecture and poke around before diving into the codebase.
* [Panoptic Colab Notebook](https://colab.research.google.com/github/facebookresearch/detr/blob/colab/notebooks/DETR_panoptic.ipynb): Demonstrates how to use DETR for panoptic segmentation and plot the predictions.
* [Hugging Face DETR Documentation](https://huggingface.co/docs/transformers/model_doc/detr)

The main changes to the original notebook I forked are:
* Data format changed from `[x_min, y_min, w, h]` to `[x_center, y_center, w, h]`
* ResNet-like normalization (subtract a per-channel mean, divide by a per-channel std) instead of scaling pixels to `[0...1]`
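
As a rough illustration of the two changes listed above, here is a minimal sketch in plain Python. The helper names (`xywh_min_to_center`, `xywh_center_to_min`, `normalize_pixel`) are made up for this example and are not taken from the training notebook; the mean and std values are the ones passed to `A.Normalize` in this notebook.

```python
# Hypothetical helpers, for illustration only (not part of the training notebook).

MEAN = (0.485, 0.456, 0.406)  # per-channel mean passed to A.Normalize in this notebook
STD = (0.229, 0.224, 0.225)   # per-channel std passed to A.Normalize in this notebook


def xywh_min_to_center(box):
    """[x_min, y_min, w, h] -> [x_center, y_center, w, h]."""
    x_min, y_min, w, h = box
    return [x_min + w / 2, y_min + h / 2, w, h]


def xywh_center_to_min(box):
    """[x_center, y_center, w, h] -> [x_min, y_min, w, h]."""
    x_c, y_c, w, h = box
    return [x_c - w / 2, y_c - h / 2, w, h]


def normalize_pixel(rgb, max_pixel_value=255.0):
    """Scale an RGB pixel to [0, 1], then subtract the mean and divide by the std."""
    return [(v / max_pixel_value - m) / s for v, m, s in zip(rgb, MEAN, STD)]


box = [100, 200, 50, 40]
centered = xywh_min_to_center(box)
print(centered)                      # [125.0, 220.0, 50, 40]
print(xywh_center_to_min(centered))  # [100.0, 200.0, 50, 40]
print(normalize_pixel([255, 0, 128]))
```

The inference code in this notebook only needs the second direction (center to corner), because the submission format expects `x_min` and `y_min`.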


## This is the inference notebook. You can find the training one here: [🐠 Reef - DETR - Detection Transformer - Train](https://www.kaggle.com/julian3833/reef-detr-detection-transformer-train).



# Please, _DO_ upvote if you find this useful!!


&nbsp;
&nbsp;
&nbsp;

---


# About DETR (Detection Transformer)

"Attention Is All You Need", the paper that introduced Transformers, changed the state of NLP and has reached great heights. Though mainly developed for NLP, the latest research around it focuses on how to leverage it across different verticals of deep learning. The Transformer architecture is very powerful and is something very close to my heart, which is why I am motivated to explore anything that uses transformers, be it Google's recently released TabNet or OpenAI's ImageGPT.

Detection Transformer leverages the transformer network (both the encoder and the decoder) to detect objects in images. Facebook's researchers argue that, for object detection, each part of the image should be able to interact with every other part for better results, especially with occluded and partially visible objects, and what better tool for that than a transformer?

**The main motive behind DETR is to remove the need for many hand-designed components, such as a non-maximum suppression procedure or anchor generation, that explicitly encode prior knowledge about the task and make the process complex and computationally expensive.**

The main ingredients of the new framework, called DEtection TRansformer or DETR, <font color='green'>are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.</font>

![](https://cdn.analyticsvidhya.com/wp-content/uploads/2020/05/Screenshot-from-2020-05-27-17-48-38.png)
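
To make the bipartite matching idea concrete, here is a toy sketch. It pairs each ground-truth box with exactly one prediction so that the total cost is minimal. The DETR repository does this with the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) over a cost that mixes class probability, L1 box distance and generalized IoU; the brute-force search and the L1-only cost below are simplifications assumed here for readability, and only scale to a handful of boxes.

```python
from itertools import permutations


def l1_cost(pred, target):
    """L1 distance between two boxes given as [x_center, y_center, w, h]."""
    return sum(abs(p - t) for p, t in zip(pred, target))


def match(preds, targets):
    """Return (total_cost, pairs), where each pair is (prediction index, target index).

    Brute force over all one-to-one assignments; assumes len(preds) >= len(targets).
    """
    best_cost, best_pairs = float("inf"), []
    for pred_ids in permutations(range(len(preds)), len(targets)):
        cost = sum(l1_cost(preds[p], targets[t]) for t, p in enumerate(pred_ids))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, t) for t, p in enumerate(pred_ids)]
    return best_cost, best_pairs


# Made-up normalized boxes, for illustration only
preds = [
    [0.50, 0.50, 0.20, 0.20],
    [0.10, 0.10, 0.05, 0.05],
    [0.55, 0.45, 0.25, 0.15],  # a looser duplicate of the first prediction
]
targets = [
    [0.51, 0.49, 0.21, 0.19],
    [0.12, 0.11, 0.05, 0.06],
]

cost, pairs = match(preds, targets)
print(pairs)  # [(0, 0), (1, 1)]
```

Prediction 2 is left unmatched. In DETR's loss, unmatched predictions are supervised toward the "no object" class, which is what discourages duplicate boxes without a non-maximum suppression step.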

---


In [None]:
from IPython.display import IFrame, YouTubeVideo
YouTubeVideo('T35ba_VXkMY',width=600, height=400)

# References:
* The video [above](https://www.youtube.com/watch?v=T35ba_VXkMY) on YouTube
* [Other Video](https://www.youtube.com/watch?v=LfUsGv-ESbc)
* The original notebook: [End to End Object Detection with Transformers:DETR](https://www.kaggle.com/tanulsingh077/end-to-end-object-detection-with-transformers-detr)
* [Paper](https://scontent.flko3-1.fna.fbcdn.net/v/t39.8562-6/101177000_245125840263462_1160672288488554496_n.pdf?_nc_cat=104&_nc_sid=ae5e01&_nc_ohc=KwU3i7_izOgAX9bxMVv&_nc_ht=scontent.flko3-1.fna&oh=64dad6ce7a7b4807bb3941690beaee69&oe=5F1E8347)
* [Github repo](https://github.com/facebookresearch/detr)
* [Blogpost](https://ai.facebook.com/blog/end-to-end-object-detection-with-transformers/)


Ok, enough chit chat, show me the code!!

# Imports

In [None]:
import os
import numpy as np 
import pandas as pd 
from datetime import datetime
import time
import random
from tqdm import tqdm


#Torch
import torch
import torch.nn as nn

#CV
import cv2

# Albumentations
import albumentations as A
from albumentations.pytorch.transforms import ToTensorV2

# Plotting
import matplotlib.pyplot as plt

# Constants

In [None]:
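num_classes = 2          # size of the classification head; predict() reads index 0 as the object score
num_queries = 18         # number of object queries; predict() also keeps at most 18 boxes per image
null_class_coef = 0.1    # not referenced anywhere in this inference notebook
BATCH_SIZE = 8           # not referenced in this notebook either: frames arrive one at a time

# Frame size in pixels, used to denormalize the predicted boxes
WIDTH = 1280
HEIGHT = 720

# Minimum score for a predicted box to be kept
DETECTION_THRESHOLD = 0.5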

DEVICE = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

# Pretrained weights

These come from the training notebook here:  [🐠 Reef - DETR - Detection Transformer - Train](https://www.kaggle.com/julian3833/reef-detr-detection-transformer-train)

Turned into a dataset here: [DETR - Weights and Supplies](https://www.kaggle.com/julian3833/detr-weights-and-supplies)
 

In [None]:
WEIGHTS_FILE = "../input/detr-weights-and-supplies/pytorch_model.bin"

# Set up the pre-trained backbone and torch hub dependencies

Instead of letting torch hub download its resources from the Internet into `/root/.cache/torch/hub/`, we pre-populate that directory with the files we know it downloaded in the training notebook.
This way we can submit with Internet disabled.


In [None]:
!mkdir -p /root/.cache/torch/hub/
!cp -R ../input/detr-weights-and-supplies/torch_hub/* /root/.cache/torch/hub
!ls -l /root/.cache/torch/hub/
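
The same copy can also be done from Python, which may be handy when shell magics are not available. This is a sketch under assumptions: `copy_hub_cache` is a hypothetical helper written for this example, and the paths in the commented usage line are simply the ones from the shell commands above.

```python
import shutil
from pathlib import Path


def copy_hub_cache(src, dst):
    """Copy everything under `src` into `dst`, creating `dst` if needed.

    Returns the sorted names of the top-level entries now present in `dst`.
    """
    dst = Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    # dirs_exist_ok requires Python 3.8+
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(p.name for p in dst.iterdir())


# Usage on Kaggle (paths taken from the shell commands above):
# copy_hub_cache("../input/detr-weights-and-supplies/torch_hub", "/root/.cache/torch/hub")
```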

# Get the model
We create the model, read the fine-tuned weights with `torch.load`, and apply them with `load_state_dict`.

In [None]:
class DETRModel(nn.Module):
    def __init__(self,num_classes, num_queries):
        super(DETRModel,self).__init__()
        self.num_classes = num_classes
        self.num_queries = num_queries
        
        self.model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
        self.in_features = self.model.class_embed.in_features
        
        self.model.class_embed = nn.Linear(in_features=self.in_features,out_features=self.num_classes)
        self.model.num_queries = self.num_queries
        
    def forward(self,images):
        return self.model(images)


def get_model():
    model = DETRModel(num_classes=num_classes,num_queries=num_queries)
    model.load_state_dict(torch.load(WEIGHTS_FILE, map_location=DEVICE))
    model.eval()
    model = model.to(DEVICE)
    return model

model = get_model()

# Predict functions

In [None]:
import torchvision.transforms as T

In [None]:
def transform():
    return A.Compose([
        A.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ToTensorV2(p=1.0),
        
    ])


def format_prediction_string(boxes, scores):
    # Format as specified in the evaluation page:
    # "score x_min y_min width height" for each box, all joined by spaces
    pred_strings = []
    for score, box in zip(scores, boxes):
        pred_strings.append("{0:.10f} {1} {2} {3} {4}".format(score, box[0], box[1], box[2], box[3]))
    return " ".join(pred_strings)


def predict(model, pixel_array):
    # Predictions for a single image
    
    # Apply all the transformations that are required
    # No manual "/ 255." here: A.Normalize already divides by max_pixel_value (255 by default)
    pixel_array = pixel_array.astype(np.float32)
    tensor_img = transform()(image=pixel_array)['image'].unsqueeze(0)
    
    # TODO!!: un-scale boxes
    
    # Get predictions
    with torch.no_grad():
        outputs = model(tensor_img.to(DEVICE))
    
    # Move predictions to cpu and numpy
    boxes = outputs['pred_boxes'][0].detach().cpu().numpy()
    # Scale the normalized boxes to pixels: x and width by WIDTH, y and height by HEIGHT
    boxes = np.array([np.array(box).astype(np.int32) for box in A.augmentations.bbox_utils.denormalize_bboxes(boxes, HEIGHT, WIDTH)])
    
    # Softmax over the class dimension; column 0 is taken as the object score
    scores = outputs['pred_logits'][0].softmax(1).detach().cpu().numpy()[:, 0]
    
    # Filter predictions with low score
    boxes = boxes[scores >= DETECTION_THRESHOLD].astype(np.int32)
    
    # Convert [x_center, y_center, width, height] --> [x_min, y_min, width, height]
    boxes[:, 0] = boxes[:, 0] - (boxes[:, 2] / 2) # x_center --> x_min
    boxes[:, 1] = boxes[:, 1] - (boxes[:, 3] / 2) # y_center --> y_min
    
    scores = scores[scores >= DETECTION_THRESHOLD]
    
    # Sort by descending score and keep at most num_queries boxes
    scored_boxes = list(zip(boxes, scores))
    sorted_boxes = sorted(scored_boxes, key=lambda pair: -pair[1])
    top_n_boxes = sorted_boxes[:num_queries]
    boxes = [box for box, score in top_n_boxes]
    scores = [score for box, score in top_n_boxes]
  
    # Format results as requested in the Evaluation tab
    return format_prediction_string(boxes, scores)
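
To see what the post-processing in `predict` does without loading the model, here is a torch-free sketch on made-up logits and boxes. The numbers and the `postprocess` helper are invented for illustration. The steps (softmax, take column 0 as the object score, threshold, scale to pixels, convert center to corner, sort, format) roughly mirror the function above, although integer rounding may differ by a pixel.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits, boxes, width, height, threshold=0.5, top_n=18):
    """logits: one [object, no_object] pair per query; boxes: normalized [xc, yc, w, h]."""
    kept = []
    for query_logits, (xc, yc, w, h) in zip(logits, boxes):
        score = softmax(query_logits)[0]  # column 0 is read as the object score
        if score < threshold:
            continue
        # Scale to pixels, then convert center coordinates to the top-left corner
        w_px, h_px = int(w * width), int(h * height)
        x_min = int(xc * width) - w_px // 2
        y_min = int(yc * height) - h_px // 2
        kept.append((score, [x_min, y_min, w_px, h_px]))
    # Highest score first, then keep at most top_n boxes
    kept.sort(key=lambda pair: -pair[0])
    return " ".join(
        "{0:.10f} {1} {2} {3} {4}".format(score, *box) for score, box in kept[:top_n]
    )


# Made-up values: three queries, the second one falls below the threshold
logits = [[2.0, 0.0], [-1.0, 1.0], [0.5, 0.0]]
boxes = [[0.5, 0.5, 0.1, 0.2], [0.2, 0.2, 0.1, 0.1], [0.25, 0.75, 0.05, 0.1]]
result = postprocess(logits, boxes, width=1280, height=720)
print(result)
```

The printed string contains two groups of five values, one per surviving box, in the `score x_min y_min width height` layout used by `format_prediction_string`.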

# Submit

(See: [Great Barrier Reef API Tutorial](https://www.kaggle.com/sohier/great-barrier-reef-api-tutorial))

In [None]:
import greatbarrierreef
env = greatbarrierreef.make_env()
iter_test = env.iter_test() 
for (pixel_array, df_pred) in iter_test:
    # Predict one frame and write the prediction string into the `annotations` column
    df_pred['annotations'] = predict(model, pixel_array)
    
    env.predict(df_pred)

# Please, _DO_ upvote if you find it useful or interesting!!