# R-CNN for AWS Inferentia

This notebook demonstrates how to compile and run a [Detectron2](https://github.com/facebookresearch/detectron2) R-CNN model for accelerated inference on Inferentia. It has the following main sections:

1. Install dependencies
1. Define preprocessing and compilation helper functions
1. Define wrappers that extract the R-CNN ResNet backbone, RPN Head, and RoI Head for compilation on Inf1. Also define a `NeuronRCNN` wrapper that creates an optimized end-to-end Detectron2 R-CNN model for inference on Inf1
1. Create the `NeuronRCNN` model by compiling the wrappers
1. Run inference using the `NeuronRCNN` model

#### Notebook background:

The compilation wrappers and optimizations performed in this notebook are described in the [R-CNN application note](https://awsdocs-neuron-staging.readthedocs-hosted.com/en/latest/general/appnotes/torch-neuron/rcnn-app-note.html).

## Installation

This notebook requires the following pip packages:

1. `torch==1.11.0`
1. `torch-neuron`
1. `neuron-cc`
1. `opencv-python`
1. `pycocotools`
1. `torchvision==0.12.0`
1. `detectron2==0.6`

The following section builds `torchvision` from source and installs the `Detectron2` package. It also reinstalls the Neuron packages to ensure version compatibility.

The Torchvision `roi_align_kernel.cpp` kernel is modified to use OMP threading for multithreaded inference on CPU. This significantly improves the performance of the RoI Align kernels on Inf1: OMP threading reduces RoI Align latency by 2-3x compared to the default `roi_align_kernel.cpp` configuration.
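
The kernel change described above amounts to uncommenting a single pragma and lowering its thread count. A minimal sketch of that substitution as a pure function is shown here; `enable_omp_pragma` is a hypothetical helper (not part of Torchvision), and it raises when the commented-out pragma is missing so that a silent no-op does not go unnoticed.

```python
def enable_omp_pragma(source: str, num_threads: int = 4) -> str:
    """Uncomment the RoI Align OMP pragma and set its thread count.

    Hypothetical helper for illustration; it assumes the kernel source
    contains the commented-out pragma stored in `old`.
    """
    old = "// #pragma omp parallel for num_threads(32)"
    new = f"#pragma omp parallel for num_threads({num_threads})"
    if old not in source:
        raise ValueError("commented-out OMP pragma not found in source")
    return source.replace(old, new)


# Stand-in for the relevant lines of roi_align_kernel.cpp
sample = (
    "  // #pragma omp parallel for num_threads(32)\n"
    "  for (int n = 0; n < n_rois; n++) {\n"
)
patched = enable_omp_pragma(sample, num_threads=4)
print(patched)
```

The thread count of 4 matches the value used in this notebook; other values may suit instances with a different number of vCPUs.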

In [None]:
# Install python3.7-dev for pycocotools (a Detectron2 dependency)
!sudo apt install python3.7-dev -y

# Install Neuron packages
!pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
!pip uninstall -y torchvision
!pip install --force-reinstall "torch-neuron==1.11.0.*" "neuron-cc[tensorflow]" "protobuf==3.20.1" ninja opencv-python

# Change cuda to 10.2 for Detectron2
!sudo rm /usr/local/cuda
!sudo ln -s /usr/local/cuda-10.2 /usr/local/cuda

# Install Torchvision 0.12.0 from source
!git clone -b release/0.12 https://github.com/pytorch/vision.git

# Update the RoI Align kernel to use OMP multithreading
with open('vision/torchvision/csrc/ops/cpu/roi_align_kernel.cpp', 'r') as file:
    content = file.read()

# Enable OMP Multithreading and set the number of threads to 4
old = "// #pragma omp parallel for num_threads(32)"
new = "#pragma omp parallel for num_threads(4)"
content = content.replace(old, new)

# Re-write the file
with open('vision/torchvision/csrc/ops/cpu/roi_align_kernel.cpp', 'w') as file:
    file.write(content)

# Build Torchvision with OMP threading
!cd vision && CFLAGS="-fopenmp" python setup.py bdist_wheel
%pip install vision/dist/*.whl

# Install Detectron2 release v0.6
!python -m pip install 'git+https://github.com/facebookresearch/detectron2.git@v0.6'
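
After the installs finish, it can be worth confirming that the pinned versions are actually in place. The following is a rough sketch using only the standard library; `installed_version` and `find_mismatches` are hypothetical helpers, and the prefix comparison is an assumption made to tolerate local version suffixes such as `+cu102`.

```python
from typing import Callable, Dict, Optional

# Version pins taken from the package list in the Installation section
REQUIRED = {"torch": "1.11.0", "torchvision": "0.12.0", "detectron2": "0.6"}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of `name`, or None if it is missing."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for Python 3.7

        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(
    required: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each package that does not satisfy its pin to the version found."""
    mismatches = {}
    for name, pin in required.items():
        found = lookup(name)
        # Prefix match tolerates local suffixes such as "1.11.0+cu102"
        if found is None or not found.startswith(pin):
            mismatches[name] = found
    return mismatches


# An empty dict means every pin is satisfied in the current environment
print(find_mismatches(REQUIRED))
```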

## Preprocessing and compilation functions

In [None]:
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg

def get_model():

    # Configure the R-CNN model
    CONFIG_FILE = "COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml"
    WEIGHTS_FILE = "COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml"
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(CONFIG_FILE))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(WEIGHTS_FILE)
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
    cfg.MODEL.DEVICE = 'cpu' # Send to CPU for Neuron Tracing

    # Create the R-CNN predictor wrapper
    predictor = DefaultPredictor(cfg)
    return predictor

In [None]:
import os
import urllib.request
import cv2
import torch

def get_image():

    # Get a sample image
    filename = 'input.jpg'
    if not os.path.exists(filename):
        url = "http://images.cocodataset.org/val2017/000000439715.jpg"
        urllib.request.urlretrieve(url, filename)
    return filename


def preprocess(original_image, predictor):
    """
    A basic preprocessing function that sets the input height=800 and
    input width=800. The function is derived from the preprocessing
    steps in the Detectron2 `DefaultPredictor` module.
    """

    height, width = original_image.shape[:2]
    resize_func = predictor.aug.get_transform(original_image)
    resize_func.new_h = 800 # Override height
    resize_func.new_w = 800 # Override width
    image = resize_func.apply_image(original_image)
    image = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))
    inputs = {"image": image, "height": height, "width": width}
    return inputs
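
Because `preprocess` forces the input to 800 x 800 regardless of the original aspect ratio, predicted boxes live in the resized coordinate frame until post-processing maps them back using the `height` and `width` stored in `inputs`. A rough sketch of that mapping is shown here; `rescale_box` is a hypothetical helper rather than a Detectron2 function, and the 480 x 640 image size is an assumed example.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def rescale_box(box: Box, model_hw: Tuple[int, int], orig_hw: Tuple[int, int]) -> Box:
    """Map a box from model-input coordinates back to the original image."""
    model_h, model_w = model_hw
    orig_h, orig_w = orig_hw
    scale_x = orig_w / model_w  # horizontal and vertical scales differ
    scale_y = orig_h / model_h  # when the aspect ratio was not preserved
    x0, y0, x1, y1 = box
    return (x0 * scale_x, y0 * scale_y, x1 * scale_x, y1 * scale_y)


# Assumed example: a 480 x 640 (height x width) image resized to 800 x 800
box_in_model_space = (100.0, 200.0, 400.0, 600.0)
box_in_image_space = rescale_box(box_in_model_space, (800, 800), (480, 640))
print(box_in_image_space)
```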

In [None]:
import torch_neuron
from typing import Any, Union, Callable

def compile_or_load(
    model: Union[Callable, torch.nn.Module],
    example_inputs: Any,
    filename: str,
    **kwargs
) -> torch.nn.Module:
    """
    Load a compiled Neuron module if it already exists on disk. Otherwise,
    compile the model for Inf1 and save it under the provided filename.

    model: A module or function which defines a torch model or computation.
    example_inputs: An example set of inputs which will be passed to the
        `model` during compilation.
    filename: Path where the compiled model is saved and loaded from.
    kwargs: Extra `torch_neuron.trace` kwargs.
    """

    if not os.path.exists(filename):
        with torch.no_grad():
            compiled_model = torch_neuron.trace(model, example_inputs, **kwargs)
        torch.jit.save(compiled_model, filename)

    compiled_model = torch.jit.load(filename)
    return compiled_model
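
The compile-once, load-thereafter pattern in `compile_or_load` can be tried without Inferentia hardware. The sketch below mirrors only the control flow: it swaps in `pickle` for `torch.jit.save` and `torch.jit.load`, and a hypothetical `build` callable for `torch_neuron.trace`. The names `build_or_load` and `expensive_build` are illustrative assumptions.

```python
import os
import pickle
import tempfile
from typing import Any, Callable


def build_or_load(build: Callable[[], Any], filename: str) -> Any:
    """Run `build` only when `filename` is missing, then load from disk."""
    if not os.path.exists(filename):
        artifact = build()
        with open(filename, "wb") as f:
            pickle.dump(artifact, f)
    with open(filename, "rb") as f:
        return pickle.load(f)


calls = []


def expensive_build() -> dict:
    calls.append(1)  # record each simulated compilation
    return {"weights": [1, 2, 3]}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "artifact.pkl")
    first = build_or_load(expensive_build, path)   # builds and saves
    second = build_or_load(expensive_build, path)  # loads from disk only

print(len(calls))
```

As with `compile_or_load`, an existing file is reused without checking how it was produced, so a stale artifact should be deleted before recompiling with a different input shape or different options.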

## Neuron compilation wrappers

In [None]:
from detectron2.modeling.meta_arch.rcnn import GeneralizedRCNN

class NeuronFusedBackboneRPNHead(torch.nn.Module):
    """
    Wrapper to compile the fused ResNet backbone and RPN Head.
    """

    def __init__(self, model: GeneralizedRCNN) -> None:
        super().__init__()
        self.backbone = model.backbone
        self.rpn_head = model.proposal_generator.rpn_head
        self.in_features = model.proposal_generator.in_features

    def forward(self, x):
        features = self.backbone(x)
        features_ = [features[f] for f in self.in_features]
        return self.rpn_head(features_), features


class BackboneRPN(torch.nn.Module):
    """
    Wrapper that uses the compiled `neuron_backbone_rpn` instead
    of the original backbone and RPN Head. We copy the remainder
    of the RPN `forward` code (`predictor.model.proposal_generator.forward`)
    to create a "fused" backbone + RPN module.
    """

    def __init__(self, model: GeneralizedRCNN) -> None:
        super().__init__()
        self.backbone_rpn_head = NeuronFusedBackboneRPNHead(model)
        self._rpn = model.proposal_generator
        self.in_features = model.proposal_generator.in_features

    def forward(self, images):
        preds, features = self.backbone_rpn_head(images.tensor)
        features_ = [features[f] for f in self.in_features]
        pred_objectness_logits, pred_anchor_deltas = preds
        anchors = self._rpn.anchor_generator(features_)

        # Transpose the Hi*Wi*A dimension to the middle:
        pred_objectness_logits = [
            # (N, A, Hi, Wi) -> (N, Hi, Wi, A) -> (N, Hi*Wi*A)
            score.permute(0, 2, 3, 1).flatten(1)
            for score in pred_objectness_logits
        ]
        pred_anchor_deltas = [
            # (N, A*B, Hi, Wi) -> (N, A, B, Hi, Wi) -> (N, Hi, Wi, A, B) -> (N, Hi*Wi*A, B)
            x.view(x.shape[0], -1, self._rpn.anchor_generator.box_dim,
                   x.shape[-2], x.shape[-1])
            .permute(0, 3, 4, 1, 2)
            .flatten(1, -2)
            for x in pred_anchor_deltas
        ]

        proposals = self._rpn.predict_proposals(
            anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes
        )
        return proposals, features


class NeuronBoxHeadBoxPredictor(torch.nn.Module):
    """
    Wrapper that extracts the RoI Box Head and Box Predictor
    for compilation.
    """

    def __init__(self, model: GeneralizedRCNN) -> None:
        super().__init__()
        self.roi_heads = model.roi_heads

    def forward(self, box_features):
        box_features = self.roi_heads.box_head(box_features)
        predictions = self.roi_heads.box_predictor(box_features)
        return predictions


class ROIHead(torch.nn.Module):
    """
    Wrapper that plugs the compiled box head and box predictor into
    the rest of the RoI Heads module. The `_forward_box` and `forward`
    functions are adapted from the `predictor.model.roi_heads` module.
    """

    def __init__(self, model: GeneralizedRCNN) -> None:
        super().__init__()
        self.roi_heads = model.roi_heads
        self.neuron_box_head_predictor = NeuronBoxHeadBoxPredictor(model)

    def _forward_box(self, features, proposals):
        features = [features[f] for f in self.roi_heads.box_in_features]
        box_features = self.roi_heads.box_pooler(
            features, [x.proposal_boxes for x in proposals])
        predictions = self.neuron_box_head_predictor(box_features)
        pred_instances, _ = self.roi_heads.box_predictor.inference(
            predictions, proposals)
        return pred_instances

    def forward(self, images, features, proposals, targets=None):
        pred_instances = self._forward_box(features, proposals)
        pred_instances = self.roi_heads.forward_with_given_boxes(
            features, pred_instances)
        return pred_instances, {}


class NeuronRCNN(torch.nn.Module):
    """
    Wrapper that uses the fused backbone + RPN module and the optimized
    RoI Heads wrapper.
    """

    def __init__(self, model: GeneralizedRCNN) -> None:
        super().__init__()

        # Create fused Backbone + RPN
        self.backbone_rpn = BackboneRPN(model)

        # Create Neuron RoI Head
        self.roi_heads = ROIHead(model)

        # Define pre and post-processing functions
        self.preprocess_image = model.preprocess_image
        self._postprocess = model._postprocess

    def forward(self, batched_inputs):
        images = self.preprocess_image(batched_inputs)
        proposals, features = self.backbone_rpn(images)
        results, _ = self.roi_heads(images, features, proposals, None)
        return self._postprocess(results, batched_inputs, images.image_sizes)
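
The shape comments in `BackboneRPN.forward` can be checked with plain arithmetic. The sketch below assumes an FPN layout with five feature levels at strides 4, 8, 16, 32, and 64 and A = 3 anchors per location; these values, and the round-up at the coarsest level, are assumptions for illustration and are not read from the model config here.

```python
import math
from typing import List, Sequence


def flattened_sizes(
    height: int,
    width: int,
    strides: Sequence[int] = (4, 8, 16, 32, 64),  # assumed FPN strides
    anchors_per_location: int = 3,                # assumed value of A
) -> List[int]:
    """Return Hi*Wi*A for each feature level, i.e. the flattened
    objectness length per image after permute(0, 2, 3, 1).flatten(1)."""
    sizes = []
    for stride in strides:
        hi = math.ceil(height / stride)
        wi = math.ceil(width / stride)
        sizes.append(hi * wi * anchors_per_location)
    return sizes


sizes = flattened_sizes(800, 800)
print(sizes)       # per-level Hi*Wi*A
print(sum(sizes))  # total anchors scored per image under these assumptions
```

Under the same assumptions, the anchor deltas for each level flatten to `(N, Hi*Wi*A, B)` with `B = 4` box coordinates, so each entry above is also the middle dimension of the corresponding deltas tensor.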

## Compile the fused backbone + RPN Head and RoI Head for inference on Inf1


In [None]:
# Create and compile the combined backbone and RPN Head wrapper
backbone_rpn_filename = 'backbone_rpn.pt'

predictor = get_model()
backbone_rpn_wrapper = NeuronFusedBackboneRPNHead(predictor.model)
backbone_rpn_wrapper.eval()

example = torch.rand([1, 3, 800, 800])
neuron_backbone_rpn_head = compile_or_load(backbone_rpn_wrapper, example, backbone_rpn_filename, strict=False)

In [None]:
# Create and compile the RoI Head wrapper
roi_head_filename = 'box_head_predictor.pt'

predictor = get_model()
box_head_predictor = NeuronBoxHeadBoxPredictor(predictor.model)
box_head_predictor.eval()

example = torch.rand([1000, 256, 7, 7])
neuron_box_head_predictor = compile_or_load(box_head_predictor, example, roi_head_filename)

## Neuron R-CNN Inference

In [None]:
# Initialize an R-CNN on CPU
predictor = get_model()

# Create the Neuron R-CNN on CPU
neuron_rcnn = NeuronRCNN(predictor.model)
neuron_rcnn.eval()

# Inject the Neuron compiled models
neuron_rcnn.backbone_rpn.backbone_rpn_head = neuron_backbone_rpn_head
neuron_rcnn.roi_heads.neuron_box_head_predictor = neuron_box_head_predictor

# Download a sample image from the COCO dataset and read it
image_filename = get_image()
image = cv2.imread(image_filename)
inputs = preprocess(image, predictor)  # Reuse the predictor created above

# Run inference using the sample image and print the output
print(neuron_rcnn([inputs]))
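
To gauge the latency of the end-to-end model, a simple timing loop is usually enough. The sketch below uses only the standard library; `benchmark` is a hypothetical helper (not part of `torch_neuron`), and the stand-in workload should be replaced with a call such as `lambda: neuron_rcnn([inputs])` when run in this notebook.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(
    fn: Callable[[], object], warmup: int = 2, iterations: int = 10
) -> Dict[str, float]:
    """Time `fn` and report latency statistics in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are excluded from the statistics
    latencies_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "p50_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }


# Stand-in workload so the sketch runs anywhere
stats = benchmark(lambda: sum(range(10_000)))
print(stats)
```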