# Object Detection Examples

&nbsp;

The Ampere AI software stack is the acceleration layer of Ampere Cloud Native Processors, dedicated to speeding up AI workloads running on them. Ampere Optimized AI Frameworks include PyTorch, TensorFlow, and ONNX Runtime. This drop-in library supports AI applications developed in these popular frameworks and works out of the box, without API changes or additional coding. Additionally, the Ampere AI software engineering team provides the publicly accessible Ampere Model Library (AML) for testing and benchmarking the performance of Ampere Cloud Native Processors on some of the most common AI inference workloads.

&nbsp;&nbsp;Please visit us at https://amperecomputing.com

&nbsp;

## COCO Dataset Overview
<img align="left" src="https://cocodataset.org/images/coco-logo.png" alt="nn" style="width: 200px;"/>

&nbsp;

&nbsp;

These examples use a subset of the COCO 2014 object detection validation set.
COCO is a large-scale object detection, segmentation, and captioning dataset.

More info can be found here: https://cocodataset.org

&nbsp;

In [None]:
import os
import cv2
import time
import subprocess
import numpy as np
import onnxruntime as ort
from matplotlib import pyplot as plt

from utils.coco import COCODataset
import utils.post_processing as pp
import utils.benchmark as bench_utils

LAT_BATCH_SIZE = 1

## SSD MobileNet v1

This example measures the single-image inference latency of SSD MobileNet v1 in ONNX Runtime with AIO enabled.
You can read more about the SSD MobileNet architecture here: https://arxiv.org/pdf/1801.04381.pdf

In [None]:
input_shape = (640, 640)
threshold = 0.3
model = "ssd_mobilenet_v1/torch2onnx_ssd_mobilenet_v1.onnx"

In [None]:
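# initialize ONNX Runtime session options
session_options = ort.SessionOptions()
# threads used to parallelize work inside a single operator
session_options.intra_op_num_threads = bench_utils.get_intra_op_parallelism_threads()
# threads used to run independent operators in parallel
session_options.inter_op_num_threads = 1
# apply all available graph optimizations
session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
# log errors and fatal messages only (3 = ERROR)
session_options.log_severity_level = 3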
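
The thread count passed to `intra_op_num_threads` comes from a helper in `utils.benchmark` whose implementation is not shown in this notebook. As a rough sketch of what such a helper could do, the snippet below reads a thread count from an environment variable and falls back to the number of visible CPUs. The function name `resolve_thread_count` and the variable name `EXAMPLE_NUM_THREADS` are hypothetical and chosen for illustration only; they are not part of the Ampere Model Library or ONNX Runtime.

```python
import os

# Hypothetical environment variable name, used for illustration only.
NUM_THREADS_ENV = "EXAMPLE_NUM_THREADS"


def resolve_thread_count(env=None):
    """Return a positive thread count from the environment, else the CPU count."""
    env = os.environ if env is None else env
    raw = env.get(NUM_THREADS_ENV)
    if raw is not None:
        try:
            value = int(raw)
        except ValueError:
            value = 0  # non-numeric input is treated as "not set"
        if value > 0:
            return value
    # os.cpu_count() may return None on unusual platforms, so default to 1.
    return os.cpu_count() or 1


print(resolve_thread_count({NUM_THREADS_ENV: "16"}))
```

Pinning the intra-op thread count explicitly keeps latency numbers comparable between runs, since the result no longer depends on how many cores happen to be visible to the process.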

In [None]:
# initialization of COCO dataset
coco = COCODataset(
    batch_size=LAT_BATCH_SIZE,
    color_model="BGR",
    images_filename_base="COCO_val2014_000000000000",
    pre_processing=False,
    sort_ascending=True,
    transpose_input=False
)

input_array = coco.get_input_array(target_shape=input_shape)

input_dict = dict()
input_dict["image_tensor:0"] = input_array

output_names = ["detection_classes:0", "detection_boxes:0", "detection_scores:0", "num_detections:0"]

# for the purpose of visualizing results let's load the image without pre-processing
img = cv2.imread(str(coco.path_to_latest_image))

In [None]:
# running the model with AIO enabled
ort.AIO.force_enable()

sess = ort.InferenceSession(model, sess_options=session_options, providers=ort.get_available_providers())

# warm-up run
_ = sess.run(output_names, input_dict)

In [None]:
# actual run
# perf_counter is a monotonic, high-resolution clock suited to short intervals
start = time.perf_counter()
output_aio = sess.run(output_names, input_dict)
finish = time.perf_counter()

latency_ms = (finish - start) * 1000
print("\nSSD MobileNet v1 latency with AIO: {:.0f} ms\n".format(latency_ms))
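
A single timed run can be noisy. The sketch below shows one possible way to time several runs and summarize them; it is not part of the notebook's utilities. The helper name `measure_latency_ms` and its `warmup` and `repeats` parameters are assumptions made for this example, and a cheap stand-in workload replaces the model call so the snippet runs on its own.

```python
import statistics
import time


def measure_latency_ms(run_once, warmup=2, repeats=10):
    """Call run_once() repeatedly and return per-call latencies in milliseconds."""
    for _ in range(warmup):
        run_once()  # warm-up calls are executed but not timed
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


# Stand-in workload; in this notebook it could be
# lambda: sess.run(output_names, input_dict)
latencies = measure_latency_ms(lambda: sum(range(10_000)))
print("median: {:.3f} ms, max: {:.3f} ms".format(
    statistics.median(latencies), max(latencies)))
```

Reporting the median alongside the maximum makes occasional outliers, such as a run interrupted by the scheduler, visible without letting them dominate the headline number.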

In [None]:
# visualizing output

# post-processing
def post_process(image, det_boxes, det_classes, det_scores, num_det):
    for i in range(LAT_BATCH_SIZE):
        # read the detection count of the i-th image regardless of the output's exact shape
        for d in range(int(np.ravel(num_det)[i])):

            # if the detection score is below the threshold, skip the detection
            if det_scores[i][d] < threshold:
                continue

            # first let's switch order of bbox boundaries from [top left bottom right] to [left top right bottom]
            converted_bbox = coco.convert_bbox_to_coco_order(
                det_boxes[i][d] * input_shape[0],
                1, 0, 3, 2,
                absolute=False
            )

            # then rescale back to original image ratio
            converted_bbox = coco.rescale_bbox(i, converted_bbox)

            # we can now draw bbox on the original input image
            image = pp.draw_bbox(image, converted_bbox, int(det_classes[i][d]))

    return image

detection_boxes = output_aio[1]
detection_classes = output_aio[0]
detection_classes += 1  # the model indexes classes from 0, while COCO category ids start at 1
detection_scores = output_aio[2]
num_detections = output_aio[3]

# show the post-processed images
plt.imshow(cv2.cvtColor(
    post_process(img, detection_boxes, detection_classes, detection_scores, num_detections),
    cv2.COLOR_BGR2RGB
))
plt.show()
print("SSD MobileNet v1 output with AIO enabled\n")
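
The post-processing loop above filters detections by score and reorders each box before drawing it. The same two steps can be sketched in vectorized NumPy as shown below. This is a standalone illustration, not the implementation behind `coco.convert_bbox_to_coco_order`; the function name `boxes_to_pixels` is hypothetical, and the input is assumed to hold normalized `[top, left, bottom, right]` rows, as the comments in the loop describe.

```python
import numpy as np


def boxes_to_pixels(boxes, scores, threshold, height, width):
    """Keep boxes with score >= threshold; return [left, top, right, bottom] in pixels.

    Assumes `boxes` holds normalized [top, left, bottom, right] rows.
    """
    boxes = np.asarray(boxes, dtype=np.float32)
    scores = np.asarray(scores, dtype=np.float32)
    kept = boxes[scores >= threshold]
    # Reorder columns to [left, top, right, bottom].
    reordered = kept[:, [1, 0, 3, 2]]
    # Scale x coordinates by the width and y coordinates by the height.
    return reordered * np.array([width, height, width, height], dtype=np.float32)


# Made-up example values: two boxes, only the first passes the threshold.
example_boxes = [[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]]
example_scores = [0.9, 0.1]
print(boxes_to_pixels(example_boxes, example_scores, 0.3, 640, 640))
```

Filtering with a boolean mask before reordering avoids touching boxes that will be discarded anyway, which matters little for one image but keeps the pattern cheap for larger batches.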