# DVS Object Detection Benchmark Tutorial

This tutorial shows how the NeuroBench framework is organized and how you can use it to benchmark your own models.

## About DVS Object Detection:
Real-time object detection is a widely used computer vision task with applications in several domains, including robotics, autonomous driving, and surveillance. Examples include event cameras for smart home and surveillance systems, drones that monitor and track objects of interest, and self-driving cars that detect obstacles to ensure safe operation. Energy efficiency and real-time performance are crucial in such scenarios, particularly when the detector is deployed on low-power or always-on edge devices.

### Dataset:
The object detection benchmark uses the Prophesee 1 Megapixel Automotive Detection Dataset. This dataset was recorded with a high-resolution event camera with a 110-degree field of view mounted on a car windshield. The car was driven in various areas under different daytime weather conditions over several months. The dataset was labeled in a semi-automated way using the video stream of an additional RGB camera, resulting in over 25 million bounding boxes for seven object classes: pedestrian, two-wheeler, car, truck, bus, traffic sign, and traffic light. The labels are provided at a rate of 60 Hz, and the recording of 14.65 hours is split into 11.19, 2.21, and 2.25 hours for training, validation, and testing, respectively.

### Benchmark Task:
The task of object detection in event-based spatio-temporal data involves identifying bounding boxes of objects belonging to multiple predetermined classes in an event stream. Training for this task is performed offline based on the data splits provided by the original dataset.

Note: This benchmark relies on the [Prophesee Metavision software](https://docs.prophesee.ai/stable/index.html), which must be downloaded from Prophesee itself and is not included with the NeuroBench package.


First, we import the relevant libraries. These include the dataset, model wrapper, and benchmark object.

In [None]:
import torch

from neurobench.datasets import Gen4DetectionDataLoader
from neurobench.models import NeuroBenchModel
from neurobench.benchmarks import Benchmark

from metavision_ml.detection.anchors import Anchors
from metavision_ml.detection.rpn import BoxHead

For this tutorial, we use the baseline RED architecture ([https://arxiv.org/pdf/2009.13436.pdf](https://arxiv.org/pdf/2009.13436.pdf)), labelled as Vanilla, and a hybrid ANN-SNN conversion which replaces the recurrent convolutional layers with spiking neurons, labelled Vanilla_lif. The latter model is implemented using the SpikingJelly framework.

In [None]:
from examples.obj_detection.obj_det_model import Vanilla, Vanilla_lif

To get started, we will load our desired dataset in a dataloader:

In [None]:
# The dataloader alone takes about 7 minutes to load; with model evaluation and score calculation, about 20 minutes (i9-12900KF, RTX 3080)
test_set_dataloader = Gen4DetectionDataLoader(dataset_path="../../data/Gen 4 Multi channel", # data in repo root dir
        split="testing",
        batch_size=12,
        num_tbins=12,
        preprocess_function_name="multi_channel_timesurface",
        delta_t=50000,
        channels=6,  # multichannel six channels
        height=360,
        width=640,
        max_incr_per_pixel=5,
        class_selection=["pedestrian", "two wheeler", "car"],
        num_workers=2)
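
To get a feel for what these settings imply, the sketch below works out the time span of one sample and the size of one batch from the values in the call above. It assumes that `delta_t` is given in microseconds and that each batch is laid out as (batch, timestep, channels, height, width); the variable names are illustrative and not part of the NeuroBench or Metavision APIs.

```python
import math

# Values copied from the dataloader call above.
batch_size = 12
num_tbins = 12
delta_t_us = 50_000  # assumption: delta_t is expressed in microseconds
channels, height, width = 6, 360, 640

# Each sample stacks num_tbins consecutive windows of delta_t each.
sample_duration_ms = num_tbins * delta_t_us / 1_000

# Assumed batch layout: (batch, timestep, channels, height, width).
batch_shape = (batch_size, num_tbins, channels, height, width)
num_elements = math.prod(batch_shape)

print(f"one sample spans {sample_duration_ms:.0f} ms of events")
print(f"assumed batch shape: {batch_shape}")
print(f"elements per batch: {num_elements:,}")
```

Under these assumptions a single batch holds roughly 200 million values, which is one reason loading and evaluation take several minutes.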

For the models we want to benchmark, we need a wrapper. The wrapper inherits from the NeuroBenchModel base class and defines the `__init__`, `__call__`, and `__net__` functions. Note that the `__call__` function evaluates the whole inference pipeline and returns final predictions.

In [None]:
# Evaluation pipeline written and models trained by Shenqi Wang (wang69@imec.be) and Guangzhi Tang (guangzhi.tang@imec.nl) at imec.

class ObjDetectionModel(NeuroBenchModel):
    def __init__(self, net, box_coder, head):
        self.net = net
        self.box_coder = box_coder
        self.head = head

    def __call__(self, batch):
        self.net.eval()
        inputs = batch.permute(1, 0, 2, 3, 4).to(device='cuda') # dataloader supplies batch,timestep,*; model expects timestep,batch,*
        with torch.no_grad():
            feature = self.net(inputs)
            loc_preds_val, cls_preds_val = self.head(feature)
            scores = self.head.get_scores(cls_preds_val)
            scores = scores.to('cpu')
            for i, feat in enumerate(feature):
                feature[i] = feat.to('cpu')
            inputs = inputs.to('cpu')
            loc_preds_val = loc_preds_val.to('cpu')
            # use the box coder stored on the wrapper, not a global variable
            preds = self.box_coder.decode(feature, inputs, loc_preds_val, scores, batch_size=inputs.shape[1], score_thresh=0.05,
                        nms_thresh=0.5, max_boxes_per_input=500)
        return preds

    def __net__(self):
        # returns only network, not box_coder and head
        return self.net
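
The `score_thresh`, `nms_thresh`, and `max_boxes_per_input` arguments passed to `decode` suggest the usual detection post-processing: discard low-scoring boxes, then apply non-maximum suppression (NMS). The following is a minimal pure-Python sketch of generic greedy NMS under that assumption. It is not the Metavision implementation, and `iou` and `filter_and_nms` are hypothetical helper names introduced only for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, score_thresh=0.05, nms_thresh=0.5, max_boxes=500):
    """Generic greedy NMS: returns indices of kept boxes, highest score first."""
    # Step 1: drop boxes whose score is below the score threshold.
    candidates = [i for i, s in enumerate(scores) if s >= score_thresh]
    # Step 2: visit the remaining boxes from highest to lowest score.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in kept):
            kept.append(i)
        if len(kept) == max_boxes:
            break
    return kept


# Toy data: boxes 0 and 1 overlap heavily, box 3 has a very low score.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.01]
kept = filter_and_nms(boxes, scores)
print(kept)  # [0, 2]
```

In the toy data, box 1 is suppressed because its overlap with the higher-scoring box 0 exceeds `nms_thresh`, and box 3 is removed by the score threshold.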

Next, we load our model. This example includes two options: a hybrid model that combines artificial and spiking neurons, or a fully artificial neural network without spiking neurons.

In [None]:
# Benchmark the ANN or Hybrid model
mode = "hybrid" # "ann" or "hybrid"

In [None]:
# Loading the model
if mode == "ann":
    # baseline ANN RED architecture
    model = Vanilla(cin = 6, cout = 256, base = 16)
    box_coder = Anchors(num_levels=model.levels, anchor_list="PSEE_ANCHORS", variances=[0.1, 0.2])
    head = BoxHead(model.cout, box_coder.num_anchors, 3+1, 0)
    model = model.to('cuda')
    head = head.to('cuda')
    model.load_state_dict(torch.load('model_data/save_models/25_ann_model.pth', map_location=torch.device('cuda')))
    head.load_state_dict(torch.load('model_data/save_models/25_ann_pd.pth', map_location=torch.device('cuda')))
elif mode == "hybrid":
    # hybrid SNN of above architecture
    model = Vanilla_lif(cin = 6, cout = 256, base = 16)
    box_coder = Anchors(num_levels=model.levels, anchor_list="PSEE_ANCHORS", variances=[0.1, 0.2])
    head = BoxHead(model.cout, box_coder.num_anchors, 3+1, 0)
    model = model.to('cuda')
    head = head.to('cuda')
    model.load_state_dict(torch.load('model_data/save_models/14_hybrid_model.pth', map_location=torch.device('cuda')))
    head.load_state_dict(torch.load('model_data/save_models/14_hybrid_pd.pth', map_location=torch.device('cuda')))
else:
    raise ValueError("mode must be ann or hybrid")

model = ObjDetectionModel(model, box_coder, head)

No pre- or post-processors are needed for this benchmark task setup.

In [None]:
preprocessors = []
postprocessors = []

Next, specify the metrics you want to calculate.

Note that the Model Execution Rate metric is not returned by the framework but is reported by the user. It is the execution rate, in Hz, of the model computation, based on forward inference passes per second and measured in the time-stepped simulation timescale. For both the ANN and Hybrid models, raw event data is processed in non-overlapping 50 ms windows, so the execution rate is 20 Hz.
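
The 20 Hz figure follows directly from the window length. A minimal sketch of the arithmetic, assuming the `delta_t=50000` passed to the dataloader is in microseconds (the variable names are illustrative):

```python
delta_t_us = 50_000  # window length from the dataloader call; assumed to be microseconds

window_ms = delta_t_us / 1_000
# One forward pass per non-overlapping window, so rate = 1 / window length.
execution_rate_hz = 1_000_000 / delta_t_us

print(f"{window_ms:.0f} ms windows -> {execution_rate_hz:.0f} Hz")  # 50 ms windows -> 20 Hz
```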

In [None]:
static_metrics = ["footprint", "connection_sparsity"]
workload_metrics = ["activation_sparsity", "COCO_mAP", "synaptic_operations"]

Now you are ready to run the benchmark! The run may take a while (~1hr), as the event processing, model inference, and metric calculation are intensive.

In [None]:
benchmark = Benchmark(model, test_set_dataloader, preprocessors, postprocessors, [static_metrics, workload_metrics])
results = benchmark.run()
print(results)

Expected results:

Results - ANN
{'footprint': 91314912, 'connection_sparsity': 0.0, 'activation_sparsity': 0.6339577418819095, 'COCO_mAP': 0.4286601323956029, 'synaptic_operations': {'Effective_MACs': 248423062860.16266, 'Effective_ACs': 0.0, 'Dense': 284070730752.0}}

Results - Hybrid
{'footprint': 12133872, 'connection_sparsity': 0.0, 'activation_sparsity': 0.6130047485397788, 'COCO_mAP': 0.27111120859281557, 'synaptic_operations': {'Effective_MACs': 37520084211.538666, 'Effective_ACs': 559864693.7093333, 'Dense': 98513107968.0}}
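
To compare the two result sets, the sketch below derives a few ratios from the numbers above. It assumes that `footprint` is reported in bytes, and it adds effective MACs and ACs together as a rough operation count even though the two operation types differ in cost. `summarize` is a hypothetical helper, not part of NeuroBench.

```python
# Values copied from the expected results above.
ann = {
    "footprint": 91314912,
    "COCO_mAP": 0.4286601323956029,
    "synaptic_operations": {"Effective_MACs": 248423062860.16266,
                            "Effective_ACs": 0.0,
                            "Dense": 284070730752.0},
}
hybrid = {
    "footprint": 12133872,
    "COCO_mAP": 0.27111120859281557,
    "synaptic_operations": {"Effective_MACs": 37520084211.538666,
                            "Effective_ACs": 559864693.7093333,
                            "Dense": 98513107968.0},
}


def summarize(result):
    """Derive simple comparison figures from one results dictionary."""
    ops = result["synaptic_operations"]
    effective = ops["Effective_MACs"] + ops["Effective_ACs"]
    return {
        "footprint_MiB": result["footprint"] / 2**20,  # assumes footprint is in bytes
        "effective_over_dense": effective / ops["Dense"],
        "COCO_mAP": result["COCO_mAP"],
    }


ann_summary = summarize(ann)
hybrid_summary = summarize(hybrid)
footprint_ratio = ann["footprint"] / hybrid["footprint"]

for name, s in (("ANN", ann_summary), ("Hybrid", hybrid_summary)):
    print(f"{name}: {s['footprint_MiB']:.1f} MiB, "
          f"effective/dense = {s['effective_over_dense']:.3f}, "
          f"mAP = {s['COCO_mAP']:.3f}")
print(f"ANN footprint is {footprint_ratio:.1f}x the hybrid footprint")
```

Read this way, the hybrid model trades a lower COCO mAP for a much smaller footprint and fewer effective operations relative to its dense count.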