# DVS Gesture Benchmark Tutorial

This tutorial shows how the NeuroBench framework is organized and how you can use it to benchmark your own models!

## About DVS Gesture:
The IBM Dynamic Vision Sensor (DVS) Gesture dataset consists of recordings of 29 individuals performing 10 types of gestures, such as clapping and waving. An 11th class covers gestures that do not fit any of the first 10 classes. The gestures are recorded under four lighting conditions, and each recording is labeled with the lighting condition under which it was performed.

### Benchmark Task:
The task is to classify gestures with high accuracy. This tutorial demonstrates the benchmark using a pre-trained convolutional spiking neural network (SNN).

First, we will import the relevant libraries. We will use the [Tonic library](https://tonic.readthedocs.io/en/latest/) for loading and pre-processing the data, and the model wrapper, post-processor, and benchmark object from NeuroBench.

In [None]:
# Tonic library is used for DVS Gesture dataset loading and processing
import tonic
import tonic.transforms as transforms
from torch.utils.data import DataLoader

from neurobench.models import SNNTorchModel
from neurobench.postprocessing import choose_max_count
from neurobench.benchmarks import Benchmark

For this tutorial, we will use a four-layer convolutional SNN (three convolutional layers followed by a linear output layer), written using snnTorch.

In [None]:
import torch
import torch.nn as nn
import snntorch as snn
from snntorch import surrogate

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()

        # Hyperparameters
        beta_1 = 0.9999903192467171
        beta_2 = 0.7291118090686332
        beta_3 = 0.9364650136740154
        beta_4 = 0.8348241794080301
        threshold_1 = 3.511291184386264
        threshold_2 = 3.494437965584431
        threshold_3 = 1.5986853560315544
        threshold_4 = 0.3641469130041378
        spike_grad = surrogate.atan()
        dropout = 0.5956071342984011
        
        # Initialize layers
        self.conv1 = nn.Conv2d(2, 16, 5, padding="same")
        self.pool1 = nn.MaxPool2d(2)
        self.lif1 = snn.Leaky(beta=beta_1, threshold=threshold_1, spike_grad=spike_grad, init_hidden=True)
        
        self.conv2 = nn.Conv2d(16, 32, 5, padding="same")
        self.pool2 = nn.MaxPool2d(2)
        self.lif2 = snn.Leaky(beta=beta_2, threshold=threshold_2, spike_grad=spike_grad, init_hidden=True)
        
        self.conv3 = nn.Conv2d(32, 64, 5, padding="same")
        self.pool3 = nn.MaxPool2d(2)
        self.lif3 = snn.Leaky(beta=beta_3, threshold=threshold_3, spike_grad=spike_grad, init_hidden=True)
        
        self.linear1 = nn.Linear(64*4*4, 11)
        self.dropout_4 = nn.Dropout(dropout)
        self.lif4 = snn.Leaky(beta=beta_4, threshold=threshold_4, spike_grad=spike_grad, init_hidden=True, output=True)

    def forward(self, x):
        # x is expected to be in shape (batch, channels, height, width) = (B, 2, 32, 32)
        
        # Layer 1
        y = self.conv1(x)
        y = self.pool1(y)
        spk1 = self.lif1(y)

        # Layer 2
        y = self.conv2(spk1)
        y = self.pool2(y)
        spk2 = self.lif2(y)

        # Layer 3
        y = self.conv3(spk2)
        y = self.pool3(y)
        spk3 = self.lif3(y)

        # Layer 4
        y = self.linear1(spk3.flatten(1))
        y = self.dropout_4(y)
        spk4, mem4 = self.lif4(y)

        return spk4, mem4
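
The input size of `linear1` (`64*4*4`) follows from the layer shapes in `Net`. As a minimal sketch in plain Python (the helpers `same_conv` and `max_pool` are hypothetical names used only for this illustration, not part of snnTorch or NeuroBench), the shapes can be traced by hand: each `padding="same"` convolution keeps the spatial size, and each `MaxPool2d(2)` halves it.

```python
def same_conv(shape, out_channels):
    """Shape after a stride-1 convolution with padding="same": only channels change."""
    _, height, width = shape
    return (out_channels, height, width)


def max_pool(shape, kernel):
    """Shape after max pooling with kernel size == stride == kernel."""
    channels, height, width = shape
    return (channels, height // kernel, width // kernel)


# Per-timestep input: 2 polarity channels on a 32x32 grid.
shape = (2, 32, 32)
for out_channels in (16, 32, 64):
    shape = max_pool(same_conv(shape, out_channels), 2)
    print(shape)

# Number of features fed to the linear layer after flattening.
flat_features = shape[0] * shape[1] * shape[2]
print(flat_features)
```

The spatial size goes 32, 16, 8, 4, so the flattened output of the third layer has 64 channels of 4x4 values, matching `nn.Linear(64*4*4, 11)`.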

We load a pre-trained model. The model is wrapped in the SNNTorchModel wrapper, which includes boilerplate inference code and interfaces with the top-level Benchmark class.

In [None]:
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

net = Net()
net.load_state_dict(torch.load("model_data/dvs_gesture_snn", map_location=device))

model = SNNTorchModel(net)

Next, we will load the dataset. Here, we are using the DVSGesture dataset from the Tonic library, as well as transforms to turn the events into frames that can be processed.

In [None]:
# Load the dataset, here we are using the Tonic library
data_dir = "../../../data/dvs_gesture" # data in repo root dir
test_transform = transforms.Compose([transforms.Denoise(filter_time=10000),
                                     transforms.Downsample(spatial_factor=0.25),
                                     transforms.ToFrame(sensor_size=(32, 32, 2),
                                                        n_time_bins=150),
                                    ])
test_set = tonic.datasets.DVSGesture(save_to=data_dir, transform=test_transform, train=False)
test_set_loader = DataLoader(test_set, batch_size=16,
                         collate_fn=tonic.collation.PadTensors(batch_first=True))
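
To make the effect of the transforms more concrete, here is a simplified pure-Python sketch of spatial downsampling and time binning. It is not Tonic's implementation: the helpers `downsample_events` and `events_to_frames` and the `(x, y, polarity, t)` tuple layout are assumptions made for this illustration only.

```python
def downsample_events(events, spatial_factor):
    """Scale event coordinates, e.g. 128x128 -> 32x32 for spatial_factor=0.25."""
    return [(int(x * spatial_factor), int(y * spatial_factor), p, t)
            for x, y, p, t in events]


def events_to_frames(events, sensor_size, n_time_bins):
    """Count events per pixel and polarity in n_time_bins equal time slices.

    Returns a nested list indexed as frames[bin][polarity][y][x].
    """
    width, height, n_polarities = sensor_size
    frames = [[[[0] * width for _ in range(height)]
               for _ in range(n_polarities)]
              for _ in range(n_time_bins)]
    if not events:
        return frames
    t_start = min(e[3] for e in events)
    span = max(max(e[3] for e in events) - t_start, 1)
    for x, y, p, t in events:
        # Map the timestamp to a bin; the latest event is clamped into the last bin.
        b = min((t - t_start) * n_time_bins // span, n_time_bins - 1)
        frames[b][p][y][x] += 1
    return frames


# Three toy events on a 128x128 sensor: (x, y, polarity, timestamp in microseconds).
events = [(0, 0, 1, 0), (127, 127, 0, 500), (64, 32, 1, 1000)]
small = downsample_events(events, 0.25)
frames = events_to_frames(small, sensor_size=(32, 32, 2), n_time_bins=4)
print(len(frames), len(frames[0]), len(frames[0][0]), len(frames[0][0][0]))
```

With the settings used in this tutorial, each sample becomes a sequence of 150 frames with 2 polarity channels on a 32x32 grid, which is the per-timestep input shape `Net.forward` expects.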

In [None]:
preprocessors = []
postprocessors = [choose_max_count]
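
No pre-processors are needed here because the Tonic transforms already produce frames. The post-processor turns output spikes into class predictions. As a rough sketch of the idea, assuming `choose_max_count` selects the class whose output neuron fired most often over time (the function `max_count_prediction` below is a hypothetical stand-in written for this illustration, not NeuroBench's code):

```python
def max_count_prediction(spikes):
    """Pick, per sample, the class with the most output spikes.

    spikes: nested list indexed as spikes[sample][timestep][class], values 0 or 1.
    """
    predictions = []
    for sample in spikes:
        n_classes = len(sample[0])
        # Total spike count of each output neuron over all timesteps.
        counts = [sum(step[c] for step in sample) for c in range(n_classes)]
        predictions.append(max(range(n_classes), key=counts.__getitem__))
    return predictions


# Two samples, three timesteps, three classes.
toy_spikes = [
    [[0, 1, 0], [0, 1, 1], [1, 0, 0]],  # class 1 fires twice
    [[0, 0, 1], [0, 0, 1], [0, 1, 0]],  # class 2 fires twice
]
predictions = max_count_prediction(toy_spikes)
print(predictions)
```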

Next, specify the metrics you want to calculate. The metrics include static metrics, which are computed before any model inference, and workload metrics, which are measured during inference.

- Footprint: Bytes used to store the model parameters and buffers.
- Connection sparsity: Proportion of zero weights in the model.
- Classification accuracy: Accuracy of gesture predictions.
- Activation sparsity: Proportion of zero activations, averaged over all neurons, timesteps, and samples.
- Synaptic operations: Number of weight-activation operations, averaged over test samples.
  - Effective MACs: Number of non-zero multiply-accumulate synops, where the activations are not spikes with values -1 or 1.
  - Effective ACs: Number of non-zero accumulate synops, where the activations are -1 or 1 only.
  - Dense: Total zero and non-zero synops.
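
As a rough cross-check of the footprint metric, the parameter storage of `Net` can be estimated by hand. This sketch assumes every weight and bias is stored as a 32-bit float (4 bytes); the dictionary `layers` is just a hand-written summary of the layer sizes defined in `Net`.

```python
BYTES_PER_FLOAT32 = 4  # assumption: parameters are stored as 32-bit floats

# (number of weights, number of biases) per trainable layer of Net
layers = {
    "conv1": (2 * 16 * 5 * 5, 16),
    "conv2": (16 * 32 * 5 * 5, 32),
    "conv3": (32 * 64 * 5 * 5, 64),
    "linear1": (64 * 4 * 4 * 11, 11),
}

n_params = sum(weights + biases for weights, biases in layers.values())
param_bytes = n_params * BYTES_PER_FLOAT32
print(n_params, param_bytes)
```

This gives 76,187 parameters, or 304,748 bytes. The footprint in the expected output at the end of this tutorial is 304,828 bytes; the remaining 80 bytes presumably come from buffers held by the `snn.Leaky` layers, since the metric counts buffers as well as parameters, though this breakdown has not been verified here.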

In [None]:
static_metrics = ["footprint", "connection_sparsity"]
workload_metrics = ["classification_accuracy", "activation_sparsity", "synaptic_operations"]

Next, we instantiate the benchmark. We pass the model, the dataloader, the preprocessors, the postprocessors, and the lists of static and workload metrics that we want to measure:

In [None]:
benchmark = Benchmark(model, test_set_loader, preprocessors, postprocessors, [static_metrics, workload_metrics])

Now, let's run the benchmark and print our results!

In [None]:
results = benchmark.run()
print(results)

Expected output:
{'footprint': 304828, 'connection_sparsity': 0.0, 
'classification_accuracy': 0.8636363636363633, 'activation_sparsity': 0.9507192967815323, 
'synaptic_operations': {'Effective_MACs': 9227011.575757576, 'Effective_ACs': 30564577.174242426, 'Dense': 891206400.0}}
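
One way to read the synaptic operation numbers is to compare the effective operations against the dense count. The snippet below is a small sketch that copies the values from the expected output above; the variable names are chosen for this illustration.

```python
# Values copied from the expected output above.
results = {
    "footprint": 304828,
    "connection_sparsity": 0.0,
    "classification_accuracy": 0.8636363636363633,
    "activation_sparsity": 0.9507192967815323,
    "synaptic_operations": {
        "Effective_MACs": 9227011.575757576,
        "Effective_ACs": 30564577.174242426,
        "Dense": 891206400.0,
    },
}

synops = results["synaptic_operations"]
effective = synops["Effective_MACs"] + synops["Effective_ACs"]

# Share of the dense operations that involve a non-zero activation.
effective_fraction = effective / synops["Dense"]
# Share of the effective operations that are accumulates rather than MACs.
ac_share = synops["Effective_ACs"] / effective

print(f"Effective / dense synops: {effective_fraction:.3f}")
print(f"ACs among effective synops: {ac_share:.3f}")
```

Only about 4.5% of the dense operations are effective, which is in line with the high activation sparsity of roughly 95% reported in the same output.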