# PilotNet SDNN Example

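This tutorial demonstrates how to use __lava__ to perform inference on a PilotNet SDNN. The network can run on both CPU and Loihi 2 neurocores; the runtime, power, and energy measurements in this notebook require Loihi 2 hardware.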

![PilotNet Inference](images/pilotnet_sdnn.PNG)

The network receives video input, recorded from a dashboard camera of a driving car (__Dataloader__). The data is encoded efficiently as the difference between individual frames (__Encoder__). The data passes through the PilotNet SDNN, which was trained with __lava-dl__ and is built using its __Network Exchange__ module (netx.hdf5.Network), which automatically generates a Lava process from the training artifact. The network estimates the angle of the steering wheel of the car, which is decoded from the network's raw output (__Decoder__) and sent to a visualization (__Monitor__) and logging system (__Logger__).
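
To make the frame-difference encoding concrete, here is a minimal pure-Python sketch. It is not Lava's `DeltaEncoder`: the function names, the `threshold` parameter, and the dictionary message format are assumptions chosen for illustration. It shows only the idea that a pixel is transmitted when it changes relative to what the receiver already knows.

```python
def delta_encode(frames, threshold=0):
    """Encode flattened frames as sparse frame-to-frame differences.

    Hypothetical helper for illustration. Each message maps a pixel
    index to the change that must be applied at the receiver.
    """
    reference = [0] * len(frames[0])  # state the receiver has reconstructed
    messages = []
    for frame in frames:
        message = {}
        for i, value in enumerate(frame):
            delta = value - reference[i]
            if abs(delta) > threshold:  # send only significant changes
                message[i] = delta
                reference[i] += delta
        messages.append(message)
    return messages


def delta_decode(messages, num_pixels):
    """Rebuild the frames by accumulating the received differences."""
    state = [0] * num_pixels
    frames = []
    for message in messages:
        for i, delta in message.items():
            state[i] += delta
        frames.append(list(state))
    return frames


frames = [[10, 10, 10, 10], [10, 12, 10, 10], [10, 12, 10, 9]]
messages = delta_encode(frames)
decoded = delta_decode(messages, num_pixels=4)
print(messages)
print(decoded == frames)
```

After the first frame, each message carries a single entry because only one pixel changes per frame. That sparsity is what makes the encoding efficient for slowly changing video.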

The core of the tutorial is lava-dl's Network Exchange module, which is available as `lava.lib.dl.netx.{hdf5, blocks, utils}`.
* `hdf5` implements automatic network generation.
* `blocks` implements individual layer blocks.
* `utils` implements hdf5 reading utilities. 

The tutorial also demonstrates how different Lava processes can be connected to interact in real time, even when the underlying processes run on different backends, including Loihi 2.

Switching between Loihi 2 hardware and CPU simulation is as simple as changing the run configuration settings.

In [1]:
import logging
import numpy as np
import matplotlib.pyplot as plt

from lava.magma.core.run_configs import Loihi2HwCfg
from lava.magma.core.run_conditions import RunSteps
from lava.proc import io

from lava.lib.dl import netx
from dataset import PilotNetDataset
from utils import PilotNetEncoder, PilotNetNxEncoderModel, get_input_transform

## Import modules for Loihi 2 execution

Check whether the Loihi 2 compiler is available and import the related modules.

In [2]:
from lava.utils.system import Loihi2
Loihi2.preferred_partition = 'kp_stack'
loihi2_is_available = Loihi2.is_loihi2_available

# To remove
import os
# os.environ['BOARD'] = 'ncl-og-01'

if loihi2_is_available:
    # TODO: remove lava.utils.profiler in lava-nc
    from lava.utils import n3_profiler
    print(f'Running on {Loihi2.partition}')
    compression = io.encoder.Compression.DELTA_SPARSE_8
else:
    print("Loihi2 compiler is not available in this system. "
          "This tutorial will execute on CPU backend.")
    raise RuntimeError('Energy benchmarking is not supported in CPU execution.')



## Create network block

PilotNet SDNN is described by the hdf5 file interface `network.net`, which is exported after training. The training tutorial that trains the network and exports the hdf5 file interface is available at [`tutorials/lava/lib/dl/slayer/pilotnet/train.ipynb`](https://github.com/lava-nc/lava-dl/blob/main/tutorials/lava/lib/dl/slayer/pilotnet/train.ipynb).

A network block can be created by simply instantiating `netx.hdf5.Network` with the path of the desired hdf5 network description file.
* The input layer is accessible as `net.in_layer`.
* The output layer is accessible as `net.out_layer`.
* All the constituent layers are accessible as a list: `net.layers`.

![PilotNet Inference](images/pilotnet_sdnn_network.PNG)

In [None]:
num_steps = 2000
net = netx.hdf5.Network(net_config='network.net', skip_layers=1)
print(net)

## Create Dataset instance
The dataset class is typically written or supplied by the user. Here it is `PilotNetDataset`, imported from the local `dataset` module.

In [None]:
transform = get_input_transform(net.net_config)
full_set = PilotNetDataset(
    path='../data',
    size=net.inp.shape[:2],
    transform=transform,  # input transform
    visualize=True,  # visualize ensures the images are returned in sequence
    sample_offset=10550,
)

## Create Dataloader
The dataloader process reads data from the dataset object and sends out the input frame and ground truth as spikes. The input encoder then compresses the frames before they enter the network.

![PilotNet Inference](images/pilotnet_sdnn_dataloader.PNG)

In [None]:
dataloader = io.dataloader.SpikeDataloader(dataset=full_set)
input_encoder = PilotNetEncoder(shape=net.inp.shape,
                                net_config=net.net_config,
                                compression=compression)

In [None]:
dataloader.s_out.connect(input_encoder.inp)
input_encoder.out.connect(net.inp)

## Run the network


In [None]:
power_logger = n3_profiler.Loihi2Power(num_steps=num_steps)
runtime_logger = n3_profiler.Loihi2Runtime()
memory_logger = n3_profiler.Loihi2Memory()
activity_logger = n3_profiler.Loihi2Activity()

pre_run_fxs = [
    lambda b: power_logger.attach(b),
    lambda b: runtime_logger.attach(b),
    # lambda b: memory_logger.attach(b),
    # lambda b: activity_logger.attach(b),
]
post_run_fxs = [
    lambda b: power_logger.get_results(),
    lambda b: runtime_logger.get_results(),
    # lambda b: memory_logger.get_results(),
    # lambda b: activity_logger.get_results(),
]
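
The hook lists above are plain lists of callables that each receive the board handle. The following sketch illustrates that pattern only. It is not Lava's implementation; `FakeBoard`, `FakeProfiler`, and `run_with_hooks` are hypothetical names introduced here.

```python
class FakeBoard:
    """Hypothetical stand-in for the hardware board handle."""

    def __init__(self):
        self.probes = []


class FakeProfiler:
    """Hypothetical profiler exposing attach() and get_results()."""

    def __init__(self, name):
        self.name = name
        self.results = None

    def attach(self, board):
        # Register this profiler on the board before execution starts.
        board.probes.append(self.name)

    def get_results(self):
        # Collect measurements once execution has finished.
        self.results = f'{self.name} results collected'


def run_with_hooks(board, pre_run_fxs, post_run_fxs):
    """Call every pre-run hook, then every post-run hook, in order."""
    for fx in pre_run_fxs:
        fx(board)
    # ... the actual network execution would happen here ...
    for fx in post_run_fxs:
        fx(board)


power = FakeProfiler('power')
runtime = FakeProfiler('runtime')
board = FakeBoard()
run_with_hooks(
    board,
    pre_run_fxs=[lambda b: power.attach(b), lambda b: runtime.attach(b)],
    post_run_fxs=[lambda b: power.get_results(),
                  lambda b: runtime.get_results()],
)
print(board.probes)
print(power.results)
```

Wrapping each call in a `lambda` keeps the hook signature uniform (one board argument), even when the wrapped method, such as `get_results`, ignores the board.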

In [None]:
exception_proc_model_map = {
    io.encoder.DeltaEncoder: io.encoder.PyDeltaEncoderModelSparse,
    PilotNetEncoder: PilotNetNxEncoderModel,
}
run_config = Loihi2HwCfg(exception_proc_model_map=exception_proc_model_map,
                         pre_run_fxs=pre_run_fxs,
                         post_run_fxs=post_run_fxs)
net._log_config.level = logging.INFO
net.run(condition=RunSteps(num_steps=num_steps), run_cfg=run_config)
net.stop()

In [None]:
# runtime measurements
inference_rate = 1e6 / runtime_logger.time_per_step
total_inference_time = num_steps * runtime_logger.time_per_step * 1e-6
print(f'Throughput : {inference_rate:.2f} fps.')


In [None]:
runtime_logger.time_per_step

In [None]:
# power measurements
time_stamp = power_logger.time_stamp
vdd_p = power_logger.vdd  # neurocore power
vddm_p = power_logger.vddm  # memory power
vddio_p = power_logger.vddio  # IO power
total_power = vdd_p + vddm_p + vddio_p
num_measurements = len(vdd_p)
time = time_stamp * 2 * total_inference_time / time_stamp.max()

num_chips = 1
if Loihi2.partition in ['kp', 'kp_stack']:
    num_chips = 8

# per chip static power
static_total_power = np.mean(total_power[num_measurements // 2:]) / num_chips
static_vdd_p = np.mean(vdd_p[num_measurements // 2:]) / num_chips
static_vddm_p = np.mean(vddm_p[num_measurements // 2:]) / num_chips
static_vddio_p = np.mean(vddio_p[num_measurements // 2:]) / num_chips

# compensate for the static power of the additional chips
total_power -= (num_chips - 1) * static_total_power
vdd_p -= (num_chips - 1) * static_vdd_p
vddm_p -= (num_chips - 1) * static_vddm_p
vddio_p -= (num_chips - 1) * static_vddio_p

from scipy import signal
fig, ax = plt.subplots()
ax.plot(signal.medfilt(total_power, 51), label='Total Power')
ax.plot(signal.medfilt(vdd_p, 51), label='VDD Power')
ax.plot(signal.medfilt(vddm_p, 51), label='VDD-M Power')
ax.plot(signal.medfilt(vddio_p, 51), label='VDD-IO Power')
ax.axvspan(0, num_measurements // 2, color='green', alpha=0.1)
ax.set_ylabel('Power (W)')
ax.set_xticks([])
ax.legend()
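
The static power compensation in the cell above can be checked on synthetic numbers. The sketch below assumes, as the compensation step implies, that the network occupies a single chip while the remaining chips on the board contribute only static power. All power values are made up for illustration.

```python
from statistics import mean

num_chips = 8            # chips on the board, as for the 'kp' partitions
static_per_chip = 0.25   # W, synthetic value
dynamic = 0.5            # W, synthetic value, drawn by the one active chip

# First half: network running. Second half: board idle.
running = [num_chips * static_per_chip + dynamic] * 4
idle = [num_chips * static_per_chip] * 4
total_power = running + idle

half = len(total_power) // 2

# Per-chip static power, estimated from the idle half of the trace.
static_estimate = mean(total_power[half:]) / num_chips

# Remove the static power of the chips that do not host the network.
compensated = [p - (num_chips - 1) * static_estimate for p in total_power]

running_mean = mean(compensated[:half])
dynamic_estimate = running_mean - static_estimate
print(static_estimate, running_mean, dynamic_estimate)
```

With these inputs the procedure recovers the per-chip static power and the dynamic power that were used to build the trace.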

In [None]:
# The first half of the power measurements is taken while the network is running;
# the second half is taken after the board has finished executing.
total_power_mean = np.mean(total_power[:num_measurements // 2])
vdd_p_mean = np.mean(vdd_p[:num_measurements // 2])
vddm_p_mean = np.mean(vddm_p[:num_measurements // 2])
vddio_p_mean = np.mean(vddio_p[:num_measurements // 2])
print(f'Total Power   : {total_power_mean:.6f} W')
print(f'Dynamic Power : {total_power_mean - static_total_power:.6f} W')
print(f'Static Power  : {static_total_power:.6f} W')
print(f'VDD Power     : {vdd_p_mean:.6f} W')
print(f'VDD-M Power   : {vddm_p_mean:.6f} W')
print(f'VDD-IO Power  : {vddio_p_mean:.6f} W')

In [None]:
total_energy = total_power_mean / inference_rate
dynamic_energy = (total_power_mean - static_total_power) / inference_rate
print(f'Total Energy per inference   : {total_energy * 1e3:.6f} mJ')
print(f'Dynamic Energy per inference : {dynamic_energy * 1e3:.6f} mJ')

In [None]:
'''@2000 steps
Throughput : 34.72 fps.

Total Power   : 0.247882 W
Dynamic Power : 0.003026 W
Static Power  : 0.244857 W
VDD Power     : 0.100399 W
VDD-M Power   : 0.145534 W
VDD-IO Power  : 0.001950 W

Total Energy per inference   : 7.138505 mJ
Dynamic Energy per inference : 0.087130 mJ
'''
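
As a consistency check, the energy figures can be recomputed from the recorded throughput and power, using energy per inference = power / throughput. The inputs below are the rounded values printed above, so the results match the recorded energies only to within rounding.

```python
# Values copied from the recorded results (rounded as printed).
throughput_fps = 34.72
total_power_w = 0.247882
static_power_w = 0.244857

dynamic_power_w = total_power_w - static_power_w

# Energy per inference in millijoules: power (W) / rate (1/s) * 1e3.
total_energy_mj = total_power_w / throughput_fps * 1e3
dynamic_energy_mj = dynamic_power_w / throughput_fps * 1e3

# Time per step in microseconds, the inverse of the throughput.
time_per_step_us = 1e6 / throughput_fps

print(f'Total Energy per inference   : {total_energy_mj:.3f} mJ')
print(f'Dynamic Energy per inference : {dynamic_energy_mj:.4f} mJ')
print(f'Time per step                : {time_per_step_us:.0f} us')
```

Static power dominates: the dynamic contribution is roughly 1% of the total energy per inference.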