In [None]:
%matplotlib inline


# MicroTVM Host-Driven Example

**Author**: [Andrew Reusch](mailto:areusch@octoml.ai)

**Changes for Nucleo STM32L496ZG**: `Max Sponner`



TVMConf 2020

This tutorial walks you through the process of deploying a model on-device using microTVM.
We'll use a model adapted from ARM's pre-quantized [CIFAR10-CNN tutorial](https://github.com/ARM-software/ML-examples/tree/master/cmsisnn-cifar10/models) and run it on an ARM M-class microcontroller.

Importing the Model
-------------------

We'll use the utilities in the microTVM blog post repo to import and load the model:



In [None]:
import os

import tvm
import tvm.relay

import micro_eval
from micro_eval import model

microtvm_blogpost_path = os.path.realpath(os.path.join(micro_eval.__file__, '..', '..', '..'))
config_path = os.path.join(microtvm_blogpost_path, 'data', 'cifar10-config-validate.json')
model_inst, _ = model.instantiate_from_spec(f'cifar10_cnn:micro_dev:{config_path}')

compiled_model = model_inst.build_model()

relay_model, params = compiled_model.ir_mod, compiled_model.params

Great! We now have a Relay model and accompanying parameters. Let's take a look:



In [None]:
print(relay_model)

And here are the parameters, listed by name and shape:



In [None]:
print('Parameters')
for k, v in params.items():
  print(f' * {k}: {v.shape}')

Compiling the Model
-------------------

Let's now run the TVM compiler. This step lowers the model from Relay through TIR to C source code.
First, we need to define the target we will use with TVM:



In [None]:
TARGET = tvm.target.target.create('c -keys=arm_cpu -mcpu=cortex-m4 -link-params -model=stm32l496xx -runtime=c -system-lib=1')

This target has a few parts:

 - ```-keys=arm_cpu```: Enables the operator schedules used on ARM CPUs.
 - ```-mcpu=cortex-m4```: Specifies the CPU we will use with this model.
 - ```-link-params```: Links the supplied model parameters as constants in the generated code.
 - ```-model=stm32l496xx```: Tells the compiler which CPU model is targeted. Mostly unused at this time.
 - ```-runtime=c```: Builds code for the TVM C runtime (i.e. the bare-metal compatible one).
 - ```-system-lib=1```: Builds a "system library." In deployments, the system library is pre-loaded into
   the runtime, rather than being a library that needs to be loaded, e.g. from a file. This is the simplest
   configuration for a bare-metal microcontroller, so we use it here.
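
To make the structure of the target string easier to see, here is a minimal sketch that splits it into the target kind and its options. It uses plain Python string handling only. The helper `split_target` is hypothetical and is not TVM's own parser, which may treat some flags differently.

```python
# Illustrative only: a plain-Python split of the target string, not TVM's parser.
TARGET_STR = ('c -keys=arm_cpu -mcpu=cortex-m4 -link-params '
              '-model=stm32l496xx -runtime=c -system-lib=1')


def split_target(target_str):
    """Split a target string into its kind and a dict of options."""
    kind, *flags = target_str.split()
    options = {}
    for flag in flags:
        name, _, value = flag.lstrip('-').partition('=')
        # Flags given without a value (e.g. -link-params) are treated as enabled.
        options[name] = value if value else True
    return kind, options


kind, options = split_target(TARGET_STR)
print('kind:', kind)
for name, value in options.items():
    print(f' * {name}: {value}')
```

The first token (`c`) is the target kind, which selects the C source backend. Everything after it is an option described in the list above.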

Now we can run the compiler:



In [None]:
with tvm.transform.PassContext(opt_level=3, config={'tir.disable_vectorize': True}):
  graph_json, compiled_model, simplified_params = tvm.relay.build(
    relay_model, target=TARGET, params=params)

Now we've lowered our model into C. Let's look at the first 800 lines of the generated source:



In [None]:
print('\n'.join(compiled_model.imported_modules[0].get_source().split('\n')[:800]))

We can also look at the generated FuncRegistry:

In [None]:
print(compiled_model.get_source())

Let's also look at the simplified parameters and the graph JSON:



In [None]:
print('Simplified Parameters')
for k, v in simplified_params.items():
  print(f' * {k}: {v.shape}')

print(graph_json)
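
Because `-link-params` places the parameters in the generated code as constants, their total size adds to the firmware image. The sketch below estimates that size from shapes and dtypes. The entries in `example_params` are hypothetical placeholders standing in for `simplified_params`, and `param_nbytes` is a helper introduced here for illustration only.

```python
import math

# Hypothetical (name -> (shape, dtype)) entries standing in for `simplified_params`.
# The real names, shapes and dtypes are the ones printed by the loop above.
example_params = {
    'p0': ((32, 3, 5, 5), 'int8'),
    'p1': ((32,), 'int32'),
}

# Bytes per element for the dtypes used in this sketch.
ITEMSIZE = {'int8': 1, 'int16': 2, 'int32': 4, 'float32': 4}


def param_nbytes(shape, dtype):
    """Number of bytes needed to store a dense tensor of this shape and dtype."""
    return math.prod(shape) * ITEMSIZE[dtype]


total = 0
for name, (shape, dtype) in example_params.items():
    nbytes = param_nbytes(shape, dtype)
    total += nbytes
    print(f' * {name}: {shape} {dtype} -> {nbytes} bytes')
print(f'total: {total} bytes')
```

With the real parameters, converting each value to a NumPy array and reading its `nbytes` attribute should give the same per-tensor figure without a hand-written dtype table.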

Building a host-driven binary
-----------------------------

First we'll build a firmware binary that can be controlled using an attached host computer over
UART. This is a great way to try out the network while writing minimal firmware, and it's also
how autotuning is accomplished. We'll re-use the compilation flow we use with autotuning:



In [None]:
from tvm.micro.contrib import zephyr
opts = model_inst.get_micro_compiler_opts()
opts['lib_opts']['cmake_args'] = ['-DCMAKE_VERBOSE_MAKEFILE=1']

# Instantiate the compiler.
compiler = zephyr.ZephyrCompiler(os.path.join(microtvm_blogpost_path, 'runtimes', 'zephyr'),
                                 board='nucleo_l496zg',
                                 zephyr_toolchain_variant='zephyr')

# A Workspace is a directory that holds compiled libraries.
workspace = tvm.micro.Workspace(debug=True)

# Build the micro-binary, which represents the final firmware image.
micro_bin = tvm.micro.build_static_runtime(workspace, compiler, compiled_model, **opts)

In [None]:
print(os.path.join(micro_bin.base_dir, micro_bin.binary_file))
!~/zephyr-sdk/arm-zephyr-eabi/bin/arm-zephyr-eabi-size {os.path.join(micro_bin.base_dir, micro_bin.binary_file)}
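
The `size` tool reports the `text`, `data` and `bss` section sizes of the binary. A rough way to read them is flash usage = `text` + `data` and static RAM usage = `data` + `bss`. The sketch below applies that to a sample in the tool's default (Berkeley) format. The numbers in `SIZE_OUTPUT` are placeholders and not measurements from this build. The budgets assume 1 MiB of flash and 320 KiB of SRAM for the STM32L496ZG, which should be checked against the datasheet and the Zephyr board configuration.

```python
# Hypothetical output in the default (Berkeley) format of `size`; the numbers
# are placeholders, not measurements from this build.
SIZE_OUTPUT = """\
   text    data     bss     dec     hex filename
 150000    2000   60000  212000   33c20 zephyr.elf
"""

# Assumed budgets for the STM32L496ZG; verify against the datasheet.
FLASH_BYTES = 1024 * 1024
RAM_BYTES = 320 * 1024


def parse_size(output):
    """Return (text, data, bss) from the first data row of Berkeley-format output."""
    header, row = output.strip().splitlines()[:2]
    fields = dict(zip(header.split(), row.split()))
    return int(fields['text']), int(fields['data']), int(fields['bss'])


text, data, bss = parse_size(SIZE_OUTPUT)
flash_used = text + data  # code and constants, plus initial values for .data
ram_used = data + bss     # statically allocated RAM as reported by `size`
print(f'flash: {flash_used} / {FLASH_BYTES} bytes ({100 * flash_used / FLASH_BYTES:.1f}%)')
print(f'ram:   {ram_used} / {RAM_BYTES} bytes ({100 * ram_used / RAM_BYTES:.1f}%)')
```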

Generating test data
--------------------

Now we'll generate test data to demonstrate inference, using the dataset generator from the
microTVM blog post repo:



In [None]:
from micro_eval import dataset
dataset_gen = dataset.DatasetGenerator.instantiate('cifar10', {'shuffle': False})

samples = dataset_gen.generate(1)

# Adapt samples as needed to accommodate the modified input shape.
inputs = model_inst.adapt_sample_inputs(samples[0].inputs)

In [None]:
print(inputs['data'].data)

Flashing and Running
--------------------

Now we'll flash the binary onto an attached development board and establish communication.



In [None]:
with tvm.micro.Session(binary=micro_bin, flasher=compiler.flasher()) as sess:
  mod = tvm.micro.create_local_graph_runtime(graph_json, sess.get_system_lib(), sess.context)
  mod.set_input('data', inputs['data'].data)  # NOTE: the simplified params are set from flash.
  mod.run()

  micro_output = mod.get_output(0).asnumpy()

print('micro:', micro_output)

Checking our work
-----------------

To check the on-device result, we build the same model for the host CPU, run it on the same sample, and compare the two outputs.



In [None]:
model_inst, _ = model.instantiate_from_spec(f'cifar10_cnn:cpu:{config_path}')

compiled_model = model_inst.build_model()

cpu_relay_model, cpu_params = compiled_model.ir_mod, compiled_model.params
cpu_inputs = model_inst.adapt_sample_inputs(samples[0].inputs)

with tvm.transform.PassContext(opt_level=3, disabled_pass={"AlterOpLayout"}):
  cpu_graph_json, cpu_mod, cpu_simplified_params = tvm.relay.build(
    cpu_relay_model, target="llvm", params=cpu_params)

graph_mod = tvm.contrib.graph_runtime.create(cpu_graph_json, cpu_mod, tvm.cpu(0))
graph_mod.set_input('data', cpu_inputs['data'].data, **cpu_simplified_params)
graph_mod.run()
cpu_output = graph_mod.get_output(0).asnumpy()  # NumPy array, like micro_output

print('cpu:', cpu_output)
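
Comparing the two printouts by eye works for a single sample. A small helper can make the check explicit. The sketch below is a minimal example under stated assumptions: `compare_outputs` is a hypothetical helper, and the two example vectors are placeholders, not results from the board or the host.

```python
def compare_outputs(a, b):
    """Return (max absolute difference, whether both pick the same class)."""
    a, b = list(a), list(b)
    if len(a) != len(b):
        raise ValueError(f'length mismatch: {len(a)} vs {len(b)}')
    max_abs_diff = max(abs(x - y) for x, y in zip(a, b))
    # The predicted class is the index of the largest score.
    same_class = a.index(max(a)) == b.index(max(b))
    return max_abs_diff, same_class


# Placeholder values, not real results. With the outputs from this notebook,
# pass each one flattened to a 1-D sequence of scores.
micro_example = [-3, 12, 0, 7, -1, 2, 5, -8, 1, 4]
cpu_example = [-3, 12, 0, 7, -1, 2, 5, -8, 1, 4]

max_abs_diff, same_class = compare_outputs(micro_example, cpu_example)
print('max abs diff:', max_abs_diff)
print('same predicted class:', same_class)
```

For a quantized model, small differences between the two backends may be acceptable as long as the predicted class agrees. How much tolerance to allow is a judgment call for the application.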