In [1]:
%matplotlib inline



Quick Start Tutorial for Compiling Deep Learning Models
=======================================================
**Author**: `Yao Wang <https://github.com/kevinthesun>`_

This example shows how to build a neural network with the NNVM Python frontend and
generate a runtime library for Nvidia GPUs with TVM.
Note that you need to build TVM with CUDA and LLVM enabled.



Overview for Supported Hardware Backend of TVM
----------------------------------------------
The image below shows the hardware backends currently supported by TVM:

![](https://github.com/dmlc/web-data/raw/master/tvm/tutorial/tvm_support_list.png)


In this tutorial, we'll choose cuda and llvm as target backends.
To begin with, let's import NNVM and TVM.



In [2]:
import numpy as np

import nnvm.compiler
import nnvm.testing
import tvm
from tvm.contrib import graph_runtime

Define Neural Network in NNVM
-----------------------------
First, let's define a neural network with the NNVM Python frontend.
For simplicity, we'll use the predefined ResNet-18 network in NNVM.
Parameters are initialized with the Xavier initializer.
NNVM also supports other model formats such as MXNet, CoreML, ONNX and
TensorFlow.
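
To make the Xavier initializer concrete, here is a minimal sketch of the common Xavier (Glorot) uniform rule in plain Python. The exact settings NNVM uses are not stated in this tutorial, so the formula `sqrt(6 / (fan_in + fan_out))` and the helper name `xavier_uniform` are assumptions for illustration only.

```python
import math
import random


def xavier_uniform(shape, rng):
    """Return (limit, weights) for a conv kernel of shape (out_ch, in_ch, kh, kw).

    Hypothetical helper, not part of NNVM. Weights are returned as a flat list.
    """
    out_ch, in_ch, kh, kw = shape
    fan_in = in_ch * kh * kw
    fan_out = out_ch * kh * kw
    # Assumed rule: uniform in [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)).
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    count = out_ch * in_ch * kh * kw
    return limit, [rng.uniform(-limit, limit) for _ in range(count)]


rng = random.Random(0)
# A 7x7 kernel with 3 input channels and 64 output channels,
# matching conv0 applied to an RGB input.
limit, weights = xavier_uniform((64, 3, 7, 7), rng)
print(len(weights), round(limit, 4))
```

The bound shrinks as the layer gets wider, which keeps the scale of activations roughly stable from layer to layer.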

In this tutorial, we assume inference runs on our device with a batch
size of 1. Input images are RGB color images of size 224 x 224.
We can call :any:`nnvm.symbol.debug_str` to show the network structure.
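
As a quick sanity check on the input described above, the following sketch works out the input tensor shape and its float32 memory footprint using only the standard library. It assumes a channels-first layout (batch, channels, height, width); the names `num_elements` and `bytes_float32` are introduced here for illustration.

```python
from functools import reduce
from operator import mul

batch_size = 1
image_shape = (3, 224, 224)  # channels, height, width (assumed channels-first)
data_shape = (batch_size,) + image_shape

# Total number of scalars in one input batch.
num_elements = reduce(mul, data_shape, 1)
# float32 uses 4 bytes per element.
bytes_float32 = num_elements * 4

print(data_shape, num_elements, bytes_float32)
```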



In [8]:
batch_size = 1
num_class = 1000
image_shape = (3, 224, 224)
data_shape = (batch_size,) + image_shape
out_shape = (batch_size, num_class)

net, params = nnvm.testing.resnet.get_workload(layers=18,
        batch_size=batch_size, image_shape=image_shape)
print(net.debug_str())

Symbol Outputs:
	output[0]=softmax(0)
Variable:data
Variable:bn_data_gamma
Variable:bn_data_beta
Variable:bn_data_moving_mean
Variable:bn_data_moving_var
--------------------
Op:batch_norm, Name=bn_data
Inputs:
	arg[0]=data(0) version=0
	arg[1]=bn_data_gamma(0) version=0
	arg[2]=bn_data_beta(0) version=0
	arg[3]=bn_data_moving_mean(0) version=1
	arg[4]=bn_data_moving_var(0) version=1
Attrs:
	epsilon=2e-05
	scale=False
Variable:conv0_weight
--------------------
Op:conv2d, Name=conv0
Inputs:
	arg[0]=bn_data(0)
	arg[1]=conv0_weight(0) version=0
Attrs:
	channels=64
	kernel_size=(7, 7)
	padding=(3, 3)
	strides=(2, 2)
	use_bias=False
Variable:bn0_gamma
Variable:bn0_beta
Variable:bn0_moving_mean
Variable:bn0_moving_var
--------------------
Op:batch_norm, Name=bn0
Inputs:
	arg[0]=conv0(0)
	arg[1]=bn0_gamma(0) version=0
	arg[2]=bn0_beta(0) version=0
	arg[3]=bn0_moving_mean(0) version=1
	arg[4]=bn0_moving_var(0) version=1
Attrs:
	epsilon=2e-05
--------------------
Op:relu, Name=relu0
Inputs:
	ar

Compilation
-----------
The next step is to compile the model using the NNVM/TVM pipeline.
Users can specify the optimization level of the compilation.
Currently this value can be 0 to 3. The optimization passes include
operator fusion, pre-computation, layout transformation and so on.
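
To give an intuition for operator fusion, here is a toy sketch in plain Python. It is not how NNVM implements the pass; it only illustrates the idea that two elementwise operators can be merged into a single loop, avoiding an intermediate buffer. All function names are hypothetical.

```python
def add_bias(xs, bias):
    return [x + bias for x in xs]


def relu(xs):
    return [max(x, 0.0) for x in xs]


def unfused(xs, bias):
    # Two passes over the data, with an intermediate list in between.
    return relu(add_bias(xs, bias))


def fused(xs, bias):
    # One pass: both operations are applied per element, no intermediate list.
    return [max(x + bias, 0.0) for x in xs]


xs = [-2.0, -0.5, 0.0, 1.5]
assert unfused(xs, 0.5) == fused(xs, 0.5)
print(fused(xs, 0.5))  # [0.0, 0.0, 0.5, 2.0]
```

On a GPU the same idea typically means launching one kernel instead of two and skipping a round trip through device memory.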

:any:`nnvm.compiler.build` returns three components: the execution graph in
JSON format, the TVM module library of compiled functions specifically
for this graph on the target hardware, and the parameter blobs of
the model. During the compilation, NNVM does the graph-level
optimization while TVM does the tensor-level optimization, resulting
in an optimized runtime module for model serving.
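
Because the execution graph is plain JSON, it can be inspected with the standard `json` module. The sketch below uses a hand-written, heavily simplified stand-in string, not real compiler output; the field names (`nodes`, `op`, `arg_nodes`, `heads`) and the `"null"` / `"tvm_op"` values are assumptions modeled on the graph runtime format, so check them against the string your own build returns.

```python
import json

# Hand-written stand-in for a graph JSON string; NOT real compiler output.
graph_json = json.dumps({
    "nodes": [
        {"op": "null", "name": "data", "inputs": []},
        {"op": "null", "name": "conv0_weight", "inputs": []},
        {"op": "tvm_op", "name": "conv0", "inputs": [[0, 0, 0], [1, 0, 0]]},
    ],
    "arg_nodes": [0, 1],
    "heads": [[2, 0, 0]],
})

parsed = json.loads(graph_json)
# In this assumed layout, "null" nodes are inputs or parameters,
# and every other node is a compiled operator.
inputs = [n["name"] for n in parsed["nodes"] if n["op"] == "null"]
operators = [n["name"] for n in parsed["nodes"] if n["op"] != "null"]
print("inputs:", inputs)
print("operators:", operators)
```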

We'll first compile for Nvidia GPU. Behind the scenes, `nnvm.compiler.build`
first performs a number of graph-level optimizations, e.g. pruning, fusing, etc.,
then maps the operators (i.e. the nodes of the optimized graph) to
TVM implementations to generate a `tvm.module`.
To generate the module library, TVM first lowers the high-level IR
into the lower-level intrinsic IR of the specified target backend, which is CUDA
in this example. Then the machine code is generated as the module library.



In [13]:
opt_level = 3
target = tvm.target.cuda()
with nnvm.compiler.build_config(opt_level=opt_level):
    graph, lib, params = nnvm.compiler.build(
        net, target, shape={"data": data_shape}, params=params)

In [30]:
print (lib.get_source())

; ModuleID = 'fuse_broadcast_add'
source_filename = "fuse_broadcast_add"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

%0 = type { double }
%1 = type { i8*, %2, i32, %3, i64*, i64*, i64 }
%2 = type { i32, i32 }
%3 = type { i8, i8, i16 }

@__tvm_module_ctx = linkonce dllexport local_unnamed_addr global i8* null, align 8
@__TVMFuncCall = linkonce dllexport local_unnamed_addr global i32 (i8*, %0*, i32*, i32, %0*, i32*)* null, align 8
@__TVMBackendGetFuncFromEnv = linkonce dllexport local_unnamed_addr global i32 (i8*, i8*, i8**)* null, align 8
@__TVMAPISetLastError = linkonce dllexport local_unnamed_addr global void (i8*)* null, align 8
@.str = private constant [71 x i8] c"Assert fail: (num_args == 3), fuse_broadcast_add: num_args should be 3\00", align 1
@.str.1 = private constant [226 x i8] c"Assert fail: ((((1 == int32(arg0.strides[3])) && ((1*224) == int32(arg0.strides[2]))) && (((1*224)*224) == int32(arg0.strides[1]))) && ((((1*224)

Run the generated library
-------------------------
Now we can create the graph runtime and run the module on the Nvidia GPU.



In [31]:
# create random input
ctx = tvm.gpu()
data = np.random.uniform(-1, 1, size=data_shape).astype("float32")
# create module
module = graph_runtime.create(graph, lib, ctx)
# set input and parameters
module.set_input("data", data)
module.set_input(**params)
# run
module.run()
# get output
out = module.get_output(0, tvm.nd.empty(out_shape))
# convert to numpy
out.asnumpy()

# Print first 10 elements of output
print(out.asnumpy().flatten()[0:10])

[0.001011   0.0010983  0.00106555 0.00112959 0.00122434 0.00093741
 0.00107729 0.00101848 0.00093557 0.00091746]
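
The network ends in a softmax over 1000 classes, so each printed value is a class probability. With random weights and random input, the values sit close to 1/1000, which is consistent with the output above. A common next step is to pick the highest-scoring classes. The sketch below does this in plain Python on the ten values printed above; `top_k` is a hypothetical helper, and in practice it would be applied to the full flattened output.

```python
def top_k(scores, k):
    """Return the indices of the k largest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# The first ten output values printed above.
first_ten = [0.001011, 0.0010983, 0.00106555, 0.00112959, 0.00122434,
             0.00093741, 0.00107729, 0.00101848, 0.00093557, 0.00091746]
print(top_k(first_ten, 3))  # [4, 3, 1]
```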


In [34]:
print (type(module))

<class 'tvm.contrib.graph_runtime.GraphModule'>


In [35]:
print (type(out))

<class 'tvm.ndarray.NDArray'>


Save and Load Compiled Module
-----------------------------
We can also save the graph, lib and parameters into files and load them
back in the deployment environment.



In [36]:
# save the graph, lib and params into separate files
from tvm.contrib import util

temp = util.tempdir()
path_lib = temp.relpath("deploy_lib.so")
lib.export_library(path_lib)
with open(temp.relpath("deploy_graph.json"), "w") as fo:
    fo.write(graph.json())
with open(temp.relpath("deploy_param.params"), "wb") as fo:
    fo.write(nnvm.compiler.save_param_dict(params))
print(temp.listdir())

['deploy_param.params', 'deploy_graph.json', 'deploy_lib.so']


In [42]:
print (temp.temp_dir)

/tmp/tmp5jpv0p2f


In [43]:
# load the module back.
with open(temp.relpath("deploy_graph.json")) as fi:
    loaded_json = fi.read()
loaded_lib = tvm.module.load(path_lib)
with open(temp.relpath("deploy_param.params"), "rb") as fi:
    loaded_params = bytearray(fi.read())
input_data = tvm.nd.array(np.random.uniform(size=data_shape).astype("float32"))

module = graph_runtime.create(loaded_json, loaded_lib, tvm.gpu(0))
module.load_params(loaded_params)
module.run(data=input_data)

out = module.get_output(0, out=tvm.nd.empty(out_shape))

In [45]:
print (out.shape)

(1, 1000)
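
The save and load steps follow a simple pattern: the graph is written as text, the parameters as bytes, and both are read back in the matching mode. The sketch below reproduces that round trip with only the standard library, so it runs without a GPU or TVM installed. The file contents are stand-ins chosen for illustration, not real compiler output.

```python
import json
import os
import tempfile

graph_text = json.dumps({"nodes": []})  # stand-in for graph.json()
param_bytes = bytes([0, 1, 2, 255])     # stand-in for serialized parameters

with tempfile.TemporaryDirectory() as tmp:
    graph_path = os.path.join(tmp, "deploy_graph.json")
    param_path = os.path.join(tmp, "deploy_param.params")

    # Text mode for the JSON graph, binary mode for the parameter blob.
    with open(graph_path, "w") as fo:
        fo.write(graph_text)
    with open(param_path, "wb") as fo:
        fo.write(param_bytes)

    # os.listdir returns entries in arbitrary order, so sort for a stable listing.
    saved_files = sorted(os.listdir(tmp))

    with open(graph_path) as fi:
        loaded_graph_text = fi.read()
    with open(param_path, "rb") as fi:
        loaded_param_bytes = bytearray(fi.read())

print(saved_files)
```

Opening the parameter file in binary mode matters: text mode could alter or fail to decode arbitrary bytes.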
