[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/uwsampl/tutorial/blob/master/notebook/02_TVM_Tutorial_Relay.ipynb)

Please run the following block to ensure TVM is set up for *this notebook*; each notebook may have its own runtime.

In [2]:
try:
  import google.colab
  IN_COLAB = True
except ImportError:
  IN_COLAB = False

if IN_COLAB:
    ! gsutil cp "gs://tvm-fcrc-binariesd5fce43e-8373-11e9-bfb6-0242ac1c0002/tvm.tar.gz" /tmp/tvm.tar.gz
    ! mkdir -p /tvm
    ! tar -xf /tmp/tvm.tar.gz --strip-components=4 --directory /tvm
    ! ls -la /tvm
    ! bash /tvm/package.sh
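    # "! export" only affects a throwaway subshell, so set the variable from Python instead.
    import os
    os.environ['TVM_HOME'] = '/tvm'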
    # Add TVM to the Python path.
    import sys
    sys.path.append('/tvm/python')
    sys.path.append('/tvm/topi/python')
    sys.path.append('/tvm/vta/python')
else:
    print("Notebook executing locally, skipping Colab setup ...")

Notebook executing locally, skipping Colab setup ...


In [3]:
if IN_COLAB:
    ! cd /; git clone https://github.com/uwsampl/relay-aot
    sys.path.append('/relay-aot')
else:
    import aot

# Relay: an Extensible Deep Learning IR

Last year TVM introduced Relay IR, a second-generation high-level IR for deep learning.

Relay's design comes from a simple insight: the critical difference between regular IRs
and deep learning IRs is the primitive values they manipulate. Relay is designed using
well-known insights from the programming languages community, coupled with TVM's existing
infrastructure, to provide state-of-the-art performance.

If you are familiar with ideas from programming languages or existing computation graph
representations, we will connect Relay to your existing knowledge during this tutorial.

We will first cover the design of Relay, then elaborate on how one can use it to
accomplish a wide variety of tasks. This portion of the tutorial is focused
on Relay, but Relay will be discussed throughout the day. Relay serves
as the interface layer to TVM.

In [4]:
import tvm
from tvm import relay
import tvm.relay.testing
from tvm.relay.expr_functor import ExprMutator
import torch
import torchvision
import onnx
import numpy

## Language 

We will first briefly introduce the concepts of Relay below.
You can find a full language specification [here](https://docs.tvm.ai/langref/index.html).

### Variables 

In [5]:
# A single Relay variable; the string is only a name hint.
x = relay.var('x')

# A Relay variable with an explicit dtype (the default is float32).
x = relay.var('x', dtype='int32')

# A Relay variable with an explicit shape.
x = relay.var('x', shape=(10, 1))

### Operators

Relay provides high-performance operators defined in TVM that implement the primitive operations needed by deep learning applications. Operators can be applied to arguments just like regular Python or C++ functions. Common arithmetic operations are provided both via names and via operator overloading.

Variables can be used to construct Relay *expressions*, which replace the concept of graphs present in previous frameworks. A Relay expression can be viewed much like a graph with extra functionality, as we will see as we go
forward.

In [6]:
w = relay.op.add(x, x)
print(w)

v0.0.1
free_var %x: Tensor[(10, 1), float32]
add(%x, %x)


In [7]:
z = x + x
print(z)

v0.0.1
free_var %x: Tensor[(10, 1), float32]
add(%x, %x)
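
To make the link between named operators and operator overloading concrete, here is a minimal standalone sketch of how an expression tree can be built either way. The `Expr`, `Var`, and `Call` classes are hypothetical stand-ins written for this illustration; they are not Relay's classes and only mimic the printed form shown above.

```python
class Expr:
    """Base class of a toy expression tree (hypothetical, not Relay's Expr)."""

    def __add__(self, other):
        # Overloaded `+` builds the same node as the named operator.
        return Call("add", [self, other])

    def __mul__(self, other):
        return Call("multiply", [self, other])


class Var(Expr):
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return f"%{self.name}"


class Call(Expr):
    def __init__(self, op, args):
        self.op = op
        self.args = args

    def __str__(self):
        return f"{self.op}({', '.join(str(a) for a in self.args)})"


x = Var("x")
named = Call("add", [x, x])   # calling the operator by name
overloaded = x + x            # using Python's `+`
print(named)       # prints add(%x, %x)
print(overloaded)  # prints add(%x, %x)
```

Both spellings produce the same tree, which is why the two cells above print identical programs.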


### Functions

The fundamental packaging of computation in Relay is the function. A function is a combination of a set of inputs
and a Relay expression. Relay functions are no different from functions in today's programming languages, and they replace named subgraphs.

In [8]:
f = relay.Function([x], z)
print(f)

v0.0.1
fn (%x: Tensor[(10, 1), float32]) {
  add(%x, %x)
}


### Module

Finally, we can give functions a global name and package many of them together into a module. When we add a function to the module, it is type checked beforehand.

When we print the module, you can see the program annotated with all type information. 

In [9]:
mod = relay.Module({})
fname = relay.GlobalVar('f')
mod[fname] = f

print(mod)

v0.0.1
def @f(%x: Tensor[(10, 1), float32]) -> Tensor[(10, 1), float32] {
  add(%x, %x) /* ty=Tensor[(10, 1), float32] */
}
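
As a rough intuition for what the type checker computes for `add`, the sketch below infers a result type from two operand types. It is a simplified standalone model, assuming each tensor type is a `(shape, dtype)` pair and that shapes broadcast from the right; the helper names `broadcast_shape` and `infer_add` are hypothetical and are not part of Relay.

```python
def broadcast_shape(lhs, rhs):
    """Broadcast two shapes, aligning dimensions from the right."""
    out = []
    for i in range(1, max(len(lhs), len(rhs)) + 1):
        a = lhs[-i] if i <= len(lhs) else 1
        b = rhs[-i] if i <= len(rhs) else 1
        if a != b and 1 not in (a, b):
            raise TypeError(f"incompatible shapes {lhs} and {rhs}")
        out.append(b if a == 1 else a)
    return tuple(reversed(out))


def infer_add(lhs, rhs):
    """Each operand type is a (shape, dtype) pair; returns the result type."""
    (lshape, ldtype), (rshape, rdtype) = lhs, rhs
    if ldtype != rdtype:
        raise TypeError(f"dtype mismatch: {ldtype} vs. {rdtype}")
    return broadcast_shape(lshape, rshape), ldtype


x_type = ((10, 1), "float32")
print(infer_add(x_type, x_type))  # prints ((10, 1), 'float32')
```

The result matches the `ty=Tensor[(10, 1), float32]` annotation printed for the module above.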



## Frontends

Relay comes with a variety of frontends and supports most major frameworks, including TensorFlow, PyTorch, MXNet, ONNX, Keras and Caffe2.

Below we provide a couple of examples of using these frontends to import models into Relay.

You can find specific tutorials on deploying pretrained models below:  

- [ONNX](https://docs.tvm.ai/tutorials/frontend/from_onnx.html#sphx-glr-tutorials-frontend-from-onnx-py)
- [TensorFlow](https://docs.tvm.ai/tutorials/frontend/from_tensorflow.html#sphx-glr-tutorials-frontend-from-tensorflow-py)
- [Keras](https://docs.tvm.ai/tutorials/frontend/from_keras.html#sphx-glr-tutorials-frontend-from-keras-py)
- [PyTorch](https://tvm.ai/2019/05/30/pytorch-frontend)
- [Caffe2](https://docs.tvm.ai/tutorials/frontend/from_caffe2.html#sphx-glr-tutorials-frontend-from-caffe2-py)

In [12]:
torch_resnet18 = torchvision.models.resnet18()
dummy_input = torch.randn(10, 3, 224, 224)
torch.onnx.export(torch_resnet18, dummy_input, "resnet.onnx", verbose=True)

graph(%0 : Float(10, 3, 224, 224),
      %conv1.weight : Float(64, 3, 7, 7),
      %bn1.weight : Float(64),
      %bn1.bias : Float(64),
      %bn1.running_mean : Float(64),
      %bn1.running_var : Float(64),
      %bn1.num_batches_tracked : Long(),
      %layer1.0.conv1.weight : Float(64, 64, 3, 3),
      %layer1.0.bn1.weight : Float(64),
      %layer1.0.bn1.bias : Float(64),
      %layer1.0.bn1.running_mean : Float(64),
      %layer1.0.bn1.running_var : Float(64),
      %layer1.0.bn1.num_batches_tracked : Long(),
      %layer1.0.conv2.weight : Float(64, 64, 3, 3),
      %layer1.0.bn2.weight : Float(64),
      %layer1.0.bn2.bias : Float(64),
      %layer1.0.bn2.running_mean : Float(64),
      %layer1.0.bn2.running_var : Float(64),
      %layer1.0.bn2.num_batches_tracked : Long(),
      %layer1.1.conv1.weight : Float(64, 64, 3, 3),
      %layer1.1.bn1.weight : Float(64),
      %layer1.1.bn1.bias : Float(64),
      %layer1.1.bn1.running_mean : Float(64),
      %layer1.1.bn1.running_var

In [13]:
onnx_resnet18 = onnx.load('resnet.onnx')
func, params = relay.frontend.from_onnx(onnx_resnet18, shape={ '0': (10, 3, 224, 224) })
print(func)



TVMError: Traceback (most recent call last):
  [bt] (8) 9   libtvm.dylib                        0x000000011b88ffdd tvm::relay::backend::RelayBuildModule::GetFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<tvm::runtime::ModuleNode> const&)::'lambda1'(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const + 429
  [bt] (7) 8   libtvm.dylib                        0x000000011b8901f2 tvm::relay::backend::RelayBuildModule::Build(tvm::relay::Function, tvm::Map<tvm::Integer, tvm::Target, void, void> const&, tvm::Target const&) + 130
  [bt] (6) 7   libtvm.dylib                        0x000000011b8903f9 tvm::relay::backend::RelayBuildModule::BuildRelay(tvm::relay::Function, std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, tvm::runtime::NDArray, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, tvm::runtime::NDArray> > > const&) + 345
  [bt] (5) 6   libtvm.dylib                        0x000000011b95e2ca tvm::relay::ModuleNode::FromExpr(tvm::relay::Expr const&, tvm::Map<tvm::relay::GlobalVar, tvm::relay::Function, void, void> const&) + 938
  [bt] (4) 5   libtvm.dylib                        0x000000011b95cb17 tvm::relay::ModuleNode::Add(tvm::relay::GlobalVar const&, tvm::relay::Function const&, bool) + 151
  [bt] (3) 4   libtvm.dylib                        0x000000011bc3b5b8 tvm::relay::InferType(tvm::relay::Function const&, tvm::relay::Module const&, tvm::relay::GlobalVar const&) + 472
  [bt] (2) 3   libtvm.dylib                        0x000000011bc3a687 tvm::relay::TypeInferencer::Infer(tvm::relay::Expr) + 135
  [bt] (1) 2   libtvm.dylib                        0x000000011b928e23 tvm::relay::ErrorReporter::RenderErrors(tvm::relay::Module const&, bool) + 5555
  [bt] (0) 1   libtvm.dylib                        0x000000011b4df949 dmlc::LogMessageFatal::~LogMessageFatal() + 57
  [bt] (8) 9   libtvm.dylib                        0x000000011b95cb17 tvm::relay::ModuleNode::Add(tvm::relay::GlobalVar const&, tvm::relay::Function const&, bool) + 151
  [bt] (7) 8   libtvm.dylib                        0x000000011bc3b5b8 tvm::relay::InferType(tvm::relay::Function const&, tvm::relay::Module const&, tvm::relay::GlobalVar const&) + 472
  [bt] (6) 7   libtvm.dylib                        0x000000011bc3a66b tvm::relay::TypeInferencer::Infer(tvm::relay::Expr) + 107
  [bt] (5) 6   libtvm.dylib                        0x000000011bc5705a tvm::relay::TypeSolver::Solve() + 1114
  [bt] (4) 5   libtvm.dylib                        0x000000011bc576f8 tvm::TypedEnvFunc<bool (tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&)>::operator()(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&) const + 328
  [bt] (3) 4   libtvm.dylib                        0x000000011b9b0ba9 std::__1::__function::__func<void tvm::runtime::TypedPackedFunc<bool (tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&)>::AssignTypedLambda<bool (*)(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&)>(bool (*)(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&))::'lambda'(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*), std::__1::allocator<void tvm::runtime::TypedPackedFunc<bool (tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&)>::AssignTypedLambda<bool (*)(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&)>(bool (*)(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&))::'lambda'(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)>, void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&) + 137
  [bt] (2) 3   libtvm.dylib                        0x000000011b9b0c4f void tvm::runtime::detail::unpack_call_dispatcher<bool, 0, 4, bool (*)(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&)>::run<tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue>(bool (* const&)(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&), tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&) + 95
  [bt] (1) 2   libtvm.dylib                        0x000000011babe79e tvm::relay::ConcatenateRel(tvm::Array<tvm::relay::Type, void> const&, int, tvm::Attrs const&, tvm::relay::TypeReporter const&) + 1918
  [bt] (0) 1   libtvm.dylib                        0x000000011b4df949 dmlc::LogMessageFatal::~LogMessageFatal() + 57
  File "/Users/jroesch/Git/tvm/src/relay/ir/error.cc", line 132
TVMError:
Error(s) have occurred. We have annotated the program with them:

In `main`:
v0.0.1
fn (%v0: Tensor[(10, 3, 224, 224), float32]) {
  %0 = nn.conv2d(%v0, meta[relay.Constant][0], strides=[2, 2], padding=[3, 3], kernel_size=[7, 7])
  %1 = nn.batch_norm(%0, meta[relay.Constant][1], meta[relay.Constant][2], meta[relay.Constant][3], meta[relay.Constant][4], epsilon=1e-05)
  %2 = %1.0
  %3 = nn.relu(%2)
  %4 = nn.max_pool2d(%3, pool_size=[3, 3], strides=[2, 2], padding=[1, 1])
  %5 = nn.conv2d(%4, meta[relay.Constant][5], padding=[1, 1], kernel_size=[3, 3])
  %6 = nn.batch_norm(%5, meta[relay.Constant][6], meta[relay.Constant][7], meta[relay.Constant][8], meta[relay.Constant][9], epsilon=1e-05)
  %7 = %6.0
  %8 = nn.relu(%7)
  %9 = nn.conv2d(%8, meta[relay.Constant][10], padding=[1, 1], kernel_size=[3, 3])
  %10 = nn.batch_norm(%9, meta[relay.Constant][11], meta[relay.Constant][12], meta[relay.Constant][13], meta[relay.Constant][14], epsilon=1e-05)
  %11 = %10.0
  %12 = add(%11, %4)
  %13 = nn.relu(%12)
  %14 = nn.conv2d(%13, meta[relay.Constant][15], padding=[1, 1], kernel_size=[3, 3])
  %15 = nn.batch_norm(%14, meta[relay.Constant][16], meta[relay.Constant][17], meta[relay.Constant][18], meta[relay.Constant][19], epsilon=1e-05)
  %16 = %15.0
  %17 = nn.relu(%16)
  %18 = nn.conv2d(%17, meta[relay.Constant][20], padding=[1, 1], kernel_size=[3, 3])
  %19 = nn.batch_norm(%18, meta[relay.Constant][21], meta[relay.Constant][22], meta[relay.Constant][23], meta[relay.Constant][24], epsilon=1e-05)
  %20 = %19.0
  %21 = add(%20, %13)
  %22 = nn.relu(%21)
  %23 = nn.conv2d(%22, meta[relay.Constant][25], strides=[2, 2], padding=[1, 1], kernel_size=[3, 3])
  %24 = nn.batch_norm(%23, meta[relay.Constant][26], meta[relay.Constant][27], meta[relay.Constant][28], meta[relay.Constant][29], epsilon=1e-05)
  %25 = %24.0
  %26 = nn.relu(%25)
  %27 = nn.conv2d(%26, meta[relay.Constant][30], padding=[1, 1], kernel_size=[3, 3])
  %28 = nn.batch_norm(%27, meta[relay.Constant][31], meta[relay.Constant][32], meta[relay.Constant][33], meta[relay.Constant][34], epsilon=1e-05)
  %29 = %28.0
  %30 = nn.conv2d(%22, meta[relay.Constant][35], strides=[2, 2], kernel_size=[1, 1])
  %31 = nn.batch_norm(%30, meta[relay.Constant][36], meta[relay.Constant][37], meta[relay.Constant][38], meta[relay.Constant][39], epsilon=1e-05)
  %32 = %31.0
  %33 = add(%29, %32)
  %34 = nn.relu(%33)
  %35 = nn.conv2d(%34, meta[relay.Constant][40], padding=[1, 1], kernel_size=[3, 3])
  %36 = nn.batch_norm(%35, meta[relay.Constant][41], meta[relay.Constant][42], meta[relay.Constant][43], meta[relay.Constant][44], epsilon=1e-05)
  %37 = %36.0
  %38 = nn.relu(%37)
  %39 = nn.conv2d(%38, meta[relay.Constant][45], padding=[1, 1], kernel_size=[3, 3])
  %40 = nn.batch_norm(%39, meta[relay.Constant][46], meta[relay.Constant][47], meta[relay.Constant][48], meta[relay.Constant][49], epsilon=1e-05)
  %41 = %40.0
  %42 = add(%41, %34)
  %43 = nn.relu(%42)
  %44 = nn.conv2d(%43, meta[relay.Constant][50], strides=[2, 2], padding=[1, 1], kernel_size=[3, 3])
  %45 = nn.batch_norm(%44, meta[relay.Constant][51], meta[relay.Constant][52], meta[relay.Constant][53], meta[relay.Constant][54], epsilon=1e-05)
  %46 = %45.0
  %47 = nn.relu(%46)
  %48 = nn.conv2d(%47, meta[relay.Constant][55], padding=[1, 1], kernel_size=[3, 3])
  %49 = nn.batch_norm(%48, meta[relay.Constant][56], meta[relay.Constant][57], meta[relay.Constant][58], meta[relay.Constant][59], epsilon=1e-05)
  %50 = %49.0
  %51 = nn.conv2d(%43, meta[relay.Constant][60], strides=[2, 2], kernel_size=[1, 1])
  %52 = nn.batch_norm(%51, meta[relay.Constant][61], meta[relay.Constant][62], meta[relay.Constant][63], meta[relay.Constant][64], epsilon=1e-05)
  %53 = %52.0
  %54 = add(%50, %53)
  %55 = nn.relu(%54)
  %56 = nn.conv2d(%55, meta[relay.Constant][65], padding=[1, 1], kernel_size=[3, 3])
  %57 = nn.batch_norm(%56, meta[relay.Constant][66], meta[relay.Constant][67], meta[relay.Constant][68], meta[relay.Constant][69], epsilon=1e-05)
  %58 = %57.0
  %59 = nn.relu(%58)
  %60 = nn.conv2d(%59, meta[relay.Constant][70], padding=[1, 1], kernel_size=[3, 3])
  %61 = nn.batch_norm(%60, meta[relay.Constant][71], meta[relay.Constant][72], meta[relay.Constant][73], meta[relay.Constant][74], epsilon=1e-05)
  %62 = %61.0
  %63 = add(%62, %55)
  %64 = nn.relu(%63)
  %65 = nn.conv2d(%64, meta[relay.Constant][75], strides=[2, 2], padding=[1, 1], kernel_size=[3, 3])
  %66 = nn.batch_norm(%65, meta[relay.Constant][76], meta[relay.Constant][77], meta[relay.Constant][78], meta[relay.Constant][79], epsilon=1e-05)
  %67 = %66.0
  %68 = nn.relu(%67)
  %69 = nn.conv2d(%68, meta[relay.Constant][80], padding=[1, 1], kernel_size=[3, 3])
  %70 = nn.batch_norm(%69, meta[relay.Constant][81], meta[relay.Constant][82], meta[relay.Constant][83], meta[relay.Constant][84], epsilon=1e-05)
  %71 = %70.0
  %72 = nn.conv2d(%64, meta[relay.Constant][85], strides=[2, 2], kernel_size=[1, 1])
  %73 = nn.batch_norm(%72, meta[relay.Constant][86], meta[relay.Constant][87], meta[relay.Constant][88], meta[relay.Constant][89], epsilon=1e-05)
  %74 = %73.0
  %75 = add(%71, %74)
  %76 = nn.relu(%75)
  %77 = nn.conv2d(%76, meta[relay.Constant][90], padding=[1, 1], kernel_size=[3, 3])
  %78 = nn.batch_norm(%77, meta[relay.Constant][91], meta[relay.Constant][92], meta[relay.Constant][93], meta[relay.Constant][94], epsilon=1e-05)
  %79 = %78.0
  %80 = nn.relu(%79)
  %81 = nn.conv2d(%80, meta[relay.Constant][95], padding=[1, 1], kernel_size=[3, 3])
  %82 = nn.batch_norm(%81, meta[relay.Constant][96], meta[relay.Constant][97], meta[relay.Constant][98], meta[relay.Constant][99], epsilon=1e-05)
  %83 = %82.0
  %84 = add(%83, %76)
  %85 = nn.relu(%84)
  %86 = nn.global_avg_pool2d(%85)
  %87 = shape_of(%86, dtype="int32")
  %88 = take(%87, int64(0), axis=0)
  %89 = expand_dims(%88, axis=0)
  %90 = expand_dims(int64(-1), axis=0)
  %91 = (%89, %90)
  concatenate(%91) an internal invariant was violated while typechecking your program [17:22:35] /Users/jroesch/Git/tvm/src/relay/op/tensor/transform.cc:204: Check failed: e_dtype == dtype (int64 vs. int32) : relay.concatenate requires all tensors have the same dtype;
}
// meta data omitted. you can use show_meta_data=True to include meta data
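
The failure above is reported by the type checker: the two tensors passed to `concatenate` have dtypes `int32` and `int64`. The following standalone sketch models that kind of check. It assumes each tensor type is a `(shape, dtype)` pair, and `concatenate_type` is a hypothetical helper written for this illustration, not Relay's actual type relation.

```python
def concatenate_type(tensor_types, axis=0):
    """Infer the result type of concatenating tensors along `axis`."""
    first_shape, first_dtype = tensor_types[0]
    out_shape = list(first_shape)
    for shape, dtype in tensor_types[1:]:
        if dtype != first_dtype:
            raise TypeError(
                "concatenate requires all tensors have the same dtype "
                f"({dtype} vs. {first_dtype})"
            )
        if len(shape) != len(out_shape):
            raise TypeError("concatenate requires tensors of the same rank")
        for i, dim in enumerate(shape):
            if i != axis and dim != out_shape[i]:
                raise TypeError(f"dimension {i} differs: {dim} vs. {out_shape[i]}")
        out_shape[axis] += shape[axis]
    return tuple(out_shape), first_dtype


# Mirrors the failing program: an int32 tensor and an int64 tensor.
try:
    concatenate_type([((1,), "int32"), ((1,), "int64")])
except TypeError as err:
    print("type error:", err)

# With matching dtypes the same call succeeds.
print(concatenate_type([((1,), "int64"), ((1,), "int64")]))  # prints ((2,), 'int64')
```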


## Text Format

Relay has a textual representation that can be used to write and print programs. The textual format is still being stabilized but can already be of great use today. For example, instead of providing inscrutable graph representations of programs, we can produce human-readable output by default.

There are a few different ways to interact with the textual format. The first is to just print out a Relay expression as we have seen above.

In [None]:
mlp, params = relay.testing.mlp.get_workload(1)
print(mlp)

By default, the pretty printer renders the code without metadata. It omits metadata because the metadata, which contains information such as constants, is often unreadable. For instance, imagine we perform an optimization such as inlining the parameters into the program to enable further optimization. The metadata would then include hundreds of megabytes of parameters.

In [None]:
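def inline_parameters(expr, params):
    # Map each parameter name to the corresponding function parameter (a Relay variable).
    param_map = dict((p.name_hint, p) for p in expr.params)
    # Wrap each parameter value in a Relay constant, keyed by the variable it replaces.
    params = dict((param_map[k], relay.const(params[k])) for k in params)
    # Substitute the constants into the body; the remaining free variables are the inputs.
    new_body = relay.bind(expr.body, params)
    return relay.Function(relay.ir_pass.free_vars(new_body), new_body)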

inline_mlp = inline_parameters(mlp, params)
print(inline_mlp.astext(show_meta_data=True))

Relay's pretty printer also allows users to attach debugging output and metadata to the IR. For example, you can see the type information in the example above, but we can also customize the annotation process by passing a callback for annotating nodes.

In [None]:
i = 0 
def ann(*args):
    global i
    i += 1
    return f" <expression: {i}>"

print(mlp.astext(show_meta_data=True, annotate=ann))
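
To illustrate the callback idea in isolation, here is a small standalone printer that accepts an `annotate` function and appends its result to each node it prints. This is a toy model and an assumption about the general pattern, not Relay's printer: expressions are plain tuples of the form `(op, arg, ...)`, variables are strings, and `astext` and `make_numberer` are hypothetical names.

```python
import itertools


def astext(expr, annotate=None):
    """Render a toy expression, appending annotate(node) after every node."""
    if isinstance(expr, str):
        text = f"%{expr}"
    else:
        op, *args = expr
        text = f"{op}({', '.join(astext(a, annotate) for a in args)})"
    if annotate is not None:
        text += annotate(expr)
    return text


def make_numberer():
    """Return a callback that numbers nodes in the order they are printed."""
    ids = itertools.count(1)
    return lambda node: f" /* node {next(ids)} */"


program = ("add", "x", ("relu", "x"))
print(astext(program))
print(astext(program, annotate=make_numberer()))
```

Because the callback is an ordinary function, it can carry any state it needs, such as the running counter used here.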

Finally, an important part of the Relay text format is the ability to load Relay code
like a normal programming language. We can use the Relay parser to parse code. We actually do this to define the Relay
*prelude*, the small standard library of utilities shipped with Relay.

## Executing Relay

Now that we have looked at how to write and manipulate a Relay program, we will show you how to run one. Relay has multiple execution mechanisms: a custom *debug interpreter*, which can be used for experimentation and debugging; TVM's graph runtime, the older, existing execution mechanism; and the Relay VM, a newly designed execution mechanism whose goal is to execute all of Relay efficiently.

We provide a high level interface which imposes some wrapping overhead but enables quick experimentation with each API. 

In [None]:
mod = relay.Module()
debug_ex = relay.create_executor('debug', mod=mod)
graph_ex = relay.create_executor('graph', mod=mod)
vm_ex = relay.create_executor('vm', mod=mod)

Each executor can be used to evaluate an expression given a Relay module. In this case we use an empty module and evaluate the same expression, an MLP, with each executor.

In [None]:
debug_mlp = debug_ex.evaluate(mlp)
graph_mlp = graph_ex.evaluate(mlp)
vm_mlp = vm_ex.evaluate(mlp)

Each one can be called like a normal Python function with the inputs passed as positional arguments and the parameters as keyword arguments.

In [None]:
data = numpy.random.rand(1, 1, 28, 28).astype('float32')
print("Debug: ", debug_mlp(data, **params))
print("Graph: ", graph_mlp(data, **params))
print("VM: ", vm_mlp(data, **params))

### Virtual Machine

The Relay virtual machine is worth highlighting: it is a brand new runtime mechanism for Relay which is beginning to stabilize. We encourage new users to check it out and provide feedback on its design, performance, and more. The VM enables support for non-standard aspects of Relay including control flow, closures, and data structures.

For more details on the virtual machine see [here](https://github.com/dmlc/tvm/issues/2810).

## Pass Manager

Relay has a flexible and configurable pass manager with an elegant API that can be used to easily compose and schedule pass pipelines. We believe an easy-to-configure pipeline is important to enable intelligent exploration between a variety of 


In [None]:
resnet, params = relay.testing.resnet.get_workload()

seq = relay.transform.Sequential([
    relay.transform.InferType(),
    relay.transform.FoldConstant(),
    relay.transform.EliminateCommonSubexpr(),
    relay.transform.AlterOpLayout()
])

# Wrap the ResNet function in a module so the pass pipeline can run on it.
mod = relay.Module({"main": resnet})
with relay.build_config(opt_level=3):
    with tvm.target.create("llvm"):
        mod = seq(mod)

    zz = mod["main"]
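
The idea of composing passes into a pipeline can be sketched without TVM. The toy model below assumes a "module" is just a list of instruction strings and that each pass declares the minimum optimization level at which it runs; the `Pass` and `Sequential` classes here are hypothetical stand-ins, not the `relay.transform` API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pass:
    name: str
    opt_level: int                          # minimum level at which the pass runs
    run: Callable[[List[str]], List[str]]   # module -> module


class Sequential:
    """Apply passes in order, skipping those above the requested opt_level."""

    def __init__(self, passes):
        self.passes = list(passes)

    def __call__(self, module, opt_level=2):
        applied = []
        for p in self.passes:
            if p.opt_level <= opt_level:
                module = p.run(module)
                applied.append(p.name)
        return module, applied


drop_nops = Pass("drop_nops", 0, lambda m: [i for i in m if i != "nop"])
# Order-preserving removal of repeated instructions, loosely in the spirit of CSE.
dedupe = Pass("dedupe", 3, lambda m: list(dict.fromkeys(m)))

seq = Sequential([drop_nops, dedupe])
module = ["a = x + x", "nop", "a = x + x", "b = a * 2"]
print(seq(module, opt_level=3))
print(seq(module, opt_level=2))
```

Keeping each pass as a self-describing object is what lets the pipeline be reordered or filtered by configuration rather than by editing code.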

## Optimizations

Defining optimizations to transform your program is straightforward in Relay.

For example, let's define a constant evaluator for Relay.
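
As a standalone illustration of what a constant evaluator does, the sketch below folds constant subexpressions in a tiny expression language. It does not use Relay's API: expressions are assumed to be numbers (constants), strings (variables), or `(op, lhs, rhs)` tuples, and `fold_constants` is a hypothetical helper written for this example.

```python
import operator

# Operators the toy evaluator knows how to compute.
OPS = {"add": operator.add, "subtract": operator.sub, "multiply": operator.mul}


def fold_constants(expr):
    """Evaluate every subexpression whose operands are all constants."""
    if not isinstance(expr, tuple):
        return expr  # a constant or a variable: nothing to fold
    op, lhs, rhs = expr
    lhs, rhs = fold_constants(lhs), fold_constants(rhs)
    if isinstance(lhs, (int, float)) and isinstance(rhs, (int, float)):
        return OPS[op](lhs, rhs)
    return (op, lhs, rhs)


print(fold_constants(("add", ("multiply", 2, 3), "x")))  # prints ('add', 6, 'x')
print(fold_constants(("add", 1, ("multiply", 2, 3))))    # prints 7
```

The rewrite is bottom-up: children are folded first, so a parent becomes foldable as soon as all of its operands have collapsed to constants.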

## Quantization

Relay supports an automatic quantization framework that can be used to implement a variety of quantization schemes. You can find more details about it in our recent paper, as well as see it used in action here.

Below we will apply the default quantization scheme to ResNet-18.

In [None]:
resnet, params = relay.testing.resnet.get_workload()

seq = relay.transform.Sequential([
    relay.transform.InferType(),
    relay.transform.FoldConstant(),
    relay.transform.EliminateCommonSubexpr(),
    relay.transform.AlterOpLayout()
])

# Wrap the ResNet function in a module so the pass pipeline can run on it.
mod = relay.Module({"main": resnet})
with relay.build_config(opt_level=3):
    with tvm.target.create("llvm"):
        mod = seq(mod)

    zz = mod["main"]
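
For readers new to quantization, the arithmetic behind one common scheme can be sketched in a few lines. This is a generic symmetric int8 scheme chosen for illustration, not necessarily the default scheme used by Relay's quantization framework; `quantize` and `dequantize` are hypothetical helpers.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers using a single symmetric scale."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize(weights)
restored = dequantize(q, scale)
print(q, scale)
print(max(abs(w - r) for w, r in zip(weights, restored)))
```

The round-trip error of each value is bounded by half the scale, which is why a tensor with a narrow value range loses little precision under this scheme.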

## Heterogeneous Execution

Relay supports a high-level interface for scheduling computation across multiple heterogeneous devices. An interesting property of this pass is that it is not special: it is built using Relay's standard machinery for
passes.

We implement this by using an annotation to mark which computations we would like to schedule on each device,
and a pass then inserts all the appropriate calls to synchronize memory across devices.

The pass below uses this machinery to schedule all convolutions onto the GPU.

In [None]:
class ScheduleConv2d(ExprMutator):
    def __init__(self, device):
        self.device = device
        super().__init__()

    def visit_call(self, expr):
        visit = super().visit_call(expr)
        if expr.op == tvm.relay.op.get("nn.conv2d"):
            return relay.annotation.on_device(visit, self.device)
        else:
            return visit

def schedule_conv2d_on_gpu(expr):
    sched = ScheduleConv2d(tvm.gpu(0))
    return sched.visit(expr)
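
The same rewrite pattern can be shown without TVM. In the standalone sketch below, expressions are assumed to be plain tuples `(op, arg, ...)` with string leaves, and `annotate_conv2d` is a hypothetical function that mirrors the mutator above: it rewrites the arguments first, then wraps every `nn.conv2d` call in an `on_device` marker.

```python
def annotate_conv2d(expr, device):
    """Wrap every nn.conv2d call in an ("on_device", call, device) marker."""
    if not isinstance(expr, tuple):
        return expr  # a variable: nothing to rewrite
    op, *args = expr
    # Rewrite the arguments first (post-order), like calling super().visit_call.
    rewritten = (op, *[annotate_conv2d(a, device) for a in args])
    if op == "nn.conv2d":
        return ("on_device", rewritten, device)
    return rewritten


program = ("nn.relu", ("nn.conv2d", "data", "weight"))
print(annotate_conv2d(program, "gpu"))
# prints ('nn.relu', ('on_device', ('nn.conv2d', 'data', 'weight'), 'gpu'))
```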

We provide a few basic models in Relay's testing library, so we can grab one from there. By default, printing a model renders it in Relay's textual format.

In [None]:
# Grab ResNet from Relay's testing library and print it.
resnet, params = relay.testing.resnet.get_workload()
print(resnet)

We can now run the customized pass we defined above to schedule individual convolutions on the GPU.

In [None]:
resnet = schedule_conv2d_on_gpu(resnet)
print(resnet)


We can later rewrite away the device annotations to insert the copies.

In [None]:
resnet = relay.ir_pass.rewrite_annotated_ops(resnet, 0)
print(resnet)
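
To show what "rewriting away the annotations" might look like, here is a standalone sketch under simplifying assumptions: expressions are tuples `(op, arg, ...)`, annotations have the form `("on_device", call, device)`, and inputs live on a fallback device. The `insert_copies` function and the `device_copy` node layout are hypothetical and only illustrate the idea of inserting copies wherever a value crosses a device boundary.

```python
def insert_copies(expr, fallback="cpu"):
    """Return (rewritten_expr, device); annotations become explicit copies."""
    if not isinstance(expr, tuple):
        return expr, fallback  # inputs are assumed to live on the fallback device
    op, *args = expr
    target = fallback
    if op == "on_device":
        # Unwrap the annotation: the wrapped call runs on the annotated device.
        body, target = args
        op, *args = body
    new_args = []
    for arg in args:
        new_arg, where = insert_copies(arg, fallback)
        if where != target:
            # The value crosses a device boundary, so copy it over.
            new_arg = ("device_copy", new_arg, where, target)
        new_args.append(new_arg)
    return (op, *new_args), target


annotated = ("nn.relu", ("on_device", ("nn.conv2d", "data", "weight"), "gpu"))
rewritten, device = insert_copies(annotated)
print(rewritten)
print(device)  # prints cpu
```

In this model the convolution's inputs are copied to the GPU and its result is copied back, since the surrounding `nn.relu` stays on the fallback device.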

Finally, we will look at a couple of case studies of what can be built using Relay. We will first look at how Relay is used as a backend in the PyTorch integration, then at how Relay can be used to compile a model down to traditional hardware, and finally at how it can be used to support a custom accelerator, VTA, which we will discuss in detail today.

## Ahead of time compilation

An example of what can be built using Relay can be found with the [ahead of time compiler](https://github.com/uwsampl/relay-aot). 

We have already installed the compiler above, so we can simply import it and use it below.

In [None]:
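import aot
mlp, params = relay.testing.mlp.get_workload(1)  # `params` was rebound to ResNet's parameters above
cmlp = aot.compile(mlp, relay.Module({}))
print(cmlp(data, **params))  # `data` is the (1, 1, 28, 28) input created earlier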

## PyTorch Integration

Recently Facebook engineers have begun to integrate TVM into PyTorch. They are using PyTorch's JIT functionality to generate Relay code, which is then optimized and deployed. This support is currently being integrated into mainline TVM, and can be installed from [here](https://github.com/pytorch/tvm).

Initial results are promising and a recent writeup on the design can be found on our [blog](https://tvm.ai/2019/05/30/pytorch-frontend). 

![results](https://i.imgur.com/KfJ7oas.png)


```python
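import torch
import torch_tvm

# Example inputs; the shapes are an assumption made for illustration.
inputs = [torch.rand(4, 8) for _ in range(3)]

def add(a, b, c):
    return a + b + c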

# via tracing
relay_graph = torch_tvm.to_relay(add, inputs)

@torch.jit.script
def mul(a, b, c):
    return a * b * c

# via script
relay_graph = torch_tvm.to_relay(mul, inputs)
```

## VTA