Release v0.4-pre-apache-incubation · apache/tvm

NOTE: This release predates Apache incubation.

This release features several major improvements. The high-level graph optimizer is now part of the TVM repo. Highlights include initial support for AutoTVM for automated optimization and VTA, a customized accelerator backend. Please also check out tvm.ai for the latest blog posts.

The community welcomes new reviewers @kazum @alex-weaver @masahi @zhreshold @PariksheetPinjari909 @srkreddy1238 @eqy, new code owner @merrymercy, and new committer @yzhliu

Change List

Tensor Expression and Optimization

Tensor operator primitives
- Introduce attrs field to operator primitives (e.g. compute) to store additional metadata; the attrs can be used as hints for scheduling
Enable embedding of asm micro-kernels
Hybrid python programming model
- python AST based IR builder interface
- support GPU programs
AutoTVM, Automated tuning, and scheduling
- basic autotvm infra
- GPU IR verifier
- basic autotuning tutorial
- topi integration
ARM support
- winograd support
- initial support of ARM autotuning records
TOPI Vision
- Generic GPU sort support (useful for vision)
- SSD operator support
TOPI numpy consistency
- Rename all binary operators for numpy consistency: broadcast_add -> add, broadcast_sub -> subtract, broadcast_mul -> multiply, broadcast_div -> divide
- New operators: slice, LRN, equal, not_equal, less, greater
- tutorials on topi
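For code that still uses the old broadcast_* names, the rename listed above can be applied mechanically. The following is a minimal sketch, not a tool shipped with TVM: the helper name migrate_topi_names and the RENAMES table are hypothetical, and the mapping is taken from the rename list above (with subtract spelled as in numpy).

```python
import re

# Old TOPI binary operator names mapped to their numpy-consistent replacements.
# The table mirrors the rename list in these release notes.
RENAMES = {
    "broadcast_add": "add",
    "broadcast_sub": "subtract",
    "broadcast_mul": "multiply",
    "broadcast_div": "divide",
}

# Match whole identifiers only, so names such as my_broadcast_add are left alone.
_PATTERN = re.compile(r"\b(" + "|".join(RENAMES) + r")\b")


def migrate_topi_names(source):
    """Return source text with old broadcast_* operator names replaced."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], source)


if __name__ == "__main__":
    print(migrate_topi_names("c = topi.broadcast_add(a, topi.broadcast_mul(b, d))"))
```

A text-level rewrite like this is only a starting point; it does not distinguish TOPI calls from unrelated identifiers that happen to share a name, so the result should be reviewed by hand.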
Initial low-bit operator support
- Optimized popcount generation on ARM
- general bit-serial convolution and GEMM
- optimized low bit kernels
- parallel optimization
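To illustrate why popcount matters for the low-bit operators above, here is a small pure-Python sketch of the general bit-serial dot product technique. It is not TVM's kernel or schedule: the function names pack_bitplanes and bitserial_dot are hypothetical, and the sketch assumes unsigned inputs that fit in the given bit widths.

```python
def pack_bitplanes(values, bits):
    """Pack a vector into one integer per bit position.

    Bit k of plane i holds bit i of values[k].
    """
    planes = []
    for i in range(bits):
        plane = 0
        for k, v in enumerate(values):
            plane |= ((v >> i) & 1) << k
        planes.append(plane)
    return planes


def bitserial_dot(a, w, a_bits, w_bits):
    """Dot product of two unsigned low-bit vectors using AND plus popcount."""
    a_planes = pack_bitplanes(a, a_bits)
    w_planes = pack_bitplanes(w, w_bits)
    total = 0
    for i, a_plane in enumerate(a_planes):
        for j, w_plane in enumerate(w_planes):
            # popcount(a_plane & w_plane) counts positions where both bits are 1;
            # the pair of planes contributes with weight 2**(i + j).
            total += bin(a_plane & w_plane).count("1") << (i + j)
    return total


if __name__ == "__main__":
    a = [1, 2, 3, 0]
    w = [3, 1, 2, 1]
    print(bitserial_dot(a, w, a_bits=2, w_bits=2))
    print(sum(x * y for x, y in zip(a, w)))
```

The inner loop reduces a whole vector of multiplications to one AND and one popcount per pair of bit planes, which is why a fast popcount instruction on the target matters.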
New topi backend optimization for intel graphics
Adapt AVX schedules for SSE target

Backend

VTA: customized accelerator backend
- custom hardware backend example
- tutorials on how to use customized accelerator
Initial experimental support for HLS backend
Bugfix in SPIRV code generator for vulkan
libdevice support, enable NVPTX backend

Runtime

Introduce NDArrayContainer for managed NDArray
RPC and Device API
- Support communication between big/small endian machines.
- RPC and device API protocol upgrade to support big/small endian communication. This is a non-backward-compatible change: the latest version of the TVM runtime must be used with the RPC
- graduate rpc from contrib, tvm.contrib.rpc->tvm.rpc
- Support tracker in Android RPC, add fault tolerance for AutoTVM
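The big/small endian item above comes down to both sides agreeing on a byte order for data on the wire, independent of the host. The sketch below shows that idea with Python's struct module. It is not TVM's RPC protocol: the function names, the length-prefixed layout, and the choice of little-endian as the wire order are all assumptions made for illustration.

```python
import struct


def encode_int32_array(values):
    """Serialize a count header followed by int32 values.

    The "<" prefix forces little-endian layout regardless of the host CPU.
    """
    header = struct.pack("<I", len(values))
    body = struct.pack("<%di" % len(values), *values)
    return header + body


def decode_int32_array(payload):
    """Inverse of encode_int32_array; also independent of host byte order."""
    (count,) = struct.unpack_from("<I", payload, 0)
    return list(struct.unpack_from("<%di" % count, payload, 4))


if __name__ == "__main__":
    data = encode_int32_array([1, -2, 300])
    print(data.hex())
    print(decode_int32_array(data))
```

Because the format strings name the byte order explicitly, a big-endian host and a little-endian host produce and consume identical bytes, at the cost of a byte swap on one of them.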
BIG.LITTLE aware threadpool
tvm4j graph runtime that runs end to end workload in java
DLPack support
- Support from_dlpack and to_dlpack
- Enables bridges to pytorch
Enable link of stackvm in runtime

NNVM

Tensorflow graphdef frontend
Keras frontend
- improved to support reuse layers, add activations
ONNX
- gather, LRN
CoreML frontend
- Support C-RNN and activation functions
Fix grads for sum and expand_like
Enhanced operator fusion for multiple elemwise branches
Separate nnvm fusion and compilation pass

Misc

Unified build system to cmake, customizable cmake path for vulkan, rocm, cuda

Contributors

See the complete list here. Thanks to all the contributors who contributed to this release.

Code reviewers

@yzhliu topi, tvm4j, nnvm
@kevinthesun nnvm
@Huyuwei topi operators
@tmoreau89 hardware backends
@comaniac fpga backends
@kazum nnvm, opencl backend, fpga
@nishi-t nnvm, opencl backend
@merrymercy topi, arm
@vinx13 gpu backend
@masahi nnvm, topi
@eqy autotvm
@jroesch runtime
@PariksheetPinjari909 frontends, topi
@srkreddy1238 frontends, topi
@FrozenGene autotvm

Compiler

@alex-weaver vulkan
@were hybrid script mode
@nishi-t CUDA, fp16, int8 support
@ktabata intel FPGA support
@kazum xilinx fpga support
@cowanmeg arm optimized popcount
@tmoreau89 VTA customized accelerator

TOPI, graph optimization

@merrymercy AutoTVM
@yzhliu tvm4j graph runtime, x86
@Laurawly intel graphics
@abergeron conda build fix
@nhynes sgx random
@masahi topi, more robust op fusion
@kevinthesun vision ops
@grwlf argmax/min ops
@cowanmeg bit-serial operator
@ehsanmok topi tutorial
@zhiics refactor fusion and compilation into separate pass
@liangfu binary logical operators

Frontends

@srkreddy1238 tutorials for deployment, tensorflow frontend
@siju-samuel coreml, tf frontend
@PariksheetPinjari909 nnvm, slice
@kazum keras
@nishi-t mxnet, nnvm

Deploy

@eqy rpc, thread runtime
@dayanandasiet android tutorials
