v0.4-pre-apache-incubation

@tqchen tqchen released this 03 Sep 19:25

NOTE: This release predates Apache incubation.

This release features several major improvements. The high-level graph optimizer is now part of the TVM repo. Highlights include initial support for AutoTVM for automated optimization and VTA, a customized accelerator backend. Please also check out tvm.ai for the latest blog posts.

The community welcomes new reviewers @kazum @alex-weaver @masahi @zhreshold @PariksheetPinjari909 @srkreddy1238 @eqy, new code owner @merrymercy, and new committer @yzhliu

Change List

Tensor Expression and Optimization

  • Tensor operator primitives
    • Introduce an attrs field on operator primitives (e.g. compute) to store additional metadata; the attrs can be used as hints for scheduling
  • Enable embedding of asm micro-kernels
  • Hybrid python programming model
    • python AST based IR builder interface
    • support GPU programs
  • AutoTVM, Automated tuning, and scheduling
    • basic autotvm infra
    • GPU IR verifier
    • basic autotuning tutorial
    • topi integration
  • ARM support
    • winograd support
    • initial support of ARM autotuning records
  • TOPI Vision
    • Generic GPU sort support (useful for vision)
    • SSD operator support
  • TOPI numpy consistency
    • Rename all binary operators for numpy consistency: broadcast_add -> add, broadcast_sub -> subtract, broadcast_mul -> multiply, broadcast_div -> divide
    • New operators: slice, LRN, equal, not_equal, less, greater
    • tutorials on topi
  • Initial low-bit operator support
    • Optimized popcount generation on ARM
    • general bit-serial convolution and GEMM
    • optimized low bit kernels
    • parallel optimization
  • New topi backend optimization for intel graphics
  • Adapt AVX schedules for SSE target
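
The TOPI binary operator renames listed above affect existing call sites. As a rough migration aid, the sketch below rewrites the old `broadcast_*` names to the new ones in a source string. The helper name `rename_topi_ops` and the regex approach are illustrative assumptions, not part of TVM; only the four name pairs come from the change list (with `subtract` spelled as in numpy).

```python
import re

# Old -> new TOPI operator names, taken from the change list above.
# `subtract` follows the numpy spelling.
TOPI_RENAMES = {
    "broadcast_add": "add",
    "broadcast_sub": "subtract",
    "broadcast_mul": "multiply",
    "broadcast_div": "divide",
}

# Match the old names only when used as attributes, e.g. `topi.broadcast_add`.
_PATTERN = re.compile(r"\.(" + "|".join(TOPI_RENAMES) + r")\b")


def rename_topi_ops(source: str) -> str:
    """Hypothetical helper: rewrite old TOPI binary operator names in `source`."""
    return _PATTERN.sub(lambda m: "." + TOPI_RENAMES[m.group(1)], source)


if __name__ == "__main__":
    old = "C = topi.broadcast_add(A, topi.broadcast_sub(A, B))"
    print(rename_topi_ops(old))
```

A plain text substitution like this can touch strings and comments as well, so the result is best reviewed by hand before committing.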

Backend

  • VTA: customized accelerator backend
    • custom hardware backend example
    • tutorials on how to use customized accelerator
  • Initial experimental support for HLS backend
  • Bugfix in SPIRV code generator for vulkan
  • libdevice support, enable NVPTX backend

Runtime

  • Introduce NDArrayContainer for managed NDArray
  • RPC and Device API
    • Support communication between big/small endian machines.
    • RPC and device API protocol upgrade to support big/small endian communication. This is a non-backward-compatible change: the latest version of the TVM runtime must be used with the RPC
    • graduate rpc from contrib, tvm.contrib.rpc->tvm.rpc
      • Support tracker in Android RPC, add fault tolerance for AutoTVM
  • BIG.LITTLE aware threadpool
  • tvm4j graph runtime that runs end to end workload in java
  • DLPack support
    • Support from_dlpack and to_dlpack
    • Enables bridges to pytorch
  • Enable link of stackvm in runtime
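
The RPC protocol upgrade for mixed-endian communication is not spelled out in these notes. One common way to let machines with different byte orders talk to each other is to fix the byte order on the wire, sketched below with Python's `struct` module. The little-endian wire format, the header layout, and the names `pack_header` and `unpack_header` are assumptions for illustration and do not describe the actual TVM RPC protocol.

```python
import struct
import sys

# Assumed wire layout (hypothetical, not TVM's): a 4-byte signed code followed
# by an 8-byte unsigned payload length, always little-endian ("<").
_HEADER = struct.Struct("<iQ")


def pack_header(code: int, nbytes: int) -> bytes:
    """Encode a header in the fixed wire byte order, whatever the host order."""
    return _HEADER.pack(code, nbytes)


def unpack_header(data: bytes) -> tuple:
    """Decode a header produced by pack_header on any host."""
    return _HEADER.unpack(data)


if __name__ == "__main__":
    wire = pack_header(1, 4096)
    print(sys.byteorder, wire.hex(), unpack_header(wire))
```

Because both sides encode and decode with an explicit byte order, the host's native order (`sys.byteorder`) never leaks into the bytes that are exchanged.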

NNVM

  • Tensorflow graphdef frontend
  • Keras frontend
    • improved to support reuse layers, add activations
  • ONNX
    • gather, LRN
  • CoreML frontend
    • Support C-RNN and activation functions
  • Fix grads for sum and expand_like
  • Enhanced operator fusion for multiple elemwise branches
  • Separate nnvm fusion and compilation pass

Misc

  • Unified build system to cmake, customizable cmake path for vulkan, rocm, cuda

Contributors

See the complete list here. Thanks to everyone who contributed to this release.

Code reviewers

Compiler

TOPI, graph optimization

Frontends

Deploy

  • @eqy rpc, thread runtime
  • @dayanandasiet android tutorials