
Releases: apache/tvm

Apache TVM v0.16.0

28 Apr 07:18

Introduction

The TVM community has worked since the v0.15.0 release to deliver the following exciting new improvements! The highlights of this release are:

  • Initial support for Relax, with dynamic shape and pipeline (see the sketch below)
  • Dlight module for optimizing LLM TIR workloads on GPU
  • Disco module for initial SPMD multi-GPU support

The main tags are below (bold text indicates areas with significant progress):

  • Community, RFCs
  • Adreno, ArmComputeLibrary, Metal, cuda & cutlass & tensorrt, microNPU, Runtime
  • Relax, Dlight, Disco
  • Arith, TIR, TVMScript
  • Docs, CI, Misc, BugFix

Please visit the full listing of commits for a complete view: v0.16.dev0...v0.16.0.rc0.
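
As a quick taste of the Relax support highlighted above, here is a minimal sketch of a dynamic-shape Relax function written in TVMScript and run through the Relax VM. This is illustrative only: the module and function names are made up, and the API spellings (relax.build, relax.VirtualMachine) follow the current documentation rather than anything pinned to this release.

```python
import numpy as np
import tvm
from tvm import relax
from tvm.script import ir_module
from tvm.script import relax as R

@ir_module
class AddSelf:
    @R.function
    def main(x: R.Tensor(("n", 16), "float32")) -> R.Tensor(("n", 16), "float32"):
        # "n" is a symbolic dimension, exercising the dynamic-shape support.
        with R.dataflow():
            y = R.add(x, x)
            R.output(y)
        return y

ex = relax.build(AddSelf, target="llvm")   # compile through the Relax pipeline
vm = relax.VirtualMachine(ex, tvm.cpu())
out = vm["main"](tvm.nd.array(np.ones((4, 16), "float32")))
```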

Community

  • #16695 - Add new key for release signing
  • #16419 - Add new key for release signing

RFCs

This new RFC explores how TVM can generate code for Arm's Scalable Matrix Extension (SME) ISA to improve inference performance on supported Arm®-based hardware implementing the extension.

  • #107 - [RFC] Scalable Matrix Extension enablement

Arith

  • #16735 - [Fixup] Require feature flag for tighter inequality bounds
  • #16588 - Provide tighter ConstIntBounds for special cases
  • #16704 - [Fix] Fix canonical simplification of LE

BYOC

  • #16567 - Skip processed functions in FuseOpsByPattern and RunCodegen

BugFix

  • #16766 - [Target] Added null check to fix segfault at ->defined() in cpu.cc DetectSystemTriple()
  • #16739 - [Ansor] Fixing Ansor Gradient Bug
  • #16820 - [Fix] PAPI docs
  • #16793 - [Fix] fix for numpy 2.0 compatibility
  • #16790 - [Fix] Fix build errors with VS2022
  • #16780 - [Fix] Fix numpy dtype map
  • #16773 - [Fix] Fix the purity flag of "vm.call_tir_dyn" and "kill" ops
  • #16770 - [Hotfix] Revert driver API pass ordering that breaks MLC, mark failing test
  • #16771 - [Fix] Remove redundant "remove_all_unused" in IPC memory lowering
  • #16746 - [Fix][Builtin] Fix "GetQueryPosition" of PagedKVCache
  • #16728 - [Fix] Introduce TVM_DEBUG_WITH_ABI_CHANGE to warn ABI changes in debug mode
  • #16714 - [Fix] PagedKVCache fetching compute stream when copy stream is needed
  • #16684 - [SLM] Produce well-formed Relax for nn.modules.KVCache
  • #16659 - add the default value for DFT in ONNX frontend
  • #16637 - [Transform] Preserve symbolic variables in FuseOps
  • #16649 - [FFI] Add a missing default for datatype lanes
  • #16492 - [Executor] fix debug_executor function debug_get_output
  • #16598 - [Transform] Handle non-composite lambda functions in FuseOps
  • #16565 - [Transform] Keep private non-primitive functions in FuseTIR
  • #16518 - Use x*x*x instead of pow(x,3)
  • #16436 - Ensure that bf16 arrays are created as expected
  • #16361 - Disable SingleEnvThreadVerifier
  • #16289 - [AUTOTVM][FIX] Typo fixes and add a warning in the Droplet Search

CI

  • #16837 - Disable flaky unit test
  • #16765 - [AOT][Testing] Improve output mismatch information on test failure
  • #16661 - add merge_with_main in unity
  • #16611 - [AOT][Testing] Print output values on test failure
  • #16546 - Disable testing that downloads from mxnet
  • #16521 - Fix CI Script and Broken Tests
  • #16502 - Support tvm-bot rerun for tvm-unity task
  • #16435 - Update image tag to 20240126-070121-8ade9c30e
  • #16420 - [WASM] Update emsdk and nodejs version
  • #16384 - Remove NVIDIA_DISABLE_REQUIRE
  • #16382 - In jenkins.cmd_utils.Sh.tee, check for failing subprocess
  • #16366 - Upgrade sccache version to 0.7.*
  • #16369 - Upgrade Unity ci images
  • #16344 - Update docker images tag to 20240105-165030-51bdaec6
  • #16340 - [Unity][UnitTest] Increase atol to resolve flaky CI failure
  • #16337 - [Hexagon][UnitTest] Disable flaky quantization test
  • #16336 - Upgrade cmake version to 3.24.0

Docker

  • #16755 - [SME] Add Fixed Virtual Platform (FVP) and toolchain install
  • #16348 - Upgrade pip in i386 container

Disco

  • #16618 - [Disco] Propagate structlog configuration to disco workers
  • #16639 - [Disco] Expose functions to query the per-worker device/rank
  • #16617 - [Disco] Implement Session.import_python_module method
  • #16715 - [Disco] Propagate structlog/logging config to workers
  • #16845 - [Debug][Disco] Check if a PackedFunc exists before calling it
  • #16817 - [Disco] Reduce Process/ThreadSession message queue reads and writes
  • #16807 - [Disco] Support setting workers' CPU affinity
  • #16375 - [Unity] Fix creation of disco ProcessSession
  • #16821 - [Fix] Add TVM_DLL to Disco session
  • #16752 - [Fix] Lazy import of "psutil" in disco process pool
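
The PR titles above sketch the shape of the Disco API. As a rough, hedged example (ProcessSession and import_python_module are named in the PRs; the remaining spellings are assumptions, not a pinned interface):

```python
# Hedged sketch of a Disco session; names like ProcessSession and
# import_python_module come from the PR titles above, and the registry
# name below is an assumption rather than a documented guarantee.
from tvm.runtime import disco

sess = disco.ProcessSession(num_workers=2)   # one OS process per worker
sess.import_python_module("numpy")           # cf. #16617: import a module on all workers
func = sess.get_global_func("runtime.disco.allreduce")  # assumed global-function name
```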

Dlight

  • #16775 - [Fix][Dlight] (Low-batched-)GeMV on small spatial loops
  • #16429 - [Unity][Dlight][Fix] Reduction rule support dyn-shape epilogue
  • #16351 - [Unity] Add dlight.gpu.Fallback in DispatchSortScan, add argsort, topk, and cumprod
  • #16338 - [Unity][DLight] Introduce Specific Rule for RMSNorm
  • #16251 - [Unity][Dlight] Support dlight gemv rule on nested inner block
  • #16878 - [Dlight] Enhance vectorization loading weight for gemv
  • #16848 - [DLight] Fix a corner case for reduction rule
  • #16701 - [Dlight] Add fallback for low batch gemv with outer reduction
  • #16678 - [Dlight] LowBatchGemv rule only applies to functions with a spatial symbolic var
  • #16665 - [Dlight] Skip GeMV when normalization fails
  • #16579 - [Dlight] Scheduling Low batch GEMM using GEMV-like rule
  • #16321 - [DLight] Skip rule if target is not suitable
  • #16731 - [Dlight] Fix GeMV shared memory estimation
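
Dlight rules such as GEMV, low-batch GEMV, and reduction (the subject of the PRs above) are applied as an IRModule-to-IRModule pass. A minimal sketch, assuming the tvm.dlight Python API and a pre-existing TIR module mod:

```python
# Minimal sketch: schedule a module's TIR functions for GPU with Dlight.
# `mod` is a placeholder IRModule; rule names follow the PRs above.
import tvm
from tvm import dlight as dl

with tvm.target.Target("cuda"):
    mod = dl.ApplyDefaultSchedule(   # tries each rule per function, in order
        dl.gpu.Matmul(),
        dl.gpu.GEMV(),
        dl.gpu.Reduction(),
        dl.gpu.Fallback(),           # catch-all rule, cf. #16351
    )(mod)
```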

Docs

  • #16792 - [Doc] Fix set_axis_separator example
  • #16610 - [Doc] Fixed Docstring usage example in tvm.ir.make_node
  • #16572 - [Doc] Remove MxNet related tutorials
  • #16514 - [Unity][Doc] Document passes that depend on DataflowBlocks and encourage using ConvertToDataflow
  • #16482 - [Doc] Fix Docstring in extern.py for Sphinx
  • #16346 - [Doc] Fix minor error in "Expressions in Relay"

Frontend

  • #16001 - [ONNX] Fix interpreting auto_pad parameters in ConvTranspose operator
  • #16651 - [PaddlePaddle] PaddlePaddle model with NCHW data format that supports quantization
  • #16616 - [PaddlePaddle] Support conv2d when data_format is NHWC
  • [#16526](https://github.com/a...

Apache TVM v0.15.0

19 Jan 00:56
a340dbe

Introduction

NOTE: This is the last release before the unity branch becomes the main branch; it contains no unity features.

The TVM community has worked since the v0.14.0 release to deliver the following exciting new improvements! The main tags are below (bold text indicates areas with significant progress):

  • Community, RFCs
  • Adreno, ArmComputeLibrary, Metal, cuda & cutlass & tensorrt, microNPU, Runtime
  • Frontend & Relay
  • Arith, TOPI, TIR, TVMScript
  • Docs, CI, Misc, BugFix

Please visit the full listing of commits for a complete view: v0.14.0...v0.15.0.

Community

  • #16172 - Yixin Dong -> Reviewer
  • #16162 - Shuai Yuan -> Committer
  • #16164 - Qiang Zhang -> Committer
  • #16166 - Bohan Hou -> PMC
  • #16165 - Ruihang Lai -> PMC

RFCs

  • #105 - Add a new backend language: SYCL

Adreno

  • #15991 - [CI] Enhancements to Adreno specific CI utils
  • #15786 - [TOPI] Add conv2d transpose nchw texture schedule

Arith

  • #16227 - Simplify nested if_then_else when a constant appears in then_expr

ArmComputeLibrary

  • #15990 - [ACL] Update Compute Library to v23.08

Metal

  • #16192 - [Device] Fix metal warp size
  • #16033 - [Codegen] Disable cross-function call in Metal codegen

cuda & cutlass & tensorrt

  • #16061 - [CUDA] Add an option for profiling cuda kernels

microNPU

  • #16003 - [microNPU][ETHOSU] Fix ConcatRewriter args processing
  • #15929 - [microNPU][ETHOSU] Fix rounding mode in requantize operation

Runtime

  • #15896 - [CLML] Fix for CLML ops and enable more test cases
  • #16133 - Parallel-for with threading backend
  • #16066 - Support clear global memory allocators
  • #16030 - Introduce TVM_MODULE_VTABLE Macros

BugFix

  • #16269 - Update pillow usage
  • #16272 - Fixed Inappropriate Logical Expression
  • #16216 - [TIR] Fix dynamic smem merge leaf alloc
  • #16190 - Fix the error of reloading the model library on the ROCm platform: "MIOpen Error: No invoker was registered for convolution forward."
  • #16167 - [Relay][Pytorch] Fix missing .dtype
  • #16091 - [Fix] Fix topi.rms_norm with float32 upscale
  • #16081 - [Fix] Broken Windows Build with LLVM
  • #16051 - [Fix][TIR] Fix dtype issues for match_buffer and ramp node
  • #14655 - [VTA] Fix FSIM compile error on macOS
  • #16021 - [FFI] Typo fix of IncRef to DecRef
  • #16010 - [Fix][TIR] fix mul dtype mismatch
  • #16000 - [Fix][TIR] fix symbolic strides lower
  • #15970 - [Hotfix] Mark python-FFI handling with TVM_DLL
  • #15965 - [CI] Better to pass the build folder

CI

  • #16110 - Refactor unittest folder
  • #16055 - Fix broken links about Jenkins
  • #16062 - Use LLVM 17 for tests on ci_arm
  • #16018 - [Tests] Fix work_dir location used by test_micro_tuning_with_meta_schedule
  • #16019 - [Tests] Check int8+int32 testcases in test_estimate_peak_flops_cpu
  • #16017 - [Tests] Fix str vs. int comparison in test_num_threads

Docs

  • #16282 - [Doc] Fix minor error in doc (Add an operator to Relay)
  • #16152 - [DOC] Add v0.14.0 docs to site
  • #16127 - Revert "[#15157][Rust][Doc] Re-enable the Rust documentation build (#15213)"
  • #16097 - Add missing backtick to contribute/code_guide.rst
  • #16089 - Fix error on linting by adding --rev argument
  • #16024 - Update release_process.rst about version number modification

Frontend & Relay

  • #16243 - [TFLite] Add support for quantized mirror pad
  • #15914 - [TFLite] Support quantized SQUARE
  • #16159 - [KERAS] Fix bug concat convert for NCHW
  • #16319 - [Torch] add aten:broadcast_to
  • #16131 - [Pytorch] Add support for aten::unflatten
  • #16105 - [Pytorch] Add support for aten::bitwise_and
  • #16079 - [Pytorch] Add support for aten::swapaxes operator
  • #15502 - [Pytorch] aten::copy_ support for pytorch
  • #16180 - [Pytorch] Fix bug when converting models with torch.nn.ParameterList
  • #16143 - [Pytorch] Add support for aten::scaled_dot_product_attention
  • #16123 - [Pytorch] Add support for aten::linalg_vector_norm
  • #16171 - [Frontend] Preserve Pytorch Span Names
  • #16217 - [Frontend][QNN] fix access param_debug_name_map to node output name in fx-quantized graph node replacement
  • #16199 - [Frontend] Add support for aten::concat
  • #16151 - conv3d depthwise bug fix
  • #15928 - Expose qnn ops directly from relay.qnn module

TOPI

  • #16259 - Add support for group_conv3d_transpose_ncdhw for generic
  • #16052 - Enhance topi.nn.matmul
  • #16080 - Reduce code redundancy in conv2d weights transformation
  • #16248 - [TOPI] Add support for group_conv1d_transpose_ncw for generic
  • #16106 - [TOPI] Add conv2d NHWC hybrid schedule for arm_cpu

TIR

  • #16239 - [Schedule] TileWithTensorIntrin skip incorrect ComputeInline for input-padding
  • #16236 - ConvertSSA process entry func first
  • #16070 - [Transform] Introduce new InjectPermutedLayout pass
  • #16083 - Enhance Python Type Annotations for TIR Expr
  • #16073 - Support more mma intrinsics and get_mma_intrin_group utility
  • #16076 - Enhance Python Type Annotations for TIR stmt
  • #16074 - Fix the thread binding iter_var dtype in Bind primitive
  • #16063 - Fix pass RenewDefs error in gather/take case
  • #16027 - Fix software pipeline with dynamic loop extent

TVMScript

  • #16271 - Disable concise scoping when the scope stmt is explicitly annotated
  • #16041 - Fix mismatched dtype of IterVar in T.thread_binding
  • #15953 - [TIR] Pretty print TIR LLVM function name
  • #15972 - delete print extra info at parsing

Misc

  • #16279 - replace deprecated np.int with int to avoid crash
  • #16262 - Update conv2d.py
  • #16255 - [Support] Add Interrupt Handling in Pipe
  • #16104 - [LoopPartition] Fix a bug of LoopPartition in single point scenarios
  • #16231 - [Target] Add Jetson AGX Orin tags
  • #16221 - remove deprecated np.int in slice converter (pytorch)
  • #16214 - [Python] Fix setup.py for inplace build
  • #16174 - Bump cryptography from 37.0.2 to 41.0.6 in /docker/python
  • [#16202](#16...

Apache TVM v0.14.0

23 Oct 07:41
7315c9d

Introduction

The TVM community has worked since the v0.13.0 release to deliver the following exciting new improvements! The main tags are below (bold text indicates areas with significant progress):

  • Community, RFC
  • Arith, MetaSchedule
  • Adreno, ArmComputeLibrary, Hexagon, Metal, OpenCL & CLML, ROCm, Vulkan, cuda & cutlass & tensorrt, microNPU, web
  • Runtime, TVMC, AOT, LLVM, microTVM, CMSIS-NN
  • Frontend, Relay, BYOC
  • TOPI, TIR, TVMScript
  • Docs, CI, Docker
  • Misc, BugFix

Please visit the full listing of commits for a complete view: v0.13.0...v0.14.0.

Community

  • #15307 - Qingchao Shen -> Reviewer
  • #15619 - community strategy decision process

RFC


AOT

  • #15301 - Avoid call_extern() with incorrect argument count
  • #15181 - Remove workaround to help resolve test flakiness

Adreno

  • #15830 - Minor changes for Adreno docs and help scripts
  • #15671 - [VM] Fix using buffers for weights in VM
  • #15391 - Small fixes in Adreno schedules

Arith

  • #15881 - Simplify the result of non-divisible floordiv
  • #15665 - Fix detect non-divisible iteration form like (x % 255) // 16
  • #15638 - MLIR PresburgerSet compile fix mlir >= 160
  • #15628 - Added simplification rule for multiple equality compares
  • #15558 - Fix detect linear equation with uint var
  • #14690 - Add tvm::arith::PresburgerSetNode to work with Presburger Set in MLIR
  • #15555 - Fix handling of overlapping predicates
  • #15471 - Enhance Canonical Simplify for LE
  • #15228 - Enhance buffer shape bound deduction to include offset

ArmComputeLibrary

  • #15600 - [ACL] Update Compute Library to v23.05.1
  • #15344 - [ACL] Update Compute Library to v23.05

BugFix

  • #15891 - [Relay] Fix axis parsing of repeat converter in the MXNet frontend
  • #15873 - [Fix] Remove duplicated words from comments, NFC
  • #15868 - [Relay] Fix conv transpose with default strides in ONNX frontend
  • #15773 - [CPP] Fix cpp deploy bug
  • #15778 - [Hotfix] Fix Windows Pipe
  • #15748 - Move symbols that are relevant to the runtime from libtvm to…
  • #15752 - [Relay] Fix the wrong calculation logic of operator flip in the PyTorch frontend
  • #15715 - [Relay] Fix the wrong implementation of operator Threshold in OneFlow
  • #15711 - [Strategy] Fix arm_cpu int8 conv2d strategy for dotprod and i8mm targets
  • #15717 - [Relay] Fix the wrong implementation of Softplus in OneFlow
  • #15677 - [Arith] IterMapRewriter abort rewriting once failure
  • #15629 - [VTA] tvm.tir.Call has no name attribute
  • #15584 - [Relay][Strategy] Enable compile time transformation of weights matrix for arm_cpu NHWC quantized conv2d
  • #15542 - [Fix] Fix the typo in compile flag
  • #15484 - [TOPI] Fix a bug in arm_cpu int8 conv2d i8mm schedule
  • #15473 - [Relay] Fix some bugs of dominator pattern
  • #15478 - [TIR] ThreadSync with shared.dyn awareness
  • #15406 - [TIR] Ensure the Var's scope is correct
  • #15399 - [TIR] Fix multi-grouped multi-warp allreduce
  • #15350 - [Relay] fix a bug of printing dataflow pattern
  • #15385 - Work around "Internal Compiler Error" in MSVC
  • #15294 - [Bug][Relay] fix relay frontend pytorch op addmm bug
  • #15323 - [Fix][TIR] LowerThreadAllreduce with correct thread mask
  • #15291 - [Relay][GraphExecutor] Fix set_input_zero_copy() precision bug
  • #15225 - Fix function to read all file

CI

  • #15903 - [Target] Add LLVM functions for current system info
  • #15897 - [ADRENO] Few updates to Adreno docker setup
  • #15836 - Update ci-gpu image
  • #15668 - Allow Limit CPUs in Docker
  • #15568 - [Testing] Allow Capitalized name in CompareBeforeAfter
  • #15519 - [TEST] Run tests/python/relay/aot tests in ci-cortexm
  • #15485 - Remove cython version pin
  • #15421 - Bump Flax and Jaxlib versions to fix Jaxlib install error
  • #15226 - Add ml_dtypes dependency for all docker images
  • #15353 - Pin cython version to fix cython compilation
  • #15352 - Make Graviton3 default AArch64 job runner node
  • #15339 - Update test to include unique attribute
  • #15277 - [Testing] Return BenchmarkResult in local_run and rpc_run
  • #15268 - [Testing] Add tvm.testing.local_run
  • #15136 - [UnitTest][NVPTX] Avoid cascading failures from CUDA postproc

CMSIS-NN

  • #15747 - Move CMSIS_5 from SHA to release based upgrade
  • #15407 - Support for Softmax Int16 operator

Docker

  • #15799 - Add LLVM 17 to the LLVM install script
  • #15862 - Upgrade oneflow to v0.8.0
  • #15819 - Install oneflow from PyPi
  • #15310 - Update ci-cortexm docker image
  • #15293 - tensorflow_aarch64 package upgrade

Docs

  • #15619 - community strategy decision process
  • #15508 - Add v0.13.0 docs to site
  • #15213 - [#15157][Rust][Doc] Re-enable the Rust documentation build

Frontend

  • #15821 - [TFLite] Support quantized ELU
  • #15844 - [TFLite] Fix test failures caused by div-by-zero
  • #15798 - [TFLite] Support quantized Pow
  • #15829 - [Relay][Keras][Bugfix] fix the converters of GRU and SimpleRNN about the go_backwards attribute
  • #15838 - Fix unnecessary pylint errors
  • #15802 - [SkipCI][Hotfix][TFLite] Disable test of quantized floor mod
  • #15790 - [TFLite] Support quantized LESS_EQUAL
  • #15775 - [TFLite] Support quantized GREATER_EQUAL
  • #15769 - [TFLite] Support quantized NOT_EQUAL
  • #15768 - [TFLite] Support quantized div
  • #15746 - [TFLite] Support quantized LESS
  • #15733 - [TFLite] Support quantized floor_mod
  • #15724 - [TFLite] Support quantized floor_div
  • #15602 - [ONNX][BugFix] Support If body with free variable from graph input
  • #15472 - [Relay][TFLite] Fix in qnn.conv2d when parameter groups not equal to 1
  • #15117 - [TFLITE] Add support for TFLite's regular NMS operator
  • #15415 - [ONNX] add onnx Mish operator
  • #15422 - [Keras] Add support for swish activation
  • #15370 - [Relay][Pytorch...

Apache TVM v0.13.0

14 Jul 02:37
97c5de6

Introduction

The TVM community has worked since the v0.12.0 release to deliver the following exciting new improvements! The main tags are below (bold text indicates areas with significant progress):

  • Community, RFC;
  • Frontend: TensorFlow/TFLite, Pytorch/Torch, Paddle, keras;
  • Runtime: Adreno, OpenCL & CLML, ROCm, CUDA & CUTLASS & TensorRT, Ethosn, Vulkan, Hexagon, Metal, others about runtime;
  • Relay, BYOC, TOPI, Arith, TIR, TVMScript, MetaSchedule;
  • microTVM, AOT, TVMC, LLVM;
  • CI, BugFix, Docs, Docker, Misc;

Please visit the full listing of commits for a complete view: v0.12.0...v0.13.0.

Community

  • #15086 - Aleksei-grovety -> Reviewer
  • #14676 - Jiajun Jiang -> Reviewer
  • #14677 - Qiang Zhang -> Reviewer
  • #14622 - Sunghyun Park -> Reviewer
  • #14578 - Zihao Ye -> Committer
  • #14853 - Anirudh Sundar Subramaniam -> Committer
  • #14772 - Add new key for release signing

RFC


Frontend

  • #14830 - Use f-strings for string formatting, NFC
  • Keras
    • #15122 - [Relay][Keras] Fix SeparableConv2D conversion in dilation_rate attribute
    • #15107 - [Relay][Keras] Fix a wrong variable name in keras frontend
    • #15053 - [Relay][Keras] Fix the wrong implementation logic about cropping2D
    • #15082 - [Relay][Keras] Fix UpSampling2D about the wrong assertion about size
    • #15060 - [Relay][keras] Fix the bug about the attribute 'output_padding' in Deconv
    • #14707 - [Keras] Fix a bug about the alpha attribute in LeakyReLU which led to a pass conflict
    • #15175 - [Relay][Keras] Fix concatenate convert function in axis parsing
  • Paddle
    • #14801 - [Paddle] [PaddlePaddle Hackathon 4] Add attribute support for gaussian_random/softplus/Conv3d/Conv2d
    • #14973 - [Paddle] [PaddlePaddle Hackathon 4] Add convert support for tanhshrink/pool3d/set_value ops for paddle frontend
    • #14826 - [Paddle] [PaddlePaddle Hackathon 4] Add convert support for p_norm/roi_align/softmax_with_cross_entropy
    • #14575 - [Paddle] [PaddlePaddle Hackathon 4] Add attribute support for dropout/hard_sigmoid/pixel_shuffle
  • TFLite
    • #14667 - [TFLite] Support for quantized squared difference
    • #14819 - [TFLite] Generate name when tensor name is missing
    • #15173 - [FRONTEND][TFLITE] Fix int16 transpose conv loading
  • TensorFlow
    • #14546 - [Tensorflow] Fix conv2d_transpose for NHWC layout
  • PyTorch
    • #14747 - [PyTorch] Add aten::new_zeros
    • #14699 - [Torch] fix typo in new_full
    • #14963 - [PyTorch] Support use_input_stats in instance_norm
    • #14930 - Fix pytorch axis
  • ONNX
    • #15017 - [ONNX] Fix bug in scatter_elements

Runtime

  • #15182 - Add weak symbol to builtin fp16
  • #15161 - Clean TVM stacktrace in error messages
  • #15162 - Support void as dtype in FFI
  • #14902 - Update Module and Registry to use String Container
  • #14967 - [Runtime,RPC] Use f-strings for string formatting, NFC
  • #14887 - Make systemlib unique per prefix
  • #14775 - Added str for tvm._ffi.runtime_ctypes.TVMArray
  • #14656 - Fix Can't "query_imports" Bug of VM Executable

Adreno

  • #15061 - [TOPI] Fix problem with ceil_log2
  • #14996 - [OpenCL] Fix conv2d when output channels < 4

CMSIS-NN

  • #15059 - Update CMSIS-NN release to v4.1.0

OpenCL & CLML

  • #14972 - [OPENCL] Always use convert_T for type conversion
  • #14995 - [OpenCL] Improve diagnostic message
  • #14833 - [Codegen][OpenCL] fix ambiguous selection operator call
  • #14792 - [OpenCL] Refactor OpenCL runtime to support SPIRV binary ingestion
  • #14922 - [OpenCLML] Refactor and introduce on-chip memory and memory planner
  • #14949 - [CodegenC] Updated unit test for sorted CodegenC output
  • #14767 - [OpenCLML] Transposed convolution support and other fixes

cuda & cutlass & tensorrt

  • #14751 - [CUDA] Fixed the call of the min function in the schedule for cuda
  • #14798 - [CUTLASS] Add NDEBUG option to CUTLASS compile to speed up attention kernel
  • #14782 - [Bugfix][Codegen][CUDA] Wrong casting in ASM

Metal

  • #14962 - Fix int8 vectorized cast
  • #14846 - Fix vectorized select
  • #14727 - Update metal runtime to directly store kernel map
  • #14671 - Fix flaky memory issue due to racing

Vulkan

  • #15035 - [Vulkan] Allow DeclBuffer in CodeGenSPIRV
  • #14817 - [Vulkan] Add cooperative matrix support

Hexagon

  • #14997 - Remove "c" as aot_host_target tvm/contrib/hexagon/pytest_pl…
  • #14948 - Update instructions to compile hexagon runtime
  • #14965 - Add support for v73, make v68 default
  • #14720 - [TIR] Add get_vtcm_allocation_sizes with lowering
  • #14567 - [TIR] Use the "target" value in T.func_attr for VTCM limit

ROCm

  • #15106 - [TensorIR] AMD Matrix Core Support
  • #15088 - [Target] Replace rocm arch parsing from int to string

microTVM

  • #14872 - Use self.close_transport() on error

AOT

  • #15033 - Avoid Var-to-Var Let binding in AOTExecutorCodegen
  • #15032 - Remove duplication in tvm.testing.aot.compile_models
  • #14529 - Fix warning on dropping const in TVMAotExecutor_GetInputName

microNPU

  • #15159 - [microNPU][ETHOSU] Fix compiler attributes types
  • #15147 - [microNPU][ETHOSU] Add option to disable copying constants for case without cascader
  • #15069 - [microNPU][ETHOSU] Fix SoftMax legalization parameters
  • #15115 - [microNPU][ETHOSU] Upgrade to 23.05 version of Arm(R) Ethos(TM)-U NPU drivers
  • #15114 - [microNPU] Upgrade Vela to v3.8.0
  • #15104 - [microNPU][ETHOSU] Fix minimum buffer size
  • #15063 - [microNPU][ETHOSU] Fix CopyComputeReordering pass arguments
  • #14861 - [microNPU][ETHOSU] Add offloading to the NPU the nn.avg_pool2d operator with a stride > 3
  • #14765 - [microNPU][ETHOSU] Channel pad offloaded to NPU
  • #14774 - [microNPU][ETHOSU] Fix Softmax quantization parameters
  • #14629 - [microNPU][ETHOSU] Softmax int8 legalization support
  • #14353 - [microNPU] Add support for MEAN with uint8 ifm
  • #14587 - [microNPU] Fix skip tests when Vela is not present
  • #14464 - [microNPU][ETHOSU] Add restrictions to convert to NHCWB16 layout in LayoutOptimization pass

BYOC

  • #15046 - Add GEMM kernel from FasterTransformer as submodule
  • #15029 - Hide internal cutlass symbols

Re...


Apache TVM v0.12.0

17 May 07:08
47e0440

Introduction

The TVM community has worked since the v0.11.1 release to deliver the following exciting new improvements! The main tags are below (bold text indicates areas with significant progress):

  • Community, RFC;
  • Runtime: ACL(ArmComputeLibrary), Adreno, OpenCL & CLML, ROCm, CUDA & CUTLASS & TensorRT, Ethosn, CRT, Hexagon, Metal, Web & WASM, others about runtime;
  • Frontend: TensorFlow/tflite, Pytorch/Torch, Paddle, OneFlow, keras;
  • TE, Relay, BYOC, TOPI, Arith, TIR, TVMScript, MetaSchedule, Schedule;
  • CI, Tests, BugFix, Docs, Docker, Build;
  • Android, microTVM, Target, AutoTVM, AOT, LLVM.

Please visit the full listing of commits for a complete view: v0.11.1...v0.12.0.

Thanks to @ysh329 for the great effort managing the release process as release manager.

Community

RFC


Runtime

ArmComputeLibrary

Adreno

OpenCL & CLML

ROCm

CMSIS-NN

CUDA & CUTLASS & TensorRT

Ethosn

CRT

Hexagon

Metal

MicroNPU


Apache TVM v0.11.1

09 Mar 19:47
046910a

Introduction

This is a v0.11.1 bug fix release on top of v0.11.0 (see #13899), incorporating a fix to the Python dependencies description.

What's Changed

Python dependencies

  • Add typing_extensions requirement (#14244)
  • Adjust version to 0.11.1 (#14300)

Apache TVM v0.11.0

25 Feb 11:33
cd9193a

Introduction

The TVM community has worked since the v0.10.0 release to deliver the following new exciting improvements!

  • Metaschedule

    • Tuning API improvements and anchor-block tuning
  • TVMScript metaprogramming

    • Lots of progress with TVMScript, with the introduction of a core parser, AST, Evaluator, Source and diagnostics

And many other general improvements to microTVM, code quality, CI, frontends, and more! Please visit the full listing of commits for a complete view: v0.10.0...v0.11.0.
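
For the Metaschedule tuning API improvements called out above, here is a hedged sketch of the post-cleanup entry points (argument spellings are assumed from the tvm.meta_schedule docs, and Mod is a placeholder IRModule):

```python
# Hedged sketch of the cleaned-up tuning flow (cf. #12895 below); `Mod`
# is a placeholder TIR IRModule and the argument names are assumptions
# taken from the tvm.meta_schedule documentation.
from tvm import meta_schedule as ms

database = ms.tune_tir(
    mod=Mod,
    target="llvm -num-cores 4",
    work_dir="./tuning_logs",
    max_trials_global=64,
)
sch = ms.tir_integration.compile_tir(database, Mod, target="llvm -num-cores 4")
```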

RFCs

These RFCs have been merged in apache/tvm-rfcs since the last release.

What's Changed

Note that this list is not comprehensive of all PRs and discussions since v0.10. Please visit the full listing of commits for a complete view: v0.10.0...v0.11.0.

Adreno

  • [Adreno] Add global pooling schedule (#13573)
  • [Adreno] Add documentation for Adreno deployment (#13393)
  • [Adreno] Fix mem_scope annotations for prim funcs having several heads (#13153)
  • [Adreno] Adapt reduction schedule for adreno (#13100)
  • [Adreno] Fix winograd accuracy (#13117)
  • [Adreno][Textures] Fix static memory planner (#13253)
  • [DOCKER][Adreno] Docker infra for Adreno target with CLML support (#12833)

AoT

  • [AOT] Add CreateExecutorMetadata analysis pass (#13250)
  • [AOT] Add CreateFunctionMetadata analysis pass (#13095)
  • [AOT] Sanitize input/output name in runtime (#13046)

Arith

  • [Arith] Add internal NarrowPredicateExpression utility (#13041)
  • [Arith] Optional rewriting and simplification into AND of ORs (#12972)

arm

  • [bfloat16] Fixed dtype conversion in the arm_cpu injective schedule (#13417)

AutoTVM

  • [AutoTVM] Introducing multi_filter into ConfigSpace autotvm (#12545)

Build

  • [BUILD] Re-enable ccache by default (#12839)

CI

  • [ci] Fix docs deploy (#13570)
  • [ci] Split Jenkinsfile into platform-specific jobs (#13300)
  • [ci] Dis-allow any non-S3 URLs in CI (#13283)
  • [ci] Split out C++ unittests (#13335)
  • [CI] Separate the ci scripts into Github and Jenkins scripts (#13368)
  • [ci] Assert some tests are not skipped in the CI (#12915)
  • [ci] Ignore JUnit upload failures (#13142)
  • [ci] Lint for trailing newlines and spaces (#13058)
  • [ci] Template build steps (#12983)
  • [ci][docker] Allow usage of ECR images in PRs (#13590)
  • [ci][docker] Read docker image tags during CI runs (#13572)
  • [ci][wasm] Add package-lock.json to git (#13505)

CL

  • [ACL] Enable int8 data type in pooling operators (#13488)

CMSIS-NN

  • [CMSIS-NN] Support for int16 conv2d (#12950)
  • [CMSIS-NN] Support for int16 in fully connected layer (#13484)

DNNL

  • [AMP] refine AMP and the corresponding tests for bfloat16 (#12787)

Docker

  • [Docker] Refactor timezone script and NRF installation (#13342)

Docs

  • [docs] Fix empty code blocks in tutorials (#13188)

Ethos-N

  • [ETHOSN] Consolidate target string usage (#13159)
  • [ETHOSN] Throw error message when inference fails (#13022)
  • [ETHOSN] Inline non-compute-intensive partitions (#13092)
  • [ETHOSN] Transpose fully connected weights (#12970)
  • [ETHOSN] Support conversion of add/mul to requantize where possible (#12887)

Frontend

  • [TFLite] Enable int64 biases for int16 quantized operators (#12042)

Hexagon

  • [Hexagon] Add HVX quant conv2d implementation (#13256)
  • [Hexagon] Add test to show scheduling of resnet50 with async dma pipe… (#13352)
  • [Hexagon] Enable Hexagon User DMA bypass mode (#13381)
  • [Hexagon] Lint tests part 2 (#13271)
  • [Hexagon] Add pylint on tests (#13233)
  • [Hexagon] Add E2E test demonstrating how to apply blocked layout schedule to conv2d via metaschedule (#13180)
  • [Hexagon] Add a test to show how to use multi input async dma pipelin… (#13110)
  • [Hexagon]: Add upload function to hexagon session (#13161)
  • [Hexagon] Add support for instrumentation based profiling for Hexagon (#12971)
  • [Hexagon] Add power manager (#13162)
  • [Hexagon] Add scripts for e2e MetaSchedule tuning demonstration (#13135)
  • [Hexagon] Add feature to copy logcat to --hexagon-debug and add new --sysmon-profile option to run sysmon profiler during the test (#13107)
  • [Hexagon] Async DMA pipelining test suite (#13005)
  • [Hexagon] Enable multi input Async DMA; same queue / stage (#13037)
  • [Hexagon] Do not use target test fixture in Hexagon tests (#12981)
  • [Hexagon] 3-stage pipeline; multi queue async DMA for cache read / write (#12954)
  • [Hexagon] vrmpy tensorization for e2e compilation of int8 models (#12911)
  • [Hexagon] Support template-free meta schedule tuning (#12854)
  • [Hexagon] depth_to_space slice op (#12669)
  • [Hexagon] Make allocate_hexagon_array a hexagon contrib API (#13336)
  • [Hexagon] Add fix for vtcm allocation searches (#13197)
  • [MetaSchedule][Hexagon] Add postproc for verifying VTCM usage (#13538)
  • [Hexagon][QNN] Add TOPI strategies for qnn ops mul/tanh/subtract (#13416)
  • [Logging][Hexagon] Improve logging on Hexagon (#13072)
  • [Hexagon] [runtime] Per-thread hardware resource management (#13181)
  • [Hexagon] [runtime] Create objects to manage thread hardware resources (#13111)
  • [QNN][Hexagon] Disable QNN canonicalization pass (#12398)
  • [Hexagon] [runtime] Manage RPC and runtime buffers separately (#13028)
  • [Hexagon] [runtime] VTCM Allocator (#12947)
  • [TOPI][Hexagon] Add schedule and test for maxpool uint8 layout (#12826)
  • [TOPI][Hexagon] Implement quantize op for hexagon (#12820)
  • [Meta Schedule][XGBoost] Update the custom callback function of xgboost in meta schedule (#12141)
  • [TIR] [Hexagon] Add vdmpy intrinsic and transform_layout for tests (#13557)
  • [Hexagon] [runtime] Support VTCM alignments of 128 or 2k (#12999)
  • [HEXAGON][QHL] Clipping the inputs of HVX version of QHL Sigmoid operation (#12919)
  • [Hexagon] [runtime] Add user DMA to device API resource management (#12918)

LLVM

  • [LLVM] Emit fp16/fp32 builtins directly into target module (#12877)
  • [LLVM] Switch to using New Pass Manager (NPM) with LLVM 16+ (#13515)

MetaSchedule

  • [MetaSchedule] Make MultiLevelTiling apply condition customizable (#13535)
  • [MetaSchedule] Enhance Database Validation Script (#13459)
  • [MetaSchedule] Fix Dynamic Loop from AutoBinding (#13421)
  • [MetaSchedule] Support schedules with cache read in RewriteLayout (#13384)
  • [MetaSchedule] Improve inlining and VerifyGPUCode for quantized model workload (#13334)
  • [MetaSchedule] Add JSON Database Validation Scripts (#12948)
  • [MetaSchedule] Fix the order of applying AutoInline in ScheduleUsingAnchorTrace (#13329)
  • [MetaSchedule] Refactor ScheduleRule Attributes (#13195)
  • [MetaSchedule] Improve the script for TorchBench model tuning & benchmarking (#13255)
  • [MetaSchedule] Enable anchor-block tuning (#13206)
  • [MetaSchedule] Introduce a variant of ModuleEquality to enable ignoring NDArray raw data (#13091)
  • [MetaSchedule] Consolidate module hashing and equality testing (#13050)
  • [MetaSchedule] Support RewriteLayout postproc on AllocateConst (#12991)
  • [MetaSchedule] Tuning API cleanup & ergonomics (#12895)
  • [MetaSchedule] Fix XGBoost Import Issue (#12936)
  • [MetaSchedule] Add Script for TorchBench Model Tuning & Benchmarking (#12914)
  • [MetaSchedule] Restore num_threads parameter in tuning API (#13561)
  • [MetaSchedule] TorchBench tuning script: add option to disallow operators in sub graph (#13453)
  • [MetaSchedule] Fix segfault in gradient based scheduler (#13399)
  • [MetaSchedule] Add from-target Defaults for x86 VNNI Targets (#13383)
  • [MetaSchedule] Fix Task Hanging in EvolutionarySearch (#13246)
  • [MetaSchedule] Allow skipping exact NDArray rewrite in RemoveWeightLayoutRewriteBlock (#13052)
  • [MetaSchedule][UX] Support Interactive Performance Table Printing in Notebook (#13006)
  • [MetaSchedule][UX] User Interface for Jupyter Notebook (#12866)

microNPU

  • [microNPU] Upgrade Vela to v3.5.0 (#13394)
  • [microNPU] Fixed MergeConstants pass on striped networks (#13281)

microTVM

  • [microTVM] Modernize Arm Cortex-M convolution schedules (#13242)
  • [microTVM] Improve code reuse in Corstone300 conv2d tests (#13051)
  • [microTVM] Add Cortex-M DSP schedules for optimal conv2d layouts (#12969)
  • [microTVM] Use default Project Options in template projects and add Makefile for Arduino template project (#12818)
  • [microTVM] Generalize depthwise_conv2d schedule (#12856)
  • [microTVM] add the option to open a saved micro project for debugging (#12495)
  • Added macro generation in MLF export (#12789)
  • [microTVM][Arduino] Add serial_number to project options and tests (#13518)
  • [microTVM][Zephyr] Add 'serial_number' option (#13377)
  • [microTVM][PyTorch][Tutorial] Adding a PyTorch tutorial for microTVM with CRT (#13324)

Misc

  • [CodegenC] Explicit forward function declarations (#13522)
  • [FQ2I] Support converting dense -> add to qnn.dense -> add -> requantize (#13578)
  • [Minor][Testing] Consolidate IRs into corresponding functions (#13339)
  • Add recursive on loop with marked kUnrolled (#13536)
  • Skip stride check if shape is 1 in IsContiguous (#13121)
  • [TEST] CPU feature detection for x86 and ARM dot product instructions (#12980)
  • [Node] Expose StructuralEqual/Hash handler implementation...

Apache TVM v0.10.0

17 Oct 17:44

Introduction

The TVM community has worked since the v0.9 release to deliver the following new exciting improvements!

  • Metaschedule
    • Software pipelining and padding for irregular shapes for auto tensorization
    • Stabilized and polished user-interfaces (e.g. database changes, tune_relay)
    • A new MLP-based cost model
  • TIR
    • New schedule primitive for PadEinsum
    • A new TIR node: DeclBuffer
    • INT8 Intrinsics for TensorCores for CUDA!
  • microTVM
    • Improved schedule primitives for ARM v8-m ISA

And many other general improvements to code quality, TVMScript, and more! Please visit the full listing of commits for a complete view: v0.9.0...v0.10.0rc0.
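
On the padding-for-irregular-shapes highlight above, a rough sketch of the PadEinsum primitive (see also #12750 below); MatmulMod and the block name are placeholders, and the semantics follow the tir.Schedule docstring:

```python
# Rough sketch of PadEinsum: pad an irregular (e.g. 127x127x127) matmul's
# iteration domain up to multiples of 16 so tensor intrinsics can apply.
# `MatmulMod` and the block name "matmul" are placeholders.
import tvm

sch = tvm.tir.Schedule(MatmulMod)
block = sch.get_block("matmul")
sch.pad_einsum(block, padding=[16, 16, 16])  # one factor per block iter (i, j, k)
```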

RFCs

These RFCs have been merged in apache/tvm-rfcs since the last release.

What's Changed

Please visit the full listing of commits for a complete view: v0.9.0...v0.10.0rc0.

Note that this list is not comprehensive of all PRs and discussions since v0.9. A non-truncated summary can be found here: #12979

TIR

  • #12720 - [TIR] Implement API for padded layout transformations
  • #12797 - [TIR] Construct the inverse in SuggestIndexMap
  • #12827 - [TIR] Support pattern matching argmax/argmin generated by TOPI
  • #12750 - [TIR, Schedule] Add schedule primitive PadEinsum
  • #11639 - [TIR][Meta-Schedule] Tuple-reduction scheduling support
  • #12515 - [TIR][Arith] Add more strict checking in imm construction and folding.
  • #12717 - [TIR, Schedule] Check consumer in-bound and covered in reverse_compute_inline
  • #12652 - [TIR] Handle axis_separators during FlattenBuffer
  • #12623 - [TIR] Expose MMA-related PTX builtins
  • #12607 - [TIR][Schedule] enhance compute_at and reverse_compute_at primitive to choose possible position
    ...

Apache TVM v0.9.0

14 Jul 22:33
d361585

Introduction

The TVM community has worked since the v0.8 release to deliver many exciting features and improvements. v0.9.0 is the first release on the new quarterly release schedule and includes many highlights, such as:

  • MetaSchedule's full implementation
  • ARM cascading scheduler for Arm Ethos(TM)-U NPUs
  • Collage which brings tuning to BYOC
  • Several microTVM improvements
  • New tvm.relay.build parameters: runtime= and executor=
  • AOT - Support for the C++ runtime (with llvm and c targets only) and support for host-driven AOT in the C runtime
  • Hexagon RPC support
    • Testing via Hexagon SDK simulator and on device via Snapdragon-based HDK boards and phones
    • AOT and USMP support
    • Threading
    • Initial op support
  • MLF - Support for multiple modules in a single MLF artifact
  • Several TIR schedule primitives and transforms including (abridged):
    • schedule.transform_layout - Applies a layout transformation to a buffer as specified by an IndexMap (a sketch follows after this list).
    • schedule.transform_block_layout - Applies a schedule transformation to a block as specified by an IndexMap.
    • schedule.set_axis_separators - Sets axis separators in a buffer to lower to multi-dimensional memory (e.g. texture memory).
    • transform.InjectSoftwarePipeline - Transforms annotated loop nest into a pipeline prologue, body and epilogue where producers and consumers are overlapped.
    • transform.CommonSubexprElimTIR - Implements common-subexpression elimination for TIR.
    • transform.InjectPTXAsyncCopy - Rewrites global to shared memory copies in CUDA with async copy when annotated tir::attr::async_scope.
    • transform.LowerCrossThreadReduction - Enables support for reductions across threads on GPUs.
  • And many more! See the list of RFCs and PRs included in v0.9.0 for a complete list, as well as the full change list.
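
A minimal sketch of schedule.transform_layout driven by an IndexMap lambda, as mentioned in the list above (spellings follow the current tir.Schedule docs and may differ slightly in v0.9.0 itself):

```python
import tvm
from tvm.script import tir as T

@tvm.script.ir_module
class Copy:
    @T.prim_func
    def main(A: T.Buffer((128, 128), "float32"),
             B: T.Buffer((128, 128), "float32")):
        for i, j in T.grid(128, 128):
            with T.block("copy"):
                vi, vj = T.axis.remap("SS", [i, j])
                B[vi, vj] = A[vi, vj]

sch = tvm.tir.Schedule(Copy)
block = sch.get_block("copy")
# Rewrite the layout of B (the block's first write buffer) via an IndexMap.
sch.transform_layout(block, ("write", 0), lambda i, j: (i // 16, j, i % 16))
print(sch.mod.script())  # inspect the transformed module as TVMScript
```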

RFCs

These RFCs have been merged in apache/tvm-rfcs since the last release.

What's Changed

Note that this list is not comprehensive of all PRs and discussions since v0.8. Please visit the full listing of commits for a complete view: v0.8.0...v0.9.0.rc0.

AOT

  • #11208 - Calculate used memory at the callsite of primitive functions
  • #11365 - Fix function number datatype from char to uint16_t
  • #11091 - Enable A-Normal Form in the AOT executor
  • #10753 - Support LLVM backend with C++ runtime
  • #10518 - Use python temporary directory for AOT tests
  • #10337 - BugFix of workspace calculation
  • #10282 - [runtime] Add Metadata classes for AOTExecutor
  • #9501 - [3/3][DeviceAPI] Wire up cpacked Device API context
  • #9500 - [2/3][DeviceAPI] Add Hooks for Activate/Deactivate/Open/Close
  • #9395 - [1/3][DeviceAPI] Connecting devices structure to relevant operators
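
The AOT changes above pair with the new runtime=/executor= parameters to tvm.relay.build noted in the highlights. A hedged sketch (mod and params are placeholders for a real Relay model):

```python
# Hedged sketch: build a Relay module with the AOT executor on the C++
# runtime (cf. #10753 above). `mod` and `params` are placeholders.
from tvm import relay
from tvm.relay.backend import Executor, Runtime

lib = relay.build(
    mod,
    target="llvm",
    params=params,
    executor=Executor("aot"),
    runtime=Runtime("cpp"),
)
```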

BYOC


Apache TVM v0.8 Release Note

24 Nov 17:14
7b3a22e

Overview

Apache TVM v0.8 brings several major exciting experimental features, including:

  • PaddlePaddle frontend
  • TVMScript: round-trippable python-based syntax for TIR (see the sketch after this list)
  • TorchScript integration
  • TensorIR scheduling language
  • TensorRT and CUTLASS integration via BYOC
  • Int4 TensorCore support in AutoTVM
  • MicroTVM Project API and Zephyr, Arduino support
  • AOT executor
  • Robust Windows support
  • Affine analysis infra: iter-affine-map
  • Improved Vulkan backend
  • CUDA graph support in TVM runtime
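
On the TVMScript bullet above: the syntax is round-trippable in the sense that printed TIR parses back into a structurally equal module. A minimal sketch using modern spellings (v0.8-era syntax differed slightly):

```python
import tvm
from tvm.script import tir as T

@tvm.script.ir_module
class Mod:
    @T.prim_func
    def add_one(A: T.Buffer((8,), "float32"), B: T.Buffer((8,), "float32")):
        for i in range(8):
            with T.block("add_one"):
                vi = T.axis.remap("S", [i])
                B[vi] = A[vi] + T.float32(1)

text = Mod.script()                        # print the module as TVMScript
Mod2 = tvm.script.from_source(text)        # parse it back
tvm.ir.assert_structural_equal(Mod, Mod2)  # the round trip preserves structure
```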

Besides, the community has been working together to refactor and evolve the existing infrastructure, including but not limited to:

  • Relay compilation engine
  • Relay pattern language
  • CI and build process
  • Refactoring documentation and tutorials
  • Stabilizing AutoScheduler
  • Stabilizing TVMC command line driver interface
  • Stabilizing target system
  • Frontend coverage, quantization, dynamic shape, training

Full changelog: https://gist.github.com/junrushao1994/c669905dbc41edc2e691316df49d8562.

Accepted RFCs

The community has adopted a formal RFC process. Below is a list of the formal RFCs accepted by the community since then:

  • [RFC-0005] Meta schedule (AutoTIR)
  • [RFC-0006] Automatic mixed-precision pass and support
  • [RFC-0007] Parametrized unit tests
  • [RFC-0008] MicroTVM Project API
  • [RFC-0009] Unified static memory planner
  • [RFC-0010] Target-registered compiler flow customisation
  • [RFC-0011] Arm® Ethos-U integration
  • [RFC-0014] Pipeline executor
  • [RFC-0015] Use CMSIS-NN with TVM
  • [RFC-0019] Add PaddlePaddle frontend
  • [RFC-0020] Extend metadata in project option
  • [RFC-0022] TIR non-scalar constants
  • [RFC-0023] Adding annotation field to tir.allocate nodes
  • [RFC-0025] PyTorchTVM
  • [RFC-0027] Formalize TVM documentation organization
  • [RFC-0028] Command line composition from internal registry
  • [RFC-0029] Migrating target attributes to IRModule
  • [RFC-0030] Command line configuration files
  • [RFC-0031] C Device API
  • [RFC-0036] TVMScript namespace
  • [RFC-0041] Update TVMScript block syntax

Features and Improvements

TE, TIR, TVMScript

AutoTVM, AutoScheduler, Meta Schedule

Operator Coverage
