[Unity][Backend] Introduce a Relax version of AOTLowerMain#14409
Closed
areusch wants to merge 175 commits into apache:unity from
Conversation
This PR implements a flexible register-based VM to execute relax programs with dynamic shape and control flow. Design: https://github.com/tlc-pack/relax/wiki/Relax-VM-Design. Co-Authored-by: Ziheng Jiang <ziheng@apache.org> Co-Authored-by: Ruihang Lai <ruihangl@cs.cmu.edu> Co-Authored-by: Sunghyun Park <49998730+sunggg@users.noreply.github.com> Co-Authored-by: Junru Shao <junrushao1994@gmail.com> Co-Authored-by: Prakalp Srivastava <prakalp@octoml.ai> Co-Authored-by: Yong Wu <yongcale@gmail.com> Co-Authored-by: Steven S. Lyubomirsky <slyubomirsky@octoml.ai> Co-Authored-by: Tianqi Chen <tianqi.tchen@gmail.com> Co-Authored-by: Hongyi Jin <3231950289@qq.com>
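To illustrate the idea of a register-based VM as described in the linked design doc, here is a minimal conceptual sketch. The class and instruction names (`Instruction`, `VirtualMachine`, the opcodes) are illustrative only, not the actual TVM Relax VM classes:

```python
# Minimal sketch of a register-based VM: instructions read and write
# numbered registers, unlike a stack machine. Illustrative only.
from dataclasses import dataclass

@dataclass
class Instruction:
    op: str      # "load_const", "add", or "ret" (hypothetical opcodes)
    dst: int     # destination register index
    args: tuple  # source register indices or immediate values

class VirtualMachine:
    def __init__(self, num_registers=16):
        self.regs = [None] * num_registers

    def run(self, program):
        for inst in program:
            if inst.op == "load_const":
                self.regs[inst.dst] = inst.args[0]
            elif inst.op == "add":
                a, b = inst.args
                self.regs[inst.dst] = self.regs[a] + self.regs[b]
            elif inst.op == "ret":
                return self.regs[inst.args[0]]
            else:
                raise ValueError(f"unknown opcode {inst.op}")

program = [
    Instruction("load_const", dst=0, args=(2,)),
    Instruction("load_const", dst=1, args=(40,)),
    Instruction("add", dst=2, args=(0, 1)),
    Instruction("ret", dst=-1, args=(2,)),
]
result = VirtualMachine().run(program)  # → 42
```

The real Relax VM adds dynamic shape handling, control flow, and PackedFunc calls on top of this basic register-machine execution model.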
* [Unity][IR] First-class StructInfo. Relax tracks structural information about values (such as tensor shape) via `StructInfo`. * Fix Rust build --------- Co-authored-by: Junru Shao <junrushao1994@gmail.com>
…pache#13910) This PR sets up a unity-specific Jenkins with a minimal Jenkinsfile, without sharding, and disables most of the tests to reduce overall cost. We can add tests for the unity branch by configuring the specific groovy file.
[Unity] Basic StructInfo Analysis and Expr construction. This PR adds struct info analysis and expr support: the logic to construct IR nodes and perform struct-info-related analysis. Testcases are added to cover IR node construction and the related struct info analysis checks. Co-authored-by: Tianqi Chen <tianqi.tchen@gmail.com> Co-authored-by: Altan Haan <altanh@cs.washington.edu> Co-authored-by: Andrew Liu <andrewlliu@gmail.com> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Jiawei Liu <jaway.liu@gmail.com> Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Lesheng Jin <34279105+LeshengJin@users.noreply.github.com> Co-authored-by: masahi <masahi129@gmail.com> Co-authored-by: Prakalp Srivastava <prakalp@octoml.ai> Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu> Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Steven S. Lyubomirsky <slyubomirsky@octoml.ai> Co-authored-by: Sunghyun Park <49998730+sunggg@users.noreply.github.com> Co-authored-by: Yixin Dong <ubospica@gmail.com> Co-authored-by: Yong Wu <yongcale@gmail.com> Co-authored-by: Ziheng Jiang <ziheng@apache.org>
This PR adds BlockBuilder, the core data structure for constructing Relax ASTs, and ExprMutator, which performs AST mutation for implementing transformation passes. Co-Authored-by: Tianqi Chen <tianqi.tchen@gmail.com> Co-Authored-by: Altan Haan <altanh@cs.washington.edu> Co-Authored-by: Andrew Liu <andrewlliu@gmail.com> Co-Authored-by: Hongyi Jin <3231950289@qq.com> Co-Authored-by: Jiawei Liu <jaway.liu@gmail.com> Co-Authored-by: Junru Shao <junrushao1994@gmail.com> Co-Authored-by: Lesheng Jin <34279105+LeshengJin@users.noreply.github.com> Co-Authored-by: masahi <masahi129@gmail.com> Co-Authored-by: Prakalp Srivastava <prakalp@octoml.ai> Co-Authored-by: Ruihang Lai <ruihangl@cs.cmu.edu> Co-Authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-Authored-by: Steven S. Lyubomirsky <slyubomirsky@octoml.ai> Co-Authored-by: Sunghyun Park <49998730+sunggg@users.noreply.github.com> Co-Authored-by: Yixin Dong <ubospica@gmail.com> Co-Authored-by: Yong Wu <yongcale@gmail.com> Co-Authored-by: Ziheng Jiang <ziheng@apache.org>
This PR adds the TVMScript parser/ir_builder support based on the blockbuilder. Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu> Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Tianqi Chen <tianqi.tchen@gmail.com> Co-authored-by: Yuchen Jin <yuchenj@cs.washington.edu> Co-authored-by: Steven S. Lyubomirsky <slyubomirsky@gmail.com> Co-authored-by: Yong Wu <yongcale@gmail.com>
This PR introduces Relax as a dialect supported by the TVMScript Printer. Some caveats: - Needs to rebase to mainline before merging. - Some tests are skipped because some operators are not upstreamed to the unity branch yet. Co-authored-by: Tianqi Chen <tianqi.tchen@gmail.com> Co-authored-by: Yuchen Jin <yuchenj@cs.washington.edu> Co-authored-by: Steven S. Lyubomirsky <slyubomirsky@gmail.com> Co-authored-by: Yong Wu <yongcale@gmail.com> Co-authored-by: Prakalp Srivastava <prakalp@octoml.ai> Co-authored-by: Sunghyun Park <49998730+sunggg@users.noreply.github.com> Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
This PR introduces the Relax `FunctionPass` and `DataflowBlockPass` APIs, and the `VMShapeLower` pass to lower shape expressions in Relax to TIR functions and VM shape heap builtin functions. Co-Authored-by: Ziheng Jiang <ziheng@apache.org> Co-Authored-by: Lesheng Jin <34279105+LeshengJin@users.noreply.github.com> Co-Authored-by: Altan Haan <altanh@cs.washington.edu> Co-Authored-by: Junru Shao <junrushao1994@gmail.com> Co-Authored-by: Prakalp Srivastava <prakalp@octoml.ai> Co-Authored-by: Ruihang Lai <ruihangl@cs.cmu.edu> Co-Authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-Authored-by: Steven S. Lyubomirsky <slyubomirsky@octoml.ai> Co-Authored-by: Sunghyun Park <49998730+sunggg@users.noreply.github.com> Co-Authored-by: Tianqi Chen <tianqi.tchen@gmail.com> Co-Authored-by: Yong Wu <yongcale@gmail.com>
This PR introduces the e2e Relax lowering flow (`relax.vm.build`). Tests for each pass in the flow are added. Co-Authored-by: Altan Haan <altanh@cs.washington.edu> Co-Authored-by: Andrew Liu <andrewlliu@gmail.com> Co-Authored-by: Hongyi Jin <3231950289@qq.com> Co-Authored-by: Jiawei Liu <jaway.liu@gmail.com> Co-Authored-by: Junru Shao <junrushao1994@gmail.com> Co-Authored-by: Prakalp Srivastava <prakalp@octoml.ai> Co-Authored-by: Ruihang Lai <ruihangl@cs.cmu.edu> Co-Authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-Authored-by: Steven S. Lyubomirsky <slyubomirsky@octoml.ai> Co-Authored-by: Sunghyun Park <49998730+sunggg@users.noreply.github.com> Co-Authored-by: Tianqi Chen <tianqi.tchen@gmail.com> Co-Authored-by: Yong Wu <yongcale@gmail.com> Co-Authored-by: Ziheng Jiang <ziheng@apache.org>
As we've introduced `arg_sinfo` in CallNode, the implicit shape constructor is no longer widely used in TVMScript. This PR removes it, since it may cause confusion between shape and tuple.
This PR is about the high-level tensor computation operators in Relax. This PR includes the tensor indexing operators.
This PR is about the high-level tensor computation operators in Relax. This PR includes the set operators. Co-authored-by: Prakalp Srivastava <prakalp@octoml.ai>
This PR is about the high-level tensor computation operators in Relax. This PR includes the image operators.
This PR is about the high-level tensor computation operators in Relax. This PR includes the unary, binary and ternary arithmetic and comparison operators. Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Chaofan Lin <1713833595@qq.com>
This PR is about the high-level tensor computation operators in Relax. This PR includes the statistical operators.
This PR is about the high-level tensor computation operators in Relax. This PR includes the neural network operators.
This PR is about the high-level tensor computation operators in Relax. This PR includes the tensor creation operators.
This PR is about the high-level tensor computation operators in Relax. This PR includes the linear algebra operators. Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
This PR is about the high-level tensor computation operators in Relax. This PR includes the search operators.
This PR is about the high-level tensor computation operators in Relax. This PR includes the tensor manipulation operators. Co-authored-by: Prakalp Srivastava <prakalp@octoml.ai>
This PR introduces NestedMsg to robustly handle nested-tuple analysis. Relax supports nested tuple structures in the IR. Nested tuple structure is important to support advanced groupings in cases such as gradient calculation and other scenarios. The possible presence of nested tuples means that we need to robustly handle analyses that contain nested tuple structures in a dataflow graph. This PR introduces a NestedMsg<T> class that corresponds to a possibly nested message tuple for a given leaf message class T. We also introduce various helper functions to compose and decompose messages. Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> Co-authored-by: Yixin Dong <ubospica@gmail.com> Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu>
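The NestedMsg<T> idea can be sketched in Python: a message is either null, a leaf value, or a tuple of nested messages, and the helpers map or combine over that structure. The actual implementation is a C++ template in TVM; the function names below are illustrative, not the TVM API:

```python
# Conceptual sketch of NestedMsg<T>: None = null message,
# tuple = nested messages, anything else = a leaf message.
def map_nested_msg(msg, fn):
    """Apply fn to every leaf of a possibly nested message tuple."""
    if msg is None:                  # null message: nothing to do
        return None
    if isinstance(msg, tuple):       # recurse into nested tuple
        return tuple(map_nested_msg(m, fn) for m in msg)
    return fn(msg)                   # leaf message

def combine_nested_msg(lhs, rhs, fn):
    """Pointwise combine two nested messages of matching structure."""
    if lhs is None:
        return rhs
    if rhs is None:
        return lhs
    if isinstance(lhs, tuple) and isinstance(rhs, tuple):
        return tuple(combine_nested_msg(a, b, fn)
                     for a, b in zip(lhs, rhs))
    return fn(lhs, rhs)

nested = (1, (2, None), 3)
doubled = map_nested_msg(nested, lambda x: x * 2)       # (2, (4, None), 6)
merged = combine_nested_msg((1, None), (None, 5), max)  # (1, 5)
```

A dataflow analysis over nested tuples then only has to define its leaf message type and combine rule; the nesting is handled uniformly.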
[Unity][Pass] Operator fusion passes. This PR introduces three passes for operator fusion: 1. AnnotateTIROpPattern: analyzes the operator kind from the PrimFunc. 2. FuseOps: fuses operators in Relax functions, adding a new fused Relax primitive function. 3. FuseTIR: fuses the corresponding TIR PrimFuncs of the fused Relax functions.
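The core FuseOps idea can be sketched as grouping ops by their pattern kind. The real pass works on a dataflow graph with the pattern kinds produced by AnnotateTIROpPattern; this heavily simplified linear version only fuses runs of consecutive fusible ops and is illustrative, not the actual algorithm:

```python
# Simplified sketch: greedily group consecutive fusible ops
# (e.g. elementwise/broadcast/injective) into one fused group;
# opaque ops each form their own group.
FUSIBLE = {"elemwise", "broadcast", "injective"}

def fuse_linear(ops):
    """ops: list of (name, kind) pairs. Returns a list of groups."""
    groups, current = [], []
    for name, kind in ops:
        if kind in FUSIBLE:
            current.append(name)       # extend the current fused group
        else:
            if current:
                groups.append(current)  # close the fused group
                current = []
            groups.append([name])       # opaque op: its own group
    if current:
        groups.append(current)
    return groups

ops = [("add", "elemwise"), ("relu", "elemwise"),
       ("conv2d", "opaque"), ("mul", "elemwise")]
print(fuse_linear(ops))  # [['add', 'relu'], ['conv2d'], ['mul']]
```

Each group then becomes one fused primitive function, whose TIR PrimFuncs FuseTIR merges into a single PrimFunc.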
[VM] Supporting "compiled" exec mode. This PR adds support for a "compiled" mode in the VM. The compiled mode translates the relax function into a TIR function and drives execution through that TIR function. It differs from the micro AOT codegen, which generates TIR code targeting the micro C runtime environment and is useful for resource-limited settings with a smaller set of features. Both leverage the low-level TIR build that is also shared with TensorIR. The current implementation targets the full TVM (VM) runtime, which comes with PackedFunc, object, tuple, closure and all kinds of rich structure support. This also means that we can leverage the full runtime support to handle things like allocation, dynamic shape, easy plugins and Python interaction, which are not available in more limited runtimes. The user uses the same API to load the generated code regardless of exec mode, and just needs to change one line:

```python
ex = relax.vm.build(mod, target, exec_mode="compiled")
```

The simplicity is thanks to the TVM runtime architecture that allows us to compose things together as objects. The only difference is how the PackedFunc for high-level driving is provided: in the case of bytecode it is normal interpretation, and in the case of compiled mode it is TIR. This is a complete implementation; unit testcases are added. All codegen build tests are updated to include both exec modes and have passed locally. Co-authored-by: Junru Shao <junrushao1994@gmail.com>
This PR introduces FoldConstant/BindParam passes. Co-authored-by: Yong Wu <yongcale@gmail.com> Co-Authored-by: Hongyi Jin <3231950289@qq.com> Co-Authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
…pache#14014) Add TuningAPI and MetaSchedule tuning pass
This PR implements a Relay to Relax translator, which allows us to import Relay workloads to Relax for benchmarking and development purposes (tests and examples are added).
This PR enables async pipeline creation in webgpu module loading. This will enable us to report progress in shader compilation and leverage multi-threading on the host to compile shaders.
apache#14354) This PR fixes the FuseOps error when a given group has no output. Although the `DeadCodeElimination` pass can solve the problem, it is better to enhance the robustness of `FuseOps`.
This PR fixes a bug in Infer Layout for reduction ops when `axis` has negative indices.
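The usual convention such passes rely on is that a negative axis counts from the end, so `axis=-1` on a rank-3 tensor means axis 2. A minimal sketch of that normalization (the helper name is illustrative, not the TVM function):

```python
# Normalize a possibly-negative axis index against a tensor rank.
def normalize_axis(axis, ndim):
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} out of range for ndim {ndim}")
    return axis + ndim if axis < 0 else axis

assert normalize_axis(-1, 3) == 2   # last axis of a rank-3 tensor
assert normalize_axis(1, 3) == 1    # non-negative axes pass through
```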
This PR makes 2 changes: 1. Add the Relax ops Maximum and Minimum. 2. Add translation functions for the torch functions/methods `silu`, `to`, `ones`, `full`, `masked_fill_`, `mean`, `rsqrt`, `neg`, and `max` in the FX translator.
…pache#14367) The current GlobalVar generated by `emit_te` has empty `checked_type_`, which will fail at the well-formed check, saying ``` Warning: This IR is not well formed: The checked_type_ of Expr I.GlobalVar("add") is nullptr. ``` This PR fixes this issue and enables well-formed checks in parser tests.
Fix assert to allow scalar (ndim=0) layout initialization.
Also include output dtype in simt MathInstruction.
This PR adds Relax VM builtin functions to execute with CUDA graph. - vm.builtin.cuda_graph.get_cached_alloc: Allocate and cache storage objects for future VM invocations - vm.builtin.cuda_graph.run_or_capture: Launch the captured CUDA graph, or capture the CUDA graph using the CUDA API and save it in the cache The graph rewriting to enable the CUDA graph backend will be done in a separate PR.
The file tests/cpp/nested_msg_test.cc may fail to compile if <array> is not included explicitly.
…14274) Currently, the BYOC system is based on op-level pattern matching. This PR intends to provide primary support for TIR-level pattern matching based on backend registration and dispatching. For now, it simply matches the first set of for loops in a PrimFunc. Co-authored-by: Hongyi Jin (@jinhongyii)
This PR adds support for simple dynamic-shape-aware fusion, which is the first step towards supporting dynamic shapes. The main changes are as follows: - Fix FuncStructInfo in well-formed checks - Renew symbolic var defs in fuse_ops to prevent malformed functions
This PR adds a `stop_lift_params` op as a hint to the parameter lifter to stop at that boundary point.
…che#14404) This PR enables the relax parser to handle a Var with a ShapeExpr value occurring in R.Tensor annotations.
* [Unity][Pass] Add pass for CSE within dataflow * Fill in CSE definition and test cases * Missing trailing newline --------- Co-authored-by: Prakalp Srivastava <prakalp@octoml.ai>
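The essence of CSE over a dataflow block can be sketched as a single pass over the bindings: identical right-hand sides are computed once and later occurrences are rewritten to reuse the first result. This is a simplified illustrative version, not the actual Relax pass:

```python
# Sketch of CSE over a flat list of bindings. Each expr is a hashable
# tuple like ('add', 'x', 'y'); structurally equal exprs are deduped.
def cse(bindings):
    """Returns (deduplicated bindings, map from eliminated vars
    to their surviving representatives)."""
    seen, out, replaced = {}, [], {}
    for var, expr in bindings:
        # Rewrite operands through the replacement map first, so
        # downstream uses of eliminated vars are updated.
        expr = (expr[0],) + tuple(replaced.get(a, a) for a in expr[1:])
        if expr in seen:
            replaced[var] = seen[expr]   # duplicate: reuse earlier var
        else:
            seen[expr] = var
            out.append((var, expr))
    return out, replaced

bindings = [("a", ("add", "x", "y")),
            ("b", ("add", "x", "y")),   # duplicate of a
            ("c", ("mul", "b", "z"))]   # uses the duplicate
new_bindings, replaced = cse(bindings)
# b is eliminated and c's operand b is rewritten to a
```

The real pass additionally has to respect Relax semantics, e.g. only deduplicating pure expressions within a dataflow block.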
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot
…ns (apache#14386) * Support Relax Constants in the QNN TOPI operations
This PR implements Conv1d. Unit tests are provided accordingly.
…pache#14412) This PR exposes the custom scale in `R.nn.attention` and adds its legalize op.
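For context, scaled dot-product attention multiplies the query-key scores by a scale before the softmax; the conventional default is 1/sqrt(head_dim), and the PR lets callers pass a custom value. A pure-Python sketch of the math (illustrative only, not the `R.nn.attention` implementation):

```python
# Scaled dot-product attention over lists of vectors (seq_len x dim).
import math

def softmax(xs):
    m = max(xs)                              # subtract max for stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(q, k, v, scale=None):
    dim = len(q[0])
    if scale is None:
        scale = 1.0 / math.sqrt(dim)         # conventional default scale
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) * scale for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))])
    return out

# With scale=0 every key gets equal weight, so the output is the
# mean of the value vectors:
row = attention([[1, 0]], [[2, 0], [0, 2]], [[2, 0], [0, 2]], scale=0)
```

Exposing `scale` matters for variants like attention with temperature or models that fold normalization into the scale.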
This commit introduces a Relax pass, AOTLowerMain, which lowers a Relax main function into TIR. It functions similarly to the Relay pass of the same name. Behavioural examples can be seen in the tests.