[Unity][Backend] Introduce a Relax version of AOTLowerMain#14409
Closed
areusch wants to merge 175 commits into apache:unity from
Conversation
This PR implements a flexible register-based VM to execute relax programs with dynamic shape and control flow. Design: https://github.com/tlc-pack/relax/wiki/Relax-VM-Design. Co-Authored-by: Ziheng Jiang <ziheng@apache.org> Co-Authored-by: Ruihang Lai <ruihangl@cs.cmu.edu> Co-Authored-by: Sunghyun Park <49998730+sunggg@users.noreply.github.com> Co-Authored-by: Junru Shao <junrushao1994@gmail.com> Co-Authored-by: Prakalp Srivastava <prakalp@octoml.ai> Co-Authored-by: Yong Wu <yongcale@gmail.com> Co-Authored-by: Steven S. Lyubomirsky <slyubomirsky@octoml.ai> Co-Authored-by: Tianqi Chen <tianqi.tchen@gmail.com> Co-Authored-by: Hongyi Jin <3231950289@qq.com>
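To illustrate the idea of a register-based VM as described in the linked design doc, here is a minimal conceptual sketch. The class and instruction names (`Instruction`, `VirtualMachine`, the opcodes) are illustrative only, not the actual TVM Relax VM classes:

```python
# Minimal sketch of a register-based VM: instructions read and write
# numbered registers, unlike a stack machine. Illustrative only.
from dataclasses import dataclass

@dataclass
class Instruction:
    op: str      # "load_const", "add", or "ret" (hypothetical opcodes)
    dst: int     # destination register index
    args: tuple  # source register indices or immediate values

class VirtualMachine:
    def __init__(self, num_registers=16):
        self.regs = [None] * num_registers

    def run(self, program):
        for inst in program:
            if inst.op == "load_const":
                self.regs[inst.dst] = inst.args[0]
            elif inst.op == "add":
                a, b = inst.args
                self.regs[inst.dst] = self.regs[a] + self.regs[b]
            elif inst.op == "ret":
                return self.regs[inst.args[0]]
            else:
                raise ValueError(f"unknown opcode {inst.op}")

program = [
    Instruction("load_const", dst=0, args=(2,)),
    Instruction("load_const", dst=1, args=(40,)),
    Instruction("add", dst=2, args=(0, 1)),
    Instruction("ret", dst=-1, args=(2,)),
]
result = VirtualMachine().run(program)  # → 42
```

The real Relax VM adds dynamic shape handling, control flow, and PackedFunc calls on top of this basic register-machine execution model.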
* [Unity][IR] First-class StructInfo. Relax tracks structural information about values (such as tensor shape) via `StructInfo`. * Fix Rust build --------- Co-authored-by: Junru Shao <junrushao1994@gmail.com>
…pache#13910) This PR sets up a unity-specific Jenkins with a minimal Jenkinsfile, without sharding, and disables most of the tests to reduce overall cost. We can add tests for the unity branch by configuring the specific groovy file.
[Unity] Basic StructInfo Analysis and Expr construction. This PR adds struct info analysis and expr support: the logic to construct IR nodes and perform struct-info-related analysis. Testcases are added to cover IR node construction and the related struct info analysis checks. Co-authored-by: Tianqi Chen <tianqi.tchen@gmail.com> Co-authored-by: Altan Haan <altanh@cs.washington.edu> Co-authored-by: Andrew Liu <andrewlliu@gmail.com> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Jiawei Liu <jaway.liu@gmail.com> Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Lesheng Jin <34279105+LeshengJin@users.noreply.github.com> Co-authored-by: masahi <masahi129@gmail.com> Co-authored-by: Prakalp Srivastava <prakalp@octoml.ai> Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu> Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Steven S. Lyubomirsky <slyubomirsky@octoml.ai> Co-authored-by: Sunghyun Park <49998730+sunggg@users.noreply.github.com> Co-authored-by: Yixin Dong <ubospica@gmail.com> Co-authored-by: Yong Wu <yongcale@gmail.com> Co-authored-by: Ziheng Jiang <ziheng@apache.org>
This PR adds BlockBuilder, the core data structure for constructing Relax ASTs, and ExprMutator, which performs AST mutation for implementing transformation passes. Co-Authored-by: Tianqi Chen <tianqi.tchen@gmail.com> Co-Authored-by: Altan Haan <altanh@cs.washington.edu> Co-Authored-by: Andrew Liu <andrewlliu@gmail.com> Co-Authored-by: Hongyi Jin <3231950289@qq.com> Co-Authored-by: Jiawei Liu <jaway.liu@gmail.com> Co-Authored-by: Junru Shao <junrushao1994@gmail.com> Co-Authored-by: Lesheng Jin <34279105+LeshengJin@users.noreply.github.com> Co-Authored-by: masahi <masahi129@gmail.com> Co-Authored-by: Prakalp Srivastava <prakalp@octoml.ai> Co-Authored-by: Ruihang Lai <ruihangl@cs.cmu.edu> Co-Authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-Authored-by: Steven S. Lyubomirsky <slyubomirsky@octoml.ai> Co-Authored-by: Sunghyun Park <49998730+sunggg@users.noreply.github.com> Co-Authored-by: Yixin Dong <ubospica@gmail.com> Co-Authored-by: Yong Wu <yongcale@gmail.com> Co-Authored-by: Ziheng Jiang <ziheng@apache.org>
This PR adds the TVMScript parser/ir_builder support based on the blockbuilder. Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu> Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Tianqi Chen <tianqi.tchen@gmail.com> Co-authored-by: Yuchen Jin <yuchenj@cs.washington.edu> Co-authored-by: Steven S. Lyubomirsky <slyubomirsky@gmail.com> Co-authored-by: Yong Wu <yongcale@gmail.com>
This PR introduces Relax as a dialect supported by the TVMScript Printer. Some caveats: - Needs to rebase to mainline before merging. - Some tests are skipped because some operators are not upstreamed to the unity branch yet. Co-authored-by: Tianqi Chen <tianqi.tchen@gmail.com> Co-authored-by: Yuchen Jin <yuchenj@cs.washington.edu> Co-authored-by: Steven S. Lyubomirsky <slyubomirsky@gmail.com> Co-authored-by: Yong Wu <yongcale@gmail.com> Co-authored-by: Prakalp Srivastava <prakalp@octoml.ai> Co-authored-by: Sunghyun Park <49998730+sunggg@users.noreply.github.com> Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
This PR introduces the Relax `FunctionPass` and `DataflowBlockPass` APIs, and the `VMShapeLower` pass to lower shape expressions in Relax to TIR functions and VM shape heap builtin functions. Co-Authored-by: Ziheng Jiang <ziheng@apache.org> Co-Authored-by: Lesheng Jin <34279105+LeshengJin@users.noreply.github.com> Co-Authored-by: Altan Haan <altanh@cs.washington.edu> Co-Authored-by: Junru Shao <junrushao1994@gmail.com> Co-Authored-by: Prakalp Srivastava <prakalp@octoml.ai> Co-Authored-by: Ruihang Lai <ruihangl@cs.cmu.edu> Co-Authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-Authored-by: Steven S. Lyubomirsky <slyubomirsky@octoml.ai> Co-Authored-by: Sunghyun Park <49998730+sunggg@users.noreply.github.com> Co-Authored-by: Tianqi Chen <tianqi.tchen@gmail.com> Co-Authored-by: Yong Wu <yongcale@gmail.com>
This PR introduces the e2e Relax lowering flow (`relax.vm.build`). Tests for each pass in the flow are added. Co-Authored-by: Altan Haan <altanh@cs.washington.edu> Co-Authored-by: Andrew Liu <andrewlliu@gmail.com> Co-Authored-by: Hongyi Jin <3231950289@qq.com> Co-Authored-by: Jiawei Liu <jaway.liu@gmail.com> Co-Authored-by: Junru Shao <junrushao1994@gmail.com> Co-Authored-by: Prakalp Srivastava <prakalp@octoml.ai> Co-Authored-by: Ruihang Lai <ruihangl@cs.cmu.edu> Co-Authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-Authored-by: Steven S. Lyubomirsky <slyubomirsky@octoml.ai> Co-Authored-by: Sunghyun Park <49998730+sunggg@users.noreply.github.com> Co-Authored-by: Tianqi Chen <tianqi.tchen@gmail.com> Co-Authored-by: Yong Wu <yongcale@gmail.com> Co-Authored-by: Ziheng Jiang <ziheng@apache.org>
As we've introduced `arg_sinfo` in CallNode, the implicit shape constructor is no longer widely used in TVMScript. This PR removes it, since it may cause confusion between shape and tuple.
This PR is about the high-level tensor computation operators in Relax. This PR includes the tensor indexing operators.
This PR is about the high-level tensor computation operators in Relax. This PR includes the set operators. Co-authored-by: Prakalp Srivastava <prakalp@octoml.ai>
This PR is about the high-level tensor computation operators in Relax. This PR includes the image operators.
This PR is about the high-level tensor computation operators in Relax. This PR includes the unary, binary and ternary arithmetic and comparison operators. Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Chaofan Lin <1713833595@qq.com>
This PR is about the high-level tensor computation operators in Relax. This PR includes the statistical operators.
This PR is about the high-level tensor computation operators in Relax. This PR includes the neural network operators.
This PR is about the high-level tensor computation operators in Relax. This PR includes the tensor creation operators.
This PR is about the high-level tensor computation operators in Relax. This PR includes the linear algebra operators. Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
This PR is about the high-level tensor computation operators in Relax. This PR includes the search operators.
This PR is about the high-level tensor computation operators in Relax. This PR includes the tensor manipulation operators. Co-authored-by: Prakalp Srivastava <prakalp@octoml.ai>
This PR introduces NestedMsg to robustly handle nested-tuple analysis. Relax supports nested tuple structures in the IR. Nested tuple structure is important to support advanced groupings in cases such as gradient calculation and other scenarios. The possible presence of nested tuples means that we need to robustly handle analyses that contain nested tuple structures in a dataflow graph. This PR introduces a NestedMsg<T> class that corresponds to a possibly nested message tuple for a given leaf message class T. We also introduce various helper functions to compose and decompose messages. Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> Co-authored-by: Yixin Dong <ubospica@gmail.com> Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu>
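The NestedMsg<T> idea can be sketched in Python: a message is either null, a leaf value, or a tuple of nested messages, and the helpers map or combine over that structure. The actual implementation is a C++ template in TVM; the function names below are illustrative, not the TVM API:

```python
# Conceptual sketch of NestedMsg<T>: None = null message,
# tuple = nested messages, anything else = a leaf message.
def map_nested_msg(msg, fn):
    """Apply fn to every leaf of a possibly nested message tuple."""
    if msg is None:                  # null message: nothing to do
        return None
    if isinstance(msg, tuple):       # recurse into nested tuple
        return tuple(map_nested_msg(m, fn) for m in msg)
    return fn(msg)                   # leaf message

def combine_nested_msg(lhs, rhs, fn):
    """Pointwise combine two nested messages of matching structure."""
    if lhs is None:
        return rhs
    if rhs is None:
        return lhs
    if isinstance(lhs, tuple) and isinstance(rhs, tuple):
        return tuple(combine_nested_msg(a, b, fn)
                     for a, b in zip(lhs, rhs))
    return fn(lhs, rhs)

nested = (1, (2, None), 3)
doubled = map_nested_msg(nested, lambda x: x * 2)       # (2, (4, None), 6)
merged = combine_nested_msg((1, None), (None, 5), max)  # (1, 5)
```

A dataflow analysis over nested tuples then only has to define its leaf message type and combine rule; the nesting is handled uniformly.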
[Unity][Pass] Operator fusion passes. This PR introduces three passes for operator fusion: 1. AnnotateTIROpPattern: analyzes the operator kind from the PrimFunc. 2. FuseOps: fuses operators in Relax functions, adding a new fused Relax primitive function. 3. FuseTIR: fuses the corresponding TIR PrimFuncs of the fused Relax functions.
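The core FuseOps idea can be sketched as grouping ops by their pattern kind. The real pass works on a dataflow graph with the pattern kinds produced by AnnotateTIROpPattern; this heavily simplified linear version only fuses runs of consecutive fusible ops and is illustrative, not the actual algorithm:

```python
# Simplified sketch: greedily group consecutive fusible ops
# (e.g. elementwise/broadcast/injective) into one fused group;
# opaque ops each form their own group.
FUSIBLE = {"elemwise", "broadcast", "injective"}

def fuse_linear(ops):
    """ops: list of (name, kind) pairs. Returns a list of groups."""
    groups, current = [], []
    for name, kind in ops:
        if kind in FUSIBLE:
            current.append(name)       # extend the current fused group
        else:
            if current:
                groups.append(current)  # close the fused group
                current = []
            groups.append([name])       # opaque op: its own group
    if current:
        groups.append(current)
    return groups

ops = [("add", "elemwise"), ("relu", "elemwise"),
       ("conv2d", "opaque"), ("mul", "elemwise")]
print(fuse_linear(ops))  # [['add', 'relu'], ['conv2d'], ['mul']]
```

Each group then becomes one fused primitive function, whose TIR PrimFuncs FuseTIR merges into a single PrimFunc.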
[VM] Supporting "compiled" exec mode. This PR adds support for a "compiled" mode in the VM. The compiled mode translates the relax function into a TIR function and drives execution through that TIR function. It differs from the micro AOT codegen, which generates TIR code targeting the micro C runtime environment and is useful for resource-limited settings with a smaller set of features. Both leverage the low-level TIR build that is also shared with TensorIR. The current implementation targets the full TVM (VM) runtime, which comes with PackedFunc, object, tuple, closure and all kinds of rich structure support. This also means that we can leverage the full runtime support to handle things like allocation, dynamic shape, easy plugins and Python interaction, which are not available in more limited runtimes. The user uses the same API to load the generated code regardless of exec mode, and just needs to change one line:

```python
ex = relax.vm.build(mod, target, exec_mode="compiled")
```

The simplicity is thanks to the TVM runtime architecture that allows us to compose things together as objects. The only difference is how the PackedFunc for high-level driving is provided: in the case of bytecode it is normal interpretation, and in the case of compiled mode it is TIR. This is a complete implementation; unit testcases are added. All codegen build tests are updated to include both exec modes and have passed locally. Co-authored-by: Junru Shao <junrushao1994@gmail.com>
This PR introduces FoldConstant/BindParam passes. Co-authored-by: Yong Wu <yongcale@gmail.com> Co-Authored-by: Hongyi Jin <3231950289@qq.com> Co-Authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
…pache#14014) Add TuningAPI and MetaSchedule tuning pass
This PR implements a Relay to Relax translator, which allows us to import Relay workloads to Relax for benchmarking and development purposes (tests and examples are added).
This PR enables async pipeline creation in webgpu module loading. This will enable us to report progress in shader compilation and leverage multi-threading on the host to compile shaders.
apache#14354) This PR fixes the FuseOps error when a given group has no output. Although the `DeadCodeElimination` pass can solve the problem, it is better to enhance the robustness of `FuseOps`.
This PR fixes a bug in Infer Layout for reduction ops when `axis` has negative indices.
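The usual convention such passes rely on is that a negative axis counts from the end, so `axis=-1` on a rank-3 tensor means axis 2. A minimal sketch of that normalization (the helper name is illustrative, not the TVM function):

```python
# Normalize a possibly-negative axis index against a tensor rank.
def normalize_axis(axis, ndim):
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} out of range for ndim {ndim}")
    return axis + ndim if axis < 0 else axis

assert normalize_axis(-1, 3) == 2   # last axis of a rank-3 tensor
assert normalize_axis(1, 3) == 1    # non-negative axes pass through
```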
This PR makes 2 changes: 1. Add the Relax ops Maximum and Minimum. 2. Add translation functions for the torch functions/methods `silu`, `to`, `ones`, `full`, `masked_fill_`, `mean`, `rsqrt`, `neg`, and `max` in the FX translator.
…pache#14367) The current GlobalVar generated by `emit_te` has empty `checked_type_`, which will fail at the well-formed check, saying ``` Warning: This IR is not well formed: The checked_type_ of Expr I.GlobalVar("add") is nullptr. ``` This PR fixes this issue and enables well-formed checks in parser tests.
Fix assert to allow scalar (ndim=0) layout initialization.
Also include output dtype in simt MathInstruction.
This PR adds Relax VM builtin functions to execute with CUDA graph. - vm.builtin.cuda_graph.get_cached_alloc: Allocate and cache storage objects for future VM invocations - vm.builtin.cuda_graph.run_or_capture: Launch the captured CUDA graph, or capture the CUDA graph using the CUDA API and save it in the cache The graph rewriting to enable the CUDA graph backend will be done in a separate PR.
The file tests/cpp/nested_msg_test.cc may fail to compile if <array> is not included explicitly.
…14274) Currently, the BYOC system is based on op-level pattern matching. This PR intends to provide primary support for TIR-level pattern matching based on backend registration and dispatching. For now, it simply matches the first set of for loops in a PrimFunc. Co-authored-by: Hongyi Jin (@jinhongyii)
This PR adds support for simple dynamic-shape-aware fusion, which is the first step towards supporting dynamic shapes. The main changes are as follows: - Fix FuncStructInfo in well-formed checks - Renew symbolic var defs in fuse_ops to prevent malformed functions
This PR adds a `stop_lift_params` op as a hint to the parameter lifter to stop at that boundary point.
…che#14404) This PR enables the relax parser to handle a Var with a ShapeExpr value occurring in R.Tensor annotations.
* [Unity][Pass] Add pass for CSE within dataflow * Fill in CSE definition and test cases * Missing trailing newline --------- Co-authored-by: Prakalp Srivastava <prakalp@octoml.ai>
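The essence of CSE over a dataflow block can be sketched as a single pass over the bindings: identical right-hand sides are computed once and later occurrences are rewritten to reuse the first result. This is a simplified illustrative version, not the actual Relax pass:

```python
# Sketch of CSE over a flat list of bindings. Each expr is a hashable
# tuple like ('add', 'x', 'y'); structurally equal exprs are deduped.
def cse(bindings):
    """Returns (deduplicated bindings, map from eliminated vars
    to their surviving representatives)."""
    seen, out, replaced = {}, [], {}
    for var, expr in bindings:
        # Rewrite operands through the replacement map first, so
        # downstream uses of eliminated vars are updated.
        expr = (expr[0],) + tuple(replaced.get(a, a) for a in expr[1:])
        if expr in seen:
            replaced[var] = seen[expr]   # duplicate: reuse earlier var
        else:
            seen[expr] = var
            out.append((var, expr))
    return out, replaced

bindings = [("a", ("add", "x", "y")),
            ("b", ("add", "x", "y")),   # duplicate of a
            ("c", ("mul", "b", "z"))]   # uses the duplicate
new_bindings, replaced = cse(bindings)
# b is eliminated and c's operand b is rewritten to a
```

The real pass additionally has to respect Relax semantics, e.g. only deduplicating pure expressions within a dataflow block.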
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot
…ns (apache#14386) * Support Relax Constants in the QNN TOPI operations
This PR implements Conv1d. Unit tests are provided accordingly.
…pache#14412) This PR exposes the custom scale in `R.nn.attention` and adds its legalize op.
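For context, scaled dot-product attention multiplies the query-key scores by a scale before the softmax; the conventional default is 1/sqrt(head_dim), and the PR lets callers pass a custom value. A pure-Python sketch of the math (illustrative only, not the `R.nn.attention` implementation):

```python
# Scaled dot-product attention over lists of vectors (seq_len x dim).
import math

def softmax(xs):
    m = max(xs)                              # subtract max for stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(q, k, v, scale=None):
    dim = len(q[0])
    if scale is None:
        scale = 1.0 / math.sqrt(dim)         # conventional default scale
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) * scale for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))])
    return out

# With scale=0 every key gets equal weight, so the output is the
# mean of the value vectors:
row = attention([[1, 0]], [[2, 0], [0, 2]], [[2, 0], [0, 2]], scale=0)
```

Exposing `scale` matters for variants like attention with temperature or models that fold normalization into the scale.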
This commit introduces a Relax pass, AOTLowerMain, which lowers a Relax main function into TIR. It functions similarly to the Relay pass of the same name. Behavioural examples can be seen in the tests.