[Relax] Add bitwise and logical ops (AND, NOT, OR, XOR) by vvchernov · Pull Request #15074 · apache/tvm

vvchernov · 2023-06-12T14:48:38Z

Support bitwise and logical unary (NOT) and binary (AND, OR, XOR) ops on Relax side.
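As a rough illustration of the distinction between the two operator families, here is a plain-Python sketch of the elementwise semantics. It does not use the Relax API; the helper names and the 8-bit width in `bitwise_not` are assumptions made for the example only.

```python
# Plain-Python sketch of elementwise semantics; NOT the Relax API.
# All helper names here are hypothetical.
from operator import and_, or_, xor


def bitwise_binary(op, lhs, rhs):
    """Apply an integer bit operation elementwise to two equal-length lists."""
    return [op(a, b) for a, b in zip(lhs, rhs)]


def logical_binary(op, lhs, rhs):
    """Treat every nonzero element as True, then combine elementwise."""
    return [op(bool(a), bool(b)) for a, b in zip(lhs, rhs)]


def bitwise_not(values, bits=8):
    """Flip all bits of unsigned integers of the given width (uint8 assumed)."""
    mask = (1 << bits) - 1
    return [~v & mask for v in values]


def logical_not(values):
    """Nonzero becomes False, zero becomes True."""
    return [not bool(v) for v in values]


x = [0b1100, 0, 5]
y = [0b1010, 3, 0]
bitwise_and_result = bitwise_binary(and_, x, y)  # [8, 0, 0]
logical_and_result = logical_binary(and_, x, y)  # [True, False, False]
```

The bitwise variants combine the individual bits of integer operands, while the logical variants only look at whether each element is zero.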

) This PR adds a series of operators for relax:
- `repeat(data: Expr, repeats: int, axis: Optional[int] = None)`
- `tile(data: Expr, repeats: Union[int, Tuple[int], List[int]])`
- `relax.nn.conv2d_transpose`
  - This operator is intended to find the gradient of conv2d w.r.t. its input. For details see the [pytorch document](https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html), and the document in `python/tvm/relax/op/nn/nn.py`.
  - TOPI support of conv2d_transpose is currently limited: it does not support dilations, non-default layouts, or symbolic `output_padding`.
- `relax.nn.avg_pool2d`
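The difference between `repeat` and `tile` can be sketched on nested Python lists. This assumes NumPy-style semantics (in particular, that `axis=None` flattens the input first); the functions below are illustrative stand-ins, not the Relax operators.

```python
# Sketch on nested lists, assuming NumPy-style semantics.
# These are illustrative stand-ins, not the Relax operators.


def repeat(data, repeats, axis=None):
    """Repeat each element of a 2-D nested list along `axis`."""
    if axis is None:
        # Assumed convention: flatten first, then repeat each element.
        flat = [x for row in data for x in row]
        return [x for x in flat for _ in range(repeats)]
    if axis == 0:
        return [list(row) for row in data for _ in range(repeats)]
    if axis == 1:
        return [[x for x in row for _ in range(repeats)] for row in data]
    raise ValueError("axis must be None, 0, or 1 in this sketch")


def tile_1d(data, repeats):
    """Repeat the whole 1-D sequence: [a, b] -> [a, b, a, b]."""
    return list(data) * repeats


m = [[1, 2], [3, 4]]
flat_repeat = repeat(m, 2)           # each element doubled, flattened
row_repeat = repeat(m, 2, axis=0)    # each row doubled
tiled = tile_1d([1, 2], 2)           # whole sequence doubled
```

`repeat` duplicates individual elements in place, whereas `tile` duplicates the sequence as a whole.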

Introduce call_dps_packed to call packed functions in destination-passing style, reserving call_tir for TIR PrimFuncs instead.
* [Unity] Introduce call_dps_packed
* fix lint
* Fix comments
* Remove well_form update, enforce in InferStructInfoCallTIR
* Update src/relax/op/op.cc
* Update description of call_tir
* Remove unnecessary check in passes

[ci] Use CPU-SMALL instances

We prefer these instances in CI unless there is a lot of heavy building going on (the normal tvm build is cached with sccache so this is usually just for docker builds).

Co-authored-by: driazati <driazati@users.noreply.github.com>

Cross-function calls in TVMScript currently cause PyLint warnings, because the GlobalVar is flagged as an undefined variable, e.g.:

```python
@I.ir_module
class TestModule:
    @T.prim_func
    def tir_func(
        x: T.Buffer((T.int64(128),), "float32"), y: T.Buffer((T.int64(128),), "float32")
    ):
        T.evaluate(0)

    @R.function
    def foo(x: R.Tensor((128,), "float32")) -> R.Tensor((128,), "float32"):
        gv0 = R.call_tir(tir_func, x, R.Tensor((128,), dtype="float32"))  # <= `tir_func` is not defined in Python syntax.
        return gv0
```

This PR changes the syntax to `TestModule.tir_func` instead of a direct `tir_func`:

```python
@I.ir_module
class TestModule:
    @T.prim_func
    def tir_func(
        x: T.Buffer((T.int64(128),), "float32"), y: T.Buffer((T.int64(128),), "float32")
    ):
        T.evaluate(0)

    @R.function
    def foo(x: R.Tensor((128,), "float32")) -> R.Tensor((128,), "float32"):
        cls = TestModule  # Use `cls` to refer to the current Module
        gv0 = R.call_tir(cls.tir_func, x, R.Tensor((128,), dtype="float32"))
        return gv0
```

NOTE: This is a breaking change; the old style is deprecated.

Additionally, this PR contains the following minor fixes:
- mark `R.function` as a staticmethod, as is already done for `T.prim_func`
- make `I`, `R`, `T`, `cls` builtin keywords for the printer
- define names for functions and modules to prevent naming conflicts
- check that var names are valid via a regular expression
- fix typos

This PR adds an instrumentation option to the relax VM. The instrument will be called before/after each call instruction if specified. We also include a testing utility that uses the instrument: LibCompareVMInstrument leverages it to compare implementations on another backend. Also updated a few places in web runtime to improve debugging.

…or (apache#14215)
* [Unity][Pass] Add a pass to alter the TIR implementation of an operator (identified by operator_kind attribute on PrimFunc). It also inserts layout changes to i/o buffers at Relax level.
* deep copy index map to avoid structural_equality fail
* do not mark layouts as frozen
* address comments
* fix call_tir global symbol in tests

This PR specifies the maximum buffer size limit of the WebGPU runtime explicitly, so that the default limit is no longer a tight one. In the future, we can analyze and detect the required maximum buffer size and actively request it from the GPU.

* stub
* wip
* works
* restore binding
* attention test work
* use RemoveAllUnused
* simplified callback api
* pass original call node to callback
* clean test
* add doc
* add test for the case where the original call is returned
* callback -> rewriter and other doc improvement

… FuseOpsByPattern (apache#14310)
* Change the input of FuseOpsByPattern and add check for result dependency in cutlass conv2d residual block
* Rename FuseOpsPattern to FusionPattern and PatternCheckFunctionInput to PatternCheckContext

…4265) This PR enhances the current RewriteDataflowReshape transformation. Originally, it used loop vars to prove that the addresses of the LHS and RHS are equal, which caused some cases to fail due to limitations of the arith module. Instead, we can use block vars for the proof, which is supposed to be equivalent; the resulting expressions are simpler, which allows us to cover more cases.

…he#14275)
- allow Conv2d using different alignment factors for input and epilogue, which can influence performance
- store the profiler cache on disk, reducing CUTLASS profiler overhead across different runs
- use the same set of default tile configurations as CUTLASS for sm80 https://github.com/NVIDIA/cutlass/blob/master/tools/library/scripts/generator.py#L1881

This PR modifies the TOPI implementation (which is also the backend of the Relax legalizer) of the LayerNorm and GroupNorm operators so that they accept fp16 inputs, cast to fp32 internally, and produce fp16 outputs. This can help eliminate unnecessary casts caused by AMP.
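The "fp16 in, wider accumulation, fp16 out" idea can be sketched in plain Python. This is not the TOPI implementation: Python floats stand in for fp32, the `struct` module's `e` format is used to round to IEEE half precision, and scale/bias parameters are omitted. The function names are hypothetical.

```python
# Sketch only: Python floats stand in for fp32, and gamma/beta are omitted.
# Function names are hypothetical, not TOPI APIs.
import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def layer_norm_fp16(values, eps=1e-5):
    """Take fp16 inputs, compute statistics in higher precision, return fp16."""
    wide = [float(v) for v in values]  # upcast once on entry
    mean = sum(wide) / len(wide)
    var = sum((v - mean) ** 2 for v in wide) / len(wide)
    inv_std = 1.0 / math.sqrt(var + eps)
    # Downcast once at the end instead of after every intermediate step.
    return [to_fp16((v - mean) * inv_std) for v in wide]


inputs = [to_fp16(v) for v in [0.1, 0.2, 0.3, 0.4]]
outputs = layer_norm_fp16(inputs)
```

Keeping the mean and variance in the wider type avoids accumulating half-precision rounding error, while callers still see fp16 on both sides.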

This PR adds a new pass ConvertLayout that converts the layout of conv2d operators (NCHW->NHWC) and tries to propagate this conversion when appropriate. See https://github.com/spectrometerHBH/tvm/blob/convert-layout/src/relax/transform/convert_layout.cc#L39 for how this pass works. It works at the op level, in parallel with the community's ongoing Layout Transformation effort, which works at the TIR level.

When the begin index of strided slice is out of [-ndim, dim), the strided slice operator yields incorrect shape inference. This PR corrects the issue by canonicalizing the begin and end indices for strided slice before calculating the symbolic shape. It also adds new unit tests for out-of-range begin locations and changes some symbolic shape tests.

Co-authored-by: Tianqi Chen <tqchen@users.noreply.github.com>
Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu>
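A minimal sketch of index canonicalization for a single axis follows. It assumes Python-style slicing semantics and handles positive strides only; the actual Relax implementation (which also works on symbolic shapes) may differ, and the function names are made up for the example.

```python
# Sketch assuming Python-style slice semantics, positive strides only.
# Function names are hypothetical; the Relax implementation may differ.


def canonicalize_index(index, dim):
    """Map a possibly negative or out-of-range index into [0, dim]."""
    if index < 0:
        index += dim
    return min(max(index, 0), dim)


def sliced_length(begin, end, stride, dim):
    """Output extent along one axis after canonicalizing begin and end."""
    if stride <= 0:
        raise ValueError("this sketch only handles positive strides")
    begin = canonicalize_index(begin, dim)
    end = canonicalize_index(end, dim)
    # Ceiling division, clamped at zero for empty slices.
    return max(0, -(-(end - begin) // stride))


# An out-of-range begin is clamped instead of producing a bogus extent.
length = sliced_length(-100, 4, 2, dim=5)
```

Clamping before computing the extent is what keeps an out-of-range `begin` from leaking into the inferred shape.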

…e#14334) As a follow-up to apache#14262, I just noticed that I previously implemented a function called `RemoveAllUnused` (apache#14043) that can do function-wise DCE and should be more complete than apache#14262 as `RemoveAllUnused` can also remove dead dataflow blocks (added two test-cases to show that).

For now:
- `tvm.relax.transform.DeadCodeElimination` is the pass for running DCE over an IRModule.
- `tvm.relax.analysis.remove_all_unused` is a function for running DCE over a function.

`tvm.relax.transform.DeadCodeElimination` is implemented based on `tvm.relax.analysis.remove_all_unused`. I did not sync the function name for the two, as I saw there are other uses for `RemoveAllUnused` so just want to be conservative first.

The pass RemoveUnusedFunctions was renamed and enhanced to the DCE pass in apache#14262. This PR removes the Python interface of RemoveUnusedFunctions in transform.py. It is surprising that this Python API was never tested in any testing file before. This PR also fixes Clang build warnings in `alter_op_impl.cc` and `to_mixed_precision.cc`.

…apache#14974) Previously the pass LazyTransformParams did not account for the case where an output variable is used by other bindings inside the function, and generated an incorrect Relax function in that case. This PR fixes the issue. The new test case is an example that illustrates it. This PR also introduces a sanity assertion to ensure we handle the case properly, and improves variable naming: binding variables of `set_item` and `kill_object` are now named with an underscore ("`_`") instead of the previous `lv` name.

…che#15025)
* [Unity][NN] Allow nn.Placeholder/Parameter prior to BlockBuilder

  Prior to this commit, use of `nn.Placeholder` or `nn.Parameter` outside of a `with block_builder.function('name'):` scope resulted in an error. This commit updates the behavior to allow declaration prior to entering the `with` block. This can be useful for declaring a model object, which is then used to define several related functions.

  The scope was required so that `relax.BlockBuilder.current()` could de-duplicate variable names. While two distinct variables in Relax may have identical names, for user readability it is convenient to have all names be unique within a Relax function. This commit maintains the de-duplication of names: if a `nn.Placeholder` or `nn.Parameter` is defined within an active `relax.BlockBuilder`, that context may be used to provide a unique name.
* Lint fix
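The behavior described above (de-duplicate names when a builder context is active, otherwise fall back to the plain name) might look roughly like the following. `NameTable` and `make_name` are hypothetical helpers for illustration; the suffix scheme is an assumption and is not taken from `relax.BlockBuilder`.

```python
# Hypothetical helpers; the suffix scheme is an assumption and is not
# taken from relax.BlockBuilder.


class NameTable:
    """Hands out names that are unique within one scope."""

    def __init__(self):
        self._used = set()

    def unique(self, hint):
        name, suffix = hint, 0
        while name in self._used:
            suffix += 1
            name = f"{hint}_{suffix}"
        self._used.add(name)
        return name


def make_name(hint, table=None):
    """Use the active table when there is one; otherwise keep the hint."""
    return table.unique(hint) if table is not None else hint


scope = NameTable()
first = make_name("w", scope)    # inside a scope: unique names
second = make_name("w", scope)
outside = make_name("w")         # no active scope: the hint is used as-is
```

Outside a scope, duplicate names are tolerated, which matches the note that distinct Relax variables may share a name.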

tvm-bot · 2023-06-12T14:48:41Z

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

cc @junrushao

See #10317 for details

Generated by tvm-bot

vvchernov · 2023-06-12T14:52:46Z

Mistake with branch to pull (main instead of unity). See PR #15075

Ubospica and others added 30 commits April 1, 2023 15:31

[Unity] Allow user defined func attrs in emit_te (apache#14255)

07f7fc0

[Unity][BYOC] Add conv2d and residual block patterns for Relax cutlas…

04d6d15

…s BYOC (apache#14252)
* Add conv2d and residual block pattern for cutlass
* Fix lint

[Unity][Frontend] Import tanh and fix layer_norm (apache#14247)

df9d084

This PR provides some quick fixes for fx_translator to import tanh and fix the error when importing torch.nn.functional.layer_norm.

[Unity] Consider target context for Relay to Relax conversion (apache…

71c23b7

…#14269) Some of Relay passes are target specific (for example, AlterOpLayout). This commit adds target information for Relay -> Relax conversion.

[Unity] Mark tests that need python3.8 compact.

6be5c19

[Unity] Fix StructInfo Infer for vm.alloc_tensor (apache#14283)

68827fa

A hot fix for the struct info deduction for `vm.alloc_tensor`

[Unity][Op] Cumsum (apache#14297)

e3f9755

This PR introduces the cumulative sum high-level operator. Also replace some `T.var("int64")` with `T.int64()` in `test_ast_printer`.

[Unity][Op] Enable special dimension value 0 in reshape (apache#14311)

a46fd90

[Unity] Enable special dimension value 0 in reshape
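A small sketch of how a target shape containing special values could be resolved. It assumes that `0` copies the input dimension at the same position and that a single `-1` is inferred from the remaining elements; this convention is an assumption here and the function is not the Relax implementation.

```python
# Sketch under assumptions: 0 copies the input dimension at the same
# position, and a single -1 is inferred. Not the Relax implementation.
from math import prod


def resolve_reshape(input_shape, new_shape):
    """Return the concrete output shape for a static input shape."""
    out = list(new_shape)
    for i, dim in enumerate(out):
        if dim == 0:
            if i >= len(input_shape):
                raise ValueError("0 has no matching input dimension")
            out[i] = input_shape[i]
    if out.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    total = prod(input_shape)
    if -1 in out:
        known = prod(d for d in out if d != -1)
        if known == 0 or total % known != 0:
            raise ValueError("cannot infer the -1 dimension")
        out[out.index(-1)] = total // known
    if prod(out) != total:
        raise ValueError("element count does not match")
    return out


resolved = resolve_reshape([2, 3, 4], [0, -1])
```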

[Unity][BYOC] Support matmul + residual block fusion in CUTLASS BYOC (a…

2f1642f

…pache#14317)
* enable residual fusion support for matmul
* disallow residual fusion without bias
* support conv2d + residual add without bias via conv2d + bias pattern

[Unity][Transform] Simple Dead Code Elimination (apache#14262)

59746ea

This PR adds a new pass DeadCodeElimination, which currently removes unused local vars in df blocks.
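Removing unused local bindings from a side-effect-free block can be sketched as a single backward liveness sweep. The data model below (a list of `(var, uses)` pairs plus a list of block outputs) is a simplification chosen for the example, not the pass's actual data structures.

```python
# Simplified model: each binding is (var, [vars it reads]); bindings are
# assumed pure, as in a dataflow block. Not the pass's real data structures.


def eliminate_dead_bindings(bindings, outputs):
    """Keep only bindings that the block outputs transitively depend on."""
    live = set(outputs)
    kept = []
    # Walk backwards so a binding's uses become live before we reach
    # the bindings that define them.
    for var, uses in reversed(bindings):
        if var in live:
            kept.append((var, uses))
            live.update(uses)
    kept.reverse()
    return kept


block = [
    ("lv0", ["x"]),
    ("lv1", ["lv0"]),
    ("lv2", ["x"]),    # only feeds lv3
    ("lv3", ["lv2"]),  # never used
    ("gv", ["lv1"]),
]
pruned = eliminate_dead_bindings(block, outputs=["gv"])
```

Because the sweep runs in reverse, chains of dead bindings such as `lv2 -> lv3` are removed in one pass.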

[Unity][Transform] Automatic Mixed Precision (apache#14242)

371c3a2

This PR adds a new pass ToMixedPrecision to automatically cast fp32 models to fp16 when necessary. see https://github.com/spectrometerHBH/tvm/blob/amp/src/relax/transform/to_mixed_precision.cc#L51 on how this pass works.

[Unity][Lint] Fix cpplint casting (apache#14333)

6919830

fix: cpplint casting

[Unity][Transform] DefaultSchedule pass (apache#14266)

5229827

This PR introduces `tir.is_scheduled`, which allows the default schedule pass to skip some PrimFuncs.

[Unity][BYOC] Update testcases to follow recent changes (apache#14339)

050562b

This PR updates test cases to follow recent changes

masahi and others added 26 commits May 27, 2023 19:06

[Unity][CUTLASS] Support more residual input shape (apache#14968)

c448c50

* improved cutlass residual fusion
* update cutlass residual test
* update residual check
* fix residual check
* clean
* fix
* minor

[Unity] Allow filtering out unwanted branches in matmul combining pass (

9c0b41b

apache#14971)
* pass current bindings to rewriter
* add check func to CombineParallelMatmul
* clean
* add doc for df binding rewrite update
* add test
* black

[Unity] Add popn to kvcache (apache#14970)

a2bddcf

* [Unity] Add popn to kvcache
* Temp disable problematic grad tests

[Unity] Fix ConvertLayout on binary elemwise ops involving scalar inp…

34e87ae

…ut (apache#14961)

[Unity] Reset match state when backtracking (apache#14984)

44c849f

Reset match state when backtracking

[Unity][Training] Avoid problematic inputs to nll_loss in test_op_gra…

0a0b119

…dient_numeric (apache#14987)

[Unity][CUTLASS] Fix circular import bug in relax cutlass backend (ap…

f360556

…ache#15001)
* [Unity][CUTLASS] Fix circular import bug
* [Unity][CUTLASS] remove unnecessary cutlass import.
* [Unity][CUTLASS] remove not used __init__ file.

[MERGE] Merge main into unity 2023-06-01

153f70a

Updated the runtime to latest cutlass conflict resolutions

[Unity] Fix importing tvm.contrib.cutlass (apache#15010)

2f13f2d

* [Unity] Fix importing tvm.contrib.cutlass
* Update cutlass.py

[Unity] Fix cutlass BYOC after merge (apache#15012)

6d8fe1c

[Unity][FIX] add init file to relax.backend.contrib (apache#15023)

1a4e08e

* [FIX] add init file to `relax.backend.contrib`

  This PR adds `__init__.py` to `relax.backend.contrib`, fixing the package issue reported at mlc-ai/mlc-llm#311
* circular dep

[Unity] Allow name_hint in additional locations (apache#15027)

3dd9345

* [Unity] Allow name_hint argument to BlockBuilder.emit_te
* [Unity][NN] Pass name_hint from nn.emit to BlockBuilder.emit

[Unity] Cutlass attention with dynamic sequence length (apache#15028)

7dd76cc

* Dynamic attention
* Fix lint

[Unity][BYOC] Cache cuBlasLt handle with thread entry (apache#15030)

91b6cee

* [BYOC] Cache cuBlasLt handle with thread entry
* fix
* fix

[Unity] Add hexp for compute capacity <= 5.2 (apache#15070)

7c15b55

upd

[Unity] Add an API to create multiple kv caches with single allocation (

e9ddd47

apache#15064)

[Unity][Frontend] Add relax onnx importer and tests (apache#14999)

0406b9a

This PR adds a direct ONNX to relax importer and test suite. It has decent out of the box coverage and has been tested for numerous end to end use cases. I hope this will be a valuable tool for the Unity community.

add bitwise and logical binary ops (and, or, xor) to relax

247f93d

add bitwise and logical NOT to unary relax ops

781d3ad

legalize bitwise and logical NOT

5dfffe1

extand ir by new ops

6547306

add bitwise and logical ops to headers

a8a47e5

add bitwise and logical not to native code

8777b9a

add test

b00051e

github-actions bot requested a review from tqchen June 12, 2023 14:49

vvchernov closed this Jun 12, 2023

[Relax] Add bitwise and logical ops (AND, NOT, OR, XOR) #15074
vvchernov wants to merge 366 commits into apache:main from Deelvin:vc/bitwise_logical

20 participants
