
[Relax] Add bitwise and logical ops (AND, NOT, OR, XOR)#15074

Closed
vvchernov wants to merge 366 commits into apache:main from Deelvin:vc/bitwise_logical


@vvchernov
Contributor

Support bitwise and logical unary (NOT) and binary (AND, OR, XOR) ops on Relax side.
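As a rough illustration of the difference between the two operator families, here is a plain-Python sketch on lists. It is not the Relax API, and all helper names are hypothetical. Bitwise ops act on the bit patterns of integer elements, while logical ops treat each element as a truth value.

```python
import operator


def elementwise(op, xs, ys):
    """Apply a binary op element by element to two equal-length lists."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    return [op(x, y) for x, y in zip(xs, ys)]


# Bitwise variants: operate on the integer bit patterns.
def bitwise_and(xs, ys):
    return elementwise(operator.and_, xs, ys)


def bitwise_or(xs, ys):
    return elementwise(operator.or_, xs, ys)


def bitwise_xor(xs, ys):
    return elementwise(operator.xor, xs, ys)


def bitwise_not(xs):
    return [~x for x in xs]


# Logical variants: convert each element to a truth value first.
def logical_and(xs, ys):
    return elementwise(lambda a, b: bool(a) and bool(b), xs, ys)


def logical_or(xs, ys):
    return elementwise(lambda a, b: bool(a) or bool(b), xs, ys)


def logical_xor(xs, ys):
    return elementwise(lambda a, b: bool(a) != bool(b), xs, ys)


def logical_not(xs):
    return [not bool(x) for x in xs]
```

For example, `bitwise_and([6, 5], [3, 4])` combines bits, whereas `logical_and([6, 0], [3, 4])` only asks whether both elements are non-zero.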

cc @tqchen

Ubospica and others added 30 commits April 1, 2023 15:31
)

This PR adds a series of operators for relax:
- `repeat(data: Expr, repeats: int, axis: Optional[int] = None)`
- `tile(data: Expr, repeats: Union[int, Tuple[int], List[int]])`
- `relax.nn.conv2d_transpose`
	- This operator is intended to find the gradient of conv2d w.r.t its input. For details see [pytorch document](https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html), and the document in `python/tvm/relax/op/nn/nn.py`.
	- TOPI support for conv2d_transpose is currently limited: it does not support dilations, non-default layouts, or symbolic `output_padding`.
- `relax.nn.avg_pool2d`
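
To make the difference between `repeat` and `tile` concrete, here is a minimal pure-Python sketch for the 1-D case. It assumes NumPy-style semantics, which the commit message does not spell out, and the function names `repeat_1d` and `tile_1d` are hypothetical stand-ins rather than the Relax operators.

```python
def repeat_1d(data, repeats):
    """Repeat each element `repeats` times, keeping element order."""
    return [x for x in data for _ in range(repeats)]


def tile_1d(data, repeats):
    """Repeat the whole sequence `repeats` times."""
    return list(data) * repeats
```

With `[1, 2, 3]` and `repeats=2`, `repeat_1d` duplicates each element in place while `tile_1d` concatenates two copies of the whole list.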
…s BYOC (apache#14252)

* Add conv2d and residual block pattern for cutlass

* Fix lint
This PR provides some quick fixes for fx_translator to import tanh and fix the error when importing torch.nn.functional.layer_norm.
…#14269)

Some of Relay passes are target specific (for example, AlterOpLayout).
This commit adds target information for Relay -> Relax conversion.
Introduce call_dps_packed to call packed functions in destination-passing style, reserving call_tir for TIR PrimFuncs instead.

* [Unity] Introduce call_dps_packed

* fix lint

* Fix comments

* Remove well_form update, enforce in InferStructInfoCallTIR

* Update src/relax/op/op.cc

* Update description of call_tir

* Remove unnecessary check in passes
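
Destination-passing style can be sketched in plain Python as follows. This is only an illustration of the calling convention, not the `call_dps_packed` implementation; `add_one_dps` and `call_dps` are hypothetical names. The caller allocates the output and passes it as the last argument, and the callee writes into it instead of returning a value.

```python
def add_one_dps(src, dst):
    """Hypothetical packed function in DPS: writes its result into `dst`."""
    for i, value in enumerate(src):
        dst[i] = value + 1.0


def call_dps(func, args, out_len):
    """Allocate the output, pass it as the final argument, then return it."""
    out = [0.0] * out_len
    func(*args, out)
    return out
```

The point of the convention is that allocation is decided by the caller, so a compiler can plan or reuse the output memory without changing the callee.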
[ci] Use CPU-SMALL instances

We prefer these instances in CI unless there is a lot of heavy building
going on (the normal tvm build is cached with sccache so this is usually
just for docker builds).

Co-authored-by: driazati <driazati@users.noreply.github.com>
The current cross-function calls in TVMScript will cause PyLint warnings,
since the GlobalVar will be marked as undefined vars, e.g.:

```python
@I.ir_module
class TestModule:
    @T.prim_func
    def tir_func(
        x: T.Buffer((T.int64(128),), "float32"), y: T.Buffer((T.int64(128),), "float32")
    ):
        T.evaluate(0)

    @R.function
    def foo(x: R.Tensor((128,), "float32")) -> R.Tensor((128,), "float32"):
        gv0 = R.call_tir(tir_func, x, R.Tensor((128,), dtype="float32"))  # <= `tir_func` is not defined in Python syntax.
        return gv0
```

This PR changes the syntax so that the function is referenced through the module, as `TestModule.tir_func`, rather than as a bare `tir_func`:
```python
@I.ir_module
class TestModule:
    @T.prim_func
    def tir_func(
        x: T.Buffer((T.int64(128),), "float32"), y: T.Buffer((T.int64(128),), "float32")
    ):
        T.evaluate(0)

    @R.function
    def foo(x: R.Tensor((128,), "float32")) -> R.Tensor((128,), "float32"):
        cls = TestModule  # Use `cls` to refer to the current module
        gv0 = R.call_tir(cls.tir_func, x, R.Tensor((128,), dtype="float32"))
        return gv0
```

NOTE: This is a breaking change; the old style is deprecated.

Additionally, this PR contains the following minor fixes:
- mark `R.function` as a staticmethod, as is already done for `T.prim_func`
- make `I`, `R`, `T`, and `cls` builtin keywords for the printer
- define names for functions and modules to prevent naming conflicts
- check that var names are valid using a regular expression
- fix typos
A hot fix for the struct info deduction for `vm.alloc_tensor`
This PR introduces the cumulative sum high-level operator.

Also replace some `T.var("int64")` with `T.int64()` in `test_ast_printer`.
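
For reference, the cumulative sum of a 1-D sequence can be sketched with the standard library. This only shows what the result looks like and is not the Relax operator; the name `cumsum_1d` is hypothetical.

```python
from itertools import accumulate


def cumsum_1d(values):
    """Return the running total: out[i] = values[0] + ... + values[i]."""
    return list(accumulate(values))
```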
This PR adds an instrumentation option to the relax VM.
The instrument will be called before/after each call
instruction if specified.

We also include a testing utility that uses the
instrument. LibCompareVMInstrument leverages the instrument
to compare implementations on another backend.

Also updated a few places in web runtime to improve debugging.
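
The before/after instrumentation idea can be sketched with a toy dispatcher. `TinyVM` and its callback signature are hypothetical and do not mirror the actual relax VM interface; the sketch only shows a hook that fires around every call.

```python
class TinyVM:
    """Toy call dispatcher that invokes an optional hook around each call."""

    def __init__(self, instrument=None):
        # instrument(phase, func_name, args, result), or None to disable.
        self.instrument = instrument

    def call(self, func, *args):
        if self.instrument is not None:
            self.instrument("before", func.__name__, args, None)
        result = func(*args)
        if self.instrument is not None:
            self.instrument("after", func.__name__, args, result)
        return result
```

A comparison utility in this style could record each "after" result and check it against a reference implementation of the same function.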
…or (apache#14215)

* [Unity][Pass] Add a pass to alter the TIR
implementation of an operator (identified
by operator_kind attribute on PrimFunc).
It also inserts layout changes to i/o
buffers at Relax level.

* deep copy index map to avoid structural_equality fail

* do not mark layouts as frozen

* address comments

* fix call_tir global symbol in tests
[Unity] Enable special dimension value 0 in reshape
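
A possible reading of the special value, sketched in plain Python: this assumes the same convention as Relay's reshape, where a `0` in the target shape copies the dimension at the same position from the input shape. The commit title does not state this, so treat it as an assumption; `resolve_reshape` is a hypothetical helper.

```python
from math import prod


def resolve_reshape(old_shape, new_shape):
    """Replace each 0 in `new_shape` with the input dim at the same index.

    Assumption: 0 means "copy the corresponding input dimension".
    """
    resolved = []
    for i, dim in enumerate(new_shape):
        if dim == 0:
            if i >= len(old_shape):
                raise ValueError("0 has no corresponding input dimension")
            resolved.append(old_shape[i])
        else:
            resolved.append(dim)
    if prod(old_shape) != prod(resolved):
        raise ValueError("element count changes under reshape")
    return resolved
```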
This PR specifies the maximum buffer size limit of WebGPU runtime
explicitly so that the limit will not be a tight one by default.

In the future, we can analyze and detect the required max buffer size
limit and actively request it from the GPU.
* stub

* wip

* works

* restore binding

* attention test work

* use RemoveAllUnused

* simplified callback api

* pass original call node to callback

* clean test

* add doc

* add test for the case where the original call is returned

* callback -> rewriter and other doc improvement
…pache#14317)

* enable residual fusion support for matmul

* disallow residual fusion without bias

* support conv2d + residual add without bias via conv2d + bias pattern
… FuseOpsByPattern (apache#14310)

* Change the input of FuseOpsByPattern and add check for result dependency in cutlass conv2d residual block

* Rename FuseOpsPattern to FusionPattern and PatternCheckFunctionInput to PatternCheckContext
…4265)

This PR enhances the current RewriteDataflowReshape transformation.

Previously, it used loop vars to prove that the addresses of the LHS and RHS are equal, which caused some cases to fail due to limitations of the arith module.

Instead, we can use block vars for the proof. This is supposed to be equivalent, and the expressions are simpler, which allows us to cover more cases.
…he#14275)

- allow Conv2d using different alignment factors for input and epilogue, which can influence performance 
- store the profiler cache on disk, reducing CUTLASS profiler overhead across different runs
- use the same set of default tile configurations as CUTLASS for sm80 https://github.com/NVIDIA/cutlass/blob/master/tools/library/scripts/generator.py#L1881
This PR modifies the TOPI implementation (which is also the legalizer's backend in Relax) of the LayerNorm and GroupNorm operators to allow them to accept fp16 inputs, cast to fp32 internally, and produce fp16 outputs.

This can help eliminate unnecessary casts caused by AMP.
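
The "narrow in, wide compute, narrow out" pattern can be sketched in pure Python. This is not the TOPI code: `to_fp16` and `layer_norm_fp16` are hypothetical helpers, scale and bias are omitted, and Python floats (fp64) stand in for the fp32 intermediate.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def layer_norm_fp16(xs, eps=1e-5):
    """Normalize fp16-valued inputs using a wider intermediate precision."""
    wide = [float(x) for x in xs]  # widen (fp64 here, standing in for fp32)
    mean = sum(wide) / len(wide)
    var = sum((v - mean) ** 2 for v in wide) / len(wide)
    inv_std = 1.0 / math.sqrt(var + eps)
    # Round back to fp16 only once, at the output.
    return [to_fp16((v - mean) * inv_std) for v in wide]
```

Keeping the mean and variance in the wider type avoids accumulating fp16 rounding error in the reduction, while callers still see fp16 in and fp16 out.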
This PR adds a new pass ConvertLayout that converts the layout of conv2d operators (NCHW->NHWC) and tries to propagate this conversion when appropriate.

see https://github.com/spectrometerHBH/tvm/blob/convert-layout/src/relax/transform/convert_layout.cc#L39 on how this pass works

It works on the op level, which is in parallel with the ongoing Layout Transformation effort that works on TIR level in the community.
This PR adds a new pass DeadCodeElimination, which currently removes unused local vars in df blocks.
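
The core idea of removing unused local bindings can be sketched as a backward liveness walk. The data model below (a list of `(var, used_vars)` pairs plus a set of outputs) and the name `remove_unused_bindings` are hypothetical simplifications, not the pass's actual data structures.

```python
def remove_unused_bindings(bindings, outputs):
    """Drop bindings whose variable is never needed by any output.

    bindings: list of (var, [vars used by its expression]) in program order.
    outputs:  names that must stay alive at the end of the block.
    """
    live = set(outputs)
    kept = []
    # Walk backwards so a binding is seen after all of its users.
    for var, uses in reversed(bindings):
        if var in live:
            kept.append((var, uses))
            live.update(uses)
    kept.reverse()
    return kept
```

A single backward pass suffices because bindings are in definition order, so dropping a dead binding also keeps its operands dead unless something else uses them.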
This PR adds a new pass ToMixedPrecision to automatically cast fp32 models to fp16 when necessary.

see https://github.com/spectrometerHBH/tvm/blob/amp/src/relax/transform/to_mixed_precision.cc#L51 on how this pass works.
This PR introduces `tir.is_scheduled` to allow the default schedule pass to skip some PrimFuncs.
When the begin index of a strided slice is out of [-ndim, dim), the strided slice operator infers an incorrect shape. This PR corrects the issue by canonicalizing the begin and end indices of the strided slice before calculating the symbolic shape. It also adds unit tests for out-of-range begin locations and changes some symbolic shape tests.

Co-authored-by: Tianqi Chen <tqchen@users.noreply.github.com>
Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu>
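
One way to picture index canonicalization is Python's own slicing rule, used here as a stand-in; whether the Relax operator clamps in exactly the same way is an assumption. The helper `sliced_length` is hypothetical.

```python
def sliced_length(dim, begin, end, stride=1):
    """Length of x[begin:end:stride] along an axis of extent `dim`.

    `slice.indices` wraps negative indices and clamps out-of-range ones,
    so the resulting length is always well defined.
    """
    start, stop, step = slice(begin, end, stride).indices(dim)
    return len(range(start, stop, step))
```

For an axis of extent 10, a begin of -15 is clamped to 0 and an end of 100 is clamped to 10, so out-of-range bounds never produce a negative or oversized output extent.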
…e#14334)

As a follow-up to apache#14262, I just noticed that I previously implemented a function called `RemoveAllUnused` (apache#14043) that can do function-wise DCE and should be more complete than apache#14262 as `RemoveAllUnused` can also remove dead dataflow blocks (added two test-cases to show that). 

For now:
- `tvm.relax.transform.DeadCodeElimination` is the pass for running DCE over an IRModule.
- `tvm.relax.analysis.remove_all_unused` is a function for running DCE over a function.

`tvm.relax.transform.DeadCodeElimination` is implemented based on `tvm.relax.analysis.remove_all_unused`. I did not sync the function name for the two, as I saw there are other uses for `RemoveAllUnused` so just want to be conservative first.
The pass RemoveUnusedFunctions was renamed and enhanced to the DCE pass
in apache#14262. This PR removes the Python interface of RemoveUnusedFunctions
in transform.py.

It is surprising that this Python API was never tested in any testing
file before.

This PR also fixes Clang build warnings in `alter_op_impl.cc` and
`to_mixed_precision.cc`.
This PR updates test cases to follow recent changes
masahi and others added 26 commits May 27, 2023 19:06
* improved cutlass residual fusion

* update cutlass residual test

* update residual check

* fix residual check

* clean

* fix

* minor
apache#14971)

* pass current bindings to rewriter

* add check func to CombineParallelMatmul

* clean

* add doc for df binding rewrite update

* add test

* black
* [Unity] Add popn to kvcache

* Temp disable problematic grad tests
…apache#14974)

Previously the pass LazyTransformParams did not handle the case where an
output variable is used by other bindings inside the function. It
generated an incorrect Relax function in this case.

This PR fixes this issue. The new test case is an example to elaborate
the issue. This PR introduces a sanity assertion check to ensure we
handle the case properly.

This PR also improves variable naming. Binding variables of
`set_item` and `kill_object` are now named with an underscore ("`_`")
instead of the previous `lv` name.
…ache#15001)

* [Unity][CUTLASS] Fix circular import bug

* [Unity][CUTLASS] remove unnecessary cutlass import.

* [Unity][CUTLASS] remove not used __init__ file.
Updated the runtime to latest
cutlass conflict resolutions
* [Unity] Fix importing tvm.contrib.cutlass

* Update cutlass.py
* [FIX] add init file to `relax.backend.contrib`

This PR adds `__init__.py` to `relax.backend.contrib`, fixing the package
issue reported at mlc-ai/mlc-llm#311

* circular dep
* [Unity] Allow name_hint argument to BlockBuilder.emit_te

* [Unity][NN] Pass name_hint from nn.emit to BlockBuilder.emit
* [BYOC] Cache cuBlasLt handle with thread entry

* fix

* fix
…che#15025)

* [Unity][NN] Allow nn.Placeholder/Parameter prior to BlockBuilder

Prior to this commit, use of `nn.Placeholder` or `nn.Parameter`
outside of a `with block_builder.function('name'):` scope resulted in
an error.  This commit updates the behavior to allow declaration prior
to entering the `with` block.  This can be useful for declaring a
model object, which is then used to define several related functions.

The scope was required so that `relax.BlockBuilder.current()` could
de-duplicate variable names.  While two distinct variables in Relax
may have identical names, for user readability it is convenient to
have all names be unique within a Relax function.  This commit
maintains the de-duplication of names: if a `nn.Placeholder` or
`nn.Parameter` is defined within an active `relax.BlockBuilder`, that
context is used to provide a unique name.

* Lint fix
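
Name de-duplication of the kind described above can be sketched with a small table of used names. `NameTable` and the `_<n>` suffix scheme are hypothetical and do not reproduce how `relax.BlockBuilder` generates names.

```python
class NameTable:
    """Hand out names that are unique within one scope."""

    def __init__(self):
        self._used = set()

    def get_unique(self, hint):
        """Return `hint`, or `hint_<n>` with the smallest free n >= 1."""
        name = hint
        suffix = 0
        while name in self._used:
            suffix += 1
            name = f"{hint}_{suffix}"
        self._used.add(name)
        return name
```

Checking against the set of names actually handed out, rather than a per-hint counter, avoids collisions when a caller passes a hint such as `x_1` that looks like a generated name.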
This PR adds a direct ONNX to relax importer and test suite. It has decent out of the box coverage and has been tested for numerous end to end use cases. I hope this will be a valuable tool for the Unity community.
@tvm-bot
Collaborator

tvm-bot commented Jun 12, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@github-actions github-actions bot requested a review from tqchen June 12, 2023 14:49
@vvchernov vvchernov closed this Jun 12, 2023
@vvchernov
Contributor Author

Mistake with branch to pull (main instead of unity). See PR #15075

