[Unity][Schedule] Loop-Partition Scheduling Primitive by rutkoor · Pull Request #16430 · apache/tvm

rutkoor · 2024-01-19T04:51:40Z

This PR introduces the Loop-Partition scheduling primitive, which decomposes a single loop into a sequence of multiple loops.
It is an extension of the previously created PR (#15901).
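To make the idea concrete, here is a minimal plain-Python sketch of what partitioning a loop means. It does not use the TVM scheduling API; the helper names `partition_loop`, `run_original`, and `run_partitioned`, and the convention that `sizes` lists the lengths of the leading sub-loops, are assumptions made purely for illustration.

```python
def partition_loop(extent, sizes):
    """Split range(extent) into consecutive sub-ranges.

    `sizes` gives the lengths of the leading sub-ranges (hypothetical
    convention); any remaining iterations form one final sub-range.
    """
    if any(s <= 0 for s in sizes):
        raise ValueError("partition sizes must be positive")
    if sum(sizes) > extent:
        raise ValueError("partition sizes exceed the loop extent")
    ranges = []
    start = 0
    for s in sizes:
        ranges.append(range(start, start + s))
        start += s
    if start < extent:
        ranges.append(range(start, extent))
    return ranges


def run_original(extent, body):
    # The single, unpartitioned loop.
    return [body(i) for i in range(extent)]


def run_partitioned(extent, sizes, body):
    # The same iterations, executed as a sequence of smaller loops.
    out = []
    for sub in partition_loop(extent, sizes):
        for i in sub:
            out.append(body(i))
    return out
```

Because the sub-loops cover the original iteration space exactly once and in order, both versions compute the same result; the benefit of partitioning is that each sub-loop can then be scheduled separately.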

…e#16382) Prior to this commit, the `Sh.tee` method was implemented by calling `f"{cmd} | tee"` in `subprocess.run`. While the `check=True` flag was used, the return code was from `tee`, not from the command itself. This causes failures in the command itself to be silently ignored, such as in [this CI pipeline](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-i386/detail/PR-16183/37/pipeline) in the `ci/scripts/jenkins/s3.py` step. This commit updates `Sh.tee` to call `subprocess.Popen` for `cmd`, tee the stdout, and check the return code. (Roughly adapted from [this stackoverflow post](https://stackoverflow.com/a/56484734).)
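The approach described above can be sketched roughly as follows. This is not the actual `Sh.tee` implementation from `ci/scripts`; the function name `tee` and its signature are assumptions for illustration, and only standard-library `subprocess` calls are used.

```python
import subprocess
import sys


def tee(cmd):
    """Run `cmd` in a shell, echo its output, and return the captured text.

    Unlike `f"{cmd} | tee"`, the return code checked here belongs to `cmd`
    itself, so a failing command raises instead of being silently ignored.
    """
    proc = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    captured = []
    for line in proc.stdout:
        sys.stdout.write(line)  # echo to the console, as tee would
        captured.append(line)   # keep a copy for the caller
    returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(
            returncode, cmd, output="".join(captured)
        )
    return "".join(captured)
```

With a shell pipeline, the exit status reported is that of the last command (`tee`), which is why the original failures went unnoticed.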

* [RPC] Fix tuning on macOS and Windows (apache#15771) Fix regression in (apache#15187) when multiprocessing start method is not 'fork', which prevented tuning from working. This affects macOS and Windows. Also, in Python 3.14 the default start method will be 'spawn'.
* [RPC] clean up _serve_loop function

[Runtime] Enable RPCObjectRef return in RPC

This PR enables returning RPCObjectRef objects, similar to the disco transportation. This allows advanced remote debugging when the remote VM requires advanced object inputs such as a KV cache and shape. To keep the implementation compatible with minRPC (used in some of the limited protocols), we only support RPCObjectRef for now and do not enable unpacking Shape and String.


[OpenCL] Fix OpenCL tests compilation

When TVM is built with OpenCL tests from inside a different CMake project (not TVM), `CMAKE_SOURCE_DIR` returns the path to the `CMakeLists.txt` of the current project rather than TVM's, which produces the following error: `No SOURCES given to target: opencl-cpptest`. To be consistent with the code style in `OpenCL.cmake`, I removed the usage of the `CMAKE_SOURCE_DIR` variable. This also fixes the issue when TVM's CMake is called from a directory containing another CMake project.

This PR introduces a new attribute for device backends: `total_global_memory`. This attribute returns the total available global memory on a device in bytes. Tested locally on CUDA/ROCm/Metal/OpenCL:

```python
>>> import tvm
>>> tvm.metal().total_global_memory
154618822656
```

- Handle tvm_thread_invariant as a no-op.
- `llvm.amdgcn.ds.bpermute` requires i32 as its input, but it can handle all 32-bit types.
- ocml intrinsics lead to incorrect codegen when used with vectorization; remove them and use llvm intrinsics instead.


Lunderberg and others added 16 commits January 11, 2024 17:25

[Relay][Frontend][Torch] fix a typo mistake in nonzero_numpy (apache#…

196b413

…16390) fix a typo mistake in pytorch frontend nonzero_numpy

[CI] Remove NVIDIA_DISABLE_REQUIRE (apache#16384)

3e52c3d

Add NVIDIA Hopper H100 target tag (apache#16407)

3053f65

Co-authored-by: Andrey Malyshev <sunnyppanda@gmail.com>

[Relay][Frontend][Torch] fix pytorch frontend not support logical or (a…

12ad4fb

…pache#16400) add logical_or to relay pytorch frontend

[COMMUNITY] Add new key for release signing (apache#16419)

7ef521f

Co-authored-by: Star Yuan <ysh329@apache.org>

[RUNTIME][CLML] Fix for Softmax op for 4D tensors (apache#16328)

a5e883e

Fixed the softmax layer for 4D tensors to support NCHW and NHWC layout types. Enabled the relevant test cases for the softmax layer.

[Relay][Frontend][Torch] fix pytorch frontend linspace op (apache#16417)

e1c430c

fix pytorch frontend linspace op

[CMake] Enable cuda lang if USE_CUDA is on (apache#16426)

827beed

[CI][WASM] Update emsdk and nodejs version (apache#16420)

614a7a9

This PR updates the emsdk and nodejs version of docker.

Loop-Partition Scheduling primitive

6e81154

rutkoor closed this Jan 19, 2024

rutkoor deleted the loop-partition-sch-primitive branch January 19, 2024 04:52

rutkoor wants to merge 16 commits into apache:unity from rutkoor:loop-partition-sch-primitive
