[Unity][Schedule] Loop-Partition Scheduling Primitive #16430
Closed
rutkoor wants to merge 16 commits into
…e#16382) Prior to this commit, the `Sh.tee` method was implemented by calling `f"{cmd} | tee"` in `subprocess.run`. While the `check=True` flag was used, the return code was from `tee`, not from the command itself. This causes failures in the command itself to be silently ignored, such as in [this CI pipeline](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-i386/detail/PR-16183/37/pipeline) in the `ci/scripts/jenkins/s3.py` step. This commit updates `Sh.tee` to call `subprocess.Popen` for `cmd`, tee the stdout, and check the return code. (Roughly adapted from [this stackoverflow post](https://stackoverflow.com/a/56484734).)
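The approach described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual `Sh.tee` from the TVM CI scripts: the standalone function `tee` and its signature are assumptions made for the example. It runs the command with `subprocess.Popen`, echoes stdout line by line while capturing it, and raises if the command itself exits with a non-zero status.

```python
import subprocess
import sys


def tee(cmd: str) -> str:
    """Run `cmd` in a shell, echo its output, and return the captured text.

    Hypothetical helper for illustration only; not the real `Sh.tee`.
    """
    proc = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same stream
        text=True,
    )
    captured = []
    for line in proc.stdout:
        sys.stdout.write(line)  # the "tee" part: show output as it arrives
        captured.append(line)
    returncode = proc.wait()
    if returncode != 0:
        # The return code now belongs to `cmd`, not to a trailing `| tee`.
        raise subprocess.CalledProcessError(
            returncode, cmd, output="".join(captured)
        )
    return "".join(captured)
```

With a shell pipeline such as `cmd | tee`, the reported status is that of the last stage, which is why a failing `cmd` went unnoticed; checking `proc.wait()` on the command itself avoids that.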
* [RPC] Fix tuning on macOS and Windows (apache#15771) Fix regression in (apache#15187) when the multiprocessing start method is not 'fork', which prevented tuning from working. This affects macOS and Windows. Also, in Python 3.14 the default start method will be 'spawn'.
* [RPC] Clean up `_serve_loop` function
[Runtime] Enable RPCObjectRef return in RPC This PR enables returning RPCObjectRef objects, similar to the disco transportation. This allows advanced remote debugging when the remote VM requires advanced object inputs such as a KV cache or shape. To keep the implementation compatible with minRPC (used in some of the limited protocols), only RPCObjectRef is supported for now; unpacking Shape and String is not enabled.
…16390) Fix a typo in the PyTorch frontend `nonzero_numpy`
[OpenCL] Fix OpenCL tests compilation When TVM is built with OpenCL tests from inside a different CMake project (not TVM), `CMAKE_SOURCE_DIR` returns the path to the `CMakeLists.txt` of that outer project rather than TVM's, which produces the following error: `No SOURCES given to target: opencl-cpptest`. To stay consistent with the code style in `OpenCL.cmake`, the use of the `CMAKE_SOURCE_DIR` variable is removed. This also fixes the case where TVM's CMake is invoked from a directory containing another CMake project.
This PR introduces a new attribute for device backends: `total_global_memory`. This attribute returns the total available global memory on a device, in bytes. Tested locally on CUDA/ROCm/Metal/OpenCL:

```python
>>> import tvm
>>> tvm.metal().total_global_memory
154618822656
```
- Handle `tvm_thread_invariant` as a no-op.
- `llvm.amdgcn.ds.bpermute` requires i32 as its input, but it can handle all 32-bit types.
- ocml intrinsics lead to incorrect codegen when used with vectorization; remove them and use LLVM intrinsics instead.
Co-authored-by: Andrey Malyshev <sunnyppanda@gmail.com>
…pache#16400) Add `logical_or` to the Relay PyTorch frontend
Co-authored-by: Star Yuan <ysh329@apache.org>
Fixed the softmax layer for 4D tensors to support the NCHW and NHWC layout types. Enabled the relevant test cases for the softmax layer.
Fix the PyTorch frontend `linspace` op
This PR updates the emsdk and Node.js versions used in the Docker image.
This PR introduces the Loop-Partition scheduling primitive, which decomposes a single loop into a sequence of multiple loops.
This PR is an extension of the previously created PR (#15901).
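To make the idea of decomposing one loop into a sequence of loops concrete, here is a small plain-Python sketch. It does not use TVM and does not reflect the primitive's actual API: the function names and the `boundaries` parameter are assumptions made for illustration. It only shows that running consecutive sub-ranges in order visits the same iterations as the original loop.

```python
from typing import Callable, List


def run_single_loop(extent: int, body: Callable[[int], None]) -> None:
    """The original form: one loop over [0, extent)."""
    for i in range(extent):
        body(i)


def run_partitioned(
    extent: int, boundaries: List[int], body: Callable[[int], None]
) -> None:
    """The partitioned form: a sequence of loops over consecutive sub-ranges.

    `boundaries` (hypothetical parameter) holds increasing split points
    strictly inside (0, extent).
    """
    edges = [0, *boundaries, extent]
    for start, stop in zip(edges, edges[1:]):
        # Each (start, stop) pair is one loop in the resulting sequence.
        for i in range(start, stop):
            body(i)


visited_single: List[int] = []
visited_partitioned: List[int] = []
run_single_loop(10, visited_single.append)
run_partitioned(10, [2, 7], visited_partitioned.append)
```

Here the loop of extent 10 becomes three loops over `[0, 2)`, `[2, 7)`, and `[7, 10)`, and both versions visit the same indices in the same order.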