[Unity][Schedule] Loop-Partition Scheduling Primitive#16430

Closed
rutkoor wants to merge 16 commits into apache:unity from rutkoor:loop-partition-sch-primitive

@rutkoor rutkoor commented Jan 19, 2024

This PR introduces a Loop-Partition scheduling primitive that decomposes a single loop into a sequence of multiple loops.
This PR is an extension of the previously created PR (#15901).
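
As a rough illustration of what partitioning a loop means, here is a plain-Python sketch. It does not use the TVM schedule API; the helper names and the extents are hypothetical. A single loop over `range(n)` is rewritten as several loops that run one after another and together visit the same iterations in the same order:

```python
def partition_ranges(n, extents):
    """Split range(n) into consecutive sub-ranges with the given extents.

    Hypothetical helper for illustration only. Any iterations left over
    after the requested extents go into one final sub-range.
    """
    ranges = []
    start = 0
    for extent in extents:
        stop = min(start + extent, n)
        ranges.append(range(start, stop))
        start = stop
    if start < n:
        ranges.append(range(start, n))
    return ranges


def visit_single_loop(n):
    """The original form: one loop over all n iterations."""
    return [i for i in range(n)]


def visit_partitioned(n, extents):
    """The partitioned form: a sequence of loops, one per sub-range."""
    visited = []
    for sub_range in partition_ranges(n, extents):
        for i in sub_range:
            visited.append(i)
    return visited
```

With `n = 10` and extents `[2, 5]`, the sketch yields three loops covering `0..1`, `2..6`, and `7..9`.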

Lunderberg and others added 16 commits January 11, 2024 17:25
…e#16382)

Prior to this commit, the `Sh.tee` method was implemented by calling
`f"{cmd} | tee"` in `subprocess.run`.  While the `check=True` flag was
used, the return code was from `tee`, not from the command itself.
This causes failures in the command itself to be silently ignored,
such as in [this CI
pipeline](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-i386/detail/PR-16183/37/pipeline)
in the `ci/scripts/jenkins/s3.py` step.

This commit updates `Sh.tee` to call `subprocess.Popen` for `cmd`, tee
the stdout, and check the return code.  (Roughly adapted from [this
stackoverflow post](https://stackoverflow.com/a/56484734).)
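
The approach described above can be sketched roughly as follows. This is not the code from `ci/scripts`; the function name `run_and_tee` and its `sink` parameter are assumptions made for illustration. The point is that the return code checked is the command's own, not that of a `tee` process at the end of a pipe:

```python
import subprocess
import sys


def run_and_tee(cmd, sink=None):
    """Run `cmd` through the shell, echo its stdout to `sink`, and return it.

    Hypothetical helper. Raises subprocess.CalledProcessError when the
    command itself exits with a non-zero status.
    """
    if sink is None:
        sink = sys.stdout
    proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, text=True)
    captured = []
    # Forward each line as it arrives while keeping a copy.
    for line in proc.stdout:
        sink.write(line)
        captured.append(line)
    proc.stdout.close()
    returncode = proc.wait()
    output = "".join(captured)
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd, output=output)
    return output
```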
* [RPC] Fix tuning on macOS and Windows (apache#15771)

Fix a regression introduced in (apache#15187) that prevented tuning from working
when the multiprocessing start method is not 'fork'. This affects macOS and Windows.
Also, in Python 3.14 the default start method is no longer 'fork'.

* [RPC] clean up _serve_loop function
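
To check which start method an interpreter uses, or to request 'spawn' explicitly and reproduce the macOS/Windows behaviour on another platform, the standard `multiprocessing` module can be queried. This is a minimal sketch, not the code from the fix, and the helper names are hypothetical:

```python
import multiprocessing


def start_method_in_use():
    """Return the start method this interpreter uses by default."""
    return multiprocessing.get_start_method()


def relies_on_fork():
    """True only when child processes are created with 'fork'."""
    return start_method_in_use() == "fork"


# A context pinned to 'spawn', independent of the platform default.
spawn_ctx = multiprocessing.get_context("spawn")
```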
[Runtime] Enable RPCObjectRef return in RPC

This PR enables returning an RPCObjectRef object, similar to the disco transportation.
This allows advanced remote debugging when the remote VM requires
advanced object inputs such as the KV cache and shape.

To keep the implementation compatible with minRPC (used in some of the limited protocols) for now,
we only support RPCObjectRef and do not enable unpacking Shape and String.
…16390)

Fix a typo in the PyTorch frontend `nonzero_numpy`.
[OpenCL] Fix OpenCL tests compilation

When TVM is built with OpenCL tests from inside a different CMake project (not TVM), `CMAKE_SOURCE_DIR` returns the path to the `CMakeLists.txt` of that outer project rather than TVM's. In this case the build fails with the following error: `No SOURCES given to target: opencl-cpptest`.

To be consistent with the code style in `OpenCL.cmake`, I removed the usage of the `CMAKE_SOURCE_DIR` variable. This also fixes the issue when TVM's CMake is called from a directory containing another CMake project.
This PR introduces a new attribute for device backends:
`total_global_memory`. This attribute returns the total available
global memory on a device, in bytes.

Tested locally on CUDA/ROCm/Metal/OpenCL:
```python
>>> import tvm
>>> tvm.metal().total_global_memory
154618822656
```
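
The attribute reports a raw byte count. A small helper makes the value easier to read; the name `to_gib` is hypothetical and not part of TVM, and the number is the one from the Metal session above:

```python
def to_gib(num_bytes):
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


total_bytes = 154618822656  # value reported by the Metal session above
print(f"{to_gib(total_bytes):.1f} GiB")  # prints "144.0 GiB"
```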
- Handle `tvm_thread_invariant` as a no-op.
- `llvm.amdgcn.ds.bpermute` requires i32 as its input, but it can handle all 32-bit types.
- OCML intrinsics lead to incorrect codegen when used with vectorization; remove them and use LLVM intrinsics instead.
Co-authored-by: Andrey Malyshev <sunnyppanda@gmail.com>
Co-authored-by: Star Yuan <ysh329@apache.org>
Fixed the softmax layer for 4D tensors to support the NCHW and NHWC
layouts.
Enabled the relevant test cases for the softmax layer.
This PR updates the emsdk and Node.js versions in the Docker image.
@rutkoor rutkoor closed this Jan 19, 2024
@rutkoor rutkoor deleted the loop-partition-sch-primitive branch January 19, 2024 04:52