[Unity] [Tracking Issue] Heterogeneous execution for Relax #15101
Comments
This PR adds the RealizeVDevice pass as mentioned in #15101:
* [Unity] RealizeVDevice pass
* Replace hint_on_device with to_vdevice in the pass
* Add sugar for to_vdevice
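For orientation, a minimal sketch of how this pass is typically driven; only `relax.transform.RealizeVDevice` and `relax.op.to_vdevice` are confirmed elsewhere in this thread, and the `hint_on_device` spelling mentioned in comments is taken from the PR description above:

```python
import tvm
from tvm import relax

def realize_devices(mod: tvm.IRModule) -> tvm.IRModule:
    # Sketch: a module whose bindings carry device hints (e.g. produced via
    # hint_on_device, per the PR description) is rewritten so the hinted
    # virtual devices are resolved and the hints become explicit to_vdevice
    # transfers.
    return relax.transform.RealizeVDevice()(mod)
```

Downstream, `relax.op.to_vdevice(x, vdev)` (used in the reproduction script later in this thread) is the explicit transfer that this pass inserts.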
Thank you for bringing support for heterogeneous graphs @yongwww! Shall we close this as completed?
Thank you for your contribution @yongwww! I am wondering whether there are any e2e test cases I could follow. It seems I can't specify more than one target in `relax.build`.
@csullivan Will close this tracking issue in the coming days; I have a PR coming up to add e2e test cases.
@qelk123 Great question! Currently relax.build doesn't support multiple targets. I have a PR to enable this e2e test case; it should be up in the coming days (hopefully by the end of this week). What we plan to do in Relax covers three main aspects.
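For reference, the multi-device entry points that this e2e path ends up exercising can be seen in the reproduction script later in this thread; here is a hedged sketch of just those calls, assuming `mod` is an IRModule whose `global_infos` declare both vdevices:

```python
import tvm
from tvm import relax

# Sketch: relax.build is called once (no per-target argument), and the VM
# receives one tvm.Device per vdevice declared in the module's global_infos.
devs = [tvm.device("cuda", 0), tvm.device("llvm")]
ex = relax.build(mod)  # `mod`: a multi-vdevice IRModule (assumption)
vm = relax.VirtualMachine(ex, devs)
```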
@yongwww Thank you very much! I am currently working on adding end-to-end multi-device support for my Relax heterogeneous graph based on this issue. At the moment, it works on some simple graphs. As you mentioned, these three aspects are also the main aspects I am focusing on during development. However, I have encountered some constraints during my experiments. One such constraint is that when I am not using the R.dataflow scope, the vdevice information does not appear to be properly propagated. Is the current design intended to support heterogeneous execution only within the dataflow scope?
@qelk123 Thank you for sharing your progress and challenges on the end-to-end multi-device support. It's promising to hear that it's operational for simpler graphs. The VDevice propagation is intended to function not just for the dataflow block, but also for the binding block. If the vdevice isn't propagating correctly for the binding block, that's a bug that needs fixing. I'll attempt to reproduce the problem and address it. If possible, could you share the specific test case you're working with? Additionally, don't hesitate to report the limitations you've encountered, and you're welcome to submit the patch you have to TVM Unity! I'm eager to explore possible enhancements.
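To make the dataflow-vs-binding-block question concrete, here is a hedged TVMScript sketch of the non-dataflow case under discussion; the vdevice-annotated `R.Tensor` signatures and the `R.to_vdevice` spelling follow the multi-device tests referenced below, but the module itself is hypothetical:

```python
import tvm
from tvm.script import ir as I, relax as R

# Hypothetical module: the function body is a plain binding block
# (no R.dataflow scope), where vdevice propagation should also apply.
@I.ir_module
class NonDataflow:
    I.module_global_infos({"vdevice": [I.vdevice("cuda"), I.vdevice("llvm")]})

    @R.function
    def main(
        x: R.Tensor((2, 3), "float32", "cuda")
    ) -> R.Tensor((2, 3), "float32", "llvm"):
        y = R.multiply(x, x)         # expected to stay on the "cuda" vdevice
        z = R.to_vdevice(y, "llvm")  # explicit transfer to the CPU vdevice
        return z
```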
The e2e multi-device test cases were added in #15823. Will close this issue once it lands.
A very useful job! I tried it on tests/python/relax/test_codegen_cudnn.py::test_conv2d_offload, following tests/python/relax/test_vm_multi_device.py::test_multi_device, as shown below:

```python
import numpy as np

import tvm
import tvm.testing
import tvm.topi.testing
from tvm import relax
from tvm.relax.backend.contrib.cudnn import partition_for_cudnn
from tvm.script import relax as R, ir as I
from tvm.script.ir_builder import IRBuilder
from tvm.script.ir_builder import relax as relax_builder

data_shape, weight_shape, dtype = (
    (16, 32, 32, 16),
    (32, 3, 3, 16),
    "float32",
)
input_np = np.random.randn(*data_shape).astype(dtype)
weight_np = np.random.randn(*weight_shape).astype(dtype)
oc = weight_shape[0]
bias_np = np.random.randn(1, 1, 1, oc).astype(dtype)
args = (input_np, weight_np, bias_np)

# Build the main function: conv2d + bias on the GPU, then transfer to the
# CPU and multiply there.
with IRBuilder() as builder:
    with relax_builder.function():
        R.func_name("main")
        data = R.arg("data", R.Tensor(data_shape, dtype))
        weight = R.arg("weight", R.Tensor(weight_shape, dtype))
        bias = R.arg("bias", R.Tensor((1, 1, 1, weight_shape[0]), dtype))
        with R.dataflow() as frame:
            output = R.emit(
                R.nn.conv2d(
                    data,
                    weight,
                    out_dtype=dtype,
                    padding=(1, 1),
                    data_layout="NHWC",
                    kernel_layout="OHWI",
                )
            )
            output = R.emit(output + bias)
            # transfer the intermediate result to the CPU vdevice
            output = R.emit(relax.op.to_vdevice(output, I.vdevice("llvm")))
            output = R.emit(R.multiply(output, R.const(2, "float32")))
            R.output(output)
        R.func_ret_value(frame.output_vars[0])

func = builder.get()
mod = tvm.IRModule(
    {"main": func},
    global_infos={
        "vdevice": [
            I.vdevice("cuda", 0),
            I.vdevice("llvm"),
        ]
    },
)
mod = partition_for_cudnn(mod)
mod = relax.transform.RunCodegen()(mod)

devs = [tvm.device("cuda", 0), tvm.device("llvm")]
mod = relax.transform.RealizeVDevice()(mod)
mod = relax.transform.LegalizeOps()(mod)
mod = tvm.tir.transform.DefaultGPUSchedule()(mod)
with tvm.transform.PassContext(config={"relax.backend.use_cuda_graph": False}):
    ex = relax.build(mod)
vm = relax.VirtualMachine(ex, devs)
f = vm["main"]
inputs = [tvm.nd.array(inp, tvm.device("cuda", 0)) for inp in args]
print(f(*inputs).numpy())
```

but it raises an error saying the vdevice is not defined in the global_info of the IRModule. Is there any problem with BYOC, or am I missing something?
@liquanfeng Thanks for reporting this! As the error shows, the vdevice is not defined in the global_info of the IRModule. The reason is that a new vdevice was created by the `I.vdevice("llvm")` call inside the `to_vdevice` binding, and that fresh object is not the one registered in the module's `global_infos`.
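A minimal sketch of the corresponding fix, assuming the object-identity explanation above; only the reuse pattern is the point, and `func` refers to the function built with the IRBuilder in the earlier script:

```python
import tvm
from tvm.script import ir as I

# Hedged sketch: create each VDevice once, then reuse the same instances
# both inside the function body and in the module's global_infos, so the
# vdevice referenced by to_vdevice is the one the module actually declares.
vdev_cuda = I.vdevice("cuda", 0)
vdev_llvm = I.vdevice("llvm")

# In the function body, pass the shared instance instead of a fresh one:
#     output = R.emit(relax.op.to_vdevice(output, vdev_llvm))

mod = tvm.IRModule(
    {"main": func},  # `func` built with the IRBuilder as in the earlier script
    global_infos={"vdevice": [vdev_cuda, vdev_llvm]},
)
```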
This issue is to track progress for the Relax heterogeneous execution support proposed here.
cc @quic-sanirudh