[BugFix][Relax] Select target-specific pipeline in tvm.compile when GPU target is provided #19384
Code Review
This pull request updates the Relax VM build process to prefer target-specific default pipelines when the "default" pipeline string is provided and a target is available. A review comment correctly identified that the implementation incorrectly attempts to access the pipeline submodule directly, which would likely trigger an AttributeError and cause an unintended fallback to the generic pipeline; a suggestion was provided to use the properly exported relax.get_default_pipeline function instead.
…arget is provided

relax.build() with relax_pipeline="default" always resolved to default_build_pipeline, which omits FuseOps, FuseTIR, and DLight scheduling. On CUDA this left individual TIR functions (e.g. maximum, minimum from Clip/ReLU6) without thread bindings, causing VerifyMemory to fail:

Memory verification failed: Variable X is directly accessed by host memory (it is not contained in a thread environment or in the function arguments).

When relax_pipeline="default" and a target is provided, prefer relax.pipeline.get_default_pipeline(target), which includes the full legalization + fusion + DLight scheduling pipeline. Fall back to default_build_pipeline if no target-specific pipeline is registered (e.g. ValueError or AttributeError from get_default_pipeline).
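The selection-with-fallback rule in this commit message can be modelled in plain Python so the control flow runs without TVM installed. This is a hypothetical sketch, not TVM code: `resolve_pipeline`, the string "targets", and the two stub lookups are illustrative stand-ins for the real pipeline helpers.

```python
from typing import Callable, Optional


def resolve_pipeline(
    relax_pipeline: object,
    target: Optional[str],
    get_default_pipeline: Callable[[str], object],
    default_build_pipeline: Callable[[], object],
) -> object:
    """Hypothetical model of the first version of the fix (not TVM code)."""
    if relax_pipeline != "default":
        # In this sketch, anything other than "default" is passed through.
        return relax_pipeline
    if target is not None:
        try:
            # Prefer the target-specific pipeline when a target is given.
            return get_default_pipeline(target)
        except (ValueError, AttributeError):
            # No target-specific pipeline registered: fall through.
            pass
    return default_build_pipeline()


# Illustrative stubs standing in for the real pipeline lookups.
def stub_get_default_pipeline(target: str) -> str:
    if target == "cuda":
        return "cuda default pipeline"
    raise ValueError(f"no default pipeline registered for {target!r}")


def stub_default_build_pipeline() -> str:
    return "default_build_pipeline"


for tgt in ("cuda", "unknown", None):
    print(tgt, "->", resolve_pipeline(
        "default", tgt, stub_get_default_pipeline, stub_default_build_pipeline))
```

Injecting the lookups as callables keeps the sketch testable; the point it illustrates is that any lookup failure silently degrades to the generic pipeline, which is also why a broken lookup (the `AttributeError` raised in review) would go unnoticed.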
Branch updated from e715582 to 3203e19.
…pipeline `cpu_generic.get_default_pipeline` was missing `DispatchSampling` and `DispatchSortScan` from its `library_dispatch_passes`, causing ops like `relax.cumsum` and `relax.topk` to reach CodeGenVM without being dispatched, resulting in "CodeGenVM cannot handle this intrinsic" errors on CPU/llvm targets.
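Why a missing dispatch pass only surfaces at codegen can be shown with a toy lowering loop. Everything here is hypothetical: ops are plain strings, each dispatch pass is modelled as the set of ops it rewrites (the mapping is assumed for illustration, not TVM's real table), and `toy_codegen` stands in for CodeGenVM by rejecting any op it cannot lower.

```python
from typing import Dict, List, Set

# Assumed, simplified mapping from dispatch pass to the ops it rewrites.
# Only ops relevant to the bug are listed; this is not TVM's real table.
DISPATCH_TABLE: Dict[str, Set[str]] = {
    "DispatchSampling": {"relax.multinomial_from_uniform"},
    "DispatchSortScan": {"relax.cumsum", "relax.topk"},
}

# Ops the toy "CodeGenVM" can emit directly.
CODEGEN_SUPPORTED: Set[str] = {"call_tir", "call_packed"}


def run_dispatch(ops: List[str], passes: List[str]) -> List[str]:
    """Rewrite every op handled by one of `passes` into a call_tir."""
    handled: Set[str] = set().union(*(DISPATCH_TABLE[p] for p in passes))
    return ["call_tir" if op in handled else op for op in ops]


def toy_codegen(ops: List[str]) -> List[str]:
    """Stand-in for CodeGenVM: reject anything it cannot lower."""
    for op in ops:
        if op not in CODEGEN_SUPPORTED:
            raise RuntimeError(f"CodeGenVM cannot handle this intrinsic: {op}")
    return ops


program = ["relax.cumsum", "relax.topk"]

# Without the dispatch passes, the ops reach codegen untouched and fail.
try:
    toy_codegen(run_dispatch(program, passes=[]))
except RuntimeError as err:
    print(err)  # CodeGenVM cannot handle this intrinsic: relax.cumsum

# With both passes in the list, every op is rewritten before codegen.
print(toy_codegen(run_dispatch(program, ["DispatchSampling", "DispatchSortScan"])))
# ['call_tir', 'call_tir']
```

Nothing between dispatch and codegen checks for undispatched ops in this model, so the error message names the first leftover op rather than the missing pass.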
Branch updated from 32d3457 to e6d872a.
The previous fix applied get_default_pipeline(target) whenever a target
was provided, including CPU (llvm). The CPU-specific pipeline includes
FoldConstant and FuseOps/FuseTIR, which dead-code-eliminate unused
call_pure_packed calls. That is correct per the pure semantics, but it
broke existing tests that relied on their side effects.
Narrow the scope: only use get_default_pipeline for GPU targets
(identified by 'gpu' in target.keys). CPU targets continue to use
get_pipeline('default'), which is the previous behaviour.
While investigating the CPU test failures, I noticed that one test relies on the side effect of a `call_pure_packed` call whose result is unused:

```python
z = R.call_pure_packed(
    "test.vm.identity", x, y, sinfo_args=(R.Tensor(ndim=2, dtype="float32"))
)
return y  # z unused; relies on y being modified as a side effect
```

I'll send a follow-up PR to fix the test after this merges.
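The failure mode can be reproduced with a toy dead-code-elimination pass. This is a hypothetical model, not TVM's implementation: a function body is a list of `Binding` records, and a binding is dropped when it is marked pure and nothing downstream uses its result, which is the situation for `z` above.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Binding:
    var: str                      # name bound by this statement, e.g. "z"
    op: str                       # callee, e.g. "test.vm.identity"
    args: List[str] = field(default_factory=list)
    pure: bool = True             # call_pure_packed declares the call pure


def dead_code_eliminate(bindings: List[Binding], outputs: List[str]) -> List[Binding]:
    """Drop pure bindings whose results are never used (toy model)."""
    live: Set[str] = set(outputs)
    kept: List[Binding] = []
    # Walk backwards so uses are recorded before their definitions.
    for b in reversed(bindings):
        if b.pure and b.var not in live:
            continue  # unused and declared pure: removed
        kept.append(b)
        live.update(b.args)
    return list(reversed(kept))


body = [Binding("z", "test.vm.identity", ["x", "y"], pure=True)]
print([b.var for b in dead_code_eliminate(body, outputs=["y"])])  # []

impure = [Binding("z", "test.vm.identity", ["x", "y"], pure=False)]
print([b.var for b in dead_code_eliminate(impure, outputs=["y"])])  # ['z']
```

In this model only the purity flag protects the binding, which is why a test that depends on the call mutating `y` should not declare it pure.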
Problem
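`relax.build()` (exposed as `tvm.compile`) with `relax_pipeline="default"` always resolved to `default_build_pipeline`, regardless of the target. `default_build_pipeline` does not include DLight scheduling; it is a target-agnostic lowering pipeline. On CUDA, this left TIR functions generated from ops like `Clip`/`ReLU6` without thread bindings, causing `VerifyMemory` to fail:

`Memory verification failed: Variable X is directly accessed by host memory (it is not contained in a thread environment or in the function arguments).`

Fix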
When `relax_pipeline="default"` and the target is a GPU target (`"gpu" in target.keys`), use `relax.get_default_pipeline(target)`, which includes target-aware DLight scheduling. Fall back to `default_build_pipeline` if no target-specific pipeline is registered.
CPU targets (`llvm`, `c`) continue to use `default_build_pipeline` unchanged. The CPU-specific pipeline additionally runs `FuseOps`/`FuseTIR`/`FoldConstant`, which can remove `call_pure_packed` calls whose results are unused. That is correct per pure semantics, but it is a separate concern from this fix.
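The narrowed routing rule can be sketched the same way. `FakeTarget`, `select_pipeline`, and the stub lookups are hypothetical stand-ins that do not import TVM; only the `"gpu" in target.keys` check and the fallback mirror the description above. GPU targets get the target-specific pipeline with a fallback, while CPU targets keep `default_build_pipeline`.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class FakeTarget:
    """Hypothetical target stand-in: only `kind` and `keys` are modelled."""
    kind: str
    keys: Tuple[str, ...]


def select_pipeline(
    relax_pipeline: object,
    target: Optional[FakeTarget],
    get_default_pipeline: Callable[[FakeTarget], object],
    default_build_pipeline: Callable[[], object],
) -> object:
    """Model of the narrowed rule: only GPU targets get the target pipeline."""
    if relax_pipeline != "default":
        return relax_pipeline
    if target is not None and "gpu" in target.keys:
        try:
            return get_default_pipeline(target)
        except (ValueError, AttributeError):
            pass  # no pipeline registered for this GPU target
    # CPU targets (and the no-target case) keep the generic pipeline.
    return default_build_pipeline()


def stub_get_default_pipeline(target: FakeTarget) -> str:
    # Pretend only CUDA has a registered target-specific pipeline.
    if target.kind == "cuda":
        return "cuda default pipeline (with DLight)"
    raise ValueError(f"no default pipeline for {target.kind}")


def stub_default_build_pipeline() -> str:
    return "default_build_pipeline"


cuda = FakeTarget("cuda", ("cuda", "gpu"))
llvm = FakeTarget("llvm", ("cpu",))
fake_gpu = FakeTarget("fake_gpu", ("gpu",))  # hypothetical unregistered GPU

for tgt in (cuda, llvm, fake_gpu):
    print(tgt.kind, "->", select_pipeline(
        "default", tgt, stub_get_default_pipeline, stub_default_build_pipeline))
```

Checking `keys` rather than the target kind lets any GPU-class target opt in without listing kinds explicitly, while CPU targets never reach the lookup and so keep their previous behaviour.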