
[REFACTOR][RUNTIME][CODEGEN] Backend specific target and runtime to enable cross-compile fallback#19465

Merged
tqchen merged 10 commits into apache:main from tqchen:tvm-unify-device-module
Apr 29, 2026


@tqchen tqchen commented Apr 28, 2026

Why

This refactor reshapes each backend into a self-contained src/target/<X>/
cluster (with optional src/target/<X>/llvm/ for LLVM-dependent codegen) and
introduces a per-backend fallback module that absorbs the cross-compile role
cleanly — without target/opt/ stubs, without DeviceSourceModuleNode, and
without leaking a synthetic kind() to consumers.

High-level principles

  • One directory per backend. All codegen-side files for backend <X> live
    under src/target/<X>/. Optional src/target/<X>/llvm/ subdir for files
    that require USE_LLVM at build time. Backend grouping wins over
    build-dependency grouping (the latter being upstream's target/llvm/).
  • Plugin-only runtime modules. src/runtime/<X>/<X>_module.h is deleted.
    The runtime's real <X>ModuleNode is reachable only via the FFI registry
    (ffi.Module.create.<kind>, ffi.Module.load_from_bytes.<kind>). No
    C++ API surface other than the static registrations.
  • Per-backend fallback module for cross-compile. Each <X> gets a
    <X>FallbackModuleNode in src/target/<X>/<X>_fallback_module.{h,cc}.
    Same kind() as the real module. Codegen-time only — never reachable via
    load. GetFunction errors with a backend-specific "runtime not linked"
    message; InspectSource works.
  • Codegen-side wrapper does the fallback selection. Codegen calls
    <X>ModuleCreateWithFallback(...), which tries ffi.Module.create.<kind>
    via the registry; on miss, falls through to <X>FallbackModuleCreate
    (plain C++; reachable directly from the fallback header). When USE_<X>=ON
    is in effect, the registry hit returns the real module; when
    USE_<X>=OFF, the fallback is what codegen gets. No CMake if/else
    gating; fallback always compiled.
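
The selection flow in the last bullet can be sketched in a few lines. This is a language-neutral model written in Python, not the C++ code from this PR: `REGISTRY`, `fallback_module_create`, and `module_create_with_fallback` are hypothetical names that stand in for the FFI registry, `<X>FallbackModuleCreate`, and `<X>ModuleCreateWithFallback`, and modules are modeled as plain dicts.

```python
from typing import Callable, Dict

# Hypothetical stand-in for the FFI registry: key -> factory callable.
REGISTRY: Dict[str, Callable[..., dict]] = {}


def fallback_module_create(kind, code, fmt, fmap, source):
    """Always-available plain factory (models <X>FallbackModuleCreate)."""
    return {"kind": kind, "impl": "fallback", "code": code,
            "fmt": fmt, "fmap": fmap, "source": source}


def module_create_with_fallback(kind, code, fmt, fmap, source):
    """Try the registry first; on a miss, fall through to the fallback."""
    factory = REGISTRY.get(f"ffi.Module.create.{kind}")
    if factory is not None:
        # Registry hit: the runtime for this backend is linked in.
        return factory(code, fmt, fmap, source)
    # Registry miss: the runtime is absent, so codegen gets the fallback.
    return fallback_module_create(kind, code, fmt, fmap, source)


def real_cuda_create(code, fmt, fmap, source):
    """Hypothetical 'real' factory that a linked runtime would register."""
    return {"kind": "cuda", "impl": "real", "code": code,
            "fmt": fmt, "fmap": fmap, "source": source}
```

Registering `real_cuda_create` under `ffi.Module.create.cuda` models a `USE_CUDA=ON` build; leaving the registry empty models `USE_CUDA=OFF`. In both cases the caller sees the same `kind`.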

Specific changes

New per-backend directories (codegen + fallback)

  • src/target/cuda/codegen_cuda.cc + intrin_rule_cuda.cc + fallback module pair + llvm/codegen_nvptx.cc
  • src/target/rocm/ — fallback module pair + llvm/codegen_amdgpu.cc + llvm/intrin_rule_rocm.cc
  • src/target/hexagon/ — fallback module pair + llvm/codegen_hexagon.cc + llvm/intrin_rule_hexagon.cc
  • src/target/metal/codegen_metal.cc + intrin_rule_metal.cc + fallback module pair
  • src/target/vulkan/build_vulkan.cc + the rest of target/spirv/ absorbed + fallback module pair
  • src/target/opencl/codegen_opencl.cc + intrin_rule_opencl.cc + fallback module pair
  • src/target/webgpu/codegen_webgpu.cc + fallback module pair (was WebGPUSourceModuleNode, renamed)

New fallback classes

<X>FallbackModuleNode for X in {CUDA, ROCm, Hexagon, Metal, Vulkan, OpenCL, WebGPU}. Each:

  • kind() matches the real backend
  • Stores (code or smap, fmt, fmap, source) — no driver/runtime calls
  • GetFunction errors with backend-specific "runtime not linked" message
  • InspectSource works fully
  • SaveToBytes byte-identical to real
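
A minimal model of that contract, in Python rather than the PR's C++, may help when reading the per-backend commits. The class and exception names are hypothetical, `inspect_source` is assumed to be a simple lookup in the source map, and serialization is left out.

```python
from typing import Dict


class RuntimeNotLinkedError(RuntimeError):
    """Hypothetical error type for 'runtime not linked' failures."""


class FallbackModule:
    """Holds codegen output only; makes no driver or runtime calls."""

    def __init__(self, kind: str, code: bytes, fmt: str,
                 fmap: Dict[str, dict], source: Dict[str, str]):
        self._kind = kind
        self._code = code
        self._fmt = fmt
        self._fmap = fmap
        self._source = source

    def kind(self) -> str:
        # Same kind string as the real backend module.
        return self._kind

    def get_function(self, name: str):
        # The fallback cannot launch kernels; fail with a backend-specific message.
        raise RuntimeNotLinkedError(
            f"{self._kind} runtime not linked; cannot get function '{name}'")

    def inspect_source(self, fmt: str = "") -> str:
        # Assumption: look up the requested format, empty string if absent.
        return self._source.get(fmt, "")
```

Consumers that only export or inspect the module never reach `get_function`, so they cannot tell the fallback from the real module.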


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a unified DeviceModuleCreate interface and a DeviceGenericModule carrier to support cross-compilation and lazy JIT realization of device modules across various backends, including CUDA, Hexagon, Metal, OpenCL, ROCM, Vulkan, and WebGPU. This architecture allows the system to handle device code even when the corresponding runtime is not present in the current environment, effectively replacing several legacy "off" stub files. A critical issue was identified in the thread-safe initialization of DeviceGenericModuleNode, where the double-checked locking pattern is currently racy and needs to be corrected to prevent potential data races.

Comment thread src/runtime/device_module.cc Outdated
tqchen added a commit to tqchen/tvm that referenced this pull request Apr 28, 2026
…penCL InspectSource fix

Several per-user simplifications motivated by post-review feedback on PR apache#19465.

WebGPU runs in tvmjs and has no C++ runtime factory; routing it through
DeviceGenericModule added indirection with no benefit.  Restore
WebGPUSourceModuleNode in codegen_webgpu.cc so BuildWebGPU constructs it
directly, and add a comment explaining the exemption.

Source-only backends (CUDA fmt=="cuda", OpenCL fmt=="cl") no longer populate
a redundant source map entry — code_ IS the human-readable source.
DeviceGenericModuleNode::InspectSource now falls back to code_ when the
format matches fmt_ or when format is empty (no-arg default), fixing the
test_opencl_erf regression where inspect_source() returned "" on CPU CI.
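
The lookup rule described above can be modeled as follows. This is a Python sketch of the described behavior, not the C++ method. It assumes the source map is consulted first and that `code_` holds UTF-8 text for source-only formats.

```python
from typing import Dict


def inspect_source(code: bytes, fmt: str, source: Dict[str, str],
                   requested: str = "") -> str:
    """Return source for `requested`, falling back to `code` when it is the source."""
    # Explicit auxiliary entries win (assumed ordering).
    if requested in source:
        return source[requested]
    # No-arg default, or a request for the module's own format:
    # the payload itself is the human-readable source.
    if requested == "" or requested == fmt:
        return code.decode("utf-8")
    # Unknown format: nothing to show.
    return ""
```

With an OpenCL payload and an empty source map, both the no-argument call and a request for `"cl"` return the kernel text, which is the case the `test_opencl_erf` regression exercised.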

Drop realize_attempted_: realized_.has_value() is the correct guard; on throw
the next caller retries cleanly without a stale error cache.
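
A small model of that guard, assuming the simplest race-free form (always take the lock) instead of double-checked locking. The class name is hypothetical and `None` stands in for an empty optional.

```python
import threading
from typing import Callable, Optional


class LazyRealized:
    """Realize a value at most once on success; retry after a failure."""

    def __init__(self, realize: Callable[[], object]):
        self._realize = realize
        self._realized: Optional[object] = None  # None models an empty optional
        self._lock = threading.Lock()

    def get(self) -> object:
        # Every caller takes the lock, so there is no unsynchronized read
        # of the realized state.
        with self._lock:
            if self._realized is None:
                # If realize() raises, _realized stays None and the next
                # caller simply tries again; no error is cached.
                self._realized = self._realize()
            return self._realized
```

Because the only state is "has a value or not", a separate "attempted" flag is unnecessary, which matches the reasoning for dropping `realize_attempted_`.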

Add Serializer<ffi::Bytes> to serializer.h so ffi::Map<String,Bytes> can be
written/read generically via stream.Write(source_) in DeviceGenericModuleNode.

Drop version string from Metal smap serialization (SerializeMetalSmap,
SaveToBytes, LoadFromBytes); the format is implicit in fmt.

Fix ffi::String constructed from ffi::Bytes via std::string intermediate in
cuda_module.cc, rocm_module.cc, hexagon_module.cc — use ffi::String(data, size)
directly.
@tqchen tqchen changed the title from "[REFACTOR][RUNTIME][CODEGEN] Unify device module factory signatures; add DeviceGenericModule" to "[REFACTOR][RUNTIME][CODEGEN] Backend specific target and runtime to enable cross-compile fallback" on Apr 29, 2026
tqchen added 7 commits April 29, 2026 13:49
…allback design (CUDA)

This commit reshapes the CUDA backend into a self-contained
`src/target/cuda/` cluster and introduces a per-backend fallback module
design that absorbs the cross-compile role cleanly — without
`target/opt/` stubs, without leaking a synthetic `kind()` to consumers.

The runtime `CUDAModuleNode` becomes plugin-only: its header is removed
and the only access is via the FFI registry keys
`ffi.Module.create.cuda` and `ffi.Module.load_from_bytes.cuda`.  The
codegen-facing factory `CUDAModuleCreateWithFallback` lives in
`src/target/cuda/cuda_fallback_module.h`; it tries the registry and on
miss (or when `TVM_COMPILE_FORCE_FALLBACK` is set) constructs a
`CUDAFallbackModuleNode` that mirrors the real module's `kind()` and
`SaveToBytes` byte-for-byte.

The `fmt=="cuda"` JIT path moves into `CUDAModuleNode::JitCompileFromSource`,
invoked from BOTH the create-lambda (when codegen passes raw source) AND
`LoadFromBytes` (when the saved fmt is "cuda" — the cross-compile receiver
path).  The fallback never JITs.
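
The two JIT entry points can be illustrated with a toy model. Everything here is hypothetical: `jit_compile_from_source` only tags the bytes instead of invoking a compiler, the post-JIT format is assumed to be `"ptx"`, and the wire format is a single NUL-separated pair chosen for brevity.

```python
def jit_compile_from_source(source: bytes) -> bytes:
    """Stand-in for a real JIT step; tags the input so the effect is visible."""
    return b"PTX:" + source


class RealModule:
    def __init__(self, code: bytes, fmt: str):
        # Shared by both entry points: raw source is compiled on arrival.
        if fmt == "cuda":
            code, fmt = jit_compile_from_source(code), "ptx"
        self.code, self.fmt = code, fmt

    @classmethod
    def create(cls, code: bytes, fmt: str) -> "RealModule":
        # Path 1: codegen hands over raw source directly.
        return cls(code, fmt)

    @classmethod
    def load_from_bytes(cls, blob: bytes) -> "RealModule":
        # Path 2: a module saved elsewhere (e.g. by the fallback) is loaded.
        fmt, _, code = blob.partition(b"\0")
        return cls(code, fmt.decode("utf-8"))


class FallbackModule:
    def __init__(self, code: bytes, fmt: str):
        self.code, self.fmt = code, fmt  # stored untouched; never JITs

    def save_to_bytes(self) -> bytes:
        return self.fmt.encode("utf-8") + b"\0" + self.code
```

A source module saved by `FallbackModule` on a host without the runtime ends up in the same state after `RealModule.load_from_bytes` as one built with `RealModule.create` on the device host.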

Main changes:

- Add `src/target/cuda/{cuda_fallback_module.h,cuda_fallback_module.cc}`
- Move `target/source/{codegen_cuda,intrin_rule_cuda,ptx,literal/cuda_*}`
  into `target/cuda/`; move `target/llvm/codegen_nvptx.cc` into
  `target/cuda/llvm/`
- Absorb `BuildCUDA` from `target/opt/build_cuda_on.cc` into
  `target/cuda/codegen_cuda.cc`; delete `target/opt/build_cuda_*.cc`
- Plugin-only runtime: delete `src/runtime/cuda/cuda_module.h`; rewrite
  `cuda_module.cc` with unified signature `(Bytes code, String fmt,
  Map fmap, Map<String,String> source)`, source-not-serialized save/load,
  and `JitCompileFromSource` static helper
- `tvm_callback_cuda_compile` Python signature simplified from
  `(code, target)` to `(code,)`; current target is fetched inside
  `compile_cuda` via `Target.current(allow_none=True)`
- Add `Serializer<ffi::Bytes>` so `stream.Write(code)` works without a
  `std::string` round-trip
- Add `runtime.LoadMetaDataFromJSON` global func; rewire
  `_create_cuda_module` (tirx external_kernel) to construct in-memory
  via `ffi.Module.create.cuda`, eliminating the load_from_file path
- Update `cmake/modules/CUDA.cmake` to drop opt stubs; update
  `cmake/modules/LLVM.cmake` to glob `target/*/llvm/`; update
  `CMakeLists.txt` CODEGEN_SRCS to include `target/cuda/`
- Replace ad-hoc `test_device_generic_module.py` with
  `test_cuda_fallback_module.py` exercising the env-var-forced fallback
  through normal compile + export + load + run

Mirror the per-backend pattern landed for CUDA: cluster ROCm codegen
under src/target/rocm/, introduce ROCmFallbackModuleNode for codegen
on USE_ROCM=OFF hosts, and make src/runtime/rocm/ plugin-only.  The
fallback's saved bytes are byte-identical to ROCMModuleNode's, so a
USE_ROCM=ON receiver loads them transparently.

- Move src/target/llvm/{codegen_amdgpu,intrin_rule_rocm}.cc into
  src/target/rocm/llvm/ (history preserved via git mv).
- Add src/target/rocm/rocm_fallback_module.{h,cc} exposing
  ROCmModuleCreateWithFallback (wrapper) and ROCmFallbackModuleCreate.
- Refactor src/runtime/rocm/rocm_module.cc to the unified factory
  signature (Bytes code, String fmt, Map fmap, Map<String,String>
  source); drop WriteToFile + LoadFile registrations; bespoke
  hip_source_/assembly_ fields collapse into the source map (keys
  "hip" / "asm").
- Delete src/runtime/rocm/rocm_module.h (plugin-only rule) and
  src/target/opt/build_rocm_off.cc (replaced by the fallback).
- BuildAMDGPU now calls target::ROCmModuleCreateWithFallback with the
  hsaco bytes + LL/asm in the source map.
- LLVM.cmake globs src/target/rocm/llvm/; CMakeLists.txt CODEGEN_SRCS
  picks up src/target/rocm/.

Mirror the per-backend pattern: cluster Hexagon codegen under
src/target/hexagon/, introduce HexagonFallbackModuleNode for codegen on
USE_HEXAGON=OFF hosts, and make src/runtime/hexagon/ plugin-only.  The
fallback's saved bytes are byte-identical to HexagonModuleNode's, so a
USE_HEXAGON=ON receiver loads them transparently.

Hexagon is the only backend whose source map carries binary auxiliaries
(object code, bitcode) alongside text (asm, IR), so its source map type
is `Map<String, Variant<String, Bytes>>` — Variant lets each upstream
blob be passed in its natural form.

The `code` payload migrates from a filename string to bytes: BuildHexagon
slurps the linked .so file into memory and the unified factory takes the
bytes directly.  Receiver-side materialization to a tempfile + dlopen is
left as a follow-up when Hexagon execution moves into HexagonModuleNode
(today execution is via a separate RPC-based runtime path).

- Move src/target/llvm/{codegen_hexagon,intrin_rule_hexagon}.cc into
  src/target/hexagon/llvm/ (history preserved via git mv).
- Add src/target/hexagon/hexagon_fallback_module.{h,cc} exposing
  HexagonModuleCreateWithFallback (wrapper) and HexagonFallbackModuleCreate.
- Refactor src/runtime/hexagon/hexagon_module.cc to the unified factory
  signature (Bytes code, String fmt, Map fmap, Map<String, Variant<String,
  Bytes>> source); drop bespoke asm_/obj_/ir_/bc_ fields and the
  WriteToFile registration; collapse auxiliaries into the source map.
- Delete src/runtime/hexagon/hexagon_module.h (plugin-only rule) and
  src/target/opt/build_hexagon_off.cc (replaced by the fallback).
- BuildHexagon now reads the .so file as bytes and calls
  target::HexagonModuleCreateWithFallback with the unified payload.
- LLVM.cmake globs src/target/hexagon/llvm/; CMakeLists.txt CODEGEN_SRCS
  picks up src/target/hexagon/.

Mirror the per-backend pattern: cluster Metal codegen under
src/target/metal/, introduce MetalFallbackModuleNode for codegen on
USE_METAL=OFF hosts (typical Linux CI), and make src/runtime/metal/
plugin-only.  The fallback's saved bytes are byte-identical to
MetalModuleNode's, so a USE_METAL=ON receiver (macOS) loads them
transparently and JIT-compiles MSL via MTLDevice::newLibraryWithSource
on first GetFunction.

Metal is the first multi-shader backend in this PR — its per-kernel
payload is Map<String, Bytes>.  The text-vs-binary distinction lives in
fmt ("metal" source / "metallib" compiled), not in the container; this
is the uniform shape used by all multi-shader backends (Metal, Vulkan,
WebGPU).  The auxiliary in-memory source map is Map<String, String>
(text-only — MSL aggregated source dump).

The save/load format simplifies from upstream's 4 fields
([version][smap][fmap][fmt]) to the cross-backend uniform 3 fields
([fmt][fmap][smap]) — the version field was reserved-for-future and
unused.  Both real and fallback now produce the same shape.
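
To make the 3-field shape concrete, here is a sketch of a `[fmt][fmap][smap]` writer and reader. The encoding (little-endian u64 length prefixes, sorted keys, `fmap` values treated as opaque pre-encoded bytes) is an assumption for illustration and is not TVM's actual wire format.

```python
import struct
from typing import Dict, Tuple


def _put(blob: bytes) -> bytes:
    """Length-prefix a byte string with a little-endian u64."""
    return struct.pack("<Q", len(blob)) + blob


def _put_map(mapping: Dict[str, bytes]) -> bytes:
    # Sorted keys keep the output deterministic, so two independent
    # writers (real and fallback) agree byte-for-byte.
    parts = [struct.pack("<Q", len(mapping))]
    for key in sorted(mapping):
        parts.append(_put(key.encode("utf-8")))
        parts.append(_put(mapping[key]))
    return b"".join(parts)


def save_to_bytes(fmt: str, fmap: Dict[str, bytes], smap: Dict[str, bytes]) -> bytes:
    """Write the three fields in order: fmt, fmap, smap."""
    return _put(fmt.encode("utf-8")) + _put_map(fmap) + _put_map(smap)


class _Reader:
    def __init__(self, blob: bytes):
        self.blob, self.pos = blob, 0

    def u64(self) -> int:
        (value,) = struct.unpack_from("<Q", self.blob, self.pos)
        self.pos += 8
        return value

    def chunk(self) -> bytes:
        size = self.u64()
        out = self.blob[self.pos:self.pos + size]
        self.pos += size
        return out

    def mapping(self) -> Dict[str, bytes]:
        out = {}
        for _ in range(self.u64()):
            key = self.chunk().decode("utf-8")
            out[key] = self.chunk()
        return out


def load_from_bytes(blob: bytes) -> Tuple[str, Dict[str, bytes], Dict[str, bytes]]:
    reader = _Reader(blob)
    fmt = reader.chunk().decode("utf-8")
    fmap = reader.mapping()
    smap = reader.mapping()
    return fmt, fmap, smap
```

Since both sides share one writer, the "byte-identical" property reduces to passing the same `(fmt, fmap, smap)` triple.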

- Move src/target/source/{codegen_metal,intrin_rule_metal}.{cc,h} into
  src/target/metal/ (history preserved via git mv).
- Add src/target/metal/metal_fallback_module.{h,cc} exposing
  MetalModuleCreateWithFallback (wrapper) and MetalFallbackModuleCreate.
- Refactor src/runtime/metal/metal_module.mm to the unified factory
  signature (Map<String, Bytes> smap, String fmt, Map fmap,
  Map<String, String> source); 3-field SaveToBytes; drop the unused
  runtime.module.create_metal_module FFI registration; drop the
  WriteToFile override; drop the version field.
- Delete src/runtime/metal/metal_module.h (plugin-only rule) and
  src/target/opt/build_metal_off.cc (replaced by the fallback).
- BuildMetal builds an ffi::Map<String, Bytes> smap and calls
  target::MetalModuleCreateWithFallback with the unified payload;
  aggregated MSL source dump goes into the in-memory source map keyed
  by "metal".
- CMakeLists.txt CODEGEN_SRCS picks up src/target/metal/.

Cluster WebGPU codegen under src/target/webgpu/ and rename the existing
WebGPUSourceModuleNode to WebGPUFallbackModuleNode under the per-backend
naming convention.  WebGPU is unique among the backends in that it has
no native C++ runtime — the real receiver is the wasm runtime in
web/emcc/webgpu_runtime.cc, which registers only the load-from-bytes
side of the FFI registry.  So WebGPUFallbackModuleNode IS the canonical
C++-side module; the "fallback" name is kept for uniformity and does not
indicate an absent runtime.

The on-disk byte format is preserved as 2 fields [fmap][smap] to match
what the wasm-side WebGPUModuleLoadFromBytes already reads (no fmt
field — WebGPU is single-format "wgsl" today).  smap is now
Map<String, Bytes> (uniform multi-shader shape) but its wire format is
byte-identical to the previous unordered_map<string, string>; no
wasm-side change required.

- Move src/target/source/{codegen_webgpu,intrin_rule_webgpu}.{cc,h} into
  src/target/webgpu/ (history preserved via git mv); update include
  guard + relative include paths.
- Add src/target/webgpu/webgpu_fallback_module.{h,cc} exposing
  WebGPUModuleCreateWithFallback (wrapper) and WebGPUFallbackModuleCreate.
  The wrapper tries ffi.Module.create.webgpu in the registry (always
  absent on the C++ side today) and falls through to the fallback.
- Refactor BuildWebGPU to construct ffi::Map<String, Bytes> smap and
  call target::WebGPUModuleCreateWithFallback; aggregated WGSL source
  dump goes into the in-memory source map keyed by "wgsl" (never
  serialized).  The inline WebGPUSourceModuleNode class body is removed.
- CMakeLists.txt CODEGEN_SRCS picks up src/target/webgpu/.
…o target/vulkan/

Following the per-backend cluster + plugin-only header rule (commits 1-4 + 7 of
this series), Vulkan migrates to the unified fallback pattern.  The
SPIR-V tooling, previously shared across Vulkan and OpenCL via
src/target/spirv/, is absorbed into src/target/vulkan/ — SPIR-V is now
Vulkan-internal tooling, decoupling it from the OpenCL path which is
removed entirely in the next commit.

Codegen-side (src/target/vulkan/):
- All of src/target/spirv/* moved here via git mv (build_vulkan.cc,
  codegen_spirv.{cc,h}, intrin_rule_spirv.cc, ir_builder.{cc,h},
  spirv_support.{cc,h}, spirv_utils.{cc,h}).  The src/target/spirv/
  directory is removed.
- New vulkan_fallback_module.{h,cc} defining VulkanFallbackModuleNode
  (kind="vulkan", multi-shader smap of packed SPIRVShader bytes).
- VulkanFallbackModuleCreate plain C++ factory + inline
  VulkanModuleCreateWithFallback wrapper (registry hit /
  TVM_COMPILE_FORCE_FALLBACK env / fallback fallthrough).
- BuildSPIRV in build_vulkan.cc switched to the unified factory shape:
  (Map<String, Bytes> smap, String fmt, Map fmap, Map<String,String>
  source).  Each smap value is a self-packed SPIRVShader (flag + data
  vector), serialized via support::BytesOutStream.
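
The "flag + data vector" packing of each smap value can be pictured as below. The layout (u32 flag, u64 word count, then u32 words, all little-endian) is an assumed illustration and may not match what `support::BytesOutStream` emits.

```python
import struct
from typing import List, Tuple

HEADER = "<IQ"                      # u32 flag, u64 word count (assumed layout)
HEADER_SIZE = struct.calcsize(HEADER)  # 12 bytes with no padding


def pack_shader(flag: int, words: List[int]) -> bytes:
    """Pack one shader as a self-contained byte string."""
    header = struct.pack(HEADER, flag, len(words))
    body = struct.pack(f"<{len(words)}I", *words)
    return header + body


def unpack_shader(blob: bytes) -> Tuple[int, List[int]]:
    """Inverse of pack_shader."""
    flag, count = struct.unpack_from(HEADER, blob, 0)
    words = list(struct.unpack_from(f"<{count}I", blob, HEADER_SIZE))
    return flag, words
```

Because each value carries its own header, the outer `Map<String, Bytes>` can stay format-agnostic and the same container shape serves Metal, Vulkan, and WebGPU.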

Runtime-side (src/runtime/vulkan/):
- vulkan_module.cc: refactored to plugin-only.  ffi.Module.create.vulkan
  registered with the unified factory signature; ffi.Module.load_from_bytes.vulkan
  reconstructs a real VulkanModuleNode from the cross-backend uniform
  3-field [fmt][fmap][smap] byte stream.
- vulkan_module.h DELETED (plugin-only invariant — no exported header).
- vulkan_wrapped_func.h: VulkanModuleNode constructor and storage
  refactored: holds an internal_smap_ (deserialized SPIRVShader, used
  by GetPipeline) + smap_ (Map<String, Bytes>, kept for byte-identical
  SaveToBytes vs the fallback) + Map<String, String> source map.
- vulkan_wrapped_func.cc: WriteToFile and the bespoke 4-byte magic
  prefix dropped.  SaveToBytes writes the cross-backend 3-field shape.
  InspectSource looks up by format key with "spv" fallback.

CMake:
- cmake/modules/Vulkan.cmake: COMPILER_VULKAN_SRCS now globs
  src/target/vulkan/{build_vulkan,codegen_spirv,intrin_rule_spirv,
  ir_builder,spirv_support,spirv_utils}.cc explicitly (only when
  USE_VULKAN=ON, since these need libspirv tooling).
- cmake/modules/OpenCL.cmake: drops the dead reference to the moved
  src/target/spirv/spirv_utils.cc.  The fallback codegen path in
  codegen_opencl.cc still calls LowerToSPIRV via the new
  ../vulkan/spirv_utils.h include — that path is removed in the next
  commit alongside opencl_module_spirv.cc.
- CMakeLists.txt: CODEGEN_SRCS adds src/target/vulkan/vulkan_fallback_module.cc
  (always compiled, no SPIR-V deps).

Other:
- src/target/source/codegen_opencl.cc: include path adjusted from
  ../spirv/spirv_utils.h to ../vulkan/spirv_utils.h.  The whole file
  moves out of source/ in the next commit.
- src/runtime/spirv/spirv_shader.h shim is RETAINED in this commit;
  src/runtime/opencl/opencl_module_spirv.cc still includes it via
  "../spirv/spirv_shader.h".  The shim is deleted alongside
  opencl_module_spirv.cc in the next commit.
- New test tests/python/runtime/test_vulkan_fallback_module.py
  exercises the codegen→fallback→export pipeline (skipped at runtime-
  precondition step on USE_VULKAN=ON).

Verification:
- Full clean ninja build EXIT=0 (not re-run after format-only changes).
- ./build/cpptest: 118/118.
- tests/python/runtime/: 75 passed, 1 skipped (precondition test
  expectedly skipped on USE_VULKAN=ON).
- tests/python/all-platform-minimal-test/: 65 passed, 77 skipped.
- tests/python/codegen/test_target_codegen_cuda.py: 50 passed, 6
  skipped.
- tests/python/codegen/test_target_codegen_opencl.py::test_opencl_erf:
  passed.
- tests/python/codegen/test_target_codegen_vulkan.py: 84 passed, 36
  skipped, 3 xfailed; 8 [nvptx] failures pre-exist on this LLVM 15
  container (sm_89 not recognized — unrelated to this commit).
- pre-commit: all hooks passed.
…+ target/opencl/

Following the per-backend cluster + plugin-only header rule (commits 1-5 + 7),
OpenCL migrates to the unified fallback pattern.  This is the final
backend in the series and also the OpenCL-SPIRV decoupling step:
SPIR-V tooling now lives only under src/target/vulkan/ (absorbed in
the previous commit), and OpenCL becomes purely source-based
(fmt=="cl" and the binary formats xclbin/awsxclbin/aocx).

Codegen-side (src/target/opencl/):
- src/target/source/{codegen_opencl.cc,codegen_opencl.h,intrin_rule_opencl.cc}
  moved here via git mv.
- New opencl_fallback_module.{h,cc} defining OpenCLFallbackModuleNode
  (kind="opencl", single-binary `code` of ffi::Bytes).
- OpenCLFallbackModuleCreate plain C++ factory + inline
  OpenCLModuleCreateWithFallback wrapper (registry hit /
  TVM_COMPILE_FORCE_FALLBACK env / fallback fallthrough).
- BuildOpenCL switched to the unified factory shape: (Bytes code,
  String fmt, Map fmap, Map<String,String> source).  The TVM_ENABLE_SPIRV
  conditional branch that called LowerToSPIRV + the legacy
  OpenCLSPIRVModuleNode is REMOVED — OpenCL no longer carries a
  SPIR-V path.

Runtime-side (src/runtime/opencl/):
- opencl_module.cc: refactored to plugin-only.
  ffi.Module.create.opencl registered with the unified factory
  signature; ffi.Module.load_from_bytes.opencl reconstructs a real
  OpenCLModuleNode from the cross-backend uniform 3-field
  [fmt][fmap][code] byte stream.  WriteToFile and the disk
  load_from_file.cl/clbin registrations dropped.
- opencl_common.h: OpenCLModuleNode constructor refactored to
  (ffi::Bytes code, ffi::String fmt, fmap, ffi::Map<ffi::String,
  ffi::String> source); WriteToFile final declaration removed.
- opencl_module.h DELETED (plugin-only invariant — no exported header).
- opencl_module_spirv.cc DELETED (SPIR-V path removed entirely).
- The `ffi.Module.create.opencl.spirv` registration goes away with
  opencl_module_spirv.cc.

Other deletions:
- src/target/opt/build_opencl_off.cc (replaced by the always-compiled
  fallback; the OpenCL-SPIRV stub it carried is no longer needed
  since SPIR-V is no longer an OpenCL path).
- src/runtime/spirv/spirv_shader.h (deprecated shim; its only
  remaining consumer was opencl_module_spirv.cc which is now gone).

CMake:
- cmake/modules/OpenCL.cmake: drops the USE_OPENCL=OFF clause that
  appended target/opt/build_opencl_off.cc to COMPILER_SRCS.
- CMakeLists.txt: CODEGEN_SRCS adds src/target/opencl/*.cc.

Test:
- New tests/python/runtime/test_opencl_fallback_module.py exercises
  the codegen→fallback→export pipeline with TVM_COMPILE_FORCE_FALLBACK=1
  on USE_OPENCL=ON hosts and verifies registry-precondition on
  USE_OPENCL=OFF hosts.
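
The env-var-forced path used by this test can be modeled as follows. This is a sketch, not the test file itself: `pick_factory` and `force_fallback` are hypothetical helpers, the registry is a plain dict, and treating any non-empty value of `TVM_COMPILE_FORCE_FALLBACK` as "set" is an assumption.

```python
import os
from contextlib import contextmanager
from typing import Dict

ENV_VAR = "TVM_COMPILE_FORCE_FALLBACK"


def pick_factory(kind: str, registry: Dict[str, object]) -> str:
    """Return which factory the wrapper would use: 'real' or 'fallback'."""
    if os.environ.get(ENV_VAR):
        # Forced fallback wins even when the real factory is registered.
        return "fallback"
    return "real" if f"ffi.Module.create.{kind}" in registry else "fallback"


@contextmanager
def force_fallback():
    """Temporarily set the env var and restore the previous state on exit."""
    previous = os.environ.get(ENV_VAR)
    os.environ[ENV_VAR] = "1"
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = previous
```

Wrapping only the compile step in `force_fallback()` lets a single `USE_OPENCL=ON` host exercise the fallback export path and then load the result with the real runtime.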

Verification:
- Full incremental ninja build EXIT=0 (post-cmake-reconfigure rebuild).
- ./build/cpptest: 118/118.
- tests/python/runtime/: 76 passed, 2 skipped (Vulkan + OpenCL
  precondition tests expectedly skipped on USE_X=ON build).
- tests/python/all-platform-minimal-test/: 65 passed, 77 skipped.
- tests/python/codegen/test_target_codegen_cuda.py: 50 passed, 6 skipped.
- tests/python/codegen/test_target_codegen_opencl.py: 7 passed (full
  file, all surviving tests).
- tests/python/codegen/test_target_codegen_vulkan.py: 84 passed, 36
  skipped, 3 xfailed; 8 [nvptx] failures pre-exist on this LLVM 15
  container (sm_89 not recognized — unrelated to this commit).
- pre-commit: all hooks passed.
@tqchen tqchen force-pushed the tvm-unify-device-module branch from 164923e to c534a14 Compare April 29, 2026 16:09
tqchen added 3 commits April 29, 2026 17:14
Trim the 7 src/target/<backend>/<backend>_fallback_module.h doc
comments to a single uniform paragraph: introduction of
`<X>ModuleCreateWithFallback`, the registry-lookup fallback flow,
and the rationale that codegen must succeed when the device
runtime is not linked. Drops the USE_<X>=ON/OFF mentions, the
TVM_COMPILE_FORCE_FALLBACK env var note, the saved-bytes
byte-identical claim, and the per-backend extra paragraphs
(Hexagon Variant source-map, Metal multi-shader, Vulkan SPIR-V
shader, OpenCL SPIRV-removal note, WebGPU canonical-module note,
ROCm no-source-JIT note) — those details belong in commit
messages and the runtime-side files, not in the codegen-facing
header.
…le headers

Append "This setup is helpful for cross compilation where we compile
on one env and run on another." to each of the 7 codegen-facing
fallback module header doc comments.  The cross-compilation use case
is the load-bearing motivation for the fallback design and was
implicit in the prior wording; making it explicit helps readers
understand why per-backend fallback modules exist at all.
…gen tests

Replace the 7 tests/python/runtime/test_<backend>_fallback_module.py
files with one export+load test per backend blended into the
existing tests/python/codegen/test_target_codegen_<backend>.py.
Tests reuse the file's existing fixture / decorator / target-string
patterns. Hexagon and WebGPU intentionally have no fallback test
for now — their standalone files are deleted with no replacement.
@tqchen tqchen merged commit 6e8f77d into apache:main Apr 29, 2026
8 checks passed