[REFACTOR][RUNTIME][CODEGEN] Backend specific target and runtime to enable cross-compile fallback #19465
Merged
tqchen merged 10 commits into apache:main on Apr 29, 2026
Code Review
This pull request introduces a unified DeviceModuleCreate interface and a DeviceGenericModule carrier to support cross-compilation and lazy JIT realization of device modules across various backends, including CUDA, Hexagon, Metal, OpenCL, ROCM, Vulkan, and WebGPU. This architecture allows the system to handle device code even when the corresponding runtime is not present in the current environment, effectively replacing several legacy "off" stub files. A critical issue was identified in the thread-safe initialization of DeviceGenericModuleNode, where the double-checked locking pattern is currently racy and needs to be corrected to prevent potential data races.
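The race flagged in the review is the usual hazard of double-checked locking: a fast-path read of the "already realized" state that is not synchronized with the write. A minimal Python sketch of the safe shape follows, assuming a hypothetical `LazyModule` class and `factory` callable (neither is a TVM name). It takes the lock before every check. In C++ the same effect is usually reached with `std::call_once` or an atomic flag with acquire/release ordering.

```python
import threading
from typing import Callable, Optional


class LazyModule:
    """Hypothetical stand-in for a lazily realized device module."""

    def __init__(self, factory: Callable[[], object]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._realized: Optional[object] = None

    def realize(self) -> object:
        # Take the lock before reading the cached value, so no thread can
        # act on a stale or half-published result.
        with self._lock:
            if self._realized is None:
                # If the factory raises, nothing is cached and the next
                # caller simply retries.
                self._realized = self._factory()
            return self._realized


calls = []


def factory() -> object:
    calls.append(1)
    return object()


module = LazyModule(factory)
threads = [threading.Thread(target=module.realize) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("factory invocations:", len(calls))
```

Holding the lock on every call costs a little on the hot path, but it leaves no unsynchronized read to reason about.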
tqchen added a commit to tqchen/tvm that referenced this pull request on Apr 28, 2026
…penCL InspectSource fix
Several per-user simplifications motivated by post-review feedback on PR apache#19465.
WebGPU runs in tvmjs and has no C++ runtime factory; routing it through DeviceGenericModule added indirection with no benefit. Restore WebGPUSourceModuleNode in codegen_webgpu.cc so BuildWebGPU constructs it directly, and add a comment explaining the exemption.
Source-only backends (CUDA fmt=="cuda", OpenCL fmt=="cl") no longer populate a redundant source map entry: code_ IS the human-readable source. DeviceGenericModuleNode::InspectSource now falls back to code_ when the format matches fmt_ or when format is empty (no-arg default), fixing the test_opencl_erf regression where inspect_source() returned "" on CPU CI.
Drop realize_attempted_: realized_.has_value() is the correct guard; on throw the next caller retries cleanly without a stale error cache.
Add Serializer<ffi::Bytes> to serializer.h so ffi::Map<String,Bytes> can be written/read generically via stream.Write(source_) in DeviceGenericModuleNode.
Drop version string from Metal smap serialization (SerializeMetalSmap, SaveToBytes, LoadFromBytes); the format is implicit in fmt.
Fix ffi::String constructed from ffi::Bytes via std::string intermediate in cuda_module.cc, rocm_module.cc, hexagon_module.cc: use ffi::String(data, size) directly.
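The InspectSource fallback in this commit can be modelled in a few lines. The sketch below is a Python stand-in for the C++ logic, not the actual implementation. The function name `inspect_source` and its parameters are hypothetical. Checking the auxiliary source map before falling back to the code payload is an assumption read from the phrase "falls back".

```python
from typing import Dict


def inspect_source(code: bytes, fmt: str, source: Dict[str, str],
                   format: str = "") -> str:
    """Hypothetical model of the lookup order described in the commit."""
    # Assumption: an explicit entry in the auxiliary source map wins.
    if format in source:
        return source[format]
    # Source-only backends keep readable text in `code` itself, so fall
    # back to it when the request names no format or names the module's own.
    if format == "" or format == fmt:
        return code.decode("utf-8", errors="replace")
    # Unknown format: nothing to show.
    return ""


kernel = "__kernel void f() {}"
print(inspect_source(kernel.encode("utf-8"), "cl", {}))
```

With an empty source map and the no-argument default, the call returns the kernel text instead of an empty string, which is the behaviour the test_opencl_erf fix restores.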
tlopex approved these changes on Apr 29, 2026
…allback design (CUDA)
This commit reshapes the CUDA backend into a self-contained
`src/target/cuda/` cluster and introduces a per-backend fallback module
design that absorbs the cross-compile role cleanly — without
`target/opt/` stubs, without leaking a synthetic `kind()` to consumers.
The runtime `CUDAModuleNode` becomes plugin-only: its header is removed
and the only access is via the FFI registry keys
`ffi.Module.create.cuda` and `ffi.Module.load_from_bytes.cuda`. The
codegen-facing factory `CUDAModuleCreateWithFallback` lives in
`src/target/cuda/cuda_fallback_module.h`; it tries the registry and on
miss (or when `TVM_COMPILE_FORCE_FALLBACK` is set) constructs a
`CUDAFallbackModuleNode` that mirrors the real module's `kind()` and
`SaveToBytes` byte-for-byte.
The `fmt=="cuda"` JIT path moves into `CUDAModuleNode::JitCompileFromSource`,
invoked from BOTH the create-lambda (when codegen passes raw source) AND
`LoadFromBytes` (when the saved fmt is "cuda" — the cross-compile receiver
path). The fallback never JITs.
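The create-with-fallback flow described above can be sketched as a small Python model. `REGISTRY`, `FallbackModule`, and `create_with_fallback` are hypothetical stand-ins, not TVM APIs. Treating any non-empty value of `TVM_COMPILE_FORCE_FALLBACK` as "set" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, Mapping, Optional

# Hypothetical stand-in for the FFI registry; not a TVM API.
REGISTRY: Dict[str, Callable[..., object]] = {}


@dataclass
class FallbackModule:
    """Carries the payload only; it never touches a device driver."""
    kind: str
    code: bytes
    fmt: str
    fmap: dict
    source: dict


def create_with_fallback(kind: str, code: bytes, fmt: str, fmap: dict,
                         source: dict,
                         environ: Optional[Mapping[str, str]] = None) -> object:
    env = os.environ if environ is None else environ
    # Assumption: any non-empty value counts as "set".
    forced = bool(env.get("TVM_COMPILE_FORCE_FALLBACK"))
    factory = REGISTRY.get(f"ffi.Module.create.{kind}")
    if factory is not None and not forced:
        # Registry hit: the real runtime module is linked in.
        return factory(code, fmt, fmap, source)
    # Registry miss or forced fallback: same kind, payload only.
    return FallbackModule(kind, code, fmt, fmap, source)


module = create_with_fallback("cuda", b"...", "ptx", {}, {}, environ={})
print(type(module).__name__, module.kind)
```

Because the fallback reports the same kind as the real module, callers do not need to know which of the two they received.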
Main changes:
- Add `src/target/cuda/{cuda_fallback_module.h,cuda_fallback_module.cc}`
- Move `target/source/{codegen_cuda,intrin_rule_cuda,ptx,literal/cuda_*}`
into `target/cuda/`; move `target/llvm/codegen_nvptx.cc` into
`target/cuda/llvm/`
- Absorb `BuildCUDA` from `target/opt/build_cuda_on.cc` into
`target/cuda/codegen_cuda.cc`; delete `target/opt/build_cuda_*.cc`
- Plugin-only runtime: delete `src/runtime/cuda/cuda_module.h`; rewrite
`cuda_module.cc` with unified signature `(Bytes code, String fmt,
Map fmap, Map<String,String> source)`, source-not-serialized save/load,
and `JitCompileFromSource` static helper
- `tvm_callback_cuda_compile` Python signature simplified from
`(code, target)` to `(code,)`; current target is fetched inside
`compile_cuda` via `Target.current(allow_none=True)`
- Add `Serializer<ffi::Bytes>` so `stream.Write(code)` works without a
`std::string` round-trip
- Add `runtime.LoadMetaDataFromJSON` global func; rewire
`_create_cuda_module` (tirx external_kernel) to construct in-memory
via `ffi.Module.create.cuda`, eliminating the load_from_file path
- Update `cmake/modules/CUDA.cmake` to drop opt stubs; update
`cmake/modules/LLVM.cmake` to glob `target/*/llvm/`; update
`CMakeLists.txt` CODEGEN_SRCS to include `target/cuda/`
- Replace ad-hoc `test_device_generic_module.py` with
`test_cuda_fallback_module.py` exercising the env-var-forced fallback
through normal compile + export + load + run
Mirror the per-backend pattern landed for CUDA: cluster ROCm codegen
under src/target/rocm/, introduce ROCmFallbackModuleNode for codegen
on USE_ROCM=OFF hosts, and make src/runtime/rocm/ plugin-only. The
fallback's saved bytes are byte-identical to ROCMModuleNode's, so a
USE_ROCM=ON receiver loads them transparently.
- Move src/target/llvm/{codegen_amdgpu,intrin_rule_rocm}.cc into
src/target/rocm/llvm/ (history preserved via git mv).
- Add src/target/rocm/rocm_fallback_module.{h,cc} exposing
ROCmModuleCreateWithFallback (wrapper) and ROCmFallbackModuleCreate.
- Refactor src/runtime/rocm/rocm_module.cc to the unified factory
signature (Bytes code, String fmt, Map fmap, Map<String,String>
source); drop WriteToFile + LoadFile registrations; bespoke
hip_source_/assembly_ fields collapse into the source map (keys
"hip" / "asm").
- Delete src/runtime/rocm/rocm_module.h (plugin-only rule) and
src/target/opt/build_rocm_off.cc (replaced by the fallback).
- BuildAMDGPU now calls target::ROCmModuleCreateWithFallback with the
hsaco bytes + LL/asm in the source map.
- LLVM.cmake globs src/target/rocm/llvm/; CMakeLists.txt CODEGEN_SRCS
picks up src/target/rocm/.
Mirror the per-backend pattern: cluster Hexagon codegen under
src/target/hexagon/, introduce HexagonFallbackModuleNode for codegen on
USE_HEXAGON=OFF hosts, and make src/runtime/hexagon/ plugin-only. The
fallback's saved bytes are byte-identical to HexagonModuleNode's, so a
USE_HEXAGON=ON receiver loads them transparently.
Hexagon is the only backend whose source map carries binary auxiliaries
(object code, bitcode) alongside text (asm, IR), so its source map type
is `Map<String, Variant<String, Bytes>>` — Variant lets each upstream
blob be passed in its natural form.
The `code` payload migrates from a filename string to bytes: BuildHexagon
slurps the linked .so file into memory and the unified factory takes the
bytes directly. Receiver-side materialization to a tempfile + dlopen is
left as a follow-up when Hexagon execution moves into HexagonModuleNode
(today execution is via a separate RPC-based runtime path).
- Move src/target/llvm/{codegen_hexagon,intrin_rule_hexagon}.cc into
src/target/hexagon/llvm/ (history preserved via git mv).
- Add src/target/hexagon/hexagon_fallback_module.{h,cc} exposing
HexagonModuleCreateWithFallback (wrapper) and HexagonFallbackModuleCreate.
- Refactor src/runtime/hexagon/hexagon_module.cc to the unified factory
signature (Bytes code, String fmt, Map fmap, Map<String, Variant<String,
Bytes>> source); drop bespoke asm_/obj_/ir_/bc_ fields and the
WriteToFile registration; collapse auxiliaries into the source map.
- Delete src/runtime/hexagon/hexagon_module.h (plugin-only rule) and
src/target/opt/build_hexagon_off.cc (replaced by the fallback).
- BuildHexagon now reads the .so file as bytes and calls
target::HexagonModuleCreateWithFallback with the unified payload.
- LLVM.cmake globs src/target/hexagon/llvm/; CMakeLists.txt CODEGEN_SRCS
picks up src/target/hexagon/.
Mirror the per-backend pattern: cluster Metal codegen under
src/target/metal/, introduce MetalFallbackModuleNode for codegen on
USE_METAL=OFF hosts (typical Linux CI), and make src/runtime/metal/
plugin-only. The fallback's saved bytes are byte-identical to
MetalModuleNode's, so a USE_METAL=ON receiver (macOS) loads them
transparently and JIT-compiles MSL via MTLDevice::newLibraryWithSource
on first GetFunction.
Metal is the first multi-shader backend in this PR — its per-kernel
payload is Map<String, Bytes>. The text-vs-binary distinction lives in
fmt ("metal" source / "metallib" compiled), not in the container; this
is the uniform shape used by all multi-shader backends (Metal, Vulkan,
WebGPU). The auxiliary in-memory source map is Map<String, String>
(text-only — MSL aggregated source dump).
The save/load format simplifies from upstream's 4 fields
([version][smap][fmap][fmt]) to the cross-backend uniform 3 fields
([fmt][fmap][smap]) — the version field was reserved-for-future and
unused. Both real and fallback now produce the same shape.
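To make the 3-field shape concrete, here is an illustrative Python round-trip for a `[fmt][fmap][smap]` stream. The length-prefixed little-endian encoding, the sorted key order, and the treatment of `fmap` values as opaque bytes are all assumptions made for this sketch. TVM's actual stream encoding is not reproduced here, and every function name is hypothetical.

```python
import struct
from typing import Dict, Tuple


def put_blob(buf: bytearray, blob: bytes) -> None:
    """Append a u64 little-endian length followed by the raw bytes."""
    buf.extend(struct.pack("<Q", len(blob)))
    buf.extend(blob)


def get_blob(data: bytes, pos: int) -> Tuple[bytes, int]:
    """Read one length-prefixed blob; return it and the next offset."""
    (size,) = struct.unpack_from("<Q", data, pos)
    start = pos + 8
    return data[start:start + size], start + size


def put_map(buf: bytearray, entries: Dict[str, bytes]) -> None:
    buf.extend(struct.pack("<Q", len(entries)))
    # Sorted keys make the output independent of insertion order.
    for key in sorted(entries):
        put_blob(buf, key.encode("utf-8"))
        put_blob(buf, entries[key])


def get_map(data: bytes, pos: int) -> Tuple[Dict[str, bytes], int]:
    (count,) = struct.unpack_from("<Q", data, pos)
    pos += 8
    entries: Dict[str, bytes] = {}
    for _ in range(count):
        key, pos = get_blob(data, pos)
        value, pos = get_blob(data, pos)
        entries[key.decode("utf-8")] = value
    return entries, pos


def save_to_bytes(fmt: str, fmap: Dict[str, bytes],
                  smap: Dict[str, bytes]) -> bytes:
    buf = bytearray()
    put_blob(buf, fmt.encode("utf-8"))  # field 1: fmt
    put_map(buf, fmap)                  # field 2: fmap
    put_map(buf, smap)                  # field 3: smap
    return bytes(buf)


def load_from_bytes(data: bytes) -> Tuple[str, Dict[str, bytes], Dict[str, bytes]]:
    fmt, pos = get_blob(data, 0)
    fmap, pos = get_map(data, pos)
    smap, pos = get_map(data, pos)
    return fmt.decode("utf-8"), fmap, smap


blob = save_to_bytes("metal", {"add_one": b"\x01"},
                     {"add_one": b"kernel void add_one() {}"})
print(load_from_bytes(blob))
```

Writing `fmt` first lets a reader decide how to interpret the shader payload before it parses the rest, which is one reason a version field becomes unnecessary when the format is implicit in `fmt`.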
- Move src/target/source/{codegen_metal,intrin_rule_metal}.{cc,h} into
src/target/metal/ (history preserved via git mv).
- Add src/target/metal/metal_fallback_module.{h,cc} exposing
MetalModuleCreateWithFallback (wrapper) and MetalFallbackModuleCreate.
- Refactor src/runtime/metal/metal_module.mm to the unified factory
signature (Map<String, Bytes> smap, String fmt, Map fmap,
Map<String, String> source); 3-field SaveToBytes; drop the unused
runtime.module.create_metal_module FFI registration; drop the
WriteToFile override; drop the version field.
- Delete src/runtime/metal/metal_module.h (plugin-only rule) and
src/target/opt/build_metal_off.cc (replaced by the fallback).
- BuildMetal builds an ffi::Map<String, Bytes> smap and calls
target::MetalModuleCreateWithFallback with the unified payload;
aggregated MSL source dump goes into the in-memory source map keyed
by "metal".
- CMakeLists.txt CODEGEN_SRCS picks up src/target/metal/.
Cluster WebGPU codegen under src/target/webgpu/ and rename the existing
WebGPUSourceModuleNode to WebGPUFallbackModuleNode under the per-backend
naming convention. WebGPU is unique among the backends in that it has
no native C++ runtime — the real receiver is the wasm runtime in
web/emcc/webgpu_runtime.cc, which registers only the load-from-bytes
side of the FFI registry. So WebGPUFallbackModuleNode IS the canonical
C++-side module; the "fallback" name is uniformity rather than runtime-
absent indication.
The on-disk byte format is preserved as 2 fields [fmap][smap] to match
what the wasm-side WebGPUModuleLoadFromBytes already reads (no fmt
field — WebGPU is single-format "wgsl" today). smap is now
Map<String, Bytes> (uniform multi-shader shape) but its wire format is
byte-identical to the previous unordered_map<string, string>; no
wasm-side change required.
- Move src/target/source/{codegen_webgpu,intrin_rule_webgpu}.{cc,h} into
src/target/webgpu/ (history preserved via git mv); update include
guard + relative include paths.
- Add src/target/webgpu/webgpu_fallback_module.{h,cc} exposing
WebGPUModuleCreateWithFallback (wrapper) and WebGPUFallbackModuleCreate.
The wrapper tries ffi.Module.create.webgpu in the registry (always
absent on the C++ side today) and falls through to the fallback.
- Refactor BuildWebGPU to construct ffi::Map<String, Bytes> smap and
call target::WebGPUModuleCreateWithFallback; aggregated WGSL source
dump goes into the in-memory source map keyed by "wgsl" (never
serialized). The inline WebGPUSourceModuleNode class body is removed.
- CMakeLists.txt CODEGEN_SRCS picks up src/target/webgpu/.
…o target/vulkan/
Per the per-backend cluster + plugin-only header rule (commits 1-4 + 7 of
this series), Vulkan migrates to the unified fallback pattern. The
SPIR-V tooling, previously shared across Vulkan and OpenCL via
src/target/spirv/, is absorbed into src/target/vulkan/ — SPIR-V is now
Vulkan-internal tooling, decoupling it from the OpenCL path which is
removed entirely in the next commit.
Codegen-side (src/target/vulkan/):
- All of src/target/spirv/* moved here via git mv (build_vulkan.cc,
codegen_spirv.{cc,h}, intrin_rule_spirv.cc, ir_builder.{cc,h},
spirv_support.{cc,h}, spirv_utils.{cc,h}). The src/target/spirv/
directory is removed.
- New vulkan_fallback_module.{h,cc} defining VulkanFallbackModuleNode
(kind="vulkan", multi-shader smap of packed SPIRVShader bytes).
- VulkanFallbackModuleCreate plain C++ factory + inline
VulkanModuleCreateWithFallback wrapper (registry hit /
TVM_COMPILE_FORCE_FALLBACK env / fallback fallthrough).
- BuildSPIRV in build_vulkan.cc switched to the unified factory shape:
(Map<String, Bytes> smap, String fmt, Map fmap, Map<String,String>
source). Each smap value is a self-packed SPIRVShader (flag + data
vector), serialized via support::BytesOutStream.
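As a rough illustration of a self-packed shader value, the sketch below packs a flag and a vector of 32-bit SPIR-V words into one bytes object and reads it back. The class name `SPIRVShaderSketch` and the exact layout (u32 flag, u64 word count, then the words, all little-endian) are assumptions for this example. The layout written by `support::BytesOutStream` is not reproduced here.

```python
import struct
from dataclasses import dataclass
from typing import Dict, List

HEADER = "<IQ"  # assumed layout: u32 flag, u64 word count, no padding
HEADER_SIZE = struct.calcsize(HEADER)


@dataclass
class SPIRVShaderSketch:
    flag: int        # shader flag bits (their meaning is not modelled here)
    data: List[int]  # SPIR-V words, each an unsigned 32-bit integer

    def pack(self) -> bytes:
        header = struct.pack(HEADER, self.flag, len(self.data))
        words = struct.pack(f"<{len(self.data)}I", *self.data)
        return header + words

    @classmethod
    def unpack(cls, blob: bytes) -> "SPIRVShaderSketch":
        flag, count = struct.unpack_from(HEADER, blob, 0)
        words = struct.unpack_from(f"<{count}I", blob, HEADER_SIZE)
        return cls(flag, list(words))


# 0x07230203 is the SPIR-V magic number; the remaining words are dummies.
shader = SPIRVShaderSketch(flag=0, data=[0x07230203, 0x00010000, 0])
smap: Dict[str, bytes] = {"main_kernel": shader.pack()}
print(SPIRVShaderSketch.unpack(smap["main_kernel"]))
```

Keeping each value as opaque bytes means the container type stays `Map<String, Bytes>` for every multi-shader backend, and only the code that runs the shader needs to understand the packed layout.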
Runtime-side (src/runtime/vulkan/):
- vulkan_module.cc: refactored to plugin-only. ffi.Module.create.vulkan
registered with the unified factory signature; ffi.Module.load_from_bytes.vulkan
reconstructs a real VulkanModuleNode from the cross-backend uniform
3-field [fmt][fmap][smap] byte stream.
- vulkan_module.h DELETED (plugin-only invariant — no exported header).
- vulkan_wrapped_func.h: VulkanModuleNode constructor and storage
refactored: holds an internal_smap_ (deserialized SPIRVShader, used
by GetPipeline) + smap_ (Map<String, Bytes>, kept for byte-identical
SaveToBytes vs the fallback) + Map<String, String> source map.
- vulkan_wrapped_func.cc: WriteToFile and the bespoke 4-byte magic
prefix dropped. SaveToBytes writes the cross-backend 3-field shape.
InspectSource looks up by format key with "spv" fallback.
CMake:
- cmake/modules/Vulkan.cmake: COMPILER_VULKAN_SRCS now globs
src/target/vulkan/{build_vulkan,codegen_spirv,intrin_rule_spirv,
ir_builder,spirv_support,spirv_utils}.cc explicitly (only when
USE_VULKAN=ON, since these need libspirv tooling).
- cmake/modules/OpenCL.cmake: drops the dead reference to the moved
src/target/spirv/spirv_utils.cc. The fallback codegen path in
codegen_opencl.cc still calls LowerToSPIRV via the new
../vulkan/spirv_utils.h include — that path is removed in the next
commit alongside opencl_module_spirv.cc.
- CMakeLists.txt: CODEGEN_SRCS adds src/target/vulkan/vulkan_fallback_module.cc
(always compiled, no SPIR-V deps).
Other:
- src/target/source/codegen_opencl.cc: include path adjusted from
../spirv/spirv_utils.h to ../vulkan/spirv_utils.h. The whole file
moves out of source/ in the next commit.
- src/runtime/spirv/spirv_shader.h shim is RETAINED in this commit;
src/runtime/opencl/opencl_module_spirv.cc still includes it via
"../spirv/spirv_shader.h". The shim is deleted alongside
opencl_module_spirv.cc in the next commit.
- New test tests/python/runtime/test_vulkan_fallback_module.py
exercises the codegen→fallback→export pipeline (skipped at runtime-
precondition step on USE_VULKAN=ON).
Verification:
- Full clean ninja build EXIT=0 (not re-run after format-only changes).
- ./build/cpptest: 118/118.
- tests/python/runtime/: 75 passed, 1 skipped (precondition test
expectedly skipped on USE_VULKAN=ON).
- tests/python/all-platform-minimal-test/: 65 passed, 77 skipped.
- tests/python/codegen/test_target_codegen_cuda.py: 50 passed, 6
skipped.
- tests/python/codegen/test_target_codegen_opencl.py::test_opencl_erf:
passed.
- tests/python/codegen/test_target_codegen_vulkan.py: 84 passed, 36
skipped, 3 xfailed; 8 [nvptx] failures pre-exist on this LLVM 15
container (sm_89 not recognized — unrelated to this commit).
- pre-commit: all hooks passed.
…+ target/opencl/
Per the per-backend cluster + plugin-only header rule (commits 1-5 + 7),
OpenCL migrates to the unified fallback pattern. This is the final
backend in the series and also the OpenCL-SPIRV decoupling step:
SPIR-V tooling now lives only under src/target/vulkan/ (absorbed in
the previous commit), and OpenCL becomes purely source-based
(fmt=="cl" and the binary formats xclbin/awsxclbin/aocx).
Codegen-side (src/target/opencl/):
- src/target/source/{codegen_opencl.cc,codegen_opencl.h,intrin_rule_opencl.cc}
moved here via git mv.
- New opencl_fallback_module.{h,cc} defining OpenCLFallbackModuleNode
(kind="opencl", single-binary `code` of ffi::Bytes).
- OpenCLFallbackModuleCreate plain C++ factory + inline
OpenCLModuleCreateWithFallback wrapper (registry hit /
TVM_COMPILE_FORCE_FALLBACK env / fallback fallthrough).
- BuildOpenCL switched to the unified factory shape: (Bytes code,
String fmt, Map fmap, Map<String,String> source). The TVM_ENABLE_SPIRV
conditional branch that called LowerToSPIRV + the legacy
OpenCLSPIRVModuleNode is REMOVED — OpenCL no longer carries a
SPIR-V path.
Runtime-side (src/runtime/opencl/):
- opencl_module.cc: refactored to plugin-only.
ffi.Module.create.opencl registered with the unified factory
signature; ffi.Module.load_from_bytes.opencl reconstructs a real
OpenCLModuleNode from the cross-backend uniform 3-field
[fmt][fmap][code] byte stream. WriteToFile and the disk
load_from_file.cl/clbin registrations dropped.
- opencl_common.h: OpenCLModuleNode constructor refactored to
(ffi::Bytes code, ffi::String fmt, fmap, ffi::Map<ffi::String,
ffi::String> source); WriteToFile final declaration removed.
- opencl_module.h DELETED (plugin-only invariant — no exported header).
- opencl_module_spirv.cc DELETED (SPIR-V path removed entirely).
- The `ffi.Module.create.opencl.spirv` registration goes away with
opencl_module_spirv.cc.
Other deletions:
- src/target/opt/build_opencl_off.cc (replaced by the always-compiled
fallback; the OpenCL-SPIRV stub it carried is no longer needed
since SPIR-V is no longer an OpenCL path).
- src/runtime/spirv/spirv_shader.h (deprecated shim; its only
remaining consumer was opencl_module_spirv.cc which is now gone).
CMake:
- cmake/modules/OpenCL.cmake: drops the USE_OPENCL=OFF clause that
appended target/opt/build_opencl_off.cc to COMPILER_SRCS.
- CMakeLists.txt: CODEGEN_SRCS adds src/target/opencl/*.cc.
Test:
- New tests/python/runtime/test_opencl_fallback_module.py exercises
the codegen→fallback→export pipeline with TVM_COMPILE_FORCE_FALLBACK=1
on USE_OPENCL=ON hosts and verifies registry-precondition on
USE_OPENCL=OFF hosts.
Verification:
- Full incremental ninja build EXIT=0 (post-cmake-reconfigure rebuild).
- ./build/cpptest: 118/118.
- tests/python/runtime/: 76 passed, 2 skipped (Vulkan + OpenCL
precondition tests expectedly skipped on USE_X=ON build).
- tests/python/all-platform-minimal-test/: 65 passed, 77 skipped.
- tests/python/codegen/test_target_codegen_cuda.py: 50 passed, 6 skipped.
- tests/python/codegen/test_target_codegen_opencl.py: 7 passed (full
file, all surviving tests).
- tests/python/codegen/test_target_codegen_vulkan.py: 84 passed, 36
skipped, 3 xfailed; 8 [nvptx] failures pre-exist on this LLVM 15
container (sm_89 not recognized — unrelated to this commit).
- pre-commit: all hooks passed.
164923e to c534a14
Trim the 7 src/target/<backend>/<backend>_fallback_module.h doc comments to a single uniform paragraph: introduction of `<X>ModuleCreateWithFallback`, the registry-lookup fallback flow, and the rationale that codegen must succeed when the device runtime is not linked. Drops the USE_<X>=ON/OFF mentions, the TVM_COMPILE_FORCE_FALLBACK env var note, the saved-bytes byte-identical claim, and the per-backend extra paragraphs (Hexagon Variant source-map, Metal multi-shader, Vulkan SPIR-V shader, OpenCL SPIRV-removal note, WebGPU canonical-module note, ROCm no-source-JIT note) — those details belong in commit messages and the runtime-side files, not in the codegen-facing header.
…le headers
Append "This setup is helpful for cross compilation where we compile on one env and run on another." to each of the 7 codegen-facing fallback module header doc comments. The cross-compilation use case is the load-bearing motivation for the fallback design and was implicit in the prior wording; making it explicit helps readers understand why per-backend fallback modules exist at all.
…gen tests
Replace the 7 tests/python/runtime/test_<backend>_fallback_module.py files with one export+load test per backend blended into the existing tests/python/codegen/test_target_codegen_<backend>.py. Tests reuse the file's existing fixture / decorator / target-string patterns. Hexagon and WebGPU intentionally have no fallback test for now; their standalone files are deleted with no replacement.
Why
This refactor reshapes each backend into a self-contained `src/target/<X>/` cluster (with an optional `src/target/<X>/llvm/` for LLVM-dependent codegen) and introduces a per-backend fallback module that absorbs the cross-compile role cleanly: without `target/opt/` stubs, without `DeviceSourceModuleNode`, and without leaking a synthetic `kind()` to consumers.
High-level principles
- Codegen files for backend `<X>` live under `src/target/<X>/`, with an optional `src/target/<X>/llvm/` subdir for files that require `USE_LLVM` at build time. Backend grouping wins over build-dependency grouping (the latter being upstream's `target/llvm/`).
- `src/runtime/<X>/<X>_module.h` is deleted. The runtime's real `<X>ModuleNode` is reachable only via the FFI registry (`ffi.Module.create.<kind>`, `ffi.Module.load_from_bytes.<kind>`). No C++ API surface other than the static registrations.
- Each backend `<X>` gets a `<X>FallbackModuleNode` in `src/target/<X>/<X>_fallback_module.{h,cc}`. Same `kind()` as the real module. Codegen-time only, never reachable via load. `GetFunction` errors with a backend-specific "runtime not linked" message; `InspectSource` works.
- Codegen calls `<X>ModuleCreateWithFallback(...)`, which tries `ffi.Module.create.<kind>` via the registry; on miss, it falls through to `<X>FallbackModuleCreate` (plain C++; reachable directly from the fallback header). When `USE_<X>=ON` is in effect, the registry hit returns the real module; when `USE_<X>=OFF`, the fallback is what codegen gets. No CMake `if`/`else` gating; the fallback is always compiled.
Specific changes
New per-backend directories (codegen + fallback)
- `src/target/cuda/`: `codegen_cuda.cc` + `intrin_rule_cuda.cc` + fallback module pair + `llvm/codegen_nvptx.cc`
- `src/target/rocm/`: fallback module pair + `llvm/codegen_amdgpu.cc` + `llvm/intrin_rule_rocm.cc`
- `src/target/hexagon/`: fallback module pair + `llvm/codegen_hexagon.cc` + `llvm/intrin_rule_hexagon.cc`
- `src/target/metal/`: `codegen_metal.cc` + `intrin_rule_metal.cc` + fallback module pair
- `src/target/vulkan/`: `build_vulkan.cc` + the rest of `target/spirv/` absorbed + fallback module pair
- `src/target/opencl/`: `codegen_opencl.cc` + `intrin_rule_opencl.cc` + fallback module pair
- `src/target/webgpu/`: `codegen_webgpu.cc` + fallback module pair (was `WebGPUSourceModuleNode`, renamed)
New fallback classes
`<X>FallbackModuleNode` for X in {CUDA, ROCm, Hexagon, Metal, Vulkan, OpenCL, WebGPU}. Each:
- `kind()` matches the real backend
- `(code or smap, fmt, fmap, source)`, no driver/runtime calls
- `GetFunction` errors with backend-specific "runtime not linked" message
- `InspectSource` works fully
- `SaveToBytes` byte-identical to real