[Runtime][Metal] add TVM_METAL_STORAGE_MODE env opt-in for Shared/Managed buffers#19504
[Runtime][Metal] add TVM_METAL_STORAGE_MODE env opt-in for Shared/Managed buffers#19504apstenku123 wants to merge 1 commit intoapache:mainfrom
Conversation
…aged buffers Replaces the historical hard-coded MTLResourceStorageModePrivate at MetalWorkspace::AllocDataSpace with a function-local static cache seeded from the env var. Default behavior unchanged (env unset => Private). Accepted values (case-insensitive): private | shared | managed. Unknown values fall back to Private with a warning. Also registers a metal.GetStorageMode FFI helper for parity tests. Motivation: zero-copy DLPack interop with MLX (which uses MTLResourceStorageModeShared everywhere). Two allocators on the same MTLDevice cannot share an MTLBuffer with different page-mapping semantics, so DLPack capsules cross-imported between TVM-NDArray and mx.array currently fail with std::bad_cast. With env=shared, both allocators agree. Live-verified on Apple Silicon (Xcode 21, MacOSX26.4 SDK): runtime_check.mm probes confirm MTLBuffer.storageMode matches the env var setting in all 4 scenarios (default, shared, managed, private). Build: cmake -DUSE_METAL=ON .. && make tvm_runtime — 3 min on M-series. Untouched on purpose: - StagingBufferPool::GetOrCreate (host-staging, must remain Shared) - MetalThreadEntry::GetTempBuffer (GPU->CPU readback bounce, must remain Shared) - metal_module.mm kernel dispatch (Shared buffers are valid kernel args) Polished version of the patch by parallel test agent (cppmega-mlx Path C work). Replaces the earlier draft that had a smaller diff but missed the FFI registration helper.
There was a problem hiding this comment.
Code Review
This pull request introduces the ability to configure the Metal storage mode via the TVM_METAL_STORAGE_MODE environment variable, facilitating zero-copy DLPack interop with frameworks like MLX. Key changes include the implementation of GetMetalStorageOptions to parse and cache the storage mode, updating buffer allocations to respect this setting, and adding a new FFI function metal.GetStorageMode to query the current configuration. I have no feedback to provide.
There was a problem hiding this comment.
Pull request overview
Adds an opt-in TVM_METAL_STORAGE_MODE environment variable to control Metal device data buffer storage mode (Private/Shared/Managed) to enable zero-copy interop (e.g., via DLPack) while preserving existing default behavior.
Changes:
- Introduces
GetMetalStorageOptions()to parse/cacheTVM_METAL_STORAGE_MODE(case-insensitive) and select the correspondingMTLResourceStorageMode*. - Switches
MetalWorkspace::AllocDataSpaceto allocate buffers using the resolved storage mode. - Registers a new FFI helper
metal.GetStorageModefor tests/introspection.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (opts == MTLResourceStorageModeShared) return ffi::String("shared"); | ||
| if (opts == MTLResourceStorageModeManaged) return ffi::String("managed"); |
| return MTLResourceStorageModePrivate; | ||
| } | ||
| // Lowercase a small bounded copy for case-insensitive comparison. | ||
| std::string v(raw); |
Files documenting the actual PRs we just opened upstream: - PR #1: ml-explore/mlx#3476 — from_dlpack Metal-aware consumer (against main, clean) - PR #2: apache/tvm#19504 — TVM_METAL_STORAGE_MODE env opt-in (against main, clean) - PR #3: tile-ai/tilelang#2139 — mixed-dtype T.gemm via scalar fallback (stacks on PR #2130) - PR #4: tile-ai/tilelang#2140 — FP8-input T.gemm scalar fallback routing (stacks on PR #2130) - PR #5: tile-ai/tilelang#2141 — T.Pipelined num_stages>1 3D buffer fix (stacks on PR #2130) - PR #6: tile-ai/tilelang#2142 — T.fp8_scaled_matmul DSL intrinsic (stacks on PR #2130) Deferred (split into companion PRs needed): tilelang_metal_fp8 and tilelang_metal_fp8_vector each touch both tilelang supermodule and the TileLang/tvm vendored submodule. These need 2 PRs each — one to tile-ai/tilelang, one to TileLang/tvm — separate filing round. PRs #3-#6 are independent of each other; each branches directly from jorgecurious/tilelang:metal-gemm-upstream-rebase HEAD 971c17b, so they can be reviewed in any order. They DO depend on the upstream 4-PR Apple Metal landing chain (#1869, #2118, #2121, #2130) merging first; if any of those land separately, ours can be retargeted at main.
Summary
Adds an opt-in environment variable
TVM_METAL_STORAGE_MODEthat lets users allocate device data buffers asMTLResourceStorageModeShared(orManaged) instead of the defaultMTLResourceStorageModePrivate. Default behaviour is unchanged.privateMTLResourceStorageModePrivatesharedMTLResourceStorageModeSharedmanagedMTLResourceStorageModeManagedMTLResourceStorageModePrivate+ warnThe env var is read once on first
MetalWorkspace::AllocDataSpaceand cached for the lifetime of the process; no per-allocation overhead. A new FFI helpermetal.GetStorageModeis registered alongside the existingmetal.GetProfileCounters/metal.ResetProfileCountershelpers so tests can verify the resolved mode without an ObjC bridge.The staging-buffer pool (
metal_common.h:383) and temp-buffer pool (metal_device_api.mm:374) already useMTLStorageModeSharedand are intentionally untouched — they're host-staging by design and don't fall under the data-space allocator.Why
TVM's Metal device API has always allocated
MTLBufferwithMTLResourceStorageModePrivate. This is the right choice for pure-GPU workloads (no CPU page mapping), but it blocks zero-copy DLPack interop with other Metal-using frameworks that allocate Shared/Managed buffers — notablyml-explore/mlx, which usesMTLResourceStorageModeSharedeverywhere. Two allocators on the sameMTLDeviceproduce buffers with different page-mapping semantics; DLPack capsules from TVM cannot be consumed bymx.array(live-tested:std::bad_castonmx.array(tvm_metal_capsule)).This change unblocks the bridge from TVM-NDArray to
mlx.array(both wrapMTLBuffer; require matching storage mode for the same foreign capsule to be consumable). It is the producer half of a pair; the consumer half is a parallel ml-explore/mlx PR that addsmx.from_dlpack(obj).Test plan
xcrun --sdk macosx clang++ -std=c++17 -framework Metal syntax_check.mm -o syntax_check && ./syntax_check— exercises env-var parsing for all 6 cases (unset, shared, mixed-case Shared, invalid, managed, private).mkdir build && cd build && cmake -DUSE_METAL=ON -DUSE_LLVM=ON -DCMAKE_BUILD_TYPE=Release .. && make -j tvm_runtime./runtime_check(TVM-linked probe) — validates that the env var flows to a realMTLBuffer.storageMode. Live captured 2026-05-03 on Apple M4 Max for unset/shared/managed/private.TVM_METAL_STORAGE_MODE=shared python -c "import tvm; arr = tvm.nd.empty((4,), dtype='float32', device=tvm.metal()); print(arr.shape)"Caveats / non-goals
src/runtime/metal/metal_device_api.mm; it does not yet add an upstreamtests/python/runtime/...file. A subprocess-isolated Python test for the env-cache behaviour can be folded in if maintainers want it in tree.Pairing
Paired upstream patch: ml-explore/mlx adds
mx.from_dlpack(obj)Metal-aware consumer (filed in parallel). Both patches must land for the zero-copy MLX↔TVM use case to work end-to-end.Attribution
Co-developed with
cppmega.mlxfor Apple-Silicon Metal interop with MLX.