[Runtime][Metal] add TVM_METAL_STORAGE_MODE env opt-in for Shared/Managed buffers by apstenku123 · Pull Request #19504 · apache/tvm

apstenku123 · 2026-05-04T08:49:42Z

Summary

Adds an opt-in environment variable TVM_METAL_STORAGE_MODE that lets users allocate device data buffers as MTLResourceStorageModeShared (or Managed) instead of the default MTLResourceStorageModePrivate. Default behaviour is unchanged.

value	mode	semantics
unset / `private`	`MTLResourceStorageModePrivate`	default, GPU-only, preserves historical behaviour
`shared`	`MTLResourceStorageModeShared`	CPU+GPU mapped — required for zero-copy DLPack to MLX
`managed`	`MTLResourceStorageModeManaged`	macOS-only intermediate (driver tracks dirty pages)
anything else	`MTLResourceStorageModePrivate` + warn	safe fall-back

The env var is read once on first MetalWorkspace::AllocDataSpace and cached for the lifetime of the process; no per-allocation overhead. A new FFI helper metal.GetStorageMode is registered alongside the existing metal.GetProfileCounters / metal.ResetProfileCounters helpers so tests can verify the resolved mode without an ObjC bridge.

The staging-buffer pool (metal_common.h:383) and temp-buffer pool (metal_device_api.mm:374) already use MTLStorageModeShared and are intentionally untouched — they're host-staging by design and don't fall under the data-space allocator.

Why

TVM's Metal device API has always allocated MTLBuffer with MTLResourceStorageModePrivate. This is the right choice for pure-GPU workloads (no CPU page mapping), but it blocks zero-copy DLPack interop with other Metal-using frameworks that allocate Shared/Managed buffers — notably ml-explore/mlx, which uses MTLResourceStorageModeShared everywhere. Two allocators on the same MTLDevice produce buffers with different page-mapping semantics; DLPack capsules from TVM cannot be consumed by mx.array (live-tested: std::bad_cast on mx.array(tvm_metal_capsule)).

This change unblocks the bridge from TVM-NDArray to mlx.array (both wrap MTLBuffer; require matching storage mode for the same foreign capsule to be consumable). It is the producer half of a pair; the consumer half is a parallel ml-explore/mlx PR that adds mx.from_dlpack(obj).

Test plan

xcrun --sdk macosx clang++ -std=c++17 -framework Metal syntax_check.mm -o syntax_check && ./syntax_check — exercises env-var parsing for all 6 cases (unset, shared, mixed-case Shared, invalid, managed, private).
Build runtime: mkdir build && cd build && cmake -DUSE_METAL=ON -DUSE_LLVM=ON -DCMAKE_BUILD_TYPE=Release .. && make -j tvm_runtime
./runtime_check (TVM-linked probe) — validates that the env var flows to a real MTLBuffer.storageMode. Live captured 2026-05-03 on Apple M4 Max for unset/shared/managed/private.
TVM_METAL_STORAGE_MODE=shared python -c "import tvm; arr = tvm.nd.empty((4,), dtype='float32', device=tvm.metal()); print(arr.shape)"
CI: macos-arm64 runner in apache/tvm should exercise the existing Metal tests; default behaviour (env unset) is unchanged.

Caveats / non-goals

This is a copy-elision interop patch, not a kernel-speed patch. Default Private mode remains the right choice for TVM-only workloads.
The patch artifact only changes src/runtime/metal/metal_device_api.mm; it does not yet add an upstream tests/python/runtime/... file. A subprocess-isolated Python test for the env-cache behaviour can be folded in if maintainers want it in tree.
Local Metal microbenchmarks on Apple M4 Max show Shared buffers remove the staging-buffer + blit/wait cost at CPU↔Metal transfer boundaries (e.g., 1 MiB CPU→Metal median 138.375 µs Private vs 12.750 µs Shared in a downstream probe). These numbers are local-health checks, not in-tree benchmarks.

Pairing

Paired upstream patch: ml-explore/mlx adds mx.from_dlpack(obj) Metal-aware consumer (filed in parallel). Both patches must land for the zero-copy MLX↔TVM use case to work end-to-end.

Attribution

Co-developed with cppmega.mlx for Apple-Silicon Metal interop with MLX.

…aged buffers Replaces the historical hard-coded MTLResourceStorageModePrivate at MetalWorkspace::AllocDataSpace with a function-local static cache seeded from the env var. Default behavior unchanged (env unset => Private). Accepted values (case-insensitive): private | shared | managed. Unknown values fall back to Private with a warning. Also registers a metal.GetStorageMode FFI helper for parity tests. Motivation: zero-copy DLPack interop with MLX (which uses MTLResourceStorageModeShared everywhere). Two allocators on the same MTLDevice cannot share an MTLBuffer with different page-mapping semantics, so DLPack capsules cross-imported between TVM-NDArray and mx.array currently fail with std::bad_cast. With env=shared, both allocators agree. Live-verified on Apple Silicon (Xcode 21, MacOSX26.4 SDK): runtime_check.mm probes confirm MTLBuffer.storageMode matches the env var setting in all 4 scenarios (default, shared, managed, private). Build: cmake -DUSE_METAL=ON .. && make tvm_runtime — 3 min on M-series. Untouched on purpose: - StagingBufferPool::GetOrCreate (host-staging, must remain Shared) - MetalThreadEntry::GetTempBuffer (GPU->CPU readback bounce, must remain Shared) - metal_module.mm kernel dispatch (Shared buffers are valid kernel args) Polished version of the patch by parallel test agent (cppmega-mlx Path C work). Replaces the earlier draft that had a smaller diff but missed the FFI registration helper.

gemini-code-assist

Code Review

This pull request introduces the ability to configure the Metal storage mode via the TVM_METAL_STORAGE_MODE environment variable, facilitating zero-copy DLPack interop with frameworks like MLX. Key changes include the implementation of GetMetalStorageOptions to parse and cache the storage mode, updating buffer allocations to respect this setting, and adding a new FFI function metal.GetStorageMode to query the current configuration. I have no feedback to provide.

Copilot

Pull request overview

Adds an opt-in TVM_METAL_STORAGE_MODE environment variable to control Metal device data buffer storage mode (Private/Shared/Managed) to enable zero-copy interop (e.g., via DLPack) while preserving existing default behavior.

Changes:

Introduces GetMetalStorageOptions() to parse/cache TVM_METAL_STORAGE_MODE (case-insensitive) and select the corresponding MTLResourceStorageMode*.
Switches MetalWorkspace::AllocDataSpace to allocate buffers using the resolved storage mode.
Registers a new FFI helper metal.GetStorageMode for tests/introspection.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+             if (opts == MTLResourceStorageModeShared) return ffi::String("shared");
+             if (opts == MTLResourceStorageModeManaged) return ffi::String("managed");


+      return MTLResourceStorageModePrivate;
+    }
+    // Lowercase a small bounded copy for case-insensitive comparison.
+    std::string v(raw);


Files documenting the actual PRs we just opened upstream: - PR #1: ml-explore/mlx#3476 — from_dlpack Metal-aware consumer (against main, clean) - PR #2: apache/tvm#19504 — TVM_METAL_STORAGE_MODE env opt-in (against main, clean) - PR #3: tile-ai/tilelang#2139 — mixed-dtype T.gemm via scalar fallback (stacks on PR #2130) - PR #4: tile-ai/tilelang#2140 — FP8-input T.gemm scalar fallback routing (stacks on PR #2130) - PR #5: tile-ai/tilelang#2141 — T.Pipelined num_stages>1 3D buffer fix (stacks on PR #2130) - PR #6: tile-ai/tilelang#2142 — T.fp8_scaled_matmul DSL intrinsic (stacks on PR #2130) Deferred (split into companion PRs needed): tilelang_metal_fp8 and tilelang_metal_fp8_vector each touch both tilelang supermodule and the TileLang/tvm vendored submodule. These need 2 PRs each — one to tile-ai/tilelang, one to TileLang/tvm — separate filing round. PRs #3-#6 are independent of each other; each branches directly from jorgecurious/tilelang:metal-gemm-upstream-rebase HEAD 971c17b, so they can be reviewed in any order. They DO depend on the upstream 4-PR Apple Metal landing chain (#1869, #2118, #2121, #2130) merging first; if any of those land separately, ours can be retargeted at main.

Copilot AI review requested due to automatic review settings May 4, 2026 08:49

Copilot started reviewing on behalf of apstenku123 May 4, 2026 08:50 View session

gemini-code-assist Bot reviewed May 4, 2026

View reviewed changes

Copilot AI reviewed May 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Runtime][Metal] add TVM_METAL_STORAGE_MODE env opt-in for Shared/Managed buffers#19504

[Runtime][Metal] add TVM_METAL_STORAGE_MODE env opt-in for Shared/Managed buffers#19504
apstenku123 wants to merge 1 commit intoapache:mainfrom
apstenku123:cppmega/metal-shared-storage-opt-in

apstenku123 commented May 4, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		if (opts == MTLResourceStorageModeShared) return ffi::String("shared");
		if (opts == MTLResourceStorageModeManaged) return ffi::String("managed");

Conversation

apstenku123 commented May 4, 2026

Summary

Why

Test plan

Caveats / non-goals

Pairing

Attribution

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants