[TIRx] Post-bringup op-dispatch / codegen / TVMScript follow-ups by spectrometerHBH · Pull Request #19657 · apache/tvm

spectrometerHBH · 2026-06-02T19:42:31Z

Summary

Follow-up work on top of the TIRx infrastructure bring-up (#19581). It extends the TIRx operator-dispatch, codegen, and TVMScript surfaces with the next batch of low-level programming features for Blackwell-class GPUs, while keeping s_tir script support intact.

Main Changes

op-dispatch: warp ldmatrix/stmatrix copy dispatch; split CUDA copy into register / gmem-smem / ldgsts paths; tcgen05.ld/st .16x{64,128,256}b dispatch with a factory and M=128 layout; element-wise broadcast at the layout level with a copy vec-alignment fix.
gemm: CUDA synchronous mma.sync tensor-core dispatch; accept a Layout F C operand for M=64 MMAs.
op: add the permute_layout primitive (replaces permute_dims).
tvmscript: add the Tx.jit decorator, Tx.constexpr compile-time params, and Tx.wg_reg_tile.
lower-tirx: introduce the Tx.device_entry() marker (replacing ScopeKind::kKernel); canonical thread filters that drop the Tx.filter wrapper.
codegen: add a typed-pointer byte-offset intrinsic; remove the entry_cluster_sync codegen attribute.

Validation

pre-commit run (changed files) — clean
ninja -C build -j$(nproc) — builds
pytest tests/python/tirx/ -n 16
- 1997 passed, 39 skipped, 3 xpassed
python -m pytest tests/python/all-platform-minimal-test
- 37 passed, 105 skipped
TVM_TEST_TARGETS=llvm pytest tests/python/tirx-analysis tests/python/tirx-base tests/python/tirx-transform -n 16
- 630 passed, 25 skipped, 8 xfailed, 1 xpassed

Local CI Notes

Several full CI-equivalent jobs are not locally reproducible because this machine is missing parts of the Apache TVM CI environment (e.g., specific llvm-config versions, Vulkan, ROCm, ARM/QEMU cross-toolchain, and web/wasm components). The Blackwell/Trainium kernel tests are maintained downstream and are intentionally not part of this PR.

gemini-code-assist

Code Review

This pull request refactors the TVM TIRx execution scopes and dispatchers, removing the legacy kernel and world scopes in favor of a flat Tx.device_entry() marker, introducing a canonical thread-filter grammar, and rewriting the copy and elementwise dispatchers to use a unified layout-alignment and partitioning algorithm. It also adds support for Blackwell tcgen05 datapath layouts, replaces permute_dims with permute_layout for swizzled warp transposes, and optimizes the integer set evaluator to prevent exponential re-expansion on deep variable dependency chains. A critical correctness bug was identified in the _carve_tail helper, where the failure to fuse the consumed suffix into a single element results in incorrect outer loop iterations and misaligned vector memory accesses.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

spectrometerHBH · 2026-06-02T19:58:23Z

@tvm-bot rerun

Follow-up work on top of the TIRx infrastructure (apache#19581): - op-dispatch: warp ldmatrix/stmatrix copy dispatch; split CUDA copy into reg + gmem_smem + ldgsts; tcgen05.ld/st .16x{64,128,256}b dispatch + factory + M=128 layout; element-wise broadcast at the layout level + copy vec-alignment fix - gemm: CUDA synchronous mma.sync tensor-core dispatch; accept Layout F C operand for M=64 MMAs - op: add permute_layout primitive (removes permute_dims) - tvmscript: Tx.jit decorator, Tx.constexpr params, Tx.wg_reg_tile - lower-tirx: Tx.device_entry() marker replacing ScopeKind::kKernel; canonical thread filters (drop Tx.filter wrapper) - codegen: typed pointer byte-offset intrinsic; remove the entry_cluster_sync codegen attribute

gemini-code-assist Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread python/tvm/tirx/operator/tile_primitive/cuda/copy/_common.py

spectrometerHBH force-pushed the apache-pr-tirx-followups branch from 459fa0e to 2b2adc8 Compare June 2, 2026 20:00

tqchen approved these changes Jun 2, 2026

View reviewed changes

tlopex approved these changes Jun 2, 2026

View reviewed changes

spectrometerHBH force-pushed the apache-pr-tirx-followups branch from 2b2adc8 to 32176c5 Compare June 2, 2026 20:25

tqchen merged commit 57c638f into apache:main Jun 2, 2026
9 of 10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[TIRx] Post-bringup op-dispatch / codegen / TVMScript follow-ups#19657

[TIRx] Post-bringup op-dispatch / codegen / TVMScript follow-ups#19657
tqchen merged 1 commit into
apache:mainfrom
spectrometerHBH:apache-pr-tirx-followups

spectrometerHBH commented Jun 2, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

spectrometerHBH commented Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

spectrometerHBH commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Main Changes

Validation

Local CI Notes

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

spectrometerHBH commented Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

spectrometerHBH commented Jun 2, 2026 •

edited

Loading