
Register-to-shared tile reassignment fails in CUDA tile kernels #1439

@Glitchfix


Bug Description

The CUDA tile backend has no same-shape assignment path from a register-backed tile into an existing shared-memory-backed tile. Kernels that keep tile operands live across wp.tile_matmul() and then reassign the next register-loaded operands to those same variables depend on this register-to-shared storage migration.

One example is a pipelined tile matmul kernel:

import warp as wp

@wp.kernel
def matmul_pipelined(A: wp.array2d(dtype=float), B: wp.array2d(dtype=float), C: wp.array2d(dtype=float)):
    i, j = wp.tid()

    acc = wp.tile_zeros(shape=(8, 4), dtype=wp.float32)

    a = wp.tile_load(A, shape=(8, 8), offset=(i * 8, 0), storage="register")
    b = wp.tile_load(B, shape=(8, 4), offset=(0, j * 4), storage="register")

    # Number of 8-wide tiles along the shared K dimension
    count = int(A.shape[1] / 8)
    for k in range(1, count):
        a_next = wp.tile_load(A, shape=(8, 8), offset=(i * 8, k * 8), storage="register")
        b_next = wp.tile_load(B, shape=(8, 4), offset=(k * 8, j * 4), storage="register")

        wp.tile_matmul(a, b, acc)

        # Register -> shared reassignment that triggers this issue
        a = a_next
        b = b_next

    wp.tile_matmul(a, b, acc)
    wp.tile_store(C, acc, offset=(i * 8, j * 4))
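The kernel is a prefetch-then-compute (software-pipelined) blocked matmul: the next K tiles are loaded before the current ones are multiplied, and the loop then rotates the variables. The following CPU-side model is a sketch of that control flow only. It is not Warp code and makes no claim about Warp internals; plain nested lists stand in for tiles, and the helper names are hypothetical.

```python
# Hypothetical CPU model of the matmul_pipelined loop (not Warp code).
# Plain nested lists stand in for tiles; helper names are illustrative only.
from typing import List

Matrix = List[List[float]]


def load_tile(M: Matrix, row: int, col: int, rows: int, cols: int) -> Matrix:
    """Copy a rows x cols block starting at (row, col), mimicking an offset tile load."""
    return [list(M[r][col:col + cols]) for r in range(row, row + rows)]


def tile_matmul_acc(a: Matrix, b: Matrix, acc: Matrix) -> None:
    """Accumulate a @ b into acc, as acc accumulates across iterations in the kernel."""
    for r in range(len(a)):
        for c in range(len(b[0])):
            acc[r][c] += sum(a[r][k] * b[k][c] for k in range(len(b)))


def matmul_pipelined_model(A: Matrix, B: Matrix, i: int, j: int,
                           TM: int = 8, TN: int = 4, TK: int = 8) -> Matrix:
    """Compute output tile (i, j) with the same prefetch/compute/rotate order as the kernel."""
    acc = [[0.0] * TN for _ in range(TM)]
    a = load_tile(A, i * TM, 0, TM, TK)
    b = load_tile(B, 0, j * TN, TK, TN)
    count = len(A[0]) // TK
    for k in range(1, count):
        # Prefetch the next operands before consuming the current ones
        a_next = load_tile(A, i * TM, k * TK, TM, TK)
        b_next = load_tile(B, k * TK, j * TN, TK, TN)
        tile_matmul_acc(a, b, acc)
        # Rotate: the reassignment step that the issue is about
        a = a_next
        b = b_next
    # Drain the pipeline with the last prefetched pair
    tile_matmul_acc(a, b, acc)
    return acc
```

In the real kernel, the rotation `a = a_next` is where a register-loaded tile must be written into a variable that may already be backed by shared memory; in this model it is just a Python rebinding.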

Expected behavior:

  • Forward assignment copies the register tile values into the shared tile destination.
  • Backward assignment accumulates the shared destination adjoint into the register source adjoint.
  • The overwritten shared destination adjoint is cleared after propagation.
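The three expected rules can be summarized as a small forward/backward pair. The sketch below is a hypothetical Python model of those semantics only, assuming a `Tile` record with a storage label, a flat value list, and a flat adjoint list; it is not Warp's native `tile_shared_t`/`tile_register_t` code.

```python
# Hypothetical model of the expected shared <- register assignment semantics.
# Tile, assign_forward and assign_backward are illustrative names, not Warp APIs.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tile:
    storage: str                # "register" or "shared" (label only)
    values: List[float]
    adj: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Default the adjoint to zeros of the same size as the values
        if not self.adj:
            self.adj = [0.0] * len(self.values)


def assign_forward(dest: Tile, src: Tile) -> None:
    """Copy src values into the existing dest tile; dest keeps its storage."""
    if len(dest.values) != len(src.values):
        raise ValueError("same-shape assignment required")
    dest.values[:] = src.values  # element-wise copy, no aliasing


def assign_backward(dest: Tile, src: Tile) -> None:
    """Accumulate dest's adjoint into src's adjoint, then clear dest's adjoint."""
    for idx, g in enumerate(dest.adj):
        src.adj[idx] += g
    dest.adj[:] = [0.0] * len(dest.adj)  # the overwritten destination's gradient is consumed
```

Clearing the destination adjoint matters because the shared destination's previous value was overwritten, so any gradient held there belongs to the register source and must not be propagated a second time.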

Actual behavior:

The native tile assignment overloads do not cover tile_shared_t <- tile_register_t, so this reassignment pattern is not handled correctly in forward execution or adjoint propagation.

System Information

  • Warp version: main / development branch
  • Backend: CUDA tile backend

Labels: bug (Something isn't working), tile (Tile programming model)