Register-to-shared tile reassignment fails in CUDA tile kernels

### Bug Description

The CUDA tile backend is missing a same-shape assignment path from a register-backed tile into an existing shared-memory-backed tile. Kernels that keep tile operands live across `wp.tile_matmul()` and then assign the next register-loaded operands back to those variables require this storage migration.

One example is a pipelined tile matmul kernel:

```python
import warp as wp


@wp.kernel
def matmul_pipelined(A: wp.array2d(dtype=float), B: wp.array2d(dtype=float), C: wp.array2d(dtype=float)):
    i, j = wp.tid()

    acc = wp.tile_zeros(shape=(8, 4), dtype=wp.float32)

    # Preload the first pair of operand tiles into registers.
    a = wp.tile_load(A, shape=(8, 8), offset=(i * 8, 0), storage="register")
    b = wp.tile_load(B, shape=(8, 4), offset=(0, j * 4), storage="register")

    count = int(A.shape[1] / 8)
    for k in range(1, count):
        # Fetch the next operands while the current ones are still live.
        a_next = wp.tile_load(A, shape=(8, 8), offset=(i * 8, k * 8), storage="register")
        b_next = wp.tile_load(B, shape=(8, 4), offset=(k * 8, j * 4), storage="register")

        wp.tile_matmul(a, b, acc)

        # Register tile assigned to an existing shared tile: the unsupported path.
        a = a_next
        b = b_next

    # Consume the last pair of operands loaded by the loop.
    wp.tile_matmul(a, b, acc)
    wp.tile_store(C, acc, offset=(i * 8, j * 4))
```

Expected behavior:

- Forward assignment copies the register tile values into the shared tile destination.
- Backward assignment accumulates the shared destination adjoint into the register source adjoint.
- The overwritten shared destination adjoint is cleared after propagation.
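The three expectations above can be sketched in plain Python, with nested lists standing in for tiles. This is only an illustration of the intended semantics, not Warp's native implementation; the helper names `assign_forward` and `assign_backward` are hypothetical and do not exist in Warp.

```python
def assign_forward(dest, src):
    """Copy every element of the (register) source into the (shared) destination."""
    for r in range(len(src)):
        for c in range(len(src[r])):
            dest[r][c] = src[r][c]


def assign_backward(adj_dest, adj_src):
    """Accumulate the destination adjoint into the source adjoint, then clear it."""
    for r in range(len(adj_dest)):
        for c in range(len(adj_dest[r])):
            adj_src[r][c] += adj_dest[r][c]
            # The destination value was overwritten, so its adjoint is reset.
            adj_dest[r][c] = 0.0


# Hypothetical 2x2 stand-ins for a shared destination and a register source.
shared = [[0.0, 0.0], [0.0, 0.0]]
register = [[1.0, 2.0], [3.0, 4.0]]
assign_forward(shared, register)

adj_shared = [[0.5, 0.5], [0.5, 0.5]]
adj_register = [[1.0, 1.0], [1.0, 1.0]]
assign_backward(adj_shared, adj_register)
```

After these calls, `shared` holds a copy of `register`, each entry of `adj_register` has grown by the corresponding entry of `adj_shared`, and `adj_shared` is zeroed.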

Actual behavior:

The native tile assignment overloads do not cover `tile_shared_t <- tile_register_t`, so this reassignment pattern is not handled correctly in forward execution or adjoint propagation.

### System Information

- Warp version: `main` / development branch
- Backend: CUDA tile backend

Register-to-shared tile reassignment fails in CUDA tile kernels #1439