Open
Description
Which component requires the feature?
CuTe DSL
Feature Request
Recently I've been working on a megakernel, which usually involves switching between TMA descriptors. Unlike in grouped GEMM, the TMA descriptors themselves are most likely never modified, so it would be optimal to store them in the grid constant cache from the very beginning. However, CuTe DSL doesn't support annotating a kernel parameter as a grid constant, the way the __grid_constant__ annotation is used in CuTe C++ kernels. Typically the tensor map is constructed on the host using cuTensorMapEncodeTiled.
Essentially, what I would find useful is something like this:
@cute.kernel
def kernel(
    tma_atom: cute.CopyAtom,
    @cute.grid_constant tma_desc: cute.Tensor,
):
    cute.copy(
        tma_atom,
        src,
        dst,
        tma_desc_ptr=tma_desc.iterator,
    )
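Note that Python does not accept a decorator on an individual parameter, so the spelling above is only illustrative. One way the same per-parameter marker could be written in valid Python is with `typing.Annotated`. The following is a minimal sketch of that idea, not existing CuTe DSL API: `GridConstant`, `Tensor`, and `grid_constant_params` are hypothetical names made up for illustration, and `Tensor` merely stands in for `cute.Tensor` so the snippet runs without CuTe DSL installed.

```python
import inspect
from typing import Annotated, get_type_hints


class GridConstant:
    """Hypothetical marker: request that a kernel parameter be a grid constant."""


class Tensor:
    """Hypothetical stand-in for cute.Tensor (assumption, keeps the sketch standalone)."""


def grid_constant_params(fn):
    """Return the names of parameters whose annotation carries GridConstant."""
    # include_extras=True preserves the Annotated[...] metadata.
    hints = get_type_hints(fn, include_extras=True)
    return [
        name
        for name in inspect.signature(fn).parameters
        # Plain annotations have no __metadata__, so they fall back to ().
        if GridConstant in getattr(hints.get(name), "__metadata__", ())
    ]


def kernel(tma_atom: object, tma_desc: Annotated[Tensor, GridConstant]):
    """Placeholder body; a real kernel would issue the TMA copy here."""


print(grid_constant_params(kernel))
```

A kernel decorator could, in principle, run this kind of inspection at trace time to decide which parameters to lower as grid constants; whether that fits the DSL's compilation pipeline is for the maintainers to judge.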