[BUG] cutlass 4.5.0/4.5.1: cute.FastDivmodDivisor.divisor can reference SSA value from outside isolated region #3243

@tridao

Which component has the problem?

CuTe DSL

Bug Report

In CUTLASS / CuTe DSL 4.5.1, cute.FastDivmodDivisor has a public .divisor property, but accessing it after the FastDivmodDivisor (FDD) has been stored in a JIT-created params object and passed to a kernel fails MLIR verification with a region-isolation error.

The fast-divmod object itself works for divmod(x, fdd), but fdd.divisor appears to preserve the original construction-side value instead of reconstructing/lowering a region-local divisor value.

Repro

Save as repro_fdd_divisor.py and run python repro_fdd_divisor.py.

from dataclasses import dataclass

import cutlass
import cutlass.cute as cute
from cutlass import Int32


@dataclass
class Params:
    fdd: cute.FastDivmodDivisor


@cute.jit
def make_params(divisor: Int32) -> Params:
    return Params(cute.FastDivmodDivisor(divisor))


@cute.kernel
def write_divisor(out: cute.Tensor, params: Params):
    tidx, _, _ = cute.arch.thread_idx()
    if tidx == 0:
        # This access fails MLIR verification.
        out[0] = params.fdd.divisor


@cute.jit
def entry(out: cute.Tensor, divisor: Int32):
    params = make_params(divisor)
    write_divisor(out, params).launch(grid=(1, 1, 1), block=(32, 1, 1))


if __name__ == "__main__":
    print("cutlass", cutlass.__version__, cutlass.__file__)
    out_fake = cute.runtime.make_fake_tensor(
        cutlass.Int32, (1,), stride=(1,), assumed_align=4
    )
    cute.compile(entry, out_fake, Int32(0))

Observed

With cutlass==4.5.1:

  cutlass 4.5.1 .../site-packages/nvidia_cutlass_dsl/python_packages/cutlass/__init__.py

  Verification failed:
  error: "out[0] = params.fdd.divisor"(...): 'cute.memref.store' op using value defined outside the region
   note: see current operation: "cute.memref.store"(%arg0, %15, <<UNKNOWN SSA VALUE>>) ...
   note: required by region isolation constraints

Expected

Accessing params.fdd.divisor inside the kernel should produce a region-local value and compile successfully, since .divisor is documented/public on FastDivmodDivisor.

Additional note

Likely cause: FastDivmodDivisor.extract_mlir_values() only extracts _divisor_mlir, while new_from_mlir_values() preserves _original_divisor from the original object. Later .divisor returns that preserved original value, which may be defined outside the isolated kernel/JIT region.
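
The suspected mechanism can be illustrated with a toy model in plain Python that does not import cutlass. Everything here is hypothetical: ToyValue, StaleDivisor, RegionLocalDivisor, and pass_into_region are stand-ins invented for this sketch, and only the method and attribute names (extract_mlir_values, new_from_mlir_values, _divisor_mlir, _original_divisor) mirror the ones mentioned above. It assumes that crossing a region boundary replaces each extracted value with a fresh region-local one, which is a guess at the behavior and not a confirmed description of the DSL internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyValue:
    """Stand-in for an SSA value; `region` names the region that defines it."""
    name: str
    region: str


class StaleDivisor:
    """Toy model of the suspected behavior (NOT the real CUTLASS class)."""

    def __init__(self, divisor: ToyValue):
        self._original_divisor = divisor
        self._divisor_mlir = divisor

    def extract_mlir_values(self):
        # Only this value is handed across the region boundary.
        return [self._divisor_mlir]

    def new_from_mlir_values(self, values):
        rebuilt = object.__new__(type(self))
        rebuilt._divisor_mlir = values[0]
        # Suspected bug: the outer-region value is carried over unchanged.
        rebuilt._original_divisor = self._original_divisor
        return rebuilt

    @property
    def divisor(self):
        return self._original_divisor


class RegionLocalDivisor(StaleDivisor):
    """One possible fix (an assumption): rebuild .divisor from local values."""

    def new_from_mlir_values(self, values):
        rebuilt = object.__new__(type(self))
        rebuilt._divisor_mlir = values[0]
        rebuilt._original_divisor = values[0]
        return rebuilt


def pass_into_region(obj, region):
    """Mimic an isolated-region boundary: each extracted value is replaced
    by a fresh block argument that belongs to the target region."""
    extracted = obj.extract_mlir_values()
    args = [ToyValue(f"%arg{i}", region) for i in range(len(extracted))]
    return obj.new_from_mlir_values(args)


host_value = ToyValue("%divisor", "host")
stale = pass_into_region(StaleDivisor(host_value), "kernel")
fixed = pass_into_region(RegionLocalDivisor(host_value), "kernel")

print("stale .divisor defined in:", stale.divisor.region)  # host
print("fixed .divisor defined in:", fixed.divisor.region)  # kernel
```

In the stale variant, .divisor still points at the host-side value after the object is rebuilt inside the kernel, which matches the "using value defined outside the region" error. In the real class the extracted value may be a precomputed fast-divmod object and not the raw divisor, so an actual fix might need to extract the divisor as an additional value instead of reusing values[0].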