Which component has the problem?
CuTe DSL
Bug Report
In CUTLASS / CuTe DSL 4.5.1, cute.FastDivmodDivisor has a public .divisor property, but accessing it after the FDD is stored in a JIT-created params object and passed to a kernel fails MLIR verification with a region-isolation error.
The fast-divmod object itself works for divmod(x, fdd), but fdd.divisor appears to preserve the original construction-side value instead of reconstructing/lowering a region-local divisor value.
Repro
Save as repro_fdd_divisor.py and run python repro_fdd_divisor.py.
from dataclasses import dataclass

import cutlass
import cutlass.cute as cute
from cutlass import Int32


@dataclass
class Params:
    fdd: cute.FastDivmodDivisor


@cute.jit
def make_params(divisor: Int32) -> Params:
    return Params(cute.FastDivmodDivisor(divisor))


@cute.kernel
def write_divisor(out: cute.Tensor, params: Params):
    tidx, _, _ = cute.arch.thread_idx()
    if tidx == 0:
        # This access fails MLIR verification.
        out[0] = params.fdd.divisor


@cute.jit
def entry(out: cute.Tensor, divisor: Int32):
    params = make_params(divisor)
    write_divisor(out, params).launch(grid=(1, 1, 1), block=(32, 1, 1))


if __name__ == "__main__":
    print("cutlass", cutlass.__version__, cutlass.__file__)
    out_fake = cute.runtime.make_fake_tensor(
        cutlass.Int32, (1,), stride=(1,), assumed_align=4
    )
    cute.compile(entry, out_fake, Int32(0))
Observed
With cutlass==4.5.1:
cutlass 4.5.1 .../site-packages/nvidia_cutlass_dsl/python_packages/cutlass/__init__.py
Verification failed:
error: "out[0] = params.fdd.divisor"(...): 'cute.memref.store' op using value defined outside the region
note: see current operation: "cute.memref.store"(%arg0, %15, <<UNKNOWN SSA VALUE>>) ...
note: required by region isolation constraints
Expected
Accessing params.fdd.divisor inside the kernel should produce a region-local value and compile successfully, since .divisor is documented/public on FastDivmodDivisor.
Additional note
Likely cause: FastDivmodDivisor.extract_mlir_values() only extracts _divisor_mlir, while new_from_mlir_values() preserves _original_divisor from the original object. Later, .divisor returns that preserved original value, which may be defined outside the isolated kernel/JIT region.
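To make the suspected mechanism concrete, here is a minimal pure-Python sketch that does not import cutlass. Everything in it is a hypothetical stand-in: Region, Value, MockFastDivmod and rebind_into are illustration names, not CuTe DSL internals, and the real extract_mlir_values() / new_from_mlir_values() implementations may differ. It only models the pattern described above: one field is rebuilt from region-local values, the other is copied from the original object.

```python
class Region:
    """Hypothetical stand-in for an isolated MLIR region."""
    def __init__(self, name):
        self.name = name


class Value:
    """Hypothetical stand-in for an SSA value owned by one region."""
    def __init__(self, region, payload):
        self.region = region
        self.payload = payload


class MockFastDivmod:
    """Mimics the suspected behavior; not the real FastDivmodDivisor."""
    def __init__(self, original_divisor, divisor_mlir):
        self._original_divisor = original_divisor
        self._divisor_mlir = divisor_mlir

    @property
    def divisor(self):
        # Returns the construction-side value, never a rebuilt one.
        return self._original_divisor

    def extract_mlir_values(self):
        # Only the packed fast-divmod value crosses the region boundary.
        return [self._divisor_mlir]

    def new_from_mlir_values(self, values):
        # The packed value is replaced; the original divisor is carried over.
        return MockFastDivmod(self._original_divisor, values[0])


def rebind_into(obj, region):
    """Model passing obj into an isolated region as fresh block arguments."""
    fresh = [Value(region, v.payload) for v in obj.extract_mlir_values()]
    return obj.new_from_mlir_values(fresh)


host = Region("host")
kernel = Region("kernel")
fdd = MockFastDivmod(Value(host, 7), Value(host, "magic(7)"))
inside = rebind_into(fdd, kernel)

print(inside._divisor_mlir.region.name)  # kernel
print(inside.divisor.region.name)        # host
```

In this model the value used by divmod is region-local after rebinding, while .divisor still points at a host-side value, which matches the "using value defined outside the region" error reported above. If this is indeed the cause, one possible fix would be to include the divisor in the extracted values, or to reconstruct it from the region-local packed value.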