Open
Description
Which component has the problem?
CuTe DSL
Bug Report
Describe the bug
The tensor-creation functions create_tensors_abc_for_all_groups() and create_tensor_and_stride() in cutlass/examples/python/CuTeDSL/blackwell/grouped_blockscaled_gemm.py seem incorrect for cutlass.Float4E2M1FN and torch.float4_e2m1fn_x2. The 4-bit packing (two values per byte) does not appear to have been implemented.
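To illustrate what "4-bit packing" means here, the following is a minimal pure-Python sketch, not code from the example file. The helper name pack_fp4_codes is hypothetical, and the nibble order (first value in the low 4 bits) is an assumption made only for illustration; the actual layout used by torch.float4_e2m1fn_x2 should be checked against the PyTorch documentation.

```python
def pack_fp4_codes(codes):
    """Pack raw 4-bit codes (ints in 0..15) two per byte.

    Hypothetical helper for illustration. Assumes the first code of each
    pair goes into the low nibble; the real layout may differ.
    """
    if len(codes) % 2 != 0:
        raise ValueError("need an even number of 4-bit codes")
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("each code must fit in 4 bits")
    # Combine each consecutive pair into a single byte.
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


packed = pack_fp4_codes([0x1, 0x2, 0x3, 0x4])
print(len(packed))  # 2 bytes hold 4 logical elements
```

The point is only that n logical 4-bit elements occupy n/2 bytes, so any shape or stride derived from the logical element count has to account for that factor of two.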
Steps/Code to reproduce bug
See the two functions named above.
Expected behavior
For the packed 4-bit types, the packed dimension should be k//2 rather than k, for example in create_tensor_and_stride(l, m, k, a_major == "m", ab_dtype).
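A rough sketch of the shape adjustment I would expect, written as a standalone pure-Python helper. The name packed_storage_shape and its parameters are hypothetical and are not part of CUTLASS or the example file; it only shows the arithmetic, assuming two 4-bit elements per storage byte.

```python
def packed_storage_shape(logical_shape, packed_dim, elems_per_byte=2):
    """Return the storage shape when `packed_dim` holds packed sub-byte elements.

    Hypothetical helper: `logical_shape` counts 4-bit elements, and the result
    counts storage bytes along `packed_dim` (two elements per byte by default).
    """
    size = logical_shape[packed_dim]
    if size % elems_per_byte != 0:
        raise ValueError("packed dimension must be divisible by elems_per_byte")
    shape = list(logical_shape)
    shape[packed_dim] = size // elems_per_byte
    return tuple(shape)


l, m, k = 2, 128, 256
print(packed_storage_shape((l, m, k), packed_dim=2))  # (2, 128, 128)
```

Which dimension is the packed one presumably depends on the major mode of the operand, so the k//2 above applies to the case where k is the contiguous dimension.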
Environment details (please complete the following information):
CUTLASS master branch
Additional context
N/A