
[QST] How to efficiently convert fp32 Tensor to fp16 Tensor in Cutlass 3.x #802

@tridao

I'd like to convert an fp32 Tensor (in registers) to an fp16 Tensor (in registers), ideally using the __float22half2_rn function for efficiency.
Cutlass 2.x has a NumericArrayConverter specialization for fp32 -> fp16 conversion that uses this function.
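For reference, here is a minimal standalone sketch of the paired conversion using only cuda_fp16.h (no CUTLASS). The names float_to_half2_kernel and roundtrip_fp16 are hypothetical, and the sketch assumes an even number of elements. Each __float22half2_rn call rounds both lanes of a float2 to nearest-even; on sm_80 and newer the compiler can lower it to the packed cvt.rn.f16x2.f32 instruction, while older architectures implement it with two scalar conversions. The kernel widens the result back to fp32 only so the rounded values can be checked on the host.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <vector>

// Converts each float2 pair to __half2 with one __float22half2_rn call
// (round-to-nearest-even on both lanes), then widens back to fp32 so the
// host can inspect the rounded values.
__global__ void float_to_half2_kernel(const float2* __restrict__ in,
                                      float2* __restrict__ rounded,
                                      int n_pairs) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_pairs) {
        __half2 h = __float22half2_rn(in[i]);
        rounded[i] = __half22float2(h);
    }
}

// Hypothetical host helper: rounds an even-length fp32 vector through fp16.
// Error checking of CUDA API calls is omitted for brevity.
std::vector<float> roundtrip_fp16(const std::vector<float>& src) {
    const int n_pairs = static_cast<int>(src.size()) / 2;
    const size_t bytes = n_pairs * sizeof(float2);
    float2* d_in = nullptr;
    float2* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, src.data(), bytes, cudaMemcpyHostToDevice);
    const int threads = 128;
    float_to_half2_kernel<<<(n_pairs + threads - 1) / threads, threads>>>(d_in, d_out, n_pairs);
    std::vector<float> result(2 * n_pairs);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(result.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

For example, roundtrip_fp16({1.0f, 0.1f, -2.5f, 3.0f}) should return 0.0999755859375f (the nearest fp16 value) in place of 0.1f and leave the other three values unchanged.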

For Cutlass 3.x, I'm currently doing:

```cpp
// accum is an fp32 Tensor in registers
Tensor acc_fp16 = make_tensor<cutlass::half_t>(shape(accum));
for (int i = 0; i < size(accum); ++i) { acc_fp16(i) = accum(i); }
```

However, this might not be as efficient, since it doesn't use the float2 -> half2 conversion function. Looking at the generated PTX, it isn't using anything like cvt.rn.f16x2.f32.

Should I reinterpret the fp32 Tensor as a cutlass::Array, apply NumericArrayConverter, and then wrap the result back into a Tensor?
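A sketch of that approach, under stated assumptions: the helper name convert_type is hypothetical, and demo_kernel and demo_convert exist only to exercise it. It assumes the input lives in registers and has a static, compact layout (every logical index maps to a distinct storage slot in 0..size-1), which is typically the case for accumulator fragments. The output is allocated with the same layout, both storages are viewed as cutlass::Array, and NumericArrayConverter<cutlass::half_t, float, N> with its default round_to_nearest style then converts pairs with __float22half2_rn. Whether the packed cvt.rn.f16x2.f32 then shows up in the PTX still depends on the target architecture.

```cuda
#include <cute/tensor.hpp>
#include <cutlass/array.h>
#include <cutlass/numeric_conversion.h>
#include <cutlass/numeric_types.h>
#include <cuda_runtime.h>
#include <vector>

using namespace cute;

// Hypothetical helper: converts a register-backed CuTe tensor to element type To.
// Assumes a static, compact layout (cosize == size), e.g. an accumulator fragment.
template <class To, class Engine, class Layout>
__device__ __forceinline__ auto convert_type(Tensor<Engine, Layout> const& tensor) {
    using From = typename Engine::value_type;
    constexpr int N = decltype(size(tensor))::value;
    // Default round style is round_to_nearest; the half_t <- float
    // specialization converts pairs with __float22half2_rn.
    cutlass::NumericArrayConverter<To, From, N> convert_op;
    // Same layout as the input, so storage slot k holds the same logical element.
    Tensor out = make_tensor<To>(tensor.layout());
    auto const& src = *reinterpret_cast<cutlass::Array<From, N> const*>(raw_pointer_cast(tensor.data()));
    auto& dst = *reinterpret_cast<cutlass::Array<To, N>*>(raw_pointer_cast(out.data()));
    dst = convert_op(src);
    return out;
}

// Demo kernel (hypothetical): loads 8 floats into a 2x4 register tensor,
// converts it with convert_type, and stores the fp16 results.
__global__ void demo_kernel(float const* in, cutlass::half_t* out) {
    Tensor acc = make_tensor<float>(Shape<_2, _4>{});
    for (int i = 0; i < size(acc); ++i) { acc(i) = in[i]; }
    Tensor acc_fp16 = convert_type<cutlass::half_t>(acc);
    for (int i = 0; i < size(acc_fp16); ++i) { out[i] = acc_fp16(i); }
}

// Hypothetical host helper: runs demo_kernel on exactly 8 values and widens
// the fp16 results back to float. CUDA error checking omitted for brevity.
std::vector<float> demo_convert(std::vector<float> const& src) {
    float* d_in = nullptr;
    cutlass::half_t* d_out = nullptr;
    cudaMalloc(&d_in, 8 * sizeof(float));
    cudaMalloc(&d_out, 8 * sizeof(cutlass::half_t));
    cudaMemcpy(d_in, src.data(), 8 * sizeof(float), cudaMemcpyHostToDevice);
    demo_kernel<<<1, 1>>>(d_in, d_out);
    std::vector<cutlass::half_t> h(8);
    cudaMemcpy(h.data(), d_out, 8 * sizeof(cutlass::half_t), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    std::vector<float> result;
    for (cutlass::half_t v : h) { result.push_back(static_cast<float>(v)); }
    return result;
}
```

With this helper, the loop in the question would become Tensor acc_fp16 = convert_type<cutlass::half_t>(accum);. Allocating out with tensor.layout() rather than shape(tensor) is deliberate: the raw Array views only line up element-for-element if both tensors map logical index i to the same storage slot.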
