PR #265

* This ```has_main_k0_block_loop``` logic could be simplified using a lambda:
  https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/e4584d91acc14a22426cbf081c8cc8394c136f6b/include/ck/tensor_operation/gpu/device/device_convnd_backward_weight_xdl_c_shuffle_nhwc_kyxc_nhwk.hpp#L1079
  Example:
  https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/56adf7e9cc4fcf6592151281a727e96b625bc54f/include/ck/tensor_operation/gpu/device/device_gemm_multiple_d_xdl_cshuffle.hpp#L575-L633
* This is hardcoded to ```bhalf_t``` and ```float```, so the name should reflect that, e.g. ```TypeConvertFp32ToBf16Functor```:
  https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/e4584d91acc14a22426cbf081c8cc8394c136f6b/include/ck/tensor_operation/gpu/device/device_convnd_backward_weight_xdl_c_shuffle_nhwc_kyxc_nhwk.hpp#L674-L675
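For the first point, a rough standalone sketch of the lambda-based dispatch: the launch code is written once inside a generic lambda, and the runtime ```has_main_k0_block_loop``` flag is turned into a compile-time ```std::integral_constant<bool, ...>``` exactly once. ```run_kernel``` and ```calculate_has_main_k0_block_loop``` are hypothetical stand-ins for the real kernel and predicate, not the CK API.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a kernel whose main-loop behaviour is a
// compile-time template parameter.
template <bool HasMainK0BlockLoop>
int run_kernel(int k0_blocks)
{
    // Returns a tag so the caller can tell which instantiation ran.
    return HasMainK0BlockLoop ? k0_blocks : -1;
}

// Hypothetical predicate: assume the main loop is needed for more than one K0 block.
inline bool calculate_has_main_k0_block_loop(int k0_blocks) { return k0_blocks > 1; }

int launch(int k0_blocks)
{
    // The launch code lives in one place; only the bool template argument varies.
    auto launch_kernel = [&](auto has_main_loop) {
        constexpr bool has_main_k0_block_loop = decltype(has_main_loop)::value;
        return run_kernel<has_main_k0_block_loop>(k0_blocks);
    };

    // Runtime bool -> compile-time constant, decided once instead of duplicating the launch.
    return calculate_has_main_k0_block_loop(k0_blocks)
               ? launch_kernel(std::integral_constant<bool, true>{})
               : launch_kernel(std::integral_constant<bool, false>{});
}
```

Adding another compile-time flag then only needs another ```integral_constant``` parameter on the lambda, rather than another copy of the launch code per combination.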
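For the second point, a minimal sketch of what a functor named after its fixed types might look like. It assumes ```bhalf_t``` is a raw 16-bit pattern (defined locally here so the snippet stands alone) and a ```operator()(Y& y, const X& x)``` call shape; both are assumptions, and the rounding shown (round to nearest, ties to even, quiet NaN preserved) is one common choice rather than necessarily what the code at L674-L675 does.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Assumption: bf16 stored as its raw 16-bit pattern (local stand-in, not CK's header).
using bhalf_t = std::uint16_t;

// Hypothetical functor: the name states the only conversion it performs.
struct TypeConvertFp32ToBf16Functor
{
    void operator()(bhalf_t& y, const float& x) const
    {
        std::uint32_t bits;
        std::memcpy(&bits, &x, sizeof(bits)); // read the float's bit pattern safely

        if(std::isnan(x))
        {
            // Keep sign and exponent, force the quiet-NaN mantissa bit.
            y = static_cast<bhalf_t>((bits >> 16) | 0x0040u);
            return;
        }

        // Round to nearest, ties to even, based on the 16 bits being dropped.
        bits += 0x7FFFu + ((bits >> 16) & 1u);
        y = static_cast<bhalf_t>(bits >> 16);
    }
};
```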