Reduction for int8 and bfloat16#125
…y Blockwise Reduction
…rface template parameter
…lice-size/vector-size configurations
…) to make int8 completely pass
…ReduceDim or InvariantDims/ReduceDims
…evice_reduce_xxx instances
…mple_reduce/test_reduce/
…vior when no command argument
using kInDataType = ck::half_t;
using kOutDataType = ck::half_t;
using kAccDataType = float;
using hInDataType = half_float::half;
HostInDataType
HostOutDataType
HostAccDataType
https://github.com/ROCmSoftwarePlatform/composable_kernel/wiki/Coding-Style#naming-style
using kInDataType = ck::half_t;
using kOutDataType = ck::half_t;
using kAccDataType = float;
using hInDataType = half_float::half;
Something wrong with using ck::half_t on host?
The reason is that Reduction needs abs() and isnan() on fp16. For ck::half_t, the equivalent functionality comes from __habs() and __hisnan(), which can only be used in __device__ code. half_float::half, on the other hand, has a direct and complete host-side implementation of abs() and isnan().
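To illustrate the constraint described above, here is a minimal sketch of a host-side reference for an absolute-maximum reduction. The function name HostReduceAbsMax and the NaN-propagation policy are hypothetical choices made for this illustration and are not taken from the PR. The only assumption about the element type is that abs() and isnan() are callable on the host, either through the <cmath> overloads (float, double) or through overloads found by argument-dependent lookup, which is what the comment says half_float::half provides.

```cpp
#include <cmath>
#include <vector>

// Hypothetical host reference for an AMAX-style reduction (not from the PR).
// T must have host-callable abs() and isnan(): the <cmath> overloads for
// built-in floating-point types, or overloads found via ADL for class types.
template <typename T>
T HostReduceAbsMax(const std::vector<T>& values)
{
    using std::abs;   // fallback for built-in floating-point types
    using std::isnan; // class types supply their own overloads via ADL

    T result = T(0);
    for(const T& v : values)
    {
        if(isnan(v))
            return v; // illustrative policy: propagate the first NaN seen

        const T a = abs(v);
        if(result < a)
            result = a;
    }
    return result;
}
```

With float as the element type, HostReduceAbsMax applied to {-3.0f, 2.0f, -1.5f} yields 3.0f. A type whose abs() and isnan() exist only as device functions could not be used here, which is the motivation given for a separate host data type.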
struct DeviceReduce : public BaseOperator
{
virtual size_t GetWorkspaceSizeInBytes(const std::vector<int>& inLengths)
virtual size_t GetWorkspaceSizeInBytes(const std::vector<int> inLengths,
Please use long_index_t.
I'm going to make sure all files in include/ck, including device operations, meet this standard:
https://github.com/ROCmSoftwarePlatform/composable_kernel/wiki/Coding-Style#integer-type
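One reason a 64-bit index type matters for size queries is that the product of tensor lengths can exceed the range of a 32-bit int. The sketch below is only an illustration: the long_index_t alias is a local stand-in assumed to be a 64-bit signed integer, and TotalElementCount and BufferSizeInBytes are hypothetical helpers, not functions from the library or the PR.

```cpp
#include <cstdint>
#include <vector>

// Local stand-in for the library's long_index_t; assumed here to be a
// 64-bit signed integer.
using long_index_t = std::int64_t;

// Hypothetical helper: number of elements in a tensor with the given lengths.
inline long_index_t TotalElementCount(const std::vector<long_index_t>& lengths)
{
    long_index_t count = 1;
    for(const long_index_t len : lengths)
        count *= len; // accumulated in 64 bits, so 65536 * 65536 is exact
    return count;
}

// Hypothetical helper: size in bytes of a buffer holding one element per index.
inline long_index_t BufferSizeInBytes(const std::vector<long_index_t>& lengths,
                                      long_index_t bytesPerElement)
{
    return TotalElementCount(lengths) * bytesPerElement;
}
```

For lengths {65536, 65536} the element count is 4294967296, which does not fit in a 32-bit int but is represented exactly here.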
asroy left a comment
Please fix the naming issue and the merge conflict. Otherwise LGTM.
This PR provides the following:
GetWorkspaceSizeInBytes() of DeviceReduceMultiblockPartialReduce does not calculate the workspace size correctly, which could cause a GPU memory fault